Monday, December 31, 2007

Simulate Human Keystrokes...

A recent problem at work resulted in the degradation of three distinct ISP links down to one. This post will explain how the great expect tool assisted with reprogramming a router's ACL when copy paste was too fast.

First some background... (Stick with me and I'll eventually get to the simulating human keystrokes).

The design of the network was that a pair of load balancing devices would use bandwidth and availability statistics to determine which of the three ISP links to send data data out on (TCP connections and UDP/DNS requests). Additionally the load balancers offered DNS services for several of our web sites and would hand out IP addresses that were associated with the ISP link that was the 'healthiest'. Thereby causing inbound HTTP requests to be somewhat balanced across the three links (a kind of intelligent round-robin DNS).

Strike one and two...

The two (HA) load balancers refused to operate one day and basically would not pass data in or out of our network. The configuration was solid and the appliances did not appear to have suffered a physical failure. After a few hours of my team tinkering with the hardware and working through the manufacturer's technical support team I decided to pull the plug on them and replaced the DNS portions with a Knoppix box and move the NAT functionality to one of our Cisco border routers (many thanks go out to a great consultant from Berbee who jumped in a configured the Cisco portions for me in a pinch).

This solution resulted in a working Internet connection for our organization again but it also resulted in cutting our effective bandwidth from the three ISP links down to one (basically cut our aggregate of 24Mbit down to 10Mbit).

After a day of running all of our traffic through a single link and a single router we started to get complaints about the responsiveness when accessing the Internet from our branch locations (the branch locations access the Internet through HQ's ISP links). In looking at the interface utilization on the router I saw that we were using less than 60% of the available bandwidth in both the inbound and outbound directions. However, when I peaked at the CPU utilization ('show processes cpu') it became immediately apparent where the problem was; were running at 99% of available CPU. I had noticed the day before that all the ACL entries on this router (and the other border routers) had a log statement attached to each; including all the allow entires. My first reaction was to strip off the logging and see if we could gain anything back.

This is where we finally get to simulating human keystrokes... (I promise)

So I did what any respectable UNIX guru would do, copy and paste the current ACL into vi and proceed to remove all the log statements; with a '%s/log//g' command of course. Now we have a brand new ACL that I can paste back into my management session to the router; I should mention that the only session available was via the serial console attached to a serial concentrator. I started the 'conf t' session and began to create a new access list, not clobbering the original. I grabbed the new ACL in one terminal window, did a command-C (copy), then switched to the console connection terminal and did a command-V (paste). A no-brainer of course, so why am I wasting your time...? We'll what I proceeded to see 'typed' into the console terminal was a bunch of ACL lines that were stepping all over themselves. The terminal was sending data to the router's serial connection faster than it could read it and as a result characters were being dropped from the lines and causing the config commands to fail.

Just great! I thought to myself, get ready to hand type in each of the ACL's (there were quite a few) and hope that you don't make a mistake! Then the dude that lurks in the inner parts of my brain said, "Hey! Use expect since it can simulate human typing characteristics." That's right, I recalled seeing something in an autoexpect generated script that referenced human-like typing. So I proceed to RTFM for expect and sure enough the 'set send_human {}' command is exactly what I need. I created a small expect script that set the send_human option and then did a send -h of the entire ACL. [At this point you should really take a pause and read through the send_human section of the expect man page.] I experimented with a few values passed to send_human until I saw something on the screen that looked like it should go without causing a problem. But now how do I get the output of expect into the router's config command line. There were two options that came to my mind immediately:
  1. Write a more elaborate expect script that spawns a session to the router's serial connection and interacts with it all the way through logging in and entering the config terminal phase.
  2. Manually start the config terminal session with the router myself and just redirect the output of the expect script at the point where I would normally type each line of the ACL.

I needed to get this done in a very short amount of time so I opted for option #2. But how do I get the output of expect as the input of a ssh/telnet session? I opted to use a fifo instead of trying to send everything through a standard UNIX pipe (I assumed, but didn't confirm, that the standard pipe would buffer all the output and have the same problem as the original past option). Here is the basic setup:

  1. Create a pipe using the mkfifo command:
    $ mkfifo myio
  2. Start the ssh/telnet session with the router (since this is a serial connection the session can be prepared ahead of time to already be in the proper config terminal location and then quit the session with ^] or ~..). The key is to take STDIN from the fifo that we just created.
    $ telnet serial-server 9999 <>
  3. Now start the expect script and send the output to the fifo.
    $ expect -f my.exp > fifo
  4. Next I sat back and watched the expect script 'type' the ACL commands into the router at a pace that was acceptable and did not overrun any buffers.
  5. After a few seconds the new ACL was ready for me to apply in place of the old one.
Hopefully, this will help some of you out if you ever find that you are in a situation where you need to provide a large set of input to one process to appear as though a real human was typing it. Even if you do not have this specific need I urge everyone to look at using expect. I have found many tremendous uses for it over the years and enjoy watching it do my work for me.

... and in case you are wondering if removing the log entries from the ACL's had a positive impact on the high CPU utilization, it only dropped it down by a few points. The real culprit turned out to be a single PC blasting out SYN packets to a site in the .ru domain (I expect to analyze this system in the days to come).

No comments: