Wednesday, October 5, 2011

TCP Keep Alive in Windows Vista and Server 2008

Most if not all of my posts are born from the technology equivalent of a bare-knuckle cage match in which I am paired with a foe that is heavier, taller and has longer arms... but luckily, over the years, I've come out bruised and battered but triumphant more often than not.  A recent challenge involved telnet sessions from PCs running Vista Business (don't laugh, there are some out there) continuously dropping their connections to a remote AIX host.  So, that's the intro, and in the details that follow you'll see how bruised I became while still ending up with a small victory.

To further set this up, understand that the remote site was previously connected to the home office via an AT&T MPLS circuit (T1), and the same PCs were having no connectivity issues.  A location change brought the opportunity to use a cable modem (16 Mbps down/4 Mbps up) and a Cisco ASA5505 to create a VPN connection back to corporate - much less expensive than a 3 Mbps MPLS circuit, and a faster install at the new location.  And that's when the trouble started.  The PCs from the old location were moved and the VPN was set up (the host firewall is an ASA5510 on a remote cable internet circuit).  The reports were that if you left a PC running an eTerm32 telnet session to the AIX host for over 30 minutes without touching the keyboard, the session would drop.  Very frustrating, because if you were in the middle of entering a long order you had to start over.  The first assumption was a cable problem, so we spent days working with the cable company - all clear (yeah, I've heard that story before too).  We then moved to thinking maybe it was the 5505/5510 VPN tunnel - after a few calls with Cisco TAC and some minor adjustments, still no resolution.

Now it's said that even a blind squirrel can sometimes find an acorn, and what happens next may be just that, but here is the next set of tests.  I took an Ubuntu Linux server to the host site, set up with OpenSSH (yeah, I had tried SSH on the AIX box and sessions still dropped).  Using PuTTY on one of the Vista PCs, I set up an SSH session to the Linux server and left it running overnight - to my surprise it was still working fine the next day.  I tried it again with eTerm32 to Linux and still no dropped connections.  At this point we dove into the AIX settings with IBM: all systems go, no detectable problems there.  Then I put a Windows Server 2003 machine running Terminal Services at the host site and ran RDP sessions from the Vista PCs, with eTerm32 running inside the RDP session on the terminal server.  While we didn't have any dropped sessions, we did notice that sometimes when you walked away and came back it would take a few mouse clicks or keystrokes for it to respond.  Now here's where the blind squirrel comes in - the next step was to assume that maybe there was a problem with Vista, and that's what led me to the following Microsoft TechNet article about optional TCP parameters in Windows Vista and Server 2008.  After adding these registry entries and rebooting the PCs at the remote site, we were no longer experiencing dropped telnet sessions.

As you can see in the link, these are optional registry entries for TCP keep-alive - meaning they are not present by default.  There are two primary entries, KeepAliveTime and KeepAliveInterval, that will then enable OS-level keep-alive packets to a remote system.
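As background rather than anything taken from the TechNet article: keep-alive can also be requested per connection by the application itself through the standard SO_KEEPALIVE socket option, with the OS-level values governing the timing of the probes. Here is a minimal Python sketch of that idea. The function name make_keepalive_socket is made up for illustration, and whether eTerm32 or PuTTY set this option on their own connections is not something established here.

```python
import socket


def make_keepalive_socket():
    """Create a TCP socket with keep-alive requested (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the OS to send keep-alive probes on this connection when it is idle.
    # How long "idle" is, and how often probes repeat, comes from OS settings.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


if __name__ == "__main__":
    s = make_keepalive_socket()
    # A non-zero value means the option is switched on for this socket.
    enabled = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
    print("keep-alive enabled:", bool(enabled))
    s.close()
```

The socket is never connected in this sketch; it only shows where the per-connection switch lives.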

So, here's a short set of instructions for adding these entries to the Vista registry.  It seems that Server 2008 and Windows 7 also lack these entries by default, but I have not had time to test them yet for the same dropped-session problem.

log in as a user with admin privileges
run REGEDIT
navigate to HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters
right click in the right panel - select NEW - DWORD (32-bit) Value
type: KeepAliveTime  for the name and hit enter (no spaces and observe caps for K, A and T)
right click on KeepAliveTime and select MODIFY
click on the button beside Decimal
type: 300000   in the box for value
click OK to save (should show 493e0 for hex after save)
**this sets the value to 5 minutes (300,000 milliseconds); note that Microsoft documents the built-in default, used when the entry is absent, as 7,200,000 milliseconds (2 hours)
right click in the right panel - select NEW - DWORD (32-bit) Value
type: KeepAliveInterval  for the name and hit enter (no spaces and observe caps for K, A and I)
right click on KeepAliveInterval and select MODIFY
click on the button beside Decimal
type: 1000   in the box for value
click OK to save (should show 3e8 for hex after save)
**this sets the value to 1 second (1,000 milliseconds); default value per Microsoft TechNet
file - exit
reboot the machine
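
If you have more than a couple of PCs to touch, the same two values can be put in a .reg file and imported instead of clicking through REGEDIT each time. Below is a rough Python sketch that only builds the text of such a file; it does not touch the registry itself. The function names build_reg_file and write_reg_file and the output filename are made up for illustration, the values are the ones used in the steps above, and writing the file as UTF-16 is an assumption about what REGEDIT prefers for "Version 5.00" files. Test on one machine before rolling it out.

```python
# Registry key from the steps above.
REG_KEY = r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters"


def build_reg_file(keepalive_time_ms=300000, keepalive_interval_ms=1000):
    """Return the text of a .reg file that sets both keep-alive DWORDs."""
    values = {
        "KeepAliveTime": keepalive_time_ms,
        "KeepAliveInterval": keepalive_interval_ms,
    }
    lines = ["Windows Registry Editor Version 5.00", "", "[" + REG_KEY + "]"]
    for name, value in values.items():
        # A DWORD is an unsigned 32-bit number.
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(name + " does not fit in a DWORD")
        # .reg files store DWORDs as 8 hex digits, e.g. 300000 -> 000493e0.
        lines.append('"{}"=dword:{:08x}'.format(name, value))
    return "\r\n".join(lines) + "\r\n"


def write_reg_file(path="tcp_keepalive.reg"):
    """Write the file as UTF-16 (assumption: what REGEDIT expects here)."""
    with open(path, "w", encoding="utf-16", newline="") as handle:
        handle.write(build_reg_file())


if __name__ == "__main__":
    print(build_reg_file(), end="")
```

The hex digits in the generated lines match what REGEDIT shows after the manual steps (493e0 and 3e8), which makes for a quick sanity check. A reboot is still needed after importing.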

