[e2e] query on behaviour of tcp_keepalive and tcp retransmit on Linux based systems

Thu Feb 24 03:06:25 PST 2011

First of all, I'm not quite sure whether this is the right list for 
Linux-specific issues.

Second, you should have a look at the basic function of TCP. When the 
receiving end dies, the sender will repeatedly run into retransmission 
timeouts, with the following consequences:
1. The congestion window, and with it the sending window, shrinks to 
one segment.
2. The RTO is doubled each time a sent segment is not acknowledged in 
time. Whether the RTO is capped depends on the implementation.
3. After some period of time, the sender gives up and the connection 
is aborted. (Have a look at the various timeouts in TCP.)
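
Items 2 and 3 can be illustrated with a small model of exponential 
backoff. This is only a sketch under assumed, Linux-like numbers: a 
200 ms starting RTO, a 120 s cap, and 15 retransmissions (the default 
of net.ipv4.tcp_retries2). The function name give_up_time is 
hypothetical, and a real connection's RTO starts from its measured 
round-trip time rather than a fixed value.

```python
def give_up_time(rto_initial: float, rto_max: float, retries: int) -> float:
    """Seconds a sender waits before giving up under exponential backoff.

    Models the original transmission plus `retries` retransmissions. Each one
    is followed by a full RTO; the RTO doubles after every timeout but never
    exceeds `rto_max`.
    """
    total = 0.0
    rto = rto_initial
    for _ in range(retries + 1):  # one RTO per (re)transmission
        total += rto
        rto = min(rto * 2, rto_max)  # double the RTO, capped at rto_max
    return total


if __name__ == "__main__":
    # Assumed Linux-like values: 200 ms initial RTO, 120 s cap, and
    # 15 retransmissions (the default of net.ipv4.tcp_retries2).
    print(f"{give_up_time(0.2, 120.0, 15):.1f} s")
```

With these assumed values the model prints 924.6 s, a little over 15 
minutes. The Linux documentation for tcp_retries2 quotes the same 
figure as a lower bound for the effective timeout, since the actual 
RTO is usually larger than the 200 ms minimum.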

> We need some clarification on TCP keepalive. We are facing some 
> issues on our production servers related to TCP functionality.
>
> The issue is as follows.
>
> We have some machines at one end sending data in real time to another 
> group of machines at the other end. Due to hardware issues at the 
> other end, some of those machines become unresponsive or crash. The 
> client system that pumps the data never learns that the server has 
> become unresponsive. The connection remains in the ESTABLISHED state 
> and the client keeps trying to send data, believing the connection is 
> alive, so we are seeing a backlog on the client side.
>
> Our understanding of how TCP handles the connection is as follows.
>
>
> Q1) Since the server went down, the client will try to retransmit 
> the data until it times out. What is the behavior of TCP after the 
> timeout? We need clarification on the following:
> a) Will the kernel close the established connection after the 
> timeout? It looks like it does not in our case, as we still see the 
> connection in the ESTABLISHED state after more than 2 hours.
> b) Are there any kernel parameters that decide when the client times 
> out after retransmissions fail? What is the behavior of TCP after the 
> client's retransmissions time out?
>
>
> Q2) There is something called tcp_keepalive, which is present in the 
> kernel by default. Its default comes to around 2 hours 2 minutes, I 
> think. After the keepalive time, the client sends some TCP probes, and 
> if it cannot reach the server, the kernel closes the established 
> connection on the client side. This is my understanding. But I can see 
> that the connection still remains ESTABLISHED after the tcp_keepalive 
> time. We waited for around 2 hours 30 minutes, but the connection 
> remained in the ESTABLISHED state. We tried reducing the keepalive time 
> to around 10 minutes, but the connection still remains in the 
> ESTABLISHED state on the client side.
>
>
> Where did I go wrong? Please clarify the doubts raised above. What 
> should we do to resolve the problem described above? Any help will be 
> highly appreciated, as we are having a hard time resolving the issue.
>
> Thanks in advance
>
>
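
For question 1 b) and question 2, the system-wide values on Linux live 
under /proc/sys/net/ipv4. The following minimal Python sketch, assuming 
a Linux host, reads them; the helper names read_tcp_sysctls and 
keepalive_detection_time are hypothetical. tcp_retries2 bounds how long 
an established connection keeps retransmitting, and the three 
tcp_keepalive_* values give a rough estimate (idle time plus interval 
times probes) of how long an idle, keepalive-enabled connection 
survives a dead peer.

```python
from pathlib import Path
from typing import Dict, Optional

# Linux sysctls that govern when a dead peer is detected. Each one is a file
# under /proc/sys/net/ipv4/ (the same values "sysctl net.ipv4.<name>" shows).
TCP_TIMEOUT_SYSCTLS = (
    "tcp_retries2",          # bounds how long an established connection retransmits
    "tcp_keepalive_time",    # idle seconds before the first keepalive probe
    "tcp_keepalive_intvl",   # seconds between keepalive probes
    "tcp_keepalive_probes",  # unanswered probes before the connection is dropped
)


def read_tcp_sysctls(base: str = "/proc/sys/net/ipv4") -> Dict[str, Optional[int]]:
    """Return each sysctl's integer value, or None if it cannot be read."""
    values: Dict[str, Optional[int]] = {}
    for name in TCP_TIMEOUT_SYSCTLS:
        try:
            values[name] = int(Path(base, name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


def keepalive_detection_time(idle: int, interval: int, probes: int) -> int:
    """Approximate seconds from the last activity until keepalive gives up."""
    return idle + interval * probes


if __name__ == "__main__":
    cfg = read_tcp_sysctls()
    print(cfg)
    timings = (cfg["tcp_keepalive_time"], cfg["tcp_keepalive_intvl"],
               cfg["tcp_keepalive_probes"])
    if None not in timings:
        print("keepalive gives up after about",
              keepalive_detection_time(*timings), "s")
```

With the usual Linux defaults (7200 s idle, 75 s interval, 9 probes) 
the estimate is 7875 s, about 2 hours 11 minutes.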

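On Linux, keepalive probes are only sent on sockets where the 
application has enabled SO_KEEPALIVE; the tcp_keepalive_* sysctls 
merely supply default timings. A minimal sketch of enabling it per 
socket could look like this, assuming Linux and Python 3.6 or later. 
The function name enable_keepalive and all timing values are 
hypothetical. The optional TCP_USER_TIMEOUT (Linux 2.6.37 and later) 
limits how long transmitted data may remain unacknowledged before the 
kernel aborts the connection, which may be closer to the retransmission 
case in question 1 than keepalive is.

```python
import socket
from typing import Optional


def enable_keepalive(sock: socket.socket, idle: int, interval: int, probes: int,
                     user_timeout_ms: Optional[int] = None) -> None:
    """Enable keepalive on one TCP socket with per-socket timings (Linux).

    These options override the system-wide net.ipv4.tcp_keepalive_*
    defaults for this socket only.
    """
    # Keepalive stays off unless the application turns it on for the socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    if user_timeout_ms is not None:
        # Maximum time (ms) sent data may stay unacknowledged before the
        # kernel forcibly closes the connection.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT,
                        user_timeout_ms)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Hypothetical values: first probe after 10 min idle, then every 60 s,
    # give up after 5 unanswered probes; abort if data stays unacked 60 s.
    enable_keepalive(s, idle=600, interval=60, probes=5, user_timeout_ms=60_000)
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE))
    s.close()
```

The options are set before connect() here, so they apply for the whole 
lifetime of the connection.
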
-- 
------------------------------------------------------------------
Detlef Bosau
Galileistraße 30	
70565 Stuttgart                            Tel.:   +49 711 5208031
                                            mobile: +49 172 6819937
                                            skype:     detlef.bosau
                                            ICQ:          566129673
detlef.bosau at web.de                     http://www.detlef-bosau.de
------------------------------------------------------------------
