[e2e] end2end-interest Digest, Vol 83, Issue 4

Zama Ques queszama at yahoo.in
Thu Feb 24 03:06:18 PST 2011


Hi Yan,

I tried testing with iperf today.

I started the server on one host, connected to it with the client from another host, and after some time disconnected the cable on the server host.

I also reduced tcp_keepalive_time to 1200 sec, so the keepalive timeout should be about 32 minutes once the other two keepalive-related kernel parameters (probes and interval) are added.
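
The keepalive figure can be reproduced as the idle time plus one interval per unanswered probe. This is only a minimal sketch: the probe count (9) and interval (75 s) are assumed to be the usual Linux defaults for tcp_keepalive_probes and tcp_keepalive_intvl, since their values are not given here.

```python
# Rough keepalive timeout: idle time before the first probe, plus one
# interval for each unanswered probe.
KEEPALIVE_TIME = 1200   # seconds, the value set for this test
KEEPALIVE_PROBES = 9    # assumption: Linux default, not stated in the mail
KEEPALIVE_INTVL = 75    # assumption: Linux default, in seconds


def keepalive_timeout(idle, probes, interval):
    """Seconds from the last activity until the connection is dropped."""
    return idle + probes * interval


total = keepalive_timeout(KEEPALIVE_TIME, KEEPALIVE_PROBES, KEEPALIVE_INTVL)
print(f"{total} s = {total / 60:.2f} min")  # 1875 s = 31.25 min
```

With those assumed defaults the result is a little over 31 minutes, which matches the rough figure above. Per tcp(7), keepalive probes are only sent on sockets where the application has set SO_KEEPALIVE, so this figure only applies if that option is enabled.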


The following are my findings.

I can see that the client terminates the ESTABLISHED connection after around 16 minutes of the server being unreachable, that is, before the TCP keepalive timeout.

It looks to me as though these 16 minutes are related to the TCP retransmission timeout, which is probably determined by the following 3 parameters and comes to around 18 minutes.

$ cat /proc/sys/net/ipv4/tcp_retries1
3
$ cat /proc/sys/net/ipv4/tcp_retries2
15
$ cat /proc/sys/net/ipv4/tcp_fin_timeout 
60


Is my assumption correct here?
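
As a rough cross-check on the arithmetic, the Linux kernel's ip-sysctl documentation describes tcp_retries2 = 15 as a hypothetical timeout of 924.6 seconds, obtained from exponential backoff of the retransmission timer. The sketch below reproduces that number, assuming a minimum RTO of 200 ms and a maximum of 120 s (kernel constants that are not shown in this mail). The real RTO starts from the measured round-trip time, so the timeout on a given connection will differ somewhat. According to the same documentation, tcp_retries1 governs when the network layer is asked to re-check the route and tcp_fin_timeout applies to orphaned FIN_WAIT_2 connections, so neither should contribute to this figure.

```python
RTO_MIN = 0.2    # assumption: Linux minimum RTO, in seconds
RTO_MAX = 120.0  # assumption: Linux maximum RTO, in seconds


def hypothetical_timeout(retries, rto_min=RTO_MIN, rto_max=RTO_MAX):
    """Sum the backed-off timers for the first send plus `retries` retransmissions."""
    total = 0.0
    rto = rto_min
    for _ in range(retries + 1):
        total += min(rto, rto_max)  # each timer is capped at the maximum RTO
        rto *= 2                    # exponential backoff
    return total


timeout = hypothetical_timeout(15)
print(f"{timeout:.1f} s = {timeout / 60:.1f} min")  # 924.6 s = 15.4 min
```

Under these assumptions the lower bound is a little over 15 minutes, which is in the same range as the roughly 16 minutes observed.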


The following is the netstat connection flow during my experiment:

$  for i in {1..1000} ; do netstat -atn | egrep "5001" ; date  ; sleep 60  ; done
tcp        0 447432 10.66.X.Y:43533            10.66.A.B:5001           ESTABLISHED 
tcp        0      0 10.66.X.Y:43531            10.66.A.B:5001           TIME_WAIT   
Thu Feb 24 14:47:16 IST 2011
tcp        0 3311576 10.66.X.Y:43533            10.66.A.B:5001           ESTABLISHED 
Thu Feb 24 14:48:16 IST 2011
(Network Cable removed during this time from the server) 
tcp        0 3317368 10.66.X.Y:43533            10.66.A.B:5001           ESTABLISHED 
Thu Feb 24 14:49:16 IST 2011
tcp        0 3021976 10.66.X.Y:43533            10.66.A.B:5001           ESTABLISHED 
Thu Feb 24 14:50:16 IST 2011 
tcp        0 2511048 10.66.X.Y:43533            10.66.A.B:5001           ESTABLISHED 
Thu Feb 24 14:51:16 IST 2011
tcp        0 2511048 10.66.X.Y:43533            10.66.A.B:5001           ESTABLISHED 
Thu Feb 24 14:52:16 IST 2011
tcp        0 2511048 10.66.X.Y:43533            10.66.A.B:5001           ESTABLISHED 
Thu Feb 24 14:53:16 IST 2011
tcp        0 2511048 10.66.X.Y:43533            10.66.A.B:5001           ESTABLISHED 
Thu Feb 24 14:54:16 IST 2011
tcp        0 2511048 10.66.X.Y:43533            10.66.A.B:5001           ESTABLISHED 
Thu Feb 24 14:55:16 IST 2011
tcp        0 2511048 10.66.X.Y:43533            10.66.A.B:5001           ESTABLISHED 
Thu Feb 24 14:56:16 IST 2011
tcp        0 2511048 10.66.X.Y:43533            10.66.A.B:5001           ESTABLISHED 
Thu Feb 24 14:57:16 IST 2011
tcp        0 2511048 10.66.X.Y:43533            10.66.A.B:5001           ESTABLISHED 
Thu Feb 24 14:58:16 IST 2011
tcp        0 2511048 10.66.X.Y:43533            10.66.A.B:5001           ESTABLISHED 
Thu Feb 24 14:59:16 IST 2011
tcp        0 2511048 10.66.X.Y:43533            10.66.A.B:5001           ESTABLISHED 
Thu Feb 24 15:00:16 IST 2011
tcp        0 2511048 10.66.X.Y:43533            10.66.A.B:5001           ESTABLISHED 
Thu Feb 24 15:01:16 IST 2011
tcp        0 2511048 10.66.X.Y:43533            10.66.A.B:5001           ESTABLISHED 
Thu Feb 24 15:02:16 IST 2011
tcp        0 2511048 10.66.X.Y:43533            10.66.A.B:5001           ESTABLISHED 
Thu Feb 24 15:03:16 IST 2011
tcp        0 2511048 10.66.X.Y:43533            10.66.A.B:5001           ESTABLISHED 
Thu Feb 24 15:04:17 IST 2011
tcp        0 2511048 10.66.X.Y:43533            10.66.A.B:5001           ESTABLISHED 
(The connection status changed on the client around 15 minutes after the server became unreachable.)

Thu Feb 24 15:05:17 IST 2011
Thu Feb 24 15:06:17 IST 2011
Thu Feb 24 15:07:17 IST 2011


The following packet flow was captured with tcpdump on the client around the time I removed the network cable.

=====
14:50:20.158794 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1801285561:1801350721, ack 0, win 92, options [nop,nop,TS val 172872553 ecr 177048109], length 65160
14:50:20.164550 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1801350721:1801415881, ack 0, win 92, options [nop,nop,TS val 172872558 ecr 177048115], length 65160

14:50:20.394916 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 172872992 ecr 177048116], length 1448
14:50:21.258921 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 172873856 ecr 177048116], length 1448
14:50:22.986922 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 172875584 ecr 177048116], length 1448
14:50:26.442922 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 172879040 ecr 177048116], length 1448
14:50:33.354923 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 172885952 ecr 177048116], length 1448
14:50:47.178932 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 172899776 ecr 177048116], length 1448
14:51:14.826929 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 172927424 ecr 177048116], length 1448
14:52:10.122922 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 172982720 ecr 177048116], length 1448
14:54:00.714934 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 173093312 ecr 177048116], length 1448
14:56:00.714921 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 173213312 ecr 177048116], length 1448

14:58:00.714920 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 173333312 ecr 177048116], length 1448
15:00:00.714921 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 173453312 ecr 177048116], length 1448
15:02:00.714921 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 173573312 ecr 177048116], length 1448
15:04:00.714936 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 173693312 ecr 177048116], length 1448
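
The spacing of the retransmissions in the capture can be checked directly from the timestamps. The sketch below prints the gap between successive retransmissions of the same segment (seq 1798969993); the timestamps are copied from the capture and nothing else is assumed.

```python
from datetime import datetime

# Timestamps of the repeated segment (seq 1798969993), copied from the capture.
stamps = [
    "14:50:20.394916", "14:50:21.258921", "14:50:22.986922",
    "14:50:26.442922", "14:50:33.354923", "14:50:47.178932",
    "14:51:14.826929", "14:52:10.122922", "14:54:00.714934",
    "14:56:00.714921", "14:58:00.714920", "15:00:00.714921",
    "15:02:00.714921", "15:04:00.714936",
]

times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
# Seconds between each retransmission and the one before it.
gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]

for stamp, gap in zip(stamps[1:], gaps):
    print(f"{stamp}  +{gap:8.3f} s")
```

The gaps double from about 0.86 s up to about 110.6 s and then stay at about 120 s, which is the pattern expected from exponential backoff with a capped retransmission timer.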


Does my TCP stack look fine based on the experiments above?


Thanks
Zaman






--- On Wed, 23/2/11, Yan Cai <ycai at ecs.umass.edu> wrote:

From: Yan Cai <ycai at ecs.umass.edu>
Subject: Re: end2end-interest Digest, Vol 83, Issue 4
To: "Zama Ques" <queszama at yahoo.in>
Date: Wednesday, 23 February, 2011, 2:04 PM



  

    
    Hi Zaman,

    I guess there might be some unknown FTP configuration on the CLIENT side
    that causes this issue. You can isolate the problem first. iperf
    can be used to test the functionality of the TCP stack on your machine. If
    it works as expected, then there is nothing wrong with the TCP stack.
    Next, check the settings of the FTP client (not the FTP server) to
    see if there is any specific configuration that causes this problem.
    If that is hard to do, my suggestion is to install a third-party
    FTP client application and test with that.

    If neither of these works, you might have to trace the traffic over the
    cable attached to the client machine and determine what is going on.

    Best wishes,
    Yan

    

    On 2/23/2011 1:52 AM, Zama Ques wrote:
    
      
        
          
              Hi Yan,

              Thanks for your suggestion. I am familiar with iperf, but
              the issue for us is that this is a prod network and it is
              advisable for me not to pump data onto it. I will
              try the experiment between two desktops connected by a
              crossover cable.

              What I tried earlier was to start an FTP server on
              one end and connect to it from the client side.

               $ ftp 10.66.X.X
              Connected to 10.66.X.X
              220 (vsFTPd 2.2.2)
              Name (10.66.74.141:zama): anonymous
              331 Please specify the password.
              Password:
              230 Login successful.
              Remote system type is UNIX.
              Using binary mode to transfer files.

              After that I disconnected the network cable from the
              server and monitored the status of the connection on
              the client side.

              The status of the connection before and after
              disconnecting the network cable was as follows.

              ---
              $  for i in {1..1000} ; do netstat -at | egrep "ftp" ; date  ; sleep 60  ; done
              tcp        0      0 edgebeauty.c:50179 shopfrobsoon.c:ftp ESTABLISHED
              Wed Feb 23 11:47:53 IST 2011
              tcp        0      0 edgebeauty.c:50179 shopfrobsoon.c:ftp ESTABLISHED
              Wed Feb 23 11:48:53 IST 2011
              tcp        0      0 edgebeauty.c:50179 shopfrobsoon.c:ftp ESTABLISHED
              Wed Feb 23 11:49:53 IST 2011
              ...
              ...
              Wed Feb 23 12:14:03 IST 2011
              tcp        0      0 edgebeauty.c:50179 shopfrobsoon.c:ftp ESTABLISHED
              Wed Feb 23 12:15:03 IST 2011
              ===

              

              As we can see, more than 25 minutes after the
              server went down, the client still holds the
              connection in ESTABLISHED state.

              My understanding is that the client should close the
              connection after the TCP retransmit timeout expires. Or is my
              understanding wrong?

              Please clarify.

              --Zaman

              

              
                

                  Message: 2
                  Date: Tue, 22 Feb 2011 09:55:13 -0500
                  From: Yan Cai <ycai at ecs.umass.edu>
                  Subject: Re: [e2e] query on behaviour of tcp_keepalive and tcp
                      retransmit on Linux based systems
                  To: end2end-interest at postel.org
                  Message-ID: <4D63CE50.8050606 at ecs.umass.edu>
                  Content-Type: text/plain; charset="iso-8859-1"

                  

                  Hi

                  According to your description, the expected behavior
                  should be as follows.
                  At the beginning, senders on one side can send data to
                  the receivers on the other side, and the receivers can
                  receive data without any problem.
                  When some of the receivers go offline, the affected senders
                  should no longer receive positive acknowledgments and
                  therefore lower their congestion windows (i.e., sending
                  rate). Since in your case the receiver is off forever, some
                  senders should further experience timeout events.
                  After a few timeouts, the sender should CLOSE the
                  connection itself.

                  As far as I know, the whole procedure above should be
                  invoked automatically on the sender side. This is how TCP
                  (sender) handles exceptions.

                  My suggestion is that you run a simple experiment on
                  your side to see if TCP on your machine works that way.
                  The test can be done using iperf to send a long-lived
                  TCP flow, and then taking the receiver offline in the
                  middle of the transmission. The connection is expected
                  to be closed very soon after the receiver is off.

                  Hope this helps.
                  Yan

                  On 2/22/2011 4:24 AM, Zama Ques wrote:
                  > We need some clarification on TCP keepalive. We are facing some
                  > issues on our prod servers related to TCP functionality.
                  >
                  > The issue is like this.
                  >
                  > We have some machines at one end sending data in real time to another
                  > group of machines at the other end. Due to some hardware issues
                  > at the other end, some of the machines become unresponsive or crash.
                  > The client system which pumps data never comes to know that the server
                  > went unresponsive. The connection remains in ESTABLISHED state and
                  > the client keeps trying to send data, thinking that the connection
                  > is alive, because of which we are seeing a backlog on the client side.
                  >
                  > Our understanding of how TCP will handle the connection is as follows.
                  >
                  > Q 1) Since the server went down, the client will try to
                  > retransmit the data until it times out. What is the behavior of TCP
                  > after the timeout? We need clarification on the following things.
                  > a) Will the kernel close the established connection after the
                  > timeout? It looks like it does not in our case, as we still see the
                  > connection in ESTABLISHED state after more than 2 hours.
                  > b) Are there any kernel parameters which decide when the client
                  > times out after retransmission fails? What is the behavior of TCP
                  > after the client's retransmissions time out?
                  >
                  > Q 2) There is something called tcp_keepalive, which, if implemented in
                  > the kernel (by default it is there, and comes to around 2 hrs 2
                  > minutes, I think), makes the client send some TCP probes after the
                  > keepalive time interval; if it cannot reach the server, then the
                  > established connection on the client side will be closed by the kernel.
                  > This is my understanding. But I can see that the connection still
                  > remains in ESTABLISHED state after the tcp_keepalive time. We waited for
                  > around 2 hrs 30 minutes, but the connection remained in ESTABLISHED
                  > state. We tried reducing the keepalive time to around 10 minutes,
                  > but the connection still remains in ESTABLISHED state on the client side.
                  >
                  > Where did I go wrong? Please clarify the doubts raised above. What
                  > should we do to resolve the problem we are seeing? Any help
                  > will be highly appreciated, as we are having a hard time
                  > resolving the issue.
                  >
                  > Thanks in advance
                  

