From ingemar.s.johansson at ericsson.com  Thu Mar  3 05:14:24 2016
From: ingemar.s.johansson at ericsson.com (Ingemar Johansson S)
Date: Thu, 3 Mar 2016 13:14:24 +0000
Subject: [e2e] Question on RFC6298,
 Managing the RTO Timer and additional lost packets in Recovery state
Message-ID: <81564C0D7D4D2A4B9A86C8C7404A13DA43D85BDD@ESESSMB205.ericsson.se>

Hi

I am trying to understand the Linux TCP stack and how it rearms the RTO timer; in the process, I read Section 5 of RFC 6298.
It says: "(5.3) When an ACK is received that acknowledges new data, restart the retransmission timer so that it will expire after RTO seconds (for the current value of RTO)."
What is the definition of "new data"? The strict interpretation is that SND.UNA advances, but it could also mean that the highest SACKed sequence number increases. With the former, an RTO is more likely to happen.

The second question is Linux-related. Given that a lost packet puts the stack in the Recovery state, the congestion window is reduced one step as a result. What happens if additional packets are lost while in the Recovery state? I guess the congestion window should decrease further, or should it?

I am currently porting the Linux TCP stack to fit our Java-based system simulator. Most things seem to be working, but the above leaves me wondering.

/Ingemar

=================================
Ingemar Johansson  M.Sc.
Master Researcher

Ericsson AB
Wireless Access Networks
Laboratoriegränd 11
971 28, Luleå, Sweden
Phone +46-1071 43042
SMS/MMS +46-73 078 3289
ingemar.s.johansson at ericsson.com
www.ericsson.com

"We can be heroes,
  just for one day"
    David Bowie
=================================


From mallman at icir.org  Thu Mar  3 06:31:31 2016
From: mallman at icir.org (Mark Allman)
Date: Thu, 03 Mar 2016 09:31:31 -0500
Subject: [e2e] [tcpm] Question on RFC6298,
	Managing the RTO Timer and additional lost packets in Recovery state
In-Reply-To: <81564C0D7D4D2A4B9A86C8C7404A13DA43D85BDD@ESESSMB205.ericsson.se>
Message-ID: <40707.1457015491@lawyers.icir.org>


> It says: "(5.3) When an ACK is received that acknowledges new
> data, restart the retransmission timer so that it will expire after
> RTO seconds (for the current value of RTO)."
> 
> What is the definition of "new data"? The strict interpretation is
> that SND.UNA advances, but it could also mean that the highest SACKed
> sequence number increases. With the former, an RTO is more likely to
> happen.

Seems like something we should have nailed down in the spec at some
point after SACK became widely prevalent.  Alas.

I think "new data" can be interpreted as "cumulative ACK advances".

The spirit of (5.3) is that as long as the connection is making
progress---from an application perspective---we can keep the RTO at
arm's length and so we just keep re-arming it.  But, once we have a
stall---or even an indication that we might stall---because a packet
has been lost then we stop pushing the RTO off.
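A minimal sketch of rules (5.1) to (5.3) under Mark's reading, assuming "new data" means the cumulative ACK (SND.UNA) advances. The class name `RtoTimer` and the `sack_rearms` switch for Ingemar's looser, SACK-based reading are hypothetical illustrations, not code from any real stack.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RtoTimer:
    """RFC 6298 rules (5.1)-(5.3), reading "new data" as "SND.UNA advances".

    sack_rearms is a hypothetical switch for the looser reading, in which an
    increase of the highest SACKed sequence number also restarts the timer.
    """
    rto: float                      # current RTO in seconds
    sack_rearms: bool = False
    snd_una: int = 0                # oldest unacknowledged sequence number
    snd_nxt: int = 0                # next sequence number to be sent
    highest_sacked: int = 0         # highest SACKed sequence number seen
    expiry: Optional[float] = None  # absolute deadline; None means stopped

    def on_send(self, now: float, nbytes: int) -> None:
        self.snd_nxt += nbytes
        if self.expiry is None:     # (5.1) start the timer if not running
            self.expiry = now + self.rto

    def on_ack(self, now: float, cum_ack: int, sack_high: int = 0) -> None:
        new_cum = cum_ack > self.snd_una
        new_sack = sack_high > self.highest_sacked
        self.snd_una = max(self.snd_una, cum_ack)
        self.highest_sacked = max(self.highest_sacked, sack_high)
        if self.snd_una >= self.snd_nxt:
            self.expiry = None      # (5.2) everything acked: stop the timer
        elif new_cum or (self.sack_rearms and new_sack):
            self.expiry = now + self.rto  # (5.3) restart: RTO from now
        # Otherwise the old deadline stands, so a stall reaches the RTO sooner.
```

With `rto=1.0` and 10000 bytes sent at t=0, an ACK at t=0.5 that only SACKs data above a hole leaves `expiry` at 1.0 under the strict reading, but moves it to 1.5 when `sack_rearms=True`. This is the difference Ingemar points out: the strict reading makes an RTO more likely.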

> The second question is Linux-related. Given that a lost packet
> puts the stack in the Recovery state, the congestion window is reduced
> one step as a result. What happens if additional packets
> are lost while in the Recovery state? I guess the congestion window
> should decrease further, or should it?

First, this is a more generic answer; I have no idea what Linux
does.

I can't tell which of two cases you are talking about here.  Let's
say you send 20 packets into the network in some window.  Now, the
cases ...

(1) We lose packets 1, 5, 13 and 17.  I.e., multiple packets are
    lost from a single transmission window.  So, retransmitting
    packet 1 puts us in recovery and causes congestion control
    action.  I believe that the fact that packets 5, 13 and 17 are
    also lost does not mean we should react to congestion again.
    E.g., RFC 6675 calls for a single CC response regardless of how
    many packets are lost from a window of data.

(2) We lose packets 1, 5, 13 and 17 and also the retransmit of
    packet 17.  So, we lose 4 packets from the first single
    transmission window.  This triggers one CC response.  But, the
    retransmit of packet 17 is from a subsequent transmission
    window, indicating that perhaps we haven't yet done enough to
    relieve the congestion.  Conservativeness would likely suggest
    that in this case, yes, we should take another CC action.

    And, e.g., RFC 6675 forces this second CC action by being unable
    to cope with lost retransmissions.  Rather, in this case we fall
    back to the RTO, which means another CC response.  I am not
    claiming RFC 6675 is the right approach here.  Just noting what
    some spec does.  We left it this way because we didn't feel that
    the complexity of dealing with this case was really generally
    worth it.  But, one could envision a different algorithm making
    a different choice.
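The two cases can be sketched as follows. This is a minimal illustration under simplifying assumptions: packet numbers 1 to 20 stand in for byte sequence numbers, cwnd is counted in packets and halved (rather than using FlightSize/2), and the class and method names are hypothetical. Losses at or below the recovery point, meaning the highest packet sent when recovery began, share a single response, as in RFC 6675. A lost retransmission is only caught by the RTO, which counts as a second response.

```python
from dataclasses import dataclass


@dataclass
class LossResponder:
    """Sketch of "one congestion response per window of data" (RFC 6675 style).

    Packet numbers stand in for byte sequence numbers, and cwnd/ssthresh are
    in packets; halving cwnd instead of FlightSize is a simplification.
    """
    cwnd: float
    ssthresh: float = float("inf")
    highest_sent: int = 0       # highest packet number sent so far
    recovery_point: int = 0     # highest_sent when recovery was entered
    in_recovery: bool = False
    cc_responses: int = 0       # number of congestion-control reactions

    def on_send(self, count: int) -> None:
        """Send `count` new packets (retransmissions do not call this)."""
        self.highest_sent += count

    def on_loss(self, pkt: int) -> None:
        """An original transmission of `pkt` was deemed lost via SACK."""
        if self.in_recovery and pkt <= self.recovery_point:
            return              # same window of data: no further reaction
        self.ssthresh = max(self.cwnd / 2, 2.0)
        self.cwnd = self.ssthresh
        self.recovery_point = self.highest_sent
        self.in_recovery = True
        self.cc_responses += 1

    def on_cumulative_ack(self, ack: int) -> None:
        """`ack` is the highest packet number cumulatively acknowledged."""
        if self.in_recovery and ack >= self.recovery_point:
            self.in_recovery = False

    def on_rto(self) -> None:
        """RTO fired, e.g. because a retransmission was itself lost."""
        self.ssthresh = max(self.cwnd / 2, 2.0)
        self.cwnd = 1.0         # loss window of one segment (RFC 5681)
        self.in_recovery = False
        self.cc_responses += 1


# Mark's example: 20 packets in flight, packets 1, 5, 13 and 17 lost.
r = LossResponder(cwnd=20.0)
r.on_send(20)
for pkt in (1, 5, 13, 17):
    r.on_loss(pkt)
print(r.cc_responses, r.cwnd)   # case (1): 1 10.0
r.on_rto()                      # case (2): the retransmission of 17 is lost
print(r.cc_responses, r.cwnd)   # 2 1.0
```

Keying the decision on the recovery point, and not on the number of losses, is what keeps case (1) at a single reduction.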

I hope that helps!

allman


--
http://www.icir.org/mallman/


From michawe at ifi.uio.no  Thu Mar  3 06:54:02 2016
From: michawe at ifi.uio.no (Michael Welzl)
Date: Thu, 3 Mar 2016 15:54:02 +0100
Subject: [e2e] [tcpm] Question on RFC6298,
	Managing the RTO Timer and additional lost packets in Recovery state
In-Reply-To: <40707.1457015491@lawyers.icir.org>
References: <40707.1457015491@lawyers.icir.org>
Message-ID: <58556ABF-D5CA-4C4F-8679-74B29C4E06E4@ifi.uio.no>


> On 03 Mar 2016, at 15:31, Mark Allman <mallman at icir.org> wrote:
> 
> 
>> It says: "(5.3) When an ACK is received that acknowledges new
>> data, restart the retransmission timer so that it will expire after
>> RTO seconds (for the current value of RTO)."
>> 
>> What is the definition of "new data"? The strict interpretation is
>> that SND.UNA advances, but it could also mean that the highest SACKed
>> sequence number increases. With the former, an RTO is more likely to
>> happen.
> 
> Seems like something we should have nailed down in the spec at some
> point after SACK became widely prevalent.  Alas.
> 
> I think "new data" can be interpreted as "cumulative ACK advances".

This "new data" stuff is also all over RFC 5681. I remember being confused by it long ago, with exactly the same two possible interpretations that Ingemar describes below.
Back then I assumed that it must be me; I'm glad to see that I'm not the only one who got confused by this.

But maybe, in the occurrences in RFC 5681, it really was only me? Some of them are definitely clear, but maybe not all.

Cheers,
Michael


From michawe at ifi.uio.no  Thu Mar  3 06:55:09 2016
From: michawe at ifi.uio.no (Michael Welzl)
Date: Thu, 3 Mar 2016 15:55:09 +0100
Subject: [e2e] [tcpm] Question on RFC6298,
	Managing the RTO Timer and additional lost packets in Recovery state
In-Reply-To: <58556ABF-D5CA-4C4F-8679-74B29C4E06E4@ifi.uio.no>
References: <40707.1457015491@lawyers.icir.org>
	<58556ABF-D5CA-4C4F-8679-74B29C4E06E4@ifi.uio.no>
Message-ID: <912B8EF4-553F-486C-A4F9-28F8DDF67DE0@ifi.uio.no>


> On 03 Mar 2016, at 15:54, Michael Welzl <michawe at ifi.uio.no> wrote:
> 
> 
>> On 03 Mar 2016, at 15:31, Mark Allman <mallman at icir.org> wrote:
>> 
>> 
>>> It says: "(5.3) When an ACK is received that acknowledges new
>>> data, restart the retransmission timer so that it will expire after
>>> RTO seconds (for the current value of RTO)."
>>> 
>>> What is the definition of "new data"? The strict interpretation is
>>> that SND.UNA advances, but it could also mean that the highest SACKed
>>> sequence number increases. With the former, an RTO is more likely to
>>> happen.
>> 
>> Seems like something we should have nailed down in the spec at some
>> point after SACK became widely prevalent.  Alas.
>> 
>> I think "new data" can be interpreted as "cumulative ACK advances".
> 
> This "new data" stuff is also all over RFC 5681. I remember being confused by it long ago, with exactly the same two possible interpretations that Ingemar describes below.
> Back then I assumed that it must be me; I'm glad to see that I'm not the only one who got confused by this.

... and what can you expect from a guy who can't even get "above" vs. "below" right?

Sigh... sorry folks

Michael


From ingemar.s.johansson at ericsson.com  Sat Mar  5 07:18:31 2016
From: ingemar.s.johansson at ericsson.com (Ingemar Johansson S)
Date: Sat, 5 Mar 2016 15:18:31 +0000
Subject: [e2e] [tcpm] Question on RFC6298,
	Managing the RTO Timer and additional lost packets in Recovery state
In-Reply-To: <40707.1457015491@lawyers.icir.org>
References: <81564C0D7D4D2A4B9A86C8C7404A13DA43D85BDD@ESESSMB205.ericsson.se>
	<40707.1457015491@lawyers.icir.org>
Message-ID: <81564C0D7D4D2A4B9A86C8C7404A13DA43D9523A@ESESSMB205.ericsson.se>

Hi

Thanks for the response, and thanks to Michael as well. I guess I need to read RFC 5681 and RFC 6675 again.

Seeing the reasoning from an application perspective actually helps me put the puzzle together.
Also, I now understand that your case 2 necessitates an RTO, at least with TCP. I guess QUIC may be different in this respect, as its retransmitted segments get a new transport sequence number?
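Ingemar's guess can be illustrated with a hypothetical sender in which every transmission, retransmissions included, carries a fresh, monotonically increasing packet number. A packet is declared lost once a packet sent at least `threshold` numbers later has been acknowledged. The threshold of 3 mirrors TCP's duplicate-ACK threshold and is an assumption here. This is not taken from any QUIC implementation.

```python
from typing import Dict, List


class PacketNumberSender:
    """Every transmission, retransmissions included, gets a fresh packet number."""

    def __init__(self, threshold: int = 3) -> None:
        self.next_pn = 1
        self.threshold = threshold           # assumed reordering threshold
        self.largest_acked = 0
        self.in_flight: Dict[int, str] = {}  # packet number -> data carried

    def send(self, data: str) -> int:
        pn = self.next_pn
        self.next_pn += 1
        self.in_flight[pn] = data
        return pn

    def on_ack(self, pn: int) -> List[str]:
        """Process an ACK of packet `pn`; return the data now deemed lost."""
        self.in_flight.pop(pn, None)
        self.largest_acked = max(self.largest_acked, pn)
        lost = [p for p in self.in_flight
                if p + self.threshold <= self.largest_acked]
        return [self.in_flight.pop(p) for p in lost]


s = PacketNumberSender()
for data in ("a", "b", "c", "d"):
    s.send(data)                 # packet numbers 1-4; packet 1 is lost
print([s.on_ack(pn) for pn in (2, 3, 4)])   # [[], [], ['a']]
s.send("a")                      # retransmission of "a" as packet 5, also lost
for data in ("e", "f", "g"):
    s.send(data)                 # packet numbers 6-8
print([s.on_ack(pn) for pn in (6, 7, 8)])   # [[], [], ['a']]
```

The second loss of "a" is detected from later ACKs without waiting for an RTO. A TCP sender following RFC 6675 reuses the same sequence numbers for a retransmission, so SACK information cannot tell the two transmissions apart.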

/Ingemar



From ingemar.s.johansson at ericsson.com  Thu Mar 10 05:22:04 2016
From: ingemar.s.johansson at ericsson.com (Ingemar Johansson S)
Date: Thu, 10 Mar 2016 13:22:04 +0000
Subject: [e2e] [tcpm] Question on RFC6298,
 Managing the RTO Timer and additional lost packets in Recovery state
In-Reply-To: <CAK6E8=dioCs5Czxij6LCuyBPRMW1Pg_aP3+5wxQxoL+DZk_NXA@mail.gmail.com>
References: <81564C0D7D4D2A4B9A86C8C7404A13DA43D85BDD@ESESSMB205.ericsson.se>
	<40707.1457015491@lawyers.icir.org>
	<81564C0D7D4D2A4B9A86C8C7404A13DA43D9523A@ESESSMB205.ericsson.se>
	<CAK6E8=dioCs5Czxij6LCuyBPRMW1Pg_aP3+5wxQxoL+DZk_NXA@mail.gmail.com>
Message-ID: <81564C0D7D4D2A4B9A86C8C7404A13DA43D9C104@ESESSMB205.ericsson.se>

Hi Yuchung, thanks for the help

I seem to have gotten the RFC 6937 (PRR) behavior in place. Currently I don't see much gain with PRR. One possible reason is that AQMs in LTE are typically a bit on the "bufferbloated" side, because drop thresholds that are too low easily cause links to be underutilized. The effect of this is that when a loss event occurs, there is still enough data in the RLC queue to keep transmitting even though the congestion window is cut in half immediately. There could still be benefits with PRR in terms of fewer RTOs.
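For reference, the per-ACK PRR computation can be sketched from the RFC 6937 pseudocode as below, with all quantities in bytes. The choice between the conservative bound (PRR-CRB) and the slow-start bound (PRR-SSRB) follows Yuchung's description of Linux in the quoted reply, which says it slow starts toward ssthresh only if no additional packets are lost. The `newly_lost` flag is a hypothetical input. This is a sketch, not Linux's code.

```python
from dataclasses import dataclass


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


@dataclass
class Prr:
    """RFC 6937 Proportional Rate Reduction; all quantities in bytes."""
    ssthresh: int           # target set by congestion control on entering recovery
    recover_fs: int         # RecoverFS: snd.nxt - snd.una at the start of recovery
    mss: int
    prr_delivered: int = 0  # bytes delivered to the receiver since recovery began
    prr_out: int = 0        # bytes sent since recovery began

    def on_ack(self, delivered: int, pipe: int, newly_lost: bool) -> int:
        """Return sndcnt, the number of bytes this ACK allows to be sent.

        delivered:  DeliveredData (change in snd.una plus newly SACKed bytes)
        pipe:       RFC 6675 estimate of bytes in flight
        newly_lost: hypothetical flag, True if this ACK revealed more losses
        """
        self.prr_delivered += delivered
        if pipe > self.ssthresh:
            # Proportional part: spread the cwnd reduction over about one RTT.
            sndcnt = ceil_div(self.prr_delivered * self.ssthresh,
                              self.recover_fs) - self.prr_out
        else:
            if newly_lost:
                # PRR-CRB: send no more than has been delivered.
                limit = self.prr_delivered - self.prr_out
            else:
                # PRR-SSRB: slow-start-like catch-up toward ssthresh.
                limit = max(self.prr_delivered - self.prr_out, delivered) + self.mss
            sndcnt = min(self.ssthresh - pipe, limit)
        return max(sndcnt, 0)   # a negative value just means "send nothing"

    def on_transmit(self, nbytes: int) -> None:
        self.prr_out += nbytes
```

With ssthresh = 10000, RecoverFS = 20000 and MSS = 1000, the first ACK delivering 1000 bytes while pipe = 18000 permits sndcnt = 500. That is one byte sent per two bytes delivered, so the amount in flight drifts down toward ssthresh instead of being halved at once. With a deep RLC queue, as described above, that smoother reduction may indeed make little visible difference.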

I believe, then, that I am getting closer to a good Linux TCP model in our simulator.
I still have one particular behavior that I don't really understand; I am almost 100% sure that the error is on my side.

Thanks for the reference to the SIGCOMM paper; I seem to have missed it.

/Ingemar

> -----Original Message-----
> From: Yuchung Cheng [mailto:ycheng at google.com]
> Sent: 5 March 2016 16:53
> To: Ingemar Johansson S
> Cc: mallman at icir.org; Michael Welzl; tcpm at ietf.org;
> end2end-interest at postel.org
> Subject: Re: [tcpm] Question on RFC6298, Managing the RTO Timer and
> additional lost packets in Recovery state
> 
> Linux implements RFC 6937, not RFC 6675, to adjust cwnd in fast recovery.
> Specifically, it reduces cwnd gradually toward ssthresh as packets are being
> delivered. If inflight (aka pipe) drops below ssthresh, it tries to slow start
> toward ssthresh, provided no additional packets are lost. The last condition
> was added recently, and I gave a presentation on it at the last meeting:
> https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-7.pdf
> 
> BTW, Linux may adjust the RTO by taking RTT samples from newly SACKed blocks,
> which is not standardized. It mitigates issues when the RTT continues to rise
> during recovery in LTE networks (see Figure 15 in
> http://web.eecs.umich.edu/~zmao/Papers/lte_sigcomm13.pdf)
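The RTO computation such samples feed into is the one in RFC 6298 section 2, sketched below. The helper `sack_rtt_sample` is hypothetical. It skips retransmitted segments (Karn's algorithm) and uses the oldest send time among the newly SACKed segments, which gives the longest and therefore most conservative sample. That choice is an assumption, not a description of Linux. `min_rto` defaults to the 1-second floor of RFC 6298 (2.4); Linux's default floor is 200 ms.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass
class RttEstimator:
    """RFC 6298 section 2 RTO computation; times in seconds."""
    alpha: float = 1 / 8
    beta: float = 1 / 4
    k: int = 4
    granularity: float = 0.001   # G: assumed clock granularity
    min_rto: float = 1.0         # RFC 6298 (2.4) floor
    srtt: Optional[float] = None
    rttvar: float = 0.0
    rto: float = 1.0             # (2.1) initial RTO

    def sample(self, r: float) -> None:
        if self.srtt is None:    # (2.2) first measurement
            self.srtt, self.rttvar = r, r / 2
        else:                    # (2.3) RTTVAR is updated with the old SRTT
            self.rttvar = ((1 - self.beta) * self.rttvar
                           + self.beta * abs(self.srtt - r))
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * r
        self.rto = max(self.min_rto,
                       self.srtt + max(self.granularity, self.k * self.rttvar))


def sack_rtt_sample(now: float, newly_sacked: Iterable[int],
                    sent_at: Dict[int, float],
                    retransmitted: Set[int]) -> Optional[float]:
    """Hypothetical helper: RTT sample from newly SACKed segments.

    Retransmitted segments are skipped (Karn's algorithm), and the oldest send
    time is used, giving the longest, most conservative sample.
    """
    times = [sent_at[seg] for seg in newly_sacked if seg not in retransmitted]
    return now - min(times) if times else None
```

A caller would pass each non-`None` result to `RttEstimator.sample`. While the cumulative ACK is stuck behind a hole, these samples still let SRTT, and with it the RTO, follow an RTT that keeps rising during recovery.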
> 
> > _______________________________________________
> > tcpm mailing list
> > tcpm at ietf.org
> > https://www.ietf.org/mailman/listinfo/tcpm