From v13 at v13.gr Fri Oct 1 02:08:16 2010
From: v13 at v13.gr (Stefanos Harhalakis)
Date: Fri, 1 Oct 2010 12:08:16 +0300
Subject: [e2e] Early FIN/close for query-response TCP connections like DNS's
In-Reply-To: <4CA5218B.1030804@reed.com>
References: <201009302324.47717.v13@v13.gr> <4CA5218B.1030804@reed.com>
Message-ID: <201010011208.16336.v13@v13.gr>

On Friday 01 of October 2010, David P. Reed wrote:
> TCP endpoint stacks are not part of IETF standards. In particular
> "socket" calls and their meaning are not standardized.

Of course. I don't suggest that any change is required at any point in
the underlying stack; this is only a suggestion for implementations of
DNS/TCP clients.

> I suspect that there might be a problem here in the actual *stacks*
> because to "half close" a connection, one issues a "close" to the
> socket. After such a command, many operating systems may not expect to
> provide more data on the receive side.

In fact one has to use shutdown() instead of close(). From what I
understand, close() closes the socket itself, while shutdown() acts on
the connection and leaves the socket fd valid.

> I suppose you could signal a half-close without an operating system
> close by some kind of ioctl call (in Linux type OS's). In OS's like
> Symbian and DOS/Windows, etc., there may be some "control" operation
> that one can call. However it would be decidedly non-standard for the
> client to close when it expects more data.

In Linux it works fine. You only have to shutdown() the socket with
how=SHUT_WR. Here's the tcpdump of a test where a client sends 10K of
data and receives 10K of data, while the server receives 10K of data,
sleeps for 1 second and then sends 10K of data.
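As a rough illustration, the client side of such a test can be sketched in Python. Only the idea (write, shutdown(SHUT_WR), then read until EOF) comes from the discussion; the function name, framing, and defaults below are assumptions, not the original test code:

```python
import socket

def half_close_client(payload, host="127.0.0.1", port=9996):
    """Send payload, half-close our side, then read the peer's full reply."""
    s = socket.create_connection((host, port))
    try:
        s.sendall(payload)
        # shutdown(SHUT_WR) sends our FIN but keeps the receive side open;
        # close() would tear down both directions at once.
        s.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:          # peer sent its FIN: the reply is complete
                break
            chunks.append(data)
        return b"".join(chunks)
    finally:
        s.close()
```

With a blocking client like this, the FIN goes out immediately after the query, so the server can close as soon as it has written its reply instead of waiting a further round trip for the client's FIN.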
The client half-closes the connection after it finishes write()ing:

Connect/accept:
11:38:57.908427 IP 127.0.0.1.44572 > 127.0.0.1.9996: Flags [SEW], seq 1205578956, win 32792, options [mss 16396,sackOK,TS val 5097055 ecr 0,nop,wscale 7], length 0
11:38:57.908453 IP 127.0.0.1.9996 > 127.0.0.1.44572: Flags [S.E], seq 1215158356, ack 1205578957, win 32768, options [mss 16396,sackOK,TS val 5097055 ecr 5097055,nop,wscale 7], length 0
11:38:57.908467 IP 127.0.0.1.44572 > 127.0.0.1.9996: Flags [.], ack 1, win 257, options [nop,nop,TS val 5097055 ecr 5097055], length 0

Client -> Server data:
11:38:57.908520 IP 127.0.0.1.44572 > 127.0.0.1.9996: Flags [P.], seq 1:10001, ack 1, win 257, options [nop,nop,TS val 5097055 ecr 5097055], length 10000
11:38:57.908544 IP 127.0.0.1.9996 > 127.0.0.1.44572: Flags [.], ack 10001, win 386, options [nop,nop,TS val 5097055 ecr 5097055], length 0

Client's shutdown(SHUT_WR):
11:38:57.908577 IP 127.0.0.1.44572 > 127.0.0.1.9996: Flags [F.], seq 10001, ack 1, win 257, options [nop,nop,TS val 5097055 ecr 5097055], length 0
11:38:57.948106 IP 127.0.0.1.9996 > 127.0.0.1.44572: Flags [.], ack 10002, win 386, options [nop,nop,TS val 5097095 ecr 5097055], length 0

-- 1 second sleep after read() and before write() on the server side --

Server -> Client data:
11:38:58.908653 IP 127.0.0.1.9996 > 127.0.0.1.44572: Flags [P.], seq 1:10001, ack 10002, win 386, options [nop,nop,TS val 5098055 ecr 5097055], length 10000
11:38:58.908676 IP 127.0.0.1.44572 > 127.0.0.1.9996: Flags [.], ack 10001, win 386, options [nop,nop,TS val 5098055 ecr 5098055], length 0

Final close() from both sides:
11:38:58.908703 IP 127.0.0.1.9996 > 127.0.0.1.44572: Flags [F.], seq 10001, ack 10002, win 386, options [nop,nop,TS val 5098055 ecr 5098055], length 0
11:38:58.908710 IP 127.0.0.1.44572 > 127.0.0.1.9996: Flags [.], ack 10002, win 386, options [nop,nop,TS val 5098055 ecr 5098055], length 0

> All that said, this is not that big a deal - there are lots of ways to
> deal with this
> without changing the protocol. For example, the DNS *server* could
> shorten its fin-wait timeout to a very short timeout, after which it
> just drops any record of the connection.

Agreed, but is there any objection to testing/suggesting such a valid
(from the protocol's point of view) hack? I mean: do you see any
problem? If I understand this correctly, the half-closed connection will
speed up the client (assuming a blocking client) by RTT/2 or even RTT,
depending on how the FINs are exchanged.

The code of the above test is of course available.

From touch at isi.edu Tue Oct 5 11:25:03 2010
From: touch at isi.edu (Joe Touch)
Date: Tue, 05 Oct 2010 11:25:03 -0700
Subject: [e2e] Call for contribution to middlebox survey
Message-ID: <4CAB6D7F.7040002@isi.edu>

Hi, all,

The following is forwarded from the multipathtcp mailing list.

Joe (list admin)

------
From: Michio Honda
Date: October 3, 2010 2:30:57 GMT+03:00
To: Multipath TCP Mailing List , "tcpm at ietf.org"
Cc: Mark Handley
Subject: [multipathtcp] Call for contribution to middlebox survey

Hi,

We are surveying middleboxes that affect TCP in the Internet, and we'd
like you to contribute to this work by running one Python script on the
networks available to you, because we want data from as many paths as
possible. This script generates test TCP traffic to a server node and
detects various middlebox behaviors; for example, it detects how unknown
TCP options are treated and whether sequence numbers are rewritten.

- Overview of the script
It generates test TCP traffic using a raw socket or pcap. The
destinations of the test traffic are ports 80, 443 and 34343 on
vinson3.sfc.wide.ad.jp, which is located in Japan. The total amount of
test traffic is approximately 90 connections (not in parallel), and each
of them uses at most approximately 2048 bytes.

- System requirements
Our script works on Mac OS X 10.5 or 10.6, Linux (kernel 2.6) and
FreeBSD (7.0 or higher). It also requires Python 2.5 or higher, and
libpcap. NOTE:
if you try this in a virtual machine on Windows, please connect the
guest OS via a bridge rather than NAT.

How to run the experiment is described below on a per-OS basis. After
the experiment, you will find 3 log files (logxxxxxxxxx.txt) in the same
directory as the experiment. Please send them to us
(micchie at sfc.wide.ad.jp) and tell us as much about your network as you
know (e.g., the product name of the broadband router, ISP name, product
name of the firewall appliance, etc.). Also let us know if you have
reservations about disclosing this information. This experiment doesn't
collect any traffic information other than what our script generates.

***** How to run the experiment (Mac OS X) *****

1. Filtering RST TCP segments from the OS (presumably because the
kernel, having no state for the script's raw-socket connections, would
otherwise reset them)
Execute the following command as root:
  ipfw add 101 deny tcp from any to vinson3.sfc.wide.ad.jp dst-port 34343,80,443 tcpflags rst
NOTE: if you are already running ipfw, please add equivalent rules.
After the experiment, you can revert with "ipfw delete 101".

2. Executing the script
Download the script from
http://www.micchie.net/software/tcpexposure/for_distrib.tar.gz, and
decompress it anywhere you like (e.g., tar xzf for_distrib.tar.gz on the
command line). In the for_distrib directory, execute the following
command as root:
  sh run-bsd2.sh
(This will take approximately 30 min.)

***** How to run the experiment (Linux) *****

1. Filtering RST TCP segments from the OS
Execute the following command as root:
  /sbin/iptables -A OUTPUT -p tcp -d vinson3.sfc.wide.ad.jp --tcp-flags RST RST -m multiport --dports 34343,80,443 -j DROP
NOTE: if you are already running iptables, please add equivalent rules.
After the experiment, you can revert with the opposite command, using -D
instead of -A.

2. Executing the script
Download the script from
http://www.micchie.net/software/tcpexposure/for_distrib.tar.gz, and
decompress it anywhere you like (e.g., tar xzf for_distrib.tar.gz). In
the for_distrib directory, execute the following command as root:
  sh run-linux2.sh
(This will take approximately 30 min.)
***** How to run the script (FreeBSD) *****

1. Filtering RST TCP segments from the OS
If you are using neither ipfw nor pf:
Load the pf kernel module with the following command as root:
  kldload /boot/kernel/pf.ko
Add the following 2 lines to /etc/pf.conf (please replace IFNAME with
your outgoing interface name, e.g. em0):
  pass out all
  block out quick on IFNAME proto tcp to vinson3.sfc.wide.ad.jp port {34343,80,443} flags R/R
Execute the following command as root:
  pfctl -e -f /etc/pf.conf
If you are already running pf, please add equivalent rules.
After the experiment, you can revert the settings by cleaning up
/etc/pf.conf and executing "pfctl -d" as root.

If you are already using ipfw:
Please add the following rule to your ipfw configuration:
  deny tcp from any to vinson3.sfc.wide.ad.jp dst-port 34343,80,443 tcpflags rst

2. Executing the script
Download the script from
http://www.micchie.net/software/tcpexposure/for_distrib.tar.gz, and
decompress it anywhere you like (e.g., tar xzf for_distrib.tar.gz). In
the for_distrib directory, execute the following command as root:
  sh run-bsd2.sh
(This will take approximately 30 min.)

Best regards,
- Michio

From matta at cs.bu.edu Wed Oct 13 17:39:29 2010
From: matta at cs.bu.edu (Jerry Humphrey)
Date: Thu, 14 Oct 2010 07:39:29 +0700
Subject: [e2e] draw no reply
Message-ID: <236862680.32279527140706@cs.bu.edu>
From v13 at v13.gr Thu Oct 14 02:34:15 2010
From: v13 at v13.gr (Stefanos Harhalakis)
Date: Thu, 14 Oct 2010 12:34:15 +0300
Subject: [e2e] Early FIN/close for query-response TCP connections like DNS's
In-Reply-To: <201010131827.UAA11711@TR-Sys.de>
References: <201010131827.UAA11711@TR-Sys.de>
Message-ID: <201010141234.15781.v13@v13.gr>

Hello,

On Wednesday 13 of October 2010, Alfred Hönes wrote:
> Stefanos,
> regarding your question on TCP connection management for DNS, please
> see RFC 5966 (DNS Transport over TCP - Implementation Requirements).

Thanks for the pointer. I just read it. See my comments inline:

> For instance, an end-system resolver (or a DNS forwarder, e.g.
> in an access gateway) that is configured with a single recursive
> resolver, to which it will forward all queries, should preferably keep
> a TCP connection open for considerable time, ideally forever.

This should only apply when a stub resolver is involved. If there isn't
one, or if the client API contacts the DNS server directly (as on Linux
systems without nscd running), then it is not possible to use this kind
of kept-alive connection (no?). In that case the client may not benefit
from an open connection (e.g. when running "dig"), or it is not even
aware of using TCP. I believe this is the case for most implementations
that use gethostbyname() (or similar functions) without calling
sethostent(1). Since gethostbyname() will retry using TCP whenever the
UDP reply was truncated and then close the connection, the early FIN
would work there.

> Unless the server is under serious resource pressure, it should keep
> a TCP connection open for a time span that is comparable to the time
> it would keep the closed connection in TIME-WAIT state.

The proposal (which should apply to all query/reply TCP-based sessions)
only affects clients. The server side would simply benefit from clients
that close early. The decision to close the connection early is a
client-side decision, and in many cases the client already knows whether
it will keep the connection open. The server-side implementation only
needs to test the socket for reading (which I believe it already does if
it supports connection keep-alive), and if that fails it should close
the connection exactly at the point where all data has been sent.

Am I mistaken in the above rationale?
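The server-side rationale described above (a zero-byte read reveals the client's early FIN, so the server can reply and close immediately) can be sketched roughly as follows. This is a toy single-connection illustration under assumed names, not BIND's or any real server's code:

```python
import socket

def handle_query_connection(conn, make_reply):
    """Serve one query/reply exchange, honoring an early client FIN."""
    query = b""
    while True:
        data = conn.recv(4096)
        if not data:
            # Zero-byte read: the client half-closed after sending its
            # query, so we can reply and close right away instead of
            # keeping the connection around waiting for the client's FIN.
            break
        query += data
        # A real DNS/TCP server would instead parse the message framing
        # here and stop reading once a complete query has arrived.
    if query:
        conn.sendall(make_reply(query))
    conn.close()  # our FIN; the client, which closed first, holds TIME_WAIT
```

Note that nothing here is specific to half-closed peers: the same loop serves a client that keeps its write side open and closes later, which matches the observation that the proposal needs no server-side changes.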
From v13 at v13.gr Thu Oct 14 02:35:38 2010
From: v13 at v13.gr (Stefanos Harhalakis)
Date: Thu, 14 Oct 2010 12:35:38 +0300
Subject: [e2e] Early FIN/close for query-response TCP connections like DNS's
In-Reply-To: <4CA62E3C.5030208@reed.com>
References: <201009302324.47717.v13@v13.gr> <201010011208.16336.v13@v13.gr> <4CA62E3C.5030208@reed.com>
Message-ID: <201010141235.39124.v13@v13.gr>

On Friday 01 of October 2010, David P. Reed wrote:
> Stefanos - good point about shutdown. I had pretty much forgotten it
> existed in Unix stacks; I don't even know if I've had much reason to
> use it... I checked Symbian and Windows, finding that they do have
> similarly named calls that support declaring the end of writing and
> sending a FIN.

Thanks for testing that!

From fernando at gont.com.ar Thu Oct 14 23:51:17 2010
From: fernando at gont.com.ar (Fernando Gont)
Date: Fri, 15 Oct 2010 03:51:17 -0300
Subject: [e2e] Early FIN/close for query-response TCP connections like DNS's
In-Reply-To: <201009302324.47717.v13@v13.gr>
References: <201009302324.47717.v13@v13.gr>
Message-ID: <4CB7F9E5.10706@gont.com.ar>

On 30/09/2010 05:24 p.m., Stefanos Harhalakis wrote:
> From what I understand, the FINs are problematic for this kind of
> traffic, so I was wondering: why not transform this to:
>
> * Client connects
> * Client queries
> * Client closes
> * Server responds
> * Server closes
>
> IOW: Since we know that this connection will have a Query and a
> Response, why not half-close the connection after the query is sent?
> This requires no modifications to TCP's behavior and will save at
> least RTT/2 because the server side will not wait for a FIN before
> closing. (no?)

Are you aiming at reducing state on servers?

If that's the case, one might argue that you still need to handle this
(and any other) sequence gracefully... or else expect that you might be
the subject of DoS attacks....
Thanks,
--
Fernando Gont
e-mail: fernando at gont.com.ar || fgont at acm.org
PGP Fingerprint: 7809 84F5 322E 45C7 F1C9 3945 96EE A9EF D076 FFF1

From v13 at v13.gr Fri Oct 29 06:06:42 2010
From: v13 at v13.gr (Stefanos Harhalakis)
Date: Fri, 29 Oct 2010 16:06:42 +0300
Subject: [e2e] Early FIN/close for query-response TCP connections like DNS's
In-Reply-To: <4CB7F9E5.10706@gont.com.ar>
References: <201009302324.47717.v13@v13.gr> <4CB7F9E5.10706@gont.com.ar>
Message-ID: <201010291606.42992.v13@v13.gr>

Hello,

On Friday 15 of October 2010, Fernando Gont wrote:
> On 30/09/2010 05:24 p.m., Stefanos Harhalakis wrote:
> > IOW: Since we know that this connection will have a Query and a
> > Response, why not half-close the connection after the query is sent?
> > This requires no modifications to TCP's behavior and will save at
> > least RTT/2 because the server side will not wait for a FIN before
> > closing. (no?)
>
> Are you aiming at reducing state on servers?
>
> If that's the case, one might argue that you still need to handle this
> (and any other) sequence gracefully... or else expect that you might
> be the subject of DoS attacks....

While reading old mails on end2end-interest, I was left with the
impression that there is concern about the TIME_WAIT state on DNS
servers and the port allocation period. With that in mind, I came up
with the above proposal.

From what I understand, with my proposal the DNS client would be the
first to close the connection, so it will have to wait in TIME_WAIT and
the server will not. In contrast, in the current situation, if a server
supports persistent connections it has to wait for the client to close
the connection first, meaning that the DNS server will have to sit in
TIME_WAIT.

Early half-closing of a connection is not a violation of TCP and does
not affect the reliability of the connection. The connections are still
closed gracefully, and there is no possibility of data loss or DoS
attacks.
Furthermore, the proposal does not require any modifications at all to
the server's implementation, and could be implemented with very few
changes in clients.

I've tested the proposal against BIND using a custom-made client, and:
a) It works very well, meaning that BIND's implementation is able to
handle half-closed connections without a problem.
b) It seems that BIND does not support persistent TCP connections, so
they cannot be used for tests right now.

Of course I can share the code if you like. If you can give me any hints
on how to test this further, I'll be grateful.

From v13 at v13.gr Fri Oct 29 06:20:22 2010
From: v13 at v13.gr (Stefanos Harhalakis)
Date: Fri, 29 Oct 2010 16:20:22 +0300
Subject: [e2e] Early FIN/close for query-response TCP connections like DNS's
In-Reply-To: <201010141937.VAA14263@TR-Sys.de>
References: <201010141937.VAA14263@TR-Sys.de>
Message-ID: <201010291620.22707.v13@v13.gr>

Hello,

On Thursday 14 of October 2010, Alfred Hönes wrote:
> On Thu Oct 14 02:34:15 PDT 2010, Stefanos Harhalakis wrote:
> > On Wednesday 13 of October 2010, Alfred Hönes wrote:
> >> For instance, an end-system resolver (or a DNS forwarder, e.g. in
> >> an access gateway) that is configured with a single recursive
> >> resolver, to which it will forward all queries, should preferably
> >> keep a TCP connection open for considerable time, ideally forever.
> >
> > This should only apply whenever a stub resolver is involved. If it
> > isn't, or the client API directly contacts the DNS server (like on
> > Linux systems without nscd running), then it is not possible to use
> > this kind of kept-alive connection (no?).
>
> Most clients -- be they application-based or host-based -- will likely
> talk to a *single* recursive resolver all the time (the one they have
> been configured with via DHCP or PPP[oE], or via static
> configuration), and not to the different authoritative resolvers
> directly; debugging tools like dig are an exception, of course.
Indeed, but the persistent connection may only be used when there is a
stub resolver running on the client side. (For example, you cannot share
one TCP connection to the DNS server between two clients like Firefox
and IE.) This is the host-based client you're referring to.

> > In that case the client may not benefit from an open connection
> > (e.g. when running "dig") or it is not aware of using TCP. I believe
> > that this is the case of most implementations that use
> > gethostbyname() (or similar) without using sethostent(1). Since
> > gethostbyname() will retry using TCP whenever the UDP query was
> > truncated and then close the connection, the early FIN would work
> > there.
>
> Those are per-platform implementation details.

AFAIK, all DNS resolver implementations out there will retry with TCP
whenever they receive a truncated reply. (No?) I was referring to that,
which is not platform-specific; gethostbyname() was just an example.

> > The server-side implementation only needs to test the socket for
> > reading ...
>
> ... and exceptions! -- receipt of a FIN should perhaps be acted upon
> quickly! (Of course, the details depend on the API.)

AFAIK a FIN is not considered an exception. I believe that only URG data
is considered an exception; at least that's what I've seen when using
sockets on a couple of Unix OSs, and what is mentioned in various
TCP-related mailing lists.

> No, but as you have already observed, it depends on the circumstances
> what behavior would be regarded as "optimal" (in some sense).
> A per-host stub resolver, a browser's DNS helper, a small company
> recursive server, a big ISP recursive resolver with many anycast
> replicas, a DNSSEC-validating resolver, a company's authoritative
> server, a large registry's server, ... -- they all will have different
> views of what is "optimal".

Exactly. No argument there!

> That's why details in DNS client behavior like what you are
> considering are not standardized.

Exactly^2.
That's why I mentioned this as a proposal/guideline which may be
followed only under certain circumstances, and only by willing clients.
IOW, when it makes sense.

> Note that the RTT of a truncated DNS response that then causes
> the setup of a TCP connection also introduces delay. The overhead
> amortization over multiple queries is a real advantage -- think of
> an application using service discovery: pick up NAPTRs, then SRVs,
> then address RRs -- all likely served by the same authority; if
> not in cache, there will be bursts of lookups.

Yeap. That's why I was referring only to single-query connections. Most
of the time, the client already knows whether there is just one query or
not (e.g. when using gethostbyname() there is always only one query per
TCP connection).

> If you are interested in extreme optimization for transactional
> usage of TCP, please wait for the Experimental RFC(-to-be) 6013,
> "TCP Cookie Transactions", to be published soon (maybe next week)
> on the Independent Submission stream or look at the draft for it,
> draft-simpson-tcpct-03 !

I will. Thanks!