From v13 at v13.gr Fri Oct 1 02:08:16 2010
From: v13 at v13.gr (Stefanos Harhalakis)
Date: Fri, 1 Oct 2010 12:08:16 +0300
Subject: [e2e] Early FIN/close for query-response TCP connections like DNS's
In-Reply-To: <4CA5218B.1030804@reed.com>
References: <201009302324.47717.v13@v13.gr> <4CA5218B.1030804@reed.com>
Message-ID: <201010011208.16336.v13@v13.gr>

On Friday 01 of October 2010, David P. Reed wrote:
> TCP endpoint stacks are not part of IETF standards. In particular
> "socket" calls and their meaning are not standardized.

Of course. I don't suggest that any change is required at any point in
the underlying stack; this is only a suggestion for implementations of
DNS/TCP clients.

> I suspect that there might be a problem here in the actual *stacks*
> because to "half close" a connection, one issues a "close" to the
> socket. After such a command, many operating systems may not expect to
> provide more data on the receive side.

In fact one has to use shutdown() instead of close(). From what I
understand, close() closes the socket itself, while shutdown() acts on
the connection and leaves the socket fd valid.

> I suppose you could signal a half-close without an operating system
> close by some kind of ioctl call (in Linux type OS's). In OS's like
> Symbian and DOS/Windows, etc., there may be some "control" operation
> that one can call. However it would be decidedly non-standard for the
> client to close when it expects more data.

In Linux it works fine. You only have to shutdown() the socket with
how=SHUT_WR. Here's the tcpdump of a test where a client sends 10K of
data and receives 10K of data, while the server receives 10K of data,
sleeps for 1 second and then sends 10K of data.
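As a rough illustration, the client side of such a test can be sketched in Python. Only the idea (write, shutdown(SHUT_WR), then read until EOF) comes from the discussion; the function name, framing, and defaults below are assumptions, not the original test code:

```python
import socket

def half_close_client(payload, host="127.0.0.1", port=9996):
    """Send payload, half-close our side, then read the peer's full reply."""
    s = socket.create_connection((host, port))
    try:
        s.sendall(payload)
        # shutdown(SHUT_WR) sends our FIN but keeps the receive side open;
        # close() would tear down both directions at once.
        s.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:          # peer sent its FIN: the reply is complete
                break
            chunks.append(data)
        return b"".join(chunks)
    finally:
        s.close()
```

With a blocking client like this, the FIN goes out immediately after the query, so the server can close as soon as it has written its reply instead of waiting a further round trip for the client's FIN.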
The client half-closes the connection after it finishes write()ing:

Connect/accept:
11:38:57.908427 IP 127.0.0.1.44572 > 127.0.0.1.9996: Flags [SEW], seq 1205578956, win 32792, options [mss 16396,sackOK,TS val 5097055 ecr 0,nop,wscale 7], length 0
11:38:57.908453 IP 127.0.0.1.9996 > 127.0.0.1.44572: Flags [S.E], seq 1215158356, ack 1205578957, win 32768, options [mss 16396,sackOK,TS val 5097055 ecr 5097055,nop,wscale 7], length 0
11:38:57.908467 IP 127.0.0.1.44572 > 127.0.0.1.9996: Flags [.], ack 1, win 257, options [nop,nop,TS val 5097055 ecr 5097055], length 0

Client -> Server data:
11:38:57.908520 IP 127.0.0.1.44572 > 127.0.0.1.9996: Flags [P.], seq 1:10001, ack 1, win 257, options [nop,nop,TS val 5097055 ecr 5097055], length 10000
11:38:57.908544 IP 127.0.0.1.9996 > 127.0.0.1.44572: Flags [.], ack 10001, win 386, options [nop,nop,TS val 5097055 ecr 5097055], length 0

Client's shutdown(SHUT_WR):
11:38:57.908577 IP 127.0.0.1.44572 > 127.0.0.1.9996: Flags [F.], seq 10001, ack 1, win 257, options [nop,nop,TS val 5097055 ecr 5097055], length 0
11:38:57.948106 IP 127.0.0.1.9996 > 127.0.0.1.44572: Flags [.], ack 10002, win 386, options [nop,nop,TS val 5097095 ecr 5097055], length 0

-- 1 second sleep after read() and before write() on the server side --

Server -> Client data:
11:38:58.908653 IP 127.0.0.1.9996 > 127.0.0.1.44572: Flags [P.], seq 1:10001, ack 10002, win 386, options [nop,nop,TS val 5098055 ecr 5097055], length 10000
11:38:58.908676 IP 127.0.0.1.44572 > 127.0.0.1.9996: Flags [.], ack 10001, win 386, options [nop,nop,TS val 5098055 ecr 5098055], length 0

Final close() from both sides:
11:38:58.908703 IP 127.0.0.1.9996 > 127.0.0.1.44572: Flags [F.], seq 10001, ack 10002, win 386, options [nop,nop,TS val 5098055 ecr 5098055], length 0
11:38:58.908710 IP 127.0.0.1.44572 > 127.0.0.1.9996: Flags [.], ack 10002, win 386, options [nop,nop,TS val 5098055 ecr 5098055], length 0

> All that said, this is not that big a deal - there are lots of ways to
> deal with this
> without changing the protocol. For example, the DNS *server* could
> shorten its fin-wait timeout to a very short timeout, after which it
> just drops any record of the connection.

Agreed, but is there any objection to testing/suggesting such a valid
(from the protocol's point of view) hack? I mean: do you see any
problem? If I understand this correctly, the half-closed connection will
speed up the client (assuming a blocking client) by RTT/2 or even RTT,
depending on how the FINs are exchanged.

The code of the above test is of course available.

From touch at isi.edu Tue Oct 5 11:25:03 2010
From: touch at isi.edu (Joe Touch)
Date: Tue, 05 Oct 2010 11:25:03 -0700
Subject: [e2e] Call for contribution to middlebox survey
Message-ID: <4CAB6D7F.7040002@isi.edu>

Hi, all,

The following is forwarded from the multipathtcp mailing list.

Joe (list admin)

------
From: Michio Honda
Date: October 3, 2010 2:30:57 GMT+03:00
To: Multipath TCP Mailing List , "tcpm at ietf.org"
Cc: Mark Handley
Subject: [multipathtcp] Call for contribution to middlebox survey

Hi,

We are surveying middleboxes that affect TCP in the Internet, and we'd
like you to contribute to this work by running one Python script on the
networks available to you, because we want data from as many paths as
possible. This script generates test TCP traffic to a server node and
detects various middlebox behaviors; for example, it detects how unknown
TCP options are treated and whether sequence numbers are rewritten.

- Overview of the script
It generates test TCP traffic using a raw socket or pcap. The
destinations of the test traffic are ports 80, 443 and 34343 on
vinson3.sfc.wide.ad.jp, which is located in Japan. The total amount of
test traffic is approximately 90 connections (not in parallel), and each
of them uses at most approximately 2048 bytes.

- System requirements
Our script works on Mac OS X 10.5 or 10.6, Linux (kernel 2.6) and
FreeBSD (7.0 or higher). It also requires Python 2.5 or higher, and
libpcap. NOTE:
if you try this in a virtual machine on Windows, please connect the
guest OS via a bridge rather than NAT.

How to run the experiment is described below on a per-OS basis. After
the experiment, you will find 3 log files (logxxxxxxxxx.txt) in the same
directory as the experiment. Please send them to us
(micchie at sfc.wide.ad.jp) and tell us as much about your network as you
know (e.g., the product name of the broadband router, ISP name, product
name of the firewall appliance, etc.). Also let us know if you have
reservations about disclosing this information. This experiment doesn't
collect any traffic information other than what our script generates.

***** How to run the experiment (Mac OS X) *****

1. Filtering RST TCP segments from the OS (presumably because the
kernel, having no state for the script's raw-socket connections, would
otherwise reset them)
Execute the following command as root:
  ipfw add 101 deny tcp from any to vinson3.sfc.wide.ad.jp dst-port 34343,80,443 tcpflags rst
NOTE: if you are already running ipfw, please add equivalent rules.
After the experiment, you can revert with "ipfw delete 101".

2. Executing the script
Download the script from
http://www.micchie.net/software/tcpexposure/for_distrib.tar.gz, and
decompress it anywhere you like (e.g., tar xzf for_distrib.tar.gz on the
command line). In the for_distrib directory, execute the following
command as root:
  sh run-bsd2.sh
(This will take approximately 30 min.)

***** How to run the experiment (Linux) *****

1. Filtering RST TCP segments from the OS
Execute the following command as root:
  /sbin/iptables -A OUTPUT -p tcp -d vinson3.sfc.wide.ad.jp --tcp-flags RST RST -m multiport --dports 34343,80,443 -j DROP
NOTE: if you are already running iptables, please add equivalent rules.
After the experiment, you can revert with the opposite command, using -D
instead of -A.

2. Executing the script
Download the script from
http://www.micchie.net/software/tcpexposure/for_distrib.tar.gz, and
decompress it anywhere you like (e.g., tar xzf for_distrib.tar.gz). In
the for_distrib directory, execute the following command as root:
  sh run-linux2.sh
(This will take approximately 30 min.)
***** How to run the script (FreeBSD) *****

1. Filtering RST TCP segments from the OS
If you are using neither ipfw nor pf:
Load the pf kernel module with the following command as root:
  kldload /boot/kernel/pf.ko
Add the following 2 lines to /etc/pf.conf (please replace IFNAME with
your outgoing interface name, e.g. em0):
  pass out all
  block out quick on IFNAME proto tcp to vinson3.sfc.wide.ad.jp port {34343,80,443} flags R/R
Execute the following command as root:
  pfctl -e -f /etc/pf.conf
If you are already running pf, please add equivalent rules.
After the experiment, you can revert the settings by cleaning up
/etc/pf.conf and executing "pfctl -d" as root.

If you are already using ipfw:
Please add the following rule to your ipfw configuration:
  deny tcp from any to vinson3.sfc.wide.ad.jp dst-port 34343,80,443 tcpflags rst

2. Executing the script
Download the script from
http://www.micchie.net/software/tcpexposure/for_distrib.tar.gz, and
decompress it anywhere you like (e.g., tar xzf for_distrib.tar.gz). In
the for_distrib directory, execute the following command as root:
  sh run-bsd2.sh
(This will take approximately 30 min.)

Best regards,
- Michio

From matta at cs.bu.edu Wed Oct 13 17:39:29 2010
From: matta at cs.bu.edu (Jerry Humphrey)
Date: Thu, 14 Oct 2010 07:39:29 +0700
Subject: [e2e] draw no reply
Message-ID: <236862680.32279527140706@cs.bu.edu>
From v13 at v13.gr Thu Oct 14 02:34:15 2010
From: v13 at v13.gr (Stefanos Harhalakis)
Date: Thu, 14 Oct 2010 12:34:15 +0300
Subject: [e2e] Early FIN/close for query-response TCP connections like DNS's
In-Reply-To: <201010131827.UAA11711@TR-Sys.de>
References: <201010131827.UAA11711@TR-Sys.de>
Message-ID: <201010141234.15781.v13@v13.gr>

Hello,

On Wednesday 13 of October 2010, Alfred Hönes wrote:
> Stefanos,
> regarding your question on TCP connection management for DNS, please
> see RFC 5966 (DNS Transport over TCP - Implementation Requirements).

Thanks for the pointer. I just read it. See my comments inline:

> For instance, an end-system resolver (or a DNS forwarder, e.g.
> in an access gateway) that is configured with a single recursive
> resolver, to which it will forward all queries, should preferably keep
> a TCP connection open for considerable time, ideally forever.

This should only apply when a stub resolver is involved. If there isn't
one, or if the client API contacts the DNS server directly (as on Linux
systems without nscd running), then it is not possible to use this kind
of kept-alive connection (no?). In that case the client may not benefit
from an open connection (e.g. when running "dig"), or it is not even
aware of using TCP. I believe this is the case for most implementations
that use gethostbyname() (or similar functions) without calling
sethostent(1). Since gethostbyname() will retry using TCP whenever the
UDP reply was truncated and then close the connection, the early FIN
would work there.

> Unless the server is under serious resource pressure, it should keep
> a TCP connection open for a time span that is comparable to the time
> it would keep the closed connection in TIME-WAIT state.

The proposal (which should apply to all query/reply TCP-based sessions)
only affects clients. The server side would simply benefit from clients
that close early. The decision to close the connection early is a
client-side decision, and in many cases the client already knows whether
it will keep the connection open. The server-side implementation only
needs to test the socket for reading (which I believe it already does if
it supports connection keep-alive), and if that fails it should close
the connection exactly at the point where all data has been sent.

Am I mistaken in the above rationale?
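The server-side rationale described above (a zero-byte read reveals the client's early FIN, so the server can reply and close immediately) can be sketched roughly as follows. This is a toy single-connection illustration under assumed names, not BIND's or any real server's code:

```python
import socket

def handle_query_connection(conn, make_reply):
    """Serve one query/reply exchange, honoring an early client FIN."""
    query = b""
    while True:
        data = conn.recv(4096)
        if not data:
            # Zero-byte read: the client half-closed after sending its
            # query, so we can reply and close right away instead of
            # keeping the connection around waiting for the client's FIN.
            break
        query += data
        # A real DNS/TCP server would instead parse the message framing
        # here and stop reading once a complete query has arrived.
    if query:
        conn.sendall(make_reply(query))
    conn.close()  # our FIN; the client, which closed first, holds TIME_WAIT
```

Note that nothing here is specific to half-closed peers: the same loop serves a client that keeps its write side open and closes later, which matches the observation that the proposal needs no server-side changes.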
From v13 at v13.gr Thu Oct 14 02:35:38 2010
From: v13 at v13.gr (Stefanos Harhalakis)
Date: Thu, 14 Oct 2010 12:35:38 +0300
Subject: [e2e] Early FIN/close for query-response TCP connections like DNS's
In-Reply-To: <4CA62E3C.5030208@reed.com>
References: <201009302324.47717.v13@v13.gr> <201010011208.16336.v13@v13.gr> <4CA62E3C.5030208@reed.com>
Message-ID: <201010141235.39124.v13@v13.gr>

On Friday 01 of October 2010, David P. Reed wrote:
> Stefanos - good point about shutdown. I had pretty much forgotten it
> existed in Unix stacks; I don't even know if I've had much reason to
> use it... I checked Symbian and Windows, finding that they do have
> similarly named calls that support declaring the end of writing and
> sending a FIN.

Thanks for testing that!

From fernando at gont.com.ar Thu Oct 14 23:51:17 2010
From: fernando at gont.com.ar (Fernando Gont)
Date: Fri, 15 Oct 2010 03:51:17 -0300
Subject: [e2e] Early FIN/close for query-response TCP connections like DNS's
In-Reply-To: <201009302324.47717.v13@v13.gr>
References: <201009302324.47717.v13@v13.gr>
Message-ID: <4CB7F9E5.10706@gont.com.ar>

On 30/09/2010 05:24 p.m., Stefanos Harhalakis wrote:
> From what I understand, the FINs are problematic for this kind of
> traffic, so I was wondering: why not transform this to:
>
> * Client connects
> * Client queries
> * Client closes
> * Server responds
> * Server closes
>
> IOW: Since we know that this connection will have a Query and a
> Response, why not half-close the connection after the query is sent?
> This requires no modifications to TCP's behavior and will save at
> least RTT/2 because the server side will not wait for a FIN before
> closing. (no?)

Are you aiming at reducing state on servers?

If that's the case, one might argue that you still need to handle this
(and any other) sequence gracefully... or else expect that you might be
the subject of DoS attacks....
Thanks,
--
Fernando Gont
e-mail: fernando at gont.com.ar || fgont at acm.org
PGP Fingerprint: 7809 84F5 322E 45C7 F1C9 3945 96EE A9EF D076 FFF1

From v13 at v13.gr Fri Oct 29 06:06:42 2010
From: v13 at v13.gr (Stefanos Harhalakis)
Date: Fri, 29 Oct 2010 16:06:42 +0300
Subject: [e2e] Early FIN/close for query-response TCP connections like DNS's
In-Reply-To: <4CB7F9E5.10706@gont.com.ar>
References: <201009302324.47717.v13@v13.gr> <4CB7F9E5.10706@gont.com.ar>
Message-ID: <201010291606.42992.v13@v13.gr>

Hello,

On Friday 15 of October 2010, Fernando Gont wrote:
> On 30/09/2010 05:24 p.m., Stefanos Harhalakis wrote:
> > IOW: Since we know that this connection will have a Query and a
> > Response, why not half-close the connection after the query is sent?
> > This requires no modifications to TCP's behavior and will save at
> > least RTT/2 because the server side will not wait for a FIN before
> > closing. (no?)
>
> Are you aiming at reducing state on servers?
>
> If that's the case, one might argue that you still need to handle this
> (and any other) sequence gracefully... or else expect that you might
> be the subject of DoS attacks....

While reading old mails on end2end-interest, I was left with the
impression that there is concern about the TIME_WAIT state on DNS
servers and the port allocation period. With that in mind, I came up
with the above proposal.

From what I understand, with my proposal the DNS client would be the
first to close the connection, so it will have to wait in TIME_WAIT and
the server will not. In contrast, in the current situation, if a server
supports persistent connections it has to wait for the client to close
the connection first, meaning that the DNS server will have to sit in
TIME_WAIT.

Early half-closing of a connection is not a violation of TCP and does
not affect the reliability of the connection. The connections are still
closed gracefully, and there is no possibility of data loss or DoS
attacks.
Furthermore, the proposal does not require any modifications at all to
the server's implementation, and could be implemented with very few
changes in clients.

I've tested the proposal against BIND using a custom-made client, and:
a) It works very well, meaning that BIND's implementation is able to
handle half-closed connections without a problem.
b) It seems that BIND does not support persistent TCP connections, so
they cannot be used for tests right now.

Of course I can share the code if you like. If you can give me any hints
on how to test this further, I'll be grateful.

From v13 at v13.gr Fri Oct 29 06:20:22 2010
From: v13 at v13.gr (Stefanos Harhalakis)
Date: Fri, 29 Oct 2010 16:20:22 +0300
Subject: [e2e] Early FIN/close for query-response TCP connections like DNS's
In-Reply-To: <201010141937.VAA14263@TR-Sys.de>
References: <201010141937.VAA14263@TR-Sys.de>
Message-ID: <201010291620.22707.v13@v13.gr>

Hello,

On Thursday 14 of October 2010, Alfred Hönes wrote:
> On Thu Oct 14 02:34:15 PDT 2010, Stefanos Harhalakis wrote:
> > On Wednesday 13 of October 2010, Alfred Hönes wrote:
> >> For instance, an end-system resolver (or a DNS forwarder, e.g. in
> >> an access gateway) that is configured with a single recursive
> >> resolver, to which it will forward all queries, should preferably
> >> keep a TCP connection open for considerable time, ideally forever.
> >
> > This should only apply whenever a stub resolver is involved. If it
> > isn't, or the client API directly contacts the DNS server (like on
> > Linux systems without nscd running), then it is not possible to use
> > this kind of kept-alive connection (no?).
>
> Most clients -- be they application-based or host-based -- will likely
> talk to a *single* recursive resolver all the time (the one they have
> been configured with via DHCP or PPP[oE], or via static
> configuration), and not to the different authoritative resolvers
> directly; debugging tools like dig are an exception, of course.
Indeed, but the persistent connection may only be used when there is a
stub resolver running on the client side. (For example, you cannot share
one TCP connection to the DNS server between two clients like Firefox
and IE.) This is the host-based client you're referring to.

> > In that case the client may not benefit from an open connection
> > (e.g. when running "dig") or it is not aware of using TCP. I believe
> > that this is the case of most implementations that use
> > gethostbyname() (or similar) without using sethostent(1). Since
> > gethostbyname() will retry using TCP whenever the UDP query was
> > truncated and then close the connection, the early FIN would work
> > there.
>
> Those are per-platform implementation details.

AFAIK, all DNS resolver implementations out there will retry with TCP
whenever they receive a truncated reply. (No?) I was referring to that,
which is not platform-specific; gethostbyname() was just an example.

> > The server-side implementation only needs to test the socket for
> > reading ...
>
> ... and exceptions! -- receipt of a FIN should perhaps be acted upon
> quickly! (Of course, the details depend on the API.)

AFAIK a FIN is not considered an exception. I believe that only URG data
is considered an exception; at least that's what I've seen when using
sockets on a couple of Unix OSs, and what is mentioned in various
TCP-related mailing lists.

> No, but as you have already observed, it depends on the circumstances
> what behavior would be regarded as "optimal" (in some sense).
> A per-host stub resolver, a browser's DNS helper, a small company
> recursive server, a big ISP recursive resolver with many anycast
> replicas, a DNSSEC-validating resolver, a company's authoritative
> server, a large registry's server, ... -- they all will have different
> views of what is "optimal".

Exactly. No argument there!

> That's why details in DNS client behavior like what you are
> considering are not standardized.

Exactly^2.
That's why I mentioned this as a proposal/guideline which may be
followed only under certain circumstances, and only by willing clients.
IOW, when it makes sense.

> Note that the RTT of a truncated DNS response that then causes
> the setup of a TCP connection also introduces delay. The overhead
> amortization over multiple queries is a real advantage -- think of
> an application using service discovery: pick up NAPTRs, then SRVs,
> then address RRs -- all likely served by the same authority; if
> not in cache, there will be bursts of lookups.

Yeap. That's why I was referring only to single-query connections. Most
of the time, the client already knows whether there is just one query or
not (e.g. when using gethostbyname() there is always only one query per
TCP connection).

> If you are interested in extreme optimization for transactional
> usage of TCP, please wait for the Experimental RFC(-to-be) 6013,
> "TCP Cookie Transactions", to be published soon (maybe next week)
> on the Independent Submission stream or look at the draft for it,
> draft-simpson-tcpct-03 !

I will. Thanks!