[e2e] Re: crippled Internet
David P. Reed
dpreed at reed.com
Thu Apr 26 19:58:28 PDT 2001
VoIP sound quality?
This is a ridiculous measure without constraining the problem.
First, sound "quality" relates to codec quality more than latency/jitter.
Second, end-to-end task latency in a real-time voice channel (which governs people's ability to
conduct normal turn-taking; too much of it introduces a "satellite-like delay") is dominated by
the jitter buffer size. The jitter buffer introduces a delay proportional to jitter (the standard
deviation of delay), so total VoIP latency is

  (mean latency) + c*(std. dev. of latency) + (source processing latency) + (dest processing latency)

for some constant c that is set to control the dropped-frame rate. Unless great care is taken, the
latter terms dominate.
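A minimal Python sketch of the latency formula above, assuming one-way network delay samples in milliseconds are available. The function name, its parameters, and the choice c = 3 in the usage line are illustrative assumptions, not values from the post; a larger c deepens the jitter buffer and lowers the late-frame (dropped-frame) rate at the cost of more delay.

```python
import statistics

def latency_budget(delays_ms, c, src_proc_ms, dst_proc_ms):
    """Break mouth-to-ear latency into the terms of the formula above (all in ms)."""
    mean = statistics.mean(delays_ms)        # average one-way network delay
    jitter = statistics.pstdev(delays_ms)    # jitter, taken as the std dev of delay
    budget = {
        "network_mean": mean,
        "jitter_buffer": c * jitter,          # playout delay added to absorb jitter
        "source_processing": src_proc_ms,     # capture + encode + packetize
        "dest_processing": dst_proc_ms,       # depacketize + decode + playout
    }
    budget["total"] = sum(budget.values())
    return budget

print(latency_budget([40, 60, 40, 60], c=3, src_proc_ms=30, dst_proc_ms=30))
```

With delays alternating between 40 and 60 ms (mean 50, standard deviation 10), c = 3, and 30 ms of processing at each end, the jitter buffer contributes 30 ms and the total comes to 140 ms; the two processing terms alone (60 ms) already exceed the mean network delay.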
Third, many currently deployed VoIP systems use TCP rather than UDP or RTP,
because the latter protocols don't work through NAT boxes and firewalls. Ask
RealNetworks what percentage of streaming (non-telephony) content
actually goes out in UDP or RTP form from their customers' servers. It
isn't much. The result is that retransmission of lost frames by TCP adds
further delay: a lost frame arrives at least a round trip late, and every
frame behind it waits for it.
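To make the cost of TCP retransmission concrete, here is a toy Python model under simplifying assumptions (fixed one-way delay, a single loss, a fixed retransmission wait `rto_ms`, no congestion-window effects). It compares when frames reach the application over an in-order reliable stream versus plain datagrams. All names and numbers are hypothetical.

```python
def delivery_times(n_frames, frame_ms, one_way_ms, lost, rto_ms, in_order):
    """Time (ms) at which each frame reaches the receiving application.

    Toy model: frame i is sent at i * frame_ms and takes one_way_ms to arrive.
    Frames whose index is in `lost` are dropped once. With in_order=True (TCP-like)
    they are resent rto_ms after the first attempt and later frames cannot be
    delivered ahead of them; with in_order=False (UDP/RTP-like) they are just missing.
    """
    arrivals = []
    for i in range(n_frames):
        t = i * frame_ms + one_way_ms
        if i in lost:
            if not in_order:
                arrivals.append(None)  # gap left for the receiver to conceal
                continue
            t += rto_ms                # wait for the retransmitted copy
        arrivals.append(t)
    if in_order:
        # Head-of-line blocking: a frame is released no earlier than its predecessor.
        for i in range(1, n_frames):
            arrivals[i] = max(arrivals[i], arrivals[i - 1])
    return arrivals

print(delivery_times(5, 20, 50, {1}, 200, in_order=False))
print(delivery_times(5, 20, 50, {1}, 200, in_order=True))
```

With 20 ms frames, 50 ms one-way delay, frame 1 lost, and a 200 ms wait before retransmission, the datagram case delivers frames 0, 2, 3 and 4 on schedule and leaves one gap for concealment, while the in-order case holds frames 2 to 4 until frame 1 arrives at 270 ms. That stall is what a jitter buffer would have to absorb.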
Here's what's needed for VoIP:
1) very low latency software codec. You want a codec that encodes 10-20 msec
frames and pumps them out immediately in packets. Unfortunately, to get enough
compression to fit on 33 kb/s links over PPP, people encode longer frames,
and, to cut IP overhead, they cram multiple frames into each packet. Both add
latency. Short frames sent one per packet are not what most "mass market
products" typically use.
2) very low latency hardware codec. You want a hardware codec that
delivers frames to the software codec instantly. The device driver typically
needs to use a mapped buffer shared with the client, plus a signalling
system that has very little jitter. That is not the case in most OS's in
popular use (Windows and Linux, for example, are not great at user-level
real-time device handling).
3) very low latency "small packet" Internet stack. If using TCP, you don't
want the stack to "dally" with small segments; better to use UDP or RTP. Most
OS's don't have stacks that pay attention to latency on small packets; they go
for throughput, so many code paths are optimized for throughput at the cost of
added latency for small packets (buffer management, for example).
4) ability to pump data between the hardware audio driver and the Internet
interface, with the data pump able to preempt background threads. Because audio
processing is not costly on today's processors with embedded DSP capabilities,
this thread is not CPU-bound, so the data pump blocks frequently. Each wakeup of
the data pump incurs task wakeup latency, which is poor in systems like Windows
or the Mac, for example, that lack an effective priority mechanism.
5) ability to manage audio output just before it goes to the hardware codec, so
that if a frame is missing due to a dropped packet, compatible noise is
inserted into the gap.
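For item 1 above, a small Python calculation of the trade-off between per-packet header overhead and packetization delay. It assumes, for illustration, a codec producing 8 kb/s in 10 ms frames (G.729 has these parameters; the post names no codec) and 40 bytes of IPv4, UDP, and RTP headers per packet, ignoring PPP framing and header compression. The function name is hypothetical.

```python
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP headers without options; PPP framing ignored

def wire_rate_kbps(codec_kbps, frame_ms, frames_per_packet):
    """Return (bit rate on the wire in kb/s, packetization delay in ms)."""
    payload_bits = codec_kbps * frame_ms * frames_per_packet  # kb/s * ms = bits
    packet_ms = frame_ms * frames_per_packet                    # audio held before sending
    wire_bits = payload_bits + HEADER_BYTES * 8
    return wire_bits / packet_ms, packet_ms                     # bits per ms = kb/s

for n in (1, 2, 4):
    rate, delay = wire_rate_kbps(8, 10, n)
    print(f"{n} frame(s)/packet: {rate:.0f} kb/s on the wire, {delay} ms packetization delay")
```

One frame per packet needs 40 kb/s on the wire, more than a 33 kb/s link carries; packing four frames per packet brings that down to 16 kb/s but holds 40 ms of audio before the packet can be sent, which is the latency cost described in item 1.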
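For item 2 above, a rough Python sketch of the shared-buffer-plus-signal structure: a fixed ring of frame slots written by a (simulated) driver and read by the codec thread, with a condition variable as the wakeup signal. This is only a structural illustration under stated assumptions: the class name is hypothetical, both sides are ordinary threads rather than a driver mapping memory into a client, and the lock-based design makes no real-time guarantees.

```python
import threading

class FrameRing:
    """Hypothetical fixed-size ring of audio frames shared by driver and codec."""

    def __init__(self, slots, frame_bytes):
        self.buf = [bytes(frame_bytes)] * slots
        self.slots = slots
        self.head = 0   # total frames written by the driver side
        self.tail = 0   # total frames consumed by the codec side
        self.ready = threading.Condition()

    def put(self, frame):  # driver side
        with self.ready:
            if self.head - self.tail == self.slots:
                self.tail += 1           # overrun: drop the oldest frame, never block the driver
            self.buf[self.head % self.slots] = frame
            self.head += 1
            self.ready.notify()          # the wakeup signal whose jitter matters

    def get(self, timeout=None):  # codec side
        with self.ready:
            if not self.ready.wait_for(lambda: self.head > self.tail, timeout):
                return None              # nothing arrived within the timeout
            frame = self.buf[self.tail % self.slots]
            self.tail += 1
            return frame

ring = FrameRing(slots=2, frame_bytes=4)
for f in (b"aaaa", b"bbbb", b"cccc"):
    ring.put(f)                          # the third put overruns and drops b"aaaa"
print(ring.get(), ring.get(), ring.get(timeout=0))
```

On overrun the sketch drops the oldest frame instead of blocking the driver, one reasonable policy for live audio, where a stale frame is worth less than a fresh one.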
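For item 3 above, a minimal Python sketch of the socket-level choices. One common way a TCP stack holds back small segments is Nagle's algorithm, which the standard `TCP_NODELAY` socket option disables; treating that as what the post calls "dallying" is an assumption (delayed acknowledgements, for instance, are a separate mechanism not addressed here). The UDP helper only shows that each voice datagram travels on its own; its name and the loopback setup are illustrative.

```python
import socket

def low_latency_tcp_socket():
    """TCP socket with Nagle's algorithm disabled, so small writes go out
    immediately instead of being held back to coalesce with later data."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s

def udp_frame_roundtrip(frame, timeout_s=1.0):
    """Send one voice-sized datagram over loopback and read it back.
    Each datagram stands alone: a lost one is simply gone, and nothing waits for it."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        rx.bind(("127.0.0.1", 0))        # let the OS pick a free port
        rx.settimeout(timeout_s)
        tx.sendto(frame, rx.getsockname())
        data, _ = rx.recvfrom(2048)
        return data
    finally:
        rx.close()
        tx.close()
```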
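For item 4 above, a rough Python experiment that measures task wakeup latency: a "data pump" thread blocks on a queue, and the main thread hands it timestamps at intervals. The function name and parameters are hypothetical, and the figures include Python interpreter and GIL overhead on top of the operating system's own wakeup latency, so they show the shape of the problem rather than measure the OS precisely.

```python
import queue
import threading
import time

def measure_wakeup_latency(n=200, interval_s=0.01):
    """Return n wakeup latencies (seconds) of a thread blocked on a queue."""
    q = queue.Queue()
    latencies = []

    def pump():
        while True:
            sent = q.get()             # the pump sits blocked here most of the time
            if sent is None:           # sentinel: stop
                return
            latencies.append(time.perf_counter() - sent)

    worker = threading.Thread(target=pump)
    worker.start()
    for _ in range(n):
        time.sleep(interval_s)         # give the pump time to block again
        q.put(time.perf_counter())     # "a frame is ready": wake the pump
    q.put(None)
    worker.join()
    return latencies

lat = measure_wakeup_latency()
print(f"median {sorted(lat)[len(lat) // 2] * 1e6:.0f} us, worst {max(lat) * 1e6:.0f} us")
```

Comparing the worst case with the median on a loaded machine shows the tail that matters: the jitter buffer has to cover the slowest wakeups, not the typical ones.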
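For item 5 above, a minimal Python sketch that fills a missing frame with noise whose level matches the last good frame. This is one simple reading of "compatible noise", assumed here for illustration; the function name is hypothetical, samples are plain floats, and practical concealment schemes are usually more elaborate (for example, repeating recent waveform at the detected pitch).

```python
import math
import random

def conceal(frames, frame_len, seed=0):
    """Replace missing frames (None) with noise at the RMS level of the last good frame."""
    rng = random.Random(seed)
    out, last_rms = [], 0.0            # before any good frame, fill with silence
    for f in frames:
        if f is None:
            noise = [rng.gauss(0.0, 1.0) for _ in range(frame_len)]
            noise_rms = math.sqrt(sum(x * x for x in noise) / frame_len) or 1.0
            out.append([x * last_rms / noise_rms for x in noise])  # scale to last_rms
        else:
            last_rms = math.sqrt(sum(x * x for x in f) / len(f))
            out.append(list(f))
    return out

frames = [[0.5, -0.5] * 80, None, [0.25, -0.25] * 80]   # 160-sample frames, one lost
filled = conceal(frames, frame_len=160)
```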
Most PC audio cards, PC OSs, and software codecs do not meet these
criteria. So most of the "commercial" VoIP products for the "mass market"
cannot do a good job, and the network ends up looking like the bottleneck.
The access network also introduces serious problems, at least in the
case of a dialup line, which is where many people try to evaluate VoIP.
Thinking that "56Kb" is sufficient bandwidth, they don't realize that the delays
introduced by V.92 compression and PPP are serious problems. I haven't
measured the delay and jitter introduced by PPPoE (used by almost all DSL
broadband providers, and possibly some cable modem providers), but PPP may
well be problematic there as well if there is competing traffic.
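A quick Python calculation of why competing traffic hurts on a slow access link: a voice packet that queues behind a large data packet waits for that packet's whole serialization time. The 33.6 kb/s rate, the packet sizes, and the function name are illustrative assumptions; PPP framing and modem compression are ignored.

```python
def serialization_ms(packet_bytes, link_kbps):
    """Time to clock one packet onto a serial link, in ms (bits / (kb/s) = ms).

    Ignores PPP framing overhead and modem compression (assumption).
    """
    return packet_bytes * 8 / link_kbps

for size in (50, 576, 1500):  # a small voice packet and two larger data packets
    print(f"{size:5d} bytes at 33.6 kb/s: {serialization_ms(size, 33.6):6.1f} ms")
```

At 33.6 kb/s a 1500-byte packet occupies the link for about 357 ms, while a 50-byte voice packet needs about 12 ms; one large packet ahead in the queue costs far more than the 10-20 msec frames recommended above.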
Significant improvements in sound quality can be achieved by using
techniques that compensate for packets lost to errors and
congestion. These are not used in "commercial" products either.
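One technique of the kind described above is forward error correction. Below is a minimal Python sketch of XOR parity: for each group of equal-length packets the sender adds one parity packet, and the receiver can rebuild any single lost packet in the group. The function names are hypothetical, and this is a swapped-in illustration rather than a scheme the post specifies.

```python
from functools import reduce

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(group):
    """One parity packet for a group of equal-length payloads."""
    return reduce(xor_bytes, group)

def recover(received, parity):
    """received: payloads in order, with None for a lost one; returns the full group."""
    missing = [i for i, p in enumerate(received) if p is None]
    if not missing:
        return list(received)
    if len(missing) > 1:
        raise ValueError("XOR parity can rebuild only one lost packet per group")
    # XOR of the parity with every surviving packet leaves exactly the lost one.
    rebuilt = reduce(xor_bytes, [p for p in received if p is not None], parity)
    out = list(received)
    out[missing[0]] = rebuilt
    return out

group = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(group)
print(recover([group[0], None, group[2]], parity))  # packet 1 rebuilt from the rest
```

The trade-off fits the theme of the post: recovery costs one extra packet per group and cannot happen until the parity packet arrives, so larger groups save bandwidth but add delay.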
So, I would sum this up by saying that before blaming the network for user
perceptions, we have to control for very big factors caused by the lack of
attention to "sound quality" in the source and destination software. There
is much to be improved here, and the VoIP community's adoption of standards
designed for dedicated isochronous point-to-point lines (H.32x) has been a
large part of the problem.
WWW Page: http://www.reed.com/dpr.html
More information about the end2end-interest