[e2e] simulation using self-similar traffic

Mark Crovella crovella at cs.bu.edu
Wed Apr 4 09:45:49 PDT 2001


Sorry, came in late to this discussion.

Polly's points are right in line with what we had in mind in writing the
paper.   Namely, that if the sample mean of the inputs doesn't converge
rapidly, then simple measures like throughput (which, as Polly points out,
essentially involves the sample mean of the inputs) won't converge rapidly
either.   This is simply an argument about necessary, not sufficient,
conditions.   I would expect sufficient conditions to depend rather heavily
on the particular metric of interest, and they probably don't exist in most
cases.
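
(For anyone who wants to see this numerically, here is a minimal sketch in
Python/numpy -- the shape parameter, scale, and sample sizes below are
purely illustrative, not taken from the paper.  It compares the running
sample mean of Pareto-distributed "file sizes" with exponential ones of the
same mean.)

# Minimal sketch: running sample mean of heavy-tailed (Pareto) vs.
# light-tailed (exponential) "file sizes".  Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(1)

alpha = 1.2        # Pareto shape: 1 < alpha < 2 -> finite mean, infinite variance
k = 1000.0         # Pareto scale (minimum file size)
mean = alpha * k / (alpha - 1.0)   # true mean of this Pareto distribution

n = 1_000_000
pareto = k * (1.0 + rng.pareto(alpha, n))   # classic Pareto(k, alpha) samples
expo = rng.exponential(mean, n)             # exponential samples, same mean

for name, x in [("pareto", pareto), ("exponential", expo)]:
    running_mean = np.cumsum(x) / np.arange(1, n + 1)
    for m in (10**3, 10**4, 10**5, 10**6):
        err = abs(running_mean[m - 1] - mean) / mean
        print(f"{name:12s} n={m:7d}  relative error of sample mean = {err:.3f}")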

Also note that although the paper considered the sample mean of i.i.d.
heavy-tailed RVs, the convergence properties of the moments of an LRD
process are quite similar.  So an (effectively) bounded process can
indeed show slow convergence in the mean and variance, because of
autocorrelation.   Using i.i.d. heavy-tailed RVs is a particularly
convenient way to quantify the implications of slow convergence of
moments, because of the existence of limit theorems.
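
(A minimal sketch of that second point, again in Python/numpy with made-up
parameters: a process that only takes the values 0 and 1 -- so it is
certainly bounded -- but whose ON and OFF period lengths are
Pareto-distributed.  The running time-average of a single sample path still
wanders around the ensemble mean of 0.5 for a very long time.)

# Minimal sketch: a process bounded in [0, 1] can still show slow
# convergence of its time-average because of long-range correlations.
# The process alternates between 1 (ON) and 0 (OFF) with heavy-tailed
# (Pareto) period lengths.  Parameters are illustrative only.
import numpy as np

rng = np.random.default_rng(2)
alpha, k = 1.2, 1.0          # heavy-tailed period lengths
horizon = 5_000_000          # total number of time steps

def on_off_path(horizon):
    x = np.empty(horizon)
    t, state = 0, 1
    while t < horizon:
        length = int(np.ceil(k * (1.0 + rng.pareto(alpha))))  # Pareto period
        length = min(length, horizon - t)
        x[t:t + length] = state
        t += length
        state = 1 - state                                     # flip ON <-> OFF
    return x

x = on_off_path(horizon)
running_mean = np.cumsum(x) / np.arange(1, horizon + 1)
for m in (10**3, 10**4, 10**5, 10**6, 5 * 10**6):
    print(f"t={m:8d}  time-average = {running_mean[m - 1]:.3f}  (ensemble mean = 0.5)")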

Mark

-----Original Message-----
From: end2end-interest-admin at postel.org
[mailto:end2end-interest-admin at postel.org]On Behalf Of Polly Huang
Sent: Wednesday, April 04, 2001 11:08 AM
To: Soo-hyeong Lee
Cc: end2end-interest
Subject: Re: [e2e] simulation using self-similar traffic


On Wed, 4 Apr 2001, Soo-hyeong Lee wrote:

>
> > The heavy-tailed random variable Crovella and Lipsky were talking about
> > was for the Pareto-distributed ON period or file size you are
> > interested in.
> > This is also what I meant by heavy-tailed file sizes (assuming you're
> > using a SURGE-like model to generate traffic already...)
>
> The difference is that Crovella and Lipsky tried to measure the average
> of a known random variable, while I am trying to measure the average of a
> random variable which is a function of known random variables.
> Suppose file size is Pareto-distributed: then Crovella and Lipsky tried
> to measure the average file size, while I am trying to measure the
> average throughput when transferring those files.
>
> > Assuming these are
> > TCP files (which is the case) and if you're interested in measuring
> > 'average TCP throughput', I expect one would need to generate as many
> > samples as necessary to make sure at least the average file size
> > converges. Otherwise, I couldn't see why average throughput would
> > converge.
>
> You seem to argue that convergence of average file size to its ensemble
> mean is a necessary condition for convergence of throughput. You don't
> seem to argue that it's a sufficient condition. What I want is a
> sufficient condition for throughput convergence.
>
> However, I am not quite sure if it is a necessary condition.
> The reason that a running average of an i.i.d. Pareto sequence doesn't
> converge quickly would be that the law of large numbers doesn't hold here
> because of the infinite variance of the Pareto variable.
> However, throughput is clearly upper bounded by link bandwidth and
> doesn't show infinite variance. In fact, throughput may be calculated as
> (sum of file sizes successfully transferred) / (time taken). Even if
> file size doesn't converge, its ratio over transfer duration may still
> converge.

Soo-hyeong,

I'd like to point out that just because a process is bounded doesn't
necessarily mean it will converge.  And you might be right that it might
not be a necessary condition either.  Again, without formal analysis, I'm
using the method as a heuristic and only claiming that it seems to make
sense 'intuitively'.  Let me borrow your formula:

(sum of file sizes successfully transmitted) / (time taken)

Suppose we complete all file transfers in a simulation.  Since the mean is
just the sum of file sizes divided by the number of files, if the mean
doesn't converge, the sum won't converge.  If the sum doesn't converge, I'm
not sure whether the overall link throughput will converge (actually, the
'throughput' I meant in earlier emails is individual TCP connection
throughput, not the overall link throughput).

Perhaps the term 'time taken' will not converge either and will somehow
cancel out the degree of oscillation in 'sum of file sizes successfully
transmitted'.  But it's not clear at the moment.  (Hmm.. this could
rather be a sufficient condition after all.)
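
(Just to illustrate that speculation -- this is NOT a model of real TCP
dynamics -- here is a toy sketch in Python/numpy where each transfer of S
bytes is simply assumed to take S/B + c seconds, with a made-up bandwidth B
and per-transfer overhead c.  Under that assumption the large files that
inflate the numerator also inflate the denominator, and the ratio is always
bounded by B, so one can watch how differently it behaves from the running
mean of the file sizes.)

# Toy sketch of the "cancellation" idea: throughput as
#   (sum of file sizes successfully transmitted) / (time taken)
# under the made-up assumption that a transfer of S bytes takes S/B + c
# seconds.  Not a model of real TCP; parameters are illustrative only.
import numpy as np

rng = np.random.default_rng(3)
alpha, k = 1.2, 10_000.0      # Pareto file sizes (bytes): finite mean, infinite variance
B = 1_000_000.0               # assumed bandwidth, bytes/second
c = 0.1                       # assumed fixed per-transfer overhead, seconds

n = 1_000_000
sizes = k * (1.0 + rng.pareto(alpha, n))
times = sizes / B + c         # idealized (sequential) transfer times

idx = np.arange(1, n + 1)
mean_size = np.cumsum(sizes) / idx                  # running sample mean of file size
throughput = np.cumsum(sizes) / np.cumsum(times)    # running "throughput", bounded by B

for m in (10**3, 10**4, 10**5, 10**6):
    print(f"n={m:7d}  mean file size = {mean_size[m - 1]:10.0f} bytes   "
          f"throughput = {throughput[m - 1]:8.0f} bytes/s")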

Nevertheless, I agree the average link throughput is bounded by the link
bandwidth.  And the discussion is getting a bit too mathematical.
Perhaps we should take it offline.

Cheers,
-Polly




