[e2e] simulation using self-similar traffic

Soo-hyeong Lee shlee at mmlab.snu.ac.kr
Wed Apr 4 04:17:59 PDT 2001


> The heavy-tailed random variable Crovella and Lipsky were talking about
> was for the Pareto-distributed ON period or file size you are interested in.
> This is also what I meant by heavy-tailed file sizes (assuming you're
> using a SURGE-like model to generate traffic already...)

The difference is that Crovella and Lipsky tried to measure the average of a known random variable,
while I am trying to measure the average of a random variable that is a function of known random variables.
Suppose the file size is Pareto-distributed: Crovella and Lipsky would then be measuring the average file size, while I am trying to measure the average throughput obtained when transferring those files.
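
To make the distinction concrete, here is a minimal sketch of the two averages.
The Pareto parameters, the 100 KB/s bottleneck rate and the fixed 0.1 s
per-transfer overhead are assumptions of mine purely for illustration, not
anything taken from Crovella and Lipsky:

import numpy as np

rng = np.random.default_rng(1)
alpha, xm = 1.2, 4.0          # assumed Pareto shape and scale (KB)
link_rate = 100.0             # assumed bottleneck rate (KB/s)
overhead = 0.1                # assumed fixed per-transfer overhead (s)

sizes = xm * (1 + rng.pareto(alpha, 10**5))      # heavy-tailed file sizes
durations = overhead + sizes / link_rate         # crude stand-in for TCP transfer time

mean_size = sizes.mean()                         # the average Crovella and Lipsky analyse
mean_throughput = sizes.sum() / durations.sum()  # the average I am after

(Their 10^12-sample figure quoted below concerns mean_size: the relative error
of a Pareto sample mean shrinks roughly like n^(-(alpha-1)/alpha), so alpha = 1.2
and two-digit accuracy call for about (10^-2)^(-6) = 10^12 samples. The question
is whether mean_throughput needs anywhere near that many.)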

> Assuming these are
> TCP files (which is the case), and if you're interested in measuring
> 'average TCP throughput', I expect one would need to generate as many
> samples as necessary to make sure that at least the average file size
> converges. Otherwise, I couldn't see why the average throughput would
> converge.

You seem to argue that convergence of the average file size to its ensemble mean is a necessary condition for convergence of the throughput. You don't seem to argue that it is a sufficient condition. What I want is a sufficient condition for throughput convergence.

However, I am not quite sure that it is even a necessary condition.
The reason that a running average of an i.i.d. Pareto sequence converges so slowly is that the Pareto variable has infinite variance (for alpha < 2), so the usual n^(-1/2) convergence rate of the sample mean no longer applies, even though the mean itself is finite for alpha > 1.
Throughput, however, is clearly upper bounded by the link bandwidth and does not have infinite variance. In fact, throughput may be calculated as (sum of file sizes successfully transferred) / (time taken). Even if the average file size does not converge, its ratio to the transfer duration may still converge.
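
A quick numerical check of that intuition, reusing the toy model and made-up
parameters from the sketch above (the duration formula is only a crude stand-in
for an actual TCP transfer):

import numpy as np

rng = np.random.default_rng(1)
alpha, xm, link_rate, overhead = 1.2, 4.0, 100.0, 0.1
sizes = xm * (1 + rng.pareto(alpha, 10**6))
durations = overhead + sizes / link_rate

n = np.arange(1, sizes.size + 1)
running_mean_size = np.cumsum(sizes) / n                      # heavy-tailed, wanders for a long time
running_throughput = np.cumsum(sizes) / np.cumsum(durations)  # can never exceed link_rate

for k in (10**3, 10**4, 10**5, 10**6):
    print(k, running_mean_size[k - 1], running_throughput[k - 1])

Each file adds at most link_rate per unit of transfer time to the ratio, so the
rare huge files that throw the running mean around are heavily damped in the
throughput estimate; that is why I suspect it settles with far fewer samples.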



----- Original Message ----- 
From: "Polly Huang" <huang at tik.ee.ethz.ch>
To: "Soo-hyeong Lee" <shlee at mmlab.snu.ac.kr>
Cc: "end2end-interest" <end2end-interest at postel.org>
Sent: Wednesday, April 04, 2001 7:06 PM
Subject: Re: [e2e] simulation using self-similar traffic


> On Wed, 4 Apr 2001, Soo-hyeong Lee wrote:
> 
> > 
> > Thank you very much for kind reply. However,
> > 
> > > According to Crovella and Lipsky's chapter in 'Self-similar Network
> > > Traffic and Performance Evaluation' edited by Park and Willinger, if
> > > you're measuring average throughput and using heavy-tailed file sizes 
> > > with a small alpha, say 1.2, you'll need to simulate 10^12 samples to 
> > > get 2-digit accuracy. 
> > 
> > Crovella and Lipsky's paper is about estimating the average of a
> > heavy-tailed random variable from independent samples, and doesn't say
> > anything about the relation between throughput and file size. It doesn't
> > take long-range dependence into consideration either.
> > 
> > The situation I am interested in is more related to long-range dependence
> > among the traffic loads at different time instants, or to the high
> > variability of the ON and OFF periods.
> > The type of workload I want to simulate is a kind of ON-OFF source whose
> > OFF period follows a Pareto distribution and whose ON period is devoted to
> > transferring a file whose size is Pareto-distributed. This is a simplified
> > version of what is proposed in Barford's paper (Barford, Paul; Crovella,
> > Mark. Generating Representative Web Workloads for Network and Server
> > Performance Evaluation, In Proceedings of ACM SIGMETRICS '98).
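
For concreteness, here is a bare-bones sketch of the kind of source I mean; the
parameter values are made up and nothing here models TCP, just the two Pareto
draws per ON-OFF cycle:

import numpy as np

rng = np.random.default_rng(2)

def on_off_trace(n_cycles, alpha_size=1.2, xm_size=4.0, alpha_off=1.5, xm_off=1.0):
    """Simplified Barford/Crovella-style source: each cycle is a Pareto OFF
    period followed by an ON period that transfers one Pareto-sized file."""
    off_times = xm_off * (1 + rng.pareto(alpha_off, n_cycles))   # OFF durations (s)
    sizes = xm_size * (1 + rng.pareto(alpha_size, n_cycles))     # file sizes (KB)
    return list(zip(off_times, sizes))

trace = on_off_trace(10)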
> 
> Soo-hyeong,
> 
> The heavy-tailed random variable Crovella and Lipsky were talking about
> was for the Pareto-distributed ON period or file size you are interested in.
> This is also what I meant by heavy-tailed file sizes (assuming you're
> using a SURGE-like model to generate traffic already...)
> 
> > I hope that there is some research on this, because this type of traffic
> > trace is what dominates the real network.
> > 
> 
> Some of us do generate aggregated Internet-like traffic with Poisson 
> or Pareto arrivals and Pareto file sizes.  See, for example:
> 
> 'Dynamics of IP traffic: A study of the role of variability and the 
> impact of control'
> Anja Feldmann; Anna C. Gilbert; Polly Huang; Walter Willinger 
> In the proceedings of SIGCOMM '99, Cambridge, Massachusetts, Sept 1999 
> 
> I believe others on the list can point to more works that generate
> long-range dependent traffic in a similar fashion.
> 
> > 
> > 
> > > For metrics like 90% quantile of the throughput
> > 
> > Do you mean the 90th-percentile value of the random variable 'average
> > throughput measured over a small interval', below which most (90%) of the
> > small-interval average throughputs lie?
> > 
> > > 1. plot the 90% quantile of the file sizes for different numbers of
> > >    samples (using MATLAB or Splus, see the attached ps for example)
> > 
> > I am afraid that you seem to switch arbitrarily between throughput and
> > file size.
> > Could you explain why the 90th percentile of the file size is related to
> > the 90th percentile of the throughput?
> 
> Perhaps I wasn't being clear in the previous email.  Assuming these are
> TCP files (which is the case), and if you're interested in measuring
> 'average TCP throughput', I expect one would need to generate as many
> samples as necessary to make sure that at least the average file size
> converges. Otherwise, I couldn't see why the average throughput would
> converge.
> 
> As for the 90% quantile of throughputs, I meant the value below which 90% of
> all the measured connection throughputs lie.  I.e., the median would be the
> 50% quantile.
> 
> > 
> > > 2. identify where the value converges
> > > 3. simulate at minimum that amount of samples
> > 
> > On what grounds do you think that the 90th percentile converges much
> > faster than the average?
> 
> That's because the 90% quantile doesn't take into account those extremely
> large (but rare) samples, which are responsible for the infinite variance
> and for the slow convergence of the average.
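
A quick check with the same made-up Pareto parameters as in the sketches above
is one way to see this; with alpha = 1.2 the empirical 90% quantile typically
stabilizes within a few thousand samples, while the running mean keeps drifting:

import numpy as np

rng = np.random.default_rng(3)
alpha, xm = 1.2, 4.0
sizes = xm * (1 + rng.pareto(alpha, 10**6))

# Running sample mean vs. running 90% quantile of the same heavy-tailed sequence.
for k in (10**3, 10**4, 10**5, 10**6):
    prefix = sizes[:k]
    print(k, prefix.mean(), np.quantile(prefix, 0.9))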
> 
> Cheers,
> -Polly
> 
> > 
> > > This might not be sufficient in theory, but intuitively it is at least necessary.
> > > And I hope this helps.
> > > 
> > > -Polly
> > > 
> > > On Tue, 3 Apr 2001, Soo-hyeong Lee wrote:
> > > 
> > > > 
> > > > Hello,
> > > > 
> > > > Could you please tell me how long I should run a simulation to obtain a sufficiently confident result when using a self-similar traffic trace?
> > > > I want to show the performance of a scheme in the 'general' case, which consists of a mixture of busy periods and silent periods.
> > > > However, a self-similar traffic trace can have very long busy periods and very long silent periods with non-negligible probability. Any fixed simulation time can then be entirely filled with either a busy period or a silent period with non-negligible probability.
> > > > Is there any recommendation on simulation time (or on how many independent simulations should be run) to yield something like a 90% confidence interval?
> > > > 
> > > > Thanks and regards.
> > > > 
> > > > Soo-hyeong
> > > > 


