[e2e] simulation using self-similar traffic

Wed Apr 4 03:06:59 PDT 2001

On Wed, 4 Apr 2001, Soo-hyeong Lee wrote:

> 
> Thank you very much for kind reply. However,
> 
> > According to Crovella and Lipsky's chapter in 'Self-similar Network
> > Traffic and Performance Evaluation' edited by Park and Willinger, if
> > you're measuring average throughput and using heavy-tailed file sizes 
> > with a small alpha, say 1.2, you'll need to simulate 10^12 samples to 
> > get 2-digit accuracy. 
> 
> Crovella and Lipsky's paper is about getting average by taking
> independent samples from a heavy-tailed random variable, and doesn't
> mention anything about relation between throughput and file size. It
> doesn't take into consideration the long-range dependence either.
> 
> The situation I am interested in has more to do with long-range dependence
> among traffic loads at different time instants, or with high variability in
> the durations of the ON and OFF periods.
> The workload I want to simulate is a kind of ON-OFF source whose OFF
> period follows a Pareto distribution and whose ON period is spent
> transferring a file whose size is Pareto-distributed. This is a simplified
> version of what is proposed in Barford's paper (Barford, Paul; Crovella,
> Mark. Generating Representative Web Workloads for Network and Server
> Performance Evaluation, In Proceedings of ACM SIGMETRICS '98.).

Soo-hyeong,

The heavy-tailed random variable Crovella and Lipsky were talking about
corresponds to the Pareto-distributed ON period or file size you are
interested in. This is also what I meant by heavy-tailed file sizes
(assuming you're already using a SURGE-like model to generate traffic...)
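
To make the workload under discussion concrete, here is a minimal sketch of
one ON-OFF source with Pareto OFF periods and Pareto file sizes. It is not
taken from SURGE or from any of the papers mentioned in this thread. The
function names and every parameter value except alpha = 1.2 for file sizes
(min_size, alpha_off, min_off, seed) are assumptions chosen for illustration.

```python
import random


def pareto(rng, alpha, x_min):
    """Draw one Pareto(alpha) sample with minimum value x_min."""
    # random.paretovariate returns a Pareto sample with minimum 1,
    # so scaling by x_min shifts the minimum to x_min.
    return x_min * rng.paretovariate(alpha)


def on_off_source(n_cycles, alpha_size=1.2, min_size=1000.0,
                  alpha_off=1.5, min_off=1.0, seed=1):
    """Yield (off_seconds, file_size_bytes) for each ON-OFF cycle.

    Assumed values: min_size, alpha_off, min_off and seed are illustrative.
    The ON duration is not generated here; it is whatever time the
    simulator needs to transfer file_size_bytes.
    """
    rng = random.Random(seed)
    for _ in range(n_cycles):
        off_seconds = pareto(rng, alpha_off, min_off)
        file_size_bytes = pareto(rng, alpha_size, min_size)
        yield off_seconds, file_size_bytes


if __name__ == "__main__":
    for off_seconds, size in on_off_source(5):
        print(f"idle {off_seconds:8.2f} s, then transfer {size:12.0f} bytes")
```

The sketch only produces the inputs to the simulator. The length of each ON
period comes out of the simulated transfer itself, which is why file size
and throughput are tied together.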

> I hope that there may be some research, because this type of traffic
> trace is what is dominating the real network.
> 

Some of us do generate aggregated Internet-like traffic with Poisson 
or Pareto arrivals and Pareto file sizes.   See, for example:

'Dynamics of IP traffic: A study of the role of variability and the 
impact of control'
Anja Feldmann; Anna C. Gilbert; Polly Huang; Walter Willinger 
In Proceedings of SIGCOMM '99, Cambridge, Massachusetts, Sept 1999 

I believe others on the list can point to more work that generates
long-range dependent traffic in a similar fashion.

> 
> 
> > For metrics like 90% quantile of the throughput
> 
> Do you mean the 90th-percentile value of the random variable named 'average
> throughput measured in small interval', below which most (90%) of the
> small-interval-average-throughput values lie?
> 
> > 1. plot the 90% quantile of the file sizes for different numbers of
> >    samples (using MATLAB or Splus, see the attached ps for example)
> 
> I am afraid that you seem to switch arbitrarily between throughput and
> file size.
> Could you explain why the 90th percentile of the file size is related to
> the 90th percentile of throughput?

Perhaps I wasn't clear in the previous email.  Assuming these are files
transferred over TCP (which is the case) and that you're interested in
measuring 'average TCP throughput', I expect one would need to generate at
least enough samples for the average file size to converge. Otherwise, I
can't see why the average throughput would converge.

As for the 90% quantile of throughputs, I meant the value below which 90%
of all the measured connection throughputs fall.  By the same definition,
the median is the 50% quantile.

> 
> > 2. identify where the value converges
> > 3. simulate at minimum that amount of samples
> 
> On what grounds do you think the 90th percentile converges much
> faster than the average?

That's because the 90% quantile isn't affected by the magnitude of those
extremely large (but rare) samples, which contribute significantly to the
infinite variance and the slow convergence of the average.
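
One rough way to see this for yourself, in place of the MATLAB/Splus plot,
is the sketch below. It draws Pareto samples with alpha = 1.2 and reports
the sample mean and the sample 90% quantile at increasing sample counts,
next to the closed-form values for a Pareto distribution. The function
names, x_min = 1, the sample counts and the seed are assumptions for
illustration, and the nearest-rank quantile is just one simple estimator.

```python
import random


def sample_quantile(sorted_xs, q):
    """Nearest-rank q-quantile of an already sorted list."""
    idx = min(len(sorted_xs) - 1, int(q * len(sorted_xs)))
    return sorted_xs[idx]


def convergence_table(alpha=1.2, x_min=1.0,
                      sizes=(10**3, 10**4, 10**5), seed=1):
    """Return (n, sample mean, sample 90% quantile) for each n in sizes.

    sizes must be increasing; the same sample stream is extended each time.
    """
    rng = random.Random(seed)
    xs, rows = [], []
    for n in sizes:
        while len(xs) < n:
            xs.append(x_min * rng.paretovariate(alpha))
        rows.append((n, sum(xs) / n, sample_quantile(sorted(xs), 0.9)))
    return rows


def true_values(alpha=1.2, x_min=1.0):
    """Closed-form mean (needs alpha > 1) and 90% quantile of a Pareto."""
    mean = alpha * x_min / (alpha - 1.0)
    q90 = x_min * (1.0 - 0.9) ** (-1.0 / alpha)
    return mean, q90


if __name__ == "__main__":
    mean, q90 = true_values()
    print(f"closed form: mean={mean:.3f}  90% quantile={q90:.3f}")
    for n, m, q in convergence_table():
        print(f"n={n:>7}  sample mean={m:8.3f}  sample 90% quantile={q:8.3f}")
```

Rerunning with different seeds is the quickest check: the quantile column
should settle near its closed-form value, while the mean column can still
jump whenever a single very large sample lands in the stream.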

Cheers,
-Polly

> 
> > This might not be sufficient theoretically but necessary intuitively.
> > And I hope this helps.
> > 
> > -Polly
> > 
> > On Tue, 3 Apr 2001, Soo-hyeong Lee wrote:
> > 
> > > 
> > > Hello,
> > > 
> > > Could you please tell me how long I should run a simulation to obtain a sufficiently confident result when using a self-similar traffic trace?
> > > I want to show the performance of a scheme in the 'general' case, which consists of a mixture of busy periods and silent periods.
> > > However, a self-similar traffic trace can have very long busy periods and very long silent periods with non-negligible probability. Any fixed simulation time can then be entirely filled with either a busy period or a silent period with non-negligible probability.
> > > Is there any recommendation on simulation time (or on how many independent simulations should be run) to yield something like a 90% confidence interval?
> > > 
> > > Thanks and regards.
> > > 
> > > Soo-hyeong
> > > 
> > > 
> > > 
> > 
> > 
> > 
>