From l.wood at surrey.ac.uk Mon Jul 2 05:23:01 2012 From: l.wood at surrey.ac.uk (l.wood@surrey.ac.uk) Date: Mon, 2 Jul 2012 13:23:01 +0100 Subject: [e2e] "Busty" traffic? Message-ID: I have come across a few references to "busty" traffic. I am trying to figure out if this is a particular statistical pattern, or just an increasingly common corruption of bursty. Anyone? Either way, the jokes write themselves. Lloyd Wood http://sat-net.com/L.Wood/ From anoop at alumni.duke.edu Tue Jul 3 12:42:10 2012 From: anoop at alumni.duke.edu (Anoop Ghanwani) Date: Tue, 3 Jul 2012 12:42:10 -0700 Subject: [e2e] "Busty" traffic? In-Reply-To: References: Message-ID: It refers to bursty traffic in the presence of errors. :) Interestingly, my email editor thinks bursty is a typo, but busty is not, so this "terminology" may well be a result of automatic spelling correction tools. I think we have to start getting used to typos as folks depend more and more on editing tools rather than a second pair of eyes to catch errors. Anoop On Mon, Jul 2, 2012 at 5:23 AM, wrote: > > I have come across a few references to "busty" traffic. I am trying to figure out if this is a particular statistical pattern, or just an increasingly common corruption of bursty. Anyone? > > Either way, the jokes write themselves. > > Lloyd Wood > http://sat-net.com/L.Wood/ > > From changliu.aus at gmail.com Tue Jul 10 05:05:46 2012 From: changliu.aus at gmail.com (Chang Liu) Date: Tue, 10 Jul 2012 22:05:46 +1000 Subject: [e2e] Final CFP: CGC2012 (Cloud and Green Computing) and SCA2012 (Social Computing and its Applications) In-Reply-To: References: Message-ID: Final CFP: CGC2012 (Cloud and Green Computing) and SCA2012 (Social Computing and its Applications) Joint Call for Papers: CGC2012 - 2012 International Conference on Cloud and Green Computing, 1-3 Nov. 2012, Xiangtan, China. Website: http://kpnm.hnust.cn/confs/cgc2012/ SCA2012 - 2012 International Conference on Social Computing and Its Applications, 1-3 Nov. 2012, Xiangtan, China. Website: (http://kpnm.hnust.cn/confs/sca2012) Key dates: Submission Deadline: July 13, 2012 (firm) Authors Notification: August 10, 2012 Final Manuscript Due: August 25, 2012 Registration Due: August 30, 2012 Proceedings Publication: Proceedings will be published by IEEE CS Press (EI index). Special issues: CGC2012 (Cloud and Green Computing): Distinguished papers will be selected for special issues in Concurrency and Computation: Practice and Experience; Future Generation Computer Systems; International Journal of High Performance Computing Applications, or Computing (Springer). SCA2012 (Social Computing and its Applications): Selected papers will be published in international SCI-indexed high quality journals: World Wide Web Journal; The Computer Journal; or Journal of Systems and Software. ---------------------------------------------------------------------------------------------------- CGC2012 - 2012 International Conference on Cloud and Green Computing ---------------------------------------------------------------------------------------------------- Topics (not limited to): * Fundamentals of cloud computing * Architectural cloud models * Programming cloud models * Provisioning/pricing cloud models * Volume, Velocity and Variety of Big Data on Cloud * Resource scheduling and SLA for Big Data on Cloud * Storage and computation management of Big Data on Cloud * Large-scale scientific workflow in support of Big Data processing on Cloud *
Big Data mining and analytics * Multiple source data processing and integration on Cloud * Visualisation of Big Data on Cloud * MapReduce for Big Data processing * Distributed file storage of Big Data on Cloud * Data storage and computation in cloud computing * Resource and large-scale job scheduling in cloud computing * Security, privacy, trust, risk in cloud computing * Fault tolerance and reliability in cloud computing * Access control to cloud computing * Resource virtualisation * Monitoring and auditing in cloud * Scalable and elastic cloud services * Social computing and impacts on the cloud * Innovative HCI and touch-screen models and technologies to cloud * Mobile commerce, handheld commerce and e-markets on cloud * Intelligent/agent-based cloud computing * Migration of business applications to cloud * Cloud use case studies * Fundamentals of green computing * Energy aware software, hardware and middleware * Energy efficient IT architecture * Energy efficient resource scheduling and optimisation * Energy efficient clustering and computing * Large-scale energy aware data storage and computation * Energy aware control, monitoring and HCI design * Energy efficient networking and operation * Energy efficient design of VLSI and micro-architecture * Intelligent energy management * Green data centers * Energy aware resource usage and consumption * Smart power grid and virtual power stations * Energy policy, social behaviour and government management * Teleworking, tele-conferences and virtual meeting * Low power electronics and energy recycling * Green computing case studies * Energy efficient Internet of Things * Energy efficient cloud architecture * Energy aware data storage and computation in cloud computing * Energy aware scheduling, monitoring, auditing in cloud * Case studies of green cloud computing. Submission Guidelines Submissions must include an abstract, keywords, the e-mail address of the corresponding author and should not exceed 8 pages for main conference, including tables and figures in IEEE CS format. The template files for LATEX or WORD can be downloaded here. All paper submissions must represent original and unpublished work. Each submission will be peer reviewed by at least three program committee members. Submission of a paper should be regarded as an undertaking that, should the paper be accepted, at least one of the authors will register for the conference and present the work. Submit your paper(s) in PDF file at the CGC2012 submission site: https://www.easychair.org/conferences/?conf=cgc2012. Authors of accepted papers, or at least one of them, are requested to register and present their work at the conference, otherwise their papers may be removed from the digital libraries of IEEE CS and EI after the conference. 
Honorary Chairs Ramamohanarao Kotagiri, The University of Melbourne, Australia Jack Dongarra, University of Tennessee, USA Deyi Li, Chinese Academy of Engineering, China General Chairs Ivan Stojmenovic, University of Ottawa, Canada Albert Zomaya, The University of Sydney, Australia Hai Jin, Huazhong University of Science and Technology, China General Vice-Chairs Geoffrey Fox, Indiana University, USA Schahram Dustdar, Vienna University of Technology, Austria Laurence Yang, St Francis Xavier University, Canada Program Chairs Jinjun Chen, University of Technology, Sydney, Australia Peter Brezany, University of Vienna, Austria Jianxun Liu, Hunan University of Science and Technology, China Program Vice-Chairs Ivona Brandic, Vienna University of Technology, Austria, Yang Yu, Sun Yat-Sen University, China Ching-Hsien (Robert) Hsu, Chung Hua University, Taiwan --------------------------------------------------------------------------------------------------------------- SCA2012 - 2012 International Conference on Social Computing and Its Applications --------------------------------------------------------------------------------------------------------------- Topics (not limited to): * Fundamentals of social computing * Modelling of social behaviour * Social network analysis and mining * Computational models of social simulation * Web 2.0 and semantic web * Innovative HCI and touch-screen models * Modelling of social conventions and social contexts * Social cognition and social intelligence * Social media analytics and intelligence * Group formation and evolution * Security, privacy, trust, risk and cryptography in social contexts * Social system design and architectures * Information retrieval, data mining, artificial intelligence and agent-based technology * Group interaction, collaboration, representation and profiling * Handheld/mobile social computing * Service science and service oriented interaction design * Cultural patterns and representation * Emotional intelligence, opinion representation, influence process * Mobile commerce, handheld commerce and e-markets * Connected e-health in social networks * Social policy and government management * Social blog, micro-blog, public blog, internet forum * Business social software systems * Impact on people's activities in complex and dynamic environments * Collaborative filtering, mining and prediction * Social computing applications and case studies Submission Guidelines Submissions must include an abstract, keywords, the e-mail address of the corresponding author and should not exceed 8 pages for main conference, including tables and figures in IEEE CS format. The template files for LATEX or WORD can be downloaded here. All paper submissions must represent original and unpublished work. Each submission will be peer reviewed by at least three program committee members. Submission of a paper should be regarded as an undertaking that, should the paper be accepted, at least one of the authors will register for the conference and present the work. Submit your paper(s) in PDF file at the SCA2012 submission site: https://www.easychair.org/conferences/?conf=sca2012. Authors of accepted papers, or at least one of them, are requested to register and present their work at the conference, otherwise their papers may be removed from the digital libraries of IEEE CS and EI after the conference. 
Honorary General Chairs Jiawei Han, University of Illinois at Urbana-Champaign, USA Kyu-Young Whang, Korea Advanced Institute of Science and Technology, Korea General Chairs Irwin King, the Chinese University of Hong Kong, Hong Kong Wolfgang Nejdl, L3S, Germany Feiyue Wang, Chinese Academy of Sciences, China General Vice Chairs V.S. Subrahmanian, University of Maryland, USA Jiming Liu, Hong Kong Baptist University, China Jinho Kim, Kangwon National University, Korea Program Chairs Aoying Zhou, East China Normal University, China Guandong Xu, Victoria University, Australia Nitin Agarwal, University of Arkansas at Little Rock, USA Program Vice-Chairs Tim Butcher, Royal Melbourne Institute of Technology, Australia Akiyo Nadamoto, Konan University, Japan Tiejian Luo, Graduate University of the Chinese Academy of Sciences, China Xiaoqing (Frank) Liu, Missouri University of Science and Technology, USA -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20120710/2ae180e7/attachment.html From mbeck at eecs.utk.edu Mon Jul 16 10:46:16 2012 From: mbeck at eecs.utk.edu (Micah Beck) Date: Mon, 16 Jul 2012 13:46:16 -0400 Subject: [e2e] Achieving Scalability in Digital Preservation (yes, this is an e2e topic) Message-ID: <2283B746-AFA2-4A3C-821D-31B5E5C5782D@eecs.utk.edu> Submitted for your consideration: Process A sends data over a channel which is prone to bit corruption. Process B receives the data, but there is no possibility for B to communicate with A to request retransmission. B must work with what it has received. Examples of such one-way communication scenarios include highly asynchronous communication (eg Delay Tolerant Networking), or multicast situations where there are too many receivers for the sender to handle retransmission requests from all of them. The scenario that I am dealing with has some similarities to these examples: it is Digital Preservation. In this scenario A "sends" data by writing it to a storage archive and B "receives" it long after (100 years is a good target), when no communication with A is possible. The "channel" is the storage archive, which is not a simple disk but in fact a process that involves a number of different storage media deployed over time and mechanisms for "forwarding" (migrating) between them. One end-to-end approach is to use forward error correction, but that can be inefficient, and in any case it will always have limits to the error rate it can overcome. Let us assume that the receiver will in some cases still have to deal with unrecoverable information loss. Another solution is to use hop-by-hop error correction in the "channel" (archive), and that is in fact the approach taken by conventional Digital Preservation systems. Constant checksum calculation and application of error correction algorithms are used as "anti-entropy" measures. The issue with this is scalability: we have huge amounts of data to store over long periods of time. Furthermore, the cost and complexity of the solution is a major issue, since we need to maintain even data whose value we are unsure of through unpredictable periods of austerity or of hostility to the particular content being preserved. Think for example about NASA's need to archive all of the data coming from Earth-observing satellites essentially forever in order to be able to study climate change over time. 
Now consider how one would fund such preservation at some future time when rapacious oil companies control the government's purse strings - use your imagination! One interpretation of end-to-end tells us that in order to improve the scalability of our solution, we should do less in the channel, let corruption go uncorrected, and move the work of overcoming faults closer to the endpoint. In the case of video streaming without retransmission, this means using the structure of the video stream: detect corrupt frames and apply interpolation to repair (but not fully correct) the damage. The adequacy of this approach is application-dependent, and it definitely has its limits, but it may be necessary in order to achieve scale. Applying this last approach to Digital Preservation tells us that if we need to preserve data at scale we should let the bits in the archive rot, perhaps focusing on media and mechanisms with "good" failure modes rather than applying complex mechanisms to overcome "bad" failure modes. Then we would need to focus on end-to-end protocols (between the writer A and the ultimate reader B) that would operate at a higher layer and be resilient to such bit rot by using the structure of the application and the data. Not to put too fine a point on it, this analysis has so far been vigorously rejected by the academic Digital Preservation community. The idea of allowing bit rot and then working to overcome it (an approach I call Loss Tolerant Digital Preservation) is anathema to them. I post the idea on the e2e list to find out whether what's left of the end-to-end community sees it as a valid application of end-to-end analysis to the problem of communication over time. If my argument is flawed, I thought perhaps someone who understands end-to-end could explain why. The feedback I have so far received from the Digital Preservation community has not been very useful. Micah Beck University of Tennessee EECS From touch at isi.edu Mon Jul 16 15:23:15 2012 From: touch at isi.edu (Joe Touch) Date: Mon, 16 Jul 2012 15:23:15 -0700 Subject: [e2e] Achieving Scalability in Digital Preservation (yes, this is an e2e topic) In-Reply-To: <2283B746-AFA2-4A3C-821D-31B5E5C5782D@eecs.utk.edu> References: <2283B746-AFA2-4A3C-821D-31B5E5C5782D@eecs.utk.edu> Message-ID: <50049453.1020200@isi.edu> Hi Micah, Long time no chat ;-) On 7/16/2012 10:46 AM, Micah Beck wrote: > Submitted for your consideration: Picturing Rod Serling now ... > Process A sends data over a channel which is prone to bit > corruption. Process B receives the data, but there is no possibility > for B to communicate with A to request retransmission. B must work > with what it has received. FEC is the only solution. > Examples of such one-way communication scenarios include highly > asynchronous communication (eg Delay Tolerant Networking), or > multicast situations where there are too many receivers for the > sender to handle retransmission requests from all of them. > > The scenario that I am dealing with has some similarities to these > examples: it is Digital Preservation. In this scenario A "sends" data > by writing it to a storage archive and B "receives" it long after > (100 years is a good target), when no communication with A is > possible. The "channel" is the storage archive, which is not a simple > disk but in fact a process that involves a number of different > storage media deployed over time and mechanisms for "forwarding" > (migrating) between them. 
> > One end-to-end approach is to use forward error correction, but that > can be inefficient, and in any case it will always have limits to > the error rate it can overcome. Let us assume that the receiver will > in some cases still have to deal with unrecoverable information > loss. FEC does have limits. FWIW, FEC includes the use of a digital medium itself, since you can do "level restoration" repeatedly (if the 'errors' are less than half a bit of analog value). That's how all archives that have survived over the millennia persist - there is redundancy, either in the "digitization" (encoded as letters, where loss of a small fraction of a letter can be recovered) or in multi-level syntax and semantics (e.g., fixing missing letters of a word or missing words of a sentence). > Another solution is to use hop-by-hop error correction in the > "channel" (archive), and that is in fact the approach taken by > conventional Digital Preservation systems. That corrects errors within the hops, but not across multiple hops. E.g., you can fix the errors within a copy, but when you copy from one medium to another you could easily do an erroneous encoding that destroys the info. Some people who do backups see this - the disk reports errors, intra-disk encoding fixes errors, but a failure of the overall RAID system can render the entire copy invalid. > Constant checksum > calculation and application of error correction algorithms are used as > "anti-entropy" measures. The issue with this is scalability: we have > huge amounts of data to store over long periods of time. Furthermore, > the cost and complexity of the solution is a major issue, since we > need to maintain even data whose value we are unsure of through > unpredictable periods of austerity or of hostility to the particular > content being preserved. Think for example about NASA's need to > archive all of the data coming from Earth-observing satellites > essentially forever in order to be able to study climate change over > time. Now consider how one would fund such preservation at some > future time when rapacious oil companies control the government's > purse strings - use your imagination! > > One interpretation of end-to-end tells us that in order to improve > the scalability of our solution, we should do less in the channel, > let corruption go uncorrected, and move the work of overcoming faults > closer to the endpoint. Scalability depends on your metric - are you concerned with archive size, ongoing restoration maintenance (repeated checking and correcting detected errors), or something else? > In the case of video streaming without > retransmission, this means using the structure of the video stream: > detect corrupt frames and apply interpolation to repair (but not > fully correct) the damage. The adequacy of this approach is > application-dependent, and it definitely has its limits, but it may > be necessary in order to achieve scale. That works only because of the nature of video - that there's very little information in a single bit of a single frame if there are adjacent frames that are valid. It notably does NOT work for digital books - you can't figure out page 83 of Paradise Lost by looking at pages 82 and 84. > Applying this last approach to Digital Preservation tells us > that if we need to preserve data at scale we should let the bits in > the archive rot, The only reason to let things rot is to reduce the overhead of repeated restoration. However, if you let disorder creep in and don't correct it as it accumulates, you can easily end up with an irrecoverable error. 
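To put rough numbers on that trade-off, here's a back-of-the-envelope sketch in Python - the replica count and failure rates are invented, purely to show the shape of the curve. It estimates the chance that an object kept as three replicas is lost over a century, as a function of how often failed replicas are repaired from a surviving copy.

    import random

    def survives(years=100, replicas=3, p_fail=0.02, scrub_every=None):
        # One trial: is at least one replica intact at every point in time?
        alive = replicas
        for year in range(1, years + 1):
            # each surviving replica independently rots this year with probability p_fail
            alive -= sum(random.random() < p_fail for _ in range(alive))
            if alive == 0:
                return False              # every copy rotted before anyone repaired them
            if scrub_every and year % scrub_every == 0:
                alive = replicas          # restore failed replicas from a surviving copy
        return True

    def loss_rate(trials=20000, **kw):
        return sum(not survives(**kw) for _ in range(trials)) / trials

    if __name__ == "__main__":
        for interval in (1, 5, 20, None):    # None = never repair
            print("repair every", interval, "years:", loss_rate(scrub_every=interval))

The longer the repair interval, the more rot accumulates between passes and the likelier it is that every copy is bad at once; the shorter the interval, the more restoration work. That's the whole trade-off.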
> perhaps focusing on media and mechanisms with > "good" failure modes rather than applying complex mechanisms to > overcome "bad" failure modes. Then we would need to focus on > end-to-end protocols (between the writer A and the ultimate reader B) > that would operate at a higher layer and be resilient to such bit rot > by using the structure of the application and the data. That's just multilevel encoding, and still amounts to FEC. How much structure you look at, at what timescale, using what semantics, determines the higher level structure and how efficient your FEC is. However, once compressed, such structure is irrelevant anyway since you'll lose it at the bit level on the recorded media. > Not to put too fine a point on it, this analysis has so far been > vigorously rejected by the academic Digital Preservation community. > The idea of allowing bit rot and then working to overcome it (an > approach I call Loss Tolerant Digital Preservation) is anathema to > them. Then why do they do it all the time? They don't replace books every day; they let them decay *until* they're near the time when they are irrecoverable, then they transfer them to new media. This is no different - it's just a trade-off between maintenance and FEC overhead. > I post the idea on the e2e list to find out whether what's left of > the end-to-end community sees it as a valid application of > end-to-end analysis to the problem of communication over time. If my > argument is flawed, I thought perhaps someone who understands > end-to-end could explain why. The feedback I have so far received > from the Digital Preservation community has not been very useful. I think this is a good example of E2E (for error recovery) - and why HBH is still useful, but cannot replace E2E, as noted above... Joe > > Micah Beck > University of Tennessee EECS > From l.wood at surrey.ac.uk Tue Jul 17 01:21:20 2012 From: l.wood at surrey.ac.uk (l.wood@surrey.ac.uk) Date: Tue, 17 Jul 2012 09:21:20 +0100 Subject: [e2e] Achieving Scalability in Digital Preservation (yes, this is an e2e topic) In-Reply-To: <50049453.1020200@isi.edu> References: <2283B746-AFA2-4A3C-821D-31B5E5C5782D@eecs.utk.edu>, <50049453.1020200@isi.edu> Message-ID: > One interpretation of end-to-end tells us that in order to improve > the scalability of our solution, we should do less in the channel, > let corruption go uncorrected, and move the work of overcoming faults > closer to the endpoint. No, the end-to-end argument says that the work of *rejecting* faults must take place at the endpoint. (It may not be the only place to detect errors and reject them for performance reasons, but it is the place of last resort to catch errors that intermediate checks that boost performance can't catch.) The end-to-end argument is basically an argument of where and how best to implement ARQ. In a tight control loop, intermediate checks along the path do not increase performance, cannot guarantee correctness - the check at the end is always needed - and are redundant. In a longer control loop, intermediate checks can boost performance by decreasing local resend times, reducing overall delay. But in open-loop digital preservation using FEC, you can't use ARQ to communicate back across centuries. If you reject your data as corrupted, what then? You have no control loop, you have no recourse. You need a control loop, but can't introduce it. 
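To see what that means in practice, here is a minimal sketch of open-loop recovery, assuming the archive stored k equal-sized data blocks plus a single XOR parity block (the layout and names are invented for illustration): a reader can rebuild any one missing block on its own, but with two blocks gone there is nothing left to do and nobody left to ask.

    from functools import reduce

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def encode(blocks):
        # blocks: equal-length byte strings; append one XOR parity block
        return list(blocks) + [reduce(xor, blocks)]

    def decode(stored):
        # stored: the encoded blocks as read back, with lost blocks replaced by None
        missing = [i for i, b in enumerate(stored) if b is None]
        if len(missing) > 1:
            raise ValueError("more than one block lost: unrecoverable, no sender to ask")
        if missing:
            stored = list(stored)
            stored[missing[0]] = reduce(xor, (b for b in stored if b is not None))
        return stored[:-1]    # drop the parity block, return the data blocks

Fancier codes (Reed-Solomon, fountain codes) push the cliff further out, but there is always a cliff, and nothing on the far side of it.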
The end-to-end argument isn't applicable to your scenario. Lloyd Wood http://sat-net.com/L.Wood/dtn/ From mbeck at eecs.utk.edu Tue Jul 17 08:50:14 2012 From: mbeck at eecs.utk.edu (Micah Beck) Date: Tue, 17 Jul 2012 11:50:14 -0400 Subject: [e2e] Achieving Scalability in Digital Preservation (yes, this is an e2e topic) In-Reply-To: <50049453.1020200@isi.edu> References: <2283B746-AFA2-4A3C-821D-31B5E5C5782D@eecs.utk.edu> <50049453.1020200@isi.edu> Message-ID: <5AA5962A-27C5-4A7B-9C20-4708DE5E346A@eecs.utk.edu> On Jul 16, 2012, at 6:23 PM, Joe Touch wrote: >> One interpretation of end-to-end tells us that in order to improve >> the scalability of our solution, we should do less in the channel, >> let corruption go uncorrected, and move the work of overcoming faults >> closer to the endpoint. > > Scalability depends on your metric - are you concerned with archive size, ongoing restoration maintenance (repeated checking and correcting detected errors), or something else? I always get tripped up by this point. Perhaps I shouldn't use the term "scalability" which has so many different connotations. What *I* mean by scalability is a solution that can be widely deployed and widely adopted without undue non-linearity in cost and difficulty. One which meets the needs of varied communities and can be implemented on varied technologies and will therefore attract investment and instill confidence in its long-term stability, neutrality and correctness. Call me nostalgic, but the best examples I have to point to are the Unix kernel interface and IP. I realize that such a general answer leaves me open to charges that I can't state my goal clearly, so any attempt to engage in reasoned discourse with me is futile. But I have decided to try anyway, and to ask the E2E community for help. In this context, the obvious aspects of "scalability" that I am attempting to address are scale of data to be preserved (as measured in Zettabytes) and length of time to preserve it (measured in centuries). Also important are the varied environments, both technological and societal, through which preservation must continue. Natural disasters and war are obvious cases, but lack of funding and loss of political support for the preservation effort are others. Correlated failures due to use of the same software or closely related hardware throughout highly distributed systems must also be anticipated. All kinds of low-probability or easily-avoided failures will eventually occur if you wait long enough and don't pay close enough attention to the archive. Eventually the power will go out in a data center containing the only copy of the data you later decide you absolutely need, and no one will have brought it back online for a year, or two years, or ten years. Today, we can deploy IP on a cell phone in the middle of the Sahara desert and interoperate with servers attached to the North American backbone. Today my telephone (Android) and my laptop (OS X) run operating systems whose kernel interfaces are descended from the one that Ken Thompson designed, and which still have a certain interoperable core. Those are designs that *have* scaled. Call it what you will, that kind of design success is my goal when designing hardware or software infrastructure. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20120717/a3342d22/attachment.html From mbeck at eecs.utk.edu Tue Jul 17 10:53:09 2012 From: mbeck at eecs.utk.edu (Micah Beck) Date: Tue, 17 Jul 2012 13:53:09 -0400 Subject: [e2e] Achieving Scalability in Digital Preservation (yes, this is an e2e topic) In-Reply-To: <50049453.1020200@isi.edu> References: <2283B746-AFA2-4A3C-821D-31B5E5C5782D@eecs.utk.edu> <50049453.1020200@isi.edu> Message-ID: <98CDA32D-8FC3-4883-8C7A-A450F00D7B3C@eecs.utk.edu> On Jul 16, 2012, at 6:23 PM, Joe Touch wrote: > > In the case of video streaming without >> retransmission, this means using the structure of the video stream: >> detect corrupt frames and apply interpolation to repair (but not >> fully correct) the damage. The adequacy of this approach is >> application-dependent, and it definitely has its limits, but it may >> be necessary in order to achieve scale. > > That works only because of the nature of video - that there's very little information in a single bit of a single frame if there are adjacent frames that are valid. It notably does NOT work for digital books - you can't figure out page 83 of Paradise Lost by looking at pages 82 and 84. You are correct that nearly undetectable recovery in the face of reasonable levels of actual information loss is fairly rare in application protocols and file formats. However, that may partly be due to the fact that it is not commonly required. Video is a good fit to this idea because 1) there is a high degree of redundancy between adjacent frames, even after compression is applied, and 2) some important streaming applications do not allow for retransmission. That doesn't mean that the concept is not applicable elsewhere. One can see the importance of recovery at the application level when considering the converse: applications that fail when confronted by even trivial levels of information loss. Take for example large data file formats that are extremely vulnerable to any corruption in headers or other embedded metadata. Are there cases in which a PDF interpreter will fail due to the corruption of a small number of bits? If I had a PDF of Paradise Lost with bit corruption on page 83, would I prefer that someone had built a PDF reader that was highly robust, and could at least properly display the remaining pages in the face of such corruption? Or had designed the file so that complete loss of whole pages was less likely, spreading likely patterns of information loss out over multiple pages? Could a variety of applications be made much more resilient to bit errors if attention was paid to these issues? >> Applying this last approach to Digital Preservation tells us >> that if we need to preserve data at scale we should let the bits in >> the archive rot, > > The only reason to let things rot is to reduce the overhead of repeated restoration. However, if you let disorder creep in and don't correct it as it accumulates, you can easily end up with an irrecoverable error. Overhead is not the only reason to avoid repeated restoration. The process of "scrubbing" a disk, reading each file and checking for checksum errors, is itself a source of danger to the data. With medium capacity increasing faster than access bandwidth, scrubbing may become a constant activity. Moving a read/write head over the surface of a disk increases wear on the drive motor and arm actuators. 
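Concretely, the scrub pass I have in mind is roughly the following sketch - the manifest layout and paths are assumptions, not any particular archive's interface. Every object is re-read and re-hashed against digests recorded at ingest, over and over, for as long as the archive exists.

    import hashlib, json, pathlib

    def sha256_of(path, chunk=1 << 20):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while True:
                block = f.read(chunk)
                if not block:
                    break
                h.update(block)
        return h.hexdigest()

    def scrub(root, manifest_path):
        # manifest: {"relative/path": "hex sha-256"} recorded when the data was archived
        manifest = json.loads(pathlib.Path(manifest_path).read_text())
        damaged = []
        for rel, want in manifest.items():
            p = pathlib.Path(root) / rel
            if not p.exists() or sha256_of(p) != want:
                damaged.append(rel)
        return damaged    # detection only; whether and how to repair is a separate decision

Every pass reads every byte, and as capacities outgrow access bandwidth the passes never really end.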
Then of course there is the problem that "correcting" errors to the disk runs the risk of introducing new errors due to the processing required, just as routers can introduce errors while forwarding (I seem to remember that this was a fundamental argument for E2E in IP networking). If information loss is to be avoided, it is probably better to increase redundancy by writing a new block somewhere else rather than trying to "correct" an error by updating a block. Any write to a sector creates a danger to neighboring sectors due to possible head alignment errors. Sometimes doing less accomplishes more. >> perhaps focusing on media and mechanisms with >> "good" failure modes rather than applying complex mechanisms to >> overcome "bad" failure modes. Then we would need to focus on >> end-to-end protocols (between the writer A and the ultimate reader B) >> that would operate at a higher layer and be resilient to such bit rot >> by using the structure of the application and the data. > > That's just multilevel encoding, and still amounts to FEC. How much structure you look at, at what timescale, using what semantics, determines the higher level structure and how efficient your FEC is. However, once compressed, such structure is irrelevant anyway since you'll lose it at the bit level on the recorded media. OK, I agree that conceptually what I'm talking about is FEC at the higher layer. However, it's not generally seen this way in the Digital Preservation community. If I store a Rembrandt poorly and then reinterpret the damaged parts by drawing on the canvas with a crayon, that would be considered a cultural crime, not an application of FEC at a high semantic level. Some kinds of high-level redundancy may not be lost due to bit-level compression. And there's the question of whether compression is a good idea at all in the context of Digital Preservation. >> Not to put too fine a point on it, this analysis has so far been >> vigorously rejected by the academic Digital Preservation community. >> The idea of allowing bit rot and then working to overcome it (an >> approach I call Loss Tolerant Digital Preservation) is anathema to >> them. > > Then why do they do it all the time? They don't replace books every day; they let them decay *until* they're near the time when they are irrecoverable, then they transfer them to new media. This is no different - it's just a trade-off between maintenance and FEC overhead. The question is not why they do it, but why they object to it in the digital case. I am probably the wrong person to ask since I am a newcomer to that community, but I can tell you a bit of what I've heard, been told and surmised. An important difference between digital and non-digital objects in the context of preservation is that digital objects are interpreted by a program, and that program has invariants (or assumptions about its inputs). The interpreters used for day-to-day work with digital objects generally make strong assumptions regarding non-corruption of their inputs. Modern networks and file systems are considered reliable enough to make this acceptable. Sure, an ebook reader might be considered higher quality if it can tolerate a bit of corruption in its input file, but hardly anyone eschews the use of PDF because readers are so brittle. Admitting the possibility of corruption means weakening the assumptions made in that software, which makes the software harder to write and means that commercial off-the-shelf readers may not suffice. That's difficult enough and expensive enough to possibly put it out of reach of the Digital Preservation community. 
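To be fair, for a format designed with loss in mind the weakened assumption can be cheap. Here is a sketch of a made-up fixed-size-record layout (the record size and field order are purely illustrative) whose reader skips a damaged record and keeps going instead of aborting:

    import struct, zlib

    RECORD = 1024                 # hypothetical fixed-size records keep damage local
    LEN = struct.Struct("<I")     # payload length, then payload, then CRC-32; rest is padding

    def read_records(path):
        good, dropped = [], 0
        with open(path, "rb") as f:
            while True:
                rec = f.read(RECORD)
                if not rec:
                    break
                if len(rec) < RECORD:
                    dropped += 1          # truncated tail
                    break
                (n,) = LEN.unpack_from(rec, 0)
                if n > RECORD - LEN.size - 4:
                    dropped += 1          # the length field itself is damaged; skip the record
                    continue
                payload = rec[LEN.size:LEN.size + n]
                (crc,) = struct.unpack_from("<I", rec, LEN.size + n)
                if zlib.crc32(payload) == crc:
                    good.append(payload)
                else:
                    dropped += 1          # damaged record: note it and keep reading
        return good, dropped

A naively length-prefixed stream, by contrast, can lose everything after the first damaged length field - which is exactly the brittleness at issue.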
Many in the Digital Preservation community come from a non-technical background (libraries with books in them) and there is in fact a great lack of understanding of what's actually going on. There is, for example, a common confusion between the fact that one cannot necessarily control the lifetime of data on the Internet at the application layer and the idea that it therefore "lives forever." Bits are thought to be much more resilient than non-digital information when their implementation is in fact typically quite ephemeral, and only an active process of continual checking, correcting and copying results in resilience. I overheard someone at a Library of Congress meeting referring to maps that had been "digitized and therefore preserved." Making reference to Dr. Reed's point about the ultimate goal of preservation: I believe it is to preserve knowledge. Libraries have "reduced" (in the formal sense) the problem to that of preserving objects which bear knowledge. However, it's not clear that the objects which society has chosen in which to encode its knowledge (eg digital objects) are a good choice for purposes of long-term preservation. This has been a problem for libraries in the past: paperback books fall apart; acetate degrades. Some important knowledge has been lost to the choice of technology in the past (many films in archives will decompose before they can be copied). Facing this fact may result in some painful conclusions, such as that more work has to be done to make digital objects resilient to bit rot, and that we may have to choose between preserving at scale (in terms of byte count) or preserving with minimal bit errors. From touch at isi.edu Wed Jul 18 08:20:12 2012 From: touch at isi.edu (Joe Touch) Date: Wed, 18 Jul 2012 08:20:12 -0700 Subject: [e2e] Achieving Scalability in Digital Preservation (yes, this is an e2e topic) In-Reply-To: <5AA5962A-27C5-4A7B-9C20-4708DE5E346A@eecs.utk.edu> References: <2283B746-AFA2-4A3C-821D-31B5E5C5782D@eecs.utk.edu> <50049453.1020200@isi.edu> <5AA5962A-27C5-4A7B-9C20-4708DE5E346A@eecs.utk.edu> Message-ID: <82F1D032-3E88-4C72-B297-8A32DB5D7593@isi.edu> On Jul 17, 2012, at 8:50 AM, Micah Beck wrote: > > On Jul 16, 2012, at 6:23 PM, Joe Touch wrote: > >>> One interpretation of end-to-end tells us that in order to improve >>> the scalability of our solution, we should do less in the channel, >>> let corruption go uncorrected, and move the work of overcoming faults >>> closer to the endpoint. >> >> Scalability depends on your metric - are you concerned with archive size, ongoing restoration maintenance (repeated checking and correcting detected errors), or something else? > > I always get tripped up by this point. Perhaps I shouldn't use the term "scalability" which has so many different connotations. > > What *I* mean by scalability is a solution that can be widely deployed and widely adopted without undue non-linearity in cost and difficulty.... Archive solutions don't necessarily benefit from widescale adoption. If my archive uses one solution for data maintenance, and yours uses another, there's no inherent benefit besides reuse of the solution, unless merging archives is an issue. For many of your other concerns, this may not be the best list, e.g., archive duration, durability, etc. For some of your concerns, even archivists aren't always thinking in those terms - some approach that of Danny Hillis (see http://longnow.org/). 
> Today, we can deploy IP on a cell phone in the middle of the Sahara desert and interoperate with servers attached to the North American backbone. Today my telephone (Android) and my laptop (OS X) run operating systems whose kernel interfaces are descended from the one that Ken Thompson designed, and which still have a certain interoperable core. Those are designs that *have* scaled. Call it what you will, that kind of design success is my goal when designing hardware or software infrastructure. IP is designed as a common interoperation layer. OS interfaces similarly define interoperation. What is the interoperation goal here? Joe From mbeck at eecs.utk.edu Wed Jul 18 12:15:23 2012 From: mbeck at eecs.utk.edu (Micah Beck) Date: Wed, 18 Jul 2012 15:15:23 -0400 Subject: [e2e] Achieving Scalability in Digital Preservation (yes, this is an e2e topic) In-Reply-To: <82F1D032-3E88-4C72-B297-8A32DB5D7593@isi.edu> References: <2283B746-AFA2-4A3C-821D-31B5E5C5782D@eecs.utk.edu> <50049453.1020200@isi.edu> <5AA5962A-27C5-4A7B-9C20-4708DE5E346A@eecs.utk.edu> <82F1D032-3E88-4C72-B297-8A32DB5D7593@isi.edu> Message-ID: Hey Joe, On Jul 18, 2012, at 11:20 AM, Joe Touch wrote: > Archive solutions don't necessarily benefit from widescale adoption. If my archive uses one solution for data maintenance, and yours uses another, there's no inherent benefit besides reuse of the solution, unless merging archives is an issue. The first thing to see is that Digital Preservation requires interoperability between the present and future. The data that is stored today must be readable and interpretable a long time from now. Readability is more than a matter of survival of the storage medium. As those who work in forensics point out, the mechanical and electronic mechanisms that connect the storage medium to an operational computer system must be maintained. If you have a file system stored in some old tape format, can you find an operational drive, and does it connect to an I/O bus on any system that can still execute instructions, and do you have a version of the OS with compatible drivers that understands the format with which the files were written out? Is there an application that can interpret the files? If one gets around these issues by assuming that there will always be timely copying of data to new media and formats, then it is necessary to accept a higher risk of losing the data altogether due to violation of these assumptions. Since the writer does not know the identity of the reader, the set of potential readers must be taken into account. These are spread throughout time, and could amount to a large community. It is also important to take into account that data stored in an archive does not necessarily stay in the same archive over its entire lifespan. One important aspect of Digital Preservation is what is called "federation" of archives, meaning that the data in one archive may be accessed through another archive, or that data from one archive may be moved to another archive. These scenarios are particularly important when the funding or institutional will to maintain the data is diminished. In these cases, interoperability between archives can be important. There's also the issue of vendor lock-in, which is again a matter of interoperability over time. If an archive uses interfaces that are non-standard, then when it comes time to replace its systems it may be locked in to one vendor. 
Unless the vendor stops supporting the products used by that archive, in which case it may be necessary to migrate its entire collection to some other vendor's interfaces and representations. That can impose a high cost, or even spell the end of a particular archive's operations. All of these issues become more important as the scale of data being preserved increases, and as the duration increases. The costs and limitations of non-interoperability increase with the scale of data. And the probability of circumstances placing especially stringent limitations or obliterating the resources available at any one archive increases with the duration of preservation. > For many of your other concerns, this may not be the best list, e.g., archive duration, durability, etc. For some of your concerns, even archivists aren't always thinking in those terms - some approach that of Danny Hillis (see http://longnow.org/). > >> Today, we can deploy IP on a cell phone in the middle of the Sahara desert and interoperate with servers attached to the North American backbone. Today my telephone (Android) and my laptop (OS X) run operating systems whose kernel interfaces are descended from the one that Ken Thompson designed, and which still have a certain interoperable core. Those are designs that *have* scaled. Call it what you will, that kind of design success is my goal when designing hardware or software infrastructure. > > IP is designed as a common interoperation layer. OS interfaces similarly define interoperation. What is the interoperation goal here? Communities of interoperability include 1) storers of data, since they need to have a choice of where they put things; 2) maintainers of data, since they need to have a choice of tools and resources to use over time; 3) readers of data, since they need to be able to easily extract and use data from a number of different archives. These parallel the communities of interoperability that are important for wide area networking. To quote Dan Hillis (albeit selectively) from his 1982 paper Why Computer Science Is No Good, "... memory locations ... are just wires turned sideways in time." /micah From touch at isi.edu Fri Jul 20 13:31:52 2012 From: touch at isi.edu (Joe Touch) Date: Fri, 20 Jul 2012 13:31:52 -0700 Subject: [e2e] Achieving Scalability in Digital Preservation (yes, this is an e2e topic) In-Reply-To: References: <2283B746-AFA2-4A3C-821D-31B5E5C5782D@eecs.utk.edu> <50049453.1020200@isi.edu> <5AA5962A-27C5-4A7B-9C20-4708DE5E346A@eecs.utk.edu> <82F1D032-3E88-4C72-B297-8A32DB5D7593@isi.edu> Message-ID: <5009C038.1050304@isi.edu> Hi, Micah, On 7/18/2012 12:15 PM, Micah Beck wrote: > Hey Joe, > > On Jul 18, 2012, at 11:20 AM, Joe Touch wrote: > >> Archive solutions don't necessarily benefit from widescale >> adoption. If my archive uses one solution for data maintenance, and >> yours uses another, there's no inherent benefit besides reuse of >> the solution, unless merging archives is an issue. > > The first thing to see is that Digital Preservation requires > interoperability between the present and future. The data that is > stored today must be readable and interpretable a long time from > now. > > Readability is more than a matter of survival of the storage medium. > As those who work in forensics point out, the mechanical and > electronic mechanisms that connect the storage medium to an > operational computer system must be maintained. ... Sure - but in E2E-speak, that's the protocol. 
We don't often talk about making sure both ends run correct copies of the protocol; that's assumed. If you introduce that into the equation, you've got several levels of persistent information: A- the data B- the protocol to access/retrieve it C- the semantic mapping of data symbols to meaning to interpret what you retrieve However, we assume everything except the data is persistent, or just treat other levels of this issue as data to another layer. Ultimately, nothing in a comm system can exchange data unless the protocol (B) and the semantics of the data (C) are known and agreed in advance. In a sense, {B,C} is the prerequisite information that allows the exchange of {A} either between endpoints in space or over time. > It is also important to take into account that data stored in an > archive does not necessarily stay in the same archive over its > entire lifespan. One important aspect of Digital Preservation is what > is called "federation" of archives, meaning that the data in one > archive may be accessed through another archive, or that data from > one archive may be moved to another archive. These scenarios are > particularly important when the funding or institutional will of one > institution to maintain the data is diminished. In these cases, > interoperability between archives can be important. That can be achieved two ways: 1) the adoption of a uniform standard which we did in the Internet, but might not be feasible or even desirable for archives. And note that the Internet isn't the sole means of communication either. 2) proper documentation of a variety of standards Neither of these are E2E issues; they are at the core of the Internet argument, but not the E2E argument. > There's also the issue of vendor lock-in, which is again a matter of > interoperability over time. ... Yes, that's a temporal version of the Internet issue, but not the E2E one, AFAICT. > All of these issues become more important as the scale of data being > preserved increases, and as the duration increases. The costs and > limitations of non-interoperability increase with the scale of data. > And the probability of circumstances placing especially stringent > limitations or obliterating the resources available at any one > archive increases with the duration of preservation. > >> For many of your other concerns, this many not be the best list, >> e.g., archive duration, durability, etc. For some of your concerns, even >> archivists aren't always thinking in those terms - some approach that of >> Danny Hillis (see http://longnow.org/). >> >>> Today, we can deploy IP on a cell phone in the middle of the >>> Sahara dessert and interoperate with servers attached to the >>> North American backbone. Today my telephone (Android) and my >>> laptop (OS X) run operating systems whose kernel interfaces are >>> descended from the one that Ken Thompson designed, and which >>> still have a certain interoperable core. Those are designs that >>> *have* scaled. Call it what you will, that kind of design success >>> is my goal when designing hardware or software infrastructure. >> >> IP is designed as a common interoperation layer. OS interfaces >> similarly define interoperation. What is the interoperation goal here? 
> > Communities of interoperability include > > 1) storers of data, since they need to have a choice of where they > put things; > > 2) maintainers of data, since they need to have a choice of tools and > resources to use over time > > 3) readers of data, since they need to be able to easily extract and > use data from a number of different archives AOK - then you have an interoperability issue, but not an E2E one. Joe From artur.lugmayr at tut.fi Mon Jul 23 17:02:12 2012 From: artur.lugmayr at tut.fi (artur.lugmayr@tut.fi) Date: Tue, 24 Jul 2012 03:02:12 +0300 (EEST) Subject: [e2e] 2 WEEKS LEFT - 5th Nokia Ubimedia MindTrek Award Competition - 6.000 Euros Award Sum Message-ID: <3669284.225.1343088132318.JavaMail.lugmayr@HLO45-TC> --------------------------------------------------------------------------------- The 6th Nokia Ubimedia MindTrek Awards Competition - NUMA2012 EXTENDED DEADLINE 6th AUGUST on: http://www.numa.fi Call for the best UBIMEDIA, PERVASIVE, AMBIENT MEDIA ... products, services, applications, concepts... Check out previous years' entries on: http://www.ambientmediaassociation.org The total AWARD SUM: 6.000 Euros --------------------------------------------------------------------------------- NUMA2012 seeks novel ways to combine ubiquitous computing with media. We are looking for disruptive artistic visions as well as clever near-to-market solutions off the beaten track! This includes any range of innovative ubimedia, pervasive, or ambient products and services. The Nokia Ubimedia MindTrek Awards is a highly interdisciplinary competition and we invite Designers, Computer Scientists, Artists, Economists and Engineers to take a stand on one of the following questions with their entries: // What is the aesthetic experience opened up by the rise of ubiquitous and ambient media? // What constitutes the specific intelligence that drives future media environments? // How will location- and context-aware media services change our social life? // What will our future lives look like in the era of ubiquitous computation? // How can society as a whole benefit from these advanced technologies? Valid competition entries include: // Pervasive and ubiquitous games // Ambient installations // Artistic works related to ubiquitous media and computation // Business models and management strategies // Ambient and ubiquitous media technology // Ubiquitous and ambient media services, devices, and environments // Context aware, sensing, and interfaces for ubiquitous computation // Ergonomics, human-computer interaction designs, and product prototypes // Software, hardware and middleware framework demonstrations // Ambient television // Any other inspiring work in the broad context of ubiquitous media All ubiquitous, pervasive, or ambient media products, services or prototypes which have been finalized during the previous year after 1st January 2011 are eligible to take part. More information on: www.numa.fi Competition Chairs - Artur LUGMAYR, EMMi Lab, Tampere Univ. of Technology, FINLAND - Cai MELAKOSKI, Degree Programme in Media, Tampere Univ. of Applied Sciences, FINLAND - Ville LUOTONEN, Ubiquitous Computing Tampere Center of Expertise, Hermia Ltd., FINLAND Honorary Chair - Timothy MERRITT, Aarhus School of Architecture, DENMARK Head of Jury - Björn STOCKLEBEN, Project "Cross Media", University of Applied Sciences Magdeburg-Stendal, GERMANY --- The Nokia Ubimedia MindTrek Awards 2012 (NUMA2012) is a competition category of the MindTrek Conference 2012. 
The category is organized collaboratively by MindTrek, Tampere Region Centre of Expertise in Ubiquitous Computing, Entertainment & Media Management Lab. (EMMi Lab.)/Tampere University of Technology, the Tampere University of Applied Sciences, Nokia Oyj and the Ambient Media Association (AMEA). NUMA2012 is funded by Nokia Oyj and the Tampere Region Centre of Expertise in Ubiquitous Computing.