[e2e] Achieving Scalability in Digital Preservation (yes, this is an e2e topic)

Joe Touch touch at isi.edu
Fri Jul 20 13:31:52 PDT 2012

Hi, Micah,

On 7/18/2012 12:15 PM, Micah Beck wrote:
> Hey Joe,
> On Jul 18, 2012, at 11:20 AM, Joe Touch wrote:
>> Archive solutions don't necessarily benefit from widescale
>> adoption. If my archive uses one solution for data maintenance, and
>> yours uses another, there's no inherent benefit besides reuse of
>> the solution, unless merging archives is an issue.
> The first thing to see is that Digital Preservation requires
> interoperability between the present and future. The data that is
> stored today must be readable and interpretable a long time from
> now.
> Readability is more than a matter of survival of the storage medium.
> As those who work in forensics point out, the mechanical and
> electronic mechanisms that connect the storage medium to an
> operational computer system must be maintained.

Sure - but in E2E-speak, that's the protocol. We don't often talk about 
making sure both ends run correct copies of the protocol; that's 
assumed. If you introduce that into the equation, you've got several 
levels of persistent information:

A- the data

B- the protocol to access/retrieve it

C- the semantic mapping of data symbols to meaning to interpret what you 

However, we assume everything except the data is persistent, or just 
treat other levels of this issue as data to another layer. Ultimately, 
nothing in a comm system can exchange data unless the protocol (B) and 
the semantics of the data (C) are known and agreed in advance. In a 
sense, {B,C} is the prerequisite information that allows the exchange of 
{A} either between endpoints in space or over time.
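
To make the {A}/{B,C} split concrete, here is a minimal sketch in Python. 
The record layout, the field names, and the interpret() helper are all 
hypothetical, made up purely for illustration; the point is only that 
the same stored bytes yield nothing useful until both the access format 
(B) and the meaning of the symbols (C) are supplied from outside.

```python
import struct

# A: the stored data. On its own this is just an opaque byte string.
stored = struct.pack(">Hf", 2012, 13.5)

# B: a hypothetical access format, i.e. how to split the bytes into symbols.
# Here: big-endian, one unsigned 16-bit integer followed by one 32-bit float.
PROTOCOL = ">Hf"

# C: a hypothetical semantic mapping, i.e. what each symbol means.
SEMANTICS = ("year", "temperature_celsius")


def interpret(data, protocol=None, semantics=None):
    """Return as much meaning as the supplied prerequisites allow."""
    if protocol is None:
        # Without B there are only raw bytes.
        return data
    symbols = struct.unpack(protocol, data)
    if semantics is None:
        # With B but without C there are symbols, but no meaning.
        return symbols
    # With both B and C the symbols become an interpretable record.
    return dict(zip(semantics, symbols))


print(interpret(stored))                       # raw bytes only
print(interpret(stored, PROTOCOL))             # symbols without meaning
print(interpret(stored, PROTOCOL, SEMANTICS))  # interpretable record
```

Whether the reader is across the network now or at the same archive in 
fifty years, PROTOCOL and SEMANTICS have to reach it by some other path 
than the data itself.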

> It is also important to take into account that data stored in an
> archive does not necessarily stay in the same archive over its
> entire lifespan. One important aspect of Digital Preservation is what
> is called "federation" of archives, meaning that the data in one
> archive may be accessed through another archive, or that data from
> one archive may be moved to another archive. These scenarios are
> particularly important when the funding or institutional will of one
> institution to maintain the data is diminished. In these cases,
> interoperability between archives can be important.

That can be achieved two ways:

1) the adoption of a uniform standard
	which we did in the Internet, but might not be feasible
	or even desirable for archives. And note that the Internet
	isn't the sole means of communication either.

2) proper documentation of a variety of standards

Neither of these is an E2E issue; they are at the core of the Internet 
argument, but not the E2E argument.

> There's also the issue of vendor lock-in, which is again a matter of
> interoperability over time.

Yes, that's a temporal version of the Internet issue, but not the E2E 
one, AFAICT.

> All of these issues become more important as the scale of data being
> preserved increases, and as the duration increases. The costs and
> limitations of non-interoperability increase with the scale of data.
> And the probability of circumstances placing especially stringent
> limitations on, or obliterating, the resources available at any one
> archive increases with the duration of preservation.
>> For many of your other concerns, this may not be the best list,
>> e.g., archive duration, durability, etc. For some of your concerns, even
>> archivists aren't always thinking in those terms - some approach that of
>> Danny Hillis (see http://longnow.org/).
>>> Today, we can deploy IP on a cell phone in the middle of the
>>> Sahara desert and interoperate with servers attached to the
>>> North American backbone. Today my telephone (Android) and my
>>> laptop (OS X) run operating systems whose kernel interfaces are
>>> descended from the one that Ken Thompson designed, and which
>>> still have a certain interoperable core. Those are designs that
>>> *have* scaled. Call it what you will, that kind of design success
>>> is my goal when designing hardware or software infrastructure.
>> IP is designed as a common interoperation layer. OS interfaces
>> similarly define interoperation. What is the interoperation goal here?
> Communities of interoperability include
> 1) storers of data, since they need to have a choice of where they
> put things;
> 2) maintainers of data, since they need to have a choice of tools and
> resources to use over time;
> 3) readers of data, since they need to be able to easily extract and
> use data from a number of different archives.

AOK - then you have an interoperability issue, but not an E2E one.

