[e2e] network coding and spam and anonymous email...

Thu Jul 6 07:30:07 PDT 2006

On Thu, 2006-07-06 at 10:54 +0100, Jon Crowcroft wrote:
> one could _code_ legitimate messages simply as a set of references
> to spam - the nice thing about this is that there is so much spam that
> it acts as fairly uniform random cover traffic

Awesome idea -- change the role of spam from background noise to a
carrier signal!

A first-stab feasibility analysis is promising. Based on my corpus of
spam and e2e mails, spam has a feature set (think unique words) of
roughly 300K, while e2e has only 30K. Unfortunately, they have only 7.5K
words in common, so a simple identity mapping may not be sufficient, but
one can easily construct a dictionary that maps the basis vectors for
spam onto the basis vectors for e2e. With that mapping, a legitimate
email can become a linear combination of the spam messages.
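
The vocabulary comparison above could be reproduced roughly as follows.
This is only a sketch: the two tiny corpora are made-up stand-ins for the
real ones, the names (vocabulary, spam_corpus, e2e_corpus) are
hypothetical, and splitting on whitespace is an assumed tokenizer rather
than the one actually used for the numbers quoted.

```python
def vocabulary(messages):
    """Return the set of unique lowercase tokens across all messages."""
    vocab = set()
    for text in messages:
        # Assumption: whitespace-separated tokens are the "features".
        vocab.update(text.lower().split())
    return vocab

# Toy stand-ins for the real corpora (hypothetical data).
spam_corpus = ["free software please", "viagra free please"]
e2e_corpus = ["tcp but internet", "there is free tcp"]

spam_vocab = vocabulary(spam_corpus)
e2e_vocab = vocabulary(e2e_corpus)
shared = spam_vocab & e2e_vocab  # words usable without any remapping

print(len(spam_vocab), len(e2e_vocab), len(shared))
```

Words outside the shared set are the ones that need the dictionary.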

If the mapping is based on the most frequent words, here is what it
might look like.

For e2e (words in decreasing frequency of use)
-----------
1. tcp    (makes sense)
2. but    (apparently we disagree a lot)
3. internet
4. there
5. e2e    (duh)

For spam [*]
---------
1. software  (makes sense; targeted spam)
2. please    (apparently they are more persuasive)
3. $69.95
4. free
5. viagra    (no comment)
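
A rank-for-rank mapping like the one above, and the coding of a message
as references into spam, might be sketched as below. Everything here is
an assumption for illustration: the function names, the toy corpora, and
the choice of a (spam message index, word position) pair as the
"reference". It also assumes every word of the message appears in the
e2e corpus used to build the mapping.

```python
from collections import Counter

def ranked_words(messages):
    """Words sorted by decreasing frequency (ties broken alphabetically)."""
    counts = Counter(w for m in messages for w in m.lower().split())
    return [w for w, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

def build_mapping(e2e_msgs, spam_msgs):
    """Map the k-th most frequent e2e word to the k-th most frequent spam word."""
    return dict(zip(ranked_words(e2e_msgs), ranked_words(spam_msgs)))

def encode(message, mapping, spam_msgs):
    """Replace each word by a (spam index, word position) reference."""
    refs = []
    for word in message.lower().split():
        spam_word = mapping[word]  # KeyError if the word is unmapped
        for i, spam in enumerate(spam_msgs):
            tokens = spam.lower().split()
            if spam_word in tokens:
                refs.append((i, tokens.index(spam_word)))
                break
    return refs

def decode(refs, mapping, spam_msgs):
    """Follow each reference into the spam and invert the mapping."""
    inverse = {s: e for e, s in mapping.items()}
    return " ".join(inverse[spam_msgs[i].lower().split()[j]] for i, j in refs)

# Toy corpora (hypothetical data).
e2e_msgs = ["tcp but tcp", "tcp but internet"]
spam_msgs = ["software please software", "software please free"]

mapping = build_mapping(e2e_msgs, spam_msgs)
refs = encode("but tcp internet", mapping, spam_msgs)
print(refs)
print(decode(refs, mapping, spam_msgs))
```

Both ends need the same spam messages and the same mapping for the
references to decode.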

-- 
Saikat

* it took me a while to cherry-pick my spam corpus for the desired effect