[e2e] network coding and spam and anonymous email...
saikat at cs.cornell.edu
Thu Jul 6 07:30:07 PDT 2006
On Thu, 2006-07-06 at 10:54 +0100, Jon Crowcroft wrote:
> one could _code_ legitimate messages simply as a set of references
> to spam - the nice thing about this is that there is so much spam that
> it acts as fairly uniform random cover traffic
Awesome idea -- change the role of spam from background noise to a
A first-stab feasibility analysis is promising. Based on my corpus of
spam and e2e mails, spam has a feature set (think unique words) of roughly
300K while e2e has only 30K. Unfortunately, they have only 7.5K words in
common, so a simple mapping may not be sufficient, but one can easily
construct a dictionary that maps the basis vectors for spam onto the
basis vectors for e2e. With that mapping, a legitimate email can become a
linear combination of the spam messages.
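To make the "set of references" idea concrete, here is a minimal sketch. It reads "linear combination" loosely as a list of (spam message, word position) pointers. The function names, the toy corpus, and the three-word dictionary are all made up for illustration and are not taken from my actual corpus.

```python
# Hypothetical toy data: two spam messages and a hand-made dictionary.
spam_corpus = ["please buy software", "cheap viagra please"]
e2e_to_spam = {"tcp": "software", "but": "please", "e2e": "viagra"}
spam_to_e2e = {spam: e2e for e2e, spam in e2e_to_spam.items()}


def encode(message, e2e_to_spam, spam_corpus):
    """Encode each word as a (spam message index, word position) reference."""
    # Remember the first place each spam word occurs in the corpus.
    index = {}
    for msg_id, spam in enumerate(spam_corpus):
        for pos, word in enumerate(spam.split()):
            index.setdefault(word, (msg_id, pos))
    # Translate every e2e word to its spam word, then to a reference.
    return [index[e2e_to_spam[word]] for word in message.split()]


def decode(refs, spam_to_e2e, spam_corpus):
    """Follow each reference into the spam corpus and map the word back."""
    words = (spam_corpus[msg_id].split()[pos] for msg_id, pos in refs)
    return " ".join(spam_to_e2e[word] for word in words)


refs = encode("tcp but e2e", e2e_to_spam, spam_corpus)
print(refs)
print(decode(refs, spam_to_e2e, spam_corpus))
```

Both ends need the same spam corpus and the same dictionary for this to work, and a word missing from the dictionary raises a KeyError in this sketch.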
If the mapping is based on the most frequent words, here is what it
might look like.
For e2e (words in decreasing frequency of use)
1. tcp (makes sense)
2. but (apparently we disagree a lot)
5. e2e (duh)
For spam [*]
1. software (makes sense; targeted spam)
2. please (apparently they are more persuasive)
5. viagra (no comment)
* it took me a while to cherry-pick my spam corpus for the desired effect
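A frequency-ranked dictionary of this kind could be built roughly as follows. This is only a sketch under assumptions: whitespace tokenization, lowercase folding, and alphabetical tie-breaking are my choices here, and the two one-line corpora are invented so that the result matches the ranks listed above.

```python
from collections import Counter


def rank_words(corpus):
    """Return the unique words of a corpus, most frequent first."""
    counts = Counter(word for doc in corpus for word in doc.lower().split())
    # Ties are broken alphabetically so the ranking is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ordered]


def rank_mapping(e2e_corpus, spam_corpus):
    """Map the k-th most frequent e2e word to the k-th most frequent spam word."""
    # zip stops at the shorter vocabulary, so with more spam words than
    # e2e words every e2e word still gets its own distinct spam word.
    return dict(zip(rank_words(e2e_corpus), rank_words(spam_corpus)))


# Hypothetical toy corpora, not real data.
e2e_corpus = ["tcp tcp tcp but but e2e"]
spam_corpus = ["software software software software please please please viagra viagra cheap"]

mapping = rank_mapping(e2e_corpus, spam_corpus)
print(mapping)
```

Because the spam vocabulary is the larger one, the leftover spam words (here "cheap") simply go unused.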