[e2e] latest spate of cruft postings to e2e

Vernon Schryver vjs at calcite.rhyolite.com
Thu Nov 6 21:09:30 PST 2003


> To: Vernon Schryver <vjs at calcite.rhyolite.com>, end2end-interest at postel.org
> From: "David P. Reed" <dpreed at reed.com>

> <html>
> <body>
> At 04:40 PM 11/6/2003, Vernon Schryver wrote:<br>
> <blockquote type=cite class=cite cite>&nbsp; - keyword and other scoring
> filters including so called &quot;Bayesian&quot;<br>
> &nbsp;&nbsp; systems<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Except for some individuals and for them
> only some of the time,<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; these have non-trivial false positive
> rates.</blockquote><br>
> I use an excellent open-source Bayesian filter, called POPFile (see
> sourceforge).&nbsp;&nbsp; It's long-term accuracy for classifying
> messages into 25 personally invented buckets (including e2e messages) is
> displayed as follows:
> <dl>
> <dd><h2><b>Classification Accuracy</b></h2>
> <dd>Messages classified: 75,001
> <dd>Classification errors: 301<hr>
>
> <dd>Accuracy: 99.59%<br>
>
> <dd>Bucket&nbsp;&nbsp; Classification Count False Positives False
> Negatives
> <dd>...
> <dd><font color="#FF0000">spam</font>&nbsp;&nbsp;
> <x-tab>&nbsp;</x-tab>51,953 (69.26%)
> <x-tab>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</x-tab>205
> <x-tab>&nbsp;&nbsp;&nbsp;&nbsp;</x-tab><x-tab>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</x-tab>81
> <dd>...
> </dl>It takes me about 15 seconds to scan a folder of 100 messages that
> are classed as spam to detect these false positives, and the false
> negatives are of course less of a problem.&nbsp;&nbsp; A 0.2% false
> positive rate is quite reasonable.&nbsp; Note that I have deliberately
> resisted using POPFile's whitelist capability - I ONLY use the Bayesian
> learning filter.<br><br>
> The advantage, of course, is that what I consider to be spam is a purely
> personal decision, which is Joe Touch's point - it's a very bad idea to
> impose a notion like &quot;solicitation&quot; as a criterion for
> rejecting stuff.&nbsp;&nbsp;&nbsp; Email is by definition unsolicited, in
> almost all instances.&nbsp;&nbsp; The Nobel Prize phone call is equally
> unsolicited.&nbsp;&nbsp; Perhaps you don't want to get it, but I'd prefer
> to have the choice to be given my Nobel, thank you.<br><br>
> &nbsp;</body>
> </html>
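
The percentages in the quoted POPFile summary can be rechecked from its raw counts. This is a minimal Python sketch. It assumes that the spam bucket's "False Positives" column counts legitimate messages wrongly filed as spam; the message does not say so. The constant and function names are hypothetical.

```python
# Hedged sketch: recompute the percentages in the quoted POPFile summary.
# Assumption: the spam bucket's "False Positives" column counts legitimate
# messages wrongly filed as spam; that reading is not confirmed by the post.

TOTAL = 75_001          # "Messages classified"
ERRORS = 301            # "Classification errors"
SPAM_COUNT = 51_953     # spam bucket "Classification Count"
SPAM_FALSE_POS = 205    # spam bucket "False Positives"


def pct_truncated(part: int, whole: int) -> str:
    """Return part/whole as a percentage truncated (not rounded) to two decimals.

    Integer arithmetic avoids floating-point surprises at the cut-off.
    """
    basis_points = (10_000 * part) // whole  # hundredths of a percent
    return f"{basis_points // 100}.{basis_points % 100:02d}%"


if __name__ == "__main__":
    print("accuracy:          ", pct_truncated(TOTAL - ERRORS, TOTAL))
    print("spam share:        ", pct_truncated(SPAM_COUNT, TOTAL))
    print("FP / all messages: ", pct_truncated(SPAM_FALSE_POS, TOTAL))
    print("FP / spam bucket:  ", pct_truncated(SPAM_FALSE_POS, SPAM_COUNT))
```

Truncating reproduces the displayed 99.59% and 69.26%. Rounding would give 99.60% and 69.27%, which suggests (but does not prove) that the display truncates. The 205 false positives come to about 0.27% of all messages, or about 0.39% of the messages filed as spam, depending on the denominator.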

As far as I can tell from my manual decryption of that missive, it:

 - misrepresents my position.
    
 - makes the incredible claim that all mail is unsolicited.
    Some mail is unsolicited, but a lot of mail is solicited by any
    common definition of the word.

 - No informed person believes that, for most people, all unsolicited
    mail is spam.  That notion is generally the domain of kooky spammer
    fighters.  The consensus definition of spam is unsolicited bulk email.

 - of course spam is a personal matter.  That's the point, and a major
    implication, of "solicited" in the consensus definition.
    The "person" involved here is the charter of the list, as represented
    by the human running the list.  I've tried to unsubscribe from
    this list because Joe's personal notion of spam differs from mine.

 - I do not believe claims of a 0.2% false positive rate, or of anything
    less than a few percent over the long haul, for a Bayesian filter.
    I've investigated more than one or two such claims about Bayesian
    filters.  They
    have all turned out to carry caveats like "but of course that was
    after I trained it for 3 months and doesn't count the mail I look
    at to update the training."  Any mail your filter requires you to
    examine, no matter in which "bin" or "folder," is either not really
    filtered or must be counted as false positives or negatives.

 - False positive rates of less than 0.1% are humanly impossible to
    achieve by manual examination for spam loads above a gross or two
    spam per day, even if you spend 1.5 minutes per 100 spam instead of
    15 seconds.  Unless your job consists entirely of reading your spam
    load, you will miss some legitimate mail among hundreds of spam per day.

 - Some of those Nobel messages are unsolicited, but the majority are
    in fact solicited in the ordinary English meaning of the word.  If
    you have any hope of hearing from the King of Norway, and if you
    really think the messages would be substantially identical to a lot
    of other messages, then you ought to do some whitelisting to document
    your preference/solicitation even if you use a Bayesian system.  It
    is humanly impossible to filter a Nobel invitation from among 100
    messages in only 1.5 seconds/message with better than 99% accuracy.

 - That missive above misstates the situation with this mailing list.
   There's no telling what messages Joe Touch's filters are rejecting,
   but he has made clear that he is rejecting some.  The spam that
   passes his filters is much less than you would expect given the
   wide distribution of the submission address.  Joe has probably
   already caused you to miss hearing from whichever Nobel Committee
   tried to contact you via the E2E list.  (That's sarcasm.)  (Saying
   it's sarcasm is intended to be insulting.)

 - HTML mail is the single biggest enabler of spam on the net.  Everyone
   who sends HTML mail to strangers should lose the privilege of
   sending mail to strangers for one day for each unjustified HTML
   mail message sent to a stranger.  Everyone responsible for making
   HTML the default configuration of an MUA should be forced to receive
   only spam for the rest of the decade.
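
The review-time figures in the bullets above are easy to make concrete. This rough Python sketch converts "seconds per 100 messages" into seconds per message and multiplies by a daily load of one or two gross (144 or 288) spam. The paces and loads are the ones quoted in the thread. The function names are hypothetical, and `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Hedged sketch of the review-time arithmetic discussed in the thread.
# Function names are hypothetical; Fraction avoids floating-point rounding.

GROSS = 144  # "a gross or two spam/day" -> 144 or 288 messages


def seconds_per_message(seconds: Fraction, messages: int) -> Fraction:
    """Convert a review pace such as '15 seconds per 100 messages'."""
    return Fraction(seconds) / messages


def daily_review_seconds(load: int, pace: Fraction) -> Fraction:
    """Seconds per day spent eyeballing `load` messages at `pace` s/message."""
    return load * pace


if __name__ == "__main__":
    paces = {
        "15 s per 100": seconds_per_message(Fraction(15), 100),
        "1.5 min per 100": seconds_per_message(Fraction(90), 100),
        "1.5 s per message": Fraction(3, 2),
    }
    for label, pace in paces.items():
        for load in (GROSS, 2 * GROSS):
            secs = daily_review_seconds(load, pace)
            print(f"{label:>18}: {float(pace):.2f} s/msg, "
                  f"{load} spam/day -> {float(secs) / 60:.1f} min/day")
```

Note that 1.5 minutes per 100 messages is 0.9 s per message, which is somewhat faster than the 1.5 s per message used in the Nobel example.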

As I said, I've unsubscribed from the list.  I've no interest in
receiving uninformed and unthinking commentary on spam.  I've been
playing the anti-spam game since before Spamford.  My anti-spam system
handled more than 63 million mail messages in the 24 hours ending
midnight GMT.  That this list continues to distribute so much spam
while other equally open mailing lists don't says all that needs to
be said about Joe's understanding of spam.  That
David Reed claims all mail is unsolicited implies at best that he and
I don't share a common language.

I also have no interest in attempts to elevate what was an excellent
engineering insight into a religion, complete with an elder priesthood
that makes inscrutable oracular pronouncements on what the dogma really
means, pronouncements that would have done the priestesses of Delphi proud.


Vernon Schryver    vjs at rhyolite.com


P.S.  If Joe would do what's reasonable and very common, namely
  manually examine and filter submissions from non-subscribers,
  only he and David Reed would see this diatribe.



