Gary's Longer Rantstag:typepad.com,2003:weblog-111452004-02-05T16:56:22-05:00TypePadInstructions for Training to Exhaustiontag:typepad.com,2003:post-8402522004-02-05T16:56:22-05:002004-02-05T16:56:22-05:00TRAINING TO EXHAUSTION WITH A PRE-CLASSIFIED EMAIL DATABASE Here's how the training set is handled. Start with an empty word probability database. Consecutively classify each email in the training set. Switch between ham and spam back and forth. If an...Gary Robinson
<div xmlns="http://www.w3.org/1999/xhtml"><p><b>TRAINING TO EXHAUSTION WITH A PRE-CLASSIFIED EMAIL DATABASE</b></p>
<p>Here's how the training set is handled.</p>
<p>Start with an empty word probability database.</p>
<p>Classify each email in the training set in sequence, alternating between ham and spam.</p>
<p>If an email is classified correctly, ignore it.</p>
<p>When a misclassified email is encountered, update the probability database with that email, then continue through the remaining emails in sequence using the updated database until the last email is reached.</p>
<p>Go back to the first email in the training set and start again.</p>
<p>The above is repeated until either: a) the very last email is properly classified, or b) there is one complete iteration through the system with no improvement in accuracy.</p>
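<p>A minimal Python sketch of the loop described above. The <code>WordDB</code> class, its word-count scoring rule, the function names, and the <code>max_passes</code> safety limit are illustrative assumptions, not the filter's actual implementation. The stopping test is one reading of conditions a) and b): stop after a pass with no misclassifications, or after a pass that makes no fewer errors than the previous one.</p>

```python
from collections import Counter
from itertools import zip_longest

CUTOFF = 0.5  # assumed cutoff; a score above it is treated as spam


class WordDB:
    """Toy word-count database; the scoring rule is an illustrative assumption."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def train(self, words, is_spam):
        # Count each distinct word of the message once for its class.
        (self.spam if is_spam else self.ham).update(set(words))

    def score(self, words):
        # Average of smoothed per-word spam probabilities; 0.5 means no evidence.
        probs = [
            (self.spam[w] + 0.5) / (self.spam[w] + self.ham[w] + 1.0)
            for w in set(words)
        ]
        return sum(probs) / len(probs) if probs else 0.5


def interleave(hams, spams):
    """Alternate ham and spam messages, yielding (words, is_spam) pairs."""
    for ham, spam in zip_longest(hams, spams):
        if ham is not None:
            yield ham, False
        if spam is not None:
            yield spam, True


def train_to_exhaustion(hams, spams, max_passes=100):
    db = WordDB()  # start with an empty database; it is never recreated
    messages = list(interleave(hams, spams))
    previous_errors = None
    for _ in range(max_passes):
        errors = 0
        for words, is_spam in messages:
            predicted_spam = db.score(words) > CUTOFF
            if predicted_spam != is_spam:
                errors += 1
                db.train(words, is_spam)  # update only on a misclassification
        # Stop on a clean pass, or on a pass that did not improve on the last one.
        if errors == 0 or (previous_errors is not None and errors >= previous_errors):
            break
        previous_errors = errors
    return db
```

<p>Because the database persists across passes, a message that is misclassified in more than one pass is counted more than once, which is the behavior note 2 below describes.</p>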
<p>Notes:</p>
<p>1) The probability database is not recreated during each iteration. Rather, the counts that lead to the probabilities are further refined with each iteration.</p>
<p>2) Some emails are counted more than once. In practice this happens relatively rarely. It can also be avoided, at the cost of possibly leaving some messages wrongly classified at the end.</p>
<p>3) Experiments have shown that it is useful to use a so-called security margin. This is an interval around the cutoff value which is forbidden for all messages. For example, spams may be required to score &gt; .7 and hams &lt; .2. In practice, almost all mails can be made to conform to this (if you don't allow messages to be used more than once, a few might not conform). Then, when .5 is used as the cutoff in testing, messages are more likely to score well away from the cutoff and so are more likely to be classified correctly.</p>
<p>4) In preliminary tests, this procedure has led to very good performance.</p>
<p>5) Using this procedure, we don't need to tune filter parameters such as the ideal spam/ham cutoff point. Rather, the probability database is automatically adjusted to match the cutoff point(s) initially chosen by the software developer (see 3 above).</p>
<p>6) In actual use in an inline spam filter, the process would ideally iterate through the entire database of emails every time there is a misclassification. If resources are limited this process might be restricted to recent messages.</p></div>
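<div xmlns="http://www.w3.org/1999/xhtml"><p>The security margin in note 3 can be sketched as a stricter correctness test during training than during filtering. The .7, .2 and .5 values come from the note; the function and constant names are hypothetical, and whether the boundaries themselves count as conforming is an assumption.</p>

```python
SPAM_MIN = 0.7     # training spam must score above this (example value from note 3)
HAM_MAX = 0.2      # training ham must score below this (example value from note 3)
TEST_CUTOFF = 0.5  # plain cutoff used when filtering unseen mail


def needs_training(score, is_spam):
    """Return True if a training message scores inside the forbidden interval,
    or on the wrong side of it, so the database should be updated with it."""
    if is_spam:
        return score <= SPAM_MIN
    return score >= HAM_MAX


def classify(score):
    """Classify an unseen message as spam (True) or ham (False)."""
    return score > TEST_CUTOFF
```

<p>In a training loop, a check like <code>needs_training</code> would take the place of a simple "was it misclassified?" test, while <code>classify</code> is what the filter applies to new mail.</p></div>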
My Position on Business Method Patentstag:typepad.com,2003:post-4104702003-10-16T11:58:37-04:002003-10-16T11:58:37-04:00This piece discusses my personal position on software patents, using a recent Microsoft patent announcement as the jump-off point. Update: Slashdot already reports solid prior art. "Microsoft has won a patent for an instant messaging feature that notifies users when...Gary Robinson
<div xmlns="http://www.w3.org/1999/xhtml"><p>This piece discusses my personal position on software patents, using a recent Microsoft patent announcement as the jump-off point.</p>
<p><em>Update: <a href="http://slashdot.org/article.pl?sid=03/10/08/135237">Slashdot already reports solid prior art</a>.</em></p>
<p>"Microsoft has won a patent for an instant messaging feature that notifies users when the person they are communicating with is typing a message" [<a href="http://rss.com.com/2100-1028_3-5088150.html?part=rss&tag=feed&subj=news">News.com</a>]</p>
<p>Another stupid, obvious patent. AOL's AIM and Apple's iChat software infringe.</p>
<p>On the other hand, a <a href="http://rss.com.com/2100-1023-978234.html?tag=nl">News.com article from 2002</a> says:</p>
<blockquote><p>America Online has quietly secured a patent that could shake up the competitive landscape for instant messaging software.</p>
<p>The patent (6449344), originally filed in 1997, and granted in September this year, gives AOL instant messaging subsidiary ICQ rights as the inventor of the popular IM Internet application. The patent covers anything resembling a network that lets multiple IM users see when other people are present and then communicate with them.</p></blockquote>
<p>So it looks like AOL can fight back if necessary (I don't know where that would leave Apple and others). <em>Update: <a href="http://slashdot.org/article.pl?sid=03/10/08/135237">Slashdot already reports solid prior art</a>.</em></p>
<p>This is why companies that are responsible to their shareholders have no choice but to get software patents -- so they can fight back if necessary. The problem isn't with companies that get software patents, the problem is the environment. The PTO is so mindless in its inability to distinguish obviousness that way too many software patents are a joke, and do indeed have huge potential to obstruct industry progress. And software patents should be 3 years in duration rather than 20 because software evolves so quickly.</p>
<p>I am in complete agreement with Jeff Bezos on these issues. He's only doing what he has to do to uphold his fiduciary responsibility to his shareholders when Amazon gets patents. Arguably, he could be sued by irate shareholders if he didn't get them if it meant that Amazon didn't have the bargaining chips it needs to deal with patent lawsuits from others. (And it would mean that.)</p>
<p>Bezos argues that the system should be changed, and so do I. But while it exists as it does, he's going to do what it requires. So am I.</p>
<p>For the reasons stated above, plus a desire not to have Microsoft copy our hard-won ideas, Transpose, too, has software patents pending that could be considered to be "business method" patents. My current thinking on the issue, if those patents are granted, is to give free license to implementations which are free or result in only small profits, but retain the freedom to fight back against Microsoft and other large companies. I have begun to explore possible ways to build such licenses into the patents so that they couldn't be revoked.</p>
<p>Software patents suck, but if you're trying to be responsible to people who depend on you to make the right decisions financially, you gotta do what you gotta do.</p>
<p>You may have noticed that while I speak positively of Jeff Bezos above, I have also <a href="http://radio.weblogs.com/0101454/2003/02/26.html#a281">criticized</a> specific Amazon patents.</p>
<p>As I said, Bezos is doing what he gotta do. He has to get those patents when he can to fulfill his responsibilities to his shareholders. That doesn't mean those patents are <em>valid</em>, i.e. enforceable by law. And Bezos knows it. Validity can only be tested in court. And look at who Amazon has sued: Barnes and Noble, which has plenty of legal resources to enable the judge and jury to fairly decide whether the patent is valid. No small companies or open-source projects have been harassed by Amazon over their patents.</p>
<p>The next step would be to do what I have outlined above, and actually provide an automatic free license to small businesses and open-source projects. That would increase Amazon's goodwill in the business and open-source worlds, and thus be good business, without harming Amazon's ability to duke it out with the Barnes and Nobles of the world. In effect, he has done that by not suing such embodiments of ideas patented by Amazon. Unfortunately, he has left the door open so that Amazon can change to a more obnoxious strategy if it wants to in the future, so we can't rest easy.</p>
<p>We'll see if they correct this in the future. I hope so.</p>
<p>I'll report here as my thinking (and the thinking of my company) evolves with respect to these issues.</p></div>
Kazaa, BitTorrent, and Central Server-Based Systemstag:typepad.com,2003:post-4104222003-10-16T11:40:13-04:002003-10-16T11:40:13-04:00"Kazaa backs plan that could spell an end to the days of free music:" The world's most popular song-swapping network, Kazaa, has thrown its weight behind a plan to start billing song swappers for their music downloads. The proposal, which...Gary Robinson
<div xmlns="http://www.w3.org/1999/xhtml"><p>"Kazaa backs plan that could spell an end to the days of free music:"</p>
<blockquote><p>The world's most popular song-swapping network, Kazaa, has thrown its weight behind a plan to start billing song swappers for their music downloads.</p>
<p>The proposal, which could finally end the days of the free lunch for millions of music fans, has been put to big US record labels at the same time as a new legitimate version of the former file-swapping giant Napster is launched in the US.</p>
<p>The idea is to phase in a billing mechanism for peer to peer networks, such as Kazaa and Morpheus, that allow users to copy music directly from each other's hard drives. [<a href="http://www.theage.com.au/articles/2003/10/10/1065676130907.html">The Age</a>]</p></blockquote>
<p>It's hard not to be quite skeptical of such a plan. Peer-to-peer systems have flourished because they were the only way to get easy access to large numbers of music files. There was no legal way to get them, and hugely expensive central server-based systems couldn't take the risk of making illegal files available. </p>
<p>Now that the legal agreements with the major labels are falling into place, central server-based systems can serve them. So the main reason now for peer-to-peer networks to exist is that they enable files to be shared at no cost. Once they start charging, their purpose in life won't be so clear.</p>
<p>At one point during Napster's original heyday, one of the managers at the company looked toward the future and said that the real reason for a system like Napster was <a href="http://pespmc1.vub.ac.be/COLLFILT.html">collaborative filtering</a> -- using the large numbers of people involved in such a system to form implicit communities to make recommendations to each other. For instance, everyone can see what is in everyone else's library. If you find someone whose tastes are like yours, then you may benefit from sampling items in their collection that you haven't heard yet.</p>
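<p>As a rough illustration of that idea, here is a small Python sketch that finds the user whose library overlaps most with yours and suggests what they have that you don't. Jaccard similarity is just one simple overlap measure chosen for the example, not something any of the systems mentioned here is known to use; the user names and track names are made up.</p>

```python
def jaccard(a, b):
    """Overlap between two libraries: shared items divided by all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(me, libraries):
    """Suggest items from the most similar other library that `me` lacks."""
    mine = libraries[me]
    others = {name: lib for name, lib in libraries.items() if name != me}
    if not others:
        return set()
    nearest = max(others, key=lambda name: jaccard(mine, others[name]))
    return others[nearest] - mine


libraries = {  # hypothetical users and track names
    "ann": {"track_a", "track_b", "track_c"},
    "bob": {"track_a", "track_b", "track_d"},
    "cat": {"track_x", "track_y"},
}
print(recommend("ann", libraries))  # {'track_d'}
```

<p>Nothing in the sketch depends on how the files were obtained; it only needs to know what is in each library.</p>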
<p>In the end, this manager was right about the value of collaborative filtering, but confused about what it meant for peer-to-peer networks. Systems like <a href="http://www.audioscrobbler.com/">AudioScrobbler</a> let you see what other people have in their collections, but without caring how you got the files. The files can come from a central server -- it makes no difference where they came from. So the concept that peer-to-peer file sharing is justified by collaborative filtering makes no sense.</p>
<p>In the end, the advantages of stable, ultra-high bandwidth central servers will make peer-to-peer file sharing moot, except for that percentage of the population that continues, for one reason or another, to choose not to pay for music. But that population won't be served by Kazaa under Kazaa's new plan.</p>
<p>There is one niche where legal, peer-to-peer sharing of music files may have a role to play in the future. This is in the sharing of legitimately free music -- music from artists who release some material freely in order to get people to pay for other material, and from artists who choose to make a living from concert ticket sales or who do it as a hobby and don't make money at all.</p>
<p>File sharing mechanisms such as BitTorrent make file sharing both speedy and reliable by sharing the load among a large number of peers. If one computer goes offline while it is serving a file, it doesn't end the transfer. The transfer continues from other computers in the network. That makes peer-to-peer comparable in reliability to central server-based systems. There is still a disadvantage to something like BitTorrent, because each end-user computer shares the load of sending out files. In a central server-based system, the server carries all the load, and so there is less load on end-user computers. So BitTorrent-type systems will only flourish where there is a counterbalancing advantage compared to central server-based systems.</p>
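<p>The load-sharing and failover behavior can be illustrated with a toy Python simulation. This is not the BitTorrent protocol; the peer records, the piece numbering, and the least-loaded selection rule are all assumptions made for the example.</p>

```python
def download(num_pieces, peers):
    """Fetch every piece from any online peer that holds it.

    `peers` maps a peer name to a dict with an `online` flag and a `pieces` set.
    Returns a dict mapping each piece index to the peer that served it.
    """
    served_by = {}
    load = {name: 0 for name in peers}
    for piece in range(num_pieces):
        candidates = [
            name for name, peer in peers.items()
            if peer["online"] and piece in peer["pieces"]
        ]
        if not candidates:
            raise RuntimeError(f"no online peer has piece {piece}")
        # Pick the least-loaded candidate so the work is spread across peers.
        chosen = min(candidates, key=lambda name: load[name])
        load[chosen] += 1
        served_by[piece] = chosen
    return served_by


peers = {  # hypothetical peers; p1 has gone offline mid-transfer
    "p1": {"online": False, "pieces": {0, 1, 2, 3}},
    "p2": {"online": True, "pieces": {0, 1, 2, 3}},
    "p3": {"online": True, "pieces": {0, 1, 2, 3}},
}
print(download(4, peers))
```

<p>With <code>p1</code> offline the transfer still completes, with the pieces split between <code>p2</code> and <code>p3</code>. The same loop also shows the cost: every piece is sent by an end-user machine.</p>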
<p>And indeed there are disadvantages to central server-based systems, even for serving legal files. A central server-based system will need to pay for the hardware and bandwidth to serve potentially huge numbers of files to potentially huge numbers of people. It will need to make a very large amount of money just to pay the overhead, and then it will want to make a profit on top of that. So its needs and the needs of users who just want the best free music are different. Such a system will have to either charge a sizable flat fee to pay for the overhead, or it will inevitably try to convince people to buy more music when they would have been just as happy listening to a higher proportion of free music. Users will, of course, be able to choose to listen to large amounts of free music anyway, but the environment set up by the server-based system -- the content, the way the user interface works, etc. -- won't be particularly friendly to that choice. It will be a little less comfortable for people who are oriented toward free music than such people would like.</p>
<p>A reliable peer-to-peer system (such as one based on BitTorrent) could be set up that would be a perfect match to the needs of people who enjoy and care about free music. It wouldn't have to pay for the overhead of a central server-based system, so it wouldn't need to try to convince people to buy music when they would be just as happy with free music, and it wouldn't need to charge high subscription fees to pay for bandwidth and hardware. </p>
<p>In fact, it is fairly likely that a service will emerge that serves the needs of people who care about free music, and that it will be a peer-to-peer service. Kazaa has the opportunity to fill that role, but it sounds like Kazaa is a little too envious right now of the profits being made by the iTunes Music Store, and that will be made by other such stores, to see its opportunity. That may change in time. </p>
<p>(Of course the above assumes that Kazaa would prefer not to continue to base its business on the sharing of files that aren't legally available for that purpose. The fact that they are making moves toward establishing relationships with the labels is a strong argument that, in fact, they would prefer to make that change.) </p>
<p>A test of the hypothesis laid out above: will we ever see Amazon.com or the iTunes Music Store treat the needs of people who care about free music with as much priority as they treat the needs of paying consumers?</p>
</div>
First Longer Ranttag:typepad.com,2003:post-4104002003-10-16T11:30:25-04:002003-10-16T11:30:25-04:00Except, it isn't actually longer!Gary Robinson