Text Classifiers in Java

Originally, this was going to be a comparison of various text classification algorithms for Fernando Pereira's Machine Learning for Language Processing class. I ended up implementing only a nearest-neighbor classifier, using a tree structure reminiscent of Clarkson's RPO trees.
[edit on 20070906] This is very similar to Kenneth Clarkson's "Nearest-neighbor queries in metric spaces". I didn't realize at the time how similar it was, but a reference to it should have been included in the writeup.

The source tarball is distributed under the GNU Library (a.k.a. "Lesser") General Public License. A somewhat rough description of it is available. The code is documented, but also somewhat rough; I'm hoping to revise it Real Soon Now (tm).


20030104
I've rewritten this in Objective Caml, adding a tree-balancing heuristic which seems to improve the approximation.
[.tar.gz archive]
20061030
Fixed some comments, and reformatted the comments so that ocamldoc would pick them up.
[browse source] [.tar.gz archive]

Josh Burdick / last updated on October 30, 2006