Friday, March 09, 2012

Use WEKA in your Python code

Weka is a collection of machine learning algorithms that can either be applied directly to a dataset or called from your own Java code. There is an article called “Use WEKA in your Java code” which, as its title suggests, explains how to use WEKA from your Java code. That is not surprising, since Weka is implemented in Java. In this post, I will describe how to use WEKA from your Python code instead.

If you have built an entire software system in Python, you might be reluctant to look at libraries in other languages. After all, there are a huge number of excellent Python libraries, and many good machine-learning libraries written in Python, or in C and C++ with Python bindings. However, as far as I am concerned, it would be a pity not to make use of Weka just because it is written in Java. It is one of the best-known machine-learning libraries around and implements a large number of algorithms. What’s more, there are very few data stream mining libraries around, and MOA, which is related to Weka and also written in Java, is the best I have seen.

I use JPype (http://jpype.sourceforge.net/) to access the Weka class libraries. Once you have it installed, download the latest Weka and MOA versions and copy moa.jar, sizeofag.jar and weka.jar into your working directory. Below you can see the full Python listing of the test application. The code initializes the JVM, imports some Weka packages and classes, reads a data set, standardizes and shuffles it, splits it into a training set and a test set, trains a J48 tree classifier and then tests it. If you are familiar with Weka, this will all be very easy.

In a separate post, I will explore how easy it is to use MOA in the same way.
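
One detail about JVM start-up is worth spelling out. If -Djava.class.path is given more than once, only the last value takes effect, so all jars have to be joined into a single class path using the platform's path separator (";" on Windows, ":" elsewhere). A Java agent such as sizeofag.jar is passed with -javaagent:, which is not a -D system property. The following is a minimal sketch of assembling such an option list in plain Python. The helper name build_jvm_options and its parameters are made up for illustration and are not part of JPype or Weka.

```python
import os


def build_jvm_options(jars, max_heap="4G", agent_jar=None):
    """Return a list of JVM option strings (hypothetical helper, not a JPype API)."""
    options = ["-Xmx" + max_heap]
    # All jars go into ONE class path entry, joined by the platform separator.
    options.append("-Djava.class.path=" + os.pathsep.join(jars))
    if agent_jar is not None:
        # Java agents are passed with -javaagent:, not as a -D property.
        options.append("-javaagent:" + agent_jar)
    return options


options = build_jvm_options(["./moa.jar", "./weka.jar"], agent_jar="sizeofag.jar")
print(options)
```

The resulting list can be unpacked into startJVM(getDefaultJVMPath(), *options) in the same way as in the listing.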

# Initialize the specified JVM
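import os
from jpype import *

# All jars must share a single class path entry; a repeated
# -Djava.class.path would override the earlier one.
options = [
    "-Xmx4G",
    "-Djava.class.path=" + os.pathsep.join(["./moa.jar", "./weka.jar"]),
    "-javaagent:sizeofag.jar",  # agents use -javaagent:, not -D
]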
startJVM(getDefaultJVMPath(), *options)

# Import java/weka packages and classes
Trees = JPackage("weka.classifiers.trees")
Filter = JClass("weka.filters.Filter")
Attribute = JPackage("weka.filters.unsupervised.attribute")
Instance = JPackage("weka.filters.unsupervised.instance")
RemovePercentage = JClass("weka.filters.unsupervised.instance.RemovePercentage")
Remove = JClass("weka.filters.unsupervised.attribute.Remove")
Classifier = JClass("weka.classifiers.Classifier")
NaiveBayes = JClass("weka.classifiers.bayes.NaiveBayes")
Evaluation = JClass("weka.classifiers.Evaluation")
FilteredClassifier = JClass("weka.classifiers.meta.FilteredClassifier")
Instances = JClass("weka.core.Instances")
BufferedReader = JClass("java.io.BufferedReader")
FileReader = JClass("java.io.FileReader")
Random = JClass("java.util.Random")


#Reading from an ARFF file
reader = BufferedReader(FileReader("./iris.arff"))
data = Instances(reader)
reader.close()
data.setClassIndex(data.numAttributes() - 1) # setting class attribute

# Standardizes all numeric attributes in the given dataset to have zero mean and unit variance, apart from the class attribute.
standardizeFilter = Attribute.Standardize()
standardizeFilter.setInputFormat(data)
data = Filter.useFilter(data, standardizeFilter)

# Randomly shuffles the order of instances passed through it.
randomizeFilter = Instance.Randomize()
randomizeFilter.setInputFormat(data)
data = Filter.useFilter(data, randomizeFilter)

# Creating train set
removeFilter = RemovePercentage()
removeFilter.setInputFormat(data)
removeFilter.setPercentage(30.0)
removeFilter.setInvertSelection(False)
trainData = Filter.useFilter(data, removeFilter)

# Creating test set
removeFilter.setInputFormat(data)
removeFilter.setPercentage(30.0)
removeFilter.setInvertSelection(True)
testData = Filter.useFilter(data, removeFilter)

# Create classifier
j48 = Trees.J48()
j48.setUnpruned(True) # using an unpruned J48
j48.buildClassifier(trainData)

# Single-argument print calls behave the same under Python 2 and Python 3
print("Number Training Data %d %d" % (trainData.numInstances(), data.numInstances()))
print("Number Test Data %d" % testData.numInstances())

# Test classifier
for i in range(testData.numInstances()):
    inst = testData.instance(i)
    pred = j48.classifyInstance(inst)
    actual = testData.classAttribute().value(int(inst.classValue()))
    predicted = testData.classAttribute().value(int(pred))
    print("ID: %s actual: %s predicted: %s" % (inst.value(0), actual, predicted))

shutdownJVM()
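
The test loop prints one line per instance. To summarize a run as a single number, the same calls can be wrapped in a small accuracy function. This is only a sketch: the function name accuracy is hypothetical, and it relies on nothing beyond the numInstances, instance, classValue and classifyInstance methods already used in the listing. The stub classes exist purely so that the example runs without a JVM; in the real script you would call accuracy(j48, testData) before shutdownJVM().

```python
def accuracy(classifier, dataset):
    """Fraction of instances whose predicted class index equals the actual one."""
    total = dataset.numInstances()
    if total == 0:
        return 0.0
    correct = 0
    for i in range(total):
        inst = dataset.instance(i)
        if int(classifier.classifyInstance(inst)) == int(inst.classValue()):
            correct += 1
    return float(correct) / total


# Stand-ins for Weka objects, only to demonstrate the function without a JVM.
class StubInstance(object):
    def __init__(self, features, label):
        self.features = features
        self.label = label

    def classValue(self):
        return float(self.label)


class StubData(object):
    def __init__(self, rows):
        self.rows = rows

    def numInstances(self):
        return len(self.rows)

    def instance(self, i):
        return self.rows[i]


class ThresholdClassifier(object):
    """Predicts class 1 when the first feature exceeds 0.5, else class 0."""

    def classifyInstance(self, inst):
        return 1.0 if inst.features[0] > 0.5 else 0.0


stub_data = StubData([
    StubInstance([0.9], 1),
    StubInstance([0.1], 0),
    StubInstance([0.7], 0),
    StubInstance([0.2], 0),
])
print(accuracy(ThresholdClassifier(), stub_data))
```

Weka's own Evaluation class, which the listing imports but does not use, provides fuller statistics if you need more than plain accuracy.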


9 comments:

  1. When I import Filter = JClass("weka.filters.Filter"),
    it gives me an error:
    File "C:\Python27\lib\site-packages\jpype\_jclass.py", line 54, in JClass
    raise _RUNTIMEEXCEPTION.PYEXC("Class %s not found" % name)
    java.lang.ExceptionPyRaisable: java.lang.Exception: Class weka.filters.Filter not found.

    Could you help me resolve this problem? I would be very grateful.

    Replies
    1. Something along these lines should help:

      import jpype

      if not jpype.isJVMStarted():
          _jvmArgs = ["-ea"]  # enable assertions
          # _jvmArgs.append("-Djava.class.path=" + os.environ["CLASSPATH"])
          _jvmArgs.append("-Djava.class.path=./;G:/programs/Weka-3-6/weka.jar")
          _jvmArgs.append("-Xmx1G")
          jpype.startJVM(jpype.getDefaultJVMPath(), *_jvmArgs)

      Notice the class path in _jvmArgs.append("-Djava.class.path=./;G:/programs....
      The leading ./ adds your current working directory (that is, the directory you run your script from),
      followed by a semicolon (the Windows path separator) and the path to weka.jar. This should help.

  2. I would think you've heard this since writing this post, but Jython is a Python implementation in Java that works seamlessly with Java libraries (but not with all CPython libraries).

  3. Could you give an example of how to create an Instance programmatically?

  4. This comment has been removed by the author.

  5. Dear Dimitri,

    Thanks a lot for this introduction to using Weka from Python. Do you know whether it is possible to create a classifier, and even nested classifiers, using methods like weka.core.Utils.splitOptions? Weka supports a command like:
    weka.classifiers.meta.MultiScheme -X 0 -S 1 -B "weka.classifiers.rules.ZeroR " -B "weka.classifiers.meta.AdaBoostM1 -P 100 -S 1 -I 20 -W weka.classifiers.trees.DecisionStump" -B "weka.classifiers.trees.RandomForest -I 200 -K 30 -S 1 -num-slots 8" -B "weka.classifiers.meta.CostSensitiveClassifier -cost-matrix \"[0.0 1.0; 10.0 0.0]\" -S 1 -W weka.classifiers.trees.RandomForest -- -I 200 -K 0 -S 1 -num-slots 8" -B "weka.classifiers.rules.JRip -F 3 -N 3.0 -O 2 -S 1"

    Thank you,
    Xavier

  6. I don't know. Sorry. I have not been using this technique much lately.

  7. Hello, I need to know how to load a saved model with JPype, for example mymodel.model
    (weka.classifiers.meta.Vote -S 1 -B "weka.classifiers.bayes.NaiveBayes " -B "weka.classifiers.trees.J48 -C 0.25 -M 2" -R AVG). Thanks
