Use WEKA in your Python code ~ Software Ideas

Friday, March 09, 2012

Dimitri Machine-learning, python 9 comments

Weka is a collection of machine learning algorithms that can either be applied directly to a dataset or called from your own Java code. There is an article called “Use WEKA in your Java code” which, as its title suggests, explains how to use Weka from Java. That is unsurprising, since Weka is implemented in Java. In this post I will describe how to use Weka from your Python code instead.

If you have built an entire software system in Python, you might be reluctant to look at libraries in other languages. After all, there are a huge number of excellent Python libraries, and many good machine-learning libraries written in Python, or in C and C++ with Python bindings. However, as far as I am concerned, it would be a pity not to make use of Weka just because it is written in Java. It is one of the best-known machine-learning libraries around and implements a large number of algorithms. What’s more, there are very few data stream mining libraries around, and MOA, which is related to Weka and also written in Java, is the best I have seen.

I use JPype (http://jpype.sourceforge.net/) to access the Weka class libraries. Once you have it installed, download the latest Weka and MOA versions and copy moa.jar, sizeofag.jar and weka.jar into your working directory, along with the iris.arff data set that the listing reads. Below you can see the full Python listing of the test application. The code initializes the JVM, imports some Weka packages and classes, reads a data set, standardizes and shuffles it, splits it into a training set and a test set, trains a J48 tree classifier and then tests it. If you are familiar with Weka, this will all be very easy.

In a separate post, I will explore how easy it is to use MOA in the same way.
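
Since all three jars reach the JVM through its start-up options, it can help to build those options in one place. The sketch below is one way to do that, assuming the jars sit in the working directory as described above. `build_jvm_options` and `missing_jars` are hypothetical helpers, not part of JPype or Weka. The jars are joined into a single `-Djava.class.path` value with `os.pathsep`, because a second `-Djava.class.path` option would replace the first rather than extend it, and the agent jar is passed through the JVM's `-javaagent:` option.

```python
import os


def build_jvm_options(jars, agent_jar=None, max_heap="4G"):
    """Build JVM start-up options (hypothetical helper, not a JPype API)."""
    options = ["-Xmx" + max_heap]
    # One class path entry for all jars, joined with the platform separator
    # (":" on Linux and macOS, ";" on Windows).
    options.append("-Djava.class.path=" + os.pathsep.join(jars))
    if agent_jar is not None:
        # Java agents are loaded with -javaagent:, not with a -D property.
        options.append("-javaagent:" + agent_jar)
    return options


def missing_jars(jars):
    """Return the jars that are not present on disk."""
    return [jar for jar in jars if not os.path.isfile(jar)]


jars = ["./moa.jar", "./weka.jar"]
options = build_jvm_options(jars, agent_jar="sizeofag.jar")
print(options)
print("Missing jars:", missing_jars(jars))
```

The resulting list can be passed on as `startJVM(getDefaultJVMPath(), *options)`. A jar that is missing from the class path tends to surface only later, as a "Class ... not found" error from `JClass`, so checking `missing_jars` before starting the JVM can save some guessing.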

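# Initialize the JVM
import os
from jpype import *

# All jars must share one class path setting: if -Djava.class.path is
# given twice, the later value replaces the earlier one.
options = [
    "-Xmx4G",
    "-Djava.class.path=" + os.pathsep.join(["./moa.jar", "./weka.jar"]),
    "-javaagent:sizeofag.jar",  # agent jar used by MOA
]
startJVM(getDefaultJVMPath(), *options)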

# Import java/weka packages and classes
Trees = JPackage("weka.classifiers.trees")
Filter = JClass("weka.filters.Filter")
Attribute = JPackage("weka.filters.unsupervised.attribute")
Instance = JPackage("weka.filters.unsupervised.instance")
RemovePercentage = JClass("weka.filters.unsupervised.instance.RemovePercentage")
Remove = JClass("weka.filters.unsupervised.attribute.Remove")
Classifier = JClass("weka.classifiers.Classifier")
NaiveBayes = JClass("weka.classifiers.bayes.NaiveBayes")
Evaluation = JClass("weka.classifiers.Evaluation")
FilteredClassifier = JClass("weka.classifiers.meta.FilteredClassifier")
Instances = JClass("weka.core.Instances")
BufferedReader = JClass("java.io.BufferedReader")
FileReader = JClass("java.io.FileReader")
Random = JClass("java.util.Random")

# Reading from an ARFF file
reader = BufferedReader(FileReader("./iris.arff"))
data = Instances(reader)
reader.close()
data.setClassIndex(data.numAttributes() - 1) # setting class attribute

# Standardizes all numeric attributes in the given dataset to have zero mean and unit variance, apart from the class attribute.
standardizeFilter = Attribute.Standardize()
standardizeFilter.setInputFormat(data)
data = Filter.useFilter(data, standardizeFilter)

# Randomly shuffles the order of instances passed through it.
randomizeFilter = Instance.Randomize()
randomizeFilter.setInputFormat(data)
data = Filter.useFilter(data, randomizeFilter)

# Creating train set
removeFilter = RemovePercentage()
removeFilter.setInputFormat(data)
removeFilter.setPercentage(30.0)
removeFilter.setInvertSelection(False)
trainData = Filter.useFilter(data, removeFilter)

# Creating test set
removeFilter.setInputFormat(data)
removeFilter.setPercentage(30.0)
removeFilter.setInvertSelection(True)
testData = Filter.useFilter(data, removeFilter)

# Create classifier
j48 = Trees.J48()
j48.setUnpruned(True) # using an unpruned J48
j48.buildClassifier(trainData)

# Single-argument print(...) with % formatting runs on Python 2 and Python 3
print("Number Training Data %s %s" % (trainData.numInstances(), data.numInstances()))
print("Number Test Data %s" % testData.numInstances())

# Test classifier
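for i in range(testData.numInstances()):
    instance = testData.instance(i)
    pred = j48.classifyInstance(instance)
    # Map the numeric class values back to their nominal labels
    actual = testData.classAttribute().value(int(instance.classValue()))
    predicted = testData.classAttribute().value(int(pred))
    # value(0) is the instance's first attribute value, printed as a rough identifier
    print("ID: %s actual: %s predicted: %s" % (instance.value(0), actual, predicted))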

shutdownJVM()
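
The train/test split in the listing relies on running the same `RemovePercentage` filter twice, once normally and once with the selection inverted. The pure-Python sketch below models that idea on a plain list, so the arithmetic can be checked without a JVM. It assumes the filter drops the first `percentage` percent of the instances and keeps the rest (or the reverse when inverted), rounding the cut-off to the nearest integer. That is an assumption about the filter's behaviour rather than something the listing confirms, and `remove_percentage` is a hypothetical stand-in, not Weka code.

```python
import math


def remove_percentage(instances, percentage, invert=False):
    """Model of a percentage-based removal filter (hypothetical stand-in).

    Assumption: the first `percentage` percent of the instances are removed;
    with invert=True those instances are kept instead.
    """
    # Round half up to get the cut-off index.
    cut_off = int(math.floor(len(instances) * percentage / 100.0 + 0.5))
    if invert:
        return instances[:cut_off]
    return instances[cut_off:]


# 150 placeholder instances, the size of the classic iris data set.
data = list(range(150))

train = remove_percentage(data, 30.0, invert=False)  # the remaining 70%
test = remove_percentage(data, 30.0, invert=True)    # the removed 30%

print("train:", len(train), "test:", len(test))
# The two subsets do not overlap and together cover the whole data set.
print("disjoint:", not set(train) & set(test))
print("complete:", sorted(train + test) == data)
```

Under this model, 150 instances and a percentage of 30 give 105 training and 45 test instances. Because both calls see the same, already shuffled data, the two subsets are complementary, which is why the listing randomizes once before splitting rather than inside each filter.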


9 comments:

Unknown, 1 January 2013 at 22:04
When I run Filter = JClass("weka.filters.Filter"), it gives me an error:
  File "C:\Python27\lib\site-packages\jpype\_jclass.py", line 54, in JClass
    raise _RUNTIMEEXCEPTION.PYEXC("Class %s not found" % name)
java.lang.ExceptionPyRaisable: java.lang.Exception: Class weka.filters.Filter not found.

Kindly resolve this problem. I would be highly grateful to you.
Eddie, 6 March 2013 at 13:21
I would think you've heard this since the writing of this post, but Jython is a Python implementation in Java that works seamlessly with Java libraries (but not all CPython libraries)
Dmitry Avtonomov, 20 June 2013 at 13:40
Could you give an example of how to create an Instance programmatically?
Anonymous, 30 January 2014 at 16:54
Dear Dimitri,

Thanks a lot for this introduction to using Weka from Python. Do you know whether it can create a classifier, or even nested classifiers, using methods like weka.core.Utils.splitOptions? It supports a command like:
weka.classifiers.meta.MultiScheme -X 0 -S 1 -B "weka.classifiers.rules.ZeroR " -B "weka.classifiers.meta.AdaBoostM1 -P 100 -S 1 -I 20 -W weka.classifiers.trees.DecisionStump" -B "weka.classifiers.trees.RandomForest -I 200 -K 30 -S 1 -num-slots 8" -B "weka.classifiers.meta.CostSensitiveClassifier -cost-matrix \"[0.0 1.0; 10.0 0.0]\" -S 1 -W weka.classifiers.trees.RandomForest -- -I 200 -K 0 -S 1 -num-slots 8" -B "weka.classifiers.rules.JRip -F 3 -N 3.0 -O 2 -S 1"

Thank you,
Xavier
Dimitri, 1 February 2014 at 13:49
I don't know. Sorry. I have not been using this technique too much lately.
Unknown, 6 February 2014 at 13:27
Hello, I need to know how to load a model in JPype, for example: mymodel.model
(weka.classifiers.meta.Vote -S 1 -B "weka.classifiers.bayes.NaiveBayes " -B "weka.classifiers.trees.J48 -C 0.25 -M 2" -R AVG). Thanks