The MEKA project provides an open source implementation of methods for multi-label learning and evaluation. In multi-label classification, you want to predict multiple output variables for each input instance. This is different from the 'standard' case (binary or multi-class classification), which involves only a single target variable. MEKA is based on the WEKA Machine Learning Toolkit; it includes dozens of multi-label methods from the scientific literature, as well as a wrapper to the MULAN framework.
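To make the contrast concrete, here is a minimal Java sketch of the two output shapes. It is not MEKA code: the class `PredictionShapes`, its two methods, and the 0.5 threshold are hypothetical names and values chosen only for illustration, and the per-label scores are assumed to come from some already-trained model.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: contrasts a single-target prediction with a multi-label one.
final class PredictionShapes {

    /** Multi-class case: exactly one class index, the one with the highest score. */
    static int predictSingle(double[] scores) {
        int best = 0;
        for (int j = 1; j < scores.length; j++) {
            if (scores[j] > scores[best]) {
                best = j;
            }
        }
        return best;
    }

    /** Multi-label case: every label whose score reaches the threshold (zero, one, or many). */
    static List<Integer> predictMulti(double[] scores, double threshold) {
        List<Integer> labels = new ArrayList<>();
        for (int j = 0; j < scores.length; j++) {
            if (scores[j] >= threshold) {
                labels.add(j);
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.2, 0.7};  // assumed scores for three labels
        System.out.println(predictSingle(scores));      // prints 0
        System.out.println(predictMulti(scores, 0.5));  // prints [0, 2]
    }
}
```

The same scores yield one class in the single-target setting but a set of labels in the multi-label setting, which is why multi-label methods need their own learners and evaluation measures.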
NEW May 22, 2015 Some examples of using Meka for data streams (i.e., updateable classifiers) have been added to the list of methods.
NEW Feb 13, 2015 Meka 1.7.5 has been released. Main changes include:
- Usability improvements to the GUI
- Added Javadoc documentation and updated the tutorial
- Minor bug fixes and improvements
- Classifiers added (
Meka 1.7.5 is on Maven Central. To include it in your projects,
Sep 25, 2014 Meka 1.7.3 has been released. Main changes include:
- Evaluation code is now more efficient at working with large ARFF files
- Several methods (e.g., PS, RandomSubspaceML) and tools (e.g., PSUtils) rewritten to be more scalable for datasets having a large labelset
- Classifier Chains (CC) based methods (CC, PCC, BCC, MCC) consolidated to share common code
- Classifiers added (RAkEL, RAkELd)
Download MEKA here.
Or check out the code with Subversion:
svn checkout svn://svn.code.sf.net/p/meka/code/trunk meka-code
Or get a nightly snapshot.
Getting Started: download MEKA and run the startup script (run.bat on Windows) to launch the GUI.
The MEKA tutorial (pdf) has numerous examples on how to run and extend MEKA.
A List of Methods available in MEKA, and examples on how to use them.
The API reference.
MEKA originated from implementations of work from several publications, including a PhD thesis; they can be found here.
Have a specific problem or query? Post to MEKA's Mailing List (please avoid contacting developers directly for MEKA-related help).
The following datasets have been created or compiled in WEKA's ARFF format. They are all text datasets, parsed into binary-attribute format using WEKA's StringToWordVector filter. Also available are train/test splits and the original raw prefiltered text.
|Dataset|L|N|LC|PU|Description and Original Source(s)|
|---|---|---|---|---|---|
|Enron|53|1702|3.39|0.442|A subset of the Enron Email Dataset, as labelled by the UC Berkeley Enron Email Analysis Project|
|Slashdot|22|3782|1.18|0.041|Article titles and partial blurbs mined from Slashdot.org|
|Language Log|75|1460|1.18|0.208|Articles posted on the Language Log|
|IMDB Updated|28|120919|2.00|0.037|Movie plot text summaries labelled with genres, sourced from the Internet Movie Database interface|
N = The number of examples (training+testing) in the datasets
L = The number of predefined labels relevant to this dataset
LC = Label Cardinality: the average number of labels assigned per document
PU = Proportion of documents with unique label combinations (given as a fraction, e.g. 0.442 is 44.2%)
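The LC and PU columns can be reproduced from a binary label matrix. The following is a minimal Java sketch, not MEKA's own implementation: the class `LabelStats` and its method names are hypothetical, and PU is interpreted here as the fraction of documents whose label combination is shared with no other document, which is an assumption based on the wording of the legend above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: y[i][j] is 1 if document i carries label j, else 0.
final class LabelStats {

    /** LC: average number of labels assigned per document. */
    static double labelCardinality(int[][] y) {
        if (y.length == 0) {
            throw new IllegalArgumentException("label matrix is empty");
        }
        long total = 0;
        for (int[] row : y) {
            for (int v : row) {
                total += v;
            }
        }
        return (double) total / y.length;
    }

    /** PU (as assumed here): fraction of documents whose label combination occurs exactly once. */
    static double proportionUnique(int[][] y) {
        if (y.length == 0) {
            throw new IllegalArgumentException("label matrix is empty");
        }
        // Count how often each label combination appears.
        Map<String, Integer> counts = new HashMap<>();
        for (int[] row : y) {
            counts.merge(Arrays.toString(row), 1, Integer::sum);
        }
        int unique = 0;
        for (int c : counts.values()) {
            if (c == 1) {
                unique++;
            }
        }
        return (double) unique / y.length;
    }

    public static void main(String[] args) {
        int[][] y = {
            {1, 0, 1},
            {1, 0, 1},
            {0, 1, 0},
            {1, 1, 1},
        };
        System.out.println(labelCardinality(y));  // prints 2.0
        System.out.println(proportionUnique(y));  // prints 0.5
    }
}
```

In the toy matrix, eight labels are spread over four documents (LC = 2.0), and two of the four documents have a combination that no other document shares (PU = 0.5).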
Usage notes: Attributes 1-L of these datasets represent the label space, and the remaining attributes represent the input attribute space.
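The layout described in the usage notes can be sketched in plain Java as follows. This assumes an instance has already been read into a flat array of numeric values; the class `LabelSplit` and its methods are hypothetical helpers for illustration and are not part of MEKA or WEKA.

```java
import java.util.Arrays;

// Illustrative only: splits one instance into its label part and its attribute part,
// assuming the first numLabels values are the labels (attributes 1-L).
final class LabelSplit {

    /** Returns the first numLabels values, i.e. the label space. */
    static double[] labels(double[] instance, int numLabels) {
        checkRange(instance, numLabels);
        return Arrays.copyOfRange(instance, 0, numLabels);
    }

    /** Returns everything after the labels, i.e. the input attribute space. */
    static double[] features(double[] instance, int numLabels) {
        checkRange(instance, numLabels);
        return Arrays.copyOfRange(instance, numLabels, instance.length);
    }

    private static void checkRange(double[] instance, int numLabels) {
        if (numLabels < 0 || numLabels > instance.length) {
            throw new IllegalArgumentException("numLabels out of range: " + numLabels);
        }
    }

    public static void main(String[] args) {
        double[] instance = {1, 0, 1, 0.5, 2.0};  // L = 3 labels, then 2 input attributes
        System.out.println(Arrays.toString(labels(instance, 3)));    // prints [1.0, 0.0, 1.0]
        System.out.println(Arrays.toString(features(instance, 3)));  // prints [0.5, 2.0]
    }
}
```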
- WEKA Machine Learning Toolkit
- MOA environment for data streams (can run incremental MEKA classifiers)
- ADAMS framework (can integrate MEKA)
- MULAN Framework for Multi-label Classification
- Mulan Multi-label Group from the Machine Learning and Knowledge Discovery Group at the Aristotle University of Thessaloniki.
- My Webpage at Aalto University, Finland.