The MEKA project provides an open source implementation of methods for multi-label learning and evaluation. In multi-label classification, we want to predict multiple output variables for each input instance. This is different from the 'standard' case (binary or multi-class classification), which involves only a single target variable. MEKA is based on the WEKA Machine Learning Toolkit; it includes dozens of multi-label methods from the scientific literature, as well as a wrapper for the related MULAN framework.
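To make the difference from the single-target case concrete, the following is a minimal, self-contained sketch in plain Java. It does not use the MEKA API: the class name `MultiLabelSketch` and its methods are hypothetical, introduced only for illustration. Each instance's target is represented as a binary vector with one entry per label, and two common multi-label evaluation measures (Hamming loss and exact match) are computed over such vectors.

```java
import java.util.Arrays;

// Illustrative sketch only; this class is hypothetical and not part of MEKA.
class MultiLabelSketch {

    // Hamming loss: fraction of individual label slots predicted incorrectly.
    static double hammingLoss(int[][] truth, int[][] pred) {
        int wrong = 0;
        int total = 0;
        for (int i = 0; i < truth.length; i++) {
            for (int j = 0; j < truth[i].length; j++) {
                if (truth[i][j] != pred[i][j]) {
                    wrong++;
                }
                total++;
            }
        }
        return (double) wrong / total;
    }

    // Exact match: fraction of instances whose whole label vector is correct.
    static double exactMatch(int[][] truth, int[][] pred) {
        int hits = 0;
        for (int i = 0; i < truth.length; i++) {
            if (Arrays.equals(truth[i], pred[i])) {
                hits++;
            }
        }
        return (double) hits / truth.length;
    }

    public static void main(String[] args) {
        // Three instances with L = 3 labels each; 1 means the label is relevant.
        int[][] truth = {{1, 0, 1}, {0, 1, 0}, {1, 1, 0}};
        int[][] pred  = {{1, 0, 1}, {0, 0, 0}, {1, 1, 1}};
        // 2 of the 9 label slots differ; 1 of the 3 vectors matches exactly.
        System.out.println("Hamming loss: " + hammingLoss(truth, pred));
        System.out.println("Exact match:  " + exactMatch(truth, pred));
    }
}
```

In a single-target problem each row of `truth` would collapse to one value; here a prediction can be partly right, which is why both a per-label and a per-instance measure are shown.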
NEW Aug 29, 2015 Meka 1.7.7 is released. Bug fixes, minor improvements and new features in the GUI.
- A bug that caused an error when using RandomForest as a base classifier is now fixed.
- Can now visualize Drawable base classifiers, for example, J48. Just right-click and select 'Show Graphs' in the GUI results History.
- Improvements to the GUI such as:
  - an Open Recent option to the GUI results History
  - a Save Model option to the GUI results History
- MCC classifier (and derivatives) now run faster in the case that no chain-search is specified
- OS-specific Meka home directories
May 22, 2015 Some examples of using Meka for data streams (i.e., Updateable classifiers) added to the list of methods.
Meka on Maven Central
To include it in your projects,
Download MEKA here.
Or check out the code with Subversion:
svn checkout svn://svn.code.sf.net/p/meka/code/trunk meka-code
Or get a nightly snapshot.
Getting Started: download MEKA and run the startup script (run.bat on Windows) to launch the GUI.
The MEKA tutorial (pdf) has numerous examples on how to run and extend MEKA.
A List of Methods available in MEKA, and examples on how to use them.
The API reference.
MEKA originated from implementations of work from several publications.
Have a specific problem or query? Post to MEKA's Mailing List (please avoid contacting developers directly for MEKA-related help).
A collection of multi-label and multi-target datasets is available here. Even more datasets are available at the MULAN Website (note that MULAN indexes labels as the final attributes, whereas MEKA indexes them at the beginning). See the MEKA Tutorial for more information.
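The attribute-ordering difference noted above can be sketched as a simple column rotation. This is a minimal illustration in plain Java, not MEKA or MULAN code: the class `LabelOrderSketch` and the method `labelsToFront` are hypothetical names, and it assumes each data row is already available as an array whose last `numLabels` entries are the labels.

```java
import java.util.Arrays;

// Illustrative sketch only; this class is hypothetical and not part of MEKA or MULAN.
class LabelOrderSketch {

    // Returns a copy of the row with its last numLabels columns moved to the
    // front, preserving the relative order of labels and of features.
    static String[] labelsToFront(String[] row, int numLabels) {
        if (numLabels < 0 || numLabels > row.length) {
            throw new IllegalArgumentException("numLabels out of range: " + numLabels);
        }
        int numFeatures = row.length - numLabels;
        String[] out = new String[row.length];
        System.arraycopy(row, numFeatures, out, 0, numLabels);   // labels first
        System.arraycopy(row, 0, out, numLabels, numFeatures);   // then features
        return out;
    }

    public static void main(String[] args) {
        // Labels-last layout: three features followed by two labels.
        String[] labelsLast = {"x1", "x2", "x3", "y1", "y2"};
        // Labels-first layout after the rotation.
        System.out.println(Arrays.toString(labelsToFront(labelsLast, 2)));
    }
}
```

The same rotation has to be applied to the attribute declarations as well as to every data row, so that headers and values stay aligned.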
| Dataset | L | N | LC | PU | Description and Original Source(s) |
|---|---|---|---|---|---|
| Enron | 53 | 1702 | 3.39 | 0.442 | A subset of the Enron Email Dataset, as labelled by the UC Berkeley Enron Email Analysis Project |
| Slashdot | 22 | 3782 | 1.18 | 0.041 | Article titles and partial blurbs mined from Slashdot.org |
| Language Log | 75 | 1460 | 1.18 | 0.208 | Articles posted on the Language Log |
| IMDB (Updated) | 28 | 120919 | 2.00 | 0.037 | Movie plot text summaries sourced from the Internet Movie Database interface, labelled with genres |
N = The number of examples (training+testing) in the dataset
L = The number of predefined labels relevant to this dataset
LC = Label Cardinality: the average number of labels assigned per document
PU = Proportion of documents with Unique label combinations
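The LC and PU columns can be computed from a matrix of binary label vectors, as in the rough sketch below. It is plain Java rather than MEKA code, and the class `DatasetStatsSketch` and its method names are hypothetical. One assumption matters here: PU is read as the fraction of documents whose label combination occurs exactly once in the dataset. Another plausible reading counts distinct label combinations instead, so treat this as one interpretation rather than the definition used to produce the table. The result is a fraction between 0 and 1, which matches the scale of the PU values in the table.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; this class is hypothetical and not part of MEKA.
class DatasetStatsSketch {

    // LC: average number of relevant labels (entries equal to 1) per document.
    static double labelCardinality(int[][] labels) {
        long total = 0;
        for (int[] row : labels) {
            for (int v : row) {
                total += v;
            }
        }
        return (double) total / labels.length;
    }

    // PU (assumed reading): fraction of documents whose label combination
    // appears exactly once in the dataset.
    static double proportionUnique(int[][] labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (int[] row : labels) {
            counts.merge(Arrays.toString(row), 1, Integer::sum);
        }
        int unique = 0;
        for (int c : counts.values()) {
            if (c == 1) {
                unique++;
            }
        }
        return (double) unique / labels.length;
    }

    public static void main(String[] args) {
        // N = 4 documents, L = 3 labels; the first two share a combination.
        int[][] labels = {{1, 0, 1}, {1, 0, 1}, {0, 1, 0}, {1, 1, 0}};
        System.out.println("LC = " + labelCardinality(labels)); // prints LC = 1.75
        System.out.println("PU = " + proportionUnique(labels)); // prints PU = 0.5
    }
}
```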
Other software that uses MEKA
- ADAMS framework - integrates MEKA into workflows
- KNIME framework - includes a plugin to integrate MEKA classifiers into workflows
- DKPro Text Classification Framework
- MOA environment for data streams can use Updateable MEKA classifiers
- scikit-multilearn can interface to MEKA in Python