The MEKA project provides an open source implementation of methods for multi-label learning and evaluation. In multi-label classification, you want to predict multiple output variables for each input instance. This is different from the 'standard' case (binary or multi-class classification), which involves only a single target variable. MEKA is based on the WEKA Machine Learning Toolkit; it includes dozens of multi-label methods from the scientific literature, as well as a wrapper to the MULAN framework.
NEW Feb 13, 2015: Meka 1.7.5 has been released. Main changes include:
- Usability improvements to the GUI
- Added Javadoc documentation, updated tutorial
- Minor bug fixes and improvements
- Classifiers added (
Meka 1.7.5 is on Maven Central. To include it in your projects,
Sep 25, 2014: Meka 1.7.3 has been released. Main changes include:
- Evaluation code is now more efficient at working with large ARFF files
- Several methods (e.g., PS, RandomSubspaceML) and tools (e.g., PSUtils) rewritten to be more scalable for datasets having a large labelset
- Classifier Chains (CC) based methods (CC, PCC, BCC, MCC) consolidated to share common code
- Classifiers added (RAkEL, RAkELd)
Download MEKA here.
Or check out the code with Subversion:
svn checkout svn://svn.code.sf.net/p/meka/code/trunk meka-code
Or get a nightly snapshot.
Getting Started: download MEKA and run the launch script (run.bat on Windows) to launch the GUI.
The MEKA tutorial (pdf) has numerous examples on how to run and extend MEKA.
A List of Methods available in MEKA, and examples on how to use them.
The API reference.
MEKA originated from implementations of work from several publications, including a PhD thesis; they can be found here.
Have a specific problem or query? Post to MEKA's Mailing List (please avoid contacting developers directly for MEKA-related help).
The following datasets have been created or compiled in WEKA's ARFF format. They are all text datasets, parsed into binary-attribute format using WEKA's StringToWordVector filter. Also available are train/test splits and the original raw, prefiltered text.

| Dataset | L | N | LC | PU | Description and Original Source(s) |
|---|---|---|---|---|---|
| Enron | 53 | 1702 | 3.39 | 0.442 | A subset of the Enron Email Dataset, as labelled by the UC Berkeley Enron Email Analysis Project |
| Slashdot | 22 | 3782 | 1.18 | 0.041 | Article titles and partial blurbs mined from Slashdot.org |
| Language Log | 75 | 1460 | 1.18 | 0.208 | Articles posted on the Language Log |
| IMDB Updated | 28 | 120919 | 2.00 | 0.037 | Movie plot text summaries, labelled with genres, sourced from the Internet Movie Database interface |
N = The number of examples (training+testing) in the datasets
L = The number of predefined labels relevant to this dataset
LC = Label Cardinality. Average number of labels assigned per document
PU = Percentage of documents with Unique label combinations
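
As a rough illustration of how LC and PU can be computed, here is a minimal Python sketch on a small made-up label matrix (the matrix `Y` and the function names are hypothetical, not part of MEKA). It assumes PU is the number of distinct label combinations divided by N, which is one reading of the definition above and matches the fractional values in the table; MEKA's own evaluation code may differ in detail.

```python
def label_cardinality(Y):
    """Average number of labels assigned per example (LC)."""
    return sum(sum(row) for row in Y) / len(Y)


def proportion_unique(Y):
    """Distinct label combinations divided by N (assumed reading of PU)."""
    return len({tuple(row) for row in Y}) / len(Y)


# Hypothetical toy data: N = 4 examples, L = 3 labels (1 = label assigned).
Y = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
]

lc = label_cardinality(Y)
pu = proportion_unique(Y)
print(f"LC = {lc:.2f}, PU = {pu:.2f}")
```

Here the four rows carry 2, 1, 2, and 2 labels, and three of the four label combinations are distinct.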
Usage notes: Attributes 1-L of these datasets represent the label space, and the remaining attributes represent the input (feature) space.
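
The attribute layout in the usage note can be sketched in Python as follows. This is only an illustration of the "labels first" convention on an already-parsed row of values; `split_instance` and the sample row are hypothetical, and reading the ARFF file itself is left out.

```python
def split_instance(values, num_labels):
    """Split one data row into (labels, features).

    Attributes 1..L in the usage note are 1-based; in Python they are
    the 0-based slice values[0:L].
    """
    if not 0 <= num_labels <= len(values):
        raise ValueError("num_labels must be between 0 and len(values)")
    return values[:num_labels], values[num_labels:]


# Hypothetical row with L = 3 labels followed by 4 binary word attributes.
row = [1, 0, 1, 0, 0, 1, 1]
labels, features = split_instance(row, num_labels=3)
print(labels, features)
```

With L taken from the table above (for example 53 for Enron), the same slice separates the label columns from the word attributes in each row.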
- WEKA Machine Learning Toolkit
- MOA environment for data streams (can run incremental MEKA classifiers)
- ADAMS framework (can integrate MEKA)
- MULAN Framework for Multi-label Classification
- Mulan Multi-label Group from the Machine Learning and Knowledge Discovery Group at the Aristotle University of Thessaloniki
- My Webpage at Aalto University, Finland.