MEKA
The MEKA project provides an open source implementation of methods for multi-label classification and evaluation. It is based on the WEKA Machine Learning Toolkit. Several benchmark methods are also included, as well as the pruned sets and classifier chains methods, other methods from the scientific literature, and a wrapper to the MULAN framework.
Main developers:
Download
Download MEKA here.
Documentation
Quick Start: download MEKA and run bash run.sh (run.bat on Windows).
See the MEKA tutorial on how to get started, with numerous examples on how to run and extend MEKA.
Check out the API reference.
MEKA began by implementing work from several publications, including a PhD thesis; they can be found here.
Have a specific problem or query? Post to MEKA's Mailing List.
Datasets
The following datasets have been created or compiled in WEKA's ARFF format. They are all text datasets, parsed into binary-attribute format using WEKA's StringToWordVector filter.
| Dataset | L | N | LC | PU | Description and Original Source(s) |
|---|---|---|---|---|---|
| Enron | 53 | 1702 | 3.39 | 0.442 | A subset of the Enron Email Dataset, as labelled by the UC Berkeley Enron Email Analysis Project |
| Slashdot | 22 | 3782 | 1.18 | 0.041 | Article titles and partial blurbs mined from Slashdot.org |
| Language Log | 75 | 1460 | 1.18 | 0.208 | Articles posted on the Language Log |
| IMDB Updated | 28 | 120919 | 2.00 | 0.037 | Movie plot text summaries sourced from the Internet Movie Database interface, labelled with genres |
N = The number of examples (training + testing) in the dataset
L = The number of predefined labels relevant to the dataset
LC = Label Cardinality: the average number of labels assigned per document
PU = Proportion of documents with Unique label combinations (reported as a fraction between 0 and 1)
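As a rough illustration of the LC and PU columns, the sketch below computes both statistics from a small binary label matrix. The function names and the toy data are hypothetical, and the PU calculation assumes one particular reading of the definition: the number of distinct label combinations divided by N. If PU is instead meant as the share of documents whose combination occurs exactly once, the result would differ.

```python
def label_cardinality(Y):
    """Average number of labels assigned per example (LC)."""
    return sum(sum(row) for row in Y) / len(Y)


def proportion_unique(Y):
    """Distinct label combinations divided by N.

    Assumption: this is one reading of PU, not a confirmed definition.
    """
    return len({tuple(row) for row in Y}) / len(Y)


# Toy label matrix (hypothetical): N = 4 examples, L = 3 labels.
Y = [
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 0],
]

lc = label_cardinality(Y)   # 6 label assignments over 4 examples
pu = proportion_unique(Y)   # 3 distinct combinations over 4 examples
print(lc, pu)
```

A low PU, as for Slashdot or IMDB in the table, suggests that most documents share their label combination with other documents.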
Usage notes: Attributes 1 to L of each dataset represent the label space; the remaining attributes represent the attribute (input feature) space.
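The attribute layout described in the usage note can be sketched as a simple split of each instance. This is a minimal illustration on a plain Python list, not MEKA or WEKA code; `split_labels` and the example row are hypothetical, and reading the ARFF file itself is left out.

```python
def split_labels(row, num_labels):
    """Split one instance into (labels, features).

    The first num_labels values are treated as the label space and
    everything after them as the attribute space.
    """
    if not 0 < num_labels <= len(row):
        raise ValueError("num_labels must be between 1 and len(row)")
    return row[:num_labels], row[num_labels:]


# Hypothetical instance with L = 3 labels followed by 4 binary word attributes.
row = [1, 0, 1, 0, 0, 1, 1]
labels, features = split_labels(row, 3)
print(labels)    # [1, 0, 1]
print(features)  # [0, 0, 1, 1]
```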
Other notes: A greater selection of multi-label datasets can be found at the MULAN Website.
The Medical and Ohsumed datasets can be found here.
Links
- WEKA Machine Learning Toolkit
- MOA environment for data streams
- MULAN Framework for Multi-label Classification
- Mulan Multi-label Group from the Machine Learning and Knowledge Discovery Group in the Aristotle University of Thessaloniki.
- My Webpage at the University Carlos III of Madrid.