The MEKA project provides an open source implementation of methods for multi-label learning and evaluation. In multi-label classification, we want to predict multiple output variables for each input instance. This is different from the 'standard' case (binary or multi-class classification), which involves only a single target variable. MEKA is based on the WEKA Machine Learning Toolkit; it includes dozens of multi-label methods from the scientific literature, as well as a wrapper to the related MULAN framework.
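
To make the distinction concrete, here is a minimal sketch in plain Python (it does not use MEKA or WEKA). The label names and the `to_indicator` helper are hypothetical and chosen only for illustration: a multi-class target is a single value, whereas a multi-label target can be written as one binary indicator per label.

```python
# Hypothetical label vocabulary, for illustration only.
LABELS = ["sports", "politics", "science"]


def to_indicator(relevant, labels=LABELS):
    """Encode a set of relevant labels as a binary vector, one entry per label."""
    return [1 if label in relevant else 0 for label in labels]


# Multi-class: exactly one target value per instance.
multi_class_target = "politics"

# Multi-label: any subset of the labels may be relevant to one instance.
multi_label_target = to_indicator({"politics", "science"})

print(multi_class_target)
print(multi_label_target)
```

Each position in the indicator vector corresponds to one output variable, which is why a multi-label problem involves several target variables per instance rather than one.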

Main developers:


NEW RELEASE Nov 04, 2015: Meka 1.9.0 is released. This is a major release with many new features and improvements over earlier releases. See the README regarding changes.

Move to Github Nov 04, 2015: Meka is moving to GitHub. The releases and other files will still be updated here at, but the code repository is now:

Meka on Maven Central

To include it in your projects,



Download MEKA here.

Get a nightly snapshot.

Check out the code:


Getting Started: download MEKA and run ./ (run.bat on Windows) to launch the GUI.

The MEKA tutorial (pdf) has numerous examples on how to run and extend MEKA.

A List of Methods available in MEKA, and examples on how to use them.

The API reference.

MEKA originated from implementations of work from several publications.

Have a specific problem or query? Post to MEKA's Mailing List (please avoid contacting developers directly for MEKA-related help).


A collection of multi-label and multi-target datasets is available here. Even more datasets are available at the MULAN Website (note that MULAN indexes labels as the final attributes, whereas MEKA indexes them at the beginning). See the MEKA Tutorial for more information.
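
The difference in label indexing can be illustrated with a small plain-Python sketch. It is not part of MEKA or MULAN; the helper name `labels_last_to_first` and the in-memory row format are assumptions made for this example. It moves the final `num_labels` columns of each row to the front, which is the reordering needed when label attributes come last but are expected first.

```python
def labels_last_to_first(rows, num_labels):
    """Return copies of rows with the final num_labels columns moved to the front.

    rows: list of lists, each holding the feature values followed by the labels.
    num_labels: number of label columns at the end of every row.
    """
    if num_labels < 0:
        raise ValueError("num_labels must be non-negative")
    reordered = []
    for row in rows:
        if num_labels > len(row):
            raise ValueError("row has fewer columns than num_labels")
        split = len(row) - num_labels
        # Labels first, then the remaining (feature) columns in original order.
        reordered.append(list(row[split:]) + list(row[:split]))
    return reordered


# Two features followed by two labels (labels-last layout).
labels_last = [[0.5, 1.2, 1, 0],
               [0.1, 0.0, 0, 1]]
print(labels_last_to_first(labels_last, num_labels=2))
```

In a real ARFF file the attribute declarations in the header would need to be reordered in the same way as the data rows; the MEKA Tutorial is the place to check how the number of labels is declared.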

The following text datasets have been created / compiled into WEKA's ARFF format using the StringToWordVector filter. Also available are train/test splits and the original raw prefiltered text.

Dataset | L | N | LC | PU | Description and Original Source(s)
Enron | 53 | 1702 | 3.39 | 0.442 | A subset of the Enron Email Dataset, as labelled by the UC Berkeley Enron Email Analysis Project
Slashdot | 22 | 3782 | 1.18 | 0.041 | Article titles and partial blurbs mined from
Language Log | 75 | 1460 | 1.18 | 0.208 | Articles posted on the Language Log
IMDB (Updated) | 28 | 120919 | 2.00 | 0.037 | Movie plot text summaries sourced from the Internet Movie Database interface, labelled with genres

N = The number of examples (training+testing) in the datasets

L = The number of predefined labels relevant to this dataset

LC = Label Cardinality. Average number of labels assigned per document

PU = Proportion of documents with Unique label combinations
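
The LC and PU statistics can be sketched in a few lines of plain Python. This is not MEKA's own code, and the function names are made up for the example. LC follows the definition above directly. For PU the sketch assumes one particular reading, namely the number of distinct label combinations divided by N; counting only the documents whose combination occurs exactly once would be a different reading and can give a different value.

```python
def label_cardinality(Y):
    """Average number of relevant labels per example (LC)."""
    return sum(sum(row) for row in Y) / len(Y)


def proportion_unique(Y):
    """Number of distinct label combinations divided by N.

    Assumption: this is one reading of PU, not necessarily the exact
    formula used to produce the table.
    """
    distinct = {tuple(row) for row in Y}
    return len(distinct) / len(Y)


# Toy label matrix: 4 examples (N = 4), 3 labels (L = 3).
Y = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 1],
]
print(label_cardinality(Y))
print(proportion_unique(Y))
```

On the toy matrix there are 8 relevant labels over 4 examples and 3 distinct combinations, so the two functions return 2.0 and 0.75 respectively.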


Other software that uses MEKA

Other multi-label links