manifestoberta
Our manifestoberta models are based on the multilingual XLM-RoBERTa large model, fine-tuned on all annotated statements in the Manifesto Corpus (currently more than 1.7 million). They perform strongly at classifying political texts: they can assign statements from a wide range of political discourse, in many languages, to the categories of the detailed Manifesto Project coding scheme. The models are updated regularly and are freely available via the Hugging Face model hub.
Models
Sentence Model 56Topics
Classifies individual statements into one of the 56 substantive categories of the Handbook 4 coding scheme. The model takes single statements as input and is ready to use for classification tasks.
Hugging Face model card: Latest Version
Performance Report: PDF
DOI: https://doi.org/10.25522/manifesto.manifestoberta.56topics.sentence.2024.1.1
Older Versions: 2023-1-1
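A minimal sketch of classifying one statement with the sentence model through the Hugging Face transformers library. Several details are assumptions to verify against the model card: the model identifier (the 2023-1-1 release is used here; the latest version has a different ID), the use of the xlm-roberta-large tokenizer, and the 200-token input limit. The helper names rank_labels and classify_sentence are hypothetical.

```python
"""Hedged sketch: classifying one statement with the manifestoberta sentence model.

Assumptions (verify against the Hugging Face model card):
- MODEL_ID is the 2023-1-1 sentence model; newer releases use a different ID.
- Inputs use the xlm-roberta-large tokenizer, padded/truncated to 200 tokens.
"""
import math

# Assumed identifiers; check the model card for the latest version.
MODEL_ID = "manifesto-project/manifestoberta-xlm-roberta-56policy-topics-sentence-2023-1-1"
TOKENIZER_ID = "xlm-roberta-large"


def rank_labels(logits, id2label):
    """Turn one row of logits into {label: probability in %}, sorted high to low."""
    # Numerically stable softmax over the category scores.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = {id2label[i]: round(100 * e / total, 2) for i, e in enumerate(exps)}
    return dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True))


def classify_sentence(sentence, model_id=MODEL_ID):
    """Download the model (network access required) and classify a single statement."""
    # Imported lazily so rank_labels works without torch/transformers installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_ID)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    inputs = tokenizer(sentence, return_tensors="pt", max_length=200,
                       padding="max_length", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    # id2label maps class indices to the category names stored in the model config.
    return rank_labels(logits, model.config.id2label)
```

For example, classify_sentence("We will restore funding to climate science research.") would return a dictionary of all categories with their probabilities, the most likely category first. Loading the model once and reusing it is preferable when classifying many statements.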
Context Model 56Topics
Classifies statements into one of the 56 substantive categories of the Handbook 4 coding scheme. In addition to the statement itself, the context model variant takes the surrounding sentences into account, which improves classification results compared with the sentence model. To benefit from this, each statement must first be combined with its context during preprocessing (for details, see the Hugging Face model card).
Hugging Face model card: Latest Version
Performance Report: PDF
DOI: https://doi.org/10.25522/manifesto.manifestoberta.56topics.context.2024.1.1
Older Versions: 2023-1-1
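A minimal sketch of the preprocessing step, assuming (as we understand the model card) that the context of a statement is the preceding statement, the statement itself and the following statement from the same manifesto, joined by spaces, and that statement and context are passed to the tokenizer as a text pair. The model identifier (2023-1-1 release), the 300-token limit and the clipping of the window at the start and end of a manifesto are further assumptions; build_context_pairs and classify_in_context are hypothetical names.

```python
"""Hedged sketch: preparing (statement, context) pairs for the context model.

Assumptions (verify against the Hugging Face model card):
- Context = preceding + current + following statement of the SAME manifesto.
- The 2023-1-1 model ID below and a 300-token input limit.
"""
from typing import List, Tuple

# Assumed identifier; newer releases use a different ID.
CONTEXT_MODEL_ID = "manifesto-project/manifestoberta-xlm-roberta-56policy-topics-context-2023-1-1"


def build_context_pairs(sentences: List[str], window: int = 1) -> List[Tuple[str, str]]:
    """Pair every statement of ONE manifesto with its surrounding statements."""
    pairs = []
    for i, sentence in enumerate(sentences):
        # The window is clipped at the start and end of the document (our choice).
        start, end = max(0, i - window), min(len(sentences), i + window + 1)
        context = " ".join(sentences[start:end])
        pairs.append((sentence, context))
    return pairs


def classify_in_context(sentences: List[str], model_id: str = CONTEXT_MODEL_ID) -> List[str]:
    """Return the top category label for each statement (network access required)."""
    # Imported lazily so build_context_pairs works without torch/transformers.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    labels = []
    for sentence, context in build_context_pairs(sentences):
        # Statement and context are encoded together as a text pair.
        inputs = tokenizer(sentence, context, return_tensors="pt", max_length=300,
                           padding="max_length", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        labels.append(model.config.id2label[int(logits.argmax(dim=-1))])
    return labels
```

Calling build_context_pairs separately for each manifesto keeps the context window from crossing document boundaries, so the last statement of one manifesto never becomes context for the first statement of the next.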