The Manifesto Corpus is a free, digital, multilingual, and annotated collection of electoral programmes. It builds on the collection of the Manifesto Project, which is currently the largest collection of annotated electoral programmes.
Since the project Manifesto Research on Political Representation (MARPOR) took over the maintenance and updating of the Manifesto Project Dataset from the Comparative Manifestos Project, the collection and the coding process have been fully digitalised. The main advantage of this digital infrastructure is that the text data can be distributed: machine-readable electoral programmes together with the codings of every single quasi-sentence.
The Manifesto Corpus contains three types of information:
The party and election dates can be used to link the corpus information to the Manifesto Project Main Dataset.
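As a rough illustration of this linkage, the sketch below joins corpus metadata to dataset rows on the pair (party, date). The records, the party codes, and the column name example_column are hypothetical placeholders and are not taken from the real data.

```python
# Hypothetical corpus metadata records (placeholder values, not real data).
corpus_meta = [
    {"party": 11111, "date": 201705, "language": "english"},
    {"party": 22222, "date": 201705, "language": "english"},
]

# Hypothetical rows standing in for the Main Dataset.
main_dataset = [
    {"party": 11111, "date": 201705, "example_column": "a"},
    {"party": 33333, "date": 201705, "example_column": "b"},
]


def link_to_main_dataset(corpus_meta, main_dataset):
    """Attach the matching dataset row (or None) to each corpus record."""
    # Index the dataset by the (party, date) key shared with the corpus.
    index = {(row["party"], row["date"]): row for row in main_dataset}
    return [
        {**doc, "main_row": index.get((doc["party"], doc["date"]))}
        for doc in corpus_meta
    ]


linked = link_to_main_dataset(corpus_meta, main_dataset)
```

Documents without a matching dataset row keep main_row set to None, so unmatched cases stay visible instead of being silently dropped.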
The corpus currently covers electoral programmes from more than 50 different countries in more than 35 languages. It contains more than 2,500 machine-readable programmes. For more than 1,350 of these, unitising and codings are available as well, amounting to more than 1,250,000 coded quasi-sentences.
The Corpus is stored in an online database. It can be accessed in four different ways:
We regularly update, correct and extend the Manifesto Corpus. To ensure that analyses with the corpus can be reproduced later, we save and distribute older versions of the Manifesto Corpus. When using manifestoR, you can choose to download specific corpus versions. If you want to make sure that your work can be replicated later, note the number of the version you are working with.
The Manifesto Corpus contains document meta information for the following aspects. Note that this information cannot be accessed via the website, but only via manifestoR or the API.
party: the party code according to the general Manifesto Project party codes (see "List of Political Parties" on the dataset website)
date: the election date in the format YYYYMM (201705 indicates an election date in May 2017)
language: the language of the document (e.g. english, french, german, ...)
source: the collection or project which originally collected the document (e.g. MARPOR or CEMP) (since corpus version 2015-4)
has_eu_code: whether a document contains "eu codes" (see the section "The eu_code column in the Manifesto Corpus" in the Subcategories tutorial)
is_primary_doc: is FALSE only when multiple manifestos are available for a single party and election date and this document is not the one used for coding by the Manifesto Project.
may_contradict_core_dataset: is TRUE for documents where the CMP codings in the corpus documents might be inconsistent with the coding aggregates in the Manifesto Project’s Main Dataset. This applies to manifestos that were recoded after they entered the dataset, and to cases where the dataset entries are derived from hand-written coding sheets used before the Manifesto Project’s data workflow was digitalised, while the documents themselves were digitalised and added to the Manifesto Corpus only afterwards.
manifesto_id: a document id, usually the partycode_electiondate (eg.
md5sum_text: an MD5 checksum of the document content
url_original: a URL to the PDF document on the server
md5sum_original: an MD5 checksum of the PDF on the server
annotations: TRUE if the document is digitally coded (otherwise FALSE)
handbook: an integer that indicates the version of the coding instructions that was used for the coding (e.g. 4 or 5) (since 2016-6)
is_copy_of: indicates whether a manifesto is a copy of another manifesto (e.g. in cases where two parties ran on the same document) (since 2017-1)
title: the title of the manifesto (in original language) (since 2017-1)
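Two of these fields lend themselves to a short sketch: the YYYYMM election date and the MD5 checksum of the document text. The helper names below are hypothetical. The checksum helper assumes the digest is computed over the UTF-8 encoded text, which this page does not specify, so treat it as an illustration and not as the project's own procedure.

```python
import hashlib
from datetime import date


def parse_election_date(yyyymm) -> date:
    """Convert a YYYYMM value such as 201705 into a date.

    The format carries no day, so the first of the month is used
    as a placeholder.
    """
    text = str(yyyymm)
    if len(text) != 6 or not text.isdigit():
        raise ValueError(f"expected YYYYMM, got {yyyymm!r}")
    return date(int(text[:4]), int(text[4:]), 1)


def text_matches_checksum(text: str, md5sum_text: str) -> bool:
    """Compare a document text against its md5sum_text value.

    Assumption: the checksum is the hex MD5 digest of the UTF-8 bytes.
    """
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return digest == md5sum_text.lower()


election = parse_election_date(201705)  # May 2017
```

A mismatch in text_matches_checksum could mean either a changed document or a different encoding than the one assumed here.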
When publishing work using the Manifesto Corpus, please reference depending on the version you used (and replace the
Make sure to provide the exact version you used for your analyses to ensure the replicability of your work.
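Because several metadata fields are only available from a given corpus version onward, the exact version also determines which fields an analysis can rely on. The following minimal sketch checks this. It assumes that version identifiers follow the pattern YYYY-N seen in this text (such as 2017-1) and can be ordered by year and then release number. The function names are hypothetical.

```python
from typing import Tuple

# "Since" notes taken from the metadata list on this page.
FIELD_SINCE = {
    "source": "2015-4",
    "handbook": "2016-6",
    "is_copy_of": "2017-1",
    "title": "2017-1",
}


def parse_version(version: str) -> Tuple[int, int]:
    """Split 'YYYY-N' into (year, release) so versions compare numerically."""
    year, release = version.split("-")
    return int(year), int(release)


def field_available(field: str, corpus_version: str) -> bool:
    """Return True if the field should exist in the given corpus version.

    Assumption: fields without a "since" note exist in every version.
    """
    since = FIELD_SINCE.get(field)
    if since is None:
        return True
    return parse_version(corpus_version) >= parse_version(since)


has_title = field_available("title", "2016-6")  # False under these assumptions
```

Comparing parsed tuples avoids the pitfalls of plain string comparison, for example if a release number ever reaches two digits.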