Policy Process Database

Of all the things that are powerful in constraining the choice set, in shaping the way we think, time and the way learning is embodied in history are certainly among the most powerful. ... I will be blunt: Without a deep understanding of time, you will be lousy political scientists. D.C. North 1999, p. 316

The PolProDat Project by Jasmin Riedl presents data for analysing the temporal features of legislative processes and the time-strategic behaviour of legislative actors in Germany. For the first time, content overviews from the German parliamentary archives have been made machine-readable and combined with the computer-processable data of the DIP. What you can see and explore here is processual information on 3,134 (successful) legislative processes between 1994 and 2013. This is the first, interim step. PolProDat also hopes to encourage political scientists to take a closer look at archive information that is available online, including that of other parliaments.

Time is often regarded as an indicator of the quality of democratic decision-making. Accelerated decision-making, a break-neck pace of legislation or a short duration of processes are seen to dilute the legitimacy and quality of any political decision. Episodes of high law-making density are associated with the parliament's dwindling power of participation, resulting in a lack of parliamentary oversight. Conversely, a long duration of policy-making is seen as an expression of inefficiency.

Furthermore, time is also a strategic resource for political actors. If, for example, majority and minority actors, first and second chambers, governments and members of parliament have the right to initiate bills, competition over the parliamentary agenda is fierce and actors fight for their position on the parliamentary timetable. Those who have influence over parliamentary time and schedules are privileged.

To measure and analyse these temporal features of the legislative process and the time-strategic behaviour of legislative actors, researchers face the challenge of quantifying the temporality of legislative processes beyond their bare duration. To overcome this, they need very detailed, machine-readable data. Even if we ignore the months and years before a bill is introduced, we still need detailed information on every single procedural step (and stop) within a single legislative process.

Until now, no such data existed. For legislative processes in Germany, the DIP (System of Documentation and Information for Parliamentary Proceedings) documents parliamentary procedures but lacks information on the temporal dimension of law-making. The DIP database gives insights into specific key events such as plenary readings or the beginning and end of a legislative process. However, no comprehensive database covering each procedural step exists.

Why is this needed? When referring to legislative pace, the literature measures pace as output per period. But what about the pace of a single law? To grasp this, one needs to know the number of actions taken during a given duration. Until now, however, it was only possible to sequence the process and time-strategic behaviour very roughly, as temporal data on committee work did not exist (e.g. the DIP does not provide information on the decisions or adjournments of individual committees).

The PolProDat Project closes this gap by combining DIP data with information from the parliamentary archives of the German Bundestag. These archives provide online data on the content of all legislative processes since the 8th election period (1976). Their tables of contents document every legislative event and its respective documents. The German parliamentary archives thus represent the most comprehensive collection of available documentation on federal legislation. To date, the PolProDat Database has drawn on this information starting with the 13th election period (1994) up to the end of the 16th period until 2013, collating 3,134 (successful) legislative processes.

The PolProDat project collates and combines information from both the DIP and the parliamentary archives and transforms it into an integrated, machine-readable format, creating a comprehensive database. For the first time, this new database makes it possible to generate variables of interest that measure and analyse the temporal features of German law-making in full.

Parliamentary Archives Documents

The relevant data were extracted from the content overviews. This was possible as the archives’ content overviews all adhere to a similar structure and include the same type of information (although they do vary in layout over time and are inconsistent in their spacing):

Data Retrieval

...from the Archives' Content Overviews

The information in the documents was made machine-readable by converting it from PDF to CSV to JSON. The raw data from the PDF documents were extracted and transformed in two automated steps. This was followed by manual cleaning and by both a manual and an automated check of the extracted data for consistency with the original PDF data.

The data in the PDF documents were extracted and stored in CSV format using a cloud-based service (www.pdftables.com). Although the automated extraction produced reasonable results, variations in the tabular representation remained, for example due to incorrectly identified columns or rows. Moreover, the relationships between the entries are not represented in the CSV. A second, significantly more complex step is therefore necessary to rectify the known difficulties resulting from the parsing process. This step transforms the data so that they accurately mirror the original PDF in terms of how the pieces of information relate to one another. The structure of, and the relations between, all information within the PDF can be formalised in a schema language (JSON schema) that captures the document structure and the types of content.

The second step – the transformation – increases robustness in terms of mapping the detected elements with respect to the JSON schema. The general procedure here is to separate the following:

The transformation process is based on regular expressions that indicate the beginning or end of a section, and on specific rules that deal with the tabular form and contextual relations. These rules are defined so that they capture all common variations in column size. They can filter out irrelevant information and deal with variations in notation or with typos. Some typos are captured automatically; others cannot be distinguished syntactically and need manual, contextual and semantic correction.
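To make the idea of section-delimiting regular expressions concrete, here is a minimal sketch. It is not the project's actual rule set: the heading labels in SECTION_START and the helper split_sections are assumptions made up for illustration, and the real content overviews may use different labels.

```python
import re

# Hypothetical section headings, assumed for illustration only.
# The pattern tolerates leading/trailing whitespace, an optional colon
# and differences in capitalisation.
SECTION_START = re.compile(
    r"^\s*(?P<name>Gesetzesmaterialien|Sachregister|Inhaltsverzeichnis)\s*:?\s*$",
    re.IGNORECASE,
)


def split_sections(lines):
    """Group raw lines under the most recent heading matched by SECTION_START."""
    sections = {}
    current = None
    for line in lines:
        match = SECTION_START.match(line)
        if match:
            # A heading starts a new section.
            current = match.group("name").lower()
            sections.setdefault(current, [])
        elif current is not None and line.strip():
            # Non-empty lines belong to the section that is currently open.
            sections[current].append(line.strip())
    return sections


sample = [
    "Inhaltsverzeichnis",
    "  1  Gesetzentwurf",
    "",
    "gesetzesmaterialien:",
    "  2  Beschlussempfehlung ",
]
sections = split_sections(sample)
```

Lines that appear before the first recognised heading are simply skipped in this sketch; a real pipeline would more likely report them.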

Internally, the resulting JSON structure is built up while the information of interest is being extracted. The law-making material is the most difficult part of the document. All responsible institutions have to be identified correctly, and every consecutive number has to be assigned to the right institution. Moreover, additional entries (i.e. enclosures) that do not have their own consecutive number but relate to one still need to be assigned. All lines that are not handled by any rule are written to the error output of the process for manual monitoring and debugging.
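The assignment logic described above can be sketched roughly as follows. The line formats (an institution name on its own line, "number text" entries, enclosures marked with "Anlage") and the function build_material are assumptions for illustration, not the formats or code used in the project.

```python
import io
import re
import sys

# Hypothetical line formats, assumed for illustration only.
INSTITUTION = re.compile(r"^(Bundestag|Bundesrat|Bundesregierung)$")
NUMBERED = re.compile(r"^(\d+)\s+(.+)$")
ENCLOSURE = re.compile(r"^(?:Anlage|Enclosure)\s*:?\s*(.+)$", re.IGNORECASE)


def build_material(lines, err=sys.stderr):
    """Nest numbered entries under institutions and enclosures under entries."""
    material = []
    institution = None  # institution block currently being filled
    entry = None        # last numbered entry, the target for enclosures
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if INSTITUTION.match(line):
            institution = {"institution": line, "entries": []}
            material.append(institution)
            entry = None
        elif (m := NUMBERED.match(line)) and institution is not None:
            entry = {"lfdnr": int(m.group(1)), "text": m.group(2), "enclosures": []}
            institution["entries"].append(entry)
        elif (m := ENCLOSURE.match(line)) and entry is not None:
            # Enclosures carry no number of their own; attach them to the last entry.
            entry["enclosures"].append(m.group(1))
        else:
            # Anything no rule handled goes to the error output for manual review.
            print(f"UNHANDLED: {line}", file=err)
    return material


sample = [
    "Bundesrat",
    "1 Gesetzentwurf",
    "Anlage: Stellungnahme",
    "Bundestag",
    "2 Erste Beratung",
    "???",
]
log = io.StringIO()  # stands in for stderr so the output can be inspected
material = build_material(sample, err=log)
```

Keeping the unhandled lines in a separate stream, rather than dropping them, is what makes the later manual check feasible.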

The schema-conformant documents can still be refined further, for example to correct errors contained in the original PDF document or issues that arose during transformation. Some errors cannot be resolved at the syntactic level and require contextual or semantic information to decide. Manual refinement can be done with two different editors. The first is a form-based editor for JSON documents that has been generated from the JSON schema. The second is a text-based JSON editor. The former ensures that any user can create syntactically valid documents, while the latter warns users when they violate the JSON schema during editing.
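The kind of warning a text-based editor gives can be illustrated with a deliberately small check. This is a hand-rolled stand-in for real JSON Schema validation, not the project's schema: REQUIRED_FIELDS and find_violations are hypothetical and cover only required fields and their basic types.

```python
import json

# Hypothetical, heavily reduced stand-in for the project's JSON schema.
REQUIRED_FIELDS = {"id": int, "title": str, "material": list}


def find_violations(document):
    """Return human-readable messages for missing or wrongly typed fields."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in document:
            problems.append(f"missing field: {name}")
        elif not isinstance(document[name], expected):
            actual = type(document[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    return problems


# A manually edited document in which the id was typed as a string
# and the material list was left out.
edited = json.loads('{"id": "130528", "title": "Example bill"}')
violations = find_violations(edited)
```

A full validator would also check nested structures and value ranges; this sketch only shows why a schema catches slips that are easy to make when editing by hand.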

...from DIP Bundestag

The Parlamentsdokumentation (parliamentary documentation) provides access to a database of legislative material (text, plenary session records, etc.) on all initiated bills through its Documentation and Information System (DIP). DIP is a joint information system of the Bundestag and the Bundesrat. It documents parliamentary events in both chambers as recorded in printed papers and stenographic reports. It provides an indispensable foundation for legislative transparency while informing the public about law-making issues and their temporal aspects. The data can be retrieved from HTML pages and exported in XML format.

...from Further Sources

To analyse the causes and effects of a legislature’s temporality, further information is included in the database:

Resulting Variables

In the following, I list some variables extracted from the raw corpus. You can explore the data using the data explorer.

The dataset to explore in the data explorer contains all promulgated laws from the 13th up to the 16th legislative period of the German Bundestag (1994–2013). The data and variables will be extended step by step. News can be found in the blog.

The data cannot be downloaded yet. Contact me if you are interested in using the data.

All datetime variables (dd-mm-yyyy) have a second variable ending in "_numeric" so that they can be explored within the data explorer, because SandDance lacks datetime support (it treats datetimes only as categorical variables).
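How a "_numeric" companion can be derived from a dd-mm-yyyy value is sketched below. The encoding actually used in the database is not stated here; counting whole days since 1 January 1970 is purely an assumption, as is the helper name to_numeric.

```python
from datetime import date, datetime

# Assumed reference date; the project's real numeric encoding may differ.
EPOCH = date(1970, 1, 1)


def to_numeric(value):
    """Convert a dd-mm-yyyy string to whole days since 1970-01-01."""
    parsed = datetime.strptime(value, "%d-%m-%Y").date()
    return (parsed - EPOCH).days


# Illustrative value only, not taken from the database.
begin_numeric = to_numeric("10-11-1994")
```

Any monotone encoding would do for the data explorer, since the point is only that later dates map to larger numbers and can be placed on a continuous axis.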

General Information

title: title of the bill
id: identification number of the bill, constructed from the Parliamentary Archives' signature of the bill; 130528 is signature XIII/528
lp: legislative period of the Bundestag
initiative: initiating actor
ini_g: grouped initiator; grouped by BReg (government), BT (Bundestag), BR (Bundesrat)
gesta: GESTA number of a specific legislative process
gesta_g: GESTA number grouped by subject field
committee: list of all committees involved in a legislative process
committee_lead: name of the leading committee
mandatory: shows whether the bill falls under the mandatory procedure (instead of the consecutive procedure)
verma: shows whether the bill has undergone a mediation procedure
div_gov: shows the level of divided government between the federal government and the Bundesrat
ini_reg: shows the level of party cohesion between the federal government and the initiator (for bills initiated by the Bundesrat, as each Bundesrat bill is first brought in by a single Land or a group of German Länder)
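The relation between id and archive signature can be illustrated with a short sketch. It generalises from the single example given above (130528 is XIII/528) and assumes that the last four digits hold the running number and the leading digits the legislative period; both this rule and the helper names are assumptions.

```python
# Roman numeral symbols; sufficient for period numbers below 40.
ROMAN = [(10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]


def to_roman(number):
    """Roman numeral for a small positive integer (valid below 40)."""
    result = []
    for value, symbol in ROMAN:
        while number >= value:
            result.append(symbol)
            number -= value
    return "".join(result)


def id_to_signature(bill_id):
    """Split an id such as 130528 into period 13 and running number 528."""
    period, running = divmod(int(bill_id), 10_000)
    return f"{to_roman(period)}/{running}"


signature = id_to_signature(130528)  # "XIII/528", the example from the variable list
```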

Specific Information of the Process

num_lfdnr: number of consecutive numbers listed within the legislative materials of the archives' content overview
num_activity: number of events within a legislative process (e.g. committee meetings, plenary sessions)
begin / begin_numeric: date of the first introduction of a bill; depending on the initiator, this takes place in the Bundesrat or the Bundestag
end / end_numeric: date of the last decision on the bill; depending on the process, this takes place in the Bundesrat or the Bundestag
reached_bt / reached_bt_numeric: day when the bill reached the Bundestag for the first time
first_read / first_read_numeric: day of the first reading in a Bundestag plenary session
second_read / second_read_numeric: day of the second reading in a Bundestag plenary session
third_read / third_read_numeric: day of the third reading in a Bundestag plenary session
latest_phase: latest stage of the law-making process that a bill reached: 0 = reached Bundestag, 1 = reached first reading of the Bundestag, 2 = reached second reading of the Bundestag, 3 = reached third reading of the Bundestag
latest_date / latest_date_numeric: day of the latest_phase
duration: days between begin and end
num_beats: number of goal-oriented process events between begin and end (begin and end themselves are not counted); goal-oriented means events that move a bill towards adoption, so adjournments are excluded here although they are included in num_activity
speed: num_beats divided by duration
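As a worked illustration of duration and speed, the following sketch derives both from begin, end and num_beats. The dates and the beat count are invented, the function process_speed is hypothetical, and returning None for a non-positive duration is my own choice rather than documented behaviour of the database.

```python
from datetime import datetime


def process_speed(begin, end, num_beats):
    """Return (duration in days, beats per day) for one legislative process."""
    fmt = "%d-%m-%Y"
    duration = (datetime.strptime(end, fmt) - datetime.strptime(begin, fmt)).days
    if duration <= 0:
        # Speed is undefined when begin and end fall on the same day.
        return duration, None
    return duration, num_beats / duration


# Invented example: 4 goal-oriented events within 10 days.
duration, speed = process_speed("01-03-2005", "11-03-2005", 4)
```

In this example the duration is 10 days and the speed is 0.4 beats per day, which shows how two laws of equal duration can still differ in pace.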