Analysis of legal networks
This report describes the main electronically available sources of law in the three target countries of the Openlaws.eu project (Austria, the Netherlands and the United Kingdom), plus those of the EU. It describes their strengths and weaknesses in terms of available data, formats and licensing. Since the world is dynamic, especially that of electronic data, the document was originally set up as a set of spreadsheets and a website, which are easier to maintain and update. This deliverable contains a snapshot of the status of these documents at the end of December 2014.
PTPARL-D: Annotated Corpus of 44 years of Portuguese Parliament debates
In a representative democracy, some decide in the name of the rest, and these elected officials are commonly gathered in public assemblies, such as parliaments, where they discuss policies, legislate, and vote on fundamental initiatives. A core aspect of such democratic processes is the plenary debates, where important public discussions take place. Many parliaments around the world are increasingly keeping the transcripts of such debates, and other parliamentary data, in digital formats accessible to the public, increasing transparency and accountability. Furthermore, some parliaments are bringing old paper transcripts to semi-structured digital formats. However, these records are often only provided as raw text or even as images, with little to no annotation and in inconsistent formats, making them difficult to analyze and study, and reducing both transparency and public reach. Here, we present PTPARL-D, an annotated corpus of debates in the Portuguese Parliament, from 1976 to 2019, covering the entire period of Portuguese democracy.
The ParlaMint corpora of parliamentary proceedings
This paper presents the ParlaMint corpora, which contain transcriptions of the sessions of 17 European national parliaments, totalling half a billion words. The corpora are uniformly encoded, contain rich metadata about 11 thousand speakers, and are linguistically annotated following the Universal Dependencies formalism and with named entities. Samples of the corpora and conversion scripts are available from the project’s GitHub repository, and the complete corpora are openly available via the CLARIN.SI repository for download, as well as through the NoSketch Engine and KonText concordancers and the Parlameter interface for on-line exploration and analysis.
The Parla-CLARIN Recommendations for Encoding Corpora of Parliamentary Proceedings
Parliamentary proceedings are a rich source of data that can be used by scholars in various humanities and social sciences disciplines. Unlike the sources of most other language corpora, parliamentary proceedings are not subject to copyright or personal privacy protections, and are typically available online, thus making them ideal for compilation into corpora and for open distribution. For these reasons many countries have already produced corpora of parliamentary proceedings, but each typically in its own encoding, limiting their comparability and utilization in a multilingual setting. In this paper we propose an encoding schema which could serve as an interchange format for parliamentary corpora compiled for the purposes of scholarly investigations. The schema, called Parla-CLARIN, was developed within the CLARIN research infrastructure, and is written as a TEI ODD which includes a TEI customization and prose guidelines with examples of use. We discuss the coverage and choices made in designing the recommendations, and give an overview of the guidelines. We also discuss two other standard schemas for encoding parliamentary data, Akoma Ntoso and RDF, and their relation to Parla-CLARIN. We conclude by presenting corpora already encoded in Parla-CLARIN and discussing further work, especially the provision of a set of example documents and of transformation scripts that would make the proposed encoding more usable.
CLARIN
The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure (CLARIN) for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and challenges that CLARIN will tackle in the future. The book is published 10 years after CLARIN was established as a European Research Infrastructure Consortium.