Towards an Open Platform for Legal Information
Recent advances in the area of legal information systems have led to a
variety of applications that promise support in processing and accessing legal
documents. Unfortunately, these applications have various limitations, e.g.,
regarding scope or extensibility. Furthermore, we do not observe a trend
towards open access in digital libraries in the legal domain as we observe in
other domains, e.g., economics or computer science. To improve open access in
the legal domain, we present our approach for an open source platform to
transparently process and access Legal Open Data. This enables the sustainable
development of legal applications by offering a single technology stack.
Moreover, the approach facilitates the development and deployment of new
technologies. As proof of concept, we implemented six technologies and
generated metadata for more than 250,000 German laws and court decisions. Thus,
we can provide users of our platform not only access to legal documents, but
also to the information they contain.
Comment: Accepted at ACM/IEEE Joint Conference on Digital Libraries (JCDL) 202
Datasets for Portuguese Legal Semantic Textual Similarity: Comparing weak supervision and an annotation process approaches
The Brazilian judiciary has a large workload, resulting in a long time to
finish legal proceedings. The Brazilian National Council of Justice has
established formal guidance for document and process digitization in
Resolution 469/2022, opening up the possibility of using automatic techniques
to help with everyday tasks in the legal field, particularly given the large
volume of text produced in routine legal procedures. Notably, Artificial
Intelligence (AI) techniques
allow for processing and extracting useful information from textual data,
potentially speeding up the process. However, datasets from the legal domain
required by several AI techniques are scarce and difficult to obtain as they
need labels from experts. To address this challenge, this article contributes
with four datasets from the legal domain: two with documents and metadata but
no labels, and two labeled with a heuristic and intended for use in semantic
textual similarity tasks. Also, to evaluate the effectiveness of the
proposed heuristic label process, this article presents a small ground truth
dataset generated from domain expert annotations. The analysis of ground truth
labels highlights that semantic analysis of domain text can be challenging even
for domain experts. Also, the comparison between ground truth and heuristic
labels shows that the heuristic labels are useful.
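The article's actual heuristic is not reproduced in the abstract, so the following is only a minimal sketch of the general weak-supervision idea it describes: pairs of legal texts receive a weak similarity label from a cheap surrogate signal, here lexical overlap (Jaccard similarity over token sets). The threshold and the example pairs are illustrative assumptions, not taken from the datasets.

```python
# Hedged sketch of weak supervision for semantic textual similarity:
# a cheap heuristic (lexical overlap) assigns labels without experts.

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two texts."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def weak_label(a: str, b: str, threshold: float = 0.5) -> int:
    """Weakly label a pair as similar (1) or dissimilar (0)."""
    return 1 if jaccard(a, b) >= threshold else 0

# Illustrative pairs only; the real datasets are Portuguese legal texts.
pairs = [
    ("recurso especial provido", "recurso especial provido em parte"),
    ("habeas corpus denegado", "embargos de declaracao rejeitados"),
]
labels = [weak_label(a, b) for a, b in pairs]  # → [1, 0]
```

In a pipeline like the one the article evaluates, such weak labels would train a similarity model, with a small expert-annotated ground-truth set reserved for checking how well the heuristic holds up.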
A Query System for Extracting Requirements-related Information from Legal Texts
Searching legal texts for relevant information is a complex and expensive activity. The search solutions offered by present-day legal portals are targeted primarily at legal professionals. These solutions are not adequate for requirements analysts whose objective is to extract domain knowledge including stakeholders, rights and duties, and business processes that are relevant to legal requirements. Semantic Web technologies now enable smart search capabilities and can be exploited to help requirements analysts in elaborating legal requirements.
In our previous work, we developed an automated framework for extracting semantic metadata from legal texts. In this paper, we investigate the use of our metadata extraction framework as an enabler for smart legal search with a focus on requirements engineering activities. We report on our industrial experience helping the Government of Luxembourg provide an advanced search facility over Luxembourg’s Income Tax Law. The experience shows that semantic legal metadata can be successfully exploited for answering requirements engineering-related legal queries. Our results also suggest that our conceptualization of semantic legal metadata can be further improved with new information elements and relations.
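The framework itself is not shown in the abstract, so the sketch below only illustrates the underlying idea of smart legal search: provisions annotated with requirements-relevant semantic metadata (e.g., rights and obligations) can be queried by annotation type rather than by full-text search alone. The annotation schema, field names, and example provisions are hypothetical.

```python
# Hedged sketch: querying legal text by (hypothetical) semantic
# metadata annotations instead of plain keyword search alone.

provisions = [
    {"article": "Art. 12", "kind": "obligation",
     "text": "The taxpayer shall file a return annually."},
    {"article": "Art. 19", "kind": "right",
     "text": "The taxpayer may request a deferral."},
]

def query(provisions, *, kind=None, keyword=None):
    """Return articles matching a metadata kind and/or a keyword."""
    hits = []
    for p in provisions:
        if kind is not None and p["kind"] != kind:
            continue
        if keyword is not None and keyword not in p["text"].lower():
            continue
        hits.append(p["article"])
    return hits

query(provisions, kind="obligation")   # → ["Art. 12"]
query(provisions, keyword="deferral")  # → ["Art. 19"]
```

A requirements analyst could thus retrieve all obligations in a law directly, which plain keyword search over the text would not support.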
: Domain-Specific Fast Pre-training Technique using Document-Level Metadata and Taxonomy
As the demand for sophisticated Natural Language Processing (NLP) models
continues to grow, so does the need for efficient pre-training techniques.
Current NLP models undergo resource-intensive pre-training. In response, we
introduce (Fast Pre-training Technique using Document-Level Metadata
and Taxonomy), a novel approach designed to significantly reduce computational
demands. leverages document metadata and domain-specific taxonomy as
supervision signals. It involves continual pre-training of an open-domain
transformer encoder using sentence-level embeddings, followed by fine-tuning
using token-level embeddings. We evaluate on six tasks across nine
datasets spanning three distinct domains. Remarkably, it achieves
compute reductions of approximately 1,000x, 4,500x, and 500x compared to
competitive approaches in Customer Support, Scientific, and Legal domains,
respectively. Importantly, these efficiency gains do not compromise performance
relative to competitive baselines. Furthermore, reduced pre-training data
mitigates catastrophic forgetting, ensuring consistent performance in
open-domain scenarios. offers a promising solution for
resource-efficient pre-training, with potential applications spanning various
domains.
Comment: 38 pages, 7 figures
Moving data into and out of an institutional repository: Off the map and into the territory
Given the recent proliferation of institutional repositories, a key strategic question is how multiple institutions—repositories, archives, universities and others—can best work together to manage and preserve research data. In 2007, Green and Gutmann proposed how partnerships among social science researchers, institutional repositories and domain repositories should best work. This paper uses the Timescapes Archive—a new collection of qualitative longitudinal data—to examine the challenges of working across institutions in order to move data into and out of institutional repositories. The Timescapes Archive both tests and extends their framework by focusing on the specific case of qualitative longitudinal research and by highlighting researchers' roles across all phases of data preservation and sharing. Topics of metadata, ethical data sharing, and preservation are discussed in detail. What emerged from the work to date is the extremely complex nature of the coordination required among the agents; getting the timing right is both critical and difficult. Coordination among three agents is likely to be challenging under any circumstances and becomes more so when the trajectories of different life cycles, for research projects and for data sharing, are considered. Timescapes exposed some structural tensions that, although they cannot be eliminated, can be effectively managed.
Assigning Creative Commons Licenses to Research Metadata: Issues and Cases
This paper discusses the problem of lack of clear licensing and transparency
of usage terms and conditions for research metadata. Making research data
connected, discoverable and reusable are the key enablers of the new data
revolution in research. We discuss how the lack of transparency hinders
discovery of research data and makes it disconnected from the publication and
other trusted research outcomes. In addition, we discuss the application of
Creative Commons licenses for research metadata, and provide some examples of
the applicability of this approach to internationally known data
infrastructures.
Comment: 9 pages. Submitted to the 29th International Conference on Legal
Knowledge and Information Systems (JURIX 2016), Nice (France), 14-16 December
2016
JISC Preservation of Web Resources (PoWR) Handbook
Handbook of Web Preservation produced by the JISC-PoWR project which ran from April to November 2008.
The handbook specifically addresses digital preservation issues that are relevant to the UK HE/FE web management community.
The project was undertaken jointly by UKOLN at the University of Bath and the ULCC Digital Archives department.