Towards an Open Platform for Legal Information
Recent advances in the area of legal information systems have led to a
variety of applications that promise support in processing and accessing legal
documents. Unfortunately, these applications have various limitations, e.g.,
regarding scope or extensibility. Furthermore, we do not observe a trend
towards open access in digital libraries in the legal domain as we observe in
other domains, e.g., economics or computer science. To improve open access in
the legal domain, we present our approach for an open source platform to
transparently process and access Legal Open Data. This enables the sustainable
development of legal applications by offering a single technology stack.
Moreover, the approach facilitates the development and deployment of new
technologies. As proof of concept, we implemented six technologies and
generated metadata for more than 250,000 German laws and court decisions. Thus,
we can provide users of our platform not only access to legal documents, but
also to the information they contain.
Comment: Accepted at ACM/IEEE Joint Conference on Digital Libraries (JCDL) 202
Datasets for Portuguese Legal Semantic Textual Similarity: Comparing weak supervision and an annotation process approaches
The Brazilian judiciary has a large workload, resulting in a long time to
finish legal proceedings. The Brazilian National Council of Justice has
established formal guidance for document and process digitization in
Resolution 469/2022, opening up the possibility of using automatic techniques
to help with everyday tasks in the legal field, particularly given the large
volume of text produced in routine legal procedures. Notably, Artificial
Intelligence (AI) techniques
allow for processing and extracting useful information from textual data,
potentially speeding up the process. However, datasets from the legal domain
required by several AI techniques are scarce and difficult to obtain as they
need labels from experts. To address this challenge, this article contributes
with four datasets from the legal domain: two with documents and metadata but
no labels, and two labeled with a heuristic and intended for use in semantic
textual similarity tasks. Also, to evaluate the effectiveness of the
proposed heuristic label process, this article presents a small ground truth
dataset generated from domain expert annotations. The analysis of ground truth
labels highlights that semantic analysis of domain text can be challenging even
for domain experts. Also, the comparison between ground truth and heuristic
labels shows that the heuristic labels are useful.
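The article's actual heuristic is not reproduced in the abstract, so the following is only a minimal sketch of the general weak-supervision idea it describes: pairs of legal texts receive a weak similarity label from a cheap surrogate signal, here lexical overlap (Jaccard similarity over token sets). The threshold and the example pairs are illustrative assumptions, not taken from the datasets.

```python
# Hedged sketch of weak supervision for semantic textual similarity:
# a cheap heuristic (lexical overlap) assigns labels without experts.

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two texts."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def weak_label(a: str, b: str, threshold: float = 0.5) -> int:
    """Weakly label a pair as similar (1) or dissimilar (0)."""
    return 1 if jaccard(a, b) >= threshold else 0

# Illustrative pairs only; the real datasets are Portuguese legal texts.
pairs = [
    ("recurso especial provido", "recurso especial provido em parte"),
    ("habeas corpus denegado", "embargos de declaracao rejeitados"),
]
labels = [weak_label(a, b) for a, b in pairs]  # → [1, 0]
```

In a pipeline like the one the article evaluates, such weak labels would train a similarity model, with a small expert-annotated ground-truth set reserved for checking how well the heuristic holds up.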
A Query System for Extracting Requirements-related Information from Legal Texts
Searching legal texts for relevant information is a complex and expensive activity. The search solutions offered by present-day legal portals are targeted primarily at legal professionals. These solutions are not adequate for requirements analysts whose objective is to extract domain knowledge including stakeholders, rights and duties, and business processes that are relevant to legal requirements. Semantic Web technologies now enable smart search capabilities and can be exploited to help requirements analysts in elaborating legal requirements.
In our previous work, we developed an automated framework for extracting semantic metadata from legal texts. In this paper, we investigate the use of our metadata extraction framework as an enabler for smart legal search with a focus on requirements engineering activities. We report on our industrial experience helping the Government of Luxembourg provide an advanced search facility over Luxembourg’s Income Tax Law. The experience shows that semantic legal metadata can be successfully exploited for answering requirements engineering-related legal queries. Our results also suggest that our conceptualization of semantic legal metadata can be further improved with new information elements and relations.
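The framework itself is not shown in the abstract, so the sketch below only illustrates the underlying idea of smart legal search: provisions annotated with requirements-relevant semantic metadata (e.g., rights and obligations) can be queried by annotation type rather than by full-text search alone. The annotation schema, field names, and example provisions are hypothetical.

```python
# Hedged sketch: querying legal text by (hypothetical) semantic
# metadata annotations instead of plain keyword search alone.

provisions = [
    {"article": "Art. 12", "kind": "obligation",
     "text": "The taxpayer shall file a return annually."},
    {"article": "Art. 19", "kind": "right",
     "text": "The taxpayer may request a deferral."},
]

def query(provisions, *, kind=None, keyword=None):
    """Return articles matching a metadata kind and/or a keyword."""
    hits = []
    for p in provisions:
        if kind is not None and p["kind"] != kind:
            continue
        if keyword is not None and keyword not in p["text"].lower():
            continue
        hits.append(p["article"])
    return hits

query(provisions, kind="obligation")   # → ["Art. 12"]
query(provisions, keyword="deferral")  # → ["Art. 19"]
```

A requirements analyst could thus retrieve all obligations in a law directly, which plain keyword search over the text would not support.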
: Domain-Specific Fast Pre-training Technique using Document-Level Metadata and Taxonomy
As the demand for sophisticated Natural Language Processing (NLP) models
continues to grow, so does the need for efficient pre-training techniques.
Current NLP models undergo resource-intensive pre-training. In response, we
introduce (Fast Pre-training Technique using Document-Level Metadata
and Taxonomy), a novel approach designed to significantly reduce computational
demands. leverages document metadata and domain-specific taxonomy as
supervision signals. It involves continual pre-training of an open-domain
transformer encoder using sentence-level embeddings, followed by fine-tuning
using token-level embeddings. We evaluate on six tasks across nine
datasets spanning three distinct domains. Remarkably, it achieves
compute reductions of approximately 1,000x, 4,500x, and 500x compared to
competitive approaches in Customer Support, Scientific, and Legal domains,
respectively. Importantly, these efficiency gains do not compromise performance
relative to competitive baselines. Furthermore, reduced pre-training data
mitigates catastrophic forgetting, ensuring consistent performance in
open-domain scenarios. offers a promising solution for
resource-efficient pre-training, with potential applications spanning various
domains.
Comment: 38 pages, 7 figures
Moving data into and out of an institutional repository: Off the map and into the territory
Given the recent proliferation of institutional repositories, a key strategic question is how multiple institutions—repositories, archives, universities and others—can best work together to manage and preserve research data. In 2007, Green and Gutmann proposed how partnerships among social science researchers, institutional repositories and domain repositories should best work. This paper uses the Timescapes Archive—a new collection of qualitative longitudinal data—to examine the challenges of working across institutions in order to move data into and out of institutional repositories. The Timescapes Archive both tests and extends their framework by focusing on the specific case of qualitative longitudinal research and by highlighting researchers' roles across all phases of data preservation and sharing. Topics of metadata, ethical data sharing, and preservation are discussed in detail. What emerged from the work to date is the extremely complex nature of the coordination required among the agents; getting the timing right is both critical and difficult. Coordination among three agents is likely to be challenging under any circumstances and becomes more so when the trajectories of different life cycles, for research projects and for data sharing, are considered. Timescapes exposed some structural tensions that, although they cannot be eliminated, can be effectively managed.
Assigning Creative Commons Licenses to Research Metadata: Issues and Cases
This paper discusses the problem of lack of clear licensing and transparency
of usage terms and conditions for research metadata. Making research data
connected, discoverable and reusable are the key enablers of the new data
revolution in research. We discuss how the lack of transparency hinders
discovery of research data and makes it disconnected from the publication and
other trusted research outcomes. In addition, we discuss the application of
Creative Commons licenses for research metadata, and provide some examples of
the applicability of this approach to internationally known data
infrastructures.
Comment: 9 pages. Submitted to the 29th International Conference on Legal
Knowledge and Information Systems (JURIX 2016), Nice (France), 14-16 December
2016
JISC Preservation of Web Resources (PoWR) Handbook
Handbook of Web Preservation produced by the JISC-PoWR project which ran from April to November 2008.
The handbook specifically addresses digital preservation issues that are relevant to the UK HE/FE web management community.
The project was undertaken jointly by UKOLN at the University of Bath and the ULCC Digital Archives department.