38,476 research outputs found

    Towards an Open Platform for Legal Information

    Full text link
    Recent advances in the area of legal information systems have led to a variety of applications that promise support in processing and accessing legal documents. Unfortunately, these applications have various limitations, e.g., regarding scope or extensibility. Furthermore, we do not observe a trend towards open access in digital libraries in the legal domain as we do in other domains, e.g., economics or computer science. To improve open access in the legal domain, we present our approach for an open-source platform to transparently process and access Legal Open Data. This enables the sustainable development of legal applications by offering a single technology stack. Moreover, the approach facilitates the development and deployment of new technologies. As a proof of concept, we implemented six technologies and generated metadata for more than 250,000 German laws and court decisions. Thus, we can provide users of our platform not only access to legal documents, but also to the information contained in them. Comment: Accepted at ACM/IEEE Joint Conference on Digital Libraries (JCDL) 202

    Datasets for Portuguese Legal Semantic Textual Similarity: Comparing weak supervision and an annotation process approaches

    Full text link
    The Brazilian judiciary has a large workload, resulting in long times to conclude legal proceedings. The Brazilian National Council of Justice established, in Resolution 469/2022, formal guidance for document and process digitalization, opening up the possibility of using automatic techniques to help with everyday tasks in the legal field, particularly given the large number of texts produced in routine legal procedures. Notably, Artificial Intelligence (AI) techniques allow for processing and extracting useful information from textual data, potentially speeding up such procedures. However, the datasets from the legal domain required by several AI techniques are scarce and difficult to obtain, as they need labels from experts. To address this challenge, this article contributes four datasets from the legal domain: two with documents and metadata but no labels, and two labeled with a heuristic intended for use in textual semantic similarity tasks. In addition, to evaluate the effectiveness of the proposed heuristic labeling process, this article presents a small ground-truth dataset generated from domain-expert annotations. The analysis of ground-truth labels highlights that semantic analysis of domain text can be challenging even for domain experts. The comparison between ground-truth and heuristic labels also shows that heuristic labels are useful.
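    The abstract above does not specify the heuristic itself. As a purely illustrative sketch of weak-supervision labeling for semantic textual similarity, one common approach assigns a label from lexical overlap; the `cosine_sim` and `heuristic_label` helpers and the 0.5 threshold below are assumptions for illustration, not the paper's actual heuristic.

```python
import math
from collections import Counter

def cosine_sim(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def heuristic_label(a: str, b: str, threshold: float = 0.5) -> int:
    """Weak-supervision label: 1 = similar, 0 = not similar."""
    return 1 if cosine_sim(a, b) >= threshold else 0

pair = ("the court granted the appeal", "the appeal was granted by the court")
print(heuristic_label(*pair))  # high lexical overlap -> labeled 1
```

    Labels produced this way are noisy, which is precisely why the article compares them against a small expert-annotated ground truth.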

    A Query System for Extracting Requirements-related Information from Legal Texts

    Get PDF
    Searching legal texts for relevant information is a complex and expensive activity. The search solutions offered by present-day legal portals are targeted primarily at legal professionals. These solutions are not adequate for requirements analysts whose objective is to extract domain knowledge, including stakeholders, rights and duties, and business processes, that is relevant to legal requirements. Semantic Web technologies now enable smart search capabilities and can be exploited to help requirements analysts in elaborating legal requirements. In our previous work, we developed an automated framework for extracting semantic metadata from legal texts. In this paper, we investigate the use of our metadata extraction framework as an enabler for smart legal search with a focus on requirements engineering activities. We report on our industrial experience helping the Government of Luxembourg provide an advanced search facility over Luxembourg’s Income Tax Law. The experience shows that semantic legal metadata can be successfully exploited for answering requirements engineering-related legal queries. Our results also suggest that our conceptualization of semantic legal metadata can be further improved with new information elements and relations.
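    To make the idea of querying over semantic legal metadata concrete, here is a minimal toy sketch; the provision records, field names (`modality`, `agent`), and `query` helper are illustrative assumptions, not the paper's framework or query language.

```python
# Hypothetical metadata-annotated provisions; field names are illustrative only.
provisions = [
    {"article": "Art. 96", "modality": "obligation", "agent": "taxpayer"},
    {"article": "Art. 104", "modality": "right", "agent": "taxpayer"},
    {"article": "Art. 110", "modality": "obligation", "agent": "administration"},
]

def query(provisions, **criteria):
    """Return provisions whose semantic metadata match all given criteria,
    a toy stand-in for smart search over annotated legal text."""
    return [p for p in provisions if all(p.get(k) == v for k, v in criteria.items())]

hits = query(provisions, modality="obligation", agent="taxpayer")
print([p["article"] for p in hits])  # ['Art. 96']
```

    A real system would run such queries against a knowledge base (e.g., via SPARQL) rather than an in-memory list, but the filtering-by-metadata principle is the same.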

    FastDoc: Domain-Specific Fast Pre-training Technique using Document-Level Metadata and Taxonomy

    Full text link
    As the demand for sophisticated Natural Language Processing (NLP) models continues to grow, so does the need for efficient pre-training techniques. Current NLP models undergo resource-intensive pre-training. In response, we introduce FastDoc (Fast Pre-training Technique using Document-Level Metadata and Taxonomy), a novel approach designed to significantly reduce computational demands. FastDoc leverages document metadata and domain-specific taxonomy as supervision signals. It involves continual pre-training of an open-domain transformer encoder using sentence-level embeddings, followed by fine-tuning using token-level embeddings. We evaluate FastDoc on six tasks across nine datasets spanning three distinct domains. Remarkably, FastDoc achieves compute reductions of approximately 1,000x, 4,500x, and 500x compared to competitive approaches in the Customer Support, Scientific, and Legal domains, respectively. Importantly, these efficiency gains do not compromise performance relative to competitive baselines. Furthermore, reduced pre-training data mitigates catastrophic forgetting, ensuring consistent performance in open-domain scenarios. FastDoc offers a promising solution for resource-efficient pre-training, with potential applications spanning various domains. Comment: 38 pages, 7 figure
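    Using metadata and taxonomy as a supervision signal typically means treating documents that share a category as related during pre-training. The sketch below shows only that pair-construction step; the toy corpus, category labels, and `metadata_pairs` helper are assumptions for illustration, not FastDoc's actual training procedure.

```python
from itertools import combinations

# Hypothetical toy corpus: (doc_id, taxonomy_category) metadata records.
docs = [
    ("d1", "tax/income"),
    ("d2", "tax/income"),
    ("d3", "contract"),
    ("d4", "contract"),
    ("d5", "tax/vat"),
]

def metadata_pairs(docs):
    """Build positive pairs (same taxonomy category) and negative pairs
    (different categories) as weak supervision for pre-training."""
    pos, neg = [], []
    for (a, ca), (b, cb) in combinations(docs, 2):
        (pos if ca == cb else neg).append((a, b))
    return pos, neg

pos, neg = metadata_pairs(docs)
print(len(pos), len(neg))  # 2 positive pairs, 8 negative pairs
```

    Pairs like these could then feed a contrastive or similarity loss over sentence-level document embeddings, which is far cheaper than token-level language modeling over the full corpus.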

    Moving data into and out of an institutional repository: Off the map and into the territory

    Get PDF
    Given the recent proliferation of institutional repositories, a key strategic question is how multiple institutions (repositories, archives, universities, and others) can best work together to manage and preserve research data. In 2007, Green and Gutmann proposed how partnerships among social science researchers, institutional repositories, and domain repositories should work. This paper uses the Timescapes Archive, a new collection of qualitative longitudinal data, to examine the challenges of working across institutions in order to move data into and out of institutional repositories. The Timescapes Archive both tests and extends their framework by focusing on the specific case of qualitative longitudinal research and by highlighting researchers' roles across all phases of data preservation and sharing. Topics of metadata, ethical data sharing, and preservation are discussed in detail. What emerges from the work to date is the extremely complex nature of the coordination required among the agents; getting the timing right is both critical and difficult. Coordination among three agents is likely to be challenging under any circumstances and becomes more so when the trajectories of different life cycles, for research projects and for data sharing, are considered. Timescapes exposed some structural tensions that, although they cannot be eliminated, can be effectively managed.

    Assigning Creative Commons Licenses to Research Metadata: Issues and Cases

    Get PDF
    This paper discusses the problem of unclear licensing and non-transparent usage terms and conditions for research metadata. Making research data connected, discoverable, and reusable is a key enabler of the new data revolution in research. We discuss how the lack of transparency hinders the discovery of research data and disconnects it from publications and other trusted research outcomes. In addition, we discuss the application of Creative Commons licenses to research metadata and provide examples of the applicability of this approach to internationally known data infrastructures. Comment: 9 pages. Submitted to the 29th International Conference on Legal Knowledge and Information Systems (JURIX 2016), Nice (France), 14-16 December 201
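    In practice, attaching a Creative Commons license to a metadata record can be as simple as a machine-readable license field. The record below is a hedged sketch: the field names and sample values are illustrative, not taken from any specific repository schema discussed in the paper.

```python
import json

# Illustrative metadata record with an explicit CC0 license URL, making the
# reuse terms of the *metadata itself* machine-readable and transparent.
record = {
    "title": "Sample research dataset",
    "creator": "Example Research Group",
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
}
print(json.dumps(record, indent=2))
```

    A harvester encountering this record can check the `license` field programmatically instead of guessing at implicit terms, which is the transparency gap the paper argues against.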

    JISC Preservation of Web Resources (PoWR) Handbook

    Get PDF
    Handbook of Web Preservation produced by the JISC-PoWR project, which ran from April to November 2008. The handbook specifically addresses digital preservation issues that are relevant to the UK HE/FE web management community. The project was undertaken jointly by UKOLN at the University of Bath and the ULCC Digital Archives department.