8,409 research outputs found
Legal knowledge acquisition and multimedia applications
Search, retrieval, and management of multimedia contents are challenging tasks for users and researchers alike. The aim of the e-sentencias Project is to develop a software-hardware system for the global management of the multimedia contents produced by the Spanish Civil Courts. We apply technologies such as the Semantic Web, ontologies, NLP techniques, audio-video segmentation and IR. The ultimate goal is to obtain an automatic classification of images and segments of the audiovisual records that, coupled with textual semantics, allows efficient navigation and retrieval of judicial documents and additional legal sources
Datasets for Portuguese Legal Semantic Textual Similarity: Comparing weak supervision and an annotation process approaches
The Brazilian judiciary has a large workload, resulting in a long time to
finish legal proceedings. Brazilian National Council of Justice has established
in Resolution 469/2022 formal guidance for document and process digitalization
opening up the possibility of using automatic techniques to help with everyday
tasks in the legal field, particularly with the large number of texts produced in
routine legal procedures. Notably, Artificial Intelligence (AI) techniques
allow for processing and extracting useful information from textual data,
potentially speeding up the process. However, datasets from the legal domain
required by several AI techniques are scarce and difficult to obtain as they
need labels from experts. To address this challenge, this article contributes
four datasets from the legal domain: two with documents and metadata but
unlabeled, and another two labeled with a heuristic intended for use in
textual semantic similarity tasks. Also, to evaluate the effectiveness of the
proposed heuristic label process, this article presents a small ground truth
dataset generated from domain expert annotations. The analysis of ground truth
labels highlights that semantic analysis of domain text can be challenging even
for domain experts. Also, the comparison between ground truth and heuristic
labels shows that heuristic labels are useful
D10.1.1. Before analysis
WP Case study - Intelligent integrated decision support for legal professionals. The objective of this document is to study the determining factors that exist in the legal domain in Spain that can affect the achievement of a successful application in the Legal Case Study in the SEKT project. To do this, several surveys are presented, such as a user analysis, a domain analysis, a requirements analysis, a state of the art on legal applications and a state of the art on legal ontologies
SCALE: Scaling up the Complexity for Advanced Language Model Evaluation
Recent strides in Large Language Models (LLMs) have saturated many NLP benchmarks (even professional domain-specific ones), emphasizing the need for novel, more challenging ones to properly assess LLM capabilities. In this paper, we introduce a novel NLP benchmark that poses challenges to current LLMs across four key dimensions: processing long documents (up to 50K tokens), utilizing domain specific knowledge (embodied in legal texts), multilingual understanding (covering five languages), and multitasking (comprising legal document to document Information Retrieval, Court View Generation, Leading Decision Summarization, Citation Extraction, and eight challenging Text Classification tasks). Our benchmark comprises diverse legal NLP datasets from the Swiss legal system, allowing for a comprehensive study of the underlying Non-English, inherently multilingual, federal legal system. Despite recent advances, efficiently processing long documents for intense review/analysis tasks remains an open challenge for language models. Also, comprehensive, domain-specific benchmarks requiring high expertise to develop are rare, as are multilingual benchmarks. This scarcity underscores our contribution’s value, considering most public models are trained predominantly on English corpora, while other languages remain understudied, particularly for practical domain-specific NLP tasks. Our benchmark allows for testing and advancing the state-of-the-art LLMs. As part of our study, we evaluate several pre-trained multilingual language models on our benchmark to establish strong baselines as a point of reference. Despite the large size of our datasets (tens to hundreds of thousands of examples), existing publicly available models struggle with most tasks, even after in-domain pretraining. We publish all resources (benchmark suite, pre-trained models, code) under a fully permissive open CC BY-SA license
Court Judgment Decision Support System Based on Medical Text Mining
Medical damage is a common problem faced by hospitals around the world and is widely watched by countries and the World Health Organization. As the number of medical damage dispute lawsuits rapidly grows, many countries face the problem of how to improve the efficiency of the judicial system while guaranteeing the quality of trials. Therefore, in addition to reforming the system, a decision support system will effectively improve judicial decisions. This paper takes medical damage judgment documents in China as an example, and proposes a court judgment decision support system (CJ-DSS) based on medical text mining and automatic classification technology. The system can predict the trial results of new lawsuit documents according to the verdicts of previous cases: rejected and non-rejected. Based on these cases, the study found that a combined feature extraction method does improve the performance of three kinds of classifiers, Support Vector Machine (SVM), Artificial Neural Network (ANN) and K-Nearest Neighbor (KNN); the degree of improved performance is different from using the DF-CHI combined feature extraction method. In addition, an integrated learning algorithm also improves the classification performance of the overall system
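As an illustration of the kind of feature selection the abstract mentions, the following is a minimal sketch, not the paper's implementation. The abstract does not say how document frequency (DF) and chi-square (CHI) are combined, so the sketch assumes one plausible reading: drop terms below a DF threshold, then rank the rest by the standard chi-square statistic for a 2x2 term/class table. The names `chi_square`, `select_features`, `min_df` and `top_k` are hypothetical.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class without the term
    d: documents outside the class without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, target, min_df=2, top_k=3):
    """Assumed DF + CHI combination: filter rare terms, then rank by chi-square."""
    term_sets = [set(doc.split()) for doc in docs]
    df = Counter(term for terms in term_sets for term in terms)
    n_pos = sum(1 for y in labels if y == target)
    n_neg = len(labels) - n_pos
    scores = {}
    for term, freq in df.items():
        if freq < min_df:  # DF step: skip terms seen in too few documents
            continue
        a = sum(1 for terms, y in zip(term_sets, labels)
                if y == target and term in terms)
        b = freq - a
        scores[term] = chi_square(a, b, n_pos - a, n_neg - b)
    # CHI step: highest score first, ties broken alphabetically
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
```

The selected terms would then serve as the input features for a classifier such as an SVM, ANN or KNN; that step is omitted here.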
Challenges to knowledge representation in multilingual contexts
To meet the increasing demands of the complex inter-organizational processes and the demand for
continuous innovation and internationalization, it is evident that new forms of organisation are
being adopted, fostering more intensive collaboration processes and sharing of resources, in what
can be called collaborative networks (Camarinha-Matos, 2006:03). Information and knowledge are
crucial resources in collaborative networks, and their management is a fundamental process to
optimize.
Knowledge organisation and collaboration systems are thus important instruments for the success of
collaborative networks of organisations, and have been researched in the last decade in the areas of
computer science, information science, management sciences, terminology and linguistics.
Nevertheless, research in this area has not given much attention to multilingual contexts of
collaboration, which pose specific and challenging problems. It is then clear that access to and
representation of knowledge will happen more and more in a multilingual setting, which implies
overcoming the difficulties inherent to the presence of multiple languages, through the use of
processes like localization of ontologies.
Although localization, like other processes that involve multilingualism, is a rather well-developed
practice and its methodologies and tools fruitfully employed by the language industry in the
development and adaptation of multilingual content, it has not yet been sufficiently explored as an
element of support to the development of knowledge representations - in particular ontologies -
expressed in more than one language. Multilingual knowledge representation is then an open
research area calling for cross-contributions from knowledge engineering, terminology, ontology
engineering, cognitive sciences, computational linguistics, natural language processing, and
management sciences.
This workshop joined researchers interested in multilingual knowledge representation, in a
multidisciplinary environment to debate the possibilities of cross-fertilization between knowledge
engineering, terminology, ontology engineering, cognitive sciences, computational linguistics,
natural language processing, and management sciences applied to contexts where multilingualism
continuously creates new and demanding challenges to current knowledge representation methods
and techniques.
In this workshop six papers dealing with different approaches to multilingual knowledge
representation are presented, most of them describing tools, approaches and results obtained in the
development of ongoing projects.
In the first case, Andrés Domínguez Burgos, Koen Kerremans and Rita Temmerman present a
software module that is part of a workbench for terminological and ontological mining,
Termontospider, a wiki crawler that aims to optimally traverse Wikipedia in search of domain-specific
texts for extracting terminological and ontological information. The crawler is part of a tool
suite for automatically developing multilingual termontological databases, i.e. ontologically-underpinned
multilingual terminological databases. In this paper the authors describe the basic principles
behind the crawler and summarize the research setting in which the tool is currently tested.
In the second paper, Fumiko Kano presents a work comparing four feature-based similarity
measures derived from cognitive sciences. The purpose of the comparative analysis presented by the author is to verify the potentially most effective model that can be applied for mapping independent ontologies in a culturally influenced domain. For that, datasets based on standardized
pre-defined feature dimensions and values, which are obtainable from the UNESCO Institute for
Statistics (UIS) have been used for the comparative analysis of the similarity measures. The purpose
of the comparison is to verify the similarity measures based on the objectively developed datasets.
According to the author the results demonstrate that the Bayesian Model of Generalization provides
for the most effective cognitive model for identifying the most similar corresponding concepts
existing for a targeted socio-cultural community.
In another presentation, Thierry Declerck, Hans-Ulrich Krieger and Dagmar Gromann present
ongoing work and propose an approach to the automatic extraction of information from multilingual
financial Web resources, to provide candidate terms for building ontology elements or instances of
ontology concepts. The authors present a complementary approach to the direct
localization/translation of ontology labels, by acquiring terminologies through the access and
harvesting of multilingual Web presences of structured information providers in the field of finance.
This leads to the detection of candidate terms in various multilingual sources in the financial
domain that can be used not only as labels of ontology classes and properties but also for the
possible generation of (multilingual) domain ontologies themselves.
In the next paper, Manuel Silva, António Lucas Soares and Rute Costa claim that despite the
availability of tools, resources and techniques aimed at the construction of ontological artifacts,
developing a shared conceptualization of a given reality still raises questions about the principles
and methods that support the initial phases of conceptualization. These questions become, according
to the authors, more complex when the conceptualization occurs in a multilingual setting. To tackle
these issues the authors present a collaborative platform – conceptME - where terminological and
knowledge representation processes support domain experts throughout a conceptualization
framework, allowing the inclusion of multilingual data as a way to promote knowledge sharing and
enhance conceptualization and support a multilingual ontology specification.
In another presentation, Frieda Steurs and Hendrik J. Kockaert present TermWise, a large project
dealing with legal terminology and phraseology for the Belgian public services, i.e. the translation
office of the ministry of justice, a project which aims at developing an advanced tool including
expert knowledge in the algorithms that extract specialized language from textual data (legal
documents) and whose outcome is a knowledge database including Dutch/French equivalents for
legal concepts, enriched with the phraseology related to the terms under discussion.
Finally, Deborah Grbac, Luca Losito, Andrea Sada and Paolo Sirito report on the preliminary
results of a pilot project currently ongoing at UCSC Central Library, where they propose to adapt to
subject librarians, employed in large and multilingual Academic Institutions, the model used by
translators working within European Union Institutions. The authors are using User Experience
(UX) Analysis in order to provide subject librarians with a visual support, by means of “ontology
tables” depicting conceptual linking and connections of words with concepts presented according to
their semantic and linguistic meaning.
The organizers hope that the selection of papers presented here will be of interest to a broad audience, and will be a starting point for further discussion and cooperation
Relevance Feedback Search Based on Automatic Annotation and Classification of Texts
The idea behind Relevance Feedback Search (RFBS) is to build search queries as an iterative and interactive process in which they are gradually refined based on the results of the previous search round. This can be helpful in situations where the end user cannot easily formulate their information needs at the outset as a well-focused query, or more generally as a way to filter and focus search results. This paper concerns (1) a framework that integrates keyword extraction and unsupervised classification into the RFBS paradigm and (2) the application of this framework to the legal domain as a use case. We focus on the Natural Language Processing (NLP) methods underlying the framework and application, where an automatic annotation tool is used for extracting document keywords as ontology concepts, which are then transformed into word embeddings to form vectorial representations of the texts. An unsupervised classification system that employs similar techniques is also used in order to classify the documents into broad thematic classes. This classification functionality is evaluated using two different datasets. As the use case, we describe an application perspective in the semantic portal LawSampo - Finnish Legislation and Case Law on the Semantic Web. This online demonstrator uses a dataset of 82145 sections in 3725 statutes of Finnish legislation and another dataset that comprises 13470 court decisions
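The iterative query refinement described in this abstract can be illustrated with a minimal sketch. It uses the classic Rocchio update over plain term counts, which is a swapped-in textbook technique and not the paper's framework (the paper works with ontology concepts and word embeddings). The function name `refine_query` and the weights `alpha`, `beta` and `gamma` are assumptions chosen for illustration.

```python
from collections import Counter


def refine_query(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style update: shift query term weights toward relevant documents.

    query: the current query string
    relevant / non_relevant: documents the user judged in the previous round
    Returns a dict of term -> weight, keeping only positively weighted terms.
    """
    def centroid(docs):
        # Average term count per document; empty feedback gives an empty centroid.
        total = Counter()
        for doc in docs:
            total.update(doc.split())
        return {t: c / len(docs) for t, c in total.items()} if docs else {}

    query_counts = Counter(query.split())
    pos, neg = centroid(relevant), centroid(non_relevant)
    weights = {}
    for term in set(query_counts) | set(pos) | set(neg):
        w = (alpha * query_counts.get(term, 0)
             + beta * pos.get(term, 0.0)
             - gamma * neg.get(term, 0.0))
        if w > 0:
            weights[term] = w
    return weights
```

In each search round, the terms with the highest weights would form the next query, so the user's feedback gradually narrows the results.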
- …