A Survey on Legal Question Answering Systems
Many legal professionals think that the explosion of information about local,
regional, national, and international legislation makes their practice more
costly, time-consuming, and even error-prone. The two main reasons are that
most legislation is unstructured, and that the sheer volume and pace at which
laws are released cause information overload in their daily tasks. In the legal
domain, the research community agrees that a system capable of generating
automatic answers to legal questions could have a substantial practical impact
on daily activities. The degree of usefulness is such that even a
semi-automatic solution could significantly reduce the workload to be faced.
This is mainly because a Question Answering system could automatically process
a massive amount of legal resources to answer a question or resolve a doubt in
seconds, saving effort, money, and time for many professionals in the legal
sector. In this work, we quantitatively and qualitatively survey the solutions
that currently exist to meet this challenge.
Comment: 57 pages, 1 figure, 10 tables
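As a minimal illustration of the retrieval step such a system might perform (a deliberately simplified sketch, not a method from the survey; the corpus and all texts below are invented for illustration), consider a keyword-overlap ranker over a toy collection of legal passages:

```python
import re

def tokenize(text):
    # Lowercase and keep only alphabetic word tokens.
    return re.findall(r"[a-z]+", text.lower())

def best_passage(question, passages):
    """Return the passage sharing the most distinct words with the question."""
    q_words = set(tokenize(question))
    return max(passages, key=lambda p: len(q_words & set(tokenize(p))))

# Hypothetical toy corpus of legal snippets.
corpus = [
    "A tenant must receive thirty days notice before eviction.",
    "Copyright protection lasts for the life of the author plus seventy years.",
    "Employers must provide a safe working environment.",
]

answer = best_passage("How long does copyright protection last?", corpus)
# answer is the copyright passage, the only one overlapping the question.
```

Real systems replace the overlap score with TF-IDF or learned representations, but the pipeline shape (index legal resources, score against the question, return the best match) is the same.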
Text Analytics: the convergence of Big Data and Artificial Intelligence
The analysis of the text content in emails, blogs, tweets, forums, and other
forms of textual communication constitutes what we call text analytics. Text
analytics is applicable to most industries: it can help analyze millions of
emails; it can analyze customers' comments and questions in forums; and it can
support sentiment analysis, measuring positive or negative perceptions of a
company, brand, or product. Text analytics has also been called text mining,
and is a subcategory of Natural Language Processing (NLP), one of the founding
branches of Artificial Intelligence dating back to the 1950s, when an interest
in understanding text originally developed. Text analytics is now often
considered the next step in Big Data analysis. It has a number of subdivisions:
Information Extraction, Named Entity Recognition, Semantic Web domain
annotation, and many more. Several techniques are currently in use, and some,
such as Machine Learning with semi-supervised enhancements, have gained a lot
of attention; however, they also present limitations that make them not always
the only or the best choice. We conclude with current and near-future
applications of text analytics.
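To make the sentiment-analysis use case above concrete, here is a minimal lexicon-based scorer (a simplified sketch, not any specific tool from the abstract; the word lists are invented for illustration):

```python
import re

# Hypothetical, tiny sentiment lexicons for illustration only.
POSITIVE = {"great", "love", "excellent", "good", "helpful"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "slow"}

def sentiment_score(text):
    """Return (#positive - #negative) word matches: >0 positive, <0 negative."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

score = sentiment_score("Great product, but the support was slow and the manual is poor.")
# score = 1 positive match - 2 negative matches = -1
```

Production systems typically use much larger lexicons or trained classifiers, but this shows the core idea of measuring positive versus negative perceptions in text.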
Evaluation campaigns and TRECVid
The TREC Video Retrieval Evaluation (TRECVid) is an
international benchmarking activity to encourage research
in video information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. TRECVid completed its fifth annual cycle at the end of 2005 and in 2006 TRECVid will involve almost 70 research organizations, universities and other consortia. Throughout its existence, TRECVid has benchmarked both interactive and automatic/manual searching for shots from within a video
corpus, automatic detection of a variety of semantic and
low-level video features, shot boundary detection and the
detection of story boundaries in broadcast TV news. This
paper will give an introduction to information retrieval (IR) evaluation from
both a user and a system perspective, highlighting that system evaluation is by
far the most prevalent type of evaluation carried out. We also include a
summary of TRECVid as an example of a system evaluation benchmarking campaign,
and this allows us to discuss whether such campaigns are a good thing or a bad
thing. There are arguments for and against these campaigns; we present some of
them in the paper, concluding that on balance they have had a very positive
impact on research progress.
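As background for the system-side IR evaluation discussed above, benchmarking campaigns typically score ranked retrieval output against relevance judgments with metrics such as precision at k and average precision. A minimal sketch (not TRECVid's actual scoring code; document IDs are hypothetical):

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are judged relevant."""
    return sum(doc in relevant for doc in ranked[:k]) / k

def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks where a relevant item appears."""
    hits, total = 0, 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

ranked = ["d3", "d1", "d7", "d2"]   # system output, best first
relevant = {"d1", "d2"}             # ground-truth relevance judgments
p2 = precision_at_k(ranked, relevant, 2)   # 1 relevant in top 2 -> 0.5
ap = average_precision(ranked, relevant)   # (1/2 + 2/4) / 2 = 0.5
```

Averaging the average precision across all test queries gives mean average precision (MAP), a standard summary score in such campaigns.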
A Survey on Knowledge-Enhanced Pre-trained Language Models
Natural Language Processing (NLP) has been revolutionized by the use of
Pre-trained Language Models (PLMs) such as BERT. Despite setting new records in
nearly every NLP task, PLMs still face a number of challenges, including poor
interpretability, weak reasoning capability, and the need for large amounts of
expensive annotated data when applied to downstream tasks. By integrating
external knowledge into PLMs,
\textit{\underline{K}nowledge-\underline{E}nhanced \underline{P}re-trained
\underline{L}anguage \underline{M}odels} (KEPLMs) have the potential to
overcome the above-mentioned limitations. In this paper, we examine KEPLMs
systematically through a series of studies. Specifically, we outline the common
types and formats of knowledge integrated into KEPLMs, detail the existing
methods for building and evaluating KEPLMs, present the applications of KEPLMs
in downstream tasks, and discuss future research directions. Researchers will
benefit from this survey by gaining a quick and comprehensive overview of the
latest developments in this field.
Comment: 19 pages, 12 figures, 192 references
Document Understanding Dataset and Evaluation (DUDE)
We call on the Document AI (DocAI) community to reevaluate current
methodologies and embrace the challenge of creating more practically-oriented
benchmarks. Document Understanding Dataset and Evaluation (DUDE) seeks to
remedy the stalled research progress in understanding visually-rich documents
(VRDs). We present a new dataset with novelties related to types of questions,
answers, and document layouts based on multi-industry, multi-domain, and
multi-page VRDs of various origins and dates. Moreover, we are pushing the
boundaries of current methods by creating multi-task and multi-domain
evaluation setups that more accurately simulate real-world situations where
powerful generalization and adaptation under low-resource settings are desired.
DUDE aims to set a new standard as a more practical, long-standing benchmark
for the community, and we hope that it will lead to future extensions and
contributions that address real-world challenges. Finally, our work illustrates
the importance of finding more efficient ways to model language, images, and
layout in DocAI.
Comment: Preprint, under review