1,573 research outputs found
A Complete Text-Processing Pipeline for Business Performance Tracking
Natural text processing is amongst the most researched domains because of its varied applications. However, most existing works focus on improving the performance of machine learning models instead of applying those models in practical business cases. We present a text processing pipeline that enables business users to identify business performance factors through sentiment analysis and opinion summarization of customer feedback. The pipeline performs fine-grained sentiment classification of customer comments, and the results are used for the sentiment trend tracking process. The pipeline also performs topic modelling in which key aspects of customer comments are clustered using their co-relation scores. The results are used to produce abstractive opinion summarization. The proposed text processing pipeline is evaluated using two business cases in the food and retail domains. The performance of the sentiment analysis component is measured using mean absolute error (MAE) rate, root mean squared error (RMSE) rate, and coefficient of determination
Exploring the State of the Art in Legal QA Systems
Answering questions related to the legal domain is a complex task, primarily
due to the intricate nature and diverse range of legal document systems.
Providing an accurate answer to a legal query typically necessitates
specialized knowledge in the relevant domain, which makes this task all the
more challenging, even for human experts. QA (Question answering systems) are
designed to generate answers to questions asked in human languages. They use
natural language processing to understand questions and search through
information to find relevant answers. QA has various practical applications,
including customer service, education, research, and cross-lingual
communication. However, they face challenges such as improving natural language
understanding and handling complex and ambiguous questions. Answering questions
related to the legal domain is a complex task, primarily due to the intricate
nature and diverse range of legal document systems. Providing an accurate
answer to a legal query typically necessitates specialized knowledge in the
relevant domain, which makes this task all the more challenging, even for human
experts. At this time, there is a lack of surveys that discuss legal question
answering. To address this problem, we provide a comprehensive survey that
reviews 14 benchmark datasets for question-answering in the legal field as well
as presents a comprehensive review of the state-of-the-art Legal Question
Answering deep learning models. We cover the different architectures and
techniques used in these studies and the performance and limitations of these
models. Moreover, we have established a public GitHub repository where we
regularly upload the most recent articles, open data, and source code. The
repository is available at:
\url{https://github.com/abdoelsayed2016/Legal-Question-Answering-Review}
Text segmentation techniques: A critical review
Text segmentation is widely used for processing text. It is a method of splitting a document into smaller parts, which is usually called segments. Each segment has its relevant meaning. Those segments categorized as word, sentence, topic, phrase or any information unit depending on the task of the text analysis. This study presents various reasons of usage of text segmentation for different analyzing approaches. We categorized the types of documents and languages used. The main contribution of this study includes a summarization of 50 research papers and an illustration of past decade (January 2007- January 2017)’s of research that applied text segmentation as their main approach for analysing text. Results revealed the popularity of using text segmentation in different languages. Besides that, the “word” seems to be the most practical and usable segment, as it is the smaller unit than the phrase, sentence or line
情報検索における意味的ギャップの解消 : トピックモデルを用いた先進的画像探索
Tohoku University徳山豪課
Abstractive Multi-Document Summarization based on Semantic Link Network
The key to realize advanced document summarization is semantic representation of documents. This paper investigates the role of Semantic Link Network in representing and understanding documents for multi-document summarization. It proposes a novel abstractive multi-document summarization framework by first transforming documents into a Semantic Link Network of concepts and events and then transforming the Semantic Link Network into the summary of the documents based on the selection of important concepts and events while keeping semantics coherence. Experiments on benchmark datasets show that the proposed summarization approach significantly outperforms relevant state-of-the-art baselines and the Semantic Link Network plays an important role in representing and understanding documents
Survey on Multi-Document Summarization: Systematic Literature Review
In this era of information technology, abundant information is available on
the internet in the form of web pages and documents on any given topic. Finding
the most relevant and informative content out of these huge number of
documents, without spending several hours of reading has become a very
challenging task. Various methods of multi-document summarization have been
developed to overcome this problem. The multi-document summarization methods
try to produce high-quality summaries of documents with low redundancy. This
study conducts a systematic literature review of existing methods for
multi-document summarization methods and provides an in-depth analysis of
performance achieved by these methods. The findings of the study show that more
effective methods are still required for getting higher accuracy of these
methods. The study also identifies some open challenges that can gain the
attention of future researchers of this domain
On the Use of Parsing for Named Entity Recognition
[Abstract] Parsing is a core natural language processing technique that can be used to obtain the structure underlying sentences in human languages. Named entity recognition (NER) is the task of identifying the entities that appear in a text. NER is a challenging natural language processing task that is essential to extract knowledge from texts in multiple domains, ranging from financial to medical. It is intuitive that the structure of a text can be helpful to determine whether or not a certain portion of it is an entity and if so, to establish its concrete limits. However, parsing has been a relatively little-used technique in NER systems, since most of them have chosen to consider shallow approaches to deal with text. In this work, we study the characteristics of NER, a task that is far from being solved despite its long history; we analyze the latest advances in parsing that make its use advisable in NER settings; we review the different approaches to NER that make use of syntactic information; and we propose a new way of using parsing in NER based on casting parsing itself as a sequence labeling task.Xunta de Galicia; ED431C 2020/11Xunta de Galicia; ED431G 2019/01This work has been funded by MINECO, AEI and FEDER of UE through the ANSWER-ASAP project (TIN2017-85160-C2-1-R); and by Xunta de Galicia through a Competitive Reference Group grant (ED431C 2020/11). CITIC, as Research Center of the Galician University System, is funded by the Consellería de Educación, Universidade e Formación Profesional of the Xunta de Galicia through the European Regional Development Fund (ERDF/FEDER) with 80%, the Galicia ERDF 2014-20 Operational Programme, and the remaining 20% from the Secretaría Xeral de Universidades (Ref. ED431G 2019/01). Carlos Gómez-Rodríguez has also received funding from the European Research Council (ERC), under the European Union’s Horizon 2020 research and innovation programme (FASTPARSE, Grant No. 714150)
PORTRAIT: a hybrid aPproach tO cReate extractive ground-TRuth summAry for dIsaster evenT
Disaster summarization approaches provide an overview of the important
information posted during disaster events on social media platforms, such as,
Twitter. However, the type of information posted significantly varies across
disasters depending on several factors like the location, type, severity, etc.
Verification of the effectiveness of disaster summarization approaches still
suffer due to the lack of availability of good spectrum of datasets along with
the ground-truth summary. Existing approaches for ground-truth summary
generation (ground-truth for extractive summarization) relies on the wisdom and
intuition of the annotators. Annotators are provided with a complete set of
input tweets from which a subset of tweets is selected by the annotators for
the summary. This process requires immense human effort and significant time.
Additionally, this intuition-based selection of the tweets might lead to a high
variance in summaries generated across annotators. Therefore, to handle these
challenges, we propose a hybrid (semi-automated) approach (PORTRAIT) where we
partly automate the ground-truth summary generation procedure. This approach
reduces the effort and time of the annotators while ensuring the quality of the
created ground-truth summary. We validate the effectiveness of PORTRAIT on 5
disaster events through quantitative and qualitative comparisons of
ground-truth summaries generated by existing intuitive approaches, a
semi-automated approach, and PORTRAIT. We prepare and release the ground-truth
summaries for 5 disaster events which consist of both natural and man-made
disaster events belonging to 4 different countries. Finally, we provide a study
about the performance of various state-of-the-art summarization approaches on
the ground-truth summaries generated by PORTRAIT using ROUGE-N F1-scores
Neologisms in Modern English: study of word-formation processes
http://tartu.ester.ee/record=b2654513~S1*es
- …