202 research outputs found
On the Use of Parsing for Named Entity Recognition
[Abstract] Parsing is a core natural language processing technique that can be used to obtain the structure underlying sentences in human languages. Named entity recognition (NER) is the task of identifying the entities that appear in a text. NER is a challenging natural language processing task that is essential to extract knowledge from texts in multiple domains, ranging from financial to medical. It is intuitive that the structure of a text can be helpful to determine whether or not a certain portion of it is an entity and if so, to establish its concrete limits. However, parsing has been a relatively little-used technique in NER systems, since most of them have chosen to consider shallow approaches to deal with text. In this work, we study the characteristics of NER, a task that is far from being solved despite its long history; we analyze the latest advances in parsing that make its use advisable in NER settings; we review the different approaches to NER that make use of syntactic information; and we propose a new way of using parsing in NER based on casting parsing itself as a sequence labeling task.Xunta de Galicia; ED431C 2020/11Xunta de Galicia; ED431G 2019/01This work has been funded by MINECO, AEI and FEDER of UE through the ANSWER-ASAP project (TIN2017-85160-C2-1-R); and by Xunta de Galicia through a Competitive Reference Group grant (ED431C 2020/11). CITIC, as Research Center of the Galician University System, is funded by the ConsellerÃa de Educación, Universidade e Formación Profesional of the Xunta de Galicia through the European Regional Development Fund (ERDF/FEDER) with 80%, the Galicia ERDF 2014-20 Operational Programme, and the remaining 20% from the SecretarÃa Xeral de Universidades (Ref. ED431G 2019/01). Carlos Gómez-RodrÃguez has also received funding from the European Research Council (ERC), under the European Union’s Horizon 2020 research and innovation programme (FASTPARSE, Grant No. 714150)
Using Natural Language Processing to Mine Multiple Perspectives from Social Media and Scientific Literature.
This thesis studies how Natural Language Processing techniques can be used to mine perspectives from textual data. The first part of the thesis focuses on analyzing the text exchanged by people who participate in discussions on social media sites. We particularly focus on threaded discussions that discuss ideological and political topics. The goal is to identify the different viewpoints that the discussants have with respect to the discussion topic. We use subjectivity and sentiment analysis techniques to identify the attitudes that the participants carry toward one another and toward the different aspects of the discussion topic. This involves identifying opinion expressions and their polarities, and identifying the targets of opinion. We use this information to represent discussions in one of two representations: discussant attitude vectors or signed attitude networks. We use data mining and network analysis techniques to analyze these representations to detect rifts in discussion groups and study how the discussants split into subgroups with contrasting opinions.
In the second part of the thesis, we use linguistic analysis to mine scholars perspectives from scientific literature through the lens of citations. We analyze the text adjacent to reference anchors in scientific articles as a means to identify researchers' viewpoints toward previously published work. We propose methods for identifying, extracting, and cleaning citation text. We analyze this text to identify the purpose (author's intention) and polarity (author's sentiment) of citation. Finally, we present several applications that can benefit from this analysis such as generating multi-perspective summaries of scientific articles and predicting future prominence of publications.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/99934/1/amjbara_1.pd
PersoNER: Persian named-entity recognition
© 1963-2018 ACL. Named-Entity Recognition (NER) is still a challenging task for languages with low digital resources. The main difficulties arise from the scarcity of annotated corpora and the consequent problematic training of an effective NER pipeline. To abridge this gap, in this paper we target the Persian language that is spoken by a population of over a hundred million people world-wide. We first present and provide ArmanPerosNERCorpus, the first manually-annotated Persian NER corpus. Then, we introduce PersoNER, an NER pipeline for Persian that leverages a word embedding and a sequential max-margin classifier. The experimental results show that the proposed approach is capable of achieving interesting MUC7 and CoNNL scores while outperforming two alternatives based on a CRF and a recurrent neural network
Reconocimiento de acto de diálogo secuencial para debates argumentativos árabes
Dialogue act recognition remains a primordial task that helps user to automatically identify participants’ intentions. In this paper, we propose a sequential approach consisting of segmentation followed by annotation process to identify dialogue acts within Arabic politic debates. To perform DA recognition, we used the CARD corpus labeled using the SADA annotation schema. Segmentation and annotation tasks were then carried out using Conditional Random Fields probabilistic models as they prove high performance in segmenting and labeling sequential data. Learning results are notably important for the segmentation task (F-score=97.9%) and relatively reliable within the annotation process (f-score=63.4%) given the complexity of identifying argumentative tags and the presence of disfluencies in spoken conversations.El reconocimiento del acto de diálogo sigue siendo una tarea primordial que ayuda al usuario a identificar automáticamente las intenciones de los participantes. En este documento, proponemos un enfoque secuencial que consiste en la segmentación seguida de un proceso de anotación para identificar actos de diálogo dentro de los debates polÃticos árabes. Para realizar el reconocimiento DA, utilizamos el corpus CARD etiquetado utilizando el esquema de anotación SADA. Las tareas de segmentación y anotación se llevaron a cabo utilizando modelos probabilÃsticos de Campos aleatorios condicionales, ya que demuestran un alto rendimiento en la segmentación y el etiquetado de datos secuenciales. Los resultados de aprendizaje son especialmente importantes para la tarea de segmentación (F-score = 97.9%) y relativamente confiables dentro del proceso de anotación (f-score = 63.4%) dada la complejidad de identificar etiquetas argumentativas y la presencia de disfluencias en las conversaciones habladas
Theory and Applications for Advanced Text Mining
Due to the growth of computer technologies and web technologies, we can easily collect and store large amounts of text data. We can believe that the data include useful knowledge. Text mining techniques have been studied aggressively in order to extract the knowledge from the data since late 1990s. Even if many important techniques have been developed, the text mining research field continues to expand for the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques. They are various techniques from relation extraction to under or less resourced language. I believe that this book will give new knowledge in the text mining field and help many readers open their new research fields
Unveiling the frontiers of deep learning: innovations shaping diverse domains
Deep learning (DL) enables the development of computer models that are
capable of learning, visualizing, optimizing, refining, and predicting data. In
recent years, DL has been applied in a range of fields, including audio-visual
data processing, agriculture, transportation prediction, natural language,
biomedicine, disaster management, bioinformatics, drug design, genomics, face
recognition, and ecology. To explore the current state of deep learning, it is
necessary to investigate the latest developments and applications of deep
learning in these disciplines. However, the literature is lacking in exploring
the applications of deep learning in all potential sectors. This paper thus
extensively investigates the potential applications of deep learning across all
major fields of study as well as the associated benefits and challenges. As
evidenced in the literature, DL exhibits accuracy in prediction and analysis,
makes it a powerful computational tool, and has the ability to articulate
itself and optimize, making it effective in processing data with no prior
training. Given its independence from training data, deep learning necessitates
massive amounts of data for effective analysis and processing, much like data
volume. To handle the challenge of compiling huge amounts of medical,
scientific, healthcare, and environmental data for use in deep learning, gated
architectures like LSTMs and GRUs can be utilized. For multimodal learning,
shared neurons in the neural network for all activities and specialized neurons
for particular tasks are necessary.Comment: 64 pages, 3 figures, 3 table
Aspect extraction for sentiment analysis in Arabic dialect
The increase of the user-generated content on the web led to the explosion of opinionated text which facilitated opinion mining research. Despite the popularity of this research field in English text and the large number of Arabic speakers who contribute continuously to the web content, Arabic opinion mining has not received much attention due to the lack of reliable NLP tools and an accepted/comprehensive dataset. While English opinion mining has been studied extensively, Arabic opinion mining hasn’t received as much attention. The work that exists in sentiment analysis is limited to news, blogs written in Modern Standard Arabic and few studies on social media and web reviews written in Arabic dialect. Moreover, most of the work done has been done at the document and sentence level and to best of our knowledge, there is no work on a more fine-grained level. In this work, we take a more fine-grained approach to Arabic opinion mining at the aspect level through experimentation with methods that have been used in English Aspect extraction. Further, we are also contributing a dataset that can be used for further research on Arabic dialect
- …