A Machine Learning Approach For Opinion Holder Extraction In Arabic Language
Opinion mining aims at extracting useful subjective information from reliable
amounts of text. Opinion holder recognition is a task that has not yet been
considered for the Arabic language. This task essentially requires a deep
understanding of clause structures. Unfortunately, the lack of a robust,
publicly available Arabic parser further complicates the research. This paper
presents leading research on opinion holder extraction in Arabic news,
independent of any lexical parsers. We investigate constructing a
comprehensive feature set to compensate for the lack of structural parsing
output. The proposed feature set is adapted from previous work on English, coupled
with our proposed semantic field and named entity features. Our feature
analysis is based on Conditional Random Fields (CRF) and semi-supervised
pattern recognition techniques. Different research models are evaluated via
cross-validation experiments, achieving an F-measure of 54.03. We publicly
release our own research outcome corpus and lexicon for the opinion mining
community to encourage further research.
Basic tasks of sentiment analysis
Subjectivity detection is the task of distinguishing objective from subjective
sentences. Objective sentences are those that do not exhibit any sentiment.
It is therefore desirable for a sentiment analysis engine to find and separate
out the objective sentences before further analysis, e.g., polarity detection.
In subjective sentences, opinions can often be expressed on one or multiple
topics. Aspect extraction is a subtask of sentiment analysis that consists of
identifying opinion targets in opinionated text, i.e., detecting the
specific aspects of a product or service the opinion holder is either praising
or complaining about.
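As a rough illustration of the subjectivity detection step described above, the sketch below splits sentences into subjective and objective groups using a tiny cue-word lexicon. The lexicon and the function names are hypothetical and chosen only for illustration; a practical system would more likely rely on a curated resource or a trained classifier.

```python
# Hypothetical mini-lexicon of subjective cue words (an assumption for this
# sketch, not taken from the text above).
SUBJECTIVE_CUES = {"great", "terrible", "love", "hate", "disappointing", "excellent"}


def is_subjective(sentence: str) -> bool:
    """Return True if the sentence contains at least one cue word."""
    tokens = {tok.strip(".,!?;:").lower() for tok in sentence.split()}
    return bool(tokens & SUBJECTIVE_CUES)


def split_by_subjectivity(sentences):
    """Partition sentences into (subjective, objective) lists."""
    subjective, objective = [], []
    for sentence in sentences:
        (subjective if is_subjective(sentence) else objective).append(sentence)
    return subjective, objective


if __name__ == "__main__":
    subj, obj = split_by_subjectivity(
        ["The battery life is excellent.", "The phone weighs 180 grams."]
    )
    print("subjective:", subj)
    print("objective:", obj)
```

Only the subjective group would then be passed on to later stages such as polarity detection or aspect extraction.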
Towards Building a Knowledge Base of Monetary Transactions from a News Collection
We address the problem of extracting structured representations of economic
events from a large corpus of news articles, using a combination of natural
language processing and machine learning techniques. The developed techniques
allow for semi-automatic population of a financial knowledge base, which, in
turn, may be used to support a range of data mining and exploration tasks. The
key challenge we face in this domain is that the same event is often reported
multiple times, with varying correctness of details. We address this challenge
by first collecting all information pertinent to a given event from the entire
corpus, then considering all possible representations of the event, and
finally using a supervised learning method to rank these representations by
their associated confidence scores. A main innovative element of our approach is
that it jointly extracts and stores all attributes of the event as a single
representation (quintuple). Using a purpose-built test set we demonstrate that
our supervised learning approach can achieve 25% improvement in F1-score over
baseline methods that consider the earliest, the latest or the most frequent
reporting of the event.
Comment: Proceedings of the 17th ACM/IEEE-CS Joint Conference on Digital
Libraries (JCDL '17), 201
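The three baselines named in the abstract (earliest, latest and most frequent reporting) can be sketched as follows, alongside a generic ranking step. This is a minimal sketch under stated assumptions: the abstract does not name the five attributes of the quintuple, so the tuples here are opaque placeholders, and the `score` callable is a hypothetical stand-in for the supervised model's confidence output.

```python
from collections import Counter
from datetime import date

# Each report is (publication_date, quintuple). The quintuple contents are
# placeholders; the abstract does not specify the five attributes.


def earliest(reports):
    """Baseline: the representation from the earliest report."""
    return min(reports, key=lambda r: r[0])[1]


def latest(reports):
    """Baseline: the representation from the latest report."""
    return max(reports, key=lambda r: r[0])[1]


def most_frequent(reports):
    """Baseline: the representation reported most often."""
    return Counter(q for _, q in reports).most_common(1)[0][0]


def rank_by_confidence(candidates, score):
    """Sort candidates by descending confidence.

    `score` is a hypothetical callable standing in for a trained model.
    """
    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    q_a = ("A", "B", 100, "USD", "2016")
    q_b = ("A", "B", 120, "USD", "2016")
    q_c = ("A", "C", 120, "USD", "2016")
    reports = [
        (date(2016, 1, 5), q_a),
        (date(2016, 1, 7), q_b),
        (date(2016, 1, 9), q_b),
        (date(2016, 2, 1), q_c),
    ]
    print(earliest(reports), latest(reports), most_frequent(reports))
```

The baselines each pick one report by a fixed rule, whereas the ranking step can order every candidate representation, which is the distinction the abstract draws.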
Causality Management and Analysis in Requirement Manuscript for Software Designs
For software design tasks involving natural language, the results of a causal investigation
provide valuable and robust semantic information, especially for identifying key
variables during product (software) design and product optimization. As the interest
in analytical data science shifts from correlations to a better understanding of causality,
there is an equal task focused on the accuracy of extracting causality from textual
artifacts to aid requirement engineering (RE) based decisions. This thesis focuses on
identifying, extracting, and classifying causal phrases using word and sentence labeling
based on the Bi-directional Encoder Representations from Transformers (BERT) deep
learning language model and five machine learning models. The aim is to understand
the form and degree of causality based on their impact and prevalence in RE practice.
Methodologically, our analysis is centered around RE practice, and we considered 12,438
sentences extracted from 50 requirement engineering manuscripts (REM) for training
our machine models. Our research reports that causal expressions constitute about 32%
of sentences from REM. We applied four evaluation metrics, namely recall, accuracy,
precision, and F1, to assess our machine models’ performance and accuracy to ensure
the results’ conformity with our study goal. The highest model accuracy we
obtained was 85%, achieved by Naive Bayes. Finally, we noted that our causal
analytic framework is applicable and relevant to practitioners for different
functionalities, such as generating test cases for requirement engineers and software
developers and product performance auditing for management stakeholders.
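The four evaluation metrics named above (recall, accuracy, precision and F1) can be computed from binary labels as in the sketch below. The function name and the toy labels are assumptions for illustration; they are not data from the thesis, which does not describe its evaluation code.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary labeling task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels: 1 = causal sentence, 0 = non-causal sentence (illustrative only).
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(y_true, y_pred))
```

When causal sentences are a minority class (about 32% in the reported data), accuracy alone can look favorable for a model that under-detects them, which is one reason to report precision, recall and F1 alongside it.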