Search CORE

1,103 research outputs found

Basic tasks of sentiment analysis

Author: DM Blei
E Cambria
G Murray
G Qiu
GE Hinton
GW Taylor
H Tang
I Chaturvedi
L Oneto
R Collobert
R Ortega
S Branavan
S Poria
S Rill
T Wang
X Ding
Y Hu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/10/2017

Subjectivity detection is the task of distinguishing objective sentences from subjective ones. Objective sentences are those that do not express any sentiment, so a sentiment analysis engine should identify and filter them out before further analysis, e.g., polarity detection. In subjective sentences, opinions are often expressed about one or several topics. Aspect extraction is a subtask of sentiment analysis that consists of identifying opinion targets in opinionated text, i.e., detecting the specific aspects of a product or service that the opinion holder is praising or complaining about.

arXiv.org e-Print Archive

Crossref
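As a rough illustration of the two-stage pipeline described in the first abstract (filter out objective sentences, then detect the polarity of the rest), here is a minimal Python sketch. The cue lexicons and the functions `is_subjective`, `polarity`, and `analyze` are hypothetical; the chapter does not prescribe this lexicon-based approach, and practical systems typically rely on trained classifiers instead.

```python
from typing import List, Tuple

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"great", "excellent", "love", "good"}
NEGATIVE = {"terrible", "bad", "hate", "poor"}
# Opinion cues without a fixed polarity still mark a sentence as subjective.
SUBJECTIVE = POSITIVE | NEGATIVE | {"think", "feel"}


def tokenize(sentence: str) -> List[str]:
    # Lowercase each whitespace-separated token and strip surrounding punctuation.
    return [tok.strip(".,!?;:").lower() for tok in sentence.split()]


def is_subjective(sentence: str) -> bool:
    # Subjective if the sentence contains at least one opinion cue.
    return any(tok in SUBJECTIVE for tok in tokenize(sentence))


def polarity(sentence: str) -> str:
    # Positive cues minus negative cues; a zero score is labeled neutral.
    tokens = tokenize(sentence)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyze(sentences: List[str]) -> List[Tuple[str, str]]:
    # Stage 1 drops objective sentences; stage 2 labels the remaining ones.
    return [(s, polarity(s)) for s in sentences if is_subjective(s)]


if __name__ == "__main__":
    review = [
        "The phone was released in 2017.",
        "The battery life is excellent.",
        "The screen is terrible in sunlight.",
    ]
    for sentence, label in analyze(review):
        print(f"{label}: {sentence}")
```

The factual release-date sentence contains no opinion cue, so it never reaches the polarity stage.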

Knowledge-Driven Implicit Information Extraction

Author: Perera Pathirage Dinindu
Publication venue: CORE Scholar
Publication date: 01/01/2016

Natural language is a powerful tool developed by humans over hundreds of thousands of years. The extensive usage and flexibility of the language, the creativity of human beings, and the social, cultural, and economic changes that have taken place in daily life have added new constructs, styles, and features to the language. One such feature is its ability to express ideas, opinions, and facts implicitly. This feature is used extensively in day-to-day communication, for example: 1) when expressing sarcasm, 2) when trying to recall forgotten things, 3) when conveying descriptive information, 4) when emphasizing the features of an entity, and 5) when communicating a common understanding. Consider the tweet "New Sandra Bullock astronaut lost in space movie looks absolutely terrifying" and the text snippet extracted from a clinical narrative "He is suffering from nausea and severe headaches. Dolasetron was prescribed." The tweet implicitly mentions the entity Gravity, and the clinical snippet implicitly mentions the relationship between the medication Dolasetron and the clinical condition nausea. Such implicit references to entities and relationships are common in daily communication and add value to conversations. However, extracting implicit constructs has not received enough attention in the information extraction literature. This dissertation focuses on extracting implicit entities and relationships from clinical narratives and implicit entities from tweets. When people use implicit constructs in their daily communication, they assume that they share knowledge about the subject with their audience. This shared knowledge helps to decode implicitly conveyed information. For example, the Twitter user above assumed that his/her audience knows that the actress Sandra Bullock starred in the movie Gravity and that it is a movie about space exploration.
The clinical professional who wrote the narrative above assumed that the reader knows that Dolasetron is an anti-nausea drug. An audience without such domain knowledge may not correctly decode the information conveyed in these examples. This dissertation demonstrates manifestations of implicit constructs in text, studies their characteristics, and develops a software solution capable of extracting implicit information from text. The solution starts by acquiring the knowledge relevant to the implicit information extraction problem: domain knowledge, contextual knowledge, and linguistic knowledge. The acquired knowledge can take different syntactic forms, such as text snippets, structured knowledge represented in standard knowledge representation languages such as the Resource Description Framework (RDF), or other custom formats. Hence, the acquired knowledge is pre-processed into machine-processable models, which provide the infrastructure for implicit information extraction. The dissertation demonstrates the applicability of the solution in three use cases: 1) implicit entity linking in clinical narratives, 2) implicit entity linking in Twitter, and 3) implicit relationship extraction from clinical narratives. Evaluations on relevant annotated datasets for implicit information demonstrate the effectiveness of the solution in extracting implicit information from text.

CORE
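The Gravity example in the dissertation abstract can be approximated with a toy knowledge-overlap scorer. Each candidate entity carries terms the audience is assumed to associate with it, and the entity that shares the most terms with the text is chosen. The `KNOWLEDGE` table, the `min_overlap` threshold, and `link_implicit_entity` are hypothetical illustrations, not the model developed in the dissertation.

```python
from typing import Dict, Optional, Set

# Hypothetical shared knowledge: each candidate entity maps to terms an
# audience would associate with it. Not the dissertation's actual model.
KNOWLEDGE: Dict[str, Set[str]] = {
    "Gravity": {"sandra", "bullock", "astronaut", "space", "lost", "movie"},
    "The Blind Side": {"sandra", "bullock", "football", "movie"},
    "Apollo 13": {"astronaut", "space", "tom", "hanks", "movie"},
}


def tokens(text: str) -> Set[str]:
    # Lowercased word set with simple punctuation removed.
    return {t.strip(".,!?").lower() for t in text.split()}


def link_implicit_entity(text: str, min_overlap: int = 2) -> Optional[str]:
    # Score each entity by how many of its knowledge terms occur in the text
    # and return the best one only if it clears the overlap threshold.
    words = tokens(text)
    best, best_score = None, 0
    for entity, terms in KNOWLEDGE.items():
        score = len(words & terms)
        if score > best_score:
            best, best_score = entity, score
    return best if best_score >= min_overlap else None


if __name__ == "__main__":
    tweet = "New Sandra Bullock astronaut lost in space movie looks absolutely terrifying"
    print(link_implicit_entity(tweet))  # Gravity
```

The threshold keeps the linker from assigning an entity to text that shares only a generic word such as "movie" with the knowledge base.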

Corpus annotation for mining biomedical events from literature

Background: Advanced text mining (TM), such as semantic enrichment of papers, event or relation extraction, and intelligent question answering, has attracted increasing attention in the biomedical domain. For such attempts to succeed, text annotation from the biological point of view is indispensable. However, due to the complexity of the task, semantic annotation beyond relatively simple term annotation has never been attempted on a large scale.

Results: We have completed a new type of semantic annotation, event annotation, which adds to the existing annotations in the GENIA corpus. The corpus has already been annotated with POS (parts of speech), syntactic trees, terms, etc. The new annotation was made on half of the GENIA corpus, consisting of 1,000 MEDLINE abstracts. It contains 9,372 sentences in which 36,114 events are identified. The major challenges during event annotation were (1) to design an annotation scheme that meets the specific requirements of text annotation, (2) to achieve biology-oriented annotation that reflects biologists' interpretation of text, and (3) to ensure homogeneous annotation quality across annotators. To meet these challenges, we introduced new concepts such as Single-facet Annotation and Semantic Typing, which collectively contributed to the successful completion of a large-scale annotation.

Conclusion: The resulting event-annotated corpus is the largest, and among the best in quality, of similar annotation efforts. We expect it to become a valuable resource for NLP (natural language processing)-based TM in the biomedical domain.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Variation and Semantic Relation Interpretation: Linguistic and Processing Issues

Author: Aussenac-Gilles Nathalie
Condamines Anne
Publication venue: HAL CCSD
Publication date: 19/06/2012

Studies in linguistics define lexico-syntactic patterns to characterize the linguistic utterances that can be interpreted as semantic relations. Because patterns are assumed to reflect linguistic regularities with a stable interpretation, several software tools implement such patterns to extract semantic relations from text. Nevertheless, a thorough analysis of pattern occurrences in various corpora showed that variation may affect their interpretation. In this paper, we report the linguistic variations that affect relation interpretation and may lead to errors in relation extraction systems. We analyze several features of state-of-the-art pattern-based relation extraction tools, mainly how patterns are represented and matched against text, and discuss their role in each tool's ability to handle variation.

Scientific Publications of the University of Toulouse II Le Mirail

HAL Descartes
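The variation problem discussed in the Aussenac-Gilles and Condamines abstract is easy to reproduce with a rigid lexico-syntactic pattern. The sketch below assumes a single Hearst-style "such as" hyponymy pattern, chosen here for illustration rather than taken from the paper, compiled as a Python regular expression; `SUCH_AS` and `extract_hyponyms` are hypothetical names.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately rigid pattern: "<hypernym> such as <hyponym>".
SUCH_AS = re.compile(r"(\w+) such as (\w+)")


def extract_hyponyms(text: str) -> List[Tuple[str, str]]:
    # One (hyponym, hypernym) pair per literal match of the pattern.
    return [(m.group(2), m.group(1)) for m in SUCH_AS.finditer(text)]


if __name__ == "__main__":
    # Canonical form: matched as intended.
    print(extract_hyponyms("Drugs such as aspirin reduce pain."))    # [('aspirin', 'Drugs')]
    # A comma before "such as" is enough to defeat the pattern.
    print(extract_hyponyms("Drugs, such as aspirin, reduce pain."))  # []
    # Same surface form, but no hyponymy relation is meant here.
    print(extract_hyponyms("In cases such as this one, wait."))      # [('this', 'cases')]
```

The missed comma variant and the spurious "cases/this" pair are both small instances of the variation the abstract describes. A more tolerant pattern would fix the first case but would not by itself resolve the second, which is an interpretation problem.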

Satellite Workshop On Language, Artificial Intelligence and Computer Science for Natural Language Processing Applications (LAICS-NLP): Discovery of Meaning from Text

Author: Kulathuramaiyer Narayanan
Ong Siou Chin
Yeo Alvin Wee
Publication venue: Faculty of Engineering Kasetsart University, Bangkok, Thailand.
Publication date: 01/01/2006

This paper proposes a novel method to disambiguate important words in a collection of documents. The underlying hypothesis is that a minimal set of senses is significant in characterizing a context. We extend Yarowsky’s "one sense per discourse" [13] from a single document to a collection of related documents. We perform distributed clustering on a set of features representing each of the top ten categories of documents in the Reuters-21578 dataset, identifying groups of terms that have similar distributional patterns across documents. WordNet-based similarity was then computed for the terms within each cluster. Aggregating the WordNet associations used to ascertain term similarity within clusters provided a means of identifying each cluster's root senses.

Unimas Institutional Repository
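The term-clustering step in the LAICS-NLP abstract groups terms whose distribution across documents is similar. The sketch below is a minimal version of that step. It assumes each term is represented by a vector of per-document counts and uses greedy cosine-threshold clustering as a stand-in for the paper's clustering method. The WordNet similarity step is omitted, and the counts, `threshold`, and `cluster_terms` are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity; returns 0.0 when either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_terms(dist: Dict[str, Vector], threshold: float = 0.9) -> List[List[str]]:
    # Greedy single pass: a term joins the first cluster whose seed term
    # (its first member) is similar enough; otherwise it seeds a new cluster.
    clusters: List[List[str]] = []
    for term, vec in dist.items():
        for cluster in clusters:
            if cosine(vec, dist[cluster[0]]) >= threshold:
                cluster.append(term)
                break
        else:
            clusters.append([term])
    return clusters


if __name__ == "__main__":
    # Hypothetical counts of each term in four documents.
    counts = {
        "oil": [5, 4, 0, 0],
        "crude": [4, 5, 0, 1],
        "wheat": [0, 0, 6, 5],
        "grain": [0, 1, 5, 6],
    }
    print(cluster_terms(counts))  # [['oil', 'crude'], ['wheat', 'grain']]
```

Comparing each term only against a cluster's seed keeps the pass linear in the number of clusters. The trade-off is that the result depends on the order in which terms are visited.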

Gimme The Context: Context-driven automatic semantic annotation with C-PANKOW

Author: Cimiano Philipp
Ellis Allan
Hagino Tatsuya
Ladwig Günter
Staab Steffen
Publication venue: ACM Press
Publication date: 01/01/2005

Cimiano P, Ladwig G, Staab S. Gimme The Context: Context-driven automatic semantic annotation with C-PANKOW. In: Ellis A, Hagino T, eds. Proceedings of the 14th international conference on World Wide Web, WWW 2005. ACM Press; 2005: 332-341

Publications at Bielefeld University

A Discourse Stylistics Analysis on the Regularities in Dan Brown’s The Da Vinci Code

Author: Arafah Burhanuddin
Asanti Chris
Zamruddin Mardliya Pratiwi
Publication venue: 'Universitas Cokroaminoto Palopo'
Publication date: 31/12/2022

The aim of this study was to identify the regularities in Dan Brown’s novel The Da Vinci Code using a stylistic and narratological approach. A qualitative research design was employed to identify the regularities and irregularities that occur in the novel and to characterize the novelist's style. The data were gathered from The Da Vinci Code, taking stylistic categories into account. The results revealed the occurrences and forms of regularities in the novel, which mirror the novelist's style. Dan Brown's powerful style is evident in the large number of noun phrases in this novel, which he uses repeatedly and consistently. When describing a person's traits, he employs noun phrases to point to occupations, specific names, locations, and items, and to designate personal pronouns.

Ethical Lingua - Journal of Language Teaching and Literature

Annotating Causality in the TempEval-3 Corpus

Author: Mirza P.
Speranza M.
Sprugnoli R. (ORCID:0000-0001-6861-5595)
Tonelli S.
Publication venue: Gothenburg, Sweden
Publication date: 01/01/2014

While there is wide consensus in the NLP community on the modeling of temporal relations between events, mainly based on Allen’s temporal logic, the question of how to annotate other types of event relations, in particular causal ones, is still open. In this work, we present annotation guidelines to capture causality between event pairs, partly inspired by TimeML. We then implement a rule-based algorithm to automatically identify explicit causal relations in the TempEval-3 corpus. Based on this annotation, we report statistics on the behavior of causal cues in text and perform a preliminary investigation of the interaction between causal and temporal relations.

CiteSeerX

Archivio istituzionale della Ricerca - Università degli Studi di Parma

PubliCatt

Archivio della ricerca - Fondazione Bruno Kessler
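A rule-based detector of explicit causal cues, in the spirit of the TempEval-3 work above, might look like the following sketch. The cue inventory, the direction rule (whether the text after the cue is the cause or the effect), and `find_causal_pair` are assumptions for illustration. The abstract describes causality between event pairs, whereas this sketch simply splits a sentence around the cue.

```python
import re
from typing import Optional, Tuple

# Hypothetical cue inventory. With effect-first cues ("X because Y") the text
# after the cue is the cause; with cause-first cues ("X led to Y") it is the effect.
EFFECT_FIRST = ["because of", "because", "due to", "as a result of", "caused by"]
CAUSE_FIRST = ["led to", "resulted in"]


def find_causal_pair(sentence: str) -> Optional[Tuple[str, str]]:
    # Return (cause, effect) split around the first cue in the inventory that
    # matches with non-empty text on both sides, or None if no cue applies.
    text = sentence.strip().rstrip(".")
    for cue in EFFECT_FIRST + CAUSE_FIRST:
        match = re.search(r"\b" + re.escape(cue) + r"\b", text, flags=re.IGNORECASE)
        if not match:
            continue
        left = text[: match.start()].strip(" ,")
        right = text[match.end():].strip(" ,")
        if left and right:
            return (right, left) if cue in EFFECT_FIRST else (left, right)
    return None


if __name__ == "__main__":
    print(find_causal_pair("The flight was cancelled because of the storm."))
    print(find_causal_pair("The storm led to several delays."))
    print(find_causal_pair("The meeting started at noon."))
```

Listing "because of" before "because" ensures the longer cue is tried first. Sentences that open with a cue, such as "Because of the storm, ...", are not handled by this simple left/right split.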