68 research outputs found

    General Purpose Textual Sentiment Analysis and Emotion Detection Tools

    Textual sentiment analysis and emotion detection consist in retrieving the sentiment or emotion carried by a text or document. This task is useful in many domains: opinion mining, prediction, feedback analysis, etc. However, building a general-purpose tool for sentiment analysis and emotion detection raises a number of issues: theoretical issues such as the dependence on the domain or the language, but also practical issues such as the representation of emotions for interoperability. In this paper we present our sentiment/emotion analysis tools, the way we propose to circumvent these difficulties, and the applications they are used for. Comment: Workshop on Emotion and Computing (2013)
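    The general approach the abstract describes can be illustrated with a minimal lexicon-based scorer. This is a toy sketch, not the authors' tool, and the tiny lexicon below is a hypothetical example.

```python
# Minimal lexicon-based sentiment scorer: a toy illustration of textual
# sentiment analysis, NOT the authors' system. The lexicon is hypothetical.
LEXICON = {"good": 1, "great": 2, "happy": 1, "bad": -1, "awful": -2, "sad": -1}

def sentiment(text):
    """Return a crude polarity score: >0 positive, <0 negative, 0 neutral."""
    tokens = text.lower().split()
    return sum(LEXICON.get(tok.strip(".,!?"), 0) for tok in tokens)

print(sentiment("The demo was great, really great!"))  # 4
```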

    Domain transfer for deep natural language generation from abstract meaning representations

    Stochastic natural language generation systems that are trained from labelled datasets are often domain-specific in their annotation and in their mapping from semantic input representations to lexical-syntactic outputs. As a result, learnt models fail to generalise across domains, heavily restricting their usability beyond single applications. In this article, we focus on the problem of domain adaptation for natural language generation. We show how linguistic knowledge from a source domain, for which labelled data is available, can be adapted to a target domain by reusing training data across domains. As a key to this, we propose to employ abstract meaning representations as a common semantic representation across domains. We model natural language generation as a long short-term memory recurrent neural network encoder-decoder, in which one recurrent neural network learns a latent representation of a semantic input, and a second recurrent neural network learns to decode it to a sequence of words. We show that the learnt representations can be transferred across domains and can be leveraged effectively to improve training on new unseen domains. Experiments in three different domains and with six datasets demonstrate that the lexical-syntactic constructions learnt in one domain can be transferred to new domains and achieve up to 75-100% of the performance of in-domain training. This finding is based on objective metrics, such as BLEU and semantic error rate, as well as a subjective human rating study. Training a policy with prior knowledge from a different domain is consistently better than pure in-domain training, by up to 10%.
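    The core idea of using an abstract meaning representation as a domain-independent interface can be sketched with a toy example: the same semantic graph is realised with different surface lexicons per domain. The graph, the trivial template realiser (a stand-in for the paper's LSTM decoder), and the lexicon entries below are all illustrative assumptions.

```python
# Toy sketch: one AMR-like semantic graph, two domain-specific realisations.
# The template-based realiser is a stand-in for a learnt decoder.
amr = {
    "instance": "recommend-01",
    "ARG0": {"instance": "system"},
    "ARG1": {"instance": "venue", "name": "Bella"},
}

def realise(graph, lexicon):
    """Map the root concept to a domain template and fill in the arguments."""
    return lexicon[graph["instance"]].format(name=graph["ARG1"]["name"])

restaurant_lex = {"recommend-01": "I would recommend {name}."}
hotel_lex = {"recommend-01": "You might like to stay at {name}."}
print(realise(amr, restaurant_lex))
print(realise(amr, hotel_lex))
```

The point of the sketch is that only the lexicon changes between domains; the semantic input stays fixed, which is what makes cross-domain transfer of the learnt mapping plausible.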

    DEEPEN: A negation detection system for clinical text incorporating dependency relation into NegEx

    In Electronic Health Records (EHRs), much of the valuable information regarding patients’ conditions is embedded in free-text format. Natural language processing (NLP) techniques have been developed to extract clinical information from free text. One challenge in clinical NLP is that the meaning of clinical entities is heavily affected by modifiers such as negation. One negation detection algorithm, NegEx, applies a simple approach that has proven powerful in clinical NLP. However, because it does not consider the contextual relationships between words within a sentence, NegEx fails to correctly capture the negation status of concepts in complex sentences. Incorrect negation assignment can cause inaccurate diagnosis of a patient’s condition or contaminate study cohorts. We developed a negation algorithm called DEEPEN that decreases NegEx’s false positives by taking into account the dependency relationships between negation words and concepts within a sentence, using the Stanford dependency parser. The system was developed and tested on EHR data from Indiana University (IU), and it was further evaluated on a Mayo Clinic dataset to assess its generalisability. The evaluation results demonstrate that DEEPEN, which incorporates dependency parsing into NegEx, can reduce the number of incorrect negation assignments for patients with positive findings and thereby improve the identification of patients with the target clinical findings in EHRs.
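    The difference between a surface-window rule and a dependency-aware one can be sketched in a few lines. This toy checks whether a negation token is close to a concept in the dependency graph rather than in the raw token sequence; the hand-built parse below is a hypothetical example, not Stanford-parser output, and the hop threshold is an illustrative stand-in for DEEPEN's actual dependency-relation rules.

```python
# Toy dependency-aware negation check in the spirit of DEEPEN: a concept is
# negated only if it is reachable from the negation word within a few hops
# of the dependency graph. Edges are hand-written for illustration.
from collections import defaultdict

def build_graph(edges):
    g = defaultdict(set)
    for head, dep in edges:
        g[head].add(dep)
        g[dep].add(head)  # treat edges as undirected for reachability
    return g

def negates(graph, neg_token, concept, max_hops=2):
    """True if the concept lies within max_hops of the negation token."""
    frontier, seen = {neg_token}, {neg_token}
    for _ in range(max_hops):
        frontier = {n for f in frontier for n in graph[f]} - seen
        seen |= frontier
    return concept in seen

# "no evidence of pneumonia, but cough persists" (simplified parse)
edges = [("evidence", "no"), ("evidence", "pneumonia"),
         ("persists", "cough"), ("persists", "evidence")]
g = build_graph(edges)
print(negates(g, "no", "pneumonia"))  # True: linked through "evidence"
print(negates(g, "no", "cough"))     # False: too distant in the parse
```

A pure window-based rule like NegEx's would treat "cough" as negated too, since it follows "no" within a short token span; the dependency path is what rules it out.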

    Functional units: Abstractions for Web service annotations

    Computational and data-intensive science increasingly depends on a large Web Service infrastructure, as services that provide a broad array of functionality can be composed into workflows to address complex research questions. In this context, the goal of service registries is to offer accurate search and discovery functions to scientists. Their effectiveness, however, depends not only on the model chosen to annotate the services, but also on the level of abstraction chosen for the annotations. The work presented in this paper stems from the observation that current annotation models force users to think in terms of service interfaces, rather than of high-level functionality, thus reducing their effectiveness. To alleviate this problem, we introduce Functional Units (FU) as the elementary units of information used to describe a service. Using popular examples of services for the Life Sciences, we define FUs as configurations and compositions of underlying service operations, and show how functional-style service annotations can be easily realised using the OWL Semantic Web language. Finally, we suggest techniques for automating the service annotation process by analysing collections of workflows that use those services.
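    The notion of a Functional Unit as a named configuration and composition of lower-level service operations can be sketched as a simple data structure. The operation names and parameter bindings below are hypothetical Life-Science-style examples, not taken from the paper.

```python
# Sketch of a Functional Unit (FU): a high-level, searchable description
# wrapping a composition of interface-level operations plus fixed settings.
from dataclasses import dataclass, field

@dataclass
class FunctionalUnit:
    name: str                                       # high-level functionality
    operations: list = field(default_factory=list)  # underlying interface ops
    config: dict = field(default_factory=dict)      # fixed parameter bindings

blast_search = FunctionalUnit(
    name="SequenceSimilaritySearch",
    operations=["runBlast", "pollJob", "fetchResult"],
    config={"program": "blastp", "database": "uniprot"},
)
print(blast_search.name, "->", blast_search.operations)
```

A registry would then index and match on `name`, letting a scientist search for "sequence similarity search" without knowing the three-step job-submission interface underneath.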

    Characterizing the personality of Twitter users based on their timeline information

    Personality is a set of characteristics that differentiates one person from others. It can be identified from the words that people use in conversations or in the posts they publish on social networks. Most existing work focuses on personality prediction from English texts. In this study we analysed posts by Portuguese-speaking users of the social network Twitter. Taking into account the difficulties in sentiment classification caused by the 140-character limit imposed on tweets, we decided to use additional features and methods, such as the number of followers and friends, locations, publication times, etc., to get a more precise picture of a personality. In this paper, we present methods by which the personality of a user can be predicted without any effort from the Twitter users: the personality can be accurately predicted from the publicly available information on Twitter profiles.
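    A minimal sketch of the kind of metadata-based feature extraction the study describes: numeric features built from public profile fields and posting times rather than tweet text alone. The field names and features below are illustrative assumptions, not the study's actual feature set.

```python
# Hypothetical feature vector from public Twitter profile metadata, for use
# as input to a personality classifier. Field names are illustrative.
from datetime import datetime

def profile_features(profile, tweets):
    """Return simple numeric features from profile counts and posting hours."""
    hours = [datetime.fromisoformat(t["time"]).hour for t in tweets]
    return {
        "followers": profile["followers"],
        "friends": profile["friends"],
        "follow_ratio": profile["followers"] / max(profile["friends"], 1),
        "night_owl": sum(h >= 22 or h < 6 for h in hours) / max(len(hours), 1),
    }

feats = profile_features(
    {"followers": 320, "friends": 80},
    [{"time": "2013-05-01T23:15:00"}, {"time": "2013-05-02T10:00:00"}],
)
print(feats["follow_ratio"], feats["night_owl"])  # 4.0 0.5
```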

    A sentence-based image search engine

    Nowadays, people increasingly search for relevant images directly through search engines like Google, Yahoo, or Bing, and these image search engines have dedicated extensive research effort to the problem of keyword-based image retrieval. However, Google, the most widely used keyword-based image search engine, is reported to have a precision of only 39%, and all of these systems have limitations in supporting sentence-based queries for images. This thesis studies a practical image search scenario in which many people find it frustrating to search, by keywords alone and through trial and error, for images that fit the ideas in a speech or presentation. This thesis proposes and realises a sentence-based image search engine (SISE) that offers the option of querying images by sentence. Users can naturally create sentence-based queries simply by inputting one or several sentences to retrieve a list of images that match their ideas well. The SISE relies on automatic concept detection and tagging techniques to support searching visual content with sentence-based queries. The SISE gathered thousands of input sentences from TED talks, covering many areas such as science, economy, politics, and education. The comprehensive evaluation of the system focused on the usability (perceived image usefulness) aspect. The final overall precision reached 60.7%. The SISE is found to retrieve matching images for a wide variety of topics across different areas and to provide subjectively more useful results than keyword-based image search engines. --Abstract, page iii
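    The core matching idea behind such a system can be sketched as ranking images by the overlap between concepts extracted from the query sentence and each image's automatically detected tags. The bag-of-words concept extraction and the tag sets below are illustrative simplifications, not the SISE's actual pipeline.

```python
# Toy sentence-to-image matching by concept overlap: score each image by
# how many query words appear among its auto-detected tags. Tags are
# hypothetical examples.
def rank_images(query, image_tags):
    """Rank image ids by the number of query words found in their tag sets."""
    words = set(query.lower().split())
    scores = {img: len(words & tags) for img, tags in image_tags.items()}
    return sorted(scores, key=scores.get, reverse=True)

images = {
    "img1.jpg": {"economy", "chart", "growth"},
    "img2.jpg": {"classroom", "education", "children"},
}
print(rank_images("education reaches children everywhere", images))
```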

    Semantic models as metrics for kernel-based interaction identification

    Automatic detection of protein-protein interactions (PPIs) in biomedical publications is vital for efficient biological research. It also presents a host of new challenges for pattern recognition methodologies, some of which are addressed by the research in this thesis. Proteins are the principal method of communication within a cell; hence, this area of research is strongly motivated by the needs of biologists investigating sub-cellular functions of organisms, diseases, and treatments. These researchers rely on the collaborative efforts of the entire field and communicate through experimental results published in reviewed biomedical journals. The substantial number of interactions detected by automated large-scale PPI experiments, combined with the ease of access to digitised publications, has increased the number of results made available each day. The ultimate aim of this research is to provide tools and mechanisms to aid biologists and database curators in locating relevant information; as part of this objective, the thesis proposes, studies, and develops new methodologies that go some way to meeting this grand challenge. Pattern recognition methodologies are one approach that can be used to locate PPI sentences; however, most accurate pattern recognition methods require a set of labelled examples to train on, and for this particular task the collection and labelling of training data is highly expensive. On the other hand, digital publications provide a plentiful source of unlabelled data. This unlabelled data is used, along with word co-occurrence models, to improve classification using Gaussian processes, a probabilistic alternative to the state-of-the-art support vector machines. This thesis presents and systematically assesses these novel methods of using the knowledge implicitly encoded in biomedical texts and shows an improvement over current approaches to PPI sentence detection.
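    A minimal sketch of the kind of unlabelled-corpus knowledge the thesis exploits: a word similarity metric built from co-occurrence counts, which could then feed a kernel for a Gaussian process or SVM classifier. The three-sentence corpus is illustrative, and sentence-level co-occurrence with cosine similarity is an assumed simplification of the thesis's actual models.

```python
# Word similarity from co-occurrence counts on an unlabelled corpus: the
# kind of semantic metric that can be plugged into a kernel-based classifier.
from collections import Counter
import math

corpus = [
    "protein binds protein in the cell",
    "kinase binds substrate in the cell",
    "the weather is sunny today",
]

# Co-occurrence vectors: for each word, count the words sharing a sentence.
cooc = {}
for sent in corpus:
    words = sent.split()
    for w in set(words):
        cooc.setdefault(w, Counter())
        cooc[w].update(x for x in words if x != w)

def similarity(a, b):
    """Cosine similarity between the co-occurrence vectors of two words."""
    va, vb = cooc[a], cooc[b]
    dot = sum(va[k] * vb[k] for k in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

print(similarity("binds", "cell") > similarity("binds", "sunny"))  # True
```

Words that appear in biomedical contexts cluster together under this metric even though no sentence was ever labelled, which is exactly the leverage the thesis seeks from unlabelled text.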