880 research outputs found
Medical WordNet: A new methodology for the construction and validation of information resources for consumer health
A consumer health information system must be able to comprehend both expert and non-expert medical vocabulary and to map between the two. We describe an ongoing
project to create a new lexical database called Medical WordNet (MWN), consisting of
medically relevant terms used by and intelligible to non-expert subjects and supplemented by a corpus of natural-language sentences that is designed to provide
medically validated contexts for MWN terms. The corpus derives primarily from online health information sources targeted to consumers, and involves two sub-corpora, called Medical FactNet (MFN) and Medical BeliefNet (MBN), respectively. The former consists of statements accredited as true on the basis of a rigorous process of validation, the latter of statements which non-experts believe to be true. We summarize the MWN / MFN / MBN project, and describe some of its applications
Efficient deep processing of japanese
We present a broad coverage Japanese grammar written in the HPSG formalism with MRS semantics. The grammar is created for use in real world applications, such that robustness and performance issues play an important role. It is connected to a POS tagging and word segmentation tool. This grammar is being developed in a multilingual context, requiring MRS structures that are easily comparable across languages
A Machine Learning Approach For Opinion Holder Extraction In Arabic Language
Opinion mining aims at extracting useful subjective information from reliable
amounts of text. Opinion mining holder recognition is a task that has not been
considered yet in Arabic Language. This task essentially requires deep
understanding of clauses structures. Unfortunately, the lack of a robust,
publicly available, Arabic parser further complicates the research. This paper
presents a leading research for the opinion holder extraction in Arabic news
independent from any lexical parsers. We investigate constructing a
comprehensive feature set to compensate the lack of parsing structural
outcomes. The proposed feature set is tuned from English previous works coupled
with our proposed semantic field and named entities features. Our feature
analysis is based on Conditional Random Fields (CRF) and semi-supervised
pattern recognition techniques. Different research models are evaluated via
cross-validation experiments achieving 54.03 F-measure. We publicly release our
own research outcome corpus and lexicon for opinion mining community to
encourage further research
Refining the use of the web (and web search) as a language teaching and learning resource
The web is a potentially useful corpus for language study because it provides examples of language that are contextualized and authentic, and is large and easily searchable. However, web contents are heterogeneous in the extreme, uncontrolled and hence 'dirty,' and exhibit features different from the written and spoken texts in other linguistic corpora. This article explores the use of the web and web search as a resource for language teaching and learning. We describe how a particular derived corpus containing a trillion word tokens in the form of n-grams has been filtered by word lists and syntactic constraints and used to create three digital library collections, linked with other corpora and the live web, that exploit the affordances of web text and mitigate some of its constraints
Semantics-based information extraction for detecting economic events
As today's financial markets are sensitive to breaking news on economic events, accurate and timely automatic identification of events in news items is crucial. Unstructured news items originating from many heterogeneous sources have to be mined in order to extract knowledge useful for guiding decision making processes. Hence, we propose the Semantics-Based Pipeline for Economic Event Detection (SPEED), focusing on extracting financial events from news articles and annotating these with meta-data at a speed that enables real-time use. In our implementation, we use some components of an existing framework as well as new components, e.g., a high-performance Ontology Gazetteer, a Word Group Look-Up component, a Word Sense Disambiguator, and components for detecting economic events. Through their interaction with a domain-specific ontology, our novel, semantically enabled components constitute a feedback loop which fosters future reuse of acquired knowledge in the event detection process
Opinion Holder and Target Extraction on Opinion Compounds â A Linguistic Approach
We present an approach to the new task of opinion holder and target extraction on opinion compounds. Opinion compounds (e.g. user rating or victim support) are noun compounds whose head is an opinion noun. We do not only examine features known to be effective for noun compound analysis, such as paraphrases and semantic classes of heads and modifiers, but also propose novel features tailored to this new task. Among them, we examine paraphrases that jointly consider holders and targets, a verb detour in which noun heads are replaced by related verbs, a global head constraint allowing inferencing between different compounds, and the categorization of the sentiment view that the head conveys
Ontology Population via NLP Techniques in Risk Management
In this paper we propose an NLP-based method for Ontology Population from texts and apply it to semi automatic instantiate a Generic Knowledge Base (Generic Domain Ontology) in the risk management domain. The approach is semi-automatic and uses a domain expert intervention for validation. The proposed approach relies on a set of Instances Recognition Rules based on syntactic structures, and on the predicative power of verbs in the instantiation process. It is not domain dependent since it heavily relies on linguistic knowledge. A description of an experiment performed on a part of the ontology of the PRIMA project (supported by the European community) is given. A first validation of the method is done by populating this ontology with Chemical Fact Sheets from Environmental Protection Agency . The results of this experiment complete the paper and support the hypothesis that relying on the predicative power of verbs in the instantiation process improves the performance.Information Extraction, Instance Recognition Rules, Ontology Population, Risk Management, Semantic Analysis
- âŠ