5,155 research outputs found
The Development and the Evaluation of a System for Extracting Events from Web Pages
Centralizing information about a particular type of event is primarily useful for running news services. These services should provide up-to-date information, in real time if possible, on a specific type of event. Extracting these events involves automatically analyzing the linguistic structure of documents to determine the possible sequences in which the events occur. This analysis yields structured and semi-structured documents from which individual events can be extracted automatically. To measure the quality of such a system, a methodology is introduced that describes the stages of evaluation: decomposing an event extraction system into components, defining quality attributes and properties for these components, and finally introducing metrics for evaluation. Keywords: Event, Performance Metric, Event Extraction System
NCBO Ontology Recommender 2.0: An Enhanced Approach for Biomedical Ontology Recommendation
Biomedical researchers use ontologies to annotate their data with ontology
terms, enabling better data integration and interoperability. However, the
number, variety and complexity of current biomedical ontologies make it
cumbersome for researchers to determine which ones to reuse for their specific
needs. To overcome this problem, in 2010 the National Center for Biomedical
Ontology (NCBO) released the Ontology Recommender, which is a service that
receives a biomedical text corpus or a list of keywords and suggests ontologies
appropriate for referencing the indicated terms. We developed a new version of
the NCBO Ontology Recommender. Called Ontology Recommender 2.0, it uses a new
recommendation approach that evaluates the relevance of an ontology to
biomedical text data according to four criteria: (1) the extent to which the
ontology covers the input data; (2) the acceptance of the ontology in the
biomedical community; (3) the level of detail of the ontology classes that
cover the input data; and (4) the specialization of the ontology to the domain
of the input data. Our evaluation shows that the enhanced recommender provides
higher quality suggestions than the original approach, providing better
coverage of the input data, more detailed information about their concepts,
increased specialization for the domain of the input data, and greater
acceptance and use in the community. In addition, it provides users with more
explanatory information, along with suggestions of not only individual
ontologies but also groups of ontologies. It also can be customized to fit the
needs of different scenarios. Ontology Recommender 2.0 combines the strengths
of its predecessor with a range of adjustments and new features that improve
its reliability and usefulness. Ontology Recommender 2.0 recommends over 500
biomedical ontologies from the NCBO BioPortal platform, where it is openly
available. Comment: 29 pages, 8 figures, 11 tables
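The four criteria named in this abstract could be combined into a single ranking score. The following is a minimal sketch only: the abstract does not say how the criteria are combined, so the weighted sum, the weights, the `OntologyScores` type, and the ontology names are all hypothetical and are not the NCBO implementation or its API.

```python
from dataclasses import dataclass


@dataclass
class OntologyScores:
    """Per-ontology scores in [0, 1] for the four criteria (hypothetical type)."""
    name: str
    coverage: float        # (1) how much of the input data the ontology covers
    acceptance: float      # (2) acceptance in the biomedical community
    detail: float          # (3) level of detail of the covering classes
    specialization: float  # (4) specialization to the input's domain


# Assumed weights for illustration; not taken from the abstract.
WEIGHTS = {"coverage": 0.4, "acceptance": 0.2, "detail": 0.2, "specialization": 0.2}


def aggregate(scores, weights=WEIGHTS):
    """Weighted sum of the criterion scores (an assumed combination rule)."""
    return sum(w * getattr(scores, criterion) for criterion, w in weights.items())


def rank(candidates, weights=WEIGHTS):
    """Return candidates sorted from highest to lowest aggregate score."""
    return sorted(candidates, key=lambda s: aggregate(s, weights), reverse=True)


if __name__ == "__main__":
    candidates = [
        OntologyScores("ONT_A", 0.9, 0.5, 0.5, 0.5),  # made-up ontology
        OntologyScores("ONT_B", 0.5, 0.9, 0.9, 0.9),  # made-up ontology
    ]
    for s in rank(candidates):
        print(s.name, round(aggregate(s), 2))
```

Passing a different `weights` mapping is one way to model the customization "to fit the needs of different scenarios" that the abstract mentions; for example, `{"coverage": 1.0}` ranks on coverage alone.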
Comprehensive Review of Opinion Summarization
The abundance of opinions on the web has kindled the study of opinion summarization over the last few years. Researchers have introduced various techniques and paradigms for solving this task. This survey attempts to systematically investigate the different techniques and approaches used in opinion summarization. We provide a multi-perspective classification of the approaches used and highlight some of the key weaknesses of these approaches. This survey also covers evaluation techniques and data sets used in studying the opinion summarization problem. Finally, we provide insights into some of the challenges that remain to be addressed, which will help set the direction for future research in this area. Unpublished; not peer reviewed.
Social Measurement and Causal Inference with Text
The digital age has dramatically increased access to large-scale collections of digitized text documents. These corpora include, for example, digital traces from social media, decades of archived news reports, and transcripts of spoken interactions in political, legal, and economic spheres. For social scientists, this new widespread data availability has potential for improved quantitative analysis of relationships between language use and human thought, actions, and societal structure. However, the large-scale nature of these collections means that traditional manual approaches to analyzing content are extremely costly and do not scale. Furthermore, incorporating unstructured text data into quantitative analysis is difficult due to texts’ high-dimensional nature and linguistic complexity.
This thesis blends (a) the computational strengths of natural language processing (NLP) and machine learning to automate and scale up quantitative text analysis with (b) two themes central to social scientific studies but often under-addressed in NLP: measurement (creating quantifiable summaries of empirical phenomena) and causal inference (estimating the effects of interventions). First, we address measuring class prevalence in document collections; we contribute a generative probabilistic modeling approach to prevalence estimation and show empirically that our model is more robust to shifts in class priors between training and inference. Second, we examine cross-document entity-event measurement; we contribute an empirical pipeline and a novel latent disjunction model to identify the names of civilians killed by police from our corpus of web-scraped news reports. Third, we gather and categorize applications that use text to reduce confounding from causal estimates and contribute a list of open problems as well as guidance about data processing and evaluation decisions in this area. Finally, we contribute a new causal research design to estimate the natural indirect and direct effects of social group signals (e.g. race or gender) on conversational outcomes with separate aspects of language as causal mediators; this chapter is motivated by a theoretical case study of U.S. Supreme Court oral arguments and the effect of an advocate’s gender on interruptions from justices. We conclude by discussing the relationship between measurement and causal inference with text and future work at this intersection.
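For readers unfamiliar with the natural direct and indirect effects mentioned in this abstract, one common convention from the mediation literature is sketched here. The notation is an assumption for illustration and is not taken from the thesis: T is a binary treatment (here, the social group signal), M is the mediator (an aspect of language), Y(t, m) is the potential outcome under treatment t and mediator value m, and M(t) is the mediator value that would arise under treatment t.

```latex
\begin{align}
\mathrm{NDE} &= \mathbb{E}\big[\, Y(1, M(0)) - Y(0, M(0)) \,\big] \\
\mathrm{NIE} &= \mathbb{E}\big[\, Y(1, M(1)) - Y(1, M(0)) \,\big] \\
\mathrm{TE}  &= \mathbb{E}\big[\, Y(1, M(1)) - Y(0, M(0)) \,\big] = \mathrm{NDE} + \mathrm{NIE}
\end{align}
```

Under this convention the two effects sum to the total effect; other conventions place the reference levels differently, and the thesis may use a different one.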