4,711 research outputs found
Text Summarization Techniques: A Brief Survey
In recent years, there has been a explosion in the amount of text data from a
variety of sources. This volume of text is an invaluable source of information
and knowledge which needs to be effectively summarized to be useful. In this
review, the main approaches to automatic text summarization are described. We
review the different processes for summarization and describe the effectiveness
and shortcomings of the different methods.Comment: Some of references format have update
Generating Natural Language from Linked Data:Unsupervised template extraction
We propose an architecture for generating natural language from Linked Data that automatically learns sentence templates and statistical document planning from parallel RDF datasets and text. We have built a proof-of-concept system (LOD-DEF) trained on un-annotated text from the Simple English Wikipedia and RDF triples from DBpedia, focusing exclusively on factual, non-temporal information. The goal of the system is to generate short descriptions, equivalent to Wikipedia stubs, of entities found in Linked Datasets. We have evaluated the LOD-DEF system against a simple generate-from-triples baseline and human-generated output. In evaluation by humans, LOD-DEF significantly outperforms the baseline on two of three measures: non-redundancy and structure and coherence.
The Requirements for Ontologies in Medical Data Integration: A Case Study
Evidence-based medicine is critically dependent on three sources of
information: a medical knowledge base, the patients medical record and
knowledge of available resources, including where appropriate, clinical
protocols. Patient data is often scattered in a variety of databases and may,
in a distributed model, be held across several disparate repositories.
Consequently addressing the needs of an evidence-based medicine community
presents issues of biomedical data integration, clinical interpretation and
knowledge management. This paper outlines how the Health-e-Child project has
approached the challenge of requirements specification for (bio-) medical data
integration, from the level of cellular data, through disease to that of
patient and population. The approach is illuminated through the requirements
elicitation and analysis of Juvenile Idiopathic Arthritis (JIA), one of three
diseases being studied in the EC-funded Health-e-Child project.Comment: 6 pages, 1 figure. Presented at the 11th International Database
Engineering & Applications Symposium (Ideas2007). Banff, Canada September
200
Boosting terminology extraction through crosslingual resources
Terminology Extraction is an important Natural Language Processing task with multiple applications in many areas. The task has been approached from different points of view using different techniques. Language and domain independent systems have been proposed as well. Our contribution in this paper focuses on the improvements on Terminology Extraction using crosslingual resources and specifically the Wikipedia and on the use of a variant of PageRank for scoring the candidate terms. // La extracción de terminología es una tarea de procesamiento de la lengua sumamente importante y aplicable en numerosas áreas. La tarea se ha abordado desde múltiples perspectivas y utilizando técnicas diversas. También se han propuesto sistemas independientes de la lengua y del dominio. La contribución de este artículo se centra en las mejoras que los sistemas de extracción de terminología pueden lograr utilizando recursos translingües, y concretamente la Wikipedia y en el uso de una variante de PageRank para valorar los candidatos a términoPeer ReviewedPostprint (published version
- …