Natural language processing
Beginning with the basic issues of NLP, this chapter charts the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems, such as text summarization, information extraction, and information retrieval, including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.
Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization
In automatic text summarization, preprocessing is an important phase for reducing the space of textual representation. Classically, stemming and lemmatization have been widely used to normalize words. However, even with normalization, the curse of dimensionality can disturb the performance of summarizers on large texts. This paper describes a new method of word normalization that further reduces the space of representation: we propose to reduce each word to its initial letters, a form of Ultra-stemming. The results show that Ultra-stemming not only preserves the content of the summaries produced from this representation, but can often dramatically improve system performance. Summaries of trilingual corpora were evaluated automatically with Fresa. The results confirm an increase in performance regardless of the summarizer system used. (22 pages, 12 figures, 9 tables.)
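The core idea of the abstract above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the function name `ultra_stem` and the letter-count parameter `n` are assumptions for illustration (the paper reduces each word to its initial letters):

```python
def ultra_stem(text, n=1):
    """Reduce each word to its first n letters, a sketch of Ultra-stemming.

    With n=1 every word collapses to a single letter, shrinking the
    space of textual representation far more than stemming or
    lemmatization would.
    """
    return " ".join(word[:n] for word in text.lower().split())

print(ultra_stem("Stemming and lemmatization normalize words"))
# → "s a l n w"
```

Increasing `n` trades dimensionality reduction against word-level ambiguity, since more distinct words survive the truncation.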
OpenCFU, a New Free and Open-Source Software to Count Cell Colonies and Other Circular Objects
Counting circular objects such as cell colonies is an important source of information for biologists. Although this task is often time-consuming and subjective, it is still predominantly performed manually. The aim of the present work is to provide a new tool to enumerate circular objects from digital pictures and video streams. Here, I demonstrate that the resulting program, OpenCFU, is robust, accurate, and fast. In addition, it provides control over the processing parameters and is implemented in an intuitive, modern interface. OpenCFU is cross-platform, open-source software freely available at http://opencfu.sourceforge.net.
Medical WordNet: A new methodology for the construction and validation of information resources for consumer health
A consumer health information system must be able to comprehend both expert and non-expert medical vocabulary and to map between the two. We describe an ongoing project to create a new lexical database called Medical WordNet (MWN), consisting of medically relevant terms used by and intelligible to non-expert subjects, supplemented by a corpus of natural-language sentences designed to provide medically validated contexts for MWN terms. The corpus derives primarily from online health information sources targeted at consumers and comprises two sub-corpora, called Medical FactNet (MFN) and Medical BeliefNet (MBN), respectively. The former consists of statements accredited as true through a rigorous validation process; the latter, of statements that non-experts believe to be true. We summarize the MWN/MFN/MBN project and describe some of its applications.
Neurocognitive Informatics Manifesto.
Informatics studies all aspects of the structure of natural and artificial information systems. Theoretical and abstract approaches to information have made great advances, but human information processing is still unmatched in many areas, including information management, representation, and understanding. Neurocognitive informatics is a new, emerging field that should help to improve the matching of artificial and natural systems, and inspire better computational algorithms to solve problems that are still beyond the reach of machines. In this position paper, examples of neurocognitive inspirations and promising directions in this area are given.
Automatic Segmentation of Exudates in Ocular Images using Ensembles of Aperture Filters and Logistic Regression
Hard and soft exudates are the main signs of diabetic macular edema (DME). The segmentation of both kinds of exudates generates valuable information not only for the diagnosis of DME but also for treatment, which helps to avoid vision loss and blindness. In this paper, we propose a new algorithm for the automatic segmentation of exudates in ocular fundus images. The proposed algorithm is based on ensembles of aperture filters that detect exudate candidates and remove major blood vessels from the processed images. Then, logistic regression is used to classify each candidate as either exudate or non-exudate based on a vector of 31 features that characterize each potential lesion. Finally, we tested the performance of the proposed algorithm using the images in the public HEI-MED database.
Fil: Benalcazar Palacios, Marco Enrique. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Secretaría Nacional de Educación Superior, Ciencia, Tecnología e Innovación; Ecuador. Fil: Brun, Marcel. Universidad Nacional de Mar del Plata; Argentina. Fil: Ballarin, Virginia Laura. Universidad Nacional de Mar del Plata; Argentina.
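The logistic-regression classification step described in the abstract above can be sketched as follows. This is a hedged illustration only: the feature values, weights, and bias below are hypothetical placeholders, and the example uses a 3-dimensional vector for brevity where the paper uses 31 features (whose definitions are not given here).

```python
import math

def sigmoid(z):
    """Logistic link function, mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def classify_candidate(features, weights, bias, threshold=0.5):
    """Label a candidate region as exudate (True) or non-exudate (False).

    `features` is the candidate's feature vector (31-dimensional in the
    paper); `weights` and `bias` would come from a trained logistic
    regression model.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z) >= threshold

# Hypothetical 3-dimensional example for illustration only.
print(classify_candidate([2.0, 1.0, 0.5], [0.8, -0.5, 1.2], bias=-1.0))
# → True
```

In practice the threshold can be tuned on a validation set to trade sensitivity against specificity, which matters when missed lesions are costlier than false alarms.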