18,462 research outputs found

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems, such as text summarization, information extraction, and information retrieval, including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.

    Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization

    In automatic text summarization, preprocessing is an important phase for reducing the space of textual representation. Classically, stemming and lemmatization have been widely used to normalize words. However, even with normalization, the curse of dimensionality can degrade summarizer performance on large texts. This paper describes a new word-normalization method that further reduces the representation space: each word is reduced to its initial letters, a form of ultra-stemming. The results show that ultra-stemming not only preserves the content of the summaries produced from this representation, but often improves system performance dramatically. Summaries over trilingual corpora were evaluated automatically with FRESA; the results confirm an increase in performance regardless of the summarizer used.
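
    A minimal sketch of the idea (not the authors' exact pipeline): each token is truncated to its first n letters, with n = 1 reproducing the "initial letters" reduction; the whitespace tokenizer and the lowercasing are assumptions.

```python
# Illustrative ultra-stemming sketch: truncate every token to its
# first n letters. The tokenizer and lowercasing are assumptions,
# not the paper's exact preprocessing.
def ultra_stem(text: str, n: int = 1) -> list[str]:
    """Reduce each word to its first n characters."""
    return [word[:n].lower() for word in text.split()]

print(ultra_stem("Stemming and lemmatization normalize words"))
# -> ['s', 'a', 'l', 'n', 'w']
```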

    OpenCFU, a New Free and Open-Source Software to Count Cell Colonies and Other Circular Objects

    Counting circular objects such as cell colonies is an important source of information for biologists. Although this task is often time-consuming and subjective, it is still predominantly performed manually. The aim of the present work is to provide a new tool to enumerate circular objects from digital pictures and video streams. Here, I demonstrate that the resulting program, OpenCFU, is robust, accurate, and fast. In addition, it gives the user control over the processing parameters and is implemented in an intuitive and modern interface. OpenCFU is cross-platform, open-source software freely available at http://opencfu.sourceforge.net
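
    The abstract does not detail OpenCFU's internals; as a generic illustration of the task, a Hough circle transform can enumerate roughly circular objects. The OpenCV calls below are real, but the file name and every parameter value are assumptions to be tuned per image set.

```python
# Generic circle-counting sketch (not OpenCFU's algorithm) using
# OpenCV's Hough circle transform.
import cv2

img = cv2.imread("plate.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input image
img = cv2.medianBlur(img, 5)  # suppress noise before circle detection
circles = cv2.HoughCircles(
    img, cv2.HOUGH_GRADIENT, dp=1.2, minDist=15,
    param1=100, param2=30, minRadius=5, maxRadius=40,
)
count = 0 if circles is None else circles.shape[1]
print(f"Detected {count} circular objects")
```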

    Medical WordNet: A new methodology for the construction and validation of information resources for consumer health

    A consumer health information system must be able to comprehend both expert and non-expert medical vocabulary and to map between the two. We describe an ongoing project to create a new lexical database called Medical WordNet (MWN), consisting of medically relevant terms used by and intelligible to non-expert subjects, supplemented by a corpus of natural-language sentences designed to provide medically validated contexts for MWN terms. The corpus derives primarily from online health information sources targeted at consumers and comprises two sub-corpora, called Medical FactNet (MFN) and Medical BeliefNet (MBN), respectively. The former consists of statements accredited as true through a rigorous validation process; the latter, of statements that non-experts believe to be true. We summarize the MWN/MFN/MBN project and describe some of its applications.
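
    As a toy sketch of the expert/lay mapping such a system needs: the entries below are illustrative examples, not actual MWN data, and the lookup is far simpler than a real lexical database.

```python
# Hypothetical expert/lay term mapping; entries are illustrative,
# not drawn from Medical WordNet.
LAY_TO_EXPERT = {
    "heart attack": "myocardial infarction",
    "high blood pressure": "hypertension",
    "nosebleed": "epistaxis",
}
EXPERT_TO_LAY = {v: k for k, v in LAY_TO_EXPERT.items()}

def to_expert(term: str) -> str:
    """Map a consumer term to its expert equivalent if one is known."""
    return LAY_TO_EXPERT.get(term.lower(), term)

print(to_expert("Heart attack"))  # -> myocardial infarction
```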

    Neurocognitive Informatics Manifesto.

    Informatics studies all aspects of the structure of natural and artificial information systems. Theoretical and abstract approaches to information have made great advances, but human information processing is still unmatched in many areas, including information management, representation, and understanding. Neurocognitive informatics is a new, emerging field that should help improve the matching of artificial and natural systems and inspire better computational algorithms for problems that are still beyond the reach of machines. This position paper gives examples of neurocognitive inspirations and promising directions in this area.

    Automatic Segmentation of Exudates in Ocular Images using Ensembles of Aperture Filters and Logistic Regression

    Hard and soft exudates are the main signs of diabetic macular edema (DME). The segmentation of both kinds of exudates yields valuable information not only for the diagnosis of DME, but also for its treatment, which helps to avoid vision loss and blindness. In this paper, we propose a new algorithm for the automatic segmentation of exudates in ocular fundus images. The algorithm is based on ensembles of aperture filters that detect exudate candidates and remove major blood vessels from the processed images. Logistic regression is then used to classify each candidate as either exudate or non-exudate based on a vector of 31 features that characterize each potential lesion. Finally, we tested the performance of the proposed algorithm on the images in the public HEI-MED database.
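
    A hedged sketch of the classification stage only: logistic regression over per-candidate feature vectors (31 features, as in the paper). The aperture-filter candidate detection is not reproduced here, and the random arrays below merely stand in for real HEI-MED features.

```python
# Classification stage only: logistic regression over 31-dimensional
# candidate feature vectors. Random data stands in for real features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 31))    # 200 candidates x 31 features
y_train = rng.integers(0, 2, size=200)  # 1 = exudate, 0 = non-exudate

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
X_new = rng.normal(size=(5, 31))        # 5 new candidate regions
print(clf.predict(X_new))               # per-candidate exudate decisions
```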