103 research outputs found

    COVER: a linguistic resource combining common sense and lexicographic information

    Lexical resources are fundamental to many tasks that are central to present and prospective research in Text Mining and Information Retrieval, and that are connected to Natural Language Processing. In this article we introduce COVER, a novel lexical resource, along with COVERAGE, the algorithm devised to build it. To describe concepts, COVER proposes a compact vectorial representation that combines the lexicographic precision of BabelNet with the rich common-sense knowledge of ConceptNet. We propose COVER as a reliable and mature resource that has been employed in tasks as diverse as conceptual categorization, keyword extraction, and conceptual similarity. The experimental assessment is performed on the last task: we report and discuss the results obtained and point out future improvements. We conclude that COVER can be directly exploited to build applications, and can also be coupled with existing resources.

    Taming Sense Sparsity: a Common-Sense Approach

    We present a novel algorithm and a linguistic resource named CLOSEST, after 'Common SEnse STrainer'. The resource contains a list of the main senses associated with a given term, and it was obtained by applying a simple set of pruning heuristics to the senses provided in the NASARI vectors for the 15K most frequent English terms. The algorithm uses an existing resource that encodes encyclopedic knowledge, and relies on the notion of common sense to filter the possible senses associated with each term. A preliminary evaluation provided encouraging results regarding the quality of the extracted senses.

    A Resource for Detecting Misspellings and Denoising Medical Text Data

    In this paper we propose a method for building a dictionary to deal with noisy medical text documents, specifically the portion of clinical records annotated in emergency rooms. The quality of such Italian Emergency Room Reports is so poor that in most cases they can hardly be processed automatically; this also holds for other languages (e.g., English), with the notable difference that no Italian dictionary has been proposed to deal with this jargon. In this work we introduce and evaluate a resource designed to fill this gap.