43,614 research outputs found
Event-based Access to Historical Italian War Memoirs
The progressive digitization of historical archives provides new, often
domain specific, textual resources that report on facts and events which have
happened in the past; among these, memoirs are a very common type of primary
source. In this paper, we present an approach for extracting information from
Italian historical war memoirs and turning it into structured knowledge. This
is based on the semantic notions of events, participants and roles. We evaluate
quantitatively each of the key-steps of our approach and provide a graph-based
representation of the extracted knowledge, which allows to move between a Close
and a Distant Reading of the collection.Comment: 23 pages, 6 figure
Recommended from our members
A tool for enhancing MetaMap performance when annotating clinical guideline documents with UMLS concepts
We developed a tool that integrates the National Library of Medicine's MetaMap software with GATE, an open-source text an- alytics framework. The tool allows non-ASCII encoded documents of numerous formats to be annotated with UMLS concepts. We created a GATE pipeline to chunk cardiovascular disease guideline text into default segments (blank-line delimited), XML element content, sentences and phrases, which were sequentially submitted to MetaMap for annotation. XML element, sentence and phrase chunking allowed term extraction and mapping to be completed in around 1/3 of the time taken with de- fault chunking, although with slight loss of accuracy (F1.0s=0.94-0.99). However, phrase chunking allows more complex input to be processed in real time, which is not possible with the other approaches. We discuss the results in relation to use of MetaMap's --term processing option for generating pre- and post-coordinated mappings from composite phrases
Structural variation in generated health reports
We present a natural language generator that produces a range of medical reports on the clinical histories of
cancer patients, and discuss the problem of conceptual restatement in generating various textual views of the
same conceptual content. We focus on two features of our system: the demand for 'loose paraphrases' between
the various reports on a given patient, with a high degree of semantic overlap but some necessary amount of distinctive content; and the requirement for paraphrasing at primarily the discourse level
Open Data Platform for Knowledge Access in Plant Health Domain : VESPA Mining
Important data are locked in ancient literature. It would be uneconomic to
produce these data again and today or to extract them without the help of text
mining technologies. Vespa is a text mining project whose aim is to extract
data on pest and crops interactions, to model and predict attacks on crops, and
to reduce the use of pesticides. A few attempts proposed an agricultural
information access. Another originality of our work is to parse documents with
a dependency of the document architecture
Recommended from our members
Lexical patterns, features and knowledge resources for coreference resolution in clinical notes
Generation of entity coreference chains provides a means to extract linked narrative events from clinical notes, but despite being a well-researched topic in natural language processing, general- purpose coreference tools perform poorly on clinical texts. This paper presents a knowledge-centric and pattern-based approach to resolving coreference across a wide variety of clinical records comprising discharge summaries, progress notes, pathology, radiology and surgical reports from two corpora (Ontology Development and Information Extraction (ODIE) and i2b2/VA). In addition, a method for generating coreference chains using progressively pruned linked lists is demonstrated that reduces the search space and facilitates evaluation by a number of metrics. Independent evaluation results show an F-measure for each corpus of 79.2% and 87.5%, respectively, which offers performance at least as good as human annotators, greatly increased performance over general- purpose tools, and improvement on previously reported clinical coreference systems. The system uses a number of open-source components that are available to download
Summarisation and visualisation of e-Health data repositories
At the centre of the Clinical e-Science Framework (CLEF) project is a repository of well organised,
detailed clinical histories, encoded as data that will be available for use in clinical care and in-silico
medical experiments. We describe a system that we have developed as part of the CLEF project, to perform the task of generating a diverse range of textual and graphical summaries of a patient’s clinical history from a data-encoded model, a chronicle, representing the record of the patient’s medical history. Although the focus of our current work is on cancer patients, the approach we
describe is generalisable to a wide range of medical areas
Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples
Machine Learning has been a big success story during the AI resurgence. One
particular stand out success relates to learning from a massive amount of data.
In spite of early assertions of the unreasonable effectiveness of data, there
is increasing recognition for utilizing knowledge whenever it is available or
can be created purposefully. In this paper, we discuss the indispensable role
of knowledge for deeper understanding of content where (i) large amounts of
training data are unavailable, (ii) the objects to be recognized are complex,
(e.g., implicit entities and highly subjective content), and (iii) applications
need to use complementary or related data in multiple modalities/media. What
brings us to the cusp of rapid progress is our ability to (a) create relevant
and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP
techniques. Using diverse examples, we seek to foretell unprecedented progress
in our ability for deeper understanding and exploitation of multimodal data and
continued incorporation of knowledge in learning techniques.Comment: Pre-print of the paper accepted at 2017 IEEE/WIC/ACM International
Conference on Web Intelligence (WI). arXiv admin note: substantial text
overlap with arXiv:1610.0770
- …