FrameNet CNL: a Knowledge Representation and Information Extraction Language
The paper presents a FrameNet-based information extraction and knowledge
representation framework, called FrameNet-CNL. The framework is used on natural
language documents and represents the extracted knowledge in a tailor-made
Frame-ontology from which unambiguous FrameNet-CNL paraphrase text can be
generated automatically in multiple languages. This approach brings together
the fields of information extraction and CNL, because a source text can be
considered as belonging to FrameNet-CNL if the information extraction parser
produces the correct knowledge representation as a result. We describe a
state-of-the-art information extraction parser used by a national news agency
and speculate that FrameNet-CNL could eventually shape the natural language
subset used for writing the newswire articles.
Comment: CNL-2014 camera-ready version. The final publication is available at
link.springer.co
A geo-temporal information extraction service for processing descriptive metadata in digital libraries
In the context of digital map libraries, resources are usually described by metadata records that define the relevant subject, location, time-span, format and keywords. As far as locations and time-spans are concerned, metadata records are often incomplete, or they provide information in a way that is not machine-understandable (e.g. textual descriptions). This paper presents techniques for extracting geotemporal information from text, using relatively simple text mining methods that leverage a Web gazetteer service. The idea is to convert human-made geotemporal references (i.e. place and period names in textual expressions) into geo-spatial coordinates and time-spans. A prototype system implementing the proposed methods is described in detail. Experimental results demonstrate the efficiency and accuracy of the proposed approaches.
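To make the idea of converting textual place and period names into coordinates and time-spans concrete, here is a minimal Python sketch. It is not the paper's system: the in-memory `GAZETTEER` dictionary is a hypothetical stand-in for the Web gazetteer service, its coordinates are approximate illustrative values, and the two regular expressions cover only "Nth century" and "YYYY-YYYY" expressions.

```python
import re

# Hypothetical in-memory stand-in for a Web gazetteer service:
# lower-cased place name -> approximate (latitude, longitude).
GAZETTEER = {
    "lisbon": (38.72, -9.14),
    "porto": (41.15, -8.61),
}

# Assumed patterns for two common period expressions.
CENTURY = re.compile(r"\b(\d{1,2})(?:st|nd|rd|th) century\b", re.IGNORECASE)
YEAR_RANGE = re.compile(r"\b(\d{4})\s*-\s*(\d{4})\b")


def extract_time_spans(text):
    """Return (start_year, end_year) tuples for period expressions in text."""
    spans = []
    for match in CENTURY.finditer(text):
        century = int(match.group(1))
        # The Nth century runs from year (N-1)*100 + 1 to year N*100.
        spans.append(((century - 1) * 100 + 1, century * 100))
    for match in YEAR_RANGE.finditer(text):
        spans.append((int(match.group(1)), int(match.group(2))))
    return spans


def extract_places(text):
    """Return {surface form: (lat, lon)} for words found in the gazetteer."""
    found = {}
    for token in re.findall(r"[A-Za-z]+", text):
        coords = GAZETTEER.get(token.lower())
        if coords is not None:
            found[token] = coords
    return found
```

A real service would also need to handle multi-word and ambiguous place names, which this single-token lookup ignores.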
Improving the translation environment for professional translators
When using computer-aided translation systems in a typical professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological one.
This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.
A Hybrid Approach to Domain-Specific Entity Linking
The current state-of-the-art Entity Linking (EL) systems are geared towards
corpora that are as heterogeneous as the Web, and therefore perform
sub-optimally on domain-specific corpora. A key open problem is how to
construct effective EL systems for specific domains, as knowledge of the local
context should in principle increase, rather than decrease, effectiveness. In
this paper we propose the hybrid use of simple specialist linkers in
combination with an existing generalist system to address this problem. Our
main findings are the following. First, we construct a new reusable benchmark
for EL on a corpus of domain-specific conversations. Second, we test the
performance of a range of approaches under the same conditions, and show that
specialist linkers obtain high precision in isolation, and high recall when
combined with generalist linkers. Hence, we can effectively exploit local
context and get the best of both worlds.
Comment: SEM'1
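The abstract does not say how the specialist and generalist linkers are combined. One plausible reading, sketched below in Python as an assumption only, is to trust a high-precision specialist lookup first and fall back to a generalist linker for recall. The names `specialist_link`, `hybrid_link`, `domain_index` and the `generalist` callable are all hypothetical.

```python
def specialist_link(mention, domain_index):
    """Exact, case-insensitive lookup in a small domain-specific index.

    Returns an entity identifier, or None when the mention is unknown.
    """
    return domain_index.get(mention.lower())


def hybrid_link(mention, domain_index, generalist):
    """Try the specialist first; fall back to the generalist linker.

    `generalist` is any callable mapping a mention to an entity
    identifier or None (a stand-in for an existing EL system).
    Returns (entity_or_None, name_of_the_linker_that_answered).
    """
    entity = specialist_link(mention, domain_index)
    if entity is not None:
        return entity, "specialist"
    entity = generalist(mention)
    if entity is not None:
        return entity, "generalist"
    return None, "none"
```

Returning the name of the linker that answered makes it straightforward to measure how much each component contributes to precision and recall on a benchmark.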