Building a Generation Knowledge Source using Internet-Accessible Newswire
In this paper, we describe a method for automatic creation of a knowledge
source for text generation using information extraction over the Internet. We
present a prototype system called PROFILE which uses a client-server
architecture to extract noun-phrase descriptions of entities such as people,
places, and organizations. The system serves two purposes: as an information
extraction tool, it allows users to search for textual descriptions of
entities; as a utility to generate functional descriptions (FD), it is used in
a functional-unification based generation system. We present an evaluation of
the approach and its applications to natural language generation and
summarization.
Comment: 8 pages, uses eps
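The kind of noun-phrase description extraction this abstract describes can be sketched with a simple appositive pattern over newswire text. The pattern, sentence, and function name below are illustrative assumptions, not the actual PROFILE implementation:

```python
# Minimal sketch of extracting appositive noun-phrase descriptions of
# entities from newswire text. The regex and example are illustrative;
# the real PROFILE system uses a full information-extraction pipeline.
import re

# Matches "<Capitalised Name>, <lower-case description>," appositives.
APPOSITIVE = re.compile(r"([A-Z][a-z]+(?: [A-Z][a-z]+)*), ([a-z][^,]+),")

def extract_descriptions(text):
    """Return {entity: description} pairs found by the pattern."""
    return {m.group(1): m.group(2) for m in APPOSITIVE.finditer(text)}

text = ("Yasser Arafat, chairman of the Palestine Liberation "
        "Organization, met reporters.")
print(extract_descriptions(text))
# {'Yasser Arafat': 'chairman of the Palestine Liberation Organization'}
```

Descriptions harvested this way can then be stored as a knowledge source and reused as candidate modifiers when generating references to the same entity.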
Machine-learning-based vs. manually designed approaches to anaphor resolution: the best of two worlds
In recent years, much effort has gone into the design of robust anaphor resolution algorithms. Many algorithms are based on antecedent filtering and preference strategies that are designed manually. Along a different line of research, corpus-based approaches have been investigated that employ machine-learning techniques to derive strategies automatically. Since the latter reduce the knowledge-engineering effort of designing and optimizing the strategies, they are considered particularly attractive. However, since hand-coding robust antecedent filtering strategies such as syntactic disjoint reference and agreement in person, number, and gender constitutes a once-for-all effort, the question arises whether they should be derived automatically at all. In this paper, it is investigated what might be gained by combining the best of two worlds: designing the universally valid antecedent filtering strategies manually, in a once-for-all fashion, and deriving the (potentially genre-specific) antecedent selection strategies automatically by applying machine-learning techniques. An anaphor resolution system, ROSANA-ML, which follows this paradigm, is designed and implemented. Through a series of formal evaluations, it is shown that, while exhibiting additional advantages, ROSANA-ML reaches a performance level that compares with that of its manually designed ancestor ROSANA.
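The hybrid paradigm this abstract describes can be sketched as hand-coded antecedent *filters* followed by a learned antecedent *selection* score. All feature names, weights, and data structures below are invented for illustration and are not from ROSANA-ML:

```python
# Sketch of the hybrid paradigm: universally valid, hand-coded filters
# (agreement in number/gender) combined with a stand-in for a
# machine-learned antecedent selection scorer.

def agreement_filter(anaphor, candidate):
    """Hand-coded, once-for-all filter: number and gender agreement."""
    return (anaphor["number"] == candidate["number"]
            and anaphor["gender"] in (candidate["gender"], "any"))

def learned_score(anaphor, candidate):
    """Stand-in for a learned preference model; here a toy linear
    scorer over simple features (sentence distance, grammatical role)."""
    weights = {"distance": -0.5, "is_subject": 1.0}
    feats = {"distance": anaphor["sent"] - candidate["sent"],
             "is_subject": 1.0 if candidate["role"] == "subj" else 0.0}
    return sum(weights[f] * v for f, v in feats.items())

def resolve(anaphor, candidates):
    """Filter manually, then select the highest-scoring survivor."""
    survivors = [c for c in candidates if agreement_filter(anaphor, c)]
    return max(survivors, key=lambda c: learned_score(anaphor, c),
               default=None)

pronoun = {"number": "sg", "gender": "f", "sent": 3}
cands = [
    {"id": "Mary", "number": "sg", "gender": "f", "sent": 2, "role": "subj"},
    {"id": "John", "number": "sg", "gender": "m", "sent": 2, "role": "obj"},
    {"id": "the reports", "number": "pl", "gender": "any", "sent": 1,
     "role": "obj"},
]
print(resolve(pronoun, cands)["id"])  # Mary
```

The division of labour mirrors the paper's argument: the filters never need retraining, while the scorer is the genre-specific part one would learn from a corpus.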
NeuralREG: An end-to-end approach to referring expression generation
Traditionally, Referring Expression Generation (REG) models first decide on
the form and then on the content of references to discourse entities in text,
typically relying on features such as salience and grammatical function. In
this paper, we present a new approach (NeuralREG), relying on deep neural
networks, which makes decisions about form and content in one go without
explicit feature extraction. Using a delexicalized version of the WebNLG
corpus, we show that the neural model substantially improves over two strong
baselines. Data and models are publicly available.
Comment: Accepted for presentation at ACL 201
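The delexicalization step this abstract relies on can be sketched as a reversible substitution of entity mentions by placeholder tags. The tag scheme, entity names, and sentence below are illustrative assumptions, not the actual WebNLG preprocessing:

```python
# Toy illustration of delexicalization: entity mentions are replaced by
# placeholder tags so a neural model learns to realise references
# independently of specific names, then tags are mapped back.

def delexicalize(text, entities):
    """Replace each entity string with its placeholder tag."""
    for mention, tag in entities.items():
        text = text.replace(mention, tag)
    return text

def relexicalize(text, entities):
    """Inverse mapping: restore entity mentions from tags."""
    for mention, tag in entities.items():
        text = text.replace(tag, mention)
    return text

entities = {"John Doe": "AGENT-1", "Acme Corp": "PATIENT-1"}
src = "John Doe founded Acme Corp."
delex = delexicalize(src, entities)
print(delex)                                  # AGENT-1 founded PATIENT-1.
print(relexicalize(delex, entities) == src)   # True
```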
Distributed human computation framework for linked data co-reference resolution
Distributed Human Computation (DHC) is a technique used to solve computational problems by incorporating the collaborative effort of a large number of humans. It is also a solution to AI-complete problems such as natural language processing. The Semantic Web, with its roots in AI, is envisioned to be a decentralised world-wide information space for sharing machine-readable data with minimal integration costs. Many research problems in the Semantic Web are considered AI-complete. An example is co-reference resolution, which involves determining whether different URIs refer to the same entity. This is considered a significant hurdle to overcome in the realisation of large-scale Semantic Web applications. In this paper, we propose a framework for building a DHC system on top of the Linked Data Cloud to solve various computational problems. To demonstrate the concept, we focus on co-reference resolution in the Semantic Web when integrating distributed datasets. The traditional way to solve this problem is to design machine-learning algorithms, but these are often computationally expensive, error-prone, and do not scale. We designed a DHC system named iamResearcher, which solves the scientific-publication author-identity co-reference problem when integrating distributed bibliographic datasets. In our system, we aggregated 6 million bibliographic records from various publication repositories. Users can sign up to the system to audit and align their own publications, thus solving the co-reference problem in a distributed manner. The aggregated results are published to the Linked Data Cloud.
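The author-identity co-reference step this abstract outlines can be sketched as cheap candidate blocking plus human-audited `owl:sameAs` links. The record fields, URIs, and function names below are invented for illustration, not from iamResearcher:

```python
# Hypothetical sketch: pair up publication records from two repositories
# by a cheap blocking key, let the user audit the candidate pairs, and
# emit confirmed pairs as owl:sameAs triples for the Linked Data Cloud.

def candidate_matches(recs_a, recs_b):
    """Cheap blocking: pair records sharing a normalised title."""
    index = {r["title"].lower(): r for r in recs_a}
    return [(index[r["title"].lower()], r)
            for r in recs_b if r["title"].lower() in index]

def to_same_as(pairs):
    """Render user-confirmed pairs as owl:sameAs triples (N-Triples)."""
    return ["<{}> <http://www.w3.org/2002/07/owl#sameAs> <{}> ."
            .format(a["uri"], b["uri"]) for a, b in pairs]

repo_a = [{"uri": "http://repoA.example/pub/1",
           "title": "Linked Data at Scale"}]
repo_b = [{"uri": "http://repoB.example/rec/9",
           "title": "linked data at scale"}]
pairs = candidate_matches(repo_a, repo_b)  # shown to the user to audit
print(to_same_as(pairs)[0])
```

The point of the DHC framing is that the expensive judgement (are these the same author?) is made by the people best placed to answer it, not by a trained classifier.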
LivDet 2017 Fingerprint Liveness Detection Competition 2017
Fingerprint Presentation Attack Detection (FPAD) deals with distinguishing images coming from artificial replicas of the fingerprint characteristic, made of materials such as silicone, gelatine, or latex, from images coming from live fingerprints. Images are captured by modern scanners, typically relying on solid-state or optical technologies. Since 2009, the Fingerprint Liveness Detection Competition (LivDet) has aimed to assess the performance of state-of-the-art algorithms according to a rigorous experimental protocol and, at the same time, to provide a simple overview of the basic achievements. The competition is open to all academic research centers and all companies that work in this field. The positive, increasing trend in the number of participants, which supports the success of this initiative, is confirmed again this year: 17 algorithms were submitted to the competition, with a larger involvement of companies and academic institutions. This means that the topic is relevant to both sides, and points out that much work remains to be done in terms of fundamental and applied research.
Comment: presented at ICB 201
Protected Health Information filter (Philter): accurately and securely de-identifying free-text clinical notes.
There is a great and growing need to ascertain what exactly the state of a patient is, in terms of disease progression, actual care practices, pathology, adverse events, and much more, beyond the paucity of data available in structured medical records. Ascertaining these harder-to-reach data elements is now critical for the accurate phenotyping of complex traits, detection of adverse outcomes, assessing the efficacy of off-label drug use, and longitudinal patient surveillance. Clinical notes often contain the most detailed and relevant digital information about individual patients, the nuances of their diseases, the treatment strategies selected by physicians, and the resulting outcomes. However, notes remain largely unused for research because they contain Protected Health Information (PHI), which is synonymous with individually identifying data. Previous clinical note de-identification approaches have been rigid and still too inaccurate to see any substantial real-world use, primarily because they have been trained on medical text corpora that are too small. To build a new de-identification tool, we created the largest manually annotated clinical note corpus for PHI and developed customizable open-source de-identification software called Philter ("Protected Health Information filter"). Here we describe the design and evaluation of Philter, and show how it offers substantial real-world improvements over prior methods.
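The rule-driven flavour of PHI filtering this abstract describes can be sketched with a small pattern table. The patterns, placeholder tags, and example note below are illustrative assumptions only; the real Philter combines far richer pattern sets with whitelists and an annotated corpus:

```python
# Minimal regex-based sketch of rule-driven PHI de-identification:
# each pattern category is replaced by a placeholder tag.
import re

PHI_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),    # US SSN shape
    (re.compile(r"\b\d{2}/\d{2}/\d{4}\b"), "[DATE]"),   # numeric dates
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "[PHONE]"),  # phone numbers
]

def deidentify(note):
    """Replace each PHI match with its category placeholder."""
    for pattern, tag in PHI_PATTERNS:
        note = pattern.sub(tag, note)
    return note

note = "Seen on 01/02/2020, SSN 123-45-6789, call 415-555-0134."
print(deidentify(note))
# Seen on [DATE], SSN [SSN], call [PHONE].
```

Keeping the category in the placeholder, rather than blanking the span, preserves the note's readability for downstream research use.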
Design and Optimisation of the FlyFast Front-end for Attribute-based Coordination
Collective Adaptive Systems (CAS) consist of a large number of interacting objects. The design of such systems requires scalable analysis tools and methods, which necessarily have to rely on some form of approximation of the system's actual behaviour. Promising techniques are those based on mean-field approximation. The FlyFast model-checker uses an on-the-fly algorithm for bounded PCTL model-checking of selected individual(s) in the context of very large populations whose global behaviour is approximated using deterministic limit mean-field techniques. Recently, a front-end for FlyFast has been proposed which provides a modelling language, referred to as PiFF in the sequel, for Predicate-based Interaction for FlyFast. In this paper we present details of the design of PiFF and an approach to state-space reduction based on probabilistic bisimulation for inhomogeneous DTMCs.
Comment: In Proceedings QAPL 2017, arXiv:1707.0366