1,687 research outputs found
Hierarchical statistical semantic realization for minimal recursion semantics
Semantic chunking
Long sentences pose a challenge for natural language processing (NLP) applications. They are associated with a complex information structure leading to increased requirements for processing resources. Although the issue is present in many areas of research, there is little uniformity in the solutions used by research communities dedicated to individual NLP applications. Different aspects of the problem are addressed by different tasks, such as sentence simplification or shallow chunking.
The main contribution of this thesis is the introduction of the task of semantic chunking as a general approach to reducing the cost of processing long sentences. The goal of semantic chunking is to find semantically contained fragments of a sentence representation that can be processed independently and recombined without loss of information. We anchor its principles in established concepts of semantic theory, in particular event and situation semantics. Most of the experiments in this thesis focus on semantic chunking defined on complex semantic representations in Dependency Minimal Recursion Semantics (DMRS), but we also demonstrate that the task can be performed on sentence strings. We present three chunking models: a) a rule-based proof-of-concept DMRS chunking system; b) a semi-supervised sequence labelling neural model for surface semantic chunking; c) a system capable of finding semantic chunk boundaries based on the inherent structure of DMRS graphs, generalisable in the form of descriptive templates. We show how semantic chunking can be applied within a divide-and-conquer processing paradigm, using as an example the task of realization from DMRS. The application of semantic chunking yields noticeable efficiency gains without decreasing the quality of results.
Natural Language Syntax Complies with the Free-Energy Principle
Natural language syntax yields an unbounded array of hierarchically
structured expressions. We claim that these are used in the service of active
inference in accord with the free-energy principle (FEP). While conceptual
advances alongside modelling and simulation work have attempted to connect
speech segmentation and linguistic communication with the FEP, we extend this
program to the underlying computations responsible for generating syntactic
objects. We argue that recently proposed principles of economy in language
design - such as "minimal search" criteria from theoretical syntax - adhere to
the FEP. This affords a greater degree of explanatory power to the FEP - with
respect to higher language functions - and offers linguistics a grounding in
first principles with respect to computability. We show how both tree-geometric
depth and a Kolmogorov complexity estimate (recruiting a Lempel-Ziv compression
algorithm) can be used to accurately predict legal operations on syntactic
workspaces, directly in line with formulations of variational free energy
minimization. This is used to motivate a general principle of language design
that we term Turing-Chomsky Compression (TCC). We use TCC to align concerns of
linguists with the normative account of self-organization furnished by the FEP,
by marshalling evidence from theoretical linguistics and psycholinguistics to
ground core principles of efficient syntactic computation within active
inference.
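The abstract above mentions estimating Kolmogorov complexity with a Lempel-Ziv compression algorithm. As a minimal sketch of that general idea only: the compressed size of a string gives an upper-bound proxy for its complexity. The abstract does not say which Lempel-Ziv variant or input encoding the authors use, so `zlib` (DEFLATE, i.e. LZ77 plus Huffman coding) and the bracketed toy strings below are assumptions made for illustration.

```python
import random
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes after DEFLATE compression (LZ77 + Huffman).

    Used here as a rough, upper-bound proxy for Kolmogorov complexity.
    This is an illustrative stand-in, not the estimator from the paper.
    """
    return len(zlib.compress(text.encode("utf-8"), 9))


# Hypothetical bracketed structure repeated many times: highly regular.
repetitive = "[a [b c]] " * 50

# Same length and alphabet, but symbols drawn at random: little regularity.
rng = random.Random(0)
scrambled = "".join(rng.choice("abc[] ") for _ in range(len(repetitive)))

print(compressed_size(repetitive), compressed_size(scrambled))
```

The regular string compresses to far fewer bytes than the scrambled one, which is the sense in which compressed length tracks structural regularity.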
Graph- and surface-level sentence chunking
The computing cost of many NLP tasks increases faster than linearly with the length of the representation of a sentence. For parsing, the representation is a sequence of tokens, while for operations on syntax and semantics it will be more complex. In this paper we propose a new task of sentence chunking: splitting sentence representations into coherent substructures. Its aim is to make further processing of long sentences more tractable. We investigate this idea experimentally using the Dependency Minimal Recursion Semantics (DMRS) representation.
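To see why chunking helps when cost grows faster than linearly, a small arithmetic sketch follows. It assumes a purely hypothetical cost model of length raised to a fixed exponent (cubic by default) and ignores the cost of finding chunk boundaries and recombining results; neither the exponent nor the function names come from the paper.

```python
def cost(length: int, exponent: int = 3) -> int:
    """Hypothetical processing cost of a representation of the given length."""
    return length ** exponent


def chunked_cost(chunk_lengths, exponent: int = 3) -> int:
    """Total cost when each chunk is processed independently.

    Assumption: recombining the chunk results is free, which is a
    simplification made only for this illustration.
    """
    return sum(cost(n, exponent) for n in chunk_lengths)


whole = cost(40)                         # one 40-unit sentence
split = chunked_cost([10, 10, 10, 10])   # the same sentence in four chunks
print(whole, split, whole / split)
```

Under this toy model, four chunks of 10 units cost 4,000 against 64,000 for the unsplit sentence, a 16-fold reduction. Real gains depend on the actual cost curve and on how balanced the chunks are.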
Parameter Learning of Logic Programs for Symbolic-Statistical Modeling
We propose a logical/mathematical framework for statistical parameter
learning of parameterized logic programs, i.e. definite clause programs
containing probabilistic facts with a parameterized distribution. It extends
the traditional least Herbrand model semantics in logic programming to
distribution semantics, possible world semantics with a probability
distribution which is unconditionally applicable to arbitrary logic programs
including ones for HMMs, PCFGs and Bayesian networks. We also propose a new EM
algorithm, the graphical EM algorithm, that runs for a class of parameterized
logic programs representing sequential decision processes where each decision
is exclusive and independent. It runs on a new data structure called support
graphs describing the logical relationship between observations and their
explanations, and learns parameters by computing inside and outside probability
generalized for logic programs. The complexity analysis shows that when
combined with OLDT search for all explanations for observations, the graphical
EM algorithm, despite its generality, has the same time complexity as existing
EM algorithms, i.e. the Baum-Welch algorithm for HMMs, the Inside-Outside
algorithm for PCFGs, and the one for singly connected Bayesian networks that
have been developed independently in each research field. Learning experiments
with PCFGs using two corpora of moderate size indicate that the graphical EM
algorithm can significantly outperform the Inside-Outside algorithm.
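For readers unfamiliar with the inside probabilities mentioned above, here is a minimal sketch of the textbook inside pass for a PCFG in Chomsky normal form. It is not the graphical EM algorithm proposed in the paper; the function name, the dictionary-based grammar encoding, and the toy grammar are assumptions made for illustration.

```python
from collections import defaultdict


def inside_probability(words, lexical, binary, start="S"):
    """Probability that `start` derives `words` under a CNF PCFG.

    lexical: {(A, word): prob} for rules A -> word
    binary:  {(A, B, C): prob} for rules A -> B C
    """
    n = len(words)
    # beta[(i, j)][A] = probability that A derives words[i:j]
    beta = defaultdict(lambda: defaultdict(float))

    # Spans of length 1 come from lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                beta[(i, i + 1)][a] += p

    # Longer spans sum over every split point and every binary rule.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    left = beta[(i, k)].get(b, 0.0)
                    right = beta[(k, j)].get(c, 0.0)
                    if left and right:
                        beta[(i, j)][a] += p * left * right

    return beta[(0, n)].get(start, 0.0)


# Toy grammar (hypothetical): S -> S S with 0.3, S -> 'a' with 0.7.
toy_lexical = {("S", "a"): 0.7}
toy_binary = {("S", "S", "S"): 0.3}
print(inside_probability(["a", "a", "a"], toy_lexical, toy_binary))
```

The three-word string has two parses under the toy grammar, each with probability 0.3 × 0.3 × 0.7³, and the inside pass sums them. The triple loop over spans and split points is what makes the computation cubic in sentence length.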
Evaluating Text Generation from Discourse Representation Structures
We present an end-to-end neural approach to generate English sentences from formal meaning representations, Discourse Representation Structures (DRSs). We use a rather standard bi-LSTM sequence-to-sequence model, work with a linearized DRS input representation, and evaluate character-level and word-level decoders. We obtain very encouraging results in terms of reference-based automatic metrics such as BLEU. But because such metrics only evaluate the surface level of generated output, we develop a new metric, ROSE, that targets specific semantic phenomena. We do this with five DRS generation challenge sets focusing on tense, grammatical number, polarity, named entities and quantities. The aim of these challenge sets is to assess the neural generator’s systematicity and generalization to unseen inputs.
Reasoning & Querying – State of the Art
Various query languages for Web and Semantic Web data, both for practical use and as an area of research in the scientific community, have emerged in recent years. At the same time, the broad adoption of the internet where keyword search is used in many applications, e.g. search engines, has familiarized casual users with using keyword queries to retrieve information on the internet. Unlike this easy-to-use querying, traditional query languages require knowledge of the language itself as well as of the data to be queried. Keyword-based query languages for XML and RDF bridge the gap between the two, aiming at enabling simple querying of semi-structured data, which is relevant e.g. in the context of the emerging Semantic Web. This article presents an overview of the field of keyword querying for XML and RDF.