26,424 research outputs found
Grammar-Based Random Walkers in Semantic Networks
Semantic networks qualify the meaning of an edge relating any two vertices.
Determining which vertices are most "central" in a semantic network is
difficult because one relationship type may be deemed subjectively more
important than another. For this reason, research into semantic network metrics
has focused primarily on context-based rankings (i.e. user prescribed
contexts). Moreover, many of the current semantic network metrics rank semantic
associations (i.e. directed paths between two vertices) and not the vertices
themselves. This article presents a framework for calculating semantically
meaningful primary eigenvector-based metrics such as eigenvector centrality and
PageRank in semantic networks using a modified version of the random walker
model of Markov chain analysis. Random walkers, in the context of this article,
are constrained by a grammar, where the grammar is a user defined data
structure that determines the meaning of the final vertex ranking. The ideas in
this article are presented within the context of the Resource Description
Framework (RDF) of the Semantic Web initiative.Comment: First draft of manuscript originally written in November 200
Controlled non uniform random generation of decomposable structures
Consider a class of decomposable combinatorial structures, using different
types of atoms \Atoms = \{\At_1,\ldots ,\At_{|{\Atoms}|}\}. We address the
random generation of such structures with respect to a size and a targeted
distribution in of its \emph{distinguished} atoms. We consider two
variations on this problem. In the first alternative, the targeted distribution
is given by real numbers \TargFreq_1, \ldots, \TargFreq_k such that 0 <
\TargFreq_i < 1 for all and \TargFreq_1+\cdots+\TargFreq_k \leq 1. We
aim to generate random structures among the whole set of structures of a given
size , in such a way that the {\em expected} frequency of any distinguished
atom \At_i equals \TargFreq_i. We address this problem by weighting the
atoms with a -tuple \Weights of real-valued weights, inducing a weighted
distribution over the set of structures of size . We first adapt the
classical recursive random generation scheme into an algorithm taking
\bigO{n^{1+o(1)}+mn\log{n}} arithmetic operations to draw structures from
the \Weights-weighted distribution. Secondly, we address the analytical
computation of weights such that the targeted frequencies are achieved
asymptotically, i. e. for large values of . We derive systems of functional
equations whose resolution gives an explicit relationship between \Weights
and \TargFreq_1, \ldots, \TargFreq_k. Lastly, we give an algorithm in
\bigO{k n^4} for the inverse problem, {\it i.e.} computing the frequencies
associated with a given -tuple \Weights of weights, and an optimized
version in \bigO{k n^2} in the case of context-free languages. This allows
for a heuristic resolution of the weights/frequencies relationship suitable for
complex specifications. In the second alternative, the targeted distribution is
given by a natural numbers such that
where is the number of undistinguished atoms.
The structures must be generated uniformly among the set of structures of size
that contain {\em exactly} atoms \At_i (). We give
a \bigO{r^2\prod_{i=1}^k n_i^2 +m n k \log n} algorithm for generating
structures, which simplifies into a \bigO{r\prod_{i=1}^k n_i +m n} for
regular specifications
Robust Grammatical Analysis for Spoken Dialogue Systems
We argue that grammatical analysis is a viable alternative to concept
spotting for processing spoken input in a practical spoken dialogue system. We
discuss the structure of the grammar, and a model for robust parsing which
combines linguistic sources of information and statistical sources of
information. We discuss test results suggesting that grammatical processing
allows fast and accurate processing of spoken input.Comment: Accepted for JNL
Using ontology in query answering systems: Scenarios, requirements and challenges
Equipped with the ultimate query answering system, computers would finally be in a position to address all our information needs in a natural way. In this paper, we describe how Language and Computing nv (L&C), a developer of ontology-based natural language understanding systems for the healthcare domain, is working towards the ultimate Question Answering (QA) System for healthcare workers. L&C’s company strategy in this area is to design in a step-by-step fashion the essential components of such a system, each component being designed to solve some one part of the total problem and at the same time reflect well-defined needs on the prat of our customers. We compare our strategy with the research roadmap proposed by the Question Answering Committee of the National Institute of Standards and Technology (NIST), paying special attention to the role of ontology
Knowledge-based intelligent error feedback in a Spanish ICALL system
This paper describes the Spanish ICALL system ESPADA
which helps language learners to improve their syntactical
knowledge. The most important parts of ESPADA for the learner are a Demonstration Module and an Analysis Module. The Demonstration Module provides animated presentation of selected grammatical information. The Analysis Module is able to parse ill-formed sentences and to give adequate feedback on 28 different error types from different levels of language use (syntax, semantics, agreement). It contains a robust chart-based island parser which uses a combination
of mal-rules and constraint relaxation to ensure that learner input can be analysed and appropriate error feedback can be generated
Practical experiments with regular approximation of context-free languages
Several methods are discussed that construct a finite automaton given a
context-free grammar, including both methods that lead to subsets and those
that lead to supersets of the original context-free language. Some of these
methods of regular approximation are new, and some others are presented here in
a more refined form with respect to existing literature. Practical experiments
with the different methods of regular approximation are performed for
spoken-language input: hypotheses from a speech recognizer are filtered through
a finite automaton.Comment: 28 pages. To appear in Computational Linguistics 26(1), March 200
A Multi-Relational Network to Support the Scholarly Communication Process
The general pupose of the scholarly communication process is to support the
creation and dissemination of ideas within the scientific community. At a finer
granularity, there exists multiple stages which, when confronted by a member of
the community, have different requirements and therefore different solutions.
In order to take a researcher's idea from an initial inspiration to a community
resource, the scholarly communication infrastructure may be required to 1)
provide a scientist initial seed ideas; 2) form a team of well suited
collaborators; 3) located the most appropriate venue to publish the formalized
idea; 4) determine the most appropriate peers to review the manuscript; and 5)
disseminate the end product to the most interested members of the community.
Through the various delinieations of this process, the requirements of each
stage are tied soley to the multi-functional resources of the community: its
researchers, its journals, and its manuscritps. It is within the collection of
these resources and their inherent relationships that the solutions to
scholarly communication are to be found. This paper describes an associative
network composed of multiple scholarly artifacts that can be used as a medium
for supporting the scholarly communication process.Comment: keywords: digital libraries and scholarly communicatio
Polyglot Semantic Parsing in APIs
Traditional approaches to semantic parsing (SP) work by training individual
models for each available parallel dataset of text-meaning pairs. In this
paper, we explore the idea of polyglot semantic translation, or learning
semantic parsing models that are trained on multiple datasets and natural
languages. In particular, we focus on translating text to code signature
representations using the software component datasets of Richardson and Kuhn
(2017a,b). The advantage of such models is that they can be used for parsing a
wide variety of input natural languages and output programming languages, or
mixed input languages, using a single unified model. To facilitate modeling of
this type, we develop a novel graph-based decoding framework that achieves
state-of-the-art performance on the above datasets, and apply this method to
two other benchmark SP tasks.Comment: accepted for NAACL-2018 (camera ready version
- …