443 research outputs found
Querying parse trees of stochastic context-free grammars
ABSTRACT Stochastic context-free grammars (SCFGs) have long been recognized as useful for a large variety of tasks including natural language processing, morphological parsing, speech recognition, information extraction, Web-page wrapping and even analysis of RNA. A string and an SCFG jointly represent a probabilistic interpretation of the meaning of the string, in the form of a (possibly infinite) probability space of parse trees. The problem of evaluating a query over this probability space is considered under the conventional semantics of querying a probabilistic database. For general SCFGs, extremely simple queries may have results that include irrational probabilities. But, for a large subclass of SCFGs (that includes all the standard studied subclasses of SCFGs) and the language of tree-pattern queries with projection (and child/descendant edges), it is shown that query results have rational probabilities with a polynomialsize bit representation and, more importantly, an efficient query-evaluation algorithm is presented
Search and Result Presentation in Scientific Workflow Repositories
We study the problem of searching a repository of complex hierarchical
workflows whose component modules, both composite and atomic, have been
annotated with keywords. Since keyword search does not use the graph structure
of a workflow, we develop a model of workflows using context-free bag grammars.
We then give efficient polynomial-time algorithms that, given a workflow and a
keyword query, determine whether some execution of the workflow matches the
query. Based on these algorithms we develop a search and ranking solution that
efficiently retrieves the top-k grammars from a repository. Finally, we propose
a novel result presentation method for grammars matching a keyword query, based
on representative parse-trees. The effectiveness of our approach is validated
through an extensive experimental evaluation
Semantics, Modelling, and the Problem of Representation of Meaning -- a Brief Survey of Recent Literature
Over the past 50 years many have debated what representation should be used
to capture the meaning of natural language utterances. Recently new needs of
such representations have been raised in research. Here I survey some of the
interesting representations suggested to answer for these new needs.Comment: 15 pages, no figure
Flexible and scalable digital library search
In this report the development of a specialised search engine for a digital library is described. The proposed system architecture consists of three levels: the conceptual, the logical and the physical level. The conceptual level schema enables by its exposure of a domain specific schema semantically rich conceptual search. The logical level provides a description language to achieve a high degree of flexibility for multimedia retrieval. The physical level takes care of scalable and efficient persistent data storage. The role, played by each level, changes during the various stages of a search engine's lifecycle: (1) modeling the index, (2) populating and maintaining the index and (3) querying the index. The integration of all this functionality allows the combination of both conceptual and content-based querying in the query stage. A search engine for the Australian Open tennis tournament website is used as a running example, which shows the power of the complete architecture and its various component
Search and Result Presentation in Scientific Workflow Repositories
We study the problem of searching a repository of complex hierarchical workflows whose component modules, both composite and atomic, have been annotated with keywords. Since keyword search does not use the graph structure of a workflow, we develop a model of workflows using context-free bag grammars. We then give efficient polynomial-time algorithms that, given a workflow and a keyword query, determine whether some execution of the workflow matches the query. Based on these algorithms we develop a search and ranking solution that efficiently retrieves the top-k grammars from a repository. Finally, we propose a novel result presentation method for grammars matching a keyword query, based on representative parse-trees. The effectiveness of ou
BOSS: Bayesian Optimization over String Spaces
This article develops a Bayesian optimization (BO) method which acts directly
over raw strings, proposing the first uses of string kernels and genetic
algorithms within BO loops. Recent applications of BO over strings have been
hindered by the need to map inputs into a smooth and unconstrained latent
space. Learning this projection is computationally and data-intensive. Our
approach instead builds a powerful Gaussian process surrogate model based on
string kernels, naturally supporting variable length inputs, and performs
efficient acquisition function maximization for spaces with syntactical
constraints. Experiments demonstrate considerably improved optimization over
existing approaches across a broad range of constraints, including the popular
setting where syntax is governed by a context-free grammar
Program Synthesis using Natural Language
Interacting with computers is a ubiquitous activity for millions of people.
Repetitive or specialized tasks often require creation of small, often one-off,
programs. End-users struggle with learning and using the myriad of
domain-specific languages (DSLs) to effectively accomplish these tasks.
We present a general framework for constructing program synthesizers that
take natural language (NL) inputs and produce expressions in a target DSL. The
framework takes as input a DSL definition and training data consisting of
NL/DSL pairs. From these it constructs a synthesizer by learning optimal
weights and classifiers (using NLP features) that rank the outputs of a
keyword-programming based translation. We applied our framework to three
domains: repetitive text editing, an intelligent tutoring system, and flight
information queries. On 1200+ English descriptions, the respective synthesizers
rank the desired program as the top-1 and top-3 for 80% and 90% descriptions
respectively
- …