2,920 research outputs found
Program Synthesis using Natural Language
Interacting with computers is a ubiquitous activity for millions of people.
Repetitive or specialized tasks often require creation of small, often one-off,
programs. End-users struggle with learning and using the myriad of
domain-specific languages (DSLs) to effectively accomplish these tasks.
We present a general framework for constructing program synthesizers that
take natural language (NL) inputs and produce expressions in a target DSL. The
framework takes as input a DSL definition and training data consisting of
NL/DSL pairs. From these it constructs a synthesizer by learning optimal
weights and classifiers (using NLP features) that rank the outputs of a
keyword-programming based translation. We applied our framework to three
domains: repetitive text editing, an intelligent tutoring system, and flight
information queries. On 1200+ English descriptions, the respective synthesizers
rank the desired program as the top-1 and top-3 for 80% and 90% descriptions
respectively
Explain3D: Explaining Disagreements in Disjoint Datasets
Data plays an important role in applications, analytic processes, and many
aspects of human activity. As data grows in size and complexity, we are met
with an imperative need for tools that promote understanding and explanations
over data-related operations. Data management research on explanations has
focused on the assumption that data resides in a single dataset, under one
common schema. But the reality of today's data is that it is frequently
un-integrated, coming from different sources with different schemas. When
different datasets provide different answers to semantically similar questions,
understanding the reasons for the discrepancies is challenging and cannot be
handled by the existing single-dataset solutions.
In this paper, we propose Explain3D, a framework for explaining the
disagreements across disjoint datasets (3D). Explain3D focuses on identifying
the reasons for the differences in the results of two semantically similar
queries operating on two datasets with potentially different schemas. Our
framework leverages the queries to perform a semantic mapping across the
relevant parts of their provenance; discrepancies in this mapping point to
causes of the queries' differences. Exploiting the queries gives Explain3D an
edge over traditional schema matching and record linkage techniques, which are
query-agnostic. Our work makes the following contributions: (1) We formalize
the problem of deriving optimal explanations for the differences of the results
of semantically similar queries over disjoint datasets. (2) We design a 3-stage
framework for solving the optimal explanation problem. (3) We develop a
smart-partitioning optimizer that improves the efficiency of the framework by
orders of magnitude. (4)~We experiment with real-world and synthetic data to
demonstrate that Explain3D can derive precise explanations efficiently
Kolmogorov Complexity in perspective. Part II: Classification, Information Processing and Duality
We survey diverse approaches to the notion of information: from Shannon
entropy to Kolmogorov complexity. Two of the main applications of Kolmogorov
complexity are presented: randomness and classification. The survey is divided
in two parts published in a same volume. Part II is dedicated to the relation
between logic and information system, within the scope of Kolmogorov
algorithmic information theory. We present a recent application of Kolmogorov
complexity: classification using compression, an idea with provocative
implementation by authors such as Bennett, Vitanyi and Cilibrasi. This stresses
how Kolmogorov complexity, besides being a foundation to randomness, is also
related to classification. Another approach to classification is also
considered: the so-called "Google classification". It uses another original and
attractive idea which is connected to the classification using compression and
to Kolmogorov complexity from a conceptual point of view. We present and unify
these different approaches to classification in terms of Bottom-Up versus
Top-Down operational modes, of which we point the fundamental principles and
the underlying duality. We look at the way these two dual modes are used in
different approaches to information system, particularly the relational model
for database introduced by Codd in the 70's. This allows to point out diverse
forms of a fundamental duality. These operational modes are also reinterpreted
in the context of the comprehension schema of axiomatic set theory ZF. This
leads us to develop how Kolmogorov's complexity is linked to intensionality,
abstraction, classification and information system.Comment: 43 page
Application Software, Domain-Specific Languages, and Language Design Assistants
While application software does the real work, domain-specific languages
(DSLs) are tools to help produce it efficiently, and language design assistants
in turn are meta-tools to help produce DSLs quickly. DSLs are already in wide
use (HTML for web pages, Excel macros for spreadsheet applications, VHDL for
hardware design, ...), but many more will be needed for both new as well as
existing application domains. Language design assistants to help develop them
currently exist only in the basic form of language development systems. After a
quick look at domain-specific languages, and especially their relationship to
application libraries, we survey existing language development systems and give
an outline of future language design assistants.Comment: To be presented at SSGRR 2000, L'Aquila, Ital
Naturalizing a Programming Language via Interactive Learning
Our goal is to create a convenient natural language interface for performing
well-specified but complex actions such as analyzing data, manipulating text,
and querying databases. However, existing natural language interfaces for such
tasks are quite primitive compared to the power one wields with a programming
language. To bridge this gap, we start with a core programming language and
allow users to "naturalize" the core language incrementally by defining
alternative, more natural syntax and increasingly complex concepts in terms of
compositions of simpler ones. In a voxel world, we show that a community of
users can simultaneously teach a common system a diverse language and use it to
build hundreds of complex voxel structures. Over the course of three days,
these users went from using only the core language to using the naturalized
language in 85.9\% of the last 10K utterances.Comment: 10 pages, ACL201
- …