A Comparative Study of the Application of Different Learning Techniques to Natural Language Interfaces
In this paper we present first results from a comparative study whose aim is to test the feasibility of different inductive learning techniques for the automatic acquisition of linguistic knowledge within a natural language database interface. In our interface architecture, the machine learning module replaces an elaborate semantic analysis component. The learning module learns the correct mapping of a user's input to the corresponding database command based on a collection of past input data. We use an existing interface to a production planning and control system as our evaluation environment and compare the results achieved by different instance-based and model-based learning algorithms.

Comment: 10 pages, to appear CoNLL9
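The abstract gives no implementation details, so the following is only a minimal sketch of the instance-based flavour of this idea: classify a new utterance by its most similar past utterance and reuse that utterance's database command. All function names and the toy data are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Crude bag-of-words representation of a user utterance."""
    return Counter(text.lower().split())

def overlap_similarity(a: Counter, b: Counter) -> float:
    """Jaccard-style overlap between two bags of words."""
    shared = sum((a & b).values())
    total = sum((a | b).values())
    return shared / total if total else 0.0

def nearest_command(utterance: str, past: list[tuple[str, str]]) -> str:
    """Instance-based classification: return the database command paired
    with the most similar previously seen utterance."""
    query = bag_of_words(utterance)
    _, best_command = max(
        past, key=lambda pair: overlap_similarity(query, bag_of_words(pair[0]))
    )
    return best_command

# Hypothetical past (utterance, command) pairs for a production planning system.
history = [
    ("list all open orders for customer 42",
     "SELECT * FROM orders WHERE customer = 42 AND status = 'open'"),
    ("show current stock of part 7",
     "SELECT stock FROM parts WHERE part_id = 7"),
]
print(nearest_command("show the stock of part 7", history))
```

A model-based learner would instead fit an explicit model (e.g. a decision tree) over the same utterance features rather than storing and comparing the raw instances.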
CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks
Data quality affects machine learning (ML) model performance, and data scientists spend a considerable amount of time on data cleaning before model training. However, to date there has been no rigorous study of how exactly cleaning affects ML: the ML community usually focuses on developing algorithms that are robust to particular noise types of certain distributions, while the database (DB) community has mostly studied the problem of data cleaning alone, without considering how data is consumed by downstream ML analytics. We propose the CleanML study, which systematically investigates the impact of data cleaning on ML classification tasks. The open-source and extensible CleanML study currently includes 14 real-world datasets with real errors, five common error types, seven different ML models, and multiple cleaning algorithms for each error type (including both algorithms commonly used in practice and state-of-the-art solutions from the academic literature). We control the randomness in ML experiments using statistical hypothesis testing, and we control the false discovery rate using the Benjamini-Yekutieli (BY) procedure. We analyze the results in a systematic way to derive many interesting and nontrivial observations, and we put forward multiple research directions for researchers.

Comment: published in ICDE 202
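Since the abstract leans on the Benjamini-Yekutieli procedure for false discovery rate control, here is a minimal self-contained sketch of that procedure; the function name and the example p-values are illustrative, not from the CleanML code base.

```python
def benjamini_yekutieli(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Reject/accept decision per hypothesis, controlling the false
    discovery rate at level alpha under arbitrary dependence."""
    m = len(p_values)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # BY harmonic correction term
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * alpha / (m * c(m)).
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * c_m):
            k_max = rank
    # Reject exactly the hypotheses with the k_max smallest p-values.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        reject[idx] = rank <= k_max
    return reject

# Toy example: only the two smallest p-values survive the BY threshold.
print(benjamini_yekutieli([0.001, 0.008, 0.04, 0.2, 0.9]))
# -> [True, True, False, False, False]
```

Compared to Benjamini-Hochberg, the extra factor $c(m) = \sum_{i=1}^{m} 1/i$ makes the threshold more conservative but keeps it valid without any independence assumptions across the experiments.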
Near-Optimal Induced Universal Graphs for Bounded Degree Graphs
A graph $G$ is an induced universal graph for a family $\mathcal{F}$ of graphs if every graph in $\mathcal{F}$ is a vertex-induced subgraph of $G$. For the family of all undirected graphs on $n$ vertices, Alstrup, Kaplan, Thorup, and Zwick [STOC 2015] give an induced universal graph with $O(2^{n/2})$ vertices, matching a lower bound by Moon [Proc. Glasgow Math. Assoc. 1965].

Let $k = \lceil D/2 \rceil$. Improving asymptotically on previous results by Butler [Graphs and Combinatorics 2009] and Esperet, Arnaud and Ochem [IPL 2008], we give an induced universal graph with $O\!\left(\frac{k 2^k}{k!} n^k\right)$ vertices for the family of graphs with $n$ vertices of maximum degree $D$. For constant $D$, Butler gives a lower bound of $\Omega(n^{D/2})$. For an odd constant $D \geq 3$, Esperet et al. and Alon and Capalbo [SODA 2008] give a graph with $O(n^{k - 1/D})$ vertices. Using their techniques for any (including constant) even values of $D$ gives asymptotically worse bounds than we present.

For large $D$, i.e. when $D = \Omega(\log^3 n)$, the previous best upper bound was $\binom{n}{\lceil D/2 \rceil} n^{O(1)}$ due to Adjiashvili and Rotbart [ICALP 2014]. We give upper and lower bounds showing that the size is $\binom{\lfloor n/2 \rfloor}{\lfloor D/2 \rfloor} 2^{\pm \tilde{O}(\sqrt{D})}$. Hence the optimal size is $\binom{\lfloor n/2 \rfloor}{\lfloor D/2 \rfloor}$ up to a factor of $2^{\tilde{O}(\sqrt{D})}$, and our construction is within a factor of $2^{\tilde{O}(\sqrt{D})}$ from this. The previous results were larger by at least a factor of $2^{\Omega(D)}$.

As a part of the above, proving a conjecture by Esperet et al., we construct an induced universal graph with $2n - 1$ vertices for the family of graphs with max degree $2$. In addition, we give results for acyclic graphs with max degree $2$ and cycle graphs. Our results imply the first labeling schemes that for any $D$ are at most $\tilde{O}(\sqrt{D})$ bits from optimal …
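The even-degree claim can be made concrete with a short worked comparison; this is my own illustration under the bounds as reconstructed above, not a computation from the paper.

```latex
% For an even constant D, take k = D/2. The leading factor k 2^k / k!
% is then a constant, so the construction above has
%   O( (k 2^k / k!) n^k ) = O(n^{D/2})
% vertices, matching Butler's Omega(n^{D/2}) lower bound up to a constant.
\[
  k = \tfrac{D}{2}
  \;\Longrightarrow\;
  O\!\left(\frac{k\,2^{k}}{k!}\, n^{k}\right)
  = O\!\left(n^{D/2}\right),
  \qquad\text{vs.\ the lower bound } \Omega\!\left(n^{D/2}\right).
\]
```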