38,689 research outputs found
Robust Inference of Trees
This paper is concerned with the reliable inference of optimal
tree-approximations to the dependency structure of an unknown distribution
generating data. The traditional approach to the problem measures the
dependency strength between random variables by the index called mutual
information. In this paper reliability is achieved by Walley's imprecise
Dirichlet model, which generalizes Bayesian learning with Dirichlet priors.
Adopting the imprecise Dirichlet model results in posterior interval
expectation for mutual information, and in a set of plausible trees consistent
with the data. Reliable inference about the actual tree is achieved by focusing
on the substructure common to all the plausible trees. We develop an exact
algorithm that infers the substructure in time O(m^4), m being the number of
random variables. The new algorithm is applied to a set of data sampled from a
known distribution. The method is shown to reliably infer edges of the actual
tree even when the data are very scarce, unlike the traditional approach.
Finally, we provide lower and upper credibility limits for mutual information
under the imprecise Dirichlet model. These enable the previous developments to
be extended to a full inferential method for trees.Comment: 26 pages, 7 figure
Automatic Bayesian Density Analysis
Making sense of a dataset in an automatic and unsupervised fashion is a
challenging problem in statistics and AI. Classical approaches for {exploratory
data analysis} are usually not flexible enough to deal with the uncertainty
inherent to real-world data: they are often restricted to fixed latent
interaction models and homogeneous likelihoods; they are sensitive to missing,
corrupt and anomalous data; moreover, their expressiveness generally comes at
the price of intractable inference. As a result, supervision from statisticians
is usually needed to find the right model for the data. However, since domain
experts are not necessarily also experts in statistics, we propose Automatic
Bayesian Density Analysis (ABDA) to make exploratory data analysis accessible
at large. Specifically, ABDA allows for automatic and efficient missing value
estimation, statistical data type and likelihood discovery, anomaly detection
and dependency structure mining, on top of providing accurate density
estimation. Extensive empirical evidence shows that ABDA is a suitable tool for
automatic exploratory analysis of mixed continuous and discrete tabular data.Comment: In proceedings of the Thirty-Third AAAI Conference on Artificial
Intelligence (AAAI-19
Discovery of Linguistic Relations Using Lexical Attraction
This work has been motivated by two long term goals: to understand how humans
learn language and to build programs that can understand language. Using a
representation that makes the relevant features explicit is a prerequisite for
successful learning and understanding. Therefore, I chose to represent
relations between individual words explicitly in my model. Lexical attraction
is defined as the likelihood of such relations. I introduce a new class of
probabilistic language models named lexical attraction models which can
represent long distance relations between words and I formalize this new class
of models using information theory.
Within the framework of lexical attraction, I developed an unsupervised
language acquisition program that learns to identify linguistic relations in a
given sentence. The only explicitly represented linguistic knowledge in the
program is lexical attraction. There is no initial grammar or lexicon built in
and the only input is raw text. Learning and processing are interdigitated. The
processor uses the regularities detected by the learner to impose structure on
the input. This structure enables the learner to detect higher level
regularities. Using this bootstrapping procedure, the program was trained on
100 million words of Associated Press material and was able to achieve 60%
precision and 50% recall in finding relations between content-words. Using
knowledge of lexical attraction, the program can identify the correct relations
in syntactically ambiguous sentences such as ``I saw the Statue of Liberty
flying over New York.''Comment: dissertation, 56 page
- …