Cycles and eigenvalues of sequentially growing random regular graphs
Consider the sum of d i.i.d. random permutation matrices on n labels
along with their transposes. The resulting matrix is the adjacency matrix of a
random regular (multi)-graph of degree 2d on n vertices. It is known that
the distribution of smooth linear eigenvalue statistics of this matrix is given
asymptotically by sums of Poisson random variables. This is in contrast with
Gaussian fluctuation of similar quantities in the case of Wigner matrices. It
is also known that for Wigner matrices the joint fluctuation of linear
eigenvalue statistics across minors of growing sizes can be expressed in terms
of the Gaussian Free Field (GFF). In this article, we explore joint asymptotic
(in n) fluctuation for a coupling of all random regular graphs of various
degrees obtained by growing each component permutation according to the Chinese
Restaurant Process. Our primary result is that the corresponding eigenvalue
statistics can be expressed in terms of a family of independent Yule processes
with immigration. These processes track the evolution of short cycles in the
graph. If we now take d to infinity, certain GFF-like properties emerge.
Comment: Published in the Annals of Probability (http://www.imstat.org/aop/) at http://dx.doi.org/10.1214/13-AOP864 by the Institute of Mathematical Statistics (http://www.imstat.org).
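The two ingredients of the coupling above, growing each permutation one label at a time via the Chinese Restaurant Process and assembling the 2d-regular multigraph as a sum of permutation matrices plus transposes, can be sketched in simulation. This is an illustrative sketch, not the authors' code; the function names are my own.

```python
import numpy as np

rng = np.random.default_rng(0)

def grow_permutation(sigma):
    """Insert a new element into a permutation via the Chinese Restaurant
    Process: with probability 1/(n+1) it starts a new cycle (fixed point),
    otherwise it is spliced after a uniformly chosen existing element.
    Each step keeps the permutation uniformly distributed."""
    n = len(sigma)
    j = rng.integers(n + 1)
    if j == n:                        # new table: the new element is a fixed point
        sigma = sigma + [n]
    else:                             # splice new element n into j's cycle
        sigma = sigma + [sigma[j]]
        sigma[j] = n
    return sigma

def adjacency(perms, n):
    """Adjacency matrix of the 2d-regular multigraph: the sum of the d
    permutation matrices and their transposes."""
    A = np.zeros((n, n), dtype=int)
    for sigma in perms:
        for i, j in enumerate(sigma):
            A[i, j] += 1
            A[j, i] += 1
    return A

# grow d = 3 independent permutations up to n = 100 labels
d, n = 3, 100
perms = [[0] for _ in range(d)]
for _ in range(n - 1):
    perms = [grow_permutation(s) for s in perms]

A = adjacency(perms, n)   # every row sums to 2d = 6
```

Because every insertion step preserves uniformity, the coupling gives, for each n, the same permutation model the abstract describes, while letting one track how short cycles evolve as n grows.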
Towards a machine-learning architecture for lexical functional grammar parsing
Data-driven grammar induction aims at producing wide-coverage grammars of human languages. Initial efforts in this field produced relatively shallow linguistic representations such as phrase-structure trees, which only encode constituent structure. Recent work on inducing deep grammars from treebanks addresses this shortcoming by also
recovering non-local dependencies and grammatical relations. My aim is to investigate the issues arising when adapting an existing Lexical Functional Grammar (LFG) induction method to a new language and treebank, and find solutions which will generalize robustly across multiple languages.
The research hypothesis is that by exploiting machine-learning algorithms to learn morphological features, lemmatization classes and grammatical functions from treebanks we can reduce the amount of manual specification and improve robustness, accuracy and domain- and language-independence for LFG parsing systems. Function labels can often be mapped relatively straightforwardly to LFG grammatical functions. Learning them reliably permits grammar induction to depend less on language-specific LFG annotation rules. I therefore propose ways to improve acquisition of function labels from treebanks and translate those improvements into better-quality f-structure parsing.
In a lexicalized grammatical formalism such as LFG a large amount of syntactically relevant information comes from lexical entries. It is, therefore, important to be able
to perform morphological analysis in an accurate and robust way for morphologically rich languages. I propose a fully data-driven supervised method to simultaneously
lemmatize and morphologically analyze text and obtain competitive or improved results on a range of typologically diverse languages.
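One common data-driven formulation of supervised lemmatization (a generic sketch, not necessarily the author's exact method) reduces it to multi-class classification over suffix edit scripts extracted from (form, lemma) pairs in a treebank:

```python
def edit_script(form, lemma):
    """Derive a suffix-replacement class from a (form, lemma) pair:
    longest common prefix, then '(strip k chars, append suffix)'."""
    k = 0
    while k < min(len(form), len(lemma)) and form[k] == lemma[k]:
        k += 1
    return (len(form) - k, lemma[k:])

def apply_script(form, script):
    """Apply a learned edit script to an unseen word form."""
    strip, suffix = script
    return (form[:-strip] if strip else form) + suffix

# the same class generalizes across words sharing an inflection pattern
s = edit_script("studies", "study")        # (3, 'y')
lemma = apply_script("flies", s)           # 'fly'
```

A classifier then predicts the script from features of the word form and its context, so lemmatization needs no hand-written morphological rules, only treebank supervision.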
Eigenvalue fluctuations for random regular graphs
One of the major themes of random matrix theory is that many asymptotic
properties of traditionally studied distributions of random matrices are
universal. We probe the edges of universality by studying the spectral
properties of random regular graphs. Specifically, we prove limit theorems for
the fluctuations of linear spectral statistics of random regular graphs. We
find both universal and non-universal behavior. Our most important tool is
Stein's method for Poisson approximation, which we develop for use on random
regular graphs.
This is my Ph.D. thesis, based on joint work with Ioana Dumitriu, Elliot
Paquette, and Soumik Pal. For the most part, it's a mashed up version of
arXiv:1109.4094, arXiv:1112.0704, and arXiv:1203.1113, but some things in here
are improved or new. In particular, Chapter 4 goes into more detail on some of
the proofs than arXiv:1203.1113 and includes a new section. See Section 1.3 for
more discussion on what's new and who contributed to what.
Comment: 103 pages; Ph.D. thesis at the University of Washington, 201
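A linear spectral statistic is simply the sum of a test function over the eigenvalues, tr f(A). The following sketch (illustrative only; the permutation model and function names are my assumptions) computes one for a random regular multigraph and checks the identity tr(A^3) = sum of cubed eigenvalues, which counts closed walks of length 3:

```python
import numpy as np

rng = np.random.default_rng(1)

def permutation_model_adjacency(n, d):
    """2d-regular multigraph: d uniform permutation matrices plus
    their transposes (a simulation sketch of the permutation model)."""
    A = np.zeros((n, n))
    for _ in range(d):
        sigma = rng.permutation(n)
        A[np.arange(n), sigma] += 1
        A[sigma, np.arange(n)] += 1
    return A

def linear_statistic(A, f):
    """Linear spectral statistic sum_i f(lambda_i) = tr f(A)."""
    eigs = np.linalg.eigvalsh(A)
    return sum(f(lam) for lam in eigs)

A = permutation_model_adjacency(200, 2)
# tr(A^3) counts closed walks of length 3 (6 per triangle, plus
# contributions from loops and multi-edges in the multigraph)
walks3 = linear_statistic(A, lambda x: x**3)
```

Fluctuations of such statistics for polynomial f are governed by short cycle counts, which is where the Poisson (rather than Gaussian) limit behavior enters.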
A hybrid algorithm for Bayesian network structure learning with application to multi-label learning
We present a novel hybrid algorithm for Bayesian network structure learning,
called H2PC. It first reconstructs the skeleton of a Bayesian network and then
performs a Bayesian-scoring greedy hill-climbing search to orient the edges.
The algorithm is based on divide-and-conquer constraint-based subroutines to
learn the local structure around a target variable. We conduct two series of
experimental comparisons of H2PC against Max-Min Hill-Climbing (MMHC), which is
currently the most powerful state-of-the-art algorithm for Bayesian network
structure learning. First, we use eight well-known Bayesian network benchmarks
with various data sizes to assess the quality of the learned structure returned
by the algorithms. Our extensive experiments show that H2PC outperforms MMHC in
terms of goodness of fit to new data and quality of the network structure with
respect to the true dependence structure of the data. Second, we investigate
H2PC's ability to solve the multi-label learning problem. We provide
theoretical results to characterize and identify graphically the so-called
minimal label powersets that appear as irreducible factors in the joint
distribution under the faithfulness condition. The multi-label learning problem
is then decomposed into a series of multi-class classification problems, where
each multi-class variable encodes a label powerset. H2PC is shown to compare
favorably to MMHC in terms of global classification accuracy over ten
multi-label data sets covering different application domains. Overall, our
experiments support the conclusions that local structural learning with H2PC in
the form of local neighborhood induction is a theoretically well-motivated and
empirically effective learning framework that is well suited to multi-label
learning. The source code (in R) of H2PC as well as all data sets used for the
empirical tests are publicly available.
Comment: arXiv admin note: text overlap with arXiv:1101.5184 by other authors.
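The label powerset encoding described above can be sketched as follows (a generic illustration of the encoding, not the H2PC source code): each distinct combination of labels becomes one class of a multi-class problem.

```python
def to_powerset_classes(Y):
    """Map each row of a binary label matrix to one multi-class label:
    one class per distinct label combination (label powerset)."""
    classes = {}
    y = []
    for row in Y:
        key = tuple(row)
        if key not in classes:
            classes[key] = len(classes)
        y.append(classes[key])
    return y, classes

def from_powerset_class(c, classes):
    """Invert the encoding: recover the binary label vector."""
    inv = {v: k for k, v in classes.items()}
    return list(inv[c])

Y = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [0, 0, 0]]
y, classes = to_powerset_classes(Y)   # y == [0, 1, 0, 2]
```

H2PC's theoretical contribution is to identify the *minimal* label powersets that factor the joint distribution, so the problem decomposes into several small multi-class tasks rather than one global powerset as in this single-encoding sketch.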
Exploiting side information in Bayesian nonparametric models and their applications
My research exploits side information in advanced Bayesian nonparametric models. We have developed novel models for data clustering and medical data analysis, and have made our methods scalable to large-scale data. I have published this research in several journal and conference papers.
Acoustic Modelling for Under-Resourced Languages
Automatic speech recognition systems have so far been developed only for very few languages out of the 4,000-7,000 existing ones.
In this thesis we examine methods to rapidly create acoustic models for new, possibly under-resourced languages in a time- and cost-effective manner. For this we examine the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages.
A Survey on Semantic Processing Techniques
Semantic processing is a fundamental research domain in computational
linguistics. In the era of powerful pre-trained language models and large
language models, the advancement of research in this domain appears to be
decelerating. However, the study of semantics is multi-dimensional in
linguistics. The research depth and breadth of computational semantic
processing can be largely improved with new technologies. In this survey, we
analyzed five semantic processing tasks, i.e., word sense disambiguation,
anaphora resolution, named entity recognition, concept extraction, and
subjectivity detection. We study relevant theoretical research in these fields,
advanced methods, and downstream applications. We connect the surveyed tasks
with downstream applications because this may inspire future scholars to fuse
these low-level semantic processing tasks with high-level natural language
processing tasks. The review of theoretical research may also inspire new tasks
and technologies in the semantic processing domain. Finally, we compare the
different semantic processing techniques and summarize their technical trends,
application trends, and future directions.
Comment: Published in Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal contribution mark is missing in the published version due to the publication policies. Please contact Prof. Erik Cambria for details.