    Cycles and eigenvalues of sequentially growing random regular graphs

    Consider the sum of d many i.i.d. random permutation matrices on n labels along with their transposes. The resulting matrix is the adjacency matrix of a random regular (multi)-graph of degree 2d on n vertices. It is known that the distribution of smooth linear eigenvalue statistics of this matrix is given asymptotically by sums of Poisson random variables. This is in contrast with Gaussian fluctuation of similar quantities in the case of Wigner matrices. It is also known that for Wigner matrices the joint fluctuation of linear eigenvalue statistics across minors of growing sizes can be expressed in terms of the Gaussian Free Field (GFF). In this article, we explore joint asymptotic (in n) fluctuation for a coupling of all random regular graphs of various degrees obtained by growing each component permutation according to the Chinese Restaurant Process. Our primary result is that the corresponding eigenvalue statistics can be expressed in terms of a family of independent Yule processes with immigration. These processes track the evolution of short cycles in the graph. If we now take d to infinity, certain GFF-like properties emerge. Comment: Published in at http://dx.doi.org/10.1214/13-AOP864 the Annals of Probability (http://www.imstat.org/aop/) by the Institute of Mathematical Statistics (http://www.imstat.org
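The construction in this abstract can be illustrated with a minimal sketch. It assumes one standard formulation of the Chinese Restaurant Process (the new label either becomes a fixed point or is inserted into an existing cycle, each of the n + 1 options being equally likely) and counts a self-loop as 2 in the adjacency matrix. The function names are hypothetical and are not taken from the paper.

```python
import random


def grow_permutation(perm, rng):
    """One Chinese Restaurant Process step (assumed formulation).

    perm is a list with perm[i] = image of i, a permutation of 0..n-1.
    Returns a permutation of 0..n that restricts to perm once the new
    label n is deleted from its cycle.
    """
    n = len(perm)
    j = rng.randrange(n + 1)  # n + 1 equally likely choices
    new = perm + [n]          # default: n is a new fixed point
    if j < n:
        # Insert n into the cycle of j, directly after j.
        new[n] = perm[j]
        new[j] = n
    return new


def random_permutation_crp(n, rng):
    """Grow a permutation of 0..n-1 one label at a time."""
    perm = []
    for _ in range(n):
        perm = grow_permutation(perm, rng)
    return perm


def adjacency_from_permutations(perms):
    """Sum of the permutation matrices and their transposes."""
    n = len(perms[0])
    adj = [[0] * n for _ in range(n)]
    for perm in perms:
        for i, j in enumerate(perm):
            adj[i][j] += 1  # entry of the permutation matrix
            adj[j][i] += 1  # entry of its transpose
    return adj


rng = random.Random(0)
d, n = 3, 6
perms = [random_permutation_crp(n, rng) for _ in range(d)]
adj = adjacency_from_permutations(perms)
```

Every row of `adj` sums to 2d, so the multigraph is 2d-regular. Calling `grow_permutation` on each component permutation extends the graph from n to n + 1 vertices, which is the kind of coupling across sizes that the abstract describes.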

    Towards a machine-learning architecture for lexical functional grammar parsing

    Data-driven grammar induction aims at producing wide-coverage grammars of human languages. Initial efforts in this field produced relatively shallow linguistic representations such as phrase-structure trees, which only encode constituent structure. Recent work on inducing deep grammars from treebanks addresses this shortcoming by also recovering non-local dependencies and grammatical relations. My aim is to investigate the issues arising when adapting an existing Lexical Functional Grammar (LFG) induction method to a new language and treebank, and to find solutions which will generalize robustly across multiple languages. The research hypothesis is that by exploiting machine-learning algorithms to learn morphological features, lemmatization classes and grammatical functions from treebanks we can reduce the amount of manual specification and improve robustness, accuracy and domain- and language-independence for LFG parsing systems. Function labels can often be mapped relatively straightforwardly to LFG grammatical functions. Learning them reliably permits grammar induction to depend less on language-specific LFG annotation rules. I therefore propose ways to improve acquisition of function labels from treebanks and translate those improvements into better-quality f-structure parsing. In a lexicalized grammatical formalism such as LFG, a large amount of syntactically relevant information comes from lexical entries. It is therefore important to be able to perform morphological analysis in an accurate and robust way for morphologically rich languages. I propose a fully data-driven supervised method to simultaneously lemmatize and morphologically analyze text, and obtain competitive or improved results on a range of typologically diverse languages.

    Eigenvalue fluctuations for random regular graphs

    One of the major themes of random matrix theory is that many asymptotic properties of traditionally studied distributions of random matrices are universal. We probe the edges of universality by studying the spectral properties of random regular graphs. Specifically, we prove limit theorems for the fluctuations of linear spectral statistics of random regular graphs. We find both universal and non-universal behavior. Our most important tool is Stein's method for Poisson approximation, which we develop for use on random regular graphs. This is my Ph.D. thesis, based on joint work with Ioana Dumitriu, Elliot Paquette, and Soumik Pal. For the most part, it's a mashed-up version of arXiv:1109.4094, arXiv:1112.0704, and arXiv:1203.1113, but some things in here are improved or new. In particular, Chapter 4 goes into more detail on some of the proofs than arXiv:1203.1113 and includes a new section. See Section 1.3 for more discussion on what's new and who contributed to what. Comment: 103 pages; Ph.D. thesis at the University of Washington, 201

    A hybrid algorithm for Bayesian network structure learning with application to multi-label learning

    We present a novel hybrid algorithm for Bayesian network structure learning, called H2PC. It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. The algorithm is based on divide-and-conquer constraint-based subroutines to learn the local structure around a target variable. We conduct two series of experimental comparisons of H2PC against Max-Min Hill-Climbing (MMHC), which is currently the most powerful state-of-the-art algorithm for Bayesian network structure learning. First, we use eight well-known Bayesian network benchmarks with various data sizes to assess the quality of the learned structure returned by the algorithms. Our extensive experiments show that H2PC outperforms MMHC in terms of goodness of fit to new data and quality of the network structure with respect to the true dependence structure of the data. Second, we investigate H2PC's ability to solve the multi-label learning problem. We provide theoretical results to characterize and identify graphically the so-called minimal label powersets that appear as irreducible factors in the joint distribution under the faithfulness condition. The multi-label learning problem is then decomposed into a series of multi-class classification problems, where each multi-class variable encodes a label powerset. H2PC is shown to compare favorably to MMHC in terms of global classification accuracy over ten multi-label data sets covering different application domains. Overall, our experiments support the conclusions that local structural learning with H2PC in the form of local neighborhood induction is a theoretically well-motivated and empirically effective learning framework that is well suited to multi-label learning. The source code (in R) of H2PC as well as all data sets used for the empirical tests are publicly available. Comment: arXiv admin note: text overlap with arXiv:1101.5184 by other author
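The decomposition step mentioned in this abstract, where each multi-class variable encodes a label powerset, might look roughly like the sketch below. It assumes the partition of labels into groups is already given. In the paper that partition is identified from the learned network structure, which is not reproduced here. The function name `encode_label_powersets` and the toy data are hypothetical.

```python
def encode_label_powersets(label_rows, groups):
    """Turn binary label vectors into one multi-class target per group.

    label_rows: list of 0/1 label vectors, one per example.
    groups: list of lists of label indices (an assumed, given partition).
    Returns (encoded, encoders), where encoded[k][g] is the class id of
    example k for group g, and encoders[g] maps each observed label
    combination in group g to its class id.
    """
    encoders = [{} for _ in groups]
    encoded = []
    for row in label_rows:
        classes = []
        for group, encoder in zip(groups, encoders):
            combo = tuple(row[i] for i in group)
            # Assign the next unused class id to an unseen combination.
            classes.append(encoder.setdefault(combo, len(encoder)))
        encoded.append(classes)
    return encoded, encoders


# Toy example: three labels, split into groups {0, 1} and {2}.
rows = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [1, 0, 1]]
encoded, encoders = encode_label_powersets(rows, [[0, 1], [2]])
```

Each column of `encoded` can then be handed to any multi-class classifier, and predicted class ids are mapped back to label combinations by inverting the corresponding encoder.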

    Exploiting side information in Bayesian nonparametric models and their applications

    My research exploits side information in advanced Bayesian nonparametric models. We have developed novel models for data clustering and medical data analysis, and have made our methods scalable to large-scale data. I have published this research in several journal and conference papers.

    Acoustic Modelling for Under-Resourced Languages

    Automatic speech recognition systems have so far been developed for only a few of the 4,000-7,000 existing languages. In this thesis we examine methods to rapidly create acoustic models in new, possibly under-resourced languages, in a time- and cost-effective manner. For this we examine the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages.

    A Survey on Semantic Processing Techniques

    Semantic processing is a fundamental research domain in computational linguistics. In the era of powerful pre-trained language models and large language models, the advancement of research in this domain appears to be decelerating. However, the study of semantics is multi-dimensional in linguistics. The research depth and breadth of computational semantic processing can be largely improved with new technologies. In this survey, we analyze five semantic processing tasks, namely word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We study relevant theoretical research in these fields, advanced methods, and downstream applications. We connect the surveyed tasks with downstream applications because this may inspire future scholars to fuse these low-level semantic processing tasks with high-level natural language processing tasks. The review of theoretical research may also inspire new tasks and technologies in the semantic processing domain. Finally, we compare the different semantic processing techniques and summarize their technical trends, application trends, and future directions. Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal contribution mark is missed in the published version due to the publication policies. Please contact Prof. Erik Cambria for detail