808 research outputs found

    An Alternative Conception of Tree-Adjoining Derivation

    Get PDF
    The precise formulation of derivation for tree-adjoining grammars has important ramifications for a wide variety of uses of the formalism, from syntactic analysis to semantic interpretation and statistical language modeling. We argue that the definition of tree-adjoining derivation must be reformulated in order to manifest the proper linguistic dependencies in derivations. The particular proposal is both precisely characterizable through a definition of TAG derivations as equivalence classes of ordered derivation trees, and computationally operational, by virtue of a compilation to linear indexed grammars together with an efficient algorithm for recognition and parsing according to the compiled grammar.Comment: 33 page

    Some Novel Applications of Explanation-Based Learning to Parsing Lexicalized Tree-Adjoining Grammars

    Get PDF
    In this paper we present some novel applications of Explanation-Based Learning (EBL) technique to parsing Lexicalized Tree-Adjoining grammars. The novel aspects are (a) immediate generalization of parses in the training set, (b) generalization over recursive structures and (c) representation of generalized parses as Finite State Transducers. A highly impoverished parser called a ``stapler'' has also been introduced. We present experimental results using EBL for different corpora and architectures to show the effectiveness of our approach.Comment: uuencoded postscript fil

    Probabilistic parsing

    Get PDF
    Postprin

    Computational identification and analysis of noncoding RNAs - Unearthing the buried treasures in the genome

    Get PDF
    The central dogma of molecular biology states that the genetic information flows from DNA to RNA to protein. This dogma has exerted a substantial influence on our understanding of the genetic activities in the cells. Under this influence, the prevailing assumption until the recent past was that genes are basically repositories for protein coding information, and proteins are responsible for most of the important biological functions in all cells. In the meanwhile, the importance of RNAs has remained rather obscure, and RNA was mainly viewed as a passive intermediary that bridges the gap between DNA and protein. Except for classic examples such as tRNAs (transfer RNAs) and rRNAs (ribosomal RNAs), functional noncoding RNAs were considered to be rare. However, this view has experienced a dramatic change during the last decade, as systematic screening of various genomes identified myriads of noncoding RNAs (ncRNAs), which are RNA molecules that function without being translated into proteins [11], [40]. It has been realized that many ncRNAs play important roles in various biological processes. As RNAs can interact with other RNAs and DNAs in a sequence-specific manner, they are especially useful in tasks that require highly specific nucleotide recognition [11]. Good examples are the miRNAs (microRNAs) that regulate gene expression by targeting mRNAs (messenger RNAs) [4], [20], and the siRNAs (small interfering RNAs) that take part in the RNAi (RNA interference) pathways for gene silencing [29], [30]. Recent developments show that ncRNAs are extensively involved in many gene regulatory mechanisms [14], [17]. The roles of ncRNAs known to this day are truly diverse. These include transcription and translation control, chromosome replication, RNA processing and modification, and protein degradation and translocation [40], just to name a few. These days, it is even claimed that ncRNAs dominate the genomic output of the higher organisms such as mammals, and it is being suggested that the greater portion of their genome (which does not encode proteins) is dedicated to the control and regulation of cell development [27]. As more and more evidence piles up, greater attention is paid to ncRNAs, which have been neglected for a long time. Researchers began to realize that the vast majority of the genome that was regarded as “junk,” mainly because it was not well understood, may indeed hold the key for the best kept secrets in life, such as the mechanism of alternative splicing, the control of epigenetic variations and so forth [27]. The complete range and extent of the role of ncRNAs are not so obvious at this point, but it is certain that a comprehensive understanding of cellular processes is not possible without understanding the functions of ncRNAs [47]

    Grammar-based Representation and Identification of Dynamical Systems

    Get PDF
    In this paper we propose a novel approach to identify dynamical systems. The method estimates the model structure and the parameters of the model simultaneously, automating the critical decisions involved in identification such as model structure and complexity selection. In order to solve the combined model structure and model parameter estimation problem, a new representation of dynamical systems is proposed. The proposed representation is based on Tree Adjoining Grammar, a formalism that was developed from linguistic considerations. Using the proposed representation, the identification problem can be interpreted as a multi-objective optimization problem and we propose a Evolutionary Algorithm-based approach to solve the problem. A benchmark example is used to demonstrate the proposed approach. The results were found to be comparable to that obtained by state-of-the-art non-linear system identification methods, without making use of knowledge of the system description.Comment: Submitted to European Control Conference (ECC) 201

    Structural Alignment of RNAs Using Profile-csHMMs and Its Application to RNA Homology Search: Overview and New Results

    Get PDF
    Systematic research on noncoding RNAs (ncRNAs) has revealed that many ncRNAs are actively involved in various biological networks. Therefore, in order to fully understand the mechanisms of these networks, it is crucial to understand the roles of ncRNAs. Unfortunately, the annotation of ncRNA genes that give rise to functional RNA molecules has begun only recently, and it is far from being complete. Considering the huge amount of genome sequence data, we need efficient computational methods for finding ncRNA genes. One effective way of finding ncRNA genes is to look for regions that are similar to known ncRNA genes. As many ncRNAs have well-conserved secondary structures, we need statistical models that can represent such structures for this purpose. In this paper, we propose a new method for representing RNA sequence profiles and finding structural alignment of RNAs based on profile context-sensitive hidden Markov models (profile-csHMMs). Unlike existing models, the proposed approach can handle any kind of RNA secondary structures, including pseudoknots. We show that profile-csHMMs can provide an effective framework for the computational analysis of RNAs and the identification of ncRNA genes

    Probabilistic Constraint Logic Programming

    Full text link
    This paper addresses two central problems for probabilistic processing models: parameter estimation from incomplete data and efficient retrieval of most probable analyses. These questions have been answered satisfactorily only for probabilistic regular and context-free models. We address these problems for a more expressive probabilistic constraint logic programming model. We present a log-linear probability model for probabilistic constraint logic programming. On top of this model we define an algorithm to estimate the parameters and to select the properties of log-linear models from incomplete data. This algorithm is an extension of the improved iterative scaling algorithm of Della-Pietra, Della-Pietra, and Lafferty (1995). Our algorithm applies to log-linear models in general and is accompanied with suitable approximation methods when applied to large data spaces. Furthermore, we present an approach for searching for most probable analyses of the probabilistic constraint logic programming model. This method can be applied to the ambiguity resolution problem in natural language processing applications.Comment: 35 pages, uses sfbart.cl

    D-Tree Grammars

    Full text link
    DTG are designed to share some of the advantages of TAG while overcoming some of its limitations. DTG involve two composition operations called subsertion and sister-adjunction. The most distinctive feature of DTG is that, unlike TAG, there is complete uniformity in the way that the two DTG operations relate lexical items: subsertion always corresponds to complementation and sister-adjunction to modification. Furthermore, DTG, unlike TAG, can provide a uniform analysis for em wh-movement in English and Kashmiri, despite the fact that the em wh element in Kashmiri appears in sentence-second position, and not sentence-initial position as in English.Comment: Latex source, needs aclap.sty, 8 pages, to appear in ACL-9

    Data-Oriented Language Processing. An Overview

    Full text link
    During the last few years, a new approach to language processing has started to emerge, which has become known under various labels such as "data-oriented parsing", "corpus-based interpretation", and "tree-bank grammar" (cf. van den Berg et al. 1994; Bod 1992-96; Bod et al. 1996a/b; Bonnema 1996; Charniak 1996a/b; Goodman 1996; Kaplan 1996; Rajman 1995a/b; Scha 1990-92; Sekine & Grishman 1995; Sima'an et al. 1994; Sima'an 1995-96; Tugwell 1995). This approach, which we will call "data-oriented processing" or "DOP", embodies the assumption that human language perception and production works with representations of concrete past language experiences, rather than with abstract linguistic rules. The models that instantiate this approach therefore maintain large corpora of linguistic representations of previously occurring utterances. When processing a new input utterance, analyses of this utterance are constructed by combining fragments from the corpus; the occurrence-frequencies of the fragments are used to estimate which analysis is the most probable one. In this paper we give an in-depth discussion of a data-oriented processing model which employs a corpus of labelled phrase-structure trees. Then we review some other models that instantiate the DOP approach. Many of these models also employ labelled phrase-structure trees, but use different criteria for extracting fragments from the corpus or employ different disambiguation strategies (Bod 1996b; Charniak 1996a/b; Goodman 1996; Rajman 1995a/b; Sekine & Grishman 1995; Sima'an 1995-96); other models use richer formalisms for their corpus annotations (van den Berg et al. 1994; Bod et al., 1996a/b; Bonnema 1996; Kaplan 1996; Tugwell 1995).Comment: 34 pages, Postscrip
    corecore