33 research outputs found

    Sampling distributions and estimation for multi-type Branching Processes

    Consider a multi-dimensional supercritical branching process with offspring distribution in a parametric family. Here, each vector coordinate corresponds to the number of offspring of a given type. The process is observed under family-size sampling: a random sample is drawn, with each individual reporting its vector of brood sizes. In this work, we show that the event in which no siblings are sampled (so that the sample can be considered independent) has probability converging to one under certain conditions on the sample size. Furthermore, we show that the sampling distribution of the observed sizes converges to a product of identical distributions, hence developing a framework in which the sample can be treated as i.i.d. and the usual methods for parameter estimation apply. We provide asymptotic distributions for the resulting estimators.

    Regularized Ordinal Regression and the ordinalNet R Package

    Regularization techniques such as the lasso (Tibshirani 1996) and elastic net (Zou and Hastie 2005) can be used to improve regression model coefficient estimation and prediction accuracy, as well as to perform variable selection. Ordinal regression models are widely used in applications where the use of regularization could be beneficial; however, these models are not included in many popular software packages for regularized regression. We propose a coordinate descent algorithm to fit a broad class of ordinal regression models with an elastic net penalty. Furthermore, we demonstrate that each model in this class generalizes to a more flexible form, for instance to accommodate unordered categorical data. We introduce an elastic net penalty class that applies to both model forms. Additionally, this penalty can be used to shrink a non-ordinal model toward its ordinal counterpart. Finally, we introduce the R package ordinalNet, which implements the algorithm for this model class.

    Regularized Ordinal Regression and the ordinalNet R Package

    Regularization techniques such as the lasso (Tibshirani 1996) and elastic net (Zou and Hastie 2005) can be used to improve regression model coefficient estimation and prediction accuracy, as well as to perform variable selection. Ordinal regression models are widely used in applications where the use of regularization could be beneficial; however, these models are not included in many popular software packages for regularized regression. We propose a coordinate descent algorithm to fit a broad class of ordinal regression models with an elastic net penalty. Furthermore, we demonstrate that each model in this class generalizes to a more flexible form that can be used to model either ordered or unordered categorical response data. We call this the elementwise link multinomial-ordinal class; it includes widely used models such as multinomial logistic regression (which also has an ordinal form) and ordinal logistic regression (which also has an unordered multinomial form). We introduce an elastic net penalty class that applies to either model form. Additionally, this penalty can be used to shrink a non-ordinal model toward its ordinal counterpart. Finally, we introduce the R package ordinalNet, which implements the algorithm for this model class.

    A motif-independent metric for DNA sequence specificity

    Background: Genome-wide mapping of protein-DNA interactions has been widely used to investigate biological functions of the genome. An important question is to what extent such interactions are regulated at the DNA sequence level. However, current investigation is hampered by the lack of computational methods for systematically evaluating sequence specificity. Results: We present a simple, unbiased quantitative measure for DNA sequence specificity called the Motif Independent Measure (MIM). By analyzing both simulated and real experimental data, we found that the MIM measure can be used to detect sequence specificity independent of the presence of transcription factor (TF) binding motifs. We also found that the level of specificity associated with H3K4me1 target sequences is highly cell-type specific and highest in embryonic stem (ES) cells. We predicted H3K4me1 target sequences by using the N-score model and found that the prediction accuracy is indeed high in ES cells. The software to compute the MIM is freely available at: https://github.com/lucapinello/mim. Conclusions: Our method provides a unified framework for quantifying DNA sequence specificity and serves as a guide for the development of sequence-based prediction models.

    Contributions To Ancestral Inference For Supercritical Branching Processes And High-Dimensional Data Analysis

    This thesis is concerned with statistical methods that are relevant in the scientific study of gene expression data. It is customary in these areas to use microarray technology as a first step in identifying the genes that are differentially expressed, followed by quantitative polymerase chain reaction (qPCR) as a confirmatory tool. The first part of the thesis addresses statistical analysis for qPCR data, while the second part addresses the so-called large p, small n problem, using microarray gene expression data as the motivating example. Description of the gene expression profiles from PCR can be cast within the more general framework of ancestral inference for branching processes. Accordingly, part one of the thesis is devoted to the study of branching processes initiated by a random number of ancestors. We address issues concerning modeling, inference, and asymptotic justification of the proposed methodologies. The second part of the thesis focuses on microarray data, specifically developing multivariate techniques for identifying differentially expressed genes. The results can be viewed in the more general context of multiple hypothesis testing or the multivariate testing problem.

    The West Point BattleBot Competition

    Three cadet teams at the United States Military Academy each design, budget, build, and test a middleweight, non-stomping BattleBot according to the rules of the national competition [1]. In 2003 we emphasized two aspects of this multidisciplinary, hands-on project: the importance of the final competition and project planning as a military operation. We observed three significant results of this change: 1) increased competitiveness and learning; 2) successful introduction of the Military Decision Making Process (MDMP); and 3) valuable lessons in leadership and teamwork.

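    Is Competition Suitable for Improving the Services Offered by Public Institutions? Project Report [Ist Wettbewerb geeignet, das Leistungsangebot oeffentlicher Einrichtungen zu verbessern?: Projektbericht]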

    Available from Bibliothek des Instituts fuer Weltwirtschaft, ZBW, Duesternbrook Weg 120, D-24105 Kiel C 148156 / FIZ - Fachinformationszentrum Karlsruhe / TIB - Technische Informationsbibliothek / SIGLE / DE / German

    Event Detection with Topic Modeling of Tweets

    WARF Discovery Challenge poster from April 2016. We present preliminary results indicating that latent Dirichlet allocation models of tweets, fitted at specific time points, can be used to identify social and political events.