Sampling distributions and estimation for multi-type Branching Processes
Consider a multi-dimensional supercritical branching process with offspring
distribution in a parametric family. Here, each vector coordinate corresponds
to the number of offspring of a given type. The process is observed under
family-size sampling: a random sample is drawn, each individual reporting its
vector of brood sizes. In this work, we show that the set in which no siblings
are sampled (so that the sample can be considered independent) has probability
converging to one under certain conditions on the sample size. Furthermore,
we show that the sampling distribution of the observed sizes converges to the
product of identical distributions, hence developing a framework in which the
observations can be treated as i.i.d. and the usual methods for parameter
estimation apply. We provide asymptotic distributions for the resulting estimators.
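The no-siblings claim can be illustrated with a small simulation. The following is a minimal sketch, not the authors' construction: it assumes a single-type Galton-Watson process with Poisson offspring (the abstract treats the multi-type parametric case), and the function names, the offspring mean of 2.0, and the fixed sample size are all hypothetical choices made for illustration.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def next_generation(rng, n_parents, mean_offspring):
    """Return, for each child in the next generation, the index of its parent."""
    parent_of = []
    for parent in range(n_parents):
        parent_of.extend([parent] * poisson(rng, mean_offspring))
    return parent_of


def no_sibling_probability(generations, sample_size,
                           mean_offspring=2.0, trials=200, seed=0):
    """Estimate P(no two sampled individuals share a parent).

    Runs in which the final generation is smaller than the sample
    (including extinct runs) are skipped, so the estimate is
    conditional on the generation being large enough to sample from.
    """
    rng = random.Random(seed)
    hits = used = 0
    for _ in range(trials):
        size, parent_of = 1, []
        for _ in range(generations):
            parent_of = next_generation(rng, size, mean_offspring)
            size = len(parent_of)
            if size == 0:
                break
        if size < sample_size:
            continue
        used += 1
        # Sample individuals without replacement and look at their parents.
        sampled_parents = rng.sample(parent_of, sample_size)
        if len(set(sampled_parents)) == sample_size:
            hits += 1
    return hits / used if used else float("nan")


if __name__ == "__main__":
    for g in (4, 8, 12):
        print(g, no_sibling_probability(generations=g, sample_size=5))
```

With the sample size held fixed, the estimated probability should rise toward one as the sampled generation grows, which is the qualitative behaviour the abstract describes; the precise growth conditions on the sample size are the subject of the paper and are not reproduced here.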
Regularized Ordinal Regression and the ordinalNet R Package
Regularization techniques such as the lasso (Tibshirani 1996) and elastic net
(Zou and Hastie 2005) can be used to improve regression model coefficient
estimation and prediction accuracy, as well as to perform variable selection.
Ordinal regression models are widely used in applications where the use of
regularization could be beneficial; however, these models are not included in
many popular software packages for regularized regression. We propose a
coordinate descent algorithm to fit a broad class of ordinal regression models
with an elastic net penalty. Furthermore, we demonstrate that each model in
this class generalizes to a more flexible form, for instance to accommodate
unordered categorical data. We introduce an elastic net penalty class that
applies to both model forms. Additionally, this penalty can be used to shrink a
non-ordinal model toward its ordinal counterpart. Finally, we introduce the R
package ordinalNet, which implements the algorithm for this model class.
Regularized Ordinal Regression and the ordinalNet R Package
Regularization techniques such as the lasso (Tibshirani 1996) and elastic net (Zou and Hastie 2005) can be used to improve regression model coefficient estimation and prediction accuracy, as well as to perform variable selection. Ordinal regression models are widely used in applications where the use of regularization could be beneficial; however, these models are not included in many popular software packages for regularized regression. We propose a coordinate descent algorithm to fit a broad class of ordinal regression models with an elastic net penalty. Furthermore, we demonstrate that each model in this class generalizes to a more flexible form that can be used to model either ordered or unordered categorical response data. We call this the elementwise link multinomial-ordinal class, and it includes widely used models such as multinomial logistic regression (which also has an ordinal form) and ordinal logistic regression (which also has an unordered multinomial form). We introduce an elastic net penalty class that applies to either model form; additionally, this penalty can be used to shrink a non-ordinal model toward its ordinal counterpart. Finally, we introduce the R package ordinalNet, which implements the algorithm for this model class.
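To make the phrase "coordinate descent with an elastic net penalty" concrete, here is a minimal sketch for the much simpler case of linear least squares. It is not the ordinalNet algorithm and does not handle an ordinal likelihood; the function names, the objective scaling (1/(2n)) * RSS + lam * (alpha * L1 + (1 - alpha)/2 * L2^2), and the parameter names `lam` and `alpha` are assumptions made for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_cd(X, y, lam, alpha, sweeps=200, tol=1e-10):
    """Cyclic coordinate descent for elastic-net-penalized least squares.

    X is a list of rows, y a list of responses. No intercept is fitted
    and the columns are not standardized (both simplifications).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        largest_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the partial residual that
            # excludes column j's current contribution.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            # The L1 part soft-thresholds; the L2 part inflates the denominator.
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new
            largest_change = max(largest_change, abs(delta))
        if largest_change < tol:
            break
    return beta


if __name__ == "__main__":
    X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
    y = [2.0, 2.0, -2.0, -2.0]  # depends on the first column only
    print(elastic_net_cd(X, y, lam=0.5, alpha=1.0))
```

Setting `alpha=1.0` gives a pure lasso penalty, which can set coefficients exactly to zero, while smaller values of `alpha` mix in ridge-style shrinkage. This is the sense in which the penalty both shrinks coefficients and performs variable selection.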
A motif-independent metric for DNA sequence specificity
Background: Genome-wide mapping of protein-DNA interactions has been widely used to investigate biological functions of the genome. An important question is to what extent such interactions are regulated at the DNA sequence level. However, current investigation is hampered by the lack of computational methods for systematically evaluating sequence specificity. Results: We present a simple, unbiased quantitative measure for DNA sequence specificity called the Motif Independent Measure (MIM). By analyzing both simulated and real experimental data, we found that the MIM can be used to detect sequence specificity independent of the presence of transcription factor (TF) binding motifs. We also found that the level of specificity associated with H3K4me1 target sequences is highly cell-type specific and highest in embryonic stem (ES) cells. We predicted H3K4me1 target sequences by using the N-score model and found that the prediction accuracy is indeed high in ES cells. The software to compute the MIM is freely available at: https://github.com/lucapinello/mim. Conclusions: Our method provides a unified framework for quantifying DNA sequence specificity and serves as a guide for development of sequence-based prediction models.
Contributions To Ancestral Inference For Supercritical Branching Processes And High-Dimensional Data Analysis
This thesis is concerned with statistical methods that are relevant in the scientific study of gene expression data. It is customary in these areas to use microarray technology as a first step in identifying the genes that are differentially expressed, followed by quantitative polymerase chain reaction (qPCR) as a confirmatory tool. The first part of the thesis addresses statistical analysis for qPCR data, while the second part addresses the so-called large p, small n problem, using microarray gene expression data as the motivating example. Description of the gene expression profiles from PCR can be cast within the more general framework of ancestral inference for branching processes. Accordingly, part one of the thesis is devoted to the study of branching processes initiated by a random number of ancestors. We address issues concerning modeling, inference, and asymptotic justification of the proposed methodologies. The second part of the thesis focuses on microarray data, specifically developing multivariate techniques for identifying differentially expressed genes. The results can be viewed in the more general context of multiple hypothesis testing or the multivariate testing problem.
The West Point BattleBot Competition
Three cadet teams at the United States Military Academy each design, budget, build, and test a middleweight, non-stomping BattleBot according to the rules of the national competition.[1] In 2003 we emphasized two aspects of this multidisciplinary, hands-on project: the importance of the final competition, and project planning as a military operation. We observed three significant results of this change: 1) increased competitiveness and learning; 2) successful introduction of the Military Decision Making Process (MDMP); and 3) valuable lessons in leadership and teamwork.
Is Competition Suited to Improving the Services Offered by Public Institutions?: Project Report
Available from Bibliothek des Instituts fuer Weltwirtschaft, ZBW, Duesternbrook Weg 120, D-24105 Kiel C 148156 / FIZ - Fachinformationszentrum Karlsruhe / TIB - Technische Informationsbibliothek. SIGLE. DE. German.
Event Detection with Topic Modeling of Tweets
WARF Discovery Challenge poster from April 2016. We present preliminary results indicating that latent Dirichlet allocation models of tweets, fitted at specific time points, can be used to identify social and political events.