
    Finite-time fluctuations in the degree statistics of growing networks

    This paper presents a comprehensive analysis of the degree statistics in models for growing networks where new nodes enter one at a time and attach to one earlier node according to a stochastic rule. The models with uniform attachment, linear attachment (the Barabási-Albert model), and generalized preferential attachment with initial attractiveness are successively considered. The main emphasis is on finite-size (i.e., finite-time) effects, which are shown to exhibit different behaviors in three regimes of the size-degree plane: stationary, finite-size scaling, large deviations. Comment: 33 pages, 7 figures, 1 table
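The three attachment rules named in this abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes the network starts from two nodes joined by one edge, and that under generalized preferential attachment a node of degree k is chosen with probability proportional to k + c, where c is the initial attractiveness (c = 0 giving linear attachment). The names `grow_network` and `degree_histogram` are hypothetical.

```python
import random
from collections import Counter


def grow_network(n_nodes, attractiveness=0.0, uniform=False, seed=None):
    """Grow a tree one node at a time and return the list of node degrees.

    Assumptions (not from the paper): the seed graph is two nodes joined by
    one edge; a node of degree k is picked with weight k + attractiveness,
    or with equal weight when uniform=True.
    """
    if n_nodes < 2:
        raise ValueError("n_nodes must be at least 2")
    if not uniform and attractiveness <= -1.0:
        raise ValueError("attractiveness must exceed -1 so all weights are positive")
    rng = random.Random(seed)
    degrees = [1, 1]  # seed graph: a single edge
    for _ in range(2, n_nodes):
        if uniform:
            target = rng.randrange(len(degrees))
        else:
            weights = [k + attractiveness for k in degrees]
            target = rng.choices(range(len(degrees)), weights=weights, k=1)[0]
        degrees[target] += 1  # the chosen earlier node gains one link
        degrees.append(1)     # the new node enters with degree 1
    return degrees


def degree_histogram(degrees):
    """Map each degree value to the number of nodes having it."""
    return dict(sorted(Counter(degrees).items()))


if __name__ == "__main__":
    for label, kwargs in [
        ("uniform", {"uniform": True}),
        ("linear", {"attractiveness": 0.0}),
        ("attractiveness c=2", {"attractiveness": 2.0}),
    ]:
        print(label, degree_histogram(grow_network(500, seed=0, **kwargs)))
```

Recomputing the weights at every step costs O(n) per node, which is acceptable for small illustrative runs; studying finite-size effects at scale would need a faster sampling scheme.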

    Residual Weighted Learning for Estimating Individualized Treatment Rules

    Personalized medicine has received increasing attention among statisticians, computer scientists, and clinical practitioners. A major component of personalized medicine is the estimation of individualized treatment rules (ITRs). Recently, Zhao et al. (2012) proposed outcome weighted learning (OWL) to construct ITRs that directly optimize the clinical outcome. Although OWL opens the door to introducing machine learning techniques to optimal treatment regimes, it still has some problems in performance. In this article, we propose a general framework, called Residual Weighted Learning (RWL), to improve finite sample performance. Unlike OWL, which weights misclassification errors by clinical outcomes, RWL weights these errors by residuals of the outcome from a regression fit on clinical covariates excluding treatment assignment. We utilize the smoothed ramp loss function in RWL, and provide a difference of convex (d.c.) algorithm to solve the corresponding non-convex optimization problem. By estimating residuals with linear models or generalized linear models, RWL can effectively deal with different types of outcomes, such as continuous, binary and count outcomes. We also propose variable selection methods for linear and nonlinear rules, respectively, to further improve the performance. We show that the resulting estimator of the treatment rule is consistent. We further obtain a rate of convergence for the difference between the expected outcome using the estimated ITR and that of the optimal treatment rule. The performance of the proposed RWL methods is illustrated in simulation studies and in an analysis of cystic fibrosis clinical trial data. Comment: 48 pages, 3 figures
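The contrast between outcome weights and residual weights can be made concrete with a toy calculation. This is only a sketch under simplifying assumptions that are not in the abstract: a single covariate, an ordinary least-squares fit for the regression that excludes treatment, and known treatment probabilities by which each weight is divided. It covers the weighting step only, not the smoothed ramp loss or the d.c. algorithm. The function names are hypothetical.

```python
def ols_fit(x, y):
    """Least-squares intercept and slope for a single covariate."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def outcome_weights(outcome, propensity):
    """OWL-style weights: raw outcome divided by treatment probability."""
    return [r / p for r, p in zip(outcome, propensity)]


def residual_weights(covariate, outcome, propensity):
    """RWL-style weights: residual from a treatment-free fit, over propensity.

    Assumption: the regression is a one-covariate linear model.
    """
    intercept, slope = ols_fit(covariate, outcome)
    residuals = [r - (intercept + slope * x) for x, r in zip(covariate, outcome)]
    return [res / p for res, p in zip(residuals, propensity)]


if __name__ == "__main__":
    covariate = [0.0, 1.0, 2.0, 3.0, 4.0]
    outcome = [10.2, 11.9, 14.3, 15.8, 18.1]   # made-up illustrative values
    propensity = [0.5] * 5                      # randomized 1:1 trial
    print(outcome_weights(outcome, propensity))
    print(residual_weights(covariate, outcome, propensity))
```

Shifting every outcome by a constant changes the outcome weights but leaves the residual weights unchanged, which is one way to see why centering on a regression fit may stabilize the weighting.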

    A Generalized Labeled Multi-Bernoulli filter for maneuvering targets

    A multiple maneuvering target system can be viewed as a Jump Markov System (JMS) in the sense that the target movement can be modeled using different motion models, where the transition between motion models for a particular target follows a Markov chain probability rule. This paper describes a Generalized Labeled Multi-Bernoulli (GLMB) filter for tracking maneuvering targets whose movement can be modeled via such a JMS. The proposed filter is validated with two linear and nonlinear maneuvering target tracking examples.
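The jump Markov idea in this abstract, a target switching between motion models according to a Markov chain, can be sketched for a single target as follows. This illustrates only the underlying motion model, not the GLMB filter. The two motion models (constant velocity and a fixed-rate turn), the noise-free dynamics, and all names and parameter values are assumptions made for the example.

```python
import math
import random


def step_constant_velocity(state, dt):
    """Advance (x, y, vx, vy) along a straight line."""
    x, y, vx, vy = state
    return (x + vx * dt, y + vy * dt, vx, vy)


def step_turn(state, dt, turn_rate=0.2):
    """Rotate the velocity by turn_rate * dt radians, then advance."""
    x, y, vx, vy = state
    angle = turn_rate * dt
    c, s = math.cos(angle), math.sin(angle)
    vx_new, vy_new = c * vx - s * vy, s * vx + c * vy
    return (x + vx_new * dt, y + vy_new * dt, vx_new, vy_new)


# Hypothetical model set: index 0 = straight flight, index 1 = turn.
MOTION_MODELS = [step_constant_velocity, step_turn]


def simulate_jms(n_steps, transition, initial_state=(0.0, 0.0, 1.0, 0.0),
                 dt=1.0, seed=None):
    """Simulate one target; transition[i][j] = P(next mode j | current mode i)."""
    rng = random.Random(seed)
    mode, state = 0, initial_state
    path = []
    for _ in range(n_steps):
        # Draw the next mode from the row of the current mode.
        mode = rng.choices(range(len(transition)), weights=transition[mode], k=1)[0]
        state = MOTION_MODELS[mode](state, dt)
        path.append((mode, state))
    return path


if __name__ == "__main__":
    transition = [[0.9, 0.1],
                  [0.3, 0.7]]
    for mode, (x, y, vx, vy) in simulate_jms(10, transition, seed=0):
        print(mode, round(x, 3), round(y, 3))
```

A tracking filter for such a system has to estimate the mode sequence jointly with the kinematic state, since the mode is not observed directly.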

    Some Aspects on Data Modelling

    Statistical methods are motivated by the desire to learn from data. Transaction datasets and time-ordered data sequences are common in many research areas, such as finance, bioinformatics, and text mining. This dissertation studies two problems for these two types of data: association rule mining from transaction data and structural change estimation in time-ordered sequences. Informative association rule mining is fundamental for knowledge discovery from transaction data, for which brute-force search algorithms, e.g., the well-known Apriori algorithm, were developed. However, these algorithms become computationally intractable when searching a large rule space. A stochastic search framework is developed to tackle this challenge by imposing a probability distribution on the association rule space and using the idea of annealing Gibbs sampling. A large rule space of exponential order can still be randomly searched by this algorithm to generate a Markov chain of viable length. This chain contains the most informative rules with probability one. The stochastic search algorithm is flexible enough to incorporate any measure of interest. Moreover, it reduces computational complexity and memory requirements. A time-ordered data sequence may contain sudden changes at some time points, before and after which the data follow different distributions or statistical models. Change point problems in generalized linear models and in distributions of independent random variables are studied in turn. First, to estimate multiple change points in generalized linear models, we convert the task into a model selection problem. Modern model selection techniques are then applied to estimate the regression coefficients. A consistent estimator of the number of change points is developed, and an algorithm is provided to estimate the change points. Second, to estimate a single change point in distributions of independent random variables, a change point estimator is proposed based on empirical characteristic functions. Its consistency is also established.

    GAP Safe screening rules for sparse multi-task and multi-class models

    High dimensional regression benefits from sparsity promoting regularizations. Screening rules leverage the known sparsity of the solution by ignoring some variables in the optimization, hence speeding up solvers. When the procedure is proven not to discard features wrongly, the rules are said to be \emph{safe}. In this paper we derive new safe rules for generalized linear models regularized with $\ell_1$ and $\ell_1/\ell_2$ norms. The rules are based on duality gap computations and spherical safe regions whose diameters converge to zero. This allows more variables to be safely discarded, in particular for low regularization parameters. The GAP Safe rule can cope with any iterative solver and we illustrate its performance on coordinate descent for multi-task Lasso, binary and multinomial logistic regression, demonstrating significant speed ups on all tested datasets with respect to previous safe rules. Comment: in Proceedings of the 29-th Conference on Neural Information Processing Systems (NIPS), 201
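The duality-gap idea can be sketched for the simplest case, the single-task Lasso with objective (1/2)||y - Xb||^2 + lam * ||b||_1, which is an assumption made here for brevity; the abstract itself addresses multi-task and multi-class models. Given any current iterate b, the sketch rescales the residual into a dual feasible point theta, computes the duality gap, and discards feature j when |x_j . theta| + radius * ||x_j|| < 1, with radius = sqrt(2 * gap) / lam. The function name `gap_safe_screen` and the column-list data layout are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def gap_safe_screen(columns, y, beta, lam):
    """Return (discard_flags, gap) for the Lasso at the iterate beta.

    columns[j] is the j-th feature column of the design matrix.
    discard_flags[j] is True when feature j can be safely dropped.
    """
    n = len(y)
    fitted = [sum(col[i] * b for col, b in zip(columns, beta)) for i in range(n)]
    residual = [yi - fi for yi, fi in zip(y, fitted)]

    # Primal objective at beta.
    primal = 0.5 * dot(residual, residual) + lam * sum(abs(b) for b in beta)

    # Rescale the residual so that |x_j . theta| <= 1 for every feature.
    scale = max(lam, max(abs(dot(col, residual)) for col in columns))
    theta = [r / scale for r in residual]

    # Dual objective at theta, and the resulting duality gap.
    diff = [t - yi / lam for t, yi in zip(theta, y)]
    dual = 0.5 * dot(y, y) - 0.5 * lam ** 2 * dot(diff, diff)
    gap = max(primal - dual, 0.0)

    # Sphere test: the safe region is centred at theta with this radius.
    radius = math.sqrt(2.0 * gap) / lam
    flags = [abs(dot(col, theta)) + radius * math.sqrt(dot(col, col)) < 1.0
             for col in columns]
    return flags, gap


if __name__ == "__main__":
    columns = [[1.0, 0.0], [0.0, 1.0]]   # orthonormal toy design
    y = [3.0, 0.1]
    print(gap_safe_screen(columns, y, beta=[0.0, 0.0], lam=1.0))
    print(gap_safe_screen(columns, y, beta=[2.0, 0.0], lam=1.0))
```

Because the radius shrinks with the gap, the test discards more features as the solver converges, so it can be re-run every few iterations of any iterative solver.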

    Fisher information under Gaussian quadrature models

    This paper develops formulae to compute the Fisher information matrix for the regression parameters of generalized linear models with Gaussian random effects. The Fisher information matrix relies on the estimation of the response variance under the model assumptions. We propose two approaches to estimate the response variance: the first is based on an analytic formula (or a Taylor expansion for cases where we cannot obtain the closed form), and the second is an empirical approximation using the model estimates via the expectation–maximization process. Further, simulations under several response distributions and a real data application involving a factorial experiment are presented and discussed. In terms of standard errors and coverage probabilities for model parameters, the proposed methods turn out to behave more reliably than does the ‘disparity rule’ or direct extraction of results from the generalized linear model fitted in the last expectation–maximization iteration.