421 research outputs found

    Nonlinear quantile mixed models

    In regression applications, the presence of nonlinearity and correlation among observations poses computational challenges not only in traditional settings such as least squares regression, but also (and especially) when the objective function is non-smooth, as in quantile regression. In this paper, we develop methods for the modeling and estimation of nonlinear conditional quantile functions when data are clustered within two-level nested designs. This work extends the linear quantile mixed models of Geraci and Bottai (2014, Statistics and Computing). We develop a novel algorithm that blends a smoothing algorithm for quantile regression with a second-order Laplacian approximation for nonlinear mixed models. To assess the proposed methods, we present a simulation study and two applications, one in pharmacokinetics and one related to growth curve modeling in agriculture. Comment: 26 pages, 8 figures, 8 tables
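As a reminder of why the quantile objective is non-smooth, the following minimal sketch implements the standard check (pinball) loss and uses it to fit a constant. It illustrates the textbook definition only and is not the smoothing algorithm of the paper; the function names `check_loss` and `best_constant` are illustrative assumptions.

```python
def check_loss(residual, tau):
    """Check (pinball) loss for one residual at quantile level tau.

    The loss is piecewise linear with a kink at zero, which is the
    source of the non-smoothness mentioned in the abstract.
    """
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant(values, tau):
    """Return the sample value that minimizes the total check loss."""
    return min(values, key=lambda c: sum(check_loss(v - c, tau) for v in values))


sample = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant(sample, 0.5)  # the outlier at 100 does not pull the fit
```

With `tau = 0.5` the minimizer is the sample median, which is why the single large value in `sample` has no influence on the fit.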

    A family of linear mixed-effects models using the generalized Laplace distribution

    We propose a new family of linear mixed-effects models based on the generalized Laplace distribution. Special cases include the classical normal mixed-effects model, models with Laplace random effects and errors, and models where Laplace and normal variates interchange their roles as random effects and errors. By using a scale-mixture representation of the generalized Laplace, we develop a maximum likelihood estimation approach based on Gaussian quadrature. For model selection, we propose likelihood ratio testing and we account for the situation in which the null hypothesis is at the boundary of the parameter space. In a simulation study, we investigate the finite sample properties of our proposed estimator and compare its performance to other flexible linear mixed-effects specifications. In two real data examples, we demonstrate the flexibility of our proposed model to solve applied problems commonly encountered in clustered data analysis. The methods proposed in this paper are implemented in the R package nlmm.

    Quantile contours and allometric modelling for risk classification of abnormal ratios with an application to asymmetric growth-restriction in preterm infants

    We develop an approach to risk classification based on quantile contours and allometric modelling of multivariate anthropometric measurements. We propose the definition of allometric direction tangent to the directional quantile envelope, which divides ratios of measurements into half-spaces. This in turn provides an operational definition of directional quantile that can be used as a cutoff for risk assessment. We show the application of the proposed approach using a large dataset from the Vermont Oxford Network containing observations of birthweight (BW) and head circumference (HC) for more than 150,000 preterm infants. Our analysis suggests that disproportionately growth-restricted infants with a larger HC-to-BW ratio are at increased mortality risk compared with proportionately growth-restricted infants. The role of maternal hypertension is also investigated. Comment: 31 pages, 3 figures, 8 tables

    Dynamic User-Defined Similarity Searching in Semi-Structured Text Retrieval

    Modern text retrieval systems often provide a similarity search utility that allows the user to efficiently find a fixed number k of documents in the data set that are most similar to a given query (here a query is either a simple sequence of keywords or the identifier of a full document found in previous searches that is considered of interest). We consider the case of a textual database made of semi-structured documents. For example, in a corpus of bibliographic records any record may be structured into three fields: title, authors and abstract, where each field is an unstructured free text. Each field, in turn, is modelled with a specific vector space. The problem is more complex when we also allow each such vector space to have an associated user-defined dynamic weight that influences its contribution to the overall dynamic aggregated and weighted similarity. This dynamic problem has been tackled in a recent paper by Singitham et al. in VLDB 2004. Their proposed solution, which we take as baseline, is a variant of the cluster-pruning technique that has the potential for scaling to very large corpora of documents, and is far more efficient than the naive exhaustive search. We devise an alternative way of embedding weights in the data structure, coupled with a non-trivial application of a clustering algorithm based on the furthest point first heuristic for the metric k-center problem. The validity of our approach is demonstrated experimentally by showing significant performance improvements over the scheme proposed in VLDB 2004. We significantly improve the tradeoffs between query time and output quality with respect to the baseline method in VLDB 2004, and also with respect to a novel method by Chierichetti et al., to appear in ACM PODS 2007. We also speed up the pre-processing time by a factor of at least thirty.
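The weighted, per-field similarity described in this abstract can be sketched as follows. This is the naive exhaustive search that the abstract mentions as the slow reference point, not the cluster-pruning method of either paper. Cosine similarity as the per-field measure, the toy vectors, and the names `weighted_similarity` and `top_k` are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def weighted_similarity(query, doc, weights):
    """Aggregate per-field similarities using user-defined field weights."""
    return sum(w * cosine(query[field], doc[field]) for field, w in weights.items())


def top_k(query, docs, weights, k):
    """Exhaustively rank all documents and return the names of the k best."""
    ranked = sorted(
        docs,
        key=lambda name: weighted_similarity(query, docs[name], weights),
        reverse=True,
    )
    return ranked[:k]


# Toy records with one small vector per field (hypothetical data).
docs = {
    "a": {"title": [1, 0], "authors": [1, 0], "abstract": [1, 0]},
    "b": {"title": [0, 1], "authors": [0, 1], "abstract": [0, 1]},
}
query = {"title": [1, 0], "authors": [0, 1], "abstract": [1, 1]}

title_heavy = {"title": 0.8, "authors": 0.1, "abstract": 0.1}
authors_heavy = {"title": 0.1, "authors": 0.8, "abstract": 0.1}
best_by_title = top_k(query, docs, title_heavy, 1)
best_by_authors = top_k(query, docs, authors_heavy, 1)
```

Changing only the weights changes which record ranks first, which is the "dynamic" aspect that makes precomputed index structures harder to build.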

    Cluster Generation and Cluster Labelling for Web Snippets: A Fast and Accurate Hierarchical Solution

    This paper describes Armil, a meta-search engine that groups into disjoint labelled clusters the Web snippets returned by auxiliary search engines. The cluster labels generated by Armil provide the user with a compact guide to assessing the relevance of each cluster to her information need. Striking the right balance between running time and cluster well-formedness was a key point in the design of our system. Both the clustering and the labelling tasks are performed on the fly by processing only the snippets provided by the auxiliary search engines, and use no external sources of knowledge. Clustering is performed by means of a fast version of the furthest-point-first algorithm for metric k-center clustering. Cluster labelling is achieved by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure. We have tested the clustering effectiveness of Armil against Vivisimo, the de facto industrial standard in Web snippet clustering, using as benchmark a comprehensive set of snippets obtained from the Open Directory Project hierarchy. According to two widely accepted "external" metrics of clustering quality, Armil achieves better performance levels by 10%. We also report the results of a thorough user evaluation of both the clustering and the cluster labelling algorithms. On a standard 1GHz machine, Armil performs clustering and labelling altogether in less than one second.
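The furthest-point-first heuristic for metric k-center named in this abstract can be sketched in its basic greedy form as below. This is not the "fast version" used by Armil, whose details are not given here. Euclidean distance, starting from the first point, and the function names are assumptions for illustration.

```python
import math


def furthest_point_first(points, k):
    """Greedy k-center: return the indices of the chosen centers."""
    centers = [0]  # assumption: start from the first point (any point works)
    # dist[i] is the distance from point i to its nearest chosen center.
    dist = [math.dist(p, points[0]) for p in points]
    while len(centers) < min(k, len(points)):
        # The next center is the point furthest from all current centers.
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return centers


def assign(points, centers):
    """Label every point with the index of its nearest center."""
    return [min(centers, key=lambda c: math.dist(p, points[c])) for p in points]


points = [(0, 0), (1, 0), (10, 0), (11, 0), (5, 8)]
centers = furthest_point_first(points, 3)
labels = assign(points, centers)
```

Each round scans all points once, so the basic form costs on the order of n·k distance computations, which is one reason the heuristic is attractive when clustering must happen at query time.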