
    Delete or merge regressors for linear model selection

    We consider the problem of linear model selection in the presence of both continuous and categorical predictors. Feasible models consist of subsets of numerical variables and partitions of factor levels. We present a new algorithm, delete or merge regressors (DMR): a backward stepwise procedure that ranks the predictors by squared t-statistics and chooses the final model by minimizing BIC. We prove consistency of DMR when the number of predictors tends to infinity with the sample size, and describe a simulation study using the accompanying R package. The results indicate a significant advantage of our algorithm over Lasso-based methods from the literature in both running time and selection accuracy. Moreover, a version of DMR for generalized linear models is proposed.
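    The deletion step of the procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it handles only numerical predictors (factor-level merging and the R package's machinery are omitted) and assumes an ordinary least-squares fit.

```python
import numpy as np

def backward_bic(X, y):
    """Backward stepwise selection ranked by squared t-statistics,
    keeping the visited model that minimizes BIC (a sketch of the
    DMR deletion step; merging of factor levels is omitted)."""
    n, p = X.shape
    active = list(range(p))
    candidates = []
    while active:
        Xa = X[:, active]
        beta, *_ = np.linalg.lstsq(Xa, y, rcond=None)
        resid = y - Xa @ beta
        rss = resid @ resid
        sigma2 = rss / max(n - len(active), 1)
        cov = sigma2 * np.linalg.inv(Xa.T @ Xa)
        t2 = beta ** 2 / np.diag(cov)         # squared t-statistics
        bic = n * np.log(rss / n) + len(active) * np.log(n)
        candidates.append((bic, list(active)))
        active.pop(int(np.argmin(t2)))        # delete weakest predictor
    return min(candidates)[1]                 # model with smallest BIC
```

    On data with a few strong predictors and several pure-noise columns, the sketch recovers the true support; the full DMR algorithm additionally merges factor levels before the BIC comparison.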

    Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks

    Quantized Neural Networks (QNNs), which use low-bitwidth numbers for representing parameters and performing computations, have been proposed to reduce computational complexity, storage size, and memory usage. In QNNs, parameters and activations are uniformly quantized, so that multiplications and additions can be accelerated by bitwise operations. However, the distributions of parameters in neural networks are often imbalanced, so uniform quantization determined from extremal values may underutilize the available bitwidth. In this paper, we propose a novel quantization method that ensures a balanced distribution of quantized values. Our method first recursively partitions the parameters by percentiles into balanced bins, and then applies uniform quantization. We also introduce computationally cheaper approximations of percentiles to reduce the overhead this introduces. Overall, our method improves the prediction accuracy of QNNs without introducing extra computation during inference, has negligible impact on training speed, and is applicable to both Convolutional Neural Networks and Recurrent Neural Networks. Experiments on standard datasets including ImageNet and Penn Treebank confirm the effectiveness of our method. On ImageNet, the top-5 error rate of our 4-bit quantized GoogLeNet model is 12.7%, which is better than the state of the art for QNNs.
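    The percentile-based balancing idea can be sketched in a few lines. This is an illustration of the core transform only, under the assumption of a single (non-recursive) percentile partition; the paper's full method also covers recursive partitioning, cheap percentile approximations, and the training pipeline.

```python
import numpy as np

def balanced_quantize(w, bits=2):
    """Histogram-equalization-style quantization: partition values by
    percentiles into 2**bits equally populated bins, then map each bin
    onto a uniform grid of quantized levels (sketch of the balanced
    quantization idea, not the paper's full recursive procedure)."""
    k = 2 ** bits
    # Percentile boundaries give balanced (equal-count) bins.
    edges = np.percentile(w, np.linspace(0, 100, k + 1))
    bin_idx = np.clip(np.searchsorted(edges, w, side="right") - 1, 0, k - 1)
    # Uniform quantized levels spanning the original range.
    levels = np.linspace(w.min(), w.max(), k)
    return levels[bin_idx]
```

    Unlike plain uniform quantization from extremal values, every quantized level here represents roughly the same number of parameters, so no part of the bitwidth is wasted on sparsely populated ranges.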

    A Fast Minimal Infrequent Itemset Mining Algorithm

    A novel fast algorithm for finding quasi-identifiers in large datasets is presented. Performance measurements on a broad range of datasets demonstrate substantial reductions in run-time relative to the state of the art, and the scalability of the algorithm to realistically sized datasets of up to several million records.
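    To make the problem concrete: a minimal infrequent itemset is one whose support falls at or below a threshold while every proper subset remains frequent. The brute-force search below illustrates the definition only; it scans exponentially many candidates, which is exactly what the paper's fast algorithm avoids.

```python
from itertools import combinations

def minimal_infrequent(transactions, items, max_support):
    """Brute-force minimal infrequent itemset search (illustrative
    only). An itemset is minimal infrequent if its support is
    <= max_support while every proper subset is frequent."""
    def support(itemset):
        s = set(itemset)
        return sum(1 for t in transactions if s <= t)
    minimal = []
    for size in range(1, len(items) + 1):
        for cand in combinations(items, size):
            if support(cand) <= max_support and \
               all(support(sub) > max_support
                   for sub in combinations(cand, size - 1)):
                minimal.append(frozenset(cand))
    return minimal
```

    On the toy transactions below, {a,c} and {b,c} each occur once (infrequent) while all their singletons are frequent, so they are minimal; {a,b,c} is infrequent but not minimal because it contains the infrequent subset {a,c}.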

    Unsupervised routine discovery in egocentric photo-streams

    The routine of a person is defined by the occurrence of activities throughout different days, and can directly affect the person's health. In this work, we address the recognition of routine-related days. To do so, we rely on egocentric images, which are recorded by a wearable camera and allow monitoring the life of the user from a first-person perspective. We propose an unsupervised model that identifies routine-related days, following an outlier detection approach. We test the proposed framework on a total of 72 days of photo-streams, covering around two weeks in the lives of 5 different camera wearers. Our model achieves an average accuracy of 76% and a weighted F-score of 68% across all users. Thus, we show that our framework is able to recognise routine-related days, and opens the door to understanding people's behaviour.
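    The outlier-detection framing can be sketched generically. The snippet below is a stand-in, not the paper's model: it assumes each day has already been summarized as a feature vector (the paper derives these from egocentric images) and flags as non-routine the days whose mean distance to all other days exceeds a quantile threshold.

```python
import numpy as np

def routine_days(day_features, quantile=0.8):
    """Distance-based outlier detection over per-day feature vectors
    (illustrative stand-in for the paper's unsupervised model): days
    whose mean distance to the other days is below a quantile
    threshold are marked routine; the rest are outliers."""
    X = np.asarray(day_features, dtype=float)
    # Pairwise Euclidean distances between all days.
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    mean_dist = d.sum(axis=1) / (len(X) - 1)
    return mean_dist <= np.quantile(mean_dist, quantile)
```

    With a tight cluster of similar days plus a couple of very different ones, the different days fall above the threshold and are flagged as non-routine.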

    Multiscale 3D Shape Analysis using Spherical Wavelets

    ©2005 Springer. The original publication is available at www.springerlink.com: http://dx.doi.org/10.1007/11566489_57
    Shape priors attempt to represent biological variations within a population. When variations are global, Principal Component Analysis (PCA) can be used to learn major modes of variation, even from a limited training set. However, when significant local variations exist, PCA typically cannot represent such variations from a small training set. To address this issue, we present a novel algorithm that learns shape variations from data at multiple scales and locations using spherical wavelets and spectral graph partitioning. Our results show that when the training set is small, our algorithm significantly improves the approximation of shapes in a testing set over PCA, which tends to oversmooth data.
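    For reference, the global PCA baseline that the paper improves upon can be sketched as follows. This shows only the baseline, assuming shapes are given as flattened landmark vectors; the paper's contribution (spherical wavelets plus spectral graph partitioning for multiscale, localized modes) is not reproduced here.

```python
import numpy as np

def pca_shape_modes(shapes, n_modes=2):
    """Global PCA shape prior (the baseline the paper improves on):
    shapes is an (n_samples, n_points * dim) array of flattened
    landmark vectors; returns the mean shape, the top modes of
    variation, and their variances."""
    X = np.asarray(shapes, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the centered data gives the principal modes directly.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    variances = (s[:n_modes] ** 2) / (len(X) - 1)
    return mean, Vt[:n_modes], variances
```

    Because the modes are global, a small training set forces PCA to average localized variations away, which is the oversmoothing behaviour the multiscale wavelet approach addresses.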

    Classic and Bayesian Tree-Based Methods

    Tree-based methods are nonparametric machine-learning techniques for prediction and exploratory modeling. They are among the most valuable and powerful data-mining tools and can be used to predict different types of outcome (dependent) variable: quantitative, qualitative, or time until an event occurs (survival data). The model is called a classification tree, regression tree, or survival tree according to the type of outcome variable. These methods have several advantages over traditional statistical methods such as generalized linear models (GLMs), discriminant analysis, and survival analysis: they require no assumptions about the functional form relating the outcome to the predictor (independent) variables; they are invariant to monotone transformations of the predictors; they handle nonlinear relationships, high-order interactions, and mixed predictor types; their results are easy to interpret without statistical expertise; and they are robust to missing values, outliers, and multicollinearity. Several classic and Bayesian algorithms have been proposed for classification and regression trees, and in this chapter we review these algorithms along with appropriate criteria for assessing their predictive performance.
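    The core operation shared by the classic algorithms the chapter reviews is the search for the best binary split. The sketch below shows a CART-style split on a single numeric predictor for a regression tree, minimizing the summed squared error of the two child means; it is a generic illustration, not any specific algorithm from the chapter.

```python
import numpy as np

def best_split(x, y):
    """Exhaustive search for the best binary split of one numeric
    predictor in a regression tree: choose the threshold minimizing
    the summed squared error around the two child means
    (a CART-style splitting-criterion sketch)."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best = (np.inf, None)
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue                      # cannot split between ties
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + \
              ((right - right.mean()) ** 2).sum()
        thr = (xs[i - 1] + xs[i]) / 2     # midpoint threshold
        if sse < best[0]:
            best = (sse, thr)
    return best[1]
```

    Growing a tree amounts to applying this search recursively over predictors and child nodes; note that the criterion depends only on the ordering of x, which is why trees are invariant to monotone transformations of the predictors.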