
    A Methodology for the Diagnostic of Aircraft Engine Based on Indicators Aggregation

    Full text link
    Aircraft engine manufacturers collect large amounts of engine-related data during flights. These data are used to detect anomalies in the engines in order to help companies optimize their maintenance costs. This article introduces and studies a generic methodology for building automatic detection of early signs of anomalies in a way that is understandable by the human operators who make the final maintenance decision. The main idea of the method is to generate a very large number of binary indicators based on parametric anomaly scores designed by experts, complemented by simple aggregations of those scores. The best indicators are selected via a classical forward scheme, leading to a much reduced number of indicators that are tuned to a data set. We illustrate the usefulness of the method on simulated data which contain realistic early signs of anomalies. Comment: Proceedings of the 14th Industrial Conference, ICDM 2014, St. Petersburg, Russian Federation (2014).
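    A minimal sketch of the forward-selection step described in this abstract: greedily add the binary indicator that most improves a cross-validated detection score. The names (X_indicators, y_anomaly) and the logistic-regression scorer are illustrative assumptions, not the paper's implementation.

```python
# Greedy forward selection over binary anomaly indicators (illustrative sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_select(X_indicators, y_anomaly, max_indicators=10):
    selected, remaining = [], list(range(X_indicators.shape[1]))
    best_score = -np.inf
    while remaining and len(selected) < max_indicators:
        candidates = []
        for j in remaining:
            cols = selected + [j]
            score = cross_val_score(LogisticRegression(max_iter=1000),
                                    X_indicators[:, cols], y_anomaly,
                                    cv=5, scoring="roc_auc").mean()
            candidates.append((score, j))
        score, j = max(candidates)
        if score <= best_score:          # stop when no indicator improves the score
            break
        best_score = score
        selected.append(j)
        remaining.remove(j)
    return selected
```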

    Texture-based crowd detection and localisation

    Get PDF
    This paper presents a crowd detection system based on texture analysis. State-of-the-art techniques based on the co-occurrence matrix are revisited and a novel set of features is proposed. These features provide a richer description of the co-occurrence matrix and can be exploited to obtain stronger classification results, especially when smaller portions of the image are considered. This is extremely useful for crowd localisation: acquired images are divided into smaller regions and a classification is performed on each one. A thorough evaluation of the proposed system on a real-world data set is also presented; it validates the improvements in the reliability of crowd detection and localisation.
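    A minimal sketch of the block-wise co-occurrence-matrix pipeline described above, using the standard Haralick-style descriptors from scikit-image (>= 0.19 assumed). The paper's richer feature set is not reproduced here; each block's feature vector would feed a crowd / non-crowd classifier.

```python
# Block-wise texture description with grey-level co-occurrence matrices (GLCM).
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def block_glcm_features(gray_image, block=64, levels=32):
    """Split a grayscale uint8 image into blocks and describe each with GLCM statistics."""
    img = (gray_image // (256 // levels)).astype(np.uint8)   # quantise grey levels
    feats = {}
    for r in range(0, img.shape[0] - block + 1, block):
        for c in range(0, img.shape[1] - block + 1, block):
            patch = img[r:r + block, c:c + block]
            glcm = graycomatrix(patch, distances=[1], angles=[0, np.pi / 2],
                                levels=levels, symmetric=True, normed=True)
            feats[(r, c)] = [graycoprops(glcm, p).mean()
                             for p in ("contrast", "homogeneity", "energy", "correlation")]
    return feats
```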

    ICA as a preprocessing technique for classification

    Get PDF
    In this paper we propose the use of independent component analysis (ICA) [1] to improve the classification rate of decision trees and multilayer perceptrons [2], [3]. Using ICA in the preprocessing stage makes the structure of both classifiers simpler and therefore improves their generalization properties. The hypothesis behind the proposed preprocessing is that ICA transforms the feature space into one whose components are independent and aligned with the axes, and which is therefore better adapted to the way a decision tree is constructed. Inferring the weights of a multilayer perceptron also becomes easier because the gradient search in weight space follows independent trajectories. The result is that the classifiers are less complex and, on some databases, the error rate is lower. This idea is also applicable to regression.
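    A minimal sketch of ICA as a preprocessing step for a tree classifier, in the spirit of the approach above. FastICA and the decision tree are standard scikit-learn estimators; the dataset, number of components, and other hyperparameters are placeholders, not the paper's choices.

```python
# Compare a plain decision tree with an ICA -> decision tree pipeline (illustrative).
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import FastICA
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)

plain_tree = DecisionTreeClassifier(random_state=0)
ica_tree = make_pipeline(FastICA(n_components=10, random_state=0),
                         DecisionTreeClassifier(random_state=0))

print("tree alone :", cross_val_score(plain_tree, X, y, cv=5).mean())
print("ICA -> tree:", cross_val_score(ica_tree, X, y, cv=5).mean())
```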

    Machine learning with the hierarchy‐of‐hypotheses (HoH) approach discovers novel pattern in studies on biological invasions

    Get PDF
    Research synthesis on simple yet general hypotheses and ideas is challenging in scientific disciplines that study highly context‐dependent systems, such as the medical, social, and biological sciences. This study shows that machine learning, the equation‐free statistical modeling of artificial intelligence, is a promising synthesis tool for discovering novel patterns and the sources of controversy around a general hypothesis. We apply a decision tree algorithm, assuming that evidence from various contexts can be adequately integrated in a hierarchically nested structure. As a case study, we analyzed 163 articles that studied a prominent hypothesis in invasion biology, the enemy release hypothesis. We explored whether any of the nine attributes that characterize each study can differentiate its conclusion, framed as a classification problem. The results corroborate that machine learning can be useful for research synthesis, as the algorithm detected patterns that had already been highlighted in previous narrative reviews. Compared with a previous synthesis study that assessed the same collection of evidence based on expert judgement, the algorithm newly proposed that studies focusing on Asian regions mostly supported the hypothesis, suggesting that more detailed investigations in these regions can enhance our understanding of the hypothesis. We suggest that machine learning algorithms can be a promising synthesis tool especially where studies (a) reformulate a general hypothesis from different perspectives, (b) use different methods or variables, or (c) report insufficient information for conducting meta‐analyses.
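    A hypothetical sketch of the classification framing described above: fit a shallow decision tree that predicts each study's conclusion from its categorical attributes and inspect which attributes drive the splits. The attribute names and toy rows are invented for illustration and are not the paper's nine attributes or data.

```python
# Decision tree over categorical study attributes (illustrative toy data only).
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

studies = pd.DataFrame({
    "taxon":   ["plant", "insect", "plant", "fish"],
    "region":  ["Europe", "Asia", "Asia", "N. America"],
    "method":  ["observational", "experimental", "observational", "experimental"],
    "support": ["yes", "yes", "no", "no"],      # each study's conclusion (label)
})

X, y = studies.drop(columns="support"), studies["support"]
model = make_pipeline(OneHotEncoder(handle_unknown="ignore"),
                      DecisionTreeClassifier(max_depth=3, random_state=0)).fit(X, y)
print(export_text(model[-1]))   # inspect which attributes split the evidence
```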

    Mixing hetero- and homogeneous models in weighted ensembles

    Get PDF
    The effectiveness of ensembling for improving classification performance is well documented. Broadly speaking, ensemble design can be expressed as a spectrum: at one end, a set of heterogeneous classifiers model the same data; at the other, homogeneous models derived from the same classification algorithm are diversified through data manipulation. The cross-validation accuracy weighted probabilistic ensemble is a heterogeneous weighted ensemble scheme that needs reliable estimates of error from its base classifiers. It estimates error through a cross-validation process and raises the estimates to a power to accentuate differences. We study the effect of retaining all models trained during cross-validation on the final ensemble's predictive performance, and on the variance and robustness of the base models and resulting ensembles across datasets and resamples. We find that augmenting the ensemble by retaining all trained models provides a consistent and significant improvement, despite reductions in the reliability of the base models' performance estimates.
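    A minimal sketch of the cross-validation-accuracy weighted probabilistic combination described above: each base classifier's estimated accuracy, raised to a power alpha, weights its predicted class probabilities. The base learners and alpha value are placeholder assumptions, not the paper's exact configuration.

```python
# Cross-validation-accuracy weighted probabilistic ensemble (illustrative sketch).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

def cv_weighted_ensemble_predict(bases, X_train, y_train, X_test, alpha=4):
    weights, probas = [], []
    for clf in bases:
        acc = cross_val_score(clf, X_train, y_train, cv=10).mean()
        weights.append(acc ** alpha)          # raise to a power to accentuate differences
        probas.append(clf.fit(X_train, y_train).predict_proba(X_test))
    combined = np.average(probas, axis=0, weights=weights)
    return combined.argmax(axis=1)

bases = [DecisionTreeClassifier(), KNeighborsClassifier(), LogisticRegression(max_iter=1000)]
```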

    Auto-tail dependence coefficients for stationary solutions of linear stochastic recurrence equations and for GARCH(1,1)

    Get PDF
    We examine the auto-dependence structure of strictly stationary solutions of linear stochastic recurrence equations and of strictly stationary GARCH(1,1) processes from the point of view of ordinary and generalized tail dependence coefficients. Since such processes can easily have infinite variance, a substitute for the usual autocorrelation function is needed.
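    For reference, a standard definition of the (upper) auto-tail dependence coefficient at lag h for a strictly stationary process with marginal distribution F; the paper's generalized coefficients may differ in detail.

```latex
% Upper auto-tail dependence coefficient at lag h for a strictly stationary
% process (X_t) with marginal distribution F (standard definition):
\lambda_U(h) \;=\; \lim_{q \uparrow 1}
  \Pr\bigl( X_{t+h} > F^{-1}(q) \,\big|\, X_t > F^{-1}(q) \bigr),
\qquad h = 1, 2, \dots
```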

    TreeGrad: Transferring Tree Ensembles to Neural Networks

    Full text link
    Gradient Boosted Decision Trees (GBDT) are popular machine learning algorithms with implementations such as LightGBM and in popular machine learning toolkits like Scikit-Learn. Many implementations can only produce trees in an offline and greedy manner. We explore ways to convert existing GBDT implementations to known neural network architectures with minimal performance loss, in order to allow decision splits to be updated in an online manner, and provide extensions that allow split points to be altered as a neural architecture search problem. We provide learning bounds for our neural network. Comment: Technical report on an implementation of the Deep Neural Decision Forests algorithm, to accompany the implementation here: https://github.com/chappers/TreeGrad. Update: Please cite as: Siu, C. (2019). "Transferring Tree Ensembles to Neural Networks". International Conference on Neural Information Processing. Springer, 2019. arXiv admin note: text overlap with arXiv:1909.1179
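    A generic sketch of the underlying idea of mapping tree splits onto neural-network units: a hard axis-aligned split is replaced by a sigmoid gate so the threshold and routing become differentiable and can be updated online by gradient descent. This follows the general spirit of differentiable decision forests and is not the paper's exact construction.

```python
# Differentiable ("soft") decision split: 1[w.x > t] replaced by a sigmoid gate.
import numpy as np

def soft_split(x, feature_weights, threshold, temperature=10.0):
    """Probability of routing x to the right child; approaches a hard split as temperature grows."""
    z = temperature * (x @ feature_weights - threshold)
    return 1.0 / (1.0 + np.exp(-z))

def soft_stump_predict(x, feature_weights, threshold, leaf_left, leaf_right):
    p_right = soft_split(x, feature_weights, threshold)
    return (1 - p_right) * leaf_left + p_right * leaf_right   # differentiable in all parameters
```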

    Predicting the Sieving Effort for the Number Field Sieve

    Full text link

    Interpreting random forest classification models using a feature contribution method

    Get PDF
    Model interpretation is one of the key aspects of the model evaluation process. Explaining the relationship between model variables and outputs is relatively easy for statistical models such as linear regressions, thanks to the availability of model parameters and their statistical significance. For "black box" models such as random forests, this information is hidden inside the model structure. This work presents an approach for computing feature contributions for random forest classification models. It allows the influence of each variable on the model prediction to be determined for an individual instance. By analysing feature contributions for a training dataset, the most significant variables can be determined, and their typical contributions towards predictions made for individual classes, i.e., class-specific feature contribution "patterns", can be discovered. These patterns represent the standard behaviour of the model and allow for an additional assessment of the model's reliability on new data. Interpretation of feature contributions for two UCI benchmark datasets shows the potential of the proposed methodology. The robustness of the results is demonstrated through an extensive analysis of feature contributions calculated for a large number of generated random forest models.
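    A minimal sketch of the general feature-contribution idea for a single tree: follow the instance's decision path and attribute the change in the class-probability estimate at each split to the feature used there; for a forest, such contributions would be averaged over trees. This mirrors the general technique and is not necessarily the paper's exact computation.

```python
# Per-feature contributions along one sample's decision path in a fitted sklearn tree.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tree_feature_contributions(tree: DecisionTreeClassifier, x):
    """Decompose the tree's class-probability prediction for sample x into per-feature contributions."""
    t = tree.tree_
    node = 0
    prob = t.value[0][0] / t.value[0][0].sum()            # class distribution at the root (prior)
    contributions = np.zeros((tree.n_features_in_, prob.size))
    while t.children_left[node] != -1:                    # descend until a leaf is reached
        feat = t.feature[node]
        node = (t.children_left[node] if x[feat] <= t.threshold[node]
                else t.children_right[node])
        new_prob = t.value[node][0] / t.value[node][0].sum()
        contributions[feat] += new_prob - prob            # credit the change to the split feature
        prob = new_prob
    return prob, contributions    # prediction = prior + sum of contributions
```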

    Ensemble learning of linear perceptron; Online learning theory

    Full text link
    Within the framework of on-line learning, we study the generalization error of an ensemble learning machine that learns from a linear teacher perceptron. The generalization error achieved by an ensemble of linear perceptrons with homogeneous or inhomogeneous initial weight vectors is calculated exactly in the thermodynamic limit of a large number of input elements and shows rich behavior. Our main findings are as follows. For learning with homogeneous initial weight vectors, the generalization error of an infinite number of linear student perceptrons is only half that of a single linear perceptron, and for a finite number K of linear perceptrons it converges to the infinite-ensemble value as O(1/K). For learning with inhomogeneous initial weight vectors, it is advantageous to take a weighted average over the outputs of the linear perceptrons, and we show the conditions under which the optimal weights are constant during the learning process. The optimal weights depend only on the correlations of the initial weight vectors. Comment: 14 pages, 3 figures, submitted to Physical Review
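    The abstract's quantitative claims for homogeneous initial weight vectors can be summarized as follows, with epsilon_K denoting the generalization error of an ensemble of K linear students; this is a paraphrase of the stated results, not a derivation.

```latex
% Stated results for homogeneous initial weight vectors, with \epsilon_K the
% generalization error of an ensemble of K linear student perceptrons:
\epsilon_\infty \;=\; \tfrac{1}{2}\,\epsilon_1,
\qquad
\epsilon_K \;=\; \epsilon_\infty + O\!\left(\tfrac{1}{K}\right)
\quad \text{as } K \to \infty .
```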