12,085 research outputs found

    Nonlinear Dimension Reduction for Micro-array Data (Small n and Large p)

    Get PDF

    Easy over Hard: A Case Study on Deep Learning

    Full text link
    While deep learning is an exciting new technique, the benefits of this method need to be assessed with respect to its computational cost. This is particularly important for deep learning since these learners need hours (to weeks) to train the model. Such long training time limits the ability of (a)~a researcher to test the stability of their conclusion via repeated runs with different random seeds; and (b)~other researchers to repeat, improve, or even refute that original work. For example, recently, deep learning was used to find which questions in the Stack Overflow programmer discussion forum can be linked together. That deep learning system took 14 hours to execute. We show here that applying a very simple optimizer called DE to fine tune SVM, it can achieve similar (and sometimes better) results. The DE approach terminated in 10 minutes; i.e. 84 times faster hours than deep learning method. We offer these results as a cautionary tale to the software analytics community and suggest that not every new innovation should be applied without critical analysis. If researchers deploy some new and expensive process, that work should be baselined against some simpler and faster alternatives.Comment: 12 pages, 6 figures, accepted at FSE201

    Statistical Inference using the Morse-Smale Complex

    Full text link
    The Morse-Smale complex of a function ff decomposes the sample space into cells where ff is increasing or decreasing. When applied to nonparametric density estimation and regression, it provides a way to represent, visualize, and compare multivariate functions. In this paper, we present some statistical results on estimating Morse-Smale complexes. This allows us to derive new results for two existing methods: mode clustering and Morse-Smale regression. We also develop two new methods based on the Morse-Smale complex: a visualization technique for multivariate functions and a two-sample, multivariate hypothesis test.Comment: 45 pages, 13 figures. Accepted to Electronic Journal of Statistic

    Self-Selective Correlation Ship Tracking Method for Smart Ocean System

    Full text link
    In recent years, with the development of the marine industry, navigation environment becomes more complicated. Some artificial intelligence technologies, such as computer vision, can recognize, track and count the sailing ships to ensure the maritime security and facilitates the management for Smart Ocean System. Aiming at the scaling problem and boundary effect problem of traditional correlation filtering methods, we propose a self-selective correlation filtering method based on box regression (BRCF). The proposed method mainly include: 1) A self-selective model with negative samples mining method which effectively reduces the boundary effect in strengthening the classification ability of classifier at the same time; 2) A bounding box regression method combined with a key points matching method for the scale prediction, leading to a fast and efficient calculation. The experimental results show that the proposed method can effectively deal with the problem of ship size changes and background interference. The success rates and precisions were higher than Discriminative Scale Space Tracking (DSST) by over 8 percentage points on the marine traffic dataset of our laboratory. In terms of processing speed, the proposed method is higher than DSST by nearly 22 Frames Per Second (FPS)

    Computationally intensive, distributed and decentralised machine learning: from theory to applications

    Get PDF
    Machine learning (ML) is currently one of the most important research fields, spanning computer science, statistics, pattern recognition, data mining, and predictive analytics. It plays a central role in automatic data processing and analysis in numerous research domains owing to widely distributed and geographically scattered data sources, powerful computing clouds, and high digitisation requirements. However, aspects such as the accuracy of methods, data privacy, and model explainability remain challenging and require additional research. Therefore, it is necessary to analyse centralised and distributed data processing architectures, and to create novel computationally intensive explainable and privacy-preserving ML methods, to investigate their properties, to propose distributed versions of prospective ML baseline methods, and to evaluate and apply these in various applications. This thesis addresses the theoretical and practical aspects of state-of-the-art ML methods. The contributions of this thesis are threefold. In Chapter 2, novel non-distributed, centralised, computationally intensive ML methods are proposed, their properties are investigated, and state-of-the-art ML methods are applied to real-world data from two domains, namely transportation and bioinformatics. Moreover, algorithms for ‘black-box’ model interpretability are presented. Decentralised ML methods are considered in Chapter 3. First, we investigate data processing as a preliminary step in data-driven, agent-based decision-making. Thereafter, we propose novel decentralised ML algorithms that are based on the collaboration of the local models of agents. Within this context, we consider various regression models. Finally, the explainability of multiagent decision-making is addressed. In Chapter 4, we investigate distributed centralised ML methods. We propose a distributed parallelisation algorithm for the semi-parametric and non-parametric regression types, and implement these in the computational environment and data structures of Apache SPARK. Scalability, speed-up, and goodness-of-fit experiments using real-world data demonstrate the excellent performance of the proposed methods. Moreover, the federated deep-learning approach enables us to address the data privacy challenges caused by processing of distributed private data sources to solve the travel-time prediction problem. Finally, we propose an explainability strategy to interpret the influence of the input variables on this federated deep-learning application. This thesis is based on the contribution made by 11 papers to the theoretical and practical aspects of state-of-the-art and proposed ML methods. We successfully address the stated challenges with various data processing architectures, validate the proposed approaches in diverse scenarios from the transportation and bioinformatics domains, and demonstrate their effectiveness in scalability, speed-up, and goodness-of-fit experiments with real-world data. However, substantial future research is required to address the stated challenges and to identify novel issues in ML. Thus, it is necessary to advance the theoretical part by creating novel ML methods and investigating their properties, as well as to contribute to the application part by using of the state-of-the-art ML methods and their combinations, and interpreting their results for different problem setting

    High-resolution SAR images for fire susceptibility estimation in urban forestry

    Get PDF
    We present an adaptive system for the automatic assessment of both physical and anthropic fire impact factors on periurban forestries. The aim is to provide an integrated methodology exploiting a complex data structure built upon a multi resolution grid gathering historical land exploitation and meteorological data, records of human habits together with suitably segmented and interpreted high resolution X-SAR images, and several other information sources. The contribution of the model and its novelty rely mainly on the definition of a learning schema lifting different factors and aspects of fire causes, including physical, social and behavioural ones, to the design of a fire susceptibility map, of a specific urban forestry. The outcome is an integrated geospatial database providing an infrastructure that merges cartography, heterogeneous data and complex analysis, in so establishing a digital environment where users and tools are interactively connected in an efficient and flexible way

    Review and Comparison of Intelligent Optimization Modelling Techniques for Energy Forecasting and Condition-Based Maintenance in PV Plants

    Get PDF
    Within the field of soft computing, intelligent optimization modelling techniques include various major techniques in artificial intelligence. These techniques pretend to generate new business knowledge transforming sets of "raw data" into business value. One of the principal applications of these techniques is related to the design of predictive analytics for the improvement of advanced CBM (condition-based maintenance) strategies and energy production forecasting. These advanced techniques can be used to transform control system data, operational data and maintenance event data to failure diagnostic and prognostic knowledge and, ultimately, to derive expected energy generation. One of the systems where these techniques can be applied with massive potential impact are the legacy monitoring systems existing in solar PV energy generation plants. These systems produce a great amount of data over time, while at the same time they demand an important e ort in order to increase their performance through the use of more accurate predictive analytics to reduce production losses having a direct impact on ROI. How to choose the most suitable techniques to apply is one of the problems to address. This paper presents a review and a comparative analysis of six intelligent optimization modelling techniques, which have been applied on a PV plant case study, using the energy production forecast as the decision variable. The methodology proposed not only pretends to elicit the most accurate solution but also validates the results, in comparison with the di erent outputs for the di erent techniques
    • …
    corecore