140,229 research outputs found
Input variable selection in time-critical knowledge integration applications: A review, analysis, and recommendation paper
This is the post-print version of the final paper published in Advanced Engineering Informatics. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright @ 2013 Elsevier B.V.The purpose of this research is twofold: first, to undertake a thorough appraisal of existing Input Variable Selection (IVS) methods within the context of time-critical and computation resource-limited dimensionality reduction problems; second, to demonstrate improvements to, and the application of, a recently proposed time-critical sensitivity analysis method called EventTracker to an environment science industrial use-case, i.e., sub-surface drilling.
Producing time-critical accurate knowledge about the state of a system (effect) under computational and data acquisition (cause) constraints is a major challenge, especially if the knowledge required is critical to the system operation where the safety of operators or integrity of costly equipment is at stake. Understanding and interpreting, a chain of interrelated events, predicted or unpredicted, that may or may not result in a specific state of the system, is the core challenge of this research. The main objective is then to identify which set of input data signals has a significant impact on the set of system state information (i.e. output). Through a cause-effect analysis technique, the proposed technique supports the filtering of unsolicited data that can otherwise clog up the communication and computational capabilities of a standard supervisory control and data acquisition system.
The paper analyzes the performance of input variable selection techniques from a series of perspectives. It then expands the categorization and assessment of sensitivity analysis methods in a structured framework that takes into account the relationship between inputs and outputs, the nature of their time series, and the computational effort required. The outcome of this analysis is that established methods have a limited suitability for use by time-critical variable selection applications. By way of a geological drilling monitoring scenario, the suitability of the proposed EventTracker Sensitivity Analysis method for use in high volume and time critical input variable selection problems is demonstrated.E
A Survey on Compiler Autotuning using Machine Learning
Since the mid-1990s, researchers have been trying to use machine-learning
based approaches to solve a number of different compiler optimization problems.
These techniques primarily enhance the quality of the obtained results and,
more importantly, make it feasible to tackle two main compiler optimization
problems: optimization selection (choosing which optimizations to apply) and
phase-ordering (choosing the order of applying optimizations). The compiler
optimization space continues to grow due to the advancement of applications,
increasing number of compiler optimizations, and new target architectures.
Generic optimization passes in compilers cannot fully leverage newly introduced
optimizations and, therefore, cannot keep up with the pace of increasing
options. This survey summarizes and classifies the recent advances in using
machine learning for the compiler optimization field, particularly on the two
major problems of (1) selecting the best optimizations and (2) the
phase-ordering of optimizations. The survey highlights the approaches taken so
far, the obtained results, the fine-grain classification among different
approaches and finally, the influential papers of the field.Comment: version 5.0 (updated on September 2018)- Preprint Version For our
Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated
quarterly here (Send me your new published papers to be added in the
subsequent version) History: Received November 2016; Revised August 2017;
Revised February 2018; Accepted March 2018
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
Evolving Spatially Aggregated Features from Satellite Imagery for Regional Modeling
Satellite imagery and remote sensing provide explanatory variables at
relatively high resolutions for modeling geospatial phenomena, yet regional
summaries are often desirable for analysis and actionable insight. In this
paper, we propose a novel method of inducing spatial aggregations as a
component of the machine learning process, yielding regional model features
whose construction is driven by model prediction performance rather than prior
assumptions. Our results demonstrate that Genetic Programming is particularly
well suited to this type of feature construction because it can automatically
synthesize appropriate aggregations, as well as better incorporate them into
predictive models compared to other regression methods we tested. In our
experiments we consider a specific problem instance and real-world dataset
relevant to predicting snow properties in high-mountain Asia
A Survey of Prediction and Classification Techniques in Multicore Processor Systems
In multicore processor systems, being able to accurately predict the future provides new optimization opportunities, which otherwise could not be exploited. For example, an oracle able to predict a certain application\u27s behavior running on a smart phone could direct the power manager to switch to appropriate dynamic voltage and frequency scaling modes that would guarantee minimum levels of desired performance while saving energy consumption and thereby prolonging battery life. Using predictions enables systems to become proactive rather than continue to operate in a reactive manner. This prediction-based proactive approach has become increasingly popular in the design and optimization of integrated circuits and of multicore processor systems. Prediction transforms from simple forecasting to sophisticated machine learning based prediction and classification that learns from existing data, employs data mining, and predicts future behavior. This can be exploited by novel optimization techniques that can span across all layers of the computing stack. In this survey paper, we present a discussion of the most popular techniques on prediction and classification in the general context of computing systems with emphasis on multicore processors. The paper is far from comprehensive, but, it will help the reader interested in employing prediction in optimization of multicore processor systems
Modeling and visualizing networked multi-core embedded software energy consumption
In this report we present a network-level multi-core energy model and a
software development process workflow that allows software developers to
estimate the energy consumption of multi-core embedded programs. This work
focuses on a high performance, cache-less and timing predictable embedded
processor architecture, XS1. Prior modelling work is improved to increase
accuracy, then extended to be parametric with respect to voltage and frequency
scaling (VFS) and then integrated into a larger scale model of a network of
interconnected cores. The modelling is supported by enhancements to an open
source instruction set simulator to provide the first network timing aware
simulations of the target architecture. Simulation based modelling techniques
are combined with methods of results presentation to demonstrate how such work
can be integrated into a software developer's workflow, enabling the developer
to make informed, energy aware coding decisions. A set of single-,
multi-threaded and multi-core benchmarks are used to exercise and evaluate the
models and provide use case examples for how results can be presented and
interpreted. The models all yield accuracy within an average +/-5 % error
margin
Recommended from our members
Semiparametric estimation for a class of time-inhomogenous diffusion processes
Copyright @ 2009 Institute of Statistical Science, Academia SinicaWe develop two likelihood-based approaches to semiparametrically estimate a class of time-inhomogeneous diffusion processes: log penalized splines (P-splines) and the local log-linear method. Positive volatility is naturally embedded and this positivity is not guaranteed in most existing diffusion models. We investigate different smoothing parameter selections. Separate bandwidths are used for drift and volatility estimation. In the log P-splines approach, different smoothness for different time varying coefficients is feasible by assigning different penalty parameters. We also provide theorems for both approaches and report statistical inference results. Finally, we present a case study using the weekly three-month Treasury bill data from 1954 to 2004. We find that the log P-splines approach seems to capture the volatility dip in mid-1960s the best. We also present an application to calculate a financial market risk measure called Value at Risk (VaR) using statistical estimates from log P-splines
Improved physiological noise regression in fNIRS: a multimodal extension of the General Linear Model using temporally embedded Canonical Correlation Analysis
For the robust estimation of evoked brain activity from functional Near-Infrared Spectroscopy (fNIRS) signals, it is crucial to reduce nuisance signals from systemic physiology and motion. The current best practice incorporates short-separation (SS) fNIRS measurements as regressors in a General Linear Model (GLM). However, several challenging signal characteristics such as non-instantaneous and non-constant coupling are not yet addressed by this approach and additional auxiliary signals are not optimally exploited. We have recently introduced a new methodological framework for the unsupervised multivariate analysis of fNIRS signals using Blind Source Separation (BSS) methods. Building onto the framework, in this manuscript we show how to incorporate the advantages of regularized temporally embedded Canonical Correlation Analysis (tCCA) into the supervised GLM. This approach allows flexible integration of any number of auxiliary modalities and signals. We provide guidance for the selection of optimal parameters and auxiliary signals for the proposed GLM extension. Its performance in the recovery of evoked HRFs is then evaluated using both simulated ground truth data and real experimental data and compared with the GLM with short-separation regression. Our results show that the GLM with tCCA significantly improves upon the current best practice, yielding significantly better results across all applied metrics: Correlation (HbO max. +45%), Root Mean Squared Error (HbO max. -55%), F-Score (HbO up to 3.25-fold) and p-value as well as power spectral density of the noise floor. The proposed method can be incorporated into the GLM in an easily applicable way that flexibly combines any available auxiliary signals into optimal nuisance regressors. This work has potential significance both for conventional neuroscientific fNIRS experiments as well as for emerging applications of fNIRS in everyday environments, medicine and BCI, where high Contrast to Noise Ratio is of importance for single trial analysis.Published versio
- ā¦