
    Time Series Cluster Kernel for Learning Similarities between Multivariate Time Series with Missing Data

    Similarity-based approaches represent a promising direction for time series analysis. However, many such methods rely on parameter tuning, and some have shortcomings if the time series are multivariate (MTS), due to dependencies between attributes, or if the time series contain missing data. In this paper, we address these challenges within the powerful context of kernel methods by proposing the robust \emph{time series cluster kernel} (TCK). The approach leverages the missing-data handling properties of Gaussian mixture models (GMM) augmented with informative prior distributions. An ensemble learning approach is exploited to ensure robustness to parameters by combining the clustering results of many GMMs to form the final kernel. We evaluate the TCK on synthetic and real data and compare it to other state-of-the-art techniques. The experimental results demonstrate that the TCK is robust to parameter choices, provides competitive results for MTS without missing data, and gives outstanding results for MTS with missing data.
    Comment: 23 pages, 6 figures
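
    The core ensemble idea can be sketched compactly: fit many GMMs under randomised hyperparameters and let the kernel entry for two series be the fraction of models that cluster them together. The sketch below assumes complete, flattened MTS and omits the paper's informative priors and missing-data handling; all names are illustrative.

```python
# Toy ensemble cluster kernel: the similarity of two series is the fraction
# of randomly configured GMMs that assign them to the same cluster.
# Assumes complete data and flattened MTS; names are illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

def ensemble_cluster_kernel(X, n_models=30, max_components=8, seed=0):
    """X: (n_samples, n_features) flattened multivariate time series."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    K = np.zeros((n, n))
    for _ in range(n_models):
        q = int(rng.integers(2, max_components + 1))   # random number of components
        gmm = GaussianMixture(n_components=q, covariance_type="diag",
                              random_state=int(rng.integers(1_000_000)))
        labels = gmm.fit_predict(X)
        K += labels[:, None] == labels[None, :]        # 1 where co-clustered
    return K / n_models                                # similarity in [0, 1]

# toy usage: 20 series of length 20 with 3 attributes, flattened to 60 values
X = np.random.default_rng(1).standard_normal((20, 60))
K = ensemble_cluster_kernel(X)
print(K.shape, K.diagonal().min())                     # diagonal is all ones
```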

    Machine Learning and Integrative Analysis of Biomedical Big Data

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
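
    As one concrete illustration of the integration strategies such reviews cover, the sketch below shows a simple late-integration baseline: fit one classifier per omics modality and average the predicted probabilities. The modalities and data here are synthetic placeholders, not from any study discussed in the review.

```python
# Late-integration baseline: one classifier per omics block, averaged
# probabilities as the consensus prediction. All data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 100)                         # binary phenotype labels
modalities = {                                      # synthetic "omics" blocks
    "genome": rng.standard_normal((100, 50)),
    "transcriptome": rng.standard_normal((100, 30)),
    "metabolome": rng.standard_normal((100, 20)),
}

probs = []
for name, X in modalities.items():                  # fit each modality in isolation
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    probs.append(clf.predict_proba(X)[:, 1])

consensus = np.mean(probs, axis=0)                  # simple late integration
print("consensus prediction for sample 0:", consensus[0] > 0.5)
```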

    Data-driven Soft Sensors in the Process Industry

    In the last two decades, Soft Sensors have established themselves as a valuable alternative to traditional means for the acquisition of critical process variables, process monitoring and other tasks related to process control. This paper discusses characteristics of process industry data which are critical for the development of data-driven Soft Sensors. These characteristics are common to a large number of process industry fields, such as the chemical industry, bioprocess industry and steel industry. The focus of this work is on data-driven Soft Sensors because of their growing popularity, already demonstrated usefulness and huge, though not yet fully realised, potential. The main contributions of this work are a comprehensive selection of case studies covering the three most important Soft Sensor application fields, a general introduction to the most popular Soft Sensor modelling techniques, and a discussion of some open issues in Soft Sensor development and maintenance and their possible solutions.
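
    A minimal data-driven soft sensor can be sketched as a regression model that infers a hard-to-measure process variable from routine sensor readings. The example below uses partial least squares, a common choice in this literature, on synthetic data; it is an illustration of the general idea, not a method from the paper.

```python
# A toy soft sensor: PLS regression maps 12 routine sensor channels to one
# hard-to-measure quality variable. Synthetic data, illustrative setup.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 12))                  # routine process measurements
y = X @ rng.standard_normal(12) + 0.1 * rng.standard_normal(500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
sensor = PLSRegression(n_components=4).fit(X_tr, y_tr)
print("held-out R^2:", sensor.score(X_te, y_te))    # predictive quality check
```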

    Interpretable Ensemble Learning for Materials Property Prediction with Classical Interatomic Potentials: Carbon as an Example

    Machine learning (ML) is widely used to explore crystal materials and predict their properties. However, training is time-consuming for deep-learning models, and the regression process is a black box that is hard to interpret. In addition, the preprocessing step that transforms a crystal structure into an ML input, called a descriptor, needs to be designed carefully. To predict important materials properties efficiently, we propose an approach based on an ensemble of regression trees, predicting the formation energy and elastic constants from small datasets of carbon allotropes as an example. Without using any descriptor, the inputs are the properties calculated by molecular dynamics with 9 different classical interatomic potentials. Overall, the results from ensemble learning are more accurate than those from the classical interatomic potentials, and ensemble learning can extract relatively accurate properties from the 9 classical potentials as criteria for predicting the final properties.
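
    The descriptor-free setup can be sketched as follows: use properties computed by several classical interatomic potentials as the feature matrix for a tree ensemble that predicts a target property. The data below is synthetic, and gradient boosting stands in for whichever regression-tree ensemble the paper uses.

```python
# Descriptor-free property prediction: features are the same property as
# computed by 9 classical potentials; the target is the reference value.
# Synthetic stand-in data; gradient boosting is an illustrative choice.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.standard_normal((60, 9))                     # one column per potential
y = X.mean(axis=1) + 0.05 * rng.standard_normal(60)  # proxy reference property

model = GradientBoostingRegressor(n_estimators=200, max_depth=3)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("cross-validated R^2: %.3f" % scores.mean())
```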

    Symbolic uses of export information: implications for export performance

    As export competition becomes more intense and export success vital for survival (Katsikeas, 1994), so the effective processing and use of information regarding the international environment becomes a critical prerequisite for gaining competitive advantage (Leonidou and Theodosiou, 2004). Symbolic use of information is one type of information use which, although relatively underexplored to date, may be the most prevalent form of information use within organisations – especially in an export setting (Beyer and Trice, 1982). Symbolic use occurs when information is used for purposes other than the ones which led to its collection (Menon and Varadarajan, 1992). Symbolic use of information has been conceptualised as a multi-dimensional construct encompassing various dimensions (Vyas and Souchon, 2003). Examples include “exporters that engage in distorting market research findings, taking conclusions out of context, disclosing only the findings that confirm an executive's predetermined position or consciously ignoring information” (Toften and Olsen, 2004, p. 106). Symbolic use can also legitimate decisions reached on the basis of intuition or managerial assumptions (Vyas and Souchon, 2003). Although conceptual propositions of the potential relationship between each of the symbolic use dimensions and performance exist (Vyas and Souchon, 2003), no empirical research has yet been undertaken. As a result, little is known about how and why symbolic use of export information may affect export performance, and under what circumstances. Furthermore, reliable and valid measures for each of the symbolic use dimensions are absent from the literature. The purpose of this thesis is to fill these research gaps. In so doing, a combination of qualitative and quantitative methods is employed. The exploratory phase takes the form of in-depth interviews with export decision makers in the UK. The data collected in this exploratory phase are analysed through the use of within-case and cross-case displays as per Miles and Huberman (1994) and are used not just for hypothesis development, but also to identify potential outcomes of using information symbolically in specific ways and to create pools of items for the development of measures of symbolic use. (Continues...)

    Predictive Maintenance on the Machining Process and Machine Tool

    This paper presents the process required to implement data-driven Predictive Maintenance (PdM), covering not only the machine decision making but also data acquisition and processing. A short review of the different approaches and techniques in maintenance is given. The main contribution of this paper is a solution to the predictive maintenance problem in a real machining process. The several steps needed to reach the solution are carefully explained. The obtained results show that the Preventive Maintenance (PM) carried out in a real machining process could be changed to a PdM approach. A decision-making application was developed to provide a visual analysis of the Remaining Useful Life (RUL) of the machining tool. This work is a proof of concept of the methodology on one process, but it is replicable for most serial-production machining processes.
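
    As a hedged illustration of the data-driven RUL estimation step, the sketch below regresses remaining useful life on condition-monitoring features with a random forest. The wear proxy, feature construction and degradation model are assumptions for demonstration, not the paper's actual pipeline.

```python
# Toy RUL regression for a machining tool: predict remaining useful life
# from condition-monitoring features. Degradation model is an assumption.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n = 400
wear = np.sort(rng.uniform(0, 1, n))                # tool-wear proxy over time
features = np.c_[wear, wear**2 + 0.05 * rng.standard_normal(n)]
rul = 100.0 * (1.0 - wear)                          # remaining life in cycles

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(features, rul)
print("predicted RUL at 80% wear:", model.predict([[0.8, 0.64]])[0])
```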

    Practical approaches to mining of clinical datasets: from frameworks to novel feature selection

    Research has investigated clinical data that have embedded within them numerous complexities and uncertainties in the form of missing values, class imbalances and high dimensionality. The research in this thesis was motivated by these challenges: to minimise these problems whilst, at the same time, maximising classification performance and selecting a significant subset of variables. This led to the proposal of a data mining framework and a feature selection method. The proposed framework, called the Handling Clinical Data Framework (HCDF), has a simple algorithmic structure and makes use of modified forms of existing frameworks to address a variety of data issues. The assessment of data mining techniques reveals that missing-value imputation and resampling data for class balancing can improve classification performance. Next, the proposed feature selection method, based on projection onto principal components (FS-PPC), is introduced; it draws on ideas from both feature extraction and feature selection to select a significant subset of features from the data. The method selects features that have high correlation with the principal component, measured by symmetrical uncertainty (SU), while irrelevant and redundant features are removed using mutual information (MI). This provides confidence that the selected subset of features will yield realistic results with less time and effort. FS-PPC retains classification performance and meaningful features while avoiding redundant features. The proposed methods have been applied to the analysis of real clinical data and their effectiveness assessed. The results show that the proposed methods are able to minimise the clinical data problems whilst, at the same time, maximising classification performance.
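
    A rough sketch of the FS-PPC idea, under stated assumptions: score each feature by symmetrical uncertainty against the first principal component, keep high-SU features, then greedily drop features that are redundant with already-selected ones. The discretisation and both thresholds below are illustrative choices, not the thesis's exact procedure.

```python
# Sketch of FS-PPC: rank features by symmetrical uncertainty (SU) with the
# first principal component, then greedily remove mutually redundant ones.
# Discretisation and both thresholds are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import mutual_info_score

def su(a, b, bins=10):
    """Symmetrical uncertainty between two discretised variables."""
    a_d = np.digitize(a, np.histogram_bin_edges(a, bins))
    b_d = np.digitize(b, np.histogram_bin_edges(b, bins))
    mi = mutual_info_score(a_d, b_d)
    h_a = mutual_info_score(a_d, a_d)               # entropy as self-MI
    h_b = mutual_info_score(b_d, b_d)
    return 2.0 * mi / (h_a + h_b) if h_a + h_b else 0.0

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 8))
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(200) # plant a redundant feature

pc1 = PCA(n_components=1).fit_transform(X).ravel()
relevant = [j for j in range(X.shape[1]) if su(X[:, j], pc1) > 0.1]

selected = []
for j in relevant:                                  # greedy redundancy filter
    if all(su(X[:, j], X[:, k]) < 0.5 for k in selected):
        selected.append(j)
print("selected features:", selected)
```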

    ANN-MIND: dropout for neural network training with missing data

    M.Sc. (Computer Science)
    Abstract: It is a well-known fact that the quality of a dataset plays a central role in the results and conclusions drawn from its analysis. As the saying goes, "garbage in, garbage out". In recent years, neural networks have displayed good performance in solving a diverse range of problems. Unfortunately, neural networks are not immune to the misfortune presented by missing values. Furthermore, in most real-world settings, the only data available for training neural networks often contains missing values. In such cases, we are left with little choice but to use this data for training, although doing so may result in a poorly trained neural network. Most systems currently in use merely discard the missing observations from the training dataset, while others proceed to use the data and ignore the problems presented by the missing values. Still other approaches impute the missing values with fixed constants such as the mean or mode. Most neural network models work under the assumption that the supplied data contains no missing values. This dissertation explores a method for training neural networks when the training dataset contains missing values.
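
    The premise can be illustrated with a dropout-style treatment of missing inputs: zero the missing entries and rescale the observed ones, as inverted dropout does for dropped units, so training can proceed without imputation. The scaling rule below is an assumption for illustration, not the dissertation's exact method.

```python
# Dropout-style handling of missing inputs: zero NaNs and rescale observed
# entries, mirroring inverted dropout. Scaling rule is an assumption.
import numpy as np

def dropout_style_mask(X):
    """Zero missing entries and rescale the rest, as inverted dropout does."""
    mask = ~np.isnan(X)
    keep_rate = mask.mean(axis=1, keepdims=True)    # fraction observed per row
    X_filled = np.where(mask, X, 0.0)
    return X_filled / np.maximum(keep_rate, 1e-8)   # inverted-dropout rescaling

X = np.array([[1.0, np.nan, 3.0],
              [np.nan, 2.0, np.nan]])
print(dropout_style_mask(X))                        # ready for network input
```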

    Recovering Loss to Followup Information Using Denoising Autoencoders

    Loss to followup is a significant issue in healthcare and has serious consequences for a study's validity and cost. Methods available at present for recovering loss to followup information are restricted by their expressive capabilities and struggle to model highly non-linear relations and complex interactions. In this paper we propose a model based on overcomplete denoising autoencoders to recover loss to followup information. Designed to work with high-volume data, results on various simulated and real-life datasets show our model is appropriate under varying dataset and loss to followup conditions and outperforms state-of-the-art methods by a wide margin (≥ 20% in some scenarios) while preserving the dataset utility for final analysis.
    Comment: Copyright IEEE 2017, IEEE International Conference on Big Data (Big Data)
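
    A minimal sketch of the denoising-autoencoder recovery idea: corrupt complete records during training, learn to reconstruct them, then use the reconstruction to fill fields lost to followup. The architecture and hyperparameters below are illustrative assumptions; the paper's overcomplete model is more elaborate.

```python
# Toy overcomplete denoising autoencoder: train on corrupted complete
# records, then reconstruct fields lost to followup. Architecture and
# hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 10)                            # complete training records

hidden = 32                                         # overcomplete: hidden > inputs
model = nn.Sequential(nn.Linear(10, hidden), nn.ReLU(), nn.Linear(hidden, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(200):
    noisy = X * (torch.rand_like(X) > 0.3)          # zero ~30% of entries
    loss = nn.functional.mse_loss(model(noisy), X)  # reconstruct clean record
    opt.zero_grad()
    loss.backward()
    opt.step()

record = X[0].clone()
record[2] = 0.0                                     # a field lost to followup
print("recovered value:", model(record)[2].item())  # compare with true X[0, 2]
```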