213 research outputs found

Machine Learning and Integrative Analysis of Biomedical Big Data.

Author: Choi Howard
Chung Neo Christopher
Mirza Bilal
Ping Peipei
Wang Jie
Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.

Multidisciplinary Digital Publishing Institute

Ezid

Directory of Open Access Journals

eScholarship - University of California

AUC-Based Extreme Learning Machines for Supervised and Semi-Supervised Imbalanced Classification

Author: Lu J
Wang G
Wong KW
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 05/03/2021

OPUS - University of Technology Sydney

Tournament leave-pair-out cross-validation for receiver operating characteristic analysis

Author: Airola A.
Boström P.
Jambor I.
Montoya Perez I.
Pahikkala T.
Publication venue: SAGE Publications
Publication date: 28/10/2022

Receiver operating characteristic analysis is widely used for evaluating diagnostic systems. Recent studies have shown that estimating an area under receiver operating characteristic curve with standard cross-validation methods suffers from a large bias. The leave-pair-out cross-validation has been shown to correct this bias. However, while leave-pair-out produces an almost unbiased estimate of area under receiver operating characteristic curve, it does not provide a ranking of the data needed for plotting and analyzing the receiver operating characteristic curve. In this study, we propose a new method called tournament leave-pair-out cross-validation. This method extends leave-pair-out by creating a tournament from pair comparisons to produce a ranking for the data. Tournament leave-pair-out preserves the advantage of leave-pair-out for estimating area under receiver operating characteristic curve, while it also allows performing receiver operating characteristic analyses. We have shown using both synthetic and real-world data that tournament leave-pair-out is as reliable as leave-pair-out for area under receiver operating characteristic curve estimation and confirmed the bias in leave-one-out cross-validation on low-dimensional data. As a case study on receiver operating characteristic analysis, we also evaluate how reliably sensitivity and specificity can be estimated from tournament leave-pair-out receiver operating characteristic curves.
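
As a rough illustration of the procedure described in the abstract above, the following Python sketch holds out every pair of examples in turn, trains on the rest, and counts pairwise wins to form a tournament score per example. The mean-difference scorer, the function names, the even splitting of ties, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
from itertools import combinations


def fit_mean_difference(X, y):
    """Toy linear scorer (an assumption): positive-class mean minus negative-class mean."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    return [
        sum(x[d] for x in pos) / len(pos) - sum(x[d] for x in neg) / len(neg)
        for d in range(len(X[0]))
    ]


def predict_score(w, x):
    """Linear score of one example under weight vector w."""
    return sum(wi * xi for wi, xi in zip(w, x))


def tournament_scores(X, y):
    """Hold out each pair, train on the rest, and credit the higher-scored example.

    Needs at least three examples per class so that each class is still
    present in the training set after any pair is removed.
    """
    n = len(X)
    wins = [0.0] * n
    for i, j in combinations(range(n), 2):
        train = [k for k in range(n) if k != i and k != j]
        w = fit_mean_difference([X[k] for k in train], [y[k] for k in train])
        si, sj = predict_score(w, X[i]), predict_score(w, X[j])
        if si > sj:
            wins[i] += 1.0
        elif sj > si:
            wins[j] += 1.0
        else:  # tie: split the point (assumed convention)
            wins[i] += 0.5
            wins[j] += 0.5
    return wins


def auc(scores, y):
    """Fraction of positive-negative pairs ranked correctly (ties count half)."""
    pos = [s for s, label in zip(scores, y) if label == 1]
    neg = [s for s, label in zip(scores, y) if label == 0]
    total = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return total / (len(pos) * len(neg))


# Hypothetical, well-separated toy data: three positives, three negatives.
X = [(2.0, 2.0), (3.0, 2.5), (2.5, 3.0), (0.0, 0.0), (0.5, -0.5), (-0.5, 0.5)]
y = [1, 1, 1, 0, 0, 0]
wins = tournament_scores(X, y)
print("tournament scores:", wins)
print("AUC from tournament ranking:", auc(wins, y))
```

The tournament scores give one number per example, so they can be thresholded to trace a full ROC curve, which a plain pairwise AUC estimate cannot provide.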

UTUPub

Machine learning for acquiring knowledge in astro-particle physics

Author: Bunse Mirko
Publication date: 01/01/2022

This thesis explores the fundamental aspects of machine learning, which are involved with acquiring knowledge in the research field of astro-particle physics. This research field substantially relies on machine learning methods, which reconstruct the properties of astro-particles from the raw data that specialized telescopes record. These methods are typically trained from resource-intensive simulations, which reflect the existing knowledge about the particles—knowledge that physicists strive to expand. We study three fundamental machine learning tasks, which emerge from this goal. First, we address ordinal quantification, the task of estimating the prevalences of ordered classes in sets of unlabeled data. This task emerges from the need for testing the agreement of astro-physical theories with the class prevalences that a telescope observes. To this end, we unify existing methods on quantification, propose an alternative optimization process, and develop regularization techniques to address ordinality in quantification problems, both in and outside of astro-particle physics. These advancements provide more accurate reconstructions of the energy spectra of cosmic gamma ray sources and, hence, support physicists in drawing conclusions from their telescope data. Second, we address learning under class-conditional label noise. More particularly, we focus on a novel setting, in which one of the class-wise noise rates is known and one is not. This setting emerges from a data acquisition protocol, through which astro-particle telescopes simultaneously observe a region of interest and several background regions. We enable learning under this type of label noise with algorithms for consistent, noise-aware decision thresholding. These algorithms yield binary classifiers, which outperform the existing state-of-the-art in gamma hadron classification with the FACT telescope. 
Moreover, unlike the state-of-the-art, our classifiers are entirely trained from the real telescope data and thus do not require any resource-intensive simulation. Third, we address active class selection, the task of actively finding those proportions of classes which optimize the classification performance. In astro-particle physics, this task emerges from the simulation, which produces training data in any desired class proportions. We clarify the implications of this setting from two theoretical perspectives, one of which provides us with bounds on the resulting classification performance. We employ these bounds in a certificate of model robustness, which declares a set of class proportions for which the model is accurate with a high probability. We also employ these bounds in an active strategy for class-conditional data acquisition. Our strategy uniquely considers existing uncertainties about those class proportions that have to be handled during the deployment of the classifier, while being theoretically well-justified.
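
The quantification task named in this abstract can be illustrated with a standard baseline from the quantification literature, adjusted classify-and-count. The sketch below is binary rather than ordinal and is not the method developed in the thesis; the function name and the example rates are hypothetical.

```python
def adjusted_classify_and_count(predicted_positive_rate, tpr, fpr):
    """Estimate the positive-class prevalence in an unlabeled set.

    predicted_positive_rate: fraction of the unlabeled set the classifier labels positive.
    tpr, fpr: true and false positive rates, assumed to be estimated on held-out labeled data.
    """
    if tpr == fpr:
        raise ValueError("tpr and fpr must differ for the correction to be defined")
    estimate = (predicted_positive_rate - fpr) / (tpr - fpr)
    # Clip to the valid probability range, since noisy rates can overshoot.
    return min(1.0, max(0.0, estimate))


# Hypothetical example: a classifier with tpr = 0.8 and fpr = 0.1 applied to
# a set whose true positive prevalence is 0.3.
true_prevalence = 0.3
tpr, fpr = 0.8, 0.1
observed = true_prevalence * tpr + (1 - true_prevalence) * fpr
estimate = adjusted_classify_and_count(observed, tpr, fpr)
print("raw predicted-positive rate:", observed)
print("adjusted prevalence estimate:", estimate)
```

The raw predicted-positive rate is biased whenever the classifier is imperfect; the adjustment inverts that bias using the classifier's error rates.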

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

On the relevance of preprocessing in predictive maintenance for dynamic systems

Author: A Chuang
A Graves
A Savitzky
AJ Smola
AP Bradley
B Schölkopf
BS Yang
BW Silverman
C Cernuda
C Phua
C Wang
Carlos Cernuda
CE Shannon
D Cabrera
D Freedman
D Li
D Lin
D Wolpert
D Wu
DB Rubin
DL Wilson
E Lughofer
F Fleuret
F Serdio
G Brown
G Qiu
G Weiss
GEAPA Batista
GEP Box
H Peng
H Yang
H Zou
HB Mann
HJ Weaver
I Daubechies
I Guyon
I Jolliffe
I Tomek
J Gerretzen
J Ville
JB Tenenbaum
Jorma Laurikkala
K Greff
K Tschumitschew
K Varmuza
KV Branden
L Breiman
L Maaten
L Tan
L Zhang
M Bartlett
M Frigo
M Hubert
M Jung
M Li
MA Oliveira
MR Smith
N Friedman
N Kwak
NE Huang
NV Chawla
O Troyanskaya
P Duhamel
P Mahalanobis
P Welch
PE Hart
R Battiti
R Kohavi
R Nikzad-Langerodi
R Nunkesser
R Tibshirani
RC Sharpley
RD Maesschalck
RM Sakia
RN Bracewell
S García
S Gelper
S Hochreiter
S Kadambe
S Oba
S Roweis
SA Dudani
SE Said
SG Mallat
Sudipto Guha
T Benkedjouh
T Hastie
T Hofmann
T Jo
T Loutas
TY Wu
V Vapnik
W Pedrycz
Y Saeys
Publication date: 01/01/2018

Real-time, data-driven monitoring of dynamic systems for predictive maintenance is usually highly complex. To a greater or lesser degree, any data-driven approach is sensitive to data preprocessing, understood as any data treatment applied before the monitoring model, and preprocessing is sometimes crucial to the outcome of the employed monitoring technique. The aim of this work is to quantify exhaustively the sensitivity of data-driven predictive maintenance models in dynamic systems. We consider a couple of predictive maintenance scenarios, each defined by publicly available data. For each scenario, we consider its properties and apply several techniques for each of the successive preprocessing steps, e.g. data cleaning, missing-value treatment, outlier detection, feature selection, or imbalance compensation. The pretreatment configurations, i.e. sequential combinations of techniques from different preprocessing steps, are considered together with different monitoring approaches in order to determine the relevance of data preprocessing for predictive maintenance in dynamic systems.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

BCAM's Institutional Repository Data

Investigating Social Interactions Using Multi-Modal Nonverbal Features

Author: Carissimi Nicolo'
Publication venue: Università degli studi di Genova
Publication date: 25/02/2019

Every day, humans are involved in social situations and interplays, with the goal of sharing emotions and thoughts, establishing relationships with or acting on other human beings. These interactions are possible thanks to what is called social intelligence, which is the ability to express and recognize social signals produced during the interactions. These signals aid the information exchange and are expressed through verbal and non-verbal behavioral cues, such as facial expressions, gestures, body pose or prosody. Recently, many works have demonstrated that social signals can be captured and analyzed by automatic systems, giving birth to a relatively new research area called social signal processing, which aims at replicating human social intelligence with machines. In this thesis, we explore the use of behavioral cues and computational methods for modeling and understanding social interactions. Concretely, we focus on several behavioral cues in three specific contexts: first, we analyze the relationship between gaze and leadership in small group interactions. Second, we expand our analysis to face and head gestures in the context of deception detection in dyadic interactions. Finally, we analyze the whole body for group detection in mingling scenarios.

Archivio istituzionale della ricerca - Università di Genova

Computational intelligence contributions to readmission risk prediction in Healthcare systems

Author: Artetxe Ballejo Arkaitz
Publication date: 26/10/2017

136 p. The thesis tackles the problem of readmission risk prediction in healthcare systems from a machine learning and computational intelligence point of view. Readmission has been recognized as an indicator of healthcare quality with primary economic importance. We examine two specific instances of the problem, emergency department (ED) admission and heart failure (HF) patient care, using anonymized datasets from three institutions to carry out real-life computational experiments validating the proposed approaches. The main difficulties posed by this kind of dataset are the high class imbalance ratio and the lack of informative value of the recorded variables. This thesis reports the results of innovative class balancing approaches and new classification architectures.

Archivo Digital para la Docencia y la Investigación

BELIEF in Dependence: Leveraging Atomic Linearity in Data Bits for Rethinking Generalized Linear Models

Author: Brown Benjamin
Meng Xiao-Li
Zhang Kai
Publication date: 04/12/2023

Two linearly uncorrelated binary variables must also be independent because non-linear dependence cannot manifest with only two possible states. This inherent linearity is the atom of dependency constituting any complex form of relationship. Inspired by this observation, we develop a framework called binary expansion linear effect (BELIEF) for understanding arbitrary relationships with a binary outcome. Models from the BELIEF framework are easily interpretable because they describe the association of binary variables in the language of linear models, yielding convenient theoretical insight and striking Gaussian parallels. With BELIEF, one may study generalized linear models (GLM) through transparent linear models, providing insight into how the choice of link affects modeling. For example, setting a GLM interaction coefficient to zero does not necessarily lead to the kind of no-interaction model assumption as understood under their linear model counterparts. Furthermore, for a binary response, maximum likelihood estimation for GLMs paradoxically fails under complete separation, when the data are most discriminative, whereas BELIEF estimation automatically reveals the perfect predictor in the data that is responsible for complete separation. We explore these phenomena and provide related theoretical results. We also provide preliminary empirical demonstration of some theoretical results.
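
The opening claim of this abstract can be checked numerically. The sketch below, whose function names and example distributions are illustrative assumptions and not part of the BELIEF framework, computes the covariance of a discrete joint distribution and tests whether it factorizes into its marginals. It enumerates binary joint distributions on a coarse grid and contrasts them with a three-valued variable, where zero correlation does not imply independence.

```python
from fractions import Fraction
from itertools import product


def covariance(joint):
    """Covariance of (X, Y), where joint maps (x, y) to an exact probability."""
    ex = sum(p * x for (x, _), p in joint.items())
    ey = sum(p * y for (_, y), p in joint.items())
    exy = sum(p * x * y for (x, y), p in joint.items())
    return exy - ex * ey


def is_independent(joint):
    """True if every cell equals the product of its two marginals."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return all(joint.get((x, y), 0) == px[x] * py[y] for x, y in product(px, py))


def uncorrelated_binary_joints_all_independent(denominator=12):
    """Check every binary joint on a grid: zero covariance should imply independence."""
    for a, b, c in product(range(denominator + 1), repeat=3):
        d = denominator - a - b - c
        if d < 0:
            continue
        joint = {
            (0, 0): Fraction(a, denominator),
            (0, 1): Fraction(b, denominator),
            (1, 0): Fraction(c, denominator),
            (1, 1): Fraction(d, denominator),
        }
        if covariance(joint) == 0 and not is_independent(joint):
            return False
    return True


# Three-valued counterexample: X uniform on {-1, 0, 1} and Y = X**2.
third = Fraction(1, 3)
ternary = {(-1, 1): third, (0, 0): third, (1, 1): third}

print("binary grid check passed:", uncorrelated_binary_joints_all_independent())
print("ternary covariance:", covariance(ternary))
print("ternary independent:", is_independent(ternary))
```

Exact fractions are used so that the zero-covariance test is not affected by floating-point rounding.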

arXiv.org e-Print Archive