LIPIcs, Volume 251, ITCS 2023, Complete Volume
Design of new algorithms for gene network reconstruction applied to in silico modeling of biomedical data
Doctoral Program in Biotechnology, Engineering and Chemical Technology. Research Line: Engineering, Data Science and Bioinformatics. Program Code: DBI. Line Code: 111. The root causes of disease are still poorly understood. The success of current therapies is limited because persistent diseases are frequently treated on the basis of their symptoms rather than their underlying causes. Biomedical research is therefore experiencing a technology-driven shift toward data-driven, holistic approaches that better characterize the molecular mechanisms causing disease. Using omics data as input, emerging disciplines such as network biology attempt to model the relationships between biomolecules. To this end, gene co-expression networks arise as a promising tool for deciphering the relationships between genes in large transcriptomic datasets. However, because of their low specificity and high false-positive rate, they have shown a limited capacity to retrieve the disrupted mechanisms that lead to disease onset, progression, and maintenance. Within the context of statistical modeling, we dove deeper into the reconstruction of gene co-expression networks with the specific goal of discovering disease-specific features directly from expression data. Using ensemble techniques, which combine the results of several metrics, we captured biologically significant relationships between genes more precisely. With the help of prior biological knowledge and newly developed network inference techniques, we were able to find de novo potential disease-specific features.
Through our different approaches, we analyzed large gene sets across multiple samples and used gene expression as a surrogate marker for the underlying biological processes, reconstructing robust gene co-expression networks that are simple to explore. By mining disease-specific gene co-expression networks, we provide a useful framework for identifying new omics-phenotype associations from conditional expression datasets. In this sense, understanding diseases from the perspective of biological network perturbations will improve personalized medicine, impacting rational biomarker discovery, patient stratification, and drug design, and ultimately leading to more targeted therapies. Universidad Pablo de Olavide de Sevilla. Departamento de Deporte e Informática.
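The ensemble idea described above can be sketched in a few lines. This is a toy illustration only, not the thesis's actual pipeline: the choice of metrics (Pearson and Spearman), the averaging scheme, and the threshold of 0.7 are all assumptions made for the example.

```python
import numpy as np

def ensemble_coexpression(expr, threshold=0.7):
    """Build a gene co-expression network by averaging two correlation
    metrics (Pearson and Spearman) and thresholding the ensemble score.

    expr: (n_samples, n_genes) expression matrix.
    Returns a boolean adjacency matrix of shape (n_genes, n_genes).
    """
    pearson = np.corrcoef(expr, rowvar=False)
    # Spearman = Pearson on ranks (ranking via double argsort; assumes
    # no ties, which is fine for continuous expression values)
    ranks = expr.argsort(axis=0).argsort(axis=0).astype(float)
    spearman = np.corrcoef(ranks, rowvar=False)
    score = (np.abs(pearson) + np.abs(spearman)) / 2.0  # ensemble score
    adj = score >= threshold
    np.fill_diagonal(adj, False)                        # no self-edges
    return adj

rng = np.random.default_rng(0)
n = 200
g0 = rng.normal(size=n)
g1 = g0 + 0.1 * rng.normal(size=n)   # strongly co-expressed with g0
g2 = rng.normal(size=n)              # independent gene
expr = np.column_stack([g0, g1, g2])
adj = ensemble_coexpression(expr)
print(adj[0, 1], adj[0, 2])  # edge g0-g1, no edge g0-g2
```

Combining rank-based and linear correlation is one simple way to trade off sensitivity to monotone relationships against the false positives of any single metric.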
Advances and Applications of DSmT for Information Fusion. Collected Works, Volume 5
This fifth volume on Advances and Applications of DSmT for Information Fusion collects theoretical and applied contributions of researchers working in different fields of application and in mathematics, and is available in open access. The contributions collected in this volume have either been published or presented, after the dissemination of the fourth volume in 2015, in international conferences, seminars, workshops, and journals, or they are new. The contributions of each part of this volume are chronologically ordered.
The first part of this book presents some theoretical advances on DSmT, dealing mainly with modified Proportional Conflict Redistribution (PCR) rules of combination with degree of intersection, coarsening techniques, interval calculus for PCR thanks to set inversion via interval analysis (SIVIA), rough set classifiers, canonical decomposition of dichotomous belief functions, fast PCR fusion, fast inter-criteria analysis with PCR, and improved PCR5 and PCR6 rules preserving the (quasi-)neutrality of (quasi-)vacuous belief assignment in the fusion of sources of evidence, with their Matlab codes.
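As a minimal sketch of the PCR5 rule mentioned above (the frame, focal elements, and masses below are invented for illustration), the rule combines two basic belief assignments conjunctively and redistributes each conflicting product back to the two focal elements involved, proportionally to their masses:

```python
from itertools import product

def pcr5(m1, m2):
    """Combine two basic belief assignments with the PCR5 rule.

    m1, m2: dicts mapping frozenset focal elements to masses (each
    summing to 1). Non-conflicting products go to the intersection;
    each conflicting product m1(X)*m2(Y) with X & Y empty is given
    back to X and Y proportionally to m1(X) and m2(Y).
    """
    out = {}
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        inter = x & y
        if inter:
            out[inter] = out.get(inter, 0.0) + mx * my
        else:
            # proportional conflict redistribution to X and Y
            out[x] = out.get(x, 0.0) + mx * mx * my / (mx + my)
            out[y] = out.get(y, 0.0) + my * my * mx / (mx + my)
    return out

A, B = frozenset("A"), frozenset("B")
m1 = {A: 0.6, frozenset("AB"): 0.4}
m2 = {B: 0.3, frozenset("AB"): 0.7}
m = pcr5(m1, m2)
print(round(sum(m.values()), 10))  # masses still sum to 1.0
```

Unlike Dempster's normalization, no mass is discarded or spread over unrelated focal elements: the conflict stays with the elements that produced it.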
Because more applications of DSmT have emerged in the years since the appearance of the fourth book on DSmT in 2015, the second part of this volume is about selected applications of DSmT, mainly in building change detection, object recognition, quality of data association in tracking, perception in robotics, risk assessment for torrent protection and multi-criteria decision-making, multi-modal image fusion, coarsening techniques, recommender systems, levee characterization and assessment, human heading perception, trust assessment, robotics, biometrics, failure detection, GPS systems, inter-criteria analysis, group decision, human activity recognition, storm prediction, data association for autonomous vehicles, identification of maritime vessels, fusion of support vector machines (SVM), the Silx-Furtif RUST code library for information fusion including PCR rules, and networks for ship classification.
Finally, the third part presents interesting contributions related to belief functions in general, published or presented over the years since 2015. These contributions are related to decision-making under uncertainty, belief approximations, probability transformations, new distances between belief functions, non-classical multi-criteria decision-making problems with belief functions, generalization of Bayes' theorem, image processing, data association, entropy and cross-entropy measures, fuzzy evidence numbers, negators of belief mass, human activity recognition, information fusion for breast cancer therapy, imbalanced data classification, and hybrid techniques mixing deep learning with belief functions.
Resource efficient action recognition in videos
This thesis traces an innovative journey in the domain of real-world action recognition, in particular focusing on memory and data efficient systems. It begins by introducing a novel approach for smart frame selection, which significantly reduces computational costs in video classification. It further optimizes the action recognition process by addressing the challenges of training time and memory consumption in video transformers, laying a strong foundation for memory efficient action recognition.
The thesis then delves into zero-shot learning, focusing on the flaws of the existing protocol and establishing a new split for true zero-shot action recognition, ensuring zero overlap between unseen test classes and training or pre-training classes. Building on this, a unique cluster-based representation, optimized using reinforcement learning, is proposed for zero-shot action recognition. Crucially, we show that joint visual-semantic representation learning is essential for improved performance. We also experiment with feature generation approaches for zero-shot action recognition by introducing a synthetic sample selection methodology, extending the utility of zero-shot learning to both images and videos and selecting high-quality samples for synthetic data augmentation. This form of data valuation is then incorporated into our novel video data augmentation approach, where we generate video composites using foreground and background mixing of videos. The data valuation helps us choose good composites at a reduced overall cost. Finally, we propose the creation of a meaningful semantic space for action labels. We create a textual description dataset for each action class and propose a novel feature-generating approach to maximise the benefits of this semantic space. The research contributes significantly to the field, potentially paving the way for more efficient, resource-friendly, and robust video processing and understanding techniques.
Logistic Regression and Classification with non-Euclidean Covariates
We introduce a logistic regression model for data pairs consisting of a
binary response and a covariate residing in a non-Euclidean metric space
without vector structures. Based on the proposed model we also develop a binary
classifier for non-Euclidean objects. We propose a maximum likelihood estimator
for the non-Euclidean regression coefficient in the model, and provide upper
bounds on the estimation error under various metric entropy conditions that
quantify complexity of the underlying metric space. Matching lower bounds are
derived for the important metric spaces commonly seen in statistics,
establishing optimality of the proposed estimator in such spaces. Similarly, an
upper bound on the excess risk of the developed classifier is provided for
general metric spaces. A finer upper bound and a matching lower bound, and thus
optimality of the proposed classifier, are established for Riemannian
manifolds. We investigate the numerical performance of the proposed estimator
and classifier via simulation studies, and illustrate their practical merits
via an application to task-related fMRI data. Comment: This revision contains the following updates: (1) The parameter space is allowed to be unbounded; (2) Some upper bounds are tightened.
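To make the setting concrete, here is a toy sketch, not the authors' exact model: the covariate lives on the circle S^1 (a metric space without vector structure), a logistic link is applied to geodesic distances from a fixed reference point, and the coefficients are fit by maximum likelihood via plain gradient ascent. The reference point, link, and learning rate are all assumptions made for the example.

```python
import numpy as np

def geodesic(a, b):
    """Geodesic (arc-length) distance between angles on the unit circle."""
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def fit_metric_logistic(theta, y, ref, lr=0.1, steps=2000):
    """MLE for a logistic model whose covariate is a point on S^1:
    P(Y=1 | x) = sigmoid(b0 + b1 * d(x, ref)).
    Plain gradient ascent on the average log-likelihood."""
    d = geodesic(theta, ref)
    X = np.column_stack([np.ones_like(d), d])
    b = np.zeros(2)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ b)))
        b += lr * X.T @ (y - p) / len(y)   # gradient of log-likelihood
    return b

rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, 500)   # covariates: angles on S^1
ref = 0.0
d = geodesic(theta, ref)
p = 1 / (1 + np.exp(-(-2.0 + 2.5 * d)))  # true parameters (-2.0, 2.5)
y = (rng.uniform(size=500) < p).astype(float)
b = fit_metric_logistic(theta, y, ref)
print(b.round(1))  # estimate near the true parameters
```

The point of the sketch is only that the likelihood depends on the covariate solely through distances, which is what lets the model dispense with vector-space structure.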
Responsible AI (RAI) Games and Ensembles
Several recent works have studied the societal effects of AI; these include
issues such as fairness, robustness, and safety. In many of these objectives, a
learner seeks to minimize its worst-case loss over a set of predefined
distributions (known as uncertainty sets), with usual examples being perturbed
versions of the empirical distribution. In other words, the aforementioned problems
can be written as min-max problems over these uncertainty sets. In this work,
we provide a general framework for studying these problems, which we refer to
as Responsible AI (RAI) games. We provide two classes of algorithms for solving
these games: (a) game-play based algorithms, and (b) greedy stagewise
estimation algorithms. The former class is motivated by online learning and
game theory, whereas the latter class is motivated by the classical statistical
literature on boosting, and regression. We empirically demonstrate the
applicability and competitive performance of our techniques for solving several
RAI problems, particularly around subpopulation shift.
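A toy instance of such a min-max problem can illustrate the game-play class of algorithms. Everything below (the two groups, the scalar predictor, the step sizes) is invented for the example and is not the paper's setup: a multiplicative-weights adversary over a two-element uncertainty set plays against a gradient-descent learner.

```python
import numpy as np

# Two "groups"; the uncertainty set is all reweightings of them.
# Min-max goal: a scalar prediction m minimizing the worst group's MSE.
groups = [np.array([0.0, 0.2, -0.1]), np.array([4.0, 4.1, 3.9])]

def group_loss(m, g):
    return np.mean((m - g) ** 2)

m, w = 0.0, np.ones(2) / 2
lr_m, lr_w = 0.05, 0.1
for _ in range(5000):
    losses = np.array([group_loss(m, g) for g in groups])
    w = w * np.exp(lr_w * losses)   # adversary: multiplicative weights
    w /= w.sum()
    grad = sum(wi * 2 * (m - g).mean() for wi, g in zip(w, groups))
    m -= lr_m * grad                # learner: gradient step on weighted loss
print(round(m, 1))  # settles between the two group means

worst = max(group_loss(m, g) for g in groups)
```

At convergence the two group losses are (approximately) equalized, which is the defining feature of a minimax solution under subpopulation shift: neither group can be sacrificed.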
Towards the Fundamental Limits of Knowledge Transfer over Finite Domains
We characterize the statistical efficiency of knowledge transfer through samples from a teacher to a probabilistic student classifier over a finite input space and label set. We show that privileged information at three progressive levels accelerates the transfer. At the first level, only samples with hard labels are known, via which the maximum likelihood estimator attains the minimax rate. The second level additionally makes available the teacher probabilities of the sampled labels, which turns out to boost the convergence-rate lower bound. However, under this second data acquisition protocol, minimizing a naive adaptation of the cross-entropy loss results in an asymptotically biased student. We overcome this limitation and achieve the fundamental limit by using a novel empirical variant of the squared error logit loss. The third level further equips the student with the soft labels (complete logits) for every sampled input, thereby provably enabling the student to enjoy a faster rate still. We find any Kullback-Leibler divergence minimizer to be optimal in the last case. Numerical simulations distinguish the four learners and corroborate our theory. Comment: 41 pages, 2 figures; Appendix polished.
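The gap between the access levels can be illustrated with a toy simulation. The distribution and sample size here are invented, and the paper's level-2 estimator is not reproduced; the sketch only contrasts the extremes: hard labels alone admit a noisy empirical-frequency estimate, while soft labels reveal the teacher's conditional distribution exactly once observed.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])   # teacher's conditional label distribution
n = 1000
samples = rng.choice(3, size=n, p=p)   # hard labels drawn from the teacher

# Level 1: hard labels only -> empirical-frequency MLE, noisy at finite n
mle = np.bincount(samples, minlength=3) / n

# Level 3: soft labels (complete logits) reveal p exactly; a single
# observed soft label already pins down the teacher's distribution
soft = p.copy()

print(np.abs(mle - p).sum() < 0.1, np.abs(soft - p).sum() == 0)
```

The intermediate level (teacher probability of the sampled label only) is where the paper's analysis is subtle: naive cross-entropy minimization is asymptotically biased there, which this sketch does not attempt to reproduce.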
LIPIcs, Volume 274, ESA 2023, Complete Volume
Time-series Generation by Contrastive Imitation
Consider learning a generative model for time-series data. The sequential
setting poses a unique challenge: Not only should the generator capture the
conditional dynamics of (stepwise) transitions, but its open-loop rollouts
should also preserve the joint distribution of (multi-step) trajectories. On
one hand, autoregressive models trained by MLE allow learning and computing
explicit transition distributions, but suffer from compounding error during
rollouts. On the other hand, adversarial models based on GAN training alleviate
such exposure bias, but transitions are implicit and hard to assess. In this
work, we study a generative framework that seeks to combine the strengths of
both: Motivated by a moment-matching objective to mitigate compounding error,
we optimize a local (but forward-looking) transition policy, where the
reinforcement signal is provided by a global (but stepwise-decomposable) energy
model trained by contrastive estimation. At training, the two components are
learned cooperatively, avoiding the instabilities typical of adversarial
objectives. At inference, the learned policy serves as the generator for
iterative sampling, and the learned energy serves as a trajectory-level measure
for evaluating sample quality. By expressly training a policy to imitate
sequential behavior of time-series features in a dataset, this approach
embodies "generation by imitation". Theoretically, we illustrate the
correctness of this formulation and the consistency of the algorithm.
Empirically, we evaluate its ability to generate predictively useful samples
from real-world datasets, verifying that it performs at the standard of
existing benchmarks.
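A highly simplified sketch of the contrastive-estimation component can make the energy model concrete. The data process, features, and proposal distribution below are invented for illustration, and the policy-learning half of the framework is omitted: the energy model is trained purely by logistic discrimination of real transitions against proposal transitions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Real transitions (x, x') from an AR(1) process: x' = 0.9 x + noise
x = rng.normal(size=2000)
real = np.column_stack([x, 0.9 * x + 0.1 * rng.normal(size=2000)])
# Proposal transitions from a poor generator: x' independent of x
fake = np.column_stack([x, rng.normal(size=2000)])

def feats(t):
    """Simple hand-picked features of a transition (x, x')."""
    return np.column_stack([t[:, 0] * t[:, 1], t[:, 1] ** 2, np.ones(len(t))])

# Contrastive estimation: fit a logistic discriminator; its logit serves
# as a (negative) energy, i.e. a score for how realistic a transition is.
X = np.vstack([feats(real), feats(fake)])
y = np.concatenate([np.ones(2000), np.zeros(2000)])
w = np.zeros(3)
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w)))
    w += 0.1 * X.T @ (y - p) / len(y)   # gradient ascent on log-likelihood

score_real = feats(real) @ w   # higher score = lower energy = more realistic
score_fake = feats(fake) @ w
print(score_real.mean() > score_fake.mean())
```

The learned score can then double as a trajectory-level quality measure, echoing the abstract's use of the energy model for evaluating sample quality at inference time.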