Gauge-optimal approximate learning for small data classification problems
Small data learning problems are characterized by a significant discrepancy
between the limited amount of response variable observations and the large
feature space dimension. In this setting, common learning tools struggle to
distinguish the features important for the classification task from those that
bear no relevant information, and cannot derive an appropriate learning rule
that discriminates between different classes. As a potential solution
to this problem, here we exploit the idea of reducing and rotating the feature
space in a lower-dimensional gauge and propose the Gauge-Optimal Approximate
Learning (GOAL) algorithm, which provides an analytically tractable joint
solution to the dimension reduction, feature segmentation and classification
problems in the small data regime. We prove that the optimal solution
of the GOAL algorithm consists of piecewise-linear functions in the Euclidean
space, and that it can be approximated through a monotonically convergent
algorithm which admits -- under the assumption of a discrete segmentation of
the feature space -- a closed-form solution for each optimization substep and
an overall iteration cost that scales linearly. The GOAL algorithm has been compared
to other state-of-the-art machine learning (ML) tools on both synthetic data
and challenging real-world applications from climate science and bioinformatics
(i.e., prediction of the El Niño Southern Oscillation and inference of
epigenetically-induced gene-activity networks from limited experimental data).
The experimental results show that the proposed algorithm outperforms the
reported best competitors for these problems both in learning performance and
computational cost.
Comment: 47 pages, 4 figures
Reconstruction of metabolic networks from high-throughput metabolite profiling data: in silico analysis of red blood cell metabolism
We investigate the ability of algorithms developed for reverse engineering of
transcriptional regulatory networks to reconstruct metabolic networks from
high-throughput metabolite profiling data. For this, we generate synthetic
metabolic profiles for benchmarking purposes based on a well-established model
for red blood cell metabolism. A variety of data sets is generated, accounting
for different properties of real metabolic networks, such as experimental
noise, metabolite correlations, and temporal dynamics. These data sets are made
available online. We apply ARACNE, a mainstream algorithm for reverse
engineering transcriptional networks, to these data sets and observe performance
comparable to that obtained in the transcriptional domain, for which the
algorithm was originally designed.
Comment: 14 pages, 3 figures. Presented at the DIMACS Workshop on Dialogue on
Reverse Engineering Assessment and Methods (DREAM), Sep 200
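The kind of inference ARACNE performs can be illustrated with a minimal sketch: estimate pairwise mutual information, keep pairs above a threshold, and then prune the weakest edge of every fully connected triplet (the data processing inequality step). This is a simplification under stated assumptions, not the reference implementation: the histogram estimator, the function names `mutual_information` and `aracne_like`, and the parameters `bins` and `mi_threshold` are illustrative choices, and no tolerance is applied in the pruning step.

```python
import itertools

import numpy as np


def mutual_information(x, y, bins=8):
    """Histogram estimate of the mutual information (in nats) of two 1-D samples."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nz = pxy > 0  # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def aracne_like(data, mi_threshold=0.05, bins=8):
    """Infer undirected edges (i, j), i < j, from data of shape (n_samples, n_variables).

    Hypothetical helper: thresholds pairwise mutual information, then drops
    the weakest edge in each fully connected triplet.
    """
    n = data.shape[1]
    pairs = list(itertools.combinations(range(n), 2))
    mi = np.zeros((n, n))
    for i, j in pairs:
        mi[i, j] = mi[j, i] = mutual_information(data[:, i], data[:, j], bins)

    edges = {(i, j) for i, j in pairs if mi[i, j] > mi_threshold}

    # Decide all removals against the thresholded graph, so the result
    # does not depend on the order in which triplets are visited.
    to_remove = set()
    for i, j, k in itertools.combinations(range(n), 3):
        trio = [(i, j), (i, k), (j, k)]
        if all(edge in edges for edge in trio):
            to_remove.add(min(trio, key=lambda edge: mi[edge]))
    return edges - to_remove
```

On a simulated chain of three variables in which the first drives the second and the second drives the third, the pruning step is what removes the indirect edge between the first and the third.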
Discovering time-lagged rules from microarray data using gene profile classifiers
Background: Gene regulatory networks play an essential role in every process of life. Genome-wide time series data are becoming increasingly available, providing the opportunity to discover the time-delayed gene regulatory networks that govern the majority of these molecular processes.
Results: This paper aims at reconstructing gene regulatory networks from multiple genome-wide microarray time series datasets. To this end, a new model-free algorithm called GRNCOP2 (Gene Regulatory Network inference by Combinatorial OPtimization 2), a significant evolution of the GRNCOP algorithm, was developed using combinatorial optimization of gene profile classifiers. The method can infer potential time-delay relationships of any time span between genes from various time series datasets given as input. The proposed algorithm was applied to time series data composed of twenty yeast genes that are highly relevant to the study of the cell cycle, and the results were compared against several related approaches. The outcomes show that GRNCOP2 outperforms the compared methods in terms of the proposed metrics, and that the results are consistent with previous biological knowledge. Additionally, a genome-wide study on multiple publicly available time series datasets was performed. This experiment demonstrated the soundness and scalability of the new method, which inferred highly related, statistically significant gene associations.
Conclusions: A novel method for inferring time-delayed gene regulatory networks from genome-wide time series datasets is proposed in this paper. The method was carefully validated with several publicly available data sets. The results demonstrate that the algorithm constitutes a usable model-free approach capable of predicting meaningful relationships between genes, revealing the time trends of gene regulation.
© 2011 Gallo et al; licensee BioMed Central Ltd.
Affiliation: Gallo, Cristian Andrés. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina
Affiliation: Carballido, Jessica Andrea. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina
Affiliation: Ponzoni, Ignacio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina
Parameter estimation for macroscopic pedestrian dynamics models from microscopic data
In this paper we develop a framework for parameter estimation in macroscopic
pedestrian models using individual trajectories -- microscopic data. We
consider a unidirectional flow of pedestrians in a corridor and assume that the
velocity decreases with the average density according to the fundamental
diagram. Our model is formed from a coupling between a density-dependent
stochastic differential equation and a nonlinear partial differential equation
for the density, and is hence of McKean--Vlasov type. We discuss
identifiability of the parameters appearing in the fundamental diagram from
trajectories of individuals, and we introduce optimization and Bayesian methods
to perform the identification. We analyze the performance of the developed
methodologies in various situations, such as for different in- and outflow
conditions, for varying numbers of individual trajectories and for differing
channel geometries.
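To make the estimation problem concrete, the following sketch simulates interacting particles whose velocity decreases with a locally estimated density, and then recovers the fundamental-diagram parameters from the trajectories by least squares. Everything specific here is an assumption made for illustration rather than the paper's setup: the linear diagram f(rho) = v_max (1 - rho / rho_max), the Gaussian kernel density estimate, the unbounded one-dimensional domain without in- or outflow, and the names `fundamental_diagram`, `kernel_density`, `simulate`, and `fit_fundamental_diagram`.

```python
import numpy as np


def fundamental_diagram(rho, v_max, rho_max):
    """Assumed linear diagram: speed falls from v_max at rho = 0 to 0 at rho_max."""
    return v_max * np.maximum(1.0 - rho / rho_max, 0.0)


def kernel_density(x, bandwidth):
    """Gaussian kernel estimate of the normalized density at each particle position."""
    diff = x[:, None] - x[None, :]
    kernel = np.exp(-0.5 * (diff / bandwidth) ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernel.mean(axis=1)


def simulate(n_particles=200, n_steps=100, dt=0.05, v_max=1.5, rho_max=1.0,
             diffusion=0.01, bandwidth=0.3, seed=0):
    """Euler-Maruyama simulation; returns pooled densities and observed velocities."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n_particles)  # initial positions: one cluster
    densities, velocities = [], []
    for _ in range(n_steps):
        rho = kernel_density(x, bandwidth)
        noise = np.sqrt(2.0 * diffusion * dt) * rng.standard_normal(n_particles)
        step = fundamental_diagram(rho, v_max, rho_max) * dt + noise
        densities.append(rho)
        velocities.append(step / dt)  # finite-difference velocity from the trajectory
        x = x + step
    return np.concatenate(densities), np.concatenate(velocities)


def fit_fundamental_diagram(rho, velocity):
    """Least-squares fit of velocity = v_max - (v_max / rho_max) * rho."""
    slope, intercept = np.polyfit(rho, velocity, 1)
    return intercept, -intercept / slope  # (v_max estimate, rho_max estimate)


rho, velocity = simulate()
v_max_hat, rho_max_hat = fit_fundamental_diagram(rho, velocity)
print(f"v_max ~ {v_max_hat:.2f}, rho_max ~ {rho_max_hat:.2f}")
```

The fit only works because the simulated particles experience a range of densities; if every observation were taken at nearly the same density, the slope and hence rho_max would be poorly determined, which is one simple way to see why identifiability needs to be discussed.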