Augmented Sparse Reconstruction of Protein Signaling Networks
The problem of reconstructing and identifying intracellular protein signaling
and biochemical networks is of critical importance in biology today. We sought
to develop a mathematical approach to this problem using, as a test case, one
of the most well-studied and clinically important signaling networks in biology
today, the epidermal growth factor receptor (EGFR) driven signaling cascade.
More specifically, we suggest a method, augmented sparse reconstruction, for
the identification of links among nodes of ordinary differential equation (ODE)
networks from a small set of trajectories with different initial conditions.
Our method builds a system of representation by using a collection of integrals
of all given trajectories and by attenuating blocks of terms in the
representation itself. The system of representation is then augmented with
random vectors, and minimization of the 1-norm is used to find sparse
representations for the dynamical interactions of each node. Augmentation by
random vectors is crucial, since sparsity alone is not able to handle the large
error-in-variables in the representation. Augmented sparse reconstruction
makes it possible to consider potentially very large spaces of models, and it
is able to detect with high accuracy the few relevant links among nodes, even when
moderate noise is added to the measured trajectories. After showing the
performance of our method on a model of the EGFR protein network, we sketch
briefly the potential future therapeutic applications of this approach.
Comment: 24 pages, 6 figures
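The 1-norm minimization at the heart of the reconstruction step can be illustrated on a synthetic linear system. The sketch below is not the authors' augmented scheme (it has no integral representation and no random-vector augmentation); it is a minimal iterative soft-thresholding (ISTA) solver showing how minimizing the 1-norm recovers the few nonzero interaction coefficients from limited, mildly noisy data. All sizes and parameters are illustrative.

```python
import numpy as np

def ista(A, b, lam=0.01, n_iter=2000):
    """Iterative soft-thresholding for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = x - A.T @ (A @ x - b) / L              # gradient step on the quadratic term
        x = np.sign(x) * np.maximum(np.abs(x) - lam / L, 0.0)  # soft-threshold (shrink)
    return x

rng = np.random.default_rng(0)
m, n, k = 60, 100, 4                 # measurements, candidate terms, true links
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x_true[support] = rng.uniform(1.0, 2.0, k) * rng.choice([-1.0, 1.0], k)
b = A @ x_true + 0.001 * rng.standard_normal(m)   # mildly noisy "measurements"

x_hat = ista(A, b)
recovered = set(int(i) for i in np.argsort(np.abs(x_hat))[-k:])
print(sorted(recovered) == sorted(int(i) for i in support))  # support recovered
```

The key point mirrored from the abstract is that the solution is driven to sparsity by the 1-norm penalty, so only a handful of candidate links survive.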
Time series data mining: preprocessing, analysis, segmentation and prediction. Applications
Currently, the amount of data produced by information systems is increasing exponentially. This motivates the development of automatic techniques to process and mine these data correctly. Specifically, in this Thesis, we tackle these problems for time series data, that is, temporal data collected chronologically. This kind of data can be found in many fields of science, such as palaeoclimatology, hydrology, and finance. Time series data mining (TSDM) comprises several tasks with different objectives, such as classification, segmentation, clustering, prediction, and analysis. In this Thesis, we focus on time series preprocessing, segmentation and prediction. Time series preprocessing is a prerequisite for other, posterior tasks: for example, the reconstruction of missing values in incomplete parts of time series can be essential for clustering them. In this Thesis, we tackle the problem of massive missing-data reconstruction in significant wave height (SWH) time series from the Gulf of Alaska. Buoys commonly stop working for certain periods, usually because of malfunctions or bad weather conditions. The relationships among the time series of the different buoys are analysed and exploited to reconstruct the missing time series. In this context, evolutionary artificial neural networks (EANNs) with product units (PUs) are trained, showing that the resulting models are simple and able to recover these values with high precision. In the case of time series segmentation, the procedure consists in dividing the time series into different subsequences to achieve different purposes. This segmentation can be done with the aim of finding useful patterns in the time series. In this Thesis, we have developed novel bio-inspired algorithms in this context. For instance, for palaeoclimate data, an initial genetic algorithm (GA) was proposed to discover early warning signals of tipping points (TPs), whose detection was supported by expert opinions.
However, given that the expert had to evaluate every solution produced by the algorithm individually, the evaluation of the results was very tedious. This led to an improvement of the GA so that the solutions are evaluated automatically. For significant wave height time series, the objective was the detection of groups which contain extreme waves, i.e. waves that are relatively large with respect to other waves close in time. The main motivation is the design of alert systems. This was done using a hybrid algorithm (HA), in which a local search (LS) process was included by means of a likelihood-based segmentation, assuming that the points follow a beta distribution. Finally, the analysis of similarities between different periods of European stock markets was also tackled, with the aim of evaluating the influence of the different markets in Europe. When segmenting time series with the aim of reducing the number of points, different techniques have been proposed. However, this remains an open challenge, given the difficulty of operating with large amounts of data in different applications. In this work, we propose a novel statistically driven coral reefs optimisation algorithm (SCRO), which automatically adapts its parameters during the evolution, taking into account the statistical distribution of the population fitness. This algorithm improves the state of the art in terms of accuracy and robustness. This problem has also been tackled using an improvement of the bare-bones particle swarm optimisation (BBPSO) algorithm, which includes a dynamic update of the cognitive and social components during the evolution, combined with mathematical simplifications to obtain the fitness of the solutions, which significantly reduces the computational cost with respect to previously proposed coral reef methods.
Also, the optimisation of both objectives (clustering quality and approximation quality), which are in conflict, is an interesting open challenge, which is tackled in this Thesis. For that purpose, a multi-objective evolutionary algorithm (MOEA) for time series segmentation is developed, improving both the clustering quality of the solutions and their approximation quality. Prediction in time series is the estimation of future values by observing and studying previous ones. In this context, we solve this task by applying prediction over high-order representations of the elements of the time series, i.e. the segments obtained by time series segmentation. This is applied to two challenging problems: the prediction of extreme wave height and fog prediction. On the one hand, the number of extreme values in SWH time series is small with respect to the number of standard values. Hence, these values cannot be predicted using standard algorithms without taking the imbalance ratio of the dataset into account. For that, an algorithm that automatically finds the set of segments and then applies EANNs is developed, showing a high ability to detect and predict these special events. On the other hand, fog prediction is affected by the same problem, that is, the number of fog events is much lower than that of non-fog events, requiring special treatment too. A preprocessing of data coming from sensors situated in different parts of Valladolid airport is used to build a simple artificial neural network (ANN) model, which is physically corroborated and discussed. The last challenge, which opens new horizons, is the estimation of the statistical distribution of time series to guide different methodologies. For this, the estimation of a mixture distribution for SWH time series is used to set the threshold of peaks-over-threshold (POT) approaches. Also, the determination of the best-fitting distribution for the time series is used to discretise it and to make a prediction which treats the problem as ordinal classification.
The work developed in this Thesis is supported by twelve papers in international journals, seven papers in international conferences, and four papers in national conferences
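As a point of reference for the segmentation-for-point-reduction task described above, the sketch below shows a generic bottom-up piecewise-linear segmentation, not the SCRO, BBPSO, or MOEA algorithms of the Thesis: adjacent segments are merged greedily by the squared error of a fitted line until a target number of segments remains. The data and every parameter are synthetic.

```python
import numpy as np

def seg_cost(y, i, j):
    """SSE of a least-squares line fitted to y[i:j] (j exclusive)."""
    if j - i < 3:
        return 0.0
    t = np.arange(i, j)
    coef = np.polyfit(t, y[i:j], 1)
    return float(np.sum((np.polyval(coef, t) - y[i:j]) ** 2))

def bottom_up(y, n_segments):
    """Greedily merge adjacent segments until n_segments remain."""
    bounds = list(range(0, len(y), 2)) + [len(y)]   # start from tiny segments
    while len(bounds) - 1 > n_segments:
        costs = [seg_cost(y, bounds[k], bounds[k + 2])
                 for k in range(len(bounds) - 2)]
        del bounds[int(np.argmin(costs)) + 1]       # cheapest merge wins
    return bounds

# synthetic series: three linear regimes with breakpoints at t = 1 and t = 2.5
rng = np.random.default_rng(1)
t = np.linspace(0, 4, 200)
y = 10 * np.piecewise(t, [t < 1, (t >= 1) & (t < 2.5), t >= 2.5],
                      [lambda t: t, lambda t: 2 - t, lambda t: t - 3])
y += 0.01 * rng.standard_normal(t.size)

cuts = bottom_up(y, 3)
print(cuts)   # segment boundaries land near indices 50 and 125
```

Merges inside a linear regime cost only the noise variance, while any merge crossing a regime change is much more expensive, so the breakpoints are the last boundaries to survive.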
Studies of the past climate with networks optimized using artificial intelligence
Unpublished doctoral thesis, Universidad Complutense de Madrid, Facultad de Ciencias Físicas, defended on 12-07-2021. The availability of high-quality climate records decreases backwards in time, and the associated increase in uncertainty supports the use of complementary sources of climate information (such as model simulations) to understand the underlying physics of the climate system, as well as its past and future changes. In this Ph.D. thesis we assess the potential of Artificial Intelligence as an additional efficient tool to solve complex problems in the field of climate sciences. We show that these techniques can optimize the information coming from different sets of climate networks, such as meteorological stations, historical records, and paleoclimate archives. Although employed to address a plethora of questions, these networks share issues in terms of incompleteness. Within this framework, we address different problems that are common in the climate community by developing tailored methodologies with the shared goal of maximizing the extraction of information from incomplete climate datasets. The developed approaches include metaheuristic algorithms and cluster analyses, and are applied to incomplete datasets that are typically employed for paleoclimate reconstructions and regional climate assessments, respectively...
The availability of climate data decreases exponentially as we go back in time, and it is often necessary to use complementary sources of information (such as general circulation model simulations) to understand the underlying physics of the climate system, as well as its past and future changes. In this doctoral thesis we assess the potential of Artificial Intelligence as an efficient tool that can be used to solve complex problems in climate science. We show how these techniques can maximize the information coming from different sets of climate networks, such as meteorological stations, historical records, and paleoclimate proxies. All of them share a similar problem: they are incomplete data that provide information for a limited period of time. We have therefore addressed different problems whose common objective is to maximize the extraction of information from incomplete datasets. The developed methods include metaheuristic algorithms and cluster analyses...
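One way cluster analysis can help extract information from incomplete records is to fill gaps with cluster-wise rather than global statistics. The sketch below is a generic illustration on a synthetic two-regime, two-station dataset; it is not a methodology from the thesis, and the k-means initialisation and all numbers are assumptions.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Plain Lloyd k-means with a deterministic farthest-apart initialisation
    (the simple init below only produces two centers, enough for k == 2)."""
    centers = X[np.argsort(X[:, 0])[[0, -1]]]
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(0) for j in range(k)])
    return labels, centers

rng = np.random.default_rng(2)
# toy "two climate regimes" dataset: station A complete, station B has gaps
base = np.concatenate([rng.normal(0, 1, 50), rng.normal(5, 1, 50)])
X = np.column_stack([base, base + rng.normal(0, 0.1, 100)])
truth = X[:, 1].copy()
mask = rng.random(100) < 0.2
X[mask, 1] = np.nan

# 1) start from column-mean filling, 2) cluster, 3) refill from cluster means
X_imp = X.copy()
col_mean = np.nanmean(X[:, 1])
X_imp[mask, 1] = col_mean
labels, centers = kmeans(X_imp, 2)
X_imp[mask, 1] = centers[labels[mask], 1]
```

Because the two regimes sit around different means, the cluster-mean refill lands much closer to the hidden values than the single column mean does.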
Novel Computationally Intelligent Machine Learning Algorithms for Data Mining and Knowledge Discovery
This thesis addresses three major issues in data mining: feature subset selection in high-dimensionality domains, plausible reconstruction of incomplete data in cross-sectional applications, and forecasting of univariate time series. For the automated selection of an optimal subset of features in real time, we present an improved hybrid algorithm: SAGA. SAGA combines the ability of Simulated Annealing to avoid being trapped in local minima with the very high convergence rate of the crossover operator of Genetic Algorithms, the strong local search ability of greedy algorithms, and the high computational efficiency of generalized regression neural networks (GRNNs). For imputing missing values and forecasting univariate time series, we propose a homogeneous neural network ensemble. The proposed ensemble consists of a committee of GRNNs trained on different subsets of features generated by SAGA, and the predictions of the base classifiers are combined by a fusion rule. This approach makes it possible to discover all important interrelations between the values of the target variable and the input features. The proposed ensemble scheme has two innovative features which make it stand out among ensemble learning algorithms: (1) the ensemble makeup is optimized automatically by SAGA; and (2) GRNNs are used both as base classifiers and as the top-level combiner. Because of the GRNN, the proposed ensemble is a dynamic weighting scheme, in contrast to existing ensemble approaches, which rely on simple voting or static weighting strategies. The basic idea of the dynamic weighting procedure is to give a higher reliability weight to those scenarios that are similar to the new ones. The simulation results demonstrate the validity of the proposed ensemble model
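A GRNN itself is compact enough to sketch: it is a Gaussian-kernel weighted average of the training targets (Nadaraya-Watson regression), which is exactly why a GRNN combiner yields the dynamic, similarity-based weighting described above. The example below is a minimal single GRNN on synthetic 1D data with an illustrative bandwidth; it is not the SAGA-optimized ensemble.

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=0.1):
    """GRNN prediction: a Gaussian-kernel weighted average of training targets
    (pattern layer -> summation layer -> division, in GRNN terms)."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))     # pattern-layer activations
    return (w @ y_train) / w.sum(axis=1)     # summation / division layers

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, (200, 1))
y = np.sin(3 * X[:, 0]) + 0.05 * rng.standard_normal(200)

Xq = np.array([[0.0], [0.5]])
pred = grnn_predict(X, y, Xq)
print(pred)   # close to sin(0) = 0 and sin(1.5) ≈ 1.0
```

Training points near the query dominate the weighted average, so each prediction is effectively a local model, a "dynamic weighting" of stored scenarios rather than a fixed vote.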
New Algorithms in Computational Microscopy
Microscopy provides tools to observe objects and their surroundings at resolutions spanning the scale between molecular machinery (ångströms) and individual cells (micrometres). Under a microscope, illumination, such as visible light, other electromagnetic radiation, or an electron beam, interacts with the sample; the scattered signal reaches a detector plane and is recorded. Computational microscopy concerns the reconstruction of images from these measurements, as well as improving image quality. As microscopy evolves, new studies emerge and algorithms need development not only to provide high-resolution imaging but also to support new and advanced research. In this dissertation, we focus on algorithm development for inverse problems in microscopy, specifically phase retrieval and tomography, and on the application of these techniques to machine learning. The four studies in this dissertation demonstrate the use of optimization and the calculus of variations in imaging science and other disciplines. Study 1 focuses on coherent diffractive imaging (CDI), or phase retrieval, a non-linear inverse problem that aims to recover a 2D image from the modulus of its Fourier transform, using the extra information provided by oversampling as a second constraint. To solve this two-constraint minimization, we proceed from the Hamilton-Jacobi partial differential equation (HJ-PDE) and its Hopf-Lax formula. Introducing a generalized Bregman distance to the HJ-PDE and applying the Legendre transform, we derive our generalized proximal smoothing (GPS) algorithm in the form of the primal-dual hybrid gradient (PDHG) method.
While the reflection operator, known as extrapolating momentum, helps overcome local minima, the smoothing by the generalized Bregman distance is adjusted to improve the convergence and consistency of phase retrieval. Study 2 focuses on electron tomography: 3D image reconstruction from a set of 2D projections obtained with a transmission electron microscope (TEM) or an X-ray microscope. Current tomography algorithms are limited to a single tilt axis and fail to work with fully or partially missing data. In light of the calculus of variations and the Fourier slice theorem (FST), we develop a highly accurate iterative tomography algorithm that provides higher-resolution imaging, works with missing data, and can perform multiple-tilt-axis tomography. The algorithm is further developed to work with non-isolated objects and partially blocked projections, which have become increasingly common in experiments. The success of the real-space iterative reconstruction engine (RESIRE) opens a new era in the study of tomography in materials science and magnetic structures (vector tomography). Studies 3 and 4 are applications of our algorithms to machine learning. Study 3 develops a backward Euler method in a stochastic manner to solve k-means clustering, a well-known non-convex optimization problem. The algorithm has been shown to improve the minima found and the consistency of the results, providing a powerful new tool for the class of classification techniques. Study 4 is a direct application of GPS to deep-learning gradient-descent algorithms. Linearizing the Hopf-Lax formula derived in GPS, we obtain our method, Laplacian smoothing gradient descent (LSGD), simply known as gradient smoothing. Our experiments show that LSGD is able to find better and flatter minima, reduce variation, and obtain higher accuracy and consistency
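For context on the phase-retrieval problem of Study 1, the sketch below implements the classic error-reduction iteration (Fienup-style alternating projections), not the GPS algorithm: it alternates between imposing the measured Fourier moduli and imposing the support/positivity constraint on a synthetic, oversampled object. The object, support, and iteration count are all illustrative.

```python
import numpy as np

def error_reduction(mag, support, n_iter=300, seed=0):
    """Classic Fienup error reduction: alternate between the Fourier-magnitude
    constraint and the real-space support/positivity constraint."""
    rng = np.random.default_rng(seed)
    x = rng.random(mag.shape) * support
    errs = []
    for _ in range(n_iter):
        F = np.fft.fft2(x)
        errs.append(np.linalg.norm(np.abs(F) - mag) / np.linalg.norm(mag))
        F = mag * np.exp(1j * np.angle(F))       # impose measured moduli
        x = np.fft.ifft2(F).real
        x = np.where(support & (x > 0), x, 0.0)  # impose support + positivity
    return x, errs

# oversampled synthetic object: a bright patch inside a zero-padded field
obj = np.zeros((64, 64))
obj[16:32, 20:36] = np.random.default_rng(1).random((16, 16))
support = np.zeros((64, 64), dtype=bool)
support[16:32, 20:36] = True
mag = np.abs(np.fft.fft2(obj))                   # "measured" diffraction moduli

x_rec, errs = error_reduction(mag, support)
print(errs[0], errs[-1])                         # Fourier-magnitude error falls
```

The zero padding around the patch plays the role of oversampling: without it, the magnitude constraint alone would not pin down the phases.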
Imputation techniques for the reconstruction of missing interconnected data from higher Educational Institutions
Educational Institutions data constitute the basis for several important analyses of educational systems; however, they often contain non-negligible shares of missing values, for several reasons. In this work we consider the relevant case of the European Tertiary Education Register (ETER), describing the Educational Institutions of Europe. The presence of missing values prevents the full exploitation of this database, since several types of analyses that could be performed are currently impracticable. The imputation of artificial data, reconstructed with the aim of being statistically equivalent to the (unknown) missing data, would make it possible to overcome these problems. A main complication in the imputation of this type of data is the correlations that exist among all the variables. We propose several imputation techniques designed to deal with the different types of missing values appearing in these interconnected data, and we use these techniques to impute the database. Moreover, we evaluate the accuracy of the proposed approach by artificially introducing missing data, imputing them, and comparing imputed and original values. Results show that the information reconstruction does not introduce statistically significant changes in the data and that the imputed values are close enough to the original values
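The evaluation protocol described above (artificially introduce missing values, impute them, compare with the originals) can be sketched generically. The example below is not the ETER imputation techniques themselves: it uses a toy synthetic two-variable table and a simple regression-on-a-correlated-donor imputation to show how the protocol quantifies the benefit of exploiting correlations among variables.

```python
import numpy as np

rng = np.random.default_rng(4)
# toy institution table: staff counts and strongly correlated student counts
staff = rng.uniform(50, 500, 300)
students = 18.0 * staff + rng.normal(0, 200, 300)

# evaluation protocol: artificially hide ~15% of values, impute, compare
mask = rng.random(300) < 0.15
obs = students.copy()
obs[mask] = np.nan

# correlation-aware imputation: regress students on staff over observed rows
a, b = np.polyfit(staff[~mask], obs[~mask], 1)
imputed = a * staff[mask] + b

rmse_reg = np.sqrt(np.mean((imputed - students[mask]) ** 2))
rmse_mean = np.sqrt(np.mean((np.nanmean(obs) - students[mask]) ** 2))
print(rmse_reg, rmse_mean)   # the regression imputation is far more accurate
```

Because the hidden values are known, the two RMSEs directly measure how much the correlated donor variable improves the reconstruction over a naive column-mean fill.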
Cluster analysis and artificial neural networks in predicting energy efficiency of public buildings as a cost-saving approach
Although energy efficiency is a hot topic in the context of global climate change, in European Union directives, and in national energy policies, the methodology for estimating energy efficiency still relies on standard techniques defined by experts in the field. Recent research shows the potential of machine learning methods to produce models that assess energy efficiency based on available historical data. In this paper, we analyse a real dataset of public buildings in Croatia, extract their most important features based on correlation analysis and chi-square tests, cluster the buildings based on three selected features, and create a prediction model of energy efficiency for each cluster of buildings using the artificial neural network (ANN) methodology. The main objective of this research was to investigate whether a clustering procedure improves the accuracy of a neural network prediction model. For that purpose, the symmetric mean absolute percentage error (SMAPE) was used to compare the accuracy of the initial prediction model obtained on the whole dataset and the separate models obtained on each cluster. The results show that the clustering procedure did not increase the prediction accuracy of the models. These preliminary findings can be used to set goals for future research, which can focus on estimating clusters using more features, conducting more extensive variable reduction, and testing more machine learning algorithms to obtain more accurate models, which would enable cost reductions in the public sector
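The SMAPE measure used for the model comparison is straightforward to compute. Several variants exist; the one sketched below, with the denominator averaged, is a common choice, and the paper's exact variant is not stated here, so the example is illustrative.

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent (range 0-200)."""
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return 100.0 * np.mean(np.abs(y_pred - y_true) / denom)

y_true = np.array([100.0, 200.0, 50.0])   # e.g. measured energy use
y_pred = np.array([110.0, 180.0, 50.0])   # model output
print(round(smape(y_true, y_pred), 2))    # 6.68
```

Unlike plain MAPE, this form treats over- and under-prediction symmetrically and stays bounded, which makes it suitable for comparing a whole-dataset model against per-cluster models.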