80 research outputs found

    TRIQ: A Comprehensive Evaluation Measure for Triclustering Algorithms

    Triclustering has proven to be a valuable tool for the analysis of microarray data since its appearance as an improvement on classical clustering and biclustering techniques. Triclustering relaxes the constraints for grouping and allows genes to be evaluated under a subset of experimental conditions and a subset of time points simultaneously. The authors previously presented a genetic algorithm, TriGen, that finds triclusters of gene expression data. They also defined three different fitness functions for TriGen: MSR3D, LSL and MSL. In order to assess the results obtained by applying TriGen, a validity measure needs to be defined. Therefore, we present TRIQ, a validity measure which combines information from three different sources: (1) correlation among genes, conditions and times, (2) graphic validation of the patterns extracted and (3) functional annotations for the genes extracted.
    Funding: Ministerio de Ciencia y Tecnología TIN2011-28956-C02-02; Ministerio de Ciencia y Tecnología TIN2014-55894-C2-1-R; Junta de Andalucía P12-TIC-752

    MSL: A Measure to Evaluate Three-dimensional Patterns in Gene Expression Data

    Microarray technology is widely used in biological research environments due to its ability to monitor RNA concentration levels. The analysis of the data generated represents a computational challenge due to the characteristics of these data. Clustering techniques are widely applied to create groups of genes that exhibit similar behavior. Biclustering relaxes the constraints for grouping, allowing genes to be evaluated only under a subset of the conditions. Triclustering was introduced for the analysis of longitudinal experiments in which the genes are evaluated under certain conditions at several time points. These triclusters provide hidden information in the form of behavior patterns from temporal experiments with microarrays, relating subsets of genes, experimental conditions, and time points. We present an evaluation measure for triclusters called Multi Slope Measure, based on the similarity among the angles of the slopes of each profile formed by the genes, conditions, and times of the tricluster.
    Funding: Ministerio de Ciencia y Tecnología TIN2011-28956-C02-02; Junta de Andalucía TIC-752
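    The idea of comparing slope angles across profiles can be illustrated with a minimal sketch. This is not the published MSL definition, which the abstract does not spell out. It assumes unit spacing between time points and simply averages the absolute angle differences over all pairs of profiles. The names `slope_angles` and `mean_angle_difference` are hypothetical.

```python
import math
from itertools import combinations


def slope_angles(profile):
    """Angle (radians) of each segment between consecutive points.

    Assumes unit spacing on the horizontal axis (an illustrative choice).
    """
    return [math.atan2(b - a, 1.0) for a, b in zip(profile, profile[1:])]


def mean_angle_difference(profiles):
    """Average absolute slope-angle difference over all pairs of profiles.

    Lower values mean the profiles rise and fall more alike.
    """
    angles = [slope_angles(p) for p in profiles]
    diffs = [
        abs(x - y)
        for a, b in combinations(angles, 2)
        for x, y in zip(a, b)
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0


# Two toy sets of profiles: one pair moves in parallel, the other in opposition.
parallel = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
opposed = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
```

    In this sketch the parallel profiles score zero because their slopes match even though their levels differ, which is the kind of shape-based similarity an angle comparison captures.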

    Rationale for Timing of Follow-Up Visits to Assess Gluten-Free Diet in Celiac Disease Patients Based on Data Mining

    Assessing compliance with a gluten-free diet (GFD) is a keystone in the supervision of celiac disease (CD) patients. Few data are available documenting an evidence-based follow-up frequency for CD patients. In this work we aim to create a criterion for the timing of clinical follow-up for CD patients using data mining. We have applied data mining to a dataset of 188 CD patients on GFD (75% of them children below 14 years old), evaluating the presence of gluten immunogenic peptides (GIP) in stools as a marker of adherence to the diet. The variables considered are gender, age, years following GFD and adherence to the GFD by fecal GIP. The results identify patients on GFD for more than two years (41.5% of the patients) as more prone to poor compliance and so needing more frequent follow-up than patients with less than two years on GFD. This is contrary to the usual clinical practice of following up less often with patients on long-term GFD, as they are assumed to perform better. Our results support setting the follow-up frequency according to the number of years on GFD, age and gender. Patients on long-term GFD should be monitored more frequently, as they show a higher level of gluten exposure. A gender perspective should also be considered, as non-compliance is partially linked to gender in our results: males tend to have more gluten exposure, at least in the cultural context where our study was carried out. Children tend to perform better than teenagers or adults.
    Funding: Ministerio de Economía y Competitividad TIN2017-88209-C2-2-R; Junta de Andalucía US-126334

    Modeling Genetic Networks: Comparison of Static and Dynamic Models

    Biomedical research has been revolutionized by high-throughput techniques and the enormous amount of biological data they are able to generate. Interest in network models and systems biology is rising rapidly. Building genetic networks has become an essential task in mining these data, since such networks explain the function of genes in terms of how they influence other genes. Many modeling approaches have been proposed for building genetic networks. However, it is not clear what the advantages and disadvantages of each model are. There are several ways to distinguish network-building models; one of the most important is whether the data being mined are static or dynamic. In this work we compare static and dynamic models on a problem related to inflammation and the host response to injury. We show how both models provide complementary information and cross-validate the obtained results.

    Deep Learning Techniques to Improve the Performance of Olive Oil Classification

    Olive oil assessment involves a standardized sensory analysis according to the “panel test” method. However, there is considerable interest in designing novel strategies based on gas chromatography (GC) coupled to mass spectrometry (MS) or ion mobility spectrometry (IMS), together with chemometric data treatment, for olive oil classification. This is an essential task, both to obtain a model that remains robust over time and to avoid price fraud and determine whether an oil is suitable for consumption. The aim of this paper is to combine chemical techniques and Deep Learning approaches to automatically classify olive oil samples from two different harvests into their three corresponding classes: extra virgin olive oil (EVOO), virgin olive oil (VOO), and lampante olive oil (LOO). Our Deep Learning model is built with 701 samples, which were obtained from two olive oil campaigns (2014–2015 and 2015–2016). The data from the two harvests are built by selecting specific olive oil markers from the whole spectral fingerprint obtained with the GC-IMS method. In order to obtain the best results, we have configured the parameters of our model according to the nature of the data. The results show that a deep learning approach applied to data obtained from chemical instrumental techniques is a good method for classifying oil samples into their corresponding categories, with higher success rates than those obtained in previous works.
    Funding: Ministerio de Economía y Competitividad TIN2017-88209-C2-2-

    Revisiting the Yeast Cell Cycle Problem with the Improved TriGen Algorithm

    Analyzing microarray data represents a computational challenge due to the characteristics of these data. Clustering techniques are widely applied to create groups of genes that exhibit similar behavior under the conditions tested. Biclustering emerged as an improvement on classical clustering, since it relaxes the constraints for grouping, allowing genes to be evaluated only under a subset of the conditions and not under all of them. However, this technique is not appropriate for the analysis of temporal microarray data, in which the genes are evaluated under certain conditions at several time points. In previous work we presented the TriGen algorithm, a genetic algorithm that finds triclusters of gene expression that take into account the experimental conditions and the time points simultaneously, and applied it to the yeast (Saccharomyces cerevisiae) cell cycle problem. In this article we present some improvements to the genetic algorithm, together with the results of applying the improved TriGen algorithm to the yeast cell cycle problem, where the goal is to identify all genes whose expression levels are regulated by the cell cycle.

    Triclustering on Temporary Microarray Data using the TriGen Algorithm

    The analysis of microarray data is a computational challenge due to the characteristics of these data. Clustering techniques are widely applied to create groups of genes that exhibit similar behavior under the conditions tested. Biclustering emerged as an improvement on classical clustering, since it relaxes the constraints for grouping, allowing genes to be evaluated only under a subset of the conditions and not under all of them. However, this technique is not appropriate for the analysis of temporal microarray data, in which the genes are evaluated under certain conditions at several time points. In this paper, we propose the TriGen algorithm, which uses evolutionary computation, in particular genetic algorithms, to find triclusters that take into account both the experimental conditions and the time points, enabling the evaluation of gene behavior under subsets of conditions and of time points.
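    To make the notion of a tricluster concrete, the sketch below treats the expression data as a gene × condition × time cube and a tricluster as a choice of index subsets along each axis. This only illustrates the data structure and assumes nested Python lists. It does not reproduce TriGen's search, and the name `extract_tricluster` is hypothetical.

```python
def extract_tricluster(data, genes, conditions, times):
    """Return the sub-cube of data[gene][condition][time] for the chosen indices."""
    return [
        [[data[g][c][t] for t in times] for c in conditions]
        for g in genes
    ]


# Toy cube: 3 genes x 2 conditions x 3 time points, with values that
# encode their own coordinates (gene*100 + condition*10 + time).
data = [
    [[g * 100 + c * 10 + t for t in range(3)] for c in range(2)]
    for g in range(3)
]

# A candidate tricluster: genes 0 and 2, condition 1, time points 0 and 2.
sub = extract_tricluster(data, genes=[0, 2], conditions=[1], times=[0, 2])
```

    A search procedure such as a genetic algorithm would then score many such candidate index subsets with a fitness function and keep the best ones.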

    Optimization of multi-classifiers for computational biology: application to gene finding and expression

    Genomes of many organisms have been sequenced over the last few years. However, transforming such raw sequence data into knowledge remains a hard task. A great number of prediction programs have been developed to address part of this problem: the location of genes along a genome and their expression. We propose a multi-objective methodology that combines state-of-the-art algorithms into an aggregation scheme in order to obtain optimal aggregations of methods. The results show a major improvement in sensitivity when our methodology is compared to the performance of individual methods for gene finding and gene expression problems. The methodology proposed here is an automatic method generator and a step forward in exploiting all existing methods, providing alternative optimal aggregations of methods to answer concrete queries for a given biological problem with maximized prediction accuracy. As more approaches are integrated for each of the presented problems, de novo accuracy can be expected to improve further.
    Funding: Ministerio de Ciencia y Tecnología TIN2006-12879; Junta de Andalucía TIC-0278

    Determining the best set of seismicity indicators to predict earthquakes. Two case studies: Chile and the Iberian Peninsula

    This work explores the use of different seismicity indicators as inputs for artificial neural networks. It proposes combining, by means of feature selection techniques, multiple indicators that have already been used successfully in different seismic zones. These techniques evaluate every input and propose the best combination of them in terms of information gain. Once these sets have been obtained, artificial neural networks are applied to four zones of Chile (the most seismic country in the world) and to two zones of the Iberian Peninsula (a moderate seismicity area). To make comparison with other models possible, the prediction problem has been turned into one of classification, thus allowing the application of other machine learning classifiers. Comparisons with the original sets of inputs and with different classifiers are reported to support the degree of success achieved. Statistical tests have also been applied to confirm that the results are significantly different from those of other classifiers. The main novelty of this work stems from the use of feature selection techniques to improve earthquake prediction methods. To this end, the information gain of different seismic indicators has been determined. Seismic indicators with a low-ranked or null contribution have been removed, optimizing the method. The optimized prediction method proposed achieves high performance. Finally, four Chilean zones and two zones of the Iberian Peninsula have been characterized by means of an information gain analysis obtained from different seismic indicators. The results support the proposed methodology, as the best features in terms of information gain are the same for both regions.
    Funding: Ministerio de Ciencia y Tecnología BIA2004-01302; Ministerio de Ciencia y Tecnología TIN2011-28956-C02-01; Junta de Andalucía P11-TIC-752
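    Ranking inputs by information gain can be sketched as follows. This is a generic textbook formulation for discrete features, not the authors' pipeline. Continuous seismicity indicators would need to be discretized first, and the indicator names and toy values below are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data: 1 = event occurred, 0 = no event (hypothetical values).
labels = [1, 1, 0, 0]
features = {
    "indicator_a": ["high", "high", "low", "low"],  # separates the classes
    "indicator_b": ["x", "y", "x", "y"],            # carries no information
}

# Rank indicators from most to least informative.
ranking = sorted(
    features,
    key=lambda name: information_gain(features[name], labels),
    reverse=True,
)
```

    Indicators at the bottom of such a ranking, with gain near zero, are the candidates for removal before training a classifier.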