22 research outputs found

    A multi criteria group decision making approach based on fuzzy measure theory to assess the different gene regions used in rodent species

    Many mitochondrial and nuclear gene regions are used in phylogenetic and taxonomic studies to investigate the historical background of species and to establish their hierarchy. In this paper, we treat the choice of a favorable gene region for determining the diversification of rodent species as a multi-criteria group decision-making problem. We use fuzzy measure theory and fuzzy integrals to obtain the results. Across different fuzzy measures and fuzzy integral techniques, we conclude that the COI gene region, which is preferred in animal barcoding studies, is the more favorable one.
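    The abstract does not say which fuzzy integrals were used. As an illustration only, the sketch below computes the discrete Choquet integral, one widely used fuzzy integral, for a single alternative. The criterion names, scores, and measure values are hypothetical and are not taken from the paper.

```python
def choquet_integral(scores, measure):
    """Discrete Choquet integral of `scores` with respect to a fuzzy measure.

    scores:  dict mapping criterion name -> score (e.g. in [0, 1])
    measure: dict mapping frozenset of criteria -> weight, assumed monotone,
             with measure[frozenset()] == 0 and measure[all criteria] == 1
    """
    ordered = sorted(scores, key=scores.get)  # criteria by ascending score
    total, previous = 0.0, 0.0
    for i, criterion in enumerate(ordered):
        # Coalition of all criteria scoring at least as much as this one.
        coalition = frozenset(ordered[i:])
        total += (scores[criterion] - previous) * measure[coalition]
        previous = scores[criterion]
    return total


# Hypothetical two-criterion example (made-up values).
measure = {
    frozenset(): 0.0,
    frozenset({"resolution"}): 0.6,
    frozenset({"cost"}): 0.3,
    frozenset({"resolution", "cost"}): 1.0,
}
scores = {"resolution": 0.8, "cost": 0.4}
value = choquet_integral(scores, measure)
```

    Because the measure is defined on coalitions of criteria rather than on single criteria, it can express interaction between criteria, which a plain weighted mean cannot.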

    An automatic graph layout procedure to visualize correlated data

    This paper introduces an automatic procedure to assist in the interpretation of a large dataset when a similarity metric is available. We propose a visualization approach based on a graph layout methodology that uses a Quadratic Assignment Problem (QAP) formulation. The methodology is presented using as a testbed a time series dataset of the Standard & Poor’s 100, one of the leading stock market indicators in the United States. A weighted graph is created in which the nodes represent the stocks and the edge weights are related to the correlation between the stocks’ time series. A clustering heuristic is then proposed; it is based on partitioning the graph into disconnected subgraphs, which allows clusters of highly correlated stocks to be identified. The final layout corresponds well with the perceived market notion of the different industrial sectors. We compare the output of this procedure with a traditional dendrogram approach to hierarchical clustering. IFIP International Conference on Artificial Intelligence in Theory and Practice - Knowledge Acquisition and Data Mining. Red de Universidades con Carreras en Informática (RedUNCI)
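    The clustering idea described above (a correlation-weighted graph split into disconnected subgraphs) can be sketched as follows. This is a minimal illustration, not the paper's heuristic: it assumes edges are kept only when the Pearson correlation reaches a fixed threshold, and it reads clusters off as connected components. The series names, values, and threshold are hypothetical, and every series is assumed to be non-constant.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_clusters(series, threshold):
    """Group series whose correlation graph (edges >= threshold) is connected."""
    names = list(series)
    neighbours = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(series[a], series[b]) >= threshold:
                neighbours[a].add(b)
                neighbours[b].add(a)

    clusters, seen = [], set()
    for start in names:
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical toy data: A and B rise together, C and D fall together.
series = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8.5],
    "C": [4, 3, 2, 1],
    "D": [8, 6, 4.5, 2],
}
clusters = correlation_clusters(series, threshold=0.9)
```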

    Genome-wide analysis of long noncoding RNA stability

    Transcriptomic analyses have identified tens of thousands of intergenic, intronic, and cis-antisense long noncoding RNAs (lncRNAs) that are expressed from mammalian genomes. Despite progress in functional characterization, little is known about the post-transcriptional regulation of lncRNAs and their half-lives. Although many are easily detectable by a variety of techniques, it has been assumed that lncRNAs are generally unstable, but this has not been examined genome-wide. Utilizing a custom noncoding RNA array, we determined the half-lives of ∼800 lncRNAs and ∼12,000 mRNAs in the mouse Neuro-2a cell line. We find only a minority of lncRNAs are unstable. LncRNA half-lives vary over a wide range, comparable to, although on average less than, that of mRNAs, suggestive of complex metabolism and widespread functionality. Combining half-lives with comprehensive lncRNA annotations identified hundreds of unstable (half-life < 2 h) and stable (half-life > 16 h) lncRNAs. Analysis of lncRNA features revealed that intergenic and cis-antisense RNAs are more stable than those derived from introns, as are spliced lncRNAs compared to unspliced (single exon) transcripts. Subcellular localization of lncRNAs indicated widespread trafficking to different cellular locations, with nuclear-localized lncRNAs more likely to be unstable. Surprisingly, one of the least stable lncRNAs is the well-characterized paraspeckle RNA Neat1, suggesting Neat1 instability contributes to the dynamic nature of this subnuclear domain. We have created an online interactive resource (http://stability.matticklab.com) that allows easy navigation of lncRNA and mRNA stability profiles and provides a comprehensive annotation of ∼7200 mouse lncRNAs.
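    The abstract does not describe how half-lives were estimated. For orientation only, the sketch below shows the textbook relation for first-order (exponential) decay, t½ = t · ln 2 / ln(N₀ / N_t), applied to two hypothetical abundance measurements. The function name and the numbers are illustrative assumptions, not the authors' procedure or data.

```python
from math import log


def half_life(initial, remaining, elapsed_hours):
    """Half-life in hours, assuming simple first-order (exponential) decay.

    initial:        abundance at time 0
    remaining:      abundance after `elapsed_hours` (must be 0 < remaining < initial)
    """
    if not 0 < remaining < initial:
        raise ValueError("expected 0 < remaining < initial")
    return elapsed_hours * log(2) / log(initial / remaining)


# Hypothetical transcript that drops to a quarter of its level in 8 hours.
t_half = half_life(initial=100.0, remaining=25.0, elapsed_hours=8.0)
```

    A quarter remaining corresponds to two half-lives, so the example evaluates to about 4 hours.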

    QAPgrid: A Two Level QAP-Based Approach for Large-Scale Data Analysis and Visualization

    Background: The visualization of large volumes of data is a computationally challenging task that often promises rewarding new insights. There is great potential in the application of new algorithms and models from combinatorial optimisation. Datasets often contain “hidden regularities” and a combined identification and visualization method should reveal these structures and present them in a way that helps analysis. While several methodologies exist, including those that use non-linear optimization algorithms, severe limitations exist even when working with only a few hundred objects. Methodology/Principal Findings: We present a new data visualization approach (QAPgrid) that reveals patterns of similarities and differences in large datasets of objects for which a similarity measure can be computed. Objects are assigned to positions on an underlying square grid in a two-dimensional space. We use the Quadratic Assignment Problem (QAP) as a mathematical model to provide an objective function for assignment of objects to positions on the grid. We employ a Memetic Algorithm (a powerful metaheuristic) to tackle the large instances of this NP-hard combinatorial optimization problem, and we show its performance on the visualization of real data sets. Conclusions/Significance: Overall, the results show that the QAPgrid algorithm is able to produce a layout that represents the relationships between objects in the data set. Furthermore, it also represents the relationships between clusters that are fed into the algorithm. We apply QAPgrid to the 84 Indo-European languages instance, producing a near-optimal layout. Next, we produce a layout of 470 world universities with an observed high degree of correlation with the score used by the Academic Ranking of World Universities compiled by Shanghai Jiao Tong University, without the need for an ad hoc weighting of attributes.
Finally, our Gene Ontology-based study on Saccharomyces cerevisiae fully demonstrates the scalability and precision of our method as a novel alternative tool for functional genomics.
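    To make the QAP objective concrete, the sketch below scores an assignment of objects to grid cells as the sum of similarity times grid distance over all ordered pairs, so that similar objects are pulled together. The Manhattan distance, the exact weighting, and the toy similarity matrix are assumptions made for illustration. Exhaustive search stands in for the Memetic Algorithm named in the abstract and is only feasible for a handful of objects.

```python
from itertools import permutations


def grid_distance(p, q, width):
    """Manhattan distance between cells p and q on a grid with `width` columns."""
    (r1, c1), (r2, c2) = divmod(p, width), divmod(q, width)
    return abs(r1 - r2) + abs(c1 - c2)


def qap_cost(similarity, assignment, width):
    """QAP objective: assignment[i] is the grid cell given to object i."""
    n = len(assignment)
    return sum(
        similarity[i][j] * grid_distance(assignment[i], assignment[j], width)
        for i in range(n)
        for j in range(n)
    )


def best_layout(similarity, width):
    """Brute-force minimiser (illustrative only; n! assignments are tried)."""
    n = len(similarity)
    return min(permutations(range(n)),
               key=lambda a: qap_cost(similarity, a, width))


# Hypothetical similarities: objects 0 and 1 are alike, as are 2 and 3.
similarity = [
    [0, 9, 1, 1],
    [9, 0, 1, 1],
    [1, 1, 0, 9],
    [1, 1, 9, 0],
]
layout = best_layout(similarity, width=2)   # 2 x 2 grid, cells numbered 0..3
cost = qap_cost(similarity, layout, width=2)
```

    In the optimal layout of this toy instance, each highly similar pair occupies adjacent cells rather than diagonal ones.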

    A Transcription Factor Map as Revealed by a Genome-Wide Gene Expression Analysis of Whole-Blood mRNA Transcriptome in Multiple Sclerosis

    Background: Several lines of evidence suggest that transcription factors are involved in the pathogenesis of Multiple Sclerosis (MS) but complete mapping of the whole network has been elusive. One of the reasons is that there are several clinical subtypes of MS and transcription factors that may be involved in one subtype may not be in others. We investigate the possibility that this network could be mapped using microarray technologies and contemporary bioinformatics methods on a dataset derived from whole blood in 99 untreated MS patients (36 Relapse Remitting MS, 43 Primary Progressive MS, and 20 Secondary Progressive MS) and 45 age-matched healthy controls. Methodology/Principal Findings: We have used two different analytical methodologies: a non-standard differential expression analysis and a differential co-expression analysis, which have converged on a significant number of regulatory motifs that are statistically overrepresented in genes that are either differentially expressed (or differentially co-expressed) in cases and controls (e.g., V$KROX_Q6, p-value < 3.31E-6; V$CREBP1_Q2, p-value < 9.93E-6; V$YY1_02, p-value < 1.65E-5). Conclusions/Significance: Our analysis uncovered a network of transcription factors that potentially dysregulate several genes in MS or one or more of its disease subtypes. The most significant transcription factor motifs were for the Early Growth Response EGR/KROX family, ATF2, YY1 (Yin and Yang 1), E2F-1/DP-1 and E2F-4/DP-2 heterodimers, SOX5, and CREB and ATF families. These transcription factors are involved in early T-lymphocyte specification and commitment as well as in oligodendrocyte dedifferentiation and development, both pathways that have significant biological plausibility in MS causation.

    A Bi-Objective Clustering Algorithm for Gene Expression Data

    Clustering algorithms are a common method for data analysis in many scientific fields. They have become popular among biologists because they make it easy to discover similar cellular functions in gene expression data. Most approaches treat gene clustering as an optimization problem in which an ad hoc cluster quality index is optimized; the index can be defined in terms of gene expression data or biological information. However, these approaches may not be sufficient, since they cannot guarantee clusters that have both similar expression patterns and biological coherence. In this paper, we propose a bi-objective clustering algorithm to discover clusters of genes with high levels of co-expression and biological coherence. Our approach uses a multi-objective evolutionary algorithm (MOEA) that optimizes two indices, one based on gene expression level and one on biological functional classes. The algorithm is tested on three real-life gene expression datasets. Results show that the proposed model yields gene clusters with higher levels of co-expression and biological coherence than traditional approaches.
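    A bi-objective method returns a set of trade-off solutions rather than a single best one. The sketch below shows only the Pareto-dominance filter that such a set is built on; it is not the paper's MOEA. The candidate names and the two scores (assumed to be a co-expression index and a biological coherence index, both maximised) are hypothetical.

```python
def dominates(a, b):
    """True if score tuple `a` is at least as good as `b` on every objective
    and strictly better on at least one (all objectives are maximised)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return {
        name: score
        for name, score in candidates.items()
        if not any(dominates(other, score) for other in candidates.values())
    }


# Hypothetical clusterings scored as (co-expression, biological coherence).
candidates = {
    "P1": (0.9, 0.2),
    "P2": (0.6, 0.6),
    "P3": (0.5, 0.5),   # worse than P2 on both objectives
    "P4": (0.2, 0.9),
}
front = pareto_front(candidates)
```

    Here P3 is discarded because P2 beats it on both objectives, while P1, P2, and P4 remain as incomparable trade-offs.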

    A multi-objective gene clustering algorithm guided by apriori biological knowledge with intensification and diversification strategies

    Background: Biologists aim to understand the genetic background of diseases, metabolic disorders or any other genetic condition. Microarrays are one of the main high-throughput technologies for collecting information about the behaviour of genetic information under different conditions. Clustering is one of the main techniques used to analyse this data; it aims at finding groups of genes that have some criterion in common, such as a similar expression profile. However, the problem of finding groups is normally multi-dimensional, making it necessary to approach clustering as a multi-objective problem in which various cluster validity indexes are optimised simultaneously. These indexes are usually based on criteria like compactness and separation, which may not be sufficient since they cannot guarantee the generation of clusters that have both similar expression patterns and biological coherence. Method: We propose a Multi-Objective Clustering algorithm Guided by a-Priori Biological Knowledge (MOC-GaPBK) to find clusters of genes with high levels of co-expression, biological coherence, and also good compactness and separation. Cluster quality indexes are used to simultaneously optimise gene relationships at the level of expression and of biological functionality. Our proposal also includes intensification and diversification strategies to improve the search process. Results: The effectiveness of the proposed algorithm is demonstrated on four publicly available datasets. Comparative studies of the use of different objective functions and other widely used microarray clustering techniques are reported. Statistical, visual and biological significance tests are carried out to show the superiority of the proposed algorithm.
Conclusions: Integrating a-priori biological knowledge into a multi-objective approach and using intensification and diversification strategies allow the proposed algorithm to find solutions of higher quality than other microarray clustering techniques available in the literature in terms of co-expression, biological coherence, compactness and separation.

    An integrated QAP-based approach to visualize patterns of gene expression similarity

    This paper illustrates how the Quadratic Assignment Problem (QAP) is used as a mathematical model that helps to produce a visualization of microarray data, based on the relationships between the objects (genes or samples). The visualization method can also incorporate the result of a clustering algorithm to facilitate the process of data analysis. Specifically, we show the integration with a graph-based clustering algorithm that outperforms other benchmarks, namely k-means and self-organizing maps. Even though the application uses gene expression data, the method is general and only requires a similarity function to be defined between pairs of objects. The microarray dataset is based on the budding yeast (S. cerevisiae). It is composed of 79 samples taken from different experiments and 2,467 genes. The proposed method delivers an automatically generated visualization of the microarray dataset based on the integration of the relationships coming from similarity measures, a clustering result and a graph structure.

    A dynamic evolutionary multi-agent system to predict the 3D structure of proteins

    Protein structure prediction is one of the key problems in Structural Bioinformatics. A protein's function is directly related to its conformation, and its folding can give researchers a better understanding of the protein's roles in the cell. Several computational methods have been proposed over the last decades to tackle the problem. In this paper, we propose an ab initio algorithm with database information for the protein structure prediction problem. We do so by designing several versions of a multi-agent system that use concepts of dynamic distributed evolutionary algorithms to speed up and improve the optimization by better adapting the algorithm to the target protein. The dynamic strategy consists of auto-adapting the number of optimization agents according to the needs and current status of the optimization process. The system is able to scale itself in or out depending on some diversity criteria. The algorithms also take advantage of structural knowledge from the Protein Data Bank to better guide the search and constrain the state space. To validate our computational strategies, we tested them on a set of eight protein sequences. The results obtained were topologically compatible with the corresponding experimental structures, thus corroborating the promising performance of the strategies.