1,709 research outputs found

    Mining microarray datasets aided by knowledge stored in literature

    DNA microarray technology produces large amounts of data. For data mining of these datasets, background information on genes can be helpful. Unfortunately, most of this information is stored in free text. Here, we present an approach to use this information for DNA microarray data mining.

    Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?

    The organization and mining of malaria genomic and post-genomic data is strongly motivated by the need to predict and characterize new biological targets and new drugs. Biological targets are sought in a biological space built from the genomic data of Plasmodium falciparum, but also drawing on the millions of genomic data from other species. Drug candidates are sought in a chemical space containing the millions of small molecules stored in public and private chemolibraries. Data management should therefore be as reliable and versatile as possible. In this context, we examined five aspects of the organization and mining of malaria genomic and post-genomic data: 1) the comparison of protein sequences, including compositionally atypical malaria sequences; 2) the high-throughput reconstruction of molecular phylogenies; 3) the representation of biological processes, particularly metabolic pathways; 4) versatile methods to integrate genomic data, biological representations and functional profiling obtained from X-omic experiments after drug treatments; and 5) the determination and prediction of protein structures and their molecular docking with drug candidate structures. Progress toward a grid-enabled chemogenomic knowledge space is discussed. Comment: 43 pages, 4 figures, to appear in Malaria Journal.

    Discovering gene association networks by multi-objective evolutionary quantitative association rules

    In the last decade, interest in microarray technology has increased exponentially due to its ability to monitor the expression of thousands of genes simultaneously. The reconstruction of gene association networks from gene expression profiles is a relevant task, and several statistical techniques have been proposed to build them. The difficulty lies in discovering which genes are most relevant and in identifying the direct regulatory relationships among them. We developed a multi-objective evolutionary algorithm for mining quantitative association rules to deal with this problem. We applied our methodology, named GarNet, to a well-known yeast cell-cycle microarray dataset. The performance analysis of GarNet was organized in three steps, similarly to the study performed by Gallo et al. GarNet outperformed the benchmark methods in most cases in terms of network quality metrics, such as accuracy and precision, which were measured using the YeastNet database as the true network. Furthermore, the results were consistent with previous biological knowledge. Funding: Ministerio de Ciencia y Tecnología TIN2011-28956-C02-02; Junta de Andalucía P11-TIC-752

    Enhancing the scalability of a genetic algorithm to discover quantitative association rules in large-scale datasets

    Association rule mining is a well-known methodology to discover significant and apparently hidden relations among attributes in a subspace of instances from datasets. Genetic algorithms have been extensively used to find interesting association rules. However, the rule-matching task of such techniques usually demands substantial computation and memory. The use of efficient computational techniques has become a task of the utmost importance due to the high volume of data generated nowadays. Hence, this paper aims at improving the scalability of quantitative association rule mining techniques based on genetic algorithms to handle large-scale datasets without quality loss in the results obtained. For this purpose, a new representation of the individuals, new genetic operators and a windowing-based learning scheme are proposed to accomplish this challenging task. Specifically, the proposed techniques are integrated into the multi-objective evolutionary algorithm named QARGA-M to assess their performance. Both the standard version and the enhanced version of QARGA-M have been tested on several datasets that present different numbers of attributes and instances. Furthermore, the proposed methodologies have been integrated into other existing techniques based on genetic algorithms to discover quantitative association rules. The comparative analysis shows significant improvements for QARGA-M and the other existing genetic algorithms in terms of computational cost, without losing quality in the results, when the proposed techniques are applied. Funding: Ministerio de Ciencia y Tecnología TIN2011-28956-C02-02; Junta de Andalucía TIC-7528; Junta de Andalucía P12-TIC-1728; Universidad Pablo de Olavide APPB81309
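As a rough illustration of the rule-matching step the abstract above calls costly, the sketch below computes the standard support and confidence of one quantitative association rule by scanning every instance. It is not QARGA-M or any of the paper's techniques; the names `covers` and `support_confidence`, the interval encoding, and the toy data are assumptions made for this example only.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]      # closed interval [low, high] (assumed encoding)
Condition = Dict[str, Interval]     # attribute name -> interval it must fall in
Row = Dict[str, float]              # one instance of the dataset


def covers(condition: Condition, row: Row) -> bool:
    """Return True if every attribute in the condition lies inside its interval."""
    return all(low <= row[attr] <= high for attr, (low, high) in condition.items())


def support_confidence(antecedent: Condition, consequent: Condition,
                       rows: List[Row]) -> Tuple[float, float]:
    """Scan all rows once and return (support, confidence) of antecedent => consequent."""
    n_antecedent = 0
    n_both = 0
    for row in rows:
        if covers(antecedent, row):
            n_antecedent += 1
            if covers(consequent, row):
                n_both += 1
    support = n_both / len(rows) if rows else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical toy data: two numeric attributes, four instances.
rows = [
    {"g1": 0.2, "g2": 1.1},
    {"g1": 0.4, "g2": 1.3},
    {"g1": 0.9, "g2": 0.1},
    {"g1": 0.3, "g2": 0.2},
]
support, confidence = support_confidence({"g1": (0.0, 0.5)}, {"g2": (1.0, 1.5)}, rows)
```

Every candidate rule needs a full pass like this over the instances, which is why evaluating many rules per generation becomes expensive on large datasets.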

    Deep Functional Mapping For Predicting Cancer Outcome

    The effective understanding of the biological behavior and prognosis of cancer subtypes is becoming very important in patient administration. Cancer is a diverse disorder in which a significant medical progression and diagnosis for each subtype can be observed and characterized. Computer-aided diagnosis for early detection and diagnosis of many kinds of diseases has evolved in the last decade. In this research, we address challenges associated with multi-organ disease diagnosis and recommend numerous models for enhanced analysis. We concentrate on evaluating Magnetic Resonance Imaging (MRI), Computed Tomography (CT), and Positron Emission Tomography (PET) for brain, lung, and breast scans to detect, segment, and classify types of cancer from biomedical images. Moreover, histopathological and genomic classification of cancer prognosis has been considered for multi-organ disease diagnosis and biomarker recommendation. We considered multi-modal, multi-class classification during this study. We propose implementing deep learning techniques based on Convolutional Neural Networks and Generative Adversarial Networks. In our proposed research we plan to demonstrate ways to increase the performance of disease diagnosis by focusing on a combined diagnosis of histology, image processing, and genomics. It has been observed that the combination of medical imaging and gene expression can handle cancer detection with a higher diagnostic rate than considering each diagnosis individually. This research puts forward a blockchain-based system that facilitates interpretations and enhancements pertaining to automated biomedical systems. In this scheme, secured sharing of the biomedical images and gene expression has been established. To maintain the secured sharing of the biomedical contents in a distributed system or among hospitals, a blockchain-based algorithm is considered that generates a secure sequence to identify a hash key. This adaptive feature enables the algorithm to use multiple data types and to combine various biomedical images and text records. All data related to patients, including identity and pathological records, are encrypted using private key cryptography based on blockchain architecture to maintain data privacy and secure sharing of the biomedical contents.

    DEVELOPMENT OF BIOINFORMATICS TOOLS AND ALGORITHMS FOR IDENTIFYING PATHWAY REGULATORS, INFERRING GENE REGULATORY RELATIONSHIPS AND VISUALIZING GENE EXPRESSION DATA

    In the era of genetics and genomics, the advent of big data is transforming the field of biology into a data-intensive discipline. Novel computational algorithms and software tools are in demand to address the data analysis challenges in this growing field. This dissertation comprises the development of a novel algorithm, web-based data analysis tools, and a data visualization platform. The Triple Gene Mutual Interaction (TGMI) algorithm, presented in Chapter 2, is an innovative approach to identify key regulatory transcription factors (TFs) that govern a particular biological pathway or process through interaction among three genes in a triple gene block, which consists of a pair of pathway genes and a TF. The identification of key TFs controlling a biological pathway or process allows biologists to understand the complex regulatory mechanisms in living organisms. TF-Miner, presented in Chapter 3, is a high-throughput gene expression data analysis web application that was developed by integrating two highly efficient algorithms: TF-Cluster and TF-Finder. TF-Cluster can be used to obtain collaborative TFs that coordinately control a biological pathway or process using genome-wide expression data. TF-Finder, on the other hand, can identify regulatory TFs involved in or associated with a specific biological pathway or process using Adaptive Sparse Canonical Correlation Analysis (ASCCA). Chapter 4 presents ExactSearch, a suffix-tree-based motif search algorithm implemented in a web-based tool. This tool can identify the locations of a set of motif sequences in a set of target promoter sequences. ExactSearch also provides the functionality to search for a set of motif sequences in flanking regions from 50 plant genomes, which we have incorporated into the web tool. Chapter 5 presents STTM JBrowse, a web-based RNA-Seq data visualization system built using the JBrowse open source platform. STTM JBrowse is a unified repository to share and produce visualizations created from large RNA-Seq datasets generated from a variety of model and crop plants in which miRNAs were destroyed using Short Tandem Target Mimic (STTM) technology.
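To make the motif-location task attributed to ExactSearch above concrete, here is a minimal sketch that reports where each motif occurs in each promoter sequence. It deliberately uses plain substring scanning with `str.find` rather than a suffix tree, so it illustrates only the input and output of the task, not the dissertation's algorithm. The function name `find_motif_positions` and the toy sequences are hypothetical.

```python
from typing import Dict, List


def find_motif_positions(motifs: List[str],
                         promoters: Dict[str, str]) -> Dict[str, Dict[str, List[int]]]:
    """Map promoter name -> motif -> list of 0-based start positions (exact matches)."""
    hits: Dict[str, Dict[str, List[int]]] = {}
    for name, sequence in promoters.items():
        per_motif: Dict[str, List[int]] = {}
        for motif in motifs:
            if not motif:
                continue  # skip empty motifs, which would match everywhere
            positions: List[int] = []
            start = sequence.find(motif)
            while start != -1:
                positions.append(start)
                # Advance by one character so overlapping occurrences are reported.
                start = sequence.find(motif, start + 1)
            per_motif[motif] = positions
        hits[name] = per_motif
    return hits


# Hypothetical toy input: two short "promoter" strings and two motifs.
promoters = {"p1": "ACGTACGTAA", "p2": "TTTT"}
hits = find_motif_positions(["ACGT", "TT"], promoters)
```

Scanning each sequence once per motif is simple but scales poorly with many motifs and long sequences. An index such as a suffix tree is built once over the target sequences so that each motif lookup takes time proportional to the motif length plus the number of matches.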

    BioBridge: Bringing Data Exploration to Biologists

    Since the completion of the Human Genome Project in 2003, biologists have become exceptionally good at producing data. Indeed, biological data has experienced a sustained exponential growth rate, putting effective and thorough analysis beyond the reach of many biologists. This thesis presents BioBridge, an interactive visualization tool developed to bring intuitive data exploration to biologists. BioBridge is designed to work on omics-style tabular data in general and thus has broad applicability. This work describes the design and evaluation of BioBridge's Entity View primary visualization as well as the accompanying user interface. The Entity View visualization arranges glyphs representing biological entities (e.g. genes, proteins, metabolites) along with related text mining results to provide biological context. Throughout development the goal has been to maximize accessibility and usability for biologists who are not computationally inclined. Evaluations were done with three informal case studies, one of a metabolome dataset and two of microarray datasets. BioBridge is a proof of concept that there is an underexploited niche in the data analysis ecosystem for tools that prioritize accessibility and usability. The use case studies, while anecdotal, are very encouraging. These studies indicate that BioBridge is well suited for the task of data exploration. With further development, BioBridge could become more flexible and usable as additional use case datasets are explored and more feedback is gathered.

    Saliva Ontology: An ontology-based framework for a Salivaomics Knowledge Base

    Background: The Salivaomics Knowledge Base (SKB) is designed to serve as a computational infrastructure that can permit global exploration and utilization of data and information relevant to salivaomics. SKB is created by aligning (1) the saliva biomarker discovery and validation resources at UCLA with (2) the ontology resources developed by the OBO (Open Biomedical Ontologies) Foundry, including a new Saliva Ontology (SALO). Results: We define the Saliva Ontology (SALO; http://www.skb.ucla.edu/SALO/) as a consensus-based controlled vocabulary of terms and relations dedicated to the salivaomics domain and to saliva-related diagnostics following the principles of the OBO (Open Biomedical Ontologies) Foundry. Conclusions: The Saliva Ontology is an ongoing exploratory initiative. The ontology will be used to facilitate salivaomics data retrieval and integration across multiple fields of research together with data analysis and data mining. The ontology will be tested through its ability to serve the annotation ('tagging') of a representative corpus of salivaomics research literature that is to be incorporated into the SKB.

    Bioinformatics-based assessment of the relevance of candidate genes for mutation discovery

    Bioinformatics resources provide a wide range of tools that can be applied in different areas of mutation screening. The enormous and constantly increasing amount of genomic data obtained in plant-oriented molecular studies requires the development of efficient techniques for its processing. There is a wide range of bioinformatics tools which can aid in the course of mutation discovery. The following chapter focuses mainly on the application of different tools and resources to facilitate a Targeting-Induced Local Lesions in Genomes (TILLING) analysis. TILLING is a reverse genetics technique that applies traditional mutagenesis to create DNA libraries of mutagenised individuals that are then subjected to high-throughput screening for the identification of mutations. Bioinformatics tools have been shown to be useful in supporting the process of candidate gene selection for mutation screening. The availability of bioinformatics software and experimental data repositories provides a powerful tool which enables a process of multi-database mining. The existing raw experimental data (genomics-related information, expression data, annotated ontologies) can be interpreted in terms of a new biological context. This may help in selecting the proper candidate gene for mutation discovery, namely the gene that controls the target phenotype. Mutation screening using a TILLING strategy requires prior knowledge of the full genomic sequence of the gene of interest. Depending on whether a fully sequenced genome of a particular species is available, different bioinformatics tools can facilitate this process. Specific tools can also be useful for the identification of possible gene paralogs which may mask the effect of the mutated gene. Bioinformatics resources can also support the selection of the gene fragments most prone to acquire a deleterious nucleotide change. Finally, tools are available that enable proper design of oligonucleotide primers for the amplification of a gene fragment for the purpose of mutation screening.