    PathExpress update: the enzyme neighbourhood method of associating gene-expression data with metabolic pathways

    The post-genomic era presents us with the challenge of linking the vast amount of raw data obtained with transcriptomic and proteomic techniques to relevant biological pathways. We present an update of PathExpress, a web-based tool to interpret gene-expression data and explore the metabolic network without being restricted to predefined pathways. We define the Enzyme Neighbourhood (EN) as a sub-network of linked enzymes with a limited path length to identify the most relevant sub-networks affected in gene-expression experiments. PathExpress is freely available at: http://bioinfoserver.rsbs.anu.edu.au/utils/PathExpress/
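
    The Enzyme Neighbourhood idea can be illustrated with a small sketch. It assumes that two enzymes are "linked" when they share a compound, that the neighbourhood is every enzyme reachable from a seed enzyme within a maximum path length, and that over-representation of expression-changed enzymes is scored with a hypergeometric tail probability. The function names and the choice of statistic are illustrative assumptions, not PathExpress's documented implementation.

```python
from collections import deque
from math import comb


def enzyme_neighbourhood(links, seed, max_len):
    """Enzymes reachable from `seed` in at most `max_len` links (breadth-first search).

    links: enzyme -> iterable of linked enzymes (assumed: linked via a shared compound).
    """
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        enzyme = queue.popleft()
        if dist[enzyme] == max_len:
            continue  # path-length limit reached; do not expand this enzyme further
        for neighbour in links.get(enzyme, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[enzyme] + 1
                queue.append(neighbour)
    return set(dist)


def hypergeom_sf(k, total, marked, drawn):
    """P(X >= k) for X ~ Hypergeometric(total, marked, drawn)."""
    hits = sum(comb(marked, i) * comb(total - marked, drawn - i)
               for i in range(k, min(marked, drawn) + 1))
    return hits / comb(total, drawn)


def neighbourhood_p_value(links, seed, max_len, changed, all_enzymes):
    """Hypothetical score: over-representation of changed enzymes in one neighbourhood."""
    en = enzyme_neighbourhood(links, seed, max_len)
    k = len(en & changed)
    return hypergeom_sf(k, len(all_enzymes), len(changed & all_enzymes), len(en))
```

    Limiting the path length keeps each neighbourhood small and local, which is what lets the method look beyond predefined pathway boundaries without the sub-networks growing to the size of the whole metabolic network.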

    Visualization and analysis of gene expression in bio-molecular networks

    A cooperative framework for molecular biology database integration using image object selection

    The theme of 'Molecular Biology Database Integration' and the problems associated with it motivated this Ph.D. research. Available technologies make it possible to analyse data independently and discretely, but they fail to integrate the data resources into more meaningful information. This gap, together with the associated integration issues, defined the scope of the research. The research reviews the 'database interoperability' problem and proposes a framework for integrating molecular biology databases. The framework aims to create a cooperative environment in which molecular biology databases share information on the basis of a common purpose. The research also reviews other implementation and interoperability issues for laboratory-based, dedicated and target-specific databases. It addresses the following issues: the diversity of molecular biology database schemas, schema constructs and schema implementations; multidatabase queries using image object keying; database integration technologies using a context graph; and automated navigation among these databases. The thesis introduces a new approach to database implementation: an interoperable component database that initiates multidatabase queries on gene mutation data. Several data models are proposed for gene mutation data, forming the basis for integrating the target-specific component database into the federated information system: data models for genetic trait analysis, for the classification of gene mutation data, for pathological lesion data and for laboratory data. The main feature of this component database is that its attributes do not overlap with those of other sources, so it follows the non-redundant integration approach explained in the thesis.
This is achieved by storing only attributes that share no union or intersection with any attribute already present in public-domain molecular biology databases. The thesis presents this feature as novel and as distinct from data warehousing techniques. The component database is integrated with other biological data sources to share information in a cooperative environment, which requires new tools. The thesis explains the role of these tools: a metadata extractor, a mapping linker, a query generator and a result interpreter. Together they provide transparent integration without creating a global schema of the participating databases. The thesis also establishes the concept of image object keying for multidatabase queries and proposes an algorithm for matching protein spots in gel electrophoresis images. When the user selects a spot in a gel electrophoresis image, a query is initiated and the selected spot is matched with similar spots in other resource databases. This image object keying method is an alternative to conventional multidatabase queries, which require writing complex SQL scripts, and it also resolves the semantic conflicts that exist among molecular biology databases. The research further proposes a framework, based on the context of web data, for interacting with different biological data resources, and gives a formal description of the resource context. Implementing this context in the Resource Description Framework (RDF) can increase interoperability by describing the resources and providing a navigation plan for accessing web-based databases. A higher-level construct (has, provide and access) is developed to implement the context in RDF for web interactions. Interactions among the resources are achieved through an integration domain that extracts the required information in a single instance, without writing any query scripts.
The integration domain makes it possible to navigate the resource databases and execute the query plan within them. An extractor module collects elements from the different target websites and unifies them into a single object on a single page. The proposed framework is tested by retrieving specific information, e.g. on Alzheimer's disease, from public-domain biology resources such as the Protein Data Bank, the Genome Data Bank and Online Mendelian Inheritance in Man, as well as from a local database. Finally, the thesis sets out further proposals and plans for future work.
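
    The abstract does not give the spot-matching algorithm, so the following is only a hypothetical sketch of how a selected gel spot could key a lookup in other resources. It assumes each spot is described by its isoelectric point (pI) and apparent molecular weight, and that a spot matches when both fall within tolerances; the tolerance values, the `Spot` type and `match_spot` are illustrative names, not the thesis's method.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Spot:
    spot_id: str
    pi: float       # isoelectric point
    mw_kda: float   # apparent molecular weight in kDa


def match_spot(selected, reference_spots, pi_tol=0.2, mw_rel_tol=0.1):
    """Return ids of reference spots within both tolerances, closest first.

    Hypothetical criteria: absolute pI difference and relative Mw difference.
    """
    matches = []
    for spot in reference_spots:
        d_pi = abs(spot.pi - selected.pi)
        d_mw = abs(spot.mw_kda - selected.mw_kda) / selected.mw_kda
        if d_pi <= pi_tol and d_mw <= mw_rel_tol:
            # Normalised distance so that both axes contribute comparably.
            matches.append((hypot(d_pi / pi_tol, d_mw / mw_rel_tol), spot.spot_id))
    return [spot_id for _, spot_id in sorted(matches)]
```

    The returned identifiers would then stand in for the hand-written SQL join: each one keys a record in another resource database.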

    Cholera- and Anthrax-Like Toxins Are among Several New ADP-Ribosyltransferases

    Chelt, a cholera-like toxin from Vibrio cholerae, and Certhrax, an anthrax-like toxin from Bacillus cereus, are among six new bacterial protein toxins we identified and characterized using in silico and cell-based techniques. We also uncovered medically relevant toxins from Mycobacterium avium and Enterococcus faecalis, and agriculturally relevant toxins in Photorhabdus luminescens and Vibrio splendidus. These toxins belong to the ADP-ribosyltransferase family, whose members share a conserved structure despite low sequence identity. Our search for new toxins therefore combined fold recognition with rules for filtering sequences (including a primary sequence pattern) to reduce reliance on sequence identity and identify toxins by structure. We built computational models of each new toxin and analyzed features including structure, secretion, cell entry, activation, NAD+ substrate binding, intracellular target binding and the reaction mechanism. We confirmed activity using a yeast growth test. In an era where an expanding protein structure library complements abundant protein sequence data, and where high-throughput validation is needed, our approach provides insight into the newest ADP-ribosyltransferase toxins.
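
    The two-stage search described above (a structural criterion plus a primary sequence pattern) can be sketched as a simple filter. This assumes each candidate already carries a fold-recognition confidence score produced by some external tool; the 0.9 threshold and the regular expression are hypothetical placeholders, not the rules or pattern used in the study.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    sequence: str
    fold_score: float  # hypothetical fold-recognition confidence in [0, 1]


# Placeholder pattern (NOT the authors' pattern): an Arg, then an S-T-S
# stretch, then an E-x-E stretch, separated by variable-length spacers.
MOTIF = re.compile(r"R.{5,100}STS.{5,100}E.E")


def filter_candidates(candidates, min_fold_score=0.9):
    """Keep candidates that pass both the structural rule and the sequence rule."""
    return [c.name for c in candidates
            if c.fold_score >= min_fold_score and MOTIF.search(c.sequence)]
```

    Requiring both conditions is what reduces reliance on sequence identity: a low-identity sequence can still pass if it adopts the expected fold and carries the pattern.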

    KnetMiner - An integrated data platform for gene mining and biological knowledge discovery

    Hassani-Pak K. KnetMiner - An integrated data platform for gene mining and biological knowledge discovery. Bielefeld: Universität Bielefeld; 2017.
    Discovering novel genes that control important phenotypes and diseases is one of the key challenges in the biological sciences. In the post-genomics era, scientists have access to a vast range of genomes, genotypes, phenotypes and ‘omics data which, when used systematically, can help them gain new insights and make faster discoveries. However, the volume and diversity of such un-integrated data is often seen as a burden that only those with specialist bioinformatics skills, but often only minimal specialist biological knowledge, can penetrate. New tools are therefore required to allow researchers to connect, explore and compare large-scale datasets to identify the genes and pathways that control important phenotypes and diseases in plants, animals and humans. KnetMiner, with a silent "K" and standing for Knowledge Network Miner, is a suite of open-source software tools for integrating and visualising large biological datasets. The software mines the myriad databases that describe an organism’s biology to present links between relevant pieces of information, such as genes, biological pathways, phenotypes and publications, with the aim of providing leads for scientists who are investigating the molecular basis of a particular trait. The KnetMiner approach is based on 1) integration of heterogeneous, complex and interconnected biological information into a knowledge graph; 2) text mining to enrich the knowledge graph with novel relations extracted from the literature; 3) graph queries of varying depths to find paths between genes and evidence nodes; 4) an evidence-based gene rank algorithm that combines graph and information theory; and 5) fast search and interactive knowledge visualisation techniques.
Overall, KnetMiner (http://knetminer.rothamsted.ac.uk) is a publicly available resource that helps scientists trawl diverse biological databases for clues to design better crop varieties and understand diseases. The key strength of KnetMiner is that it includes the end user in the "interactive" knowledge discovery process, with the goal of supporting human intelligence with machine intelligence.
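
    Points 3) and 4) above can be illustrated with a toy sketch: a depth-limited breadth-first search finds the evidence nodes reachable from each gene, and genes are ranked by summing an information-content weight, log(number of genes / number of genes reaching that evidence), over the evidence nodes that match a user query. This IDF-style weighting is an assumption chosen to show how graph reachability and information theory can be combined; it is not KnetMiner's published scoring formula, and all names are hypothetical.

```python
from collections import deque
from math import log


def reachable_evidence(graph, gene, max_depth, evidence):
    """Evidence nodes reachable from `gene` within `max_depth` edges (BFS)."""
    seen = {gene}
    frontier = deque([(gene, 0)])
    found = set()
    while frontier:
        node, depth = frontier.popleft()
        if node in evidence and node != gene:
            found.add(node)
        if depth == max_depth:
            continue  # query depth limit reached
        for nb in graph.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return found


def rank_genes(graph, genes, evidence, query_hits, max_depth=2):
    """Rank genes by summed information content of query-matching evidence."""
    reach = {g: reachable_evidence(graph, g, max_depth, evidence) for g in genes}
    freq = {}  # how many genes reach each evidence node
    for ev_set in reach.values():
        for ev in ev_set:
            freq[ev] = freq.get(ev, 0) + 1
    n = len(genes)
    # Rare evidence (reached by few genes) carries more information.
    scores = {g: sum(log(n / freq[ev]) for ev in ev_set & query_hits)
              for g, ev_set in reach.items()}
    ranked = sorted(genes, key=lambda g: (-scores[g], g))
    return ranked, scores
```

    Evidence reached by every gene contributes log(1) = 0, so it cannot distinguish genes, which is the intuition behind combining the graph search with an information-theoretic weight.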

    Integration and analysis of diverse biological data (Mitmekesiste bioloogiliste andmete ühendamine ja analüüs)

    The electronic version of the thesis does not include the publications. Rapid advances in biotechnology and decreasing production costs have led to an explosion of experimental data produced in laboratories around the world. Individual experiments make it possible to study biological processes, e.g. diseases, from different angles. However, a systematic view of a disease requires combining these heterogeneous data. The large amounts of diverse data call for machine learning models that can help, for example, to identify which genes are related to a disease. There is also a need for reliable integrated data sets that researchers can work with effectively. In this thesis we demonstrate how to combine and analyse different types of biological data using the examples of three biological domains: Alzheimer's disease, immunology and toxicology. Alzheimer's disease is an age-related neurodegenerative disease with no effective treatment. We combine data sets related to Alzheimer's disease into a novel heterogeneous network-based data set for Alzheimer's disease (HENA). We then apply graph convolutional networks, a state-of-the-art deep learning method, to a node classification task on HENA to find genes that are potentially associated with the disease. In immunology, we study psoriasis, a chronic immune-mediated inflammatory disease; combining patients' data helps to uncover its pathological mechanisms and, in the future, to find better treatments. We analyse laboratory measurements from patients' skin and blood samples together with clinical information, and then bring together the results of the individual analyses using available domain knowledge to form a more systematic view of the disease pathogenesis. Toxicity testing is the process of identifying the harmful effects of substances on living organisms; one of its applications is the safety assessment of drugs and other chemicals for humans. In this work we identify groups of toxicants that have similar mechanisms of action. Additionally, we develop a classification model for assessing the toxicity of new compounds.
    https://www.ester.ee/record=b523255
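
    To make "graph convolutional networks applied to node classification" concrete, here is a pure-Python sketch of the widely used GCN propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), stacked into two layers with an argmax over the output per node. The architecture, node features, weights and training procedure used on HENA are not described in the abstract, so the fixed weights and function names here are illustrative assumptions only; in practice the weights are learned from labelled genes.

```python
from math import sqrt


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalised_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected adjacency matrix A."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(norm_adj, features, weights, relu=True):
    """Aggregate neighbour features, apply a linear map, then optionally ReLU."""
    out = matmul(matmul(norm_adj, features), weights)
    return [[max(0.0, v) for v in row] for row in out] if relu else out


def classify_nodes(adj, features, w1, w2):
    """Two-layer GCN; returns the index of the highest-scoring class per node."""
    norm = normalised_adjacency(adj)
    hidden = gcn_layer(norm, features, w1)
    logits = gcn_layer(norm, hidden, w2, relu=False)
    return [max(range(len(row)), key=row.__getitem__) for row in logits]
```

    Each layer mixes a node's features with those of its neighbours, so after two layers a gene's prediction depends on its two-hop neighbourhood in the heterogeneous network.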

    Automatically exploiting genomic and metabolic contexts to aid the functional annotation of prokaryote genomes

    This thesis concerns the development of bioinformatic strategies that exploit genomic and metabolic contextual information to generate functional annotations for prokaryote genes. It comprises two main projects. The first focuses on sequence-orphan enzymatic activities: today, roughly 27% of the activities defined by the International Union of Biochemistry and Molecular Biology are sequence orphans. For these, traditional bioinformatic approaches cannot propose candidate genes, so alternative, context-based approaches are required. The CanOE strategy (fishing Candidate genes for Orphan Enzymes) was developed for this purpose and added to the MicroScope bioinformatics platform. It integrates genomic and metabolic information across thousands of prokaryote genomes to locate promising candidate genes for orphan activities. The mirror project focuses on protein families of unknown function. A collaborative project was set up at Genoscope to formalise functional exploration strategies for prokaryote protein families. A pilot version was created on the DUF849 Pfam family, for which a single activity had recently been elucidated. Strategies for proposing novel functions and activities and for defining isofunctional subfamilies were developed, to guide bench experiments and to analyse their results.
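
    The genomic-context idea behind CanOE can be shown with a deliberately simplified sketch. It assumes that two reactions are metabolically adjacent when they share a compound, that candidate genes are unannotated genes lying within a fixed window of a gene annotated with an adjacent reaction, and that support is counted as the number of genomes in which a gene family turns up as a candidate. The window rule, the counting and all names are hypothetical; CanOE's actual scoring is not reproduced here.

```python
def adjacent_reactions(reactions, orphan):
    """Reactions sharing at least one compound with the orphan reaction.

    reactions: reaction id -> set of compound ids.
    """
    orphan_cpds = reactions[orphan]
    return {r for r, cpds in reactions.items() if r != orphan and cpds & orphan_cpds}


def candidate_genes(genome, neighbours, window=2):
    """Unannotated genes within `window` positions of a gene catalysing a neighbour reaction.

    genome: genes in chromosomal order as (gene_id, reaction_or_None) pairs.
    """
    anchors = [i for i, (_, r) in enumerate(genome) if r in neighbours]
    return [gene for i, (gene, r) in enumerate(genome)
            if r is None and any(abs(i - a) <= window for a in anchors)]


def support_across_genomes(genomes, neighbours, family_of, window=2):
    """Count in how many genomes each gene family appears as a candidate."""
    counts = {}
    for genome in genomes:
        families = {family_of[g] for g in candidate_genes(genome, neighbours, window)}
        for fam in families:
            counts[fam] = counts.get(fam, 0) + 1
    return counts
```

    Counting recurrence across many genomes is what turns a weak local signal (a gene that happens to sit near a related enzyme) into a stronger candidate, since conserved gene neighbourhoods are less likely to arise by chance.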
