1,337 research outputs found

PathExpress update: the enzyme neighbourhood method of associating gene-expression data with metabolic pathways

Author: Antonov
Ashburner
Baitaluk
Breitling
Caspi
Cho
Chung
Cline
Cui
Edgar
G. Weiller
Goto
Goto
Green
Hanisch
Holmes
Ideker
Kanehisa
Konig
Liu
Mlecnik
N. Goffard
Pan
Rapaport
Sackmann
Salomonis
Schuster
Schwartz
T. Frickey
Thimm
Wu
Publication venue: Oxford University Press
Publication date: 24/02/2016
Field of study

The post-genomic era presents us with the challenge of linking the vast amount of raw data obtained with transcriptomic and proteomic techniques to relevant biological pathways. We present an update of PathExpress, a web-based tool to interpret gene-expression data and explore the metabolic network without being restricted to predefined pathways. We define the Enzyme Neighbourhood (EN) as a sub-network of linked enzymes with a limited path length to identify the most relevant sub-networks affected in gene-expression experiments. PathExpress is freely available at: http://bioinfoserver.rsbs.anu.edu.au/utils/PathExpress/
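The Enzyme Neighbourhood idea described above (a sub-network of linked enzymes within a limited path length) can be illustrated with a bounded breadth-first search. This is only a minimal sketch under stated assumptions, not the PathExpress implementation: the function name `enzyme_neighbourhood`, the adjacency-dictionary representation and the toy network are all hypothetical.

```python
from collections import deque


def enzyme_neighbourhood(graph, seed, max_path_length):
    """Return {enzyme: distance} for enzymes within max_path_length links of seed.

    `graph` is a hypothetical adjacency dictionary mapping an enzyme label
    to the enzymes it is linked to; it is an assumption of this sketch.
    """
    distances = {seed: 0}
    queue = deque([seed])
    while queue:
        enzyme = queue.popleft()
        # Do not expand beyond the allowed path length.
        if distances[enzyme] == max_path_length:
            continue
        for neighbour in graph.get(enzyme, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[enzyme] + 1
                queue.append(neighbour)
    return distances


# Toy linear network of four enzymes (illustrative labels only).
toy_network = {
    "E1": ["E2"],
    "E2": ["E1", "E3"],
    "E3": ["E2", "E4"],
    "E4": ["E3"],
}
neighbourhood = enzyme_neighbourhood(toy_network, "E1", 2)
print(neighbourhood)
```

With a path length of 2, the enzyme three links away from the seed is left out of the neighbourhood. How the tool then scores such sub-networks against expression data is not stated in the abstract and is not modelled here.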

Crossref

PubMed Central

The Australian National University

Visualization and analysis of gene expression in bio-molecular networks

Author: Fung David Cho Yau.
Publication venue: 'The University of Sydney Library'
Publication date: 01/01/2010
Field of study

Sydney eScholarship

A cooperative framework for molecular biology database integration using image object selection.

Author: Khan N.
Khan N.
Publication venue
Publication date: 01/01/2004
Field of study

The theme and concept of 'Molecular Biology Database Integration', and the problems associated with it, initiated the idea for this Ph.D. research. The available technologies make it possible to analyse the data independently and discretely, but they fail to integrate the data resources to yield more meaningful information. This, along with the integration issues, created the scope for this research. The research has reviewed the 'database interoperability' problems and has suggested a framework for integrating molecular biology databases. The framework proposes a cooperative environment in which molecular biology databases share information on the basis of a common purpose. The research has also reviewed other implementation and interoperability issues for laboratory-based, dedicated and target-specific databases. The research has addressed the following issues:
- diversity of molecular biology database schemas, schema constructs and schema implementation
- multi-database query using image object keying
- database integration technologies using context graph
- automated navigation among these databases

This thesis has introduced a new approach to database implementation. It has introduced an interoperable component database concept to initiate multidatabase queries on gene mutation data. A number of data models have been proposed for gene mutation data, which form the basis for integrating the target-specific component database with the federated information system. The proposed data models are: data models for genetic trait analysis, classification of gene mutation data, pathological lesion data and laboratory data. The main feature of this component database is its non-overlapping attributes, and it follows a non-redundant integration approach as explained in the thesis. This is achieved by storing only attributes that do not overlap with any attributes that exist in public-domain molecular biology databases. Unlike the data warehousing technique, this feature is novel. The component database will be integrated with other biological data sources for sharing information in a cooperative environment. This involves developing new tools. The thesis explains the role of these new tools, which are: meta data extractor, mapping linker, query generator and result interpreter. These tools are used for transparent integration without creating any global schema of the participating databases.

The thesis has also established the concept of image object keying for multidatabase queries and has proposed a relevant algorithm for matching protein spots in gel electrophoresis images. An object spot in a gel electrophoresis image initiates the query when it is selected by the user. The selected spot is matched with similar spots in other resource databases. This image object keying method is an alternative to the conventional multidatabase query, which requires writing complex SQL scripts. The method also resolves the semantic conflicts that exist among molecular biology databases.

The research has proposed a new framework, based on the context of the web data, for interactions with different biological data resources. A formal description of the resource context is given in the thesis. Implementing the context in the Resource Description Framework (RDF) can increase interoperability by providing a description of the resources and a navigation plan for accessing the web-based databases. A higher-level construct (has, provide and access) is developed to implement the context in RDF for web interactions. The interactions within the resources are achieved by using an integration domain to extract the required information with a single instance and without writing any query scripts. The integration domain makes it possible to navigate and to execute the query plan within the resource databases. An extractor module collects elements from different target webs and unifies them as a whole object in a single page. The proposed framework is tested by finding specific information, e.g. information on Alzheimer's disease, from public-domain biology resources such as Protein Data Bank, Genome Data Bank, Online Mendelian Inheritance in Man and a local database. Finally, the thesis offers further propositions and plans for future work.

Middlesex University Research Repository

Cholera- and Anthrax-Like Toxins Are among Several New ADP-Ribosyltransferases

Author: A Dereeper
A Gattiker
A Nayeem
A Pautsch
A Sundriyal
A Via
A. Rod Merrill
AC Wallace
AE Lang
AG Murzin
AR Hoffmaster
AR Hoffmaster
Arne Elofsson
AS Olia
B Wallner
BK Beattie
C Dominguez
C Johnson
C Yeats
CA Orengo
CE Bell
CJ O'Neal
CM Fraser
CR Horsburgh Jr
D Bellocchi
D Corda
D Fischer
D Maglott
Dawn White
DD Visschedyk
DL Burns
E Domann
F Armougom
F Koch-Nolte
G Glowacki
G Schiavo
G Suarez
GC Zhou
GE Crooks
H Barth
H Barth
H Hegyi
H Kleine
H Otto
H Ritter
H Rohrer
H Tsuge
H Tsuge
HL Liu
HR Evans
I Iacovache
I Pastan
I Uchida
IT Paulsen
IW Davis
J Audi
J Cheng
J Curak
J Gough
J Menetrey
J Moss
J Sandros
J Sha
J Soding
J Sun
J Sun
JA Young
JC Sierra
JD Thompson
JF Bazan
JL Pons
JP Huelsenbeck
JR Banavar
K Fisher
K Ginalski
K Liolios
KA Siggers
KM Krueger
KP Holbourn
L de Mendonca-Lima
L Van Melderen
LH Coye
M Domenighini
M Domenighini
M Nagahama
M Pawlowski
M Tress
M Zeng
MA Kurowski
MJ Pallen
ML Lesnick
N Eswar
N Waterfield
P Fariselli
P Gouet
PA DiGiuseppe Champion
Q Deng
R Jorgensen
R Jorgensen
RA Laskowski
RA Laskowski
RB Russell
RD Finn
RH Valdivia
RI Sadreyev
RJ Fieldhouse
RL Dunbrack Jr
Robert J. Fieldhouse
S Han
S Han
S Wu
SA Teichmann
SB Avashia
SM Margarit
SO Marx
T Jelinek
TR Kannan
V Briken
V Masignani
V Masignani
XF Wan
Y Chen
Y Qi
Y Zhang
Z Turgeon
Zachari Turgeon
ZQ Fu
Publication venue: Public Library of Science
Publication date
Field of study

Chelt, a cholera-like toxin from Vibrio cholerae, and Certhrax, an anthrax-like toxin from Bacillus cereus, are among six new bacterial protein toxins we identified and characterized using in silico and cell-based techniques. We also uncovered medically relevant toxins from Mycobacterium avium and Enterococcus faecalis. We found agriculturally relevant toxins in Photorhabdus luminescens and Vibrio splendidus. These toxins belong to the ADP-ribosyltransferase family that has conserved structure despite low sequence identity. Therefore, our search for new toxins combined fold recognition with rules for filtering sequences – including a primary sequence pattern – to reduce reliance on sequence identity and identify toxins using structure. We used computers to build models and analyzed each new toxin to understand features including: structure, secretion, cell entry, activation, NAD+ substrate binding, intracellular target binding and the reaction mechanism. We confirmed activity using a yeast growth test. In this era where an expanding protein structure library complements abundant protein sequence data – and we need high-throughput validation – our approach provides insight into the newest toxin ADP-ribosyltransferases

Crossref

Directory of Open Access Journals

PubMed Central

KnetMiner - An integrated data platform for gene mining and biological knowledge discovery

Author: Hassani-Pak Keywan
Publication venue: Universität Bielefeld
Publication date: 01/01/2017
Field of study

Hassani-Pak K. KnetMiner - An integrated data platform for gene mining and biological knowledge discovery. Bielefeld: Universität Bielefeld; 2017.

Discovery of novel genes that control important phenotypes and diseases is one of the key challenges in biological sciences. Now, in the post-genomics era, scientists have access to a vast range of genomes, genotypes, phenotypes and ‘omics data which, when used systematically, can help to gain new insights and make faster discoveries. However, the volume and diversity of such un-integrated data is often seen as a burden that only those with specialist bioinformatics skills, but often only minimal specialist biological knowledge, can penetrate. Therefore, new tools are required to allow researchers to connect, explore and compare large-scale datasets to identify the genes and pathways that control important phenotypes and diseases in plants, animals and humans. KnetMiner, with a silent "K" and standing for Knowledge Network Miner, is a suite of open-source software tools for integrating and visualising large biological datasets. The software mines the myriad databases that describe an organism’s biology to present links between relevant pieces of information, such as genes, biological pathways, phenotypes and publications, with the aim of providing leads for scientists who are investigating the molecular basis for a particular trait. The KnetMiner approach is based on 1) integration of heterogeneous, complex and interconnected biological information into a knowledge graph; 2) text-mining to enrich the knowledge graph with novel relations extracted from literature; 3) graph queries of varying depths to find paths between genes and evidence nodes; 4) an evidence-based gene rank algorithm that combines graph and information theory; 5) fast search and interactive knowledge visualisation techniques. Overall, [KnetMiner](http://knetminer.rothamsted.ac.uk) is a publicly available resource that helps scientists trawl diverse biological databases for clues to design better crop varieties and understand diseases. The key strength of KnetMiner is that it includes the end user in the “interactive” knowledge discovery process, with the goal of supporting human intelligence with machine intelligence.
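The third step listed above, graph queries of varying depths to find paths between genes and evidence nodes, can be pictured as a depth-limited path search. The following is a hedged sketch only: the function `paths_within_depth`, the dictionary-based graph and the node names are hypothetical and do not reflect KnetMiner's actual data model or query language.

```python
def paths_within_depth(graph, start, targets, max_depth):
    """List simple paths from start to any node in targets using at most max_depth edges.

    `graph` is a hypothetical adjacency dictionary; `targets` is a set of
    evidence-node labels. Both are assumptions made for this illustration.
    """
    found = []

    def walk(node, path):
        # Record the path whenever an evidence node is reached.
        if node in targets and len(path) > 1:
            found.append(list(path))
        # Stop expanding once the depth limit is reached.
        if len(path) - 1 == max_depth:
            return
        for nxt in graph.get(node, ()):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                path.append(nxt)
                walk(nxt, path)
                path.pop()

    walk(start, [start])
    return found


# Toy knowledge graph: a gene linked to a protein, a pathway and a publication.
toy_graph = {
    "geneA": ["proteinA", "pub1"],
    "proteinA": ["pathwayX"],
}
evidence = {"pathwayX", "pub1"}

shallow = paths_within_depth(toy_graph, "geneA", evidence, 1)
deep = paths_within_depth(toy_graph, "geneA", evidence, 2)
print(shallow)
print(deep)
```

Raising the depth limit from 1 to 2 adds the indirect gene-to-pathway path to the direct gene-to-publication link, which is the sense in which query depth changes what evidence is reachable.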

Publications at Bielefeld University

Rothamsted Repository

Pathway based microarray analysis based on multi-membership gene regulation

Author: Pavlidis Stelios
Publication venue: Brunel University, School of Information Systems, Computing and Mathematics
Publication date: 01/01/2011
Field of study

This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University.

Recent developments in automation and novel experimental techniques have led to the accumulation of vast amounts of biological data and the emergence of numerous databases to store this wealth of information. Consequently, bioinformatics has drawn considerable attention, accompanied by the development of a plethora of tools for the analysis of biological data. DNA microarrays constitute a prominent example of a high-throughput experimental technique that has required a substantial contribution from bioinformatics tools. Following their popularity, there is an ongoing effort to integrate gene expression with other types of data in a common analytical approach. Pathway-based microarray analysis seeks to use microarray data in conjunction with biochemical pathway data and to look for a coordinated change in the expression of the genes constituting a pathway. However, it has been observed that genes in a pathway may show variable expression, with some appearing activated while others appear repressed. This thesis aims to contribute to pathway-based microarray analysis and to assist the interpretation of such observations, based on the fact that in all organisms a substantial number of genes take part in more than one biochemical pathway. It explores the hypothesis that the expression of such genes represents a net effect of their contribution to all their constituent pathways, applying statistical and data mining approaches. A heuristic search methodology is proposed to manipulate the pathway contribution of genes to follow underlying trends and interpret microarray results centred on pathway behaviour. The methodology is further refined to account for distinct genes encoding enzymes that catalyse the same reaction, and applied to modules, shorter chains of reactions forming sub-networks within pathways. Results based on various datasets are discussed, showing that the methodology is promising and may assist a biologist in deciphering the biochemical state of an organism in experiments where pathways exhibit variable expression.

Brunel University Research Archive

Mitmekesiste bioloogiliste andmete ühendamine ja analüüs (Integration and analysis of diverse biological data)

Author: Sügis Elena
Publication venue
Publication date: 22/05/2019
Field of study

The electronic version of the thesis does not include the publications.

Thanks to advances in technology, the volume of biological data has multiplied in recent years. These data cover different fields of biology. Restricting oneself to a single data set allows biological processes or diseases to be studied from only one aspect at a time. There is therefore a growing need for machine learning methods that help combine data from different fields in order to study biological processes as a whole. In addition, there is demand for reliable collections of disease-specific data sets that would make such analyses more efficient. This thesis describes how to apply machine-learning-based integration methods to investigate different biological questions. We show how analysis based on integrated data enables a better understanding of biological processes in three areas: Alzheimer's disease, toxicology and immunology. Alzheimer's disease is an age-related neurodegenerative disease with no effective treatment. In the thesis we show how to integrate different Alzheimer's-disease-specific data sets to form HENA, a heterogeneous graph-based data set specific to Alzheimer's disease. We then demonstrate the application of a deep learning method, the graph convolutional neural network, to HENA to find genes potentially associated with the disease. Second, we studied psoriasis, a chronic immune-mediated inflammatory disease. For this we combined laboratory measurements from patients' blood and skin with clinical information and integrated the results of the respective analyses using domain-specific knowledge. The last part of the work focuses on advancing toxicity testing strategies. Toxicity testing is the process of assessing whether the chemicals under study have harmful effects on an organism. It is needed, for example, when assessing the safety of drugs. In this work we identified groups of toxic compounds with a similar mechanism of action. In addition, we developed a classification model that makes it possible to assess the toxicity of new compounds.

Rapid advances in biotechnological innovation and decreasing production costs have led to an explosion of experimental data being produced in laboratories around the world. Individual experiments make it possible to understand biological processes, e.g. diseases, from different angles. However, in order to get a systematic view of a disease it is necessary to combine these heterogeneous data. The large amounts of diverse data require building machine learning models that can help, e.g., to identify which genes are related to a disease. Additionally, there is a need to compose reliable integrated data sets that researchers can work with effectively. In this thesis we demonstrate how to combine and analyze different types of biological data using the example of three biological domains: Alzheimer’s disease, immunology, and toxicology. More specifically, we combine data sets related to Alzheimer’s disease into a novel heterogeneous network-based data set for Alzheimer’s disease (HENA). We then apply graph convolutional networks, state-of-the-art deep learning methods, to a node classification task in HENA to find genes that are potentially associated with the disease. Combining patients’ data related to immune disease helps to uncover its pathological mechanisms and to find better treatments in the future. We analyse laboratory data from patients’ skin and blood samples by combining them with clinical information. Subsequently, we bring together the results of the individual analyses using available domain knowledge to form a more systematic view of the disease pathogenesis. Toxicity testing is the process of determining the harmful effects of substances on living organisms. One of its applications is the safety assessment of drugs or other chemicals for the human organism. In this work we identify groups of toxicants that have similar mechanisms of action. Additionally, we develop a classification model that makes it possible to assess the toxic actions of unknown compounds.

https://www.ester.ee/record=b523255

DSpace at Tartu University Library

Automatically exploiting genomic and metabolic contexts to aid the functional annotation of prokaryote genomes

Author: MEDIGUE Claudine
SMITH Adam Alexander Thil
Publication venue
Publication date: 01/01/2012
Field of study

The subject of this thesis is the development of bioinformatic strategies that exploit genomic and metabolic contextual information in order to generate functional annotations for prokaryote genes. The work involved two main projects. The first focuses on sequence-orphan enzymatic activities. Today, roughly 27% of the activities defined by the International Union of Biochemistry and Molecular Biology are sequence-orphans. For these, traditional bioinformatic approaches cannot propose candidate genes. It is thus imperative to use alternative, context-based approaches in such cases. The CanOE strategy (fishing Candidate genes for Orphan Enzymes) was developed and added to the MicroScope bioinformatics platform to this end. It integrates genomic and metabolic information across thousands of prokaryote genomes in order to locate promising gene candidates for orphan activities. The mirror project focuses on protein families of unknown function. A collaborative project has been set up at the Genoscope in the hope of formalising functional exploration strategies for prokaryote protein families. A pilot version was created on the DUF849 Pfam family, for which a single activity had recently been elucidated. Strategies for proposing novel functions and activities and for creating isofunctional sub-families were researched, so as to guide biochemical experiments and to analyse their results.

EVRY-Bib. électronique (912289901) / Sudoc, France

OpenGrey Repository