
    Bioinformatics as a Tool to Identify Infectious Disease Pathogen Peptide Sequences as Targets for Antibody Engineering

    Bioinformatics is an interdisciplinary field that applies information technology to understand biological data, from genomes to proteins. It combines biology, computer science, statistics, mathematics, and engineering to analyze and interpret biological data. This chapter describes how bioinformatics can be used to identify pathogen virulence factor peptide sequences that are similar to sequences in human nerve tissue proteins and to evaluate these peptides as targets for antibody engineering.
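
    As a rough illustration of the kind of screen described here, the sketch below slides a pathogen peptide along a human protein sequence and reports windows with high sequence identity. The sequences, window length, and identity cutoff are invented placeholders, not values from the chapter, and a real analysis would typically use a dedicated similarity-search tool.

        # Minimal sketch: scan a human protein for windows similar to a pathogen peptide.
        # All sequences and thresholds below are illustrative placeholders.

        def identity(a: str, b: str) -> float:
            """Fraction of positions with identical residues (equal-length sequences)."""
            return sum(x == y for x, y in zip(a, b)) / len(a)

        def similar_windows(pathogen_peptide: str, human_protein: str, min_identity: float = 0.6):
            """Yield (start, window, identity) for each window meeting the identity cutoff."""
            k = len(pathogen_peptide)
            for start in range(len(human_protein) - k + 1):
                window = human_protein[start:start + k]
                score = identity(pathogen_peptide, window)
                if score >= min_identity:
                    yield start, window, score

        if __name__ == "__main__":
            peptide = "GAVLIMFWP"              # hypothetical virulence factor peptide
            protein = "MKTAGAVLIMYWPQRSTNDE"   # hypothetical human protein fragment
            for start, window, score in similar_windows(peptide, protein):
                print(f"position {start}: {window} ({score:.0%} identity)")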

    Knowledge Discovery in Biological Databases for Revealing Candidate Genes Linked to Complex Phenotypes

    Genetics and “omics” studies designed to uncover genotype to phenotype relationships often identify large numbers of potential candidate genes, among which the causal genes are hidden. Scientists generally lack the time and technical expertise to review all relevant information available from the literature, from key model species and from a potentially wide range of related biological databases in a variety of data formats with variable quality and coverage. Computational tools are needed for the integration and evaluation of heterogeneous information in order to prioritise candidate genes and components of interaction networks that, if perturbed through potential interventions, have a positive impact on the biological outcome in the whole organism without producing negative side effects. Here we review several bioinformatics tools and databases that play an important role in biological knowledge discovery and candidate gene prioritization. We conclude with several key challenges that need to be addressed in order to facilitate biological knowledge discovery in the future.

    Phylogenetic placement of environmental sequences using taxonomically reliable databases helps to rigorously assess dinophyte biodiversity in Bavarian lakes (Germany).

    1. Reliable determination of organisms is a prerequisite to explore their spatial and temporal occurrence and to study their evolution, ecology, and dispersal. In Europe, Bavaria (Germany) provides an excellent study system for research on the origin and diversification of freshwater organisms, including dinophytes, due to the presence of extensive lake districts and ice age river valleys. Bavarian freshwater environments are ecologically diverse and range from deep, nutrient-poor mountain lakes to shallow, nutrient-rich lakes and ponds. 2. We obtained amplicon sequence data (V4 region of the small subunit rRNA, c. 410 bp long) from environmental samples collected at 11 sites in Upper Bavaria. We found 186 operational taxonomic units (OTUs) associated with Dinophyceae, which were further classified by means of a phylogenetic placement approach. 3. The maximum likelihood tree, inferred from a well-curated reference alignment, comprised a systematically representative set of 251 dinophytes covering the currently known molecular diversity, with OTUs linked to type material where possible. Environmental OTUs were scattered across the reference tree but accumulated mostly in freshwater lineages, with 79% of OTUs placed in Apocalathium, Ceratium, or Peridinium, the taxa most frequently encountered in Bavaria based on morphology. 4. Twenty-one Bavarian OTUs had sequences identical to known, vouchered accessions, two of which are linked to type material, namely Palatinus apiculatus and Theleodinium calcisporum. Particularly within Peridiniaceae, delimitation of Peridinium species was based on intraspecific sequence variation. 5. Our approach indicates that high-throughput sequencing of environmental samples is effective for reliable determination of dinophyte species in Bavarian lakes. We further discuss the importance of well-curated reference databases, which remain to be developed in the future.
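
    As a toy illustration of how placement results can be summarised per lineage (the actual study places OTUs on a maximum likelihood reference tree; the table below is invented, not the Bavarian data), one can tally best placements and compute the fraction falling into the dominant freshwater genera:

        # Minimal sketch: summarise phylogenetic placements of environmental OTUs per lineage.
        # The placement table is a hypothetical example, not data from the study.
        from collections import Counter

        placements = {            # OTU id -> reference lineage of the best placement
            "OTU_001": "Peridinium",
            "OTU_002": "Ceratium",
            "OTU_003": "Apocalathium",
            "OTU_004": "Gymnodinium",
            "OTU_005": "Peridinium",
        }

        counts = Counter(placements.values())
        total = sum(counts.values())
        for lineage, n in counts.most_common():
            print(f"{lineage}: {n} OTUs ({n / total:.0%})")

        freshwater_core = {"Apocalathium", "Ceratium", "Peridinium"}
        core_fraction = sum(n for lin, n in counts.items() if lin in freshwater_core) / total
        print(f"Placed in Apocalathium/Ceratium/Peridinium: {core_fraction:.0%}")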

    The Majority of Active Rhodobacteraceae in Marine Sediments Belong to Uncultured Genera: A Molecular Approach to Link Their Distribution to Environmental Conditions

    General studies on benthic microbial communities focus on fundamental biogeochemical processes or the most abundant constituents, and minor fractions such as the Rhodobacteraceae are frequently neglected. Even though this family belongs to the most widely distributed bacteria in the marine environment, its proportion of benthic microbial communities is usually within or below the single-digit percentage range. Thus, knowledge of these community members is limited, even though their absolute numbers might exceed those in the pelagic zone by orders of magnitude. To unravel the distribution and diversity of benthic, metabolically active Rhodobacteraceae, we analyzed an existing library of bacterial 16S rRNA transcripts. The dataset originated from 154 individual sediment samples covering seven oceanic regions and a broad variety of environmental conditions. Across all samples, a total of 0.7% of all 16S rRNA transcripts were annotated as Rhodobacteraceae. Among those, Sulfitobacter, Paracoccus, and Phaeomarinomonas were the most abundant cultured representatives, but the majority (78%) was affiliated with uncultured family members. To define them, the 45 most abundant Rhodobacteraceae OTUs assigned as "uncultured" were phylogenetically grouped into new clusters. Their closest relatives mostly belonged to subgroups other than the Roseobacter group, reflecting a large part of the hidden diversity, with unknown functions, within the benthic Rhodobacteraceae. The overall composition of active Rhodobacteraceae communities was specific to the geographical location, with richness decreasing with sediment depth. One-third of the Rhodobacteraceae OTUs responded significantly to the prevailing redox regime, suggesting an adaptation to anoxic conditions. A possible approach to predicting their physiological properties is to identify the metabolic capabilities of their nearest relatives; these predictions will need to be verified by physiological experiments as soon as isolates become available. Because many uncultured members of these subgroups likely thrive under anoxic conditions, future research can pursue a molecular-guided cultivation strategy to isolate novel Rhodobacteraceae from sediments.
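
    The per-sample proportion of transcripts assigned to the family (the 0.7% figure quoted above) is essentially a relative-abundance calculation on a taxon-by-sample count table. The sketch below shows one way to do this; the table, taxon labels and values are made-up placeholders, not the study's 154-sample dataset.

        # Minimal sketch: per-sample relative abundance of a family from a transcript count table.
        import pandas as pd

        counts = pd.DataFrame(
            {"sample_A": [7, 3, 990], "sample_B": [2, 1, 1497]},
            index=["Sulfitobacter", "uncultured_Rhodobacteraceae_cluster_1", "other_taxa"],
        )
        # Rows belonging to the family of interest (here matched by label as a stand-in
        # for a proper taxonomic annotation).
        family_rows = [i for i in counts.index if "Rhodobacteraceae" in i or i == "Sulfitobacter"]

        relative = counts.loc[family_rows].sum() / counts.sum() * 100
        print(relative.round(2))  # % of 16S rRNA transcripts assigned to the family per sample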

    Database construction for estimating the metabolic fluxes of genetically modified mutants and development of the Genetic Modification Flux software

    In understanding the complexity of a metabolic network structure, the flux distribution is the key information to observe, as it directly represents the cellular phenotype. Studying genetically perturbed conditions (e.g. gene deletion/knockout) is one useful way to examine this and contributes significantly to metabolic engineering and biotechnology applications. Metabolic flux analysis (MFA) has proven suitable for specific gene knockout studies, yet the method involves exhaustive computational effort, since the calculations are derived from a stoichiometric model of the major intracellular reactions and from mass balances applied to the intracellular metabolites. MFA is widely used to investigate the metabolic fluxes of a variety of cells; it is based on the stoichiometric matrix of metabolic reactions and their thermodynamic constraints, where the matrix is derived from a metabolic network map whose rows and columns represent metabolites and chemical/transport reactions, respectively. MFA is very effective for understanding how metabolic networks generate a variety of cellular functions and for rationally planning gene deletion/amplification strategies for strain improvement. Flux balance analysis (FBA) is used to predict the steady-state flux distribution of genetically modified cells under different culture conditions, and minimization of metabolic adjustment (MOMA) was developed to predict the flux distributions of gene deletion mutants. FBA and MOMA often lead to incorrect predictions when constraints associated with the regulation of gene expression or the activity of the gene products are dominant, because they apply Boolean logic, or similarly simple logic, to gene regulation and enzyme activities. Network-based pathway analyses, namely elementary modes (EMs) and extreme pathways, have emerged as alternative ways of constructing mathematical models of metabolic networks with gene regulation, and EM analysis has been suggested to be convenient for integrating an enzyme activity profile into the flux distribution. Enzyme Control Flux (ECF) uses the enzyme activity profile of a mutant relative to the wild type to predict its flux distribution. Computational support is essential for analyzing metabolic flux distributions, and the availability of data from a large number of knockout mutants further assists the process. We previously presented Genetic Modification Flux (GMF), which predicts the flux distribution of a broad range of genetically modified mutants; its feasibility has been validated on various metabolic network models, and its predictions are more accurate than those of FBA and MOMA. To enhance the feasibility and usability of GMF, we developed two versions of a simulator application with a metabolic network database to predict the flux distributions of genetically modified mutants. 112 data sets for Escherichia coli, Corynebacterium glutamicum, Saccharomyces cerevisiae, and Chinese hamster ovary (CHO) cells were registered as standard models. (Doctoral dissertation, Kyushu Institute of Technology; degree number 情工博甲第313号; conferred 30 June 2016. Contents: 1. Introduction and Background; 2. Materials and Methods; 3. Results and Discussion; 4. Conclusion.)
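
    For reference, the constraint-based formulations contrasted above can be written in their standard textbook form; this is the generic formulation, not GMF's specific model. Here S is the stoichiometric matrix, v the flux vector, c the objective coefficients (e.g. biomass), and v^wt a wild-type reference flux distribution:

        \text{FBA:}\quad \max_{v}\ c^{\top} v \quad \text{s.t.}\quad S v = 0,\qquad v_{\min} \le v \le v_{\max}

        \text{MOMA (deletion of reaction } j\text{):}\quad \min_{v}\ \lVert v - v^{\mathrm{wt}} \rVert_{2}^{2} \quad \text{s.t.}\quad S v = 0,\qquad v_{\min} \le v \le v_{\max},\qquad v_{j} = 0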

    Rapidly changing e-commerce: a big data analysis model for better marketing decisions

    Master's thesis, University of Macedonia, Master in Business Administration (for Executives), Thessaloniki, 2019. Big data constitute a modern field of particular research interest, as they are a dominant factor in the efficiency and effectiveness of businesses, especially those operating in e-commerce. Successful big data management brings increased competitiveness, improved customer relationships and enhanced customer engagement, which translate into increased sales and expanded market share. In the field of marketing, effective big data management makes it possible to optimise decision-making processes by providing useful, multifaceted and substantive information about the consumer audience at which marketing actions are directed. However, implementing integrated big data management systems is a difficult undertaking that must take into account all the relevant variables, including objectives, success factors and technology. The purpose of this study is to develop a big data management model for the marketing domain, focusing on the system's objectives, Key Performance Indicators (KPIs), individual functions, required techniques, available technology, and data security and privacy issues.

    Discovering lesser known molecular players and mechanistic patterns in Alzheimer's disease using an integrative disease modelling approach

    Convergence of exponentially advancing technologies is driving medical research towards life-changing discoveries. In contrast, the repeated failure of high-profile drugs against Alzheimer's disease (AD) has made it one of the least successful therapeutic areas. This failure pattern has provoked researchers to grapple with their beliefs about Alzheimer's aetiology: the growing realisation that amyloid-β and tau are not 'the' factors but rather only some among many necessitates the reassessment of pre-existing data to add new perspectives. To enable a holistic view of the disease, integrative modelling approaches are emerging as a powerful technique. Combining data at different scales and of different modes can considerably increase the predictive power of an integrative model by filling biological knowledge gaps. However, the reliability of the derived hypotheses largely depends on the completeness, quality, consistency, and context-specificity of the data. Thus, there is a need for agile methods and approaches that efficiently interrogate and utilise existing public data. This thesis presents the development of novel approaches and methods that address intrinsic issues of data integration and analysis in AD research. It aims to prioritise lesser-known AD candidates using highly curated and precise knowledge derived from integrated data, with much of the emphasis placed on quality, reliability, and context-specificity. The work showcases the benefit of integrating well-curated, disease-specific heterogeneous data in a semantic web-based framework for mining actionable knowledge, and it introduces the challenges encountered while harvesting information from literature and transcriptomic resources. A state-of-the-art text-mining methodology is developed to extract miRNAs and their regulatory roles in diseases and genes from the biomedical literature. To enable meta-analysis of biologically related transcriptomic data, a highly curated metadata database has been developed that explicates annotations specific to human and animal models. Finally, to corroborate common mechanistic patterns, embedded with novel candidates, across large-scale AD transcriptomic data, a new approach to generating gene regulatory networks has been developed. The work presented here has demonstrated its capability to identify testable mechanistic hypotheses containing previously unknown or emerging knowledge from public data in two major publicly funded projects on Alzheimer's, Parkinson's, and epilepsy.
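
    To make the miRNA-extraction step concrete, the sketch below pulls miRNA mentions and nearby gene symbols out of a sentence with a simple pattern match. It is only a rough illustration of the extraction idea, not the thesis's actual text-mining pipeline; the sentence, gene dictionary and relation heuristic are invented examples.

        # Toy sketch: extract miRNA mentions and co-mentioned gene symbols from text.
        import re

        sentence = "Overexpression of miR-132 suppresses SIRT1 in hippocampal neurons."
        known_genes = {"SIRT1", "APP", "MAPT"}   # hypothetical gene dictionary

        mirna_pattern = re.compile(r"\b(?:hsa-)?miR-\d+[a-z]?(?:-[35]p)?\b", re.IGNORECASE)

        mirnas = mirna_pattern.findall(sentence)
        genes = [tok.strip(".,;") for tok in sentence.split() if tok.strip(".,;") in known_genes]

        # Naive co-occurrence heuristic: pair every miRNA with every gene in the sentence.
        for m in mirnas:
            for g in genes:
                print(f"candidate relation: {m} -> {g}")   # e.g. miR-132 -> SIRT1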

    Bioinformatics-driven development of a queryable cardiometabolic database and its application in a biological setting

    A thesis submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy. June 2017, Johannesburg. As sequencing and genotyping technologies advance, larger and more complex sets of biological data are being produced. Databases can be used to store and manage these data efficiently. Typically, publicly available datasets are accessed through web browsers that offer a user-friendly interface to a database, making complex queries simple to execute; however, research project-specific data are not commonly stored in this way. In this research, a database (designed in MySQL) and an accompanying interface (developed using PHP, HTML and CSS) were designed for storing and querying the quality-controlled data from the current project, which used Metabochip-genotyped Birth to Twenty (Bt20) cohort participants and their female caregivers. Users can easily access the data to generate summary statistics on the phenotype data and to download phenotype, single nucleotide polymorphism (SNP) annotation and association analysis data that match user-supplied criteria. Some of the data from the database were used to investigate the genetics of blood pressure (BP) in black South African individuals. Hypertension is a major risk factor for cardiovascular diseases (CVDs); BP variation is known to have a genetic component, but genetic studies in indigenous Africans have been limited. Association analysis, carried out in a merged sample of caregivers and participants, pointed to novel regions of interest in the NOS1AP (DBP and SBP), MYRF (SBP) and POC1B (SBP) genes and in two intergenic regions (DACH1|LOC440145 (DBP and SBP) and INTS10|LPL (SBP)). Two SNPs in the MYRF gene met the calculated "array-wide" significance threshold for multiple testing (p < 6.7 × 10⁻⁷ for the merged dataset). Genotype imputation is a useful addition to association studies because it increases the SNP panel available for association testing. An investigation into the efficiency of imputation in this dataset using a mixed-population reference panel was carried out. Imputation was achieved with high confidence in all genes, but a more detailed view of the region was only obtained in NOS1AP (DBP and SBP in both the merged and female caregiver datasets) and POC1B (Bt20 participant dataset only). Overall, the research contributed a useful tool for the efficient management of project-specific biological data. The analysis and genotype imputation, which is a promising tool for future studies in this or other African datasets, also provided some insight into the genetics of blood pressure in black South Africans, with further functional and replication studies in larger samples required to confirm and explain the findings. MT 201
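
    For context, the quoted array-wide threshold is on the scale produced by a Bonferroni-style correction. Assuming, purely for illustration, roughly 75,000 effective independent tests (the exact number is not stated in this abstract):

        \alpha_{\text{array-wide}} \approx \frac{0.05}{75\,000} \approx 6.7 \times 10^{-7}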

    Development of new tools for the integration of ChIP-Seq data and their applications to the study of transcriptional control

    Recent progress in sequencing technologies has made very complex research projects possible. Combined with the vast public datasets produced by international consortia such as ENCODE, Roadmap Epigenomics and FANTOM, the amount of data to process can be daunting. The goal of my doctoral project was to develop new bioinformatic approaches to efficiently analyse ChIP-Seq data and to pinpoint changes in the interaction patterns between proteins and DNA. New R tools, ENCODExplorer and FantomTSS, were developed to make these publicly available datasets easier to integrate. Furthermore, the metagene package, developed during my doctorate, allows the comparison of enrichment patterns of DNA-interacting proteins: it efficiently extracts read coverage over genomic regions of interest, normalises the signal, and uses controls to remove background noise; it produces plots for visually comparing factors and conditions and offers statistical tools to identify and characterise significantly different profiles. To validate my experimental approach, I analysed over a hundred ChIP-Seq datasets from the GM12878 cell line produced by the ENCODE consortium, studying the enrichment profiles of transcription factors and histones at enhancers and promoters as a function of their transcriptional activity. This study identified two distinct recruitment modes: a gradient effect and a threshold effect. Given the ever-growing complexity and quantity of available genomic data, it is essential to develop new methodological and statistical approaches to improve our understanding of the underlying biological mechanisms. ENCODExplorer and metagene are both available on Bioconductor.
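
    As a concept illustration of the kind of aggregation a metagene-style analysis performs (average normalised coverage across a set of regions, with a control subtracted), here is a small sketch. The coverage arrays and the simple normalisation/subtraction scheme are placeholder assumptions, not the metagene package's actual R API or algorithm.

        # Minimal sketch of a metagene-style profile over equally sized regions.
        import numpy as np

        rng = np.random.default_rng(0)
        n_regions, width = 50, 200                     # e.g. 200 bp windows around promoters
        chip = rng.poisson(lam=5.0, size=(n_regions, width)).astype(float)
        control = rng.poisson(lam=2.0, size=(n_regions, width)).astype(float)

        # Normalise each track to its total coverage (a crude library-size correction),
        # subtract the control, and average across regions to get one profile.
        chip_norm = chip / chip.sum()
        control_norm = control / control.sum()
        profile = (chip_norm - control_norm).mean(axis=0)

        print(profile.shape)          # (200,): one value per position, ready to plot
        print(float(profile.mean()))  # near zero here, since both tracks are random noise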