5,435 research outputs found
Influenza research database: an integrated bioinformatics resource for influenza research and surveillance.
BackgroundThe recent emergence of the 2009 pandemic influenza A/H1N1 virus has highlighted the value of free and open access to influenza virus genome sequence data integrated with information about other important virus characteristics.DesignThe Influenza Research Database (IRD, http://www.fludb.org) is a free, open, publicly-accessible resource funded by the U.S. National Institute of Allergy and Infectious Diseases through the Bioinformatics Resource Centers program. IRD provides a comprehensive, integrated database and analysis resource for influenza sequence, surveillance, and research data, including user-friendly interfaces for data retrieval, visualization and comparative genomics analysis, together with personal log in-protected 'workbench' spaces for saving data sets and analysis results. IRD integrates genomic, proteomic, immune epitope, and surveillance data from a variety of sources, including public databases, computational algorithms, external research groups, and the scientific literature.ResultsTo demonstrate the utility of the data and analysis tools available in IRD, two scientific use cases are presented. A comparison of hemagglutinin sequence conservation and epitope coverage information revealed highly conserved protein regions that can be recognized by the human adaptive immune system as possible targets for inducing cross-protective immunity. Phylogenetic and geospatial analysis of sequences from wild bird surveillance samples revealed a possible evolutionary connection between influenza virus from Delaware Bay shorebirds and Alberta ducks.ConclusionsThe IRD provides a wealth of integrated data and information about influenza virus to support research of the genetic determinants dictating virus pathogenicity, host range restriction and transmission, and to facilitate development of vaccines, diagnostics, and therapeutics
Recommended from our members
Evolutionary relationships among bifidobacteria and their hosts and environments.
BACKGROUND:The assembly of animal microbiomes is influenced by multiple environmental factors and host genetics, although the relative importance of these factors remains unclear. Bifidobacteria (genus Bifidobacterium, phylum Actinobacteria) are common first colonizers of gut microbiomes in humans and inhabit other mammals, social insects, food, and sewages. In humans, the presence of bifidobacteria in the gut has been correlated with health-promoting benefits. Here, we compared the genome sequences of a subset of the over 400 Bifidobacterium strains publicly available to investigate the adaptation of bifidobacteria diversity. We tested 1) whether bifidobacteria show a phylogenetic signal with their isolation sources (hosts and environments) and 2) whether key traits encoded by the bifidobacteria genomes depend on the host or environment from which they were isolated. We analyzed Bifidobacterium genomes available in the PATRIC and NCBI repositories and identified the hosts and/or environment from which they were isolated. A multilocus phylogenetic analysis was conducted to compare the genetic relatedness the strains harbored by different hosts and environments. Furthermore, we examined differences in genomic traits and genes related to amino acid biosynthesis and degradation of carbohydrates. RESULTS:We found that bifidobacteria diversity appears to have evolved with their hosts as strains isolated from the same host were non-randomly associated with their phylogenetic relatedness. Moreover, bifidobacteria isolated from different sources displayed differences in genomic traits such as genome size and accessory gene composition and on particular traits related to amino acid production and degradation of carbohydrates. In contrast, when analyzing diversity within human-derived bifidobacteria, we observed no phylogenetic signal or differences on specific traits (amino acid biosynthesis genes and CAZymes). CONCLUSIONS:Overall, our study shows that bifidobacteria diversity is strongly adapted to specific hosts and environments and that several genomic traits were associated with their isolation sources. However, this signal is not observed in human-derived strains alone. Looking into the genomic signatures of bifidobacteria strains in different environments can give insights into how this bacterial group adapts to their environment and what types of traits are important for these adaptations
Assembly of an interactive correlation network for the Arabidopsis genome using a novel heuristic clustering algorithm
Peer reviewedPublisher PD
Domain-mediated interactions for protein subfamily identification
Within a protein family, proteins with the same domain often exhibit different cellular functions, despite the shared evolutionary history and molecular function of the domain. We hypothesized that domain-mediated interactions (DMIs) may categorize a protein family into subfamilies because the diversified functions of a single domain often depend on interacting partners of domains. Here we systematically identified DMI subfamilies, in which proteins share domains with DMI partners, as well as with various functional and physical interaction networks in individual species. In humans, DMI subfamily members are associated with similar diseases, including cancers, and are frequently co-associated with the same diseases. DMI information relates to the functional and evolutionary subdivisions of human kinases. In yeast, DMI subfamilies contain proteins with similar phenotypic outcomes from specific chemical treatments. Therefore, the systematic investigation here provides insights into the diverse functions of subfamilies derived from a protein family with a link-centric approach and suggests a useful resource for annotating the functions and phenotypic outcomes of proteins.11Ysciescopu
ePlant and the 3D Data Display Initiative: Integrative Systems Biology on the World Wide Web
Visualization tools for biological data are often limited in their ability to interactively integrate data at multiple scales. These computational tools are also typically limited by two-dimensional displays and programmatic implementations that require separate configurations for each of the user's computing devices and recompilation for functional expansion. Towards overcoming these limitations we have developed “ePlant” (http://bar.utoronto.ca/eplant) – a suite of open-source world wide web-based tools for the visualization of large-scale data sets from the model organism Arabidopsis thaliana. These tools display data spanning multiple biological scales on interactive three-dimensional models. Currently, ePlant consists of the following modules: a sequence conservation explorer that includes homology relationships and single nucleotide polymorphism data, a protein structure model explorer, a molecular interaction network explorer, a gene product subcellular localization explorer, and a gene expression pattern explorer. The ePlant's protein structure explorer module represents experimentally determined and theoretical structures covering >70% of the Arabidopsis proteome. The ePlant framework is accessed entirely through a web browser, and is therefore platform-independent. It can be applied to any model organism. To facilitate the development of three-dimensional displays of biological data on the world wide web we have established the “3D Data Display Initiative” (http://3ddi.org)
Algorithmic Techniques in Gene Expression Processing. From Imputation to Visualization
The amount of biological data has grown exponentially in recent decades. Modern biotechnologies, such as microarrays and next-generation sequencing, are capable to produce massive amounts of biomedical data in a single experiment. As the amount of the data is rapidly growing there is an urgent need for reliable computational methods for analyzing and visualizing it. This thesis addresses this need by studying how to efficiently and reliably analyze and visualize high-dimensional data, especially that obtained from gene expression microarray experiments.
First, we will study the ways to improve the quality of microarray data by replacing (imputing) the missing data entries with the estimated values for these entries. Missing value imputation is a method which is commonly used to make the original incomplete data complete, thus making it easier to be analyzed with statistical and computational methods. Our novel approach was to use curated external biological information as a guide for the missing
value imputation.
Secondly, we studied the effect of missing value imputation on the downstream data analysis methods like clustering. We compared multiple recent imputation algorithms against 8 publicly available microarray data sets. It was observed that the missing value imputation indeed is a rational way to improve the quality of biological data. The research revealed differences between the clustering results obtained with different imputation methods. On most data sets, the simple and fast k-NN imputation was good enough, but there were also needs for more advanced imputation methods, such as Bayesian Principal Component Algorithm (BPCA).
Finally, we studied the visualization of biological network data. Biological interaction networks are examples of the outcome of multiple biological experiments such as using the gene microarray techniques. Such networks are typically very large and highly connected, thus there is a need for fast algorithms for producing visually pleasant layouts. A computationally efficient way to produce layouts of large biological interaction networks was developed. The algorithm uses multilevel optimization within the regular force directed graph layout algorithm.Siirretty Doriast
Frustration in Biomolecules
Biomolecules are the prime information processing elements of living matter.
Most of these inanimate systems are polymers that compute their structures and
dynamics using as input seemingly random character strings of their sequence,
following which they coalesce and perform integrated cellular functions. In
large computational systems with a finite interaction-codes, the appearance of
conflicting goals is inevitable. Simple conflicting forces can lead to quite
complex structures and behaviors, leading to the concept of "frustration" in
condensed matter. We present here some basic ideas about frustration in
biomolecules and how the frustration concept leads to a better appreciation of
many aspects of the architecture of biomolecules, and how structure connects to
function. These ideas are simultaneously both seductively simple and perilously
subtle to grasp completely. The energy landscape theory of protein folding
provides a framework for quantifying frustration in large systems and has been
implemented at many levels of description. We first review the notion of
frustration from the areas of abstract logic and its uses in simple condensed
matter systems. We discuss then how the frustration concept applies
specifically to heteropolymers, testing folding landscape theory in computer
simulations of protein models and in experimentally accessible systems.
Studying the aspects of frustration averaged over many proteins provides ways
to infer energy functions useful for reliable structure prediction. We discuss
how frustration affects folding, how a large part of the biological functions
of proteins are related to subtle local frustration effects and how frustration
influences the appearance of metastable states, the nature of binding
processes, catalysis and allosteric transitions. We hope to illustrate how
Frustration is a fundamental concept in relating function to structural
biology.Comment: 97 pages, 30 figure
Applications of Evolutionary Bioinformatics in Basic and Biomedical Research
With the revolutionary progress in sequencing technologies, computational biology emerged as a game-changing field which is applied in understanding molecular events of life for not only complementary but also exploratory purposes. Bioinformatics resources and tools significantly help in data generation, organization and analysis. However, there is still a need for developing new approaches built based on a biologist’s point of view. In protein bioinformatics, there are several fundamental problems such as (i) determining protein function; (ii) identifying protein-protein interactions; (iii) predicting the effect of amino acid variants. Here, I present three chapters addressing these problems from an evolutionary perspective. Firstly, I describe a novel search pipeline for protein domain identification. The algorithm chain provides sensitive domain assignments with the highest possible specificity. Secondly, I present a tool enabling large-scale visualization of presences and absences of proteins in hierarchically clustered genomes. This tool visualizes multi-layer information of any kind of genome-linked data with a special focus on domain architectures, enabling identification of coevolving domains/proteins, which can eventually help in identifying functionally interacting proteins. And finally, I propose an approach for distinguishing between benign and damaging missense mutations in a human disease by establishing the precise evolutionary history of the associated gene. This part introduces new criteria on how to determine functional orthologs via phylogenetic analysis. All three parts use comparative genomics and/or sequence analyses. Taken together, this study addresses important problems in protein bioinformatics and as a whole it can be utilized to describe proteins by their domains, coevolving partners and functionally important residues
Quantification of DNA-associated proteins inside eukaryotic cells using single-molecule localization microscopy
Development of single-molecule localization microscopy techniques has allowed nanometre scale localization accuracy inside cells, permitting the resolution of ultra-fine cell structure and the elucidation of crucial molecular mechanisms. Application of these methodologies to understanding processes underlying DNA replication and repair has been limited to defined in vitro biochemical analysis and prokaryotic cells. In order to expand these techniques to eukaryotic systems, we have further developed a photo-activated localization microscopy-based method to directly visualize DNA-associated proteins in unfixed eukaryotic cells. We demonstrate that motion blurring of fluorescence due to protein diffusivity can be used to selectively image the DNA-bound population of proteins. We designed and tested a simple methodology and show that it can be used to detect changes in DNA binding of a replicative helicase subunit, Mcm4, and the replication sliding clamp, PCNA, between different stages of the cell cycle and between distinct genetic backgrounds
- …