1,628 research outputs found

    Representing short sequences in the context of a model organism genome

    Get PDF
    In the post-genomics era, the sheer volume of data is overwhelming without appropriate tools for data integration and analysis. Studying genomic sequences in the context of other related genomic sequences, i.e. comparative genomics, is a powerful technique enabling the identification of functionally interesting sequence regions based on the principal that similar sequences tend to be either homologous or provide similar functionality. Costs associated with full genome sequencing make it infeasible to sequence every genome of interest. Consequently, simple, smaller genomes are used as model organisms for more complex organisms, for instance, Mouse/Human. An annotated model organism provides a source of annotation for transcribed sequences and other gene regions of the more complex organism based on sequence homology. For example, the gene annotations from the model organism aid interpretation of expression studies in more complex organisms. To assist with comparative genomics research in the Arabidopsis/Brassica (Thale-cress/Canola) model-crop pair, a web-based, graphical genome browser (BioViz) was developed to display short Brassica genomic sequences in the context of the Arabidopsis model organism genome. This involved the development of graphical representations to integrate data from multiple sources and tools, and a novel user interface to provide the user with a more interactive web-based browsing experience. While BioViz was developed for the Arabidopsis/Brassica comparative genomics context, it could be applied to comparative browsing relative to other reference genomes. BioViz proved to be an valuable research support tool for Brassica / Arabidopsis comparative genomics. It provided convenient access to the underlying Arabidopsis annotation, allowed the user to view specific EST sequences in the context of the Arabidopsis genome and other related EST sequences. In addition, the limits to which the project pushed the SVG specification proved influential in the SVG community. The work done for BioViz inspired the definition of an opensource project to define standards for SVG based web applications and a standard framework for SVG based widget sets

    High-performance integrated virtual environment (HIVE) tools and applications for big data analysis

    Get PDF
    The High-performance Integrated Virtual Environment (HIVE) is a high-throughput cloud-based infrastructure developed for the storage and analysis of genomic and associated biological data. HIVE consists of a web-accessible interface for authorized users to deposit, retrieve, share, annotate, compute and visualize Next-generation Sequencing (NGS) data in a scalable and highly efficient fashion. The platform contains a distributed storage library and a distributed computational powerhouse linked seamlessly. Resources available through the interface include algorithms, tools and applications developed exclusively for the HIVE platform, as well as commonly used external tools adapted to operate within the parallel architecture of the system. HIVE is composed of a flexible infrastructure, which allows for simple implementation of new algorithms and tools. Currently, available HIVE tools include sequence alignment and nucleotide variation profiling tools, metagenomic analyzers, phylogenetic tree-building tools using NGS data, clone discovery algorithms, and recombination analysis algorithms. In addition to tools, HIVE also provides knowledgebases that can be used in conjunction with the tools for NGS sequence and metadata analysis

    Selected Works in Bioinformatics

    Get PDF
    This book consists of nine chapters covering a variety of bioinformatics subjects, ranging from database resources for protein allergens, unravelling genetic determinants of complex disorders, characterization and prediction of regulatory motifs, computational methods for identifying the best classifiers and key disease genes in large-scale transcriptomic and proteomic experiments, functional characterization of inherently unfolded proteins/regions, protein interaction networks and flexible protein-protein docking. The computational algorithms are in general presented in a way that is accessible to advanced undergraduate students, graduate students and researchers in molecular biology and genetics. The book should also serve as stepping stones for mathematicians, biostatisticians, and computational scientists to cross their academic boundaries into the dynamic and ever-expanding field of bioinformatics

    Work ow-based systematic design of high throughput genome annotation

    No full text
    The genus Eimeria belongs to the phylum Apicomplexa, which includes many obligate intra-cellular protozoan parasites of man and livestock. E. tenella is one of seven species that infect the domestic chicken and cause the intestinal disease coccidiosis which is economy important for poultry industry. E. tenella is highly pathogenic and is often used as a model species for the Eimeria biology studies. In this PhD thesis, a comprehensive annotation system named as \WAGA" (Workflow-based Automatically Genome Annotation) was built and applied to the E. tenella genome. InforSense KDE, and its BioSense plug-in (products of the InforSense Company), were the core softwares used to build the workflows. Workflows were made by integrating individual bioinformatics tools into a single platform. Each workflow was designed to provide a standalone service for a particular task. Three major workflows were developed based on the genomic resources currently available for E. tenella. These were of ESTs-based gene construction, HMM-based gene prediction and protein-based annotation. Finally, a combining workflow was built to sit above the individual ones to generate a set of automatic annotations using all of the available information. The overall system and its three major components were deployed as web servers that are fully tuneable and reusable for end users. WAGA does not require users to have programming skills or knowledge of the underlying algorithms or mechanisms of its low level components. E. tenella was the target genome here and all the results obtained were displayed by GBrowse. A sample of the results is selected for experimental validation. For evaluation purpose, WAGA was also applied to another Apicomplexa parasite, Plasmodium falciparum, the causative agent of human malaria, which has been extensively annotated. The results obtained were compared with gene predictions of PHAT, a gene finder designed for and used in the P. falciparum genome project

    Proceedings, MSVSCC 2018

    Get PDF
    Proceedings of the 12th Annual Modeling, Simulation & Visualization Student Capstone Conference held on April 19, 2018 at VMASC in Suffolk, Virginia. 155 pp

    Security, Trust and Privacy (STP) Model for Federated Identity and Access Management (FIAM) Systems

    Get PDF
    The federated identity and access management systems facilitate the home domain organization users to access multiple resources (services) in the foreign domain organization by web single sign-on facility. In federated environment the user’s authentication is performed in the beginning of an authentication session and allowed to access multiple resources (services) until the current session is active. In current federated identity and access management systems the main security concerns are: (1) In home domain organization machine platforms bidirectional integrity measurement is not exist, (2) Integrated authentication (i.e., username/password and home domain machine platforms mutual attestation) is not present and (3) The resource (service) authorization in the foreign domain organization is not via the home domain machine platforms bidirectional attestation

    Development of novel software tools and methods for investigating the significance of overlapping transcription factor genomic interactions

    Get PDF
    Identifying overlapping DNA binding patterns of different transcription factors is a major objective of genomic studies, but existing methods to archive large numbers of datasets in a personalised database lack sophistication and utility. To address this need, various database systems were benchmarked and a tool BiSA (Binding Sites Analyser) was developed for archiving of genomic regions and easy identification of overlap with or proximity to other regions of interest. BiSA can also calculate statistical significance of overlapping regions and can also identify genes located near binding regions of interest or genomic features near a gene or locus of interest. BiSA was populated with >1000 datasets from previously published genomic studies describing transcription factor binding sites and histone modifications. Using BiSA, the relationships between binding sites for a range of transcription factors were analysed and a number of statistically significant relationships were identified. This included an extensive comparison of estrogen receptor alpha (ERα) and progesterone receptor (PR) in breast cancer cells, which revealed a statistically significant functional relationship at a subset of sites. In summary, the BiSA comprehensive knowledge base contains publicly available datasets describing transcription factor binding sites and epigenetic modification and provides an easy graphical interface to biologists for advanced analysis of genomic interactions

    Integrating distributed post-genomic data to infer the molecular basis of bacterial phenotypes

    Get PDF
    The aim of the project described in this thesis is to understand and predict the characteristics and behaviour of a family of bacteria through an analysis of genome wide data from a variety of sources. The focus of the research is a family of bacteria, Bacillus, whose members show a diverse range of phenotypes, from the non-pathogenic B. subtilis to B. anthrncis, the causative agent of anthrax. Specifically, the focus was on the genomic scale identification and characterisation of secreted proteins from Bacillus species. Firstly, the application of Grid-based computational approaches to problems in genomic analysis and annotation was investigated, applying mllGrid technology to a biological problem not previously addressed using this approach. e-Science workflows and a service-oriented approach were developed and applied to predict and characterise secreted proteins, and the results automatically integrated into a custom relational database. An associated Web portal was also developed to facilitate expert curation, results browsing and querying over the database. Workflow technology was also used to classify the putative secreted proteins into families and to study the relationships between and within these families. The design of the workflows, the architecture and the reasoning behind the approach used to build this system, called BaSPP, are discussed. Analysis of the putative Bacillus secretomes revealed clear distinctions between proteins present in the pathogens and those in the non-pathogens. The properties of the protein families present in all Bacillus secretomes, as well as those specific either to the pathogens or to the non-pathogens were investigated. Many of the protein families contained members of unknown function. In the iv second part of the project, these families were investigated in more depth, using additional data integration strategies not previously applied to these organisms. The secretomes were modelled at the system level, in the broader context of interactomes. A system called SubtilNet was therefore developed, using B. subtilis as the reference organism. As part of SubtilNet, a toolkit and architecture were developed and implemented for building and analysing probabilistic functional integrated networks (PFINs). The PFINs built for each Bacillus species using this system were subsequently used to delve further into the interactions specific to the secreted proteins by extracting and exploring the cross-species PFINs of these proteins. The cross-species PFINs for the protein families specific to the pathogens and non-pathogens were explored, with particular emphasis on the core PrsA-like protein family, which acted as a use case to show how the PFIN s can be used to shed light on protein function. The addition of orthologous links between species was demonstrated to facilitate network clustering and analysis, enabling putative annotations to be applied to proteins previously of unknown function.EThOS - Electronic Theses Online ServiceNorth East Regional e-Science Centre : European Commission (LSHC-CT-2004-503468) : EPSRC : Non-Linear DynamicsGBUnited Kingdo
    • …
    corecore