Analysis of superfamily specific profile-profile recognition accuracy
BACKGROUND: Annotation of sequences that share little similarity to sequences of known function remains a major obstacle in genome annotation. Some of the best methods of detecting remote relationships between protein sequences are based on matching sequence profiles. We analyse the superfamily specific performance of sequence profile-profile matching. Our benchmark consists of a set of 16 protein superfamilies that are highly diverse at the sequence level. We relate the performance to the number of sequences in the profiles, the profile diversity and the extent of structural conservation in the superfamily. RESULTS: The performance varies greatly between superfamilies with the truncated receiver operating characteristic, ROC(10), varying from 0.95 down to 0.01. These large differences persist even when the profiles are trimmed to approximately the same level of diversity. CONCLUSIONS: Although the number of sequences in the profile (profile width) and degree of sequence variation within positions in the profile (profile diversity) contribute to accurate detection there are other superfamily specific factors
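The truncated ROC score mentioned above can be illustrated with a minimal sketch. It assumes the commonly used ROC_n definition (area under the ROC curve up to the first n false positives, normalised by n times the number of true positives); the abstract does not state which exact variant the authors used. The function name `roc_n` and its arguments are hypothetical.

```python
def roc_n(labels_ranked, n=10, total_positives=None):
    """Truncated ROC score for one ranked hit list (assumed ROC_n definition).

    labels_ranked: booleans ordered from best to worst score,
                   True for a true relationship, False for a false positive.
    n:             number of false positives at which the curve is truncated.
    total_positives: number of true relationships in the benchmark; defaults
                   to the number of True entries in labels_ranked.
    """
    if total_positives is None:
        total_positives = sum(labels_ranked)
    if total_positives == 0:
        raise ValueError("at least one true positive is required")

    tp_seen = 0   # true positives ranked so far
    fp_seen = 0   # false positives ranked so far
    area = 0      # sum over false positives of true positives ranked above them
    for is_true in labels_ranked:
        if is_true:
            tp_seen += 1
        else:
            fp_seen += 1
            area += tp_seen
            if fp_seen == n:
                break

    # Assumption: if the list holds fewer than n false positives, the missing
    # ones are treated as ranking below every hit in the list.
    area += (n - fp_seen) * tp_seen
    return area / (n * total_positives)
```

Under this definition a score of 1.0 means every true relationship ranks above the first false positive, and a score near 0 means almost none do before the n-th false positive.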
A high level interface to SCOP and ASTRAL implemented in Python
BACKGROUND: Benchmarking algorithms in structural bioinformatics often involves the construction of datasets of proteins with given sequence and structural properties. The SCOP database is a manually curated structural classification which groups together proteins on the basis of structural similarity. The ASTRAL compendium provides non-redundant subsets of SCOP domains on the basis of sequence similarity such that no two domains in a given subset share more than a defined degree of sequence similarity. Taken together these two resources provide a 'ground truth' for assessing structural bioinformatics algorithms. We present a small and easy to use API written in Python to enable construction of datasets from these resources. RESULTS: We have designed a set of Python modules to provide an abstraction of the SCOP and ASTRAL databases. The modules are designed to work as part of the Biopython distribution. Python users can now manipulate and use the SCOP hierarchy from within Python programs, and use ASTRAL to return sequences of domains in SCOP, as well as clustered representations of SCOP from ASTRAL. CONCLUSION: The modules make the analysis and generation of datasets for use in structural genomics easier and more principled
Representing and querying disease networks using graph databases
BACKGROUND: Systems biology experiments generate large volumes of data of multiple modalities and this information presents a challenge for integration due to a mix of complexity together with rich semantics. Here, we describe how graph databases provide a powerful framework for storage, querying and envisioning of biological data. RESULTS: We show how graph databases are well suited for the representation of biological information, which is typically highly connected, semi-structured and unpredictable. We outline an application case that uses the Neo4j graph database for building and querying a prototype network to provide biological context to asthma related genes. CONCLUSIONS: Our study suggests that graph databases provide a flexible solution for the integration of multiple types of biological data and facilitate exploratory data mining to support hypothesis generation. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13040-016-0102-8) contains supplementary material, which is available to authorized users
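The idea of storing highly connected biological data as typed nodes and relationships can be sketched in plain Python. This is not Neo4j and not the authors' prototype: the `PropertyGraph` class, the relationship names and the placeholder identifiers (`GENE_A`, `GENE_B`, `PATHWAY_X`) are all hypothetical and serve only to show the shape of a "find context for disease-related genes" query.

```python
from collections import defaultdict


class PropertyGraph:
    """Tiny in-memory property graph (illustrative only, not a database)."""

    def __init__(self):
        self.nodes = {}                 # node id -> {"label": ..., other properties}
        self.edges = defaultdict(list)  # node id -> [(relationship type, other node id)]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, **props}

    def add_edge(self, source, rel_type, target):
        # Relationships are stored in both directions so they can be
        # traversed from either end.
        self.edges[source].append((rel_type, target))
        self.edges[target].append((rel_type, source))

    def neighbours(self, node_id, rel_type=None, label=None):
        """Return sorted neighbour ids, optionally filtered by relationship type and node label."""
        result = []
        for edge_type, other in self.edges[node_id]:
            if rel_type is not None and edge_type != rel_type:
                continue
            if label is not None and self.nodes[other]["label"] != label:
                continue
            result.append(other)
        return sorted(result)


# Hypothetical toy network: placeholder names, not real biological data.
g = PropertyGraph()
g.add_node("asthma", "Disease")
g.add_node("GENE_A", "Gene")
g.add_node("GENE_B", "Gene")
g.add_node("PATHWAY_X", "Pathway")
g.add_edge("GENE_A", "ASSOCIATED_WITH", "asthma")
g.add_edge("GENE_A", "INTERACTS_WITH", "GENE_B")
g.add_edge("GENE_B", "PARTICIPATES_IN", "PATHWAY_X")

# Step 1: genes linked to the disease. Step 2: their interaction partners.
asthma_genes = g.neighbours("asthma", rel_type="ASSOCIATED_WITH", label="Gene")
context = {gene: g.neighbours(gene, rel_type="INTERACTS_WITH") for gene in asthma_genes}
```

In a real graph database the same two-step traversal would be written as a single declarative pattern query; the sketch only shows why connected, semi-structured data maps naturally onto nodes and relationships.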
Clustering of Pseudomonas aeruginosa transcriptomes from planktonic cultures, developing and mature biofilms reveals distinct expression profiles
BACKGROUND: Pseudomonas aeruginosa is a genetically complex bacterium which can adopt and switch between a free-living or biofilm lifestyle, a versatility that enables it to thrive in many different environments and contributes to its success as a human pathogen. RESULTS: Transcriptomes derived from growth states relevant to the lifestyle of P. aeruginosa were clustered using three different methods (K-means, K-means spectral and hierarchical clustering). The culture conditions used for this study were biofilms incubated for 8, 14, 24 and 48 hrs, and planktonic culture (logarithmic and stationary phase). This cluster analysis revealed and clearly illustrated the distinct expression profiles present in the dataset. Moreover, it gave an insight into which genes are up-regulated in planktonic, developing biofilm and confluent biofilm states. In addition, this analysis confirmed the contribution of quorum sensing (QS) and RpoS regulated genes to the biofilm mode of growth, and enabled the identification of a 60.69 Kbp region of the genome associated with stationary phase growth (stationary phase planktonic culture and confluent biofilms). CONCLUSION: This is the first study to use clustering to separate a large P. aeruginosa microarray dataset, consisting of transcriptomes obtained from diverse conditions relevant to its growth, into different expression profiles. These distinct expression profiles not only reveal novel aspects of P. aeruginosa gene expression but also provide a growth specific transcriptomic reference dataset for the research community
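As a rough illustration of how K-means groups genes by expression profile, the sketch below clusters a few toy profiles measured across six conditions (four biofilm time points and two planktonic phases, mirroring the design described above). The numbers are invented for illustration and are not data from the study; the function `kmeans` and its deterministic initialisation from the first k profiles are simplifying assumptions, not the authors' procedure.

```python
import math


def kmeans(profiles, k, iterations=100):
    """Plain K-means (Lloyd's algorithm) on a list of equal-length numeric profiles.

    Centroids are initialised from the first k profiles (an assumption made
    to keep the sketch deterministic). Returns (assignment, centroids).
    """
    centroids = [list(p) for p in profiles[:k]]
    assignment = None
    for _ in range(iterations):
        # Assign every profile to its nearest centroid (Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in profiles
        ]
        if new_assignment == assignment:
            break  # converged: no profile changed cluster
        assignment = new_assignment
        # Recompute each centroid as the mean of its member profiles.
        for c in range(k):
            members = [p for p, a in zip(profiles, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Invented toy expression values; columns are
# biofilm 8 h, 14 h, 24 h, 48 h, planktonic log phase, planktonic stationary phase.
toy_profiles = [
    [1.0, 1.0, 1.0, 1.0, 5.0, 5.0],  # higher in planktonic culture
    [5.0, 5.0, 5.0, 5.0, 1.0, 1.0],  # higher in biofilm
    [1.2, 0.9, 1.1, 1.0, 4.8, 5.1],
    [4.9, 5.2, 5.0, 5.1, 0.8, 1.1],
]
clusters, centres = kmeans(toy_profiles, k=2)
```

Genes that land in the same cluster share a similar pattern across conditions, which is the sense in which clustering separates a dataset into distinct expression profiles.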
Targeting SLMAP-ALK—a novel gene fusion in lung adenocarcinoma
Assessment of ALK gene rearrangements is strongly recommended by the Molecular Testing Guideline for Selection of Lung Cancer Patients proposed by IASLC, AMP, and CAP at the time of diagnosis for patients with advanced stage disease. Non-small-cell lung cancer (NSCLC) with ALK gene rearrangements or the resulting fusion proteins have been, for the most part, successfully targeted with ALK tyrosine kinase inhibitors (TKIs). The most frequent rearrangement, the EML4-ALK oncogenic fusion, has more than 10 distinct variants, each with a discrete breakpoint in EML4. Recent studies have suggested that EML4-ALK variants may have differential responses to TKIs. Additionally, non-EML4-ALK fusions that result from ALK rearrangements with diverse 5′ partners could possibly have varied biologic and clinical implications in their therapeutic responses and outcomes of patients with NSCLC. Existing literature documents at least 20 non-EML4 fusion partners for ALK, and the clinical responsiveness to crizotinib ranges from increased sensitivity to resistance. This underscores the importance of identifying the precise 5′ fusion partner to ALK before initiation of therapy. Herein we report the identification of a novel SLMAP-ALK fusion in a patient with NSCLC
Extrinsic post burn peri-anal contracture leading to sub acute intestinal obstruction: A case report
Peri-anal contracture leads to intestinal obstruction whenever the anal orifice is involved. In this case the anus and the peri-anal skin up to 2 cm were normal; however, both gluteal folds were fused by a post burn scar, leaving a very small opening, which led to faecal impaction and sub acute intestinal obstruction
AIGO: towards a unified framework for the analysis and the inter-comparison of GO functional annotations
BACKGROUND: In response to the rapid growth of available genome sequences, efforts have been made to develop automatic inference methods to functionally characterize them. Pipelines that infer functional annotation are now routinely used to produce new annotations at a genome scale and for a broad variety of species. These pipelines differ widely in their inference algorithms, confidence thresholds and data sources for reasoning. This heterogeneity makes a comparison of the relative merits of each approach extremely complex. The evaluation of the quality of the resultant annotations is also challenging given there is often no existing gold-standard against which to evaluate precision and recall. RESULTS: In this paper, we present a pragmatic approach to the study of functional annotations. An ensemble of 12 metrics, describing various aspects of functional annotations, is defined and implemented in a unified framework, which facilitates their systematic analysis and inter-comparison. The use of this framework is demonstrated on three illustrative examples: analysing the outputs of state-of-the-art inference pipelines, comparing electronic versus manual annotation methods, and monitoring the evolution of publicly available functional annotations. The framework is part of the AIGO library (http://code.google.com/p/aigo) for the Analysis and the Inter-comparison of the products of Gene Ontology (GO) annotation pipelines. The AIGO library also provides functionalities to easily load, analyse, manipulate and compare functional annotations and also to plot and export the results of the analysis in various formats. CONCLUSIONS: This work is a step toward developing a unified framework for the systematic study of GO functional annotations. This framework has been designed so that new metrics on GO functional annotations can be added in a very straightforward way