5 research outputs found

    Conservation and divergence among Salmonella enterica subspecies

    Genome sequencing of taxonomically proximate organisms has revealed the proteomic diversity embedded within closely related organisms. The Salmonella enterica subspecies represent a group of enterobacterial pathogens known to share similar genomic content yet possess diverse host specificity and distinct disease symptoms. Analysis of Salmonella enterica subspecies proteomes indicates that the proximity among the subspecies has been overestimated. Interestingly, orthology comparison between Salmonella typhi and Salmonella typhimurium across the proteome suggests that metabolic proteins have the highest propensity for divergence, while proteins involved in environmental information processing and genetic information processing are the least susceptible to evolutionary change. Consistent with earlier reports, transporter proteins and transcription factors are the most populated protein families in the Salmonellae. Several of the unique domains present in the Salmonella typhi and Salmonella typhimurium genomes were introduced through phage invasion and eventually selected. Redundancy and divergence are observed among the metabolic pathway proteins. While still meeting the essential requirements of their function, the metabolic proteins have the highest propensity for sampling sequence space to acquire new functions. The detailed cross-genome analysis of the subspecies provides an understanding of the diversity and unique attributes of the individual Salmonella enterica genomes.

    Myosinome: A Database of Myosins from Select Eukaryotic Genomes to Facilitate Analysis of Sequence-Structure-Function Relationships

    Myosins are one of the largest protein superfamilies, with 24 classes. They have conserved structural features and catalytic domains yet show huge variation across different domains, resulting in a variety of functions. Myosins drive various cellular processes and motility, up to the level of the whole organism. They are ATPases that use the chemical energy released by ATP hydrolysis to bring about conformational changes leading to motor function. Myosins are important because they are involved in almost all cellular activities, ranging from cell division to transcriptional regulation. They are also crucial because of their involvement in many congenital diseases characterized by muscular malfunction, cardiac disease, deafness, and neural and immunological dysfunction, many of which lead to death at an early age. We present Myosinome, a database of selected myosin classes (myosin II, V, and VI) from five model organisms. This knowledge base provides the sequences, phylogenetic clustering, and domain architectures of myosins, together with molecular models, structural analyses, and relevant literature on their coiled-coil domains. The current version of Myosinome presents information on 71 myosin sequences belonging to three myosin classes (myosin II, V, and VI) in five model organisms (Homo sapiens, Mus musculus, D. melanogaster, C. elegans, and S. cerevisiae), identified using bioinformatics surveys; several of them are yet to be functionally characterized. As these proteins are involved in congenital diseases, such a database would be useful for short-listing candidates for gene therapy and drug development. The database can be accessed at http://caps.ncbs.res.in/myosinome

    A harmonized resource of integrated prostate cancer clinical, -omic, and signature features

    Genomic and transcriptomic data have been generated across a wide range of prostate cancer (PCa) study cohorts. These data can be used to better characterize the molecular features associated with clinical outcomes and to test hypotheses across multiple, independent patient cohorts. In addition, derived features, such as estimates of cell composition, risk scores, and androgen receptor (AR) scores, can be used to develop novel hypotheses leveraging existing multi-omic datasets. The full potential of such data is yet to be realized because independent datasets exist in different repositories and have been processed using different pipelines, and because derived and clinical features are often not provided or not standardized. Here, we present the curatedPCaData R package, a harmonized data resource representing >2900 primary tumor, >200 normal tissue, and >500 metastatic PCa samples across 19 datasets processed using standardized pipelines with updated gene annotations. We show that meta-analysis across harmonized studies has great potential for robust and clinically meaningful insights. curatedPCaData is an open and accessible community resource, with code made available for reproducibility.