94 research outputs found
Projections for fast protein structure retrieval
BACKGROUND: The number of protein structures in databases such as the PDB has risen exponentially in recent years, so the design of fast algorithms capable of querying such databases is becoming an increasingly important research issue. This paper reports an algorithm, motivated by spectral graph matching techniques, for retrieving protein structures similar to a query structure from a large protein structure database. Each protein structure is specified by the 3D coordinates of the residues of the protein. The algorithm is based on a novel characterization of the residues, called projections, leading to a similarity measure between the residues of the two proteins. This measure is exploited to efficiently compute the optimal equivalences. RESULTS: Experimental results show that the current algorithm outperforms the state of the art on benchmark datasets in terms of speed without losing accuracy. Search results on the SCOP 95% nonredundant database, for fold similarity with 5 proteins from different SCOP classes, show that the current method performs competitively with the standard algorithm CE. The algorithm is also capable of detecting non-topological similarities between two proteins, which is not possible with most state-of-the-art tools such as Dali.
Comparison of protein structures by growing neighborhood alignments
BACKGROUND: The design of protein structure comparison algorithms is an important research issue with far-reaching implications. In this article, we describe a protein structure comparison scheme that is capable of detecting correct alignments even in difficult cases, e.g. non-topological similarities. The proposed method computes protein structure alignments by comparing small substructures, called neighborhoods. Two different types of neighborhoods, sequence and structure, are defined, and two algorithms arising out of the scheme are detailed. A new method for computing equivalences having non-topological similarities from pairwise similarity scores is described. A novel and fast technique for comparing sequence neighborhoods is also developed. RESULTS: The experimental results show that the current programs perform better on Fischer and Novotny's benchmark datasets than state-of-the-art programs, e.g. DALI, CE and SSM. Our programs were also found to calculate correct alignments for proteins with a large number of indels and internal repeats. Finally, the sequence neighborhood based program was used in extensive fold and non-topological similarity detection experiments. The accuracy of the fold detection experiments with the new measure of similarity was found to be similar to or better than that of the standard algorithm CE. CONCLUSION: A new scheme, resulting in two algorithms, has been developed, implemented and tested. The programs developed are accessible at
Prediction of an HMG-box fold in the C-terminal domain of histone H1: insights into its role in DNA condensation
In eukaryotes, histone H1 promotes the organization of polynucleosome filaments into chromatin fibers, thus contributing to the formation of an important structural framework responsible for various DNA transaction processes. The H1 protein consists of a short N-terminal "nose," a central globular domain, and a highly basic C-terminal domain. Structure prediction of the C-terminal domain using fold recognition methods reveals the presence of an HMG-box-like fold. We recently showed by extensive site-directed and deletion mutagenesis studies that a 34 amino acid segment encompassing the three S/TPKK motifs, within the C-terminal domain, is responsible for DNA condensing properties of H1. The position of these motifs in the predicted structure corresponds exactly to the DNA-binding segments of HMG-box-containing proteins such as Lef-1 and SRY. Previous analyses have suggested that histone H1 is likely to bend DNA bound to the C-terminal domain, directing the path of linker DNA in chromatin. Prediction of the structure of this domain provides a framework for understanding the higher order of chromatin organization
A linear programming approach for estimating the structure of a sparse linear genetic network from transcript profiling data
Background: A genetic network can be represented as a directed graph in which a node corresponds to a gene and a directed edge specifies the direction of influence of one gene on another. The reconstruction of such networks from transcript profiling data remains an important yet challenging endeavor. A transcript profile specifies the abundances of many genes in a biological sample of interest. Prevailing strategies for learning the structure of a genetic network from high-dimensional transcript profiling data assume sparsity and linearity. Many methods consider relatively small directed graphs, inferring graphs with up to a few hundred nodes. This work examines large undirected graph representations of genetic networks, graphs with many thousands of nodes where an undirected edge between two nodes does not indicate the direction of influence, and the problem of estimating the structure of such a sparse linear genetic network (SLGN) from transcript profiling data. Results: The structure learning task is cast as a sparse linear regression problem, which is then posed as a LASSO (l1-constrained fitting) problem and finally solved by formulating a Linear Program (LP). A bound on the Generalization Error of this approach is given in terms of the Leave-One-Out Error. The accuracy and utility of LP-SLGNs are assessed quantitatively and qualitatively using simulated and real data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) initiative provides gold standard data sets and evaluation metrics that enable and facilitate the comparison of algorithms for deducing the structure of networks. The structures of LP-SLGNs estimated from the INSILICO1, INSILICO2 and INSILICO3 simulated DREAM2 data sets are comparable to those proposed by the first and/or second ranked teams in the DREAM2 competition. The structures of LP-SLGNs estimated from two published Saccharomyces cerevisiae cell cycle transcript profiling data sets capture known regulatory associations. In each S. cerevisiae LP-SLGN, the number of nodes with a particular degree follows an approximate power law, suggesting that its degree distribution is similar to that observed in real-world networks. Inspection of these LP-SLGNs suggests biological hypotheses amenable to experimental verification. Conclusion: A statistically robust and computationally efficient LP-based method for estimating the topology of a large sparse undirected graph from high-dimensional data yields representations of genetic networks that are biologically plausible and useful abstractions of the structures of real genetic networks. Analysis of the statistical and topological properties of learned LP-SLGNs may have practical value; for example, genes with high random walk betweenness, a measure of the centrality of a node in a graph, are good candidates for intervention studies and hence integrated computational-experimental investigations designed to infer more realistic and sophisticated probabilistic directed graphical model representations of genetic networks. The LP-based solutions of the sparse linear regression problem described here may provide a method for learning the structure of transcription factor networks from transcript profiling and transcription factor binding motif data.
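The abstract above does not spell out its Linear Program, so the following is only a minimal sketch of one standard way to pose a sparse linear regression as an LP, not the authors' formulation. It assumes an absolute-error loss with an l1 penalty, splits the weights into non-negative parts, and solves the result with SciPy's `linprog`. The function name `l1_sparse_regression_lp` and the parameter `lam` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def l1_sparse_regression_lp(X, y, lam=1.0):
    """Minimize sum|X @ w - y| + lam * sum|w| by solving an equivalent LP.

    Illustrative formulation only; not taken from the paper.
    """
    n, p = X.shape
    # Decision variables, all >= 0: [w_plus (p), w_minus (p), xi (n)],
    # with w = w_plus - w_minus and xi bounding the absolute residuals.
    c = np.concatenate([lam * np.ones(2 * p), np.ones(n)])
    eye = np.eye(n)
    # Encode  X w - y <= xi  and  y - X w <= xi  as A_ub @ z <= b_ub.
    A_ub = np.block([[X, -X, -eye], [-X, X, -eye]])
    b_ub = np.concatenate([y, -y])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:2 * p]


if __name__ == "__main__":
    # Synthetic data: 40 samples, 10 candidate regulators, 2 true influences.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 10))
    w_true = np.zeros(10)
    w_true[[2, 7]] = [1.5, -2.0]
    w_hat = l1_sparse_regression_lp(X, X @ w_true, lam=0.1)
    print(np.round(w_hat, 3))
```

In a network setting, one such regression would be fitted per gene, using the remaining genes as predictors and reading the nonzero weights as candidate edges. How those per-gene fits are combined into an undirected graph is a further design choice that this sketch leaves open.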
Identifying feasible metabolic routes in Mycobacterium smegmatis and possible alterations under diverse nutrient conditions
Background: Many studies on M. tuberculosis have emerged from using M. smegmatis MC2 155 (Msm), since the two share significant similarities and yet Msm is non-pathogenic and faster growing. Although several individual molecules from Msm have been studied, many questions remain open about its metabolism as a whole and its capacity for versatility. Adaptability and versatility are emergent properties of a system, warranting a molecular systems perspective to understand them. Results: We identify feasible metabolic pathways in Msm under a reference condition using transcriptome and phenotypic microarray data along with functional annotation of the genome. Together with transcriptome data, specific genes from a set of alternatives have been mapped onto different pathways. About 257 metabolic pathways can be considered to be feasible in Msm. Next, we probe cellular metabolism with an array of alternative carbon and nitrogen sources and identify those that are utilized and favour growth as well as those that do not support growth. In all, about 135 points in the entire metabolic map are probed. Analyzing growth patterns under these conditions leads us to hypothesize different pathways that can become active in various conditions and possible alternate routes that may be induced, thus explaining the observed physiological adaptations. Conclusions: The study provides the first detailed analysis of feasible pathways towards adaptability. We obtain mechanistic insights that explain observed phenotypic behaviour by studying gene-expression profiles and pathways inferred from the genome sequence. Comparison of transcriptome and phenome analysis of Msm and Mtb provides a rationale for understanding commonalities in metabolic adaptability
Structural studies on MtRecA-nucleotide complexes: insights into DNA and nucleotide binding and the structural signature of NTP recognition
RecA protein plays a crucial role in homologous recombination and repair of DNA. Central to all activities of RecA is its binding to Mg2+-ATP. The active form of the protein is a helical nucleoprotein filament containing the nucleotide cofactor and single-stranded DNA. The stability and structure of the helical nucleoprotein filament formed by RecA are modulated by nucleotide cofactors. Here we report crystal structures of a MtRecA-ADP complex, complexes with ATPγS in the presence and absence of magnesium, as well as a complex with dATP and Mg2+. Comparison with the recently solved crystal structures of the apo form as well as a complex with ADP-AlF4 confirms an expansion of the P-loop region in MtRecA, compared to its homologue in Escherichia coli, correlating with the reduced affinity of MtRecA for ATP. The ligand-bound structures reveal subtle variations in nucleotide conformations among different nucleotides that serve in maintaining the network of interactions crucial for nucleotide binding. The nucleotide binding site itself, however, remains relatively unchanged. The analysis also reveals that ATPγS rather than ADP-AlF4 is structurally a better mimic of ATP. From among the complexed structures, a definition for the two DNA-binding loops L1 and L2 has clearly emerged for the first time and provides a basis to understand DNA binding by RecA. The structural information obtained from these complexes correlates well with the extensive biochemical data on mutants available in the literature, contributing to an understanding of the role of individual residues in the nucleotide binding pocket at the molecular level. Modeling studies on the mutants again point to the relative rigidity of the nucleotide binding site. Comparison with other NTP binding proteins reveals many commonalities in modes of binding by diverse members in the structural family, contributing to our understanding of the structural signature of NTP recognition
Crystallographic identification of an ordered C-terminal domain and a second nucleotide-binding site in RecA: new insights into allostery
RecA protein is a crucial and central component of the homologous recombination and DNA repair machinery. Despite numerous studies on the protein, several issues concerning its action, including the allosteric regulation mechanism, have remained unclear. Here we report, for the first time, a crystal structure of a complex of Mycobacterium smegmatis RecA (MsRecA) with dATP, which exhibits a fully ordered C-terminal domain, with a second dATP molecule bound to it. ATP binding is an essential step for all activities of RecA, since it triggers the formation of active nucleoprotein filaments. In the crystal filament, dATP at the first site communicates with a dATP at the second site of an adjacent subunit, through conserved residues, suggesting a new route for allosteric regulation. In addition, subtle but definite changes observed in the orientation of the nucleotide at the first site and in the positions of the segment preceding loop L2 as well as in the segment 102–105 situated between the two nucleotides all appear to be concerted and suggestive of a biological role for the second bound nucleotide
Homologous recombination in mycobacteria
In recent years, considerable effort and resources have been expended to develop targeted gene delivery methods and to generate auxotrophic mutants of mycobacteria. The results of these studies suggest that mycobacteria exhibit a wide range of recombination rates, which vary from locus to locus. Here we review the methods developed for allele exchange and targeted gene disruption as well as the mechanistic aspects of homologous recombination in mycobacteria. The results of whole genome, functional and structural analyses of Mycobacterium tuberculosis and Mycobacterium smegmatis RecA and SSB proteins provide insights into variations of the prototypic Escherichia coli paradigm. This variation of a common theme might allow mycobacteria to function in their natural but complex physiological environments
SInCRe—structural interactome computational resource for Mycobacterium tuberculosis
We have developed an integrated database for Mycobacterium tuberculosis H37Rv (Mtb) that collates information on protein sequences, domain assignments, functional annotation and 3D structural information along with protein–protein and protein–small molecule interactions. SInCRe (Structural Interactome Computational Resource) is developed out of the CamBan (Cambridge and Bangalore) collaboration. The motivation for development of this database is to provide an integrated platform that allows easy access to and interpretation of data and results obtained by all the groups in CamBan in the field of Mtb informatics. In-house algorithms and databases developed independently by various academic groups in CamBan are used to generate Mtb-specific datasets and are integrated in this database to provide a structural dimension to studies on tuberculosis. The SInCRe database readily provides information on identification of functional domains, genome-scale modelling of structures of Mtb proteins and characterization of the small-molecule binding sites within Mtb. The resource also provides structure-based function annotation, information on small-molecule binders including FDA (Food and Drug Administration)-approved drugs, protein–protein interactions (PPIs) and natural compounds that potentially bind to pathogen proteins and result in weakening or elimination of host–pathogen protein–protein interactions. Together they provide prerequisites for identification of off-target binding