2,673 research outputs found

    SCANMOT: searching for similar sequences using a simultaneous scan of multiple sequence motifs

    Establishing similarities between proteins is important for studying the relationship between sequence, structure and function, and for the analysis of evolutionary relationships. Motif-based search methods play a crucial role in connecting proteins and are particularly useful for detecting distant relationships. This paper reports SCANMOT, a web-based server that searches for similarities between proteins by matching multiple motifs simultaneously. SCANMOT searches entire sequence databases for similar sequences using multiple conserved regions, with inter-motif spacing used as restraints. The SCANMOT server is available via
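
    The abstract does not spell out SCANMOT's matching procedure. As a minimal sketch of the general idea only, assuming each motif is given as a Python regular expression and each inter-motif spacing restraint as a hypothetical (min, max) gap range, a sequence can be scanned for ordered chains of motif hits whose spacings all fall within their ranges:

```python
import re
from typing import List, Tuple

def scan_motifs(seq: str, motifs: List[str],
                gaps: List[Tuple[int, int]]) -> List[List[int]]:
    """Return the start positions of every ordered chain of motif hits.

    motifs: regular expressions, one per conserved region (an assumption;
            real motif searches usually score profiles, not exact patterns).
    gaps:   gaps[i] = (min, max) residues allowed between the end of
            motif i and the start of motif i + 1 (hypothetical restraints).
    """
    # Collect (start, end) of every hit; the lookahead permits overlapping hits.
    hits = []
    for motif in motifs:
        pattern = re.compile(f"(?=({motif}))")
        hits.append([(m.start(), m.start() + len(m.group(1)))
                     for m in pattern.finditer(seq)])

    chains: List[List[int]] = []

    def extend(i: int, chain: List[int], prev_end: int) -> None:
        if i == len(motifs):                    # every motif placed in order
            chains.append(chain)
            return
        lo, hi = gaps[i - 1]
        for start, end in hits[i]:
            if lo <= start - prev_end <= hi:    # spacing restraint satisfied
                extend(i + 1, chain + [start], end)

    for start, end in hits[0]:
        extend(1, [start], end)
    return chains
```

    For example, scan_motifs("AAGKSTLLLLDEADWW", ["G[KR]S", "DEAD"], [(2, 6)]) returns [[2, 10]], because the five-residue gap between the two hits lies inside the allowed range; tightening the range to (0, 3) yields no chain.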

    NASSAM: a server to search for and annotate tertiary interactions and motifs in three-dimensional structures of complex RNA molecules

    Similarities in the 3D patterns of RNA base interactions or arrangements can provide insights into their functions and their roles in stabilizing RNA 3D structure. Nucleic Acids Search for Substructures and Motifs (NASSAM) is a graph-theoretical program that searches for 3D patterns of base arrangements by representing the bases as pseudo-atoms. The geometric relationships among the pseudo-atoms in a pattern can be represented as a labeled graph in which the pseudo-atoms are the nodes and the edges are the inter-pseudo-atom distances. NASSAM takes PDB-formatted 3D coordinates as input. The web server can be used to match base-arrangement patterns in a query structure against annotated patterns that have been reported in the literature or that may have functional or structural-stabilization implications. The NASSAM program is freely accessible, without any login requirement, at http://mfrlab.org/grafss/nassam/
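
    To illustrate what matching a labeled distance graph involves, here is a minimal backtracking sketch. It is not NASSAM's implementation: the pseudo-atom labels ("A", "U", ...) and the distance tolerance are hypothetical, and coordinates are assumed to have already been extracted from the PDB file. Each pattern node is mapped to a structure node with the same label, and a mapping is kept only if every pairwise distance (edge) agrees within the tolerance:

```python
import math
from typing import List, Tuple

# A pseudo-atom: (label, (x, y, z)). Labels are hypothetical stand-ins
# for NASSAM's pseudo-atom types.
PseudoAtom = Tuple[str, Tuple[float, float, float]]

def match_pattern(pattern: List[PseudoAtom], structure: List[PseudoAtom],
                  tol: float = 1.0) -> List[Tuple[int, ...]]:
    """Find every mapping of pattern nodes onto structure nodes such that
    labels agree and all pairwise distances agree within `tol` (angstroms)."""
    matches: List[Tuple[int, ...]] = []

    def consistent(assigned: List[int], cand: int) -> bool:
        k = len(assigned)  # index of the pattern node being placed
        for i, j in enumerate(assigned):
            d_pattern = math.dist(pattern[i][1], pattern[k][1])
            d_struct = math.dist(structure[j][1], structure[cand][1])
            if abs(d_pattern - d_struct) > tol:
                return False
        return True

    def search(assigned: List[int]) -> None:
        if len(assigned) == len(pattern):
            matches.append(tuple(assigned))
            return
        k = len(assigned)
        for cand, (label, _) in enumerate(structure):
            if (cand not in assigned and label == pattern[k][0]
                    and consistent(assigned, cand)):
                search(assigned + [cand])

    search([])
    return matches
```

    Exhaustive backtracking like this is adequate for small patterns; practical graph-search tools prune the candidate space far more aggressively.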

    AGMIAL: implementing an annotation strategy for prokaryote genomes as a distributed system

    We have implemented a genome annotation system for prokaryotes called AGMIAL. Our approach embodies a number of key principles. First, expert manual annotators are a critical component of the overall system; the user interfaces were refined in successive cycles to meet their needs. Second, the overall process should be orchestrated by a global annotation strategy, which facilitates coordination between a team of annotators and automatic data analysis. Third, the annotation strategy should allow progressive, incremental annotation, from the point when only a few draft contigs are available to the point when the final finished assembly is produced. The overall architecture is modular and extensible, being based on the W3C-standard Web services framework. Specialized modules interact with two independent core modules, which annotate genomic and protein sequences, respectively. AGMIAL is currently used by several INRA laboratories to analyze genomes of bacteria relevant to the food-processing industry, and is distributed under an open-source license.

    A comparative study of sequence analysis tools in computational biology

    A biomolecular object, such as a deoxyribonucleic acid (DNA), a ribonucleic acid (RNA) or a protein molecule, is made up of a long chain of subunits. A protein is represented as a sequence drawn from 20 different amino acids, each denoted by a letter. Similar structural domains can be produced in proteins by a vast number of different amino acid sequences. By contrast, DNA, built from only four nucleotides that form two base pairs, has a relatively simple, regular and predictable structure. Biomolecular sequence alignment (string search) is one of the most important and challenging tasks in many areas of science and information processing. It involves identifying one-to-one correspondences between the subunits of different sequences. The efficiency of an algorithm or tool depends on several factors, including scoring systems, alignment statistics, database redundancy and sequence repetitiveness. Sequence motifs are derived from multiple alignments and can be used to examine individual sequences or an entire database for subtle patterns. With motifs, it is sometimes possible to detect distant relationships that cannot be demonstrated by comparing primary sequences alone. A more comprehensive approach to efficient string search is to build a small, representative set of motifs and use it as a screening database, with automatic masking of matching query subsequences. This technology is still under development, but recent studies indicate that a representative set of only 1,000 to 3,000 sequences may suffice and that such a database can be searched in seconds.
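
    To make "one-to-one correspondences" and "scoring systems" concrete, the following is a minimal Needleman-Wunsch global alignment score with a linear gap penalty. The match, mismatch and gap values are arbitrary illustrative choices, not those of any tool compared in the study; protein searches normally use substitution matrices such as BLOSUM62 instead of a flat match/mismatch score.

```python
def global_align_score(a: str, b: str, match: int = 1,
                       mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score of a and b (Needleman-Wunsch),
    keeping only one previous row of the dynamic-programming table."""
    # Row 0: aligning a prefix of b against nothing costs one gap per residue.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: pair a[i-1] with b[j-1], gap in b, or gap in a.
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]
```

    For "ACGT" versus "AGT" the best alignment pairs A, G and T and places a single gap opposite C, giving 3 * 1 - 2 = 1.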

    From IMGT-ONTOLOGY to IMGT/LIGMotif: the IMGT® standardized approach for immunoglobulin and T cell receptor gene identification and description in large genomic sequences

    Background: The antigen receptors, immunoglobulins (IG) and T cell receptors (TR), are specific molecular components of the adaptive immune response of vertebrates. Their genes are organized in the genome in several loci (7 in humans) that comprise different gene types: variable (V), diversity (D), joining (J) and constant (C) genes. Synthesis of the IG and TR proteins requires rearrangement of V and J, or V, D and J genes at the DNA level, followed by splicing at the RNA level of the rearranged V-J and V-D-J genes to C genes. Owing to the particularities of IG and TR gene structures related to these molecular mechanisms, conventional bioinformatic software and tools are not suited to identifying and describing IG and TR genes in large genomic sequences. To address that need, IMGT®, the international ImMunoGeneTics information system®, has developed IMGT/LIGMotif, a tool for IG and TR gene annotation. This tool is based on standardized rules defined in IMGT-ONTOLOGY, the first ontology in immunogenetics and immunoinformatics. Results: IMGT/LIGMotif currently annotates human and mouse IG and TR loci in large genomic sequences. The annotation includes gene identification and orientation on the DNA strand, description of the V, D and J genes by assigning IMGT® labels, gene functionality, and finally, gene delimitation and cluster assembly. IMGT/LIGMotif analyses sequences of up to 2.5 megabase pairs and can process them in batch files. Conclusions: IMGT/LIGMotif is currently used by the IMGT® biocurators to annotate, as a first step, IG and TR genomic sequences of human and mouse in new haplotypes and those of closely related species, nonhuman primates and rat, respectively. As a next step, following enrichment of its reference databases, IMGT/LIGMotif will be used to annotate IG and TR of more distantly related vertebrate species. IMGT/LIGMotif is available at http://www.imgt.org/ligmotif/

    Insights into the early evolution of NF-kappaB signaling based on computational analyses of cnidarian genomes and transcriptomes

    NF-kappaB is an ancient transcription factor that is known to play a central role in regulating cellular stress responses in vertebrates and insects, including the innate immune response and the response to a range of physicochemical insults such as UV radiation and oxidative stress. The early evolution of this pathway is not well understood, because little is known about NF-kappaB signaling in so-called basal animal lineages (e.g., sponges, cnidarians) or in closely related outgroups to the Metazoa. Key to understanding the function of a transcription factor is identifying the target genes whose transcription it regulates. To investigate the regulatory role of NF-kappaB in basal animals, specifically the sea anemone Nematostella vectensis, I developed ForSite, a computational tool that identifies putative transcription factor binding sites in the genome near expressed genes, and I helped to generate a new annotated reference transcriptome for N. vectensis. After demonstrating that ForSite could identify a set of genes enriched for known NF-kappaB targets in human, I applied ForSite together with multiple winnowing criteria (co-localization of p300 binding; evolutionary conservation of target genes) to identify a high-priority list of potential NF-kappaB targets in the anemone. Among the most convincing likely target genes are members of a conserved antiviral pathway, which suggests that NF-kappaB plays an ancient role in innate immunity dating back to the cnidarian-bilaterian ancestor. Applying ForSite to two additional cnidarian species, Hydra magnipapillata and Acropora digitifera, did not reveal significant conservation, among the cnidarian species, of the biological processes regulated by NF-kappaB.
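
    The abstract describes ForSite only at a high level. A minimal sketch of the underlying idea, restricting a binding-site scan to the neighbourhood of expressed genes, might look like this. It assumes a position weight matrix (PWM) of non-zero base probabilities for the factor, a dictionary of transcription start sites (TSS) for the expressed genes, and arbitrary window sizes and score threshold; all of these are hypothetical and are not ForSite's actual parameters. Only the forward strand is scanned.

```python
import math
from typing import Dict, List, Tuple

# One dict of base probabilities per motif column. Probabilities must be
# non-zero (e.g. add pseudocounts); sequences are assumed uppercase ACGT.
PWM = List[Dict[str, float]]

def pwm_hits(seq: str, pwm: PWM, threshold: float,
             background: float = 0.25) -> List[Tuple[int, float]]:
    """Score every full-length forward-strand placement of the PWM with a
    log2-odds score against a uniform background; keep those >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        score = sum(math.log2(pwm[k][seq[start + k]] / background)
                    for k in range(width))
        if score >= threshold:
            hits.append((start, score))
    return hits

def sites_near_genes(genome: str, tss: Dict[str, int], pwm: PWM,
                     threshold: float, upstream: int = 1000,
                     downstream: int = 200) -> List[Tuple[str, int, float]]:
    """Report PWM hits within a window around each gene's TSS as
    (gene, genome position, score)."""
    results = []
    for gene, pos in tss.items():
        lo, hi = max(0, pos - upstream), min(len(genome), pos + downstream)
        for start, score in pwm_hits(genome[lo:hi], pwm, threshold):
            results.append((gene, lo + start, score))
    return results
```

    Restricting the scan to windows around expressed genes shrinks the search space and focuses attention on sites that could plausibly regulate transcription, which is the motivation the abstract gives for looking near expressed genes.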

    Genome-wide analysis of the AP2/ERF superfamily in apple and transcriptional evidence of ERF involvement in scab pathogenesis

    The APETALA2 (AP2)/ETHYLENE RESPONSE FACTOR (ERF) superfamily of transcriptional regulators is involved in several growth, development and stress-response processes in higher plants. Currently, the available information on the biological roles of AP2/ERF genes is derived from Arabidopsis thaliana. In the present work, we have investigated genomic and transcriptional aspects of AP2/ERF genes in the economically important perennial species Malus ×domestica. We identified 259 sequences containing at least one ERF domain in the apple genome. The vast majority of the putative proteins display predicted nuclear localization, consistent with a biological role in transcription regulation. The AP2 and ERF families are greatly expanded in apple. Whole-genome analyses in other plant species have identified a single genomic sequence with a divergent ERF domain (a so-called soloist), whereas seven soloists are present in apple. In the apple genome, the most noteworthy expansion occurred in subgroups V, VIII and IX of the ERF family. Expression profiling revealed an association between ripening-involved ERF genes and scab (Venturia inaequalis) pathogenesis in the susceptible cultivar Gala, indicating that gene expansion was accompanied by functional divergence. These analyses of AP2/ERF genes in apple provide evidence of shared ethylene-mediated signaling pathways in ripening and disease responses.

    Recognition of short functional motifs in protein sequences

    The main goal of this study was to develop a method for computational de novo prediction of short linear motifs (SLiMs) in protein sequences that would offer users advantages over existing solutions. The users are typically biological laboratory researchers who want to elucidate a protein function that may be mediated by a short motif, such as subcellular localization, secretion, post-translational modification or degradation of proteins. Relying solely on experimental techniques for such studies is often costly and uncertain. Preliminary computational prediction of putative motifs, which is fast and much less expensive, makes it possible to generate hypotheses and therefore to plan experiments in a more directed and efficient way. To meet this goal, I developed HH-MOTiF, a web-based tool for de novo discovery of SLiMs in a set of protein sequences. While working on the project, I also detected patterns in the sequence properties of certain SLiMs that make their de novo prediction easier. As some of these patterns are not yet described in the literature, I share them in this thesis. While evaluating and comparing motif prediction results, I identified conceptual gaps, both in theoretical studies and in existing practical solutions, for comparing two sets of positional data that annotate the same set of biological sequences. To close this gap and to carry out in-depth performance analyses of HH-MOTiF in comparison with other predictors, I developed a corresponding statistical method, SLALOM (StatisticaL Analysis of Locus Overlap Method). It is currently available as a standalone command-line tool.
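
    SLALOM's own statistics are not described in this abstract. As a minimal sketch of what comparing two sets of positional annotations on the same sequences involves, assuming half-open [start, end) intervals keyed by sequence ID, residue-level agreement between predicted and reference motifs can be counted as follows. This is one common measure, not necessarily one that SLALOM implements.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) residue range

def residue_overlap(predicted: Dict[str, List[Interval]],
                    reference: Dict[str, List[Interval]]) -> Dict[str, float]:
    """Residue-level TP/FP/FN counts, precision and recall between a set of
    predicted motif intervals and a set of reference (annotated) intervals."""
    tp = fp = fn = 0
    for seq_id in set(predicted) | set(reference):
        pred = {i for s, e in predicted.get(seq_id, []) for i in range(s, e)}
        ref = {i for s, e in reference.get(seq_id, []) for i in range(s, e)}
        tp += len(pred & ref)   # residues covered by both sets
        fp += len(pred - ref)   # covered only by the prediction
        fn += len(ref - pred)   # covered only by the reference
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}
```

    Counting at the residue level avoids having to decide when two intervals "match"; site-level comparisons instead need an explicit overlap criterion, which is one of the design choices such a method has to make.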

    BLISS: binding site level identification of shared signal-modules in DNA regulatory sequences

    BACKGROUND: Regulatory modules are segments of DNA that control particular aspects of gene expression. Their identification is therefore of great importance to molecular genetics. Each module is composed of a distinct set of binding sites for specific transcription factors. Since experimental identification of regulatory modules is an arduous process, accurate computational techniques that supplement it can be very beneficial. Functional modules are under selective pressure to be evolutionarily conserved, so most current approaches attempt to detect conserved regulatory modules through similarity comparisons at the DNA sequence level. However, some regulatory modules, despite the conservation of their constituent binding sites, are embedded in sequences that have little overall similarity. RESULTS: In this study, we present a novel approach that detects conserved regulatory modules via comparisons at the binding-site level. The technique compares the binding-site profiles of orthologs and identifies segments that have similar (not necessarily identical) profiles. The similarity measure is based on the inner product of transformed profiles, which takes into account the p-values of binding sites as well as potential shifts in binding-site positions. We tested this approach on simulated sequence pairs as well as real-world examples. In both cases our technique identified regulatory modules that could not be identified using sequence-similarity-based approaches such as rVista 2.0 and Blast. CONCLUSION: The results of our experiments demonstrate that, for sequences with little overall similarity at the DNA sequence level, it is still possible to identify conserved regulatory modules based solely on binding-site profiles.
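
    The exact profile transformation used by BLISS is not given in the abstract. A minimal sketch of the idea, assuming each binding site is weighted by -log10 of its p-value and smeared with a Gaussian kernel of hypothetical width sigma (so that slightly shifted sites still overlap), then comparing two profiles by a normalized inner product (cosine similarity), could look like this:

```python
import math
from typing import Dict, List, Tuple

Site = Tuple[str, int, float]          # (factor, position, p-value)
Profile = Dict[str, List[float]]       # factor -> weight at each position

def transformed_profile(sites: List[Site], length: int,
                        sigma: float = 5.0) -> Profile:
    """Build one smoothed track per factor: stronger sites (smaller p-values)
    get larger weights, and the Gaussian spread tolerates positional shifts."""
    profile: Profile = {}
    for factor, pos, p in sites:
        weight = -math.log10(p)
        track = profile.setdefault(factor, [0.0] * length)
        for x in range(length):
            track[x] += weight * math.exp(-((x - pos) ** 2) / (2 * sigma ** 2))
    return profile

def profile_similarity(a: Profile, b: Profile) -> float:
    """Cosine similarity: inner product over shared factors, normalized."""
    dot = sum(x * y
              for f in a.keys() & b.keys()
              for x, y in zip(a[f], b[f]))
    norm_a = math.sqrt(sum(x * x for v in a.values() for x in v))
    norm_b = math.sqrt(sum(x * x for v in b.values() for x in v))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

    Two segments whose sites for the same factors are shifted by a few bases still score close to 1 (roughly 0.91 for a 3-base shift with sigma = 5), even if the surrounding DNA shares little sequence similarity, whereas segments with no factors in common score 0.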

    smyRNA: A Novel Ab Initio ncRNA Gene Finder

    Background: Non-coding RNAs (ncRNAs) have important functional roles in the cell: for example, they regulate gene expression by forming stable joint structures with target mRNAs via complementary sequence motifs. Sequence motifs are also important determinants of ncRNA structure. Although ncRNAs are abundant, discovering novel ncRNAs in genome sequences has proven to be hard; in particular, past attempts at ab initio ncRNA search have mostly failed, with the exception of tools that identify microRNAs. Methodology/Principal Findings: We present a very general ab initio ncRNA gene finder that exploits differences in the distributions of sequence motifs between ncRNAs and background genome sequences. Conclusions/Significance: Our method, once trained on a set of ncRNAs from a given species, can be applied to the genome sequences of other organisms to find not only ncRNAs homologous to those in the training set but also others that potentially belong to novel (and perhaps unknown) ncRNA families. Availability
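
    The abstract does not describe smyRNA's model in detail. As a minimal sketch of "exploiting differential motif distributions", assuming fixed-length k-mers as a stand-in for sequence motifs and add-one pseudocounts (both assumptions, not smyRNA's published method), one can train a per-k-mer log-odds table from ncRNA and background sequences and score a candidate region by summing it:

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, Iterable

def kmer_counts(seqs: Iterable[str], k: int) -> Counter:
    """Count every overlapping k-mer in the given sequences."""
    counts: Counter = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def train_log_odds(ncrnas: Iterable[str], background: Iterable[str],
                   k: int = 3, pseudo: float = 1.0) -> Dict[str, float]:
    """log2(P(k-mer | ncRNA) / P(k-mer | background)) for every DNA k-mer,
    with pseudocounts so unseen k-mers get finite scores."""
    fg, bg = kmer_counts(ncrnas, k), kmer_counts(background, k)
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    fg_total = sum(fg[m] for m in kmers) + pseudo * len(kmers)
    bg_total = sum(bg[m] for m in kmers) + pseudo * len(kmers)
    return {m: math.log2(((fg[m] + pseudo) / fg_total) /
                         ((bg[m] + pseudo) / bg_total))
            for m in kmers}

def score(seq: str, table: Dict[str, float], k: int = 3) -> float:
    """Sum of log-odds over the k-mers of seq; k must match training.
    Positive totals favour the ncRNA model over the background."""
    return sum(table.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))
```

    Sliding `score` along a genome and keeping high-scoring windows would give candidate ncRNA regions; because the table captures composition rather than specific homologous sequences, such a scorer can in principle flag ncRNAs outside the training families, in line with the abstract's claim.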