292 research outputs found

    Cooperative Metaheuristics for Exploring Proteomic Data

    Most combinatorial optimization problems cannot be solved exactly. A class of methods called metaheuristics has proved efficient at giving good approximate solutions in reasonable time. Cooperative metaheuristics are a subset of metaheuristics in which several entities explore the search space in parallel and exchange information with one another. The importance of information exchange in the optimization process is related to the building block hypothesis of evolutionary algorithms, which rests on two questions: what is the pertinent information in a given potential solution, and how can this information be shared? A classification of cooperative metaheuristic methods according to the nature of the cooperation involved is presented, and the specific properties of each class, as well as a way to combine them, are discussed. Several improvements in the field of metaheuristics are also given. In particular, a method is proposed to regulate the use of classical genetic operators and to define new, more pertinent ones, taking advantage of a building-block-structured representation of the explored space. Finally, a hierarchical approach resting on multiple levels of cooperative metaheuristics is presented, leading to the definition of a complete concerted cooperation strategy. Some applications of these concepts to difficult proteomics problems, including automatic protein identification, biological motif inference and multiple sequence alignment, are presented.
For each application, an innovative method based on the cooperation concept is given and compared with classical approaches. In the protein identification problem, a first level of cooperation using swarm intelligence is applied to the comparison of mass spectrometric data with biological sequence databases, followed by a genetic programming method to discover an optimal scoring function. The multiple sequence alignment problem is decomposed into three steps involving several evolutionary processes to infer different kinds of biological motifs, and a concerted cooperation strategy to build the sequence alignment according to the sequences' motif content.
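The abstract does not give its algorithms, so the following is only a minimal sketch of one well-known kind of cooperative metaheuristic: an island-model genetic algorithm in which several populations evolve in parallel and periodically exchange their best individuals along a ring. The toy objective (OneMax, counting 1 bits) and every name and parameter (`cooperative_islands`, `migrate_every`, the mutation rate) are hypothetical stand-ins, not the author's method.

```python
import random

def fitness(bits):
    # Hypothetical toy objective (OneMax): number of 1s; stands in for a real scoring function.
    return sum(bits)

def evolve(pop, rng, mut_rate=0.02):
    """One generation: tournament selection, one-point crossover, bit-flip mutation, with elitism."""
    def select():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b
    new_pop = [max(pop, key=fitness)]  # elitism: the current best survives unchanged
    while len(new_pop) < len(pop):
        p1, p2 = select(), select()
        cut = rng.randrange(1, len(p1))
        child = p1[:cut] + p2[cut:]
        child = [1 - b if rng.random() < mut_rate else b for b in child]
        new_pop.append(child)
    return new_pop

def cooperative_islands(n_islands=4, pop_size=20, length=40,
                        generations=60, migrate_every=5, seed=0):
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        # Each island explores the search space independently...
        islands = [evolve(pop, rng) for pop in islands]
        if gen % migrate_every == 0:
            # ...and periodically shares information: the best individual of island i-1
            # replaces the worst individual of island i (ring topology).
            bests = [max(pop, key=fitness) for pop in islands]
            for i, pop in enumerate(islands):
                migrant = bests[(i - 1) % n_islands]
                worst = min(range(len(pop)), key=lambda k: fitness(pop[k]))
                pop[worst] = list(migrant)
    return max((ind for pop in islands for ind in pop), key=fitness)

best = cooperative_islands()
print(fitness(best), "of", len(best))
```

Migration frequency and topology control how much cooperation takes place: migrating rarely keeps islands diverse, while migrating often makes them behave like one large population.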

    Beyond position weight matrices: nucleotide correlations in transcription factor binding sites and their description

    The identification of transcription factor binding sites (TFBSs) on genomic DNA is of crucial importance for understanding and predicting regulatory elements in gene networks. TFBS motifs are commonly described by Position Weight Matrices (PWMs), in which each DNA base pair contributes independently to transcription factor (TF) binding, despite mounting evidence of interdependence between base-pair positions. The recent availability of genome-wide data on TF-bound DNA regions offers the possibility to revisit this question in detail for TF binding in vivo. Here, we use available fly and mouse ChIP-seq data and show that the independent model generally does not reproduce the observed statistics of TFBSs, generalizing previous observations. We further show that TFBS description and predictability can be systematically improved by taking into account pairwise correlations in the TFBS via the principle of maximum entropy. The resulting pairwise interaction model is formally equivalent to the disordered Potts models of statistical mechanics, and it generalizes previous approaches to interdependent positions. Its structure allows for co-variation of two or more base pairs, as well as secondary motifs. Although models consisting of mixtures of PWMs also have this last feature, we show that pairwise interaction models outperform them. The significant pairwise interactions are sparse and occur predominantly between consecutive base pairs. Finally, the use of a pairwise interaction model for the identification of TFBSs is shown to give significantly different predictions from a model based on independent positions.
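To make the difference between the two models concrete, here is a minimal sketch assuming the usual log-linear form: a site s scores S(s) = Σ_i h_i(s_i) + Σ_{i<j} J_ij(s_i, s_j) and has probability P(s) = exp(S(s)) / Z. With all couplings J set to zero this reduces to an independent-position PWM. The parameters below are hypothetical toy values chosen by hand; in the paper they are fitted from ChIP-seq sites via maximum entropy, which this sketch does not attempt.

```python
import math
from itertools import product

BASES = "ACGT"

def pwm_score(seq, h):
    """Independent-site (PWM-style) log-score: sum of per-position terms h[i][base]."""
    return sum(h[i][b] for i, b in enumerate(seq))

def pairwise_score(seq, h, J):
    """Pairwise-interaction log-score: PWM terms plus couplings J[(i, j)][(b_i, b_j)]."""
    score = pwm_score(seq, h)
    for (i, j), table in J.items():
        score += table.get((seq[i], seq[j]), 0.0)
    return score

def boltzmann(h, J, L):
    """P(s) = exp(score(s)) / Z, normalized by enumerating all 4**L sites (small L only)."""
    seqs = ["".join(p) for p in product(BASES, repeat=L)]
    weights = {s: math.exp(pairwise_score(s, h, J)) for s in seqs}
    Z = sum(weights.values())
    return {s: w / Z for s, w in weights.items()}

# Hypothetical toy parameters: flat single-site terms and one coupling
# that favours C at position 0 together with G at position 1.
L = 3
h = [{b: 0.0 for b in BASES} for _ in range(L)]
J = {(0, 1): {("C", "G"): 2.0}}
P = boltzmann(h, J, L)

joint = sum(p for s, p in P.items() if s[0] == "C" and s[1] == "G")
m0 = sum(p for s, p in P.items() if s[0] == "C")
m1 = sum(p for s, p in P.items() if s[1] == "G")
print(f"P(C,G) = {joint:.3f} vs P(C)P(G) = {m0 * m1:.3f}")
```

With these toy numbers the joint probability (about 0.330) clearly exceeds the product of the marginals (about 0.215), a correlation that no single PWM can reproduce. Brute-force normalization is only feasible for very short sites; real motif lengths need approximate inference.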

    Metagenome – Processing and Analysis

    Metagenome means “multiple genomes”, and the study of the culture-independent genomic content of environmental samples is called metagenomics. With the advent of powerful and economical next-generation sequencing technology, sequencing has become cheaper and faster, and the study of genes and phenotypes is therefore shifting from single organisms to the communities present in natural environmental samples. Once sequence data are obtained from an environmental sample, the challenge is to process, assemble and bin the metagenome data in order to obtain as accurate and complete a representation as possible of the populations present in the community, or to obtain a high-confidence draft assembly. In this paper we describe the existing bioinformatics workflow for processing metagenomic data. Next, we examine one way of parallelizing the sequence similarity program on a High Performance Computing (HPC) cluster, since sequence similarity is the most common and frequently used technique throughout the metagenome data processing and analysis steps. To address the challenges involved in analyzing the result file produced by the sequence similarity program, we developed a web application called Contig Analysis Tool (CAT). We then applied these tools and techniques to real-world virome metagenomic data, i.e., the genomes of all the viruses present in environmental samples obtained from microbial mats in hot springs in Yellowstone National Park. The assembly and binning of virome data pose several challenges, particularly for the following reasons: 1. existing databases contain few viral sequences for sequence similarity searches; 2. there is no reference genome; 3. there are no phylogenetic marker genes like those present in bacteria and archaea. We show how we overcame these problems by performing sequence similarity searches using CRISPR data and sequence composition analysis using tetranucleotides.
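As an illustration of the composition signal mentioned above, the sketch below computes a normalized tetranucleotide (4-mer) frequency profile for a contig and a simple Euclidean distance between profiles; contigs from the same genome tend to have similar profiles. The function names and the choice of Euclidean distance are assumptions for illustration, not the workflow described in the paper. Many binning tools also merge each 4-mer with its reverse complement, which this sketch omits.

```python
from itertools import product

KMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 tetranucleotides

def tetranucleotide_profile(seq):
    """Normalized 4-mer frequencies; windows containing non-ACGT characters are skipped."""
    counts = dict.fromkeys(KMERS, 0)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values())
    return {k: (c / total if total else 0.0) for k, c in counts.items()}

def profile_distance(p, q):
    """Euclidean distance between two profiles; smaller means more similar composition."""
    return sum((p[k] - q[k]) ** 2 for k in KMERS) ** 0.5

a = tetranucleotide_profile("ACGTACGTACGT")
b = tetranucleotide_profile("AAAAAAAATTTT")
print(round(profile_distance(a, b), 3))
```

In a binning setting, these profiles (one per contig) would be fed to a clustering step; with real data, contigs should be long enough (typically several kilobases) for the 256 frequencies to be estimated reliably.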

    Conserved expression of vertebrate microvillar gene homologs in choanocytes of freshwater sponges

    Background: The microvillus is a versatile organelle that serves important functions in disparate animal cell types. However, from a molecular perspective, the microvillus has been well studied in only a few, predominantly vertebrate, contexts. Little is known about how differences in microvillar structure contribute to differences in function, and how these differences evolved. We sequenced the transcriptome of the freshwater sponge Ephydatia muelleri and examined the expression of vertebrate microvillar gene homologs in choanocytes, the only microvilli-bearing cell type present in sponges. Sponges offer a distant phylogenetic comparison with vertebrates, and choanocytes are central to discussions about early animal evolution due to their similarity with choanoflagellates, the single-celled sister lineage of modern animals. Results: We found that, from a genomic perspective, sponges have conserved homologs of most vertebrate microvillar genes, most of which are expressed in choanocytes, and many of which exhibit choanocyte-specific or choanocyte-enriched expression. Possible exceptions include the cadherins that form intermicrovillar links in the enterocyte brush border and hair cell stereocilia of vertebrates and cnidarians. No obvious orthologs of these proteins were detected in sponges, but at least four candidate cadherins were identified as choanocyte-enriched and might serve this function. In contrast to the evidence for conserved microvillar structure in sponges and vertebrates, we found that choanoflagellates and ctenophores lack homologs of many fundamental microvillar genes, suggesting that microvillar structure may diverge significantly in these lineages, warranting further study. Conclusions: The available evidence suggests that microvilli evolved early in the prehistory of modern animals and have been repurposed to serve myriad functions in different cellular contexts.
Detailed understanding of the sequence by which different microvilli-bearing cell and tissue types diversified will require further study of microvillar composition and development in disparate cell types and lineages. Of particular interest are the microvilli of choanoflagellates, ctenophores, and sponges, which collectively bracket the earliest events in animal evolution.

    Genomic organization of duplicated short wave-sensitive and long wave-sensitive opsin genes in the green swordtail, Xiphophorus helleri

    Background: Long wave-sensitive (LWS) opsin genes have undergone multiple lineage-specific duplication events throughout the evolution of teleost fishes. LWS repertoire expansions in live-bearing fishes (family Poeciliidae) have equipped multiple species in this family with up to four LWS genes. Given that color vision, especially attraction to orange male coloration, is important to mate choice within poeciliids, LWS opsins have been proposed as candidate genes driving sexual selection in this family. To date the genomic organization of these genes has not been described in the family Poeciliidae, and little is known about the mechanisms regulating the expression of LWS opsins in any teleost. Results: Two BAC clones containing the complete genomic repertoire of LWS opsin genes in the green swordtail fish, Xiphophorus helleri, were identified and sequenced. Three of the four LWS loci identified here were linked in a tandem array downstream of two tightly linked short wave-sensitive 2 (SWS2) opsin genes. The fourth LWS opsin gene, containing only a single intron, was not linked to the other three and is the product of a retrotransposition event. Genomic and phylogenetic results demonstrate that the LWS genes described here share a common evolutionary origin with those previously characterized in other poeciliids. Using qualitative RT-PCR and MSP we showed that each of the LWS and SWS2 opsins, as well as three other cone opsin genes and a single rod opsin gene, were expressed in the eyes of adult female and male X. helleri, contributing to six separate classes of adult retinal cone and rod cells with average λmax values of 365 nm, 405 nm, 459 nm, 499 nm, 534 nm and 568 nm.
Comparative genomic analysis identified two candidate teleost opsin regulatory regions containing putative CRX binding sites and hormone response elements in upstream sequences of LWS gene regions of seven teleost species, including X. helleri. Conclusions: We report the first complete genomic description of LWS and SWS2 genes in poeciliids. These data will serve as a reference for future work seeking to understand the relationship between LWS opsin genomic organization, gene expression, gene family evolution, sexual selection and speciation in this fish family.

    Character-based DNA barcoding for authentication and conservation of IUCN Red listed threatened species of genus Decalepis (Apocynaceae)

    The steno-endemic species of the genus Decalepis are highly threatened by destructive wild harvesting. The medicinally important fleshy tuberous roots of Decalepis hamiltonii are traded as a substitute to meet the international market demand for Hemidesmus indicus. In addition, the tuberous roots of all three Decalepis species have similar exudates and texture, which makes accurate species authentication difficult with conventional techniques alone. This study was undertaken to generate DNA barcodes that could be used to monitor and curtail the illegal trade of these endangered species. A DNA barcode reference library was developed on the BOLD database platform for the candidate barcodes rbcL, matK, psbA-trnH, ITS and ITS2. With matK and ITS, the average intra-specific variation (0–0.27%) was less than the distance to the nearest neighbour (0.4–11.67%). Anchoring the coding region rbcL in a multigene tiered approach, the combination rbcL + matK + ITS yielded 100% species resolution with the fewest loci, using either PAUP or BLOG to support a character-based approach. A species-specific SNP position (230 bp) in the matK region that is characteristic of D. hamiltonii could be used to design specific assays, enhancing its applicability for direct use in CITES enforcement to distinguish it from H. indicus.
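The comparison of intra-specific variation with the distance to the nearest neighbour is the "barcode gap" check. A minimal sketch, assuming already-aligned sequences and uncorrected p-distances (BOLD analyses often use the Kimura 2-parameter distance instead), might look like this. It reports the maximum intra-specific distance, a stricter criterion than the average quoted in the abstract. The species labels and sequences are invented toy data, not Decalepis barcodes.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, skipping gaps/ambiguous bases."""
    sites = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)

def barcode_gap(samples):
    """samples: list of (species, aligned_seq). Returns {species: (max_intra, nearest_neighbour)}."""
    result = {}
    for sp in {name for name, _ in samples}:
        own = [s for name, s in samples if name == sp]
        other = [s for name, s in samples if name != sp]
        intra = [p_distance(a, b) for i, a in enumerate(own) for b in own[i + 1:]]
        inter = [p_distance(a, b) for a in own for b in other]
        result[sp] = (max(intra, default=0.0), min(inter, default=float("inf")))
    return result

# Hypothetical toy alignment: two specimens for each of two made-up species.
samples = [("sp_A", "ACGTACGTAC"), ("sp_A", "ACGTACGTAT"),
           ("sp_B", "TCGAACGTCC"), ("sp_B", "TCGAACGTCA")]
gaps = barcode_gap(samples)
for sp, (intra, nn) in sorted(gaps.items()):
    print(sp, intra, nn, "gap" if intra < nn else "no gap")
```

A species shows a barcode gap when its largest within-species distance is smaller than its distance to the closest other species; the character-based approach in the abstract complements this by looking for diagnostic sites such as the matK SNP.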

    A Feature-Based Approach to Modeling Protein–DNA Interactions

    Transcription factor (TF) binding to its DNA target site is a fundamental regulatory interaction. The most common model used to represent TF binding specificities is a position specific scoring matrix (PSSM), which assumes independence between binding positions. However, in many cases, this simplifying assumption does not hold. Here, we present feature motif models (FMMs), a novel probabilistic method for modeling TF–DNA interactions, based on log-linear models. Our approach uses sequence features to represent TF binding specificities, where each feature may span multiple positions. We develop the mathematical formulation of our model and devise an algorithm for learning its structural features from binding site data. We also develop a discriminative motif finder, which discovers de novo FMMs that are enriched in target sets of sequences compared to background sets. We evaluate our approach on synthetic data and on the widely used TF chromatin immunoprecipitation (ChIP) dataset of Harbison et al. We then apply our algorithm to high-throughput TF ChIP data from mouse and human, reveal sequence features that are present in the binding specificities of mouse and human TFs, and show that FMMs explain TF binding significantly better than PSSMs. Our FMM learning and motif finder software are available at http://genie.weizmann.ac.il/
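A minimal sketch of why features spanning several positions matter, assuming a plain log-linear form P(x) = exp(Σ_f w_f f(x)) / Z with indicator features: when the two positions of a site co-vary (AC or GT, never AT or GC), an independent PSSM fitted to the same sites still assigns the unseen combinations substantial probability, while two multi-position features capture the dependency. The feature weights here are set by hand and all names are hypothetical; this is not the FMM structure-learning algorithm or the authors' software.

```python
import math
from itertools import product

BASES = "ACGT"

def feature(assignment):
    """Indicator feature over one or more positions, e.g. {0: 'A', 1: 'C'}."""
    return lambda x: all(x[i] == b for i, b in assignment.items())

def log_linear_probs(features, weights, L):
    """P(x) = exp(sum_f w_f * f(x)) / Z, normalized by enumerating all 4**L sites."""
    seqs = ["".join(p) for p in product(BASES, repeat=L)]
    scores = {x: math.exp(sum(w * f(x) for f, w in zip(features, weights))) for x in seqs}
    Z = sum(scores.values())
    return {x: s / Z for x, s in scores.items()}

def pssm_probs(sites):
    """Independent-position model: product of per-position base frequencies."""
    L = len(sites[0])
    freqs = [{b: sum(s[i] == b for s in sites) / len(sites) for b in BASES} for i in range(L)]
    return {"".join(p): math.prod(freqs[i][b] for i, b in enumerate(p))
            for p in product(BASES, repeat=L)}

# Hypothetical binding sites in which positions 0 and 1 co-vary.
sites = ["AC", "GT", "AC", "GT"]
pssm = pssm_probs(sites)
fmm = log_linear_probs([feature({0: "A", 1: "C"}), feature({0: "G", 1: "T"})], [5.0, 5.0], L=2)
for x in ("AC", "GT", "AT", "GC"):
    print(x, round(pssm[x], 3), round(fmm[x], 3))
```

The PSSM gives each of the four combinations probability 0.25, whereas the feature model concentrates almost all mass on AC and GT. A single-position feature such as `feature({0: "A"})` recovers PSSM-like terms, so the log-linear form contains the PSSM as a special case.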