196 research outputs found

    “One code to find them all”: a perl tool to conveniently parse RepeatMasker output files

    Background: Of the different bioinformatic methods used to recover transposable elements (TEs) in genome sequences, one of the most commonly used procedures is the homology-based method proposed by the RepeatMasker program. RepeatMasker generates several output files, including the .out file, which provides annotations for all detected repeats in a query sequence. However, a remaining challenge consists of identifying the different copies of TEs that correspond to the identified hits. This step is essential for any evolutionary/comparative analysis of the different copies within a family. Different possibilities can lead to multiple hits corresponding to a unique copy of an element, such as the presence of large deletions/insertions or undetermined bases, and distinct consensus sequences corresponding to a single full-length sequence (as for long terminal repeat (LTR)-retrotransposons). These possibilities must be taken into account to determine the exact number of TE copies. Results: We have developed a Perl tool that parses the RepeatMasker .out file to better determine the number and positions of TE copies in the query sequence, in addition to computing quantitative information for the different families. To determine the accuracy of the program, we tested it on several RepeatMasker .out files corresponding to two organisms (Drosophila melanogaster and Homo sapiens) for which the TE content has already been largely described and which present great differences in genome size, TE content, and TE families. Conclusions: Our tool provides access to detailed information concerning the TE content in a genome at the family level from the .out file of RepeatMasker. This information includes the exact position and orientation of each copy, its proportion in the query sequence, and its quality compared to the reference element. In addition, our tool allows a user to directly retrieve the sequence of each copy and obtain the same detailed information at the family level when a local library with incomplete TE class/subclass information was used with RepeatMasker. We hope that this tool will be helpful for people working on the distribution and evolution of TEs within genomes.
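
    As a rough illustration of the parsing step described above, the sketch below reads hit lines in the standard whitespace-separated RepeatMasker .out layout and tallies hits and covered base pairs per repeat name. It is not the authors' tool: it does not merge fragmented hits into copies, and the function names and the sample lines (sequence names, coordinates, repeat names) are made up for the example.

```python
from collections import defaultdict

# Illustrative .out content: two header lines, a blank line, then hits.
# All values below are invented for the example.
SAMPLE = """\
   SW   perc perc perc  query  position in query    matching  repeat   position in repeat
score   div. del. ins.  sequence  begin end (left)  repeat    class/family  begin end (left) ID

 1320 15.6  6.2  0.0 chr2L 1000 1500 (22000) + ROO_LTR LTR/Pao 1 428 (0) 1
  800 10.0  0.0  0.0 chr2L 2000 2200 (21300) C ROO_LTR LTR/Pao (0) 428 228 2
  450 20.1  1.0  2.0 chr2L 3000 3099 (20401) + FAM_B DNA/P 10 109 (500) 3
""".splitlines()


def parse_out_line(line):
    """Return one hit as a dict, or None for header and blank lines."""
    fields = line.split()
    # Hit lines have at least 15 columns and start with the integer SW score.
    if len(fields) < 15 or not fields[0].isdigit():
        return None
    return {
        "query": fields[4],
        "start": int(fields[5]),
        "end": int(fields[6]),
        # RepeatMasker writes "C" for hits on the complementary strand.
        "strand": "-" if fields[8] == "C" else "+",
        "repeat": fields[9],
        "class_family": fields[10],
    }


def summarize(lines):
    """Count hits and covered base pairs for each repeat name."""
    stats = defaultdict(lambda: {"hits": 0, "bp": 0})
    for line in lines:
        hit = parse_out_line(line)
        if hit is None:
            continue
        entry = stats[hit["repeat"]]
        entry["hits"] += 1
        entry["bp"] += hit["end"] - hit["start"] + 1
    return dict(stats)


if __name__ == "__main__":
    for name, entry in summarize(SAMPLE).items():
        print(name, entry["hits"], entry["bp"])
```

    Counting raw hits this way overestimates the number of copies whenever one element is split into several hits, which is the problem the abstract sets out to address.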

    Inference of sparse combinatorial-control networks from gene-expression data: a message passing approach

    Background: Transcriptional gene regulation is one of the most important mechanisms in controlling many essential cellular processes, including cell development, cell-cycle control, and the cellular response to variations in environmental conditions. Genes are regulated by transcription factors and other genes/proteins via a complex interconnection network. Such regulatory links may be predicted using microarray expression data, but most regulation models assume transcription factor independence, which leads to spurious links when many genes have highly correlated expression levels. Results: We propose a new algorithm to infer combinatorial control networks from gene-expression data. Based on a simple model of combinatorial gene regulation, it includes a message-passing approach which avoids explicit sampling over putative gene-regulatory networks. This algorithm is shown to recover the structure of a simple artificial cell-cycle network model for baker's yeast. It is then applied to a large-scale yeast gene expression dataset in order to identify combinatorial regulations, and to a data set of direct medical interest, namely the Pleiotropic Drug Resistance (PDR) network. Conclusions: The algorithm we designed is able to recover biologically meaningful interactions, as shown by recent experimental results [1]. Moreover, new cases of combinatorial control are predicted, showing how simple models that take this phenomenon into account can lead to informative predictions and allow more putative regulatory interactions to be extracted from microarray databases.

    Clustering with shallow trees

    We propose a new method for hierarchical clustering based on the optimisation of a cost function over trees of limited depth, and we derive a message-passing method that solves it efficiently. The method and algorithm can be interpreted as a natural interpolation between two well-known approaches, namely single linkage and the recently presented Affinity Propagation. Using this general scheme, we analyze three structured biological/medical datasets (human population based on genetic information, proteins based on sequences, and verbal autopsies) and show that the interpolation technique provides new insight. Comment: 11 pages, 7 figures
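
    For orientation, one of the two endpoints named above, single linkage, can be sketched in a few lines: repeatedly merge the two clusters joined by the shortest remaining pairwise distance until k clusters are left. This is only the classical baseline, not the message-passing method of the abstract; the function name, the 1-D points and the union-find implementation are assumptions made for the example.

```python
def single_linkage(points, k):
    """Cluster 1-D points into k groups by single linkage.

    Returns one cluster label per point (the index of a representative).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, shortest first.
    edges = sorted(
        (abs(points[i] - points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )

    clusters = n
    for _, i, j in edges:
        if clusters <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            # The shortest edge between two different clusters merges them.
            parent[rj] = ri
            clusters -= 1
    return [find(i) for i in range(n)]


if __name__ == "__main__":
    print(single_linkage([0.0, 0.1, 0.2, 5.0, 5.1], k=2))
```

    Because a single short edge is enough to merge two groups, single linkage tends to produce elongated clusters, whereas Affinity Propagation ties every point to an exemplar.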

    Evaluating the pronunciation of proper names by four French grapheme-to-phoneme converters

    International Speech Communication Association (ISCA) - International Astronautical Federation. ISBN-13: 9781604234480. This article reports on the results of a cooperative evaluation of grapheme-to-phoneme (GP) conversion for proper names in French. This work was carried out within the framework of a general evaluation campaign of various speech and language processing devices, including text-to-speech synthesis. The corpus and the methodology are described. The results of 4 systems are analysed: with 12-20% word error rates on a list of 8,000 proper names, they give a fairly accurate picture of the progress achieved, the state of the art and the problems still to be solved in the domain of GP conversion in French. In addition, the resources and collected data will be made available to the scientific and industrial community, in order to be re-used in future benchmarks.

    Finding undetected protein associations in cell signaling by belief propagation

    External information propagates in the cell mainly through signaling cascades and transcriptional activation, allowing the cell to react to a wide spectrum of environmental changes. High-throughput experiments identify numerous molecular components of such cascades that may, however, interact through unknown partners. Some of them may be detected using data coming from the integration of a protein-protein interaction network and mRNA expression profiles. This inference problem can be mapped onto the problem of finding appropriate optimal connected subgraphs of a network defined by these datasets. The optimization procedure turns out to be computationally intractable in general. Here we present a new distributed algorithm for this task, inspired by statistical physics, and apply this scheme to alpha factor and drug perturbation data in yeast. We identify the role of the COS8 protein, a member of a gene family of previously unknown function, and validate the results by genetic experiments. The algorithm we present is especially suited to very large datasets, can run in parallel, and can be adapted to other problems in systems biology. On renowned benchmarks it outperforms other algorithms in the field. Comment: 6 pages, 3 figures, 1 table, Supporting Information

    Genome Expression Dynamics Reveal the Parasitism Regulatory Landscape of the Root-Knot Nematode Meloidogyne incognita and a Promoter Motif Associated with Effector Genes.

    Root-knot nematodes (genus Meloidogyne) are the major contributor to crop losses caused by nematodes. These nematodes secrete effector proteins into the plant, derived from two sets of pharyngeal gland cells, to manipulate host physiology and immunity. Successful completion of the life cycle, involving successive molts from egg to adult, covers morphologically and functionally distinct stages and requires precise control of gene expression, including that of effector genes. The details of how root-knot nematodes regulate transcription remain sparse. Here, we report a life stage-specific transcriptome of Meloidogyne incognita. Combined with an available annotated genome, we explore the spatio-temporal regulation of gene expression. We reveal gene expression clusters and predicted functions that accompany the major developmental transitions. Focusing on effectors, we identify a putative cis-regulatory motif associated with expression in the dorsal glands, providing an insight into effector regulation. We combine the presence of this motif with several other criteria to predict a novel set of putative dorsal gland effectors. Finally, we show this motif, and thereby its utility, is broadly conserved across the Meloidogyne genus, and we name it Mel-DOG. Taken together, we provide the first genome-wide analysis of spatio-temporal gene expression in a root-knot nematode and identify a new set of candidate effector genes that will guide future functional analyses.

    Beyond inverse Ising model: structure of the analytical solution for a class of inverse problems

    I consider the problem of deriving the couplings of a statistical model from measured correlations, a task which generalizes the well-known inverse Ising problem. After recalling that this problem can be mapped onto that of expressing the entropy of a system as a function of its corresponding observables, I show the conditions under which this can be done without resorting to iterative algorithms. I find that inverse problems are local (the inverse Fisher information is sparse) whenever the corresponding models have a factorized form, and the entropy can be split into a sum of small cluster contributions. I illustrate these ideas through two examples (the Ising model on a tree and the one-dimensional periodic chain with arbitrary order interaction) and support the results with numerical simulations. The extension of these methods to more general scenarios is finally discussed. Comment: 15 pages, 6 figures
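
    The tree example can be made concrete with a standard identity: on a tree, the marginal of two neighbouring spins has the form p(s_i, s_j) proportional to exp(J_ij s_i s_j + a s_i + b s_j), so J_ij = (1/4) ln[p(+,+) p(-,-) / (p(+,-) p(-,+))], with p(s_i, s_j) = (1 + m_i s_i + m_j s_j + c_ij s_i s_j) / 4. The sketch below checks this on a three-spin open chain by exact enumeration. It is an illustration under stated assumptions, not code from the paper: the function names and the numerical couplings and fields are made up, and c_ij denotes the raw correlation <s_i s_j>, not the connected one.

```python
import itertools
import math


def chain_moments(j01, j12, fields):
    """Exact magnetizations and neighbour correlations of a 3-spin open chain."""
    z = 0.0
    m = [0.0, 0.0, 0.0]
    c01 = c12 = 0.0
    for s in itertools.product((-1, 1), repeat=3):
        # Log of the Boltzmann weight (inverse temperature absorbed in J and h).
        log_weight = (
            j01 * s[0] * s[1]
            + j12 * s[1] * s[2]
            + sum(h * si for h, si in zip(fields, s))
        )
        w = math.exp(log_weight)
        z += w
        for i in range(3):
            m[i] += w * s[i]
        c01 += w * s[0] * s[1]
        c12 += w * s[1] * s[2]
    return [x / z for x in m], c01 / z, c12 / z


def pair_coupling(mi, mj, cij):
    """Coupling between tree neighbours from local moments only.

    Each factor is four times a two-spin marginal probability:
    (1 + mi*si + mj*sj + cij*si*sj) for the four sign choices.
    """
    num = (1 + mi + mj + cij) * (1 - mi - mj + cij)
    den = (1 + mi - mj - cij) * (1 - mi + mj - cij)
    return 0.25 * math.log(num / den)


if __name__ == "__main__":
    m, c01, c12 = chain_moments(0.5, -0.3, (0.2, -0.1, 0.4))
    print(pair_coupling(m[0], m[1], c01), pair_coupling(m[1], m[2], c12))
```

    Each coupling is recovered from the moments of its own two spins alone, with no iteration, which is the sense in which the inverse problem is local on a tree.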

    Genome sequences of two novel phages infecting marine roseobacters

    Two bacteriophages, DSS3Φ2 and EE36Φ1, which infect marine roseobacters Silicibacter pomeroyi DSS-3 and Sulfitobacter sp. EE-36, respectively, were isolated from Baltimore Inner Harbor water. These two roseophages resemble bacteriophage N4, a large, short-tailed phage infecting Escherichia coli K12, in terms of their morphology and genomic structure. The full genome sequences of DSS3Φ2 and EE36Φ1 reveal that their genome sizes are 74.6 and 73.3 kb, respectively, and they both contain a highly conserved N4-like DNA replication and transcription system. Both roseophages contain a large virion-encapsidated RNA polymerase gene (> 10 kb), which was first discovered in N4. DSS3Φ2 and EE36Φ1 also possess several genes (i.e. ribonucleotide reductase and thioredoxin) that are most similar to the genes in roseobacters. Overall, the two roseophages are very closely related, and share 80–94% nucleotide sequence identity over 85% of their ORFs. This is the first report of N4-like phages infecting marine bacteria and the second report of an N4-like phage since the discovery of phage N4 40 years ago. The finding of these two N4-like roseophages will allow us to further explore the specific phage–host interaction and evolution for this unique group of bacteriophages.

    Reverse Engineering Gene Networks with ANN: Variability in Network Inference Algorithms

    Motivation: Reconstructing the topology of a gene regulatory network is one of the key tasks in systems biology. Despite the wide variety of proposed methods, very little work has been dedicated to the assessment of their stability properties. Here we present a methodical comparison of the performance of a novel method (RegnANN) for gene network inference based on multilayer perceptrons with three reference algorithms (ARACNE, CLR, KELLER), focusing our analysis on the prediction variability induced by both the intrinsic structure of the network and the available data. Results: The extensive evaluation on both synthetic data and a selection of gene modules of "Escherichia coli" indicates that all the algorithms suffer from instability and variability issues with regard to the reconstruction of the topology of the network. This instability makes it objectively very hard to establish which method performs best. Nevertheless, RegnANN shows MCC scores that compare very favorably with all the other inference methods tested. Availability: The software for the RegnANN inference algorithm is distributed under GPL3 and is available at the corresponding author's home page (http://mpba.fbk.eu/grimaldi/regnann-supmat
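
    The MCC (Matthews correlation coefficient) mentioned above scores a predicted network against a reference by treating every candidate edge as a binary decision: MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below is a generic implementation for illustration, not the evaluation code of the paper; the function name, the flat 0/1 edge encoding and the convention of returning 0 when the denominator vanishes are assumptions.

```python
import math


def mcc(true_edges, predicted_edges):
    """Matthews correlation coefficient between two 0/1 edge lists."""
    tp = fp = tn = fn = 0
    for t, p in zip(true_edges, predicted_edges):
        if t and p:
            tp += 1  # edge present and predicted
        elif p:
            fp += 1  # edge predicted but absent
        elif t:
            fn += 1  # edge present but missed
        else:
            tn += 1  # edge absent and not predicted
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: score 0 when a row or column of the table is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    truth = [1, 1, 0, 0, 1, 0]
    guess = [1, 0, 0, 1, 1, 0]
    print(mcc(truth, guess))
```

    The score ranges from -1 to 1, and because it uses all four cells of the confusion table it stays informative when true edges are rare, as they are in sparse regulatory networks.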