“One code to find them all”: a perl tool to conveniently parse RepeatMasker output files
Background: Of the different bioinformatic methods used to recover transposable elements (TEs) in genome sequences, one of the most commonly used procedures is the homology-based method proposed by the RepeatMasker program. RepeatMasker generates several output files, including the .out file, which provides annotations for all detected repeats in a query sequence. However, a remaining challenge consists of identifying the different copies of TEs that correspond to the identified hits. This step is essential for any evolutionary/comparative analysis of the different copies within a family. Several situations can lead to multiple hits corresponding to a single copy of an element, such as the presence of large deletions/insertions or undetermined bases, and distinct consensus sequences corresponding to a single full-length sequence (as for long terminal repeat (LTR) retrotransposons). These possibilities must be taken into account to determine the exact number of TE copies. Results: We have developed a Perl tool that parses the RepeatMasker .out file to better determine the number and positions of TE copies in the query sequence, in addition to computing quantitative information for the different families. To determine the accuracy of the program, we tested it on several RepeatMasker .out files corresponding to two organisms (Drosophila melanogaster and Homo sapiens) for which the TE content has already been largely described and which present great differences in genome size, TE content, and TE families. Conclusions: Our tool provides access to detailed information concerning the TE content in a genome at the family level from the .out file of RepeatMasker. This information includes the exact position and orientation of each copy, its proportion in the query sequence, and its quality compared to the reference element.
In addition, our tool allows a user to directly retrieve the sequence of each copy and obtain the same detailed information at the family level when a local library with incomplete TE class/subclass information was used with RepeatMasker. We hope that this tool will be helpful for people working on the distribution and evolution of TEs within genomes.
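As a rough illustration of the parsing step this abstract describes, the Python sketch below reads the whitespace-separated columns of a RepeatMasker .out file and sums the annotated base pairs per repeat class/family. It is not the authors' Perl tool: the function names are hypothetical, and the naive per-hit sum assumes hits do not overlap and makes no attempt to merge fragmented hits into single copies, which is precisely the harder problem the tool addresses.

```python
from collections import defaultdict

def parse_out_line(line):
    """Parse one data line of a RepeatMasker .out file.

    Returns None for header or blank lines. Column layout assumed:
    score, %div, %del, %ins, query, begin, end, (left), strand,
    repeat name, class/family, three repeat-position columns, ID.
    """
    fields = line.split()
    # Data lines have at least 15 columns and start with an integer score.
    if len(fields) < 15 or not fields[0].isdigit():
        return None
    return {
        "query": fields[4],
        "start": int(fields[5]),
        "end": int(fields[6]),
        "strand": "-" if fields[8] == "C" else "+",  # "C" marks the complement strand
        "repeat": fields[9],
        "family": fields[10],
        "id": fields[14],
    }

def bases_per_family(lines):
    """Sum hit lengths (end - start + 1) per class/family over all data lines."""
    totals = defaultdict(int)
    for line in lines:
        hit = parse_out_line(line)
        if hit is not None:
            totals[hit["family"]] += hit["end"] - hit["start"] + 1
    return dict(totals)
```

Calling bases_per_family(open(path)) on a .out file would give a per-family total; grouping hits that share the same ID column would be one possible first step toward reconstructing copies.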
Inference of sparse combinatorial-control networks from gene-expression data: a message passing approach
Background: Transcriptional gene regulation is one of the most important mechanisms in controlling many essential cellular processes, including cell development, cell-cycle control, and the cellular response to variations in environmental conditions. Genes are regulated by transcription factors and other genes/proteins via a complex interconnection network. Such regulatory links may be predicted using microarray expression data, but most regulation models suppose transcription factor independence, which leads to spurious links when many genes have highly correlated expression levels. Results: We propose a new algorithm to infer combinatorial control networks from gene-expression data. Based on a simple model of combinatorial gene regulation, it includes a message-passing approach which avoids explicit sampling over putative gene-regulatory networks. This algorithm is shown to recover the structure of a simple artificial cell-cycle network model for baker's yeast. It is then applied to a large-scale yeast gene expression dataset in order to identify combinatorial regulations, and to a data set of direct medical interest, namely the Pleiotropic Drug Resistance (PDR) network. Conclusions: The algorithm we designed is able to recover biologically meaningful interactions, as shown by recent experimental results [1]. Moreover, new cases of combinatorial control are predicted, showing how simple models that take this phenomenon into account can lead to informative predictions and allow more putative regulatory interactions to be extracted from microarray databases.
Clustering with shallow trees
We propose a new method for hierarchical clustering based on the optimisation
of a cost function over trees of limited depth, and we derive a
message-passing method that solves it efficiently. The method and
algorithm can be interpreted as a natural interpolation between two well-known
approaches, namely single linkage and the recently presented Affinity
Propagation. We analyze with this general scheme three biological/medical
structured datasets (human population based on genetic information, proteins
based on sequences and verbal autopsies) and show that the interpolation
technique provides new insight. Comment: 11 pages, 7 figures
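For orientation, one endpoint of the interpolation mentioned in this abstract, single linkage, can be sketched in a few lines of Python. This is not the paper's message-passing method: it is the textbook procedure that merges the closest pairs first (Kruskal's algorithm with a union-find structure) and stops once k clusters remain. The function name and the choice to pass a full distance matrix are assumptions made for illustration.

```python
def single_linkage(dist, k):
    """Cluster n points into k groups by single linkage.

    dist is a symmetric n x n matrix of pairwise distances.
    Returns a list of cluster labels, numbered in order of first appearance.
    """
    n = len(dist)
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All pairs, shortest distance first.
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    clusters = n
    for _, i, j in edges:
        if clusters <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # merging two different clusters
            parent[ri] = rj
            clusters -= 1

    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]
```

Because single linkage only ever looks at the shortest link between two groups, it can chain distant points together through intermediates; exemplar-based methods such as Affinity Propagation avoid that by tying every point to a representative.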
Evaluating the pronunciation of proper names by four French grapheme-to-phoneme converters
International Speech Communication Association (ISCA) - International Astronautical Federation. ISBN-13: 9781604234480. This article reports on the results of a cooperative evaluation of grapheme-to-phoneme (GP) conversion for proper names in French. This work was carried out within the framework of a general evaluation campaign of various speech and language processing devices, including text-to-speech synthesis. The corpus and the methodology are described. The results of 4 systems are analysed: with 12-20% word error rates on a list of 8,000 proper names, they give a fairly accurate picture of the progress achieved, the state of the art and the problems still to be solved in the domain of GP conversion in French. In addition, the resources and collected data will be made available to the scientific and industrial community, in order to be re-used in future benchmarks.
Finding undetected protein associations in cell signaling by belief propagation
External information propagates in the cell mainly through signaling cascades
and transcriptional activation, allowing it to react to a wide spectrum of
environmental changes. High throughput experiments identify numerous molecular
components of such cascades that may, however, interact through unknown
partners. Some of them may be detected using data coming from the integration
of a protein-protein interaction network and mRNA expression profiles. This
inference problem can be mapped onto the problem of finding appropriate optimal
connected subgraphs of a network defined by these datasets. The optimization
procedure turns out to be computationally intractable in general. Here we
present a new distributed algorithm for this task, inspired by statistical
physics, and apply this scheme to alpha factor and drug perturbations data in
yeast. We identify the role of the COS8 protein, a member of a gene family of
previously unknown function, and validate the results by genetic experiments.
The algorithm we present is especially suited for very large datasets, can run
in parallel, and can be adapted to other problems in systems biology. On
renowned benchmarks it outperforms other algorithms in the field. Comment: 6 pages, 3 figures, 1 table, Supporting Information
Genome Expression Dynamics Reveal the Parasitism Regulatory Landscape of the Root-Knot Nematode Meloidogyne incognita and a Promoter Motif Associated with Effector Genes.
Root-knot nematodes (genus Meloidogyne) are the major contributor to crop losses caused by nematodes. These nematodes secrete effector proteins into the plant, derived from two sets of pharyngeal gland cells, to manipulate host physiology and immunity. Successful completion of the life cycle, involving successive molts from egg to adult, covers morphologically and functionally distinct stages and requires precise control of gene expression, including effector genes. The details of how root-knot nematodes regulate transcription remain sparse. Here, we report a life stage-specific transcriptome of Meloidogyne incognita. Combined with an available annotated genome, we explore the spatio-temporal regulation of gene expression. We reveal gene expression clusters and predicted functions that accompany the major developmental transitions. Focusing on effectors, we identify a putative cis-regulatory motif associated with expression in the dorsal glands, providing an insight into effector regulation. We combine the presence of this motif with several other criteria to predict a novel set of putative dorsal gland effectors. Finally, we show this motif, and thereby its utility, is broadly conserved across the Meloidogyne genus, and we name it Mel-DOG. Taken together, we provide the first genome-wide analysis of spatio-temporal gene expression in a root-knot nematode and identify a new set of candidate effector genes that will guide future functional analyses.
Beyond inverse Ising model: structure of the analytical solution for a class of inverse problems
I consider the problem of deriving couplings of a statistical model from
measured correlations, a task which generalizes the well-known inverse Ising
problem. After recalling that this problem can be mapped onto that of
expressing the entropy of a system as a function of its corresponding
observables, I show the conditions under which this can be done without
resorting to iterative algorithms. I find that inverse problems are local (the
inverse Fisher information is sparse) whenever the corresponding models have a
factorized form, and the entropy can be split into a sum of small cluster
contributions. I illustrate these ideas through two examples (the Ising model
on a tree and the one-dimensional periodic chain with arbitrary order
interaction) and support the results with numerical simulations. The extension
of these methods to more general scenarios is finally discussed. Comment: 15 pages, 6 figures
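The first example named in this abstract, the Ising model on a tree, admits a well-known closed-form inversion that makes the "locality" claim concrete: each coupling depends only on the statistics of the two spins it connects. The Python sketch below is a standard textbook result rather than the paper's own derivation. It assumes ±1 spins and the convention P(s) ∝ exp(Σ J_ij s_i s_j + Σ h_i s_i); the function names are hypothetical.

```python
import math
from itertools import product

def pair_marginal(m_i, m_j, c_ij, s_i, s_j):
    """P(s_i, s_j) for +/-1 spins, from magnetizations m_i, m_j and c_ij = <s_i s_j>."""
    return (1 + m_i * s_i + m_j * s_j + c_ij * s_i * s_j) / 4.0

def tree_coupling(m_i, m_j, c_ij):
    """Coupling on a tree edge (i, j): J_ij = (1/4) * sum over s_i, s_j of s_i s_j ln P(s_i, s_j).

    Exact when the interaction graph is a tree; no iterative fitting is needed.
    Requires all four pair probabilities to be strictly positive.
    """
    return 0.25 * sum(
        s_i * s_j * math.log(pair_marginal(m_i, m_j, c_ij, s_i, s_j))
        for s_i, s_j in product((1, -1), repeat=2)
    )
```

The formula works because on a tree the joint distribution factorizes into pair and single-site marginals, so the s_i s_j coefficient of ln P(s_i, s_j) is exactly the coupling; on graphs with loops the same expression is only an approximation.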
Genome sequences of two novel phages infecting marine roseobacters
Two bacteriophages, DSS3Φ2 and EE36Φ1, which infect marine roseobacters Silicibacter pomeroyi DSS-3 and Sulfitobacter sp. EE-36, respectively, were isolated from Baltimore Inner Harbor water. These two roseophages resemble bacteriophage N4, a large, short-tailed phage infecting Escherichia coli K12, in terms of their morphology and genomic structure. The full genome sequences of DSS3Φ2 and EE36Φ1 reveal that their genome sizes are 74.6 and 73.3 kb, respectively, and they both contain a highly conserved N4-like DNA replication and transcription system. Both roseophages contain a large virion-encapsidated RNA polymerase gene (> 10 kb), which was first discovered in N4. DSS3Φ2 and EE36Φ1 also possess several genes (i.e. ribonucleotide reductase and thioredoxin) that are most similar to the genes in roseobacters. Overall, the two roseophages are very closely related, and share 80–94% nucleotide sequence identity over 85% of their ORFs. This is the first report of N4-like phages infecting marine bacteria and the second report of an N4-like phage since the discovery of phage N4 40 years ago. The finding of these two N4-like roseophages will allow us to further explore the specific phage–host interaction and evolution for this unique group of bacteriophages.
Reverse Engineering Gene Networks with ANN: Variability in Network Inference Algorithms
Motivation: Reconstructing the topology of a gene regulatory network is one
of the key tasks in systems biology. Despite the wide variety of proposed
methods, very little work has been dedicated to the assessment of their
stability properties. Here we present a methodical comparison of the
performance of a novel method (RegnANN) for gene network inference based on
multilayer perceptrons with three reference algorithms (ARACNE, CLR, KELLER),
focussing our analysis on the prediction variability induced by both the
network intrinsic structure and the available data.
Results: The extensive evaluation on both synthetic data and a selection of
gene modules of "Escherichia coli" indicates that all the algorithms suffer from
instability and variability issues with regard to the reconstruction of the
topology of the network. This instability makes it objectively very hard
to establish which method performs best. Nevertheless, RegnANN shows MCC
scores that compare very favorably with all the other inference methods tested.
Availability: The software for the RegnANN inference algorithm is distributed
under GPL3 and is available at the corresponding author's home page
(http://mpba.fbk.eu/grimaldi/regnann-supmat
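The MCC (Matthews correlation coefficient) scores mentioned in this abstract compare a predicted network topology with the true one, treating each possible link as a binary classification. The Python sketch below shows one way to compute it. It assumes undirected networks given as 0/1 adjacency matrices and counts each node pair once; the function name is hypothetical, and whether the paper scores directed or undirected links is not stated here.

```python
import math

def mcc_from_adjacency(true_adj, pred_adj):
    """Matthews correlation coefficient between two undirected 0/1 adjacency matrices.

    Only the upper triangle is compared (each pair once, no self-loops).
    Returns 0.0 when the coefficient is undefined (zero denominator).
    """
    tp = tn = fp = fn = 0
    n = len(true_adj)
    for i in range(n):
        for j in range(i + 1, n):
            t, p = bool(true_adj[i][j]), bool(pred_adj[i][j])
            if t and p:
                tp += 1      # link present and predicted
            elif not t and not p:
                tn += 1      # link absent and not predicted
            elif p:
                fp += 1      # predicted but absent
            else:
                fn += 1      # present but missed
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC ranges from -1 (every pair misclassified) through 0 (no better than chance) to 1 (perfect reconstruction), and unlike plain accuracy it stays informative when true links are rare, as they are in sparse regulatory networks.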