Effective ambiguity checking in biosequence analysis
BACKGROUND: Ambiguity is a problem in biosequence analysis that arises in various analysis tasks solved via dynamic programming, and in particular in the modeling of families of RNA secondary structures with stochastic context free grammars. Several types of analysis are invalidated by the presence of ambiguity. As this problem inherits undecidability (as we show here) from the corresponding problem for context free languages, there is no complete algorithmic solution to the problem of ambiguity checking. RESULTS: We explain frequently observed sources of ambiguity and show how to avoid them. We suggest four testing procedures that may help to detect ambiguity when present, including a just-in-time test that permits working safely with a potentially ambiguous grammar. We introduce, for the special case of stochastic context free grammars and RNA structure modeling, an automated partial procedure for proving non-ambiguity. It is used to demonstrate non-ambiguity for several relevant grammars. CONCLUSION: Our mechanical proof procedure and our testing methods provide a powerful arsenal of methods to ensure non-ambiguity.
Locomotif - a graphical programming system for RNA motif search
Reeder J. Locomotif - a graphical programming system for RNA motif search. Bielefeld (Germany): Bielefeld University; 2006. In this thesis, I am presenting the results of my work in designing, implementing and installing a software environment for RNA motif searches: Locomotif. It includes a visual editor for motif definition, translation of the motif structure to XML code and client-server interactions, and further, translation of the XML code to ADP and compilation to C.
Locomotif: from graphical motif description to RNA motif search
Reeder J, Reeder J, Giegerich R. Locomotif: from graphical motif description to RNA motif search. In: Bioinformatics. Vol 23. OXFORD UNIV PRESS; 2007: I392-I400. Motivation and Results: Motivated by the recent rise of interest in small regulatory RNAs, we present Locomotif, a new approach for locating RNA motifs that goes beyond the previous ones in three ways: (1) motif search is based on efficient dynamic programming algorithms, incorporating the established thermodynamic model of RNA secondary structure formation. (2) motifs are described graphically, using a Java-based editor, and search algorithms are derived from the graphics in a fully automatic way. The editor allows us to draw secondary structures, annotated with size and sequence information. They closely resemble the established, but informal way in which RNA motifs are communicated in the literature. Thus, the learning effort for Locomotif users is minimal. (3) Locomotif employs a client-server approach. Motifs are designed by the user locally. Search programs are generated and compiled on a bioinformatics server. They are made available both for execution on the server, and for download as C source code plus an appropriate makefile.
A graphical programming system for molecular motif search
Reeder J, Giegerich R. A graphical programming system for molecular motif search. In: Proceedings of the 5th International Conference on Generative Programming and Component Engineering. ACM Press; 2006: 131-140
Contents
I thank my supervisor, Robert Giegerich, for providing me with an interesting research topic and for his support throughout the years, both scientifically and by offering me the chance to continue my work while taking care of my daughter. Thanks also to Jens Stoye for appraising this thesis. I thank Peter Steffen, who developed the ADP compiler that is an integral part of the Locomotif system and adapted it for my needs. I thank Jan Krüger for help in installing the system on the webserver and for guidance in XML and Java questions. Nan Zhang gave me a headstart on the XML schema. I appreciate the financial support from the DFG. I enjoyed being part of the GK Bioinformatik and value the travel opportunities and scientific merit of the BREW workshops in Helsinki and Berlin. I thank Jens Reeder for many ideas, discussions, hours spent debugging the Locomotif system and proofreading this thesis, but most of all for his enduring emotional support over all these years. I am deeply grateful for my daughter Emma.
Analysis of protein-altering variants in telomerase genes and their association with MUC5B common variant status in patients with idiopathic pulmonary fibrosis: a candidate gene sequencing study
Background: Idiopathic pulmonary fibrosis (IPF) risk has a strong genetic component. Studies have implicated variations at several loci, including TERT, surfactant genes, and a single nucleotide polymorphism at chr11p15 (rs35705950) in the intergenic region between TOLLIP and MUC5B. Patients with IPF who have risk alleles at rs35705950 have longer survival from the time of IPF diagnosis than do patients homozygous for the non-risk allele, whereas patients with shorter telomeres have shorter survival times. We aimed to assess whether rare protein-altering variants in genes regulating telomere length are enriched in patients with IPF homozygous for the non-risk alleles at rs35705950. Methods: Between Nov 1, 2014, and Nov 1, 2016, we assessed blood samples from patients aged 40 years or older and of European ancestry with sporadic IPF from three international phase 3 clinical trials (INSPIRE, CAPACITY, ASCEND), one phase 2 study (RIFF), and US-based observational studies (Vanderbilt Clinical Interstitial Lung Disease Registry and the UCSF Interstitial Lung Disease Clinic registry cohorts) at the Broad Institute (Cambridge, MA, USA) and Human Longevity (San Diego, CA, USA). We also assessed blood samples from non-IPF controls in several clinical trials. We did whole-genome sequencing to assess telomere length and identify rare protein-altering variants, stratified by rs35705950 genotype. We also assessed rare functional variation in TERT exons and compared telomere length and disease progression across genotypes. Findings: We assessed samples from 1510 patients with IPF and 1874 non-IPF controls. 30 (3%) of 1046 patients with an rs35705950 risk allele had a rare protein-altering variant in TERT compared with 34 (7%) of 464 non-risk allele carriers (odds ratio 0·40 [95% CI 0·24-0·66], p=0·00039). Subsequent analyses identified enrichment of rare protein-altering variants in PARN and RTEL1, and rare variation in TERC in patients with IPF compared with controls.
We expanded our study population to provide a more accurate estimation of rare variant frequency in these four loci, and to calculate telomere length. The proportion of patients with at least one rare variant in TERT, PARN, TERC, or RTEL1 was higher in patients with IPF than in controls (149 [9%] of 1739 patients vs 205 [2%] of 8645 controls, p=2·44 × 10⁻⁸). Patients with IPF who had a variant in any of the four identified telomerase component genes had telomeres that were 3·69-16·10% shorter than patients without a variant in any of the four genes and had an earlier mean age of disease onset than patients without one or more variants (65·1 years [SD 7·8] vs 67·1 years [7·9], p=0·004). In the placebo arms of clinical trials, shorter telomeres were significantly associated with faster disease progression (1·7% predicted forced vital capacity per kb per year, p=0·002). Pirfenidone had treatment benefit regardless of telomere length (p=4·24 × 10⁻⁸ for telomere length lower than the median, p=0·0044 for telomere length greater than the median). Interpretation: Rare protein-altering variants in TERT, PARN, TERC, and RTEL1 are enriched in patients with IPF compared with controls, and, in the case of TERT, particularly in individuals without a risk allele at the rs35705950 locus. This suggests that multiple genetic factors contribute to sporadic IPF, which might implicate distinct mechanisms of pathogenesis and disease progression. Funding: Genentech, National Institutes of Health, Francis Family Foundation, Pulmonary Fibrosis Foundation, Nina Ireland Program for Lung Health, US Department of Veterans Affairs.