
    Effective p-value computations using Finite Markov Chain Imbedding (FMCI): application to local score and to pattern statistics

    Finite Markov Chain Imbedding (FMCI) is a classical approach to complex combinatorial problems on sequences. To yield efficient algorithms, such approaches are known to need rewriting as recursive relations first. We propose general recursive algorithms that compute, in a numerically stable manner, the exact cumulative distribution function (CDF) or complementary CDF (CCDF). These algorithms are then applied to two particular cases: the local score of one sequence and pattern statistics. In both cases, asymptotic developments are derived. For the local score, our new approach allows, for the first time, the computation of exact p-values in a practical study (finding hydrophobic segments in a protein database) where only approximations were previously available. In this study, the asymptotic approximations appear to be completely unreliable for 99.5% of the considered sequences. For pattern statistics, the new FMCI algorithms dramatically outperform the previous ones: they are more reliable, easier to implement, faster, and require less memory.
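To make the idea of an exact local-score p-value concrete, here is a minimal sketch of the textbook Markov chain formulation, not the authors' algorithm. It assumes i.i.d. integer scores and an integer threshold `a >= 1`; the function name `local_score_pvalue` and the dictionary input format are illustrative choices. The chain follows the Lindley process W_k = max(0, W_{k-1} + X_k), with an absorbing state reached as soon as W_k >= a, so the absorbed mass after n steps is P(H_n >= a).

```python
def local_score_pvalue(score_dist, n, a):
    """Return P(H_n >= a) for a sequence of n i.i.d. integer scores.

    score_dist: dict mapping integer score -> probability (assumed format).
    a: integer threshold, assumed to be >= 1.
    """
    # States 0..a-1 hold the current value of the Lindley process;
    # state a is absorbing and means "the local score has reached a".
    state = [0.0] * (a + 1)
    state[0] = 1.0
    for _ in range(n):
        nxt = [0.0] * (a + 1)
        nxt[a] = state[a]  # mass already absorbed stays absorbed
        for w in range(a):
            if state[w] == 0.0:
                continue
            for x, p in score_dist.items():
                target = min(a, max(0, w + x))  # reflect at 0, absorb at a
                nxt[target] += state[w] * p
        state = nxt
    return state[a]


# Fair +1/-1 scores, length 3, threshold 2: three of the eight sign
# sequences contain two consecutive +1 scores.
print(local_score_pvalue({1: 0.5, -1: 0.5}, n=3, a=2))  # 0.375
```

Propagating the state vector one position at a time, rather than forming a power of the full transition matrix, is the simplest recursive form of the computation; it costs on the order of n times a times the number of distinct scores.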

    Systematic evaluation of patient-reported outcome (PRO) protocol content and reporting in UK cancer clinical trials: the EPiC study protocol.

    Emerging evidence suggests that patient-reported outcome (PRO)-specific information may be omitted in trial protocols and that PRO results are poorly reported, limiting the use of PRO data to inform cancer care. This study aims to evaluate the standards of PRO-specific content in UK cancer trial protocols and their arising publications and to highlight examples of best-practice PRO protocol content and reporting where they occur. The objective of this study is to determine whether these early findings are generalisable to UK cancer trials and, if so, how best to bring about future improvements in clinical trials methodology to enhance the way PROs are assessed, managed and reported. Trials in which the primary end point is based on a PRO will have more complete PRO protocol and publication components than trials in which PROs are secondary end points. Completed National Institute for Health Research (NIHR) Portfolio Cancer clinical trials (all cancer specialities/age-groups) will be included if they contain a primary/secondary PRO end point. The NIHR portfolio includes cancer trials, supported by a range of funders, adjudged as high-quality clinical research studies. The sample will be drawn from studies completed between 31 December 2000 and 1 March 2014 (n=1141) to allow sufficient time for completion of the final trial report and publication. Two reviewers will then review the protocols and arising publications of included trials to: (1) determine the completeness of their PRO-specific protocol content; (2) determine the proportion and completeness of PRO reporting in UK cancer trials; and (3) model factors associated with PRO protocol and reporting completeness and with PRO reporting proportion. The study was approved by the ethics committee at the University of Birmingham (ERN_15-0311). Trial findings will be disseminated via presentations at local, national and international conferences, peer-reviewed journals and social media, including the CPROR Twitter account and UOB departmental website (http://www.birmingham.ac.uk/cpro0r).

    Representative transcript sets for evaluating a translational initiation sites predictor

    BACKGROUND: Translational initiation site (TIS) prediction is an important and actively studied topic in bioinformatics. To complete a comparative analysis, it is desirable to have several benchmark data sets that can be used to test the effectiveness of different algorithms. An ideal benchmark data set should be reliable, representative and readily available. Preferably, proteins encoded by members of the data set should also be representative of the protein population actually expressed in cellular specimens. RESULTS: In this paper, we report a general algorithm for constructing a reliable sequence collection that only includes mRNA sequences whose corresponding protein products present an average profile of the general protein population of a given organism, with respect to three major structural parameters. Four representative transcript collections, each derived from a model organism, have been obtained following the algorithm we propose. Evaluation of these data sets shows that they are reasonable representations of the spectrum of proteins obtained from cellular proteomic studies. Six state-of-the-art predictors have been used to test the usefulness of the proposed construction algorithm. A comparative study reporting the predictors' performance on our data sets, as well as on three other existing benchmark collections, has demonstrated the merits of our data sets as benchmark testing collections. CONCLUSION: The proposed data set construction algorithm has proved to be a general and widely applicable scheme. Our comparison with published proteomic studies has shown that the expression of our data set of transcripts generates a polypeptide population that is representative of that obtained from evaluation of biological specimens. Our data set thus represents "real world" transcripts that will allow more accurate evaluation of algorithms dedicated to identification of TISs, as well as other translational regulatory motifs within mRNA sequences. Our algorithm aims at compiling a redundancy-free data set by removing redundant copies of homologous proteins. The existence of such data sets may be useful for conducting statistical analyses of protein sequence-structure relations. At the current stage, our approach focuses on obtaining an "average" protein data set for any particular organism without introducing much selection bias. However, with the three major protein structural parameters deeply integrated into the scheme, it would be a trivial task to extend the current method to obtain a more selective protein data set, which may facilitate the study of some particular protein structure.

    Electronic Patient Reporting of Adverse Events and Quality of Life: A Prospective Feasibility Study in General Oncology

    PURPOSE: Adverse event (AE) reporting is essential in clinical trials. Clinician interpretation can result in under-reporting; therefore, the value of patient self-reporting has been recognized. The National Cancer Institute has developed a Patient-Reported Outcomes version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE) for direct patient AE reporting. A nonrandomized prospective cohort feasibility study aimed to explore the compliance and acceptability of an electronic (Internet or telephone) system for collecting patient self-reported AEs and quality of life (QOL). METHODS: Oncology patients undergoing treatment (chemotherapy, targeted agents, hormone therapy, radiotherapy, and/or surgery) at 2 hospitals were sent automated weekly reminders to complete PRO-CTCAE once a week and QOL (for a maximum of 12 weeks). Patients had to speak/understand English and have access to the Internet or a touch-tone telephone. Primary outcome was compliance (proportion of expected questionnaires), and recruitment rate, attrition, and patient/staff feedback were also explored. RESULTS: Of 520 patients, 249 consented (47.9%; mean age was 62 years, 51% were male, and 70% were married), and 230 remained on the study at week 12. PRO-CTCAE was completed at 2,301 (74.9%) of 3,074 timepoints and QOL at 749 (79.1%) of 947 timepoints. Individual weekly/once every 4 weeks compliance reduced over time but was more than 60% throughout. Of 230 patients, 106 (46.1%) completed 13 or more PRO-CTCAE, and 136 (59.1%) of 230 patients completed 4 QOL questionnaires. Most were completed on the Internet (82.3%; mean age, 60.8 years), which was quicker, but older patients preferred the telephone option (mean age, 70.0 years). Positive feedback was received from patients and staff. CONCLUSION: Self-reporting of AEs and QOL using an electronic home-based system is feasible and acceptable. Implementation of this approach in cancer clinical trials may improve the precision and accuracy of AE reporting.
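The primary outcome above is defined as the proportion of expected questionnaires that were completed. As a small arithmetic check, the sketch below recomputes the reported percentages from the counts given in the abstract; the helper name `pct` and the dictionary labels are illustrative assumptions, not part of the study.

```python
def pct(completed, expected):
    """Return completed/expected as a percentage, rounded to one decimal."""
    return round(100 * completed / expected, 1)


# (numerator, denominator) pairs copied from the abstract.
counts = {
    "consented": (249, 520),
    "PRO-CTCAE timepoints completed": (2301, 3074),
    "QOL timepoints completed": (749, 947),
    "patients with 13+ PRO-CTCAE": (106, 230),
    "patients with 4 QOL questionnaires": (136, 230),
}

for label, (num, den) in counts.items():
    print(f"{label}: {num}/{den} = {pct(num, den)}%")
```

Each recomputed value agrees with the figure quoted in the abstract (47.9%, 74.9%, 79.1%, 46.1% and 59.1%).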

    The complete sequences and gene organisation of the mitochondrial genomes of the heterodont bivalves Acanthocardia tuberculata and Hiatella arctica – and the first record for a putative Atpase subunit 8 gene in marine bivalves

    BACKGROUND: Mitochondrial (mt) gene arrangement is highly variable among molluscs and especially among bivalves. Of the 30 complete molluscan mt-genomes published to date, only one is of a heterodont bivalve, although this is the most diverse taxon in terms of species numbers. We determined the complete sequences of the mitochondrial genomes of Acanthocardia tuberculata and Hiatella arctica (Mollusca, Bivalvia, Heterodonta) and describe their gene contents and genome organisations to assess the variability of these features among the Bivalvia and their value for phylogenetic inference. RESULTS: The size of the mt-genome is 16,104 base pairs (bp) in Acanthocardia tuberculata and 18,244 bp in Hiatella arctica. The Acanthocardia mt-genome contains 12 of the typical protein-coding genes and lacks the Atpase subunit 8 (atp8) gene, as do all previously published marine bivalve mt-genomes. In contrast, a complete atp8 gene is present in Hiatella arctica. In addition, we found a putative truncated atp8 gene when re-annotating the mt-genome of Venerupis philippinarum. Both mt-genomes reported here encode all genes on the same strand and have an additional trnM. In Acanthocardia, several large non-coding regions are present. One of these contains 3.5 nearly identical copies of a 167 bp motif. In Hiatella, the 3' end of the NADH dehydrogenase subunit 6 (nad6) gene is duplicated together with the adjacent non-coding region. The gene arrangement of Hiatella is markedly different from all other known molluscan mt-genomes; that of Acanthocardia shows few identities with that of Venerupis philippinarum. Phylogenetic analyses on amino acid and nucleotide levels robustly support the Heterodonta and the sister group relationship of Acanthocardia and Venerupis. Monophyletic Bivalvia are resolved only by a Bayesian inference of the nucleotide data set. In all other analyses, the two unionid species, being the only ones with genes located on both strands, do not group with the remaining bivalves. CONCLUSION: The two mt-genomes reported here add to and underline the high variability of gene order and presence of duplications in bivalve and molluscan taxa. Some genomic traits, like the loss of the atp8 gene or the encoding of all genes on the same strand, are homoplastic among the Bivalvia. These characters, gene order, and the nucleotide sequence data show considerable potential for resolving phylogenetic patterns at lower taxonomic levels.

    Molecular cloning and expression profiling of a chalcone synthase gene from hairy root cultures of Scutellaria viscidula Bunge

    A cDNA encoding chalcone synthase (CHS), the key enzyme in flavonoid biosynthesis, was isolated from hairy root cultures of Scutellaria viscidula Bunge by rapid amplification of cDNA ends (RACE). The full-length cDNA of S. viscidula CHS, designated as Svchs (GenBank accession no. EU386767), was 1649 bp with a 1170 bp open reading frame (ORF) that corresponded to a deduced protein of 390 amino acid residues, a calculated molecular mass of 42.56 kDa and a theoretical isoelectric point (pI) of 5.79. Multiple sequence alignments showed that SvCHS shared high homology with CHS from other plants. Functional analysis in silico indicated that SvCHS was a hydrophilic protein most likely associated with intermediate metabolism. The active sites of the malonyl-CoA binding motif, coumaroyl pocket and cyclization pocket in CHS of Medicago sativa were also found in SvCHS. Molecular modeling indicated that the secondary structure of SvCHS contained mainly α-helices and random coils. Phylogenetic analysis showed that SvCHS was most closely related to CHS from Scutellaria baicalensis. In agreement with its function as an elicitor-responsive gene, the expression of Svchs was induced and coordinated by methyl jasmonate. To our knowledge, this is the first report to describe the isolation and expression of a gene from S. viscidula.
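The sequence figures quoted above can be related by simple arithmetic. The sketch below is only a consistency check on the reported numbers; the function names are illustrative, and the comparison with a typical mean residue mass of roughly 110 Da is a general rule of thumb rather than a value from the abstract.

```python
CDNA_BP = 1649          # full-length cDNA, from the abstract
ORF_BP = 1170           # open reading frame, from the abstract
RESIDUES = 390          # deduced protein length, from the abstract
MASS_KDA = 42.56        # calculated molecular mass, from the abstract


def codons_in_orf(orf_bp):
    """Number of codons in an ORF whose length is a multiple of three."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return orf_bp // 3


def mean_residue_mass_da(mass_kda, residues):
    """Average mass per residue in daltons."""
    return mass_kda * 1000 / residues


print(codons_in_orf(ORF_BP))                              # 390
print(CDNA_BP - ORF_BP)                                   # 479 bp outside the ORF
print(round(mean_residue_mass_da(MASS_KDA, RESIDUES), 1)) # 109.1
```

The 1170 bp ORF corresponds to 390 codons, which matches the 390 reported residues if the quoted ORF length does not count a stop codon. The remaining 479 bp of the cDNA lie outside the ORF, and the mean residue mass of about 109 Da is in the usual range for proteins.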

    Screening non-coding RNAs in transcriptomes from neglected species using PORTRAIT: case study of the pathogenic fungus Paracoccidioides brasiliensis

    BACKGROUND: Transcriptome sequences complement structural genomic information and provide snapshots of an organism's transcriptional profile. Such sequences also represent an alternative method for characterizing neglected species that are not expected to undergo whole-genome sequencing. One difficulty for transcriptome sequencing of these organisms is the low quality of reads and incomplete coverage of transcripts, both of which compromise further bioinformatics analyses. Another complicating factor is the lack of known protein homologs, which frustrates searches against established protein databases. This lack of homologs may be caused by divergence from well-characterized and over-represented model organisms. Another explanation is that non-coding RNAs (ncRNAs) may be caught during sequencing. NcRNAs are RNA sequences that, unlike messenger RNAs, do not code for protein products and instead perform unique functions by folding into higher-order structural conformations. ncRNA screening software specific to transcriptome sequences is available, but its analyses are optimized for transcriptomes that are well represented in protein databases and assume that input ESTs are full-length and of high quality. RESULTS: We propose an algorithm called PORTRAIT, which is suitable for ncRNA analysis of transcriptomes from poorly characterized species. Sequences are translated by software that is resistant to sequencing errors, and the predicted putative proteins, along with their source transcripts, are evaluated for coding potential by a support vector machine (SVM). Either of two SVM models may be employed: if a putative protein is found, a protein-dependent SVM model is used; if it is not found, a protein-independent SVM model is used instead. Only ab initio features are extracted, so that no homology information is needed. We illustrate the use of PORTRAIT by predicting ncRNAs from the transcriptome of the pathogenic fungus Paracoccidioides brasiliensis and five other related fungi. CONCLUSION: PORTRAIT can be integrated into pipelines, and provides a low computational cost solution for ncRNA detection in transcriptome sequencing projects.
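The two-model selection described in the abstract above can be sketched as a simple dispatch. Everything in this block is hypothetical: the function names, the tuple return value, and the two toy scorers are placeholders that stand in for the trained SVM models, and they do not reproduce PORTRAIT's features or interface.

```python
from typing import Optional, Tuple


def toy_protein_model(transcript: str, protein: str) -> float:
    # Placeholder score: fraction of the transcript covered by the
    # putative protein (3 nucleotides per residue). Not an SVM.
    return min(1.0, 3 * len(protein) / len(transcript))


def toy_transcript_model(transcript: str) -> float:
    # Placeholder score: GC content of the transcript. Not an SVM.
    gc = transcript.count("G") + transcript.count("C")
    return gc / len(transcript)


def score_transcript(transcript: str,
                     putative_protein: Optional[str]) -> Tuple[str, float]:
    """Pick a model based on whether a putative protein was predicted.

    Assumes a non-empty, upper-case nucleotide string.
    """
    if putative_protein:
        # A translation was found: use the protein-dependent model.
        return ("protein-dependent",
                toy_protein_model(transcript, putative_protein))
    # No translation was found: fall back to the protein-independent model.
    return ("protein-independent", toy_transcript_model(transcript))


print(score_transcript("ATGGCCTAA", "MA"))
print(score_transcript("ATATGC", None))
```

Keeping the selection logic separate from the scorers means either placeholder could be swapped for a real trained classifier without touching the dispatch.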

    Inferring stabilizing mutations from protein phylogenies : application to influenza hemagglutinin

    One selection pressure shaping sequence evolution is the requirement that a protein fold with sufficient stability to perform its biological functions. We present a conceptual framework that explains how this requirement causes the probability that a particular amino acid mutation is fixed during evolution to depend on its effect on protein stability. We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (ΔΔG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. As a further test, we use our phylogenetic inference approach to predict stabilizing mutations to influenza hemagglutinin. We introduce these mutations into a temperature-sensitive influenza virus with a defect in its hemagglutinin gene and experimentally demonstrate that some of the mutations allow the virus to grow at higher temperatures. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin. This approach also makes a mathematical link between phylogenetics and experimentally measurable protein properties, potentially paving the way for more accurate analyses of molecular evolution