
    REPARATION : ribosome profiling assisted (re-)annotation of bacterial genomes

    Prokaryotic genome annotation is highly dependent on automated methods, as manual curation cannot keep up with the exponential growth of sequenced genomes. Current automated methods depend heavily on sequence composition and often underestimate the complexity of the proteome. We developed RibosomeE Profiling Assisted (re-)AnnotaTION (REPARATION), a de novo machine learning algorithm that takes advantage of experimental protein synthesis evidence from ribosome profiling (Ribo-seq) to delineate translated open reading frames (ORFs) in bacteria, independent of genome annotation (https://github.com/Biobix/REPARATION). REPARATION evaluates all possible ORFs in the genome and estimates minimum thresholds based on a growth curve model to screen for spurious ORFs. We applied REPARATION to three annotated bacterial species to obtain a more comprehensive mapping of their translation landscape in support of experimental data. In all cases, we identified hundreds of novel (small) ORFs, including variants of previously annotated ORFs, and >70% of all (variants of) annotated protein-coding ORFs were predicted by REPARATION to be translated. Our predictions are supported by matching mass spectrometry proteomics data, sequence composition and conservation analysis. REPARATION is unique in that it makes use of experimental translation evidence to intrinsically perform a de novo ORF delineation in bacterial genomes irrespective of the sequence features linked to open reading frames.
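
The abstract states that REPARATION evaluates all possible ORFs in the genome before screening them. The snippet below is only a minimal sketch of what such an exhaustive ORF enumeration could look like; it is not taken from the REPARATION code base. The function name `find_orfs`, the start codon set (ATG, GTG, TTG), the stop codon set and the `min_codons` cutoff are all assumptions made for illustration.

```python
# Illustrative sketch only: exhaustive ORF enumeration on both strands.
# Codon sets and the length cutoff are assumptions, not REPARATION's settings.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=10):
    """Yield (strand, start, end, orf_sequence) for every start-to-stop ORF.

    Coordinates are 0-based, end-exclusive, and relative to the scanned
    strand (for "-" that is the reverse-complemented sequence).
    Length in codons includes the stop codon.
    """
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            starts = []  # in-frame start codons seen since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if codon in START_CODONS:
                    starts.append(i)
                elif codon in STOP_CODONS:
                    # Every start seen so far shares this stop codon.
                    for st in starts:
                        if (i + 3 - st) // 3 >= min_codons:
                            yield strand, st, i + 3, s[st:i + 3]
                    starts = []


if __name__ == "__main__":
    for orf in find_orfs("ATGAAATTGCCCTAA", min_codons=2):
        print(orf)
```

Each in-frame start codon upstream of a stop is reported as a separate candidate, which mirrors the idea of considering start-site variants of the same ORF; ranking or filtering those candidates with Ribo-seq evidence is outside the scope of this sketch.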

    N-terminal proteomics assisted profiling of the unexplored translation initiation landscape in Arabidopsis thaliana

    Proteogenomics is an emerging research field that still lacks a uniform method of analysis. Proteogenomic studies in which N-terminal proteomics and ribosome profiling are combined suggest that a high number of protein start sites are currently missing in genome annotations. We constructed a proteogenomic pipeline specific for the analysis of N-terminal proteomics data, with the aim of discovering novel translational start sites outside annotated protein coding regions. In summary, unidentified MS/MS spectra were matched to a specific N-terminal peptide library encompassing protein N termini encoded in the Arabidopsis thaliana genome. After a stringent false discovery rate filtering, 117 protein N termini compliant with N-terminal methionine excision specificity and indicative of translation initiation were found. These include N-terminal protein extensions and translation from transposable elements and pseudogenes. Gene prediction provided supporting protein-coding models for approximately half of the protein N termini. Besides the prediction of functional domains (partially) contained within the newly predicted ORFs, further supporting evidence of translation was found in the recently released Araport11 genome re-annotation of Arabidopsis and computational translations of sequences stored in public repositories. Most interestingly, complementary evidence by ribosome profiling was found for 23 protein N termini. Finally, by analyzing protein N-terminal peptides, an in silico analysis demonstrates the applicability of our N-terminal proteogenomics strategy in revealing protein-coding potential in species with well- and poorly annotated genomes.

    Matchmaking for covariant hierarchies

    We describe a model of matchmaking suitable for the implementation of services, rather than for their discovery and composition. In the model, processing requirements are modelled by client requests and computational resources are software processors that compete for request processing as the covariant implementations of an open service interface. Matchmaking then relies on type analysis to rank processors against requests in support of a wide range of dispatch strategies. We relate the model to the autonomicity of service provision and briefly report on its deployment within a production-level infrastructure for scientific computing.

    College admissions with stable score-limits

    A common feature of the Hungarian, Irish, Spanish and Turkish higher education admission systems is that the students apply for programmes and they are ranked according to their scores. Students who apply for a programme with the same score are in a tie. Ties are broken by lottery in Ireland, by objective factors in Turkey (such as date of birth) and by other precisely defined rules in Spain. In Hungary, however, an equal treatment policy is used: students applying for a programme with the same score are all accepted or rejected together. In such a situation there is only one question to decide, whether or not to admit the last group of applicants with the same score who are at the boundary of the quota. Both concepts can be described in terms of stable score-limits. The strict rejection of the last group with whom a quota would be violated corresponds to the concept of H-stable (i.e. higher-stable) score-limits that is currently used in Hungary. We call the other solution, based on the less strict admission policy, L-stable (i.e. lower-stable) score-limits. We show that the natural extensions of the Gale-Shapley algorithms produce stable score-limits; moreover, the applicant-oriented versions result in the lowest score-limits (thus optimal for students) and the college-oriented versions result in the highest score-limits with regard to each concept. When comparing the applicant-optimal H-stable and L-stable score-limits we prove that the former limits are always higher for every college. Furthermore, these two solutions provide upper and lower bounds for any solution arising from a tie-breaking strategy. Finally we show that both the H-stable and the L-stable applicant-proposing score-limit algorithms are manipulable.
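
To make the H-stable policy concrete, the following is a minimal sketch of an applicant-proposing score-limit procedure under stated assumptions: scores are integers, every limit starts at 0, and an over-subscribed programme raises its limit to the smallest value at which its quota is no longer exceeded, rejecting the whole tied group at the boundary. The function name `h_stable_score_limits` and the data layout are hypothetical, and the code is an illustration of the idea in the abstract, not the authors' own formulation.

```python
# Illustrative sketch only: applicant-proposing score-limits under the
# H-stable (strict) policy. Assumes integer scores and limits starting at 0.
def h_stable_score_limits(prefs, scores, quotas):
    """Compute score-limits and the resulting admissions.

    prefs:  dict applicant -> list of programmes, most preferred first
    scores: dict (applicant, programme) -> integer score
    quotas: dict programme -> number of available places
    """
    limits = {c: 0 for c in quotas}
    while True:
        # Place each applicant at the first listed programme whose limit
        # they reach; applicants who reach none stay unassigned.
        admitted = {c: [] for c in quotas}
        for a, plist in prefs.items():
            for c in plist:
                if scores[a, c] >= limits[c]:
                    admitted[c].append(a)
                    break
        over = [c for c in quotas if len(admitted[c]) > quotas[c]]
        if not over:
            return limits, admitted
        for c in over:
            ranked = sorted((scores[a, c] for a in admitted[c]), reverse=True)
            # The (quota + 1)-th best score must be excluded, together with
            # everyone tied at it, so the limit moves just above that score.
            limits[c] = ranked[quotas[c]] + 1


if __name__ == "__main__":
    prefs = {"a1": ["X", "Y"], "a2": ["X", "Y"], "a3": ["X", "Y"]}
    scores = {("a1", "X"): 90, ("a2", "X"): 80, ("a3", "X"): 80,
              ("a1", "Y"): 50, ("a2", "Y"): 60, ("a3", "Y"): 70}
    quotas = {"X": 2, "Y": 2}
    print(h_stable_score_limits(prefs, scores, quotas))
```

In the small example, programme X has two places but the second and third applicants are tied at 80, so both are rejected and one place stays empty. An L-stable variant would instead admit the tied group and let the quota be exceeded by it.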

    On Region Algebras, XML Databases, and Information Retrieval

    This paper describes some new ideas on developing a logical algebra for databases that manage textual data and support information retrieval functionality. We describe a first prototype of such a system.