25 research outputs found

    Plastid Transcript Editing across Dinoflagellate Lineages Shows Lineage-Specific Application but Conserved Trends.

    Dinoflagellates are a group of unicellular protists with immense ecological and evolutionary significance and cell biological diversity. Of the photosynthetic dinoflagellates, the majority possess a plastid containing the pigment peridinin, whereas some lineages have replaced this plastid by serial endosymbiosis with plastids of distinct evolutionary affiliations, including a fucoxanthin pigment-containing plastid of haptophyte origin. Previous studies have described the presence of widespread substitutional RNA editing in peridinin and fucoxanthin plastid genes. Because reports of this process have been limited to manual assessment of individual lineages, global trends concerning this RNA editing and its effect on the biological function of the plastid are largely unknown. Using novel bioinformatic methods, we examine the dynamics and evolution of RNA editing over a large multispecies data set of dinoflagellates, including novel sequence data from the peridinin dinoflagellate Pyrocystis lunula and the fucoxanthin dinoflagellate Karenia mikimotoi. We demonstrate that while most individual RNA editing events in dinoflagellate plastids are restricted to single species, global patterns and functional consequences of editing are broadly conserved. We find that editing is biased toward specific codon positions and regions of genes, and generally corrects otherwise deleterious changes in the genome prior to translation, though this effect is more prevalent in peridinin than fucoxanthin lineages. Our results support a model for promiscuous editing application subsequently shaped by purifying selection, and suggest the presence of an underlying editing mechanism transferred from the peridinin-containing ancestor into fucoxanthin plastids postendosymbiosis, with remarkably conserved functional consequences in the new lineage.

    Threshold Average Precision (TAP-k): a measure of retrieval designed for bioinformatics

    Motivation: Since database retrieval is a fundamental operation, the measurement of retrieval efficacy is critical to progress in bioinformatics. This article points out some issues with current methods of measuring retrieval efficacy and suggests some improvements. In particular, many studies have used the pooled receiver operating characteristic for n irrelevant records (ROCn) score, the area under the ROC curve (AUC) of a ‘pooled’ ROC curve, truncated at n irrelevant records. Unfortunately, the pooled ROCn score does not faithfully reflect actual usage of retrieval algorithms. Additionally, a pooled ROCn score can be very sensitive to retrieval results from as little as a single query.
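
The ROCn score mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the commonly used formulation in which the score is the sum, over the first n irrelevant records, of the number of relevant records ranked above each one, divided by n times the total number of relevant records. The function names `roc_n` and `pooled_roc_n`, the input layout, and the handling of lists with fewer than n irrelevant records are illustrative assumptions, not taken from the article.

```python
from typing import Iterable, List, Tuple


def roc_n(ranked_relevance: Iterable[bool], n: int, total_relevant: int) -> float:
    """ROCn for one ranked list (best-scoring record first).

    Assumed formulation: (1 / (n * T)) * sum of t_i, where t_i is the number
    of relevant records ranked above the i-th irrelevant record and T is the
    total number of relevant records.
    """
    if n <= 0 or total_relevant <= 0:
        raise ValueError("n and total_relevant must be positive")
    true_positives = 0
    false_positives = 0
    area = 0
    for is_relevant in ranked_relevance:
        if is_relevant:
            true_positives += 1
        else:
            false_positives += 1
            area += true_positives
            if false_positives == n:
                break
    # Assumption: if the list holds fewer than n irrelevant records, the
    # missing ones are treated as ranked after everything retrieved.
    area += (n - false_positives) * true_positives
    return area / (n * total_relevant)


def pooled_roc_n(
    results_by_query: List[List[Tuple[float, bool]]], n: int, total_relevant: int
) -> float:
    """Pool (score, is_relevant) pairs from all queries into one ranked list."""
    pooled = [pair for results in results_by_query for pair in results]
    pooled.sort(key=lambda pair: pair[0], reverse=True)
    return roc_n((is_relevant for _, is_relevant in pooled), n, total_relevant)
```

Because pooling sorts every query's results into a single list, one query with many high-scoring irrelevant records can use up all n slots on its own, which is the kind of sensitivity to a single query that the abstract describes.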

    Creating an OER collection of automatically scored practice exercises for computer science

    © 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM. Open Educational Resources (OER) is one way to help reduce the cost of higher education. We created a repository of 90 (and growing) practice problems for learning introductory programming using Python 3. In order to provide immediate feedback to learners as well as alleviate the scoring burden on instructors, these exercises include tests for a popular automatic web based scoring platform. We have been using and refining these materials for the past four semesters and collecting student user survey data. Overall, students have had a positive reaction to the practice format.

    MultiDomainBenchmark: a multi-domain query and subject database suite

    Background: Genetic sequence database retrieval benchmarks play an essential role in evaluating the performance of sequence searching tools. To date, all phylogenetically diverse benchmarks known to the authors include only query sequences with single protein domains. Domains are the primary building blocks of protein structure and function. Independently, each domain can fulfill a single function, but most proteins (>80% in Metazoa) exist as multi-domain proteins. Multiple domain units combine in various arrangements or architectures to create different functions and are often under evolutionary pressures to yield new ones. Thus, it is crucial to create gold standards reflecting the multi-domain complexity of real proteins to more accurately evaluate sequence searching tools. Description: This work introduces MultiDomainBenchmark (MDB), a database suite of 412 curated multi-domain queries and 227,512 target sequences, representing at least 5108 species and 1123 phylogenetically divergent protein families, their relevancy annotation, and domain location. Here, we use the benchmark to evaluate the performance of two commonly used sequence searching tools, BLAST/PSI-BLAST and HMMER. Additionally, we introduce a novel classification technique for multi-domain proteins to evaluate how well an algorithm recovers a domain architecture. Conclusion: MDB is publicly available at http://csc.columbusstate.edu/carroll/MDB/