776 research outputs found

    A Method for Improving the Accuracy and Efficiency of Bacteriophage Genome Annotation

    Bacteriophages are the most numerous entities on Earth. Approximately 8000 phage genomes have been sequenced, and the number is increasing rapidly. Sequencing of a genome is followed by annotation, where genes, start codons, and functions are putatively identified. The mainstays of phage genome annotation are auto-annotation programs such as Glimmer and GeneMark. Because phage genomes are relatively small, many groups choose to manually curate auto-annotation results to increase accuracy. An additional benefit of manual curation is that students can perform it, and it has been shown to improve student recruitment to the sciences. However, despite its greater accuracy and pedagogical value, manual curation suffers from high labor cost, a lack of standardization, a degree of subjectivity in decision-making, and susceptibility to mistakes. Here, we present a method developed in our lab that is designed to produce accurate annotations while reducing subjectivity and providing a degree of standardization in decision-making. We show that our method produces genome annotations more accurate than those of auto-annotation programs while retaining the pedagogical benefits of manual genome curation.

    Comparison of Bacteriophage Annotation Methods

    The rise of antibiotic-resistant bacteria has increased interest in bacteriophages (viruses that kill bacteria) in recent years. Due to the decreasing cost of genome sequencing, the number of sequenced phage genomes is growing at a geometric rate. Sequencing is followed by annotation, in which genes, start codons, and putative protein functions are identified. Most phage genomes are auto-annotated with programs designed for prokaryotes, but accuracy metrics for these programs on phage genomes are not available. The genome of Escherichia coli phage Lambda was used to benchmark the accuracy of several genome annotation methods and programs. Discovered in 1951, Lambda is the best-studied phage, with nearly all gene functions and start sites demonstrated experimentally. Eight programs were used to annotate the Lambda genome: Glimmer, BASys, RAST, GeneMark, GeneMark.hmm, GeneMarkS, GeneMarkS2, and GeneMark with Heuristic models. Their calls were compared to the reference genome from the literature to determine how accurately each program annotates a bacteriophage genome. Manual curation and compilation of auto-annotation results obtained from several programs is expected to yield more accurate gene feature and start codon prediction than auto-annotation alone.
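
    The comparison of program calls against a reference annotation described above could be scored along the following lines. This is a minimal sketch, not the authors' procedure: the `Gene` record, the `score_calls` function, the matching rule (a call counts as finding a gene when its stop coordinate and strand agree with a reference gene) and the toy coordinates are all assumptions made for illustration, and the coordinates are not real Lambda positions.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    start: int   # first base of the start codon
    stop: int    # last base of the stop codon
    strand: str  # "+" or "-"


def score_calls(reference, predicted):
    """Score predicted gene calls against a reference annotation.

    Assumed matching rule: a prediction finds a gene when stop and strand
    agree with a reference gene; its start is correct when the start
    coordinate agrees as well.
    """
    reference = set(reference)
    predicted = set(predicted)
    if not reference or not predicted:
        raise ValueError("both gene sets must be non-empty")

    ref_by_end = {(g.stop, g.strand): g for g in reference}
    found = [p for p in predicted if (p.stop, p.strand) in ref_by_end]
    exact = [p for p in found if ref_by_end[(p.stop, p.strand)].start == p.start]

    return {
        "sensitivity": len(found) / len(reference),   # reference genes recovered
        "precision": len(found) / len(predicted),     # calls that hit a real gene
        "start_accuracy": len(exact) / len(found) if found else 0.0,
    }


# Hypothetical toy annotation (not Lambda coordinates).
reference = [
    Gene(100, 400, "+"),
    Gene(450, 900, "+"),
    Gene(1500, 1000, "-"),
    Gene(1600, 2200, "+"),
]
predicted = [
    Gene(100, 400, "+"),    # gene and start both correct
    Gene(480, 900, "+"),    # gene found, start miscalled
    Gene(1500, 1000, "-"),  # gene and start both correct
    Gene(2300, 2600, "+"),  # false positive
]

result = score_calls(reference, predicted)
print(result)
```

    Reporting start accuracy separately from gene detection reflects the distinction the abstract draws between calling a gene and calling its start codon; other matching rules, such as requiring a minimum overlap, would be equally defensible.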

    Semantic spaces revisited: investigating the performance of auto-annotation and semantic retrieval using semantic spaces

    Semantic spaces encode similarity relationships between objects as a function of position in a mathematical space. This paper discusses three different formulations for building semantic spaces which allow the automatic annotation and semantic retrieval of images. The models discussed in this paper require that the image content be described as a series of visual terms, rather than as a continuous feature vector. The paper also discusses how these term-based models compare to the latest state-of-the-art continuous feature models for auto-annotation and retrieval.

    Semantic Retrieval and Automatic Annotation: Linear Transformations, Correlation and Semantic Spaces

    This paper proposes a new technique for auto-annotation and semantic retrieval based upon the idea of linearly mapping an image feature space to a keyword space. The new technique is compared to several related techniques, and a number of salient points about each of the techniques are discussed and contrasted. The paper also discusses how these techniques might scale to a real-world retrieval problem, and demonstrates this through a case study of a semantic retrieval technique being used on a real-world dataset (with a mix of annotated and unannotated images) from a picture library.

    Labeling expressive speech in L2 Italian: the role of prosody in auto- and external annotation

    The present study compares two approaches to labeling expressive corpora: auto-annotation and annotation by external lay listeners. These two methods have been applied to the semi-spontaneous emotional speech produced by Chinese learners of L2 Italian, involved in the CardTask, a mood-induction procedure that allows us to control the context of interaction while preserving the spontaneity of reactions. The emotional responses to the stimuli presented in the task were the object of an auto-annotation session. The same samples were then administered only in the auditory mode to 20 Italian and 20 Chinese lay listeners. The results of perceptual tests have underlined some similarities and differences between auto- and external annotation, and between the ratings given by external Italian and Chinese listeners. The labels chosen by native Italians were similar to those selected in the auto-annotation session, particularly in the case of anxiety, fear and disgust. The correspondence between the results of the two annotation methods may be ascribed to the different prosodic patterns characterizing the emotional states. The results of the annotation made by Chinese listeners show that they found it hard to give a specific emotional label to utterances produced in a second language relying solely on prosodic patterns.

    On image auto-annotation with latent space models

    Bioinformatic Investigation into Mycobacterium phage DuncansLeg

    Bacteriophage research is increasingly important as antibacterial resistance becomes more common. The novel phage DuncansLeg was isolated and sequenced by students in the HHMI SEA-PHAGES Phage Discovery course in the fall of 2021 at Coastal Carolina University's campus. The DNA sequence of DuncansLeg (75,593 base pairs) was subjected to bioinformatic auto-annotation, which placed the phage into subcluster L3. The scope of this investigation goes beyond lab work and discovery, instead focusing on applying multiple bioinformatic approaches to refine the genomic auto-annotation and assign potential gene functions where possible. To this end, the bioinformatic programs used to identify coding potential and gene starts were DNA Master, GeneMark, Starterator, and Phamerator. For the assignment of gene functions, pBLAST, HHpred, and synteny data were used in combination to provide evidence for functionality where possible. PECAAN software was then used to centralize data for further analysis. The results of these analyses and specific genomic regions will be discussed in this presentation of data.

    TACT: Transcriptome Auto-annotation Conducting Tool of H-InvDB

    Transcriptome Auto-annotation Conducting Tool (TACT) is a newly developed web-based automated tool for conducting functional annotation of transcripts by the integration of sequence similarity searches and functional motif predictions. We developed the TACT system by integrating two kinds of similarity searches, FASTY and BLASTX, against protein sequence databases, UniProtKB (Swiss-Prot/TrEMBL) and RefSeq, and a unified motif prediction program, InterProScan, into the ORF-prediction pipeline originally designed for the ‘H-Invitational’ human transcriptome annotation project. This system successively applies these constituent programs to an mRNA sequence in order to predict the most plausible ORF and the function of the protein encoded. In this study, we applied the TACT system to 19 574 non-redundant human transcripts registered in H-InvDB and evaluated its predictive power by the degree of agreement with human-curated functional annotation in H-InvDB. As a result, the TACT system could assign functional description to 12 559 transcripts (64.2%), the remainder being hypothetical proteins. Furthermore, the overall agreement of functional annotation with H-InvDB, including those transcripts annotated as hypothetical proteins, was 83.9% (16 432/19 574). These results show that the TACT system is useful for functional annotation and that the prediction of ORFs and protein functions is highly accurate and close to the results of human curation. TACT is freely available at
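
    The two percentages quoted above follow directly from the counts in the abstract, as the short check below shows. The variable names and the `percent` helper are introduced here for illustration only; the counts themselves are taken from the abstract.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


total_transcripts = 19574  # non-redundant human transcripts evaluated
with_description = 12559   # transcripts assigned a functional description
in_agreement = 16432       # annotations agreeing with H-InvDB curation

# The remainder were annotated as hypothetical proteins.
hypothetical = total_transcripts - with_description

described_pct = percent(with_description, total_transcripts)
agreement_pct = percent(in_agreement, total_transcripts)

print(described_pct, agreement_pct, hypothetical)
```

    Both percentages use all 19 574 transcripts as the denominator, so the agreement figure includes transcripts annotated as hypothetical proteins, as the abstract states.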

    Mind the Gap: Another look at the problem of the semantic gap in image retrieval

    This paper attempts to review and characterise the problem of the semantic gap in image retrieval and the attempts being made to bridge it. In particular, we draw from our own experience in user queries, automatic annotation and ontological techniques. The first section of the paper characterises the semantic gap as a hierarchy between the raw media and full semantic understanding of the media's content. The second section discusses real users' queries with respect to the semantic gap. The final sections of the paper describe our own experience in attempting to bridge the semantic gap. In particular, we discuss our work on auto-annotation and semantic-space models of image retrieval in order to bridge the gap from the bottom up, and the use of ontologies, which capture more semantics than keyword object labels alone, as a technique for bridging the gap from the top down.