
A novel approach to sequence validating protein expression clones with automated decision making

Author: Hu Yanhui
LaBaer Joshua
Mohr Stephanie E
Rolfs Andreas
Taycher Elena
Williamson Janice
Zuo Dongmei
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract

Background: Whereas the molecular assembly of protein expression clones is readily automated and routinely accomplished in high throughput, sequence verification of these clones is still largely performed manually, an arduous and time-consuming process. The ultimate goal of validation is to determine whether a given plasmid clone matches its reference sequence closely enough to be "acceptable" for use in protein expression experiments. Given the rapidly increasing availability of tens of thousands of unverified clones, there is a strong demand for rapid, efficient and accurate software that automates clone validation.

Results: We have developed an Automated Clone Evaluation (ACE) system, the first comprehensive, multi-platform, web-based plasmid sequence verification software package. ACE automates the clone verification process by defining each clone sequence as a list of multidimensional discrepancy objects, each describing a difference between the clone and its expected sequence, including the resulting polypeptide consequences. To evaluate clones automatically, this list can be compared against user acceptance criteria that specify the allowable number of discrepancies of each type. This strategy allows users to re-evaluate the same set of clones against different acceptance criteria as needed for use in other experiments. ACE manages the entire sequence validation process, including contig management, identifying and annotating discrepancies, determining whether discrepancies correspond to polymorphisms, and clone finishing. Designed to manage thousands of clones simultaneously, ACE maintains a relational database to store information about clones at various completion stages, project processing parameters and acceptance criteria. In a direct comparison, the automated analysis by ACE took less time and was more accurate than a manual analysis of a 93-gene clone set.

Conclusion: ACE was designed to facilitate high-throughput clone sequence verification projects. The software has been used successfully to evaluate more than 55,000 clones at the Harvard Institute of Proteomics. It dramatically reduced the time and labor required to evaluate clone sequences and decreased the number of missed sequence discrepancies, which commonly occur during manual evaluation. In addition, ACE helped to reduce the number of sequencing reads needed to achieve adequate coverage for making decisions on clones.
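As a minimal sketch of the idea in the Results section of the abstract (not ACE's actual data model or code), each clone can be stored as a list of discrepancy objects and then tested against named, reusable acceptance criteria. The `Discrepancy` fields, the `acceptable` function, the criteria names and every count below are hypothetical. The sketch also reduces each discrepancy to a single polypeptide consequence, whereas ACE's discrepancy objects are described as multidimensional.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discrepancy:
    """One difference between a clone and its reference (hypothetical fields)."""
    position: int        # position in the reference sequence
    kind: str            # e.g. "substitution", "insertion", "deletion"
    protein_effect: str  # polypeptide consequence, e.g. "silent", "missense", "frameshift"


def acceptable(discrepancies, allowed):
    """Return True if no protein_effect occurs more often than `allowed` permits.

    `allowed` maps a protein_effect to its maximum allowed count; effects that
    are not listed are treated as not allowed at all (an assumption of this sketch).
    """
    counts = {}
    for d in discrepancies:
        counts[d.protein_effect] = counts.get(d.protein_effect, 0) + 1
    return all(n <= allowed.get(effect, 0) for effect, n in counts.items())


# Named acceptance criteria that can be stored and reused (illustrative values only).
CRITERIA = {
    "strict": {"silent": 5},
    "lenient": {"silent": 5, "missense": 2},
}

# Discrepancy lists computed once per clone (hypothetical clones).
clones = {
    "clone_A": [Discrepancy(120, "substitution", "silent")],
    "clone_B": [Discrepancy(45, "substitution", "missense")],
    "clone_C": [Discrepancy(300, "deletion", "frameshift")],
}

# Re-evaluate the same clones against each named criteria set without re-analysis.
for name, allowed in CRITERIA.items():
    passed = sorted(c for c, ds in clones.items() if acceptable(ds, allowed))
    print(name, passed)
```

Running it prints `strict ['clone_A']` and then `lenient ['clone_A', 'clone_B']`: the same stored discrepancy lists yield different verdicts under different criteria, which is the re-evaluation property the abstract describes.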

Crossref

Harvard University - DASH

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central


A Full-Genomic Sequence-Verified Protein-Coding Gene Collection for Francisella tularensis

Author: Brizuela Leonardo
Fernandez Mauricio J.
Hu Yanhui
Jepson Daniel Ashley
Kelley Fontina
LaBaer Joshua
McCarron Seamus
Mohr Stephanie
Moreira Donna
Murthy Tal
Petrosino Joseph
Raphael Jacob
Rolfs Andreas
Shi Zhenwei
Taycher Elena
Zuo Dongmei
Publication venue: Public Library of Science (PLoS)
Publication date: 23/04/2011

The rapid development of new technologies for the high-throughput (HT) study of proteins has increased the demand for comprehensive plasmid clone resources that support protein expression. These clones must be full-length, sequence-verified and in a flexible format. Generating these resources requires automated pipelines supported by software management systems. Although the availability of clone resources is growing, current collections are either incomplete or not fully sequence-verified. We report an automated pipeline, supported by several software applications, that enabled the construction of the first comprehensive sequence-verified plasmid clone resource for more than 96% of the protein-coding sequences in the genome of F. tularensis, a highly virulent human pathogen and the causative agent of tularemia. This clone resource was applied to an HT protein purification pipeline, successfully producing recombinant proteins for 72% of the genes. These methods and resources represent significant technological steps towards exploiting the genomic information of F. tularensis in discovery applications.

Harvard University - DASH

Protein Structure Initiative Material Repository: an open shared public resource of structural genomics plasmids for the biological community

Author: Altschul
Andreas Rolfs
Bennett
Berman
Blommel
Catherine Y. Cormier
Chen
Dongmei Zuo
Donnelly
Dove
Elena Taycher
Fontina Kelley
Greggory Turnbull
Heller
Hu
Jason Kramer
Joshua LaBaer
Kouranov
Ku
LaBaer
Michael Fiacco
Murthy
Rai
Rodriguez
Rolfs
Services National Institutes of Health/Department of Health and Human Services
Stephanie E. Mohr
Stols
Streitz
Thao
Vinarov
Walsh
Yanhui Hu
Zuo
Publication venue: Oxford University Press
Publication date: 22/12/2010

The Protein Structure Initiative Material Repository (PSI-MR; http://psimr.asu.edu) provides centralized storage and distribution for the protein expression plasmids created by PSI researchers. These plasmids are a resource that allows the research community to dissect the biological function of proteins whose structures have been identified by the PSI. The plasmid annotation, which includes the full-length sequence, vector information and associated publications, is stored in a freely available, searchable database called DNASU (http://dnasu.asu.edu). Each PSI plasmid is also linked to a variety of additional resources, which facilitates cross-referencing of a particular plasmid to protein annotations and experimental data. Plasmid samples can be requested directly through the website. We have also developed a novel strategy to avoid the most common concern encountered when distributing plasmids, namely the complexity of material transfer agreement (MTA) processing and the resulting delays. The Expedited Process MTA, in which we created a network of institutions that agree to the terms of transfer in advance of a material request, eliminates these delays. Our hope is that by creating a repository of expression-ready plasmids and expediting the process for receiving them, we will help increase their accessibility and accelerate the pace of scientific discovery.

CiteSeerX

Crossref

Harvard University - DASH

PubMed Central

A Biomedically Enriched Collection of 7000 Human ORF Clones

Author: A Baross
AE Witt
Andreas Hoerlein
Andreas Rolfs
Bernhard Korn
Binghua Shen
Craig DeLoughery
Daniel A. Jepson
Dietmar Hoffmann
Dongmei Zuo
DS Gerhard
E Pennisi
E Taycher
Elena Taycher
Fontina Kelley
G Temple
J Park
Jacob Raphael
JE Collins
JF Rual
Joseph Pearlberg
Joshua LaBaer
KD Pruitt
Lars Ebert
Munira M. A. Baqui
N Ramachandran
Niro Ramachandran
OJ Harrison
P De Los Rios
P Lamesch
R Staden
RL Strausberg
RS Hegde
S Haas
Seamus McCarron
Suzannah Rutherford
T Murthy
Y Hu
Yanhui Hu
Publication venue: Public Library of Science

We report the production and availability of over 7000 fully sequence-verified plasmid ORF clones representing over 3400 unique human genes. These ORF clones were derived using the human MGC collection as template and were produced in two formats: with and without stop codons. Thus, this collection supports the production of either native proteins or proteins with fusion tags added to either or both ends. The template clones used to generate this collection were enriched in three ways. First, gene redundancy was removed. Second, clones were selected to represent the best available GenBank reference sequence. Finally, a literature-based software tool was used to evaluate the list of target genes to ensure that it broadly reflected biomedical research interests. The target gene list was compared with 4000 human diseases and over 8500 biological and chemical MeSH classes across ~15 million publications recorded in PubMed at the time of analysis. This analysis revealed that, relative to the genome and the MGC collection, this collection is enriched for genes with published associations with a wide range of diseases and biomedical terms, without displaying a particular bias towards any single disease or concept. Thus, this collection is likely to be a powerful resource for researchers who wish to study protein function in a set of genes with documented biomedical significance.

Crossref

Directory of Open Access Journals

PubMed Central

A Full-Genomic Sequence-Verified Protein-Coding Gene Collection for Francisella tularensis

Author: A Ecker
A Matsuyama
A Sjostedt
AE Witt
AH Fortier
Andreas Rolfs
D Serruto
Daniel Jepson
Dongmei Zuo
Donna Moreira
DT Dennis
Elena Taycher
Fontina Kelley
G Temple
J LaBaer
J Park
J Reboul
JA Heyman
Jacob Raphael
JF Rual
Joseph Petrosino
Joshua LaBaer
JR Parrish
L Dieckman
Leonardo Brizuela
Mauricio Fernandez
MK McLendon
P Braun
P Larsson
Seamus McCarron
SP Chambers
Stephanie E. Mohr
Tal Murthy
TV Murthy
Y Hu
Yanhui Hu
Zhenwei Shi
Publication venue: Public Library of Science
Publication date: 01/06/2007

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

A novel approach to sequence validating protein expression clones with automated decision making-1

Author: Andreas Rolfs (30154)
Dongmei Zuo (30163)
Elena Taycher (30162)
Janice Williamson (76594)
Joshua LaBaer (30167)
Stephanie E Mohr (76593)
Yanhui Hu (30155)

Copyright information: Taken from "A novel approach to sequence validating protein expression clones with automated decision making", http://www.biomedcentral.com/1471-2105/8/198, BMC Bioinformatics 2007;8:198. Published online 13 Jun 2007. PMCID: PMC1914086.

…ed number of discrepancies of each type. Different values can be set for discrepancies of low and high confidence. The user sets values for two thresholds: one that triggers a manual review and one that automatically rejects the clone. Users can also opt to handle conservative and non-conservative amino acid substitutions separately or to treat all amino acid changes as one type. Once the settings are created, users can name the set and store it for future use. In this way, users may create different acceptance criteria for different purposes, and a single collection of clones can be evaluated against different acceptance criteria by invoking these named sets. The criteria shown here are used routinely for determining final acceptance of clones. The numbers in the boxes indicate the absolute number of discrepancies of the indicated type required for inclusion in that category. As indicated, this set of criteria does not distinguish between conservative and non-conservative missense mutations. Clones with 0 or 1 high-confidence missense substitutions are automatically accepted (as long as they have no other discrepancies that prevent automatic acceptance). Clones with 3 or more high-confidence missense substitutions are automatically rejected; clones with exactly 2 are triaged for additional sequencing or manual analysis. A higher bar (10 or more) is set for automatically rejecting clones based on low-confidence substitutions, because many of these will be resolved with further sequencing. Similarly, this parameter set automatically passes clones only if they have no frameshift discrepancies of any type. Clones with 1 or more high-confidence, or 9 or more low-confidence, frameshift discrepancies are automatically rejected. Clones must meet all the pass criteria for automatic acceptance, whereas clones that meet any automatic-fail criterion are automatically failed.
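The routine criteria described in this legend can be sketched as a three-way decision (accept, review, reject). This is an illustrative reimplementation under stated assumptions, not ACE code: the `Threshold` structure, the `evaluate` function and the (type, confidence) keys are hypothetical. Following the legend, conservative and non-conservative substitutions are pooled as "missense". The legend gives no automatic-accept limit for low-confidence missense substitutions, so the sketch leaves that limit unset, and discrepancy types not listed in the criteria are ignored; both are simplifications.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    # Hypothetical structure: counts <= pass_max allow automatic acceptance,
    # counts >= fail_min trigger automatic rejection; None means "not stated".
    pass_max: Optional[int]
    fail_min: Optional[int]


# Values from the figure legend; keys are (discrepancy type, confidence).
ROUTINE_CRITERIA = {
    ("missense", "high"): Threshold(pass_max=1, fail_min=3),
    ("missense", "low"): Threshold(pass_max=None, fail_min=10),  # pass limit not given
    ("frameshift", "high"): Threshold(pass_max=0, fail_min=1),
    ("frameshift", "low"): Threshold(pass_max=0, fail_min=9),
}


def evaluate(discrepancies, criteria=ROUTINE_CRITERIA):
    """Return "reject", "accept" or "review" for one clone.

    `discrepancies` is a list of (type, confidence) tuples, e.g. ("missense", "high").
    Types absent from `criteria` are ignored in this sketch.
    """
    counts = Counter(discrepancies)  # missing keys count as 0
    # Meeting ANY automatic-fail criterion rejects the clone.
    for key, rule in criteria.items():
        if rule.fail_min is not None and counts[key] >= rule.fail_min:
            return "reject"
    # The clone must meet ALL pass criteria to be accepted automatically.
    if all(rule.pass_max is None or counts[key] <= rule.pass_max
           for key, rule in criteria.items()):
        return "accept"
    # Everything in between is triaged for more sequencing or manual analysis.
    return "review"
```

For example, one high-confidence missense substitution returns "accept", two return "review" and three return "reject". A single low-confidence frameshift returns "review" because it blocks automatic acceptance without reaching the rejection threshold of 9. Checking fail criteria first mirrors the legend's rule that meeting any fail criterion is decisive, while acceptance requires every pass criterion.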

FigShare

A novel approach to sequence validating protein expression clones with automated decision making-3

Author: Andreas Rolfs (30154)
Dongmei Zuo (30163)
Elena Taycher (30162)
Janice Williamson (76594)
Joshua LaBaer (30167)
Stephanie E Mohr (76593)
Yanhui Hu (30155)

FigShare