
A comprehensive functional map of the hepatitis C virus genome provides a resource for probing viral proteins.

Author: Arumugaswami Vaithilingaraja
Chen Shu-Hwa
Chen Zugen
Chu Virginia
Lin Chung-Yen
Lo Hung-Hao
Olson C Anders
Qi Hangfei
Remenyi Roland
Stokelman Tamar
Su Sheng-Yao
Sun Ren
Truong Shawna
Wu Nicholas C
Wu Ting-Ting
Publication venue: eScholarship, University of California
Publication date: 01/09/2014

Pairing high-throughput sequencing technologies with high-throughput mutagenesis enables genome-wide investigations of pathogenic organisms. Knowledge of the specific functions of protein domains encoded by the genome of the hepatitis C virus (HCV), a major human pathogen that contributes to liver disease worldwide, remains limited to insights from small-scale studies. To enhance the capabilities of HCV researchers, we have obtained a high-resolution functional map of the entire viral genome by combining transposon-based insertional mutagenesis with next-generation sequencing. We generated a library of 8,398 mutagenized HCV clones, each containing one 15-nucleotide sequence inserted at a unique genomic position. We passaged this library in hepatic cells, recovered virus pools, and simultaneously assayed the abundance of mutant viruses in each pool by next-generation sequencing. To illustrate the validity of the functional profile, we compared the genetic footprints of viral proteins with previously solved protein structures. Moreover, we show the utility of these genetic footprints in identifying candidate regions for epitope tag insertion. In a second application, we screened the genetic footprints for phenotypes that reflected defects in later steps of the viral life cycle. We confirmed that viruses with insertions in a region of the nonstructural protein NS4B were defective in infectivity while maintaining genome replication. Overall, our genome-wide HCV mutant library and the genetic footprints obtained by high-resolution profiling represent valuable new resources for the research community that can direct the attention of investigators toward unidentified roles of individual protein domains.

Importance: Our insertional mutagenesis library provides a resource that illustrates the effects of relatively small insertions on local protein structure and HCV viability. We have also generated complementary resources, including a website (http://hangfei.bol.ucla.edu) and a panel of epitope-tagged mutant viruses that should enhance the research capabilities of investigators studying HCV. Researchers can now detect epitope-tagged viral proteins with established antibodies, which will allow biochemical studies of HCV proteins for which antibodies are not readily available. Furthermore, researchers can now quickly look up genotype-phenotype relationships and base further mechanistic studies on the residue-by-residue information from the functional profile. More broadly, this approach offers a general strategy for the systematic functional characterization of viruses on the genome scale.
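
The abstract describes scoring insertion mutants by how their abundance changes between the input library and the recovered virus pools. A minimal sketch of one common way to express this, a log2 enrichment of each mutant's read frequency, is shown below. The input format (a dict from insertion position to read count), the pseudocount, and the function name are assumptions for illustration, not the authors' pipeline.

```python
import math


def relative_fitness(input_counts, passaged_counts, pseudocount=0.5):
    """Log2 enrichment of each insertion mutant after passaging.

    Hypothetical inputs: dicts mapping an insertion position (nt) to its
    read count in the input library and in a recovered virus pool.
    The pseudocount avoids log(0) for mutants that drop out entirely.
    """
    total_in = sum(input_counts.values())
    total_out = sum(passaged_counts.values())
    scores = {}
    for pos, n_in in input_counts.items():
        n_out = passaged_counts.get(pos, 0)
        freq_in = (n_in + pseudocount) / total_in
        freq_out = (n_out + pseudocount) / total_out
        scores[pos] = math.log2(freq_out / freq_in)
    return scores
```

Under this scoring, strongly negative values would mark positions where a 15-nucleotide insertion is poorly tolerated, while values near or above zero would point to candidate sites for epitope tags.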

eScholarship - University of California

Cartilage-selective genes identified in genome-scale analysis of non-cartilage and cartilage gene expression

Author: Chen Zugen
Cohn Daniel H
Cohn Zachary A
Day Allen
Funari Vincent A
Krakow Deborah
Nelson Stanley F
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Cartilage plays a fundamental role in the development of the human skeleton. Early in embryogenesis, mesenchymal cells condense and differentiate into chondrocytes to shape the early skeleton. Subsequently, the cartilage anlagen differentiate to form the growth plates, which are responsible for linear bone growth, and the articular chondrocytes, which facilitate joint function. However, despite the multiplicity of roles of cartilage during human fetal life, surprisingly little is known about its transcriptome. To address this, a whole-genome microarray expression profile was generated using RNA isolated from 18–22-week human fetal distal femur cartilage and compared with Celsius, a database of normal human control tissues aggregated at UCLA.

Results: A total of 161 cartilage-selective genes were identified, defined as genes significantly expressed in cartilage but with low expression and little variation across a panel of 34 non-cartilage tissues. Among these 161 genes were cartilage-specific genes, such as the cartilage collagen genes, and 25 genes that have been associated with skeletal phenotypes in humans and/or mice. Many of the other cartilage-selective genes have no established role in cartilage or are novel, unannotated genes. Quantitative RT-PCR confirmed the unique pattern of gene expression observed by microarray analysis.

Conclusion: Defining the gene expression pattern for cartilage has identified new genes that may contribute to human skeletogenesis and has provided further candidate genes for skeletal dysplasias. The data suggest that fetal cartilage is a complex and transcriptionally active tissue and demonstrate that the set of genes selectively expressed in the tissue has been greatly underestimated.
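
The selection rule in the Results (significant expression in cartilage, low expression and little variation across the 34 non-cartilage tissues) can be sketched as a simple per-gene filter. The three cutoffs and the function name below are hypothetical placeholders chosen for illustration; the abstract does not state the thresholds the study used.

```python
import statistics


def is_cartilage_selective(cartilage_expr, other_tissue_expr,
                           min_cartilage=1000.0, max_background=200.0,
                           max_background_sd=50.0):
    """Flag a gene as cartilage-selective under hypothetical cutoffs.

    cartilage_expr: expression value of the gene in fetal cartilage.
    other_tissue_expr: the gene's values across non-cartilage control tissues
    (at least two values are needed to compute a standard deviation).
    """
    background_mean = statistics.mean(other_tissue_expr)
    background_sd = statistics.stdev(other_tissue_expr)
    return (cartilage_expr >= min_cartilage          # expressed in cartilage
            and background_mean <= max_background    # low elsewhere
            and background_sd <= max_background_sd)  # and consistently low
```

For example, `is_cartilage_selective(1500.0, [100.0] * 34)` returns `True`, whereas a gene with one highly expressing control tissue fails the variation criterion.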

Improving the efficiency of genomic loci capture using oligonucleotide arrays for high throughput resequencing

Author: Chen Zugen
Cohn Daniel H
Funari Vincent A
Homer Nils
Lee Hane
Merriman Barry
Nelson Stanley F
O'Connor Brian D
Publication venue: BioMed Central
Publication date: 01/12/2009

Background: The emergence of next-generation sequencing technology presents tremendous opportunities to accelerate the discovery of rare variants or mutations that underlie human genetic disorders. Although complete sequencing of affected individuals' genomes would be the most powerful approach to finding such variants, its cost makes it impractical for routine use in disease gene research. In cases where candidate genes or loci can be defined by linkage, association, or phenotypic studies, the practical sequencing target can be made much smaller than the whole genome, and it becomes critical to have capture methods that can purify the desired portion of the genome for shotgun short-read sequencing without biasing allelic representation or coverage. One major approach is array-based capture, which relies on the ability to create a custom in situ-synthesized oligonucleotide microarray for use as a collection of hybridization capture probes. Our group and others use this approach routinely, and we are continuing to improve its performance.

Results: Here, we provide a complete protocol optimized for large aggregate sequence intervals and demonstrate its utility by capturing all predicted amino acid coding sequence from 3,038 human genes using 241,700 60-mer oligonucleotides. Further, we demonstrate two techniques that increase capture efficiency: adding a step that blocks cross-hybridization mediated by the common adapter sequences used in sequencing library construction, and repeating the hybridization capture step. These improvements can boost targeting efficiency to the point where over 85% of the mapped sequence reads fall within 100 bases of the targeted regions.

Conclusions: The complete protocol introduced in this paper enables researchers to perform practical capture experiments and includes two novel methods for increasing targeting efficiency. Coupled with the new massively parallel sequencing technologies, this provides a powerful approach to identifying disease-causing genetic variants that can be localized within the genome by traditional methods.
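
The targeting-efficiency figure above (the share of mapped reads falling within 100 bases of a targeted region) is straightforward to compute once reads are mapped. The sketch below assumes each read is summarized by a single (chromosome, position) pair and that targets are sorted, non-overlapping intervals per chromosome; the input layout and function name are illustrative assumptions, not part of the published protocol.

```python
import bisect


def targeting_efficiency(read_positions, targets, flank=100):
    """Fraction of mapped reads lying within `flank` bases of any target.

    read_positions: list of (chrom, pos) for mapped reads (hypothetical input).
    targets: dict chrom -> sorted, non-overlapping list of (start, end).
    """
    # Pre-extract interval starts so each lookup is a binary search.
    starts = {c: [s for s, _ in ivs] for c, ivs in targets.items()}
    on_target = 0
    for chrom, pos in read_positions:
        ivs = targets.get(chrom)
        if not ivs:
            continue
        # Rightmost interval whose flank-padded start is at or before pos.
        i = bisect.bisect_right(starts[chrom], pos + flank) - 1
        # Intervals are non-overlapping, so this one also has the largest end.
        if i >= 0 and pos <= ivs[i][1] + flank:
            on_target += 1
    return on_target / len(read_positions) if read_positions else 0.0
```

Because the intervals are sorted and disjoint, checking only the rightmost candidate interval is sufficient, which keeps the per-read cost logarithmic in the number of targets.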

Ribosomal Proteins RPS11 and RPS20, Two Stress-Response Markers of Glioblastoma Stem Cells, Are Novel Predictors of Poor Prognosis in Glioblastoma Patients.

Author: Chen Zugen
Cloughesy Timothy F
Lai Albert
Liau Linda M
Lucey Gregory M
Mareninov Sergey
Menjivar Jimmy C
Nelson Stanley F
Shabihkhani Maryam
Telesca Donatello
Tso Cho-Lea
Tso Jonathan L
Wei Bowen
Yang Shuai
Yong William H
Publication venue: eScholarship, University of California
Publication date: 01/01/2015

Glioblastoma stem cells (GSC), which co-exhibit a tumor-initiating capacity and a radio-chemoresistant phenotype, are a compelling cell model for explaining tumor recurrence. We have previously characterized patient-derived, treatment-resistant GSC clones (TRGC) that survived radiochemotherapy. Compared with glucose-dependent, treatment-sensitive GSC clones (TSGC), TRGC exhibited reduced glucose dependence and favored the fatty acid oxidation pathway as their energy source. Using comparative genome-wide transcriptome analysis, a series of defense signatures associated with TRGC survival were identified and verified by siRNA-based gene knockdown experiments, in which knockdown led to loss of cell integrity. In this study, we investigate the prognostic value of defense signatures in glioblastoma (GBM) patients using gene expression analysis with Probeset Analyzer (131 GBM) and The Cancer Genome Atlas (TCGA) data, and protein expression with a tissue microarray (50 GBM), yielding the first TRGC-derived prognostic biomarkers for GBM patients. Ribosomal protein S11 (RPS11) and RPS20, individually and together, consistently predicted poor survival of newly diagnosed primary GBM when overexpressed at the RNA or protein level [RPS11: hazard ratio (HR) = 11.5, p < 0.001; RPS20: HR = 4.5, p = 0.03; RPS11+RPS20: HR = 17.99, p = 0.001]. The prognostic significance of RPS11 and RPS20 was further supported by whole-tissue-section RPS11 immunostaining (27 GBM; HR = 4.05, p = 0.01) and TCGA gene expression data (578 primary GBM; RPS11: HR = 1.19, p = 0.06; RPS20: HR = 1.25, p = 0.02; RPS11+RPS20: HR = 1.43, p = 0.01). Moreover, tumors with unmethylated O-6-methylguanine-DNA methyltransferase (MGMT) or wild-type isocitrate dehydrogenase 1 (IDH1) were associated with higher RPS11 expression levels [corr(IDH1, RPS11) = 0.64, p = 0.03; corr(MGMT, RPS11) = 0.52, p = 0.04]. These data indicate that increased expression of RPS11 and RPS20 predicts shorter patient survival.
The study also suggests that TRGC are clinically relevant cells that represent resistant tumorigenic clones from patient tumors and that their properties are, at least in part, reflected in poor-prognosis GBM. Screening of TRGC signatures may represent a novel alternative strategy for identifying new prognostic biomarkers.


Prediction of Antimicrobial Peptides Based on Sequence Alignment and Feature Selection Methods

Author: Cai Yu-Dong
Chen Xiaoyun
Chen Zugen
Chou Kuo-Chen
Hu Lele
Jiang Nan
Li Li
Liu Guiyou
Song Hui
Tan Ming
Wang Ping
Xu Jianyong
Zheng Wen
Publication venue: Public Library of Science

Antimicrobial peptides (AMPs) are a class of natural peptides that form part of the innate immune system, and these ‘nature's antibiotics’ are quite promising for addressing the problem of increasing antibiotic resistance. An effective computational method for accurately predicting novel AMPs is therefore highly desirable, because it can supply more candidates and useful insights for drug design. In this study, a new method for predicting AMPs was implemented by integrating a sequence alignment method with a feature selection method. The overall jackknife success rate of the new predictor on a newly constructed benchmark dataset was over 80.23%, and the Matthews correlation coefficient was 0.73, indicating good prediction. Moreover, an in-depth feature analysis showed that the results are consistent with previous knowledge that some amino acids are preferred in AMPs and that these amino acids play an important role in antimicrobial activity. For the convenience of experimental scientists who want to use the prediction method without following the mathematical details, a user-friendly web server is provided at http://amp.biosino.org/
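
The two evaluation figures quoted above come from a jackknife (leave-one-out) test: each peptide is predicted by a model built without it, and the resulting confusion counts give the overall success rate and the Matthews correlation coefficient, MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below shows that evaluation loop. The nearest-neighbor predictor is a hypothetical stand-in that uses difflib's similarity ratio in place of a real alignment score; it is not the published predictor, which also integrates feature selection.

```python
import math
from difflib import SequenceMatcher


def nearest_neighbor_predict(train_x, train_y, query):
    """Hypothetical stand-in: label of the most similar training peptide.

    difflib's ratio substitutes for a true sequence alignment score here.
    """
    best = max(range(len(train_x)),
               key=lambda j: SequenceMatcher(None, query, train_x[j]).ratio())
    return train_y[best]


def jackknife(samples, labels, predict):
    """Leave-one-out test; labels are 1 (AMP) or 0 (non-AMP)."""
    tp = tn = fp = fn = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        # Predict sample i from every other sample.
        y_hat = predict(samples[:i] + samples[i + 1:],
                        labels[:i] + labels[i + 1:], x)
        if y == 1 and y_hat == 1:
            tp += 1
        elif y == 0 and y_hat == 0:
            tn += 1
        elif y == 0 and y_hat == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def success_rate_and_mcc(tp, tn, fp, fn):
    """Overall success rate and Matthews correlation coefficient."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc
```

Any predictor with the same `(train_x, train_y, query)` signature can be dropped into `jackknife`, so the evaluation code stays separate from the choice of scoring method.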

High-throughput sequencing of the DBA/2J mouse genome

Author: Daniel C Ciobanu
Deanna M Church
Donald B Thomason
JL Peirce
John A Capra
Katherine S Pollard
Khyobeni Mozhui
Lu Lu
Megan K Mulligan
RH Waterston
Richa Agarwala
Robert W Williams
Stanley F Nelson
Williams L Taylor
Xusheng Wang
Zhengsheng Li
Zugen Chen
Publication venue: BioMed Central
Publication date: 01/01/2010

The DBA/2J mouse is not only the oldest inbred strain but also one of the most widely used. DBA/2J exhibits many unique anatomical, physiological, and behavioral traits. In addition, DBA/2J is one parent of the large BXD family of recombinant inbred strains [1]. The genome of the other parent of this BXD family, C57BL/6J, has been sequenced and serves as the mouse reference genome [2]. We sequenced the genome of DBA/2J using SOLiD and Illumina high-throughput short-read protocols to generate a comprehensive set of ~5 million sequence variants segregating in the BXD family that ultimately cause developmental, anatomical, functional, and behavioral differences among these 80+ strains.

DigitalCommons@University of Nebraska

Public Library of Science (PLOS)

Phenotype Sequencing: Identifying the Genes That Cause a Phenotype Directly from Pooled Sequencing of Independent Mutants

Author: A Futschik
A Srivatsan
C Herring
C Honisch
C Lee
CE Bonferroni
Christopher J. Lee
D Lee
D Smith
E Jones
G Velicer
H Li
Iara M. P. Machado
J Barrick
J Cridland
J Klockgether
J Miller
J Ohnishi
James C. Liao
JL Cocchiaro
K Holt
K McKernan
Marc A. Harper
O Harismendy
P Chen
P Cock
Raphael Valdivia
S Atsumi
S Le Crom
Stanley F. Nelson
T Conrad
T Hanai
Traci Toy
Zugen Chen
Publication venue: Public Library of Science
Publication date: 18/02/2011

Random mutagenesis and phenotype screening provide a powerful method for dissecting microbial functions, but their results can be laborious to analyze experimentally. Each mutant strain may contain 50–100 random mutations, necessitating extensive functional experiments to determine which one causes the selected phenotype. To solve this problem, we propose a “Phenotype Sequencing” approach in which genes causing the phenotype can be identified directly from sequencing of multiple independent mutants. We developed a new computational analysis method showing that (1) causal genes can be identified with high probability from even a modest number of mutant genomes; and (2) costs can be cut many-fold compared with a conventional genome sequencing approach via an optimized strategy of library-pooling (multiple strains per library) and tag-pooling (multiple tagged libraries per sequencing lane). We have performed extensive validation experiments on a set of E. coli mutants with increased isobutanol biofuel tolerance. We generated a range of sequencing experiments varying from 3 to 32 mutant strains, with pooling on 1 to 3 sequencing lanes. Our statistical analysis of these data (4,099 mutations from 32 mutant genomes) successfully identified 3 genes (acrB, marC, acrA) that have been independently validated as causing this experimental phenotype. It must be emphasized that our approach reduces mutant sequencing costs enormously. Whereas a conventional genome sequencing experiment would have cost $7,200 in reagents alone, our Phenotype Sequencing design yielded the same information value for only $1,200. In fact, our smallest experiments reliably identified acrB and marC at a cost of only $110–$340.
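
One way to see why a modest number of independent mutants can pinpoint causal genes: under purely random mutagenesis, a gene of length L in a genome of length G is hit in a given strain with probability 1 - (1 - L/G)^m, where m is the number of mutations per strain, so a gene mutated in many independent strains is improbable by chance. The sketch below ranks genes by a binomial tail probability under that simplified model. It is an illustrative assumption, not the statistical model published with Phenotype Sequencing, and the function names are hypothetical.

```python
import math


def prob_gene_hit(gene_len, genome_len, mutations_per_strain):
    """Chance one strain carries >= 1 random mutation in a gene of this length."""
    return 1.0 - (1.0 - gene_len / genome_len) ** mutations_per_strain


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def rank_genes(hit_counts, gene_lengths, n_strains, genome_len,
               mutations_per_strain):
    """Sort genes by how unlikely their hit count is under random mutagenesis.

    hit_counts: gene -> number of independent strains mutated in that gene.
    gene_lengths: gene -> length in nt. Smaller tail probability ranks first.
    """
    scores = {}
    for gene, k in hit_counts.items():
        p = prob_gene_hit(gene_lengths[gene], genome_len, mutations_per_strain)
        scores[gene] = binomial_tail(k, n_strains, p)
    return sorted(scores.items(), key=lambda item: item[1])
```

For scale, the 4,099 mutations across 32 genomes reported above average about 128 mutations per strain, and the E. coli genome is roughly 4.6 Mb. A real analysis would also need to correct for testing thousands of genes at once.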

A forward-backward fragment assembling algorithm for the identification of genomic amplification and deletion breakpoints using high-density single nucleotide polymorphism (SNP) array

Author: AA Margolin
AB Olshen
D Pinkel
David T Wong
Dione K Bailey
ES Lander
F Picard
H Willenbrock
Hui Ye
J Fridlyand
J Huang
J Liu
JC Marioni
K Jong
Ker-Chau Li
M Khojasteh
M Lin
OC Lingjaerde
OM Rueda
P Broet
P Hupe
PH Eilers
RS Daruwala
Sharoni Jacobs
SP Shah
SY Kim
Tianwei Yu
TS Price
Wei Sun
WR Lai
X Zhou
Xiaofeng Zhou
Y Lai
Y Nannya
Zugen Chen
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: DNA copy number aberration (CNA) is one of the key characteristics of cancer cells. Recent studies have demonstrated the feasibility of using high-density single nucleotide polymorphism (SNP) genotyping arrays to detect CNA. Compared with two-color array-based comparative genomic hybridization (array-CGH), SNP arrays offer much higher probe density but a lower signal-to-noise ratio at the single-SNP level. To accurately identify small segments of CNA from SNP array data, segmentation methods that are sensitive to CNA while resistant to noise are required.

Results: We have developed a highly sensitive algorithm for edge detection in copy number data that is especially suitable for SNP array-based copy number data. The method consists of an over-sensitive edge-detection step and a test-based forward-backward edge selection step.

Conclusion: In simulations constructed from real experimental data, the method shows high sensitivity and specificity in detecting small copy number changes in focused regions. The method is implemented in an R package, FASeg, which includes data processing and visualization utilities as well as libraries for processing Affymetrix SNP array data.
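
The two-step structure described in the Results (an over-sensitive edge-detection pass followed by test-based pruning of edges) can be sketched as shown below. This is a simplified stand-in, not FASeg's algorithm. It makes several assumptions: a sliding-window mean shift proposes candidate edges, a z-like statistic with a robust global noise estimate scores each edge, and backward elimination drops the weakest edge until all remaining edges pass a threshold. The window, shift, and threshold values are arbitrary choices for illustration.

```python
import statistics


def noise_sd(values):
    """Robust noise estimate from successive differences."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    # For Gaussian noise, |x[i+1] - x[i]| has median ~0.6745 * sqrt(2) * sigma.
    return statistics.median(diffs) / (0.6745 * 2 ** 0.5) or 1e-9


def candidate_edges(values, window=3, min_shift=0.1):
    """Over-sensitive step: flag every index where local means differ slightly."""
    edges = []
    for i in range(window, len(values) - window + 1):
        left = statistics.mean(values[i - window:i])
        right = statistics.mean(values[i:i + window])
        if abs(left - right) > min_shift:
            edges.append(i)
    return edges


def backward_select(values, edges, z_min=5.0):
    """Test-based step: repeatedly drop the weakest edge until all pass z_min."""
    sigma = noise_sd(values)
    edges = sorted(edges)
    while edges:
        bounds = [0] + edges + [len(values)]
        scores = []
        for j in range(len(edges)):
            a = values[bounds[j]:bounds[j + 1]]      # segment left of edge j
            b = values[bounds[j + 1]:bounds[j + 2]]  # segment right of edge j
            se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
            scores.append(abs(statistics.mean(a) - statistics.mean(b)) / se)
        weakest = min(range(len(edges)), key=scores.__getitem__)
        if scores[weakest] >= z_min:
            break
        edges.pop(weakest)
    return edges
```

On a toy log-ratio series with a single step from 0 to 1 at index 20 and small alternating noise, the first pass flags indices 18 to 22 and the pruning step keeps only index 20. This mirrors the trade-off the abstract describes: the first step favors sensitivity, and the statistical test restores specificity.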