Patterns in the sequence context of protein disulfide bonds

Author: Eklund Aron Charles, 1974-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2002

Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Biology, February 2002. Includes bibliographical references (leaves 60-62). This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.

Disulfide bonds play an important role in the structural stability of the proteins that contain them. Yet little is known about the specificity with which they are formed. To address this, a representative set of disulfide bonds from nonhomologous eukaryotic polypeptides was created. The amino acid sequences flanking these disulfide bonds were searched for conserved patterns that may reflect recognition sites for the disulfide-bond-forming enzyme protein disulfide isomerase (PDI). Several methods of classifying disulfide bonds were explored, and each class was analyzed for conserved sequence patterns. To maximize the chances of finding a conserved recognition site, a simulated annealing algorithm was implemented to divide a set of disulfide-bonded cysteines into two sets of cysteines with an average sequence environment that is as far from randomly distributed as possible. No significant conserved patterns were found in the set of disulfide bonds or within any of the classification schemes introduced. Additionally, several methods for predicting disulfide bond connectivity were explored. The most successful methods predicted connectivity based on the sequential distance between cysteines.

DSpace@MIT

Optimization of the BLASTN substitution matrix for prediction of non-specific DNA microarray hybridization

Author: Eklund Aron C.
Friis Pia
Szallasi Zoltan
Wernersson Rasmus
Publication venue: Oxford University Press
Publication date: 01/01/2009

DNA microarray measurements are susceptible to error caused by non-specific hybridization between a probe and a target (cross-hybridization), or between two targets (bulk-hybridization). Search algorithms such as BLASTN can quickly identify potentially hybridizing sequences. We set out to improve BLASTN accuracy by modifying the substitution matrix and gap penalties. We generated gene expression microarray data for samples in which 1% or 10% of the target mass was an exogenous spike of known sequence. We found that the 10% spike induced 2-fold intensity changes in 3% of the probes, two-thirds of which were decreases in intensity likely caused by bulk-hybridization. These changes were correlated with similarity between the spike and probe sequences. Interestingly, even very weak similarities tended to induce a change in probe intensity with the 10% spike. Using these data, we optimized the BLASTN substitution matrix to more accurately identify probes susceptible to non-specific hybridization with the spike. Relative to the default substitution matrix, the optimized matrix features a decreased score for A–T base pairs relative to G–C base pairs, resulting in a 5–15% increase in area under the ROC curve for identifying affected probes. This optimized matrix may be useful in the design of microarray probes, and in other BLASTN-based searches for hybridization partners.
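
The abstract evaluates substitution matrices by the area under the ROC curve for identifying affected probes. As a minimal sketch of how such a comparison could be computed, the Python below calculates the AUC from alignment scores using the pairwise (Mann-Whitney) formulation. The function name `roc_auc` and all scores and labels are hypothetical illustrations, not data or code from the paper.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    scores: one alignment score per probe (higher = more similar to the spike).
    labels: 1 if the probe's intensity was affected by the spike, else 0.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one affected and one unaffected probe")
    # Count (affected, unaffected) pairs where the affected probe scores
    # higher; tied pairs count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six probes under two scoring schemes.
affected = [1, 1, 0, 1, 0, 0]
default_scores = [12, 9, 10, 7, 8, 6]
optimized_scores = [12, 11, 10, 7, 8, 6]

auc_default = roc_auc(default_scores, affected)
auc_optimized = roc_auc(optimized_scores, affected)
print(f"default: {auc_default:.3f}, optimized: {auc_optimized:.3f}")
```

A scoring scheme that ranks more affected probes above unaffected ones yields a higher AUC, which is the sense in which one matrix can be said to outperform another.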

CiteSeerX

Jetset: selecting the optimal microarray probe set to represent a gene

Author: Birkbak Nicolai J
Eklund Aron C
Gyorffy Balazs
Li Qiyuan
Szallasi Zoltan
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Interpretation of gene expression microarrays requires a mapping from probe set to gene. On many Affymetrix gene expression microarrays, a given gene may be detected by multiple probe sets, which may deliver inconsistent or even contradictory measurements. Therefore, obtaining an unambiguous expression estimate of a pre-specified gene can be a nontrivial but essential task. Results: We developed scoring methods to assess each probe set for specificity, splice isoform coverage, and robustness against transcript degradation. We used these scores to select a single representative probe set for each gene, thus creating a simple one-to-one mapping between gene and probe set. To test this method, we evaluated concordance between protein measurements and gene expression values, and between sets of genes whose expression is known to be correlated. For both test cases, we identified genes that were nominally detected by multiple probe sets, and we found that the probe set chosen by our method showed stronger concordance. Conclusions: This method provides a simple, unambiguous mapping to allow assessment of the expression levels of specific genes of interest.
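
The selection step described above (score each probe set, then keep one per gene) can be sketched as follows. This is an illustration under stated assumptions: the abstract does not say how the three scores are combined, so the product used in `overall_score` is an assumption, and the `ProbeSet` type, function names, identifiers, and score values are all hypothetical.

```python
from typing import NamedTuple


class ProbeSet(NamedTuple):
    probe_set_id: str
    gene: str
    specificity: float  # each score assumed to lie in [0, 1]
    coverage: float     # splice isoform coverage
    robustness: float   # robustness against transcript degradation


def overall_score(ps: ProbeSet) -> float:
    # Assumption: combine the three scores by multiplication, so a probe
    # set must do reasonably well on all three to rank highly.
    return ps.specificity * ps.coverage * ps.robustness


def select_representatives(probe_sets):
    """Return a one-to-one mapping {gene: best-scoring probe set ID}."""
    best = {}
    for ps in probe_sets:
        current = best.get(ps.gene)
        if current is None or overall_score(ps) > overall_score(current):
            best[ps.gene] = ps
    return {gene: ps.probe_set_id for gene, ps in best.items()}


# Hypothetical probe sets: GENE_A is detected by two, GENE_B by one.
probe_sets = [
    ProbeSet("ps_A1", "GENE_A", 0.9, 1.0, 0.5),
    ProbeSet("ps_A2", "GENE_A", 0.8, 1.0, 0.9),
    ProbeSet("ps_B1", "GENE_B", 1.0, 0.5, 1.0),
]
mapping = select_representatives(probe_sets)
print(mapping)
```

Keeping the first probe set seen when scores tie is an arbitrary choice in this sketch; a real implementation would need an explicit tie-breaking rule.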

Springer - Publisher Connector

UCL Discovery

Semmelweis Repository

Evaluation of Microarray Preprocessing Algorithms Based on Concordance with RT-PCR in Clinical Samples

Author: Aron C. Eklund
Balazs Gyorffy
Bela Molnar
Chad Creighton
Hermann Lage
Zoltan Szallasi
Publication venue: Public Library of Science
Publication date: 01/01/2009

BACKGROUND: Several preprocessing algorithms for Affymetrix gene expression microarrays have been developed, and their performance on spike-in data sets has been evaluated previously. However, a comprehensive comparison of preprocessing algorithms on samples taken under research conditions has not been performed. METHODOLOGY/PRINCIPAL FINDINGS: We used TaqMan RT-PCR arrays as a reference to evaluate the accuracy of expression values from Affymetrix microarrays in two experimental data sets: one comprising 84 genes in 36 colon biopsies, and the other comprising 75 genes in 29 cancer cell lines. We evaluated consistency using the Pearson correlation between measurements obtained on the two platforms. Also, we introduce the log-ratio discrepancy as a more relevant measure of discordance between gene expression platforms. Of nine preprocessing algorithms tested, PLIER+16 produced expression values that were most consistent with RT-PCR measurements, although the difference in performance between most of the algorithms was not statistically significant. CONCLUSIONS/SIGNIFICANCE: Our results support the choice of PLIER+16 for the preprocessing of clinical Affymetrix microarray data. However, other algorithms performed similarly and are probably also good choices.

Repository of the Academy's Library

Semmelweis Repository

Redefinition of Affymetrix probe sets by sequence overlap with cDNA microarray probes reduces cross-platform inconsistencies in cancer-associated gene expression measurements

Author: Carter Scott L
Eklund Aron C
Kohane Isaac S
Mecham Brigham H
Szallasi Zoltan
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Comparison of data produced on different microarray platforms often shows surprising discordance. It is not clear whether this discrepancy is caused by noisy data or by improper probe matching between platforms. We investigated whether the significant level of inconsistency between results produced by alternative gene expression microarray platforms could be reduced by stringent sequence matching of microarray probes. We mapped the short oligo probes of the Affymetrix platform onto cDNA clones of the Stanford microarray platform. Affymetrix probes were reassigned to redefined probe sets if they mapped to the same cDNA clone sequence, regardless of the original manufacturer-defined grouping. The NCI-60 gene expression profiles produced by the Affymetrix HuFL platform were recalculated using these redefined probe sets and compared to previously published cDNA measurements of the same panel of RNA samples. RESULTS: The redefined probe sets displayed a substantially higher level of cross-platform consistency at the level of gene correlation, cell line correlation and unsupervised hierarchical clustering. The same strategy allowed an almost complete correspondence of breast cancer subtype classification between Affymetrix gene chip and cDNA microarray derived gene expression data, and gave an increased level of similarity between normal lung derived gene expression profiles using the two technologies. In total, two Affymetrix gene-chip platforms were remapped to three cDNA platforms in the various cross-platform analyses, resulting in improved concordance in each case. CONCLUSION: We have shown that probes that target overlapping transcript sequence regions on cDNA microarrays and Affymetrix gene-chips exhibit a greater level of concordance than the corresponding Unigene or sequence matched features. This method will be useful for the integrated analysis of gene expression data generated by multiple disparate measurement platforms.

Springer - Publisher Connector

Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data

Author: Birkbak Nicolai Juul
Eklund Aron Charles
Favero Francesco
Joshi Tejal
Krzystanek Marcin
Li Qiyuan
Marquard Andrea Marion
Szallasi Zoltan Imre
Publication venue
Publication date: 01/01/2015

Method for identification of tissue or organ localization of a tumour

Author: Birkbak Nicolai Juul
Eklund Aron Charles
Marquard Andrea Marion
Szallasi Zoltan Imre
Publication venue
Publication date: 23/06/2016

The invention relates to a method for predicting the localization of a primary tumour, wherein said method comprises the use of genomic profile data, and wherein the method is capable of predicting the type of cancer by a classification score ranking among a variety of the possible tumour types.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

An analysis of natural T cell responses to predicted tumor neoepitopes

Author: Anne-Mette Bjerregaard
Aron Charles Eklund
Carolina M. Barra
Morten Nielsen
Sine Reker Hadrup
Vanessa Jurtz
Zoltan Szallasi
Publication venue: Frontiers Research Foundation
Publication date: 01/01/2017

Personalization of cancer immunotherapies such as therapeutic vaccines and adoptive T-cell therapy may benefit from efficient identification and targeting of patient-specific neoepitopes. However, current neoepitope prediction methods based on sequencing and predictions of epitope processing and presentation result in a low rate of validation, suggesting that the determinants of peptide immunogenicity are not well understood. We gathered published data on human neopeptides originating from single amino acid substitutions for which T cell reactivity had been experimentally tested, including both immunogenic and non-immunogenic neopeptides. Out of 1,948 neopeptide-HLA (human leukocyte antigen) combinations from 13 publications, 53 were reported to elicit a T cell response. From these data, we found an enrichment for responses among peptides of length 9. Even though the peptides had been pre-selected based on presumed likelihood of being immunogenic, we found using NetMHCpan-4.0 that immunogenic neopeptides were predicted to bind significantly more strongly to HLA compared to non-immunogenic peptides. Investigation of the HLA binding strength of the immunogenic peptides revealed that the vast majority (96%) shared very strong predicted binding to HLA and that the binding strength was comparable to that observed for pathogen-derived epitopes. Finally, we found that neopeptide dissimilarity to self is a predictor of immunogenicity in situations where neo- and normal peptides share comparable predicted binding strength. In conclusion, these results suggest new strategies for prioritization of mutated peptides, but new data will be needed to confirm their value.

Affiliation: Bjerregaard, Anne-Mette. Technical University of Denmark; Denmark
Affiliation: Nielsen, Morten. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas; Argentina. Technical University of Denmark; Denmark
Affiliation: Jurtz, Vanessa. Technical University of Denmark; Denmark
Affiliation: Barra, Carolina M. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas; Argentina
Affiliation: Hadrup, Sine Reker. Technical University of Denmark; Denmark
Affiliation: Szallasi, Zoltan. Technical University of Denmark; Denmark. Harvard Medical School; United States
Affiliation: Eklund, Aron Charles. Technical University of Denmark; Denmark

CONICET Digital

Frontiers - Publisher Connector

Public Library of Science (PLOS)

Biasogram: visualization of confounding technical bias in gene expression data.

Author: Aron C. Eklund
BJ Daigle Jr
C Bartenhagen
D Venet
FM Giorgi
H Auer
HK Dressman
J Wang
JC Chang
JK Lee
KA Baggerly
KR Gabriel
KR Hess
Marcin Krzystanek
Q Li
S Dudoit
Xiaofeng Wang
Zoltan Szallasi
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

Gene expression profiles of clinical cohorts can be used to identify genes that are correlated with a clinical variable of interest such as patient outcome or response to a particular drug. However, expression measurements are susceptible to technical bias caused by variation in extraneous factors such as RNA quality and array hybridization conditions. If such technical bias is correlated with the clinical variable of interest, the likelihood of identifying false positive genes is increased. Here we describe a method to visualize an expression matrix as a projection of all genes onto a plane defined by a clinical variable and a technical nuisance variable. The resulting plot indicates the extent to which each gene is correlated with the clinical variable or the technical variable. We demonstrate this method by applying it to three clinical trial microarray data sets, one of which identified genes that may have been driven by a confounding technical variable. This approach can be used as a quality control step to identify data sets that are likely to yield false positive results.
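
One simple way to picture the plot described above is to give each gene two coordinates: its correlation with the clinical variable and its correlation with the technical variable. The sketch below does this with Pearson correlation. This is an assumption for illustration only, since the abstract does not specify the exact projection, and the function names, gene labels, and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gene_coordinates(expression, clinical, technical):
    """Map each gene to (correlation with clinical, correlation with technical).

    expression: {gene name: expression values, one per sample}.
    """
    return {
        gene: (pearson(values, clinical), pearson(values, technical))
        for gene, values in expression.items()
    }


# Hypothetical data for four samples.
clinical = [0, 0, 1, 1]    # e.g. responder status
technical = [1, 2, 1, 2]   # e.g. an RNA quality measure
expression = {
    "gene_a": [1, 1, 3, 3],  # tracks the clinical variable only
    "gene_b": [2, 4, 2, 4],  # tracks the technical variable only
}
coords = gene_coordinates(expression, clinical, technical)
for gene, (r_clin, r_tech) in coords.items():
    print(f"{gene}: clinical r = {r_clin:.2f}, technical r = {r_tech:.2f}")
```

In a scatter plot of these coordinates, genes far along the technical axis are the ones whose apparent clinical association deserves extra scrutiny when the two variables are themselves correlated.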

CiteSeerX