191 research outputs found

Notes on the Bioinformatics of Gene Patents

Author: Kepler Thomas B.
Publication venue: US Patent and Trademark Office

A model of large-scale proteome evolution

Author: Kepler Thomas B.
Pastor-Satorras Romualdo
Smith Eric
Sole Ricard V.
Publication date: 01/01/2002

The next step in the understanding of the genome organization, after the determination of complete sequences, involves proteomics. The proteome includes the whole set of protein-protein interactions, and two recent independent studies have shown that its topology displays a number of surprising features shared by other complex networks, both natural and artificial. In order to understand the origins of this topology and its evolutionary implications, we present a simple model of proteome evolution that is able to reproduce many of the observed statistical regularities reported from the analysis of the yeast proteome. Our results suggest that the observed patterns can be explained by a process of gene duplication and diversification that would evolve proteome networks under a selection pressure, favoring robustness against failure of its individual components.
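The duplication-and-diversification process named in this abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation: the function name `grow_network`, the parameters `delta` (probability that an inherited interaction is lost) and `alpha` (probability that a new interaction is gained), the two-protein starting network, and the default values are all assumptions chosen for illustration.

```python
import random

def grow_network(n_final, delta=0.5, alpha=0.05, seed=0):
    """Grow an undirected interaction network by duplication and divergence.

    Hypothetical parameters (not taken from the abstract):
      delta: probability that a duplicated protein loses an inherited link
      alpha: probability that it gains a link to any other protein
    """
    rng = random.Random(seed)
    # Adjacency stored as a dict of sets; start from two interacting proteins.
    adj = {0: {1}, 1: {0}}
    while len(adj) < n_final:
        child = len(adj)                 # label for the new protein
        parent = rng.randrange(child)    # protein chosen for duplication
        adj[child] = set()
        # Duplication: the copy inherits each parent link unless it is lost.
        for neighbor in sorted(adj[parent]):
            if rng.random() >= delta:
                adj[child].add(neighbor)
                adj[neighbor].add(child)
        # Diversification: the copy may gain links to proteins it lacks.
        for other in range(child):
            if other not in adj[child] and rng.random() < alpha:
                adj[child].add(other)
                adj[other].add(child)
    return adj

network = grow_network(200)
degrees = sorted(len(neighbors) for neighbors in network.values())
print("proteins:", len(network), "max degree:", degrees[-1])
```

Varying `delta` and `alpha` changes how sparse the resulting network is, which is one way to explore how such a growth rule shapes the degree distribution.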

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Large-scale analysis of human heavy chain V(D)J recombination patterns

Author: Kepler Thomas B
Volpe Joseph M
Publication venue: BioMed Central
Publication date: 01/01/2008

Springer - Publisher Connector

PubMed Central

Genetic correlates of autoreactivity and autoreactive potential in human Ig heavy chains

Author: Kepler Thomas B
Volpe Joseph M
Publication venue: BioMed Central
Publication date: 01/01/2009

Crossref

Springer - Publisher Connector

PubMed Central

Pathway level analysis of gene expression using singular value decomposition

Author: Kepler Thomas B
Lu Jun
Tomfohr John
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: A promising direction in the analysis of gene expression focuses on the changes in expression of specific predefined sets of genes that are known in advance to be related (e.g., genes coding for proteins involved in cellular pathways or complexes). Such an analysis can reveal features that are not easily visible from the variations in the individual genes and can lead to a picture of expression that is more biologically transparent and accessible to interpretation. In this article, we present a new method of this kind that operates by quantifying the level of 'activity' of each pathway in different samples. The activity levels, which are derived from singular value decompositions, form the basis for statistical comparisons and other applications. RESULTS: We demonstrate our approach using expression data from a study of type 2 diabetes and another of the influence of cigarette smoke on gene expression in airway epithelia. A number of interesting pathways are identified in comparisons between smokers and non-smokers including ones related to nicotine metabolism, mucus production, and glutathione metabolism. A comparison with results from the related approach, 'gene-set enrichment analysis', is also provided. CONCLUSION: Our method offers a flexible basis for identifying differentially expressed pathways from gene expression data. The results of a pathway-based analysis can be complementary to those obtained from one more focused on individual genes. A web program PLAGE (Pathway Level Analysis of Gene Expression) for performing the kinds of analyses described here is accessible at
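As a rough illustration of how a per-sample pathway "activity" could be derived from a singular value decomposition, the sketch below standardizes each gene in a pathway across samples and takes the leading right singular vector as the activity profile. This is an assumption about the general idea, not the PLAGE code itself. To stay dependency-free it approximates the leading singular vector with power iteration instead of a full SVD routine, and the names `standardize_rows` and `pathway_activity` are hypothetical. The sign of a singular vector is arbitrary, so only contrasts between samples are meaningful.

```python
import math

def standardize_rows(matrix):
    """Scale each gene (row) to zero mean and unit variance across samples."""
    scaled = []
    for row in matrix:
        mean = sum(row) / len(row)
        var = sum((x - mean) ** 2 for x in row) / len(row)
        sd = math.sqrt(var) or 1.0  # constant rows are left at zero
        scaled.append([(x - mean) / sd for x in row])
    return scaled

def pathway_activity(matrix, iterations=200):
    """Return one activity value per sample (leading right singular vector).

    matrix: rows are the genes of one pathway, columns are samples.
    Uses power iteration on X^T X, a stand-in for a library SVD call.
    """
    x = standardize_rows(matrix)
    n_genes, n_samples = len(x), len(x[0])
    # Deterministic, non-constant start vector (rows of x sum to zero).
    v = [1.0 + 0.1 * j for j in range(n_samples)]
    norm = math.sqrt(sum(a * a for a in v))
    v = [a / norm for a in v]
    for _ in range(iterations):
        u = [sum(x[i][j] * v[j] for j in range(n_samples)) for i in range(n_genes)]
        w = [sum(x[i][j] * u[i] for i in range(n_genes)) for j in range(n_samples)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm == 0.0:
            break
        v = [a / norm for a in w]
    return v

# Toy pathway: three genes, four samples (two low, two high).
expression = [[1, 1, 5, 5], [2, 2, 6, 6], [0, 0, 3, 3]]
print(pathway_activity(expression))
```

The resulting activity values could then be compared between groups of samples with an ordinary two-sample test, in the spirit of the statistical comparisons the abstract mentions.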

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Identifying differential expression in multiple SAGE libraries: an overdispersed log-linear model approach

Author: Kepler Thomas B
Lu Jun
Tomfohr John K
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: In testing for differential gene expression involving multiple serial analysis of gene expression (SAGE) libraries, it is critical to account for both between- and within-library variation. Several methods have been proposed, including the t test, the t(w) test, and an overdispersed logistic regression approach. The merits of these tests, however, have not been fully evaluated. Questions still remain on whether further improvements can be made. RESULTS: In this article, we introduce an overdispersed log-linear model approach to analyzing SAGE; we evaluate and compare its performance with three other tests: the two-sample t test, the t(w) test, and another based on overdispersed logistic linear regression. Analysis of simulated and real datasets shows that both the log-linear and logistic overdispersion methods generally perform better than the t and t(w) tests; the log-linear method is further found to have better performance than the logistic method, showing equal or higher statistical power over a range of parameter values and with different data distributions. CONCLUSION: Overdispersed log-linear models provide an attractive and reliable framework for analyzing SAGE experiments involving multiple libraries. For convenience, the implementation of this method is available through a user-friendly web-interface available at

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Flow: Statistics, visualization and informatics for flow cytometry

Author: Biomed Central
Cliburn Chan
Jacob Frelinger
Thomas B Kepler
Publication venue: BioMed Central
Publication date: 01/01/2008

Flow is an open source software application for clinical and experimental researchers to perform exploratory data analysis, clustering and annotation of flow cytometric data. Flow is an extensible system that offers the ease of use commonly found in commercial flow cytometry software packages and the statistical power of academic packages like the R BioConductor project.

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Improving peptide-MHC class I binding prediction for unbalanced datasets

Author: Kepler Thomas B
Sales Ana Paula
Tomaras Georgia D
Publication venue: BioMed Central
Publication date: 01/01/2008

BACKGROUND: Establishment of peptide binding to Major Histocompatibility Complex class I (MHCI) is a crucial step in the development of subunit vaccines and prediction of such binding could greatly reduce costs and accelerate the experimental process of identifying immunogenic peptides. Many methods have been applied to the prediction of peptide-MHCI binding, with some achieving outstanding performance. Because of the experimental methods used to measure binding or affinity between peptides and MHCI molecules, however, available datasets are enriched for nonbinders, and thus highly unbalanced. Although there is no consensus on the ideal class distribution for training sets, extremely unbalanced datasets can be detrimental to the performance of prediction algorithms. RESULTS: We have developed a decision-theoretic framework to construct cost-sensitive trees to predict peptide-MHCI binding and have used them to 1) assess the impact of the training data's class distribution on classifier accuracy, and 2) compare resampling and cost-sensitive methods as approaches to compensate for training data imbalance. Our results confirm that highly unbalanced training sets can reduce the accuracy of classifier predictions and show that, in the peptide-MHCI binding context, resampling methods do not improve the classifier performance. In contrast, cost-sensitive methods significantly improve accuracy of decision trees. Finally, we propose the use of a training scheme that, when the training set is enriched for nonbinders, consistently improves the overall classifier accuracy compared to cost-insensitive classifiers and, in particular, increases the sensitivity of the classifiers. This method minimizes the expected classification cost for large datasets. CONCLUSION: Our method consistently improves the performance of decision trees in predicting peptide-MHC class I binding by using cost-balancing techniques to compensate for the imbalance in the training dataset.
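The decision-theoretic idea behind cost-sensitive classification can be shown with a generic minimum-expected-cost rule. The sketch below is not the cost-sensitive tree construction described in this abstract. It only illustrates how unequal misclassification costs shift the decision threshold applied to a predicted binding probability. The function names and the cost values are hypothetical.

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability above which calling 'binder' has the lower expected cost.

    Calling 'binder' risks a false positive: expected cost (1 - p) * cost_fp.
    Calling 'nonbinder' risks a false negative: expected cost p * cost_fn.
    The two are equal at p = cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)

def classify(prob_binder, cost_fp=1.0, cost_fn=1.0):
    """Return True (binder) when that choice minimizes expected cost."""
    return prob_binder >= cost_threshold(cost_fp, cost_fn)

# With equal costs the threshold is 0.5; penalizing missed binders
# nine times more heavily lowers it to 0.1 and raises sensitivity.
for p in (0.05, 0.2, 0.6):
    print(p, classify(p), classify(p, cost_fp=1.0, cost_fn=9.0))
```

When binders are rare in the training data, raising the cost of a false negative in this way is one simple alternative to resampling the data.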

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

First Qualification Study of Serum Biomarkers as Indicators of Total Body Burden of Osteoarthritis

Author: Joanne Jordan
Jordan Renner
Thomas B. Kepler
Thomas Stabler
Virginia B. Kraus
William Taylor
Publication venue: Public Library of Science
Publication date: 01/01/2010

BACKGROUND: Osteoarthritis (OA) is a debilitating chronic multijoint disease of global proportions. OA presence and severity are usually documented by x-ray imaging but whole body imaging is impractical due to radiation exposure, time and cost. Systemic (serum or urine) biomarkers offer a potential alternative method of quantifying total body burden of disease but no OA-related biomarker has ever been stringently qualified to determine the feasibility of this approach. The goal of this study was to evaluate the ability of three OA-related biomarkers to predict various forms or subspecies of OA and total body burden of disease. METHODOLOGY/PRINCIPAL FINDINGS: Female participants (461) with clinical hand OA underwent radiography of hands, hips, knees and lumbar spine; x-rays were comprehensively scored for OA features of osteophyte and joint space narrowing. Three OA-related biomarkers, serum hyaluronan (sHA), cartilage oligomeric matrix protein (sCOMP), and urinary C-telopeptide of type II collagen (uCTX2), were measured by ELISA. sHA, sCOMP and uCTX2 correlated positively with total osteophyte burden in models accounting for demographics (age, weight, height): R^2 = 0.60, R^2 = 0.47, R^2 = 0.51 (all p < 10^-6); sCOMP correlated negatively with total joint space narrowing burden: R^2 = 0.69 (p < 10^-6). Biomarkers and demographics predicted 35-38% of variance in total burden of OA (total joint space narrowing or osteophyte). Joint size did not determine the contribution to the systemic biomarker concentration. Biomarker correlation with disease in the lumbar spine resembled that in the rest of the skeleton. CONCLUSIONS/SIGNIFICANCE: We have suspected that the correlation of systemic biomarkers with disease has been hampered by the inability to fully phenotype the burden of OA in a patient. These results confirm the hypothesis, revealed upon adequate patient phenotyping, that systemic joint tissue concentrations of several biomarkers can be quantitative indicators of specific subspecies of OA and of total body burden of disease.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

DukeSpace

Carolina Digital Repository