191 research outputs found

    Notes on the Bioinformatics of Gene Patents

    A model of large-scale proteome evolution

    The next step in the understanding of the genome organization, after the determination of complete sequences, involves proteomics. The proteome includes the whole set of protein-protein interactions, and two recent independent studies have shown that its topology displays a number of surprising features shared by other complex networks, both natural and artificial. In order to understand the origins of this topology and its evolutionary implications, we present a simple model of proteome evolution that is able to reproduce many of the observed statistical regularities reported from the analysis of the yeast proteome. Our results suggest that the observed patterns can be explained by a process of gene duplication and diversification that would evolve proteome networks under a selection pressure, favoring robustness against failure of its individual components
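A duplication-and-diversification process of the kind this abstract describes can be sketched as follows. This is a minimal illustration, not the authors' model: the function name `grow_network`, the parameters `delta` (probability that an inherited link is lost) and `alpha` (scale of the probability of gaining a new link), and the two-protein starting graph are all assumptions made for the example.

```python
import random


def grow_network(n_final, delta=0.5, alpha=0.05, seed=0):
    """Grow an undirected interaction graph by duplication and divergence.

    Hypothetical parameters: delta is the chance an inherited link is lost,
    alpha scales the chance of gaining a link to an unrelated protein.
    """
    rng = random.Random(seed)
    # Start from two interacting proteins.
    adj = {0: {1}, 1: {0}}
    while len(adj) < n_final:
        new = len(adj)
        parent = rng.randrange(new)
        # Duplication: the copy inherits each parent link unless it is lost.
        inherited = {v for v in adj[parent] if rng.random() >= delta}
        # Diversification: the copy occasionally gains links to other proteins.
        gained = {
            v for v in adj
            if v not in inherited and rng.random() < alpha / new
        }
        adj[new] = set()
        for v in inherited | gained:
            adj[new].add(v)
            adj[v].add(new)
    return adj


if __name__ == "__main__":
    network = grow_network(500)
    degrees = sorted((len(nbrs) for nbrs in network.values()), reverse=True)
    print("largest degrees:", degrees[:5])
```

Under these assumptions, the degree distribution of the grown graph can then be compared against the statistical regularities reported for the yeast proteome.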

    Pathway level analysis of gene expression using singular value decomposition

    BACKGROUND: A promising direction in the analysis of gene expression focuses on the changes in expression of specific predefined sets of genes that are known in advance to be related (e.g., genes coding for proteins involved in cellular pathways or complexes). Such an analysis can reveal features that are not easily visible from the variations in the individual genes and can lead to a picture of expression that is more biologically transparent and accessible to interpretation. In this article, we present a new method of this kind that operates by quantifying the level of 'activity' of each pathway in different samples. The activity levels, which are derived from singular value decompositions, form the basis for statistical comparisons and other applications. RESULTS: We demonstrate our approach using expression data from a study of type 2 diabetes and another of the influence of cigarette smoke on gene expression in airway epithelia. A number of interesting pathways are identified in comparisons between smokers and non-smokers including ones related to nicotine metabolism, mucus production, and glutathione metabolism. A comparison with results from the related approach, 'gene-set enrichment analysis', is also provided. CONCLUSION: Our method offers a flexible basis for identifying differentially expressed pathways from gene expression data. The results of a pathway-based analysis can be complementary to those obtained from one more focused on individual genes. A web program PLAGE (Pathway Level Analysis of Gene Expression) for performing the kinds of analyses described here is accessible at
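The idea of deriving a per-sample pathway 'activity' from a singular value decomposition can be sketched as below. This is an illustrative reading of the abstract, not the PLAGE implementation: it assumes each gene is standardized across samples and that the activity is the first right singular vector of the standardized pathway matrix, computed here by power iteration so the example needs only the standard library. The names `standardize` and `pathway_activity` are hypothetical.

```python
import math


def standardize(row):
    """Center a gene's expression values and scale them to unit variance."""
    mean = sum(row) / len(row)
    sd = math.sqrt(sum((x - mean) ** 2 for x in row) / (len(row) - 1))
    return [(x - mean) / sd for x in row]  # assumes the gene is not constant


def pathway_activity(expr, iters=200):
    """Return one activity value per sample for a pathway.

    expr is a list of gene rows, each a list of per-sample expression values.
    """
    z = [standardize(row) for row in expr]
    n = len(z[0])
    # Gram matrix Z^T Z; its leading eigenvector is the first right
    # singular vector of Z.
    gram = [[sum(g[i] * g[j] for g in z) for j in range(n)] for i in range(n)]
    # Deterministic, non-constant start (a constant vector lies in the null
    # space because every standardized row sums to zero).
    v = [1.0 + 0.1 * i for i in range(n)]
    for _ in range(iters):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


if __name__ == "__main__":
    # Toy pathway: three genes, two low-expression and two high-expression samples.
    pathway = [[1, 2, 9, 10], [2, 1, 8, 9], [0, 1, 7, 9]]
    print(pathway_activity(pathway))
```

The sign of a singular vector is arbitrary, so only the contrast between samples is meaningful; the resulting activity values could then feed a two-group comparison such as a t test.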

    Identifying differential expression in multiple SAGE libraries: an overdispersed log-linear model approach

    BACKGROUND: In testing for differential gene expression involving multiple serial analysis of gene expression (SAGE) libraries, it is critical to account for both between- and within-library variation. Several methods have been proposed, including the t test, the t_w test, and an overdispersed logistic regression approach. The merits of these tests, however, have not been fully evaluated. Questions still remain on whether further improvements can be made. RESULTS: In this article, we introduce an overdispersed log-linear model approach to analyzing SAGE; we evaluate and compare its performance with three other tests: the two-sample t test, the t_w test, and another based on overdispersed logistic linear regression. Analysis of simulated and real datasets shows that both the log-linear and logistic overdispersion methods generally perform better than the t and t_w tests; the log-linear method is further found to have better performance than the logistic method, showing equal or higher statistical power over a range of parameter values and with different data distributions. CONCLUSION: Overdispersed log-linear models provide an attractive and reliable framework for analyzing SAGE experiments involving multiple libraries. For convenience, the implementation of this method is available through a user-friendly web-interface available at
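Why between-library variation matters can be illustrated with a simple dispersion check. The sketch below is not the authors' model: it assumes a plain Poisson log-linear fit with library size as an offset for one tag within one group, and estimates overdispersion as the Pearson chi-square statistic divided by its degrees of freedom. The function name `pearson_dispersion` is hypothetical.

```python
def pearson_dispersion(counts, library_sizes):
    """Estimate overdispersion for one tag across libraries in one group.

    Values near 1 are consistent with Poisson sampling alone; values well
    above 1 suggest extra between-library variation.
    Assumes at least two libraries and a nonzero total count.
    """
    # Pooled tag proportion: the Poisson fit with a log(library size) offset.
    pooled_rate = sum(counts) / sum(library_sizes)
    expected = [size * pooled_rate for size in library_sizes]
    chi_sq = sum((y - mu) ** 2 / mu for y, mu in zip(counts, expected))
    return chi_sq / (len(counts) - 1)


if __name__ == "__main__":
    # Made-up tag counts from three libraries of equal size.
    print(pearson_dispersion([5, 40, 15], [1000, 1000, 1000]))
```

A quasi-likelihood analysis would typically use such an estimate to inflate the standard errors of the log-linear coefficients.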

    Flow: Statistics, visualization and informatics for flow cytometry

    Flow is an open source software application for clinical and experimental researchers to perform exploratory data analysis, clustering and annotation of flow cytometric data. Flow is an extensible system that offers the ease of use commonly found in commercial flow cytometry software packages and the statistical power of academic packages like the R BioConductor project

    Improving peptide-MHC class I binding prediction for unbalanced datasets

    BACKGROUND: Establishment of peptide binding to Major Histocompatibility Complex class I (MHCI) is a crucial step in the development of subunit vaccines and prediction of such binding could greatly reduce costs and accelerate the experimental process of identifying immunogenic peptides. Many methods have been applied to the prediction of peptide-MHCI binding, with some achieving outstanding performance. Because of the experimental methods used to measure binding or affinity between peptides and MHCI molecules, however, available datasets are enriched for nonbinders, and thus highly unbalanced. Although there is no consensus on the ideal class distribution for training sets, extremely unbalanced datasets can be detrimental to the performance of prediction algorithms. RESULTS: We have developed a decision-theoretic framework to construct cost-sensitive trees to predict peptide-MHCI binding and have used them to 1) assess the impact of the training data's class distribution on classifier accuracy, and 2) compare resampling and cost-sensitive methods as approaches to compensate for training data imbalance. Our results confirm that highly unbalanced training sets can reduce the accuracy of classifier predictions and show that, in the peptide-MHCI binding context, resampling methods do not improve the classifier performance. In contrast, cost-sensitive methods significantly improve accuracy of decision trees. Finally, we propose the use of a training scheme that, when the training set is enriched for nonbinders, consistently improves the overall classifier accuracy compared to cost-insensitive classifiers and, in particular, increases the sensitivity of the classifiers. This method minimizes the expected classification cost for large datasets. CONCLUSION: Our method consistently improves the performance of decision trees in predicting peptide-MHC class I binding by using cost-balancing techniques to compensate for the imbalance in the training dataset.
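The decision-theoretic idea of minimizing expected classification cost can be sketched at the level of a single tree leaf. This is an illustration under stated assumptions, not the authors' training scheme: the function name `min_cost_label` and the cost values are hypothetical, chosen so that missing a binder (false negative) is penalized more heavily than a false alarm.

```python
def min_cost_label(p_binder, cost_fn=5.0, cost_fp=1.0):
    """Pick the leaf label with the lower expected misclassification cost.

    p_binder is the estimated fraction of binders at the leaf; cost_fn and
    cost_fp are assumed costs of a false negative and a false positive.
    """
    # Predicting "nonbinder" is wrong whenever the peptide actually binds.
    cost_if_nonbinder = p_binder * cost_fn
    # Predicting "binder" is wrong whenever the peptide does not bind.
    cost_if_binder = (1.0 - p_binder) * cost_fp
    return "binder" if cost_if_binder < cost_if_nonbinder else "nonbinder"


if __name__ == "__main__":
    # With equal costs the majority class wins; with a higher false-negative
    # cost, a leaf with only 30% binders is still labeled "binder".
    print(min_cost_label(0.3, cost_fn=1.0, cost_fp=1.0))
    print(min_cost_label(0.3, cost_fn=5.0, cost_fp=1.0))
```

Raising the false-negative cost in this way trades some specificity for sensitivity, which is the kind of compensation for nonbinder-enriched training data that the abstract discusses.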

    First Qualification Study of Serum Biomarkers as Indicators of Total Body Burden of Osteoarthritis

    BACKGROUND: Osteoarthritis (OA) is a debilitating chronic multijoint disease of global proportions. OA presence and severity is usually documented by x-ray imaging but whole body imaging is impractical due to radiation exposure, time and cost. Systemic (serum or urine) biomarkers offer a potential alternative method of quantifying total body burden of disease but no OA-related biomarker has ever been stringently qualified to determine the feasibility of this approach. The goal of this study was to evaluate the ability of three OA-related biomarkers to predict various forms or subspecies of OA and total body burden of disease. METHODOLOGY/PRINCIPAL FINDINGS: Female participants (461) with clinical hand OA underwent radiography of hands, hips, knees and lumbar spine; x-rays were comprehensively scored for OA features of osteophyte and joint space narrowing. Three OA-related biomarkers, serum hyaluronan (sHA), cartilage oligomeric matrix protein (sCOMP), and urinary C-telopeptide of type II collagen (uCTX2), were measured by ELISA. sHA, sCOMP and uCTX2 correlated positively with total osteophyte burden in models accounting for demographics (age, weight, height): R² = 0.60, R² = 0.47, R² = 0.51 (all p < 10⁻⁶); sCOMP correlated negatively with total joint space narrowing burden: R² = 0.69 (p < 10⁻⁶). Biomarkers and demographics predicted 35-38% of variance in total burden of OA (total joint space narrowing or osteophyte). Joint size did not determine the contribution to the systemic biomarker concentration. Biomarker correlation with disease in the lumbar spine resembled that in the rest of the skeleton. CONCLUSIONS/SIGNIFICANCE: We have suspected that the correlation of systemic biomarkers with disease has been hampered by the inability to fully phenotype the burden of OA in a patient. These results confirm the hypothesis, revealed upon adequate patient phenotyping, that systemic joint tissue concentrations of several biomarkers can be quantitative indicators of specific subspecies of OA and of total body burden of disease