    A Regression-based K nearest neighbor algorithm for gene function prediction from heterogeneous data

    BACKGROUND: As a variety of functional genomic and proteomic techniques become available, there is an increasing need for functional analysis methodologies that integrate heterogeneous data sources. METHODS: In this paper, we address this issue by proposing a general framework for gene function prediction based on the k-nearest-neighbor (KNN) algorithm. The choice of KNN is motivated by its simplicity, flexibility to incorporate different data types and adaptability to irregular feature spaces. A weakness of traditional KNN methods, especially when handling heterogeneous data, is that performance is subject to the often ad hoc choice of similarity metric. To address this weakness, we apply regression methods to infer a similarity metric as a weighted combination of a set of base similarity measures, which helps to locate the neighbors that are most likely to be in the same class as the target gene. We also suggest a novel voting scheme to generate confidence scores that estimate the accuracy of predictions. The method gracefully extends to multi-way classification problems. RESULTS: We apply this technique to gene function prediction according to three well-known Escherichia coli classification schemes suggested by biologists, using information derived from microarray and genome sequencing data. We demonstrate that our algorithm dramatically outperforms the naive KNN methods and is competitive with support vector machine (SVM) algorithms for integrating heterogeneous data. We also show that by combining different data sources, prediction accuracy can improve significantly. CONCLUSION: Our extension of KNN with automatic feature weighting, multi-class prediction, and probabilistic inference enhances prediction accuracy significantly while remaining efficient, intuitive and flexible. This general framework can also be applied to similar classification problems involving heterogeneous datasets.
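
The core idea in the abstract above, learning a similarity metric as a weighted combination of base similarity measures and then voting among the nearest neighbors, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes ordinary least squares as the regression method, a "same class" indicator as the regression target, and the winning vote fraction as the confidence score; the function names `fit_similarity_weights` and `knn_predict` are hypothetical.

```python
import numpy as np

def fit_similarity_weights(base_sims, labels):
    """Fit weights for combining base similarity matrices (assumed: least squares).

    base_sims: array of shape (m, n, n), one similarity matrix per data source.
    labels:    array of shape (n,), class label of each training gene.
    Returns (weights of shape (m,), intercept).
    """
    m, n, _ = base_sims.shape
    iu = np.triu_indices(n, k=1)                       # each unordered gene pair once
    X = np.stack([s[iu] for s in base_sims], axis=1)   # (pairs, m) base similarities
    X = np.hstack([X, np.ones((X.shape[0], 1))])       # intercept column
    # Assumed regression target: 1 if the two genes share a class, else 0.
    y = (labels[iu[0]] == labels[iu[1]]).astype(float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[:-1], coef[-1]

def knn_predict(sims_to_train, weights, labels, k=3):
    """Predict the class of one query gene from its k most similar training genes.

    sims_to_train: array of shape (m, n), base similarities between the query
                   gene and each training gene.
    Returns (predicted label, fraction of the k votes won by that label).
    """
    combined = weights @ sims_to_train                 # learned similarity, shape (n,)
    nearest = np.argsort(combined)[::-1][:k]           # indices of the k most similar genes
    classes, counts = np.unique(labels[nearest], return_counts=True)
    best = np.argmax(counts)
    # Assumed confidence score: simple vote fraction, not the paper's voting scheme.
    return classes[best], counts[best] / k
```

In this sketch an uninformative data source should receive a weight near zero, which is how the weighting reduces dependence on an ad hoc choice of similarity metric.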

    Secreted frizzled-related protein 4 predicts progression of autosomal dominant polycystic kidney disease

    Autosomal dominant polycystic kidney disease (ADPKD) is a common autosomal dominant condition associated with renal cysts and development of renal failure. With the availability of potential therapies, one major obstacle remains the lack of readily available parameters that identify patients at risk for disease progression and/or determine the efficacy of therapeutic interventions within short observation periods. Increased total kidney volume (TKV) correlates with disease progression, but it remains unknown how accurately this parameter can predict disease progression at early stages. To identify additional parameters that help to stratify ADPKD patients, we measured secreted frizzled-related protein 4 (sFRP4) serum concentrations at baseline and over the course of 18 months in 429 ADPKD patients. Serum creatinine and sFRP4 as well as TKV increased over time, and were significantly different from baseline values within 1 year. Elevated sFRP4 levels at baseline predicted a more rapid decline of renal function at 2, 3 and 5 years, suggesting that sFRP4 serum levels may provide additional information to identify ADPKD patients at risk for rapid disease progression.