61 research outputs found

    Kernelized Infomax Clustering

    We propose a simple information-theoretic clustering approach based on maximizing the mutual information I(x, y) between the unknown cluster labels y and the training patterns x with respect to parameters of specifically constrained encoding distributions. The constraints are chosen such that patterns are likely to be clustered similarly if they lie close to specific (unknown) vectors in the feature space. The method may be conveniently applied to learning the optimal affinity matrix, which corresponds to learning parameters of the kernelized encoder. The procedure does not require computations of eigenvalues or inverses of the Gram matrices, which makes it potentially attractive for clustering large data sets.

    Computational Semantics with Functional Programming, by Jan van Eijck and Christina Unger

    One of the fundamental tasks of science is to find explainable relationships between observed phenomena. One approach to this task that has received attention in recent years is based on probabilistic graphical modelling with sparsity constraints on model structures. In this paper, we describe two new approaches to Bayesian inference of sparse structures of Gaussian graphical models (GGMs). One is based on a simple modification of the cutting-edge block Gibbs sampler for sparse GGMs, which results in significant computational gains in high dimensions. The other method is based on a specific construction of the Hamiltonian Monte Carlo sampler, which results in further significant improvements. We compare our fully Bayesian approaches with the popular regularisation-based graphical LASSO, and demonstrate significant advantages of the Bayesian treatment under the same computing costs. We apply the methods to a broad range of simulated data sets and to a real-life financial data set.

    Kernel multi-task learning using task-specific features

    In this paper we are concerned with multitask learning when task-specific features are available. We describe two ways of achieving this using Gaussian process predictors: in the first method, the data from all tasks is combined into one dataset, making use of the task-specific features. In the second method we train specific predictors for each reference task, and then combine their predictions using a gating network. We demonstrate these methods on a compiler performance prediction problem, where a task is defined as predicting the speed-up obtained when applying a sequence of code transformations to a given program.

    Model Selection Approach Suggests Causal Association between 25-Hydroxyvitamin D and Colorectal Cancer

    Vitamin D deficiency has been associated with increased risk of colorectal cancer (CRC), but a causal relationship has not yet been confirmed. We investigate the direction of causation between vitamin D and CRC by extending the conventional approaches to allow pleiotropic relationships and by explicitly modelling unmeasured confounders. Plasma 25-hydroxyvitamin D (25-OHD), genetic variants associated with 25-OHD and CRC, and other relevant information were available for 2645 individuals (1057 CRC cases and 1588 controls) and included in the model. We investigate whether 25-OHD is likely to be causally associated with CRC, or vice versa, by selecting the best modelling hypothesis according to Bayesian predictive scores. We examine consistency for a range of prior assumptions. Model comparison showed preference for the causal association between low 25-OHD and CRC over the reverse causal hypothesis. This was confirmed for posterior mean deviances obtained for both models (11.5 natural log units in favour of the causal model), and also for deviance information criteria (DIC) computed for a range of prior distributions. Overall, models ignoring hidden confounding or pleiotropy had significantly poorer DIC scores. Results suggest a causal association between 25-OHD and colorectal cancer, and support the need for randomised clinical trials for further confirmation.

    Automated pathway and reaction prediction facilitates in silico identification of unknown metabolites in human cohort studies

    Identification of metabolites in non-targeted metabolomics continues to be a bottleneck in metabolomics studies in large human cohorts. Unidentified metabolites frequently emerge in the results of association studies linking metabolite levels to, for example, clinical phenotypes. For further analyses these unknown metabolites must be identified. Current approaches utilize chemical information, such as spectral details and fragmentation characteristics to determine components of unknown metabolites. Here, we propose a systems biology model exploiting the internal correlation structure of metabolite levels in combination with existing biochemical and genetic information to characterize properties of unknown molecules. Levels of 758 metabolites (439 known, 319 unknown) in human blood samples of 2279 subjects were measured using a non-targeted metabolomics platform (LC-MS and GC-MS). We reconstructed the structure of biochemical pathways that are imprinted in these metabolomics data by building an empirical network model based on 1040 significant partial correlations between metabolites. We further added associations of these metabolites to 134 genes from genome-wide association studies as well as reactions and functional relations to genes from the public database Recon 2 to the network model. From the local neighborhood in the network, we were able to predict the pathway annotation of 180 unknown metabolites. Furthermore, we classified 100 pairs of known and unknown and 45 pairs of unknown metabolites to 21 types of reactions based on their mass differences. As a proof of concept, we then looked further into the special case of predicted dehydrogenation reactions leading us to the selection of 39 candidate molecules for 5 unknown metabolites. Finally, we could verify 2 of those candidates by applying LC-MS analyses of commercially available candidate substances. 
    The formerly unknown metabolites X-13891 and X-13069 were shown to be 2-dodecendioic acid and 9-tetradecenoic acid, respectively. Our data-driven approach based on measured metabolite levels and genetic associations as well as information from public resources can be used alone or together with methods utilizing spectral patterns as a complementary, automated and powerful method to characterize unknown metabolites.

    Biomarkers of rapid chronic kidney disease progression in type 2 diabetes.

    Here we evaluated the performance of a large set of serum biomarkers for the prediction of rapid progression of chronic kidney disease (CKD) in patients with type 2 diabetes. We used a case-control design nested within a prospective cohort of patients with baseline eGFR 30-60 ml/min per 1.73 m². Within a 3.5-year period of Go-DARTS study patients, 154 had over a 40% eGFR decline and 153 controls maintained over 95% of baseline eGFR. A total of 207 serum biomarkers were measured and logistic regression was used with forward selection to choose a subset that were maximized on top of clinical variables including age, gender, hemoglobin A1c, eGFR, and albuminuria. Nested cross-validation determined the best number of biomarkers to retain and evaluate for predictive performance. Ultimately, 30 biomarkers showed significant associations with rapid progression and adjusted for clinical characteristics. A panel of 14 biomarkers increased the area under the ROC curve from 0.706 (clinical data alone) to 0.868. Biomarkers selected included fibroblast growth factor-21, the symmetric to asymmetric dimethylarginine ratio, β2-microglobulin, C16-acylcarnitine, and kidney injury molecule-1. Use of more extensive clinical data including prebaseline eGFR slope improved prediction but to a lesser extent than biomarkers (area under the ROC curve of 0.793). Thus we identified several novel associations of biomarkers with CKD progression and the utility of a small panel of biomarkers to improve prediction. We acknowledge all the SUMMIT partners (http://www.imi-summit.eu/) for their assistance with this project. This work was funded by the Innovative Medicine Initiative under grant agreement no. IMI/115006 (the SUMMIT consortium) and the Go-DARTS cohort was funded by the Chief Scientists Office Scotland. This is the accepted manuscript of a paper published in Kidney International (Looker et al., Kidney International, 2015 doi: 10.1038/ki.2015.199). The final version is available at http://dx.doi.org/10.1038/ki.2015.19

    Apolipoprotein CIII and N-terminal prohormone b-type natriuretic peptide as independent predictors for cardiovascular disease in type 2 diabetes

    Background and aims: Developing sparse panels of biomarkers for cardiovascular disease in type 2 diabetes would enable risk stratification for clinical decision making and selection into clinical trials. We examined the individual and joint performance of five candidate biomarkers for incident cardiovascular disease (CVD) in type 2 diabetes that an earlier discovery study had yielded. Methods: Apolipoprotein CIII (apoCIII), N-terminal prohormone B-type natriuretic peptide (NT-proBNP), high sensitivity Troponin T (hsTnT), Interleukin-6, and Interleukin-15 were measured in baseline serum samples from the Collaborative Atorvastatin Diabetes trial (CARDS) of atorvastatin versus placebo. Among 2105 persons with type 2 diabetes and median age of 62.9 years (range 39.2–77.3), there were 144 incident CVD (acute coronary heart disease or stroke) cases during the maximum 5-year follow-up. We used Cox Proportional Hazards models to identify biomarkers associated with incident CVD and the area under the receiver operating characteristic curves (AUROC) to assess overall model prediction. Results: Three of the biomarkers were singly associated with incident CVD independently of other risk factors: NT-proBNP (Hazard Ratio per standardised unit 2.02, 95% Confidence Interval [CI] 1.63, 2.50), apoCIII (1.34, 95% CI 1.12, 1.60) and hsTnT (1.40, 95% CI 1.16, 1.69). When combined in a single model, only NT-proBNP and apoCIII were independent predictors of CVD, together increasing the AUROC using Framingham risk variables from 0.661 to 0.745. Conclusions: The biomarkers NT-proBNP and apoCIII substantially increment the prediction of CVD in type 2 diabetes beyond that obtained with the variables used in the Framingham risk score.

    Serum kidney injury molecule 1 and β2-microglobulin perform as well as larger biomarker panels for prediction of rapid decline in renal function in type 2 diabetes

    Aims/hypothesis: As part of the Surrogate Markers for Micro- and Macrovascular Hard Endpoints for Innovative Diabetes Tools (SUMMIT) programme we previously reported that large panels of biomarkers derived from three analytical platforms maximised prediction of progression of renal decline in type 2 diabetes. Here, we hypothesised that smaller (n ≤ 5), platform-specific combinations of biomarkers selected from these larger panels might achieve similar prediction performance when tested in three additional type 2 diabetes cohorts. Methods: We used 657 serum samples, held under differing storage conditions, from the Scania Diabetes Registry (SDR) and Genetics of Diabetes Audit and Research Tayside (GoDARTS), and a further 183 nested case–control sample set from the Collaborative Atorvastatin in Diabetes Study (CARDS). We analysed 42 biomarkers measured on the SDR and GoDARTS samples by a variety of methods including standard ELISA, multiplexed ELISA (Luminex) and mass spectrometry. The subset of 21 Luminex biomarkers was also measured on the CARDS samples. We used the event definition of loss of >20% of baseline eGFR during follow-up from a baseline eGFR of 30–75 ml min⁻¹ [1.73 m]⁻². A total of 403 individuals experienced an event during a median follow-up of 7 years. We used discrete-time logistic regression models with tenfold cross-validation to assess association of biomarker panels with loss of kidney function. Results: Twelve biomarkers showed significant association with eGFR decline adjusted for covariates in one or more of the sample sets when evaluated singly. Kidney injury molecule 1 (KIM-1) and β2-microglobulin (B2M) showed the most consistent effects, with standardised odds ratios for progression of at least 1.4 (p < 0.0003) in all cohorts. A combination of B2M and KIM-1 added to clinical covariates, including baseline eGFR and albuminuria, modestly improved prediction, increasing the area under the curve in the SDR, Go-DARTS and CARDS by 0.079, 0.073 and 0.239, respectively. Neither the inclusion of additional Luminex biomarkers on top of B2M and KIM-1 nor a sparse mass spectrometry panel, nor the larger multiplatform panels previously identified, consistently improved prediction further across all validation sets. Conclusions/interpretation: Serum KIM-1 and B2M independently improve prediction of renal decline from an eGFR of 30–75 ml min⁻¹ [1.73 m]⁻² in type 2 diabetes beyond clinical factors and prior eGFR and are robust to varying sample storage conditions. Larger panels of biomarkers did not improve prediction beyond these two biomarkers.