Recommended from our members
Integrating biomedical research and electronic health records to create knowledge-based biologically meaningful machine-readable embeddings.
In order to advance precision medicine, detailed clinical features ought to be described in a way that leverages current knowledge. Although data collected from biomedical research is expanding at an almost exponential rate, our ability to transform that information into patient care has not kept pace. A major barrier preventing this transformation is that multi-dimensional data collection and analysis are usually carried out without much understanding of the underlying knowledge structure. Here, in an effort to bridge this gap, Electronic Health Records (EHRs) of individual patients are connected to a heterogeneous knowledge network called Scalable Precision Medicine Oriented Knowledge Engine (SPOKE). An unsupervised machine-learning algorithm then creates Propagated SPOKE Entry Vectors (PSEVs) that encode the importance of each SPOKE node for any code in the EHRs. We argue that these results, alongside the natural integration of PSEVs into any EHR machine-learning platform, provide a key step toward precision medicine.
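The abstract does not spell out how the propagation works. As a hedged sketch, assuming it resembles a random walk with restart (personalized PageRank) seeded at the SPOKE nodes linked to a single EHR code, the stationary visiting probabilities give one score per SPOKE node, which is the shape a PSEV has. The toy graph, the seed set, the `restart` probability and the function name `propagate` are all hypothetical.

```python
def propagate(graph, seeds, restart=0.1, iterations=100, tol=1e-10):
    """Random walk with restart (personalized PageRank) from a seed set.

    Hypothetical stand-in for PSEV creation, not the authors' published method.
    graph: dict node -> list of out-neighbors
    seeds: set of nodes assumed to be linked to one EHR code
    Returns dict node -> stationary probability (sums to 1).
    """
    nodes = set(graph) | {m for nbrs in graph.values() for m in nbrs}
    restart_vec = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(restart_vec)
    for _ in range(iterations):
        # With probability `restart` the walker jumps back to the seeds.
        nxt = {n: restart * restart_vec[n] for n in nodes}
        for n in nodes:
            nbrs = graph.get(n, [])
            if nbrs:
                share = (1 - restart) * p[n] / len(nbrs)
                for m in nbrs:
                    nxt[m] += share
            else:
                # Dangling node: return its mass to the seeds so the total stays 1.
                for s in seeds:
                    nxt[s] += (1 - restart) * p[n] / len(seeds)
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy example: node "D" has no incoming edges and is not a seed, so it scores 0.
toy_graph = {"A": ["B"], "B": ["A", "C"], "C": [], "D": ["A"]}
psev_like = propagate(toy_graph, seeds={"A"})
```

Nodes near the seeds accumulate more probability mass, so sorting `psev_like` by value ranks SPOKE nodes by their assumed relevance to the code.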
Coanalysis of GWAS with eQTLs reveals disease-tissue associations.
Expression quantitative trait loci (eQTLs), or genetic variants associated with changes in gene expression, have the potential to assist in interpreting results of genome-wide association studies (GWAS). eQTLs also have varying degrees of tissue specificity. By correlating the statistical significance of eQTLs mapped in various tissue types with their odds ratios reported in a large GWAS by the Wellcome Trust Case Control Consortium (WTCCC), we discovered a significant association between diseases studied genetically and their relevant tissues. This suggests that eQTL data sets can be used to determine tissues that play a role in the pathogenesis of a disease, thereby highlighting these tissue types for further post-GWAS functional studies.
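The abstract does not name the correlation statistic. A hedged sketch, assuming a Spearman rank correlation computed per tissue between each GWAS SNP's eQTL significance (-log10 p) and the magnitude of its GWAS effect (|ln OR|, so risk and protective alleles count alike); tissues with the strongest correlation would be candidate disease-relevant tissues. The function names and the |ln OR| choice are assumptions, and constant inputs are not handled (they raise `ZeroDivisionError`).

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    # Spearman's rho is the Pearson correlation of the ranks.
    return pearson(ranks(a), ranks(b))


def tissue_scores(eqtl_pvalues, odds_ratios):
    """eqtl_pvalues: dict tissue -> eQTL p-values, in the same SNP order as odds_ratios."""
    effect = [abs(math.log(o)) for o in odds_ratios]
    return {
        tissue: spearman([-math.log10(p) for p in ps], effect)
        for tissue, ps in eqtl_pvalues.items()
    }
```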
Accuracy of medical billing data against the electronic health record in the measurement of colorectal cancer screening rates.
Objective: Medical billing data are an attractive source for secondary analysis because of their ease of use and their potential to answer population-health questions with statistical power. Although these datasets have known susceptibilities to bias, the degree to which they can distort the assessment of quality measures such as colorectal cancer screening rates is not widely appreciated, nor are the causes of this distortion and its possible solutions.
Methods: Using a billing code database derived from our institution's electronic health records, we estimated the colorectal cancer screening rate of average-risk patients aged 50-74 years seen in primary care or gastroenterology clinics in 2016-2017. Two hundred records (150 unscreened, 50 screened) were sampled to quantify the accuracy of the billing data against manual review.
Results: Of 4611 patients, an analysis of billing data suggested a 61% screening rate, an estimate that matches that of the Centers for Disease Control. Manual review revealed a positive predictive value of 96% (86%-100%), a negative predictive value of 21% (15%-29%) and a corrected screening rate of 85% (81%-90%). Most false negatives occurred because examinations were performed outside the scope of the database, both within and outside our institution, but 21% of false negatives fell within the database's scope. False positives occurred due to incomplete examinations and inadequate bowel preparation. Reasons for screening failure included ordered but incomplete examinations (48%), absent or incorrect documentation by primary care (29%), including incorrect screening intervals (13%), and patients declining screening (13%).
Conclusions: Billing databases are prone to substantial bias that may go undetected even in the presence of confirmatory external estimates. Caution is recommended when performing population-level inference from these data. We propose several solutions to improve the use of these data for the assessment of healthcare quality.
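A minimal sketch of how validation against manual review can be turned into accuracy metrics and a corrected rate. It assumes that the PPV (estimated from sampled billing-positive records) and the NPV (from sampled billing-negative records) apply to the whole population. This is a standard correction, not necessarily the authors' exact procedure, and it ignores the sampling uncertainty behind their confidence intervals. Function names and counts are hypothetical.

```python
def validation_metrics(tp, fp, tn, fn):
    """PPV and NPV of a billing-data 'screened' flag against manual chart review.

    tp: billing says screened, review confirms
    fp: billing says screened, review finds no valid screening
    tn: billing says unscreened, review confirms
    fn: billing says unscreened, review finds a screening
    """
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


def corrected_rate(billing_rate, ppv, npv):
    # Truly screened = correct billing-positives + missed screenings among billing-negatives.
    return billing_rate * ppv + (1 - billing_rate) * (1 - npv)


# Hypothetical counts: 50 sampled billing-positives, 150 sampled billing-negatives.
ppv, npv = validation_metrics(tp=48, fp=2, tn=30, fn=120)
```

A low NPV pushes the corrected rate well above the raw billing rate, because many patients recorded as unscreened were in fact screened.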
Gene-network inference by message passing
The inference of gene-regulatory processes from gene-expression data is one of the major challenges of computational systems biology. Here we address the problem from a statistical-physics perspective and develop a message-passing algorithm that is able to infer sparse, directed and combinatorial regulatory mechanisms. Using the replica technique, the algorithmic performance can be characterized analytically for artificially generated data. The algorithm is applied to genome-wide expression data of baker's yeast under various environmental conditions. We find clear cases of combinatorial control, and enrichment in common functional annotations of regulated genes and their regulators.
Comment: Proc. of International Workshop on Statistical-Mechanical Informatics 2007, Kyot
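The message-passing equations and the replica analysis are beyond a short example. As a hedged illustration of the quantity such an algorithm estimates, the sketch assumes a simple perceptron-like model that is not necessarily the paper's exact model. In it, a target gene's state is the sign of a weighted sum of candidate regulators' states, and each coupling lies in {-1, 0, +1}, so zeros encode sparsity and signs encode activation or repression. For a handful of regulators, the posterior marginal of each coupling can be computed exactly by enumerating all 3^n coupling vectors. Belief propagation approximates these same marginals when n is too large to enumerate. The function name, the noise rate `eps` and the prior `p_zero` are hypothetical.

```python
import itertools
import math


def coupling_marginals(X, y, eps=0.1, p_zero=0.5):
    """Exact posterior marginals P(J_j = v | data) for v in {-1, 0, +1}.

    Hypothetical model: target = sign(sum_j J_j * x_j), observed with flip
    probability eps; each J_j is 0 with prior probability p_zero.
    X: samples, each a list of candidate-regulator states in {-1, +1}
    y: target-gene states in {-1, +1}, one per sample
    """
    n = len(X[0])
    values = (-1, 0, 1)
    prior = {0: p_zero, -1: (1 - p_zero) / 2, 1: (1 - p_zero) / 2}
    mass = [{v: 0.0 for v in values} for _ in range(n)]
    total = 0.0
    # Enumerate every coupling vector; message passing avoids this 3**n cost.
    for J in itertools.product(values, repeat=n):
        weight = math.prod(prior[v] for v in J)
        for x, target in zip(X, y):
            field = sum(j * xi for j, xi in zip(J, x))
            if field == 0:
                weight *= 0.5  # no net input: either target state equally likely
            elif (field > 0) == (target > 0):
                weight *= 1 - eps
            else:
                weight *= eps
        total += weight
        for j, v in enumerate(J):
            mass[j][v] += weight
    return [{v: m[v] / total for v in values} for m in mass]


# Toy data: regulator 0 activates the target; regulator 1 is irrelevant.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]] * 3
y = [x[0] for x in X]
marginals = coupling_marginals(X, y)
```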
Likelihood ratios for genome medicine
Patients are beginning to present to healthcare providers with the results of high-throughput individualized genotyping, and interpreting these results in the context of the explosive growth of literature linking individual variants with disease may seem daunting. However, we suggest that the results of a personal genomic analysis may be viewed as a panel of many tests for multiple diseases. By using well-established methods of evidence-based medicine, these many parallel tests may be combined using likelihood ratios to report a post-test probability of disease for use in patient assessment.
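A minimal sketch of the combination step, assuming the variant-level tests are conditionally independent given disease status. Variants in linkage disequilibrium violate this assumption, so their likelihood ratios should not simply be multiplied. Each likelihood ratio is P(genotype | disease) / P(genotype | no disease), and the post-test odds equal the pre-test odds times the product of the ratios. The function name and example values are hypothetical.

```python
def post_test_probability(pretest_probability, likelihood_ratios):
    """Combine likelihood ratios on the odds scale.

    Assumes the genotype 'tests' are conditionally independent given disease status.
    """
    if not 0 < pretest_probability < 1:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    odds = pretest_probability / (1 - pretest_probability)
    for lr in likelihood_ratios:
        odds *= lr  # each genotype multiplies the odds by its likelihood ratio
    return odds / (1 + odds)
```

For example, a 10% pre-test probability combined with hypothetical ratios of 2 and 1.5 gives pre-test odds of 1/9, post-test odds of 1/3, and a post-test probability of 25%.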
Random matrix analysis of localization properties of gene co-expression network
We analyze a gene co-expression network within the random matrix theory (RMT) framework. The nearest-neighbor spacing distribution of the adjacency matrix of this network follows the Gaussian orthogonal ensemble statistics of RMT. The spectral rigidity test follows the RMT prediction over a certain range and deviates afterwards. Eigenvector analysis of the network using the inverse participation ratio (IPR) suggests that the statistics of the bulk of the network's eigenvalues are consistent with those of a real symmetric random matrix, whereas a few eigenvectors are localized. Based on these IPR calculations, we can divide the eigenvalues into three sets: (A) the non-degenerate part that follows RMT; (B) the non-degenerate part, at both ends and at intermediate eigenvalues, which deviates from RMT and is expected to contain information about important nodes in the network; and (C) the degenerate part with eigenvalue, which fluctuates around the RMT-predicted value. We identify nodes corresponding to the dominant modes of the corresponding eigenvectors and analyze their structural properties.
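A short sketch of the IPR calculation. For a normalized eigenvector u over N nodes, IPR = sum_i u_i^4. This equals 1/N for a vector spread evenly over all nodes and 1 for a vector concentrated on a single node, and delocalized GOE eigenvectors have an IPR of roughly 3/N. The sketch uses NumPy's `numpy.linalg.eigh` for a real symmetric adjacency matrix. The function name is hypothetical, and building the network from co-expression data is not shown.

```python
import numpy as np


def inverse_participation_ratios(adjacency):
    """Return ascending eigenvalues and the IPR of each normalized eigenvector."""
    A = np.asarray(adjacency, dtype=float)
    if not np.allclose(A, A.T):
        raise ValueError("adjacency matrix must be symmetric")
    # eigh returns orthonormal eigenvectors as the columns of the second output.
    eigenvalues, eigenvectors = np.linalg.eigh(A)
    ipr = np.sum(eigenvectors ** 4, axis=0)
    return eigenvalues, ipr
```

Eigenvectors whose IPR rises well above the random-matrix level are the localized modes, and their largest components point to the nodes that dominate them.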