Applications of Machine Learning in Pharmacogenomics: Clustering Plasma Concentration-Time Curves
Pharmaceutical researchers are continually searching for techniques to
improve both drug development processes and patient outcomes. An area of recent
interest is the potential for machine learning (ML) applications within
pharmacology. One such application not yet given close study is the
unsupervised clustering of plasma concentration-time curves, hereafter,
pharmacokinetic (PK) curves. In this paper, we present our findings on how to
cluster PK curves by their similarity. Specifically, we find clustering to be
effective at identifying similar-shaped PK curves and informative for
understanding patterns within each cluster of PK curves. Because PK curves are
time series data objects, our approach utilizes the extensive body of research
related to the clustering of time series data as a starting point. As such, we
examine many dissimilarity measures between time series data objects to find
those most suitable for PK curves. We identify Euclidean distance as generally
most appropriate for clustering PK curves, and we further show that dynamic
time warping, Fréchet, and structure-based measures of dissimilarity like
correlation may produce unexpected results. As an illustration, we apply these
methods in a case study with 250 PK curves used in a previous pharmacogenomic
study. Our case study finds that an unsupervised ML clustering with Euclidean
distance, without any subject genetic information, is able to independently
validate the same conclusions as the reference pharmacogenomic results. To our
knowledge, this is the first such demonstration. Further, the case study
demonstrates how the clustering of PK curves may generate insights that could
be difficult to perceive solely with population level summary statistics of PK
metrics.
Comment: 38 pages, 14 figures, 3 tables
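A minimal sketch, assuming PK curves sampled on a common time grid, of why the choice of dissimilarity matters. The curves come from a hypothetical one-compartment oral-dose (Bateman) model with made-up parameters, and the clustering is a naive single-linkage procedure chosen for brevity; neither is taken from the paper. Curves 2 and 3 share elimination rates with curves 0 and 1 but receive three times the dose, so each is a threefold rescaling of a low-dose curve. Euclidean distance groups the curves by exposure level, while 1 minus Pearson correlation compares shape only and pairs each low-dose curve with its rescaled copy. This is one way a structure-based measure can produce results a pharmacologist might not expect.

```python
import math

def pk_curve(dose, ka, ke, volume, times):
    """Concentrations from a one-compartment oral-dose (Bateman) model.

    Hypothetical illustration only; assumes ka != ke and complete absorption.
    """
    scale = dose * ka / (volume * (ka - ke))
    return [scale * (math.exp(-ke * t) - math.exp(-ka * t)) for t in times]

def euclidean(a, b):
    """Pointwise Euclidean distance between curves on the same time grid."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def correlation_dissimilarity(a, b):
    """1 - Pearson correlation: compares shape only, ignoring level and scale."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 1.0 - cov / (sa * sb)

def single_linkage(curves, dist, k):
    """Naive single-linkage agglomerative clustering down to k clusters."""
    clusters = [[i] for i in range(len(curves))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(curves[i], curves[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)

# Hypothetical sampling times (hours) and parameters: curves 2 and 3 are
# threefold rescalings of curves 0 and 1 (same ka, ke and volume).
times = [0.5, 1, 2, 4, 6, 8, 12, 24]
curves = [
    pk_curve(100, 1.0, 0.10, 50, times),
    pk_curve(100, 1.0, 0.11, 50, times),
    pk_curve(300, 1.0, 0.10, 50, times),
    pk_curve(300, 1.0, 0.11, 50, times),
]
print(single_linkage(curves, euclidean, 2))
print(single_linkage(curves, correlation_dissimilarity, 2))
```

With these four curves, Euclidean distance yields the clusters [[0, 1], [2, 3]] (grouped by dose), whereas correlation yields [[0, 2], [1, 3]] (grouped by elimination rate alone).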
Semiparametric Multivariate Accelerated Failure Time Model with Generalized Estimating Equations
The semiparametric accelerated failure time model is not as widely used as
the Cox relative risk model mainly due to computational difficulties. Recent
developments in least squares estimation and induced smoothing estimating
equations provide promising tools to make the accelerated failure time models
more attractive in practice. For semiparametric multivariate accelerated
failure time models, we propose a generalized estimating equation approach to
account for the multivariate dependence through working correlation structures.
The marginal error distributions can be either identical as in sequential event
settings or different as in parallel event settings. Some regression
coefficients can be shared across margins as needed. The initial estimator is a
rank-based estimator with Gehan's weight, but obtained from an induced
smoothing approach for computational ease. The resulting estimator is consistent
and asymptotically normal, with a variance estimated through a multiplier
resampling method. In a simulation study, our estimator was up to three times
as efficient as the initial estimator, especially with stronger multivariate
dependence and heavier censoring. Two real examples demonstrate the
utility of the proposed method.
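To make the initial estimator concrete, here is a minimal sketch for a single margin with one scalar covariate. With residuals e_i(beta) = log T_i - x_i * beta and event indicators delta_i, the Gehan estimating function sums delta_i (x_i - x_j) I(e_j >= e_i) over all pairs; induced smoothing replaces the indicator with Phi((e_j - e_i) / r_ij). The smoothing scale r_ij = |x_i - x_j| / sqrt(n) (that is, Sigma = I/n) is an assumption, the function names are hypothetical, and the data are simulated. The generalized estimating equation step with working correlation structures and the multiplier resampling variance are not shown.

```python
import math
import random

def norm_cdf(z):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def smoothed_gehan(beta, logt, x, delta):
    """Induced-smoothed Gehan estimating function, scalar covariate.

    Residuals are e_i = log T_i - x_i * beta. The Gehan indicator
    I(e_j >= e_i) is replaced by Phi((e_j - e_i) / r_ij), with the
    assumed smoothing scale r_ij = |x_i - x_j| / sqrt(n).
    """
    n = len(x)
    e = [lt - xi * beta for lt, xi in zip(logt, x)]
    total = 0.0
    for i in range(n):
        if not delta[i]:
            continue  # only uncensored subjects contribute an i-term
        for j in range(n):
            dx = x[i] - x[j]
            if dx == 0.0:
                continue  # the term vanishes when covariates are equal
            r = abs(dx) / math.sqrt(n)
            total += dx * norm_cdf((e[j] - e[i]) / r)
    return total / n ** 2

def solve_beta(logt, x, delta, lo=-10.0, hi=10.0, tol=1e-6):
    """Bisection root-finding; the smoothed function increases in beta."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if smoothed_gehan(mid, logt, x, delta) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Simulated data (an assumption): log T = 1 + 2x + N(0, 0.5^2), with
# independent log-normal censoring. The intercept is not identified by
# the rank-based equation and does not affect the slope estimate.
random.seed(1)
n = 150
x = [random.uniform(0.0, 2.0) for _ in range(n)]
logt = [1.0 + 2.0 * xi + random.gauss(0.0, 0.5) for xi in x]
logc = [random.gauss(3.5, 1.0) for _ in range(n)]
obs = [min(t, c) for t, c in zip(logt, logc)]
delta = [t <= c for t, c in zip(logt, logc)]
beta_hat = solve_beta(obs, x, delta)
print(round(beta_hat, 3))
```

Bisection is used because, for a scalar coefficient, each smoothed term is monotone in beta, so the estimating function crosses zero exactly once; with several covariates a Newton-type solver would be the natural replacement.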
Highly adaptive tests for group differences in brain functional connectivity
Resting-state functional magnetic resonance imaging (rs-fMRI) and other technologies have provided evidence that altered brain functional networks are associated with neurological illnesses such as Alzheimer's disease. Comparing the brain networks of clinical populations with those of controls is a key step toward revealing the underlying neurological processes related to such illnesses. For this purpose, group-level inference is a necessary first step to establish whether any brain subnetworks are genuinely disrupted. Such an analysis is challenging because of the high dimensionality of the parameters in a network model and the high noise levels in neuroimaging data. Method development is still at an early stage; as Varoquaux and Craddock (2013) noted, “there is currently no unique solution, but a spectrum of related methods and analytical strategies” to learn and compare brain connectivity. In practice, the important issue of how to choose several critical parameters in network estimation, such as which association measure to use and how sparse the estimated network should be, has not been carefully addressed, largely because the answers are not yet known. For example, although the choice of tuning parameters in model estimation has been discussed extensively in the literature, we show here that a parameter value that is optimal for network estimation may not be optimal for hypothesis testing. Arbitrarily choosing or misspecifying such parameters may lead to extremely low-powered tests. Here we develop highly adaptive tests to detect group differences in brain connectivity while accounting for the unknown optimal values of some tuning parameters.
The proposed tests combine statistical evidence against a null hypothesis from multiple sources across a range of plausible tuning parameter values, reflecting uncertainty about the unknown truth. These highly adaptive tests are easy to use and maintain high power robustly across various scenarios. The usage and advantages of these tests are demonstrated on an Alzheimer's disease dataset and on simulated data.
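The idea of combining evidence across a range of tuning values can be sketched with a generic min-p construction calibrated by permutation. This is an illustration under stated assumptions, not necessarily the authors' construction: each subject is represented by a vector of edge-level connectivity values, the "tuning parameter" is taken to be the power gamma of a sum-of-powered-score (SPU) statistic, and the data are simulated. For each gamma a permutation p-value is computed, the smallest p-value across gammas is taken as the adaptive statistic, and that minimum is itself compared against its permutation distribution so that the overall test keeps its nominal size. All function names are hypothetical.

```python
import random

def spu(diffs, gamma):
    """Sum-of-powered-score statistic; abs() makes odd powers two-sided."""
    return abs(sum(d ** gamma for d in diffs))

def mean_diffs(data, labels, p):
    """Per-edge difference in group means (group 1 minus group 0)."""
    sums = [[0.0] * p, [0.0] * p]
    counts = [0, 0]
    for row, g in zip(data, labels):
        counts[g] += 1
        for k in range(p):
            sums[g][k] += row[k]
    return [sums[1][k] / counts[1] - sums[0][k] / counts[0] for k in range(p)]

def adaptive_test(data, labels, gammas=(1, 2, 4, 8), n_perm=200, seed=0):
    """Min-p over a grid of tuning values, calibrated by permutation."""
    rng = random.Random(seed)
    p = len(data[0])
    # Row 0 holds the observed statistics; rows 1..n_perm use permuted labels.
    stats = []
    perm = list(labels)
    for b in range(n_perm + 1):
        if b > 0:
            rng.shuffle(perm)
        d = mean_diffs(data, perm, p)
        stats.append([spu(d, g) for g in gammas])
    total = n_perm + 1
    # Permutation p-value of every row, separately for each gamma.
    pvals = [[0.0] * len(gammas) for _ in range(total)]
    for gi in range(len(gammas)):
        col = [s[gi] for s in stats]
        for i in range(total):
            pvals[i][gi] = sum(1 for c in col if c >= col[i]) / total
    min_p = [min(row) for row in pvals]
    # Overall p-value: how often a permuted min-p is at least as extreme.
    return sum(1 for m in min_p if m <= min_p[0]) / total

# Simulated example (an assumption): 20 subjects per group, 30 edges,
# with a mean shift of 1.0 on the first 5 edges in group 1.
rng = random.Random(42)
p = 30
labels = [0] * 20 + [1] * 20
data = [[rng.gauss(1.0 if (g == 1 and k < 5) else 0.0, 1.0) for k in range(p)]
        for g in labels]
p_value = adaptive_test(data, labels)
print(p_value)
```

Small gamma favours dense, weak differences and large gamma favours a few strong ones; taking the minimum p-value over the grid avoids committing to one value in advance, at the cost of the second permutation layer.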
Additional file 1 of An adaptive association test for microbiome data
Seven supporting figures and one supporting table. A description of each is given within the file. (PDF, 3778 kB)