Dissimilarity for functional data clustering based on smoothing parameter commutation
Many studies measure the same type of information longitudinally on the same subject at multiple time points, and clustering of such functional data has many important applications. We propose a novel, easy-to-implement dissimilarity measure for functional data clustering based on smoothing splines and smoothing parameter commutation. The method handles data observed at regular or irregular time points in the same way. We measure the dissimilarity between subjects based on varying curve estimates with pairwise commutation of smoothing parameters. The intuition is that the smoothing parameters of smoothing splines reflect the inverse of the signal-to-noise ratio, and that when an identical smoothing parameter is applied, the smoothed curves for two similar subjects are expected to be close. Our method accounts for estimation uncertainty through smoothing parameter commutation and is not strongly affected by outliers. It can also be used for outlier detection. The effectiveness of our proposal is shown by simulations comparing it with other dissimilarity measures and by a real application to methadone dosage maintenance levels.
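The commutation idea can be illustrated with a small sketch. It is not the authors' implementation. As assumptions, it replaces smoothing splines with a discrete Whittaker smoother (a second-difference penalty on a common, regularly spaced grid), takes each subject's smoothing parameter as given rather than estimating it, and defines the dissimilarity as the average of the two RMS distances obtained by smoothing both subjects first with λ_i and then with λ_j. The names `smooth` and `commuted_dissimilarity` are hypothetical.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def smooth(y: Sequence[float], lam: float) -> List[float]:
    """Whittaker smoother (stand-in for a smoothing spline, an assumption):
    minimise sum (y - f)^2 + lam * sum (second differences of f)^2,
    i.e. solve (I + lam * D^T D) f = y on a regular grid."""
    n = len(y)
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 2):
        # Row k of the second-difference matrix D has entries 1, -2, 1.
        d = {k: 1.0, k + 1: -2.0, k + 2: 1.0}
        for i, di in d.items():
            for j, dj in d.items():
                a[i][j] += lam * di * dj
    return solve(a, list(y))


def rms(u: Sequence[float], v: Sequence[float]) -> float:
    """Root-mean-square distance between two curves on the same grid."""
    return (sum((p - q) ** 2 for p, q in zip(u, v)) / len(u)) ** 0.5


def commuted_dissimilarity(yi: Sequence[float], lam_i: float,
                           yj: Sequence[float], lam_j: float) -> float:
    """Hypothetical symmetric dissimilarity: smooth both subjects with lam_i,
    then both with lam_j, and average the two RMS curve distances."""
    d_i = rms(smooth(yi, lam_i), smooth(yj, lam_i))
    d_j = rms(smooth(yi, lam_j), smooth(yj, lam_j))
    return 0.5 * (d_i + d_j)
```

Because a second-difference penalty ignores constant shifts, two curves that differ only by a constant c come out |c| apart (up to rounding) under any λ, whereas point-wise noise is shrunk before the curves are compared.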
Gene selection with multiple ordering criteria
BACKGROUND: A microarray study may select different sets of differentially expressed genes depending on the selection criterion. For example, fold-change and p-value are two commonly used criteria for selecting differentially expressed genes under two experimental conditions, and they often yield incompatible gene sets. Also, in a two-factor experiment (say, treatment by time), the investigator may want a single gene list that responds to both treatment and time effects. RESULTS: We propose three layer ranking algorithms, namely point-admissible, line-admissible (convex), and Pareto, to produce a single preferred gene list from multiple gene lists generated by different ranking criteria. Using the public colon data as an example, the layer ranking algorithms are applied to three univariate ranking criteria: fold-change, p-value, and frequency of selection by the SVM-RFE classifier. A simulation experiment shows that for experiments with small or moderate sample sizes (fewer than 20 per group) that detect a 4-fold change or less, the two-dimensional (p-value and fold-change) convex layer ranking selects differentially expressed genes with generally lower FDR and higher power than standard p-value ranking. Three applications are presented. The first illustrates how the layer rankings can potentially improve predictive accuracy. The second addresses a two-factor experiment involving two dose levels and two time points, where the layer rankings are used to select differentially expressed genes related to the dose and time effects. In the third, the layer rankings are applied to a benchmark data set consisting of three dilution concentrations, providing a ranking system for a long list of differentially expressed genes generated from the three concentrations.
CONCLUSION: The layer ranking algorithms help investigators select the most promising genes from multiple gene lists generated by different filtering, normalization, or analysis methods for various objectives.
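The abstract does not spell out the algorithms, but Pareto layers are commonly computed by nondominated sorting, as in the sketch below. It assumes each gene is scored by its p-value (smaller is better) and its absolute log fold-change (larger is better). The function names are hypothetical, the gene names and scores in any example are made up, and the point-admissible and convex variants are not shown.

```python
from typing import Dict, List, Tuple

# (p-value, |log fold-change|): an assumed scoring, not taken from the paper.
Score = Tuple[float, float]


def dominates(a: Score, b: Score) -> bool:
    """a dominates b if it is at least as good on both criteria
    (p-value no larger, |log fold-change| no smaller) and strictly better on one."""
    (pa, fa), (pb, fb) = a, b
    return pa <= pb and fa >= fb and (pa < pb or fa > fb)


def pareto_layers(genes: Dict[str, Score]) -> List[List[str]]:
    """Peel off successive nondominated fronts; layer 1 is the most preferred."""
    remaining = dict(genes)
    layers: List[List[str]] = []
    while remaining:
        # A gene is in the current front if no other remaining gene dominates it.
        front = [g for g, s in remaining.items()
                 if not any(dominates(t, s) for h, t in remaining.items() if h != g)]
        layers.append(sorted(front))
        for g in front:
            del remaining[g]
    return layers
```

Genes with identical scores never dominate each other, so ties land in the same layer; within a layer, a secondary criterion would be needed to produce a strict ordering.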
Resolution Adaptive Fixed Rank Kriging
The spatial random effects model is flexible in modeling spatial covariance functions and is computationally efficient for spatial prediction via fixed rank kriging (FRK). However, the model depends on a class of basis functions which, if not selected properly, may lead to unstable or undesirable results. Additionally, the maximum likelihood (ML) estimates of the model parameters are commonly computed using an expectation-maximization (EM) algorithm, which further limits the model's applicability when a large number of basis functions is required. In this research, we propose a class of basis functions extracted from thin-plate splines. The functions are ordered by their degree of smoothness, with higher-order functions corresponding to larger-scale features and lower-order ones corresponding to smaller-scale details, leading to a parsimonious representation of a (nonstationary) spatial covariance function in which the number of basis functions plays the role of spatial resolution. The proposed class of basis functions avoids the difficult knot-allocation or scale-selection problem. In addition, we show that ML estimates of the random effects covariance matrix can be expressed in simple closed forms, so the resulting FRK can accommodate a much larger number of basis functions without numerical difficulties. Finally, we propose selecting the number of basis functions using Akaike’s information criterion, which also has a simple closed-form expression. The whole procedure, involving no additional tuning parameter, is efficient to compute, easy to program, automatic to implement, and applicable to massive amounts of spatial data even when they are sparsely and irregularly located. Proofs of the theorems and an R package autoFRK are provided in supplementary materials available online.
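The computational advantage of FRK comes from the low-rank covariance Σ = S M Sᵀ + σ²I, where S is the n × K matrix of basis functions evaluated at the data locations and M is the K × K random-effects covariance. The sketch below applies the Sherman–Morrison–Woodbury identity, Σ⁻¹z = σ⁻²[z − S(σ²M⁻¹ + SᵀS)⁻¹Sᵀz], so that only K × K systems are solved. It assumes a zero-mean process and treats S, M, and σ² as already known. It does not construct the thin-plate spline basis, compute the closed-form ML estimates, or reproduce the autoFRK package, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def transpose(a: Matrix) -> Matrix:
    return [list(r) for r in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(a: Matrix, v: Vector) -> Vector:
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def inverse(a: Matrix) -> Matrix:
    """Invert a small matrix column by column (used only for the K x K matrix M)."""
    n = len(a)
    cols = [solve(a, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return transpose(cols)


def sigma_inv_times(S: Matrix, M: Matrix, sigma2: float, z: Vector) -> Vector:
    """Sigma^{-1} z for Sigma = S M S^T + sigma2 * I via Woodbury: only K x K solves."""
    St = transpose(S)
    Minv = inverse(M)
    StS = matmul(St, S)
    k = len(M)
    inner = [[sigma2 * Minv[i][j] + StS[i][j] for j in range(k)] for i in range(k)]
    w = solve(inner, matvec(St, z))  # K-vector
    Sw = matvec(S, w)                # back to n dimensions
    return [(zi - swi) / sigma2 for zi, swi in zip(z, Sw)]


def frk_predict(s0: Vector, S: Matrix, M: Matrix, sigma2: float, z: Vector) -> float:
    """Simple-kriging predictor of the low-rank signal s0^T eta: s0^T M S^T Sigma^{-1} z."""
    v = sigma_inv_times(S, M, sigma2, z)
    cov = matvec(M, matvec(transpose(S), v))
    return sum(a * b for a, b in zip(s0, cov))
```

A dense approach would solve an n × n system; here the only systems solved are K × K, and the remaining work (forming SᵀS and the matrix-vector products) grows linearly in n for fixed K.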