Parametric and nonparametric two-sample tests for feature screening in class comparison: a simulation study
Background. The identification of a location-, scale- and shape-sensitive test to detect differentially expressed features between two comparison groups represents a key point in high-dimensional studies. The most commonly used tests refer to differences in location, but general distributional discrepancies might be important to reveal differential biological processes.
Methods. A simulation study was conducted to compare the performance of a set of two-sample tests under different distributional patterns: Student's t, Welch's t, Wilcoxon-Mann-Whitney, Podgor-Gastwirth PG2, Cucconi, Kolmogorov-Smirnov (KS), Cramer-von Mises (CvM), Anderson-Darling (AD) and Zhang tests (ZK, ZC and ZA). We applied the same tests to a real data example.
Results. The AD, CvM, ZA and ZC tests proved to be the most sensitive in mixture distribution patterns, while still maintaining high power in normal distribution patterns. At best, the AD test showed a loss in power of ~ 2% in the comparison of two normal distributions, but a gain of ~ 32% with mixture distributions with respect to the parametric tests. Accordingly, the AD test detected the greatest number of differentially expressed features in the real data application.
Conclusion. The tests for the general two-sample problem introduce a more general concept of 'differential expression', thus overcoming the limitations of the other tests, which are restricted to specific moments of the feature distributions. In particular, the AD test should be considered a powerful alternative to the parametric tests for feature screening, in order to keep as many discriminative features as possible for the class prediction analysis.
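The contrast between location tests and tests for the general two-sample problem can be illustrated with a minimal sketch. It is not the simulation design of the study: the toy data, the function names and the choice of the KS statistic (the simplest of the general tests listed above, used here in place of AD) are assumptions made for illustration only.

```python
import math
from bisect import bisect_right

def welch_t(x, y):
    """Welch's t statistic: reacts only to a difference in means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    return (mx - my) / math.sqrt(vx / len(x) + vy / len(y))

def ks_statistic(x, y):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    return max(
        abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        for v in xs + ys
    )

# Hypothetical toy feature: both groups have mean 0, but group y is bimodal
# (a crude stand-in for a mixture distribution pattern).
x = [-1.0, -0.5, 0.0, 0.5, 1.0] * 4
y = [-3.0, -2.5, 2.5, 3.0] * 5

t_stat = welch_t(x, y)       # no location shift, so the statistic is zero
d_stat = ks_statistic(x, y)  # shape difference gives a large CDF gap
print(t_stat, d_stat)
```

On this toy feature the t statistic is zero while the KS statistic is 0.5, so a screening step based on location alone would discard a feature whose distribution clearly differs between groups.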
Cholesterol de novo biosynthesis: a promising target to overcome the resistance to aromatase inhibitors in postmenopausal patients with estrogen receptor-positive breast cancer
Aim: Cholesterol is an essential component of cell membranes and serves as a precursor for several bioactive molecules, including steroid hormones and isoprenoids. Cholesterol is generally supplied by the bloodstream, but de novo biosynthesis is activated in response to an increased cell requirement due to normal tissue remodeling or tumor proliferation. In estrogen receptor (ER)-positive breast cancers, cholesterol biosynthesis may promote and sustain tumor growth and contribute to the failure of treatment with aromatase inhibitors. Methods: In this study, the expression of genes involved in cholesterol biosynthesis was compared in ER-positive tumors that were responsive and nonresponsive to letrozole; in addition, their association with genes implicated in estrogen production, the Hippo pathway, and cell cycle control was explored. Results: In responsive tumors, letrozole significantly decreased the expression of five genes [acetyl-coenzyme A (CoA) acetyltransferase 2 (ACAT2), 3-hydroxy-3-methylglutaryl-CoA synthase 1 (HMGCS1), 3-hydroxy-3-methylglutaryl-CoA reductase (HMGCR), farnesyl diphosphate synthase (FDPS), and squalene epoxidase (SQLE)] crucial for the biosynthetic process. Conversely, in nonresponsive tumors, these genes were unaffected by letrozole but associated with several genes involved in estrogen production [cytochrome P450 family 19 subfamily A member 1 (CYP19A1), hydroxysteroid 17-beta dehydrogenase 2 (HSD17B2), and sulfotransferase family 1A member 1 (SULT1A1)], cell cycle control [cyclin dependent kinase 4 (CDK4) and CDK6], and the Hippo pathway [Yes1 associated transcriptional regulator (YAP1) and baculoviral inhibitor of apoptosis (IAP) repeat containing 5 (BIRC5)]. Conclusions: The findings corroborated the notion that dysregulation of the mevalonate pathway may contribute to resistance to letrozole and supported the use of statins to counteract this metabolic dysfunction.
Validation of Gene Expression Profiles in Genomic Data through Complementary Use of Cluster Analysis and PCA-Related Biplots
High-throughput genomic assays are used in molecular biology to explore patterns of joint expression of thousands of genes.
These methodologies have developed considerably in the last decade, creating a concurrent need for appropriate methods to analyze the massive data generated.
Identifying sets of genes and samples characterized by similar expression values and validating these results are two critical issues in these investigations because of their clinical implications. From a statistical perspective, unsupervised class discovery methods like Cluster Analysis are generally adopted.
However, Cluster Analysis is mainly applied through hierarchical techniques, without considering other methods. This is partially due to software availability and to the ease of representing results through a heatmap, which shows the clustering of genes and samples simultaneously in the same graphical device. One drawback of this strategy is that cluster stability is often neglected, leading to over-interpretation of results.
Moreover, validation of results using external datasets is still a subject of discussion, since it is well known that batch effects may affect gene expression results even after normalization.
In this paper we compare several clustering algorithms (hierarchical, k-means, model-based, Affinity Propagation) and stability indices to discover common patterns of expression and to assess clustering reliability, and we propose a rank-based passive projection of Principal Components for validation purposes.
Results are presented from a study involving 23 tumor cell lines and 76 genes related to a specific biological pathway, derived from a publicly available dataset.
Cell Polarity, Epithelial-Mesenchymal Transition, and Cell-Fate Decision Gene Expression in Ductal Carcinoma In Situ
Loss of epithelial cell identity and acquisition of mesenchymal features are early events in the neoplastic transformation of mammary cells. We investigated the pattern of expression of a selected panel of genes associated with cell polarity and the apical junction complex, or involved in TGF-β-mediated epithelial-mesenchymal transition and cell-fate decision, in a series of DCIS and corresponding patient-matched normal tissue. Additionally, we compared the DCIS gene profile with that of atypical ductal hyperplasia (ADH) from the same patient. Statistical analysis identified a “core” of genes differentially expressed in both precursors with respect to the corresponding normal tissue. These genes were mainly associated with a terminally differentiated luminal estrogen-dependent phenotype, in agreement with the model according to which ER-positive invasive breast cancer derives from ER-positive progenitor cells, and with an autocrine production of estrogens through androgen conversion. Although preliminary, the present findings provide transcriptomic confirmation that, at least for the panel of genes considered in the present study, ADH and DCIS are part of a tumorigenic multistep process. They also strongly raise the need to regulate, possibly using aromatase inhibitors, the intratumoral and/or circulating concentration of biologically active androgens in DCIS patients, in order to promptly hamper abnormal estrogen production and block estrogen-induced cell proliferation.
Cancer profiles by affinity propagation
The affinity propagation algorithm is applied to a problem of breast cancer subtyping using traditional biologic markers. The algorithm provides a procedure to determine the number of profiles to be considered.
A well-known breast cancer case series was used to compare the results of affinity propagation with those obtained with standard algorithms and indexes for the optimal choice of the number of clusters.
Results from affinity propagation are consistent with those already obtained, with the added advantage of providing an indication of the number of clusters.
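How affinity propagation arrives at a number of clusters without being told one can be sketched with the standard responsibility and availability message updates. This is an illustrative toy, not the analysis of the abstract: the 1-D marker values, the negative squared distance as similarity, the median similarity as shared preference, the damping factor and the iteration count are all assumptions.

```python
from statistics import median

def affinity_propagation(points, damping=0.5, n_iter=200):
    n = len(points)
    # Similarity: negative squared distance between 1-D points.
    S = [[-(points[i] - points[k]) ** 2 for k in range(n)] for i in range(n)]
    # Shared preference (diagonal): median of the off-diagonal similarities.
    pref = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        S[k][k] = pref
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(n_iter):
        # Responsibility: how well k suits i compared with the best rival.
        for i in range(n):
            for k in range(n):
                rival = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # Availability: support that candidate exemplar k has from other points.
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(pos) - pos[k]  # sum over i' != k
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, R[k][k] + support - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Exemplars are points whose self-responsibility plus self-availability is positive.
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        raise RuntimeError("no exemplar emerged; try other damping or preference")
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels

# Hypothetical marker values forming two well-separated groups.
values = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
exemplars, labels = affinity_propagation(values)
print(exemplars, labels)
```

The number of clusters is simply the number of exemplars that emerge; it depends on the preference value, so a different preference would generally yield a different number of profiles.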
Chapter: Longitudinal profile of a set of biomarkers in predicting Covid-19 mortality using joint models
In survival analysis, time-varying covariates are endogenous when their measurements are directly related to the event status and incomplete information occurs at random points during the follow-up. Consequently, the time-dependent Cox model leads to biased estimates. Joint models (JM) allow these associations to be estimated correctly by combining a survival and a longitudinal sub-model by means of shared parameters (i.e., the random effects of the longitudinal sub-model are inserted in the survival one). This study aims to show the use of JM to evaluate the association between a set of inflammatory biomarkers and Covid-19 mortality. During the Covid-19 pandemic, physicians at Istituto Clinico di Città Studi in Milan collected biomarkers (endogenous time-varying covariates) to understand which might be used as prognostic factors for mortality. Furthermore, in the first epidemic outbreak, physicians did not have standard clinical protocols for the management of Covid-19 disease, and biomarker measurements were highly incomplete, especially at baseline. Between February and March 2020, a total of 403 COVID-19 patients were admitted. Baseline characteristics included sex and age, whereas biomarker measurements during hospital stay included log-ferritin, log-lymphocytes, log-neutrophil granulocytes, log-C-reactive protein, glucose and LDH. A Bayesian approach using a Markov chain Monte Carlo algorithm was used for fitting the JM. Independent and non-informative priors for the fixed effects (age and sex) and for the shared parameters were used. Hazard ratios (HR) for log-ferritin levels from a (biased) time-dependent Cox model and from the joint model were 2.10 (1.67-2.64) and 1.73 (1.38-2.20), respectively. In the multivariable JM, a doubling of biomarker levels resulted in a significant increase in mortality risk for log-neutrophil granulocytes, HR=1.78 (1.16-2.69); for log-C-reactive protein, HR=1.44 (1.13-1.83); and for LDH, HR=1.28 (1.09-1.49). An increase of 100 mg/dl in glucose resulted in HR=2.44 (1.28-4.26). Age, however, showed the strongest effect, with mortality risk starting to rise from 60 years.
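The "per doubling" and "per 100 mg/dl" hazard ratios can be related to model coefficients with a short worked calculation. The abstract does not state the log base or the exact parameterization, so the following assumes natural logarithms and a coefficient $\alpha$ that acts linearly on the log-hazard scale; the symbols $\alpha$ and $\alpha_g$ are introduced here for illustration.

```latex
% Biomarker entered as log(x): doubling x adds ln 2 to the covariate.
\mathrm{HR}_{\text{doubling}} = \exp(\alpha \ln 2) = 2^{\alpha}
\quad\Longrightarrow\quad
\alpha = \log_2(\mathrm{HR}_{\text{doubling}}),
\qquad \text{e.g. } \log_2(1.78) \approx 0.83 .

% Glucose entered on its original scale (mg/dl): a 100-unit increase.
\mathrm{HR}_{100} = \exp(100\,\alpha_g)
\quad\Longrightarrow\quad
\alpha_g = \frac{\ln(2.44)}{100} \approx 0.0089 \ \text{per mg/dl}.
```

Under these assumptions, the reported values correspond to a log-hazard increase of about 0.83 per unit of log-neutrophil granulocytes and about 0.0089 per mg/dl of glucose.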
A "non-parametric" version of the naive Bayes classifier
Many algorithms have been proposed for the machine learning task of classification. One of the simplest methods, the naive Bayes classifier, has often been found to give good performance despite the fact that its underlying assumptions (of independence and a Normal distribution of the variables) are perhaps violated. In previous work, we applied naive Bayes and other standard algorithms to a breast cancer database from Nottingham City Hospital in which the variables are highly non-Normal, and found that the algorithm performed well when predicting a class that had been derived from the same data. However, when we then applied naive Bayes to predict an alternative clinical variable, it performed much worse than other techniques. This motivated us to propose an alternative method, based on naive Bayes, which removes the requirement for the variables to be Normally distributed but retains the essential structure and other underlying assumptions of the method. We tested our novel algorithm on our breast cancer data and on three UCI datasets which also exhibited strong violations of Normality. We found that our algorithm outperformed naive Bayes in all four cases and outperformed multinomial logistic regression (MLR) in two cases. We conclude that our method offers a competitive alternative to MLR and naive Bayes when dealing with data sets in which non-Normal distributions are observed.
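One way to drop the Normality requirement while keeping the independence structure of naive Bayes is to estimate each class-conditional feature density with a kernel density estimate. The abstract does not say which density estimate the authors used, so the sketch below is an assumption-laden illustration rather than their algorithm; the function names, the fixed bandwidth and the toy data are hypothetical.

```python
import math

def kde_log_density(x, sample, bandwidth):
    """Log of a Gaussian kernel density estimate of `sample`, evaluated at x."""
    norm = len(sample) * bandwidth * math.sqrt(2 * math.pi)
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)
    return math.log(max(total / norm, 1e-300))  # floor avoids log(0)

def fit(X, y):
    """Store, for each class, its prior and the training values of every feature."""
    model = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        model[label] = {"prior": len(rows) / len(X), "columns": list(zip(*rows))}
    return model

def predict(model, row, bandwidth=0.5):
    """Pick the class with the highest log prior plus summed per-feature log densities."""
    def score(label):
        cls = model[label]
        # Independence assumption retained: per-feature log densities are added.
        return math.log(cls["prior"]) + sum(
            kde_log_density(v, col, bandwidth) for v, col in zip(row, cls["columns"])
        )
    return max(model, key=score)

# Hypothetical one-feature data: class A is bimodal (clearly non-Normal),
# class B is concentrated around zero.
X = [[-3.2], [-3.0], [-2.8], [2.8], [3.0], [3.2],
     [-0.4], [-0.2], [0.0], [0.1], [0.2], [0.4]]
y = ["A"] * 6 + ["B"] * 6

model = fit(X, y)
pred_center = predict(model, [0.0])
pred_tail = predict(model, [3.0])
print(pred_center, pred_tail)
```

In practice the bandwidth would be chosen from the data (for example by cross-validation) rather than fixed as it is here.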