Effects of Pooling Samples on the Performance of Classification Algorithms: A Comparative Study
A pooling design can be used as a powerful strategy to compensate for limited amounts of samples or high biological variation. In this paper, we perform a comparative study to model and quantify the effects of virtual pooling on the performance of five widely applied classifiers: support vector machines (SVMs), random forest (RF), k-nearest neighbors (k-NN), penalized logistic regression (PLR), and prediction analysis for microarrays (PAMs). We evaluate a variety of experimental designs using mock omics datasets with varying pool sizes and consider the effects of feature selection. Our results show that feature selection significantly improves classifier performance for non-pooled and pooled data. All investigated classifiers yield lower misclassification rates with smaller pool sizes. RF mainly outperforms the other investigated algorithms, while accuracy levels are comparable among the remaining ones. Guidelines are derived to identify an optimal pooling scheme for obtaining adequate predictive power and, hence, to motivate a study design that best meets experimental objectives and budgetary conditions, including time constraints.
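The idea of virtual pooling can be illustrated with a minimal sketch. The abstract does not specify how pools were formed, so the approach here is an assumption: samples of the same class are grouped at random and their feature values averaged. The function name `virtual_pool`, its parameters, and the mock data are hypothetical and only serve the illustration.

```python
import numpy as np

def virtual_pool(X, y, pool_size, seed=0):
    """Average random groups of `pool_size` same-class samples.

    Assumption: a pool is modeled as the arithmetic mean of its members;
    samples left over after forming full pools are dropped.
    """
    rng = np.random.default_rng(seed)
    pooled_X, pooled_y = [], []
    for label in np.unique(y):
        # Shuffle the indices of all samples carrying this class label.
        idx = rng.permutation(np.flatnonzero(y == label))
        n_pools = len(idx) // pool_size
        for k in range(n_pools):
            members = idx[k * pool_size:(k + 1) * pool_size]
            pooled_X.append(X[members].mean(axis=0))
            pooled_y.append(label)
    return np.array(pooled_X), np.array(pooled_y)

# Mock data: 12 samples, 5 features, two classes of 6 samples each.
rng = np.random.default_rng(42)
X = rng.normal(size=(12, 5))
y = np.array([0] * 6 + [1] * 6)

X_pooled, y_pooled = virtual_pool(X, y, pool_size=3)
```

With a pool size of 3, the 12 mock samples collapse into 4 pooled samples (2 per class). The pooled matrix can then be passed to any classifier in place of the individual samples, which is how a comparison across pool sizes could be set up.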
Lipocalin 2 expression is associated with aggressive features of endometrial cancer
Background: Increased expression of lipocalin 2 (LCN2) has been observed in several cancers. The aim of the present study was to investigate LCN2 in endometrial cancer in relation to clinico-pathologic phenotype, angiogenesis, markers of epithelial-mesenchymal transition (EMT), and patient survival. Methods: Immunohistochemical staining was performed using a human LCN2 antibody on a population-based series of endometrial cancer patients collected in Hordaland County (Norway) during 1981-1990 (n = 256). Patients were followed from the time of primary surgery until death or last follow-up in 2007. The median follow-up time for survivors was 17 years. Gene expression data from a prospectively collected endometrial cancer series (n = 76) and a publicly available endometrial cancer series (n = 111) were used for gene correlation studies. Results: Expression of LCN2 protein, found in 49% of the cases, was associated with non-endometrioid histologic type (p = 0.001), nuclear grade 3 (p = 0.001), >50% solid tumor growth (p = 0.001), ER and PR negativity (p = 0.028 and 0.006), and positive EZH2 expression (p < 0.001). LCN2 expression was significantly associated with expression of VEGF-A (p = 0.021), although not with the other angiogenesis markers examined (vascular proliferation index, glomeruloid microvascular proliferation, VEGF-C, VEGF-D or bFGF2 expression). Further, LCN2 was not associated with several EMT-related markers (E-cadherin, N-cadherin, P-cadherin, β-catenin), nor with vascular invasion (tumor cells invading lymphatic or blood vessels). Notably, LCN2 was significantly associated with distant tumor recurrences, as well as with the S100A family of metastasis-related genes. Patients with tumors showing no LCN2 expression had the best outcome with 81% 5-year survival, compared to 73% for intermediate and 38% for the small subgroup with strong LCN2 staining (p = 0.007).
In multivariate analysis, LCN2 expression was an independent prognostic factor in addition to histologic grade and FIGO stage. Conclusion: Increased LCN2 expression is associated with aggressive features and poor prognosis in endometrial cancer
Genetic aberration analysis in Thai colorectal adenoma and early-stage adenocarcinoma patients by whole-exome sequencing
Colorectal adenomas are precursor lesions of colorectal adenocarcinoma. The transition from adenoma to carcinoma in patients with colorectal cancer (CRC) has been associated with an accumulation of genetic aberrations. However, criteria for screening adenoma progression to adenocarcinoma are still lacking. The present study is the first attempt to identify genetic aberrations, such as somatic mutations, copy number variations (CNVs), and high-frequency mutated genes, in Thai patients. We characterized genomic abnormalities in two sample groups: the first comprised five cases with matched normal, colorectal adenoma, and colorectal adenocarcinoma tissue, and the second comprised six cases with matched normal and colorectal adenoma tissue. Whole-exome sequencing was performed for both groups, and their genetic aberrations were compared. In the comparisons of normal tissue with both colorectal adenoma and colorectal adenocarcinoma, somatic mutations were observed in the tumor suppressor gene APC (Adenomatous polyposis coli) in eight out of ten patients. In the comparison of normal tissue with colorectal adenoma tissue, somatic mutations were also detected in the Catenin Beta 1 (CTNNB1), Family With Sequence Similarity 123B (FAM123B), F-Box And WD Repeat Domain Containing 7 (FBXW7), Sex-Determining Region Y-Box 9 (SOX9), Low-Density Lipoprotein Receptor-Related Protein 5 (LRP5), Frizzled Class Receptor 10 (FZD10), and AT-Rich Interaction Domain 1A (ARID1A) genes, which are involved in the Wingless-related integration site (Wnt) signaling pathway. In the comparison of normal tissue with colorectal adenocarcinoma tissue, mutations were found in the Kirsten retrovirus-associated DNA sequences (KRAS) gene, which is involved in the receptor tyrosine kinase-RAS (RTK–RAS) signaling pathway, and in the Tumor Protein 53 (TP53) and Ataxia-Telangiectasia Mutated (ATM) genes, which are involved in the p53 signaling pathway.
These results suggest that APC and TP53 may serve as potential screening markers for colorectal adenoma and early-stage CRC. This preliminary study may help identify patients with adenoma and early-stage CRC and may aid in establishing prevention and surveillance strategies to reduce the incidence of CRC.
Dip in the gene pool: metagenomic survey of natural coccolithovirus communities
Despite the global oceanic distribution and recognised biogeochemical impact of coccolithoviruses (EhV), their diversity remains poorly understood. Here we employed a metagenomic approach to study the occurrence and progression of natural EhV community genomic variability. Analysis of EhV metagenomes from the early and late stages of an induced bloom led to three main discoveries. First, we observed resilient and specific genomic signatures in the EhV community associated with the Norwegian coast, which reinforce the existence of limitations to the capacity for dispersal and genomic exchange among EhV populations. Second, we identified a hyper-variable region (approximately 21 kbp long) in the coccolithovirus genome. Third, we observed a clear trend for EhV relative amino-acid diversity to decrease from the early to the late stages of the bloom. This study validated two new methodological combinations and proved very useful in the discovery of new genomic features associated with natural coccolithovirus communities.
MethyCancer: the database of human DNA methylation and cancer
Cancer is ranked as one of the top killers among all human diseases and continues to have a devastating effect on the population around the globe. Current research efforts aim to accelerate our understanding of the molecular basis of cancer and to develop effective means for cancer diagnostics, treatment and prognosis. An altered pattern of epigenetic modifications, most importantly DNA methylation events, plays a critical role in tumorigenesis through regulating oncogene activation, tumor suppressor gene silencing and chromosomal instability. To study the interplay of DNA methylation, gene expression and cancer, we developed a publicly accessible database for human DNA Methylation and Cancer (MethyCancer, http://methycancer.genomics.org.cn). MethyCancer hosts both highly integrated data on DNA methylation, cancer-related genes, mutations and cancer information from public resources, and the CpG Island (CGI) clones derived from our large-scale sequencing. Interconnections between different data types were analyzed and presented. Furthermore, a powerful search tool was developed to provide user-friendly access to all the data and data connections. A graphical MethyView shows DNA methylation in the context of genomics and genetics data, facilitating cancer research aimed at understanding the genetic and epigenetic mechanisms that cause dramatic changes in gene expression in tumor cells.
The Scientific World Journal, Research Article: Effects of Pooling Samples on the Performance of Classification Algorithms: A Comparative Study
Anti-cancer drug development: Computational strategies to identify and target proteins involved in cancer metabolism
Cancer remains a fundamental burden to public health despite substantial efforts aimed at developing effective chemotherapeutics and significant advances in chemotherapeutic regimens. The major challenge in anti-cancer drug design is to selectively target cancer cells with high specificity. Research into treating malignancies by targeting altered metabolism in cancer cells is supported by computational approaches, which can take a leading role in identifying candidate targets for anti-cancer therapy as well as assist in the discovery and optimisation of anti-cancer agents. Natural products appear to have privileged structures for anti-cancer drug development and the bulk of this particularly valuable chemical space still remains to be explored. In this review we aim to provide a comprehensive overview of current strategies for computer-guided anti-cancer drug development. We start with a discussion of state-of-the-art bioinformatics methods applied to the identification of novel anti-cancer targets, including machine learning techniques, the Connectivity Map and biological network analysis. This is followed by an extensive survey of molecular modelling and cheminformatics techniques employed to develop agents targeting proteins involved in the glycolytic, lipid, NAD+, mitochondrial (TCA cycle), amino acid and nucleic acid metabolism of cancer cells. A dedicated section highlights the most promising strategies to develop anti-cancer therapeutics from natural products, and the role of metabolism and some of the many targets under investigation are reviewed. Recent success stories are reported for all the areas covered in this review. We conclude with a brief summary of the most interesting strategies identified and with an outlook on future directions in anti-cancer drug development.