131 research outputs found
Extracting and explaining biological knowledge in microarray data
© Springer-Verlag Berlin Heidelberg 2004. This paper describes a method of clustering lists of genes mined from a microarray dataset using functional information from the Gene Ontology. The method uses relationships between terms in the ontology both to build clusters and to extract meaningful cluster descriptions. The approach is general and may be applied to assist explanation of other datasets associated with ontologies
SMCKAT, a Sequential Multi-Dimensional CNV Kernel-Based Association Test.
Copy number variants (CNVs) are the most common form of structural genetic variation, reflecting the gain or loss of DNA segments compared with a reference genome. Studies have identified CNV associations with different diseases. However, to our knowledge, the association between the sequential order of CNVs and disease-related traits has not been studied, and it is still unclear whether CNVs function individually or in coordination with other CNVs to manifest a disease or trait. Consequently, we propose the first such method to test the association between the sequential order of CNVs and diseases. Our sequential multi-dimensional CNV kernel-based association test (SMCKAT) consists of three parts: (1) a single CNV group kernel measuring the similarity between two groups of CNVs; (2) a whole genome group kernel that aggregates several single group kernels to summarize the similarity between CNV groups in a single chromosome or the whole genome; and (3) an association test between the CNV sequential order and disease-related traits using a random effect model. We evaluate SMCKAT on CNV data sets exhibiting rare or common CNVs, demonstrating that it can detect specific biologically relevant chromosomal regions supported by the biomedical literature. We compare the performance of SMCKAT with MCKAT, a multi-dimensional kernel association test. Based on the results, SMCKAT can detect more specific chromosomal regions than MCKAT: regions that not only have CNV characteristics, but on which the CNV order is also significantly associated with the disease-related trait
A balanced iterative random forest for gene selection from microarray data
Background: The wealth of gene expression values being generated by high throughput microarray technologies leads to complex high dimensional datasets. Moreover, many cohorts have the problem of imbalanced classes where the number of patients belonging
Comparison of visualization methods of genome-wide SNP profiles in childhood acute lymphoblastic leukaemia
Data mining and knowledge discovery have been applied to datasets in various industries, including biomedical data. Modelling, data mining and visualization in biomedical data address the problem of extracting knowledge from large and complex biomedical data. The current challenge of dealing with such data is to develop statistical and data mining methods that search and browse the underlying patterns within the data. In this paper, we employ several state-of-the-art data reduction methods for visualizing genome-wide Single Nucleotide Polymorphism (SNP) datasets. The visualization approach was selected based on the trustworthiness of the resultant visualizations. To deal with large amounts of genetic variation data, we have chosen to apply different data reduction methods to address the problem induced by high dimensionality. Based on the trustworthiness metric, we found that the Neighbour Retrieval Visualizer (NeRV) outperformed other methods. This method optimizes the retrieval quality of Stochastic Neighbour Embedding. The quality measure of the visualization (i.e. NeRV) showed excellent results, even though the dataset was reduced from 13917 to 2 dimensions. The visualization results will assist clinicians and biomedical researchers in understanding the systems biology of patients and how to compare different groups of clusters in visualizations. © 2008, Australian Computer Society, Inc
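The trustworthiness metric mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly used rank-based definition, T(k) = 1 - 2 / (n k (2n - 3k - 1)) × the summed rank excess of points that appear among the k nearest neighbours in the low-dimensional view but not in the original space. The function names, the Euclidean distance, and the index-based tie-breaking are illustrative choices, not details taken from the paper.

```python
import math


def _ranks(points, i):
    """Map every other index j to its 1-based distance rank from point i.

    Ties are broken by index order (sorted() is stable); this is an
    illustrative choice.
    """
    order = sorted(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )
    return {j: rank for rank, j in enumerate(order, start=1)}


def trustworthiness(high, low, k):
    """Rank-based trustworthiness of the embedding `low` of the data `high`.

    Penalises points that are among the k nearest neighbours in the
    embedding but not among the k nearest neighbours in the original space.
    """
    n = len(high)
    if len(low) != n:
        raise ValueError("high and low must contain the same number of points")
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n / 2")
    penalty = 0
    for i in range(n):
        rank_high = _ranks(high, i)
        rank_low = _ranks(low, i)
        for j, r in rank_low.items():
            if r <= k and rank_high[j] > k:
                penalty += rank_high[j] - k
    return 1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))
```

A value of 1.0 means every low-dimensional neighbourhood is also a neighbourhood in the original space; lower values indicate that the visualization shows neighbours that are not really close.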
Convolutional deep belief network with feature encoding for classification of neuroblastoma histological images
© 2018 Journal of Pathology Informatics. Background: Neuroblastoma is the most common extracranial solid tumor in children younger than 5 years old. Optimal management of neuroblastic tumors depends on many factors including histopathological classification. The gold standard for classification of neuroblastoma histological images is visual microscopic assessment. In this study, we propose and evaluate a deep learning approach to classify high-resolution digital images of neuroblastoma histology into five different classes determined by the Shimada classification. Subjects and Methods: We apply a combination of a convolutional deep belief network (CDBN) with a feature encoding algorithm that automatically classifies digital images of neuroblastoma histology into five different classes. We design a three-layer CDBN to extract high-level features from neuroblastoma histological images and combine it with a feature encoding model to extract features that are highly discriminative in the classification task. The extracted features are classified into five different classes using a support vector machine classifier. Data: We constructed a dataset of 1043 neuroblastoma histological images, derived from an Aperio scanner, from 125 patients representing different classes of neuroblastoma tumors. Results: A weighted average F-measure of 86.01% was obtained from the selected high-level features, outperforming state-of-the-art methods. Conclusion: The proposed computer-aided classification system, which uses the combination of deep architecture and feature encoding to learn high-level features, is highly effective in the classification of neuroblastoma histological images
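For readers unfamiliar with the reported metric, a weighted average F-measure over several classes is usually computed as the per-class F1 score weighted by the number of true samples in each class. The sketch below assumes that convention; the abstract does not state the exact weighting used, and the function name and labels are hypothetical.

```python
from collections import Counter


def weighted_f_measure(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (n_true / total) * f1
    return score
```

Weighting by class support keeps a rare class from dominating the average, which matters when the five histological classes are not equally represented.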
MCKAT: a multi-dimensional copy number variant kernel association test.
BACKGROUND: Copy number variants (CNVs) are the gain or loss of DNA segments in the genome. Studies have shown that CNVs are linked to various disorders, including autism, intellectual disability, and schizophrenia. Consequently, interest in studying possible associations of CNVs with specific disease traits is growing. However, due to the specific multi-dimensional characteristics of CNVs, methods for testing the association between CNVs and disease-related traits are still underdeveloped. We propose a novel multi-dimensional CNV kernel association test (MCKAT) in this paper. We aim to find significant associations between CNVs and disease-related traits using kernel-based methods. RESULTS: We address the multi-dimensionality in CNV characteristics. We first design a single pair CNV kernel, which contains three sub-kernels to summarize the similarity between two CNVs considering all CNV characteristics. We then aggregate the single pair CNV kernels into a whole chromosome CNV kernel, which summarizes the similarity between CNVs in two or more chromosomes. Finally, the association between the CNVs and disease-related traits is evaluated by comparing the similarity in the trait with kernel-based similarity using a score test in a random effect model. We apply MCKAT to genome-wide CNV datasets to examine the association between CNVs and disease-related traits, which demonstrates the potential usefulness of the proposed method for CNV association tests. We compare the performance of MCKAT with CKAT, a uni-dimensional kernel method. Based on the results, MCKAT indicates stronger evidence (smaller p-values) in detecting significant associations between CNVs and disease-related traits in both rare and common CNV datasets. CONCLUSION: A multi-dimensional copy number variant kernel association test can detect CNV regions that are statistically significantly associated with any disease-related trait.
MCKAT can provide biologists with CNV hot spots at the cytogenetic band level on which CNVs may have a significant association with disease-related traits. Using MCKAT, biologists can narrow their investigation from the whole genome, including many genes and CNVs, to more specific cytogenetic bands that MCKAT identifies. Furthermore, MCKAT can help biologists detect CNVs significantly associated with disease-related traits across a patient group instead of examining each subject's CNVs case by case
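The final step described in this abstract, comparing trait similarity with kernel-based similarity, can be sketched generically. The example below assumes the common variance-component form Q = (y - mean(y))^T K (y - mean(y)) with no covariates, and it swaps in a permutation p-value for the analytic null distribution. It is not MCKAT's implementation: the kernel matrix `K` is taken as given, and the function names are hypothetical.

```python
import random


def score_statistic(y, K):
    """Q = r^T K r, where r is the trait vector centred at its mean."""
    n = len(y)
    mu = sum(y) / n
    r = [v - mu for v in y]
    return sum(r[i] * K[i][j] * r[j] for i in range(n) for j in range(n))


def permutation_p_value(y, K, n_perm=999, seed=0):
    """Share of trait permutations whose Q is at least the observed Q.

    A permutation test is used here only as a simple stand-in for the
    analytic null distribution of a kernel score test.
    """
    rng = random.Random(seed)
    observed = score_statistic(y, K)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if score_statistic(shuffled, K) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

A large Q means that subjects with similar CNV profiles (high kernel values) also tend to have similar trait values, which is the pattern an association test of this kind looks for.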
Increased Efficacy of Histone Methyltransferase G9a Inhibitors Against MYCN-Amplified Neuroblastoma.
Targeted inhibition of proteins modulating epigenetic changes is an increasingly important priority in cancer therapeutics, and many small molecule inhibitors are currently being developed. In the case of neuroblastoma (NB), a pediatric solid tumor with a paucity of intragenic mutations, epigenetic deregulation may be especially important. In this study we validate the histone methyltransferase G9a/EHMT2 as being associated with indicators of poor prognosis in NB. Immunological analysis of G9a protein shows it to be more highly expressed in NB cell lines with MYCN amplification, which is a primary determinant of dismal outcome in NB patients. Furthermore, G9a protein in primary tumors is expressed at higher levels in poorly differentiated/undifferentiated NB, and correlates with high expression of EZH2, a known co-operative oncoprotein in NB. Our functional analyses demonstrate that siRNA-mediated G9a depletion inhibits cell growth in all NB cell lines, but, strikingly, only triggers apoptosis in NB cells with MYCN amplification, suggesting a synthetic lethal relationship between G9a and MYCN. This pattern of sensitivity is also evident when using the small molecule G9a inhibitors UNC0638 and UNC0642. The increased efficacy of G9a inhibition in the presence of MYCN overexpression is also demonstrated in the SHEP-21N isogenic model with tet-regulatable MYCN. Finally, using RNA sequencing, we identify several potential tumor suppressor genes that are reactivated by G9a inhibition in NB, including CLU, FLCN, AMHR2, and AKR1C1-3. Together, our study underlines the under-appreciated role of G9a in NB, especially in MYCN-amplified tumors
Can Archival Tissue Reveal Answers to Modern Research Questions?: Computer-Aided Histological Assessment of Neuroblastoma Tumours Collected over 60 Years.
Despite neuroblastoma being the most common extracranial solid cancer in childhood, it is still a rare disease. Consequently, the unavailability of tissue for research limits the statistical power of studies. Pathology archives are possible sources of rare tissue, which, if proven to remain consistent over time, could prove useful to research of rare disease types. We applied immunohistochemistry to investigate whether long term storage caused any changes to antigens used diagnostically for neuroblastoma. We constructed and quantitatively assessed a tissue microarray containing neuroblastoma archival material dating between 1950 and 2007. A total of 119 neuroblastoma tissue cores were included spanning 6 decades. Fourteen antibodies were screened across the tissue microarray (TMA). These included seven positive neuroblastoma diagnosis markers (NB84, Chromogranin A, NSE, Ki-67, INI1, Neurofilament Protein, Synaptophysin), two anticipated to be negative (S100A, CD99), and five research antibodies (IL-7, IL-7R, JAK1, JAK3, STAT5). The staining of these antibodies was evaluated using Aperio ImageScope software along with novel pattern recognition and quantification algorithms. This analysis demonstrated that marker signal intensity did not decrease over time and that storage for 60 years had little effect on antigenicity. The construction and assessment of this neuroblastoma TMA has demonstrated the feasibility of using archival samples for research