
2,717 research outputs found

An Adaptive Clustering Algorithm for Gene Expression Time-Series Data Analysis

Author: Mangalakumar Naveen
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2018

Studying gene expression through various time intervals of breast cancer survival may provide insights into the recovery of the patients. In this work, we propose a hierarchical clustering method to separate dissimilar groups of genes in time-series data, namely those with the furthest distances from the rest of the genes throughout different time intervals. The isolated outliers (genes that trend differently from other genes) can serve as potential biomarkers of breast cancer survivability. We partition the time axis (time points) into bins of length six months, starting from months 1-6 up to months 337-342, and, for each gene, we average its expression level over all patients who appear in a survival bin. Gene expressions throughout those time points are cubic spline interpolated to create a trending profile for each gene. First, we universally align the gene expression profiles to minimize the total area between them. Then, we cluster them using a sliding window approach and hierarchical clustering based on minimum vertical distances. To the best of our knowledge, this work is the first time-series model built on the survival time of patients after treatment. With this approach, we identified 46 genes (including 24 oncogenes and 18 tumor suppressor genes) as potential biomarkers of breast cancer survivability.
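
The binning and averaging step described in this abstract can be sketched as follows. This is a minimal illustration, not the author's code: the function names and the `(survival_month, expression)` input layout are assumptions, and the cubic spline interpolation, alignment and clustering steps are left out.

```python
from collections import defaultdict

BIN_LENGTH = 6    # months per bin, as stated in the abstract
MAX_MONTH = 342   # the last bin covers months 337-342


def bin_index(survival_month):
    """Map a survival time in months (1..342) to a zero-based bin.

    Months 1-6 map to bin 0, months 7-12 to bin 1, and so on.
    """
    if not 1 <= survival_month <= MAX_MONTH:
        raise ValueError(f"month out of range: {survival_month}")
    return (survival_month - 1) // BIN_LENGTH


def bin_label(index):
    """Human-readable month range for a bin, e.g. 0 -> '1-6'."""
    start = index * BIN_LENGTH + 1
    return f"{start}-{start + BIN_LENGTH - 1}"


def average_by_bin(samples):
    """Average one gene's expression over the patients in each bin.

    `samples` is a hypothetical list of (survival_month, expression)
    pairs, one pair per patient. Bins with no patients are omitted.
    """
    totals = defaultdict(lambda: [0.0, 0])  # bin -> [sum, count]
    for month, expression in samples:
        acc = totals[bin_index(month)]
        acc[0] += expression
        acc[1] += 1
    return {b: s / n for b, (s, n) in sorted(totals.items())}
```

The per-bin averages produced this way would be the points that a cubic spline is then fitted through to obtain each gene's trending profile.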

Scholarship at UWindsor

Machine Learning Approaches for Cancer Analysis

Author: Abedalrhman Alkhateeb
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2018

In addition, we propose several machine learning models, each addressing a biological problem. First, we present Zseq, a linear-time method that identifies the most informative genomic sequences and reduces the number of biased sequences, sequence duplications, and ambiguous nucleotides. Zseq measures the complexity of each sequence by counting its unique k-mers as the corresponding score, and also takes into account other factors, such as ambiguous nucleotides or a high GC-content percentage in k-mers. Based on a z-score threshold, Zseq sweeps through the sequences again and filters out those with a z-score less than the user-defined threshold. Zseq provides a better mapping rate and significantly reduces the number of ambiguous bases in comparison with other methods. The filtered reads were evaluated by aligning the reads and assembling the transcripts using the reference genome as well as de novo assembly. The assembled transcripts show a better discriminative ability to separate cancer and normal samples in comparison with another state-of-the-art method. Studying the abundance of select mRNA species throughout prostate cancer progression may provide some insight into the molecular mechanisms that advance the disease. In the second contribution of this dissertation, we show that combining proper clustering, a distance function, and index validation for clusters is suitable for identifying outlier transcripts, which trend differently from the majority of the transcripts; here, the trend of a transcript is its abundance throughout the different stages of prostate cancer. We compare this model with a standard hierarchical time-series clustering method based on Euclidean distance.
Using time-series profile hierarchical clustering methods, we identified stage-specific mRNA species, termed outlier transcripts, that exhibit unique trending patterns compared to most other transcripts during disease progression. In contrast to the hierarchical clustering method based on Euclidean distance, this method identifies those outliers rather than finding patterns among the trending transcripts. A wet-lab experiment on a biomarker (CAM2G gene) confirmed the result of the computational model. Genes related to these outlier transcripts were found to be strongly associated with cancer, and in particular, prostate cancer. Further investigation of these outlier transcripts in prostate cancer may identify them as potential stage-specific biomarkers that can predict the progression of the disease. Breast cancer, on the other hand, is a widespread type of cancer in females and accounts for many cancer cases and deaths worldwide. Identifying the subtype of breast cancer plays a crucial role in selecting the best treatment. In the third contribution, we propose an optimized hierarchical classification model that is used to predict the breast cancer subtype. Suitable filter feature selection methods and new hybrid feature selection methods are utilized to find discriminative genes. Our proposed model achieves 100% accuracy for predicting the breast cancer subtypes using the same or even fewer genes. Studying breast cancer survivability among different patients who received various treatments may help understand the relationship between survivability and treatment therapy based on gene expression. In the fourth contribution, we have built a classifier system that predicts whether a given breast cancer patient who underwent some form of treatment (hormone therapy, radiotherapy, or surgery) will survive beyond five years after the treatment.
Our classifier is a tree-based hierarchical approach that partitions breast cancer patients based on survivability classes; each node in the tree is associated with a treatment therapy and finds a predictive subset of genes that can best predict whether a given patient will survive after that particular treatment. We applied our tree-based method to a gene expression dataset that consists of 347 treated breast cancer patients and identified potential biomarker subsets with prediction accuracies ranging from 80.9% to 100%. We have further investigated the roles of many biomarkers through the literature. Studying gene expression through various time intervals of breast cancer survival may provide insights into the recovery of the patients. Discovery of gene indicators can be a crucial step in predicting survivability and handling of breast cancer patients. In the fifth contribution, we propose a hierarchical clustering method to separate dissimilar groups of genes in time-series data as outliers. These isolated outliers, genes that trend differently from other genes, can serve as potential biomarkers of breast cancer survivability. In the last contribution, we introduce a method that uses machine learning techniques to identify transcripts that correlate with prostate cancer development and progression. We have isolated transcripts that have the potential to serve as prognostic indicators and may have significant value in guiding treatment decisions. Our study also supports PTGFR, NREP, scaRNA22, DOCK9, FLVCR2, IK2F3, USP13, and CLASP1 as potential biomarkers to predict prostate cancer progression, especially between stage II and subsequent stages of the disease.
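
The k-mer scoring and z-score filtering idea behind Zseq, as described at the start of this abstract, can be illustrated with a short sketch. It is not the Zseq implementation: the function names, the default `k` and `threshold` values, and the simple rule of ignoring k-mers that contain `N` are assumptions, and the GC-content adjustment mentioned in the abstract is omitted.

```python
from statistics import mean, pstdev


def unique_kmer_count(seq, k):
    """Count distinct k-mers in `seq`, ignoring any k-mer that
    contains an ambiguous base (assumed here to be written as N)."""
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return sum(1 for kmer in kmers if "N" not in kmer)


def filter_by_zscore(sequences, k=3, threshold=-1.0):
    """Keep sequences whose complexity z-score is >= `threshold`.

    First pass: score every sequence by its unique k-mer count.
    Second pass: drop sequences whose z-score falls below the
    user-defined threshold.
    """
    if not sequences:
        return []
    scores = [unique_kmer_count(s, k) for s in sequences]
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:  # all scores equal, so nothing stands out
        return list(sequences)
    return [s for s, score in zip(sequences, scores)
            if (score - mu) / sigma >= threshold]
```

With these assumed defaults, a low-complexity read such as a run of a single base scores far below the mean and is filtered out, while reads with many distinct k-mers are kept.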

Scholarship at UWindsor


Development of a cell-based lab-on-a-chip sensor for detection of oral cancer biomarkers

Author: Weigum Shannon Elise
Publication date: 01/08/2008

Oral cancer is the sixth most common cancer worldwide and has been marked by high morbidity and poor survival rates that have changed little over the past few decades. Beyond prevention, early detection is the most crucial determinant for successful treatment and survival of cancer. Yet current methodologies for cancer diagnosis based upon pathological examination alone are insufficient for detecting early tumor progression and molecular transformation. Development of new diagnostic tools incorporating tumor biomarkers could enhance early detection by providing molecular-level insight into the biochemical and cellular changes associated with oral carcinogenesis. The work presented in this doctoral dissertation aims to address this clinical need through the development of new automated cellular analysis methods, incorporating lab-on-a-chip sensor techniques, for examination of molecular and morphological biomarkers associated with oral carcinogenesis. Using the epidermal growth factor receptor (EGFR) as a proof-of-principle biomarker, the sensor system demonstrated capacity to support rapid biomarker analysis in less than one-tenth the time of traditional methods and effectively characterized EGFR biomarker over-expression in oral tumor-derived cell lines. Successful extension from in vitro tumor cell lines to clinically relevant exfoliative brush cytology was demonstrated, providing a non-invasive method for sampling abnormal oral epithelium. Incorporation of exfoliative cytology further helped to define the important assay and imaging parameters necessary for dual molecular and morphological analysis in adherent epithelium. Next, this new sensor assay and method was applied in a small pilot study in order to secure an initial understanding of the diagnostic utility of such biosensor systems in clinical settings.
Four cellular features were identified as useful indicators of cancerous or pre-cancerous conditions: nuclear area, nuclear diameter, nuclear-to-cytoplasm ratio, and EGFR biomarker expression. Further examination using linear regression and ROC curve analysis identified the morphological features as the best predictors of disease, while a combination of all features may be ideal for classification of OSCC and pre-malignancy with high sensitivity and specificity. Further testing in a larger sample size is necessary to validate this regression model and the LOC sensor technique, but the approach shows strong promise as a new diagnostic tool for early detection of oral cancer.
Chemistry and Biochemistry

Texas ScholarWorks

UFFizi: a generic platform for ranking informative features

Author: Assaf Gottlieb
David Horn
Michal Linial
Roy Varshavsky
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

Clinical Study of Saliva Metabolomics and Microbiomics in Respiratory Diseases

Author: Asandei Denisa Ramona
Publication date: 01/01/2018

Aberystwyth Research Portal

Integrating gene expression profiling and clinical data

Author: Albanese Davide
Furlanello Cesare
Jurman Giuseppe
Merler Stefano
Paoli Silvano
Publication venue: Elsevier Inc.
Publication date: 31/01/2008

We propose a combination of machine learning techniques to integrate predictive profiling from gene expression with clinical and epidemiological data. Starting from BioDCV, a complete software setup for predictive classification and feature ranking without selection bias, we apply semisupervised profiling for detecting outliers and deriving informative subtypes of patients. During the profiling process, sampletracking curves are extracted and then clustered according to a distance derived from dynamic time warping. Sampletracking also allows the identification of outlier cases, whose removal is shown to improve the predictive accuracy and stability of the derived gene profiles. Here we propose to employ clinical features to validate the semisupervising procedure. The procedure is demonstrated in the analysis of a liver cancer dataset of 213 samples described by 1993 genes and by pathological features.
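
For readers unfamiliar with dynamic time warping, the distance mentioned in this abstract can be illustrated with the textbook dynamic-programming formulation. This is a generic sketch, not the BioDCV code: the function name, the absolute-difference local cost and the lack of a warping-window constraint are assumptions.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric
    sequences, using |x - y| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]
```

Because the alignment may stretch or compress the time axis, two curves with the same shape but different pacing receive a small distance, which is why a DTW-derived distance suits clustering curves that evolve at different rates.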

Elsevier - Publisher Connector

Archivio della ricerca - Fondazione Bruno Kessler

Meta-analysis of muscle transcriptome data using the MADMuscle database reveals biologically relevant gene patterns

Author: Armelle Magot
Audrey Bihouée
Daniel Baron
Emeric Dubois
Frédérique Savagner
Gérard Ramstein
Marja Steenman
Philippe Jourdon
Raluca Teusan
Reiner Veitia
Rémi Houlgatte
Yann Péréon
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: DNA microarray technology has had a great impact on muscle research, and microarray gene expression data has been widely used to identify gene signatures characteristic of the studied conditions. With the rapid accumulation of muscle microarray data, it is of great interest to understand how to compare and combine data across multiple studies. Meta-analysis of transcriptome data is a valuable method to achieve this. It highlights conserved gene signatures across multiple independent studies. However, its use is made difficult by the diversity of the available data: different microarray platforms, different gene nomenclature, different species studied, etc. Description: We have developed a system tool dedicated to muscle transcriptome data. This system comprises a collection of microarray data as well as a query tool. The latter allows the user to extract similar clusters of co-expressed genes from the database, using an input gene list. Common and relevant gene signatures can thus be searched more easily. The dedicated database consists of a large compendium of public data (more than 500 data sets) related to muscle (skeletal and heart). These studies included seven different animal species, spanning invertebrates (Drosophila melanogaster, Caenorhabditis elegans) and vertebrates (Homo sapiens, Mus musculus, Rattus norvegicus, Canis familiaris, Gallus gallus). After a renormalization step, clusters of co-expressed genes were identified in each dataset. The lists of co-expressed genes were annotated using a unified re-annotation procedure. These gene lists were compared to find significant overlaps between studies. Conclusions: Applied to this large compendium of data sets, meta-analyses demonstrated that conserved patterns between species could be identified.
Focusing on a specific pathology (Duchenne Muscular Dystrophy), we validated results across independent studies and revealed robust biomarkers and new pathways of interest. The meta-analyses performed with MADMuscle show the usefulness of this approach. Our method can be applied to all public transcriptome data.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals