14,079 research outputs found

    Algorithms Implemented for Cancer Gene Searching and Classifications

    Understanding gene expression is important for cancer diagnosis. One goal of this understanding is to implement cancer gene search and classification methods. However, cancer gene search and classification is challenging because no single exact algorithm can be applied individually across the various types of cancer cells. This paper surveys the most common top-ranked algorithms implemented for cancer gene search and classification, and how they are implemented to achieve better performance. The paper distinguishes algorithms implemented for bioimage analysis of cancer cells from algorithms based on DNA array data. Its main purpose is to lay out a road map of the most current algorithms implemented for cancer gene search and classification.

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

    Wavelet feature extraction and genetic algorithm for biomarker detection in colorectal cancer data

    Biomarkers that predict a patient's survival can play an important role in medical diagnosis and treatment. Selecting the significant biomarkers from hundreds of protein markers is a key step in survival analysis. In this paper a novel method is proposed to detect the prognostic biomarkers of survival in colorectal cancer patients using wavelet analysis, a genetic algorithm, and a Bayes classifier. The one-dimensional discrete wavelet transform (DWT) is normally used to reduce the dimensionality of biomedical data. In this study the one-dimensional continuous wavelet transform (CWT) was proposed to extract the features of colorectal cancer data. The one-dimensional CWT cannot reduce the dimensionality of data, but it captures features that the DWT misses and is therefore complementary to the DWT. A genetic algorithm was run on the extracted wavelet coefficients to select the optimized features, using a Bayes classifier to build its fitness function. The corresponding protein markers were located based on the position of the optimized features. Kaplan-Meier curves and a Cox regression model were used to evaluate the performance of the selected biomarkers. Experiments were conducted on a colorectal cancer dataset and several significant biomarkers were detected. A new protein biomarker, CD46, was found to be significantly associated with survival time.
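
The dimensionality remark in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Haar wavelet for the DWT, the Ricker (Mexican hat) wavelet for the CWT, the truncation of the wavelet at five widths, and all function names are assumptions chosen only for illustration, since the abstract names no mother wavelet. The sketch shows that one DWT level halves the number of coefficients per band, while the CWT returns one full-length row per scale.

```python
import math

def haar_dwt_level1(signal):
    """One level of the Haar DWT: returns (approximation, detail), each half as long."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def ricker(t, width):
    """Ricker (Mexican hat) wavelet evaluated at offset t for a given width."""
    norm = 2.0 / (math.sqrt(3.0 * width) * math.pi ** 0.25)
    u = t / width
    return norm * (1.0 - u * u) * math.exp(-0.5 * u * u)

def cwt_ricker(signal, widths):
    """CWT sketch: one row of len(signal) coefficients per width (scale)."""
    n = len(signal)
    rows = []
    for w in widths:
        half = int(5 * w)  # assumption: truncate the wavelet support at +/- 5 widths
        row = []
        for k in range(n):
            acc = 0.0
            for j in range(max(0, k - half), min(n, k + half + 1)):
                acc += signal[j] * ricker(j - k, w)
            row.append(acc)
        rows.append(row)
    return rows

# Hypothetical 16-point profile standing in for one patient's marker values.
profile = [float(i % 5) for i in range(16)]
approx, detail = haar_dwt_level1(profile)
coeffs = cwt_ricker(profile, widths=[1, 2, 4])
print(len(approx), len(detail))        # 8 coefficients per DWT band
print(len(coeffs), len(coeffs[0]))     # 3 scales, 16 coefficients each
```

Because the CWT output grows with the number of scales, a feature-selection step such as the genetic algorithm described in the abstract is what brings the feature count back down.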

    Computational methods for the identification of genetic variants in complex diseases

    Master's dissertation in Bioinformatics. Complex diseases, such as Type 2 Diabetes, are affected not only by environmental factors but also by genetic factors involving multiple variants and their interactions. Even so, the known risk factors are not sufficient to predict the manifestation of the disease. Some of these factors can be discovered with Genome-Wide Association Studies, which detect associations between variants, such as Single-Nucleotide Polymorphisms, and phenotypes, but other approaches, like Machine Learning, are needed to identify their effects and interactions. Even though these methods can identify important patterns and produce good results, they are challenging to interpret. In this project, we developed a predictor for complex diseases that uses datasets from Genome-Wide Association Studies to help identify new genetic markers associated with Type 2 Diabetes. The pipeline developed integrates gene regions and protein-protein interaction networks into datasets of variants, extracts new features, and employs machine learning models to predict risk of disease. This study showed that the models can predict the risk of disease and that using gene regions and protein-protein interaction networks improves the models and provides new information about the biology of the disease. From these models it was possible to identify new genes and pathways of interest which, with further investigation, could lead to the development of new strategies for diagnosis, prevention and treatment of Type 2 Diabetes.
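
As a rough illustration of what integrating gene regions into a variant dataset could look like, the sketch below maps each variant to the gene regions that contain it and sums minor-allele counts per gene. This is an assumption made for illustration, not the dissertation's pipeline: the abstract does not say how features are extracted, and the function name, the summing rule, the variant identifiers, the gene names and the coordinates are all hypothetical.

```python
def gene_burden_features(genotypes, variant_positions, gene_regions):
    """Aggregate one individual's variant dosages into per-gene features.

    genotypes:         variant id -> minor-allele count (0, 1 or 2)
    variant_positions: variant id -> (chromosome, position)
    gene_regions:      gene name  -> (chromosome, start, end), end inclusive
    """
    burden = {gene: 0 for gene in gene_regions}
    for variant_id, dosage in genotypes.items():
        chrom, pos = variant_positions[variant_id]
        for gene, (g_chrom, start, end) in gene_regions.items():
            # A variant contributes to every gene whose region contains it.
            if chrom == g_chrom and start <= pos <= end:
                burden[gene] += dosage
    return burden

# Hypothetical toy data: three variants, two gene regions.
positions = {"v1": ("1", 150), "v2": ("1", 180), "v3": ("2", 40)}
regions = {"GENE_A": ("1", 100, 200), "GENE_B": ("2", 10, 30)}
person = {"v1": 1, "v2": 2, "v3": 1}
print(gene_burden_features(person, positions, regions))
```

The per-gene values could then be appended to the variant-level features before model training; variants outside every region, such as the hypothetical `v3` here, contribute to no gene.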

    Intratumoral heterogeneity analysis reveals hidden associations between protein expression losses and patient survival in clear cell renal cell carcinoma.

    Intratumoral heterogeneity (ITH) is a prominent feature of kidney cancer. It is not known whether it has utility in finding associations between protein expression and clinical parameters. We used ITH detected by immunohistochemistry (IHC) to aid the association analysis between the loss of SWI/SNF components and clinical parameters. 160 ccRCC tumors (40 per tumor stage) were used to generate a tissue microarray (TMA). Four foci from different regions of each tumor were selected. IHC was performed against PBRM1, ARID1A, SETD2, SMARCA4, and SMARCA2. Statistical analyses were performed to correlate biomarker losses with patho-clinical parameters. Categorical variables were compared between groups using Fisher's exact tests. Univariate and multivariable analyses were used to correlate biomarker changes with patient survival. Multivariable analyses were performed by constructing decision trees using the classification and regression trees (CART) methodology. IHC detected widespread ITH in ccRCC tumors. The statistical analysis of the Truncal loss (root loss) found more correlations between biomarker losses and tumor stages than the traditional Loss in tumor (total). Losses of SMARCA4 or SMARCA2 significantly improved prognosis for overall survival (OS). Losses of PBRM1, ARID1A or SETD2 had the opposite effect. Thus Truncal Loss analysis revealed hidden links between protein losses and patient survival in ccRCC.
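
The categorical comparison mentioned in this abstract, Fisher's exact test, can be sketched for a 2x2 table as follows. This is a minimal standard-library version under stated assumptions: the function name is hypothetical, the two-sided p-value uses the common convention of summing all tables no more probable than the observed one, and the counts in the example are invented for illustration and are not data from the study.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def weight(x):
        # Unnormalized hypergeometric probability of x in the top-left cell,
        # kept as an exact integer so comparisons need no float tolerance.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = weight(a)
    extreme = sum(weight(x) for x in range(lowest, highest + 1)
                  if weight(x) <= observed)
    return extreme / comb(total, col1)

# Hypothetical counts: rows = marker lost / retained, columns = high / low stage.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p_value, 4))  # about 0.035
```

Keeping the weights as integers avoids the floating-point ties that can otherwise change which tables count as "at least as extreme".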