1,942 research outputs found
Evolutionary approaches for feature selection in biological data
Data mining techniques have been used widely in many areas such as business, science, engineering and medicine. The techniques allow a vast amount of data to be explored in order to extract useful information from the data. One of the foci in the health area is finding interesting biomarkers from biomedical data. Mass throughput data generated from microarrays and mass spectrometry from biological samples are high dimensional and is small in sample size. Examples include DNA microarray datasets with up to 500,000 genes and mass spectrometry data with 300,000 m/z values. While the availability of such datasets can aid in the development of techniques/drugs to improve diagnosis and treatment of diseases, a major challenge involves its analysis to extract useful and meaningful information. The aims of this project are: 1) to investigate and develop feature selection algorithms that incorporate various evolutionary strategies, 2) using the developed algorithms to find the “most relevant” biomarkers contained in biological datasets and 3) and evaluate the goodness of extracted feature subsets for relevance (examined in terms of existing biomedical domain knowledge and from classification accuracy obtained using different classifiers). The project aims to generate good predictive models for classifying diseased samples from control
Memetic micro-genetic algorithms for cancer data classification
Fast and precise medical diagnosis of human cancer is crucial for treatment decisions. Gene selection consists of identifying a set of informative genes from microarray data to allow high predictive accuracy in human cancer classification. This task is a combinatorial search problem, and optimisation methods can be applied for its resolution. In this paper, two memetic micro-genetic algorithms (MμV1 and MμV2) with different hybridisation approaches are proposed for feature selection of cancer microarray data. Seven gene expression datasets are used for experimentation. The comparison with stochastic state-of-the-art optimisation techniques concludes that problem-dependent local search methods combined with micro-genetic algorithms improve feature selection of cancer microarray data.Fil: Rojas, Matias Gabriel. Universidad Nacional de Lujan. Centro de Investigacion Docencia y Extension En Tecnologias de la Informacion y Las Comunicaciones.; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza; ArgentinaFil: Olivera, Ana Carolina. Universidad Nacional de Cuyo. Facultad de Ingeniería; Argentina. Universidad Nacional de Lujan. Centro de Investigacion Docencia y Extension En Tecnologias de la Informacion y Las Comunicaciones.; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza; ArgentinaFil: Carballido, Jessica Andrea. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación; ArgentinaFil: Vidal, Pablo Javier. Universidad Nacional de Cuyo. Facultad de Ingeniería; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza; Argentin
Computational models and approaches for lung cancer diagnosis
The success of treatment of patients with cancer depends on establishing an accurate diagnosis. To this end, the aim of this study is to developed novel lung cancer diagnostic models. New algorithms are proposed to analyse the biological data and extract knowledge that assists in achieving accurate diagnosis results
A survey on computational intelligence approaches for predictive modeling in prostate cancer
Predictive modeling in medicine involves the development of computational models which are capable of analysing large amounts of data in order to predict healthcare outcomes for individual patients. Computational intelligence approaches are suitable when the data to be modelled are too complex forconventional statistical techniques to process quickly and eciently. These advanced approaches are based on mathematical models that have been especially developed for dealing with the uncertainty and imprecision which is typically found in clinical and biological datasets. This paper provides a survey of recent work on computational intelligence approaches that have been applied to prostate cancer predictive modeling, and considers the challenges which need to be addressed. In particular, the paper considers a broad definition of computational intelligence which includes evolutionary algorithms (also known asmetaheuristic optimisation, nature inspired optimisation algorithms), Artificial Neural Networks, Deep Learning, Fuzzy based approaches, and hybrids of these,as well as Bayesian based approaches, and Markov models. Metaheuristic optimisation approaches, such as the Ant Colony Optimisation, Particle Swarm Optimisation, and Artificial Immune Network have been utilised for optimising the performance of prostate cancer predictive models, and the suitability of these approaches are discussed
The importance of data classification using machine learning methods in microarray data
The detection of genetic mutations has attracted global attention. several methods have proposed to detect diseases such as cancers and tumours. One of them is microarrays, which is a type of representation for gene expression that is helpful in diagnosis. To unleash the full potential of microarrays, machine-learning algorithms and gene selection methods can be implemented to facilitate processing on microarrays and to overcome other potential challenges. One of these challenges involves high dimensional data that are redundant, irrelevant, and noisy. To alleviate this problem, this representation should be simplified. For example, the feature selection process can be implemented by reducing the number of features adopted in clustering and classification. A subset of genes can be selected from a pool of gene expression data recorded on DNA micro-arrays. This paper reviews existing classification techniques and gene selection methods. The effectiveness of emerging techniques, such as the swarm intelligence technique in feature selection and classification in microarrays, are reported as well. These emerging techniques can be used in detecting cancer. The swarm intelligence technique can be combined with other statistical methods for attaining better results
Hybrid feature selection of breast cancer gene expression microarray data based on metaheuristic methods: a comprehensive review
Breast cancer (BC) remains the most dominant cancer among women worldwide. Numerous BC gene expression microarray-based studies have been employed in cancer classification and prognosis. The availability of gene expression microarray data together with advanced classification methods has enabled accurate and precise classification. Nevertheless, the microarray datasets suffer from a large number of gene expression levels, limited sample size, and irrelevant features. Additionally, datasets are often asymmetrical, where the number of samples from different classes is not balanced. These limitations make it difficult to determine the actual features that contribute to the existence of cancer classification in the gene expression profiles. Various accurate feature selection methods exist, and they are being widely applied. The objective of feature selection is to search for a relevant, discriminant feature subset from the basic feature space. In this review, we aim to compile and review the latest hybrid feature selection methods based on bio-inspired metaheuristic methods and wrapper methods for the classification of BC and other types of cancer
Molecular Signature as Optima of Multi-Objective Function with Applications to Prediction in Oncogenomics
Náplní této práce je teoretický úvod a následné praktické zpracování tématu Molekulární signatura jako optimální multi-objektivní funkce s aplikací v predikci v onkogenomice. Úvodní kapitoly jsou zaměřeny na téma rakovina, zejména pak rakovina prsu a její podtyp triple negativní rakovinu prsu. Následuje literární přehled z oblasti optimalizačních metod, zejména se zaměřením na metaheuristické metody a problematiku strojového učení. Část se odkazuje na onkogenomiku a principy microarray a také na statistiku a s důrazem na výpočet p-hodnoty a bimodálního indexu. Praktická část je pak zaměřena na konkrétní průběh výzkumu a nalezené závěry, vedoucí k dalším krokům výzkumu. Implementace vybraných metod byla provedena v programech Matlab a R, s využitím dalších programovacích jazyků a to konkrétně programů Java a Python.Content of this work is theoretical introduction and follow-up practical processing of topic Molecular signature as optima of multi-objective function with applications to prediction in oncogenomics. Opening chapters are targeted on topic of cancer, mainly on breast cancer and its subtype Triple Negative Breast Cancer. Succeeds the literature review of optimization methods, mainly on meta-heuristic methods for multi-objective optimization and problematic of machine learning. Part is focused on the oncogenomics and on the principal of microarray and also to statistics methods with emphasis on the calculation of p-value and Bimodality Index. Practical part of work consists from concrete research and conclusions lead to next steps of research. Implementation of selected methods was realised in Matlab and R, with use of other programming languages Java and Python.
Recommended from our members
Evolutionary and deep mining models for effective biomarker discovery
With the advent of high-throughput biology, large amounts of molecular data are available for purposeful analysis and evaluation. Extracting relevant knowledge from high-throughput biomedical datasets has become a common goal of current approaches to personalised cancer medicine and understanding cancer genotype and phenotype. However, the datasets are characterised by high dimensionality and relatively small sample sizes with small signal-to-noise ratios. Extracting and interpreting relevant knowledge from such complex datasets therefore remains a significant challenge for the fields of machine learning and data mining. This is evidenced by the limited success these methods have had in detecting robust and reliable biomarkers for cancers and other complicated diseases. This could also explain the lack of finding generic biomarkers among the identified published genes for identical diseases or clinical conditions.
This thesis proposes and evaluates the efficacy of two novel feature mining models established on the basis of the evolutionary computation and deep learning paradigms to position and solve biomarker discovery as an optimisation problem. Deep learning methods lack the transparency and interpretability found in the evolutionary paradigm. To overcome the inherent issue of poor explanatory power associated with the deep learning, this research also introduces a novel deep mining model that helps to deconstruct the internal state of such deep learning models to reveal key determinants underlying its latent representations to aid feature selection. As a result, salient biomarkers for breast cancer and the positivity of the Estrogen and Progesterone receptors are discovered robustly and validated reliably across a wide range of independently generated breast cancer data samples
Network-Based Biomarker Discovery : Development of Prognostic Biomarkers for Personalized Medicine by Integrating Data and Prior Knowledge
Advances in genome science and technology offer a deeper understanding of biology while at the same time improving the practice of medicine. The expression profiling of some diseases, such as cancer, allows for identifying marker genes, which could be able to diagnose a disease or predict future disease outcomes. Marker genes (biomarkers) are selected by scoring how well their expression levels can discriminate between different classes of disease or between groups of patients with different clinical outcome (e.g. therapy response, survival time, etc.). A current challenge is to identify new markers that are directly related to the underlying disease mechanism
Recommended from our members
Evaluation of segmentation algorithms for optical coherence tomography images of ovarian tissue
Ovarian cancer has the lowest survival rate among all gynecologic cancers predominantly due to late diagnosis. Early detection of ovarian cancer can increase 5-year survival rates from 40% up to 92%, yet no reliable early detection techniques exist. Optical coherence tomography (OCT) is an emerging technique that provides depth-resolved, high-resolution images of biological tissue in real-time and demonstrates great potential for imaging of ovarian tissue. Mouse models are crucial to quantitatively assess the diagnostic potential of OCT for ovarian cancer imaging; however, due to small organ size, the ovaries must first be separated from the image background using the process of segmentation. Manual segmentation is time-intensive, as OCT yields three-dimensional data. Furthermore, speckle noise complicates OCT images, frustrating many processing techniques. While much work has investigated noise-reduction and automated segmentation for retinal OCT imaging, little has considered the application to the ovaries, which exhibit higher variance and inhomogeneity than the retina. To address these challenges, we evaluate a set of algorithms to segment OCT images of mouse ovaries. We examine five preprocessing techniques and seven segmentation algorithms. While all preprocessing methods improve segmentation, Gaussian filtering is most effective, showing an improvement of
32
%
±
1.2
%
. Of the segmentation algorithms, active contours performs best, segmenting with an accuracy of
94.8
%
±
1.2
%
compared with manual segmentation. Even so, further optimization could lead to maximizing the performance for segmenting OCT images of the ovaries.National Science Foundation Graduate Research Fellowship Program [DGE-1143953]; National Institutes of Health/National Cancer Institute [1R01CA195723]; University of Arizona Cancer Center [3P30CA023074]This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at [email protected]
- …