2,116 research outputs found

    Machine Learning Approaches for Cancer Analysis

    Get PDF
    In addition, we propose many machine learning models that serve as contributions to solve a biological problem. First, we present Zseq, a linear time method that identifies the most informative genomic sequences and reduces the number of biased sequences, sequence duplications, and ambiguous nucleotides. Zseq finds the complexity of the sequences by counting the number of unique k-mers in each sequence as its corresponding score and also takes into the account other factors, such as ambiguous nucleotides or high GC-content percentage in k-mers. Based on a z-score threshold, Zseq sweeps through the sequences again and filters those with a z-score less than the user-defined threshold. Zseq is able to provide a better mapping rate; it reduces the number of ambiguous bases significantly in comparison with other methods. Evaluation of the filtered reads has been conducted by aligning the reads and assembling the transcripts using the reference genome as well as de novo assembly. The assembled transcripts show a better discriminative ability to separate cancer and normal samples in comparison with another state-of-the-art method. Studying the abundance of select mRNA species throughout prostate cancer progression may provide some insight into the molecular mechanisms that advance the disease. In the second contribution of this dissertation, we reveal that the combination of proper clustering, distance function and Index validation for clusters are suitable in identifying outlier transcripts, which show different trending than the majority of the transcripts, the trending of the transcript is the abundance throughout different stages of prostate cancer. We compare this model with standard hierarchical time-series clustering method based on Euclidean distance. Using time-series profile hierarchical clustering methods, we identified stage-specific mRNA species termed outlier transcripts that exhibit unique trending patterns as compared to most other transcripts during disease progression. This method is able to identify those outliers rather than finding patterns among the trending transcripts compared to the hierarchical clustering method based on Euclidean distance. A wet-lab experiment on a biomarker (CAM2G gene) confirmed the result of the computational model. Genes related to these outlier transcripts were found to be strongly associated with cancer, and in particular, prostate cancer. Further investigation of these outlier transcripts in prostate cancer may identify them as potential stage-specific biomarkers that can predict the progression of the disease. Breast cancer, on the other hand, is a widespread type of cancer in females and accounts for a lot of cancer cases and deaths in the world. Identifying the subtype of breast cancer plays a crucial role in selecting the best treatment. In the third contribution, we propose an optimized hierarchical classification model that is used to predict the breast cancer subtype. Suitable filter feature selection methods and new hybrid feature selection methods are utilized to find discriminative genes. Our proposed model achieves 100% accuracy for predicting the breast cancer subtypes using the same or even fewer genes. Studying breast cancer survivability among different patients who received various treatments may help understand the relationship between the survivability and treatment therapy based on gene expression. In the fourth contribution, we have built a classifier system that predicts whether a given breast cancer patient who underwent some form of treatment, which is either hormone therapy, radiotherapy, or surgery will survive beyond five years after the treatment therapy. Our classifier is a tree-based hierarchical approach that partitions breast cancer patients based on survivability classes; each node in the tree is associated with a treatment therapy and finds a predictive subset of genes that can best predict whether a given patient will survive after that particular treatment. We applied our tree-based method to a gene expression dataset that consists of 347 treated breast cancer patients and identified potential biomarker subsets with prediction accuracies ranging from 80.9% to 100%. We have further investigated the roles of many biomarkers through the literature. Studying gene expression through various time intervals of breast cancer survival may provide insights into the recovery of the patients. Discovery of gene indicators can be a crucial step in predicting survivability and handling of breast cancer patients. In the fifth contribution, we propose a hierarchical clustering method to separate dissimilar groups of genes in time-series data as outliers. These isolated outliers, genes that trend differently from other genes, can serve as potential biomarkers of breast cancer survivability. In the last contribution, we introduce a method that uses machine learning techniques to identify transcripts that correlate with prostate cancer development and progression. We have isolated transcripts that have the potential to serve as prognostic indicators and may have significant value in guiding treatment decisions. Our study also supports PTGFR, NREP, scaRNA22, DOCK9, FLVCR2, IK2F3, USP13, and CLASP1 as potential biomarkers to predict prostate cancer progression, especially between stage II and subsequent stages of the disease

    Cost-Sensitive Learning-based Methods for Imbalanced Classification Problems with Applications

    Get PDF
    Analysis and predictive modeling of massive datasets is an extremely significant problem that arises in many practical applications. The task of predictive modeling becomes even more challenging when data are imperfect or uncertain. The real data are frequently affected by outliers, uncertain labels, and uneven distribution of classes (imbalanced data). Such uncertainties create bias and make predictive modeling an even more difficult task. In the present work, we introduce a cost-sensitive learning method (CSL) to deal with the classification of imperfect data. Typically, most traditional approaches for classification demonstrate poor performance in an environment with imperfect data. We propose the use of CSL with Support Vector Machine, which is a well-known data mining algorithm. The results reveal that the proposed algorithm produces more accurate classifiers and is more robust with respect to imperfect data. Furthermore, we explore the best performance measures to tackle imperfect data along with addressing real problems in quality control and business analytics

    Recent Trends in Computational Intelligence

    Get PDF
    Traditional models struggle to cope with complexity, noise, and the existence of a changing environment, while Computational Intelligence (CI) offers solutions to complicated problems as well as reverse problems. The main feature of CI is adaptability, spanning the fields of machine learning and computational neuroscience. CI also comprises biologically-inspired technologies such as the intellect of swarm as part of evolutionary computation and encompassing wider areas such as image processing, data collection, and natural language processing. This book aims to discuss the usage of CI for optimal solving of various applications proving its wide reach and relevance. Bounding of optimization methods and data mining strategies make a strong and reliable prediction tool for handling real-life applications

    Application of Machine Learning in Predicting Ovarian Cancer Survivability

    Get PDF
    This study applies machine learning techniques for predicting ovarian cancer survivability using Cerner Health facts data. Specifically, the study uses three popular data mining techniques: Neural network- MLP, Neural network-RBF and Decision trees. Using 10-fold cross validation is used in all the techniques to minimize the overfitting of models. Based on the descriptive statistics this study finds similar patterns that are observed in studies conducted on SEER cancer data. The aggregated results indicate that balanced technique using neural network multilayer perceptron in IBM SPSS modeler performed the best with a classification accuracy of 97.71% which is better than any other model compared in the study. The second best is using unbalanced data on neural network radial basis function with a classification accuracy of 96.18%. The neural network with radial basis function comes out as the worst with a classification accuracy of 67.80% even with a balanced dataset. This signifies given a set of parameters used in the study like: admission source, race of the patient, census division and so on the neural network using multilayer perceptron will predict the outcome of survival of the patient with 97.71% accuracy. In addition to the prediction model this study also found important factors in order to have a better insight into the relative contribution of the variables to predict survivability.Industrial Engineering & Managemen

    Continuous User Authentication Using Multi-Modal Biometrics

    Get PDF
    It is commonly acknowledged that mobile devices now form an integral part of an individual’s everyday life. The modern mobile handheld devices are capable to provide a wide range of services and applications over multiple networks. With the increasing capability and accessibility, they introduce additional demands in term of security. This thesis explores the need for authentication on mobile devices and proposes a novel mechanism to improve the current techniques. The research begins with an intensive review of mobile technologies and the current security challenges that mobile devices experience to illustrate the imperative of authentication on mobile devices. The research then highlights the existing authentication mechanism and a wide range of weakness. To this end, biometric approaches are identified as an appropriate solution an opportunity for security to be maintained beyond point-of-entry. Indeed, by utilising behaviour biometric techniques, the authentication mechanism can be performed in a continuous and transparent fashion. This research investigated three behavioural biometric techniques based on SMS texting activities and messages, looking to apply these techniques as a multi-modal biometric authentication method for mobile devices. The results showed that linguistic profiling; keystroke dynamics and behaviour profiling can be used to discriminate users with overall Equal Error Rates (EER) 12.8%, 20.8% and 9.2% respectively. By using a combination of biometrics, the results showed clearly that the classification performance is better than using single biometric technique achieving EER 3.3%. Based on these findings, a novel architecture of multi-modal biometric authentication on mobile devices is proposed. The framework is able to provide a robust, continuous and transparent authentication in standalone and server-client modes regardless of mobile hardware configuration. The framework is able to continuously maintain the security status of the devices. With a high level of security status, users are permitted to access sensitive services and data. On the other hand, with the low level of security, users are required to re-authenticate before accessing sensitive service or data

    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Get PDF
    Deployed image classification pipelines are typically dependent on the images captured in real-world environments. This means that images might be affected by different sources of perturbations (e.g. sensor noise in low-light environments). The main challenge arises by the fact that image quality directly impacts the reliability and consistency of classification tasks. This challenge has, hence, attracted wide interest within the computer vision communities. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. Such an operation transforms an input image into a space that is more robust to noise before being processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. It turned out that the proposed CORF-augmented pipeline achieved comparable results on noise-free images to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise

    An intelligent framework for pre-processing ancient Thai manuscripts on palm leaves

    Get PDF
    In Thailand’s early history, prior to the availability of paper and printing technologies, palm leaves were used to record information written by hand. These ancient documents contain invaluable knowledge. By digitising the manuscripts, the content can be preserved and made widely available to the interested community via electronic media. However, the content is difficult to access or retrieve. In order to extract relevant information from the document images efficiently, each step of the process requires reduction of irrelevant data such as noise or interference on the images. The pre-processing techniques serve the purpose of extracting regions of interest, reducing noise from the image and degrading the irrelevant background. The image can then be directly and efficiently processed for feature selection and extraction prior to the subsequent phase of character recognition. It is therefore the main objective of this study to develop an efficient and intelligent image preprocessing system that could be used to extract components from ancient manuscripts for information extraction and retrieval purposes. The main contributions of this thesis are the provision and enhancement of the region of interest by using an intelligent approach for the pre-processing of ancient Thai manuscripts on palm leaves and a detailed examination of the preprocessing techniques for palm leaf manuscripts. As noise reduction and binarisation are involved in the first step of pre-processing to eliminate noise and background from image documents, it is necessary for this step to provide a good quality output; otherwise, the accuracy of the subsequent stages will be affected. In this work, an intelligent approach to eliminate background was proposed and carried out by a selection of appropriate binarisation techniques using SVM. As there could be multiple binarisation techniques of choice, another approach was proposed to eliminate the background in this study in order to generate an optimal binarised image. The proposal is an ensemble architecture based on the majority vote scheme utilising local neighbouring information around a pixel of interest. To extract text from that binarised image, line segmentation was then applied based on the partial projection method as this method provides good results with slant texts and connected components. To improve the quality of the partial projection method, an Adaptive Partial Projection (APP) method was proposed. This technique adjusts the size of a character strip automatically by adapting the width of the strip to separate the connected component of consecutive lines through divide and conquer, and analysing the upper vowels and lower vowels of the text line. Finally, character segmentation was proposed using a hierarchical segmentation technique based on a contour-tracing algorithm. Touching components identified from the previous step were then separated by a trace of the background skeletons, and a combined method of segmentation. The key datasets used in this study are images provided by the Project for Palm Leaf Preservation, Northeastern Thailand Division, and benchmark datasets from the Document Image Binarisation Contest (DIBCO) series are used to compare the results of this work against other binarisation techniques. The experimental results have shown that the proposed methods in this study provide superior performance and will be used to support subsequent processing of the Thai ancient palm leaf documents. It is expected that the contributions from this study will also benefit research work on ancient manuscripts in other languages
    • …
    corecore