7 research outputs found

    Machine learning models for predicting patients survival after liver transplantation

    In our work we have built models predicting whether a patient will lose the organ after liver transplantation within a specified time horizon. We used the observations of bilirubin and creatinine over the whole first year after transplantation to derive predictors capturing not only their static values but also their variability. Our models indeed have predictive power, which demonstrates the value of incorporating the variability of biochemical measurements and is the first contribution of our paper. The second is the selection of the best model for the defined problem. We have identified that full-complexity models, such as random forests and gradient boosting, despite having the best predictive power, lack the interpretability that is important in medicine. We have found that generalized additive models (GAMs) provide the desired interpretability, and that their predictive power is closer to that of the full-complexity models than to that of simple linear models
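    As an illustration only (not the paper's own code), the sketch below shows how variability-based predictors could be derived and how a full-complexity model might be compared with a simpler baseline. File names, column names, and the outcome label are hypothetical; the paper's GAMs would slot into the same comparison loop.

```python
# Minimal sketch, assuming hypothetical inputs: one row per lab measurement in
# first_year_labs.csv and one row per patient in outcomes.csv.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

labs = pd.read_csv("first_year_labs.csv")      # patient_id, bilirubin, creatinine (hypothetical)
outcomes = pd.read_csv("outcomes.csv")         # patient_id, graft_lost_1y (hypothetical)

# Static value (mean) and variability (standard deviation) per patient.
features = labs.groupby("patient_id")[["bilirubin", "creatinine"]].agg(["mean", "std"])
features.columns = ["_".join(col) for col in features.columns]

data = features.join(outcomes.set_index("patient_id"))
X, y = data.drop(columns="graft_lost_1y").fillna(0), data["graft_lost_1y"]

# Full-complexity model vs. a simple, interpretable baseline.
for name, model in [("random forest", RandomForestClassifier(n_estimators=500, random_state=0)),
                    ("logistic regression", LogisticRegression(max_iter=1000))]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean cross-validated AUC = {auc:.3f}")
```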

    Analysis of Microarray Data using Machine Learning Techniques on Scalable Platforms

    Microarray-based gene expression profiling has emerged as an efficient technique for the classification, diagnosis, prognosis, and treatment of cancer. Frequent changes in the behavior of this disease generate a huge volume of data, and the data retrieved from microarrays carry both veracity concerns and changes over time (velocity). Microarray data are also high-dimensional, with a very large number of features relative to the number of samples, so analyzing such datasets within a short period is essential. Only a fraction of the measured genes are significantly expressed, and identifying the precise, interesting genes responsible for cancer is imperative in microarray data analysis. Most existing schemes employ a two-phase process: feature selection/extraction followed by classification. Our investigation starts with the analysis of microarray data using kernel-based classifiers, together with feature selection based on the statistical t-test. In this work, kernel-based classifiers such as the extreme learning machine (ELM), the relevance vector machine (RVM), and a newly proposed kernel fuzzy inference system (KFIS) are implemented. The proposed models are investigated on three microarray datasets: leukemia, breast cancer, and ovarian cancer. Their performance is measured and compared with that of the support vector machine (SVM). The results reveal that the proposed models classify the datasets efficiently, with performance comparable to existing kernel-based classifiers.
    As the data size increases, handling and processing these datasets becomes a bottleneck. A distributed, scalable cluster such as Hadoop is therefore needed for storing (HDFS) and processing (MapReduce as well as Spark) the datasets efficiently. The next contribution of this thesis is the implementation of feature selection methods that process the data in a distributed manner. Statistical tests such as ANOVA, the Kruskal-Wallis test, and the Friedman test are implemented using the MapReduce and Spark frameworks and executed on top of a Hadoop cluster. The performance of these scalable models is measured and compared with that of a conventional system. The results show that the proposed scalable models efficiently process data of larger dimensions (GBs, TBs, etc.) that cannot be processed with traditional implementations of those algorithms. After selecting the relevant features, the next contribution of this thesis is a scalable implementation of the proximal support vector machine classifier, an efficient variant of the SVM. The proposed classifier is implemented on the two scalable frameworks, MapReduce and Spark, and executed on the Hadoop cluster. The results are compared with those obtained on the conventional system and show that the scalable cluster is well suited for Big Data. Furthermore, Spark proves more efficient than MapReduce owing to its intelligent handling of datasets through resilient distributed datasets (RDDs) and in-memory processing.
    The final contribution of the thesis is the implementation of various scalable classifiers based on Spark. Classifiers including logistic regression (LR), support vector machine (SVM), naive Bayes (NB), k-nearest neighbor (KNN), artificial neural network (ANN), and radial basis function network (RBFN), the last with two variants based on hybrid and gradient-descent learning algorithms, are proposed and implemented using the Spark framework. The proposed scalable models are executed on the Hadoop cluster as well as on the conventional system (where data are not distributed but kept on a standalone machine and processed in a traditional manner), and the results are investigated. The comparative analysis shows that the scalable algorithms process Big datasets far more efficiently on the Hadoop cluster than on the conventional system
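    As a flavor of the Spark-based classifiers described above (a generic sketch, not the thesis code), the example below trains a Spark ML logistic regression on a gene-expression table. The HDFS path and the column names (label, gene_*) are hypothetical.

```python
# Minimal PySpark sketch: assemble gene-expression columns into a feature
# vector, train logistic regression, and report the test AUC.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("microarray-lr").getOrCreate()

df = spark.read.csv("hdfs:///microarray/leukemia.csv", header=True, inferSchema=True)

# Every column except the label is treated as a gene-expression feature.
gene_cols = [c for c in df.columns if c != "label"]
assembled = VectorAssembler(inputCols=gene_cols, outputCol="features").transform(df)

train, test = assembled.randomSplit([0.7, 0.3], seed=42)
model = LogisticRegression(featuresCol="features", labelCol="label").fit(train)

auc = BinaryClassificationEvaluator(labelCol="label").evaluate(model.transform(test))
print(f"test AUC = {auc:.3f}")
spark.stop()
```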

    A survey on video compression fast block matching algorithms

    Video compression is the process of reducing the amount of data required to represent digital video while preserving an acceptable video quality. Recent studies on video compression have focused on multimedia transmission, videophones, teleconferencing, high-definition television, CD-ROM storage, etc. The idea of compression techniques is to remove the redundant information that exists in video sequences. Motion-compensated predictive coding is the main coding tool for removing the temporal redundancy of video sequences, and it typically accounts for 50–80% of video encoding complexity. The technique has been adopted by all existing international video coding standards. It assumes that the current frame can be locally modelled as a translation of the reference frames. The practical and widely used method for carrying out motion-compensated prediction is the block matching algorithm. In this method, video frames are divided into a set of non-overlapping macroblocks that are compared with the search area in the reference frame in order to find the best matching macroblock. This yields displacement vectors that describe the movement of the macroblocks from one location to another in the reference frame. Checking all candidate locations is called Full Search, which provides the best result but suffers from a long computational time, which necessitates improvement. Several fast block matching algorithms have been developed to reduce the computational complexity. This paper surveys two classes of techniques: lossless block matching algorithms, which reduce the computational time required to determine the matching macroblock while producing predicted frames of the same quality as Full Search, and lossy block matching algorithms, which reduce the computational complexity more aggressively but whose search results do not match the quality of Full Search
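    For reference, a minimal sketch of the exhaustive Full Search baseline that the surveyed fast algorithms try to approximate, using a sum-of-absolute-differences (SAD) cost; block size and search range are illustrative defaults, not values from the survey.

```python
# Sketch only: exhaustive (Full Search) block matching with a SAD cost.
import numpy as np

def full_search(current, reference, block_size=16, search_range=7):
    """Return one motion vector (dy, dx) per macroblock of `current`."""
    h, w = current.shape
    vectors = {}
    for by in range(0, h - block_size + 1, block_size):
        for bx in range(0, w - block_size + 1, block_size):
            block = current[by:by + block_size, bx:bx + block_size].astype(np.int32)
            best_cost, best_mv = np.inf, (0, 0)
            # Check every candidate displacement inside the search window.
            for dy in range(-search_range, search_range + 1):
                for dx in range(-search_range, search_range + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block_size > h or x + block_size > w:
                        continue
                    candidate = reference[y:y + block_size, x:x + block_size].astype(np.int32)
                    cost = np.abs(block - candidate).sum()   # SAD matching criterion
                    if cost < best_cost:
                        best_cost, best_mv = cost, (dy, dx)
            vectors[(by, bx)] = best_mv
    return vectors
```

    The nested loops over displacements are what make Full Search expensive; fast algorithms such as three-step or diamond search replace them with a small set of candidate points.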

    Data analysis and modeling for engineering and medical applications

    Master's thesis (Master of Engineering)

    TREE-BASED SURVIVAL MODELS AND PRECISION MEDICINE

    Random forests have become one of the most popular machine learning tools in recent years. The main advantage of tree- and forest-based models is their nonparametric nature. My dissertation mainly focuses on a particular type of tree and forest model in which the outcomes are right-censored survival data. Censored survival data are frequently seen in biomedical studies when the true clinical outcome may not be directly observable due to early dropout or other reasons. We first carry out a comprehensive analysis of survival random forest and tree models and show the consistency of these popular machine learning models by developing a general theoretical framework. Our results significantly improve the current understanding of such models, and this is the first consistency result for tree- and forest-based regression estimators with censored outcomes under high-dimensional settings. In particular, the consistency results are derived by analyzing the splitting rules and establishing an adaptive concentration bound on the variance component, which may also shed light on the theoretical analysis of other random forest models. In the second part, motivated by tree-based survival models, we propose a fiducial approach that provides pointwise and curvewise confidence intervals for the survival functions. On each terminal node, the estimation is essentially a small-sample and possibly heavily censored problem, and most asymptotic methods for estimating confidence intervals have coverage problems in such scenarios. The proposed fiducial pointwise confidence intervals maintain coverage in these situations, and their average length is often shorter than that of competing methods that maintain coverage. In the third topic, we show one application of tree-based survival models in precision medicine. We extend outcome weighted learning to right-censored survival data without requiring either inverse probability of censoring weighting or semiparametric modeling of the censoring and failure times. To accomplish this, we take advantage of the tree-based approach to nonparametrically impute the survival time in two different ways. We also illustrate the proposed method on a phase III clinical trial of non-small cell lung cancer.
    Doctor of Philosophy
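    As a small illustration of the kind of model studied here (not the dissertation's own implementation), the sketch below fits a random survival forest to right-censored data using the third-party scikit-survival package, with synthetic covariates, event times, and censoring times standing in for a real study.

```python
# Sketch only: random survival forest on synthetic right-censored data.
import numpy as np
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                              # covariates
event_time = rng.exponential(scale=np.exp(X[:, 0]))        # latent event times
censor_time = rng.exponential(scale=2.0, size=200)         # censoring times
observed = np.minimum(event_time, censor_time)
event = event_time <= censor_time                          # True = event observed

y = Surv.from_arrays(event=event, time=observed)           # structured survival outcome

rsf = RandomSurvivalForest(n_estimators=200, min_samples_leaf=10, random_state=0)
rsf.fit(X, y)

# Concordance index on the training data and a predicted survival curve.
print("concordance index:", rsf.score(X, y))
surv_fn = rsf.predict_survival_function(X[:1])[0]
print("S(t) at the first few event times:", surv_fn(surv_fn.x[:5]))
```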