27 research outputs found

    Kernel-imbedded Gaussian processes for disease classification using microarray gene expression data

    BACKGROUND: Designing appropriate machine learning methods for identifying genes that have a significant discriminating power for disease outcomes has become increasingly important for our understanding of diseases at the genomic level. Although many machine learning methods have been developed and applied to the area of microarray gene expression data analysis, the majority of them are based on linear models, which are not necessarily appropriate for the underlying connection between the target disease and its associated explanatory genes. Linear-model-based methods also tend to introduce false-positive significant features more easily. Furthermore, linear-model-based algorithms often involve calculating the inverse of a matrix that is possibly singular when the number of potentially important genes is relatively large. This leads to problems of numerical instability. To overcome these limitations, a few non-linear methods have recently been introduced to the area. Many of the existing non-linear methods have two critical problems, model selection and model parameter tuning, that remain unsolved or even unaddressed. In general, a unified framework that allows model parameters of both linear and non-linear models to be easily tuned is preferred in real-world applications. Kernel-induced learning methods form a class of approaches that show promising potential for achieving this goal. RESULTS: A hierarchical statistical model named kernel-imbedded Gaussian process (KIGP) is developed under a unified Bayesian framework for binary disease classification problems using microarray gene expression data. In particular, based on a probit regression setting, an adaptive algorithm with a cascading structure is designed to find the appropriate kernel, to discover the potentially significant genes, and to make the optimal class prediction accordingly. A Gibbs sampler is built as the core of the algorithm to make Bayesian inferences. 
Simulation studies showed that, even without any knowledge of the underlying generative model, the KIGP performed very close to the theoretical Bayesian bound, not only in the case with a linear Bayesian classifier but also in the case with a highly non-linear Bayesian classifier. This sheds light on its broader applicability to microarray data analysis problems, especially those for which linear methods work poorly. The KIGP was also applied to four published microarray datasets, and the results showed that the KIGP performed better than or at least as well as any of the referred state-of-the-art methods in all of these cases. CONCLUSION: Mathematically built on the kernel-induced feature space concept under a Bayesian framework, the KIGP method presented in this paper provides a unified machine learning approach to explore both the linear and the possibly non-linear underlying relationship between the target features of a given binary disease classification problem and the related explanatory gene expression data. More importantly, it incorporates the model parameter tuning into the framework. The model selection problem is addressed in the form of selecting a proper kernel type. The KIGP method also gives Bayesian probabilistic predictions for disease classification. These properties and features are beneficial to most real-world applications. The algorithm is naturally robust in numerical computation. The simulation studies and the published data studies demonstrated that the proposed KIGP performs satisfactorily and consistently

    Gene Expression Analysis Methods on Microarray Data: A Review

    In recent years a new type of experiment has been changing the way that biologists and other specialists analyze many problems. These are called high-throughput experiments, and the main difference from those performed some years ago is the quantity of data obtained from them. Thanks to the technology known generically as microarrays, it is now possible to study in a single experiment the behavior of all the genes of an organism under different conditions. The data generated by these experiments may consist of thousands to millions of variables, and they pose many challenges to the scientists who have to analyze them. Many of these challenges are of a statistical nature and will be the center of this review. There are many types of microarrays which have been developed to answer different biological questions, and some of them will be explained later. For the sake of simplicity we start with the best-known ones: expression microarrays

    Multivariate Analysis of Tumour Gene Expression Profiles Applying Regularisation and Bayesian Variable Selection Techniques

    High-throughput microarray technology is here to stay, e.g. in oncology for tumour classification and gene expression profiling to predict cancer pathology and clinical outcome. The global objective of this thesis is to investigate multivariate methods that are suitable for this task. After introducing the problem and the biological background, an overview of multivariate regularisation methods is given in Chapter 3 and the binary classification problem is outlined (Chapter 4). The focus of applications presented in Chapters 5 to 7 is on sparse binary classifiers that are both parsimonious and interpretable. Particular emphasis is on sparse penalised likelihood and Bayesian variable selection models, all in the context of logistic regression. The thesis concludes with a final discussion chapter. The variable selection problem is particularly challenging here, since the number of variables is much larger than the sample size, which results in an ill-conditioned problem with many equally good solutions. Thus, one open problem is the stability of gene expression profiles. In a resampling study, various characteristics including stability are compared between a variety of classifiers applied to five gene expression data sets and validated on two independent data sets. Bayesian variable selection provides an alternative to resampling for estimating the uncertainty in the selection of genes. MCMC methods are used for model space exploration, but because of the high dimensionality standard algorithms are computationally expensive and/or result in poor Markov chain mixing. A novel MCMC algorithm is presented that uses the dependence structure between input variables for finding blocks of variables to be updated together. This drastically improves mixing while keeping the computational burden acceptable. Several algorithms are compared in a simulation study. 
In an ovarian cancer application in Chapter 7, the best-performing MCMC algorithms are combined with parallel tempering and compared with an alternative method

    Principal Component Analysis

    This book is aimed at raising awareness among researchers, scientists and engineers of the benefits of Principal Component Analysis (PCA) in data analysis. In this book, the reader will find applications of PCA in fields such as taxonomy, biology, pharmacy, finance, agriculture, ecology, health and architecture

    Personality Identification from Social Media Using Deep Learning: A Review

    Social media helps in sharing ideas and information among people scattered around the world and thus helps in creating communities, groups, and virtual networks. Identification of personality is significant in many types of applications, such as detecting the mental state or character of a person, predicting job satisfaction and professional and personal relationship success, and in recommendation systems. Personality is also an important factor in determining individual variation in thoughts, feelings, and conduct systems. According to the survey of Global social media research in 2018, there are approximately 3.196 billion social media users worldwide. The numbers are estimated to grow rapidly with the use of mobile smart devices and advancement in technology. Support vector machine (SVM), Naive Bayes (NB), multilayer perceptron neural network, and convolutional neural network (CNN) are some of the machine learning techniques used for personality identification in the literature. This paper presents various studies conducted in identifying the personality of social media users with the help of machine learning approaches, and reviews recent studies that aimed to predict the personality of online social media (OSM) users

    Development of a static bioactive stent prototype and dynamic aneurysm-on-a-chip(TM) model for the treatment of aneurysms

    Aneurysms are pockets of blood that collect outside blood vessel walls, forming dilatations and leaving arterial walls very prone to rupture. Current treatments include: (1) clipping, and (2) coil embolization, including stent-assisted coiling. While these procedures can be effective, it would be advantageous to design a biologically active stent, modified with magnetic stent coatings, allowing cells to be manipulated to heal the arterial lining. Further, velocity, pressure, and wall shear stresses contribute to aneurysmal growth, but the shear force mechanisms affecting wound closure remain elusive. Due to these factors, there is a definite need to develop a new stent device that will aid in healing an aneurysm in situ. To this end, a static bioactive stent device was synthesized. Additionally, to study aneurysm pathogenesis, a lab-on-a-chip device (a dynamic stent device) is the key to discovering the underlying mechanisms of these lesions. A first step towards a true bioactive stent involves the study of cells that can be tested against the biomaterials that constitute the stent itself. The second step is to test particles/cells in a microfluidic environment. Therefore, biocompatibility data were collected against PDMS, bacterial nanocellulose (BNC), and magnetic bacterial nanocellulose (MBNC). Preliminary static bioactive stents were synthesized whereby BNC was grown to cover standard nitinol stents. In an offshoot of the original research, a two-dimensional microfluidic model, the Aneurysm-on-a-Chip(TM) (AOC), was the logical answer to study particle flow within an aneurysm sac - this was the dynamic bioactive stent device. The AOC apparatus can track particles/cells when it is coupled to a particle image velocimetry (PIV) software package. The AOC fluid flow was visualized using standard microscopy techniques with commercial microparticles/cells. Movies were taken during fluid flow experiments and PIV was utilized to monito

    Diagnostic Neuropathology of Brain Tumours using Biophotonics and Spectrometry

    Classification of tumours such as gliomas, which are on a continuous spectrum of histology and malignancy, into distinct categories is still a challenge using histopathology. There have been significant advances in the techniques used to fight cancer in the past two decades. A number of studies have looked at different approaches to improve the accuracy of diagnosis using histopathology. This study evaluated a number of techniques to complement histopathology. One study applied vibrational spectroscopy, namely Raman and attenuated total reflection-Fourier transform infrared (ATR-FTIR) spectroscopy, to brain tumour cell lines. This study investigated the potential application of vibrational spectroscopy in the segregation of different types of brain tumours using two tumour cell lines, U87MG and 1321N1, and a control, SVGP12. Another study looked at two approaches, elemental profiling of both tissue and serum using inductively coupled plasma-mass spectrometry (ICP-MS). An increase or deficiency in trace elements has been linked to cancer development and progression. The final study looked at the diagnostic application of Raman spectroscopy to distinguish gliomas, meningiomas, medulloblastoma and several other brain tumours from histologically normal brain tissue from brain tumour patients used as controls. The three cell lines U87MG, SVGP12 and 1321N1 were cultured and grown on calcium fluoride slides in triplicate. Spectra from each cell line were taken using both Raman and ATR-FTIR. The spectra were then analysed using multivariate statistics. In the elemental profiling study, serum and tissue samples from 55 patients with brain tumours were collected and analysed using ICP-MS. The elemental data were then evaluated using multivariate statistics to investigate significant differences. 
In the analysis of human brain tumours, tissue blocks of both tumour and histologically normal brain that were formalin fixed and paraffin embedded (FFPE) were processed and mounted on low-E slides, dewaxed using xylene, and washed with alcohol and water before storage at room temperature until analysis. Raman and ATR-FTIR were able to separate U87MG, SVGP12 and 1321N1 with very high classification accuracy. All the brain tumour groups investigated showed a deficiency of Mg, Fe, Cu, and Zn concentrations against reported levels from healthy individuals in the literature. Raman spectroscopy coupled with multivariate statistics was able to distinguish between normal brain tissue and normal brain tumour tissue used as controls. Classification of gliomas based on the degree of malignancy was also apparent with very high classification accuracy. Spectral panels were developed that can be used as biomarkers in the diagnosis of brain tumours. Raman and infrared spectroscopy are types of vibrational spectroscopy which have the potential to be used as diagnostic tools in neuropathology. They provide an intrinsic molecular fingerprint of the sample based on the interaction of light. The panels can accurately identify and classify specific brain tumours, alleviating the need to use complex statistical models. Raman and ATR-FTIR were able to elucidate chemical information from the samples, which was used to differentiate the three cell lines with very high classification accuracy. Diagnosis of a brain tumour is not always a straightforward process, and the current techniques lack the desired level of precision in diagnosis and cytoreductive surgery