27 research outputs found

Towards large scale continuous EDA: a random matrix theory perspective

Author: Bootkrajang Jakramate
Durrant Robert J.
Kabán Ata
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2013

Estimation of distribution algorithms (EDA) are a major branch of evolutionary algorithms (EA) with some unique advantages in principle. They are able to take advantage of correlation structure to drive the search more efficiently, and they are able to provide insights about the structure of the search space. However, model building in high dimensions is extremely challenging and as a result existing EDAs lose their strengths in large scale problems. Large scale continuous global optimisation is key to many real world problems of modern days. Scaling up EAs to large scale problems has become one of the biggest challenges of the field. This paper pins down some fundamental roots of the problem and makes a start at developing a new and generic framework to yield effective EDA-type algorithms for large scale continuous global optimisation problems. Our concept is to introduce an ensemble of random projections of the set of fittest search points to low dimensions as a basis for developing a new and generic divide-and-conquer methodology. This is rooted in the theory of random projections developed in theoretical computer science, and will exploit recent advances of non-asymptotic random matrix theory
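The projection step mentioned in the abstract can be pictured with a minimal sketch. This is not the authors' algorithm: the function names, the Gaussian N(0, 1/k) entries, and the assumption that lower fitness is better are all illustrative choices. It only shows how the fittest search points might be mapped to k dimensions by several independent random matrices.

```python
import random


def gaussian_matrix(k, d, rng):
    # Hypothetical helper: k x d matrix with i.i.d. N(0, 1/k) entries,
    # a scaling that preserves squared norms in expectation.
    sigma = (1.0 / k) ** 0.5
    return [[rng.gauss(0.0, sigma) for _ in range(d)] for _ in range(k)]


def project(matrix, point):
    # Matrix-vector product: maps a d-dimensional point to k dimensions.
    return [sum(r * x for r, x in zip(row, point)) for row in matrix]


def project_fittest(population, fitness, n_fittest, k, n_projections, seed=0):
    # Assumption: minimisation, so the fittest points have the lowest fitness.
    rng = random.Random(seed)
    fittest = sorted(population, key=fitness)[:n_fittest]
    d = len(fittest[0])
    ensemble = []
    for _ in range(n_projections):
        matrix = gaussian_matrix(k, d, rng)
        ensemble.append([project(matrix, p) for p in fittest])
    return ensemble
```

Calling `project_fittest(population, fitness, n_fittest=10, k=3, n_projections=4)` returns four lists, each holding the ten selected points in three dimensions. A low-dimensional search distribution could then be estimated from each list, which is where the divide-and-conquer idea would come in.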

CiteSeerX

Crossref

Research Commons@Waikato

Robust adaptive Lasso in high-dimensional logistic regression with an application to genomic classification of cancer patients

Author: Basu A.
Ghosh A.
Jaenada Malagón María
Pardo Llorente Leandro
Publication date: 01/01/2021

Penalized logistic regression is extremely useful for binary classification with a large number of covariates (significantly higher than the sample size), and has several real-life applications, including genomic disease classification. However, the existing methods based on the likelihood-based loss function are sensitive to data contamination and other noise; hence, robust methods are needed for stable and more accurate inference. In this paper, we propose a family of robust estimators for sparse logistic models utilizing the popular density power divergence based loss function and the general adaptively weighted LASSO penalties. We study the local robustness of the proposed estimators through their influence function and also derive their oracle properties and asymptotic distribution. With extensive empirical illustrations, we clearly demonstrate the significantly improved performance of our proposed estimators over the existing ones, with particular gain in robustness. Our proposal is finally applied to analyse four different real datasets for cancer classification, obtaining robust and accurate models that simultaneously perform gene selection and patient classification.

Docta Complutense

Learning kernel logistic regression in the presence of class label noise

Author: Bootkrajang Jakramate
Kabán Ata
Publication venue: Elsevier BV
Publication date: 01/11/2014

Crossref

University of Birmingham Research Portal

Review of processing and analysis methods for DNA methylation array data

Author: Brown Robert
Christensen Brock C.
Flanagan James M.
Houseman E. Andres
Karagas Margaret R.
Kelsey Karl T.
Koestler Devin C.
Marsit Carmen J.
Wilhelm-Benartzi Charlotte S.
Publication venue: Nature Publishing Group

The promise of epigenome-wide association studies (EWAS) and cancer-specific somatic changes in improving our understanding of cancer, coupled with the decreasing cost and increasing coverage of DNA methylation microarrays, has brought about a surge in the use of these technologies. Here, we aim both to review issues encountered in the processing and analysis of array-based DNA methylation data and to summarize the advantages of recent approaches proposed for handling those issues, focusing on approaches publicly available in open-source environments such as R and Bioconductor. We hope that the processing tools and analysis flowchart described will help researchers use these powerful DNA methylation array-based platforms effectively, thereby advancing our understanding of human health and disease. Keywords: Processing, Microarray, Analysis, DNA methylation, Bioconductor and R package

ScholarsArchive@OSU

Computational investigation of systemic pathway responses in severe pneumonia among the Gambian children and infants

Author: Jafali James
Publication venue: The University of Edinburgh
Publication date: 29/06/2019

Pneumonia remains the leading cause of infectious mortality in under-five children, and the burden is highest in sub-Saharan Africa. To mitigate this burden, further knowledge is required to accelerate the development of innovative and cost-effective approaches. To gain a deeper insight into the pathogenesis of pneumonia, I investigated the central hypothesis that systemic pathway (cellular and molecular) responses underpin the development of severe pneumonia outcomes. Mainly, I compared whole blood transcriptomes between severe pneumonia cases (clinically stratified as mild, severe and very severe) and non-pneumonia community controls (prospectively matched by age and sex). In total, 803 whole blood RNA samples were collected from Gambian children (aged 2-59 months) between 2007 and 2010, of which, 518 passed laboratory quality control criteria for the microarray analysis. After data cleaning, the final database reduced to 503 samples including the training (n=345) and independent validation (n=158) data sets. To investigate the cellular responses, I applied computational deconvolution analysis to assess the variations of immune cell type proportions with pneumonia severity. To further enhance the computational performance, I applied a data fusion approach on 3,475 immune marker genes from different resources to derive an optimal and integrated blood marker list (IBML, m=277) for Neutrophils, Monocytes, NK, Dendritic, B and T cell types; which robustly performed better than the existing individual resources. Using the IBML resource, pneumonia severity was significantly associated with the depletion of B, T, Dendritic and NK cell types, and the elevation of Monocytes and neutrophil proportions (P-value<0.001). At the molecular level, pneumonia severity was associated (false discovery rate<0.05) with a battery of systemic pathway (innate, adaptive and metabolic) responses in a range of biomedical databases. 
While the up-regulation of inflammatory innate responses was also observed in mild cases, severe pneumonia cases were predominantly associated with the co-inhibition of the cells of the adaptive immune response (B and T) and Natural killer cells, and the up-regulation of fatty acid and lipid metabolism. While most of these findings were anticipated, the involvement of NK cells was unexpected, and potentially presents a novel immune-modulation target for mitigating the burden of pneumonia. Together, the cellular and molecular pathways responses consistently support the central hypothesis that systemic pathway responses contribute significantly to the development of severe pneumonia outcomes. Clinically, the identification and appropriate treatment of patients at the higher risk of developing severe pneumonia outcomes remains the major challenge. To address that, I applied supervised machine-learning approaches on cellular pathway based transcriptomic features; and derived a 33-gene classifier (representing the NK, T, and neutrophils cell types), which accurately detected severe pneumonia cases in both the training (leave-one-out cross-validated accuracy=99%) and independent validation (accuracy=98%) datasets. Independently, similar performance (98% in each dataset) was associated with a subset (m=18) of the validated 52-gene neonatal sepsis classifier. Conversely, at least 75% of the cellular biomarkers were differentially expressed (false discovery rate<0.05) in bacterial neonatal sepsis. Further, very severe pneumonia cases were predominantly associated with antibacterial responses; and mild pneumonia cases with blood-culture-confirmed positivity were also associated with an increased frequency of differentially expressed genes. These findings suggest the significant contribution of bacterial septicaemia in the development of serious pneumonia outcomes. 
Together, this study highlights the future potential of host-derived systemic biomarkers for early identification and novel treatment modalities of high-risk cases presenting at a resource-constrained clinic with mild pneumonia. However, further validation studies are required

Edinburgh Research Archive

Enhanced label noise filtering with multiple voting

Author: Fahim Muhammad
Guan Donghai
Hussain Maqbool
Khan Wajahat Ali
Khattak Asad Masood
Yuan Weiwei
Publication venue: ZU Scholars
Publication date: 01/12/2019

Label noise exists in many applications, and its presence can degrade learning performance. Researchers usually use filters to identify and eliminate mislabeled instances prior to training. The ensemble learning based filter (EnFilter) is the most widely used filter. According to the voting mechanism, EnFilter is mainly divided into two types: single-voting based (SVFilter) and multiple-voting based (MVFilter). In general, MVFilter is more often preferred because multiple voting can address the intrinsic limitations of single voting. However, the most important unsolved issue in MVFilter is how to determine the optimal decision point (ODP). Conceptually, the decision point is a threshold value, which determines the noise detection performance. To maximize the performance of MVFilter, we propose a novel approach to compute the optimal decision point. Our approach is data driven and cost sensitive: it determines the ODP based on the given noisy training dataset and the noise misrecognition cost matrix. The core idea of our approach is to estimate the probability distributions of the mislabeled data, from which the expected cost of each possible decision point can be inferred. Experimental results on a set of benchmark datasets illustrate the utility of our proposed approach
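The idea of picking a decision point by expected cost can be illustrated with a small sketch. It is not the paper's method: the per-vote-count probabilities in `p_mislabeled` are assumed to have been estimated elsewhere, and the two cost parameters are a hypothetical stand-in for the noise misrecognition cost matrix.

```python
def expected_cost(votes, p_mislabeled, threshold,
                  cost_remove_clean=1.0, cost_keep_noisy=1.0):
    # votes[i]: number of voting rounds that flagged instance i as noisy.
    # p_mislabeled[v]: assumed probability that an instance with v votes
    # is truly mislabeled (hypothetical input, estimated elsewhere).
    total = 0.0
    for v in votes:
        p = p_mislabeled[v]
        if v >= threshold:
            # Instance is removed: costly only if it was actually clean.
            total += (1.0 - p) * cost_remove_clean
        else:
            # Instance is kept: costly only if it was actually mislabeled.
            total += p * cost_keep_noisy
    return total


def best_decision_point(votes, p_mislabeled, n_rounds,
                        cost_remove_clean=1.0, cost_keep_noisy=1.0):
    # Try every threshold from 1 to n_rounds and keep the cheapest one;
    # ties go to the smallest threshold.
    return min(
        range(1, n_rounds + 1),
        key=lambda t: expected_cost(votes, p_mislabeled, t,
                                    cost_remove_clean, cost_keep_noisy),
    )
```

Raising `cost_remove_clean` relative to `cost_keep_noisy` pushes the chosen threshold upward, so fewer instances are filtered out. That is one way to read the "cost sensitive" part of the abstract.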

ZU Scholars (Zayed University)

Mathematical and statistical methods for single cell data

Author: Thomson William
Publication date: 01/12/2020

The availability of single-cell data has increased rapidly in recent years and presents interesting new challenges in the analysis of such data and the modelling of the processes that generate it. In this thesis, we attempt to deal with some of those challenges by developing and exploring mathematical and statistical models for the evolution of population distributions over time, and methods for using aggregated single-cell data from individual patients in predictive diagnostic models of disease. In the first part of the thesis, we explore structured population models – a class of partial differential equations for describing the evolution of individual-level cell properties in a population over time. We begin by analysing an age-structured model of cell growth in which rates of proliferation and cell death are controlled by an external resource. We follow this with a method for extracting properties of a more general class of structured population models directly from single-cell data. In the final part of the thesis, we develop a flexible Bayesian statistical framework for building predictive models from possibly high-dimensional data collected from patients using single-cell technologies and find that the performance is promising compared to a number of existing methods

University of Birmingham Research Archive, E-theses Repository

Analysis of breast tissue microarray spots

Author: Amaral Telmo
Publication date: 01/01/2010

University of Dundee Online Publications