839 research outputs found

Hadoop performance modeling and job optimization for big data analytics

Author: Khan Mukhtaj
Publication venue: Brunel University London
Publication date: 01/01/2015

This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. Big data has gained momentum in both academia and industry. The MapReduce model has emerged as a major computing model in support of big data analytics. Hadoop, an open source implementation of the MapReduce model, has been widely taken up by the community. Cloud service providers such as Amazon EC2 now support Hadoop user applications. However, a key challenge is that cloud service providers do not have a resource provisioning mechanism to satisfy user jobs with deadline requirements. Currently, it is solely the user's responsibility to estimate the required amount of resources for a job running in a public cloud. This thesis presents a Hadoop performance model that accurately estimates the execution duration of a job and further provisions the required amount of resources for the job to be completed within a deadline. The proposed model employs a Locally Weighted Linear Regression (LWLR) model to estimate the execution time of a job and the Lagrange multiplier technique for resource provisioning to satisfy a user job with a given deadline. The performance of the proposed model is extensively evaluated on both an in-house Hadoop cluster and the Amazon EC2 cloud. Experimental results show that the proposed model is highly accurate in job execution estimation and that jobs are completed within the required deadlines when following the resource provisioning scheme of the proposed model. In addition, the Hadoop framework has over 190 configuration parameters, and some of them have significant effects on the performance of a Hadoop job. Manually setting optimum values for these parameters is a challenging and time-consuming task. This thesis presents optimization work that enhances the performance of Hadoop by automatically tuning its parameter values.
It employs the Gene Expression Programming (GEP) technique to build an objective function that represents the performance of a job and the correlation among the configuration parameters. For optimization, Particle Swarm Optimization (PSO) is employed to automatically find optimal or near-optimal configuration settings. The performance of the proposed work is intensively evaluated on a Hadoop cluster, and the experimental results show that the proposed work enhances the performance of Hadoop significantly compared with the default settings.

Abdul Wali Khan University Marda

Brunel University Research Archive

Penalized Orthogonal Iteration for Sparse Estimation of Generalized Eigenvalue Problem

Author: Anant Agrawal (3953690)
Andrea Lozzi (3953780)
Cristin G. Welle (3953777)
Daniel X. Hammer (3952427)
Erkinay Abliz (3953774)
Noah Greenbaum (3953768)
Victor Krauthamer (3953771)
Publication date: 27/06/2018

We propose a new algorithm for sparse estimation of eigenvectors in generalized eigenvalue problems (GEP). The GEP arises in a number of modern data-analytic situations and statistical methods, including principal component analysis (PCA), multiclass linear discriminant analysis (LDA), canonical correlation analysis (CCA), sufficient dimension reduction (SDR) and invariant co-ordinate selection. We propose to modify the standard generalized orthogonal iteration with a sparsity-inducing penalty for the eigenvectors. To achieve this goal, we generalize the equation-solving step of orthogonal iteration to a penalized convex optimization problem. The resulting algorithm, called penalized orthogonal iteration, provides accurate estimation of the true eigenspace, when it is sparse. Also proposed is a computationally more efficient alternative, which works well for PCA and LDA problems. Numerical studies reveal that the proposed algorithms are competitive, and that our tuning procedure works well. We demonstrate applications of the proposed algorithm to obtain sparse estimates for PCA, multiclass LDA, CCA and SDR. Supplementary materials are available online

arXiv.org e-Print Archive

FigShare

Schema theory based data engineering in gene expression programming for big data analytics

Author: Chousidis Christos
Huang Zhengwen
Jiang Changjun
Li Maozhen
Mousavi Ali
Publication date: 12/12/2017

Gene expression programming (GEP) is a data-driven evolutionary technique that is well suited to correlation mining. Parallel GEPs have been proposed to speed up the evolution process using a cluster of computers or a computer with multiple CPU cores. However, the generation structure of chromosomes and the size of the input data are two issues that tend to be neglected when speeding up GEP evolution. To fill this research gap, this paper proposes three guiding principles that elaborate the computational nature of GEP evolution, based on an analysis of GEP schema theory. As a result, a novel data engineered GEP is developed which closely follows the generation structure of chromosomes in parallelization and considers the input data size in segmentation. Experimental results on two data sets with complementary features show that the data engineered GEP speeds up the evolution process significantly without loss of accuracy in data correlation mining. Based on the experimental tests, a computation model of the data engineered GEP is further developed to demonstrate its high scalability in dealing with potential big data using a large number of CPU cores.

ZENODO

UWL Repository

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Brunel University Research Archive

Computational models and approaches for lung cancer diagnosis

Author: Azzawi Hasseeb
Publication venue: Deakin University, Faculty of Science, Engineering and Built Environment, School of Information Technology
Publication date: 01/10/2019

The success of treatment of patients with cancer depends on establishing an accurate diagnosis. To this end, the aim of this study is to develop novel lung cancer diagnostic models. New algorithms are proposed to analyse the biological data and extract knowledge that assists in achieving accurate diagnosis results.

Deakin Research Online

IP-Enabled C/C++ Based High Level Synthesis: A Step towards Better Designer Productivity and Design Performance

Author: Sharad Sinha
Thambipillai Srikanthan
Publication venue: Hindawi Limited
Publication date: 01/01/2014

Intellectual property (IP) core based design is an emerging design methodology for dealing with increasing chip design complexity. C/C++ based high level synthesis (HLS) is also gaining traction as a design methodology for the same purpose. In this work, we present a design methodology that combines these two individual methodologies and is therefore more powerful. We discuss our proposed methodology in the context of supporting efficient hardware synthesis of a class of mathematical functions without altering the original C/C++ source code. We also discuss and propose methods to integrate legacy IP cores into existing HLS flows. Relying on concepts from the domains of program recognition and optimized low level implementations of such arithmetic functions, the described design methodology is a step towards intelligent synthesis, where application characteristics are matched with specific architectural resources and relevant IP cores in a transparent manner for improved area-delay results. The combined methodology is more aware of the target hardware architecture than the conventional HLS flow. Implementation results for certain compute kernels from the commercial tool Vivado-HLS and from the proposed flow are compared to show that the proposed flow gives better results.

Crossref

Directory of Open Access Journals

DR-NTU (Digital Repository of NTU)

Semiautomatic epicardial fat segmentation based on fuzzy c-means clustering and geometric ellipse fitting

Author: Babin Danilo
Janev Marko
Jovanov Ljubomir
Krstanović Lidija
Obradović Ratko
Popović Branislav
Ralević Nebojsa M.
Velicki Lazar
Zlokolica Vladimir
Publication venue: Hindawi Limited
Publication date: 01/01/2017

Automatic segmentation of particular heart parts plays an important role in recognition tasks, which are utilized for diagnosis and treatment. One particularly important application is segmentation of epicardial fat (which surrounds the heart), which various studies have shown to indicate the risk level for developing various cardiovascular diseases as well as to predict the progression of certain diseases. Quantification of epicardial fat from CT images requires advanced image segmentation methods. The problem with state-of-the-art methods for epicardial fat segmentation is their high dependency on user interaction, resulting in low reproducibility of studies and time-consuming analysis. In this paper, we propose a novel semiautomatic approach for segmentation and quantification of epicardial fat from 3D CT images. Our method is a semisupervised slice-by-slice segmentation approach based on local adaptive morphology and fuzzy c-means clustering. Additionally, we use a geometric ellipse prior to filter out undesired parts of the target cluster. The validation of the proposed methodology shows good correspondence between the segmentation results and the manual segmentation performed by physicians.

Ghent University Academic Bibliography

Directory of Open Access Journals