Hadoop performance modeling and job optimization for big data analytics
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. Big data has gained momentum in both academia and industry. The MapReduce model has emerged as a major computing model in support of big data analytics. Hadoop, an open source implementation of the MapReduce model, has been widely adopted by the community. Cloud service providers such as Amazon EC2 now support Hadoop user applications. However, a key challenge is that cloud service providers do not have a resource provisioning mechanism to satisfy user jobs with deadline requirements. Currently, it is solely the user's responsibility to estimate the required amount of resources for a job running in a public cloud. This thesis presents a Hadoop performance model that accurately estimates the execution duration of a job and further provisions the required amount of resources for the job to be completed within a deadline. The proposed model employs a Locally Weighted Linear Regression (LWLR) model to estimate the execution time of a job and the Lagrange multiplier technique for resource provisioning to satisfy a user job with a given deadline. The performance of the proposed model is extensively evaluated on both an in-house Hadoop cluster and the Amazon EC2 cloud. Experimental results show that the proposed model is highly accurate in job execution estimation and that jobs are completed within the required deadlines when following the resource provisioning scheme of the proposed model. In addition, the Hadoop framework has over 190 configuration parameters, and some of them have significant effects on the performance of a Hadoop job. Manually setting the optimum values for these parameters is a challenging and time-consuming task. This thesis presents optimization work that enhances the performance of Hadoop by automatically tuning its parameter values.
It employs the Gene Expression Programming (GEP) technique to build an objective function that represents the performance of a job and the correlation among the configuration parameters. For optimization, Particle Swarm Optimization (PSO) is employed to automatically find optimal or near-optimal configuration settings. The performance of the proposed work is intensively evaluated on a Hadoop cluster, and the experimental results show that it enhances the performance of Hadoop significantly compared with the default settings. Abdul Wali Khan University Marda
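The thesis abstract above names Locally Weighted Linear Regression (LWLR) as its execution-time estimator. The following is a minimal single-feature sketch of LWLR in general, not the thesis's model: the Gaussian kernel, the bandwidth `tau`, the function name `lwlr_predict`, and the sample history of input sizes and durations are all assumptions made for illustration.

```python
import math

def lwlr_predict(x_query, xs, ys, tau=1.0):
    """Predict y at x_query with locally weighted linear regression (one feature)."""
    # Gaussian kernel: training points close to the query get larger weights.
    ws = [math.exp(-((x - x_query) ** 2) / (2.0 * tau ** 2)) for x in xs]
    sw = sum(ws)
    # Weighted means of the feature and the target.
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    # Closed-form weighted least-squares slope and intercept.
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return intercept + slope * x_query

# Hypothetical job history: input size in GB -> measured duration in seconds.
sizes = [1.0, 2.0, 4.0, 8.0, 16.0]
durations = [35.0, 60.0, 110.0, 210.0, 410.0]

# Estimate the duration of a 6 GB job from nearby historical runs.
estimate = lwlr_predict(6.0, sizes, durations, tau=4.0)
print(round(estimate, 3))
```

Because a separate line is fitted around each query point, the estimate can follow non-linear trends in the job history. A smaller `tau` makes the fit more local, at the cost of more sensitivity to noise.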
A MapReduce Based Distributed LSI for Scalable Information Retrieval
Latent Semantic Indexing (LSI) has been widely used in information retrieval due to its efficiency in solving the problems of polysemy and synonymy. However, LSI is a computationally intensive process because of the computational complexity of the singular value decomposition and filtering operations involved. This paper presents MR-LSI, a MapReduce-based distributed LSI algorithm for scalable information retrieval. The performance of MR-LSI is first evaluated in a small-scale experimental cluster environment, and subsequently in large-scale simulation environments. By partitioning the dataset into smaller subsets and optimizing the partitioned subsets across a cluster of computing nodes, the overhead of the MR-LSI algorithm is reduced significantly while maintaining a high level of accuracy in retrieving documents of user interest. A genetic-algorithm-based load balancing scheme is designed to optimize the performance of MR-LSI in heterogeneous computing environments in which the computing nodes have varied resources.
Making the Sourcing Decision of Software Maintenance and Information Technology
Outsourcing has grown significantly over the last few years. Organizations tend to outsource Information Technology (IT) primarily to take advantage of the availability of a qualified, trained and skilled workforce in low-cost countries across the globe. Outsourcing of IT and software maintenance seems very promising, but a number of factors, risks, and challenges associated with the outsourcing process make the sourcing decision very complicated. The present study aims to gain an in-depth understanding of three aspects of outsourcing, namely the perceived benefits of IT outsourcing, the influencing factors of IT outsourcing, and software maintenance offshoring. The findings of the current study will lead us to develop a sourcing framework for outsourcing decisions as well as a decision support system for software maintenance. A systematic literature review is performed that presents the perceived benefits of IT outsourcing and the influencing factors of IT outsourcing and software maintenance. Furthermore, the identified factors are analyzed based on their occurrences in the literature, and a chi-square test is performed to identify significant differences among the factors across decades. Similarly, critical success factors are derived for both IT outsourcing and software maintenance offshoring. Our article shows how the critical success factors impact IT as well as software maintenance from a global delivery perspective. The findings of the current study will help IT experts and decision makers in making suitable sourcing decisions. Qatar University [IRCC-2020-009]
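The abstract above mentions a chi-square test on factor occurrences across decades. As a rough illustration of how such a test statistic is computed, the sketch below applies Pearson's chi-square formula to a 2x2 contingency table. The counts are invented for illustration and are not the study's data; the function name `chi_square_statistic` is likewise hypothetical.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Invented counts: papers that do / do not cite a factor, per decade.
#                 cited  not cited
observed = [[30, 10],   # earlier decade
            [20, 40]]   # later decade

chi2 = chi_square_statistic(observed)
# For a 2x2 table there is 1 degree of freedom; 3.841 is the 5% critical value.
significant = chi2 > 3.841
print(round(chi2, 3), significant)
```

A statistic above the critical value suggests that how often the factor is cited differs between the two decades by more than chance alone would explain.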
A migration aware scheduling technique for real-time aperiodic tasks over multiprocessor systems
Multiprocessor systems consist of more than one processor and are mostly used for computationally intensive applications. Real-time systems are systems that must complete the execution of tasks within a pre-defined deadline. Traditionally, multiprocessor systems have been studied under periodic models, where tasks are executed at regular intervals of time. Gradually, as multiprocessor design has matured, their use has become very common in real-time systems to execute both periodic and aperiodic tasks. As the priority of an aperiodic task is usually, but not necessarily, greater than the priority of a periodic task, aperiodic tasks must be completed within their deadline. There is a large body of research on multiprocessor systems with scheduling of periodic tasks, but task scheduling remains relatively unexplored for a mixed workload of both periodic and aperiodic tasks. Moreover, high energy consumption is another main issue in multiprocessor systems. Although it can be reduced by using an energy-aware scheduling technique, the response time of aperiodic tasks then increases. In the literature, various techniques have been suggested to decrease the energy consumption of these systems. However, studies on reducing the response time of aperiodic tasks are limited. In this paper, we propose a scheduling technique that: 1) executes aperiodic tasks at full speed and migrates periodic tasks to other processors if their deadline is earlier than that of the aperiodic tasks, which reduces the response time; and 2) executes aperiodic tasks at a lower speed by identifying an appropriate processor speed without affecting the response time, which reduces energy consumption. Through simulations, we demonstrate the efficiency of the proposed algorithm and show that it also outperforms the well-known total bandwidth server algorithm.
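For context on the baseline named in the abstract above, the total bandwidth server (TBS) assigns each arriving aperiodic job k a deadline d_k = max(r_k, d_{k-1}) + C_k / U_s, where r_k is the arrival time, C_k the execution time, and U_s the server utilization. The sketch below implements only that textbook rule, not the paper's migration-aware technique. The function name `tbs_deadlines` and the sample jobs are illustrative assumptions.

```python
def tbs_deadlines(jobs, server_utilization):
    """Assign Total Bandwidth Server deadlines to aperiodic jobs.

    jobs: list of (arrival_time, execution_time) pairs sorted by arrival time.
    server_utilization: fraction of the processor reserved for the server.
    """
    if not 0 < server_utilization <= 1:
        raise ValueError("server utilization must be in (0, 1]")
    deadlines = []
    previous_deadline = 0.0
    for arrival, exec_time in jobs:
        # The job's bandwidth starts at its arrival or at the previous
        # deadline, whichever is later, and lasts exec_time / U_s.
        deadline = max(arrival, previous_deadline) + exec_time / server_utilization
        deadlines.append(deadline)
        previous_deadline = deadline
    return deadlines

# Hypothetical aperiodic jobs: (arrival time, execution time).
jobs = [(0.0, 1.0), (2.0, 0.5), (10.0, 1.0)]
print(tbs_deadlines(jobs, server_utilization=0.25))
```

The assigned deadlines are then handed to an EDF scheduler alongside the periodic tasks. A larger server utilization yields earlier deadlines and therefore shorter aperiodic response times.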
Neighbourhood oriented TDMA scheme for the internet of things-enabled remote sensing
Throughout the world, the Internet of Things (IoT) has been used in different application areas to assist human beings in numerous activities, such as smart buildings and cities, via remote sensing-enabled techniques. However, simultaneous transmission of packets by multiple devices Ci that want to start a communication session with a common receiver device is one of the challenging issues associated with these networks. In the literature, various mechanisms have been presented to resolve this issue without changing the technological infrastructure; however, the neighbourhood information of sensor nodes has not yet been considered. In IoT-enabled remote sensing, the neighbourhood information of devices plays a vital role in developing a reliable communication mechanism, specifically for scenarios where multiple devices Ci want to start communication with a common destination module. In this paper, a neighbourhood-enabled TDMA scheme is presented for the IoT to ensure the concurrent communication of multiple devices Ci with a common destination device Sj, preferably with the minimum possible packet collision ratio (if avoidance is not possible). The proposed scheme binds every member device Ci to assign a dedicated time slot to each of its neighbouring devices in the operational IoT network. Furthermore, neighbouring devices Ci are forced to communicate within the assigned time slot. Simulation results verify that the proposed scheme is an ideal solution compared to the existing schemes for the IoT and other resource-limited networks, particularly in scenarios where the deployment process is random.
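The idea of dedicated time slots described above can be illustrated with a generic TDMA frame. This is a simplified sketch and not the paper's scheme: it assumes a device already knows its neighbour list, gives each neighbour one fixed-length slot per repeating frame, and uses hypothetical names (`assign_slots`, `may_transmit`) and time units.

```python
def assign_slots(neighbours):
    """Give each neighbouring device its own slot index within one frame."""
    # Sorting makes the assignment deterministic for a given neighbour set.
    return {device: slot for slot, device in enumerate(sorted(neighbours))}

def may_transmit(device, t, slots, slot_len):
    """Return True if time t falls inside the device's dedicated slot."""
    frame_len = slot_len * len(slots)
    position = t % frame_len          # offset inside the repeating frame
    start = slots[device] * slot_len  # where this device's slot begins
    return start <= position < start + slot_len

# Hypothetical neighbours of a common destination device.
slots = assign_slots(["C3", "C1", "C2"])
print(slots)
print(may_transmit("C2", 15, slots, slot_len=10))
```

Since the slots partition each frame, exactly one neighbour is allowed to transmit at any instant. That is what removes collisions at the common receiver in this simplified setting.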
Energy harvesting based routing protocol for underwater sensor networks.
Underwater sensor networks (UWSNs) are ad-hoc networks deployed in rivers, seas and oceans to explore and monitor phenomena such as pollution control, seismic activities and petroleum mining. The sensor nodes of UWSNs have limited charging capabilities. UWSNs are generally operated under two deployment mechanisms, i.e., localization-based and non-localization-based. However, in both mechanisms, balanced energy utilization is a challenging issue. Inefficient usage of energy significantly affects the stability period, packet delivery ratio, end-to-end delay, path loss and throughput of a network. To efficiently utilize and harvest energy, this paper presents a novel scheme called EH-ARCUN (Energy Harvesting Analytical approach towards Reliability with Cooperation for UWSNs) based on cooperation with energy harvesting. The scheme employs the Amplify-and-Forward (AF) technique at relay nodes for data forwarding and the Fixed Combining Ratio (FCR) technique at the destination node to select the accurate signal. The proposed technique selects relay nodes among a node's neighbors based on their harvested energy level. Most cooperation-based UWSN routing techniques do not provide an energy harvesting mechanism at the relay nodes. EH-ARCUN deploys piezoelectric energy harvesting at relay nodes to improve the working capabilities of sensors in UWSNs. The proposed scheme is an extension of our previously implemented routing scheme for UWSNs, called ARCUN. The performance of the proposed scheme is compared with the ARCUN and RACE (Reliability and Adaptive Cooperation for efficient Underwater sensor Networks) schemes in terms of stability period, packet delivery ratio, network throughput and path loss. Extensive simulation results show that EH-ARCUN performs better than both previous schemes in terms of the considered parameters.
An energy and performance aware consolidation technique for containerized datacenters
Cloud datacenters, which are the fastest-growing electricity consumers globally, have become a backbone of today's business and economy. Numerous studies suggest that ~30% of the US datacenters are comatose and the others are grossly under-utilized, which makes it possible to save energy through resource consolidation techniques. However, consolidation involves migrations that are expensive in terms of energy consumption and performance degradation, a cost that many existing models do not account for; it could possibly be more energy and performance efficient not to consolidate. In this paper, we investigate how migration decisions should be taken so that the migration cost is recovered, as energy only starts to be saved once the migration cost has been recovered and performance is guaranteed. We demonstrate through several experiments, using Google workload data for 12,583 hosts and approximately one million tasks that belong to three different kinds of workload, how different allocation policies, combined with various migration approaches, affect a datacenter's energy and performance efficiencies. Using several plausible assumptions for a containerised datacenter set-up, we suggest that a combination of the proposed energy-performance-aware allocation (Epc-Fu) and migration (Cper) techniques, together with migrating only relatively long-running containers, offers ideal energy and performance efficiencies.
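The cost-recovery argument above can be made concrete with a break-even calculation. A migration pays off only if the container keeps running long enough for the power saved to exceed the energy spent on migrating. The sketch below is an illustrative simplification and not the paper's Epc-Fu or Cper techniques: the function names and the energy and power figures are assumptions, and the performance guarantee mentioned in the abstract is not modelled.

```python
def break_even_seconds(migration_energy_j, power_saved_w):
    """Time after which the energy spent on a migration has been recovered."""
    if power_saved_w <= 0:
        # No saving after migration: the cost is never recovered.
        return float("inf")
    return migration_energy_j / power_saved_w

def should_migrate(remaining_runtime_s, migration_energy_j, power_saved_w):
    """Migrate only if the container is expected to outlive the break-even point."""
    return remaining_runtime_s > break_even_seconds(migration_energy_j, power_saved_w)

# Hypothetical figures: 3600 J to migrate, 20 W saved after consolidation.
print(break_even_seconds(3600.0, 20.0))     # seconds until cost is recovered
print(should_migrate(600.0, 3600.0, 20.0))  # long-running container
print(should_migrate(100.0, 3600.0, 20.0))  # short-running container
```

This is one way to see why migrating only relatively long-running containers is attractive: short-lived containers finish before the migration cost is paid back.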
Enhancing Sumoylation Site Prediction: A Deep Neural Network with Discriminative Features
Sumoylation is a post-translational modification (PTM) mechanism that is involved in many critical biological processes, such as gene expression, localizing and stabilizing proteins, and replicating the genome. Moreover, sumoylation sites are associated with different diseases, including Parkinson’s and Alzheimer’s. Due to its vital role in biological processes, identifying sumoylation sites in proteins is significant for monitoring protein functions and discovering multiple diseases. Therefore, several computational models utilizing conventional ML methods have been introduced in the literature to classify sumoylation sites. However, these models cannot accurately classify sumoylation sites due to intrinsic limitations of the conventional learning methods. This paper proposes a robust computational model (called Deep-Sumo) for predicting sumoylation sites based on a deep-learning algorithm with efficient feature representation methods. The proposed model employs a half-sphere exposure method to represent protein sequences as a feature vector. Principal Component Analysis is applied to extract discriminative features by eliminating noisy and redundant features. The discriminative features are given to a multilayer Deep Neural Network (DNN) model to predict sumoylation sites accurately. The performance of the proposed model is extensively evaluated using a 10-fold cross-validation test with various statistical performance metrics. Initially, the proposed DNN is compared with traditional learning algorithms, and subsequently, the performance of Deep-Sumo is compared with the existing models. The validation results show that the proposed model achieves an average accuracy of 96.47%, an improvement over the existing models. It is anticipated that the proposed model can be used as an effective tool for drug discovery and the diagnosis of multiple diseases.
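The 10-fold cross-validation protocol mentioned above splits the samples into ten disjoint folds and uses each fold once as the test set while training on the other nine. The sketch below shows only that splitting step, as a generic illustration rather than the paper's evaluation code. The helper name `k_fold_indices`, the seed, and the sample count are assumptions.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the folds are reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Hypothetical dataset of 25 samples.
splits = list(k_fold_indices(25, k=10))
for train, test in splits[:2]:
    print(len(train), len(test))
```

The reported metric is then the average of the per-fold scores, so every sample contributes to testing exactly once.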
Optimized Feature Learning for Anti-Inflammatory Peptide Prediction Using Parallel Distributed Computing
With recent advancements in computational biology, high-throughput Next-Generation Sequencing (NGS) has become a de facto standard technology for gene expression studies, including DNAs, RNAs, and proteins; however, it generates several million sequences in a single run. Moreover, raw sequencing datasets are increasing exponentially, doubling in size every 18 months, leading to a big data issue in computational biology. In addition, inflammatory illnesses and boosting immune function have recently attracted a lot of attention, yet accurately recognizing Anti-Inflammatory Peptides (AIPs), which serve as therapeutic agents for inflammatory-related diseases, through biological processes is time-consuming. Similarly, precise classification of these AIPs is challenging for traditional technology and conventional machine learning algorithms. Parallel and distributed computing models and deep neural networks have become major computing platforms for the big data analytics now required in computational biology. This study proposes an efficient high-throughput anti-inflammatory peptide predictor based on a parallel deep neural network model. The model is extensively evaluated using performance measures such as accuracy, efficiency, scalability, and speedup in sequential and distributed environments. The encoded sequence data were balanced using the SMOTETomek approach, resulting in high accuracy. The parallel deep neural network demonstrated high speedup and scalability compared to other traditional classification algorithms. The study's outcome could promote a parallel-based model for predicting anti-inflammatory peptides.
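The speedup and scalability measures named above are commonly defined as S = T_1 / T_p (sequential time over parallel time) and parallel efficiency E = S / p for p workers. The sketch below uses these standard definitions. Whether the study computes efficiency in exactly this way is an assumption, and the timings are invented for illustration.

```python
def speedup(t_sequential, t_parallel):
    """Speedup S = T_1 / T_p of a parallel run over the sequential run."""
    return t_sequential / t_parallel

def parallel_efficiency(t_sequential, t_parallel, workers):
    """Efficiency E = S / p: the fraction of ideal linear speedup achieved."""
    return speedup(t_sequential, t_parallel) / workers

# Invented timings: 120 s sequentially, 40 s on 4 workers.
print(speedup(120.0, 40.0))
print(parallel_efficiency(120.0, 40.0, workers=4))
```

An efficiency close to 1 indicates near-linear scaling, while values well below 1 point to communication or synchronization overhead.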