Hadoop performance modeling and job optimization for big data analytics
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. Big data has gained momentum from both academia and industry. The MapReduce model has emerged as a major computing model in support of big data analytics, and Hadoop, an open source implementation of MapReduce, has been widely taken up by the community. Cloud service providers such as Amazon EC2 now support Hadoop user applications. However, a key challenge is that cloud service providers do not have a resource provisioning mechanism to satisfy user jobs with deadline requirements; currently, it is solely the user's responsibility to estimate the required amount of resources for a job running in a public cloud. This thesis presents a Hadoop performance model that accurately estimates the execution duration of a job and further provisions the required amount of resources for the job to be completed within a deadline. The proposed model employs a Locally Weighted Linear Regression (LWLR) model to estimate the execution time of a job and the Lagrange Multiplier technique for resource provisioning to satisfy user jobs with given deadlines. The performance of the proposed model is extensively evaluated on both an in-house Hadoop cluster and the Amazon EC2 cloud. Experimental results show that the proposed model is highly accurate in estimating job execution times, and that jobs complete within their required deadlines when the model's resource provisioning scheme is followed. In addition, the Hadoop framework has over 190 configuration parameters, some of which have significant effects on the performance of a Hadoop job. Manually setting optimum values for these parameters is a challenging and time-consuming task. This thesis therefore presents optimization work that enhances the performance of Hadoop by automatically tuning its parameter values. It employs the Gene Expression Programming (GEP) technique to build an objective function that represents the performance of a job and the correlations among the configuration parameters, and then employs Particle Swarm Optimization (PSO) to automatically find optimal or near-optimal configuration settings. The performance of the proposed work is intensively evaluated on a Hadoop cluster, and the experimental results show that it enhances the performance of Hadoop significantly compared with the default settings.
Abdul Wali Khan University Mardan
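As a rough illustration of the estimation side of this approach, the following is a minimal Locally Weighted Linear Regression sketch in Python; the job features, runtimes, and kernel bandwidth are invented for demonstration and are not the thesis's actual training setup:

```python
import numpy as np

def lwlr_predict(x_query, X, y, tau=1.0):
    """Locally Weighted Linear Regression: fit a separate weighted
    least-squares model per query point, weighting training samples
    by a Gaussian kernel of their distance to the query."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # add bias column
    xq = np.hstack([1.0, x_query])
    # Gaussian kernel weights: nearby jobs influence the fit more
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)
    # Solve the weighted normal equations: (X^T W X) theta = X^T W y
    theta = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y)
    return xq @ theta

# Hypothetical training data: [input size (GB), number of map slots]
X = np.array([[1.0, 4], [2.0, 4], [4.0, 8], [8.0, 8], [16.0, 16]])
y = np.array([120.0, 210.0, 260.0, 480.0, 600.0])   # job runtimes (s)

print(lwlr_predict(np.array([6.0, 8]), X, y, tau=2.0))
```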
A Novel Method of Butterfly Optimization Algorithm for Load Balancing in Cloud Computing
Cloud computing is frequently described as a model that provides seemingly unlimited data processing facilities under a pay-per-use scheme. Modern cloud infrastructures allocate resources as virtual machines (VMs) on physical machines using virtualization technology. Each VM runs its own operating system and consumes resources from the physical machine that hosts it. For load balancing, the cloud migrates VMs from heavily loaded physical machines to lightly loaded ones; the delay of this process grows across the network as virtual machines are migrated. This work puts forward a new algorithm, namely Butterfly Optimization, for VM migration. The proposed optimization algorithm has been implemented in MATLAB, and a comparative analysis is performed between the outcomes of the previous and the proposed algorithms. The proposed algorithm has been evaluated on three performance parameters: delay, bandwidth used, and space used.
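For readers unfamiliar with the metaheuristic, here is a compact Python sketch of the standard Butterfly Optimization Algorithm loop (fragrance f = c·I^a, with a switch probability between global and local search), applied to an invented load-variance objective; the objective and parameter values are illustrative assumptions, not the paper's MATLAB implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def cost(x):
    """Illustrative load-balancing objective: variance of per-host load
    (a stand-in for the paper's delay/bandwidth/space criteria)."""
    return np.var(x)

def boa(dim=8, n=20, iters=200, c=0.01, a=0.1, p=0.8):
    """Standard Butterfly Optimization Algorithm loop.
    Fragrance f = c * I^a, where I is the stimulus intensity."""
    X = rng.uniform(0, 1, (n, dim))       # candidate load assignments
    fit = np.array([cost(x) for x in X])
    best = X[fit.argmin()].copy()
    for _ in range(iters):
        I = 1.0 / (1.0 + fit)             # lower cost -> stronger intensity
        f = c * I ** a                    # fragrance per butterfly
        for i in range(n):
            r = rng.random()
            if rng.random() < p:          # global search toward the best
                X[i] += (r ** 2 * best - X[i]) * f[i]
            else:                         # local random walk
                j, k = rng.integers(0, n, 2)
                X[i] += (r ** 2 * X[j] - X[k]) * f[i]
        fit = np.array([cost(x) for x in X])
        if fit.min() < cost(best):
            best = X[fit.argmin()].copy()
    return best, cost(best)

print(boa())
```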
A genetic algorithm enhanced automatic data flow management solution for facilitating data intensive applications in the cloud
National Basic Research Program (973) of China and Science and Technology Commission of Shanghai Municipality
BUILDING EFFICIENT AND COST-EFFECTIVE CLOUD-BASED BIG DATA MANAGEMENT SYSTEMS
In today’s big data world, data is being produced in massive volumes, at great velocity, and from a variety of different sources such as mobile devices, sensors, a plethora of small devices hooked to the internet (the Internet of Things), social networks, communication networks and many others. Interactive querying and large-scale analytics are being increasingly used to derive value out of this big data. A large portion of this data is being stored and processed in the Cloud due to the several advantages provided by the Cloud, such as scalability, elasticity, availability, low cost of ownership and the overall economies of scale. There is thus a growing need for large-scale cloud-based data management systems that can support real-time ingest, storage and processing of large volumes of heterogeneous data. However, in the pay-as-you-go Cloud environment, the cost of analytics can grow linearly with the time and resources required. Reducing the cost of data analytics in the Cloud thus remains a primary challenge. In my dissertation research, I have focused on building efficient and cost-effective cloud-based data management systems for different application domains that are predominant in cloud computing environments.
In the first part of my dissertation, I address the problem of reducing the cost of transactional workloads on relational databases to support database-as-a-service in the Cloud. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully. I have designed, built and evaluated SWORD, an end-to-end scalable online transaction processing system that utilizes workload-aware data placement and replication to minimize the number of distributed transactions, and that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement of data and during query execution at runtime.
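To make the placement idea concrete, the toy Python sketch below greedily co-locates items that are frequently accessed together and counts how many transactions end up spanning partitions. SWORD's actual approach (hypergraph-based partitioning with fine-grained replication) is far more sophisticated; this is only a stand-in for the underlying intuition, with an invented workload:

```python
from collections import defaultdict
from itertools import combinations

# Toy workload: each transaction touches a set of data items.
transactions = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"d", "e"}, {"e", "f"}]

def greedy_placement(transactions, n_partitions=2):
    """Greedy co-location heuristic: place each item on the partition
    where it co-occurs most with already-placed items, reducing the
    number of transactions that span partitions."""
    affinity = defaultdict(int)            # co-access counts per item pair
    for txn in transactions:
        for u, v in combinations(sorted(txn), 2):
            affinity[(u, v)] += 1
    placement, load = {}, [0] * n_partitions
    items = sorted({i for t in transactions for i in t})
    for item in items:
        scores = [0] * n_partitions
        for other, part in placement.items():
            pair = tuple(sorted((item, other)))
            scores[part] += affinity.get(pair, 0)
        # prefer the partition with highest affinity; break ties by load
        best = max(range(n_partitions), key=lambda p: (scores[p], -load[p]))
        placement[item] = best
        load[best] += 1
    return placement

def distributed_txns(transactions, placement):
    """Count transactions whose items straddle more than one partition."""
    return sum(len({placement[i] for i in t}) > 1 for t in transactions)

p = greedy_placement(transactions)
print(p, distributed_txns(transactions, p))
```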
In the second part of my dissertation, I focus on sampling-based progressive analytics as a means to reduce the cost of data analytics in the relational domain. Sampling has traditionally been used by data scientists to get progressive answers to complex analytical tasks over large volumes of data. Typically, this involves manually extracting samples of increasing size (progressive samples) for exploratory querying. This provides data scientists with user control, repeatable semantics, and result provenance; however, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results, but do not offer the above benefits for complex ad-hoc queries. I propose a new progressive data-parallel computation framework, NOW!, that provides support for progressive analytics over big data. In particular, NOW! enables progressive relational (SQL) query support in the Cloud using unique progress semantics that allow efficient and deterministic query processing over samples, providing meaningful early results and provenance to data scientists. NOW! enables the provision of early results using significantly fewer resources, thereby enabling a substantial reduction in the cost incurred during such analytics.
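The sketch below illustrates the general flavor of deterministic progressive aggregation: a progress order over the data is fixed once, so replaying the query yields exactly the same sequence of early answers. NOW!'s actual progress semantics operate inside a data-parallel dataflow engine and are considerably richer; everything here is an invented simplification:

```python
import random

def progressive_mean(data, batches=5, seed=42):
    """Yield deterministic early results over growing samples: a fixed
    shuffle (the 'progress order') is chosen once, so re-running the
    query replays exactly the same sequence of partial answers."""
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)       # fixed progress order
    total, count = 0.0, 0
    step = max(1, len(data) // batches)
    for i, idx in enumerate(order, 1):
        total += data[idx]
        count += 1
        if i % step == 0 or i == len(data):
            # progress point: (fraction of data seen, current estimate)
            yield count / len(data), total / count

rnd = random.Random(7)
data = [rnd.gauss(100, 15) for _ in range(10_000)]
for progress, estimate in progressive_mean(data):
    print(f"{progress:5.0%}  mean ~ {estimate:.2f}")
```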
Finally, I propose NSCALE, a system for efficient and cost-effective complex analytics on large-scale graph-structured data in the Cloud. The system is based on the key observation that a wide range of complex analysis tasks over graph data require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph; examples include ego network analysis, motif counting in biological networks, finding social circles in social networks, personalized recommendations, link prediction, and so on. These tasks are not well served by existing vertex-centric graph processing frameworks, whose computation and execution models limit the user program to directly accessing the state of a single vertex, resulting in high execution overheads. Further, the lack of support for extracting the relevant portions of the graph that are of interest to an analysis task and loading them onto distributed memory leads to poor scalability. NSCALE allows users to write programs at the level of neighborhoods or subgraphs rather than at the level of vertices, and to declaratively specify the subgraphs of interest. It enables the efficient distributed execution of these neighborhood-centric complex analysis tasks over large-scale graphs, while minimizing resource consumption and communication cost, thereby substantially reducing the overall cost of graph data analytics in the Cloud.
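A single-machine Python illustration of the neighborhood-centric style (using networkx; the function names here are invented for illustration and are not NSCALE's API): the user program receives each node's k-hop subgraph rather than just single-vertex state:

```python
import networkx as nx

def neighborhood_centric(G, radius, program):
    """Apply a user 'program' to each node's k-hop neighborhood subgraph,
    instead of exposing only single-vertex state as vertex-centric
    frameworks do. (NSCALE additionally extracts and packs the relevant
    subgraphs into distributed memory; this runs on one machine.)"""
    results = {}
    for v in G.nodes:
        subgraph = nx.ego_graph(G, v, radius=radius)   # k-hop neighborhood
        results[v] = program(v, subgraph)
    return results

# Example user program: count triangles incident to the ego node.
def ego_triangles(v, sg):
    return nx.triangles(sg, v)

G = nx.karate_club_graph()
print(neighborhood_centric(G, radius=1, program=ego_triangles))
```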
The results of our extensive experimental evaluation of these prototypes with several real-world data sets and applications validate the effectiveness of our techniques, which provide orders-of-magnitude reductions in the overheads of distributed data querying and analysis in the Cloud.
An insight in cloud computing solutions for intensive processing of remote sensing data
The investigation of Earth's surface deformation phenomena provides critical insights into several processes of great interest for science and society, especially from the perspective of further understanding the Earth System and the impact of human activities. Indeed, the study of ground deformation phenomena can be helpful for the comprehension of the geophysical dynamics dominating natural hazards such as earthquakes, volcanoes and landslides. In this context, microwave space-borne Earth Observation (EO) techniques represent very powerful instruments for ground deformation estimation. In particular, the Small BAseline Subset (SBAS) technique is regarded as one of the key approaches for its ability to investigate surface deformation affecting large areas of the Earth with centimeter- to millimeter-level accuracy in different scenarios (volcanoes, tectonics, landslides, anthropogenic induced land motions). The current Remote Sensing scenario is characterized by the availability of huge archives of radar data, which are going to increase further with the advent of the Sentinel-1 satellites. The effective exploitation of this large amount of data requires both adequate computing resources and advanced algorithms able to properly exploit such facilities. In this work we concentrate on the use of the P-SBAS algorithm (a parallel version of SBAS) within HPC infrastructures, to investigate the effectiveness of such technologies for EO applications. In particular, we demonstrate that cloud computing solutions represent a valid alternative for scientific applications and a promising research scenario; indeed, in all the experiments we conducted and in the results obtained from Parallel Small Baseline Subset (P-SBAS) processing, cloud technologies proved to be fully competitive in performance with an in-house HPC cluster solution.
Research reports: 1991 NASA/ASEE Summer Faculty Fellowship Program
The basic objectives of the programs, which are in their 28th year of operation nationally, are: (1) to further the professional knowledge of qualified engineering and science faculty members; (2) to stimulate an exchange of ideas between participants and NASA; (3) to enrich and refresh the research and teaching activities of the participants' institutions; and (4) to contribute to the research objectives of the NASA Centers. The faculty fellows spent 10 weeks at MSFC engaged in a research project compatible with their interests and background, working in collaboration with a NASA/MSFC colleague. This is a compilation of their research reports for summer 1991.
GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection
Geometry plays a significant role in monocular 3D object detection. It can be used to estimate object depth from the perspective projection between an object's physical size and its 2D projection in the image plane, which introduces mathematical priors into deep models. However, this projection process also introduces error amplification, where the error of the estimated height is amplified and reflected into the projected depth. This leads to unreliable depth inference and also impairs training stability. To tackle this problem, we propose a novel Geometry Uncertainty Propagation Network (GUPNet++) that models the geometry projection in a probabilistic manner. This ensures that depth predictions are well-bounded and associated with a reasonable uncertainty. The significance of introducing such geometric uncertainty is two-fold: (1) it models the uncertainty propagation relationship of the geometry projection during training, improving the stability and efficiency of end-to-end model learning; (2) it can be used to derive a highly reliable confidence that indicates the quality of the 3D detection result, enabling more reliable detection inference. Experiments show that the proposed approach not only obtains state-of-the-art (SOTA) performance in image-based monocular 3D detection but also demonstrates superiority in efficacy with a simplified framework.
Comment: 18 pages, 9 figures
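The projection prior referred to above is the pinhole relation depth = f · H_3D / h_2D, and the error amplification follows from first-order uncertainty propagation through it. The sketch below uses invented numbers and is not GUPNet++'s implementation:

```python
def depth_from_height(f_pixels, h3d_m, h2d_pixels):
    """Pinhole projection prior: depth = f * H_3D / h_2D."""
    return f_pixels * h3d_m / h2d_pixels

def depth_sigma(f_pixels, h3d_sigma, h2d_pixels):
    """First-order uncertainty propagation through the projection:
    sigma_depth = |d(depth)/dH_3D| * sigma_H = (f / h_2D) * sigma_H."""
    return (f_pixels / h2d_pixels) * h3d_sigma

# Illustrative numbers: a 1.6 m tall car with a 0.1 m height error,
# focal length 700 px, object spanning 40 px in the image.
f, H, sH, h = 700.0, 1.6, 0.1, 40.0
print(depth_from_height(f, H, h))   # ~28 m estimated depth
print(depth_sigma(f, sH, h))        # ~1.75 m depth error: the 0.1 m
                                    # height error is amplified ~17.5x
```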
Air Force Institute of Technology Contributions to Air Force Research and Development, Calendar Year 1987
From the introduction: The primary mission of the Air Force Institute of Technology (AFIT) is education, but research and consulting are essential integral elements in the process. This report highlights AFIT's contributions to Air Force research and development activities [in 1987].