8 research outputs found

    Whitepaper on Reusable Hybrid and Multi-Cloud Analytics Service Framework

    Over the last several years, the computational landscape for data analytics has changed completely. In the past, companies and research institutions undertook many of these activities in isolation; today's infrastructure comprises a wealth of services from a variety of providers, which offers opportunities for reuse and interaction through service collaboration and cooperation. This document focuses on expanding analytics services to develop a framework for reusable hybrid multi-service data analytics. It (a) gives a short technology review that explicitly targets the intersection of hybrid multi-provider analytics services, (b) provides a brief motivation based on the use cases we examined, (c) enhances the concept of services to showcase how hybrid as well as multi-provider services can be integrated and reused via the proposed framework, (d) addresses analytics service composition, and (e) integrates container technologies to achieve state-of-the-art analytics service deployment.

    Virtual Cluster Management for Analysis of Geographically Distributed and Immovable Data

    Thesis (Ph.D.) - Indiana University, Informatics and Computing, 2015. Scenarios exist in the era of Big Data where computational analysis needs to utilize widely distributed and remote compute clusters, especially when the data sources are sensitive or extremely large and thus cannot be moved. A large dataset in Malaysia could be ecologically sensitive, for instance, and unable to be moved outside the country's boundaries. Controlling an analysis experiment in this virtual cluster setting can be difficult on multiple levels: with setup and control, with managing the behavior of the virtual cluster, and with interoperability issues across the compute clusters. Further, datasets can be distributed among clusters, or even across data centers, so that it becomes critical to utilize data locality information to optimize the performance of data-intensive jobs. Finally, datasets are increasingly sensitive and tied to certain administrative boundaries, though once the data has been processed, the aggregated or statistical result can be shared across the boundaries. This dissertation addresses management and control of a widely distributed virtual cluster having sensitive or otherwise immovable data sets through a controller. The Virtual Cluster Controller (VCC) gives control back to the researcher. It creates virtual clusters across multiple cloud platforms. In recognition of sensitive data, it can establish a single network overlay over widely distributed clusters. We define a novel class of data, notably immovable data that we call "pinned data", where the data is treated as a first-class citizen instead of being moved to where it is needed. We draw from our earlier work with a hierarchical data processing model, Hierarchical MapReduce (HMR), to process geographically distributed data, some of which is pinned. The applications implemented in HMR use an extended MapReduce model where computations are expressed as three functions: Map, Reduce, and GlobalReduce. Further, by facilitating information sharing among resources, applications, and data, the overall performance is improved. Experimental results show that the overhead of VCC is minimal. HMR outperforms the traditional MapReduce model when processing a particular class of applications. The evaluations also show that information sharing between resources and applications through the VCC shortens the hierarchical data processing time, as well as satisfying the constraints on the pinned data.
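
    The three-function model named in this abstract (Map, Reduce, GlobalReduce) can be illustrated with a small word-count sketch. This is not the dissertation's HMR implementation: the function names, the in-memory "clusters" dictionary, and the word-count task are all assumptions chosen for illustration. The point it tries to show is that raw records stay inside their cluster and only per-cluster aggregates reach the global step.

```python
from collections import defaultdict

# Hypothetical names throughout; a toy, single-process stand-in for clusters.

def map_fn(record):
    """Map: emit (word, 1) for each word in one input record."""
    for word in record.split():
        yield word.lower(), 1

def reduce_fn(key, values):
    """Reduce: combine the values for one key inside a single cluster."""
    return sum(values)

def global_reduce_fn(key, partials):
    """GlobalReduce: combine the per-cluster results for one key."""
    return sum(partials)

def run_local(records):
    """Run Map and Reduce on one cluster; raw records never leave it."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}

def run_hierarchical(clusters):
    """Collect only aggregated per-cluster results, then apply GlobalReduce."""
    partials = defaultdict(list)
    for records in clusters.values():
        for key, value in run_local(records).items():
            partials[key].append(value)
    return {key: global_reduce_fn(key, vals) for key, vals in partials.items()}

# Assumed example input: two clusters, each holding its own "pinned" records.
clusters = {
    "cluster_a": ["pinned data stays local", "data is pinned"],
    "cluster_b": ["aggregated data can move"],
}
totals = run_hierarchical(clusters)
```

    In this sketch only the dictionaries returned by run_local cross the cluster boundary, which mirrors the abstract's statement that aggregated results may be shared while the pinned data itself is not moved.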

    A design pattern for optimizations in data intensive applications using ABS and JAVA 8

    Cloud environments have become a standard method for enterprises to offer their applications by means of web services, data management systems, or simply renting out computing resources. In our previous work, we presented how a modeling language can be used together with the new features of JAVA 8 to overcome certain drawbacks of data structures and synchronization mechanisms in parallel applications. We extend this solution into a design pattern that allows application-specific optimizations in a distributed setting. We validate this integration using our previous case study of the Prime Sieve of Eratosthenes and illustrate the performance improvements in terms of speed-up and memory consumption.
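
    For readers unfamiliar with the case study named in this abstract, the Sieve of Eratosthenes can be sketched as below. This is a plain sequential Python version for orientation only; it is not the ABS or Java 8 implementation from the paper and does not reproduce its design pattern or optimizations. The function name primes_up_to is an assumption.

```python
def primes_up_to(limit):
    """Return all primes <= limit using the classic Sieve of Eratosthenes."""
    if limit < 2:
        return []
    # is_prime[n] stays True until n is crossed out as a multiple of a prime.
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Start at p*p: smaller multiples were crossed out by smaller primes.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]

primes = primes_up_to(30)
```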

    LDAT: A Web Data Visualization Tool for LiDAR Point Cloud Data Analysis

    Light Detection and Ranging (LiDAR) sensors have been employed in many different ways over time and continue to be utilized today. These sensors produce point clouds, which are large and complex data sets consisting of position points across a 3D space. The research presented in this thesis focuses on the analysis and visualization of LiDAR point cloud data. The data for this project comes from LiDAR sensors located on street lights on Virginia Street and is used to analyze traffic information. A web tool was developed to analyze and visualize this data, resulting in an interactive and readable representation of the data. To ensure the effectiveness of the tool, a user study was conducted to test its functionality and assess possible improvements. This thesis aims to provide a template for creating an effective and useful data visualization tool in an increasingly data-driven society.

    DeReEs: real-time registration of RGBD images using image-based feature detection and robust 3D correspondence estimation and refinement

    We present DeReEs, a real-time RGBD registration algorithm for the scenario where multiple RGBD images of the same scene are obtained from depth-sensing cameras placed at different viewpoints, with partial overlaps between their views. DeReEs (Detection, Rejection and Estimation) is a combination of 2D image-based feature detection algorithms, a RANSAC-based false correspondence rejection, and a rigid 3D transformation estimation. DeReEs not only performs global registration in real time but also supports large transformation distances for both translations and rotations. DeReEs is designed as part of a virtual/augmented reality solution for a remote 3D collaboration system that does not require initial setup and allows users to freely move the cameras during use. We present comparisons of DeReEs with other common registration algorithms. Our results suggest that DeReEs provides better speed and accuracy, especially in scenes with partial overlap.

    Overview of Cloud Computing

    This updated book (Version 1.2) serves to fill a void in introductory textbook publishing on cloud computing. The target audience is readers with some technical background who are also interested in the business aspects of cloud computing. The book intentionally does not focus on technical details and does not include step-by-step instructions, in order to avoid becoming obsolete too quickly. While new tools and concepts are sure to continue to emerge at a rapid pace, the bulk of the book should remain true and useful for a number of years. Examples are usually based on the Google Cloud Platform, but the principles covered in the book are equally relevant to users of other cloud platforms.

    Accessing multiple clouds with cloudmesh
