306 research outputs found

    Experimental Performance Evaluation of Cloud-Based Analytics-as-a-Service

    A growing number of Analytics-as-a-Service solutions have recently emerged in the landscape of cloud-based services. These services allow flexible composition of compute and storage components that create powerful data ingestion and processing pipelines. This work is a first attempt at an experimental evaluation of analytic application performance across a wide range of storage service configurations. We present an intuitive notion of data locality, which we use as a proxy to rank different service compositions in terms of expected performance. Through an empirical analysis, we dissect the performance achieved by analytic workloads and unveil problems due to the impedance mismatch that arises in some configurations. Our work paves the way to a better understanding of modern cloud-based analytic services and their performance, both for their end-users and their providers.
    Comment: Longer version of the paper in Submission at IEEE CLOUD'1

    Extract, Transform, and Load data from Legacy Systems to Azure Cloud

    Internship report presented as partial requirement for obtaining the Master’s degree in Information Management, with a specialization in Knowledge Management and Business Intelligence.
    In a world of continuously evolving technologies and increasingly competitive markets, organisations need to stay alert for cutting-edge technology and tools that will help them surpass any competition that arises. Modern data platforms (MDPs) that incorporate cloud technologies help organisations get ahead of their competitors by providing solutions that capture and make optimal use of untapped data, scalable storage that adapts to ever-growing data quantities, and data processing and visualisation tools that improve the decision-making process. The many cloud providers available in the market, from small players to major technology corporations, give organisations considerable flexibility to choose the cloud technology that best aligns with their use cases and overall products and services strategy. This internship came up at the time when one of Accenture’s significant clients in the financial industry decided to migrate from legacy systems to a cloud-based data infrastructure, namely Microsoft Azure. During this internship, the data lake, which is a core part of the MDP, was developed in order to better understand the challenges that can arise when migrating data from on-premise legacy systems to a cloud-based infrastructure. This work also provides the main recommendations and guidelines for performing a large-scale data migration.

    Automated Deployment of a Spark Cluster with Machine Learning Algorithm Integration

    The vast amount of data stored nowadays has turned big data analytics into a very active research field. The Spark distributed computing platform has emerged as a dominant and widely used paradigm for cluster deployment and big data analytics. However, getting started is still a task that can take considerable time when done manually, because of the requirements that all nodes must fulfill. This work introduces LadonSpark, an open-source and non-commercial solution to configure and deploy a Spark cluster automatically. It has been specially designed for easy and efficient management of a Spark cluster, with a friendly graphical user interface to automate the deployment of a cluster and to start up the Hadoop distributed file system quickly. Moreover, LadonSpark can integrate any algorithm into the system: the user only needs to provide the executable file and the number of required inputs for proper parametrization. Source code developed in Scala, R, Python, or Java is supported on LadonSpark. In addition, clustering, regression, classification, and association rules algorithms are already integrated so that users can test its usability from its initial installation.
    Funding: Ministerio de Ciencia, Innovación y Universidades TIN2017-88209-C2-1-

    Analysing Transportation Data with Open Source Big Data Analytic Tools

    Big data analytics allows a vast amount of structured and unstructured data to be effectively processed so that correlations, hidden patterns, and other useful information can be mined from the data. Several open source big data analytic tools that can perform tasks such as dimensionality reduction, feature extraction, transformation, and optimization are now available. One interesting area where such tools can provide effective solutions is transportation. Big data analytics can be used to efficiently manage transport infrastructure assets such as roads, airports, bus stations, or ports. This paper first provides an overview of two open source big data analytic tools, followed by a simple demonstration of their application to a transport dataset.

    A Service Oriented Architecture For Automated Machine Learning At Enterprise-Scale

    This thesis presents a solution architecture for productizing machine learning models in an enterprise context and for tracking a model’s performance to gain insights on how and when to retrain it. The thesis deals with two challenges. First, machine learning models need to be retrained regularly to incorporate unseen data and maintain their performance. This gives rise to the need for machine learning model management. Second, deploying machine learning models into production carries an overhead with respect to support and operations. There is scope to reduce the time to production for a machine learning model, thus offering cost-effective solutions. These two challenges are addressed through the introduction of three services under ScienceOps called ModelDeploy, ModelMonitor, and DataMonitor. ModelDeploy brings down the time to production for a machine learning model. ModelMonitor and DataMonitor help gain insights on how and when a model should be retrained. Finally, the time to production for the proposed architecture on two cloud platforms is evaluated and compared against a rudimentary approach. The monitoring services give insight into the model performance and how the statistics of the data change over time.

    HOLMeS: eHealth in the Big Data and Deep Learning Era

    Data collection and analysis are becoming more and more important in a variety of application domains as novel technologies advance. At the same time, we are experiencing a growing need for human–machine interaction with expert systems, pushing research toward new knowledge representation models and interaction paradigms. In particular, in the last few years, eHealth (which usually indicates all the healthcare practices supported by electronic processing and remote communications) has called for the availability of a smart environment and large computational resources able to offer more and more advanced analytics and new human–computer interaction paradigms. The aim of this paper is to introduce the HOLMeS (health online medical suggestions) system: a big data platform aimed at supporting several eHealth applications. As its main novelty, HOLMeS exploits a machine learning algorithm, deployed on a cluster-computing environment, to provide medical suggestions via both chat-bot and web-app modules, especially for prevention purposes. The chat-bot, suitably trained by leveraging a deep learning approach, helps to overcome the limitations of a cold interaction between users and software, exhibiting a more human-like behavior. The obtained results demonstrate the effectiveness of the machine learning algorithms, showing an area under the ROC (receiver operating characteristic) curve (AUC) of 74.65% when some first-level features are used to assess the occurrence of different chronic diseases within specific prevention pathways. When disease-specific features are added, HOLMeS shows an AUC of 86.78%, achieving greater effectiveness in supporting clinical decisions.