
    The simplicity project: easing the burden of using complex and heterogeneous ICT devices and services

    Today, to exploit the variety of available services, users need to configure each of their devices using different procedures and must explicitly select among heterogeneous access technologies and protocols. In addition, users are authenticated and charged by different means. The lack of implicit human-computer interaction, context-awareness and standardisation places an enormous burden of complexity on the shoulders of end users. The IST-Simplicity project aims at alleviating these problems by: i) automatically creating and customizing a user communication space; ii) adapting services to terminal characteristics and user preferences; iii) orchestrating network capabilities. The aim of this paper is to present the technical framework of the IST-Simplicity project, together with a thorough analysis and qualitative evaluation of the technologies, standards and works in the literature related to the Simplicity system to be developed.
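
    To make aim ii) concrete, the sketch below shows one plausible shape of terminal-aware service adaptation. The Simplicity architecture is not public API, so every class, field and value here is a hypothetical illustration of the idea, not the project's design.

        // Illustrative sketch only: all names below are hypothetical.
        import java.util.List;

        public class ServiceAdapter {

            record Terminal(int screenWidth, boolean supportsVideo) {}
            record Rendition(String name, int minWidth, boolean needsVideo) {}

            // Pick the richest rendition the user's terminal can actually display.
            static Rendition adapt(Terminal t, List<Rendition> available) {
                return available.stream()
                        .filter(r -> r.minWidth() <= t.screenWidth())
                        .filter(r -> !r.needsVideo() || t.supportsVideo())
                        .findFirst()                 // list is ordered richest-first
                        .orElseThrow();
            }

            public static void main(String[] args) {
                List<Rendition> renditions = List.of(
                        new Rendition("hd-video", 1280, true),
                        new Rendition("audio-only", 0, false));
                System.out.println(adapt(new Terminal(320, false), renditions).name());
                // -> audio-only: a small, video-less terminal gets the fallback
            }
        }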

    A SYSTEM TO IMPLEMENT A LINEAGE AUTO CAPTURE PLUGIN FOR HIVE

    The present disclosure provides a system and a method for a lineage auto-capture plugin for Hive. The system receives a query through a user interface and Hive clients, processes the query using a driver, and executes it via an execution engine. The system then implements a Hive lineage connector to perform automatic lineage capture for Hive, thereby managing data efficiently and addressing data-related issues.
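
    The disclosure does not publish code, but Hive's public post-execution hook interface is the natural attachment point for such a connector. The sketch below is a minimal, table-level illustration of the mechanism, assuming registration via hive.exec.post.hooks; the class name and package are hypothetical, and a real connector would ship lineage edges to a metadata store rather than logging them.

        // Minimal sketch of lineage capture via Hive's post-execution hook.
        // Register with: set hive.exec.post.hooks=com.example.LineageCaptureHook;
        import org.apache.hadoop.hive.ql.QueryPlan;
        import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
        import org.apache.hadoop.hive.ql.hooks.HookContext;
        import org.apache.hadoop.hive.ql.hooks.ReadEntity;
        import org.apache.hadoop.hive.ql.hooks.WriteEntity;

        public class LineageCaptureHook implements ExecuteWithHookContext {
            @Override
            public void run(HookContext ctx) {
                QueryPlan plan = ctx.getQueryPlan();
                // The executed query's inputs and outputs give table-level lineage.
                for (ReadEntity in : plan.getInputs()) {
                    for (WriteEntity out : plan.getOutputs()) {
                        // A real connector would persist this edge; we just log it.
                        System.out.printf("lineage: %s -> %s (query %s)%n",
                                in.getName(), out.getName(), plan.getQueryId());
                    }
                }
            }
        }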

    Evaluation of Storage Systems for Big Data Analytics

    Recent trends in big data storage systems show a shift from disk-centric models to memory-centric models. The primary challenges faced by these systems are speed, scalability, and fault tolerance, and it is interesting to investigate how the two models perform with respect to some big data applications. This thesis studies the performance of Ceph (a disk-centric model) and Alluxio (a memory-centric model) and evaluates whether a hybrid model provides any performance benefits for big data applications. To this end, an application, TechTalk, is created that uses Ceph to store data and Alluxio to perform data analytics. The functionalities of the application include offline lecture storage, live recording of classes, content analysis and reference generation. The knowledge base of videos is constructed by analyzing the offline data using machine learning techniques. This training dataset provides the knowledge to construct the index of an online stream, and the indexed metadata enables students to search, view and access the relevant content. The performance of the application is benchmarked in different use cases to demonstrate the benefits of the hybrid model.
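
    A minimal sketch of the hybrid read path described above, using Alluxio's Java client and assuming Ceph is mounted as Alluxio's under store at /mnt/ceph (the mount point and file path are hypothetical): the first read promotes data from Ceph into Alluxio memory, so later analytics passes run at memory speed while Ceph remains the system of record.

        import alluxio.AlluxioURI;
        import alluxio.client.file.FileInStream;
        import alluxio.client.file.FileSystem;

        public class HybridRead {
            public static void main(String[] args) throws Exception {
                FileSystem fs = FileSystem.Factory.get();
                // Hypothetical lecture file persisted in the Ceph under store.
                AlluxioURI lecture = new AlluxioURI("/mnt/ceph/lectures/cse512.mp4");
                // First read pulls the blocks from Ceph into Alluxio memory;
                // subsequent reads are served at memory speed.
                try (FileInStream in = fs.openFile(lecture)) {
                    byte[] buf = new byte[8192];
                    long total = 0;
                    for (int n; (n = in.read(buf)) != -1; ) total += n;
                    System.out.println("read " + total + " bytes");
                }
            }
        }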

    High-Performance Persistent Caching in Multi- and Hybrid-Cloud Environments

    The working model known as Multi-Cloud is emerging as a natural evolution of Cloud Computing in response to new business needs. A typical example is the Hybrid Cloud model, where a Private Cloud is connected to a Public Cloud, allowing applications to scale on demand while meeting privacy, cost and security requirements. Since data are distributed across different infrastructures, applications running in one data center that must use remotely stored data have to go through the network connecting the infrastructures. This has a severe impact on data-intensive workloads, which suffer from the low bandwidth and high latency typical of network connections. Artificial Intelligence and Scientific Computing applications are examples of such workloads: thanks to the growing use of accelerators such as GPUs and FPGAs, they can consume data faster than the data become available. Implementing a cache layer that stages computation data from the slow (remote) storage device to the faster (but more expensive) one where computation takes place appears to be the best way to balance the cost of storage offered as a Cloud service against the high computing speed of modern applications. The cache system presented in this work was developed taking into account the peculiarities of Cloud storage services that use the S3 API to communicate with clients. The proposed solution was built on the Ceph distributed storage system, which implements many of the services characterizing S3 semantics and, being designed for Cloud environments, fits well into Multi-Cloud scenarios.
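
    The following is a minimal read-through cache sketch in the spirit of the work described above, not its actual implementation: objects fetched from an S3-compatible store (here a hypothetical Ceph RADOS Gateway endpoint) are kept on fast local disk and served from there on later reads. The endpoint URL, region choice, bucket and cache directory are all assumptions for illustration.

        import java.net.URI;
        import java.nio.file.Files;
        import java.nio.file.Path;
        import software.amazon.awssdk.regions.Region;
        import software.amazon.awssdk.services.s3.S3Client;
        import software.amazon.awssdk.services.s3.S3Configuration;
        import software.amazon.awssdk.services.s3.model.GetObjectRequest;

        public class S3ReadThroughCache {
            private final S3Client s3;
            private final Path cacheDir;

            S3ReadThroughCache(URI endpoint, Path cacheDir) {
                this.s3 = S3Client.builder()
                        .endpointOverride(endpoint)   // e.g. the Ceph RGW endpoint
                        .region(Region.US_EAST_1)     // RGW ignores it; the SDK requires one
                        .serviceConfiguration(S3Configuration.builder()
                                .pathStyleAccessEnabled(true).build())
                        .build();
                this.cacheDir = cacheDir;
            }

            // Return a local path for bucket/key, downloading only on a cache miss.
            Path get(String bucket, String key) throws Exception {
                Path local = cacheDir.resolve(bucket).resolve(key);
                if (!Files.exists(local)) {                       // cache miss
                    Files.createDirectories(local.getParent());
                    s3.getObject(GetObjectRequest.builder()
                            .bucket(bucket).key(key).build(), local);
                }
                return local;                                      // cache hit
            }
        }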

    InSight2: An Interactive Web Based Platform for Modeling and Analysis of Large Scale Argus Network Flow Data

    Monitoring systems are paramount to the proactive detection and mitigation of performance and security problems in computer networks. Degraded performance and compromised end-nodes can cost computer networks downtime, data loss and reputation. InSight2 is a platform that models, analyzes and visualizes large-scale Argus network flow data using up-to-date geographical data, organizational information and emerging threats. It is engineered to meet the needs of network administrators with flexibility and modularity in mind. Scalability is ensured through multi-core processing built on a robust software architecture. Extensibility is achieved by enabling end users to enrich flow records with additional user-provided databases. Deployment is streamlined by an automated installation script. State-of-the-art visualizations are presented in a secure, user-friendly web interface, giving the end user greater insight into the network.
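
    A generic illustration of the enrichment step described above: joining a flow record against a user-provided lookup keyed by address prefix. The record fields, the prefix granularity and the lookup layout are assumptions for illustration, not InSight2's actual schema.

        import java.util.Map;

        public class FlowEnricher {
            record Flow(String srcAddr, String dstAddr, long bytes) {}
            record Enriched(Flow flow, String srcOrg, String dstOrg) {}

            // orgByPrefix maps a /24 prefix such as "192.0.2" to an organization.
            static Enriched enrich(Flow f, Map<String, String> orgByPrefix) {
                return new Enriched(f,
                        orgByPrefix.getOrDefault(prefix(f.srcAddr()), "unknown"),
                        orgByPrefix.getOrDefault(prefix(f.dstAddr()), "unknown"));
            }

            private static String prefix(String addr) {
                return addr.substring(0, addr.lastIndexOf('.'));
            }

            public static void main(String[] args) {
                Map<String, String> orgs = Map.of("192.0.2", "ExampleNet");
                System.out.println(
                        enrich(new Flow("192.0.2.7", "198.51.100.9", 4096), orgs));
            }
        }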

    METHOD AND SYSTEM FOR MANAGING DATA FLOW BETWEEN SOURCE SYSTEM AND TARGET SYSTEM

    The present disclosure relates to a method and a system (103) for managing data flow between a source system (101) and a target system (105). The system comprises a Metadata Definer (MD) (201) for defining the metadata of the input dataset associated with the source system (101), the output dataset associated with the target system (105), the data transfer pipeline between the input and output datasets, and its internal transformations. The system further comprises a Data Flow Manager (DFM) (203), which extracts metadata from the MD (201) and determines the source type, target type, transfer type, data quality strategy, pipeline frequency, and source-to-target column mapping based on that metadata. The DFM (203) then generates a query to be run on the source system (101) to obtain records (113, 117) in the target format (109) at the target system (105), and selects a template for instantiating the pipeline. The present disclosure thus enables metadata-driven data transfer between disparate systems that is cost-effective and time-efficient, creating an automated pipeline without requiring manual intervention.
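
    A minimal sketch of the query-generation step described above: deriving the extraction query from a declared source-to-target column mapping. The metadata shape is hypothetical, since the disclosure does not publish a schema.

        import java.util.LinkedHashMap;
        import java.util.Map;
        import java.util.stream.Collectors;

        public class QueryGenerator {
            // Build "SELECT src AS tgt, ... FROM table" from the column mapping.
            static String extractionQuery(String sourceTable,
                                          Map<String, String> colMap) {
                String cols = colMap.entrySet().stream()
                        .map(e -> e.getKey() + " AS " + e.getValue())
                        .collect(Collectors.joining(", "));
                return "SELECT " + cols + " FROM " + sourceTable;
            }

            public static void main(String[] args) {
                Map<String, String> mapping = new LinkedHashMap<>();
                mapping.put("cust_id", "customer_id");
                mapping.put("amt", "amount");
                System.out.println(extractionQuery("sales_raw", mapping));
                // -> SELECT cust_id AS customer_id, amt AS amount FROM sales_raw
            }
        }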