1,584 research outputs found
Augmenting data warehousing architectures with hadoop
Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies Management. As the volume of available data increases exponentially, traditional data warehouses struggle to transform this data into actionable knowledge. Data strategies that include the creation and maintenance of data warehouses have a lot to gain by incorporating technologies from the Big Data spectrum. Hadoop, as a transformation tool, can add a theoretically unlimited dimension of data processing, feeding transformed information into traditional data warehouses that ultimately will retain their value as central components in organizations’ decision support systems.
This study explores the potential of Hadoop as a data transformation tool in a traditional data warehouse environment. Hadoop’s execution model, which is oriented toward distributed parallel processing, offers great capabilities when the amount of data to be processed requires the infrastructure to expand. Horizontal scalability, a key aspect of a Hadoop cluster, allows processing power to grow in proportion to the volume of data.
Using Hive on Tez in a Hadoop cluster, this study transforms television viewing events, extracted from Ericsson’s Mediaroom Internet Protocol Television infrastructure, into pertinent audience metrics such as Rating, Reach and Share. These measurements are then made available in a traditional data warehouse, supported by a traditional Relational Database Management System, where they are presented through a set of reports.
The main contribution of this research is a proposed augmented data warehouse architecture where the traditional ETL layer is replaced by a Hadoop cluster, running Hive on Tez, with the purpose of performing the heaviest transformations that convert raw data into actionable information. By classifying the SQL statements responsible for the data transformation processes, we found that Hadoop and its distributed processing model deliver outstanding performance in the analytical layer, namely in the aggregation of large data sets.
Ultimately, we demonstrate empirically the performance gains that can be extracted from Hadoop, in comparison to an RDBMS, regarding speed, storage usage and scalability potential, and suggest how this can be used to evolve data warehouses into the age of Big Data.
Visual Networking
This research presents a case study of the Regis Academic Research Network (ARNe). It focuses on network bandwidth graphs and the information collected over a six-month period. The case study compares network bandwidth graphs and is used to theorize about what is happening on the ARNe and to form hypotheses about how the network will behave in the future. The research also includes a project: the setup and planning of the installation of an open source tool set, which will give Regis network administrators a useful tool for troubleshooting and planning on the ARNe.
Intrusion Detection In Wireless Sensor Networks
Many applications use sensor motes, and researchers continue to explore new ones. For this application, detecting the movement of humans through a sensor field, a set of Berkeley MICA2 motes running the TinyOS operating system is used. Different sensors, such as pressure or light sensors, can be used to identify the presence of an intruder in the field. In our case, the light sensor is chosen for detection. When an intruder crosses the monitored environment, the system detects the change in the light values; any significant change, meaning a change greater than a pre-defined threshold, indicates the presence of an intruder. An integrated web cam takes a snapshot of the intruder and transmits the picture through the network to a remote station. The basic motivation of this thesis is that a sensor web system can be used to monitor and detect any intruder in a specific area from a remote location.
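The threshold rule described in this abstract can be sketched as follows. This is a minimal illustration in Python, not the thesis's TinyOS code: the function name `detect_intrusions`, the choice of comparing each reading with the previous one, and the sample values are all assumptions made for the example.

```python
def detect_intrusions(readings, threshold):
    """Return the indices of readings that differ from the previous
    reading by more than `threshold`.

    Hypothetical rule for illustration: the abstract only says that a
    change greater than a pre-defined threshold signals an intruder.
    """
    events = []
    for i in range(1, len(readings)):
        if abs(readings[i] - readings[i - 1]) > threshold:
            events.append(i)
    return events


# Made-up light values: a shadow passes at index 3 and clears at index 5.
samples = [500, 502, 498, 310, 305, 499]
print(detect_intrusions(samples, threshold=50))  # prints [3, 5]
```

In a deployment along the lines the abstract describes, each detected index would be the point at which the web cam snapshot is triggered.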
CyberGuarder: a virtualization security assurance architecture for green cloud computing
Cloud Computing, Green Computing, Virtualization, Virtual Security Appliance, Security Isolation
Computational Resource Abuse in Web Applications
Internet browsers include Application Programming Interfaces (APIs) to support Web applications that require complex functionality, e.g., to let end users watch videos, make phone calls, and play video games. Meanwhile, many Web applications employ the browser APIs to rely on the user's hardware to execute intensive computation, access the Graphics Processing Unit (GPU), use persistent storage, and establish network connections.
However, providing access to the system's computational resources, i.e., processing, storage, and networking, through the browser creates an opportunity for attackers to abuse resources. Principally, the problem occurs when an attacker compromises a Web site and includes malicious code to abuse its visitor's computational resources. For example, an attacker can abuse the user's system networking capabilities to perform a Denial of Service (DoS) attack against third parties. What is more, computational resource abuse has not received widespread attention from the Web security community because most of the current specifications are focused on content and session properties such as isolation, confidentiality, and integrity.
Our primary goal is to study computational resource abuse and to advance the state of the art by providing a general attacker model, multiple case studies, a thorough analysis of available security mechanisms, and a new detection mechanism. To this end, we implemented and evaluated three scenarios where attackers use multiple browser APIs to abuse networking, local storage, and computation. Depending on the scenario, an attacker can use browsers to perform Denial of Service against third-party Web sites, create a network of browsers to store and distribute arbitrary data, or use browsers to establish anonymous connections in a manner similar to The Onion Router (Tor). Our analysis also includes a real-life resource abuse case found in the wild, i.e., CryptoJacking, where thousands of Web sites forced their visitors to perform crypto-currency mining without their consent. In the general case, the attacks presented in this thesis share the attacker model and two key characteristics: 1) the browser's end user remains oblivious to the attack, and 2) the attacker has to invest few resources in comparison to the resources obtained.
In addition to analysing the attacks, we show how existing and upcoming security enforcement mechanisms from Web security can hinder an attacker, and we discuss the drawbacks of these mechanisms. Moreover, we propose a novel detection approach based on browser API usage patterns. Finally, we evaluate the accuracy of our detection model, after training it with the real-life crypto-mining scenario, through a large-scale analysis of the most popular Web sites.
Coprocessor integration for real-time event processing in particle physics detectors
High-energy physics experiments today have higher energies, more accurate sensors, and more flexible means of data collection than ever before. Their rapid progress requires ever more computational power, and massively parallel hardware, such as graphics cards, holds the promise to provide this power at a much lower cost than traditional CPUs. Yet using this hardware requires new algorithms and new approaches to organizing data that can be difficult to integrate with existing software.
In this work, I explore the problem of using parallel algorithms within existing CPU-orientated frameworks and propose a compromise between the different trade-offs. The solution is a service that communicates with multiple event-processing pipelines, gathers data into batches, and submits them to hardware-accelerated parallel algorithms.
I integrate this service with Gaudi, a framework underlying the software environments of two of the four major experiments at the Large Hadron Collider. I examine the overhead the service adds to parallel algorithms. I perform a case study of using the service to run a parallel track reconstruction algorithm for the LHCb experiment's prospective VELO Pixel subdetector and look at the performance characteristics of using different data batch sizes. Finally, I put the findings into perspective within the context of the LHCb trigger's requirements.
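The batching idea in this abstract (a service that collects events from several pipelines, groups them into batches, and hands each batch to a parallel algorithm) can be sketched roughly as below. This is only an illustration in Python under stated assumptions: the class `BatchingService`, its methods, and the stand-in `algorithm` callable are hypothetical and do not reflect the Gaudi API or the thesis's implementation.

```python
class BatchingService:
    """Collects events from several pipelines and runs them in batches.

    Hypothetical sketch: `algorithm` stands in for a hardware-accelerated
    routine that takes a list of events and returns one result per event.
    """

    def __init__(self, batch_size, algorithm):
        self.batch_size = batch_size
        self.algorithm = algorithm
        self.pending = []   # (pipeline_id, event) pairs awaiting a batch
        self.results = {}   # pipeline_id -> list of results, in order

    def submit(self, pipeline_id, event):
        """Queue one event; run the batch once it is full."""
        self.pending.append((pipeline_id, event))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Run the algorithm on whatever is queued and route results back."""
        if not self.pending:
            return
        ids = [pid for pid, _ in self.pending]
        events = [event for _, event in self.pending]
        outputs = self.algorithm(events)
        for pid, out in zip(ids, outputs):
            self.results.setdefault(pid, []).append(out)
        self.pending.clear()


# Usage with a toy "algorithm" that doubles each event value.
service = BatchingService(batch_size=3, algorithm=lambda batch: [x * 2 for x in batch])
service.submit("pipeline_a", 1)
service.submit("pipeline_b", 2)
service.submit("pipeline_a", 3)   # third event fills the batch and triggers a run
print(service.results)
```

The batch size is the knob the abstract refers to: larger batches amortize the per-call overhead of the accelerator but make each pipeline wait longer for its result.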
Adaptive Multimedia Content Delivery for Scalable Web Servers
The phenomenal growth in the use of the World Wide Web often places a heavy load on networks and servers, threatening to increase Web server response time and raising scalability issues for both the network and the server. With the advances in optical networking and the increasing use of broadband technologies like cable modems and DSL, the server, and not the network, is more likely to be the bottleneck. Many clients are willing to receive a degraded, less resource-intensive version of the requested content as an alternative to connection failures. In this thesis, we present an adaptive content delivery system that transparently switches content depending on the load on the server in order to serve more clients. Our system is designed to work for dynamic Web pages and streaming multimedia traffic, which are not currently supported by other adaptive content approaches. We have designed a system that is capable of quantifying the load on the server and then performing the necessary adaptation. We designed a streaming MPEG server and client that can react to the server load by scaling the quality of the frames transmitted. The main benefits of our approach include transparent content switching for content adaptation, alleviation of server load through graceful degradation of server performance, and no need to modify existing server software, browsers or the HTTP protocol. We experimentally evaluate our adaptive server system and compare it with a non-adaptive server. We find that adaptive content delivery can support as much as 25% more static requests, 15% more dynamic requests and twice as many multimedia requests as a non-adaptive server. Our client-side experiments performed on the Internet show that the response time savings from our system are quite significant.
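The load-based content switching described in this abstract can be illustrated with a small sketch. It is written in Python under loud assumptions: the load value normalized to the range 0 to 1, the threshold values, and the variant names are all hypothetical, since the abstract does not specify how load is quantified or which content versions exist.

```python
def choose_variant(load, variants):
    """Pick the first content variant whose load ceiling covers `load`.

    `variants` is a list of (max_load, name) pairs sorted by ascending
    max_load. Both the thresholds and the names are made up for this
    example. If the load exceeds every ceiling, the most degraded
    (last) variant is served rather than refusing the request.
    """
    for max_load, name in variants:
        if load <= max_load:
            return name
    return variants[-1][1]


# Hypothetical variants: richer content is served only under light load.
VARIANTS = [(0.5, "full"), (0.8, "reduced"), (1.0, "minimal")]

for load in (0.3, 0.7, 0.95):
    print(load, choose_variant(load, VARIANTS))
```

Serving the lightest variant when the load exceeds every threshold mirrors the abstract's stated preference for degraded content over connection failures.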
A software approach to defeating side channels in last-level caches
We present a software approach to mitigate access-driven side-channel attacks
that leverage last-level caches (LLCs) shared across cores to leak information
between security domains (e.g., tenants in a cloud). Our approach dynamically
manages physical memory pages shared between security domains to disable
sharing of LLC lines, thus preventing "Flush-Reload" side channels via LLCs. It
also manages cacheability of memory pages to thwart cross-tenant "Prime-Probe"
attacks in LLCs. We have implemented our approach as a memory management
subsystem called CacheBar within the Linux kernel to intervene on such side
channels across container boundaries, as containers are a common method for
enforcing tenant isolation in Platform-as-a-Service (PaaS) clouds. Through
formal verification, principled analysis, and empirical evaluation, we show
that CacheBar achieves strong security with small performance overheads for
PaaS workloads.