
    User-centric Visualization of Data Provenance

    The need to understand and track files (and, inherently, data) in cloud computing systems is in high demand. Over the past years, logs and graph-based data representations have been the main method for tracking and relating information to cloud users. While these are still in use, tracking and relating information with ‘data provenance’ (i.e. the chronicle and derivation history of data, recorded as metadata) is the new trend for cloud users. However, there is still much room for improving the representation of data activities in cloud systems for end-users. In this thesis, we propose “UVisP (User-centric Visualization of Data Provenance with Gestalt)”, a novel user-centric visualization technique for data provenance. The technique aims to provide the missing link between data movements in cloud computing environments and end-users’ uncertain queries about the security and life cycle of their files within cloud systems. The proof of concept for UVisP integrates D3 (an open-source visualization library) with the Gestalt theory of perception to provide a range of user-centric visualizations. UVisP allows users to transform and visualize provenance (logs) with implicit prior knowledge of the Gestalt theory of perception. We present the initial development of the UVisP technique, and our results show that the integration of Gestalt principles and the presence of ‘perceptual key(s)’ in provenance visualization enhances end-users’ visualizing capabilities, letting them extract useful knowledge and understand the visualizations better. The technique also enables end-users to develop their own methods and preferences when viewing different visualizations: for example, prior knowledge of the Gestalt theory of perception, combined with the types of visualizations offered, yields a user-centric experience across different visualizations. We also outline significant future work that will help profile new user-centric visualizations for cloud users.
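The abstract above describes transforming provenance logs into user-centric visualizations rendered with D3. As a rough illustration only (the thesis's actual pipeline is not given here, so the log format and field names below are assumptions), a provenance log could be reduced to the node-link JSON structure that D3's force-directed layouts consume:

```python
import json

# Hypothetical provenance entries: (timestamp, actor, action, file).
# These fields are illustrative; the UVisP log schema is not specified
# in the abstract.
log = [
    ("2015-03-01T10:00", "alice", "created", "report.docx"),
    ("2015-03-01T10:05", "sync-service", "copied", "report.docx"),
    ("2015-03-02T09:30", "bob", "modified", "report.docx"),
]

def to_node_link(entries):
    """Reduce log entries to the node-link form used by D3 force layouts."""
    nodes, links, index = [], [], {}
    for _, actor, action, target in entries:
        for name in (actor, target):
            if name not in index:          # register each actor/file once
                index[name] = len(nodes)
                nodes.append({"id": name})
        links.append({"source": index[actor],
                      "target": index[target],
                      "action": action})   # each logged action becomes an edge
    return {"nodes": nodes, "links": links}

graph = to_node_link(log)
print(json.dumps(graph, indent=2))
```

Each actor and file becomes a node and each logged action an edge; a D3 client could then lay out and style the resulting JSON according to Gestalt grouping principles.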

    Scaling out Big Data Distributed Pricing in Gaming Industry

    Game companies have millions of customers, billions of transactions and petabytes of other data related to game events. The vast volume and complexity of this data make it practically impossible to process and analyze with traditional relational database management systems (RDBMSs). Such data can be identified as Big Data, and handling it efficiently requires taking multiple issues into account. These problems are more straightforward to address when developing a completely new system, which can be built from the start on the techniques and platforms that support big data handling. Modifying an existing system to accommodate big data volumes, however, raises additional concerns. This thesis starts by clarifying the definition of 'big data'. Scalability and parallelism are key factors for handling big data, so both are explained and some common approaches to achieving them are reviewed. Next, different tools and platforms for parallel programming are presented. The relevance of big data to the gaming industry is briefly explained, as are the different monetization models that games use. Furthermore, price elasticity of demand is explained to give a better understanding of a Dynamic Pricing Engine and what it does. In this thesis, I solve a bottleneck that emerges in data transfer and processing when introducing big data to an existing system, the Dynamic Pricing Engine Apprien, by using parallel programming to scale the system. Spark is used to fetch and process the distributed data. The main focus is on the impact of parallel programming in comparison to the current solution, which is implemented with PHP and MySQL. Furthermore, Spark implementations are run against different data storage solutions, namely MySQL and Hadoop's HDFS, and their performance is compared.
    The results of the Spark implementation show a significant improvement in processing time. However, the importance of choosing the right data storage for fetching the data cannot be overstated, as the fetch speed varies widely between storage solutions.
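The abstract introduces price elasticity of demand as background for the Dynamic Pricing Engine. As a minimal sketch (the numbers and the arc-elasticity formulation below are illustrative assumptions, not taken from the thesis), elasticity between two observed price points can be computed as:

```python
def price_elasticity(p0, p1, q0, q1):
    """Arc (midpoint) elasticity of demand between two price/quantity points."""
    pct_q = (q1 - q0) / ((q1 + q0) / 2)  # percentage change in quantity
    pct_p = (p1 - p0) / ((p1 + p0) / 2)  # percentage change in price
    return pct_q / pct_p

# Illustrative numbers only: raising an in-game item's price from 0.99 to
# 1.99 drops weekly purchases from 1000 to 400.
e = price_elasticity(0.99, 1.99, 1000, 400)
# |e| > 1 means demand is elastic here: the price increase lowers revenue
# (0.99 * 1000 = 990 versus 1.99 * 400 = 796), so a pricing engine would
# prefer the lower price point for this item.
```

A dynamic pricing engine would estimate such elasticities per item from transaction history, which is where the big-data processing described in the abstract comes in.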