
    User-centric Visualization of Data Provenance

    The need to understand and track files (and, inherently, data) in cloud computing systems is in high demand. Over the past years, logs and graph-based data representations have been the main method for tracking and relating information to cloud users. While these are still in use, tracking and relating information with ‘data provenance’ (i.e. the chronicle and derivation history of data, recorded as metadata) is the new trend for cloud users. However, there is still much room for improving the representation of data activities in cloud systems for end-users. In this thesis, we propose “UVisP (User-centric Visualization of Data Provenance with Gestalt)”, a novel user-centric visualization technique for data provenance. The technique aims to provide the missing link between data movements in cloud computing environments and end-users’ uncertain queries about the security and life cycle of their files within cloud systems. The proof of concept for UVisP integrates D3 (an open-source visualization library) with the Gestalt theory of perception to provide a range of user-centric visualizations. UVisP allows users to transform and visualize provenance (logs) with implicit prior knowledge of the Gestalt theory of perception. We present the initial development of the UVisP technique, and our results show that the integration of Gestalt principles and the presence of ‘perceptual key(s)’ in provenance visualization enhances end-users’ visualizing capabilities, letting them extract useful knowledge and understand the visualizations better. The technique also enables end-users to develop their own methods and preferences when viewing different visualizations: for example, prior knowledge of the Gestalt theory of perception, combined with the types of visualizations offered, yields a user-centric experience across different visualizations. We also outline significant future work that will help profile new user-centric visualizations for cloud users.
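The abstract above describes transforming provenance logs into user-centric visualizations rendered with D3. As a rough illustration only (the thesis's actual pipeline is not given here, so the log format and field names below are assumptions), a provenance log could be reduced to the node-link JSON structure that D3's force-directed layouts consume:

```python
import json

# Hypothetical provenance entries: (timestamp, actor, action, file).
# These fields are illustrative; the UVisP log schema is not specified
# in the abstract.
log = [
    ("2015-03-01T10:00", "alice", "created", "report.docx"),
    ("2015-03-01T10:05", "sync-service", "copied", "report.docx"),
    ("2015-03-02T09:30", "bob", "modified", "report.docx"),
]

def to_node_link(entries):
    """Reduce log entries to the node-link form used by D3 force layouts."""
    nodes, links, index = [], [], {}
    for _, actor, action, target in entries:
        for name in (actor, target):
            if name not in index:          # register each actor/file once
                index[name] = len(nodes)
                nodes.append({"id": name})
        links.append({"source": index[actor],
                      "target": index[target],
                      "action": action})   # each logged action becomes an edge
    return {"nodes": nodes, "links": links}

graph = to_node_link(log)
print(json.dumps(graph, indent=2))
```

Each actor and file becomes a node and each logged action an edge; a D3 client could then lay out and style the resulting JSON according to Gestalt grouping principles.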

    Scaling out Big Data Distributed Pricing in Gaming Industry

    Game companies have millions of customers, billions of transactions and petabytes of other data related to game events. The vast volume and complexity of this data make it practically impossible to process and analyze with traditional relational database management systems (RDBMSs). Such data can be identified as Big Data, and handling it efficiently requires taking multiple issues into account. These problems are more straightforward to address when developing a completely new system, which can be built from the start on the techniques and platforms that support big data handling. Modifying an existing system to accommodate big data volumes, however, raises additional concerns. This thesis starts by clarifying the definition of 'big data'. Scalability and parallelism are key factors for handling big data, so both are explained and some common approaches to achieving them are reviewed. Next, different tools and platforms for parallel programming are presented. The relevance of big data to the gaming industry is briefly explained, as are the different monetization models that games use. Furthermore, price elasticity of demand is explained to give a better understanding of a Dynamic Pricing Engine and what it does. In this thesis, I solve a bottleneck that emerges in data transfer and processing when introducing big data to an existing system, the Dynamic Pricing Engine Apprien, by using parallel programming to scale the system. Spark is used to fetch and process the distributed data. The main focus is on the impact of parallel programming in comparison to the current solution, which is implemented with PHP and MySQL. Furthermore, Spark implementations are run against different data storage solutions, namely MySQL and Hadoop's HDFS, and their performance is compared.
    The results of the Spark implementation show a significant improvement in processing time. However, the importance of choosing the right data storage for fetching the data cannot be overstated, as the fetch speed varies widely between storage solutions.
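The abstract introduces price elasticity of demand as background for the Dynamic Pricing Engine. As a minimal sketch (the numbers and the arc-elasticity formulation below are illustrative assumptions, not taken from the thesis), elasticity between two observed price points can be computed as:

```python
def price_elasticity(p0, p1, q0, q1):
    """Arc (midpoint) elasticity of demand between two price/quantity points."""
    pct_q = (q1 - q0) / ((q1 + q0) / 2)  # percentage change in quantity
    pct_p = (p1 - p0) / ((p1 + p0) / 2)  # percentage change in price
    return pct_q / pct_p

# Illustrative numbers only: raising an in-game item's price from 0.99 to
# 1.99 drops weekly purchases from 1000 to 400.
e = price_elasticity(0.99, 1.99, 1000, 400)
# |e| > 1 means demand is elastic here: the price increase lowers revenue
# (0.99 * 1000 = 990 versus 1.99 * 400 = 796), so a pricing engine would
# prefer the lower price point for this item.
```

A dynamic pricing engine would estimate such elasticities per item from transaction history, which is where the big-data processing described in the abstract comes in.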