12 research outputs found

    A distributed model for computing k-ring-based local descriptors of 3D meshes

    In order to facilitate 3D object processing, it is common to use high-level representations such as local descriptors, which are usually computed over defined neighborhoods. K-rings, one technique for defining such neighborhoods, is widely used by several methods. In this work, we propose a model for the distributed computation of local descriptors over 3D triangular meshes using the concept of k-rings. In our experiments, we measure the performance of our model on very large meshes, evaluating the speedup, the scalability, and the descriptor computation time. We show the optimal configuration of our model for the cluster we implemented and the linear growth of computation time with respect to the mesh size and the number of rings. For our tests, we used the Harris response, which describes the saliency of the object.
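    A k-ring here is the standard notion of the set of vertices reachable from a seed vertex within k edges of the mesh graph. As a minimal single-machine sketch (not the distributed model from the paper), the Python below builds vertex adjacency from a triangle list and collects the k-ring by breadth-first search; all function names are illustrative.

```python
from collections import deque

def adjacency_from_triangles(triangles):
    """Build the vertex adjacency map from a list of (i, j, k) triangles."""
    adjacency = {}
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            adjacency.setdefault(u, set()).add(v)
            adjacency.setdefault(v, set()).add(u)
    return adjacency

def k_ring(adjacency, seed, k):
    """Vertices reachable from `seed` within k edges (the k-ring neighborhood)."""
    visited = {seed}
    ring = set()
    frontier = deque([(seed, 0)])
    while frontier:
        vertex, depth = frontier.popleft()
        if depth == k:
            continue
        for neighbor in adjacency[vertex]:
            if neighbor not in visited:
                visited.add(neighbor)
                ring.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return ring

# Two triangles sharing an edge; the 1-ring of vertex 0 is its direct neighbors.
adj = adjacency_from_triangles([(0, 1, 2), (1, 2, 3)])
print(k_ring(adj, 0, 1))  # {1, 2}
print(k_ring(adj, 0, 2))  # {1, 2, 3}
```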

    Introducing Cloud Computing Topics in Curricula

    The demand for graduates with exposure to Cloud Computing is on the rise. For many educational institutions, the challenge is to decide how to incorporate appropriate cloud-based technologies into their curricula. In this paper, we describe our design and experiences of integrating Cloud Computing components into seven third/fourth-year undergraduate-level information systems, computer science, and general science courses related to large-scale data processing and analysis at the University of Queensland, Australia. For each course, we aimed to find the best-available and most cost-effective cloud technologies that fit well into the existing curriculum. The cloud-related technologies discussed in this paper include open-source distributed computing tools such as Hadoop, Mahout, and Hive, as well as cloud services such as Windows Azure and Amazon Elastic Compute Cloud (EC2). We anticipate that our experiences will prove useful and of interest to fellow academics wanting to introduce Cloud Computing modules into existing courses.

    Challenges for MapReduce in Big Data

    In the Big Data community, MapReduce has been seen as one of the key enabling approaches for meeting the continuously increasing demands on computing resources imposed by massive data sets. The reason for this is the high scalability of the MapReduce paradigm, which allows for massively parallel and distributed execution over a large number of computing nodes. This paper identifies MapReduce issues and challenges in handling Big Data with the objective of providing an overview of the field, facilitating better planning and management of Big Data projects, and identifying opportunities for future research. The identified challenges are grouped into four main categories corresponding to Big Data task types: data storage (relational databases and NoSQL stores), Big Data analytics (machine learning and interactive analytics), online processing, and security and privacy. Moreover, current efforts aimed at improving and extending MapReduce to address the identified challenges are presented. Consequently, by identifying issues and challenges MapReduce faces when handling Big Data, this study encourages future Big Data research.
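    For readers new to the paradigm, the sketch below illustrates the map and reduce phases with the canonical word-count example on a single machine; an actual MapReduce system such as Hadoop distributes the map tasks, the shuffle, and the reduce tasks across the cluster.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each word (the shuffle groups by key)."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["big data needs scalable tools", "map reduce scales with data"]
print(reduce_phase(map_phase(docs)))  # {'big': 1, 'data': 2, ...}
```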

    Hillview: A trillion-cell spreadsheet for big data

    Hillview is a distributed spreadsheet for browsing very large datasets that cannot be handled by a single machine. As a spreadsheet, Hillview provides a high degree of interactivity that permits data analysts to explore information quickly along many dimensions while switching visualizations on a whim. To provide the required responsiveness, Hillview introduces visualization sketches, or vizketches, as a simple idea for producing compact data visualizations. Vizketches combine algorithmic techniques for data summarization with computer graphics principles for efficient rendering. While simple, vizketches are effective at scaling the spreadsheet by parallelizing computation, reducing communication, providing progressive visualizations, and offering precise accuracy guarantees. Using Hillview running on eight servers, we can navigate and visualize datasets of tens of billions of rows and trillions of cells, far beyond the published capabilities of competing systems.
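    The abstract does not spell out how a vizketch works internally; as a hedged illustration of the general sketch-summarization idea it builds on, the Python below computes a fixed-size histogram per data partition and merges the partial results, which is what makes parallel and progressive rendering possible. Bucket counts and function names are invented for the example.

```python
def partial_histogram(values, lo, hi, buckets):
    """Summarize one data partition into fixed-size bucket counts."""
    counts = [0] * buckets
    width = (hi - lo) / buckets
    for v in values:
        idx = min(int((v - lo) / width), buckets - 1)
        counts[idx] += 1
    return counts

def merge(h1, h2):
    """Merging partial results is element-wise addition, so partitions can be
    processed in parallel and combined progressively as they arrive."""
    return [a + b for a, b in zip(h1, h2)]

partitions = [[0.1, 0.4, 0.9], [0.2, 0.2, 0.7]]
result = [0] * 4
for part in partitions:
    result = merge(result, partial_histogram(part, 0.0, 1.0, 4))
print(result)  # [3, 1, 1, 1]
```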

    Vispark: GPU-accelerated distributed visual computing using Spark

    With the growing need for big-data processing in diverse application domains, MapReduce (e.g., Hadoop) has become one of the standard computing paradigms for large-scale computing on a cluster system. Despite its popularity, the current MapReduce framework suffers from inflexibility and inefficiency inherent to its programming model and system architecture. In order to address these problems, we propose Vispark, a novel extension of Spark for GPU-accelerated MapReduce processing of array-based scientific computing and image processing tasks. Vispark provides an easy-to-use, Python-like high-level language syntax and a novel data abstraction for MapReduce programming on a GPU cluster system. Vispark introduces a programming abstraction for accessing neighbor data in the mapper function, which greatly simplifies many image processing tasks using MapReduce by reducing memory footprints and bypassing the reduce stage. Vispark provides socket-based halo communication that synchronizes data partitions transparently to the users, which is necessary for many scientific computing problems in distributed systems. Vispark also provides domain-specific functions and language support specifically designed for high-performance computing and image processing applications. We demonstrate the performance of our prototype system on several visual computing tasks, such as image processing, volume rendering, K-means clustering, and heat transfer simulation.
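    Vispark's actual syntax is not reproduced in this abstract. As a rough, single-process illustration of why neighbor access in a mapper requires halo (ghost-cell) exchange between partitions, the sketch below pads each 1-D partition with one value from its neighbors before applying a 3-point stencil; all names are illustrative.

```python
def exchange_halo(partitions):
    """Pad each partition with one ghost value copied from its neighbors,
    so a stencil mapper can read across partition boundaries."""
    padded = []
    for i, part in enumerate(partitions):
        left = partitions[i - 1][-1] if i > 0 else part[0]
        right = partitions[i + 1][0] if i < len(partitions) - 1 else part[-1]
        padded.append([left] + part + [right])
    return padded

def stencil_mapper(padded_part):
    """3-point average over the interior cells of one padded partition."""
    return [(padded_part[j - 1] + padded_part[j] + padded_part[j + 1]) / 3.0
            for j in range(1, len(padded_part) - 1)]

data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print([stencil_mapper(p) for p in exchange_halo(data)])
```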

    Prov-Vis: visualization of large-scale experiment data through provenance

    Large-scale scientific experiments are often organized as a composition of several computational tasks linked through a flow of activities. This flow of activities is called a scientific workflow. The data that flow along the workflow are often transferred from a desktop computer to a high-performance environment, such as a cluster, and then to a visualization environment. Keeping track of the data flow is a challenge for provenance support in high-performance Scientific Workflow Management Systems (SWfMS). After a scientific experiment is completed, a scientist often has to manually select and analyze its data, for example by checking inputs and outputs across the various computational activities that make up the experiment. The goal of this project is to propose a provenance data management system that describes the production and consumption relations between artifacts, such as files, and the computational tasks that compose the experiment. The project proposes a query interface that allows the scientist to search provenance data in a high-performance environment and select the output to be visualized using their own browser or a remote visualization environment.
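    As a minimal illustration of such a model (not Prov-Vis itself), the sketch below records which artifacts each task consumed and produced, and answers a simple query for everything derived from a given input; all class, task, and file names are assumptions.

```python
from collections import defaultdict

class ProvenanceStore:
    """Record which artifacts (e.g., files) each task consumed and produced."""

    def __init__(self):
        self.consumed = defaultdict(set)   # task -> input artifacts
        self.produced = defaultdict(set)   # task -> output artifacts

    def record(self, task, inputs, outputs):
        self.consumed[task].update(inputs)
        self.produced[task].update(outputs)

    def downstream(self, artifact):
        """All artifacts transitively derived from `artifact`."""
        result, frontier = set(), {artifact}
        while frontier:
            current = frontier.pop()
            for task, inputs in self.consumed.items():
                if current in inputs:
                    new = self.produced[task] - result
                    result |= new
                    frontier |= new
        return result

store = ProvenanceStore()
store.record("simulate", {"params.txt"}, {"raw.dat"})
store.record("render", {"raw.dat"}, {"frame.png"})
print(store.downstream("params.txt"))  # {'raw.dat', 'frame.png'}
```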

    Interactive visualization of big data leveraging databases for scalable computation

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (pages 55-57). Modern database management systems (DBMS) have been designed to efficiently store, manage and perform computations on massive amounts of data. In contrast, many existing visualization systems do not scale seamlessly from small data sets to enormous ones. We have designed a three-tiered visualization system called ScalaR to deal with this issue. ScalaR dynamically performs resolution reduction when the expected result of a DBMS query is too large to be effectively rendered on existing screen real estate. Instead of running the original query, ScalaR inserts aggregation, sampling or filtering operations to reduce the size of the result. This thesis presents the design and implementation of ScalaR, and shows results for two example applications, visualizing earthquake records and satellite imagery data, stored in SciDB as the back-end DBMS. by Leilani Marie Battle. S.M.
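    The abstract describes ScalaR rewriting queries with aggregation, sampling, or filtering when the expected result is too large to render. A hedged sketch of that decision is shown below using generic SQL strings; the actual system targets SciDB and its operators, so the rewrites and names here are only illustrative.

```python
def reduce_resolution(query, expected_rows, max_renderable_rows, mode="sample"):
    """If the estimated result exceeds what the screen can show, rewrite the
    query to shrink it instead of running it as-is."""
    if expected_rows <= max_renderable_rows:
        return query
    if mode == "sample":
        # Postgres-style random sampling; the fraction targets the renderable size.
        fraction = max_renderable_rows / expected_rows
        return f"SELECT * FROM ({query}) t WHERE random() < {fraction:.6f}"
    if mode == "limit":
        return f"SELECT * FROM ({query}) t LIMIT {max_renderable_rows}"
    raise ValueError(f"unknown mode: {mode}")

print(reduce_resolution("SELECT lon, lat, magnitude FROM quakes",
                        expected_rows=50_000_000,
                        max_renderable_rows=1_000_000))
```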

    The experience of Big Data visualization in virtual reality

    This thesis sought to determine whether virtual reality is a suitable environment for visualizing Big Data, that is, whether a more experiential environment would enhance the understanding of datasets classified as Big Data. In connection with this research question, the thesis examined how the user experience of data visualization differs between virtual reality and a desktop workstation environment, and how users perceive the experience of data visualization in virtual reality. To answer these questions, the thesis began with a background survey of the concept of Big Data and of earlier virtual-reality-based Big Data visualization systems. The reported features of earlier visualization systems were compared against the concept of Big Data, and it was observed that earlier solutions had poorly met the requirements implied by the concept and did not provide a basis for the visualizations implemented in this thesis. In the implementation phase, three visualization sets were created, each with separate demos for virtual reality and for the desktop environment. The demos were designed to follow the defining characteristics of Big Data, although not all of them were achieved. Given the limited resources of the thesis, the greatest challenges posed by Big Data were found to be making use of a sufficiently large volume of data and finding databases suited to use that matches the defining characteristics of Big Data. A test plan was created for the implemented test systems, and user testing with 10 participants was carried out according to it in order to investigate the experience of data visualization between equivalent virtual reality and desktop versions. In the user studies, many users experienced the virtual reality visualizations as a more immersive overall experience, and the environment allowed better concentration on the content of the visualization. However, participants felt that virtual reality visualizations should be built to exploit the possibilities that virtual reality offers, so that the use of a different environment feels meaningful. In addition, the suitability of the interaction techniques used in virtual reality and the smoothness of using the system stood out as notable factors.

    Doctor of Philosophy

    Dataflow pipeline models are widely used in visualization systems. Despite recent advancements in parallel architecture, most systems still support only a single CPU or a small collection of CPUs, such as an SMP workstation. Even for systems that are specifically tuned towards parallel visualization, their execution models only provide support for data-parallelism while ignoring task-parallelism and pipeline-parallelism. With the recent popularization of machines equipped with multicore CPUs and multi-GPU units, these visualization systems are undoubtedly falling further behind in reaching maximum efficiency. On the other hand, there exist several libraries that can schedule program executions on multiple CPUs and/or multiple GPUs. However, due to differences in executing a task graph and a pipeline, along with their APIs being considerably low-level, it remains a challenge to integrate these run-time libraries into current visualization systems. Thus, there is a need for a redesigned dataflow architecture to fully support and exploit the power of highly parallel machines in large-scale visualization. The new design must be able to schedule executions on heterogeneous platforms while at the same time supporting arbitrarily large datasets through the use of streaming data structures. The primary goal of this dissertation is to develop a parallel dataflow architecture for streaming large-scale visualizations. The framework includes support for platforms ranging from multicore processors to clusters consisting of thousands of CPUs and GPUs. We achieve this in our system by introducing the notion of Virtual Processing Elements and Task-Oriented Modules, along with a highly customizable scheduler that controls the assignment of tasks to elements dynamically. This creates an intuitive way to maintain multiple CPU/GPU kernels yet still provide coherency and synchronization across module executions. We have implemented these techniques in HyperFlow, which consists of an API with all basic dataflow constructs described in the dissertation and a distributed run-time library that can be used to deploy those pipelines on multicore, multi-GPU, and cluster-based platforms.
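    HyperFlow's real API is not given in this abstract; the toy Python below only conveys the general idea of dynamically handing ready tasks to a pool of heterogeneous "processing elements" (e.g., CPU and GPU contexts), with every name invented for the example.

```python
import queue
import threading

def run_tasks(tasks, elements):
    """Dynamically assign tasks to worker 'processing elements'; each worker
    pulls the next ready task as soon as it becomes free."""
    ready = queue.Queue()
    for task in tasks:
        ready.put(task)

    def worker(element):
        while True:
            try:
                task = ready.get_nowait()
            except queue.Empty:
                return
            task(element)          # the task decides how to use its element
            ready.task_done()

    threads = [threading.Thread(target=worker, args=(e,)) for e in elements]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

run_tasks([lambda e, i=i: print(f"task {i} ran on {e}") for i in range(4)],
          elements=["cpu0", "cpu1", "gpu0"])
```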