36 research outputs found

    Processing Structured Data Streams

    Get PDF
    A large amount of data is generated daily from sources such as social networks, recommendation systems, and geolocation systems, and this information tends to grow exponentially every year. Companies have discovered that processing these data can yield useful conclusions for decision-making and for detecting and resolving problems more efficiently, for instance through the study of trends, habits, and customs of the population. The information provided by these sources typically consists of a non-structured, continuous data flow in which the relations among data elements form graph structures.
    Inevitably, processing performance degrades progressively as the size of the data grows. For this reason, non-structured information is usually handled by considering only the most recent data and discarding the rest, which is assumed to be irrelevant when drawing conclusions. However, this approach is not enough for sources that provide graph-structured data, where spatial features must be considered alongside temporal ones. Spatial features refer to the relationships among data elements; they matter, for example, in marketing techniques, which require information on the location of users and their possible needs, or in the detection of diseases, which uses data about genetic relationships among subjects or their geographic scope.
    It is worth highlighting three main contributions of this dissertation. First, we provide a comparative study of seven of the most common processing platforms for working with huge graphs and the languages used to query them. The study measures query performance in terms of execution time, and the syntactic complexity of the languages according to three parameters: number of characters, number of operators, and number of internal variables. We build on this study to choose the most suitable technology for our proposal. Second, we propose three methods to reduce the set of data processed by a query when working with large graphs, namely spatial, temporal, and random approximations. These methods are based on Approximate Query Processing techniques and consist of discarding the information considered not relevant to the query. The reduction is performed online with the processing and takes both spatial and temporal aspects of the data into account. Since discarding information from the source data may decrease the validity of the results, we also define the transformation error obtained with these methods in terms of accuracy, precision, and recall. Finally, we present a preprocessing algorithm, called the SDR algorithm, which also reduces the set of data to be processed, but without compromising the accuracy of the results. It computes a subgraph of the source graph that contains only the information relevant to a given query. Being a preprocessing step, it runs offline before the actual processing begins. In addition, an incremental version of the algorithm updates the subgraph as new information arrives in the system.
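    The offline subgraph-reduction idea can be sketched as follows. This is a hypothetical illustration, not the dissertation's actual SDR algorithm: it simply keeps every node reachable from the query's seed nodes within a hop limit and discards the rest before the query runs.

```python
from collections import deque

def relevant_subgraph(adjacency, seeds, max_hops):
    """Keep only nodes reachable from the query's seed nodes within
    max_hops; everything else is discarded before query processing."""
    kept = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for neighbor in adjacency.get(node, ()):
            if neighbor not in kept:
                kept.add(neighbor)
                frontier.append((neighbor, depth + 1))
    # Return the induced subgraph on the kept nodes
    return {n: [m for m in adjacency.get(n, ()) if m in kept] for n in kept}

graph = {"a": ["b"], "b": ["c"], "c": ["d"], "d": [], "e": ["a"]}
sub = relevant_subgraph(graph, seeds=["a"], max_hops=2)
# nodes a, b, c survive; d (3 hops away) and e (unreachable from a) are dropped
```

    An incremental variant, as the abstract describes, would update `kept` as new edges arrive instead of recomputing the traversal from scratch.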

    Service Abstractions for Scalable Deep Learning Inference at the Edge

    Get PDF
    Deep-learning-driven intelligent edge has already become a reality: millions of mobile, wearable, and IoT devices analyze real-time data and transform it into actionable insights on-device. Typical approaches for optimizing deep learning inference focus mostly on accelerating the execution of individual inference tasks, without considering the contextual correlation unique to edge environments or the statistical nature of learning-based computation. Specifically, they treat inference workloads as individual black boxes and apply canonical system optimization techniques, developed over the last few decades, to handle them as yet another type of computation-intensive application. As a result, deep learning inference on edge devices still faces the ever-increasing challenges of customization to edge device heterogeneity, fuzzy computation redundancy between inference tasks, and end-to-end deployment at scale. In this thesis, we propose the first framework that automates and scales the end-to-end process of deploying efficient deep learning inference from the cloud to heterogeneous edge devices. The framework consists of a series of service abstractions that handle DNN model tailoring, model indexing and query, and computation reuse for runtime inference, respectively. Together, these services bridge the gap between deep learning training and inference, eliminate computation redundancy during inference execution, and lower the barrier for deep learning algorithm and system co-optimization. To build efficient and scalable services, we take a unique algorithmic approach that harnesses the semantic correlation among learning-based computations.
    Rather than viewing individual tasks as isolated black boxes, we optimize them collectively in a white-box approach, proposing primitives to formulate the semantics of deep learning workloads, algorithms to assess their hidden correlation (in terms of input data, neural network models, and deployment trials), and methods to merge common processing steps to minimize redundancy.
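    The computation-reuse idea can be illustrated with a minimal sketch. The class and the quantization-based "semantic fingerprint" below are hypothetical simplifications, not the thesis's actual service abstractions: a new input that is a near-duplicate of an earlier one is served from a cache instead of triggering a redundant model execution.

```python
import hashlib

class ReuseCache:
    """Sketch of inference computation reuse: near-duplicate inputs
    collide on the same fingerprint and share one cached result."""
    def __init__(self, model, grid=0.5):
        self.model = model
        self.grid = grid    # coarseness of the fingerprint quantization
        self.cache = {}
        self.hits = 0

    def _fingerprint(self, features):
        # Quantize features so semantically close inputs share a key.
        coarse = tuple(round(x / self.grid) for x in features)
        return hashlib.sha1(repr(coarse).encode()).hexdigest()

    def infer(self, features):
        key = self._fingerprint(features)
        if key not in self.cache:
            self.cache[key] = self.model(features)   # full inference
        else:
            self.hits += 1                           # redundancy eliminated
        return self.cache[key]

calls = []
model = lambda x: (calls.append(x), sum(x))[1]   # stand-in for a DNN
svc = ReuseCache(model)
a = svc.infer([1.0, 2.0])
b = svc.infer([1.01, 2.02])   # near-duplicate input: served from the cache
```

    A production system would use learned embeddings rather than naive quantization, but the trade-off is the same: a looser notion of equivalence eliminates more redundancy at some risk to accuracy.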

    Semantic Systems. The Power of AI and Knowledge Graphs

    Get PDF
    This open access book constitutes the refereed proceedings of the 15th International Conference on Semantic Systems, SEMANTiCS 2019, held in Karlsruhe, Germany, in September 2019. The 20 full papers and 8 short papers presented in this volume were carefully reviewed and selected from 88 submissions. They cover topics such as: web semantics and linked (open) data; machine learning and deep learning techniques; semantic information management and knowledge integration; terminology, thesaurus and ontology management; data mining and knowledge discovery; semantics in blockchain and distributed ledger technologies.

    Towards Predictive Rendering in Virtual Reality

    Get PDF
    Generating predictive images, i.e., images representing radiometrically correct renditions of reality, has been a longstanding goal in computer graphics. The exactness of such images is extremely important for Virtual Reality applications like Virtual Prototyping, where users must make decisions impacting large investments based on the simulated images. Unfortunately, the generation of predictive imagery remains an unsolved problem for manifold reasons, especially when real-time restrictions apply. First, existing scenes used for rendering are not modeled accurately enough to create predictive images. Second, even with huge computational effort, existing rendering algorithms cannot produce radiometrically correct images. Third, current display devices must convert rendered images into some low-dimensional color space, which prohibits the display of radiometrically correct images. Overcoming these limitations is the focus of current state-of-the-art research, and this thesis contributes to that task. It first briefly introduces the necessary background and identifies the steps required for real-time predictive image generation. Then, existing techniques targeting these steps are presented and their limitations pointed out. To solve some of the remaining problems, novel techniques are proposed, covering various steps in the predictive image generation process, from accurate scene modeling through efficient data representation to high-quality, real-time rendering. A special focus of this thesis lies on the real-time generation of predictive images using bidirectional texture functions (BTFs), i.e., very accurate representations of spatially varying surface materials.
    The techniques proposed in this thesis enable efficient handling of BTFs by compressing the huge amount of data contained in this material representation, applying BTFs to geometric surfaces using texture and BTF synthesis techniques, and rendering BTF-covered objects in real time. Further approaches target the inclusion of real-time global illumination effects and more efficient rendering via novel level-of-detail representations for geometric objects. Finally, the thesis assesses the rendering quality achievable with BTF materials, indicating a significant increase in realism but also confirming the problems that remain to be solved before truly predictive image generation is achieved.

    Human-Data Interaction in Large and High-Dimensional Data

    Get PDF
    Human-Data Interaction (HDI) is an emerging field that studies how humans make sense of large and complex data. Visual analytics tools are a central component of this sensemaking process. However, the growth of big data has affected their performance, resulting in latency in interactivity or long query-response times, both of which degrade one's ability to do knowledge discovery. To address these challenges, a new paradigm of data exploration has appeared, in which a rapid but inaccurate result is followed by a succession of gradually more accurate answers. As the primary objective of this thesis, we investigated how this incremental latency affects the quantity and quality of knowledge discovery in an HDI system. We developed a big data visualization tool and studied 40 participants in a think-aloud experiment in which they used the tool to explore a large, high-dimensional dataset. Our findings indicate that although incremental latency reduces the rate of discovery generation, it affects neither one's chance of making a discovery per generated visualization nor the correctness of those discoveries. However, in the presence of latency, utilizing contextual layers such as a map results in fewer mistakes, while exploring higher-dimensional visualizations leads to more incorrect discoveries. As the secondary objective, we investigated which strategies improved a subject's performance. Our observations suggest that successful participants explore the data methodically, first examining simple and familiar concepts and then gradually adding complexity to the visualizations until they build a correct mental model of the inner workings of the tool. With this model, they generate several discovery patterns, each acting as a blueprint for forming new insights. Ultimately, some participants combined their discovery patterns to create multifaceted data-driven stories.
    Based on these observations, we propose design guidelines for developing HDI platforms for large and high-dimensional data.
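    The approximate-then-refine query paradigm the abstract describes can be sketched in a few lines. This is a generic illustration, not the thesis's actual tool: an aggregate is computed chunk by chunk, yielding a quick rough estimate immediately and progressively more accurate ones as more data is processed.

```python
import random

def progressive_mean(data, chunk_size):
    """Yield a fast approximate answer first, then progressively more
    accurate estimates as further chunks are folded in."""
    total, count = 0.0, 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        total += sum(chunk)
        count += len(chunk)
        yield total / count   # current estimate, refined every chunk

random.seed(0)
data = [random.uniform(0, 100) for _ in range(10_000)]
estimates = list(progressive_mean(data, chunk_size=1000))
true_mean = sum(data) / len(data)
# the final estimate equals the exact mean; earlier ones are approximations
```

    In a visualization front end, each yielded estimate would redraw the chart, which is precisely the source of the incremental latency the study measures.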

    The 1995 Goddard Conference on Space Applications of Artificial Intelligence and Emerging Information Technologies

    Get PDF
    This publication comprises the papers presented at the 1995 Goddard Conference on Space Applications of Artificial Intelligence and Emerging Information Technologies, held at the NASA/Goddard Space Flight Center, Greenbelt, Maryland, on May 9-11, 1995. The purpose of this annual conference is to provide a forum in which current research and development directed at space applications of artificial intelligence can be presented and discussed.

    A Proposed NoLAP Architecture for an Academic Decision Support System

    Get PDF
    Master's dissertation, Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2020. This work proposes migrating the architecture of the University of Brasília's academic Data Warehouse, developed on a relational database (the ROLAP architecture, as it is known in the literature), to a NoSQL database approach, specifically column-family NoSQL databases. The approaches considered draw on the most relevant state of the art on the topic, such as migrations of Data Warehouse systems to column-family databases like HBase, combined with solutions for processing large volumes of data on a server cluster, such as Apache Hadoop. The migration stems from the need to study new architectural paradigms for the Academic Decision Support System in light of the problems raised by the massive growth in the volume of data generated by the information systems of the University of Brasília (UnB).
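    The core modeling shift in a ROLAP-to-column-family migration can be sketched as follows. The row-key layout, family names, and fields below are hypothetical illustrations, not the dissertation's actual schema: a fact-table row is denormalized into one wide row under a composite key, so queries become key-prefix scans instead of joins.

```python
# Illustrative only: mapping a relational fact row to an HBase-style
# column-family layout with a composite row key. All names hypothetical.
def to_wide_row(fact):
    row_key = f"{fact['student_id']}|{fact['term']}"
    return row_key, {
        # measures go in one column family...
        "facts": {"credits": fact["credits"], "gpa": fact["gpa"]},
        # ...denormalized dimension attributes in another
        "dims": {"program": fact["program"], "campus": fact["campus"]},
    }

fact = {"student_id": "s42", "term": "2020-1", "credits": 24,
        "gpa": 3.7, "program": "CS", "campus": "Darcy Ribeiro"}
key, row = to_wide_row(fact)
# key == "s42|2020-1"; a per-student history is a scan over the "s42|" prefix
```

    The design choice is the classic one for column stores: pay in storage (duplicated dimension values) to avoid join cost at query time on the cluster.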

    Seventh Biennial Report: June 2003 - March 2005

    No full text

    Geographic information extraction from texts

    Get PDF
    A large volume of unstructured text containing valuable geographic information is available online. This information – provided implicitly or explicitly – is useful not only for scientific studies (e.g., the spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although substantial progress has been made in geographic information extraction from texts, unsolved challenges and issues remain, ranging from methods, systems, and data to applications and privacy. This workshop therefore provides a timely opportunity to discuss recent advances, new ideas, and concepts, and to identify research gaps in geographic information extraction.

    Visualizations for Real Time Big Data

    Get PDF
    We live in a Big Data era in which data are generated every minute at astonishing rates: internet cookies, social networks, all kinds of sensors, climate measurements, smartphone GPS systems, and so on. While in recent years the value has moved from possessing data to knowing how to interpret it, with these amounts of data generated every minute the value now resides in the ability to interpret them in real time. This can give a competitive advantage to those able to achieve it, enabling them to seize real-time opportunities. Visual analytics, a science combining the strengths of machines with those of humans, can provide the solution. This work reviews the existing literature on visualizations of real-time Big Data, discussing the main parameters that must be taken into consideration, examining approaches to perform effectively and efficiently, and identifying guidelines for evaluating how good a visualization tool is. The results show that, although this science is still immature and little has been written about this concrete case, implementing existing strategies – some obtained from a more generic setting – can be of great help in successfully visualizing real-time Big Data.
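    Two parameters the real-time visualization literature repeatedly raises are how much history to retain and how many points a plot can usefully draw. The sketch below is a generic illustration of both, not a technique from any specific tool in the review: a bounded sliding window plus stride-based downsampling to a pixel budget.

```python
from collections import deque

class SlidingWindow:
    """Retain only the most recent samples and downsample them to the
    plot's point budget -- a common tactic for real-time charts."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)   # old samples evicted automatically

    def push(self, value):
        self.buf.append(value)

    def downsample(self, budget):
        data = list(self.buf)
        if len(data) <= budget:
            return data
        stride = len(data) / budget
        # pick one representative sample per stride
        return [data[int(i * stride)] for i in range(budget)]

w = SlidingWindow(capacity=1000)
for v in range(5000):
    w.push(v)
points = w.downsample(budget=100)
# only the last 1000 samples are kept, drawn with just 100 points
```

    Stride sampling can miss spikes; min/max-per-bucket downsampling is the usual refinement when outliers must stay visible.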