3,102 research outputs found

    Building a Data Warehouse for Twitter Stream Exploration

    Full text link
    In the recent year Twitter has evolved into an extremely popular social network and has revolutionized the ways of interacting and exchanging information on the Internet. By making its public stream available through a set of APIs Twitter has triggered a wave of research initiatives aimed at analysis and knowledge discovery from the data about its users and their messaging activities. While most of the projects and tools are tailored towards solving specific tasks, we pursue a goal of providing an application in dependent and universal analytical platform for supporting any kind of analysis and knowledge discovery. We employ the well established data warehousing technology with its underlying multidimensional data model, ETL routine for loading and consolidating data from different sources, OLAP functionality for exploring the data and data mining tools for more sophisticated analysis. In this work we describe the process of transforming the original stream into a set of related multidimensional cubes and demonstrate how the resulting data warehouse can be used for solving a variety of analytical tasks. We expect our proposed approach to be applicable for analyzing the data of other social networks as well

    OLAP4Tweets: Multidimensional Modeling of tweets

    Get PDF
    International audienceTwitter, a popular microblogging platform, is at the epicenter of the social media explosion, with millions of users being able to create and publish short posts, referred to as tweets, in real time. The application of the OLAP (On-Line Analytical Processing) on large volumes of tweets is a challenge that would allow the extraction of information (especially knowledge) such as user behavior, new emerging issues, trends… In this paper, we pursue a goal of providing a generic multidimensional model dedicated to the OLAP of tweets. The proposed model reflects on some specifics such as recursive references between tweets and calculated attributes

    Towards a Workload for Evolutionary Analytics

    Full text link
    Emerging data analysis involves the ingestion and exploration of new data sets, application of complex functions, and frequent query revisions based on observing prior query answers. We call this new type of analysis evolutionary analytics and identify its properties. This type of analysis is not well represented by current benchmark workloads. In this paper, we present a workload and identify several metrics to test system support for evolutionary analytics. Along with our metrics, we present methodologies for running the workload that capture this analytical scenario.Comment: 10 page

    Big Data Harmonization – Challenges and Applications

    Get PDF
    As data grow, need for big data solution gets increased day by day. Concept of data harmonization exist since two decades. As data is to be collected from various heterogeneous sources and techniques of data harmonization allow them to be in a single format at same place it is also called data warehouse. Lot of advancement occurred to analyses historical data by using data warehousing. Innovations uncover the challenges and problems faced by data warehousing every now and then. When the volume and variety of data gets increased exponentially, existing tools might not support the OLAP operations by traditional warehouse approach. In this paper we tried to focus on the research being done in the field of big data warehouse category wise. Research issues and proposed approaches on various kind of dataset is shown. Challenges and advantages of using data warehouse before data mining task are also explained in detail

    Architecture for Analysis of Streaming Data

    Full text link
    While several attempts have been made to construct a scalable and flexible architecture for analysis of streaming data, no general model to tackle this task exists. Thus, our goal is to build a scalable and maintainable architecture for performing analytics on streaming data. To reach this goal, we introduce a 7-layered architecture consisting of microservices and publish-subscribe software. Our study shows that this architecture yields a good balance between scalability and maintainability due to high cohesion and low coupling of the solution, as well as asynchronous communication between the layers. This architecture can help practitioners to improve their analytic solutions. It is also of interest to academics, as it is a building block for a general architecture for processing streaming data

    When Things Matter: A Data-Centric View of the Internet of Things

    Full text link
    With the recent advances in radio-frequency identification (RFID), low-cost wireless sensor devices, and Web technologies, the Internet of Things (IoT) approach has gained momentum in connecting everyday objects to the Internet and facilitating machine-to-human and machine-to-machine communication with the physical world. While IoT offers the capability to connect and integrate both digital and physical entities, enabling a whole new class of applications and services, several significant challenges need to be addressed before these applications and services can be fully realized. A fundamental challenge centers around managing IoT data, typically produced in dynamic and volatile environments, which is not only extremely large in scale and volume, but also noisy, and continuous. This article surveys the main techniques and state-of-the-art research efforts in IoT from data-centric perspectives, including data stream processing, data storage models, complex event processing, and searching in IoT. Open research issues for IoT data management are also discussed
    • …
    corecore