Building a Data Warehouse for Twitter Stream Exploration
In recent years, Twitter has evolved into an extremely popular social network and has revolutionized the ways of interacting and exchanging information on the Internet. By making its public stream available through a set of APIs, Twitter has triggered a wave of research initiatives aimed at analysis and knowledge discovery from the data about its users and their messaging activities. While most of the projects and tools are tailored towards solving specific tasks, we pursue the goal of providing an application-independent and universal analytical platform for supporting any kind of analysis and knowledge discovery. We employ the well-established data warehousing technology with its underlying multidimensional data model, ETL routine for loading and consolidating data from different sources, OLAP functionality for exploring the data, and data mining tools for more sophisticated analysis. In this work, we describe the process of transforming the original stream into a set of related multidimensional cubes and demonstrate how the resulting data warehouse can be used for solving a variety of analytical tasks. We expect our proposed approach to be applicable for analyzing the data of other social networks as well.
OLAP4Tweets: Multidimensional Modeling of tweets
Twitter, a popular microblogging platform, is at the epicenter of the social media explosion, with millions of users able to create and publish short posts, referred to as tweets, in real time. Applying OLAP (On-Line Analytical Processing) to large volumes of tweets is a challenge; meeting it would allow the extraction of information (especially knowledge) such as user behavior, new emerging issues, trends… In this paper, we pursue the goal of providing a generic multidimensional model dedicated to the OLAP of tweets. The proposed model reflects some specifics such as recursive references between tweets and calculated attributes.
Towards a Workload for Evolutionary Analytics
Emerging data analysis involves the ingestion and exploration of new data
sets, application of complex functions, and frequent query revisions based on
observing prior query answers. We call this new type of analysis evolutionary
analytics and identify its properties. This type of analysis is not well
represented by current benchmark workloads. In this paper, we present a
workload and identify several metrics to test system support for evolutionary
analytics. Along with our metrics, we present methodologies for running the
workload that capture this analytical scenario.
Big Data Harmonization – Challenges and Applications
As data grow, the need for big data solutions increases day by day. The concept of data harmonization has existed for two decades. Because data must be collected from various heterogeneous sources, and data harmonization techniques bring them into a single format in one place, the result is also called a data warehouse. Many advances have been made in analyzing historical data with data warehousing, and innovations regularly uncover the challenges and problems that data warehousing faces. When the volume and variety of data increase exponentially, existing tools may not support OLAP operations under the traditional warehouse approach. In this paper, we focus on the research being done in the field of big data warehousing, organized by category. Research issues and proposed approaches for various kinds of datasets are presented. The challenges and advantages of using a data warehouse before a data mining task are also explained in detail.
Architecture for Analysis of Streaming Data
While several attempts have been made to construct a scalable and flexible
architecture for analysis of streaming data, no general model to tackle this
task exists. Thus, our goal is to build a scalable and maintainable
architecture for performing analytics on streaming data.
To reach this goal, we introduce a 7-layered architecture consisting of
microservices and publish-subscribe software. Our study shows that this
architecture yields a good balance between scalability and maintainability due
to high cohesion and low coupling of the solution, as well as asynchronous
communication between the layers.
This architecture can help practitioners to improve their analytic solutions.
It is also of interest to academics, as it is a building block for a general
architecture for processing streaming data.
When Things Matter: A Data-Centric View of the Internet of Things
With the recent advances in radio-frequency identification (RFID), low-cost
wireless sensor devices, and Web technologies, the Internet of Things (IoT)
approach has gained momentum in connecting everyday objects to the Internet and
facilitating machine-to-human and machine-to-machine communication with the
physical world. While IoT offers the capability to connect and integrate both
digital and physical entities, enabling a whole new class of applications and
services, several significant challenges need to be addressed before these
applications and services can be fully realized. A fundamental challenge
centers around managing IoT data, typically produced in dynamic and volatile
environments, which is not only extremely large in scale and volume, but also
noisy, and continuous. This article surveys the main techniques and
state-of-the-art research efforts in IoT from data-centric perspectives,
including data stream processing, data storage models, complex event
processing, and searching in IoT. Open research issues for IoT data management
are also discussed.