Building a Data Warehouse for Twitter Stream Exploration

Author: Mansmann Svetlana
Rehman Nafees
Scholl Marc H.
Weiler Andreas
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012

In recent years Twitter has evolved into an extremely popular social network and has revolutionized the ways of interacting and exchanging information on the Internet. By making its public stream available through a set of APIs, Twitter has triggered a wave of research initiatives aimed at analysis and knowledge discovery from the data about its users and their messaging activities. While most of the projects and tools are tailored towards solving specific tasks, we pursue the goal of providing an application-independent and universal analytical platform for supporting any kind of analysis and knowledge discovery. We employ the well-established data warehousing technology with its underlying multidimensional data model, ETL routine for loading and consolidating data from different sources, OLAP functionality for exploring the data, and data mining tools for more sophisticated analysis. In this work we describe the process of transforming the original stream into a set of related multidimensional cubes and demonstrate how the resulting data warehouse can be used for solving a variety of analytical tasks. We expect our proposed approach to be applicable for analyzing the data of other social networks as well.
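To make the stream-to-cube idea in this abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation: the record fields, the `to_fact` and `roll_up` helpers, and the sample data are all hypothetical, chosen only to show an ETL step that derives dimension keys followed by an OLAP-style roll-up.

```python
from collections import Counter
from datetime import datetime

# Hypothetical tweet records; field names are illustrative, not Twitter's API schema.
TWEETS = [
    {"user": "alice", "lang": "en", "created_at": "2012-01-01T10:15:00", "retweets": 2},
    {"user": "bob", "lang": "de", "created_at": "2012-01-01T10:45:00", "retweets": 0},
    {"user": "alice", "lang": "en", "created_at": "2012-01-01T11:05:00", "retweets": 5},
    {"user": "carol", "lang": "en", "created_at": "2012-01-02T09:30:00", "retweets": 1},
]


def to_fact(tweet):
    """ETL step: map a raw record to a fact row with dimension keys and measures."""
    ts = datetime.fromisoformat(tweet["created_at"])
    return {
        "day": ts.date().isoformat(),   # time dimension, day level
        "hour": ts.hour,                # time dimension, hour level
        "lang": tweet["lang"],          # language dimension
        "user": tweet["user"],          # user dimension
        "tweet_count": 1,               # measure: one row per tweet
        "retweets": tweet["retweets"],  # measure
    }


def roll_up(facts, dims, measure):
    """Sum one measure over the chosen dimensions (an OLAP roll-up)."""
    totals = Counter()
    for fact in facts:
        key = tuple(fact[d] for d in dims)
        totals[key] += fact[measure]
    return dict(totals)


FACTS = [to_fact(t) for t in TWEETS]
BY_DAY_LANG = roll_up(FACTS, ("day", "lang"), "tweet_count")
BY_DAY = roll_up(FACTS, ("day",), "retweets")
```

Dropping a dimension from `dims` (for example, going from `("day", "lang")` to `("day",)`) corresponds to rolling up to a coarser view of the same facts.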

KOPS - The Institutional Repository of the University of Konstanz

CiteSeerX

Crossref

OLAP4Tweets: Multidimensional Modeling of tweets

Author: Ben Kraiem Maha
Feki Jamel
Khrouf Kaïs
Ravat Franck
Teste Olivier
Publication venue: HAL CCSD
Publication date: 01/01/2015

Twitter, a popular microblogging platform, is at the epicenter of the social media explosion, with millions of users being able to create and publish short posts, referred to as tweets, in real time. Applying OLAP (On-Line Analytical Processing) to large volumes of tweets is a challenge, but it would allow the extraction of information (especially knowledge) such as user behavior, newly emerging issues, and trends. In this paper, we pursue the goal of providing a generic multidimensional model dedicated to the OLAP of tweets. The proposed model reflects some specifics of tweets, such as recursive references between tweets and calculated attributes.

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

Toulouse Capitole Publications

Toulouse 1 Capitole Publications

Towards a Workload for Evolutionary Analytics

Author: Hacigumus Hakan
LeFevre Jeff
Polyzotis Neoklis
Sankaranarayanan Jagan
Tatemura Junichi
Publication date: 01/01/2013

Emerging data analysis involves the ingestion and exploration of new data sets, application of complex functions, and frequent query revisions based on observing prior query answers. We call this new type of analysis evolutionary analytics and identify its properties. This type of analysis is not well represented by current benchmark workloads. In this paper, we present a workload and identify several metrics to test system support for evolutionary analytics. Along with our metrics, we present methodologies for running the workload that capture this analytical scenario.

arXiv.org e-Print Archive

Crossref

Big Data Harmonization – Challenges and Applications

Author: Prof. Jigna Ashish Patel, Dr. Priyanka Sharma
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 30/06/2017

As data grows, the need for big data solutions increases day by day. The concept of data harmonization has existed for two decades. Because data is collected from various heterogeneous sources, and data harmonization techniques bring it into a single format in one place, the result is also called a data warehouse. Many advances have been made in analyzing historical data using data warehousing. Innovations uncover the challenges and problems faced by data warehousing every now and then. When the volume and variety of data increase exponentially, existing tools might not support OLAP operations under the traditional warehouse approach. In this paper we focus on the research being done in the field of big data warehousing, category by category. Research issues and proposed approaches for various kinds of datasets are shown. The challenges and advantages of using a data warehouse before data mining tasks are also explained in detail.

International Journal on Recent and Innovation Trends in Computing and Communication

Architecture for Analysis of Streaming Data

Author: Hoque Sheik
Miranskyy Andriy
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 02/05/2018

While several attempts have been made to construct a scalable and flexible architecture for analysis of streaming data, no general model to tackle this task exists. Thus, our goal is to build a scalable and maintainable architecture for performing analytics on streaming data. To reach this goal, we introduce a 7-layered architecture consisting of microservices and publish-subscribe software. Our study shows that this architecture yields a good balance between scalability and maintainability due to the high cohesion and low coupling of the solution, as well as asynchronous communication between the layers. This architecture can help practitioners improve their analytic solutions. It is also of interest to academics, as it is a building block for a general architecture for processing streaming data.
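The decoupling this abstract attributes to publish-subscribe communication can be illustrated with a small in-process sketch in Python. It is not the paper's architecture: the `Broker` class, the three layers (rather than seven), the topic names, and the sensor-style sample events are all hypothetical, and a simple queue stands in for real messaging software.

```python
from collections import defaultdict, deque


class Broker:
    """Toy in-process stand-in for publish-subscribe middleware."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self._pending = deque()

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Publishers only enqueue; they never call subscribers directly.
        self._pending.append((topic, message))

    def run(self):
        # Deliver queued messages until no layer publishes anything new.
        while self._pending:
            topic, message = self._pending.popleft()
            for handler in self._subscribers[topic]:
                handler(message)


def build_pipeline(broker, sink):
    """Wire three hypothetical layers together through topics only."""

    def parse(raw):
        sensor, value = raw.split(",")
        broker.publish("parsed", {"sensor": sensor, "value": float(value)})

    def analyze(reading):
        if reading["value"] > 30.0:  # illustrative threshold
            broker.publish("alerts", f"{reading['sensor']} high: {reading['value']}")

    broker.subscribe("raw", parse)        # ingestion layer
    broker.subscribe("parsed", analyze)   # analytics layer
    broker.subscribe("alerts", sink.append)  # delivery layer


ALERTS = []
BROKER = Broker()
build_pipeline(BROKER, ALERTS)
for line in ("s1,21.5", "s2,35.0", "s1,31.2"):
    BROKER.publish("raw", line)
BROKER.run()
```

Each layer knows only the topics it reads and writes, so a layer can be replaced or scaled without touching the others, which is the low-coupling property the abstract points to.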

arXiv.org e-Print Archive

Crossref

When Things Matter: A Data-Centric View of the Internet of Things

Author: Dustdar Schahram
Falkner Nickolas J. G.
Qin Yongrui
Sheng Quan Z.
Vasilakos Athanasios V.
Wang Hua
Publication date: 01/01/2014

With the recent advances in radio-frequency identification (RFID), low-cost wireless sensor devices, and Web technologies, the Internet of Things (IoT) approach has gained momentum in connecting everyday objects to the Internet and facilitating machine-to-human and machine-to-machine communication with the physical world. While IoT offers the capability to connect and integrate both digital and physical entities, enabling a whole new class of applications and services, several significant challenges need to be addressed before these applications and services can be fully realized. A fundamental challenge centers around managing IoT data, typically produced in dynamic and volatile environments, which is not only extremely large in scale and volume, but also noisy and continuous. This article surveys the main techniques and state-of-the-art research efforts in IoT from data-centric perspectives, including data stream processing, data storage models, complex event processing, and searching in IoT. Open research issues for IoT data management are also discussed.

arXiv.org e-Print Archive

Victoria University Eprints Repository