Cognitive Fit in Visualizing Big Data
This dissertation examines the consequences of cognitive fit in visualizing big data. Specifically, it focuses on the interplay between different types of business data analysis tasks and visualization methods, and how the defining characteristics of big data (i.e., volume and variety) moderate the outcomes concerning data analysis performance (i.e., solution time and solution accuracy). A 12-cell repeated-measures laboratory experiment (n=145) using eye trackers is conducted to test the hypotheses. Data analysis performance is observed to improve when the information emphasized by a visualization method matches the specific information requirements for a data analysis task. Such improvements in data analysis performance are further amplified when the visualized information has high volume and variety.
This dissertation contributes to the literature in at least three ways. First, it improves our understanding of cognitive fit and how it manifests in analysts’ problem solving behaviors when using visualization tools. This is done by analyzing participants’ eye movement and gaze fixation patterns while they work with different types of data analysis tasks and visualization methods. Based on this analysis, this study proposes an objective method for assessing and measuring cognitive fit. Second, this study maps visualization characteristics to business data analysis task types, and informs the choice of visualization tools among an ever-increasing number of alternatives for supporting the complex problems faced by big data analysts. Third, this dissertation extends the cognitive fit theory to the big data context and highlights the relative importance of cognitive fit in this setting by demonstrating that increases in volume and variety amplify the task performance consequences of cognitive fit. The limitations of the experiment conducted for this dissertation and the future research opportunities they present are discussed. The findings of this dissertation can also inform the development of new visualization tools and techniques based on task and data characteristics.
Visualizing Big Data with augmented and virtual reality: challenges and research agenda
This paper provides a multi-disciplinary overview of the research issues and achievements in the field of Big Data and its visualization techniques and tools. The main aim is to summarize challenges in visualization methods for existing Big Data, as well as to offer novel solutions for issues related to the current state of Big Data Visualization. This paper provides a classification of existing data types, analytical methods, visualization techniques and tools, with a particular emphasis placed on surveying the evolution of visualization methodology over the past years. Based on the results, we reveal disadvantages of existing visualization methods. Despite the technological development of the modern world, human involvement (interaction), judgment and logical thinking remain necessary while working with Big Data. Therefore, the role of human perceptual limitations when dealing with large amounts of information is evaluated. Based on the results, a non-traditional approach is proposed: we discuss how the capabilities of Augmented Reality and Virtual Reality could be applied to the field of Big Data Visualization. We discuss the promising utility of integrating Mixed Reality technology with applications in Big Data Visualization. Placing the most essential data in the central area of the human visual field in Mixed Reality would allow one to absorb the presented information in a short period of time without significant data losses due to human perceptual issues. Furthermore, we discuss the impact of new technologies, such as Virtual Reality displays and Augmented Reality helmets, on Big Data visualization, as well as the classification of the main challenges of integrating these technologies.
Visualizing Gender Gap in Film Industry over the Past 100 Years
Visualizing big data can provide valuable insights into social science
research. In this project, we focused on visualizing the potential gender gap
in the global film industry over the past 100 years. We profiled the
differences both for the actors/actresses and male/female movie audiences and
analyzed the IMDb data of the most popular 10,000 movies (the composition and
importance of casts of different genders, the cooperation network of the
actors/actresses, the movie genres, the movie descriptions, etc.) and audience
ratings (the differences between male and female viewers' ratings). Findings
suggest that the gender gap has been distinct in many aspects, but a recent
trend is that this gap is narrowing and women are gaining discursive power in
the film industry. Our study presents rich data, vivid illustrations, and novel
perspectives that can serve as the foundation for further studies on related
topics and their social implications. Comment: Accepted by ChinaVis 2022
(poster presentation).
GIS and Big Data Visualization
Geographic information system (GIS) has expanded its area of applications and services into various fields, from geo-positioning services to three-dimensional demonstration and virtual reality. Big data analysis and its visualization tools boost the capacity of GIS, especially in graphics and visual demonstration. In this chapter, I describe major traits of big data and its spatial analysis with visualization, and then find a linkage between big data and GIS. There are several GIS-based software packages and geo-web services that deal with big data or similarly scaled databases, such as ArcGIS, Google Earth, Google Map, Tableau, and InstantAtlas. Because these software packages and websites are developed based on geography or location, they still have some limits in visualizing big data or persuading people with maps or graphics. I will seek a way around this limitation of GIS-based tools and show an alternative way to visualize big data and demonstrate thematic maps. This chapter will be a useful guide to lead GIS people into a new horizon of big data visualization.
Extending ROOT through Modules
The ROOT software framework is foundational for the HEP ecosystem, providing
capabilities such as IO, a C++ interpreter, GUI, and math libraries. It uses
object-oriented concepts and build-time components to define layers between them. We
believe additional layering formalisms will benefit ROOT and its users. We
present the modularization strategy for ROOT which aims to formalize the
description of existing source components, making available the dependencies
and other metadata externally from the build system, and allow post-install
additions of functionality in the runtime environment. Components can then be
grouped into packages, installable from external repositories, enabling
post-install delivery of missing packages. This provides a mechanism for the wider
software ecosystem to interact with a minimalistic install. Reducing
intra-component dependencies improves maintainability and code hygiene. We
believe that maintaining the smallest possible "base install" will benefit
embedding use cases. The modularization effort draws inspiration from the Java,
Python, and Swift ecosystems. To stay aligned with modern C++, this
strategy relies on forthcoming features such as C++ modules. We hope
formalizing the component layer will provide simpler ROOT installs, improve
extensibility, and decrease the complexity of embedding in other ecosystems.
Comment: 8 pages, 2 figures, 1 listing; CHEP 2018 - 23rd International
Conference on Computing in High Energy and Nuclear Physics.
Towards the cloudification of the social networks analytics
In recent years, with the increase of the available data from social networks and the rise of big data technologies, social data has emerged as one of the most profitable markets for companies seeking to increase their benefits. Besides, social computation scientists see such data as a vast ocean of information for studying modern human societies. Nowadays, enterprises and researchers develop their own mining tools in-house, or they outsource their social media mining needs to specialised companies at a consequent economic cost. In this paper, we present the first cloud computing service to facilitate the deployment of social media analytics applications and to allow data practitioners to use social mining tools as a service. The main advantage of this service is the possibility to run different queries at the same time and combine their results in real time. Additionally, we also introduce twearch, a prototype to develop Twitter mining algorithms as services in the cloud.
Digital Realities and Academic Research
There's a change occurring in the delivery of scientific content. The development and application of virtual reality and augmented reality are changing research in nearly every field, from the life sciences to engineering. As a result, scholarly content is also changing its direction from print-centric to fully immersive digital formats. Historically, scientific content has consisted of simple text and figures. To create higher-quality, more intuitive and engaging content, scholarly communication has witnessed a shift to video and, most recently, researchers have begun to include data to create next-generation content types that supplement and enrich their works. Scholarly communication will continue this trend, requiring the delivery of content that is more innovative and interactive. However, in a world where the PDF has dominated the industry for years, new skills and technologies will be needed to ensure reader use and engagement remain stable as the information services industry shifts to accommodate new forms of content and articles enhanced by virtual and augmented reality. Implementing and delivering augmented or virtual reality supplemental material, and supporting it with the necessary tools for engagement, is no easy task. As much interest, discussion and innovation as there is, questions will need to be answered, issues addressed, and best practices established, as with all disruptive entrants, so that publisher, author and end-user can benefit from the results of deeper content engagement. For publishers who work directly with scholars and researchers, this pivot means they must re-examine the needs of their customers, understand what they need delivered, where they expect to find that information, and how they want to interact with it. This will require publishers to update their current infrastructures, submission practices and guidelines, as well as develop or license software to keep pace and meet the needs of their authors and readers.
This session will help to define the challenges and strengths related to digital realities, data, and the role researchers play in shaping mixed content types in a more data-driven, digital environment. Discussion includes: What are some of the pros and cons associated with data and digital reality research? How are these different content types being used as supplemental material, and will they shift to be seen as a more integral part of the scholarly record? In the future, what role will libraries play in this shift in providing users what they want, and in a format conducive to their work and research?
A framework for automated anomaly detection in high frequency water-quality data from in situ sensors
River water-quality monitoring is increasingly conducted using automated in
situ sensors, enabling timelier identification of unexpected values. However,
anomalies caused by technical issues confound these data, while the volume and
velocity of data prevent manual detection. We present a framework for automated
anomaly detection in high-frequency water-quality data from in situ sensors,
using turbidity, conductivity and river level data. After identifying end-user
needs and defining anomalies, we ranked their importance and selected suitable
detection methods. High priority anomalies included sudden isolated spikes and
level shifts, most of which were classified correctly by regression-based
methods such as autoregressive integrated moving average models. However, using
other water-quality variables as covariates reduced performance due to complex
relationships among variables. Classification of drift and periods of
anomalously low or high variability improved when we replaced anomalous
measurements with forecasts, but this inflated false positive rates.
Feature-based methods also performed well on high priority anomalies, but were
less proficient at detecting lower priority anomalies, resulting in high
false negative rates. Unlike regression-based methods, all feature-based
methods produced low false positive rates and did not require training or
optimization. Rule-based methods successfully detected impossible values and
missing observations. Thus, we recommend using a combination of methods to
improve anomaly detection performance, whilst minimizing false detection rates.
Furthermore, our framework emphasizes the importance of communication between
end-users and analysts for optimal outcomes with respect to both detection
performance and end-user needs. Our framework is applicable to other types of
high-frequency time-series data and anomaly detection applications.
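The combination of methods the abstract recommends (rule-based checks for impossible values plus a robust test for sudden isolated spikes) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the synthetic turbidity series, window size, and thresholds are all assumptions.

```python
# Sketch: rule-based + robust-statistics anomaly detection on a
# high-frequency sensor series. Illustrative only; values are assumed.
import numpy as np

rng = np.random.default_rng(42)
turbidity = 5.0 + rng.normal(0, 0.2, 500)  # baseline readings (NTU)
turbidity[120] = 40.0                      # sudden isolated spike
turbidity[300] = -1.0                      # impossible (negative) value

# Rule-based check: turbidity cannot be negative.
impossible = turbidity < 0

# Robust spike test: flag points far from the local median,
# scaled by the median absolute deviation (MAD) of a trailing window.
window = 25
spikes = np.zeros_like(turbidity, dtype=bool)
for i in range(window, len(turbidity)):
    local = turbidity[i - window:i]
    med = np.median(local)
    mad = np.median(np.abs(local - med)) or 1e-9  # guard against zero MAD
    if abs(turbidity[i] - med) / (1.4826 * mad) > 5:
        spikes[i] = True

anomalies = np.flatnonzero(impossible | spikes)
print(anomalies)
```

Because the median and MAD are robust to the outliers themselves, the spike at index 120 does not contaminate the statistics used to test the points that follow it; a rolling mean and standard deviation would.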