
    Cognitive Fit in Visualizing Big Data

    This dissertation examines the consequences of cognitive fit in visualizing big data. Specifically, it focuses on the interplay between different types of business data analysis tasks and visualization methods, and how the defining characteristics of big data (i.e., volume and variety) moderate the outcomes concerning data analysis performance (i.e., solution time and solution accuracy). A 12-cell repeated-measures laboratory experiment (n=145) using eye trackers is conducted to test the hypotheses. Data analysis performance is observed to improve when the information emphasized by a visualization method matches the specific information requirements of a data analysis task. Such improvements in data analysis performance are further amplified when the visualized information has high volume and variety. This dissertation contributes to the literature in at least three ways. First, it improves our understanding of cognitive fit and how it manifests in analysts’ problem-solving behaviors when using visualization tools. This is done by analyzing participants’ eye movement and gaze fixation patterns while they work with different types of data analysis tasks and visualization methods. Based on this analysis, this study proposes an objective method for assessing and measuring cognitive fit. Second, this study maps visualization characteristics to business data analysis task types, and informs the choice of visualization tools among an ever-increasing number of alternatives for supporting the complex problems faced by big data analysts. Third, this dissertation extends cognitive fit theory to the big data context and highlights the relative importance of cognitive fit in this setting by demonstrating that increases in volume and variety amplify the task performance consequences of cognitive fit. The limitations of the experiment conducted for this dissertation and the future research opportunities they present are discussed. The findings of this dissertation can also inform the development of new visualization tools and techniques based on task and data characteristics.
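
    The dissertation's analysis is not published as code; the following is a minimal sketch of how a task-type by visualization-method interaction, moderated by volume and variety, could be fitted as a repeated-measures mixed model. The file name and column names (participant, task_type, vis_method, volume, variety, solution_time) are assumptions for illustration only.

        # Hedged sketch only: mixed model with a random intercept per participant,
        # testing whether the fit effect on solution time is amplified by volume
        # and variety. Column names and data file are invented, not from the study.
        import pandas as pd
        import statsmodels.formula.api as smf

        trials = pd.read_csv("experiment_trials.csv")  # hypothetical long-format trial data

        model = smf.mixedlm(
            "solution_time ~ C(task_type) * C(vis_method) * C(volume) * C(variety)",
            data=trials,
            groups=trials["participant"],
        )
        print(model.fit().summary())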

    Visualizing Big Data with augmented and virtual reality: challenges and research agenda

    This paper provides a multi-disciplinary overview of the research issues and achievements in the field of Big Data and its visualization techniques and tools. The main aim is to summarize challenges in visualization methods for existing Big Data, as well as to offer novel solutions for issues related to the current state of Big Data Visualization. This paper provides a classification of existing data types, analytical methods, visualization techniques and tools, with a particular emphasis placed on surveying the evolution of visualization methodology over the past years. Based on the results, we reveal disadvantages of existing visualization methods. Despite the technological development of the modern world, human involvement (interaction), judgment and logical thinking are necessary while working with Big Data. Therefore, the role of human perceptual limitations when working with large amounts of information is evaluated. Based on the results, a non-traditional approach is proposed: we discuss how the capabilities of Augmented Reality and Virtual Reality could be applied to the field of Big Data Visualization. We discuss the promising utility of Mixed Reality technology integration with applications in Big Data Visualization. Placing the most essential data in the central area of the human visual field in Mixed Reality would allow one to take in the presented information in a short period of time without significant data losses due to human perceptual issues. Furthermore, we discuss the impact of new technologies, such as Virtual Reality displays and Augmented Reality helmets, on Big Data visualization, as well as the classification of the main challenges of integrating these technologies.

    Visualizing Gender Gap in Film Industry over the Past 100 Years

    Visualizing big data can provide valuable insights into social science research. In this project, we focused on visualizing the potential gender gap in the global film industry over the past 100 years. We profiled the differences both for actors/actresses and for male/female movie audiences, and analyzed the IMDb data of the 10,000 most popular movies (the composition and importance of casts of different genders, the cooperation network of the actors/actresses, the movie genres, the movie descriptions, etc.) and audience ratings (the differences between male and female audiences’ ratings). Findings suggest that the gender gap has been distinct in many aspects, but a recent trend is that this gap is narrowing and women are gaining discursive power in the film industry. Our study presented rich data, vivid illustrations, and novel perspectives that can serve as the foundation for further studies on related topics and their social implications. Comment: Accepted by ChinaVis 2022 (Poster Presentation).
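
    As a rough illustration of the kind of cast profiling described above (not the authors' code), the sketch below computes the yearly share of female cast members from an assumed IMDb-derived table; the file name and the year and gender columns are hypothetical.

        # Illustrative sketch: yearly share of female cast members in the sample.
        import pandas as pd

        cast = pd.read_csv("imdb_top10000_cast.csv")  # assumed one row per credited cast member

        female_share = (
            cast.groupby("year")["gender"]
            .apply(lambda g: (g == "female").mean())
            .rename("female_share")
        )
        print(female_share.tail(20))  # recent years, to inspect whether the gap narrows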

    GIS and Big Data Visualization

    Geographic information systems (GIS) have expanded their applications and services into various fields, from geo-positioning services to three-dimensional demonstration and virtual reality. Big data analysis and its visualization tools boost the capacity of GIS, especially in graphics and visual demonstration. In this chapter, I describe the major traits of big data and its spatial analysis with visualization, and then identify the linkage between big data and GIS. Several GIS-based software packages and geo-web services deal with big data or similarly scaled databases, such as ArcGIS, Google Earth, Google Maps, Tableau, and InstantAtlas. Because these software packages and websites are developed around geography or location, they still have some limits in visualizing big data or persuading people with maps or graphics. I seek a way around this limitation of GIS-based tools and show an alternative way to visualize big data and present thematic maps. This chapter will be a useful guide to lead GIS people into a new horizon of big data visualization.
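
    For illustration only (not necessarily the approach taken in this chapter), a thematic map can also be drawn outside a desktop GIS with open-source libraries; the sketch below uses GeoPandas with an assumed polygon layer and attribute column.

        # Minimal choropleth sketch; "regions.shp" and the "population" column are assumptions.
        import geopandas as gpd
        import matplotlib.pyplot as plt

        regions = gpd.read_file("regions.shp")
        ax = regions.plot(column="population", cmap="OrRd", legend=True, figsize=(8, 6))
        ax.set_axis_off()
        ax.set_title("Thematic map from an attribute column")
        plt.savefig("thematic_map.png", dpi=150)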

    Extending ROOT through Modules

    The ROOT software framework is foundational for the HEP ecosystem, providing capabilities such as I/O, a C++ interpreter, a GUI, and math libraries. It uses object-oriented concepts and build-time components to layer functionality. We believe additional layering formalisms will benefit ROOT and its users. We present the modularization strategy for ROOT, which aims to formalize the description of existing source components, make the dependencies and other metadata available outside the build system, and allow post-install additions of functionality in the runtime environment. Components can then be grouped into packages, installable from external repositories, so that missing packages can be delivered as a post-install step. This provides a mechanism for the wider software ecosystem to interact with a minimalistic install. Reducing intra-component dependencies improves maintainability and code hygiene. We believe that helping maintain the smallest possible "base install" will help embedding use cases. The modularization effort draws inspiration from the Java, Python, and Swift ecosystems. Keeping aligned with modern C++, this strategy relies on forthcoming features such as C++ modules. We hope formalizing the component layer will provide simpler ROOT installs, improve extensibility, and decrease the complexity of embedding ROOT in other ecosystems. Comment: 8 pages, 2 figures, 1 listing, CHEP 2018 - 23rd International Conference on Computing in High Energy and Nuclear Physics.
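
    To make the layering idea concrete, the toy sketch below shows one way externally declared component dependencies could be resolved into an install order. This is not ROOT's actual mechanism, and the manifest contents are invented.

        # Toy example only: resolving a hypothetical component-dependency manifest
        # into an install order with a topological sort.
        from graphlib import TopologicalSorter

        manifest = {              # component -> components it depends on (invented)
            "io": set(),
            "interpreter": {"io"},
            "hist": {"io"},
            "gui": {"hist", "interpreter"},
        }

        install_order = list(TopologicalSorter(manifest).static_order())
        print(install_order)  # dependencies appear before the components that need them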

    Towards the cloudification of the social networks analytics

    In recent years, with the increase of available data from social networks and the rise of big data technologies, social data has emerged as one of the most profitable markets for companies to increase their benefits. Besides, computational social scientists see such data as a vast ocean of information to study modern human societies. Nowadays, enterprises and researchers either develop their own mining tools in house or outsource their social media mining needs to specialised companies, with the consequent economic cost. In this paper, we present the first cloud computing service to facilitate the deployment of social media analytics applications, allowing data practitioners to use social mining tools as a service. The main advantage of this service is the possibility of running different queries at the same time and combining their results in real time. Additionally, we introduce twearch, a prototype for developing Twitter mining algorithms as services in the cloud.
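
    The service's API is not specified in the abstract; the following is a hypothetical client sketch of the "run several queries at once and combine their results" idea, with an invented endpoint URL and response shape.

        # Hypothetical usage sketch: concurrent queries merged into one result list.
        import concurrent.futures
        import requests

        SERVICE = "https://example.org/social-mining/api/search"  # placeholder, not the real service

        def run_query(query):
            resp = requests.get(SERVICE, params={"q": query}, timeout=30)
            resp.raise_for_status()
            return resp.json()["results"]  # assumed response shape

        queries = ["#bigdata", "#datavis", "social computing"]
        merged = []
        with concurrent.futures.ThreadPoolExecutor(max_workers=len(queries)) as pool:
            for results in pool.map(run_query, queries):
                merged.extend(results)
        print(f"Combined {len(merged)} items from {len(queries)} concurrent queries")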

    Digital Realities and Academic Research

    There's a change occurring in the delivery of scientific content. The development and application of virtual reality and augmented reality are changing research in nearly every field, from the life sciences to engineering. As a result, scholarly content is also changing direction, from print-centric to fully immersive digital formats. Historically, scientific content has been simple text and figures. To create higher quality, more intuitive and engaging content, scholarly communication has witnessed a shift to video and, most recently, researchers have begun to include data to create next-generation content types that supplement and enrich their works. Scholarly communication will continue this trend, requiring the delivery of content that is more innovative and interactive. However, in a world where the PDF has dominated the industry for years, new skills and technologies will be needed to ensure reader use and engagement remain stable as the information services industry shifts to accommodate new forms of content and articles enhanced by virtual and augmented reality. Implementing and delivering augmented or virtual reality supplemental material, and supporting it with the necessary tools for engagement, is no easy task. As much as interest, discussion and innovation are occurring, questions will need to be answered, issues addressed, and best practices established, as with all disruptive entrants, so that publisher, author and end-user can benefit from the results of deeper content engagement. For publishers who work directly with scholars and researchers, this pivot means they must re-examine the needs of their customers and understand what they need delivered, where they expect to find that information, and how they want to interact with it. This will require publishers to update their current infrastructures, submission practices and guidelines, as well as develop or license software to keep pace and meet the needs of their authors and readers. This session will help to define the challenges and strengths related to digital realities, data, and the role researchers play in shaping mixed content types in a more data-driven, digital environment. Discussion includes: What are some of the pros and cons associated with data and digital reality research? How are these different content types being used as supplemental material, and will they shift to be seen as a more integral part of the scholarly record? In the future, what role will libraries play in this shift, providing users what they want in a format conducive to their work and research?

    A framework for automated anomaly detection in high frequency water-quality data from in situ sensors

    River water-quality monitoring is increasingly conducted using automated in situ sensors, enabling timelier identification of unexpected values. However, anomalies caused by technical issues confound these data, while the volume and velocity of data prevent manual detection. We present a framework for automated anomaly detection in high-frequency water-quality data from in situ sensors, using turbidity, conductivity and river level data. After identifying end-user needs and defining anomalies, we ranked their importance and selected suitable detection methods. High priority anomalies included sudden isolated spikes and level shifts, most of which were classified correctly by regression-based methods such as autoregressive integrated moving average (ARIMA) models. However, using other water-quality variables as covariates reduced performance due to complex relationships among variables. Classification of drift and of periods of anomalously low or high variability improved when we replaced anomalous measurements with forecasts, but this inflated false positive rates. Feature-based methods also performed well on high priority anomalies, but were less proficient at detecting lower priority anomalies, resulting in high false negative rates. Unlike regression-based methods, all feature-based methods produced low false positive rates and did not require training or optimization. Rule-based methods successfully detected impossible values and missing observations. Thus, we recommend using a combination of methods to improve anomaly detection performance whilst minimizing false detection rates. Furthermore, our framework emphasizes the importance of communication between end-users and analysts for optimal outcomes with respect to both detection performance and end-user needs. Our framework is applicable to other types of high-frequency time-series data and anomaly detection applications.
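
    As a rough illustration of combining methods in the way the framework recommends (not the authors' implementation), the sketch below pairs rule-based checks for missing and impossible values with a regression-based spike check using ARIMA residuals; the input file, column names, and threshold are assumptions.

        # Hedged sketch: rule-based plus regression-based anomaly flags for one sensor series.
        import pandas as pd
        from statsmodels.tsa.arima.model import ARIMA

        series = pd.read_csv("turbidity.csv", parse_dates=["timestamp"],
                             index_col="timestamp")["turbidity"]

        # Rule-based: missing observations and physically impossible (negative) values
        flags = series.isna() | (series < 0)

        # Regression-based: large residuals from an ARIMA fit flag sudden isolated spikes
        fit = ARIMA(series.ffill(), order=(1, 1, 1)).fit()
        resid = series - fit.fittedvalues
        flags |= resid.abs() > 4 * resid.std()   # assumed spike threshold

        print(series[flags])  # candidate anomalies for end-user review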