
    The role of data visualization in Railway Big Data Risk Analysis

    Big Data Risk Analysis (BDRA) is one of the possible avenues for the further development of risk models in railway transport. Big Data techniques allow a great quantity of information to be handled from different types of sources (e.g. unstructured text, signalling and train data). The benefits of this approach may lie in improving the understanding of the risk factors involved in railways, detecting possible new threats, or assessing the risk levels for rolling stock, rail infrastructure or railway operations. For the efficient use of BDRA, converting huge amounts of data into a simple and effective display is particularly challenging, especially because the results are presented to various specific target audiences. This work reports a literature review of risk communication and visualization in order to establish its applicability to BDRA and, beyond the visual techniques, what human factors have to be considered in the understanding and risk perception of the information when safety analysts and decision-makers start basing their decisions on BDRA analyses. It was found that BDRA requires different visualization strategies from those that have normally been employed in risk analysis up to now.

    Using visual analytics to make sense of railway Close Calls

    In the big data era, large and complex data sets will exceed scientists’ capacity to make sense of them in the traditional way. New approaches to data analysis, supported by computer science, will be necessary to address the problems that emerge with the rise of big data. The analysis of the Close Call database, a text-based database for near-miss reporting on the GB railways, provides a test case. The traditional analysis of Close Calls is time consuming and prone to differences in interpretation. This paper investigates the use of visual analytics techniques, based on network text analysis, to conduct data analysis and extract safety knowledge from 500 randomly selected Close Call records relating to worker slips, trips and falls. The results demonstrate a straightforward, yet effective, way to identify hazardous conditions without having to read each report individually. This opens up new ways to perform data analysis in safety science.
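    The network-text-analysis idea described above can be sketched in a few lines: treat words that co-occur within the same report as linked, then inspect the most connected terms to surface recurring hazardous conditions. The toy reports, stop-word list, and thresholds below are illustrative assumptions, not the paper's actual pipeline.

```python
from collections import Counter
from itertools import combinations

# Toy near-miss texts standing in for Close Call records (illustrative only).
reports = [
    "worker slipped on wet ballast near platform edge",
    "trip hazard from loose cable on platform edge",
    "worker fell on icy steps near depot entrance",
    "wet steps caused a slip at depot entrance",
]

stopwords = {"on", "a", "at", "the", "from", "near"}

# Count how often word pairs co-occur within the same report.
edges = Counter()
for text in reports:
    words = sorted({w for w in text.split() if w not in stopwords})
    for pair in combinations(words, 2):
        edges[pair] += 1

# Weighted degree per term: highly connected terms flag recurring conditions
# (e.g. "platform", "wet", "steps") without reading each report individually.
degree = Counter()
for (u, v), w in edges.items():
    degree[u] += w
    degree[v] += w

for term, d in degree.most_common(5):
    print(term, d)
```

    In practice a layout and visual encoding of this network (node size by degree, edge width by weight) would supply the "visual analytics" part; the counting above is only the underlying text-network construction.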

    Integrating data to support SPAD management

    A signal passed at danger (SPAD), that is, a stop signal passed without authority, is one of the most serious types of incident in railways, since it can potentially cause derailments or collisions. SPADs are complex incidents that have usually been analysed as human factors incidents. Human errors of train drivers, such as slips or lapses, have been prevalent in SPAD incident investigations. In the big data era, alternatives to the traditional methods can be used to support SPAD analysis of whatever kind. Railway systems produce a huge amount of data from a variety of data sources that can be used to gain a better understanding of the factors involved in SPADs. This paper describes a first trial within the Big Data Risk Analysis (BDRA) programme to combine unstructured data from SMIS/IFCS text records with structured data from railway signals in order to support SPAD management.
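    One hedged sketch of the kind of linkage the trial describes: joining free-text incident records to structured signal attributes on a shared signal identifier, so each narrative gains engineering context. The field names, signal IDs, and values below are invented for illustration; the abstract does not disclose the actual SMIS/IFCS schemas.

```python
# Toy unstructured incident records (SMIS/IFCS-style text) keyed by signal ID.
incident_reports = [
    {"signal_id": "S101", "text": "driver passed signal at danger after distraction"},
    {"signal_id": "S202", "text": "late braking approaching red aspect in poor visibility"},
]

# Toy structured signal attributes (all values invented for illustration).
signal_data = {
    "S101": {"sighting_distance_m": 180, "multi_spad_signal": True},
    "S202": {"sighting_distance_m": 450, "multi_spad_signal": False},
}

# Enrich each text record with structured context to support SPAD analysis.
combined = []
for report in incident_reports:
    attrs = signal_data.get(report["signal_id"], {})
    combined.append({**report, **attrs})

for row in combined:
    print(row["signal_id"], row["sighting_distance_m"], row["multi_spad_signal"])
```

    The same join could of course be expressed with a relational database or a dataframe library; the point is only that a shared identifier lets narrative evidence and signal engineering data be analysed together.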

    Design Principles for Data Analysis

    The data science revolution has led to an increased interest in the practice of data analysis. While much has been written about statistical thinking, a complementary form of thinking that appears in the practice of data analysis is design thinking -- the problem-solving process to understand the people for whom a product is being designed. For a given problem, there can be significant or subtle differences in how a data analyst (or producer of a data analysis) constructs, creates, or designs a data analysis, including differences in the choice of methods, tooling, and workflow. These choices can affect the data analysis products themselves and the experience of the consumer of the data analysis. Therefore, the role of a producer can be thought of as designing the data analysis with a set of design principles. Here, we introduce design principles for data analysis and describe how they can be mapped to data analyses in a quantitative, objective and informative manner. We also provide empirical evidence of variation of principles within and between both producers and consumers of data analyses. Our work leads to two insights: it suggests a formal mechanism to describe data analyses based on the design principles for data analysis, and it provides a framework to teach students how to build data analyses using formal design principles. Comment: arXiv admin note: text overlap with arXiv:1903.0763

    A Distance Framework to Understand the Academia-Practitioner Gap

    This paper lays out a framework for understanding the academia-practitioner gap. In this framework, there are three dimensions of distance between academia and practice: (1) Product, (2) Mindset, and (3) Process. Within each category, there are multiple elements that can be used as a comprehensive way to identify all the challenges in transferring academic research to practice. This approach is both firmly grounded in and distinct from existing frameworks like Rogers’ Diffusion of Innovation, Roberts’ Marketing Science Value Chain, Wierenga’s Success of Marketing Management Support Systems and Lilien’s analysis of Bridging the Academic-Practitioner divide. After describing the framework, the paper looks at how researchers can better understand the role of intermediaries in bridging the gap between academia and practice by identifying which types of distance they help close.

    Lessons Learned from Topic Modeling Analysis of COVID-19 News to Enrich Statistics Education in Korea

    This study aimed to investigate how mass media in Korea dealt with various issues arising from COVID-19 and the implications of this for statistics education in South Korea during the recent pandemic. We extracted news articles with the keywords “Corona” and “Statistics” from 18 February to 20 May 2020. We employed word frequency analysis, topic modeling, semantic network analysis, hierarchical clustering, and simple linear regression analysis. The main results of this study are as follows. First, the topic modeling analysis revealed four topics, namely “macroeconomy”, “domestic outbreak”, “international outbreak”, and “real estate and stocks”. Second, a simple linear regression analysis identified two rising topics, “macroeconomy” and “real estate and stocks”, and two falling topics, “domestic outbreak” and “international outbreak”, in the COVID-19-related statistics coverage as time passed. Based on these findings, we suggest that the high school mathematics curriculum of Korea should be revised to use real-life contexts to enable integrated education, social justice for statistics education, and simple linear regression analysis.
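    The "rising vs. falling topic" step above amounts to fitting a simple linear regression to a topic's share of coverage over time and reading the sign of the slope. The weekly proportions below are invented stand-ins, not the study's data; only the least-squares calculation itself is the technique named in the abstract.

```python
# Hedged sketch: ordinary least squares slope of one topic's weekly share.
# A positive slope marks the topic as "rising", a negative one as "falling".
weeks = [1, 2, 3, 4, 5, 6]
macro_share = [0.10, 0.14, 0.18, 0.22, 0.27, 0.31]  # invented proportions

n = len(weeks)
mean_x = sum(weeks) / n
mean_y = sum(macro_share) / n

# OLS slope: sum((x - x̄)(y - ȳ)) / sum((x - x̄)²)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, macro_share)) \
        / sum((x - mean_x) ** 2 for x in weeks)

print("rising" if slope > 0 else "falling", round(slope, 4))
```

    The study applied this per topic, so each of the four topics gets its own slope and a rising/falling label.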

    rTisane: Externalizing conceptual models for data analysis increases engagement with domain knowledge and improves statistical model quality

    Statistical models should accurately reflect analysts' domain knowledge about variables and their relationships. While recent tools let analysts express these assumptions and use them to produce a resulting statistical model, it remains unclear what analysts want to express and how externalization impacts statistical model quality. This paper addresses these gaps. We first conduct an exploratory study of analysts using a domain-specific language (DSL) to express conceptual models. We observe a preference for detailing how variables relate and a desire to allow, and then later resolve, ambiguity in their conceptual models. We leverage these findings to develop rTisane, a DSL for expressing conceptual models augmented with an interactive disambiguation process. In a controlled evaluation, we find that rTisane's DSL helps analysts engage more deeply with and accurately externalize their assumptions. rTisane also leads to statistical models that match analysts' assumptions, maintain analysis intent, and better fit the data.
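    The core idea, declaring conceptual relationships and deriving a statistical model from them, can be illustrated generically. This is not rTisane's actual syntax (rTisane is an R-embedded DSL with its own constructs); the relationship tuples, variable names, and formula mapping below are hypothetical, showing only the flavor of conceptual-model externalization.

```python
# Hypothetical sketch: an analyst declares how variables relate, and a tool
# derives candidate predictors and a model formula for a named outcome.
relationships = [
    ("age", "causes", "blood_pressure"),
    ("exercise", "associates_with", "blood_pressure"),
]
outcome = "blood_pressure"

# Every declared cause of, or associate with, the outcome becomes a
# candidate predictor in the derived regression formula.
predictors = sorted({src for src, _, dst in relationships if dst == outcome})
formula = f"{outcome} ~ " + " + ".join(predictors)
print(formula)
```

    The interactive disambiguation the paper describes would sit between these two steps, asking the analyst to resolve ambiguous relationships before the final model is emitted.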