
    An intelligent assistant for exploratory data analysis

    In this paper we present an account of the main features of SNOUT, an intelligent assistant for exploratory data analysis (EDA) of social science survey data that incorporates a range of data mining techniques. EDA has much in common with existing data mining techniques: its main objective is to help an investigator reach an understanding of the important relationships in a data set rather than simply develop predictive models for selected variables. Brief descriptions of a number of novel techniques developed for use in SNOUT are presented. These include heuristic variable level inference and classification, automatic category formation, the use of similarity trees to identify groups of related variables, interactive decision tree construction, and model selection using a genetic algorithm.

    Exploratory Data Analysis

    In the food research and production field, system complexity is increasing and new challenges emerge every day. This implies an urgent need to extract information and obtain models capable of inferring the underlying relationships that link all the sources of variability that characterize food or its production process (e.g. compositional profile, processing conditions) to very general end-properties of foodstuffs, such as healthiness, consumer perception, the link to a territory, and the effect of the production chain itself on food. This makes a “deductive”, theory-driven research approach inefficient, since it is often difficult to formulate hypotheses. Explorative Multivariate Data Analysis methods, together with the most recent analytical instrumentation, offer the possibility of returning to an “inductive”, data-driven attitude with a minimum of a priori hypotheses, instead helping to formulate new ones from the direct observation of data. The aim of this chapter is to offer the reader an overview of the most significant tools that can be used in a preliminary, exploratory phase, ranging from the most classical descriptive statistics methods to Multivariate Analysis methods, with particular attention to Projection methods. For all techniques, examples are given so that the main advantage of these techniques, namely a direct, graphical representation of data and their characteristics, can be immediately experienced by the reader.

    Progressive Analytics: A Computation Paradigm for Exploratory Data Analysis

    Exploring data requires a fast feedback loop from the analyst to the system, with a latency below about 10 seconds because of human cognitive limitations. When data becomes large or analysis becomes complex, sequential computations can no longer be completed in a few seconds and data exploration is severely hampered. This article describes a novel computation paradigm called Progressive Computation for Data Analysis, or more concisely Progressive Analytics, that brings a low-latency guarantee to the programming-language level by performing computations in a progressive fashion. Moving this progressive computation to the language level relieves the programmer of exploratory data analysis systems from implementing the whole analytics pipeline in a progressive way from scratch, streamlining the implementation of scalable exploratory data analysis systems. This article describes the new paradigm through a prototype implementation called ProgressiVis, and explains the requirements it implies through examples. Comment: 10 page
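
The idea of computing "in a progressive fashion" can be illustrated with a minimal sketch. This is not the ProgressiVis API; the function name `progressive_mean` and the `chunk_size` parameter are hypothetical, chosen only for illustration. The sketch processes the data in bounded chunks and yields a refined estimate after each one, so a caller could display partial results instead of waiting for the full computation.

```python
from typing import Iterator, List, Tuple


def progressive_mean(values: List[float], chunk_size: int = 1000) -> Iterator[Tuple[int, float]]:
    """Yield (rows_seen, running_mean) after each chunk of the input.

    Hypothetical helper for illustration only; it assumes chunk_size > 0.
    """
    total = 0.0
    seen = 0
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        total += sum(chunk)
        seen += len(chunk)
        # Each yield hands a partial result back to the caller,
        # which bounds the work done between two updates.
        yield seen, total / seen


if __name__ == "__main__":
    for rows_seen, estimate in progressive_mean([1.0, 2.0, 3.0, 4.0], chunk_size=2):
        print(rows_seen, estimate)
```

Bounding the work per step by row count is only a stand-in here; a system with a real latency guarantee would presumably bound each step by elapsed time instead.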

    Reinforced Approximate Exploratory Data Analysis

    Exploratory data analytics (EDA) is a sequential decision-making process in which analysts choose subsequent queries that might lead to interesting insights, based on the previous queries and their results. Data processing systems often execute the queries on samples to produce results with low latency. Different downsampling strategies preserve different statistics of the data and yield different magnitudes of latency reduction. The optimum choice of sampling strategy often depends on the particular context of the analysis flow and the hidden intent of the analyst. In this paper, we are the first to consider the impact of sampling in interactive data exploration settings, given that samples introduce approximation errors. We propose a Deep Reinforcement Learning (DRL) based framework that can optimize the sample selection in order to keep the analysis and insight generation flow intact. Evaluations with 3 real datasets show that our technique can preserve the original insight generation flow while improving the interaction latency, compared to baseline methods. Comment: Appears in the 37th AAAI Conference on Artificial Intelligence (AAAI), 202
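
The claim that different downsampling strategies preserve different statistics can be made concrete with a small sketch. It is not the paper's method, and the helper names `uniform_sample` and `stratified_sample` are hypothetical. A uniform sample tends to preserve overall proportions but may miss rare groups entirely, whereas a stratified sample guarantees that every group is represented at the cost of distorting proportions.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, TypeVar

Row = TypeVar("Row")


def uniform_sample(rows: Sequence[Row], k: int, seed: int = 0) -> List[Row]:
    """Draw k rows uniformly at random without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(rows), k)


def stratified_sample(
    rows: Sequence[Row],
    key: Callable[[Row], Hashable],
    per_group: int,
    seed: int = 0,
) -> List[Row]:
    """Draw up to per_group rows from every group defined by key."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    sample: List[Row] = []
    for members in groups.values():
        # Small groups are kept whole rather than raising an error.
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample


if __name__ == "__main__":
    # Synthetic data: 990 rows in a common group, 10 in a rare one.
    rows = [("common", i) for i in range(990)] + [("rare", i) for i in range(10)]
    print({group for group, _ in uniform_sample(rows, 10)})
    print({group for group, _ in stratified_sample(rows, lambda r: r[0], 5)})
```

Which of the two is preferable depends on the query that follows, which is the kind of context-dependent choice the abstract describes.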

    Multivariate exploratory data analysis: bias assessment

    The objective of this investigation was to evaluate whether the bias shown by subjects in a stimulus classification task varied with the presentation format (suns or stars) and with the way the variables were assigned (randomly, ordered according to their correlation over 360°, or represented in space by means of a biplot: factorial suns or stars). There was a significant interaction between the presentation format and the method of assigning variables. In particular, the subjects made fewer mistakes when classifying the factorial suns than the factorial stars. Likewise, they classified the ordered suns better than the ordered stars. There were no significant differences between suns and stars when no information about the correlations was included (random assignment), nor between the factorial suns and the ordered suns. Lastly, the subjects took longer to complete the task in the random assignment condition than in the factorial representation. Ministerio de Educación y Ciencia PB93117

    Food survey using exploratory data analysis.

    A person's eating habits are the most important aspect of maintaining physical wellbeing, which in turn is key to enduring the stresses and emotional hurdles that are so commonplace in modern lifestyles. Our research shows that, over the past 33 years, the global obesity rate has increased by 27.5%. Moreover, although many people are overweight or obese, most still believe that their eating habits are healthy. This research aimed to further identify which eating habits people consider to be healthy.