35,886 research outputs found

    Representation Independent Analytics Over Structured Data

    Full text link
    Database analytics algorithms leverage quantifiable structural properties of the data to predict interesting concepts and relationships. The same information, however, can be represented using many different structures and the structural properties observed over particular representations do not necessarily hold for alternative structures. Thus, there is no guarantee that current database analytics algorithms will still provide the correct insights, no matter what structures are chosen to organize the database. Because these algorithms tend to be highly effective over some choices of structure, such as that of the databases used to validate them, but not so effective with others, database analytics has largely remained the province of experts who can find the desired forms for these algorithms. We argue that in order to make database analytics usable, we should use or develop algorithms that are effective over a wide range of choices of structural organizations. We introduce the notion of representation independence, study its fundamental properties for a wide range of data analytics algorithms, and empirically analyze the amount of representation independence of some popular database analytics algorithms. Our results indicate that most algorithms are not generally representation independent and find the characteristics of more representation independent heuristics under certain representational shifts

    Structuring visual exploratory analysis of skill demand

    No full text
    The analysis of increasingly large and diverse data for meaningful interpretation and question answering is handicapped by human cognitive limitations. Consequently, semi-automatic abstraction of complex data within structured information spaces becomes increasingly important, if its knowledge content is to support intuitive, exploratory discovery. Exploration of skill demand is an area where regularly updated, multi-dimensional data may be exploited to assess capability within the workforce to manage the demands of the modern, technology- and data-driven economy. The knowledge derived may be employed by skilled practitioners in defining career pathways, to identify where, when and how to update their skillsets in line with advancing technology and changing work demands. This same knowledge may also be used to identify the combination of skills essential in recruiting for new roles. To address the challenges inherent in exploring the complex, heterogeneous, dynamic data that feeds into such applications, we investigate the use of an ontology to guide structuring of the information space, to allow individuals and institutions to interactively explore and interpret the dynamic skill demand landscape for their specific needs. As a test case we consider the relatively new and highly dynamic field of Data Science, where insightful, exploratory data analysis and knowledge discovery are critical. We employ context-driven and task-centred scenarios to explore our research questions and guide iterative design, development and formative evaluation of our ontology-driven, visual exploratory discovery and analysis approach, to measure where it adds value to users’ analytical activity. Our findings reinforce the potential in our approach, and point us to future paths to build on

    Impliance: A Next Generation Information Management Appliance

    Full text link
    ably successful in building a large market and adapting to the changes of the last three decades, its impact on the broader market of information management is surprisingly limited. If we were to design an information management system from scratch, based upon today's requirements and hardware capabilities, would it look anything like today's database systems?" In this paper, we introduce Impliance, a next-generation information management system consisting of hardware and software components integrated to form an easy-to-administer appliance that can store, retrieve, and analyze all types of structured, semi-structured, and unstructured information. We first summarize the trends that will shape information management for the foreseeable future. Those trends imply three major requirements for Impliance: (1) to be able to store, manage, and uniformly query all data, not just structured records; (2) to be able to scale out as the volume of this data grows; and (3) to be simple and robust in operation. We then describe four key ideas that are uniquely combined in Impliance to address these requirements, namely the ideas of: (a) integrating software and off-the-shelf hardware into a generic information appliance; (b) automatically discovering, organizing, and managing all data - unstructured as well as structured - in a uniform way; (c) achieving scale-out by exploiting simple, massive parallel processing, and (d) virtualizing compute and storage resources to unify, simplify, and streamline the management of Impliance. Impliance is an ambitious, long-term effort to define simpler, more robust, and more scalable information systems for tomorrow's enterprises.Comment: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/.) You may copy, distribute, display, and perform the work, make derivative works and make commercial use of the work, but, you must attribute the work to the author and CIDR 2007. 3rd Biennial Conference on Innovative Data Systems Research (CIDR) January 710, 2007, Asilomar, California, US

    Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure

    Full text link
    Big data research has attracted great attention in science, technology, industry and society. It is developing with the evolving scientific paradigm, the fourth industrial revolution, and the transformational innovation of technologies. However, its nature and fundamental challenge have not been recognized, and its own methodology has not been formed. This paper explores and answers the following questions: What is big data? What are the basic methods for representing, managing and analyzing big data? What is the relationship between big data and knowledge? Can we find a mapping from big data into knowledge space? What kind of infrastructure is required to support not only big data management and analysis but also knowledge discovery, sharing and management? What is the relationship between big data and science paradigm? What is the nature and fundamental challenge of big data computing? A multi-dimensional perspective is presented toward a methodology of big data computing.Comment: 59 page

    A framework for applying natural language processing in digital health interventions

    Get PDF
    BACKGROUND: Digital health interventions (DHIs) are poised to reduce target symptoms in a scalable, affordable, and empirically supported way. DHIs that involve coaching or clinical support often collect text data from 2 sources: (1) open correspondence between users and the trained practitioners supporting them through a messaging system and (2) text data recorded during the intervention by users, such as diary entries. Natural language processing (NLP) offers methods for analyzing text, augmenting the understanding of intervention effects, and informing therapeutic decision making. OBJECTIVE: This study aimed to present a technical framework that supports the automated analysis of both types of text data often present in DHIs. This framework generates text features and helps to build statistical models to predict target variables, including user engagement, symptom change, and therapeutic outcomes. METHODS: We first discussed various NLP techniques and demonstrated how they are implemented in the presented framework. We then applied the framework in a case study of the Healthy Body Image Program, a Web-based intervention trial for eating disorders (EDs). A total of 372 participants who screened positive for an ED received a DHI aimed at reducing ED psychopathology (including binge eating and purging behaviors) and improving body image. These users generated 37,228 intervention text snippets and exchanged 4285 user-coach messages, which were analyzed using the proposed model. RESULTS: We applied the framework to predict binge eating behavior, resulting in an area under the curve between 0.57 (when applied to new users) and 0.72 (when applied to new symptom reports of known users). In addition, initial evidence indicated that specific text features predicted the therapeutic outcome of reducing ED symptoms. CONCLUSIONS: The case study demonstrates the usefulness of a structured approach to text data analytics. NLP techniques improve the prediction of symptom changes in DHIs. We present a technical framework that can be easily applied in other clinical trials and clinical presentations and encourage other groups to apply the framework in similar contexts

    Dialogue as Data in Learning Analytics for Productive Educational Dialogue

    Get PDF
    This paper provides a novel, conceptually driven stance on the state of the contemporary analytic challenges faced in the treatment of dialogue as a form of data across on- and offline sites of learning. In prior research, preliminary steps have been taken to detect occurrences of such dialogue using automated analysis techniques. Such advances have the potential to foster effective dialogue using learning analytic techniques that scaffold, give feedback on, and provide pedagogic contexts promoting such dialogue. However, the translation of much prior learning science research to online contexts is complex, requiring the operationalization of constructs theorized in different contexts (often face-to-face), and based on different datasets and structures (often spoken dialogue). In this paper, we explore what could constitute the effective analysis of productive online dialogues, arguing that it requires consideration of three key facets of the dialogue: features indicative of productive dialogue; the unit of segmentation; and the interplay of features and segmentation with the temporal underpinning of learning contexts. The paper thus foregrounds key considerations regarding the analysis of dialogue data in emerging learning analytics environments, both for learning-science and for computationally oriented researchers
    • …
    corecore