
    An Exploratory Sequential Mixed Methods Approach to Understanding Researchers’ Data Management Practices at UVM: Findings from the Qualitative Phase

    The objective of this article is to report on the first qualitative phase of an exploratory sequential mixed methods research design focused on researcher data management practices and related institutional research data services. The aim of this study is to understand data management behaviors of faculty at the University of Vermont (UVM), a higher research activity Research University, in order to guide the development of campus research data management services. The population of study was all faculty who received National Science Foundation (NSF) grants between 2011 and 2014 who were required to submit a data management plan (DMP); qualitative data was collected in two forms: (1) semi-structured interviews and (2) document analysis of data management plans. From a population of 47 researchers, six were included in the interview sample, representing a broad range of disciplines and NSF Directorates, and 35 data management plans were analyzed. Three major themes were identified through triangulation of qualitative data sources: data management activities, including data dissemination and data sharing; institutional research support and infrastructure barriers; and perceptions of data management plans and attitudes towards data management planning. The themes articulated in this article will be used to design a survey for the second quantitative phase of the study, which will aim to more broadly generalize data management activities at UVM across all disciplines.

    Towards information profiling: data lake content metadata management

    There is currently a burst of Big Data (BD) processed and stored in huge raw data repositories, commonly called Data Lakes (DL). These BD require new techniques of data integration and schema alignment in order to make the data usable by its consumers and to discover the relationships linking their content. This can be provided by metadata services which discover and describe their content. However, there is currently no systematic approach to this kind of metadata discovery and management. Thus, we propose a framework for the profiling of informational content stored in the DL, which we call information profiling. The profiles are stored as metadata to support data analysis. We formally define a metadata management process which identifies the key activities required to effectively handle this. We demonstrate the alternative techniques and performance of our process using a prototype implementation handling a real-life case study from the OpenML DL, which showcases the value and feasibility of our approach. Peer Reviewed. Postprint (author's final draft).

    An Exploratory Sequential Mixed Methods Approach to Understanding Researchers’ Data Management Practices at UVM: Findings from the Quantitative Phase

    This article reports on the second quantitative phase of an exploratory sequential mixed methods research design focused on researcher data management practices and related institutional support and services. The study aims to understand data management activities and challenges of faculty at the University of Vermont (UVM), a higher research activity Research University, in order to develop appropriate research data services (RDS). Data was collected via a survey, built on themes from the initial qualitative data analysis from the first phase of this study. The survey was distributed to a nonrandom census sample of full-time UVM faculty and researchers (P=1,190); from this population, a total of 319 participants completed the survey for a 26.8% response rate. The survey collected information on five dimensions of data management: data management activities; data management plans; data management challenges; data management support; and attitudes and behaviors towards data management planning. Frequencies, cross tabulations, and chi-square tests of independence were calculated using demographic variables including gender, rank, college, and discipline. Results from the analysis provide a snapshot of research data management activities at UVM, including types of data collected, use of metadata, short- and long-term storage of data, and data sharing practices. The survey identified key challenges to data management, including data description (metadata) and sharing data with others; this latter challenge is particularly affected by confidentiality issues and lack of time, personnel, and infrastructure to make data available. Faculty also provided insight into the RDS that they think UVM should support, as well as the RDS they were personally interested in. Data from this study will be integrated with data from the first qualitative phase of the research project and analyzed for meta-inferences to help determine future research data services at UVM.
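The chi-square test of independence named in the abstract above can be illustrated with a minimal sketch. The cross tabulation below is hypothetical (the abstract reports no cell counts), and the function names are illustrative; only the 319 and 1,190 figures come from the abstract. The sketch computes the Pearson statistic and degrees of freedom, and checks the reported response rate.

```python
from typing import List


def chi_square_statistic(table: List[List[float]]) -> float:
    """Pearson chi-square statistic for a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the row and column variables.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def degrees_of_freedom(table: List[List[float]]) -> int:
    """(rows - 1) * (columns - 1) for a rectangular contingency table."""
    return (len(table) - 1) * (len(table[0]) - 1)


# HYPOTHETICAL counts, not from the study: rows are two respondent groups,
# columns are "shares data" / "does not share data".
counts = [[30, 20], [20, 30]]
stat = chi_square_statistic(counts)
dof = degrees_of_freedom(counts)

# Response rate from the figures given in the abstract.
response_rate = 319 / 1190

print(f"chi-square = {stat:.2f}, dof = {dof}")
print(f"response rate = {100 * response_rate:.1f}%")
```

The statistic would then be compared against the chi-square distribution with the computed degrees of freedom to obtain a p-value; a statistics library is the usual choice for that step.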

    Exploratory Analysis of Highly Heterogeneous Document Collections

    We present an effective multifaceted system for exploratory analysis of highly heterogeneous document collections. Our system is based on intelligently tagging individual documents in a purely automated fashion and exploiting these tags in a powerful faceted browsing framework. Tagging strategies employed include both unsupervised and supervised approaches based on machine learning and natural language processing. As one of our key tagging strategies, we introduce the KERA algorithm (Keyword Extraction for Reports and Articles). KERA extracts topic-representative terms from individual documents in a purely unsupervised fashion and is revealed to be significantly more effective than state-of-the-art methods. Finally, we evaluate our system in its ability to help users locate documents pertaining to military critical technologies buried deep in a large heterogeneous sea of information. Comment: 9 pages; KDD 2013: 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.

    A framework for interrogating social media images to reveal an emergent archive of war

    The visual image has long been central to how war is seen, contested and legitimised, remembered and forgotten. Archives are pivotal to these ends as is their ownership and access, from state and other official repositories through to the countless photographs scattered and hidden from a collective understanding of what war looks like in individual collections and dusty attics. With the advent and rapid development of social media, however, the amateur and the professional, the illicit and the sanctioned, the personal and the official, and the past and the present, all seem to inhabit the same connected and chaotic space. However, to even begin to render intelligible the complexity, scale and volume of what war looks like in social media archives is a considerable task, given the limitations of any traditional human-based method of collection and analysis. We thus propose the production of a series of ‘snapshots’, using computer-aided extraction and identification techniques to try to offer an experimental way in to conceiving a new imaginary of war. We were particularly interested in testing to see if twentieth century wars, obviously initially captured via pre-digital means, had become more ‘settled’ over time in terms of their remediated presence today through their visual representations and connections on social media, compared with wars fought in digital media ecologies (i.e. those fought and initially represented amidst the volume and pervasiveness of social media images). To this end, we developed a framework for automatically extracting and analysing war images that appear in social media, using both the features of the images themselves, and the text and metadata associated with each image. The framework utilises a workflow comprising four core stages: (1) information retrieval, (2) data pre-processing, (3) feature extraction, and (4) machine learning. Our corpus was drawn from the social media platforms Facebook and Flickr.

    Analysis and Detection of Information Types of Open Source Software Issue Discussions

    Most modern Issue Tracking Systems (ITSs) for open source software (OSS) projects allow users to add comments to issues. Over time, these comments accumulate into discussion threads embedded with rich information about the software project, which can potentially satisfy the diverse needs of OSS stakeholders. However, discovering and retrieving relevant information from the discussion threads is a challenging task, especially when the discussions are lengthy and the number of issues in ITSs is vast. In this paper, we address this challenge by identifying the information types presented in OSS issue discussions. Through qualitative content analysis of 15 complex issue threads across three projects hosted on GitHub, we uncovered 16 information types and created a labeled corpus containing 4656 sentences. Our investigation of supervised, automated classification techniques indicated that, when prior knowledge about the issue is available, Random Forest can effectively detect most sentence types using conversational features such as the sentence length and its position. When classifying sentences from new issues, Logistic Regression can yield satisfactory performance using textual features for certain information types, while falling short on others. Our work represents a nontrivial first step towards tools and techniques for identifying and obtaining the rich information recorded in the ITSs to support various software engineering activities and to satisfy the diverse needs of OSS stakeholders. Comment: 41st ACM/IEEE International Conference on Software Engineering (ICSE2019).
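As a rough illustration of the "conversational features" mentioned in the abstract above (sentence length and position), the following sketch derives such features from a discussion thread. The abstract does not define the features precisely, so the feature names, the normalisation, and the sample thread are all assumptions made for this example, not the authors' implementation.

```python
from typing import Dict, List


def conversational_features(thread: List[List[str]]) -> List[Dict[str, float]]:
    """Compute simple per-sentence features for an issue thread.

    `thread` is a list of comments in posting order; each comment is a list
    of sentences. Positions are scaled to [0, 1] (an assumed convention).
    """
    features = []
    n_comments = len(thread)
    for c_idx, comment in enumerate(thread):
        for s_idx, sentence in enumerate(comment):
            features.append({
                # Sentence length in whitespace-separated tokens.
                "length_tokens": len(sentence.split()),
                # Relative position of the comment within the thread.
                "comment_position": c_idx / max(n_comments - 1, 1),
                # Relative position of the sentence within its comment.
                "sentence_position": s_idx / max(len(comment) - 1, 1),
            })
    return features


# HYPOTHETICAL thread, invented for illustration.
thread = [
    ["This crashes on startup.", "Any ideas?"],
    ["Try upgrading to the latest release."],
]
feats = conversational_features(thread)
for row in feats:
    print(row)
```

Feature rows of this kind could be passed to a supervised classifier such as a random forest, alongside the manually assigned information-type labels.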