4,113 research outputs found

    trackr: A Framework for Enhancing Discoverability and Reproducibility of Data Visualizations and Other Artifacts in R

    Research is an incremental, iterative process, with new results relying on and building upon previous ones. Scientists need to find, retrieve, understand, and verify results in order to confidently extend them, even when the results are their own. We present the trackr framework for organizing, automatically annotating, discovering, and retrieving results. We identify sources of automatically extractable metadata for computational results, and we define an extensible system for organizing, annotating, and searching for results based on these and other metadata. We present an open-source implementation of these concepts for plots, computational artifacts, and woven dynamic reports generated in the R statistical computing language.

    A Formal Framework for Linguistic Annotation

    'Linguistic annotation' covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions -- audio, video and/or physiological recordings -- or it may be textual. The added notations may include transcriptions of all sorts (from phonetic features to discourse structures), part-of-speech and sense tagging, syntactic analysis, 'named entity' identification, co-reference annotation, and so on. While there are several ongoing efforts to provide formats and tools for such annotations and to publish annotated linguistic databases, the lack of widely accepted standards is becoming a critical problem. Proposed standards, to the extent they exist, have focussed on file formats. This paper focuses instead on the logical structure of linguistic annotations. We survey a wide variety of existing annotation formats and demonstrate a common conceptual core, the annotation graph. This provides a formal framework for constructing, maintaining and searching linguistic annotations, while remaining consistent with many alternative data structures and file formats.
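    The abstract names the annotation graph but does not define it. As a rough illustration only, the sketch below assumes one plausible reading: nodes that may carry a time offset into the recording, and labeled arcs between nodes, each arc belonging to an annotation layer such as words or part-of-speech tags. The class names, method names, layer names, and sample data are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Arc:
    """One annotation: a labeled arc between two nodes (hypothetical layout)."""
    start: str
    end: str
    layer: str   # e.g. "word", "pos", "phrase" (assumed layer names)
    label: str


@dataclass
class AnnotationGraph:
    # Node id -> time offset in seconds, or None for an unanchored node.
    times: Dict[str, Optional[float]] = field(default_factory=dict)
    arcs: List[Arc] = field(default_factory=list)

    def add_node(self, node_id: str, time: Optional[float] = None) -> None:
        self.times[node_id] = time

    def add_arc(self, start: str, end: str, layer: str, label: str) -> None:
        # Refuse arcs that point at nodes we have never seen.
        for node_id in (start, end):
            if node_id not in self.times:
                raise KeyError(f"unknown node: {node_id}")
        self.arcs.append(Arc(start, end, layer, label))

    def labels(self, layer: str) -> List[str]:
        """All labels on one layer, in insertion order."""
        return [arc.label for arc in self.arcs if arc.layer == layer]

    def overlapping(self, t0: float, t1: float) -> List[Arc]:
        """Arcs whose time-anchored span intersects the interval (t0, t1)."""
        hits = []
        for arc in self.arcs:
            a, b = self.times[arc.start], self.times[arc.end]
            if a is not None and b is not None and a < t1 and b > t0:
                hits.append(arc)
        return hits


# Made-up example: two words, their tags, and a phrase spanning both.
graph = AnnotationGraph()
graph.add_node("n0", 0.00)
graph.add_node("n1", 0.42)
graph.add_node("n2", 0.95)
graph.add_arc("n0", "n1", "word", "hello")
graph.add_arc("n1", "n2", "word", "world")
graph.add_arc("n0", "n1", "pos", "UH")
graph.add_arc("n1", "n2", "pos", "NN")
graph.add_arc("n0", "n2", "phrase", "greeting")
```

    Because every layer shares the same nodes, a search such as `graph.overlapping(0.5, 0.6)` returns annotations from all layers that cover that stretch of the recording, independent of how the graph is stored on disk.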

    Data Mining; A Conceptual Overview

    This tutorial provides an overview of the data mining process. The tutorial also provides a basic understanding of how to plan, evaluate and successfully refine a data mining project, particularly in terms of model building and model evaluation. Methodological considerations are discussed and illustrated. After explaining the nature of data mining and its importance in business, the tutorial describes the underlying machine learning and statistical techniques involved. It describes the CRISP-DM standard now being used in industry as the standard for a technology-neutral data mining process model. The paper concludes with a major illustration of the data mining process methodology and the unsolved problems that offer opportunities for research. The approach is both practical and conceptually sound, so that it is useful to both academics and practitioners.

    Cultural Capital: Challenges to New York State’s Competitive Advantages in the Arts and Entertainment Industry

    This is a report on the findings of the Cornell University ILR planning process, conducted with the support of a grant from the Alfred P. Sloan Foundation, to investigate trends in the arts and entertainment industry in New York State and assess industry stakeholders’ needs and demand for industry studies and applied research. Building on a track record of research and technical assistance to arts and entertainment organizations, Cornell ILR moved toward a long-term goal of establishing an arts and entertainment research center by forging alliances with faculty from other schools and departments in the university and by establishing an advisory committee of key players in the industry. The outcome of this planning process is a research agenda designed to serve the priority needs and interests of the arts and entertainment industry in New York State.

    Virtual Worlds for Archaeological Research and Education


    Demonstration of Visible and Near Infrared Raman Spectrometers and Improved Matched Filter Model for Analysis of Combined Raman Signals

    Raman spectroscopy is a powerful analysis technique that has found applications in fields such as analytical chemistry, planetary sciences, and medical diagnostics. Recent studies have shown that analysis of Raman spectral profiles can be greatly assisted by the use of computational models, with achievements including high-accuracy pure-sample classification with imbalanced data sets and detection of deviations from ideal samples for pharmaceutical quality control. The adoption of automated methods is a necessary step in streamlining the analysis process as Raman hardware becomes more advanced. Due to limits in the architectures of current machine-learning-based Raman classification models, transfer from pure to mixed sample analysis is not possible. This thesis presents the design, fabrication, and data collected from two different Raman spectrometers: a visible light system operating at 532 nm and a near infrared system operating at 785 nm. For each system, the optical design and operational theory of the main components are explained, and the data collected on each system are then presented. Additionally, a learned matched filter computer model was developed to analyze Raman line profiles; it can detect the signatures of multiple materials in a single data point. The presented model incorporates machine learning theory into the traditional matched filter model for a higher probability of detection and a much reduced probability of false alarm. The structure and operation of the model are explained, and analysis of both real and simulated mixed-sample Raman spectra is presented.
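    For orientation, the traditional matched filter that the abstract builds on can be sketched as a normalized correlation between a measured spectrum and a known reference spectrum. The sketch below is not the thesis's learned model: the function names, the 0.5 threshold, and the toy six-channel spectra are assumptions made purely for illustration.

```python
import math
from typing import Dict, List, Sequence


def normalize(values: Sequence[float]) -> List[float]:
    """Subtract the mean and scale to unit Euclidean norm."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0:
        raise ValueError("flat spectrum cannot be normalized")
    return [c / norm for c in centered]


def matched_filter_score(spectrum: Sequence[float],
                         template: Sequence[float]) -> float:
    """Normalized correlation in [-1, 1] between a spectrum and a template."""
    if len(spectrum) != len(template):
        raise ValueError("spectrum and template must have the same length")
    s, t = normalize(spectrum), normalize(template)
    return sum(a * b for a, b in zip(s, t))


def detect(spectrum: Sequence[float],
           templates: Dict[str, Sequence[float]],
           threshold: float = 0.5) -> List[str]:
    """Names of templates whose score meets the (assumed) threshold."""
    return sorted(name for name, t in templates.items()
                  if matched_filter_score(spectrum, t) >= threshold)


# Toy reference spectra: one peak each, in different channels.
templates = {
    "A": [0, 1, 0, 0, 0, 0],
    "B": [0, 0, 0, 0, 1, 0],
    "C": [0, 0, 1, 0, 0, 0],
}
# A simulated mixture of materials A and B.
mixed = [0, 1, 0, 0, 1, 0]
found = detect(mixed, templates)
```

    Each template is scored independently, so more than one material can be reported for a single spectrum. A fixed threshold like the one assumed here trades detection probability against false alarms.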

    DERMA: A melanoma diagnosis platform based on collaborative multilabel analog reasoning

    The number of melanoma-related deaths has increased over the last few years due to new sun-exposure habits. Early diagnosis has become the best prevention method. This work presents DERMA, a melanoma diagnosis architecture based on the collaboration of several multilabel case-based reasoning subsystems. The system has to face several challenges, including data characterization, pattern matching, reliable diagnosis, and self-explanation capabilities. Experiments using subsystems specialized in confocal and dermoscopy images have provided promising results for helping experts diagnose melanoma.