48,373 research outputs found

    Scatteract: Automated extraction of data from scatter plots

    Full text link
    Charts are an excellent way to convey patterns and trends in data, but they do not facilitate further modeling of the data or close inspection of individual data points. We present a fully automated system for extracting the numerical values of data points from images of scatter plots. We use deep learning techniques to identify the key components of the chart, and optical character recognition together with robust regression to map from pixels to the coordinate system of the chart. We focus on scatter plots with linear scales, which already have several interesting challenges. Previous work has done fully automatic extraction for other types of charts, but to our knowledge this is the first approach that is fully automatic for scatter plots. Our method performs well, achieving successful data extraction on 89% of the plots in our test set.Comment: Submitted to ECML PKDD 2017 proceedings, 16 page

    How to automatically document data with the codebook package to facilitate data reuse

    No full text

    Database support of detector operation and data analysis in the DEAP-3600 Dark Matter experiment

    Full text link
    The DEAP-3600 detector searches for dark matter interactions on a 3.3 tonne liquid argon target. Over nearly a decade, from start of detector construction through the end of the data analysis phase, well over 200 scientists will have contributed to the project. The DEAP-3600 detector will amass in excess of 900 TB of data representing more than 1010^{10} particle interactions, a few of which could be from dark matter. At the same time, metadata exceeding 80 GB will be generated. This metadata is crucial for organizing and interpreting the dark matter search data and contains both structured and unstructured information. The scale of the data collected, the important role of metadata in interpreting it, the number of people involved, and the long lifetime of the project necessitate an industrialized approach to metadata management. We describe how the CouchDB and the PostgreSQL database systems were integrated into the DEAP detector operation and analysis workflows. This integration provides unified, distributed access to both structured (PostgreSQL) and unstructured (CouchDB) metadata at runtime of the data analysis software. It also supports operational and reporting requirements

    Reliability measurement during software development

    Get PDF
    During the development of data base software for a multi-sensor tracking system, reliability was measured. The failure ratio and failure rate were found to be consistent measures. Trend lines were established from these measurements that provided good visualization of the progress on the job as a whole as well as on individual modules. Over one-half of the observed failures were due to factors associated with the individual run submission rather than with the code proper. Possible application of these findings for line management, project managers, functional management, and regulatory agencies is discussed. Steps for simplifying the measurement process and for use of these data in predicting operational software reliability are outlined

    Search Tracker: Human-derived object tracking in-the-wild through large-scale search and retrieval

    Full text link
    Humans use context and scene knowledge to easily localize moving objects in conditions of complex illumination changes, scene clutter and occlusions. In this paper, we present a method to leverage human knowledge in the form of annotated video libraries in a novel search and retrieval based setting to track objects in unseen video sequences. For every video sequence, a document that represents motion information is generated. Documents of the unseen video are queried against the library at multiple scales to find videos with similar motion characteristics. This provides us with coarse localization of objects in the unseen video. We further adapt these retrieved object locations to the new video using an efficient warping scheme. The proposed method is validated on in-the-wild video surveillance datasets where we outperform state-of-the-art appearance-based trackers. We also introduce a new challenging dataset with complex object appearance changes.Comment: Under review with the IEEE Transactions on Circuits and Systems for Video Technolog

    Automated annotation of multimedia audio data with affective labels for information management

    Get PDF
    The emergence of digital multimedia systems is creating many new opportunities for rapid access to huge content archives. In order to fully exploit these information sources, the content must be annotated with significant features. An important aspect of human interpretation of multimedia data, which is often overlooked, is the affective dimension. Such information is a potentially useful component for content-based classification and retrieval. Much of the affective information of multimedia content is contained within the audio data stream. Emotional features can be defined in terms of arousal and valence levels. In this study low-level audio features are extracted to calculate arousal and valence levels of multimedia audio streams. These are then mapped onto a set of keywords with predetermined emotional interpretations. Experimental results illustrate the use of this system to assign affective annotation to multimedia data
    corecore