7 research outputs found

    Automatic Module Detection in Data Cleaning Workflows: Enabling Transparency and Recipe Reuse

    Before data from multiple sources can be analyzed, data cleaning workflows (“recipes”) usually need to be employed to improve data quality. We identify a number of technical problems that make applying FAIR principles to data cleaning recipes challenging. We then demonstrate how the transparency and reusability of recipes can be improved by analyzing dataflow dependencies within them. In particular, column-level dependencies can be used to automatically detect independent subworkflows, which can then be reused individually as data cleaning modules. We have prototypically implemented this approach as part of an ongoing project to develop open-source companion tools for OpenRefine. Keywords: Data Cleaning, Provenance, Workflow Analysis
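    The column-level dependency analysis described above can be sketched as a connected-components computation over the columns each recipe step reads and writes: steps that touch disjoint column sets form independent modules. This is a minimal illustration; the step representation and names below are hypothetical, not OpenRefine's actual recipe format.

```python
from collections import defaultdict

def detect_modules(steps):
    """Group recipe steps into independent modules via shared columns.

    `steps` is a list of (step_name, columns_read, columns_written) tuples;
    steps whose column sets overlap (directly or transitively) belong to
    the same module.
    """
    # Union-Find over step indices.
    parent = list(range(len(steps)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Link each step to the previous step that touched the same column.
    col_to_step = {}
    for i, (_, reads, writes) in enumerate(steps):
        for col in set(reads) | set(writes):
            if col in col_to_step:
                union(i, col_to_step[col])
            col_to_step[col] = i

    # Collect step names per connected component.
    modules = defaultdict(list)
    for i, (name, _, _) in enumerate(steps):
        modules[find(i)].append(name)
    return list(modules.values())
```

    Each returned group can then be extracted and reused as a standalone cleaning module, since no column flows across group boundaries.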

    Spatially Adaptive Digital Image Restoration with Hopfield Network

    ABSTRACT: Many machines now work like a visual sense, but machine vision systems still fall short of human vision in many respects, for example in resolution and color variation of the produced image. One shortcoming of digital images produced by cameras is sensitivity to motion: a camera's capture speed is limited, so if the captured object moves during capture, a motion effect appears in the image. Such effects are called image degradation; other examples are blur and noise. Degraded images therefore require restoration. In this final project, a software application was built using Matlab 7.0.1 for spatially adaptive restoration of degraded images, using a Hopfield Network as an artificial neural network algorithm. This method takes a computational approach to image restoration, so blurred images can be made clearer. The application takes a degraded image as input, processes it to produce a restored image, and then computes the error between the restored image and the original image used for comparison. The restoration error is measured with PSNR (Peak Signal to Noise Ratio). Images restored with the Adaptive Hopfield Network perform 8 dB better than restoration with a standard Hopfield Network and 15 dB better than Wiener restoration for images degraded by Gaussian blur with noise of variance 30. Keywords: Neural Network, Image Restoration, Spatially Adaptive, Image Degradation, blur, noise, motion, PSNR
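    The PSNR measure used above to score restored images has a standard definition: 10·log10(peak²/MSE). A minimal pure-Python sketch for grayscale images given as flat pixel lists (the helper name and list-based representation are illustrative, not the project's Matlab code):

```python
import math

def psnr(original, restored, peak=255.0):
    """Peak Signal-to-Noise Ratio (dB) between two equal-sized images,
    each given as a flat sequence of pixel values."""
    # Mean squared error between corresponding pixels.
    mse = sum((o - r) ** 2 for o, r in zip(original, restored)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(peak ** 2 / mse)
```

    Higher values mean a closer match to the original, so the reported 8 dB and 15 dB gains correspond to substantially lower restoration error.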

    An analysis of use and performance data aggregated from 35 institutional repositories

    Purpose – This study demonstrates that aggregated data from the Repository Analytics and Metrics Portal (RAMP) have significant potential for analyzing the visibility and use of institutional repositories (IR), as well as factors potentially affecting their use, including repository size, platform, content, device and global location. The RAMP dataset is unique and public. Design/methodology/approach – A webometrics methodology was followed to aggregate and analyze use and performance data from 35 institutional repositories in seven countries that were registered with RAMP for a five-month period in 2019. RAMP aggregates Google Search Console (GSC) data to show IR items that surfaced in search results from all Google properties. Findings – The analyses demonstrate large performance variances across IR as well as low overall use. The findings also show that device use affects search behavior, that different content types such as electronic theses and dissertations (ETDs) may affect use, and that searches originating in the Global South show much higher mobile-device use than those in the Global North. Research limitations/implications – RAMP relies on GSC as its sole data source, resulting in somewhat conservative overall numbers; however, the data are also expected to be as robot-free as can be hoped. Originality/value – This may be the first analysis of aggregate use and performance data derived from a global set of IR using an openly published dataset. RAMP data offer significant research potential for quantifying and characterizing variances in the discoverability and use of IR content.

    Evaluating a Machine Learning Approach to Identifying Expressive Content at Page Level in HathiTrust

    HathiTrust currently provides metadata, scanned images, and full text for all public domain volumes. However, it is likely that the front matter of most volumes contains content that is of interest to scholars and free from restriction, regardless of rights status. For example, the title page or table of contents may contain information that is likely non-expressive and useful for understanding a volume's structure and subject matter. It is also likely that some volumes include expressive/creative content within the first 20 pages, so front matter cannot be opened for all volumes without understanding the most frequent types of content within those pages. This task is time-prohibitive for entirely manual exploration, so we seek to evaluate a machine learning approach to it.

    Systematic Examination of Pre- and Post-Retraction Citations

    Scientific retractions occur for a multitude of reasons. A growing body of research has studied the phenomenon of retraction through systematic analyses of the characteristics of retracted articles and their associated citations. In our study, we focus on the characteristics of articles that cite retracted articles, and on the changes in citation dynamics pre- and post-retraction. We leverage descriptive statistics and ego-network methods to examine 4,871 retracted articles and their citations before and after retraction. Data on the retracted articles were obtained from PubMed, Scopus, and Retraction Watch, and on their citing articles from Scopus. Our findings indicate a stark decrease in post-retraction citations, and that most of these citations came from countries different from the retracted article's country of publication. Citation context analyses of a subset of retracted articles also reveal that post-retraction citations came from articles whose disciplinary and geographical boundaries differ from those of the retracted article.
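    The pre-/post-retraction split underlying the citation dynamics can be illustrated with a trivial sketch (the function name and date-list representation are hypothetical, not the study's actual pipeline):

```python
from datetime import date

def split_citations(retraction_date, citing_dates):
    """Count citations dated before vs. on-or-after a retraction date."""
    pre = sum(1 for d in citing_dates if d < retraction_date)
    post = len(citing_dates) - pre
    return pre, post
```

    Computing this split per retracted article across a corpus is what makes the post-retraction decrease quantifiable.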

    ReTracker: Actively and Automatically Matching Retraction Metadata in Zotero

    Retraction removes seriously flawed papers from the scientific literature. However, even papers retracted for scientific fraud continue to be cited and used as valid after their retraction. Retracted papers are inadequately identified on publisher pages and in scholarly databases, and scholars’ personal libraries frequently contain retracted papers. To address this, we are developing a tool called ReTracker (https://github.com/nikolausn/ReTrackers) that automatically checks a user’s Zotero library for retracted articles and adds retraction status as a new metadata field directly in the library. In this paper, we present the current version of ReTracker, which automatically flags retracted articles from PubMed. We describe how we have iteratively improved ReTracker’s matching performance across its first two versions. Our tests show that the current version of ReTracker is able to flag retracted articles from PubMed with high precision and recall, and to distinguish retracted articles from articles about retraction. In its current state, ReTracker can actively and automatically bring retraction metadata into Zotero; in future work we will test its usability with scholars.
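    ReTracker's matching against PubMed is more involved, but the core idea of flagging library items against a known set of retracted identifiers can be sketched as normalized DOI matching (a hypothetical illustration; ReTracker's actual matching logic may differ):

```python
def flag_retracted(library_items, retracted_dois):
    """Return library items whose DOI appears in a set of retracted DOIs.

    `library_items` are dicts with an optional "DOI" field, as in Zotero's
    item JSON; DOIs are compared case-insensitively after whitespace
    normalization, since DOI names are case-insensitive.
    """
    normalized = {doi.strip().lower() for doi in retracted_dois}
    flagged = []
    for item in library_items:
        doi = item.get("DOI", "").strip().lower()
        if doi and doi in normalized:
            flagged.append(item)
    return flagged
```

    In practice, items without DOIs need fallback matching (e.g., on PubMed IDs or titles), which is where precision/recall trade-offs arise.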