
    Towards The Efficient Use Of Fine-Grained Provenance In Data Science Applications

    Recent years have witnessed increased demand for users to be able to interpret the results of data science pipelines, locate erroneous data items in the input, evaluate the importance of individual input data items, and acknowledge the contributions of data curators. Such applications often involve the use of provenance at a fine-grained level and require very fast response times. To address this issue, my goal is to expedite the use of fine-grained provenance in applications within both the database and machine learning domains, which are ubiquitous in contemporary data science pipelines. In applications from the database domain, I focus on the problem of data citation and provide two different types of solutions, rewriting-based and provenance-based, to generate fine-grained citations to database query results by implicitly or explicitly leveraging provenance information. In applications from the ML domain, the first problem is incrementally updating ML models after the deletion of a small subset of training samples. This is critical for understanding the importance of individual training samples to ML models, especially in online pipelines. For this problem, I provide two solutions, PrIU and DeltaGrad, which incrementally update ML models constructed by SGD/GD methods using provenance information collected during the training phase on the full dataset, before the deletion requests. The second application from the ML domain is how to clean label uncertainties in the ML training dataset more efficiently and cheaply. To address this problem, I propose a solution, CHEF, which reduces the cost and overhead at each phase of the label cleaning pipeline while maintaining overall model performance. I also propose initial ideas for removing some of the assumptions used in these solutions, to extend them to more general scenarios.
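
    The incremental-update problem above lends itself to a small illustration. The sketch below, assuming a least-squares model trained by plain gradient descent, caches each sample's gradient at every iteration (a simple form of fine-grained provenance) and then replays the updates with the deleted samples' contributions removed. This is only a hypothetical baseline in the spirit of PrIU/DeltaGrad, not the algorithms themselves; all function names and parameters are illustrative.

    ```python
    import numpy as np

    def train_with_provenance(X, y, lr=0.1, epochs=30):
        """Gradient descent for least squares that records, at every iteration,
        each sample's gradient contribution (a simple form of provenance)."""
        n, d = X.shape
        w = np.zeros(d)
        per_sample_grads = []                      # list of (n, d) arrays, one per epoch
        for _ in range(epochs):
            residual = X @ w - y                   # shape (n,)
            grads = residual[:, None] * X          # per-sample gradients, shape (n, d)
            per_sample_grads.append(grads)
            w -= lr * grads.mean(axis=0)           # standard full-data GD step
        return w, per_sample_grads

    def delete_and_update(per_sample_grads, deleted, d, lr=0.1):
        """Approximate the model trained without `deleted` by replaying the cached
        updates with those samples' contributions removed. The gradients are still
        the ones evaluated on the original trajectory, so this is only an
        approximation (hypothetical baseline, not PrIU/DeltaGrad)."""
        n = per_sample_grads[0].shape[0]
        keep = np.setdiff1d(np.arange(n), deleted)
        w = np.zeros(d)
        for grads in per_sample_grads:
            w -= lr * grads[keep].mean(axis=0)
        return w

    # Hypothetical usage on synthetic data
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = X @ rng.normal(size=5) + 0.01 * rng.normal(size=200)
    w_full, prov = train_with_provenance(X, y)
    w_approx = delete_and_update(prov, deleted=np.array([3, 17, 42]), d=5)
    ```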

    Exploring scholarly data with Rexplore.

    Despite the large number and variety of tools and services available today for exploring scholarly data, current support is still very limited in the context of sensemaking tasks, which go beyond standard search and ranking of authors and publications and focus instead on i) understanding the dynamics of research areas, ii) relating authors ‘semantically’ (e.g., in terms of common interests or shared academic trajectories), or iii) performing fine-grained academic expert search along multiple dimensions. To address this gap we have developed a novel tool, Rexplore, which integrates statistical analysis, semantic technologies, and visual analytics to provide effective support for exploring and making sense of scholarly data. Here, we describe the main innovative elements of the tool and present the results of a task-centric empirical evaluation, which shows that Rexplore is highly effective at supporting the aforementioned sensemaking tasks. In addition, these results are robust both with respect to the background of the users (i.e., expert analysts vs. ‘ordinary’ users) and with respect to whether the tasks were selected by the evaluators or proposed by the users themselves.

    Adaptive Multiscale Weighted Permutation Entropy for Rolling Bearing Fault Diagnosis

    © 2020 The Author(s). This work is licensed under a Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/).
    Bearing vibration signals contain non-linear and non-stationary features due to instantaneous variations in the operation of rotating machinery. It is important to characterize and analyze the change in complexity of bearing vibration signals so that bearing health conditions can be accurately identified. Entropy measures are non-linear indicators applicable to time-series complexity analysis for machine fault diagnosis. In this paper, an improved entropy measure, termed Adaptive Multiscale Weighted Permutation Entropy (AMWPE), is proposed. A new rolling bearing fault diagnosis method is then developed based on the AMWPE and a multi-class SVM. For comparison, experimental bearing data are analyzed using the AMWPE and conventional entropy measures, with a multi-class SVM adopted for fault type classification. Moreover, the robustness of the different entropy measures is further studied on noisy signals with various Signal-to-Noise Ratios (SNRs). The experimental results demonstrate the effectiveness of the proposed method in fault diagnosis of rolling bearings under different fault types, severity degrees, and SNR levels.
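
    As a rough illustration of the entropy family used here, the sketch below computes standard single-scale weighted permutation entropy, where each ordinal pattern is weighted by the variance of its window. The adaptive multiscale refinement that defines AMWPE is not reproduced, and the coarse-graining scheme and synthetic signal in the usage example are assumptions for demonstration only.

    ```python
    import math
    from collections import defaultdict
    import numpy as np

    def weighted_permutation_entropy(x, m=3, tau=1):
        """Single-scale weighted permutation entropy of a 1-D signal.

        Each length-m window is mapped to its ordinal pattern and weighted by the
        window's variance, so high-amplitude structure counts more than flat noise.
        (Sketch of the classic measure; AMWPE adds adaptive multiscale analysis.)"""
        x = np.asarray(x, dtype=float)
        n = len(x) - (m - 1) * tau
        weights = defaultdict(float)
        for i in range(n):
            window = x[i:i + (m - 1) * tau + 1:tau]
            pattern = tuple(np.argsort(window))    # ordinal pattern of the window
            weights[pattern] += np.var(window)     # variance-weighted count
        total = sum(weights.values())
        if total == 0:
            return 0.0
        probs = np.array([w / total for w in weights.values()])
        probs = probs[probs > 0]
        entropy = -np.sum(probs * np.log(probs))
        return entropy / math.log(math.factorial(m))   # normalize to [0, 1]

    # Hypothetical usage: coarse-grain a noisy signal at several scales and use the
    # resulting entropy values as features for a multi-class SVM.
    rng = np.random.default_rng(1)
    signal = np.sin(np.linspace(0, 20 * np.pi, 2000)) + 0.5 * rng.normal(size=2000)
    features = []
    for scale in (1, 2, 3, 4):
        coarse = signal[:len(signal) // scale * scale].reshape(-1, scale).mean(axis=1)
        features.append(weighted_permutation_entropy(coarse))
    ```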

    Quantification and Comparison of Degree Distributions in Complex Networks

    The degree distribution is an important characteristic of complex networks. In many applications, quantification of the degree distribution in the form of a fixed-length feature vector is a necessary step. On the other hand, we often need to compare the degree distributions of two given networks and extract the amount of similarity between the two distributions. In this paper, we propose a novel method for quantifying the degree distributions of complex networks. Based on this quantification method, a new distance function for degree distributions is also proposed, which captures the differences in the overall structure of the two given distributions. The proposed method is able to effectively compare networks even with different scales, and it considerably outperforms state-of-the-art methods with respect to the accuracy of the distance function.
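
    To make the idea of a fixed-length quantification concrete, the sketch below maps a graph's degree sequence to a normalized histogram over max-normalized degrees and compares two graphs with an L1 distance. This is a common baseline under assumed parameters (the bin count, the NetworkX graph generators in the usage example), not the quantification or distance function proposed in the paper.

    ```python
    import numpy as np
    import networkx as nx

    def degree_feature_vector(G, n_bins=20):
        """Map a graph's degree distribution to a fixed-length vector.

        Degrees are normalized by the maximum degree so that graphs of different
        sizes become roughly comparable, then binned into a probability histogram.
        (Simple baseline, not the paper's quantification method.)"""
        degrees = np.array([d for _, d in G.degree()], dtype=float)
        if degrees.max() > 0:
            degrees /= degrees.max()
        hist, _ = np.histogram(degrees, bins=n_bins, range=(0.0, 1.0))
        return hist / hist.sum()

    def degree_distribution_distance(G1, G2, n_bins=20):
        """Compare two degree distributions via the L1 distance between their
        normalized histograms (hypothetical stand-in for the paper's distance)."""
        v1 = degree_feature_vector(G1, n_bins)
        v2 = degree_feature_vector(G2, n_bins)
        return np.abs(v1 - v2).sum()

    # Hypothetical usage: a scale-free graph should be closer to another scale-free
    # graph of a different size than to a random graph.
    ba_small = nx.barabasi_albert_graph(500, 3, seed=0)
    ba_large = nx.barabasi_albert_graph(5000, 3, seed=1)
    er = nx.erdos_renyi_graph(500, 0.012, seed=2)
    print(degree_distribution_distance(ba_small, ba_large))
    print(degree_distribution_distance(ba_small, er))
    ```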