2,191 research outputs found

    TED Talk Recommender Using Speech Transcripts

    Full text link
    Nowadays, online video platforms mostly recommend related videos by analyzing user-driven data such as viewing patterns, rather than the content of the videos. However, content is more important than any other element when videos aim to deliver knowledge. Therefore, we have developed a web application which recommends related TED lecture videos to the users, considering the content of the videos from the transcripts. TED Talk Recommender constructs a network for recommending videos that are similar content-wise and providing a user interface.Comment: 3 page

    NETS: Extremely fast outlier detection from a data stream via set-based processing

    Get PDF
    This paper addresses the problem of efficiently detecting outliers from a data stream as old data points expire from and new data points enter the window incrementally. The proposed method is based on a newly discovered characteristic of a data stream that the change in the locations of data points in the data space is typically very insignificant. This observation has led to the finding that the existing distance-based outlier detection algorithms perform excessive unnecessary computations that are repetitive and/or canceling out the effects. Thus, in this paper, we propose a novel set-based approach to detecting outliers, whereby data points at similar locations are grouped and the detection of outliers or inliers is handled at the group level. Specifically, a new algorithm NETS is proposed to achieve a remarkable performance improvement by realizing set-based early identification of outliers or inliers and taking advantage of the net effect between expired and new data points. Additionally, NETS is capable of achieving the same efficiency even for a high-dimensional data stream through two-level dimensional filtering. Comprehensive experiments using six real-world data streams show 5 to 25 times faster processing time than state-of-the-art algorithms with comparable memory consumption. We assert that NETS opens a new possibility to real-time data stream outlier detection

    Toward Robustness in Multi-label Classification: A Data Augmentation Strategy against Imbalance and Noise

    Full text link
    Multi-label classification poses challenges due to imbalanced and noisy labels in training data. We propose a unified data augmentation method, named BalanceMix, to address these challenges. Our approach includes two samplers for imbalanced labels, generating minority-augmented instances with high diversity. It also refines multi-labels at the label-wise granularity, categorizing noisy labels as clean, re-labeled, or ambiguous for robust optimization. Extensive experiments on three benchmark datasets demonstrate that BalanceMix outperforms existing state-of-the-art methods. We release the code at https://github.com/DISL-Lab/BalanceMix.Comment: This paper was accepted at AAAI 2024. We upload the full version of our paper on arXiv due to the page limit of AAA

    A Novel Framework for Online Amnesic Trajectory Compression in Resource-constrained Environments

    Full text link
    State-of-the-art trajectory compression methods usually involve high space-time complexity or yield unsatisfactory compression rates, leading to rapid exhaustion of memory, computation, storage and energy resources. Their ability is commonly limited when operating in a resource-constrained environment especially when the data volume (even when compressed) far exceeds the storage limit. Hence we propose a novel online framework for error-bounded trajectory compression and ageing called the Amnesic Bounded Quadrant System (ABQS), whose core is the Bounded Quadrant System (BQS) algorithm family that includes a normal version (BQS), Fast version (FBQS), and a Progressive version (PBQS). ABQS intelligently manages a given storage and compresses the trajectories with different error tolerances subject to their ages. In the experiments, we conduct comprehensive evaluations for the BQS algorithm family and the ABQS framework. Using empirical GPS traces from flying foxes and cars, and synthetic data from simulation, we demonstrate the effectiveness of the standalone BQS algorithms in significantly reducing the time and space complexity of trajectory compression, while greatly improving the compression rates of the state-of-the-art algorithms (up to 45%). We also show that the operational time of the target resource-constrained hardware platform can be prolonged by up to 41%. We then verify that with ABQS, given data volumes that are far greater than storage space, ABQS is able to achieve 15 to 400 times smaller errors than the baselines. We also show that the algorithm is robust to extreme trajectory shapes.Comment: arXiv admin note: substantial text overlap with arXiv:1412.032
    • …
    corecore