
    Towards Addressing Key Visual Processing Challenges in Social Media Computing

    abstract: Visual processing on social media platforms is a key step in gathering and understanding information in the era of the Internet and big data. Online data is rich in content, but processing it faces many challenges, including varying scales of the objects of interest, unreliable and/or missing labels, the inadequacy of single-modal data, and the difficulty of analyzing high-dimensional data. To facilitate the processing and understanding of online data, this dissertation focuses on three challenges that I consider of great practical importance: handling scale differences in computer vision tasks such as facial component detection and face retrieval; developing efficient classifiers from partially labeled data and from noisy data; and employing multi-modal models and feature selection to improve multi-view data analysis. For the first challenge, I propose a scale-insensitive algorithm that detects facial landmarks quickly and accurately. For the second, I propose two algorithms that learn from partially labeled data and from noisy data, respectively. For the third, I propose a new framework that incorporates feature selection modules into LDA models.
    Dissertation/Thesis. Doctoral Dissertation, Computer Science, 201

    IoT-Based Data Size Minimization Using Cluster-Based-Similarity-Elimination

    This paper proposes a new redundancy reduction approach for continuous data flows from IoT devices that minimizes the size of IoT data using a novel cluster-based-similarity-elimination algorithm. Data flowing continuously from IoT devices is characterized by redundant records. This redundancy not only leads to model overfitting but also demands considerable processing power because of the large number of records. Feature selection can partially reduce the data, and thus the redundancy; however, it is not sufficient on its own. Removing redundant data is of utmost importance because, as smart city scenarios are implemented, flow data generation requires more advanced analytics to cope with the evolution and growth of the IoT environment. This study therefore aims to minimize processing time while maintaining the best accuracy by minimizing data similarity, which both addresses the overfitting problem and saves time. The proposed approach reduces the data size in terms of the number of tuples. Its effectiveness was validated using various classification algorithms and evaluation metrics. The results show a significant improvement compared with traditional approaches, reducing real-time classification execution time to only 9% of the original time. This approach can be used to optimize data size and achieve accurate results with a fast execution time while also addressing overfitting issues.
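    The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea of similarity elimination, not the paper's method. It drops any record that lies within a distance threshold of an already-kept record with the same class label, so each group of near-duplicates contributes one representative tuple. The single-pass leader clustering, the Euclidean distance, the threshold `eps`, the sample readings, and the names `eliminate_similar` and `euclidean` are all assumptions made for this sketch.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical record layout: (feature vector, class label).
Record = Tuple[Sequence[float], str]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def eliminate_similar(records: List[Record], eps: float) -> List[Record]:
    """Hypothetical similarity elimination via single-pass leader clustering.

    A record is treated as redundant if an already-kept record (a cluster
    "leader") has the same label and lies within distance `eps`; otherwise
    the record becomes a new leader. Only the leaders are returned, in
    their original order.
    """
    leaders: List[Record] = []
    for features, label in records:
        redundant = any(
            lead_label == label and euclidean(features, lead_features) <= eps
            for lead_features, lead_label in leaders
        )
        if not redundant:
            leaders.append((features, label))
    return leaders


# Illustrative (made-up) sensor readings.
stream = [
    ((20.0, 45.0), "normal"),
    ((20.1, 45.2), "normal"),  # within eps of the first record: dropped
    ((35.0, 80.0), "alert"),
    ((20.0, 45.1), "normal"),  # within eps of the first record: dropped
    ((35.0, 80.0), "normal"),  # same features as the alert, different label: kept
]
reduced = eliminate_similar(stream, eps=0.5)
print(len(stream), "->", len(reduced))  # 5 -> 3
```

    Comparing labels as well as features keeps records that look alike but belong to different classes, so the reduced set still carries the class boundaries a downstream classifier needs. This greedy pass is quadratic in the number of kept records in the worst case; a real streaming system would likely need an index or windowing, which this sketch does not attempt.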