42,896 research outputs found

    STREAM-EVOLVING BOT DETECTION FRAMEWORK USING GRAPH-BASED AND FEATURE-BASED APPROACHES FOR IDENTIFYING SOCIAL BOTS ON TWITTER

    Get PDF
    This dissertation focuses on the problem of evolving social bots in online social networks, particularly Twitter. Such accounts spread misinformation and inflate social network content to mislead the masses. The main objective of this dissertation is to propose a stream-based evolving bot detection framework (SEBD), which was constructed using both graph- and feature-based models. It was built using Python, a real-time streaming engine (Apache Kafka version 3.2), and our pretrained model (bot multi-view graph attention network (Bot-MGAT)). The feature-based model was used to identify predictive features for bot detection and evaluate the SEBD predictions. The graph-based model was used to facilitate multiview graph attention networks (GATs) with fellowship links to build our framework for predicting account labels from streams. A probably approximately correct learning framework was applied to confirm the accuracy and confidence levels of SEBD.The results showed that the SEBD can effectively identify bots from streams and profile features are sufficient for detecting social bots. The pretrained Bot-MGAT model uses fellowship links to reveal hidden information that can aid in identifying bot accounts. The significant contributions of this study are the development of a stream based bot detection framework for detecting social bots based on a given hashtag and the proposal of a hybrid approach for feature selection to identify predictive features for identifying bot accounts. Our findings indicate that Twitter has a higher percentage of active bots than humans in hashtags. The results indicated that stream-based detection is more effective than offline detection by achieving accuracy score 96.9%. Finally, semi supervised learning (SSL) can solve the issue of labeled data in bot detection tasks

    Hierarchical Subquery Evaluation for Active Learning on a Graph

    Get PDF
    To train good supervised and semi-supervised object classifiers, it is critical that we not waste the time of the human experts who are providing the training labels. Existing active learning strategies can have uneven performance, being efficient on some datasets but wasteful on others, or inconsistent just between runs on the same dataset. We propose perplexity based graph construction and a new hierarchical subquery evaluation algorithm to combat this variability, and to release the potential of Expected Error Reduction. Under some specific circumstances, Expected Error Reduction has been one of the strongest-performing informativeness criteria for active learning. Until now, it has also been prohibitively costly to compute for sizeable datasets. We demonstrate our highly practical algorithm, comparing it to other active learning measures on classification datasets that vary in sparsity, dimensionality, and size. Our algorithm is consistent over multiple runs and achieves high accuracy, while querying the human expert for labels at a frequency that matches their desired time budget.Comment: CVPR 201

    Graph-based Semi-Supervised & Active Learning for Edge Flows

    Full text link
    We present a graph-based semi-supervised learning (SSL) method for learning edge flows defined on a graph. Specifically, given flow measurements on a subset of edges, we want to predict the flows on the remaining edges. To this end, we develop a computational framework that imposes certain constraints on the overall flows, such as (approximate) flow conservation. These constraints render our approach different from classical graph-based SSL for vertex labels, which posits that tightly connected nodes share similar labels and leverages the graph structure accordingly to extrapolate from a few vertex labels to the unlabeled vertices. We derive bounds for our method's reconstruction error and demonstrate its strong performance on synthetic and real-world flow networks from transportation, physical infrastructure, and the Web. Furthermore, we provide two active learning algorithms for selecting informative edges on which to measure flow, which has applications for optimal sensor deployment. The first strategy selects edges to minimize the reconstruction error bound and works well on flows that are approximately divergence-free. The second approach clusters the graph and selects bottleneck edges that cross cluster-boundaries, which works well on flows with global trends

    A Probabilistic Interpretation of Sampling Theory of Graph Signals

    Full text link
    We give a probabilistic interpretation of sampling theory of graph signals. To do this, we first define a generative model for the data using a pairwise Gaussian random field (GRF) which depends on the graph. We show that, under certain conditions, reconstructing a graph signal from a subset of its samples by least squares is equivalent to performing MAP inference on an approximation of this GRF which has a low rank covariance matrix. We then show that a sampling set of given size with the largest associated cut-off frequency, which is optimal from a sampling theoretic point of view, minimizes the worst case predictive covariance of the MAP estimate on the GRF. This interpretation also gives an intuitive explanation for the superior performance of the sampling theoretic approach to active semi-supervised classification.Comment: 5 pages, 2 figures, To appear in International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 201
    • …
    corecore