12 research outputs found

    Predicting Events Surrounding the Egyptian Revolution of 2011 Using Learning Algorithms on Micro Blog Data

    We aim to predict activities of a political nature in Egypt that influence or reflect societal-scale behavior and beliefs by applying learning algorithms to Twitter data. We focus on capturing domestic events in Egypt from November 2009 to November 2013. To this end, we study underlying communication patterns by evaluating content-based information and metadata in classification tasks, without targeting specific keywords or users. Classification is done using Support Vector Machines (SVM) and Support Distribution Machines (SDM). Latent Dirichlet Allocation (LDA) is used to create content-based input patterns for the classifiers, while bags of Twitter meta-information are used with the SDM to classify metadata features. The experiments reveal that user-centric approaches based on metadata can outperform methods employing content-based input, despite the use of well-established natural language processing algorithms. The results show that distributions over user-centric meta-information provide an important signal when detecting and predicting events.

    Support vector clustering of time series data with alignment kernels

    Time series clustering is an important data mining topic and a challenging task due to the sequences’ potentially very complex structures. In the present study we experimentally investigate the combination of support vector clustering with a triangular alignment kernel by evaluating it on an artificial time series benchmark dataset. The experiments lead to meaningful segmentations of the data, thereby providing an example that clustering time series with specific kernels is possible without pre-processing of the data. A comparison with other approaches shows that the clustering quality of our approach is competitive.

    Generative Modeling Helps Weak Supervision (and Vice Versa)

    Many promising applications of supervised machine learning face hurdles in the acquisition of labeled data in sufficient quantity and quality, creating an expensive bottleneck. To overcome such limitations, techniques that do not depend on ground truth labels have been studied, including weak supervision and generative modeling. While these techniques would seem to be usable in concert, improving one another, how to build an interface between them is not well understood. In this work, we propose a model fusing programmatic weak supervision and generative adversarial networks and provide theoretical justification motivating this fusion. The proposed approach captures discrete latent variables in the data alongside the label estimate derived from weak supervision. Alignment of the two allows for better modeling of sample-dependent accuracies of the weak supervision sources, improving the estimate of unobserved labels. It is the first approach to enable data augmentation through weakly supervised synthetic images and pseudolabels. Additionally, its learned latent variables can be inspected qualitatively. The model outperforms baseline weak supervision label models on a number of multiclass image classification datasets, improves the quality of generated images, and further improves end-model performance through data augmentation with synthetic samples.
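
For orientation, the simplest kind of weak supervision label model is a plain majority vote over the sources. The sketch below is not the model proposed in the abstract: it is only a minimal baseline of the sort such models are usually compared against. The `ABSTAIN` convention, the function name, and the vote matrix are all illustrative assumptions.

```python
from collections import Counter

ABSTAIN = -1  # assumed convention: a source that casts no vote


def majority_vote(votes):
    """Return the most common non-abstain label, or ABSTAIN if no source voted."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    best = max(counts.values())
    # Break ties deterministically by picking the smallest label.
    return min(label for label, c in counts.items() if c == best)


# Rows are samples, columns are weak supervision sources (made-up votes).
label_matrix = [
    [0, 0, 1],
    [2, ABSTAIN, 2],
    [ABSTAIN, ABSTAIN, ABSTAIN],
    [1, 2, ABSTAIN],
]
pseudolabels = [majority_vote(row) for row in label_matrix]
print(pseudolabels)
```

Majority vote treats every source as equally accurate on every sample. The abstract's point about sample-dependent accuracies is precisely what this baseline cannot express.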

    Ordinal Programmatic Weak Supervision and Crowdsourcing for Estimating Cognitive States (Student Abstract)

    Crowdsourcing and weak supervision offer methods to efficiently label large datasets. Our work builds on existing weak supervision models to accommodate ordinal target classes, in an effort to recover ground truth from weak, external labels. We define a parameterized factor function and show that our approach improves over baselines.
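
To illustrate why ordinal targets call for different aggregation than unordered classes, the sketch below combines weak ordinal votes by taking their median. This is a simple baseline chosen for illustration, not the parameterized factor function the abstract refers to. The names, the `None` abstain convention, and the 1 to 5 rating scale are assumptions.

```python
ABSTAIN = None  # assumed convention: a source that gives no rating


def ordinal_median(votes):
    """Aggregate ordinal votes by their lower median, ignoring abstentions."""
    cast = sorted(v for v in votes if v is not ABSTAIN)
    if not cast:
        return ABSTAIN
    # The lower median is always one of the cast votes, so it stays on the scale.
    return cast[(len(cast) - 1) // 2]


# Made-up ratings on an assumed 1 to 5 ordinal scale, one row per sample.
ratings = [
    [1, 3, 5],
    [2, 4],
    [ABSTAIN, 4, 5, 5],
    [ABSTAIN, ABSTAIN],
]
estimates = [ordinal_median(row) for row in ratings]
print(estimates)
```

Unlike a majority vote, the median uses the ordering of the classes: for the votes 1, 3, 5 it returns the middle class 3 rather than breaking a three-way tie arbitrarily.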