
    Event Identification in Social Networks

    Social networks enable users to freely communicate with each other and share their recent news, ongoing activities or views about different topics. As a result, they can be seen as a potentially viable source of information to understand the current emerging topics/events. The ability to model emerging topics is a substantial step to monitor and summarize the information originating from social sources. Traditional event detection methods, which are often proposed for processing large, formal and structured documents, are less effective on social posts because of their short length, noisiness and informality. Recent event detection techniques address these challenges by exploiting the opportunities behind abundant information available in social networks. This article provides an overview of the state of the art in event detection from social networks. Comment: It will appear in Encyclopedia with Semantic Computing to be published by World Scientific

    Doctoral Advisor or Medical Condition: Towards Entity-specific Rankings of Knowledge Base Properties [Extended Version]

    In knowledge bases such as Wikidata, it is possible to assert a large set of properties for entities, ranging from generic ones such as name and place of birth to highly profession-specific or background-specific ones such as doctoral advisor or medical condition. Determining a preference or ranking in this large set is a challenge in tasks such as prioritisation of edits or natural-language generation. Most previous approaches to ranking knowledge base properties are purely data-driven and, as we show, mistake frequency for interestingness. In this work, we have developed a human-annotated dataset of 350 preference judgments among pairs of knowledge base properties for fixed entities. From this set, we isolate a subset of pairs for which humans show a high level of agreement (87.5% on average). We show, however, that baseline and state-of-the-art techniques achieve only 61.3% precision in predicting human preferences for this subset. We then analyze what contributes to one property being rated as more important than another one, and identify that at least three factors play a role, namely (i) general frequency, (ii) applicability to similar entities and (iii) semantic similarity between property and entity. We experimentally analyze the contribution of each factor and show that a combination of techniques addressing all three factors achieves 74% precision on the task. The dataset is available at www.kaggle.com/srazniewski/wikidatapropertyranking. Comment: Extended version of an ADMA 2017 conference paper

    Content-based Video Indexing and Retrieval Using Corr-LDA

    Existing video indexing and retrieval methods on popular web-based multimedia sharing websites are based on user-provided sparse tagging. This paper proposes a very specific way of searching for video clips, based on the content of the video. We present our work on Content-based Video Indexing and Retrieval using the Correspondence-Latent Dirichlet Allocation (corr-LDA) probabilistic framework. This is a model that provides for auto-annotation of videos in a database with textual descriptors, and brings the added benefit of utilizing the semantic relations between the content of the video and text. We use the concept-level matching provided by corr-LDA to build correspondences between text and multimedia, with the objective of retrieving content with increased accuracy. In our experiments, we employ only the audio components of the individual recordings and compare our results with an SVM-based approach. Comment: 8 Pages, Updated References, Added Figure

    Temporal Identification of Latent Communities on Twitter

    User communities in social networks are usually identified by considering explicit structural social connections between users. While such communities can reveal important information about their members such as family or friendship ties and geographical proximity, they do not necessarily succeed at pulling like-minded users that share the same interests together. In this paper, we are interested in identifying communities of users that share similar topical interests over time, regardless of whether they are explicitly connected to each other on the social network. More specifically, we tackle the problem of identifying temporal topic-based communities from Twitter, i.e., communities of users who have similar temporal inclination towards the current emerging topics on Twitter. We model each topic as a collection of highly correlated semantic concepts observed in tweets and identify topics by clustering time-series representations of the concepts, each built from the concept's observation frequency over time. Based on the identified emerging topics in a given time period, we utilize multivariate time series analysis to model the contributions of each user towards the identified topics, which allows us to detect latent user communities. Through our experiments on Twitter data, we demonstrate i) the effectiveness of our topic detection method to detect real world topics and ii) the effectiveness of our approach compared to well-established approaches for community detection. Comment: Submitted to WSDM 201

    What do Vegans do in their Spare Time? Latent Interest Detection in Multi-Community Networks

    Most social network analysis works at the level of interactions between users. But the vast growth in size and complexity of social networks enables us to examine interactions at larger scale. In this work we use a dataset of 76M submissions to the social network Reddit, which is organized into distinct sub-communities called subreddits. We measure the similarity between entire subreddits both in terms of user similarity and topical similarity. Our goal is to find community pairs with similar userbases, but dissimilar content; we refer to this type of relationship as a "latent interest." Detection of latent interests not only provides a perspective on individual users as they shift between roles (student, sports fan, political activist) but also gives insight into the dynamics of Reddit as a whole. Latent interest detection also has potential applications for recommendation systems and for researchers examining community evolution. Comment: NIPS 2015 Network Workshop
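
The "latent interest" criterion in the abstract above (similar userbases, dissimilar content) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' method: the abstract does not say which similarity measures were used, so Jaccard overlap on user sets and cosine similarity on term counts are swapped in, and the function name `latent_interest_pairs`, both thresholds, and the toy communities are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def jaccard(a: set, b: set) -> float:
    """User overlap between two communities: 0 (disjoint) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def latent_interest_pairs(users, terms, min_user_sim=0.5, max_topic_sim=0.2):
    """Return community pairs with high user overlap but low topical overlap.

    Both thresholds are hypothetical defaults chosen for this toy example.
    """
    pairs = []
    for x, y in combinations(sorted(users), 2):
        if jaccard(users[x], users[y]) >= min_user_sim and \
                cosine(terms[x], terms[y]) <= max_topic_sim:
            pairs.append((x, y))
    return pairs


# Made-up toy data: three communities, their members, and their term counts.
users = {
    "cooking": {"ann", "bob", "cat", "dan"},
    "recipes": {"ann", "bob", "cat", "eve"},
    "cycling": {"ann", "bob", "cat"},
}
terms = {
    "cooking": Counter({"tofu": 5, "plant": 3, "recipe": 2}),
    "recipes": Counter({"tofu": 4, "recipe": 6, "plant": 1}),
    "cycling": Counter({"bike": 7, "ride": 4, "gear": 2}),
}

print(latent_interest_pairs(users, terms))
```

In this toy data, "cooking" and "recipes" share both users and vocabulary, so they are not flagged; "cycling" shares users with both but no vocabulary, so both of its pairs are flagged.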

    Supervised Laplacian Eigenmaps with Applications in Clinical Diagnostics for Pediatric Cardiology

    Electronic health records contain rich textual data which possess critical predictive information for machine-learning based diagnostic aids. However, many traditional machine learning methods fail to simultaneously integrate both vector space data and text. We present a supervised method using Laplacian eigenmaps to augment existing machine-learning methods with low-dimensional representations of textual predictors which preserve the local similarities. The proposed implementation performs alternating optimization using gradient descent. For the evaluation we applied our method to over 2,000 patient records from a large single-center pediatric cardiology practice to predict if patients were diagnosed with cardiac disease. Our method was compared with latent semantic indexing, latent Dirichlet allocation, and local Fisher discriminant analysis. The results were assessed using AUC, MCC, specificity, and sensitivity. Results indicate that supervised Laplacian eigenmaps (SLE) was the highest performing method in our study, achieving 0.782 and 0.374 for AUC and MCC respectively. SLE showed an increase of 8.16% in AUC and 20.6% in MCC over the baseline which excluded textual data, and a 2.69% and 5.35% increase in AUC and MCC respectively over unsupervised Laplacian eigenmaps. This method allows many existing machine learning predictors to effectively and efficiently utilize the potential of textual predictors.

    Image Tag Refinement by Regularized Latent Dirichlet Allocation

    Tagging is nowadays the most prevalent and practical way to make images searchable. However, in reality many manually-assigned tags are irrelevant to image content and hence are not reliable for applications. Much recent effort has gone into refining image tags. In this paper, we propose to do tag refinement from the angle of topic modeling and present a novel graphical model, regularized Latent Dirichlet Allocation (rLDA). In the proposed approach, tag similarity and tag relevance are jointly estimated in an iterative manner, so that they can benefit from each other, and the multi-wise relationships among tags are explored. Moreover, both the statistics of tags and visual affinities of images in the corpus are explored to help topic modeling. We also analyze the superiority of our approach from the deep structure perspective. The experiments on tag ranking and image retrieval demonstrate the advantages of the proposed method.

    Semi-Automatic Terminology Ontology Learning Based on Topic Modeling

    Ontologies provide features such as a common vocabulary, reusability and machine-readable content; they also allow for semantic search, facilitate agent interaction, and support the ordering and structuring of knowledge for Semantic Web (Web 3.0) applications. However, the challenge in ontology engineering is automatic learning, i.e., there is still no fully automatic approach that uses machine learning techniques to form an ontology from a text corpus or dataset covering various topics. In this paper, two topic modeling algorithms are explored, namely LSI & SVD and Mr.LDA, for learning a topic ontology. The objective is to determine the statistical relationship between documents and terms to build a topic ontology and ontology graph with minimum human intervention. Experimental analysis of building a topic ontology and semantically retrieving the corresponding topic ontology for a user's query demonstrates the effectiveness of the proposed approach.

    Self-supervised learning of visual features through embedding images into text topic spaces

    End-to-end training from scratch of current deep architectures for new computer vision problems would require ImageNet-scale datasets, and this is not always possible. In this paper we present a method that is able to take advantage of freely available multi-modal content to train computer vision algorithms without human supervision. We put forward the idea of performing self-supervised learning of visual features by mining a large scale corpus of multi-modal (text and image) documents. We show that discriminative visual features can be learnt efficiently by training a CNN to predict the semantic context in which a particular image is more probable to appear as an illustration. For this we leverage the hidden semantic structures discovered in the text corpus with a well-known topic modeling technique. Our experiments demonstrate state of the art performance in image classification, object detection, and multi-modal retrieval compared to recent self-supervised or natural-supervised approaches. Comment: Accepted CVPR 2017 paper

    How to Become Instagram Famous: Post Popularity Prediction with Dual-Attention

    With a growing number of social apps, people have become increasingly willing to share their everyday photos and events on social media platforms, such as Facebook, Instagram, and WeChat. In social media data mining, post popularity prediction has received much attention from both data scientists and psychologists. Existing research focuses more on exploring post popularity across a population of users and on including comprehensive factors such as temporal information, user connections, number of comments, and so on. However, these frameworks are not suitable for guiding a specific user to make a popular post because the attributes of this user are fixed. Therefore, previous frameworks can only answer the question "whether a post is popular" rather than "how to become famous by popular posts". In this paper, we aim at predicting the popularity of a post for a specific user and mining the patterns behind the popularity. To this end, we first collect data from Instagram. We then design a method to figure out the user environment, representing the content that a specific user is very likely to post. Based on the relevant data, we devise a novel dual-attention model to incorporate image, caption, and user environment. The dual-attention model consists of two parts: explicit attention for image-caption pairs and implicit attention for user environment. A hierarchical structure is devised to concatenate the explicit attention part and implicit attention part. We conduct a series of experiments to validate the effectiveness of our model and investigate the factors that can influence the popularity. The classification results show that our model outperforms the baselines, and a statistical analysis identifies what kind of pictures or captions can help the user achieve a relatively high "likes" number. Comment: 2018 IEEE International Conference on Big Data (IEEE Big Data