29 research outputs found

    Predicting Rising Follower Counts on Twitter Using Profile Information

    Full text link
    When evaluating what drives popularity on Twitter, one factor is usually considered the main driver: posting many tweets. There is debate about what kind of tweet one should publish, but little discussion of factors beyond tweets. Of particular interest is the information provided on each Twitter user's profile page. One such feature is the given name on the profile. Studies in psychology and economics have identified correlations between a person's first name and, e.g., school marks or the chances of getting a job interview in the US. We are therefore interested in the influence of this profile information on the follower count. We addressed this question by analyzing the profiles of about 6 million Twitter users. All profiles are separated into three groups: users that have a first name, English words, or neither in their name field. The assumption is that names and words influence the discoverability of a user and, subsequently, his/her follower count. We propose a classifier that labels users who will increase their follower count within a month by applying different models based on the user's group. The classifiers are evaluated with the area under the receiver operating characteristic curve (AUC) and achieve a score above 0.800.
    Comment: 10 pages, 3 figures, 8 tables, WebSci '17, June 25--28, 2017, Troy, NY, US

    Semantic Modeling of User-Generated Social Content

    No full text
    In this thesis, we study information management issues that arise in Online Social Networks (OSNs), as well as collective intelligence issues related to automated knowledge representation. We focus on three research directions: i) online social influence and the discovery of its impactful entities, ii) user-generated content and the role of semantics in social analysis, and iii) qualitative assessment of virally disseminated content. We present efficient and scalable methods focused on specific problems in these directions, with the aim of advancing the state of the art in the field. In the first research direction, we study how social influence can be measured and what its application domains are. To this end, we developed a publicly available service that calculates and ranks the importance and influence of Twitter accounts. This service incorporates theoretical aspects of influence metrics that derive from social functions evaluating i) the activity of a Twitter account (e.g. tweets, re-tweets and replies), ii) its social degree (e.g. followers and following) and iii) its network impact (e.g. content diffusion and social acknowledgement). In the second research direction, we investigate the role of semantics in OSNs and the adoption of Semantic Web technologies, which can be used for detecting similar users and for user personalization (e.g. interests and suggestions). To this end, we define an ontological schema for the semantification of social analytics, covering structural aspects of Twitter accounts, disseminated entities and social relationships. Furthermore, we propose a methodology for discovering and suggesting similar Twitter accounts based entirely on their disseminated content. Building on these similarity relationships, we present an approach for automatically labeling Twitter accounts by exploiting information from the Linked Open Data cloud, specifically DBpedia thematic categories. We also contribute to the field of Query Expansion (QE) by proposing an algorithmic approach that expands a user’s query by creating a suggestion set consisting of the most viral and up-to-date Twitter entities. Finally, in the third research direction, we tackle the problem of qualitative assessment of user-generated content by utilizing social influence and semantics. We conclude that the first two research areas, together with the third, can jointly provide useful insights when modeling the dynamic properties of influential content and its flow dynamics.

    Rating the Dominance of Concepts in Semantic Taxonomies

    No full text
    The descriptive concepts of “semantic” taxonomies are assigned to content items in the publishing domain to support a plethora of operations, mostly regarding the organization and discoverability of the content, as well as recommendation tasks. However, not all publishers rely on such structures, and many employ their own proprietary taxonomies; as a result, the content is either difficult for end users to retrieve or stored in publisher-specific, fragmented “data silos”, respectively. To address these issues, the modular and scalable “Dominance Metric” methodology is proposed for rating the dominance and importance of concepts in semantic taxonomies. The proposed metric is applied to both the vast multidisciplinary Microsoft Academic Graph Fields of Study taxonomy and the MeSH controlled vocabulary in order to produce enhanced and refined versions of them. Moreover, we describe the cleansing of the taxonomy derived from Microsoft’s structure, which deduplicates concepts and refines the hierarchical relations to increase its representation quality. Our evaluation showed that high volume, namely the number of publications a concept is assigned to, does not necessarily imply high influence; influence is also affected by the structural and topological properties of the individual entities.

    Latent Twitter Image Information for Social Analytics

    No full text
    The appearance of images in social messages is continuously increasing, along with user engagement with that type of content. Analysis of social images can provide valuable latent information that is often not present in the social posts themselves. To that end, a framework is proposed that exploits latent information from Twitter images by leveraging the Google Cloud Vision API platform, aiming to enrich social analytics with semantics and hidden textual information. As validated by our experiments, social analytics can be further enriched by combining user-generated content, latent concepts, and textual data extracted from social images with linked data. Moreover, we employed word embedding techniques to investigate the use of latent semantic information for identifying similar Twitter images, thereby showing that hidden textual information can improve such information retrieval tasks. Finally, we offer an open, enhanced version of the annotated dataset described in this study with the aim of further adoption by the research community.

    User Analytics in Online Social Networks: Evolving from Social Instances to Social Individuals

    No full text
    In our era of big data and information overload, content consumers utilise a variety of sources to meet their data and informational needs and to acquire an in-depth perspective on a subject, as each source focuses on specific aspects. The same principle applies to online social networks (OSNs): end-users usually maintain accounts in multiple OSNs so as to acquire a complete social networking experience, since each OSN has a different philosophy in terms of its services, content, and interaction. Contrary to the current literature, we examine the users’ behavioural and disseminated content patterns under the assumption that accounts maintained by a user in multiple OSNs are not distinct accounts, but rather the same individual with multiple social instances. Our social analysis, enriched with information about the users’ social influences, revealed behavioural patterns depending on the examined OSN, its social entities, and the users’ exerted influence. Finally, we ranked the examined OSNs based on three types of social characteristics, revealing correlations between the users’ behavioural and content patterns, social influences, social entities, and the OSNs themselves.