23 research outputs found

    Enriching unstructured media content about events to enable semi-automated summaries, compilations, and improved search by leveraging social networks

    (i) Mobile devices and social networks are omnipresent. Mobile devices such as smartphones, tablets, or digital cameras, together with social networks, enable people to create, share, and consume enormous amounts of media items like videos or photos, both on the road and at home. Such mobile devices, by pure definition, accompany their owners almost wherever they may go. In consequence, mobile devices are omnipresent at all sorts of events to capture noteworthy moments. Exemplary events can be keynote speeches at conferences, music concerts in stadiums, or even natural catastrophes like earthquakes that affect whole areas or countries. At such events, given a stable network connection, part of the event-related media items are published on social networks either as the event happens or afterwards, once a stable network connection has been established again.

    (ii) Finding representative media items for an event is hard. Common media item search operations, for example, searching for the official video clip for a certain hit record on an online video platform, can in the simplest case be achieved based on potentially shallow human-generated metadata or based on more profound content analysis techniques like optical character recognition, automatic speech recognition, or acoustic fingerprinting. More advanced scenarios, however, like retrieving all (or just the most representative) media items that were created at a given event with the objective of creating event summaries or media item compilations covering the event in question, are hard, if not impossible, to fulfill at large scale. The main research question of this thesis can be formulated as follows.

    (iii) Research question. "Can user-customizable media galleries that summarize given events be created solely based on textual and multimedia data from social networks?"

    (iv) Contributions. In the context of this thesis, we have developed and evaluated a novel interactive application and related methods for media item enrichment, leveraging social networks, utilizing the Web of Data, techniques known from Content-based Image Retrieval (CBIR) and Content-based Video Retrieval (CBVR), and fine-grained media item addressing schemes like Media Fragments URIs to provide a scalable and near-real-time solution to realize the above-mentioned scenario of event summarization and media item compilation.

    (v) Methodology. For any event with given event title(s), (potentially vague) event location(s), and (arbitrarily fine-grained) event date(s), our approach can be divided into the following six steps.
    1) Via the textual search APIs (Application Programming Interfaces) of different social networks, we retrieve a list of potentially event-relevant microposts that either contain media items directly, or that provide links to media items on external media item hosting platforms.
    2) Using third-party Natural Language Processing (NLP) tools, we recognize and disambiguate named entities in microposts to predetermine their relevance.
    3) We extract the binary media item data from social networks or media item hosting platforms and relate it to the originating microposts.
    4) Using CBIR and CBVR techniques, we first deduplicate exact-duplicate and near-duplicate media items and then cluster similar media items.
    5) We rank the deduplicated and clustered list of media items and their related microposts according to well-defined ranking criteria.
    6) In order to generate interactive and user-customizable media galleries that visually and audially summarize the event in question, we compile the top-n ranked media items and microposts in aesthetically pleasing and functional ways.
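Steps 4 to 6 of the methodology above (deduplicate and cluster, rank, pick the top-n) can be illustrated with a minimal sketch. It is not the thesis's implementation: the `MediaItem` fields, the integer `fingerprint` (standing in for whatever CBIR/CBVR feature is actually used), the Hamming-distance threshold, and the share-count ranking criterion are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class MediaItem:
    url: str
    fingerprint: int  # hypothetical perceptual hash of the media content
    shares: int       # hypothetical social signal used for ranking


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def cluster_near_duplicates(items, max_distance=4):
    """Greedy clustering: an item joins the first cluster whose first
    member is within max_distance bits, else it starts a new cluster."""
    clusters = []
    for item in items:
        for cluster in clusters:
            if hamming(cluster[0].fingerprint, item.fingerprint) <= max_distance:
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters


def top_n_representatives(clusters, n=3):
    """Rank clusters by total shares (an assumed criterion) and return
    the most-shared item of each of the top-n clusters."""
    ranked = sorted(clusters, key=lambda c: sum(i.shares for i in c), reverse=True)
    return [max(c, key=lambda i: i.shares) for c in ranked[:n]]


items = [
    MediaItem("https://example.org/a.jpg", 0b11110000, 5),
    MediaItem("https://example.org/b.jpg", 0b11110001, 9),  # near-duplicate of a
    MediaItem("https://example.org/c.jpg", 0b00001111, 3),
]
gallery = top_n_representatives(cluster_near_duplicates(items), n=2)
print([item.url for item in gallery])
```

The greedy single-pass clustering keeps the sketch short; a real system would likely compare against every cluster member or use an index structure, and would combine several ranking signals rather than one.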

    Public scientific communication on Twitter: visual analytic approach

    Purpose - The purpose of this paper is to assess high-dimensional visualisation, combined with pattern matching, as an approach to observing dynamic changes in the ways people tweet about science topics. Design/methodology/approach - The high-dimensional visualisation approach was applied to three scientific topics to test its effectiveness for longitudinal analysis of message framing on Twitter over two disjoint periods in time. The paper uses coding frames to drive categorisation and visual analytics of tweets discussing the science topics. Findings - The findings point to the potential of this mixed methods approach, as it allows sufficiently high sensitivity to recognise and support the analysis of non-trending as well as trending topics on Twitter. Research limitations/implications - Three topics are studied and these illustrate a range of frames, but results may not be representative of all scientific topics. Social implications - Funding bodies increasingly encourage scientists to participate in public engagement. As social media provides an avenue actively utilised for public communication, understanding the nature of the dialog on this medium is important for the scientific community and the public at large. Originality/value - This study differs from standard approaches to the analysis of microblog data, which tend to focus on machine-driven analysis of large-scale datasets. It provides evidence that this approach enables practical and effective analysis of the content of midsize to large collections of microposts.

    Proceedings of the Making Sense of Microposts Workshop (#Microposts2015) at the World Wide Web Conference

    Deliverable D7.7 Dissemination and Standardisation Report v3

    This deliverable presents the LinkedTV dissemination and standardisation report for the project period of months 31 to 42 (April 2014 to March 2015)

    Exploiting Domain Knowledge for Cross-domain Text Classification in Heterogeneous Data Sources

    With the growing amount of data generated in large heterogeneous repositories (such as the World Wide Web, corporate repositories, citation databases), there is an increased need for end users to locate relevant information efficiently. Text Classification (TC) techniques provide automated means for classifying fragments of text (phrases, paragraphs or documents) into predefined semantic types, allowing an efficient way of organising and analysing such large document collections. Current approaches to TC rely on supervised learning, which performs well on the domains on which the TC system is built, but tends to adapt poorly to different domains. This thesis presents a body of work exploring adaptive TC techniques across heterogeneous corpora in large repositories, with the goal of finding novel ways of bridging the gap across domains. The proposed approaches rely on the exploitation of domain knowledge for the derivation of stable cross-domain features. This thesis also investigates novel ways of estimating the performance of a TC classifier by means of domain similarity measures. For this purpose, two novel knowledge-based similarity measures are proposed that capture the usefulness of the selected cross-domain features for cross-domain TC. The evaluation of these approaches and measures is presented on real-world datasets against various strong baseline methods and content-based measures used in transfer learning. This thesis explores how domain knowledge can be used to enhance the representation of documents to address the lexical gap across the domains. Given that the effectiveness of a text classifier largely depends on the availability of annotated data, this thesis explores techniques which can leverage data from social knowledge sources (such as DBpedia and Freebase). Techniques are further presented which explore the feasibility of exploiting different semantic graph structures from knowledge sources in order to create novel cross-domain features and domain similarity metrics. The methodologies presented provide a novel representation of documents, and exploit four wide-coverage knowledge sources: DBpedia, Freebase, SNOMED-CT and MeSH. The contribution of this thesis demonstrates the feasibility of exploiting domain knowledge for adaptive TC and domain similarity, providing an enhanced representation of documents with semantic information about entities that can indeed reduce the lexical differences between domains.

    Linked Data Supported Information Retrieval

    Search engines have become indispensable for locating content on the World Wide Web. Semantic Web and Linked Data technologies enable a more detailed and unambiguous structuring of content and allow entirely new approaches to solving information retrieval problems. This thesis examines how information retrieval applications can benefit from the incorporation of Linked Data. New methods for computer-assisted semantic text analysis, semantic search, information prioritization, and information visualization are presented and comprehensively evaluated. Linked Data resources and their relationships are integrated into these methods in order to increase their effectiveness or their usability. First, an introduction to the fundamentals of information retrieval and Linked Data is given. Subsequently, new manual and automated methods for semantically annotating documents by linking them to Linked Data resources (entity linking) are presented. A comprehensive evaluation of the methods is carried out, and the underlying evaluation system is extensively improved. Building on the annotation methods, two new retrieval models for semantic search are presented and evaluated. The methods are based on the generalized vector space model and incorporate semantic similarity, derived from taxonomy-based relationships between the Linked Data resources in documents and queries, into the computation of the search result ranking. With the aim of further refining the computation of semantic similarity, a method for prioritizing Linked Data resources is presented and evaluated. Building on this, visualization techniques are presented with the aim of improving explorability and navigability within a semantically annotated document corpus. Two applications are presented for this purpose: first, a Linked Data based exploratory extension that complements a traditional keyword-based search engine, and second, a Linked Data based recommender system.
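The abstract above names the generalized vector space model (GVSM) with taxonomy-based semantic similarity as the basis of its retrieval models. As a rough illustration only, the sketch below scores a query against a document with a GVSM-style bilinear form. The toy `PARENT` taxonomy, the path-based `concept_similarity`, and the concept weights are assumptions for the example, not the measures used in the thesis.

```python
import math

# Hypothetical toy taxonomy (child -> parent), standing in for the
# taxonomy-based relationships between Linked Data resources.
PARENT = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
}


def ancestors(concept):
    """Return the concept followed by its chain of ancestors up to the root."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def concept_similarity(a, b):
    """Assumed path-based similarity: 1 / (1 + edges via the lowest common ancestor)."""
    path_a, path_b = ancestors(a), ancestors(b)
    common = [c for c in path_a if c in path_b]
    if not common:
        return 0.0
    lca = common[0]
    return 1.0 / (1.0 + path_a.index(lca) + path_b.index(lca))


def bilinear(u, v):
    """Sum of w_u * w_v * sim(c_u, c_v) over all concept pairs."""
    return sum(
        wu * wv * concept_similarity(cu, cv)
        for cu, wu in u.items()
        for cv, wv in v.items()
    )


def gvsm_score(query, doc):
    """Cosine-like GVSM score: related but non-identical concepts still contribute."""
    denom = math.sqrt(bilinear(query, query) * bilinear(doc, doc))
    return bilinear(query, doc) / denom if denom else 0.0


query = {"dog": 1.0}
print(gvsm_score(query, {"dog": 1.0}))
print(gvsm_score(query, {"cat": 1.0}))
print(gvsm_score(query, {"sparrow": 1.0}))
```

Unlike the classic vector space model, where a query about "dog" and a document annotated only with "cat" would score zero, the pairwise similarity term lets taxonomically close concepts contribute to the ranking.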

    Deliverable D9.3 Final Project Report

    This document comprises the final report of LinkedTV. It includes a publishable summary, a plan for use and dissemination of foreground and a report covering the wider societal implications of the project in the form of a questionnaire

    Multilingual sentiment analysis in social media.

    252 p. This thesis addresses the task of analysing sentiment in messages coming from social media. The ultimate goal was to develop a Sentiment Analysis system for Basque. However, because of the socio-linguistic reality of the Basque language, a tool providing only analysis for Basque would not be enough for a real-world application. Thus, we set out to develop a multilingual system, including Basque, English, French and Spanish. The thesis addresses the following challenges to build such a system:
    - Analysing methods for creating sentiment lexicons suitable for less-resourced languages.
    - Analysis of social media (specifically Twitter): tweets pose several challenges for understanding and extracting opinions from such messages. Language identification and microtext normalization are addressed.
    - Research the state of the art in polarity classification, and develop a supervised classifier that is tested against well-known social media benchmarks.
    - Develop a social media monitor capable of analysing sentiment with respect to specific events, products or organizations.

    Enabling automatic provenance-based trust assessment of web content
