Information Diffusion and Summarization in Social Networks
Social networks are web-based services that allow users to connect and share information. Due to the huge size of the social network graph and the plethora of generated content, it is difficult to diffuse and summarize social media content. This thesis thus addresses the problems of information diffusion and information summarization in social networks. Information diffusion is the process by which information about new opinions, behaviors, conventions, practices, and technologies flows from person to person through a social network. Studies on information diffusion primarily focus on how information diffuses in networks and how to enhance that diffusion. Our aim is to enhance information diffusion in social networks. Many factors affect information diffusion, such as network connectivity, location, posting timestamp, and post content. In this thesis, we analyze the effect of three of the most important factors, namely network connectivity, posting time, and post content. We first study the network factor, and later analyze how the time and content factors can diffuse information to a large number of users. The network connectivity of a user determines his or her ability to disseminate information: a well-connected, authoritative user can disseminate information to a much wider audience than an ordinary user. We present a novel algorithm to find topic-sensitive authorities in social networks and use the topic-specific authoritative position of users to promote a given topic through word-of-mouth (WoM) marketing. Next, the lifetime of social media content is very short, typically a few hours. If content is posted at a time when the targeted audience is not online or not interested in interacting with it, the content will not receive a high audience reaction. We therefore look at the problem of finding the best posting time(s) to achieve high information diffusion.
Further, the type of social media content determines the amount of audience interaction it receives. Users react differently to different types of content: if a post is related to a topic that is arousing or debatable, it tends to get more comments. We propose a novel method to identify whether a post has high-arousal content. Furthermore, the sentiment of post content is also an important factor in garnering users’ attention in social media; the same information conveyed with different sentiments receives a different amount of audience reaction. We examine to what extent the sentiment strategies employed in social media have been successful in catching users’ attention. Finally, we study the problem of information summarization in social networks. Social media services generate a huge volume of data every day, which is difficult to search or comprehend. Information summarization is the process of creating a concise, readable summary of this huge volume of unstructured information. We present a novel method to summarize unstructured social media text by generating topics similar to manually created topics, and we produce a comprehensive topical summary by grouping semantically related topics.
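As a rough illustration of the network-connectivity idea above, topic-sensitive authority can be scored with a personalized-PageRank-style random walk that teleports only to users who have posted about the target topic, so the stationary scores rank users by topic-specific authority. This is a hedged sketch, not the thesis's algorithm: the graph, the teleport rule, and all parameter values are illustrative assumptions.

```python
def topic_sensitive_authority(followers, topic_users, damping=0.85, iters=50):
    """followers: dict mapping user -> list of users who follow them (in-links).
    topic_users: set of users who have posted about the target topic.
    Returns a dict of personalized-PageRank-style authority scores."""
    users = list(followers)
    # out-degree of each user = number of accounts they follow
    out_deg = {u: 0 for u in users}
    for ins in followers.values():
        for v in ins:
            out_deg[v] += 1
    # teleport mass only to users active on the topic
    teleport = {u: (1.0 / len(topic_users) if u in topic_users else 0.0)
                for u in users}
    score = dict(teleport)
    for _ in range(iters):
        new = {}
        for u in users:
            # authority flows from followers to the followed user
            inflow = sum(score[v] / out_deg[v]
                         for v in followers[u] if out_deg[v])
            new[u] = (1 - damping) * teleport[u] + damping * inflow
        score = new
    return score

# toy network: b and c follow a; c follows b; a follows c
g = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = topic_sensitive_authority(g, topic_users={"a", "b"})
```

Even a user outside the topic set (here "c") can earn a nonzero score by being followed by topical authorities, which is the intuition behind topic-sensitive authority ranking.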
FIVR: Fine-Grained Incident Video Retrieval
This paper introduces the problem of Fine-grained Incident Video Retrieval
(FIVR). Given a query video, the objective is to retrieve all associated
videos, considering several types of associations that range from duplicate
videos to videos from the same incident. FIVR offers a single framework that
contains several retrieval tasks as special cases. To address the benchmarking
needs of all such tasks, we construct and present a large-scale annotated video
dataset, which we call FIVR-200K and which comprises 225,960 videos. To create
the dataset, we devise a process for the collection of YouTube videos based on
major news events from recent years crawled from Wikipedia and deploy a
retrieval pipeline for the automatic selection of query videos based on their
estimated suitability as benchmarks. We also devise a protocol for the
annotation of the dataset with respect to the four types of video associations
defined by FIVR. Finally, we report the results of an experimental study on the
dataset comparing five state-of-the-art methods based on a variety of visual
descriptors, highlighting the challenges of the problem.
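Retrieval benchmarks like the one described above are typically scored with (mean) Average Precision over the ranked results returned for each query. The sketch below shows the standard AP computation on a made-up ranking; it is illustrative and not taken from the paper's evaluation code.

```python
def average_precision(ranked_ids, relevant_ids):
    """AP = mean of precision@k at each rank k where a relevant video appears,
    divided by the total number of relevant videos."""
    relevant = set(relevant_ids)
    hits, precisions = 0, []
    for k, vid in enumerate(ranked_ids, start=1):
        if vid in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

# relevant videos retrieved at ranks 1 and 4 -> AP = (1/1 + 2/4) / 2 = 0.75
ap = average_precision(["v3", "v1", "v7", "v2"], {"v3", "v2"})
```

Averaging AP over all query videos gives mAP, the usual single-number summary for comparing retrieval methods on a benchmark of this kind.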
A treatise on Web 2.0 with a case study from the financial markets
There has been much hype in vocational and academic circles surrounding the emergence of
web 2.0, or social media; however, relatively little work has been dedicated to substantiating
the actual concept of web 2.0. Many have dismissed it as not deserving of this new title, since
the term web 2.0 assumes a certain interpretation of web history, including enough progress in
a certain direction to trigger a succession [i.e. web 1.0 → web 2.0]. Others have provided
arguments in support of this development, and there has been a considerable amount of
enthusiasm in the literature. Much research has focused on evaluating current use of web 2.0
and analysing user-generated content, but an objective and thorough assessment of what
web 2.0 really stands for has been to a large extent overlooked. More recently, the idea of
collective intelligence facilitated via web 2.0, and its potential applications, has attracted
researchers' interest, yet a more unified approach and further work in the area of collective
intelligence are needed.
This thesis identifies and critically evaluates a wider context for the web 2.0 environment and
what caused it to emerge, providing a rich literature review on the topic, a review of existing
taxonomies, a quantitative and qualitative evaluation of the concept itself, and an investigation
of the collective intelligence potential that emerges from application usage. Finally, a framework
for harnessing collective intelligence in a more systematic manner is proposed.
In addition to the presented results, novel methodologies are also introduced throughout this
work. To provide interesting insight and to illustrate the analysis, a case study of the
recent financial crisis is considered. Some interesting results relating to the crisis are revealed
within the user-generated content data, and relevant issues are discussed where appropriate.
Modelling Social Media Popularity of News Articles Using Headline Text
The way we formulate headlines matters -- this is the central tenet of this thesis.
Headlines play a key role in attracting and engaging online audiences. With the increasing usage of mobile apps and social media to consume news, headlines are the most prominent -- and often the only -- part of the news article visible to readers. Earlier studies examined how readers' preferences and their social network influence which headlines are clicked or shared on social media. However, there is limited research on the impact of the headline text on social media popularity.
To address this research gap we pose the following question: how can a headline be formulated so that it reaches as many readers as possible on social media? To answer this question we adopt an experimental approach to model and predict the popularity of news articles on social media using headlines. First, we develop computational methods for the automatic extraction of two types of headline characteristics. The first type is news values: Prominence, Sentiment, Magnitude, Proximity, Surprise, and Uniqueness. The second type is linguistic style: Brevity, Simplicity, Unambiguity, Punctuation, Nouns, Verbs, and Adverbs. We then investigate the impact of these features on popularity using social media popularity on Twitter and Facebook, and perceived popularity obtained from a crowdsourced survey. Finally, using these features and headline metadata we build prediction models for global and country-specific social media popularity. For the country-specific prediction model we augment several news values features with country relatedness information using knowledge graphs.
Our research established that computational methods can be reliably used to characterise headlines in terms of news values and linguistic style features, and that most of these features significantly correlate with social media popularity and, to a lesser extent, with perceived popularity. Our prediction model for global social media popularity outperformed state-of-the-art baselines, showing that headline wording has an effect on social media popularity. With the country-specific prediction model we showed that augmenting the features with data from knowledge graphs improves their implementations.
These findings indicate that formulating a headline in a certain way can lead to wider readership engagement. Furthermore, our methods can be applied to other types of digital content similar to headlines, such as titles for blog posts or videos. More broadly, our results signify the importance of content analysis for popularity prediction.
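To make the linguistic-style features concrete, the sketch below extracts two simple signals from a headline: a word-count proxy for Brevity and a count of expressive punctuation. The exact definitions here are illustrative assumptions, not the thesis's implementations (which also cover news values and part-of-speech features).

```python
import re

def headline_style_features(headline):
    """Toy linguistic-style features for a headline string."""
    # crude word tokenizer: runs of letters/apostrophes
    tokens = re.findall(r"[A-Za-z']+", headline)
    return {
        "brevity": len(tokens),                           # word count (lower = briefer)
        "punctuation": sum(ch in "?!:;,'\"" for ch in headline),
        "has_question": headline.strip().endswith("?"),   # question-style headline
    }

feats = headline_style_features("Why do headlines matter? Experts weigh in")
# feats["brevity"] == 7, feats["punctuation"] == 1, feats["has_question"] is False
```

Feature vectors like this, one per headline, are the kind of input a popularity-prediction model (e.g. a regression over Twitter/Facebook share counts) would consume.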
Exploiting the conceptual space in hybrid recommender systems: a semantic-based approach
Unpublished doctoral thesis. Universidad Autónoma de Madrid, Escuela Politécnica Superior, October 200
Broad-coverage automatic event analysis of general-domain Estonian texts
Due to massive-scale digitisation processes and a switch from traditional means of written communication to digital written communication, vast amounts of human language text are becoming machine-readable. Machine-readability holds the potential to ease the human effort of searching and organising large text collections, allowing applications such as automatic text summarisation and question answering. However, current tools for automatic text analysis do not achieve the text understanding required to make these applications generic. It is hypothesised that automatic analysis of events in texts brings us closer to this goal, as many texts can be interpreted as stories or narratives that are decomposable into events.
This thesis explores event analysis as a broad-coverage, general-domain automatic language analysis problem for Estonian, providing an investigation that starts from time-oriented event analysis and extends towards generic event analysis. We adapt the TimeML framework to Estonian, and create an automatic temporal expression tagger and a news corpus manually annotated for temporal semantics (event mentions, temporal expressions, and temporal relations) for the language; we analyse the consistency of human annotation of event mentions and temporal relations, and, finally, provide a preliminary study on event coreference resolution in Estonian news.
The current work also makes suggestions on how future research can improve Estonian event and temporal semantic annotation, and the language resources developed in this work will allow future experimentation with end-user applications (such as automatic answering of temporal questions) as well as provide a basis for developing automatic semantic analysis tools.
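The temporal expression tagging step described above can be illustrated with a minimal rule-based recognizer in the spirit of TimeML/TIMEX3 annotation: regular-expression patterns mark date-like spans in text. The patterns and the English example are illustrative assumptions; the thesis's tagger targets Estonian and is considerably richer.

```python
import re

# toy TIMEX-style patterns; a real tagger has many more and also normalizes values
TIMEX_PATTERNS = [
    r"\b\d{1,2}\s(January|February|March|April|May|June|July|"
    r"August|September|October|November|December)\s\d{4}\b",
    r"\b\d{4}\b",                      # bare year, e.g. "2007"
    r"\b(yesterday|today|tomorrow)\b", # simple deictic expressions
]

def tag_timex(text):
    """Return sorted (start, end, matched_text) spans for temporal expressions."""
    spans = []
    for pat in TIMEX_PATTERNS:
        for m in re.finditer(pat, text, flags=re.IGNORECASE):
            spans.append((m.start(), m.end(), m.group(0)))
    return sorted(spans)

spans = tag_timex("The law passed in 2007 and takes effect tomorrow.")
```

A production tagger would additionally resolve overlapping matches and normalize each span to a calendar value (e.g. mapping "tomorrow" to a concrete date relative to the document's creation time), which is what makes the annotation useful for temporal question answering.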
Fine-grained Incident Video Retrieval with Video Similarity Learning.
In this thesis, we address the problem of Fine-grained Incident Video Retrieval (FIVR) using video similarity learning methods. FIVR is a video retrieval task that aims to retrieve all videos that depict the same incident given a query video; related video retrieval tasks adopt either very narrow or very broad scopes, considering only near-duplicate or same-event videos. To formulate the case of same-incident videos, we define three video associations taking into account the spatio-temporal spans captured by video pairs. To cover the benchmarking needs of FIVR, we construct a large-scale dataset, called FIVR-200K, consisting of 225,960 YouTube videos from major news events crawled from Wikipedia. The dataset contains four annotation labels according to the FIVR definitions; hence, it can simulate several retrieval scenarios with the same video corpus. To address FIVR, we propose two video-level approaches leveraging features extracted from intermediate layers of Convolutional Neural Networks (CNNs). The first is an unsupervised method that relies on a modified Bag-of-Words scheme, which generates video representations from the aggregation of frame descriptors based on learned visual codebooks. The second is a supervised method based on Deep Metric Learning, which learns an embedding function that maps videos into a feature space where relevant video pairs are closer than irrelevant ones. However, video-level approaches generate global video representations, losing all spatial and temporal relations between compared videos. Therefore, we propose a video similarity learning approach that captures fine-grained relations between videos for accurate similarity calculation. We train a CNN architecture to compute video-to-video similarity from refined frame-to-frame similarity matrices derived from a pairwise region-level similarity function. The proposed approaches have been extensively evaluated on FIVR-200K and other large-scale datasets, demonstrating their superiority over other video retrieval methods and highlighting the challenging aspects of the FIVR problem.
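The frame-level similarity idea in the last abstract can be sketched without any learning: build the frame-to-frame cosine-similarity matrix for a pair of videos and reduce it with a chamfer-style aggregation (for each query frame, take its best-matching target frame, then average). The toy two-dimensional descriptors below are assumptions for illustration; the thesis refines such matrices with a trained CNN, which is omitted here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def chamfer_similarity(frames_q, frames_t):
    """Video-to-video similarity from per-frame descriptors.
    Rows of the matrix are query frames, columns are target frames."""
    sim = [[cosine(q, t) for t in frames_t] for q in frames_q]
    # for each query frame take its best target match, then average
    return sum(max(row) for row in sim) / len(sim)

q = [[1.0, 0.0], [0.0, 1.0]]  # two query-frame descriptors
t = [[1.0, 0.0], [1.0, 1.0]]  # two target-frame descriptors
s = chamfer_similarity(q, t)  # (1 + 1/sqrt(2)) / 2
```

Because the aggregation operates on the full similarity matrix rather than on one pooled vector per video, temporal correspondences between individual frames are preserved, which is exactly what global video-level representations lose.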