
    SportsAnno: what do you think?

    The automatic summarisation of sports video is of growing importance with the increased availability of on-demand content. Consumers who are unable to view events live often want to watch a summary that allows them to quickly catch up on everything that happened during a sporting event. Sports forums show that it is not only summaries that are desirable but also the opportunity to share one’s own point of view and discuss opinions with a community of similar users. In this paper we give an overview of the ways in which annotations have been used to augment existing visual media. We present SportsAnno, a system developed to summarise World Cup 2006 matches and provide a means for open discussion of events within these matches.

    Selection Bias in News Coverage: Learning it, Fighting it

    News entities must select and filter the coverage they broadcast through their respective channels since the set of world events is too large to be treated exhaustively. The subjective nature of this filtering induces biases due to, among other things, resource constraints, editorial guidelines, ideological affinities, or even the fragmented nature of the information at a journalist's disposal. The magnitude and direction of these biases are, however, largely unknown. The absence of ground truth, the sheer size of the event space, and the lack of an exhaustive set of absolute features to measure make it difficult to observe the bias directly, to characterize the nature of the leaning, and to factor it out to ensure neutral coverage of the news. In this work, we introduce a methodology to capture the latent structure of the media's decision process on a large scale. Our contribution is multi-fold. First, we show media coverage to be predictable using personalization techniques, and evaluate our approach on a large set of events collected from the GDELT database. We then show that a personalized and parametrized approach not only exhibits higher accuracy in coverage prediction, but also provides an interpretable representation of the selection bias. Last, we propose a method able to select a set of sources by leveraging the latent representation. These selected sources provide a more diverse and egalitarian coverage, all while retaining the most actively covered events.

    Opportunities for Web-Based Indicators in Environmental Sciences

    This paper proposes a set of web-based indicators for quantifying and ranking the relevance of terms related to key issues in Ecology and Sustainability Science. Search engines that operate in different contexts (e.g. global, social, scientific) are considered as web information carriers (WICs) and are able to analyse: (i) relevance on different levels: global web, individual/personal sphere, on-line news, and culture/science; (ii) time trends of relevance; (iii) relevance of keywords for environmental governance. For the purposes of this study, several indicators and specific indices (relational indices and dynamic indices) were applied to a test set of 24 keywords. Outputs consistently show that traditional study topics in environmental sciences such as water and air have remained the most quantitatively relevant keywords, while interest in systemic issues (i.e. ecosystem and landscape) has grown over the last 20 years. Nowadays, the relevance of new concepts such as resilience and ecosystem services is increasing, but the actual ability of these concepts to influence environmental governance needs to be further studied and understood. The proposed approach, which is based on intuitive and easily replicable procedures, can support the decision-making processes related to environmental governance.

    The state-of-the-art in personalized recommender systems for social networking

    With the explosion of Web 2.0 applications such as blogs, social and professional networks, and various other types of social media, rich online information and new sources of knowledge flood users and hence pose a great challenge in terms of information overload. It is critical to use intelligent agent software systems to assist users in finding the right information in an abundance of Web data. Recommender systems can help users deal with the information overload problem efficiently by suggesting items (e.g., information and products) that match users’ personal interests. Recommender technology has been successfully employed in many applications, such as recommending films, music, and books. The purpose of this report is to give an overview of existing technologies for building personalized recommender systems in social networking environments, and to propose a research direction for addressing the user profiling and cold start problems by exploiting user-generated content newly available in Web 2.0.

    Combining granularity-based topic-dependent and topic-independent evidences for opinion detection

    Opinion mining is a sub-discipline within Information Retrieval (IR) and Computational Linguistics. It refers to the computational techniques for extracting, classifying, understanding, and assessing the opinions expressed in various online sources like news articles, social media comments, and other user-generated content. It is also known by many other terms, such as opinion finding, opinion detection, sentiment analysis, sentiment classification, and polarity detection. Defined in a more specific and simpler context, opinion mining is the task of retrieving opinions on an issue as expressed by the user in the form of a query. There are many problems and challenges associated with the field of opinion mining. In this thesis, we focus on some major problems of opinion mining. One of the major challenges is finding opinions that specifically concern the given topic (query). A document can contain information about many topics at once, and it may contain opinionated text about each of those topics or about only a few of them. It therefore becomes very important to select the segments of the document that are relevant to the topic, together with their corresponding opinions. We address this problem at two levels of granularity, sentences and passages. In our first, sentence-level approach, we use WordNet semantic relations to find this association between topic and opinion. In our second, passage-level approach, we use a more robust IR model, namely the language model, to address this problem.
The basic idea behind both contributions on opinion-topic association is that if a document contains more opinionated, topic-relevant textual segments (sentences or passages), it is more opinionated than a document with fewer such segments. Most machine-learning-based approaches to opinion mining are domain-dependent, i.e. their performance varies from one domain to another. By contrast, a domain-independent or topic-independent approach is more general and can maintain its effectiveness across different domains. However, domain-independent approaches generally suffer from poor performance. Developing an approach that is both effective and general is a major challenge in the field of opinion mining. Our contributions in this thesis include the development of an approach that uses simple heuristic functions to find opinionated documents. Entity-based opinion mining is becoming very popular among researchers in the IR community. It aims to identify the entities relevant to a given topic and to extract the opinions associated with them from a set of textual documents. However, identifying entities and determining their relevance is already a difficult task. We propose a system that takes into account both the information in the current news article and relevant earlier articles in order to detect the most important entities in the current news. In addition, we present our framework for opinion analysis and related tasks. This framework is based on content evidence and social evidence from the blogosphere for the tasks of opinion finding, opinion prediction, and multidimensional ranking. This early contribution lays the groundwork for our future work.
The evaluation of our methods uses the TREC 2006 Blog collection and the TREC 2004 Novelty track collection. Most of the evaluations were carried out within the framework of the TREC Blog track.

    News vertical search using user-generated content

    The thesis investigates how content produced by end-users on the World Wide Web, referred to as user-generated content, can enhance the news vertical aspect of a universal Web search engine, such that news-related queries can be satisfied more accurately, more comprehensively and in a more timely manner. We propose a news search framework to describe the news vertical aspect of a universal Web search engine. This framework comprises four components, each providing a different piece of functionality. The Top Events Identification component identifies the most important events that are happening at any given moment using discussion in user-generated content streams. The News Query Classification component classifies incoming queries as news-related or not in real time. The Ranking News-Related Content component finds and ranks relevant content for news-related user queries from multiple streams of news and user-generated content. Finally, the News-Related Content Integration component merges the previously ranked content for the user query into the Web search ranking. In this thesis, we argue that user-generated content can be leveraged in one or more of these components to better satisfy news-related user queries. Potential enhancements include the faster identification of news queries relating to breaking news events, more accurate classification of news-related queries, increased coverage of the events searched for by the user or increased freshness in the results returned. Approaches to tackle each of the four components of the news search framework are proposed, which aim to leverage user-generated content. Together, these approaches form the news vertical component of a universal Web search engine. Each approach proposed for a component is thoroughly evaluated using one or more datasets developed for that component.
Conclusions are derived concerning whether the use of user-generated content enhances the component in question using an appropriate measure, namely: effectiveness when ranking events by their current importance/newsworthiness for the Top Events Identification component; classification accuracy over different types of query for the News Query Classification component; relevance of the documents returned for the Ranking News-Related Content component; and end-user preference for rankings integrating user-generated content in comparison to the unaltered Web search ranking for the News-Related Content Integration component. Analysis of the proposed approaches themselves, the effective settings for the deployment of those approaches and insights into their behaviour are also discussed. In particular, the evaluation of the Top Events Identification component examines how effectively events, represented by newswire articles, can be ranked by their importance using two different streams of user-generated content, namely blog posts and Twitter tweets. Evaluation of the proposed approaches for this component indicates that blog posts are an effective source of evidence to use when ranking events and that these approaches achieve state-of-the-art effectiveness. The same approaches, when driven instead by a stream of tweets, provide story ranking performance that is significantly more effective than random, but is not consistent across all of the datasets and approaches tested. Insights are provided into the reasons for this with regard to the transient nature of discussion on Twitter. Through the evaluation of the News Query Classification component, we show that the use of timely features extracted from different news and user-generated content sources can increase the accuracy of news query classification over relying upon newswire provider streams alone.
Evidence also suggests that the usefulness of the user-generated content sources varies as news events mature, with some sources becoming more influential over time as new content is published, leading to an upward trend in classification accuracy. The Ranking News-Related Content component evaluation investigates how to effectively rank content from the blogosphere and Twitter for news-related user queries. Of the approaches tested, we show that learning-to-rank approaches using features specific to blog posts/tweets lead to state-of-the-art ranking effectiveness under real-time constraints. Finally, this thesis demonstrates that the majority of end-users prefer rankings integrated with user-generated content for news-related queries to rankings containing only Web search results or integrated with only newswire articles. Of the user-generated content sources tested, the most popular source is shown to be Twitter, particularly for queries relating to breaking events. The central contributions of this thesis are the introduction of a news search framework, the approaches to tackle each of the four components of the framework that integrate user-generated content and their subsequent evaluation in a simulated real-time setting. This thesis draws insights from a broad range of experiments spanning the entire search process for news-related queries. The experiments reported in this thesis demonstrate the potential and scope for enhancements that can be brought about by leveraging user-generated content for real-time news search and related applications.

    BLOGGING AS A MEANS OF KNOWLEDGE SHARING

    This study explores problems related to knowledge sharing in the Information Age and attempts to provide a solution for them. The primary problem is that locating relevant information online is becoming harder, since it is not well organized or quickly accessible within the vast internet. In addition, there is more "noise" available than useful quality content. The research delves into the use of blogging as a fast, up-and-coming means of sharing knowledge and quality content amongst communities of bloggers with similar interests. Knowledge sharing implies a bi-directional exchange of information. To read blogs is not enough. One must start a blog and engage other bloggers with shared interests. Therefore the study's methodology involves implementing a blog (using the open-source blogging platform WordPress), promoting it to those interested in the topics it covers, socializing online with a community of like-minded bloggers, and finally measuring the usefulness of blogging as a means of knowledge sharing by analyzing blog participation. The blog implemented for the purpose of this study is located at www.PassionBasedLearning.com and covers learning, knowledge management and personal development.

    FINDING HER MASTER’S VOICE: THE POWER OF COLLECTIVE ACTION AMONG FEMALE MUSLIM BLOGGERS

    Emerging cyber-collective movements have frequently made headlines in the news. Despite the exponential growth in the number of bloggers in Muslim countries, there is a lack of empirical study of cyber-collective actions in these countries. We analyzed the female Muslim blogosphere because very little research attempts to understand the socio-political roles of female bloggers in systems where women are frequently denied freedom of expression. For our analysis, we collected 150 blogs from 17 countries, spanning April 2003 to July 2010, with a special focus on Al-Huwaider’s campaigns. Basing the analysis on three central tenets of individual, community, and transnational perspectives, we develop novel algorithms modeling cyber-collective movements by utilizing existing social theories on collective action and computational social network analysis. This paper contributes a methodology to study the diffusion of issues in social networks and examines the roles of influential community members. We also observe the transcending nature of cyber-collective movements, with future possibilities for modeling transnational outreach. Using the global female Muslim blogosphere, we provide an understanding of the complexity and dynamics of cyber-collective action. To the best of our knowledge, our research is the first to address the lack of fundamental research on re-framing collective action theory in online environments.