
    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction, and extends the inquiry through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.
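The core idea of pairing RSS feeds with HTML representations can be sketched as follows: the feed already gives clean titles and summaries, so an unsupervised extractor can locate the matching region of the rendered page by text similarity, without hand-written per-blog rules. This is a minimal stdlib-only sketch of that matching step (the block types and similarity measure are illustrative assumptions, not the report's actual algorithm):

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

class BlockCollector(HTMLParser):
    """Collect the text of candidate content blocks (here: <div> and <p>)."""
    def __init__(self):
        super().__init__()
        self.blocks = []
        self._open = []   # one text buffer per currently open block

    def handle_starttag(self, tag, attrs):
        if tag in ("div", "p"):
            self._open.append([])

    def handle_endtag(self, tag):
        if tag in ("div", "p") and self._open:
            text = " ".join(self._open.pop()).strip()
            if text:
                self.blocks.append(text)

    def handle_data(self, data):
        for buf in self._open:      # data belongs to every enclosing block
            buf.append(data.strip())

def locate_post_body(feed_summary: str, html: str) -> str:
    """Return the HTML block whose text best matches the RSS summary."""
    collector = BlockCollector()
    collector.feed(html)
    return max(collector.blocks,
               key=lambda b: SequenceMatcher(None, feed_summary, b).ratio())
```

Once the best-matching block is known for several posts of the same blog, its position in the DOM can be generalised into an extraction rule for that blog, which is where the unsupervised learning described above comes in.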

    Studying Media Events through Spatio-Temporal Statistical Analysis

    This report, written in the context of the ANR GEOMEDIA project, summarises the development of methods for the spatio-temporal statistical analysis of media events (deliverable 3.2). It presents ongoing work on the statistical modelling and statistical inference of the ANR GEOMEDIA corpus, a collection of international RSS news feeds. Central to this project, RSS news feeds are viewed as a representation of the information flow in geopolitical space; as such, they allow us to study media events of global extent and how they affect international relations. We propose hidden Markov models (HMMs) as an adequate modelling framework to study the evolution of media events in time. This class of models respects the characteristic properties of the data, such as temporal dependencies and correlations between feeds, and its structure corresponds well to our conceptualisation of media attention and media events. We specify the general model structure that we use for modelling an ensemble of RSS news feeds. Finally, we apply the proposed models to a case study dedicated to the analysis of media attention for the Ebola epidemic that spread through West Africa in 2014.
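To make the HMM framing concrete: a feed can be modelled as switching between a latent "background" state and an "event" state, each emitting an observed daily coverage level. The sketch below implements the standard forward algorithm for such a two-state model in pure Python; all states, coverage levels, and probabilities are illustrative assumptions, not parameters from the GEOMEDIA corpus:

```python
# Hypothetical two-state HMM of media attention: a feed is either in a
# "background" state or an "event" state, and each day it emits an
# observed coverage level. All probabilities below are illustrative.
STATES = ("background", "event")
INIT   = {"background": 0.9, "event": 0.1}
TRANS  = {"background": {"background": 0.95, "event": 0.05},
          "event":      {"background": 0.10, "event": 0.90}}
EMIT   = {"background": {"low": 0.70, "mid": 0.25, "high": 0.05},
          "event":      {"low": 0.05, "mid": 0.25, "high": 0.70}}

def forward(observations):
    """Forward algorithm: P(observations | model), summed over state paths."""
    alpha = {s: INIT[s] * EMIT[s][observations[0]] for s in STATES}
    for obs in observations[1:]:
        alpha = {s: EMIT[s][obs] * sum(alpha[r] * TRANS[r][s] for r in STATES)
                 for s in STATES}
    return sum(alpha.values())
```

The self-transition probabilities near 1 encode the temporal dependency the abstract mentions: once a feed enters the "event" state, it tends to stay there for several days, which matches how media attention to an epidemic such as Ebola persists rather than flickering day to day.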

    Geospatial database generation from digital newspapers: use case for risk and disaster domains.

    Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies. The generation of geospatial databases is expensive in terms of time and money, and many geospatial users still lack spatial data. Geographic Information Extraction and Retrieval systems can alleviate this problem. This work proposes a method to populate spatial databases automatically from the Web, applying the approach to the risk and disaster domain with digital newspapers as a data source. News stories in digital newspapers contain rich thematic information that can be attached to places. The use case of automating spatial database generation is applied to Mexico using placenames. In Mexico, small and medium disasters occur most years; the facts about them are frequently mentioned in newspapers but rarely stored as records in national databases, making it difficult to estimate the human and material losses of those events. This work presents two ways to extract information from digital news: natural language processing techniques for distilling the text, and national gazetteer codes for achieving placename-attribute disambiguation. Two outputs are presented: a general one that exposes highly relevant news, and another that attaches attributes of interest to placenames. The latter achieved a 75% rate of thematic relevance under qualitative analysis.
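The second output described above, attaching attributes of interest to placenames, can be sketched as a gazetteer lookup over news text: recognised placenames are resolved to gazetteer codes, and hazard terms found in the same article become the attached attributes. The gazetteer entries, codes, and hazard vocabulary below are toy assumptions, not real national gazetteer identifiers:

```python
import re

# Toy gazetteer: placename -> (gazetteer code, state). Codes are invented.
GAZETTEER = {
    "guadalajara": ("MX-JAL-039", "Jalisco"),
    "monterrey":   ("MX-NLE-039", "Nuevo Leon"),
}
DISASTER_TERMS = ("flood", "earthquake", "landslide", "wildfire")

def extract_events(article: str):
    """Attach disaster attributes found in a news article to gazetteer codes."""
    text = article.lower()
    hazards = [t for t in DISASTER_TERMS if t in text]
    events = []
    for name, (code, state) in GAZETTEER.items():
        # Whole-word match avoids firing on substrings of other names.
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            events.append({"place": name, "code": code,
                           "state": state, "hazards": hazards})
    return events
```

Resolving each matched name to a unique gazetteer code is what makes the resulting records loadable into a spatial database: the code, rather than the ambiguous surface string, becomes the key for the attached attributes.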

    Being Omnipresent To Be Almighty: The Importance of The Global Web Evidence for Organizational Expert Finding

    Modern expert finding algorithms are developed under the assumption that all possible expertise evidence for a person is concentrated in the company that currently employs them. Evidence that can be acquired outside of an enterprise is traditionally overlooked. At the same time, the Web is full of personal information that is sufficiently detailed to judge a person's skills and knowledge. In this work, we review various sources of expertise evidence outside of an organization and experiment with rankings built on data acquired from six different sources, accessible through the APIs of two major web search engines. We show that these rankings and their combinations are often more realistic and of higher quality than rankings built on organizational data only.
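Combining rankings from several evidence sources, as described above, is commonly done with rank fusion. As a sketch of the idea (the abstract does not state which combination method was used, so reciprocal rank fusion here is an illustrative assumption), each candidate expert accumulates a score from their rank in every source list:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked candidate lists (best first) into one ranking.

    k=60 is the constant conventionally used for RRF; it damps the
    influence of top ranks from any single source.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, person in enumerate(ranking, start=1):
            scores[person] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A candidate who appears in both a web-evidence ranking and an organizational ranking outranks one who appears, even highly, in only a single source, which is one way combined rankings can beat organizational data alone.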

    Financial news analysis using a semantic web approach

    In this paper we present StockWatcher, an OWL-based web application that enables the extraction of relevant news items from RSS feeds concerning the NASDAQ-100 listed companies. The application's goal is to present a customized, aggregated view of the news, categorized by topic. We distinguish between four relevant news categories: i) news regarding the company itself, ii) news regarding direct competitors of the company, iii) news regarding important people of the company, and iv) news regarding the industry in which the company is active. At the same time, the system is able to rate these news items based on their relevance. We identify three possible effects that a news message can have on the company, and thus on its stock price: i) positive, ii) negative, and iii) neutral. Currently, StockWatcher provides support for the NASDAQ-100 companies. The selection of relevant news items is based on a customizable user portfolio that may consist of one or more of these companies.
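The two classification axes above, a news category and an expected effect on the stock price, can be sketched with simple keyword matching. In StockWatcher the company profile would come from the OWL ontology rather than hand-written dictionaries, so everything below (company, terms, sentiment lexicon) is an illustrative assumption:

```python
# Hypothetical company profile; one entry per news category from the paper.
PROFILE = {
    "company":     {"microsoft"},
    "competitors": {"apple", "google"},
    "people":      {"satya nadella"},
    "industry":    {"software", "cloud computing"},
}
POSITIVE = {"beats", "record", "growth", "surge"}
NEGATIVE = {"lawsuit", "recall", "drop", "layoffs"}

def classify(headline: str):
    """Return (category, effect) for a headline; category is None if no match."""
    text = headline.lower()
    category = next((cat for cat, terms in PROFILE.items()
                     if any(term in text for term in terms)), None)
    words = set(text.split())
    if words & POSITIVE:
        effect = "positive"
    elif words & NEGATIVE:
        effect = "negative"
    else:
        effect = "neutral"
    return category, effect
```

A portfolio view then only needs to filter classified items to the categories and companies the user selected, which is the aggregation step the abstract describes.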
