
BlogForever D2.6: Data Extraction Methodology

Author: Banos V.
Davis R.
Gkotsis G.
Pincent E.
Stepanyan K.
Publication date: 25/10/2013

This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Blogs as Infrastructure for Scholarly Communication.

Author: Burton Matt
Publication date: 01/01/2015

This project systematically analyzes digital humanities blogs as an infrastructure for scholarly communication. This exploratory research maps the discourses of a scholarly community to understand the infrastructural dynamics of blogs and the Open Web. The text contents of 106,804 individual blog posts from a corpus of 396 blogs were analyzed using a mix of computational and qualitative methods. Analysis uses an experimental methodology (trace ethnography) combined with unsupervised machine learning (topic modeling) to perform an interpretive analysis at scale. Methodological findings show topic modeling can be integrated with qualitative and interpretive analysis. Special attention must be paid to data fitness, or the shape and re-shaping practices involved with preparing data for machine learning algorithms. Quantitative analysis of computationally generated topics indicates that while the community writes about diverse subject matter, individual scholars focus their attention on only a couple of topics. Four categories of informal scholarly communication emerged from the qualitative analysis: quasi-academic, para-academic, meta-academic, and extra-academic. The quasi and para-academic categories represent discourse with scholarly value within the digital humanities community, but do not necessarily have an obvious path into formal publication and preservation. A conceptual model, the (in)visible college, is introduced for situating scholarly communication on blogs and the Open Web. An (in)visible college is a kind of scholarly communication that is informal, yet visible at scale. This combination of factors opens up a new space for the study of scholarly communities and communication. While (in)visible colleges are programmatically observable, care must be taken with any effort to count and measure knowledge work in these spaces.
This is the first systematic, data-driven analysis of the digital humanities and lays the groundwork for subsequent social studies of digital humanities.
PhD, Information, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/111592/1/mcburton_1.pd

Deep Blue Documents at the University of Michigan

Using Social Media Data to Analyse Issue Engagement During the 2017 German Federal Election

Author: Bazo Alexander
Elsweiler David
Meier Florian Maximilian
Publication date: 01/01/2021

VBN

Webometrics benefitting from web mining? An investigation of methods and applications of two research fields

Author: A Bifet
A Gruzd
A Guerbas
A Martínez-Ruiz
A Noruzi
A Rettinger
A Schubert
A Zuccala
AB Barragáns-Martínez
ARH Fischer
B Mobasher
B Yang
BN Miller
C Romero
C Wang
C Woo-Young
C-L Hsu
CJ Williams
D Ai
D Minguillo
D Pierrakos
D Stuart
D Wilkinson
David Gunnarsson Lorentzen
E Angus
E Kontopoulos
E Orduña-Malea
E Otte
E Romero-Frías
F Aminpour
F Barjak
F Didegah
FM Facca
G Lappas
G Paliouras
G Qiu
G Somprasertsri
GD Kumar
H Kretschmer
H Small
H-F Li
H-W Park
I Aguillo
I-C Yeh
IF Aguillo
J Bar-Ilan
J Borges
J Canny
J Fernández
J Srivastava
J-C Ou
JA Kirby
JA Pratt
JD Velásquez
JL Ortega
JM Kleinberg
JW Palmer
K Holmberg
K Jonkers
K Poongothai
K-Y Wang
KA-I Nekaris
L Björneborn
L Vaughan
L van Zoonen
L-W Ku
M Asadi
M Biehl
M Chau
M Cheong
M Deshpande
M Efron
M Eirinaki
M Erfanmanesh
M Shekofteh
M Thelwall
M-L Shyu
MA Bayir
MA Islam
MR Martínez-Torres
O Arbelaitz
O Etzioni
O Nasraoui
P Ingwersen
P Wang
P-H Chou
PB Lang
Q He
Q Zhang
R Ball
R Das
R Duane Ireland
R Kosala
R Malinský
RL Glass
S Alsaleh
S Brin
S Kundu
S Milgram
S-H Lin
SA Hale
SE Cho
T Becher
T Hofmann
T Holloway
T van Leeuwen
T Takahashi
TC Almind
TJ Ruller
V Panchal
V Popova
VD Blondel
WE Nwagwu
X Polanco
Y Lai
Y Nam
Y Zhang
Yuan Shunbo
Z Huang
Z Zhang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Webometrics and web mining are two fields where research is focused on quantitative analyses of the web. This literature review outlines definitions of the fields, and then focuses on their methods and applications. It also discusses the potential of closer contact and collaboration between them. A key difference between the fields is that webometrics has focused on exploratory studies, whereas web mining has been dominated by studies focusing on development of methods and algorithms. Differences in type of data can also be seen, with webometrics more focused on analyses of the structure of the web and web mining more focused on web content and usage, even though both fields have been embracing the possibilities of user-generated content. It is concluded that research problems where big data is needed can benefit from collaboration between webometricians, with their tradition of exploratory studies, and web miners, with their tradition of developing methods and algorithms.

Crossref

University of Borås

Digitala Vetenskapliga Arkivet - Academic Archive On-line

SWKM 2008: Social Web and Knowledge Management, Proceedings: CEUR Workshop Proceedings

Publication venue: CEUR Workshop Proceedings
Publication date: 01/01/2008

VBN

A treatise on Web 2.0 with a case study from the financial markets

Author: Martin D. Sykora
Publication date: 01/01/2012

There has been much hype in vocational and academic circles surrounding the emergence of web 2.0 or social media; however, relatively little work has been dedicated to substantiating the actual concept of web 2.0. Many have dismissed it as not deserving of this new title, since the term web 2.0 assumes a certain interpretation of web history, including enough progress in a certain direction to trigger a succession [i.e. web 1.0 → web 2.0]. Others provided arguments in support of this development, and there has been a considerable amount of enthusiasm in the literature. Much research has been busy evaluating current use of web 2.0 and analysing user-generated content, but an objective and thorough assessment of what web 2.0 really stands for has been to a large extent overlooked. More recently, the idea of collective intelligence facilitated via web 2.0 and its potential applications have raised interest among researchers, yet a more unified approach and work in the area of collective intelligence is needed. This thesis identifies and critically evaluates a wider context for the web 2.0 environment and what caused it to emerge, providing a rich literature review on the topic, a review of existing taxonomies, a quantitative and qualitative evaluation of the concept itself, and an investigation of the collective intelligence potential that emerges from application usage. Finally, a framework for harnessing collective intelligence in a more systematic manner is proposed. In addition to the presented results, novel methodologies are also introduced throughout this work. In order to provide interesting insight but also to illustrate analysis, a case study of the recent financial crisis is considered. Some interesting results relating to the crisis are revealed within user-generated content data, and relevant issues are discussed where appropriate.

Loughborough University Institutional Repository

Analyzing the Dynamics of Communication in Online Social Networks

Author: Chih-Hui Lai
Mor Naaman
Jeffrey Boase
CR Berger
D. Liben-Nowell
DJ Watts
Eytan Adar
Lada A. Adamic
FM Bass
J Coleman
L Adamic
M McPherson
MS Granovetter
Munmun De Choudhury
Hari Sundaram
Ajita John
Dorée Duncan Seligmann
RB Cialdini
RS Burt
SL Feld
Publication venue: Springer Science and Business Media LLC

Crossref

Mapping the (R-)Evolution of Technological Fields: A Semantic Network Approach

Author: A. Davies
A. Lopolito
Alex T. Kalinka
B. Verspagen
C.S. Wagner
D.A. McFarland
F. Radicchi
F. Timothy
G. Dosi
I. von Wartburg
J.W. Mohr
K. Frenken
K. Pavitt
L. Fleming
M. Newman
M.P. Hekkert
P. DiMaggio
P.J. Mucha
R. Fontana
R.B. Bradford
S. Kaplan
S.-H. Chen
S.C. Deerwester
T. Opsahl
Vincent D Blondel
Y.-Y. Ahn
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Crossref

VBN

Web Science: Understanding the Emergence of Macro-Level Features on the World Wide Web

Author: Kieron O'Hara
Publication venue: Now Publishers
Publication date: 01/01/2013

Crossref

Application of Common Sense Computing for the Development of a Novel Knowledge-Based Opinion Mining Engine

Author: Cull A.
Fry A.
Rush Robert
Steel M.
Publication venue: University of Stirling
Publication date: 01/03/2001

The ways people express their opinions and sentiments have radically changed in the past few years thanks to the advent of social networks, web communities, blogs, wikis and other online collaborative media. The distillation of knowledge from this huge amount of unstructured information can be a key factor for marketers who want to create an image or identity in the minds of their customers for their product, brand, or organisation. These online social data, however, remain hardly accessible to computers, as they are specifically meant for human consumption. The automatic analysis of online opinions, in fact, involves a deep understanding of natural language text by machines, from which we are still very far. Hitherto, online information retrieval has been mainly based on algorithms relying on the textual representation of web pages. Such algorithms are very good at retrieving texts, splitting them into parts, checking the spelling and counting their words. But when it comes to interpreting sentences and extracting meaningful information, their capabilities are known to be very limited. Existing approaches to opinion mining and sentiment analysis, in particular, can be grouped into three main categories: keyword spotting, in which text is classified into categories based on the presence of fairly unambiguous affect words; lexical affinity, which assigns arbitrary words a probabilistic affinity for a particular emotion; and statistical methods, which calculate the valence of affective keywords and word co-occurrence frequencies on the basis of a large training corpus. Early works aimed to classify entire documents as containing overall positive or negative polarity, or rating scores of reviews. Such systems were mainly based on supervised approaches relying on manually labelled samples, such as movie or product reviews where the opinionist’s overall positive or negative attitude was explicitly indicated.
However, opinions and sentiments do not occur only at document level, nor are they limited to a single valence or target. Contrary or complementary attitudes toward the same topic or multiple topics can be present across the span of a document. In more recent works, text analysis granularity has been taken down to segment and sentence level, e.g., by using the presence of opinion-bearing lexical items (single words or n-grams) to detect subjective sentences, or by exploiting association rule mining for a feature-based analysis of product reviews. These approaches, however, are still far from being able to infer the cognitive and affective information associated with natural language, as they mainly rely on knowledge bases that are still too limited to efficiently process text at sentence level. In this thesis, common sense computing techniques are further developed and applied to bridge the semantic gap between word-level natural language data and the concept-level opinions conveyed by these. In particular, the ensemble application of graph mining and multi-dimensionality reduction techniques on two common sense knowledge bases was exploited to develop a novel intelligent engine for open-domain opinion mining and sentiment analysis. The proposed approach, termed sentic computing, performs a clause-level semantic analysis of text, which allows the inference of both the conceptual and emotional information associated with natural language opinions and, hence, a more efficient passage from (unstructured) textual information to (structured) machine-processable data. The engine was tested on three different resources, namely a Twitter hashtag repository, a LiveJournal database and a PatientOpinion dataset, and its performance was compared both with results obtained using standard sentiment analysis techniques and with results obtained using different state-of-the-art knowledge bases such as Princeton’s WordNet, MIT’s ConceptNet and Microsoft’s Probase.
Unlike most currently available opinion mining services, the developed engine does not base its analysis on a limited set of affect words and their co-occurrence frequencies, but rather on common sense concepts and the cognitive and affective valence conveyed by these. This allows the engine to be domain-independent and, hence, to be embedded in any opinion mining system for the development of intelligent applications in multiple fields such as Social Web, HCI and e-health. Looking ahead, the combined novel use of different knowledge bases and of common sense reasoning techniques for opinion mining proposed in this work will eventually pave the way for the development of more bio-inspired approaches to the design of natural language processing systems capable of handling knowledge, retrieving it when necessary, making analogies and learning from experience.

PubMed Central

Stirling Online Research Repository

Queen Margaret University eResearch

University of St. Andrews - Pure