55 research outputs found

Exploratory Analysis of Highly Heterogeneous Document Collections

Author: Blei D. M.
Bun K. K.
Maiya A. S.
Manning C. D.
Mihalcea R.
Pecina P.
Ranganathan S. R.
Wagstaff K.
Publication date: 01/01/2013

We present an effective multifaceted system for exploratory analysis of highly heterogeneous document collections. Our system is based on intelligently tagging individual documents in a purely automated fashion and exploiting these tags in a powerful faceted browsing framework. Tagging strategies employed include both unsupervised and supervised approaches based on machine learning and natural language processing. As one of our key tagging strategies, we introduce the KERA algorithm (Keyword Extraction for Reports and Articles). KERA extracts topic-representative terms from individual documents in a purely unsupervised fashion and is shown to be significantly more effective than state-of-the-art methods. Finally, we evaluate our system in its ability to help users locate documents pertaining to military critical technologies buried deep in a large heterogeneous sea of information.

Comment: 9 pages; KDD 2013: 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

arXiv.org e-Print Archive

CiteSeerX

Crossref

Modeling and OLAPing social media : the case of Twitter

Author: Ben Kraiem Maha
Feki Jamel
Khrouf Kaïs
Ravat Franck
Teste Olivier
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

In recent years, social networks have revolutionized the ways of interacting and exchanging information on the Internet. Millions of users interact frequently and share a variety of digital content with each other. They express their feelings and opinions on every topic of interest. These opinions carry important value for personal, academic, and commercial applications, but the volume and the speed at which they are produced make it challenging for researchers and the underlying technologies to provide useful insights into such data. We attempt to extend the established online analytical processing (OLAP) technology to allow multidimensional analysis of social media data. In this paper, we pursue the goal of providing a generic multidimensional model dedicated to the OLAP of social media, especially Twitter. The proposed model reflects some specifics such as recursive references between tweets, the Empty dimension, and different types of hierarchies. It is implemented using the NetBeans IDE platform. We also present some experimental results. We expect our proposed approach to be applicable to analyzing the data of other social networks as well.

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

Toulouse Capitole Publications

Toulouse 1 Capitole Publications

Discovering and Mitigating Social Data Bias

Publication date: 01/01/2017

Exabytes of data are created online every day. This deluge of data is nowhere more apparent than on social media. Naturally, finding ways to leverage this unprecedented source of human information is an active area of research. Social media platforms have become laboratories for conducting experiments about people at scales thought unimaginable only a few years ago. Researchers and practitioners use social media to extract actionable patterns such as where aid should be distributed in a crisis. However, the validity of these patterns relies on having a representative dataset. As this dissertation shows, the data collected from social media is seldom representative of the activity of the site itself, and less so of human activity. This means that the results of many studies are limited by the quality of data they collect. The finding that social media data is biased inspires the main challenge addressed by this thesis. I introduce three sets of methodologies to correct for bias. First, I design methods to deal with data collection bias. I offer a methodology which can find bias within a social media dataset. This methodology works by comparing the collected data with other sources to find bias in a stream. The dissertation also outlines a data collection strategy which minimizes the amount of bias that will appear in a given dataset. It introduces a crawling strategy which mitigates the amount of bias in the resulting dataset. Second, I introduce a methodology to identify bots and shills within a social media dataset. This directly addresses the concern that the users of a social media site are not representative. Applying these methodologies allows the population under study on a social media site to better match that of the real world. Finally, the dissertation discusses perceptual biases, explains how they affect analysis, and introduces computational approaches to mitigate them.
The results of the dissertation allow for the discovery and removal of different levels of bias within a social media dataset. This has important implications for social media mining, namely that the behavioral patterns and insights extracted from social media will be more representative of the populations under study.

Dissertation/Thesis: Doctoral Dissertation, Computer Science 201

ASU Digital Repository

A Semantics-based User Interface Model for Content Annotation, Authoring and Exploration

Author: Khalili Ali
Publication date: 26/01/2015

The Semantic Web and Linked Data movements, which aim to create, publish and interconnect machine-readable information, have gained traction in recent years. However, the majority of information is still contained in and exchanged using unstructured documents, such as Web pages, text documents, images and videos. This cannot be expected to change either, since text, images and videos are the natural way in which humans interact with information. Semantic structuring of content, on the other hand, provides a wide range of advantages compared to unstructured information. Semantically-enriched documents facilitate information search and retrieval, presentation, integration, reusability, interoperability and personalization. Looking at the life-cycle of semantic content on the Web of Data, we see considerable progress on the backend side in storing structured content and in linking data and schemata. Nevertheless, from our point of view the currently least developed aspect of the semantic content life-cycle is the user-friendly manual and semi-automatic creation of rich semantic content. In this thesis, we propose a semantics-based user interface model, which aims to reduce the complexity of underlying technologies for semantic enrichment of content by Web users. By surveying existing tools and approaches for semantic content authoring, we extracted a set of guidelines for designing efficient and effective semantic authoring user interfaces. We applied these guidelines to devise a semantics-based user interface model called WYSIWYM (What You See Is What You Mean), which enables integrated authoring, visualization and exploration of unstructured and (semi-)structured content. To assess the applicability of our proposed WYSIWYM model, we incorporated the model into four real-world use cases comprising two general and two domain-specific applications.
These use cases address four aspects of the WYSIWYM implementation: 1) its integration into existing user interfaces, 2) utilizing it for lightweight text analytics to incentivize users, 3) dealing with crowdsourcing of semi-structured e-learning content, and 4) incorporating it for authoring of semantic medical prescriptions.

Qucosa - Publikationsserver der Universität Leipzig

2022-2023 School Year

Author: St. Mary's University School of Law
Publication venue: Digital Commons at St. Mary's University
Publication date: 01/01/2022

Digital Commons at St. Mary's University, San Antonio

2022-2023 School Year

Author: St. Mary's University School of Law
Publication venue: Digital Commons at St. Mary's University
Publication date: 01/01/2022

St Mary's University School of Law Digital Repository

The Janus Faced Scholar: a Festschrift in honour of Peter Ingwersen

Publication venue: Det Informationsvidenskabelige Akademi
Publication date: 01/01/2010

Copenhagen University Research Information System

Theories of Informetrics and Scholarly Communication

Publication venue: Walter de Gruyter GmbH
Publication date: 10/02/2021

Scientometrics have become an essential element in the practice and evaluation of science and research, including both the evaluation of individuals and national assessment exercises. Yet, researchers and practitioners in this field have lacked clear theories to guide their work. As early as 1981, then doctoral student Blaise Cronin published "The need for a theory of citing", a call to arms for the fledgling scientometric community to produce foundational theories upon which the work of the field could be based. More than three decades later, the time has come to reach out to the field again and ask how they have responded to this call. This book compiles the foundational theories that guide informetrics and scholarly communication research. It is a much needed compilation by leading scholars in the field that gathers together the theories that guide our understanding of authorship, citing, and impact.

Directory of Open Access Books (DOAB)

Theories of Informetrics and Scholarly Communication

Publication venue: Walter de Gruyter GmbH
Publication date: 01/01/2016

Scientometrics have become an essential element in the practice and evaluation of science and research, including both the evaluation of individuals and national assessment exercises. Yet, researchers and practitioners in this field have lacked clear theories to guide their work. As early as 1981, then doctoral student Blaise Cronin published "The need for a theory of citing", a call to arms for the fledgling scientometric community to produce foundational theories upon which the work of the field could be based. More than three decades later, the time has come to reach out to the field again and ask how they have responded to this call. This book compiles the foundational theories that guide informetrics and scholarly communication research. It is a much needed compilation by leading scholars in the field that gathers together the theories that guide our understanding of authorship, citing, and impact.

SSOAR - Social Science Open Access Repository

Youth-Led Social Movements and Peacebuilding in Africa

Publication venue: Informa UK Limited
Publication date: 18/05/2022

This book critically examines and analyses the active role played by youth-led social movements in pushing for change and promoting peacebuilding in Africa, and their long-term impacts on society. Africa’s history is characterised by youth movements. The continent’s youth populations played pivotal roles in the campaign against colonialism and, ever since independence, Africa’s youth have been at the centre of social mobilisation. Most recently, social media has contributed significantly to a further rise in youth-led social movements. However, the impact of youth voices is often marginalised by patriarchal and gerontocratic approaches to governance, denying them the place, voice, and recognition that they deserve. Drawing on empirical evidence from across the continent, this book analyses the drivers and long-term impacts of youth-led social movements on politics in African societies, especially in the area of peacebuilding. The book draws attention to the innovative ways in which young people continue to seek to re-engineer social space and challenge contexts that deny them their voice, place, recognition and identity. This book will be of interest to researchers across the fields of social movement studies, youth studies, peace and conflict studies, history, political sciences, social justice, and African studies.

Directory of Open Access Books (DOAB)