Meeting of the MINDS: an information retrieval research agenda
Since its inception in the late 1950s, the field of Information Retrieval (IR) has developed tools that help people find, organize, and analyze information. The key early influences on the field are well-known. Among them are H. P. Luhn's pioneering work, the development of the vector space retrieval model by Salton and his students, Cleverdon's development of the Cranfield experimental methodology, Spärck Jones' development of idf, and a series of probabilistic retrieval models by Robertson and Croft. Until the development of the World Wide Web (Web), IR was of greatest interest to professional information analysts such as librarians, intelligence analysts, the legal community, and the pharmaceutical industry
The Blogosphere at a Glance — Content-Based Structures Made Simple
A network representation based on a basic word-overlap
similarity measure between blogs is introduced.
The simplicity of the representation renders
it computationally tractable, transparent and insensitive
to representation-dependent artifacts. Using
Swedish blog data, we demonstrate that the representation,
in spite of its simplicity, manages to capture
important structural properties of the content
in the blogosphere. First, blogs that treat similar
subjects are organized in distinct network clusters.
Second, the network is hierarchically organized as
clusters in turn form higher-order clusters: a compound
structure reminiscent of a blog taxonomy
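As a rough illustration of the idea in this abstract, the sketch below links blogs whose texts share enough words. The abstract does not state the exact measure, so Jaccard overlap of word sets, the function names `word_overlap` and `build_network`, the threshold value, and the toy blog texts are all assumptions made for this example.

```python
from itertools import combinations

def word_overlap(a, b):
    """Jaccard overlap between the word sets of two texts (an assumed measure)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

def build_network(blogs, threshold=0.2):
    """Return weighted edges between blogs whose overlap meets the threshold.

    `blogs` maps a blog name to its text; the threshold is a hypothetical cutoff.
    """
    edges = {}
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(blogs.items()), 2):
        score = word_overlap(text_a, text_b)
        if score >= threshold:
            edges[(name_a, name_b)] = score
    return edges

# Toy data, invented for illustration only.
blogs = {
    "cooking": "recipes for bread and soup",
    "baking": "bread recipes and cake",
    "cycling": "bike routes and gear",
}
network = build_network(blogs)
```

Clusters of similar blogs could then be found by running any community-detection method on the resulting weighted edges; which method the authors used is not stated in the abstract.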
Navigating the information landscape
Preprint of column segment to be published in Serials Librarian 61(3), 2011. This article explores the tension between the structures by which the library organises and presents information, and the ways in which students and researchers access, use and conceptualise knowledge. I suggest that while knowledge structures are vital to learning and research, an overemphasis on structurality is mistaken, and can lead to an inappropriately positivist approach which impedes the research mission. The article examines various metaphoric ways of negotiating meaning and navigating information structures, and of crossing the threshold of structuralit
An integrated ranking algorithm for efficient information computing in social networks
Social networks have widened the gap between the view of the WWW
traditionally stored in search engine repositories and the actual, ever-changing
Web. The exponential growth of web users and the ease with which
they can upload content highlight the need for content controls on
material published on the web. As the definition of search changes,
socially enhanced interactive search methodologies are the need of the hour.
Ranking is pivotal for efficient web search as the search performance mainly
depends upon the ranking results. This paper proposes a new integrated ranking
model based on a fused rank of web objects, derived from the popularity factor
each object earns over only valid interlinks from multiple social forums. This model identifies
relationships between web objects in separate social networks based on the
object inheritance graph. An experimental study indicates that the proposed
fusion-based ranking algorithm yields better search results.
Comment: 14 pages, International Journal on Web Service Computing (IJWSC),
Vol.3, No.1, March 201
Exploiting Social Media Sources for Search, Fusion and Evaluation
The web contains heterogeneous information that is generated with different characteristics and is presented via different media. Social media, as one of the largest content carriers, has generated information from millions of users worldwide, creating material rapidly in all types of forms such as comments, images, tags, videos and ratings, etc. In social applications, the formation of online communities contributes to conversations of substantially broader aspects, as well as unfiltered opinions about subjects that are rarely covered in public media. Information accrued on social platforms, therefore, presents a unique opportunity to augment web sources such as Wikipedia or news pages, which are usually characterized as being more formal. The goal of this dissertation is to investigate in depth how social data can be exploited and applied in the context of three fundamental information retrieval (IR) tasks: search, fusion, and evaluation. Improving search performance has consistently been a major focus in the IR community. Given the in-depth discussions and active interactions contained in social media, we present approaches to incorporating this type of data to improve search on general web corpora. In particular, we propose two graph-based frameworks, social anchor and information network, to associate related web and social content, where information sources of diverse characteristics can be used to complement each other in a unified manner. We investigate how the enriched representation can potentially reduce vocabulary mismatch and improve retrieval effectiveness. Presenting social media content to users is valuable particularly for queries intended for time-sensitive events or community opinions. Current major search engines commonly blend results from different search services (or verticals) into core web results. Motivated by this real-world need, we explore ways to merge results from different web and social services into a single ranked list. 
We present an optimization framework for fusion, where the impact of documents, ranked lists, and verticals can be modeled simultaneously to maximize performance. Evaluating search system performance has largely relied on creating reusable test collections in IR. Traditional ways of creating evaluation sets can require substantial manual effort. To reduce such effort, we explore an approach to automating the process of collecting pairs of queries and relevance judgments, using a high-quality social media source, Community Question Answering (CQA). Our approach is based on the idea that CQA services support platforms for users to raise questions and to share answers, therefore encoding the associations between real user information needs and real user assessments. To demonstrate the effectiveness of our approaches, we conduct extensive retrieval and fusion experiments, as well as verify the reliability of the new, CQA-based evaluation test sets
The 'who' and 'what' of #diabetes on Twitter
Social media are being increasingly used for health promotion, yet the
landscape of users, messages and interactions in such fora is poorly
understood. Studies of social media and diabetes have focused mostly on
patients, or public agencies addressing it, but have not looked broadly at all
the participants or the diversity of content they contribute. We study Twitter
conversations about diabetes through the systematic analysis of 2.5 million
tweets collected over 8 months and the interactions between their authors. We
address three questions: (1) what themes arise in these tweets?, (2) who are
the most influential users?, (3) which type of users contribute to which
themes? We answer these questions using a mixed-methods approach, integrating
techniques from anthropology, network science and information retrieval such as
thematic coding, temporal network analysis, and community and topic detection.
Diabetes-related tweets fall within broad thematic groups: health information,
news, social interaction, and commercial. At the same time, humorous messages
and references to popular culture appear consistently, more than any other type
of tweet. We classify authors according to their temporal 'hub' and 'authority'
scores. Whereas the hub landscape is diffuse and fluid over time, top
authorities are highly persistent across time and comprise bloggers, advocacy
groups and NGOs related to diabetes, as well as for-profit entities without
specific diabetes expertise. Top authorities fall into seven interest
communities as derived from their Twitter follower network. Our findings have
implications for public health professionals and policy makers who seek to use
social media as an engagement tool and to inform policy design.
Comment: 25 pages, 11 figures, 7 tables. Supplemental spreadsheet available
from http://journals.sagepub.com/doi/suppl/10.1177/2055207616688841, Digital
Health, Vol 3, 201
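The hub and authority scores mentioned in this abstract come from the HITS family of link-analysis methods. The sketch below is a minimal static HITS iteration on a directed interaction graph; it is not the temporal variant the authors describe, and the function name `hits`, the edge direction (an edge points from the user who retweets or mentions to the user being referenced), and the toy account names are assumptions made for illustration.

```python
def hits(edges, iterations=50):
    """Plain HITS on a list of directed (source, target) edges.

    Returns (hub, authority) dictionaries with L2-normalised scores.
    """
    nodes = sorted({node for edge in edges for node in edge})
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the nodes pointing at this node.
        auth = {n: sum(hub[s] for s, t in edges if t == n) for n in nodes}
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub: sum of authority scores of the nodes this node points at.
        hub = {n: sum(auth[t] for s, t in edges if s == n) for n in nodes}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth

# Toy interaction graph, invented for illustration only.
edges = [
    ("user_a", "org"),
    ("user_b", "org"),
    ("user_c", "org"),
    ("user_a", "blog"),
]
hub, auth = hits(edges)
top_authority = max(auth, key=auth.get)
top_hub = max(hub, key=hub.get)
```

In this toy graph the most referenced account becomes the top authority and the account that references the most authorities becomes the top hub. A temporal analysis would need to recompute or extend these scores over time windows, which this sketch does not attempt.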
A bagging SVM to learn from positive and unlabeled examples
We consider the problem of learning a binary classifier from a training set
of positive and unlabeled examples, both in the inductive and in the
transductive setting. This problem, often referred to as PU learning,
differs from the standard supervised classification problem by the lack of
negative examples in the training set. It corresponds to a ubiquitous
situation in many applications such as information retrieval or gene ranking,
when we have identified a set of data of interest sharing a particular
property, and we wish to automatically retrieve additional data sharing the
same property among a large and easily available pool of unlabeled data. We
propose a conceptually simple method, akin to bagging, to approach both
inductive and transductive PU learning problems, by converting them into a series
of supervised binary classification problems discriminating the known positive
examples from random subsamples of the unlabeled set. We empirically
demonstrate the relevance of the method on simulated and real data, where it
performs at least as well as existing methods while being faster
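The bagging idea in this abstract can be sketched as follows: repeatedly draw a random subsample of the unlabeled set, treat it as negative, train a classifier against the known positives, and average the scores each unlabeled point receives in the rounds where it was not drawn. To keep the example dependency-free, a nearest-centroid scorer stands in for the SVM, so this is not the authors' implementation. The out-of-bag averaging rule, the function names, the parameter defaults, and the toy data are all assumptions.

```python
import random

def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_base(positives, negatives):
    """Stand-in base learner (nearest centroid, NOT an SVM).

    Returns a scoring function; a higher score means more positive-like.
    """
    c_pos, c_neg = centroid(positives), centroid(negatives)
    return lambda x: sq_dist(x, c_neg) - sq_dist(x, c_pos)

def pu_bagging_scores(positives, unlabeled, n_rounds=50, sample_size=None, seed=0):
    """Average out-of-bag score for every unlabeled point."""
    rng = random.Random(seed)
    k = sample_size or len(positives)
    totals = [0.0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    for _ in range(n_rounds):
        # Random subsample of the unlabeled set, treated as negatives this round.
        drawn = set(rng.sample(range(len(unlabeled)), k))
        score = train_base(positives, [unlabeled[i] for i in drawn])
        for i, x in enumerate(unlabeled):
            if i not in drawn:  # score only points left out of this round
                totals[i] += score(x)
                counts[i] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]

# Toy data, invented for illustration: the first two unlabeled points are hidden positives.
positives = [(0.0, 0.0), (0.1, 0.2), (-0.2, 0.1)]
unlabeled = [(0.1, -0.1), (0.0, 0.2), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1), (5.1, 5.3)]
scores = pu_bagging_scores(positives, unlabeled)
```

Ranking the unlabeled points by `scores` puts the likely positives first. Swapping `train_base` for a real SVM trainer would bring the sketch closer to the method named in the title.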