11,953 research outputs found

Meeting of the MINDS: an information retrieval research agenda

Author: Allan J.
Callan J.
Clarke C.L.A.
Dumais S.
Evans D.A.
Sanderson M.
Zhai C.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/12/2007

Since its inception in the late 1950s, the field of Information Retrieval (IR) has developed tools that help people find, organize, and analyze information. The key early influences on the field are well-known. Among them are H. P. Luhn's pioneering work, the development of the vector space retrieval model by Salton and his students, Cleverdon's development of the Cranfield experimental methodology, Spärck Jones' development of idf, and a series of probabilistic retrieval models by Robertson and Croft. Until the development of the World Wide Web (Web), IR was of greatest interest to professional information analysts such as librarians, intelligence analysts, the legal community, and the pharmaceutical industry

White Rose Research Online

The Blogosphere at a Glance — Content-Based Structures Made Simple

Author: Boman Magnus
Görnerup Olof
Publication date: 01/01/2011

A network representation based on a basic word-overlap similarity measure between blogs is introduced. The simplicity of the representation renders it computationally tractable, transparent and insensitive to representation-dependent artifacts. Using Swedish blog data, we demonstrate that the representation, in spite of its simplicity, manages to capture important structural properties of the content in the blogosphere. First, blogs that treat similar subjects are organized in distinct network clusters. Second, the network is hierarchically organized as clusters in turn form higher-order clusters: a compound structure reminiscent of a blog taxonomy
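
As a rough illustration of the idea in this abstract, the sketch below links blogs whose vocabularies overlap. The abstract does not define the similarity measure, so the Jaccard overlap, the `threshold` value, the function names and the toy blog texts are all assumptions made for illustration, not the authors' method.

```python
from itertools import combinations


def word_overlap(a: set, b: set) -> float:
    """Jaccard overlap (an assumed measure): shared words over all distinct words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_network(blogs: dict, threshold: float = 0.2) -> dict:
    """Return weighted edges between blogs whose overlap reaches the threshold."""
    vocab = {name: set(text.lower().split()) for name, text in blogs.items()}
    edges = {}
    for u, v in combinations(sorted(vocab), 2):
        weight = word_overlap(vocab[u], vocab[v])
        if weight >= threshold:  # hypothetical cut-off, not taken from the paper
            edges[(u, v)] = weight
    return edges


# Toy data (hypothetical): two cooking blogs and one politics blog.
blogs = {
    "cooking_a": "recipe bread oven flour yeast",
    "cooking_b": "recipe cake oven flour sugar",
    "politics_a": "election vote parliament budget",
}
edges = build_network(blogs)
print(edges)
```

In this toy input only the two cooking blogs share words, so they form the single edge; clusters of such edges are what the abstract describes as subject groupings.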

CiteSeerX

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Navigating the information landscape

Author: Coonan Emma
Publication date: 01/10/2011

Preprint of column segment to be published in Serials Librarian 61(3), 2011. This article explores the tension between the structures by which the library organises and presents information, and the ways in which students and researchers access, use and conceptualise knowledge. I suggest that while knowledge structures are vital to learning and research, an overemphasis on structurality is mistaken, and can lead to an inappropriately positivist approach which impedes the research mission. The article examines various metaphoric ways of negotiating meaning and navigating information structures, and of crossing the threshold of structuralit

Apollo (Cambridge)

An integrated ranking algorithm for efficient information computing in social networks

Author: Suri Pushpa R.
Taneja Harmunish
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date: 06/04/2012

Social networks have ensured the expanding disproportion between the face of the WWW stored traditionally in search engine repositories and the actual ever-changing face of the Web. The exponential growth of web users and the ease with which they can upload content to the web highlight the need for content controls on material published on the web. As the definition of search is changing, socially enhanced interactive search methodologies are the need of the hour. Ranking is pivotal for efficient web search, as search performance mainly depends upon the ranking results. In this paper, a new integrated ranking model is proposed, based on the fused rank of a web object, which is in turn based on the popularity factor earned over only valid interlinks from multiple social forums. This model identifies relationships between web objects in separate social networks based on the object inheritance graph. An experimental study indicates the effectiveness of the proposed fusion-based ranking algorithm in terms of better search results. Comment: 14 pages, International Journal on Web Service Computing (IJWSC), Vol.3, No.1, March 201

arXiv.org e-Print Archive

Crossref

Exploiting Social Media Sources for Search, Fusion and Evaluation

Author: Lee Chia-Jung
Publication venue: ScholarWorks@UMass Amherst
Publication date: 09/11/2015

The web contains heterogeneous information that is generated with different characteristics and is presented via different media. Social media, as one of the largest content carriers, has generated information from millions of users worldwide, creating material rapidly in all types of forms such as comments, images, tags, videos and ratings, etc. In social applications, the formation of online communities contributes to conversations of substantially broader aspects, as well as unfiltered opinions about subjects that are rarely covered in public media. Information accrued on social platforms, therefore, presents a unique opportunity to augment web sources such as Wikipedia or news pages, which are usually characterized as being more formal. The goal of this dissertation is to investigate in depth how social data can be exploited and applied in the context of three fundamental information retrieval (IR) tasks: search, fusion, and evaluation. Improving search performance has consistently been a major focus in the IR community. Given the in-depth discussions and active interactions contained in social media, we present approaches to incorporating this type of data to improve search on general web corpora. In particular, we propose two graph-based frameworks, social anchor and information network, to associate related web and social content, where information sources of diverse characteristics can be used to complement each other in a unified manner. We investigate how the enriched representation can potentially reduce vocabulary mismatch and improve retrieval effectiveness. Presenting social media content to users is valuable particularly for queries intended for time-sensitive events or community opinions. Current major search engines commonly blend results from different search services (or verticals) into core web results. Motivated by this real-world need, we explore ways to merge results from different web and social services into a single ranked list. 
We present an optimization framework for fusion, where the impact of documents, ranked lists, and verticals can be modeled simultaneously to maximize performance. Evaluating search system performance has largely relied on creating reusable test collections in IR. Traditional ways of creating evaluation sets can require substantial manual effort. To reduce such effort, we explore an approach to automating the process of collecting pairs of queries and relevance judgments, using high-quality social media, namely Community Question Answering (CQA). Our approach is based on the idea that CQA services support platforms for users to raise questions and to share answers, therefore encoding the associations between real user information needs and real user assessments. To demonstrate the effectiveness of our approaches, we conduct extensive retrieval and fusion experiments, as well as verify the reliability of the new, CQA-based evaluation test sets
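
To make the task of merging results from several services into a single ranked list concrete, here is a minimal sketch using reciprocal rank fusion, a common unsupervised baseline. It is a swapped-in technique, not the dissertation's optimization framework; the constant `k = 60`, the function name and the document identifiers are assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists by summing 1 / (k + rank) for every document."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by document name for stability.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a web vertical and a social vertical.
web = ["wiki_page", "news_article", "forum_thread"]
social = ["forum_thread", "wiki_page", "tweet"]
fused = reciprocal_rank_fusion([web, social])
print(fused)
```

Documents returned by both verticals accumulate two contributions and therefore rise above documents returned by only one, which is the basic behaviour any fusion method has to weigh against per-vertical quality.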

ScholarWorks@UMass Amherst

The 'who' and 'what' of #diabetes on Twitter

Author: Barahona Mauricio
Beguerisse-Díaz Mariano
Garduño-Hernández Guillermo
McLennan Amy K.
Ulijaszek Stanley J.
Publication venue: 'SAGE Publications'
Publication date: 16/12/2016

Social media are being increasingly used for health promotion, yet the landscape of users, messages and interactions in such fora is poorly understood. Studies of social media and diabetes have focused mostly on patients, or public agencies addressing it, but have not looked broadly at all the participants or the diversity of content they contribute. We study Twitter conversations about diabetes through the systematic analysis of 2.5 million tweets collected over 8 months and the interactions between their authors. We address three questions: (1) what themes arise in these tweets?, (2) who are the most influential users?, (3) which type of users contribute to which themes? We answer these questions using a mixed-methods approach, integrating techniques from anthropology, network science and information retrieval such as thematic coding, temporal network analysis, and community and topic detection. Diabetes-related tweets fall within broad thematic groups: health information, news, social interaction, and commercial. At the same time, humorous messages and references to popular culture appear consistently, more than any other type of tweet. We classify authors according to their temporal 'hub' and 'authority' scores. Whereas the hub landscape is diffuse and fluid over time, top authorities are highly persistent across time and comprise bloggers, advocacy groups and NGOs related to diabetes, as well as for-profit entities without specific diabetes expertise. Top authorities fall into seven interest communities as derived from their Twitter follower network. Our findings have implications for public health professionals and policy makers who seek to use social media as an engagement tool and to inform policy design. Comment: 25 pages, 11 figures, 7 tables. Supplemental spreadsheet available from http://journals.sagepub.com/doi/suppl/10.1177/2055207616688841, Digital Health, Vol 3, 201
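
The 'hub' and 'authority' scores mentioned in this abstract can be illustrated with the classic HITS iteration on a static directed graph. This is only a sketch under stated assumptions: the paper uses temporal scores, whereas the code below ignores time, and the edge meaning (source points at target, for example by retweeting), the account names and the iteration count are hypothetical.

```python
def hits(edges, iterations=50):
    """Static HITS: return (hub, authority) score dictionaries."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a node: sum of hub scores of the nodes pointing at it.
        auth = {n: sum(hub[u] for u, v in edges if v == n) for n in nodes}
        norm = sum(s * s for s in auth.values()) ** 0.5 or 1.0
        auth = {n: s / norm for n, s in auth.items()}
        # Hub score of a node: sum of authority scores of the nodes it points at.
        hub = {n: sum(auth[v] for u, v in edges if u == n) for n in nodes}
        norm = sum(s * s for s in hub.values()) ** 0.5 or 1.0
        hub = {n: s / norm for n, s in hub.items()}
    return hub, auth


# Hypothetical interactions: (source, target) means source points at target.
edges = [
    ("alice", "ngo"), ("bob", "ngo"), ("carol", "ngo"),
    ("alice", "blogger"), ("bob", "blogger"),
]
hub, auth = hits(edges)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

In this toy graph the account that many others point at gets the highest authority score, while accounts that point at several strong authorities get the highest hub scores.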

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive

Spiral - Imperial College Digital Repository

A bagging SVM to learn from positive and unlabeled examples

Author: Mordelet Fantine
Vert Jean-Philippe
Publication date: 19/07/2010

We consider the problem of learning a binary classifier from a training set of positive and unlabeled examples, both in the inductive and in the transductive setting. This problem, often referred to as PU learning, differs from the standard supervised classification problem by the lack of negative examples in the training set. It corresponds to a ubiquitous situation in many applications such as information retrieval or gene ranking, when we have identified a set of data of interest sharing a particular property, and we wish to automatically retrieve additional data sharing the same property among a large and easily available pool of unlabeled data. We propose a conceptually simple method, akin to bagging, to approach both inductive and transductive PU learning problems, by converting them into a series of supervised binary classification problems discriminating the known positive examples from random subsamples of the unlabeled set. We empirically demonstrate the relevance of the method on simulated and real data, where it performs at least as well as existing methods while being faster
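
The bagging idea described in this abstract can be sketched in a few lines. To keep the example dependency-free, the base classifier here is a nearest-centroid rule rather than an SVM, so this is a swapped-in simplification and not the authors' implementation; the out-of-bag averaging, the parameter values and the toy points are likewise assumptions for illustration.

```python
import random


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bagging_pu_scores(positives, unlabeled, rounds=50, sample_size=3, seed=0):
    """Average out-of-bag scores; a higher score means more positive-like."""
    rng = random.Random(seed)
    totals = [0.0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    pos_center = centroid(positives)
    for _ in range(rounds):
        # Treat a random subsample of the unlabeled pool as provisional negatives.
        bag = set(rng.sample(range(len(unlabeled)), sample_size))
        neg_center = centroid([unlabeled[i] for i in bag])
        for i, x in enumerate(unlabeled):
            if i in bag:
                continue  # score only points this round's classifier did not train on
            # Positive when x is closer to the positive centroid than the negative one.
            totals[i] += sq_dist(x, neg_center) - sq_dist(x, pos_center)
            counts[i] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


# Toy data (hypothetical): the first two unlabeled points are hidden positives.
positives = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
unlabeled = [(1.1, 1.0), (0.95, 1.05), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1), (5.1, 5.3)]
scores = bagging_pu_scores(positives, unlabeled)
ranking = sorted(range(len(unlabeled)), key=lambda i: scores[i], reverse=True)
print(ranking[:2])
```

Each round trains one cheap supervised problem (positives versus a small random bag), and averaging over many bags dilutes the effect of hidden positives that happen to be drawn as negatives.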

arXiv.org e-Print Archive

HAL-MINES ParisTech