The structure of broad topics on the web

Author: David M. Pennock
Kunal Punera
Mukul M. Joshi
Soumen Chakrabarti
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2004

Crossref

Controversy Analysis and Detection

Author: Dori-Hacohen Shiri
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/11/2017

Seeking information on a controversial topic is often a complex task. Alerting users about controversial search results can encourage critical literacy, promote healthy civic discourse and counteract the filter bubble effect, and therefore would be a useful feature in a search engine or browser extension. Additionally, presenting information to the user about the different stances or sides of the debate can help her navigate the landscape of search results beyond a simple list of 10 links. This thesis has made strides in the emerging niche of controversy detection and analysis. The body of work in this thesis revolves around two themes: computational models of controversy, and controversies occurring in neighborhoods of topics. Our broad contributions are: (1) Presenting a theoretical framework for modeling controversy as contention among populations; (2) Constructing the first automated approach to detecting controversy on the web, using a KNN classifier that maps from the web to similar Wikipedia articles; and (3) Proposing a novel approach to controversy detection in Wikipedia by employing a stacked model using a combination of link structure and similarity. We conclude this work by discussing the challenging technical, societal and ethical implications of this emerging research area and proposing avenues for future work.

ScholarWorks@UMass Amherst

Developing an ontology of mathematical logic

Author: Boyatt Russell
Joy Mike
Publication venue: Technical University of Civil Engineering Bucharest
Publication date: 01/01/2010

An ontology provides a mechanism to formally represent a body of knowledge. Ontologies are one of the key technologies supporting the Semantic Web and the desire to add meaning to the information available on the World Wide Web. They provide the mechanism to describe a set of concepts, their properties and their relations to give a shared representation of knowledge. The MALog project is developing an ontology to support the development of high-quality learning materials in the general area of mathematical logic. This ontology of mathematical logic will form the basis of the semantic architecture allowing us to relate different learning objects and recommend appropriate learning paths. This paper reviews the technologies used to construct the ontology and the use of the ontology to support learning object development, and explores the potential future use of the ontology.

Warwick Research Archives Portal Repository

Hybrid XML Retrieval: Combining Information Retrieval and a Native XML Database

Author: A Trotman
Anne-Marie Vercoustre
James A. Thom
Jovan Pehcevski
W Meier
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005

This paper investigates the impact of three approaches to XML retrieval: using Zettair, a full-text information retrieval system; using eXist, a native XML database; and using a hybrid system that takes full article answers from Zettair and uses eXist to extract elements from those articles. For the content-only topics, we undertake a preliminary analysis of the INEX 2003 relevance assessments in order to identify the types of highly relevant document components. Further analysis identifies two complementary sub-cases of relevance assessments ("General" and "Specific") and two categories of topics ("Broad" and "Narrow"). We develop a novel retrieval module that for a content-only topic utilises the information from the resulting answer list of a native XML database and dynamically determines the preferable units of retrieval, which we call "Coherent Retrieval Elements". The results of our experiments show that -- when each of the three systems is evaluated against different retrieval scenarios (such as different cases of relevance assessments, different topic categories and different choices of evaluation metrics) -- the XML retrieval systems exhibit varying behaviour and the best performance can be reached for different values of the retrieval parameters. In the case of INEX 2003 relevance assessments for the content-only topics, our newly developed hybrid XML retrieval system is substantially more effective than either Zettair or eXist, and yields a robust and a very effective XML retrieval. Comment: Postprint version. The editor version can be accessed through the DO

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

RMIT Research Repository

Hal-Diderot

Overview of the personalized and collaborative information retrieval (PIR) track at FIRE-2011

Author: Curtis Keith
Ganguly Debasis
Jones Gareth J.F.
Leveling Johannes
Li Wei B.
Publication date: 02/12/2011

The Personalized and collaborative Information Retrieval (PIR) track at FIRE 2011 was organized with the aim of extending standard information retrieval (IR) ad-hoc test collection design to facilitate research on personalized and collaborative IR by collecting additional meta-information during the topic (query) development process. A controlled query generation process through task-based activities with activity logging was used for each topic developer to construct the final list of topics. The standard ad-hoc collection is thus accompanied by a new set of thematically related topics and the associated log information. We believe this can better simulate a real-world search scenario and encourage mining user information from the logs to improve IR effectiveness. A set of 25 TREC-formatted topics and the associated metadata of activity logs were released for the participants to use. In this paper we illustrate the data construction phase in detail and also outline two simple ways of using the additional information from the logs to improve retrieval effectiveness.

Irish Universities

DCU Online Research Access Service

A Broad Evaluation of the Tor English Content Ecosystem

Author: Allahyari Mehdi
Doran Derek
Sadeghi Reza
Zabihimayvan Mahdieh
Publication date: 01/01/2019

Tor is among the most well-known dark nets in the world. It has noble uses, including as a platform for free speech and information dissemination under the guise of true anonymity, but may be culturally better known as a conduit for criminal activity and as a platform to market illicit goods and data. Past studies on the content of Tor support this notion, but were carried out by targeting popular domains likely to contain illicit content. A survey of past studies may thus not yield a complete evaluation of the content and use of Tor. This work addresses this gap by presenting a broad evaluation of the content of the English Tor ecosystem. We perform a comprehensive crawl of the Tor dark web and, through topic and network analysis, characterize the types of information and services hosted across a broad swath of Tor domains and their hyperlink relational structure. We recover nine domain types defined by the information or service they host and, among other findings, unveil how some types of domains intentionally silo themselves from the rest of Tor. We also present measurements that (regrettably) suggest how marketplaces of illegal drugs and services do emerge as the dominant type of Tor domain. Our study is the product of crawling over 1 million pages from 20,000 Tor seed addresses, yielding a collection of over 150,000 Tor pages. We intend to make the domain structure publicly available as a dataset at https://github.com/wsu-wacs/TorEnglishContent. Comment: 11 page

arXiv.org e-Print Archive

Crossref

CORE

Broad expertise retrieval in sparse data environments

Author: Azzopardi L.
Balog K.
Bogers T.
de Rijke M.
van den Bosch A.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2007

Expertise retrieval has been largely unexplored on data other than the W3C collection. At the same time, many intranets of universities and other knowledge-intensive organisations offer examples of relatively small but clean multilingual expertise data, covering broad ranges of expertise areas. We first present two main expertise retrieval tasks, along with a set of baseline approaches based on generative language modeling, aimed at finding expertise relations between topics and people. For our experimental evaluation, we introduce (and release) a new test set based on a crawl of a university site. Using this test set, we conduct two series of experiments. The first is aimed at determining the effectiveness of baseline expertise retrieval methods applied to the new test set. The second is aimed at assessing refined models that exploit characteristic features of the new test set, such as the organizational structure of the university, and the hierarchical structure of the topics in the test set. Expertise retrieval models are shown to be robust with respect to environments smaller than the W3C collection, and current techniques appear to be generalizable to other settings.

Crossref

Enlighten

International Migration, Integration and Social Cohesion online publications

Escaping the Trap of too Precise Topic Queries

Author: P. Cairns
P. Ion
T.K. Landauer
Publication date: 01/01/2013

At the very center of digital mathematics libraries lie controlled vocabularies which qualify the topic of the documents. These topics are used when submitting a document to a digital mathematics library and to perform searches in a library. The latter are refined by the use of these topics as they allow a precise classification of the mathematics area a document addresses. However, there is a major risk that users employ too precise topics to specify their queries: they may be employing a topic that is only "close-by" but fails to match the right resource. We call this the topic trap. Indeed, since 2009, this issue has appeared frequently on the i2geo.net platform. Other mathematics portals experience the same phenomenon. An approach to solve this issue is to introduce tolerance in the way queries are understood by the user. One such approach is to include fuzzy matches, but this introduces noise which may prevent the user from understanding the function of the search engine. In this paper, we propose a way to escape the topic trap by employing the navigation between related topics and the count of search results for each topic. This supports the user in that a search for close-by topics is a click away from a previous search. This approach was realized with the i2geo search engine and is described in detail, where the relation of being related is computed by employing textual analysis of the definitions of the concepts fetched from the Wikipedia encyclopedia. Comment: 12 pages, Conference on Intelligent Computer Mathematics 2013 Bath, U

arXiv.org e-Print Archive

Crossref