33,735 research outputs found
Search Bias Quantification: Investigating Political Bias in Social Media and Web Search
Users frequently use search systems on the Web as well as online social media to learn about ongoing events and public opinion on personalities. Prior studies have shown that the top-ranked results returned by these search engines can shape user opinion about the topic (e.g., event or person) being searched. In case of polarizing topics like politics, where multiple competing perspectives exist, the political bias in the top search results can play a significant role in shaping public opinion towards (or away from) certain perspectives. Given the considerable impact that search bias can have on the user, we propose a generalizable search bias quantification framework that not only measures the political bias in ranked list output by the search system but also decouples the bias introduced by the different sourcesâinput data and ranking system. We apply our framework to study the political bias in searches related to 2016 US Presidential primaries in Twitter social media search and find that both input data and ranking system matter in determining the final search output bias seen by the users. And finally, we use the framework to compare the relative bias for two popular search systemsâTwitter social media search and Google web searchâfor queries related to politicians and political events. We end by discussing some potential solutions to signal the bias in the search results to make the users more aware of them.publishe
Finding Academic Experts on a MultiSensor Approach using Shannon's Entropy
Expert finding is an information retrieval task concerned with the search for
the most knowledgeable people, in some topic, with basis on documents
describing peoples activities. The task involves taking a user query as input
and returning a list of people sorted by their level of expertise regarding the
user query. This paper introduces a novel approach for combining multiple
estimators of expertise based on a multisensor data fusion framework together
with the Dempster-Shafer theory of evidence and Shannon's entropy. More
specifically, we defined three sensors which detect heterogeneous information
derived from the textual contents, from the graph structure of the citation
patterns for the community of experts, and from profile information about the
academic experts. Given the evidences collected, each sensor may define
different candidates as experts and consequently do not agree in a final
ranking decision. To deal with these conflicts, we applied the Dempster-Shafer
theory of evidence combined with Shannon's Entropy formula to fuse this
information and come up with a more accurate and reliable final ranking list.
Experiments made over two datasets of academic publications from the Computer
Science domain attest for the adequacy of the proposed approach over the
traditional state of the art approaches. We also made experiments against
representative supervised state of the art algorithms. Results revealed that
the proposed method achieved a similar performance when compared to these
supervised techniques, confirming the capabilities of the proposed framework
Hybrid Search: Effectively Combining Keywords and Semantic Searches
This paper describes hybrid search, a search method supporting both document and knowledge retrieval via the flexible combination of ontologybased search and keyword-based matching. Hybrid search smoothly copes with
lack of semantic coverage of document content, which is one of the main limitations of current semantic search methods. In this paper we define hybrid search formally, discuss its compatibility with the current semantic trends and present a reference implementation: K-Search. We then show how the method outperforms both keyword-based search and pure semantic search in terms of precision and recall in a set of experiments performed on a collection of about 18.000 technical documents. Experiments carried out with professional users show that users understand the paradigm and consider it very powerful and reliable. K-Search has been ported to two applications released at Rolls-Royce
plc for searching technical documentation about jet engines
A probabilistic justification for using tf.idf term weighting in information retrieval
This paper presents a new probabilistic model of information retrieval. The most important modeling assumption made is that documents and queries are defined by an ordered sequence of single terms. This assumption is not made in well known existing models of information retrieval, but is essential in the field of statistical natural language processing. Advances already made in statistical natural language processing will be used in this paper to formulate a probabilistic justification for using tf.idf term weighting. The paper shows that the new probabilistic interpretation of tf.idf term weighting might lead to better understanding of statistical ranking mechanisms, for example by explaining how they relate to coordination level ranking. A pilot experiment on the TREC collection shows that the linguistically motivated weighting algorithm outperforms the popular BM25 weighting algorithm
Business Process Retrieval Based on Behavioral Semantics
This paper develops a framework for retrieving business processes considering search requirements based on behavioral semantics properties; it presents a framework called "BeMantics" for retrieving business processes based on structural, linguistics, and behavioral semantics properties. The relevance of the framework is evaluated retrieving business processes from a repository, and collecting a set of relevant business processes manually issued by human judges. The "BeMantics" framework scored high precision values (0.717) but low recall values (0.558), which implies that even when the framework avoided false negatives, it prone to false positives. The highest pre- cision value was scored in the linguistic criterion showing that using semantic inference in the tasks comparison allowed to reduce around 23.6 % the number of false positives. Using semantic inference to compare tasks of business processes can improve the precision; but if the ontologies are from narrow and specific domains, they limit the semantic expressiveness obtained with ontologies from more general domains. Regarding the perform- ance, it can be improved by using a filter phase which indexes business processes taking into account behavioral semantics propertie
Combination of content analysis and context features for digital photograph retrieval.
In recent years digital cameras have seen an enormous rise
in popularity, leading to a huge increase in the quantity of
digital photos being taken. This brings with it the challenge of organising these large collections. The MediAssist project uses date/time and GPS location for the
organisation of personal collections. However, this context
information is not always sufficient to support retrieval
when faced with a large, shared, archive made up of
photos from a number of users. We present work in this
paper which retrieves photos of known objects (buildings,
monuments) using both location information and content-based
retrieval tools from the AceToolbox. We show that
for this retrieval scenario, where a user is searching for
photos of a known building or monument in a large shared
collection, content-based techniques can offer a significant
improvement over ranking based on context (specifically
location) alone
- âŠ