400,639 research outputs found

    Multilingual Word Sense Induction to Improve Web Search Result Clustering

    In [12], a novel approach to Web search result clustering based on Word Sense Induction (WSI), i.e. the automatic discovery of word senses from raw text, was presented; key to the proposed approach is the idea of, first, automatically inducing senses for the target query and, second, clustering the search results based on their semantic similarity to the induced word senses. In [1] we proposed an innovative Word Sense Induction method based on multilingual data; key to our approach was the idea that a multilingual context representation, where the context of the words is expanded by considering its translations in different languages, may improve the WSI results; the experiments showed a clear performance gain. In this paper we give some preliminary ideas for applying our multilingual Word Sense Induction method to Web search result clustering.
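
    The second step described in this abstract, clustering results by their similarity to induced senses, can be illustrated with a minimal sketch. It assumes each induced sense is already available as a set of words and uses Jaccard overlap as the similarity measure; both choices, and the names `jaccard`, `cluster_by_sense` and `UNASSIGNED`, are illustrative assumptions and not the method of [12] or [1].

```python
UNASSIGNED = "(unassigned)"  # hypothetical label for results matching no sense


def jaccard(a, b):
    """Jaccard overlap between two sets of words (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_by_sense(snippets, senses):
    """Assign each snippet to the induced sense it overlaps with most.

    `senses` maps a sense label to a set of words; this representation is
    an assumption made for the sketch, not taken from the paper.
    """
    clusters = {name: [] for name in senses}
    clusters[UNASSIGNED] = []
    for snippet in snippets:
        words = set(snippet.lower().split())
        best_name, best_score = UNASSIGNED, 0.0
        for name, sense_words in senses.items():
            score = jaccard(words, sense_words)
            if score > best_score:
                best_name, best_score = name, score
        clusters[best_name].append(snippet)
    return clusters


if __name__ == "__main__":
    senses = {
        "animal": {"cat", "wild", "habitat"},
        "car": {"car", "dealer", "engine"},
    }
    snippets = ["jaguar car dealer london", "jaguar wild cat", "jaguar os release"]
    for label, members in cluster_by_sense(snippets, senses).items():
        print(label, members)
```

    A result that shares no word with any sense stays in the unassigned group rather than being forced into a cluster.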

    The representation of voluntourism in search engines: The case of South Africa

    © 2015 Government Technical Advisory Centre (GTAC). This paper responds to the paucity of research on the linkages between voluntourism and digital technology and seeks to understand the online representation of the phenomenon in a developing context. In particular, the researchers investigate the so-called ‘online domain’ of voluntourism in South Africa. The researchers collected a series of web results from search engines and analysed the presence of traditional and social media websites, the most relevant presented topics, and the type of argumentation found. Results identify the context and representation of voluntourism as it transpires virtually. This will contribute to the understanding of the interplay between voluntourism and digital technology, with specific emphasis on web presence. Ultimately, results will shed light on how digitally accessible voluntourism is in South Africa and will set the basis for future investigations.

    Inferring user intent in web search by exploiting social annotations

    This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, http://dx.doi.org/10.1145/1835449.1835636. In this paper, we present a folksonomy-based approach for implicit user intent extraction during a Web search process. We present a number of result re-ranking techniques based on this representation that can be applied to any Web search engine. We perform a user experiment, the results of which indicate that this type of representation is better at context extraction than using the actual textual content of the document. This research was partially supported by the Spanish Ministry of Science and Education (TIN2008-06566-C04-02) and the Regional Government of Madrid (S2009TIC-1542).

    Auditing the representation of migrants in image web search results

    Search engines serve as information gatekeepers on a multitude of topics that are prone to gender, ethnicity, and race misrepresentations. In this paper, we specifically look at the image search representation of migrant population groups that are often subjected to discrimination and biased representation in mainstream media, increasingly so with the rise of right-wing populist actors in the Western countries. Using multiple (n = 200) virtual agents to simulate human browsing behavior in a controlled environment, we collect image search results related to various terms referring to migrants (e.g., expats, immigrants, and refugees, seven queries in English and German used in total) from the six most popular search engines. Then, with the aid of manual coding, we investigate which features are used to represent these groups and whether the representations are subjected to bias. Our findings indicate that search engines reproduce ethnic and gender biases common for mainstream media representations of different subgroups of migrant population. For instance, migrant representations tend to be highly racialized, and female migrants as well as migrants at work tend to be underrepresented in the results. Our findings highlight the need for further algorithmic impact auditing studies in the context of representation of potentially vulnerable groups in web search results

    Faceted Search of Heterogeneous Geographic Information for Dynamic Map Projection

    This paper proposes a faceted information exploration model that supports coarse-grained and fine-grained focusing of geographic maps by offering a graphical representation of data attributes within interactive widgets. The proposed approach enables (i) a multi-category projection of long-lasting geographic maps, based on the proposal of efficient facets for data exploration in sparse and noisy datasets, and (ii) an interactive representation of the search context based on widgets that support data visualization, faceted exploration, category-based information hiding and transparency of results at the same time. The integration of our model with a semantic representation of geographical knowledge supports the exploration of information retrieved from heterogeneous data sources, such as Public Open Data and OpenStreetMap. We evaluated our model with users in the OnToMap collaborative Web GIS. The experimental results show that, when working on geographic maps populated with multiple data categories, it outperforms simple category-based map projection and traditional faceted search tools, such as checkboxes, in both user performance and experience

    SMAPH: A Piggyback Approach for Entity-Linking in Web Queries

    We study the problem of linking the terms of a web-search query to a semantic representation given by the set of entities (a.k.a. concepts) mentioned in it. We introduce SMAPH, a system that performs this task using the information coming from a web search engine, an approach we call “piggybacking.” We employ search engines to alleviate the noise and irregularities that characterize the language of queries. Snippets returned as search results also provide a context for the query that makes it easier to disambiguate the meaning of the query. From the search results, SMAPH builds a set of candidate entities with high coverage. This set is filtered by linking back the candidate entities to the terms occurring in the input query, ensuring high precision. A greedy disambiguation algorithm performs this filtering; it maximizes the coherence of the solution by iteratively discovering the pertinent entities mentioned in the query. We propose three versions of SMAPH that outperform state-of-the-art solutions on the known benchmarks and on the GERDAQ dataset, a novel dataset that we have built specifically for this problem via crowd-sourcing and that we make publicly available.

    Deriving implicit user feedback from partial URLs for effective web page retrieval

    User click-throughs provide a search context for understanding complex user information needs. This paper re-examines the effectiveness of this approach when it is based on partial click data, using the language modeling framework. We expand the original query with topical terms derived from clicked Web pages and enhance early precision via a more compact document representation. Since the URLs of our Web pages are stripped, we first reconstruct them at different levels based on different collections. Our experimental results on the GOV2 test collection and the AOL query log show statistically significant improvements in statMAP of 31.7% and 28.3% for two sources of reconstruction and 153 ad-hoc queries. Our model also outperforms pseudo relevance feedback.
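
    The query expansion step mentioned in this abstract can be sketched as follows. This is only an illustration: it ranks candidate terms by raw frequency in the clicked pages, whereas the paper works within a language modeling framework, and the function name `expand_query`, the parameter `k` and the small `STOPWORDS` list are all assumptions introduced here.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "in", "to", "for", "is", "on"})


def expand_query(query, clicked_pages, k=3):
    """Append the k most frequent new terms from clicked pages to the query.

    Raw term frequency stands in for the paper's language-model estimate.
    """
    query_terms = query.lower().split()
    seen = set(query_terms)
    counts = Counter(
        term
        for page in clicked_pages
        for term in page.lower().split()
        if term not in STOPWORDS and term not in seen
    )
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return query_terms + [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    pages = ["federal tax forms for income tax", "income tax filing forms"]
    print(expand_query("tax forms", pages, k=2))
```

    Terms already present in the query are skipped, so the expansion only adds vocabulary the user did not type.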

    Intelligent personalized approaches for semantic search and query expansion

    University of Technology Sydney. Faculty of Engineering and Information Technology. In today’s highly advanced technological world, the Internet has taken over all aspects of human life. Many services are advertised and provided to users through online channels. Users look for services and obtain them through different search engines. To obtain the best results that meet users' needs and requirements, researchers have extensively studied methods, such as personalization, for improving the performance and efficiency of the retrieval process. A key part of the personalization process is the generation of user models. The most commonly used user models are still rather simplistic, representing the user as a vector of ratings or as a set of keywords. Recently, semantic techniques have gained significant importance in the field of personalized querying and personalized web search engines. This thesis focuses on both processes of personalized web search engines: first, the reformulation of queries and, second, the ranking of query results. The importance of personalized web search lies in its ability to identify users' interests based on their personal profiles. This work contributes to personalized web search services in three aspects. These contributions can be summarized as follows. First, it creates user profiles based on a user’s browsing behaviour, as well as the semantic knowledge of a domain ontology, aiming to improve the quality of the search results. However, it is not easy to acquire personalized web search results; hence one of the problems encountered in this approach is how to obtain a precise representation of the user's interests, as well as how to use it to find search results. The second contribution builds on the first. A personalized web search approach is introduced by integrating user context history into the information retrieval process. This integration process aims to provide search results that meet the user’s needs. It also aims to create contextual profiles for the user based on several basic factors: user temporal behaviour during browsing, semantic knowledge of a specific domain ontology, and an algorithm for re-ranking the search results. The previous solutions were related to the re-ranking of the returned search results to match the user’s requirements. The third contribution includes a comparison of three term-weighting methods in personalized query expansion. This model has been built to incorporate both latent semantics and weighting terms. Experiments conducted in the real world to evaluate the proposed personalized web search approach show promising results in the quality of the reformulation and re-ranking processes compared to Google engine techniques.

    A client side tool for contextual Web search

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Civil and Environmental Engineering, 2004. Includes bibliographical references (p. 76-77). This thesis describes the design and development of an application that uses information relevant to the context of a web search for the purpose of improving the search results obtained using standard search engines. The representation of the contextual information is based on a Vector Space Model and is obtained from a set of documents that have been identified as relevant to the context of the search. Two algorithms have been developed for using this contextual representation to re-rank the search results obtained using search engines. In the first algorithm, re-ranking is done based on a comparison of every search result with all the contextual documents. In the second algorithm, only a subset of the contextual documents that relate to the search query is used to measure the relevance of the search results. This subset is identified by mapping the search query onto the Vector Space representation of the contextual documents. A software application was developed using the .NET framework with C# as the implementation language. The software has functionality to enable users to identify contextual documents and perform searches either using a standard search engine or using the above-mentioned algorithms. The software implementation details and preliminary results regarding the efficiency of the proposed algorithms are presented. By Hariharan Lakshmanan. S.M.
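
    The first re-ranking algorithm described in this abstract, comparing every search result with all the contextual documents, might look roughly like the sketch below. It is written in Python rather than the thesis's C#, uses plain term-frequency vectors, and averages the cosine similarities; the weighting scheme, the averaging and the names `tf_vector`, `cosine` and `rerank` are assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter


def tf_vector(text):
    """Lowercased bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(results, context_docs):
    """Order results by mean cosine similarity to all contextual documents.

    Averaging over the contextual documents is an assumption of this sketch.
    """
    if not context_docs:
        return list(results)  # nothing to compare against; keep original order
    context_vecs = [tf_vector(doc) for doc in context_docs]

    def score(result):
        vec = tf_vector(result)
        return sum(cosine(vec, c) for c in context_vecs) / len(context_vecs)

    return sorted(results, key=score, reverse=True)


if __name__ == "__main__":
    results = ["jaguar car dealer", "jaguar big cat habitat"]
    context = ["big cat wildlife habitat", "cat species in the wild"]
    print(rerank(results, context))
```

    The second algorithm in the abstract would differ only in restricting the contextual documents to the subset most similar to the query before scoring.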