
    Local Ranking Problem on the BrowseGraph

    The "Local Ranking Problem" (LRP) concerns the computation of a centrality-like rank on a local graph, where the scores of the nodes can differ significantly from those computed on the global graph. Previous work has studied the LRP on the hyperlink graph but never on the BrowseGraph, namely a graph where nodes are webpages and edges are browsing transitions. Recently, this graph has received increasing attention in many different tasks such as ranking, prediction and recommendation. However, a web server sees only the browsing traffic performed on its own pages (the local BrowseGraph) and, as a consequence, the local computation can lead to estimation errors, which hinder the growing number of applications in the state of the art. Also, although the divergence between the local and global ranks has been measured, the possibility of estimating such divergence using only local knowledge has been largely overlooked. These aspects are of great interest for online service providers who want to: (i) gauge their ability to correctly assess the importance of their resources based only on their local knowledge, and (ii) take into account real user browsing fluxes, which capture actual user interest better than the static hyperlink network. We study the LRP on a BrowseGraph from a large news provider, considering as subgraphs the aggregations of browsing traces of users coming from different domains. We show that the distance between rankings can be accurately predicted based only on structural information of the local graph, achieving an average rank correlation as high as 0.8.
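
    The gap between local and global ranks described in this abstract can be illustrated with a small sketch. It is not the authors' method: it assumes weighted PageRank as the centrality measure, Spearman correlation as the rank comparison, and a local graph defined as the subgraph induced by a set of pages. The edge list, page names and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (source page, target page, number of observed browsing transitions)
# Weights are assumed to be strictly positive.
Edge = Tuple[str, str, float]


def pagerank(edges: List[Edge], damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Weighted PageRank by power iteration; dangling mass is spread uniformly."""
    nodes = sorted({n for s, t, _ in edges for n in (s, t)})
    if not nodes:
        return {}
    n = len(nodes)
    out_weight = {v: 0.0 for v in nodes}
    for s, _, w in edges:
        out_weight[s] += w
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for s, t, w in edges:
            new_rank[t] += damping * rank[s] * w / out_weight[s]
        rank = new_rank
    return rank


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks of the values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least two items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0.0 or sy == 0.0:
        return 0.0  # one ranking is constant, so the correlation is undefined
    return cov / (sx * sy)


def local_global_correlation(edges: List[Edge], local_pages: Set[str]) -> float:
    """Compare PageRank on the induced local subgraph with global PageRank."""
    local_edges = [(s, t, w) for s, t, w in edges
                   if s in local_pages and t in local_pages]
    global_rank = pagerank(edges)
    local_rank = pagerank(local_edges)
    shared = sorted(local_rank)
    return spearman([global_rank[p] for p in shared],
                    [local_rank[p] for p in shared])


# Made-up toy BrowseGraph; pages "x" and "y" are invisible to the local server.
GLOBAL_EDGES: List[Edge] = [
    ("a", "b", 5), ("b", "c", 3), ("c", "a", 2), ("a", "d", 1),
    ("d", "x", 4), ("x", "a", 6), ("y", "c", 7), ("x", "y", 2),
]

if __name__ == "__main__":
    rho = local_global_correlation(GLOBAL_EDGES, {"a", "b", "c", "d"})
    print(f"Spearman correlation, local vs. global PageRank: {rho:.3f}")
```

    A correlation near 1 would mean that the local server orders its pages much as the global graph does. The abstract's contribution is predicting this distance from local structure alone, which the sketch does not attempt.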

    Global Contagion of Non-Viral Information

    Contagion in Online Social Networks (OSN) is typically measured by the tendency of users to re-post information or to adopt a new behavior after exposure to that information or behavior. Most contagion research is limited to modeling (i) only local neighbor-to-neighbor contagion and (ii) the spread of viral information. However, most contagion events are non-viral and can also occur globally between non-neighbors through, for example, exposure to information by exploratory browsing or by content recommendation algorithms. This study is the first to address the phenomenon of both global and local contagion of non-viral information in a quantitative way. Analysis of Twitter networks reveals the prevailing nature of global contagion, the different temporal patterns of global and local contagion, and the ways in which contagion varies across topical categories. An interesting finding is that users who retweeted due to global contagion have more followers than those who retweeted due to local contagion.

    All in for Privacy: Cultivating a Community of Information Privacy Awareness

    The Library Freedom Project supports librarianship’s values of freedom of information and privacy by providing relevant tools and education to LIS professionals. A group from the Faculty of Information and Media Studies at Western aligned with the project to encourage student participation in local and global privacy issues. Our programming encourages hands-on use of open source and anti-surveillance software, such as Tor Browser for anonymous web browsing. In addition, we detail how we configured our Tor relay to route anonymous encrypted global traffic, so that other libraries can join the 280 relays currently running in Canada and 7,000 worldwide.

    A Semantic Graph-Based Approach for Mining Common Topics From Multiple Asynchronous Text Streams

    In the age of Web 2.0, a substantial amount of unstructured content is distributed through multiple text streams in an asynchronous fashion, which makes it increasingly difficult to glean and distill useful information. An effective way to explore the information in text streams is topic modelling, which can further facilitate other applications such as search, information browsing, and pattern mining. In this paper, we propose a semantic graph based topic modelling approach for structuring asynchronous text streams. Our model integrates topic mining and time synchronization, two core modules for addressing the problem, into a unified model. Specifically, to handle lexical gap issues, we use global semantic graphs of each timestamp to capture the hidden interaction among entities from all the text streams. To deal with the asynchronism of the sources, local semantic graphs are employed to discover similar topics of different entities that can be potentially separated by time gaps. Our experiment on two real-world datasets shows that the proposed model significantly outperforms the existing ones.

    Context-based conceptual image indexing

    Automatic semantic classification of image databases is very useful for users searching and browsing, but it is also a very challenging research problem. Image classification based on local features is one of the key issues in bridging the semantic gap in order to detect concepts. This paper proposes a framework for incorporating contextual information into the concept detection process. The proposed method combines local and global classifiers with stacking, using SVM. We studied the impact of topologic and semantic contexts on concept detection performance and proposed solutions to handle the large number of dimensions involved in the classified data. We conducted experiments on a TRECVID'04 subset with 48,104 images and 5 concepts. We found that the use of context yields a significant improvement for both the topologic and semantic contexts.

    Contextualised Browsing in a Digital Library's Living Lab

    Contextualisation has proven to be effective in tailoring search results towards the users' information need. While this is true for a basic query search, the usage of contextual session information during exploratory search, especially on the level of browsing, has so far been underexposed in research. In this paper, we present two approaches that contextualise browsing on the level of structured metadata in a Digital Library (DL): (1) one variant is based on document similarity and (2) one variant utilises implicit session information, such as queries and different document metadata encountered during the session of a user. We evaluate our approaches in a living lab environment using a DL in the social sciences and compare our contextualisation approaches against a non-contextualised approach. For a period of more than three months we analysed 47,444 unique retrieval sessions that contain search activities on the level of browsing. Our results show that a contextualisation of browsing significantly outperforms our baseline in terms of the position of the first clicked item in the result set. The mean rank of the first clicked document (measured as mean first relevant, MFR) was 4.52 using a non-contextualised ranking compared to 3.04 when re-ranking the result lists based on similarity to the previously viewed document. Furthermore, we observed that both contextual approaches show a noticeably higher click-through rate. A contextualisation based on document similarity leads to almost twice as many document views compared to the non-contextualised ranking. Comment: 10 pages, 2 figures, paper accepted at JCDL 201
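
    The MFR figure quoted in this abstract, the mean rank of the first clicked document, can be computed along the following lines. This is a minimal sketch, not the authors' evaluation code: it assumes each session is a chronological list of 1-based result positions that were clicked, and that sessions without clicks are skipped. The function name and the sample sessions are hypothetical.

```python
from typing import List, Sequence


def mean_first_relevant(sessions: Sequence[Sequence[int]]) -> float:
    """Mean 1-based rank of the first clicked result over sessions with a click."""
    # The first entry of each session is the position clicked first in time.
    first_clicks: List[int] = [clicks[0] for clicks in sessions if clicks]
    if not first_clicks:
        raise ValueError("no session contains a click")
    return sum(first_clicks) / len(first_clicks)


if __name__ == "__main__":
    # Made-up sessions: the third one has no click and is ignored.
    sample = [[3, 1], [5], [], [2, 4]]
    print(mean_first_relevant(sample))  # (3 + 5 + 2) / 3
```

    A lower value means users found something worth clicking nearer the top of the list, which is how the 3.04 versus 4.52 comparison in the abstract should be read.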

    Binary Particle Swarm Optimization based Biclustering of Web usage Data

    Web mining is the nontrivial process of discovering valid, novel, potentially useful knowledge from web data using data mining techniques. It can yield information that is useful for improving the services offered by web portals and by information access and retrieval tools. With the rapid development of biclustering, more researchers have applied the technique to different fields in recent years. When a biclustering approach is applied to web usage data, it automatically captures the hidden browsing patterns in the form of biclusters. In this work, a swarm intelligence technique is combined with a biclustering approach to propose an algorithm called Binary Particle Swarm Optimization (BPSO) based Biclustering for Web Usage Data. The main objective of this algorithm is to retrieve the globally optimal bicluster from the web usage data. These biclusters contain relationships between web users and web pages which are useful for E-Commerce applications such as web advertising and marketing. Experiments are conducted on a real dataset to demonstrate the efficiency of the proposed algorithm.

    Online advertising: analysis of privacy threats and protection approaches

    Online advertising, the pillar of the “free” content on the Web, has revolutionized the marketing business in recent years by creating a myriad of new opportunities for advertisers to reach potential customers. The current advertising model builds upon an intricate infrastructure composed of a variety of intermediary entities and technologies whose main aim is to deliver personalized ads. For this purpose, a wealth of user data is collected, aggregated, processed and traded behind the scenes at an unprecedented rate. Despite the enormous value of online advertising, however, the intrusiveness and ubiquity of these practices prompt serious privacy concerns. This article surveys the online advertising infrastructure and its supporting technologies, and presents a thorough overview of the underlying privacy risks and the solutions that may mitigate them. We first analyze the threats and potential privacy attackers in this scenario of online advertising. In particular, we examine the main components of the advertising infrastructure in terms of tracking capabilities, data collection, aggregation level and privacy risk, and overview the tracking and data-sharing technologies employed by these components. Then, we conduct a comprehensive survey of the most relevant privacy mechanisms, and classify and compare them on the basis of their privacy guarantees and impact on the Web. Peer reviewed. Postprint (author's final draft)

    Digital library access for illiterate users

    The problems that illiteracy poses in accessing information are gaining attention from the research community. Issues currently being explored include developing an understanding of the barriers to information acquisition experienced by different groups of illiterate information seekers; creating technology, such as software interfaces, that supports illiterate users effectively; and tailoring content to increase its accessibility. We have taken a formative evaluation approach to developing and evaluating a digital library interface for illiterate users. We discuss modifications to the Greenstone platform, describe user studies and outline the resulting design implications.