
    Mining and tracking evolving web user trends from very large web server logs.

    Online organizations are always in search of innovative marketing strategies to better satisfy their current website users and attract new ones. Many organizations have therefore started to retain all transactions taking place on their websites and to use this information to better understand and satisfy their users. However, because of the huge amount of transaction data, traditional methods are neither feasible nor cost-effective, so effective and automated methods for handling these transactions have become imperative. Web Usage Mining is the process of applying data mining techniques to web log data (transactions) to extract the most interesting usage patterns. The usage patterns are stored as profiles (sets of URLs) that can be used in higher-level applications, e.g. a recommendation system, to meet the company's business goals. Much research has been conducted on Web Usage Mining; however, little has been done to handle the dynamic nature of web content, the spontaneously changing behavior of users, and the need for scalability in the face of large amounts of data. This thesis proposes a framework that helps capture the changing nature of user behavior on a website. The framework is designed to be applied periodically to incoming web transactions: new usage data that is similar to existing profiles is used to update those profiles, while distinct transactions are subjected to a new pattern discovery process. The result of this framework is a set of evolving profiles that represent the usage behavior at any given period of time. These profiles can later be used in higher-level applications, for instance to predict the evolving user's interest as part of an intelligent web personalization framework.
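
    The periodic update step described in this abstract can be illustrated with a minimal sketch. It is not the thesis's actual method: the abstract does not name a similarity measure, a threshold, or an update rule, so the Jaccard similarity, the 0.5 threshold, the set-union update, and the names `jaccard` and `update_profiles` are all assumptions chosen for illustration.

```python
from typing import List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two URL sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def update_profiles(
    profiles: List[Set[str]],
    transactions: List[Set[str]],
    threshold: float = 0.5,  # assumed cutoff, not taken from the source
) -> Tuple[List[Set[str]], List[Set[str]]]:
    """Route each new transaction to its most similar profile or set it aside.

    Returns (updated_profiles, distinct_transactions). The distinct
    transactions are the ones that would be handed to a separate
    pattern discovery step, which is not implemented here.
    """
    updated = [set(p) for p in profiles]  # copy so the inputs stay untouched
    distinct: List[Set[str]] = []
    for t in transactions:
        scores = [jaccard(t, p) for p in updated]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            # Assumed update rule: merge the transaction's URLs into the profile.
            updated[best] |= t
        else:
            distinct.append(t)
    return updated, distinct


if __name__ == "__main__":
    old_profiles = [{"/home", "/products"}, {"/blog", "/about"}]
    new_transactions = [{"/home", "/products", "/cart"}, {"/careers"}]
    profiles, leftover = update_profiles(old_profiles, new_transactions)
    print(profiles)
    print(leftover)
```

    In this sketch the first transaction overlaps an existing profile strongly enough to be merged into it, while the second shares no URLs with any profile and is set aside for new pattern discovery.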

    Soft peer review: social software and distributed scientific evaluation

    The debate on the prospects of peer review in the Internet age and the increasing criticism leveled against the dominant role of impact factor indicators are calling for new measurable criteria to assess scientific quality. Usage-based metrics offer a new avenue to scientific quality assessment but face the same risks as first-generation search engines that used unreliable metrics (such as raw traffic data) to estimate content quality. In this article I analyze the contribution that social bookmarking systems can provide to the problem of usage-based metrics for scientific evaluation. I suggest that collaboratively aggregated metadata may help fill the gap between traditional citation-based criteria and raw usage factors. I submit that bottom-up, distributed evaluation models such as those afforded by social bookmarking will challenge more traditional quality assessment models in terms of coverage, efficiency and scalability. Services aggregating user-related quality indicators for online scientific content will come to occupy a key function in the scholarly communication system.

    I Know Where You are and What You are Sharing: Exploiting P2P Communications to Invade Users' Privacy

    In this paper, we show how to exploit real-time communication applications to determine the IP address of a targeted user. We focus our study on Skype, although other real-time communication applications may have similar privacy issues. We first design a scheme that calls an identified targeted user inconspicuously to find his IP address, which can be done even if he is behind a NAT. By calling the user periodically, we can then observe the mobility of the user. We show how to scale the scheme to observe the mobility patterns of tens of thousands of users. We also consider the linkability threat, in which the identified user is linked to his Internet usage. We illustrate this threat by combining Skype and BitTorrent to show that it is possible to determine the file-sharing usage of identified users. We devise a scheme based on the identification field of the IP datagrams to verify with high accuracy whether the identified user is participating in specific torrents. We conclude that any Internet user can leverage Skype, and potentially other real-time communication systems, to observe the mobility and file-sharing usage of tens of millions of identified users. Comment: This is the authors' version of the ACM/USENIX Internet Measurement Conference (IMC) 2011 paper.

    The contribution of data mining to information science

    The information explosion is a serious challenge for current information institutions. Data mining, the search for valuable information in large volumes of data, is one of the solutions to this challenge. In the past several years, data mining has made a significant contribution to the field of information science. This paper examines the impact of data mining by reviewing existing applications, including personalized environments, electronic commerce, and search engines. For each of these three types of application, we discuss how data mining can enhance its functions. The reader of this paper is expected to get an overview of the state-of-the-art research associated with these applications. Furthermore, we identify the limitations of current work and raise several directions for future research.

    Second language learning in the context of MOOCs

    Massive Open Online Courses are becoming popular educational vehicles through which universities reach out to non-traditional audiences. Many enrolees hail from other countries and cultures, and struggle to cope with the English language in which these courses are invariably offered. Moreover, most such learners have a strong desire and motivation to extend their knowledge of academic English, particularly in the specific area addressed by the course. Online courses provide a compelling opportunity for domain-specific language learning. They supply a large corpus of interesting linguistic material relevant to a particular area, including supplementary images (slides), audio and video. We contend that this corpus can be automatically analysed, enriched, and transformed into a resource that learners can browse and query in order to extend their ability to understand the language used, and help them express themselves more fluently and eloquently in that domain. To illustrate this idea, an existing online corpus-based language learning tool (FLAX) is applied to a Coursera MOOC entitled Virology 1: How Viruses Work, offered by Columbia University.