    Language Models

    Contains fulltext: 227630.pdf (preprint version) (Open Access)

    Towards an Information Retrieval Theory of Everything

    I present three well-known probabilistic models of information retrieval in tutorial style: The binary independence probabilistic model, the language modeling approach, and Google's page rank. Although all three models are based on probability theory, they are very different in nature. Each model seems well-suited for solving certain information retrieval problems, but not so useful for solving others. So, essentially each model solves part of a bigger puzzle, and a unified view on these models might be a first step towards an Information Retrieval Theory of Everything

    Web Site Personalization based on Link Analysis and Navigational Patterns

    The continuous growth in the size and use of the World Wide Web imposes new methods of design and development of on-line information services. The need for predicting the users’ needs in order to improve the usability and user retention of a web site is more than evident and can be addressed by personalizing it. Recommendation algorithms aim at proposing “next” pages to users based on their current visit and the past users’ navigational patterns. In the vast majority of related algorithms, however, only the usage data are used to produce recommendations, disregarding the structural properties of the web graph. Thus important – in terms of PageRank authority score – pages may be underrated. In this work we present UPR, a PageRank-style algorithm which combines usage data and link analysis techniques for assigning probabilities to the web pages based on their importance in the web site’s navigational graph. We propose the application of a localized version of UPR (l-UPR) to personalized navigational sub-graphs for online web page ranking and recommendation. Moreover, we propose a hybrid probabilistic predictive model based on Markov models and link analysis for assigning prior probabilities in a hybrid probabilistic model. We prove, through experimentation, that this approach results in more objective and representative predictions than the ones produced from the pure usage-based approaches
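
    The idea of biasing a PageRank-style score with usage data can be illustrated with a minimal sketch. This is not the UPR algorithm from the abstract above, whose exact formulation is not given here; it is plain PageRank power iteration in which each link's transition probability is proportional to an assumed usage count. The function name `weighted_pagerank`, the input format, and the sample click counts are all hypothetical.

```python
def weighted_pagerank(edge_weights, damping=0.85, iterations=100):
    """Power iteration on a graph whose edges carry usage counts.

    edge_weights maps (source, target) -> non-negative weight, for example
    the number of sessions that followed that link (assumed input format).
    """
    nodes = sorted({page for edge in edge_weights for page in edge})
    n = len(nodes)

    # Total outgoing weight per page, used to normalise transitions.
    out_total = {page: 0.0 for page in nodes}
    for (src, _dst), weight in edge_weights.items():
        out_total[src] += weight

    rank = {page: 1.0 / n for page in nodes}
    for _ in range(iterations):
        # Rank held by pages with no outgoing traffic is spread uniformly.
        dangling = sum(rank[p] for p in nodes if out_total[p] == 0.0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {page: base for page in nodes}
        for (src, dst), weight in edge_weights.items():
            if out_total[src] > 0.0:
                new_rank[dst] += damping * rank[src] * weight / out_total[src]
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical click counts between three pages of a small site.
    clicks = {
        ("home", "products"): 8,
        ("home", "about"): 2,
        ("products", "home"): 5,
        ("about", "home"): 1,
    }
    scores = weighted_pagerank(clicks)
    for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")
```

    With uniform weights this reduces to ordinary PageRank; heavier weights on frequently followed links shift score toward the pages that users actually visit.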

    Base calling for high-throughput short-read sequencing: dynamic programming solutions

    Shreepriya Das and Haris Vikalo are with the Electrical and Computer Engineering Department, The University of Texas at Austin, Austin, Texas 78712, USA. Background: Next-generation DNA sequencing platforms are capable of generating millions of reads in a matter of days at rapidly falling costs. Despite its proliferation and technological improvements, the performance of next-generation sequencing remains adversely affected by imperfections in the underlying biochemical and signal acquisition procedures. To this end, various techniques, including statistical methods, are used to improve the read lengths and accuracy of these systems. Development of high-performing base calling algorithms that are computationally efficient and scalable is an ongoing challenge. Results: We develop model-based statistical methods for fast and accurate base calling in Illumina’s next-generation sequencing platforms. In particular, we propose a computationally tractable parametric model which enables a dynamic programming formulation of the base calling problem. Forward-backward and soft-output Viterbi algorithms are developed, and their performance and complexity are investigated and compared with the existing state-of-the-art base calling methods for this platform. A C code implementation of our algorithm named Softy can be downloaded from https://sourceforge.net/projects/dynamicprog. Conclusions: We demonstrate high accuracy and speed of the proposed methods on reads obtained using Illumina’s Genome Analyzer II and HiSeq2000. In addition to performing reliable and fast base calling, the developed algorithms enable incorporation of prior knowledge which can be utilized for parameter estimation and is potentially beneficial in various downstream applications.
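
    As a rough illustration of the forward-backward recursion named in the abstract above, the following sketch computes per-position posterior state probabilities for a generic discrete hidden Markov model. It is not the Softy implementation and does not use the authors' parametric model of sequencing signals; the function name, the argument layout, and the toy probabilities are assumptions made for the example.

```python
def forward_backward(init, trans, emit, observations):
    """Posterior state probabilities for a discrete HMM.

    init[i]      : P(state at t=0 is i)
    trans[i][j]  : P(next state is j | current state is i)
    emit[i][o]   : P(observation o | state i)
    Returns one probability vector (list) per time step.
    Assumes every observation has non-zero probability under the model.
    """
    n_states = len(init)
    n_steps = len(observations)

    # Forward pass, normalised at each step to avoid numerical underflow.
    alpha = []
    prev = None
    for t, obs in enumerate(observations):
        if t == 0:
            row = [init[i] * emit[i][obs] for i in range(n_states)]
        else:
            row = [
                emit[j][obs] * sum(prev[i] * trans[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        scale = sum(row)
        prev = [value / scale for value in row]
        alpha.append(prev)

    # Backward pass, also normalised per step.
    beta = [[1.0] * n_states for _ in range(n_steps)]
    for t in range(n_steps - 2, -1, -1):
        nxt = observations[t + 1]
        row = [
            sum(trans[i][j] * emit[j][nxt] * beta[t + 1][j] for j in range(n_states))
            for i in range(n_states)
        ]
        scale = sum(row)
        beta[t] = [value / scale for value in row]

    # Combine both passes and renormalise to get posteriors.
    posteriors = []
    for t in range(n_steps):
        row = [alpha[t][i] * beta[t][i] for i in range(n_states)]
        total = sum(row)
        posteriors.append([value / total for value in row])
    return posteriors


if __name__ == "__main__":
    # Toy two-state model with two observation symbols (made-up numbers).
    init = [0.5, 0.5]
    trans = [[0.7, 0.3], [0.4, 0.6]]
    emit = [[0.9, 0.1], [0.2, 0.8]]
    for step, probs in enumerate(forward_backward(init, trans, emit, [0, 0, 1])):
        print(step, [round(p, 3) for p in probs])
```

    Posteriors of this kind are the sort of soft output a downstream step could use as a per-base confidence, which is the general motivation for soft decoding over a single hard path.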

    University of Twente at the TREC 2008 Enterprise Track: using the Global Web as an expertise evidence source

    This paper describes the details of our participation in the expert search task of the TREC 2007 Enterprise track. This is the fourth (and the last) year of TREC 2007 Enterprise Track and the second year the University of Twente (Database group) submitted runs for the expert finding task. In the methods that were used to produce these runs, we mostly rely on the predictive potential of those expertise evidence sources that are publicly available on the Global Web, but not hosted at the website of the organization under study (CSIRO). This paper describes the follow-up studies complementary to our recent research [8] that demonstrated how taking the web factor seriously significantly improves the performance of expert finding in the enterprise.

    Using the Global Web as an Expertise Evidence Source

    This paper describes the details of our participation in the expert search task of the TREC 2007 Enterprise track. The presented study demonstrates the predictive potential of the expertise evidence that can be found outside of the organization. We discovered that combining the ranking built solely on the Enterprise data with the Global Web based ranking may produce significant increases in performance. However, our main goal was to explore whether this result can be further improved by using various quality measures to distinguish among web result items. While it was indeed beneficial to use some of these measures, especially those measuring the relevance of URL strings and titles, it remained unclear whether they are decisively important.

    Effects of in-class variation and student rank on the probability of withdrawal : cross-section and time-series analysis for UK university students

    From individual-level data for nine entire cohorts of undergraduate students in UK universities, we estimate the probability that an individual will drop out of university during their first year. We examine the 1984-85 to 1992-93 cohorts of students enrolling full-time for a three- or four-year course, and focus on the sensitivity of the probability of withdrawal to the individual’s prior qualifications relative to those of the other students in their university course. We show not only that weaker students are more likely to withdraw but also that the extent of variation in prior qualifications within the student’s university degree course influences the individual’s probability of withdrawal in a way that varies with the individual’s own in-class rank.

    Individual Incentives in Program Participation: Splitting up the Process in Assignment and Enrollment

    In this paper we investigate two stages in the process that leads to participation in ALMP programs. We use unique administrative data from the Austrian unemployment registers which allow us to distinguish between caseworker assignment and actual program enrollment. Although 25% of newly unemployed workers are assigned to a program, only half of them enroll and participate in the program longer than 5 days. This difference between assignment and enrollment rates cannot be explained by job entries, program cancelations, or rejected program applications alone. Therefore we analyze the influence of observable characteristics on each stage of the participation process. We find that, besides policy regulations, individual worker incentives play an important role in determining program participation. Keywords: unemployment, active labor market policy, evaluation