
    Using Parsimonious Language Models on Web Data

    In this paper we explore the use of parsimonious language models for web retrieval. These models are smaller, and thus more efficient, than standard language models, which makes them well suited to large-scale web retrieval. We conducted experiments on four TREC topic sets and found that the parsimonious language model improves retrieval effectiveness over the standard language model on all data sets and measures. In all cases the improvement is significant, and it is more substantial than in earlier experiments on newspaper/newswire data.
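
    To show why such models end up smaller, here is a minimal sketch of the EM procedure commonly used to estimate a parsimonious document model. The document is treated as a mixture of a document model and a fixed collection model. Terms that the collection already explains well lose probability mass, and terms that fall below a threshold are pruned. This is an illustration, not the paper's implementation: the function name, the mixture weight `lam`, the pruning `threshold`, the iteration count and any toy counts are assumptions.

```python
def parsimonious_lm(doc_tf, coll_prob, lam=0.1, threshold=1e-4, iters=50):
    """Estimate a parsimonious document model P(t|D) with EM (sketch).

    doc_tf    -- term -> raw count of the term in the document
    coll_prob -- term -> P(t|C), the fixed background (collection) model
    lam       -- mixture weight of the document model (assumed value)
    threshold -- terms whose P(t|D) drops below this are pruned (assumed value)
    """
    total = sum(doc_tf.values())
    # Start from the maximum-likelihood estimate tf(t, D) / |D|.
    p_doc = {t: c / total for t, c in doc_tf.items()}
    for _ in range(iters):
        # E-step: expected number of occurrences of each surviving term that
        # were generated by the document model rather than the collection model.
        expected = {}
        for t, p in p_doc.items():
            doc_part = lam * p
            coll_part = (1 - lam) * coll_prob[t]
            expected[t] = doc_tf[t] * doc_part / (doc_part + coll_part)
        # M-step: normalise the expected counts into a distribution.
        norm = sum(expected.values())
        p_doc = {t: e / norm for t, e in expected.items()}
        # Prune low-probability terms and renormalise the survivors.
        p_doc = {t: p for t, p in p_doc.items() if p >= threshold}
        kept = sum(p_doc.values())
        p_doc = {t: p / kept for t, p in p_doc.items()}
    return p_doc
```

    A term such as a function word, which is frequent in the document but also common in the collection, drifts toward zero across iterations and is eventually dropped. Rarer content terms absorb the freed probability mass, so the stored model keeps fewer terms.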

    Exploring Topic-based Language Models for Effective Web Information Retrieval

    The main obstacle to providing focused search is the relative opaqueness of search requests: searchers tend to express complex information needs in only a couple of keywords. Our overall aim is to find out whether, and how, topic-based language models can lead to more effective web information retrieval. In this paper we explore the retrieval performance of a topic-based model that combines topical models with other language models based on cross-entropy. We first define our topical categories and train our topical models on the .GOV2 corpus by building parsimonious language models. We then test the topic-based model on the TREC8 small Web data collection for ad-hoc search. Our experimental results show that the topic-based model outperforms both the standard language model and the parsimonious model.
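
    Cross-entropy ranking can be illustrated with a small sketch. Documents are scored by the negative cross-entropy between a query model and a mixture of a document model, a topical model and a collection model. The abstract does not say how the models are weighted or combined, so the linear mixture, the function name and the weights `w_doc`, `w_topic` and `w_coll` are hypothetical choices for illustration, not the paper's method.

```python
import math


def neg_cross_entropy(query_model, doc_model, topic_model, coll_model,
                      w_doc=0.6, w_topic=0.2, w_coll=0.2):
    """Score = -H(Q, M) = sum_t P(t|Q) * log P(t|M), where M linearly mixes
    the document, topical and collection models (weights are hypothetical)."""
    score = 0.0
    for term, q in query_model.items():
        p_mix = (w_doc * doc_model.get(term, 0.0)
                 + w_topic * topic_model.get(term, 0.0)
                 + w_coll * coll_model.get(term, 0.0))
        # Assumes the collection model covers every query term, so p_mix > 0.
        score += q * math.log(p_mix)
    return score  # higher (closer to zero) means lower cross-entropy
```

    Because the topical model contributes probability to query terms that a short document may lack, it acts as a topic-specific form of smoothing between the document and the general collection.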

    Advanced language modeling approaches, case study: Expert search

    This tutorial gives a clear and detailed overview of advanced language modeling approaches and tools, including the use of document priors, translation models, relevance models, parsimonious models and expectation-maximization training. Expert search is used as a case study to explain the consequences of modeling assumptions.

    Geographical information retrieval with ontologies of place

    Geographical context is required in many information retrieval tasks in which the target of the search may be documents, images or records that are referenced to geographical space only by means of place names. Often the query name matches the names associated with candidate sources of information only imprecisely. There is therefore a need for geographical information retrieval facilities that can rank candidate information by geographical closeness of place as well as by semantic closeness to the information of interest. Here we present an ontology of place that combines limited coordinate data with semantic and qualitative spatial relationships between places. This parsimonious model of geographical place supports maintaining knowledge of place names that relate to extensive regions of the Earth at multiple levels of granularity. The ontology has been implemented with a semantic modelling system that links non-spatial conceptual hierarchies with the place ontology. A hierarchical spatial distance measure is combined with the Euclidean distance between place centroids to create a hybrid spatial distance measure. This is integrated with thematic distance, based on classification semantics, to create an integrated semantic closeness measure that can be used to rank retrieved objects by relevance.
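
    The hybrid spatial distance can be sketched as a weighted sum of two normalised parts. The first is a hierarchical distance, counted here as the number of part-of edges from two places up to their lowest common ancestor. The second is the Euclidean distance between their centroids. The abstract does not give the actual formula, so the edge-counting rule, the linear weighting `alpha`, the normalising constants `max_hier` and `max_euc`, and all function names are hypothetical.

```python
import math


def ancestors(place, parent):
    """Return [place, parent, grandparent, ...] up to the root of the hierarchy."""
    chain = [place]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def hierarchical_distance(a, b, parent):
    """Number of part-of edges from a and b up to their lowest common ancestor."""
    chain_a = ancestors(a, parent)
    depth_in_b = {p: i for i, p in enumerate(ancestors(b, parent))}
    for i, p in enumerate(chain_a):
        if p in depth_in_b:
            return i + depth_in_b[p]
    # No shared ancestor: treat the places as maximally far apart.
    return len(chain_a) + len(depth_in_b)


def hybrid_distance(a, b, parent, centroid, alpha=0.5, max_hier=4.0, max_euc=10.0):
    """Weighted sum of normalised hierarchical and centroid distances.

    alpha, max_hier and max_euc are hypothetical tuning constants.
    """
    h = min(1.0, hierarchical_distance(a, b, parent) / max_hier)
    (xa, ya), (xb, yb) = centroid[a], centroid[b]
    e = min(1.0, math.hypot(xa - xb, ya - yb) / max_euc)
    return alpha * h + (1 - alpha) * e
```

    With a toy hierarchy in which `town_a` and `town_b` belong to the same region and `town_c` belongs to another, the two towns that share a region get a smaller hybrid distance even when their centroids are about as far apart as the other pair's.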

    Science epistemological beliefs of form four students and their science achievement using web-based learning

    Epistemological beliefs affect student motivation and learning. They have been found to play a significant role in the acquisition of the capacity to control and direct one’s cognitive processing (Lindner, 1993). In particular, science epistemological beliefs are considered an important factor in science achievement and positive science attitudes among students (Cobern, 1991). On this premise, the purpose of this study was (1) to examine the science epistemological beliefs of Form Four students in Malaysia, and (2) to find out whether there was a significant difference in science achievement between students with high and low science epistemological beliefs when learning science with different web-based modules. The sample comprised 169 students from ten schools in the state of Perak. The instrument used in this study was the “Nature of Scientific Knowledge Scale” developed by Rubba (1977). Six factors of science epistemological beliefs, namely amoral, creative, developmental, parsimonious, testable and unified, were analysed using descriptive statistics. The highest-ranked factor was testable, followed by unified, creative, developmental and amoral; the lowest-ranked factor was parsimonious. t-tests for independent means showed that, among students with high science epistemological beliefs, those who followed the constructivist approach achieved significantly higher science scores than those who followed the direct instruction approach. Among students with low science epistemological beliefs, however, there was no significant difference in science achievement between the two approaches. A two-way ANOVA showed a significant interaction effect between the type of web-based learning approach and science epistemological beliefs, suggesting that the effect of the approach depends on the science epistemological beliefs the students hold.

    Data Cube Approximation and Mining using Probabilistic Modeling

    On-line Analytical Processing (OLAP) techniques, commonly used in data warehouses, allow data cubes to be explored along different analysis axes (dimensions) and at different abstraction levels of a dimension hierarchy. However, such techniques are not aimed at mining multidimensional data. Since data cubes are simply multi-way tables, we propose to analyze the potential of two probabilistic modeling techniques, namely non-negative multi-way array factorization and log-linear modeling, with the ultimate objective of compressing and mining aggregate and multidimensional values. With the first technique, we compute the set of components that best fit the initial data set and whose superposition coincides with the original data. With the second technique, we identify a parsimonious model (i.e., one with a reduced set of parameters), highlight strong associations among dimensions and discover possible outliers in data cells. A real-life example is used to (i) discuss the potential benefits of the modeling output for cube exploration and mining, (ii) show how OLAP queries can be answered approximately, and (iii) illustrate the strengths and limitations of these modeling approaches.
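
    The log-linear side can be illustrated on the smallest possible case: two dimensions of a cube flattened into a two-way table of counts. The sketch fits the independence model log m_ij = u + u_i + u_j, which is the most parsimonious log-linear model that keeps both main effects. It measures lack of fit with the likelihood-ratio statistic G^2 and flags cells with large Pearson residuals as candidate outliers. The paper works with multi-way cubes and model selection, so this is a simplified stand-in, not its procedure. The function names and the residual cut-off `z` are assumptions.

```python
import math


def independence_fit(table):
    """Expected counts m_ij = (row_i total * col_j total) / n, which are the
    maximum-likelihood fit of the log-linear model log m_ij = u + u_i + u_j."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[rows[i] * cols[j] / n for j in range(len(cols))]
            for i in range(len(rows))]


def g_squared(table, expected):
    """Likelihood-ratio statistic G^2 = 2 * sum n_ij * log(n_ij / m_ij)."""
    return 2 * sum(o * math.log(o / e)
                   for obs_row, exp_row in zip(table, expected)
                   for o, e in zip(obs_row, exp_row) if o > 0)


def outlier_cells(table, expected, z=2.0):
    """Cells whose Pearson residual (n_ij - m_ij) / sqrt(m_ij) exceeds z in size."""
    return [(i, j)
            for i, obs_row in enumerate(table)
            for j, o in enumerate(obs_row)
            if abs((o - expected[i][j]) / math.sqrt(expected[i][j])) > z]
```

    A G^2 of zero means the two dimensions are exactly independent in the data. A large G^2, or a few cells with large residuals, points to an association or an anomalous cell that the parsimonious model cannot explain.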