2,868 research outputs found

    Disambiguation strategies for cross-language information retrieval

    This paper gives an overview of tools and methods for Cross-Language Information Retrieval (CLIR) developed within the Twenty-One project. The tools and methods are evaluated on the TREC CLIR task document collection, using Dutch queries against the English document base. The main issue addressed is an evaluation of two approaches to disambiguation. The underlying question is whether considerable effort should be put into finding the correct translation for each query term before searching, or whether searching with more than one possible translation leads to better results. The experimental study suggests that the quality of search methods is more important than the quality of disambiguation methods. Good retrieval methods are able to disambiguate translated queries implicitly during searching.

    Numerical simulation of flow over a rough bed

    This paper presents results of a direct numerical simulation (DNS) of turbulent flow over the rough bed of an open channel. We consider a hexagonal arrangement of spheres on the channel bed. The depth of flow is four times the diameter of the spheres, and the Reynolds number is chosen so that the roughness Reynolds number is greater than 70, thus ensuring a fully rough flow. A parallel code based on finite difference, domain decomposition, and multigrid methods is used for the DNS. Computed results are compared with available experimental data. We report the first- and second-order statistics and the variation of the lift, drag, and exchange coefficients. Good agreement with experimental results is seen for the mean velocity, turbulence intensities, and Reynolds stress. Further, the DNS results provide accurate quantitative statistics for rough bed flow. Detailed analysis of the DNS data confirms the streaky nature of the flow near the effective bed and the existence of a hierarchy of vortices aligned with the streamwise direction, and supports the wall similarity hypothesis. The computed exchange coefficients indicate a large degree of mixing between the fluid trapped below the midplane of the roughness elements and that above it.
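
    The "fully rough" criterion mentioned in this abstract can be checked with a few lines of arithmetic. The sketch below assumes the usual definition of the roughness Reynolds number, k+ = u_tau * k_s / nu, and the conventional textbook regime limits of about 5 and 70. The function names and the sample values are hypothetical, and treating a sphere-related length as the roughness height k_s is an assumption, not something the abstract states.

```python
def roughness_reynolds(u_tau, k_s, nu):
    """Roughness Reynolds number k+ = u_tau * k_s / nu.

    u_tau: friction velocity [m/s]
    k_s:   roughness height [m] (assumed here to be the relevant length)
    nu:    kinematic viscosity [m^2/s]
    """
    return u_tau * k_s / nu


def roughness_regime(k_plus):
    """Classify the flow using conventional limits (about 5 and 70)."""
    if k_plus < 5:
        return "hydraulically smooth"
    if k_plus <= 70:
        return "transitionally rough"
    return "fully rough"


# Hypothetical values, not taken from the paper: water flowing over 2 mm roughness.
k_plus = roughness_reynolds(u_tau=0.05, k_s=0.002, nu=1.0e-6)
regime = roughness_regime(k_plus)
```

    With these made-up inputs k+ comes out well above 70, so the sketch reports the fully rough regime that the simulation targets.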

    Sampled Weighted Min-Hashing for Large-Scale Topic Mining

    We present Sampled Weighted Min-Hashing (SWMH), a randomized approach to automatically mine topics from large-scale corpora. SWMH generates multiple random partitions of the corpus vocabulary based on term co-occurrence and agglomerates highly overlapping inter-partition cells to produce the mined topics. While other approaches define a topic as a probabilistic distribution over a vocabulary, SWMH topics are ordered subsets of such vocabulary. Interestingly, the topics mined by SWMH underlie themes from the corpus at different levels of granularity. We extensively evaluate the meaningfulness of the mined topics both qualitatively and quantitatively on the NIPS (1.7 K documents), 20 Newsgroups (20 K), Reuters (800 K) and Wikipedia (4 M) corpora. Additionally, we compare the quality of SWMH with Online LDA topics for document representation in classification. Comment: 10 pages, Proceedings of the Mexican Conference on Pattern Recognition 201
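
    As a rough illustration of the min-hashing idea that SWMH builds on, the sketch below uses plain (unweighted, unsampled) min-hashing to estimate how strongly two terms co-occur. It is not the SWMH algorithm itself. The function names, the toy term-to-document sets, and the use of seeded BLAKE2 digests as the hash family are all assumptions made for the example.

```python
import hashlib


def minhash_signature(items, num_hashes=128):
    """Return one minimum hash value per seeded hash function.

    `items` must be a non-empty collection; each seed acts as a
    different hash function over the same items.
    """
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest(),
                "big",
            )
            for item in items
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity of two sets, for comparison."""
    return len(a & b) / len(a | b)


# Hypothetical toy data: each term maps to the IDs of documents containing it.
docs_with_term = {
    "neuron": {1, 2, 3, 4, 5, 6},
    "synapse": {2, 3, 4, 5, 6, 7},
    "market": {20, 21, 22},
}
signatures = {term: minhash_signature(docs) for term, docs in docs_with_term.items()}
close = estimated_jaccard(signatures["neuron"], signatures["synapse"])
far = estimated_jaccard(signatures["neuron"], signatures["market"])
```

    Terms that appear in mostly the same documents get similar signatures, so `close` lands near the exact Jaccard similarity of the two document sets while `far` stays at or near zero. Grouping terms whose signatures collide is the kind of co-occurrence-based partitioning the abstract alludes to.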

    Identifying Research Fields within Business and Management: A Journal Cross-Citation Analysis

    A discipline such as business and management (B&M) is very broad and has many fields within it, ranging from fairly scientific ones such as management science or economics to softer ones such as information systems. There are at least three reasons why it is important to identify these sub-fields accurately: first, to give insight into the structure of the subject area and perhaps identify unrecognised commonalities; second, to normalise citation data, since it is well known that citation rates vary significantly between disciplines; and third, because journal rankings and lists tend to split their classifications into different subjects. For example, the Association of Business Schools (ABS) list, which is a standard in the UK, has 22 different fields. Unfortunately, at the moment these are created in an ad hoc manner with no underlying rigour. The purpose of this paper is to identify possible sub-fields in B&M rigorously, based on actual citation patterns. We have examined 450 journals in B&M that are included in the ISI Web of Science (WoS) and analysed the cross-citation rates between them, enabling us to generate sets of coherent and consistent sub-fields that minimise the extent to which journals appear in several categories. Implications and limitations of the analysis are discussed.

    Can a workspace help to overcome the query formulation problem in image retrieval?

    We have proposed a novel image retrieval system that incorporates a workspace where users can organise their search results. A task-oriented and user-centred experiment has been devised involving design professionals and several types of realistic search tasks. We study the workspace’s effect on two aspects: task conceptualisation and query formulation. A traditional relevance feedback system serves as the baseline. The results of this study show that the workspace is more useful with respect to both aspects. The proposed approach leads to a more effective and enjoyable search experience.

    Electronic Quantum Monte Carlo Calculations of Atomic Forces, Vibrations, and Anharmonicities

    Atomic forces are calculated for first-row monohydrides and carbon monoxide within electronic quantum Monte Carlo (QMC). Accurate and efficient forces are achieved by using an improved method for moving variational parameters in variational QMC. Newton's method with singular value decomposition (SVD) is combined with steepest descent (SD) updates along directions rejected by the SVD, after initial SD steps. Dissociation energies in variational and diffusion QMC agree well with experiment. The atomic forces agree quantitatively with potential energy surfaces, demonstrating the accuracy of this force procedure. The harmonic vibrational frequencies and anharmonicity constants, derived from the QMC energies and atomic forces, also agree well with experimental values. Comment: 6 pages, 2 figures; updated conten

    A study on using genetic niching for query optimisation in document retrieval

    This paper presents a new genetic approach for query optimisation in document retrieval. The main contribution of the paper is to show the effectiveness of the genetic niching technique in reaching multiple relevant regions of the document space. Moreover, suitable merging procedures are proposed in order to improve the retrieval evaluation. Experimental results obtained using a TREC sub-collection indicate that the proposed approach is promising for applications.

    A multi-layered Bayesian network model for structured document retrieval

    New standards in document representation, such as SGML, XML, and MPEG-7, compel Information Retrieval to design and implement models and tools to index, retrieve, and present documents according to the given document structure. The paper presents the design of an Information Retrieval system for multimedia structured documents, such as journal articles, e-books, and MPEG-7 videos. The system is based on Bayesian Networks, since this class of mathematical models makes it possible to represent and quantify the relations between the structural components of the document. Some preliminary results on the system implementation are also presented.

    Visualising the structure of document search results: A comparison of graph theoretic approaches

    This is the post-print of the article - Copyright @ 2010 Sage Publications. Previous work has shown that distance-similarity visualisation or ‘spatialisation’ can provide a potentially useful context in which to browse the results of a query search, enabling the user to adopt a simple local foraging or ‘cluster growing’ strategy to navigate through the retrieved document set. However, faithfully mapping feature-space models to visual space can be problematic owing to their inherent high dimensionality and non-linearity. Conventional linear approaches to dimension reduction tend to fail at this kind of task, sacrificing local structure in order to preserve a globally optimal mapping. In this paper the clustering performance of a recently proposed algorithm called isometric feature mapping (Isomap), which deals with non-linearity by transforming dissimilarities into geodesic distances, is compared to that of non-metric multidimensional scaling (MDS). Various graph pruning methods for geodesic distance estimation are also compared. Results show that Isomap is significantly better at preserving local structural detail than MDS, suggesting it is better suited to cluster growing and other semantic navigation tasks. Moreover, it is shown that applying a minimum-cost graph pruning criterion can provide a parameter-free alternative to the traditional K-neighbour method, resulting in spatial clustering that is equivalent to or better than that achieved using an optimal-K criterion.
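
    The step of "transforming dissimilarities into geodesic distances" can be sketched in a few lines. The example below is a minimal illustration of the first stage of Isomap with the traditional K-neighbour pruning rule: keep each point's K nearest neighbours, then take shortest paths through the pruned graph. It is a sketch under stated assumptions, not the article's implementation; the function names and the five toy points on a half circle are hypothetical, and the minimum-cost pruning criterion studied in the article is not shown.

```python
import math


def knn_graph(dist, k):
    """Keep only the edges from each point to its k nearest neighbours.

    Edges are stored symmetrically; missing edges are infinite.
    """
    n = len(dist)
    graph = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        neighbours = sorted((j for j in range(n) if j != i),
                            key=lambda j: dist[i][j])[:k]
        for j in neighbours:
            graph[i][j] = dist[i][j]
            graph[j][i] = dist[i][j]
    return graph


def geodesic_distances(graph):
    """All-pairs shortest paths (Floyd-Warshall) over the pruned graph."""
    n = len(graph)
    geo = [row[:] for row in graph]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = geo[i][m] + geo[m][j]
                if via < geo[i][j]:
                    geo[i][j] = via
    return geo


# Hypothetical data: five points spaced 45 degrees apart on a unit half circle.
points = [(math.cos(math.radians(a)), math.sin(math.radians(a)))
          for a in (0, 45, 90, 135, 180)]
dist = [[math.dist(p, q) for q in points] for p in points]
geo = geodesic_distances(knn_graph(dist, k=2))
```

    For the two end points, the straight-line distance `dist[0][4]` cuts across the half circle, whereas `geo[0][4]` follows the neighbourhood graph along the curve and is therefore longer. Feeding `geo` rather than `dist` into a scaling method is what lets Isomap respect the non-linear structure. The choice of K matters: too small a value can disconnect the graph and leave infinite entries in `geo`.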

    The use of implicit evidence for relevance feedback in web retrieval

    In this paper we report on the application of two contrasting types of relevance feedback for web retrieval. We compare two systems: one using explicit relevance feedback (where searchers must explicitly mark documents as relevant) and one using implicit relevance feedback (where the system endeavours to estimate relevance by mining the searcher's interaction). The feedback is used to update the display according to the user's interaction. Our research focuses on the degree to which implicit evidence of document relevance can be substituted for explicit evidence. We examine the two variations in terms of both user opinion and search effectiveness.