Disambiguation strategies for cross-language information retrieval
This paper gives an overview of tools and methods for Cross-Language Information Retrieval (CLIR) developed within the Twenty-One project. The tools and methods are evaluated with the TREC CLIR task document collection using Dutch queries on the English document base. The main issue addressed here is an evaluation of two approaches to disambiguation. The underlying question is whether a lot of effort should be put into finding the correct translation for each query term before searching, or whether searching with more than one possible translation leads to better results. The experimental study suggests that the quality of search methods is more important than the quality of disambiguation methods. Good retrieval methods are able to disambiguate translated queries implicitly during searching.
Numerical simulation of flow over a rough bed
This paper presents results of a direct numerical simulation (DNS) of turbulent flow over the rough bed of an open channel. We consider a hexagonal arrangement of spheres on the channel bed. The depth of flow has been taken as four times the diameter of the spheres and the Reynolds number has been chosen so that the roughness Reynolds number is greater than 70, thus ensuring a fully rough flow. A parallel code based on finite difference, domain decomposition, and multigrid methods has been used for the DNS. Computed results are compared with available experimental data. We report the first- and second-order statistics, variation of lift/drag and exchange coefficients. Good agreement with experimental results is seen for the mean velocity, turbulence intensities, and Reynolds stress. Further, the DNS results provide accurate quantitative statistics for rough bed flow. Detailed analysis of the DNS data confirms the streaky nature of the flow near the effective bed and the existence of a hierarchy of vortices aligned with the streamwise direction, and supports the wall similarity hypothesis. The computed exchange coefficients indicate a large degree of mixing between the fluid trapped below the midplane of the roughness elements and that above it
Sampled Weighted Min-Hashing for Large-Scale Topic Mining
We present Sampled Weighted Min-Hashing (SWMH), a randomized approach to
automatically mine topics from large-scale corpora. SWMH generates multiple
random partitions of the corpus vocabulary based on term co-occurrence and
agglomerates highly overlapping inter-partition cells to produce the mined
topics. While other approaches define a topic as a probabilistic distribution
over a vocabulary, SWMH topics are ordered subsets of such vocabulary.
Interestingly, the topics mined by SWMH underlie themes from the corpus at
different levels of granularity. We extensively evaluate the meaningfulness of
the mined topics both qualitatively and quantitatively on the NIPS (1.7 K
documents), 20 Newsgroups (20 K), Reuters (800 K) and Wikipedia (4 M) corpora.
Additionally, we compare the quality of SWMH with Online LDA topics for
document representation in classification.
Comment: 10 pages, Proceedings of the Mexican Conference on Pattern
Recognition 201
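The abstract above builds on min-hashing over term co-occurrence. As a rough illustration only, the following sketch shows plain (unweighted, unsampled) min-hashing of terms represented by the sets of documents they occur in. It is not the SWMH algorithm itself; the function names, the toy `occurrences` data, and the choice of SHA-1 as the hash family are all assumptions made for this example.

```python
import hashlib


def doc_hash(seed: int, doc_id: int) -> int:
    """Deterministic 64-bit hash of a document id under a given seed."""
    data = f"{seed}:{doc_id}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def minhash_signature(doc_ids, num_hashes=64):
    """Min-hash signature of a term, given the set of documents containing it."""
    return [min(doc_hash(seed, d) for d in doc_ids) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions; estimates Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Hypothetical toy data: term -> ids of the documents in which it occurs.
occurrences = {
    "neuron": {1, 2, 3, 4, 5, 6},
    "synapse": {1, 2, 3, 4, 5, 6},
    "kernel": {10, 11, 12},
}
signatures = {term: minhash_signature(docs) for term, docs in occurrences.items()}

same_topic = estimated_jaccard(signatures["neuron"], signatures["synapse"])
different_topic = estimated_jaccard(signatures["neuron"], signatures["kernel"])
```

Terms that occur in the same documents receive identical signatures, so grouping terms by matching signature entries yields candidate co-occurring term sets. How SWMH samples, weights, and agglomerates such sets is described in the paper and is not reproduced here.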
Identifying Research Fields within Business and Management: A Journal Cross-Citation Analysis
A discipline such as business and management (B&M) is very broad and has many fields within it, ranging from fairly scientific ones such as management science or economics to softer ones such as information systems. There are at least three reasons why it is important to identify these sub-fields accurately. First, doing so gives insight into the structure of the subject area and may reveal unrecognised commonalities. Second, it supports the normalisation of citation data, since citation rates are well known to vary significantly between disciplines. Third, journal rankings and lists tend to split their classifications into different subjects; for example, the Association of Business Schools (ABS) list, which is a standard in the UK, has 22 different fields. Unfortunately, at the moment these are created in an ad hoc manner with no underlying rigour. The purpose of this paper is to identify possible sub-fields in B&M rigorously, based on actual citation patterns. We have examined 450 journals in B&M that are included in the ISI Web of Science (WoS) and analysed the cross-citation rates between them, enabling us to generate sets of coherent and consistent sub-fields that minimise the extent to which journals appear in several categories. Implications and limitations of the analysis are discussed.
Can a workspace help to overcome the query formulation problem in image retrieval?
We have proposed a novel image retrieval system that incorporates a workspace where users can organise their search results. A task-oriented and user-centred experiment has been devised involving design professionals and several types of realistic search tasks. We study the workspace’s effect on two aspects: task conceptualisation and query formulation. A traditional relevance feedback system serves as baseline. The results of this study show that the workspace is more useful with respect to both of the above aspects. The proposed approach leads to a more effective and enjoyable search experience
Electronic Quantum Monte Carlo Calculations of Atomic Forces, Vibrations, and Anharmonicities
Atomic forces are calculated for first-row monohydrides and carbon monoxide
within electronic quantum Monte Carlo (QMC). Accurate and efficient forces are
achieved by using an improved method for moving variational parameters in
variational QMC. Newton's method with singular value decomposition (SVD) is
combined with steepest descent (SD) updates along directions rejected by the
SVD, after initial SD steps. Dissociation energies in variational and diffusion
QMC agree well with experiment. The atomic forces agree quantitatively with
potential energy surfaces, demonstrating the accuracy of this force procedure.
The harmonic vibrational frequencies and anharmonicity constants, derived from
the QMC energies and atomic forces, also agree well with experimental values.
Comment: 6 pages, 2 figures; updated conten
A study on using genetic niching for query optimisation in document retrieval
This paper presents a new genetic approach for query optimisation in document retrieval. The main contribution of the paper is to show the effectiveness of the genetic niching technique to reach multiple relevant regions of the document space. Moreover, suitable merging procedures have been proposed in order to improve the retrieval evaluation. Experimental results obtained using a TREC sub-collection indicate that the proposed approach is promising for applications.
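The abstract does not say which niching scheme is used. As an illustration of the general idea, the sketch below implements classic fitness sharing, one common niching technique, on queries represented as term-weight vectors. The function names, the Euclidean distance, the niche radius `sigma`, and the toy population are assumptions for this example, not details taken from the paper.

```python
import math


def distance(q1, q2):
    """Euclidean distance between two query vectors of term weights."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q1, q2)))


def sharing(d, sigma, alpha=1.0):
    """Sharing function: 1 at distance 0, falling to 0 at the niche radius sigma."""
    return 1.0 - (d / sigma) ** alpha if d < sigma else 0.0


def shared_fitness(population, raw_fitness, sigma):
    """Divide each raw fitness by its niche count (crowding penalty)."""
    result = []
    for i, qi in enumerate(population):
        # The niche count is at least 1 because every query shares fully with itself.
        niche_count = sum(sharing(distance(qi, qj), sigma) for qj in population)
        result.append(raw_fitness[i] / niche_count)
    return result


# Hypothetical population: two identical queries and one query in a different region.
population = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
raw = [0.6, 0.6, 0.6]  # e.g. a retrieval-quality score per query (assumed)
shared = shared_fitness(population, raw, sigma=0.5)
```

With equal raw fitness, the isolated query keeps its full score while the two crowded queries are penalised, which pushes selection toward keeping several distinct regions of the search space populated.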
A multi-layered Bayesian network model for structured document retrieval
New standards in document representation, such as SGML, XML, and MPEG-7, compel Information Retrieval to design and implement models and tools to index, retrieve and present documents according to the given document structure. The paper presents the design of an Information Retrieval system for multimedia structured documents, such as journal articles, e-books, and MPEG-7 videos. The system is based on Bayesian Networks, since this class of mathematical models makes it possible to represent and quantify the relations between the structural components of the document. Some preliminary results on the system implementation are also presented.
Visualising the structure of document search results: A comparison of graph theoretic approaches
This is the post-print of the article. Copyright @ 2010 Sage Publications. Previous work has shown that distance-similarity visualisation or ‘spatialisation’ can provide a potentially useful context in which to browse the results of a query search, enabling the user to adopt a simple local foraging or ‘cluster growing’ strategy to navigate through the retrieved document set. However, faithfully mapping feature-space models to visual space can be problematic owing to their inherent high dimensionality and non-linearity. Conventional linear approaches to dimension reduction tend to fail at this kind of task, sacrificing local structural detail in order to preserve a globally optimal mapping. In this paper the clustering performance of a recently proposed algorithm called isometric feature mapping (Isomap), which deals with non-linearity by transforming dissimilarities into geodesic distances, is compared to that of non-metric multidimensional scaling (MDS). Various graph pruning methods for geodesic distance estimation are also compared. Results show that Isomap is significantly better at preserving local structural detail than MDS, suggesting it is better suited to cluster growing and other semantic navigation tasks. Moreover, it is shown that applying a minimum-cost graph pruning criterion can provide a parameter-free alternative to the traditional K-neighbour method, resulting in spatial clustering that is equivalent to or better than that achieved using an optimal-K criterion.
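To make the geodesic-distance step concrete, here is a minimal sketch of the traditional K-neighbour approach mentioned in the abstract: keep only each point's K nearest neighbours as graph edges, then take shortest-path lengths as geodesic estimates. The use of Floyd-Warshall, the function names, and the U-shaped toy data are assumptions for illustration; the minimum-cost pruning criterion studied in the article is not shown.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def geodesic_distances(points, k):
    """Estimate geodesic distances over a symmetric K-nearest-neighbour graph."""
    n = len(points)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        # Keep edges only to the k nearest neighbours of point i.
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: euclidean(points[i], points[j]))
        for j in others[:k]:
            d = euclidean(points[i], points[j])
            dist[i][j] = min(dist[i][j], d)
            dist[j][i] = min(dist[j][i], d)
    # Floyd-Warshall: shortest paths through the pruned graph.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = dist[i][m] + dist[m][j]
                if via < dist[i][j]:
                    dist[i][j] = via
    return dist


# Hypothetical data: points laid out along an inverted U.
u_shape = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (3, 2), (3, 1), (3, 0)]
geo = geodesic_distances(u_shape, k=2)
straight = euclidean(u_shape[0], u_shape[-1])
along_curve = geo[0][7]
```

The two ends of the U are close in a straight line but far apart along the curve, which is the distinction the geodesic transformation is meant to capture. If K is too small the graph can become disconnected, leaving infinite entries; this sensitivity to K is one motivation for a parameter-free pruning criterion.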
The use of implicit evidence for relevance feedback in web retrieval
In this paper we report on the application of two contrasting types of relevance feedback for web retrieval. We compare two systems: one using explicit relevance feedback (where searchers have to mark documents as relevant explicitly) and one using implicit relevance feedback (where the system endeavours to estimate relevance by mining the searcher's interaction). The feedback is used to update the display according to the user's interaction. Our research focuses on the degree to which implicit evidence of document relevance can be substituted for explicit evidence. We examine the two variations in terms of both user opinion and search effectiveness.