Disambiguation strategies for cross-language information retrieval
This paper gives an overview of tools and methods for Cross-Language Information Retrieval (CLIR) developed within the Twenty-One project. The tools and methods are evaluated on the TREC CLIR task document collection, using Dutch queries against the English document base. The main issue addressed here is an evaluation of two approaches to disambiguation. The underlying question is whether much effort should be put into finding the correct translation of each query term before searching, or whether searching with more than one possible translation leads to better results. The experimental study suggests that the quality of search methods is more important than the quality of disambiguation methods. Good retrieval methods are able to disambiguate translated queries implicitly during searching.
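To make the two strategies concrete, here is a minimal sketch, assuming a hypothetical toy Dutch-to-English dictionary and a simple coordination-level score (neither is the Twenty-One project's actual resource or ranking function). One strategy keeps a single translation per query term; the other keeps every candidate, grouped by the Dutch source term, and lets the ranking resolve the ambiguity.

```python
# Hypothetical toy Dutch-to-English dictionary (illustrative, not a real resource).
DICTIONARY = {
    "bank": ["bank", "bench", "couch"],
    "rente": ["interest", "rent"],
}

def translate_first(query):
    """Disambiguate before searching: keep one translation per Dutch term."""
    return [[DICTIONARY.get(term, [term])[0]] for term in query]

def translate_all(query):
    """Keep every candidate translation, grouped by the Dutch source term."""
    return [DICTIONARY.get(term, [term]) for term in query]

def coordination_score(groups, document):
    """Count source terms with at least one translation present in the document.

    Grouping translations per source term means a document that matches several
    translations of one ambiguous word gains no extra credit.
    """
    words = set(document.lower().split())
    return sum(1 for group in groups if words & set(group))

# Hypothetical toy documents.
docs = {
    "finance": "the bank raised the interest rate",
    "park": "a bench and a couch in the park",
}
query = ["bank", "rente"]
for name, groups in [("first", translate_first(query)), ("all", translate_all(query))]:
    ranking = sorted(docs, key=lambda d: coordination_score(groups, docs[d]), reverse=True)
    print(name, ranking)
```

In this toy example both strategies rank the finance document first. A plain count of matched English words would give the park document 2 (bench, couch) and tie it with the finance document; grouping by source term gives it only 1, which is one way a search method can disambiguate implicitly.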
A method for maintaining document consistency based on similarity contents
The advent of the WWW and distributed information systems has made it possible to share documents between different users and organisations. However, this has created many problems related to the security, accessibility, rights and, most importantly, the consistency of documents. It is important that the people involved have access to the most up-to-date versions of the documents, retrieve the correct documents, and are able to update the document repository in such a way that their documents are known to others. In this paper we propose a method for organising, storing and retrieving documents based on the similarity of their contents. The method uses techniques based on information retrieval, document summarisation, and term extraction and indexing. This methodology was developed for the E-cognos project, which aims at developing tools for the management and sharing of documents in the construction domain.
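The abstract does not specify its similarity function. One common choice for comparing document contents is cosine similarity over TF-IDF term vectors; the sketch below assumes that choice, plain whitespace tokenization, and made-up construction-domain texts.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per document (whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical repository: document 0 is new, documents 1 and 2 already exist.
docs = [
    "steel beam design specification",
    "steel beam load calculation",
    "site safety inspection report",
]
vecs = tfidf_vectors(docs)
# The most similar existing document is a candidate related version or duplicate.
best = max(range(1, len(docs)), key=lambda i: cosine(vecs[0], vecs[i]))
print(best)
```

Here the new specification shares "steel" and "beam" with document 1 and nothing with document 2, so document 1 is returned as its closest match.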
Object-based Image Ranking using Neural Networks
In this paper, object-based image ranking is performed using both supervised and unsupervised neural networks. Features are extracted using moment invariants, run-length features, and a composite method. This paper also introduces a likeness parameter, namely a similarity measure computed from the weights of the neural networks. The experimental results show that image retrieval performance depends on the feature extraction method, the type of learning, the values of the neural network parameters, and the databases, including the query set. The best performance is achieved using supervised neural networks on the internal query set.
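As an illustration of moment-invariant features (not the paper's exact feature set), the sketch computes Hu's first invariant, phi1 = eta20 + eta02, from the normalized central moments of a binary image given as a list of 0/1 rows. Because it is built from moments about the object's centroid, it does not change when the object is translated.

```python
def raw_moment(img, p, q):
    """m_pq = sum over pixels of x^p * y^q * I(x, y)."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img) for x, v in enumerate(row))

def hu_phi1(img):
    """Hu's first moment invariant eta20 + eta02 (assumes a non-empty object)."""
    m00 = raw_moment(img, 0, 0)
    xc = raw_moment(img, 1, 0) / m00
    yc = raw_moment(img, 0, 1) / m00

    def mu(p, q):
        # Central moment about the centroid (translation invariant).
        return sum(((x - xc) ** p) * ((y - yc) ** q) * v
                   for y, row in enumerate(img) for x, v in enumerate(row))

    def eta(p, q):
        # Central moment normalized by object area to reduce scale sensitivity.
        return mu(p, q) / m00 ** (1 + (p + q) / 2)

    return eta(2, 0) + eta(0, 2)

# The same 2x2 square in two positions gives the same value.
square = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
shifted = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
print(hu_phi1(square), hu_phi1(shifted))
```

A feature vector for a neural network would normally stack several such invariants; only the first is shown to keep the sketch short.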
Numerical simulation of flow over a rough bed
This paper presents results of a direct numerical simulation (DNS) of turbulent flow over the rough bed of an open channel. We consider a hexagonal arrangement of spheres on the channel bed. The depth of flow has been taken as four times the diameter of the spheres, and the Reynolds number has been chosen so that the roughness Reynolds number is greater than 70, thus ensuring a fully rough flow. A parallel code based on finite-difference, domain decomposition, and multigrid methods has been used for the DNS. Computed results are compared with available experimental data. We report the first- and second-order statistics, the variation of the lift and drag coefficients, and the exchange coefficients. Good agreement with experimental results is seen for the mean velocity, turbulence intensities, and Reynolds stress. Further, the DNS results provide accurate quantitative statistics for rough bed flow. Detailed analysis of the DNS data confirms the streaky nature of the flow near the effective bed and the existence of a hierarchy of vortices aligned with the streamwise direction, and supports the wall similarity hypothesis. The computed exchange coefficients indicate a large degree of mixing between the fluid trapped below the midplane of the roughness elements and that above it.
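The fully rough criterion and flow depth quoted above can be written out as follows, assuming the standard definition of the roughness Reynolds number in terms of the shear velocity u_*, the equivalent sand roughness k_s and the kinematic viscosity nu (tau_b is the bed shear stress, rho the fluid density, h the flow depth, D the sphere diameter). How k_s relates to D for this sphere arrangement is a modelling choice the abstract does not state.

```latex
\[
  Re_* = \frac{u_*\, k_s}{\nu} > 70,
  \qquad
  u_* = \sqrt{\tau_b / \rho},
  \qquad
  h = 4D
\]
```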
Vertex similarity in networks
We consider methods for quantifying the similarity of vertices in networks.
We propose a measure of similarity based on the concept that two vertices are
similar if their immediate neighbors in the network are themselves similar.
This leads to a self-consistent matrix formulation of similarity that can be
evaluated iteratively using only knowledge of the adjacency matrix of the
network. We test our similarity measure on computer-generated networks for
which the expected results are known, and on a number of real-world networks.
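Read literally, "two vertices are similar if their neighbours are similar" suggests the recursion S = phi*A*S + I, where the identity term makes each vertex similar to itself and the series I + phi*A + phi^2*A^2 + ... converges when phi times the largest eigenvalue of A is below 1. The sketch below iterates that recursion in plain Python. It is a simplified form: the paper's final measure may include further normalization (for example by vertex degrees) that is omitted here, and the choice phi = alpha / max_degree is an assumption that works because the largest eigenvalue of an adjacency matrix never exceeds the maximum degree.

```python
def mat_mult(a, b):
    """Dense matrix product of two lists-of-lists."""
    n, m, k = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(m)) for j in range(k)]
            for i in range(n)]

def vertex_similarity(adj, alpha=0.9, tol=1e-12, max_iter=1000):
    """Iterate S <- phi * A S + I until the entries stop changing.

    phi = alpha / max_degree, so phi * lambda_max <= alpha < 1 and the
    iteration converges to the sum of phi^k A^k over k >= 0.
    """
    n = len(adj)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d_max = max(sum(row) for row in adj) or 1  # guard against an edgeless graph
    phi = alpha / d_max
    s = [row[:] for row in identity]
    for _ in range(max_iter):
        a_s = mat_mult(adj, s)
        new_s = [[phi * a_s[i][j] + identity[i][j] for j in range(n)]
                 for i in range(n)]
        delta = max(abs(new_s[i][j] - s[i][j]) for i in range(n) for j in range(n))
        s = new_s
        if delta < tol:
            break
    return s

# Path graph 0 - 1 - 2: vertices 0 and 2 are not adjacent but share neighbour 1.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
S = vertex_similarity(A)
print(round(S[0][2], 4))
```

On this path graph S[0][2] comes out positive even though vertices 0 and 2 share no edge, because their common neighbour makes them similar.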
Sampled Weighted Min-Hashing for Large-Scale Topic Mining
We present Sampled Weighted Min-Hashing (SWMH), a randomized approach to
automatically mine topics from large-scale corpora. SWMH generates multiple
random partitions of the corpus vocabulary based on term co-occurrence and
agglomerates highly overlapping inter-partition cells to produce the mined
topics. While other approaches define a topic as a probabilistic distribution
over a vocabulary, SWMH topics are ordered subsets of such vocabulary.
Interestingly, the topics mined by SWMH reflect themes from the corpus at
different levels of granularity. We extensively evaluate the meaningfulness of
the mined topics both qualitatively and quantitatively on the NIPS (1.7 K
documents), 20 Newsgroups (20 K), Reuters (800 K) and Wikipedia (4 M) corpora.
Additionally, we compare the quality of SWMH with Online LDA topics for
document representation in classification. Comment: 10 pages, Proceedings of the Mexican Conference on Pattern
Recognition 201
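SWMH builds on min-hashing. The sketch below shows only the plain (unweighted, unsampled) MinHash building block, not SWMH itself: the fraction of hash functions under which two sets have the same minimum estimates their Jaccard similarity. Terms are assumed to be already mapped to non-negative integer ids, and the hash family h(x) = (a*x + b) mod p with p = 2^61 - 1 is one standard choice, not necessarily the one used in the paper.

```python
import random

PRIME = (1 << 61) - 1  # Mersenne prime; term ids must be in [0, PRIME).

def make_hashes(num_hashes, seed=0):
    """Draw random (a, b) pairs for h(x) = (a * x + b) mod PRIME, with a != 0."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]

def signature(ids, hashes):
    """MinHash signature: the minimum hash value of the set under each function."""
    return [min((a * x + b) % PRIME for x in ids) for a, b in hashes]

def estimated_jaccard(sig1, sig2):
    """Fraction of positions where the two signatures agree."""
    return sum(u == v for u, v in zip(sig1, sig2)) / len(sig1)

def exact_jaccard(s1, s2):
    return len(s1 & s2) / len(s1 | s2)

hashes = make_hashes(200, seed=1)
docs_a = set(range(0, 100))    # hypothetical term ids of one context
docs_b = set(range(50, 150))   # overlaps docs_a in 50 of 150 ids
print(exact_jaccard(docs_a, docs_b),
      estimated_jaccard(signature(docs_a, hashes), signature(docs_b, hashes)))
```

Because a is non-zero and PRIME is prime, each hash is a bijection on the id range, so disjoint sets can never share a minimum and always get an estimate of exactly 0.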
Electronic Quantum Monte Carlo Calculations of Atomic Forces, Vibrations, and Anharmonicities
Atomic forces are calculated for first-row monohydrides and carbon monoxide
within electronic quantum Monte Carlo (QMC). Accurate and efficient forces are
achieved by using an improved method for moving variational parameters in
variational QMC. Newton's method with singular value decomposition (SVD) is
combined with steepest descent (SD) updates along directions rejected by the
SVD, after initial SD steps. Dissociation energies in variational and diffusion
QMC agree well with experiment. The atomic forces agree quantitatively with
potential energy surfaces, demonstrating the accuracy of this force procedure.
The harmonic vibrational frequencies and anharmonicity constants, derived from
the QMC energies and atomic forces, also agree well with experimental values. Comment: 6 pages, 2 figures; updated content
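One way to write the combined parameter update, assuming the SVD is taken of an (approximate) Hessian H of the energy with respect to the variational parameters, g is the energy gradient, epsilon is a hypothetical cutoff on singular values and eta a hypothetical steepest-descent step size (the paper's actual thresholds and scaling are not given in the abstract):

```latex
\[
  H = \sum_i \sigma_i \, u_i v_i^{\mathsf T},
  \qquad
  \Delta p = -\sum_{\sigma_i > \epsilon} \frac{u_i^{\mathsf T} g}{\sigma_i}\, v_i
  \;-\; \eta \sum_{\sigma_i \le \epsilon} \left( v_i^{\mathsf T} g \right) v_i
\]
```

The first sum is the Newton step restricted to well-determined directions; the second is a steepest-descent step projected onto the directions the SVD rejects, so those parameters still move downhill without the large steps that dividing by a small singular value would cause.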
Identifying Research Fields within Business and Management: A Journal Cross-Citation Analysis
A discipline such as business and management (B&M) is very broad and has many fields within it, ranging from fairly scientific ones such as management science or economics to softer ones such as information systems. There are at least three reasons why it is important to identify these sub-fields accurately. First, to give insight into the structure of the subject area and identify perhaps unrecognised commonalities; second, to normalise citation data, since citation rates are well known to vary significantly between disciplines; and third, because journal rankings and lists tend to split their classifications into different subjects. For example, the Association of Business Schools (ABS) list, which is a standard in the UK, has 22 different fields. Unfortunately, at the moment these are created in an ad hoc manner with no underlying rigour. The purpose of this paper is to identify possible sub-fields in B&M rigorously, based on actual citation patterns. We have examined 450 journals in B&M that are included in the ISI Web of Science (WoS) and analysed the cross-citation rates between them, enabling us to generate sets of coherent and consistent sub-fields that minimise the extent to which journals appear in several categories. Implications and limitations of the analysis are discussed.
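The idea of deriving sub-fields from cross-citation rates can be illustrated with a deliberately simple stand-in, not the paper's procedure: a symmetric similarity equal to the share of two journals' combined references that point at each other, followed by single-linkage grouping with a union-find structure. The journal names, citation counts and threshold are all hypothetical.

```python
def cross_citation_similarity(counts, i, j):
    """Share of journals i's and j's combined references that cite each other."""
    total = sum(counts[i]) + sum(counts[j])
    return (counts[i][j] + counts[j][i]) / total if total else 0.0

def group_journals(names, counts, threshold):
    """Single-linkage grouping: merge journals whose similarity >= threshold."""
    parent = list(range(len(names)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if cross_citation_similarity(counts, i, j) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical data: counts[i][j] = citations from journal i to journal j.
names = ["J1", "J2", "J3", "J4"]
counts = [
    [0, 40, 2, 1],
    [35, 0, 3, 2],
    [1, 2, 0, 30],
    [2, 1, 25, 0],
]
print(group_journals(names, counts, threshold=0.2))
```

Each journal lands in exactly one group here; a real analysis would need a method that can weigh overlapping memberships rather than forcing a hard partition.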
Can a workspace help to overcome the query formulation problem in image retrieval?
We have proposed a novel image retrieval system that incorporates a workspace where users can organise their search results. A task-oriented and user-centred experiment has been devised involving design professionals and several types of realistic search tasks. We study the workspace’s effect on two aspects: task conceptualisation and query formulation. A traditional relevance feedback system serves as the baseline. The results of this study show that the workspace is more useful than the baseline with respect to both aspects. The proposed approach leads to a more effective and enjoyable search experience.
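The baseline is described only as a traditional relevance feedback system. A classic member of that family is Rocchio's query update, sketched below over image feature vectors; the weights alpha, beta and gamma are conventional illustrative values, not settings reported for this study.

```python
def centroid(vectors, dim):
    """Component-wise mean of a list of vectors (zero vector if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward relevant examples and away from non-relevant ones."""
    dim = len(query)
    rel = centroid(relevant, dim)
    non = centroid(nonrelevant, dim)
    return [alpha * query[k] + beta * rel[k] - gamma * non[k] for k in range(dim)]

# Hypothetical 2-D features: the user marks one result relevant, one not.
print(rocchio([1.0, 0.0], relevant=[[0.0, 1.0]], nonrelevant=[[1.0, 0.0]]))
```

The user's only input here is a relevant/non-relevant judgement per result, which is the kind of interaction the workspace approach is compared against.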