177,655 research outputs found

    Document Retrieval on Repetitive Collections

    Document retrieval aims at finding the most important documents where a pattern appears in a collection of strings. Traditional pattern-matching techniques yield brute-force document retrieval solutions, which has motivated the research on tailored indexes that offer near-optimal performance. However, an experimental study establishing which alternatives are actually better than brute force, and which perform best depending on the collection characteristics, has not been carried out. In this paper we address this shortcoming by exploring the relationship between the nature of the underlying collection and the performance of current methods. Via extensive experiments we show that established solutions are often beaten in practice by brute-force alternatives. We also design new methods that offer superior time/space trade-offs, particularly on repetitive collections. Comment: Accepted to ESA 2014. Implementation and experiments at http://www.cs.helsinki.fi/group/suds/rlcsa
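
    As a rough illustration of the brute-force baseline this abstract refers to, the sketch below scans every document for the pattern and returns the top-k documents. It assumes that "most important" means highest occurrence count (term frequency), which the abstract does not state; the function names `count_occurrences` and `top_k_documents` are hypothetical and are not taken from the paper or its implementation.

```python
from typing import List, Tuple


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, overlapping ones included."""
    if not pattern:
        return 0
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by one character so overlapping matches are counted.
        start = text.find(pattern, start + 1)
    return count


def top_k_documents(docs: List[str], pattern: str, k: int) -> List[Tuple[int, int]]:
    """Brute-force top-k retrieval: scan every document, rank by count.

    Assumption: relevance = number of occurrences of the pattern.
    Returns (document index, occurrence count) pairs, best first;
    ties are broken by the smaller document index.
    """
    hits = []
    for index, doc in enumerate(docs):
        occurrences = count_occurrences(doc, pattern)
        if occurrences > 0:
            hits.append((index, occurrences))
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits[:k]


if __name__ == "__main__":
    collection = ["abracadabra", "banana", "cabana", "xyz"]
    print(top_k_documents(collection, "ana", 2))
```

    Every query costs time proportional to the total collection size, which is the cost that tailored indexes aim to avoid.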

    Automated, unsupervised inversion of multiwavelength lidar data with TiARA : Assessment of retrieval performance of microphysical parameters using simulated data

    We evaluate the retrieval performance of the automated, unsupervised inversion algorithm, Tikhonov Advanced Regularization Algorithm (TiARA), which is used for the autonomous retrieval of microphysical parameters of anthropogenic and natural pollution particles. TiARA (version 1.0) has been developed in the past 10 years and builds on the legacy of a data-operator-controlled inversion algorithm used since 1998 for the analysis of data from multiwavelength Raman lidar. The development of TiARA has been driven by the need to analyze in (near) real time large volumes of data collected with NASA Langley Research Center's high-spectral-resolution lidar (HSRL-2). HSRL-2 was envisioned as part of the NASA Aerosols-Clouds-Ecosystems mission in response to the National Academy of Sciences (NAS) Decadal Study mission recommendations 2007. TiARA could thus also serve as an inversion algorithm in the context of a future space-borne lidar. We summarize key properties of TiARA on the basis of simulations with monomodal logarithmic-normal particle size distributions that cover particle radii from approximately 0.05 μm to 10 μm. The real and imaginary parts of the complex refractive index cover the range from nonabsorbing to highly light-absorbing pollutants. Our simulations include up to 25% measurement uncertainty. The goal of our study is to provide guidance with respect to technical features of future space-borne lidars, if such lidars will be used for retrievals of microphysical data products, absorption coefficients, and single-scattering albedo. We investigate the impact of two different measurement-error models on the quality of the data products. We also obtain for the first time, to the best of our knowledge, a statistical view on systematic and statistical uncertainties, if a large volume of data is processed. Effective radius is retrieved to 50% accuracy for 58% of cases with an imaginary part up to 0.01i and up to 100% of cases with an imaginary part of 0.05i. Similarly, volume concentration, surface-area concentration, and number concentrations are retrieved to 50% accuracy in 56%-100% of cases, 99%-100% of cases, and 54%-87% of cases, respectively, depending on the imaginary part. The numbers represent measurement uncertainties of up to 15%. If we target 20% retrieval accuracy, the numbers of cases that fall within that threshold are 36%-76% for effective radius, 36%-73% for volume concentration, 98%-100% for surface-area concentration, and 37%-61% for number concentration. That range of numbers again represents a spread in results for different values of the imaginary part. At present, we obtain an accuracy of (on average) 0.1 for the real part. A case study from the ORACLES field campaign is used to illustrate data products obtained with TiARA. Peer reviewed
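
    For readers unfamiliar with the regularization named in the algorithm's title, the sketch below shows generic Tikhonov (ridge) regularization for a toy problem with two unknowns: it minimizes ||Ax - b||^2 + lam * ||x||^2 by solving (A^T A + lam I) x = A^T b. This is the textbook formulation only, not TiARA itself; the function name `tikhonov_2d`, the toy matrix, and the choice of an identity penalty are assumptions made for illustration.

```python
from typing import List


def tikhonov_2d(A: List[List[float]], b: List[float], lam: float) -> List[float]:
    """Solve min ||A x - b||^2 + lam * ||x||^2 for a two-component x.

    A is a list of rows with two entries each; b has one entry per row.
    The solution satisfies (A^T A + lam * I) x = A^T b.
    """
    # Entries of the symmetric 2x2 matrix M = A^T A + lam * I.
    m00 = sum(row[0] * row[0] for row in A) + lam
    m01 = sum(row[0] * row[1] for row in A)
    m11 = sum(row[1] * row[1] for row in A) + lam
    # Right-hand side r = A^T b.
    r0 = sum(row[0] * value for row, value in zip(A, b))
    r1 = sum(row[1] * value for row, value in zip(A, b))
    det = m00 * m11 - m01 * m01
    if det == 0:
        raise ValueError("singular system; use a larger lam")
    # Closed-form inverse of a 2x2 matrix applied to r.
    return [(m11 * r0 - m01 * r1) / det, (m00 * r1 - m01 * r0) / det]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(tikhonov_2d(identity, [2.0, 4.0], 0.0))  # no regularization
    print(tikhonov_2d(identity, [2.0, 4.0], 1.0))  # solution shrunk toward zero
```

    A larger `lam` trades fidelity to noisy measurements for a more stable solution, which is the general reason regularization is used in ill-posed inversions of this kind.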

    Context guided retrieval

    This paper presents a hierarchical case representation that uses a context guided retrieval method. The performance of this method is compared to that of a simple flat file representation using standard nearest neighbour retrieval. The data presented in this paper is more extensive than that presented in an earlier paper by the same authors. The estimation of the construction costs of light industrial warehouse buildings is used as the test domain. Each case in the system comprises approximately 400 features. These are structured into a hierarchical case representation that holds more general contextual features at its top and specific building elements at its leaves. A modified nearest neighbour retrieval algorithm is used that is guided by contextual similarity. Problems are decomposed into sub-problems and solutions recomposed into a final solution. The comparative results show that the context guided retrieval method using the hierarchical case representation is significantly more accurate than the simpler flat file representation and standard nearest neighbour retrieval.
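
    The contrast between the two retrieval styles can be sketched as follows. The abstract does not specify the modified algorithm, so this is only one plausible reading: the flat version compares all features at once, while the context-guided version first shortlists cases by contextual similarity and only then compares detailed features. The case layout (`context`, `details`), the Euclidean distance, and the `shortlist` parameter are all assumptions for illustration.

```python
from math import sqrt
from typing import Dict, List, Sequence

Case = Dict[str, List[float]]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flat_nearest(cases: List[Case], query: Case) -> Case:
    """Standard nearest neighbour over one flat feature vector."""
    target = query["context"] + query["details"]
    return min(cases, key=lambda c: distance(c["context"] + c["details"], target))


def context_guided_nearest(cases: List[Case], query: Case, shortlist: int = 2) -> Case:
    """Shortlist by contextual similarity, then pick the closest on details."""
    by_context = sorted(cases, key=lambda c: distance(c["context"], query["context"]))
    candidates = by_context[:shortlist]
    return min(candidates, key=lambda c: distance(c["details"], query["details"]))


if __name__ == "__main__":
    library = [
        {"context": [1.0], "details": [10.0]},
        {"context": [1.2], "details": [5.0]},
        {"context": [3.0], "details": [1.0]},
    ]
    problem = {"context": [1.0], "details": [1.5]}
    print(flat_nearest(library, problem))
    print(context_guided_nearest(library, problem))
```

    In this toy data the flat search picks a case from a dissimilar context because its details happen to match, whereas the context-guided search stays within the contextually similar cases.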

    Understanding the Limitations of CNN-based Absolute Camera Pose Regression

    Visual localization is the task of accurate camera pose estimation in a known scene. It is a key problem in computer vision and robotics, with applications including self-driving cars, Structure-from-Motion, SLAM, and Mixed Reality. Traditionally, the localization problem has been tackled using 3D geometry. Recently, end-to-end approaches based on convolutional neural networks have become popular. These methods learn to directly regress the camera pose from an input image. However, they do not achieve the same level of pose accuracy as 3D structure-based methods. To understand this behavior, we develop a theoretical model for camera pose regression. We use our model to predict failure cases for pose regression techniques and verify our predictions through experiments. We furthermore use our model to show that pose regression is more closely related to pose approximation via image retrieval than to accurate pose estimation via 3D structure. A key result is that current approaches do not consistently outperform a handcrafted image retrieval baseline. This clearly shows that additional research is needed before pose regression algorithms are ready to compete with structure-based methods. Comment: Initial version of a paper accepted to CVPR 201

    Neural Distributed Autoassociative Memories: A Survey

    Introduction. Neural network models of autoassociative, distributed memory allow storage and retrieval of many items (vectors) where the number of stored items can exceed the vector dimension (the number of neurons in the network). This opens the possibility of a sublinear time search (in the number of stored items) for approximate nearest neighbors among vectors of high dimension. The purpose of this paper is to review models of autoassociative, distributed memory that can be naturally implemented by neural networks (mainly with local learning rules and iterative dynamics based on information locally available to neurons). Scope. The survey is focused mainly on the networks of Hopfield, Willshaw and Potts, that have connections between pairs of neurons and operate on sparse binary vectors. We discuss not only autoassociative memory, but also the generalization properties of these networks. We also consider neural networks with higher-order connections and networks with a bipartite graph structure for non-binary data with linear constraints. Conclusions. In conclusion we discuss the relations to similarity search, advantages and drawbacks of these techniques, and topics for further research. An interesting and still not completely resolved question is whether neural autoassociative memories can search for approximate nearest neighbors faster than other index structures for similarity search, in particular for the case of very high dimensional vectors. Comment: 31 pages
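
    To make the idea of autoassociative recall concrete, here is a minimal sketch of a classic Hopfield network with a Hebbian (local) learning rule and iterative, one-neuron-at-a-time updates. It uses dense bipolar (+1/-1) vectors for simplicity, whereas the survey focuses on sparse binary vectors; the function names `train` and `recall` and the toy patterns are assumptions for illustration.

```python
from typing import List


def train(patterns: List[List[int]]) -> List[List[int]]:
    """Hebbian learning: w[i][j] sums p[i] * p[j] over stored patterns.

    Self-connections (i == j) are left at zero.
    """
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += pattern[i] * pattern[j]
    return weights


def recall(weights: List[List[int]], probe: List[int], max_sweeps: int = 10) -> List[int]:
    """Update neurons one at a time until no neuron changes state."""
    state = list(probe)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            field = sum(weights[i][j] * state[j] for j in range(n))
            # A zero input field leaves the neuron in its current state.
            new_value = state[i] if field == 0 else (1 if field > 0 else -1)
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


if __name__ == "__main__":
    stored = [
        [1, 1, 1, 1, -1, -1, -1, -1],
        [1, -1, 1, -1, 1, -1, 1, -1],
    ]
    memory = train(stored)
    noisy = [-1, 1, 1, 1, -1, -1, -1, -1]  # first pattern with one flipped bit
    print(recall(memory, noisy))
```

    Recall here acts as an approximate nearest-neighbour lookup: the corrupted probe settles on the closest stored pattern without comparing it against each stored item explicitly.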

    Investigating the use of semantic technologies in spatial mapping applications

    Semantic Web Technologies are ideally suited to build context-aware information retrieval applications. However, the geospatial aspect of context awareness presents unique challenges such as the semantic modelling of geographical references for efficient handling of spatial queries, the reconciliation of the heterogeneity at the semantic and geo-representation levels, maintaining the quality of service and scalability of communicating, and the efficient rendering of the spatial queries' results. In this paper, we describe the modelling decisions taken to solve these challenges by analysing our implementation of an intelligent planning and recommendation tool that provides location-aware advice for a specific application domain. This paper contributes to the methodology of integrating heterogeneous geo-referenced data into semantic knowledgebases, and also proposes mechanisms for efficient spatial interrogation of the semantic knowledgebase and optimising the rendering of the dynamically retrieved context-relevant information on a web frontend.