Searching for comets on the World Wide Web: The orbit of 17P/Holmes from the behavior of photographers
We performed an image search for "Comet Holmes," using the Yahoo Web search
engine, on 2010 April 1. Thousands of images were returned. We astrometrically
calibrated (and therefore vetted) the images using the Astrometry.net system.
The calibrated image pointings form a set of data points to which we can fit a
test-particle orbit in the Solar System, marginalizing over image dates and
detecting outliers. The approach is Bayesian and the model is, in essence, a
model of how comet astrophotographers point their instruments. In this work, we
do not measure the position of the comet within each image, but rather use the
celestial position of the whole image to infer the orbit. We find very strong
probabilistic constraints on the orbit, although the inferred orbit is slightly
offset from the JPL ephemeris, probably due to limitations of our model. Hyperparameters of the
model constrain the reliability of date meta-data and where in the image
astrophotographers place the comet; we find that ~70 percent of the meta-data
are correct and that the comet typically appears in the central third of the
image footprint. This project demonstrates that discoveries and measurements
can be made using data of extreme heterogeneity and unknown provenance. As the
size and diversity of astronomical data sets continue to grow, approaches like
ours will become more essential. This project also demonstrates that the Web is
an enormous repository of astronomical information, and that if an object has
been given a name and photographed thousands of times by observers who post
their images on the Web, we can (re-)discover it and infer its dynamical
properties.
Comment: As published. Changes in v2: data-driven initialization rather than
JPL; added figures; clarified text.
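The abstract describes a Bayesian model in which some date meta-data are wrong and the corresponding images are treated as outliers, with a hyperparameter for the fraction of correct meta-data. A minimal sketch of that outlier idea, assuming a two-component mixture: a Gaussian around the comet position predicted by a trial orbit for trustworthy images, and a uniform distribution over the whole sky for the rest. The function names, the Gaussian width `sigma_deg`, and the precomputed separations are hypothetical; the paper's model, which also fits the orbit and marginalizes over image dates, is richer than this.

```python
import math

# Area of the celestial sphere in square degrees (about 41,253 deg^2).
SKY_AREA_DEG2 = 4.0 * math.pi * (180.0 / math.pi) ** 2


def _densities(sep_deg, sigma_deg):
    """Return (good, bad) densities per square degree for one image.

    good: isotropic 2D Gaussian around the predicted comet position
          (small-angle approximation).
    bad:  uniform over the whole sky (image with wrong date meta-data).
    """
    p_good = math.exp(-0.5 * (sep_deg / sigma_deg) ** 2) / (2.0 * math.pi * sigma_deg ** 2)
    p_bad = 1.0 / SKY_AREA_DEG2
    return p_good, p_bad


def log_likelihood(separations_deg, f_good, sigma_deg):
    """Mixture log-likelihood; f_good must lie strictly between 0 and 1.

    separations_deg: angular distance between each image centre and the
    comet position predicted by a (hypothetical) trial orbit at that date.
    """
    total = 0.0
    for s in separations_deg:
        p_good, p_bad = _densities(s, sigma_deg)
        total += math.log(f_good * p_good + (1.0 - f_good) * p_bad)
    return total


def best_fraction(separations_deg, sigma_deg, steps=99):
    """Grid-search the fraction of trustworthy images that maximizes the likelihood."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda f: log_likelihood(separations_deg, f, sigma_deg))


def outlier_probability(sep_deg, f_good, sigma_deg):
    """Posterior probability that one image belongs to the uniform (outlier) component."""
    p_good, p_bad = _densities(sep_deg, sigma_deg)
    bad = (1.0 - f_good) * p_bad
    return bad / (f_good * p_good + bad)
```

For example, with seven images pointing within half a degree of the prediction and three pointing 90 degrees away, `best_fraction(seps, 1.0)` lands near 0.7, and the three distant images get an outlier probability close to 1. Using a uniform background component lets badly dated images be down-weighted automatically instead of being clipped by hand.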
TopSig: Topology Preserving Document Signatures
Performance comparisons between File Signatures and Inverted Files for text
retrieval have previously shown several significant shortcomings of file
signatures relative to inverted files. The inverted file approach underpins
most state-of-the-art search engine algorithms, such as Language and
Probabilistic models. It has been widely accepted that traditional file
signatures are inferior alternatives to inverted files. This paper describes
TopSig, a new approach to the construction of file signatures. Many advances in
semantic hashing and dimensionality reduction have been made in recent times,
but these have not so far been linked to general-purpose, signature-file-based
search engines. This paper introduces a different signature file approach that
builds upon and extends these recent advances. We are able to demonstrate
significant improvements in the performance of signature-file-based indexing
and retrieval, with performance comparable to that of state-of-the-art
inverted-file-based systems, including Language models and BM25. These findings
suggest that file signatures offer a viable alternative to inverted files in
suitable settings and, from a theoretical perspective, position the file
signature model within the class of Vector Space retrieval models.
Comment: 12 pages, 8 figures, CIKM 201
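TopSig builds on semantic hashing and dimensionality reduction to produce compact document signatures. A minimal sketch of that general family, assuming a plain random-projection scheme: each term gets a random +/-1 vector, a document's signature keeps only the signs of its summed term vectors, and retrieval ranks documents by Hamming distance to the query signature. This is a generic illustration, not TopSig's exact construction; the term weighting (raw counts), vector density, signature length, and all function names are assumptions.

```python
import random


def make_projection(vocab, bits, seed=0):
    """Assign each vocabulary term a random +/-1 vector of length `bits`."""
    rng = random.Random(seed)
    return {term: [rng.choice((-1, 1)) for _ in range(bits)] for term in vocab}


def signature(doc_terms, projection, bits):
    """Sum the term vectors (repeated terms count repeatedly), then keep the signs.

    Bit i of the returned int is set when component i of the sum is positive.
    """
    acc = [0] * bits
    for term in doc_terms:
        vec = projection.get(term)
        if vec is None:
            continue  # out-of-vocabulary terms are ignored in this sketch
        for i, v in enumerate(vec):
            acc[i] += v
    sig = 0
    for i, value in enumerate(acc):
        if value > 0:
            sig |= 1 << i
    return sig


def hamming(a, b):
    """Number of differing bits between two signatures (XOR, then popcount)."""
    return bin(a ^ b).count("1")


def rank(query_sig, doc_sigs):
    """Return document ids ordered by Hamming distance to the query signature."""
    return sorted(doc_sigs, key=lambda doc_id: hamming(query_sig, doc_sigs[doc_id]))
```

Usage: build `proj = make_projection(vocab, 64)`, compute one signature per document, and call `rank(signature(query_terms, proj, 64), sigs)`. Sign quantization of random projections roughly preserves angular similarity between the underlying term vectors, and comparing signatures costs only an XOR and a bit count, which is what makes signature files cheap to scan.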
The Best Trail Algorithm for Assisted Navigation of Web Sites
We present an algorithm called the Best Trail Algorithm, which helps solve
the hypertext navigation problem by automating the construction of memex-like
trails through the corpus. The algorithm performs a probabilistic best-first
expansion of a set of navigation trees to find relevant and compact trails. We
describe the implementation of the algorithm, scoring methods for trails,
filtering algorithms and a new metric called potential gain, which
measures the potential of a page for future navigation opportunities.
Comment: 11 pages, 11 figures
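The abstract describes a best-first expansion of navigation trees that favours relevant and compact trails. A minimal sketch of best-first trail expansion over a link graph, assuming a deterministic search and a hypothetical scoring rule (mean page relevance along the trail, so that long trails padded with irrelevant pages score lower). The paper's algorithm is probabilistic and uses its own trail-scoring methods, filtering, and the potential gain metric, none of which are reproduced here; `links`, `relevance`, and `best_trails` are illustrative names.

```python
import heapq


def best_trails(links, relevance, start, max_len=4, k=3, budget=100):
    """Best-first expansion of trails starting at `start`.

    links: dict mapping a page to the pages it links to.
    relevance: dict mapping a page to a relevance score (hypothetical).
    Returns up to k trails (tuples of pages, length >= 2), best first.
    """

    def score(trail):
        # Hypothetical rule: average relevance favours relevant, compact trails.
        return sum(relevance.get(p, 0.0) for p in trail) / len(trail)

    # heapq is a min-heap, so scores are stored negated.
    frontier = [(-score((start,)), (start,))]
    found = []
    expansions = 0
    while frontier and expansions < budget:
        neg_score, trail = heapq.heappop(frontier)
        expansions += 1
        if len(trail) > 1:
            found.append((-neg_score, trail))
        if len(trail) >= max_len:
            continue
        for nxt in links.get(trail[-1], []):
            if nxt not in trail:  # skip cycles
                new_trail = trail + (nxt,)
                heapq.heappush(frontier, (-score(new_trail), new_trail))
    found.sort(key=lambda item: item[0], reverse=True)
    return [trail for _, trail in found[:k]]
```

The `budget` cap bounds the number of expansions, so on large sites the search returns the best trails found so far rather than enumerating every path.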
CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines
Based on the information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS think tanks and workshops, this document reports on the state of the art in multimedia content search from both a technical and a socio-economic perspective.
The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives that measure the performance of multimedia search engines.
From a socio-economic perspective, we take stock of the impact and legal consequences of these technical advances and point out future directions of research.