Learning Deep Visual Object Models From Noisy Web Data: How to Make it Work
Deep networks thrive when trained on large scale data collections. This has
given ImageNet a central role in the development of deep architectures for
visual object classification. However, ImageNet was created during a specific
period in time, and as such it is prone to aging, as well as dataset bias
issues. Moving beyond fixed training datasets will lead to more robust visual
systems, especially when deployed on robots in new environments which must
train on the objects they encounter there. To make this possible, it is
important to break free from the need for manual annotators. Recent work has
begun to investigate how to use the massive amount of images available on the
Web in place of manual image annotations. We contribute to this research thread
with two findings: (1) a study correlating a given level of label noise with
the expected drop in accuracy, for two deep architectures and two different
types of noise, which clearly identifies GoogLeNet as a suitable architecture
for learning from Web data; (2) a recipe for the creation of Web datasets with
minimal noise and maximum visual variability, based on a visual and natural
language processing concept expansion strategy. By combining these two results,
we obtain a method for learning powerful deep object models automatically from
the Web. We confirm the effectiveness of our approach through object
categorization experiments using our Web-derived version of ImageNet on a
popular robot vision benchmark database, and on a lifelong object discovery
task on a mobile robot.
Comment: 8 pages, 7 figures, 3 tables
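The noise-versus-accuracy study described in finding (1) presupposes a way to inject a controlled fraction of wrong labels into a training set. A minimal sketch of that step, assuming uniform label noise (the function and parameter names here are illustrative, not from the paper):

```python
import random

def inject_label_noise(labels, noise_level, num_classes, rng=None):
    """Return a copy of `labels` in which a `noise_level` fraction of
    entries is replaced by a uniformly drawn *different* class,
    simulating uniform label noise in a training set."""
    rng = rng or random.Random(0)
    noisy = list(labels)
    n_flip = int(round(noise_level * len(labels)))
    for i in rng.sample(range(len(labels)), n_flip):
        # Pick any class other than the current one, uniformly.
        wrong = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(wrong)
    return noisy
```

Training the same architecture on the clean and noisified labels, and plotting test accuracy against `noise_level`, yields the kind of noise-to-accuracy-drop curve the study describes.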
The Precision Array for Probing the Epoch of Reionization: 8 Station Results
We are developing the Precision Array for Probing the Epoch of Reionization
(PAPER) to detect 21cm emission from the early Universe, when the first stars
and galaxies were forming. We describe the overall experiment strategy and
architecture and summarize two PAPER deployments: a 4-antenna array in the
low-RFI environment of Western Australia and an 8-antenna array at our
prototyping site in Green Bank, WV. From these activities we report on system
performance, including primary beam model verification, dependence of system
gain on ambient temperature, measurements of receiver and overall system
temperatures, and characterization of the RFI environment at each deployment
site.
We present an all-sky map synthesized between 139 MHz and 174 MHz using data
from both arrays that reaches down to 80 mJy (4.9 K, for a beam size of 2.15e-5
steradians at 154 MHz), with a 10 mJy (620 mK) thermal noise level that
indicates what would be achievable with better foreground subtraction. We
calculate angular power spectra () in a cold patch and determine them
to be dominated by point sources, but with contributions from galactic
synchrotron emission at lower radio frequencies and angular wavemodes. Although
the cosmic variance of foregrounds dominates errors in these power spectra, we
measure a thermal noise level of 310 mK at $\ell = 100$ for a 1.46-MHz band
centered at 164.5 MHz. This sensitivity level is approximately three orders of
magnitude in temperature above the level of the fluctuations in 21cm emission
associated with reionization.
Comment: 13 pages, 14 figures, submitted to AJ. Revision 2 corrects a scaling
error in the x axis of Fig. 12 that lowers the calculated power spectrum
temperature.
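The quoted system-temperature measurements relate to the achievable map noise through the ideal radiometer equation. A minimal sketch under a single-element radiometer assumption (the function name is illustrative; a real interferometric forecast would also fold in the number of baselines and the beam):

```python
import math

def radiometer_noise_mk(t_sys_k, bandwidth_hz, integration_s):
    """Ideal radiometer equation: thermal noise fluctuation
    dT = T_sys / sqrt(bandwidth * integration_time), returned in mK.
    Ignores interferometric sensitivity gains from combining baselines."""
    return 1e3 * t_sys_k / math.sqrt(bandwidth_hz * integration_s)
```

The sqrt scaling is the key design lever: quadrupling either bandwidth or integration time halves the thermal noise, which is why narrow sub-bands (such as the 1.46-MHz band above) carry a correspondingly higher noise floor.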
Finding Relevant Answers in Software Forums
Abstract—Online software forums provide a huge amount of valuable content. Developers and users often ask questions and receive answers from such forums. The availability of a vast amount of thread discussions in forums provides ample opportunities for knowledge acquisition and summarization. For a given search query, current search engines use a traditional information retrieval approach to extract webpages containing…
Index ordering by query-independent measures
Conventional approaches to information retrieval search through all applicable entries in an inverted file for a particular collection in order to find those documents with the highest scores. For particularly large collections this may be extremely time consuming.
A solution to this problem is to search only a limited amount of the collection at query time, in order to speed up the retrieval process. In doing this we can also limit the loss in retrieval efficacy (in terms of accuracy of results). The way we achieve this is to first identify the most “important” documents within the collection, and sort documents within inverted file lists in order of this “importance”. In this way we limit the amount of information to be searched at query time by eliminating documents of lesser importance, which not only makes the search more efficient, but also limits loss in retrieval accuracy. Our experiments, carried out on the TREC Terabyte collection, report significant savings, in terms of number of postings examined, without significant loss of effectiveness when based on several measures of importance used in isolation, and in combination. Our results point to several ways in which the computation cost of searching large collections of documents can be significantly reduced.
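The mechanism the abstract describes, ordering each posting list by a query-independent importance score and then examining only a prefix of each list at query time, can be sketched as follows (a toy illustration with invented names; here documents are scored by summed importance alone, whereas a real system would combine this with a query-dependent relevance score such as BM25):

```python
def build_impact_ordered_index(doc_importance, postings):
    """Sort each term's posting list by a query-independent importance
    score (e.g. PageRank or in-link count), descending."""
    return {
        term: sorted(docs, key=lambda d: doc_importance[d], reverse=True)
        for term, docs in postings.items()
    }

def search_top_k(index, query_terms, doc_importance, k=3, budget=5):
    """Examine at most `budget` postings per query term; because lists are
    importance-ordered, the skipped tail holds the least important
    documents, so early termination costs little effectiveness."""
    scores = {}
    for term in query_terms:
        for doc in index.get(term, [])[:budget]:
            scores[doc] = scores.get(doc, 0.0) + doc_importance[doc]
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

With `budget` set below the list lengths, the number of postings examined per query is capped at `budget * len(query_terms)`, which is the source of the savings the experiments report.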
Community rotorcraft air transportation benefits and opportunities
Information about rotorcraft that will assist community planners in assessing and planning for the use of rotorcraft transportation in their communities is provided. Information useful to helicopter researchers, manufacturers, and operators concerning helicopter opportunities and benefits is also given. Three primary topics are discussed: the current status and future projections of rotorcraft technology, and the comparison of that technology with other transportation vehicles; the community benefits of promising rotorcraft transportation opportunities; and the integration and interfacing considerations between rotorcraft and other transportation vehicles. Helicopter applications in a number of business and public service fields are examined in various geographical settings.
Embedding Web-based Statistical Translation Models in Cross-Language Information Retrieval
Although more and more language pairs are covered by machine translation
services, there are still many pairs that lack translation resources.
Cross-language information retrieval (CLIR) is an application which needs
translation functionality of a relatively low level of sophistication, since
current models for information retrieval (IR) are still based on a
bag-of-words representation. The Web provides a vast resource for the automatic construction
of parallel corpora which can be used to train statistical translation models
automatically. The resulting translation models can be embedded in several ways
in a retrieval model. In this paper, we will investigate the problem of
automatically mining parallel texts from the Web and different ways of
integrating the translation models within the retrieval process. Our
experiments on standard test collections for CLIR show that the Web-based
translation models can surpass commercial MT systems in CLIR tasks. These
results open the perspective of constructing a fully automatic query
translation device for CLIR at a very low cost.
Comment: 37 pages
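One simple way a statistical translation model can be embedded in a bag-of-words retrieval model is to expand each source-language query term into its most probable target-language translations, carrying the translation probabilities forward as term weights. A minimal sketch under that assumption (the function and model layout are illustrative, not the paper's exact formulation):

```python
def translate_query(query_terms, translation_model, top_n=2):
    """Expand each source term into its `top_n` most probable
    target-language translations, using p(target | source) from the
    statistical translation model as the term weight in the
    bag-of-words query."""
    weighted = {}
    for src in query_terms:
        candidates = sorted(translation_model.get(src, {}).items(),
                            key=lambda kv: kv[1], reverse=True)[:top_n]
        for tgt, prob in candidates:
            # Accumulate in case two source terms share a translation.
            weighted[tgt] = weighted.get(tgt, 0.0) + prob
    return weighted
```

Keeping several weighted translations per term, rather than a single best guess, is what lets the retrieval model absorb translation ambiguity; source terms absent from the model simply contribute nothing here, though a practical system might pass them through untranslated.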
What Users Ask a Search Engine: Analyzing One Billion Russian Question Queries
We analyze the question queries submitted to a large commercial web search engine to get insights about what people ask, and to better tailor the search results to the users’ needs. Based on a dataset of about one billion question queries submitted during the year 2012, we investigate askers’ querying behavior with the support of automatic query categorization. While the importance of question queries is likely to increase, at present they only make up 3–4% of the total search traffic. Since questions are such a small part of the query stream and are more likely to be unique than shorter queries, clickthrough information is typically rather sparse. Thus, query categorization methods based on the categories of clicked web documents do not work well for questions. As an alternative, we propose a robust question query classification method that uses the labeled questions from a large community question answering platform (CQA) as a training set. The resulting classifier is then transferred to the web search questions. Even though questions on CQA platforms tend to differ from web search questions, our categorization method proves competitive with strong baselines with respect to classification accuracy. To show the scalability of our proposed method we apply the classifiers to about one billion question queries and discuss the trade-offs between performance and accuracy that different classification models offer. Our findings reveal what people ask a search engine and also how this contrasts with behavior on a CQA platform.
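The transfer setup the abstract describes, training a text categorizer on CQA questions whose categories are known and then applying it to search-engine question queries, can be sketched with a simple multinomial Naive Bayes model (the paper does not specify this classifier; it is used here only to make the train-on-CQA, classify-search-queries flow concrete):

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled_questions):
    """Train a multinomial Naive Bayes categorizer on (question, category)
    pairs, e.g. harvested from a CQA platform where askers pick a
    category for each question."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, cat in labeled_questions:
        class_counts[cat] += 1
        for w in text.lower().split():
            word_counts[cat][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def classify(query, model):
    """Assign the category maximizing log p(cat) + sum log p(word | cat),
    with Laplace smoothing so unseen words do not zero out a class."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for cat in class_counts:
        lp = math.log(class_counts[cat] / total)
        denom = sum(word_counts[cat].values()) + len(vocab)
        for w in query.lower().split():
            lp += math.log((word_counts[cat][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = cat, lp
    return best
```

Because this model needs only the query text, not clickthrough data, it sidesteps the sparsity problem the abstract identifies for click-based categorization of rare question queries.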