Learning Deep Visual Object Models From Noisy Web Data: How to Make it Work
Deep networks thrive when trained on large scale data collections. This has
given ImageNet a central role in the development of deep architectures for
visual object classification. However, ImageNet was created during a specific
period in time, and as such it is prone to aging, as well as dataset bias
issues. Moving beyond fixed training datasets will lead to more robust visual
systems, especially when deployed on robots in new environments which must
train on the objects they encounter there. To make this possible, it is
important to break free from the need for manual annotators. Recent work has
begun to investigate how to use the massive amount of images available on the
Web in place of manual image annotations. We contribute to this research thread
with two findings: (1) a study correlating a given level of label noise with
the expected drop in accuracy, for two deep architectures and two different
types of noise, which clearly identifies GoogLeNet as a suitable architecture
for learning from Web data; (2) a recipe for the creation of Web datasets with
minimal noise and maximum visual variability, based on a visual and natural
language processing concept expansion strategy. By combining these two results,
we obtain a method for learning powerful deep object models automatically from
the Web. We confirm the effectiveness of our approach through object
categorization experiments using our Web-derived version of ImageNet on a
popular robot vision benchmark database, and on a lifelong object discovery
task on a mobile robot.
Comment: 8 pages, 7 figures, 3 tables
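The noise-versus-accuracy study described in finding (1) presupposes a way to inject a controlled fraction of wrong labels into a training set. A minimal sketch of that step, assuming uniform label noise (the function and parameter names here are illustrative, not from the paper):

```python
import random

def inject_label_noise(labels, noise_level, num_classes, rng=None):
    """Return a copy of `labels` in which a `noise_level` fraction of
    entries is replaced by a uniformly drawn *different* class,
    simulating uniform label noise in a training set."""
    rng = rng or random.Random(0)
    noisy = list(labels)
    n_flip = int(round(noise_level * len(labels)))
    for i in rng.sample(range(len(labels)), n_flip):
        # Pick any class other than the current one, uniformly.
        wrong = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(wrong)
    return noisy
```

Training the same architecture on the clean and noisified labels, and plotting test accuracy against `noise_level`, yields the kind of noise-to-accuracy-drop curve the study describes.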
The Precision Array for Probing the Epoch of Reionization: 8 Station Results
We are developing the Precision Array for Probing the Epoch of Reionization
(PAPER) to detect 21cm emission from the early Universe, when the first stars
and galaxies were forming. We describe the overall experiment strategy and
architecture and summarize two PAPER deployments: a 4-antenna array in the
low-RFI environment of Western Australia and an 8-antenna array at our
prototyping site in Green Bank, WV. From these activities we report on system
performance, including primary beam model verification, dependence of system
gain on ambient temperature, measurements of receiver and overall system
temperatures, and characterization of the RFI environment at each deployment
site.
We present an all-sky map synthesized between 139 MHz and 174 MHz using data
from both arrays that reaches down to 80 mJy (4.9 K, for a beam size of 2.15e-5
steradians at 154 MHz), with a 10 mJy (620 mK) thermal noise level that
indicates what would be achievable with better foreground subtraction. We
calculate angular power spectra () in a cold patch and determine them
to be dominated by point sources, but with contributions from galactic
synchrotron emission at lower radio frequencies and angular wavemodes. Although
the cosmic variance of foregrounds dominates errors in these power spectra, we
measure a thermal noise level of 310 mK at $\ell = 100$ for a 1.46-MHz band
centered at 164.5 MHz. This sensitivity level is approximately three orders of
magnitude in temperature above the level of the fluctuations in 21cm emission
associated with reionization.
Comment: 13 pages, 14 figures, submitted to AJ. Revision 2 corrects a scaling
error in the x axis of Fig. 12 that lowers the calculated power spectrum
temperature.
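The quoted system-temperature measurements relate to the achievable map noise through the ideal radiometer equation. A minimal sketch under a single-element radiometer assumption (the function name is illustrative; a real interferometric forecast would also fold in the number of baselines and the beam):

```python
import math

def radiometer_noise_mk(t_sys_k, bandwidth_hz, integration_s):
    """Ideal radiometer equation: thermal noise fluctuation
    dT = T_sys / sqrt(bandwidth * integration_time), returned in mK.
    Ignores interferometric sensitivity gains from combining baselines."""
    return 1e3 * t_sys_k / math.sqrt(bandwidth_hz * integration_s)
```

The sqrt scaling is the key design lever: quadrupling either bandwidth or integration time halves the thermal noise, which is why narrow sub-bands (such as the 1.46-MHz band above) carry a correspondingly higher noise floor.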
Finding Relevant Answers in Software Forums
Abstract—Online software forums provide a huge amount of valuable content. Developers and users often ask questions and receive answers from such forums. The availability of a vast amount of thread discussions in forums provides ample opportunities for knowledge acquisition and summarization. For a given search query, current search engines use a traditional information retrieval approach to extract webpages containing…
Index ordering by query-independent measures
Conventional approaches to information retrieval search through all applicable entries in an inverted file for a particular collection in order to find those documents with the highest scores. For particularly large collections this may be extremely time consuming.
A solution to this problem is to search only a limited amount of the collection at query time, in order to speed up the retrieval process. In doing this we can also limit the loss in retrieval efficacy (in terms of accuracy of results). The way we achieve this is to first identify the most “important” documents within the collection, and sort documents within inverted file lists in order of this “importance”. In this way we limit the amount of information to be searched at query time by eliminating documents of lesser importance, which not only makes the search more efficient, but also limits loss in retrieval accuracy. Our experiments, carried out on the TREC Terabyte collection, report significant savings, in terms of number of postings examined, without significant loss of effectiveness when based on several measures of importance used in isolation, and in combination. Our results point to several ways in which the computation cost of searching large collections of documents can be significantly reduced.
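The mechanism the abstract describes, ordering each posting list by a query-independent importance score and then examining only a prefix of each list at query time, can be sketched as follows (a toy illustration with invented names; here documents are scored by summed importance alone, whereas a real system would combine this with a query-dependent relevance score such as BM25):

```python
def build_impact_ordered_index(doc_importance, postings):
    """Sort each term's posting list by a query-independent importance
    score (e.g. PageRank or in-link count), descending."""
    return {
        term: sorted(docs, key=lambda d: doc_importance[d], reverse=True)
        for term, docs in postings.items()
    }

def search_top_k(index, query_terms, doc_importance, k=3, budget=5):
    """Examine at most `budget` postings per query term; because lists are
    importance-ordered, the skipped tail holds the least important
    documents, so early termination costs little effectiveness."""
    scores = {}
    for term in query_terms:
        for doc in index.get(term, [])[:budget]:
            scores[doc] = scores.get(doc, 0.0) + doc_importance[doc]
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

With `budget` set below the list lengths, the number of postings examined per query is capped at `budget * len(query_terms)`, which is the source of the savings the experiments report.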
Community rotorcraft air transportation benefits and opportunities
Information about rotorcraft that will assist community planners in assessing and planning for the use of rotorcraft transportation in their communities is provided. Information useful to helicopter researchers, manufacturers, and operators concerning helicopter opportunities and benefits is also given. Three primary topics are discussed: the current status and future projections of rotorcraft technology, and the comparison of that technology with other transportation vehicles; the community benefits of promising rotorcraft transportation opportunities; and the integration and interfacing considerations between rotorcraft and other transportation vehicles. Helicopter applications in a number of business and public service fields are examined in various geographical settings.
Embedding Web-based Statistical Translation Models in Cross-Language Information Retrieval
Although more and more language pairs are covered by machine translation
services, there are still many pairs that lack translation resources.
Cross-language information retrieval (CLIR) is an application which needs
translation functionality of a relatively low level of sophistication, since
current models for information retrieval (IR) are still based on a
bag-of-words representation. The Web provides a vast resource for the automatic construction
of parallel corpora which can be used to train statistical translation models
automatically. The resulting translation models can be embedded in several ways
in a retrieval model. In this paper, we will investigate the problem of
automatically mining parallel texts from the Web and different ways of
integrating the translation models within the retrieval process. Our
experiments on standard test collections for CLIR show that the Web-based
translation models can surpass commercial MT systems in CLIR tasks. These
results open the perspective of constructing a fully automatic query
translation device for CLIR at a very low cost.
Comment: 37 pages
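One simple way a statistical translation model can be embedded in a bag-of-words retrieval model is to expand each source-language query term into its most probable target-language translations, carrying the translation probabilities forward as term weights. A minimal sketch under that assumption (the function and model layout are illustrative, not the paper's exact formulation):

```python
def translate_query(query_terms, translation_model, top_n=2):
    """Expand each source term into its `top_n` most probable
    target-language translations, using p(target | source) from the
    statistical translation model as the term weight in the
    bag-of-words query."""
    weighted = {}
    for src in query_terms:
        candidates = sorted(translation_model.get(src, {}).items(),
                            key=lambda kv: kv[1], reverse=True)[:top_n]
        for tgt, prob in candidates:
            # Accumulate in case two source terms share a translation.
            weighted[tgt] = weighted.get(tgt, 0.0) + prob
    return weighted
```

Keeping several weighted translations per term, rather than a single best guess, is what lets the retrieval model absorb translation ambiguity; source terms absent from the model simply contribute nothing here, though a practical system might pass them through untranslated.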
What Users Ask a Search Engine: Analyzing One Billion Russian Question Queries
We analyze the question queries submitted to a large commercial web search engine to get insights about what people ask, and to better tailor the search results to the users’ needs. Based on a dataset of about one billion question queries submitted during the year 2012, we investigate askers’ querying behavior with the support of automatic query categorization. While the importance of question queries is likely to increase, at present they only make up 3–4% of the total search traffic. Since questions are such a small part of the query stream and are more likely to be unique than shorter queries, clickthrough information is typically rather sparse. Thus, query categorization methods based on the categories of clicked web documents do not work well for questions. As an alternative, we propose a robust question query classification method that uses the labeled questions from a large community question answering platform (CQA) as a training set. The resulting classifier is then transferred to the web search questions. Even though questions on CQA platforms tend to differ from web search questions, our categorization method proves competitive with strong baselines with respect to classification accuracy. To show the scalability of our proposed method we apply the classifiers to about one billion question queries and discuss the trade-offs between performance and accuracy that different classification models offer. Our findings reveal what people ask a search engine and also how this contrasts with behavior on a CQA platform.
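The transfer setup the abstract describes, training a text categorizer on CQA questions whose categories are known and then applying it to search-engine question queries, can be sketched with a simple multinomial Naive Bayes model (the paper does not specify this classifier; it is used here only to make the train-on-CQA, classify-search-queries flow concrete):

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled_questions):
    """Train a multinomial Naive Bayes categorizer on (question, category)
    pairs, e.g. harvested from a CQA platform where askers pick a
    category for each question."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, cat in labeled_questions:
        class_counts[cat] += 1
        for w in text.lower().split():
            word_counts[cat][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def classify(query, model):
    """Assign the category maximizing log p(cat) + sum log p(word | cat),
    with Laplace smoothing so unseen words do not zero out a class."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for cat in class_counts:
        lp = math.log(class_counts[cat] / total)
        denom = sum(word_counts[cat].values()) + len(vocab)
        for w in query.lower().split():
            lp += math.log((word_counts[cat][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = cat, lp
    return best
```

Because this model needs only the query text, not clickthrough data, it sidesteps the sparsity problem the abstract identifies for click-based categorization of rare question queries.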