
    Learning Deep Visual Object Models From Noisy Web Data: How to Make it Work

    Full text link
    Deep networks thrive when trained on large-scale data collections. This has given ImageNet a central role in the development of deep architectures for visual object classification. However, ImageNet was created during a specific period in time, and as such it is prone to aging as well as dataset bias issues. Moving beyond fixed training datasets will lead to more robust visual systems, especially when deployed on robots in new environments, where they must train on the objects they encounter. To make this possible, it is important to break free from the need for manual annotators. Recent work has begun to investigate how to use the massive amount of images available on the Web in place of manual image annotations. We contribute to this research thread with two findings: (1) a study correlating a given level of label noise to the expected drop in accuracy, for two deep architectures and two different types of noise, which clearly identifies GoogLeNet as a suitable architecture for learning from Web data; (2) a recipe for the creation of Web datasets with minimal noise and maximum visual variability, based on a visual and natural language processing concept expansion strategy. By combining these two results, we obtain a method for learning powerful deep object models automatically from the Web. We confirm the effectiveness of our approach through object categorization experiments using our Web-derived version of ImageNet on a popular robot vision benchmark database, and on a lifelong object discovery task on a mobile robot. Comment: 8 pages, 7 figures, 3 tables
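    As a rough illustration of the noise-versus-accuracy study described in point (1), the sketch below injects uniform label noise at increasing rates and measures the resulting accuracy drop. It is a minimal stand-in, not the paper's experiment: scikit-learn's digits dataset and logistic regression replace the Web-scale data and deep architectures (e.g. GoogLeNet) studied there.

```python
# Hypothetical sketch: measure accuracy drop as a function of label-noise level.
# The digits dataset and logistic regression stand in for the ImageNet-scale
# data and deep architectures studied in the paper.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def inject_uniform_noise(y, noise_level, n_classes, rng):
    """Flip a `noise_level` fraction of labels to a uniformly random class."""
    y_noisy = y.copy()
    flip = rng.random(len(y)) < noise_level
    y_noisy[flip] = rng.integers(0, n_classes, size=flip.sum())
    return y_noisy

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)

# Train on increasingly noisy labels; evaluate on the clean test set.
for noise in (0.0, 0.1, 0.2, 0.4):
    y_noisy = inject_uniform_noise(y_tr, noise, n_classes=10, rng=rng)
    clf = LogisticRegression(max_iter=2000).fit(X_tr, y_noisy)
    print(f"noise={noise:.0%}  test accuracy={clf.score(X_te, y_te):.3f}")
```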

    The Precision Array for Probing the Epoch of Reionization: 8 Station Results

    Full text link
    We are developing the Precision Array for Probing the Epoch of Reionization (PAPER) to detect 21cm emission from the early Universe, when the first stars and galaxies were forming. We describe the overall experiment strategy and architecture and summarize two PAPER deployments: a 4-antenna array in the low-RFI environment of Western Australia and an 8-antenna array at our prototyping site in Green Bank, WV. From these activities we report on system performance, including primary beam model verification, dependence of system gain on ambient temperature, measurements of receiver and overall system temperatures, and characterization of the RFI environment at each deployment site. We present an all-sky map synthesized between 139 MHz and 174 MHz using data from both arrays that reaches down to 80 mJy (4.9 K, for a beam size of 2.15e-5 steradians at 154 MHz), with a 10 mJy (620 mK) thermal noise level that indicates what would be achievable with better foreground subtraction. We calculate angular power spectra ($C_\ell$) in a cold patch and determine them to be dominated by point sources, but with contributions from galactic synchrotron emission at lower radio frequencies and angular wavemodes. Although the cosmic variance of foregrounds dominates errors in these power spectra, we measure a thermal noise level of 310 mK at $\ell = 100$ for a 1.46-MHz band centered at 164.5 MHz. This sensitivity level is approximately three orders of magnitude in temperature above the level of the fluctuations in 21cm emission associated with reionization. Comment: 13 pages, 14 figures, submitted to AJ. Revision 2 corrects a scaling error in the x axis of Fig. 12 that lowers the calculated power spectrum temperature
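    The quoted flux-to-temperature conversions can be checked against the standard Rayleigh–Jeans relation between flux density and brightness temperature. The relation and the worked numbers below are a standard radio-astronomy check, not taken from the abstract itself:

```latex
% Rayleigh-Jeans brightness temperature equivalent of a flux density S
% in a beam of solid angle \Omega at frequency \nu (standard relation):
\[
  T = \frac{S\,\lambda^{2}}{2 k_{B}\,\Omega}, \qquad \lambda = c/\nu .
\]
% At \nu = 154\,\mathrm{MHz}: \lambda \approx 1.95\,\mathrm{m}, so for
% S = 80\,\mathrm{mJy} = 8.0\times10^{-28}\,\mathrm{W\,m^{-2}\,Hz^{-1}}
% and \Omega = 2.15\times10^{-5}\,\mathrm{sr}:
\[
  T \approx \frac{8.0\times10^{-28} \times (1.95)^{2}}
                 {2 \times 1.38\times10^{-23} \times 2.15\times10^{-5}}
    \approx 5\,\mathrm{K},
\]
% consistent with the quoted 80 mJy ~ 4.9 K (and, scaling linearly,
% 10 mJy ~ 0.6 K, i.e. the quoted 620 mK).
```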

    Finding Relevant Answers in Software Forums

    Get PDF
    Abstract—Online software forums provide a huge amount of valuable content. Developers and users often ask questions and receive answers from such forums. The availability of a vast amount of thread discussions in forums provides ample opportunities for knowledge acquisition and summarization. For a given search query, current search engines use a traditional information retrieval approach to extract webpages containing…
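    The "traditional information retrieval approach" the abstract contrasts against can be illustrated with a minimal TF-IDF ranking over forum threads. This is a generic sketch, not the paper's method; the threads and query are invented placeholders.

```python
# Rank forum threads against a query by TF-IDF cosine similarity,
# the baseline keyword-matching approach the abstract refers to.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

threads = [
    "How do I fix a NullPointerException in Java?",
    "Best practices for unit testing legacy code",
    "NullPointerException when calling getString on a null object",
]
query = ["java nullpointerexception fix"]

vectorizer = TfidfVectorizer()
thread_vecs = vectorizer.fit_transform(threads)   # index the threads
query_vec = vectorizer.transform(query)           # vectorize the query

scores = cosine_similarity(query_vec, thread_vecs).ravel()
for rank, idx in enumerate(scores.argsort()[::-1], start=1):
    print(f"{rank}. score={scores[idx]:.3f}  {threads[idx]}")
```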

    Index ordering by query-independent measures

    Get PDF
    Conventional approaches to information retrieval search through all applicable entries in an inverted file for a particular collection in order to find those documents with the highest scores. For particularly large collections this may be extremely time consuming. A solution to this problem is to search only a limited portion of the collection at query time, in order to speed up the retrieval process. In doing this we can also limit the loss in retrieval efficacy (in terms of accuracy of results). The way we achieve this is to first identify the most “important” documents within the collection, and sort documents within inverted file lists in order of this “importance”. In this way we limit the amount of information to be searched at query time by eliminating documents of lesser importance, which not only makes the search more efficient, but also limits the loss in retrieval accuracy. Our experiments, carried out on the TREC Terabyte collection, report significant savings, in terms of the number of postings examined, without significant loss of effectiveness when based on several measures of importance used in isolation, and in combination. Our results point to several ways in which the computational cost of searching large collections of documents can be significantly reduced.
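    A minimal sketch of the idea described above: sort each inverted list by a query-independent "importance" score, then examine only the first few postings of each list at query time. The documents, scores, and scoring function are invented for illustration; the paper's actual importance measures and TREC Terabyte setup are not reproduced here.

```python
# Importance-ordered inverted lists with early termination.
from collections import defaultdict

docs = {
    1: "information retrieval with inverted files",
    2: "sorting inverted file lists by importance",
    3: "retrieval efficiency in large collections",
}
# Query-independent importance (e.g. a PageRank-like static score); invented values.
importance = {1: 0.9, 2: 0.4, 3: 0.7}

# Build an inverted index whose postings are ordered by importance, descending.
index = defaultdict(list)
for doc_id, text in docs.items():
    for term in set(text.split()):
        index[term].append(doc_id)
for term in index:
    index[term].sort(key=lambda d: importance[d], reverse=True)

def search(query, postings_budget=2):
    """Score documents while reading at most `postings_budget` postings per list."""
    scores = defaultdict(float)
    for term in query.split():
        for doc_id in index.get(term, [])[:postings_budget]:  # early termination
            scores[doc_id] += importance[doc_id]              # toy scoring function
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(search("inverted retrieval"))
```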

    Community rotorcraft air transportation benefits and opportunities

    Get PDF
    Information about rotorcraft that will assist community planners in assessing and planning for the use of rotorcraft transportation in their communities is provided. Information useful to helicopter researchers, manufacturers, and operators concerning helicopter opportunities and benefits is also given. Three primary topics are discussed: the current status and future projections of rotorcraft technology, and the comparison of that technology with other transportation vehicles; the community benefits of promising rotorcraft transportation opportunities; and the integration and interfacing considerations between rotorcraft and other transportation vehicles. Helicopter applications in a number of business and public service fields are examined in various geographical settings.

    Embedding Web-based Statistical Translation Models in Cross-Language Information Retrieval

    Get PDF
    Although more and more language pairs are covered by machine translation services, there are still many pairs that lack translation resources. Cross-language information retrieval (CLIR) is an application which needs translation functionality of a relatively low level of sophistication, since current models for information retrieval (IR) are still based on a bag-of-words representation. The Web provides a vast resource for the automatic construction of parallel corpora which can be used to train statistical translation models automatically. The resulting translation models can be embedded in several ways in a retrieval model. In this paper, we will investigate the problem of automatically mining parallel texts from the Web and different ways of integrating the translation models within the retrieval process. Our experiments on standard test collections for CLIR show that the Web-based translation models can surpass commercial MT systems in CLIR tasks. These results open the perspective of constructing a fully automatic query translation device for CLIR at a very low cost. Comment: 37 pages
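    One simple way to embed a statistical translation model in retrieval, as the abstract describes, is to expand each source-language query term into target-language terms weighted by translation probability p(t|s). The tiny probability table below is invented for illustration; in the paper such models are trained on parallel text mined from the Web.

```python
# Probabilistic dictionary-based query translation for CLIR.
translation_table = {
    # source term -> {target term: p(target | source)}; invented values
    "maison": {"house": 0.7, "home": 0.25, "household": 0.05},
    "rouge": {"red": 0.9, "ruby": 0.1},
}

def translate_query(query_terms, prob_threshold=0.05):
    """Build a weighted target-language query from a source-language one."""
    weighted_query = {}
    for s in query_terms:
        for t, p in translation_table.get(s, {}).items():
            if p >= prob_threshold:  # drop low-probability translations
                weighted_query[t] = weighted_query.get(t, 0.0) + p
    return weighted_query

print(translate_query(["maison", "rouge"]))
# {'house': 0.7, 'home': 0.25, 'household': 0.05, 'red': 0.9, 'ruby': 0.1}
```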

    What Users Ask a Search Engine: Analyzing One Billion Russian Question Queries

    Full text link
    We analyze the question queries submitted to a large commercial web search engine to get insights about what people ask, and to better tailor the search results to the users’ needs. Based on a dataset of about one billion question queries submitted during the year 2012, we investigate askers’ querying behavior with the support of automatic query categorization. While the importance of question queries is likely to increase, at present they only make up 3–4% of the total search traffic. Since questions are such a small part of the query stream and are more likely to be unique than shorter queries, clickthrough information is typically rather sparse. Thus, query categorization methods based on the categories of clicked web documents do not work well for questions. As an alternative, we propose a robust question query classification method that uses the labeled questions from a large community question answering (CQA) platform as a training set. The resulting classifier is then transferred to the web search questions. Even though questions on CQA platforms tend to be different from web search questions, our categorization method proves competitive with strong baselines with respect to classification accuracy. To show the scalability of our proposed method, we apply the classifiers to about one billion question queries and discuss the trade-offs between performance and accuracy that different classification models offer. Our findings reveal what people ask a search engine and also how this contrasts with behavior on a CQA platform.
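    The transfer setup described above can be sketched as: train a text classifier on labeled CQA questions, then apply it to search-engine question queries. The examples, labels, and model below are invented stand-ins; the paper's actual categories, features, and classifiers are not reproduced here.

```python
# Train a question-category classifier on CQA data, transfer to search queries.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

cqa_questions = [
    "how do i bake a chocolate cake",
    "what is the capital of australia",
    "why does my car make a clicking noise",
    "how long should i boil an egg",
]
cqa_labels = ["cooking", "geography", "autos", "cooking"]

# Train on CQA questions, where category labels are abundant ...
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(cqa_questions, cqa_labels)

# ... then transfer to (unlabeled) web search question queries.
search_queries = ["how to cook rice", "why is my engine overheating"]
print(list(zip(search_queries, clf.predict(search_queries))))
```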