
    Convolutional Patch Networks with Spatial Prior for Road Detection and Urban Scene Understanding

    Classifying single image patches is important in many different applications, such as road detection or scene understanding. In this paper, we present convolutional patch networks, which are convolutional networks learned to distinguish different image patches and which can be used for pixel-wise labeling. We also show how to incorporate spatial information of the patch as an input to the network, which allows for learning spatial priors for certain categories jointly with an appearance model. In particular, we focus on road detection and urban scene understanding, two application areas where we are able to achieve state-of-the-art results on the KITTI as well as on the LabelMeFacade dataset. Furthermore, our paper offers a guideline for people working in the area and desperately wandering through all the painstaking details that render training CNs on image patches extremely difficult. Comment: VISAPP 2015 paper.
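    As an illustration of the patch-plus-position idea described above (a minimal sketch under assumed PyTorch conventions, not the authors' implementation; the class name and layer sizes are hypothetical), a patch classifier that learns a spatial prior jointly with appearance might concatenate the normalized patch coordinates with the CNN features:

```python
# Minimal sketch (not the authors' code): a patch classifier that concatenates
# normalized patch coordinates with CNN appearance features, so a spatial prior
# can be learned jointly with the appearance model.
import torch
import torch.nn as nn

class PatchNetWithSpatialPrior(nn.Module):  # hypothetical name
    def __init__(self, num_classes: int, patch_size: int = 28):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
        )
        feat_dim = 32 * (patch_size // 4) ** 2
        # +2 for the normalized (x, y) position of the patch centre in the image
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim + 2, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, patches: torch.Tensor, xy_norm: torch.Tensor) -> torch.Tensor:
        # patches: (B, 3, H, W) image patches; xy_norm: (B, 2) positions in [0, 1]
        f = self.features(patches).flatten(1)
        return self.classifier(torch.cat([f, xy_norm], dim=1))
```

    At labeling time, each pixel's surrounding patch and its normalized image position would be fed through such a network, so categories with strong location statistics (for instance, road pixels near the bottom of the image) benefit from the learned prior.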

    Video browsing interfaces and applications: a review

    We present a comprehensive review of the state of the art in video browsing and retrieval systems, with special emphasis on interfaces and applications. There has been a significant increase in activity (e.g., storage, retrieval, and sharing) employing video data in the past decade, both for personal and professional use. The ever-growing amount of video content available for human consumption and the inherent characteristics of video data—which, if presented in its raw format, is rather unwieldy and costly—have become driving forces for the development of more effective solutions to present video contents and allow rich user interaction. As a result, there are many contemporary research efforts toward developing better video browsing solutions, which we summarize. We review more than 40 different video browsing and retrieval interfaces and classify them into three groups: applications that use video-player-like interaction, video retrieval applications, and browsing solutions based on video surrogates. For each category, we present a summary of existing work, highlight the technical aspects of each solution, and compare them against each other.

    Spatial land-use inventory, modeling, and projection/Denver metropolitan area, with inputs from existing maps, airphotos, and LANDSAT imagery

    A landscape model was constructed with 34 land-use, physiographic, socioeconomic, and transportation maps. A simple Markov land-use trend model was constructed from observed rates of change and nonchange from photointerpreted 1963 and 1970 airphotos. Seven multivariate land-use projection models predicting 1970 spatial land-use changes achieved accuracies from 42 to 57 percent. A final modeling strategy was designed, which combines both Markov trend and multivariate spatial projection processes. Landsat-1 image preprocessing included geometric rectification/resampling, spectral-band and band/insolation ratioing operations. A new, systematic grid-sampled point training-set approach proved to be useful when tested on the four original MSS bands, ten image bands and ratios, and all 48 image and map variables (less land use). Ten-variable accuracy was raised over 15 percentage points, from 38.4 to 53.9 percent, with the use of the 31 ancillary variables. A land-use classification map was produced with an optimal ten-channel subset of four image bands and six ancillary map variables. Point-by-point verification of 331,776 points against a 1972/1973 U.S. Geological Survey (USGS) land-use map prepared with airphotos and the same classification scheme showed average first-, second-, and third-order accuracies of 76.3, 58.4, and 33.0 percent, respectively.
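    A first-order Markov trend model of the kind described, built from observed change and non-change between two dated land-use maps, can be sketched as follows (a hypothetical NumPy illustration, not the study's code; the rasters and class counts are toy values):

```python
# Hypothetical sketch of a simple Markov land-use trend model:
# estimate a transition matrix from two co-registered, dated land-use rasters
# (e.g., 1963 and 1970 photointerpreted maps) and project class shares forward.
import numpy as np

def transition_matrix(lu_t0: np.ndarray, lu_t1: np.ndarray, n_classes: int) -> np.ndarray:
    """Row-stochastic matrix P[i, j] = Pr(class j at t1 | class i at t0)."""
    counts = np.zeros((n_classes, n_classes))
    for i, j in zip(lu_t0.ravel(), lu_t1.ravel()):
        counts[i, j] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

def project_shares(shares_t0: np.ndarray, P: np.ndarray, steps: int) -> np.ndarray:
    """Apply the transition matrix `steps` times to the vector of class shares."""
    return shares_t0 @ np.linalg.matrix_power(P, steps)

# Toy example with 3 land-use classes on a 4x4 grid
lu_1963 = np.random.randint(0, 3, (4, 4))
lu_1970 = np.random.randint(0, 3, (4, 4))
P = transition_matrix(lu_1963, lu_1970, n_classes=3)
shares_1970 = np.bincount(lu_1970.ravel(), minlength=3) / lu_1970.size
print(project_shares(shares_1970, P, steps=1))  # projected shares one period ahead
```

    The trend model only extrapolates observed transition rates; the study's final strategy combines it with multivariate spatial models so that the projected changes can be allocated to specific locations.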

    Online avatar based interactions

    The gridWorld project attempts to utilize 3D to develop an online multi-user visual chat system. GridWorld addresses ideas of how conversations in a virtual environment can be facilitated and enhanced by an abstract visual interface design. The visual interface was developed from research and examination of existing ideas, methodologies, and applications for the development of user embodiment, chat/virtual space, and interface usability towards the visualization of communication.

    Context-Based Cultural Visits

    Over the last two decades, there have been tremendous advances in mobile technologies, which have increased the interest in studying and developing mobile augmented reality (MAR) systems, especially in the field of Cultural Heritage. Nowadays, people rely ever more on smartphones, for example, when visiting a new city, to search for information about monuments and landmarks, and visitors expect precise information tailored to their needs. Therefore, researchers started to investigate innovative approaches for presenting and suggesting digital content related to cultural and historical places around the city, incorporating contextual information about the visitor and their needs. This document presents a novel mobile augmented reality application, NearHeritage, developed within the scope of a master's thesis in Electrical and Computer Engineering at the Faculty of Engineering of the University of Porto (FEUP), in collaboration with INESC TEC. The research focused on the importance of using modern technologies to assist visitors in finding and exploring Cultural Heritage. In this way, the application provides not only the nearby points of interest (POIs) of a city but also detailed information about each POI. The solution uses the built-in sensors and hardware of Android devices and takes advantage of several APIs (Foursquare API, Google Maps API and IntelContextSensing) to retrieve information about the landmarks and the visitor's context. These hardware components are also crucial for realising the full potential of augmented reality tools and for creating innovative content that improves the overall user experience. All the experiments were conducted in Porto, Portugal, and the final results show that a MAR application can improve the user experience in discovering and learning more about Cultural Heritage around the world, creating an interactive, enjoyable and unforgettable adventure.
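    As a simplified sketch of the nearby-POI step (not the NearHeritage implementation, which retrieves its data through the Foursquare and Google Maps APIs; the POI list and coordinates below are illustrative), landmarks can be filtered and ranked by great-circle distance from the device's GPS fix:

```python
# Simplified sketch (not the NearHeritage code): rank points of interest by
# great-circle (haversine) distance from the device's current GPS position.
import math
from typing import List, Tuple

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two WGS84 coordinates, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearby_pois(user: Tuple[float, float],
                pois: List[dict],
                radius_km: float = 1.0) -> List[dict]:
    """Return POIs within `radius_km` of the user, nearest first."""
    in_range = []
    for poi in pois:
        d = haversine_km(user[0], user[1], poi["lat"], poi["lon"])
        if d <= radius_km:
            in_range.append({**poi, "distance_km": round(d, 3)})
    return sorted(in_range, key=lambda p: p["distance_km"])

# Toy example with hypothetical POIs around Porto's city centre
pois = [{"name": "Torre dos Clérigos", "lat": 41.1457, "lon": -8.6146},
        {"name": "Ponte Luís I", "lat": 41.1399, "lon": -8.6094}]
print(nearby_pois((41.1466, -8.6110), pois, radius_km=1.0))
```

    In the actual application, the resulting distance-ordered list would be combined with the visitor's context and rendered as augmented reality overlays on the camera view.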

    AI-Generated Images as Data Source: The Dawn of Synthetic Era

    The advancement of visual intelligence is intrinsically tethered to the availability of large-scale data. In parallel, generative Artificial Intelligence (AI) has unlocked the potential to create synthetic images that closely resemble real-world photographs. This prompts a compelling inquiry: how much visual intelligence could benefit from the advance of generative AI? This paper explores the innovative concept of harnessing these AI-generated images as new data sources, reshaping traditional modeling paradigms in visual intelligence. In contrast to real data, AI-generated data exhibit remarkable advantages, including unmatched abundance and scalability, the rapid generation of vast datasets, and the effortless simulation of edge cases. Built on the success of generative AI models, we examine the potential of their generated data in a range of applications, from training machine learning models to simulating scenarios for computational modeling, testing, and validation. We probe the technological foundations that support this groundbreaking use of generative AI, engaging in an in-depth discussion on the ethical, legal, and practical considerations that accompany this transformative paradigm shift. Through an exhaustive survey of current technologies and applications, this paper presents a comprehensive view of the synthetic era in visual intelligence. A project associated with this paper can be found at https://github.com/mwxely/AIGS . Comment: 20 pages, 11 figures.
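    As a hedged sketch of the most basic use case, training on a mixture of real and AI-generated images (the folder paths and layout are assumptions for illustration, not something prescribed by the paper), one might simply concatenate a real and a synthetic image-folder dataset in PyTorch:

```python
# Hedged sketch: training on a mixture of real and AI-generated images by
# concatenating two image-folder datasets (paths and folder layout are hypothetical).
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, transforms

tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

real = datasets.ImageFolder("data/real", transform=tf)            # real photographs
synthetic = datasets.ImageFolder("data/synthetic", transform=tf)  # AI-generated images

# Treat both sources as one training set; sampling weights or per-source
# loss terms could be added to control the real/synthetic balance.
train_set = ConcatDataset([real, synthetic])
loader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=2)

for images, labels in loader:
    pass  # forward/backward pass of any image classifier would go here
```

    More elaborate uses surveyed in the paper, such as simulating rare edge cases or generating validation scenarios, would replace the static synthetic folder with images produced on demand by a generative model.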