574 research outputs found

    Creating the Perception-based LADDER sketch recognition language

    Sketch recognition is the automated understanding of hand-drawn diagrams. Current sketch recognition systems exist for only a handful of domains, which contain on the order of 10-20 shapes. Our goal was to create a generalized method for recognition that could work for many domains, increasing the number of shapes that could be recognized in real time, while maintaining a high accuracy. In an effort to effectively recognize shapes while allowing drawing freedom (both drawing-style freedom and perceptually-valid variations), we created a shape description language modeled after the way people naturally describe shapes to 1) create an intuitive and easy to understand description, providing transparency to the underlying recognition process, and 2) improve recognition by providing recognition flexibility (drawing freedom) that is aligned with how humans perceive shapes. This paper describes the results of a study performed to see how users naturally describe shapes. A sample of 35 subjects described or drew approximately 16 shapes each. Results show a common vocabulary related to Gestalt grouping and singularities. Results also show that perception, similarity, and context play an important role in how people describe shapes. This study resulted in a language (LADDER) that allows shape recognizers for any domain to be automatically generated from a single hand-drawn example of each shape. Sketch systems for over 30 different domains have been automatically generated based on this language. The largest domain contained 923 distinct shapes, and achieved a recognition accuracy of 83% (and a top-3 accuracy of 87%) on a corpus of over 11,000 sketches, which recognizes almost two orders of magnitude more shapes than any other existing system. National Science Foundation (U.S.) (grant 0757557); National Science Foundation (U.S.) (grant 0943499)

    Affirming the concept of continuity in the modernist heritage through the notion of border: case study of the meander buildings in New Belgrade’s Block 23

    City planning is shaped by urban, political, social, and other resolutions that are materialized in the spatial plan. As an example of post-war architecture of the 20th century, New Belgrade was developed on modernist principles focused on the essence of dwelling, along with the idea of continuity based on the formation of fluid, liminal spaces and designing "from the inside out". Taking into account that the blocks of New Belgrade are particularly valued in the modern day as locationally desirable and spatially high-quality living units, the research motive is the observed change in the way that one block entity is considered in today's context against the system of ideas embedded in the object’s design concept. In accordance with the aforementioned, the research premise is that the long-term recognition of the overall quality of New Belgrade residential blocks can be reflected in the preservation of human-scale continuity, which therefore also ensures temporal continuity - the sustainability of the project over time. The proposed hypothesis will be researched through the analysis of continuity and observed in the form of ideology instilled in the spatial organization, relying on the user as a reference value. The continuity of the observed spatial zones is confined by their liminal condition. Therefore, the border significance is determined through spatially defined phenomena of different nature (physical, immaterial, social). The methods used in this paper are theoretical overview, case study, and graphic analysis of the meander objects and their wider spatial context in New Belgrade’s block 23. Graphic analysis, namely mapping of relevant borders, sets the frameworks of spatial zones that participate in the construction of place continuity.
The research result is the establishment of a concrete relationship between the concepts of spatial continuity in modernism, illustrated through the phenomenon of the border, which further influences the quality of living in the building after its construction. The research significance lies in a comprehensive understanding of the relationship between theory and practice, that is, in understanding the process of design and life of the chosen study objects, observing their development from the initial idea, through project realization, until their present-day existence.

    Imagine the Unseen World: A Benchmark for Systematic Generalization in Visual World Models

    Systematic compositionality, or the ability to adapt to novel situations by creating a mental model of the world using reusable pieces of knowledge, remains a significant challenge in machine learning. While there has been considerable progress in the language domain, efforts towards systematic visual imagination, or envisioning the dynamical implications of a visual observation, are in their infancy. We introduce the Systematic Visual Imagination Benchmark (SVIB), the first benchmark designed to address this problem head-on. SVIB offers a novel framework for a minimal world modeling problem, where models are evaluated based on their ability to generate one-step image-to-image transformations under a latent world dynamics. The framework provides benefits such as the possibility to jointly optimize for systematic perception and imagination, a range of difficulty levels, and the ability to control the fraction of possible factor combinations used during training. We provide a comprehensive evaluation of various baseline models on SVIB, offering insight into the current state-of-the-art in systematic visual imagination. We hope that this benchmark will help advance visual systematic compositionality. Comment: Published as a conference paper at NeurIPS 2023. The first two authors contributed equally. To download the benchmark, visit https://systematic-visual-imagination.github.i

    PhD Thesis: Exploring the role of (self-)attention in cognitive and computer vision architecture

    We investigate the role of attention and memory in complex reasoning tasks. We analyze Transformer-based self-attention as a model and extend it with memory. By studying a synthetic visual reasoning test, we refine the taxonomy of reasoning tasks. Incorporating self-attention with ResNet50, we enhance feature maps using feature-based and spatial attention, achieving efficient solving of challenging visual reasoning tasks. Our findings contribute to understanding the attentional needs of SVRT tasks. Additionally, we propose GAMR, a cognitive architecture combining attention and memory, inspired by active vision theory. GAMR outperforms other architectures in sample efficiency, robustness, and compositionality, and shows zero-shot generalization on new reasoning tasks. Comment: PhD Thesis, 152 pages, 32 figures, 6 tables

    OCTScenes: A Versatile Real-World Dataset of Tabletop Scenes for Object-Centric Learning

    Humans possess the cognitive ability to comprehend scenes in a compositional manner. To empower AI systems with similar abilities, object-centric representation learning aims to acquire representations of individual objects from visual scenes without any supervision. Although recent advancements in object-centric representation learning have achieved remarkable progress on complex synthesis datasets, there is a huge challenge for application in complex real-world scenes. One of the essential reasons is the scarcity of real-world datasets specifically tailored to object-centric representation learning methods. To solve this problem, we propose a versatile real-world dataset of tabletop scenes for object-centric learning called OCTScenes, which is meticulously designed to serve as a benchmark for comparing, evaluating and analyzing object-centric representation learning methods. OCTScenes contains 5000 tabletop scenes with a total of 15 everyday objects. Each scene is captured in 60 frames covering a 360-degree perspective. Consequently, OCTScenes is a versatile benchmark dataset that can simultaneously satisfy the evaluation of object-centric representation learning methods across static scenes, dynamic scenes, and multi-view scenes tasks. Extensive experiments of object-centric representation learning methods for static, dynamic and multi-view scenes are conducted on OCTScenes. The results demonstrate the shortcomings of state-of-the-art methods for learning meaningful representations from real-world data, despite their impressive performance on complex synthesis datasets. Furthermore, OCTScenes can serve as a catalyst for advancing existing state-of-the-art methods, inspiring them to adapt to real-world scenes. Dataset and code are available at https://huggingface.co/datasets/Yinxuan/OCTScenes