
    Region Refinement Network for Salient Object Detection

    Although intensively studied, salient object detection still suffers from false predictions and unclear boundaries. In this paper, we propose a Region Refinement Network (RRN), which recurrently filters redundant information and explicitly models boundary information for saliency detection. Unlike existing refinement methods, we propose a Region Refinement Module (RRM) that optimizes salient region prediction by incorporating supervised attention masks in the intermediate refinement stages. The module brings only a minor increase in model size, yet significantly reduces false predictions from the background. To further refine boundary areas, we propose a Boundary Refinement Loss (BRL) that adds extra supervision for better distinguishing foreground from background. BRL is parameter-free and easy to train. We further observe that BRL helps retain the integrity of predictions by refining the boundary. Extensive experiments on saliency detection datasets show that our refinement module and loss bring significant improvements over the baseline and can be easily applied to different frameworks. We also demonstrate that our proposed model generalizes well to portrait segmentation and shadow detection tasks.
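
    The abstract does not spell out how the Boundary Refinement Loss is computed. Below is a rough sketch of one plausible parameter-free boundary term, assuming PyTorch, saliency maps in [0, 1], a max-pooling morphological gradient for the boundary band, and an arbitrary weighting factor; all of these are assumptions, not the authors' definition.

```python
import torch.nn.functional as F

def boundary_refinement_loss(pred, gt, band=5, boundary_weight=4.0):
    # pred, gt: (N, 1, H, W) saliency maps in [0, 1].
    # Approximate a thin band around the ground-truth boundary with a
    # morphological gradient: dilation(gt) - erosion(gt), both via max-pooling.
    dil = F.max_pool2d(gt, band, stride=1, padding=band // 2)
    ero = -F.max_pool2d(-gt, band, stride=1, padding=band // 2)
    boundary = (dil - ero).clamp(0.0, 1.0)   # ~1 near edges, 0 elsewhere

    # Plain per-pixel BCE, up-weighted inside the boundary band.
    bce = F.binary_cross_entropy(pred, gt, reduction="none")
    return ((1.0 + boundary_weight * boundary) * bce).mean()
```

    Because the boundary band is derived from the ground truth alone, this kind of extra supervision introduces no learnable parameters, which is consistent with the parameter-free claim in the abstract.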

    A Neural Network for Interpolating Light-Sources

    This study combines two novel deterministic methods with a Convolutional Neural Network to develop a machine learning method that is aware of the directionality of light in images. The first method detects shadows in terrestrial images by using a sliding-window algorithm that extracts specific hue and value features from an image. The second method interpolates light sources by using a line algorithm that detects the direction of light sources in the image. Both methods are single-image solutions and compute their values deterministically from the image alone, without the need for illumination models. They extract real-time geometry from the light source in an image rather than mapping an illumination model onto the image, which is the approach taken by the models in use today. Finally, these outputs are used to train a Convolutional Neural Network. The resulting network is more accurate than previous shadow detection methods and can accurately predict light-source direction, and thus orientation, which is a considerable innovation for an unsupervised CNN. It is also significantly faster than the deterministic methods. We also present a reference dataset for the problem of shadow and light direction detection.
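
    The exact sliding-window features and thresholds are not given in the abstract. The following is a minimal sketch of the general idea it describes, namely that a shadow pixel is much darker than its local neighborhood while keeping a similar hue; the window size and thresholds are chosen arbitrarily.

```python
import cv2
import numpy as np

def shadow_mask(bgr, win=15, value_ratio=0.6, hue_tol=10):
    # Convert to HSV so hue and value (brightness) can be compared separately.
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    hue, _, val = cv2.split(hsv)

    # Local averages over a win x win sliding window.
    local_val = cv2.blur(val, (win, win))
    local_hue = cv2.blur(hue, (win, win))

    # Shadow heuristic: much darker than surroundings, hue roughly unchanged
    # (cast shadows darken surfaces without recoloring them).
    much_darker = val < value_ratio * local_val
    similar_hue = np.abs(hue - local_hue) < hue_tol   # OpenCV hue range is [0, 180]
    return (much_darker & similar_hue).astype(np.uint8) * 255
```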

    Scene analysis in the natural environment

    The problem of scene analysis has been studied in a number of different fields over the past decades. These studies have led to a number of important insights into problems of scene analysis, but not all of these insights are widely appreciated, and critical shortcomings in current approaches continue to hinder further progress. Here we take the view that scene analysis is a universal problem solved by all animals, and that we can gain new insight by studying the problems that animals face in complex natural environments. In particular, the jumping spider, songbird, echolocating bat, and electric fish all exhibit behaviors that require robust solutions to scene analysis problems encountered in the natural environment. By examining the behaviors of these seemingly disparate animals, we emerge with a framework for studying scene analysis comprising four essential properties: 1) the ability to solve ill-posed problems, 2) the ability to integrate and store information across time and modality, 3) efficient recovery and representation of 3D scene structure, and 4) the use of optimal motor actions for acquiring information to progress towards behavioral goals.

    Industrial Segment Anything -- a Case Study in Aircraft Manufacturing, Intralogistics, Maintenance, Repair, and Overhaul

    Deploying deep learning-based applications in specialized domains like the aircraft production industry typically suffers from limited training data availability: only a few datasets represent non-everyday objects, situations, and tasks. Recent advances in research on Vision Foundation Models (VFMs) have opened up a new range of tasks and models with high generalization capabilities in non-semantic and semantic prediction. As recently demonstrated by the Segment Anything project, exploiting the zero-shot capabilities of VFMs is a promising direction for tackling the boundaries imposed by data, context, and sensor variety. However, investigating their application within specific domains is still subject to ongoing research. This paper contributes by surveying applications of the Segment Anything Model (SAM) in aircraft production-specific use cases. We include manufacturing, intralogistics, as well as maintenance, repair, and overhaul processes, which are also representative of a variety of neighboring industrial domains. Besides presenting the various use cases, we further discuss the injection of domain knowledge.
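
    For orientation, zero-shot prompting with the publicly released segment-anything package looks roughly like the sketch below; the checkpoint file, model variant, image, and prompt point are placeholders rather than details taken from the paper.

```python
import numpy as np
import cv2
from segment_anything import SamPredictor, sam_model_registry

# Load a SAM checkpoint (file name and variant are placeholders) and wrap it in a predictor.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# One RGB image, e.g. from an intralogistics or shop-floor camera (path is a placeholder).
image = cv2.cvtColor(cv2.imread("workpiece.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single foreground click is enough for zero-shot mask proposals.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[640, 360]]),   # (x, y) pixel on the part of interest
    point_labels=np.array([1]),            # 1 = foreground point
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]
```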

    Hard-Hearted Scrolls: A Noninvasive Method for Reading the Herculaneum Papyri

    The Herculaneum scrolls were buried and carbonized by the eruption of Mount Vesuvius in A.D. 79 and represent the only classical library discovered in situ. Charred by the heat of the eruption, the scrolls are extremely fragile. Since their discovery two centuries ago, some scrolls have been physically opened, leading to some textual recovery but also widespread damage. Many other scrolls remain in rolled form, with unknown contents. More recently, various noninvasive methods using advanced imaging have been attempted to reveal the hidden contents of these scrolls. Unfortunately, the scrolls' complex internal structure and lack of clear ink contrast have prevented these efforts from succeeding. This work presents a machine learning-based method to reveal the hidden contents of the Herculaneum scrolls, trained using a novel geometric framework linking 3D X-ray CT images with 2D surface imagery of scroll fragments. The method is verified against known ground truth using scroll fragments with exposed text. Some results are also presented of hidden characters revealed using this method, the first to be revealed noninvasively from this collection. Extensions to the method, generalizing the machine learning component to other multimodal transformations, are presented. These are capable not only of revealing the hidden ink, but also of generating rendered images of scroll interiors as if they had been photographed in color prior to their damage two thousand years ago. The application of these methods to other domains is discussed, and an additional chapter discusses the Vesuvius Challenge, a $1,000,000+ open research contest based on the dataset built as part of this work.
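
    The thesis' actual network, patch geometry, and registration pipeline are not described in this summary. The sketch below only illustrates the training setup it outlines, in which registered fragment photographs supply ink/no-ink labels for small CT subvolumes around the segmented papyrus surface; the architecture and sizes are invented for illustration.

```python
import torch
import torch.nn as nn

class InkClassifier(nn.Module):
    """Toy stand-in for an ink-detection model: predict whether the papyrus
    surface point at the centre of a small CT subvolume carries ink."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(8, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(16, 1),
        )

    def forward(self, x):        # x: (N, 1, D, H, W) subvolumes sampled around the surface mesh
        return self.net(x)       # one logit per subvolume: ink vs. no ink

# Training pairs come from fragments with exposed text: the registered 2D photo
# supplies the ink/no-ink label, the aligned 3D CT supplies the input subvolume.
logits = InkClassifier()(torch.zeros(4, 1, 16, 64, 64))
```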

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) it gives further weight to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.
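
    For concreteness, the ±1 degree displacement along imaginary spokes from fixation can be computed as below; the pixel coordinates and the pixels-per-degree factor (which depends on display geometry and viewing distance) are assumptions, not values from the study.

```python
import numpy as np

def radial_shift(centers, fixation, shift_deg, px_per_deg):
    # centers: (N, 2) rectangle centres in pixels; fixation: (2,) fixation point.
    # Positive shift_deg moves outward along each spoke, negative moves inward.
    vec = centers - fixation
    unit = vec / np.linalg.norm(vec, axis=1, keepdims=True)
    return centers + unit * (shift_deg * px_per_deg)

# Example: shift eight rectangle centres outward by 1 degree at an assumed 40 px/deg.
centers = np.array([[512 + 200 * np.cos(a), 384 + 200 * np.sin(a)]
                    for a in np.linspace(0, 2 * np.pi, 8, endpoint=False)])
shifted = radial_shift(centers, np.array([512.0, 384.0]), shift_deg=1.0, px_per_deg=40.0)
```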

    Cognitive Interpretations of Ambiguous Visual Stimuli

    Brains can sense and distinguish signals from background noise in physical environments, and recognize and classify them as distinct entities. Ambiguity is an inherent part of this process. It is a cognitive property that is generated by the noisy character of the signals and by the design of the sensory systems that process them. Stimuli can be ambiguous if they are noisy, incomplete, or only briefly sensed. Such conditions may make stimuli indistinguishable from others and thereby difficult for our sensory systems to classify as single entities. In these cases, stimuli fail to activate a representation that may have been previously stored in the system. Deduction, through context and experience, is consequently needed to reach a decision on what exactly is sensed. Deduction can, however, also be subject to ambiguity, as stimuli and their properties may receive multiple representations in the sensory system. In such cases, these multiple representations compete for perceptual dominance, that is, for becoming the single entity taken by the system as a reference point for subsequent behavior. These types of ambiguity and several phenomena that relate to them are at the center of this dissertation. Perceptual rivalry, the phenomenal experience of alternating percepts over time, is an example of how the brain may give multiple interpretations to a stimulus that is physically constant. Rivalry is a very typical and general sensory process, and this thesis demonstrates some newly discovered properties of its dynamics. It was found that alternations between three perceptual interpretations – a relatively rare condition, as rivalry generally occurs between two percepts – follow predictable courses (Chapter 2). Furthermore, such alternations had several properties that determine their speed and direction of spatial spread (suppression waves) in the visual field (Chapter 3). These properties of ambiguity were further strongly affected by attention and other introspective processes. To demarcate the true underlying process of perceptual rivalry and the accompanying changes in awareness, these subjective processes need to be either circumvented or controlled for. An objective measure of perceptual rivalry was proposed that resolves this issue and provides a good alternative to introspective reports of ambiguous states (Chapter 4). Changes in percepts occur along a specific feature domain, such as depth orientation for the famous Necker cube. Alternatively, luminance may also be a rivalry feature, and one percept may appear brighter than the other rivaling percept. It was demonstrated that the pupil gets smaller when a percept with high luminance becomes dominant and, vice versa, gets bigger when a percept with low luminance becomes dominant during perceptual rivalry. As such, the pupil can serve as a reliable objective indicator of changes in visual awareness. By using such reflexes during rivalry, several new properties of alternations were discovered, and it was again confirmed that introspection can confound the true processes involved in ambiguity. Next, the usefulness of ambiguous stimuli was explored in the context of objects as entities (Chapter 5).
Some ambiguous stimuli can induce two percepts that alternate along the feature domain of object coherency, that is, whether a single coherent object or multiple incoherent objects are seen. In other words, an ambiguous stimulus can induce two cognitive interpretations of either seeing an entity or not. It was reported that being aware of a single coherent object results in an increase in visual sensitivity for the areas that constitute the object. These results are evidence of how the activation of a representation of a single and unique object can guide and allocate attentional resources to relevant areas in the visual field in a top-down way. It was further explored which features help to access such object representations in a bottom-up way (Chapter 6). Ambiguity of objects can be successfully resolved by adding strong contrasts in luminance and color between the object and its background. The size and variability of the object's shape were also found to be important factors for its successful detection and identification. Furthermore, the characteristics of objects do not only determine the rate of success in a recognition task, but are equally important for the storage of their representations in memory if, for instance, the object is novel to the observer. The subjective experience of a novel object is also subject to ambiguity: objects may appear novel to the observer although they are familiar (i.e., previously shown to the observer) or, vice versa, may appear familiar although they are actually novel. It was shown here that such subjective effects are reflected in the pupil (Chapter 7). In addition, when novel images were presented to observers, their pupils constricted more strongly than when familiar images were presented. Similarly, when novel stimuli were shown to observers, pupillary constrictions were stronger for stimuli that were successfully stored in memory than for those later forgotten. As such, the pupil reflected the cognitive process of novelty encoding. Finally, it was tested whether other cognitive processes, such as decision-making – an important process when multiple options are available and ambiguity has to be resolved with a conscious decision – were also reflected in changes of pupil size (Chapter 8). It was confirmed that the pupil tends to dilate after an observer has made a decision. These dilations can successfully be detected by other individuals and used to gain the upper hand during an interactive game. In sum, this thesis has explored how ambiguous signals affect perception and how ambiguity inside perceptual systems can be used to study processes of the brain. It is found that ambiguity follows predictable courses, can be objectively assessed with reflexes, and can provide insights into other neuronal mechanisms such as attention, object representations, and decision-making. These findings demonstrate that ambiguity is a core property of the sensory systems that enable living beings to interact with their surroundings. Ambiguity adds variation to behavior, allows the brain to flexibly interact with the world, and underlies the dynamics of sensing, interpretation, and behavioral decisions.
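
    The dissertation's analysis pipeline is not given in this summary. As a toy illustration of using the pupil as an objective read-out (it constricts while the brighter interpretation dominates), one could label each time point from a smoothed pupil trace alone; the smoothing window and median split are arbitrary assumptions.

```python
import numpy as np

def label_dominant_percept(pupil, fs, smooth_s=0.5):
    # pupil: 1-D pupil-diameter trace sampled at fs Hz during rivalry between
    # a high-luminance and a low-luminance interpretation of the same stimulus.
    win = max(1, int(smooth_s * fs))
    smoothed = np.convolve(pupil, np.ones(win) / win, mode="same")
    # Smaller pupil -> the brighter percept is currently dominant (and vice versa).
    return np.where(smoothed < np.median(smoothed), "bright", "dark")
```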

    From First Contact to Close Encounters: A Developmentally Deep Perceptual System for a Humanoid Robot

    This thesis presents a perceptual system for a humanoid robot that integrates abilities such as object localization and recognition with the deeper developmental machinery required to forge those competences out of raw physical experiences. It shows that a robotic platform can build up and maintain a system for object localization, segmentation, and recognition, starting from very little. What the robot starts with is a direct solution to achieving figure/ground separation: it simply 'pokes around' in a region of visual ambiguity and watches what happens. If the arm passes through an area, that area is recognized as free space. If the arm collides with an object, causing it to move, the robot can use that motion to segment the object from the background. Once the robot can acquire reliable segmented views of objects, it learns from them, and from then on recognizes and segments those objects without further contact. Both low-level and high-level visual features can also be learned in this way, and examples are presented for both: orientation detection and affordance recognition, respectively. The motivation for this work is simple. Training on large corpora of annotated real-world data has proven crucial for creating robust solutions to perceptual problems such as speech recognition and face detection. But the powerful tools used during training of such systems are typically stripped away at deployment. Ideally they should remain, particularly for unstable tasks such as object detection, where the set of objects needed in a task tomorrow might be different from the set of objects needed today. The key limiting factor is access to training data, but as this thesis shows, that need not be a problem on a robotic platform that can actively probe its environment, and carry out experiments to resolve ambiguity. This work is an instance of a general approach to learning a new perceptual judgment: find special situations in which the perceptual judgment is easy and study these situations to find correlated features that can be observed more generally
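
    As a rough sketch of the motion-based figure/ground step described above, using simple frame differencing and keeping the largest moving blob; the thresholds, morphology, and use of only two frames are simplifications, not the thesis' actual pipeline.

```python
import cv2
import numpy as np

def segment_poked_object(before, after, min_area=200):
    # before/after: grayscale frames straddling the moment the arm makes the object move.
    diff = cv2.absdiff(after, before)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

    # Keep the largest moving blob as the object segment; reject tiny motion noise.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    if n < 2:
        return np.zeros_like(mask)
    biggest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
    if stats[biggest, cv2.CC_STAT_AREA] < min_area:
        return np.zeros_like(mask)
    return (labels == biggest).astype(np.uint8) * 255
```

    If the difference image is empty, the poked region was free space; if a blob survives, its segmented view can be stored and later used to recognize the object without further contact, as the abstract describes.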

    Developmentally deep perceptual system for a humanoid robot

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003. Includes bibliographical references (p. 139-152). This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. This thesis presents a perceptual system for a humanoid robot that integrates abilities such as object localization and recognition with the deeper developmental machinery required to forge those competences out of raw physical experiences. It shows that a robotic platform can build up and maintain a system for object localization, segmentation, and recognition, starting from very little. What the robot starts with is a direct solution to achieving figure/ground separation: it simply 'pokes around' in a region of visual ambiguity and watches what happens. If the arm passes through an area, that area is recognized as free space. If the arm collides with an object, causing it to move, the robot can use that motion to segment the object from the background. Once the robot can acquire reliable segmented views of objects, it learns from them, and from then on recognizes and segments those objects without further contact. Both low-level and high-level visual features can also be learned in this way, and examples are presented for both: orientation detection and affordance recognition, respectively. The motivation for this work is simple. Training on large corpora of annotated real-world data has proven crucial for creating robust solutions to perceptual problems such as speech recognition and face detection. But the powerful tools used during training of such systems are typically stripped away at deployment. Ideally they should remain, particularly for unstable tasks such as object detection, where the set of objects needed in a task tomorrow might be different from the set of objects needed today. The key limiting factor is access to training data, but as this thesis shows, that need not be a problem on a robotic platform that can actively probe its environment, and carry out experiments to resolve ambiguity. This work is an instance of a general approach to learning a new perceptual judgment: find special situations in which the perceptual judgment is easy and study these situations to find correlated features that can be observed more generally. by Paul Michael Fitzpatrick. Ph.D.