
    Psychophysics, Gestalts and Games

    Many psychophysical studies are dedicated to evaluating human gestalt detection on dot or Gabor patterns and to modeling its dependence on the pattern and background parameters. Nevertheless, even for these constrained percepts, psychophysics has not yet reached the challenging prediction stage, where human detection would be quantitatively predicted by a (generic) model. Computer Vision, on the other hand, has attempted to define automatic detection thresholds. This chapter sketches a procedure, inspired by gestaltism, for confronting these two methodologies. Using a computational, quantitative version of the non-accidentalness principle, we raise the possibility that the psychophysical and the (older) gestaltist setups, both applicable to dot or Gabor patterns, find a useful complement in a Turing test. In our perceptual Turing test, the scientist compares human performance with the detection result given by a computer. This confrontation makes it possible to revive the abandoned method of gestaltic games. We sketch the design of such a game, in which the subjects of the experiment are confronted with an alignment detection algorithm and are invited to draw examples that will fool it. We show that, in this way, a more precise definition of the alignment gestalt and of its computational formulation seems to emerge. Detection algorithms might also be relevant to more classic psychophysical setups, where they can again play the role of a Turing test. For a visual experiment in which subjects were invited to detect alignments in Gabor patterns, we associated a single function measuring alignment detectability in the form of a number of false alarms (NFA). The first results indicate that the values of the NFA, as a function of all simulation parameters, are highly correlated with human detection.
This fact, which we intend to support with further experiments, might end up confirming that human alignment detection is the result of a single mechanism.
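The abstract does not spell out the detectability function used for the Gabor experiment. As a point of reference only, the classic a contrario formulation of alignment detection (Desolneux, Moisan and Morel) defines the NFA of a candidate segment as the number of tested candidates multiplied by a binomial tail probability: the chance that at least k of its n sampled points are aligned with the segment by accident, when each point is aligned with probability p. The sketch below assumes that formulation; the function names, the N^4 test count for an N×N image, and the precision p = 1/16 are illustrative assumptions, not values taken from the chapter.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """P[X >= k] for X ~ Binomial(n, p): chance of k or more aligned points out of n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def nfa(num_tests: int, n: int, k: int, p: float) -> float:
    """Number of false alarms: number of tests times the tail probability under the random model."""
    return num_tests * binomial_tail(n, k, p)


def is_meaningful(num_tests: int, n: int, k: int, p: float, epsilon: float = 1.0) -> bool:
    """A candidate is epsilon-meaningful when its NFA does not exceed epsilon."""
    return nfa(num_tests, n, k, p) <= epsilon


# Hypothetical example: a 64x64 image has about 64**4 candidate segments.
num_tests = 64**4
print(nfa(num_tests, n=20, k=12, p=1 / 16))  # small: unlikely to arise by chance
print(nfa(num_tests, n=20, k=4, p=1 / 16))   # large: easily produced by chance
```

A candidate is declared ε-meaningful when its NFA is at most ε (ε = 1 is a common choice), so multiplying by the number of tests acts as a correction for multiple testing.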

    VOICE QUALITY AND TV INTERPRETING: A PROPOSAL FOR A GESTALTIC EVALUATION

    The present thesis is a corpus-based interpreting study consisting of a proposal for a gestaltic subjective evaluation of quality in television broadcast simultaneous interpreting. The main objective of the research was to build and test a model of quality assessment based on the gestaltic perception of both speech and the sound-image perceived through the audiovisual medium. The model of gestaltic perception adopted is the one formed by voice-syllable-prosody-sense-context-(linguistic) knowledge of the world, proposed in “Il volto fonico delle parole” (Albano Leoni 2009), which is a re-elaborated version of the model based on melody-rhythm-words-sentences proposed by Karl Bühler in his “Theory of Language” (1934). A thematic corpus was built consisting of 2 Italian and 2 Spanish (Spain and United States) interpretations of the 2012 US Presidential Debates: the corpus ORenesit (Obama-Romney English español italiano) is included in the reference corpus CorIT (Italian Television Interpreting Corpus). The assessment model was tested in a questionnaire-based pilot survey including 3 video excerpts from the Italian interpretations of the 2008 Third Presidential Debate (Obama vs. McCain), since the corpus ORenesit had not yet been completed.
One of the 3 video excerpts was modified for experimental purposes: the interpreter’s voice was replaced with the voice of a professional actor and dubber, who imitated the original interpretation in the studio while reading the transcript and listening to the speaker. This choice was made to fulfill two needs, mainly related to the ecological validity of the experiment: i) to test the effect of a telegenic voice; and ii) to use a natural and personal expression of the subject. The questionnaire was built on categories extracted from “La vive voix” (Fónagy 1983) and “L’Audio-Vision” (Chion 1990). The data obtained were treated statistically. The results of the qualitative and quantitative research seem to confirm a gestaltic perception of interpreting speech received through audio-vision, formed by the following components: sound-image; syllable-melody(-voice-personality); words-sentences. The results also seem to raise doubts about the effectiveness of the quantitative approach to the analysis of speech perception.

    Multigranularity Representations for Human Inter-Actions: Pose, Motion and Intention

    Tracking people and their body pose in videos is a central problem in computer vision. Standard tracking representations reason about the temporal coherence of detected people and body parts. They have difficulty tracking targets under partial occlusions or rare body poses, where detectors often fail because the number of training examples is too small to cover the exponential variability of such configurations. We propose tracking representations that track and segment people and their body pose in videos by exploiting information at multiple detection and segmentation granularities when available: whole body, parts, or point trajectories. Detections and motion estimates provide contradictory information in the case of false alarm detections or leaking motion affinities. We consolidate contradictory information via graph steering, an algorithm for simultaneous detection and co-clustering in a two-granularity graph of motion trajectories and detections, which corrects motion leakage between correctly detected objects while being robust to false alarms or spatially inaccurate detections. First, we present a motion segmentation framework that exploits the long-range motion of point trajectories and the large spatial support of image regions. We show that the resulting video segments adapt to targets under partial occlusions and deformations. Second, we augment motion-based representations with object detection to deal with motion leakage. We demonstrate how to combine dense optical flow trajectory affinities with repulsions from confident detections to reach a global consensus of detection and tracking in crowded scenes. Third, we study human motion and pose estimation. We segment hard-to-detect, fast-moving body limbs from their surrounding clutter and match them against pose exemplars to detect body pose under fast motion. We employ on-the-fly human body kinematics to improve the tracking of body joints under wide deformations.
We use the motion segmentability of body parts to re-rank a set of candidate body joint trajectories and jointly infer multi-frame body pose and video segmentation. We show empirically that such multi-granularity tracking representations are worthwhile, obtaining significantly more accurate multi-object tracking and detailed body pose estimation on popular datasets.

    A machine learning approach to the unsupervised segmentation of mitochondria in subcellular electron microscopy data

    Recent advances in cellular and subcellular microscopy have demonstrated its potential for unravelling the mechanisms of various diseases at the molecular level. The biggest challenge in both human- and computer-based visual analysis of micrographs is the variety of nanostructures and mitochondrial morphologies. The state of the art, however, is dominated by supervised manual data annotation, and early attempts to automate the segmentation process were based on supervised machine learning techniques, which require large training datasets. Given a minimal number of training sequences, or none at all, unsupervised machine learning formulations such as spectral dimensionality reduction are known to be superior at detecting salient image structures. This thesis presents three major contributions developed around the spectral clustering framework, which has been shown to capture perceptual organization features. First, we approach the problem of mitochondria localization. We propose a novel grouping method for the extracted line segments that describe the normal mitochondrial morphology. Experimental findings show that the resulting clusters successfully model the folding of the inner mitochondrial membrane and can therefore be used as markers for subsequent segmentation approaches. Second, we develop an unsupervised mitochondria segmentation framework. This method emulates the evolved ability of human vision to extrapolate salient membrane structures in a micrograph. Furthermore, we design robust non-parametric similarity models according to the Gestalt laws of visual segregation. Experiments demonstrate that such models automatically adapt to the statistical structure of the biological domain and achieve optimal performance in pixel classification tasks under a wide variety of distributional assumptions. The last major contribution addresses the computational complexity of spectral clustering.
Here, we introduce a new anticorrelation-based spectral clustering formulation with the objective of improving both the speed and the quality of segmentation. The experimental findings show the applicability of our dimensionality reduction algorithm to very large-scale problems as well as to asymmetric, dense, and non-Euclidean datasets.
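The thesis builds on the spectral clustering framework. For readers unfamiliar with it, here is a minimal sketch of generic normalized spectral clustering in the style of Ng, Jordan and Weiss: build a pairwise affinity matrix W, embed the points with the leading eigenvectors of D^{-1/2} W D^{-1/2} (D being the diagonal degree matrix), and cluster the row-normalized embedding with k-means. This is not the thesis's anticorrelation-based formulation nor its Gestalt-based similarity models; the Gaussian affinity, the bandwidth sigma, the helper names, and the simple k-means are assumptions chosen to keep the example self-contained with NumPy only.

```python
import numpy as np


def gaussian_affinity(x: np.ndarray, sigma: float) -> np.ndarray:
    """Dense affinity W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) with a zero diagonal."""
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    w = np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(w, 0.0)
    return w


def spectral_embedding(w: np.ndarray, k: int) -> np.ndarray:
    """Top-k eigenvectors of D^{-1/2} W D^{-1/2}, with each row scaled to unit length."""
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))  # assumes no isolated points (degree > 0)
    m = d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(m)  # eigenvalues come back in ascending order
    u = vecs[:, -k:]
    return u / np.linalg.norm(u, axis=1, keepdims=True)


def kmeans(x: np.ndarray, k: int, iters: int = 100) -> np.ndarray:
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((x - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(np.sum((x[:, None, :] - centers[None, :, :]) ** 2, axis=-1), axis=1)
        new_centers = np.array(
            [x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j] for j in range(k)]
        )
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(x: np.ndarray, k: int, sigma: float = 1.0) -> np.ndarray:
    return kmeans(spectral_embedding(gaussian_affinity(x, sigma), k), k)


# Usage: two well-separated synthetic blobs should receive two distinct labels.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
labels = spectral_clustering(points, k=2)
print(labels)
```

The dense n×n affinity and the full eigendecomposition cost O(n^2) memory and O(n^3) time, which is the kind of computational burden the last contribution of the thesis addresses.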

    Conformal anomaly detection for visual reconstruction using gestalt principles

    In this paper, we combine a modern machine learning technique called conformal predictors (CP) with elements of gestalt detection and apply them to the problem of visual perception in digital images. Our main task is to quantify several gestalt principles of visual reconstruction. We interpret an image/shape as perceivable (meaningful) if it deviates sufficiently from randomness; in other words, if the image could hardly have occurred by chance. These deviations from randomness are measured using the conformal prediction technique, which guarantees validity under certain assumptions. The technique detects perceivable images while bounding the number of false alarms, i.e., the proportion of non-perceivable images wrongly detected as perceivable.
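The abstract does not state which nonconformity measure the paper uses for its gestalt principles. As a hedged illustration of the general mechanism only, the sketch below implements inductive (split) conformal anomaly detection: a nonconformity score is computed for calibration examples assumed to come from the same distribution as the normal data, and a new example with score s receives the p-value p = (1 + #{calibration scores ≥ s}) / (n + 1). Flagging the example when p ≤ ε keeps the probability of a false alarm on a normal example at most ε, provided the calibration and test examples are exchangeable. The nearest-neighbour distance used as the score, the 2-D synthetic features, and all function names are assumptions for illustration.

```python
import numpy as np


def knn_score(train: np.ndarray, x: np.ndarray, k: int = 1) -> float:
    """Hypothetical nonconformity score: sum of distances from x to its k nearest training points."""
    dists = np.sort(np.linalg.norm(train - x, axis=1))
    return float(dists[:k].sum())


def calibration_scores(train: np.ndarray, calib: np.ndarray, k: int = 1) -> np.ndarray:
    """Scores of held-out normal examples; calib must be disjoint from train."""
    return np.array([knn_score(train, c, k) for c in calib])


def conformal_p_value(calib_scores: np.ndarray, score: float) -> float:
    """Share of calibration scores at least as strange as the new one, counting the new one."""
    n = len(calib_scores)
    return float((1 + np.sum(calib_scores >= score)) / (n + 1))


def is_anomalous(train: np.ndarray, calib_scores: np.ndarray, x: np.ndarray,
                 epsilon: float = 0.05, k: int = 1) -> bool:
    """Flag x when its conformal p-value is at most epsilon (the false-alarm level)."""
    return conformal_p_value(calib_scores, knn_score(train, x, k)) <= epsilon


# Usage with synthetic 2-D features (an assumption, not the paper's image descriptors).
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, (200, 2))
calib = rng.normal(0.0, 1.0, (99, 2))
scores = calibration_scores(train, calib)
print(is_anomalous(train, scores, np.array([10.0, 10.0])))  # a point far from the data
```

With 99 calibration examples the smallest attainable p-value is 1/100, so ε must be at least 0.01 for anything to be flagged; more calibration data allows finer false-alarm levels.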

    LISTENING PATTERNS. From Music to Perception and Cognition

    The research aims to propose a narrative of the experience of listening and to provide some first examples of its possible application. This is done in three parts. Part One, “Words”, aims to frame the narrative methodologically by discussing the limits and requirements of a theory of listening. After discussing the difficulties of building an objective characterization of the listening experience, the research proposes that any theorization of listening can only express a point of view implied by descriptions of listening, both in their linguistic terms and in the data they involve. The analysis of theories about listening is therefore conducted along a grammatical path that follows the syntactic roles of the words involved in theoretical claims about listening. Starting from the problem of synonymy, the analysis moves through the subject, the object, adjectives, and adverbs to finally discuss the status of the references of discourses on listening. Part One ends by claiming the need to reintroduce the subject into theories about listening and proposes to attribute the epistemological status of narrative to any discourse about the listening experience. This implies that any proposed narrative must replace its truth-value with the instrumental value expressed by the idea of “viability”. Part Two, “Patterns”, is devoted to introducing a narrative of listening. This is first informally introduced in terms of the experience of a distinction within the sonic flow. After an intermission dedicated to connecting the idea of distinction to Gaston Bachelard’s metaphysics of time, the narrative is finally presented as a dialectic among three ways of organizing perceptual distinctions.
Three perceptual modes of distinction are presented as a basic mechanism responsible for articulating the sonic continuum into a complex structure of expectations and reactions, in terms of patterns, that is constantly renewed under the direction of statistical learning. The final chapter of Part Two aims to briefly apply the narrative of pattern structures to the experience of noise. Part Three aims to show the “viability” of the proposed narrative of listening. First, a method for analysing music by listening is discussed. Then, a second chapter brings the idea of pattern structures into contact with music composition, as a framework that can be applied to data sonification, installations, music production, and the didactics of composition. Finally, the last chapter is devoted to the ideas of “soundscape” and “identity formation”, in order to show the potential of applying the proposed narrative to cultural and social studies.