711 research outputs found

    A video-based framework for automatic 3D localization of multiple basketball players : a combinatorial optimization approach

    Sports complexity must be investigated at competitions; therefore, non-invasive methods are essential. In this context, computer vision, image processing, and machine learning techniques can be useful in designing a non-invasive system for data acquisition that identifies players’ positions in official basketball matches. Here, we propose and evaluate a novel video-based framework to perform automatic 3D localization of multiple basketball players. The introduced framework comprises two stages. The first stage is player detection, which aims to identify players’ heads at the camera image level. This stage is based on background segmentation and on classification performed by an artificial neural network. The second stage is the 3D reconstruction of the player positions from the images provided by the different cameras used in the acquisition. This task is tackled by formulating a constrained combinatorial optimization problem that minimizes the re-projection error while maximizing the number of detections in the formulated 3D localization problem. We would like to thank CAPES, FAEPEX, FAPESP, and CNPq for funding this research. This paper has content from a previously published master’s dissertation (Monezi, 2016) available online

    Exploring and interrogating astrophysical data in virtual reality

    Scientists across all disciplines increasingly rely on machine learning algorithms to analyse and sort datasets of ever increasing volume and complexity. Although trends and outliers are easily extracted, careful and close inspection will still be necessary to explore and disentangle detailed behaviour, as well as identify systematics and false positives. We must therefore incorporate new technologies to facilitate scientific analysis and exploration. Astrophysical data is inherently multi-parameter, with the spatial-kinematic dimensions at the core of observations and simulations. The arrival of mainstream virtual-reality (VR) headsets and increased GPU power, as well as the availability of versatile development tools for video games, has enabled scientists to deploy such technology to effectively interrogate and interact with complex data. In this paper we present development and results from custom-built interactive VR tools, called the iDaVIE suite, that are informed and driven by research on galaxy evolution, cosmic large-scale structure, galaxy–galaxy interactions, and gas/kinematics of nearby galaxies in survey and targeted observations. In the new era of Big Data ushered in by major facilities such as the SKA and LSST that render past analysis and refinement methods highly constrained, we believe that a paradigm shift to new software, technology and methods that exploit the power of visual perception, will play an increasingly important role in bridging the gap between statistical metrics and new discovery. We have released a beta version of the iDaVIE software system that is free and open to the community

    Light-driven micro-robotics for contemporary biophotonics.

    Diffusion MRI tractography for oncological neurosurgery planning: Clinical research prototype

    Contextual effects on visual perception

    Binaural scene analysis : localization, detection and recognition of speakers in complex acoustic scenes

    The human auditory system has the striking ability to robustly localize and recognize a specific target source in complex acoustic environments while ignoring interfering sources. Surprisingly, this remarkable capability, which is referred to as auditory scene analysis, is achieved by only analyzing the waveforms reaching the two ears. Computers, however, are presently not able to compete with the performance achieved by the human auditory system, even in the restricted paradigm of confronting a computer algorithm based on binaural signals with a highly constrained version of auditory scene analysis, such as localizing a sound source in a reverberant environment or recognizing a speaker in the presence of interfering noise. In particular, the problem of focusing on an individual speech source in the presence of competing speakers, termed the cocktail party problem, has proven to be extremely challenging for computer algorithms. The primary objective of this thesis is the development of a binaural scene analyzer that is able to jointly localize, detect and recognize multiple speech sources in the presence of reverberation and interfering noise. The processing of the proposed system is divided into three main stages: localization of sound sources, detection of speech sources, and recognition of speaker identities. The only information that is assumed to be known a priori is the number of target speech sources that are present in the acoustic mixture. Furthermore, the aim of this work is to reduce the performance gap between humans and machines by improving the performance of the individual building blocks of the binaural scene analyzer. First, a binaural front-end inspired by auditory processing is designed to robustly determine the azimuth of multiple, simultaneously active sound sources in the presence of reverberation. The localization model builds on the supervised learning of azimuth-dependent binaural cues, namely interaural time and level differences. Multi-conditional training is performed to incorporate the uncertainty of these binaural cues resulting from reverberation and the presence of competing sound sources. Second, a speech detection module that exploits the distinct spectral characteristics of speech and noise signals is developed to automatically select azimuthal positions that are likely to correspond to speech sources. Due to the established link between the localization stage and the recognition stage, which is realized by the speech detection module, the proposed binaural scene analyzer is able to selectively focus on a predefined number of speech sources that are positioned at unknown spatial locations, while ignoring interfering noise sources emerging from other spatial directions. Third, the speaker identities of all detected speech sources are recognized in the final stage of the model. To reduce the impact of environmental noise on the speaker recognition performance, a missing data classifier is combined with the adaptation of speaker models using a universal background model. This combination is particularly beneficial in nonstationary background noise

    Opening the low frequency window to the high redshift Universe

    Eye movements during listening reveal spontaneous grammatical processing

    Recent research using eye-tracking typically relies on constrained visual contexts, in particular goal-oriented contexts in which participants view a small array of objects on a computer screen and perform some overt decision or identification. Eye-tracking paradigms that use pictures as a measure of word or sentence comprehension are sometimes criticized as ecologically invalid because pictures and explicit tasks are not always present during language comprehension. This study compared the comprehension of sentences with two different grammatical forms: the past progressive (e.g., was walking), which emphasizes the ongoing nature of actions, and the simple past (e.g., walked), which emphasizes the end-state of an action. The results showed that the distribution and timing of eye movements mirror the underlying conceptual structure of this linguistic difference in the absence of any visual stimuli or task constraint: fixations were shorter and saccades were more dispersed across the screen when listening to the past progressive stories, as if participants were thinking about more dynamic events. Thus, eye movement data suggest that visual inputs or an explicit task are unnecessary to elicit analog representations of features such as movement, which could be a key perceptual component of grammatical comprehension