18,609 research outputs found

    Are words easier to learn from infant- than adult-directed speech? A quantitative corpus-based investigation

    We investigate whether infant-directed speech (IDS) could facilitate word form learning when compared to adult-directed speech (ADS). To study this, we examine the distribution of word forms at two levels, acoustic and phonological, using a large database of spontaneous speech in Japanese. At the acoustic level we show that, as has been documented before for phonemes, the realizations of words are more variable and less discriminable in IDS than in ADS. At the phonological level, we find an effect in the opposite direction: the IDS lexicon contains more distinctive words (such as onomatopoeias) than its ADS counterpart. Combining the acoustic and phonological metrics in a global discriminability score reveals that the greater separation of lexical categories in the phonological space does not compensate for the opposite effect observed at the acoustic level. As a result, IDS word forms are still globally less discriminable than ADS word forms, although the effect is numerically small. We discuss the implications of these findings for the view that the functional role of IDS is to improve language learnability. (Comment: Draft.)
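
The abstract does not say how the global discriminability score is computed. As a point of reference only, a widely used measure of category discriminability in speech research is the ABX score: given tokens A and X from one category and B from another, count how often X is closer to A than to B. The minimal sketch below assumes each word token is already a fixed-length feature vector compared with Euclidean distance; the names `abx_score` and `symmetric_abx` are hypothetical, and real acoustic comparisons usually align frame sequences (for example with dynamic time warping) instead.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def abx_score(tokens_a, tokens_b, dist=euclidean):
    """Fraction of (A, B, X) triplets, with A and X distinct tokens of
    category A and B a token of category B, where X is closer to A than
    to B. Ties count as half a success. 0.5 = chance, 1.0 = perfect."""
    if len(tokens_a) < 2 or not tokens_b:
        raise ValueError("need >= 2 tokens in category A and >= 1 in B")
    correct = 0.0
    total = 0
    for i, a in enumerate(tokens_a):
        for j, x in enumerate(tokens_a):
            if i == j:
                continue  # X must be a different token from A
            for b in tokens_b:
                d_ax, d_bx = dist(a, x), dist(b, x)
                if d_ax < d_bx:
                    correct += 1.0
                elif d_ax == d_bx:
                    correct += 0.5
                total += 1
    return correct / total


def symmetric_abx(tokens_a, tokens_b, dist=euclidean):
    """Average of the two directions, so neither category is privileged."""
    return (abx_score(tokens_a, tokens_b, dist) + abx_score(tokens_b, tokens_a, dist)) / 2
```

Under this kind of measure, word forms that are "less discriminable" would score closer to 0.5 than better-separated ones.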

    The perils of automaticity

    Classical theories of skill acquisition propose that automatization (i.e., the process by which performance requires progressively less attention as experience is acquired) is a defining characteristic of expertise in a variety of domains (e.g., Fitts & Posner, 1967). Automaticity is believed to enable smooth and efficient skill execution by allowing performers to focus on strategic elements of performance rather than on the mechanical details that govern task implementation (Williams & Ford, 2008). By contrast, conscious processing (i.e., paying conscious attention to one’s actions during motor execution) has been found to disrupt skilled movement and performance proficiency (e.g., Beilock & Carr, 2001). On the basis of this evidence, researchers have tended to extol the virtues of automaticity. However, few researchers have considered the wide range of empirical evidence indicating that highly automated behaviors can, on occasion, lead to errors that may prove deleterious to skilled performance. Therefore, the purpose of the current paper is to highlight the perils, rather than the virtues, of automaticity. We draw on Reason’s (1990) classification scheme of everyday errors to show how an overreliance on automated procedures may lead to three specific types of performance error (i.e., mistakes, slips, and lapses) in a variety of skill domains (e.g., sport, dance, music). We conclude by arguing that skilled performance requires a dynamic interplay of automatic and conscious processing to avoid performance errors and to meet the contextually contingent demands that characterize competitive environments across skill domains.

    Aerospace medicine and biology: A continuing bibliography with indexes (supplement 359)

    This bibliography lists 164 reports, articles, and other documents introduced into the NASA Scientific and Technical Information System during January 1992. Subject coverage includes aerospace medicine and physiology, life support systems and man/system technology, protective clothing, exobiology and extraterrestrial life, planetary biology, and flight crew behavior and performance.

    Towards Multi-modal Explainable Video Understanding

    This thesis presents a novel approach to video understanding by emulating human perceptual processes and creating an explainable and coherent storytelling representation of video content. Central to this approach is the development of a Visual-Linguistic (VL) feature for an interpretable video representation and the creation of a Transformer-in-Transformer (TinT) decoder for modeling intra- and inter-event coherence in a video. Drawing inspiration from the way humans comprehend scenes by breaking them down into visual and non-visual components, the proposed VL feature models a scene through three distinct modalities. These include: (i) a global visual environment, providing a broad contextual understanding of the scene; (ii) local visual main agents, focusing on key elements or entities in the video; and (iii) linguistic scene elements, incorporating semantically relevant language-based information for a comprehensive understanding of the scene. By integrating these multimodal features, the VL representation offers a rich, diverse, and interpretable view of video content, effectively bridging the gap between visual perception and linguistic description. To ensure the temporal coherence and narrative structure of the video content, we introduce an autoregressive Transformer-in-Transformer (TinT) decoder. The TinT design consists of a nested architecture where the inner transformer models the intra-event coherency, capturing the semantic connections within individual events, while the outer transformer models the inter-event coherency, identifying the relationships and transitions between different events. This dual-layer transformer structure facilitates the generation of accurate and meaningful video descriptions that reflect the chronological and causal links in the video content. Another crucial aspect of this work is the introduction of a novel VL contrastive loss function. 
This function plays an essential role in ensuring that the learned embedding features are semantically consistent with the video captions. By aligning the embeddings with the ground-truth captions, the VL contrastive loss function enhances the model's performance and contributes to the quality of the generated descriptions. The efficacy of the proposed methods is validated through comprehensive experiments on popular video understanding benchmarks. The results demonstrate superior performance in terms of both the accuracy and the diversity of the generated captions, highlighting the potential of this approach for advancing the field of video understanding. In conclusion, this thesis provides a promising pathway toward building explainable video understanding models. By emulating human perception processes, leveraging multimodal features, and incorporating a nested transformer design, we contribute a new perspective to the field, paving the way for more advanced and intuitive video understanding systems in the future.
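
The abstract does not give the exact form of the VL contrastive loss. For orientation only, a common way to align paired embeddings is a symmetric InfoNCE-style loss, sketched below in plain Python. It assumes that row i of the video embeddings is paired with caption i, that similarity is cosine similarity divided by a temperature, and that all vectors are non-zero; the function names and the default temperature of 0.07 are illustrative assumptions, not the thesis's method.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def log_softmax_at(logits, k):
    """log(softmax(logits)[k]), computed stably by subtracting the max."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[k] - log_z


def contrastive_loss(video_embs, caption_embs, temperature=0.07):
    """Symmetric InfoNCE: each video should be most similar to its own
    caption (video-to-text) and each caption to its own video (text-to-video)."""
    n = len(video_embs)
    # sims[i][j] = scaled similarity between video i and caption j
    sims = [[cosine(v, c) / temperature for c in caption_embs] for v in video_embs]
    v2t = -sum(log_softmax_at(sims[i], i) for i in range(n)) / n
    t2v = -sum(log_softmax_at([sims[j][i] for j in range(n)], i) for i in range(n)) / n
    return (v2t + t2v) / 2
```

Correctly paired embeddings give a loss near zero, while mismatched pairs are penalized heavily, which is the sense in which such a loss pulls video and caption embeddings into semantic agreement.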

    Biological motion perception in Parkinson's disease

    Parkinson’s disease (PD) disrupts many aspects of visual perception, with negative functional consequences. How PD affects the perception of moving human bodies, or biological motion, is unknown. The ability to accurately perceive others’ motion is related to one’s own motor ability and depends on the integrity of brain areas affected in PD, including the superior temporal sulcus and premotor cortex. Biological motion perception may therefore be compromised in PD, but it may also provide a target for intervention, with perceptual training potentially improving motor function. Experiment 1 investigated whether perception of biological motion was impaired in individuals with PD (N=26) relative to neurologically healthy control (NC; N=24) individuals. Participants viewed videos of point-light human figures and judged whether or not they depicted walking. As predicted, the PD group was less sensitive to biological motion than the NC group. This deficit was not associated with participants’ own walking difficulties or with other perceptual deficits (contrast sensitivity, coherent motion perception). Experiment 2 evaluated the hypothesis that PD deficits would extend to more socially complex biological motion. PD (N=23) and NC (N=24) participants viewed point-light figures depicting communicative and non-communicative (object-oriented) gestures. The PD group was less accurate than the NC group in describing non-communicative gestures, an effect driven by men with PD, who also had difficulty perceiving communicative gestures. Experiment 3 tested the efficacy of perceptual training for PD. Because biological motion perception is associated with motor function, it was hypothesized that perceptual training would improve walking. Individuals with PD were randomized to Gait Observation (N=13; viewing videos of healthy and unhealthy gait) or Landscape Observation (N=10; viewing videos of moving water) and trained daily for one week while gait data were collected with accelerometers.
Post-training, only the Gait Observation group self-reported increased mobility, although no improvements were seen in objective gait data (daily activity, walking speed, stride length, stride frequency, leg swing time, gait asymmetry). These studies demonstrate that individuals with PD have difficulty perceiving biological motion (walking and socially complex gestures). Improving biological motion perception led to an enhancement in self-perceived walking ability. Perceptual training that incorporates more explicit learning over a longer time period may be required to produce objective improvements in walking.
Date: 2018-12-06
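
The abstract reports that the PD group was "less sensitive" to biological motion in a walking / not-walking judgment but does not name the sensitivity index. A standard choice for such yes/no detection tasks is the signal-detection measure d' = z(hit rate) - z(false-alarm rate). The sketch below assumes that measure, plus a log-linear correction (adding 0.5 to each count) so that perfect rates do not produce infinite z-scores; `d_prime` is a hypothetical helper name.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection sensitivity for a yes/no task.

    hits / misses: responses on trials that depicted walking.
    false_alarms / correct_rejections: responses on trials that did not.
    """
    # Log-linear correction keeps both rates strictly between 0 and 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

A d' near 0 means walkers and non-walkers are not told apart; a lower group-average d' would correspond to the reduced sensitivity described above.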

    Combining local features and region segmentation: methods and applications

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones. Defense date: 23-01-2020. Access to the full text of this thesis is embargoed until 23-07-2021.
Many different approaches have been developed in computer vision for extracting information from images and making further use of it. Among the most prominent are local features, which detect points or areas of the image with certain characteristics of interest and describe them using information from their (local) neighborhood.
Regions also stand out in this area, and this work focuses in particular on region segmentation algorithms, whose objective is to group image information according to different criteria. Despite the enormous potential of these techniques and their proven success in a number of applications, their definition implies a series of functional limitations that have prevented them from extending their capabilities to other application areas. This thesis aims to promote the use of these tools in such applications, and thereby improve on the state of the art, by proposing a framework for developing new solutions. Specifically, the main hypothesis of the project is that the capabilities of local features and region segmentation algorithms are complementary, so that combining them in the right way maximizes those capabilities while minimizing their limitations. The main objective, and therefore the main contribution of the thesis, is to validate this hypothesis by proposing a framework for developing new solutions that combine local features and region segmentation algorithms, yielding techniques with improved capabilities. Because the hypothesis involves combining two techniques, the validation was carried out in two steps. The first step addresses the use of region segmentation algorithms to enhance local features. To verify the viability and success of this combination, a specific proposal, SP-SIFT, was developed and validated both experimentally and in a real application scenario, specifically as the main technique of object tracking algorithms. The second step addresses the use of local features to enhance region segmentation algorithms. To verify the viability and success of this combination, a specific proposal, LF-SLIC, was developed.
The proposal was validated both experimentally and in a real application scenario, specifically as the main technique of a pigmented skin lesion segmentation algorithm. The conceptual results showed that the techniques improve in terms of capabilities. The applied results showed that these improvements allow the techniques to be used in applications where they were previously unsuccessful. Thus, the hypothesis can be considered validated, and the definition of a framework for developing new techniques with improved capabilities can be considered successful. In conclusion, the main contribution of the thesis is the framework for combining techniques, embodied in its two specific proposals (local features enhanced with region segmentation algorithms, and region segmentation algorithms enhanced with local features), and the success achieved in their applications.
The work described in this thesis was carried out within the Video Processing and Understanding Lab at the Department of Tecnología Electrónica y de las Comunicaciones, Escuela Politécnica Superior, Universidad Autónoma de Madrid (from 2014 to 2019). It was partially supported by the Spanish Government (TEC2014-53176-R, HAVideo).

    Exploring the Touch and Motion Features in Game-Based Cognitive Assessments

    Early detection of cognitive decline is important for timely intervention and treatment strategies that prevent further deterioration or the development of more severe cognitive impairment, and for identifying at-risk individuals for research. In this paper, we explore the feasibility of using data collected from built-in mobile phone sensors, together with gameplay performance, for mobile-game-based cognitive assessment. Twenty-two healthy participants took part in a two-session experiment in which they completed a series of standard cognitive assessments and then played three popular mobile games while user-game interaction data were passively collected. Bivariate analysis reveals correlations between our proposed features and scores obtained from paper-based cognitive assessments. Our results show that touch gesture interaction and device motion patterns can be used as supplementary features for mobile-game-based cognitive measurement. This study provides initial evidence that game-related metrics from existing off-the-shelf games have the potential to serve as proxies for conventional cognitive measures, specifically for visuospatial function, visual search capability, mental flexibility, memory, and attention.
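
The abstract mentions bivariate correlations between game-derived features and paper-based assessment scores without naming the coefficient. Assuming a Pearson correlation (a Spearman rank correlation would be an equally plausible choice), each feature/score pair reduces to the computation sketched below; `pearson_r` is a hypothetical helper, and its inputs would be one feature value and one test score per participant.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences,
    e.g. one game-derived feature and one assessment score per participant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return sxy / math.sqrt(sxx * syy)
```

With only 22 participants, any such coefficient would also need a significance test and correction for the number of feature/score pairs examined before being read as evidence.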