32 research outputs found

    Learning efficient temporal information in deep networks: From the viewpoints of applications and modeling

    With the introduction of deep learning, machine learning has dominated several technology areas, giving birth to high-performance applications that can even challenge human-level accuracy. However, the complexity of deep models is also exploding as a by-product of the revolution of machine learning. Such enormous model complexity has raised the new challenge of improving the efficiency of deep models to reduce deployment expense, especially for systems with high throughput demands or devices with limited power. The dissertation aims to improve the efficiency of temporal-sensitive deep models in four different directions. First, we develop a bandwidth extension mapping to avoid deploying multiple speech recognition systems corresponding to wideband and narrowband signals. Second, we apply a multi-modality approach to compensate for the performance of an excitement scoring system, where the input video sequences are aggressively down-sampled to reduce throughput. Third, we formulate the motion feature in the feature space by directly inducing the temporal information from intermediate layers of deep networks instead of relying on an additional optical flow stream. Finally, we model a spatiotemporal sampling network inspired by the human visual perception mechanism to reduce input frames and regions adaptively.

    The Development Of Glide Deletion In Seoul Korean: A Corpus And Articulatory Study

    This dissertation investigates the pathways and causes of the development of glide deletion in Seoul Korean. Seoul provides fertile ground for studies of linguistic innovation in an urban setting since it has seen rapid historical, social and demographic changes in the twentieth century. The phenomenon under investigation is the variable deletion of the labiovelar glide /w/ found to be on the rise in Seoul Korean (Silva, 1991; Kang, 1997). I present two studies addressing variation and change at two different levels: a corpus study tracking the development of /w/-deletion at the phonological level and an articulatory study examining the phonetic aspect of this change. The corpus data are drawn from the sociolinguistic interviews with 48 native Seoul Koreans between 2015 and 2017. A trend comparison with the data from an earlier study of /w/-deletion (Kang, 1997) reveals that /w/-deletion in postconsonantal position has begun to retreat, while non-postconsonantal /w/-deletion has been rising vigorously. More importantly, the effect of the preceding segment that used to be the strongest constraint on /w/-deletion has weakened over time. I conclude that /w/-deletion in Seoul Korean is being reanalyzed with the structural details being diluted over time. I analyze this weakening of the original pattern as the result of linguistic diffusion induced by a great influx of migrants into Seoul after the Korean War (1950-1953). In an articulatory study, ultrasound data of tongue movements and video data of lip rounding for the production of /w/ for three native Seoul Koreans in their 20s, 30s and 50s were analyzed using Optical Flow Analysis. I find that /w/ in Seoul Korean is subject to both gradient reduction and categorical deletion and that younger speakers exhibit significantly larger articulatory gestures for /w/ after a bilabial than the older generation, which is consistent with the pattern of phonological change found in the corpus study.
This dissertation demonstrates the importance of using both corpus and articulatory data in the investigation of a change, finding the coexistence of gradient and categorical effects in segmental deletion processes. Finally, it advances our understanding of the outcome of migration-induced dialect contact in contemporary urban settings.

    Analysis and recognition of human actions with flow features and temporal models

    This work focuses on the recognition of complex human activities in video data. A combination of new features and techniques from speech recognition is used to realize a recognition of action units and their combinations in video sequences. The presented approach shows how motion information gained from video data can be used to interpret the underlying structural information of actions and how higher level models allow an abstraction of different motion categories beyond simple classification.

    The Panhellenic Project: assessing learning engagement using Web 2.0 technologies

    High attrition rates have been a consistent occurrence among online learners, creating the challenge of how to design online instruction for the type of learning that encourages student engagement. With new technologies constantly evolving, the question becomes how educators can use these new web-based applications to engage students and possibly resolve the problem of high attrition among online learners. The purpose of this study was to assess the level of learning engagement through student participation in The Panhellenic Project, an instructional design model that integrated constructivist learning principles with Web 2.0 technologies. Additionally, the usefulness of structured orientations to the Web 2.0 technologies and the effectiveness of these technologies were also investigated. Using a mixed-methods case study design, The Panhellenic Project was framed around a collaborative group activity where undergraduate students worked in teams with the task of creating a three-dimensional virtual ancient Greek Parthenon and one ancient Olympic game event within the Second Life virtual world. A project wiki was established for student-participants to research sports history as well as share knowledge, information and resources. An informational blog with project resource information was developed as a Second Life learning reference. Multiple sources were used to capture data including the Survey of Student Engagement, pre- and post-project questionnaires, and electronic discourse analysis of wiki posts and Second Life chat transcripts. Research findings showed that the majority of the student-participants were engaged in The Panhellenic Project and that learning had occurred over the length of project implementation. The structured orientation and training sessions were perceived as effective in connecting theoretical and practical knowledge, though not effective for teaching students to use the Second Life virtual world.
Overall, the level of difficulty experienced in learning the application influenced student-participant perceptions about the effectiveness of the Web 2.0 technologies used in this study. Further, analysis of the data revealed that the participants consistently demonstrated constructivist learning activities through interaction with other learners, collaborative teamwork and the sharing of multiple perspectives as they completed The Panhellenic Project.

    Urban Studies

    This work contains a selection of papers from the International Conference on Urban Studies (ICUS 2017) and is a bi-annual periodical publication containing articles on urban cultural studies based on the international conference organized by the Faculty of Humanities at the Universitas Airlangga, Indonesia. This publication contains studies on issues that have become phenomena in urban life, including linguistics, literature, identity, gender, architecture, media, locality, globalization, the dynamics of urban society and culture, and urban history.

    Temporal Segmentation of Human Actions in Videos

    Understanding human actions in videos is of great interest in various scenarios ranging from surveillance over quality control in production processes to content-based video search. Algorithms for automatic temporal action segmentation need to overcome severe difficulties in order to be reliable and provide sufficiently good quality. Not only can human actions occur in different scenes and surroundings, the definition of an action itself is also inherently fuzzy, leading to a significant amount of inter-class variations. Moreover, besides finding the correct action label for a pre-defined temporal segment in a video, localizing an action in the first place is anything but trivial. Different actions not only vary in their appearance and duration but also can have long-range temporal dependencies that span over the complete video. Further, getting reliable annotations of large amounts of video data is time-consuming and expensive. The goal of this thesis is to advance current approaches to temporal action segmentation. We therefore propose a generic framework that models the three components of the task explicitly, i.e., long-range temporal dependencies are handled by a context model, variations in segment durations are represented by a length model, and short-term appearance and motion of actions are addressed with a visual model. While the inspiration for the context model mainly comes from word sequence models in natural language processing, the visual model builds upon recent advances in the classification of pre-segmented action clips. Considering that long-range temporal context is crucial, we avoid local segmentation decisions and find the globally optimal temporal segmentation of a video under the explicit models. Throughout the thesis, we provide explicit formulations and training strategies for the proposed generic action segmentation framework under different supervision conditions.
First, we address the task of fully supervised temporal action segmentation, where frame-level annotations are available during training. We show that our approach can outperform early sliding window baselines and recent deep architectures and that explicit length and context modeling leads to substantial improvements. Considering that full frame-level annotation is expensive to obtain, we then formulate a weakly supervised training algorithm that uses ordered sequences of actions occurring in the video as the only supervision. While a first approach reduces the weakly supervised setup to a fully supervised setup by generating a pseudo ground-truth during training, we propose a second approach that avoids this intermediate step and allows direct optimization of a loss based on the weak supervision. Closing the gap between the fully and the weakly supervised setup, we moreover evaluate semi-supervised learning, where video frames are sparsely annotated. With the motivation that the vast amount of video data on the Internet only comes with meta-tags or content keywords that do not provide any temporal ordering information, we finally propose a method for action segmentation that learns from unordered sets of actions only. All approaches are evaluated on several commonly used benchmark datasets. With the proposed methods, we reach state-of-the-art performance for both fully and weakly supervised action segmentation.

    Reports to the President

    A compilation of annual reports for the 1982-1983 academic year, including a report from the President of the Massachusetts Institute of Technology, as well as reports from the academic and administrative units of the Institute. The reports outline the year's goals, accomplishments, honors and awards, and future plans.

    The Palgrave Handbook of Digital Russia Studies

    This open access handbook presents a multidisciplinary and multifaceted perspective on how the ‘digital’ is simultaneously changing Russia and the research methods scholars use to study Russia. It provides a critical update on how Russian society, politics, economy, and culture are reconfigured in the context of ubiquitous connectivity and accounts for the political and societal responses to digitalization. In addition, it answers practical and methodological questions in handling Russian data and a wide array of digital methods. The volume makes a timely intervention in our understanding of the changing field of Russian Studies and is an essential guide for scholars, advanced undergraduate and graduate students studying Russia today.

    Proceedings of the 7th Sound and Music Computing Conference

    Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010