
    Modeling of Multiple Sequential Data with Extended Hidden Semi-Markov Models and Methods for Data Collection and Management

    In recent years, with the development of devices and of data aggregation methods, the data to be analyzed and the ways of aggregating them have changed. In an Internet of Things (IoT) environment, sensors and devices are connected to a communication terminal, such as an access point or a mobile phone, and the terminal aggregates the sensed data and uploads them to a cloud server. From the viewpoint of analysis, the aggregated data are sequential data, and a grouped set of sequences is meaningful because the group represents the owner's information. However, most research on sequential data analysis is specialized for particular target data and does not address such "grouped" sequences. From the viewpoint of aggregation, dedicated terminals must be prepared as access points, which takes labor and cost. To analyze grouped sequences and to aggregate them without such preparation, this paper aims to realize an analysis method for grouped sequences and to realize the aggregation environment virtually.

    For the analysis of grouped sequential data, we first analyze the grouped sequences with a focus on event sequences and extract the requirements for modeling them: (1) the order of events, (2) the duration of each event, (3) the interval between two events, and (4) the overlap of events. To satisfy these requirements, this paper adopts the hidden semi-Markov model (HSMM) as a base model, because it can represent the order of events and the duration of an event, and proposes extensions of it. To satisfy the duration of an event and the interval between events simultaneously, we propose two models: the duration and interval hidden semi-Markov model and the interval state hidden semi-Markov model. To satisfy all requirements, including the overlap of events, we propose a new modeling methodology, the overlapped state hidden semi-Markov model. The performance of each method is compared with the HSMM in simulation experiments in terms of training and recognition time, decoding performance, and recognition performance. Practical application data are also used in the simulations and demonstrate the effectiveness of the proposed models.

    For data aggregation, most conventional approaches to aggregating grouped data rely on pre-allocated access points or terminals. They can obtain the grouped data stably, but additional cost is needed to allocate such terminals whenever a new group of sequences is to be aggregated. Therefore, this paper focuses on "area-based information" as the target of the grouped sequences and proposes a method to store such information using the storage of terminals that already exist in the area. It realizes a temporary, area-based storage virtually by relaying the information among the existing terminals in the area. In this approach, it is necessary to limit the load on the terminals while storing the information for as long as possible. To control this trade-off, we propose methods that dynamically adjust the relay timing and the size of the target storage area in an ad hoc manner. Simulators of a practical environment are built to evaluate both control methods, and the results show the effectiveness of our methods compared with flooding-based relay control.
    As a result of the above proposals and evaluations, methods for modeling grouped sequential data and for aggregating them are presented. Finally, we summarize the research with examples of applications.
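
    The abstract above describes the proposed HSMM extensions only in prose. As a rough, hypothetical illustration of the underlying idea, the Python sketch below samples event sequences from an explicit-duration hidden semi-Markov model with an added inter-event interval, in the spirit of the duration and interval model; the state count, Poisson durations, Gaussian emissions, and all parameter values are invented for illustration and are not taken from the thesis.

    import numpy as np

    # Minimal sketch of an explicit-duration hidden semi-Markov model (HSMM):
    # unlike a plain HMM, each hidden state emits for an explicitly sampled
    # duration, and an idle interval is drawn between consecutive events to
    # mimic the duration-and-interval idea. All parameter values are made up.
    rng = np.random.default_rng(0)

    n_states = 3
    trans = np.array([[0.0, 0.7, 0.3],      # state transition probabilities
                      [0.4, 0.0, 0.6],      # (no self-transitions: duration is explicit)
                      [0.5, 0.5, 0.0]])
    dur_mean = np.array([3.0, 5.0, 2.0])    # mean event duration per state (time steps)
    gap_mean = 2.0                          # mean interval between two events
    emit_mean = np.array([0.0, 3.0, 6.0])   # Gaussian emission mean per state

    def sample_sequence(n_events=5):
        """Sample one event sequence: (state, start, duration) segments plus observations."""
        state = rng.integers(n_states)
        t, segments, obs = 0, [], []
        for _ in range(n_events):
            d = int(1 + rng.poisson(dur_mean[state]))               # explicit event duration
            segments.append((int(state), t, d))
            obs.extend(rng.normal(emit_mean[state], 1.0, size=d))   # emissions during the event
            t += d + int(rng.poisson(gap_mean))                     # interval before the next event
            state = rng.choice(n_states, p=trans[state])            # next event type
        return segments, np.array(obs)

    if __name__ == "__main__":
        for state, start, duration in sample_sequence()[0]:
            print(f"event state={state} start={start} duration={duration}")

    Fitting such a model to data and handling overlapping events, as in the overlapped state variant, would require considerably more machinery (e.g. an EM-style training procedure) than this sampling sketch.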

    Temporal Segmentation of Human Actions in Videos

    Understanding human actions in videos is of great interest in various scenarios, ranging from surveillance and quality control in production processes to content-based video search. Algorithms for automatic temporal action segmentation need to overcome severe difficulties in order to be reliable and to provide sufficiently good quality. Not only can human actions occur in different scenes and surroundings, the definition of an action itself is also inherently fuzzy, leading to a significant amount of inter-class variation. Moreover, besides finding the correct action label for a pre-defined temporal segment of a video, localizing an action in the first place is anything but trivial. Different actions not only vary in appearance and duration but can also have long-range temporal dependencies that span the complete video. Further, obtaining reliable annotations for large amounts of video data is time consuming and expensive.

    The goal of this thesis is to advance current approaches to temporal action segmentation. We therefore propose a generic framework that models the three components of the task explicitly: long-range temporal dependencies are handled by a context model, variations in segment durations are represented by a length model, and short-term appearance and motion of actions are addressed by a visual model. While the inspiration for the context model mainly comes from word sequence models in natural language processing, the visual model builds upon recent advances in the classification of pre-segmented action clips. Since long-range temporal context is crucial, we avoid local segmentation decisions and find the globally optimal temporal segmentation of a video under the explicit models. Throughout the thesis, we provide explicit formulations and training strategies for the proposed generic action segmentation framework under different supervision conditions.

    First, we address fully supervised temporal action segmentation, where frame-level annotations are available during training. We show that our approach can outperform early sliding-window baselines as well as recent deep architectures, and that explicit length and context modeling leads to substantial improvements. Since full frame-level annotation is expensive to obtain, we then formulate a weakly supervised training algorithm that uses the ordered sequence of actions occurring in a video as the only supervision. While a first approach reduces the weakly supervised setup to a fully supervised one by generating a pseudo ground-truth during training, we propose a second approach that avoids this intermediate step and directly optimizes a loss based on the weak supervision. Closing the gap between the fully and the weakly supervised setup, we moreover evaluate semi-supervised learning, where video frames are sparsely annotated. Motivated by the fact that the vast amount of video data on the Internet comes only with meta-tags or content keywords that do not provide any temporal ordering information, we finally propose a method for action segmentation that learns from unordered sets of actions only. All approaches are evaluated on several commonly used benchmark datasets. With the proposed methods, we reach state-of-the-art performance for both fully and weakly supervised action segmentation.
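
    The framework is described above only in prose. As a simplified, hypothetical sketch of how a visual model, a length model, and a context model can be combined and the globally optimal segmentation found by dynamic programming over segment boundaries, the following Python fragment may help; the scoring functions, the bigram-style context, and the exhaustive boundary search are assumptions made for illustration, not the thesis' actual formulation.

    import numpy as np

    # Toy segment-level dynamic programming for temporal action segmentation:
    # per-frame class scores (visual model), a segment-length penalty (length
    # model), and a label transition score (context model) are combined, and
    # the best-scoring complete segmentation is recovered by backtracking.
    def segment(frame_log_probs, length_log_prob, context_log_prob, max_len=50):
        """frame_log_probs: (T, C) array of per-frame log class scores.
        length_log_prob(c, l): log score of a class-c segment of length l.
        context_log_prob(prev, c): log score of class c following prev (prev may be None).
        Returns a list of (start, end_exclusive, class) segments."""
        T, C = frame_log_probs.shape
        cum = np.vstack([np.zeros((1, C)), np.cumsum(frame_log_probs, axis=0)])
        # best[t, c]: best score for frames [0, t) with a last segment of class c
        best = np.full((T + 1, C), -np.inf)
        back = {}                              # (t, c) -> (segment start s, previous class or None)
        for t in range(1, T + 1):
            for c in range(C):
                for s in range(max(0, t - max_len), t):
                    visual = cum[t, c] - cum[s, c]        # summed frame scores over [s, t)
                    length = length_log_prob(c, t - s)
                    if s == 0:
                        cand, prev = visual + length + context_log_prob(None, c), None
                    else:
                        prev = int(np.argmax(best[s] + np.array(
                            [context_log_prob(p, c) for p in range(C)])))
                        cand = best[s, prev] + context_log_prob(prev, c) + visual + length
                    if cand > best[t, c]:
                        best[t, c] = cand
                        back[(t, c)] = (s, prev)
        segments, t, c = [], T, int(np.argmax(best[T]))   # backtrack the global optimum
        while t > 0:
            s, prev = back[(t, c)]
            segments.append((s, t, c))
            t, c = s, (prev if prev is not None else 0)
        return list(reversed(segments))

    if __name__ == "__main__":
        T, C = 30, 3
        true_labels = np.repeat([0, 2, 1], [10, 12, 8])
        scores = np.full((T, C), np.log(0.2))
        scores[np.arange(T), true_labels] = np.log(0.6)    # noisy stand-in for a visual model
        length_lp = lambda c, l: -0.1 * abs(l - 10)        # prefer roughly 10-frame segments
        context_lp = lambda p, c: np.log(1.0 / C)          # uniform stand-in for a context model
        print(segment(scores, length_lp, context_lp))

    The point of the sketch is that no segmentation decision is made per frame in isolation: every candidate segment is scored jointly by all three models, and only the best-scoring complete segmentation is returned.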