
    DeepStory: Video Story QA by Deep Embedded Memory Networks

    Question-answering (QA) on video content is a significant challenge for achieving human-level intelligence, as it involves both vision and language in real-world settings. Here we demonstrate the possibility of an AI agent performing video story QA by learning from a large collection of cartoon videos. We develop a video-story learning model, i.e. Deep Embedded Memory Networks (DEMN), to reconstruct stories from a joint scene-dialogue video stream using a latent embedding space of observed data. The video stories are stored in a long-term memory component. For a given question, an LSTM-based attention model uses the long-term memory to recall the best question-story-answer triplet by focusing on specific words containing key information. We trained the DEMN on a novel QA dataset of children's cartoon video series, Pororo. The dataset contains 16,066 scene-dialogue pairs of 20.5-hour videos, 27,328 fine-grained sentences for scene description, and 8,913 story-related QA pairs. Our experimental results show that the DEMN outperforms other QA models. This is mainly due to 1) the reconstruction of video stories in a combined scene-dialogue form that utilizes the latent embedding and 2) the attention mechanism. DEMN also achieved state-of-the-art results on the MovieQA benchmark.
    Comment: 7 pages, accepted for IJCAI 201
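The recall step described above — scoring stored story sentences against a question and returning the best-matching memory summary — can be sketched as dot-product attention over a memory bank. This is a minimal illustration only; the function and variable names are hypothetical, and the paper's actual model uses learned LSTM encoders rather than raw dot products.

```python
import numpy as np

def attend(question_vec, memory):
    """Score each stored story sentence against the question and
    return attention weights plus the recalled memory summary."""
    scores = memory @ question_vec            # dot-product relevance per slot
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over memory slots
    recalled = weights @ memory               # attention-weighted recall
    return weights, recalled

rng = np.random.default_rng(0)
memory = rng.normal(size=(5, 8))    # 5 story sentences, 8-dim embeddings
question = rng.normal(size=8)       # embedded question
w, r = attend(question, memory)
```

The slot with the largest weight plays the role of the "best triplet" the model focuses on.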

    Herd Behaviors in Financial Markets

    We investigate the herd behavior of returns for the yen-dollar exchange rate in the Japanese financial market. We find that the probability distribution $P(R)$ of returns $R$ satisfies the power-law behavior $P(R) \simeq R^{-\beta}$, with exponents $\beta = 3.11$ (time interval $\tau =$ one minute) and $\beta = 3.36$ ($\tau =$ one day). The informational cascade regime appears for the herding parameter $H \ge 2.33$ at $\tau =$ one minute, while no herding occurs at $\tau =$ one day. In particular, we find that the distribution of normalized returns shows a crossover to a Gaussian distribution at the time step $\Delta t = 1$ day.
    Comment: 15 pages, 6 figure
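A tail exponent of this kind can be recovered with the standard maximum-likelihood estimator for a continuous power law with a known lower cutoff. The sketch below runs on synthetic data, not the paper's yen-dollar series; `r_min`, the sample size, and the seed are assumptions for illustration.

```python
import numpy as np

# Draw samples from P(R) ~ R^(-beta) above r_min via inverse-CDF sampling,
# then recover beta with the continuous power-law MLE:
#   beta_hat = 1 + n / sum(ln(R_i / r_min))
rng = np.random.default_rng(42)
beta_true = 3.11
r_min = 1.0
u = rng.uniform(size=200_000)
samples = r_min * u ** (-1.0 / (beta_true - 1.0))  # P(X > x) = (x/r_min)^-(beta-1)

beta_hat = 1.0 + len(samples) / np.log(samples / r_min).sum()
```

With 200,000 samples the estimate lands within a few thousandths of the true exponent.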

    Improving Multi-Scale Aggregation Using Feature Pyramid Module for Robust Speaker Verification of Variable-Duration Utterances

    Currently, the most widely used approach for speaker verification is deep speaker embedding learning. In this approach, we obtain a speaker embedding vector by pooling single-scale features extracted from the last layer of a speaker feature extractor. Multi-scale aggregation (MSA), which utilizes multi-scale features from different layers of the feature extractor, has recently been introduced and shows superior performance for variable-duration utterances. To increase robustness when dealing with utterances of arbitrary duration, this paper improves the MSA by using a feature pyramid module. The module enhances speaker-discriminative information of features from multiple layers via a top-down pathway and lateral connections. We extract speaker embeddings using the enhanced features that contain rich speaker information at different time scales. Experiments on the VoxCeleb dataset show that the proposed module improves previous MSA methods with a smaller number of parameters. It also achieves better performance than state-of-the-art approaches for both short and long utterances.
    Comment: Accepted to Interspeech 202
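The top-down pathway with lateral connections amounts to upsampling a deep (coarse-in-time) feature map and adding a projected version of a shallower one. Below is a minimal numpy sketch of one such FPN-style merge step; the shapes, the nearest-neighbour upsampling, and the per-frame linear map standing in for a 1x1 convolution are illustrative assumptions, not the paper's exact module.

```python
import numpy as np

def top_down_merge(coarse, fine, w_lateral):
    """One FPN-style merge: upsample the coarse (deep) feature map along
    the time axis and add a lateral projection of the fine feature map."""
    up = np.repeat(coarse, 2, axis=0)      # nearest-neighbour 2x upsampling
    lateral = fine @ w_lateral             # per-frame linear map (1x1 conv)
    return up + lateral

rng = np.random.default_rng(0)
fine = rng.normal(size=(8, 16))     # 8 frames, 16 channels (shallow layer)
coarse = rng.normal(size=(4, 16))   # 4 frames after 2x downsampling (deep layer)
w = rng.normal(size=(16, 16))
merged = top_down_merge(coarse, fine, w)
```

The merged map keeps the fine layer's time resolution while carrying the deeper layer's speaker-discriminative information.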

    TiDAL: Learning Training Dynamics for Active Learning

    Active learning (AL) aims to select the most useful data samples from an unlabeled data pool and annotate them to expand the labeled dataset under a limited budget. In particular, uncertainty-based methods choose the most uncertain samples, which are known to be effective in improving model performance. However, the AL literature often overlooks training dynamics (TD), defined as the ever-changing model behavior during optimization via stochastic gradient descent, even though other areas of the literature have empirically shown that TD provides important clues for measuring sample uncertainty. In this paper, we propose a novel AL method, Training Dynamics for Active Learning (TiDAL), which leverages the TD to quantify the uncertainty of unlabeled data. Since tracking the TD of all the large-scale unlabeled data is impractical, TiDAL utilizes an additional prediction module that learns the TD of labeled data. To further justify the design of TiDAL, we provide theoretical and empirical evidence supporting the usefulness of leveraging TD for AL. Experimental results show that our TiDAL achieves better or comparable performance on both balanced and imbalanced benchmark datasets compared to state-of-the-art AL methods, which estimate data uncertainty using only static information after model training.
    Comment: ICCV 2023 Camera-Read
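One simple way to turn training dynamics into an uncertainty score is the mean predictive entropy over a sample's per-epoch softmax outputs. The sketch below is an illustrative stand-in for the idea of scoring samples from their TD, not TiDAL's learned prediction module.

```python
import numpy as np

def td_uncertainty(prob_history):
    """prob_history: (epochs, classes) softmax outputs for one sample.
    Returns the mean predictive entropy across training epochs —
    samples the model stays unsure about score higher."""
    eps = 1e-12
    entropy = -(prob_history * np.log(prob_history + eps)).sum(axis=1)
    return entropy.mean()

# Two hypothetical samples tracked over 5 epochs of a 3-class model:
stable = np.array([[0.9, 0.05, 0.05]] * 5)   # confident throughout
wobbly = np.array([[1/3, 1/3, 1/3]] * 5)     # uncertain throughout
```

An acquisition step would then annotate the samples with the highest scores first.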

    Learning with Noisy Labels by Efficient Transition Matrix Estimation to Combat Label Miscorrection

    Recent studies on learning with noisy labels have shown remarkable performance by exploiting a small clean dataset. In particular, model-agnostic meta-learning-based label correction methods further improve performance by correcting noisy labels on the fly. However, there is no safeguard against label miscorrection, resulting in unavoidable performance degradation. Moreover, every training step requires at least three back-propagations, significantly slowing down training. To mitigate these issues, we propose a robust and efficient method that learns a label transition matrix on the fly. Employing the transition matrix makes the classifier skeptical about all the corrected samples, which alleviates the miscorrection issue. We also introduce a two-head architecture to efficiently estimate the label transition matrix at every iteration within a single back-propagation, so that the estimated matrix closely follows the shifting noise distribution induced by label correction. Extensive experiments demonstrate that our approach shows the best training efficiency while having comparable or better accuracy than existing methods.
    Comment: ECCV202
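The transition-matrix mechanism can be illustrated directly: the classifier's posterior over clean labels is mapped through a matrix T, where T[i, j] = P(noisy = j | clean = i), to get a distribution over the observed (possibly miscorrected) labels. The 3-class matrix below is a minimal assumed example, not the paper's two-head estimator.

```python
import numpy as np

def noisy_posterior(clean_probs, T):
    """Map the classifier's clean-label posterior through the label
    transition matrix T; the loss is then taken against the observed
    label under this noisy posterior, keeping the classifier skeptical."""
    return clean_probs @ T

# Rows sum to 1: each clean class redistributes mass over noisy labels.
T = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
clean = np.array([1.0, 0.0, 0.0])   # classifier certain the clean label is 0
noisy = noisy_posterior(clean, T)
```

Even a fully confident clean prediction yields only 0.8 probability for the matching noisy label, which is what softens the effect of any miscorrected targets.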

    Comparison of the Risk Factors of Korean Adolescent Suicide Residing in High Suicidal Regions Versus Those in Low Suicidal Regions

    Background: The suicide rate among South Korean youth has been increasing, and suicide has remained the most common cause of death among the youth since 2007. We aimed to determine the trends and regional risk factors of youth suicide in South Korea from 2001 to 2010. Subjects and Methods: We used data from the National Statistical Office to calculate standardized suicide rates and various regional measures, including population census, employment, and labor data. To estimate the effect of individual risk factors, we used data from the fourth Korean Youth Risk Behavior Web-based Survey (KYRBWS-VI). A conditional autoregressive model for the regional standardized mortality ratio (SMR) using inter-regional spatial information was fitted. Results: The suicide rate of adolescents aged 12 to 18 rose from 3.5 per 100,000 people in 2001 to 5.3 per 100,000 in 2010. There was no significant gender difference in suicide rates; however, the number of suicides among adolescents aged 15-18 was about four times that of adolescents aged 12-14. A high proportion of late adolescents, a higher number of recipients of national basic livelihood support, and a higher number of adolescents treated for depression were related to an elevated adolescent suicide rate. Total sleep time of adolescents and the regional unemployment rate were negatively associated with the suicide risk of the respective regions. Conclusions: Age distribution, economic status, total sleep time, and the number of adolescent patients with depression differed between low and high adolescent-suicide regions in Korea. Our findings suggest that preferential application of adolescent suicide prevention programs to regions selected by these factors may be an important step toward reducing adolescent suicide in Korea
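The standardized mortality ratio used in the analysis is simply observed deaths divided by the count expected under a reference rate. A minimal sketch with hypothetical numbers (the region size, observed count, and reference rate below are illustrative, not figures from the study):

```python
def smr(observed, population, reference_rate):
    """Standardized mortality ratio: observed suicides divided by the
    number expected if the region followed the reference rate."""
    expected = population * reference_rate
    return observed / expected

# Hypothetical region: 200,000 adolescents, 12 suicides,
# against a national rate of 4.4 per 100,000.
ratio = smr(12, 200_000, 4.4 / 100_000)
```

An SMR above 1 marks a region with more suicides than its population size alone would predict, which is what the conditional autoregressive model smooths across neighbouring regions.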