151 research outputs found

    LSTM Pose Machines

    We observed that recent state-of-the-art results on single-image human pose estimation were achieved by multi-stage Convolutional Neural Networks (CNNs). Notwithstanding their superior performance on static images, applying these models to videos is not only computationally intensive but also suffers from performance degradation and flickering. Such suboptimal results are mainly attributed to the inability to impose sequential geometric consistency, to handle severe image quality degradation (e.g. motion blur and occlusion), and to capture the temporal correlation among video frames. In this paper, we proposed a novel recurrent network to tackle these problems. We showed that if a weight-sharing scheme is imposed on the multi-stage CNN, it can be rewritten as a Recurrent Neural Network (RNN). This property decouples the relationship among multiple network stages and results in significantly faster invocation of the network on videos. It also enables the adoption of Long Short-Term Memory (LSTM) units between video frames. We found that such a memory-augmented RNN is very effective in imposing geometric consistency among frames. It also handles input quality degradation in videos well while successfully stabilizing the sequential outputs. The experiments showed that our approach significantly outperformed current state-of-the-art methods on two large-scale video pose estimation benchmarks. We also explored the memory cells inside the LSTM and provided insights into why such a mechanism benefits video-based pose estimation. Comment: Poster in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 201
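
The claim that a weight-shared multi-stage network can be rewritten as a recurrent one can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the `stage` function, the scalar weights, and the list-based "beliefs" are hypothetical stand-ins, not the paper's architecture, and the LSTM units the abstract mentions are omitted.

```python
import math

def stage(features, belief, weights):
    """One refinement stage: combine input features with the previous belief.
    (Hypothetical toy stage; a real stage would be a convolutional block.)"""
    w_f, w_b = weights
    return [math.tanh(w_f * f + w_b * b) for f, b in zip(features, belief)]

def multi_stage(features, stage_weights):
    """Multi-stage form: a chain of stages, each with its own weights,
    all looking at the same single image."""
    belief = [0.0] * len(features)
    for weights in stage_weights:
        belief = stage(features, belief, weights)
    return belief

def recurrent(frames, shared_weights):
    """Recurrent form: one set of weights applied once per frame,
    carrying the belief forward from frame to frame."""
    belief = [0.0] * len(frames[0])
    outputs = []
    for features in frames:
        belief = stage(features, belief, shared_weights)
        outputs.append(belief)
    return outputs

# With tied weights, three stages on one image equal three recurrent
# steps on that image repeated as three "frames".
image = [0.2, -0.5, 0.9]
w = (0.7, 0.3)
assert multi_stage(image, [w, w, w]) == recurrent([image] * 3, w)[-1]
```

In the recurrent form each video frame costs one stage evaluation rather than a full multi-stage pass, which is one way to read the speed-up the abstract describes.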

    Nucleosome Positioning and Its Role in Gene Regulation in Yeast

    The nucleosome, composed of a 147-bp segment of DNA helix wrapped around a histone protein octamer, serves as the basic unit of chromatin. Nucleosome positioning refers to the position of the DNA double helix relative to the histone octamer. Positioning plays an important role in transcription, DNA replication and other DNA transactions, since packing DNA into nucleosomes occludes protein binding sites. Moreover, nucleosomes bear histone modifications and thus have a profound effect on regulation. Nucleosome positioning and its roles have been extensively studied in the model organism yeast. In this chapter, nucleosome organization and its roles in gene regulation are reviewed. Typically, nucleosomes are depleted around transcription start sites (TSSs), resulting in a nucleosome-free region (NFR) that is flanked by two well-positioned H2A.Z-containing nucleosomes. The nucleosomes downstream of the TSS are equally spaced in a nucleosome array. DNA sequences, especially 10–11 bp periodicities of some specific dinucleotides, partly determine nucleosome positioning. Nucleosome occupancy can be determined with high-throughput sequencing techniques. Importantly, nucleosome positions are dynamic across cell types and environments. Histone depletion, histone mutations, heat shock and changes in carbon source profoundly change nucleosome organization. In yeast cells, upon histone mutation, nucleosomes change drastically at promoters, and highly expressed genes, such as ribosome genes, undergo greater change. Changes in nucleosomes are tightly associated with transcription initiation, elongation and termination. H2A.Z is contained in the +1 and −1 nucleosomes and is thus involved in transcription. The chaperone Chz1 and the elongation factor Spt16 function in H2A.Z deposition on chromatin. The chapter covers the basic concept of nucleosomes, nucleosome determinants, the techniques for mapping nucleosomes, nucleosome alteration upon stress and mutation, and Htz1 dynamics on chromatin.

    Alternative Telescopic Displacement: An Efficient Multimodal Alignment Method

    Feature alignment is the primary means of fusing multimodal data. We propose a feature alignment method that fully fuses multimodal information: it alternately shifts and expands feature information from different modalities so that they share a consistent representation in a feature space. The proposed method can robustly capture high-level interactions between features of different modalities, thus significantly improving the performance of multimodal learning. We also show that the proposed method outperforms other popular multimodal schemes on multiple tasks. Experimental evaluation on the ETT and MIT-BIH-Arrhythmia datasets shows that the proposed method achieves state-of-the-art performance. Comment: 8 pages, 7 figure

    Enhancing Deep Knowledge Tracing with Auxiliary Tasks

    Knowledge tracing (KT) is the problem of predicting students' future performance based on their historical interactions with intelligent tutoring systems. Recent studies have applied multiple types of deep neural networks to solve the KT problem. However, there are two important factors in real-world educational data that are not well represented. First, most existing works augment input representations with the co-occurrence matrix of questions and knowledge components (KCs; a KC is a generalization of everyday terms like concept, principle, fact, or skill) but fail to explicitly integrate such intrinsic relations into the final response prediction task. Second, the individualized historical performance of students has not been well captured. In this paper, we proposed AT-DKT to improve the prediction performance of the original deep knowledge tracing model with two auxiliary learning tasks, i.e., a question tagging (QT) prediction task and an individualized prior knowledge (IK) prediction task. Specifically, the QT task helps learn better question representations by predicting whether questions contain specific KCs. The IK task captures students' global historical performance by progressively predicting student-level prior knowledge that is hidden in students' historical learning interactions. We conduct comprehensive experiments on three real-world educational datasets and compare the proposed approach to both deep sequential KT models and non-sequential models. Experimental results show that AT-DKT outperforms all sequential models with more than 0.9% improvements of AUC for all datasets, and is almost the second best compared to non-sequential models. Furthermore, we conduct both ablation studies and quantitative analysis to show the effectiveness of auxiliary tasks and the superior prediction outcomes of AT-DKT. Comment: Accepted at WWW'23: The 2023 ACM Web Conference, 202

    Online Map Vectorization for Autonomous Driving: A Rasterization Perspective

    Vectorized high-definition (HD) maps are essential for autonomous driving, providing detailed and precise environmental information for advanced perception and planning. However, current map vectorization methods often exhibit deviations, and the existing evaluation metric for map vectorization lacks sufficient sensitivity to detect these deviations. To address these limitations, we propose integrating the philosophy of rasterization into map vectorization. Specifically, we introduce a new rasterization-based evaluation metric, which has superior sensitivity and is better suited to real-world autonomous driving scenarios. Furthermore, we propose MapVR (Map Vectorization via Rasterization), a novel framework that applies differentiable rasterization to vectorized outputs and then performs precise and geometry-aware supervision on rasterized HD maps. Notably, MapVR designs tailored rasterization strategies for various geometric shapes, enabling effective adaptation to a wide range of map elements. Experiments show that incorporating rasterization into map vectorization greatly enhances performance with no extra computational cost during inference, leading to more accurate map perception and ultimately promoting safer autonomous driving. Comment: [NeurIPS 2023
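
To make the idea of a rasterization-based comparison concrete, here is a minimal sketch that draws two polylines onto a grid and scores their overlap with intersection-over-union. This is an assumption-laden illustration only: `rasterize`, `raster_iou`, the point-sampling scheme, and the choice of IoU are hypothetical and are not claimed to be MapVR's metric or its differentiable rasterizer.

```python
import math

def rasterize(polyline, size, samples_per_segment=50):
    """Return the set of (row, col) grid cells touched by points sampled
    densely along each segment of the polyline (a non-differentiable toy)."""
    cells = set()
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        for i in range(samples_per_segment + 1):
            t = i / samples_per_segment
            col = math.floor(x0 + t * (x1 - x0))
            row = math.floor(y0 + t * (y1 - y0))
            if 0 <= col < size and 0 <= row < size:
                cells.add((row, col))
    return cells

def raster_iou(pred, truth, size):
    """Intersection-over-union of the two rasterized polylines."""
    a = rasterize(pred, size)
    b = rasterize(truth, size)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# A predicted lane boundary compared with itself and with a distant copy.
lane = [(0.5, 0.5), (9.5, 0.5)]
far_lane = [(0.5, 5.5), (9.5, 5.5)]
assert raster_iou(lane, lane, size=10) == 1.0
assert raster_iou(lane, far_lane, size=10) == 0.0
```

Because the score is computed cell by cell, a prediction that drifts off the true shape anywhere along its length loses overlap, which is the kind of sensitivity to deviations the abstract argues for.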