36 research outputs found

    Which visual questions are difficult to answer? Analysis with Entropy of Answer Distributions

    We propose a novel approach to identify the difficulty of visual questions for Visual Question Answering (VQA) without direct supervision or annotation of difficulty. Prior work has considered the diversity of ground-truth answers given by human annotators. In contrast, we analyze the difficulty of visual questions based on the behavior of multiple different VQA models. We propose to cluster the entropy values of the predicted answer distributions obtained by three different models: a baseline method that takes as input images and questions, and two variants that take as input images only and questions only. We use simple k-means to cluster the visual questions of the VQA v2 validation set. Then we use state-of-the-art methods to determine the accuracy and the entropy of the answer distributions for each cluster. A benefit of the proposed method is that no annotation of difficulty is required, because the accuracy of each cluster reflects the difficulty of the visual questions that belong to it. Our approach can identify clusters of difficult visual questions that are not answered correctly by state-of-the-art methods. Detailed analysis on the VQA v2 dataset reveals that 1) all methods show poor performance on the most difficult cluster (about 10% accuracy), 2) as the cluster difficulty increases, the answers predicted by the different methods begin to differ, and 3) the values of cluster entropy are highly correlated with the cluster accuracy. We show that our approach has the advantage of being able to assess the difficulty of visual questions without ground truth (i.e. the test set of VQA v2) by assigning them to one of the clusters. We expect that this can stimulate the development of novel directions of research and new algorithms. Clustering results are available online at https://github.com/tttamaki/vqd. Comment: accepted by IEEE Access, available at https://doi.org/10.1109/ACCESS.2020.3022063 as "An Entropy Clustering Approach for Assessing Visual Question Difficulty".
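
    A minimal sketch of the clustering step described in the abstract, assuming NumPy/scikit-learn; the number of clusters, array shapes, and function names are illustrative assumptions, not taken from the paper. It computes the entropy of each model's predicted answer distribution, stacks the three entropies into a per-question feature vector, clusters them with k-means, and reports per-cluster accuracy as the difficulty proxy.

```python
import numpy as np
from sklearn.cluster import KMeans

def answer_entropy(probs, eps=1e-12):
    """Shannon entropy of each predicted answer distribution (rows of `probs`)."""
    p = np.clip(probs, eps, 1.0)
    return -np.sum(p * np.log(p), axis=1)

def cluster_by_entropy(probs_iq, probs_i, probs_q, n_clusters=5, seed=0):
    """Cluster questions by the entropies of three models' answer distributions.

    probs_iq : (N, A) softmax outputs of an image+question model
    probs_i  : (N, A) outputs of an image-only model
    probs_q  : (N, A) outputs of a question-only model
    Returns cluster labels of shape (N,).
    """
    feats = np.stack([answer_entropy(probs_iq),
                      answer_entropy(probs_i),
                      answer_entropy(probs_q)], axis=1)
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    return km.fit_predict(feats)

def per_cluster_accuracy(labels, predictions, ground_truth):
    """Mean accuracy of any VQA model within each entropy cluster."""
    correct = (predictions == ground_truth).astype(float)
    return {c: correct[labels == c].mean() for c in np.unique(labels)}
```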

    Can R^n estimate a rotation matrix R more accurately than R?: A method for estimating a rotation matrix R by using R, R^2, R^3, ..., obtained by a one-shot measurement

    In this paper, we show that a more accurate estimate of a 3×3 rotation matrix R can be obtained by appropriately decomposing the higher-order rotation matrices R^2, R^3, and so on. First we discuss angle estimation for a 2×2 rotation matrix, inspired by electronic distance measurement. Then we reformulate the problem for a 3×3 rotation matrix: given noise-contaminated measurement matrices R, R^2, ..., R^n, find an appropriate rotation matrix R. In the proposed method, the given measurement matrices are first projected onto rotation matrices by polar decomposition. The rotation axis and angle are then obtained by an eigendecomposition of each rotation matrix. Finally, the ambiguity of the obtained rotation angle is removed. Numerical simulations and pose estimation experiments show that using R^n yields more accurate estimates than using R alone.
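
    A minimal NumPy sketch of the general idea, not the paper's exact formulation; the unwrapping rule, the use of the axis from R alone, and all function names are illustrative assumptions. It projects each noisy power onto the nearest rotation matrix, reads off its rotation angle, resolves the 2*pi/k ambiguity with the coarse angle from R itself, and averages the per-power estimates.

```python
import numpy as np

def orthogonalize(M):
    """Project a noisy matrix onto the nearest rotation matrix (SVD-based polar step)."""
    U, _, Vt = np.linalg.svd(M)
    R = U @ Vt
    if np.linalg.det(R) < 0:  # enforce det(R) = +1
        U[:, -1] *= -1
        R = U @ Vt
    return R

def axis_angle(R):
    """Rotation axis (eigenvector for eigenvalue 1) and rotation angle in [0, pi]."""
    w, v = np.linalg.eig(R)
    axis = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    axis /= np.linalg.norm(axis)
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    return axis, angle

def estimate_from_powers(measurements):
    """Combine noisy measurements of R, R^2, ..., R^n into one axis/angle estimate.

    The angle of R^k is k*theta (mod 2*pi); the integer ambiguity is resolved
    with the coarse estimate from R itself, then the per-power angles are averaged.
    The axis is taken from the first measurement for simplicity (an assumption).
    """
    rots = [orthogonalize(M) for M in measurements]
    axis, theta_coarse = axis_angle(rots[0])
    thetas = []
    for k, Rk in enumerate(rots, start=1):
        _, angle_k = axis_angle(Rk)  # equals k*theta wrapped into [0, pi]
        best = min(((s * angle_k + 2 * np.pi * m) / k
                    for m in range(k + 1) for s in (+1, -1)),
                   key=lambda t: abs(t - theta_coarse))
        thetas.append(best)
    return axis, float(np.mean(thetas))
```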

    Experimental study on properties of pose representations for 3DOF linear pose estimations

    Poster material for MIRU 2009, the 12th Meeting on Image Recognition and Understanding; venue: Kunibiki Messe, Matsue; dates: July 20-22, 2009