6 research outputs found

    SMART: A Situation Model for Algebra Story Problems via Attributed Grammar

    Full text link
    Solving algebra story problems remains a challenging task in artificial intelligence, which requires a detailed understanding of real-world situations and a strong mathematical reasoning capability. Previous neural solvers of math word problems directly translate problem texts into equations, lacking an explicit interpretation of the situations, and often fail to handle more sophisticated situations. To address such limits of neural solvers, we introduce the concept of a \emph{situation model}, which originates from psychology studies to represent the mental states of humans in problem-solving, and propose \emph{SMART}, which adopts attributed grammar as the representation of situation models for algebra story problems. Specifically, we first train an information extraction module to extract nodes, attributes, and relations from problem texts and then generate a parse graph based on a pre-defined attributed grammar. An iterative learning strategy is also proposed to improve the performance of SMART further. To rigorously study this task, we carefully curate a new dataset named \emph{ASP6.6k}. Experimental results on ASP6.6k show that the proposed model outperforms all previous neural solvers by a large margin while preserving much better interpretability. To test these models' generalization capability, we also design an out-of-distribution (OOD) evaluation, in which problems are more complex than those in the training set. Our model exceeds state-of-the-art models by 17\% in the OOD evaluation, demonstrating its superior generalization ability

    Regularized Deep Network Learning For Multi-Label Visual Recognition

    Get PDF
    This dissertation is focused on the task of multi-label visual recognition, a fundamental task of computer vision. It aims to tell the presence of multiple visual classes from the input image, where the visual classes, such as objects, scenes, attributes, etc., are usually defined as image labels. Due to the prosperous deep networks, this task has been widely studied and significantly improved in recent years. However, it remains a challenging task due to appearance complexity of multiple visual contents co-occurring in one image. This research explores to regularize the deep network learning for multi-label visual recognition. First, an attention concentration method is proposed to refine the deep network learning for human attribute recognition, i.e., a challenging instance of multi-label visual recognition. Here the visual attention of deep networks, in terms of attention maps, is an imitation of human attention in visual recognition. Derived by the deep network with only label-level supervision, attention maps interpretively highlight areas indicating the most relevant regions that contribute most to the final network prediction. Based on the observation that human attributes are usually depicted by local image regions, the added attention concentration enhances the deep network learning for human attribute recognition by forcing the recognition on compact attribute-relevant regions. Second, inspired by the consistent relevance between a visual class and an image region, an attention consistency strategy is explored and enforced during deep network learning for human attribute recognition. Specifically, two kinds of attention consistency are studied in this dissertation, including the equivariance under spatial transforms, such as flipping, scaling and rotation, and the invariance between different networks for recognizing the same attribute from the same image. These two kinds of attention consistency are formulated as a unified attention consistency loss and combined with the traditional classification loss for network learning. Experiments on public datasets verify its effectiveness by achieving new state-of-the-art performance for human attribute recognition. Finally, to address the long-tailed category distribution of multi-label visual recognition, the collaborative learning between using uniform and re-balanced samplings is proposed for regularizing the network training. While the uniform sampling leads to relatively low performance on tail classes, re-balanced sampling can improve the performance on tail classes, but may also hurt the performance on head classes in network training due to label co-occurrence. This research proposes a new approach to train on both class-biased samplings in a collaborative way, resulting in performance improvement for both head and tail classes. Based on a two-branch network taking the uniform sampling and re-balanced sampling as the inputs, respectively, a cross-branch loss enforces consistency when the same input goes through the two branches. The experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art methods on long-tailed multi-label visual recognition
    corecore