52 research outputs found

    Multimodal Representations for Teacher-Guided Compositional Visual Reasoning

    Neural Module Networks (NMN) are a compelling method for visual question answering, enabling the translation of a question into a program consisting of a series of reasoning sub-tasks that are sequentially executed on the image to produce an answer. NMNs provide enhanced explainability compared to integrated models, allowing for a better understanding of the underlying reasoning process. To improve the effectiveness of NMNs, we propose to exploit features obtained by a large-scale cross-modal encoder. Also, the current training approach of NMNs relies on the propagation of module outputs to subsequent modules, leading to the accumulation of prediction errors and the generation of false answers. To mitigate this, we introduce an NMN learning strategy involving scheduled teacher guidance. Initially, the model is fully guided by the ground-truth intermediate outputs, but it gradually transitions to autonomous behavior as training progresses. This reduces error accumulation, thus improving training efficiency and final performance. We demonstrate that by incorporating cross-modal features and employing more effective training techniques for NMN, we achieve a favorable balance between performance and transparency in the reasoning process.
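The scheduled teacher guidance described in this abstract can be illustrated with a minimal sketch. The abstract does not give the decay schedule or the module interface, so the linear schedule and the names `guidance_probability` and `run_program` below are assumptions made for illustration only, not the authors' implementation.

```python
import random


def guidance_probability(step: int, total_steps: int) -> float:
    """Hypothetical linear schedule: 1.0 (fully guided) down to 0.0 (autonomous)."""
    if total_steps <= 1:
        return 0.0
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return 1.0 - frac


def run_program(modules, ground_truth, first_input, p_teacher, rng):
    """Run modules in sequence and return each module's own output.

    After every module, the input passed to the next one is the ground-truth
    intermediate output with probability p_teacher, and the module's own
    (possibly wrong) prediction otherwise.
    """
    current = first_input
    outputs = []
    for i, module in enumerate(modules):
        out = module(current)
        outputs.append(out)
        use_teacher = rng.random() < p_teacher
        current = ground_truth[i] if use_teacher else out
    return outputs


# Toy chain: every module should add 1 but is faulty and adds 2.
faulty_modules = [lambda x: x + 2] * 3
targets = [1, 2, 3]
guided = run_program(faulty_modules, targets, 0, 1.0, random.Random(0))
autonomous = run_program(faulty_modules, targets, 0, 0.0, random.Random(0))
```

In this toy chain the fully guided run yields `[2, 3, 4]` (each module is off by one), while the autonomous run yields `[2, 4, 6]` (the error grows at every step), which is the accumulation effect the abstract refers to.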

    Curriculum Learning for Compositional Visual Reasoning

    Visual Question Answering (VQA) is a complex task requiring large datasets and expensive training. Neural Module Networks (NMN) first translate the question to a reasoning path, then follow that path to analyze the image and provide an answer. We propose an NMN method that relies on predefined cross-modal embeddings to "warm start" learning on the GQA dataset, then focus on Curriculum Learning (CL) as a way to improve training and make better use of the data. Several difficulty criteria are employed for defining CL methods. We show that, with an appropriate choice of CL method, the cost of training and the amount of training data can be greatly reduced, with a limited impact on the final VQA accuracy. Furthermore, we introduce intermediate losses during training and find that this allows the CL strategy to be simplified.
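As a rough illustration of curriculum learning with a difficulty criterion, the sketch below ranks examples from easy to hard and exposes a growing share of them as training progresses. The abstract does not name its difficulty criteria or pacing, so the function `curriculum_subset`, the linear pacing, the `start_fraction` parameter, and the use of program length as difficulty are all assumptions.

```python
import math


def curriculum_subset(examples, difficulty, epoch, total_epochs, start_fraction=0.25):
    """Return the easiest examples available at a given epoch.

    Examples are ranked by the supplied difficulty function. The available
    share grows linearly from start_fraction at epoch 0 to the full set at
    the last epoch (hypothetical pacing function).
    """
    ranked = sorted(examples, key=difficulty)
    if total_epochs <= 1:
        frac = 1.0
    else:
        frac = start_fraction + (1.0 - start_fraction) * epoch / (total_epochs - 1)
    frac = min(1.0, frac)
    n = max(1, math.ceil(frac * len(ranked)))
    return ranked[:n]


# Toy data: each "program" is a list of reasoning steps; longer means harder (assumed).
programs = [["select"] * n for n in range(8, 0, -1)]
first_epoch = curriculum_subset(programs, len, epoch=0, total_epochs=4)
last_epoch = curriculum_subset(programs, len, epoch=3, total_epochs=4)
```

Here the first epoch sees only the two shortest programs and the last epoch sees all eight. Any other scalar difficulty score could be passed in place of `len`.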

    A Statistical Framework for Image Category Search from a Mental Picture

    Keywords: Image Retrieval; Relevance Feedback; Page Zero Problem; Mental Matching; Bayesian System; Statistical Learning
    Starting from a member of an image database designated the “query image,” traditional image retrieval techniques, for example search by visual similarity, allow one to locate additional instances of a target category residing in the database. However, in many cases, the query image or, more generally, the target category, resides only in the mind of the user as a set of subjective visual patterns, psychological impressions or “mental pictures.” Consequently, since image databases available today are often unstructured and lack reliable semantic annotations, it is often not obvious how to initiate a search session; this is the “page zero problem.” We propose a new statistical framework based on relevance feedback to locate an instance of a semantic category in an unstructured image database with no semantic annotations. A search session is initiated from a random sample of images. At each retrieval round the user is asked to select one image from among a set of displayed images – the one that is closest in his opinion to the target class. The matching is then “mental.” Performance is measured by the number of iterations necessary to display an image which satisfies the user, at which point standard techniques can be employed to display other instances. Our core contribution is a Bayesian formulation which scales to large databases. The two key components are a response model which accounts for the user's subjective perception of similarity and a display algorithm which seeks to maximize the flow of information. Experiments with real users and two databases of 20,000 and 60,000 images demonstrate the efficiency of the search process.
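One round of Bayesian relevance feedback of the kind outlined in this abstract can be sketched as a posterior update over candidate targets. The abstract does not specify the response model, so the softmax-over-distances likelihood, the `temperature` parameter, and the function name `update_posterior` are illustrative assumptions. The display algorithm is not modeled here.

```python
import math


def update_posterior(prior, display, chosen, distance, temperature=1.0):
    """Bayes update of P(target = k) after the user picks one displayed image.

    Assumed response model: given target k, the user picks displayed image d
    with probability proportional to exp(-distance(d, k) / temperature).
    `display` is a list of image indices and `chosen` must be one of them.
    """
    posterior = []
    for k, p_k in enumerate(prior):
        weights = [math.exp(-distance(d, k) / temperature) for d in display]
        likelihood = math.exp(-distance(chosen, k) / temperature) / sum(weights)
        posterior.append(p_k * likelihood)
    total = sum(posterior)
    return [p / total for p in posterior]


# Toy database: four images described by a single feature value.
features = [0.0, 1.0, 5.0, 6.0]


def feature_distance(i, j):
    return abs(features[i] - features[j])


# Images 0 and 3 are displayed and the user picks image 3.
posterior = update_posterior([0.25] * 4, display=[0, 3], chosen=3, distance=feature_distance)
```

After the update, probability mass shifts toward the images whose features lie near the chosen one, so images 2 and 3 become far more likely targets than images 0 and 1.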

    Active SVM-based Relevance Feedback with Hybrid Visual and representation

    Most of the available image databases have keyword annotations associated with the images, related to the image context or to the semantic interpretation of image content. Keywords and visual features provide complementary information, so using these sources of information together is an advantage in many applications. We address here the challenge of semantic gap reduction through an active SVM-based relevance feedback method combined with a hybrid visual and conceptual content representation. We first introduce a new feature vector, based on the keyword annotations available for the images, which makes use of conceptual information extracted from an external ontology and represented by "core concepts". We then present two improvements of the SVM-based relevance feedback mechanism: a new active learning selection criterion and the use of specific kernel functions that reduce the sensitivity of the SVM to scale. We evaluate the use of the proposed hybrid feature vector, composed of keyword representations and low-level visual features, in our SVM-based relevance feedback setting. Experiments show that the use of the keyword-based feature vectors provides a significant improvement in the quality of the results.

    Reducing the Redundancy in the Selection of Samples for SVM-based Relevance Feedback

    In image retrieval with relevance feedback, the strategy employed by the system for selecting the images presented to the user at every feedback round has a strong effect on the transfer of information between the user and the system. Using SVMs, we put forward a new active learning selection strategy that minimizes redundancy between the images presented to the user and takes into account assumptions that are specific to the retrieval setting. Experiments on several image databases confirm the attractiveness of this selection strategy. We also find that insensitivity to the scale of the data is a desirable property for the SVMs employed as learners in relevance feedback, and we show how to obtain such insensitivity through the use of specific kernel functions.
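A redundancy-aware selection step for active relevance feedback might look like the greedy sketch below. It is not taken from the paper: the two-stage procedure, the function names `select_batch` and `rbf_similarity`, and the `pool_size` parameter are assumptions, and `decision_value` stands in for whatever signed score the trained SVM provides.

```python
import math


def rbf_similarity(a, b):
    """Gaussian similarity between two feature tuples (illustrative choice)."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)))


def select_batch(candidates, decision_value, similarity, batch_size, pool_size):
    """Greedy selection of ambiguous yet mutually dissimilar candidates.

    Step 1: keep the pool_size candidates closest to the decision boundary
            (smallest absolute decision value).
    Step 2: repeatedly add the pooled candidate whose highest similarity to
            the already selected ones is lowest, which limits redundancy.
    """
    pool = sorted(candidates, key=lambda x: abs(decision_value(x)))[:pool_size]
    selected = []
    while pool and len(selected) < batch_size:
        if not selected:
            best = pool[0]  # most ambiguous candidate
        else:
            best = min(pool, key=lambda x: max(similarity(x, s) for s in selected))
        selected.append(best)
        pool.remove(best)
    return selected


# Toy setting: the decision boundary is the line where the first coordinate is 0.
images = [(3.0, 0.0), (0.02, 5.0), (0.01, 0.0), (0.0, 0.0)]
batch = select_batch(images, lambda x: x[0], rbf_similarity, batch_size=2, pool_size=3)
```

In the toy setting the near-duplicate `(0.01, 0.0)` is skipped in favor of `(0.02, 5.0)`, which is almost as ambiguous but far from the first pick.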

    An exploration of diversified user strategies for image retrieval with relevance feedback

    Given the difficulty of setting up large-scale experiments with real users, the comparison of content-based image retrieval methods using relevance feedback usually relies on the emulation of the user, following a single, well-prescribed strategy. Since the behavior of real users cannot be expected to comply with strict specifications, it is very important to evaluate the sensitivity of the retrieval results to likely variations in users' behavior. It is also important to find out whether some strategies help the system to perform consistently better, so as to promote their use. Two selection algorithms for relevance feedback based on support vector machines are compared here. In these experiments, the user is emulated according to eight significantly different strategies on four ground truth databases of different complexity. It is first found that the ranking of the two algorithms does not depend much on the selected strategy. Also, the ranking of the strategies appears to be relatively independent of the complexity of the ground truth databases, which makes it possible to identify desirable characteristics in the behavior of the user.

    On the beneficial effect of noise in vertex localization

    A theoretical and experimental analysis of the effect of noise on the task of vertex identification in unknown shapes is presented. Shapes are seen as real functions of their closed boundary. An alternative global perspective of curvature is examined, providing insight into the process of noise-enabled vertex localization. The analysis reveals that noise facilitates the localization of certain vertices. The concept of noising is thus considered, and a relevant global method for localizing Global Vertices is investigated in relation to local methods in the presence of increasing noise. Theoretical analysis reveals that induced noise can indeed help localize certain vertices if combined with global descriptors. Experiments with noise and a comparison to localized methods validate the theoretical results.

    Acknowledgments

    Image retrieval with active relevance feedback using both visual and keyword-based descriptor