91 research outputs found

    Multimodal Representations for Teacher-Guided Compositional Visual Reasoning

    Neural Module Networks (NMN) are a compelling method for visual question answering, enabling the translation of a question into a program consisting of a series of reasoning sub-tasks that are sequentially executed on the image to produce an answer. NMNs provide enhanced explainability compared to integrated models, allowing for a better understanding of the underlying reasoning process. To improve the effectiveness of NMNs, we propose to exploit features obtained by a large-scale cross-modal encoder. Also, the current training approach of NMNs relies on the propagation of module outputs to subsequent modules, leading to the accumulation of prediction errors and the generation of false answers. To mitigate this, we introduce an NMN learning strategy involving scheduled teacher guidance. Initially, the model is fully guided by the ground-truth intermediate outputs, but it gradually transitions to autonomous behavior as training progresses. This reduces error accumulation, thus improving training efficiency and final performance. We demonstrate that by incorporating cross-modal features and employing more effective training techniques for NMNs, we achieve a favorable balance between performance and transparency in the reasoning process.
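    The scheduled teacher guidance described above resembles scheduled sampling for sequential models. Below is a minimal, illustrative sketch of that idea in Python; the linear decay schedule, the module interface, and the names used here are assumptions for illustration, not the authors' implementation.

```python
import random

def guidance_probability(epoch, total_epochs):
    """Linearly decay the probability of feeding ground-truth intermediate outputs."""
    return max(0.0, 1.0 - epoch / total_epochs)

def run_program(modules, image_feats, gt_intermediate, epoch, total_epochs):
    """Execute a sequence of reasoning modules with scheduled teacher guidance.

    modules         : list of callables, one per reasoning sub-task of the program
    image_feats     : cross-modal features of the image (e.g. from a pretrained encoder)
    gt_intermediate : ground-truth output expected after each module (training only),
                      or None at inference time
    """
    p_teacher = guidance_probability(epoch, total_epochs)
    state = None
    for step, module in enumerate(modules):
        predicted = module(state, image_feats)
        # Early in training, the next module mostly receives the ground-truth
        # intermediate output; later it mostly receives the model's own prediction.
        if gt_intermediate is not None and random.random() < p_teacher:
            state = gt_intermediate[step]
        else:
            state = predicted
    return state  # representation from which the final answer is predicted
```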

    Curriculum Learning for Compositional Visual Reasoning

    Visual Question Answering (VQA) is a complex task requiring large datasets and expensive training. Neural Module Networks (NMN) first translate the question into a reasoning path, then follow that path to analyze the image and provide an answer. We propose an NMN method that relies on predefined cross-modal embeddings to ``warm start'' learning on the GQA dataset, then focus on Curriculum Learning (CL) as a way to improve training and make better use of the data. Several difficulty criteria are employed for defining CL methods. We show that by an appropriate selection of the CL method, the cost of training and the amount of training data can be greatly reduced, with a limited impact on the final VQA accuracy. Furthermore, we introduce intermediate losses during training and find that this allows us to simplify the CL strategy.
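    As an illustration of the curriculum idea, the hypothetical sketch below orders training examples by a difficulty criterion (for instance, the length of the reasoning program) and enlarges the training pool in stages; the pacing scheme and function names are assumptions, not the method actually used in the paper.

```python
def curriculum_stages(examples, difficulty, n_stages=4):
    """Yield progressively larger and harder training pools.

    examples   : list of training items, e.g. (question, program, answer) tuples
    difficulty : callable mapping an item to a scalar difficulty score,
                 e.g. the number of reasoning steps in its program
    """
    ranked = sorted(examples, key=difficulty)
    for stage in range(1, n_stages + 1):
        # At stage k, train on the easiest k/n_stages fraction of the data.
        cutoff = int(len(ranked) * stage / n_stages)
        yield ranked[:cutoff]

# Possible usage with program length as the difficulty criterion:
# for pool in curriculum_stages(train_set, lambda ex: len(ex[1])):
#     train_for_some_epochs(model, pool)
```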

    Iterative Search with Local Visual Features for Computer Assisted Plant Identification

    To support computer-assisted plant species identification under realistic, uncontrolled picture-taking conditions, we put forward an approach relying on local image features. It combines query by example and relevance feedback to support both the localization of potentially interesting image regions and the classification of these regions as representing or not the target species. We show that this approach is successful and makes prior segmentation unnecessary.
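    A rough sketch of how query by example and relevance feedback can be combined over local image regions is given below, using an RBF-kernel SVM as the region classifier; the feature layout, the choice of classifier, and the `ask_user` callback are illustrative assumptions rather than the system described above.

```python
import numpy as np
from sklearn.svm import SVC

def relevance_feedback_search(region_feats, example_feat, ask_user, n_rounds=3, k=10):
    """Iterative search over local image regions with relevance feedback.

    region_feats : (n_regions, d) array of local features, one row per candidate region
    example_feat : (d,) feature vector of an example region showing the target species
    ask_user     : callable taking the displayed region indices and returning a
                   dict {region index: 1 if it shows the target species, else 0}
    """
    # Round 0: plain query-by-example ranking by distance to the example region.
    dists = np.linalg.norm(region_feats - example_feat, axis=1)
    shown = np.argsort(dists)[:k]
    labels = {}

    for _ in range(n_rounds):
        labels.update(ask_user(shown))
        if len(set(labels.values())) < 2:
            break  # the classifier needs both relevant and irrelevant examples
        idx = list(labels)
        clf = SVC(kernel="rbf", gamma="scale")
        clf.fit(region_feats[idx], [labels[i] for i in idx])
        scores = clf.decision_function(region_feats)
        shown = np.argsort(-scores)[:k]  # regions most likely to show the target
    return shown
```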

    Active SVM-based Relevance Feedback with Hybrid Visual and Conceptual Content Representation

    Most of the available image databases have keyword annotations associated with the images, related to the image context or to the semantic interpretation of image content. Keywords and visual features provide complementary information, so using these sources of information together is an advantage in many applications. We address here the challenge of reducing the semantic gap through an active SVM-based relevance feedback method, together with a hybrid visual and conceptual content representation and retrieval. We first introduce a new feature vector, based on the keyword annotations available for the images, which makes use of conceptual information extracted from an external ontology and represented by ``core concepts''. We then present two improvements of the SVM-based relevance feedback mechanism: a new active learning selection criterion and the use of specific kernel functions that reduce the sensitivity of the SVM to scale. We evaluate the use of the proposed hybrid feature vector, composed of keyword representations and low-level visual features, in our SVM-based relevance feedback setting. Experiments show that the use of the keyword-based feature vectors provides a significant improvement in the quality of the results.
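    The snippet below sketches one plausible way to build such a hybrid feature vector by concatenating a normalized visual descriptor with a bag-of-core-concepts vector derived from the keywords; the weighting scheme and the keyword-to-concept mapping are assumptions made for illustration, not the representation defined in the paper.

```python
import numpy as np

def hybrid_feature(visual_feat, keywords, concept_index, alpha=0.5):
    """Concatenate a visual descriptor with a keyword-derived concept vector.

    visual_feat   : (d_v,) low-level visual features of the image
    keywords      : iterable of annotation keywords attached to the image
    concept_index : dict mapping a keyword to the index of its "core concept"
                    (such a mapping would be derived from an external ontology)
    alpha         : relative weight given to the conceptual part
    """
    concept_vec = np.zeros(max(concept_index.values()) + 1)
    for kw in keywords:
        if kw in concept_index:
            concept_vec[concept_index[kw]] += 1.0
    # L2-normalise each modality before concatenation so that neither dominates.
    v = visual_feat / (np.linalg.norm(visual_feat) + 1e-12)
    c = concept_vec / (np.linalg.norm(concept_vec) + 1e-12)
    return np.concatenate([(1.0 - alpha) * v, alpha * c])
```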

    An exploration of diversified user strategies for image retrieval with relevance feedback

    Given the difficulty of setting up large-scale experiments with real users, the comparison of content-based image retrieval methods using relevance feedback usually relies on the emulation of the user, following a single, well-prescribed strategy. Since the behavior of real users cannot be expected to comply with strict specifications, it is very important to evaluate the sensitivity of the retrieval results to likely variations in user behavior. It is also important to find out whether some strategies help the system perform consistently better, so as to promote their use. Two selection algorithms for relevance feedback based on support vector machines are compared here. In these experiments, the user is emulated according to eight significantly different strategies on four ground-truth databases of different complexity. We first find that the ranking of the two algorithms does not depend much on the selected strategy. Also, the ranking of the strategies appears to be relatively independent of the complexity of the ground-truth databases, which makes it possible to identify desirable characteristics in user behavior.
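    To make the notion of an emulated user concrete, the sketch below shows a few hypothetical labeling strategies applied to the images displayed at a feedback round; these strategies are illustrative placeholders and do not correspond to the eight strategies evaluated in the paper.

```python
import numpy as np

def emulate_user(shown, ground_truth, strategy="label_all", rng=None):
    """Return the feedback an emulated user gives on the displayed images.

    shown        : indices of the images presented at this feedback round
    ground_truth : boolean array, True where an image belongs to the target class
    strategy     : name of the emulated behaviour (illustrative examples only)
    """
    rng = rng or np.random.default_rng(0)
    labels = {}
    for i in shown:
        relevant = bool(ground_truth[i])
        if strategy == "label_all":          # diligent user labels every image shown
            labels[i] = relevant
        elif strategy == "positives_only":   # only marks the relevant images
            if relevant:
                labels[i] = True
        elif strategy == "lazy":             # labels each image with probability 0.5
            if rng.random() < 0.5:
                labels[i] = relevant
    return labels
```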

    Reducing the Redundancy in the Selection of Samples for SVM-based Relevance Feedback

    In image retrieval with relevance feedback, the strategy employed by the system for selecting the images presented to the user at every feedback round has a strong effect on the transfer of information between the user and the system. Using SVMs, we put forward a new active learning selection strategy that minimizes the redundancy between the images presented to the user and takes into account assumptions that are specific to the retrieval setting. Experiments on several image databases confirm the attractiveness of this selection strategy. We also find that insensitivity to the scale of the data is a desirable property for the SVMs employed as learners in relevance feedback, and we show how to obtain such insensitivity through the use of specific kernel functions.
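    One classical way to reduce redundancy in active selection is to combine an uncertainty term (proximity to the SVM decision boundary) with a diversity penalty, as in angle-diversity selection. The sketch below follows that generic recipe; it is not the specific criterion proposed above, and the trade-off parameter and scoring details are assumptions.

```python
import numpy as np

def select_batch(feats, decision_values, batch_size=8, lam=0.5):
    """Select a diverse set of uncertain images for the next feedback round.

    feats           : (n, d) feature vectors of the unlabeled images
    decision_values : (n,) SVM decision-function values (signed distance to the boundary)
    lam             : trade-off between uncertainty and diversity
    """
    # Normalise the features so that cosine similarity measures redundancy.
    X = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)
    uncertainty = -np.abs(decision_values)        # larger = closer to the boundary
    selected = []
    for _ in range(batch_size):
        score = uncertainty.copy()
        if selected:
            # Penalise candidates that are too similar to images already picked.
            redundancy = np.max(X @ X[selected].T, axis=1)
            score = lam * uncertainty - (1 - lam) * redundancy
        score[selected] = -np.inf                 # never pick the same image twice
        selected.append(int(np.argmax(score)))
    return selected
```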

    Speeding up active relevance feedback with approximate kNN retrieval for hyperplane queries

    In content-based image retrieval, relevance feedback (RF) is a prominent method for reducing the semantic gap between the low-level features describing the content and the usually higher-level meaning of the user's target. Recent RF methods are able to identify complex target classes after relatively few feedback iterations. However, because the computational complexity of such methods is linear in the size of the database, retrieval can be quite slow on very large databases. To address this scalability issue for active learning-based RF, we put forward a method that consists in constructing an index in the feature space associated with a kernel function and in performing approximate kNN hyperplane queries with this feature-space index. The experimental evaluation performed on two image databases shows that a significant speedup can be achieved at the expense of a limited increase in the number of feedback rounds.
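    The sketch below illustrates the general idea under two simplifying assumptions: the kernel feature map is approximated explicitly with a Nystroem embedding, and scikit-learn's NearestNeighbors stands in for a true scalable index. Points near the hyperplane are then approximated by querying the index around the projection of an anchor point (for example, the centroid of the current positive examples) onto the hyperplane; the paper's actual index structure and query algorithm may differ.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.neighbors import NearestNeighbors

def build_feature_space_index(X, gamma=0.5, n_components=256):
    """Offline step: index the database in an explicit approximation of the kernel space."""
    feat_map = Nystroem(kernel="rbf", gamma=gamma, n_components=n_components).fit(X)
    Z = feat_map.transform(X)               # approximate phi(x) for every image
    index = NearestNeighbors().fit(Z)       # stand-in for a scalable, disk-based index
    return feat_map, index

def candidates_near_hyperplane(svc, feat_map, index, anchor, k=50):
    """Online step: approximate a kNN hyperplane query around a projected anchor point.

    svc    : trained sklearn.svm.SVC using the same RBF kernel/gamma as the feature map
    anchor : (d,) point in the original feature space, e.g. the centroid of the
             images currently marked as relevant
    """
    z = feat_map.transform(anchor.reshape(1, -1))[0]
    # Hyperplane in the approximate feature space: w . phi(x) + b = 0
    w = feat_map.transform(svc.support_vectors_).T @ svc.dual_coef_[0]
    b = svc.intercept_[0]
    # Orthogonal projection of the anchor onto the hyperplane, then a point query.
    q = z - ((w @ z + b) / (np.linalg.norm(w) ** 2)) * w
    _, idx = index.kneighbors(q.reshape(1, -1), n_neighbors=k)
    return idx[0]                            # database images close to the boundary
```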

    In Silico Prediction of Estrogen Receptor Subtype Binding Affinity and Selectivity Using Statistical Methods and Molecular Docking with 2-Arylnaphthalenes and 2-Arylquinolines

    Over the years, the development of selective estrogen receptor (ER) ligands has been of great concern to researchers involved in the chemistry and pharmacology of anticancer drugs, resulting in numerous synthesized selective ER subtype inhibitors. In this work, a data set of 82 ER ligands with ERα and ERβ inhibitory activities was built, and quantitative structure-activity relationship (QSAR) methods based on two linear statistical methods (multiple linear regression, MLR, and partial least squares regression, PLSR) and a nonlinear one (Bayesian regularized neural network, BRNN) were applied to investigate the relationship between molecular structural features and the activity and selectivity of these ligands. For both ERα and ERβ, the performances of the MLR and PLSR models are superior to that of the BRNN model, giving more reasonable statistical properties (ERα: for MLR, Rtr² = 0.72, Qte² = 0.63; for PLSR, Rtr² = 0.92, Qte² = 0.84. ERβ: for MLR, Rtr² = 0.75, Qte² = 0.75; for PLSR, Rtr² = 0.98, Qte² = 0.80). The MLR method is also more powerful than the other two methods for generating the subtype selectivity models, resulting in Rtr² = 0.74 and Qte² = 0.80. In addition, the molecular docking method was also used to explore the possible binding modes of the ligands, and the relationship between the 3D binding modes and the 2D molecular structural features of the ligands was further explored. The results show that the binding affinity strength for both ERα and ERβ is most correlated with the atom fragment type, polarity, electronegativities and hydrophobicity. The substituent at position 8 of the naphthalene or quinoline plane and the space orientation of these two planes contribute the most to the subtype selectivity, on the basis of similar hydrogen bond interactions between binding ligands and both ER subtypes. The QSAR models built, together with the docking procedure, should be of great advantage for screening and designing ER ligands with improved affinity and subtype selectivity.
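    For readers unfamiliar with this QSAR workflow, the sketch below fits MLR and PLSR models with scikit-learn and reports R² on the training set and Q² on a held-out set; the descriptor matrix, the train/test split, and the number of PLS components are placeholders, not the settings used in the study.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def fit_qsar_models(descriptors, activities, n_pls_components=5, seed=0):
    """Fit MLR and PLSR QSAR models and report R² (training set) and Q² (test set).

    descriptors : (n_ligands, n_features) matrix of molecular descriptors
    activities  : (n_ligands,) measured inhibitory activities (e.g. pIC50 values)
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        descriptors, activities, test_size=0.25, random_state=seed)
    results = {}
    for name, model in [("MLR", LinearRegression()),
                        ("PLSR", PLSRegression(n_components=n_pls_components))]:
        model.fit(X_tr, y_tr)
        results[name] = {
            "R2_train": r2_score(y_tr, np.ravel(model.predict(X_tr))),
            "Q2_test":  r2_score(y_te, np.ravel(model.predict(X_te))),
        }
    return results
```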

    IRIM at TRECVID 2012: Semantic Indexing and Instance Search

    The IRIM group is a consortium of French teams working on Multimedia Indexing and Retrieval. This paper describes its participation in the TRECVID 2012 semantic indexing and instance search tasks. For the semantic indexing task, our approach uses a six-stage processing pipeline to compute scores for the likelihood of a video shot containing a target concept. These scores are then used to produce a ranked list of images or shots that are the most likely to contain the target concept. The pipeline is composed of the following steps: descriptor extraction, descriptor optimization, classification, fusion of descriptor variants, higher-level fusion, and re-ranking. We evaluated a number of different descriptors and tried different fusion strategies. The best IRIM run has a Mean Inferred Average Precision of 0.2378, which ranked us 4th out of 16 participants. For the instance search task, our approach uses two steps. First, the individual methods of the participants are used to compute the similarity between an example image of an instance and the keyframes of a video clip. Then a two-step fusion method is used to combine these individual results and obtain a score for the likelihood of an instance appearing in a video clip. These scores are used to obtain a ranked list of the clips most likely to contain the queried instance. The best IRIM run has a MAP of 0.1192, which ranked us 29th out of 79 fully automatic runs.
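    The two-step fusion used for instance search can be illustrated by the following hypothetical sketch, which first averages min-max normalized scores within groups of related methods and then averages across groups; the grouping and normalization choices are assumptions, not the exact fusion scheme of the IRIM runs.

```python
import numpy as np

def two_step_fusion(score_lists, group_of):
    """Two-step late fusion of per-method similarity scores for a query instance.

    score_lists : list of (n_clips,) arrays, one per individual method
    group_of    : list giving, for each method, the group it belongs to
                  (e.g. methods based on the same descriptor family)
    Step 1 averages min-max normalised scores within each group; step 2 averages
    the group scores, yielding one fused score per video clip.
    """
    def minmax(s):
        s = np.asarray(s, dtype=float)
        span = s.max() - s.min()
        return (s - s.min()) / span if span > 0 else np.zeros_like(s)

    groups = {}
    for scores, g in zip(score_lists, group_of):
        groups.setdefault(g, []).append(minmax(scores))
    group_means = [np.mean(member_scores, axis=0) for member_scores in groups.values()]
    fused = np.mean(group_means, axis=0)
    ranking = np.argsort(-fused)             # clips most likely to contain the instance
    return ranking, fused
```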