8 research outputs found
Context-Dependent Diffusion Network for Visual Relationship Detection
Visual relationship detection can bridge the gap between computer vision and
natural language for scene understanding of images. Different from pure object
recognition tasks, the relation triplets of subject-predicate-object lie on an
extreme diversity space, such as \textit{person-behind-person} and
\textit{car-behind-building}, while suffering from the problem of combinatorial
explosion. In this paper, we propose a context-dependent diffusion network
(CDDN) framework to deal with visual relationship detection. To capture the
interactions of different object instances, two types of graphs, word semantic
graph and visual scene graph, are constructed to encode global context
interdependency. The semantic graph is built through language priors to model
semantic correlations across objects, whilst the visual scene graph defines the
connections of scene objects so as to utilize the surrounding scene
information. For the graph-structured data, we design a diffusion network to
adaptively aggregate information from contexts, which can effectively learn
latent representations of visual relationships and well cater to visual
relationship detection in view of its isomorphic invariance to graphs.
Experiments on two widely-used datasets demonstrate that our proposed method is
more effective and achieves the state-of-the-art performance.Comment: 8 pages, 3 figures, 2018 ACM Multimedia Conference (MM'18
Spatio-Temporal Graph Convolution for Skeleton Based Action Recognition
Variations of human body skeletons may be considered as dynamic graphs, which
are generic data representation for numerous real-world applications. In this
paper, we propose a spatio-temporal graph convolution (STGC) approach for
assembling the successes of local convolutional filtering and sequence learning
ability of autoregressive moving average. To encode dynamic graphs, the
constructed multi-scale local graph convolution filters, consisting of matrices
of local receptive fields and signal mappings, are recursively performed on
structured graph data of temporal and spatial domain. The proposed model is
generic and principled as it can be generalized into other dynamic models. We
theoretically prove the stability of STGC and provide an upper-bound of the
signal transformation to be learnt. Further, the proposed recursive model can
be stacked into a multi-layer architecture. To evaluate our model, we conduct
extensive experiments on four benchmark skeleton-based action datasets,
including the large-scale challenging NTU RGB+D. The experimental results
demonstrate the effectiveness of our proposed model and the improvement over
the state-of-the-art.Comment: Accepted by AAAI 201
Optimization of a Deep-Learning Method Based on the Classification of Images Generated by Parameterized Deep Snap a Novel Molecular-Image-Input Technique for Quantitative Structure–Activity Relationship (QSAR) Analysis
Numerous chemical compounds are distributed around the world and may affect the homeostasis of the endocrine system by disrupting the normal functions of hormone receptors. Although the risks associated with these compounds have been evaluated by acute toxicity testing in mammalian models, the chronic toxicity of many chemicals remains due to high cost of the compounds and the testing, etc. However, computational approaches may be promising alternatives and reduce these evaluations. Recently, deep learning (DL) has been shown to be promising prediction models with high accuracy for recognition of images, speech, signals, and videos since it greatly benefits from large datasets. Recently, a novel DL-based technique called DeepSnap was developed to conduct QSAR analysis using three-dimensional images of chemical structures. It can be used to predict the potential toxicity of many different chemicals to various receptors without extraction of descriptors. DeepSnap has been shown to have a very high capacity in tests using Tox21 quantitative qHTP datasets. Numerous parameters must be adjusted to use the DeepSnap method but they have not been optimized. In this study, the effects of these parameters on the performance of the DL prediction model were evaluated in terms of the loss in validation as an indicator for evaluating the performance of the DL using the toxicity information in the Tox21 qHTP database. The relations of the parameters of DeepSnap such as (1) number of molecules per SDF split into (2) zoom factor percentage, (3) atom size for van der waals percentage, (4) bond radius, (5) minimum bond distance, and (6) bond tolerance, with the validation loss following quadratic function curves, which suggests that optimal thresholds exist to attain the best performance with these prediction models. Using the parameter values set with the best performance, the prediction model of chemical compounds for CAR agonist was built using 64 images, at 105° angle, with AUC of 0.791. Thus, based on these parameters, the proposed DeepSnap-DL approach will be highly reliable and beneficial to establish models to assess the risk associated with various chemicals