Hierarchical Photo-Scene Encoder for Album Storytelling
In this paper, we propose a novel model with a hierarchical photo-scene
encoder and a reconstructor for the task of album storytelling. The photo-scene
encoder contains two sub-encoders, namely the photo and scene encoders, which
are stacked together and behave hierarchically to fully exploit the structural
information of the photos within an album. Specifically, the photo encoder
generates a semantic representation for each photo while exploiting temporal
relationships among them. The scene encoder, relying on the obtained photo
representations, is responsible for detecting the scene changes and generating
scene representations. Subsequently, the decoder dynamically and attentively
summarizes the encoded photo and scene representations to generate a sequence
of album representations, based on which a story consisting of multiple
coherent sentences is generated. In order to fully extract the useful semantic
information from an album, a reconstructor is employed to reproduce the
summarized album representations based on the hidden states of the decoder. The
proposed model can be trained in an end-to-end manner, which yields improved
performance over state-of-the-art methods on the public visual storytelling
(VIST) dataset. Ablation studies further demonstrate the effectiveness of the
proposed hierarchical photo-scene encoder and reconstructor.
Comment: 8 pages, 4 figures
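As a rough illustration of what "attentively summarizes" can mean in the abstract above, the sketch below applies plain dot-product attention to a set of representation vectors. It is not the paper's decoder: the scoring function, the names `softmax` and `attend`, and the use of a single query vector are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, representations):
    """Dot-product attention (an assumed scoring function).

    query: list of floats, e.g. a decoder hidden state.
    representations: list of equal-length vectors, e.g. photo or scene
    representations.
    Returns the attention weights and the weighted-sum summary vector.
    """
    # Score each representation by its dot product with the query.
    scores = [sum(q * r for q, r in zip(query, rep)) for rep in representations]
    weights = softmax(scores)
    dim = len(representations[0])
    # The summary is the weighted average of the representations.
    summary = [
        sum(w * rep[i] for w, rep in zip(weights, representations))
        for i in range(dim)
    ]
    return weights, summary
```

In a learned model the query would change at every decoding step, so each generated sentence could draw on a different mix of photos and scenes.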
Graph Neural Network-Aided Exploratory Learning for Community Detection with Unknown Topology
In social networks, the discovery of community structures has received
considerable attention as a fundamental problem in various network analysis
tasks. However, due to privacy concerns or access restrictions, the network
structure is often unknown, thereby rendering established community detection
approaches ineffective without costly network topology acquisition. To tackle
this challenge, we present META-CODE, a novel end-to-end solution for detecting
overlapping communities in networks with unknown topology via exploratory
learning aided by easy-to-collect node metadata. Specifically, META-CODE
consists of three iterative steps in addition to the initial network inference
step: 1) node-level community-affiliation embeddings based on graph neural
networks (GNNs) trained by our new reconstruction loss, 2) network exploration
via community affiliation-based node queries, and 3) network inference using an
edge connectivity-based Siamese neural network model from the explored network.
Through comprehensive evaluations using five real-world datasets, we
demonstrate (a) the superiority of META-CODE over benchmark community
detection methods, (b) the effectiveness of our node query, supported by both
empirical evaluations and theoretical findings, (c) the influence of each
module, and (d) the computational efficiency of META-CODE.
Comment: 15 pages, 8 figures, 5 tables; its conference version was presented
at the ACM International Conference on Information and Knowledge Management
(CIKM 2022)
Adaptive Framework for Robust Visual Tracking
Visual tracking is a challenging problem for numerous reasons, such as small object size, pose angle variations, occlusion, and camera motion. Object tracking has many real-world applications, such as surveillance systems, tracking moving organs in medical imaging, and robotics. Traditional tracking methods lack a recovery mechanism for situations in which the tracked object drifts away from the ground truth. In this paper, we propose a novel framework for tracking moving objects based on a composite framework and a reporter mechanism. The composite framework tracks moving objects using different trackers and produces pairs of forward/backward tracklets. A robustness score is then calculated for each tracker from its forward/backward tracklet pair to find the most reliable moving object trajectory. The reporter serves as the recovery mechanism: it corrects the moving object trajectory when the robustness score is very low, mainly using a combination of particle filter and template matching. The proposed framework can handle partial and heavy occlusions; moreover, its structure enables integration of other user-specific trackers. Extensive experiments on recent benchmarks show that the proposed framework outperforms other current state-of-the-art trackers due to its powerful trajectory analysis and recovery mechanism; the framework improved the area under the curve from 68% to 70.8% on the OTB-100 benchmark.
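The abstract does not give the formula for the robustness score, so the following is only a minimal sketch of one common way to compare a forward tracklet with its backward counterpart: the mean forward-backward distance between object centers, mapped to a score in (0, 1]. The exponential mapping, the `scale` parameter, and all function names are assumptions, not the authors' definitions.

```python
import math


def forward_backward_error(forward, backward):
    """Mean Euclidean distance between a forward tracklet and the
    time-reversed backward tracklet.

    Both arguments are lists of (x, y) object centers; `backward` is
    ordered from the last frame to the first.
    """
    if not forward or len(forward) != len(backward):
        raise ValueError("tracklets must be non-empty and of equal length")
    reversed_backward = backward[::-1]  # align backward pass with frame order
    total = sum(math.dist(p, q) for p, q in zip(forward, reversed_backward))
    return total / len(forward)


def robustness_score(forward, backward, scale=10.0):
    """Map the error to (0, 1]; 1.0 means both passes agree exactly.

    The exponential mapping and `scale` (in pixels) are assumptions.
    """
    return math.exp(-forward_backward_error(forward, backward) / scale)


def most_reliable(tracklets):
    """tracklets: dict mapping tracker name -> (forward, backward).

    Returns the name and score of the tracker whose two passes agree best.
    """
    scores = {
        name: robustness_score(fwd, bwd) for name, (fwd, bwd) in tracklets.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Under this sketch, a recovery step such as the reporter described above would be triggered when even the best score falls below some chosen threshold; that threshold is likewise not specified in the abstract.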