43 research outputs found
Computing Lens for Exploring the Historical People's Social Network
A typical social research topic is to identify influential people's
relationships and their weights. Solving such problems by studying massive
bodies of literature is very tedious for social scientists. Digital humanities
bring a new approach to such social questions. In this paper, we propose a
framework that helps social scientists discover ancient figures' power and
their camps. The core of our framework consists of a signed graph model and a
novel group partition algorithm. We validate and verify our solution on the
China Biographical Database Project (CBDB) dataset. The analytic results of a
case study demonstrate the effectiveness of our framework, which yields
information consistent with the facts in the literature and with social
scientists' viewpoints.

Comment: accepted at SoNet 201
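The camp-finding idea above can be illustrated with a toy sketch. The paper's actual group partition algorithm is not specified here; this is a generic local-search heuristic on a hypothetical signed graph, where +1 edges mean allies and -1 edges mean rivals, and an edge is "satisfied" when a positive tie stays inside one camp and a negative tie crosses camps.

```python
# Toy two-camp partition of a signed social graph (illustrative only;
# not the paper's algorithm). Nodes are figures; edge sign encodes
# alliance (+1) or rivalry (-1).

def satisfied(edges, side):
    # Count edges whose sign agrees with the partition: positive edges
    # within a camp, negative edges across camps.
    return sum((side[u] == side[v]) == (sign > 0) for u, v, sign in edges)

def partition_two_camps(nodes, edges, sweeps=10):
    side = {n: 0 for n in nodes}          # start with everyone in camp 0
    best = satisfied(edges, side)
    for _ in range(sweeps):
        improved = False
        for n in nodes:
            side[n] ^= 1                  # tentatively move n to the other camp
            score = satisfied(edges, side)
            if score > best:
                best, improved = score, True
            else:
                side[n] ^= 1              # revert the move
        if not improved:
            break
    return side, best
```

On a tiny graph where A and B are allied and both oppose C, the heuristic places A and B in one camp and C in the other.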
Unsupervised Domain Adaptive Detection with Network Stability Analysis
Domain adaptive detection aims to improve the generality of a detector,
learned from the labeled source domain, on the unlabeled target domain. In
this work, drawing inspiration from the concept of stability in control
theory, whereby a robust system must remain consistent both externally and
internally regardless of disturbances, we propose a novel framework that
achieves unsupervised domain adaptive detection through stability analysis.
Specifically, we treat discrepancies between images and regions from different
domains as disturbances, and introduce a simple but effective Network
Stability Analysis (NSA) framework that considers various disturbances for
domain adaptation. In particular, we explore three types of perturbation:
heavy and light image-level disturbances and instance-level disturbance. For
each type, NSA performs external consistency analysis on the outputs from raw
and perturbed images and/or internal consistency analysis on their features,
using teacher-student models. By integrating NSA into Faster R-CNN, we
immediately achieve state-of-the-art results. In particular, we set a new
record of 52.7% mAP on Cityscapes-to-FoggyCityscapes, showing the potential of
NSA for domain adaptive detection. Notably, NSA is designed to be
general-purpose, and is thus also applicable to one-stage detection models
(e.g., FCOS) besides the adopted Faster R-CNN, as shown by experiments.
https://github.com/tiankongzhang/NSA
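The teacher-student consistency at the heart of such frameworks can be sketched numerically. This is an illustrative toy, not the NSA implementation: the real framework operates on detector outputs and features, whereas here scalar lists stand in for parameters and predictions. The student is pulled toward the teacher's prediction on a perturbed input, and the teacher tracks the student via an exponential moving average (EMA).

```python
# Minimal numeric sketch of teacher-student consistency training
# (illustrative only; not the actual NSA code).

def mse(a, b):
    # Consistency loss: mean squared discrepancy between two prediction
    # lists (e.g., teacher vs. student outputs on a perturbed image).
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def ema_update(teacher, student, momentum=0.99):
    # teacher <- m * teacher + (1 - m) * student, parameter-wise; the
    # teacher thus changes slowly and provides stable targets.
    return [momentum * t + (1 - momentum) * s
            for t, s in zip(teacher, student)]
```

Identical predictions incur zero consistency loss, and each EMA step nudges the teacher a small fraction of the way toward the student.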
Improved Machine Learning-Based Predictive Models for Breast Cancer Diagnosis
Breast cancer death rates are higher than those of any other cancer among American women. Machine learning-based predictive models promise earlier detection techniques for breast cancer diagnosis. However, evaluating which models diagnose cancer efficiently remains challenging. In this work, we propose data exploratory techniques (DET) and develop four different predictive models to improve breast cancer diagnostic accuracy. Before building the models, four essential layers of DET, namely feature distribution analysis, correlation analysis, feature elimination, and hyperparameter optimization, were examined in depth to identify robust features for classification into malignant and benign classes. The proposed techniques and classifiers were implemented on the Wisconsin Diagnostic Breast Cancer (WDBC) and Breast Cancer Coimbra Dataset (BCCD) datasets. Standard performance metrics, including confusion matrices and K-fold cross-validation, were applied to assess each classifier's efficiency and training time. The models' diagnostic capability improved with our DET: polynomial SVM reached 99.3% accuracy, LR 98.06%, KNN 97.35%, and EC 97.61% on the WDBC dataset. We also compared our results with previous studies in terms of accuracy. The implementation procedure and findings can guide physicians in adopting an effective model for a practical understanding and prognosis of breast cancer tumors.
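The correlation-and-elimination layer of such a DET pipeline can be sketched as follows. This is a generic illustration, not the paper's code: Pearson correlation is computed between feature columns, and of any pair whose absolute correlation exceeds a threshold, the later feature is dropped as redundant.

```python
# Generic correlation-based feature elimination sketch (illustrative;
# not the paper's implementation).

def pearson(x, y):
    # Pearson correlation coefficient of two equal-length columns.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def drop_correlated(features, threshold=0.9):
    # features: dict of name -> column. Keep a feature only if it is not
    # strongly correlated with any already-kept feature.
    keep = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) <= threshold
               for k in keep):
            keep.append(name)
    return keep
```

For example, a column that is an exact multiple of another is eliminated, while a weakly correlated one survives.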
Dual encoding for abstractive text summarization
Recurrent Neural Network (RNN) based sequence-to-sequence attentional models have proven effective in abstractive text summarization. In this paper, we model abstractive text summarization using a dual encoding model. Different from previous works that use only a single encoder, the proposed method employs a dual encoder comprising a primary and a secondary encoder. Specifically, the primary encoder conducts coarse encoding in the regular way, while the secondary encoder models the importance of words and generates a finer encoding based on the input raw text and the previously generated summary text. The two levels of encoding are combined and fed into the decoder to generate more diverse summaries and reduce the repetition phenomenon in long sequence generation. The experimental results on two challenging datasets (i.e., CNN/DailyMail and DUC 2004) demonstrate that our dual encoding model performs competitively against existing methods.
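The dual-encoding idea can be caricatured with scalar "embeddings". This is a hypothetical sketch, not the paper's model (which uses RNN states and learned importance): the secondary pass down-weights tokens that already appear in the partial summary, so the combined encoding discourages repetition.

```python
# Toy sketch of dual encoding with scalar embeddings (illustrative only).

def primary_encode(tokens, embed):
    # Coarse encoding: a plain lookup pass over the input.
    return [embed[t] for t in tokens]

def secondary_encode(tokens, embed, generated):
    # Finer encoding: down-weight tokens already covered by the
    # previously generated summary (the 0.1 factor is arbitrary).
    return [embed[t] * (0.1 if t in generated else 1.0) for t in tokens]

def dual_encode(tokens, embed, generated):
    # Combine the two levels of encoding for the decoder.
    return [p + s for p, s in zip(primary_encode(tokens, embed),
                                  secondary_encode(tokens, embed, generated))]
```

A token already emitted in the summary contributes a weaker combined signal than a fresh one.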
Text with Knowledge Graph Augmented Transformer for Video Captioning
Video captioning aims to describe the content of videos using natural
language. Although significant progress has been made, there is still much room
to improve the performance for real-world applications, mainly due to the
long-tail words challenge. In this paper, we propose a text with knowledge
graph augmented transformer (TextKG) for video captioning. Notably, TextKG is a
two-stream transformer, formed by the external stream and internal stream. The
external stream is designed to absorb additional knowledge, which models the
interactions between the additional knowledge, e.g., pre-built knowledge graph,
and the built-in information of videos, e.g., the salient object regions,
speech transcripts, and video captions, to mitigate the long-tail words
challenge. Meanwhile, the internal stream is designed to exploit the
multi-modality information in videos (e.g., the appearance of video frames,
speech transcripts, and video captions) to ensure the quality of caption
results. In addition, the cross attention mechanism is also used in between the
two streams for sharing information. In this way, the two streams can help each
other for more accurate results. Extensive experiments conducted on four
challenging video captioning datasets, i.e., YouCookII, ActivityNet Captions,
MSRVTT, and MSVD, demonstrate that the proposed method performs favorably
against the state-of-the-art methods. Specifically, the proposed TextKG method
outperforms the best published results by 18.7% absolute CIDEr score on the
YouCookII dataset.

Comment: Accepted by CVPR202
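The cross attention that shares information between the two streams can be sketched with a single scalar query. This is a minimal illustration of standard attention, not TextKG's implementation: one stream's query attends over the other stream's keys (e.g., knowledge-graph entries) and returns a weighted sum of their values.

```python
# Minimal single-query cross attention with scalar keys/values
# (illustrative only; real streams use vector features).
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def cross_attend(query, keys, values):
    # Weights come from query-key similarity; the output mixes the
    # other stream's values accordingly.
    weights = softmax([query * k for k in keys])
    return sum(w * v for w, v in zip(weights, values))
```

With a zero query, all keys score equally and the output is simply the mean of the values.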
Scale Invariant Fully Convolutional Network: Detecting Hands Efficiently
Existing hand detection methods usually follow the pipeline of multiple
stages with high computation cost, i.e., feature extraction, region proposal,
bounding box regression, and additional layers for rotated region detection. In
this paper, we propose a new Scale Invariant Fully Convolutional Network
(SIFCN) trained in an end-to-end fashion to detect hands efficiently.
Specifically, we merge the feature maps from high to low layers in an iterative
way, which handles different scales of hands better with less time overhead
compared to simply concatenating them. Moreover, we develop the Complementary
Weighted Fusion (CWF) block to make full use of the distinctive features among
multiple layers to achieve scale invariance. To deal with rotated hand
detection, we present the rotation map to get rid of complex rotation and
derotation layers. Besides, we design the multi-scale loss scheme to accelerate
the training process significantly by adding supervision to the intermediate
layers of the network. Compared with the state-of-the-art methods, our
algorithm shows comparable accuracy while running 4.23 times faster on the
VIVA dataset, and achieves better average precision on the Oxford hand
detection dataset at a speed of 62.5 fps.

Comment: Accepted to AAAI201
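The iterative high-to-low feature merging can be sketched in one dimension. This toy upsample-then-add scheme is only an illustration of the general top-down idea; the real SIFCN works on 2-D feature maps and applies the CWF block's learned weighting, which is omitted here.

```python
# Toy 1-D sketch of merging feature maps from high (coarse) to low
# (fine) layers by upsample-then-add (illustrative; CWF weighting and
# 2-D tensors omitted).

def upsample2(xs):
    # Nearest-neighbour upsampling by a factor of two.
    return [x for x in xs for _ in range(2)]

def merge_top_down(pyramid):
    # pyramid: feature maps ordered from coarsest to finest, each twice
    # the length of the previous. Coarse context is propagated downward
    # and summed into each finer map in turn.
    merged = pyramid[0]
    for finer in pyramid[1:]:
        merged = [a + b for a, b in zip(upsample2(merged), finer)]
    return merged
```

Each fine-scale position thus accumulates the context of every coarser level above it, which is what helps the detector handle hands at different scales.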