Learning to Extract Coherent Summary via Deep Reinforcement Learning
Coherence plays a critical role in producing a high-quality summary from a document. In recent years, neural extractive summarization has become increasingly attractive. However, most of these methods ignore the coherence of summaries when extracting sentences. As an effort towards extracting coherent summaries, we propose a neural coherence model to capture cross-sentence semantic and syntactic coherence patterns. The proposed neural coherence model obviates the need for feature engineering and can be trained in an end-to-end fashion using unlabeled data. Empirical results show that the proposed neural coherence model can efficiently capture cross-sentence coherence patterns. Using the combined output of the neural coherence model and the ROUGE package as the reward, we design a reinforcement learning method to train the proposed neural extractive summarizer, named the Reinforced Neural Extractive Summarization (RNES) model. The RNES model learns to optimize the coherence and informative importance of the summary simultaneously. Experimental results show that the proposed RNES outperforms existing baselines and achieves state-of-the-art performance in terms of ROUGE on the CNN/Daily Mail dataset. The qualitative evaluation indicates that summaries produced by RNES are more coherent and readable.
Comment: 8 pages, 1 figure, presented at AAAI-201
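The reward described above mixes a coherence signal with a ROUGE signal. A minimal sketch of such a combined reward, assuming a simple weighted average (the weight `lam` and the shape of the coherence scores are illustrative assumptions, not the authors' implementation):

```python
def combined_reward(coherence_scores, rouge_score, lam=0.5):
    """Blend cross-sentence coherence with ROUGE into a single RL reward.

    coherence_scores: scores for adjacent sentence pairs in the candidate
                      summary, each in [0, 1], as a neural coherence model
                      might emit (hypothetical interface).
    rouge_score:      ROUGE score of the candidate against the reference.
    lam:              assumed trade-off hyperparameter.
    """
    if coherence_scores:
        coherence = sum(coherence_scores) / len(coherence_scores)
    else:
        coherence = 0.0
    # Higher reward for summaries that are both coherent and informative.
    return lam * coherence + (1.0 - lam) * rouge_score
```

The policy gradient then pushes the extractor towards sentence selections that score well on both axes simultaneously, rather than optimizing ROUGE alone.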
Studies on deep learning approach in breast lesions detection and cancer diagnosis in mammograms
Breast cancer has recently accounted for the largest proportion of newly diagnosed cancers in women. Early diagnosis of breast cancer can improve treatment outcomes and reduce mortality. Mammography is convenient and reliable, and it is the most commonly used method for breast cancer screening. However, manual examination is limited by cost and by the experience of radiologists, which leads to high false-positive rates and missed diagnoses. Therefore, a high-performance computer-aided diagnosis (CAD) system is important for lesion detection and cancer diagnosis. Traditional CAD systems for cancer diagnosis require a large number of manually selected features and still suffer from high false-positive rates. Methods based on deep learning can automatically extract image features through the network, but their performance is limited by multicenter data biases, the complexity of lesion features, and the high cost of annotation. It is therefore necessary to propose a CAD system, optimized for the above problems, that improves lesion detection and cancer diagnosis.
This thesis aims to use deep learning methods to improve the performance and effectiveness of CAD systems for lesion detection and cancer diagnosis. Starting from the detection of multiple lesion types with deep learning methods that take full account of the characteristics of mammography, this thesis explores a microcalcification detection method based on multiscale feature fusion and a mass detection method based on multi-view enhancing. A classification method based on multi-instance learning is then developed, which integrates the detection results from the above methods to achieve precise lesion detection and cancer diagnosis in mammography.
For the detection of microcalcifications, a microcalcification detection network named MCDNet is proposed to overcome the problems of multicenter data biases, the low resolution of network inputs, and scale differences between microcalcifications. In MCDNet, Adaptive Image Adjustment mitigates the impact of multicenter biases and maximizes the effective input pixels. A pyramid network with shortcut connections then ensures that the feature maps used for detection contain more precise localization and classification information about multiscale objects. Within this structure, a trainable Weighted Feature Fusion is proposed to improve detection performance for objects at both scales by learning the contribution of feature maps from different stages. Experiments show that MCDNet outperforms other methods in robustness and precision. At an average of one false positive per image, the recall rates for benign and malignant microcalcifications are 96.8% and 98.9%, respectively. MCDNet can effectively help radiologists detect microcalcifications in clinical applications.
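Trainable weighted fusion of multiscale feature maps is commonly implemented by learning one non-negative scalar per input map and normalizing the weights to sum to one. A minimal sketch of that recipe (the softplus parameterization and flat-list feature maps are illustrative assumptions, not MCDNet's code):

```python
import math

def weighted_feature_fusion(feature_maps, raw_weights, eps=1e-4):
    """Fuse same-shape feature maps with learnable normalized weights.

    feature_maps: list of equally sized feature maps (flattened to lists
                  here for brevity).
    raw_weights:  unconstrained learnable scalars, one per feature map.
    """
    # softplus keeps each learned weight non-negative
    w = [math.log1p(math.exp(r)) for r in raw_weights]
    total = sum(w) + eps  # eps avoids division by zero
    w = [wi / total for wi in w]
    # element-wise weighted sum across the input maps
    fused = [sum(wi * fm[i] for wi, fm in zip(w, feature_maps))
             for i in range(len(feature_maps[0]))]
    return fused
```

During training, gradients flow into `raw_weights`, so the network learns how much each pyramid stage should contribute to the fused map used for detection.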
For the detection of breast masses, a weakly supervised multi-view enhancing mass detection network named MVMDNet is proposed to address the lack of lesion-level labels. MVMDNet can be trained on image-level labeled datasets and extracts additional localization information by exploring the geometric relation between multi-view mammograms. In Multi-view Enhancing, Spatial Correlation Attention is proposed to extract corresponding location information between different views, while a Sigmoid Weighted Fusion module fuses diagnostic and auxiliary features to improve localization precision. A CAM-based Detection module is proposed to derive mass detections from classification labels. Experimental results on both an in-house dataset and a public dataset, [email protected] and [email protected] (recall rate @ average number of false positives per image), demonstrate that MVMDNet achieves state-of-the-art performance among weakly supervised methods and generalizes robustly enough to alleviate multicenter biases.
For cancer diagnosis, a breast cancer classification network named CancerDNet, based on multi-instance learning, is proposed. CancerDNet addresses the complexity of lesion features in whole-image classification by using the lesion detection results from the previous chapters. Whole Case Bag Learning is proposed to combine the features extracted from the four views, working like a radiologist to classify each case. Low-capacity Instance Learning and High-capacity Instance Learning integrate the detections of multiple lesion types into CancerDNet, so that the model can fully account for lesions with complex features in the classification task. CancerDNet achieves an AUC of 0.907 and an AUC of 0.925 on the in-house and public datasets, respectively, which is better than current methods. These results show that CancerDNet delivers high-performance cancer diagnosis.
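In multi-instance learning, each case is a bag of instances (here, lesion candidates from the four mammographic views), and a case is positive if any instance is. A minimal sketch of case-level pooling under that standard MIL assumption (max-pooling over instance scores is an illustrative choice, not necessarily the thesis's aggregation):

```python
def whole_case_score(view_instance_scores):
    """Pool instance-level malignancy scores into one case-level score.

    view_instance_scores: one list of instance scores per view, e.g. four
                          lists for the four standard mammographic views
                          (hypothetical data shape).
    """
    # Flatten all views into a single bag of instance scores.
    all_scores = [s for bag in view_instance_scores for s in bag]
    # Standard MIL assumption: the case is as suspicious as its most
    # suspicious instance across all views.
    return max(all_scores) if all_scores else 0.0
```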
Across the three parts above, this thesis fully considers the characteristics of mammograms and proposes deep learning methods for lesion detection and cancer diagnosis. Experimental results on in-house and public datasets show that the proposed methods achieve state-of-the-art performance in microcalcification detection, mass detection, and case-level cancer classification, and generalize strongly across centers. The results also demonstrate that the proposed methods can effectively assist radiologists in making diagnoses while saving labor costs.
Weakly and Partially Supervised Learning Frameworks for Anomaly Detection
The automatic detection of abnormal events in surveillance footage remains a concern of the research community. Since protection is the primary purpose of installing video surveillance systems, maintaining public safety through monitoring, and responding quickly enough to serve that purpose, is a significant challenge even for humans. Human capacity has not kept pace with the increased use of surveillance systems, which demand extensive supervision to identify unusual events that could put any person or company at risk; moreover, a substantial amount of labor and time is wasted because anomalous events are extremely rare compared to normal ones. Consequently, the need for an automatic algorithm to detect abnormal events has become crucial in video surveillance. Despite being the subject of various research works published in the last decade, state-of-the-art performance is still unsatisfactory and far below what is required for effective deployment of this kind of technology in fully unconstrained scenarios. The automatic detection of abnormal events remains a challenge for many reasons: environmental diversity, the resemblance of movements across different actions, crowded scenarios, and the difficulty, or impossibility, of enumerating all the patterns that define a normal action. Beyond these problems, the substantive obstacle lies in obtaining sufficient amounts of labeled abnormal samples, which is fundamental for computer vision algorithms. Obtaining an extensive set of different videos that satisfy the previously mentioned conditions is not a simple task either: in addition to being effort- and time-consuming, the boundary between normal and abnormal actions is usually unclear.
In this work, the main objective is to provide several solutions to the problems mentioned above, by analyzing previous state-of-the-art methods and presenting an extensive overview that clarifies the concepts used to capture normal and abnormal patterns. By exploring different strategies, we were able to develop new approaches that consistently advance state-of-the-art performance. Moreover, we announce the availability of a new large-scale, first-of-its-kind dataset, fully annotated at the frame level for a specific anomaly detection event with a wide diversity of fighting scenarios, which can be freely used by the research community. With the purpose of requiring minimal supervision, two different proposals are described in this document. The first method employs self-supervised learning to avoid the laborious task of annotation: the training set is labeled autonomously by an iterative learning framework composed of two independent experts that feed data to each other through a Bayesian framework. The second proposal explores a new method for learning an anomaly ranking model in the multiple instance learning paradigm by leveraging weakly labeled videos, where training labels are assigned at the video level. The experiments were conducted on several well-known datasets, and our solutions solidly outperform the state-of-the-art. Additionally, as a proof-of-concept system, we also present the results of real-world simulations collected in different environments to perform a field
test of our learned models.
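The second proposal's anomaly ranking under weak video-level labels is commonly formulated as a hinge ranking loss between bags of segment scores. A minimal sketch of that MIL ranking idea, assuming the usual formulation rather than the thesis's exact objective:

```python
def mil_ranking_loss(anomalous_bag, normal_bag, margin=1.0):
    """Hinge ranking loss over two bags of per-segment anomaly scores.

    anomalous_bag: segment scores from a video labeled anomalous (only the
                   video-level label is known, not which segment is anomalous).
    normal_bag:    segment scores from a video labeled normal.
    margin:        assumed ranking margin hyperparameter.
    """
    # The highest-scoring segment of the anomalous video should outrank the
    # highest-scoring segment of the normal video by at least `margin`.
    return max(0.0, margin - max(anomalous_bag) + max(normal_bag))
```

Minimizing this loss over pairs of weakly labeled videos teaches the model to assign high scores to anomalous segments without ever seeing segment-level annotations.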
Few-shot image classification : current status and research trends
Conventional image classification methods usually require a large number of training samples to train the model. In practical scenarios, however, the amount of available sample data is often insufficient, which easily leads to overfitting during network construction. Few-shot learning provides an effective solution to this problem and has become a hot research topic. This paper provides an intensive survey of state-of-the-art techniques in image classification based on few-shot learning. According to their deep learning mechanisms, existing algorithms are divided into four categories: transfer learning based, meta-learning based, data augmentation based, and multimodal based methods. Transfer learning based methods transfer useful prior knowledge from the source domain to the target domain. Meta-learning based methods employ past prior knowledge to guide the learning of new tasks. Data augmentation based methods expand the amount of sample data with auxiliary information. Multimodal based methods use information from an auxiliary modality to facilitate image classification tasks. This paper also summarizes the few-shot image datasets available in the literature, and provides experimental results from representative algorithms to compare their performance and analyze their pros and cons. In addition, applications of existing few-shot image classification research in different practical fields are discussed. Finally, a few future research directions are identified. © 2022 by the authors. Licensee MDPI, Basel, Switzerland.
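Among the meta-learning based methods the survey covers, a well-known metric-based recipe builds one prototype per class from the few support embeddings and classifies a query by its nearest prototype. A minimal sketch of that idea (the plain-list embeddings and squared Euclidean distance are illustrative assumptions):

```python
def classify_by_prototype(query, support_sets):
    """Assign a query embedding to the class with the nearest prototype.

    query:        embedding vector of the query image.
    support_sets: mapping from class label to its few support embeddings
                  (hypothetical data shape).
    """
    def mean_vec(vecs):
        # Class prototype: component-wise mean of the support embeddings.
        n = len(vecs)
        return [sum(v[d] for v in vecs) / n for d in range(len(vecs[0]))]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    prototypes = {label: mean_vec(vecs) for label, vecs in support_sets.items()}
    return min(prototypes, key=lambda lbl: sq_dist(query, prototypes[lbl]))
```

Because only a handful of support examples are averaged per class, this kind of method sidesteps the overfitting that plagues conventional training on tiny datasets.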
The Challenges of Recognizing Offline Handwritten Chinese: A Technical Review
Offline handwritten Chinese recognition is an important research area of pattern recognition, comprising offline handwritten Chinese character recognition (offline HCCR) and offline handwritten Chinese text recognition (offline HCTR), both closely related to daily life. With new deep learning techniques and their combination with other domain knowledge, offline handwritten Chinese recognition has achieved breakthroughs in methods and performance in recent years. However, no article has provided a technical review of this field since 2016. In light of this, this paper reviews the research progress and challenges of offline handwritten Chinese recognition from 2016 to 2022, covering traditional techniques, deep learning methods, methods combining deep learning with traditional techniques, and knowledge from other areas. Firstly, it introduces the research background and status of handwritten Chinese recognition, standard datasets, and evaluation metrics. Secondly, it provides a comprehensive summary and analysis of offline HCCR and offline HCTR approaches during the last seven years, along with an explanation of their concepts, specifics, and performance. Finally, the main research problems in this field over the past few years are presented, and the challenges that still exist in offline handwritten Chinese recognition are discussed, aiming to inspire future research work.
Towards Realistic Low-resource Relation Extraction: A Benchmark with Empirical Baseline Study
This paper presents an empirical study on building relation extraction systems in low-resource settings. Based on recent pre-trained language models, we comprehensively investigate three schemes to evaluate performance in low-resource settings: (i) different types of prompt-based methods with few-shot labeled data; (ii) diverse balancing methods to address the long-tailed distribution issue; (iii) data augmentation technologies and self-training to generate more labeled in-domain data. We create a benchmark with 8 relation extraction (RE) datasets covering different languages, domains, and contexts, and perform extensive comparisons over the proposed schemes and their combinations. Our experiments illustrate that: (i) though prompt-based tuning is beneficial in low-resource RE, there is still much room for improvement, especially in extracting relations from cross-sentence contexts with multiple relational triples; (ii) balancing methods are not always helpful for RE with long-tailed distributions; (iii) data augmentation complements existing baselines and can bring considerable performance gains, while self-training may not consistently improve low-resource RE. Code and datasets are at https://github.com/zjunlp/LREBench.
Comment: Accepted to EMNLP 2022 (Findings); the project website is https://zjunlp.github.io/project/LREBench
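The self-training scheme in (iii) is typically an iterative loop: predict on unlabeled data, keep only high-confidence predictions as pseudo-labels, and retrain. A minimal sketch of the selection step (the confidence threshold and tuple layout are illustrative assumptions, not the paper's pipeline):

```python
def select_pseudo_labels(unlabeled_preds, threshold=0.9):
    """Keep confident model predictions on unlabeled text as pseudo-labels.

    unlabeled_preds: (sentence, predicted_relation, confidence) triples from
                     the current model (hypothetical data shape).
    threshold:       assumed confidence cutoff; pairs above it are treated as
                     labeled data in the next training round.
    """
    return [(text, relation) for text, relation, conf in unlabeled_preds
            if conf >= threshold]
```

The paper's finding that self-training "may not consistently improve low-resource RE" is consistent with a known weakness of this loop: confident but wrong pseudo-labels reinforce the model's own errors.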