FocusNet++: Attentive Aggregated Transformations for Efficient and Accurate Medical Image Segmentation
We propose a new residual block for convolutional neural networks and
demonstrate its state-of-the-art performance in medical image segmentation. We
combine attention mechanisms with group convolutions to create our group
attention mechanism, which forms the fundamental building block of our network,
FocusNet++. We employ a hybrid loss based on balanced cross entropy, Tversky
loss and adaptive logarithmic loss to enhance performance and speed up
convergence. Our results show that FocusNet++ achieves state-of-the-art
results across various benchmark metrics for the ISIC 2018 melanoma
segmentation and the cell nuclei segmentation datasets with fewer parameters
and FLOPs.
Comment: Published at ISBI 202
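The hybrid loss can be sketched as a weighted sum of its terms; the weights, the Tversky parameters, and the omission of the adaptive logarithmic term below are simplifications for illustration, not the paper's exact formulation.

```python
import numpy as np

def balanced_bce(p, y, eps=1e-7):
    """Class-balanced binary cross entropy: the foreground term is weighted
    by the fraction of background pixels, and vice versa."""
    p = np.clip(p, eps, 1 - eps)
    beta = 1 - y.mean()  # fraction of background pixels
    return -(beta * y * np.log(p) + (1 - beta) * (1 - y) * np.log(1 - p)).mean()

def tversky_loss(p, y, alpha=0.3, beta=0.7, eps=1e-7):
    """Tversky index generalizes Dice: beta > alpha weights false negatives
    more heavily, favoring recall (a common choice of parameters; the
    paper's exact values are not given here)."""
    tp = (p * y).sum()
    fp = (p * (1 - y)).sum()
    fn = ((1 - p) * y).sum()
    return 1 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)

def hybrid_loss(p, y, w=(1.0, 1.0)):
    """Weighted sum of the two terms; the paper's third, adaptive
    logarithmic term is omitted in this sketch."""
    return w[0] * balanced_bce(p, y) + w[1] * tversky_loss(p, y)
```

A perfect prediction drives both terms to (near) zero, while a fully wrong one is dominated by the cross-entropy term.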
Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers
The massive amounts of digitized historical documents acquired over the last
decades naturally lend themselves to automatic processing and exploration.
Research work seeking to automatically process facsimiles and extract
information thereby are multiplying with, as a first essential step, document
layout analysis. If the identification and categorization of segments of
interest in document images have seen significant progress over the last years
thanks to deep learning techniques, many challenges remain with, among others,
the use of finer-grained segmentation typologies and the consideration of
complex, heterogeneous documents such as historical newspapers. Moreover, most
approaches consider visual features only, ignoring the textual signal. In this
context, we introduce a multimodal approach for the semantic segmentation of
historical newspapers that combines visual and textual features. Based on a
series of experiments on diachronic Swiss and Luxembourgish newspapers, we
investigate, among others, the predictive power of visual and textual features
and their capacity to generalize across time and sources. Results show
consistent improvements of multimodal models over a strong visual
baseline, as well as better robustness to high material variance.
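One common way to realize such a multimodal combination is early fusion: per-pixel text embeddings (e.g. embeddings of OCR tokens rasterized onto the page grid) concatenated channel-wise with the CNN's visual feature map. The shapes and the fusion point below are assumptions, since the abstract does not specify the architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes: a visual feature map from a CNN backbone and a map of
# per-pixel text embeddings, both at the same spatial resolution.
H, W = 64, 64
visual = rng.standard_normal((256, H, W))   # C_vis x H x W
textual = rng.standard_normal((32, H, W))   # C_txt x H x W

# Early fusion: stack the modalities along the channel axis and let the
# segmentation head learn joint features.
fused = np.concatenate([visual, textual], axis=0)
assert fused.shape == (288, H, W)
```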
Document image classification combining textual and visual features.
This research contributes to the problem of classifying document images. The main contribution of this thesis is the exploitation of textual and visual features through an approach based on Convolutional Neural Networks.
The study uses a combination of Optical Character Recognition and Natural Language Processing algorithms to extract and manipulate relevant text concepts from document images.
This textual content is embedded within the document images, with the aim of adding elements that help improve the classification results of a Convolutional Neural Network.
The experimental phase shows that the overall document classification accuracy of a Convolutional Neural Network trained using these text-augmented document images is considerably higher than that achieved by a similar model trained solely on classic document images, especially when different classes of documents share similar visual characteristics. The comparison between our method and state-of-the-art approaches demonstrates the effectiveness of combining visual and textual features.
Although this thesis is about document image classification, the idea of using textual and visual features is not restricted to this context; it comes from the observation that textual and visual information are complementary and synergetic in many respects.
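The text-augmentation idea can be sketched as stacking an extra input channel that marks where OCR found discriminative keywords; the channel encoding and the keyword example below are assumptions, not the thesis's exact scheme.

```python
import numpy as np

# Assumed setup: a grayscale document image plus one extra channel marking
# regions where OCR found class-discriminative keywords (hypothetical
# keyword and coordinates, for illustration only).
H, W = 128, 128
image = np.zeros((H, W), dtype=np.float32)

keyword_mask = np.zeros((H, W), dtype=np.float32)
# Pretend OCR located the word "invoice" in a bounding box (y0, x0, y1, x1).
y0, x0, y1, x1 = 10, 20, 18, 70
keyword_mask[y0:y1, x0:x1] = 1.0

# The CNN is then trained on the stacked (image, text) tensor instead of
# the raw image alone.
augmented = np.stack([image, keyword_mask], axis=0)
assert augmented.shape == (2, H, W)
```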
Pelee: A Real-Time Object Detection System on Mobile Devices
There has been a rising interest in running high-quality Convolutional Neural Network (CNN) models under strict constraints on memory and computational budget. A number of efficient architectures have been proposed in recent years, for example, MobileNet, ShuffleNet, and NASNet-A. However, all these architectures depend heavily on depthwise separable convolution, which lacks an efficient implementation in most deep learning frameworks. Meanwhile, few studies combine efficient models with fast object detection algorithms. This research explores the design of an efficient CNN architecture for both image classification and object detection tasks. We propose an efficient architecture named PeleeNet, which is built with conventional convolution instead. On the ImageNet ILSVRC 2012 dataset, our proposed PeleeNet achieves 0.6% higher accuracy and 11% lower computational cost than MobileNet, the state-of-the-art efficient architecture. It is also worth noting that PeleeNet is only 66% of the model size of MobileNet and 1/49 the size of VGG.
We then propose a real-time object detection system on mobile devices. We combine PeleeNet with the Single Shot MultiBox Detector (SSD) method and optimize the architecture for speed. We also port SSD to iOS and provide an optimized code implementation. Our proposed detection system, named Pelee, achieves 70.9% mAP on the PASCAL VOC2007 dataset at 17 FPS on iPhone 6s and 23.6 FPS on iPhone 8. Compared to TinyYOLOv2, the most widely used computationally efficient object detection system, Pelee is more accurate (70.9% vs. 57.1%), 2.88 times lower in computational cost and 2.92 times smaller in model size.
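The trade-off between conventional and depthwise separable convolution that motivates PeleeNet can be made concrete with a parameter count (the layer sizes below are illustrative, not PeleeNet's actual configuration):

```python
def conv_params(c_in, c_out, k):
    """Parameters of a conventional k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    return c_in * k * k + c_in * c_out

c_in, c_out, k = 128, 256, 3
conventional = conv_params(c_in, c_out, k)               # 294,912 parameters
separable = depthwise_separable_params(c_in, c_out, k)   # 33,920 parameters
```

Separable convolution needs far fewer parameters on paper, but PeleeNet bets on conventional convolution because it is better optimized in most deep learning frameworks.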
Precise Proximal Femur Fracture Classification for Interactive Training and Surgical Planning
We demonstrate the feasibility of a fully automatic computer-aided diagnosis
(CAD) tool, based on deep learning, that localizes and classifies proximal
femur fractures on X-ray images according to the AO classification. The
proposed framework aims to improve patient treatment planning and provide
support for the training of trauma surgeon residents. A database of 1347
clinical radiographic studies was collected. Radiologists and trauma surgeons
annotated all fractures with bounding boxes, and provided a classification
according to the AO standard. The proposed CAD tool for the classification of
radiographs into types "A", "B" and "not-fractured" reaches an F1-score of 87%
and an AUC of 0.95; when classifying fractured versus not-fractured cases,
these improve to 94% and 0.98. Prior localization of the fracture results in an
improvement with respect to full image classification. 100% of the predicted
centers of the region of interest are contained in the manually provided
bounding boxes. The system retrieves on average 9 relevant images (from the
same class) out of 10 cases. Our CAD scheme localizes, detects and further
classifies proximal femur fractures achieving results comparable to
expert-level and state-of-the-art performance. Our auxiliary localization model
was highly accurate in predicting the region of interest in the radiograph. We
further investigated several strategies of verification for its adoption into
the daily clinical routine. A sensitivity analysis of the size of the ROI and
image retrieval as a clinical use case were presented.
Comment: Accepted at IPCAI 2020 and IJCAR
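The retrieval result quoted above (9 relevant images out of 10 on average) corresponds to a precision-at-10 of 0.9; a minimal sketch of that metric, with made-up labels:

```python
def precision_at_k(retrieved_labels, query_label, k=10):
    """Fraction of the top-k retrieved cases that share the query's class."""
    top_k = retrieved_labels[:k]
    return sum(1 for lbl in top_k if lbl == query_label) / k

# Illustrative retrieval result: 9 of 10 neighbours share the query's
# AO class, matching the average reported above.
retrieved = ["A"] * 9 + ["B"]
assert precision_at_k(retrieved, "A") == 0.9
```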
Edge-Cloud Polarization and Collaboration: A Comprehensive Survey for AI
Influenced by the great success of deep learning via cloud computing and the
rapid development of edge chips, research in artificial intelligence (AI) has
shifted to both of the computing paradigms, i.e., cloud computing and edge
computing. In recent years, we have witnessed significant progress in
developing more advanced AI models on cloud servers that surpass traditional
deep learning models, owing to model innovations (e.g., Transformers, pretrained
model families), the explosion of training data and soaring computing capabilities.
However, edge computing, especially edge-cloud collaborative computing, is
still in its infancy, as resource-constrained IoT scenarios permit only a
limited set of algorithms to be deployed. In this survey, we conduct
a systematic review for both cloud and edge AI. Specifically, we are the first
to set up the collaborative learning mechanism for cloud and edge modeling with
a thorough review of the architectures that enable such a mechanism. We also
discuss the potential of and practical experience with several ongoing edge AI
topics including pretraining models, graph neural networks and reinforcement
learning. Finally, we discuss the promising directions and challenges in this
field.
Comment: 20 pages, Transactions on Knowledge and Data Engineering
Im2Flow: Motion Hallucination from Static Images for Action Recognition
Existing methods to recognize actions in static images take the images at
their face value, learning the appearances---objects, scenes, and body
poses---that distinguish each action class. However, such models are deprived
of the rich dynamic structure and motions that also define human activity. We
propose an approach that hallucinates the unobserved future motion implied by a
single snapshot to help static-image action recognition. The key idea is to
learn a prior over short-term dynamics from thousands of unlabeled videos,
infer the anticipated optical flow on novel static images, and then train
discriminative models that exploit both streams of information. Our main
contributions are twofold. First, we devise an encoder-decoder convolutional
neural network and a novel optical flow encoding that can translate a static
image into an accurate flow map. Second, we show the power of hallucinated flow
for recognition, successfully transferring the learned motion into a standard
two-stream network for activity recognition. On seven datasets, we demonstrate
the effectiveness of the approach. It not only achieves state-of-the-art accuracy for
dense optical flow prediction, but also consistently enhances recognition of
actions and dynamic scenes.
Comment: Published in CVPR 2018, project page:
http://vision.cs.utexas.edu/projects/im2flow
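The recognition step fuses the two streams' class scores: an appearance stream fed the RGB image and a motion stream fed the hallucinated flow. A minimal sketch of that late fusion, with toy scores; the equal weighting is an assumption, not the paper's tuned setting.

```python
import numpy as np

def late_fusion(appearance_scores, motion_scores, w=0.5):
    """Late fusion of a two-stream network: a weighted average of the
    per-class scores from the appearance and motion streams."""
    return w * np.asarray(appearance_scores) + (1 - w) * np.asarray(motion_scores)

# Toy scores over 3 action classes: appearance alone is ambiguous between
# classes 0 and 1, but the hallucinated-flow stream disambiguates.
appearance = [0.45, 0.45, 0.10]
motion = [0.20, 0.70, 0.10]
fused = late_fusion(appearance, motion)
assert int(np.argmax(fused)) == 1
```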