125 research outputs found
Semantic Segmentation of Pathological Lung Tissue with Dilated Fully Convolutional Networks
Early and accurate diagnosis of interstitial lung diseases (ILDs) is crucial
for making treatment decisions, but can be challenging even for experienced
radiologists. The diagnostic procedure is based on the detection and
recognition of the different ILD pathologies in thoracic CT scans, yet their
manifestation often appears similar. In this study, we propose the use of a
deep purely convolutional neural network for the semantic segmentation of ILD
patterns, as the basic component of a computer-aided diagnosis (CAD) system for
ILDs. The proposed CNN, which consists of convolutional layers with dilated
filters, takes as input a lung CT image of arbitrary size and outputs the
corresponding label map. We trained and tested the network on a dataset of 172
sparsely annotated CT scans, within a cross-validation scheme. The training was
performed in an end-to-end and semi-supervised fashion, utilizing both labeled
and non-labeled image regions. The experimental results show significant
performance improvement with respect to the state of the art.
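The abstract above relies on convolutional layers with dilated filters. As a minimal sketch of that idea only (not the authors' network), the pure-Python example below applies a 1-D filter with gaps between its taps and computes how the receptive field of stacked layers grows with the dilation rate. The function names, the toy signal, and the dilation schedule `[1, 2, 4]` are illustrative assumptions, not details taken from the paper.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1-D correlation with `dilation - 1` skipped samples between taps."""
    span = (len(kernel) - 1) * dilation + 1  # input samples covered by one output
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + k * dilation]
                       for k, w in enumerate(kernel)))
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 layers, one dilation rate per layer."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Toy data (hypothetical): a ramp and a simple difference filter.
signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

dense = dilated_conv1d(signal, kernel, dilation=1)    # taps are adjacent
dilated = dilated_conv1d(signal, kernel, dilation=2)  # taps are two samples apart

rf_plain = receptive_field(3, [1, 1, 1])    # three ordinary 3-tap layers
rf_dilated = receptive_field(3, [1, 2, 4])  # same depth, growing dilation
```

With the same number of weights per layer, the dilated stack sees a wider context than the plain one, which is the usual motivation for dilation in dense prediction tasks such as segmentation.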
Expected exponential loss for gaze-based video and volume ground truth annotation
Many recent machine learning approaches used in medical imaging are highly
reliant on large amounts of image and ground truth data. In the context of
object segmentation, pixel-wise annotations are extremely expensive to collect,
especially in video and 3D volumes. To reduce this annotation burden, we
propose a novel framework that allows annotators to simply observe the object to
segment while a $200 eye gaze tracker records where they look. Our
method then estimates pixel-wise probabilities for the presence of the object
throughout the sequence, from which we train a classifier in a semi-supervised
setting using a novel Expected Exponential loss function. We show that our
framework provides superior performance on a wide range of medical image
settings compared to existing strategies and that our method can be combined
with current crowd-sourcing paradigms as well. Comment: 9 pages, 5 figures, MICCAI 2017 - LABELS Workshop
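The abstract names an Expected Exponential loss but does not define it. One plausible reading, stated here purely as an assumption, is the ordinary exponential loss exp(-y f) averaged over an uncertain label y in {-1, +1}, where the probability of y = +1 is the pixel-wise object probability estimated from gaze. The sketch below implements that reading; the function names and the example probability are hypothetical, and the paper's actual formulation may differ.

```python
import math


def expected_exponential_loss(score, p_object):
    """E_y[exp(-y * score)] with y = +1 w.p. p_object and y = -1 otherwise.

    Assumed form, not taken from the paper.
    """
    return p_object * math.exp(-score) + (1.0 - p_object) * math.exp(score)


def minimizing_score(p_object):
    """Score that minimizes the loss above: half the log-odds of p_object."""
    return 0.5 * math.log(p_object / (1.0 - p_object))


p = 0.8  # hypothetical soft label for one pixel
best = minimizing_score(p)
loss_at_best = expected_exponential_loss(best, p)
loss_nearby = expected_exponential_loss(best + 0.1, p)
```

Under this reading, a pixel labeled with certainty (p = 1) recovers the standard exponential loss, and a pixel with p = 0.5 contributes no preference for either class.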
Semantic Segmentation of Histopathological Slides for the Classification of Cutaneous Lymphoma and Eczema
Mycosis fungoides (MF) is a rare, potentially life-threatening skin disease,
which in early stages clinically and histologically strongly resembles Eczema,
a very common and benign skin condition. In order to increase the survival
rate, one needs to provide the appropriate treatment early on. To this end, one
crucial step for specialists is the evaluation of histopathological slides
(glass slides), or Whole Slide Images (WSI), of the patients' skin tissue. We
introduce a deep learning aided diagnostics tool that brings a two-fold value
to the decision process of pathologists. First, our algorithm accurately
segments WSI into regions that are relevant for an accurate diagnosis,
achieving a Mean-IoU of 69% and a Matthews Correlation score of 83% on a novel
dataset. Additionally, we also show that our model is competitive with the
state of the art on a reference dataset. Second, using the segmentation map and
the original image, we are able to predict if a patient has MF or Eczema. We
created two models that can be applied in different stages of the diagnostic
pipeline, potentially eliminating life-threatening mistakes. The classification
outcome is considerably more interpretable than using only the WSI as the
input, since it is also based on the segmentation map. Our segmentation model,
which we call EU-Net, extends a classical U-Net with an EfficientNet-B7 encoder
pre-trained on the ImageNet dataset. Comment: Submitted to
https://link.springer.com/chapter/10.1007/978-3-030-52791-4_
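The abstract above reports a Mean-IoU and a Matthews Correlation score. As a rough illustration of how such figures are typically computed from flattened label maps, the sketch below implements per-class intersection over union averaged across classes, and the binary form of the Matthews correlation coefficient. Whether the authors used a binary or multi-class variant is not stated, so the binary form is an assumption, and the toy labels are hypothetical.

```python
import math


def mean_iou(y_true, y_pred, num_classes):
    """Average of per-class intersection over union; absent classes are skipped."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def matthews_corrcoef_binary(y_true, y_pred):
    """Binary MCC from confusion-matrix counts; returns 0.0 if undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy flattened label maps (hypothetical): 0 = background, 1 = relevant tissue.
truth = [0, 0, 1, 1, 1, 0]
pred = [0, 1, 1, 1, 0, 0]

miou = mean_iou(truth, pred, num_classes=2)
mcc = matthews_corrcoef_binary(truth, pred)
```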
Food category recognition using SURF and MSER local feature representation
Food object recognition has gained popularity in recent years, perhaps because of its potential applications in fields such as nutrition and fitness. Recognizing food images is nevertheless challenging, since foods come in many shapes and sizes. Besides showing unexpected deformations and textures, food images are captured under differing lighting conditions and camera viewpoints. From a computer vision perspective, using global image features to train a supervised classifier might be unsuitable because of the complex nature of food images. Local features, on the other hand, seem the better alternative, since they can capture fine detail such as interest points. In this paper, two local features, SURF (Speeded-Up Robust Features) and MSER (Maximally Stable Extremal Regions), are investigated for food object recognition. Both features are computationally inexpensive and have been shown to be effective local descriptors for complex images. Each feature is first evaluated separately. This is followed by feature fusion to observe whether a combined representation could better represent food images. Experimental evaluations using a Support Vector Machine classifier show that feature fusion yields the better recognition accuracy, at 86.6%.
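The abstract above does not say how the SURF and MSER representations are fused. A common and simple choice, assumed here only for illustration, is to L2-normalize each per-image feature vector and concatenate them before passing the result to a classifier. The vectors below are hypothetical stand-ins for per-image descriptor histograms, and no SVM is trained in this sketch.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length; zero vectors are returned as-is."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)


def fuse_features(surf_hist, mser_hist):
    """Early fusion by concatenating the two normalized representations."""
    return l2_normalize(surf_hist) + l2_normalize(mser_hist)


# Hypothetical per-image histograms for the two descriptor types.
surf_hist = [3.0, 4.0]
mser_hist = [0.0, 2.0, 0.0]

fused = fuse_features(surf_hist, mser_hist)
```

Normalizing before concatenation keeps one descriptor type from dominating the other merely because its raw counts are larger.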
Figure Text Extraction in Biomedical Literature
Background: Figures are ubiquitous in biomedical full-text articles, and they represent important biomedical knowledge. However, the sheer volume of biomedical publications has made it necessary to develop computational approaches for accessing figures. Therefore, we are developing the Biomedical Figure Search engine
Text detection in images and videos
The goal of a multimedia text extraction and recognition system is to fill the gap between the mature technology of Optical Character Recognition and the new needs for textual information retrieval created by the spread of digital multimedia. A multimedia text extraction system usually consists of the following four stages: spatial text detection, temporal text detection and tracking (for videos), image binarization and segmentation, and character recognition.
In this PhD thesis we dealt with all the stages of a multimedia text extraction system, focusing on the design and development of techniques for the spatial detection of text in images and videos, as well as methods for evaluating the corresponding results. The contribution of the thesis to the research area of multimedia text extraction lies in the proposal of generic methods for the spatial detection of unconstrained text in images and videos regardless of their content, quality and resolution. In addition, two methods for evaluating the text detection result were proposed that deal successfully with problems in the related literature. Each of them uses different criteria for the evaluation of the result, while both are based on intuitively correct observations. Finally, a very efficient method was developed for the temporal detection of static text in video, which contributes to better spatial detection while also enhancing the quality of the text image.
A Hybrid System for Text Detection in Video Frames
This paper proposes a hybrid system for text detection in video frames. The system consists of two main stages. In the first stage, text regions are detected based on the edge map of the image, leading to a high recall rate with minimal computational requirements. Subsequently, a refinement stage uses an SVM classifier trained on features obtained by a new Local Binary Pattern based operator, which reduces false alarms. Experimental results show the overall performance of the system and demonstrate the discriminating ability of the proposed feature set.
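The refinement stage described above builds its features on a new Local Binary Pattern based operator, which the abstract does not specify. For orientation only, the sketch below implements the classical 8-neighbour LBP code and its histogram, not the paper's operator. The neighbour ordering, the `>=` comparison, and the toy patch are conventional but assumed choices.

```python
def lbp_code(image, row, col):
    """Classical 8-neighbour LBP code for an interior pixel of a 2-D list."""
    center = image[row][col]
    # Neighbours visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit  # set the bit when the neighbour is not darker
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


# Toy 3x3 grayscale patch (hypothetical): bright corners, dark edges.
patch = [
    [9, 1, 9],
    [1, 5, 1],
    [9, 1, 9],
]

code = lbp_code(patch, 1, 1)
hist = lbp_histogram(patch)
```

A histogram of such codes over a candidate region is the kind of texture feature vector that could be passed to an SVM to reject non-text regions.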