
    Learning to Segment Breast Biopsy Whole Slide Images

    We trained and applied an encoder-decoder model to semantically segment breast biopsy images into biologically meaningful tissue labels. Since conventional encoder-decoder networks cannot be applied directly to large biopsy images, and the differently sized structures in biopsies present novel challenges, we propose four modifications: (1) an input-aware encoding block to compensate for information loss, (2) a new dense connection pattern between encoder and decoder, (3) dense and sparse decoders to combine multi-level features, and (4) a multi-resolution network that fuses the results of encoder-decoders run at different resolutions. Our model outperforms a feature-based approach and conventional encoder-decoders from the literature. We use the semantic segmentations produced by our model in an automated diagnosis task and obtain higher accuracies than a baseline approach that employs an SVM for feature-based segmentation, with both using the same segmentation-based diagnostic features. Comment: Added more WSI images in appendix.
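    The fourth modification, multi-resolution fusion, can be illustrated by averaging per-class probability maps predicted at several resolutions after upsampling them to a common size and taking the argmax. The NumPy sketch below is illustrative, not the authors' implementation; the function names and the nearest-neighbour upsampling are assumptions.

```python
import numpy as np

def upsample_nearest(prob_map, target_hw):
    """Nearest-neighbour upsampling of an (H, W, C) probability map."""
    h, w, _ = prob_map.shape
    th, tw = target_hw
    rows = np.arange(th) * h // th
    cols = np.arange(tw) * w // tw
    return prob_map[rows][:, cols]

def fuse_multiresolution(prob_maps, target_hw):
    """Average class probabilities predicted at several resolutions,
    then take the argmax to get one full-resolution label map."""
    upsampled = [upsample_nearest(p, target_hw) for p in prob_maps]
    return np.argmax(np.mean(upsampled, axis=0), axis=-1)

# Example: fuse hypothetical 3-class predictions made at 4x4 and 8x8.
rng = np.random.default_rng(0)
coarse = rng.random((4, 4, 3))
fine = rng.random((8, 8, 3))
labels = fuse_multiresolution([coarse, fine], (8, 8))
```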

    Road conditions monitoring using semantic segmentation of smartphone motion sensor data

    Many studies have addressed the use of moving-object analysis to locate a specific item or recover a lost object in video sequences. With semantic analysis alone, it can be challenging to pin down the meaning of each object and follow its movement. Some machine learning approaches therefore rely on a coherent interpretation of images or video recordings, converting visual patterns and features into a visual description using dense and sparse optical flow algorithms. This paper proposes a redesigned U-Net architecture with integrated bidirectional Long Short-Term Memory layers to semantically segment smartphone motion sensor data for video categorization. Experiments show that the proposed technique outperforms several existing semantic segmentation algorithms using z-axis accelerometer and z-axis gyroscope features. The numerous moving elements of the video sequence are synchronised with one another to follow the scenario. A further objective of this work is to assess the proposed model on roadways and other moving objects using five datasets, including a self-made dataset and the pothole600 dataset. After mapping or tracking an object, the results are reported together with the diagnosis of the moving object and its synchronisation with the video clips. The proposed model combines the validity of the results with the precision of locating the required moving parts. The project was implemented in Python 3.7 for its ease of use and efficiency.
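    Before a segmentation model such as the proposed U-Net variant can be applied, a 1-D motion-sensor stream is typically sliced into overlapping windows. The helper below is a hypothetical pre-processing sketch in NumPy, not code from the paper; `frame_signal` and its parameters are illustrative.

```python
import numpy as np

def frame_signal(signal, win, hop):
    """Slice a 1-D motion-sensor stream (e.g. a z-axis accelerometer
    trace) into overlapping windows of length `win`, advancing by
    `hop` samples, for a windowed segmentation model."""
    n = 1 + max(0, len(signal) - win) // hop
    return np.stack([signal[i * hop : i * hop + win] for i in range(n)])

# Example: a 10-sample stream framed into 4-sample windows, hop 2.
z = np.arange(10.0)
windows = frame_signal(z, win=4, hop=2)
```

    For the two channels used in the paper (z-axis accelerometer and z-axis gyroscope), the same framing could be applied per channel and the results stacked along a channel axis.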

    Towards Developing Computer Vision Algorithms and Architectures for Real-world Applications

    Computer vision technology automatically extracts high-level, meaningful information from visual data such as images or videos, and object recognition and detection algorithms are essential in most computer vision applications. In this dissertation, we focus on algorithms for real-life computer vision applications, presenting innovative algorithms for object segmentation and feature extraction for object and action recognition in video data, sparse feature selection algorithms for medical image analysis, and automated feature extraction using convolutional neural networks for blood cancer grading. To detect and classify objects in video, the objects have to be separated from the background, and discriminant features are then extracted from the region of interest before being fed to a classifier. Effective object segmentation and feature extraction are often application specific and pose major challenges for object detection and classification tasks. We present an effective optical-flow-based ROI generation algorithm for segmenting moving objects in video data, applicable to surveillance and self-driving vehicles. Optical flow can also serve as a feature for human action recognition, and we use optical flow features in a pre-trained convolutional neural network to improve the performance of human action recognition algorithms. Both algorithms outperformed the state of the art at the time. Medical images and videos pose unique challenges for image understanding, mainly because tissues and cells are often irregularly shaped, colored, and textured, and hand-selecting the most discriminant features is difficult; an automated feature selection method is therefore desired. Sparse learning is a technique for extracting the most discriminant and representative features from raw visual data.
    However, sparse learning with L1 regularization only considers sparsity in the feature dimension; we improve the algorithm so that it also selects the type of features: less important or noisy feature types are removed from the feature set entirely. We apply this algorithm to endoscopy images to detect unhealthy abnormalities in the esophagus and stomach, such as ulcers and cancer. Besides the sparsity constraint, other application-specific constraints and prior knowledge may also need to be incorporated into the loss function to obtain the desired results. We demonstrate how to incorporate a similar-inhibition constraint and gaze and attention priors in sparse dictionary selection for gastroscopic video summarization, enabling intelligent key-frame extraction from gastroscopic video data. With recent advances in multi-layer neural networks, automatic end-to-end feature learning has become feasible. Convolutional neural networks mimic the mammalian visual cortex and can extract the most discriminant features automatically from training samples. We use a convolutional neural network with a hierarchical classifier to grade the severity of follicular lymphoma, a type of blood cancer, reaching 91% accuracy, on par with analysis by expert pathologists. Developing real-world computer vision applications involves more than developing core vision algorithms to extract and understand information from visual data; it is also subject to many practical requirements and constraints, such as hardware and computing infrastructure, cost, robustness to lighting changes and deformation, and ease of use and deployment. The processing pipelines and system architectures of computer-vision-based applications share many design principles. We developed common processing components and a generic framework for computer vision applications, as well as a versatile scale-adaptive template matching algorithm for object detection. We demonstrate these design principles and best practices by developing and deploying a complete computer vision application in real life, a multi-channel water level monitoring system, whose techniques and design methodology can be generalized to other real-life applications. General software engineering principles, such as modularity, abstraction, robustness to requirement changes, and generality, are all demonstrated in this research.
    Dissertation/Thesis: Doctoral Dissertation, Computer Science, 201
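    The feature-type selection described above is in the spirit of a group-sparsity penalty, whose proximal step zeroes entire feature groups with small norm. The following is a generic group soft-thresholding sketch, not the dissertation's algorithm; the grouping and the threshold `lam` are illustrative.

```python
import numpy as np

def group_soft_threshold(w, groups, lam):
    """Proximal step for a group-sparsity penalty: shrink each feature
    group by its L2 norm; a group whose norm falls below `lam` is zeroed
    entirely, removing that feature type from the model."""
    out = np.zeros_like(w, dtype=float)
    for idx in groups:
        g = np.asarray(idx)
        norm = np.linalg.norm(w[g])
        if norm > lam:
            out[g] = w[g] * (1.0 - lam / norm)
    return out

# Example: the second feature type (indices 2-3) is weak and gets removed.
w = np.array([3.0, 4.0, 0.1, -0.1])
sparse_w = group_soft_threshold(w, groups=[[0, 1], [2, 3]], lam=1.0)
```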

    Abstracting GIS Layers from Hyperspectral Imagery

    Modern warfare methods in the urban environment necessitate the use of multiple layers of sensors to manage the battle space. Hyperspectral imagers are one possible sensor modality for providing remotely sensed images that can be converted into Geographic Information Systems (GIS) layers. GIS layers abstract knowledge of roads, buildings, and scene content and contain shape files that outline and highlight scene features. Creating shape files is a labor-intensive and time-consuming process, and the availability of shape files that reflect the current configuration of an area of interest significantly enhances Intelligence Preparation of the Battlespace (IPB). The solution presented in this thesis is a novel process that automates the creation of shape files by exploiting the spectral-spatial relationship of a hyperspectral image cube. It is assumed that a priori endmember spectra, a spectral database, or specific scene knowledge is not available. The topological neighborhood of a Self-Organizing Map (SOM) is segmented and used as a spectral filter to produce six initial object maps that are spatially processed with logical and morphological operations. A novel road-finding algorithm connects road segments under significantly tree-occluded roadways into a contiguous road network. The manual abstraction of GIS shape files is thus improved into a semi-automated process, and the resulting shape files are not susceptible to deviation from orthorectified imagery, as they are produced directly from the hyperspectral imagery. The results are eight separate high-quality GIS layers (Vegetation, Non-Tree Vegetation, Trees, Fields, Buildings, Major Buildings, Roadways, and Parking Areas) that follow the terrain of the hyperspectral image and are separately and automatically labeled. Spatial processing improves layer accuracy from 85% to 94%; significant layer accuracies include the road network at 93%, buildings at 97%, and major buildings at 98%.
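    Bridging road segments broken by tree occlusion can be illustrated with a morphological closing on a binary road mask, in the spirit of the logical and morphological post-processing described above. This NumPy sketch is illustrative only; the thesis's road-finding algorithm is more elaborate.

```python
import numpy as np

def dilate(mask, k=1):
    """Binary dilation with a (2k+1)x(2k+1) square structuring element."""
    padded = np.pad(mask, k)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dy in range(2 * k + 1):
        for dx in range(2 * k + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def erode(mask, k=1):
    """Binary erosion; the zero padding shrinks the mask at the border."""
    padded = np.pad(mask, k)
    h, w = mask.shape
    out = np.ones_like(mask)
    for dy in range(2 * k + 1):
        for dx in range(2 * k + 1):
            out &= padded[dy:dy + h, dx:dx + w]
    return out

def close_gaps(road_mask, k=1):
    """Morphological closing (dilation then erosion) bridges small
    breaks, e.g. tree-occluded stretches, in a binary road mask."""
    return erode(dilate(road_mask, k), k)

# A 3x7 road mask with a one-pixel occlusion gap in the centre row.
road = np.zeros((3, 7), dtype=bool)
road[1, :] = True
road[1, 3] = False
closed = close_gaps(road, k=1)
```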

    Segmentation-guided privacy preservation in visual surveillance monitoring

    Final Degree Project in Computer Engineering, Faculty of Mathematics, Universitat de Barcelona, Year: 2022, Advisors: Sergio Escalera Guerrero, Zenjie Li and Kamal Nasrollahi. Video surveillance has become a necessity to ensure safety and security. Today, with the advancement of technology, video surveillance has become more accessible and widely available, and it can be useful in an enormous number of applications and situations. For instance, it can help ensure public safety by preventing vandalism, robbery, and shoplifting. The same applies to more intimate situations, like home monitoring to detect unusual behavior of residents, and to similar settings like hospitals and assisted living facilities. Thus, cameras are installed in public places like malls, metro stations, and on roads for traffic control, as well as in sensitive settings like hospitals, embassies, and private homes. Video surveillance has always been associated with a loss of privacy. Therefore, we developed a real-time visualization of privacy-protected video surveillance data by applying a segmentation mask that protects privacy while still allowing existing risk behaviors to be identified. This replaces existing privacy safeguards such as blanking, masking, pixelation, blurring, and scrambling, as we want to protect visual personal data such as appearance, physical information, clothing, skin, eye and hair color, and facial gestures. The main aim of this work is to analyze and compare the most successful deep-learning-based state-of-the-art approaches to semantic segmentation. We perform an efficiency-accuracy comparison to determine which segmentation methods yield accurate results while running at the speed required for real-life application scenarios.
    Furthermore, we provide a modified dataset, made from a combination of three existing datasets (COCO_stuff164K, PASCAL VOC 2012, and ADE20K), to make our comparison fair and to generate privacy-protecting human segmentation masks.
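    The segmentation-guided safeguard described above can be illustrated by painting the pixels of a predicted person mask with a flat colour. This is a minimal hypothetical sketch; `anonymize` is not from the thesis, and it assumes the mask has already been produced by a segmentation model.

```python
import numpy as np

def anonymize(frame, person_mask, color=(0, 255, 0)):
    """Replace every pixel the segmentation model labelled 'person'
    with a flat colour: appearance, clothing and skin are hidden,
    but the silhouette, and hence the behaviour, stays visible."""
    out = frame.copy()
    out[person_mask] = color
    return out

# Tiny example: a 2x2 grey frame with two 'person' pixels.
frame = np.full((2, 2, 3), 100, dtype=np.uint8)
mask = np.array([[True, False], [False, True]])
private = anonymize(frame, mask)
```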

    Instance Segmentation with Mask R-CNN Applied to Loose-Housed Dairy Cows in a Multi-Camera Setting

    With increasing herd sizes comes an increased need for automated systems that support farmers in monitoring the health and welfare of their livestock. Cattle are a highly sociable species, and the herd structure has an important impact on animal welfare. As the behaviour of the animals and their social interactions can be influenced by the presence of a human observer, a camera-based system that automatically detects the animals would be beneficial for analysing dairy cattle herd activity. In the present study, eight surveillance cameras were mounted above the barn area of a group of thirty-six lactating Holstein Friesian dairy cows at the Chamber of Agriculture in Futterkamp in Northern Germany. With Mask R-CNN, a state-of-the-art convolutional neural network model was trained to determine pixel-level segmentation masks for the cows in the video material. The model was pre-trained on the Microsoft Common Objects in Context (COCO) data set, and transfer learning was carried out with annotated image material from the recordings as the training data set. In addition, the relationship between the size of the training data set and the performance of the model after transfer learning was analysed. The trained model achieved average precisions (intersection over union, IoU = 0.5) of 91% and 85% for the detection of bounding boxes and segmentation masks of the cows, respectively, thereby laying a solid technical basis for an automated analysis of herd activity and resource use in loose housing.
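    The IoU = 0.5 criterion behind the reported average precision can be made concrete with a small helper that computes intersection over union for two axis-aligned bounding boxes (the mask case replaces box areas with pixel counts). This is a generic sketch, not the evaluation code of the study.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2).
    A detection counts as correct at the IoU = 0.5 threshold when
    box_iou(pred, truth) >= 0.5."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```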

    Automated Artemia length measurement using U-shaped fully convolutional networks and second-order anisotropic Gaussian kernels

    The brine shrimp Artemia, a small crustacean zooplankton organism, is universally used as live prey for larval fish and shrimp in aquaculture. In Artemia studies, automated techniques to obtain length information from Artemia images would be highly desirable; however, this problem has so far not been addressed in the literature. Moreover, conventional image-based length measurement approaches cannot be readily transferred to measuring Artemia length, due to the distortion of non-rigid bodies, the variation over growth stages, and interference from the antennae and other appendages. To address this problem, we compile a dataset containing 250 images together with the corresponding label maps of length measuring lines. We propose an automated Artemia length measurement method using U-shaped fully convolutional networks (UNet) and second-order anisotropic Gaussian kernels. For a given Artemia image, the designed UNet model extracts a length measuring line structure, and, subsequently, the second-order Gaussian kernels transform this structure into a thin measuring line. For comparison, we also follow conventional fish length measurement approaches and develop a non-learning-based method using mathematical morphology and polynomial curve fitting. We evaluate the proposed method and the competing methods on 100 test images taken from the compiled dataset. Experimental results show that the proposed method can accurately measure the length of Artemia objects in images, obtaining a mean absolute percentage error of 1.16%.
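    Once a thin measuring line has been extracted, its length and the reported error metric are straightforward to compute. The helpers below are generic sketches, not the paper's code: `polyline_length` sums segment lengths along ordered line points, and `mape` is the mean absolute percentage error.

```python
import math

def polyline_length(points):
    """Length of a measuring line given as ordered (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def mape(predicted, actual):
    """Mean absolute percentage error over paired length measurements."""
    return 100.0 * sum(abs(p - a) / a
                       for p, a in zip(predicted, actual)) / len(actual)
```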

    Pedestrian Attribute Recognition: A Survey

    Recognizing pedestrian attributes is an important task in the computer vision community because it plays an important role in video surveillance, and many algorithms have been proposed to handle it. The goal of this paper is to review existing works, whether based on traditional methods or on deep learning networks. Firstly, we introduce the background of pedestrian attribute recognition (PAR, for short), including the fundamental concepts of pedestrian attributes and the corresponding challenges. Secondly, we introduce existing benchmarks, including popular datasets and evaluation criteria. Thirdly, we analyse the concepts of multi-task learning and multi-label learning and explain the relations between these two learning paradigms and pedestrian attribute recognition. We also review some popular network architectures that have been widely applied in the deep learning community. Fourthly, we analyse popular solutions for this task, such as attribute grouping, part-based models, etc. Fifthly, we show some applications that take pedestrian attributes into consideration and achieve better performance. Finally, we summarize the paper and give several possible research directions for pedestrian attribute recognition. The project page of this paper can be found at https://sites.google.com/view/ahu-pedestrianattributes/. Comment: Check the project page for a high-resolution version of this survey.