6,429 research outputs found

    Automating image analysis by annotating landmarks with deep neural networks

    Full text link
    Image and video analysis is often a crucial step in the study of animal behavior and kinematics. Often these analyses require that the position of one or more animal landmarks are annotated (marked) in numerous images. The process of annotating landmarks can require a significant amount of time and tedious labor, which motivates the need for algorithms that can automatically annotate landmarks. In the community of scientists that use image and video analysis to study the 3D flight of animals, there has been a trend of developing more automated approaches for annotating landmarks, yet they fall short of being generally applicable. Inspired by the success of Deep Neural Networks (DNNs) on many problems in the field of computer vision, we investigate how suitable DNNs are for accurate and automatic annotation of landmarks in video datasets representative of those collected by scientists studying animals. Our work shows, through extensive experimentation on videos of hawkmoths, that DNNs are suitable for automatic and accurate landmark localization. In particular, we show that one of our proposed DNNs is more accurate than the current best algorithm for automatic localization of landmarks on hawkmoth videos. Moreover, we demonstrate how these annotations can be used to quantitatively analyze the 3D flight of a hawkmoth. To facilitate the use of DNNs by scientists from many different fields, we provide a self contained explanation of what DNNs are, how they work, and how to apply them to other datasets using the freely available library Caffe and supplemental code that we provide.https://arxiv.org/abs/1702.00583Published versio

    Stratified decision forests for accurate anatomical landmark localization in cardiac images

    Get PDF
    Accurate localization of anatomical landmarks is an important step in medical imaging, as it provides useful prior information for subsequent image analysis and acquisition methods. It is particularly useful for initialization of automatic image analysis tools (e.g. segmentation and registration) and detection of scan planes for automated image acquisition. Landmark localization has been commonly performed using learning based approaches, such as classifier and/or regressor models. However, trained models may not generalize well in heterogeneous datasets when the images contain large differences due to size, pose and shape variations of organs. To learn more data-adaptive and patient specific models, we propose a novel stratification based training model, and demonstrate its use in a decision forest. The proposed approach does not require any additional training information compared to the standard model training procedure and can be easily integrated into any decision tree framework. The proposed method is evaluated on 1080 3D highresolution and 90 multi-stack 2D cardiac cine MR images. The experiments show that the proposed method achieves state-of-theart landmark localization accuracy and outperforms standard regression and classification based approaches. Additionally, the proposed method is used in a multi-atlas segmentation to create a fully automatic segmentation pipeline, and the results show that it achieves state-of-the-art segmentation accuracy

    Reducing “Structure from Motion”: a general framework for dynamic vision. 2. Implementation and experimental assessment

    Get PDF
    For pt.1 see ibid., p.933-42 (1998). A number of methods have been proposed in the literature for estimating scene-structure and ego-motion from a sequence of images using dynamical models. Despite the fact that all methods may be derived from a “natural” dynamical model within a unified framework, from an engineering perspective there are a number of trade-offs that lead to different strategies depending upon the applications and the goals one is targeting. We want to characterize and compare the properties of each model such that the engineer may choose the one best suited to the specific application. We analyze the properties of filters derived from each dynamical model under a variety of experimental conditions, assess the accuracy of the estimates, their robustness to measurement noise, sensitivity to initial conditions and visual angle, effects of the bas-relief ambiguity and occlusions, dependence upon the number of image measurements and their sampling rate

    Time-slice analysis of dyadic human activity

    Get PDF
    La reconnaissance d’activités humaines à partir de données vidéo est utilisée pour la surveillance ainsi que pour des applications d’interaction homme-machine. Le principal objectif est de classer les vidéos dans l’une des k classes d’actions à partir de vidéos entièrement observées. Cependant, de tout temps, les systèmes intelligents sont améliorés afin de prendre des décisions basées sur des incertitudes et ou des informations incomplètes. Ce besoin nous motive à introduire le problème de l’analyse de l’incertitude associée aux activités humaines et de pouvoir passer à un nouveau niveau de généralité lié aux problèmes d’analyse d’actions. Nous allons également présenter le problème de reconnaissance d’activités par intervalle de temps, qui vise à explorer l’activité humaine dans un intervalle de temps court. Il a été démontré que l’analyse par intervalle de temps est utile pour la caractérisation des mouvements et en général pour l’analyse de contenus vidéo. Ces études nous encouragent à utiliser ces intervalles de temps afin d’analyser l’incertitude associée aux activités humaines. Nous allons détailler à quel degré de certitude chaque activité se produit au cours de la vidéo. Dans cette thèse, l’analyse par intervalle de temps d’activités humaines avec incertitudes sera structurée en 3 parties. i) Nous présentons une nouvelle famille de descripteurs spatiotemporels optimisés pour la prédiction précoce avec annotations d’intervalle de temps. Notre représentation prédictive du point d’intérêt spatiotemporel (Predict-STIP) est basée sur l’idée de la contingence entre intervalles de temps. ii) Nous exploitons des techniques de pointe pour extraire des points d’intérêts afin de représenter ces intervalles de temps. iii) Nous utilisons des relations (uniformes et par paires) basées sur les réseaux neuronaux convolutionnels entre les différentes parties du corps de l’individu dans chaque intervalle de temps. Les relations uniformes enregistrent l’apparence locale de la partie du corps tandis que les relations par paires captent les relations contextuelles locales entre les parties du corps. Nous extrayons les spécificités de chaque image dans l’intervalle de temps et examinons différentes façons de les agréger temporellement afin de générer un descripteur pour tout l’intervalle de temps. En outre, nous créons une nouvelle base de données qui est annotée à de multiples intervalles de temps courts, permettant la modélisation de l’incertitude inhérente à la reconnaissance d’activités par intervalle de temps. Les résultats expérimentaux montrent l’efficience de notre stratégie dans l’analyse des mouvements humains avec incertitude.Recognizing human activities from video data is routinely leveraged for surveillance and human-computer interaction applications. The main focus has been classifying videos into one of k action classes from fully observed videos. However, intelligent systems must to make decisions under uncertainty, and based on incomplete information. This need motivates us to introduce the problem of analysing the uncertainty associated with human activities and move to a new level of generality in the action analysis problem. We also present the problem of time-slice activity recognition which aims to explore human activity at a small temporal granularity. Time-slice recognition is able to infer human behaviours from a short temporal window. It has been shown that temporal slice analysis is helpful for motion characterization and for video content representation in general. These studies motivate us to consider timeslices for analysing the uncertainty associated with human activities. We report to what degree of certainty each activity is occurring throughout the video from definitely not occurring to definitely occurring. In this research, we propose three frameworks for time-slice analysis of dyadic human activity under uncertainty. i) We present a new family of spatio-temporal descriptors which are optimized for early prediction with time-slice action annotations. Our predictive spatiotemporal interest point (Predict-STIP) representation is based on the intuition of temporal contingency between time-slices. ii) we exploit state-of-the art techniques to extract interest points in order to represent time-slices. We also present an accumulative uncertainty to depict the uncertainty associated with partially observed videos for the task of early activity recognition. iii) we use Convolutional Neural Networks-based unary and pairwise relations between human body joints in each time-slice. The unary term captures the local appearance of the joints while the pairwise term captures the local contextual relations between the parts. We extract these features from each frame in a time-slice and examine different temporal aggregations to generate a descriptor for the whole time-slice. Furthermore, we create a novel dataset which is annotated at multiple short temporal windows, allowing the modelling of the inherent uncertainty in time-slice activity recognition. All the three methods have been evaluated on TAP dataset. Experimental results demonstrate the effectiveness of our framework in the analysis of dyadic activities under uncertaint

    Bi-temporal 3D active appearance models with applications to unsupervised ejection fraction estimation

    Get PDF
    Rapid and unsupervised quantitative analysis is of utmost importance to ensure clinical acceptance of many examinations using cardiac magnetic resonance imaging (MRI). We present a framework that aims at fulfilling these goals for the application of left ventricular ejection fraction estimation in four-dimensional MRI. The theoretical foundation of our work is the generative two-dimensional Active Appearance Models by Cootes et al., here extended to bi-temporal, three-dimensional models. Further issues treated include correction of respiratory induced slice displacements, systole detection, and a texture model pruning strategy. Cross-validation carried out on clinical-quality scans of twelve volunteers indicates that ejection fraction and cardiac blood pool volumes can be estimated automatically and rapidly with accuracy on par with typical inter-observer variability. \u

    Generative Interpretation of Medical Images

    Get PDF

    FPGA-Based Portable Ultrasound Scanning System with Automatic Kidney Detection

    Get PDF
    Bedsides diagnosis using portable ultrasound scanning (PUS) offering comfortable diagnosis with various clinical advantages, in general, ultrasound scanners suffer from a poor signal-to-noise ratio, and physicians who operate the device at point-of-care may not be adequately trained to perform high level diagnosis. Such scenarios can be eradicated by incorporating ambient intelligence in PUS. In this paper, we propose an architecture for a PUS system, whose abilities include automated kidney detection in real time. Automated kidney detection is performed by training the Viola–Jones algorithm with a good set of kidney data consisting of diversified shapes and sizes. It is observed that the kidney detection algorithm delivers very good performance in terms of detection accuracy. The proposed PUS with kidney detection algorithm is implemented on a single Xilinx Kintex-7 FPGA, integrated with a Raspberry Pi ARM processor running at 900 MHz

    Object Detection in 20 Years: A Survey

    Full text link
    Object detection, as of one the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection as a technical aesthetics under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of cold weapon era. This paper extensively reviews 400+ papers of object detection in the light of its technical evolution, spanning over a quarter-century's time (from the 1990s to 2019). A number of topics have been covered in this paper, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed up techniques, and the recent state of the art detection methods. This paper also reviews some important detection applications, such as pedestrian detection, face detection, text detection, etc, and makes an in-deep analysis of their challenges as well as technical improvements in recent years.Comment: This work has been submitted to the IEEE TPAMI for possible publicatio

    From 3D Models to 3D Prints: an Overview of the Processing Pipeline

    Get PDF
    Due to the wide diffusion of 3D printing technologies, geometric algorithms for Additive Manufacturing are being invented at an impressive speed. Each single step, in particular along the Process Planning pipeline, can now count on dozens of methods that prepare the 3D model for fabrication, while analysing and optimizing geometry and machine instructions for various objectives. This report provides a classification of this huge state of the art, and elicits the relation between each single algorithm and a list of desirable objectives during Process Planning. The objectives themselves are listed and discussed, along with possible needs for tradeoffs. Additive Manufacturing technologies are broadly categorized to explicitly relate classes of devices and supported features. Finally, this report offers an analysis of the state of the art while discussing open and challenging problems from both an academic and an industrial perspective.Comment: European Union (EU); Horizon 2020; H2020-FoF-2015; RIA - Research and Innovation action; Grant agreement N. 68044
    corecore