159 research outputs found

    Region-based representations of image and video: segmentation tools for multimedia services

    This paper discusses region-based representations of image and video that are useful for multimedia services such as those supported by the MPEG-4 and MPEG-7 standards. Classical tools related to the generation of region-based representations are discussed. After a description of the main processing steps and the corresponding choices in terms of feature spaces, decision spaces, and decision algorithms, the state of the art in segmentation is reviewed, focusing mainly on tools useful in the context of the MPEG-4 and MPEG-7 standards. The review is structured around the strategies used by the algorithms (transition-based or homogeneity-based) and the decision spaces (spatial, spatio-temporal, and temporal). The second part of this paper proposes a partition tree representation of images and introduces a processing strategy that involves a similarity estimation step followed by a partition creation step. This strategy tries to find a compromise between what can be done in a systematic and universal way and what has to be application-dependent. It is shown in particular how a single partition tree created with an extremely simple similarity feature can support a large number of segmentation applications: spatial segmentation, motion estimation, region-based coding, semantic object extraction, and region-based retrieval. Peer reviewed. Postprint (published version).
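A minimal one-dimensional sketch of the two-step strategy (similarity estimation, then partition creation), assuming that the "extremely simple similarity feature" is the absolute difference of region means; the abstract does not name the feature, so this is an assumption. Adjacent regions are merged greedily to build a binary tree, and a partition is then read off by cutting the tree at a threshold. The names `Node`, `build_partition_tree` and `cut` are hypothetical, and an image version would merge regions over a 2D adjacency graph rather than a 1D sequence.

```python
from dataclasses import dataclass

@dataclass
class Node:
    """One node of the partition tree: a contiguous run of samples."""
    start: int            # first sample index (inclusive)
    end: int              # last sample index (exclusive)
    mean: float           # mean value of the samples in the region
    size: int             # number of samples in the region
    children: tuple = ()  # the two merged sub-regions; empty for leaves

def build_partition_tree(signal):
    """Similarity step: greedily merge the most similar pair of adjacent
    regions until a single root region remains."""
    regions = [Node(i, i + 1, float(v), 1) for i, v in enumerate(signal)]
    while len(regions) > 1:
        # Assumed similarity feature: absolute difference of region means.
        k = min(range(len(regions) - 1),
                key=lambda i: abs(regions[i].mean - regions[i + 1].mean))
        a, b = regions[k], regions[k + 1]
        size = a.size + b.size
        merged = Node(a.start, b.end,
                      (a.mean * a.size + b.mean * b.size) / size, size, (a, b))
        regions[k:k + 2] = [merged]  # replace the pair by their union
    return regions[0]

def cut(node, threshold):
    """Partition step: split a node only while its two children differ by
    more than `threshold`; return the resulting regions as (start, end)."""
    if node.children:
        left, right = node.children
        if abs(left.mean - right.mean) > threshold:
            return cut(left, threshold) + cut(right, threshold)
    return [(node.start, node.end)]
```

For the signal `[1, 1, 1, 9, 9, 9]`, `cut(tree, 4)` returns `[(0, 3), (3, 6)]`, while a threshold of 10 keeps the whole signal as one region. The same tree serves both cuts, which mirrors the idea that one tree can support several applications.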

    A comprehensive review of fruit and vegetable classification techniques

    Recent advancements in computer vision have enabled wide-ranging applications in almost every field. One such application area is fresh produce classification, but the classification of fruits and vegetables has proven to be a complex problem that needs further development. Fruit and vegetable classification presents significant challenges due to interclass similarities and irregular intraclass characteristics. Selecting appropriate data acquisition sensors and feature representation approaches is also crucial given the huge diversity of the field. Fruit and vegetable classification methods have been developed for quality assessment and robotic harvesting, but the current state of the art covers only a limited number of classes and small datasets. The problem is multi-dimensional and yields very high-dimensional features, which is one of the major challenges for current machine learning approaches. Substantial research has been conducted on the design and analysis of classifiers for such high-dimensional features, which require significant computational power to optimise. In recent years, numerous machine learning techniques, for example Support Vector Machine (SVM), K-Nearest Neighbour (KNN), Decision Trees, Artificial Neural Networks (ANN) and Convolutional Neural Networks (CNN), have been combined with many different feature description methods for fruit and vegetable classification in many real-life applications. This paper presents a critical comparison of different state-of-the-art computer vision methods proposed by researchers for classifying fruits and vegetables.
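As an illustration of one of the listed techniques, a minimal K-Nearest Neighbour (KNN) classifier over hand-crafted feature vectors might look like the sketch below. The two-dimensional "mean hue, mean saturation" features and the sample values in `train` are hypothetical illustrations, not data from any reviewed study, and `knn_predict` is a hypothetical name.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    train: list of (feature_vector, label) pairs
    query: feature vector of the same length
    """
    # Sort training samples by Euclidean distance to the query.
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # On equal counts, most_common keeps first-encountered order,
    # so the label of the closer sample wins the tie.
    return votes.most_common(1)[0][0]

# Hypothetical (mean hue, mean saturation) features, for illustration only.
train = [((0.05, 0.80), "tomato"), ((0.07, 0.75), "tomato"),
         ((0.30, 0.60), "lime"),   ((0.28, 0.65), "lime"),
         ((0.12, 0.90), "orange"), ((0.13, 0.85), "orange")]
```

With these values, `knn_predict(train, (0.29, 0.62))` returns `"lime"`. The sketch also hints at why high-dimensional features are costly: every query computes a distance to every training sample.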

    Advanced Strategies for Robot Manipulators

    Amongst robotic systems, robot manipulators have proven to be of increasing importance and are widely adopted as substitutes for humans in repetitive and/or hazardous tasks. Modern manipulators have complex designs and must perform more precise, crucial and critical tasks. Simple traditional control methods therefore cannot be efficient, and advanced control strategies that take special constraints into account need to be established. Although groundbreaking research has been carried out in this area, many novel aspects remain to be explored.

    Evaluating Spatial Understanding of Large Language Models

    Large language models (LLMs) show remarkable capabilities across a variety of tasks. Although the models see only text during training, several recent studies suggest that LLM representations implicitly capture aspects of the underlying grounded concepts. Here, we explore LLM representations of a particularly salient kind of grounded knowledge: spatial relationships. We design natural-language navigation tasks and evaluate the ability of LLMs, in particular GPT-3.5-turbo, GPT-4, and Llama2 series models, to represent and reason about spatial structures, and compare these abilities to human performance on the same tasks. These tasks reveal substantial variability in LLM performance across different spatial structures, including square, hexagonal, and triangular grids, rings, and trees. We also discover that, similar to humans, LLMs use object names as landmarks for maintaining spatial maps. Finally, in extensive error analysis, we find that LLMs' mistakes reflect both spatial and non-spatial factors. These findings suggest that LLMs appear to capture certain aspects of spatial structure implicitly, but room for improvement remains.

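    Research on High-Speed Phenotyping Using Time-Series Images of Crops Captured in Natural Environments (自然環境下で撮影した作物時系列画像を用いた高速フェノタイピングに関する研究)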

    Degree type: Doctorate (through the doctoral program), University of Tokyo.

    Developing advanced mathematical models for detecting abnormalities in 2D/3D medical structures.

    Detecting abnormalities in two-dimensional (2D) and three-dimensional (3D) medical structures is among the most interesting and challenging research areas in the medical imaging field. Obtaining the desired accurate automated quantification of abnormalities in medical structures is still very challenging. This is due to a large and constantly growing number of different objects of interest and associated abnormalities, large variations of their appearances and shapes in images, different medical imaging modalities, and associated changes of signal homogeneity and noise for each object. The main objective of this dissertation is to address these problems and to provide proper mathematical models and techniques that are capable of analyzing low- and high-resolution medical data and providing an accurate, automated analysis of the abnormalities in medical structures in terms of their area/volume, shape, and associated abnormal functionality. This dissertation presents different preliminary mathematical models and techniques that are applied in three case studies: (i) detecting abnormal tissue in the left ventricle (LV) wall of the heart from delayed contrast-enhanced cardiac magnetic resonance images (MRI), (ii) detecting local cardiac diseases based on estimating the functional strain metric from cardiac cine MRI, and (iii) identifying the abnormalities in the corpus callosum (CC) brain structure (the largest fiber bundle that connects the two hemispheres of the brain) in subjects who suffer from developmental brain disorders. For detecting the abnormal tissue in the heart, a graph-cut mathematical optimization model with a cost function that accounts for the object’s visual appearance and shape is used to segment the inner cavity. The model is further integrated with a geometric model (i.e., a fast marching level set model) to segment the outer border of the myocardial wall (the LV). 
Then the abnormal tissue in the myocardial wall (also called dead tissue, pathological tissue, or infarct area) is identified based on a joint Markov-Gibbs random field (MGRF) model of the image and its region (segmentation) map that accounts for the pixel intensities and the spatial interactions between the pixels. Experiments with real in vivo data and comparisons with ground truth (identified by a radiologist) and other approaches showed that the proposed framework can accurately detect the pathological tissue and can provide useful metrics for radiologists and clinicians. To estimate the strain from cardiac cine MRI, a novel method based on tracking the LV wall geometry is proposed. To achieve this goal, a partial differential equation (PDE) method is applied to track the LV wall points by solving the Laplace equation between the LV contours of every two successive image frames over the cardiac cycle. The main advantage of the proposed tracking method over traditional texture-based methods is its ability to track the movement and rotation of the LV wall based on the geometric features of the inner, mid-, and outer walls of the LV. This makes the tracking robust to noise arising from the scanner and from heart motion. To identify the abnormalities in the CC from brain MRI, the CCs are aligned using a rigid registration model and segmented using a shape-appearance model. They are then mapped to a simple unified space for analysis. This work introduces a novel cylindrical mapping model, which is conformal and bijective (i.e., a one-to-one transformation), that enables accurate 3D shape analysis of the CC in the cylindrical domain. The framework can detect abnormalities in all divisions of the CC (i.e., splenium, rostrum, genu and body). In addition, it offers a full 3D analysis of the CC abnormalities instead of the area-based analysis used by previous groups. 
The initial classification results based on the centerline length and CC thickness suggest that the proposed CC shape analysis is a promising supplement to current techniques for diagnosing dyslexia. The techniques proposed in this dissertation have been successfully tested on complex synthetic and MR images and can be used to advantage in many of today’s clinical applications of computer-assisted medical diagnostics and intervention.
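The PDE-based tracking step described above rests on solving Laplace's equation on the region between two nested contours, with the potential fixed at 0 on the inner contour and 1 on the outer one. The sketch below illustrates only that step, under simplifying assumptions: synthetic concentric circles stand in for LV contours from successive frames, and plain Jacobi iteration is used as the solver. Both choices, and the name `laplace_between_contours`, are assumptions rather than the dissertation's implementation. Point correspondences would then be obtained by following streamlines of the potential's gradient from one contour to the other, which is not shown.

```python
import numpy as np

def laplace_between_contours(inner_mask, outer_mask, iters=2000):
    """Solve Laplace's equation on the region between two nested contours.

    inner_mask: True on and inside the first contour (fixed value 0)
    outer_mask: True on and outside the second contour (fixed value 1)
    All remaining cells form the domain and are updated by Jacobi iteration.
    """
    phi = np.where(outer_mask, 1.0, 0.0)
    domain = ~(inner_mask | outer_mask)
    for _ in range(iters):
        # 5-point stencil: each domain cell becomes the mean of its 4 neighbours.
        # np.roll wraps around the array edges, which is harmless here because
        # the edge cells belong to outer_mask and are never updated.
        avg = 0.25 * (np.roll(phi, 1, axis=0) + np.roll(phi, -1, axis=0)
                      + np.roll(phi, 1, axis=1) + np.roll(phi, -1, axis=1))
        phi = np.where(domain, avg, phi)
    return phi

# Synthetic "contours": two concentric circles (radii 5 and 15) on a 41 x 41 grid.
n, c = 41, 20
yy, xx = np.mgrid[:n, :n]
r = np.hypot(xx - c, yy - c)
phi = laplace_between_contours(r <= 5, r >= 15)
```

For concentric circles the continuous solution is ln(r / 5) / ln(3), so the value at radius 10 should come out close to ln 2 / ln 3 (about 0.63), up to the staircase error of the discretised circles. Because the potential rises monotonically from one contour to the other, its gradient lines give a geometry-based rather than texture-based correspondence, which is the property the abstract highlights.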

    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Deployed image classification pipelines typically depend on images captured in real-world environments. This means that images might be affected by different sources of perturbation (e.g. sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification tasks. This challenge has therefore attracted wide interest within the computer vision community. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of noise in the test set that was not seen during training. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. This operation transforms an input image into a space that is more robust to noise before it is processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. The proposed CORF-augmented pipeline achieved results on noise-free images comparable to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.
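The evaluation protocol, in which test images are perturbed with different levels of Gaussian and uniform noise, can be sketched as follows. The abstract does not say how the noise levels are parameterised, so this sketch assumes intensities scaled to [0, 1], interprets `level` as the standard deviation for Gaussian noise and as the half-width of the interval for uniform noise, and clips the result to the valid range. The function name `perturb` is hypothetical, and the CORF operator itself is not reproduced here.

```python
import numpy as np

def perturb(images, kind, level, seed=0):
    """Return a noisy copy of `images` (float intensities in [0, 1]).

    kind="gaussian": add zero-mean Gaussian noise with standard deviation `level`
    kind="uniform":  add noise drawn uniformly from [-level, level)
    The result is clipped back to the valid [0, 1] intensity range.
    """
    rng = np.random.default_rng(seed)  # fixed seed for reproducible test sets
    if kind == "gaussian":
        noise = rng.normal(0.0, level, size=images.shape)
    elif kind == "uniform":
        noise = rng.uniform(-level, level, size=images.shape)
    else:
        raise ValueError(f"unknown noise kind: {kind}")
    return np.clip(images + noise, 0.0, 1.0)
```

Looping over both kinds and several levels (for instance 0.05, 0.1 and 0.2, chosen here only for illustration) yields a grid of perturbed test sets on which a plain classifier and one with a preprocessing step can be compared, while training data stays noise-free.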

    Computer vision based classification of fruits and vegetables for self-checkout at supermarkets

    The field of machine learning, and in particular methods to improve the capability of machines to perform a wider variety of generalised tasks, is among the most rapidly growing research areas in today’s world. Current applications of machine learning and artificial intelligence can be divided into several significant fields, namely computer vision, data science, real-time analytics and Natural Language Processing (NLP). All these applications are being used to help computer-based systems operate more usefully in everyday contexts. Computer vision research is currently active in a wide range of areas such as the development of autonomous vehicles, object recognition, Content Based Image Retrieval (CBIR), image segmentation and terrestrial analysis from space (e.g. crop estimation). Despite significant prior research, object recognition still has many topics to be explored. This PhD thesis focuses on using advanced machine learning approaches to enable the automated recognition of fresh produce (i.e. fruits and vegetables) at supermarket self-checkouts. This type of complex classification task is one of the most recently emerging applications of advanced computer vision and is a productive research topic in this field, owing to the limited means currently available for feature representation and for machine learning-based classification. Fruits and vegetables exhibit significant inter- and intra-class variance in weight, shape, size, colour and texture, which makes classification challenging. Effective fruit and vegetable classification has significant applications in daily life, e.g. crop estimation, fruit classification, robotic harvesting and fruit quality assessment. One potential application of this capability is supermarket self-checkouts. Increasingly, supermarkets are introducing self-checkouts in stores to make the checkout process easier and faster. 
However, this poses a number of challenges, as not all goods can readily be sold with packaging and barcodes, for instance loose fresh items (e.g. fruits and vegetables). Adding barcodes to these items individually is impractical, and pre-packaging limits the freedom of choice when selecting fruits and vegetables and creates additional waste, hence reducing customer satisfaction. The current situation, which relies on customers correctly identifying produce themselves, leaves open the potential for incorrect billing, either through inadvertent error or through intentional fraudulent misclassification, resulting in financial losses for the store. To address this problem, the main goals of this PhD work are: (a) exploring the types of visual and non-visual sensors that could be incorporated into a self-checkout system for classification of fruits and vegetables, (b) determining a suitable feature representation method for fresh produce items available at supermarkets, (c) identifying optimal machine learning techniques for classification within this context and (d) evaluating our work relative to the state-of-the-art object classification results presented in the literature. An in-depth analysis of related computer vision literature and techniques is performed to identify and implement possible solutions. A progressive process distribution approach is used for this project, in which the task of computer vision based fruit and vegetable classification is divided into pre-processing and classification techniques. Different classification techniques have been implemented and evaluated as possible solutions to this problem. Both visual and non-visual features of fruits and vegetables are exploited to perform the classification. Novel classification techniques have been carefully developed to deal with the complex and highly variant physical features of fruits and vegetables while taking advantage of both visual and non-visual features. 
The classification techniques are tested both individually and as ensembles to achieve higher effectiveness. Significant results have been obtained, from which it can be concluded that fruit and vegetable classification is a complex task with many challenges involved. It is also observed that a larger dataset can better capture the complex, variant features of fruits and vegetables. Complex multidimensional features can be extracted from larger datasets to generalise to a higher number of classes. However, developing a larger multiclass dataset is an expensive and time-consuming process. The effectiveness of classification techniques can be significantly improved by removing background occlusions and complexities. It is also worth mentioning that an ensemble of simple, less complicated classification techniques can achieve effective results even when applied to fewer features for a smaller number of classes. Combining visual and non-visual features can make it easier for a classification technique to deal with a higher number of classes with similar physical features. Classifying fruits and vegetables with similar physical features (such as colour and texture) requires careful estimation and hyper-dimensional embedding of visual features. Implementing rigorous classification penalties as loss functions can achieve this goal at the cost of time and computational requirements. There is a significant need to develop larger datasets for different computer vision applications related to fruits and vegetables. Considering more sophisticated loss-function penalties and discriminative hyper-dimensional feature embedding techniques can significantly improve the effectiveness of classification techniques for fruit and vegetable applications.
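The observation that an ensemble of simple classifiers, drawing on both visual and non-visual cues, can be effective is illustrated by the plurality-voting sketch below. The three base classifiers, their features (mean hue, weight from a scale, a skin-roughness score) and their thresholds are hypothetical placeholders, not the techniques developed in the thesis; `majority_vote` is likewise a hypothetical name.

```python
from collections import Counter

def majority_vote(classifiers, sample):
    """Combine simple classifiers by plurality voting.

    classifiers: callables mapping a sample to a class label
    On equal counts, the label predicted first wins, because
    Counter.most_common keeps first-encountered order for ties.
    """
    votes = Counter(clf(sample) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Hypothetical base classifiers, each relying on a single cue.
def by_hue(s):       # visual cue: mean hue in [0, 1]
    return "lime" if s["hue"] > 0.2 else "orange"

def by_weight(s):    # non-visual cue: weight in grams from a scale
    return "orange" if s["weight_g"] > 100 else "lime"

def by_texture(s):   # visual cue: hypothetical skin-roughness score
    return "orange" if s["roughness"] > 0.5 else "lime"
```

For a sample with hue 0.15, weight 60 g and roughness 0.3, the hue rule votes "orange" but the weight and texture rules vote "lime", so `majority_vote([by_hue, by_weight, by_texture], sample)` returns "lime". The non-visual weight cue is what outvotes the ambiguous colour reading in this toy case.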