13 research outputs found

    Spatial-Temporal Data Mining for Ocean Science: Data, Methodologies, and Opportunities

    Full text link
    With the increasing amount of spatial-temporal~(ST) ocean data, numerous spatial-temporal data mining (STDM) studies have been conducted to address various oceanic issues, e.g., climate forecasting and disaster warning. Compared with typical ST data (e.g., traffic data), ST ocean data is more complicated with some unique characteristics, e.g., diverse regionality and high sparsity. These characteristics make it difficult to design and train STDM models. Unfortunately, an overview of these studies is still missing, hindering computer scientists to identify the research issues in ocean while discouraging researchers in ocean science from applying advanced STDM techniques. To remedy this situation, we provide a comprehensive survey to summarize existing STDM studies in ocean. Concretely, we first summarize the widely-used ST ocean datasets and identify their unique characteristics. Then, typical ST ocean data quality enhancement techniques are discussed. Next, we classify existing STDM studies for ocean into four types of tasks, i.e., prediction, event detection, pattern mining, and anomaly detection, and elaborate the techniques for these tasks. Finally, promising research opportunities are highlighted. This survey will help scientists from the fields of both computer science and ocean science have a better understanding of the fundamental concepts, key techniques, and open challenges of STDM in ocean

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    Full text link
    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. Whereas remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, inevitably RS draws from many of the same theories as CV; e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, of advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent new developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as it relates to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing the DL.Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensin

    Artificial Intelligence for Multimedia Signal Processing

    Get PDF
    Artificial intelligence technologies are also actively applied to broadcasting and multimedia processing technologies. A lot of research has been conducted in a wide variety of fields, such as content creation, transmission, and security, and these attempts have been made in the past two to three years to improve image, video, speech, and other data compression efficiency in areas related to MPEG media processing technology. Additionally, technologies such as media creation, processing, editing, and creating scenarios are very important areas of research in multimedia processing and engineering. This book contains a collection of some topics broadly across advanced computational intelligence algorithms and technologies for emerging multimedia signal processing as: Computer vision field, speech/sound/text processing, and content analysis/information mining

    Classification of Compact Polarimetric Synthetic Aperture Radar Images

    Get PDF
    The RADARSAT Constellation Mission (RCM) was launched in June 2019. RCM, in addition to dual-polarization (DP) and fully quad-polarimetric (QP) imaging modes, provides compact polarimetric (CP) mode data. A CP synthetic aperture radar (SAR) is a coherent DP system in which a single circular polarization is transmitted followed by the reception in two orthogonal linear polarizations. A CP SAR fully characterizes the backscattered field using the Stokes parameters, or equivalently, the complex coherence matrix. This is the main advantage of a CP SAR over the traditional (non-coherent) DP SAR. Therefore, designing scene segmentation and classification methods using CP complex coherence matrix data is advocated in this thesis. Scene classification of remotely captured images is an important task in monitoring the Earth's surface. The high-resolution RCM CP SAR data can be used for land cover classification as well as sea-ice mapping. Mapping sea ice formed in ocean bodies is important for ship navigation and climate change modeling. The Canadian Ice Service (CIS) has expert ice analysts who manually generate sea-ice maps of Arctic areas on a daily basis. An automated sea-ice mapping process that can provide detailed yet reliable maps of ice types and water is desirable for CIS. In addition to linear DP SAR data in ScanSAR mode (500km), RCM wide-swath CP data (350km) can also be used in operational sea-ice mapping of the vast expanses in the Arctic areas. The smaller swath coverage of QP SAR data (50km) is the reason why the use of QP SAR data is limited for sea-ice mapping. This thesis involves the design and development of CP classification methods that consist of two steps: an unsupervised segmentation of CP data to identify homogeneous regions (superpixels) and a labeling step where a ground truth label is assigned to each super-pixel. An unsupervised segmentation algorithm is developed based on the existing Iterative Region Growing using Semantics (IRGS) for CP data and is called CP-IRGS. The constituents of feature model and spatial context model energy terms in CP-IRGS are developed based on the statistical properties of CP complex coherence matrix data. The superpixels generated by CP-IRGS are then used in a graph-based labeling method that incorporates the global spatial correlation among super-pixels in CP data. The classifications of sea-ice and land cover types using test scenes indicate that (a) CP scenes provide improved sea-ice classification than the linear DP scenes, (b) CP-IRGS performs more accurate segmentation than that using only CP channel intensity images, and (c) using global spatial information (provided by a graph-based labeling approach) provides an improvement in classification accuracy values over methods that do not exploit global spatial correlation

    Sensor Independent Deep Learning for Detection Tasks with Optical Satellites

    Get PDF
    The design of optical satellite sensors varies widely, and this variety is mirrored in the data they produce. Deep learning has become a popular method for automating tasks in remote sensing, but currently it is ill-equipped to deal with this diversity of satellite data. In this work, sensor independent deep learning models are proposed, which are able to ingest data from multiple satellites without retraining. This strategy is applied to two tasks in remote sensing: cloud masking and crater detection. For cloud masking, a new dataset---the largest ever to date with respect to the number of scenes---is created for Sentinel-2. Combination of this with other datasets from the Landsat missions results in a state-of-the-art deep learning model, capable of masking clouds on a wide array of satellites, including ones it was not trained on. For small crater detection on Mars, a dataset is also produced, and state-of-the-art deep learning approaches are compared. By combining datasets from sensors with different resolutions, a highly accurate sensor independent model is trained. This is used to produce the largest ever database of crater detections for any solar system body, comprising 5.5 million craters across Isidis Planitia, Mars using CTX imagery. Novel geospatial statistical techniques are used to explore this database of small craters, finding evidence for large populations of distant secondary impacts. Across these problems, sensor independence is shown to offer unique benefits, both regarding model performance and scientific outcomes, and in the future can aid in many problems relating to data fusion, time series analysis, and on-board applications. Further work on a wider range of problems is needed to determine the generalisability of the proposed strategies for sensor independence, and extension from optical sensors to other kinds of remote sensing instruments could expand the possible applications of this new technique

    Synthetic Aperture Radar (SAR) Meets Deep Learning

    Get PDF
    This reprint focuses on the application of the combination of synthetic aperture radars and depth learning technology. It aims to further promote the development of SAR image intelligent interpretation technology. A synthetic aperture radar (SAR) is an important active microwave imaging sensor, whose all-day and all-weather working capacity give it an important place in the remote sensing community. Since the United States launched the first SAR satellite, SAR has received much attention in the remote sensing community, e.g., in geological exploration, topographic mapping, disaster forecast, and traffic monitoring. It is valuable and meaningful, therefore, to study SAR-based remote sensing applications. In recent years, deep learning represented by convolution neural networks has promoted significant progress in the computer vision community, e.g., in face recognition, the driverless field and Internet of things (IoT). Deep learning can enable computational models with multiple processing layers to learn data representations with multiple-level abstractions. This can greatly improve the performance of various applications. This reprint provides a platform for researchers to handle the above significant challenges and present their innovative and cutting-edge research results when applying deep learning to SAR in various manuscript types, e.g., articles, letters, reviews and technical reports

    Human Activity Recognition (HAR) Using Wearable Sensors and Machine Learning

    Get PDF
    Humans engage in a wide range of simple and complex activities. Human Activity Recognition (HAR) is typically a classification problem in computer vision and pattern recognition, to recognize various human activities. Recent technological advancements, the miniaturization of electronic devices, and the deployment of cheaper and faster data networks have propelled environments augmented with contextual and real-time information, such as smart homes and smart cities. These context-aware environments, alongside smart wearable sensors, have opened the door to numerous opportunities for adding value and personalized services to citizens. Vision-based and sensory-based HAR find diverse applications in healthcare, surveillance, sports, event analysis, Human-Computer Interaction (HCI), rehabilitation engineering, occupational science, among others, resulting in significantly improved human safety and quality of life. Despite being an active research area for decades, HAR still faces challenges in terms of gesture complexity, computational cost on small devices, and energy consumption, as well as data annotation limitations. In this research, we investigate methods to sufficiently characterize and recognize complex human activities, with the aim to improving recognition accuracy, reducing computational cost and energy consumption, and creating a research-grade sensor data repository to advance research and collaboration. This research examines the feasibility of detecting natural human gestures in common daily activities. Specifically, we utilize smartwatch accelerometer sensor data and structured local context attributes and apply AI algorithms to determine the complex gesture activities of medication-taking, smoking, and eating. This dissertation is centered around modeling human activity and the application of machine learning techniques to implement automated detection of specific activities using accelerometer data from smartwatches. Our work stands out as the first in modeling human activity based on wearable sensors with a linguistic representation of grammar and syntax to derive clear semantics of complex activities whose alphabet comprises atomic activities. We apply machine learning to learn and predict complex human activities. We demonstrate the use of one of our unified models to recognize two activities using smartwatch: medication-taking and smoking. Another major part of this dissertation addresses the problem of HAR activity misalignment through edge-based computing at data origination points, leading to improved rapid data annotation, albeit with assumptions of subject fidelity in demarcating gesture start and end sections. Lastly, the dissertation describes a theoretical framework for the implementation of a library of shareable human activities. The results of this work can be applied in the implementation of a rich portal of usable human activity models, easily installable in handheld mobile devices such as phones or smart wearables to assist human agents in discerning daily living activities. This is akin to a social media of human gestures or capability models. The goal of such a framework is to domesticate the power of HAR into the hands of everyday users, as well as democratize the service to the public by enabling persons of special skills to share their skills or abilities through downloadable usable trained models

    The role of context in image annotation and recommendation

    Get PDF
    With the rise of smart phones, lifelogging devices (e.g. Google Glass) and popularity of image sharing websites (e.g. Flickr), users are capturing and sharing every aspect of their life online producing a wealth of visual content. Of these uploaded images, the majority are poorly annotated or exist in complete semantic isolation making the process of building retrieval systems difficult as one must firstly understand the meaning of an image in order to retrieve it. To alleviate this problem, many image sharing websites offer manual annotation tools which allow the user to “tag” their photos, however, these techniques are laborious and as a result have been poorly adopted; Sigurbjörnsson and van Zwol (2008) showed that 64% of images uploaded to Flickr are annotated with < 4 tags. Due to this, an entire body of research has focused on the automatic annotation of images (Hanbury, 2008; Smeulders et al., 2000; Zhang et al., 2012a) where one attempts to bridge the semantic gap between an image’s appearance and meaning e.g. the objects present. Despite two decades of research the semantic gap still largely exists and as a result automatic annotation models often offer unsatisfactory performance for industrial implementation. Further, these techniques can only annotate what they see, thus ignoring the “bigger picture” surrounding an image (e.g. its location, the event, the people present etc). Much work has therefore focused on building photo tag recommendation (PTR) methods which aid the user in the annotation process by suggesting tags related to those already present. These works have mainly focused on computing relationships between tags based on historical images e.g. that NY and timessquare co-exist in many images and are therefore highly correlated. However, tags are inherently noisy, sparse and ill-defined often resulting in poor PTR accuracy e.g. does NY refer to New York or New Year? This thesis proposes the exploitation of an image’s context which, unlike textual evidences, is always present, in order to alleviate this ambiguity in the tag recommendation process. Specifically we exploit the “what, who, where, when and how” of the image capture process in order to complement textual evidences in various photo tag recommendation and retrieval scenarios. In part II, we combine text, content-based (e.g. # of faces present) and contextual (e.g. day-of-the-week taken) signals for tag recommendation purposes, achieving up to a 75% improvement to precision@5 in comparison to a text-only TF-IDF baseline. We then consider external knowledge sources (i.e. Wikipedia & Twitter) as an alternative to (slower moving) Flickr in order to build recommendation models on, showing that similar accuracy could be achieved on these faster moving, yet entirely textual, datasets. In part II, we also highlight the merits of diversifying tag recommendation lists before discussing at length various problems with existing automatic image annotation and photo tag recommendation evaluation collections. In part III, we propose three new image retrieval scenarios, namely “visual event summarisation”, “image popularity prediction” and “lifelog summarisation”. In the first scenario, we attempt to produce a rank of relevant and diverse images for various news events by (i) removing irrelevant images such memes and visual duplicates (ii) before semantically clustering images based on the tweets in which they were originally posted. Using this approach, we were able to achieve over 50% precision for images in the top 5 ranks. In the second retrieval scenario, we show that by combining contextual and content-based features from images, we are able to predict if it will become “popular” (or not) with 74% accuracy, using an SVM classifier. Finally, in chapter 9 we employ blur detection and perceptual-hash clustering in order to remove noisy images from lifelogs, before combining visual and geo-temporal signals in order to capture a user’s “key moments” within their day. We believe that the results of this thesis show an important step towards building effective image retrieval models when there lacks sufficient textual content (i.e. a cold start)

    Computer vision based classification of fruits and vegetables for self-checkout at supermarkets

    Get PDF
    The field of machine learning, and, in particular, methods to improve the capability of machines to perform a wider variety of generalised tasks are among the most rapidly growing research areas in today’s world. The current applications of machine learning and artificial intelligence can be divided into many significant fields namely computer vision, data sciences, real time analytics and Natural Language Processing (NLP). All these applications are being used to help computer based systems to operate more usefully in everyday contexts. Computer vision research is currently active in a wide range of areas such as the development of autonomous vehicles, object recognition, Content Based Image Retrieval (CBIR), image segmentation and terrestrial analysis from space (i.e. crop estimation). Despite significant prior research, the area of object recognition still has many topics to be explored. This PhD thesis focuses on using advanced machine learning approaches to enable the automated recognition of fresh produce (i.e. fruits and vegetables) at supermarket self-checkouts. This type of complex classification task is one of the most recently emerging applications of advanced computer vision approaches and is a productive research topic in this field due to the limited means of representing the features and machine learning techniques for classification. Fruits and vegetables offer significant inter and intra class variance in weight, shape, size, colour and texture which makes the classification challenging. The applications of effective fruit and vegetable classification have significant importance in daily life e.g. crop estimation, fruit classification, robotic harvesting, fruit quality assessment, etc. One potential application for this fruit and vegetable classification capability is for supermarket self-checkouts. Increasingly, supermarkets are introducing self-checkouts in stores to make the checkout process easier and faster. However, there are a number of challenges with this as all goods cannot readily be sold with packaging and barcodes, for instance loose fresh items (e.g. fruits and vegetables). Adding barcodes to these types of items individually is impractical and pre-packaging limits the freedom of choice when selecting fruits and vegetables and creates additional waste, hence reducing customer satisfaction. The current situation, which relies on customers correctly identifying produce themselves leaves open the potential for incorrect billing either due to inadvertent error, or due to intentional fraudulent misclassification resulting in financial losses for the store. To address this identified problem, the main goals of this PhD work are: (a) exploring the types of visual and non-visual sensors that could be incorporated into a self-checkout system for classification of fruits and vegetables, (b) determining a suitable feature representation method for fresh produce items available at supermarkets, (c) identifying optimal machine learning techniques for classification within this context and (d) evaluating our work relative to the state-of-the-art object classification results presented in the literature. An in-depth analysis of related computer vision literature and techniques is performed to identify and implement the possible solutions. A progressive process distribution approach is used for this project where the task of computer vision based fruit and vegetables classification is divided into pre-processing and classification techniques. Different classification techniques have been implemented and evaluated as possible solution for this problem. Both visual and non-visual features of fruit and vegetables are exploited to perform the classification. Novel classification techniques have been carefully developed to deal with the complex and highly variant physical features of fruit and vegetables while taking advantages of both visual and non-visual features. The capability of classification techniques is tested in individual and ensemble manner to achieved the higher effectiveness. Significant results have been obtained where it can be concluded that the fruit and vegetables classification is complex task with many challenges involved. It is also observed that a larger dataset can better comprehend the complex variant features of fruit and vegetables. Complex multidimensional features can be extracted from the larger datasets to generalise on higher number of classes. However, development of a larger multiclass dataset is an expensive and time consuming process. The effectiveness of classification techniques can be significantly improved by subtracting the background occlusions and complexities. It is also worth mentioning that ensemble of simple and less complicated classification techniques can achieve effective results even if applied to less number of features for smaller number of classes. The combination of visual and nonvisual features can reduce the struggle of a classification technique to deal with higher number of classes with similar physical features. Classification of fruit and vegetables with similar physical features (i.e. colour and texture) needs careful estimation and hyper-dimensional embedding of visual features. Implementing rigorous classification penalties as loss function can achieve this goal at the cost of time and computational requirements. There is a significant need to develop larger datasets for different fruit and vegetables related computer vision applications. Considering more sophisticated loss function penalties and discriminative hyper-dimensional features embedding techniques can significantly improve the effectiveness of the classification techniques for the fruit and vegetables applications
    corecore