728 research outputs found

    A region-based image caption generator with refined descriptions

    Describing the content of an image is a challenging task. To enable detailed description, it requires the detection and recognition of objects, people, relationships and associated attributes. Currently, the majority of the existing research relies on holistic techniques, which may lose details relating to important aspects in a scene. In order to deal with such a challenge, we propose a novel region-based deep learning architecture for image description generation. It employs a regional object detector, recurrent neural network (RNN)-based attribute prediction, and an encoder–decoder language generator embedded with two RNNs to produce refined and detailed descriptions of a given image. Most importantly, the proposed system focuses on a local based approach to further improve upon existing holistic methods, which relates specifically to image regions of people and objects in an image. Evaluated with the IAPR TC-12 dataset, the proposed system shows impressive performance and outperforms state-of-the-art methods using various evaluation metrics. In particular, the proposed system shows superiority over existing methods when dealing with cross-domain indoor scene images

    Methods for Detecting and Classifying Weeds, Diseases and Fruits Using AI to Improve the Sustainability of Agricultural Crops: A Review

    The rapid growth of the world’s population has put significant pressure on agriculture to meet the increasing demand for food. In this context, agriculture faces multiple challenges, one of which is weed management. While herbicides have traditionally been used to control weed growth, their excessive and random use can lead to environmental pollution and herbicide resistance. To address these challenges, in the agricultural industry, deep learning models have become a possible tool for decision-making by using massive amounts of information collected from smart farm sensors. However, agriculture’s varied environments pose a challenge to testing and adopting new technology effectively. This study reviews recent advances in deep learning models and methods for detecting and classifying weeds to improve the sustainability of agricultural crops. The study compares performance metrics such as recall, accuracy, F1-Score, and precision, and highlights the adoption of novel techniques, such as attention mechanisms, single-stage detection models, and new lightweight models, which can enhance the model’s performance. The use of deep learning methods in weed detection and classification has shown great potential in improving crop yields and reducing adverse environmental impacts of agriculture. The reduction in herbicide use can prevent pollution of water, food, land, and the ecosystem and avoid the resistance of weeds to chemicals. This can help mitigate and adapt to climate change by minimizing agriculture’s environmental impact and improving the sustainability of the agricultural sector. In addition to discussing recent advances, this study also highlights the challenges faced in adopting new technology in agriculture and proposes novel techniques to enhance the performance of deep learning models. The study provides valuable insights into the latest advances and challenges in process systems engineering and technology for agricultural activities
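The four metrics the review compares (recall, accuracy, F1-score, and precision) can all be derived from the confusion counts of a binary classifier. The sketch below is only an illustration: the function name `classification_metrics` and the toy labels (1 = weed, 0 = crop) are assumptions made here, not data or code from the reviewed studies.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical ground truth and predictions: 1 = weed, 0 = crop.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision penalises crop plants flagged as weeds, while recall penalises missed weeds; F1 is their harmonic mean, so it drops when either one is poor.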

    Recent Advances in Image Restoration with Applications to Real World Problems

    In the past few decades, imaging hardware has improved tremendously in terms of resolution, enabling widespread use of images in many diverse applications on Earth and in planetary missions. However, practical issues associated with image acquisition still affect image quality. Some of these issues, such as blurring, measurement noise, mosaicing artifacts, and low spatial or spectral resolution, can seriously affect the accuracy of the aforementioned applications. This book intends to provide the reader with a glimpse of the latest developments and recent advances in image restoration, which includes image super-resolution, image fusion to enhance spatial, spectral, and temporal resolution, and the generation of synthetic images using deep learning techniques. Some practical applications are also included

    Implicit Ray-Transformers for Multi-view Remote Sensing Image Segmentation

    The mainstream CNN-based remote sensing (RS) image semantic segmentation approaches typically rely on massive labeled training data. Such a paradigm struggles with RS multi-view scene segmentation when only a few views are labeled, because it does not consider the 3D information within the scene. In this paper, we propose the "Implicit Ray-Transformer (IRT)", based on Implicit Neural Representation (INR), for RS scene semantic segmentation with sparse labels (such as 4-6 labels per 100 images). We explore a new way of introducing multi-view 3D structure priors to the task for accurate and view-consistent semantic segmentation. The proposed method includes a two-stage learning process. In the first stage, we optimize a neural field to encode the color and 3D structure of the remote sensing scene based on multi-view images. In the second stage, we design a Ray Transformer to leverage the relations between the neural field 3D features and 2D texture features for learning better semantic representations. Different from previous methods that only consider the 3D prior or 2D features, we incorporate additional 2D texture information and the 3D prior by broadcasting CNN features to different point features along the sampled ray. To verify the effectiveness of the proposed method, we construct a challenging dataset containing six synthetic sub-datasets collected from the Carla platform and three real sub-datasets from Google Maps. Experiments show that the proposed method outperforms the CNN-based methods and the state-of-the-art INR-based segmentation methods in quantitative and qualitative metrics

    Internet of Underwater Things and Big Marine Data Analytics -- A Comprehensive Survey

    The Internet of Underwater Things (IoUT) is an emerging communication ecosystem developed for connecting underwater objects in maritime and underwater environments. The IoUT technology is intricately linked with intelligent boats and ships, smart shores and oceans, automatic marine transportations, positioning and navigation, underwater exploration, disaster prediction and prevention, as well as with intelligent monitoring and security. The IoUT has an influence at various scales ranging from a small scientific observatory, to a midsized harbor, and to covering global oceanic trade. The network architecture of IoUT is intrinsically heterogeneous and should be sufficiently resilient to operate in harsh environments. This creates major challenges in terms of underwater communications, whilst relying on limited energy resources. Additionally, the volume, velocity, and variety of data produced by sensors, hydrophones, and cameras in IoUT is enormous, giving rise to the concept of Big Marine Data (BMD), which has its own processing challenges. Hence, conventional data processing techniques will falter, and bespoke Machine Learning (ML) solutions have to be employed for automatically learning the specific BMD behavior and features facilitating knowledge extraction and decision support. The motivation of this paper is to comprehensively survey the IoUT, BMD, and their synthesis. It also aims to explore the nexus of BMD with ML. We set out from underwater data collection and then discuss the family of IoUT data communication techniques with an emphasis on the state-of-the-art research challenges. We then review the suite of ML solutions suitable for BMD handling and analytics. We treat the subject deductively from an educational perspective, critically appraising the material surveyed. Comment: 54 pages, 11 figures, 19 tables, IEEE Communications Surveys & Tutorials, peer-reviewed academic journal

    The role of communities of practice in shaping modernisation: A case study of change, persistence, and survival in the UK cockle-fishing industry 2011-2018.

    Most Management and Organisational Studies research attempts to conceptualise Communities of Practice (CoP) within and between organisations. In contrast, with an empirical focus on a single community of practitioners experiencing what Salaman (1974) refers to as locally structured occupational work, out of the organisation spotlight, this thesis responds to the call to join the conversation about occupational communities as an adjoining branch of CoP theory (Nicolini et al., 2022). The aim is to yield insights about the social interaction and mechanisms of coordination employed by practitioners engaged in modernisation, in other words, a local occupational community (Salaman, 1974) whose work has been transformed by mechanisation and technology. To achieve this aim, a combination of different conceptual aspects of CoP and occupational communities are amalgamated to theorise about a community of practitioners who share a specific work situation albeit with contrasting and competing reference points. Adopting an interpretative approach, combining observation-interview techniques, field data was collected using a scheme of qualitative methods which included unrehearsed questioning of participants using photographs and loosely planned observations. More specifically, while developing an ethnographic case study, called Leigh-on-Sea Cockle Fishery, a collective of independent shellfish merchants who harvest cockle beds along the estuary of the river Thames were observed whilst undertaking their ordinary work of commercial shellfishing. The results are the product of observational analysis of the same group of participants over several annual fishing seasons (2011-2018). As such, the findings reveal a rich cultural description of the everyday work and drama that typifies small-scale fisheries in the UK.
The research shows that whilst the effect of modernisation on communities and their practice may well be transformational, the process of modernisation typically involves many intermediate steps. The findings also indicate that modernisation has become a salient element in the self-image of the UK shellfish merchant. Moreover, in the context of CoP and modernisation, but with an alternative formulation of occupational CoP, this study asserts that licence and mandate constitute a proprietary attribute of, to use Wenger’s (1998) term, a community’s shared repertoire. By liberating CoP from the conventional context in which they are enacted, namely organisations, the characterisation of occupational CoP as outlined in this study provides an alternative template for theorising about the dynamics of learning and/in work. Or, to make this point more strategically, by synthesising two adjacent literatures (CoP and occupational communities), this thesis can offer a nuanced theoretical perspective (Thatcher and Fisher, 2022) on divergent types of communities and their work practice which, in turn, may energise Management and Organisation scholars

    Soft Biometric Analysis: Multi-Person and Real-Time Pedestrian Attribute Recognition in Crowded Urban Environments

    Traditionally, recognition systems were based only on human hard biometrics. However, the ubiquitous CCTV cameras have raised the desire to analyze human biometrics from far distances, without people's attendance in the acquisition process. High-resolution face close-shots are rarely available at far distances, so face-based systems cannot provide reliable results in surveillance applications. Human soft biometrics such as body and clothing attributes are believed to be more effective in analyzing human data collected by security cameras. This thesis contributes to human soft biometric analysis in uncontrolled environments and mainly focuses on two tasks: Pedestrian Attribute Recognition (PAR) and person re-identification (re-id). We first review the literature of both tasks and highlight the history of advancements, recent developments, and the existing benchmarks. PAR and person re-id difficulties are due to significant distances between intra-class samples, which originate from variations in several factors such as body pose, illumination, background, occlusion, and data resolution. Recent state-of-the-art approaches present end-to-end models that can extract discriminative and comprehensive feature representations from people. The correlation between different regions of the body and dealing with limited learning data are also the objectives of many recent works. Moreover, class imbalance and correlation between human attributes are specific challenges associated with the PAR problem. We collect a large surveillance dataset to train a novel gender recognition model suitable for uncontrolled environments. We propose a deep residual network that extracts several pose-wise patches from samples and obtains a comprehensive feature representation. In the next step, we develop a model for multiple attribute recognition at once.
Considering the correlation between human semantic attributes and class imbalance, we respectively use a multi-task model and a weighted loss function. We also propose a multiplication layer on top of the backbone feature extraction layers to exclude the background features from the final representation of samples and draw the attention of the model to the foreground area. We address the problem of person re-id by implicitly defining the receptive fields of deep learning classification frameworks. The receptive fields of deep learning models determine the most significant regions of the input data for providing correct decisions. Therefore, we synthesize a set of learning data in which the destructive regions (e.g., background) in each pair of instances are interchanged. A segmentation module determines destructive and useful regions in each sample, and the labels of synthesized instances are inherited from the sample that shared the useful regions in the synthesized image. The synthesized learning data are then used in the learning phase and help the model rapidly learn that the identity and background regions are not correlated. Meanwhile, the proposed solution could be seen as a data augmentation approach that fully preserves the label information and is compatible with other data augmentation techniques. When re-id methods are learned in scenarios where the target person appears with identical garments in the gallery, the visual appearance of clothes is given the most importance in the final feature representation. Cloth-based representations are not reliable in long-term re-id settings, as people may change their clothes. Therefore, solutions that ignore clothing cues and focus on identity-relevant features are in demand. We transform the original data such that the identity-relevant information of people (e.g., face and body shape) is removed, while the identity-unrelated cues (i.e., color and texture of clothes) remain unchanged.
A model learned on the synthesized dataset predicts the identity-unrelated cues (short-term features). Therefore, we train a second model that is coupled with the first model and learns the embeddings of the original data such that the similarity between the embeddings of the original and synthesized data is minimized. This way, the second model predicts based on the identity-related (long-term) representation of people. To evaluate the performance of the proposed models, we use PAR and person re-id datasets, namely BIODI, PETA, RAP, Market1501, MSMTV2, PRCC, LTCC, and MIT, and compare our experimental results with state-of-the-art methods in the field. In conclusion, the data collected from surveillance cameras have low resolution, such that the extraction of hard biometric features is not possible, and face-based approaches produce poor results. In contrast, soft biometrics are robust to variations in data quality. So, we propose approaches both for PAR and person re-id to learn discriminative features from each instance and evaluate our proposed solutions on several publicly available benchmarks. This thesis was prepared at the University of Beira Interior, IT Instituto de Telecomunicações, Soft Computing and Image Analysis Laboratory (SOCIA Lab), Covilhã Delegation, and was submitted to the University of Beira Interior for defense in a public examination session
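The background-interchange augmentation described in the abstract above can be illustrated with a toy sketch. It is not the thesis's implementation: it assumes the segmentation mask is already given (1 = person, 0 = background), represents images as small nested lists of grey values, and the names `swap_background` and `synthesize` are hypothetical.

```python
def swap_background(img_a, mask_a, img_b):
    """Keep img_a where mask_a is 1 (person); take img_b's pixels elsewhere."""
    return [
        [a if m else b for a, m, b in zip(row_a, row_m, row_b)]
        for row_a, row_m, row_b in zip(img_a, mask_a, img_b)
    ]


def synthesize(sample_a, sample_b):
    """Build a sample with A's foreground on B's background.

    The label is inherited from sample_a, the sample that contributed
    the useful (identity) region.
    """
    image = swap_background(sample_a["image"], sample_a["mask"], sample_b["image"])
    return {"image": image, "label": sample_a["label"]}


# Toy 2x3 "images"; the mask stands in for a segmentation module's output.
sample_a = {
    "image": [[10, 11, 12], [13, 14, 15]],
    "mask": [[0, 1, 0], [0, 1, 1]],
    "label": "id_7",
}
sample_b = {
    "image": [[90, 91, 92], [93, 94, 95]],
    "mask": [[1, 0, 0], [1, 0, 0]],
    "label": "id_3",
}

synthetic = synthesize(sample_a, sample_b)
print(synthetic)
```

Because the identity label stays fixed while the background changes, a classifier trained on such samples gets no reward for relying on background pixels, which is the effect the abstract attributes to this augmentation.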