1,546 research outputs found

    Neural Architecture Search for Image Segmentation and Classification

    Deep learning (DL) is a class of machine learning algorithms that relies on deep neural networks (DNNs) for computations. Unlike traditional machine learning algorithms, DL can learn from raw data directly and effectively. Hence, DL has been successfully applied to tackle many real-world problems. When applying DL to a given problem, the primary task is designing the optimum DNN. This task relies heavily on human expertise, is time-consuming, and requires many trial-and-error experiments. This thesis aims to automate the laborious task of designing the optimum DNN by exploring the neural architecture search (NAS) approach. Here, we propose two new NAS algorithms for two real-world problems: pedestrian lane detection for assistive navigation and hyperspectral image segmentation for biosecurity scanning. We also introduce a new dataset-agnostic predictor of neural network performance, which can be used to speed up NAS algorithms that require the evaluation of candidate DNNs.

    Unveiling the frontiers of deep learning: innovations shaping diverse domains

    Deep learning (DL) enables the development of computer models that are capable of learning, visualizing, optimizing, refining, and predicting data. In recent years, DL has been applied in a range of fields, including audio-visual data processing, agriculture, transportation prediction, natural language, biomedicine, disaster management, bioinformatics, drug design, genomics, face recognition, and ecology. To explore the current state of deep learning, it is necessary to investigate the latest developments and applications of deep learning in these disciplines. However, the literature lacks a survey of deep learning applications across all potential sectors. This paper thus extensively investigates the potential applications of deep learning across all major fields of study as well as the associated benefits and challenges. As evidenced in the literature, DL exhibits accuracy in prediction and analysis, which makes it a powerful computational tool, and it can articulate and optimize itself, which makes it effective in processing data with no prior training. Even so, deep learning necessitates massive amounts of data for effective analysis and processing. To handle the challenge of compiling huge amounts of medical, scientific, healthcare, and environmental data for use in deep learning, gated architectures like LSTMs and GRUs can be utilized. For multimodal learning, shared neurons in the neural network for all activities and specialized neurons for particular tasks are necessary. Comment: 64 pages, 3 figures, 3 tables

    Deep learning in crowd counting: A survey

    Counting high-density objects quickly and accurately is a popular area of research. Crowd counting has significant social and economic value and is a major focus in artificial intelligence. Despite many advancements in this field, many are not widely known, especially with regard to research data. The authors proposed a three-tier standardised dataset taxonomy (TSDT). The taxonomy divides datasets into small-scale, large-scale and hyper-scale, according to different application scenarios. This taxonomy can help researchers make more efficient use of datasets and improve the performance of AI algorithms in specific fields. Additionally, the authors proposed a new evaluation index for the clarity of a dataset: average pixels occupied by each object (APO). This index is more suitable than image resolution for evaluating the clarity of a dataset in the object counting task. Moreover, the authors classified crowd counting methods from a data-driven perspective (multi-scale networks, single-column networks, multi-column networks, multi-task networks, attention networks and weakly supervised networks) and introduced the classic crowd counting methods of each class. The authors classified the 36 existing datasets according to the three-tier standardised dataset taxonomy and discussed and evaluated these datasets. The authors evaluated the performance of more than 100 methods from the past five years on different levels of popular datasets. Recently, progress in research on small-scale datasets has slowed down. There are few new datasets and algorithms for small-scale datasets. Studies focused on large-scale or hyper-scale datasets appear to be reaching a saturation point. The combined use of multiple approaches has become a major research direction. The authors discussed the theoretical and practical challenges of crowd counting from the perspective of data, algorithms and computing resources.
The field of crowd counting is moving towards combining multiple methods and requires fresh, targeted datasets. Despite advancements, the field still faces challenges such as handling real-world scenarios and processing large crowds in real-time. Researchers are exploring transfer learning to overcome the limitations of small datasets. The development of effective algorithms for crowd counting remains a challenging and important task in computer vision and AI, with many opportunities for future research. Funding: BHF, AA/18/3/34220; Hope Foundation for Cancer Research, RM60G0680; GCRF, P202PF11; Sino‐UK Industrial Fund, RP202G0289; LIAS, P202ED10, P202RE969; Data Science Enhancement Fund, P202RE237; Sino‐UK Education Fund, OP202006; Fight for Sight, 24NN201; Royal Society International Exchanges Cost Share Award, RP202G0230; MRC, MC_PC_17171; BBSRC, RM32G0178B
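
    The abstract above names the APO index (average pixels occupied by each object) but does not give its formula. The following is a minimal sketch under one plausible reading, namely total image pixels divided by total annotated objects across a dataset; the function name `average_pixels_per_object` and the `(width, height, object_count)` input format are assumptions for illustration, not the authors' definition.

```python
def average_pixels_per_object(images):
    """Estimate an APO-style index for a dataset.

    `images` is an iterable of (width, height, object_count) tuples.
    Assumed reading of APO: total pixels / total annotated objects.
    This is an illustrative interpretation, not the paper's formula.
    """
    total_pixels = 0
    total_objects = 0
    for width, height, object_count in images:
        total_pixels += width * height
        total_objects += object_count
    if total_objects == 0:
        raise ValueError("dataset contains no annotated objects")
    return total_pixels / total_objects


if __name__ == "__main__":
    # Two hypothetical images: 100x100 with 10 objects, 200x100 with 30 objects.
    dataset = [(100, 100, 10), (200, 100, 30)]
    print(average_pixels_per_object(dataset))
```

    Under this reading, two datasets with the same image resolution can differ widely in the index when their crowd densities differ, which is the property the abstract attributes to APO.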

    Deep learning for unsupervised domain adaptation in medical imaging: Recent advancements and future perspectives

    Deep learning has demonstrated remarkable performance across various tasks in medical imaging. However, these approaches primarily focus on supervised learning, assuming that the training and testing data are drawn from the same distribution. Unfortunately, this assumption may not always hold true in practice. To address this issue, unsupervised domain adaptation (UDA) techniques have been developed to transfer knowledge from a labeled domain to a related but unlabeled domain. In recent years, significant advancements have been made in UDA, resulting in a wide range of methodologies, including feature alignment, image translation, self-supervision, and disentangled representation methods, among others. In this paper, we provide a comprehensive literature review of recent deep UDA approaches in medical imaging from a technical perspective. Specifically, we categorize current UDA research in medical imaging into six groups and further divide them into finer subcategories based on the different tasks they perform. We also discuss the respective datasets used in the studies to assess the divergence between the different domains. Finally, we discuss emerging areas and provide insights on future research directions to conclude this survey. Comment: Under Review

    Knowledge Distillation and Continual Learning for Optimized Deep Neural Networks

    Over the past few years, deep learning (DL) has been achieving state-of-the-art performance on various human tasks such as speech generation, language translation, image segmentation, and object detection. While traditional machine learning models require hand-crafted features, deep learning algorithms can automatically extract discriminative features and learn complex knowledge from large datasets. This powerful learning ability makes deep learning models attractive to both academia and big corporations. Despite their popularity, deep learning methods still have two main limitations: large memory consumption and catastrophic knowledge forgetting. First, DL algorithms use very deep neural networks (DNNs) with billions of parameters, which results in a large model size and slow inference. This restricts the application of DNNs on resource-constrained devices such as mobile phones and autonomous vehicles. Second, DNNs are known to suffer from catastrophic forgetting: when a model incrementally learns new tasks, its performance on old tasks drops significantly. The ability to accommodate new knowledge while retaining previously learned knowledge is called continual learning. Since the real-world environments in which the model operates are always evolving, a robust neural network needs this continual learning ability to adapt to new changes.

    A review of technical factors to consider when designing neural networks for semantic segmentation of Earth Observation imagery

    Semantic segmentation (classification) of Earth Observation imagery is a crucial task in remote sensing. This paper presents a comprehensive review of technical factors to consider when designing neural networks for this purpose. The review focuses on Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), and transformer models, discussing prominent design patterns for these ANN families and their implications for semantic segmentation. Common pre-processing techniques for ensuring optimal data preparation are also covered. These include methods for image normalization and chipping, strategies for addressing data imbalance in training samples, and techniques for overcoming limited data, including augmentation, transfer learning, and domain adaptation. By encompassing both the technical aspects of neural network design and the data-related considerations, this review provides researchers and practitioners with a comprehensive and up-to-date understanding of the factors involved in designing effective neural networks for semantic segmentation of Earth Observation imagery. Comment: 145 pages with 32 figures

    RepViT: Revisiting Mobile CNN From ViT Perspective

    Recently, lightweight Vision Transformers (ViTs) have demonstrated superior performance and lower latency compared with lightweight Convolutional Neural Networks (CNNs) on resource-constrained mobile devices. This improvement is usually attributed to the multi-head self-attention module, which enables the model to learn global representations. However, the architectural disparities between lightweight ViTs and lightweight CNNs have not been adequately examined. In this study, we revisit the efficient design of lightweight CNNs and emphasize their potential for mobile devices. We incrementally enhance the mobile-friendliness of a standard lightweight CNN, specifically MobileNetV3, by integrating the efficient architectural choices of lightweight ViTs. This results in a new family of pure lightweight CNNs, namely RepViT. Extensive experiments show that RepViT outperforms existing state-of-the-art lightweight ViTs and exhibits favorable latency in various vision tasks. On ImageNet, RepViT achieves over 80% top-1 accuracy with nearly 1 ms latency on an iPhone 12, which is the first time for a lightweight model, to the best of our knowledge. Our largest model, RepViT-M3, obtains 81.4% accuracy with only 1.3 ms latency. The code and trained models are available at https://github.com/jameslahm/RepViT. Comment: 9 pages, 7 figures

    Vision Transformer Advanced by Exploring Intrinsic Inductive Bias

    Vision models have experienced a paradigm shift from convolutional neural networks (CNNs) to transformers. Compared with convolutions, transformers can capture both short- and long-range dependencies, making them more adaptable to extensive datasets. However, this adaptability comes at a cost: vision transformers are data-hungry and prone to overfitting with limited training data, restricting their applications in various vision tasks. This thesis aims to mitigate these shortcomings through advancements in architectural design and training methodologies, with a comprehensive assessment across various vision tasks. We investigate the data-hungry nature of transformers, which stems from their lack of inductive bias. Our proposed remedy incorporates convolution blocks alongside multi-head self-attention (MHSA) mechanisms within each transformer block. This integration injects inductive bias into the architecture, yielding the ViTAE model. Moreover, we present a self-supervised learning approach, RegionCL, which bolsters the training process by emphasizing local information via region swapping. In addition, a ViTPose-G model, based on ViTAE-G, is introduced and demonstrates exceptional performance in pose estimation tasks across various datasets.

    Ecological and evolutionary processes determining the structure of plant-pollinator networks

    Associations between flowers and pollinators are responsible for the reproduction of the majority of plant species as well as the food supply for a substantial part of animal diversity on Earth. Until recently, studies of plant-pollinator relationships focused predominantly on the pollination of particular plant species, with little or no emphasis on the community perspective. In recent decades, however, pollination ecology has shifted its focus to the community context by introducing so-called pollination networks. This approach allows us to view the ubiquity and complexity of the interactions between plants and their pollinators, and it has opened up many new opportunities to study pollination from the animal perspective or to assess spatio-temporal variability in the interactions. However, we still have only limited insight into the processes driving the structure and dynamics of such networks. The assembly of plants, pollinators and their interactions is driven by various ecological as well as evolutionary processes. From the ecological point of view, species co-occurrence in time and space may affect the interactions, and species flexibility across community contexts that provide different food sources may play a role. From the evolutionary perspective, species may have various co-adaptations due to their...
    Department of Zoology, Faculty of Science