46 research outputs found

    Unconstrained Aerial Scene Recognition with Deep Neural Networks and a New Dataset

    Aerial scene recognition is a fundamental research problem in interpreting high-resolution aerial imagery. Over the past few years, most studies have focused on classifying an image into a single scene category, while in real-world scenarios a single image more often contains multiple scenes. Therefore, in this paper, we investigate a more practical yet underexplored task: multi-scene recognition in single images. To this end, we create a large-scale dataset, called the MultiScene dataset, composed of 100,000 unconstrained images, each with multiple labels from 36 different scenes. Among these images, 14,000 are manually interpreted and assigned ground-truth labels, while the remaining images are provided with crowdsourced labels generated from low-cost but noisy OpenStreetMap (OSM) data. By doing so, our dataset allows two branches of study: 1) developing novel CNNs for multi-scene recognition and 2) learning with noisy labels. We experiment with extensive baseline models on our dataset to offer a benchmark for multi-scene recognition in single images. To expedite further research, we will make our dataset and pre-trained models available.

    FuTH-Net: Fusing Temporal Relations and Holistic Features for Aerial Video Classification

    Unmanned aerial vehicles (UAVs) are now widely applied to data acquisition due to their low cost and high mobility. With the increasing volume of aerial videos, the demand for automatically parsing these videos is surging. To achieve this, current research mainly focuses on extracting a holistic feature with convolutions along both spatial and temporal dimensions. However, these methods are limited by small temporal receptive fields and cannot adequately capture long-term temporal dependencies, which are important for describing complicated dynamics. In this article, we propose a novel deep neural network, termed Fusing Temporal relations and Holistic features for aerial video classification (FuTH-Net), to model not only holistic features but also temporal relations. More specifically, FuTH-Net employs a two-pathway architecture: 1) a holistic representation pathway to learn a general feature of both frame appearances and short-term temporal variations and 2) a temporal relation pathway to capture multiscale temporal relations across arbitrary frames, providing long-term temporal dependencies. Afterward, a novel fusion module spatiotemporally integrates the two features, refining the holistic features with the multiscale temporal relations to yield more discriminative video representations. Our model is evaluated on two aerial video classification datasets, ERA and Drone-Action, and achieves state-of-the-art results. This demonstrates its effectiveness and good generalization capacity across different recognition tasks (event classification and human action recognition). To facilitate further research, we release the code at https://gitlab.lrz.de/ai4eo/reasoning/futh-net
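The two-pathway design can be caricatured in a few lines of NumPy. This is a conceptual sketch only, not FuTH-Net: the per-frame backbone features, the choice of strides, and fusion by simple concatenation are all simplifying assumptions made for illustration.

```python
import numpy as np

def two_pathway_descriptor(frames, strides=(1, 2, 4)):
    """frames: (T, C) per-frame feature vectors (hypothetical backbone output).
    Returns a fused video descriptor combining a holistic feature with
    multiscale temporal-relation features."""
    holistic = frames.mean(axis=0)  # holistic pathway: average appearance
    relations = []
    for s in strides:               # temporal relation pathway
        # relate frames s steps apart; larger s -> longer-term dependency
        diffs = frames[s:] - frames[:-s]
        relations.append(diffs.mean(axis=0))
    return np.concatenate([holistic] + relations)  # naive fusion by concat

frames = np.random.default_rng(0).normal(size=(16, 64))  # 16 frames, 64-dim
desc = two_pathway_descriptor(frames)
print(desc.shape)  # (256,)
```

The point of the sketch is the shape of the computation: one pathway summarizes appearance, the other aggregates relations at several temporal scales, and the fusion step combines both into a single video representation.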

    Instance Segmentation of Buildings using Keypoints

    Building segmentation is of great importance in the interpretation of remote sensing imagery. However, existing semantic segmentation and instance segmentation methods often produce segmentation masks with blurred boundaries. In this paper, we propose a novel instance segmentation network for building segmentation in high-resolution remote sensing images. More specifically, we cast segmenting an individual building as detecting several keypoints. The detected keypoints are subsequently reformulated as a closed polygon, which is the semantic boundary of the building. By doing so, the sharp boundary of the building can be preserved. Experiments are conducted on the Aerial Imagery for Roof Segmentation (AIRS) dataset, and our method achieves better quantitative and qualitative results than state-of-the-art methods. Our network is a bottom-up instance segmentation method that preserves geometric details well.
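The keypoints-to-polygon reformulation can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's network: it orders detected corner keypoints around their centroid, which is valid for star-shaped building footprints, and closes the ring so the polygon can be rasterized or measured.

```python
import math

def keypoints_to_polygon(points):
    """Order detected corner keypoints into a closed polygon by sorting
    them counter-clockwise around their centroid (a valid ordering for
    star-shaped footprints, covering most simple building outlines)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    ordered = sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    return ordered + [ordered[0]]  # repeat first point to close the ring

def polygon_area(ring):
    """Shoelace formula over a closed ring (first point repeated at end)."""
    s = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0

# Shuffled corner detections of a unit square recover a closed polygon:
ring = keypoints_to_polygon([(1, 0), (0, 1), (0, 0), (1, 1)])
print(polygon_area(ring))  # 1.0
```

Representing the mask as a polygon rather than a per-pixel map is what keeps building boundaries sharp: the boundary is stored exactly, not discretized into blurred pixel probabilities.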

    A combined light regime and carbon supply regulation strategy for microalgae-based sugar industry wastewater treatment and low-carbon biofuel production to realise a circular economy

    The replacement of fossil fuels with clean, renewable biofuels is of both research and market interest for realising a circular economy. Although microalgae-based biofuels have shown promise as low-carbon alternatives to crop-based biofuels, some key obstacles to their production remain to be addressed, such as high costs and low lipid productivity. In this study, Chlorella sp. CSH4 was cultivated using a combined light regime and carbon supply regulation strategy to enhance sugar industry wastewater bioremediation, biomass accumulation and lipid production. Blue light irradiance of 200 ÎŒmol photons m⁻ÂČ s⁻Âč together with a supply of 10 g/L glucose and 9.2 g/L glycerol was found to effectively enhance the biomass accumulation and pollutant-removal capacity of Chlorella sp. during the growth phase and its lipid production during the stationary phase. Furthermore, the biodiesel properties of the lipid retrieved from Chlorella sp., as demonstrated by its fatty acid profile, were found to be suitable for commercial application. Possible mechanisms were explored to explain how this combined strategy caused this microalga to exhibit highly efficient biomass and lipid production together with efficient pollutant removal. Moreover, upscaled semi-continuous treatment using both sugar industry wastewater and negligible carbon sources (e.g., food waste hydrolysate and crude glycerol), with a mass balance analysis, was conducted to initially validate the feasibility of applying our combined strategy for microalgae-based wastewater treatment. In sum, this study demonstrated the feasibility of cultivating a microalga using a combined strategy comprising a light regime and carbon supply regulation to achieve both wastewater treatment and low-carbon biofuel production.

    Prediction model of ocular metastasis from primary liver cancer: Machine learning‐based development and interpretation study

    Background: Ocular metastasis (OM) is a rare metastatic site of primary liver cancer (PLC). The purpose of this study was to establish a clinical predictive model of OM in PLC patients based on machine learning (ML). Methods: We retrospectively collected the clinical data of 1540 PLC patients and divided them into a training set and an internal test set in a 7:3 proportion. PLC patients were divided into OM and non-ocular metastasis (NOM) groups, and univariate logistic regression analysis was performed between the two groups. Variables with univariate logistic analysis p < 0.05 were selected for the ML model. We constructed six ML models, which were internally verified by 10-fold cross-validation. The prediction performance of each ML model was evaluated by receiver operating characteristic (ROC) curves. We also constructed a web calculator based on the best-performing ML model to personalize the risk probability for OM. Results: Six variables were selected for the ML model. The extreme gradient boosting (XGB) ML model achieved the best differential diagnostic ability, with an area under the curve (AUC) = 0.993, accuracy = 0.992, sensitivity = 0.998, and specificity = 0.984. Based on these results, an online web calculator using the XGB ML model was constructed to help clinicians estimate the risk probability of OM in PLC patients and guide diagnosis and treatment. Finally, the Shapley additive explanations (SHAP) library was used to obtain the six most important risk factors for OM in PLC patients: CA125, ALP, AFP, TG, CA199, and CEA. Conclusion: We used the XGB model to establish a risk prediction model of OM in PLC patients. The predictive model can help identify PLC patients at high risk of OM, enable early and personalized diagnosis and treatment, reduce the poor prognosis of OM, and improve the quality of life of PLC patients.
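The reported metrics can be reproduced from any model's predicted scores. Below is a minimal sketch of how AUC (via the rank/Mann-Whitney statistic) and sensitivity/specificity at a threshold are computed; the toy scores and labels are illustrative assumptions, not the study's data or its XGB model.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC as the Mann-Whitney U statistic: the probability that a
    randomly chosen positive is scored above a randomly chosen negative
    (ties count half)."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

def sens_spec(scores, labels, threshold=0.5):
    """Sensitivity and specificity of thresholded scores."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    pred = scores >= threshold
    sens = (pred & (labels == 1)).sum() / (labels == 1).sum()
    spec = (~pred & (labels == 0)).sum() / (labels == 0).sum()
    return float(sens), float(spec)

# Toy, perfectly separated scores (hypothetical, not the study's data):
scores = [0.95, 0.90, 0.80, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 0, 0]
print(roc_auc(scores, labels))    # 1.0
print(sens_spec(scores, labels))  # (1.0, 1.0)
```

AUC summarizes ranking quality independently of any single cutoff, which is why the study reports it alongside threshold-dependent accuracy, sensitivity, and specificity.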

    ERA: A Data Set and Deep Learning Benchmark for Event Recognition in Aerial Videos

    As a result of the increasing use of unmanned aerial vehicles (UAVs), large volumes of aerial videos have been produced. It is unrealistic for humans to screen such big data and understand its contents. Hence, methodological research on the automatic understanding of UAV videos is of paramount importance. In this article, we introduce a novel problem of event recognition in unconstrained aerial videos in the remote sensing community and present the large-scale, human-annotated Event Recognition in Aerial Videos (ERA) data set, consisting of 2,864 videos, each with a label from 25 different classes corresponding to an event unfolding for five seconds. All these videos are collected from YouTube. The ERA data set is designed to have significant intra-class variation and inter-class similarity and captures dynamic events in various circumstances and at dramatically varying scales. Moreover, to offer a benchmark for this task, we extensively validate existing deep networks. We expect that the ERA data set will facilitate further progress in automatic aerial video comprehension. The data set and trained models can be downloaded from https://lcmou.github.io/ERA_Dataset/

    Global Message Passing in Networks via Task-driven Random Walks for Semantic Segmentation of Remote Sensing Images

    The capability of globally modeling and reasoning about relations between image regions is crucial for complex scene understanding tasks such as semantic segmentation. Most current semantic segmentation methods fall back on deep convolutional neural networks (CNNs), whose convolutions with local receptive fields are typically inefficient at capturing long-range dependencies. Recent works on self-attention mechanisms and relational reasoning networks seek to address this issue by learning pairwise relations between every pair of entities and have showcased promising results. However, such approaches have heavy computational and memory overheads, which makes them infeasible for dense prediction tasks, particularly on large images, e.g., aerial imagery. In this work, we propose an efficient method for global context modeling in which, at each position, a sparse set of features, instead of all features over the spatial domain, is adaptively sampled and aggregated. We further devise a highly efficient instantiation of the proposed method, namely learning RANdom walK samplIng aNd feature aGgregation (RANKING). The proposed module is lightweight and general and can be used in a plug-and-play fashion with the existing fully convolutional neural network (FCN) framework. To evaluate RANKING-equipped networks, we conduct experiments on two aerial scene parsing datasets, on which the networks achieve competitive results at significantly lower computational and memory cost.
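The adaptive sparse-sampling idea can be sketched as follows. This is a toy stand-in, not the RANKING module: uniform random sampling replaces the learned, task-driven random walk, and the array names and sizes are hypothetical. It shows only the cost structure, aggregating k sampled positions per location (O(HW·k)) instead of all pairs (O((HW)²)).

```python
import numpy as np

def sparse_context(features, k=8, seed=0):
    """For each spatial position, average the features of k randomly
    sampled positions (a stand-in for learned random-walk sampling) and
    add the result back as global context."""
    rng = np.random.default_rng(seed)
    c, h, w = features.shape
    flat = features.reshape(c, h * w)              # (C, HW)
    idx = rng.integers(0, h * w, size=(h * w, k))  # k sample indices per position
    context = flat[:, idx].mean(axis=2)            # (C, HW) aggregated context
    return (flat + context).reshape(c, h, w)       # residual-style injection

feat = np.random.default_rng(1).normal(size=(16, 32, 32))  # toy feature map
out = sparse_context(feat, k=8)
print(out.shape)  # (16, 32, 32)
```

With k fixed (here 8) the per-position cost no longer grows with image size, which is the property that makes this family of methods attractive for large aerial imagery.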

    MultiScene: A Large-scale Dataset and Benchmark for Multiscene Recognition in Single Aerial Images

    Aerial scene recognition is a fundamental research problem in interpreting high-resolution aerial imagery. Over the past few years, most studies have focused on classifying an image into one scene category, while in real-world scenarios a single image more often contains multiple scenes. Therefore, in this paper, we investigate a more practical yet underexplored task: multi-scene recognition in single images. To this end, we create a large-scale dataset, called MultiScene, composed of 100,000 unconstrained high-resolution aerial images. Considering that manually labeling such images is extremely arduous, we resort to low-cost annotations from crowdsourcing platforms, e.g., OpenStreetMap (OSM). However, OSM data might suffer from incompleteness and incorrectness, which introduces noise into image labels. To address this issue, we visually inspect 14,000 images and correct their scene labels, yielding a subset of cleanly annotated images, named MultiScene-Clean. With it, we can develop and evaluate deep networks for multi-scene recognition using clean data. Moreover, we provide crowdsourced annotations of all images for the purpose of studying network learning with noisy labels. We conduct experiments with extensive baseline models on both MultiScene-Clean and MultiScene to offer benchmarks for multi-scene recognition in single images and learning from noisy labels for this task, respectively.
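A common way to quantify the gap between crowdsourced and clean annotations in this multi-label setting is example-based F1 over per-image label sets; a minimal sketch follows (the scene names are hypothetical examples, not necessarily among the dataset's 36 classes).

```python
def example_f1(pred, truth):
    """Example-based F1 between two label sets for one image."""
    pred, truth = set(pred), set(truth)
    if not pred and not truth:
        return 1.0  # both empty: perfect agreement
    tp = len(pred & truth)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(truth)
    return 2 * precision * recall / (precision + recall)

# Hypothetical OSM-derived labels vs. manually verified labels for one image:
osm = {"residential", "road", "parking"}     # noisy, crowdsourced
clean = {"residential", "road", "woodland"}  # manually interpreted
print(round(example_f1(osm, clean), 3))  # 0.667
```

Averaging this score over the 14,000 manually inspected images would give one concrete measure of crowdsourced label noise, which is exactly the quantity methods for learning with noisy labels must tolerate.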

    Engineered asymmetric diffractions of diagonal-line odd-symmetric phase gratings

    A two-dimensional multi-element phase grating has been designed, in terms of the offset refractive index, to exhibit spatially odd symmetry (antisymmetry) along one transparent diagonal line with an even number of rectangular elements, while leaving the other elements in a unit cell opaque. This grating can be engineered to attain several intriguing phenomena of asymmetric diffraction, including the elimination of equally spaced oblique diffraction lines, the elimination of alternately crossed oblique diffraction lines, and the selection of equally spaced oblique diffraction lines. These engineered asymmetric diffraction phenomena are well explained via destructive interference between transmitted field amplitudes from paired, dual-paired, and successive elements along the transparent diagonal line.