
    Effective Uni-Modal to Multi-Modal Crowd Estimation based on Deep Neural Networks

    Crowd estimation is a vital component of crowd analysis. It finds many applications in real-world scenarios, e.g. management of huge gatherings such as the Hajj, sporting and musical events, or political rallies. Automated crowd counting facilitates better and more effective management of such events and consequently helps prevent undesired situations. This is a very challenging problem in practice because of the significant variation in crowd number within and across images, varying image resolution, large perspective, severe occlusions, and cluttered background regions that resemble dense crowds. Current approaches do not handle huge crowd diversity well and thus perform poorly in cases ranging from extremely low to high crowd density, yielding large crowd underestimation or overestimation. Manual crowd counting, in turn, is infeasible because it is very slow and inaccurate. To address these major crowd counting issues and challenges, we investigate two different types of input data: uni-modal (image) and multi-modal (image and audio). In the uni-modal setting, we propose and analyze four novel end-to-end crowd counting networks, ranging from multi-scale fusion-based models to uni-scale one-pass and two-pass multitask networks. The multi-scale networks employ the attention mechanism to enhance model efficacy. The uni-scale models, on the other hand, are equipped with a novel, simple yet effective patch re-scaling module (PRM) that serves the same function as multi-scale approaches but is more lightweight. Experimental evaluation demonstrates that the proposed networks outperform the state of the art in the majority of cases on four different benchmark datasets, with up to 12.6% improvement in the RMSE evaluation metric. Better cross-dataset performance also validates the better generalization ability of our schemes. For multi-modal input, effective feature extraction (FE) and strong information fusion between the two modalities remain a big challenge. Thus, the novel multi-modal network design focuses on investigating different feature fusion techniques while improving the FE. Based on comprehensive experimental evaluation, the proposed multi-modal network improves performance under all standard evaluation criteria, with up to 33.8% improvement over the state of the art. The multi-scale uni-modal attention networks also prove effective in other deep learning domains, as demonstrated on seven different scene-text recognition datasets with better performance.
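The abstract above reports its gains in terms of RMSE. As a minimal sketch of how such count-error metrics are commonly computed, assuming one predicted and one ground-truth count per image, the following may help; the function names and the sample counts are illustrative assumptions, not taken from the thesis.

```python
import math

def count_errors(predicted, actual):
    """Return (MAE, RMSE) between predicted and ground-truth crowd counts."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two equal-length, non-empty sequences")
    diffs = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse

def relative_improvement(baseline, ours):
    """Percentage by which `ours` lowers an error metric relative to `baseline`."""
    return 100.0 * (baseline - ours) / baseline

# Made-up counts for three images, for illustration only.
mae, rmse = count_errors([105, 48, 990], [100, 50, 1000])
```

Because RMSE squares each per-image error, it penalizes a few large miscounts more heavily than MAE does, which is why both are usually reported side by side.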

    Crowd Localization from Gaussian Mixture Scoped Knowledge and Scoped Teacher

    Crowd localization aims to predict the head position of each instance in crowd scenes. Because instances lie at varying distances from the camera, there are tremendous gaps among the scales of instances within an image, which is called the intrinsic scale shift. Intrinsic scale shift is one of the most essential issues in crowd localization because it is ubiquitous in crowd scenes and makes the scale distribution chaotic. To this end, the paper concentrates on tackling the chaos of the scale distribution incurred by intrinsic scale shift. We propose Gaussian Mixture Scope (GMS) to regularize the chaotic scale distribution. Concretely, the GMS uses a Gaussian mixture distribution to adapt to the scale distribution and decouples the mixture model into sub-normal distributions to regularize the chaos within the sub-distributions. Then, an alignment is introduced to regularize the chaos among sub-distributions. However, although GMS is effective in regularizing the data distribution, it amounts to dislodging the hard samples in the training set, which incurs overfitting. We attribute this to a block in transferring the latent knowledge exploited by GMS from data to model. Therefore, a Scoped Teacher that acts as a bridge in knowledge transfer is proposed. Moreover, consistency regularization is introduced to implement the knowledge transfer. To that effect, further constraints are deployed on the Scoped Teacher to derive feature consistency between the teacher and student ends. With the proposed GMS and Scoped Teacher implemented on five mainstream crowd localization datasets, extensive experiments demonstrate the superiority of our work. Moreover, compared with existing crowd locators, our work achieves state-of-the-art F1-measure comprehensively on five datasets. Comment: Accepted by IEEE TI
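The abstract describes fitting a Gaussian mixture to the scale distribution and decoupling it into sub-distributions. The sketch below shows only the generic first step, a plain EM fit of a 1-D Gaussian mixture, and is not the paper's GMS, whose details are not given here. The function name `fit_gmm_1d`, the quantile initialisation, and the head sizes are all assumptions made for illustration.

```python
import math

def fit_gmm_1d(values, k=2, iters=50, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture to `values` with plain EM.

    Returns (weights, means, variances), ordered by the initial quantile means.
    """
    xs = sorted(values)
    n = len(xs)
    # Initialise means at evenly spaced quantiles and share the overall variance.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    spread = max(sum((x - centre) ** 2 for x in xs) / n, var_floor)
    variances = [spread] * k
    weights = [1.0 / k] * k

    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0 else [1.0 / k] * k)
        # M-step: re-estimate each component from its weighted samples.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, var_floor)
            weights[j] = nj / n
    return weights, means, variances

# Made-up head sizes in pixels: far-away heads near 10 px, close heads near 40 px.
head_sizes = [8, 9, 10, 11, 12, 38, 39, 40, 41, 42]
weights, means, variances = fit_gmm_1d(head_sizes, k=2)
```

Each fitted component then corresponds to one scale group, so samples can be assigned to the component with the highest responsibility and handled within that narrower distribution.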

    Deep learning in crowd counting: A survey

    Counting high-density objects quickly and accurately is a popular area of research. Crowd counting has significant social and economic value and is a major focus in artificial intelligence. Despite many advancements in this field, many of them are not widely known, especially in terms of research data. The authors proposed a three-tier standardised dataset taxonomy (TSDT). The Taxonomy divides datasets into small-scale, large-scale and hyper-scale, according to different application scenarios. This theory can help researchers make more efficient use of datasets and improve the performance of AI algorithms in specific fields. Additionally, the authors proposed a new evaluation index for the clarity of the dataset: average pixel occupied by each object (APO). This new evaluation index is more suitable for evaluating the clarity of the dataset in the object counting task than the image resolution. Moreover, the authors classified the crowd counting methods from a data-driven perspective: multi-scale networks, single-column networks, multi-column networks, multi-task networks, attention networks and weak-supervised networks and introduced the classic crowd counting methods of each class. The authors classified the existing 36 datasets according to the theory of three-tier standardised dataset taxonomy and discussed and evaluated these datasets. The authors evaluated the performance of more than 100 methods in the past five years on different levels of popular datasets. Recently, progress in research on small-scale datasets has slowed down. There are few new datasets and algorithms on small-scale datasets. The studies focused on large or hyper-scale datasets appear to be reaching a saturation point. The combined use of multiple approaches began to be a major research direction. The authors discussed the theoretical and practical challenges of crowd counting from the perspective of data, algorithms and computing resources. 
The field of crowd counting is moving towards combining multiple methods and requires fresh, targeted datasets. Despite advancements, the field still faces challenges such as handling real-world scenarios and processing large crowds in real-time. Researchers are exploring transfer learning to overcome the limitations of small datasets. The development of effective algorithms for crowd counting remains a challenging and important task in computer vision and AI, with many opportunities for future research. Funding: BHF, AA/18/3/34220; Hope Foundation for Cancer Research, RM60G0680; GCRF, P202PF11; Sino‐UK Industrial Fund, RP202G0289; LIAS, P202ED10, P202RE969; Data Science Enhancement Fund, P202RE237; Sino‐UK Education Fund, OP202006; Fight for Sight, 24NN201; Royal Society International Exchanges Cost Share Award, RP202G0230; MRC, MC_PC_17171; BBSRC, RM32G0178B
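The survey's APO index (average pixel occupied by each object) is named but not defined in this abstract. A minimal sketch under one plausible reading, total image pixels divided by total annotated objects across a dataset, could look like this; the formula, the function name, and the sample figures are assumptions rather than the authors' definition.

```python
def average_pixels_per_object(images):
    """Assumed APO: total pixels divided by total annotated objects.

    `images` is an iterable of (width, height, object_count) tuples.
    """
    total_pixels = 0
    total_objects = 0
    for width, height, object_count in images:
        total_pixels += width * height
        total_objects += object_count
    if total_objects == 0:
        raise ValueError("dataset contains no annotated objects")
    return total_pixels / total_objects

# Two made-up datasets with identical resolution but different crowd density.
sparse = average_pixels_per_object([(1920, 1080, 100)])
dense = average_pixels_per_object([(1920, 1080, 2000)])
```

Under this reading, two datasets with the same image resolution can differ in APO by an order of magnitude, which matches the abstract's point that resolution alone is a poor measure of clarity for counting tasks.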

    Colonoscopy polyp detection and classification: Dataset creation and comparative evaluations

    Colorectal cancer (CRC) is one of the most common types of cancer with a high mortality rate. Colonoscopy is the preferred procedure for CRC screening and has proven to be effective in reducing CRC mortality. Thus, a reliable computer-aided polyp detection and classification system can significantly increase the effectiveness of colonoscopy. In this paper, we create an endoscopic dataset collected from various sources and annotate the ground truth of polyp location and classification results with the help of experienced gastroenterologists. The dataset can serve as a benchmark platform to train and evaluate the machine learning models for polyp classification. We have also compared the performance of eight state-of-the-art deep learning-based object detection models. The results demonstrate that deep CNN models are promising in CRC screening. This work can serve as a baseline for future research in polyp detection and classification

    MolecularRift, a Gesture Based Interaction Tool for Controlling Molecules in 3-D

    Visualization of molecular models is a vital part of modern drug design. Improved visualization methods increase conceptual understanding and enable faster and better decision making. The introduction of virtual reality goggles such as Oculus Rift has opened new opportunities for such visualisations. A new interactive visualization tool (MolecularRift), which lets the user experience molecular models in a virtual reality environment, was developed in collaboration with AstraZeneca. In an attempt to create a more natural way to interact with the tool, users can steer and control molecules through hand gestures. The gestures are recorded using depth data from a Microsoft Kinect v2 sensor and interpreted using per-pixel algorithms, which focus only on the captured frames, thus freeing the user from additional devices such as cursor, keyboard, touchpad or even piezoresistive gloves. MolecularRift was developed from a usability perspective using an iterative development process and test group evaluations. The iterations allowed an agile process in which features could easily be evaluated to monitor behavior and performance, resulting in a user-optimized tool. We conclude with reflections on virtual reality's capabilities in chemistry and possibilities for future projects. Virtual reality is the future. New technologies are constantly being developed, and as computing capacity improves we find new ways to use them together. We have developed a new interactive visualization tool (Molecular Rift) that lets the user experience molecular models in a virtual reality. Today's pharmaceutical industry is in constant need of new methods for visualizing potential drugs in 3-D. Several tools are used today to visualize molecules in 3-D stereo. Our newly developed virtual reality techniques give drug developers the opportunity to "step into" the molecular structures and experience them in an entirely new way.

    Capillary flow of dense colloidal suspensions

    The purpose of this thesis is to study the flow of dense colloidal suspensions into micron-sized capillaries at the particle level. Understanding the flow of complex fluids in terms of their constituents (colloids, polymers, or surfactants) poses deep fundamental challenges, and has wide applications in many industrial processes. Through the use of a novel experimental procedure we find results contrasting with the predicted bulk rheological behaviour of dense colloidal systems and propose an alternative approach based on the analogy with granular systems. Quantitative predictions which successfully explain the data are obtained. In order to obtain quantitative information on the dynamics of the system, we image the flow using a fast confocal microscope and identify the trajectories of each particle. Due to the nature of the flow, conventional techniques for locating and tracking the particles fail to yield satisfactory results. To overcome this limitation, we have developed a novel technique which allows us to successfully track the particles in strongly non-uniform flow fields (published, 2006). We focus our attention on three main aspects of the flow: concentration gradients, velocity profiles and time behaviour. We initially discuss the occurrence of concentration gradients along the flow direction and relate them to the local flow profiles. We observe high density regions where the velocity is uniform across the channel (complete plugs) and lower density regions where shear is present. The observed concentration profiles can be qualitatively explained by considering the relative flow between the solvent and the suspended particles. The flow profiles in the presence of shear consist of a plug in the centre, with shear localized adjacent to the channel walls, reminiscent of yield-stress fluid behaviour. However, the observed scaling of the velocity profiles with the flow rate strongly contrasts with yield-stress fluid predictions. Instead, the velocity profiles can be captured by a theory of stress fluctuations originally developed for chute flow of dry granular media (published, 2007). We extend the model to our case and discuss it as a function of a series of parameters (boundary conditions, volume fraction, channel size, etc.), highlighting differences and similarities with granular media. Finally, we discuss the time behaviour of complete plug flows, relating it to the microscopic dynamics of the particles. At variance with dilute systems, dense systems exhibit velocity fluctuations when driven into channels by a constant pressure difference. We find that there exists a threshold value of the flow rate below which oscillations in the velocity are absent and above which their frequency scales as a power law of the flow rate. Although quantitative predictions on this issue are still missing, we present a microscopic description of the phenomenon, highlighting the interplay between the particles and the solvent.
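The abstract states that, above a threshold flow rate, the oscillation frequency scales as a power law of the flow rate. A minimal sketch of how such an exponent could be estimated from measurements, assuming a model f = a * Q**b and a least-squares fit in log-log space, is given below. The function name, the threshold handling, and the synthetic data are illustrative assumptions and do not reproduce the thesis's analysis or values.

```python
import math

def fit_power_law(flow_rates, frequencies, q_threshold=0.0):
    """Fit f = a * Q**b by least squares on (log Q, log f); return (a, b).

    Only points with Q above `q_threshold` and a positive frequency are used,
    since no oscillations are expected below the threshold.
    """
    points = [
        (math.log(q), math.log(f))
        for q, f in zip(flow_rates, frequencies)
        if q > q_threshold and f > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two points above the threshold")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx                      # slope in log-log space is the exponent
    a = math.exp(mean_y - b * mean_x)  # intercept gives the prefactor
    return a, b

# Synthetic data in arbitrary units: no oscillation at Q = 0.5, then f = 3 * Q**1.5.
rates = [0.5, 1.0, 2.0, 4.0, 8.0]
freqs = [0.0] + [3.0 * q ** 1.5 for q in rates[1:]]
a, b = fit_power_law(rates, freqs, q_threshold=0.75)
```

With real measurements the threshold itself would have to be estimated from the data, for example as the lowest flow rate at which a nonzero oscillation frequency is observed.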