Effective Uni-Modal to Multi-Modal Crowd Estimation based on Deep Neural Networks
Crowd estimation is a vital component of crowd analysis. It finds many applications in real-world scenarios, e.g. the management of huge gatherings such as Hajj, sporting and musical events, or political rallies. Automated crowd counting facilitates better and more effective management of such events and consequently helps prevent undesired situations. The problem is very challenging in practice because of the large variation in crowd numbers within and across images, varying image resolution, large perspective, severe occlusions, and dense crowd-like cluttered background regions. Current approaches do not handle huge crowd diversity well and thus perform poorly in cases ranging from extremely low to high crowd density, yielding large crowd underestimation or overestimation. Manual crowd counting is also infeasible because it is very slow and inaccurate. To address these major crowd counting issues and challenges, we investigate two different types of input data: uni-modal (image) and multi-modal (image and audio). In the uni-modal setting, we propose and analyze four novel end-to-end crowd counting networks, ranging from multi-scale fusion-based models to uni-scale one-pass and two-pass multitask networks. The multi-scale networks employ the attention mechanism to enhance model efficacy. The uni-scale models, on the other hand, are equipped with a novel, simple yet effective patch re-scaling module (PRM) that serves the same function as multi-scale approaches but is more lightweight. Experimental evaluation demonstrates that the proposed networks outperform the state of the art in the majority of cases on four different benchmark datasets, with up to 12.6% improvement in the RMSE evaluation metric. The better cross-dataset performance also validates the better generalization ability of our schemes. For the multi-modal input, effective feature extraction (FE) and strong information fusion between the two modalities remain a big challenge. 
Thus, the novel multi-modal network design focuses on investigating different feature-fusion techniques while improving the FE. Based on a comprehensive experimental evaluation, the proposed multi-modal network improves performance under all standard evaluation criteria, with up to 33.8% improvement over the state of the art. The multi-scale uni-modal attention networks also prove effective in other deep learning domains, as demonstrated on seven different scene-text recognition datasets, where they achieve better performance.
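The RMSE metric mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names and the per-image counts are hypothetical, and reading "improvement" as a relative reduction in error against a baseline is an assumption.

```python
import math

def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and ground-truth crowd counts."""
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / len(true_counts)

def rmse(pred_counts, true_counts):
    """Root mean squared error between predicted and ground-truth crowd counts."""
    squared = sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts))
    return math.sqrt(squared / len(true_counts))

def relative_improvement(baseline_error, new_error):
    """Percentage reduction in error relative to a baseline (assumed definition)."""
    return 100.0 * (baseline_error - new_error) / baseline_error

# Hypothetical per-image counts, for illustration only.
true_counts = [120, 45, 980, 10]
pred_counts = [110, 50, 1000, 12]

print(mae(pred_counts, true_counts))     # 9.25
print(rmse(pred_counts, true_counts))    # 11.5
print(relative_improvement(20.0, 15.0))  # 25.0
```

RMSE penalizes large per-image errors more heavily than MAE, which is why the two are usually reported together in crowd counting.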
Crowd Localization from Gaussian Mixture Scoped Knowledge and Scoped Teacher
Crowd localization aims to predict the head position of each instance in crowd
scenarios. Since the distances of instances to the camera vary,
there exist tremendous gaps among the scales of instances within an image, which
is called the intrinsic scale shift. Intrinsic scale shift is one of the most
essential issues in crowd localization because it is
ubiquitous in crowd scenes and makes the scale distribution chaotic.
To this end, the paper focuses on tackling the chaos of the
scale distribution incurred by intrinsic scale shift. We propose Gaussian
Mixture Scope (GMS) to regularize the chaotic scale distribution. Concretely,
the GMS utilizes a Gaussian mixture distribution to adapt to the scale distribution
and decouples the mixture model into sub-normal distributions to regularize the
chaos within the sub-distributions. Then, an alignment is introduced to
regularize the chaos among sub-distributions. However, although GMS is
effective in regularizing the data distribution, it amounts to dislodging the
hard samples in the training set, which incurs overfitting. We attribute this to
a blockage in transferring the latent knowledge exploited by GMS from
data to model. Therefore, a Scoped Teacher that acts as a bridge in
knowledge transfer is proposed. Moreover, consistency regularization
is introduced to implement the knowledge transfer. To that end,
further constraints are deployed on the Scoped Teacher to derive feature
consistency between the teacher and student ends.
With the proposed GMS and Scoped Teacher implemented on five mainstream crowd
localization datasets, extensive experiments demonstrate the superiority of
our work. Moreover, compared with existing crowd locators, our work achieves
state-of-the-art F1-measure results across all five datasets.
Comment: Accepted by IEEE TI
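The F1-measure used to compare crowd locators can be sketched as below. This is a simplified illustration under stated assumptions: predictions are matched to ground-truth head points greedily within a fixed pixel radius, whereas benchmark protocols may use optimal one-to-one matching and scale-dependent thresholds. The function name `localization_f1`, the `max_dist` parameter and the sample points are all hypothetical.

```python
import math

def localization_f1(pred_points, gt_points, max_dist):
    """Return (precision, recall, F1) using greedy one-to-one point matching.

    A prediction counts as a true positive if an unmatched ground-truth
    point lies within max_dist pixels; the nearest such point is consumed.
    """
    unmatched_gt = list(gt_points)
    true_pos = 0
    for px, py in pred_points:
        best_idx, best_d = None, max_dist
        for i, (gx, gy) in enumerate(unmatched_gt):
            d = math.hypot(px - gx, py - gy)
            if d <= best_d:
                best_idx, best_d = i, d
        if best_idx is not None:
            unmatched_gt.pop(best_idx)
            true_pos += 1
    precision = true_pos / len(pred_points) if pred_points else 0.0
    recall = true_pos / len(gt_points) if gt_points else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical head positions in pixel coordinates.
gt = [(10, 10), (50, 50), (90, 20)]
pred = [(11, 9), (52, 49), (200, 200), (91, 22)]
precision, recall, f1 = localization_f1(pred, gt, max_dist=5)
print(precision, recall)  # 0.75 1.0
```

Greedy matching keeps the sketch short; it can differ from an optimal assignment when several predictions compete for the same ground-truth point.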
Deep learning in crowd counting: A survey
Counting high-density objects quickly and accurately is a popular area of research. Crowd counting has significant social and economic value and is a major focus in artificial intelligence. Despite many advancements in this field, many of them are not widely known, especially in terms of research data. The authors proposed a three-tier standardised dataset taxonomy (TSDT). The taxonomy divides datasets into small-scale, large-scale and hyper-scale, according to different application scenarios. This theory can help researchers make more efficient use of datasets and improve the performance of AI algorithms in specific fields. Additionally, the authors proposed a new evaluation index for the clarity of the dataset: average pixel occupied by each object (APO). This new evaluation index is more suitable than image resolution for evaluating the clarity of a dataset in the object counting task. Moreover, the authors classified the crowd counting methods from a data-driven perspective (multi-scale networks, single-column networks, multi-column networks, multi-task networks, attention networks and weakly supervised networks) and introduced the classic crowd counting methods of each class. The authors classified the existing 36 datasets according to the three-tier standardised dataset taxonomy and discussed and evaluated these datasets. The authors evaluated the performance of more than 100 methods from the past five years on different levels of popular datasets. Recently, progress in research on small-scale datasets has slowed down. There are few new datasets and algorithms on small-scale datasets. The studies focused on large or hyper-scale datasets appear to be reaching a saturation point. The combined use of multiple approaches has become a major research direction. The authors discussed the theoretical and practical challenges of crowd counting from the perspective of data, algorithms and computing resources. 
The field of crowd counting is moving towards combining multiple methods and requires fresh, targeted datasets. Despite advancements, the field still faces challenges such as handling real-world scenarios and processing large crowds in real-time. Researchers are exploring transfer learning to overcome the limitations of small datasets. The development of effective algorithms for crowd counting remains a challenging and important task in computer vision and AI, with many opportunities for future research.
Funding: BHF, AA/18/3/34220; Hope Foundation for Cancer Research, RM60G0680; GCRF, P202PF11; Sino-UK Industrial Fund, RP202G0289; LIAS, P202ED10, P202RE969; Data Science Enhancement Fund, P202RE237; Sino-UK Education Fund, OP202006; Fight for Sight, 24NN201; Royal Society International Exchanges Cost Share Award, RP202G0230; MRC, MC_PC_17171; BBSRC, RM32G0178B
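The APO index from the survey abstract can be illustrated with a short sketch. The abstract does not give a formula, so the definition used here (total image pixels divided by total annotated objects) is an assumption, and the function name and sample image sizes are hypothetical.

```python
def average_pixels_per_object(images):
    """Assumed APO: total pixels across images divided by total object count.

    images is an iterable of (width, height, object_count) tuples.
    """
    images = list(images)
    total_pixels = sum(w * h for w, h, _ in images)
    total_objects = sum(n for _, _, n in images)
    if total_objects == 0:
        raise ValueError("APO is undefined for a dataset with no annotated objects")
    return total_pixels / total_objects

# Two hypothetical datasets with identical resolution but different density.
sparse = [(1920, 1080, 50)]
dense = [(1920, 1080, 5000)]
print(average_pixels_per_object(sparse))  # 41472.0
print(average_pixels_per_object(dense))   # 414.72
```

Under this assumed definition, the two datasets have the same resolution but very different APO values, which matches the abstract's point that resolution alone is a poor measure of clarity for counting tasks.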
Colonoscopy polyp detection and classification: Dataset creation and comparative evaluations
Colorectal cancer (CRC) is one of the most common types of cancer with a high mortality rate. Colonoscopy is the preferred procedure for CRC screening and has proven to be effective in reducing CRC mortality. Thus, a reliable computer-aided polyp detection and classification system can significantly increase the effectiveness of colonoscopy. In this paper, we create an endoscopic dataset collected from various sources and annotate the ground truth of polyp location and classification results with the help of experienced gastroenterologists. The dataset can serve as a benchmark platform to train and evaluate machine learning models for polyp classification. We have also compared the performance of eight state-of-the-art deep learning-based object detection models. The results demonstrate that deep CNN models are promising in CRC screening. This work can serve as a baseline for future research in polyp detection and classification.
MolecularRift, a Gesture Based Interaction Tool for Controlling Molecules in 3-D
Visualization of molecular models is a vital part of modern drug design. Improved visualization methods increase conceptual understanding and enable faster and better decision making. The introduction of virtual reality goggles such as the Oculus Rift has opened new opportunities for such visualizations. A new interactive visualization tool (MolecularRift), which lets the user experience molecular models in a virtual reality environment, was developed in collaboration with AstraZeneca. To create a more natural way of interacting with the tool, users can steer and control molecules through hand gestures. The gestures are recorded using depth data from a Microsoft Kinect v2 sensor and interpreted using per-pixel algorithms that rely only on the captured frames, thus freeing the user from additional devices such as a cursor, keyboard, touchpad or even piezoresistive gloves. MolecularRift was developed from a usability perspective using an iterative development process and test group evaluations. The iterations allowed an agile process in which features could easily be evaluated to monitor behavior and performance, resulting in a user-optimized tool. We conclude with reflections on virtual reality's capabilities in chemistry and possibilities for future projects.
Virtual reality is the future. New technologies are constantly being developed, and as computing capacity improves we find new ways to use them together. We have developed a new interactive visualization tool (Molecular Rift) that lets the user experience molecular models in a virtual reality. Today's pharmaceutical industry is in constant need of new methods for visualizing potential drugs in 3-D. Several tools are used today to visualize molecules in 3-D stereo. Our newly developed virtual reality techniques give drug developers the opportunity to "step into" the molecular structures and experience them in an entirely new way.
Capillary flow of dense colloidal suspensions
The purpose of this thesis is to study the flow of dense colloidal suspensions into
micron-sized capillaries at the particle level. Understanding the flow of complex fluids in
terms of their constituents (colloids, polymers, or surfactants) poses deep fundamental
challenges, and has wide applications in many industrial processes. Through the
use of a novel experimental procedure we find results contrasting with the predicted
bulk rheological behaviour of dense colloidal systems and propose an alternative
approach based on the analogy with granular systems. Quantitative predictions which
successfully explain the data are obtained.
In order to obtain quantitative information on the dynamics of the system, we
image the flow using a fast confocal microscope and identify the trajectories of each
particle. Due to the nature of the flow, conventional techniques for locating and
tracking the particles fail to yield satisfactory results. To overcome this limitation,
we have developed a novel technique which allows us to successfully track the particles
in strongly non-uniform flow fields (published, 2006).
We focus our attention on three main aspects of the flow: concentration gradients,
velocity profiles and time behaviour.
We initially discuss the occurrence of concentration gradients along the flow
direction and relate them to the local flow profiles. We observe high density regions
where the velocity is uniform across the channel (complete plugs) and lower density
regions where shear is present. The observed concentration profiles can be qualitatively
explained by considering the relative flow between the solvent and the suspended
particles.
The flow profiles in the presence of shear consist of a plug in the centre, while
shear is localized adjacent to the channel walls, reminiscent of yield-stress fluid
behaviour. However, the observed scaling of the velocity profiles with the flow rate
strongly contrasts with yield-stress fluid predictions. Instead, the velocity profiles can be
captured by a theory of stress fluctuations originally developed for chute flow of dry
granular media (published, 2007). We extend the model to our case and discuss it as a
function of a series of parameters (boundary conditions, volume fraction, channel size,
etc.), highlighting differences and similarities with granular media.
Finally, we discuss the time behaviour of complete plug flows, relating it to the
microscopic dynamics of the particles. At variance with dilute systems, dense systems
exhibit velocity fluctuations when driven into channels by a constant pressure difference.
We find that there exists a threshold value of the flow rate below which oscillations in
the velocity are absent and above which their frequency scales as a power law of the
flow rate. Although quantitative predictions on this issue are still missing, we present
a microscopic description of the phenomenon highlighting the interplay between the
particles and the solvent.
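The power-law scaling of oscillation frequency with flow rate described above can be illustrated by a log-log least-squares fit. This is a sketch under stated assumptions, not the thesis's analysis: the function name `fit_power_law`, the prefactor and the exponent 1.5 are hypothetical, and the data are synthetic. Only measurements above the flow-rate threshold would enter such a fit, since the abstract reports no oscillations below it.

```python
import math

def fit_power_law(flow_rates, frequencies):
    """Fit f = c * Q**alpha by linear least squares on (log Q, log f).

    Returns the tuple (c, alpha). All inputs must be strictly positive.
    """
    xs = [math.log(q) for q in flow_rates]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx                        # slope in log-log space
    c = math.exp(mean_y - alpha * mean_x)    # intercept mapped back to linear space
    return c, alpha

# Synthetic data generated from f = 2 * Q**1.5 (hypothetical values).
flow_rates = [1.0, 2.0, 4.0, 8.0]
frequencies = [2.0 * q ** 1.5 for q in flow_rates]
c, alpha = fit_power_law(flow_rates, frequencies)
print(round(c, 6), round(alpha, 6))  # 2.0 1.5
```

Fitting in log-log space turns the power law into a straight line, so the exponent is simply the slope.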