7,291 research outputs found

    Audio-visual multi-modality driven hybrid feature learning model for crowd analysis and classification

    Get PDF
    The high pace emergence in advanced software systems, low-cost hardware and decentralized cloud computing technologies have broadened the horizon for vision-based surveillance, monitoring and control. However, complex and inferior feature learning over visual artefacts or video streams, especially under extreme conditions confine majority of the at-hand vision-based crowd analysis and classification systems. Retrieving event-sensitive or crowd-type sensitive spatio-temporal features for the different crowd types under extreme conditions is a highly complex task. Consequently, it results in lower accuracy and hence low reliability that confines existing methods for real-time crowd analysis. Despite numerous efforts in vision-based approaches, the lack of acoustic cues often creates ambiguity in crowd classification. On the other hand, the strategic amalgamation of audio-visual features can enable accurate and reliable crowd analysis and classification. Considering it as motivation, in this research a novel audio-visual multi-modality driven hybrid feature learning model is developed for crowd analysis and classification. In this work, a hybrid feature extraction model was applied to extract deep spatio-temporal features by using Gray-Level Co-occurrence Metrics (GLCM) and AlexNet transferrable learning model. Once extracting the different GLCM features and AlexNet deep features, horizontal concatenation was done to fuse the different feature sets. Similarly, for acoustic feature extraction, the audio samples (from the input video) were processed for static (fixed size) sampling, pre-emphasis, block framing and Hann windowing, followed by acoustic feature extraction like GTCC, GTCC-Delta, GTCC-Delta-Delta, MFCC, Spectral Entropy, Spectral Flux, Spectral Slope and Harmonics to Noise Ratio (HNR). Finally, the extracted audio-visual features were fused to yield a composite multi-modal feature set, which is processed for classification using the random forest ensemble classifier. The multi-class classification yields a crowd-classification accurac12529y of (98.26%), precision (98.89%), sensitivity (94.82%), specificity (95.57%), and F-Measure of 98.84%. The robustness of the proposed multi-modality-based crowd analysis model confirms its suitability towards real-world crowd detection and classification tasks

    Pyramid Semantic Graph-based Global Point Cloud Registration with Low Overlap

    Full text link
    Global point cloud registration is essential in many robotics tasks like loop closing and relocalization. Unfortunately, the registration often suffers from the low overlap between point clouds, a frequent occurrence in practical applications due to occlusion and viewpoint change. In this paper, we propose a graph-theoretic framework to address the problem of global point cloud registration with low overlap. To this end, we construct a consistency graph to facilitate robust data association and employ graduated non-convexity (GNC) for reliable pose estimation, following the state-of-the-art (SoTA) methods. Unlike previous approaches, we use semantic cues to scale down the dense point clouds, thus reducing the problem size. Moreover, we address the ambiguity arising from the consistency threshold by constructing a pyramid graph with multi-level consistency thresholds. Then we propose a cascaded gradient ascend method to solve the resulting densest clique problem and obtain multiple pose candidates for every consistency threshold. Finally, fast geometric verification is employed to select the optimal estimation from multiple pose candidates. Our experiments, conducted on a self-collected indoor dataset and the public KITTI dataset, demonstrate that our method achieves the highest success rate despite the low overlap of point clouds and low semantic quality. We have open-sourced our code https://github.com/HKUST-Aerial-Robotics/Pagor for this project.Comment: Accepted by IROS202

    Machine learning and mixed reality for smart aviation: applications and challenges

    Get PDF
    The aviation industry is a dynamic and ever-evolving sector. As technology advances and becomes more sophisticated, the aviation industry must keep up with the changing trends. While some airlines have made investments in machine learning and mixed reality technologies, the vast majority of regional airlines continue to rely on inefficient strategies and lack digital applications. This paper investigates the state-of-the-art applications that integrate machine learning and mixed reality into the aviation industry. Smart aerospace engineering design, manufacturing, testing, and services are being explored to increase operator productivity. Autonomous systems, self-service systems, and data visualization systems are being researched to enhance passenger experience. This paper investigate safety, environmental, technological, cost, security, capacity, and regulatory challenges of smart aviation, as well as potential solutions to ensure future quality, reliability, and efficiency

    Blockchain Technology: Disruptor or Enhnancer to the Accounting and Auditing Profession

    Get PDF
    The unique features of blockchain technology (BCT) - peer-to-peer network, distribution ledger, consensus decision-making, transparency, immutability, auditability, and cryptographic security - coupled with the success enjoyed by Bitcoin and other cryptocurrencies have encouraged many to assume that the technology would revolutionise virtually all aspects of business. A growing body of scholarship suggests that BCT would disrupt the accounting and auditing fields by changing accounting practices, disintermediating auditors, and eliminating financial fraud. BCT disrupts audits (Lombard et al.,2021), reduces the role of audit firms (Yermack 2017), undermines accountants' roles with software developers and miners (Fortin & Pimentel 2022); eliminates many management functions, transforms businesses (Tapscott & Tapscott, 2017), facilitates a triple-entry accounting system (Cai, 2021), and prevents fraudulent transactions (Dai, et al., 2017; Rakshit et al., 2022). Despite these speculations, scholars have acknowledged that the application of BCT in the accounting and assurance industry is underexplored and many existing studies are said to lack engagement with practitioners (Dai & Vasarhelyi, 2017; Lombardi et al., 2021; Schmitz & Leoni, 2019). This study empirically explored whether BCT disrupts or enhances accounting and auditing fields. It also explored the relevance of audit in a BCT environment and the effectiveness of the BCT mechanism for fraud prevention and detection. The study further examined which technical skillsets accountants and auditors require in a BCT environment, and explored the incentives, barriers, and unintended consequences of the adoption of BCT in the accounting and auditing professions. The current COVID-19 environment was also investigated in terms of whether the pandemic has improved BCT adoption or not. A qualitative exploratory study used semi-structured interviews to engage practitioners from blockchain start-ups, IT experts, financial analysts, accountants, auditors, academics, organisational leaders, consultants, and editors who understood the technology. With the aid of NVIVO qualitative analysis software, the views of 44 participants from 13 countries: New Zealand, Australia, United States, United Kingdom, Canada, Germany, Italy, Ireland, Hong Kong, India, Pakistan, United Arab Emirates, and South Africa were analysed. The Technological, Organisational, and Environmental (TOE) framework with consequences of innovation context was adopted for this study. This expanded TOE framework was used as the theoretical lens to understand the disruption of BCT and its adoption in the accounting and auditing fields. Four clear patterns emerged. First, BCT is an emerging tool that accountants and auditors use mainly to analyse financial records because technology cannot disintermediate auditors from the financial system. Second, the technology can detect anomalies but cannot prevent financial fraud. Third, BCT has not been adopted by any organisation for financial reporting and accounting purposes, and accountants and auditors do not require new skillsets or an understanding of the BCT programming language to be able to operate in a BCT domain. Fourth, the advent of COVID-19 has not substantially enhanced the adoption of BCT. Additionally, this study highlights the incentives, barriers, and unintended consequences of adopting BCT as financial technology (FinTech). These findings shed light on important questions about BCT disrupting and disintermediating auditors, the extent of adoption in the accounting industry, preventing fraud and anomalies, and underscores the notion that blockchain, as an emerging technology, currently does not appear to be substantially disrupting the accounting and auditing profession. This study makes methodological, theoretical, and practical contributions. At the methodological level, the study adopted the social constructivist-interpretivism paradigm with an exploratory qualitative method to engage and understand BCT as a disruptive innovation in the accounting industry. The engagement with practitioners from diverse fields, professions, and different countries provides a distinctive and innovative contribution to methodological and practical knowledge. At the theoretical level, the findings contribute to the literature by offering an integrated conceptual TOE framework. The framework offers a reference for practitioners, academics and policymakers seeking to appraise comprehensive factors influencing BCT adoption and its likely unintended consequences. The findings suggest that, at present, no organisations are using BCT for financial reporting and accounting systems. This study contributes to practice by highlighting the differences between initial expectations and practical applications of what BCT can do in the accounting and auditing fields. The study could not find any empirical evidence that BCT will disrupt audits, eliminate the roles of auditors in a financial system, and prevent and detect financial fraud. Also, there was no significant evidence that accountants and auditors required higher-level skillsets and an understanding of BCT programming language to be able to use the technology. Future research should consider the implications of an external audit firm as a node in a BCT network on the internal audit functions. It is equally important to critically examine the relevance of including programming languages or codes in the curriculum of undergraduate accounting students. Future research could also empirically evaluate if a BCT-enabled triple-entry system could prevent financial statements and management fraud

    Application of deep learning methods in materials microscopy for the quality assessment of lithium-ion batteries and sintered NdFeB magnets

    Get PDF
    Die Qualitätskontrolle konzentriert sich auf die Erkennung von Produktfehlern und die Überwachung von Aktivitäten, um zu überprüfen, ob die Produkte den gewünschten Qualitätsstandard erfüllen. Viele Ansätze für die Qualitätskontrolle verwenden spezialisierte Bildverarbeitungssoftware, die auf manuell entwickelten Merkmalen basiert, die von Fachleuten entwickelt wurden, um Objekte zu erkennen und Bilder zu analysieren. Diese Modelle sind jedoch mühsam, kostspielig in der Entwicklung und schwer zu pflegen, während die erstellte Lösung oft spröde ist und für leicht unterschiedliche Anwendungsfälle erhebliche Anpassungen erfordert. Aus diesen Gründen wird die Qualitätskontrolle in der Industrie immer noch häufig manuell durchgeführt, was zeitaufwändig und fehleranfällig ist. Daher schlagen wir einen allgemeineren datengesteuerten Ansatz vor, der auf den jüngsten Fortschritten in der Computer-Vision-Technologie basiert und Faltungsneuronale Netze verwendet, um repräsentative Merkmale direkt aus den Daten zu lernen. Während herkömmliche Methoden handgefertigte Merkmale verwenden, um einzelne Objekte zu erkennen, lernen Deep-Learning-Ansätze verallgemeinerbare Merkmale direkt aus den Trainingsproben, um verschiedene Objekte zu erkennen. In dieser Dissertation werden Modelle und Techniken für die automatisierte Erkennung von Defekten in lichtmikroskopischen Bildern von materialografisch präparierten Schnitten entwickelt. Wir entwickeln Modelle zur Defekterkennung, die sich grob in überwachte und unüberwachte Deep-Learning-Techniken einteilen lassen. Insbesondere werden verschiedene überwachte Deep-Learning-Modelle zur Erkennung von Defekten in der Mikrostruktur von Lithium-Ionen-Batterien entwickelt, von binären Klassifizierungsmodellen, die auf einem Sliding-Window-Ansatz mit begrenzten Trainingsdaten basieren, bis hin zu komplexen Defekterkennungs- und Lokalisierungsmodellen, die auf ein- und zweistufigen Detektoren basieren. Unser endgültiges Modell kann mehrere Klassen von Defekten in großen Mikroskopiebildern mit hoher Genauigkeit und nahezu in Echtzeit erkennen und lokalisieren. Das erfolgreiche Trainieren von überwachten Deep-Learning-Modellen erfordert jedoch in der Regel eine ausreichend große Menge an markierten Trainingsbeispielen, die oft nicht ohne weiteres verfügbar sind und deren Beschaffung sehr kostspielig sein kann. Daher schlagen wir zwei Ansätze vor, die auf unbeaufsichtigtem Deep Learning zur Erkennung von Anomalien in der Mikrostruktur von gesinterten NdFeB-Magneten basieren, ohne dass markierte Trainingsdaten benötigt werden. Die Modelle sind in der Lage, Defekte zu erkennen, indem sie aus den Trainingsdaten indikative Merkmale von nur "normalen" Mikrostrukturmustern lernen. Wir zeigen experimentelle Ergebnisse der vorgeschlagenen Fehlererkennungssysteme, indem wir eine Qualitätsbewertung an kommerziellen Proben von Lithium-Ionen-Batterien und gesinterten NdFeB-Magneten durchführen

    Volumetric memory network for interactive medical image segmentation

    Full text link
    Despite recent progress of automatic medical image segmentation techniques, fully automatic results usually fail to meet clinically acceptable accuracy, thus typically require further refinement. To this end, we propose a novel Volumetric Memory Network, dubbed as VMN, to enable segmentation of 3D medical images in an interactive manner. Provided by user hints on an arbitrary slice, a 2D interaction network is firstly employed to produce an initial 2D segmentation for the chosen slice. Then, the VMN propagates the initial segmentation mask bidirectionally to all slices of the entire volume. Subsequent refinement based on additional user guidance on other slices can be incorporated in the same manner. To facilitate smooth human-in-the-loop segmentation, a quality assessment module is introduced to suggest the next slice for interaction based on the segmentation quality of each slice produced in the previous round. Our VMN demonstrates two distinctive features: First, the memory-augmented network design offers our model the ability to quickly encode past segmentation information, which will be retrieved later for the segmentation of other slices; Second, the quality assessment module enables the model to directly estimate the quality of each segmentation prediction, which allows for an active learning paradigm where users preferentially label the lowest-quality slice for multi-round refinement. The proposed network leads to a robust interactive segmentation engine, which can generalize well to various types of user annotations (e.g., scribble, bounding box, extreme clicking). Extensive experiments have been conducted on three public medical image segmentation datasets (i.e., MSD, KiTS19, CVC-ClinicDB), and the results clearly confirm the superiority of our approach in comparison with state-of-the-art segmentation models. The code is made publicly available at https://github.com/0liliulei/Mem3D

    On Transforming Reinforcement Learning by Transformer: The Development Trajectory

    Full text link
    Transformer, originally devised for natural language processing, has also attested significant success in computer vision. Thanks to its super expressive power, researchers are investigating ways to deploy transformers to reinforcement learning (RL) and the transformer-based models have manifested their potential in representative RL benchmarks. In this paper, we collect and dissect recent advances on transforming RL by transformer (transformer-based RL or TRL), in order to explore its development trajectory and future trend. We group existing developments in two categories: architecture enhancement and trajectory optimization, and examine the main applications of TRL in robotic manipulation, text-based games, navigation and autonomous driving. For architecture enhancement, these methods consider how to apply the powerful transformer structure to RL problems under the traditional RL framework, which model agents and environments much more precisely than deep RL methods, but they are still limited by the inherent defects of traditional RL algorithms, such as bootstrapping and "deadly triad". For trajectory optimization, these methods treat RL problems as sequence modeling and train a joint state-action model over entire trajectories under the behavior cloning framework, which are able to extract policies from static datasets and fully use the long-sequence modeling capability of the transformer. Given these advancements, extensions and challenges in TRL are reviewed and proposals about future direction are discussed. We hope that this survey can provide a detailed introduction to TRL and motivate future research in this rapidly developing field.Comment: 26 page

    A facial depression recognition method based on hybrid multi-head cross attention network

    Get PDF
    IntroductionDeep-learn methods based on convolutional neural networks (CNNs) have demonstrated impressive performance in depression analysis. Nevertheless, some critical challenges need to be resolved in these methods: (1) It is still difficult for CNNs to learn long-range inductive biases in the low-level feature extraction of different facial regions because of the spatial locality. (2) It is difficult for a model with only a single attention head to concentrate on various parts of the face simultaneously, leading to less sensitivity to other important facial regions associated with depression. In the case of facial depression recognition, many of the clues come from a few areas of the face simultaneously, e.g., the mouth and eyes.MethodsTo address these issues, we present an end-to-end integrated framework called Hybrid Multi-head Cross Attention Network (HMHN), which includes two stages. The first stage consists of the Grid-Wise Attention block (GWA) and Deep Feature Fusion block (DFF) for the low-level visual depression feature learning. In the second stage, we obtain the global representation by encoding high-order interactions among local features with Multi-head Cross Attention block (MAB) and Attention Fusion block (AFB).ResultsWe experimented on AVEC2013 and AVEC2014 depression datasets. The results of AVEC 2013 (RMSE = 7.38, MAE = 6.05) and AVEC 2014 (RMSE = 7.60, MAE = 6.01) demonstrated the efficacy of our method and outperformed most of the state-of-the-art video-based depression recognition approaches.DiscussionWe proposed a deep learning hybrid model for depression recognition by capturing the higher-order interactions between the depression features of multiple facial regions, which can effectively reduce the error in depression recognition and gives great potential for clinical experiments

    Using video games to study the acquisition and performance of psychomotor skills

    Get PDF
    Understanding how humans learn complex skills is a fundamental aim of cognitive science. Digital games offer promising opportunities to study cognitive factors associated with skill acquisition and performance, as they motivate longitudinal engagement and produce rich, multivariate data sets. By applying mutlivariate analysis techniques to data arising from gameplay, this thesis extended the literature on cognition as it pertains to psychomotor skill. We describe three studies that were conducted in this regard. In the first study, we analyzed the relationship between the temporal distribution of play instances and performance in a commercial digital game (League of Legends). Using clustering techniques and big data, we demonstrated that players who cram gameplay into short time frames ultimately perform worse than those who space the same number of games over longer periods. In the second study, we examined an experimental data set of participants who played Meta-T, a laboratory version of Tetris. Using Principal Components Analysis and regression techniques, we identified cognitive-behavioural markers of performance, such as action-latency and motor coordination. We also applied Hidden Markov models (HMM) to time series of these markers, showing that moment-to-moment dynamics in performance can be segmented into behavioural states related to latent psychological states. In the third study, we investigated the neural correlates of behavioural states during performance. Using simultaneous MEG and behavioural recordings of participants playing Tetris, we segmented time series datasets of neural activity based on time stamps of behavioural epochs derived from HMMs. We compared behavioural epochs based on neural markers, showing that cognitive states derived from multivariate behavioural data correlate with neural activity in the alpha band power. Taken together, this thesis advances our understanding of using digital game data to study cognition and learning. It demonstrates the feasibility of recording high-density neuroimaging data during complex behavioural tasks and obtaining reliable measures of internal neuronal states during complex behaviour
    corecore