
    Inexpensive fusion methods for enhancing feature detection

    Recent successful approaches to high-level feature detection in image and video data have treated the problem as a pattern classification task. These typically leverage techniques from statistical machine learning, coupled with ensemble architectures that create multiple feature detection models. Once these models are created, co-occurrence between learned features can be captured to further boost performance. At multiple stages throughout these frameworks, various pieces of evidence can be fused together to boost performance. These approaches, whilst very successful, are computationally expensive and, depending on the task, require significant computational resources. In this paper we propose two fusion methods that aim to combine the output of an initial basic statistical machine learning approach with a lower-quality information source, in order to gain diversity in the classified results whilst requiring only modest computing resources. Our approaches, validated experimentally on TRECVid data, are designed to be complementary to existing frameworks and can be regarded as possible replacements for the more computationally expensive combination strategies used elsewhere.
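
    The abstract does not spell out the combination rule, but the general idea of cheaply fusing a strong classifier's scores with a weaker auxiliary source can be sketched as a weighted sum over normalised score lists (CombSUM-style late fusion). The weight alpha, the min-max normalisation, and the example scores below are illustrative assumptions, not the authors' method.

```python
# Hedged sketch: weighted late fusion of a primary detector's scores with a
# cheaper, lower-quality auxiliary source. `alpha` and the normalisation
# scheme are assumptions for illustration.

def minmax(scores):
    """Normalise a {shot_id: score} map to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {k: (v - lo) / span for k, v in scores.items()}

def fuse(primary, auxiliary, alpha=0.8):
    """CombSUM-style weighted sum; alpha favours the stronger source."""
    p, a = minmax(primary), minmax(auxiliary)
    return {k: alpha * p.get(k, 0.0) + (1 - alpha) * a.get(k, 0.0)
            for k in set(p) | set(a)}

# Hypothetical scores: an SVM detector fused with a cheap co-occurrence prior.
svm_scores = {"shot1": 0.91, "shot2": 0.40, "shot3": 0.72}
prior_scores = {"shot1": 0.30, "shot2": 0.85, "shot4": 0.10}
print(sorted(fuse(svm_scores, prior_scores).items(), key=lambda kv: -kv[1]))
```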

    TRECVid 2007 experiments at Dublin City University

    In this paper we describe our retrieval system and the experiments performed for the automatic search task in TRECVid 2007. We submitted the following six automatic runs:
    • F_A_1_DCU-TextOnly6: Baseline run using only ASR/MT text features.
    • F_A_1_DCU-ImgBaseline4: Baseline visual-expert-only run, no ASR/MT used. Made use of query-time generation of retrieval expert coefficients for fusion.
    • F_A_2_DCU-ImgOnlyEnt5: Automatic generation of retrieval expert coefficients for fusion at index time.
    • F_A_2_DCU-imgOnlyEntHigh3: Combination of the coefficients generated by the query-time approach and the index-time approach, with greater weight given to the index-time coefficient.
    • F_A_2_DCU-imgOnlyEntAuto2: As above, except that greater weight is given to the generated query-time coefficient.
    • F_A_2_DCU-autoMixed1: Query-time expert coefficient generation using both visual and text experts.
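
    The two coefficient-combination runs can be pictured as a convex blend of the index-time and query-time expert weights. The sketch below is an illustrative reading of that description, not DCU's implementation; the blend factor beta, the expert names, and the renormalisation step are assumptions.

```python
# Hedged sketch: blending index-time and query-time retrieval-expert
# coefficients, then fusing expert scores with the blended weights.
# beta > 0.5 favours the index-time coefficients (the *High* variant);
# beta < 0.5 favours the query-time ones (the *Auto* variant).

def blend_coefficients(index_time, query_time, beta=0.7):
    experts = set(index_time) | set(query_time)
    raw = {e: beta * index_time.get(e, 0.0) + (1 - beta) * query_time.get(e, 0.0)
           for e in experts}
    total = sum(raw.values()) or 1.0
    return {e: w / total for e, w in raw.items()}  # renormalise to sum to 1

def fuse_experts(expert_scores, coeffs):
    """Weighted-sum fusion of per-expert scores for one video shot."""
    return sum(coeffs[e] * s for e, s in expert_scores.items())

# Hypothetical expert names and weights, for illustration only.
coeffs = blend_coefficients({"colour": 0.5, "edge": 0.5},
                            {"colour": 0.8, "edge": 0.2})
print(fuse_experts({"colour": 0.6, "edge": 0.3}, coeffs))
```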

    A Generalized Multi-Modal Fusion Detection Framework

    LiDAR point clouds have become the most common data source in autonomous driving. However, due to the sparsity of point clouds, accurate and reliable detection cannot be achieved in specific scenarios. Because of their complementarity with point clouds, images are receiving increasing attention. Although existing fusion methods have had some success, they either perform hard fusion or do not fuse in a direct manner. In this paper, we propose a generic 3D detection framework called MMFusion, using multi-modal features. The framework aims to achieve accurate fusion between LiDAR and images to improve 3D detection in complex scenes. Our framework consists of two separate streams: the LiDAR stream and the camera stream, which are compatible with any single-modal feature extraction network. The Voxel Local Perception Module in the LiDAR stream enhances local feature representation, and the Multi-modal Feature Fusion Module then selectively combines the feature outputs from the different streams to achieve better fusion. Extensive experiments show that our framework not only outperforms existing methods on standard benchmarks but also improves detection, especially of cyclists and pedestrians on the KITTI benchmark, with strong robustness and generalization capabilities. We hope our work will stimulate more research into multi-modal fusion for autonomous driving tasks.
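
    The abstract describes the fusion module only as "selectively" combining stream outputs. One common way to realise selective fusion is a learned per-location gate between the two feature maps; the PyTorch sketch below shows that pattern under assumed shapes and is not the authors' MMFusion module.

```python
# Hedged sketch (PyTorch): gated fusion that selectively mixes LiDAR and
# camera feature maps. Channel count, gate design, and shapes are assumptions.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # A 1x1 conv predicts a per-location, per-channel gate in (0, 1).
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, lidar_feat, cam_feat):
        g = self.gate(torch.cat([lidar_feat, cam_feat], dim=1))
        # The gate interpolates between the two modalities at each location.
        return g * lidar_feat + (1 - g) * cam_feat

fused = GatedFusion(64)(torch.randn(2, 64, 50, 50), torch.randn(2, 64, 50, 50))
print(fused.shape)  # torch.Size([2, 64, 50, 50])
```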

    Secure Face and Liveness Detection with Criminal Identification for Security Systems

    The advancement of computer vision, machine learning, and image processing techniques has opened new avenues for enhancing security systems. This research work focuses on developing a robust and secure framework for face and liveness detection with criminal identification, specifically designed for security systems. Machine learning algorithms and image processing techniques are employed for accurate face detection and liveness verification, and advanced facial recognition methods are utilized for criminal identification. The framework incorporates machine learning to ensure data integrity, together with identification techniques for the security system. Experimental evaluations demonstrate the system's effectiveness in detecting faces, verifying liveness, and identifying potential criminals. The proposed framework has the potential to enhance security systems, providing reliable and secure face and liveness detection for improved safety and security. The accuracy of the algorithm is 94.30 percent. The model's accuracy remains satisfactory even when the results are obtained by combining human-written rules with conventional machine learning classification algorithms. Still, there is scope for improving the precision of attack classification.
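
    The paper's own models are not described here, so the sketch below only illustrates the general shape of such a pipeline: OpenCV Haar-cascade face detection plus a crude motion-based liveness heuristic (a printed photo shows near-zero inter-frame change). The threshold, frame budget, and cascade choice are all assumptions, standing in for the paper's ML-based approach.

```python
# Hedged sketch: face detection plus a naive motion-based liveness check.
# This is a stand-in heuristic, not the paper's ML-based liveness method.
import cv2
import numpy as np

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def looks_live(prev_gray, gray, box, thresh=2.0):
    """Mean absolute inter-frame difference inside the face box."""
    x, y, w, h = box
    diff = cv2.absdiff(prev_gray[y:y+h, x:x+w], gray[y:y+h, x:x+w])
    return float(np.mean(diff)) > thresh

cap = cv2.VideoCapture(0)                      # assumed webcam source
ok, frame = cap.read()
prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) if ok else None
for _ in range(300):                           # bounded demo loop
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for box in cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
        print("face at", box, "live:", looks_live(prev, gray, box))
    prev = gray
cap.release()
```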

    Multi-set canonical correlation analysis for 3D abnormal gait behaviour recognition based on virtual sample generation

    Small sample datasets and two-dimensional (2D) approaches are challenges for vision-based abnormal gait behaviour recognition (AGBR). The lack of a three-dimensional (3D) structure of the human body limits 2D-based methods in abnormal gait virtual sample generation (VSG). In this paper, 3D AGBR based on VSG and multi-set canonical correlation analysis (3D-AGRBMCCA) is proposed. First, unstructured point cloud data of gait are obtained using a structured light sensor. A 3D parametric body model is then deformed to fit the point cloud data in both shape and posture, converting the point cloud features into a high-level structured representation of the body. The parametric body model is used for VSG based on the estimated body pose and shape data: symmetry virtual samples, pose-perturbation virtual samples, and various body-shape virtual samples with multiple views are generated to extend the training samples. The spatial-temporal features of the abnormal gait behaviour from different views, body poses, and shape parameters are then extracted by a convolutional neural network-based Long Short-Term Memory (CNN-LSTM) network and projected onto a uniform pattern space using deep learning-based multi-set canonical correlation analysis. Experiments on four publicly available datasets show the proposed system performs well under various conditions.
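
    Two of the virtual-sample strategies above lend themselves to a compact sketch: symmetry samples (mirroring the pose about the sagittal plane) and pose-perturbation samples (small Gaussian noise on joint angles). The pose layout (24 joints x 3 axis-angle values, SMPL-style), the left/right joint index pairs, and the noise scale below are assumptions, not the paper's parameters.

```python
# Hedged sketch: generating symmetry and pose-perturbation virtual samples
# from one estimated body pose. Joint indices and noise scale are assumed.
import numpy as np

rng = np.random.default_rng(0)

def symmetry_sample(pose):
    """Mirror a (J, 3) axis-angle pose: swap assumed left/right joints and
    negate the y/z rotation components."""
    mirrored = pose.copy()
    left, right = [1, 4, 7], [2, 5, 8]     # hypothetical left/right pairs
    mirrored[left + right] = pose[right + left]
    mirrored[:, 1:] *= -1.0
    return mirrored

def perturbed_samples(pose, n=10, sigma=0.03):
    """n pose-perturbation samples: small Gaussian noise per joint angle."""
    return pose[None] + rng.normal(0.0, sigma, size=(n, *pose.shape))

pose = rng.normal(size=(24, 3))            # one SMPL-style pose, for example
virtual = [symmetry_sample(pose), *perturbed_samples(pose)]
print(len(virtual), virtual[0].shape)      # 11 (24, 3)
```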

    Studies on deep learning approach in breast lesions detection and cancer diagnosis in mammograms

    Breast cancer accounts for the largest proportion of newly diagnosed cancers in women. Early diagnosis of breast cancer can improve treatment outcomes and reduce mortality. Mammography is convenient and reliable, and it is the most commonly used method for breast cancer screening. However, manual examination is limited by the cost and experience of radiologists, which introduces a high false positive rate and misdiagnoses. Therefore, a high-performance computer-aided diagnosis (CAD) system is significant for lesion detection and cancer diagnosis. Traditional CAD systems for cancer diagnosis require a large number of manually selected features and retain a high false positive rate. Methods based on deep learning can automatically extract image features through the network, but their performance is limited by multicenter data biases, the complexity of lesion features, and the high cost of annotations. It is therefore necessary to propose a CAD system, optimized for the above problems, that improves lesion detection and cancer diagnosis.

    This thesis aims to utilize deep learning methods to improve the performance and effectiveness of CAD systems for lesion detection and cancer diagnosis. Starting from the detection of multiple lesion types with full consideration of the characteristics of mammography, the thesis explores a microcalcification detection method based on multiscale feature fusion and a mass detection method based on multi-view enhancing. A classification method based on multi-instance learning is then developed, which integrates the detection results from the above methods to realize precise lesion detection and cancer diagnosis in mammography.

    For the detection of microcalcifications, a microcalcification detection network named MCDNet is proposed to overcome multicenter data biases, the low resolution of network inputs, and scale differences between microcalcifications. In MCDNet, Adaptive Image Adjustment mitigates the impact of multicenter biases and maximizes the effective input pixels, and a pyramid network with shortcut connections ensures that the feature maps used for detection contain more precise localization and classification information about multiscale objects. Within this structure, trainable Weighted Feature Fusion improves detection performance at both scales by learning the contribution of the feature maps at different stages. Experiments show that MCDNet outperforms other methods in robustness and precision: at an average of one false positive per image, the recall rates for benign and malignant microcalcifications are 96.8% and 98.9%, respectively. MCDNet can effectively help radiologists detect microcalcifications in clinical applications.

    For the detection of breast masses, a weakly supervised multi-view enhancing mass detection network named MVMDNet is proposed to address the lack of lesion-level labels. MVMDNet can be trained on an image-level labeled dataset and extracts additional localization information by exploring the geometric relations between multi-view mammograms. In Multi-view Enhancing, Spatial Correlation Attention extracts corresponding location information between views, while a Sigmoid Weighted Fusion module fuses diagnostic and auxiliary features to improve localization precision. A CAM-based Detection module provides mass detections from the classification labels. Experiments on both an in-house dataset and a public dataset, with [email protected] and [email protected] (recall rate @ average number of false positives per image), demonstrate that MVMDNet achieves state-of-the-art performance among weakly supervised methods and generalizes robustly enough to alleviate multicenter biases.

    For cancer diagnosis, a breast cancer classification network named CancerDNet, based on multi-instance learning, is proposed. CancerDNet solves the problem that lesion features are complex in whole-image classification by utilizing the lesion detection results from the previous chapters. Whole Case Bag Learning is proposed to combine the features extracted from the four views, working like a radiologist to classify each case. Low-capacity Instance Learning and High-capacity Instance Learning integrate the detections of multiple lesion types into CancerDNet, so that the model can fully consider lesions with complex features in the classification task. CancerDNet achieves AUCs of 0.907 and 0.925 on the in-house and public datasets, respectively, which is better than current methods and demonstrates high-performance cancer diagnosis.

    Across these three parts, the thesis fully considers the characteristics of mammograms and proposes deep learning methods for lesion detection and cancer diagnosis. Experiments on in-house and public datasets show that the proposed methods achieve the state of the art in microcalcification detection, mass detection, and case-level cancer classification, with strong multicenter generalization. The results also show that these methods can effectively assist radiologists in making diagnoses while saving labor costs.
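
    Of the components above, the trainable Weighted Feature Fusion is the most self-contained to illustrate: each pyramid stage's feature map receives a learned, softly normalised contribution weight (similar in spirit to BiFPN's fast normalised fusion). The sketch below assumes the stage maps are already resampled to a common shape; it is an interpretation, not the thesis' exact module.

```python
# Hedged sketch (PyTorch): trainable weighted fusion of multiscale feature
# maps. The number of stages and tensor shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFeatureFusion(nn.Module):
    def __init__(self, num_inputs):
        super().__init__()
        self.w = nn.Parameter(torch.ones(num_inputs))  # one weight per stage

    def forward(self, feats):
        # ReLU keeps weights non-negative; normalisation makes them sum to ~1,
        # so the network learns each stage's contribution to detection.
        w = F.relu(self.w)
        w = w / (w.sum() + 1e-4)
        return sum(wi * f for wi, f in zip(w, feats))

stages = [torch.randn(1, 128, 64, 64) for _ in range(3)]  # pre-resampled maps
print(WeightedFeatureFusion(3)(stages).shape)  # torch.Size([1, 128, 64, 64])
```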

    CrossTransUnet: A new computationally inexpensive tumor segmentation model for brain MRI

    Brain tumors are usually fatal diseases with low life expectancies due to the organs they affect, even when the tumors are benign. Diagnosis and treatment of these tumors are challenging tasks, even for experienced physicians and experts, due to the heterogeneity of tumor cells. In recent years, deep learning (DL) methods have been integrated to aid in the diagnosis, detection, and segmentation of brain neoplasms. However, segmentation is a computationally expensive process, typically based on convolutional neural networks (CNNs) in the UNet framework. While UNet has shown promising results, new models and developments can be incorporated into the conventional architecture to improve performance. In this research, we propose three new, computationally inexpensive segmentation networks inspired by Transformers. These networks are designed in a 4-stage deep encoder-decoder structure and implement our new cross-attention model, along with separable convolution layers, to avoid the loss of dimensionality of the activation maps and reduce the computational cost of the models while maintaining high segmentation performance. The new attention model is integrated in different configurations by modifying the transition layers, encoder, and decoder blocks. The proposed networks are evaluated against the classical UNet network, showing differences of up to an order of magnitude in the number of training parameters. Additionally, one of the models outperforms UNet, training in significantly less time and achieving a Dice Similarity Coefficient (DSC) of up to 94%, ensuring high effectiveness in brain tumor segmentation.
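
    The abstract pairs cross-attention with separable convolutions to keep the cost down; one plausible reading is a cross-attention block whose query/key/value projections are depthwise-separable convolutions over the decoder and encoder-skip feature maps. The sketch below follows that reading; the single-head formulation, channel counts, and residual placement are assumptions rather than the paper's configuration.

```python
# Hedged sketch (PyTorch): 2D cross-attention between decoder features
# (queries) and encoder skip features (keys/values), with depthwise-
# separable convolutions as the projections. All sizes are assumed.
import torch
import torch.nn as nn

def sep_conv(c_in, c_out):
    """Depthwise + pointwise convolution (separable conv)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in),
        nn.Conv2d(c_in, c_out, 1),
    )

class CrossAttention2d(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.q = sep_conv(channels, channels)  # queries from decoder stream
        self.k = sep_conv(channels, channels)  # keys from encoder skip
        self.v = sep_conv(channels, channels)  # values from encoder skip
        self.scale = channels ** -0.5

    def forward(self, dec, enc):
        b, c, h, w = dec.shape
        q = self.q(dec).flatten(2).transpose(1, 2)   # (B, HW, C)
        k = self.k(enc).flatten(2)                   # (B, C, HW)
        v = self.v(enc).flatten(2).transpose(1, 2)   # (B, HW, C)
        attn = torch.softmax(q @ k * self.scale, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return out + dec                             # residual connection

x = CrossAttention2d(32)(torch.randn(1, 32, 16, 16), torch.randn(1, 32, 16, 16))
print(x.shape)  # torch.Size([1, 32, 16, 16])
```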

    Is the timed-up and go test feasible in mobile devices? A systematic review

    The number of older adults is increasing worldwide, and it is expected that by 2050 over 2 billion individuals will be more than 60 years old. Older adults are exposed to numerous pathological problems such as Parkinson’s disease, amyotrophic lateral sclerosis, post-stroke conditions, and orthopedic disturbances. Several physiotherapy methods that involve the measurement of movements, such as the Timed-Up and Go test, can support the efficient and effective evaluation of pathological symptoms and the promotion of health and well-being. In this systematic review, the authors aim to determine how the inertial sensors embedded in mobile devices are employed for the measurement of the different parameters involved in the Timed-Up and Go test. The main contribution of this paper is the identification of the different studies that utilize the sensors available in mobile devices for measuring the results of the Timed-Up and Go test. The results show that the motion sensors embedded in mobile devices can be used for these types of studies; the most commonly used sensors are the magnetometer, accelerometer, and gyroscope available in off-the-shelf smartphones. The features analyzed in this paper are categorized as quantitative, quantitative + statistic, dynamic balance, gait properties, state transitions, and raw statistics. These features rely on the accelerometer and gyroscope sensors and facilitate the recognition of daily activities, accidents such as falls, and some diseases, as well as the measurement of the subject's performance during the test execution.
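
    As a toy illustration of the kind of measurement these studies perform, the total TUG duration can be approximated from a smartphone accelerometer as the span where the smoothed acceleration magnitude deviates from rest (1 g). The sampling rate, smoothing window, threshold, and synthetic signal below are assumptions, not values from the reviewed studies.

```python
# Hedged sketch: estimating a Timed-Up-and-Go duration from an accelerometer
# trace by thresholding the smoothed deviation from gravity. Parameters are
# illustrative assumptions.
import numpy as np

FS = 100.0   # assumed sampling rate (Hz)
G = 9.81     # gravity (m/s^2)

def tug_duration(acc_xyz, thresh=0.6, win=25):
    """acc_xyz: (N, 3) accelerometer samples in m/s^2; returns seconds."""
    mag = np.linalg.norm(acc_xyz, axis=1) - G           # deviation from 1 g
    smooth = np.convolve(np.abs(mag), np.ones(win) / win, mode="same")
    active = np.flatnonzero(smooth > thresh)            # samples in motion
    return 0.0 if active.size == 0 else (active[-1] - active[0]) / FS

# Synthetic trace: rest, then an 11-second movement burst, then rest again.
t = np.arange(0, 20, 1 / FS)
acc = np.full((t.size, 3), [0.0, 0.0, G])
burst = slice(int(3 * FS), int(14 * FS))
acc[burst, 2] += 2.0 * np.sin(2 * np.pi * 1.0 * t[burst])
print(f"estimated TUG time: {tug_duration(acc):.1f} s")  # ≈ 11 s
```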