43 research outputs found

    An Optimized Machine Learning and Deep Learning Framework for Facial and Masked Facial Recognition

    In this study, we aimed to find an optimized approach to improving facial and masked facial recognition using machine learning and deep learning techniques. Prior studies only used a single machine learning model for classification and did not report optimal parameter values. In contrast, we utilized a grid search with hyperparameter tuning and nested cross-validation to achieve better results during the verification phase. We performed experiments on a large dataset of facial images with and without masks. Our findings showed that the SVM model with hyperparameter tuning had the highest accuracy compared to other models, achieving a recognition accuracy of 0.99912. The precision values for recognition without masks and with masks were 0.99925 and 0.98417, respectively. We tested our approach in real-life scenarios and found that it accurately identified masked individuals through facial recognition. Furthermore, our study stands out from others as it incorporates hyperparameter tuning and nested cross-validation during the verification phase to enhance the model's performance, generalization, and robustness while optimizing data utilization. Our optimized approach has potential implications for improving security systems in various domains, including public safety and healthcare. DOI: 10.28991/ESJ-2023-07-04-010
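    A minimal sketch of the verification setup described above (a grid search tuned inside nested cross-validation around an SVM), assuming scikit-learn; the data, parameter grid, and fold counts are illustrative placeholders rather than the authors' configuration:

        # Minimal sketch: grid-searched SVM inside nested cross-validation.
        # Data, parameter grid and fold counts are illustrative placeholders.
        import numpy as np
        from sklearn.svm import SVC
        from sklearn.model_selection import GridSearchCV, cross_val_score, StratifiedKFold

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 128))          # stand-in for face embeddings
        y = rng.integers(0, 2, size=200)         # stand-in identity labels

        param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01], "kernel": ["rbf"]}
        inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
        outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

        search = GridSearchCV(SVC(), param_grid, cv=inner, scoring="accuracy")
        scores = cross_val_score(search, X, y, cv=outer, scoring="accuracy")
        print(f"nested-CV accuracy: {scores.mean():.5f} +/- {scores.std():.5f}")

    Because the grid search is cloned and re-fitted inside every outer fold, hyperparameters are tuned without ever seeing the held-out data used for the reported accuracy.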

    Cost-sensitive learning classification strategy for predicting product failures

    In the current era of Industry 4.0, sensor data used in connection with machine learning algorithms can help manufacturing industries to reduce costs and to predict failures in advance. This paper addresses a binary classification problem found in manufacturing engineering, which focuses on how to ensure the delivery of quality products while at the same time reducing production costs. The aim is to predict faulty products, whose number is in this case extremely low. As a result of this characteristic, the problem reduces to an imbalanced binary classification problem. The authors contribute to imbalanced classification research in three important ways. First, the industrial application, coming from the electronic manufacturing industry, is presented in detail, along with its data and modelling challenges. Second, a modified cost-sensitive classification strategy based on a combination of Voronoi diagrams and a genetic algorithm is applied to tackle this problem and is compared to several base classifiers. The results obtained are promising for this specific application. Third, in order to evaluate the flexibility of the strategy and to demonstrate its wide range of applicability, 25 real-world data sets with different imbalance ratios and numbers of features are selected from the KEEL repository. The strategy, in this case implemented without a predefined cost, is compared with the same base classifiers as those used for the industrial problem.
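    The strategy studied above combines Voronoi diagrams with a genetic algorithm; that procedure is not reproduced here. As a hedged point of reference only, the sketch below shows a standard cost-sensitive baseline in scikit-learn that reweights the rare faulty class instead of requiring a predefined misclassification cost; the data and imbalance ratio are placeholders:

        # Cost-sensitive baseline for an imbalanced binary problem:
        # weight the rare "faulty" class more heavily via class weights.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import balanced_accuracy_score

        rng = np.random.default_rng(1)
        n = 5000
        X = rng.normal(size=(n, 20))
        y = (rng.random(n) < 0.02).astype(int)   # ~2% faulty products (illustrative)

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

        # class_weight="balanced" sets weights inversely proportional to class
        # frequency, so no fixed cost has to be specified in advance.
        clf = LogisticRegression(class_weight="balanced", max_iter=1000)
        clf.fit(X_tr, y_tr)
        print("balanced accuracy:", balanced_accuracy_score(y_te, clf.predict(X_te)))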

    Model-informed classification of broadband acoustic backscatter from zooplankton in an in situ mesocosm

    Funding: The fieldwork was registered in the Research in Svalbard database (RiS ID 11578). Fieldwork and research were financed by Arctic Field Grant Project AZKABAN-light (Norwegian Research Council project no. 322 332), Deep Impact (Norwegian Research Council project no. 300 333), Deeper Impact (Norwegian Research Council project no. 329 305), Marine Alliance for Science and Technology in Scotland (MASTS), the Ocean Frontier Institute (SCORE grant no. HR09011), and Glider Phase II financed by ConocoPhillips Skandinavia AS. Geir Pedersen’s participation was co-funded by CRIMAC (Norwegian Research Council project no. 309 512). Maxime Geoffroy was financially supported by the Ocean Frontier Institute of the Canada First Research Excellence Fund, the Natural Sciences and Engineering Research Council Discovery Grant Programme, the ArcticNet Network of Centres of Excellence Canada, the Research Council of Norway Grant Deep Impact, and the Fisheries and Oceans Canada through the Atlantic Fisheries Fund.

    Classification of zooplankton to species with broadband echosounder data could increase the taxonomic resolution of acoustic surveys and reduce the dependence on net and trawl samples for ‘ground truthing’. Supervised classification with broadband echosounder data is limited by the acquisition of validated data required to train machine learning algorithms (‘classifiers’). We tested the hypothesis that acoustic scattering models could be used to train classifiers for remote classification of zooplankton. Three classifiers were trained with data from scattering models of four Arctic zooplankton groups (copepods, euphausiids, chaetognaths, and hydrozoans). We evaluated classifier predictions against observations of a mixed zooplankton community in a submerged purpose-built mesocosm (12 m³) insonified with broadband transmissions (185–255 kHz). The mesocosm was deployed from a wharf in Ny-Ålesund, Svalbard, during the Arctic polar night in January 2022. We detected 7722 tracked single targets, which were used to evaluate the classifier predictions of measured zooplankton targets. The classifiers could differentiate copepods from the other groups reasonably well, but they could not differentiate euphausiids, chaetognaths, and hydrozoans reliably due to the similarities in their modelled target spectra. We recommend that model-informed classification of zooplankton from broadband acoustic signals be used with caution until a better understanding of in situ target spectra variability is gained.
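    A rough sketch of the model-informed workflow described above (train classifiers on scattering-model output, then predict measured single targets), assuming scikit-learn; the "modelled" spectra below are random placeholders, not the acoustic scattering models used in the study:

        # Sketch: train a classifier on model-generated target spectra (labelled by
        # taxon) and apply it to measured spectra from tracked single targets.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(2)
        groups = ["copepod", "euphausiid", "chaetognath", "hydrozoan"]
        freqs = np.linspace(185e3, 255e3, 70)                 # broadband band, Hz

        # Placeholder "modelled" spectra: one noisy template per zooplankton group.
        templates = rng.normal(-90, 5, size=(len(groups), freqs.size))
        X_model = np.vstack([t + rng.normal(0, 1, size=(200, freqs.size)) for t in templates])
        y_model = np.repeat(groups, 200)

        clf = RandomForestClassifier(n_estimators=200, random_state=2)
        clf.fit(X_model, y_model)                             # trained on model output only

        X_measured = templates[0] + rng.normal(0, 1, size=(10, freqs.size))  # stand-in tracks
        print(clf.predict(X_measured))                        # remote classification step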

    Searching for biomarkers of disease-free survival in head and neck cancers using PET/CT radiomics

    The goals of this thesis were to (1) study methodologies for radiomics data analysis, and (2) apply such methods to identify biomarkers of disease-free survival in head and neck cancers. Procedures for radiomics feature extraction and feature exploration in biomarker discovery were implemented in the Python programming language. The code is available at https://github.com/gsel9/biorad. In a retrospective study of disease-free survival as response to radiotherapy, radiomics features were extracted from PET/CT images of 198 head and neck cancer patients. A total of 513 features were obtained by combining the radiomics features with clinical factors and PET parameters. Combinations of seven feature selection and 10 classification algorithms were evaluated in terms of their ability to predict patient treatment response. Using a combination of MultiSURF feature selection and Extreme Gradient Boosting classification, subgroup analyses of HPV-negative oropharyngeal (HPV-unrelated) cancers gave an area under the Receiver Operating Characteristic curve (AUC) of 76.4 ± 13.2%. This performance was superior to the baseline of 54% for disease-free survival outcomes in the patient subgroup. Four features were identified as prognostic of disease-free survival in the HPV-unrelated cohort. Among these were two CT features capturing intratumour heterogeneity. Another feature described tumour shape and was, contrary to the CT features, significantly correlated with the tumour volume. The fourth feature was the median CT intensity. Determining the prognostic value of these features in an independent cohort will elucidate the relevance of tumour volume and intratumour heterogeneity in treatment of HPV-unrelated head and neck cancer.
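    A hedged sketch of the best-performing combination reported above, MultiSURF feature selection followed by Extreme Gradient Boosting scored by cross-validated AUC, assuming the scikit-rebate (skrebate) and xgboost packages; the feature matrix, selector settings, and fold counts are placeholders, and the thesis code at the repository above remains the authoritative implementation:

        # Sketch: MultiSURF feature selection + gradient-boosted classification,
        # evaluated with cross-validated AUC. Data and parameters are illustrative.
        import numpy as np
        from skrebate import MultiSURF                    # scikit-rebate implementation
        from xgboost import XGBClassifier
        from sklearn.pipeline import Pipeline
        from sklearn.model_selection import cross_val_score, StratifiedKFold

        rng = np.random.default_rng(3)
        X = rng.normal(size=(198, 513))                   # 513 radiomics/clinical features
        y = rng.integers(0, 2, size=198)                  # disease-free survival outcome

        pipe = Pipeline([
            ("select", MultiSURF(n_features_to_select=30, n_jobs=-1)),
            ("clf", XGBClassifier(eval_metric="logloss")),
        ])
        cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=3)
        auc = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
        print(f"AUC: {auc.mean():.3f} +/- {auc.std():.3f}")

    Placing the selector inside the pipeline means feature selection is re-fitted on each training fold, so the AUC estimate is not inflated by selection leakage.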

    Associations between the urban exposome and type 2 diabetes: Results from penalised regression by least absolute shrinkage and selection operator and random forest models

    BACKGROUND: Type 2 diabetes (T2D) is thought to be influenced by environmental stressors such as air pollution and noise. Although environmental factors are interrelated, studies considering the exposome are lacking. We simultaneously assessed a variety of exposures in their association with prevalent T2D by applying penalised regression by the Least Absolute Shrinkage and Selection Operator (LASSO), Random Forest (RF), and Artificial Neural Network (ANN) approaches. We contrasted the findings with single-exposure models including consistently associated risk factors reported by previous studies. METHODS: Baseline data (n = 14,829) of the Occupational and Environmental Health Cohort study (AMIGO) were enriched with 85 exposome factors (air pollution, noise, built environment, neighbourhood socio-economic factors, etc.) using the home addresses of participants. Questionnaires were used to identify participants with T2D (n = 676; 4.6%). Models in all applied statistical approaches were adjusted for individual-level socio-demographic variables. RESULTS: Lower average home values, a higher share of non-Western immigrants, and higher surface temperatures were related to a higher risk of T2D in the multivariable models (LASSO, RF). Selected variables differed between the two multivariable approaches, especially for weaker predictors. Some established risk factors (air pollutants) appeared in univariate analysis but were not among the most important factors in multivariable analysis. Other established factors (green space) did not appear in univariate analysis but did appear in multivariable analysis (RF). Average estimates of the prediction error (logLoss) from nested cross-validation showed that the LASSO outperformed both the RF and ANN approaches. CONCLUSIONS: Neighbourhood socio-economic and socio-demographic characteristics and surface temperature were consistently associated with the risk of T2D. For other physical-chemical factors, associations differed per analytical approach.
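    A minimal sketch of one of the three approaches above, L1-penalised (LASSO) logistic regression scored by log loss, assuming scikit-learn; the exposome matrix and T2D prevalence below are simulated placeholders:

        # Sketch: L1-penalised (LASSO) logistic regression for a binary T2D outcome,
        # with penalty strength chosen by cross-validation and scored by log loss.
        import numpy as np
        from sklearn.linear_model import LogisticRegressionCV
        from sklearn.preprocessing import StandardScaler
        from sklearn.pipeline import make_pipeline
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(4)
        X = rng.normal(size=(2000, 85))            # 85 exposome factors (placeholder)
        y = (rng.random(2000) < 0.046).astype(int) # ~4.6% prevalent T2D (illustrative)

        lasso = make_pipeline(
            StandardScaler(),
            LogisticRegressionCV(penalty="l1", solver="saga", Cs=10, cv=5,
                                 scoring="neg_log_loss", max_iter=5000),
        )
        logloss = -cross_val_score(lasso, X, y, cv=5, scoring="neg_log_loss")
        print(f"mean log loss: {logloss.mean():.3f}")
        # Coefficients shrunk exactly to zero drop the corresponding exposure.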

    kCV-B: Bootstrap with Cross-Validation for Deep Learning Model Development, Assessment and Selection

    This study investigates the inability of two popular data-splitting techniques, the train/test split and k-fold cross-validation, which are used to create training and validation data sets, to achieve sufficient generality for supervised deep learning (DL) methods. This failure is mainly caused by their limited ability to create new data. The bootstrap, in contrast, is a computer-based statistical resampling method that has been used efficiently to estimate the distribution of a sample estimator and to assess a model without knowledge of the population. This paper couples cross-validation with the bootstrap to combine their respective advantages as data-generation strategies and to achieve better generalization of a DL model. The paper contributes by: (i) developing an algorithm for better selection of training and validation data sets, (ii) exploring the potential of the bootstrap for drawing statistical inference on the necessary performance metrics (e.g., mean square error), and (iii) introducing a method that can assess and improve the efficiency of a DL model. The proposed method is applied to semantic segmentation and is demonstrated via a DL-based classification algorithm, PointNet, on aerial laser scanning point cloud data.
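    One possible reading of the coupling described above, assuming scikit-learn and a small regressor as a stand-in for the deep model: bootstrap resamples of each training fold yield a distribution of the error metric rather than a single estimate. The exact kCV-B algorithm may differ from this sketch:

        # Sketch: couple k-fold cross-validation with the bootstrap. Within each fold,
        # bootstrap resamples of the training part give a distribution of MSE.
        import numpy as np
        from sklearn.model_selection import KFold
        from sklearn.neural_network import MLPRegressor
        from sklearn.metrics import mean_squared_error

        rng = np.random.default_rng(5)
        X = rng.normal(size=(500, 10))
        y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=500)

        mse_samples = []
        for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=5).split(X):
            for _ in range(20):                               # bootstrap replicates per fold
                boot = rng.choice(train_idx, size=train_idx.size, replace=True)
                model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
                model.fit(X[boot], y[boot])
                mse_samples.append(mean_squared_error(y[val_idx], model.predict(X[val_idx])))

        mse_samples = np.asarray(mse_samples)
        print(f"MSE: mean {mse_samples.mean():.4f}, 95% interval "
              f"({np.percentile(mse_samples, 2.5):.4f}, {np.percentile(mse_samples, 97.5):.4f})")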

    Machine Learning Applied to Raman Spectroscopy to Classify Cancers

    Cancer diagnosis is notoriously difficult, as is evident in the inter-rater variability between histopathologists classifying cancerous sub-types. Although there are many cancer pathologies, they have in common that earlier diagnosis would maximise treatment potential. To reduce this variability and expedite diagnosis, there has been a drive to arm histopathologists with additional tools. One such tool is Raman spectroscopy, which has demonstrated potential in distinguishing between various cancer types. However, Raman data are high-dimensional and often contain artefacts; together with the challenges inherent to medical data, this can frustrate classification attempts. Deep learning has recently emerged with the promise of unlocking many complex datasets, but it is not clear how this modelling paradigm can best exploit Raman data for cancer diagnosis. Three Raman oncology datasets (from ovarian, colonic and oesophageal tissue) were used to examine various methodological challenges to machine learning applied to Raman data, in conjunction with a thorough review of the recent literature. The performance on each dataset is assessed with two traditional models and one deep learning model. A technique is then applied to the deep learning model to aid interpretability and relate biochemical antecedents to disease classes. In addition, a clinical problem for each dataset was addressed, including the transferability of models developed using multi-centre Raman data taken on different spectrometers of the same make. Many subtleties of data processing were found to be important to the realistic assessment of machine learning models. In particular, appropriate cross-validation during hyperparameter selection, splitting data into training and test sets according to the inherent structure of biomedical data, and addressing the number of samples per disease class were all found to be important factors. Additionally, it was found that instrument correction was not needed to ensure system transferability if Raman data are collected with a common protocol on spectrometers of the same make.
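    A minimal sketch of two of the practices highlighted above, splitting by the inherent structure of biomedical data (here, by patient) and tuning hyperparameters only within the training folds, assuming scikit-learn; the spectra, labels, and parameter grid are placeholders:

        # Sketch: split Raman spectra by patient (GroupKFold) so spectra from the same
        # patient never leak across the train/test boundary, and tune hyperparameters
        # only inside the training fold.
        import numpy as np
        from sklearn.svm import SVC
        from sklearn.model_selection import GroupKFold, GridSearchCV

        rng = np.random.default_rng(6)
        X = rng.normal(size=(600, 900))                 # stand-in Raman spectra
        y = rng.integers(0, 2, size=600)                # disease class per spectrum
        patients = rng.integers(0, 60, size=600)        # several spectra per patient

        outer = GroupKFold(n_splits=5)
        scores = []
        for tr, te in outer.split(X, y, groups=patients):
            search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=GroupKFold(n_splits=3))
            search.fit(X[tr], y[tr], groups=patients[tr])   # tune within training fold only
            scores.append(search.score(X[te], y[te]))
        print(f"patient-wise accuracy: {np.mean(scores):.3f}")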

    A Novel Malware Target Recognition Architecture for Enhanced Cyberspace Situation Awareness

    The rapid transition of critical business processes to computer networks potentially exposes organizations to digital theft or corruption by advanced competitors. One tool used for these tasks is malware, because it circumvents legitimate authentication mechanisms. Malware is an epidemic problem for organizations of all types. This research proposes and evaluates a novel Malware Target Recognition (MaTR) architecture for malware detection and identification of propagation methods and payloads to enhance situation awareness in tactical scenarios, using non-instruction-based, static heuristic features. MaTR achieves a 99.92% detection accuracy on known malware with false positive and false negative rates of 8.73e-4 and 8.03e-4, respectively. MaTR outperforms leading static heuristic methods with a statistically significant 1% improvement in detection accuracy and 85% and 94% reductions in false positive and false negative rates, respectively. Against a set of publicly unknown malware, MaTR's detection accuracy is 98.56%, a 65% performance improvement over the combined effectiveness of three commercial antivirus products.
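    MaTR's static heuristic feature set and decision logic are not reproduced here; the sketch below only illustrates the style of evaluation quoted above (detection accuracy plus false positive and false negative rates from a confusion matrix), assuming scikit-learn and random placeholder features:

        # Sketch: a static-feature malware classifier evaluated by detection accuracy
        # and false positive / false negative rates. Features are random placeholders,
        # not the MaTR static heuristic feature set.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import accuracy_score, confusion_matrix

        rng = np.random.default_rng(7)
        X = rng.normal(size=(4000, 50))                 # placeholder static features
        y = rng.integers(0, 2, size=4000)               # 1 = malware, 0 = benign

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=7)
        clf = RandomForestClassifier(n_estimators=300, random_state=7).fit(X_tr, y_tr)

        pred = clf.predict(X_te)
        tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
        print(f"accuracy: {accuracy_score(y_te, pred):.4f}")
        print(f"false positive rate: {fp / (fp + tn):.2e}")
        print(f"false negative rate: {fn / (fn + tp):.2e}")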