15 research outputs found

    Machine Learning and Feature Ranking for Impact Fall Detection Event Using Multisensor Data

    Falls among individuals, especially the elderly, can lead to serious injuries and complications. Detecting the impact moment within a fall event is crucial for providing timely assistance and minimizing negative consequences. In this work, we address this challenge by applying thorough preprocessing techniques to the multisensor dataset, with the goal of eliminating noise and improving data quality. We then employ a feature selection process to identify the most relevant features derived from the multisensor UP-FALL dataset, which in turn enhances the performance and efficiency of the machine learning models. We evaluate the efficiency of various machine learning models in detecting the impact moment using the resulting information from multiple sensors. Through extensive experimentation, we assess the accuracy of our approach using various evaluation metrics. Our results achieve high accuracy in impact detection, showcasing the power of leveraging multisensor data for fall detection tasks. This highlights the potential of our approach to enhance fall detection systems and improve the overall safety and well-being of individuals at risk of falls.
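The feature-ranking step described above can be sketched with a simple class-separation score. The sensor names, values, and labels below are illustrative toy data standing in for the UP-FALL dataset, and the scoring function is a generic Fisher-style criterion, not necessarily the ranking method used in the paper.

```python
# Toy sketch of ranking multisensor features by how well they separate
# impact frames from non-impact frames.
from statistics import mean, stdev

def separation_score(values, labels):
    """Class-mean separation divided by pooled spread (Fisher-style score)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = stdev(values)
    return abs(mean(pos) - mean(neg)) / spread if spread else 0.0

labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = impact frame (illustrative)
features = {
    "accel_magnitude": [1.0, 1.1, 0.9, 1.0, 3.2, 3.5, 3.1, 3.4],
    "gyro_energy":     [0.2, 0.3, 0.2, 0.25, 0.6, 0.7, 0.5, 0.65],
    "pressure_delta":  [0.5, 0.4, 0.6, 0.5, 0.55, 0.45, 0.5, 0.6],
}
ranked = sorted(features,
                key=lambda f: separation_score(features[f], labels),
                reverse=True)
print(ranked)  # most discriminative feature first
```

The top-ranked features would then be fed to the classifiers, discarding low-scoring ones such as the near-constant pressure channel here.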

    Performance Analysis of Tree-Based Algorithms in Predicting Employee Attrition

    Based on data from throughout 2022, there have been many workforce reductions both globally and in Indonesia. These reductions were made to adapt to changing conditions and keep businesses afloat amid increasingly fierce competition. However, reducing the number of employees is not an easy decision: it can affect many aspects of the development and course of a business or company. Decisions about termination of employment therefore require careful and thorough consideration, and cannot be based on a single aspect; other aspects must also be taken into account. Additional evidence to strengthen decision-making can be drawn from data, which has little value until it is processed, for example through prediction. Grounded in data, prediction results make such decisions better informed. This study compared three decision tree algorithms in terms of accuracy. The best accuracy for each algorithm was C4.5 = 83.44, Random Forest = 85.85, and LMT = 88.29, with precision values following the same ordering; the best-performing model is the Logistic Model Tree (LMT) algorithm.
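The comparison step reduces to computing each model's accuracy on held-out data and selecting the highest. A minimal sketch, using the accuracy figures reported above; the `accuracy` helper and the toy label lists are illustrative:

```python
# Select the best attrition model from reported accuracy figures.
def accuracy(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    return 100.0 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Accuracies reported in the study for the three tree-based algorithms
accuracies = {"C4.5": 83.44, "Random Forest": 85.85, "LMT": 88.29}
best_model = max(accuracies, key=accuracies.get)
print(best_model)  # → LMT

# Example of the accuracy computation itself on toy labels
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # → 75.0
```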

    Customer sentiment analysis for Arabic social media using a novel ensemble machine learning approach

    Arabic’s complex morphology, orthography, and dialects make sentiment analysis difficult, and it is harder still to extract text attributes from short conversations in order to evaluate tone. Analyzing and judging a person’s emotional state is complex; because of these issues, interpreting sentiments accurately and identifying polarity can take considerable work. Sentiment analysis extracts subjective information from text. This research evaluates machine learning (ML) techniques for understanding Arabic emotions. Sentiment analysis (SA) uses a support vector machine (SVM), AdaBoost classifier (AC), maximum entropy (ME), k-nearest neighbors (KNN), decision tree (DT), random forest (RF), logistic regression (LR), and naive Bayes (NB). An ensemble-based sentiment model was developed. Ensemble classifiers (ECs) with 10-fold cross-validation outperformed the other machine learning classifiers in accuracy, specificity, precision, F1 score, and sensitivity.
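The two mechanisms named above, combining base classifiers into an ensemble and 10-fold cross-validation, can each be sketched in a few lines. The per-classifier votes are hypothetical, and this simple majority vote is only one possible combination rule; the paper's exact ensemble scheme may differ.

```python
# Majority-vote ensembling and k-fold index generation, sketched in pure Python.
from collections import Counter

def majority_vote(labels):
    """Combine base-classifier predictions for one sample by majority vote."""
    return Counter(labels).most_common(1)[0][0]

def kfold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k-fold cross-validation.
    For simplicity this sketch drops the remainder when k doesn't divide n."""
    fold = n_samples // k
    for i in range(k):
        test = list(range(i * fold, (i + 1) * fold))
        train = [j for j in range(n_samples) if j not in test]
        yield train, test

# Hypothetical outputs of five base classifiers for one Arabic review
votes = ["positive", "negative", "positive", "positive", "negative"]
print(majority_vote(votes))  # → positive
```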

    Blind color deconvolution, normalization, and classification of histological images using general super Gaussian priors and Bayesian inference

    This work was sponsored in part by the Agencia Estatal de Investigación under project PID2019-105142RB-C22/AEI/10.13039/501100011033, by the Junta de Andalucía under project PY20_00286, and, for Fernando Perez-Bueno, by the Ministerio de Economía, Industria y Competitividad under FPI contract BES-2017-081584. Funding for open access charge: Universidad de Granada/CBUA.
    Background and Objective: Color variations in digital histopathology severely impact the performance of computer-aided diagnosis systems. They are due to differences in the staining process and acquisition system, among other reasons. Blind color deconvolution techniques separate multi-stained images into single-stain bands which, once normalized, can be used to eliminate these negative color variations and improve the performance of machine learning tasks. Methods: In this work, we decompose the observed RGB image into its hematoxylin and eosin components. We apply Bayesian modeling and inference based on Super Gaussian sparse priors for each stain, together with a prior enforcing closeness to a given reference color-vector matrix. The hematoxylin and eosin components are then used for image normalization and classification of histological images. The proposed framework is tested on stain separation, image normalization, and cancer classification problems. The results are measured using the peak signal-to-noise ratio, normalized median intensity, and the area under the ROC curve on five different databases. Results: The obtained results show the superiority of our approach over current state-of-the-art blind color deconvolution techniques. In particular, fidelity to the tissue improves by 1.27 dB in mean PSNR. The normalized median intensity shows good normalization quality of the proposed approach on the tested datasets. Finally, in cancer classification experiments, the area under the ROC curve improves from 0.9491 to 0.9656 on Camelyon-16 and from 0.9279 to 0.9541 on Camelyon-17 when the processed images are used instead of the originals. Furthermore, these figures of merit are better than those obtained by the competing methods. Conclusions: The proposed framework for blind color deconvolution, normalization, and classification of images guarantees fidelity to the tissue structure and can be used both for normalization and classification. In addition, color deconvolution enables the use of the optical density space for classification, which improves classification performance.
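The optical density space mentioned in the conclusions comes from the Beer-Lambert transform, and stain separation amounts to expressing each OD pixel in terms of reference stain color vectors. A minimal sketch of those two steps; the hematoxylin vector is the widely used Ruifrok and Johnston reference value, the pixel is made up, and the full Bayesian deconvolution in the paper is far richer than this projection.

```python
# Beer-Lambert optical density and projection onto a reference stain vector.
import math

def optical_density(rgb, background=255.0):
    """Per-channel absorbance of one RGB pixel (clamped to avoid log of 0)."""
    return [-math.log10(max(v, 1.0) / background) for v in rgb]

def stain_concentration(od, stain_vector):
    """Project an OD pixel onto a unit-normalized reference stain color vector."""
    norm = math.sqrt(sum(s * s for s in stain_vector))
    return sum(o * s / norm for o, s in zip(od, stain_vector))

HEMATOXYLIN = [0.65, 0.70, 0.29]          # Ruifrok & Johnston reference vector
pixel_od = optical_density([120, 80, 160])  # a purplish, nucleus-like pixel
print(stain_concentration(pixel_od, HEMATOXYLIN))
```

A white background pixel has zero absorbance in every channel, which is why normalization and classification are better behaved in OD space: stain contributions add linearly there.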

    Experimental and computational vibration analysis for diagnosing the defects in high performance composite structures using machine learning approach

    Delamination in laminated structures is a concern in high-performance structural applications, and it challenges the latest non-destructive testing techniques. This study assesses delamination damage in glass fiber-reinforced laminated composite structures using structural health monitoring techniques. Glass fiber-reinforced rectangular laminate composite plates with and without delamination were considered to obtain the forced vibration response using an in-house developed finite element model. The damage was diagnosed in the laminated composite using machine learning algorithms applied to statistical information extracted from the forced vibration response. Using an attribute evaluator, the features that made the greatest contribution were identified from the extracted features. The selected features were then classified using machine learning algorithms, such as decision tree, random forest, naive Bayes, and Bayes net, to diagnose the damage in the laminated structure. The decision tree method was found to be a computationally effective model for diagnosing delamination of the composite structure. The effectiveness of the finite element model was further validated against experimental results obtained from modal analysis of fabricated laminated and delaminated composite plates. Our proposed model showed 98.5% accuracy in diagnosing the damage in the fabricated composite structure. Hence, this research work motivates the development of online prognostic and health monitoring modules for detecting damage early and preventing catastrophic failures of structures.
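The statistical-feature extraction applied to the forced vibration response can be sketched as a small function computing a few common descriptors. The signal below and the choice of features (RMS, peak, crest factor) are illustrative assumptions, not the study's exact feature set.

```python
# Toy statistical feature extraction from a vibration response signal.
import math

def vibration_features(signal):
    """A few descriptors commonly extracted from vibration time series."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    peak = max(abs(x) for x in signal)
    return {"rms": rms, "peak": peak, "crest_factor": peak / rms}

# Illustrative response samples from an undamaged plate
healthy_response = [0.9, -1.0, 1.1, -0.9, 1.0, -1.1]
print(vibration_features(healthy_response))
```

Feature vectors like this, computed per plate, are what the attribute evaluator would rank before the tree-based classifiers label each plate as intact or delaminated.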

    Application of Big Data Technology, Text Classification, and Azure Machine Learning for Financial Risk Management Using Data Science Methodology

    Data science plays a crucial role in enabling organizations to optimize data-driven opportunities within financial risk management. It involves identifying, assessing, and mitigating risks, ultimately safeguarding investments, reducing uncertainty, ensuring regulatory compliance, enhancing decision-making, and fostering long-term sustainability. This thesis explores three facets of data science projects: enhancing customer understanding, fraud prevention, and predictive analysis, with the goal of improving existing tools and enabling more informed decision-making. The first project leveraged big data technologies, such as Hadoop and Spark, to enhance financial risk management by accurately predicting loan defaulters and their repayment likelihood. In the second project, we investigated risk assessment and fraud prevention within the financial sector, where natural language processing and machine learning techniques were applied to classify emails into categories such as spam, ham, and phishing. After training various models, their performance was rigorously evaluated. In the third project, we explored the use of Azure Machine Learning to identify loan defaulters, emphasizing the comparison of different machine learning algorithms for predictive analysis. The results aimed to determine the best-performing model by evaluating various performance metrics on the dataset. This study is important because it offers a strategy for enhancing risk management, preventing fraud, and encouraging innovation in the financial industry, ultimately resulting in better financial outcomes and enhanced customer protection.
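The spam/ham/phishing classification task in the second project can be illustrated with a deliberately tiny keyword-scoring stand-in. The keyword sets and sample texts are invented for illustration; the thesis trains real NLP and machine learning models rather than matching word lists.

```python
# Toy three-way email classifier: count keyword overlaps per category,
# default to "ham" when nothing matches.
KEYWORDS = {
    "spam":     {"winner", "free", "prize"},
    "phishing": {"verify", "password", "account"},
}

def classify_email(text):
    tokens = set(text.lower().split())
    scores = {label: len(tokens & words) for label, words in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "ham"

print(classify_email("Please verify your account password"))  # → phishing
print(classify_email("Team meeting moved to noon"))           # → ham
```

A trained model replaces the keyword sets with learned weights over TF-IDF or embedding features, but the decision step, picking the highest-scoring category, has the same shape.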

    Predicting the Most Tractable Protein Surfaces in the Human Proteome for Developing New Therapeutics

    A critical step in the target identification phase of drug discovery is evaluating druggability, i.e., whether a protein can be targeted with high affinity using drug-like ligands. The overarching goal of my PhD thesis is to build a machine learning model that predicts the binding affinity that can be attained when addressing a given protein surface. I begin by examining the lead optimization phase of drug development, where I find that in a test set of 297 examples, 41 of these (14%) change binding mode when a ligand is elaborated. My analysis shows that while certain ligand physicochemical properties predispose changes in binding mode, particularly those properties that define fragments, simple structure-based modeling proves far more effective for identifying substitutions that alter the binding mode. My proposed measure of RMAC (RMSD after minimization of the aligned complex) can help determine whether a given ligand can be reliably elaborated without changing binding mode, thus enabling straightforward interpretation of the resulting structure-activity relationships. Moving forward, I next noted that a very popular machine learning algorithm for regression tasks, random forest, has a systematic bias in the predictions it generates; this bias is present in both real-world and synthetic datasets. To address this, I define a numerical transformation that can be applied to the output of random forest models. This transformation fully removes the bias in the resulting predictions and yields improved predictions across all datasets. Finally, taking advantage of this improved machine learning approach, I describe a model that predicts the “attainable binding affinity” for a given binding pocket on a protein surface. This model uses 13 physicochemical and structural features calculated from the protein structure, without any information about the ligand.
    While details of the ligand must (of course) contribute somewhat to the binding affinity, I find that this model still recapitulates the binding affinity for 848 different protein-ligand complexes (across 230 different proteins) with a correlation coefficient of 0.57. I further find that this model is not limited to “traditional” drug targets, but rather works just as well for emerging “non-traditional” drug targets such as inhibitors of protein-protein interactions. Collectively, I anticipate that the tools and insights generated in the course of my PhD research will play an important role in facilitating the key target selection phase of drug discovery projects.
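The systematic bias described above, regression predictions compressed toward the training mean, can be demonstrated and corrected in miniature. This least-squares recalibration is only an illustration of the general idea: the thesis defines its own transformation, and the numbers below are invented to show a compressed prediction pattern.

```python
# Fit a linear recalibration mapping biased predictions back onto the targets.
def fit_recalibration(y_true, y_pred):
    """Least-squares line mapping predictions to targets; returns a corrector."""
    n = len(y_true)
    mp, mt = sum(y_pred) / n, sum(y_true) / n
    slope = (sum((p - mp) * (t - mt) for p, t in zip(y_pred, y_true))
             / sum((p - mp) ** 2 for p in y_pred))
    intercept = mt - slope * mp
    return lambda p: slope * p + intercept

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.6, 2.1, 2.6, 3.1]   # compressed toward the mean: systematic bias
recalibrate = fit_recalibration(y_true, y_pred)
corrected = [recalibrate(p) for p in y_pred]
print(corrected)
```

Because the toy bias here is exactly linear, the correction recovers the targets; real random forest bias is more subtle, which is why the thesis develops a dedicated transformation rather than a simple line fit.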