
    A Supervised ML Applied Classification Model for Brain Tumors MRI.

    A brain tumor originates from abnormal cells that grow uncontrollably. Magnetic resonance imaging (MRI) was developed to generate high-quality images and provides extensive information for medical research. Machine learning algorithms can improve the diagnostic value of MRI by enabling automated and accurate classification. In this research, we propose a supervised machine learning training and testing model to classify and analyze the features of brain tumor MRI, evaluated in terms of accuracy, precision, sensitivity, and F1 score. The results show that the model achieves more than 95% accuracy and classifies features more accurately than other existing methods.
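    As a rough illustration of the evaluation described above, the sketch below scores a binary classifier with the four reported metrics using scikit-learn. The abstract does not specify the algorithm or the features, so the SVM and the synthetic feature matrix here are placeholders, not the authors' pipeline.

```python
# Hypothetical sketch: evaluating a brain-tumor MRI classifier with the four
# metrics named in the abstract. The SVM and synthetic data are placeholders.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Stand-in for extracted MRI features (e.g., texture or intensity statistics).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = SVC(kernel="rbf").fit(X_train, y_train)
y_pred = model.predict(X_test)

print("accuracy   :", accuracy_score(y_test, y_pred))
print("precision  :", precision_score(y_test, y_pred))
print("sensitivity:", recall_score(y_test, y_pred))  # sensitivity == recall
print("F1 score   :", f1_score(y_test, y_pred))
```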

    Imputation Techniques in Machine Learning – A Survey

    Machine learning plays a pivotal role in data analysis and information extraction. However, one common challenge encountered in this process is dealing with missing values. Missing data can find its way into datasets for a variety of reasons: errors during data collection and management, intentional omissions, or human error. It is important to note that most machine learning models are not designed to handle missing values directly. Consequently, it becomes essential to perform data imputation before feeding the data into a machine learning model. Multiple techniques are available for imputing missing values, and the choice of technique should be made judiciously, considering various parameters; an inappropriate choice can distort the overall distribution of data values and subsequently impact the model's performance. In this paper, various imputation methods are examined, including mean, median, k-nearest neighbors (KNN)-based imputation, linear regression, MissForest, and MICE.
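    For concreteness, here is a minimal sketch of the surveyed imputation families using scikit-learn. MICE is approximated by IterativeImputer with posterior sampling, and MissForest by the same imputer with a random-forest estimator; the toy matrix is invented.

```python
# Minimal sketch of the imputation families surveyed above, via scikit-learn.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, KNNImputer, IterativeImputer
from sklearn.ensemble import RandomForestRegressor

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan], [4.0, 5.0]])

mean_imp   = SimpleImputer(strategy="mean").fit_transform(X)
median_imp = SimpleImputer(strategy="median").fit_transform(X)
knn_imp    = KNNImputer(n_neighbors=2).fit_transform(X)
# MICE-style chained equations, approximated with posterior sampling.
mice_imp   = IterativeImputer(sample_posterior=True, random_state=0).fit_transform(X)
# MissForest-style imputation: chained equations with a random forest.
forest_imp = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0)).fit_transform(X)
```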

    Standard Regression Versus Multilevel Modeling of Multistage Complex Survey Data

    Complex surveys based on multistage designs are commonly used to collect large population data. Stratification, clustering, and unequal probability of selection of individuals are the complexities of complex survey design. Statistical techniques such as the multilevel modeling – scaled weights technique and the standard regression – robust variance estimation technique are used to analyze complex survey data. Both techniques take the complexities of complex survey data into account, but in different ways. This thesis compares the performance of the multilevel modeling – scaled weights technique and the standard regression – robust variance estimation technique based on analyses of cross-sectional and longitudinal complex survey data. A stratified, multistage probability sample design was used to select samples for the cross-sectional Canadian Heart Health Surveys (CHHS) conducted in ten Canadian provinces and for the longitudinal National Population Health Survey (NPHS). Both statistical techniques were used to analyze the CHHS and NPHS data sets. The outcome of interest was based on the question "Do you have any of the following long-term conditions that have been diagnosed by a health professional? – Diabetes". For the cross-sectional CHHS, the results obtained from the two statistical techniques were not consistent. However, the results based on analysis of the longitudinal NPHS data indicated that the performance of the standard regression – robust variance estimation technique might be better than that of the multilevel modeling – scaled weights technique for analyzing longitudinal complex survey data. Finally, in order to arrive at a definitive conclusion, a Monte Carlo simulation was used to compare the performance of the two techniques. In the simulation study, data were generated randomly based on the Canadian Heart Health Survey data for the province of Saskatchewan; 100 and 1,000 simulated data sets were generated, each with a sample size of 1,731. The results of the Monte Carlo simulation indicated that the performance of the multilevel modeling – scaled weights technique and the standard regression – robust variance estimation technique was comparable for analyzing cross-sectional complex survey data. To conclude, both statistical techniques yield similar results when used to analyze cross-sectional complex survey data; however, the standard regression – robust variance estimation technique might be preferred because it fully accounts for stratification, clustering, and unequal probability of selection.
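    The "standard regression – robust variance estimation" side of the comparison can be sketched as below: a logistic regression for a binary diabetes outcome with cluster-robust (sandwich) standard errors in statsmodels. The variable names, cluster identifier, and synthetic data are hypothetical, and the sketch omits the survey weights the thesis uses.

```python
# Hedged sketch: standard logistic regression with cluster-robust variance,
# the design-based alternative to a multilevel model. Data are synthetic.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "age": rng.integers(25, 75, n),
    "bmi": rng.normal(27, 4, n),
    "cluster_id": rng.integers(0, 40, n),  # sampling cluster (e.g., enumeration area)
})
# Synthetic diabetes outcome loosely increasing with age and BMI.
df["diabetes"] = (rng.uniform(size=n) <
                  1 / (1 + np.exp(-(0.04 * df["age"] + 0.08 * df["bmi"] - 5)))).astype(int)

result = smf.logit("diabetes ~ age + bmi", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["cluster_id"]})
print(result.summary())
```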

    Revolutionizing Global Food Security: Empowering Resilience through Integrated AI Foundation Models and Data-Driven Solutions

    Food security, a global concern, necessitates precise and diverse data-driven solutions to address its multifaceted challenges. This paper explores the integration of AI foundation models across various food security applications, leveraging distinct data types to overcome the limitations of current deep and machine learning methods. Specifically, we investigate their utilization in crop type mapping, cropland mapping, field delineation, and crop yield prediction. By capitalizing on multispectral imagery, meteorological data, soil properties, historical records, and high-resolution satellite imagery, AI foundation models offer a versatile approach. The study demonstrates that AI foundation models enhance food security initiatives by providing accurate predictions, improving resource allocation, and supporting informed decision-making. These models serve as a transformative force in addressing global food security limitations, marking a significant leap toward a sustainable and secure food future.

    Will they take this offer? A machine learning price elasticity model for predicting upselling acceptance of premium airline seating

    Employing customer information from one of the world's largest airline companies, we develop a price elasticity model (PREM) using machine learning to identify customers likely to purchase an upgrade offer from economy to premium class and to predict a customer's acceptable price range. A simulation of 64.3 million flight bookings and 14.1 million email offers over three years, mirroring actual data, indicates that PREM implementation results in approximately 1.12 million (7.94%) fewer non-relevant customer email messages, a predicted increase of 72,200 (37.2%) offers accepted, and an estimated $72.2 million (37.2%) of increased revenue. Our results illustrate the potential of automated pricing information and targeted marketing messages for upselling acceptance. We also identified three customer segments: (1) Never Upgrades, who never take the upgrade offer; (2) Upgrade Lovers, who generally upgrade; and (3) Upgrade Lover Lookalikes, who have no historical record but fit the profile of those who tend to upgrade. We discuss the implications for airline companies and related travel and tourism industries.
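    The core mechanism of a price elasticity model of this kind can be sketched as follows: train a classifier on historical offers, then sweep candidate prices for a customer and read off where the predicted acceptance probability clears a threshold. Everything below (features, prices, the 0.5 threshold, the synthetic acceptance curve) is invented for illustration and is not the authors' PREM.

```python
# Hedged sketch: score upgrade-acceptance probability across a price grid and
# report the price range where P(accept) clears a threshold. Data synthetic.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 5000
hist = pd.DataFrame({
    "loyalty_tier": rng.integers(0, 4, n),
    "trip_length":  rng.integers(1, 15, n),
    "offer_price":  rng.uniform(50, 400, n),
})
# Synthetic ground truth: acceptance probability falls as price rises.
hist["accepted"] = (rng.uniform(0, 1, n) <
                    1 / (1 + np.exp((hist["offer_price"] - 150) / 40))).astype(int)

cols = ["loyalty_tier", "trip_length", "offer_price"]
clf = GradientBoostingClassifier().fit(hist[cols], hist["accepted"])

# Acceptable price range for one customer: prices where P(accept) >= 0.5.
grid = pd.DataFrame({"loyalty_tier": 3, "trip_length": 7,
                     "offer_price": np.arange(50, 401, 10)})
p = clf.predict_proba(grid[cols])[:, 1]
acceptable = grid["offer_price"][p >= 0.5]
print("acceptable price range:", acceptable.min(), "-", acceptable.max())
```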

    Exploring and Evaluating the Scalability and Efficiency of Apache Spark using Educational Datasets

    Research into the combination of data mining and machine learning technology with web-based education systems (known as educational data mining, or EDM) is becoming imperative in order to enhance the quality of education by moving beyond traditional methods. With the worldwide growth of information and communication technology (ICT), data are becoming available in significantly large volumes, with high velocity and extensive variety. In this thesis, four popular data mining methods are applied to Apache Spark, using large volumes of data from online cognitive learning systems to explore the scalability and efficiency of Spark. Various volumes of data are tested on Spark MLlib with different running configurations and parameter tunings. The thesis presents useful strategies for allocating computing resources and tuning parameters to take full advantage of the in-memory system of Apache Spark for data mining and machine learning tasks. Moreover, it offers insights that education experts and data scientists can use to manage and improve the quality of education, as well as to analyze and discover hidden knowledge, in the era of big data.
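    A minimal PySpark sketch in the spirit of the thesis: fit an MLlib model while setting the kinds of resource and parallelism knobs the thesis tunes. The tiny inline dataset, column names, and configuration values are assumptions standing in for the large educational datasets described above.

```python
# Hedged sketch: an MLlib pipeline with explicit resource/tuning knobs.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = (SparkSession.builder
         .appName("edm-scalability")
         .config("spark.executor.memory", "4g")         # resource-allocation knob
         .config("spark.sql.shuffle.partitions", "64")  # parallelism knob
         .getOrCreate())

# Stand-in for a large online-learning-system dataset.
df = spark.createDataFrame(
    [(3, 1, 42.0, 1.0), (1, 0, 12.5, 0.0), (5, 2, 88.0, 1.0), (2, 3, 30.0, 0.0)],
    ["attempts", "hints", "time_spent", "correct"])

features = VectorAssembler(
    inputCols=["attempts", "hints", "time_spent"], outputCol="features").transform(df)
model = LogisticRegression(labelCol="correct").fit(features)
print("training AUC:", model.summary.areaUnderROC)
spark.stop()
```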

    Use of hydroclimatic forecasts for improved water management in central Texas

    Accurate seasonal to interannual streamflow forecasts based on climate information are critical for optimal management and operation of water resources systems. Since most water supply systems are multipurpose, operating them to meet increasing demand under the growing stresses of climate variability and climate change, population and economic growth, and environmental concerns can be very challenging. This study investigated improvements in water resources systems management through the use of seasonal climate forecasts. Hydrological persistence (streamflow and precipitation) and large-scale recurrent oceanic-atmospheric patterns such as the El Niño/Southern Oscillation (ENSO), the Pacific Decadal Oscillation (PDO), the North Atlantic Oscillation (NAO), the Atlantic Multidecadal Oscillation (AMO), the Pacific North American pattern (PNA), and customized sea surface temperature (SST) indices were investigated for their potential to improve streamflow forecast accuracy and increase forecast lead time in a river basin in central Texas. First, an ordinal polytomous logistic regression approach is proposed as a means of incorporating multiple predictor variables into a probabilistic forecast model. Forecast performance is assessed through a cross-validation procedure using distributions-oriented metrics, and implications for decision making are discussed. Results indicate that, of the predictors evaluated, only hydrologic persistence and Pacific Ocean sea surface temperature patterns associated with ENSO and PDO provide forecasts that are statistically better than climatology. Second, a class of data mining techniques known as tree-structured models is investigated to address the nonlinear dynamics of climate teleconnections and to screen promising probabilistic streamflow forecast models for river-reservoir systems. Results show that tree-structured models can effectively capture the nonlinear features hidden in the data. Skill scores of probabilistic forecasts generated by both classification trees and logistic regression trees indicate that seasonal inflows throughout the system can be predicted with sufficient accuracy to improve water management, especially in the winter and spring seasons in central Texas. Lastly, a simplified two-stage stochastic economic-optimization model was proposed to investigate improvement in water use efficiency and the potential value of using seasonal forecasts, under the assumption of optimal decision making under uncertainty. Model results demonstrate that incorporating probabilistic inflow forecasts into the optimization model provides a significant improvement in seasonal water contract benefits over climatology, with lower average deficits (increased reliability) for a given average contract amount, or improved mean contract benefits for a given level of reliability. The results also illustrate the trade-off between the expected contract amount and reliability, i.e., larger contracts can be signed at greater risk.
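    The tree-structured probabilistic forecasting step might be sketched as below: a classification tree maps climate predictors (ENSO and PDO indices plus hydrologic persistence) to tercile probabilities of seasonal inflow, scored against climatology. The data are synthetic, and the log-loss skill score is one plausible choice among the distributions-oriented metrics the study uses, not necessarily the authors' metric.

```python
# Hedged sketch: a classification tree issuing probabilistic tercile forecasts
# of seasonal inflow, with skill measured relative to climatology.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 600
X = np.column_stack([
    rng.normal(size=n),  # ENSO index (e.g., Nino3.4 SST anomaly)
    rng.normal(size=n),  # PDO index
    rng.normal(size=n),  # persistence: prior-season streamflow anomaly
])
# Synthetic tercile category (0 = below, 1 = near, 2 = above normal).
y = np.digitize(0.8 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.7, size=n),
                [-0.5, 0.5])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
tree = DecisionTreeClassifier(max_depth=3).fit(X_tr, y_tr)

# Skill relative to climatology (uniform 1/3 probability per tercile).
clim = np.full((len(y_te), 3), 1 / 3)
skill = 1 - log_loss(y_te, tree.predict_proba(X_te)) / log_loss(y_te, clim)
print(f"log-loss skill score vs climatology: {skill:.2f}")
```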

    Variable selection applied to multivariate statistical process control of batch processes [Seleção de variáveis aplicada ao controle estatístico multivariado de processos em bateladas]

    This dissertation presents propositions for the use of variable selection in the improvement of multivariate statistical process control (MSPC) of batch processes, in order to contribute to the enhancement of industrial process quality. The thesis has six objectives: (i) identify the limitations of MSPC methods in industrial process monitoring; (ii) understand how variable selection methods are integrated to improve the monitoring of high-dimensional processes; (iii) discuss methods for the alignment and synchronization of batches with different durations; (iv) define the most adequate alignment and synchronization method for batch data treatment, aiming to improve construction of the monitoring model in Phase I of statistical process control; (v) propose variable selection, for classification purposes, prior to establishing multivariate control charts (MCC) based on principal component analysis (PCA) to monitor a batch process; and (vi) validate the fault detection performance of the proposed MCC in comparison with traditional and PCA-based charts. The performance of the proposed method was evaluated in a case study using real data from an industrial food process. The results showed that performing variable selection prior to establishing the MCC efficiently reduces the number of variables to be analyzed and overcomes the limitations found in fault detection when high-dimensional datasets are monitored. We conclude that, by making the MCC widely used in industry suitable for real high-dimensional datasets, the proposed method adds innovation to the area of batch process monitoring and contributes to the generation of products with a high standard of quality.
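    A minimal sketch of the proposed sequence, assuming scikit-learn-style tooling: select discriminative variables first, then build a PCA-based Hotelling T² chart on the reduced in-control data. The selection method (univariate F-scores), component count, control limit, and synthetic data are illustrative stand-ins, not the thesis's actual choices.

```python
# Hedged sketch: variable selection followed by a PCA-based Hotelling T^2
# control chart, with a common F-distribution-based control limit.
import numpy as np
from scipy.stats import f as f_dist
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(2)
X_ic = rng.normal(size=(200, 50))   # Phase I, in-control batch data
labels = rng.integers(0, 2, 200)    # stand-in class labels (e.g., normal vs. faulty)

# Step 1: variable selection for classification, prior to chart construction.
selector = SelectKBest(f_classif, k=10).fit(X_ic, labels)
X_sel = selector.transform(X_ic)

# Step 2: PCA on the reduced in-control data.
pca = PCA(n_components=3).fit(X_sel)
scores = pca.transform(X_sel)

# Step 3: Hotelling T^2 on the PCA scores with an F-based upper control limit.
n, a = scores.shape
t2 = np.sum(scores**2 / pca.explained_variance_, axis=1)
ucl = a * (n - 1) * (n + 1) / (n * (n - a)) * f_dist.ppf(0.99, a, n - a)
print("out-of-control batches:", np.sum(t2 > ucl))
```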