17 research outputs found

    Comparison of Sampling Methods for Predicting Wine Quality Based on Physicochemical Properties

    Get PDF
    Using the physicochemical properties of wine to predict quality has been done in numerous studies. Given the nature of these properties, the data is inherently skewed. Previous works have focused on a handful of sampling techniques to balance the data. This research compares multiple sampling techniques for predicting the target with limited data. For this purpose, an ensemble model is used to evaluate the different techniques. No evidence was found in this research to conclude that any specific oversampling method improves a random forest classifier for a multi-class problem.
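The abstract does not include code, but one of the simpler sampling techniques it alludes to, naive random oversampling of a skewed multi-class label vector, can be sketched as follows. The function name and interface are illustrative, not from the paper:

```python
# Illustrative sketch (not from the paper): naive random oversampling,
# one of the simpler techniques for balancing skewed multi-class data.
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until every class
    matches the majority-class count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = []
    for c in classes:
        c_idx = np.flatnonzero(y == c)
        keep.append(c_idx)
        if len(c_idx) < target:
            # Resample with replacement to fill the gap to the majority count
            keep.append(rng.choice(c_idx, size=target - len(c_idx), replace=True))
    idx = np.concatenate(keep)
    return X[idx], y[idx]
```

A dataset balanced this way can then be fed to a random forest; libraries such as imbalanced-learn provide more sophisticated variants of the same idea (SMOTE, ADASYN).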

    Classification of Breast Cancer Histopathological Images Using Semi-Supervised GANs

    Get PDF
    Breast cancer is diagnosed more frequently than skin cancer in women in the United States. Most breast cancer cases are diagnosed in women, while children and men are less likely to develop the disease. Breast cancer arises when various tissues in the breast grow uncontrollably. Diagnosis relies on the analysis of microscopic histopathology images, which helps accurately detect cancer cells. Deep learning is one of the evolving techniques for classifying images, where accuracy depends on the volume and quality of labeled images. This study used various pre-trained models to train on the histopathological images and analyzed these models to create a new CNN. Deep neural networks are trained in a generative adversarial fashion in a semi-supervised environment, extracting low-level features that improve classification accuracy. This paper proposes an effective approach to classifying histopathological images accurately using semi-supervised GANs, with a classification accuracy greater than 93%.

    Classification of Pixel Tracks to Improve Track Reconstruction from Proton-Proton Collisions

    Get PDF
    In this paper, machine learning techniques are used to reconstruct particle collision pathways. CERN (Conseil européen pour la recherche nucléaire) uses a massive underground particle collider, called the Large Hadron Collider (LHC), to produce particle collisions at extremely high speeds. Several layers of detectors in the collider track the pathways of particles as they collide. The data produced from collisions contains a large amount of background noise; i.e., decays from known particle collisions produce fake signals. In particular, the first layer of the detector, the pixel tracker, has an overwhelming amount of background noise that hinders analysts from achieving true track reconstruction. This paper aims to find and optimize methods for decoupling the true particle track from the background noise produced at the pixel-tracker level of the detector. The results of this study include the successful implementation of machine learning techniques to classify signal and background from particle collision data. From these results, it was concluded that neural networks are an effective resource for analyzing and processing particle collision data to reconstruct particle pathways.
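The core task described, separating signal hits from background noise, is a binary classification problem. A hedged sketch of that setup, using a small scikit-learn neural network on synthetic 2-D points rather than real pixel-tracker data, might look like this:

```python
# Hypothetical sketch: a small neural-network classifier separating
# "signal" from "background" points, on synthetic 2-D data standing in
# for real pixel-tracker hits.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
signal = rng.normal(loc=3.0, scale=0.5, size=(200, 2))      # stand-in for true hits
background = rng.normal(loc=0.0, scale=0.5, size=(200, 2))  # stand-in for noise
X = np.vstack([signal, background])
y = np.array([1] * 200 + [0] * 200)

# A single hidden layer suffices for this toy separation task
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
clf.fit(X, y)
accuracy = clf.score(X, y)
```

Real detector data would of course use per-hit features (charge, cluster shape, layer position) in place of the synthetic coordinates.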

    Profiting from Dow Jones Industrial Index and Hang Seng Index using moving average and MACD optimization model

    Get PDF
    Before the internet, high-speed computers, and big data became accessible and popular, academic work on stock market trading concentrated on the Efficient Market Hypothesis (EMH). EMH hinges on the idea that the market is efficient and no excess return can be generated. With the dynamic development of the internet, big data, and computing technology, many researchers started to pay attention to technical analysis and its usage. Numerous academic papers have claimed that technical analysis can enhance returns by using various technical tools. This paper explores in depth a simulation model of the Moving Average and Moving Average Convergence/Divergence (MACD) indicators to derive optimized parameters that will allow traders to profit from trading the Dow Jones Industrial Index and the Hang Seng Index.
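The MACD indicator itself is standard and easy to sketch; the 12/26/9 spans below are the conventional defaults, not the optimized parameters the paper searches for:

```python
# Conventional MACD definition: the difference of a fast and a slow
# exponential moving average, plus a signal line (an EMA of that difference).
import pandas as pd

def macd(prices: pd.Series, fast=12, slow=26, signal=9):
    ema_fast = prices.ewm(span=fast, adjust=False).mean()
    ema_slow = prices.ewm(span=slow, adjust=False).mean()
    macd_line = ema_fast - ema_slow
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    return macd_line, signal_line
```

A simple trading rule buys when the MACD line crosses above the signal line and sells on the reverse cross; the paper's contribution lies in optimizing the parameters, not in the indicator itself.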

    Visualization and Machine Learning Techniques for NASA’s EM-1 Big Data Problem

    Get PDF
    In this paper, we help NASA solve three Exploration Mission-1 (EM-1) challenges: data storage, computation time, and visualization of complex data. NASA is studying one year of trajectory data to determine available launch opportunities (about 90 TB of data). We improve data storage by introducing a cloud-based solution that provides elasticity and server upgrades. This migration will save $120k in infrastructure costs every four years and potentially avoid schedule slips. Additionally, it increases computational efficiency by 125%. We further enhance computation via machine learning techniques that use the classic orbital elements to predict valid trajectories. Our machine learning model decreases trajectory creation from hours or days to minutes or seconds, with an overall accuracy of 98%. Finally, we create an interactive, calendar-based Tableau visualization for EM-1 that summarizes trajectory data and considers multiple constraints on mission availability. The use of Tableau allows for the sharing of visualization dashboards, which would eventually be updated automatically upon generation of a new set of trajectory data. Therefore, we conclude that cloud technologies, machine learning, and big data visualization will benefit NASA’s engineering team. Successful implementation will further ensure mission success for the Exploration Program, with a team of 20 people accomplishing what Apollo did with a team of 1,000.
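The classic orbital elements used as model inputs are derived from position/velocity state vectors by standard two-body formulas. An illustrative sketch of two of them, semi-major axis (via vis-viva) and eccentricity, neither taken from the paper's code:

```python
# Illustrative only: computing two classical orbital elements from a
# position/velocity state vector, the kind of features such a model
# could be trained on.
import numpy as np

MU_EARTH = 398600.4418  # Earth's gravitational parameter, km^3/s^2

def semi_major_axis(r_vec, v_vec, mu=MU_EARTH):
    # Vis-viva: 1/a = 2/r - v^2/mu
    r = np.linalg.norm(r_vec)
    v = np.linalg.norm(v_vec)
    return 1.0 / (2.0 / r - v * v / mu)

def eccentricity(r_vec, v_vec, mu=MU_EARTH):
    # Eccentricity vector: e = ((v^2 - mu/r) r - (r . v) v) / mu
    r_vec = np.asarray(r_vec, dtype=float)
    v_vec = np.asarray(v_vec, dtype=float)
    r = np.linalg.norm(r_vec)
    e_vec = ((np.dot(v_vec, v_vec) - mu / r) * r_vec
             - np.dot(r_vec, v_vec) * v_vec) / mu
    return np.linalg.norm(e_vec)
```

For a circular orbit the semi-major axis equals the orbital radius and the eccentricity is zero, which makes these functions easy to sanity-check.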

    Machine Learning Approach to Stability Analysis of Semiconductor Memory Element

    Get PDF
    Memory stability analysis has traditionally relied heavily on circuit-simulation-based approaches that run Monte Carlo (MC) analysis over various manufacturing and use-condition parameters. This paper researches the application of machine learning approaches to memory element failure analysis that could mimic simulation-like accuracy and minimize the need for engineers to rely heavily on simulators for their validations. Both regressor and classifier algorithms are benchmarked for accuracy and recall scores. A high recall score implies fewer fails escaping to the field and is the metric of choice for comparing algorithms. The paper identifies that a recall score in excess of 0.97 can be achieved through stack-ensemble and logistic-regression-based approaches. The high recall score suggests machine learning based approaches can be used for memory failure rate assessments.
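A stacked ensemble benchmarked by recall, the paper's metric of choice, can be sketched with scikit-learn. The data below is synthetic, standing in for the paper's memory pass/fail dataset:

```python
# Hedged sketch (synthetic data, not the paper's memory dataset):
# benchmarking a stacked ensemble by recall score.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # stand-in for pass/fail labels

# Base learners feed out-of-fold predictions to a logistic-regression meta-learner
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=50, random_state=0))],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X, y)
recall = recall_score(y, stack.predict(X))
```

In a deployment like the one described, recall would be evaluated on held-out data, since a missed fail (a false negative) is the costly outcome.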

    Multi-Class Emotion Classification with XGBoost Model Using Wearable EEG Headband Data

    No full text
    Electroencephalography (EEG) or brainwave signals serve as a valuable source for discerning human activities, thoughts, and emotions. This study explores the efficacy of eXtreme Gradient Boosting (XGBoost) models in sentiment classification using EEG signals, specifically those captured by the MUSE EEG headband. The MUSE device, equipped with four EEG electrodes (TP9, AF7, AF8, TP10), offers a cost-effective alternative to traditional EEG setups, which often use over 60 channels in laboratory-grade settings. Leveraging a dataset from previous MUSE research (Bird, J. et al., 2019), emotional states (positive, neutral, and negative) were observed in a male and a female participant, each for 3 minutes per state, while they watched movie scenes designed to stimulate emotions. The dataset comprises 2548 features extracted statistically from each sliding time window (mean, median, standard deviation, etc.). Employing XGBoost, a subset of the top 100 features is selected from the original 2548, achieving an exceptional accuracy of 99.1%. This research aims to make a significant contribution to the accurate classification of human emotion while advancing EEG-based sentiment classification for future real-time emotion prediction applications.
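Selecting the top-k features by a boosted model's importance scores, as this study does with XGBoost and k=100, follows a simple pattern. The sketch below substitutes scikit-learn's GradientBoostingClassifier for XGBoost so it is self-contained, and uses toy data with k=5:

```python
# Stand-in sketch: top-k feature selection by gradient-boosting importance.
# The paper uses XGBoost with k=100; sklearn's GradientBoostingClassifier
# is substituted here so the example has no extra dependencies.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def top_k_features(X, y, k):
    """Fit a boosted model and return indices of the k most important features."""
    model = GradientBoostingClassifier(n_estimators=50, random_state=0)
    model.fit(X, y)
    return np.argsort(model.feature_importances_)[::-1][:k]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = (X[:, 0] > 0).astype(int)  # only feature 0 is informative
selected = top_k_features(X, y, k=5)
```

With XGBoost the same pattern applies via `feature_importances_` on a fitted `XGBClassifier`; the classifier is then retrained on the reduced feature set.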

    A Symbolic Approach to Nonlinear Time Series Analysis

    No full text
    Current nonlinear time series methods such as neural networks forecast well. However, they act as black boxes and are difficult to interpret, leaving researchers and their audience with little insight into why the forecasts come out the way they do. There is a need for a method that forecasts accurately while also being easy to interpret. This paper aims to develop a method for building an interpretable model of univariate and multivariate nonlinear time series data using wavelets and symbolic regression. The final method relies on multilayer perceptron (MLP) neural networks as a form of dimensionality reduction and on the PySR algorithm to determine the symbolic relationships. It also explores use cases for the discrete wavelet transformation to extract information from the dataset.
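The symbolic-regression step relies on PySR, which requires a Julia backend; the wavelet step, by contrast, is easy to sketch. Below is a minimal single-level discrete wavelet transformation using the Haar wavelet (libraries such as PyWavelets implement many more wavelet families); it is illustrative, not the paper's code:

```python
# Minimal sketch of one level of the discrete wavelet transformation
# using the Haar wavelet: pairwise averages (approximation) and
# pairwise differences (detail), with orthonormal scaling.
import numpy as np

def haar_dwt(x):
    """One decomposition level for an even-length signal."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse transform: perfectly reconstructs the original samples."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x
```

The approximation coefficients capture the smooth trend of the series while the detail coefficients capture local fluctuations, which is what makes the transform useful for extracting information from a time series before modeling.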

    Baseball Decision-Making: Optimizing At-bat Simulations

    No full text
    Pitch selection in baseball plays a crucial role, involving pitchers, catchers, and batters working together. This practice, dating back to early baseball, has seen teams try various methods to gain an advantage. This research aims to use reinforcement learning and pitch-by-pitch Statcast data to improve batting strategies. It also builds on previous statistical work (sabermetrics) to make better choices in pitch selection and plate discipline. The dataset used, which includes over 700,000 pitches for each full season and 200,000 pitches for the COVID-shortened 2020 season, encompasses a wealth of crucial metrics, including pitch release point, velocity, and launch angle. This study dives deep into player interactions and pitch behavior, seeking new ideas that could change how teams approach their offensive tactics. By analyzing player performance and applying advanced statistics, this research hopes to uncover hidden patterns. To ensure accuracy in pitch type classification, a critical aspect of our analysis, we reclassified pitch types. By incorporating 15 distinct variables, ranging from release-point coordinates to spin rates, we enhanced the granularity of pitch type identification. These variables were normalized and subjected to UMAP dimensionality reduction, resulting in 2D vector embeddings for each pitch. This methodology not only refines pitch classification but also unlocks a deeper understanding of player interactions and pitch behavior.
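The normalize-then-embed pipeline described (15 variables per pitch reduced to 2-D vectors) can be sketched as follows. The paper uses UMAP; PCA from scikit-learn is substituted here so the example runs without the umap-learn package, and the pitch data is synthetic:

```python
# Stand-in sketch: normalize per-pitch features and reduce them to 2-D
# embeddings. The paper uses UMAP; PCA is substituted here so the example
# has no dependency on the umap-learn package.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# 500 synthetic pitches x 15 variables (release point, spin rate, ...)
pitches = rng.normal(size=(500, 15))

embedding = PCA(n_components=2, random_state=0).fit_transform(
    StandardScaler().fit_transform(pitches)
)
```

With umap-learn installed, `umap.UMAP(n_components=2).fit_transform(...)` drops into the same place; the normalization step matters for either reducer, since unscaled variables like spin rate would otherwise dominate the distance metric.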