GitData Archive
18 research outputs found
Predictive Modeling of H5N1 Bird Flu in United States of America: A 2022-2023 Analysis
This research focuses on predicting the likelihood of H5N1 outbreaks in the United States at the county level. Unlike previous studies, which either excluded the United States or used outdated data, we applied several statistical techniques to publicly available H5N1-related data from January 2022 to March 2023. Using logistic regression, regularization methods, cross-validation, and eXtreme Gradient Boosting (XGBoost), our models showed strong predictive performance. Notably, the XGBoost model, trained with 10-fold cross-validation, outperformed the others in terms of ROC-AUC. This research provides epidemiological insights, proposes intervention strategies for H5N1 in the United States, and suggests future research directions.
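As a rough illustration of the 10-fold cross-validation mentioned in this abstract, the sketch below splits sample indices into ten train/test partitions using only the standard library. The function name `k_fold_indices`, the seed, and the sample count of 103 are hypothetical choices for illustration; the abstract does not say how the authors built their folds.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread samples as evenly as possible: the first (n_samples % k)
    # folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# Hypothetical example: 103 samples split into 10 folds.
folds = list(k_fold_indices(n_samples=103, k=10))
```

Each sample lands in exactly one test fold, so a model fitted on each training partition is always scored on data it has not seen.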
NYC Newborn Analysis 2011-2019
Baby names come and go in popularity, and some change over time. We asked whether sex can be predicted from ethnicity and name characteristics, to see how these factors shift the sex associated with a name. We used publicly available data from the City of New York containing the top 75 most popular names for each sex and ethnic category for each year from 2011 to 2019. Our analysis found a strong correlation between sex and names ending in a vowel: female names were 10 times more likely to end with a vowel. Other factors, such as ethnicity, the number of syllables in the name, the length of the name, and whether the name starts with a vowel, had statistically significant but small correlations with sex.
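The name characteristics listed in this abstract could be extracted along the following lines. This is a minimal sketch, not the authors' feature code: the function name `name_features` is hypothetical, treating only a, e, i, o, u as vowels (so "y" counts as a consonant) is an assumption, and syllable counting is left out because the abstract does not say how it was done.

```python
VOWELS = set("aeiou")  # assumption: "y" is treated as a consonant

def name_features(name):
    """Return simple spelling-based features for a single first name."""
    lowered = name.strip().lower()
    if not lowered:
        raise ValueError("name must not be empty")
    return {
        "length": len(lowered),
        "starts_with_vowel": lowered[0] in VOWELS,
        "ends_with_vowel": lowered[-1] in VOWELS,
    }

# Hypothetical example names, not taken from the dataset.
examples = {n: name_features(n) for n in ["Olivia", "Liam"]}
```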
2022 - 2023 H5N1 Bird Flu Modeling and Prediction in the United States
This report presents an analysis of the likelihood of H5N1 outbreaks in different counties of the United States in January 2023 using logistic regression, ridge regression, and lasso regression models. The models were trained on historical data from 2022 and predicted H5N1 outbreaks in January 2023 with an accuracy of about 98.4%. The lasso regression model performed best among the three models, with an AUC of 0.8015. The map generated from the lasso regression model indicated that counties in the north and west were at higher risk of H5N1 outbreaks in January 2023, which matched the actual result. The report notes limitations of the models, including the consideration of only a limited set of factors affecting the spread of the virus and the reliance on historical data. Future work could incorporate additional data sources and use more sophisticated machine learning techniques to improve the accuracy of the models. The report also proposes some possible remedies to help control the spread of H5N1.
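The abstract reports both accuracy and AUC, which measure different things. The sketch below computes each from scratch on a small made-up example: AUC is taken as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as half. The labels, scores, and the 0.5 threshold are hypothetical and are not the report's data.

```python
def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    """Share of cases classified correctly at a fixed score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Hypothetical toy data: 2 outbreak cases among 10.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.01, 0.02, 0.03, 0.05, 0.04, 0.10, 0.20, 0.30, 0.25, 0.40]

toy_auc = roc_auc(labels, scores)
toy_accuracy = accuracy(labels, scores)
```

When positive cases are rare, predicting "no outbreak" everywhere already yields high accuracy, so a ranking metric such as AUC is a useful complement when comparing models.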
Analysis of Shot Marilyns by Andy Warhol
This paper explores the differences and commonalities among five versions of Shot Marilyns by Andy Warhol. We conducted a comprehensive analysis of color composition and distribution across the five pieces. Using methods such as relative conditional entropy, we explored the distinct color distributions and correlations within each image. By clustering the images and examining specific regions of interest (ROIs), including backgrounds, hair, eyeshadow, and face, we gained detailed insight into the construction of each image and the differences between them. We also observed that initial expectations of uniform colors for certain elements were not met, highlighting the complexity and deceptive nature of color perception in art.
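The abstract does not define "relative conditional entropy", so the sketch below shows one plausible reading as an assumption: the conditional entropy H(Y | X) divided by the marginal entropy H(Y), where Y is a quantized color label and X is a region label. All function names and the toy pixel data are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def conditional_entropy(ys, xs):
    """H(Y | X) = sum over x of p(x) * H(Y | X = x)."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    total = len(xs)
    return sum(len(g) / total * entropy(g) for g in groups.values())

def relative_conditional_entropy(ys, xs):
    """Assumed definition: H(Y | X) / H(Y), or 0.0 when H(Y) is zero."""
    h_y = entropy(ys)
    return conditional_entropy(ys, xs) / h_y if h_y > 0 else 0.0

# Hypothetical pixels: region label and quantized color label per pixel.
regions = ["background", "background", "hair", "hair"]
uniform_colors = ["blue", "blue", "yellow", "yellow"]  # one color per region
mixed_colors = ["blue", "yellow", "blue", "yellow"]    # colors mixed in each region

ratio_uniform = relative_conditional_entropy(uniform_colors, regions)
ratio_mixed = relative_conditional_entropy(mixed_colors, regions)
```

Under this reading, a ratio near 0 means the region almost fully determines the color, while a ratio near 1 means knowing the region says little about the color.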
2022 - 2023 H5N1 Bird Flu Modeling and Prediction in the United States
This report presents an analysis of the likelihood of H5N1 outbreaks in different counties of the United States in March 2023 using logistic regression, ridge regression, lasso regression, and combined ridge & lasso regression models. The models were trained on historical data from January 2022 to March 2023 and predicted H5N1 outbreaks in March 2023 with an accuracy of about 99.348%. The ridge & lasso regression model performed best among the four models, with an AUC of 0.7959950. The map generated from the ridge & lasso regression model indicated that counties in the north and west were at higher risk of H5N1 outbreaks in March 2023, which matched the actual result. The report notes limitations of the models, including the consideration of only a limited set of factors affecting the spread of the virus and the reliance on historical data. Future work could incorporate additional data sources and use more sophisticated machine learning techniques to improve the accuracy of the models. The report also proposes some possible remedies to help control the spread of H5N1.
California 2022 Proposition 30 Feasibility Report and Recommendations
Proposition 30 is a plan in California to increase taxes on high-income earners in order to fund programs that promote the use of zero-emission vehicles and prevent wildfires. The tax is expected to start in 2023 and end by 2043, or earlier if the state is able to maintain its greenhouse gas emissions at expected levels for three consecutive years. Proposition 30 plans to use 80% of the tax revenue to fund the Zero Emission Vehicle program, with 45% dedicated to helping people buy electric vehicles and 35% to installing charging stations. The remaining 20% will be used for Wildfire Response and Prevention Programs. The goal is to help California reduce its greenhouse gas emissions and meet its target of reducing emissions to 80% below 1990 levels by 2050. The authors evaluate whether the proposition can help California achieve this goal.
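The revenue split described in this abstract can be checked with a few lines of arithmetic. The percentages come from the abstract; the program keys, the function name `allocate`, and the revenue figure of 1,000,000 are hypothetical placeholders and not an estimate of actual tax revenue.

```python
# Shares of total tax revenue, in percent, as stated in the abstract.
ALLOCATION_PERCENT = {
    "zev_purchase_incentives": 45,           # helping people buy electric vehicles
    "zev_charging_stations": 35,             # installing charging stations
    "wildfire_response_and_prevention": 20,  # remaining share
}

def allocate(revenue):
    """Split a revenue amount across programs according to the stated shares."""
    return {program: revenue * pct / 100
            for program, pct in ALLOCATION_PERCENT.items()}

# Hypothetical revenue amount, for illustration only.
split = allocate(1_000_000)
zev_total = split["zev_purchase_incentives"] + split["zev_charging_stations"]
```

The two Zero Emission Vehicle shares add up to the 80% quoted for that program, and all three shares together account for the full revenue.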
Predictive Modeling of Blood Pressure Categories: Integrating Demographic and Dietary Factors for Personalized Management
This study examines predictive modeling of blood pressure categories in the United States, addressing the global health concern of hypertension. Using mainly demographic and dietary data from the CDC National Health and Nutrition Examination Survey (NHANES) 2017-2018, it aims to support personalized management strategies. Drawing on research that emphasizes the multifaceted determinants of hypertension, we use a multinomial regression model with lasso regularization as a baseline. We then apply the extreme gradient boosting (XGB) algorithm, which performs slightly better than multinomial regression. Evaluation metrics include accuracy and Area Under the Curve (AUC) in a 10-fold cross-validation framework. The study suggests possible solutions for personal blood pressure management.