GitData Archive
    16 research outputs found

    Time Series of Analysis Annual Temperature Anomalies (1850–2021) for the Northern Hemisphere

    No full text
    Climate change is one of the most substantial issues facing the world today, and annual temperature anomalies are a direct way to observe how the climate is changing. Our data set covers the annual temperature anomalies (1850–2021) for the northern hemisphere [1]. A temperature anomaly is a deviation from a reference value or long-term average. The data set contains two columns, Year and Temperature Anomalies. A time series is a sequence captured at successive, equally spaced points in time; since the anomalies are recorded once per year, they form a time series, and we use time series methods to predict their future trend. Given the prior climatic history and its effects on humans, we believe that it is vital to analyze and forecast future anomalies. Anticipating these potential anomalies is crucial for preparing and safeguarding the environment, and it allows people to plan mitigation according to how severe the anomalies are expected to be.

    Data Efficient Dense Cross-Lingual Information Retrieval

    No full text
    Cross-Lingual Information Retrieval (CIR) remains challenging due to limited annotated data and linguistic diversity, especially for low-resource languages. While dense retrieval models have significantly advanced retrieval performance, their reliance on large-scale training datasets hampers their effectiveness in multilingual settings. In this work, we propose two complementary strategies to improve data efficiency and robustness in CIR model fine-tuning. First, we introduce a paraphrase-based query augmentation pipeline that uses large language models (LLMs) to enrich scarce training data, thereby promoting more robust and language-agnostic representations. Second, we present a weighted InfoNCE loss that emphasizes underrepresented languages, ensuring balanced optimization across heterogeneous linguistic inputs. Experiments on cross-lingual benchmark datasets demonstrate that our combined approaches yield substantial gains in retrieval quality, outperforming standard training protocols on small and imbalanced datasets. These results underscore the potential of targeted data augmentation and reweighted objectives to build more inclusive and effective CIR systems, even under resource constraints.
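To make the idea of a weighted InfoNCE loss concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the in-batch negative setup, the inverse-frequency weighting, the temperature value, and the names `weighted_info_nce` and `inverse_frequency_weights` are all assumptions made for illustration.

```python
import numpy as np

def inverse_frequency_weights(langs):
    """Hypothetical weighting: rarer languages in the batch get larger weights."""
    langs = np.asarray(langs)
    uniq, counts = np.unique(langs, return_counts=True)
    freq = dict(zip(uniq, counts))
    # A perfectly balanced batch would give every query a weight of 1.0.
    return np.array([len(langs) / (len(uniq) * freq[l]) for l in langs])

def weighted_info_nce(query_emb, doc_emb, weights, temperature=0.05):
    """In-batch InfoNCE: document i is the positive for query i,
    every other document in the batch is a negative."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    logits = q @ d.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_query_loss = -np.diag(log_probs)
    w = np.asarray(weights, dtype=float)
    # Weighted average, so the loss scale does not depend on the weights' sum.
    return float((w * per_query_loss).sum() / w.sum())

# Toy batch: three English queries and one Swahili query (synthetic embeddings).
rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))
docs = queries + 0.1 * rng.normal(size=(4, 8))
weights = inverse_frequency_weights(["en", "en", "en", "sw"])
loss = weighted_info_nce(queries, docs, weights)
```

With uniform weights the function reduces to the ordinary mean InfoNCE loss, so the weighting only changes how much each language contributes to the gradient.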

    Student Future Academic Performance after Being Placed on Dean’s List

    No full text
    This study examines the causal impact of being placed on the Dean’s List, a positive education incentive, on future student performance using a regression discontinuity design. The results suggest that for students with low prior academic performance and who are native English speakers, there is a positive impact of being on the Dean’s List on the probability of getting onto the Dean’s List in the following year. However, being on the Dean’s List does not appear to have a statistically significant effect on subsequent GPA, total credits taken, dropout rates, or the probability of graduating within four years. These findings suggest that a place on the Dean’s List may not be a strong motivator for students to improve their academic performance and achieve better outcomes.

    The Causal Impact of Dean’s List Recognition on Academic Performance: Evidence from a Regression Discontinuity Design

    No full text
    This study examines the causal impact of being placed on the Dean’s List, a positive education incentive, on future student performance using a regression discontinuity design. The results suggest that for students with low prior academic performance and who are native English speakers, there is a positive impact of being on the Dean’s List on the probability of getting onto the Dean’s List in the following year. However, being on the Dean’s List does not appear to have a statistically significant effect on subsequent GPA, total credits taken, dropout rates, or the probability of graduating within four years. These findings suggest that a place on the Dean’s List may not be a strong motivator for students to improve their academic performance and achieve better outcomes
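As a rough illustration of how a sharp regression discontinuity estimate can be computed, the sketch below fits a local linear regression on each side of a cutoff and reads off the jump at the threshold. Everything here is an assumption for illustration: the data are synthetic, and the cutoff of 3.5, the bandwidth, and the function name `sharp_rd_estimate` are hypothetical rather than taken from the study.

```python
import numpy as np

def sharp_rd_estimate(running, outcome, cutoff, bandwidth):
    """Local linear sharp RD: regress the outcome on an intercept, a treatment
    dummy, the centred running variable, and their interaction, using only
    observations within `bandwidth` of the cutoff."""
    x = np.asarray(running, dtype=float) - cutoff
    y = np.asarray(outcome, dtype=float)
    keep = np.abs(x) <= bandwidth
    x, y = x[keep], y[keep]
    treated = (x >= 0).astype(float)
    design = np.column_stack([np.ones_like(x), treated, x, treated * x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]  # coefficient on the treatment dummy = jump at the cutoff

# Synthetic example: the outcome jumps by 0.2 at a hypothetical cutoff of 3.5.
rng = np.random.default_rng(0)
gpa = rng.uniform(2.5, 4.0, size=5000)
cutoff = 3.5
true_jump = 0.2
outcome = 1.0 + 0.6 * gpa + true_jump * (gpa >= cutoff) + rng.normal(0, 0.1, size=gpa.size)
est = sharp_rd_estimate(gpa, outcome, cutoff, bandwidth=0.3)
```

In practice the bandwidth choice and the standard errors matter as much as the point estimate; this sketch omits both.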

    Curriculum Learning For Autonomous Vehicles

    No full text
    This study investigates how the sequence of training environments affects performance in simple driving tasks for an autonomous driving agent. By training agents solely through interaction with maps of varying difficulty, we demonstrate that transfer learning enhances performance within single-environment driving scenarios. However, we find that agents struggle to master advanced driving capabilities and fail to generalize well to new environments, regardless of the sequence of training data. We conclude by looking at ways to build on this work, such as combining imitation learning with curriculum learning and developing a curriculum-specific Markov decision process (MDP).

    NYC Newborn Analysis 2011-2019

    No full text
    Baby names come and go in popularity, and some evolve over time. We wanted to see whether we could predict sex from ethnicity and name characteristics, in order to understand how these factors influence the sex associated with a name. The dataset we used is publicly available data from the city of New York containing the top 75 most popular names for each sex and ethnic category for each year from 2011 to 2019. Our analysis showed a high correlation between sex and whether a name ends with a vowel: female names were 10 times more likely to end with a vowel. Other factors, such as ethnicity, the number of syllables in the name, the length of the name, and whether the name started with a vowel, had a statistically significant but small correlation with sex.

    Predictive Modeling of H5N1 Bird Flu in United States of America: A 2022-2023 Analysis

    No full text
    This research uniquely focuses on predicting the likelihood of H5N1 outbreaks in the United States at the county level. Unlike previous studies, which either excluded the United States or used outdated data, we utilized diverse statistical techniques and publicly available H5N1-related data from January 2022 to March 2023. Employing logistic regression, regularization methods, cross-validation, and eXtreme Gradient Boosting (XGBoost), our models demonstrated remarkable predictive efficacy. Notably, the XGBoost model, trained with 10-fold cross-validation, outperformed others in terms of ROC-AUC. This research provides valuable epidemiological insights, proposes intervention strategies for H5N1 in the United States, and suggests future research directions
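The evaluation described above rests on two building blocks, k-fold splitting and ROC-AUC. The NumPy sketch below shows one way to compute each. It is an illustrative assumption, not the study's code: the helper names `k_fold_indices` and `roc_auc` are hypothetical, and the model-fitting step (for example, XGBoost) is deliberately left out.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive is scored above a
    random negative, counting ties as one half."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

folds = k_fold_indices(95, k=10)
for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # A model would be fitted on train_idx and scored on test_idx here;
    # roc_auc(labels[test_idx], predictions) would then be averaged over folds.
```

Because outbreaks are rare events, each test fold needs at least one positive and one negative county for ROC-AUC to be defined; stratified splitting is a common way to ensure that.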

    2022 - 2023 H5N1 Bird Flu Modeling and Prediction in the United States

    No full text
    This report presents an analysis of the likelihood of H5N1 outbreaks in different counties of the United States in January 2023 using logistic regression, ridge regression, and lasso regression models. The models were trained using historical data from 2022, and the accuracy of the models in predicting H5N1 outbreaks in January 2023 is about 98.4%. The lasso regression model performed the best among the three models, with an AUC of 0.8015. The map generated based on the lasso regression model indicated that counties in the north and west were at a higher risk of having H5N1 outbreaks in January 2023, which matched the actual result. The report concludes that there are limitations to the models, including the consideration of only a limited set of factors affecting the spread of the virus and the use of historical data. Future work could incorporate additional data sources and use more sophisticated machine learning techniques to improve the accuracy of the models. The report also proposes some possible remedies to help control the spread of H5N1

    Predictive Modeling of Blood Pressure Categories: Integrating Demographic and Dietary Factors for Personalized Management

    No full text
    This study develops predictive models of blood pressure levels in the United States, addressing the global health concern of hypertension. Using mainly demographic and dietary data from the Centers for Disease Control and Prevention (CDC) National Health and Nutrition Examination Survey (NHANES) 2017-2018, it aims to craft personalized management strategies. Drawing on research emphasizing the multifaceted determinants of hypertension, we use a multinomial regression model with lasso regularization as a baseline. We then apply the eXtreme Gradient Boosting (XGBoost) algorithm, which achieves better performance than the multinomial regression. Evaluation metrics include accuracy and Area Under the ROC Curve (ROC-AUC) in a 10-fold cross-validation framework. The study proposes possible personal blood pressure management solutions.

    Analysis of Shot Marilyns by Andy Warhol

    No full text
    This paper explores the differences and commonalities between five different versions of Shot Marilyns by Andy Warhol. We conducted a comprehensive analysis of color composition and distribution in this set of art pieces. Using various methods such as relative conditional entropy, we explored the distinct color distributions and correlations within each image. By clustering the images and examining specific regions of interest (ROIs), including backgrounds, hair, eyeshadow, and face, we gained detailed insights into the construction and differences of each image. We also observed that initial expectations of uniform colors for certain elements were not met, highlighting the complexity and deceptive nature of color perception in art.
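The abstract does not define "relative conditional entropy", so the sketch below is only one plausible reading: the conditional entropy H(Y | X) of one discretized color variable given another, divided by H(Y). The function names and the normalization are assumptions for illustration, not the paper's method.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a discrete label array."""
    _, counts = np.unique(np.asarray(labels).ravel(), return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def conditional_entropy(y, x):
    """H(Y | X) = H(X, Y) - H(X) for two aligned discrete arrays."""
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    pairs = np.stack([x, y], axis=1)
    _, counts = np.unique(pairs, axis=0, return_counts=True)
    p = counts / counts.sum()
    joint = float(-(p * np.log2(p)).sum())
    return joint - entropy(x)

def relative_conditional_entropy(y, x):
    """Assumed normalization: H(Y | X) / H(Y), which is 0 when X determines Y
    and 1 when X tells us nothing about Y."""
    h_y = entropy(y)
    return conditional_entropy(y, x) / h_y if h_y > 0 else 0.0

# Toy example with quantized channel values for four pixels.
red = np.array([0, 0, 1, 1])
green = np.array([0, 1, 0, 1])
ratio = relative_conditional_entropy(green, red)
```

For real images, each channel would first be quantized into a small number of bins so that the joint distribution can be estimated reliably from the available pixels.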

    0 full texts, 16 metadata records
    Updated in last 30 days.