Hope is characterized as openness of spirit toward the future, a desire,
expectation, and wish for something to happen or to be true that remarkably
affects human's state of mind, emotions, behaviors, and decisions. Hope is
usually associated with concepts of desired expectations and
possibility/probability concerning the future. Despite its importance, hope has
rarely been studied as a social media analysis task. This paper presents a hope
speech dataset that classifies each tweet first into "Hope" and "Not Hope",
then into three fine-grained hope categories: "Generalized Hope", "Realistic
Hope", and "Unrealistic Hope" (along with "Not Hope"). English tweets in the
first half of 2022 were collected to build this dataset. Furthermore, we
describe our annotation process and guidelines in detail and discuss the
challenges of classifying hope and the limitations of the existing hope speech
detection corpora. In addition, we reported several baselines based on
different learning approaches, such as traditional machine learning, deep
learning, and transformers, to benchmark our dataset. We evaluated our
baselines using weighted-averaged and macro-averaged F1-scores. Observations
show that a strict process for annotator selection and detailed annotation
guidelines enhanced the dataset's quality. This strict annotation process
resulted in promising performance for simple machine learning classifiers with
only bi-grams; however, binary and multiclass hope speech detection results
reveal that contextual embedding models have higher performance in this
dataset.Comment: 20 pages, 9 figure