The COVID-19 pandemic led to an infodemic where an overwhelming amount of
COVID-19 related content was being disseminated at high velocity through social
media. This made it challenging for citizens to differentiate between accurate
and inaccurate information about COVID-19. This motivated us to carry out a
comparative study of the characteristics of COVID-19 misinformation versus
those of accurate COVID-19 information through a large-scale computational
analysis of over 242 million tweets. The study makes comparisons alongside four
key aspects: 1) the distribution of topics, 2) the live status of tweets, 3)
language analysis and 4) the spreading power over time. An added contribution
of this study is the creation of a COVID-19 misinformation classification
dataset. Finally, we demonstrate that this new dataset helps improve
misinformation classification by more than 9% based on average F1 measure