47 research outputs found

    A fairness assessment of mobility-based COVID-19 case prediction models

    In light of the outbreak of COVID-19, analyzing and measuring human mobility has become increasingly important. A wide range of studies have explored spatiotemporal trends over time, examined associations with other variables, evaluated non-pharmacologic interventions (NPIs), and predicted or simulated COVID-19 spread using mobility data. Despite the benefits of publicly available mobility data, a key question remains unanswered: are models using mobility data performing equitably across demographic groups? We hypothesize that bias in the mobility data used to train the predictive models might lead to unfairly less accurate predictions for certain demographic groups. To test our hypothesis, we applied two mobility-based COVID infection prediction models at the county level in the United States using SafeGraph data, and correlated model performance with sociodemographic traits. Findings revealed that there is a systematic bias in model performance toward certain demographic characteristics. Specifically, the models tend to favor large, highly educated, wealthy, young, urban, and non-black-dominated counties. We hypothesize that the mobility data currently used by many predictive models tends to capture less information about older, poorer, non-white, and less educated regions, which in turn negatively impacts the accuracy of the COVID-19 prediction in these regions. Ultimately, this study points to the need for improved data collection and sampling approaches that allow for an accurate representation of the mobility patterns across demographic groups. Comment: 24 pages, 4 figures, 2 Tables

    Behavior-Profile Clustering for False Alert Reduction in Anomaly Detection Sensors

    Anomaly detection (AD) sensors compute behavior profiles to recognize malicious or anomalous activities. The behavior of a host is checked continuously by the AD sensor and an alert is raised when the behavior deviates from its behavior profile. Unfortunately, the majority of AD sensors suffer from high volumes of false alerts either maliciously crafted by the host or originating from insufficient training of the sensor. We present a cluster-based AD sensor that relies on clusters of behavior profiles to identify anomalous behavior. The behavior of a host raises an alert only when a group of host profiles with similar behavior (cluster of behavior profiles) detect the anomaly, rather than just relying on the host's own behavior profile to raise the alert (single-profile AD sensor). A cluster-based AD sensor significantly decreases the volume of false alerts by providing a more robust model of normal behavior based on clusters of behavior profiles. Additionally, we introduce an architecture designed for the deployment of cluster-based AD sensors. The behavior profile of each network host is computed by its closest switch, which is also responsible for performing the anomaly detection for each of the hosts in its subnet. By placing the AD sensors at the switch, we eliminate the possibility of hosts crafting malicious alerts. Our experimental results based on wireless behavior profiles from users in the CRAWDAD dataset show that the volume of false alerts generated by cluster-based AD sensors is reduced by at least 50% compared to single-profile AD sensors.
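The contrast between single-profile and cluster-based alerting described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' sensor: the scalar profiles, the fixed deviation threshold, and the `quorum` fraction are all assumptions introduced here.

```python
def single_profile_alert(observed, profile, threshold):
    """Alert when the observation deviates from the host's own profile."""
    return abs(observed - profile) > threshold


def cluster_alert(observed, cluster_profiles, threshold, quorum=0.5):
    """Alert only when more than `quorum` of the similar profiles flag it.

    `cluster_profiles` holds the (assumed scalar) profiles of hosts whose
    behavior is similar to the monitored host; `quorum` is a hypothetical
    voting fraction, not a parameter taken from the paper.
    """
    votes = [abs(observed - p) > threshold for p in cluster_profiles]
    return sum(votes) / len(votes) > quorum


# A host with profile 10 observes 14: its own profile raises an alert,
# but the cluster of similar profiles does not agree, so no alert is raised.
own_profile = 10
cluster = [10, 13, 15, 12]
single = single_profile_alert(14, own_profile, threshold=3)
clustered = cluster_alert(14, cluster, threshold=3)
```

Under these assumptions, an observation that only the host's own profile considers unusual is suppressed, while one that most similar profiles consider unusual still raises an alert.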

    Flooding through the lens of mobile phone activity

    Natural disasters affect hundreds of millions of people worldwide every year. Emergency response efforts depend upon the availability of timely information, such as information concerning the movements of affected populations. The analysis of aggregated and anonymized Call Detail Records (CDR) captured from the mobile phone infrastructure provides new possibilities to characterize human behavior during critical events. In this work, we investigate the viability of using CDR data combined with other sources of information to characterize the floods that occurred in Tabasco, Mexico in 2009. An impact map has been reconstructed using Landsat-7 images to identify the floods. Within this frame, the underlying communication activity signals in the CDR data have been analyzed and compared against rainfall levels extracted from data of the NASA-TRMM project. The variations in the number of active phones connected to each cell tower reveal abnormal activity patterns in the most affected locations during and after the floods that could be used as signatures of the floods - both in terms of infrastructure impact assessment and population information awareness. The representativeness of the analysis has been assessed using census data and civil protection records. While a more extensive validation is required, these early results suggest high potential in using cell tower activity information to improve early warning and emergency management mechanisms. Comment: Submitted to IEEE Global Humanitarian Technologies Conference (GHTC) 201

    Clonal chromosomal mosaicism and loss of chromosome Y in elderly men increase vulnerability for SARS-CoV-2

    The pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2, COVID-19) had an estimated overall case fatality ratio of 1.38% (pre-vaccination), being 53% higher in males and increasing exponentially with age. Among 9578 individuals diagnosed with COVID-19 in the SCOURGE study, we found 133 cases (1.42%) with detectable clonal mosaicism for chromosome alterations (mCA) and 226 males (5.08%) with acquired loss of chromosome Y (LOY). Individuals with clonal mosaic events (mCA and/or LOY) showed a 54% increase in the risk of COVID-19 lethality. LOY is associated with transcriptomic biomarkers of immune dysfunction, pro-coagulation activity and cardiovascular risk. Interferon-induced genes involved in the initial immune response to SARS-CoV-2 are also down-regulated in LOY. Thus, mCA and LOY underlie at least part of the sex-biased severity and mortality of COVID-19 in aging patients. Given its potential therapeutic and prognostic relevance, evaluation of clonal mosaicism should be implemented as a biomarker of COVID-19 severity in elderly people. Among 9578 individuals diagnosed with COVID-19 in the SCOURGE study, individuals with clonal mosaic events (clonal mosaicism for chromosome alterations and/or loss of chromosome Y) showed an increased risk of COVID-19 lethality.

    Interdisciplinary Big Data Presentation

    Presentation for the Interdisciplinary Big Data Workshop focused on data-driven decision-making processes.

    Crowdsourcing Land Use Maps via Twitter

    Individuals generate vast amounts of geolocated content through the use of mobile social media applications. In this context, Twitter has become an important sensor of the interactions between individuals and their environment. Building on this idea, this paper proposes the use of geolocated tweets as a complementary source of information for urban planning applications, focusing on the characterization of land use. The proposed technique uses unsupervised learning and automatically determines land uses in urban areas by clustering geographical regions with similar tweeting activity patterns. Two case studies are presented and validated for London (UK) and Madrid (Spain) using Twitter activity and land use information provided by the city planning departments. Results indicate that geolocated tweets can be used as a powerful data source for urban planning applications.

    Estimation of traffic flow using passive cell-phone data

    In this paper we present preliminary results for estimating traffic flow using passive cell-phone network information. Two datasets are considered: (a) passive cell-phone data and (b) information provided by the English Highways Agency. Our proposed method identifies cell-phone users who are traveling by car and, using a linear regression model, estimates the flow for each of the links into which the road network is divided. Initial results indicate that, under certain conditions, traffic flow can be effectively approximated with passive network data. Categories and Subject Descriptors H.4 [Information Systems Applications]: Miscellaneous
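The regression step mentioned in the abstract above could look something like the sketch below. It assumes, purely for illustration, a one-variable ordinary least-squares fit per road link that maps the number of detected in-car phones to the measured vehicle flow; the function names and the numbers are hypothetical and do not come from the paper.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_flow(phone_count, slope, intercept):
    """Predict the vehicle flow on a link from a phone count."""
    return slope * phone_count + intercept


# Synthetic calibration data for one link (made up for this sketch):
# phones detected per hour vs. vehicles counted per hour by a road sensor.
phones = [10, 20, 30, 40]
vehicles = [170, 290, 410, 530]

slope, intercept = fit_line(phones, vehicles)
predicted = estimate_flow(25, slope, intercept)
```

In practice a separate fit per link, as the abstract suggests, would let the scaling between phones and vehicles vary with local conditions.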