Non-Employment Activity Type Imputation from Points of Interest and Mobility Data at an Individual Level: How Accurate Can We Get?

Authors: Bantis T; Haworth J
Publication venue: 'MDPI AG'
Publication date: 05/12/2019
Human activity type inference has long been a focus of applications ranging from managing transportation demand to monitoring changes in land use patterns. Today’s ever-increasing volume of mobility data allows researchers to explore a wide range of methodological approaches for this task. Such data, however, lack the reference observations needed to validate those approaches. This research proposes a methodological framework for urban activity type inference using a Dirichlet multinomial dynamic Bayesian network with an empirical Bayes prior that can be applied to mobility data of low spatiotemporal resolution. The method was validated using open source Foursquare data under different isochrone configurations. The results provide evidence of the limits of activity detection accuracy using such data, as determined by the Area Under the Receiver Operating Characteristic curve (AUROC), log-loss, and accuracy metrics. At the same time, the results demonstrate that a hierarchical modeling framework can provide some flexibility against the challenges inherent in unsupervised activity classification using trajectory variables and POIs as input.

UCL Discovery
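
The three evaluation metrics named in the abstract above (AUROC, log-loss, and accuracy) can be illustrated with a minimal Python sketch. It assumes a binary, one-vs-rest view of a single activity type; the function names and the toy labels and probabilities are hypothetical and are not taken from the paper.

```python
import math


def accuracy(y_true, y_prob, threshold=0.5):
    """Share of cases where the thresholded probability matches the label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels (binary case)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


def auroc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win (the Mann-Whitney formulation of AUROC).
    """
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels (1 = activity type present) and predicted probabilities.
    y_true = [1, 0, 1, 0]
    y_prob = [0.9, 0.2, 0.4, 0.6]
    print(accuracy(y_true, y_prob), log_loss(y_true, y_prob), auroc(y_true, y_prob))
```

Accuracy depends on the chosen threshold, whereas AUROC and log-loss use the predicted probabilities directly, which is one reason the three are often reported together.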

One-step Estimation of Networked Population Size: Respondent-Driven Capture-Recapture with Anonymity

Authors: Dombrowski Kirk; Fellows Ian; Khan Bilal; Lee Hsuan-Wei
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 11/10/2017
Population size estimates for hidden and hard-to-reach populations are particularly important when members are known to suffer from disproportionate health issues or to pose health risks to the larger ambient population in which they are embedded. Efforts to derive size estimates are often frustrated by a range of factors that preclude conventional survey strategies, including social stigma associated with group membership or members' involvement in illegal activities. This paper extends prior research on the problem of network population size estimation, building on established survey/sampling methodologies commonly used with hard-to-reach groups. Three novel one-step, network-based population size estimators are presented, for use in the context of uniform random sampling, respondent-driven sampling, and networks that exhibit significant clustering effects. Provably sufficient conditions for the consistency of these estimators (in large configuration networks) are given. Simulation experiments across a wide range of synthetic network topologies validate the performance of the estimators, which are also seen to perform well on a real-world location-based social networking data set with significant clustering. Finally, the proposed schemes are extended so that they can be used in settings where participant anonymity is required. Systematic experiments show favorable tradeoffs between anonymity guarantees and estimator performance. Taken together, we demonstrate that reasonable population estimates can be derived from anonymous respondent-driven samples of 250-750 individuals, within ambient populations of 5,000-40,000. The method thus represents a novel and cost-effective means for health planners and agencies concerned with health and disease surveillance to estimate the size of hidden populations. Limitations and future work are discussed in the concluding section.

arXiv.org e-Print Archive

DigitalCommons@University of Nebraska

Directory of Open Access Journals
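
For background on the capture-recapture idea that the abstract above builds on, the following Python sketch shows the classical two-sample Lincoln-Petersen estimator and Chapman's bias-corrected variant. This is a swapped-in textbook technique, not the one-step network-based estimators proposed in the paper; the function names and sample counts are hypothetical.

```python
def lincoln_petersen(n_first, n_second, n_recaptured):
    """Classical two-sample estimate: N ~ n1 * n2 / m.

    n_first      -- individuals captured (and marked) in the first sample
    n_second     -- individuals captured in the second sample
    n_recaptured -- individuals appearing in both samples
    """
    if n_recaptured <= 0:
        raise ValueError("estimator is undefined without recaptures")
    return n_first * n_second / n_recaptured


def chapman(n_first, n_second, n_recaptured):
    """Chapman's variant, which is less biased in small samples
    and remains defined when there are no recaptures."""
    return (n_first + 1) * (n_second + 1) / (n_recaptured + 1) - 1


if __name__ == "__main__":
    # Hypothetical counts: 200 in the first sample, 150 in the second, 30 in both.
    print(lincoln_petersen(200, 150, 30))
    print(chapman(200, 150, 30))
```

Both estimators assume a closed population and equal capture probability for every member, assumptions that are hard to meet when recruitment follows social network ties as in respondent-driven sampling.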

Sampling methods to reach hard populations: appraising and comparing different statistical methods with an application to an HIV prevalence study

Author: TEIXEIRA Ana Belinda de Barros
Publication date: 01/01/2021
Health data, and especially HIV-related data, are less robust for hard-to-reach populations (also called key populations or most-at-risk populations) than for the general population, and tend to be underestimated, mainly because of stigma, discrimination, and the complexity of sampling population elements. Stigma and discrimination discourage key populations, and especially HIV-positive individuals, from attending health care facilities, taking part in surveys, and revealing their risk behaviours. Sampling is complex because, in countries where these populations face discrimination, they tend to remain hidden, and it is consequently harder to identify them and collect data. It is therefore not possible to guarantee representative samples of the population elements.
Non-probability sampling strategies have been the most widely used to screen key populations, but the results risk being biased, which means that, for instance, HIV prevalence estimates obtained for these populations might not be accurate. In recent years, however, several non-probabilistic methods have been used, and extensions of those methods have been developed to prevent sampled elements from being chosen haphazardly, which gave rise to semi-probabilistic methods. Focusing on one of the most widely used semi-probabilistic methods, time-location sampling (TLS), we developed an approach to improve the accuracy of HIV prevalence estimates. The new method, called CARES (Calibration on Residuals), assigns weights to respondents according to the percentile to which their logistic regression residuals belong. Using two databases of men who have sex with men (MSM) from Portugal and Spain, we first adjusted each database to resemble a TLS universe as closely as possible and then simulated several TLS samples from it. For each simulated sample, HIV prevalence was recorded both unweighted and weighted by the sampling weights. A logistic regression model was fitted and the residuals were recorded. The CARES method was then applied and HIV prevalence was calculated again. The HIV prevalence estimated with CARES was compared with the estimates calculated using the TLS method alone. The results showed that CARES improves the HIV prevalence estimates obtained when time-location sampling is used to recruit respondents. This new approach aims to provide better HIV prevalence estimates and might be very useful whenever more reliable sampling techniques cannot be applied.

Repositório da Universidade Nova de Lisboa
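
The abstract above compares HIV prevalence computed without weights and with sampling weights. A minimal Python sketch of that comparison follows. It assumes the weights are already available as plain numbers; how CARES derives its weights from residual percentiles is not specified in the abstract and is not reproduced here. The function names and toy data are hypothetical.

```python
def unweighted_prevalence(outcomes):
    """Plain sample proportion of positive cases (outcomes are 0 or 1)."""
    return sum(outcomes) / len(outcomes)


def weighted_prevalence(outcomes, weights):
    """Weighted proportion: sum(w_i * y_i) / sum(w_i)."""
    if len(outcomes) != len(weights):
        raise ValueError("outcomes and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * y for w, y in zip(weights, outcomes)) / total_weight


if __name__ == "__main__":
    # Hypothetical respondents: 1 = HIV positive, 0 = HIV negative.
    outcomes = [1, 0, 0, 0]
    # Hypothetical weights, e.g. inverse selection probabilities.
    weights = [4, 2, 2, 2]
    print(unweighted_prevalence(outcomes))
    print(weighted_prevalence(outcomes, weights))
```

When respondents with a higher weight are more often positive, the weighted estimate rises above the unweighted one, which is the kind of correction any reweighting scheme is meant to deliver.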