Evaluation of Three Sampling Methods to Monitor Outcomes of Antiretroviral Treatment Programmes in Low- and Middle-Income Countries
BACKGROUND: Retention of patients on antiretroviral therapy (ART) over time is a proxy for quality of care and an outcome indicator to monitor ART programs. Using existing databases (Antiretroviral in Lower Income Countries of the International Databases to Evaluate AIDS and Médecins Sans Frontières), we evaluated three sampling approaches to simplify the generation of outcome indicators. METHODS AND FINDINGS: We used individual patient data from 27 ART sites and included 27,201 ART-naive adults (≥15 years) who initiated ART in 2005. For each site, we generated two outcome indicators at 12 months, retention on ART and proportion of patients lost to follow-up (LFU), first using all patient data and then within a smaller group of patients selected using three sampling methods (random, systematic and consecutive sampling). For each method and each site, 500 samples were generated, and the average result was compared with the unsampled value. The 95% sampling distribution (SD) was expressed as the 2.5th and 97.5th percentile values from the 500 samples. Overall, retention on ART was 76.5% (range 58.9-88.6) and the proportion of patients LFU, 13.5% (range 0.8-31.9). Estimates of retention from sampling (n = 5696) were 76.5% (SD 75.4-77.7) for random, 76.5% (75.3-77.5) for systematic and 76.0% (74.1-78.2) for the consecutive method. Estimates for the proportion of patients LFU were 13.5% (12.6-14.5), 13.5% (12.6-14.3) and 14.0% (12.5-15.5), respectively. With consecutive sampling, 50% of sites had SD within ±5% of the unsampled site value. CONCLUSIONS: Our results suggest that random, systematic or consecutive sampling methods are feasible for monitoring ART indicators at national level. However, sampling may not produce precise estimates in some sites.
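The three sampling approaches compared in the abstract can be sketched as follows. This is an illustrative simplification, not the study's protocol: the function name and the boolean encoding of "retained at 12 months" are our assumptions.

```python
import random

def estimate_retention(cohort, n, method="random", seed=0):
    """Estimate 12-month retention on ART from a sample of patient records.

    cohort: list of booleans (True = patient retained on ART at 12 months).
    n:      sample size.
    method: 'random' (simple random sample), 'systematic' (every k-th record
            from a random start), or 'consecutive' (a random contiguous block).
    """
    rng = random.Random(seed)
    N = len(cohort)
    if method == "random":
        sample = rng.sample(cohort, n)
    elif method == "systematic":
        step = N // n
        start = rng.randrange(step)
        sample = cohort[start::step][:n]
    elif method == "consecutive":
        start = rng.randrange(N - n + 1)
        sample = cohort[start:start + n]
    else:
        raise ValueError(f"unknown method: {method}")
    return sum(sample) / n
```

Repeating such a draw 500 times per site and taking the 2.5th and 97.5th percentiles of the estimates reproduces, in spirit, the sampling-distribution bounds reported in the abstract.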
Awareness and Utilisation of Online Subscription Databases Among Postgraduate Students in Ahmadu Bello University, Zaria
Purpose: The study was carried out to investigate the level of awareness and utilisation of online subscription databases, and the challenges to their effective utilisation, among postgraduate students in Ahmadu Bello University (ABU), Zaria. Methodology: A cross-sectional descriptive survey design was adopted for the study, with a total population of 8,376 postgraduate students. A sample of 400 students was selected using proportionate stratified sampling and simple random sampling techniques. The data collected were analysed using percentages, frequencies, means, and one-way MANOVA. Findings: The study found that postgraduate students in ABU Zaria are only aware of the Science Direct, JSTOR and EBSCOhost databases out of the 12 databases investigated. The level of utilisation of online subscription databases by postgraduate students in ABU Zaria is low. The tests of the hypotheses showed significant differences in the awareness and utilisation of online subscription databases among PhD, Masters and PGD postgraduate students at Ahmadu Bello University, Zaria. Lastly, lack of training and orientation on how to use the databases, lack of an online guide on how to use them, lack of ICT and computer literacy skills, non-functioning links to some of the databases, and the inability to locate relevant information resources from the databases were the major challenges impeding effective utilisation. Originality/value: The study showed that the differences in the level of utilisation of online subscription databases among different categories of postgraduate students are largely occasioned by the awareness factor.
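Proportionate stratified sampling, as used in the methodology, allocates the overall sample to each stratum in proportion to its size. A minimal sketch (largest-remainder rounding is our choice of tie-breaking, and the stratum sizes in the test below are hypothetical; only the totals of 8,376 students and a 400-student sample come from the abstract):

```python
def proportionate_allocation(strata_sizes, total_sample):
    """Allocate total_sample across strata in proportion to stratum size.

    strata_sizes: {stratum_name: population_count}.
    Largest-remainder rounding keeps the allocations summing to total_sample.
    """
    population = sum(strata_sizes.values())
    raw = {s: total_sample * n / population for s, n in strata_sizes.items()}
    alloc = {s: int(v) for s, v in raw.items()}       # floor of each share
    deficit = total_sample - sum(alloc.values())
    # Give the leftover units to the strata with the largest fractional parts.
    for s in sorted(raw, key=lambda s: raw[s] - alloc[s], reverse=True)[:deficit]:
        alloc[s] += 1
    return alloc
```

A simple random sample of the allocated size would then be drawn within each stratum, matching the two-stage technique the abstract describes.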
Database Matching Under Adversarial Column Deletions
The de-anonymization of users from anonymized microdata, through matching or aligning with publicly-available correlated databases, has recently been of scientific interest. While most rigorous analyses of database matching have focused on random-distortion models, adversarial-distortion models are largely absent from the literature. In this work, motivated by synchronization errors in the sampling of time-indexed microdata, the matching (alignment) of random databases under adversarial column deletions is investigated. It is assumed that a constrained adversary, which observes the anonymized database, can delete up to a fraction of the columns (attributes) to hinder matching and preserve privacy. Column histograms of the two databases are utilized as permutation-invariant features to detect the column deletion pattern chosen by the adversary. The detection of the column deletion pattern is then followed by an exact row (user) matching scheme. The worst-case analysis of this two-phase scheme yields a sufficient condition for the successful matching of the two databases under the near-perfect recovery condition. A more detailed investigation of the error probability leads to a tight necessary condition on the database growth rate and, in turn, to a single-letter characterization of the adversarial matching capacity. This adversarial matching capacity is shown to be significantly lower than the random matching capacity, where the column deletions occur randomly. Overall, our results analytically demonstrate the privacy-wise advantages of adversarial mechanisms over random ones during the publication of anonymized time-indexed data.
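The histogram-based detection step can be sketched as follows. This is our simplified illustration, not the paper's scheme: it greedily matches column histograms in order (deletions preserve column order) and assumes the histograms encountered are distinct enough for the greedy match to be unambiguous.

```python
from collections import Counter

def detect_deletion_pattern(orig_cols, kept_cols):
    """Infer which original columns survived the adversary's deletions.

    Each column is summarized by its value histogram (a Counter), which is a
    permutation-invariant feature: shuffling the rows (users) of the
    anonymized database does not change any column's histogram.

    orig_cols: columns of the original database (tuples of values).
    kept_cols: columns of the anonymized database, in original order but
               with some columns deleted and rows permuted.
    Returns the list of surviving original column indices, or None if the
    kept columns cannot all be explained.
    """
    kept_hists = [Counter(c) for c in kept_cols]
    pattern, j = [], 0
    for i, col in enumerate(orig_cols):
        if j < len(kept_hists) and Counter(col) == kept_hists[j]:
            pattern.append(i)   # column i survived as the j-th kept column
            j += 1
    return pattern if j == len(kept_hists) else None
```

Once the deletion pattern is known, the surviving columns can be used for an exact row (user) matching step, as in the paper's two-phase scheme.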
Distributed top-k aggregation queries at large
Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
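The TPUT framework referenced above prunes candidates with a uniform threshold. A minimal sketch of its first phase follows; this is our simplification (real implementations also handle ties, network batching, and the later refinement phases):

```python
def tput_phase1_threshold(node_lists, k):
    """Phase 1 of TPUT (Three-Phase Uniform Threshold), sketched.

    Each node ships its local top-k items; the coordinator sums the reported
    scores per item and takes the k-th largest partial sum as a lower bound
    tau on the true top-k aggregate threshold.

    node_lists: one {item: local_score} dict per network node.
    Returns (tau, partial_sums).
    """
    partial = {}
    for scores in node_lists:
        local_topk = sorted(scores.items(), key=lambda kv: -kv[1])[:k]
        for item, s in local_topk:
            partial[item] = partial.get(item, 0) + s
    sums = sorted(partial.values(), reverse=True)
    tau = sums[k - 1] if len(sums) >= k else 0
    # Phase 2 would then fetch from every node all items with local score
    # >= tau / len(node_lists), so no true top-k item can be missed.
    return tau, partial
```

The paper's optimizations (operator trees, adaptive scan depths, source sampling) plug into this kind of framework rather than replacing it.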
Ensemble of Example-Dependent Cost-Sensitive Decision Trees
Several real-world classification problems are example-dependent cost-sensitive in nature, where the costs due to misclassification vary between examples and not only between classes. However, standard classification methods do not take these costs into account and assume a constant cost of misclassification errors. Previous works have proposed methods that incorporate the financial costs into the training of different algorithms, with the example-dependent cost-sensitive decision tree algorithm being the one that yields the highest savings. In this paper we propose a new framework of ensembles of example-dependent cost-sensitive decision trees. The framework consists of creating different example-dependent cost-sensitive decision trees on random subsamples of the training set and then combining them using three different combination approaches. Moreover, we propose two new cost-sensitive combination approaches: cost-sensitive weighted voting and cost-sensitive stacking, the latter being based on the cost-sensitive logistic regression method. Finally, using five different databases from four real-world applications (credit card fraud detection, churn modeling, credit scoring and direct marketing), we evaluate the proposed methods against state-of-the-art example-dependent cost-sensitive techniques, namely cost-proportionate sampling, Bayes minimum risk and cost-sensitive decision trees. The results show that the proposed algorithms have better results for all databases, in the sense of higher savings.
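A cost-weighted voting combiner of the kind proposed can be sketched as follows. This is an illustrative simplification, not the paper's exact formulation: example-dependent cost models may also assign costs to correct predictions, which we omit here, and the per-member weights are intended to be each base classifier's savings on a validation set.

```python
def cost_of_predictions(y_true, y_pred, cost_fp, cost_fn):
    """Total example-dependent misclassification cost.

    cost_fp[i] / cost_fn[i]: cost of a false positive / false negative on
    example i. Correct predictions are assumed cost-free for simplicity.
    """
    total = 0.0
    for yt, yp, cfp, cfn in zip(y_true, y_pred, cost_fp, cost_fn):
        if yp == 1 and yt == 0:
            total += cfp
        elif yp == 0 and yt == 1:
            total += cfn
    return total

def savings_weighted_vote(member_preds, weights):
    """Combine binary predictions of ensemble members by weighted majority.

    member_preds: one 0/1 prediction list per ensemble member.
    weights:      one non-negative weight per member (e.g. its savings).
    """
    n = len(member_preds[0])
    out = []
    for i in range(n):
        score = sum(w for preds, w in zip(member_preds, weights) if preds[i] == 1)
        out.append(1 if score > sum(weights) / 2 else 0)
    return out
```

Savings themselves are typically defined relative to a costless baseline, e.g. `1 - cost(model) / cost(predicting the majority class)`, so better members get proportionally more say in the vote.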