Ratings and rankings: Voodoo or Science?
Composite indicators aggregate a set of variables using weights which are
understood to reflect the variables' importance in the index. In this paper we
propose to measure the importance of a given variable within existing composite
indicators via Karl Pearson's 'correlation ratio'; we call this measure 'main
effect'. Because socio-economic variables are heteroskedastic and correlated,
(relative) nominal weights are hardly ever found to match (relative) main
effects; we propose to summarize their discrepancy with a divergence measure.
We further discuss to what extent the mapping from nominal weights to main
effects can be inverted. This analysis is applied to five composite indicators,
including the Human Development Index and two popular league tables of
university performance. It is found that in many cases the declared importance
of single indicators and their main effect are very different, and that the
data correlation structure often prevents developers from obtaining the stated
importance, even when the nominal weights are varied over all nonnegative
values summing to one.
Comment: 28 pages, 7 figures
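The main effect is straightforward to estimate from data. As a minimal sketch (Python with NumPy is our choice here; the function and the toy composite below are ours, not the paper's), the correlation ratio of a composite score with respect to one input can be approximated by binning that input and comparing the variance of the bin-wise conditional means with the overall variance:

```python
import numpy as np

def correlation_ratio(x, y, n_bins=20):
    """Plug-in estimate of Pearson's correlation ratio
    eta^2 = Var(E[y|x]) / Var(y), using equal-frequency bins of x."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    idx = np.digitize(x, edges[1:-1])          # bin index in 0..n_bins-1
    between = sum(
        (idx == b).mean() * (y[idx == b].mean() - y.mean()) ** 2
        for b in range(n_bins) if (idx == b).any()
    )                                          # variance of conditional means
    return between / y.var()

# Toy composite with heteroskedastic, correlated inputs
rng = np.random.default_rng(0)
x1 = rng.normal(0, 1, 10_000)
x2 = 0.8 * x1 + rng.normal(0, 2, 10_000)       # correlated, larger variance
score = 0.5 * x1 + 0.5 * x2                    # equal nominal weights

print([correlation_ratio(x, score) for x in (x1, x2)])
```

On this toy composite the noisier, correlated input x2 captures most of the main effect even though the nominal weights are equal, which is exactly the kind of nominal-weight/main-effect discrepancy the paper quantifies.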
FAM222A encodes a protein which accumulates in plaques in Alzheimer's disease.
Alzheimer's disease (AD) is characterized by amyloid plaques and progressive cerebral atrophy. Here, we report FAM222A as a putative brain atrophy susceptibility gene. Our cross-phenotype association analysis of imaging genetics indicates a potential link between FAM222A and AD-related regional brain atrophy. The protein encoded by FAM222A is predominantly expressed in the CNS and is increased in brains of patients with AD and in an AD mouse model. It accumulates within amyloid deposits, physically interacts with amyloid-β (Aβ) via its N-terminal Aβ binding domain, and facilitates Aβ aggregation. Intracerebroventricular infusion or forced expression of this protein exacerbates neuroinflammation and cognitive dysfunction in an AD mouse model, whereas ablation of this protein suppresses the formation of amyloid deposits, neuroinflammation and cognitive deficits in the AD mouse model. Our data support the pathological relevance of the protein encoded by FAM222A in AD.
Consistent distribution-free k-sample and independence tests for univariate random variables
A popular approach for testing if two univariate random variables are
statistically independent consists of partitioning the sample space into bins,
and evaluating a test statistic on the binned data. The partition size matters,
and the optimal partition size is data dependent. While for detecting simple
relationships coarse partitions may be best, for detecting complex
relationships a great gain in power can be achieved by considering finer
partitions. We suggest novel consistent distribution-free tests that are based
on summation or maximization aggregation of scores over all partitions of a
fixed size. We show that our test statistics based on summation can serve as
good estimators of the mutual information. Moreover, we suggest regularized
tests that aggregate over all partition sizes, and prove those are consistent
too. We provide polynomial-time algorithms, which are critical for computing
the suggested test statistics efficiently. We show that the power of the
regularized tests is excellent compared to existing tests, and almost as
powerful as the tests based on the optimal (yet unknown in practice) partition
size, in simulations as well as on a real data example.
Comment: arXiv admin note: substantial text overlap with arXiv:1308.155
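For intuition about the summation aggregation, here is a deliberately naive sketch (Python with NumPy/SciPy; the names are ours, and this cubic-time loop is for illustration only, since the paper's contribution is computing such sums in polynomial time for general partition sizes). It sums likelihood-ratio scores over all 2x2 partitions of the rank-rank plane; p-values would come from a permutation null:

```python
import numpy as np
from scipy.stats import rankdata

def sum_lr_2x2(x, y):
    """Sum of likelihood-ratio scores over all 2x2 partitions of the
    rank-rank plane, one partition per pair of cutpoints (no ties assumed)."""
    n = len(x)
    rx = rankdata(x, method="ordinal")
    ry = rankdata(y, method="ordinal")
    total = 0.0
    for cx in range(1, n):                 # cut after rank cx on the x axis
        for cy in range(1, n):             # cut after rank cy on the y axis
            a = np.sum((rx <= cx) & (ry <= cy))          # low-low cell count
            o = np.array([a, cx - a, cy - a, n - cx - cy + a], float)
            e = np.array([cx * cy, cx * (n - cy),
                          (n - cx) * cy, (n - cx) * (n - cy)], float) / n
            nz = o > 0                     # 0 * log 0 = 0 convention
            total += 2.0 * np.sum(o[nz] * np.log(o[nz] / e[nz]))
    return total
```

Loosely speaking, normalizing such a summed likelihood-ratio score by the sample size and the number of partitions yields the kind of mutual-information estimate the abstract alludes to.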
Geographically intelligent disclosure control for flexible aggregation of census data
This paper describes a geographically intelligent approach to disclosure control for protecting flexibly aggregated census data. Increased analytical power has stimulated user demand for more detailed information for smaller geographical areas and customized boundaries. Consequently, it is vital that improved methods of statistical disclosure control are developed to protect against the increased disclosure risk. Traditionally, methods of statistical disclosure control have been aspatial in nature. Here we present a geographically intelligent approach that takes into account the spatial distribution of risk. We describe empirical work illustrating how the flexibility of this new method, called local density swapping, makes it an improved alternative to random record swapping in terms of the risk-utility trade-off.
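The abstract leaves the mechanics of local density swapping unspecified; purely to illustrate the idea of targeting swaps by the spatial distribution of risk rather than uniformly at random, here is a hypothetical Python sketch (all names, the risk scores, and the one-dimensional coordinate simplification are ours, not the published method):

```python
import numpy as np

def risk_targeted_swap(coords, area, risk, rng, swap_rate=0.05):
    """Hypothetical caricature of spatially targeted record swapping:
    records are selected with probability proportional to a local
    disclosure-risk score, sorted by location, and adjacent selections
    exchange area codes so that swaps stay geographically local."""
    n = len(area)
    k = max(2, int(swap_rate * n)) // 2 * 2        # even number of records
    chosen = rng.choice(n, size=k, replace=False, p=risk / risk.sum())
    order = chosen[np.argsort(coords[chosen])]     # sort selections spatially
    swapped = area.copy()
    for i, j in zip(order[0::2], order[1::2]):     # pair spatial neighbours
        swapped[i], swapped[j] = swapped[j], swapped[i]
    return swapped
```

Random record swapping corresponds to dropping the risk weighting and the spatial pairing; concentrating swaps where disclosure risk is high is what buys the improved risk-utility balance described above.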
Computing Multi-Relational Sufficient Statistics for Large Databases
Databases contain information about which relationships do and do not hold
among entities. Making this information accessible for statistical analysis
requires computing sufficient statistics that combine information from
different database tables. Such statistics may involve any number of
positive and negative relationships. With a naive enumeration approach,
computing sufficient statistics for negative relationships is feasible only for
small databases. We solve this problem with a new dynamic programming algorithm
that performs a virtual join, where the requisite counts are computed without
materializing join tables. Contingency table algebra is a new extension of
relational algebra that facilitates the efficient implementation of this
Möbius virtual join operation. The Möbius Join scales to large datasets
(over 1M tuples) with complex schemas. Empirical evaluation with seven
benchmark datasets showed that information about the presence and absence of
links can be exploited in feature selection, association rule mining, and
Bayesian network learning.
Comment: 11 pages, 8 figures, 8 tables, CIKM'14, November 3-7, 2014, Shanghai, China
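The identity underpinning the virtual join is an inclusion-exclusion (Möbius) step over relationship indicators: ct(..., R = false) = ct(...) - ct(..., R = true), so counts for negative relationships never require materializing the complement of a join. A minimal Python sketch of that step (the dictionary representation of contingency tables is our assumption, not the paper's contingency table algebra; it requires a count for every subset of positive conditions, with the empty tuple holding the grand total):

```python
def add_negative_counts(pos_ct, relations):
    """Extend counts conditioned only on positive relationship values to
    counts over positive AND negative values, one relationship at a time,
    using ct(cond, R=false) = ct(cond) - ct(cond, R=true)."""
    ct = dict(pos_ct)
    for r in relations:
        for cond in list(ct):                      # snapshot before extending
            if not any(rel == r for rel, _ in cond):
                with_r = tuple(sorted(cond + ((r, True),)))
                without_r = tuple(sorted(cond + ((r, False),)))
                ct[without_r] = ct[cond] - ct[with_r]
    return ct

# Toy counts from joins over positive relationships only
pos = {
    (): 100,                                       # grand total
    (("Advises", True),): 30,
    (("Teaches", True),): 45,
    (("Advises", True), ("Teaches", True)): 10,
}
full = add_negative_counts(pos, ["Advises", "Teaches"])
print(full[(("Advises", False), ("Teaches", False))])  # 100 - 30 - 45 + 10 = 35
```

Processing one relationship per pass keeps every subtraction between counts that are already known, which is the dynamic-programming flavour of the Möbius Join, although the paper's algorithm organizes the computation over database tables rather than an in-memory dictionary.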
Measuring Confidentiality Risks in Census Data
Two trends have been on a collision course over the recent past. The first is the increasing demand by researchers for greater detail and flexibility in outputs from the decennial Census of Population. The second is the need felt by the Census Offices to demonstrate more clearly that Census data have been explicitly protected from the risk of disclosure of information about individuals. To reconcile these competing trends the authors propose a statistical measure of the risks of disclosure implicit in the release of aggregate census data. The ideas of risk measurement are first developed for microdata, where there is prior experience, and then modified to measure risk in tables of counts. To make sure that the theoretical ideas are fully expounded, the authors develop a small worked example. The risk measure proposed here is currently being tested with synthetic and real Census microdata. It is hoped that this approach will both refocus the census confidentiality debate and contribute to the safe use of user-defined flexible census output geographies.
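The abstract stops short of defining the measure itself. Purely for intuition, a generic small-cell summary of the kind long used in statistical disclosure control (an illustration of the genre, not the authors' proposed measure) can be written in a few lines of Python:

```python
import numpy as np

def small_cell_share(table, threshold=2):
    """Share of non-empty cells at or below a small-count threshold;
    cells of 1s and 2s are the classic disclosure worry in count tables."""
    counts = np.asarray(table).ravel()
    nonzero = counts[counts > 0]
    return float(np.mean(nonzero <= threshold)) if nonzero.size else 0.0

table = [[1, 0, 4], [2, 7, 1], [0, 3, 12]]
print(small_cell_share(table))   # 3 of the 7 non-empty cells are 1s or 2s
```

The measure developed in the paper is carried over from microdata risk assessment rather than read off raw cell sizes, but the target is the same: quantifying how exposed the small counts in flexibly defined output tables are.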
Inconsistencies in Reported Employment Characteristics among Employed Stayers
The paper deals with measurement error, and its potentially distorting role, in information on industry and professional status collected by labour force surveys. The focus of our analyses is on inconsistent information on these employment characteristics resulting from yearly transition matrices for workers who were continuously employed over the year and who did not change job. As a case study we use yearly panel data for the period from April 1993 to April 2003 collected by the Italian Quarterly Labour Force Survey. The analysis proceeds in four steps: (i) descriptive indicators of (dis)agreement; (ii) testing whether the consistency of repeated information significantly increases when the number of categories is collapsed; (iii) examination of the pattern of inconsistencies among response categories by means of Goodman's quasi-independence model; (iv) comparisons of alternative classifications. Results document sizable measurement error, which is only moderately reduced by more aggregated classifications. They suggest that even cross-section estimates of employment by industry and/or professional status are affected by non-random measurement error.
Keywords: industry, professional status, measurement errors, survey data
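Step (iii) deserves a brief unpacking: Goodman's quasi-independence model fits independence to the off-diagonal cells of a square transition matrix only, so the large mass of consistent answers on the diagonal does not drive the fit. A minimal sketch via iterative proportional fitting with the diagonal treated as structural (Python with NumPy; the function name and the fixed iteration count are ours):

```python
import numpy as np

def quasi_independence_fit(table, n_iter=500):
    """Fit Goodman's quasi-independence model to a square agreement table:
    the diagonal is reproduced exactly, while off-diagonal cells are fitted
    to independence by iterative proportional scaling of rows and columns."""
    t = np.asarray(table, float)
    k = len(t)
    off = t.copy()
    np.fill_diagonal(off, 0.0)                 # diagonal handled separately
    fit = np.ones((k, k)) - np.eye(k)          # structural zeros on diagonal
    for _ in range(n_iter):
        fit *= (off.sum(1) / np.maximum(fit.sum(1), 1e-12))[:, None]  # rows
        fit *= off.sum(0) / np.maximum(fit.sum(0), 1e-12)             # columns
    np.fill_diagonal(fit, np.diag(t))          # put observed stayers back
    return fit
```

Comparing the observed off-diagonal counts with these fitted values shows which pairs of industry or professional-status categories are confused between interviews more often than quasi-independence predicts.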