13 research outputs found

    Parameter estimation in the presence of auxiliary information

    Dissertation submitted for the degree of Doctor in Statistics and Risk Management, specialization in Statistics. In survey research, there are many situations in which the primary variable of interest is sensitive. The sensitivity of some queries can give rise to refusals to answer or to intentionally false answers. Surveys can be conducted in a variety of settings, in part dictated by the mode of data collection, and these settings can differ in how much privacy they offer the respondent. The estimates obtained from a direct survey on sensitive questions would be subject to high bias. A variety of techniques have been used to improve reporting by increasing the privacy of the respondents. The Randomized Response Technique (RRT), introduced by Warner in 1965, develops a random relation between the individual's response and the question. This technique provides confidentiality to respondents and still allows the interviewers to estimate the characteristic of interest at an aggregate level. In this thesis we propose some estimators that improve the mean estimation of a sensitive variable based on an RRT by making use of available non-sensitive auxiliary information. In the first part of the thesis we present the ratio and regression estimators, as well as some generalizations, in order to study the gain in estimation over the ordinary RRT mean estimator. In chapters 4 and 5 we study the performance of some exponential type estimators, also based on an RRT. The final part of the thesis illustrates an approach to mean estimation in stratified sampling. This study confirms some previous results for a different sample design. An extensive simulation study and an application to a real dataset are carried out for all the studied estimators to evaluate their performance. In the last chapter we present a general discussion of the main results and conclusions, together with an application to a real dataset that compares the performance of the studied estimators.
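To make the randomized response idea in the abstract above concrete, here is a minimal sketch of Warner's original design for estimating a sensitive proportion. It is not one of the thesis's mean estimators with auxiliary information. The function names, the design probability p = 0.7, the true proportion 0.3, and the sample size are all illustrative assumptions.

```python
import random


def warner_response(is_member, p, rng):
    """Return the yes/no answer one respondent gives under Warner's design."""
    # With probability p the randomizing device selects the statement
    # "I belong to the sensitive group"; otherwise it selects the
    # complementary statement. The interviewer never sees which one.
    asked_direct = rng.random() < p
    return is_member if asked_direct else not is_member


def warner_estimate(responses, p):
    """Estimate the sensitive proportion from the observed yes/no answers."""
    if p == 0.5:
        raise ValueError("p = 0.5 makes the proportion unidentifiable")
    yes_fraction = sum(responses) / len(responses)
    # P(yes) = p*pi + (1 - p)*(1 - pi); solve this for pi.
    return (yes_fraction - (1 - p)) / (2 * p - 1)


# Simulated survey (assumed values, for illustration only).
rng = random.Random(12345)
true_pi, p, n = 0.3, 0.7, 100_000
population = [rng.random() < true_pi for _ in range(n)]
answers = [warner_response(member, p, rng) for member in population]
estimate = warner_estimate(answers, p)
print(f"estimated proportion: {estimate:.3f}")
```

Each individual answer reveals little about the respondent, because the selected statement is hidden, yet the aggregate yes-rate still identifies the proportion as long as p differs from 0.5.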

    Statistical Methodologies

    Statistical practices have recently been questioned by numerous independent authors, to the extent that a significant fraction of accepted research findings can be called into doubt. This suggests that statistical methodologies may have drifted too far toward an engineering practice, with minimal concern for their foundation, interpretation, assumptions, and limitations, which may be jeopardized in the current context. Disguised by overwhelming data sets, advanced processing, and stunning presentations, the basic approach is often intractable to anyone but the analyst. The hierarchical nature of statistical inference, exemplified by Bayesian aggregation of prior and derived knowledge, may also be challenging. Simplified conceptual studies of the kind presented in this book could therefore provide valuable guidance when developing statistical methodologies, and also when applying the state of the art with greater confidence.

    Randomized response estimation in multiple frame surveys

    Large-scale surveys are increasingly delving into sensitive topics such as gambling, alcoholism, drug use, sexual behavior, and domestic violence. Sensitive, stigmatizing, or even incriminating themes are difficult to investigate using standard data-collection techniques, since respondents are generally reluctant to release information that concerns their personal sphere. Further, such topics usually pertain to elusive populations (e.g., irregular immigrants and the homeless, alcoholics, drug users, rape and sexual assault victims), which are difficult to sample because they are not adequately covered by a single sampling frame. On the other hand, researchers often use more than one data-collection mode (i.e., mixed-mode surveys) in order to increase response rates and/or improve coverage of the population of interest. Surveys of sensitive and elusive populations and mixed-mode research are closely connected with multiple frame surveys, which are becoming widely used to decrease bias due to undercoverage of the target population. In this work, we combine sensitive research and multiple frame surveys. In particular, we consider statistical techniques for handling sensitive data coming from multiple frame surveys using complex sampling designs. Our aim is to estimate the mean of a sensitive variable connected to undesirable behaviors when data are collected using randomized response theory. Some estimators are constructed and their properties are investigated theoretically. Variance estimation is also discussed by means of the jackknife technique. Finally, a Monte Carlo simulation study is conducted to evaluate the performance of the proposed estimators and the accuracy of variance estimation. Funding: Ministerio de Economía y Competitividad; FPU grant program; Consejería de Empleo, Empresa y Comercio, Junta de Andalucía.

    Robust regression type estimators to determine the population mean under simple and two-stage random sampling techniques

    For the estimation of the population mean, several ratio and regression type estimators are available in the literature. However, they can give misleading results when the data are contaminated by outliers. In the recent past, Zaman and Bulut (2019a) addressed this issue by using robust regression tools to develop a class of ratio type estimators under the simple random sampling scheme. Extending their work, Zaman (2019) suggested another class of estimators, this time using the ratio technique. In this paper, we propose a new class of robust regression type estimators that use LAD, LMS, LTS, Huber-M, Hampel-M, Tukey-M, and Huber-MM as robust regression tools. The class is subsequently extended to two-stage sampling, where the mean of the study variable is not available at the first stage. We also develop the reviewed and proposed estimators under this sampling technique. Further, we consider two cases: (i) the second-stage sample depends on the first-stage sample, and (ii) the second-stage sample is independent of the first-stage sample. The mean square error expressions of the proposed estimators are determined through Taylor series expansion. A real-life application and a simulation study are also provided to assess the existing and proposed estimators. In the light of the numerical illustration, we see that our proposed estimators give more efficient results than the reviewed ones.

    Mean estimation of sensitive variables under measurement errors and non-response

    This study addresses three important issues that arise in survey sampling: social desirability bias, measurement errors, and non-response. In this dissertation, we study the mean estimation of a sensitive variable under measurement errors and non-response. We propose a generalized mean estimator, then discuss the bias and the mean square error (MSE) of this estimator and compare it with other estimators under measurement errors and non-response using the optional RRT (ORRT) model. We also study the performance of the proposed estimator under the same conditions using stratified random sampling. Simulation studies are conducted to verify the theoretical results. Both the theoretical and the empirical results show that the generalized mean estimator is more efficient than the ordinary RRT estimator, which does not use the auxiliary variable, and than the ratio estimator, which is one of the commonly used mean estimators.

    SIS 2017. Statistics and Data Science: new challenges, new generations

    The 2017 SIS Conference aims to highlight the crucial role of Statistics in Data Science. In this new domain of 'meaning' extracted from data, the increasing amount of data produced and available in databases has brought new challenges. These involve different fields: statistics, machine learning, information and computer science, optimization, and pattern recognition. Together, these fields make a considerable contribution to the analysis of 'Big data', open data, relational and complex data, both structured and unstructured. The aim is to collect contributions from the different domains of Statistics on high-dimensional data quality validation, sample extraction, dimensionality reduction, pattern selection, data modelling, hypothesis testing, and confirming conclusions drawn from the data.