32 research outputs found

    Generic Statistical Information Model (GSIM)

    Presentation at the North American Data Documentation Conference (NADDI) 2013. Across the world, statistical organizations undertake similar activities. Each of these activities uses and produces similar information (for example, all agencies use classifications, create data sets and publish products). Although the information is at its core the same, organizations tend to describe it slightly differently (and often in different ways within each organization). There is no common means to describe the information. GSIM is a conceptual model that provides a set of standardized, consistently described information objects, which are the inputs and outputs in the design and production of statistics. DDI is a key standard both in the development of GSIM itself and as an implementation tool for organizations using GSIM. Beyond that, GSIM will also influence the future directions of DDI development, attracting a larger number of data producers into the DDI community. This presentation introduces GSIM, looks at the interaction between GSIM and DDI (and other related standards), and provides an update on a rapidly evolving vision around the use of DDI within the statistical institutes in Europe and elsewhere. It will cover the direct interaction between DDI and GSIM and also provide a broader context for understanding what that dynamic may mean in the future. Institute for Policy & Social Research, University of Kansas; University of Kansas Libraries; Alfred P. Sloan Foundation; Data Documentation Initiative Alliance.

    Methods library of embedded R functions at Statistics Norway

    Statistics Norway is modernising its production processes. An important element in this work is a library of functions for statistical computations. In principle, the functions in such a methods library can be programmed in several languages. A modernised production environment demands that these functions can be reused for different statistics products and that they are embedded within a common IT system. The embedding should be done in such a way that the users of the methods do not need to know the underlying programming language. As a proof of concept, Statistics Norway will soon have established a methods library offering a limited number of methods for macro-editing, imputation and confidentiality. This is done within an area of municipal statistics with R as the only programming language. This paper presents the details and experiences from this work. The problem of fitting real-world applications to simple and strict standards is discussed and exemplified by the development of solutions to regression imputation and table suppression. Keywords: Official statistics; R; Common Statistical Production Architecture; Generic Statistical Information Model; Validation and Transformation Language; Imputation; Statistical disclosure control. JEL Classification: C18, C88.

    Long-term Preservation of Longitudinal Statistical Surveys in Psycholinguistic Research

    Psycholinguistics deals with different types of evidence and data, including confidential information that needs to be protected from disclosure and other security threats. When it comes to speech-language pathologies, researchers in psycholinguistics are especially interested in aphasia. Aphasia is a loss of language ability as a consequence of brain damage, which may result from head injury or stroke. Research data has to be adequately stored, processed, protected and, if possible, preserved for secondary use. The authors propose applying models and tools used in official statistics, together with concepts from archival science, to help resolve open issues in aphasia research and its records management requirements in the context of long-term preservation, trust, and reuse.

    Generic Statistical Business Process Model (GSBPM), Version 5.1, January 2019. Norwegian translation

    The Generic Statistical Business Process Model (GSBPM) describes and defines the business processes needed to produce official statistics. It describes the phases, sub-processes and overarching processes of statistical production.

    An analysis of existing production frameworks for statistical and geographic information: Synergies, gaps and integration

    The production of official statistical and geospatial data is often in the hands of highly specialized public agencies that have traditionally followed their own paths and established their own production frameworks. In this article, we present the main frameworks of these two areas and focus on the possibility of, and need for, better integration between them through the interoperability of systems, processes, and data. The statistical area is well led and has well-defined frameworks. The geospatial area does not have clear leadership, and its large number of standards establishes a framework that is not always obvious. The lack of a general and common legal framework is also highlighted. Additionally, three examples are offered: the first is the application of the spatial data quality model to statistical data, the second is the application of the statistical process model to the geospatial case, and the third is the use of linked geospatial and statistical data. These examples demonstrate the possibility of transferring experiences and advances from one area to the other. In this way, we emphasize the conceptual proximity of these two areas, highlighting synergies, gaps, and potential integration. © 2021 by the authors. Licensee MDPI, Basel, Switzerland.

    On the role of the Common Metadata Framework in the development of statistics in Azerbaijan

    The author establishes the importance of the Common Metadata Framework as adapted to the practical conditions of the Republic of Azerbaijan, and the possibility for statistical agencies to draw on this experience when creating national statistical metadata systems. The article formulates proposals for upgrading the structure of the Common Metadata Framework with regard to its practical application. In the author's opinion, the conclusions and proposals from this systematic study can be used both to revise the Common Metadata Framework and to develop state programs aimed at improving official statistics and the metadata development strategy within national statistical systems.

    Capability maturity models towards improved quality of the sustainable development goals indicators data

    Achieving the Sustainable Development Goals (SDGs) demands coping with the data revolution for sustainable development: the integration of new and traditional data to produce high-quality information that is detailed, timely, and relevant for multiple purposes and to a variety of users. The quality of this information, defined by its completeness, uniqueness, timeliness, validity, accuracy, and consistency, is crucial for appropriate decision making, which in turn advances the national development imperatives for reaching the goals and targets of the sustainable development agenda. In this paper, we posit that the more mature the organizations within the national data ecosystems are, the higher the quality of the data they produce. The paper argues for the adoption and mainstreaming of organizational Capability Maturity Models within SDG activities. It also presents the preliminary formulation of a multidimensional prescriptive Capability Maturity Model to assess and improve the maturity of organizations within national data ecosystems and, therefore, the effective monitoring of progress on the SDG targets through the production of better-quality indicator data. Furthermore, the paper provides recommendations for addressing the challenges within the increasingly data-driven domain of social indicators monitoring. Facultad de Informática.

    Provenance of "after the fact" harmonised community-based demographic and HIV surveillance data from ALPHA cohorts

    Background: Data about data (metadata) for describing Health and Demographic Surveillance System (HDSS) data have often received insufficient attention. This thesis studied how to develop provenance metadata within the context of HDSS data harmonisation in the network for Analysing Longitudinal Population-based HIV/AIDS data on Africa (ALPHA). Technologies from the data documentation community were customised, among them a process model, the Generic Longitudinal Business Process Model (GLBPM); two metadata standards, the Data Documentation Initiative (DDI) and Statistical Data and Metadata eXchange (SDMX); and a language for describing data transformations, the Structured Data Transformation Language (SDTL). Methods: A framework with three complementary facets was used: (1) creating a recipe for annotating primary HDSS data using the GLBPM and DDI; (2) approaches for documenting data transformations, with prospective and retrospective documentation at the business level using GLBPM and DDI, and retrospective recovery of the more granular details using SDMX and SDTL; (3) requirements analysis for a user-friendly provenance metadata browser. Results: A recipe for the annotation of HDSS data was created, outlining considerations to guide HDSS sites on metadata entry, staff training and software costs. Regarding data transformations, a specialised process model for the HDSS domain was created at the business level. It has algorithm steps for each data transformation sub-process, along with data inputs and outputs. At a lower level, SDMX and SDTL captured about 80% (17/21) of the variable-level transformations. The requirements elicitation study yielded requirements for a provenance metadata browser to guide developers. Conclusions: This is a first attempt at creating detailed metadata for this resource or any similar resource in this field. HDSS sites can implement these recipes to document their data. This will increase transparency and facilitate reuse, thus potentially bringing down the costs of data management. It will arguably promote the longevity and the wide and accurate use of these data.

    Statistical metadata in knowledge discovery.

    Metadata represents the semantic schema of the data collected over the years by an organization in order to apply the business intelligence approach. However, the metadata normally collected are not enough to facilitate knowledge discovery processes because they are conceived primarily for interoperability between information systems. Research undertaken in this study confirmed the need to enrich data warehousing systems with structured, meaningful metadata in order to increase the productivity and efficacy of any investigation, including data management and future business analytics. This need led us to adopt and extend the concept of "statistical metadata". Thus, our proposed conceptual model of statistical metadata not only considers recognized standards but also represents additional properties. This means that our conceptual model allows increased levels of detail about the data and the quality of the semantic contents.

    Linked statistical data: relevance and prospects

    After a detailed argument for the study's relevance, this article discusses the prospects for introducing the concept of linked open statistics produced within a single information environment that ensures efficient production, dissemination, and reuse of statistical and administrative data. The implementation of this qualitatively new concept, based on technological innovations and aimed at meeting rapidly growing user demands, is a key task of digital transformation defined by the Government of the Russian Federation in the field of official statistics. The major part of open data concerns statistics, such as demographic, economic and social indicators. Describing and presenting them in the form of linked open statistics could provide an important foundation for accelerating socio-economic development by introducing new socially significant state, municipal, non-commercial and commercial services and products. Linked Open Statistical Data (LOSD) allows analysis to be performed on a coordinated, integrated information base as an alternative to using disparate and often contradictory data sets. National statistical institutes and government bodies in many countries, together with international organizations, have already adopted the paradigm of linked open statistics. The authors discuss the advantages of this approach, as well as its practical application in international projects. The article presents examples and best practices of linked open statistics from a number of publications and strategic documents within the European Statistical System. It also shows that the development of linked open statistics is constrained by the lack of accessible ontologies and standards, the extensions necessary to meet the requirements for classifying and managing the various concepts in the statistics domain. The analysis of projects and initiatives carried out in the article reflects the possibilities and prospects for solving this problem in the field of official statistics. The authors formulate a set of recommendations based both on the analysis of international practice and on the results of their own development experience within the research project "Center of Semantic Integration".