17 research outputs found
Open City Data Pipeline
Statistical data about cities, regions and at country level is collected for various purposes and from various institutions. Yet, while
access to high-quality, recent data of this kind is crucial for decision makers as well as for the public, all too often such collections of
data remain isolated and not re-usable, let alone properly integrated. In this paper we present the Open City Data Pipeline, a focused
attempt to collect, integrate, and enrich statistical data collected at city level worldwide, and republish this data in a reusable manner
as Linked Data. The main features of the Open City Data Pipeline are: (i) we integrate and cleanse data from several sources in a
modular and extensible, always up-to-date fashion; (ii) we use both Machine Learning techniques and ontological reasoning
over equational background knowledge to enrich the data by imputing missing values; (iii) we assess the estimated accuracy of such
imputations per indicator. Additionally, (iv) we make the integrated and enriched data available both in a web browser interface and as
machine-readable Linked Data, using standard vocabularies such as QB and PROV, and linking to e.g. DBpedia.
Lastly, in an exhaustive evaluation of our approach, we compare our enrichment and cleansing techniques to a preliminary version
of the Open City Data Pipeline presented at ISWC2015: firstly, we demonstrate that the combination of equational knowledge and
standard machine learning techniques significantly helps to improve the quality of our missing value imputations; secondly, we
arguably show that the more data we integrate, the more reliable our predictions become. Hence, over time, the Open City Data
Pipeline shall provide a sustainable effort to serve Linked Data about cities in increasing quality.
Series: Working Papers on Information Systems, Information Business and Operation
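The combination of equational background knowledge with statistical imputation described in this abstract can be illustrated with a minimal Python sketch. It is not the pipeline's implementation: the records, the single equation density = population / area, and the column-mean fallback (a crude stand-in for the machine learning models the abstract mentions) are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical city records; None marks a missing indicator value.
cities = [
    {"name": "A", "population": 1_000_000, "area_km2": 400.0, "density": None},
    {"name": "B", "population": 500_000, "area_km2": None, "density": 2_000.0},
    {"name": "C", "population": None, "area_km2": None, "density": 3_000.0},
]


def impute_by_equation(row):
    """Fill one missing term of density = population / area_km2 when the other two are known."""
    pop, area, dens = row["population"], row["area_km2"], row["density"]
    if dens is None and pop is not None and area:
        row["density"] = pop / area
    elif area is None and pop is not None and dens:
        row["area_km2"] = pop / dens
    elif pop is None and area is not None and dens is not None:
        row["population"] = area * dens
    return row


def impute_by_mean(rows, key):
    """Statistical fallback (assumed): replace remaining gaps with the column mean."""
    known = [r[key] for r in rows if r[key] is not None]
    if known:
        fill = mean(known)
        for r in rows:
            if r[key] is None:
                r[key] = fill
    return rows


def impute(rows):
    """Apply the equation first, then alternate the fallback with the equation."""
    rows = [impute_by_equation(dict(r)) for r in rows]  # copy, keep input intact
    for key in ("population", "area_km2", "density"):
        impute_by_mean(rows, key)
        rows = [impute_by_equation(r) for r in rows]
    return rows


completed = impute(cities)
for row in completed:
    print(row)
```

Re-applying the equation after each fallback step keeps statistically imputed values consistent with the equation wherever that is possible, which is the intuition behind combining the two kinds of knowledge.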
Linked Statistical Data: Relevance and Prospects
After arguing in detail for the study's relevance, this article discusses the prospects for introducing the concept of linked open statistics produced within the framework of a single information environment that ensures efficient production, dissemination, and reuse of statistical and administrative data. The implementation of this qualitatively new concept, based on technological innovations and aimed at meeting rapidly growing user demands, is a key task of digital transformation, defined by the Government of the Russian Federation in the field of official statistics. The major part of open data concerns statistics such as demographic, economic and social indicators. Describing and presenting them in the form of linked open statistics could become an important basis for accelerating socio-economic development by introducing new socially significant state, municipal, non-commercial and commercial services/products. Linked Open Statistical Data (LOSD) allows analysis to be performed on a coordinated, integrated information environment as an alternative to using disparate and often contradictory data sets. National statistical institutes and government bodies in many countries, together with international organizations, have already chosen the paradigm of linked open statistics. The authors discuss the advantages of this approach, as well as its practical application in international projects. The article presents examples and best practices of linked open statistics in a number of publications and strategic documents within the European Statistical System. It also shows that the development of linked open statistics is constrained by the lack of accessible ontologies and standards, that is, the extensions necessary to meet the requirements for classification and management of various concepts in the statistics domain.
The analysis of projects and initiatives carried out in the article reflects the possibilities and prospects of solving this problem in the field of state statistics. The authors formulate a set of recommendations based both on the analysis of international practice and on the results of their own development experience within the research project "Center of Semantic Integration".
Dataset search: a survey
Generating value from data requires the ability to find, access and make
sense of datasets. There are many efforts underway to encourage data sharing
and reuse, from scientific publishers asking authors to submit data alongside
manuscripts to data marketplaces, open data portals and data communities.
Google recently beta released a search service for datasets, which allows users
to discover data stored in various online repositories via keyword queries.
These developments foreshadow an emerging research field around dataset search
or retrieval that broadly encompasses frameworks, methods and tools that help
match a user data need against a collection of datasets. Here, we survey the
state of the art of research and commercial systems in dataset retrieval. We
identify what makes dataset search a research field in its own right, with
unique challenges and methods, and highlight open problems. We look at
approaches and implementations from related areas that dataset search draws
upon, including information retrieval, databases, and entity-centric and tabular
search, in order to identify possible paths to resolve these open problems as
well as immediate next steps that will take the field forward.
Comment: 20 pages, 153 reference
A General Semantic Web Approach for Data Analysis on Graduates Statistics
Currently, several datasets released in a Linked Open Data format are available at a national and international level, but the lack of shared strategies for defining concepts within the statistical publishing community makes it difficult to compare facts drawn from different data sources. In order to guarantee a shared representation framework for the dissemination of statistical concepts about graduates, we developed SW4AL, an ontology-based system for the graduate surveys domain. The developed system transforms low-level data into an enriched information model and is based on the AlmaLaurea surveys covering more than 90% of Italian graduates. SW4AL: i) semantically describes the different peculiarities of the graduates; ii) promotes the structured definition of the AlmaLaurea data and its subsequent publication in the Linked Open Data context; iii) provides for its reuse in the open data scope; iv) enables logical reasoning about knowledge representation. SW4AL establishes common semantics for the concepts of the graduate surveys domain by proposing the creation of a SPARQL endpoint and a Web-based interface for querying and visualizing the structured data.
SEMANTIC LINKING SPATIAL RDF DATA TO THE WEB DATA SOURCES
Large amounts of spatial data are held in relational databases. Spatial data in relational databases must be converted to RDF for semantic web applications. Spatial data is an important key factor for creating spatial RDF data. Linked Data is the way users most prefer to publish and share relational data on the Web. In order to define the semantics of the data, links are provided to vocabularies (ontologies or other external web resources) that are common conceptualizations for a domain. Linking the data of a resource vocabulary with globally published concepts of domain resources combines different data sources and datasets, makes data more understandable, discoverable and usable, improves data interoperability and integration, enables automatic reasoning and prevents data duplication. The need to convert relational data to RDF is becoming apparent due to the semantic expressiveness of Semantic Web technologies. Ontologies are one of the key factors of the Semantic Web. An ontology is an "explicit specification of a conceptualization". The semantics of spatial data relies on ontologies. Linking spatial data from relational databases to web data sources, so that machine-readable interlinked data can be shared on the Web, is not an easy task. Tim Berners-Lee, the inventor of the World Wide Web and an advocate of the Semantic Web and Linked Data, laid down the Linked Data design principles. Based on these rules, firstly, spatial data in relational databases must be converted to RDF with the use of supporting tools. Secondly, spatial RDF data must be linked to upper-level domain ontologies and related web data sources. Thirdly, external data sources (ontologies and web data sources) must be determined and spatial RDF data must be linked to related data sources. Finally, spatial linked data must be published on the web.
The main contribution of this study is to determine the requirements for finding RDF links and to point out the deficiencies in creating or publishing linked spatial data. To achieve this objective, this study examines existing approaches, conversion tools and web data sources for converting relational data to spatial RDF. In this paper, we have investigated the current state of spatial RDF data, standards, open source platforms (particularly D2RQ, Geometry2RDF, TripleGeo, GeoTriples, Ontop, etc.) and web data sources. Moreover, the process of converting spatial data to RDF and linking it to web data sources is described. The implementation of linking spatial RDF data to web data sources is demonstrated with an example use case. Road data has been linked to one of the popular related web data sources, DBpedia. Silk, a tool for discovering relationships between data items within different Linked Data sources, is used as the link discovery framework. We also evaluated link discovery tools such as LIMES and Silk and compared their results on the matching/linking task. As a result, linked road data is shared and represented as an information resource on the web and enriched with definitions of different related resources. In this way, road datasets are also linked by the related classes, individuals, spatial relations and properties they cover, such as construction date, road length, coordinates, etc.
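The conversion and linking steps described in this abstract can be sketched in a few lines of Python that emit N-Triples. This is only an illustration under stated assumptions, not the study's workflow: the `example.org` namespaces, the `lengthKm` property, the road row and the DBpedia target IRI are hypothetical, and the exact label match is a deliberately naive stand-in for the similarity-based link discovery that tools such as Silk or LIMES perform. The GeoSPARQL, RDFS, OWL and XSD IRIs are the standard ones.

```python
# All IRIs under example.org and the DBpedia target below are hypothetical.
BASE = "http://example.org/road/"
VOCAB = "http://example.org/vocab#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"
GEO = "http://www.opengis.net/ont/geosparql#"

# One row as it might come from a relational road table (made-up values).
roads = [{"id": 1, "name": "Example Road", "length_km": 12.5,
          "wkt": "LINESTRING(32.80 39.90, 32.90 40.00)"}]
# Lower-cased labels of candidate external resources (made-up).
candidates = {"example road": "http://dbpedia.org/resource/Example_Road"}


def literal(value, datatype=None):
    """Serialize a value as an N-Triples literal, escaping backslashes and quotes."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"^^<{datatype}>' if datatype else f'"{text}"'


def triple(subject, predicate, obj):
    """Build one N-Triples line; obj must already be serialized."""
    return f"<{subject}> <{predicate}> {obj} ."


def row_to_triples(row):
    """Step 1: convert a relational row to spatial RDF (feature plus geometry)."""
    road = f"{BASE}{row['id']}"
    geometry = f"{road}/geometry"
    return [
        triple(road, RDFS_LABEL, literal(row["name"])),
        triple(road, VOCAB + "lengthKm", literal(row["length_km"], XSD_DOUBLE)),
        triple(road, GEO + "hasGeometry", f"<{geometry}>"),
        triple(geometry, GEO + "asWKT", literal(row["wkt"], GEO + "wktLiteral")),
    ]


def link_by_label(row, targets):
    """Step 2: naive link discovery by exact, case-insensitive label match."""
    target = targets.get(row["name"].strip().lower())
    if target is None:
        return []
    return [triple(f"{BASE}{row['id']}", OWL_SAME_AS, f"<{target}>")]


triples = row_to_triples(roads[0]) + link_by_label(roads[0], candidates)
print("\n".join(triples))
```

Keeping the geometry as a separate node follows the GeoSPARQL pattern of a feature with `hasGeometry` pointing at a geometry that carries the WKT literal; a real conversion tool would generate this mapping from the database schema.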
User Perception of the U.S. Open Government Data Success Factors
This quantitative correlational study used the information systems success model to examine the relationship between the U.S. federal departments' open data users' perception of system quality, perception of information quality, perception of service quality, and the intent to use open data from U.S. federal departments. A pre-existing information system success model survey instrument was used to collect data from 122 open data users. The standard multiple linear regression was statistically significant in predicting the intent to use U.S. open government data, F(3,99) = 6479.916, p < 0.01, and accounted for 99% of the variance in the intent to use U.S. open government data (R² = .995, adjusted R² = .995). The interdependent nature of information quality, system quality, and service quality may have contributed to the value of R². Cronbach's alpha for this study is α = .99, and the value could be attributed to the fact that users of open data are not necessarily technically oriented and were not able to distinguish between the meanings of the variables. The result of this study confirmed that there is a relationship between the user's perception of system quality, perception of information quality, perception of service quality, and the intent to use open data from U.S. federal departments. The findings from this study might contribute to positive social change by enabling the solving of problems in healthcare, education, the energy sector, the research community, digitization, and the preservation of e-government activities. Using the results of this study, IT software engineers in U.S. federal departments may be able to improve the gathering of user specifications and requirements in information system design.
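The reported regression statistics can be cross-checked with the standard ordinary least squares identities that relate F, R² and adjusted R². The sketch below assumes that F(3,99) means 3 predictors and 99 residual degrees of freedom, which implies 103 cases entered the model; that reading is an inference from the reported degrees of freedom, not something the abstract states.

```python
def r2_from_f(f_value, predictors, df_resid):
    """Recover R^2 from the overall F statistic of a linear regression."""
    return (f_value * predictors) / (f_value * predictors + df_resid)


def f_statistic(r2, predictors, df_resid):
    """Overall F statistic implied by R^2."""
    return (r2 / predictors) / ((1.0 - r2) / df_resid)


def adjusted_r2(r2, predictors, df_resid):
    """Adjusted R^2, with n = df_resid + predictors + 1 observations."""
    n = df_resid + predictors + 1
    return 1.0 - (1.0 - r2) * (n - 1) / df_resid


# Values taken from the abstract: F(3,99) = 6479.916.
r2 = r2_from_f(6479.916, 3, 99)
adj = adjusted_r2(r2, 3, 99)
print(round(r2, 3), round(adj, 3))
```

Under these assumptions both values round to .995, so the reported F, R² and adjusted R² are mutually consistent.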
Proceedings of the Seventh Congress of the European Society for Research in Mathematics Education
This volume contains the Proceedings of the Seventh Congress of the European Society for Research in Mathematics Education (ERME), which took place 9-13 February 2011, at Rzeszów in Poland.