2,261 research outputs found

    Geographical information retrieval with ontologies of place

    Get PDF
    Geographical context is required in many information retrieval tasks in which the target of the search may be documents, images or records that are referenced to geographical space only by means of place names. Often there is an imprecise match between the query name and the names associated with candidate sources of information. There is therefore a need for geographical information retrieval facilities that can rank the relevance of candidate information with respect to geographical closeness of place as well as semantic closeness with respect to the information of interest. Here we present an ontology of place that combines limited coordinate data with semantic and qualitative spatial relationships between places. This parsimonious model of geographical place supports maintenance of knowledge of place names that relate to extensive regions of the Earth at multiple levels of granularity. The ontology has been implemented with a semantic modelling system linking non-spatial conceptual hierarchies with the place ontology. A hierarchical spatial distance measure is combined with Euclidean distance between place centroids to create a hybrid spatial distance measure. This is integrated with thematic distance, based on classification semantics, to create an integrated semantic closeness measure that can be used for relevance ranking of retrieved objects.
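    As an illustration of how such a measure might be composed, here is a minimal Python sketch that blends a hierarchical distance (steps through a part-of hierarchy of places) with Euclidean centroid distance, then integrates a thematic distance into a single closeness score. The weights, data structures and place data are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of the hybrid spatial distance and integrated
# semantic closeness idea described above. Weights (alpha, beta) and the
# place representation are assumptions for illustration only.
import math

def hierarchical_distance(path_a, path_b):
    """Steps to the deepest common ancestor in a part-of hierarchy,
    where each path lists places from the root down to the place itself."""
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)

def hybrid_spatial_distance(place_a, place_b, alpha=0.5):
    """Weighted blend of hierarchical and centroid (Euclidean) distance."""
    return (alpha * hierarchical_distance(place_a["path"], place_b["path"])
            + (1 - alpha) * math.dist(place_a["centroid"], place_b["centroid"]))

def semantic_closeness(place_a, place_b, thematic_dist, beta=0.5):
    """Integrated measure: lower spatial and thematic distance
    means higher relevance for ranking retrieved objects."""
    spatial = hybrid_spatial_distance(place_a, place_b)
    return 1.0 / (1.0 + beta * spatial + (1 - beta) * thematic_dist)

# Usage: score a candidate place against a query place.
cardiff = {"path": ["Europe", "UK", "Wales", "Cardiff"],
           "centroid": (-3.18, 51.48)}
swansea = {"path": ["Europe", "UK", "Wales", "Swansea"],
           "centroid": (-3.94, 51.62)}
print(semantic_closeness(cardiff, swansea, thematic_dist=0.2))
```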

    Open City Data Pipeline

    Get PDF
    Statistical data about cities, regions and countries is collected for various purposes and by various institutions. Yet, while access to high-quality and recent data of this kind is crucial both for decision makers and for the public, all too often such collections remain isolated and not re-usable, let alone properly integrated. In this paper we present the Open City Data Pipeline, a focused attempt to collect, integrate, and enrich statistical data collected at city level worldwide, and to republish this data in a reusable manner as Linked Data. The main features of the Open City Data Pipeline are: (i) we integrate and cleanse data from several sources in a modular, extensible, always up-to-date fashion; (ii) we use both machine learning techniques and ontological reasoning over equational background knowledge to enrich the data by imputing missing values; (iii) we assess the estimated accuracy of such imputations per indicator. Additionally, (iv) we make the integrated and enriched data available both in a web browser interface and as machine-readable Linked Data, using standard vocabularies such as QB and PROV, and linking to e.g. DBpedia. Lastly, in an exhaustive evaluation of our approach, we compare our enrichment and cleansing techniques to a preliminary version of the Open City Data Pipeline presented at ISWC 2015: firstly, we demonstrate that the combination of equational knowledge and standard machine learning techniques significantly helps to improve the quality of our missing value imputations; secondly, we arguably show that the more data we integrate, the more reliable our predictions become. Hence, over time, the Open City Data Pipeline shall provide a sustainable effort to serve Linked Data about cities in increasing quality.
    Series: Working Papers on Information Systems, Information Business and Operation
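    A minimal sketch of what such an enrichment step could look like, assuming the pipeline's pattern of applying equational background knowledge first and falling back to a learned model for what the equations cannot derive. The indicators, toy data and choice of regressor are hypothetical, not the pipeline's actual configuration.

```python
# Illustrative imputation sketch: equational knowledge first, ML fallback.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

cities = pd.DataFrame({
    "population":         [1_800_000, 1_700_000, None],
    "area_km2":           [415.0, 525.0, 310.0],
    "population_density": [None, None, 4100.0],
})

# 1) Equational background knowledge: density = population / area,
#    applied in both directions where one side is missing.
mask = cities["population_density"].isna() & cities["population"].notna()
cities.loc[mask, "population_density"] = (
    cities.loc[mask, "population"] / cities.loc[mask, "area_km2"])

mask = cities["population"].isna() & cities["population_density"].notna()
cities.loc[mask, "population"] = (
    cities.loc[mask, "population_density"] * cities.loc[mask, "area_km2"])

# 2) ML fallback: train on complete rows, predict any still-missing cells.
target, features = "population_density", ["population", "area_km2"]
train = cities.dropna(subset=[target] + features)
todo = cities[cities[target].isna()].dropna(subset=features)
if len(train) and len(todo):
    model = RandomForestRegressor(n_estimators=100).fit(
        train[features], train[target])
    cities.loc[todo.index, target] = model.predict(todo[features])

print(cities)
```

    Per-indicator accuracy, as described in point (iii), could then be estimated by holding out known values and comparing them against imputed ones.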

    Seeking ‘the New Normal’? Troubled spaces of encountering visible differences in Warsaw

    Get PDF
    In times of globalisation and super-mobility, ideas of normality are in turmoil. In different societies in, across and beyond Europe, we face the challenge of undoing specific notions of normality and creating more inclusive societies with an open culture of learning to live with differences. The aim of the paper is to introduce findings on encounters with difference and negotiations of social values in relation to the growing visibility of difference after 1989 in Poland, against the background of a critique of normality/normalisation and normalcy. On the basis of interviews conducted in Warsaw, we investigate how normality/normalisation discourses of visible homosexuality and physical disability are incorporated into individual self-reflections and justifications of prejudices (homophobia and disablism). More specifically, we argue that there are moments of ‘cultural transgression’ present in everyday practices towards ‘visible’ sexual and (dis)ability difference.

    An integrated approach to deliver OLAP for multidimensional Semantic Web Databases

    Get PDF
    The Semantic Web (SW) and web data have become increasingly important sources to support Business Intelligence (BI), but they are difficult to manage due to the exponential increase in their volumes, inconsistency in semantics and complexity in representations. On-Line Analytical Processing (OLAP) is an important tool for analysing large and complex BI data, but it lacks the capability of processing dispersed SW data due to the nature of its design. A new concept with a richer vocabulary than existing OLAP vocabularies is needed to model distributed multidimensional semantic web databases. A new OLAP framework is developed, with multiple layers including additional vocabulary, extended OLAP operators, and the use of SPARQL to model heterogeneous semantic web data, unify multidimensional structures, and provide new enabling functions for interoperability. The framework is presented with examples to demonstrate its capability to unify existing vocabularies with additional vocabulary elements to handle both informational and topological data in Graph OLAP. The vocabularies used in this work are the RDF Data Cube Vocabulary (QB), proposed by the W3C to allow multi-dimensional, mostly statistical, data to be published in RDF, and QB4OLAP, a QB extension introducing standard OLAP operators. The framework enables the composition of multiple databases (e.g. energy consumption and property market values) to generate observations through semantic pipe-like operators. This approach is demonstrated through use cases containing highly valuable data collected from a real-life environment. Its usability is shown through the development and usage of semantic pipe-like operators able to deliver OLAP-specific functionalities. To the best of my knowledge there is no available data modelling approach handling both informational and topological Semantic Web data that is designed either to provide OLAP capabilities over Semantic Web databases or to provide a means to connect such databases for further OLAP analysis. The thesis proposes that the presented work provides a wider understanding of ways to access Semantic Web data, ways to build specialised Semantic Web databases, and how to enrich them with powerful capabilities for further Business Intelligence.
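    To illustrate the kind of operation such a framework expresses, here is a minimal Python sketch, using the rdflib library, of a roll-up style aggregation over RDF Data Cube (QB) observations via SPARQL. The tiny in-memory cube and the ex: properties are invented for the example; they are not the thesis's actual vocabulary extensions or operators.

```python
# Sketch: a roll-up over QB observations expressed as SPARQL GROUP BY.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

QB = Namespace("http://purl.org/linked-data/cube#")
EX = Namespace("http://example.org/")

# Build a tiny cube: energy consumption per city and year (invented data).
g = Graph()
for city, year, kwh in [("Cardiff", 2020, 100), ("Cardiff", 2021, 110),
                        ("Swansea", 2020, 80)]:
    obs = EX[f"obs-{city}-{year}"]
    g.add((obs, QB.dataSet, EX.energyCube))
    g.add((obs, EX.refCity, EX[city]))
    g.add((obs, EX.refYear, Literal(year)))
    g.add((obs, EX.consumptionKWh, Literal(kwh, datatype=XSD.integer)))

# Roll-up: collapse the year dimension, totalling consumption per city.
rollup = """
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX ex: <http://example.org/>
SELECT ?city (SUM(?kwh) AS ?total) WHERE {
  ?obs qb:dataSet ex:energyCube ;
       ex:refCity ?city ;
       ex:consumptionKWh ?kwh .
} GROUP BY ?city
"""
for row in g.query(rollup):
    print(row.city, row.total)
```

    Composing several such queries over different cubes (e.g. energy consumption joined with property values on a shared city dimension) is the kind of pipe-like composition the abstract describes.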

    Words and their secrets

    Get PDF

    Discovering new kinds of patient safety incidents

    No full text
    Every year, large numbers of patients in National Health Service (NHS) care suffer because of a patient safety incident. The National Patient Safety Agency (NPSA) collects large amounts of data describing individual incidents. As well as being described by categorical and numerical variables, each incident is described using free text. The aim of the work was to find small groups of similar incidents of types previously unknown to the NPSA. A model of the text was produced, such that the position of each incident reflected its meaning to the greatest extent possible. The basic model was the vector space model. Dimensionality reduction was carried out in two stages: unsupervised dimensionality reduction using principal component analysis, and supervised dimensionality reduction using linear discriminant analysis. It was then possible to look for groups of incidents that were more tightly packed than would be expected given the overall distribution of the incidents. The process for assessing these groups had three stages. Firstly, a quantitative measure was used, allowing a large number of parameter combinations to be examined. The groups found for an ‘optimum’ parameter combination were then divided into categories using a qualitative filtering method. Finally, clinical experts assessed the groups qualitatively. The transition probabilities model was also examined: this model was based on the empirical probabilities of two-word sequences occurring in the text. An alternative method for dimensionality reduction was to use information about the subjective meaning of a small sample of incidents elicited from experts, producing a mapping between high- and low-dimensional models of the text. The analysis also included the direct use of the categorical variables to model the incidents, and empirical analysis of the behaviour of high-dimensional spaces.
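    A minimal sketch of the two-stage dimensionality reduction described above, using scikit-learn on invented incident texts: a vector space model of the free text, unsupervised reduction with principal component analysis, then supervised reduction with linear discriminant analysis guided by the existing incident categories. The reports, categories and component counts are illustrative assumptions.

```python
# Sketch: vector space model -> PCA (unsupervised) -> LDA (supervised).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

reports = ["patient fell from bed at night",
           "wrong dose of medication administered",
           "patient slipped on wet floor",
           "medication given to wrong patient"]
categories = ["fall", "medication", "fall", "medication"]

X = TfidfVectorizer().fit_transform(reports).toarray()   # vector space model
X = PCA(n_components=3).fit_transform(X)                 # unsupervised stage
X = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, categories)

# Incidents now sit in a low-dimensional space where unusually tight
# groups can be searched for relative to the overall distribution.
print(X)
```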

    Enhancing Deep Learning Models through Tensorization: A Comprehensive Survey and Framework

    Full text link
    The burgeoning growth of public domain data and the increasing complexity of deep learning model architectures have underscored the need for more efficient data representation and analysis techniques. This paper is motivated by the work of Helal (2023) and aims to present a comprehensive overview of tensorization. This transformative approach bridges the gap between the inherently multidimensional nature of data and the simplified 2-dimensional matrices commonly used in linear algebra-based machine learning algorithms. This paper explores the steps involved in tensorization, multidimensional data sources, the various multiway analysis methods employed, and the benefits of these approaches. A small example of Blind Source Separation (BSS) is presented comparing 2-dimensional algorithms and a multiway algorithm in Python. Results indicate that multiway analysis is more expressive. Contrary to the intuition of the dimensionality curse, utilising multidimensional datasets in their native form and applying multiway analysis methods grounded in multilinear algebra reveal a profound capacity to capture intricate interrelationships among various dimensions while, surprisingly, reducing the number of model parameters and accelerating processing. A survey of the multiway analysis methods and their integration with various Deep Neural Network models is presented using case studies in different application domains.
    Comment: 34 pages, 8 figures, 4 tables
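    The abstract mentions a Python comparison; as a hedged sketch of the multiway side, the following uses the tensorly library to fit a rank-2 CP (PARAFAC) decomposition to a synthetic 3-way tensor and reports how few parameters the multilinear model needs relative to the number of tensor entries. The shapes, rank and data are assumptions, not the paper's experiment.

```python
# Sketch: multiway analysis of a 3-way tensor via CP decomposition.
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

rng = np.random.default_rng(0)
# Synthetic 3-way data, e.g. (sensors x time x trials), built from rank-2
# structure plus noise. Shapes and rank are illustrative assumptions.
A = rng.normal(size=(10, 2))
B = rng.normal(size=(50, 2))
C = rng.normal(size=(8, 2))
tensor = tl.cp_to_tensor((np.ones(2), [A, B, C]))
tensor = tensor + 0.01 * rng.normal(size=(10, 50, 8))

weights, factors = parafac(tl.tensor(tensor), rank=2)
n_params = sum(f.shape[0] * f.shape[1] for f in factors)
print(f"tensor entries: {tensor.size}, CP parameters: {n_params}")
```

    Here 4,000 tensor entries are modelled by 136 factor parameters, illustrating the abstract's point that native multiway models can reduce parameter counts rather than inflate them.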