Geographical information retrieval with ontologies of place
Geographical context is required by many information retrieval tasks in which the target of the search may be documents, images or records that are referenced to geographical space only by means of place names. Often there may be an imprecise match between the query name and the names associated with candidate sources of information. There is therefore a need for geographical information retrieval facilities that can rank the relevance of candidate information with respect to geographical closeness of place as well as semantic closeness with respect to the information of interest. Here we present an ontology of place that combines limited coordinate data with semantic and qualitative spatial relationships between places. This parsimonious model of geographical place supports maintenance of knowledge of place names that relate to extensive regions of the Earth at multiple levels of granularity. The ontology has been implemented with a semantic modelling system linking non-spatial conceptual hierarchies with the place ontology. A hierarchical spatial distance measure is combined with Euclidean distance between place centroids to create a hybrid spatial distance measure. This is integrated with thematic distance, based on classification semantics, to create an integrated semantic closeness measure that can be used for a relevance ranking of retrieved objects.
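The hybrid distance described above can be sketched as follows. The weighting parameter, the encoding of each place as a path through the place hierarchy plus a centroid, and the example places are illustrative assumptions, not the paper's actual formulation.

```python
import math

def hierarchical_distance(path_a, path_b):
    """Edges separating two places in the part-of hierarchy,
    e.g. ["Earth", "Europe", "UK", "Cardiff"]."""
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)

def hybrid_spatial_distance(place_a, place_b, alpha=0.5):
    """Weighted mix of hierarchical distance and Euclidean distance
    between place centroids (alpha is an illustrative weight)."""
    d_hier = hierarchical_distance(place_a["path"], place_b["path"])
    d_eucl = math.dist(place_a["centroid"], place_b["centroid"])
    return alpha * d_hier + (1 - alpha) * d_eucl

cardiff = {"path": ["Earth", "Europe", "UK", "Cardiff"],
           "centroid": (-3.18, 51.48)}
swansea = {"path": ["Earth", "Europe", "UK", "Swansea"],
           "centroid": (-3.94, 51.62)}
d = hybrid_spatial_distance(cardiff, swansea)
```

In the same spirit, the integrated closeness measure would mix this hybrid spatial distance with a thematic distance derived from the classification hierarchy.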
Open City Data Pipeline
Statistical data about cities, regions and countries is collected for various purposes and from various institutions. Yet, while access to high-quality and recent such data is crucial both for decision makers and for the public, all too often such collections of data remain isolated and not re-usable, let alone properly integrated. In this paper we present the Open City Data Pipeline, a focused attempt to collect, integrate, and enrich statistical data collected at city level worldwide, and republish this data in a reusable manner as Linked Data. The main features of the Open City Data Pipeline are: (i) we integrate and cleanse data from several sources in a modular, extensible, always up-to-date fashion; (ii) we use both Machine Learning techniques and ontological reasoning over equational background knowledge to enrich the data by imputing missing values; (iii) we assess the estimated accuracy of such imputations per indicator. Additionally, (iv) we make the integrated and enriched data available both in a web browser interface and as machine-readable Linked Data, using standard vocabularies such as QB and PROV, and linking to e.g. DBpedia.
Lastly, in an exhaustive evaluation of our approach, we compare our enrichment and cleansing techniques to a preliminary version of the Open City Data Pipeline presented at ISWC2015: firstly, we demonstrate that the combination of equational knowledge and standard machine learning techniques significantly helps to improve the quality of our missing value imputations; secondly, we arguably show that the more data we integrate, the more reliable our predictions become. Hence, over time, the Open City Data Pipeline shall provide a sustainable effort to serve Linked Data about cities in increasing quality.
Series: Working Papers on Information Systems, Information Business and Operation
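The interplay of equational background knowledge and statistical imputation sketched in (ii) can be illustrated on toy data. The indicator names, the toy values and the regression fallback noted in the comments are assumptions for illustration, not the pipeline's actual code.

```python
import numpy as np

# Toy city indicators with missing values (np.nan); the background
# equation is: density = population / area_km2.
population = np.array([500_000.0, 1_200_000.0, np.nan, 300_000.0])
area_km2   = np.array([200.0, 400.0, 150.0, np.nan])
density    = np.array([2_500.0, 3_000.0, 2_000.0, 1_500.0])

# Step 1: equational reasoning fills values that the equation
# determines exactly, before any statistical model runs.
population = np.where(np.isnan(population), density * area_km2, population)
area_km2   = np.where(np.isnan(area_km2), population / density, area_km2)

# Step 2 (not shown): values still missing after the equational pass
# would be imputed with a trained regression model, with the estimated
# per-indicator accuracy of that model reported alongside.
```

The point of ordering the steps this way is that exact derivations never degrade with model error; only genuinely underdetermined cells fall through to learned imputation.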
Seeking ‘the New Normal’? Troubled spaces of encountering visible differences in Warsaw
In times of globalisation and super-mobility, ideas of normality are in turmoil. In different societies in, across and beyond Europe, we face the challenge of undoing specific notions of normality and creating more inclusive societies with an open culture of learning to live with differences. The scope of the paper is to introduce some findings on encounters with difference and negotiations of social values in relation to a growing visibility of difference after 1989 in Poland, against the background of a critique of normality/normalisation and normalcy. On the basis of interviews conducted in Warsaw, we investigate how normality/normalisation discourses of visible homosexuality and physical disability are incorporated into individual self-reflections and justifications of prejudices (homophobia and disablism). More specifically, we argue that there are moments of ‘cultural transgressions’ present in everyday practices towards ‘visible’ sexual and (dis)ability difference.
An integrated approach to deliver OLAP for multidimensional Semantic Web Databases
The Semantic Web (SW) and web data have become increasingly important sources to support Business Intelligence (BI), but they are difficult to manage due to the exponential increase in their volumes, inconsistency in semantics and complexity in representations. On-Line Analytical Processing (OLAP) is an important tool for analysing large and complex BI data, but it lacks the capability to process dispersed SW data due to the nature of its design. A new concept with a richer vocabulary than the existing ones for OLAP is needed to model distributed multidimensional semantic web databases.
A new OLAP framework is developed, with multiple layers including additional vocabulary, extended OLAP operators, and usage of SPARQL to model heterogeneous semantic web data, unify multidimensional structures, and provide new enabling functions for interoperability. The framework is presented with examples to demonstrate its capability to unify existing vocabularies with additional vocabulary elements to handle both informational and topological data in Graph OLAP. The vocabularies used in this work are: the RDF Cube Vocabulary (QB) – proposed by the W3C to allow multi-dimensional, mostly statistical, data to be published in RDF; and the QB4OLAP – a QB extension introducing standard OLAP operators. The framework enables the composition of multiple databases (e.g. energy consumptions and property market values etc.) to generate observations through semantic pipe-like operators.
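As a minimal illustration of an OLAP-style roll-up over QB observations, the pure-Python sketch below aggregates a handful of toy triples. The predicate names and values are invented; a real deployment would hold the data in an RDF store and express the operation in SPARQL (the equivalent query is shown in a comment).

```python
# Toy triple store with two QB-style observations.
triples = [
    ("ex:obs1", "rdf:type", "qb:Observation"),
    ("ex:obs1", "ex:city", "ex:Vienna"),
    ("ex:obs1", "ex:energyUse", 42.0),
    ("ex:obs2", "rdf:type", "qb:Observation"),
    ("ex:obs2", "ex:city", "ex:London"),
    ("ex:obs2", "ex:energyUse", 55.5),
]

def objects(subject, predicate):
    """All objects of triples matching (subject, predicate, ?o)."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]

# Roll-up: SUM of ex:energyUse over every qb:Observation.
observations = [s for (s, p, o) in triples
                if p == "rdf:type" and o == "qb:Observation"]
total = sum(objects(s, "ex:energyUse")[0] for s in observations)

# Equivalent SPARQL aggregate:
#   SELECT (SUM(?v) AS ?total)
#   WHERE { ?o a qb:Observation ; ex:energyUse ?v . }
```

The framework's pipe-like operators can be thought of as composing such aggregations across multiple datasets (e.g. energy consumption joined with property values) before the final observation set is produced.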
This approach is demonstrated through Use Cases containing highly valuable data collected from a real-life environment. Its usability is proved through the development and usage of semantic pipe-like operators able to deliver OLAP specific functionalities.
To the best of my knowledge, there is no available data modelling approach handling both informational and topological Semantic Web data which is designed either to provide OLAP capabilities over Semantic Web databases or to provide a means to connect such databases for further OLAP analysis.
The thesis proposes that the presented work provides a wider understanding of: ways to access Semantic Web data; ways to build specialised Semantic Web databases; and how to enrich them with powerful capabilities for further Business Intelligence.
Discovering new kinds of patient safety incidents
Every year, large numbers of patients in National Health Service (NHS) care suffer because
of a patient safety incident. The National Patient Safety Agency (NPSA) collects large
amounts of data describing individual incidents. As well as being described by categorical
and numerical variables, each incident is described using free text.
The aim of the work was to find quite small groups of similar incidents, which were of
types that were previously unknown to the NPSA. A model of the text was produced, such
that the position of each incident reflected its meaning to the greatest extent possible.
The basic model was the vector space model. Dimensionality reduction was carried
out in two stages: unsupervised dimensionality reduction was carried out using principal
component analysis, and supervised dimensionality reduction using linear discriminant
analysis. It was then possible to look for groups of incidents that were more tightly packed
than would be expected given the overall distribution of the incidents.
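The unsupervised stage above can be sketched with a plain SVD-based principal component analysis on toy document vectors. The dimensions and random data are invented stand-ins for the incident term vectors; the supervised linear discriminant analysis stage that follows it is only indicated in a comment.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-in for term-frequency vectors: 100 incidents x 50 terms.
X = rng.normal(size=(100, 50))

# Unsupervised step: PCA via SVD of the mean-centred matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 10
X_pca = Xc @ Vt[:k].T   # incidents projected onto the top-k components

# A supervised LDA step would then project X_pca onto directions that
# best separate the known incident categories, before searching the
# resulting space for unexpectedly tight groups.
```

Running the search for dense groups in the reduced space, rather than the raw term space, is what makes the "more tightly packed than expected" test tractable.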
The process for assessing these groups had three stages. Firstly, a quantitative measure
was used, allowing a large number of parameter combinations to be examined. The groups
found for an ‘optimum’ parameter combination were then divided into categories using a
qualitative filtering method. Finally, clinical experts assessed the groups qualitatively.
The transition probabilities model was also examined: this model was based on the empirical probabilities with which two-word sequences were seen in the text.
An alternative method for dimensionality reduction was to use information about the subjective meaning of a small sample of incidents elicited from experts, producing a mapping between high- and low-dimensional models of the text.
The analysis also included the direct use of the categorical variables to model the incidents,
and empirical analysis of the behaviour of high-dimensional spaces.
Changes in protein levels as markers of severe disease: an investigation of severe malaria
Compounds directly involved in the pathogenesis of cerebral malaria (CM) remain unclear due to a lack of robust methods for identifying and quantifying proteins expressed in low abundance. New developments in proteomics have now made it possible to identify low-abundance proteins and have provided new tools for studying host-parasite interactions. With these new tools, it may be possible to identify proteomic signatures for patients with various complications associated with severe malaria.
A global proteomic strategy was used to identify differentially expressed proteins in archived plasma and CSF drawn from children diagnosed with cerebral malaria (CM) compared to those with acute bacterial meningitis (ABM) and slide-negative encephalopathy (EN). Samples were first separated using two-dimensional gel electrophoresis (2-DE) or two-dimensional liquid chromatography (2D-LC) and analysed using mass spectrometry. The data collected were analysed using various bioinformatics tools. Finally, a CM mass profile was created using MALDI-ToF mass spectrometry.
An average of about 150 spots per gel was resolved from CSF from CM and EN patients, and 80 spots from ABM patients. In the gels from the CM and EN groups, 45 human proteins were found, whilst 20 human proteins were unique to ABM compared to CM. For CSF, a total of 202 human proteins were identified using the 2D-LC system. Of these, 13 were unique to CM, 124 to ABM and 32 to EN. Six proteins were found in both CM and ABM, and 18 were found in both EN and ABM. Nine proteins were common to all 3 disease groups. A total of 66 P. falciparum proteins were identified, but of these 48 were hypothetical proteins. Of the non-hypothetical proteins, 2 were found in both CM and ABM and the rest were found only in ABM.
The results show that proteomics can be used to create protein profiles of different disease groups. The majority of the human proteins identified by 2-DE were high-abundance proteins found in CSF and plasma. The use of 2D-LC enabled the identification of more low-abundance proteins, but some of the P. falciparum proteins identified by 2-DE were not seen with the 2D-LC method. The majority of the human proteins found were acute phase response plasma proteins, including common circulating proteins such as albumin and apolipoproteins, blood transporters and binding proteins, protease inhibitors, enzymes, cytokines and hormones, and channel- and receptor-derived proteins. There appears to be a correlation between the number of proteins found in the CSF and the level of blood–brain barrier breakdown.
Enhancing Deep Learning Models through Tensorization: A Comprehensive Survey and Framework
The burgeoning growth of public domain data and the increasing complexity of
deep learning model architectures have underscored the need for more efficient
data representation and analysis techniques. This paper is motivated by the
work of Helal (2023) and aims to present a comprehensive overview of
tensorization. This transformative approach bridges the gap between the
inherently multidimensional nature of data and the simplified 2-dimensional
matrices commonly used in linear algebra-based machine learning algorithms.
This paper explores the steps involved in tensorization, multidimensional data
sources, various multiway analysis methods employed, and the benefits of these
approaches. A small example of Blind Source Separation (BSS) is presented
comparing 2-dimensional algorithms and a multiway algorithm in Python. Results
indicate that multiway analysis is more expressive. Contrary to the intuition of the curse of dimensionality, utilising multidimensional datasets in their native form and applying multiway analysis methods grounded in multilinear algebra reveal a profound capacity to capture intricate interrelationships among various dimensions while, surprisingly, reducing the number of model parameters and accelerating processing. A survey of the multiway analysis methods and their integration with various Deep Neural Network models is presented using case studies in different application domains.
Comment: 34 pages, 8 figures, 4 tables
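The parameter-count argument above can be made concrete with a tiny NumPy sketch, using invented dimensions: a 3-way tensor is matricised for a classical 2-D algorithm, while a rank-R CP factorisation keeps the multiway structure with far fewer numbers to store.

```python
import numpy as np

# A 3-way data tensor, e.g. sensors x time x trials (toy dimensions).
I, J, K = 8, 10, 6
T = np.arange(I * J * K, dtype=float).reshape(I, J, K)

# Mode-1 unfolding: the "flattened" 2-D matrix a classical linear-
# algebra algorithm would consume (one row per sensor).
T1 = T.reshape(I, J * K)

# Parameter counts: a rank-R CP model stores only three factor
# matrices of sizes I x R, J x R and K x R, versus the dense
# entries of the unfolded matrix.
R = 3
cp_params = R * (I + J + K)   # parameters in the multiway model
dense_params = I * J * K      # entries in the flattened view
```

Even at these toy sizes the CP representation is far smaller than the dense data, and the gap widens rapidly with the number of modes, which is the source of the parameter reduction and speed-up the survey reports.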