14,234 research outputs found

Quality Assessment of Linked Datasets using Probabilistic Approximation

Authors: A Hogan; AZ Broder; BH Bloom; C Guéret; JS Vitter; P Hitzler
Publication date: 17/03/2015

With the increasing application of Linked Open Data, assessing the quality of datasets by computing quality metrics becomes an issue of crucial importance. For large and evolving datasets, an exact, deterministic computation of the quality metrics is too time-consuming or expensive. We employ probabilistic techniques such as Reservoir Sampling, Bloom Filters and Clustering Coefficient estimation for implementing a broad set of data quality metrics in an approximate but sufficiently accurate way. Our implementation is integrated in the comprehensive data quality assessment framework Luzzu. We evaluated its performance and accuracy on Linked Open Datasets of broad relevance.

Comment: 15 pages, 2 figures, to appear in ESWC 2015 proceedings
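As an illustration of the first technique the abstract names, the following is a minimal Python sketch of reservoir sampling (Algorithm R) used to approximate a metric over a stream too large to hold in memory. It is not taken from the paper or from Luzzu; the function names `reservoir_sample` and `estimate_fraction`, and the idea of a per-item `predicate` standing in for a quality check, are assumptions made for this example.

```python
import random
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Keep a uniform random sample of at most k items from a stream
    whose length is not known in advance (Algorithm R)."""
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item survives with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


def estimate_fraction(
    stream: Iterable[T],
    k: int,
    predicate: Callable[[T], bool],
    rng: random.Random,
) -> float:
    """Estimate the share of stream items satisfying `predicate`
    (a hypothetical per-item quality check) from a sample of size k."""
    sample = reservoir_sample(stream, k, rng)
    if not sample:
        return 0.0
    return sum(1 for item in sample if predicate(item)) / len(sample)
```

The sample is built in a single pass with memory proportional to `k`, which is the property that makes this kind of estimate attractive for large or evolving datasets; the accuracy of the estimate depends on the chosen `k`.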

arXiv.org e-Print Archive

Crossref

Fraunhofer-ePrints

A Probabilistic Embedding Clustering Method for Urban Structure Detection

Authors: H. Li; L. Gao; L. Zhao; M. Deng; X. Lin; X. Lin; Y. Zhang
Publication date: 12/07/2017

Urban structure detection is a basic task in urban geography. Clustering is a core technique for detecting patterns of urban spatial structure, urban functional regions, and so on. In the big data era, diverse urban sensing datasets, which record information such as human behaviour and social activity, suffer from high dimensionality and high noise. Unfortunately, state-of-the-art clustering methods do not handle high dimensionality and high noise concurrently. In this paper, a probabilistic embedding clustering method is proposed. First, we introduce a Probabilistic Embedding Model (PEM) that learns latent features from high-dimensional urban sensing data via a probabilistic model. The latent features capture the essential patterns hidden in the high-dimensional data, and the probabilistic model reduces the uncertainty caused by high noise. Second, by tuning its parameters, our model can discover two kinds of urban structure, homophily and structural equivalence, that is, communities with intensive interaction or communities playing the same role in the urban structure. Experiments on real-world data from Shanghai (China) showed that our method can discover both kinds of structure.

Comment: 6 pages, 7 figures, ICSDM201

arXiv.org e-Print Archive

Directory of Open Access Journals

Complex Networks

Author: Evans T. S.
Publication venue: 'Informa UK Limited'
Publication date: 06/05/2004

An outline of recent work on complex networks is given from the point of view of a physicist. Motivation, achievements and goals are discussed, along with some typical applications from a wide range of academic fields. An introduction to the relevant literature and useful resources is also given.

Comment: Review for Contemporary Physics, 31 pages

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository

Supervised Typing of Big Graphs using Semantic Embeddings

Authors: Ma Yongtao; Pennington Jeffrey; Rosati Jessica; Sahlgren Magnus; Turian Joseph; van der Maaten Laurens
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 22/03/2017

We propose a supervised algorithm for generating type embeddings in the same semantic vector space as a given set of entity embeddings. The algorithm is agnostic to the derivation of the underlying entity embeddings. It does not require any manual feature engineering, generalizes well to hundreds of types, and, by virtue of incremental execution, achieves near-linear scaling on Big Graphs containing many millions of triples and instances. We demonstrate the utility of the embeddings on a type recommendation task, outperforming a non-parametric feature-agnostic baseline while achieving a 15x speedup and near-constant memory usage on a full partition of DBpedia. Using state-of-the-art visualization, we illustrate the agreement of our extensionally derived DBpedia type embeddings with the manually curated domain ontology. Finally, we use the embeddings to probabilistically cluster about 4 million DBpedia instances into 415 types in the DBpedia ontology.

Comment: 6 pages, to be published in Semantic Big Data Workshop at ACM SIGMOD 2017; extended version in preparation for Open Journal of Semantic Web (OJSW)

arXiv.org e-Print Archive

Crossref