
60,504 research outputs found

Multi-node approach for map data processing

Author: M Haklay
P Neis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

OpenStreetMap (OSM) is a popular collaborative open-source project that offers a free, editable map of the whole world. However, these data often need further purpose-specific processing to become truly valuable. The main motivation of this paper is therefore to propose a design for big data processing and data mining that yields statistics, with a focus on detailed traffic data, in order to create graphs representing a road network. To ensure that the routing algorithms of our High-Performance Computing (HPC) platform work correctly, the OSM data must be prepared so that they are usable for this graph, and the persistent data must be stored both in a spatial database and in HDF5 format.
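The abstract describes turning OSM data into a road-network graph for routing. A minimal sketch of that step follows, assuming nodes and ways have already been parsed from an OSM extract into plain Python dicts. The input shapes and the function names `haversine_m`, `build_road_graph` and `dijkstra` are hypothetical, and this is not the authors' HPC pipeline. It keeps only ways tagged `highway`, weights each edge by great-circle distance, honours only `oneway=yes`, and does not cover storage in a spatial database or HDF5.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Hypothetical pre-parsed OSM shapes (not a real OSM library API):
#   nodes: node id -> (lat, lon) in degrees
#   ways:  list of (tags, ordered node ids)
Coord = Tuple[float, float]
Way = Tuple[Dict[str, str], List[int]]
Graph = Dict[int, List[Tuple[int, float]]]

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a: Coord, b: Coord) -> float:
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def build_road_graph(nodes: Dict[int, Coord], ways: List[Way]) -> Graph:
    """Adjacency list with one edge per consecutive node pair of each road way."""
    graph: Graph = {}
    for tags, refs in ways:
        if "highway" not in tags:
            continue  # buildings, land use, etc. are not part of the road network
        oneway = tags.get("oneway") == "yes"  # simplification: ignores oneway=-1
        for u, v in zip(refs, refs[1:]):
            w = haversine_m(nodes[u], nodes[v])
            graph.setdefault(u, []).append((v, w))
            graph.setdefault(v, [])
            if not oneway:
                graph[v].append((u, w))
    return graph


def dijkstra(graph: Graph, source: int) -> Dict[int, float]:
    """Shortest road distance in metres from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


nodes = {1: (0.0, 0.0), 2: (0.0, 0.001), 3: (0.0, 0.002)}
ways = [({"highway": "residential"}, [1, 2, 3]), ({"building": "yes"}, [1, 3])]
print(round(dijkstra(build_road_graph(nodes, ways), 1)[3]))  # → 222
```

The building outline between nodes 1 and 3 is ignored, so the route follows the two road segments of roughly 111 m each.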

Crossref

DSpace at VSB Technical University of Ostrava

CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks

Author: Blase Jennifer
Chu Xu
Li Peng
Rao Xi
Zhang Ce
Zhang Yue
Publication date: 01/01/2020

Data quality affects the performance of machine learning (ML) models, and data scientists spend a considerable amount of time on data cleaning before model training. However, to date there has been no rigorous study of how exactly cleaning affects ML: the ML community usually focuses on developing ML algorithms that are robust to particular types of noise with certain distributions, while the database (DB) community has mostly studied data cleaning in isolation, without considering how the data is consumed by downstream ML analytics. We propose CleanML, a study that systematically investigates the impact of data cleaning on ML classification tasks. The open-source and extensible CleanML study currently includes 14 real-world datasets with real errors, five common error types, seven different ML models, and multiple cleaning algorithms for each error type (including both algorithms commonly used in practice and state-of-the-art solutions from the academic literature). We control the randomness in ML experiments using statistical hypothesis testing, and we also control the false discovery rate in our experiments using the Benjamini-Yekutieli (BY) procedure. We analyze the results systematically and derive many nontrivial observations. We also put forward several directions for future research.
Comment: published in ICDE 202
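The abstract relies on the Benjamini-Yekutieli procedure to control the false discovery rate across many hypothesis tests. A minimal sketch of the standard BY step-up rule follows: sort the m p-values, find the largest rank k with p_(k) ≤ k·q / (m·c(m)), where c(m) = 1 + 1/2 + … + 1/m, and reject the k smallest. The function name `benjamini_yekutieli` is hypothetical. This is a generic implementation, not CleanML's own code.

```python
from typing import List, Sequence


def benjamini_yekutieli(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Reject/keep decision per hypothesis, controlling the FDR at level q
    under arbitrary dependence (Benjamini-Yekutieli step-up procedure)."""
    m = len(p_values)
    if m == 0:
        return []
    # Harmonic correction factor c(m) = sum_{i=1}^{m} 1/i.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= k * q / (m * c(m)).
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / (m * c_m):
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_yekutieli(p))  # → [True, False, False, False, False, False, False, False]
```

Dividing by c(m) is what makes BY more conservative than the Benjamini-Hochberg procedure. On the p-values above, Benjamini-Hochberg at the same q would also reject the second hypothesis.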

arXiv.org e-Print Archive

Repository for Publications and Research Data

Gravity optimised particle filter for hand tracking

Author: Arulampalam
Bradley
Bradski
Bray
Chang
Cohen
Deutscher
Douglas
Erol
Francke
Gordon
Ho
Homma K
Isard
Kitagawa
Lee
Malik Morshidi
Musso
Pearson
Pitt
Shan
Sklansky
Stefanov
Tardi Tjahjadi
Wang
Wang
Wu
Yoruk
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014

This paper presents a gravity optimised particle filter (GOPF) in which the magnitude of the gravitational force of every particle is proportional to its weight. GOPF attracts nearby particles and replicates new particles as though moving them towards the peak of the likelihood distribution, which improves sampling efficiency. GOPF is incorporated into a technique for hand feature tracking. A fast approach to hand feature detection and labelling using convexity defects is also presented. Experimental results show that GOPF outperforms the standard particle filter and its variants, as well as the state-of-the-art CamShift-guided particle filter, while using a significantly reduced number of particles.
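The abstract does not give GOPF's update equations. The 1-D sketch below illustrates only the stated idea that each particle exerts an attraction proportional to its weight. The softened inverse-square law and the `gain`, `softening` and `max_step` parameters are assumptions, and `gravity_move` is a hypothetical name, not the authors' implementation. The replication of new particles is not modelled.

```python
from typing import List, Sequence


def gravity_move(xs: Sequence[float], ws: Sequence[float],
                 gain: float = 0.1, softening: float = 0.1,
                 max_step: float = 0.5) -> List[float]:
    """Hypothetical 1-D 'gravity' move: particle j pulls particle i with a
    force proportional to its weight w_j (softened inverse-square law, assumed)."""
    moved = []
    for i, xi in enumerate(xs):
        force = 0.0
        for j, xj in enumerate(xs):
            if i == j:
                continue
            d = xj - xi
            # Signed pull of about w_j / d**2 when |d| >> softening; finite at d = 0.
            force += ws[j] * d / (d * d + softening ** 2) ** 1.5
        # Clamp the displacement so a very close, heavy neighbour cannot cause a huge jump.
        step = max(-max_step, min(max_step, gain * force))
        moved.append(xi + step)
    return moved


print(gravity_move([0.0, 1.0], [1.0, 0.0]))
```

Here a heavy particle at 0 pulls a weightless particle at 1 towards it, while the heavy particle stays put because its only neighbour has zero weight. The all-pairs loop is O(N²), which is one reason a filter of this kind benefits from needing fewer particles.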

CiteSeerX

Crossref

The International Islamic University Malaysia Repository

Warwick Research Archives Portal Repository

Shinren : Non-monotonic trust management for distributed systems

Author: Dong Changyu
Dulay Naranker
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

The open and dynamic nature of modern distributed systems and pervasive environments presents significant challenges to security management. One solution may be trust management, which uses the notion of trust to specify and interpret security policies and to make decisions on security-related actions. Most trust management systems assume monotonicity, whereby additional information can only increase trust. The monotonic assumption oversimplifies the real world by ignoring negative information; consequently, it cannot handle many real-world scenarios. In this paper we present Shinren, a novel non-monotonic trust management system based on bilattice theory and the anyworld assumption. Shinren takes negative information into account and supports reasoning with incomplete information, uncertainty and inconsistency. Information from multiple sources, such as credentials, recommendations, reputation and local knowledge, can be combined to establish trust. Shinren also supports prioritisation, which is important for decision making and for resolving modality conflicts caused by non-monotonicity.
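The abstract does not specify Shinren's bilattice. As a generic illustration of how a bilattice keeps negative and conflicting information apart, the sketch below implements Belnap's four-valued bilattice FOUR and encodes each value as a pair (evidence for, evidence against). The class `Belief` and its method names are hypothetical. This is not Shinren's policy language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Belief:
    """A value of Belnap's bilattice FOUR: (evidence for, evidence against)."""
    pro: bool
    con: bool

    def combine(self, other: "Belief") -> "Belief":
        # Knowledge join: pool everything both sources say (may yield BOTH).
        return Belief(self.pro or other.pro, self.con or other.con)

    def consensus(self, other: "Belief") -> "Belief":
        # Knowledge meet: keep only what both sources agree on.
        return Belief(self.pro and other.pro, self.con and other.con)

    def and_(self, other: "Belief") -> "Belief":
        # Truth meet: supported only if both are; opposed if either is.
        return Belief(self.pro and other.pro, self.con or other.con)

    def or_(self, other: "Belief") -> "Belief":
        # Truth join: supported if either is; opposed only if both are.
        return Belief(self.pro or other.pro, self.con and other.con)

    def negate(self) -> "Belief":
        # Negation swaps evidence for and against.
        return Belief(self.con, self.pro)


UNKNOWN = Belief(False, False)
TRUE = Belief(True, False)
FALSE = Belief(False, True)
BOTH = Belief(True, True)  # inconsistent: some sources for, some against

credential, bad_reputation = TRUE, FALSE
print(credential.combine(bad_reputation) == BOTH)  # → True
```

Under a monotonic scheme the negative reputation would simply be discarded. Here, combining it with the credential yields the explicit inconsistent value `BOTH`, which a prioritisation rule could then resolve.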

University of Strathclyde Institutional Repository

Spiral - Imperial College Digital Repository

Making large information sources better accessible using fuzzy set theory

Author: B.P. Buckles
G. De Tré
H. Prade
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Crossref

Ghent University Academic Bibliography

Accuracy of Author Names in Bibliographic Data Sources: An Italian Case Study

Author: Demetrescu Camil
Ribichini Andrea
Schaerf Marco
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

We investigate how accurately author names are reported in bibliographic records extracted from four prominent sources: WoS, Scopus, PubMed, and CrossRef. As a case study, we take 44,549 publications stored in the internal database of Sapienza University of Rome, one of the largest universities in Europe. While our results indicate generally good accuracy for all the bibliographic data sources considered, we highlight a number of issues that undermine accuracy for certain classes of author names, including compound names and names with diacritics, which are common features of Italian and other Western languages.
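Compound names and diacritics are the two problem classes the abstract highlights. A minimal sketch of a lenient comparison key follows. It strips diacritics with Unicode NFKD decomposition, case-folds, and drops spaces, hyphens, apostrophes and commas. The helpers `normalize_author_name` and `same_author` are hypothetical and are not the matching method used in the paper.

```python
import unicodedata


def normalize_author_name(name: str) -> str:
    """Lenient comparison key: no diacritics, no case, no separators."""
    # Decompose accented characters, e.g. "ò" -> "o" + combining grave accent.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks (the diacritics themselves).
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold and keep only letters and digits, so "De Santis",
    # "De-Santis" and "DeSantis" all collapse to the same key.
    return "".join(ch for ch in no_marks.casefold() if ch.isalnum())


def same_author(a: str, b: str) -> bool:
    return normalize_author_name(a) == normalize_author_name(b)


print(normalize_author_name("Niccolò De Santis"))  # → niccolodesantis
print(same_author("D'Amico, Laura", "Damico, Laura"))  # → True
```

The key is deliberately lossy. It will also merge genuinely different names that differ only in accents or spacing, and it does not handle swapped given and family names, so matches found this way still need review.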

Archivio della ricerca- Università di Roma La Sapienza