360 research outputs found

An unsupervised data-driven method to discover equivalent relations in large linked datasets

Author: Caliński
Cruz
Fu
Hripcsak
Hu
Jean-Mary
Jenks
Lambrix
Lehmann
Li
Limaye
Pavel
Schopman
Seddiqui
Suchanek
Zhang
Publication venue: 'IOS Press'
Publication date: 06/12/2016

This article addresses a number of limitations of state-of-the-art methods of Ontology Alignment: 1) they primarily address concepts and entities while relations are less well-studied; 2) many build on the assumption of the ‘well-formedness’ of ontologies, which is not necessarily true in the domain of Linked Open Data; 3) few have looked at schema heterogeneity from a single source, which is also a common issue, particularly in very large Linked Datasets created automatically from heterogeneous resources or integrated from multiple datasets. We propose a domain- and language-independent and completely unsupervised method to align equivalent relations across schemata based on their shared instances. We introduce a novel similarity measure able to cope with unbalanced population of schema elements, an unsupervised technique to automatically decide the similarity threshold for asserting equivalence between a pair of relations, and an unsupervised clustering process to discover groups of equivalent relations across different schemata. Although the method is designed for aligning relations within a single dataset, it can also be adapted for cross-dataset alignment where sameAs links between datasets have been established. Using three gold standards created based on DBpedia, we obtain encouraging results from a thorough evaluation involving four baseline similarity measures and over 15 comparative models based on variants of the proposed method. The proposed method makes significant improvement over baseline models in terms of F1 measure (mostly between 7% and 40%); it always scores the highest precision and is also among the top performers in terms of recall. We also make public the datasets used in this work, which we believe make up the largest collection of gold standards for evaluating relation alignment in the LOD context.
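To make "similarity based on shared instances" concrete, here is a minimal sketch of a plain Jaccard overlap between two relations, each represented as a set of (subject, object) pairs. This is not the novel measure the abstract refers to, whose definition is not given here; it is only a generic instance-overlap baseline. The relation names and the toy pairs are hypothetical.

```python
def jaccard(pairs_a, pairs_b):
    """Jaccard overlap of two relations given as sets of (subject, object) pairs."""
    a, b = set(pairs_a), set(pairs_b)
    if not a and not b:
        return 0.0  # convention chosen here: two empty relations share nothing
    return len(a & b) / len(a | b)


# Hypothetical toy data: two differently named relations that mostly
# connect the same subject-object pairs.
birth_place = {("Ada", "London"), ("Alan", "London"), ("Marie", "Warsaw")}
place_of_birth = {("Ada", "London"), ("Alan", "London")}

score = jaccard(birth_place, place_of_birth)  # 2 shared pairs out of 3 distinct pairs
```

A plain Jaccard score falls as soon as one relation is populated far more heavily than the other, even when the smaller one is fully contained in the larger. That is the kind of unbalanced population the abstract says the proposed measure is designed to cope with.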

Publikationer från Linköpings universitet

Crossref

Nottingham Trent Institutional Repository (IRep)

MAnnheim DOCument Server

Digitala Vetenskapliga Arkivet - Academic Archive On-line

White Rose Research Online

Parallel Mapper

Author: A Collins
D Günther
E Carlsson
G Carlsson
J-D Boissonnat
JR Munkres
LW Beineke
M Nicolau
N Otter
N Shivashankar
PY Lum
R Ghrist
RW Sumner
T Caliński
U Bauer
V Pascucci
V Robins
V Snášel
Y Hiraoka
Publication date: 11/05/2020

The Mapper construction has emerged in the last decade as a powerful and effective topological data analysis tool that approximates and generalizes other topological summaries, such as the Reeb graph, the contour tree, and split and join trees. In this paper, we study the parallel analysis of the construction of Mapper. We give a provably correct parallel algorithm to execute Mapper on multiple processors and discuss performance results that compare our approach to a reference sequential Mapper implementation. We report performance experiments that demonstrate the efficiency of our method.

arXiv.org e-Print Archive

Crossref

Comparing ultra-high spatial resolution remote-sensing methods in mapping peatland vegetation

Author: Arroyo‐Mora J. P.
Bray J. R.
Caliński T.
Hill M. O.
Liaw A.
Lovitt J.
Ridgeway G.
Rouse J. W. J.
Publication date: 01/09/2019

Peer reviewed

Crossref

Helsingin yliopiston digitaalinen arkisto

Una taxonomía multidimensional de Estados desarrollistas

Author: Alkire
Barro
Berg
Besley
Caliński
Dolnicar
Everitt
Fritz
Gerring
Hanushek
Hayashi
Hsu
Kwon
Leftwich
Mkandawire
Ranis
Shadlen
Streeten
Swilling
Thomas
Vu
Wade
Ward Jr
Wong
Yeung
Publication venue: 'Universidad Nacional Autonoma de Mexico'
Publication date: 01/01/2020

ABSTRACT. This paper proposes a new approach to the classification of Developmental States (DS) based on their public efforts to foster human development. We conceptualize DS within a multidimensional framework that includes three main dimensions (economic, social and democratic), and run a hierarchical cluster analysis for 112 countries in order to build a multidimensional taxonomy of DS. We propose a country classification and characterize three country groups with different developmental public efforts: i) the human development States; ii) the unbalanced developmental States; and iii) the non-developmental States. Our multidimensional taxonomy offers a more complex understanding of the variety of public efforts devoted to promoting human development, thus overcoming the restricted, purely economic conception of DS, which is mainly focused on the East Asian region. Key Words: developmental states; multidimensional taxonomy; social equality and democratic participation; welfare states; economic growth.

Crossref

UCrea

The 2D shape structure dataset: A user annotated open access database

Author: Axel Carlier
Caliński
Chen
Cohen
Dawid
De Winter
Firestone
Geraldine Morin
Kathryn Leonard
Kimia
Lien
Lu
Luo
Misha Collins
Ng
Oleson
Shamir
Siddiqi
Singh
Snodgrass
Stefanie Hahmann
Walker
Wang
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016

In this paper we present the 2D Shape Structure database, a public, user-generated dataset of 2D shape decompositions into a hierarchy of shape parts with geometric relationships retained. It is the outcome of a large-scale user study obtained by crowdsourcing, involving over 1200 shapes in 70 shape classes, and 2861 participants. A total of 41953 annotations has been collected, with at least 24 annotations per shape. For each shape, user decompositions into main shape, one or more levels of parts, and a level of details are available. This database reinforces a philosophy that understanding shape structure as a whole, rather than in the separated categories of parts decomposition, parts hierarchy, and analysis of relationships between parts, is crucial for full shape understanding. We provide initial statistical explorations of the data to determine representative ("mean") shape annotations and to determine the number of modes in the annotations. The primary goal of the paper is to make this rich and complex database openly available (through the website http://2dshapesstructure.github.io/index.html), providing the shape community with a ground truth of human perception of holistic shape structure.

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Open Archive Toulouse Archive Ouverte

Recovering the number of clusters in data sets with noise features using feature rescaling factors

Author: Arbelaitz
Ball
Bezdek
Caliński
Chan
Chiang
Christian Hennig
David
de Amorim
Dudoit
Dunn
Gasch
Halkidi
Hartigan
Hennig
Huang
Hubert
Jain
Kaufman
MacQueen
Milligan
Mirkin
Pollard
Renato Cordeiro de Amorim
Rousseeuw
Steinley
Sturn
Vedaldi
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015

In this paper we introduce three methods for re-scaling data sets, aiming to increase the likelihood that clustering validity indexes return the true number of spherical Gaussian clusters in data sets with additional noise features. Our methods obtain feature re-scaling factors taking into account the structure of a given data set and the intuitive idea that different features may have different degrees of relevance at different clusters. We experiment with the Silhouette (using squared Euclidean, Manhattan, and the pth power of the Minkowski distance), Dunn’s, Calinski–Harabasz and Hartigan indexes on data sets with spherical Gaussian clusters with and without noise features. We conclude that our methods indeed increase the chances of estimating the true number of clusters in a data set. Peer reviewed.
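For readers unfamiliar with the validity indexes named above, the following is a minimal pure-Python sketch of the standard Calinski–Harabasz index, the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom. It does not implement the paper's re-scaling factors. The function name and the toy points are assumptions made for illustration.

```python
def calinski_harabasz(points, labels):
    """Standard Calinski-Harabasz index: (B / (k - 1)) / (W / (n - k)).

    B is the between-cluster and W the within-cluster sum of squared
    Euclidean distances; higher values suggest a better partition.
    """
    n, dim = len(points), len(points[0])
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("index is defined only for 2 <= k < n clusters")

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = mean(points)
    between = within = 0.0
    for members in clusters.values():
        centroid = mean(members)
        between += len(members) * sqdist(centroid, overall)
        within += sum(sqdist(p, centroid) for p in members)
    # Note: a partition with zero within-cluster dispersion divides by zero here.
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: two tight groups far apart along the first axis.
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = calinski_harabasz(toy_points, [0, 0, 1, 1])  # groups match the geometry
bad = calinski_harabasz(toy_points, [0, 1, 0, 1])   # groups cut across it
```

One common way to estimate the number of clusters is to compute this value for several candidate partitions and keep the one with the highest score.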

arXiv.org e-Print Archive

University of Essex Research Repository

Crossref

UCL Discovery

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

University of Hertfordshire Research Archive

MaxMin Linear Initialization for Fuzzy C-Means

Author: AM Bensaid
D Steinley
DJ Hand
EH Ruspini
GN Lance
HS Park
J. C. Dunn
JC Bezdek
ME Celebi
MJ Norušis
NR Pal
S Wold
T Caliński
T Su
TF Gonzalez
V Faber
W Wang
XL Xie
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 14/07/2018

Clustering is an extensive research area in data science. The aim of clustering is to discover groups and to identify interesting patterns in datasets. Crisp (hard) clustering considers that each data point belongs to one and only one cluster. However, it is inadequate when some data points may belong to several clusters, as is the case in text categorization. Thus, we need more flexible clustering. Fuzzy clustering methods, where each data point can belong to several clusters, are an interesting alternative. Yet, seeding iterative fuzzy algorithms to achieve high-quality clustering is an issue. In this paper, we propose a new linear and efficient initialization algorithm, MaxMin Linear, to deal with this problem. Then, we validate our theoretical results through extensive experiments on a variety of numerical real-world and artificial datasets. We also test several validity indices, including a new validity index that we propose, Transformed Standardized Fuzzy Difference (TSFD).
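The abstract does not spell out the MaxMin Linear algorithm, so the sketch below shows only the classic max-min (farthest-first) seeding idea that the name suggests. It is an illustration of that general technique, not the authors' method. The function name, the `first` parameter, and the sample points are assumptions.

```python
import math


def maxmin_seeds(points, k, first=0):
    """Pick k seeds by farthest-first traversal (classic max-min seeding).

    Starting from points[first], repeatedly add the point whose distance
    to its nearest already-chosen seed is largest.
    """
    seeds = [points[first]]
    # nearest[i] is the distance from points[i] to its closest chosen seed
    nearest = [math.dist(p, seeds[0]) for p in points]
    while len(seeds) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        seeds.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx]))
                   for d, p in zip(nearest, points)]
    return seeds


# Hypothetical one-dimensional layout embedded in 2D.
sample = [(0, 0), (1, 0), (10, 0), (5, 0)]
chosen = maxmin_seeds(sample, k=3)
```

The resulting seeds could serve as initial cluster centres for an iterative algorithm such as fuzzy c-means. Each round scans every point once, so the traversal costs roughly n times k distance evaluations.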

arXiv.org e-Print Archive

Chronic pelvic pain in women of reproductive and post-reproductive age : a population-based study

Author: Aaron
Boersma
Breivik
Brown
Caliński
Chalder
Chung
Cross
Daniels
Docking
Enright
Galea
Garcia-Perez
Grace
Gupta
Halder
Hassan
Hays
Hill
Jenkins
Kroenke
Latthe
Leino
Loving
Mathias
McBeth
McGowan
Mercado
Othmer
Pitts
Reiter
Shaw
Silva
Sinaii
Viniol
Warren
Whitehead
Zondervan
Publication venue: 'Wiley'
Publication date: 09/02/2017

Background: Epidemiological studies on chronic pelvic pain (CPP) have focused on women of reproductive age. We aimed to determine the prevalence of CPP in adult women and the differences in associated factors among women of reproductive age and older women. In addition, we aimed to determine whether distinct subgroups existed among CPP cases. Methods: A cross-sectional postal survey was conducted among 5300 randomly selected women aged ≥25 years resident in the Grampian region, UK. Multivariable logistic regression was used to determine pregnancy-related and psychosocial factors associated with CPP. To identify subgroups of CPP cases, we performed cluster analysis using variables of pain severity, psychosocial factors and pain coping strategies. Results: Of 2088 participants, 309 (14.8%) reported CPP. CPP was significantly associated with being of reproductive age (odds ratio (OR) 2.43, 95% CI 1.69–3.48), multiple non-pain somatic symptoms (OR 3.58, 95% CI 2.23–5.75), having fatigue (OR mild 1.74, 95% CI 1.24–2.44; moderate/severe 1.82, 95% CI 1.25–2.63) and having depression (OR 1.61, 95% CI 1.09–2.38). CPP was less associated with multiple non-pain somatic symptoms in women of reproductive age compared to older women (interaction OR 0.51, 95% CI 0.28–0.92). We identified two clusters of CPP cases: those having little/no psychosocial distress and those having high psychosocial distress. Conclusion: CPP is common in both age groups, though women of reproductive age are more likely to report it. Heightened somatic awareness may be more strongly associated with CPP in older women. There are distinct groups of CPP cases characterized by the absence/presence of psychosocial distress.

Aberdeen University Research

Crossref

Online Research @ Cardiff

Warwick Research Archives Portal Repository

The University of Manchester - Institutional Repository

Contextual and Behavioral Customer Journey Discovery Using a Genetic Approach

Author: A Gabadinho
B Vázquez-Barreiros
G Bernard
İ Gürvardar
KN Lemon
S Peltola
T Caliński
VI Levenshtein
WMP Aalst van der
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

With the advent of new technologies and the increase in customers’ expectations, services are becoming more complex. This complexity calls for new methods to understand, analyze, and improve service delivery. Summarizing customers’ experience using representative journeys that are displayed on a Customer Journey Map (CJM) is one of these techniques. We propose a genetic algorithm that automatically builds a CJM from raw customer experience recorded in a database. Mining representative journeys can be seen as a clustering task where both the sequence of activities and some contextual data (e.g., demographics) are considered when measuring the similarity between journeys. We show that our genetic approach outperforms traditional ways of handling this clustering task. Moreover, we apply our algorithm on a real dataset to highlight the benefit of using a genetic approach.
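As a hedged illustration of measuring similarity between the activity sequences of two journeys, the sketch below computes the Levenshtein edit distance over lists of activity labels. Whether the paper uses exactly this distance, and how it folds in contextual data, is not stated in the abstract; the function name and the activity labels are hypothetical.

```python
def edit_distance(seq_a, seq_b):
    """Levenshtein distance between two sequences (strings or lists).

    Counts the minimum number of insertions, deletions and substitutions
    needed to turn seq_a into seq_b, using two rolling rows.
    """
    prev = list(range(len(seq_b) + 1))
    for i, a in enumerate(seq_a, start=1):
        curr = [i]
        for j, b in enumerate(seq_b, start=1):
            cost = 0 if a == b else 1
            curr.append(min(prev[j] + 1,          # delete a
                            curr[j - 1] + 1,      # insert b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical journeys: the second customer skips one activity.
journey_1 = ["browse", "add_to_cart", "pay"]
journey_2 = ["browse", "pay"]
gap = edit_distance(journey_1, journey_2)
```

A smaller distance means two journeys follow more similar activity sequences. A clustering procedure could combine such a sequence distance with a separate distance over contextual attributes.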

Crossref

Serveur académique lausannois