106 research outputs found

Handling metadata in the scope of coreference detection in data collections

Author: Szymczak Marcin
Publication venue: Polish Academy of Sciences. Systems Research Institute ; Ghent University. Faculty of Engineering and Architecture
Publication date: 01/01/2015

Coreference detection in XML metadata

Author: De Tré Guy
Szymczak Marcin
Zadrozny Slawomir
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013

Preserving data quality is an important issue in data collection management. A crucial part of this is the detection of duplicate objects (called coreferent objects), which describe the same entity but in different ways. In this paper we present a method for detecting coreferent objects in metadata, in particular in XML schemas. Our approach consists in comparing the paths from a root element to a given element in the schema. Each path precisely defines the context and location of a specific element in the schema. Path matching is based on the comparison of the different steps of which paths are composed. The uncertainty about the matching of steps is expressed with possibilistic truth values and aggregated using the Sugeno integral. The discovered coreference of paths can help determine the coreference of different XML schemas.

Ghent University Academic Bibliography

Programming language semantics as a foundation for Bayesian inference

Author: Szymczak Marcin
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 08/07/2018

Bayesian modelling, in which our prior belief about the distribution on model parameters is updated by observed data, is a popular approach to statistical data analysis. However, writing specific inference algorithms for Bayesian models by hand is time-consuming and requires significant machine learning expertise. Probabilistic programming promises to make Bayesian modelling easier and more accessible by letting the user express a generative model as a short computer program (with random variables), leaving inference to the generic algorithm provided by the compiler of the given language. However, it is not easy to design a probabilistic programming language correctly and define the meaning of programs expressible in it. Moreover, the inference algorithms used by probabilistic programming systems usually lack formal correctness proofs and bugs have been found in some of them, which limits the confidence one can have in the results they return. In this work, we apply ideas from the areas of programming language theory and statistics to show that probabilistic programming can be a reliable tool for Bayesian inference. The first part of this dissertation concerns the design, semantics and type system of a new, substantially enhanced version of the Tabular language. Tabular is a schema-based probabilistic language, which means that instead of writing a full program, the user only has to annotate the columns of a schema with expressions generating corresponding values. By adopting this paradigm, Tabular aims to be user-friendly, but this unusual design also makes it harder to define the syntax and semantics correctly and reason about the language. We define the syntax of a version of Tabular extended with user-defined functions and pseudo-deterministic queries, design a dependent type system for this language and endow it with a precise semantics. 
We also extend Tabular with a concise formula notation for hierarchical linear regressions, define the type system of this extended language and show how to reduce it to pure Tabular. In the second part of this dissertation, we present the first correctness proof for a Metropolis-Hastings sampling algorithm for a higher-order probabilistic language. We define a measure-theoretic semantics of the language by means of an operationally-defined density function on program traces (sequences of random variables) and a map from traces to program outputs. We then show that the distribution of samples returned by our algorithm (a variant of “Trace MCMC” used by the Church language) matches the program semantics in the limit.

Edinburgh Research Archive

Implementing the Duty Trip Support Application

Author: Grzegorz Frąckowiak
Marcin Paprzycki
Maria Ganzha
Michał Szymczak
Myon Woong Park
Sang Keun Rhee

We are in the process of developing an agent- and ontology-based Duty Trip Support application. The goal of this paper is to consider issues arising when implementing such a system. In addition to the description of our current implementation, which is also critically analyzed, other possible approaches are considered as well.

Keywords: software agents, agent systems, ontologies, transport objects, agent-non-agent integration.

Research Papers in Economics

System SINUS – otwarte narzędzie do budowy bibliograficznych baz danych

Author: Błaszczyńska Marzena
Kozak Michał
Mazurek Cezary
Szymczak Marcin
Werla Marcin
Publication venue: Stowarzyszenie EBIB
Publication date: 26/06/2017

The aim of this paper is to present a new open tool for building bibliographic databases. The SINUS system, developed by Poznań Supercomputing and Networking Center, was initially created to meet the needs related to managing the scientific publications of Poznań University of Technology staff. In the paper we present the basic functional assumptions of the system, its current functionality and future development directions.

E-LIS

Wydawnictwa EBIB

Recalibrating classifiers for interpretable abusive content detection

Author: Hale Scott
Kammar Ohad
Margetts Helen
Melham Tom
Staton Sam
Szymczak Marcin
Vidgen Bertie
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 09/10/2020

Dataset and code for the paper, 'Recalibrating classifiers for interpretable abusive content detection' by Vidgen et al. (2020) -- to appear at the NLP + CSS workshop at EMNLP 2020. We provide: (1) 1,000 annotated tweets, sampled using the Davidson classifier with 20 0.05 increments (50 from each) from a dataset of tweets directed against MPs in the UK 2017 General Election; (2) 1,000 annotated tweets, sampled using the Perspective classifier with 20 0.05 increments (50 from each) from a dataset of tweets directed against MPs in the UK 2017 General Election; (3) code for recalibration in R and Stan; (4) annotation guidelines for both datasets.

Paper abstract: We investigate the use of machine learning classifiers for detecting online abuse in empirical research. We show that uncalibrated classifiers (i.e. where the 'raw' scores are used) align poorly with human evaluations. This limits their use for understanding the dynamics, patterns and prevalence of online abuse. We examine two widely used classifiers (created by Perspective and Davidson et al.) on a dataset of tweets directed against candidates in the UK's 2017 general election. A Bayesian approach is presented to recalibrate the raw scores from the classifiers, using probabilistic programming and newly annotated data. We argue that interpretability evaluation and recalibration are integral to the application of abusive content classifiers.

Edinburgh Research Explorer

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

System SINUS – otwarte narzędzie do budowy bibliograficznych baz danych

Author: Błaszczyńska Marzena
Kozak Michał
Mazurek Cezary
Szymczak Marcin
Werla Marcin
Publication venue: Stowarzyszenie EBIB

Summary (translated from Polish): The aim of the article is to present a new, open tool for building bibliographic databases. The SINUS system, developed at the Poznań Supercomputing and Networking Center (PCSS), was originally created to manage data on the scientific publications of Poznań University of Technology (PP) staff. The paper discusses the basic assumptions that guided the system's creators, its functionality and its use to date, as well as directions for further development.

Abstract: The aim of this paper is to present a new open tool for building bibliographic databases. The SINUS system, developed by Poznań Supercomputing and Networking Center, was initially created to meet the needs related to managing the scientific publications of Poznań University of Technology staff. In the paper we present the basic functional assumptions of the system, its current functionality and future development directions.

Wydawnictwa EBIB

Semantical mapping of attribute values for data integration

Author: Bronselaer Antoon
De Tré Guy
Szymczak Marcin
Zadrozny Slawomir
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

Nowadays, the amount of data is increasing very fast. Moreover, useful information is scattered over multiple sources. Therefore, automatic data integration that guarantees high data quality is extremely important. One of the crucial operations in integration of information from independent databases is detection of different representations of the same piece of information (called coreferent data) and translation of the representation of data from one source into the representation of the other source. That translation is also known as object mapping. In this paper, we investigate automatic mapping methods for attributes the values of which may need semantical comparison and can be sorted by means of an order relation that reflects a notion of generality. These mapping methods are investigated closely in terms of their effectiveness. An experimental evaluation of our method shows that using different mapping methods can enlarge a set of true positive mappings.

Crossref

Ghent University Academic Bibliography

Fibers as carriers of microbial particles

Author: Agata Stobnicka
Anna Ławniczek-Wałczyk
Marcin Cyprowski
Małgorzata Gołofit-Szymczak
Rafał L. Górny
Publication venue: 'Nofer Institute of Occupational Medicine'
Publication date: 01/08/2015

Background: The aim of the study was to assess the ability of natural, synthetic and semi-synthetic fibers to transport microbial particles. Material and Methods: The simultaneously settled dust and aerosol sampling was carried out in 3 industrial facilities processing natural (cotton, silk, flax, hemp), synthetic (polyamide, polyester, polyacrylonitrile, polypropylene) and semi-synthetic (viscose) fibrous materials; 2 stables where horses and sheep were bred; 4 homes where dogs or cats were kept and 1 zoo lion pavilion. All samples were laboratory analyzed for their microbiological purity. The isolated strains were qualitatively identified. To identify the structure and arrangement of fibers that may support transport of microbial particles, a scanning electron microscopy analysis was performed. Results: Both settled and airborne fibers transported analogous microorganisms. All synthetic, semi-synthetic and silk fibers, present as separated threads with smooth surface, were free from microbial contamination. Natural fibers with loose packing and rough surface (e.g., wool, horse hair), sheaf packing and septated surface (e.g., flax, hemp) or present as twisted ribbons with corrugated surface (cotton) were able to carry up to 9×10⁵ cfu/g aerobic bacteria, 3.4×10⁴ cfu/g anaerobic bacteria and 6.3×10⁴ cfu/g of fungi, including pathogenic strains classified by Directive 2000/54/EC in hazard group 2. Conclusions: As plant and animal fibers are contaminated with a significant number of microorganisms, including pathogens, all of them should be mechanically eliminated from the environment. In factories, if the manufacturing process allows, they should be replaced by synthetic or semi-synthetic fibers. To avoid unwanted exposure to harmful microbial agents on fibers, the containment measures that efficiently limit their presence and dissemination in both occupational and non-occupational environments should be introduced. Med Pr 2015;66(4):511–52

Crossref

Biblioteka Nauki - repozytorium artykułów

Directory of Open Access Journals