Search CORE

27 research outputs found

Czech Text Document Corpus v 2.0

Author: Král Pavel
Lenc Ladislav
Publication date: 30/01/2018

This paper introduces "Czech Text Document Corpus v 2.0", a collection of text documents for automatic document classification in the Czech language. It is composed of text documents provided by the Czech News Agency and is freely available for research purposes at http://ctdc.kiv.zcu.cz/. The corpus was created to facilitate a straightforward comparison of document classification approaches on Czech data. It is particularly intended for evaluating multi-label document classification approaches, because a document is usually labelled with more than one label. Besides the document classes, the corpus is also annotated at the morphological layer. The paper further reports the results of selected state-of-the-art methods on this corpus, allowing an easy comparison with these approaches. Comment: Accepted for LREC 201

arXiv.org e-Print Archive

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
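Because most documents in the Czech Text Document Corpus carry more than one label, results on it are naturally reported with multi-label metrics. The sketch below assumes micro-averaged precision, recall and F1, a common choice for multi-label evaluation; the abstract does not say which metric the paper uses, and the label codes are hypothetical.

```python
def micro_prf(gold, pred):
    """Micro-averaged precision, recall and F1 over multi-label predictions.

    gold, pred: lists of label sets, one set per document, in the same order.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels predicted and correct
        fp += len(p - g)  # labels predicted but wrong
        fn += len(g - p)  # correct labels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical label codes for three documents.
gold = [{"pol", "eko"}, {"spo"}, {"kul", "eko"}]
pred = [{"pol"}, {"spo", "eko"}, {"kul", "eko"}]
print(micro_prf(gold, pred))
```

Here 4 of the 5 predicted labels are correct and 4 of the 5 gold labels are found, so all three scores come out at about 0.8. Micro-averaging pools the counts over all labels, so frequent labels dominate; a macro average would weight each label equally instead.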

Dialogue Act Recognition using Visual Information

Author: Král Pavel
Lenc Ladislav
Martínek Jiří
Publication venue: Západočeská univerzita v Plzni
Publication date: 01/01/2021

Supported by the Cross-border Cooperation Program Czech Republic - Free State of Bavaria ETS Objective 2014-2020 (project no. 211) and by Grant No. SGS-2019-01.

DSpace at University of West Bohemia

Well-calibrated Confidence Measures for Multi-label Text Classification with a Large Number of Labels

Author: Král Pavel
Lenc Ladislav
Maltoudoglou Lysimachos
Martínek Jiří
Paisios Andreas
Papadopoulos Harris
Publication date: 14/12/2023

We extend our previous work on Inductive Conformal Prediction (ICP) for multi-label text classification and present a novel approach for addressing the computational inefficiency of Label Powerset (LP) ICP that arises when dealing with a high number of unique labels. We present experimental results using the original and the proposed efficient LP-ICP on two English and one Czech language data-sets. Specifically, we apply LP-ICP on three deep Artificial Neural Network (ANN) classifiers of two types: one based on contextualised (BERT) and two on non-contextualised (word2vec) word-embeddings. In the LP-ICP setting, we assign nonconformity scores to label-sets, from which the corresponding p-values and prediction-sets are determined. Our approach deals with the increased computational burden of LP by eliminating from consideration a significant number of label-sets that will surely have p-values below the specified significance level. This dramatically reduces the computational complexity of the approach while fully respecting the standard CP guarantees. Our experimental results show that the classifier based on contextualised embeddings surpasses those based on non-contextualised embeddings and obtains state-of-the-art performance on all data-sets examined. The good performance of the underlying classifiers carries over to their ICP counterparts without any significant accuracy loss, but with the added benefit of ICP, i.e. the confidence information encapsulated in the prediction-sets. We experimentally demonstrate that the resulting prediction-sets can be tight enough to be practically useful even though the set of all possible label-sets contains more than 10^16 combinations. Additionally, the empirical error rates of the obtained prediction-sets confirm that our outputs are well-calibrated.

arXiv.org e-Print Archive
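The LP-ICP abstract above describes three steps: each label-set gets a nonconformity score, its p-value is computed against calibration scores, and label-sets whose p-value cannot exceed the significance level are never considered. The sketch below illustrates that idea under stated assumptions. The p-value formula and the rule "keep every label-set with p-value greater than epsilon" are standard ICP; the additive nonconformity score sum_j |y_j - q_j| (q_j being the classifier's probability for label j) and the branch-and-bound pruning are hypothetical choices made for illustration, not necessarily the measure or elimination rule used in the paper.

```python
from bisect import bisect_left


def icp_p_value(cal_sorted, score):
    """Standard ICP p-value: (number of calibration scores >= score, plus 1) / (n + 1).

    cal_sorted must be sorted in ascending order.
    """
    n = len(cal_sorted)
    n_ge = n - bisect_left(cal_sorted, score)
    return (n_ge + 1) / (n + 1)


def nonconformity(label_set, label_probs):
    """Hypothetical additive score: sum_j |y_j - q_j| (larger = stranger)."""
    return sum(abs(y - q) for y, q in zip(label_set, label_probs))


def lp_prediction_set(label_probs, cal_scores, epsilon):
    """Return (label_set, p_value) for every label-set with p-value > epsilon.

    Label-sets are 0/1 tuples over the m labels. A partial label-set is
    abandoned as soon as a lower bound on the score of all its completions
    already gives a p-value <= epsilon; since the p-value never increases
    with the score, no discarded label-set could have entered the set.
    """
    cal = sorted(cal_scores)
    m = len(label_probs)
    # suffix_min[j]: smallest possible contribution of labels j..m-1.
    suffix_min = [0.0] * (m + 1)
    for j in range(m - 1, -1, -1):
        q = label_probs[j]
        suffix_min[j] = suffix_min[j + 1] + min(q, 1.0 - q)

    result = []

    def extend(j, partial, score):
        if icp_p_value(cal, score + suffix_min[j]) <= epsilon:
            return  # prune: every completion falls outside the prediction set
        if j == m:
            result.append((tuple(partial), icp_p_value(cal, score)))
            return
        q = label_probs[j]
        for y in (1, 0):
            partial.append(y)
            extend(j + 1, partial, score + abs(y - q))
            partial.pop()

    extend(0, [], 0.0)
    return result


# Hypothetical classifier output for one document with three labels,
# and hypothetical calibration scores (n = 9).
probs = [0.9, 0.2, 0.6]
cal_scores = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8]
for label_set, p in lp_prediction_set(probs, cal_scores, epsilon=0.25):
    print(label_set, round(p, 2))
```

With these numbers five of the eight label-sets survive, and the branch starting with (0, 1) is discarded without enumerating its completions, because even its lowest possible score yields a p-value of 0.1. The pruning is exact because the bound never exceeds the true score, but it does not remove the exponential worst case; it pays off when most label-sets are far from the threshold. In practice the calibration scores would be computed with the same nonconformity function on the true label-sets of a held-out calibration set.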

Historical Map Toponym Extraction for Efficient Information Retrieval

Author: Baloun Josef
Král Pavel
Lenc Ladislav
Martínek Jiří
Prantl Martin
Publication venue: Západočeská univerzita v Plzni
Publication date: 01/01/2022

Supported by ERDF "Research and Development of Intelligent Components of Advanced Technologies for the Pilsen Metropolitan Area (InteCom)" (no. CZ.02.1.01/0.0/0.0/17 048/0007267) and Grant No. SGS-2022-016 "Advanced methods of data processing and analysis".

DSpace at University of West Bohemia

Face Recognition under Real-World Conditions (Rozpoznávání obličejů v reálných podmínkách)

Author: Lenc Ladislav
Publication venue: Západočeská univerzita v Plzni
Publication date: 01/09/2014

This thesis deals with automatic face recognition under real-world conditions. Its main goal is to propose a complete face recognition system intended to be used by the Czech News Agency (ČTK) for automatic annotation of photographs. The first task is to prepare a gallery of known faces. The first contribution of this work is an algorithm for automatic corpus creation, whose goal is to choose the most representative images of each person. An important outcome is a novel face dataset, created using this algorithm and freely available for research purposes. The next step is face recognition itself. Our contribution here is a proposal of several methods based on Gabor wavelets and the Scale Invariant Feature Transform (SIFT). The methods were tested on the ORL and FERET databases and on the newly created ČTK database, and the adapted Kepenekci method based on SIFT was chosen as the best candidate for our system. The final step of the system is a confidence measure, which determines the probability that the result is correct. We proposed a novel two-step confidence measure for face recognition. The final outcome of this work is thus a complete face recognition system capable of handling real-world photographs. Discussions about deploying the system at ČTK are currently under way.

Department of Computer Science and Engineering (Katedra informatiky a výpočetní techniky). Defended.

University of West Bohemia Digital Library

DSpace at University of West Bohemia

Improvement of Face Recognition Methods Based on POEM Descriptors (Vylepšení metod pro rozpoznávání obličejů založených na deskriptorech POEM)

Author: Král Pavel
Lenc Ladislav
Publication venue: Scitepress
Publication date: 01/01/2020

The usual way of using POEM descriptors consists of constructing features in non-overlapping rectangular regions covering the whole image. The features created in the regions are then concatenated into one long vector representing the face. We propose an enhancement of this method using automatic key-point identification strategies: in our approach, the image features are created at the detected key-points. We also employ a more complex matching procedure that compares the features individually. This method is efficient particularly when the number of training samples is small, so that neural-network-based methods fail for lack of training data. The proposed approach is evaluated on three standard face corpora. The obtained results show that combining POEM features with automatic key-point identification and a more sophisticated matching algorithm brings a significant improvement over the baseline method.

Crossref

University of West Bohemia Digital Library

DSpace at University of West Bohemia
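The POEM abstract above contrasts a baseline that builds one histogram per rectangular region and concatenates them with a method that compares key-point features individually. The sketch below contrasts the two representations under stated assumptions: the POEM codes are taken as already computed (encoding them from gradient orientations is omitted), and the chi-square distance and the nearest-feature averaging in the hypothetical individual_match_distance are illustrative choices, not the paper's matching procedure.

```python
def region_histograms(codes, grid_rows, grid_cols, n_bins):
    """Split a 2-D map of precomputed codes (ints in [0, n_bins)) into a
    grid of non-overlapping rectangles and return one histogram per region."""
    h, w = len(codes), len(codes[0])
    hists = []
    for r in range(grid_rows):
        for c in range(grid_cols):
            hist = [0] * n_bins
            for y in range(r * h // grid_rows, (r + 1) * h // grid_rows):
                for x in range(c * w // grid_cols, (c + 1) * w // grid_cols):
                    hist[codes[y][x]] += 1
            hists.append(hist)
    return hists


def concatenated_vector(codes, grid_rows, grid_cols, n_bins):
    """Baseline face representation: all region histograms in one long vector."""
    return [v for hist in region_histograms(codes, grid_rows, grid_cols, n_bins) for v in hist]


def chi_square(a, b):
    """Chi-square distance between two histograms (0 means identical)."""
    return sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y > 0)


def individual_match_distance(probe_feats, gallery_feats):
    """Hypothetical per-feature matching: match each probe feature (e.g. a
    histogram around one detected key-point) to its closest gallery feature
    and average the resulting distances."""
    total = sum(min(chi_square(p, g) for g in gallery_feats) for p in probe_feats)
    return total / len(probe_feats)


face = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
print(concatenated_vector(face, 2, 2, 2))  # → [4, 0, 0, 4, 0, 4, 4, 0]
```

Concatenation fixes which region is compared with which, so the baseline depends on well-aligned faces; matching each feature to its best counterpart is one plausible way to relax that, at the cost of more distance computations.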

SAPKOS: Experimental Czech Multi-label Document Classification and Analysis System

Author: Král Pavel
Lenc Ladislav
Publication venue: Springer Science and Business Media LLC
Publication date: 14/09/2015

Part 6: Classification, Clustering, and Reasoning. International audience.
This paper presents an experimental multi-label document classification and analysis system called SAPKOS. The system, which integrates state-of-the-art machine learning and natural language processing approaches, is intended to be used by the Czech News Agency (ČTK). Its main purpose is to save human resources in the task of annotating newspaper articles with topics. Another important function is the automatic comparison of the ČTK production with popular Czech media. The results of this analysis will be used to adapt the ČTK production to better meet today's market requirements. Notably, to the best of our knowledge, no other automatic Czech document classification system exists. The system also achieves very high accuracy, thanks to its unique architecture, which integrates a maximum entropy based classification engine with a novel confidence measure method.

Novel Matching Methods for Automatic Face Recognition Using SIFT

Author: Král Pavel
Lenc Ladislav
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

Part 6: Classification Pattern Recognition. International audience.
This paper addresses Automatic Face Recognition (AFR). The usual methods need a labeled corpus, and the number of training examples plays a crucial role in recognition accuracy. Unfortunately, creating a corpus is an expensive and time-consuming task. The motivation of this work is therefore to propose and implement new AFR approaches that address this issue and perform well even with few training examples. Our approaches extend the successful method based on the Scale Invariant Feature Transform (SIFT) proposed by Aly. We propose and evaluate two methods: the Lenc-Kral matching and the SIFT-based Kepenekci approach [7]. Both approaches are evaluated on two face datasets: the ORL database and the Czech News Agency (ČTK) corpus. We experimentally show that the proposed approaches significantly outperform the baseline Aly method on both corpora.

Crossref
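The SIFT-matching abstract above compares new matching schemes against Aly's baseline without describing either. As a point of reference only, the sketch below shows a common SIFT matching scheme: count the probe descriptors that pass Lowe's distance-ratio test against a gallery image, and identify the person whose image yields the most matches. Assuming that Aly's baseline resembles this is our assumption; the Lenc-Kral matching is not reproduced here. The 0.8 ratio follows Lowe's suggestion, and the 2-D toy descriptors stand in for real 128-dimensional SIFT descriptors.

```python
import math


def count_ratio_matches(probe, gallery, ratio=0.8):
    """Count probe descriptors whose nearest gallery descriptor is clearly
    closer than the second nearest (Lowe's distance-ratio test)."""
    if len(gallery) < 2:
        return 0  # the ratio test needs at least two neighbours
    matches = 0
    for d in probe:
        dists = sorted(math.dist(d, g) for g in gallery)
        if dists[0] < ratio * dists[1]:
            matches += 1
    return matches


def identify(probe, gallery_by_person, ratio=0.8):
    """Return the identity whose gallery descriptors yield the most matches."""
    return max(
        gallery_by_person,
        key=lambda person: count_ratio_matches(probe, gallery_by_person[person], ratio),
    )


# Toy 2-D descriptors (hypothetical); real SIFT descriptors have 128 dimensions.
probe = [[0.0, 0.0], [10.0, 10.0]]
gallery = {
    "A": [[0.0, 0.1], [10.0, 10.1], [5.0, 5.0]],
    "B": [[5.0, 5.0], [5.0, 6.0]],
}
print(identify(probe, gallery))  # → A
```

Both probe descriptors find an unambiguous nearest neighbour in A, while in B the two nearest neighbours are too close to each other to pass the ratio test, so A wins with two matches against zero.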