On the consistency of Multithreshold Entropy Linear Classifier
Multithreshold Entropy Linear Classifier (MELC) is a recent classifier idea which employs information-theoretic concepts in order to create a multithreshold maximum-margin model. In this paper we analyze its consistency over multithreshold linear models and show that its objective function upper-bounds the number of misclassified points, much as the hinge loss does in support vector machines. For further confirmation we also conduct numerical experiments on five datasets.
Comment: Presented at Theoretical Foundations of Machine Learning 2015 (http://tfml.gmum.net); final version published in Schedae Informaticae Journal
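The hinge-loss analogy can be illustrated with a short sketch: the hinge loss is at least 1 on every misclassified point, so its sum bounds the error count from above. The labels and scores below are hypothetical, not from the paper's experiments.

```python
# Sketch: the hinge loss upper-bounds the 0-1 misclassification
# indicator, so the summed hinge loss bounds the number of errors.
# (Hypothetical data; MELC's actual objective is entropy-based.)

def hinge(y, score):
    # max(0, 1 - y*score) >= 1 whenever sign(score) disagrees with y
    return max(0.0, 1.0 - y * score)

def zero_one(y, score):
    # 1 if the point is misclassified, 0 otherwise
    return 1.0 if y * score <= 0 else 0.0

points = [(+1, 0.7), (-1, 0.2), (+1, -1.3), (-1, -0.9)]  # (label, score)
errors = sum(zero_one(y, s) for y, s in points)
bound = sum(hinge(y, s) for y, s in points)
assert errors <= bound  # 2 misclassified points, hinge sum 3.9
```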
Experiments on Synchronizing Automata
This work is motivated by the Černý Conjecture, an old unsolved problem in automata theory. We describe the results of experiments on synchronizing automata, which have led us to two interesting results. The first is that the size of an automaton's alphabet may play an important role in synchronization: we have found a 5-state automaton over a 3-letter alphabet which attains the upper bound from the Černý Conjecture, while there is no such automaton (except the Černý automaton C5) over a binary alphabet. The second result emerging from the experiments is a theorem describing the dependencies between the automaton structure S, expressed in terms of the so-called merging system, and the maximal length of all minimal synchronizing words for automata of type S.
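The notion of a reset (synchronizing) word behind the conjecture can be sketched as follows. The breadth-first search over subsets of states is a standard technique, not the authors' experimental setup; the automaton used is the classic Černý automaton C4, whose shortest reset word is known to have length (4-1)² = 9.

```python
from collections import deque

# Cerny automaton C_4: letter 'a' rotates states cyclically,
# letter 'b' merges state 0 into state 1 and fixes the rest.
n = 4
delta = {
    'a': {i: (i + 1) % n for i in range(n)},           # cyclic rotation
    'b': {i: (1 if i == 0 else i) for i in range(n)},  # merges 0 into 1
}

def shortest_reset_word(delta, n):
    """BFS on subsets of states: a word is synchronizing (a reset word)
    when it maps the full state set to a single state."""
    start = frozenset(range(n))
    seen = {start}
    queue = deque([(start, "")])
    while queue:
        subset, word = queue.popleft()
        if len(subset) == 1:
            return word  # BFS order guarantees this is shortest
        for letter, f in delta.items():
            nxt = frozenset(f[q] for q in subset)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + letter))
    return None  # automaton is not synchronizing

word = shortest_reset_word(delta, n)
```

The subset construction is exponential in the number of states, which is one reason experimental searches of this kind are limited to small automata.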
Fast optimization of Multithreshold Entropy Linear Classifier
Multithreshold Entropy Linear Classifier (MELC) is a density-based model which searches for a linear projection maximizing the Cauchy-Schwarz Divergence of the dataset's kernel density estimation. Despite its good empirical results, one of its drawbacks is optimization speed. In this paper we analyze how it can be sped up by solving an approximate problem. We analyze two methods, both similar to approximate solutions of kernel density estimation querying, and provide adaptive schemes for selecting the crucial parameters based on a user-specified acceptable error. Furthermore, we show how one can exploit the well-known conjugate gradient and L-BFGS optimizers despite the fact that the original optimization problem should be solved on the sphere. All of the above methods and modifications are tested on 10 real-life datasets from the UCI repository to confirm their practical usability.
Comment: Presented at Theoretical Foundations of Machine Learning 2015 (http://tfml.gmum.net); final version published in Schedae Informaticae Journal
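Running a Euclidean optimizer "on the sphere" can be sketched by projecting each gradient onto the tangent space at the current point and renormalizing after every step. The toy quadratic objective below is a hypothetical stand-in for the MELC objective, and this projection-plus-retraction scheme is only one standard option, not necessarily the paper's.

```python
import math

def normalize(w):
    # retraction back onto the unit sphere
    s = math.sqrt(sum(x * x for x in w))
    return [x / s for x in w]

def grad_f(w):
    # gradient of the toy objective f(w) = w0^2 + 2*w1^2
    # (hypothetical stand-in for the real MELC objective)
    return [2 * w[0], 4 * w[1]]

def sphere_step(w, lr=0.1):
    g = grad_f(w)
    wg = sum(wi * gi for wi, gi in zip(w, g))
    # project out the radial component so the step stays tangent
    tangent = [gi - wg * wi for gi, wi in zip(g, w)]
    return normalize([wi - lr * ti for wi, ti in zip(w, tangent)])

w = normalize([0.6, 0.8])
for _ in range(200):
    w = sphere_step(w)
# the minimum of f restricted to the unit circle lies at (+/-1, 0)
```

The same tangent-space projection is what lets off-the-shelf conjugate gradient or L-BFGS routines be applied to a spherically constrained problem.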
Java based transistor level CPU simulation speedup techniques
Transistor-level simulation of a CPU, while very accurate, also brings a performance challenge. The MOS6502 CPU simulation algorithm is analysed and several optimisation techniques are proposed. Applying these techniques improved transistor-level simulation speed by a factor of 3–4, bringing it to levels on par with the fastest RTL-level simulations so far.
Impact of Clustering Parameters on the Efficiency of the Knowledge Mining Process in Rule-based Knowledge Bases
In this work the application of clustering as a knowledge extraction method from real-world data is discussed. The authors analyze the influence of different clustering parameters on the quality of the created structure of rule clusters and on the efficiency of the knowledge mining process for rules and rule clusters. The goal of the experiments was to measure the impact of clustering parameters on the efficiency of the knowledge mining process in rule-based knowledge bases, as reflected in the size of the created clusters or the size of the representatives. Some parameters are guaranteed to produce shorter or longer representatives of the created rule clusters, as well as smaller or greater cluster sizes.
An improvement in fuzzy entropy edge detection for X-ray imaging
The following paper discusses edge detection in X-ray hand images. It criticises the existing solution by highlighting a design fault, namely a carelessly chosen function, and then proposes a way to eliminate the fault by replacing it with a better-suited function. The search for this function and its results are also discussed. The paper additionally presents pre- and post-processing through filtering as another improvement in edge detection.
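Fuzzy-entropy edge detection generally maps intensities to a membership value in [0, 1] and examines the De Luca-Termini fuzzy entropy, which peaks where membership is most ambiguous. The linear membership function and sample row below are illustrative choices only, not the paper's criticised function or its proposed replacement.

```python
import math

def membership(intensity, lo=0, hi=255):
    # illustrative linear membership; the paper's point is that this
    # choice of function matters a great deal for edge quality
    return (intensity - lo) / (hi - lo)

def fuzzy_entropy(mu):
    # De Luca-Termini fuzzy entropy: 0 at mu in {0, 1}, maximal at 0.5
    if mu in (0.0, 1.0):
        return 0.0
    return -(mu * math.log2(mu) + (1 - mu) * math.log2(1 - mu))

row = [10, 12, 11, 120, 130, 200, 210]  # hypothetical pixel intensities
entropies = [fuzzy_entropy(membership(p)) for p in row]
# mid-range intensities (ambiguous membership) give entropy near 1,
# extremes give entropy near 0; high-entropy regions are edge candidates
```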
Approaching automatic cyberbullying detection for Polish tweets
This paper presents a contribution to the PolEval 2019 automatic cyberbullying detection task. The goal of the task is to classify tweets as harmful or normal. First, the data is preprocessed. Then two classifiers adjusted to the problem are tested: Flair and fastText. Flair utilizes character-based language models, which are evaluated using perplexity. Both classifiers obtained similar scores on the test data.
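Perplexity, used here to evaluate the character-based language models, can be sketched in a few lines: it is the exponential of the average negative log-probability the model assigns to each token. The probabilities below are hypothetical model outputs, not values from the submission.

```python
import math

def perplexity(token_probs):
    # exp of the mean negative log-likelihood over the sequence
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# a model assigning probability 0.25 to every character has perplexity
# exactly 4: it is "as confused as" a uniform 4-way choice
assert abs(perplexity([0.25] * 10) - 4.0) < 1e-9
```

Lower perplexity indicates a language model that better predicts the character sequence, which is why it serves as a model-selection criterion here.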