
    Data Mining on Sequences with Recursive Self-Organizing Maps

    Analyzing sequences of continuous data is an important step in perception. It has been shown that extensions of the self-organizing map (SOM) can learn temporal dynamics. Here, the aim is to find an abstract symbolic description using the SOM for Structured Data (SOMSD). Sequences of real-valued vectors with added noise are generated by stochastic processes described by Markov models and are used to train a SOMSD. Two algorithms are presented that extract finite order Markov models (FOMMs) from the trained SOMSD by clustering the map according to each neuron's weight and context information. Clustering is performed using U-Matrices. The algorithms succeed in producing Markov models similar to those used for sequence generation. Comparing the extracted FOMMs with the input models allows inferences about how the temporal dynamics are represented and about whether the SOMSD in combination with U-Matrix clustering can be used for data mining on sequences.
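    To make the extraction step concrete, here is a minimal sketch (not from the paper; all names are illustrative) of how a first-order Markov model can be estimated once each input vector has been mapped to the cluster label of its winning neuron: transitions between consecutive labels are counted and row-normalized.

```python
import numpy as np

def estimate_fomm(labels, n_states):
    """Estimate a first-order Markov transition matrix from a sequence of cluster labels."""
    counts = np.zeros((n_states, n_states))
    for prev, curr in zip(labels[:-1], labels[1:]):
        counts[prev, curr] += 1
    # Row-normalize to transition probabilities; rows without observations fall back to uniform.
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.where(row_sums > 0, counts / np.maximum(row_sums, 1), 1.0 / n_states)

# Toy example: cluster labels of the winning neurons for one generated sequence
labels = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
print(estimate_fomm(labels, n_states=2))
```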

    Statistical evaluation of rough set dependency analysis

    Rough set data analysis (RSDA) has recently become a frequently studied symbolic method in data mining. Among other things, it is used to extract rules from databases; it is, however, not clear from within the methods of rough set analysis whether the extracted rules are valid. In this paper, we suggest enhancing RSDA with two simple statistical procedures, both based on randomization techniques, to evaluate the validity of predictions based on the approximation quality of attributes in rough set dependency analysis. The first procedure tests the casualness of a prediction to ensure that the prediction is not based on only a few (casual) observations. The second procedure tests the conditional casualness of an attribute within a prediction rule. The procedures are applied to three data sets originally published in the context of rough set analysis. We argue that several claims of these analyses need to be modified because they lack validity, and that other possibly significant results were overlooked.
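    The flavour of such a randomization procedure can be sketched as follows. The approximation quality gamma used here is a simplified stand-in (the fraction of objects whose condition-attribute equivalence class is pure with respect to the decision), and the observed value is compared with values obtained after repeatedly permuting the decision attribute; all names are illustrative.

```python
import random
from collections import defaultdict

def approximation_quality(conditions, decisions):
    """Simplified gamma: fraction of objects whose condition class is pure w.r.t. the decision."""
    classes = defaultdict(set)
    for cond, dec in zip(conditions, decisions):
        classes[cond].add(dec)
    pure = sum(1 for cond in conditions if len(classes[cond]) == 1)
    return pure / len(conditions)

def randomization_test(conditions, decisions, n_perm=1000, seed=0):
    """Estimate how often a random relabelling reaches the observed gamma (casualness test)."""
    rng = random.Random(seed)
    observed = approximation_quality(conditions, decisions)
    shuffled = list(decisions)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if approximation_quality(conditions, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)   # permutation p-value

# Toy data set: condition-attribute tuples and a decision attribute
conds = [(0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
decs  = ["a", "a", "b", "b", "a", "a"]
print(randomization_test(conds, decs))
```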

    Determining surface diffusion properties from signal fluctuations (The Open-Access Journal for the Basic Principles of Diffusion Theory, Experiment and Application)

    To describe the growth kinetics of adsorbates on surfaces, knowledge of the diffusion coefficients of atoms and molecules on the surface is of vital importance, especially for controlling the self-assembly of organic molecules. These usually have high mobilities and sizes that are large compared to the lattice constant of the substrate. In this situation, not all techniques are equally well suited for determining the diffusion coefficient. A convenient and powerful method is the recording of temporal signal fluctuations of a locally fixed probe, for example the current of a scanning tunneling microscope or the frequency/height modulation of an atomic force microscope. The origin of these fluctuations is single molecules passing the probe. This method is limited neither to sufficiently small mobilities nor to large defect-free areas. Signal fluctuations from a fixed probe show well-defined peaks with stochastically varying widths and interpeak intervals. After transforming the signal into a rectangular one via a suitable noise-eliminating threshold value, a distribution of peak widths (Ψw) and interpeak distances (Ψd) can be obtained. In addition, the autocorrelation function (C) of the signal can be studied. We present a theory for these distributions [1] and, extending earlier treatments [2], for the autocorrelation function, which allows one to extract various diffusion parameters. Monte Carlo simulations of the problem are carried out for comparison with the theoretical predictions. Figure 1 shows representative examples for Ψw and Ψd, which exhibit distinct scaling behaviours in different time regimes, in agreement with the theory.
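    A minimal sketch of the signal-processing side (assuming a sampled one-dimensional probe signal; the names are illustrative, not the authors' code) thresholds the signal into a rectangular one and collects peak widths, interpeak distances, and the autocorrelation:

```python
import numpy as np

def peak_statistics(signal, threshold):
    """Binarize the fluctuation signal and collect peak widths and interpeak distances (in samples)."""
    rect = (np.asarray(signal) > threshold).astype(int)   # rectangular signal
    edges = np.diff(rect)
    rises = np.where(edges == 1)[0] + 1                   # peak starts
    falls = np.where(edges == -1)[0] + 1                  # peak ends
    if len(rises) and len(falls) and falls[0] < rises[0]:
        falls = falls[1:]                                 # drop a fall that precedes the first rise
    n = min(len(rises), len(falls))
    widths = falls[:n] - rises[:n]                        # samples entering the Psi_w distribution
    gaps = rises[1:n] - falls[:n - 1]                     # samples entering the Psi_d distribution
    return widths, gaps

def autocorrelation(signal):
    """Normalized autocorrelation C of the mean-subtracted signal."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    c = np.correlate(x, x, mode="full")[len(x) - 1:]
    return c / c[0]

# Toy example: noisy telegraph-like signal
rng = np.random.default_rng(0)
sig = (rng.random(1000) < 0.1).cumsum() % 2 + 0.1 * rng.standard_normal(1000)
widths, gaps = peak_statistics(sig, threshold=0.5)
```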

    Word Formation in Computational Linguistics (Pius ten Hacken)

    Motivation: why do we need a word formation component for computational linguistic applications? Where does word formation information play a role in computational linguistics? Think of information retrieval, where the parts of a word might contain the important information. If we want to find information in German on relaxation techniques, we need to look for the verb entspannen 'to relax' as well as for the derived noun Entspannung 'relaxation' and compounds containing Entspannung, such as Entspannungsantwort 'relaxation response', Tiefenentspannung 'deep relaxation' or Entspannungsübung 'relaxation exercise', etc. Another example is text-to-speech systems, where the structure of a word can tell us where the stress goes. Consider English words containing so-called neoclassical affixes (affixes of Latin or Greek etymology). Some of these affixes, such as -ation, influence the stress of a word: re'lax vs. relax'ation. Even such seemingly 'basic' components as part-of-speech taggers can (and often do) use word formation information, if only as heuristics. If a tagger encounters an unknown English word ending in the letters <ous>, for example, it can treat that ending as evidence that the word is most likely an adjective.
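    A toy version of such a suffix heuristic is sketched below; the suffix-to-tag table is a small illustrative sample, not a rule set taken from the text.

```python
# Illustrative suffix heuristics for guessing the part of speech of unknown English words.
SUFFIX_TAGS = {
    "ous": "ADJ",     # e.g. 'tremendous'
    "ation": "NOUN",  # e.g. 'relaxation'
    "ly": "ADV",      # e.g. 'quickly'
}

def guess_tag(word, default="NOUN"):
    """Return a part-of-speech guess based on the longest matching suffix."""
    for suffix in sorted(SUFFIX_TAGS, key=len, reverse=True):
        if word.lower().endswith(suffix):
            return SUFFIX_TAGS[suffix]
    return default

print(guess_tag("tremendous"))    # ADJ
print(guess_tag("tessellation"))  # NOUN
```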

    Abstract

    The main statistic used in rough set data analysis, the approximation quality, is of limited value when there is a choice of competing models for predicting a decision variable. In keeping with the rough set philosophy of non-invasive data analysis, we present three model selection criteria, using information theoretic entropy in the spirit of the minimum description length principle. Our main procedure is based on the principle of indifference combined with the maximum entropy principle, thus keeping external model assumptions to a minimum. The applicability of the proposed method is demonstrated by a comparison of its error rates with results of C4.5 on 14 published data sets. Key words: rough set model, minimum description length principle, attribute prediction
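    As a rough illustration of an entropy-based selection score in this spirit (a simplified stand-in, not one of the paper's three criteria), the sketch below scores a candidate attribute set by the conditional entropy of the decision variable given the equivalence classes the attributes induce; lower values indicate a more compact description.

```python
import math
from collections import Counter, defaultdict

def conditional_entropy(conditions, decisions):
    """H(decision | condition classes) in bits, a simple description-length-style score."""
    groups = defaultdict(list)
    for cond, dec in zip(conditions, decisions):
        groups[cond].append(dec)
    n = len(decisions)
    h = 0.0
    for members in groups.values():
        p_class = len(members) / n
        counts = Counter(members)
        h_class = -sum((c / len(members)) * math.log2(c / len(members)) for c in counts.values())
        h += p_class * h_class
    return h

# Compare two candidate attribute sets for predicting the same decision
decision = ["yes", "yes", "no", "no", "yes", "no"]
attrs_a  = [(0,), (0,), (1,), (1,), (0,), (1,)]   # separates the decision perfectly -> 0.0 bits
attrs_b  = [(0,), (1,), (0,), (1,), (0,), (1,)]   # separates it poorly -> higher entropy
print(conditional_entropy(attrs_a, decision), conditional_entropy(attrs_b, decision))
```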