33 research outputs found

    Категориально-информационная модель адаптивной системы непрерывного обучения (A Categorial-Information Model of an Adaptive System for Lifelong Learning)

    A categorial-information model of an adaptive system supporting lifelong learning, built within the framework of pattern recognition theory and an information criterion of functional efficiency, is considered. The process of optimizing the technological and didactic parameters of the learning-support system is described. A probabilistic algorithm for incremental retraining of the system during its operation, and for determining the moment of its complete retraining, is proposed.

    Fractional norms and quasinorms do not help to overcome the curse of dimensionality

    The curse of dimensionality causes well-known and widely discussed problems for machine learning methods. There is a hypothesis that using the Manhattan distance, and even fractional quasinorms lp (for p < 1), can help to overcome the curse of dimensionality in classification problems. In this study, we systematically test this hypothesis. We confirm that fractional quasinorms have a greater relative contrast, or coefficient of variation, than the Euclidean norm l2, but we also demonstrate that distance concentration shows qualitatively the same behaviour for all tested norms and quasinorms, and that the difference between them decays as dimension tends to infinity. Estimation of classification quality for kNN based on different norms and quasinorms shows that a greater relative contrast does not imply better classifier performance, and the worst performance on different databases was shown by different norms (quasinorms). A systematic comparison shows that the difference in performance of kNN based on lp for p = 2, 1, and 0.5 is statistically insignificant.
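
The distance-concentration effect the abstract describes is easy to reproduce numerically. The sketch below (a minimal illustration, not the authors' experimental setup) computes the relative contrast of lp distances on uniform random data: contrast is larger for smaller p at any fixed dimension, but shrinks for every p as dimension grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(X, p):
    """Relative contrast (Dmax - Dmin) / Dmin of l_p distances
    from the origin to the rows of X."""
    d = np.sum(np.abs(X) ** p, axis=1) ** (1.0 / p)
    return (d.max() - d.min()) / d.min()

# Contrast decays with dimension for every p, even though
# smaller p yields a larger contrast at each fixed dimension.
for dim in (10, 100, 1000):
    X = rng.uniform(size=(1000, dim))
    row = {p: round(relative_contrast(X, p), 3) for p in (0.5, 1.0, 2.0)}
    print(dim, row)
```

Note that a larger contrast for l0.5 here does not contradict the abstract's conclusion: contrast measures distance spread, not classifier quality.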

    LDA-Based Industry Classification

    Industry classification is a crucial step in financial analysis. However, existing industry classification schemes have several limitations. To overcome these limitations, in this paper we propose an industry classification methodology based on business commonalities, using topic features learned by Latent Dirichlet Allocation (LDA) from firms' business descriptions. Two types of classification, firm-centric and industry-centric, were explored. Preliminary evaluation results showed the effectiveness of our method.
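
The firm-centric variant of this idea can be sketched as follows: fit LDA on business descriptions, represent each firm by its topic distribution, and group firms by topic similarity. This is a toy illustration with invented descriptions, not the paper's data or exact pipeline.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical business descriptions; the paper uses firms' real filings.
descriptions = [
    "bank loans deposits credit financial services",
    "credit cards payments lending financial bank",
    "oil gas drilling exploration pipeline energy",
    "renewable energy solar wind power generation",
]

vec = CountVectorizer()
X = vec.fit_transform(descriptions)

# Learn topic features; theta is the firm-by-topic distribution.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(X)

# Firm-centric classification: rank other firms by cosine similarity
# of topic distributions.
norms = np.linalg.norm(theta, axis=1)
sim = (theta @ theta.T) / np.outer(norms, norms)
nearest = sim.argsort(axis=1)[:, -2]  # most similar other firm
print(nearest)
```

Industry-centric classification would instead assign each firm to its dominant topic (or cluster of topics) and treat each topic cluster as an industry.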

    Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search

    Retrieval pipelines commonly rely on a term-based search to obtain candidate records, which are subsequently re-ranked. Some candidates are missed by this approach, e.g., due to a vocabulary mismatch. We address this issue by replacing the term-based search with a generic k-NN retrieval algorithm, where the similarity function can take into account subtle term associations. While an exact brute-force k-NN search using this similarity function is slow, we demonstrate that an approximate algorithm can be nearly two orders of magnitude faster at the expense of only a small loss in accuracy. A retrieval pipeline using an approximate k-NN search can be more effective and efficient than the term-based pipeline. This opens up new possibilities for designing effective retrieval pipelines. Our software (including data-generating code) and derivative data based on the Stack Overflow collection are available online.
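
The baseline the approximate method is measured against is exact brute-force k-NN over the candidate collection. A minimal sketch with random document vectors (the paper's similarity function and its graph-based approximate index are more involved):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy document embeddings; in the paper the similarity function models
# subtle term associations, and the fast variant replaces this exact
# scan with an approximate nearest-neighbour index.
docs = rng.normal(size=(10_000, 64))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

def knn(query, k=10):
    """Exact brute-force cosine k-NN over all documents."""
    q = query / np.linalg.norm(query)
    scores = docs @ q
    top = np.argpartition(-scores, k)[:k]      # unsorted top-k
    return top[np.argsort(-scores[top])]       # sorted best-first

query = rng.normal(size=64)
ids = knn(query)
print(ids)
```

An approximate index answers the same query by scanning only a small neighbourhood graph around the query, which is where the near-100x speedup with a small recall loss comes from.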

    Automated Inference System for End-To-End Diagnosis of Network Performance Issues in Client-Terminal Devices

    Traditional network diagnosis methods for Client-Terminal Device (CTD) problems tend to be labor-intensive and time-consuming, and contribute to increased customer dissatisfaction. In this paper, we propose an automated solution for rapidly diagnosing the root causes of network performance issues in CTDs. Based on a new intelligent inference technique, we create the Intelligent Automated Client Diagnostic (IACD) system, which relies only on a collection of Transmission Control Protocol (TCP) packet traces. Using soft-margin Support Vector Machine (SVM) classifiers, the system (i) distinguishes link problems from client problems and (ii) identifies characteristics unique to the specific fault to report the root cause. The modular design of the system enables support for new access link and fault types. Experimental evaluation demonstrated the capability of the IACD system to distinguish between faulty and healthy links and to diagnose client faults with 98% accuracy. The system can perform fault diagnosis independent of the user's specific TCP implementation, enabling diagnosis of a diverse range of client devices.
    Comment: arXiv admin note: substantial text overlap with arXiv:1207.356
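
Stage (i) of the pipeline, separating link problems from client problems with a soft-margin SVM, can be sketched as below. The feature vectors here are synthetic stand-ins; the real IACD features are statistics extracted from TCP packet traces (e.g., RTT and retransmission behaviour), which are not specified in this abstract.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic trace-derived features for two fault classes.
n = 200
link_faults = rng.normal(loc=0.0, size=(n, 4))
client_faults = rng.normal(loc=2.0, size=(n, 4))
X = np.vstack([link_faults, client_faults])
y = np.array([0] * n + [1] * n)  # 0 = link problem, 1 = client problem

# Soft-margin SVM: C controls the margin/violation trade-off.
clf = SVC(C=1.0, kernel="rbf").fit(X, y)
print(clf.score(X, y))
```

Stage (ii) in the paper would apply further per-fault classifiers to the traces routed to each branch; the modular design means a new fault type only adds a classifier, not a redesign.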

    Effective Dimensionality: A Tutorial

    The topic of this tutorial is the effective dimensionality (ED) of a dataset, that is, the equivalent number of orthogonal dimensions that would produce the same overall pattern of covariation. The ED quantifies the total dimensionality of a set of variables, with no assumptions about their underlying structure. The ED of a dataset has important implications for the “curse of dimensionality”; it can be used to inform decisions about data analysis and to answer meaningful empirical questions. The tutorial offers an accessible introduction to ED, distinguishes it from the related but distinct concept of intrinsic dimensionality, critically reviews various ED estimators, and gives guidance for practical use with examples from personality research. An R function is provided to implement the techniques described in the tutorial.
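
One common ED estimator (among the several the tutorial reviews) is the participation ratio of the correlation-matrix eigenvalues, n1 = (Σλ)² / Σλ². A small sketch, in Python rather than the tutorial's R:

```python
import numpy as np

def effective_dimensionality(X):
    """Participation-ratio ED estimate n1 = (sum λ)^2 / sum(λ^2),
    where λ are eigenvalues of the correlation matrix of X's columns.
    This is one estimator among several; it makes no assumptions
    about the variables' underlying structure."""
    R = np.corrcoef(X, rowvar=False)
    lam = np.linalg.eigvalsh(R)
    return lam.sum() ** 2 / (lam ** 2).sum()

rng = np.random.default_rng(0)
# Independent columns: ED approaches the number of variables (5).
print(effective_dimensionality(rng.normal(size=(2000, 5))))
# Five copies of one variable: ED collapses to 1.
z = rng.normal(size=(2000, 1))
print(effective_dimensionality(np.hstack([z, z, z, z, z])))
```

The two extremes bracket the measure: fully redundant variables behave as one dimension, fully independent variables as p dimensions, and real data fall in between.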