735 research outputs found

    Automated user modeling for personalized digital libraries

    Digital libraries (DL) have become one of the most common ways of accessing digitized information of any kind. Because of this key role, users welcome any improvement to the services they receive from digital libraries. One trend for improving digital services is personalization. Up to now, the most common approach to personalization in digital libraries has been user-driven. Nevertheless, the design of efficient personalized services has to be done, at least in part, automatically. In this context, machine learning techniques automate the process of constructing user models. This paper proposes a new approach to building digital libraries that satisfy users' information needs: Adaptive Digital Libraries, libraries that automatically learn user preferences and goals and personalize their interaction using this information.
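
    As a hedged illustration of the kind of automatically learned user model the paper argues for, the sketch below builds a simple content-based profile from documents a user has read and ranks unseen catalogue items by similarity. The titles and the TF-IDF/cosine pipeline are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical user-model sketch: profile = centroid of TF-IDF vectors
# of documents the user has read; personalization = ranking unseen
# items by cosine similarity to that profile. All titles are invented.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

read = [
    "neural networks for image recognition",
    "a survey of deep learning methods",
]
catalog = [
    "gardening for beginners",
    "convolutional neural networks tutorial",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(read + catalog)

# The learned user model: mean TF-IDF vector of the reading history.
profile = np.asarray(X[: len(read)].mean(axis=0))

# Personalized ranking of catalogue items the user has not yet seen.
scores = cosine_similarity(profile, X[len(read):]).ravel()
for score, title in sorted(zip(scores, catalog), reverse=True):
    print(f"{score:.2f}  {title}")
```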

    Individuals tell a fascinating story: using unsupervised text mining methods to cluster policyholders based on their medical history

    Background and objective: Classifying people according to their health profile is crucial in order to propose appropriate treatment. However, a medical diagnosis is sometimes not available. This is the case in health insurance, for example, making it difficult to propose custom prevention plans. In such cases, an unsupervised clustering method is needed. This article compares three different methods by adapting text mining techniques to the field of health insurance. A new clustering stability measure is also proposed in order to compare the stability of the tested processes. Methods: Nonnegative Matrix Factorization, the word2vec method, and marginalized Stacked Denoising Autoencoders are used and compared in order to create a high-quality input for a clustering method. A self-organizing map is then used to obtain the final clustering. A real health insurance database is used to test the methods. Results: The marginalized Stacked Denoising Autoencoder outperforms the other methods in both stability and result quality on our data. Conclusions: Text mining methods offer several possibilities for understanding the context of any medical act. On a medical database, the process could reveal unexpected correlations between treatments and, thus, pathologies. Moreover, this kind of method could exploit the refund dates contained in the data, but the tested method that uses temporality, word2vec, still needs to be improved, since its results, though satisfactory, are not as good as those offered by the other methods.
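
    A minimal sketch of the pipeline described in the Methods, using one of the three compared representations (Nonnegative Matrix Factorization) followed by a self-organizing map. The toy corpus of medical act codes and all parameter choices are invented for illustration; the sketch assumes scikit-learn and the MiniSom package.

```python
# Sketch: NMF embedding of policyholder "documents" -> SOM clustering.
# Corpus and parameters are synthetic illustration, not real data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from minisom import MiniSom  # pip install minisom

corpus = [  # one pseudo-document of medical act codes per policyholder
    "dental_exam dental_xray filling",
    "gp_visit blood_test cholesterol_check",
    "physio_session physio_session knee_mri",
]

# Step 1: sparse bag-of-acts representation of each history.
X = TfidfVectorizer().fit_transform(corpus)

# Step 2: NMF compresses it into a dense low-dimensional embedding,
# the "high-quality input" handed to the clustering stage.
embedding = NMF(n_components=2, init="nndsvd", random_state=0).fit_transform(X)

# Step 3: a small self-organizing map yields the final clustering;
# each policyholder is assigned to its best-matching grid unit.
som = MiniSom(2, 2, embedding.shape[1], sigma=0.5, learning_rate=0.5, random_seed=0)
som.train(embedding, 500)
print([som.winner(v) for v in embedding])  # grid cells act as cluster labels
```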

    SOMs for Machine Learning


    AI Solutions for MDS: Artificial Intelligence Techniques for Misuse Detection and Localisation in Telecommunication Environments

    This report considers the application of Artificial Intelligence (AI) techniques to the problem of misuse detection and misuse localisation within telecommunications environments. A broad survey of techniques is provided, covering inter alia rule-based systems, model-based systems, case-based reasoning, pattern matching, clustering and feature extraction, artificial neural networks, genetic algorithms, artificial immune systems, agent-based systems, data mining, and a variety of hybrid approaches. The report then considers the central issue of event correlation, which is at the heart of many misuse detection and localisation systems. The notion of being able to infer misuse by correlating individual, temporally distributed events within a multiple-data-stream environment is explored, and a range of techniques is examined, covering model-based approaches, 'programmed' AI, and machine learning paradigms. It is found that, in general, correlation is best achieved via rule-based approaches, but that these suffer from a number of drawbacks, such as the difficulty of developing and maintaining an appropriate knowledge base, and the inability to generalise from known misuses to new, unseen misuses. Two distinct approaches are evident. One attempts to encode knowledge of known misuses, typically within rules, and uses this to screen events. This approach cannot generally detect misuses for which it has not been programmed, i.e. it is prone to issuing false negatives. The other attempts to 'learn' the features of event patterns that constitute normal behaviour and, by observing patterns that do not match expected behaviour, detect when a misuse has occurred. This approach is prone to issuing false positives, i.e. inferring misuse from innocent patterns of behaviour that the system was not trained to recognise. Contemporary approaches are seen to favour hybridisation, often combining detection or localisation mechanisms for both abnormal and normal behaviour, the former to capture known cases of misuse, the latter to capture unknown cases. In some systems, these mechanisms even update each other to increase detection rates and lower false positive rates. It is concluded that hybridisation offers the most promising future direction, but that a rule- or state-based component is likely to remain, being the most natural approach to the correlation of complex events. The challenge, then, is to mitigate the weaknesses of canonical programmed systems so that learning, generalisation and adaptation are more readily facilitated.
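
    As a toy illustration of the rule-based event correlation discussed above, the sketch below raises an alert when a hypothetical pattern of events occurs for the same subscriber within a sliding time window. The event names, the pattern and the window length are invented assumptions; real systems maintain far richer rule bases.

```python
# Toy rule-based correlator: alert when PATTERN occurs, in order, for
# the same subscriber within WINDOW seconds. Events and rule invented.
from collections import deque

WINDOW = 60.0  # hypothetical correlation window, in seconds
PATTERN = ["pin_failure", "pin_failure", "premium_rate_call"]

def correlate(events):
    """events: iterable of (timestamp, subscriber, event_type)."""
    recent = {}  # subscriber -> deque of (timestamp, event_type)
    for ts, sub, etype in sorted(events):
        window = recent.setdefault(sub, deque())
        window.append((ts, etype))
        while ts - window[0][0] > WINDOW:  # expire old events
            window.popleft()
        # Fire if PATTERN appears as an ordered subsequence in-window.
        pending = iter(t for _, t in window)
        if all(any(t == p for t in pending) for p in PATTERN):
            yield (ts, sub, "possible_misuse")

print(list(correlate([
    (0.0, "A", "pin_failure"),
    (5.0, "A", "pin_failure"),
    (20.0, "A", "premium_rate_call"),
])))
```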

    Adapting image processing and clustering methods to productive efficiency analysis and benchmarking: A cross disciplinary approach

    This dissertation explores interdisciplinary applications of computational methods in quantitative economics. In particular, it focuses on problems in productive efficiency analysis and benchmarking that are hardly approachable or solvable using conventional methods. In productive efficiency analysis, null or zero efficiency estimates are often produced when the skewness of the residuals has the wrong sign, or their kurtosis is too low, relative to the distributional assumption on the inefficiency term. This thesis uses deconvolution, a technique traditionally used in image processing for noise removal, to develop a fully non-parametric method for efficiency estimation. Publications 1 and 2 are devoted to this topic, focusing on the cross-sectional case and the panel case, respectively. Through Monte Carlo simulations and empirical applications to Finnish electricity distribution network data and Finnish banking data, the results show that the Richardson-Lucy blind deconvolution method is insensitive to the distributional assumptions and robust to data noise levels and heteroscedasticity in efficiency estimation. In benchmarking, which can be the next step after productive efficiency analysis, the 'best practice' target may not operate in the same environment as the DMU under study. This renders the benchmarks impractical to follow and hinders managers from making correct decisions on the performance improvement of a DMU. This dissertation proposes a clustering-based benchmarking framework in Publication 3. The empirical study on Finnish electricity distribution networks shows that the proposed framework is novel not only in its consideration of differences in the operational environment among DMUs, but also in its extreme flexibility. A comparison of different combinations of clustering and efficiency estimation techniques was conducted using computational simulations and empirical applications to Finnish electricity distribution network data, based on which Publication 4 specifies an efficient combination for benchmarking in energy regulation. This dissertation endeavours to solve problems in quantitative economics using interdisciplinary approaches. The methods developed benefit this field, and the way we approach the problems opens a new perspective.
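
    The sketch below illustrates the deconvolution idea on a synthetic composite error, treating its histogram as a one-dimensional "image". It uses the non-blind Richardson-Lucy routine from scikit-image with an assumed noise kernel, whereas the dissertation applies a blind variant; all distributions and parameters here are invented for illustration.

```python
# Synthetic illustration: recover an inefficiency density from a
# composite error by Richardson-Lucy deconvolution. scikit-image's
# routine is non-blind (the noise kernel is assumed known here);
# the dissertation uses a blind variant. All parameters are invented.
import numpy as np
from skimage.restoration import richardson_lucy

rng = np.random.default_rng(0)
u = np.abs(rng.normal(0.0, 0.3, 5000))  # half-normal inefficiency
v = rng.normal(0.0, 0.1, 5000)          # symmetric noise
eps = u + v                             # observed composite error

# Histogram of the composite error, treated as a 1-D "image".
edges = np.linspace(-0.5, 1.5, 201)
density, _ = np.histogram(eps, bins=edges, density=True)

# Discretised Gaussian kernel standing in for the noise distribution.
kernel = np.exp(-0.5 * (np.linspace(-0.3, 0.3, 31) / 0.1) ** 2)
kernel /= kernel.sum()

# Richardson-Lucy iterations approximately undo the convolution with
# the noise, recovering the inefficiency density up to scaling.
recovered = richardson_lucy(density / density.max(), kernel, 30)
centers = 0.5 * (edges[:-1] + edges[1:])
print("mode of recovered density:", centers[recovered.argmax()])
```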

    Machine Learning for Financial Prediction Under Regime Change Using Technical Analysis: A Systematic Review

    Recent crises, recessions and bubbles have stressed the non-stationary nature of the financial domain and the presence of drastic structural changes within it. The most recent literature suggests the use of conventional machine learning and statistical approaches in this context. Unfortunately, several of these techniques are unable, or slow, to adapt to changes in the price-generation process. This study surveys the relevant literature on machine learning for financial prediction under regime change, employing a systematic approach and reviewing key papers with a special emphasis on technical analysis. The study discusses the growing number of contributions that are bridging the gap between two separate communities, one focused on data stream learning and the other on economic research. However, it also makes apparent that we are still at an early stage. The range of machine learning algorithms that have been tested in this domain is very wide, but the results of the study do not suggest that any specific technique is currently clearly dominant.

    Loan modifications and risk of default: a Markov chains approach

    Dissertation presented as the partial requirement for obtaining a Master's degree in Statistics and Information Management, specialization in Risk Analysis and Management. With the housing crisis, credit risk analysis has become increasingly important, since it is a key tool for banks' credit risk management as well as being of great relevance for rigorous regulation. Credit scoring models relying on logistic regression have been the most widely applied for evaluating credit risk, more specifically for analyzing the probability of default of a borrower when a credit contract begins. However, these methods have limitations, such as the inability to model the entire probabilistic structure of a process, namely the life of a mortgage, since they essentially focus on binary outcomes. They are therefore weak at analyzing and characterizing the behavior of borrowers over time, and consequently disregard the multiple loan outcomes and the various transitions a borrower may face, which hampers the understanding of the recurrence of risk events. A discrete-time Markov chain model is applied in order to overcome these limitations. Several states and transitions are considered with the purpose of describing a borrower's behavior and estimating their default risk before and after modifications are made, along with the determinants of post-modification mortgage outcomes. Mortgage loans are considered in order to take a reasonable timeline towards a proper assessment of different loan performances. In addition to analyzing the impact of modifications, this work aims to identify and evaluate the main risk factors among borrowers that explain transitions to default states and different loan outcomes.
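
    A minimal sketch of the discrete-time Markov chain idea: estimate a transition matrix from observed monthly loan-state paths and propagate it forward to obtain an outcome distribution. The state set and the example paths are invented for illustration, not the dissertation's actual specification.

```python
# Sketch: estimate a loan-state transition matrix from observed paths
# and propagate it forward. States and paths are invented examples.
import numpy as np

states = ["current", "arrears", "modified", "default", "prepaid"]
idx = {s: i for i, s in enumerate(states)}

paths = [  # hypothetical monthly state histories of three loans
    ["current", "current", "arrears", "modified", "current"],
    ["current", "arrears", "arrears", "default"],
    ["current", "current", "prepaid"],
]

# Count observed one-step transitions.
counts = np.zeros((len(states), len(states)))
for path in paths:
    for a, b in zip(path, path[1:]):
        counts[idx[a], idx[b]] += 1

# Row-normalise; states never observed leaving (default, prepaid)
# are treated as absorbing via identity rows.
P = np.eye(len(states))
for i, row in enumerate(counts):
    if row.sum() > 0:
        P[i] = row / row.sum()

# 12-month outcome distribution for a loan that was just modified.
start = np.eye(len(states))[idx["modified"]]
outlook = start @ np.linalg.matrix_power(P, 12)
print(dict(zip(states, outlook.round(3))))
```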