
    Data exploration process based on the self-organizing map

    With the advances in computer technology, the amount of data obtained from various sources and stored in electronic media is growing at exponential rates. Data mining is a research area that addresses the challenge of analysing this data in order to find the useful information contained therein. The Self-Organizing Map (SOM) is one of the methods used in data mining. It quantizes the training data into a representative set of prototype vectors and maps them onto a low-dimensional grid, making it a prominent tool in the initial exploratory phase of data mining. The thesis consists of an introduction and ten publications. In the publications, the validity of SOM-based data exploration methods has been investigated and various enhancements to them have been proposed. In the introduction, these methods are presented as parts of the data mining process and compared with other data exploration methods with similar aims. The work makes two primary contributions. Firstly, it shows that the SOM provides a versatile platform on top of which various data exploration methods can be efficiently constructed. New methods and measures for visualization of data, clustering, cluster characterization, and quantization have been proposed. The SOM algorithm and the proposed methods and measures have been implemented as a set of Matlab routines in the SOM Toolbox software library. Secondly, a framework for SOM-based data exploration of table-format data - both single tables and hierarchically organized tables - has been constructed. The framework divides exploratory data analysis into several sub-tasks, most notably the analysis of samples and the analysis of variables. The analysis methods are applied autonomously and their results are provided in a report describing the most important properties of the data manifold. In such a framework, the attention of the data miner can be directed more towards the actual data exploration task than towards the application of the analysis methods. Because of the highly iterative nature of data exploration, automating routine analysis tasks can considerably reduce the time the exploration process requires.
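    The core of the SOM described above - quantizing data into prototype vectors arranged on a low-dimensional grid - can be sketched in a few lines. The following is a minimal illustration, not the SOM Toolbox implementation; the grid size, learning-rate schedule, and Gaussian neighborhood are illustrative assumptions.

```python
# Minimal sketch of online SOM training: prototypes on a 2-D grid,
# Gaussian neighborhood, linearly decaying learning rate and radius.
import numpy as np

def train_som(data, grid_h=10, grid_w=10, epochs=20,
              lr0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    # prototype vectors, one per grid node
    weights = rng.normal(size=(grid_h, grid_w, dim))
    # grid coordinates used to compute neighborhood distances
    gy, gx = np.mgrid[0:grid_h, 0:grid_w]
    t, t_max = 0, epochs * n
    for _ in range(epochs):
        for x in data[rng.permutation(n)]:
            # decay learning rate and neighborhood radius over time
            frac = t / t_max
            lr = lr0 * (1 - frac)
            sigma = sigma0 * (1 - frac) + 1e-3
            # best-matching unit: node whose prototype is closest to x
            d = np.linalg.norm(weights - x, axis=2)
            by, bx = np.unravel_index(np.argmin(d), d.shape)
            # Gaussian neighborhood on the grid around the BMU
            g = np.exp(-((gy - by) ** 2 + (gx - bx) ** 2) / (2 * sigma ** 2))
            # pull prototypes toward x, weighted by the neighborhood
            weights += lr * g[:, :, None] * (x - weights)
            t += 1
    return weights

# usage: map 3-D data onto a 10x10 grid of prototype vectors
data = np.random.default_rng(1).normal(size=(500, 3))
som = train_som(data)
print(som.shape)  # (10, 10, 3)
```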

    Artificial Intelligence in geospatial analysis: applications of self-organizing maps in the context of geographic information science.

    A thesis submitted in partial fulfillment of the requirements for the degree of Doctor in Information Management, specialization in Geographic Information Systems.
    The size and dimensionality of available geospatial repositories increase every day, placing additional pressure on existing analysis tools, which are expected to extract ever more knowledge from these databases. Most of these tools were created in a data-poor environment and thus rarely address concerns of efficiency, dimensionality, and automatic exploration. In addition, traditional statistical techniques rest on several assumptions that are not realistic in the geospatial data domain. An example is the statistical independence between observations required by most classical statistical methods, which conflicts with the well-known spatial dependence that exists in geospatial data. Artificial intelligence and data mining methods constitute a less assumption-dependent alternative for exploring and extracting knowledge from geospatial data. In this thesis, we study the possible adaptation of existing general-purpose data mining tools to geospatial data analysis. The characteristics of geospatial datasets seem similar in many ways to those of aspatial datasets for which several data mining tools have been used successfully to detect patterns and relations. It seems, however, that GIS-minded analysis and objectives require more than the results provided by these general tools, and adaptations are needed to meet the geographic information scientist's requirements. Thus, we propose several geospatial applications based on a well-known data mining method, the self-organizing map (SOM), and analyse the adaptations required in each application to fulfil those objectives and needs. Three main fields of GIScience are covered in this thesis: cartographic representation; spatial clustering and knowledge discovery; and location optimization. (...)

    Adaptive firefly algorithm for hierarchical text clustering

    Text clustering is essentially used by search engines to increase recall and precision in information retrieval. As search engines operate on Internet content that is constantly being updated, there is a need for a clustering algorithm that offers automatic grouping of items without prior knowledge of the collection. Existing clustering methods have problems in determining the optimal number of clusters and in producing compact clusters. In this research, an adaptive hierarchical text clustering algorithm based on the Firefly Algorithm is proposed. The proposed Adaptive Firefly Algorithm (AFA) consists of three components: document clustering, cluster refining, and cluster merging. The first component introduces the Weight-based Firefly Algorithm (WFA), which automatically identifies initial centers and their clusters for any given text collection. To refine the obtained clusters, a second algorithm, termed Weight-based Firefly Algorithm with Relocate (WFAR), is proposed. This approach allows the relocation of a pre-assigned document into a newly created cluster. The third component, Weight-based Firefly Algorithm with Relocate and Merging (WFARM), aims to reduce the number of produced clusters by merging non-pure clusters into pure ones. Experiments were conducted to compare the proposed algorithms against seven existing methods. AFA obtained the optimal number of clusters in 100% of cases, with purity and F-measure of 83%, higher than the benchmarked methods. As for the entropy measure, AFA produced the lowest value (0.78) when compared to the existing methods. These results indicate that the Adaptive Firefly Algorithm can produce compact clusters. This research contributes to the text mining domain, as hierarchical text clustering facilitates document indexing and information retrieval.
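    The abstract does not give the weighting scheme of WFA itself, but the underlying firefly search over candidate cluster centers can be sketched as follows. This is a minimal illustration of the standard firefly update applied to centroid optimization, with illustrative fitness and parameter choices, not the paper's weight-based variant.

```python
# Firefly-style optimization of cluster centers: each firefly encodes
# a candidate set of k centroids; brightness is the inverse of the
# within-cluster distance sum; dimmer fireflies move toward brighter ones.
import numpy as np

def fitness(centers, X):
    # brightness: a lower total distance to the nearest center is better
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return 1.0 / (1.0 + d.min(axis=1).sum())

def firefly_clustering(X, k=3, n_fireflies=15, iters=50,
                       alpha=0.1, beta0=1.0, gamma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # each firefly is a flat vector of k centroids sampled from the data
    pop = X[rng.integers(0, len(X), size=(n_fireflies, k))]
    pop = pop.reshape(n_fireflies, -1)
    for _ in range(iters):
        bright = np.array([fitness(f.reshape(k, -1), X) for f in pop])
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if bright[j] > bright[i]:
                    # move firefly i toward the brighter firefly j
                    r2 = np.sum((pop[i] - pop[j]) ** 2)
                    beta = beta0 * np.exp(-gamma * r2)
                    pop[i] += beta * (pop[j] - pop[i]) \
                              + alpha * rng.normal(size=pop.shape[1])
            bright[i] = fitness(pop[i].reshape(k, -1), X)
        alpha *= 0.97  # damp the random walk over time
    return pop[np.argmax(bright)].reshape(k, -1)

# usage: recover three well-separated cluster centers in 2-D data
X = np.vstack([np.random.default_rng(s).normal(loc=m, size=(40, 2))
               for s, m in [(1, 0.0), (2, 4.0), (3, 8.0)]])
print(firefly_clustering(X, k=3).round(2))
```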

    Pattern Recognition

    A wealth of advanced pattern recognition algorithms is emerging from the interdisciplinary field between technologies of effective visual features and the human-brain cognition process. Effective visual features are made possible through rapid developments in appropriate sensor equipment, novel filter designs, and viable information processing architectures, while the understanding of the human-brain cognition process broadens the ways in which computers can perform pattern recognition tasks. The present book is intended to collect representative research from around the globe focusing on low-level vision, filter design, features and image descriptors, data mining and analysis, and biologically inspired algorithms. The 27 chapters covered in this book disclose recent advances and new ideas in promoting the techniques, technology, and applications of pattern recognition.

    A method for incremental discovery of financial event types based on anomaly detection

    Event datasets in the financial domain are often constructed around actual application scenarios, and their event types are weakly reusable due to scenario constraints; at the same time, the massive and diverse influx of new financial big data cannot be limited to the event types defined for specific scenarios. This limitation to a small number of event types does not meet the research needs of more complex tasks such as predicting major financial events and analysing their ripple effects. In this paper, a three-stage approach is proposed to accomplish incremental discovery of event types. Starting from an existing annotated financial event dataset and a set of financial event data mixing original and unknown event types, the approach works as follows: first, a semi-supervised deep clustering model with anomaly detection classifies the data into normal and abnormal events, where abnormal events are events that do not belong to the known types; then, normal events are tagged with the appropriate event types and abnormal events are clustered; finally, a cluster keyword extraction method recommends type names for the new event clusters, thus incrementally discovering new event types. The proposed method is effective in the incremental discovery of new event types on real datasets. Comment: 11 pages, 4 figures
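    A minimal sketch of the three-stage pipeline described above, with off-the-shelf stand-ins for each stage: TF-IDF features, scikit-learn's IsolationForest in place of the paper's semi-supervised deep clustering model with anomaly detection, KMeans for clustering the abnormal events, and top TF-IDF terms as the keyword-based type-name recommendation. All component choices here are assumptions for illustration.

```python
# Three stages: (1) split events into normal vs. abnormal,
# (2) cluster abnormal events into candidate new types,
# (3) recommend a type name from each cluster's top terms.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import IsolationForest
from sklearn.cluster import KMeans

def discover_event_types(texts, n_new_types=2, seed=0):
    vec = TfidfVectorizer(max_features=2000)
    X = vec.fit_transform(texts).toarray()

    # stage 1: flag events that do not fit the known-type distribution
    iso = IsolationForest(random_state=seed).fit(X)
    abnormal = iso.predict(X) == -1   # -1 marks outliers

    # stage 2: cluster the abnormal events into candidate new types
    Xa = X[abnormal]
    labels = KMeans(n_clusters=n_new_types, random_state=seed,
                    n_init=10).fit_predict(Xa)

    # stage 3: recommend a type name from each cluster's top terms
    terms = vec.get_feature_names_out()
    names = []
    for c in range(n_new_types):
        centroid = Xa[labels == c].mean(axis=0)
        top = np.argsort(centroid)[::-1][:3]
        names.append(" / ".join(terms[top]))
    return abnormal, labels, names

# usage: mask, labels, names = discover_event_types(corpus_texts)
```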

    Outlier Detection Methods for Industrial Applications

    An outlier is an observation (or measurement) that differs from the other values contained in a given dataset. Outliers can have several causes: the measurement can be incorrectly observed, recorded, or entered into the process computer; or the observed datum can come from a different population with respect to the normal situation and thus be correctly measured while representing a rare event. In the literature, different definitions of outlier exist; the most commonly cited are the following:
    - "An outlier is an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism" (Hawkins, 1980).
    - "An outlier is an observation (or subset of observations) which appears to be inconsistent with the remainder of the dataset" (Barnett & Lewis, 1994).
    - "An outlier is an observation that lies outside the overall pattern of a distribution" (Moore & McCabe, 1999).
    - "Outliers are those data records that do not follow any pattern in an application" (Chen et al., 2002).
    - "An outlier in a set of data is an observation or a point that is considerably dissimilar or inconsistent with the remainder of the data" (Ramaswamy et al., 2000).
    Many data mining algorithms try to minimize the influence of outliers, for instance on a final model to be developed, or to eliminate them in the data pre-processing phase. However, a data miner should be careful when automatically detecting and eliminating outliers because, if the data are correct, their elimination can cause the loss of important hidden information (Kantardzic, 2003). Some data mining applications focus on outlier detection, where it is the essential result of the data analysis (Sane & Ghatol, 2006). Outlier detection techniques find applications in credit card fraud, network robustness analysis, network intrusion detection, financial applications, and marketing (Han & Kamber, 2001). A more exhaustive list of applications that exploit outlier detection is provided below (Hodge, 2004):
    - Fraud detection: fraudulent applications for credit cards or state benefits, or fraudulent usage of credit cards or mobile phones.
    - Loan application processing: fraudulent applications or potentially problematical customers.
    - Intrusion detection, such as unauthorized access in computer networks.
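    As a concrete illustration of the distance-based view cited above from Ramaswamy et al. (2000), the following minimal sketch scores each point by its distance to its k-th nearest neighbor and flags the top-ranked points as outliers; the function name and parameters are illustrative assumptions.

```python
# Distance-based outlier scoring: points far from their k-th nearest
# neighbor are the most isolated and therefore the strongest outliers.
import numpy as np

def knn_outliers(X, k=5, n_outliers=3):
    # pairwise Euclidean distances
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # ignore self-distance
    # score: distance to the k-th nearest neighbor
    scores = np.sort(d, axis=1)[:, k - 1]
    return np.argsort(scores)[::-1][:n_outliers], scores

# usage: inject two rare events into a normal population
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(100, 2)),     # normal observations
               [[8.0, 8.0], [-7.0, 9.0]]])    # two rare events
idx, _ = knn_outliers(X, k=5, n_outliers=2)
print(idx)  # indices of the most isolated points (the injected ones)
```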