
    Origins of Modern Data Analysis Linked to the Beginnings and Early Development of Computer Science and Information Engineering

    The history of data analysis addressed here is underpinned by two themes: tabular data analysis and the analysis of collected heterogeneous data. "Exploratory data analysis" is taken as the heuristic approach that begins with data and information and seeks an underlying explanation for what is observed or measured. I also cover some of the evolving context of research and applications, including scholarly publishing, technology transfer and the economic relationship of the university to society. Comment: 26 pages

    Reducing the number of membership functions in linguistic variables

    Dissertation presented at Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia in fulfilment of the requirements for the Master's degree in Mathematics and Applications, specialization in Actuarial Sciences, Statistics and Operations Research. The purpose of this thesis was to develop algorithms to reduce the number of membership functions in a fuzzy linguistic variable. Groups of similar membership functions to be merged were found using clustering algorithms. By "summarizing" the information given by a group of similar membership functions into a new membership function, we obtain a smaller set of membership functions representing the same concept as the initial linguistic variable. The complexity of clustering problems makes it difficult for exact methods to solve them in practical time. Heuristic methods were therefore used to find good-quality solutions. A Scatter Search clustering algorithm was implemented in Matlab and compared to a variation of the K-Means algorithm. Computational results on two data sets are discussed. A case study is also presented, using linguistic variables from fuzzy inference systems constructed automatically from sensor data collected while drilling in different scenarios. With these systems already constructed, the task was to reduce the number of membership functions in their linguistic variables without losing performance. A hierarchical clustering algorithm relying on performance measures for the inference system was implemented in Matlab. It was possible not only to simplify the inference system by reducing the number of membership functions in each linguistic variable but also to improve its performance.
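
The idea of grouping similar membership functions and summarizing each group as one function can be illustrated with a minimal Python sketch. This is not the thesis's Scatter Search or K-Means variant (those were implemented in Matlab and are not reproduced here). It assumes triangular membership functions stored as hypothetical (left, peak, right) triples, groups them with a simple greedy threshold on peak position, and merges each group by averaging parameters; the function names and the threshold are illustrative assumptions.

```python
from statistics import mean

# Assumption: each membership function is triangular and stored as a
# (left, peak, right) triple. The abstract does not specify a representation.

def merge_group(group):
    """Summarize a group of triangular functions by averaging each parameter."""
    return tuple(mean(mf[i] for mf in group) for i in range(3))

def reduce_membership_functions(mfs, threshold):
    """Greedy grouping (a stand-in for a real clustering algorithm).

    Functions are sorted by peak; a function joins the current group when
    its peak lies within `threshold` of the first peak in that group,
    otherwise it starts a new group. Each group is then merged into one.
    """
    groups = []
    for mf in sorted(mfs, key=lambda m: m[1]):
        if groups and mf[1] - groups[-1][0][1] <= threshold:
            groups[-1].append(mf)
        else:
            groups.append([mf])
    return [merge_group(g) for g in groups]

# Toy linguistic variable with three membership functions, two of them similar.
mfs = [(0.0, 1.0, 2.0), (0.5, 1.5, 2.5), (4.0, 5.0, 6.0)]
reduced = reduce_membership_functions(mfs, threshold=1.0)
```

In this toy case the first two functions are merged and the third is left alone. A performance-driven variant, as in the case study, would instead accept a merge only if the inference system's performance does not degrade.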

    Some new techniques for pattern recognition research and lung sound signal analysis

    This thesis describes the results of a collaborative research programme between the Department of Electronics & Electrical Engineering, University of Glasgow, and the Centre for Respiratory Investigation, Glasgow Royal Infirmary. The research was initially aimed at studying lung sound using signal processing and pattern recognition techniques. The use of pattern recognition techniques was largely confined to exploratory data analysis, which led to an interest in the methods themselves. A study was carried out to apply recent research in computational geometry to clustering. Two geometric structures, the Gabriel graph and the relative neighbourhood graph, are both defined by a region of influence. A generalization of these graphs is used to find the conditions under which graphs defined by a region of influence are connected and planar. The Gabriel graph may be considered to be just planar and the relative neighbourhood graph to be just connected. From this, two variable regions of influence were defined that were aimed at producing disconnected graphs and hence a partitioning of the data set. A hierarchic clustering based on relative distance may be generated by varying the size of the region of influence. The value of the clustering method is examined in terms of admissibility criteria and by a case study. An interactive display to complement the graph theoretical clustering was also developed. This display allows a partition in the clustering to be examined. The relationship between clusters in the partition may be studied by using the partition to define a contracted graph which is then displayed. Subgraphs of the original graph may be used to provide displays of individual clusters. This display should provide additional information about a partition and hence allow the user to understand the data better. The remainder of the work in this thesis concerns the application of pattern recognition techniques to the analysis of lung sound signals.
Breath sound was analysed using frequency domain methods since it is basically a continuous signal. Initially, a rather ad hoc method was used for feature extraction, based on a piecewise constant approximation to the amplitude spectrum. While this method provided a useful set of features, it is clear that more systematic methods are required. These methods were used to study lung sound in four groups of patients: (1) normal patients, (2) patients with asbestosis, (3) patients with cryptogenic fibrosing alveolitis (CFA) and (4) patients with interstitial pulmonary oedema. The data sets were analysed using principal components analysis and the new graph theoretical clustering method (this data was used as a case study for the clustering method). Three groups of patients could be identified from the data: (a) normal subjects, (b) patients with fibrosis of the lungs (asbestosis & CFA) and (c) patients with pulmonary oedema. These results suggest that lung sound may be able to make a useful contribution to non-invasive diagnosis. However, more extensive studies are required before the real value of lung sound in diagnosis is established.
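
For readers unfamiliar with the two graphs named in this abstract, the following Python sketch builds both from their standard region-of-influence definitions. It is a brute-force illustration only, not the thesis's generalized or variable regions of influence, and the function names are assumptions. An edge joins two points in the Gabriel graph when no third point lies inside the disc whose diameter is the segment between them; in the relative neighbourhood graph the region is the lune, so an edge is kept when no third point is closer to both endpoints than they are to each other.

```python
from itertools import combinations

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def gabriel_graph(points):
    """Edges (i, j) whose diameter disc contains no third point in its interior."""
    edges = set()
    for i, j in combinations(range(len(points)), 2):
        d_ij = sq_dist(points[i], points[j])
        # k is strictly inside the disc iff d(i,k)^2 + d(j,k)^2 < d(i,j)^2.
        # Points exactly on the circle do not block the edge in this sketch.
        if all(
            sq_dist(points[i], points[k]) + sq_dist(points[j], points[k]) >= d_ij
            for k in range(len(points)) if k not in (i, j)
        ):
            edges.add((i, j))
    return edges

def relative_neighbourhood_graph(points):
    """Edges (i, j) whose lune contains no third point in its interior."""
    edges = set()
    for i, j in combinations(range(len(points)), 2):
        d_ij = sq_dist(points[i], points[j])
        # k is strictly inside the lune iff it is closer to both i and j
        # than i and j are to each other.
        if all(
            max(sq_dist(points[i], points[k]), sq_dist(points[j], points[k])) >= d_ij
            for k in range(len(points)) if k not in (i, j)
        ):
            edges.add((i, j))
    return edges

# Toy data: an isosceles triangle whose base is its longest side.
points = [(0, 0), (4, 0), (2, 3)]
gabriel = gabriel_graph(points)
rng = relative_neighbourhood_graph(points)
```

On this triangle the Gabriel graph keeps all three edges, while the relative neighbourhood graph drops the base, since the apex lies inside the base's lune but outside its disc. The lune contains the disc, so every relative neighbourhood edge is also a Gabriel edge. Both loops are cubic in the number of points, which is acceptable only for small data sets.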

    Clustering in massive data sets

    We review the time and storage costs of search and clustering algorithms. We illustrate these with case studies in astronomy, information retrieval, visual user interfaces, chemical databases, and other areas. Theoretical results developed as far back as the 1960s still very often remain topical. More recent work is also covered in this article. This includes a solution for the statistical question of how many clusters there are in a dataset. We also look at one line of inquiry in the use of clustering for human-computer user interfaces. Finally, the visualization of data leads to the consideration of data arrays as images, and we speculate on future results to be expected here.

    Dissimilarity-based algorithms for selecting structurally diverse sets of compounds

    This paper commences with a brief introduction to modern techniques for the computational analysis of molecular diversity and the design of combinatorial libraries. It then reviews dissimilarity-based algorithms for the selection of structurally diverse sets of compounds in chemical databases. Procedures are described for selecting a diverse subset of an entire database, and for selecting diverse combinatorial libraries using both reagent-based and product-based selection.
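
As an illustration of what a dissimilarity-based selection procedure can look like, here is a minimal Python sketch of the widely used MaxMin heuristic. The abstract does not say which algorithms the paper reviews, so this is offered as one common example and not as the paper's method. It assumes compounds are represented as sets of hypothetical feature identifiers and uses Tanimoto dissimilarity; the function names and toy fingerprints are illustrative.

```python
def tanimoto_dissimilarity(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two feature sets (0.0 if both are empty)."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union

def maxmin_select(items, k, dissimilarity, seed_index=0):
    """Pick up to k indices; each new pick maximizes its dissimilarity
    to the nearest item already selected (the MaxMin criterion)."""
    if not items:
        return []
    selected = [seed_index]
    # min_dist[i] holds the dissimilarity from item i to its nearest selected item.
    min_dist = [dissimilarity(items[seed_index], x) for x in items]
    while len(selected) < min(k, len(items)):
        candidate = max(
            (i for i in range(len(items)) if i not in selected),
            key=lambda i: min_dist[i],
        )
        selected.append(candidate)
        for i, x in enumerate(items):
            min_dist[i] = min(min_dist[i], dissimilarity(items[candidate], x))
    return selected

# Toy "fingerprints": sets of made-up feature identifiers.
fingerprints = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}]
picked = maxmin_select(fingerprints, k=3, dissimilarity=tanimoto_dissimilarity)
```

Starting from the first compound, the sketch next picks the compound sharing no features with it, then the remaining compound farthest from both. The result depends on the seed, so in practice the seed is often chosen at random or as the compound most dissimilar to the rest.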

    AI Solutions for MDS: Artificial Intelligence Techniques for Misuse Detection and Localisation in Telecommunication Environments

    This report considers the application of Artificial Intelligence (AI) techniques to the problem of misuse detection and misuse localisation within telecommunications environments. A broad survey of techniques is provided that covers, inter alia, rule based systems, model-based systems, case based reasoning, pattern matching, clustering and feature extraction, artificial neural networks, genetic algorithms, artificial immune systems, agent based systems, data mining and a variety of hybrid approaches. The report then considers the central issue of event correlation, which is at the heart of many misuse detection and localisation systems. The notion of being able to infer misuse by the correlation of individual temporally distributed events within a multiple data stream environment is explored, along with a range of techniques covering model based approaches, 'programmed' AI and machine learning paradigms. It is found that, in general, correlation is best achieved via rule based approaches, but that these suffer from a number of drawbacks, such as the difficulty of developing and maintaining an appropriate knowledge base, and the lack of ability to generalise from known misuses to new unseen misuses. Two distinct approaches are evident. One attempts to encode knowledge of known misuses, typically within rules, and use this to screen events. This approach cannot generally detect misuses for which it has not been programmed, i.e. it is prone to issuing false negatives. The other attempts to 'learn' the features of event patterns that constitute normal behaviour, and, by observing patterns that do not match expected behaviour, detect when a misuse has occurred. This approach is prone to issuing false positives, i.e. inferring misuse from innocent patterns of behaviour that the system was not trained to recognise.
Contemporary approaches are seen to favour hybridisation, often combining detection or localisation mechanisms for both abnormal and normal behaviour, the former to capture known cases of misuse, the latter to capture unknown cases. In some systems, these mechanisms even work together to update each other to increase detection rates and lower false positive rates. It is concluded that hybridisation offers the most promising future direction, but that a rule or state based component is likely to remain, being the most natural approach to the correlation of complex events. The challenge, then, is to mitigate the weaknesses of canonical programmed systems such that learning, generalisation and adaptation are more readily facilitated.