Origins of Modern Data Analysis Linked to the Beginnings and Early Development of Computer Science and Information Engineering
The history of data analysis addressed here is underpinned by two themes:
tabular data analysis, and the analysis of collected heterogeneous data.
"Exploratory data analysis" is taken as the heuristic approach that begins with
data and information and seeks underlying explanations for what is observed or
measured. I also cover some of the evolving context of research and
applications, including scholarly publishing, technology transfer, and the
economic relationship of the university to society.
Reducing the number of membership functions in linguistic variables
Dissertation presented at Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia in fulfilment of the requirements for the Masters degree in Mathematics and Applications, specialization in Actuarial Sciences, Statistics and Operations Research

The purpose of this thesis was to develop algorithms to reduce the number of
membership functions in a fuzzy linguistic variable. Groups of similar membership
functions to be merged were found using clustering algorithms. By “summarizing” the
information given by a similar group of membership functions into a new membership
function we obtain a smaller set of membership functions representing the same
concept as the initial linguistic variable.
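The merging idea can be sketched in a few lines. This is an illustrative Python sketch, not the thesis code (which used Matlab): triangular membership functions are represented as (left, peak, right) triples, similar functions are grouped by peak proximity with a hypothetical tolerance, and each group is summarized into one representative function.

```python
# Illustrative sketch of merging similar triangular membership functions.
# Each MF is a (left, peak, right) triple; `tol` is a made-up threshold,
# standing in for the clustering step described in the abstract.
def merge_group(mfs):
    """Summarize a group of similar MFs into one representative MF."""
    lefts, peaks, rights = zip(*mfs)
    # One simple choice: keep the union of the supports and the mean peak.
    return (min(lefts), sum(peaks) / len(peaks), max(rights))

def reduce_mfs(mfs, tol=0.15):
    """Greedy single-pass grouping of MFs whose peaks lie within `tol`."""
    groups = []
    for mf in sorted(mfs, key=lambda m: m[1]):
        if groups and abs(mf[1] - groups[-1][-1][1]) <= tol:
            groups[-1].append(mf)
        else:
            groups.append([mf])
    return [merge_group(g) for g in groups]
```

The greedy grouping here stands in for the Scatter Search and K-Means variants discussed below; any clustering of the membership functions can feed the same `merge_group` summarization.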
The complexity of clustering problems makes it difficult for exact methods to solve them in practical time. Heuristic methods were therefore used to find good quality solutions. A Scatter Search clustering algorithm was implemented in Matlab and compared to a variation of the K-Means algorithm. Computational results on two data sets are discussed.
A case study with linguistic variables belonging to a fuzzy inference system
automatically constructed from data collected by sensors while drilling in different scenarios is also studied. With these systems already constructed, the task was to reduce the number of membership functions in their linguistic variables without losing performance. A hierarchical clustering algorithm relying on performance measures for the inference system was implemented in Matlab. It was possible not only to simplify the inference system by reducing the number of membership functions in each linguistic variable but also to improve its performance.
Some new techniques for pattern recognition research and lung sound signal analysis
This thesis describes the results of a collaborative research programme between the Department of Electronics & Electrical Engineering, University of Glasgow, and the Centre for Respiratory Investigation, Glasgow Royal Infirmary. The research was initially aimed at studying lung sound using signal processing and pattern recognition techniques. The use of pattern recognition techniques was largely confined to exploratory data analysis, which led to an interest in the methods themselves.

A study was carried out to apply recent research in computational geometry to clustering. Two geometric structures, the Gabriel graph and the relative neighbourhood graph, are both defined by a region of influence. A generalization of these graphs is used to find the conditions under which graphs defined by a region of influence are connected and planar. The Gabriel graph may be considered to be just planar and the relative neighbourhood graph to be just connected. From this, two variable regions of influence were defined that were aimed at producing disconnected graphs and hence a partitioning of the data set. A hierarchic clustering based on relative distance may be generated by varying the size of the region of influence. The value of the clustering method is examined in terms of admissibility criteria and by a case study.

An interactive display to complement the graph-theoretical clustering was also developed. This display allows a partition in the clustering to be examined. The relationship between clusters in the partition may be studied by using the partition to define a contracted graph, which is then displayed. Subgraphs of the original graph may be used to provide displays of individual clusters. This display should provide additional information about a partition and hence allow the user to understand the data better. The remainder of the work in this thesis concerns the application of pattern recognition techniques to the analysis of lung sound signals.
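The two regions of influence have compact standard definitions: points p and q are Gabriel neighbours iff the disc with diameter pq contains no other point, and relative neighbours iff no point is closer to both p and q than they are to each other. A minimal brute-force sketch (O(n³), for illustration only, not the thesis implementation):

```python
from itertools import combinations

def dist2(p, q):
    """Squared Euclidean distance between 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def gabriel_edges(points):
    """Edge (i, j) iff the disc with diameter p_i p_j contains no other point.
    Empty-disc test: d(i,k)^2 + d(j,k)^2 >= d(i,j)^2 for every other k."""
    edges = []
    for i, j in combinations(range(len(points)), 2):
        d_ij = dist2(points[i], points[j])
        if all(dist2(points[i], points[k]) + dist2(points[j], points[k]) >= d_ij
               for k in range(len(points)) if k not in (i, j)):
            edges.append((i, j))
    return edges

def rng_edges(points):
    """Edge (i, j) iff no point k is nearer to both endpoints (empty lune)."""
    edges = []
    for i, j in combinations(range(len(points)), 2):
        d_ij = dist2(points[i], points[j])
        if all(max(dist2(points[i], points[k]), dist2(points[j], points[k])) >= d_ij
               for k in range(len(points)) if k not in (i, j)):
            edges.append((i, j))
    return edges
```

Since the lune contains the diametral disc, every relative-neighbourhood edge is also a Gabriel edge, which matches the "just connected" versus "just planar" ordering described above; shrinking or growing the region of influence deletes or adds edges and so induces the hierarchic clustering.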
Breath sound was analysed using frequency-domain methods since it is basically a continuous signal. Initially, a rather ad hoc method was used for feature extraction, based on a piecewise constant approximation to the amplitude spectrum. While this method provided a useful set of features, it is clear that more systematic methods are required. These methods were used to study lung sound in four groups of patients: (1) normal patients, (2) patients with asbestosis, (3) patients with cryptogenic fibrosing alveolitis (CFA) and (4) patients with interstitial pulmonary oedema. The data sets were analysed using principal components analysis and the new graph-theoretical clustering method (this data was used as a case study for the clustering method). Three groups of patients could be identified from the data: (a) normal subjects, (b) patients with fibrosis of the lungs (asbestosis & CFA) and (c) patients with pulmonary oedema. These results suggest that lung sound may be able to make a useful contribution to non-invasive diagnosis. However, more extensive studies are required before the real value of lung sound in diagnosis is established.
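A piecewise constant approximation of the amplitude spectrum amounts to averaging spectral magnitude over a small number of bands. A hedged sketch of that style of feature extraction (the band count and equal-width bands are assumptions, not the thesis's actual parameters):

```python
import numpy as np

def band_features(signal, n_bands=8):
    """Piecewise-constant approximation of the amplitude spectrum:
    mean spectral magnitude within each of `n_bands` equal-width bands."""
    spectrum = np.abs(np.fft.rfft(signal))          # one-sided amplitude spectrum
    bands = np.array_split(spectrum, n_bands)       # contiguous frequency bands
    return np.array([b.mean() for b in bands])      # one feature per band
```

Each signal is thereby reduced to a short feature vector, which is the form of input that principal components analysis and the clustering method described above expect.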
Clustering in massive data sets
We review the time and storage costs of search and clustering algorithms. We exemplify these, based on case studies in astronomy, information retrieval, visual user interfaces, chemical databases, and other areas. Theoretical results developed as far back as the 1960s still very often remain topical. More recent work is also covered in this article. This includes a solution for the statistical question of how many clusters there are in a dataset. We also look at one line of inquiry in the use of clustering for human-computer user interfaces. Finally, the visualization of data leads to the consideration of data arrays as images, and we speculate on future results to be expected here.
Dissimilarity-based algorithms for selecting structurally diverse sets of compounds
This paper commences with a brief introduction to modern techniques for the computational analysis of molecular diversity and the design of combinatorial libraries. It then reviews dissimilarity-based algorithms for the selection of structurally diverse sets of compounds in chemical databases. Procedures are described for selecting a diverse subset of an entire database, and for selecting diverse combinatorial libraries using both reagent-based and product-based selection.
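A common dissimilarity-based selection scheme of the kind such reviews cover is MaxMin: start from a seed compound and repeatedly add the compound whose nearest already-selected neighbour is furthest away. A minimal sketch, assuming a precomputed symmetric distance matrix (the seed choice and inputs are illustrative, not taken from the paper):

```python
def maxmin_select(dist, n_select, seed=0):
    """MaxMin dissimilarity-based subset selection (sketch).
    `dist` is a symmetric n x n distance matrix (list of lists)."""
    n = len(dist)
    selected = [seed]
    # Distance from every item to its nearest selected item so far.
    nearest = list(dist[seed])
    while len(selected) < n_select:
        # Pick the unselected item furthest from the selected set.
        pick = max(range(n),
                   key=lambda i: -1 if i in selected else nearest[i])
        selected.append(pick)
        nearest = [min(nearest[i], dist[pick][i]) for i in range(n)]
    return selected
```

Keeping the running `nearest` array makes each iteration O(n), so selecting k compounds costs O(kn) after the distance matrix is available, which is what makes this family of greedy algorithms practical on whole databases.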
AI Solutions for MDS: Artificial Intelligence Techniques for Misuse Detection and Localisation in Telecommunication Environments
This report considers the application of Artificial Intelligence (AI) techniques to
the problem of misuse detection and misuse localisation within telecommunications
environments. A broad survey of techniques is provided, covering inter alia
rule-based systems, model-based systems, case-based reasoning, pattern matching,
clustering and feature extraction, artificial neural networks, genetic algorithms,
artificial immune systems, agent-based systems, data mining, and a variety of hybrid
approaches. The report then considers the central issue of event correlation, which
is at the heart of many misuse detection and localisation systems. The notion of
being able to infer misuse by correlating individual, temporally distributed
events within a multiple-data-stream environment is explored, and a range of
techniques is surveyed, covering model-based approaches, `programmed' AI and
machine-learning paradigms. It is found that, in general, correlation is best achieved via rule-based approaches,
but that these suffer from a number of drawbacks, such as the difficulty of
developing and maintaining an appropriate knowledge base, and the inability
to generalise from known misuses to new, unseen misuses. Two distinct approaches
are evident. One attempts to encode knowledge of known misuses, typically within
rules, and use this to screen events. This approach cannot generally detect misuses
for which it has not been programmed, i.e. it is prone to issuing false negatives.
The other attempts to `learn' the features of event patterns that constitute normal
behaviour, and, by observing patterns that do not match expected behaviour, detect
when a misuse has occurred. This approach is prone to issuing false positives,
i.e. inferring misuse from innocent patterns of behaviour that the system was not
trained to recognise. Contemporary approaches are seen to favour hybridisation,
often combining detection or localisation mechanisms for both abnormal and normal
behaviour, the former to capture known cases of misuse, the latter to capture
unknown cases. In some systems, these mechanisms even work together to update
each other to increase detection rates and lower false positive rates. It is concluded
that hybridisation offers the most promising future direction, but that a rule- or
state-based component is likely to remain, being the most natural approach to the
correlation of complex events. The challenge, then, is to mitigate the weaknesses of
canonical programmed systems such that learning, generalisation and adaptation
are more readily facilitated.
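The two complementary mechanisms described above, a signature component that screens for known misuse patterns and an anomaly component that flags departures from a learned profile of normal behaviour, can be sketched in miniature. Everything here is a toy illustration: the event names, the single windowed signature, and the frequency threshold are all invented for the example, not taken from the report.

```python
# Hedged toy sketch of the hybrid idea: signatures catch known misuse
# (prone to false negatives on novel attacks), the learned profile catches
# rare/unseen events (prone to false positives on innocent novelty).
from collections import Counter

KNOWN_MISUSE = {("login_fail", "login_fail", "login_fail")}  # illustrative signature

def train_profile(normal_streams):
    """Learn relative frequencies of events from normal-behaviour streams."""
    counts, total = Counter(), 0
    for stream in normal_streams:
        counts.update(stream)
        total += len(stream)
    return {event: c / total for event, c in counts.items()}

def detect(stream, profile, window=3, threshold=0.01):
    """Slide a window over the event stream; signature check first,
    then flag windows containing events rare under the normal profile."""
    alerts = []
    for i in range(len(stream) - window + 1):
        w = tuple(stream[i:i + window])
        if w in KNOWN_MISUSE:
            alerts.append(("signature", i))   # known misuse pattern
        elif any(profile.get(e, 0.0) < threshold for e in w):
            alerts.append(("anomaly", i))     # departure from normal behaviour
    return alerts
```

In the hybrid systems the report favours, the two branches would additionally feed back into each other, e.g. confirmed anomalies being promoted into new signatures.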