    Data mining using neural networks

    Data mining is the search for relationships and global patterns in large and growing databases. It benefits anyone who holds large amounts of data, for example customer, business, transaction, marketing, financial, manufacturing and web data. The results of data mining are often referred to as knowledge, in the form of rules, regularities and constraints. Rule mining is one of the most popular data mining methods because rules provide concise statements of potentially important information that end users can easily understand and act on. Rule mining has received a good deal of attention from data mining researchers because it can address many data mining problems, such as classification, association, customer profiling, summarization and segmentation. This thesis makes several contributions by proposing rule mining methods using genetic algorithms and neural networks. The thesis first proposes rule mining methods using a genetic algorithm. These methods are based on an integrated framework and are capable of mining three major classes of rules. Moreover, the rule mining processes in these methods are controlled by tuning two data mining measures, support and confidence. The thesis shows how to build predictive data mining models from the rules produced by the proposed methods. Another key contribution of the thesis is the proposal of rule mining methods using supervised neural networks. The thesis mathematically analyses the Widrow-Hoff learning algorithm of a single-layered neural network, which provides a foundation for rule mining algorithms using single-layered neural networks. On the basis of the proposed theorems, three rule mining algorithms using single-layered neural networks are proposed for the three major classes of rules. The thesis also looks at the problem of rule mining where user guidance is absent. The thesis proposes a guided rule mining system to overcome this problem. The thesis extends this work further by comparing the performance of the algorithm used in the proposed guided rule mining system with that of the Apriori data mining algorithm. Finally, the thesis studies the Kohonen self-organizing map as an unsupervised neural network for rule mining algorithms. Two approaches are adopted, which differ in how the self-organizing map is applied in the rule mining model. In the first approach, the self-organizing map is used for clustering, which provides class information to the rule mining process. In the second approach, automated rule mining takes the place of trained neurons as it grows in a hierarchical structure.
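
    The support and confidence measures named in the abstract above can be illustrated with a minimal sketch. It uses the standard textbook definitions, not the thesis's own algorithms; the function names and the toy transactions are assumptions made for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    base = support(transactions, antecedent)
    return support(transactions, both) / base if base > 0 else 0.0


# Hypothetical toy data: each transaction is a set of purchased items.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

# Rule under test: bread -> milk
rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
```

    A rule miner would typically keep only rules whose support and confidence exceed user-chosen thresholds; tuning those thresholds is the kind of control the abstract refers to.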

    Fuzzy Logic

    Fuzzy logic is becoming an essential method for solving problems in all domains. It has had a tremendous impact on the design of autonomous intelligent systems. The purpose of this book is to introduce hybrid algorithms, techniques, and implementations of fuzzy logic. The book consists of thirteen chapters highlighting the models and principles of fuzzy logic and issues concerning its techniques and implementations. The intended readers of this book are engineers, researchers, and graduate students interested in fuzzy logic systems.

    Self-organizing maps in decision support: a prototype of a decision support system

    Industry and business are full of complicated decision-making processes in which there is a high probability of human error. These processes are especially critical in the operation of nuclear power plants. The quality of decisions can be improved and the probability of errors reduced by providing computerized decision support for the decision maker. The Self-Organizing Map (SOM) is a useful method for visualizing large, high-dimensional datasets. The aim of this work is to find approaches for using SOM to support the decision making of nuclear power plant operators, and to analyze whether these approaches can be used for decision support in other application areas. The research methods are experimental prototyping, data mining and a literature survey. A prototype of a decision support system (DSS) platform (DERSI) has been developed. The prototype uses a collection of methods for decision support, including SOM quantization error, the SOM U-matrix, fuzzy logic, rule-based reasoning and case-based reasoning. It is programmed in the Matlab programming language and uses the SOM Toolbox add-on for Matlab. It has a graphical user interface that contains visualizations of the methods used. Two DSS prototype units have been built on this platform. One uses data from a Simulink simulation model and the other uses data from the Teollisuuden Voima (TVO) nuclear power plant simulator. These prototype units demonstrate the potential of the prototype methods. Other approaches for using SOM in decision support were found in the literature. The thesis compares these approaches with the prototype methods and discusses the possible use of the prototype platform in other application areas.
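
    The SOM quantization error listed among the prototype's methods can be sketched as follows. This is a minimal illustration of the usual definition (the distance from a sample to its best-matching map unit), written in Python rather than Matlab; it is not DERSI's implementation, and the function names and toy codebook are assumptions.

```python
import math


def best_matching_unit(codebook, x):
    """Index of the codebook (map unit) vector closest to x, by Euclidean distance."""
    return min(range(len(codebook)), key=lambda i: math.dist(codebook[i], x))


def quantization_error(codebook, x):
    """Distance from sample x to its best-matching unit."""
    return math.dist(codebook[best_matching_unit(codebook, x)], x)


def mean_quantization_error(codebook, samples):
    """Average quantization error over a set of samples."""
    return sum(quantization_error(codebook, s) for s in samples) / len(samples)


# Hypothetical trained map with two units in a 2-D input space.
codebook = [(0.0, 0.0), (1.0, 1.0)]

near = quantization_error(codebook, (0.0, 0.1))  # close to a known unit
far = quantization_error(codebook, (3.0, 4.0))   # far from every unit
```

    A sample with a large quantization error lies far from anything the map was trained on, which is why the measure is commonly read as a deviation or novelty indicator.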

    Self-organizing maps for the evaluation of doctoral thesis research: the case of the Didactics of Social Sciences in Spain

    The main objective of this paper is to demonstrate that neural networks of the self-organizing map type are a potentially clarifying tool for the treatment, analysis and visualization of scientometric data. The case studied is the set of Spanish doctoral theses on the teaching of Social Sciences indexed in the TESEO database and defended between 1976 and 2014. A census of 301 doctoral theses was retrieved and analyzed by autonomous community (Andalusia and Catalonia), five-year period, thematic category and educational stage. In Andalusia, research has concentrated on the stages of Primary and Secondary Education and on the theme of Didactics of Geography. Dissertation production there was highest in the five-year periods 1986-1990 and 2001-2005. In Catalonia, research deals mainly with the stages of Secondary and Higher Education and with the theme of Didactics of Social Sciences. The most productive five-year periods in Catalonia were 1991-1995, 1996-2000, 2001-2005 and 2006-2010.

    Process Flow Features as a Host-based Event Knowledge Representation

    The detection of malware is of great importance, but even non-malicious software can be used for malicious purposes. Monitoring processes and their associated information can characterize normal behavior and help identify malicious processes, or malicious use of normal processes, by measuring deviations from the learned baseline. This exploratory research describes a novel host feature generation process that calculates statistics of an executing process during a window of time called a process flow. Process flows are calculated from key process data structures extracted from computer memory using virtual machine introspection. The flow features are clustered using k-means, and each resulting cluster represents a behavior that all members of the cluster exhibit. Testing explores associations between behavior and process flows that may in the future be useful for detecting unauthorized behavior or behavioral trends on a host. Analysis of two data collections demonstrates that this novel way of thinking of process behavior as process flows can produce baseline models, in the form of clusters, that do represent specific behaviors.
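
    The clustering step described above can be sketched with a small k-means (Lloyd's algorithm) implementation. The feature vectors, their meaning, and the deterministic initialization are all assumptions made for illustration; the abstract does not specify the actual flow features or the k-means configuration.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns (labels, centroids).

    Initialization simply takes the first k points as centroids so the
    result is deterministic (an illustrative choice, not a recommendation).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical flow features: (mean thread count, mean open handles) per window.
flows = [(1.0, 2.0), (9.0, 10.0), (1.2, 1.8), (8.8, 10.2), (0.9, 2.1)]
labels, centroids = kmeans(flows, k=2)
```

    Once clusters exist, a new flow that lies far from every centroid could be treated as a deviation from the learned baseline, which is the kind of use the abstract anticipates.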


    Continuous advances in modern data collection techniques give spatial scientists access to massive, high-resolution spatial and spatio-temporal data. There is therefore an urgent need to develop effective and efficient methods for finding unknown and useful information embedded in datasets of unprecedented size (e.g., millions of observations), high dimensionality (e.g., hundreds of variables), and complexity (e.g., heterogeneous data sources, space–time dynamics, multivariate connections, explicit and implicit spatial relations and interactions). Responding to this line of development, this research focuses on the use of the association rule (AR) mining technique in a geospatial knowledge discovery process. Prior attempts have sidestepped the complexity of the spatial dependence structure embedded in the studied phenomenon. Thus, adopting association rule mining in spatial analysis is rather problematic. Interestingly, a very similar predicament afflicts spatial regression analysis, in which a spatial weight matrix is assigned a priori, without validation on the specific domain of application. In addition, a dependable geospatial knowledge discovery process requires algorithms that support automatic, robust and accurate procedures for evaluating mined results. Surprisingly, this has received little attention in the context of spatial association rule mining. To remedy these deficiencies, the foremost goal of this research is to construct a comprehensive geospatial knowledge discovery framework that uses spatial association rule mining to detect spatial patterns embedded in geospatial databases, and to demonstrate its application within the domain of crime analysis. It is the first attempt at delivering a complete geospatial knowledge discovery framework using spatial association rule mining.

    Knowledge representation and text mining in biomedical, healthcare, and political domains

    Knowledge representation and text mining can be employed to discover new knowledge and develop services by using the massive amounts of text gathered by modern information systems. The applied methods should take into account the domain-specific nature of knowledge. This thesis explores knowledge representation and text mining in three application domains. Biomolecular events can be described very precisely and concisely with appropriate representation schemes. Protein–protein interactions are commonly modelled in biological databases as binary relationships, whereas the complex relationships used in text mining are rich in information. The experimental results of this thesis show that complex relationships can be reduced to binary relationships and that it is possible to reconstruct complex relationships from mixtures of linguistically similar relationships. This encourages the extraction of complex relationships from the scientific literature even if binary relationships are required by the application at hand. The experimental results on cross-validation schemes for pair-input data help clarify how existing knowledge regarding dependent instances (such as those concerning protein–protein pairs) can be leveraged to improve the generalisation performance estimates of learned models. Healthcare documents and news articles contain knowledge that is more difficult to model than biomolecular events, and they tend to have larger vocabularies than biomedical scientific articles. This thesis describes an ontology that models patient education documents and their content in order to improve the availability and quality of such documents. The experimental results of this thesis also show that the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) measures are a viable option for the automatic evaluation of textual patient record summarisation methods and that the area under the receiver operating characteristic curve can be used in a large-scale sentiment analysis. The sentiment analysis of Reuters news corpora suggests that the Western mainstream media portrays China negatively in politics-related articles but not in general, which provides new evidence to consider in the debate over the image of China in the Western media.
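
    The area under the receiver operating characteristic curve mentioned in the abstract above can be computed directly from its pairwise interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below is a generic illustration, not the evaluation code of the thesis; the function name and the toy scores are assumptions.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    `labels` holds 1 for positive and 0 for negative examples; `scores`
    holds the classifier's score for each example. Ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical sentiment scores for four documents (1 = positive sentiment).
example_labels = [1, 0, 1, 0]
example_scores = [0.9, 0.8, 0.4, 0.1]
auc = roc_auc(example_labels, example_scores)
```

    The double loop is quadratic in the number of examples, so a large-scale evaluation would normally use a rank-based formulation or a library routine instead; the pairwise form is shown because it makes the definition explicit.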

    Pattern Recognition

    A wealth of advanced pattern recognition algorithms is emerging from the intersection of technologies for effective visual features and the study of the human-brain cognition process. Effective visual features are made possible by rapid developments in sensor equipment, novel filter designs, and viable information processing architectures, while a better understanding of the human-brain cognition process broadens the ways in which computers can perform pattern recognition tasks. The present book is intended to collect representative research from around the globe focusing on low-level vision, filter design, features and image descriptors, data mining and analysis, and biologically inspired algorithms. The 27 chapters covered in this book disclose recent advances and new ideas in promoting the techniques, technology and applications of pattern recognition.