
    Approximation contexts in addressing graph data structures

    While the application of machine learning algorithms to practical problems has expanded from fixed-size input data to sequences, trees and graphs, the composition of learning systems has developed from single models to integrated ones. Recent advances in graph-based learning algorithms include the SOMSD (Self-Organizing Map for Structured Data), the PMGraphSOM (Probability Measure Graph Self-Organizing Map), the GNN (Graph Neural Network) and the GLSVM (Graph Laplacian Support Vector Machine). A main motivation of this thesis is to investigate whether such algorithms, individually, modified, or in various combinations, provide better performance than more traditional artificial neural networks or kernel machine methods on some challenging practical problems. More succinctly, this thesis seeks to answer the main research question: when, or under what conditions and contexts, can graph-based models be adjusted and tailored to be most efficacious in terms of predictive or classification performance on challenging practical problems? A range of sub-questions emerges: how to craft an effective neural learning system that integrates several graph-based and non-graph-based models; how to integrate various graph-based and non-graph-based kernel machine algorithms; how to enhance the capability of the integrated model on challenging problems; and how to tackle the long-term dependency issues that degrade the performance of layer-wise graph-based neural systems. This thesis answers these questions. Recent research on multi-stage learning models has demonstrated the efficacy of multiple layers of alternating unsupervised and supervised learning, which underlies the very successful front-end feature extraction techniques in deep neural networks. However, much remains to be explored concerning the number of layers required and the types of unsupervised or supervised learning models to use. Such issues have not been considered so far when the underlying input data is structured as a graph. We explore empirically the capabilities of models of increasing complexity: an unsupervised learning algorithm (SOM or PMGraphSOM), with or without a cascade connection to a multilayer perceptron, and with or without multiple subsequent GNN layers. These studies explore the effects of including or ignoring context. A parallel empirical study involving kernel machines with or without graph inputs has also been conducted.
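
    A minimal sketch of the cascade idea described above, assuming a plain (non-graph) SOM front end and an MLP back end: the SOM is trained without labels, and distances to its codebook vectors become the features for a supervised classifier. The data, grid size, decay schedules and scikit-learn usage are illustrative assumptions, not the thesis's actual models (which also handle graph-structured inputs).

        # Illustrative sketch: unsupervised SOM features feeding a supervised MLP.
        import numpy as np
        from sklearn.neural_network import MLPClassifier

        def train_som(X, grid=(8, 8), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
            rng = np.random.default_rng(seed)
            n_units = grid[0] * grid[1]
            W = rng.normal(size=(n_units, X.shape[1]))           # codebook vectors
            coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])])
            T, t = epochs * len(X), 0
            for _ in range(epochs):
                for x in rng.permutation(X):
                    lr = lr0 * (1 - t / T)                        # decaying learning rate
                    sigma = sigma0 * (1 - t / T) + 1e-3           # shrinking neighbourhood
                    bmu = np.argmin(((W - x) ** 2).sum(axis=1))   # best-matching unit
                    d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
                    h = np.exp(-d2 / (2 * sigma ** 2))            # neighbourhood function
                    W += lr * h[:, None] * (x - W)                # pull units toward x
                    t += 1
            return W

        def som_features(X, W):
            # Distances to every SOM unit act as an unsupervised feature map.
            return np.sqrt(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2))

        # Invented data: two noisy classes in five dimensions.
        rng = np.random.default_rng(1)
        X = np.vstack([rng.normal(0, 1, (100, 5)), rng.normal(2, 1, (100, 5))])
        y = np.array([0] * 100 + [1] * 100)

        W = train_som(X)                                          # unsupervised stage
        mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=1)
        mlp.fit(som_features(X, W), y)                            # supervised stage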

    An informatics based approach to respiratory healthcare.

    By 2005, one person in every five UK households suffered from asthma. Research has shown that episodes of poor air quality can have a negative effect on respiratory health, a growing concern for asthmatics. To better inform clinical staff and patients of the contribution of poor air quality to patient health, this thesis defines an IT architecture that systems can use to identify environmental predictors leading to a decline in the respiratory health of an individual patient. Personal environmental predictors of asthma exacerbation are identified by validating the delay between environmental predictors and the decline in respiratory health. The concept is demonstrated using prototype software, and indicates that the analytical methods provide a mechanism for producing an early warning of impending asthma exacerbation due to poor air quality. The author has introduced the term enviromedics to describe this new field of research. Pattern recognition techniques are used to analyse patient-specific environments and extract meaningful health predictors from the large quantities of data involved (often in the region of ½ million data points). This research proposes a suitable architecture that defines processes and techniques enabling the validation of patient-specific environmental predictors of respiratory decline. The design of the architecture was validated by implementing prototype applications that demonstrate, through hospital admissions data and personal lung-function monitoring, that air quality can be used as a predictor of patient-specific health. The refined techniques developed during the research (such as Feature Detection Analysis) were also validated by the application prototypes. This thesis makes several contributions to knowledge, including: the process architecture; Feature Detection Analysis (FDA), which automates the detection of trend reversals within time-series data; validation of the delay characteristic using a Self-Organising Map (SOM) as an unsupervised method of pattern recognition; and Frequency, Boundary and Cluster Analysis (FBCA), an additional technique developed by this research to refine the SOM.
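
    The thesis's Feature Detection Analysis is not specified in detail in this abstract; the following is a hedged sketch of one plausible reading, detecting trend reversals in a time series as sign changes in the slope of a smoothed signal. The window size and the sample peak-flow readings are invented for illustration.

        # Illustrative sketch of trend-reversal detection (not the thesis's FDA code).
        import numpy as np

        def trend_reversals(series, window=5):
            # Moving average suppresses noise before slope analysis.
            s = np.convolve(series, np.ones(window) / window, mode="valid")
            slope = np.sign(np.diff(s))
            # A reversal is where the smoothed slope changes sign (peak or trough).
            return np.where(slope[:-1] * slope[1:] < 0)[0] + window // 2 + 1

        # Invented daily peak-flow readings: a decline followed by recovery.
        readings = np.array([510, 505, 500, 480, 450, 430, 420, 425, 450, 480, 500, 505])
        print(trend_reversals(readings))  # index near the trough of the decline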

    Concept Relation Discovery and Innovation Enabling Technology (CORDIET)

    Concept Relation Discovery and Innovation Enabling Technology (CORDIET) is a toolbox for gaining new knowledge from unstructured text data. At the core of CORDIET is C-K theory, which captures the essential elements of innovation. The tool uses Formal Concept Analysis (FCA), Emergent Self-Organizing Maps (ESOM) and Hidden Markov Models (HMM) as the main artifacts in the analysis process. The user can define temporal, text mining and compound attributes. The text mining attributes are used to analyze the unstructured text in documents; the temporal attributes use the documents' timestamps for analysis. The compound attributes are XML rules based on text mining and temporal attributes. The user can cluster objects with object-cluster rules and can partition the data with segmentation rules. The artifacts are optimized for efficient data analysis; object labels in the FCA lattice and on the ESOM map carry a URL on which the user can click to open the selected document.
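
    To make the FCA step concrete, here is a toy sketch (not CORDIET itself) of deriving a formal concept, an (extent, intent) pair, from a binary object-attribute context; the objects and attributes are invented.

        # Toy Formal Concept Analysis: closing an object set yields a formal concept.
        context = {
            "doc1": {"fraud", "night"},
            "doc2": {"fraud", "weapon"},
            "doc3": {"burglary", "night"},
        }

        def intent(objects):
            """Attributes shared by all given objects."""
            sets = [context[o] for o in objects]
            return set.intersection(*sets) if sets else set()

        def extent(attributes):
            """Objects possessing all given attributes."""
            return {o for o, attrs in context.items() if attributes <= attrs}

        objs = {"doc1", "doc2"}
        concept = (extent(intent(objs)), intent(objs))
        print(concept)  # ({'doc1', 'doc2'}, {'fraud'})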

    Concept discovery innovations in law enforcement: a perspective.

    In the past decades, the amount of information available to law enforcement agencies has increased significantly. Most of this information is in textual form; however, analyses have mainly focused on structured data. In this paper, we give an overview of the concept discovery projects at the Amsterdam-Amstelland police, where Formal Concept Analysis (FCA) is used as a text mining instrument. FCA is combined with statistical techniques such as Hidden Markov Models (HMM) and Emergent Self-Organizing Maps (ESOM). The combination of this concept discovery and refinement technique with statistical techniques for analyzing high-dimensional data has not only yielded new insights but often led to actual improvements of the investigation procedures.
    Keywords: formal concept analysis; intelligence-led policing; knowledge discovery.

    Peer to Peer Information Retrieval: An Overview

    Peer-to-peer technology is widely used for file sharing. In the past decade a number of prototype peer-to-peer information retrieval systems have been developed. Unfortunately, none of these have seen widespread real-world adoption and thus, in contrast with file sharing, information retrieval is still dominated by centralised solutions. In this paper we provide an overview of the key challenges for peer-to-peer information retrieval and the work done so far. We want to stimulate and inspire further research to overcome these challenges. This will open the door to the development and large-scale deployment of real-world peer-to-peer information retrieval systems that rival existing centralised client-server solutions in terms of scalability, performance, user satisfaction and freedom.

    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.
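
    A minimal sketch of the RSS-guided extraction idea, assuming Python with the feedparser, requests and beautifulsoup4 packages: the RSS item supplies clean metadata, and its title anchors the matching node in the blog's HTML so a candidate post container can be located without a hand-written wrapper. The feed URL is hypothetical, and this is one plausible reading of the approach rather than the BlogForever implementation.

        # Illustrative sketch: pair RSS metadata with the blog's HTML rendering.
        import feedparser                      # pip install feedparser
        import requests                        # pip install requests
        from bs4 import BeautifulSoup          # pip install beautifulsoup4

        feed = feedparser.parse("https://example-blog.org/feed.xml")  # hypothetical feed
        for entry in feed.entries:
            html = requests.get(entry.link).text
            soup = BeautifulSoup(html, "html.parser")
            # Locate the text node carrying the RSS title, then take its
            # enclosing element as a candidate post container.
            anchor = soup.find(string=lambda s: s and entry.title in s)
            if anchor is not None:
                container = anchor.find_parent(["article", "div"])
                print(entry.title, "->", container.name if container else "not found")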

    From Keyword Search to Exploration: How Result Visualization Aids Discovery on the Web

    A key to the Web's success is the power of search. The elegant way in which search results are returned is usually remarkably effective. However, for exploratory search, in which users need to learn, discover, and understand novel or complex topics, there is substantial room for improvement. Human-computer interaction researchers and web browser designers have developed novel strategies to improve Web search by enabling users to conveniently visualize, manipulate, and organize their Web search results. This monograph offers fresh ways to think about search-related cognitive processes and describes innovative design approaches to browsers and related tools. For instance, while keyword search presents users with results for specific information (e.g., what is the capital of Peru), other methods may let users see and explore the contexts of their requests for information (related or previous work, conflicting information), or the properties that associate groups of information assets (group legal decisions by lead attorney). We also consider both the traditional and novel ways in which these strategies have been evaluated. From our review of cognitive processes, browser design, and evaluations, we reflect on future opportunities and new paradigms for exploring and interacting with Web search results.

    Clustering Algorithm for Enhanced Bibliography Visualization

    A bibliography is a list of books, publications, journals, etc., with details such as authors and references. Visualization can be used as a data-analysis tool to represent various types of data, analyze huge volumes of data easily and arrive at interesting results. The idea of this project is to provide a medium that eases the combination of bibliography with visualization. Though there are many sources of bibliographic data, such as the Digital Bibliography and Library Project (DBLP), CiteSeer and Google Scholar, none of this data can be used directly for deducing relations between various entities or for visualizing the relationships between related entities. This project aims at providing a web service that takes user queries as input and retrieves the corresponding data from a local database. The web service then applies a clustering algorithm to the retrieved data and presents the clustered data as XML to the requester. The user of the system could be any automated program that aims to provide a visual interface to the bibliography. One of the main outcomes of this approach is bringing out the hidden relationships between various related bibliographic entities and making those relationships more obvious and readable than in existing systems.
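
    The abstract does not name the clustering algorithm, so the following sketch assumes a simple single-link grouping of papers that share an author, serialized as the XML payload such a web service might return; the records are invented.

        # Illustrative sketch: cluster bibliography entries by shared authors,
        # then serialize the clusters as XML for a visualization client.
        import xml.etree.ElementTree as ET

        records = [
            ("p1", {"Smith", "Jones"}),
            ("p2", {"Jones", "Lee"}),
            ("p3", {"Chen"}),
        ]

        # Single-link grouping: papers sharing at least one author join a cluster.
        clusters = []
        for pid, authors in records:
            merged = [c for c in clusters if c["authors"] & authors]
            for c in merged:
                clusters.remove(c)
            new = {"ids": [pid], "authors": set(authors)}
            for c in merged:
                new["ids"] += c["ids"]
                new["authors"] |= c["authors"]
            clusters.append(new)

        root = ET.Element("clusters")
        for c in clusters:
            node = ET.SubElement(root, "cluster", authors=",".join(sorted(c["authors"])))
            for pid in c["ids"]:
                ET.SubElement(node, "paper", id=pid)
        print(ET.tostring(root, encoding="unicode"))  # XML payload for the requester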