
    Dimensionality Reduction and Visualisation Tools for Voting Records

    Abstract. Recorded votes in legislative bodies are an important source of data for political scientists. Voting records can be used to describe parliamentary processes, identify ideological divides between members and reveal the strength of party cohesion. We explore the problem of working with vote data using popular dimensionality reduction techniques and cluster validation methods, as an alternative to more traditional scaling techniques. We present results of dimensionality reduction techniques applied to votes from the 6th and 7th European Parliaments, covering activity from 2004 to 2014.
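
    The abstract does not name a specific technique, but a minimal sketch of the general idea (projecting a member-by-vote matrix to two dimensions, here with PCA as a stand-in) could look like the following; the matrix and vote coding are invented placeholders, not the European Parliament data:

    ```python
    import numpy as np
    from sklearn.decomposition import PCA

    # Hypothetical roll-call matrix: rows are members, columns are votes,
    # entries coded +1 (for), -1 (against), 0 (abstention/absence).
    rng = np.random.default_rng(0)
    votes = rng.choice([-1, 0, 1], size=(100, 500))

    # Project members to 2D; nearby points suggest similar voting behaviour,
    # and tight within-party groupings suggest strong cohesion.
    coords = PCA(n_components=2).fit_transform(votes)
    print(coords.shape)  # (100, 2) -- one point per member, ready to plot
    ```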

    Three-dimensional Radial Visualization of High-dimensional Datasets with Mixed Features

    We develop methodology for 3D radial visualization (RadViz) of high-dimensional datasets. Our display engine is called RadViz3D and extends the classical 2D RadViz, which visualizes multivariate data in the 2D plane by mapping every record to a point inside the unit circle. We show that distributing anchor points at least approximately uniformly on the 3D unit sphere provides a better visualization, with minimal artificial visual correlation for data with uncorrelated variables. Our RadViz3D methodology therefore places equi-spaced anchor points, one for every feature: exactly for the five Platonic solids, and approximately via a Fibonacci grid in the other cases. Our Max-Ratio Projection (MRP) method then utilizes the group information in high dimensions to provide distinctive lower-dimensional projections that are then displayed using RadViz3D. Our methodology extends to datasets with discrete and continuous features, where a Gaussianized distributional transform is used in conjunction with copula models before applying MRP and visualizing the result with RadViz3D. An R package, radviz3d, implementing our complete methodology is available. Comment: 12 pages, 10 figures, 1 table.
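
    A hedged sketch of two ingredients described above: Fibonacci-grid placement of approximately equi-spaced anchors on the unit sphere, and the classical radial mapping of each (nonnegative, row-normalised) record to a weighted average of its feature anchors. This is an illustrative reconstruction, not the radviz3d package; function names and data are invented:

    ```python
    import numpy as np

    def fibonacci_sphere(p):
        """Place p approximately equi-spaced anchor points on the unit 3D sphere."""
        i = np.arange(p)
        golden = (1 + 5 ** 0.5) / 2
        theta = 2 * np.pi * i / golden      # longitudes step by the golden angle
        z = 1 - (2 * i + 1) / p             # latitudes evenly spaced in z
        r = np.sqrt(1 - z ** 2)
        return np.column_stack([r * np.cos(theta), r * np.sin(theta), z])

    def radviz_3d(X, anchors):
        """Map rows of a nonnegative (n x p) matrix to points inside the unit ball."""
        w = X / X.sum(axis=1, keepdims=True)  # per-record feature weights
        return w @ anchors                    # (n x 3) display coordinates

    rng = np.random.default_rng(1)
    X = np.abs(rng.normal(size=(200, 7)))     # 7 features -> 7 anchors
    points = radviz_3d(X, fibonacci_sphere(7))
    print(points.shape)  # (200, 3)
    ```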

    Computational diagnosis and risk evaluation for canine lymphoma

    The canine lymphoma blood test detects the levels of two biomarkers, the acute phase proteins C-Reactive Protein (CRP) and Haptoglobin. This test can be used for diagnosis, for screening and for remission monitoring. We analyze clinical data, test various machine learning methods and select the best approach to these problems. Three families of methods are used for classification and risk estimation: decision trees, kNN (including advanced and adaptive kNN) and probability density evaluation with radial basis functions. Several pre-processing approaches were implemented and compared, and the best of them are used to create the diagnostic system. For the differential diagnosis, the best solution gives a sensitivity and specificity of 83.5% and 77%, respectively (using three input features: CRP, Haptoglobin and a standard clinical symptom). For the screening task, the decision tree method provides the best result, with a sensitivity and specificity of 81.4% and >99%, respectively (using the same input features). If the clinical symptom (Lymphadenopathy) is treated as unknown, then a decision tree with CRP and Haptoglobin only provides a sensitivity of 69% and a specificity of 83.5%. The lymphoma risk evaluation problem is formulated and solved, and the best models are selected as the system for computational lymphoma diagnosis and risk evaluation. These methods are implemented in web-accessible software and applied to the problem of monitoring dogs with lymphoma after treatment; the system detects recurrence of lymphoma up to two months prior to the appearance of clinical signs. A risk map visualisation provides a friendly tool for exploratory data analysis. Comment: 24 pages, 86 references in the bibliography. Significantly extended version with a review of lymphoma biomarkers and data mining methods; three new sections are added: 1.1. Biomarkers for canine lymphoma, 1.2. Acute phase proteins as lymphoma biomarkers, and 3.1. Data mining methods for biomarker cancer diagnosis. Flowcharts of data analysis are included as supplementary material (20 pages).
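
    As a hedged illustration of the screening setup described above (a decision tree on the two acute-phase-protein levels), the following sketch uses synthetic feature values and an invented labelling rule; it is not the clinical dataset or the authors' tuned model:

    ```python
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(42)
    n = 300
    crp = rng.gamma(shape=2.0, scale=10.0, size=n)   # synthetic C-Reactive Protein levels
    hapt = rng.gamma(shape=2.0, scale=1.5, size=n)   # synthetic Haptoglobin levels
    # Invented rule purely for illustration: jointly elevated proteins -> case.
    y = ((crp > 25) & (hapt > 3)).astype(int)

    X = np.column_stack([crp, hapt])
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
    print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
    ```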

    Self-Organizing Maps For Knowledge Discovery From Corporate Databases To Develop Risk Based Prioritization For Stagnation 

    Stagnation or low turnover of water within water distribution systems may result in water quality issues, even for relatively short durations of stagnation / low turnover, if other factors such as deteriorated, aging pipe infrastructure are present. As leakage management strategies, including the creation of smaller pressure management zones, are increasingly implemented, more dead ends are being created within networks, and hence there is potentially an increasing risk to water quality due to stagnation / low turnover. This paper presents results of applying data-driven tools to the large corporate databases maintained by UK water companies. These databases include multiple information sources such as asset data, regulatory water quality sampling and customer complaints. A range of techniques exists for exploring the interrelationships between various types of variables, with a number of studies successfully using Artificial Neural Networks (ANNs) to probe complex data sets. Self Organising Maps (SOMs), a class of unsupervised ANN that perform dimensionality reduction of the feature space to yield topologically ordered maps, have been used successfully for problems similar to that posed here. Notably for this application, SOMs are trained without attached classes, in an unsupervised fashion. Training combines competitive learning (learning the position of a data cloud) and co-operative learning (self-organising of neighbourhoods). Specifically, in this application SOMs performed multidimensional data analysis of a case study area (covering a town over an eight-year period). The visual output of the SOM analysis provides a rapid and intuitive means of examining covariance between variables and exploring hypotheses for increased understanding. For example, water age (time from system entry, from hydraulic modelling) in combination with high pipe-specific residence time and old cast iron pipe were found to be strong explanatory variables. This derived understanding could ultimately be captured in a tool providing risk-based prioritisation scores.
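
    A minimal NumPy sketch of SOM training as summarised above: the competitive step picks the best-matching unit (BMU), and the co-operative step pulls its map neighbours toward the sample. The grid size, decay schedules and data are arbitrary illustrations, not the study's configuration:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))          # stand-in for asset / water quality records
    grid_w, grid_h = 10, 10                 # 10 x 10 map of units
    codebook = rng.normal(size=(grid_w * grid_h, X.shape[1]))
    grid = np.array([(i, j) for i in range(grid_w) for j in range(grid_h)], dtype=float)

    n_steps = 5000
    for t in range(n_steps):
        x = X[rng.integers(len(X))]
        bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))  # competitive: find the BMU
        frac = t / n_steps
        sigma = 3.0 * (0.5 / 3.0) ** frac                   # shrinking neighbourhood radius
        lr = 0.5 * (0.01 / 0.5) ** frac                     # decaying learning rate
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)          # map distance to the BMU
        h = np.exp(-d2 / (2 * sigma ** 2))                  # co-operative: neighbourhood pull
        codebook += lr * h[:, None] * (x - codebook)

    # Each row of `codebook` is now a prototype; plotting one feature at a time
    # over the 10 x 10 grid ("component planes") gives the covariance view above.
    ```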

    Iconic Indexing for Video Search

    Submitted for the degree of Doctor of Philosophy, Queen Mary, University of London

    Gender and gaze gesture recognition for human-computer interaction

    © 2016 Elsevier Inc. The identification of visual cues in facial images has been widely explored in the broad area of computer vision. However, theoretical analyses are often not transformed into widespread assistive Human-Computer Interaction (HCI) systems, due to factors such as inconsistent robustness, low efficiency, large computational expense or strong dependence on complex hardware. We present a novel gender recognition algorithm, a modular eye centre localisation approach and a gaze gesture recognition method, aiming to increase the intelligence, adaptability and interactivity of HCI systems by combining demographic data (gender) and behavioural data (gaze) to enable the development of a range of real-world assistive-technology applications. The gender recognition algorithm uses Fisher Vectors as facial features, encoded from low-level local features in facial images. We experimented with four types of low-level features: greyscale values, Local Binary Patterns (LBP), LBP histograms and Scale Invariant Feature Transform (SIFT). The corresponding Fisher Vectors were classified using a linear Support Vector Machine. The algorithm has been tested on the FERET, LFW and FRGCv2 databases, yielding 97.7%, 92.5% and 96.7% accuracy respectively. The eye centre localisation algorithm takes a modular approach, following a coarse-to-fine, global-to-regional scheme and utilising isophote and gradient features. A Selective Oriented Gradient filter has been specifically designed to detect and remove strong gradients from eyebrows, eye corners and self-shadows, which defeat most eye centre localisation methods. The trajectories of the eye centres are then defined as gaze gestures for active HCI. The eye centre localisation algorithm has been compared with 10 other state-of-the-art algorithms with similar functionality and outperforms them in accuracy while maintaining excellent real-time performance. The above methods have been employed to develop a demographic-data recovery system that can support the implementation of advanced assistive-technology tools. The high accuracy, reliability and real-time performance achieved for attention monitoring, gaze gesture control and recovery of demographic data can enable the advanced human-robot interaction needed for systems that provide assistance with everyday actions, thereby improving the quality of life for the elderly and/or disabled.
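
    A hedged sketch of the classification pipeline outlined above: bags of local descriptors are encoded into (simplified, means-only) Fisher Vectors via a Gaussian mixture and classified with a linear SVM. The descriptors are random placeholders standing in for the real low-level features, and `fisher_vector` is an invented helper, not the authors' encoder:

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.svm import LinearSVC

    def fisher_vector(descriptors, gmm):
        """Simplified FV: posterior-weighted gradients w.r.t. the GMM means only."""
        q = gmm.predict_proba(descriptors)                 # (n, K) soft assignments
        diff = descriptors[:, None, :] - gmm.means_[None]  # (n, K, D)
        g = (q[..., None] * diff / np.sqrt(gmm.covariances_)[None]).sum(axis=0)
        return (g / np.sqrt(len(descriptors) * gmm.weights_)[:, None]).ravel()

    # Placeholder "images": each is a bag of 50 local descriptors (SIFT-like).
    rng = np.random.default_rng(0)
    D = 16
    images = [rng.normal(loc=label, size=(50, D)) for label in (0, 1) for _ in range(40)]
    labels = np.array([0] * 40 + [1] * 40)

    gmm = GaussianMixture(n_components=4, covariance_type="diag",
                          random_state=0).fit(np.vstack(images))
    fvs = np.array([fisher_vector(img, gmm) for img in images])
    clf = LinearSVC().fit(fvs, labels)
    print(f"training accuracy: {clf.score(fvs, labels):.2f}")
    ```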

    Policymaking prior to decision-making in the Digital Age

    This thesis examines the application of recent information and communication technology (ICT) innovations in the policymaking process, focusing on the policy stages prior to the decision-making stage. Recent developments in technology innovation have led to a re-examination of citizen involvement in government processes and an expansion of the opportunities for citizens to engage in the policymaking process.

    A Novel and Domain-Specific Document Clustering and Topic Aggregation Toolset for a News Organisation

    Large collections of documents are becoming increasingly common in the news gathering industry. A review of the literature shows there is a growing interest in data-driven journalism and, specifically, that the journalism profession needs better tools to understand and develop actionable knowledge from large document sets. On a daily basis, journalists are tasked with searching a diverse range of document sets, including news gathering services, emails, freedom of information requests, court records, government reports, press releases and many other types of generally unstructured documents. Document clustering techniques can help address the problem of understanding the ever-expanding quantities of documents available to journalists by finding patterns within documents. These patterns can be used to develop useful and actionable knowledge which can contribute to journalism. News articles in particular are fertile ground for document clustering principles. Term weighting schemes assign importance to terms within a document and are central to the study of document clustering methods. This study contributes a review of the dominant and most commonly used term frequency weighting functions put forward in research, establishes the merits and limitations of each approach, and proposes modifications to develop a news-centric document clustering and topic aggregation approach. Experimentation was conducted on a large unstructured collection of newspaper articles from the Irish Times to establish whether the newly proposed news-centric term weighting and document similarity approach improves document clustering accuracy and topic aggregation capabilities for news articles when compared to the traditional term weighting approach. Whilst the experimentation shows that the developed approach is promising when compared to the manual document clustering effort undertaken by the three expert journalist users, it also highlights the challenges of natural language processing and document clustering methods in general. The results suggest that a blended approach, complementing automated methods with human-level supervision and guidance, may yield the best results.
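
    As a point of reference for the term weighting discussion above, here is a minimal sketch of the traditional baseline the study modifies: TF-IDF weighting followed by k-means clustering. The toy articles are invented, and the study's news-centric weighting scheme is not reproduced:

    ```python
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = [
        "election results spark parliamentary debate",
        "government wins crucial vote in parliament",
        "striker scores twice in cup final",
        "injury forces captain out of cup final",
    ]

    # TfidfVectorizer L2-normalises rows by default, so Euclidean k-means on
    # these vectors approximates cosine-similarity clustering.
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(tfidf)
    print(km.labels_)  # e.g. [0 0 1 1]: politics vs. sport
    ```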