
    An algorithmic framework for visualising and exploring multidimensional data

    To help understand multidimensional data, information visualisation techniques are often applied to take advantage of human visual perception in exposing latent structure. A popular means of presenting such data is via two-dimensional scatterplots where the inter-point proximities reflect some notion of similarity between the entities represented. This can result in potentially interesting structure becoming almost immediately apparent. Traditional algorithms for carrying out this dimension reduction tend to have different strengths and weaknesses in terms of run times and layout quality. However, it has been found that combining algorithms can produce hybrid variants that exhibit significantly lower run times while maintaining accurate depictions of high-dimensional structure. The author's initial contribution to the creation of such algorithms led to the design and implementation of a software system (HIVE) for the development and investigation of new hybrid variants and the subsequent analysis of the data they transform. This development was motivated by the fact that there are potentially many hybrid algorithmic combinations to explore, and therefore an environment that is conducive to their development, analysis and use is beneficial not only in exploring the data they transform but also in exploring the growing number of visualisation tools that these algorithms beget. This thesis describes three areas of the author's contribution to the field of information visualisation. Firstly, work on hybrid algorithms for dimension reduction is presented and their analysis shows their effectiveness. Secondly, the development of a framework for the creation of tailored hybrid algorithms is illustrated. Thirdly, a system embodying the framework, providing an environment conducive to the development, evaluation and use of the algorithms, is described. Case studies are provided to demonstrate how the author and others have used and found value in the system across areas as diverse as environmental science, social science and investigative psychology, where multidimensional data are in abundance.
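
    The hybrid approach described above can be illustrated with a minimal sketch: lay out a small random sample with a force-directed (spring) model, then place each remaining point near its most similar sampled neighbour. This is only a toy illustration of the general sample-then-interpolate idea, not the HIVE system's actual algorithms; all names and parameters below are invented for the example.

```python
# Toy sample-then-interpolate hybrid layout (illustrative only;
# not the HIVE system's actual algorithms).
import numpy as np

def spring_layout(D, iters=200, lr=0.05, seed=0):
    """Force-directed 2-D layout that tries to match pairwise distances D."""
    rng = np.random.default_rng(seed)
    pos = rng.normal(size=(D.shape[0], 2))
    for _ in range(iters):
        diff = pos[:, None, :] - pos[None, :, :]      # pairwise displacements
        dist = np.linalg.norm(diff, axis=-1) + 1e-9
        err = (dist - D) / dist                       # >0 pulls together, <0 pushes apart
        np.fill_diagonal(err, 0.0)
        pos -= lr * (err[:, :, None] * diff).mean(axis=1)
    return pos

def hybrid_layout(X, sample_size=50, seed=0):
    """Spring-model a random sample, then drop every other point
    next to its most similar sampled neighbour."""
    rng = np.random.default_rng(seed)
    n = len(X)
    sample = rng.choice(n, size=min(sample_size, n), replace=False)
    D = np.linalg.norm(X[sample, None] - X[None, sample], axis=-1)
    pos = np.zeros((n, 2))
    pos[sample] = spring_layout(D)
    for i in np.setdiff1d(np.arange(n), sample):
        nearest = sample[np.argmin(np.linalg.norm(X[sample] - X[i], axis=1))]
        pos[i] = pos[nearest] + rng.normal(scale=0.05, size=2)  # jittered placement
    return pos

coords = hybrid_layout(np.random.rand(500, 10))  # 500 points, 10 dims -> 2-D
```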

    CBR and MBR techniques: review for an application in the emergencies domain

    The purpose of this document is to provide an in-depth analysis of current reasoning engine practice and of the integration strategies of Case-Based Reasoning and Model-Based Reasoning that will be used in the design and development of the RIMSAT system. RIMSAT (Remote Intelligent Management Support and Training) is a European Commission funded project designed to: (a) provide an innovative, 'intelligent', knowledge based solution aimed at improving the quality of critical decisions, and (b) enhance the competencies and responsiveness of individuals and organisations involved in highly complex, safety critical incidents, irrespective of their location. In other words, RIMSAT aims to design and implement a decision support system that applies Case-Based Reasoning as well as Model-Based Reasoning technology to the management of emergency situations. This document is part of a deliverable for the RIMSAT project, and although it was written in close contact with the requirements of the project, it provides an overview wide enough to serve as a state of the art in integration strategies between CBR and MBR technologies.
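
    To make the CBR side of the review concrete, here is a minimal sketch of the core "retrieve and reuse" step: find the stored case most similar to the current incident and propose its solution. The case features and solutions are hypothetical, and this is not RIMSAT code.

```python
# Toy case-based reasoning "retrieve and reuse" (illustrative; not RIMSAT code).
import math

# Hypothetical case base: incident features -> previously successful response.
case_base = [
    ({"severity": 0.9, "casualties": 12, "hazmat": 1.0}, "full evacuation"),
    ({"severity": 0.4, "casualties": 0,  "hazmat": 0.0}, "contain on site"),
    ({"severity": 0.7, "casualties": 3,  "hazmat": 1.0}, "cordon and decontaminate"),
]

def similarity(a, b):
    """Similarity as inverse Euclidean distance over shared numeric features."""
    return 1.0 / (1.0 + math.dist([a[k] for k in a], [b[k] for k in a]))

def retrieve(query):
    """CBR retrieve step: reuse the solution of the best-matching stored case."""
    return max(case_base, key=lambda case: similarity(query, case[0]))[1]

print(retrieve({"severity": 0.8, "casualties": 5, "hazmat": 1.0}))
```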

    Strategies for image visualisation and browsing

    The exploration of large information spaces has remained a challenging task despite the proliferation of database management systems and state-of-the-art retrieval algorithms. Significant research attention in the multimedia domain is focused on finding automatic algorithms for organising digital image collections into meaningful structures and providing high-semantic image indices. On the other hand, utilisation of graphical and interactive methods from the information visualisation domain provides a promising direction for creating efficient user-oriented systems for image management. Methods such as exploratory browsing and query, as well as intuitive visual overviews of an image collection, can assist users in finding patterns and developing an understanding of structure and content in complex image data-sets. The focus of the thesis is combining the features of automatic data processing algorithms with information visualisation. The first part of this thesis focuses on the layout method for displaying a collection of images indexed by low-level visual descriptors. The proposed solution generates a graphical overview of the data-set as a combination of a similarity-based visualisation and a random layout approach. The second part of the thesis deals with the problem of visualisation and exploration for hierarchical organisations of images. Due to the absence of semantic information, images are considered the only source of high-level information. Content preview and display of the hierarchical structure are combined in order to support image retrieval. In addition to this, novel exploration and navigation methods are proposed to enable the user to find their way through the database structure and retrieve the content. On the other hand, semantic information is available in cases where automatic or semi-automatic image classifiers are employed. The automatic annotation of image items provides what is referred to as higher-level information. This type of information is the cornerstone of a multi-concept visualisation framework which is developed as the third part of this thesis. This solution enables dynamic generation of user queries by combining semantic concepts, supported by content overview and information filtering. Comparative analysis and user tests, performed for the evaluation of the proposed solutions, focus on the ways information visualisation affects image content exploration and retrieval, how efficient and comfortable users are when using different interaction methods, and the ways users seek information through different types of database organisation.

    An on-demand fixture manufacturing cell for mass customisation production systems.

    Master of Science in Engineering. University of KwaZulu-Natal, Durban, 2017. Increased demand for customised products has given rise to research on mass customisation production systems. Customised products exhibit geometric differences that render the use of standard fixtures impractical. Fixtures must be configured or custom-manufactured according to the unique requirements of each product. Reconfigurable modular fixtures have emerged as a cost-effective solution to this problem. Customised fixtures must be made available to a mass customisation production system as rapidly as parts are manufactured. Scheduling the creation/modification of these fixtures must now be treated together with the production scheduling of parts on machines. Scheduling and optimisation of such a problem in this context was found to be a unique avenue of research. An on-demand Fixture Manufacturing Cell (FxMC) that resides within a mass customisation production system was developed. This allowed fixtures to be created or reconfigured on demand in a cellular manufacturing environment, according to the scheduling of the customised parts to be processed. The concept required the research and development of such a cell, together with the optimisation modelling and simulation of the cell in an appropriate manufacturing environment. The research included the conceptualisation of a fixture manufacturing cell in a mass customisation production system. A proof-of-concept of the cell was assembled and automated in the laboratory. A three-stage optimisation method was developed to model and optimise the scheduling of the cell in the manufacturing environment. This included clustering of parts to fixtures; optimal scheduling of those parts on those fixtures; and a Mixed Integer Linear Programming (MILP) model to optimally synchronise the fixture manufacturing cell with the part processing cell. A heuristic was developed to solve the MILP problem much faster and for much larger problem sizes, producing good, feasible solutions. These problems were modelled and tested in MATLAB®. The cell was simulated and tested in AnyLogic®. The research topic is beneficial to mass customisation production systems, where the use of reconfigurable modular fixtures in the manufacturing process cannot be optimised with conventional scheduling approaches. The results showed that the model optimally minimised the total idle time of the production schedule; the heuristic also provided good, feasible solutions to those problems. The concept of the on-demand fixture manufacturing cell was found to be capable of facilitating the manufacture of customised products.
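
    As a rough illustration of the synchronisation problem the MILP addresses, the toy model below sequences jobs through a fixture-preparation stage followed by a processing stage, like a two-machine flow shop. It minimises makespan rather than the thesis's total-idle-time objective, uses PuLP instead of MATLAB, and all data are invented; it is a sketch of the problem shape, not the thesis's model.

```python
# Toy fixture-cell / processing-cell synchronisation MILP (a sketch of the
# problem shape only; the thesis's models are built in MATLAB and minimise
# total idle time, while this simplification minimises makespan).
import pulp

prep = {"A": 2, "B": 4, "C": 3}   # fixture build/reconfigure times (invented)
proc = {"A": 5, "B": 2, "C": 4}   # part processing times (invented)
jobs, slots = list(prep), range(len(prep))

m = pulp.LpProblem("fixture_sync", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (jobs, slots), cat="Binary")  # job j in slot s
f = pulp.LpVariable.dicts("fixture_done", slots, lowBound=0)
c = pulp.LpVariable.dicts("part_done", slots, lowBound=0)

for j in jobs:                    # every job gets exactly one sequence slot
    m += pulp.lpSum(x[j][s] for s in slots) == 1
for s in slots:                   # every slot holds exactly one job
    m += pulp.lpSum(x[j][s] for j in jobs) == 1
for s in slots:
    p = pulp.lpSum(prep[j] * x[j][s] for j in jobs)
    q = pulp.lpSum(proc[j] * x[j][s] for j in jobs)
    m += f[s] >= (f[s - 1] if s else 0) + p   # fixture cell works sequentially
    m += c[s] >= f[s] + q                     # a part waits for its fixture
    if s:
        m += c[s] >= c[s - 1] + q             # processing cell works sequentially

m += c[len(jobs) - 1]                         # objective: finish as early as possible
m.solve(pulp.PULP_CBC_CMD(msg=False))
print("sequence:", [j for s in slots for j in jobs if x[j][s].value() > 0.5])
```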

    A Corpus Driven Computational Intelligence Framework for Deception Detection in Financial Text

    Financial fraud rampages on, seemingly uncontained. The annual cost of fraud in the UK is estimated to be as high as £193bn a year [1]. From a data science perspective, and hitherto less explored, this thesis demonstrates how the use of linguistic features to drive data mining algorithms can aid in unravelling fraud. To this end, the spotlight is turned on Financial Statement Fraud (FSF), known to be the costliest type of fraud [2]. A new corpus of 6.3 million words is composed of 102 annual reports/10-K filings (narrative sections) from firms formally indicted for FSF, juxtaposed with 306 non-fraud firms of similar size and industrial grouping. Differently from other similar studies, this thesis uniquely takes a wide-angled view and extracts a range of features of different categories from the corpus. These linguistic correlates of deception are uncovered using a variety of techniques and tools. Corpus linguistics methodology is applied to extract keywords and to examine linguistic structure. N-grams are extracted to draw out collocations. Readability measurement in financial text is advanced through the extraction of new indices that probe the text at a deeper level. Cognitive and perceptual processes are also picked out. Tone, intention and liquidity are gauged using customised word lists. Linguistic ratios are derived from grammatical constructs and word categories. An attempt is also made to determine 'what' was said as opposed to 'how'. Further, a new module is developed to condense synonyms into concepts. Lastly, frequency counts of keywords unearthed by a previous content analysis study on financial narrative are also used. These features are then used to drive machine learning based classification and clustering algorithms to determine if they aid in discriminating a fraud firm from a non-fraud firm. The results derived from the battery of models built typically exceed a classification accuracy of 70%. The above process is amalgamated into a framework. The process outlined, driven by empirical data, demonstrates in a practical way how linguistic analysis could aid in fraud detection and also constitutes a unique contribution to deception detection studies.
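
    A compressed sketch of the feature-to-classifier pipeline is shown below. The mini-corpus and labels are invented, and TF-IDF word/bigram features with logistic regression stand in for the thesis's much richer feature set (readability indices, tone word lists, linguistic ratios) and its battery of models.

```python
# Sketch of driving a classifier with linguistic features (toy corpus and
# labels; stands in for the thesis's corpus and wider feature categories).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reports = [
    "revenue growth was strong and we expect continued expansion",
    "certain adjustments were made to previously reported figures",
    "liquidity remains sufficient for anticipated obligations",
    "restatement of prior period results following an internal review",
]
labels = [0, 1, 0, 1]  # 0 = non-fraud, 1 = fraud-indicted (invented labels)

# Word and bigram features approximate the keyword/n-gram categories;
# readability indices, tone word-lists etc. would enter as extra columns.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
    LogisticRegression(max_iter=1000),
)
model.fit(reports, labels)
print(model.predict(["adjustments and restatement of reported figures"]))
```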

    Grounding semantic cognition using computational modelling and network analysis

    The overarching objective of this thesis is to further the field of grounded semantics using a range of computational and empirical studies. Over the past thirty years, there have been many algorithmic advances in the modelling of semantic cognition. A commonality across these cognitive models is a reliance on hand-engineered “toy models”. Despite incorporating newer techniques (e.g. long short-term memory), the model inputs remain unchanged. We argue that the inputs to these traditional semantic models bear little resemblance to real human experiences. In this dissertation, we ground our neural network models by training them with real-world visual scenes using naturalistic photographs. Our approach is an alternative to both hand-coded features and embodied raw sensorimotor signals. We conceptually replicate the mutually reinforcing nature of hybrid (feature-based and grounded) representations using silhouettes of concrete concepts as model inputs. We next gradually develop a novel grounded cognitive semantic representation which we call scene2vec, starting with object co-occurrences and then adding emotions and language-based tags. Limitations of our scene-based representation are identified for more abstract concepts (e.g. freedom). We further present a large-scale human semantics study, which reveals that small-world semantic network topologies are context-dependent and that scenes are the most dominant cognitive dimension. This finding leads us to conclude that there is no meaning without context. Lastly, scene2vec shows promising human-like context-sensitive stereotypes (e.g. gender role bias), and we explore how such stereotypes are reduced by targeted debiasing. In conclusion, this thesis provides support for a novel computational viewpoint on investigating meaning: scene-based grounded semantics. Future research scaling scene-based semantic models to human levels through virtual grounding has the potential to unearth new insights into the human mind and concurrently lead to advancements in artificial general intelligence by enabling robots, embodied or otherwise, to acquire and represent meaning directly from the environment.
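
    A minimal sketch of the object co-occurrence idea behind a scene-based representation follows; the scenes are hypothetical and this is not scene2vec itself, which additionally incorporates emotions and language-based tags.

```python
# Toy scene co-occurrence embedding (illustrative; not scene2vec itself).
import numpy as np

# Hypothetical scenes described only by the objects they contain.
scenes = [
    {"dog", "ball", "grass"},
    {"dog", "leash", "grass"},
    {"car", "road", "sign"},
    {"car", "road", "grass"},
]
vocab = sorted(set().union(*scenes))
idx = {w: i for i, w in enumerate(vocab)}

# Count how often two concepts appear in the same scene.
C = np.zeros((len(vocab), len(vocab)))
for scene in scenes:
    for a in scene:
        for b in scene:
            if a != b:
                C[idx[a], idx[b]] += 1

# A low-rank SVD of the log-smoothed counts gives dense concept vectors.
U, S, _ = np.linalg.svd(np.log1p(C))
vectors = U[:, :2] * S[:2]

def cosine(a, b):
    va, vb = vectors[idx[a]], vectors[idx[b]]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-9))

print(cosine("dog", "leash"), cosine("dog", "road"))
```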

    Molecular Formula Identification using High Resolution Mass Spectrometry: Algorithms and Applications in Metabolomics and Proteomics

    We investigate several theoretical and practical aspects of identifying the molecular formula of biomolecules using high-resolution mass spectrometry. Thanks to recent advances in instrumentation, mass spectrometry (MS) has become one of the key technologies for the analysis of biomolecules in proteomics and metabolomics. It measures the masses of the molecules in a sample with high accuracy and is well suited to high-throughput data acquisition. One of the core tasks in MS-based proteomics and metabolomics is the identification of the molecules in the sample. In metabolomics, metabolites are subject to structure elucidation, starting with a molecule's molecular formula, i.e. the number of atoms of each element. This is the decisive step in the identification of an unknown metabolite, because a confirmed formula reduces the number of possible molecular structures to a much smaller set that can be analysed further with methods for automated structure elucidation. After preprocessing, the output of a mass spectrometer is a list of peaks corresponding to the molecular masses and their intensities, i.e. the number of molecules with a particular mass. In principle, the molecular formulas of small molecules can be identified from accurate masses alone. However, it has been found that, owing to the large number of chemically legitimate formulas in the upper mass range, excellent mass accuracy alone does not suffice for identification. High-resolution MS allows molecular masses and intensities to be determined with outstanding accuracy. In this work we develop several algorithms and applications that apply this information to the identification of the molecular formulas of biomolecules.
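
    The core computational task, finding all molecular formulas whose mass lies within the instrument's accuracy of a measured peak, can be sketched by brute force over a few elements. Practical tools use fast mass-decomposition algorithms and additionally score candidates with isotope patterns; the code below only illustrates the size and shape of the search space.

```python
# Brute-force sketch of molecular formula decomposition over C, H, N and O
# (illustrative; practical tools use fast mass-decomposition algorithms and
# additionally score candidates with isotope patterns, not mass alone).
MONO = {"C": 12.0, "H": 1.0078250319, "N": 14.0030740052, "O": 15.9949146221}
ELEMENTS = list(MONO)

def decompose(target, ppm=5.0):
    """All formulas whose monoisotopic mass is within `ppm` of `target`."""
    tol, hits = target * ppm * 1e-6, []

    def rec(i, remaining, counts):
        if i == len(ELEMENTS):
            if abs(remaining) <= tol:                # leaf: mass matches
                hits.append(dict(zip(ELEMENTS, counts)))
            return
        step = MONO[ELEMENTS[i]]
        for k in range(int((remaining + tol) // step) + 1):
            rec(i + 1, remaining - k * step, counts + [k])

    rec(0, target, [])
    return hits

# Glucose (C6H12O6) has monoisotopic mass ~180.0634 Da:
for formula in decompose(180.06339):
    print(formula)
```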

    Methodology and Software for Interactive Decision Support

    These Proceedings report the scientific results of an International Workshop on "Methodology and Software for Interactive Decision Support" organized jointly by the System and Decision Sciences Program of IIASA and The National Committee for Applied Systems Analysis and Management in Bulgaria. Several other Bulgarian institutions sponsored the workshop: The Committee for Science to the Council of Ministers, The State Committee for Research and Technology and The Bulgarian Industrial Association. The workshop was held in Albena, on the Black Sea Coast. In the first section, "Theory and Algorithms for Multiple Criteria Optimization," new theoretical developments in multiple criteria optimization are presented. In the second section, "Theory, Methodology and Software for Decision Support Systems," the principles of building decision support systems are presented, as well as software tools constituting the building components of such systems. Moreover, several papers are devoted to the general methodology of building such systems or present experimental designs of systems supporting a certain class of decision problems. The third section addresses issues of "Applications of Decision Support Systems and Computer Implementations of Decision Support Systems." Part of this section has a special character: besides theoretical and methodological papers, several practical implementations of software for decision support were presented during the workshop. These software packages ranged from experimental, illustrative implementations of a theoretical concept to well-developed and documented systems that are currently commercially distributed and used for solving practical problems.

    Learning from Noisy Data in Statistical Machine Translation

    In this work, methods were developed that are able to reduce the negative effects of noisy data in SMT systems and thereby improve system performance. The problem is addressed at two different stages of the learning process: during preprocessing and during modelling. In preprocessing, two methods are developed that improve the statistical models by raising the quality of the training data. In modelling, several ways of weighting data according to their usefulness are presented. First, the effect of removing false positives from the parallel corpus is shown. A parallel corpus consists of a text in two languages, where each sentence of one language is paired with the corresponding sentence of the other language; it is assumed that both language versions contain the same number of sentences. False positives in this sense are sentence pairs that are paired in the parallel corpus but are not translations of each other. To detect them, a small, error-free parallel corpus (clean corpus) is assumed. Using several lexical features, false positives are reliably filtered out before the modelling phase. An important lexical feature here is the bilingual lexicon generated from the clean corpus. In the extraction of this bilingual lexicon, several heuristics are implemented that lead to improved performance. We then consider the problem of extracting the most useful parts of the training data, ranking the data by their relevance to the target domain under the assumption that a good, representative tuning set exists. Since such tuning data are typically of limited size, word similarities are used to extend the coverage of the tuning data. The word similarities used in the previous step are decisive for the quality of the method, so this work presents several automatic methods for deriving such word similarities from monolingual and bilingual corpora. Interestingly, this is possible even with limited data: monolingual data, which are available in large quantities, can be drawn on to derive word similarities, and for bilingual data, which are often available only in limited quantities, further language pairs that share at least one language with the given pair can also be used. In the modelling step, we address the problem of noisy data by weighting the training data according to the quality of the corpus. We use statistical significance measures to find the less reliable sequences and reduce their weight. As in the previous approaches, word similarities are used to handle the limited-data case. A further problem arises, however, as soon as absolute frequencies are replaced with weighted frequencies; techniques for smoothing the probabilities in this situation are therefore developed in this work. The size of the training data becomes problematic when working with corpora of considerable volume, with two main difficulties: the length of training time and the limited main memory. For the training-time problem, an algorithm is developed that distributes the computationally expensive calculations across several processors with shared memory. For the memory problem, special data structures and external-memory algorithms are used. This allows extremely large models to be trained efficiently on hardware with limited memory.
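
    The false-positive filtering step can be sketched as follows: score each sentence pair by how many source words have a known translation on the target side, using a bilingual lexicon extracted from the clean corpus, and drop low-coverage pairs. The lexicon, sentence pairs and threshold below are invented, and the thesis combines several lexical features rather than this single score.

```python
# Sketch of lexicon-based false-positive filtering of a parallel corpus
# (the lexicon, pairs and threshold are invented; the thesis combines
# several lexical features rather than this single coverage score).
def lexical_coverage(src, tgt, lexicon):
    """Fraction of source words with a known translation on the target side."""
    src_words, tgt_words = src.split(), set(tgt.split())
    hits = sum(1 for w in src_words if lexicon.get(w, set()) & tgt_words)
    return hits / max(len(src_words), 1)

# Hypothetical bilingual lexicon extracted from a small clean corpus.
lexicon = {
    "the": {"das", "der", "die"},
    "green": {"gruen", "gruene"},
    "house": {"haus"},
}

pairs = [
    ("the green house", "das gruene haus"),      # genuine translation
    ("the green house", "der zug ist schnell"),  # false positive
]
THRESHOLD = 0.5  # tunable: keep pairs covering at least half their words
kept = [(s, t) for s, t in pairs if lexical_coverage(s, t, lexicon) >= THRESHOLD]
print(kept)
```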

    MNEs' Paradoxes in Responsible Global Business – A Theoretical and Empirical Investigation

    In an increasingly interconnected and interdependent world, where our planet faces the risk of collapse, there is a growing call for all institutional actors to engage in supporting economic, social, and environmental ambitions to ensure humanity's future and security. This dissertation aims to explore the critical role and position of multinational enterprises (MNEs) in addressing grand societal challenges. The research adopts a comprehensive and multidimensional framework to examine the various dimensions of MNEs' competing and conflicting demands through a holistic approach. The first essay delves into the existing academic literature on current approaches to pursuing business and society goals through a bibliometric analysis. Based on the various conflicting and overlapping conceptualizations, an overarching framework labeled responsible global business is proposed. The second essay is a theoretical development of propositions to address three global paradoxes faced by MNEs: purpose, global, and innovation. I posit that accepting and embracing contradictions as interrelated opposing elements of the same whole is essential to identify novel sources of innovation and competitiveness. Lastly, the third essay is an in-depth qualitative empirical examination of the emergence, experience, and management of MNEs' paradoxical tensions. Ultimately, the research aims to contribute novel insights into how MNEs can play a transformative role in addressing grand societal challenges, fostering sustainable development, and ensuring a more secure and prosperous future for all.
    • 

    corecore