An algorithmic framework for visualising and exploring multidimensional data
To help understand multidimensional data, information visualisation techniques are often applied to take advantage of human visual perception in exposing latent structure. A popular means of presenting such data is via two-dimensional scatterplots where the inter-point proximities reflect some notion of similarity between the entities represented. This can result in potentially interesting structure becoming almost immediately apparent. Traditional algorithms for carrying out this dimension reduction tend to have different strengths and weaknesses in terms of run times and layout quality. However, it has been found that the combination of algorithms can produce hybrid variants that exhibit significantly lower run times while maintaining accurate depictions of high-dimensional structure. The author's initial contribution in the creation of such algorithms led to the design and implementation of a software system (HIVE) for the development and investigation of new hybrid variants and the subsequent analysis of the data they transform. This development was motivated by the fact that there are potentially many hybrid algorithmic combinations to explore and therefore an environment that is conducive to their development, analysis and use is beneficial not only in exploring the data they transform but also in exploring the growing number of visualisation tools that these algorithms beget. This thesis describes three areas of the author's contribution to the field of information visualisation. Firstly, work on hybrid algorithms for dimension reduction is presented and their analysis shows their effectiveness. Secondly, the development of a framework for the creation of tailored hybrid algorithms is illustrated. Thirdly, a system embodying the framework, providing an environment conducive to the development, evaluation and use of the algorithms, is described.
Case studies are provided to demonstrate how the author and others have used and found value in the system across areas as diverse as environmental science, social science and investigative psychology, where multidimensional data are in abundance
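The hybrid pattern described above, an accurate but slow layout run on a small sample and the remaining points placed cheaply, can be sketched as follows. This is an illustrative reconstruction, not HIVE's actual algorithm: the spring-model update rule, the learning rate, and the nearest-sample interpolation are all assumptions.

```python
import numpy as np

def hybrid_layout(X, sample_size=100, iters=50, rng=None):
    """Hybrid layout sketch: spring-model refinement of a random
    sample, then cheap placement of the remaining points."""
    rng = np.random.default_rng(rng)
    n = len(X)
    sample = rng.choice(n, size=min(sample_size, n), replace=False)
    # Stage 1: force-directed (spring) layout of the sample only,
    # pulling low-dimensional distances towards high-dimensional ones.
    Y = rng.standard_normal((len(sample), 2)) * 0.01
    D = np.linalg.norm(X[sample][:, None] - X[sample][None, :], axis=-1)
    for _ in range(iters):
        d = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
        np.fill_diagonal(d, 1.0)  # avoid division by zero on the diagonal
        force = ((D - d) / d)[:, :, None] * (Y[:, None] - Y[None, :])
        Y += 0.001 * force.sum(axis=1)
    # Stage 2: place each remaining point near its most similar
    # sample member (a deliberately cheap interpolation step).
    out = np.empty((n, 2))
    out[sample] = Y
    for i in np.setdiff1d(np.arange(n), sample):
        nearest = np.argmin(np.linalg.norm(X[sample] - X[i], axis=1))
        out[i] = Y[nearest] + rng.standard_normal(2) * 0.01
    return out
```

The payoff of this shape is the run-time split: the quadratic-cost spring model touches only the sample, while every other point costs a single pass over the sample.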
CBR and MBR techniques: review for an application in the emergencies domain
The purpose of this document is to provide an in-depth analysis of current reasoning engine practice and the integration strategies of Case Based Reasoning and Model Based Reasoning that will be used in the design and development of the RIMSAT system.
RIMSAT (Remote Intelligent Management Support and Training) is a European Commission funded project designed to:
a. Provide an innovative, 'intelligent', knowledge-based solution aimed at improving the quality of critical decisions
b. Enhance the competencies and responsiveness of individuals and organisations involved in highly complex, safety-critical incidents, irrespective of their location.
In other words, RIMSAT aims to design and implement a decision support system that applies Case-Based Reasoning and Model-Based Reasoning technology to the management of emergency situations.
This document is part of a deliverable for the RIMSAT project and, although it has been written in close contact with the requirements of the project, it provides an overview wide enough to serve as a state of the art in integration strategies between CBR and MBR technologies.
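The Case-Based Reasoning side of such a system hinges on retrieving the stored incident most similar to the current one. A minimal sketch of that retrieve step follows; the attribute names, weights, and similarity measure are illustrative, not taken from RIMSAT.

```python
def retrieve(case_base, query, weights):
    """Minimal sketch of the CBR 'retrieve' step: return the stored
    case whose attributes are most similar to the query. The weighted
    similarity measure below is a common textbook choice, not the
    project's actual one."""
    def similarity(case):
        total = sum(weights.values())
        score = 0.0
        for attr, w in weights.items():
            a, b = case[attr], query[attr]
            if isinstance(a, (int, float)):
                # Numeric attributes: closeness on a normalised scale.
                score += w * (1.0 - abs(a - b) / (abs(a) + abs(b) + 1e-9))
            else:
                # Symbolic attributes: exact match.
                score += w * (1.0 if a == b else 0.0)
        return score / total
    return max(case_base, key=similarity)
```

For example, with hypothetical emergency attributes such as `incident_type` and `severity`, a query describing a moderate flood would retrieve the past flood incident rather than the past fire.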
Strategies for image visualisation and browsing
PhD. The exploration of large information spaces has remained a challenging task even though the proliferation of database management systems and state-of-the-art retrieval algorithms is becoming pervasive. Significant research attention in the multimedia domain is focused on finding automatic algorithms for organising digital image collections into meaningful structures and providing high-semantic image indices. On the other hand, utilisation of graphical and interactive methods from the information visualisation domain provides a promising direction for creating efficient user-oriented systems for image management. Methods such as exploratory browsing and query, as well as intuitive visual overviews of an image collection, can assist users in finding patterns and developing an understanding of structures and content in complex image data-sets.
The focus of the thesis is combining the features of automatic data processing algorithms with information visualisation. The first part of this thesis focuses on the layout method for displaying a collection of images indexed by low-level visual descriptors. The proposed solution generates a graphical overview of the data-set as a combination of similarity-based visualisation and a random layout approach.
The second part of the thesis deals with the problem of visualisation and exploration for hierarchical organisations of images. In the absence of semantic information, the images themselves are considered the only source of high-level information. Content preview and display of the hierarchical structure are combined in order to support image retrieval. In addition, novel exploration and navigation methods are proposed to enable the user to find a way through the database structure and retrieve the content.
On the other hand, semantic information is available in cases where automatic or semi-automatic image classifiers are employed. The automatic annotation of image items provides what is referred to as higher-level information. This type of information is the cornerstone of the multi-concept visualisation framework which is developed as the third part of this thesis. This solution enables dynamic generation of user queries by combining semantic concepts, supported by content overview and information filtering.
Comparative analysis and user tests, performed for the evaluation of the proposed solutions, focus on the ways information visualisation affects image content exploration and retrieval; how efficient and comfortable users are when using different interaction methods; and the ways users seek information through different types of database organisation.
An on-demand fixture manufacturing cell for mass customisation production systems.
Master of Science in Engineering. University of KwaZulu-Natal, Durban, 2017.
Increased demand for customised products has given rise to the research of mass customisation production systems. Customised products exhibit geometric differences that render the use of standard fixtures impractical. Fixtures must be configured or custom-manufactured according to the unique requirements of each product. Reconfigurable modular fixtures have emerged as a cost-effective solution to this problem. Customised fixtures must be made available to a mass customisation production system as rapidly as parts are manufactured. Scheduling the creation/modification of these fixtures must now be treated together with the production scheduling of parts on machines.
Scheduling and optimisation of such a problem in this context was found to be a unique avenue of research. An on-demand Fixture Manufacturing Cell (FxMC) that resides within a mass customisation production system was developed. This allowed fixtures to be created or reconfigured on-demand in a cellular manufacturing environment, according to the scheduling of the customised parts to be processed. The concept required the research and development of such a cell, together with the optimisation modelling and simulation of this cell in an appropriate manufacturing environment.
The research included the conceptualisation of a fixture manufacturing cell in a mass customisation production system. A proof-of-concept of the cell was assembled and automated in the laboratory. A three-stage optimisation method was developed to model and optimise the scheduling of the cell in the manufacturing environment. This included clustering of parts to fixtures; optimal scheduling of those parts on those fixtures; and a Mixed Integer Linear Programming (MILP) model to optimally synchronise the fixture manufacturing cell with the part processing cell. A heuristic was developed to solve the MILP problem much faster and for much larger problem sizes, producing good, feasible solutions. These problems were modelled and tested in MATLAB®. The cell was simulated and tested in AnyLogic®.
The research topic is beneficial to mass customisation production systems, where the use of reconfigurable modular fixtures in the manufacturing process cannot be optimised with conventional scheduling approaches. The results showed that the model optimally minimised the total idle time of the production schedule; the heuristic also provided good, feasible solutions to those problems. The concept of the on-demand fixture manufacturing cell was found to be capable of facilitating the manufacture of customised products
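The synchronisation constraint at the heart of the problem, a fixture must be finished before its part can start and both cells should idle as little as possible, can be illustrated with a toy greedy scheduler. This stands in for the thesis's MILP model and heuristic, which are not reproduced here; the shortest-fixture-first rule is purely an assumption for illustration.

```python
def schedule(jobs):
    """Toy sketch of synchronising a fixture cell with a part-processing
    cell. Each job is (fixture_build_time, part_process_time): the
    fixture cell builds fixtures back to back, and a part starts on the
    processing cell once both its fixture and the machine are free."""
    fixture_free = machine_free = 0.0
    plan = []
    for fix_t, proc_t in sorted(jobs):  # greedy: shortest fixture build first
        fixture_done = fixture_free + fix_t      # fixture cell finishes this fixture
        start = max(fixture_done, machine_free)  # part waits for fixture AND machine
        plan.append((start, start + proc_t))
        fixture_free = fixture_done
        machine_free = start + proc_t
    return plan
```

Even this toy version makes the coupling visible: idle time on the processing cell appears exactly when a fixture is not ready, which is what the MILP objective described above minimises.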
A Corpus Driven Computational Intelligence Framework for Deception Detection in Financial Text
Financial fraud rampages onwards seemingly uncontained. The annual cost of fraud in the UK is estimated to be as high as £193bn a year [1]. From a data science perspective, and hitherto less explored, this thesis demonstrates how the use of linguistic features to drive data mining algorithms can aid in unravelling fraud. To this end, the spotlight is turned on Financial Statement Fraud (FSF), known to be the costliest type of fraud [2]. A new corpus of 6.3 million words is composed of 102 annual reports/10-K (narrative sections) from firms formally indicted for FSF, juxtaposed with 306 non-fraud firms of similar size and industrial grouping. Differently from other similar studies, this thesis uniquely takes a wide-angled view and extracts a range of features of different categories from the corpus. These linguistic correlates of deception are uncovered using a variety of techniques and tools. Corpus linguistics methodology is applied to extract keywords and to examine linguistic structure. N-grams are extracted to draw out collocations. Readability measurement in financial text is advanced through the extraction of new indices that probe the text at a deeper level. Cognitive and perceptual processes are also picked out. Tone, intention and liquidity are gauged using customised word lists. Linguistic ratios are derived from grammatical constructs and word categories. An attempt is also made to determine 'what' was said as opposed to 'how'. Further, a new module is developed to condense synonyms into concepts. Lastly, frequency counts from keywords unearthed from a previous content analysis study on financial narrative are also used. These features are then used to drive machine learning based classification and clustering algorithms to determine if they aid in discriminating a fraud firm from a non-fraud firm. The results derived from the battery of models built typically exceed a classification accuracy of 70%. The above process is amalgamated into a framework.
The process outlined, driven by empirical data demonstrates in a practical way how linguistic analysis could aid in fraud detection and also constitutes a unique contribution made to deception detection studies
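The feature-extraction stage such a framework relies on can be sketched as turning a narrative into a numeric vector of surface linguistic measures. The word list and the particular features below (counts, average lengths, a negative-tone ratio) are simplified stand-ins for the customised word lists and readability indices described above.

```python
import re

# Illustrative negative-tone word list; the thesis used customised
# lists for tone, intention and liquidity.
NEGATIVE = {"loss", "decline", "impairment", "litigation"}

def features(text):
    """Sketch of turning a report narrative into a small feature
    vector of the kind used to drive classification and clustering."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n = len(words) or 1
    return {
        "n_words": len(words),
        "avg_word_len": sum(map(len, words)) / n,        # crude readability proxy
        "avg_sent_len": len(words) / max(len(sentences), 1),
        "neg_ratio": sum(w in NEGATIVE for w in words) / n,  # tone signal
    }
```

Vectors of this shape, one per firm narrative, are what a downstream classifier would consume when discriminating fraud from non-fraud firms.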
Grounding semantic cognition using computational modelling and network analysis
The overarching objective of this thesis is to further the field of grounded semantics using a range of computational and empirical studies. Over the past thirty years, there have been many algorithmic advances in the
modelling of semantic cognition. A commonality across these cognitive models is a reliance on hand-engineered 'toy models'. Despite incorporating newer
techniques (e.g. long short-term memory), the model inputs remain unchanged. We argue that the inputs to these traditional semantic models bear little resemblance to real human experiences. In this dissertation, we ground our neural network models by training them with real-world visual scenes using naturalistic photographs. Our approach is an alternative to both hand-coded
features and embodied raw sensorimotor signals.
We conceptually replicate the mutually reinforcing nature of hybrid (feature-based and grounded) representations using silhouettes of concrete concepts as model inputs. We next gradually develop a novel grounded cognitive semantic representation which we call scene2vec, starting with object co-occurrences and then adding emotions and language-based tags. Limitations of our scene-based representation are identified for more abstract concepts (e.g. freedom). We further present a large-scale human semantics study, which reveals small-world semantic network topologies are context-dependent and
that scenes are the most dominant cognitive dimension. This finding leads us to conclude that there is no meaning without context. Lastly, scene2vec shows
promising human-like context-sensitive stereotypes (e.g. gender role bias), and we explore how such stereotypes are reduced by targeted debiasing. In conclusion, this thesis provides support for a novel computational
viewpoint on investigating meaning - scene-based grounded semantics. Future research scaling scene-based semantic models to human-levels through virtual grounding has the potential to unearth new insights into the human mind and
concurrently lead to advancements in artificial general intelligence by enabling robots, embodied or otherwise, to acquire and represent meaning directly from the environment
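The co-occurrence stage that scene2vec starts from can be sketched simply: represent each concept by how often it shares a scene (photograph) with every other concept. This is a minimal reconstruction under assumptions; the function name and input format are invented here, and the emotion and language-tag dimensions described above are omitted.

```python
from itertools import combinations

def cooccurrence_vectors(scenes, vocab):
    """Sketch of a scene-based co-occurrence representation: each
    concept's vector counts how often it appears in the same scene
    as every other concept. 'scenes' is a list of sets of object
    labels observed in naturalistic photographs."""
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0] * len(vocab) for w in vocab}
    for scene in scenes:
        for a, b in combinations(sorted(scene & set(vocab)), 2):
            vectors[a][index[b]] += 1
            vectors[b][index[a]] += 1
    return vectors
```

Concepts that keep appearing in the same scenes end up with similar vectors, which is the grounding intuition: meaning is read off shared visual contexts rather than hand-coded features.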
Molecular Formula Identification using High Resolution Mass Spectrometry: Algorithms and Applications in Metabolomics and Proteomics
We investigate several theoretical and practical aspects of identifying the molecular formula of biomolecules using high-resolution mass spectrometry. Owing to recent advances in instrumentation, mass spectrometry (MS) has become one of the key technologies for the analysis of biomolecules in proteomics and metabolomics. It measures the masses of the molecules in a sample with high accuracy and is well suited to high-throughput data acquisition. One of the core tasks in MS-based proteomics and metabolomics is the identification of the molecules in the sample. In metabolomics, metabolites are subject to structure elucidation, beginning with the molecular formula of a molecule, i.e. the number of atoms of each element. This is the decisive step in the identification of an unknown metabolite, since the established formula reduces the number of possible molecular structures to a much smaller set that can be analysed further with methods of automated structure elucidation. After preprocessing,
the output of a mass spectrometer is a list of peaks corresponding to the molecular masses and their intensities, i.e. the number of molecules with a given mass. In principle, the molecular formulas of small molecules can be identified from accurate masses alone. However, it has been found that, owing to the large number of chemically legitimate formulas in the upper mass range, excellent mass accuracy alone is not sufficient for identification. High-resolution MS allows molecular masses and intensities to be determined with outstanding accuracy. In this thesis we develop several algorithms and applications that apply this information to the identification of the molecular formulas of biomolecules.
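The core decomposition task, finding element counts whose total monoisotopic mass matches a measured peak within a tolerance, can be sketched by brute force. Real tools prune this search and additionally score isotope-pattern intensities; the element set (CHNO), the tolerance, and the atom limit below are illustrative assumptions.

```python
# Monoisotopic masses (Da) of a few common elements.
MASS = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915}

def decompose(target, tol=0.001, max_atoms=50):
    """Enumerate (C, H, N, O) counts whose summed monoisotopic mass
    lies within 'tol' of the measured mass 'target'."""
    hits = []
    for c in range(int(target // MASS["C"]) + 1):
        for n in range(int((target - c * MASS["C"]) // MASS["N"]) + 1):
            rem = target - c * MASS["C"] - n * MASS["N"]
            for o in range(int(rem // MASS["O"]) + 1):
                rest = rem - o * MASS["O"]
                h = round(rest / MASS["H"])  # remaining mass filled with hydrogen
                if 0 <= h <= max_atoms and abs(rest - h * MASS["H"]) <= tol:
                    hits.append((c, h, n, o))
    return hits
```

For the monoisotopic mass of glycine (about 75.032 Da) the candidate list includes C2H5NO2; the combinatorial growth of such lists at higher masses is exactly why accurate mass alone stops being sufficient, as the abstract notes.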
Methodology and Software for Interactive Decision Support
These Proceedings report the scientific results of an International Workshop on "Methodology and Software for Interactive Decision Support" organized jointly by the System and Decision Sciences Program of IIASA and The National Committee for Applied Systems Analysis and Management in Bulgaria. Several other Bulgarian institutions sponsored the workshop -- The Committee for Science to the Council of Ministers, The State Committee for Research and Technology and The Bulgarian Industrial Association. The workshop was held in Albena, on the Black Sea Coast.
In the first section, "Theory and Algorithms for Multiple Criteria Optimization," new theoretical developments in multiple criteria optimization are presented.
In the second section, "Theory, Methodology and Software for Decision Support Systems," the principles of building decision support systems are presented, as well as software tools constituting the building components of such systems. Moreover, several papers are devoted to the general methodology of building such systems or present experimental designs of systems supporting a certain class of decision problems.
The third section addresses issues of "Applications of Decision Support Systems and Computer Implementations of Decision Support Systems." Another part of this section has a special character. Besides theoretical and methodological papers, several practical implementations of software for decision support were presented during the workshop. These software packages varied from very experimental and illustrative implementations of some theoretical concept to well-developed and documented systems currently distributed commercially and used for solving practical problems.
Learning from Noisy Data in Statistical Machine Translation
This thesis develops methods capable of reducing the negative effects of noisy data in SMT systems and thereby increasing system performance. The problem is addressed at two different stages of the learning process: during preprocessing and during modelling. In preprocessing, two methods are developed that improve the statistical models by raising the quality of the training data. In modelling, several approaches are presented for weighting data according to its usefulness.
First, the effect of removing false positives from the parallel corpus is shown. A parallel corpus consists of a text in two languages, with each sentence in one language paired with the corresponding sentence in the other; it is assumed that both language versions contain the same number of sentences. False positives, in this sense, are sentence pairs that are paired in the parallel corpus but are not translations of each other. To detect them, a small, error-free parallel corpus (the clean corpus) is assumed to be available. Using various lexical features, false positives are filtered reliably before the modelling phase. One important lexical feature is the bilingual lexicon generated from the clean corpus. Several heuristics are implemented in the extraction of this bilingual lexicon, leading to improved performance.
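The lexical filtering idea, scoring a sentence pair by how many source words have a lexicon translation present in the target sentence, can be sketched as follows. The scoring function and its threshold-based use are an illustrative reconstruction, not the thesis's exact feature set.

```python
def alignment_score(src_tokens, tgt_tokens, lexicon):
    """Fraction of source words with at least one lexicon translation
    in the target sentence. Pairs scoring below some threshold would
    be filtered out as likely false positives. 'lexicon' maps a
    source word to a set of target-language words (here it would be
    derived from the clean corpus)."""
    if not src_tokens:
        return 0.0
    covered = sum(
        1 for s in src_tokens
        if lexicon.get(s, set()) & set(tgt_tokens)
    )
    return covered / len(src_tokens)
```

A genuine translation pair covers most source words and scores near 1.0, while a mispaired sentence scores near 0.0, which is what makes a simple threshold effective.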
We then consider the problem of extracting the most useful parts of the training data, ranking the data by its relevance to the target domain. This is done under the assumption that a good, representative tuning set exists. Since such tuning data is typically of limited size, word similarities are used to extend its coverage.
The word similarities used in the previous step are decisive for the quality of the method. For this reason, the thesis presents several automatic methods for deriving such word similarities from monolingual and bilingual corpora. Interestingly, this is possible even with limited data: monolingual data, which is available in large quantities, can also be drawn upon to determine word similarity, and for bilingual data, which is often only available in limited quantities, further language pairs that share at least one language with the given pair can be used as well.
In the modelling step, we address the problem of noisy data by weighting the training data according to the quality of the corpus. We use statistical significance measures to find the less reliable sequences and reduce their weight. As in the previous approaches, word similarities are used to handle the problem of limited data. A further problem arises, however, as soon as absolute frequencies are replaced with weighted frequencies; this thesis develops techniques for smoothing the probabilities in this situation.
The size of the training data becomes problematic when working with corpora of substantial volume. Two main difficulties arise here: the length of the training time and the limited main memory. For the training-time problem, an algorithm is developed that distributes the computationally expensive calculations across several processors with shared memory. For the memory problem, special data structures and external-memory algorithms are used. This allows extremely large models to be trained efficiently on hardware with limited memory.
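The smoothing issue raised above, probabilities estimated from fractional, reliability-weighted counts rather than absolute frequencies, can be illustrated with simple additive smoothing. This is a stand-in for the thesis's actual smoothing techniques, which the abstract does not specify.

```python
def smoothed_prob(weighted_counts, alpha=0.1):
    """Re-estimate a translation distribution from reliability-weighted
    (possibly fractional) counts. Additive smoothing keeps the
    distribution well-formed even when many weighted counts are tiny.
    'weighted_counts' maps each candidate translation to its weighted
    count; 'alpha' is the smoothing constant (an assumed value)."""
    vocab = len(weighted_counts)
    total = sum(weighted_counts.values()) + alpha * vocab
    return {t: (c + alpha) / total for t, c in weighted_counts.items()}
```

With absolute counts the maximum-likelihood estimate is safe, but once counts are down-weighted fractions, unsmoothed ratios become unstable; the additive term bounds how much a near-zero weighted count can distort the distribution.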
MNEs' Paradoxes in Responsible Global Business: A Theoretical and Empirical Investigation
In an increasingly interconnected and interdependent world, where our planet faces the risk of collapse, there is a growing call for all institutional actors to engage in supporting economic, social, and environmental ambitions to ensure humanity's future and security. This dissertation aims to explore the critical role and position of multinational enterprises (MNEs) in addressing grand societal challenges. The research adopts a comprehensive and multidimensional framework to examine the various dimensions of MNEs' competing and conflicting demands through a holistic approach. The first essay delves into the existing academic literature associated with current approaches to pursuing business and society goals through a bibliometric analysis. Based on the various conflicting and overlapping conceptualizations, an overarching framework labeled responsible global business is proposed. The second essay is a theoretical development of propositions to address three global paradoxes faced by MNEs: purpose, global, and innovation. I posit that accepting and embracing contradictions as interrelated opposing elements of the same whole is essential to identify novel sources of innovation and competitiveness. Lastly, the third essay is an in-depth qualitative empirical examination of the emergence, experience, and management of MNEs' paradoxical tensions. Ultimately, the research aims to contribute novel insights into how MNEs can play a transformative role in addressing grand societal challenges, fostering sustainable development, and ensuring a more secure and prosperous future for all.