Navigating Diverse Datasets in the Face of Uncertainty
When exploring big volumes of data, one of the challenging aspects is their diversity
of origin. Multiple files that have not yet been ingested into a database system may
contain information of interest to a researcher, who must curate, understand and sieve
their content before being able to extract knowledge.
Performance is one of the greatest difficulties in exploring these datasets. On the
one hand, examining non-indexed, unprocessed files can be inefficient. On the other
hand, any processing done before the data are understood introduces latency and
potentially unnecessary work if the chosen schema matches the data poorly. We have
surveyed the state of the art and, fortunately, there exist multiple proposed solutions
for handling data in situ with good performance.
Another major difficulty is matching files from multiple origins since their schema
and layout may not be compatible or properly documented. Most surveyed solutions
overlook this problem, especially for numeric, uncertain data, as is typical in fields
like astronomy.
The main objective of our research is to assist data scientists during the exploration
of unprocessed, numerical, raw data distributed across multiple files based solely on
its intrinsic distribution.
In this thesis, we first introduce the concept of Equally-Distributed Dependencies
(EDDs), which provides the foundations to match this kind of dataset. We propose
PresQ, a novel algorithm that finds quasi-cliques in hypergraphs based on their
expected statistical properties. The probabilistic approach of PresQ can be successfully
exploited to mine EDDs between diverse datasets when the underlying populations can
be assumed to be the same.
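The core idea of matching columns purely by their empirical distributions can be illustrated with a classical two-sample Kolmogorov-Smirnov statistic. This is a simplified stand-in for the EDD machinery, not PresQ itself; the function name and thresholds are illustrative assumptions:

```python
import random

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum distance
    between the empirical CDFs of samples a and b (merge-based scan)."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            i += 1
        else:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

random.seed(0)
x = [random.gauss(0, 1) for _ in range(1000)]
y = [random.gauss(0, 1) for _ in range(1000)]  # same population as x
z = [random.gauss(3, 1) for _ in range(1000)]  # shifted population

assert ks_statistic(x, y) < 0.1  # indistinguishable: candidate match
assert ks_statistic(x, z) > 0.5  # clearly different distributions
```

Columns whose pairwise statistic stays below a calibrated threshold would be treated as drawn from the same population and become candidate edges for matching.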
Finally, we propose a two-sample statistical test based on Self-Organizing Maps
(SOMs). This method can outperform other classifier-based two-sample tests in
terms of power, in some cases matching kernel-based methods, with the advantage
of being interpretable.
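The general shape of such a test can be sketched as follows: train a SOM on the pooled data, then compare how each sample distributes over the map's units. This is only a toy, one-dimensional illustration of the idea, not the thesis's actual test or its calibration; every name and parameter here is an assumption:

```python
import math, random

def train_som(data, n_units=8, epochs=20, lr=0.5, sigma=2.0):
    """Train a tiny 1-D SOM on scalar data via competitive learning."""
    random.seed(1)
    units = [random.choice(data) for _ in range(n_units)]
    for _ in range(epochs):
        for x in data:
            # best-matching unit for this observation
            bmu = min(range(n_units), key=lambda i: abs(units[i] - x))
            for i in range(n_units):
                # neighbourhood-weighted pull toward the observation
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                units[i] += lr * h * (x - units[i])
        lr *= 0.9
        sigma *= 0.9
    return units

def hit_histogram(units, sample):
    """Count how many observations map to each SOM unit."""
    hits = [0] * len(units)
    for x in sample:
        hits[min(range(len(units)), key=lambda i: abs(units[i] - x))] += 1
    return hits

def chi2_distance(h1, h2):
    """Chi-squared-style distance between two normalised hit histograms."""
    n1, n2 = sum(h1), sum(h2)
    d = 0.0
    for a, b in zip(h1, h2):
        if a + b:
            d += (a / n1 - b / n2) ** 2 / (a + b)
    return d

random.seed(2)
s1 = [random.gauss(0, 1) for _ in range(500)]
s2 = [random.gauss(0, 1) for _ in range(500)]  # same population as s1
s3 = [random.gauss(2, 1) for _ in range(500)]  # shifted population

units = train_som(s1 + s2 + s3)  # map trained on the pooled data
same = chi2_distance(hit_histogram(units, s1), hit_histogram(units, s2))
diff = chi2_distance(hit_histogram(units, s1), hit_histogram(units, s3))
assert diff > same  # the shifted sample occupies different map regions
```

Interpretability comes from the map itself: inspecting which units one sample over-populates shows where the two distributions disagree.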
Both PresQ and the SOM-based statistical test can provide insights that drive
serendipitous discoveries.
Artificial Intelligence and Cognitive Computing
Artificial intelligence (AI) is a subject garnering increasing attention in both academia and industry today. The understanding is that AI-enhanced methods and techniques create a variety of opportunities related to improving basic and advanced business functions, including production processes, logistics, financial management and others. As this collection demonstrates, AI-enhanced tools and methods tend to offer more precise results in the fields of engineering, financial accounting, tourism, air-pollution management and many more. The objective of this collection is to bring these topics together to offer the reader a useful primer on how AI-enhanced tools and applications can be of use in today’s world. In the context of the frequently fearful, skeptical and emotion-laden debates on AI and its value added, this volume promotes a positive perspective on AI and its impact on society. AI is a part of a broader ecosystem of sophisticated tools, techniques and technologies, and therefore it is not immune to developments in that ecosystem. It is thus imperative that inter- and multidisciplinary research on AI and its ecosystem be encouraged. This collection contributes to that endeavour.
Women in Artificial intelligence (AI)
This Special Issue, entitled "Women in Artificial Intelligence", includes 17 papers from leading women scientists. The papers cover a broad scope of research areas within Artificial Intelligence, including machine learning, perception, reasoning and planning, among others. The papers have applications to relevant fields, such as human health, finance and education. It is worth noting that the Issue includes three papers that deal with different aspects of gender bias in Artificial Intelligence. All the papers have a woman as the first author. We can proudly say that these women are from countries worldwide, such as France, the Czech Republic, the United Kingdom, Australia, Bangladesh, Yemen, Romania, India, Cuba and Spain. In conclusion, apart from its intrinsic scientific value as a Special Issue combining interesting research works, this Special Issue intends to increase the visibility of women in AI, showing where they are, what they do, and how they contribute to developments in Artificial Intelligence from their different places, positions, research branches and application fields. We planned to issue this book on Ada Lovelace Day (11 October 2022), a date internationally dedicated to the first computer programmer, a woman who had to fight the gender difficulties of her times in the nineteenth century. We also thank the publisher for making this possible, thus allowing this book to become a part of the international activities dedicated to celebrating the value of women in ICT all over the world. With this book, we want to pay homage to all the women who have contributed over the years to the field of AI.
Footfall and the territorialisation of urban places measured through the rhythms of social activity
The UK high street is constantly changing and evolving in response to, for example, online sales, out-of-town developments, and economic crises. With over 10 years of hourly footfall counts from sensors across the UK, this study was an opportunity to perform a longitudinal and quantitative investigation to diagnose how these changes are reflected in the changing patterns of pedestrian activity.
Footfall provides a recognised performance measure of place vitality. However, owing to a lack of data availability caused by historic manual counting methods, few opportunities have existed to contextualise the temporal patterns longitudinally. This study therefore investigates daily, weekly, and annual footfall patterns to diagnose the similarities and differences between places as social activity patterns on UK high streets evolve over time.
Theoretically, footfall is conceptualised within the framework of Territorology and Assemblage Theory, conceptually underpinning a quantitative approach to represent the collective meso-level (street and town-centre) patterns of footfall (social) activity. To explore the data, the periodic signatures of daily, weekly, and annual footfall are extracted using STL (seasonal-trend decomposition using Loess), and the outputs are then analysed using fuzzy clustering techniques. The analyses successfully identify daily, weekly, and annual periodic patterns and diagnose the varying social activity patterns for different urban place types, and how places, both individually and collectively, are changing.
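The signature-extraction step can be illustrated with plain period averaging over hourly counts. This is a crude stand-in for STL (which additionally removes trend and applies Loess smoothing), and the function and synthetic data here are illustrative assumptions, not the study's code:

```python
import math

def daily_signature(hourly_counts, period=24):
    """Average each hour-of-day slot across all days, yielding a crude
    daily seasonal profile of the footfall series."""
    sums = [0.0] * period
    counts = [0] * period
    for t, value in enumerate(hourly_counts):
        sums[t % period] += value
        counts[t % period] += 1
    return [s / c for s, c in zip(sums, counts)]

# Synthetic week of footfall: quiet overnight, peaking around midday.
week = [max(0.0, 100 * math.sin(math.pi * (h % 24) / 24))
        for h in range(24 * 7)]
profile = daily_signature(week)

assert len(profile) == 24
assert profile[12] == max(profile)  # the midday peak is recovered
```

Signatures like this (one vector per sensor) are what a fuzzy clustering step would then group into place types, with membership degrees showing how strongly each place follows each pattern.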
Footfall is demonstrated to be a performance measure of meso-scale changes in collective social activity. For place management, the fuzzy analysis provides an analytical tool to monitor the annual, weekly, and daily footfall signatures, offering an evidence-based diagnostic of how places are changing over time. The place manager is therefore better able to identify place-specific interventions that correspond to the usage patterns of visitors and to adapt these interventions as behaviours change.
Industrial Applications: New Solutions for the New Era
This book reprints twelve articles from the Special Issue "Industrial Applications: New Solutions for the New Age", published online in the open-access journal Machines (ISSN 2075-1702). The Special Issue belongs to the "Mechatronic and Intelligent Machines" section.
Using Particle Swarm Optimization for Market Timing Strategies
Market timing is the issue of deciding when to buy or sell a given asset on the market. As one of the core issues of algorithmic trading systems, designers of such systems have turned to computational intelligence methods to aid them in this task. In this thesis, we explore the use of Particle Swarm Optimization (PSO) within the domain of market timing.
PSO is a search metaheuristic that was first introduced in 1995 [28] and is based on the behavior of birds in flight. Since its inception, the PSO metaheuristic has seen extensions that adapt it to a variety of problems, including single-objective optimization, multiobjective optimization, niching and dynamic optimization problems. Although popular in other domains, PSO has seen limited application to the issue of market timing. The current incumbent algorithm within the market timing domain is the Genetic Algorithm (GA), based on the volume of publications, as noted in [40] and [84].
In this thesis, we use PSO to compose market timing strategies using technical analysis indicators. Our first contribution is a formulation that considers both the selection of components and the tuning of their parameters simultaneously, approaching market timing as a single-objective optimization problem. Current approaches consider only one of these aspects at a time: either selecting from a set of components with fixed values for their parameters, or tuning the parameters of a preset selection of components. Our second contribution is a novel training and testing methodology that explicitly exposes candidate market timing strategies to numerous price trends, reducing the likelihood of overfitting to a particular trend and giving a better approximation of performance under various market conditions.
Our final contribution is to consider market timing as a multiobjective optimization problem, optimizing five financial metrics and comparing the performance of our PSO variants against a well-established multiobjective optimization algorithm. To the best of our knowledge, these algorithms address unexplored research areas in the context of PSO and are therefore original contributions. The computational results over a range of datasets show that the proposed PSO algorithms are competitive with GAs using the same formulation. Additionally, the multiobjective variant of our PSO algorithm achieves statistically significant improvements over NSGA-II.
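The basic PSO mechanics underlying all of these variants can be sketched with a generic single-objective swarm minimising a toy function. This is not the thesis's strategy-composition formulation; the function names, parameter values and test objective are illustrative assumptions:

```python
import random

def pso(objective, dim, n_particles=30, iters=100,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0)):
    """Minimise `objective` with a basic global-best particle swarm."""
    lo, hi = bounds
    random.seed(3)
    pos = [[random.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # personal bests
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # inertia + pull toward personal and global bests
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

sphere = lambda p: sum(x * x for x in p)
best, best_val = pso(sphere, dim=3)
assert best_val < 1e-3  # the swarm converges near the origin
```

In the market timing setting, a particle's position would instead encode which indicators a strategy uses and their parameter values, with the objective replaced by a financial performance metric.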
Interactive optimisation for high-lift design
Interactivity always involves two entities, one of which by default is a human user. The specialised subject of human factors is introduced in the context of computational aerodynamics and optimisation, specifically high-lift aerofoil design. The trial-and-error nature of a design process hinges on the designer’s knowledge, skill and intuition. A basic, important assumption of a man-machine system is that, in solving a problem, there are some steps in which the computer has an advantageous edge while in others a human has dominance. Computational technologies are now an indispensable part of aerospace technology; algorithms involving significant user interaction, either during the process of generating solutions or as a component of post-optimisation evaluation where human decision making is involved, are increasingly popular. Multi-objective particle swarm optimisation is one such optimiser.
Several design optimisation problems in engineering are by nature multi-objective; the designer’s interest lies in simultaneous optimisation against two or more objectives,
which are usually in conflict. Interactive optimisation allows the designer to understand trade-offs between the various objectives, and is generally used as a tool for decision making. The solution to a multi-objective problem, in which improvement in one objective comes at the cost of deterioration in at least one other, is called a Pareto set. There are multiple solutions to a problem and multiple improvement ideas for an already existing design. The final responsibility of identifying an optimal solution or idea rests with the design engineers, and decision making is based on quantitative metrics, displayed as numbers or graphs. Moreover, visualisation, ergonomics and human factors influence this decision-making process.
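The Pareto-set notion described above reduces to a simple dominance filter. A minimal sketch, assuming minimisation of both objectives and toy drag/weight values chosen for illustration:

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and strictly
    better in at least one (minimisation assumed)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep the non-dominated points: the Pareto set."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]

# Toy designs as (drag, weight) pairs, both to be minimised.
designs = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
assert pareto_front(designs) == [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
```

Here (3.0, 4.0) is dominated by (2.0, 3.0) and drops out; the surviving trade-off points are what visual tools such as parallel coordinates then present to the decision-maker.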
A visual, graphical depiction of the Pareto front is often used as a design aid for decision making, with chances of errors and fallacies fundamentally present in engineering design. An effective visualisation tool benefits complex engineering analyses by providing the decision-maker with a clear image of the most important information. Two high-lift aerofoil datasets have been used as test-case examples; the module comprises a multi-element solver, an optimiser based on a swarm intelligence technique, and visual techniques including parallel coordinates, heat maps, scatter plots, self-organising maps and radial coordinate visualisation. Factors that affect optima and various evaluation criteria have been studied in light of the human user.
This research enquires into interactive optimisation by adapting three interactive approaches: information trade-off, reference point and classification, and investigates selected visualisation techniques which act as chief aids in the context of high-lift design trade studies. Human-in-the-loop engineering and man-machine interaction and interface, along with influencing factors, reliability, validation and verification in the presence of design uncertainty, are considered. The research structure, choice of optimiser and visual aids adopted in this work are influenced by, and streamlined to fit with, the parallel ongoing development work on Airbus’ Python-based tool.
Results and analysis, together with a literature survey, are presented in this report. The words human, user, engineer, aerodynamicist, designer, analyst and decision-maker (DM) are synonymous, and are used interchangeably in this research.
In a virtual engineering setting, a suitable visualisation tool is a crucial prerequisite for an efficient interactive optimisation task. The underlying premise of this work is that various optimisation design tools and methods are most useful when combined with a human engineer's insight; questions such as why, what and how might help aid aeronautical technical innovation.
Effective Visualization Approaches For Ultra-High Dimensional Datasets
Multivariate informational data, which are abstract as well as complex, are becoming increasingly common in many areas, such as science, medicine, social studies and business. Displaying and analyzing large amounts of multivariate data with more than three variables of different types is quite challenging. Visualization of such multivariate data suffers from a high degree of clutter when the number of dimensions/variables and data observations becomes too large. We propose multiple approaches to effectively visualize datasets with an ultra-high number of dimensions by generalizing two standard multivariate visualization methods, namely the star plot and the parallel coordinates plot. We refine three variants of the star plot, which include the overlapped star plot, shifted-origin plot, and multilevel star plot, by embedding distribution plots, displaying the dataset in groups, and supporting adjustable positioning of the star axes. We introduce a bifocal parallel coordinates plot (BPCP) based on the focus + context approach. BPCP splits the overall rendering area vertically into focus and context regions. The focus area maps a few selected dimensions of interest at sufficiently wide spacing. The remaining dimensions are represented in the context area in a compact way to retain useful information and preserve data continuity. The focus display can be further enriched with various options, such as axes overlays, scatterplots, and nested PCPs. In order to accommodate an arbitrarily large number of dimensions, the context display supports a multi-level stacked view. Finally, we present two innovative ways of enhancing parallel coordinates axes to better understand all variables and their interrelationships in high-dimensional datasets. Histogram and circle/ellipse plots based on uniform and non-uniform frequency/density mappings are adopted to visualize distributions of numerical and categorical data values.
Color-mapped axis stripes are designed into the parallel coordinates layout so that correlations can be fully realized in the same display plot irrespective of axes locations. These colors are also propagated to histograms as stacked bars and to categorical values as pie charts to further facilitate data exploration. Using datasets consisting of 25 to 130 variables of different data types, we have demonstrated the effectiveness of the proposed multivariate visualization enhancements.
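The focus + context split at the heart of the BPCP can be sketched as a simple axis-placement computation. This is an illustrative reconstruction under assumed conventions (a 60/40 width split, normalised coordinates), not the paper's actual layout code:

```python
def bpcp_axis_positions(n_focus, n_context, focus_frac=0.6, width=1.0):
    """x-positions for a bifocal parallel coordinates plot: the focus
    axes share `focus_frac` of the width at wide spacing, while the
    context axes are packed into the remainder."""
    focus_w = width * focus_frac
    context_w = width - focus_w
    focus = [focus_w * i / max(1, n_focus - 1) for i in range(n_focus)]
    context = [focus_w + context_w * (i + 1) / n_context
               for i in range(n_context)]
    return focus, context

focus, context = bpcp_axis_positions(n_focus=3, n_context=10)
assert len(focus) == 3 and len(context) == 10
# Focus axes are widely spaced; context axes are packed tightly.
assert (focus[1] - focus[0]) > (context[1] - context[0])
```

With 3 focus and 10 context axes, each focus gap is 0.3 of the width versus 0.04 per context gap, which is what lets a few dimensions of interest stay readable while dozens more remain visible in compressed form.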