Market-GAN: Adding Control to Financial Market Data Generation with Semantic Context
Financial simulators play an important role in enhancing forecasting
accuracy, managing risks, and fostering strategic financial decision-making.
Despite the development of financial market simulation methodologies, existing
frameworks often struggle with adapting to specialized simulation contexts. We
pinpoint the challenges as i) current financial datasets do not contain context
labels; ii) current techniques are not designed to generate financial data with
context as control, which demands greater precision compared to other
modalities; iii) the inherent difficulties in generating context-aligned,
high-fidelity data given the non-stationary, noisy nature of financial data. To
address these challenges, our contributions are: i) we propose the Contextual
Market Dataset with market dynamics, stock ticker, and history state as
context, leveraging a market dynamics modeling method that combines linear
regression and Dynamic Time Warping clustering to extract market dynamics; ii)
we present Market-GAN, a novel architecture incorporating a Generative
Adversarial Network (GAN) for controllable generation with context, an
autoencoder for learning low-dimensional features, and supervisors for knowledge
transfer; iii) we introduce a two-stage training scheme to ensure that
Market-GAN captures the intrinsic market distribution with multiple objectives.
In the pretraining stage, with the use of the autoencoder and supervisors, we
prepare the generator with a better initialization for the adversarial training
stage. We propose a set of holistic evaluation metrics that consider alignment,
fidelity, data usability on downstream tasks, and market facts. We evaluate
Market-GAN with the Dow Jones Industrial Average data from 2000 to 2023 and
showcase superior performance in comparison to 4 state-of-the-art time-series
generative models.
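The market-dynamics labeling step described above (linear-regression trends combined with DTW clustering) can be sketched in a few functions. The window length, flat-slope band, and helper names below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def window_slope(prices):
    """Least-squares slope of a price window (linear-regression trend)."""
    t = np.arange(len(prices))
    return np.polyfit(t, prices, 1)[0]

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic-time-warping distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def label_dynamics(prices, window=20, flat_band=0.05):
    """Label each window bearish (0), flat (1), or bullish (2) by its slope."""
    labels = []
    for start in range(0, len(prices) - window + 1, window):
        s = window_slope(prices[start:start + window])
        labels.append(0 if s < -flat_band else (2 if s > flat_band else 1))
    return labels

def assign_cluster(window, centroids):
    """Attach a window to the DTW-nearest centroid (the clustering step)."""
    return min(range(len(centroids)),
               key=lambda k: dtw_distance(window, centroids[k]))
```

The slope-based labels and the DTW assignment together give each window a discrete dynamics label that can then serve as a conditioning context for the generator.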
Generating tabular datasets under differential privacy
Machine Learning (ML) is accelerating progress across fields and industries,
but relies on accessible and high-quality training data. Some of the most
important datasets are found in biomedical and financial domains in the form of
spreadsheets and relational databases. But this tabular data is often sensitive
in nature. Synthetic data generation offers the potential to unlock sensitive
data, but generative models tend to memorise and regurgitate training data,
which undermines the privacy goal. To remedy this, researchers have
incorporated the mathematical framework of Differential Privacy (DP) into the
training process of deep neural networks. But this creates a trade-off between
the quality and privacy of the resulting data. Generative Adversarial Networks
(GANs) are the dominant paradigm for synthesising tabular data under DP, but
suffer from unstable adversarial training and mode collapse, which are
exacerbated by the privacy constraints and challenging tabular data modality.
This work optimises the quality-privacy trade-off of generative models,
producing higher quality tabular datasets with the same privacy guarantees. We
implement novel end-to-end models that leverage attention mechanisms to learn
reversible tabular representations. We also introduce TableDiffusion, the first
differentially-private diffusion model for tabular data synthesis. Our
experiments show that TableDiffusion produces higher-fidelity synthetic
datasets, avoids the mode collapse problem, and achieves state-of-the-art
performance on privatised tabular data synthesis. By implementing
TableDiffusion to predict the added noise, we enabled it to bypass the
challenges of reconstructing mixed-type tabular data. Overall, the diffusion
paradigm proves vastly more data and privacy efficient than the adversarial
paradigm, due to augmented re-use of each data batch and a smoother iterative
training process.
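A minimal sketch of the two mechanisms this abstract pairs: the noise-prediction objective that a TableDiffusion-style model trains on, and the per-example gradient clipping plus Gaussian noise at the core of DP-SGD. The function names and the cosine schedule are illustrative assumptions, not the paper's code:

```python
import numpy as np

def cosine_alpha_bar(T=100):
    """Cumulative signal-retention schedule (alpha-bar), cosine-shaped, 1 -> ~0."""
    t = np.linspace(0.0, 1.0, T)
    return np.cos(0.5 * np.pi * t) ** 2

def diffuse(x0, alpha_bar_t, eps):
    """Forward diffusion of a tabular row: x_t = sqrt(ab)*x0 + sqrt(1-ab)*eps."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def eps_loss(eps_true, eps_pred):
    """Noise-prediction objective: the model regresses the added noise."""
    return float(np.mean((eps_true - eps_pred) ** 2))

def dp_sgd_step(grad, max_norm, noise_multiplier, rng):
    """DP-SGD core: clip a per-example gradient, then add Gaussian noise."""
    scale = min(1.0, max_norm / max(np.linalg.norm(grad), 1e-12))
    return grad * scale + rng.normal(0.0, noise_multiplier * max_norm, grad.shape)
```

Predicting the added noise rather than reconstructing the row directly is what lets the model sidestep mixed-type reconstruction: the regression target is always continuous Gaussian noise.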
Unveiling the frontiers of deep learning: innovations shaping diverse domains
Deep learning (DL) enables the development of computer models that are
capable of learning, visualizing, optimizing, refining, and predicting data. In
recent years, DL has been applied in a range of fields, including audio-visual
data processing, agriculture, transportation prediction, natural language,
biomedicine, disaster management, bioinformatics, drug design, genomics, face
recognition, and ecology. To explore the current state of deep learning, it is
necessary to investigate the latest developments and applications of deep
learning in these disciplines. However, the literature lacks a comprehensive
survey of deep learning applications across all potential sectors. This paper thus
extensively investigates the potential applications of deep learning across all
major fields of study as well as the associated benefits and challenges. As
evidenced in the literature, DL achieves high accuracy in prediction and
analysis, which makes it a powerful computational tool, and it can extract and
optimize features on its own, making it effective on raw data without manual
feature engineering. This autonomy comes at a cost: deep learning requires
massive amounts of data for effective analysis and processing. To handle the
challenge of compiling huge amounts of medical,
scientific, healthcare, and environmental data for use in deep learning, gated
architectures like LSTMs and GRUs can be utilized. For multimodal learning,
shared neurons in the neural network for all activities and specialized neurons
for particular tasks are necessary. Comment: 64 pages, 3 figures, 3 tables
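As one concrete instance of the gated architectures mentioned above, a single GRU step can be written out directly; the tiny dimensions and zero-initialized weights below are purely illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, Wz, Wr, Wh):
    """One GRU step: gates decide how much past state to keep or overwrite."""
    v = np.concatenate([x, h])
    z = sigmoid(Wz @ v)                                  # update gate
    r = sigmoid(Wr @ v)                                  # reset gate
    h_cand = np.tanh(Wh @ np.concatenate([x, r * h]))    # candidate state
    return (1.0 - z) * h + z * h_cand
```

With all-zero weights both gates sit at 0.5 and the candidate at 0, so the state simply halves each step; trained weights let the update gate preserve long-range information, which is what makes gated cells suitable for very long sequential records.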
Will they take this offer? A machine learning price elasticity model for predicting upselling acceptance of premium airline seating
Employing customer information from one of the world's largest airline companies, we develop a price elasticity model (PREM) using machine learning to identify customers likely to purchase an upgrade offer from economy to premium class and predict a customer's acceptable price range. A simulation of 64.3 million flight bookings and 14.1 million email offers over three years mirroring actual data indicates that PREM implementation results in approximately 1.12 million (7.94%) fewer non-relevant customer email messages, a predicted increase of 72,200 (37.2%) offers accepted, and an estimated $72.2 million (37.2%) of increased revenue. Our results illustrate the potential of automated pricing information and targeting marketing messages for upselling acceptance. We also identified three customer segments: (1) Never Upgrades are those who never take the upgrade offer, (2) Upgrade Lovers are those who generally upgrade, and (3) Upgrade Lover Lookalikes have no historical record but fit the profile of those that tend to upgrade. We discuss the implications for airline companies and related travel and tourism industries. © 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
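A toy version of the idea: model acceptance probability as a logistic function of offer price, then read off the price range where predicted acceptance stays above a threshold. The coefficients, grid, and threshold here are invented for illustration, not fitted to any airline data:

```python
import numpy as np

def acceptance_prob(price, base_utility, price_sensitivity):
    """Logistic acceptance model: P(accept) falls as the offer price rises."""
    return 1.0 / (1.0 + np.exp(-(base_utility - price_sensitivity * price)))

def acceptable_price_range(base_utility, price_sensitivity, min_prob=0.5,
                           grid=None):
    """Price interval over which predicted acceptance stays above min_prob."""
    grid = np.linspace(0.0, 500.0, 501) if grid is None else grid
    ok = grid[acceptance_prob(grid, base_utility, price_sensitivity) >= min_prob]
    return (float(ok.min()), float(ok.max())) if ok.size else None
```

In practice the two coefficients would be learned per customer segment, which is what turns the same curve into a targeting rule: skip customers whose acceptable range never reaches the offered price.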
Advances in Artificial Intelligence: Models, Optimization, and Machine Learning
The present book contains all the articles accepted and published in the Special Issue “Advances in Artificial Intelligence: Models, Optimization, and Machine Learning” of the MDPI Mathematics journal, which covers a wide range of topics connected to the theory and applications of artificial intelligence and its subfields. These topics include, among others, deep learning and classic machine learning algorithms, neural modelling, architectures and learning algorithms, biologically inspired optimization algorithms, algorithms for autonomous driving, probabilistic models and Bayesian reasoning, intelligent agents and multiagent systems. We hope that the scientific results presented in this book will serve as valuable sources of documentation and inspiration for anyone willing to pursue research in artificial intelligence, machine learning and their widespread applications.
Biopsychosocial Assessment and Ergonomics Intervention for Sustainable Living: A Case Study on Flats
This study proposes an ergonomics-based approach for people living in small housing units (known as flats) in Indonesia. With regard to human capability and limitation, this research shows how the basic needs of the residents are captured and analyzed, followed by proposed designs of facilities and living standards for small housing. Ninety participants took part in the study through in-depth interviews and face-to-face questionnaires. The results show that several modifications of critical facilities were proposed (such as a multifunction ironing workstation, bed furniture, and a clothesline) and validated through usability testing. Overall, it is hoped that the proposed designs will support biopsychosocial needs and sustainability.
A genetic algorithm optimized fractal model to predict the constriction resistance from surface roughness measurements
The electrical contact resistance greatly influences the thermal behavior of substation connectors and other electrical equipment. During the design stage of such electrical devices, it is essential to accurately predict the contact resistance to achieve an optimal thermal behavior, thus ensuring contact stability and extended service life. This paper develops a genetic algorithm (GA) approach to determine the optimal values of the parameters of a fractal model of rough surfaces to accurately predict the measured value of the surface roughness. This GA-optimized fractal model provides an accurate prediction of the contact resistance when the electrical and mechanical properties of the contacting materials, surface roughness, contact pressure, and apparent area of contact are known. Experimental results corroborate the usefulness and accuracy of the proposed approach. Although the proposed model has been validated for substation connectors, it can also be applied in the design stage of many other electrical equipment.
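The GA step can be sketched generically: evolve a population of real-valued parameter vectors to minimize the mismatch between a model's prediction and a measurement. The operators (elitism, blend crossover, Gaussian mutation) and all hyperparameters below are illustrative choices, not the paper's:

```python
import random

def genetic_fit(objective, bounds, pop_size=40, generations=60, seed=1):
    """Minimal real-coded GA: elitism, blend crossover, Gaussian mutation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=objective)[: pop_size // 4]   # keep best quarter
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = []
            for x, y, (lo, hi) in zip(a, b, bounds):
                g = 0.5 * (x + y) + rng.gauss(0.0, 0.05 * (hi - lo))
                child.append(min(max(g, lo), hi))             # stay in bounds
            children.append(child)
        pop = elite + children
    return min(pop, key=objective)
```

Fitting fractal-model parameters to a measured roughness value would then pass an objective such as `lambda p: (model(p) - measured) ** 2`, where `model` is the fractal surface model.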
Accelerating Defect Predictions in Semiconductors Using Graph Neural Networks
Here, we develop a framework for the prediction and screening of native
defects and functional impurities in a chemical space of Group IV, III-V, and
II-VI zinc blende (ZB) semiconductors, powered by crystal Graph-based Neural
Networks (GNNs) trained on high-throughput density functional theory (DFT)
data. Using an innovative approach of sampling partially optimized defect
configurations from DFT calculations, we generate one of the largest
computational defect datasets to date, containing many types of vacancies,
self-interstitials, anti-site substitutions, impurity interstitials and
substitutions, as well as some defect complexes. We applied three types of
established GNN techniques, namely Crystal Graph Convolutional Neural Network
(CGCNN), Materials Graph Network (MEGNET), and Atomistic Line Graph Neural
Network (ALIGNN), to rigorously train models for predicting defect formation
energy (DFE) in multiple charge states and chemical potential conditions. We
find that ALIGNN yields the best DFE predictions with root mean square errors
around 0.3 eV, which represents a prediction accuracy of 98 % given the range
of values within the dataset, improving significantly on the state-of-the-art.
Models are tested for different defect types as well as for defect charge
transition levels. We further show that GNN-based defective structure
optimization can take us close to DFT-optimized geometries at a fraction of the
cost of full DFT. DFT-GNN models enable prediction and screening across
thousands of hypothetical defects based on both unoptimized and
partially-optimized defective structures, helping identify electronically
active defects in technologically-important semiconductors.
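The way an RMSE is read as a percentage accuracy relative to the dataset's value range, as quoted above, is simple to make explicit. The helper names and the numbers below are toy values for illustration, not the paper's DFE data:

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between predicted and reference values."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)

def range_normalized_accuracy(y_true, y_pred):
    """Accuracy (%) as 1 - RMSE / (max - min) of the reference values."""
    spread = max(y_true) - min(y_true)
    return 100.0 * (1.0 - rmse(y_true, y_pred) / spread)
```

By this measure, an RMSE of 0.3 eV over a roughly 15 eV spread of formation energies corresponds to about 98%.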
Can Tabular Generative Models Generate Realistic Synthetic Near Infrared Spectroscopic Data?
In this thesis, we evaluated the performance of two generative models, Conditional Tabular
Generative Adversarial Network (CTGAN) and Tabular Variational Autoencoder (TVAE), from the
open-source library Synthetic Data Vault (SDV), for generating synthetic Near Infrared (NIR)
spectral data. The aim was to assess the viability of these models in synthetic data generation
for predicting Dry Matter Content (DMC) in the field of NIR spectroscopy. The fidelity and
utility of the synthetic data were examined through a series of benchmarks, including statistical
comparisons, dimensionality reduction, and machine learning tasks.
The results showed that while both CTGAN and TVAE could generate synthetic data with
statistical properties similar to real data, TVAE outperformed CTGAN in terms of preserving
the correlation structure of the data and the relationship between the features and the target
variable, DMC. However, the synthetic data fell short in fooling machine learning classifiers,
indicating a persisting challenge in synthetic data generation.
With respect to utility, neither the synthetic dataset produced by CTGAN nor the one produced by TVAE could serve as
a satisfactory substitute for real data in training machine learning models for predicting DMC.
Although TVAE-generated synthetic data showed some potential when used with Random
Forest (RF) and K-Nearest Neighbors (KNN) classifiers, the performance was still inadequate for
practical use.
This study offers valuable insights into the use of generative models for synthetic NIR spectral
data generation, highlighting their current limitations and potential areas for future research.
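The statistical-fidelity benchmarks the thesis describes (marginal statistics and preservation of correlation structure) reduce to a few lines; the helper names below are illustrative, not the thesis's code:

```python
import numpy as np

def marginal_gaps(real, synth):
    """Mean absolute gap between per-feature means and per-feature stds."""
    mean_gap = float(np.mean(np.abs(real.mean(axis=0) - synth.mean(axis=0))))
    std_gap = float(np.mean(np.abs(real.std(axis=0) - synth.std(axis=0))))
    return mean_gap, std_gap

def correlation_gap(real, synth):
    """Frobenius-norm gap between the two feature-correlation matrices."""
    return float(np.linalg.norm(np.corrcoef(real.T) - np.corrcoef(synth.T)))
```

Low marginal gaps with a large correlation gap is exactly the failure mode reported for CTGAN here: per-feature distributions can match while the joint structure that a downstream DMC regressor relies on is lost.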