Interpretable Ensemble Learning for Materials Property Prediction with Classical Interatomic Potentials: Carbon as an Example
Machine learning (ML) is widely used to explore crystalline materials and
predict their properties. However, training deep-learning models is
time-consuming, and the regression process is a black box that is hard to
interpret. In addition, the preprocessing step that converts a crystal
structure into an ML input, called a descriptor, must be designed carefully.
To predict important materials properties efficiently, we propose an approach
based on ensemble learning, consisting of regression trees that predict
formation energy and elastic constants from small datasets, using carbon
allotropes as an example. Instead of a hand-crafted descriptor, the inputs
are the properties calculated by molecular dynamics with 9 different
classical interatomic potentials. Overall, the results from ensemble learning
are more accurate than those from the classical interatomic potentials, and
ensemble learning can identify the relatively accurate properties among the 9
classical potentials as criteria for predicting the final properties.
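A minimal sketch of this kind of pipeline, using synthetic data and scikit-learn's random-forest regressor standing in for the authors' exact ensemble: each structure is described only by the same property as estimated by several classical potentials, and the trees learn which potentials to trust.

```python
# Sketch (synthetic data, not the authors' code): an ensemble of regression
# trees predicting a target property from the same property computed by
# 9 classical interatomic potentials, with no structural descriptor.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_structures, n_potentials = 120, 9

# True formation energies (eV/atom) for hypothetical carbon-like structures.
y_true = rng.uniform(-0.5, 1.5, size=n_structures)

# Each potential estimates the property with its own bias and noise.
bias = rng.normal(0.0, 0.3, size=n_potentials)
X = y_true[:, None] + bias[None, :] + rng.normal(0.0, 0.1, (n_structures, n_potentials))

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X[:100], y_true[:100])

pred = model.predict(X[100:])
rmse = float(np.sqrt(np.mean((pred - y_true[100:]) ** 2)))

# Feature importances indicate which potentials the ensemble relies on most,
# which is the interpretability claim made in the abstract.
importances = model.feature_importances_
```

The feature importances are what make this ensemble interpretable: they show which of the 9 potentials drive the final prediction.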
Text Similarity Between Concepts Extracted from Source Code and Documentation
Context: Constant evolution in software systems often results in their documentation falling out of sync with the content of the source code. The traceability research field has long aimed to recover links between code and documentation when the two fall out of sync. Objective: The aim of this paper is to compare the concepts contained within the source code of a system with those extracted from its documentation, in order to detect how similar these two sets are. If vastly different, the difference between the two sets might indicate considerable ageing of the documentation, and a need to update it. Methods: In this paper we reduce the source code of 50 software systems to a set of key terms, each containing the concepts of one of the sampled systems. At the same time, we reduce the documentation of each system to another set of key terms. We then use four different set-comparison approaches to detect how similar the sets are. Results: Using the well-known Jaccard index as the benchmark for the comparisons, we found that cosine distance has excellent comparative power, although the results depend on the pre-training of the underlying model. In particular, the SpaCy and FastText embeddings yield similarity scores of up to 80% and 90%, respectively. Conclusion: For most of the sampled systems, the source code and the documentation tend to contain very similar concepts. Given the accuracy of one pre-trained model (e.g., FastText), it also becomes evident that a few systems show a measurable drift between the concepts contained in the documentation and in the source code.
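As an illustration of the two set-comparison measures discussed, here is a small sketch (toy terms and random stand-in vectors, not the paper's SpaCy/FastText pipeline) contrasting the Jaccard index over raw term sets with cosine similarity over mean-pooled term embeddings:

```python
# Toy illustration: Jaccard index on exact term overlap vs cosine similarity
# on averaged term vectors. Real pipelines would look terms up in a
# pretrained embedding model; here the vectors are random stand-ins.
import numpy as np

code_terms = {"parser", "token", "grammar", "cache"}
doc_terms = {"parser", "token", "syntax", "cache", "manual"}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)

rng = np.random.default_rng(1)
vocab = sorted(code_terms | doc_terms)
vectors = {w: rng.normal(size=8) for w in vocab}   # stand-in embeddings

def centroid(terms: set) -> np.ndarray:
    return np.mean([vectors[t] for t in terms], axis=0)

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

set_overlap = jaccard(code_terms, doc_terms)       # strict set overlap
embed_sim = cosine(centroid(code_terms), centroid(doc_terms))
```

Jaccard rewards only exact term matches, whereas the embedding-based cosine can score near-synonyms (e.g. "grammar" vs "syntax") as similar, which is why the two measures can disagree.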
Review of Low Voltage Load Forecasting: Methods, Applications, and Recommendations
The increased digitalisation and monitoring of the energy system opens up
numerous opportunities for decarbonisation. Applications on low-voltage
local networks, such as community energy markets and smart storage,
will facilitate decarbonisation, but they will require advanced control and
management. Reliable forecasting will be a necessary component of many of these
systems to anticipate key features and uncertainties. Despite this urgent need,
there has not yet been an extensive investigation into the current
state-of-the-art of low voltage level forecasts, other than at the smart meter
level. This paper aims to provide a comprehensive overview of the landscape,
current approaches, core applications, challenges and recommendations. Another
aim of this paper is to facilitate the continued improvement and advancement in
this area. To this end, the paper also surveys some of the most relevant and
promising trends. It establishes an open, community-driven list of the known
low voltage level open datasets to encourage further research and development.
Comment: 37 pages, 6 figures, 2 tables, review paper
Acoustically Inspired Probabilistic Time-domain Music Transcription and Source Separation (PhD Thesis)
Automatic music transcription (AMT) and source separation are important
computational tasks, which can help to understand, analyse and process music
recordings. The main purpose of AMT is to estimate, from an observed
audio recording, a latent symbolic representation of a piece of music (piano-roll).
In this sense, in AMT the duration and location of every note played is
reconstructed from a mixture recording. The related task of source separation
aims to estimate the latent functions or source signals that were mixed
together in an audio recording. This task requires not only the duration and
location of every event present in the mixture, but also the reconstruction
of the waveform of all the individual sounds. Most methods for AMT and
source separation rely on the magnitude of time-frequency representations
of the analysed recording, i.e., spectrograms, and often arbitrarily discard
phase information. On one hand, this decreases the time resolution in AMT.
On the other hand, discarding phase information corrupts the reconstruction
in source separation, because the phase of each source-spectrogram must
be approximated. There is thus a need for models that circumvent phase
approximation, while operating at sample-rate resolution.
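The effect of discarding phase can be demonstrated in a few lines (a toy single-sinusoid example, not taken from the thesis):

```python
# Quick illustration of the phase problem described above: reconstructing a
# signal from its FFT magnitude alone (i.e. with phase set to zero) does not
# recover the original waveform.
import numpy as np

t = np.linspace(0, 1, 256, endpoint=False)
x = np.sin(2 * np.pi * 5 * t + 1.0)   # a sinusoid with nonzero phase

spectrum = np.fft.rfft(x)
x_exact = np.fft.irfft(spectrum)              # magnitude + phase: exact
x_magonly = np.fft.irfft(np.abs(spectrum))    # magnitude only: corrupted

err_exact = float(np.max(np.abs(x_exact - x)))
err_magonly = float(np.max(np.abs(x_magonly - x)))
```

With phase retained the inverse transform is exact; with magnitude only, the sinusoid's phase offset is lost and the reconstruction error is large, which is precisely why source-separation methods that approximate phase degrade the recovered waveforms.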
This thesis intends to solve AMT and source separation together from
a unified perspective. For this purpose, Bayesian non-parametric signal
processing, covariance kernels designed for audio, and scalable variational
inference are integrated to form efficient and acoustically-inspired probabilistic
models. To circumvent phase approximation while keeping sample-rate
resolution, AMT and source separation are addressed from a Bayesian time-domain
viewpoint. That is, the posterior distribution over the waveform of
each sound event in the mixture is computed directly from the observed data.
For this purpose, Gaussian processes (GPs) are used to define priors over the
sources/pitches. GPs are probability distributions over functions, and their
kernel, or covariance function, determines the properties of the functions sampled from
a GP. Finally, the GP priors and the available data (mixture recording) are
combined using Bayes' theorem in order to compute the posterior distributions
over the sources/pitches.
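A minimal sketch of this Bayesian step, using a generic squared-exponential kernel and synthetic data rather than the thesis's audio kernels:

```python
# Sketch: combine a GP prior (squared-exponential kernel, for illustration)
# with noisy observations via Bayes' theorem to obtain the posterior over
# the latent function. Data are synthetic, not a real mixture recording.
import numpy as np

def rbf(x1, x2, lengthscale=0.05, variance=1.0):
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(0)
x_obs = np.linspace(0, 1, 30)
y_obs = np.sin(2 * np.pi * 3 * x_obs) + 0.05 * rng.normal(size=x_obs.size)
x_new = np.linspace(0, 1, 100)

noise = 0.05 ** 2
K = rbf(x_obs, x_obs) + noise * np.eye(x_obs.size)   # prior + noise model
K_s = rbf(x_new, x_obs)

# Standard GP regression posterior: mean and covariance at the test inputs.
alpha = np.linalg.solve(K, y_obs)
post_mean = K_s @ alpha
post_cov = rbf(x_new, x_new) - K_s @ np.linalg.solve(K, K_s.T)
```

In the thesis's setting the observations would be the mixture waveform and the prior would use an audio-specific kernel per source/pitch, but the Bayes-theorem mechanics are the same.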
Although the proposed paradigm is elegant, it introduces two main challenges.
First, as mentioned before, the kernel of the GP priors determines the
properties of each source/pitch function, that is, its smoothness,
stationarity, and, most importantly, its spectrum. Consequently, the proposed
model requires the design of flexible kernels, able to learn the rich
frequency content and intricate properties of audio sources. To this end,
spectral mixture (SM) kernels are studied, and the Matérn spectral mixture
(MSM) kernel is introduced, i.e., a modified version of the SM covariance
function. The MSM kernel imposes weaker smoothness assumptions, and is thus
more suitable for modelling physical processes. Second, the computational
complexity of GP inference scales cubically with the number of audio samples,
so the application of GP models to long audio signals becomes intractable. To
overcome this limitation, variational inference is used to make the proposed
model scalable and suitable for signals on the order of hundreds of thousands
of data points.
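For illustration, the standard spectral mixture kernel (the Gaussian-envelope form that the Matérn variant modifies) is a sum of windowed cosines, each component contributing one spectral peak:

```python
# Sketch of the standard spectral mixture (SM) kernel:
#   k(tau) = sum_i w_i * exp(-2 pi^2 sigma_i^2 tau^2) * cos(2 pi mu_i tau)
# Each component places a Gaussian peak at frequency mu_i in the spectrum.
# (The thesis's MSM kernel replaces the Gaussian envelope with a Matérn one;
# this is the unmodified form, shown for illustration.)
import numpy as np

def sm_kernel(tau, weights, means, scales):
    tau = np.asarray(tau, dtype=float)
    k = np.zeros_like(tau)
    for w, mu, s in zip(weights, means, scales):
        k += w * np.exp(-2.0 * np.pi**2 * s**2 * tau**2) * np.cos(2.0 * np.pi * mu * tau)
    return k

# Two spectral peaks, e.g. a fundamental at 440 Hz and a partial at 880 Hz.
t = np.linspace(0.0, 0.01, 200)
k_vals = sm_kernel(t, weights=[1.0, 0.5], means=[440.0, 880.0], scales=[5.0, 5.0])
```

Because the kernel is stationary and positive definite, its maximum is at lag zero, where it equals the sum of the component weights.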
The integration of GP priors, kernels intended for audio, and variational
inference could enable time-domain AMT and source separation methods to
reconstruct sources and transcribe music in an efficient and informed manner.
AMT and source separation remain challenging because the spectra of the
sources/pitches overlap with each other in intricate ways. Thus, the
development of probabilistic models capable of differentiating
sources/pitches in the time domain, despite the high similarity between their
spectra, opens the possibility of taking a step towards solving source
separation and automatic music transcription. We demonstrate the utility of
our methods using real and synthesized music audio datasets for various types
of musical instruments.
Automatic Machine Learning: Methods, Systems, Challenges
This open access book presents the first comprehensive overview of general methods in Automatic Machine Learning (AutoML), collects descriptions of existing systems based on these methods, and discusses the first international challenge of AutoML systems. The book serves as a point of entry into this quickly developing field for researchers and advanced students alike, as well as providing a reference for practitioners aiming to use AutoML in their work. The recent success of commercial ML applications and the rapid growth of the field have created a high demand for off-the-shelf ML methods that can be used easily and without expert knowledge. Many recent machine learning successes crucially rely on human experts, who select appropriate ML architectures (deep learning architectures or more traditional ML workflows) and their hyperparameters; the field of AutoML, however, targets a progressive automation of machine learning, based on principles from optimization and machine learning itself.
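As a toy illustration of the optimization principle behind AutoML (not an example from the book), here is random search over a small hyperparameter space, with a synthetic stand-in for a model's validation score:

```python
# Toy sketch of one AutoML ingredient: random search over hyperparameters.
# evaluate() is a synthetic stand-in for training a model and measuring its
# validation score; a real system would fit and cross-validate a model here.
import random

random.seed(0)

def evaluate(depth, lr):
    # Hypothetical score surface peaking at depth=6, lr=0.1.
    return 1.0 - abs(depth - 6) * 0.05 - abs(lr - 0.1)

space = {"depth": list(range(1, 12)), "lr": [0.001, 0.01, 0.1, 0.3]}

best = (-1.0, None, None)
for _ in range(50):                       # 50 random trials
    d = random.choice(space["depth"])
    lr = random.choice(space["lr"])
    score = evaluate(d, lr)
    if score > best[0]:
        best = (score, d, lr)

best_score, best_depth, best_lr = best
```

Real AutoML systems replace this blind sampling with smarter strategies (e.g. Bayesian optimization), but the loop structure, propose a configuration, evaluate it, keep the best, is the same.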
Information technologies for pain management
Millions of people around the world suffer from pain, acute or chronic, which raises the
importance of its screening, assessment and treatment. The importance of pain is attested by
the fact that it is considered the fifth vital sign for indicating basic bodily functions, health
and quality of life, together with the four other vital signs: blood pressure, body
temperature, pulse rate and respiratory rate. However, while these four signals represent
objective physical parameters, pain expresses an emotional state that occurs inside the mind
of each individual; it is therefore highly subjective, which makes its management and
evaluation difficult. For this reason, self-report is considered the most accurate pain
assessment method, wherein patients are asked to periodically rate their pain severity and
related symptoms. Thus, in recent years, computerised systems based on mobile and web
technologies have become increasingly used to enable patients to report their pain, which has
led to the development of electronic pain diaries (ED). This approach may give health care
professionals (HCP) and patients the ability to interact with the system anywhere and at any
time, thoroughly changing the coordinates of time and place and offering invaluable
opportunities for healthcare delivery. However, most of these systems were designed to
interact directly with patients without the presence of a healthcare professional, or without
evidence of reliability and accuracy. In fact, observation of the existing systems revealed a
lack of integration with mobile devices, limited use of web-based interfaces and reduced
interaction with patients in terms of obtaining and viewing information. In addition, the
reliability and accuracy of computerised systems for pain management are rarely proved, and
their effects on HCP and patient outcomes remain understudied.
This thesis is focused on technology for pain management and aims to propose a monitoring
system that includes ubiquitous interfaces specifically oriented to either patients or HCP,
using mobile devices and the Internet, so as to allow decisions based on the knowledge
obtained from analysis of the collected data. With interoperability and cloud computing
technologies in mind, this system uses web services (WS) to manage data, which are stored in
a Personal Health Record (PHR).
A Randomised Controlled Trial (RCT) was implemented to determine the effectiveness of the
proposed computerised monitoring system. The six-week RCT evidenced the advantages
provided by ubiquitous access: HCP and patients were able to interact with the system
anywhere and at any time, using WS to send and receive data. In addition, the collected data
were stored in a PHR, which offers integrity and security as well as permanent online
accessibility to both patients and HCP. The study evidenced not only that the majority of
participants recommend the system, but also that they recognise its suitability for pain
management without requiring advanced skills or experienced users. Furthermore, the system
enabled the definition and management of patient-oriented treatments with reduced therapist
time. The study also revealed that the guidance of HCP at the beginning of the monitoring is
crucial to patients' satisfaction and experience with the system, as evidenced by the high
correlation between recommendation of the application and its suitability to improve pain
management and to provide medical information. There were no significant differences
regarding improvements in the quality of pain treatment between the intervention group and
the control group.
Based on the data collected during the RCT, a clinical decision support system (CDSS) was
developed to offer tailored alarms, reports, and clinical guidance. This CDSS, called the
Patient Oriented Method of Pain Evaluation System (POMPES), is based on the combination of
several statistical models (one-way ANOVA, Kruskal-Wallis and Tukey-Kramer) with an
imputation model based on linear regression. The decisions suggested by the system agreed
fully with the medical diagnoses, demonstrating its suitability for managing pain. Lastly,
inspired by the capability of aerospace systems to deal with different complex data sources
of varied complexity and accuracy, an innovative model was proposed. This model is
characterised by a qualitative analysis stemming from a data fusion method, combined with a
quantitative model based on comparing standard deviations together with the values of
mathematical expectations. The model aimed to compare the effects of technological and
pen-and-paper systems when applied to different dimensions of pain, such as pain intensity,
anxiety, catastrophizing, depression, disability and interference. It was observed that
pen-and-paper and technology produced equivalent effects on anxiety, depression,
interference and pain intensity. In contrast, technology evidenced favourable effects in
terms of catastrophizing and disability. The proposed method proved to be suitable,
intelligible, easy to implement, and undemanding of time and resources. Further work is
needed to evaluate the proposed system by following up participants over longer periods,
including a complementary RCT encompassing patients with chronic pain symptoms. Finally,
additional studies should be conducted to determine the economic effects not only on
patients but also on the healthcare system.
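An illustrative sketch (hypothetical data, not POMPES itself) of the kinds of building blocks such a CDSS combines: a one-way ANOVA and a Kruskal-Wallis test comparing patient groups, plus linear-regression imputation of a missing pain score from a correlated covariate:

```python
# Sketch of the statistical ingredients named above, on synthetic data:
# parametric (one-way ANOVA) and non-parametric (Kruskal-Wallis) group
# comparison, and linear-regression imputation of a missing value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(4.0, 1.0, 30)   # pain scores under treatment A
group_b = rng.normal(5.5, 1.0, 30)   # pain scores under treatment B

f_stat, anova_p = stats.f_oneway(group_a, group_b)   # parametric test
h_stat, kw_p = stats.kruskal(group_a, group_b)       # rank-based test

# Linear-regression imputation: predict a missing pain score from a
# correlated covariate (here a hypothetical disability index).
disability = rng.uniform(0, 10, 30)
pain = 0.5 * disability + rng.normal(0, 0.5, 30)
slope, intercept, *_ = stats.linregress(disability, pain)
imputed = slope * 6.0 + intercept    # imputed score for disability == 6.0
```

In a CDSS these pieces would feed the decision layer: the tests flag clinically meaningful differences, while the imputation fills gaps in patient-reported diaries before analysis.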