78 research outputs found

    Interpretable Ensemble Learning for Materials Property Prediction with Classical Interatomic Potentials: Carbon as an Example

    Full text link
    Machine learning (ML) is widely used to explore crystal materials and predict their properties. However, training deep-learning models is time-consuming, the regression process is a black box that is hard to interpret, and the preprocessing step that transforms a crystal structure into an ML input, called a descriptor, must be designed carefully. To predict important materials properties efficiently, we propose an approach based on ensemble learning with regression trees, predicting formation energy and elastic constants from small datasets, with carbon allotropes as an example. Without using any descriptor, the inputs are the properties calculated by molecular dynamics with 9 different classical interatomic potentials. Overall, the results from ensemble learning are more accurate than those from the classical interatomic potentials, and ensemble learning can pick out the relatively accurate properties among the 9 classical potentials as criteria for predicting the final properties.
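    The pipeline this abstract describes, tree ensembles fed by the outputs of several classical potentials, can be sketched briefly. The following is a minimal illustration, not the authors' code: the feature matrix and targets are placeholders, and a random forest stands in for whichever tree ensemble the paper uses.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    # Placeholder data: one row per carbon allotrope, one column per
    # formation energy computed with each of the 9 classical potentials.
    X = rng.normal(size=(120, 9))
    y = X @ rng.normal(size=9) + 0.1 * rng.normal(size=120)  # stand-in reference values

    model = RandomForestRegressor(n_estimators=300, random_state=0)
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    print("MAE per fold:", -scores)

    model.fit(X, y)
    # Feature importances show which potentials the ensemble treats as most reliable,
    # the role the abstract's "criteria" play.
    print("Per-potential importances:", model.feature_importances_)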

    Text Similarity Between Concepts Extracted from Source Code and Documentation

    Get PDF
    Context: Constant evolution in software systems often results in their documentation losing sync with the content of the source code. The traceability research field has long aimed to recover links between code and documentation when the two fall out of sync. Objective: The aim of this paper is to compare the concepts contained within the source code of a system with those extracted from its documentation, in order to detect how similar these two sets are. If vastly different, the difference between the two sets might indicate considerable ageing of the documentation and a need to update it. Methods: In this paper we reduce the source code of 50 software systems to a set of key terms, each containing the concepts of one of the systems sampled. At the same time, we reduce the documentation of each system to another set of key terms. We then use four different approaches for set comparison to detect how similar the sets are. Results: Using the well-known Jaccard index as the benchmark for the comparisons, we found that the cosine distance has excellent comparative power, though its performance depends on the pre-training of the machine learning model. In particular, the SpaCy and FastText embeddings offer similarity scores of up to 80% and 90%, respectively. Conclusion: For most of the sampled systems, the source code and the documentation tend to contain very similar concepts. Given the accuracy of one pre-trained model (e.g., FastText), it also becomes evident that a few systems show a measurable drift between the concepts contained in the documentation and in the source code.
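    The two headline measures are easy to state concretely. A minimal sketch, with toy key-term sets and deterministic random vectors standing in for the SpaCy/FastText embeddings the paper evaluates:

    import numpy as np

    code_terms = {"parser", "token", "queue", "scheduler"}
    doc_terms = {"parser", "token", "thread", "scheduler"}

    # Jaccard index: overlap of the two key-term sets.
    jaccard = len(code_terms & doc_terms) / len(code_terms | doc_terms)

    # Cosine similarity between centroid vectors of the two term sets;
    # random vectors replace real pre-trained embeddings in this sketch.
    emb = {t: np.random.default_rng(abs(hash(t)) % 2**32).normal(size=50)
           for t in code_terms | doc_terms}
    v_code = np.mean([emb[t] for t in code_terms], axis=0)
    v_doc = np.mean([emb[t] for t in doc_terms], axis=0)
    cosine = v_code @ v_doc / (np.linalg.norm(v_code) * np.linalg.norm(v_doc))

    print(f"Jaccard: {jaccard:.2f}, cosine: {cosine:.2f}")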

    Review of Low Voltage Load Forecasting: Methods, Applications, and Recommendations

    Full text link
    The increased digitalisation and monitoring of the energy system opens up numerous opportunities to decarbonise it. Applications on low voltage, local networks, such as community energy markets and smart storage, will facilitate decarbonisation, but they will require advanced control and management. Reliable forecasting will be a necessary component of many of these systems to anticipate key features and uncertainties. Despite this urgent need, there has not yet been an extensive investigation into the current state-of-the-art of low voltage level forecasts, other than at the smart meter level. This paper aims to provide a comprehensive overview of the landscape: current approaches, core applications, challenges and recommendations. Another aim of this paper is to facilitate continued improvement and advancement in this area; to this end, the paper also surveys some of the most relevant and promising trends. It establishes an open, community-driven list of the known low voltage level open datasets to encourage further research and development. Comment: 37 pages, 6 figures, 2 tables, review paper
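    As a concrete reference point for the methods such a review benchmarks, the sketch below implements a seasonal-naive baseline common in low voltage forecasting: predict each half-hour of the next day with the value observed one week earlier. All data here are synthetic.

    import numpy as np

    rng = np.random.default_rng(1)
    days = 28
    t = np.linspace(0, 2 * np.pi * days, 48 * days)               # 48 half-hours per day
    load = 1.0 + 0.5 * np.sin(t) + 0.1 * rng.normal(size=t.size)  # synthetic feeder load (kW)

    history, actual = load[:-48], load[-48:]
    forecast = history[-48 * 7:-48 * 6]                           # same day one week earlier

    print(f"Seasonal-naive MAE: {np.abs(forecast - actual).mean():.3f} kW")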

    Acoustically Inspired Probabilistic Time-domain Music Transcription and Source Separation

    Get PDF
    PhD Thesis. Automatic music transcription (AMT) and source separation are important computational tasks, which can help to understand, analyse and process music recordings. The main purpose of AMT is to estimate, from an observed audio recording, a latent symbolic representation of a piece of music (piano-roll). In this sense, in AMT the duration and location of every note played is reconstructed from a mixture recording. The related task of source separation aims to estimate the latent functions or source signals that were mixed together in an audio recording. This task requires not only the duration and location of every event present in the mixture, but also the reconstruction of the waveform of all the individual sounds. Most methods for AMT and source separation rely on the magnitude of time-frequency representations of the analysed recording, i.e., spectrograms, and often arbitrarily discard phase information. On one hand, this decreases the time resolution in AMT. On the other hand, discarding phase information corrupts the reconstruction in source separation, because the phase of each source-spectrogram must be approximated. There is thus a need for models that circumvent phase approximation while operating at sample-rate resolution.
    This thesis intends to solve AMT and source separation together, from a unified perspective. For this purpose, Bayesian non-parametric signal processing, covariance kernels designed for audio, and scalable variational inference are integrated to form efficient and acoustically inspired probabilistic models. To circumvent phase approximation while keeping sample-rate resolution, AMT and source separation are addressed from a Bayesian time-domain viewpoint; that is, the posterior distribution over the waveform of each sound event in the mixture is computed directly from the observed data. For this purpose, Gaussian processes (GPs) are used to define priors over the sources/pitches. GPs are probability distributions over functions, and their kernel or covariance determines the properties of the functions sampled from a GP. Finally, the GP priors and the available data (the mixture recording) are combined using Bayes' theorem in order to compute the posterior distributions over the sources/pitches.
    Although the proposed paradigm is elegant, it introduces two main challenges. First, as mentioned before, the kernel of the GP priors determines the properties of each source/pitch function: its smoothness, its stationarity and, more importantly, its spectrum. Consequently, the proposed model requires the design of flexible kernels, able to learn the rich frequency content and intricate properties of audio sources. To this end, spectral mixture (SM) kernels are studied, and the Matérn spectral mixture (MSM) kernel is introduced, i.e. a modified version of the SM covariance function. The MSM kernel imposes weaker smoothness, so it is more suitable for modelling physical processes. Second, the computational complexity of GP inference scales cubically with the number of audio samples, so the application of GP models to large audio signals becomes intractable. To overcome this limitation, variational inference is used to make the proposed model scalable and suitable for signals on the order of hundreds of thousands of data points.
    The integration of GP priors, kernels intended for audio, and variational inference could enable time-domain AMT and source separation methods to reconstruct sources and transcribe music in an efficient and informed manner. In addition, AMT and source separation remain open challenges, because the spectra of the sources/pitches overlap with each other in intricate ways. Thus, the development of probabilistic models capable of differentiating sources/pitches in the time domain, despite the high similarity between their spectra, opens the possibility to take a step towards solving source separation and automatic music transcription. We demonstrate the utility of our methods using real and synthesized music audio datasets for various types of musical instruments.
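    The core Bayesian step described here can be sketched in a few lines. The kernel below is a single spectral-mixture-style component (a Gaussian envelope modulating a cosine centred on a pitch frequency), not the thesis's Matérn spectral mixture, and the exact inference shown scales cubically; the variational machinery that makes it tractable at scale is omitted.

    import numpy as np

    def sm_kernel(t1, t2, sigma=1.0, lengthscale=0.02, freq=220.0):
        # One spectral-mixture-style component: RBF envelope times a cosine.
        d = t1[:, None] - t2[None, :]
        return sigma**2 * np.exp(-0.5 * (d / lengthscale) ** 2) * np.cos(2 * np.pi * freq * d)

    fs = 4000.0
    t = np.arange(0, 0.05, 1 / fs)                       # 200 audio samples
    rng = np.random.default_rng(0)
    y = np.sin(2 * np.pi * 220.0 * t) + 0.1 * rng.normal(size=t.size)  # noisy 220 Hz source

    K = sm_kernel(t, t) + 0.1**2 * np.eye(t.size)        # prior covariance plus noise
    post_mean = sm_kernel(t, t) @ np.linalg.solve(K, y)  # GP posterior mean, in the time domain
    rmse = np.sqrt(np.mean((post_mean - np.sin(2 * np.pi * 220.0 * t)) ** 2))
    print(f"waveform reconstruction RMSE: {rmse:.3f}")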

    Systems for AutoML Research

    Get PDF

    Automatic machine learning: methods, systems, challenges

    Get PDF
    This open access book presents the first comprehensive overview of general methods in Automatic Machine Learning (AutoML), collects descriptions of existing systems based on these methods, and discusses the first international challenge of AutoML systems. The book serves as a point of entry into this quickly developing field for researchers and advanced students alike, as well as a reference for practitioners aiming to use AutoML in their work. The recent success of commercial ML applications and the rapid growth of the field have created a high demand for off-the-shelf ML methods that can be used easily and without expert knowledge. Many recent machine learning successes crucially rely on human experts, who select appropriate ML architectures (deep learning architectures or more traditional ML workflows) and their hyperparameters; the field of AutoML targets a progressive automation of machine learning, based on principles from optimization and machine learning itself.
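    The optimisation loop AutoML builds on can be illustrated with its simplest instance, random search over hyperparameters; the dataset and search space below are arbitrary choices for the sketch, not drawn from the book.

    from scipy.stats import randint
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import RandomizedSearchCV

    X, y = load_digits(return_X_y=True)
    # Search space: distributions to sample candidate hyperparameters from.
    space = {"n_estimators": randint(50, 300), "max_depth": randint(2, 16)}
    search = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                                space, n_iter=10, cv=3, random_state=0)
    search.fit(X, y)
    print(search.best_params_, f"accuracy={search.best_score_:.3f}")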

    Information technologies for pain management

    Get PDF
    Millions of people around the world suffer from pain, acute or chronic, and this raises the importance of its screening, assessment and treatment. The importance of pain is attested by the fact that it is considered the fifth vital sign for indicating basic bodily functions, health and quality of life, together with the four other vital signs: blood pressure, body temperature, pulse rate and respiratory rate. However, while these four signals represent objective physical parameters, the occurrence of pain expresses an emotional state that happens inside the mind of each individual; it is therefore highly subjective, which makes its management and evaluation difficult. For this reason, self-report is considered the most accurate pain assessment method, wherein patients are asked to periodically rate their pain severity and related symptoms. Thus, in recent years, computerised systems based on mobile and web technologies have become increasingly used to enable patients to report their pain, which has led to the development of electronic pain diaries (ED). This approach may give health care professionals (HCP) and patients the ability to interact with the system anywhere and at any time, thoroughly changing the coordinates of time and place and offering invaluable opportunities for healthcare delivery. However, most of these systems were designed to interact directly with patients, without the presence of a healthcare professional and without evidence of reliability and accuracy. In fact, observation of the existing systems revealed a lack of integration with mobile devices, limited use of web-based interfaces and reduced interaction with patients in terms of obtaining and viewing information. In addition, the reliability and accuracy of computerised systems for pain management are rarely proved, and their effects on HCP and patient outcomes remain understudied.
    This thesis focuses on technology for pain management and proposes a monitoring system that includes ubiquitous interfaces specifically oriented to either patients or HCP, using mobile devices and the Internet, so as to allow decisions based on the knowledge obtained from analysis of the collected data. With interoperability and cloud computing technologies in mind, this system uses web services (WS) to manage data, which are stored in a Personal Health Record (PHR). A Randomised Controlled Trial (RCT) was implemented to determine the effectiveness of the proposed computerised monitoring system. The six-week RCT evidenced the advantages provided by ubiquitous access: HCP and patients were able to interact with the system anywhere and at any time, using WS to send and receive data. In addition, the collected data were stored in a PHR, which offers integrity and security as well as permanent online accessibility to both patients and HCP. The study evidenced not only that the majority of participants recommend the system, but also that they recognize its suitability for pain management without requiring advanced skills or experienced users. Furthermore, the system enabled the definition and management of patient-oriented treatments with reduced therapist time.
    The study also revealed that guidance from HCP at the beginning of the monitoring is crucial to patients' satisfaction with, and experience of, the system, as evidenced by the high correlation between recommendation of the application and its suitability to improve pain management and to provide medical information. There were no significant differences regarding improvements in the quality of pain treatment between the intervention group and the control group. Based on the data collected during the RCT, a clinical decision support system (CDSS) was developed to offer tailored alarms, reports and clinical guidance. This CDSS, called the Patient Oriented Method of Pain Evaluation System (POMPES), is based on the combination of several statistical models (one-way ANOVA, Kruskal-Wallis and Tukey-Kramer) with an imputation model based on linear regression. The decisions suggested by the system fully agreed with the medical diagnoses, demonstrating its suitability for managing pain. Finally, based on the capability of aerospace systems to deal with different complex data sources of varied complexity and accuracy, an innovative model was proposed. This model is characterized by a qualitative analysis stemming from a data fusion method, combined with a quantitative model based on comparing standard deviations together with the values of mathematical expectations. The model aimed to compare the effects of technological and pen-and-paper systems when applied to different dimensions of pain, such as pain intensity, anxiety, catastrophizing, depression, disability and interference. It was observed that pen-and-paper and technology produced equivalent effects on anxiety, depression, interference and pain intensity; in contrast, technology evidenced favourable effects in terms of catastrophizing and disability. The proposed method proved suitable, intelligible, easy to implement and undemanding in time and resources. Further work is needed to evaluate the proposed system over longer follow-up periods, including a complementary RCT encompassing patients with chronic pain symptoms. Finally, additional studies should be addressed to determine the economic effects not only on patients but also on the healthcare system.
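    A minimal sketch of the statistical core attributed to POMPES, using synthetic 0-10 pain ratings over three monitoring weeks; the Tukey-Kramer step is omitted and the regression imputation is simplified relative to the thesis.

    import numpy as np
    from scipy.stats import f_oneway, kruskal

    rng = np.random.default_rng(42)
    week1 = np.clip(rng.normal(6.0, 1.5, 30), 0, 10)   # synthetic pain ratings
    week2 = np.clip(rng.normal(5.2, 1.5, 30), 0, 10)
    week3 = np.clip(rng.normal(4.1, 1.5, 30), 0, 10)

    _, p_anova = f_oneway(week1, week2, week3)         # one-way ANOVA
    _, p_kw = kruskal(week1, week2, week3)             # Kruskal-Wallis
    print(f"ANOVA p={p_anova:.4f}, Kruskal-Wallis p={p_kw:.4f}")

    # Linear-regression imputation of a missing rating from the week index,
    # a simplified stand-in for the imputation model described above.
    weeks = np.repeat([1.0, 2.0, 3.0], 30)
    scores = np.concatenate([week1, week2, week3])
    slope, intercept = np.polyfit(weeks, scores, 1)
    print(f"imputed week-2 rating: {slope * 2 + intercept:.2f}")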

    Vol. 16, No. 1 (Full Issue)

    Get PDF