114 research outputs found

    Leveraging Return Prediction Approaches for Improved Value-at-Risk Estimation

    Value at Risk (VaR) is a statistic used to anticipate the largest possible losses over a specific time frame and within some level of confidence, usually 95% or 99%. For risk managers and regulators, it offers a trustworthy quantitative risk-management tool, and it has become the most widely used and accepted indicator of downside risk. Today, commercial banks and financial institutions use it to estimate the size and probability of upcoming portfolio losses and, as a result, to estimate and manage their degree of risk exposure. The goal is to keep the average number of VaR “failures” or “breaches” (losses that exceed the VaR) as close to the target rate as possible; it is also desirable that these breaches be spread as evenly as possible over time. VaR can be modeled in a variety of ways. The simplest method is to estimate volatility from prior returns under the assumption that volatility is constant; alternatively, the volatility process can be modeled with a GARCH model. In recent years, machine learning techniques have been used to forecast stock markets from historical time series. A machine learning system is typically trained on an in-sample dataset, where its hyperparameters are tuned according to the underlying metric, and the trained model is then tested on an out-of-sample dataset. We compared, according to different metrics, the baselines for the VaR estimation of a day d (i) to their respective variants that incorporate the stock return forecast for d and the stock return data of the days before d, and (ii) to a GARCH model that incorporates the same return prediction for d and the stock return data of the days before d. Various strategies, such as ARIMA and a proposed ensemble of regressors, have been employed to predict the stock returns. We observed that the versions of the univariate techniques and of GARCH integrated with return predictions outperformed the baselines in four different markets.
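    As a concrete reference point for the constant-volatility baseline described above, the following minimal sketch (not the paper's code; it assumes daily returns as a NumPy array and uses illustrative function names) computes a Gaussian one-day VaR and its breach rate:

        import numpy as np
        from scipy.stats import norm

        def historical_var(returns, alpha=0.99, window=250):
            """Rolling constant-volatility VaR: sigma estimated from the past
            `window` returns, Gaussian quantile at confidence level `alpha`."""
            z = norm.ppf(1 - alpha)                      # about -2.33 for 99%
            var = np.full(len(returns), np.nan)
            for t in range(window, len(returns)):
                sigma = returns[t - window:t].std(ddof=1)
                var[t] = z * sigma                       # one-day-ahead VaR (a negative return)
            return var

        def breach_rate(returns, var):
            """Fraction of days whose return falls below the VaR estimate."""
            mask = ~np.isnan(var)
            return np.mean(returns[mask] < var[mask])    # target: close to 1 - alpha

        rng = np.random.default_rng(0)
        r = rng.normal(0.0, 0.01, 1500)                  # simulated returns stand in for real data
        v = historical_var(r, alpha=0.99)
        print(f"observed breach rate: {breach_rate(r, v):.3%}")

    A GARCH or forecast-augmented variant, as studied in the paper, would replace the constant sigma with a conditional volatility estimate or a return prediction for day d.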

    A survey on tidal analysis and forecasting methods for Tsunami detection

    Accurate analysis and forecasting of tidal levels are very important tasks for human activities in oceanic and coastal areas. They can be crucial in catastrophic situations, such as the occurrence of tsunamis, in order to provide rapid alerts to the population involved and to save lives. Conventional tidal forecasting methods are based on harmonic analysis, which uses the least squares method to determine the harmonic parameters. However, a large number of parameters and long-term measured data are required for precise tidal level predictions with harmonic analysis. Furthermore, traditional harmonic methods rely on models based on the analysis of astronomical components, and they can be inadequate when the contribution of non-astronomical components, such as the weather, is significant. Alternative approaches have been developed in the literature to deal with these situations and to provide predictions with the desired accuracy, also with respect to the length of the available tidal record. These methods include standard high-pass or band-pass filtering techniques, although the relatively deterministic character and large amplitude of tidal signals make special techniques, such as artificial neural networks and wavelet transform analysis, more effective. This paper is intended to provide both researchers and practitioners with a broadly applicable, up-to-date coverage of tidal analysis and forecasting methodologies that have proven successful in a variety of circumstances and that hold particular promise for the future. Classical and novel methods are reviewed in a systematic and consistent way, outlining their main concepts and components, their similarities and differences, and their advantages and disadvantages.
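    To make the harmonic-analysis baseline concrete, here is a minimal least-squares sketch (illustrative constituent set and synthetic data, not the survey's code) that recovers tidal amplitudes and phases from a water-level record:

        import numpy as np

        # Frequencies of a few major constituents, in cycles per hour
        # (periods: M2 = 12.4206 h, S2 = 12 h, K1 = 23.9345 h, O1 = 25.8193 h).
        CONSTITUENTS = {"M2": 1 / 12.4206, "S2": 1 / 12.0, "K1": 1 / 23.9345, "O1": 1 / 25.8193}

        def design_matrix(t_hours):
            cols = [np.ones_like(t_hours)]               # mean sea level term
            for f in CONSTITUENTS.values():
                cols += [np.cos(2 * np.pi * f * t_hours), np.sin(2 * np.pi * f * t_hours)]
            return np.column_stack(cols)

        def fit_harmonics(t_hours, levels):
            A = design_matrix(t_hours)
            coeffs, *_ = np.linalg.lstsq(A, levels, rcond=None)   # least-squares solve
            return coeffs

        def predict(t_hours, coeffs):
            return design_matrix(t_hours) @ coeffs

        # One month of synthetic hourly data dominated by the M2 constituent.
        t = np.arange(24 * 30, dtype=float)
        levels = 1.2 * np.cos(2 * np.pi * t / 12.4206 + 0.3) + 0.1 * np.random.randn(t.size)
        coeffs = fit_harmonics(t, levels)
        forecast = predict(t + 24.0, coeffs)             # predict one day ahead

    The limitations discussed in the survey show up directly in this formulation: precise fits need many constituents (columns) and long records (rows), and nothing in the model accounts for weather-driven residuals.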

    Ensembling classical machine learning and deep learning approaches for morbidity identification from clinical notes

    The past decade has seen an explosion in the amount of digital information generated within the healthcare domain. Digital data exist in the form of images, video, speech, transcripts, electronic health records, clinical records, and free text. Analyzing and interpreting healthcare data is a daunting task that demands a great deal of time, resources, and human effort. In this paper, we focus on the problem of co-morbidity recognition from patients' clinical records. To this aim, we employ both classical machine learning and deep learning approaches. We use word embeddings and bag-of-words representations, coupled with feature selection techniques. The goal of our work is to develop a classification system that identifies whether a certain health condition occurs for a patient by studying his/her past clinical records. In more detail, we have used pre-trained word2vec, domain-trained, GloVe, fastText, and Universal Sentence Encoder embeddings to tackle the classification of sixteen morbidity conditions within clinical records. We have compared the outcomes of the classical machine learning and deep learning approaches with the employed feature representation and feature selection methods, and we present a comprehensive discussion of their performance and behaviour. Finally, we have also used ensemble learning techniques over a large number of classifier combinations to improve on the single-model performance. For our experiments, we used the n2c2 natural language processing research dataset, released by Harvard Medical School; the dataset consists of clinical notes that contain patient discharge summaries. Given the unbalanced nature of the data and their small size, the experimental results indicate the advantage of the ensemble learning technique over single classifier models. In particular, the ensemble learning technique slightly improved the performance of the single classification models, but greatly reduced the variance of the predictions, stabilizing the accuracies (i.e., yielding a lower standard deviation than the single classifiers). In real-life scenarios, our work can be employed to identify, with high accuracy, the morbidity conditions of patients by feeding our tool with their current clinical notes. Moreover, other domains where classification is a common problem might benefit from our approach as well.
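    A minimal sketch of the classical side of such a pipeline (bag-of-words features, feature selection, and a soft-voting ensemble; placeholder texts and scikit-learn defaults, not the paper's configuration) could look like this:

        from sklearn.pipeline import Pipeline
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.feature_selection import SelectKBest, chi2
        from sklearn.linear_model import LogisticRegression
        from sklearn.svm import SVC
        from sklearn.ensemble import RandomForestClassifier, VotingClassifier

        # Placeholder notes and binary labels for a single morbidity (e.g. obesity present or not);
        # in the real setting one such classifier would be trained per condition on the n2c2 notes.
        notes = ["obese patient with type 2 diabetes ...", "no evidence of obesity ..."] * 10
        labels = [1, 0] * 10

        ensemble = VotingClassifier(
            estimators=[
                ("svm", SVC(kernel="linear", probability=True)),
                ("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(n_estimators=200)),
            ],
            voting="soft",                               # average the predicted probabilities
        )

        clf = Pipeline([
            ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
            ("select", SelectKBest(chi2, k="all")),      # k would be tuned on real data
            ("model", ensemble),
        ])
        clf.fit(notes, labels)
        print(clf.predict(["new discharge summary ..."]))

    On top of pipelines of this kind, the paper also evaluates embedding-based representations and deep learning models, and ensembles over many classifier combinations.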

    Towards the definition of a European Digital Building Logbook: A survey

    Both operational-phase emissions and the embodied emissions introduced during the construction phase through the manufacture, sourcing, and installation of a building's materials and components are significant contributors to the carbon emissions of the built environment. To achieve the energy-saving targets for the EU building stock and move toward a net carbon neutral society, the current design and (re)construction processes must change, from both a technical and a methodological perspective. To accomplish this, the EU has put forward several regulations and legislative steps to phase out inefficient buildings. The most recent of these initiatives proposes the idea of a Digital Building Logbook, which serves as a central repository for all relevant building data, including information on energy efficiency. In this work, we present a survey of the elements that have been considered for the creation of the Digital Building Logbook, to give an overview of the research done so far.

    Leveraging augmentation techniques for tasks with unbalancedness within the financial domain: a two-level ensemble approach

    Modern financial markets produce massive datasets that need to be analysed with new modelling techniques, such as those from (deep) Machine Learning and Artificial Intelligence. The common goal of these techniques is to forecast the behaviour of the market, which can be translated into various classification tasks, such as predicting the likelihood of companies’ bankruptcy or detecting fraud. However, real-world financial data are often unbalanced, meaning that the classes are not equally represented in such datasets. This is a major issue, since a Machine Learning model trained on such data is driven mainly by the majority class, leading to inaccurate predictions. In this paper, we explore different data augmentation techniques to deal with very unbalanced financial data. We consider a number of publicly available datasets, apply state-of-the-art augmentation strategies to them, and evaluate the results for several Machine Learning models trained on the sampled data. The performance of the various approaches is evaluated according to accuracy, micro and macro F1 scores, and finally by analysing the precision and recall over the minority class. We show that a consistent improvement is achieved when data augmentation is employed. The obtained classification results are promising and indicate the effectiveness of augmentation strategies on financial tasks. On the basis of these results, we present an approach for classification tasks within the financial domain that takes a dataset as input, identifies the kind of augmentation technique to use, and then applies an ensemble of all the augmentation techniques of the identified type to the input dataset, together with an ensemble of different methods to tackle the underlying classification.
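    As a sketch of the evaluation protocol described above (synthetic data standing in for a financial dataset; SMOTE from the imbalanced-learn package is only one of several possible augmentation strategies), one could write:

        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import f1_score, classification_report
        from imblearn.over_sampling import SMOTE        # requires imbalanced-learn

        # Heavily unbalanced synthetic task: ~3% minority class (e.g. bankrupt or fraudulent cases).
        X, y = make_classification(n_samples=5000, n_features=20,
                                   weights=[0.97, 0.03], random_state=42)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

        # Oversample only the training split, then train and evaluate on the untouched test split.
        X_res, y_res = SMOTE(random_state=42).fit_resample(X_tr, y_tr)
        clf = RandomForestClassifier(n_estimators=300, random_state=42).fit(X_res, y_res)
        pred = clf.predict(X_te)

        print("macro F1:", round(f1_score(y_te, pred, average="macro"), 3))
        print(classification_report(y_te, pred, digits=3))   # per-class precision/recall

    Comparing such metrics with and without augmentation, and across different augmentation families, is the kind of analysis that motivates the two-level ensemble approach proposed in the paper.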

    TF-IDF vs word embeddings for morbidity identification in clinical notes: An initial study

    Today, we are seeing an ever-increasing number of clinical notes that contain clinical results, images, and textual descriptions of a patient's health state. All these data can be analysed and employed to provide novel services that help people and domain experts with common healthcare tasks. However, many technologies, such as Deep Learning, and tools like Word Embeddings have only recently started to be investigated, and many challenges remain open when it comes to healthcare domain applications. To address these challenges, we propose the use of Deep Learning and Word Embeddings for identifying sixteen morbidity types within the textual descriptions of clinical records. For this purpose, we have used a Deep Learning model based on Bidirectional Long Short-Term Memory (LSTM) layers, which can exploit state-of-the-art vector representations of data such as Word Embeddings. We have employed the pre-trained Word Embeddings GloVe and Word2Vec, as well as our own Word Embeddings trained on the target domain. Furthermore, we have compared the performance of the Deep Learning approaches against the traditional TF-IDF representation used with a Support Vector Machine and a Multilayer Perceptron (our baselines). The obtained results suggest that the latter outperform the Deep Learning approaches combined with any of the word embeddings. Our preliminary results indicate that there are specific features that make the dataset biased in favour of traditional machine learning approaches.
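    For illustration, a Bidirectional LSTM classifier of the kind described above can be sketched in a few lines of Keras (hyperparameters and the single-morbidity binary head are placeholders, not the paper's configuration):

        import numpy as np
        from tensorflow.keras import layers, models

        VOCAB_SIZE, EMB_DIM, MAX_LEN = 20000, 300, 400

        model = models.Sequential([
            # The embedding matrix could be initialised from GloVe/Word2Vec
            # or from embeddings trained on the clinical domain.
            layers.Embedding(VOCAB_SIZE, EMB_DIM),
            layers.Bidirectional(layers.LSTM(128)),
            layers.Dropout(0.3),
            layers.Dense(1, activation="sigmoid"),       # one binary head per morbidity type
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

        # Dummy batch of tokenised notes, just to show the expected shapes.
        x = np.random.randint(0, VOCAB_SIZE, size=(8, MAX_LEN))
        y = np.random.randint(0, 2, size=(8, 1)).astype("float32")
        model.fit(x, y, epochs=1, verbose=0)

    The TF-IDF baselines it is compared against reduce to standard Support Vector Machine and Multilayer Perceptron classifiers over sparse term-weight vectors.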

    A transformers-based approach for fine and coarse-grained classification and generation of MIDI songs and soundtracks

    Music is an extremely subjective art form whose commodification via the recording industry in the 20th century has led to an increasingly subdivided set of genre labels that attempt to organize musical styles into definite categories. Music psychology studies the processes through which music is perceived, created, responded to, and incorporated into everyday life, and modern artificial intelligence technology can be exploited in this direction. Music classification and generation are emerging fields that have gained much attention recently, especially with the latest advances in deep learning. Self-attention networks have in fact brought substantial benefits to several classification and generation tasks in different domains and for different data types (text, images, videos, sounds). In this article, we analyze the effectiveness of Transformers for both classification and generation tasks, studying classification performance at different levels of granularity and evaluating generation with different human and automatic metrics. The input data consist of MIDI files drawn from different datasets: sounds from 397 Nintendo Entertainment System video games, classical pieces, and rock songs by different composers and bands. We have performed classification within each dataset to identify the type or composer of each sample (fine-grained classification), as well as classification at a higher level: in the latter, we combined the three datasets and aimed to identify each sample simply as NES, rock, or classical (coarse-grained classification). The proposed Transformers-based approach outperformed competitors based on deep learning and machine learning approaches. Finally, the generation task was carried out on each dataset, and the resulting samples were evaluated using human and automatic metrics (the local alignment).
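    The classification side can be sketched as a small Transformer encoder over tokenised MIDI events (the vocabulary size, sequence length, and MIDI-to-token preprocessing are assumptions here, not taken from the paper):

        import torch
        import torch.nn as nn

        class MidiClassifier(nn.Module):
            def __init__(self, vocab_size=512, d_model=256, n_classes=3, max_len=1024):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, d_model)
                self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))   # learned positions
                layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
                self.encoder = nn.TransformerEncoder(layer, num_layers=4)
                self.head = nn.Linear(d_model, n_classes)    # e.g. NES / rock / classical

            def forward(self, tokens):                       # tokens: (batch, seq_len) integer IDs
                x = self.embed(tokens) + self.pos[:, : tokens.size(1)]
                x = self.encoder(x).mean(dim=1)              # mean-pool over time steps
                return self.head(x)

        model = MidiClassifier()
        logits = model(torch.randint(0, 512, (2, 128)))      # dummy batch of two token sequences
        print(logits.shape)                                  # torch.Size([2, 3])

    Fine-grained classification would simply enlarge n_classes to the number of composers or game titles within a dataset, while generation would instead rely on a decoder trained to predict the next event token.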

    Citation prediction by leveraging transformers and natural language processing heuristics

    In scientific papers, it is common practice to cite other articles to substantiate claims, provide evidence for factual assertions, reference limitations and research gaps, and fulfill various other purposes. When authors include a citation in a given sentence, there are two decisions they need to make: (i) where in the sentence to place the citation and (ii) which citation to choose to support the underlying claim. In this paper, we focus on the first task, as it admits multiple potential approaches that depend on the researcher's individual style and on the specific norms and conventions of the relevant scientific community. We propose two automatic methodologies that leverage the Transformer architecture, framing the problem either as Mask-Filling or as Named Entity Recognition. On top of the results of the proposed methodologies, we apply ad-hoc Natural Language Processing heuristics to further improve their outcome. We also introduce s2orc-9K, an open dataset for fine-tuning models on this task. A formal evaluation demonstrates that the generative approach significantly outperforms five alternative methods when fine-tuned on the novel dataset. Furthermore, this model's results show no statistically significant deviation from the outputs of three senior researchers.
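    The Mask-Filling formulation can be illustrated with an off-the-shelf model from the Hugging Face Transformers library (bert-base-cased is only a stand-in here; the paper fine-tunes its models on the s2orc-9K dataset):

        from transformers import pipeline

        fill = pipeline("fill-mask", model="bert-base-cased")
        mask = fill.tokenizer.mask_token

        sentence = "Transformer models have improved results on many NLP benchmarks"
        words = sentence.split()

        # Insert the mask token at every candidate slot and inspect what the model
        # expects there; a fine-tuned model would instead be trained to emit a
        # citation marker at the slots where a reference belongs.
        for i in range(1, len(words) + 1):
            candidate = " ".join(words[:i] + [mask] + words[i:])
            top = fill(candidate, top_k=5)
            print(i, [t["token_str"] for t in top])

    A Named Entity Recognition formulation would instead tag the sentence tokens directly; in either case, the ad-hoc heuristics described above refine the raw model output.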

    A blockchain-based distributed paradigm to secure localization services

    In recent decades, modern societies have been experiencing an increasing adoption of interconnected smart devices. This revolution involves not only canonical devices such as smartphones and tablets, but also simple objects like light bulbs. Known as the Internet of Things (IoT), this ever-growing scenario offers enormous opportunities in many areas of modern society, especially when joined with other emerging technologies such as the blockchain. Indeed, the latter allows users to certify transactions publicly, without relying on central authorities or intermediaries. This work aims to exploit this scenario by proposing a novel blockchain-based distributed paradigm to secure localization services, here named the Internet of Entities (IoE). It is a mechanism for the reliable localization of people and things, and it exploits the growing number of existing wireless devices and blockchain-based distributed ledger technologies. Moreover, unlike most canonical localization approaches, it is strongly oriented towards the protection of users’ privacy. Finally, its implementation requires minimal effort, since it employs existing infrastructures and devices, thus giving life to a new and wide data environment that can be exploited in many domains, such as e-health, smart cities, and smart mobility.
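    Only to give an intuition of hash-chained location attestations (this toy ledger is not the IoE protocol itself, which builds on existing blockchain infrastructures and on privacy protection for users), consider:

        import hashlib, json, time

        class LocationLedger:
            """Toy append-only chain of 'entity X seen by device Y at time t' records."""

            def __init__(self):
                self.chain = [{"index": 0, "prev_hash": "0" * 64, "payload": "genesis"}]

            @staticmethod
            def _hash(block):
                return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

            def add_attestation(self, entity_id, observer_id, position):
                prev = self.chain[-1]
                self.chain.append({
                    "index": prev["index"] + 1,
                    "prev_hash": self._hash(prev),       # links each block to its predecessor
                    "payload": {"entity": entity_id, "observer": observer_id,
                                "position": position, "ts": time.time()},
                })

            def verify(self):
                return all(self.chain[i]["prev_hash"] == self._hash(self.chain[i - 1])
                           for i in range(1, len(self.chain)))

        ledger = LocationLedger()
        ledger.add_attestation("badge-42", "wifi-ap-lobby", [39.22, 9.12])
        print(ledger.verify())                           # True while the chain is untampered

    In the actual paradigm, such attestations would be recorded on a distributed ledger by the surrounding wireless devices rather than by a single trusted node.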

    Creation, Analysis and Evaluation of AnnoMI, a Dataset of Expert-Annotated Counselling Dialogues †

    Research on the analysis of counselling conversations through natural language processing methods has seen remarkable growth in recent years. However, the potential of this field is still greatly limited by the lack of access to publicly available therapy dialogues, especially those with expert annotations. This limitation has been partially alleviated by the recent release of AnnoMI, the first publicly and freely available conversation dataset of 133 faithfully transcribed and expert-annotated demonstrations of high- and low-quality motivational interviewing (MI), an effective therapy strategy that evokes client motivation for positive change. In this work, we introduce new expert-annotated utterance attributes to AnnoMI and describe the entire data collection process in more detail, including dialogue source selection, transcription, annotation, and post-processing. Based on the expert annotations of key MI aspects, we carry out thorough analyses of AnnoMI with respect to counselling-related properties at the utterance, conversation, and corpus levels. Furthermore, we introduce utterance-level prediction tasks with potential real-world impact and build baseline models. Finally, we examine the performance of the models on dialogues of different topics and probe the generalisability of the models to unseen topics.