Leveraging augmentation techniques for tasks with unbalancedness within the financial domain: a two-level ensemble approach
Modern financial markets produce massive datasets that need to be analysed with new modelling techniques, such as those from (deep) Machine Learning and Artificial Intelligence. The common goal of these techniques is to forecast the behaviour of the market, which can be translated into various classification tasks, such as predicting the likelihood of a company's bankruptcy or detecting fraud. However, real-world financial data are often unbalanced, meaning that the classes are not equally represented in such datasets. This is a major issue, since a Machine Learning model trained on such data mainly learns the majority class, leading to inaccurate predictions. In this paper, we explore different data augmentation techniques to deal with highly unbalanced financial data. We consider a number of publicly available datasets, apply state-of-the-art augmentation strategies to them, and evaluate the results for several Machine Learning models trained on the sampled data. The performance of the various approaches is evaluated according to their accuracy, micro and macro F1 scores, and by analysing the precision and recall over the minority class. We show that a consistent and accurate improvement is achieved when data augmentation is employed. The obtained classification results look promising and indicate the effectiveness of augmentation strategies on financial tasks. On the basis of these results, we present an approach, focused on classification tasks within the financial domain, that takes a dataset as input, identifies what kind of augmentation technique to use, and then applies an ensemble of all the augmentation techniques of the identified type to the input dataset, along with an ensemble of different methods to tackle the underlying classification task
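As an illustration of the augmentation step described above, the following Python sketch (our own, not the paper's code; SMOTE is one common member of the family of strategies mentioned) oversamples an unbalanced training split and reports the metrics the paper evaluates: micro and macro F1 and minority-class precision and recall.

    # Illustrative sketch: SMOTE oversampling on a synthetic unbalanced dataset.
    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score, precision_score, recall_score
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for an unbalanced financial dataset (~1% positives).
    X, y = make_classification(n_samples=5000, weights=[0.99], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    # Augment only the training split so the test set keeps the real imbalance.
    X_aug, y_aug = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

    clf = RandomForestClassifier(random_state=0).fit(X_aug, y_aug)
    pred = clf.predict(X_te)
    print("micro F1:", f1_score(y_te, pred, average="micro"))
    print("macro F1:", f1_score(y_te, pred, average="macro"))
    print("minority precision:", precision_score(y_te, pred))
    print("minority recall:", recall_score(y_te, pred))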
NetFPGA Hardware Modules for Input, Output and EWMA Bit-Rate Computation
NetFPGA is a hardware board that is becoming increasingly popular in various research areas. It is a customizable hardware router that can be used to study, implement, and test new protocols and techniques directly in hardware, allowing researchers to work in a more realistic experimental environment. In this paper we present the design and development of four new modules built on top of the NetFPGA Reference Router design. In particular, they compute the input and output bit rates at run time and provide an estimate of the input bit rate based on an EWMA filter. Moreover, we extended the rate limiter module embedded within the output queues in order to test our improved Reference Router. Throughout the paper we describe each module in detail, as far as both the architecture and the implementation are concerned. Furthermore, we created a testing environment that shows the effectiveness and efficiency of our modules
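For readers unfamiliar with EWMA estimation, the following Python snippet models the filter's behaviour in software (the smoothing factor and sampling interval are illustrative assumptions; the actual module implements this logic in hardware):

    # Software model of an EWMA bit-rate estimator: each sampling interval,
    # the measured rate is blended into the running estimate.
    def ewma_bitrate(byte_counts, interval_s=0.1, alpha=0.25):
        """byte_counts: bytes seen per interval; returns bit/s estimates."""
        estimate = 0.0
        estimates = []
        for count in byte_counts:
            sample = count * 8 / interval_s          # instantaneous bit rate
            estimate = alpha * sample + (1 - alpha) * estimate
            estimates.append(estimate)
        return estimates

    print(ewma_bitrate([12500, 25000, 12500], interval_s=0.1)[-1])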
Performance comparison between the Click Modular Router and the NetFPGA
It is possible to forward minimum-sized packets at rates of hundreds of Mbps using commodity hardware and Linux. We chose the Click Modular Router platform for its flexibility and because it claims performance equal to or higher than native Linux forwarding when used with its polling drivers. The NetFPGA, in turn, is an open networking platform accelerator that enables researchers and instructors to build working prototypes of high-speed, hardware-accelerated networking systems. The reference designs included with the system comprise an IPv4 router, an Ethernet switch, a four-port NIC, and SCONE (Software Component of NetFPGA). Researchers have used the platform to build advanced network flow processing systems. We followed RFC 1242 (Benchmarking Terminology for Network Interconnection Devices) and RFC 2544 (Benchmarking Methodology for Network Interconnection Devices) to define the specific set of tests used to describe the performance characteristics of the two routers. We also compared the NetFPGA and the Click router on a file transfer using the FTP and HTTP protocols. Overall, the NetFPGA router outperforms the Click router
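The core of an RFC 2544 throughput measurement is a search for the highest zero-loss offered rate. The sketch below shows the usual binary-search form of that test; it is a hypothetical harness, not the authors' rig, and send_at_rate is a stand-in for the traffic generator hook:

    # RFC 2544-style throughput search: find the highest offered rate at
    # which the device forwards minimum-sized frames with zero loss.
    def rfc2544_throughput(send_at_rate, max_rate_mbps, resolution=1.0):
        lo, hi = 0.0, max_rate_mbps
        while hi - lo > resolution:
            rate = (lo + hi) / 2
            sent, received = send_at_rate(rate)   # one trial at this rate
            if received == sent:                  # zero loss: push higher
                lo = rate
            else:                                 # loss: back off
                hi = rate
        return lo

    # Dummy generator model: pretend the device loses frames above 480 Mbps.
    def fake_send(rate):
        return (1000, 1000 if rate <= 480 else 990)

    print(round(rfc2544_throughput(fake_send, max_rate_mbps=1000), 1))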
A blockchain-based distributed paradigm to secure localization services
In recent decades, modern societies have experienced an increasing adoption of interconnected smart devices. This revolution involves not only canonical devices such as smartphones and tablets, but also simple objects like light bulbs. Known as the Internet of Things (IoT), this ever-growing scenario offers enormous opportunities in many areas of modern society, especially when joined by other emerging technologies such as the blockchain. Indeed, the latter allows users to certify transactions publicly, without relying on central authorities or intermediaries. This work exploits the scenario above by proposing a novel blockchain-based distributed paradigm to secure localization services, here named the Internet of Entities (IoE). It is a mechanism for the reliable localization of people and things that exploits the increasing number of existing wireless devices and blockchain-based distributed ledger technologies. Moreover, unlike most canonical localization approaches, it is strongly oriented towards the protection of users' privacy. Finally, its implementation requires minimal effort, since it employs existing infrastructures and devices, thus giving life to a new and wide data environment exploitable in many domains, such as e-health, smart cities, and smart mobility
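The abstract gives no implementation details, but a minimal sketch of what a privacy-oriented location attestation destined for a ledger could look like follows; every field name, the coarse cell-based location, and the hash-based pseudo-signature are our assumptions, not the paper's design:

    # Conceptual sketch: a wireless device "sighting" an entity publishes a
    # hashed attestation, so location can be verified without raw identity.
    import hashlib, json, time

    def make_attestation(observer_id, entity_pseudonym, cell_id, secret):
        record = {
            "observer": hashlib.sha256(observer_id.encode()).hexdigest(),
            "entity": entity_pseudonym,     # pseudonym, for privacy
            "cell": cell_id,                # coarse location, not raw GPS
            "ts": int(time.time()),
        }
        payload = json.dumps(record, sort_keys=True)
        # Stand-in for a real digital signature over the payload.
        record["sig"] = hashlib.sha256((payload + secret).encode()).hexdigest()
        return record

    print(make_attestation("ap-42", "anon-7f3a", "cell-0189", secret="demo-key"))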
A P2P Platform for real-time multicast video streaming leveraging on scalable multiple descriptions to cope with bandwidth fluctuations
In the immediate future, video distribution applications will increase their diffusion thanks to ever-increasing user capabilities and improvements in Internet access speed and performance. The target of this paper is to propose a content delivery system for real-time streaming services based on a peer-to-peer approach that exploits a multicast overlay organization of the peers to address the challenges posed by bandwidth heterogeneity. To improve reliability and flexibility, video is coded using a scalable multiple description approach that allows delivery of sub-streams over multiple trees and allows rate adaptation along the trees as the available bandwidth changes. Moreover, we have deployed a new algorithm for tree-based topology management of the overlay network. In fact, tree-based overlay networks perform better than mesh-based ones in terms of end-to-end delay and ordered delivery of video flow packets. We also show with a case study that the proposed system works better than similar systems using only either multicast or multiple trees
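To make the rate-adaptation idea concrete, here is a toy Python model (the per-description bit rate is an assumption, and this is not the authors' implementation) of how a peer sheds whole description trees as its bandwidth drops, instead of stalling the stream:

    # Toy model of scalable multiple-description delivery: each description
    # travels down its own tree, and a peer decodes at a quality that grows
    # with how many descriptions its downlink can carry.
    DESCRIPTION_RATE_KBPS = 300          # assumed per-description bit rate

    def receivable_descriptions(available_kbps, n_descriptions):
        """How many sub-streams a peer can take given its downlink budget."""
        return min(n_descriptions, int(available_kbps // DESCRIPTION_RATE_KBPS))

    # A peer whose bandwidth drops from 1.2 Mbps to 700 kbps sheds one tree
    # and keeps playing at reduced quality.
    for bw in (1200, 700):
        print(bw, "kbps ->", receivable_descriptions(bw, 4), "descriptions")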
CulturAI: Semantic Enrichment of Cultural Data Leveraging Artificial Intelligence
In this paper, we propose an innovative tool able to enrich cultural and creative spots (gems, hereinafter) extracted from the European Commission Cultural Gems portal, by suggesting relevant keywords (tags) and YouTube videos (represented with proper thumbnails). On the one hand, the system queries the YouTube search portal, selects the videos most related to the given gem, and extracts a set of meaningful thumbnails for each video. On the other hand, each tag is selected by identifying semantically related popular search queries (i.e., trends). In particular, trends are retrieved by querying the Google Trends platform. A further novelty is that our system suggests content dynamically: since for both the YouTube and Google Trends platforms the results of a given query include the most popular videos/trends, a gem can constantly be updated with trendy content by periodically running the tool. The system has been tested on a set of gems and evaluated with the support of human annotators. The results highlight the effectiveness of our proposal
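As a rough sketch of the trend-based tag suggestion, the snippet below queries Google Trends through the unofficial pytrends client; the library choice, parameters, and the example gem are our assumptions, since the abstract does not name the tooling:

    # Suggest tags for a gem from the top related Google Trends queries.
    from pytrends.request import TrendReq

    def suggest_tags(gem_name, max_tags=5):
        pytrends = TrendReq(hl="en-US", tz=0)
        pytrends.build_payload(kw_list=[gem_name], timeframe="today 3-m")
        related = pytrends.related_queries()[gem_name]["top"]
        if related is None:
            return []
        return related["query"].head(max_tags).tolist()

    print(suggest_tags("Colosseum"))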
Explainable Machine Learning Exploiting News and Domain-Specific Lexicon for Stock Market Forecasting
In this manuscript, we propose a Machine Learning approach to tackle a binary classification problem whose goal is to predict the magnitude (high or low) of future stock price variations for individual companies of the S&P 500 index. Sets of lexicons are generated from globally published articles with the goal of identifying the most impactful words on the market in a specific time interval and within a certain business sector. A feature engineering process is then performed on the generated lexicons, and the obtained features are fed to a Decision Tree classifier. The predicted label (high or low) indicates whether the underlying company's stock price variation on the next day is higher or lower than a certain threshold. The performance evaluation, carried out through a walk-forward strategy and against a set of solid baselines, shows that our approach clearly outperforms the competitors. Moreover, the devised Artificial Intelligence (AI) approach is explainable, in the sense that we analyse the white box behind the classifier and provide a set of explanations for the obtained results
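A heavily simplified sketch of this pipeline follows; the tiny lexicon and documents are placeholders (the real lexicons are sector- and period-specific), and the printed tree illustrates the white-box inspection mentioned above:

    # Lexicon-based features feeding an interpretable Decision Tree.
    from sklearn.tree import DecisionTreeClassifier, export_text

    LEXICON = {"beat": 1.0, "growth": 0.8, "miss": -1.0, "lawsuit": -0.7}

    def features(doc):
        words = doc.lower().split()
        score = sum(LEXICON.get(w, 0.0) for w in words)
        hits = sum(1 for w in words if w in LEXICON)
        return [score, hits]

    docs = ["earnings beat with strong growth", "profit miss and lawsuit risk"]
    X = [features(d) for d in docs]
    y = [1, 0]                        # 1 = "high" next-day variation label

    clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
    # The fitted tree is directly readable, which is what makes it explainable.
    print(export_text(clf, feature_names=["lexicon_score", "lexicon_hits"]))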
Ensembling and Dynamic Asset Selection for Risk-Controlled Statistical Arbitrage
In recent years, machine learning algorithms have been successfully employed to identify hidden patterns of financial market behaviour and, consequently, have become a land of opportunity for financial applications such as algorithmic trading. In this paper, we propose a statistical arbitrage trading strategy with two key elements: an ensemble of regression algorithms for asset return prediction, followed by a dynamic asset selection. More specifically, we construct an extremely heterogeneous ensemble, ensuring model diversity by using state-of-the-art machine learning algorithms, data diversity by using a feature selection process, and method diversity by using individual models for each asset as well as models that learn cross-sectionally across multiple assets. Their predictive results are then fed into a quality assurance mechanism that prunes assets with poor forecasting performance in the previous periods. We evaluate the approach on historical data of the component stocks of the S&P 500 index. An in-depth risk-return analysis shows that this setup outperforms highly competitive trading strategies considered as baselines. Experimentally, we show that the dynamic asset selection enhances overall trading performance in terms of both return and risk. Moreover, the proposed approach yields superior results during both financial turmoil and massive market growth periods, and it has general applicability for any risk-balanced trading strategy aiming to exploit different asset classes
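The two key elements can be sketched in a few lines; the models, the error threshold, and the synthetic data below are illustrative assumptions rather than the paper's configuration:

    # Heterogeneous-ensemble prediction plus dynamic asset pruning.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
    from sklearn.linear_model import Ridge

    def ensemble_predict(models, X):
        """Average return forecasts across heterogeneous regressors."""
        return np.mean([m.predict(X) for m in models], axis=0)

    def select_assets(past_errors, max_mae=0.02):
        """Keep only assets whose recent forecast error is acceptable."""
        return [a for a, err in past_errors.items() if err <= max_mae]

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(200, 5)), rng.normal(scale=0.02, size=200)
    models = [m.fit(X, y) for m in (
        Ridge(),
        RandomForestRegressor(n_estimators=50, random_state=0),
        GradientBoostingRegressor(random_state=0),
    )]
    print(ensemble_predict(models, X[:3]))
    print(select_assets({"AAPL": 0.015, "XYZ": 0.05}))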
Data-Driven Methodology for Knowledge Graph Generation Within the Tourism Domain
The tourism and hospitality sectors have become increasingly important in the last few years, and the companies operating in this field are constantly challenged to provide new innovative services. At the same time, (big) data has become the 'new oil' of this century, and Knowledge Graphs are emerging as the most natural way to collect, refine, and structure this heterogeneous information. In this paper, we present a methodology for semi-automatically generating a Tourism Knowledge Graph (TKG), which can be used to support a variety of intelligent services in this space, and a new ontology for modelling this domain, the Tourism Analytics Ontology (TAO). Our approach processes and integrates data from Booking.com, Airbnb, DBpedia, and GeoNames. Thanks to its modular structure, it can easily be extended to include new data sources or to apply new enrichment and refinement functions. We report a comprehensive evaluation of the functional, logical, and structural dimensions of TKG and TAO
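A minimal sketch of the record-to-graph mapping step such a pipeline performs, using rdflib with a placeholder namespace (the real TAO IRIs and class names may differ):

    # Map one accommodation record to RDF triples under a hypothetical
    # TAO namespace, then serialize as Turtle.
    from rdflib import Graph, Literal, Namespace, RDF

    TAO = Namespace("http://example.org/tao#")   # placeholder namespace
    g = Graph()

    record = {"id": "hotel-42", "name": "Hotel Centrale", "city": "Pisa"}
    subject = TAO[record["id"]]
    g.add((subject, RDF.type, TAO.Accommodation))
    g.add((subject, TAO.name, Literal(record["name"])))
    g.add((subject, TAO.locatedIn, TAO[record["city"]]))

    print(g.serialize(format="turtle"))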
Evaluation of Variability in the Sweet Orange Germplasm through Next Generation Clonal Fingerprinting
The great phenotypic variability characterizing the sweet orange [Citrus sinensis (L.) Osbeck] germplasm arises from spontaneous bud mutations, causing a diversification into major groups (common, Navel, and blood oranges). A huge divergence also occurred within each varietal group. The genetic basis of such variability, which also covers nutritional and qualitative traits (ripening time, colour, fruit shape, acidity, sugars), is currently uncharacterized and therefore not exploitable. With the aim of describing the somatic mutation events in the sweet orange group, deep sequencing of 20 Italian and foreign accessions was performed on the Illumina platform, allowing the identification of single nucleotide polymorphisms (SNPs), structural variants (SVs), and large deletions, either specific to each varietal group or clone-specific. A subset of SNPs, used to design two 384-SNP GoldenGate assays, allowed the genotyping of 225 CREA sweet orange accessions. The developed markers represent the first reliable molecular tools able to unambiguously fingerprint each somatic mutant. Moreover, they might be used to associate mutations with phenotypic traits, and they are a powerful tool for traceability. Using the GoldenGate assay, we have been able to fingerprint several blood orange clones starting from DNA isolated from leaves or juice. These tools will potentially provide the consumer with a guarantee on the quality and origin of juices, preventing potential fraud
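Conceptually, fingerprinting with such an assay reduces to comparing a sample's genotype calls at the assayed SNPs against reference clone profiles; a minimal sketch (toy profiles and SNP identifiers, not CREA data) follows:

    # Match a sample's SNP genotypes to the closest reference clone profile.
    def best_match(sample, references):
        """sample: {snp_id: genotype}; references: {clone: {snp_id: genotype}}."""
        def distance(profile):
            shared = [s for s in sample if s in profile]
            mismatches = sum(sample[s] != profile[s] for s in shared)
            return mismatches / max(len(shared), 1)
        return min(references, key=lambda clone: distance(references[clone]))

    refs = {
        "Tarocco": {"snp1": "AA", "snp2": "AG", "snp3": "GG"},
        "Moro":    {"snp1": "AG", "snp2": "AA", "snp3": "GG"},
    }
    print(best_match({"snp1": "AA", "snp2": "AG", "snp3": "GG"}, refs))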