
    PredictChain: Empowering Collaboration and Data Accessibility for AI in a Decentralized Blockchain-based Marketplace

    Limited access to computing resources and training data poses significant challenges for individuals and groups aiming to train and utilize predictive machine learning models. Although numerous publicly available machine learning models exist, they are often unhosted, requiring end-users to establish their own computational infrastructure. Alternatively, these models may only be accessible through paid cloud-based mechanisms, which can prove costly for general public use. Moreover, model and data providers need a more streamlined way to track resource usage and to capitalize, financially and otherwise, on subsequent usage by others. An effective mechanism for contributing high-quality data to improve model performance is also lacking. To address these issues, we propose "PredictChain," a blockchain-based marketplace for predictive machine learning models. This marketplace enables users to upload datasets for training predictive machine learning models, request model training on previously uploaded datasets, or submit queries to trained models. Nodes within the blockchain network, equipped with available computing resources, operate these models and offer a range of archetype machine learning models with varying characteristics, such as cost, speed, simplicity, power, and cost-effectiveness. This decentralized approach empowers users to develop improved models accessible to the public, promotes data sharing, and reduces reliance on centralized cloud providers.
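    To make the three marketplace operations concrete, the following is a minimal Python sketch of how a compute node might represent and dispatch them. All class and field names (DatasetUpload, TrainRequest, QueryRequest, Node) are illustrative assumptions for this sketch, not PredictChain's actual interface.

```python
# Illustrative sketch of the three operations described in the abstract:
# uploading a dataset, requesting training, and querying a trained model.
# Names are hypothetical; this is not PredictChain's real API.
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class DatasetUpload:
    dataset_id: str
    uri: str        # where the raw data lives (e.g. a content hash or URL)
    uploader: str   # address credited when the dataset is used later


@dataclass
class TrainRequest:
    dataset_id: str
    archetype: str  # e.g. a cheap/fast model class vs. a powerful/slow one
    payment: int    # fee in the network's native token


@dataclass
class QueryRequest:
    model_id: str
    features: Dict[str, Any]
    payment: int


class Node:
    """A node with spare compute that trains archetype models and answers queries."""

    def __init__(self) -> None:
        self.datasets: Dict[str, DatasetUpload] = {}
        self.models: Dict[str, Any] = {}

    def handle(self, tx: object) -> str:
        if isinstance(tx, DatasetUpload):
            self.datasets[tx.dataset_id] = tx
            return f"stored dataset {tx.dataset_id}"
        if isinstance(tx, TrainRequest):
            # Placeholder: a real node would fetch the data and fit the requested archetype.
            self.models[tx.dataset_id] = ("trained", tx.archetype)
            return f"trained {tx.archetype} model on {tx.dataset_id}"
        if isinstance(tx, QueryRequest):
            return f"prediction from {tx.model_id}: 0.0 (stub)"
        raise ValueError("unknown transaction type")
```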

    Data Preprocessing and Visualization Tool (Datan esikäsittely- ja visualisointityökalu)

    Abstract. For effective use of machine learning methods, it is important that the user has information about the dataset and its structures. A common first step with new data is therefore to turn to visualization. The purpose of visualization is to find similarity relationships in the data and to form an initial understanding of its structures; for example, it is of interest whether the data falls into distinct groups. Before visualization, high-dimensional data must be reduced to two or three dimensions so that humans can make observations from it, which is where dimensionality reduction methods come in. Beyond visualization, dimensionality reduction also plays a role in machine learning as a way to make features more effective. In addition to the problems caused by dimensionality, most machine learning methods require the data to be scaled or normalized before use. Scaling or normalization is generally important because the value ranges of a dataset's features often differ considerably. This bachelor's thesis examines data scaling and normalization as well as dimensionality reduction with various methods, and explores the structure of several real-life datasets using them. The purpose of the work is to highlight the importance of becoming familiar with a new dataset and to present common methods that can improve the results of machine learning. The concrete contribution of the thesis is a data analysis tool developed in the Python programming language that makes data handling and visualization easy through a graphical user interface; it is primarily intended for educational purposes.
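    As a concrete illustration of the scale-then-reduce-then-plot workflow described above, a minimal sketch using scikit-learn and matplotlib (a generic example on a standard dataset, not the thesis's own tool) could look like this:

```python
# Scale the features, reduce to two dimensions, then plot.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Standardize each feature to zero mean and unit variance so that
# features with larger value ranges do not dominate the projection.
X_scaled = StandardScaler().fit_transform(X)

# Project the 4-dimensional data down to 2 components for plotting.
X_2d = PCA(n_components=2).fit_transform(X_scaled)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="viridis", s=20)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.title("Iris data after scaling and PCA")
plt.show()
```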

    Realizing EDGAR: eliminating information asymmetries through artificial intelligence analysis of SEC filings

    The U.S. Securities and Exchange Commission (SEC) maintains a publicly accessible database of all required filings of all publicly traded companies. Known as EDGAR (Electronic Data Gathering, Analysis, and Retrieval), this database contains documents ranging from annual reports of major companies to personal disclosures of senior managers. However, the common user, and particularly the retail investor, is overwhelmed by the deluge of information rather than empowered by it. EDGAR as it currently functions entrenches the information asymmetry between these retail investors and the large financial institutions with which they often trade. With substantial research staffs and budgets, coupled with an industry standard of “playing both sides” of a transaction, these investors “in the know” lead price fluctuations while others must follow. In general, this thesis applies recent technological advancements to the development of software tools that can derive valuable insights from EDGAR documents within a useful time frame. While numerous such commercial products currently exist, all come with significant price tags and many still rely on substantial human involvement to derive such insights. Recent years, however, have seen an explosion in the fields of Machine Learning (ML) and Natural Language Processing (NLP), which show promise in automating many of these functions with greater efficiency. ML aims to develop software which learns parameters from large datasets, as opposed to traditional software which merely applies a programmer’s logic. NLP aims to read, understand, and generate language naturally, an area where recent ML advancements have proven particularly adept. Specifically, this thesis serves as an exploratory study in applying recent advancements in ML and NLP to the vast range of documents contained in the EDGAR database. While algorithms will likely never replace the hordes of research analysts that now saturate securities markets, nor the advantages that accrue to large and diverse trading desks, they do hold the potential to provide small yet significant insights at little cost. This study first examines methods for document acquisition from EDGAR, with a focus on a baseline efficiency sufficient for the real-time trading needs of market participants. Next, it applies recent advancements in ML and NLP, specifically recurrent neural networks, to the task of standardizing financial statements across different filers. Finally, the conclusion contextualizes these findings in an environment of continued technological and commercial evolution.
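    As a concrete example of the document-acquisition step, a minimal Python sketch that pulls one quarterly master index from EDGAR's public full-index area and keeps only 10-K filings might look like the following. The User-Agent string is a placeholder (the SEC asks requesters to identify themselves), and the year and quarter are arbitrary choices for illustration.

```python
# Fetch a quarterly EDGAR master index and list 10-K filings.
# The full-index URL pattern and the pipe-delimited layout
# (CIK|Company Name|Form Type|Date Filed|Filename) are the SEC's published format.
import requests

INDEX_URL = "https://www.sec.gov/Archives/edgar/full-index/2016/QTR1/master.idx"
HEADERS = {"User-Agent": "Example Research example@example.com"}  # placeholder contact

resp = requests.get(INDEX_URL, headers=HEADERS, timeout=30)
resp.raise_for_status()

filings = []
for line in resp.text.splitlines():
    parts = line.split("|")
    if len(parts) == 5 and parts[2] == "10-K":   # keep annual reports only
        cik, company, form, date_filed, path = parts
        filings.append((company, date_filed, "https://www.sec.gov/Archives/" + path))

print(f"{len(filings)} 10-K filings in 2016 Q1")
print(filings[:3])
```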

    Three Risky Decades: A Time for Econophysics?

    We publish our Special Issue at a turning point unlike any we have faced since World War II. Interconnected long-term global shocks such as the coronavirus pandemic, the war in Ukraine, and catastrophic climate change have imposed significant humanitarian, socio-economic, political, and environmental restrictions on the globalization process and on all aspects of economic and social life, including the existence of individual people. The planet is trapped: the current situation seems to be the prelude to an apocalypse whose long-term effects we will feel for decades. A concept for the planet's survival therefore urgently needs to be built; only on this basis can the conditions for its development be created. The Special Issue gives evidence of the state of econophysics before the current situation. It can therefore provide an excellent econophysical, inter-, and cross-disciplinary starting point for a rational approach to a new era.

    Essays in High Frequency Trading and Market Structure

    High Frequency Trading (HFT) is the use of algorithmic trading technology to gain a speed advantage when operating in financial markets. The increasing gap between the fastest and the slowest players in financial markets raises questions about the efficiency of markets, the strategies players must use to trade effectively, and the overall fairness of markets that regulators must maintain. This research explores markets affected by HFT activity from three perspectives. First, an updated microstructure model is proposed to allow empirical exploration of current levels of noise in financial markets; this shows that current noise levels are not disruptive to dominant trading strategies. Second, an ARCH-type model is used to decompose market data into a series of traders' working price levels, demonstrating that in cases of suspected market abuse, regulators can assess the impact individual traders make on price even in fast markets. Finally, various HFT control measures are reviewed in terms of effectiveness and against an ordoliberal benchmark of fairness. The work illustrates the extent to which HFT activity is not yet disruptive, but also shows where HFT can be a conduit for market abuse, and provides a series of recommendations around the use of circuit breakers, algorithmic governance standards, and additional considerations where assets are dual-listed in different countries.
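    For readers unfamiliar with the ARCH family referenced above, a minimal numpy/scipy sketch of fitting an ARCH(1) model to a simulated return series by maximum likelihood is shown below. This illustrates the model class only, under assumed parameter values; it is not the thesis's decomposition of traders' working price levels.

```python
# ARCH(1) illustration: r_t = sigma_t * e_t,  sigma_t^2 = omega + alpha * r_{t-1}^2.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Simulate T returns from a known ARCH(1) process (omega=0.1, alpha=0.4).
T, omega_true, alpha_true = 2000, 0.1, 0.4
r = np.zeros(T)
for t in range(1, T):
    sigma2 = omega_true + alpha_true * r[t - 1] ** 2
    r[t] = np.sqrt(sigma2) * rng.standard_normal()

def neg_log_likelihood(params, returns):
    """Gaussian negative log-likelihood of an ARCH(1) model."""
    omega, alpha = params
    sigma2 = omega + alpha * returns[:-1] ** 2            # conditional variances
    ll = -0.5 * (np.log(2 * np.pi) + np.log(sigma2) + returns[1:] ** 2 / sigma2)
    return -ll.sum()

res = minimize(neg_log_likelihood, x0=[0.05, 0.2], args=(r,),
               bounds=[(1e-6, None), (0.0, 0.999)])
print("estimated omega, alpha:", res.x)   # should be close to (0.1, 0.4)
```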

    2016 Oklahoma Research Day Full Program

    This document contains all abstracts from the 2016 Oklahoma Research Day held at Northeastern State University.