5 research outputs found

    Ontology of core data mining entities

    In this article, we present OntoDM-core, an ontology of core data mining entities. OntoDM-core defines the most essential data mining entities in a three-layered ontological structure comprising a specification, an implementation and an application layer. It provides a representational framework for describing the mining of structured data, and in addition provides taxonomies of datasets, data mining tasks, generalizations, data mining algorithms and constraints, based on the type of data. OntoDM-core is designed to support a wide range of applications and use cases, such as semantic annotation of data mining algorithms, datasets and results; annotation of QSAR studies in the context of drug discovery investigations; and disambiguation of terms in text mining. The ontology has been thoroughly assessed following the practices in ontology engineering, is fully interoperable with many domain resources and is easy to extend.

    Short-term load forecasting in times of unprecedented price movements

    In this thesis we aimed to find the best methods for short-term load forecasting in the Norwegian electricity market during times of unprecedented price movements. We answered three questions related to this aim. The first was which model achieved the most accurate forecast. The second was whether our proposed models outperform the official forecasts published on the Entso-E platform. The third was whether the price movements had any effect on the accuracy of the load forecast. We constructed two SARIMAX models, a gradient boosted decision tree, a random forest, and a multilayer perceptron model. Our findings show the two SARIMAX models to be the most accurate. These models outperformed the forecasts published on the Entso-E platform in four of the five Norwegian bidding zones, measured in MAPE and RMSE. Finally, we have shown that forecasting load with and without price information did not result in significant differences in accuracy. Our findings did not indicate that 2021 was more difficult to forecast than 2019, neither for the three southern bidding zones with the higher price increase nor for the two northern zones with a lower price increase.
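The abstract above reports accuracy in MAPE and RMSE. As a minimal sketch of how these two error measures are usually computed (the function names and the load values are illustrative assumptions, not taken from the thesis):

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, since each error is divided by it.
    """
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


def rmse(actual, forecast):
    """Root mean squared error, in the same unit as the inputs."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical hourly loads in MW (made-up values for illustration only).
actual_load = [100.0, 200.0, 400.0]
forecast_load = [110.0, 190.0, 400.0]

mape_value = mape(actual_load, forecast_load)
rmse_value = rmse(actual_load, forecast_load)
print(f"MAPE = {mape_value:.2f} %, RMSE = {rmse_value:.2f} MW")
```

MAPE is scale-free, which makes it convenient for comparing bidding zones of different size, while RMSE penalises large individual misses more heavily.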

    Three essays on cartel agreements

    In this thesis, we propose three essays on cartel agreements. Assuming bounded rationality, the first chapter analyses the strategic interaction between firms from an evolutionary game perspective. We introduce new elements to capture and discuss the mechanisms that sustain collusive agreements. In the second chapter, following the economic reasoning of crime, we propose a game-theoretical model to evaluate the stability of illegal cartels. Because the cartel operates illegally, punishment within it is also illegal. Thus, we offer new insights to antitrust authorities for detecting and inhibiting cartels as criminal organizations. Finally, the third chapter connects with the previous chapters through an empirical assessment of cartel formation in the gasoline retail market in the following Brazilian cities: Belo Horizonte, Brasília, Caxias do Sul and São Luís. To this end, we combine machine learning techniques with screens based on the statistical moments of the gasoline retail price distribution to classify cartel behavior.
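The abstract above mentions screens based on the statistical moments of the retail price distribution. The abstract does not say which moments are used, so the following is only a sketch under that assumption: it computes a few common moment-based statistics that could serve as input features for a classifier. The function name and the prices are hypothetical.

```python
import statistics


def price_screens(prices):
    """Moment-based summary statistics of a list of retail prices.

    Assumes at least two distinct prices, so the standard deviation is non-zero.
    """
    n = len(prices)
    mean = statistics.fmean(prices)
    std = statistics.pstdev(prices)  # population standard deviation
    centred = [p - mean for p in prices]
    skewness = sum(c ** 3 for c in centred) / n / std ** 3
    kurtosis = sum(c ** 4 for c in centred) / n / std ** 4
    return {
        "mean": mean,
        "std": std,
        "cv": std / mean,  # coefficient of variation
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


# Hypothetical station prices for one city and week (illustrative values only).
screens = price_screens([4.0, 5.0, 6.0])
print(screens)
```

In a sketch like this, each city-period would yield one such feature vector, which a supervised classifier could then label as collusive or competitive.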

    Model based approaches to characterize heterogeneity in gene regulation across cells and disease types

    Access to large genome-wide biological datasets has enabled computational researchers to tackle long-standing questions in biomedicine through the lens of Machine Learning (ML) and Artificial Intelligence (AI). The potential benefits of such computational approaches to biological research are immense. For example, efficient yet interpretable machine learning models of disease, drug response or phenotype can affect our lives at both personal and social levels. However, heterogeneity is found at multiple scales in biology, manifested as the context-specificity of biological processes. This context-specific heterogeneity poses a major challenge to ML models. Even though context-specific models are often trained, this is mostly done without the benefit of mechanistic insights about the biological processes being modeled, and as such the models do not help improve our biological understanding. This dissertation addresses these challenges and limitations by: a) designing appropriate features and ML models motivated by the biological hypothesis at hand, b) building pipelines to analyze multiple context-specific models together, and c) developing data integration and imputation methods to address the problems of insufficient and missing data. The first project studies loss of methylation, or hypo-methylation, in large blocks causing aberrant gene activity, a well-known phenomenon in cancer. To find the associated markers, I designed a classification model of hypo-methylated block boundaries and non-boundaries in colon cancer. The second project models the binding of transcription factors (TFs) to specific DNA elements in the genome, one of the principal components of gene regulation. Since the condition specificity of TF binding is not yet well understood, this dissertation examines a design of cell type-specific models for TF binding using ChIPSeq data. A meta-analysis pipeline, called TRISECT, is applied to multiple TF binding models to understand the heterogeneity of cell specificity across those models. Next, models for breast cancer metastasis using gene expression data are discussed. In breast cancer metastasis, the affinity towards distant tissues, called secondary tissues, is not well understood. Therefore, going beyond mere discriminatory models, I propose another meta-analysis pipeline, MONTAGE, intended to understand the organotropism of breast cancer metastasis across secondary tissues. Building ML models can be hindered by the data size, especially for rare diseases. Therefore, by necessity, molecular data have been merged across multiple studies and across multiple technical platforms, which makes the merged data vulnerable to so-called batch effects that dilute the actual biological signal. Existing methods are not capable of removing multivariate confounding artifacts, leading to inaccurate models. To circumvent this issue, this dissertation examines a deep learning based technique (deepSavior) which ‘translates’ the gene expression profile from samples of one technical platform to another platform. To summarize, this dissertation makes three distinct contributions: a) designing an effective ML model to explore the determinants of cancer-associated hypo-methylation, b) designing meta-analysis pipelines to compare multiple related but context-specific ML models to understand heterogeneous relations among biological processes, and c) developing a new method to overcome the data integration and imputation challenges.

    On the Combination of Game-Theoretic Learning and Multi Model Adaptive Filters

    This paper casts the coordination of a team of robots within the framework of game-theoretic learning algorithms. In particular, a novel variant of fictitious play is proposed, which uses multi-model adaptive filters to estimate other players’ strategies. The proposed algorithm can be used as a coordination mechanism between players when they must take decisions under uncertainty. Each player chooses an action after taking into account the actions of the other players and also the uncertainty. Uncertainty can arise either from noisy observations or from various types of other players. In addition, in contrast to other game-theoretic and heuristic algorithms for distributed optimisation, it is not necessary to find the optimal parameters a priori. Various parameter values can be used initially as inputs to different models. Therefore, the resulting decisions will be aggregate results of all the parameter values. Simulations are used to test the performance of the proposed methodology against other game-theoretic learning algorithms.
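For orientation, the sketch below shows classical fictitious play for two players, in which each player best-responds to the empirical frequencies of the other player's past actions. It is not the paper's variant: the multi-model adaptive filter is replaced here by simple action counts, and the function names and the coordination-game payoffs are illustrative assumptions.

```python
def best_response(payoff, opponent_counts):
    """Pick the action with the highest expected payoff against empirical beliefs.

    payoff[i][j] is this player's payoff for own action i and opponent action j.
    """
    total = sum(opponent_counts)
    beliefs = [count / total for count in opponent_counts]
    expected = [sum(p * b for p, b in zip(row, beliefs)) for row in payoff]
    return max(range(len(expected)), key=expected.__getitem__)


def fictitious_play(payoff_a, payoff_b, rounds=50):
    """Run classical fictitious play and return the list of joint actions."""
    # Counts start at 1 for every action, acting as a uniform prior belief.
    counts_about_b = [1] * len(payoff_b)  # A's record of B's actions
    counts_about_a = [1] * len(payoff_a)  # B's record of A's actions
    history = []
    for _ in range(rounds):
        action_a = best_response(payoff_a, counts_about_b)
        action_b = best_response(payoff_b, counts_about_a)
        counts_about_b[action_b] += 1
        counts_about_a[action_a] += 1
        history.append((action_a, action_b))
    return history


# Hypothetical symmetric coordination game: both players prefer to match,
# and matching on action 0 pays more than matching on action 1.
coordination = [[2.0, 0.0], [0.0, 1.0]]
play_history = fictitious_play(coordination, coordination, rounds=20)
print(play_history[-1])
```

In this simplified setting the counts play the role that the estimated opponent strategy plays in the paper; a filter-based estimator would replace the `beliefs` computation.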