21 research outputs found

    Learning causality for Arabic - proclitics

    The use of prefixed particles is a prevalent linguistic form for expressing causation in Arabic. However, such particles are complicated and highly ambiguous, as they imply different meanings according to their position in the text. This ambiguity underlines the need for a large-scale annotated corpus that contains instances of these particles. In this paper, we present the process of building our corpus, a collection of annotated sentences each containing an instance of a candidate causal particle. We use the corpus to construct and optimize predictive models for the task of causation recognition. The best models perform significantly better than the baselines. Arabic is a less-resourced language, and we hope this work will help in building better Information Extraction systems.

    Hyperparameter fine tuning for a time series forecasting model

    This project was conducted in the context of the Project-Based Learning program, whose purpose is to provide experience in a real-life business and data analytics project. Over the last 18 months, four NOVA SBE Business Analytics master's students collaborated with Brisa. The main objective of the project was to produce new traffic forecasting models in Python. The author's individual work focused on the hyperparameter fine-tuning procedure for the forecasting models. A review of different methodologies led to experiments with grid search and random search frameworks. As expected, grid search achieved better results, but it requires more computational power and time.
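    The trade-off between the two search strategies can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `validation_error` function is a toy stand-in for training and scoring a forecasting model, and the parameter names and grid values are hypothetical, not taken from the project.

```python
import itertools
import random


def validation_error(params):
    """Toy stand-in for a forecasting model's validation error (hypothetical)."""
    lr, depth = params["learning_rate"], params["max_depth"]
    return (lr - 0.1) ** 2 + 0.01 * (depth - 6) ** 2


# Hypothetical search space: 5 x 5 = 25 combinations.
search_space = {
    "learning_rate": [0.01, 0.05, 0.1, 0.2, 0.3],
    "max_depth": [2, 4, 6, 8, 10],
}


def grid_search(space, objective):
    """Evaluate every combination in the grid and return the best one."""
    names = list(space)
    candidates = [
        dict(zip(names, values))
        for values in itertools.product(*(space[n] for n in names))
    ]
    return min(candidates, key=objective), len(candidates)


def random_search(space, objective, n_iter, seed=0):
    """Evaluate n_iter randomly sampled combinations and return the best one."""
    rng = random.Random(seed)
    candidates = [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n_iter)
    ]
    return min(candidates, key=objective), len(candidates)


grid_best, grid_evals = grid_search(search_space, validation_error)
rand_best, rand_evals = random_search(search_space, validation_error, n_iter=8)
print("grid:  ", grid_best, "evaluations:", grid_evals)
print("random:", rand_best, "evaluations:", rand_evals)
```

    Grid search is guaranteed to find the best point on the grid but pays for every combination, while random search caps the cost at `n_iter` evaluations and may miss the best point.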

    TSPO: an autoML approach to time series forecasting

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics. Time series forecasting is an essential tool in many fields. In recent years, machine learning has gained popularity as an appropriate tool for time series forecasting. When employing machine learning algorithms, it is necessary to optimise a machine learning pipeline, which is a tedious manual effort and requires time series analysis and machine learning expertise. AutoML (automatic machine learning) is a sub-field of machine learning research that addresses this issue by providing integrated systems that automatically find machine learning pipelines. However, none of the available open-source tools is yet explicitly designed for time series forecasting. The proposed system, TSPO (Time Series Pipeline Optimisation), aims to provide an autoML tool designed specifically for time series forecasting tasks, giving non-experts the capability to employ machine learning strategies for time series forecasting. The system utilises a genetic algorithm to find an appropriate set of time series features, machine learning models and a set of suitable hyper-parameters. The optimisation objective is defined as minimising the obtained error, which is measured with a time series variant of k-fold cross-validation. TSPO outperformed the official machine learning benchmarks of the M4-Competition in 9 out of 12 randomly selected time series. TSPO captured the characteristics of all analysed time series consistently better than the benchmarks. The results indicate that TSPO is capable of producing robust and accurate forecasts without any human input.
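    The abstract does not say which time series variant of k-fold cross-validation TSPO uses. One common variant is expanding-window validation, sketched below under that assumption; the function names, the naive last-value forecaster and the toy series are all hypothetical.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) pairs for expanding-window validation.

    Each fold trains on every observation before its test block, so a model
    is never scored on data that precedes its training data.
    """
    test_size = n_samples // (n_splits + 1)
    if test_size < 1:
        raise ValueError("not enough samples for the requested number of splits")
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train_idx = list(range(test_start))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


def naive_forecast_error(series, n_splits):
    """Mean absolute error of a last-value forecast, averaged over the folds."""
    fold_errors = []
    for train_idx, test_idx in expanding_window_splits(len(series), n_splits):
        last_value = series[train_idx[-1]]  # forecast = last observed value
        errors = [abs(series[i] - last_value) for i in test_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)


series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # toy data
folds = list(expanding_window_splits(len(series), n_splits=3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
print("mean error:", naive_forecast_error(series, n_splits=3))
```

    An optimiser such as a genetic algorithm could use the averaged fold error as the quantity to minimise.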

    Predictive analytics applied to firefighter response, a practical approach

    Time is a crucial factor for the outcome of emergencies, especially those that involve human lives. This paper looks at Lisbon firefighters' occurrences and presents a model, based on city characteristics and climatic data, to predict whether there will be an occurrence at a certain location, according to the weather forecasts. In this study three algorithms were considered: Logistic Regression, Decision Tree and Random Forest. Measured by the AUC, the best-performing model was a random forest with random under-sampling, at 0.68. This model was well adjusted across the city and showed that precipitation and size of the subsection are the most relevant features in predicting firefighters' occurrences. The work presented here has clear implications for the firefighters' decision-making regarding vehicle allocation, as they can now make an informed decision considering the predicted occurrences.
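    For readers unfamiliar with the metric, the AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way; the function name and the four toy predictions are illustrative assumptions, not data from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (positive, negative) pairs in which the positive
    example is scored higher; ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy predictions: 1 = occurrence, 0 = no occurrence.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))
```

    A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.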

    Otimização de hiperparâmetros em algoritmos de arvore de decisão utilizando computação evolutiva

    Some algorithms in machine learning are parameterizable; that is, they allow the configuration of parameters in order to increase performance on a given task. In most cases, these parameters are found empirically by the developer. Another approach is to use an optimization technique to find an optimized set of parameters. The aim of this project is to apply evolutionary algorithms, namely the Genetic Algorithm (GA), the Fluid Genetic Algorithm (FGA) and the Genetic Algorithm using Theory of Chaos (GATC), to optimize the search for hyperparameters in decision tree algorithms. This work presents some satisfactory results on the data set tested, where the Classification and Regression Trees (CART) algorithm was used as the classifier for the tests. In these tests, the decision trees generated from the default values of the hyperparameters are compared with those optimized by the proposed approach. We sought to optimize the accuracy and final size of the generated tree, both of which were successfully optimized by the proposed algorithms.
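    The general idea of an evolutionary hyperparameter search can be sketched with a plain genetic algorithm. This is not the FGA or GATC variant from the abstract, and it does not train a real tree: the `fitness` function is a toy stand-in for "accuracy minus a tree-size penalty", and the hyperparameter names and bounds are assumptions.

```python
import random

# Hypothetical search space for two decision-tree hyperparameters.
BOUNDS = {"max_depth": (1, 20), "min_samples_leaf": (1, 50)}


def fitness(ind):
    """Toy stand-in for validation accuracy minus a tree-size penalty.

    A real experiment would train a CART tree with these settings and score it.
    """
    return -abs(ind["max_depth"] - 7) - 0.2 * abs(ind["min_samples_leaf"] - 10)


def random_individual(rng):
    """Draw each hyperparameter uniformly from its bounds."""
    return {k: rng.randint(lo, hi) for k, (lo, hi) in BOUNDS.items()}


def tournament(population, rng, k=3):
    """Return the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return {k: rng.choice((a[k], b[k])) for k in BOUNDS}


def mutate(ind, rng, rate=0.2):
    """Resample each gene within its bounds with probability `rate`."""
    return {
        k: rng.randint(*BOUNDS[k]) if rng.random() < rate else v
        for k, v in ind.items()
    }


def genetic_search(pop_size=20, generations=30, seed=0):
    """Run the genetic algorithm and return the best individual found."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(population, key=fitness)  # elitism: the best survives unchanged
        children = [
            mutate(
                crossover(tournament(population, rng), tournament(population, rng), rng),
                rng,
            )
            for _ in range(pop_size - 1)
        ]
        population = [elite] + children
    return max(population, key=fitness)


best = genetic_search()
print(best, fitness(best))
```

    Because the best individual is carried over unchanged each generation, the best fitness in the population never decreases.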

    Machine learning on Crays to optimise petrophysical workflows in oil and gas exploration

    Public education and outreach leads to a better informed public on Puget Sound and watershed issues. Using beach life and spawning salmon as a way to share knowledge and start the conservation conversation, the Beach Naturalist and Cedar River Salmon Journey programs have been educating Puget Sound residents for over 15 years. These programs benefit two audiences: the volunteers who serve in the program and the public who participate. Volunteers are provided in-depth information about Puget Sound life, watersheds, salmon and conservation strategies. These passionate volunteers translate this information and share it with the public they engage in the environments we hope to protect: at local beaches in the nearshore, the Chittenden Locks along salmonid migratory routes and at salmon spawning locations along the Cedar River. By providing opportunities for the public to learn more and create personal connections with the animals and habitat we share, we suggest choices people make in their daily lives that can help protect the watershed

    Optimization of firefighter response with predictive analytics : practical application to Lisbon, Portugal

    Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. Time is a crucial factor for the outcome of emergencies, especially those that involve human lives. This paper looks at Lisbon firefighters' occurrences and presents a model, based on city characteristics and climatic data, to predict whether there will be an occurrence at a certain location, according to the weather forecasts. In this study three algorithms were considered (Logistic Regression, Decision Tree and Random Forest), as well as four techniques to balance the data (random over-sampling, SMOTE, random under-sampling and Near Miss), which were compared to the baseline, the imbalanced data. Measured by the AUC, the best-performing model was a random forest with random under-sampling, at 0.68. This model was well adjusted across the city and showed that precipitation and size of the subsection are the most relevant features in predicting firefighters' occurrences. The work presented here has clear implications for the firefighters' decision-making regarding vehicle allocation, as they can now make an informed decision considering the predicted occurrences.
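    Random under-sampling, the balancing technique behind the best model, can be sketched in a few lines. The version below is a generic illustration, not the dissertation's code: the function name, the feature names and the 5-to-95 class ratio are assumptions.

```python
import random
from collections import Counter


def random_under_sample(rows, labels, seed=0):
    """Drop majority-class rows at random until all classes have equal size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(members) for members in by_class.values())
    sampled_rows, sampled_labels = [], []
    for label, members in by_class.items():
        # The smallest class is kept whole; larger classes are cut to `target`.
        for row in rng.sample(members, target):
            sampled_rows.append(row)
            sampled_labels.append(label)
    return sampled_rows, sampled_labels


# Hypothetical imbalanced data: 1 = occurrence, 0 = no occurrence.
labels = [1] * 5 + [0] * 95
rows = [{"precipitation_mm": i % 7, "area": i} for i in range(100)]

balanced_rows, balanced_labels = random_under_sample(rows, labels)
print(Counter(labels), "->", Counter(balanced_labels))
```

    Under-sampling discards information from the majority class, so it is normally applied to the training data only, with evaluation done on data that keeps the original class balance.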