Learning causality for Arabic proclitics
The use of prefixed particles is a common linguistic means of expressing causation in the Arabic language. However, such particles are complex and highly ambiguous, as they take on different meanings depending on their position in the text. This ambiguity creates a strong need for a large-scale annotated corpus containing instances of these particles. In this paper, we present the process of building our corpus, a collection of annotated sentences each containing an instance of a candidate causal particle. We use the corpus to construct and optimize predictive models for the task of causation recognition. The best models perform significantly better than the baselines. Arabic is a less-resourced language, and we hope this work will help in building better information extraction systems.
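A minimal sketch of the candidate-extraction step that precedes annotation. It assumes the candidate causal proclitics are the letters ب (bi-), ف (fa-), ل (li-) and ك (ka-) and that sentences are tokenized on whitespace; the abstract names neither the particle set nor the tokenizer, so both are illustrative assumptions. Every token beginning with one of these letters is flagged together with a small context window, which is exactly where the ambiguity described above appears.

```python
# Hypothetical candidate proclitics (assumption, not the paper's list), mapped to transliterations.
CANDIDATE_PROCLITICS = {"ب": "bi", "ف": "fa", "ل": "li", "ك": "ka"}


def extract_candidates(sentence: str, window: int = 2) -> list[dict]:
    """Flag every token that starts with a candidate proclitic letter, with left/right context."""
    tokens = sentence.split()  # naive whitespace tokenization (assumption)
    candidates = []
    for i, tok in enumerate(tokens):
        # A one-letter token cannot carry a prefixed particle plus a host word.
        if len(tok) > 1 and tok[0] in CANDIDATE_PROCLITICS:
            candidates.append({
                "index": i,
                "particle": CANDIDATE_PROCLITICS[tok[0]],
                "token": tok,
                "left": tokens[max(0, i - window):i],
                "right": tokens[i + 1:i + 1 + window],
            })
    return candidates
```

For example, in تأخر القطار بسبب المطر ("the train was delayed because of the rain") only بسبب is flagged, as bi-. In هذا بيت كبير ("this is a big house") both بيت and كبير are flagged even though neither begins with a proclitic, which is why annotated data and a trained classifier are needed to separate genuine causal uses from surface matches.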
Hyperparameter fine tuning for a time series forecasting model
This project was conducted in the context of the Project-Based Learning program, whose purpose is to provide experience with a real-life business and data analytics project. Over the last 18 months, four NOVA SBE Business Analytics master's students collaborated with Brisa. The main objective of the project was to produce new traffic forecasting models in Python. The author's individual work focused on the hyperparameter fine-tuning procedure for the forecasting models. After researching different methodologies, the author experimented with grid search and random search. As expected, grid search achieved better results, but it requires more computational power and time.
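The sketch below contrasts the two search strategies on a toy forecasting model. The project's traffic models and search spaces are not described here, so it assumes Holt's linear trend method, with its two smoothing parameters `alpha` and `beta`, purely as a stand-in; `holt_mae`, `grid_search` and `random_search` are hypothetical helper names.

```python
import itertools
import random


def holt_mae(series, alpha, beta):
    """Mean absolute one-step-ahead error of Holt's linear trend method (stand-in model)."""
    level, trend = series[1], series[1] - series[0]
    errors = []
    for y in series[2:]:
        errors.append(abs(level + trend - y))  # the forecast uses only past observations
        new_level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return sum(errors) / len(errors)


def grid_search(series, alphas, betas):
    # Every (alpha, beta) combination is evaluated: cost = len(alphas) * len(betas).
    results = [((a, b), holt_mae(series, a, b)) for a, b in itertools.product(alphas, betas)]
    best = min(results, key=lambda r: r[1])
    return best[0], best[1], len(results)


def random_search(series, n_iter, seed=0):
    # A fixed budget of n_iter points drawn uniformly from [0, 1) x [0, 1).
    rng = random.Random(seed)
    results = []
    for _ in range(n_iter):
        a, b = rng.random(), rng.random()
        results.append(((a, b), holt_mae(series, a, b)))
    best = min(results, key=lambda r: r[1])
    return best[0], best[1], len(results)
```

The cost of grid search grows multiplicatively with every hyperparameter added to the grid, while random search's cost is fixed by `n_iter`; this is one reason grid search can demand more computation and time, as the abstract reports.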
TSPO: an autoML approach to time series forecasting
Dissertation presented as partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics
Time series forecasting is an essential tool in many fields. In recent years, machine learning
has gained popularity as an appropriate tool for time series forecasting. When employing
machine learning algorithms, it is necessary to optimise a machine learning pipeline, which is a
tedious manual effort and requires time series analysis and machine learning expertise. AutoML
(automatic machine learning) is a sub-field of machine learning research that addresses this issue
by providing integrated systems that automatically find machine learning pipelines. However,
none of the available open-source tools is yet explicitly designed for time series forecasting.
The proposed system, TSPO (Time Series Pipeline Optimisation), aims to provide an
AutoML tool designed specifically for time series forecasting tasks, giving non-experts the
ability to apply machine learning strategies to time series forecasting. The system uses
a genetic algorithm to find an appropriate set of time series features, machine learning models
and a set of suitable hyper-parameters. The optimisation objective is defined as minimising the
obtained error, which is measured with a time series variant of k-fold cross-validation.
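A minimal sketch of the two ideas in this paragraph: a genetic algorithm searching over pipeline choices, scored by an expanding-window variant of k-fold cross-validation. It is not TSPO's code. The search space (a look-back `window` standing in for feature selection, plus three toy models `naive`, `mean` and `drift`), the operators (tournament selection, uniform crossover, mutation, two-member elitism) and all parameter values are assumptions chosen for illustration.

```python
import random
import statistics

# Hypothetical search space: a look-back window (the "features") and a model.
WINDOWS = range(2, 13)
MODELS = ("naive", "mean", "drift")


def forecast(history, window, model):
    """One-step-ahead forecast from the last `window` observations."""
    recent = history[-window:]
    if model == "naive":
        return recent[-1]
    if model == "mean":
        return statistics.fmean(recent)
    # "drift": extend the average slope observed across the window
    return recent[-1] + (recent[-1] - recent[0]) / max(len(recent) - 1, 1)


def time_series_folds(n, k, min_train):
    """Expanding-window folds: each (start, end) block is validated after all earlier data."""
    fold_size = (n - min_train) // k
    return [(min_train + i * fold_size, min_train + (i + 1) * fold_size) for i in range(k)]


def cv_score(series, genome, k=3, min_train=12):
    """Mean absolute error over all validation folds (lower is better)."""
    window, model = genome
    errors = []
    for start, end in time_series_folds(len(series), k, min_train):
        for t in range(start, end):
            # history is series[:t], so the validated point itself is never seen
            errors.append(abs(forecast(series[:t], window, model) - series[t]))
    return statistics.fmean(errors)


def random_genome(rng):
    return (rng.choice(WINDOWS), rng.choice(MODELS))


def crossover(a, b, rng):
    # uniform crossover: each gene is copied from a randomly chosen parent
    return (rng.choice((a[0], b[0])), rng.choice((a[1], b[1])))


def mutate(genome, rng, rate=0.3):
    window, model = genome
    if rng.random() < rate:
        window = rng.choice(WINDOWS)
    if rng.random() < rate:
        model = rng.choice(MODELS)
    return (window, model)


def genetic_search(series, pop_size=12, generations=10, seed=0):
    rng = random.Random(seed)
    cache = {}

    def fitness(genome):
        if genome not in cache:
            cache[genome] = cv_score(series, genome)
        return cache[genome]

    def tournament():
        # reads the current population at call time
        return min(rng.sample(population, 3), key=fitness)

    population = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(population, key=fitness)[:2]  # the best two survive unchanged
        children = [mutate(crossover(tournament(), tournament(), rng), rng)
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    best = min(population, key=fitness)
    return best, fitness(best)
```

Because the two best genomes are always carried over, the best cross-validated error never gets worse from one generation to the next. The folds only ever validate on data that comes after the history used for each forecast, which is the property that distinguishes time-series cross-validation from shuffled k-fold.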
TSPO outperformed the official machine learning benchmarks of the M4 Competition on 9
out of 12 randomly selected time series, and it captured the characteristics of all analysed time
series consistently better than the benchmarks.
The results indicate that TSPO is capable of producing robust and accurate forecasts without
any human input.
Predictive analytics applied to firefighter response, a practical approach
Time is a crucial factor for the outcome of emergencies, especially those that involve human lives. This paper looks at occurrences attended by Lisbon's firefighters and presents a model, based on city characteristics and climatic data, to predict whether there will be an occurrence at a certain location, given the weather forecast. Three algorithms were considered: Logistic Regression, Decision Tree, and Random Forest. Measured by AUC, the best-performing model was a random forest with random under-sampling, scoring 0.68. This model was well adjusted across the city and showed that precipitation and the size of the subsection are the most relevant features in predicting firefighter occurrences. The work presented here has clear implications for firefighters' decision-making regarding vehicle allocation, as they can now make informed decisions that take the predicted occurrences into account.
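The best model combines random under-sampling with evaluation by AUC. The sketch below shows both steps in plain Python under simple assumptions: binary labels where 1 marks an occurrence, rows of any type, and AUC computed from its rank interpretation (the probability that a random positive is scored above a random negative, ties counting one half). It is an illustration, not the study's code.

```python
import random


def random_under_sample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes have the same count (labels 0/1)."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = minority + rng.sample(majority, len(minority))
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative), ties counted as 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Under-sampling is meant for the training split only; computing AUC on untouched held-out data keeps the score tied to the real, imbalanced distribution of occurrences.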
Hyperparameter optimization in decision tree algorithms using evolutionary computation
Some machine learning algorithms are parameterizable: they allow parameters to be configured in order to improve performance on a given task. In most cases, these parameters are found empirically by the developer. Another approach is to use an optimization technique to find an optimized set of parameters. This project applies evolutionary algorithms, namely the Genetic Algorithm (GA), the Fluid Genetic Algorithm (FGA), and the Genetic Algorithm using Theory of Chaos (GATC), to optimize the search for hyperparameters in decision tree algorithms. The work presents satisfactory results on the dataset tested, using the Classification and Regression Trees (CART) algorithm as the classifier. In these tests, decision trees generated from the default hyperparameter values are compared with those optimized by the proposed approach. We sought to optimize both the accuracy and the final size of the generated tree, and both were successfully optimized by the proposed algorithms.
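The project optimizes two objectives at once: accuracy and tree size. One common way to compare candidate hyperparameter sets on both, assumed here rather than taken from the project, is Pareto dominance: a candidate is kept only if no other candidate is at least as accurate and at least as small while being strictly better on one of the two. The `params` dictionaries stand for whatever CART settings are searched and are hypothetical placeholders.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    params: dict     # hypothetical CART hyperparameters, e.g. {"max_depth": 4}
    accuracy: float  # higher is better
    n_nodes: int     # tree size, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on at least one."""
    no_worse = a.accuracy >= b.accuracy and a.n_nodes <= b.n_nodes
    strictly_better = a.accuracy > b.accuracy or a.n_nodes < b.n_nodes
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only candidates that no other candidate dominates (a candidate never dominates itself)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]
```

A genetic algorithm can use membership in this front, or a weighted sum of the two objectives, as its selection criterion; which scheme GA, FGA, and GATC used in the project is not stated here.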
Informative Hyper-parameter Optimization and Selection
Hyper-parameter optimization methods allow efficient and robust hyper-parameter searching without the need to hand-select each value and combination. Although hyper-parameter tuners such as BOHB, Hyperopt, and SMAC have been investigated by researchers in terms of performance, there has yet to be an in-depth analysis of the values each tuner selected over all iterations. We present a thorough aggregation of data on the efficiency of the search values selected by each tuner over 59 datasets and ten popular ML algorithms from scikit-learn. From this extensive data, we observe and advise which tuners show better results for particular datasets, characterized by their metadata, and for particular algorithms. Through this research, we have also developed a simple plug-in that integrates BOHB, Hyperopt, and SMAC into DARPA's Data-Driven Discovery (D3M) AutoML systems for smooth implementation of various tuners. This is advantageous because the desired hyper-parameter tuner may change depending on the pipeline search method in an AutoML system, particularly when compared with AutoML systems that use only one search method. Our results show that, for AutoML systems, the Hyperopt tuner gives more desirable results in fewer iterations due to its significant exploration component, and BOHB performs best overall across a large number of datasets and algorithms owing to strategic budgeting.
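The abstract credits BOHB's strong general performance to strategic budgeting. BOHB builds on Hyperband, whose core routine is successive halving. The sketch below shows only that routine: it leaves out how configurations are sampled (random in Hyperband, model-based in BOHB), treats `evaluate(config, budget)` as a hypothetical user-supplied loss function, and records every (config, budget, loss) triple, in the spirit of analysing the values a tuner selected.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Evaluate all configs on a small budget, keep the best 1/eta, and repeat with eta times the budget."""
    history = []  # (config, budget, loss) for every evaluation made
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        scored = []
        for cfg in survivors:
            loss = evaluate(cfg, budget)
            history.append((cfg, budget, loss))
            scored.append((loss, cfg))
        scored.sort(key=lambda pair: pair[0])  # stable sort: ties keep their input order
        keep = max(1, len(survivors) // eta)
        survivors = [cfg for _, cfg in scored[:keep]]
        budget *= eta
    return survivors[0], history
```

With nine configurations and `eta=3`, all nine get the smallest budget, the best three get three times as much, and the winner emerges after twelve evaluations; most of the budget is spent only on configurations that already looked promising.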
Machine learning on Crays to optimise petrophysical workflows in oil and gas exploration
Public education and outreach leads to a better informed public on Puget Sound and watershed issues. Using beach life and spawning salmon as a way to share knowledge and start the conservation conversation, the Beach Naturalist and Cedar River Salmon Journey programs have been educating Puget Sound residents for over 15 years. These programs benefit two audiences: the volunteers who serve in the program and the public who participate. Volunteers are provided in-depth information about Puget Sound life, watersheds, salmon and conservation strategies. These passionate volunteers translate this information and share it with the public they engage in the environments we hope to protect: at local beaches in the nearshore, the Chittenden Locks along salmonid migratory routes and at salmon spawning locations along the Cedar River. By providing opportunities for the public to learn more and create personal connections with the animals and habitat we share, we suggest choices people make in their daily lives that can help protect the watershed
Optimization of firefighter response with predictive analytics: practical application to Lisbon, Portugal
Dissertation presented as partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence
Time is a crucial factor for the outcome of emergencies, especially those that involve human lives.
This paper looks at occurrences attended by Lisbon's firefighters and presents a model, based on city
characteristics and climatic data, to predict whether there will be an occurrence at a certain
location, given the weather forecast. In this study, three algorithms (Logistic Regression,
Decision Tree, and Random Forest) and four data-balancing techniques (random over-sampling,
SMOTE, random under-sampling, and Near Miss) were considered; the balancing techniques were
compared with a baseline using the original imbalanced data.
Measured by AUC, the best-performing model was a random forest with random under-sampling,
scoring 0.68. This model was well adjusted across the city and showed that precipitation and the size of the
subsection are the most relevant features in predicting firefighter occurrences.
The work presented here has clear implications for firefighters' decision-making regarding vehicle
allocation, as they can now make informed decisions that take the predicted occurrences into account.
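Three of the four balancing techniques compared in the dissertation can each be sketched in a few lines. These are simplified versions under stated assumptions, not the study's implementation: points are numeric tuples, distances are Euclidean, SMOTE places a synthetic point on the segment between a minority point and one of its `k` nearest minority neighbours, and Near Miss follows the NearMiss-1 rule (keep the majority points with the smallest average distance to their `k` nearest minority points). Random under-sampling is the mirror image of random over-sampling: it drops majority rows instead of duplicating minority rows.

```python
import math
import random


def random_over_sample(minority, n_new, seed=0):
    """Duplicate randomly chosen minority points."""
    rng = random.Random(seed)
    return [rng.choice(minority) for _ in range(n_new)]


def smote(minority, n_new, k=3, seed=0):
    """Create synthetic points between a minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # exclude the base point by index, then take its k nearest minority neighbours
        neighbours = sorted((p for j, p in enumerate(minority) if j != i),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # 0 gives the base point; values near 1 approach the neighbour
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def near_miss_1(majority, minority, n_keep, k=3):
    """NearMiss-1: keep the majority points closest, on average, to their k nearest minority points."""
    def avg_dist(p):
        nearest = sorted(math.dist(p, m) for m in minority)[:k]
        return sum(nearest) / len(nearest)
    return sorted(majority, key=avg_dist)[:n_keep]
```

Over-sampling methods add minority rows, while the under-sampling methods discard majority rows. Near Miss discards them selectively, keeping the majority points that lie nearest the class boundary, whereas random under-sampling discards them blindly.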