14 research outputs found

    Statistical learning in complex and temporal data: distances, two-sample testing, clustering, classification and Big Data

    Programa Oficial de Doutoramento en Estatística e Investigación Operativa. 555V01
    [Abstract] This thesis deals with statistical learning on complex objects, with emphasis on time series data. The problem is approached by introducing domain knowledge of the underlying phenomena by means of distances and features. A distance-based two-sample test is proposed, and its performance is studied under a wide range of scenarios. Distances for time series classification and clustering are also shown to increase statistical power when applied to two-sample testing. Our test compares favorably to other methods thanks to its flexibility against different alternatives. A new distance for time series is defined through an innovative way of comparing lagged distributions of the series. This distance inherits the good empirical performance of existing methods while removing some of their limitations. A forecasting method based on time series features is proposed. The method combines individual standard forecasting algorithms using a weighted average. These weights come from a learning model fitted on a large training set. A distributed classification algorithm is proposed, based on comparing, by means of a distance, the empirical distribution functions of the common test set and of the data that each computing node receives.
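
The weighted-average combination described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the abstract does not say how the learned model's outputs become weights, so a softmax is used only to obtain non-negative weights that sum to one, and the scores and base forecasts are made-up values.

```python
import math

def softmax(scores):
    """Turn arbitrary real scores into non-negative weights summing to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def combine_forecasts(forecasts, weights):
    """Weighted average of several forecasts over the same horizon.

    forecasts: one list of predicted values per base method.
    weights:   one weight per base method.
    """
    horizon = len(forecasts[0])
    return [
        sum(w * f[h] for w, f in zip(weights, forecasts))
        for h in range(horizon)
    ]

# Hypothetical scores that a model fitted on series features might output,
# one per base forecasting method (values invented for illustration).
scores = [2.0, 0.0]
weights = softmax(scores)

# Hypothetical base forecasts for a two-step horizon.
naive_forecast = [10.0, 10.0]
drift_forecast = [12.0, 14.0]

combined = combine_forecasts([naive_forecast, drift_forecast], weights)
print(weights, combined)
```

Because the weights are non-negative and sum to one, each combined value lies between the smallest and largest base forecast for that step.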

    Augmented Judgments: The affordances of artificial intelligence in improving accuracy of new product launch decisions.

    The uncertainty surrounding new products in the market makes judging their success a complex endeavour. The extant literature does not clearly explain whether such uncertainty can be managed with the help of artificial intelligence (AI). In this paper, we aim to measure to what extent new product success judgments improve when information provided by an AI model is present. We conducted three pilot experiments to measure the effects of different amounts of information given by the AI. In the first experiment, participants were presented with the AI's predicted probability of success. In the second experiment, participants were presented with the AI's probability of success coupled with an explanation of how the AI reached its predictions based on variables of the product. In the third experiment, we measured participants' improvement in their own judgments after (not while) being exposed to the information provided by the AI. We use new wine products as the context for the experiments. Ground truth for success is based on a large database of historical product launches. Participants were recruited via a panel and exposed to new product launch scenarios in an online service. We found that judgments improved significantly (p-value: 0.011) when AI information was provided. We also found that participants improved significantly (p-value: 0.05) after receiving the AI stimulus. However, we did not find strong evidence that exposing participants to an explanation is better than exposing them to just a probability of success. With our pilot experiments, we also identified the required sample sizes and modifications to the experimental design needed to increase statistical power. Our findings contribute empirical evidence on the affordances of AI in improving new product success judgments, and on the effect of applying novel AI explainability techniques to real-world users. Further, our findings pave the way for further experimentation in human-AI interaction for augmenting new product judgments.

    Comparison and Evaluation of Methods for a Predict+Optimize Problem in Renewable Energy

    No full text
    Algorithms that involve both forecasting and optimization are at the core of solutions to many difficult real-world problems, such as supply chains (inventory optimization), traffic, and battery/load/production scheduling in sustainable energy systems as part of the transition towards carbon-free energy generation. Typically, in these scenarios we want to solve an optimization problem that depends on unknown future values, which therefore need to be forecast. As both forecasting and optimization are difficult problems in their own right, relatively little research has been done in this area. This paper presents the findings of the "IEEE-CIS Technical Challenge on Predict+Optimize for Renewable Energy Scheduling," held in 2021. We present a comparison and evaluation of the seven highest-ranked solutions in the competition, to provide researchers with a benchmark problem and to establish the state of the art for this benchmark, with the aim of fostering and facilitating research in this area. The competition used data from the Monash Microgrid, as well as weather data and energy market data. It focused on two main challenges: forecasting renewable energy production and demand, and obtaining an optimal schedule for the activities (lectures) and on-site batteries that leads to the lowest cost of energy. The most accurate forecasts were obtained by gradient-boosted tree and random forest models, and optimization was mostly performed using mixed integer linear and quadratic programming. The winning method predicted different scenarios and optimized over all scenarios jointly using a sample average approximation method.
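
The idea of sample average approximation mentioned in the abstract, choosing one decision that minimizes the average cost across several forecast scenarios, can be sketched on a toy problem. This is not the winning entry's formulation: the single-decision battery problem, the prices, the scenarios, and the grid search (instead of the mixed integer programming the abstract mentions) are all assumptions made for illustration.

```python
def cost(charge_kwh, demand_kwh, offpeak_price, peak_price):
    """Cost of pre-charging at the off-peak price and covering any
    remaining demand at the peak price (toy cost model, assumed)."""
    shortfall = max(demand_kwh - charge_kwh, 0.0)
    return offpeak_price * charge_kwh + peak_price * shortfall

def saa_best_charge(candidates, scenarios, offpeak_price, peak_price):
    """Pick the candidate decision with the lowest cost averaged
    over all demand scenarios (sample average approximation)."""
    def average_cost(charge_kwh):
        total = sum(
            cost(charge_kwh, d, offpeak_price, peak_price) for d in scenarios
        )
        return total / len(scenarios)

    best = min(candidates, key=average_cost)
    return best, average_cost(best)

# Hypothetical forecast scenarios for demand (kWh), e.g. drawn from a
# probabilistic forecast; values are invented for illustration.
scenarios = [2.0, 4.0, 6.0, 8.0]

# Candidate charge levels searched on a coarse grid (assumption).
candidates = [float(x) for x in range(0, 11)]

best_charge, best_cost = saa_best_charge(
    candidates, scenarios, offpeak_price=1.0, peak_price=3.0
)
print(best_charge, best_cost)
```

Optimizing against the scenario average rather than a single point forecast lets the decision account for the asymmetric penalty of under-charging in this toy setup.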

    European Covid-19 Forecast Hub

    No full text

    Predictive performance of multi-model ensemble forecasts of COVID-19 across European nations

    No full text