14 research outputs found
Statistical learning in complex and temporal data: distances, two-sample testing, clustering, classification and Big Data
Official Doctoral Programme in Statistics and Operations Research. 555V01
[Abstract]
This thesis deals with the problem of statistical learning in complex objects, with
emphasis on time series data. The problem is approached by introducing domain
knowledge of the underlying phenomena by means of distances and features.
A distance-based two-sample test is proposed, and its performance is studied under
a wide range of scenarios. Distances for time series classification and clustering are
also shown to increase statistical power when applied to two-sample testing. Our
test compares favorably to other methods thanks to its flexibility against different
alternatives. A new distance for time series is defined by considering an innovative
way of comparing lagged distributions of the series. This distance inherits the good
empirical performance of existing methods while removing some of their limitations.
A forecasting method based on time series features is proposed. The method works
by combining individual standard forecasting algorithms using a weighted average.
These weights come from a learning model fitted on a large training set. A distributed
classification algorithm is proposed, based on comparing, using a distance, the empirical
distribution functions between the dataset that each computing node receives and the
common test set.
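To illustrate how a distance between series can drive a two-sample test, the following is a minimal sketch only. It assumes an energy-type statistic and a permutation null, neither of which is confirmed by the abstract as the thesis's own construction. The function names, the Euclidean distance, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def energy_statistic(D, n):
    """Energy-type statistic from a pooled pairwise distance matrix D.
    The first n rows/columns belong to sample X, the rest to sample Y."""
    between = D[:n, n:].mean()
    within_x = D[:n, :n].mean()
    within_y = D[n:, n:].mean()
    return 2.0 * between - within_x - within_y

def permutation_two_sample_test(X, Y, dist, n_perm=999, seed=0):
    """Permutation p-value for H0: X and Y share the same distribution.
    `dist` is any user-supplied distance between two series (assumption:
    this is where domain knowledge would enter)."""
    rng = np.random.default_rng(seed)
    pooled = list(X) + list(Y)
    n, total = len(X), len(pooled)
    # Pairwise distances are computed once and reused for every permutation.
    D = np.array([[dist(a, b) for b in pooled] for a in pooled])
    observed = energy_statistic(D, n)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(total)
        if energy_statistic(D[np.ix_(idx, idx)], n) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

def euclidean(a, b):
    """Placeholder distance; a time-series-specific distance could replace it."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

# Simulated example: two groups of 30 series of length 50, second group shifted.
data_rng = np.random.default_rng(42)
X = data_rng.normal(0.0, 1.0, size=(30, 50))
Y = data_rng.normal(1.0, 1.0, size=(30, 50))
stat, p_value = permutation_two_sample_test(X, Y, euclidean, n_perm=199)
print(f"statistic={stat:.3f}, p-value={p_value:.3f}")
```

Swapping `euclidean` for a different distance changes which alternatives the test is sensitive to, which is one plausible reading of the flexibility the abstract mentions.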
Augmented Judgments: The affordances of artificial intelligence in improving accuracy of new product launch decisions.
The uncertainty surrounding a new product in the market makes judging its success a complex endeavour. The extant literature does not clearly explain whether such uncertainty can be managed with the help of artificial intelligence (AI). In our paper, we aim to measure the extent to which new product success judgments improve when information provided by an AI model is present. We conducted three pilot experiments to measure the effects of different amounts of information given by the AI. In the first experiment, participants were presented with the AI's predicted probability of success. In the second experiment, participants were presented with the AI's probability of success coupled with an explanation of how the AI reached its predictions based on variables of the product. In the third experiment, we measured participants' improvement in their own judgments after, rather than while, being exposed to the information provided by the AI. We use new wine products as the context for the experiments. Ground truth for success is based on a large database of historical product launches. Participants were recruited via a panel and exposed to new product launch scenarios in an online service. We found that the predicted judgments are significantly improved (p-value: 0.011) when AI information is provided. We also found that participants significantly improved (p-value: 0.05) after receiving the AI stimulus. However, we did not find strong evidence that exposing participants to an explanation is better than exposing them to just a probability of success. With our pilot experiments, we also identified the required sample sizes and the modifications to the experimental design needed to increase statistical power. Our findings contribute empirical evidence on the affordances of AI in improving new product success judgments, and on the effect of applying novel AI explainability techniques with real-world users.
Our findings also pave the way for further experimentation in human-AI interaction for augmenting new product judgments.
Distributed classification based on distances between probability distributions in feature space
Comparison and Evaluation of Methods for a Predict+Optimize Problem in Renewable Energy
Algorithms that involve both forecasting and optimization are at the core of solutions to many difficult real-world problems, such as supply chains (inventory optimization), traffic, and the transition towards carbon-free energy generation through battery/load/production scheduling in sustainable energy systems. Typically, in these scenarios we want to solve an optimization problem that depends on unknown future values, which therefore need to be forecast. As both forecasting and optimization are difficult problems in their own right, relatively little research has been done in this area. This paper presents the findings of the "IEEE-CIS Technical Challenge on Predict+Optimize for Renewable Energy Scheduling," held in 2021. We present a comparison and evaluation of the seven highest-ranked solutions in the competition, to provide researchers with a benchmark problem and to establish the state of the art for this benchmark, with the aim of fostering and facilitating research in this area. The competition used data from the Monash Microgrid, as well as weather data and energy market data. It focused on two main challenges: forecasting renewable energy production and demand, and obtaining an optimal schedule for the activities (lectures) and on-site batteries that leads to the lowest cost of energy. The most accurate forecasts were obtained by gradient-boosted tree and random forest models, and optimization was mostly performed using mixed integer linear and quadratic programming. The winning method predicted different scenarios and optimized over all scenarios jointly using a sample average approximation method.
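The idea of optimizing over all scenarios jointly can be shown on a toy problem. This sketch is not the winning entry's method, which the abstract places within a scheduling setting solved largely by mixed integer programming. It assumes a hypothetical single decision: how much energy `q` to buy ahead at a low price, with any shortfall bought later at a higher spot price. All names, prices, and scenario values are invented for illustration.

```python
import numpy as np

def saa_cost(q, scenarios, ahead_price, spot_price):
    """Average cost over scenarios of committing to quantity q in advance."""
    shortfall = np.maximum(scenarios - q, 0.0)
    return float(np.mean(ahead_price * q + spot_price * shortfall))

def solve_saa(scenarios, ahead_price, spot_price):
    """Minimize the sample-average cost over q >= 0.
    The objective is piecewise linear and convex in q with breakpoints at the
    scenario values, so checking q = 0 and each scenario value is sufficient."""
    candidates = np.concatenate(([0.0], np.sort(scenarios)))
    costs = [saa_cost(q, scenarios, ahead_price, spot_price) for q in candidates]
    best = int(np.argmin(costs))
    return float(candidates[best]), costs[best]

# Hypothetical forecast scenarios of net demand (arbitrary units).
scenarios = np.array([8.0, 10.0, 12.0, 14.0, 16.0])
q_star, cost_star = solve_saa(scenarios, ahead_price=1.0, spot_price=3.0)

# Baseline: plan against the mean forecast only, ignoring the spread.
mean_plan_cost = saa_cost(scenarios.mean(), scenarios, 1.0, 3.0)
print(q_star, cost_star, mean_plan_cost)
```

In this toy setting the scenario-based decision hedges against costly shortfalls and so commits to more than the mean forecast would suggest, which is the kind of benefit joint optimization over scenarios aims for.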