5 research outputs found

    Event Discovery and Classification in Space-Time Series: A Case Study for Storms

    Recent advances in sensor technology have enabled the deployment of wireless sensors for surveillance and monitoring of phenomena in diverse domains such as environment and health. Data generated by these sensors are typically high-dimensional and therefore difficult to analyze and comprehend. Additionally, high-level phenomena that humans commonly recognize, such as storms, fires, and traffic jams, are often complex and multivariate, and individual univariate sensors are incapable of detecting them. This thesis describes the Event Oriented approach, which addresses these challenges by providing a way to reduce the dimensionality of space-time series and a way to integrate multivariate data over space and/or time for the purpose of detecting and exploring high-level events. The proposed Event Oriented approach is implemented using space-time series data from the Gulf of Maine Ocean Observation System (GOMOOS). GOMOOS is a long-standing network of wireless sensors in the Gulf of Maine monitoring the high-energy ocean environment. As a case study, high-level storm events are detected and classified using the Event Oriented approach. A domain-independent ontology for detecting high-level composite events, called the General Composite Event Ontology, is presented and used as the basis of the Storm Event Ontology. Primitive events are detected from univariate sensors and assembled into Composite Storm Events using the Storm Event Ontology. To evaluate the effectiveness of the Event Oriented approach, the resulting candidate storm events are compared with the independent historical Storm Events Database from the National Climatic Data Center (NCDC); the comparison indicates that the Event Oriented approach detected about 92% of the storms recorded by the NCDC. The Event Oriented approach also facilitates classification of high-level composite events. In the case study, candidate storms were classified based on their spatial progression and profile. Since ontological knowledge is used for constructing high-level event ontologies, detection of candidate high-level events could help refine existing ontological knowledge about them. In summary, this thesis demonstrates the Event Oriented approach to reducing dimensionality in complex space-time series sensor data and its facility to integrate time series data over space for detecting high-level phenomena.

    Os modelos de exposição necessários à aquisição de publicidade no sector televisivo [The exposure models required for buying advertising in the television sector]

    Television advertising investment depends on the development of ratings models or on the identification of alternative methodological approaches for predicting television exposure. In this study, we evaluate the contribution of Symbolic Analysis and Data Mining to the construction of quantitative exposure models, which support media planning. According to the results, audience measurement databases contain information with considerable explanatory capacity for the evolution of ratings, which can reach 90%. However, the predictive potential of univariate and multivariate linear and non-linear Regression analyses is considerably lower, at most in the 70%-80% range. The methodologies tested were Neural Networks (MLP and RBF), Regression Trees (CART and CHAID), IBL, segmentation and clustering of time series, and local Regression models. The construction of explanatory models of “structural” television consumption behaviours showed that the panel has low to moderate audience duplication, that all loyalty behaviours are present, and that there is some tendency towards audience specialisation. The development of a structural explanatory model of television exposure demonstrates the multiple contexts of intentional and unintentional exposure and supports an alternative proposal for constructing exposure models, using symbolic methodologies; Sequential, Temporal and Multi-relational Data Mining; and Bayesian and non-linear Regression algorithms. This proposal is applicable in contexts where ratings data are more irregular or when new content is broadcast. For audience segments with strongly irregular exposure, we propose constructing association and sequence rules, which allow identification of the most appropriate commercial spots for conveying the advertising message, followed by the construction of Bayesian Networks and multi-relational Classification Rules to reduce the uncertainty of the results over a given period. When viewers have established television consumption habits, it may be sufficient to use Sequential Data Mining, Binomial Logistic models, or Bayes classification. In the context of sports broadcasts, where causal models are very difficult to construct, we must turn to Temporal Rules, which identify relevant information in the multivariate ratings time series and enable better negotiation with television stations.
