    n-Way metrics

    Multivariate analysis of psychological data

    A comparison of multi-way similarity coefficients for binary sequences

    Multivariate analysis of psychological data

    Unified quality measures for clusterings, layouts, and orderings of graphs, and their application as software design criteria

    How good is a given graph clustering, graph layout, or graph ordering -- specifically, how well does it group densely connected vertices and separate sparsely connected vertices? How good is a given software design -- specifically, how well does it minimize the interdependence of the subsystems? This work introduces and validates simple and uniform measures for these two properties. Together with existing optimization algorithms, the introduced measures enable the automatic computation of, e.g., communities in social networks and design flaws in software systems.

    The first part derives, validates, and unifies quality measures for graph clusterings, graph layouts, and graph orderings, with the following results:
    - Identical quality measures can be applied to clusterings, layouts, and orderings; this enables the computation of consistent clusterings, layouts, and orderings.
    - Diverse existing and new measures can be unified into a few general measures; this facilitates their comparison and validation.
    - Many existing measures are biased towards certain clusterings, layouts, or orderings, even for graphs without particularly dense or sparse subgraphs, and thus do not (only) measure quality in the above sense.
    - For example graphs, the minimization of new, unbiased (or weakly biased) measures reveals nonobvious groups, e.g. communities in social networks, subject areas in hypertexts, or closely interlocked countries in international trade.

    The second part derives, validates, and unifies dependency-based indicators of software design quality. It applies two quality measures for graph clusterings as measures for the coupling of software subsystems -- specifically for the coupling indicated by common changes and for the coupling indicated by references -- and shows:
    - The measures quantify the dependency-caused development costs, under well-defined simplifying assumptions.
    - The minimization of the measures conforms to existing dependency-related design principles (like locality of change, acyclicity of references, and stability of references), design rules, and design patterns.
    - For example software systems, the incremental minimization of the measures reveals nonobvious design flaws, like the distribution of coherent responsibilities over several subsystems, or references from low-level to high-level subsystems.

    In summary, this work shows that
    - simple measures can suffice to capture important aspects of graph clustering quality, graph layout quality, graph ordering quality, and software design quality, and
    - the optimization of simple measures can suffice to detect nonobvious and often useful structure in various real-world systems.

    Similarity coefficients for binary data : properties of coefficients, coefficient matrices, multi-way metrics and multivariate coefficients

    Similarity coefficients play an important role in data analysis. A similarity coefficient is a measure of resemblance or association between two entities or variables. Similarity coefficients for binary data are used, for example, in biological ecology for measuring the degree of coexistence of two species types over different locations, or in psychology for a 2×2 reliability study in which two observers classify a sample of subjects using a dichotomous response. When choosing a coefficient, a measure has to be considered in the context of the data-analytic study of which it is a part. Because there are so many similarity coefficients for binary data to choose from, it is important that the different coefficients and their properties are better understood. The dissertation takes a mathematical approach to the analysis of similarity coefficients for binary data. A variety of data-analytic properties are considered, and for various coefficients it is established whether or not they possess each property. Part I contains results on correction for chance and maximum value. Part II presents sufficient conditions for Robinson matrices and some mathematical properties of multiple correspondence analysis. In Part III various two-way notions are generalized to the multi-way case. Part IV contains formulations of multi-way coefficients.
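
    As an illustration of what a similarity coefficient for binary data looks like, the sketch below computes three well-known coefficients from the 2×2 table of two 0/1 sequences: the Jaccard coefficient, the simple matching coefficient, and Cohen's kappa as one chance-corrected coefficient. The function names are hypothetical, and the choice of these three coefficients is an assumption made for illustration; the abstract does not say which coefficients the dissertation analyses.

```python
def contingency(x, y):
    """Return the 2x2 table counts (a, b, c, d) for two equal-length 0/1 sequences.

    a: both 1, b: only x is 1, c: only y is 1, d: both 0.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return a, b, c, d


def jaccard(x, y):
    """Jaccard coefficient a / (a + b + c); joint absences (d) are ignored."""
    a, b, c, _ = contingency(x, y)
    # Assumed convention: two all-zero sequences are treated as identical.
    return a / (a + b + c) if (a + b + c) else 1.0


def simple_matching(x, y):
    """Simple matching coefficient (a + d) / n; joint absences count as agreement."""
    a, b, c, d = contingency(x, y)
    return (a + d) / (a + b + c + d)


def cohen_kappa(x, y):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    a, b, c, d = contingency(x, y)
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    x = [1, 1, 0, 0, 1]
    y = [1, 0, 0, 1, 1]
    print(jaccard(x, y), simple_matching(x, y), cohen_kappa(x, y))
```

    On the same pair of sequences the three coefficients generally take different values, because they treat joint absences and chance agreement differently. This is why the choice of coefficient depends on the study.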

    Events Recognition System for Water Treatment Works

    Supplying drinking water in sufficient quantity and of the required quality is a challenging task for water companies. Success depends largely on maintaining a continuously high level of water treatment quality at Water Treatment Works (WTWs). Processes at WTWs are therefore highly automated and controlled. Reliable and rapid detection of faulty sensor data and failure events in WTW processes is of prime importance for efficient and effective operation. For this reason, the vast majority of WTWs operated in the UK use event detection systems that automatically generate alarms when abnormal behaviour is detected in the observed signals, so that process failures are caught early. The event detection systems usually deployed at WTWs apply thresholds to the monitored signals to recognise faulty processes. The research described in this thesis investigates new methods for near real-time event detection at WTWs, applying statistical process control and machine learning techniques to the automated recognition of failure events in WTW processes. The resulting novel Hybrid CUSUM Event Recognition System (HC-ERS) makes use of new online sensor data validation and pre-processing techniques and utilises two distinct detection methodologies: the first for fault detection on individual signals and the second for the recognition of faulty processes and events at WTWs. The fault detection methodology automatically detects abnormal behaviour of observed water quality parameters in near real-time, using data from the corresponding sensors that has been validated and pre-processed online. It uses CUSUM control charts to predict the presence of faults by tracking the variation of each signal individually and identifying abnormal shifts in its mean. The basic CUSUM methodology was refined by investigating optimised interdependent parameters for each signal individually.
    The combined predictions of CUSUM fault detection on individual signals serve as the basis for the second methodology, event detection. This methodology automatically identifies faulty processes, that is, failure events at WTWs, in near real-time, using the faults previously detected by CUSUM on individual signals. It applies Random Forest classifiers to predict the presence of an event in WTW processes. All methods were developed to be generic and to generalise well across different drinking water treatment processes at WTWs. HC-ERS proved effective in detecting failure events at WTWs, as demonstrated by its application to real water quality signal data with historical events from a UK WTW. The methodology achieved a peak F1 value of 0.84 and generated 0.3 false alarms per week. These results demonstrate the ability of the method to detect failure events in WTW processes automatically and reliably in near real-time, and show promise for practical application of HC-ERS in industry. The combination of both methodologies presents a unique contribution to the field of near real-time event detection at WTWs.
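
    To make the idea of tracking shifts in a signal's mean concrete, the sketch below implements a textbook two-sided tabular CUSUM. It is not the HC-ERS implementation: the function name and the `target`, `k` (allowance), and `h` (decision threshold) parameters are assumptions, and the thesis's sensor validation, per-signal parameter optimisation, and Random Forest stage are not reproduced.

```python
def cusum_alarms(signal, target, k, h):
    """Two-sided tabular CUSUM on a sequence of readings.

    target: assumed in-control mean of the signal (hypothetical parameter)
    k:      allowance; deviations smaller than k are ignored
    h:      decision threshold; an alarm is raised when a sum exceeds h
    Returns the list of indices at which an alarm is raised.
    """
    upper = 0.0  # accumulates evidence of an upward shift in the mean
    lower = 0.0  # accumulates evidence of a downward shift in the mean
    alarms = []
    for i, x in enumerate(signal):
        upper = max(0.0, upper + (x - target - k))
        lower = max(0.0, lower + (target - x - k))
        if upper > h or lower > h:
            alarms.append(i)
            # Restart both sums after an alarm (one common convention).
            upper = 0.0
            lower = 0.0
    return alarms


if __name__ == "__main__":
    # Synthetic readings: the mean steps from 10.0 to 12.0 at index 5.
    readings = [10.0] * 5 + [12.0] * 5
    print(cusum_alarms(readings, target=10.0, k=0.5, h=4.0))
```

    Unlike a fixed threshold on the raw signal, the cumulative sums react to small but persistent deviations. The allowance `k` and the threshold `h` trade detection delay against false alarms and would have to be tuned for each signal.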

    Criminal data analysis based on low rank sparse representation

    Finding effective clustering methods for high-dimensional datasets is challenging because of the curse of dimensionality. Most basic, common algorithms fail in high-dimensional spaces when faced with problems such as a large number of groups and overlapping clusters. Most domains use a number of parameters to describe the appearance, geometry, and dynamics of a scene, which has motivated several techniques for finding a low-dimensional representation of high-dimensional data. Many proposed methods fail to overcome these challenges, especially when the input data is high-dimensional and the clusters have a complex structure. In high-dimensional data, many of the dimensions are often irrelevant and can hide existing clusters in noise. High-dimensional data often reside on low-dimensional subspaces. The task of subspace clustering algorithms is to uncover relationships among objects that are related in different subsets of the dimensions. State-of-the-art methods for subspace segmentation include Low Rank Representation (LRR) and Sparse Representation (SR). The former seeks the global lowest-rank representation but restrictively assumes independence among subspaces, whereas the latter seeks the clustering of disjoint or overlapping subspaces through a locality measure, which, however, fails in the presence of large noise. This thesis aims to identify the key problems and obstacles that have challenged researchers in recent years in clustering high-dimensional data, and then to implement an effective subspace clustering method for high-dimensional crime data, covering both real events and synthetic data with a complex structure and 168 different offence types. It also aims to overcome the disadvantages of existing subspace clustering techniques.
    To this end, a Low-Rank Sparse Representation (LRSR) theory, referred to hereafter as Criminal Data Analysis Based on LRSR, is examined and then used to recover and segment embedded subspaces. The results are discussed and compared with those of previously examined approaches such as K-means and K-means segmentation after PCA. These earlier approaches helped in choosing the right subspace clustering method. The proposed method is based on the subspace segmentation method named Low Rank subspace Sparse Representation (LRSR), which not only recovers the low-rank subspaces but also obtains a relatively sparse segmentation with respect to disjoint or even overlapping subspaces. Both the UCI Machine Learning Repository and the crime database are used to find and compare the subspace clustering algorithms best suited to high-dimensional data. Several open-source machine learning frameworks and tools were used for preparing, transforming, clustering, and visualizing the high-dimensional crime dataset, in particular scikit-learn, a machine learning library for the Python programming language, as well as R and Matlab in earlier experiments.

    Customer equity drivers and emotions on Algarve 5-star hotel clients' satisfaction and loyalty

    The tourism and hotel industries are critical drivers for Portugal's economy, and particularly for that of the Algarve. Demand in the hotel industry depends not only on macroeconomic variables of the tourists' countries of origin but also on customer behavior issues such as Satisfaction, Loyalty, Emotions, and Customer Equity drivers, which are significantly related. All these items are essential for customer decision making. Understanding their relations can therefore be useful for academia and, through knowledge transfer, for the industry. This research aimed to clarify the relationships of Customer Equity Drivers and Emotions with the Satisfaction and Loyalty of five-star hotel clients from the Algarve's predominant tourist nationalities, contributing to a more integrative conceptual model. For this purpose, the perspectives of two leading hotel brands in the Algarve were compared with the perspective of their clients. Questionnaires were administered to five-star hotel clients of the two brands who stayed overnight in the Algarve region during July, August, and September of 2019. A sample of 133 respondents with valid answers from the predominant tourist nationalities was obtained. The management of the five-star hotels answered the questionnaires based on their own data and perception, for comparison with their clients' perspectives. In addition, the emotions of tourists about the Algarve region were studied. The analysis followed an exploratory approach, using three-way data analysis supported by Multiple Factor Analysis (MFA), developed by Escofier and Pagès (1985). The MFA results confirmed stability between the dimensions constructed from the two hotel brands' data and the clients' data. As expected, an opposition was identified between negative emotions and all other model items. Nonetheless, there was a more evident linkage of the positive emotions joy and happiness with overall satisfaction and perception of brand ethics.
    Another highlighted linkage was between the positive emotion enthusiasm, service/product quality, and attitudinal loyalty. The results showed that hotel brand one is at variance with the other two perspectives. This difference was mostly related to clients of Portuguese nationality. The assumptions of the proposed conceptual model were aligned with the research results.
    Tourism is one of the main forms of development of each region. Tourism, and the hotel sector in particular, are critical factors for Portugal and especially for the economy of the Algarve. In this context, accommodation is an important part of the tourism industry and is significant for the development of tourist destinations. Among the different types of accommodation, hotels are the most typified sector and are positioned as the main segment in most destinations. Hotels have to compete globally to attract tourists and, in a reality that is naturally dynamic, changes in clients' preferences, requirements, and expectations make evident the need for constant research into the clients' reality or realities. Demand in the hotel industry depends not only on macroeconomic variables of the tourists' countries of origin but also on other issues and concepts associated with client behavior, such as satisfaction, loyalty, emotions, and customer equity, which are significantly related. All these items are essential for the client's decision-making process, and understanding their relations is a contribution of interest to academia and, through knowledge transfer, to the industry, since it can provide relevant information to support hotels in their client-related activities.
    Previous research has shown that client satisfaction is a key factor in the success of any business, and that this satisfaction leads to client support and loyalty, to positive word-of-mouth, to client retention, and to a lower cost of acquiring new clients. Clients' recognition of quality, closely related to satisfaction levels, is possible not only in luxury services but in all services that respond to what the client is looking for. It should be noted, however, that accommodation classifications themselves, in which more stars are awarded according to category, can be considered indicators of quality. In this sense, and in line with the results of previous research, clients of higher-category hotels are more demanding. Western clients in particular are more demanding, less loyal, and more receptive to marketing initiatives. Portugal, as a destination predominantly of Western tourists, must therefore make efforts to attract and retain more visitors. To that end, offering products or services suited to client satisfaction, and monitoring and controlling customer equity and loyalty, are the best ways of achieving it. A review of the history and the theoretical literature shows that customer equity is a stimulus for client satisfaction. The customer equity approach is a competitive marketing strategy based on the value of the resources that clients invest in specific organizations. Emotions, in turn, play an important role in clients' purchasing process, and positive feelings can lead to higher levels of satisfaction and consequently to destination loyalty.
    In this context, researchers argue that positive feelings can positively influence consumers' perceptions of service quality and the way they make decisions. Previous studies have revealed significant differences in behavioral characteristics based on nationality and on the specific concept of luxury products in different cultures. The main purpose of the present research was to clarify the relationships between customer equity drivers, emotions, satisfaction, and loyalty of five-star hotel clients, taking into account the predominant tourist nationalities in the Algarve region, contributing to a more integrative conceptual model. To this end, the perspectives of two famous five-star hotel brands in the Algarve region were compared with the perspective of their own clients with regard to Customer Equity (10 items), Emotions (9 items), Satisfaction (4 items), and Loyalty (4 items). The management of the five-star hotels answered the questionnaires based on their own data and perception, for comparison with their clients' perspective, also for different tourist nationalities. In addition, tourists' emotions about the Algarve region were studied using 20 items of Negative and Positive Emotions. The questionnaires were administered to five-star hotel clients of the two famous brands in July, August, and September 2019 in the Algarve region. Valid answers were obtained from 133 respondents of the predominant tourist nationalities. The management of the two five-star hotels answered the questionnaires based on their data and their perception of clients of the various nationalities. The configuration of the collected data led to the construction of three-dimensional data structures that required three-way multivariate analyses.
    The research followed a predominantly exploratory approach, using three-way data analysis supported by Multiple Factor Analysis (MFA), developed by Escofier and Pagès (1985). The MFA results confirmed stability between the dimensions constructed from the two hotel brands' data and the clients' data. As expected, an opposition was identified between negative emotions and all other model items. Nonetheless, there was a more evident linkage of the positive emotions joy and happiness with overall satisfaction and perception of brand ethics. Another highlighted linkage was between the positive emotion enthusiasm, service/product quality, and attitudinal loyalty. The results showed that one of the hotel brands diverges from the other two perspectives. This difference is mainly related to the evaluation by clients of Portuguese nationality. The assumptions of the proposed conceptual model are aligned with the research results. It was additionally found that the Algarve region is in an ideal situation for tourists with respect to its position on the spectrum of negative and positive emotions. The positive items Sympathy / Interest / Compassion, and especially the items related to the emotions Admiration / Wonder / Amazement, are less relevant to the other positive items and were experienced less than the other positive emotions. The methodological approach proved adequate for understanding less evident relations by evaluating the same observations across different sets of variables from different perspectives. Finally, the research yielded suggestions for hotel management decision processes to improve their performance and how clients perceive them.

