
    On the Nature and Types of Anomalies: A Review

    Anomalies are occurrences in a dataset that are in some way unusual and do not fit the general patterns. The concept of the anomaly is generally ill-defined and perceived as vague and domain-dependent. Moreover, despite some 250 years of publications on the topic, no comprehensive and concrete overviews of the different types of anomalies have hitherto been published. By means of an extensive literature review, this study therefore offers the first theoretically principled and domain-independent typology of data anomalies, and presents a full overview of anomaly types and subtypes. To concretely define the concept of the anomaly and its different manifestations, the typology employs five dimensions: data type, cardinality of relationship, anomaly level, data structure and data distribution. These fundamental and data-centric dimensions naturally yield 3 broad groups, 9 basic types and 61 subtypes of anomalies. The typology facilitates the evaluation of the functional capabilities of anomaly detection algorithms, contributes to explainable data science, and provides insights into relevant topics such as local versus global anomalies.

    Comment: 38 pages (30 pages content), 10 figures, 3 tables. Preprint; review comments will be appreciated. Improvements in version 2: explicit mention of the fifth anomaly dimension; added section on explainable anomaly detection; added section on variations on the anomaly concept; various minor additions and improvements.
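The five dimensions of the typology can be thought of as coordinates that jointly identify an anomaly type. A minimal sketch, with hypothetical dimension values that are illustrative only and not the paper's own enumeration:

```python
# Illustrative sketch: the typology's five dimensions encoded as a small data
# structure. The example values below are assumptions for illustration, not
# the paper's exhaustive categories.
from dataclasses import dataclass

@dataclass(frozen=True)
class AnomalyType:
    data_type: str      # e.g. "numeric", "categorical", "mixed"
    cardinality: str    # cardinality of relationship, e.g. "univariate", "multivariate"
    level: str          # anomaly level, e.g. "atomic", "aggregate"
    structure: str      # data structure, e.g. "independent", "dependent"
    distribution: str   # data distribution, e.g. "unimodal", "multimodal"

# A classic point anomaly in a single numeric attribute might then be described as:
point_anomaly = AnomalyType("numeric", "univariate", "atomic", "independent", "unimodal")
print(point_anomaly.cardinality)
```

Enumerating the cross-product of such dimension values is one way the typology can yield a fixed set of groups, types and subtypes.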

    Advanced Methods in Business Process Deviance Mining

    Business process deviance refers to the phenomenon whereby a subset of the executions of a business process deviate, in a negative or positive way, with respect to its expected or desirable outcomes. Deviant executions of a business process include those that violate compliance rules, or executions that undershoot or exceed performance targets. Deviance mining is concerned with uncovering the reasons for deviant executions by analyzing business process event logs. In this thesis, the problem of explaining deviations in business processes is first investigated by using features based on sequential and declarative patterns, and a combination of them. The explanations are further improved by leveraging the data payload of events and traces in event logs through features based on pure data attribute values and data-aware Declare constraints. The explanations characterizing the deviances are then extracted by direct and indirect methods for rule induction. Using synthetic and real-life logs from multiple domains, a range of feature types and different forms of decision rules are evaluated in terms of their ability to accurately discriminate between non-deviant and deviant executions of a process, as well as in terms of the final outcome returned to the users.
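The sequential-pattern feature idea above can be sketched very simply: count occurrences of a candidate activity pattern in each trace, then induce a threshold rule over the counts. The event log, activity names and the rule below are hypothetical toy examples, not taken from the thesis:

```python
# Minimal sketch of sequence-based feature extraction for deviance mining on a
# toy event log; all activity names and the induced rule are assumptions.
def pattern_count(trace, pattern):
    """Count non-overlapping occurrences of a contiguous activity pattern."""
    count, i = 0, 0
    while i <= len(trace) - len(pattern):
        if trace[i:i + len(pattern)] == pattern:
            count += 1
            i += len(pattern)
        else:
            i += 1
    return count

# Toy event log: each trace is a sequence of activities plus a deviance label.
log = [
    (["register", "check", "approve", "notify"], False),
    (["register", "check", "check", "reject", "check", "reject"], True),
    (["register", "approve", "notify"], False),
]

# One feature per trace: how often the pattern <check, reject> occurs.
features = [(pattern_count(trace, ["check", "reject"]), deviant)
            for trace, deviant in log]

# A rule-induction step could then learn a rule such as:
#   IF count(<check, reject>) >= 1 THEN deviant
predictions = [count >= 1 for count, _ in features]
print(predictions)  # matches the labels on this toy log
```

Real deviance mining would generate many such candidate patterns and let a rule learner select the discriminative ones.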

    Rigorous, transparent, and eye-catching: Exploring the universalistic parameters of impactful theory building in management

    In the management discipline, scholarly impact is most commonly measured from a researcher perspective, by counting the number of times a particular article is mentioned in the references section of other articles (Aguinis, Shapiro, Antonacopoulou, and Cummings, 2014). This approach conceptualizes scholarly impact using a measurable indicator: the citation count an article receives. Several studies have been conducted to examine what drives scholarly impact in the field of management. The originality of the idea, the rigor of the study, and the clarity of writing have been identified as the most significant universalistic parameters of scholarly impact (Judge, Colbert, Cable, and Rynes, 2007). This dissertation sets out to examine these parameters in detail. The six articles included in the thesis do so in two ways: either by offering recommendations for improving these universalistic parameters of scholarly impact or by further exploring the relationship between the universalistic parameters and scholarly impact. Our first empirical article, relayed here in Chapter II, focuses on case studies and analyzes the methodological rigor of all case studies published during the period 1996-2006. We point out different types of replication logic and illustrate how their individual research actions have differential effects on the internal and external validity (in that order of priority) of the emerging theory. Chapter III follows up on the previous chapter, extending the investigation to quantitative as well as qualitative research, and offers replication logic as a tool for analyzing deviant cases identified during the course of a qualitative or quantitative study. We call this technique the 'Deviant Case Method' ('DCM'). Through this study, we explain the theoretical consequentiality (Aguinis et al., 2013; Cortina, 2002) of analyzing three different kinds of outliers (construct, model fit, and prediction outliers/deviant cases) and offer DCM for analyzing prediction outliers/deviant cases. In Chapter IV, we extend this method to medium-N studies. Here we focus on inconsistent or deviant cases which turn up during a fuzzy-set Qualitative Comparative Analysis (fsQCA). We offer a method called 'Comparative Outlier Analysis' ('COA'), which combines DCM and Mill's canons (1875) to examine this multitude of inconsistent cases. We explicate this using exemplars from fields like politics, marketing, and education. Unlike in other disciplines or methods, it is far from clear what the label 'transparent research procedures' constitutes in management field studies, with adverse effects during write-up, revision, and even after publication. To rectify this, in Chapter V, we review field studies across seven major management journals (1997-2006) in order to develop a transparency index, and link it to article impact. Chapter VI is a sequel to the previous chapter. We propose a new method for assessing the methodological rigor of grounded theory procedures ex post, using an audit trail perspective. While existing research on the methodological sophistication of grounded theory was typically done from the perspective of the author or producer of the research, our perspective is customer-centric, both in terms of the end customer (i.e. the reader or other authors) and the intermediate customer (i.e. reviewers and editors). The last empirical article in the thesis, Chapter VII, focuses on yet another parameter influencing impact: the style of academic writing. Specifically, we focus on the attributes of article titles and their subsequent influence on citation counts. At this early stage of theory development on article titles, we do this in the specific application context of management science. We conclude with Chapter VIII, where we sum up the findings and implications of all preceding studies and put forth suggestions for future research.
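Of the three outlier kinds the abstract distinguishes, prediction outliers are the most mechanical to illustrate: cases whose residual from a fitted model is unusually large. A minimal sketch, with made-up data and an illustrative cutoff (the thesis's actual procedure is richer than this):

```python
# Hedged sketch of the "prediction outlier" idea: fit an ordinary-least-squares
# line and flag cases whose standardized residual exceeds a cutoff. The data
# and the cutoff of 1.5 are illustrative assumptions only.
def ols(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b  # intercept, slope

xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 8.0, 30.0, 12.1]   # the fifth case breaks the trend

a, b = ols(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
mean_r = sum(residuals) / len(residuals)
sd = (sum((r - mean_r) ** 2 for r in residuals) / (len(residuals) - 1)) ** 0.5
deviant_cases = [i for i, r in enumerate(residuals) if abs(r - mean_r) / sd > 1.5]
print(deviant_cases)  # index of the case that deviates from the fitted line
```

In DCM terms, such flagged cases are not discarded but analyzed further as potential sources of theory refinement.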

    Role based behavior analysis

    Master's thesis, Segurança Informática, Universidade de Lisboa, Faculdade de Ciências, 2009. In our days, the success of a corporation hinges on its agility and ability to adapt to fast-changing conditions. Proactive workers and an agile IT/IS infrastructure that can support them are requirements for this success. Unfortunately, this is not always the case. The users' network requirements may not be fully understood, which slows down relocation and reorganization. Also, if there is no grasp on the real requirements, the IT/IS infrastructure may not be efficiently used, with waste in some areas and deficiencies in others. Finally, enabling proactivity does not mean full unrestricted access, since this may leave the systems vulnerable to outsider and insider threats. The purpose of the work described in this thesis is to develop a system that can characterize user network behavior. We propose a modular system architecture to extract information from tagged network flows. The process begins by creating user profiles from the users' network-flow information. Then, similar profiles are automatically grouped into clusters, creating role profiles. Finally, the individual profiles are compared against the roles, and the ones that differ significantly are flagged as anomalies for further inspection. Considering this architecture, we propose a model to describe user and role network behavior. We also propose visualization methods to quickly inspect all the information contained in the model. The system and model were evaluated using a real dataset from a large telecommunications operator. The results confirm that the roles accurately capture similar behavior. The anomaly results were also as expected, considering the underlying population. With the knowledge that the system can extract from the raw data, the users' network needs can be better fulfilled and anomalous users can be flagged for inspection, giving an edge in agility to any company that uses the system.
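The pipeline described above (profiles, role centroids, distance-based flagging) can be sketched in a few lines. The users, feature vectors, role centroids and threshold below are all hypothetical; the thesis works on real network-flow features:

```python
# Minimal sketch of role-based anomaly flagging; all names, vectors and the
# threshold are illustrative assumptions, not the thesis's actual data.
def distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

# User profiles built from network-flow information (toy 2-D feature vectors).
profiles = {
    "alice": [10.0, 1.0], "bob": [11.0, 1.2], "carol": [9.5, 0.9],  # one role
    "dave": [1.0, 20.0], "erin": [1.2, 19.0],                        # another role
    "mallory": [10.0, 20.0],                                         # fits neither role
}

# Assume role profiles were already obtained by clustering; given directly here.
roles = {
    "office": [10.17, 1.03],   # centroid of alice/bob/carol
    "server": [1.1, 19.5],     # centroid of dave/erin
}

# Flag users whose profile is far from every role centroid.
threshold = 5.0
anomalies = [u for u, p in profiles.items()
             if min(distance(p, c) for c in roles.values()) > threshold]
print(anomalies)
```

The flagged users are then candidates for the detailed manual inspection the thesis describes, not automatic verdicts.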

    Data-Aware Declarative Process Mining with SAT

    Process Mining is a family of techniques for analyzing business process execution data recorded in event logs. Process models can be obtained as the output of automated process discovery techniques or used as the input to techniques for conformance checking or model enhancement. In Declarative Process Mining, process models are represented as sets of temporal constraints (instead of procedural descriptions where all control-flow details are explicitly modeled). An open research direction in Declarative Process Mining is whether multi-perspective specifications can be supported, i.e., specifications that describe the process behavior not only from the control-flow point of view, but also from other perspectives such as data or time. In this paper, we address this question by considering SAT (the Propositional Satisfiability Problem) as a solving technology for a number of classical problems in Declarative Process Mining, namely log generation, conformance checking and temporal query checking. To do so, we first express each problem as a suitable FO (First-Order) theory whose bounded models represent solutions to the problem, and then find a bounded model of such a theory by compilation into SAT.
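The paper encodes such temporal constraints into first-order theories solved via SAT; as a much simpler illustration of what a declarative constraint means, here is a direct check of one common Declare template, response(a, b) (every occurrence of activity a must eventually be followed by b), on a single trace. Activity names are hypothetical:

```python
# Direct evaluation of the Declare constraint response(a, b) on one trace.
# This is a plain interpreter sketch, NOT the paper's SAT encoding.
def response_holds(trace, a, b):
    pending = False          # is there an unanswered occurrence of a?
    for activity in trace:
        if activity == a:
            pending = True
        elif activity == b:
            pending = False  # the latest a (if any) is now answered
    return not pending

print(response_holds(["a", "c", "b"], "a", "b"))       # a is answered by b
print(response_holds(["a", "b", "a", "c"], "a", "b"))  # last a never answered
```

Conformance checking over a whole log is then a matter of evaluating every constraint of the model against every trace; the SAT route becomes valuable for generation and query-checking tasks, where traces must be searched for rather than merely checked.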

    Automated novelty detection in the WISE survey with one-class support vector machines

    Wide-angle photometric surveys of previously uncharted sky areas or wavelength regimes will always bring in unexpected sources whose existence and properties cannot be easily predicted from earlier observations: novelties or even anomalies. Such objects can be efficiently sought with novelty detection algorithms. Here we present an application of one such method, one-class support vector machines (OCSVM), to search for anomalous patterns among sources preselected from the mid-infrared AllWISE catalogue covering the whole sky. To create a model of the expected data, we train the algorithm on a set of objects with spectroscopic identifications from the SDSS DR13 database that are also present in AllWISE. OCSVM detects as anomalous those sources whose patterns - WISE photometric measurements in this case - are inconsistent with the model. Among the detected anomalies we find artefacts, such as objects with spurious photometry due to blending, but most importantly also real sources of genuine astrophysical interest. Among the latter, OCSVM has identified a sample of heavily reddened AGN/quasar candidates distributed uniformly over the sky and in large part absent from other WISE-based AGN catalogues. It also allowed us to find a specific group of sources of mixed types, mostly stars and compact galaxies. By combining the semi-supervised OCSVM algorithm with standard classification methods, it will be possible to improve the latter by accounting for sources which are not present in the training sample but are otherwise well represented in the target set. Anomaly detection adds flexibility to automated source separation procedures and helps verify the reliability and representativeness of the training samples. It should thus be considered an essential step in supervised classification schemes to ensure the completeness and purity of produced catalogues.

    Comment: 14 pages, 15 figures.
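The OCSVM step can be sketched with scikit-learn's OneClassSVM. The synthetic 2-D data below stands in for the paper's actual features, which are WISE photometric measurements; the nu and gamma settings are illustrative, not the paper's tuned values:

```python
# Minimal OCSVM novelty-detection sketch on synthetic data (an assumption:
# real features would be WISE photometry, not these toy Gaussians).
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # "known" sources

# nu upper-bounds the fraction of training points treated as outliers.
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(train)

# Predict on new sources: +1 = consistent with the model, -1 = anomalous.
candidates = np.array([[0.1, -0.2],    # looks like the training data
                       [8.0, 8.0]])    # far outside the learned boundary
print(model.predict(candidates))
```

Sources predicted as -1 would then be inspected: some will be photometric artefacts, others genuinely novel objects, exactly the split the paper reports.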

    The development of intelligent hypermedia courseware, for design and technology in the English National Curriculum at Key Stage 3, by the sequential combination of cognition clusters, supported by system intelligence, derived from a dynamic user model

    The purpose of this research was to develop an alternative to traditional textbooks for the teaching of electronics, within Design and Technology at Key Stage 3, in the English National Curriculum. The proposed alternative of intelligent hypermedia courseware was investigated in terms of its potential to support pupil procedural autonomy in task directed, goal oriented, design projects. Three principal design criteria were applied to the development of this courseware: the situation in which it is to be used; the task that it is to support; and the pedagogy that it will reflect and support. The discussion and satisfaction of these design criteria led towards a new paradigm for the development of intelligent hypermedia courseware, i.e. the sequential combination of cognition clusters, supported by system intelligence, derived from a dynamic user model. A courseware prototype was instantiated using this development paradigm and subsequently evaluated in three schools. An illuminative evaluation method was developed to investigate the consequences of using this courseware prototype. This evaluation method was based on longitudinal case studies where cycles of observation, further inquiry and explanation are undertaken. As a consequence of following this longitudinal method, where participants chose to adopt the courseware after the first trial, the relatability of outcomes increased as subsequent cycles were completed. Qualitative data was obtained from semi-structured interviews with participating teachers. This data was triangulated against quantitative data obtained from the completed dynamic user models generated by pupils using the courseware prototype. These data were used to generate hypotheses, in the form of critical processes, by the identification of significant features, concomitant features and recurring concomitants from the courseware trials. Four relatable critical processes are described that operate when this courseware prototype is used. 
These critical processes relate to: the number of computers available; the physical environment where the work takes place; the pedagogical features of a task-type match, a design-brief frame match and a preferred teaching approach match; and the levels of heuristic interaction with the courseware prototype.
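The "dynamic user model" driving the courseware's system intelligence can be pictured as a per-pupil record of attempts over the cognition clusters, used to choose the next cluster. The cluster names and mastery rule below are hypothetical illustrations, not taken from the thesis:

```python
# Highly simplified sketch of a dynamic user model for sequencing cognition
# clusters; cluster names and the mastery threshold are assumptions.
cognition_clusters = ["symbols", "series circuits", "parallel circuits", "transistors"]

class UserModel:
    def __init__(self):
        self.scores = {c: [] for c in cognition_clusters}

    def record(self, cluster, score):
        """Log a pupil's score (0.0-1.0) on a cluster activity."""
        self.scores[cluster].append(score)

    def next_cluster(self, mastery=0.7):
        """Recommend the first cluster the pupil has not yet mastered."""
        for c in cognition_clusters:
            attempts = self.scores[c]
            if not attempts or max(attempts) < mastery:
                return c
        return None  # all clusters mastered

model = UserModel()
model.record("symbols", 0.9)
model.record("series circuits", 0.5)
print(model.next_cluster())  # the attempted-but-unmastered cluster
```

In the actual courseware the accumulated model also served as quantitative evaluation data, triangulated against the teacher interviews.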