    New Fundamental Technologies in Data Mining

    The progress of data mining technology and its wide public popularity create the need for a comprehensive text on the subject. The book series entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. Besides helping readers understand each section in depth, the two books offer useful hints and strategies for solving the problems covered in the following chapters. The contributing authors highlight many future research directions that will foster multidisciplinary collaboration and hence lead to significant developments in the field of data mining.

    Financial Methods for Online Advertising

    Online advertising, a form of advertising that reaches consumers through the World Wide Web, has become a multi-billion-dollar industry. Built on state-of-the-art computing technologies, online auctions have become an important sales mechanism for automating transactions in online advertising markets, where advertisement (ad) inventories, such as impressions or clicks, can be auctioned off within milliseconds of being generated by online users. However, because they provide only non-guaranteed delivery, current auction mechanisms have a number of limitations, including uncertainty in the winning payment prices for buyers, volatility in the seller's revenue, and weak loyalty between buyer and seller. To address these issues, this thesis explores methods and techniques from finance to evaluate and allocate ad inventories over time and to design new sales models. Finance, as a sub-field of microeconomics, studies how individuals and organisations make decisions about allocating resources over time and about handling risk. We therefore believe that financial methods can provide novel solutions to the non-guaranteed delivery problem in online advertising. This thesis makes three major contributions. We first study an optimal dynamic model that unifies programmatic guarantee and real-time bidding in display advertising; this study solves the problem of algorithmic pricing and allocation of guaranteed contracts. We then propose a multi-keyword multi-click ad option. This work discusses a flexible way of providing guaranteed delivery in the sponsored search context; its valuation is derived under the no-arbitrage principle and assumes that the underlying winning payment prices of candidate keywords for specific positions follow a geometric Brownian motion. However, according to our data analysis and previous research, this underlying assumption does not hold empirically for display ads.
We therefore study a lattice framework to price an ad option based on a stochastic volatility underlying model. This research extends the use of ad options to display advertising in a more general setting.
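
To give a feel for what "pricing an ad option on a lattice" involves, here is a minimal sketch, not the thesis's model. It assumes a deliberately simplified option: the right, at expiry, to buy one click at a fixed strike price instead of paying the then-current winning payment price. It also uses a Cox-Ross-Rubinstein binomial lattice with constant volatility, which corresponds to the geometric Brownian motion assumption mentioned above, not the stochastic volatility model the thesis adopts for display ads. The function name, parameters and numbers are hypothetical.

```python
import math


def crr_call_price(s0: float, strike: float, rate: float, sigma: float,
                   maturity: float, steps: int) -> float:
    """Value a European call-style option on a Cox-Ross-Rubinstein lattice.

    Hypothetical ad-option reading: s0 is today's winning payment price
    (e.g. cost per click), strike is the fixed price the buyer may pay at
    expiry, and sigma is a constant volatility (GBM assumption, NOT the
    stochastic volatility model discussed in the abstract).
    """
    dt = maturity / steps
    up = math.exp(sigma * math.sqrt(dt))   # up-move factor per step
    down = 1.0 / up                        # down-move factor (recombining tree)
    growth = math.exp(rate * dt)           # one-step risk-free growth
    p = (growth - down) / (up - down)      # risk-neutral up probability
    if not 0.0 < p < 1.0:
        raise ValueError("lattice admits arbitrage; use more steps or check inputs")

    # Option payoffs at the terminal nodes (j = number of up-moves).
    values = [max(s0 * up**j * down**(steps - j) - strike, 0.0)
              for j in range(steps + 1)]

    # Backward induction: each node holds the discounted risk-neutral
    # expectation of its two successor nodes.
    for _ in range(steps):
        values = [(p * values[j + 1] + (1.0 - p) * values[j]) / growth
                  for j in range(len(values) - 1)]
    return values[0]


# Hypothetical example: a click currently costs 1.20, the option lets the
# buyer pay 1.00 per click in 30 days; zero interest, 50% annual volatility.
price = crr_call_price(s0=1.20, strike=1.00, rate=0.0, sigma=0.5,
                       maturity=30 / 365, steps=200)
```

A stochastic volatility lattice, as used in the thesis, would let the volatility itself evolve across nodes, which changes how the tree is built. This sketch keeps sigma fixed only to show the backward-induction idea.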

    Integration of Data Mining into Scientific Data Analysis Processes

    In recent years, advanced semi-interactive data analysis algorithms, such as those from the field of data mining, have gained increasing importance in the life sciences in general, and in bioinformatics, genetics, medicine and biodiversity research in particular. Today, there is a trend away from collecting and evaluating data only in the context of a specific problem or study, toward extensively collecting data from different sources in repositories where they are potentially useful for subsequent analysis, e.g. the Gene Expression Omnibus (GEO) repository of high-throughput gene expression data. At the time the data are collected, they are analysed in a specific context that influences the experimental design. However, the types of analyses the data will be used for after they have been deposited are not known. Content and data format are focused only on the first experiment, not on future re-use. Thus, complex process chains are needed to analyse the data, and these chains must be supported by the environments used to set up analysis solutions. Building specialized software for each individual problem is not a solution, as this effort can only be carried out for huge projects running for several years. Hence, data mining functionality has been developed into toolkits that provide it as a collection of different components. Depending on the users' research questions, solutions consist of distinct compositions of these components. Existing solutions for data mining processes comprise components that represent different steps in the analysis process, and there are graphical or script-based toolkits for combining them. The data mining tools that can serve as components in analysis processes are built for single-computer environments, local data sources and single users.
However, analysis scenarios in medical informatics and bioinformatics have to deal with multi-computer environments, distributed data sources and multiple users who need to cooperate. Users need support for integrating data mining into analysis processes in such scenarios, and this support is lacking today. Analysts working in single-computer environments typically face the problem of large data volumes, since the tools address neither scalability nor access to distributed data sources. Distributed environments such as grid environments provide scalability and access to distributed data sources, but integrating existing components into such environments is complex. In addition, new components often cannot be developed directly in distributed environments. Moreover, in scenarios involving multiple computers, multiple distributed data sources and multiple users, reusing components, scripts and analysis processes becomes more important, because more steps and configuration are necessary and much greater effort is needed to develop and set up a solution. In this thesis we introduce an approach for supporting interactive and distributed data mining for multiple users, based on infrastructure principles that build on data mining components and processes that are already available instead of designing a completely new infrastructure, so that users can keep working with their familiar tools. To integrate data mining into scientific data analysis processes, this thesis proposes a stepwise approach to supporting users in developing analysis solutions that include data mining. We see our major contributions as follows. First, we propose an approach to integrate data mining components developed for a single-processor environment into grid environments. In this way, we help users reuse standard data mining components with little effort.
The approach is based on a metadata schema definition that is used to grid-enable existing data mining components. Second, we describe an approach for interactively developing data mining scripts in grid environments. It efficiently supports users when they need to enhance available components, develop new data mining components, or compose these components. Third, building on that, we present an approach for facilitating the reuse of existing data mining processes based on process patterns. It supports users in scenarios that cover different steps of the data mining process, including several components or scripts. The data mining process patterns support the description of data mining processes at different levels of abstraction, from the CRISP model as the most general representation to executable workflows as the most concrete.

    Decision Support Systems

    Decision support systems (DSS) have evolved over the past four decades from theoretical concepts into real-world computerized applications. DSS architecture contains three key components: a knowledge base, a computerized model, and a user interface. DSS simulate human cognitive decision-making functions based on artificial intelligence methodologies (including expert systems, data mining, machine learning, connectionism, logical reasoning, etc.) in order to perform decision support functions. DSS applications cover many domains, ranging from aviation monitoring, transportation safety, clinical diagnosis, weather forecasting and business management to internet search strategy. By combining knowledge bases with inference rules, DSS can provide suggestions that help end users improve decisions and outcomes. This book is written as a textbook so that it can be used in formal courses on decision support systems. It may be used by both undergraduate and graduate students from diverse computer-related fields. It will also be of value to established professionals as a text for self-study or reference.
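
The phrase "combining knowledge bases with inference rules" can be made concrete with a minimal forward-chaining sketch. This is one common inference strategy, assumed here for illustration and not taken from the book. Facts are plain strings, each rule derives one conclusion once all of its premises are known, and the engine keeps firing rules until no new facts appear. The `Rule` class, the `forward_chain` function and the weather-related rules are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: if every premise is a known fact, add the conclusion."""
    premises: frozenset[str]
    conclusion: str


def forward_chain(facts: set[str], rules: list[Rule]) -> set[str]:
    """Fire rules repeatedly until a fixed point (no new facts) is reached."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # A rule fires when all its premises hold and it adds something new.
            if rule.premises <= known and rule.conclusion not in known:
                known.add(rule.conclusion)
                changed = True
    return known


# Hypothetical knowledge base for a weather-related suggestion.
RULES = [
    Rule(frozenset({"low_pressure", "high_humidity"}), "rain_likely"),
    Rule(frozenset({"rain_likely", "outdoor_event"}), "suggest_indoor_venue"),
]

derived = forward_chain({"low_pressure", "high_humidity", "outdoor_event"}, RULES)
```

The second rule can only fire after the first has added `rain_likely`, which is why the loop repeats until nothing changes instead of making a single pass.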

    Data Mining in Promoting Flight Safety

    The rapid growth of air travel to huge volumes, driven mainly by the jet airliners that took to the skies in the 1950s, created the need for systematic aviation safety research and for the collection of incident data. Structured data can easily be analysed with database queries, and the results can be presented with graphical tools. However, analysing the narratives, which often give more accurate information about a case, requires mining tools. Computer analysis of textual data was not possible until data mining tools were developed, and their use, at least in aviation, is still moderate. The research aims to discover dangerous trends in flight safety reports. The narratives of 1,200 Finnish-language flight safety reports from 1994–1996 were processed with three text mining tools. One of them was completely language independent, the second had a specific configuration for Finnish, and the third was originally created for English only; because encouraging results had been achieved with Spanish, a Finnish test was undertaken as well. The global accident rate is stabilising and the safety level can now be regarded as satisfactory, but because air traffic is growing, the absolute number of fatal accidents per year may increase unless flight safety is improved. Data collection and reporting systems have already reached their peak, so the key to improving flight safety is analysis. Air traffic is generally forecast to grow 5–6 per cent annually over the next two decades, and during this period global air travel will probably double even under relatively conservative expectations of economic growth. This development puts airline management under growing pressure from increasing competition, a significant rise in fuel prices, and the need to reduce the incident rate as traffic volumes grow.
All this emphasises the urgent need for new tools and methods. All three systems produced encouraging results, while also revealing challenges that remain to be overcome. Flight safety can be improved by developing and using sophisticated analysis tools and methods, such as data mining, and by using their results to support executive decision-making.

    Cyber Security and Critical Infrastructures 2nd Volume

    The second volume of the book contains the manuscripts that were accepted for publication in the MDPI Special Topic "Cyber Security and Critical Infrastructure" after a rigorous peer-review process. Authors from academia, government and industry contributed innovative solutions, reflecting the interdisciplinary nature of cybersecurity. The book contains 16 articles: an editorial that explains current challenges, innovative solutions and real-world experiences involving critical infrastructure, and 15 original papers that present state-of-the-art solutions to attacks on critical systems.

    Investigation on the Future of Enterprise Architecture in Dynamic Environments

    In today's economy, constant change has become the new normal. The consequences of this development are clearly visible. The dynamics of corporate environments are increasing, and companies that do not adapt to changing conditions will be less successful and will eventually go out of business. Developing and improving the adaptive capabilities needed to succeed in dynamic environments requires many parts of the enterprise to work together. Within that effort, Enterprise Architecture (EA) can play a vital role by enabling and guiding different organizational elements to be more effective in dynamic environments. To do so, however, EA needs to transform itself. This thesis provides results that describe how EA can be effective in dynamic environments. The results are structured into the following four areas. First, a state-of-the-art review of EA is presented, describing the development of the discipline over the last three decades. The analysis shows that the focus of EA research has moved from understanding and defining EA toward managing the discipline effectively in complex enterprise environments. The later parts of this thesis also emphasize the effective management of EA by providing EA approaches for specific circumstances, namely environments with a higher pace of change. Second, this thesis offers a formal description of how an increasing pace of change influences EA effectiveness. The primary result of this part is a model, based on complexity theory, that summarizes the following dependencies: the increasing pace of change leads to greater dynamic complexity for EA, since parts that are changing faster and faster have to be managed.
This complexity must be considered from both a business and a technological point of view. In the final model, business and technological dynamic complexity are treated as contextual factors that influence the appropriate use of EA and, consequently, the effectiveness of the discipline. Third, a collection of approaches for improving EA effectiveness in dynamic environments is presented. They are structured around four dimensions: EA competence, which considers who in the organization works on EA; EA methodology, which considers how EA is carried out in the organization; EA content, which considers the output of EA; and EA tools, which consider what EA is created and maintained with. Fourth, the final part of this thesis presents the results in the form of a reference architecture for EA in dynamic environments. The EA approaches are again structured according to the dimensions described above. The reference architecture is described at the level of the individual approaches as well as at the level of the dimensions. In summary, EA competence should be well integrated into the enterprise. In addition, EA methodology should be aligned with agile practices that enable fast architectural decisions. The resulting EA content should be adaptive, meaning that the architecture can easily be adjusted when necessary. Finally, architects and other EA stakeholders should be supported by modern EA tools. This thesis shows that the underlying goal of EA, namely ensuring alignment between the different facets of the enterprise, remains necessary even under today's changing conditions. However, architects working in dynamic environments should revisit the described dimensions (who? how? what? with what?) in their EA practice in order to remain effective.
With its results, this thesis provides guidance that helps practitioners make appropriate decisions and thus optimize EA effectiveness in dynamic environments. At the same time, this thesis contributes to academic knowledge on EA. The presented models and approaches address the current gap regarding a holistic approach to EA in dynamic environments. Furthermore, this thesis points out several areas that offer opportunities for future research. It is hoped that these will inspire researchers to further advance the evolution of EA from an academic point of view.

    Biometrics: Unique and Diverse Applications in Nature, Science, and Technology

    Biometrics: Unique and Diverse Applications in Nature, Science, and Technology provides a unique sampling of the diverse ways in which biometrics is integrated into our lives and our technology. From time immemorial, we as humans have been intrigued, perplexed and entertained by observing and analyzing ourselves and the natural world around us. Science and technology have evolved to a point where we can empirically record a measure of a biological or behavioral feature and use it to recognize patterns, trends and/or discrete phenomena, such as individuals; this is what biometrics is all about. This book is about understanding some of the ways in which we use biometrics and the specific purposes for which we use it.