    Proceedings of the ECCS 2005 satellite workshop: embracing complexity in design - Paris 17 November 2005

    Embracing complexity in design is one of the critical issues and challenges of the 21st century. As the realization grows that design activities and artefacts display properties associated with complex adaptive systems, so grows the need to use complexity concepts and methods to understand these properties and inform the design of better artefacts. This is a great challenge because complexity science represents an epistemological and methodological shift that promises a holistic approach to understanding and operationally supporting design. But design is also a major contributor to complexity research. Design science is concerned with problems that are fundamental in the sciences in general and complexity sciences in particular. For instance, design has been perceived and studied as a ubiquitous activity inherent in every human activity, as the art of generating hypotheses, as a type of experiment, or as a creative co-evolutionary process. Design science and its established approaches and practices can be a great source for advancement and innovation in complexity science. These proceedings are the result of a workshop organized as part of the activities of a UK government AHRB/EPSRC-funded research cluster called Embracing Complexity in Design (www.complexityanddesign.net) and the European Conference on Complex Systems (complexsystems.lri.fr).

    Replication or exploration? Sequential design for stochastic simulation experiments

    We investigate the merits of replication, and provide methods for optimal design (including replicates), with the goal of obtaining globally accurate emulation of noisy computer simulation experiments. We first show that replication can be beneficial from both design and computational perspectives, in the context of Gaussian process surrogate modeling. We then develop a lookahead based sequential design scheme that can determine if a new run should be at an existing input location (i.e., replicate) or at a new one (explore). When paired with a newly developed heteroskedastic Gaussian process model, our dynamic design scheme facilitates learning of signal and noise relationships which can vary throughout the input space. We show that it does so efficiently, on both computational and statistical grounds. In addition to illustrative synthetic examples, we demonstrate performance on two challenging real-data simulation experiments, from inventory management and epidemiology.
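
    As a rough illustration of why replicating a run at an existing input can pay off, the sketch below averages repeated runs of a noisy simulator and measures how the spread of that average shrinks. It is only a toy under stated assumptions: the simulator `noisy_sim`, its quadratic signal, the constant noise level, and all function names are hypothetical, and it implements neither the lookahead design scheme nor the heteroskedastic Gaussian process model from the abstract.

```python
import random
import statistics


def noisy_sim(x, rng, noise_sd=1.0):
    """Hypothetical stochastic simulator: smooth signal plus Gaussian noise."""
    return x ** 2 + rng.gauss(0.0, noise_sd)


def replicate_mean(x, n_reps, rng):
    """Average n_reps replicated runs at the same input location x."""
    return statistics.fmean([noisy_sim(x, rng) for _ in range(n_reps)])


def spread_of_means(x, n_reps, n_trials=2000, seed=0):
    """Empirical standard deviation of the replicate mean over many trials."""
    rng = random.Random(seed)
    means = [replicate_mean(x, n_reps, rng) for _ in range(n_trials)]
    return statistics.pstdev(means)


if __name__ == "__main__":
    for reps in (1, 5, 25):
        # With noise_sd = 1, theory predicts a spread of about 1 / sqrt(reps).
        print(reps, round(spread_of_means(0.5, reps), 3))
```

    With independent noise of standard deviation s, the mean of a replicates has standard deviation s divided by the square root of a, so replicates sharpen the estimate at one location while exploration spends the same run on a new location.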

    Robust and efficient approach to feature selection with machine learning

    Most statistical analyses or modelling studies must deal with the discrepancy between the measured aspects of analysed phenomena and their true nature. Hence, they are often preceded by a step that alters the data representation into one that is, in some sense, optimal for the methods that follow. This thesis deals with feature selection, a narrow yet important subset of representation-altering methodologies. Feature selection is applied to an information system, i.e., data existing in a tabular form, as a group of objects characterised by values of some set of attributes (also called features or variables), and is defined as a process of finding a strict subset of them which fulfills some criterion. There are two essential classes of feature selection methods: minimal optimal, which aim to find the smallest subset of features that optimises the accuracy of certain modelling methods, and all relevant, which aim to find the entire set of features potentially usable for modelling. The first class is the one mostly used in practice, as it maps onto a well-known optimisation problem and has a direct connection to the final model performance. However, I argue that there exists a wide and significant class of applications in which only all relevant approaches may yield usable results, while minimal optimal methods are not only ineffective but can even lead to wrong conclusions. Moreover, the all relevant class substantially overlaps with the set of actual research problems in which feature selection is an important result on its own, sometimes even more important than the resulting black-box model.
In particular, this applies to p>>n problems, i.e., those for which the number of attributes is large and substantially exceeds the number of objects; for instance, such data are produced by high-throughput biological experiments, which currently serve as the most powerful tool of molecular biology and a foundation of the emerging individualised medicine. In the main part of the thesis I present Boruta, a heuristic, all relevant feature selection method. It is based on the concept of shadows, by-design random attributes incorporated into the information system as a reference for the relevance of original features in the context of the whole structure of the analysed data. The variable importance itself is assessed using the Random Forest method, a popular ensemble classifier. As the computational performance of the Boruta method turns out to be unsatisfactory for some important applications, the following chapters of the thesis are devoted to Random Ferns, an ensemble classifier with a structure similar to Random Forest but of substantially higher computational efficiency. In the thesis, I propose a substantial generalisation of this method, capable of training on generic data and calculating feature importance scores. Finally, I assess both the Boruta method and its Random Ferns-based derivative on a series of p>>n problems of biological origin. In particular, I focus on the stability of feature selection; I propose a novel methodology based on bootstrap and self-consistency.
The results I obtain empirically confirm the aforementioned effects characteristic of minimal optimal selection, as well as the efficiency of the proposed heuristics for all relevant selection. The thesis is completed with a study of the applicability of Random Ferns in musical information retrieval, showing the usefulness of this method in other contexts and proposing its generalisation for multi-label classification problems. In most statistical modelling problems the collected data are ill-matched to the nature of the studied phenomenon; consequently, data analysis is usually preceded by changing the raw form of the data into one that is optimal for the methods applied afterwards. In this thesis I deal with feature selection, one class of such data-transforming procedures. It applies to information systems, i.e., data that can be presented in tabular form as a set of objects described by the values of a set of attributes (also called features), and is defined as the process of extracting a subset of attributes that is optimal in some sense. There are two principal groups of feature selection methods: those seeking the smallest possible subset of features that ensures the best possible accuracy of some modelling method (minimal optimal), and those seeking the subset of all features that carry relevant information and are therefore potentially useful for some modelling method (all relevant). Traditionally, minimal optimal methods are used almost exclusively, since they reduce straightforwardly to a well-known optimisation problem and are directly connected to the performance of the final model. In the thesis, however, I argue that there is a wide and important class of problems in which only all relevant methods yield useful results, while minimal optimal methods are not only ineffective but often lead to misleading conclusions.
Moreover, this class largely overlaps with the set of actual problems in which feature selection is a useful result in itself, often more important than the resulting model. In particular, this concerns p>>n data sets, i.e., those in which the number of attributes in the information system is large and substantially exceeds the number of objects; such data commonly arise, for instance, in high-throughput biological studies, which are currently the most powerful analytical tool of molecular biology as well as a foundation of the emerging individualised medicine. In the main part of the thesis I present Boruta, a heuristic feature selection method. It is based on the concept of extending the information system with shadows, attributes that are irrelevant by definition and are created from the original features by randomly permuting their values; they serve as a reference for assessing the relevance of the original attributes in the context of the full structure of the analysed data. To assess feature importance, the method uses the Random Forest algorithm, a popular ensemble classifier. Because the computational performance of Boruta may be insufficient for some important applications, the later part of the thesis deals with the Random Ferns algorithm, an ensemble classifier structurally similar to Random Forest but offering substantially better computational efficiency. I propose a generalisation of this method capable of training on generic information systems and of calculating attribute importance scores. I subject both Boruta and its Random Ferns-based modification to an extensive analysis on a series of p>>n data sets of biological origin. In particular, I consider the stability of selection; to this end I formulate a new assessment method based on resampling and the self-consistency of results.
The results of the experiments empirically confirm the aforementioned problems associated with minimal optimal selection, as well as the soundness of the heuristics adopted for all relevant selection. The thesis is complemented by a study of the applicability of the Random Ferns algorithm to the problem of recognising musical instruments in recordings, illustrating the usefulness of this method in other contexts and proposing its generalisation to multi-label classification.
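
    The shadow-attribute idea described in this abstract can be sketched in a few lines. This is a minimal illustration under loud assumptions, not the Boruta implementation: the importance score here is the absolute Pearson correlation with the target, swapped in for Random Forest importance so the example needs only the standard library; the hit counting stands in for the statistical decision a full method would make; and the names `abs_correlation`, `shadow_selection`, and `make_toy_data` are hypothetical.

```python
import random


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; a stand-in importance score (assumption)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def shadow_selection(columns, target, rounds=50, seed=0):
    """Count, per feature, how often its score beats the best shadow score."""
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(rounds):
        shadow_scores = []
        for values in columns.values():
            # A shadow is a permuted copy of a real column: same value
            # distribution, but any link to the target is destroyed.
            shuffled = values[:]
            rng.shuffle(shuffled)
            shadow_scores.append(abs_correlation(shuffled, target))
        threshold = max(shadow_scores)
        for name, values in columns.items():
            if abs_correlation(values, target) > threshold:
                hits[name] += 1
    return hits


def make_toy_data(n=200, seed=1):
    """Toy information system: one informative column and one noise column."""
    rng = random.Random(seed)
    signal = [rng.gauss(0, 1) for _ in range(n)]
    noise = [rng.gauss(0, 1) for _ in range(n)]
    target = [s + 0.1 * rng.gauss(0, 1) for s in signal]
    return {"signal": signal, "noise": noise}, target


if __name__ == "__main__":
    columns, target = make_toy_data()
    print(shadow_selection(columns, target))
```

    A feature that consistently outscores the best shadow is a candidate for being relevant, while one that behaves like the shadows is not; re-permuting the shadows in every round is what makes the reference threshold random rather than fixed.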

    The InVEST volcanic concept survey: Assessment of conceptual knowledge about volcanoes among undergraduates in entry-level geoscience courses

    A growing body of geoscience education research suggests that many students in the American K-12 system do not fully understand key geoscience concepts. Moreover, early misunderstandings appear to persist even at the introductory undergraduate level. This thesis focuses on exploring the understanding of volcanic systems among American undergraduates via a new assessment instrument, the Volcanic Concept Survey (VCS), which has collected over 600 student responses from a diverse sample of undergraduates across the country. Initial results show that student understanding of volcanic processes is rather limited. Specifically, students tended to possess only basic content knowledge, while concepts requiring the use of higher thinking skills were not well understood. Further explorations of demographic data for the student population reveal that, among other factors, the students' source of knowledge about volcanoes can significantly impact the quality of their understanding. Students who learned from non-traditional film and media sources did not score as highly on the VCS instrument as their peers. The severity of this problem underscores a need for change. Thus, to promote deep and robust learning, new strategies may be necessary when teaching volcanology in the modern introductory geoscience classroom. While simulations will never fully rival the experience of fieldwork, VCS results are being applied to optimize the pedagogical value of an upcoming highly interactive and visually stimulating Virtual Volcano teaching tool.

    Using digital technologies to automate and optimize drilling parameters in real-time, its impact on value creation, and work process

    The industry downturn since the second half of 2013 and the coronavirus pandemic have pushed most companies towards the introduction of new technologies and digitalization. The industrial landscape is undergoing innovation, leading organizations to embrace fresh strategies in order to leverage emerging technologies. Automation and automated work processes based on digital technologies are expanding through the life cycle of assets. The prevailing trend indicates that the competitive advantage of successful adopters is linked to automation that increases efficiency and safety and adds value to the asset. eDrilling is a global provider of AI and digital twin software for the drilling life cycle, with mathematical models that represent the physical assets as its core technology. The suite of products comprises a transient hydraulic model, a mechanical torque and drag model, and a thermodynamic model. The models are coupled to create a digital twin of a wellbore in order to predict and optimize drilling performance. Following this technology trend, the company recently developed real-time parameter optimization software for drilling and tripping operations. This software leverages the concept of a digital twin, which is continuously calibrated in real time and performs lookahead simulations to optimize drilling parameters. This thesis investigates the optimization of drilling parameters using the wellGuide software and its impact on work performance. The drilling process involves numerous variables that continuously and dynamically interact with each other. The study specifically examines the parameters optimized by the wellGuide software and their value-added impact on asset performance. To evaluate the effectiveness of the software, field data obtained from real drilling operations are used.
The lookahead simulations are subsequently analysed to assess their impact on value creation and work processes, as well as to gain insights into system design, human-machine interface, and the technical aspects of the wellGuide package. By examining relevant research papers and functional descriptions, a comprehensive exploration of the theoretical and practical considerations pertaining to the software and its functionalities is conducted. The findings and recommendations derived from this analysis are then discussed

    Automated Auction Mechanism Design with Competing Markets

    Resource allocation is a major issue in multiple areas of computer science. Despite the wide range of resource types across these areas, for example real commodities in e-commerce and computing resources in distributed computing, auctions are commonly used in solving the optimization problems involved in these areas, since well designed auctions achieve desirable economic outcomes. Auctions are markets with strict regulations governing the information available to traders in the market and the possible actions they can take. Auction mechanism design aims to manipulate the rules of an auction in order to achieve specific goals. Economists traditionally use mathematical methods, mainly game theory, to analyze auctions and design new auction forms. However, due to the high complexity of auctions, the mathematical models are typically simplified to obtain results, and this makes it difficult to apply results derived from such models to market environments in the real world. As a result, researchers are turning to empirical approaches. Following this line of work, we present what we call a grey-box approach to automated auction mechanism design using reinforcement learning and evolutionary computation methods. We first describe a new strategic game, called CAT, which was designed to run multiple markets that compete to attract traders and make profit. The CAT game enables us to address the imbalance between prior work in this field that studied auctions in an isolated environment and the actual competitive situation that markets face. We then define a novel, parameterized framework for auction mechanisms, and present a classification of auction rules with each as a building block fitting into the framework. Finally we evaluate the viability of building blocks, and acquire auction mechanisms by combining viable blocks through iterations of CAT games. We carried out experiments to examine the effectiveness of the grey-box approach.
The best mechanisms we learnt were able to outperform the standard mechanisms against which learning took place and carefully hand-coded mechanisms which won tournaments based on the CAT game. These best mechanisms were also able to outperform mechanisms from the literature even when the evaluation did not take place in the context of CAT games. These results suggest that the grey-box approach can generate robust double auction mechanisms and, as a consequence, is an effective approach to automated mechanism design. The contributions of this work are two-fold. First, the grey-box approach helps to design better auction mechanisms which can play a central role in solutions to resource allocation problems in various application domains of computer science. Second, the parameterized view and the reinforcement learning-based search method can be used in other strategic, competitive situations where decision making processes are complex and difficult to design and evaluate manually
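
    To make the notion of a parameterized double auction rule concrete, the sketch below clears a single round of a clearing-house double auction with k-pricing. It is a textbook-style illustration under assumptions, not the framework or any mechanism from this work: the function name `clear_double_auction`, the matching rule, and the pricing parameter `k` are chosen here for illustration only.

```python
def clear_double_auction(bids, asks, k=0.5):
    """Match highest bids with lowest asks and price each trade by k-pricing.

    bids: prices buyers are willing to pay; asks: prices sellers will accept.
    k in [0, 1] is an illustrative pricing parameter: k = 1 charges the bid,
    k = 0 charges the ask, and k = 0.5 splits the difference.
    """
    sorted_bids = sorted(bids, reverse=True)
    sorted_asks = sorted(asks)
    trades = []
    for bid, ask in zip(sorted_bids, sorted_asks):
        if bid < ask:
            # No further profitable matches exist once a bid falls below an ask.
            break
        price = k * bid + (1 - k) * ask
        trades.append((bid, ask, price))
    return trades


if __name__ == "__main__":
    for trade in clear_double_auction([10, 8, 5], [4, 7, 9]):
        print(trade)
```

    In a setup like this, a rule such as the pricing policy is one interchangeable building block; changing `k` or replacing the matching loop yields a different mechanism that can be evaluated against others.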

    Activity Report: Automatic Control 2012

    Essays on Human Capital Development and Socio-economic Inequality

    This dissertation consists of three chapters in which I address central research questions about the role of parental investments and family structure in human capital development, the impact of education on labour-market outcomes and learning outcomes, and the origins and mechanisms of inter-generational mobility in developing countries. The first chapter examines how parental monetary investment affects the joint evolution of child health, cognitive skills and socio-emotional skills. I estimate a dynamic factor model, characterizing the skill formation process throughout childhood, from birth to 12 years of age, using the sample of Vietnamese children from the Young Lives study. In the second chapter, I estimate marginal returns to upper secondary school on the labour market and on learning outcomes in Indonesia. Using the longitudinal data from the Indonesian Family Life Survey 1997-2015, I document a substantial degree of heterogeneity in the returns to upper secondary school on the labour market. The third chapter investigates the origins and mechanisms of birth order effects on cognitive skills, socio-emotional skills and health in Vietnam. Using a sample of children from the Young Lives study we find strong evidence of negative birth order effects on parental investments and child capabilities, emerging very early in life.

    Development Management Model of Elite Athletes in Team Sports Games

    The scientific and expert approach to defining a model for managing the development of top-level athletes in team sports games is oriented toward the challenging values that mark a certain position and role in a team sports game. A hypothetical dynamic model of development management for top-level athletes in team sports games is proposed; it explicitly shows the order of procedures in the multidimensional development of athletes, using concepts from dynamic systems theory. The model shows that an athlete’s development is primarily influenced by genetic potential, the sports preparation process, the competition format, and the management of the athlete’s lifestyle. The athlete’s development is seen as a dynamic and plastic process, shaped by selective procedures and training programs that enable continuous change in the athlete’s level of performance and in the sports preparation process.