2,002 research outputs found

    Exploring foundations for using simulations in IS research

    Simulation has been adopted in many disciplines as a means for understanding the behavior of a system by imitating it through an artificial object that exhibits a nearly identical behavior. Although simulation approaches have been widely adopted for theory building in disciplines such as engineering, computer science, management, and social sciences, their potential in the IS field is often overlooked. The aim of this paper is to understand how different simulation approaches are used in IS research, thereby providing insights and methodological recommendations for future studies. A literature review of simulation studies published in top-tier IS journals leads to the definition of three classes of simulations, namely the self-organizing, the elementary, and the situated. A set of stylized facts is identified for characterizing the ways in which the premise, the inference, and the contribution are presented in IS simulation studies. As a result, this study provides guidance to future simulation researchers in designing and presenting findings

    Linking Datasets on Organizations Using Half A Billion Open Collaborated Records

    Scholars studying organizations often work with multiple datasets lacking shared unique identifiers or covariates. In such situations, researchers may turn to approximate string matching methods to combine datasets. String matching, although useful, faces fundamental challenges. Even when two strings appear similar to humans, fuzzy matching often does not work because it fails to adapt to the informativeness of the character combinations presented. Worse, many entities have multiple names that are dissimilar (e.g., "Fannie Mae" and "Federal National Mortgage Association"), a case where string matching has little hope of succeeding. This paper introduces data from a prominent employment-related networking site (LinkedIn) as a tool to address these problems. We propose interconnected approaches to leveraging the massive amount of information from LinkedIn regarding organizational name-to-name links. The first approach builds a machine learning model for predicting matches from character strings, treating the trillions of user-contributed organizational name pairs as a training corpus: this approach constructs a string matching metric that explicitly maximizes match probabilities. A second approach identifies relationships between organization names using network representations of the LinkedIn data. A third approach combines the first and second. We document substantial improvements over fuzzy matching in applications, making all methods accessible in open-source software ("LinkOrgs")
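
    The limitation of character-based fuzzy matching described above can be illustrated with a minimal sketch. It uses Python's standard-library `difflib.SequenceMatcher` as a stand-in for a generic fuzzy matcher; it is not the LinkOrgs method, and the helper name `similarity` and the "Goldman Sachs" example pair are assumptions made only for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (hypothetical helper)."""
    # Lowercase both names so the comparison ignores capitalization.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Two names for the same organization that share few characters
# (the pair is taken from the abstract).
alias_pair = similarity("Fannie Mae", "Federal National Mortgage Association")

# A simple misspelling (illustrative pair, not from the abstract).
typo_pair = similarity("Goldman Sachs", "Goldman Sacks")

print(f"alias pair: {alias_pair:.2f}")
print(f"typo pair:  {typo_pair:.2f}")
```

    The misspelled pair scores far higher than the alias pair, even though only the alias pair requires outside knowledge to link. This is the gap that the name-to-name links contributed by users are meant to close.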

    Using Prior Knowledge and Learning from Experience in Estimation of Distribution Algorithms

    Estimation of distribution algorithms (EDAs) are stochastic optimization techniques that explore the space of potential solutions by building and sampling explicit probabilistic models of promising candidate solutions. One of the primary advantages of EDAs over many other stochastic optimization techniques is that after each run they leave behind a sequence of probabilistic models describing useful decompositions of the problem. This sequence of models can be seen as a roadmap of how the EDA solves the problem. While this roadmap holds a great deal of information about the problem, until recently this information has largely been ignored. My thesis is that it is possible to exploit this information to speed up problem solving in EDAs in a principled way. The main contribution of this dissertation will be to show that there are multiple ways to exploit this problem-specific knowledge. Most importantly, it can be done in a principled way such that these methods lead to substantial speedups without requiring parameter tuning or hand-inspection of models
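
    The build-and-sample loop described above, and the sequence of models it leaves behind, can be sketched as follows. This is a minimal univariate EDA (UMDA-style) on the OneMax toy problem, chosen only for brevity; it is not the dissertation's algorithm, and the function names, parameters, and default values are all assumptions.

```python
import random


def onemax(bits):
    """Toy fitness function: the number of ones in the bit string."""
    return sum(bits)


def umda(n_bits=20, pop_size=100, n_select=50, generations=30, seed=0):
    """Univariate EDA sketch; returns the best solution and all models."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits          # initial model: every bit is a fair coin
    models = [probs[:]]             # the sequence of models left behind
    best = None
    # Keep probabilities away from 0 and 1 so no bit value is lost for good.
    low, high = 1.0 / n_bits, 1.0 - 1.0 / n_bits
    for _ in range(generations):
        # Sample a new population from the current probabilistic model.
        population = [
            [1 if rng.random() < p else 0 for p in probs]
            for _ in range(pop_size)
        ]
        # Truncation selection: keep the fittest candidates.
        population.sort(key=onemax, reverse=True)
        selected = population[:n_select]
        if best is None or onemax(population[0]) > onemax(best):
            best = population[0]
        # Re-estimate each bit's marginal probability from the selected set.
        probs = [
            min(high, max(low, sum(ind[i] for ind in selected) / n_select))
            for i in range(n_bits)
        ]
        models.append(probs[:])
    return best, models


best, models = umda()
print("best fitness:", onemax(best))
print("models recorded:", len(models))
```

    The returned `models` list is the kind of roadmap the abstract refers to: one probabilistic model per generation, which a later run on a similar problem could in principle inspect or reuse.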

    Variational Autoencoder Based Estimation Of Distribution Algorithms And Applications To Individual Based Ecosystem Modeling Using EcoSim

    Individual-based modeling provides a bottom-up approach in which interactions give rise to high-level phenomena in patterns equivalent to those found in nature. This method generates an immense amount of data through artificial simulation, which can be made tractable by machine learning that optimizes and transforms multidimensional data. Using an individual-based modeling platform known as EcoSim, we modeled the abilities of elitist sexual selection and communication of fear. Data from these experiments were reduced in dimension using a novel algorithm that we propose: Variational Autoencoder based Estimation of Distribution Algorithms with Population Queue and Adaptive Variance Scaling (VAE-EDA-Q AVS). We constructed a novel Estimation of Distribution Algorithm (EDA) by extending generative models known as variational autoencoders (VAE). VAE-EDA-Q smooths the data generation process using an iteratively updated queue (Q) of populations. Adaptive Variance Scaling (AVS) dynamically updates the variance at which models are sampled based on fitness. The combination of VAE-EDA-Q with AVS demonstrates high computational efficiency and requires few fitness evaluations. We extended VAE-EDA-Q AVS to act as a feature-reducing wrapper method in conjunction with C4.5 decision trees to reduce the dimensionality of data. The relationship between sexual selection, random selection, and speciation is a contested topic. Supporting evidence suggests that sexual selection drives speciation. Opposing evidence contends that the correlation is either negative or absent. We used EcoSim to model elitist and random mate selection. Our results demonstrated a significantly lower speciation rate, a significantly lower extinction rate, and a significantly higher turnover rate for sexual selection groups. Species diversification showed no significant difference. The relationship between communication and foraging behavior is similarly subject to opposing hypotheses, which claim that foraging either increases or decreases in response to alarm communication. Through modeling with EcoSim, we found that alarm communication decreases foraging activity in most cases, yet gradually increases it in some others. Furthermore, we found that both outcomes of alarm communication increase fitness compared with non-communication.

    Statistical Inference for Propagation Processes on Complex Networks

    Network-theoretic methods are growing in popularity because they allow complex systems to be represented as networks. A network is described only by a set of nodes connected by edges. Currently available methods are largely limited to descriptive analysis of network structure. This thesis presents several approaches to inference about processes on complex networks. These processes influence measurable quantities at the network nodes and are described by a set of random variables. All of the methods presented are motivated by practical applications, such as the transmission of food-borne infections, the spread of train delays, or the regulation of genetic effects. First, a general dynamic metapopulation model for the spread of food-borne infections is introduced, which combines local infection dynamics with the network-based transport routes of contaminated food. This model enables efficient simulation of a variety of realistic food-borne epidemics. Second, an exploratory approach to identifying the origin of spreading processes is developed. Based on a network-based redefinition of geodesic distance, complex spreading patterns can be projected onto a systematic, circular propagation scheme. This holds if and only if the origin node is chosen as the reference point. The method is applied successfully to the 2011 EHEC/HUS epidemic in Germany. The results suggest that the method can usefully complement the laborious standard investigations of food-borne epidemics. The exploratory approach can also be applied to identify the origins of delays in transport networks. Results of extensive simulation studies with a wide variety of transmission mechanisms give reason to expect that the approach is generally applicable to identifying the origin of spreading processes in many domains. Finally, it is shown that kernel-based methods can offer an alternative for the statistical analysis of processes on networks. A network-based kernel was developed for the logistic kernel machine test, which allows biological knowledge to be integrated seamlessly into the analysis of data from genome-wide association studies. The method is tested successfully in the analysis of genetic causes of rheumatoid arthritis and lung cancer. In summary, the results of the methods presented show that network-theoretic analysis of spreading processes can contribute substantially to answering a wide range of questions in different applications.

    Adaptation and self-organization in evolutionary algorithms

    The objective of Evolutionary Computation is to solve practical problems (e.g., optimization, data mining) by simulating the mechanisms of natural evolution. This thesis addresses several topics related to adaptation and self-organization in evolving systems with the overall aims of improving the performance of Evolutionary Algorithms (EA), understanding their relation to natural evolution, and incorporating new mechanisms for mimicking complex biological systems. Part I of this thesis presents a new mechanism for allowing an EA to adapt its behavior in response to changes in the environment. Using the new approach, adaptation of EA behavior (i.e., control of EA design parameters) is driven by an analysis of population dynamics, as opposed to the more traditional use of fitness measurements. Comparisons with a number of adaptive control methods from the literature indicate substantial improvements in algorithm performance for a range of artificial and engineering design problems. Part II of this thesis involves a more thorough analysis of EA behavior based on the methods derived in Part I. In particular, several properties of EA population dynamics are measured and compared with observations of evolutionary dynamics in nature. The results demonstrate that some large-scale spatial and temporal features of EA dynamics are remarkably similar to their natural counterparts. Compatibility of EA with the Theory of Self-Organized Criticality is also discussed. Part III proposes fundamentally new directions in EA research which are inspired by the conclusions drawn in Part II. These changes involve new mechanisms which allow self-organization of the EA to occur in ways which extend beyond its common convergence in parameter space. In particular, network models for EA populations are developed where the network structure is dynamically coupled to EA population dynamics. Results indicate strong improvements in algorithm performance compared to cellular Genetic Algorithms and non-distributed EA designs. Furthermore, topological analysis indicates that the population network can spontaneously evolve to display characteristics similar to those of the interaction networks of complex biological systems.