840 research outputs found
Molecular basis of ecological speciation in sticklebacks
The constant race to outcompete other organisms, whether of the same or another species, drives the continuous emergence of new, better-adapted individuals. As these phenotypes become increasingly specialized to the specific environmental conditions they are exposed to, they grow progressively divergent from both their ancestors and their conspecifics in other environmental contexts. In freshwater three-spined sticklebacks (Gasterosteus aculeatus), this has resulted in a number of specific phenotypes, which have adopted different reproductive strategies or have evolved habitat-specific growth and parasite resistance. Each of these is characterized by specific adaptations to the environmental context it is exposed to and is maintained through environmental pressures and sexual selection. A key difference between rivers and lakes is the parasite communities they harbor. Parasites impose strong selection pressures on their stickleback host by diverting resources, restricting reproduction and promoting their host’s death. Hence, it is crucial for sticklebacks to minimize their parasite load, resulting in a co-evolutionary arms race between parasite and host. As the parasite community varies between habitat types, the habitats demand different adaptations, resulting in increasing divergence between sticklebacks in their respective habitats. The globally recurring distinction between river and lake habitats implies an equally recurring habitat-dependent differentiation into specific river and lake ecotypes. Specializing to their environment bestows an ecological advantage on the resident individuals. This is reinforced by sexual selection, which favors well-adapted phenotypes by actively selecting for individuals with low parasite burdens and optimal immune gene composition. Incidentally, this selects against potential migrants from other habitats, thus generating genetic isolation, a prerequisite for speciation.
Discovering three-dimensional patterns in real-time from data streams: An online triclustering approach
Triclustering algorithms group sets of coordinates of 3-dimensional datasets. In this paper,
a new triclustering approach for data streams is introduced. It follows a streaming scheme
of learning in two steps: offline and online phases. First, the offline phase provides a summary model with the components of the triclusters. Then, the second stage is the online
phase to deal with data in streaming. This online phase consists in using the summary
model obtained in the offline stage to update the triclusters as fast as possible with genetic
operators. Results using three types of synthetic datasets and a real-world environmental
sensor dataset are reported. The performance of the proposed triclustering streaming algorithm is compared to a batch triclustering algorithm, showing an accurate performance
both in terms of quality and running times.
Ministerio de Ciencia, Innovación y Universidades TIN2017-88209-C
Incremental learning of concept drift from imbalanced data
Learning from data sampled from a nonstationary distribution has been shown to be a very challenging problem in machine learning, because the joint probability distribution between the data and classes evolves over time. Thus learners must adapt their knowledge base, including their structure or parameters, to remain strong predictors. This phenomenon of learning from an evolving data source is akin to learning how to play a game while the rules of the game are changed, and it is traditionally referred to as learning concept drift. Climate data, financial data, epidemiological data, and spam detection are examples of applications that give rise to concept drift problems. An additional challenge arises when the classes to be learned are not represented (approximately) equally in the training data, as most machine learning algorithms work well only when the class distributions are balanced. However, rare categories are commonly faced in real-world applications, which leads to skewed or imbalanced datasets. Fraud detection, rare disease diagnosis, and anomaly detection are examples of applications that feature imbalanced datasets, where data from one category are severely underrepresented. Concept drift and class imbalance are traditionally addressed separately in machine learning, yet data streams can experience both phenomena. This work introduces Learn++.NIE (nonstationary & imbalanced environments) and Learn++.CDS (concept drift with SMOTE) as two new members of the Learn++ family of incremental learning algorithms that explicitly and simultaneously address the aforementioned phenomena. The former addresses concept drift and class imbalance through modified bagging-based sampling and by replacing a class-independent error weighting mechanism, which normally favors the majority class, with a set of measures that emphasize good predictive accuracy on all classes.
The latter integrates Learn++.NSE, an algorithm for concept drift, with the synthetic sampling method known as SMOTE, to cope with class imbalance. This research also includes a thorough evaluation of Learn++.CDS and Learn++.NIE on several real and synthetic datasets and on several figures of merit, showing that both algorithms are able to learn in some of the most difficult learning environments.
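To make the synthetic sampling step concrete, the following is a minimal sketch of the core SMOTE idea only: a new minority-class point is interpolated between an existing minority point and one of its nearest minority neighbors. It is not the Learn++.CDS implementation; the function name `smote_sample`, its parameters, and the toy data are assumptions made for illustration.

```python
import random


def smote_sample(minority, k=2, rng=None):
    """Return one synthetic point interpolated between two minority points.

    Illustrative sketch only (hypothetical helper, not from the paper).
    `minority` must contain at least two points of equal dimension.
    """
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    # Rank the other minority points by squared Euclidean distance to `base`.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    # Pick one of the k nearest neighbors and a random position on the segment.
    neighbor = rng.choice(others[:k])
    gap = rng.random()  # uniform in [0, 1)
    return [a + gap * (b - a) for a, b in zip(base, neighbor)]


# Toy minority class with four 2-D points (made-up values).
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
synthetic = smote_sample(minority_points, k=2, rng=random.Random(1))
```

Because the synthetic point lies on a segment between two real minority points, it stays inside the region the minority class already occupies, which is what lets oversampling rebalance the classes without simply duplicating examples.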
Integrating Environmental, Molecular, and Morphological Data to Unravel an Ice-age Radiation of Arctic-alpine Campanula in Western North America
Many arctic-alpine plant genera have undergone speciation during the Quaternary. The bases for these radiations have been ascribed to geographic isolation, abiotic and biotic differences between populations, and/or hybridization and polyploidization. The Cordilleran Campanula L. (Campanulaceae Juss.), a monophyletic clade of mostly endemic arctic-alpine taxa from western North America, experienced a recent and rapid radiation. We set out to unravel the factors that likely influenced speciation in this group. To do so, we integrated environmental, genetic, and morphological datasets, tested biogeographic hypotheses, and analyzed the potential consequences of the various factors on the evolutionary history of the clade. We created paleodistribution models to identify potential Pleistocene refugia for the clade and estimated niche space for individual taxa using geographic and climatic data. Using 11 nuclear loci, we reconstructed a species tree and tested biogeographic hypotheses derived from the paleodistribution models. Finally, we tested 28 morphological characters, including floral, vegetative, and seed characteristics, for their capacity to differentiate taxa. Our results show that the combined effect of Quaternary climatic variation, isolation among differing environments in the mountains in western North America, and biotic factors influencing floral morphology contributed to speciation in this group during the mid-Pleistocene. Furthermore, our biogeographic analyses uncovered asynchronous consequences of interglacial and glacial periods for the timing of refugial isolation within the southern and northwestern mountains, respectively. These findings have broad implications for understanding the processes promoting speciation in arctic-alpine plants and the rise of numerous endemic taxa across the region.
Memory Models for Incremental Learning Architectures
Losing V. Memory Models for Incremental Learning Architectures. Bielefeld: Universität Bielefeld; 2019. Technological advancement constantly leads to an exponential growth of generated data in basically every domain, drastically increasing the burden of data storage and maintenance. Most of the data is instantaneously extracted and available in the form of endless streams that contain the most current information. Machine learning methods constitute one fundamental way of processing such data automatically, as they generate models that capture the processes behind the data. They are omnipresent in our everyday life, as their applications include personalized advertising, recommendations, fraud detection, surveillance, credit ratings, high-speed trading and smart-home devices. Batch learning, denoting the offline construction of a static model based on large datasets, is the predominant scheme. However, it is increasingly unfit to deal with the accumulating masses of data in the given time, and in particular its static nature cannot handle changing patterns. In contrast, incremental learning constitutes one attractive alternative that is a very natural fit for the current demands. Its dynamic adaptation allows continuous processing of data streams, without the necessity to store all data from the past, and results in always up-to-date models that are even able to perform in non-stationary environments. In this thesis, we tackle crucial research questions in the domain of incremental learning by contributing new algorithms or significantly extending existing ones. We consider stationary and non-stationary environments and present multiple real-world applications that showcase the merits of the methods as well as their versatility. The main contributions are the following:
One novel approach that addresses the question of how to extend a model for prototype-based algorithms based on cost minimization.
We propose local split-time prediction for incremental decision trees to mitigate the trade-off between adaptation speed versus model complexity and run time.
An extensive survey of the strengths and weaknesses of state-of-the-art methods that provides guidance for choosing a suitable algorithm for a given task.
One new approach to extract valuable information about the type of change in a dataset.
We contribute a biologically inspired architecture, able to handle different types of drift using dedicated memories that are kept consistent.
Application of the novel methods within three diverse real-world tasks, highlighting their robustness and versatility.
Investigation of personalized online models in the context of two real-world applications.
Evolving Spiking Neural Networks for online learning over drifting data streams
Nowadays huge volumes of data are produced in the form of fast streams, which are further affected by non-stationary phenomena. The resulting lack of stationarity in the distribution of the produced data calls for efficient and scalable algorithms for online analysis capable of adapting to such changes (concept drift). The online learning field has lately turned its focus to this challenging scenario, by designing incremental learning algorithms that avoid becoming obsolete after a concept drift occurs. Despite the noted activity in the literature, a need for new efficient and scalable algorithms that adapt to the drift still prevails as a research topic deserving further effort. Surprisingly, Spiking Neural Networks, one of the major exponents of the third generation of artificial neural networks, have not been thoroughly studied as an online learning approach, even though they are naturally suited to easily and quickly adapting to changing environments. This work covers this research gap by adapting Spiking Neural Networks to meet the processing requirements that online learning scenarios impose. In particular, the work focuses on limiting the size of the neuron repository and making the most of this limited size by resorting to data reduction techniques. Experiments with synthetic and real data sets are discussed, leading to the empirically validated assertion that, by virtue of a tailored exploitation of the neuron repository, Spiking Neural Networks adapt better to drifts, obtaining higher accuracy scores than naive versions of Spiking Neural Networks for online learning environments. This work was supported by the EU project Pacific Atlantic Network for Technical Higher Education and Research (PANTHER, grant number 2013-5659/004-001 EMA2).
Using Diversity Ensembles with Time Limits to Handle Concept Drift
While traditional supervised learning focuses on static datasets, an increasing amount of data comes in the form of streams, where data is continuous and typically processed only once. A common problem with data streams is that the underlying concept we are trying to learn can be constantly evolving. This concept drift has been of interest to researchers over the last few years, and there is a need for improved machine learning algorithms that are capable of dealing with concept drift. A promising approach involves using an ensemble of a diverse set of classifiers. The constituent classifiers are re-trained when a concept drift is detected. Decisions regarding the number of classifiers to maintain and the frequency of re-training classifiers are critical factors that determine classification accuracy in the presence of concept drift. This dissertation systematically investigated these issues in order to develop an improved classifier for online ensemble learning. The impact of reducing the time for which additional ensembles are kept was studied using artificial and real-world datasets. Findings from these studies revealed that in many cases the number of time steps for which additional ensembles are held in memory can be reduced without sacrificing prequential accuracy. It was also found that this new ensemble approach performed well in the presence of false concept drift.
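Prequential accuracy, the measure named above, comes from test-then-train evaluation: each arriving example is first used to test the current model and only then used to update it. The following is a minimal sketch of that loop. The `MajorityClassLearner` class, the `prequential_accuracy` function, and the toy stream are hypothetical stand-ins, not the ensemble method studied in the dissertation.

```python
from collections import Counter


class MajorityClassLearner:
    """Toy incremental learner (hypothetical): predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # No prediction is possible before the first label has been seen.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def learn(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(stream, learner):
    """Test-then-train: score each example before the learner is updated with it."""
    correct = 0
    total = 0
    for x, y in stream:
        if learner.predict(x) == y:  # test on the unseen example first
            correct += 1
        learner.learn(x, y)  # then train on that same example
        total += 1
    return correct / total if total else 0.0


# Made-up stream of (features, label) pairs.
toy_stream = [(0, "a"), (1, "a"), (2, "b"), (3, "a")]
accuracy = prequential_accuracy(toy_stream, MajorityClassLearner())
```

Every example is scored exactly once, before the model has seen it, so the measure needs no held-out test set and follows the stream as the concept changes.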
A Frequent Pattern Conjunction Heuristic for Rule Generation in Data Streams
This paper introduces a new and expressive algorithm for inducing descriptive rule-sets from streaming data in real-time in order to describe frequent patterns explicitly encoded in the stream. Data Stream Mining (DSM) is concerned with the automatic analysis of data streams in real-time. Rapid flows of data challenge the state-of-the-art processing and communication infrastructure, hence the motivation for research and innovation into real-time algorithms that analyse data streams on-the-fly and can automatically adapt to concept drifts. To date, DSM techniques have largely focused on predictive data mining applications that aim to forecast the value of a particular target feature of unseen data instances, answering questions such as whether a credit card transaction is fraudulent or not. A real-time, expressive and descriptive Data Mining technique for streaming data has not been previously established as part of the DSM toolkit. This has motivated the work reported in this paper, which has resulted in developing and validating a Generalised Rule Induction (GRI) tool, thus producing expressive rules as explanations that can be easily understood by human analysts. The expressiveness of decision models in data streams serves the objectives of transparency, underpinning the vision of 'explainable AI', and yet is an area of research that has attracted less attention despite being of high practical importance. The algorithm introduced and described in this paper is termed Fast Generalised Rule Induction (FGRI). FGRI is able to induce descriptive rules incrementally for raw data from both categorical and numerical features. FGRI is able to adapt rule-sets to changes of the pattern encoded in the data stream (concept drift) on the fly as new data arrives and can thus be applied continuously in real-time. The paper also provides a theoretical, qualitative and empirical evaluation of FGRI.
Cooperation in Microbial Populations: Theory and Experimental Model Systems
Cooperative behavior, the costly provision of benefits to others, is common
across all domains of life. This review article discusses cooperative behavior
in the microbial world, mediated by the exchange of extracellular products
called public goods. We focus on model species for which the production of a
public good and the related growth disadvantage for the producing cells are
well described. To unveil the biological and ecological factors promoting the
emergence and stability of cooperative traits we take an interdisciplinary
perspective and review insights gained from both mathematical models and
well-controlled experimental model systems. Ecologically, we include crucial
aspects of the microbial life cycle into our analysis and particularly consider
population structures where an ensemble of local communities (subpopulations)
continuously emerge, grow, and disappear again. Biologically, we explicitly
consider the synthesis and regulation of public good production. The discussion
of the theoretical approaches includes general evolutionary concepts,
population dynamics, and evolutionary game theory. As a specific but generic
biological example we consider populations of Pseudomonas putida and its
regulation and utilization of pyoverdines, iron scavenging molecules. The
review closes with an overview of cooperation in spatially extended systems and
also provides a critical assessment of the insights gained from the
experimental and theoretical studies discussed. Current challenges and
important new research opportunities are discussed, including the biochemical
regulation of public goods, more realistic ecological scenarios resembling
native environments, cell-to-cell signalling, and multi-species communities.
Comment: Review article, 88 pages, 14 figures