286 research outputs found

    Contributions to probabilistic non-negative matrix factorization - Maximum marginal likelihood estimation and Markovian temporal models

    Non-negative matrix factorization (NMF) has become a popular dimensionality reduction technique, and has found applications in many different fields, such as audio signal processing, hyperspectral imaging, or recommender systems. In its simplest form, NMF aims at finding an approximation of a non-negative data matrix (i.e., with non-negative entries) as the product of two non-negative matrices, called the factors. One of these two matrices can be interpreted as a dictionary of characteristic patterns of the data, and the other one as activation coefficients of these patterns. This low-rank approximation is traditionally retrieved by optimizing a measure of fit between the data matrix and its approximation. As it turns out, for many choices of measures of fit, the problem can be shown to be equivalent to the joint maximum likelihood estimation of the factors under a certain statistical model describing the data. This leads us to an alternative paradigm for NMF, where the learning task revolves around probabilistic models whose observation density is parametrized by the product of non-negative factors. This general framework, coined probabilistic NMF, encompasses many well-known latent variable models of the literature, such as models for count data. In this thesis, we consider specific probabilistic NMF models in which a prior distribution is assumed on the activation coefficients, but the dictionary remains a deterministic variable. The objective is then to maximize the marginal likelihood in these semi-Bayesian NMF models, i.e., the joint likelihood integrated over the activation coefficients. This amounts to learning the dictionary only; the activation coefficients may be inferred in a second step if necessary. We proceed to study in greater depth the properties of this estimation process. In particular, two scenarios are considered. In the first one, we assume the activation coefficients to be independent sample-wise. Previous experimental work showed that dictionaries learned with this approach exhibited a tendency to automatically regularize the number of components, a favorable property which was left unexplained. In the second one, we lift this standard assumption and instead consider Markov structures that add statistical correlation to the model, in order to better analyze temporal data.
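    The classical optimization view of NMF described above can be illustrated with a minimal numpy sketch. This shows Lee-Seung multiplicative updates for the Frobenius measure of fit, not the semi-Bayesian marginal-likelihood estimator developed in the thesis; the matrix sizes and rank are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy non-negative data matrix V, approximated as V ≈ W @ H with rank K = 2.
V = rng.random((20, 30))
K = 2
W = rng.random((20, K)) + 0.1   # dictionary of characteristic patterns
H = rng.random((K, 30)) + 0.1   # activation coefficients

# Lee-Seung multiplicative updates for the Frobenius measure of fit;
# non-negativity of both factors is preserved by construction.
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-12)

# Relative approximation error of the learned low-rank factorization.
err = np.linalg.norm(V - W @ H, "fro") / np.linalg.norm(V, "fro")
```

    Replacing the Frobenius measure of fit with the Kullback-Leibler divergence changes only the update rules, and corresponds in the probabilistic view to a Poisson observation model, one of the count-data models mentioned above.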

    soMLier: A South African Wine Recommender System

    Though several commercial wine recommender systems exist, they are largely tailored to consumers outside of South Africa (SA). Consequently, these systems are of limited use to novice wine consumers in SA. To address this, the aim of this research is to develop a system for South African consumers that yields high-quality wine recommendations, maximises the accuracy of predicted ratings for those recommendations, and provides insights into why those suggestions were made. To achieve this, a hybrid system “soMLier” (pronounced “sommelier”) is built in this thesis that makes use of two datasets. Firstly, a database containing several attributes of South African wines, such as the chemical composition, style, aroma, price and description, was supplied by wine.co.za (a SA wine retailer). Secondly, for each wine in that database, the numeric 5-star ratings and textual reviews made by users worldwide were scraped from Vivino.com to serve as a dataset of user preferences. Together, these are used to develop and compare several systems, the best of which are combined in the final system. Item-based collaborative filtering methods are investigated first, along with model-based techniques (such as matrix factorisation and neural networks), when applied to the user rating dataset to generate wine recommendations through the ranking of rating predictions. These methods are found to excel at generating lists of relevant wine recommendations and at producing accurate corresponding rating predictions, respectively. Next, the wine attribute data is used to explore the efficacy of content-based systems. Numeric features (such as price) are compared along with categorical features (such as style) using various distance measures, and the relationships between the textual descriptions of the wines are determined using natural language processing methods. These methods are found to be most appropriate for explaining wine recommendations. Hence, the final hybrid system makes use of collaborative filtering to generate recommendations, matrix factorisation to predict user ratings, and content-based techniques to rationalise the wine suggestions made. This thesis contributes the “soMLier” system that is of specific use to SA wine consumers, as it bridges the gap between the technologies used by highly developed existing systems and the SA wine market. Though this final system would benefit from more explicit user data to establish a richer model of user preferences, it can ultimately assist consumers in exploring unfamiliar wines, discovering wines they will likely enjoy, and understanding their preferences for SA wine.
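    As a rough illustration of the matrix-factorisation component, the sketch below fits user and wine latent factors by stochastic gradient descent on a handful of ratings. The data, dimensions, and hyperparameters are invented for illustration and are not soMLier's actual model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical (user, wine, 5-star rating) triples -- illustrative only.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 2.0),
           (2, 1, 4.0), (2, 2, 5.0)]
n_users, n_wines, k = 3, 3, 2

P = 0.1 * rng.standard_normal((n_users, k))   # user latent factors
Q = 0.1 * rng.standard_normal((n_wines, k))   # wine latent factors

# Plain SGD on squared error with L2 regularisation.
lr, reg = 0.05, 0.01
for _ in range(500):
    for u, i, r in ratings:
        e = r - P[u] @ Q[i]                   # prediction error
        pu = P[u].copy()
        P[u] += lr * (e * Q[i] - reg * P[u])
        Q[i] += lr * (e * pu - reg * Q[i])

# Fit on the observed ratings, and one reconstructed prediction.
rmse = np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in ratings]))
pred = P[0] @ Q[0]
```

    A rating for an unseen (user, wine) pair is predicted with the same dot product, and ranking those predictions yields the recommendation list.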

    Gaussian Processes on Hypergraphs

    We derive a Matérn Gaussian process (GP) on the vertices of a hypergraph. This enables estimation of regression models of observed or latent values associated with the vertices, in which the correlation and uncertainty estimates are informed by the hypergraph structure. We further present a framework for embedding the vertices of a hypergraph into a latent space using the hypergraph GP. Finally, we provide a scheme for identifying a small number of representative inducing vertices that enables scalable inference through sparse GPs. We demonstrate the utility of our framework on three challenging real-world problems: multi-class classification of the political party affiliation of legislators on the basis of voting behaviour, probabilistic matrix factorisation of movie reviews, and embedding a hypergraph of animals into a low-dimensional latent space.
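    The construction can be sketched numerically as follows. This assumes a clique-expansion hypergraph Laplacian and the standard spectral form of a graph Matérn kernel, applying the transform (2ν/κ² + λ)^(−ν) to the Laplacian eigenvalues; the paper's exact construction may differ:

```python
import numpy as np

# Toy hypergraph: 5 vertices, 3 hyperedges given as vertex lists.
hyperedges = [[0, 1, 2], [2, 3], [3, 4]]
n = 5

# Incidence matrix and a clique-expansion Laplacian (one common choice;
# not necessarily the construction used in the paper).
H = np.zeros((n, len(hyperedges)))
for j, e in enumerate(hyperedges):
    H[e, j] = 1.0
De = np.diag(H.sum(axis=0))              # hyperedge degrees
Dv = np.diag(H.sum(axis=1))              # vertex degrees
L = Dv - H @ np.linalg.inv(De) @ H.T

# Matern-type graph kernel via the Laplacian eigendecomposition.
nu, kappa = 1.5, 1.0
evals, evecs = np.linalg.eigh(L)
K = evecs @ np.diag((2.0 * nu / kappa**2 + evals) ** (-nu)) @ evecs.T
```

    The resulting K is a positive-definite covariance over the vertices in which vertices sharing a hyperedge correlate more strongly than distant ones, so it can be dropped into any standard GP regression or classification model.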

    Automatic Detection of Volcanic Unrest Using Interferometric Synthetic Aperture Radar

    A diverse set of hazards is posed by the world's 1500 subaerial volcanoes, yet the majority of them remain unmonitored. Measurements of deformation provide a way to monitor volcanoes, and synthetic aperture radar (SAR) provides a powerful tool to measure deformation at the majority of the world's subaerial volcanoes. This is due to recent changes in how regularly SAR data are acquired, how they are distributed to the scientific community, and how quickly they can be processed to create time series of interferograms. However, for interferometric SAR (InSAR) to be used to monitor the world's volcanoes, an algorithm is required to automatically detect signs of deformation-generating volcanic unrest in a time series of interferograms, as the volume of new interferograms produced each week precludes this task being achieved by human interpreters. In this thesis, I introduce two complementary methods that can be used to detect signs of volcanic unrest. The first method centres on the use of blind signal separation (BSS) methods to isolate signals of geophysical interest from nuisance signals, such as those due to changes in the refractive index of the atmosphere between two SAR acquisitions. This is achieved by first comparing which of non-negative matrix factorisation (NMF), principal component analysis (PCA), and independent component analysis (ICA) is best suited for solving BSS problems involving time series of InSAR data, and how InSAR data should best be arranged for use with these methods. I find that NMF can be used with InSAR data, provided the time series is formatted in a novel way that reduces the likelihood of any pixels having negative values. However, when NMF, PCA, and ICA are applied to a set of synthetic data, I find that the most accurate recovery of signals of interest is achieved when ICA is set to recover spatially independent sources (termed sICA). 
    I find that the best results are produced by sICA when interferograms are ordered as a simple "daisy chain" of short temporal baselines, and when sICA is set to recover around 1-3 more sources than were thought to have contributed to the time series. However, I also show that in cases such as deformation centred under a stratovolcano, the overlapping nature of a topographically correlated atmospheric phase screen (APS) signal and a deformation signal produces a pair of signals that are no longer spatially statistically independent, and so cannot be recovered accurately by sICA. To validate these results, I apply sICA to a time series of Sentinel-1 interferograms that span the 2015 eruption of Wolf volcano (Galapagos archipelago, Ecuador) and automatically isolate three signals of geophysical interest, which I validate by comparing with the results of other studies. I also apply the sICA algorithm to a time series of interferograms that image Mt Etna, and through isolating signals that are likely to be due to instability of the east flank of the volcano, show that the method can be applied to stratovolcanoes to recover useful signals. Utilising the ability of sICA to isolate signals of interest, I introduce a prototype detection algorithm that tracks changes in the behaviour of a subaerial volcano, and show that it could have been used to detect the onset of the 2015 eruption of Wolf. However, for use in a detection algorithm that is to be applied globally, the signals recovered by sICA cannot be manually validated through comparison with other studies. Therefore, I seek to incorporate a module into my detection algorithm that is able to quantify the significance of the sources recovered by sICA. I achieve this through extensively modernising the ICASO algorithm to create a new algorithm, ICASAR, that is optimised for use with InSAR time series. 
    This algorithm allows me to assess the significance of signals recovered by sICA at a given volcano, and to then prioritise the tracking of any changes they exhibit when they are used in my detection algorithm. To further develop the detection algorithm, I create two synthetic time series that characterise the different types of unrest that could occur at a volcanic centre. The first features the introduction of a new signal, and my algorithm is able to detect when this signal enters the time series by tracking how well the baseline sources are able to fit new interferograms. The second features a change in rate of a signal that was present during the baseline stage, and my algorithm is able to detect when this change in rate occurs by tracking how sources recovered from the baseline data are used through time. To further test the algorithm, I extend the Sentinel-1 time series I used to study the 2015 eruption of Wolf to include the 2018 eruption of Sierra Negra, and I find that my algorithm is able to detect the increase in inflation that precedes the eruption, and the eruption itself. I also perform a small study into the pre-eruptive inflation seen at Sierra Negra using the deformation signal and its time history that are output by ICASAR. A Bayesian inversion is performed using the GBIS software package, in which the inflation signal is modelled as a horizontal rectangular dislocation with variable opening and uniform overpressure. Coupled with the time history of the inflation signal provided by ICASAR, this allows me to determine the temporal evolution of the pre-eruptive overpressure since the beginning of the Sentinel-1 time series in 2014. To extend this back to the end of the previous eruption in 2005, I use GPS data that span the entire inter-eruptive period. I find that the total inter-eruptive pressure change is ~13.5 MPa, which is significantly larger than the values required for tensile failure of an elastic medium overlying an inflating body. 
    I conclude that it is likely that one or more processes occurred to reduce the overpressure within the sill, and that the change in rate of inflation prior to the final failure of the sill is unlikely to be coincidental. The second method I develop to detect volcanic deformation in a time series of interferograms uses a convolutional neural network (CNN) to classify and locate deformation signals as each new interferogram is added to the time series. I achieve this by building a model that uses the five convolutional blocks of a previously state-of-the-art classification and localisation model, VGG16, but incorporates both a classification head and a localisation head. In order to train the model, I perform transfer learning and utilise the weights made freely available for the convolutional blocks of a version of VGG16 that was trained to classify natural images. I then synthesise a set of training data, but find that better performance is achieved on a testing set of Sentinel-1 interferograms when the model is trained with a mixture of both synthetic and real data. I conclude that CNNs can be built that are able to differentiate between different styles of volcanic deformation, and that they can perform localisation by globally reasoning with a 224 x 224 pixel interferogram, without the need for a sliding-window approach. The results I present in this thesis show that many machine learning methods can be applied both to time series of interferograms and to individual interferograms. sICA provides a powerful tool to separate some geophysical signals from atmospheric ones, and the ICASAR algorithm that I develop allows a user to evaluate the significance of the results provided by sICA. I incorporate these methods into a deformation detection algorithm, and show that this could be used to detect several types of volcanic unrest using data produced by the latest generation of SAR satellites. 
    Additionally, the CNN I develop is able to differentiate between deformation signals in a single interferogram, and provides a complementary way to monitor volcanoes using InSAR.
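    The role of sICA in the first method can be illustrated on toy one-dimensional signals with a hand-rolled FastICA. This is a numpy-only sketch, not the ICASAR implementation: two statistically independent sources stand in for a deformation signal and a nuisance signal, are mixed into six observations, whitened, and recovered up to sign and ordering:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two statistically independent synthetic "spatial" sources, mixed into
# six observed interferogram-like signals.
t = np.linspace(0, 1, 400)
S = np.vstack([np.sin(2 * np.pi * 3 * t),             # smooth signal
               np.sign(np.sin(2 * np.pi * 5 * t))])   # non-Gaussian signal
A = rng.random((6, 2))                                # mixing matrix
X = A @ S                                             # observations

# Centre and whiten via the SVD, keeping the two dominant directions.
X = X - X.mean(axis=1, keepdims=True)
U, d, _ = np.linalg.svd(X, full_matrices=False)
Z = (U[:, :2] / d[:2]).T @ X * np.sqrt(X.shape[1])

# FastICA fixed-point iteration with a tanh nonlinearity and deflation.
W = np.zeros((2, 2))
for i in range(2):
    w = rng.standard_normal(2)
    for _ in range(200):
        wx = np.tanh(w @ Z)
        w_new = (Z * wx).mean(axis=1) - (1 - wx**2).mean() * w
        w_new -= W[:i].T @ (W[:i] @ w_new)    # decorrelate from earlier rows
        w = w_new / np.linalg.norm(w_new)
    W[i] = w
S_hat = W @ Z   # recovered sources, up to sign and ordering
```

    PCA, by contrast, would return the orthogonal directions of maximum variance, which here are mixtures of the two sources rather than the sources themselves.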

    On-premise containerized, light-weight software solutions for Biomedicine

    Bioinformatics software systems are critical tools for analysing large-scale biological data, but their design and implementation can be challenging due to the need for reliability, scalability, and performance. This thesis investigates the impact of several software approaches on the design and implementation of bioinformatics software systems. These approaches include software patterns, microservices, distributed computing, containerisation and container orchestration. The research focuses on understanding how these techniques affect the reliability, scalability, performance, and efficiency of bioinformatics software systems. Furthermore, this research highlights the challenges and considerations involved in their implementation. This study also examines potential solutions for implementing container orchestration in bioinformatics research teams with limited resources and the challenges of using container orchestration. Additionally, the thesis considers microservices and distributed computing, and how these can be optimised in the design and implementation process to enhance the productivity and performance of bioinformatics software systems. The research was conducted using a combination of software development, experimentation, and evaluation. The results show that implementing software patterns can significantly improve the code accessibility and structure of bioinformatics software systems. Microservices and containerisation in particular enhanced system reliability, scalability, and performance. Additionally, the study indicates that adopting advanced software engineering practices, such as model-driven design and container orchestration, can facilitate efficient and productive deployment and management of bioinformatics software systems, even for researchers with limited resources. Overall, we develop a software system that integrates all of our findings, and the proposed system demonstrated its ability to address challenges in bioinformatics. 
    The thesis makes several key contributions in addressing the research questions surrounding the design, implementation, and optimisation of bioinformatics software systems using software patterns, microservices, containerisation, and advanced software engineering principles and practices. Our findings suggest that incorporating these technologies can significantly improve the reliability, scalability, performance, efficiency, and productivity of bioinformatics software systems.

    Machine learning for improving heuristic optimisation

    Heuristics, metaheuristics and hyper-heuristics are search methodologies which have been preferred by many researchers and practitioners for solving computationally hard combinatorial optimisation problems whenever exact methods fail to produce high-quality solutions in a reasonable amount of time. In this thesis, we introduce an advanced machine learning technique, namely tensor analysis, into the field of heuristic optimisation. We show how the relevant data should be collected in tensorial form, analysed, and used during the search process. Four case studies are presented to illustrate the capability of single- and multi-episode tensor analysis, processing data at high and low abstraction levels, to improve heuristic optimisation. A single-episode tensor analysis using data at a high abstraction level is employed to improve an iterated multi-stage hyper-heuristic for cross-domain heuristic search. The empirical results across six different problem domains from a hyper-heuristic benchmark show that significant overall performance improvement is possible. A similar approach embedding a multi-episode tensor analysis is applied to the nurse rostering problem and evaluated on a benchmark of a diverse collection of instances obtained from different hospitals across the world. The empirical results indicate the success of the tensor-based hyper-heuristic, improving upon the best-known solutions for four particular instances. A genetic algorithm is a nature-inspired metaheuristic which uses a population of multiple interacting solutions during the search. Mutation is the key variation operator in a genetic algorithm and adjusts the diversity in a population throughout the evolutionary process. Often, a fixed mutation probability is used to perturb the value at each locus, representing a unique component of a given solution. 
    A single-episode tensor analysis using data with a low abstraction level is applied to an online bin packing problem, generating locus-dependent mutation probabilities. The tensor approach significantly improves the performance of a standard genetic algorithm on almost all instances. A multi-episode tensor analysis using data with a low abstraction level is embedded into a multi-agent cooperative search approach. The empirical results once again show the success of the proposed approach on a benchmark of flow shop problem instances, as compared to the approach which does not make use of tensor analysis. Tensor analysis can handle data with different levels of abstraction, leading to a learning approach which can be used within different types of heuristic optimisation methods based on different underlying design philosophies, indeed improving their overall performance.
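    The idea of locus-dependent mutation can be caricatured on a OneMax toy problem. The per-locus rates below are set by hand purely for illustration, standing in for the probabilities that the thesis derives from tensor analysis of the search history:

```python
import random

random.seed(3)

# OneMax toy problem: maximise the number of 1s in a 30-bit string.
n_loci, pop_size, gens = 30, 20, 60

def fitness(ind):
    return sum(ind)

# Locus-dependent mutation rates: each locus gets its own probability
# instead of the usual single fixed rate (values here are illustrative).
mut = [0.5 / n_loci] * (n_loci // 2) + [2.0 / n_loci] * (n_loci - n_loci // 2)

pop = [[random.randint(0, 1) for _ in range(n_loci)] for _ in range(pop_size)]
best = max(pop, key=fitness)
for _ in range(gens):
    # Binary tournament selection.
    parents = [max(random.sample(pop, 2), key=fitness) for _ in range(pop_size)]
    # One-point crossover followed by locus-wise mutation.
    nxt = []
    for a, b in zip(parents[::2], parents[1::2]):
        cut = random.randrange(1, n_loci)
        for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
            nxt.append([bit ^ (random.random() < p)
                        for bit, p in zip(child, mut)])
    pop = nxt
    best = max(pop + [best], key=fitness)
```

    Swapping the list `mut` for a learned, per-locus probability vector is the only change a standard genetic algorithm needs in order to use such data-driven mutation rates.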

    Strain Elevation Tension Spring embedding and Cascading failures on the power-grid

    Understanding the dynamics and properties of networks is of great importance in our highly connected, data-driven society. When the networks relate to infrastructure, such understanding can have a substantial impact on public welfare. As such, there is a need for algorithms that can provide insights into the observable and latent properties of these structures. This thesis presents a novel embedding algorithm, the Strain Elevation Tension Spring embedding (SETSe), as a method of understanding complex networks. The algorithm is a deterministic physics model that incorporates both node and edge features into the final embedding. SETSe distinguishes itself from most embedding methods by not having a loss function in the conventional sense and by not trying to place similar nodes close together. Instead, SETSe acts as a smoothing function for node features across the network topology. This approach produces embeddings that are intuitive and interpretable. In this thesis, I demonstrate how SETSe outperforms alternative embedding methods on node-level and graph-level tasks using networks made from stochastic block models and social networks with over 40,000 nodes and over 1 million edges. I also highlight a weakness of traditional methods for analysing cascading failures on power grids and demonstrate that SETSe is not susceptible to such issues. I then show how SETSe can be used as a measure of robustness, in addition to providing a means to create interpretable maps in geographical space given its smoothing embedding method. The framework has been made widely available through two open-source R package contributions: 1) the implementation of SETSe ("rsetse" on CRAN), and 2) a package for analysing cascading failures on power grids.
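    The physics can be caricatured in one dimension. This is a heavily simplified sketch, not the rsetse implementation: a node feature acts as a vertical force, springs along the edges pull neighbours together, and damped integration runs the system to equilibrium, smoothing the feature over the topology:

```python
import numpy as np

# Toy graph: two communities of three nodes joined by a single bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
force = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])   # node feature (net force)
n = len(force)

# Damped integration: elevations move until spring tension along the edges
# balances the node forces.
k, dt, damping = 1.0, 0.05, 0.9
z = np.zeros(n)   # node elevations (the embedding)
v = np.zeros(n)   # node velocities
for _ in range(2000):
    f = force.copy()
    for i, j in edges:
        tension = k * (z[j] - z[i])   # spring pulls i towards j's elevation
        f[i] += tension
        f[j] -= tension
    v = damping * (v + dt * f)
    z = z + dt * v
```

    At equilibrium the elevations separate the two communities, and the bridge edge spans the largest elevation difference, which is the kind of interpretable structure the embedding is designed to expose.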
