12,222 research outputs found
Advancing Model Pruning via Bi-level Optimization
The deployment constraints in practical applications necessitate the pruning
of large-scale deep learning models, i.e., promoting their weight sparsity. As
illustrated by the Lottery Ticket Hypothesis (LTH), pruning also has the
potential of improving their generalization ability. At the core of LTH,
iterative magnitude pruning (IMP) is the predominant pruning method to
successfully find 'winning tickets'. Yet, the computation cost of IMP grows
prohibitively as the targeted pruning ratio increases. To reduce the
computation overhead, various efficient 'one-shot' pruning methods have been
developed, but these schemes are usually unable to find winning tickets as good
as IMP. This raises the question of how to close the gap between pruning
accuracy and pruning efficiency? To tackle it, we pursue the algorithmic
advancement of model pruning. Specifically, we formulate the pruning problem
from a fresh and novel viewpoint, bi-level optimization (BLO). We show that the
BLO interpretation provides a technically-grounded optimization base for an
efficient implementation of the pruning-retraining learning paradigm used in
IMP. We also show that the proposed bi-level optimization-oriented pruning
method (termed BiP) is a special class of BLO problems with a bi-linear problem
structure. By leveraging such bi-linearity, we theoretically show that BiP can
be solved as easily as first-order optimization, thus inheriting the
computation efficiency. Through extensive experiments on both structured and
unstructured pruning with 5 model architectures and 4 data sets, we demonstrate
that BiP can find better winning tickets than IMP in most cases, and is
computationally as efficient as the one-shot pruning schemes, demonstrating 2-7
times speedup over IMP for the same level of model accuracy and sparsity.Comment: Thirty-sixth Conference on Neural Information Processing Systems
(NeurIPS 2022
Learning Spiking Neural Systems with the Event-Driven Forward-Forward Process
We develop a novel credit assignment algorithm for information processing
with spiking neurons without requiring feedback synapses. Specifically, we
propose an event-driven generalization of the forward-forward and the
predictive forward-forward learning processes for a spiking neural system that
iteratively processes sensory input over a stimulus window. As a result, the
recurrent circuit computes the membrane potential of each neuron in each layer
as a function of local bottom-up, top-down, and lateral signals, facilitating a
dynamic, layer-wise parallel form of neural computation. Unlike spiking neural
coding, which relies on feedback synapses to adjust neural electrical activity,
our model operates purely online and forward in time, offering a promising way
to learn distributed representations of sensory data patterns with temporal
spike signals. Notably, our experimental results on several pattern datasets
demonstrate that the even-driven forward-forward (ED-FF) framework works well
for training a dynamic recurrent spiking system capable of both classification
and reconstruction
Evolutionary Computation in Action: Feature Selection for Deep Embedding Spaces of Gigapixel Pathology Images
One of the main obstacles of adopting digital pathology is the challenge of
efficient processing of hyperdimensional digitized biopsy samples, called whole
slide images (WSIs). Exploiting deep learning and introducing compact WSI
representations are urgently needed to accelerate image analysis and facilitate
the visualization and interpretability of pathology results in a postpandemic
world. In this paper, we introduce a new evolutionary approach for WSI
representation based on large-scale multi-objective optimization (LSMOP) of
deep embeddings. We start with patch-based sampling to feed KimiaNet , a
histopathology-specialized deep network, and to extract a multitude of feature
vectors. Coarse multi-objective feature selection uses the reduced search space
strategy guided by the classification accuracy and the number of features. In
the second stage, the frequent features histogram (FFH), a novel WSI
representation, is constructed by multiple runs of coarse LSMOP. Fine
evolutionary feature selection is then applied to find a compact (short-length)
feature vector based on the FFH and contributes to a more robust deep-learning
approach to digital pathology supported by the stochastic power of evolutionary
algorithms. We validate the proposed schemes using The Cancer Genome Atlas
(TCGA) images in terms of WSI representation, classification accuracy, and
feature quality. Furthermore, a novel decision space for multicriteria decision
making in the LSMOP field is introduced. Finally, a patch-level visualization
approach is proposed to increase the interpretability of deep features. The
proposed evolutionary algorithm finds a very compact feature vector to
represent a WSI (almost 14,000 times smaller than the original feature vectors)
with 8% higher accuracy compared to the codes provided by the state-of-the-art
methods
Breast mass segmentation from mammograms with deep transfer learning
Abstract. Mammography is an x-ray imaging method used in breast cancer screening, which is a time consuming process. Many different computer assisted diagnosis have been created to hasten the image analysis. Deep learning is the use of multilayered neural networks for solving different tasks. Deep learning methods are becoming more advanced and popular for segmenting images. One deep transfer learning method is to use these neural networks with pretrained weights, which typically improves the neural networks performance.
In this thesis deep transfer learning was used to segment cancerous masses from mammography images. The convolutional neural networks used were pretrained and fine-tuned, and they had an an encoder-decoder architecture. The ResNet22 encoder was pretrained with mammography images, while the ResNet34 encoder was pretrained with various color images. These encoders were paired with either a U-Net or a Feature Pyramid Network decoder. Additionally, U-Net model with random initialization was also tested. The five different models were trained and tested on the Oulu Dataset of Screening Mammography (9204 images) and on the Portuguese INbreast dataset (410 images) with two different loss functions, binary cross-entropy loss with soft Jaccard loss and a loss function based on focal Tversky index.
The best models were trained on the Oulu Dataset of Screening Mammography with the focal Tversky loss. The best segmentation result achieved was a Dice similarity coefficient of 0.816 on correctly segmented masses and a classification accuracy of 88.7% on the INbreast dataset. On the Oulu Dataset of Screening Mammography, the best results were a Dice score of 0.763 and a classification accuracy of 83.3%. The results between the pretrained models were similar, and the pretrained models had better results than the non-pretrained models. In conclusion, deep transfer learning is very suitable for mammography mass segmentation and the choice of loss function had a large impact on the results.Rinnan massojen segmentointi mammografiakuvista syvÀ- ja siirto-oppimista hyödyntÀen. TiivistelmÀ. Mammografia on röntgenkuvantamismenetelmÀ, jota kÀytetÀÀn rintÀsyövÀn seulontaan. Mammografiakuvien seulonta on aikaa vievÀÀ ja niiden analysoimisen avuksi on kehitelty useita tietokoneavusteisia ratkaisuja. SyvÀoppimisella tarkoitetaan monikerroksisten neuroverkkojen kÀyttöÀ eri tehtÀvien ratkaisemiseen. SyvÀoppimismenetelmÀt ovat ajan myötÀ kehittyneet ja tulleet suosituiksi kuvien segmentoimiseen. Yksi tapa yhdistÀÀ syvÀ- ja siirtooppimista on hyödyntÀÀ neuroverkkoja esiopetettujen painojen kanssa, mikÀ auttaa parantamaan neuroverkkojen suorituskykyÀ.
TÀssÀ diplomityössÀ tutkittiin syvÀ- ja siirto-oppimisen kÀyttöÀ syöpÀisten massojen segmentoimiseen mammografiakuvista. KÀytetyt konvoluutioneuroverkot olivat esikoulutettuja ja hienosÀÀdettyjÀ. LisÀksi niillÀ oli enkooderi-dekooderi arkkitehtuuri. ResNet22 enkooderi oli esikoulutettu mammografia kuvilla, kun taas ResNet34 enkooderi oli esikoulutettu monenlaisilla vÀrikuvilla. NÀihin enkoodereihin yhdistettiin joko U-Net:n tai piirrepyramidiverkon dekooderi. LisÀksi kÀytettiin U-Net mallia ilman esikoulutusta. NÀmÀ viisi erilaista mallia koulutettiin ja testattiin sekÀ Oulun Mammografiaseulonta DatasetillÀ (9204 kuvaa) ettÀ portugalilaisella INbreast datasetillÀ (410 kuvaa) kÀyttÀen kahta eri tavoitefunktiota, jotka olivat binÀÀriristientropia yhdistettynÀ pehmeÀllÀ Jaccard-indeksillÀ ja fokaaliin Tversky indeksiin perustuva tavoitefunktiolla.
Parhaat mallit olivat koulutettu Oulun datasetillÀ fokaalilla Tversky tavoitefunktiolla. Parhaat tulokset olivat 0,816 Dice kerroin oikeissa positiivisissa segmentaatioissa ja 88,7 % luokittelutarkkuus INbreast datasetissÀ. Esikoulutetut mallit antoivat parempia tuloksia kuin mallit joita ei esikoulutettu. Oulun datasetillÀ parhaat tulokset olivat 0,763:n Dice kerroin ja 83,3 % luokittelutarkkuus. Tuloksissa ei ollut suurta eroa esikoulutettujen mallien vÀlillÀ. Tulosten perusteella syvÀ- ja siirto-oppiminen soveltuvat hyvin massojen segmentoimiseen mammografiakuvista. LisÀksi tavoitefunktiovalinnalla saatiin suuri vaikutus tuloksiin
Statistical-dynamical analyses and modelling of multi-scale ocean variability
This thesis aims to provide a comprehensive analysis of multi-scale oceanic variabilities using various statistical and dynamical tools and explore the data-driven methods for correct statistical emulation of the oceans. We considered the classical, wind-driven, double-gyre ocean circulation model in quasi-geostrophic approximation and obtained its eddy-resolving solutions in terms of potential vorticity anomaly and geostrophic streamfunctions. The reference solutions possess two asymmetric gyres of opposite circulations and a strong meandering eastward jet separating them with rich eddy activities around it, such as the Gulf Stream in the North Atlantic and Kuroshio in the North Pacific.
This thesis is divided into two parts. The first part discusses a novel scale-separation method based on the local spatial correlations, called correlation-based decomposition (CBD), and provides a comprehensive analysis of mesoscale eddy forcing. In particular, we analyse the instantaneous and time-lagged interactions between the diagnosed eddy forcing and the evolving large-scale PVA using the novel `product integral' characteristics. The product integral time series uncover robust causality between two drastically different yet interacting flow quantities, termed `eddy backscatter'. We also show data-driven augmentation of non-eddy-resolving ocean models by feeding them the eddy fields to restore the missing eddy-driven features, such as the merging western boundary currents, their eastward extension and low-frequency variabilities of gyres.
In the second part, we present a systematic inter-comparison of Linear Regression (LR), stochastic and deep-learning methods to build low-cost reduced-order statistical emulators of the oceans. We obtain the forecasts on seasonal and centennial timescales and assess them for their skill, cost and complexity. We found that the multi-level linear stochastic model performs the best, followed by the ``hybrid stochastically-augmented deep learning models''. The superiority of these methods underscores the importance of incorporating core dynamics, memory effects and model errors for robust emulation of multi-scale dynamical systems, such as the oceans.Open Acces
Image classification over unknown and anomalous domains
A longstanding goal in computer vision research is to develop methods that are simultaneously applicable to a broad range of prediction problems. In contrast to this, models often perform best when they are specialized to some task or data type. This thesis investigates the challenges of learning models that generalize well over multiple unknown or anomalous modes and domains in data, and presents new solutions for learning robustly in this setting.
Initial investigations focus on normalization for distributions that contain multiple sources (e.g. images in different styles like cartoons or photos). Experiments demonstrate the extent to which existing modules, batch normalization in particular, struggle with such heterogeneous data, and a new solution is proposed that can better handle data from multiple visual modes, using differing sample statistics for each.
While ideas to counter the overspecialization of models have been formulated in sub-disciplines of transfer learning, e.g. multi-domain and multi-task learning, these usually rely on the existence of meta information, such as task or domain labels. Relaxing this assumption gives rise to a new transfer learning setting, called latent domain learning in this thesis, in which training and inference are carried out over data from multiple visual domains, without domain-level annotations. Customized solutions are required for this, as the performance of standard models degrades: a new data augmentation technique that interpolates between latent domains in an unsupervised way is presented, alongside a dedicated module that sparsely accounts for hidden domains in data, without requiring domain labels to do so.
In addition, the thesis studies the problem of classifying previously unseen or anomalous modes in data, a fundamental problem in one-class learning, and anomaly detection in particular. While recent ideas have been focused on developing self-supervised solutions for the one-class setting, in this thesis new methods based on transfer learning are formulated. Extensive experimental evidence demonstrates that a transfer-based perspective benefits new problems that have recently been proposed in anomaly detection literature, in particular challenging semantic detection tasks
Modelling and Solving the Single-Airport Slot Allocation Problem
Currently, there are about 200 overly congested airports where airport capacity does not suffice to accommodate airline demand. These airports play a critical role in the global air transport system since they concern 40% of global passenger demand and act as a bottleneck for the entire air transport system. This imbalance between airport capacity and airline demand leads to excessive delays, as well as multi-billion economic, and huge environmental and societal costs. Concurrently, the implementation of airport capacity expansion projects requires time, space and is subject to significant resistance from local communities. As a short to medium-term response, Airport Slot Allocation (ASA) has been used as the main demand management mechanism. The main goal of this thesis is to improve ASA decision-making through the proposition of models and algorithms that provide enhanced ASA decision support. In doing so, this thesis is organised into three distinct chapters that shed light on the following questions (IâV), which remain untapped by the existing literature. In parentheses, we identify the chapters of this thesis that relate to each research question. I. How to improve the modelling of airline demand flexibility and the utility that each airline assigns to each available airport slot? (Chapters 2 and 4) II. How can one model the dynamic and endogenous adaptation of the airportâs landside and airside infrastructure to the characteristics of airline demand? (Chapter 2) III. How to consider operational delays in strategic ASA decision-making? (Chapter 3) IV. How to involve the pertinent stakeholders into the ASA decision-making process to select a commonly agreed schedule; and how can one reduce the inherent decision-complexity without compromising the quality and diversity of the schedules presented to the decision-makers? (Chapter 3) V. Given that the ASA process involves airlines (submitting requests for slots) and coordinators (assigning slots to requests based on a set of rules and priorities), how can one jointly consider the interactions between these two sides to improve ASA decision-making? (Chapter 4) With regards to research questions (I) and (II), the thesis proposes a Mixed Integer Programming (MIP) model that considers airlinesâ timing flexibility (research question I) and constraints that enable the dynamic and endogenous allocation of the airportâs resources (research question II). The proposed modelling variant addresses several additional problem characteristics and policy rules, and considers multiple efficiency objectives, while integrating all constraints that may affect airport slot scheduling decisions, including the asynchronous use of the different airport resources (runway, aprons, passenger terminal) and the endogenous consideration of the capabilities of the airportâs infrastructure to adapt to the airline demandâs characteristics and the aircraft/flight type associated with each request. The proposed model is integrated into a two-stage solution approach that considers all primary and several secondary policy rules of ASA. New combinatorial results and valid tightening inequalities that facilitate the solution of the problem are proposed and implemented. An extension of the above MIP model that considers the trade-offs among schedule displacement, maximum displacement, and the number of displaced requests, is integrated into a multi-objective solution framework. The proposed framework holistically considers the preferences of all ASA stakeholder groups (research question IV) concerning multiple performance metrics and models the operational delays associated with each airport schedule (research question III). The delays of each schedule/solution are macroscopically estimated, and a subtractive clustering algorithm and a parameter tuning routine reduce the inherent decision complexity by pruning non-dominated solutions without compromising the representativeness of the alternatives offered to the decision-makers (research question IV). Following the determination of the representative set, the expected delay estimates of each schedule are further refined by considering the whole airfieldâs operations, the landside, and the airside infrastructure. The representative schedules are ranked based on the preferences of all ASA stakeholder groups concerning each scheduleâs displacement-related and operational-delay performance. Finally, in considering the interactions between airlinesâ timing flexibility and utility, and the policy-based priorities assigned by the coordinator to each request (research question V), the thesis models the ASA problem as a two-sided matching game and provides guarantees on the stability of the proposed schedules. A Stable Airport Slot Allocation Model (SASAM) capitalises on the flexibility considerations introduced for addressing research question (I) through the exploitation of data submitted by the airlines during the ASA process and provides functions that proxy each requestâs value considering both the airlinesâ timing flexibility for each submitted request and the requestsâ prioritisation by the coordinators when considering the policy rules defining the ASA process. The thesis argues on the compliance of the proposed functions with the primary regulatory requirements of the ASA process and demonstrates their applicability for different types of slot requests. SASAM guarantees stability through sets of inequalities that prune allocations blocking the formation of stable schedules. A multi-objective Deferred-Acceptance (DA) algorithm guaranteeing the stability of each generated schedule is developed. The algorithm can generate all stable non-dominated points by considering the trade-off between the spilled airline and passenger demand and maximum displacement. The work conducted in this thesis addresses several problem characteristics and sheds light on their implications for ASA decision-making, hence having the potential to improve ASA decision-making. Our findings suggest that the consideration of airlinesâ timing flexibility (research question I) results in improved capacity utilisation and scheduling efficiency. The endogenous consideration of the ability of the airportâs infrastructure to adapt to the characteristics of airline demand (research question II) enables a more efficient representation of airport declared capacity that results in the scheduling of additional requests. The concurrent consideration of airlinesâ timing flexibility and the endogenous adaptation of airport resources to airline demand achieves an improved alignment between the airport infrastructure and the characteristics of airline demand, ergo proposing schedules of improved efficiency. The modelling and evaluation of the peak operational delays associated with the different airport schedules (research question III) provides allows the study of the implications of strategic ASA decision-making for operations and quantifies the impact of the airportâs declared capacity on each scheduleâs operational performance. In considering the preferences of the relevant ASA stakeholders (airlines, coordinators, airport, and air traffic authorities) concerning multiple operational and strategic ASA efficiency metrics (research question IV) the thesis assesses the impact of alternative preference considerations and indicates a commonly preferred schedule that balances the stakeholdersâ preferences. The proposition of representative subsets of alternative schedules reduces decision-complexity without significantly compromising the quality of the alternatives offered to the decision-making process (research question IV). The modelling of the ASA as a two-sided matching game (research question V), results in stable schedules consisting of request-to-slot assignments that provide no incentive to airlines and coordinators to reject or alter the proposed timings. Furthermore, the proposition of stable schedules results in more intensive use of airport capacity, while simultaneously improving scheduling efficiency. The models and algorithms developed as part of this thesis are tested using airline requests and airport capacity data from coordinated airports. Computational results that are relevant to the context of the considered airport instances provide evidence on the potential improvements for the current ASA process and facilitate data-driven policy and decision-making. In particular, with regards to the alignment of airline demand with the capabilities of the airportâs infrastructure (questions I and II), computational results report improved slot allocation efficiency and airport capacity utilisation, which for the considered airport instance translate to improvements ranging between 5-24% for various schedule performance metrics. In reducing the difficulty associated with the assessment of multiple ASA solutions by the stakeholders (question IV), instance-specific results suggest reductions to the number of alternative schedules by 87%, while maintaining the quality of the solutions presented to the stakeholders above 70% (expressed in relation to the initially considered set of schedules). Meanwhile, computational results suggest that the concurrent consideration of ASA stakeholdersâ preferences (research question IV) with regards to both operational (research question III) and strategic performance metrics leads to alternative airport slot scheduling solutions that inform on the trade-offs between the schedulesâ operational and strategic performance and the stakeholdersâ preferences. Concerning research question (V), the application of SASAM and the DA algorithm suggest improvements to the number of unaccommodated flights and passengers (13 and 40% improvements) at the expense of requests concerning fewer passengers and days of operations (increasing the number of rejected requests by 1.2% in relation to the total number of submitted requests). The research conducted in this thesis aids in the identification of limitations that should be addressed by future studies to further improve ASA decision-making. First, the thesis focuses on exact solution approaches that consider the landside and airside infrastructure of the airport and generate multiple schedules. The proposition of pre-processing techniques that identify the bottleneck of the airportâs capacity, i.e., landside and/or airside, can be used to reduce the size of the proposed formulations and improve the required computational times. Meanwhile, the development of multi-objective heuristic algorithms that consider several problem characteristics and generate multiple efficient schedules in reasonable computational times, could extend the capabilities of the models propositioned in this thesis and provide decision support for some of the worldâs most congested airports. Furthermore, the thesis models and evaluates the operational implications of strategic airport slot scheduling decisions. The explicit consideration of operational delays as an objective in ASA optimisation models and algorithms is an issue that merits investigation since it may further improve the operational performance of the generated schedules. In accordance with current practice, the models proposed in this work have considered deterministic capacity parameters. Perhaps, future research could propose formulations that consider stochastic representations of airport declared capacity and improve strategic ASA decision-making through the anticipation of operational uncertainty and weather-induced capacity reductions. Finally, in modelling airlinesâ utility for each submitted request and available time slot the thesis proposes time-dependent functions that utilise available data to approximate airlinesâ scheduling preferences. Future studies wishing to improve the accuracy of the proposed functions could utilise commercial data sources that provide route-specific information; or in cases that such data is unavailable, employ data mining and machine learning methodologies to extract airlinesâ time-dependent utility and preferences
From wallet to mobile: exploring how mobile payments create customer value in the service experience
This study explores how mobile proximity payments (MPP) (e.g., Apple Pay) create customer value in the service experience compared to traditional payment methods (e.g. cash and card). The main objectives were firstly to understand how customer value manifests as an outcome in the MPP service experience, and secondly to understand how the customer activities in the process of using MPP create customer value. To achieve these objectives a conceptual framework is built upon the Grönroos-Voima Value Model (Grönroos and Voima, 2013), and uses the Theory of Consumption Value (Sheth et al., 1991) to determine the customer value constructs for MPP, which is complimented with Script theory (Abelson, 1981) to determine the value creating activities the consumer does in the process of paying with MPP.
The study uses a sequential exploratory mixed methods design, wherein the first qualitative stage uses two methods, self-observations (n=200) and semi-structured interviews (n=18). The subsequent second quantitative stage uses an online survey (n=441) and Structural Equation Modelling analysis to further examine the relationships and effect between the value creating activities and customer value constructs identified in stage one. The academic contributions include the development of a model of mobile payment services value creation in the service experience, introducing the concept of in-use barriers which occur after adoption and constrains the consumers existing use of MPP, and revealing the importance of the mobile in-hand momentary condition as an antecedent state. Additionally, the customer value perspective of this thesis demonstrates an alternative to the dominant Information Technology approaches to researching mobile payments and broadens the view of technology from purely an object a user interacts with to an object that is immersed in consumersâ daily life
Recommended from our members
Mixture Models in Machine Learning
Modeling with mixtures is a powerful method in the statistical toolkit that can be used for representing the presence of sub-populations within an overall population. In many applications ranging from financial models to genetics, a mixture model is used to fit the data. The primary difficulty in learning mixture models is that the observed data set does not identify the sub-population to which an individual observation belongs. Despite being studied for more than a century, the theoretical guarantees of mixture models remain unknown for several important settings.
In this thesis, we look at three groups of problems. The first part is aimed at estimating the parameters of a mixture of simple distributions. We ask the following question: How many samples are necessary and sufficient to learn the latent parameters? We propose several approaches for this problem that include complex analytic tools to connect statistical distances between pairs of mixtures with the characteristic function. We show sufficient sample complexity guarantees for mixtures of popular distributions (including Gaussian, Poisson and Geometric). For many distributions, our results provide the first sample complexity guarantees for parameter estimation in the corresponding mixture. Using these techniques, we also provide improved lower bounds on the Total Variation distance between Gaussian mixtures with two components and demonstrate new results in some sequence reconstruction problems.
In the second part, we study Mixtures of Sparse Linear Regressions where the goal is to learn the best set of linear relationships between the scalar responses (i.e., labels) and the explanatory variables (i.e., features). We focus on a scenario where a learner is able to choose the features to get the labels. To tackle the high dimensionality of data, we further assume that the linear maps are also sparse , i.e., have only few prominent features among many. For this setting, we devise algorithms with sub-linear (as a function of the dimension) sample complexity guarantees that are also robust to noise.
In the final part, we study Mixtures of Sparse Linear Classifiers in the same setting as above. Given a set of features and the binary labels, the objective of this task is to find a set of hyperplanes in the space of features such that for any (feature, label) pair, there exists a hyperplane in the set that justifies the mapping. We devise efficient algorithms with sub-linear sample complexity guarantees for learning the unknown hyperplanes under similar sparsity assumptions as above. To that end, we propose several novel techniques that include tensor decomposition methods and combinatorial designs
- âŠ