
    Advancing Model Pruning via Bi-level Optimization

    The deployment constraints in practical applications necessitate the pruning of large-scale deep learning models, i.e., promoting their weight sparsity. As illustrated by the Lottery Ticket Hypothesis (LTH), pruning also has the potential of improving their generalization ability. At the core of LTH, iterative magnitude pruning (IMP) is the predominant method for successfully finding 'winning tickets'. Yet, the computation cost of IMP grows prohibitively as the targeted pruning ratio increases. To reduce this overhead, various efficient 'one-shot' pruning methods have been developed, but these schemes are usually unable to find winning tickets as good as IMP's. This raises the question of how to close the gap between pruning accuracy and pruning efficiency. To tackle it, we pursue the algorithmic advancement of model pruning. Specifically, we formulate the pruning problem from a fresh viewpoint: bi-level optimization (BLO). We show that the BLO interpretation provides a technically grounded optimization basis for an efficient implementation of the pruning-retraining learning paradigm used in IMP. We also show that the proposed bi-level optimization-oriented pruning method (termed BiP) is a special class of BLO problems with a bi-linear problem structure. By leveraging this bi-linearity, we theoretically show that BiP can be solved as easily as first-order optimization, thus inheriting its computational efficiency. Through extensive experiments on both structured and unstructured pruning with 5 model architectures and 4 datasets, we demonstrate that BiP can find better winning tickets than IMP in most cases while being computationally as efficient as one-shot pruning schemes, achieving a 2-7x speedup over IMP for the same level of model accuracy and sparsity.
    Comment: Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)
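    To make the bi-level structure concrete, the sketch below shows a crude alternating first-order approximation on a toy logistic-regression problem: the lower level retrains weights W under the current soft mask, and the upper level updates mask scores s through the bi-linear coupling m * W. This is an assumption-laden illustration in PyTorch, not the authors' BiP algorithm or released code; the toy data, learning rates, and top-k hardening are all hypothetical.

        import torch

        torch.manual_seed(0)
        X = torch.randn(256, 20)                    # toy data (assumption)
        y = (X[:, :1] > 0).float()                  # toy binary labels
        W = torch.randn(20, 1, requires_grad=True)  # lower-level: model weights
        s = torch.zeros(20, 1, requires_grad=True)  # upper-level: mask scores
        w_opt = torch.optim.SGD([W], lr=0.1)
        s_opt = torch.optim.SGD([s], lr=0.1)
        bce = torch.nn.BCEWithLogitsLoss()

        for step in range(200):
            m = torch.sigmoid(s)          # soft relaxation of the binary mask
            # Lower level: a few retraining steps of W under the frozen mask
            for _ in range(3):
                loss = bce(X @ (m.detach() * W), y)
                w_opt.zero_grad(); loss.backward(); w_opt.step()
            # Upper level: update mask scores through the bi-linear term m * W
            loss = bce(X @ (m * W.detach()), y)
            s_opt.zero_grad(); loss.backward(); s_opt.step()

        # Harden the mask: keep the k highest-scoring weights, prune the rest
        k = 5
        hard = torch.zeros(20)
        hard[s.flatten().topk(k).indices] = 1.0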

    Learning Spiking Neural Systems with the Event-Driven Forward-Forward Process

    We develop a novel credit assignment algorithm for information processing with spiking neurons that requires no feedback synapses. Specifically, we propose an event-driven generalization of the forward-forward and predictive forward-forward learning processes for a spiking neural system that iteratively processes sensory input over a stimulus window. As a result, the recurrent circuit computes the membrane potential of each neuron in each layer as a function of local bottom-up, top-down, and lateral signals, facilitating a dynamic, layer-wise parallel form of neural computation. Unlike spiking neural coding, which relies on feedback synapses to adjust neural electrical activity, our model operates purely online and forward in time, offering a promising way to learn distributed representations of sensory data patterns with temporal spike signals. Notably, our experimental results on several pattern datasets demonstrate that the event-driven forward-forward (ED-FF) framework works well for training a dynamic recurrent spiking system capable of both classification and reconstruction.
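    The underlying forward-forward principle can be illustrated with a minimal, non-spiking, single-layer sketch: the layer is trained locally to drive a 'goodness' score (sum of squared activities) above a threshold for positive samples and below it for negative ones, with no backward pass across layers. This PyTorch sketch is an assumption-based illustration of that principle only; the event-driven spiking variant adds membrane-potential dynamics and top-down/lateral signals not shown here, and the random tensors stand in for real positive and negative samples.

        import torch

        torch.manual_seed(0)
        W = (0.01 * torch.randn(784, 500)).requires_grad_()
        opt = torch.optim.Adam([W], lr=1e-3)
        theta = 2.0                               # goodness threshold

        def goodness(h):
            return (h ** 2).sum(dim=1)            # sum of squared activities

        for step in range(100):
            x_pos = torch.randn(64, 784)          # stand-in: real data
            x_neg = torch.randn(64, 784)          # stand-in: negative data
            h_pos = torch.relu(x_pos @ W)
            h_neg = torch.relu(x_neg @ W)
            # Local objective: positive goodness above theta, negative below
            loss = torch.nn.functional.softplus(torch.cat(
                [theta - goodness(h_pos), goodness(h_neg) - theta])).mean()
            opt.zero_grad(); loss.backward(); opt.step()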

    Evolutionary Computation in Action: Feature Selection for Deep Embedding Spaces of Gigapixel Pathology Images

    One of the main obstacles to adopting digital pathology is the challenge of efficiently processing hyperdimensional digitized biopsy samples, called whole slide images (WSIs). Exploiting deep learning and introducing compact WSI representations are urgently needed to accelerate image analysis and facilitate the visualization and interpretability of pathology results in a post-pandemic world. In this paper, we introduce a new evolutionary approach for WSI representation based on large-scale multi-objective optimization (LSMOP) of deep embeddings. We start with patch-based sampling to feed KimiaNet, a histopathology-specialized deep network, and to extract a multitude of feature vectors. In the first stage, coarse multi-objective feature selection uses a reduced search-space strategy guided by classification accuracy and the number of features. In the second stage, the frequent features histogram (FFH), a novel WSI representation, is constructed from multiple runs of coarse LSMOP. Fine evolutionary feature selection is then applied to find a compact (short-length) feature vector based on the FFH, contributing to a more robust deep-learning approach to digital pathology supported by the stochastic power of evolutionary algorithms. We validate the proposed schemes using The Cancer Genome Atlas (TCGA) images in terms of WSI representation, classification accuracy, and feature quality. Furthermore, a novel decision space for multi-criteria decision-making in the LSMOP field is introduced. Finally, a patch-level visualization approach is proposed to increase the interpretability of deep features. The proposed evolutionary algorithm finds a very compact feature vector to represent a WSI (almost 14,000 times smaller than the original feature vectors) with 8% higher accuracy compared to the codes provided by state-of-the-art methods.
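    As a toy illustration of evolutionary multi-objective feature selection of the kind described (not the paper's LSMOP algorithm, and with random vectors standing in for KimiaNet embeddings), the sketch below evolves binary feature masks under two objectives, classification accuracy and feature count, keeping non-dominated masks each generation.

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 64))              # stand-in for deep embeddings
        y = (X[:, :3].sum(axis=1) > 0).astype(int)  # toy labels

        def fitness(mask):
            # Objective 1: accuracy of a nearest-centroid classifier on the
            # selected features; Objective 2: number of selected features.
            if mask.sum() == 0:
                return 0.0, 0
            Xs = X[:, mask.astype(bool)]
            c0, c1 = Xs[y == 0].mean(0), Xs[y == 1].mean(0)
            pred = np.linalg.norm(Xs - c1, axis=1) < np.linalg.norm(Xs - c0, axis=1)
            return (pred == y).mean(), int(mask.sum())

        def dominates(a, b):
            # a dominates b: accuracy no worse, size no larger, not identical
            return a[0] >= b[0] and a[1] <= b[1] and a != b

        pop = [rng.integers(0, 2, 64) for _ in range(30)]
        for gen in range(50):
            children = [np.where(rng.random(64) < 0.05, 1 - p, p) for p in pop]
            union = pop + children
            scores = [fitness(m) for m in union]
            # Keep the non-dominated front, fill the rest by accuracy
            # (duplicates are tolerated in this toy sketch)
            front = [m for m, s in zip(union, scores)
                     if not any(dominates(t, s) for t in scores)]
            rest = sorted(zip(union, scores), key=lambda t: -t[1][0])
            pop = (front + [m for m, _ in rest])[:30]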

    Breast mass segmentation from mammograms with deep transfer learning

    Abstract. Mammography is an X-ray imaging method used in breast cancer screening, which is a time-consuming process. Many computer-assisted diagnosis systems have been created to hasten the image analysis. Deep learning is the use of multilayered neural networks for solving different tasks, and deep learning methods are becoming more advanced and popular for segmenting images. One deep transfer learning method is to use these neural networks with pretrained weights, which typically improves network performance. In this thesis, deep transfer learning was used to segment cancerous masses from mammography images. The convolutional neural networks used were pretrained and fine-tuned, and they had an encoder-decoder architecture. The ResNet22 encoder was pretrained with mammography images, while the ResNet34 encoder was pretrained with various colour images. These encoders were paired with either a U-Net or a Feature Pyramid Network decoder. Additionally, a U-Net model with random initialization was also tested. The five different models were trained and tested on the Oulu Dataset of Screening Mammography (9204 images) and on the Portuguese INbreast dataset (410 images) with two different loss functions: binary cross-entropy loss with soft Jaccard loss, and a loss function based on the focal Tversky index. The best models were trained on the Oulu Dataset of Screening Mammography with the focal Tversky loss. The best segmentation result achieved was a Dice similarity coefficient of 0.816 on correctly segmented masses and a classification accuracy of 88.7% on the INbreast dataset. On the Oulu Dataset of Screening Mammography, the best results were a Dice score of 0.763 and a classification accuracy of 83.3%. The results between the pretrained models were similar, and the pretrained models had better results than the non-pretrained models. In conclusion, deep transfer learning is very suitable for mammography mass segmentation, and the choice of loss function had a large impact on the results.
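    Since the choice of loss function proved decisive, the sketch below shows a common formulation of the focal Tversky loss in PyTorch; the alpha, beta, and gamma values are illustrative defaults from the literature, not the thesis's exact hyperparameters.

        import torch

        def focal_tversky_loss(probs, target, alpha=0.7, beta=0.3,
                               gamma=0.75, eps=1e-6):
            # probs: sigmoid outputs, target: binary masks, both (N, 1, H, W).
            # alpha > beta weights false negatives more heavily, which suits
            # the small-mass class imbalance typical of mammograms.
            p, t = probs.flatten(1), target.flatten(1)
            tp = (p * t).sum(dim=1)
            fp = (p * (1 - t)).sum(dim=1)
            fn = ((1 - p) * t).sum(dim=1)
            tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
            return ((1 - tversky) ** gamma).mean()

        # Usage: loss = focal_tversky_loss(torch.sigmoid(logits), masks)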

    Statistical-dynamical analyses and modelling of multi-scale ocean variability

    This thesis aims to provide a comprehensive analysis of multi-scale oceanic variability using various statistical and dynamical tools and to explore data-driven methods for the correct statistical emulation of the oceans. We considered the classical, wind-driven, double-gyre ocean circulation model in the quasi-geostrophic approximation and obtained its eddy-resolving solutions in terms of potential vorticity anomaly (PVA) and geostrophic streamfunctions. The reference solutions possess two asymmetric gyres of opposite circulation and a strong meandering eastward jet separating them, with rich eddy activity around it, analogous to the Gulf Stream in the North Atlantic and the Kuroshio in the North Pacific. This thesis is divided into two parts. The first part discusses a novel scale-separation method based on local spatial correlations, called correlation-based decomposition (CBD), and provides a comprehensive analysis of mesoscale eddy forcing. In particular, we analyse the instantaneous and time-lagged interactions between the diagnosed eddy forcing and the evolving large-scale PVA using novel 'product integral' characteristics. The product-integral time series uncover robust causality between two drastically different yet interacting flow quantities, termed 'eddy backscatter'. We also show data-driven augmentation of non-eddy-resolving ocean models by feeding them the eddy fields to restore the missing eddy-driven features, such as the merging western boundary currents, their eastward extension, and the low-frequency variability of the gyres. In the second part, we present a systematic inter-comparison of linear regression (LR), stochastic, and deep-learning methods to build low-cost reduced-order statistical emulators of the oceans. We obtain forecasts on seasonal and centennial timescales and assess them for skill, cost, and complexity. We found that the multi-level linear stochastic model performs best, followed by the hybrid stochastically-augmented deep-learning models. The superiority of these methods underscores the importance of incorporating core dynamics, memory effects, and model errors for the robust emulation of multi-scale dynamical systems, such as the oceans.
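    A minimal sketch of one possible two-level linear stochastic emulator, in the spirit of the multi-level models compared in the second part, is given below; the least-squares fitting, Euler stepping, and noise model are illustrative assumptions rather than the thesis's exact formulation.

        import numpy as np

        rng = np.random.default_rng(0)

        def fit_two_level(X, dt):
            # Level 1: linear propagator for the resolved state (row-vector
            # convention), dx/dt ~ x @ A + r, with r the unresolved tendency.
            dX = (X[1:] - X[:-1]) / dt
            A, *_ = np.linalg.lstsq(X[:-1], dX, rcond=None)
            r = dX - X[:-1] @ A
            # Level 2: linear model for the residual, driven by white noise.
            dr = (r[1:] - r[:-1]) / dt
            B, *_ = np.linalg.lstsq(r[:-1], dr, rcond=None)
            noise_cov = np.cov((dr - r[:-1] @ B).T) * dt
            return A, B, noise_cov

        def emulate(A, B, noise_cov, x0, r0, dt, n_steps):
            L = np.linalg.cholesky(noise_cov + 1e-12 * np.eye(len(noise_cov)))
            x, r, traj = x0.copy(), r0.copy(), [x0.copy()]
            for _ in range(n_steps):
                r = r + dt * (r @ B) + np.sqrt(dt) * (L @ rng.normal(size=len(r)))
                x = x + dt * (x @ A + r)
                traj.append(x.copy())
            return np.array(traj)

        # Usage (toy): traj = emulate(*fit_two_level(X_train, 0.1), x0, r0, 0.1, 1000)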

    Image classification over unknown and anomalous domains

    A longstanding goal in computer vision research is to develop methods that are simultaneously applicable to a broad range of prediction problems. In contrast to this, models often perform best when they are specialized to some task or data type. This thesis investigates the challenges of learning models that generalize well over multiple unknown or anomalous modes and domains in data, and presents new solutions for learning robustly in this setting. Initial investigations focus on normalization for distributions that contain multiple sources (e.g., images in different styles like cartoons or photos). Experiments demonstrate the extent to which existing modules, batch normalization in particular, struggle with such heterogeneous data, and a new solution is proposed that can better handle data from multiple visual modes by using differing sample statistics for each. While ideas to counter the over-specialization of models have been formulated in sub-disciplines of transfer learning, e.g., multi-domain and multi-task learning, these usually rely on the existence of meta-information, such as task or domain labels. Relaxing this assumption gives rise to a new transfer learning setting, called latent domain learning in this thesis, in which training and inference are carried out over data from multiple visual domains without domain-level annotations. Customized solutions are required for this, as the performance of standard models degrades: a new data augmentation technique that interpolates between latent domains in an unsupervised way is presented, alongside a dedicated module that sparsely accounts for hidden domains in data without requiring domain labels to do so. In addition, the thesis studies the problem of classifying previously unseen or anomalous modes in data, a fundamental problem in one-class learning and anomaly detection in particular. While recent ideas have focused on developing self-supervised solutions for the one-class setting, this thesis formulates new methods based on transfer learning. Extensive experimental evidence demonstrates that a transfer-based perspective benefits new problems that have recently been proposed in the anomaly detection literature, in particular challenging semantic detection tasks.
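    One way to realise 'differing sample statistics for each mode' is a normalisation layer that mixes several candidate normalisers through a soft, label-free gate; the PyTorch sketch below is a hypothetical illustration of that idea, not the module proposed in the thesis.

        import torch
        import torch.nn as nn

        class MultiModeNorm2d(nn.Module):
            # Hypothetical sketch: K batch-norm branches mixed by a soft,
            # label-free gate. A full per-mode-statistics module would weight
            # the statistics themselves; this soft mixture is a simplification.
            def __init__(self, channels, k_modes=4):
                super().__init__()
                self.norms = nn.ModuleList(
                    nn.BatchNorm2d(channels) for _ in range(k_modes))
                self.gate = nn.Linear(channels, k_modes)

            def forward(self, x):
                # Soft assignment of each image to a latent mode, gated on
                # its per-channel mean activations (no domain labels needed)
                w = torch.softmax(self.gate(x.mean(dim=(2, 3))), dim=1)
                ys = torch.stack([n(x) for n in self.norms], dim=1)
                return (w[:, :, None, None, None] * ys).sum(dim=1)

        x = torch.randn(8, 32, 16, 16)
        print(MultiModeNorm2d(32)(x).shape)  # torch.Size([8, 32, 16, 16])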

    Modelling and Solving the Single-Airport Slot Allocation Problem

    Currently, there are about 200 overly congested airports where airport capacity does not suffice to accommodate airline demand. These airports play a critical role in the global air transport system, since they concern 40% of global passenger demand and act as a bottleneck for the entire system. This imbalance between airport capacity and airline demand leads to excessive delays, as well as multi-billion economic and huge environmental and societal costs. Concurrently, the implementation of airport capacity expansion projects requires time and space, and is subject to significant resistance from local communities. As a short- to medium-term response, Airport Slot Allocation (ASA) has been used as the main demand management mechanism. The main goal of this thesis is to improve ASA decision-making through the proposition of models and algorithms that provide enhanced ASA decision support. In doing so, this thesis is organised into three distinct chapters that shed light on the following questions (I-V), which remain unaddressed by the existing literature. In parentheses, we identify the chapters of this thesis that relate to each research question.

    I. How to improve the modelling of airline demand flexibility and the utility that each airline assigns to each available airport slot? (Chapters 2 and 4)
    II. How can one model the dynamic and endogenous adaptation of the airport's landside and airside infrastructure to the characteristics of airline demand? (Chapter 2)
    III. How to consider operational delays in strategic ASA decision-making? (Chapter 3)
    IV. How to involve the pertinent stakeholders in the ASA decision-making process to select a commonly agreed schedule; and how can one reduce the inherent decision complexity without compromising the quality and diversity of the schedules presented to the decision-makers? (Chapter 3)
    V. Given that the ASA process involves airlines (submitting requests for slots) and coordinators (assigning slots to requests based on a set of rules and priorities), how can one jointly consider the interactions between these two sides to improve ASA decision-making? (Chapter 4)

    With regards to research questions (I) and (II), the thesis proposes a Mixed Integer Programming (MIP) model that considers airlines' timing flexibility (research question I) and constraints that enable the dynamic and endogenous allocation of the airport's resources (research question II). The proposed modelling variant addresses several additional problem characteristics and policy rules, and considers multiple efficiency objectives, while integrating all constraints that may affect airport slot scheduling decisions, including the asynchronous use of the different airport resources (runway, aprons, passenger terminal) and the endogenous consideration of the capabilities of the airport's infrastructure to adapt to the airline demand's characteristics and the aircraft/flight type associated with each request. The proposed model is integrated into a two-stage solution approach that considers all primary and several secondary policy rules of ASA. New combinatorial results and valid tightening inequalities that facilitate the solution of the problem are proposed and implemented. An extension of the above MIP model, which considers the trade-offs among schedule displacement, maximum displacement, and the number of displaced requests, is integrated into a multi-objective solution framework.
    The proposed framework holistically considers the preferences of all ASA stakeholder groups (research question IV) concerning multiple performance metrics and models the operational delays associated with each airport schedule (research question III). The delays of each schedule/solution are macroscopically estimated, and a subtractive clustering algorithm and a parameter-tuning routine reduce the inherent decision complexity by pruning non-dominated solutions without compromising the representativeness of the alternatives offered to the decision-makers (research question IV). Following the determination of the representative set, the expected delay estimates of each schedule are further refined by considering the whole airfield's operations, the landside, and the airside infrastructure. The representative schedules are ranked based on the preferences of all ASA stakeholder groups concerning each schedule's displacement-related and operational-delay performance. Finally, in considering the interactions between airlines' timing flexibility and utility, and the policy-based priorities assigned by the coordinator to each request (research question V), the thesis models the ASA problem as a two-sided matching game and provides guarantees on the stability of the proposed schedules. A Stable Airport Slot Allocation Model (SASAM) capitalises on the flexibility considerations introduced for addressing research question (I) through the exploitation of data submitted by the airlines during the ASA process, and provides functions that proxy each request's value by considering both the airlines' timing flexibility for each submitted request and the requests' prioritisation by the coordinators under the policy rules defining the ASA process. The thesis argues for the compliance of the proposed functions with the primary regulatory requirements of the ASA process and demonstrates their applicability for different types of slot requests. SASAM guarantees stability through sets of inequalities that prune allocations blocking the formation of stable schedules. A multi-objective Deferred-Acceptance (DA) algorithm guaranteeing the stability of each generated schedule is developed. The algorithm can generate all stable non-dominated points by considering the trade-off between the spilled airline and passenger demand and maximum displacement. The work conducted in this thesis addresses several problem characteristics and sheds light on their implications for ASA decision-making, hence having the potential to improve ASA decision-making. Our findings suggest that the consideration of airlines' timing flexibility (research question I) results in improved capacity utilisation and scheduling efficiency. The endogenous consideration of the ability of the airport's infrastructure to adapt to the characteristics of airline demand (research question II) enables a more efficient representation of airport declared capacity that results in the scheduling of additional requests. The concurrent consideration of airlines' timing flexibility and the endogenous adaptation of airport resources to airline demand achieves an improved alignment between the airport infrastructure and the characteristics of airline demand, thus proposing schedules of improved efficiency.
    The modelling and evaluation of the peak operational delays associated with the different airport schedules (research question III) allows the study of the implications of strategic ASA decision-making for operations and quantifies the impact of the airport's declared capacity on each schedule's operational performance. In considering the preferences of the relevant ASA stakeholders (airlines, coordinators, airport, and air traffic authorities) concerning multiple operational and strategic ASA efficiency metrics (research question IV), the thesis assesses the impact of alternative preference considerations and indicates a commonly preferred schedule that balances the stakeholders' preferences. The proposition of representative subsets of alternative schedules reduces decision complexity without significantly compromising the quality of the alternatives offered to the decision-making process (research question IV). The modelling of ASA as a two-sided matching game (research question V) results in stable schedules consisting of request-to-slot assignments that provide no incentive to airlines and coordinators to reject or alter the proposed timings. Furthermore, the proposition of stable schedules results in more intensive use of airport capacity, while simultaneously improving scheduling efficiency. The models and algorithms developed as part of this thesis are tested using airline requests and airport capacity data from coordinated airports. Computational results that are relevant to the context of the considered airport instances provide evidence on the potential improvements for the current ASA process and facilitate data-driven policy and decision-making. In particular, with regards to the alignment of airline demand with the capabilities of the airport's infrastructure (questions I and II), computational results report improved slot allocation efficiency and airport capacity utilisation, which for the considered airport instance translate to improvements ranging between 5-24% for various schedule performance metrics. In reducing the difficulty associated with the assessment of multiple ASA solutions by the stakeholders (question IV), instance-specific results suggest reductions in the number of alternative schedules by 87%, while maintaining the quality of the solutions presented to the stakeholders above 70% (expressed in relation to the initially considered set of schedules). Meanwhile, computational results suggest that the concurrent consideration of ASA stakeholders' preferences (research question IV) with regards to both operational (research question III) and strategic performance metrics leads to alternative airport slot scheduling solutions that inform on the trade-offs between the schedules' operational and strategic performance and the stakeholders' preferences. Concerning research question (V), the application of SASAM and the DA algorithm suggests improvements to the number of unaccommodated flights and passengers (13% and 40% improvements, respectively) at the expense of requests concerning fewer passengers and days of operations (increasing the number of rejected requests by 1.2% in relation to the total number of submitted requests). The research conducted in this thesis aids in the identification of limitations that should be addressed by future studies to further improve ASA decision-making. First, the thesis focuses on exact solution approaches that consider the landside and airside infrastructure of the airport and generate multiple schedules.
    The proposition of pre-processing techniques that identify the bottleneck of the airport's capacity, i.e., landside and/or airside, could be used to reduce the size of the proposed formulations and improve the required computational times. Meanwhile, the development of multi-objective heuristic algorithms that consider several problem characteristics and generate multiple efficient schedules in reasonable computational times could extend the capabilities of the models proposed in this thesis and provide decision support for some of the world's most congested airports. Furthermore, the thesis models and evaluates the operational implications of strategic airport slot scheduling decisions. The explicit consideration of operational delays as an objective in ASA optimisation models and algorithms is an issue that merits investigation, since it may further improve the operational performance of the generated schedules. In accordance with current practice, the models proposed in this work have considered deterministic capacity parameters. Future research could propose formulations that consider stochastic representations of airport declared capacity and improve strategic ASA decision-making through the anticipation of operational uncertainty and weather-induced capacity reductions. Finally, in modelling airlines' utility for each submitted request and available time slot, the thesis proposes time-dependent functions that utilise available data to approximate airlines' scheduling preferences. Future studies wishing to improve the accuracy of the proposed functions could utilise commercial data sources that provide route-specific information; or, in cases where such data are unavailable, employ data mining and machine learning methodologies to extract airlines' time-dependent utility and preferences.
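    The two-sided matching perspective described above can be illustrated with a toy capacitated deferred-acceptance loop: requests propose to slots in order of displacement from their requested time, and each slot keeps the highest-priority proposals up to its declared capacity. The instance, priorities, and single displacement objective below are illustrative assumptions; the thesis's SASAM and multi-objective DA algorithm are substantially richer.

        from heapq import heappush, heappop

        # Toy instance (assumptions): 6 slot times, each holding up to its
        # declared capacity; requests carry a coordinator-assigned priority.
        slots = {t: {"cap": 2, "held": []} for t in range(6)}
        requests = {
            "r1": {"time": 2, "priority": 3},
            "r2": {"time": 2, "priority": 2},
            "r3": {"time": 2, "priority": 1},
            "r4": {"time": 3, "priority": 2},
            "r5": {"time": 2, "priority": 2},
        }

        def prefs(req):  # slots ordered by displacement from the requested time
            return sorted(slots, key=lambda t: abs(t - req["time"]))

        free = list(requests)
        nxt = {r: 0 for r in requests}    # next slot each request proposes to
        while free:
            r = free.pop()
            if nxt[r] >= len(slots):
                continue                  # request exhausted its list: rejected
            t = prefs(requests[r])[nxt[r]]
            nxt[r] += 1
            held = slots[t]["held"]
            heappush(held, (requests[r]["priority"], r))  # min-heap on priority
            if len(held) > slots[t]["cap"]:
                _, loser = heappop(held)  # slot drops its lowest-priority holder
                free.append(loser)

        schedule = {r: t for t in slots for _, r in slots[t]["held"]}
        print(schedule)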

    From wallet to mobile: exploring how mobile payments create customer value in the service experience

    This study explores how mobile proximity payments (MPP) (e.g., Apple Pay) create customer value in the service experience compared to traditional payment methods (e.g., cash and card). The main objectives were, firstly, to understand how customer value manifests as an outcome in the MPP service experience, and secondly, to understand how the customer's activities in the process of using MPP create customer value. To achieve these objectives, a conceptual framework is built upon the Grönroos-Voima Value Model (Grönroos and Voima, 2013) and uses the Theory of Consumption Value (Sheth et al., 1991) to determine the customer value constructs for MPP, complemented with script theory (Abelson, 1981) to determine the value-creating activities the consumer performs in the process of paying with MPP. The study uses a sequential exploratory mixed-methods design, wherein the first, qualitative stage uses two methods: self-observations (n=200) and semi-structured interviews (n=18). The subsequent, quantitative stage uses an online survey (n=441) and Structural Equation Modelling analysis to further examine the relationships and effects between the value-creating activities and the customer value constructs identified in stage one. The academic contributions include the development of a model of mobile payment services value creation in the service experience, the introduction of the concept of in-use barriers, which occur after adoption and constrain consumers' existing use of MPP, and the revelation of the importance of the mobile in-hand momentary condition as an antecedent state. Additionally, the customer value perspective of this thesis demonstrates an alternative to the dominant Information Technology approaches to researching mobile payments and broadens the view of technology from purely an object a user interacts with to an object that is immersed in consumers' daily life.
