4,172 research outputs found

    Graduate Catalog of Studies, 2023-2024

    Deep generative models for network data synthesis and monitoring

    Measurement and monitoring are fundamental tasks in all networks, enabling downstream management and optimization. Although networks inherently generate abundant monitoring data, accessing and effectively measuring those data is another story. The challenges span many aspects. First, network monitoring data are inaccessible to external users, and it is hard to provide a high-fidelity dataset without leaking commercially sensitive information. Second, effective data collection covering a large-scale network system can be very expensive, given the growing size of networks, e.g., the number of cells in a radio network or the number of flows in an Internet Service Provider (ISP) network. Third, it is difficult to ensure fidelity and efficiency simultaneously in network monitoring, as the resources available in network elements to support measurement functions are too limited to implement sophisticated mechanisms. Finally, understanding and explaining the behavior of the network becomes challenging due to its size and complex structure. Various emerging optimization-based solutions (e.g., compressive sensing) and data-driven solutions (e.g., deep learning) have been proposed for these challenges, but the fidelity and efficiency of existing methods cannot yet meet current network requirements. The contributions made in this thesis significantly advance the state of the art in network measurement and monitoring techniques. Throughout the thesis, we leverage cutting-edge machine learning technology: deep generative modeling. First, we design and realize APPSHOT, an efficient city-scale network traffic sharing system built on a conditional generative model, which requires only open-source contextual data during inference (e.g., land-use information and population distribution). Second, we develop an efficient drive-testing system, GENDT, based on a generative model that combines graph neural networks, conditional generation, and quantified model uncertainty to enhance the efficiency of mobile drive testing. Third, we design and implement DISTILGAN, a high-fidelity, efficient, versatile, and real-time network telemetry system with latent GANs and spectral-temporal networks. Finally, we propose SPOTLIGHT, an accurate, explainable, and efficient anomaly detection system for the Open RAN (Radio Access Network). The lessons learned through this research are summarized, and interesting topics are discussed for future work in this domain. All proposed solutions have been evaluated with real-world datasets and applied to support different applications in real systems.
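
As a concrete illustration of the conditional-generation idea behind APPSHOT, here is a minimal sketch of a generator that maps random noise plus public contextual features to synthetic traffic volumes. All names, dimensions, and the specific architecture below are hypothetical illustrations, not the thesis's actual implementation.

```python
# Minimal sketch (hypothetical, not APPSHOT itself): contextual features such
# as land use and population condition the generator, so synthesis at
# inference time needs no access to the original raw measurements.
import torch
import torch.nn as nn

class ConditionalTrafficGenerator(nn.Module):
    def __init__(self, noise_dim=16, context_dim=8, traffic_dim=24):
        super().__init__()
        # noise + context -> one day of hourly traffic volumes for a region
        self.net = nn.Sequential(
            nn.Linear(noise_dim + context_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, traffic_dim),
            nn.Softplus(),  # traffic volumes are non-negative
        )

    def forward(self, z, context):
        return self.net(torch.cat([z, context], dim=-1))

# Inference uses only open contextual data: sample noise, supply context.
gen = ConditionalTrafficGenerator()
z = torch.randn(4, 16)               # 4 synthetic samples
context = torch.rand(4, 8)           # e.g., normalized land-use/population features
synthetic_traffic = gen(z, context)  # shape: (4, 24) hourly volumes
print(synthetic_traffic.shape)
```

In a full conditional GAN or VAE this generator would be trained against real traffic; the point of the sketch is only that, once trained, sampling requires nothing beyond the public context vector.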

    Essays on Panel Data Prediction Models

    Forward-looking analysis is valuable for policymakers, who need effective strategies to mitigate imminent risks and potential challenges. Panel data sets contain time series information over a number of cross-sectional units and are known to have superior predictive abilities compared to time-series-only models. This PhD thesis develops novel panel data methods that contribute to the advancement of short-term forecasting and nowcasting of macroeconomic and environmental variables. The two most important highlights of this thesis are the use of cross-sectional dependence in panel data forecasting and the allowance for timely predictions and ‘nowcasts’. Although panel data models have been found to provide better predictions in many empirical scenarios, forecasting applications so far have not included cross-sectional dependence. On the other hand, cross-sectional dependence is well recognised in large panels and has been explicitly modelled in previous causal studies. A substantial portion of this thesis is devoted to developing cross-sectional dependence in panel models suited to diverse empirical scenarios. The second important aspect of this work is to integrate the asynchronous release schedules of data within and across panel units into the panel models. Most of the thesis emphasises pseudo-real-time predictions, estimating each model only on the data that had been released at the time of prediction and thus replicating the realistic circumstances of delayed data releases. Linear, quantile and non-linear panel models are developed to predict a range of targets, both in terms of their meaning and their method of measurement. Linear models include panel mixed-frequency vector-autoregression and bridge-equation set-ups, which predict GDP growth, inflation and CO2 emissions. Panel quantile regressions and latent-variable discrete choice models predict growth-at-risk and extreme episodes of cross-border capital flows, respectively. The datasets include both international cross-country panels and regional subnational panels. Depending on the nature of the model and the prediction targets, different precision criteria evaluate the accuracy of the models in out-of-sample settings. The generated predictions beat the respective standard benchmarks in a more timely fashion.
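
To make the pseudo-real-time setup concrete, below is a minimal sketch of an expanding-window, out-of-sample forecast with a pooled panel AR(1), in which a release lag keeps each forecast from using data not yet published. It is an illustrative toy, not one of the thesis's models (those use mixed-frequency VARs, bridge equations, and quantile/discrete-choice panels), and every name and number is made up.

```python
# Toy pseudo-real-time forecasting loop: at each forecast origin, the model is
# re-estimated only on observations that would have been released by then.
import numpy as np

rng = np.random.default_rng(0)
N, T = 10, 60                         # 10 countries, 60 quarters
y = np.zeros((N, T))                  # simulate a stationary panel AR(1)
for t in range(1, T):
    y[:, t] = 0.7 * y[:, t - 1] + rng.normal(scale=0.5, size=N)

def pooled_ar1_forecast(y, t, release_lag=1):
    """Forecast y[:, t] using only observations published by forecast time t;
    with release_lag=1 the newest available observation is y[:, t-1]."""
    avail = t - release_lag + 1       # periods 0 .. avail-1 are released
    X = y[:, : avail - 1].ravel()     # pooled lags    y_{i,s-1}
    Y = y[:, 1:avail].ravel()         # pooled targets y_{i,s}
    beta = X @ Y / (X @ X)            # pooled OLS slope (no intercept)
    return beta ** release_lag * y[:, avail - 1]  # iterate the AR(1) forward

# Expanding-window evaluation over the final 10 periods
errors = [y[:, t] - pooled_ar1_forecast(y, t) for t in range(T - 10, T)]
print("out-of-sample RMSE:", np.sqrt(np.mean(np.square(errors))))
```

The `release_lag` argument is the simplest possible stand-in for the asynchronous release schedules the thesis models; in practice each unit and variable would carry its own publication delay.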

    Multidisciplinary perspectives on Artificial Intelligence and the law

    This open access book presents an interdisciplinary, multi-authored, edited collection of chapters on Artificial Intelligence (‘AI’) and the Law. AI technology has come to play a central role in the modern data economy. Through a combination of increased computing power, the growing availability of data and the advancement of algorithms, AI has now become an umbrella term for some of the most transformational technological breakthroughs of this age. The importance of AI stems from both the opportunities that it offers and the challenges that it entails. While AI applications hold the promise of economic growth and efficiency gains, they also create significant risks and uncertainty. The potential and perils of AI have thus come to dominate modern discussions of technology and ethics – and although AI was initially allowed to largely develop without guidelines or rules, few would deny that the law is set to play a fundamental role in shaping the future of AI. As the debate over AI is far from over, the need for rigorous analysis has never been greater. This book thus brings together contributors from different fields and backgrounds to explore how the law might provide answers to some of the most pressing questions raised by AI. An outcome of the Católica Research Centre for the Future of Law and its interdisciplinary working group on Law and Artificial Intelligence, it includes contributions by leading scholars in the fields of technology, ethics and the law.

    Natural and Technological Hazards in Urban Areas

    Natural hazard events and technological accidents are distinct causes of environmental impacts. Natural hazards are physical phenomena that have been active throughout geological time, whereas technological hazards result from actions or facilities created by humans. In our time, combined natural and man-made hazards have also emerged. Overpopulation and urban development in areas prone to natural hazards increase the impact of natural disasters worldwide. Additionally, urban areas are frequently characterized by intense industrial activity and rapid, poorly planned growth that threatens the environment and degrades the quality of life. Therefore, proper urban planning is crucial to minimize fatalities and reduce the environmental and economic impacts that accompany both natural and technological hazardous events.

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    The consolidated European synthesis of CH₄ and N₂O emissions for the European Union and United Kingdom: 1990–2019

    Knowledge of the spatial distribution of the fluxes of greenhouse gases (GHGs) and their temporal variability, as well as flux attribution to natural and anthropogenic processes, is essential to monitoring progress in mitigating anthropogenic emissions under the Paris Agreement and to informing its global stocktake. This study provides a consolidated synthesis of CH₄ and N₂O emissions using bottom-up (BU) and top-down (TD) approaches for the European Union and UK (EU27 + UK) and updates earlier syntheses (Petrescu et al., 2020, 2021). The work integrates updated emission inventory data, process-based model results, data-driven sector model results and inverse modeling estimates, and it extends the previous period of 1990–2017 to 2019. BU and TD products are compared with European national greenhouse gas inventories (NGHGIs) reported by parties under the United Nations Framework Convention on Climate Change (UNFCCC) in 2021. Uncertainties in NGHGIs, as reported to the UNFCCC by the EU and its member states, are also included in the synthesis. Variations in estimates produced with other methods, such as atmospheric inversion models (TD) or spatially disaggregated inventory datasets (BU), arise from diverse sources, including within-model uncertainty related to parameterization as well as structural differences between models. In comparing NGHGIs with the other approaches, a key source of bias is the set of activities included, e.g., anthropogenic versus natural fluxes, which in atmospheric inversions are sensitive to the prior geospatial distribution of emissions. For CH₄ emissions over the updated 2015–2019 period, which covers a sufficiently robust number of overlapping estimates and, most importantly, the NGHGIs, the anthropogenic BU approaches are directly comparable, accounting for mean emissions of 20.5 Tg CH₄ yr⁻¹ (EDGARv6.0, last year 2018) and 18.4 Tg CH₄ yr⁻¹ (GAINS, last year 2015), close to the NGHGI estimate of 17.5±2.1 Tg CH₄ yr⁻¹. TD inversions give higher emission estimates, as they also detect natural emissions. Over the same period, high-resolution regional TD inversions report a mean emission of 34 Tg CH₄ yr⁻¹, while coarser-resolution global-scale TD inversions yield estimates of 23 and 24 Tg CH₄ yr⁻¹ inferred from GOSAT and surface (SURF) network atmospheric measurements, respectively. Natural peatland and mineral soil emissions from the JSBACH–HIMMELI model, natural river, lake and reservoir emissions, geological sources, and biomass burning together amount to about 8 Tg CH₄ yr⁻¹ and could account for the gap between the NGHGIs and the inversions. For N₂O emissions over the 2015–2019 period, both BU products (EDGARv6.0 and GAINS) report a mean anthropogenic emission of 0.9 Tg N₂O yr⁻¹, close to the NGHGI data (0.8 Tg N₂O yr⁻¹, with an uncertainty of ±55 %). Over the same period, the mean of TD global and regional inversions was 1.4 Tg N₂O yr⁻¹ (excluding TOMCAT, which reported no data). The TD and BU comparison method defined in this study can be operationalized for future annual updates of the CH₄ and N₂O budgets at the national and EU27 + UK scales. Future comparability will be enhanced by analysis at finer temporal resolutions and estimation of emissions over intra-annual timescales, which is of great importance for CH₄ and N₂O and may help identify sector contributions to the divergence between prior and posterior estimates at the annual and/or inter-annual scale. Even though comparison between CH₄ and N₂O inversion estimates and NGHGIs is currently highly uncertain because of the large spread in the inversion results, TD inversions inferred from atmospheric observations represent the most independent data against which inventory totals can be compared. With anticipated improvements in atmospheric modeling and observations, as well as in the modeling of natural fluxes, TD inversions may arguably emerge as the most powerful tool for verifying emission inventories for CH₄, N₂O and other GHGs. The referenced datasets related to the figures are visualized at https://doi.org/10.5281/zenodo.7553800 (Petrescu et al., 2023).
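
The reconciliation argument above can be made explicit using only the numbers quoted in the abstract; the arithmetic below is purely illustrative, not an additional result of the study.

```latex
% Back-of-the-envelope check, CH4 over 2015--2019, figures from the abstract:
\[
  \underbrace{(23\ \text{to}\ 24)}_{\text{global TD inversions}}
  - \underbrace{17.5}_{\text{NGHGI}}
  \approx 5.5\ \text{to}\ 6.5\ \mathrm{Tg\,CH_4\,yr^{-1}}
\]
\[
  \text{natural and other non-inventory fluxes} \approx 8\ \mathrm{Tg\,CH_4\,yr^{-1}}
\]
```

On these figures the natural fluxes are of the right magnitude to close the gap for the coarse global inversions, whereas the larger regional-inversion gap (34 − 17.5 ≈ 16.5 Tg CH₄ yr⁻¹) would be only partially explained, which is why the attribution of the BU/TD difference to natural sources remains plausible but uncertain.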

    Evaluation of mixed microalgae species biorefinery of Desmodesmus sp. and Scenedesmus sp. for bioproducts synthesis

    Microalgae are known to produce numerous bioactive compounds, for instance proteins, fatty acids, polysaccharides, enzymes, sterols, and antioxidants. Due to their valuable biochemical composition, microalgae are regarded as a very intriguing source for novel food products and can be utilised to improve the nutritional content of traditional foods. Additionally, microalgae are used as animal feed and as additives in the cosmetics, pharmaceutical and nutraceutical industries. Compared to terrestrial plants and other microorganisms, microalgae possess several advantages: (1) a rapid growth rate; (2) the ability to grow on non-arable land and under harsh cultivation conditions; (3) low nutritional requirements; (4) high productivity; and (5) reduced emission of carbon dioxide. Despite the large number of microalgae species found in nature, only a few species, such as Chlorella sp., Spirulina sp., Haematococcus pluvialis, Nannochloropsis sp. and Chlamydomonas reinhardtii, have been identified and commercialized, which is one of the major obstacles preventing the full utilisation of microalgae-based technology. This thesis provides information on the overall composition of the mixed microalgae species Desmodesmus sp. and Scenedesmus sp., covering protein, carbohydrate, lipid, antioxidants, and pigment. The thesis first introduces the application of triphasic partitioning (TPP) in the extraction and partitioning of biomolecules from the microalgae; the technology has evolved from liquid biphasic flotation (LBF) to TPP. In TPP, t-butanol and ammonium sulphate are used to precipitate the desired biomolecules from the aqueous solution through the formation of three layers. TPP is a simple, time- and cost-efficient, as well as scalable, process that does not require toxic organic solvents. Lipase is abundantly produced by microbes, bacteria, fungi, yeast, mammals, and plants, and is widely used in the oleochemical, detergent, dairy, leather, paper, cosmetics, and nutraceutical industries. Therefore, this thesis also discusses the possibility of identifying and extracting the enzyme lipase from the microalgae using LBF. Several parameters (volume and concentration of solvents, weight of biomass, flotation kinetics, solvent types, etc.) have been investigated to optimize lipase extraction with LBF. Chlorophyll is the main pigment present in microalgae. This work therefore proposes a digital imaging approach to determine the chlorophyll concentration in microalgae rapidly, because the chlorophyll content is a significant indicator of the physiological health status of the microalgae and also matters for the production of by-products. Lastly, microalgae oil can be used as a feedstock for biodiesel as well as for nutraceutical, pharmaceutical, and health-care products. A challenge in lipid extraction is the co-extraction of chlorophyll into the oil, which can have serious consequences for downstream processing. Therefore, the removal of chlorophyll from the microalgae using activated clay or sodium chlorite as a pre-treatment procedure is examined. The research achievements of these works and future opportunities are highlighted in the last chapter of the thesis.
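
The abstract names a digital-imaging route to chlorophyll estimation without detailing it. One common way such methods work is to map a green-dominance colour index computed from an RGB photograph onto concentration via a lab-fitted calibration line; the sketch below assumes exactly that, and every function name and calibration constant in it is an illustrative placeholder, not the thesis's method.

```python
# Hypothetical sketch of image-based chlorophyll estimation: a simple
# green-excess index (2G - R - B) is mapped to concentration via a
# calibration line fitted beforehand against lab measurements.
import numpy as np
from PIL import Image

def green_index(path: str) -> float:
    """Mean normalized green excess, 2G - R - B, over the whole image."""
    rgb = np.asarray(Image.open(path).convert("RGB"), dtype=float) / 255.0
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return float(np.mean(2 * g - r - b))

# Calibration constants are made-up placeholders; in practice they come from
# regressing lab-measured chlorophyll on the index for reference cultures.
SLOPE, INTERCEPT = 42.0, 1.5  # mg/L per index unit; illustrative only

def chlorophyll_mg_per_l(path: str) -> float:
    return SLOPE * green_index(path) + INTERCEPT

# Usage: chlorophyll_mg_per_l("culture_photo.jpg")
```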

    Less is More: Restricted Representations for Better Interpretability and Generalizability

    Deep neural networks are prevalent in supervised learning for a wide range of tasks such as image classification, machine translation and even scientific discovery. Their success often comes at the expense of interpretability and generalizability. The increasing complexity of models and the involvement of pre-training make the lack of explainability ever more pressing, while outstanding performance when labeled data are abundant, paired with a tendency to overfit when labeled data are limited, demonstrates how difficult it is for deep neural networks to generalize to different datasets. This thesis aims to improve interpretability and generalizability by restricting representations. We approach interpretability through attribution analysis, to understand which features contribute to BERT's predictions, and generalizability through methods that are effective in a low-data regime. We consider two strategies for restricting representations: (1) adding a bottleneck, and (2) introducing compression. Given input x, suppose we want to learn y through a latent representation z (i.e. x→z→y): adding a bottleneck means adding a function R such that L(R(z)) < L(z), and introducing compression means adding a function R such that L(R(y)) < L(y), where L denotes the number of bits. In other words, the restriction is added either in the middle of the pipeline or at its end. We first introduce how adding an information bottleneck can help attribution analysis and apply it to investigate BERT's behavior on text classification in Chapter 3. We then extend this attribution method to analyze passage reranking in Chapter 4, where we conduct a detailed analysis of cross-layer and cross-passage behavior. Adding a bottleneck can not only provide insight into deep neural networks but can also increase generalizability. In Chapter 5, we demonstrate the equivalence between adding a bottleneck and performing neural compression. We then leverage this finding in a framework called Non-Parametric learning by Compression with Latent Variables (NPC-LV) and show how optimized neural compressors can be used for non-parametric image classification with few labeled data. To further investigate how compression alone helps non-parametric learning without latent variables (NPC), we carry out experiments with the universal compressor gzip on text classification in Chapter 6. In Chapter 7, we elucidate methods that adopt the perspective of compression without the actual process of compressing, using T5. Using experimental results in passage reranking, we show that our method is highly effective in a low-data regime where only one thousand query-passage pairs are available. Beyond the weakly supervised scenario, we also extend our method to large language models like GPT under almost no supervision, in one-shot and zero-shot settings. The experiments show that, without extra parameters or in-context learning, GPT can be used for semantic similarity, text classification, and text ranking, outperforming strong baselines; this is presented in Chapter 8. The thesis thus tackles two big challenges in machine learning, "interpretability" and "generalizability", through restricted representations. We provide both theoretical derivations and empirical results to show the effectiveness of information-theoretic approaches. We not only design new algorithms but also provide numerous insights into why and how "compression" is so important for understanding deep neural networks and improving generalizability.
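
Chapter 6's compressor-based classification (NPC with gzip) lends itself to a compact illustration: gzip's compressed length plays the role of L(·), and a nearest-neighbour vote under the normalized compression distance assigns labels. The sketch below is a minimal toy version; the dataset and the exact distance and voting choices (1-NN, space-joined concatenation) are illustrative assumptions rather than the thesis's precise setup.

```python
# Minimal sketch of compression-based non-parametric text classification:
# similar texts compress better together, so the normalized compression
# distance (NCD) acts as a parameter-free similarity for nearest-neighbour.
import gzip

def clen(s: str) -> int:
    """Compressed length in bytes: gzip as a stand-in for L(.)."""
    return len(gzip.compress(s.encode()))

def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings."""
    ca, cb, cab = clen(a), clen(b), clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

train = [
    ("the team won the match in extra time", "sports"),
    ("stocks fell as inflation data surprised markets", "finance"),
    ("the striker scored twice in the final", "sports"),
    ("the central bank raised interest rates again", "finance"),
]

def classify(text: str) -> str:
    """1-NN: label of the training text closest to `text` under NCD."""
    return min(train, key=lambda pair: ncd(text, pair[0]))[1]

print(classify("the goalkeeper saved a penalty"))           # 1-NN label by NCD
print(classify("bond yields climbed after the rate hike"))  # 1-NN label by NCD
```

No parameters are learned and no representation is trained; the compressor alone supplies the notion of similarity, which is what makes the method attractive in low-data regimes.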