Deep generative models for network data synthesis and monitoring
Measurement and monitoring are fundamental tasks in all networks, enabling the downstream management and optimization of the network. Although networks inherently produce abundant monitoring data, accessing and measuring that data effectively is another matter. The challenges are manifold. First, network monitoring data is inaccessible to external users, and it is hard to provide a high-fidelity dataset without leaking commercially sensitive information. Second, effective data collection covering a large-scale network system can be very expensive, given the growing size of networks, e.g., the number of cells in a radio network or the number of flows in an Internet Service Provider (ISP) network. Third, it is difficult to ensure fidelity and efficiency simultaneously in network monitoring, as the resources available in network elements to support measurement functions are too limited to implement sophisticated mechanisms. Finally, understanding and explaining the behavior of the network becomes challenging due to its size and complex structure. Various emerging optimization-based solutions (e.g., compressive sensing) and data-driven solutions (e.g., deep learning) have been proposed for these challenges. However, the fidelity and efficiency of existing methods cannot yet meet current network requirements.
The contributions made in this thesis significantly advance the state of the art in
the domain of network measurement and monitoring techniques. Overall, we leverage cutting-edge machine learning technology, namely deep generative modeling, throughout the thesis. First, we design and realize APPSHOT, an efficient city-scale network traffic sharing system built on a conditional generative model, which requires only open-source contextual data (e.g., land-use information and population distribution) at inference time. Second, we develop GENDT, an efficient drive-testing system based on a generative model that combines graph neural networks, conditional generation, and quantified model uncertainty to enhance the efficiency of mobile drive testing. Third, we
design and implement DISTILGAN, a high-fidelity, efficient, versatile, real-time network telemetry system built with latent GANs and spectral-temporal networks. Finally, we propose SPOTLIGHT, an accurate, explainable, and efficient anomaly detection system for Open RAN (Radio Access Network). The lessons learned through this research are summarized, and interesting topics are discussed for future work in this domain. All proposed solutions have been evaluated with real-world datasets and applied to support different applications in real systems.
LIPIcs, Volume 251, ITCS 2023, Complete Volume
Marchenko-Lippmann-Schwinger inversion
Seismic wave reflections recorded at the Earth’s surface provide a rich source of
information about the structure of the subsurface. These reflections occur due to
changes in the material properties of the Earth; in the acoustic approximation, these
are the density of the Earth and the velocity of seismic waves travelling through it.
Therefore, there is a physical relationship between the material properties of the Earth
and the reflected seismic waves that we observe at the surface. This relationship is
non-linear, due to the highly scattering nature of the Earth, and to our inability to
accurately reproduce these scattered waves with the low resolution velocity models
that are usually available to us. Typically, we linearize the scattering problem by
assuming that the waves are singly-scattered, requiring multiple reflections to be
removed from recorded data at great effort and with varying degrees of success. This
assumption is called the Born approximation.
The equation that describes the relationship between the Earth’s properties and
the fully-scattering reflection data is called the Lippmann-Schwinger equation, and
this equation becomes linear if the full scattering wavefield inside the Earth is known. The development of Marchenko methods has made it possible to estimate such wavefields using only the surface reflection data and an estimate of the direct wave from the
surface to each point in the Earth. Substituting the results from a Marchenko method
into the Lippmann-Schwinger equation results in a linear equation that includes all
orders of scattering. The aim of this thesis is to determine whether higher orders
of scattering improve the linear inverse problem from data to velocities, by comparing
linearized inversion under the Born approximation to the inversion of the linear
Lippmann-Schwinger equation.
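In standard operator notation (not necessarily the notation used in the thesis), with G the full Green's function, G_0 the background Green's function, and V the scattering potential, the two relations contrasted above can be written as:

```latex
% Lippmann--Schwinger equation: exact, contains all orders of scattering;
% linear in V once the full field G on the right-hand side is known
G = G_0 + G_0 V G
% Born approximation: replace the full field by the background field,
% keeping only single scattering (linear in V)
G \approx G_0 + G_0 V G_0
```

Marchenko methods supply an estimate of the full field appearing on the right-hand side of the first equation, which is what renders the inverse problem linear without discarding multiple scattering.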
This thesis begins by deriving the linear Lippmann-Schwinger and Born inverse
problems, and reviewing the theoretical basis for Marchenko methods. By deriving the
derivative of the full scattering Green’s function with respect to the model parameters
of the Earth, the gradient direction for a new type of least-squares full waveform
inversion called Marchenko-Lippmann-Schwinger full waveform inversion is defined
that uses all orders of scattering.
By recreating the analytical 1D Born inversion of a boxcar perturbation by Beydoun
and Tarantola (1988), it is shown that a high frequency-sampling density is required
to correctly estimate the amplitude of the velocity perturbation. More importantly,
even when the scattered wavefield is defined to be singly-scattering and the
velocity model perturbation can be found without matrix inversion, Born inversion
cannot reproduce the true velocity structure exactly. When the results of analytical
inversion are compared to inversions where the inverse matrices have been explicitly
calculated, the analytical inversion is found to be superior. All three matrix inversion
methods are found to be extremely ill-posed. With regularisation, it is possible to
accurately determine the edges of the perturbation, but not the amplitude.
Moving from a boxcar perturbation with a homogeneous starting velocity to a
many-layered 1D model and a smooth representation of this model as the starting
point, it is found that the inversion solution is highly dependent on the starting
model. By optimising an iterative inversion in both the model and data domains, it
is found that optimising the velocity model misfit does not guarantee improvement
in the resulting data misfit, and vice versa. Comparing unregularised inversion to
inversions with Tikhonov damping or smoothing applied to the kernel matrix, it is
found that strong Tikhonov damping results in the most accurate velocity models.
From the consistent under-performance of Lippmann-Schwinger inversion when using
Marchenko-derived Green’s functions compared to inversions carried out with true
Green’s functions, it is concluded that the fallibility of Marchenko methods results in
inferior inversion results.
Born and Lippmann-Schwinger inversion are tested on a 2D syncline model. Due
to computational limitations, using all sources and receivers in the inversion required
limiting the number of frequencies to 5. Without regularisation, the model update
is uninterpretable due to the presence of strong oscillations across the model. With
strong Tikhonov damping, the model updates obtained are poorly scaled, have low
resolution, and low amplitude oscillatory noise remains.
By replacing the inversion of all sources simultaneously with single source inversions,
it is possible to reinstate all frequencies within our limited computational
resources. These single source model updates can be stacked similarly to migration
images to improve the overall model update. As predicted by the 1D analytical inversion,
restoring the full frequency bandwidth eliminates the oscillatory noise from
the inverse solution. With or without regularisation, Born and Lippmann-Schwinger
inversion results are found to be nearly identical. When Marchenko-derived Green’s
functions are introduced, the inversion results are worse than either the Born inversion
or the Lippmann-Schwinger inversion without Marchenko methods. On this basis, one
concludes that the inclusion of higher order scattering does not improve the outcome
of solving the linear inverse scattering problem using currently available methods.
Nevertheless, some recent developments in the methods used to solve the Marchenko equation hold promise for improving solutions in the future.
Measuring the impact of COVID-19 on hospital care pathways
Care pathways in hospitals around the world reported significant disruption during the recent COVID-19 pandemic, but measuring the actual impact is more problematic. Process mining can help hospital management measure the conformance of real-life care to what might be considered normal operations. In this study, we aim to demonstrate that process mining can be used to investigate process changes associated with complex disruptive events. We studied perturbations to accident and emergency (A&E) and maternity pathways in a UK public hospital during the COVID-19 pandemic. Coincidentally, the hospital had implemented a Command Centre approach for patient-flow management, affording an opportunity to study both the planned improvement and the disruption due to the pandemic. Our study proposes and demonstrates a method for measuring and investigating the impact of such planned and unplanned disruptions affecting hospital care pathways. We found that during the pandemic, both A&E and maternity pathways had measurable reductions in mean length of stay and a measurable drop in the percentage of pathways conforming to normative models. There were no distinctive patterns in the monthly mean values of length of stay or conformance throughout the phases of the installation of the hospital's new Command Centre approach. Due to a deficit in the available A&E data, the findings for A&E pathways could not be interpreted.
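To illustrate the kind of conformance measurement this study relies on, the sketch below computes the percentage of event traces that exactly follow a simple normative pathway. This is a deliberately minimal stand-in with made-up activity names; the study itself uses process mining tooling and richer normative models.

```python
# Toy conformance check: percentage of care-pathway traces that match
# a normative sequence of activities (illustrative only; activity names
# are hypothetical, not taken from the study).

NORMATIVE = ["arrival", "triage", "treatment", "discharge"]

def conforms(trace, normative=NORMATIVE):
    """A trace conforms if its activities follow the normative order exactly."""
    return trace == normative

def conformance_rate(traces):
    """Percentage of traces conforming to the normative model."""
    if not traces:
        return 0.0
    return 100.0 * sum(conforms(t) for t in traces) / len(traces)

traces = [
    ["arrival", "triage", "treatment", "discharge"],   # conforming
    ["arrival", "treatment", "discharge"],             # triage skipped
    ["arrival", "triage", "treatment", "discharge"],   # conforming
]
print(round(conformance_rate(traces), 2))  # → 66.67
```

A drop in this rate over consecutive months is the sort of signal the study uses to quantify disruption; real conformance checking also handles partial matches and concurrent activities.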
Reversible Image Watermarking Using Modified Quadratic Difference Expansion and Hybrid Optimization Technique
With increasing copyright violation cases, watermarking of digital images is a popular solution for securing online media content. Since some sensitive applications require image recovery after watermark extraction, reversible watermarking is widely preferred. This article introduces a Modified Quadratic Difference Expansion (MQDE) and fractal-encryption-based reversible watermarking scheme for securing the copyrights of images. First, fractal encryption is applied to watermarks using Tromino's L-shaped theorem to improve security. In addition, Cuckoo Search-Grey Wolf Optimization (CSGWO) is applied to the cover image to optimize block allocation for inserting an encrypted watermark such that its invisibility is greatly increased. While the developed MQDE technique helps to improve coverage and visual quality, the novel data-driven distortion control unit ensures optimal performance. The suggested approach provides the highest level of protection when retrieving the secret image and original cover image without losing essential information, apart from improving transparency and capacity without much tradeoff. The simulation results of this approach are superior to existing methods in terms of embedding capacity. With an average PSNR of 67 dB, the method shows good imperceptibility in comparison with other schemes.
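The MQDE scheme builds on classical difference expansion. For orientation, here is a minimal sketch of plain difference expansion on a single pixel pair (the classical technique introduced by Tian, not the article's modified quadratic variant, and ignoring the overflow handling and location maps a real scheme needs):

```python
# Classical difference expansion on a single pixel pair (illustrative only).
# The integer average of the pair is preserved; the difference is doubled
# and the payload bit stored in its least significant bit, which makes the
# embedding exactly reversible.

def embed(x, y, bit):
    """Embed one bit into pixel pair (x, y); returns the watermarked pair."""
    l = (x + y) // 2          # integer average (preserved by the transform)
    h = x - y                 # difference
    h2 = 2 * h + bit          # expand difference, append bit in its LSB
    return l + (h2 + 1) // 2, l - h2 // 2

def extract(x2, y2):
    """Recover the bit and the original pixel pair."""
    l = (x2 + y2) // 2
    h2 = x2 - y2
    bit = h2 % 2
    h = (h2 - bit) // 2       # undo the expansion
    return bit, (l + (h + 1) // 2, l - h // 2)

wx, wy = embed(10, 7, 1)
bit, (ox, oy) = extract(wx, wy)
print(bit, ox, oy)  # → 1 10 7
```

Reversibility is exact because the pair (average, expanded difference) is an invertible integer transform of the original pixels; MQDE modifies how the difference is expanded to improve capacity and distortion.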
Mobile app with steganography functionalities
[Abstract]: Steganography is the practice of hiding information within other data, such as images, audio, video, etc. In this research, we apply this technique to create a mobile application that lets users conceal their own secret data inside other media formats, send the encoded data to other users, and even analyse images that may have been subjected to a steganography attack.
For image steganography, lossless compression formats employ Least Significant Bit (LSB) encoding within Red Green Blue (RGB) pixel values. Conversely, lossy compression formats, such as JPEG, conceal data in the frequency domain by altering the files' quantized coefficient matrices.
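The LSB approach for lossless formats can be sketched in a few lines. This is a minimal illustration over a flat list of byte values standing in for decoded pixel data, not the app's actual implementation; a real application would decode and re-encode a lossless image format (e.g. PNG via an image library such as Pillow).

```python
# Minimal LSB text-hiding sketch over a flat list of RGB byte values
# (a stand-in for decoded image pixel data; illustrative only).

def hide(pixels, message):
    """Store each message bit in the least significant bit of a byte."""
    bits = [(byte >> i) & 1 for byte in message.encode() for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover too small for message")
    out = list(pixels)
    for i, b in enumerate(bits):
        out[i] = (out[i] & 0xFE) | b   # clear the LSB, then set it to the bit
    return out

def reveal(pixels, length):
    """Read back `length` bytes from the LSBs, MSB-first per byte."""
    data = bytearray()
    for i in range(length):
        byte = 0
        for b in pixels[8 * i : 8 * i + 8]:
            byte = (byte << 1) | (b & 1)
        data.append(byte)
    return data.decode()

cover = [120, 33, 200, 17] * 20          # fake pixel bytes
stego = hide(cover, "hi")
print(reveal(stego, 2))  # → hi
```

Because each byte changes by at most 1, the visual difference is imperceptible, which is why the technique only survives lossless compression.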
Video steganography follows two similar methods. In lossless video formats, the LSB approach is applied to the RGB pixel values of individual frames. Meanwhile, in lossy High Efficiency Video Coding (HEVC) formats, a displaced-bit modification technique is used with the YUV components.
Undergraduate thesis (Traballo fin de grao, UDC.FIC). Computer Engineering. Academic year 2022/202
Systematic Approaches for Telemedicine and Data Coordination for COVID-19 in Baja California, Mexico
Conference proceedings info:
ICICT 2023: 2023 The 6th International Conference on Information and Computer Technologies
Raleigh, NC, United States, March 24-26, 2023
Pages 529-542
We provide a model for the systematic implementation of telemedicine within a large evaluation center for COVID-19 in the area of Baja California, Mexico. Our model is based on human-centric design factors and cross-disciplinary collaborations for the scalable, data-driven enablement of smartphone, cellular, and video teleconsultation technologies to link hospitals, clinics, and emergency medical services for point-of-care assessment of COVID testing, and for subsequent treatment and quarantine decisions. A multidisciplinary team was rapidly created in cooperation with different institutions, including: the Autonomous University of Baja California, the Ministry of Health, the Command, Communication and Computer Control Center of the Ministry of the State of Baja California (C4), Colleges of Medicine, and the College of Psychologists. Our objective is to provide information to the public, to evaluate COVID-19 in real time, and to track regional, municipal, and state-wide data in real time that informs supply chains and resource allocation in anticipation of a surge in COVID-19 cases.
https://doi.org/10.1007/978-981-99-3236-
Research Paper: Process Mining and Synthetic Health Data: Reflections and Lessons Learnt
Analysing the treatment pathways in real-world health data can provide valuable insight for clinicians and decision-makers. However, the procedures for acquiring real-world data for research can be restrictive and time-consuming, and risk disclosing identifiable information. Synthetic data might enable representative analysis without direct access to sensitive data. In the first part of our paper, we propose an approach for grading synthetic data for process analysis based on its fidelity to relationships found in real-world data. In the second part, we apply our grading approach by assessing cancer patient pathways in a synthetic healthcare dataset (the Simulacrum, provided by the English National Cancer Registration and Analysis Service) using process mining. Visualisations of the patient pathways within the synthetic data appear plausible, showing relationships between events confirmed in the underlying non-synthetic data. Data quality issues are also present within the synthetic data, reflecting real-world problems and artefacts from the synthetic dataset's creation. Process mining of synthetic data in healthcare is an emerging field with novel challenges. We conclude that researchers should be aware of the risks when extrapolating results produced from research on synthetic data to real-world scenarios, and should assess findings with analysts who are able to view the underlying data.
Computational Methods in Multi-Messenger Astrophysics using Gravitational Waves and High Energy Neutrinos
This dissertation describes advances in computational methods for multi-messenger astrophysics (MMA) using gravitational waves (GW) and neutrinos during Advanced LIGO (aLIGO)'s first through third observing runs (O1-O3) and, looking forward, describes novel computational techniques suited to the challenges of both the burgeoning MMA field and high-performance computing as a whole.
The first two chapters provide an overview of MMA as it pertains to gravitational wave/high energy neutrino (GWHEN) searches, including a summary of expected astrophysical sources as well as GW, neutrino, and gamma-ray detectors used in their detection. These are followed in the third chapter by an in-depth discussion of LIGO’s timing system, particularly the diagnostic subsystem, describing both its role in MMA searches and the author’s contributions to the system itself.
The fourth chapter provides a detailed description of the Low-Latency Algorithm for Multi-messenger Astrophysics (LLAMA), the GWHEN pipeline developed by the author and used in O2 and O3. Relevant past multi-messenger searches are described first, followed by the O2 and O3 analysis methods, the pipeline’s performance, scientific results, and finally, an in-depth account of the library’s structure and functionality. In particular, the author’s high-performance multi-order coordinates (MOC) HEALPix image analysis library, HPMOC, is described. HPMOC increases performance of HEALPix image manipulations by several orders of magnitude vs. naive single-resolution approaches while presenting a simple high-level interface and should prove useful for diverse future MMA searches. The performance improvements it provides for LLAMA are also covered.
The final chapter of this dissertation builds on the approaches taken in developing HPMOC, presenting several novel methods for efficiently storing and analyzing large data sets, with applications to MMA and other data-intensive fields. A family of depth-first multi-resolution ordering of HEALPix images — DEPTH9, DEPTH19, and DEPTH40 — is defined, along with algorithms and use cases where it can improve on current approaches, including high-speed streaming calculations suitable for serverless compute or FPGAs.
For performance-constrained analyses on HEALPix data (e.g. image analysis in multi-messenger search pipelines) using SIMD processors, breadth-first data structures can provide short-circuiting calculations in a data-parallel way on compressed data; a simple compression method is described with application to further improving LLAMA performance.
A new storage scheme and associated algorithms for efficiently compressing and contracting tensors of varying sparsity is presented; these demuxed tensors (D-Tensors) have asymptotic time and space complexity equivalent to optimal representations of both dense and sparse matrices, and could be used as a universal drop-in replacement to reduce code complexity and developer effort while improving the performance of existing non-optimized numerical code. Finally, the big bucket hash table (B-Table), a novel type of hash table that makes guarantees on data layout (vs. load factor), is described, along with optimizations it allows (such as hardware acceleration, online rebuilds, and hard real-time applications) that are not possible with existing hash table approaches. These innovations are presented in the hope that some will prove useful for improving future MMA searches and other data-intensive applications.