77 research outputs found

    Reliable Generation of EHR Time Series via Diffusion Models

    Electronic Health Records (EHRs) are rich sources of patient-level data, including laboratory tests, medications, and diagnoses, offering valuable resources for medical data analysis. However, concerns about privacy often restrict access to EHRs, hindering downstream analysis. Researchers have explored various methods for generating privacy-preserving EHR data. In this study, we introduce a new method for generating diverse and realistic synthetic EHR time series data using Denoising Diffusion Probabilistic Models (DDPM). We conducted experiments on six datasets, comparing our proposed method with eight existing methods. Our results demonstrate that our approach significantly outperforms all existing methods in terms of data utility while requiring less training effort. Our approach also enhances downstream medical data analysis by providing diverse and realistic synthetic EHR data.
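    For readers unfamiliar with DDPMs, the following is a minimal sketch of the standard forward (noising) process that such models learn to reverse. It is not the method of the paper above: the linear noise schedule, the function names, and the toy series are all assumptions made for illustration, and the learned denoising network is omitted entirely.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (illustrative defaults)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_noise(series, t, alpha_bars, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I).

    `t` is a 0-based index into `alpha_bars`.
    """
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * x + noise_scale * rng.gauss(0.0, 1.0) for x in series]


rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

# Toy, made-up normalized measurements; not taken from any real EHR dataset.
lab_values = [0.1, 0.4, 0.35, 0.8, 0.6]
slightly_noisy = forward_noise(lab_values, 10, alpha_bars, rng)
mostly_noise = forward_noise(lab_values, 999, alpha_bars, rng)
```

    At small `t` the sample stays close to the original series, while at the final step almost all of the signal has been replaced by Gaussian noise. Generation runs this process in reverse with a trained denoiser.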

    Spitzer View of Massive Star Formation in the Tidally Stripped Magellanic Bridge

    The Magellanic Bridge is the nearest low-metallicity, tidally stripped environment, offering a unique high-resolution view of physical conditions in merging and forming galaxies. In this paper we present analysis of candidate massive young stellar objects (YSOs), i.e., in situ, current massive star formation (MSF) in the Bridge using Spitzer mid-IR and complementary optical and near-IR photometry. While we definitely find YSOs in the Bridge, the most massive are ∼10 M⊙, ≪45 M⊙ found in the Large Magellanic Cloud (LMC). The intensity of MSF in the Bridge also appears decreasing, as the most massive YSOs are less massive than those formed in the past. To investigate environmental effects on MSF, we have compared properties of massive YSOs in the Bridge to those in the LMC. First, YSOs in the Bridge are apparently less embedded than in the LMC: 81% of Bridge YSOs show optical counterparts, compared to only 56% of LMC sources with the same range of mass, circumstellar dust mass, and line-of-sight extinction. Circumstellar envelopes are evidently more porous or clumpy in the Bridge's low-metallicity environment. Second, we have used whole samples of YSOs in the LMC and the Bridge to estimate the probability of finding YSOs at a given HI column density, N(HI). We found that the LMC has ∼3× higher probability than the Bridge for N(HI) > 10×10²⁰ cm⁻², but the trend reverses at lower N(HI). Investigating whether this lower efficiency relative to HI is due to less efficient molecular cloud formation, or less efficient cloud collapse, or both, will require sensitive molecular gas observations. Comment: 41 pages, 20 figures, 6 tables; accepted for publication in ApJ; several figures are in low resolution due to the size limit here and a high resolution version can be downloaded via http://www.astro.virginia.edu/~cc5ye/ms_bridge20140215.pd

    Recent Advances in Machine Learning for Network Automation in the O-RAN

    © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/ The evolution of network technologies has witnessed a paradigm shift toward open and intelligent networks, with the Open Radio Access Network (O-RAN) architecture emerging as a promising solution. O-RAN introduces disaggregation and virtualization, enabling network operators to deploy multi-vendor and interoperable solutions. However, managing and automating the complex O-RAN ecosystem presents numerous challenges. To address this, machine learning (ML) techniques have gained considerable attention in recent years, offering promising avenues for network automation in O-RAN. This paper presents a comprehensive survey of the current research efforts on network automation using ML in O-RAN. We begin by providing an overview of the O-RAN architecture and its key components, highlighting the need for automation. Subsequently, we delve into O-RAN support for ML techniques. The survey then explores challenges in network automation using ML within the O-RAN environment, followed by the existing research studies discussing application of ML algorithms and frameworks for network automation in O-RAN. The survey further discusses the research opportunities by identifying important aspects where ML techniques can benefit. Peer reviewed

    A Spitzer Space Telescope far-infrared spectral atlas of compact sources in the Magellanic Clouds. I. The Large Magellanic Cloud

    [abridged] We present 52-93 micron spectra obtained with Spitzer in the MIPS-SED mode, of a representative sample of luminous compact far-IR sources in the LMC. These include carbon stars, OH/IR AGB stars, post-AGB objects and PNe, RCrB-type star HV2671, OH/IR red supergiants WOHG064 and IRAS05280-6910, B[e] stars IRAS04530-6916, R66 and R126, Wolf-Rayet star Brey3a, Luminous Blue Variable R71, supernova remnant N49, a large number of young stellar objects, compact HII regions and molecular cores, and a background galaxy (z~0.175). We use the spectra to constrain the presence and temperature of cold dust and the excitation conditions and shocks within the neutral and ionized gas, in the circumstellar environments and interfaces with the surrounding ISM. Evolved stars, including LBV R71, lack cold dust except in some cases where we argue that this is swept-up ISM. This leads to an estimate of the duration of the prolific dust-producing phase ("superwind") of several thousand years for both RSGs and massive AGB stars, with a similar fractional mass loss experienced despite the different masses. We tentatively detect line emission from neutral oxygen in the extreme RSG WOHG064, with implications for the wind driving. In N49, the shock between the supernova ejecta and ISM is revealed by its strong [OI] 63-micron emission and possibly water vapour; we estimate that 0.2 Msun of ISM dust was swept up. Some of the compact HII regions display pronounced [OIII] 88-micron emission. The efficiency of photo-electric heating in the interfaces of ionized gas and molecular clouds is estimated at 0.1-0.3%. We confirm earlier indications of a low nitrogen content in the LMC. Evidence for solid state emission features is found in both young and evolved objects; some of the YSOs are found to contain crystalline water ice. Comment: Accepted for publication in The Astronomical Journal.
This paper accompanies the Summer 2009 SAGE-Spec release of 48 MIPS-SED spectra, but uses improved spectrum extraction. (Fig. 2 reduced resolution because of arXiv limit.)

    SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

    What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded systems that perform translation progressively, putting high-performing unified systems out of reach. To address these gaps, we introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. To build this, we used 1 million hours of open speech audio data to learn self-supervised speech representations with w2v-BERT 2.0. Subsequently, we created a multimodal corpus of automatically aligned speech translations. Filtered and combined with human-labeled and pseudo-labeled data, we developed the first multilingual system capable of translating from and into English for both speech and text. On FLEURS, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation. Compared to strong cascaded models, SeamlessM4T improves the quality of into-English translation by 1.3 BLEU points in speech-to-text and by 2.6 ASR-BLEU points in speech-to-speech. Tested for robustness, our system performs better against background noises and speaker variations in speech-to-text tasks compared to the current SOTA model. Critically, we evaluated SeamlessM4T on gender bias and added toxicity to assess translation safety. Finally, all contributions in this work are open-sourced and accessible at https://github.com/facebookresearch/seamless_communicatio

    Using Pre-existing Microarray Datasets to Increase Experimental Power: Application to Insulin Resistance

    Although they have become a widely used experimental technique for identifying differentially expressed (DE) genes, DNA microarrays are notorious for generating noisy data. A common strategy for mitigating the effects of noise is to perform many experimental replicates. This approach is often costly and sometimes impossible given limited resources; thus, analytical methods are needed which increase accuracy at no additional cost. One inexpensive source of microarray replicates comes from prior work: to date, data from hundreds of thousands of microarray experiments are in the public domain. Although these data assay a wide range of conditions, they cannot be used directly to inform any particular experiment and are thus ignored by most DE gene methods. We present the SVD Augmented Gene expression Analysis Tool (SAGAT), a mathematically principled, data-driven approach for identifying DE genes. SAGAT increases the power of a microarray experiment by using observed coexpression relationships from publicly available microarray datasets to reduce uncertainty in individual genes' expression measurements. We tested the method on three well-replicated human microarray datasets and demonstrate that use of SAGAT increased effective sample sizes by as many as 2.72 arrays. We applied SAGAT to unpublished data from a microarray study investigating transcriptional responses to insulin resistance, resulting in a 50% increase in the number of significant genes detected. We evaluated 11 (58%) of these genes experimentally using qPCR, confirming the directions of expression change for all 11 and statistical significance for three. Use of SAGAT revealed coherent biological changes in three pathways: inflammation, differentiation, and fatty acid synthesis, furthering our molecular understanding of a type 2 diabetes risk factor. 
We envision SAGAT as a means to maximize the potential for biological discovery from subtle transcriptional responses, and we provide it as a freely available software package that is immediately applicable to any human microarray study.