35 research outputs found

    Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference

    Mixture-of-Experts (MoE) models have gained popularity in achieving state-of-the-art performance in a wide range of tasks in computer vision and natural language processing. They effectively expand the model capacity while incurring a minimal increase in computation cost during training. However, deploying such models for inference is difficult due to their large size and complex communication pattern. In this work, we provide a characterization of two MoE workloads, namely Language Modeling (LM) and Machine Translation (MT) and identify their sources of inefficiencies at deployment. We propose three optimization techniques to mitigate sources of inefficiencies, namely (1) Dynamic gating, (2) Expert Buffering, and (3) Expert load balancing. We show that dynamic gating improves maximum throughput by 6.21-11.23× for LM, 5.75-10.98× for MT Encoder and 2.58-5.71× for MT Decoder. It also reduces memory usage by up to 1.36× for LM and up to 1.1× for MT. We further propose Expert Buffering, a new caching mechanism that only keeps hot, active experts in GPU memory while buffering the rest in CPU memory. This reduces static memory allocation by up to 1.47×. We finally propose a load balancing methodology that provides additional scalability to the workload.
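
    The Expert Buffering idea described above (hot experts stay in GPU memory, the rest wait in CPU memory) can be illustrated with a toy cache. This is only a sketch under stated assumptions: the abstract does not say which eviction policy the authors use, so least-recently-used (LRU) is assumed here, and the class name `ExpertBuffer`, its fields, and the activation trace are all hypothetical. No real tensors are moved; a counter stands in for CPU-to-GPU copies.

```python
from collections import OrderedDict


class ExpertBuffer:
    """Toy model of a fixed-size GPU buffer of experts backed by CPU memory.

    Assumption: LRU eviction. The abstract does not specify the policy.
    """

    def __init__(self, num_gpu_slots):
        self.num_gpu_slots = num_gpu_slots
        self.on_gpu = OrderedDict()  # expert_id -> True, least recently used first
        self.transfers = 0           # number of simulated CPU-to-GPU copies

    def fetch(self, expert_id):
        """Make an expert resident, evicting the least recently used one if full."""
        if expert_id in self.on_gpu:
            self.on_gpu.move_to_end(expert_id)  # hit: mark as most recently used
            return "hit"
        if len(self.on_gpu) >= self.num_gpu_slots:
            self.on_gpu.popitem(last=False)     # evict the coldest expert
        self.on_gpu[expert_id] = True
        self.transfers += 1                     # miss: simulate a CPU-to-GPU copy
        return "miss"


buffer = ExpertBuffer(num_gpu_slots=2)
trace = [0, 1, 0, 2, 0, 1]  # hypothetical sequence of activated expert ids
results = [buffer.fetch(expert_id) for expert_id in trace]
print(results, buffer.transfers)
```

    With two slots and this trace, expert 0 stays resident because it is reused often, while experts 1 and 2 displace each other. The trade-off a real system would face is the same: fewer GPU slots lower static memory allocation but raise the number of transfers.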

    Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data

    Code datasets, often collected from diverse and uncontrolled sources such as GitHub, potentially suffer from quality issues, thereby affecting the performance and training efficiency of Large Language Models (LLMs) optimized for code generation. Previous studies demonstrated the benefit of using embedding spaces for data pruning, but they mainly focused on duplicate removal or increasing variety, and in other modalities, such as images. Our work focuses on using embeddings to identify and remove "low-quality" code data. First, we explore features of "low-quality" code in embedding space, through the use of synthetic corruptions. Armed with this knowledge, we devise novel pruning metrics that operate in embedding space to identify and remove low-quality entries in the Stack dataset. We demonstrate the benefits of this synthetic corruption informed pruning (SCIP) approach on the well-established HumanEval and MBPP benchmarks, outperforming existing embedding-based methods. Importantly, we achieve up to a 3% performance improvement over no pruning, thereby showing the promise of insights from synthetic corruptions for data pruning. Comment: 12 pages, 4 figures, Oral Presentation at 3rd Workshop on Efficient Natural Language and Speech Processing (ENLSP-III), NeurIPS 202

    Pricing Python Parallelism: A Dynamic Language Cost Model for Heterogeneous Platforms

    Execution times may be reduced by offloading parallel loop nests to a GPU. Auto-parallelizing compilers are common for static languages, often using a cost model to determine when the GPU execution speed will outweigh the offload overheads. Nowadays scientific software is increasingly written in dynamic languages and would benefit from compute accelerators. The ALPyNA framework analyses moderately complex Python loop nests and automatically JIT compiles code for heterogeneous CPU and GPU architectures. We present the first analytical cost model for auto-parallelizing loop nests in a dynamic language on heterogeneous architectures. Predicting execution time in a language like Python is extremely challenging, since aspects like the element types, size of the iteration space, and amenability to parallelization can only be determined at runtime. Hence the cost model must be both staged, to combine compile and run-time information, and lightweight to minimize runtime overhead. GPU execution time prediction must account for factors like data transfer, block-structured execution, and starvation. We show that a comparatively simple, staged analytical model can accurately determine during execution when it is profitable to offload a loop nest. We evaluate our model on three heterogeneous platforms across 360 experiments with 12 loop-intensive Python benchmark programs. The results show small misprediction intervals and a mean slowdown of just 13.6%, relative to the optimal (oracular) offload strategy
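
    The offload decision described above can be sketched as a comparison between a predicted CPU time and a predicted GPU time that includes data transfer and block-structured execution. This is not the ALPyNA cost model, whose terms the abstract does not give. It is a minimal illustration, and every function name, parameter and number below is an assumption. The staging is mimicked by treating the machine parameters as fixed ahead of time and the iteration count and data size as known only at run time.

```python
def predict_gpu_time(n_iterations, bytes_moved, per_wave_gpu_s,
                     bandwidth_bytes_per_s, launch_overhead_s, n_gpu_lanes):
    """Estimate GPU time as transfer + launch overhead + block-structured compute."""
    transfer_s = bytes_moved / bandwidth_bytes_per_s
    # Iterations run in waves of n_gpu_lanes; a partly filled wave costs a full one.
    waves = -(-n_iterations // n_gpu_lanes)  # ceiling division
    return transfer_s + launch_overhead_s + waves * per_wave_gpu_s


def should_offload(n_iterations, bytes_moved, per_iter_cpu_s, **gpu_params):
    """Return True when the predicted GPU time beats the predicted CPU time."""
    cpu_s = n_iterations * per_iter_cpu_s
    gpu_s = predict_gpu_time(n_iterations, bytes_moved, **gpu_params)
    return gpu_s < cpu_s


# Hypothetical machine parameters, fixed before the loop nest is executed.
GPU_PARAMS = dict(per_wave_gpu_s=1e-6, bandwidth_bytes_per_s=1e10,
                  launch_overhead_s=1e-4, n_gpu_lanes=1024)

# Run-time information: iteration-space size and bytes to transfer.
small_loop = should_offload(100, 800, 1e-7, **GPU_PARAMS)
large_loop = should_offload(10_000_000, 80_000_000, 1e-7, **GPU_PARAMS)
print(small_loop, large_loop)
```

    Under these made-up parameters the small loop stays on the CPU because launch overhead dominates, while the large loop is offloaded because the transfer cost is amortized over many iterations.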

    Exome sequencing utility in defining the genetic landscape of hearing loss and novel-gene discovery in Iran

    Hearing loss (HL) is one of the most common sensory defects, affecting more than 466 million individuals worldwide. It is clinically and genetically heterogeneous with over 120 genes causing non-syndromic HL identified to date. Here, we performed exome sequencing (ES) on a cohort of Iranian families with no disease-causing variants in known deafness-associated genes after screening with a targeted gene panel. We identified likely causal variants in 20 out of 71 families screened. Fifteen families segregated variants in known deafness-associated genes. Eight families segregated variants in novel candidate genes for HL: DBH, TOP3A, COX18, USP31, TCF19, SCP2, TENM1, and CARMIL1. In three of these families, intrafamilial locus heterogeneity was observed with variants in both known and novel candidate genes. In aggregate, we were able to identify the underlying genetic cause of HL in nearly 30% of our study cohort using ES. This study corroborates the observation that high-throughput DNA sequencing in populations with high rates of consanguineous marriages represents a more appropriate strategy to elucidate the genetic etiology of heterogeneous conditions such as HL. © 2021 John Wiley & Sons A/S. Published by John Wiley & Sons Lt

    Prediction of Parallel Artificial Membrane Permeability Assay of Some Drugs from their Theoretically Calculated Molecular Descriptors

    Parallel artificial membrane permeation assays (PAMPA) have been extensively utilized to determine the permeation potential of drugs. In the present work, the permeation of 94 miscellaneous drugs, measured as flux by PAMPA (logF), is predicted by quantitative structure-property relationship modeling based on a variety of calculated theoretical descriptors, which were screened and selected by a genetic algorithm (GA) variable subset selection procedure. These descriptors were used as inputs for the generated artificial neural networks. After generation, optimization and training of the artificial neural network (5:3:1), it was used to predict logF for the training, test and validation sets. The standard errors of the GA-ANN calculated logF for the training, test and validation sets are 0.17, 0.028 and 0.15, respectively, which are smaller than those obtained by the GA-MLR model (0.26, 0.051 and 0.22, respectively). The results reveal the reliability and good predictive ability of the neural network model for the membrane permeability of drugs.

    Dietary quercetin impacts the concentration of pesticides in honey bees

    Honey bees are important pollinators and are subject to numerous stressors, such as changing floral resources, parasites, and agrochemical exposure. Pesticide exposure has been linked to the decline in the global honey bee population. We have limited knowledge of the metabolic pathways and synergistic effects of xenobiotics in bees. Quercetin is one of the most abundant phytochemicals in plants and is therefore abundant in the honey bee diet. Quercetin can upregulate the detoxification system in honey bees; however, it is still unknown to what extent quercetin ingestion can reduce the content of absorbed pesticides. In this study, we investigated the effect of dietary quercetin on the contents of three pesticides in honey bees: imidacloprid (insecticide), tebuconazole (fungicide), and tau-fluvalinate (insecticide and acaricide). Bees were divided into two main groups and fed either quercetin-sucrose paste or only sucrose for 72 h. Thereafter, they were orally exposed to ∼10 ng/bee imidacloprid or contact-exposed to ∼0.9 μg/bee tau-fluvalinate or ∼5.2 μg/bee tebuconazole. After 1 h of oral exposure or 24 h of contact exposure, the bees were anaesthetised with CO2, sacrificed by freezing, and extracted with a validated QuEChERS method. Subsequently, the concentrations of the three pesticides and quercetin in the bees were determined with a triple quadrupole tandem mass spectrometer coupled to an HPLC system. No significant effect on the concentration of tebuconazole or tau-fluvalinate was observed in bees fed quercetin. Intake of quercetin led to a reduction in the concentration of imidacloprid in honey bees. Quercetin-rich plants may be exploited in future beekeeping

    Survey of the nutritional status and lifestyle habits of a sample of primary school pupils in the province of Pavia

    Objective: within the prevention of cerebro-cardiovascular diseases, to contribute to the further definition of effective strategies aimed at promoting healthy lifestyles in different population groups of the Lombard province, and to make comparison data available for evaluating the long-term effectiveness of the prevention interventions undertaken. Materials and methods: A survey was conducted by structured interview to record daily lifestyle habits, including diet, physical activity, play/sport and movement, in a representative sample of 10-year-old children, through a study of 460 pupils attending the fifth year of primary school in schools across the province. The sampling procedure took into account the geographical characteristics of the Province of Pavia and its population density, classifying towns as large (30,000-80,000 inhabitants), medium (5,000-30,000 inhabitants) and small (fewer than 5,000 inhabitants). Anthropometric parameters were also measured in the sample and the corresponding BMI was calculated (according to the tables of Cole et al.). Results: In the sample, 23.3% of the children were overweight and 12.6% were obese. The percentage of obese children was significantly higher in medium and small towns than in large ones. The prevalence of obesity was higher among children who did not practise regular sporting activity (at least once a week) and among those who spent the afternoon watching TV or playing on the PlayStation. Discussion and conclusions: The significance of the data that emerged from the study is of great interest both for planning targeted educational and informational interventions and for identifying outcome indicators for long-term evaluation.