101 research outputs found

    Are We There Yet? Product Quantization and its Hardware Acceleration

    Conventional multiply-accumulate (MAC) operations have long dominated computation time for deep neural networks (DNNs). Recently, product quantization (PQ) has been successfully applied to these workloads, replacing MACs with memory lookups into pre-computed dot products. While this property makes PQ an attractive solution for model acceleration, little is understood about the associated trade-offs in compute and memory footprint, and the impact on accuracy. Our empirical study investigates the impact of different PQ settings and training methods on layerwise reconstruction error and end-to-end model accuracy. When studying the efficiency of deploying PQ DNNs, we find that metrics such as FLOPs, number of parameters, and even CPU/GPU performance can be misleading. To address this issue, and to assess PQ more fairly in terms of hardware efficiency, we design the first custom hardware accelerator for evaluating the speed and efficiency of running PQ models. We identify PQ configurations that improve performance-per-area for ResNet20 by 40%-104%, even when compared to a highly optimized conventional DNN accelerator. Our hardware outperforms recent PQ solutions by 4x, with only a 0.6% accuracy degradation. This work demonstrates the practical and hardware-aware design of PQ models, paving the way for wider adoption of this emerging DNN approximation methodology.
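    The lookup-based substitution described above can be sketched in a few lines; this is a minimal illustration with made-up sizes and randomly initialized codebooks, not the paper's implementation:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    D, S, K = 8, 4, 16          # input dim, subspaces, codewords per subspace
    d = D // S                  # dimension of each subvector

    codebooks = rng.normal(size=(S, K, d))   # stand-in for learned PQ codebooks
    w = rng.normal(size=D)                   # a weight vector to approximate against
    x = rng.normal(size=D)                   # an input activation vector

    # Encode: assign each subvector of x to its nearest codeword (offline).
    x_sub = x.reshape(S, d)
    codes = np.array([np.argmin(((codebooks[s] - x_sub[s]) ** 2).sum(axis=1))
                      for s in range(S)])

    # Pre-compute partial dot products of w against every codeword (offline).
    w_sub = w.reshape(S, d)
    tables = np.einsum('skd,sd->sk', codebooks, w_sub)   # shape (S, K)

    # Inference: the MAC-free "dot product" is S table lookups plus additions.
    approx = sum(tables[s, codes[s]] for s in range(S))
    exact = w @ x   # what a conventional MAC pipeline would compute
    ```

    The lookup result equals the exact dot product of `w` with the quantized reconstruction of `x`; the gap between `approx` and `exact` is exactly the reconstruction error the abstract studies layerwise.
    
    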

    ZeroFL: Efficient On-Device Training for Federated Learning with Local Sparsity

    When the available hardware cannot meet the memory and compute requirements to efficiently train high-performing machine learning models, a compromise in either the training quality or the model complexity is needed. In Federated Learning (FL), nodes are orders of magnitude more constrained than traditional server-grade hardware and are often battery powered, severely limiting the sophistication of models that can be trained under this paradigm. While most research has focused on designing better aggregation strategies to improve convergence rates and on alleviating the communication costs of FL, fewer efforts have been devoted to accelerating on-device training. This stage, which repeats hundreds of times (i.e. once per round) and can involve thousands of devices, accounts for the majority of the time required to train federated models and for the totality of the energy consumption on the client side. In this work, we present the first study of the unique aspects that arise when introducing sparsity at training time in FL workloads. We then propose ZeroFL, a framework that relies on highly sparse operations to accelerate on-device training. Models trained with ZeroFL and 95% sparsity achieve up to 2.3% higher accuracy compared to competitive baselines obtained from adapting a state-of-the-art sparse training framework to the FL setting. Comment: Published as a conference paper at ICLR 202
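    A magnitude-based sparsity mask of the kind sparse-training frameworks apply during local training can be sketched as follows. The 95% sparsity level comes from the abstract; the helper name and tensor shapes are illustrative assumptions, not ZeroFL's actual code:

    ```python
    import numpy as np

    def topk_mask(w: np.ndarray, sparsity: float) -> np.ndarray:
        """Keep only the largest-magnitude (1 - sparsity) fraction of entries."""
        k = max(1, int(round(w.size * (1.0 - sparsity))))
        thresh = np.sort(np.abs(w).ravel())[-k]   # k-th largest magnitude
        return (np.abs(w) >= thresh).astype(w.dtype)

    rng = np.random.default_rng(1)
    w = rng.normal(size=(64, 64))        # a hypothetical weight matrix
    mask = topk_mask(w, sparsity=0.95)   # 1 where a weight is kept, 0 elsewhere

    # Forward and backward passes on-device would use the masked weights,
    # so ~95% of the multiplications can be skipped by sparse kernels.
    sparse_w = w * mask
    ```

    The point of such a mask in the FL setting is that both compute and the update payload shrink, which is what makes on-device training rounds cheaper.
    
    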

    How Much Is Hidden in the NAS Benchmarks? Few-Shot Adaptation of a NAS Predictor

    Neural architecture search (NAS) has proven to be a powerful approach to designing and refining neural networks, often boosting their performance and efficiency over manually designed variants, but it comes with computational overhead. While a considerable amount of research has focused on lowering the cost of NAS for mainstream tasks, such as image classification, many of those improvements stem from the fact that those tasks are well studied in the broader context. Consequently, applying NAS to emerging and under-represented domains still carries a relatively high cost and/or uncertainty about the achievable gains. To address this issue, we turn our focus towards the recent growth of publicly available NAS benchmarks in an attempt to extract general NAS knowledge that is transferable across different tasks and search spaces. We borrow from the rich field of meta-learning for few-shot adaptation and carefully study the applicability of those methods to NAS, with a special focus on the relationship between task-level correlation (domain shift) and predictor transferability, which we deem critical for improving NAS on diverse tasks. In our experiments, we use 6 NAS benchmarks in conjunction, spanning 16 NAS settings in total. Our meta-learning approach not only shows superior (or matching) performance in the cross-validation experiments but also successfully extrapolates to a new search space and tasks.

    A first look into the carbon footprint of federated learning

    Despite impressive results, deep learning-based technologies also raise severe privacy and environmental concerns induced by the training procedure, which is often conducted in datacenters. In response, alternatives to centralized training such as Federated Learning (FL) have emerged. Perhaps unexpectedly, FL in particular is starting to be deployed at a global scale by companies that must adhere to new legal demands and policies originating from governments and civil society for privacy protection. However, the potential environmental impact of FL remains unclear and unexplored. This paper offers the first systematic study of the carbon footprint of FL. First, we propose a rigorous model to quantify the carbon footprint, thereby facilitating the investigation of the relationship between FL design and carbon emissions. Then, we compare the carbon footprint of FL to traditional centralized learning. Our findings show that FL, despite being slower to converge in some cases, may result in a comparatively greener impact than an equivalent centralized setup. We performed extensive experiments across different types of datasets, settings, and deep learning models with FL. Finally, we highlight and connect the reported results to future challenges and trends in FL that could reduce its environmental impact, including algorithm efficiency, hardware capabilities, and stronger industry transparency. Comment: arXiv admin note: substantial text overlap with arXiv:2010.0653
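    The kind of quantification such a study requires can be illustrated with a deliberately simplified energy-to-emissions model: sum the compute and communication energy over all rounds and clients, then scale by the grid's carbon intensity. The function name and every constant below are hypothetical placeholders, not the paper's actual model:

    ```python
    def fl_carbon_kg(rounds, clients_per_round, train_time_s,
                     device_power_w, comm_energy_j, intensity_kg_per_kwh):
        """Toy estimate of FL training emissions in kg CO2e."""
        # Energy spent on local training across all rounds and clients (joules).
        compute_j = rounds * clients_per_round * train_time_s * device_power_w
        # Add per-client communication energy, convert J -> kWh (1 kWh = 3.6e6 J).
        total_kwh = (compute_j + rounds * clients_per_round * comm_energy_j) / 3.6e6
        return total_kwh * intensity_kg_per_kwh

    # e.g. 500 rounds, 10 clients/round, 60 s of local training at 5 W,
    # 200 J of communication per client, grid at 0.4 kg CO2e per kWh:
    footprint = fl_carbon_kg(500, 10, 60, 5, 200, 0.4)
    ```

    Even this toy version makes the design trade-off visible: slower convergence (more rounds) raises emissions linearly, while low-power client hardware and cleaner grids pull them down.
    
    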

    The Trp73 Mutant Mice: A Ciliopathy Model That Uncouples Ciliogenesis From Planar Cell Polarity

    The p73 transcription factor belongs to one of the most important gene families in vertebrate biology, the p53 family. The Trp73 gene, like the other family members, generates multiple isoforms, named TA and DNp73, with different and sometimes antagonistic functions. Although p73 shares many biological functions with p53, it also plays distinct roles during development. Trp73-null mice (p73KO from now on) show multiple phenotypes, such as gastrointestinal and cranial hemorrhages, rhinitis, and severe central nervous system defects. Several groups, including ours, have revisited the apparently unrelated phenotypes observed in total p73KO mice and revealed a novel p73 function in the organization of ciliated epithelia in the brain and trachea, as well as an essential role as a regulator of ependymal planar cell polarity (PCP). Unlike p73KO or TAp73KO mice, tumor-prone Trp53−/− mice (p53KO) do not present ependymal ciliary or planar cell polarity defects, indicating that regulation of ciliogenesis and PCP is a p73-specific function. Thus, loss of ciliary biogenesis and epithelial organization might be a common underlying cause of the diverse p73KO phenotypes, highlighting the role of Trp73 as an architect of epithelial tissue. In this review we discuss the data regarding p73's role as a regulator of ependymal cell ciliogenesis and PCP, supporting the view of Trp73-mutant mice as a model that uncouples ciliogenesis from PCP and as a possible model of human congenital hydrocephalus.

    Hydroxychloroquine is associated with a lower risk of polyautoimmunity: data from the RELESSER Registry

    OBJECTIVES: This article estimates the frequency of polyautoimmunity and associated factors in a large retrospective cohort of patients with SLE. METHODS: RELESSER (Spanish Society of Rheumatology Lupus Registry) is a nationwide, multicentre, hospital-based registry of SLE patients. This is a cross-sectional study. The main variable was polyautoimmunity, defined as the co-occurrence of SLE and another autoimmune disease, such as autoimmune thyroiditis, RA, scleroderma, inflammatory myopathy and MCTD. We also recorded the presence of multiple autoimmune syndrome, secondary SS, secondary APS and a family history of autoimmune disease. Multiple logistic regression analysis was performed to investigate possible risk factors for polyautoimmunity. RESULTS: Of the 3679 patients who fulfilled the criteria for SLE, 502 (13.6%) had polyautoimmunity. The most frequent types were autoimmune thyroiditis (7.9%), other systemic autoimmune diseases (6.2%), secondary SS (14.1%) and secondary APS (13.7%). Multiple autoimmune syndrome accounted for 10.2% of all cases of polyautoimmunity. A family history was recorded in 11.8%. According to the multivariate analysis, the factors associated with polyautoimmunity were female sex [odds ratio (95% CI), 1.72 (1.07, 2.72)], RP [1.63 (1.29, 2.05)], interstitial lung disease [3.35 (1.84, 6.01)], Jaccoud arthropathy [1.92 (1.40, 2.63)], anti-Ro/SSA and/or anti-La/SSB autoantibodies [2.03 (1.55, 2.67)], anti-RNP antibodies [1.48 (1.16, 1.90)], MTX [1.67 (1.26, 2.18)] and antimalarial drugs [0.50 (0.38, 0.67)]. CONCLUSION: Patients with SLE frequently present with polyautoimmunity. We observed clinical and analytical characteristics associated with polyautoimmunity. Our finding that antimalarial drugs protect against polyautoimmunity should be verified in future studies.
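    For readers unfamiliar with the reporting format used above, the odds ratios and 95% CIs come from exponentiating logistic-regression coefficients: OR = exp(beta), with the interval from exp(beta ± 1.96·SE). A minimal sketch, with made-up coefficient and standard-error values (not taken from the registry data):

    ```python
    import math

    def odds_ratio_ci(beta: float, se: float, z: float = 1.96):
        """Odds ratio and 95% CI from a logistic-regression coefficient."""
        return math.exp(beta), (math.exp(beta - z * se), math.exp(beta + z * se))

    # Hypothetical coefficient: beta = 0.542 with SE = 0.236.
    or_, (lo, hi) = odds_ratio_ci(beta=0.542, se=0.236)
    ```

    An OR below 1 with a CI that excludes 1, as reported for antimalarial drugs, is what supports reading the association as protective.
    
    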

    The time scale of recombination rate evolution in great apes

    We present three linkage-disequilibrium (LD)-based recombination maps generated using whole-genome sequence data from 10 Nigerian chimpanzees, 13 bonobos, and 15 western gorillas, collected as part of the Great Ape Genome Project (Prado-Martinez J, et al. 2013. Great ape genetic diversity and population history. Nature 499:471-475). We also identified species-specific recombination hotspots in each group using a modified LDhot framework, which greatly improves statistical power to detect hotspots of varying strengths. We show that fewer hotspots are shared among chimpanzee subspecies than within human populations, further narrowing the time scale of complete hotspot turnover. Further, using species-specific PRDM9 sequences to predict potential binding sites (PBS), we show higher predicted PRDM9 binding in recombination hotspots as compared to matched cold-spot regions in multiple great ape species, including at least one chimpanzee subspecies. We found that correlations between broad-scale recombination rates decline more rapidly than nucleotide divergence between species. We also compared the skew of recombination rates at centromeres and telomeres between species and show a skew from chromosome means extending as far as 10-15 Mb from chromosome ends. Further, we examined broad-scale recombination rate changes near a translocation in gorillas and found minimal differences as compared to other great ape species, perhaps because the coordinates relative to the chromosome ends were unaffected. Finally, on the basis of multiple linear regression analysis, we found that various correlates of recombination rate persist throughout the African great apes, including repeats, diversity, and divergence. Our study is the first to analyze within- and between-species genome-wide recombination rate variation in several close relatives.

    Estimating the global conservation status of more than 15,000 Amazonian tree species

