
    MTrainS: Improving DLRM training efficiency using heterogeneous memories

    Recommendation models are very large, requiring terabytes (TB) of memory during training. In pursuit of better quality, model size and complexity grow over time, which in turn requires additional training data to avoid overfitting. This model growth demands substantial data-center resources, making training efficiency increasingly important to keep data-center power demand manageable. In Deep Learning Recommendation Models (DLRM), sparse features capturing categorical inputs through embedding tables are the major contributors to model size and require high memory bandwidth. In this paper, we study the bandwidth requirement and locality of embedding tables in real-world deployed models. We observe that the bandwidth requirement is not uniform across tables and that embedding tables show high temporal locality. We then design MTrainS, which leverages heterogeneous memory, including byte- and block-addressable Storage Class Memory, hierarchically for DLRM. MTrainS allows higher memory capacity per node and increases training efficiency by reducing the need to scale out to multiple hosts in memory-capacity-bound use cases. By optimizing the platform memory hierarchy, we reduce the number of training nodes by 4-8X, saving power and cost while meeting our target training performance.
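
    The following is a minimal, hypothetical sketch of the placement idea the abstract describes: given per-table sizes and measured bandwidth demands, pin the hottest embedding tables in the fastest memory tier with room, spilling colder ones down the hierarchy. Tier names, capacities, the greedy policy, and all numbers are illustrative assumptions, not MTrainS's actual algorithm.

```python
# Hypothetical bandwidth-aware table placement across a heterogeneous
# memory hierarchy (HBM > DDR > Storage Class Memory); a sketch, not
# the MTrainS algorithm.
from dataclasses import dataclass, field

@dataclass
class Tier:
    name: str
    capacity_gb: float
    used_gb: float = 0.0
    tables: list = field(default_factory=list)

def place_tables(tables, tiers):
    """Greedily pin the hottest (highest-bandwidth) tables in the
    fastest tier that still has room, spilling colder tables down."""
    # Sort by observed bandwidth demand, hottest first.
    for name, size_gb, bw_gbps in sorted(tables, key=lambda t: -t[2]):
        for tier in tiers:  # tiers are ordered fastest -> slowest
            if tier.used_gb + size_gb <= tier.capacity_gb:
                tier.used_gb += size_gb
                tier.tables.append(name)
                break
        else:
            raise MemoryError(f"no tier can hold table {name}")
    return tiers

# (table_name, size_gb, measured_bandwidth_gbps) -- toy numbers.
tables = [("user_id", 400, 120.0), ("ad_id", 300, 90.0),
          ("page_id", 900, 8.0), ("geo", 50, 1.5)]
tiers = [Tier("HBM", 64), Tier("DDR", 512), Tier("SCM", 2048)]
for t in place_tables(tables, tiers):
    print(t.name, t.tables, f"{t.used_gb}/{t.capacity_gb} GB")
```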

    Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models

    Deep learning recommendation models (DLRMs) are used across many business-critical services at Facebook and are the single largest AI application in terms of infrastructure demand in its data centers. In this paper we discuss the SW/HW co-designed solution for high-performance distributed training of large-scale DLRMs. We introduce a high-performance, scalable software stack based on PyTorch and pair it with the new evolution of the Zion platform, namely ZionEX. We demonstrate the capability to train very large DLRMs with up to 12 trillion parameters and show that we can attain a 40X speedup in time to solution over previous systems. We achieve this by (i) designing the ZionEX platform with a dedicated scale-out network, provisioned with high bandwidth, optimal topology, and efficient transport; (ii) implementing an optimized PyTorch-based training stack supporting both model and data parallelism; (iii) developing sharding algorithms capable of hierarchically partitioning embedding tables along row and column dimensions and load balancing them across multiple workers; (iv) adding high-performance core operators while retaining flexibility to support optimizers with fully deterministic updates; and (v) leveraging reduced-precision communications, a multi-level memory hierarchy (HBM+DDR+SSD), and pipelining. Furthermore, we develop and briefly comment on the distributed data ingestion and other supporting services required for robust and efficient end-to-end training in production environments.
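
    As a rough illustration of point (iii), the sketch below partitions oversized embedding tables row-wise into shards and balances them across workers with a greedy longest-processing-time heuristic over a toy lookup-cost model. The function names, cost model, and heuristic are assumptions for illustration; the production sharder described in the paper also supports column-wise partitioning and richer cost models.

```python
# Illustrative cost-based embedding-table sharding; a sketch under
# assumed inputs, not the paper's production sharder.
import heapq

def shard_tables(tables, num_workers, max_rows_per_shard):
    # Row-wise partition: a table larger than the shard limit becomes
    # several shards, each covering a contiguous row range.
    shards = []
    for name, rows, cost_per_row in tables:
        for start in range(0, rows, max_rows_per_shard):
            n = min(max_rows_per_shard, rows - start)
            shards.append((n * cost_per_row, name, start, start + n))

    # Greedy longest-processing-time assignment: biggest shard first,
    # always onto the currently least-loaded worker (min-heap of loads).
    heap = [(0.0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    plan = {w: [] for w in range(num_workers)}
    for cost, name, lo, hi in sorted(shards, reverse=True):
        load, w = heapq.heappop(heap)
        plan[w].append((name, lo, hi))
        heapq.heappush(heap, (load + cost, w))
    return plan

# (table, num_rows, relative lookup cost per row) -- toy inputs.
plan = shard_tables([("user_id", 10_000, 1.0), ("ad_id", 6_000, 2.0)],
                    num_workers=4, max_rows_per_shard=4_000)
print(plan)
```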

    Global, regional, and national burden of colorectal cancer and its risk factors, 1990–2019: a systematic analysis for the Global Burden of Disease Study 2019

    Funding: F Carvalho and E Fernandes acknowledge support from Fundação para a Ciência e a Tecnologia, I.P. (FCT), in the scope of the projects UIDP/04378/2020 and UIDB/04378/2020 of the Research Unit on Applied Molecular Biosciences (UCIBIO), the project LA/P/0140/2020 of the Associate Laboratory Institute for Health and Bioeconomy (i4HB), and FCT/MCTES through the project UIDB/50006/2020. J Conde acknowledges the European Research Council Starting Grant ERC-StG-2019-848325. V M Costa acknowledges grant SFRH/BHD/110001/2015, funded by Portuguese national funds through Fundação para a Ciência e a Tecnologia (FCT), I.P., under Norma Transitória DL57/2016/CP1334/CT0006.

    ESESC: A Fast Multicore Simulator Using Time-Based Sampling

    Architects rely on simulation to explore the design space. However, slow simulation speed caps their productivity and limits the depth of their exploration. Sampling has been a commonly used remedy. While sampling has been shown to be effective for single-core processors, its application has been limited to the simulation of multiprogrammed throughput applications. This work presents Time-Based Sampling (TBS), a framework that is the first to enable sampling in simulation of multicore processors with virtually no limitation in terms of application type (multiprogrammed or multithreaded), number of cores, or homogeneity or heterogeneity of the simulated configuration (4.99% error averaged across all evaluated configurations). TBS is also the first to enable integrated power and temperature evaluation in statistically sampled simulation of multicore systems (with 5.5% and 2.4% error on average, respectively). We implement an architectural simulator based on TBS, called ESESC, that provides a holistic set of tools for a fair evaluation of different architectures.
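
    The loop below sketches the intuition behind time-based sampling: simulate short, periodic time slices in detail, fast-forward functionally between them, and extrapolate the detailed statistics to the full run; advancing all cores in simulated time is what makes the approach valid for multithreaded workloads. The interval lengths and the toy performance/power numbers are assumptions, not ESESC's actual mechanism.

```python
# Minimal sketch of time-based sampling; toy models stand in for the
# real detailed simulator.

def time_based_sampling(workload_ns, detail_ns=50_000, skip_ns=950_000):
    simulated_detail = 0
    t = 0
    stats = {"cycles": 0, "energy_nj": 0.0}
    while t < workload_ns:
        # Detailed timing/power simulation of a short slice (toy model:
        # a 2 GHz core drawing 0.5 W stands in for the real simulator).
        slice_ns = min(detail_ns, workload_ns - t)
        stats["cycles"] += slice_ns * 2
        stats["energy_nj"] += slice_ns * 0.5
        simulated_detail += slice_ns
        t += slice_ns
        # Functional-only fast-forward: architectural state advances,
        # but no timing/power statistics are collected.
        t += min(skip_ns, workload_ns - t)
    # Extrapolate the detailed statistics to the whole execution.
    scale = workload_ns / simulated_detail
    return {k: v * scale for k, v in stats.items()}

print(time_based_sampling(workload_ns=10_000_000))  # a 10 ms run
```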

    Cooling solutions for processor Infrared Thermography

    Temperature is a key parameter due to its impact on timing, energy, and reliability. A setup that measures temperature at runtime with high spatial and temporal resolution helps in studying the thermal behavior of processors, and infrared thermography infrastructures have been developed to measure temperature in real time. Since the infrared-opaque metal heat sink must be replaced with an infrared-transparent heat sink in these setups, oil-based cooling solutions have been proposed. However, oil is not necessarily representative of a metal heat sink, because measuring with oil-based cooling can change the thermal behavior of the processor. In this paper, we present a representative oil-based cooling solution and show that it has the same thermal response as a metal heat sink. Previous works [1], [2] have developed infrared (IR) infrastructures that measure temperature directly through the chip: an IR camera captures the temperature of the transistor junctions, producing a detailed thermal map. Our setup has a resolution of 1024x1024 pixels with sampling rates of over 100 Hz. The IR camera operates in the 3-5 µm wavelength range (MWIR), where silicon is partially transparent; silicon has a fairly uniform 55% transmittance from 1.5 µm to 6 µm. As a result, the IR camera can measure temperature through the chip under test. Figure 1 shows the major components of the measuring setup in [2].
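
    To make the role of silicon's ~55% transmittance concrete, the sketch below folds it into Planck's law over the 3-5 µm band and inverts the attenuated radiance back to a junction temperature by bisection. The single-band, unit-emissivity model is a simplifying assumption for illustration, not the calibration procedure of the cited setups.

```python
# Recovering temperature from through-silicon MWIR radiance; a
# simplified physics sketch, not the paper's calibration.
import numpy as np

H, C, KB = 6.626e-34, 2.998e8, 1.381e-23  # Planck, speed of light, Boltzmann

def band_radiance(temp_k, lo_um=3.0, hi_um=5.0, tau=0.55, n=200):
    """Radiance reaching the camera in the 3-5 um band after
    attenuation by silicon with transmittance tau (emissivity ~ 1)."""
    lam = np.linspace(lo_um, hi_um, n) * 1e-6          # wavelengths [m]
    planck = (2 * H * C**2 / lam**5) / np.expm1(H * C / (lam * KB * temp_k))
    # Trapezoidal integration over the band.
    return tau * float(np.sum((planck[1:] + planck[:-1]) * np.diff(lam)) / 2)

def invert_temperature(measured, lo=250.0, hi=450.0):
    """Bisection on the monotone band_radiance to recover temperature [K]."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if band_radiance(mid) < measured:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

truth_k = 350.0
print(invert_temperature(band_radiance(truth_k)))  # ~350.0 K
```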

    Characterizing Processor Thermal Behavior

    Temperature is a dominant factor in the performance, reliability, and leakage power consumption of modern processors. As a result, increasing numbers of researchers evaluate thermal characteristics in their proposals. In this paper, we measure a real processor, focusing on its thermal characterization while it executes diverse workloads. Our results show that in real designs, thermal transients operate at larger time scales than their performance and power counterparts. Conventional thermal simulation methodologies based on profile-based simulation or statistical sampling, such as SimPoint, tend to explore very limited execution spans; short simulation times can lead to poor matching between performance and thermal phases. To illustrate these issues, we characterize and classify, from a thermal standpoint, the SPEC CPU2000 and CPU2006 applications traditionally used in the evaluation of architectural proposals. The paper concludes with a list of recommendations on thermal modeling based on our experimental insights.
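
    A small numeric example of why thermal transients outlast performance phases, under a first-order lumped RC abstraction (the kind used by HotSpot-style thermal simulators): with an assumed thermal resistance and capacitance giving a time constant of a few seconds, a millisecond-scale simulated window barely moves the die temperature. The R and C values are illustrative assumptions.

```python
# First-order RC thermal step response; assumed constants, for
# illustration only.
import math

R_TH = 0.5           # junction-to-ambient thermal resistance [K/W] (assumed)
C_TH = 10.0          # lumped thermal capacitance [J/K] (assumed)
TAU_S = R_TH * C_TH  # thermal time constant: 5 s

def temp_after(power_w, t_s, t_amb_c=45.0, t0_c=45.0):
    """Step response: T(t) = T_ss + (T0 - T_ss) * exp(-t / tau)."""
    t_ss = t_amb_c + power_w * R_TH  # steady-state temperature
    return t_ss + (t0_c - t_ss) * math.exp(-t_s / TAU_S)

# A 60 W power step produces a 30 K steady-state rise, but a sampled
# simulation window of milliseconds captures almost none of it:
for t in (0.001, 0.1, 1.0, 10.0, 60.0):
    print(f"t = {t:6.3f} s -> T = {temp_after(60.0, t):6.2f} C")
```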

    An Energy Efficient GPGPU Memory Hierarchy with Tiny Incoherent Caches

    With progressive generations and the ever-increasing promise of computing power, GPGPUs have quickly grown in size, and at the same time energy consumption has become a major bottleneck for them. The first-level data cache and the scratchpad memory are critical to the performance of a GPGPU, but they are extremely energy inefficient due to the large number of cores they must serve. This problem could be mitigated by introducing a cache higher in the hierarchy that services fewer cores, but doing so introduces cache-coherency issues that may become very significant, especially for a GPGPU with hundreds of thousands of in-flight threads. In this paper, we propose adding incoherent tinyCaches between each lane in an SM and the first-level data cache that is currently shared by all the lanes in an SM. In a conventional multiprocessor, this would require hardware cache coherence across all the SM lanes, capable of handling hundreds of thousands of threads. Our incoherent tinyCache architecture exploits certain unique features of the CUDA/OpenCL programming model to avoid complex coherence schemes. The tinyCache filters out 62% of the memory requests that would otherwise need to be serviced by the DL1G, and almost 81% of scratchpad memory requests, allowing us to achieve a 37% energy reduction in the on-chip memory hierarchy. We evaluate the tinyCache for different memory patterns and show that it is beneficial in most cases.
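
    The toy model below captures the filtering idea: a tiny, direct-mapped, incoherent cache in front of the shared first-level cache that is simply flushed at synchronization points, which is the property of the CUDA/OpenCL model that makes hardware coherence unnecessary between syncs. The cache geometry, flush policy, and access pattern are illustrative assumptions, not the paper's exact design.

```python
# Toy per-lane incoherent cache that filters requests before the
# shared first-level cache; a sketch, not the paper's tinyCache.

class TinyCache:
    def __init__(self, num_lines=8, line_bytes=32):
        self.num_lines, self.line_bytes = num_lines, line_bytes
        self.tags = [None] * num_lines

    def access(self, addr):
        """Return True on a hit (request filtered from the DL1)."""
        block = addr // self.line_bytes
        idx = block % self.num_lines
        if self.tags[idx] == block:
            return True
        self.tags[idx] = block  # fill on miss (read-allocate)
        return False

    def flush(self):
        """Invalidate everything at a sync point / kernel boundary."""
        self.tags = [None] * self.num_lines

# Strided reads with temporal reuse: count how many requests the
# tinyCache filters before they reach the shared first-level cache.
cache, hits, total = TinyCache(), 0, 0
for _ in range(4):                 # temporal reuse across repetitions
    for addr in range(0, 1024, 4): # 4-byte strided reads
        hits += cache.access(addr)
        total += 1
cache.flush()                      # barrier between kernel phases
print(f"filtered {hits}/{total} = {100 * hits / total:.0f}% of requests")
```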