    Splitwise: Efficient generative LLM inference using phase splitting

    Recent innovations in generative large language models (LLMs) have made their applications and use-cases ubiquitous. This has led to large-scale deployments of these models on complex, expensive, and power-hungry AI accelerators, most commonly GPUs, which makes LLM inference efficiency an important challenge. Based on our extensive characterization, we find that an LLM inference request has two main phases: a compute-intensive prompt computation and a memory-intensive token generation, each with distinct latency, throughput, memory, and power characteristics. Despite state-of-the-art batching and scheduling, the token generation phase underutilizes compute resources. Specifically, unlike the compute-intensive prompt computation phase, the token generation phase does not require the compute capability of the latest GPUs and can be run with lower power and cost. With Splitwise, we propose splitting the two phases of an LLM inference request onto separate machines. This allows us to use hardware that is well-suited to each phase and to provision resources independently per phase. However, splitting an inference request across machines requires transferring state from the machine running prompt computation to the machine generating tokens. We implement and optimize this state transfer using the fast back-plane interconnects available in today's GPU clusters. We use the Splitwise technique to design LLM inference clusters that use the same or different types of machines for the prompt computation and token generation phases. Our clusters are optimized for three key objectives: throughput, cost, and power. In particular, we show that we can achieve 1.4x higher throughput at 20% lower cost than current designs. Alternatively, we can achieve 2.35x more throughput with the same cost and power budgets.
    Comment: 12 pages, 19 figures
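    The phase split described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the Splitwise implementation: `PromptMachine`, `TokenMachine`, `KVCache`, `transfer_state`, and the arithmetic "model" in `toy_step` are all hypothetical stand-ins. The state handed between machines is modeled as a plain list, standing in for the per-request state (such as the attention key/value cache) that must be shipped over the interconnect. The prompt phase consumes the whole prompt and emits the first output token; the token phase then generates the remaining tokens one at a time.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

VOCAB_SIZE = 50  # hypothetical toy vocabulary size


@dataclass
class KVCache:
    """Toy stand-in for the per-request state built up during inference."""
    entries: List[int] = field(default_factory=list)


def toy_step(token: int, cache: KVCache) -> int:
    """Hypothetical model step: record the token's state, predict the next token."""
    cache.entries.append(token * 31 % 97)
    return sum(cache.entries) % VOCAB_SIZE


class PromptMachine:
    """Prompt phase: consumes the whole prompt and emits the first output token."""

    def run(self, prompt: List[int]) -> Tuple[int, KVCache]:
        if not prompt:
            raise ValueError("prompt must contain at least one token")
        cache = KVCache()
        next_token = 0
        for tok in prompt:  # a real system processes prompt tokens in parallel
            next_token = toy_step(tok, cache)
        return next_token, cache


def transfer_state(cache: KVCache) -> KVCache:
    """Stand-in for copying the state to another machine over the interconnect."""
    return KVCache(entries=list(cache.entries))


class TokenMachine:
    """Token phase: generates the remaining tokens one per iteration."""

    def run(self, first_token: int, cache: KVCache, max_new_tokens: int) -> List[int]:
        # Assumes max_new_tokens >= 1; the first token came from the prompt phase.
        output = [first_token]
        while len(output) < max_new_tokens:
            output.append(toy_step(output[-1], cache))
        return output


def serve_split(prompt: List[int], max_new_tokens: int) -> List[int]:
    """Run the two phases on separate (simulated) machines with a state transfer."""
    first_token, cache = PromptMachine().run(prompt)
    remote_cache = transfer_state(cache)
    return TokenMachine().run(first_token, remote_cache, max_new_tokens)


def serve_single(prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: both phases on one machine, with no state transfer."""
    first_token, cache = PromptMachine().run(prompt)
    return TokenMachine().run(first_token, cache, max_new_tokens)


if __name__ == "__main__":
    print(serve_split([3, 14, 15, 92], max_new_tokens=6))
```

    Because the transferred state is an exact copy of the local one, the split and single-machine paths produce the same tokens; in a real deployment what changes is which hardware runs each phase and the latency and bandwidth cost of the transfer.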

    POLCA: Power Oversubscription in LLM Cloud Providers

    Recent innovation in large language models (LLMs) and their myriad use-cases have rapidly driven up the demand for datacenter GPU compute capacity. Several cloud providers and other enterprises have made substantial plans to grow their datacenters to support these new workloads. One of the key bottleneck resources in datacenters is power, and as LLM model sizes grow, these models are becoming increasingly power-intensive. In this paper, we show that there is a significant opportunity to oversubscribe power in LLM clusters. Power oversubscription improves the power efficiency of these datacenters, allowing more servers to be deployed per datacenter, and reduces deployment time, since building new datacenters is slow. We extensively characterize the power consumption patterns of a variety of LLMs and their configurations, and we identify the differences between inference and training power consumption patterns. Based on our analysis of these LLMs, we claim that the average and peak power utilization in LLM clusters for inference should not be very high. Our deductions align with data from production LLM clusters, revealing that inference workloads offer substantial headroom for power oversubscription. However, the limited set of telemetry and controls that GPUs offer in a virtualized environment makes it challenging to build a reliable and robust power oversubscription mechanism. We propose POLCA, our framework for power oversubscription that is robust, reliable, and readily deployable for GPU clusters. Using open-source models to replicate the power patterns observed in production, we simulate POLCA and demonstrate that we can deploy 30% more servers in the same GPU cluster for inference, with minimal performance loss.
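    The headroom argument can be made concrete with back-of-the-envelope arithmetic. The sketch below is a simplified assumption, not POLCA's mechanism: it compares provisioning each server at its rated peak with provisioning at a high percentile of observed power draw, and flags the moments when the total draw would exceed the budget, which is when a capping control would have to step in. All function names, the 99th-percentile threshold, and the wattages used are hypothetical.

```python
import math
from typing import List


def nearest_rank_percentile(samples: List[float], q: int) -> float:
    """Nearest-rank percentile (q in 1..100) of observed power samples."""
    if not samples or not 1 <= q <= 100:
        raise ValueError("need at least one sample and 1 <= q <= 100")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def servers_at_nameplate(budget_w: float, rated_w: float) -> int:
    """Conventional provisioning: reserve each server's rated peak power."""
    return int(budget_w // rated_w)


def servers_oversubscribed(budget_w: float, samples_w: List[float], q: int = 99) -> int:
    """Hypothetical oversubscribed provisioning at the q-th percentile of observed draw."""
    return int(budget_w // nearest_rank_percentile(samples_w, q))


def needs_capping(budget_w: float, server_draws_w: List[float]) -> bool:
    """True when instantaneous total draw exceeds the budget: the rare
    excursion that a power-capping mechanism must absorb."""
    return sum(server_draws_w) > budget_w
```

    With a made-up 100 kW budget, a 6,500 W rated peak, and per-server samples that sit at 5,000 W 99% of the time, nameplate provisioning fits 15 servers while percentile provisioning fits 20. The price is that simultaneous peaks can now exceed the budget, which is why a reliable capping mechanism, rather than the headroom itself, is the hard part of such a design.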

    A comparative approach of tumor associated inflammation in mammary cancer between humans and dogs

    Infiltrating cells of the immune system are widely accepted to be generic constituents of the tumor microenvironment. It is well established that the development of mammary cancer, in both humans and dogs, is associated with alterations in the numbers and functions of immune cells at the sites of tumor progression. These tumor-infiltrating immune cells seem to exhibit unique phenotypic and functional characteristics, and mammary cancer cells can take advantage of the signaling molecules they release. Cancer-related inflammation has an important role in mammary carcinogenesis, contributing to the acquisition of core hallmark capabilities that allow cancer cells to survive, proliferate, and disseminate. Indeed, recent studies in human breast cancer and in canine mammary tumors have identified a growing list of signaling molecules released by inflammatory cells that serve as effectors of their tumor-promoting actions. These include COX-2, the epidermal growth factor (EGF), the angiogenic growth factor VEGF, other proangiogenic factors, and a large variety of chemokines and cytokines that amplify the inflammatory state. This review describes the intertwined signaling pathways shared by T-lymphocytic/macrophage infiltrates and important tissue biomarkers in both human and canine mammary carcinogenesis.
    The work was supported in part by the Strategic Research project Pest-OE/AGR/UI0772/2011 and the Research Project UID/AGR/04033/2013, by a Ph.D. scholarship SFRH/BD/78771/2011 financed by the Portuguese Foundation for Science and Technology (FCT), and in part by the Austrian Science Fund (FWF), SFB F4606-B28, to Erika Jensen Jarolim.


    In distance education, many theories address andragogy: the adult taking a distance course and the adult learner's characteristics. This article offers a different perspective, a distance course aimed at adolescents. Accordingly, its objective is to analyze the role of the tutor in a distance course for adolescents. Methodologically, the study is theoretical-empirical, applied, participatory, and qualitative, taking the form of a case study. The results show some differences in the tutor's role. For example, technological issues are not perceived as a complicating factor, since the adolescent students are more at ease with technology. The virtual environment used should also offer tools similar to the social networks this audience uses. On the other hand, loss of motivation toward the course is a clear tendency, since many adolescents cannot see what benefits the course may bring to their future, reflecting a more short-term outlook. The tutor should therefore work much more on motivation than on mediating the technology.

    Does the number of implants have any relation with peri-implant disease?

    Objective: The aim of this study was to evaluate the relationship between the number of supporting implants in implant-supported fixed prostheses and the prevalence of peri-implant disease. Material and Methods: Clinical and radiographic data were obtained for the evaluation. The sample consisted of 32 patients with implant-supported fixed prostheses in function for at least one year. A total of 161 implants were evaluated. Two groups were formed according to the number of implants: G1) ≤5 implants and G2) >5 implants. Data collection included the modified plaque index (MPi), bleeding on probing (BOP), probing depth (PD), width of keratinized mucosa (KM), and radiographic bone loss (BL). Clinical and radiographic data were grouped for each implant in order to diagnose mucositis or peri-implantitis. Results: Clinical parameters were compared between groups using Student's t test for numeric variables (KM, PD, and BL) and the Mann-Whitney test for categorical variables (MPi and BOP). KM and BL showed statistically significant differences between the two groups (

    The contributions of the new Sudene to the development of the Northeast

    This article presents a preliminary assessment of the contributions of the new Sudene to the development of the Northeast, based on the policy guidelines established since its re-creation in 2007. The study used bibliographic and documentary research to map Sudene's guidelines and priorities. Secondary statistical data were also collected to analyze the socioeconomic evolution of the Northeast from 2007 to 2017, using GDP, GDP per capita, and Gini index data from IBGE and Firjan's municipal human development index. The results indicate that Sudene has regained its strategic importance in federal government policy, as shown by the legal requirement to align the Regional Development Plan for the Northeast (Plano Regional de Desenvolvimento do Nordeste) with the Multi-Year Plan (Plano Plurianual), with Sudene taking part in drafting the federal budget to secure resources for the region's development. A decade after its re-creation, socioeconomic progress is evident, such as the highest GDP per capita growth in the country, 41.5% between 2007 and 2015, against a national average of 29%. The human development index improved significantly in the health and education dimensions, with cumulative changes over the period of 55.7% and 70.5%, respectively. Nevertheless, the limits of this progress remain notable: GDP per capita is still roughly half the national average, and the Northeast still has the highest income inequality in the country.