21 research outputs found

    Boosting Backward Search Throughput for FM-Index Using a Compressed Encoding

    The rapid development of DNA sequencing technologies has created demand for compressed data structures that support fast pattern-matching queries. The FM-index is a widely used compressed data structure that supports such queries. The exact-matching algorithm is commonly memory bound, which results in poor performance. Searching several symbols in a single step improves data locality, although the memory bandwidth requirements remain the same. We propose a new data layout for the FM-index, called Split bit-vector, that compacts all the data needed to search k symbols in a single step (k-step), reducing both memory movement and computing requirements at the cost of a larger memory footprint. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.
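
For orientation, the baseline that this abstract improves on, one-symbol-per-step backward search, can be sketched as follows. This is a toy illustration only: it is not the Split bit-vector layout or the k-step search from the paper, and the names `build_fm_index` and `count_occurrences`, the `$` sentinel, and the uncompressed `occ` table are assumptions made for the example.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy, uncompressed FM-index (C table and Occ table) for text."""
    # Assumption: "$" does not occur in text and sorts before every other symbol.
    text += "$"
    # Naive suffix array; adequate for a short illustrative string only.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the symbol preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in suffix_array)
    counts = Counter(text)
    # c_table[sym] = number of symbols in text strictly smaller than sym.
    c_table, total = {}, 0
    for sym in sorted(counts):
        c_table[sym] = total
        total += counts[sym]
    # occ[sym][i] = number of occurrences of sym in bwt[:i].
    occ = {sym: [0] * (len(bwt) + 1) for sym in counts}
    for i, ch in enumerate(bwt):
        for sym in counts:
            occ[sym][i + 1] = occ[sym][i] + (1 if sym == ch else 0)
    return c_table, occ, len(bwt)


def count_occurrences(pattern, c_table, occ, n):
    """Count matches of pattern, consuming one symbol per backward step."""
    lo, hi = 0, n  # half-open range of suffix-array rows still matching
    for sym in reversed(pattern):
        if sym not in c_table:
            return 0
        # Each step reads two Occ entries; these scattered reads are what
        # makes the search memory bound on large indexes.
        lo = c_table[sym] + occ[sym][lo]
        hi = c_table[sym] + occ[sym][hi]
        if lo >= hi:
            return 0
    return hi - lo


c_table, occ, n = build_fm_index("banana")
print(count_occurrences("ana", c_table, occ, n))
```

Every iteration of the loop touches two positions of the Occ table that may be far apart in memory, which is the access pattern that a k-step layout aims to amortize over several symbols.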

    Distributed training strategies for a computer vision deep learning algorithm on a distributed GPU cluster

    Deep learning algorithms base their success on building high-capacity models with millions of parameters that are tuned in a data-driven fashion. These models are trained by processing millions of examples, so the development of more accurate algorithms is usually limited by the throughput of the computing devices on which they are trained. In this work, we explore how the training of a state-of-the-art neural network for computer vision can be parallelized on a distributed GPU cluster. The effect of distributing the training process is addressed from two different points of view. First, the scalability of the task and its performance in the distributed setting are analyzed. Second, the impact of distributed training methods on the final accuracy of the models is studied. This work is partially supported by the Spanish Ministry of Economy and Competitiveness under contract TIN2012-34557, by the BSC-CNS Severo Ochoa program (SEV-2011-00067), by the SGR programmes (2014-SGR-1051 and 2014-SGR-1421) of the Catalan Government and by the framework of the project BigGraph TEC2013-43935-R, funded by the Spanish Ministerio de Economia y Competitividad and the European Regional Development Fund (ERDF). We would also like to thank the technical support team at the Barcelona Supercomputing Center (BSC), especially Carlos Tripiana. Peer Reviewed. Postprint (published version)

    Near-optimal replacement policies for shared caches in multicore processors

    An optimal replacement policy that minimizes the miss rate in a private cache was proposed several decades ago. It requires knowing the future access sequence the cache will receive. There is no equivalent for shared caches because replacement decisions alter this future sequence. We present a novel near-optimal policy for minimizing the miss rate in a shared cache that approaches the optimal execution iteratively. During each iteration, the future access sequence is reconstructed on every miss by interleaving the future per-core sequences taken from the previous iteration. This single sequence feeds a classical private-cache optimum replacement policy. Our evaluation on a shared last-level cache shows that our proposal iteratively converges to a near-optimal miss rate that is independent of the initial conditions, within a margin of 0.1%. The best state-of-the-art online policies achieve around 65% of the miss rate reduction obtained by our near-optimal proposal. In a shared cache, miss rate optimization does not imply the optimization of other metrics. Therefore, we also propose a new near-optimal policy to maximize fairness between cores. The best state-of-the-art online policy achieves 60% of the improvement in fairness seen with our near-optimal policy. Our proposals are useful both for setting upper performance bounds and for inspiring implementable mechanisms for shared caches. The authors acknowledge support from grants (1) PID2019-105660RB-C21 and PID2019-107255GB-C22 from Agencia Estatal de Investigación (AEI) from Spain and European Regional Development Fund (ERDF); (2) gaZ: T58_20R research group from Aragón Government and European Social Fund (ESF); and (3) 2014-2020 "Construyendo Europa desde Aragón" from European Regional Development Fund (ERDF). Peer Reviewed. Postprint (author's final draft)
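
The classical private-cache optimum that the abstract builds on evicts the block whose next use lies farthest in the future. A minimal sketch for a fully associative cache follows. It covers only this private-cache building block, not the iterative shared-cache policy of the paper, and the function name `optimal_misses` and the example trace are assumptions made for illustration.

```python
def optimal_misses(accesses, capacity):
    """Count misses of the farthest-next-use policy on a known access trace."""
    # next_use[i] = index of the next access to accesses[i], or infinity.
    next_use = [0] * len(accesses)
    last_seen = {}
    for i in range(len(accesses) - 1, -1, -1):
        next_use[i] = last_seen.get(accesses[i], float("inf"))
        last_seen[accesses[i]] = i

    cache = {}  # resident block -> index of its next use
    misses = 0
    for i, block in enumerate(accesses):
        if block not in cache:
            misses += 1
            if len(cache) >= capacity:
                # Evict the resident block that is reused farthest in the future.
                victim = max(cache, key=cache.get)
                del cache[victim]
        cache[block] = next_use[i]
    return misses


# Hypothetical trace of block addresses and a 3-block cache.
trace = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(optimal_misses(trace, capacity=3))  # prints 7
```

The policy needs the whole future trace up front, which is why it serves as an offline bound. In a shared cache each eviction changes the relative timing of the cores, so the trace itself shifts, which is the difficulty the abstract addresses by iterating.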

    STT-RAM memory hierarchy designs aimed to performance, reliability and energy consumption

    Current applications demand larger on-chip memory capacity because off-chip memory accesses have become a bottleneck. However, achieving this by scaling down the transistor size of SRAM-based Last-Level Caches (LLCs) may become prohibitive in terms of cost, area and energy. Therefore, other technologies such as STT-RAM are becoming real alternatives for building the LLC in multicore systems. Although STT-RAM bitcells feature high density and low static power, they come with their own trade-offs. On the one hand, STT-RAM writes are more expensive than STT-RAM reads and SRAM writes. To address this asymmetry, we will propose microarchitectural techniques to minimize the number of write operations on STT-RAM cells. On the other hand, reliability also plays an important role. STT-RAM cells suffer from three types of errors: write, read disturbance, and retention errors. To address these, we will suggest techniques to manage redundant information, allowing error detection and information recovery. Postprint (published version)

    Compression-aware and performance-efficient insertion policies for long-lasting hybrid LLCs

    Emerging non-volatile memory (NVM) technologies can potentially replace large SRAM memories such as the last-level cache (LLC). However, despite recent advances, NVMs suffer from higher write latency and limited write endurance. Recently, NVM-SRAM hybrid LLCs have been proposed to combine the best of both worlds. Several policies have been proposed to improve the performance and lifetime of hybrid LLCs by intelligently steering incoming LLC blocks into either the SRAM or the NVM part, according to the cache behavior of the blocks and the SRAM/NVM device properties. However, these policies consider neither compressing the contents of the cache block nor using partially worn-out NVM cache blocks. This paper proposes new insertion policies for byte-level fault-tolerant hybrid LLCs that collaboratively optimize for lifetime and performance. Specifically, we leverage data compression to utilize partially defective NVM cache entries, thereby improving the LLC hit rate. The key to our approach is to guide the insertion policy by both the reuse properties of the block and the size resulting from its compression. A block is inserted in NVM only if it is a read-reuse block or its compressed size is lower than a threshold. It is inserted in SRAM if it is a write-reuse block or its compressed size is greater than the threshold. We use set-dueling to tune the compression threshold at runtime. This compression threshold provides a knob to control the NVM write rate and, together with a rule-based mechanism, allows balancing performance and lifetime. Overall, our evaluation shows that, with affordable hardware overheads, the proposed schemes can nearly reach the performance of an SRAM cache with the same associativity while improving lifetime by 17× compared to a hybrid NVM-unaware LLC. Our proposed scheme outperforms the state-of-the-art insertion policies by 9% while achieving a comparable lifetime. The rule-based mechanism shows that by sacrificing, for instance, 1.1% and 1.9% of performance, the NVM lifetime can be further increased by 28% and 44%, respectively. This work was partially funded by the HiPEAC collaboration grant 2020, the Center for Advancing Electronics Dresden (cfaed), the German Research Council (DFG) through the HetCIM project (502388442) under the Priority Program on ‘Disruptive Memory Technologies’ (SPP 2377), and from grants (1) PID2019-105660RB-C21 and PID2019-107255GB-C22/AEI/10.13039/501100011033 from Agencia Estatal de Investigación (AEI), and (2) gaZ: T58_20R research group from Dept. of Science, University and Knowledge Society, Government of Aragon. Peer Reviewed. Postprint (author's final draft)
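
The insertion rule stated in this abstract can be read as a small decision function. The sketch below is one possible reading and not the authors' implementation. The abstract does not say which criterion wins when reuse class and compressed size disagree, so the code assumes that the reuse class takes priority and that the size test applies only to blocks with no known reuse. The names `Reuse` and `choose_partition`, the third `NONE` class, the tie-break at equality, and the byte values are all hypothetical.

```python
from enum import Enum


class Reuse(Enum):
    READ = "read-reuse"
    WRITE = "write-reuse"
    NONE = "no known reuse"  # hypothetical class, not named in the abstract


def choose_partition(reuse, compressed_size, threshold):
    """Return "NVM" or "SRAM" for an incoming block.

    Assumption: the reuse class is checked first; the abstract does not
    specify the priority between reuse class and compressed size.
    """
    if reuse is Reuse.READ:
        return "NVM"   # read-reuse blocks cause few NVM writes
    if reuse is Reuse.WRITE:
        return "SRAM"  # write-reuse blocks would wear out NVM cells
    # Assumption: a size equal to the threshold goes to SRAM.
    return "NVM" if compressed_size < threshold else "SRAM"


# Hypothetical threshold of 32 bytes; the abstract tunes it at runtime.
print(choose_partition(Reuse.NONE, compressed_size=20, threshold=32))
print(choose_partition(Reuse.WRITE, compressed_size=20, threshold=32))
```

Lowering `threshold` sends fewer blocks to NVM, which is how the abstract's threshold acts as a knob on the NVM write rate. The set-dueling logic that picks the threshold at runtime is not modeled here.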

    Selección de fragmentos representativos de aplicaciones paralelas en el diseño de redes para CMPs [Selection of representative fragments of parallel applications in network design for CMPs]

    The design of new computer architectures relies on complex and costly simulators. Their main limitation is their high cost in time and memory, which leads either to sacrificing simulator accuracy or to using applications that are too lightweight to be representative. We studied the workloads to be simulated, specifically the PARSEC suite, analyzing the impact of input size on the memory hierarchy when these applications run with a single thread. We found that larger inputs do not necessarily stress the memory hierarchy more, which allowed us to propose a set of inputs that yields representative results with simulations that are, on average, 400 times faster than with the real input. We currently aim to extend these results to a multiprocessor environment, which requires taking the memory coherence protocol into account. Using our selection of inputs for the PARSEC applications is more suitable than systematically using a smaller input, since it preserves the simulation time without losing representativeness. The next goal is to use this selection to characterize traffic in on-chip networks. At present, there are hardly any studies that model in detail the combination of processors, interconnection network, and memory hierarchy. We will analyze the behavior of real applications on the network, carefully modeling all of these components. Based on the results obtained, we intend to propose a network design that offers both good performance and low energy consumption.

    Leveraging data compression for performance-efficient and long-lasting NVM-based last-level cache

    Non-volatile memory (NVM) technologies are interesting alternatives for building on-chip Last-Level Caches (LLCs). Their advantages over SRAM are higher density and lower static power, but each write operation slightly wears out the bitcell, to the point of losing its storage capacity. In this context, this paper summarizes three contributions to state-of-the-art NVM-based LLCs. Data compression reduces the size of the blocks and, together with wear-leveling mechanisms, can defer the wear-out of NVMs. Moreover, as capacity is reduced by write wear, data compression enables degraded cache frames to hold blocks whose compressed size still fits. Our first contribution is a microarchitecture design that leverages data compression and intra-frame wear-leveling to deal gracefully with the capacity degradation of NVM-LLCs. The second contribution leverages this microarchitecture design to propose new insertion policies for hybrid LLCs using Set Dueling and taking into account the compression capabilities of the blocks. From a methodological point of view, although different approaches are used in the literature to analyze the degradation of a NV-LLC, none of them allows its temporal evolution to be studied in detail. The third contribution is therefore a forecasting procedure that combines detailed simulation and prediction, enabling an accurate analysis of the effect of different cache content mechanisms (replacement, wear leveling, compression, etc.) on the temporal evolution of the performance of multiprocessor systems employing such NVM-LLCs. Using this forecasting procedure, we show that the proposed NVM-LLC organizations and the insertion policies for hybrid LLCs significantly outperform the state of the art in both performance and lifetime metrics. Peer Reviewed. Postprint (published version)

    Hospital-based proton therapy implementation during the COVID pandemic: early clinical and research experience in a European academic institution

    Introduction: The rapid and unexpected early impact of the COVID pandemic in Spain was described in 2020. Oncology practice was revised to facilitate decision-making regarding multimodal therapy for prevalent cancer types amenable to multidisciplinary treatment, in which the radiotherapy component sought more efficient options in the setting of the COVID-19 pandemic, minimizing the risks to patients whilst aiming to guarantee cancer outcomes. Methods: The activity of a novel Proton Beam Therapy (PBT) Unit was analyzed for the period from March 2020 to March 2021. Urgent, strict and mandatory institutional clinical care standards for early diagnosis and treatment of COVID-19 infection were established in the hospital following the recommendations of the national health authorities. The temporal trends of patient care and research project proposals were recorded. Results: 3 out of 14 members of the professional staff involved in the PBT intra-hospital process had a positive test for COVID infection. Also, 4 out of 100 patients had positive tests before initiating PBT, and 7 out of 100 developed positive tests during the weekly mandatory special checkup performed on all patients during PBT. An update of clinical performance at the PBT Unit at CUN Madrid for the initial 500 patients treated with PBT in the period from March 2020 to November 2022 registers a distribution of 131 (26%) pediatric patients, 63 (12%) head and neck cancer and central nervous system neoplasms and 123 (24%) re-irradiation indications. In November 2022, the activity reached a plateau in terms of patients under treatment, and the impact of the COVID pandemic became sporadic and was controlled by minor medical actions. At present, the clinical data are consistent with a prospective academic practice (NCT05151952). Research projects and scientific production were adapted to the evolution of the pandemic and its influence on professional time availability. Seven publicly funded research projects were activated in this period, and preliminary data on molecular imaging guided proton therapy in brain tumors and post-irradiation patterns of blood biomarkers are reported. Conclusions: Hospital-based PBT in European academic institutions was impacted by the COVID-19 pandemic, although clinical and research activities were developed and sustained. In the post-pandemic era, the benefits of online learning will shape the future of proton therapy education.

    Orthoxenografts of testicular germ cell tumors demonstrate genomic changes associated with cisplatin resistance and identify PDMP as a resensitizing agent

    [Purpose] To investigate the genetic basis of cisplatin resistance, as the efficacy of cisplatin-based chemotherapy in the treatment of distinct malignancies is often hampered by intrinsic or acquired drug resistance of tumor cells. [Experimental Design] We produced 14 orthoxenografts by transplanting human nonseminomatous testicular germ cell tumors (TGCT) in mice, preserving the primary tumor features in terms of genotype, phenotype, and sensitivity to cisplatin. Chromosomal and genetic alterations were evaluated in matched cisplatin-sensitive orthoxenografts and their counterparts that developed resistance to cisplatin in nude mice. [Results] Comparative genomic hybridization analyses of four matched orthoxenografts identified recurrent chromosomal rearrangements across cisplatin-resistant tumors in three of them, showing gains at the 9q32-q33.1 region. We found a clinical correlation between the presence of 9q32-q33.1 gains in cisplatin-refractory patients and poorer overall survival (OS) in metastatic germ cell tumors. We studied the expression profile of the 60 genes located in that genomic region. POLE3 and AKNA were the only two genes deregulated in resistant tumors harboring the 9q32-q33.1 gain. Moreover, four other genes (GCS, ZNF883, CTR1, and FLJ31713) were deregulated in all five resistant tumors independently of the 9q32-q33.1 amplification. RT-PCRs in tumors and functional analyses in Caenorhabditis elegans (C. elegans) indicate that the influence of 9q32-q33.1 genes on cisplatin resistance can be driven by either up- or downregulation. We focused on glucosylceramide synthase (GCS) to demonstrate that the GCS inhibitor DL-threo-PDMP resensitizes cisplatin-resistant germline-derived orthoxenografts to cisplatin. [Conclusions] Orthoxenografts can be used preclinically not only to test the efficiency of drugs but also to identify prognosis markers and gene alterations acting as drivers of acquired cisplatin resistance. Several authors are grateful recipients of predoctoral fellowships: J.M. Piulats from the AECC and F.J. García-Rodríguez from the Instituto de Salud Carlos III (ISCIII). This study was supported by grants from the Spanish Ministry of Economy and Competitiveness (SAF2002-02265 and FIS: BFU2007-67123; PI10-0222, PI13-01339, and PI16/01898, to A. Villanueva; PI15-00895, to J. Ceron; SAF2013-46063R, to F. Viñals; PI030264, to X. García-del-Muro), Fundació La Marató TV3 (051430, to F. Viñals and X. García-del-Muro), Generalitat de Catalunya (2014SGR364, to A. Villanueva and F. Viñals; FIS09/0059, to A. Morales), cofunded by FEDER funds/European Regional Development Fund (ERDF), "a way to build Europe". A. Villanueva received a BAE11/00073 grant. We thank the staff of the Animal Core Facility of IDIBELL for mouse care and maintenance. Peer reviewed