523 research outputs found

    Microresonator solitons for massively parallel coherent optical communications

    Full text link
    Optical solitons are waveforms that preserve their shape while propagating, relying on a balance of dispersion and nonlinearity. Soliton-based data transmission schemes were investigated in the 1980s, promising to overcome the limitations imposed by dispersion of optical fibers. These approaches, however, were eventually abandoned in favor of wavelength-division multiplexing (WDM) schemes that are easier to implement and offer improved scalability to higher data rates. Here, we show that solitons may experience a comeback in optical communications, this time not as a competitor, but as a key element of massively parallel WDM. Instead of encoding data on the soliton itself, we exploit continuously circulating dissipative Kerr solitons (DKS) in a microresonator. DKS are generated in an integrated silicon nitride microresonator by four-photon interactions mediated by Kerr nonlinearity, leading to low-noise, spectrally smooth and broadband optical frequency combs. In our experiments, we use two interleaved soliton Kerr combs to transmit a data stream of more than 50Tbit/s on a total of 179 individual optical carriers that span the entire telecommunication C and L bands. Equally important, we demonstrate coherent detection of a WDM data stream by using a pair of microresonator Kerr soliton combs - one as a multi-wavelength light source at the transmitter, and another one as a corresponding local oscillator (LO) at the receiver. This approach exploits the scalability advantages of microresonator soliton comb sources for massively parallel optical communications both at the transmitter and receiver side. Taken together, the results prove the significant potential of these sources to replace arrays of continuous-wave lasers in high-speed communications.Comment: 10 pages, 3 figure

    CVA6 RISC-V Virtualization: Architecture, Microarchitecture, and Design Space Exploration

    Full text link
    Virtualization is a key technology used in a wide range of applications, from cloud computing to embedded systems. Over the last few years, mainstream computer architectures were extended with hardware virtualization support, giving rise to a set of virtualization technologies (e.g., Intel VT, Arm VE) that are now proliferating in modern processors and SoCs. In this article, we describe our work on hardware virtualization support in the RISC-V CVA6 core. Our contribution is multifold and encompasses architecture, microarchitecture, and design space exploration. In particular, we highlight the design of a set of microarchitectural enhancements (i.e., G-Stage Translation Lookaside Buffer (GTLB), L2 TLB) to alleviate the virtualization performance overhead. We also perform a Design Space Exploration (DSE) and accompanying post-layout simulations (based on 22nm FDX technology) to assess Performance, Power ,and Area (PPA). Further, we map design variants on an FPGA platform (Genesys 2) to assess the functional performance-area trade-off. Based on the DSE, we select an optimal design point for the CVA6 with hardware virtualization support. For this optimal hardware configuration, we collected functional performance results by running the MiBench benchmark on Linux atop Bao hypervisor for a single-core configuration. We observed a performance speedup of up to 16% (approx. 12.5% on average) compared with virtualization-aware non-optimized design at the minimal cost of 0.78% in area and 0.33% in power. Finally, all work described in this article is publicly available and open-sourced for the community to further evaluate additional design configurations and software stacks

    Memory system support for image processing

    Get PDF
    Journal ArticleProcessor speeds are increasing rapidly, but memory speeds are not keeping pace. Image processing is an important application domain that is particularly impacted by this growing performance gap. Image processing algorithms tend to have poor memory locality because they access their data in a non-sequential fashion and reuse that data infrequently. As a result, they often exhibit poor cache and TLB hit rates on conventional memory systems, which limits overall performance. Most current approaches to addressing the memory bottleneck focus on modifying cache organizations or introducing processor-based prefetching. The Impulse memory system takes a different approach: allowing application software to control how, when, and where data are loaded into a conventional processor cache. Impulse does this by letting software configure how the memory controller interprets the physical addresses exported by a processor. Introducing an extra level of address translation in the memory. Data that is sparse in memory can be accessed densely, which improves both cache and TLB utilization, and Impulse hides memory latency by prefectching data within the memory controller. We describe how Impulse improves the performance of three image processing algorithms: an Impulse memory system yields speedups of 40% to 226% over an otherwise identical machine with a conventional memory system

    Splotch: porting and optimizing for the Xeon Phi

    Get PDF
    With the increasing size and complexity of data produced by large scale numerical simulations, it is of primary importance for scientists to be able to exploit all available hardware in heterogenous High Performance Computing environments for increased throughput and efficiency. We focus on the porting and optimization of Splotch, a scalable visualization algorithm, to utilize the Xeon Phi, Intel's coprocessor based upon the new Many Integrated Core architecture. We discuss steps taken to offload data to the coprocessor and algorithmic modifications to aid faster processing on the many-core architecture and make use of the uniquely wide vector capabilities of the device, with accompanying performance results using multiple Xeon Phi. Finally performance is compared against results achieved with the GPU implementation of Splotch

    Design of Efficient TLB-based Data Classification Mechanisms in Chip Multiprocessors

    Full text link
    Most of the data referenced by sequential and parallel applications running in current chip multiprocessors are referenced by a single thread, i.e., private. Recent proposals leverage this observation to improve many aspects of chip multiprocessors, such as reducing coherence overhead or the access latency to distributed caches. The effectiveness of those proposals depends to a large extent on the amount of detected private data. However, the mechanisms proposed so far either do not consider either thread migration or the private use of data within different application phases, or do entail high overhead. As a result, a considerable amount of private data is not detected. In order to increase the detection of private data, this thesis proposes a TLB-based mechanism that is able to account for both thread migration and private application phases with low overhead. Classification status in the proposed TLB-based classification mechanisms is determined by the presence of the page translation stored in other core's TLBs. The classification schemes are analyzed in multilevel TLB hierarchies, for systems with both private and distributed shared last-level TLBs. This thesis introduces a page classification approach based on inspecting other core's TLBs upon every TLB miss. In particular, the proposed classification approach is based on exchange and count of tokens. Token counting on TLBs is a natural and efficient way for classifying memory pages. It does not require the use of complex and undesirable persistent requests or arbitration, since when two ormore TLBs race for accessing a page, tokens are appropriately distributed classifying the page as shared. However, TLB-based ability to classify private pages is strongly dependent on TLB size, as it relies on the presence of a page translation in the system TLBs. To overcome that, different TLB usage predictors (UP) have been proposed, which allow a page classification unaffected by TLB size. Specifically, this thesis introduces a predictor that obtains system-wide page usage information by either employing a shared last-level TLB structure (SUP) or cooperative TLBs working together (CUP).La mayor parte de los datos referenciados por aplicaciones paralelas y secuenciales que se ejecutan enCMPs actuales son referenciadas por un único hilo, es decir, son privados. Recientemente, algunas propuestas aprovechan esta observación para mejorar muchos aspectos de los CMPs, como por ejemplo reducir el sobrecoste de la coherencia o la latencia de los accesos a cachés distribuidas. La efectividad de estas propuestas depende en gran medida de la cantidad de datos que son considerados privados. Sin embargo, los mecanismos propuestos hasta la fecha no consideran la migración de hilos de ejecución ni las fases de una aplicación. Por tanto, una cantidad considerable de datos privados no se detecta apropiadamente. Con el fin de aumentar la detección de datos privados, proponemos un mecanismo basado en las TLBs, capaz de reclasificar los datos a privado, y que detecta la migración de los hilos de ejecución sin añadir complejidad al sistema. Los mecanismos de clasificación en las TLBs se han analizado en estructuras de varios niveles, incluyendo TLBs privadas y con un último nivel de TLB compartido y distribuido. Esta tesis también presenta un mecanismo de clasificación de páginas basado en la inspección de las TLBs de otros núcleos tras cada fallo de TLB. De forma particular, el mecanismo propuesto se basa en el intercambio y el cuenteo de tokens (testigos). Contar tokens en las TLBs supone una forma natural y eficiente para la clasificación de páginas de memoria. Además, evita el uso de solicitudes persistentes o arbitraje alguno, ya que si dos o más TLBs compiten para acceder a una página, los tokens se distribuyen apropiadamente y la clasifican como compartida. Sin embargo, la habilidad de los mecanismos basados en TLB para clasificar páginas privadas depende del tamaño de las TLBs. La clasificación basada en las TLBs se basa en la presencia de una traducción en las TLBs del sistema. Para evitarlo, se han propuesto diversos predictores de uso en las TLBs (UP), los cuales permiten una clasificación independiente del tamaño de las TLBs. En concreto, esta tesis presenta un sistema mediante el que se obtiene información de uso de página a nivel de sistema con la ayuda de un nivel de TLB compartida (SUP) o mediante TLBs cooperando juntas (CUP).La major part de les dades referenciades per aplicacions paral·leles i seqüencials que s'executen en CMPs actuals són referenciades per un sol fil, és a dir, són privades. Recentment, algunes propostes aprofiten aquesta observació per a millorar molts aspectes dels CMPs, com és reduir el sobrecost de la coherència o la latència d'accés a memòries cau distribuïdes. L'efectivitat d'aquestes propostes depen en gran mesura de la quantitat de dades detectades com a privades. No obstant això, els mecanismes proposats fins a la data no consideren la migració de fils d'execució ni les fases d'una aplicació. Per tant, una quantitat considerable de dades privades no es detecta apropiadament. A fi d'augmentar la detecció de dades privades, aquesta tesi proposa un mecanisme basat en les TLBs, capaç de reclassificar les dades com a privades, i que detecta la migració dels fils d'execució sense afegir complexitat al sistema. Els mecanismes de classificació en les TLBs s'han analitzat en estructures de diversos nivells, incloent-hi sistemes amb TLBs d'últimnivell compartides i distribuïdes. Aquesta tesi presenta un mecanisme de classificació de pàgines basat en inspeccionar les TLBs d'altres nuclis després de cada fallada de TLB. Concretament, el mecanisme proposat es basa en l'intercanvi i el compte de tokens. Comptar tokens en les TLBs suposa una forma natural i eficient per a la classificació de pàgines de memòria. A més, evita l'ús de sol·licituds persistents o arbitratge, ja que si dues o més TLBs competeixen per a accedir a una pàgina, els tokens es distribueixen apropiadament i la classifiquen com a compartida. No obstant això, l'habilitat dels mecanismes basats en TLB per a classificar pàgines privades depenen de la grandària de les TLBs. La classificació basada en les TLBs resta en la presència d'una traducció en les TLBs del sistema. Per a evitar-ho, s'han proposat diversos predictors d'ús en les TLBs (UP), els quals permeten una classificació independent de la grandària de les TLBs. Específicament, aquesta tesi introdueix un predictor que obté informació d'ús de la pàgina a escala de sistema mitjançant un nivell de TLB compartida (SUP) or mitjançant TLBs cooperant juntes (CUP).Esteve García, A. (2017). Design of Efficient TLB-based Data Classification Mechanisms in Chip Multiprocessors [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/86136TESI

    Alternative revenue sources for Internet service providers

    Get PDF
    The Internet has evolved from a small research network towards a large globally interconnected network. The deregulation of the Internet attracted commercial entities to provide various network and application services for profit. While Internet Service Providers (ISPs) offer network connectivity services, Content Service Providers (CSPs) offer online contents and application services. Further, the ISPs that provide transit services to other ISPs and CSPs are known as transit ISPs. The ISPs that provide Internet connections to end users are known as access ISPs. Though without a central regulatory body for governing, the Internet is growing through complex economic cooperation between service providers that also compete with each other for revenues. Currently, CSPs derive high revenues from online advertising that increase with content popularity. On other hand, ISPs face low transit revenues, caused by persistent declines in per-unit traffic prices, and rising network costs fueled by increasing traffic volumes. In this thesis, we analyze various approaches by ISPs for sustaining their network infrastructures by earning extra revenues. First, we study the economics of traffic attraction by ISPs to boost transit revenues. This study demonstrates that traffic attraction and reaction to it redistribute traffic on links between Autonomous Systems (ASes) and create camps of winning, losing and neutral ASes with respect to changes in transit payments. Despite various countermeasures by losing ASes, the traffic attraction remains effective unless ASes from the winning camp cooperate with the losing ASes. While our study shows that traffic attraction has a solid potential to increase revenues for transit ISPs, this source of revenues might have negative reputation and legal consequences for the ISPs. Next, we look at hosting as an alternative source of revenues and examine hosting of online contents by transit ISPs. Using real Internet-scale measurements, this work reports a pervasive trend of content hosting throughout the transit hierarchy, validating the hosting as a prominent source of revenues for transit ISPs. In our final work, we consider a model where access ISPs derive extra revenues from online advertisements (ads). Our analysis demonstrates that the ad-based revenue model opens a significant revenue potential for access ISPs, suggesting its economic viability.This work has been supported by IMDEA Networks Institute.Programa Oficial de Doctorado en Ingeniería TelemáticaPresidente: Jordi Domingo-Pascual.- Vocal: Víctor López Álvarez.-Secretario: Alberto García Martíne
    corecore