Automated extraction of absorption features from Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) and Geophysical and Environmental Research Imaging Spectrometer (GERIS) data
Automated techniques were developed for the extraction and characterization of absorption features from reflectance spectra. The absorption feature extraction algorithms were successfully tested on laboratory, field, and aircraft imaging spectrometer data. A suite of laboratory spectra of the most common minerals was analyzed and absorption band characteristics were tabulated. A prototype expert system was designed, implemented, and successfully tested to allow identification of minerals based on the extracted absorption band characteristics. AVIRIS spectra for a site in the northern Grapevine Mountains, Nevada, were characterized and the minerals sericite (fine-grained muscovite) and dolomite were identified. The minerals kaolinite, alunite, and buddingtonite were identified and mapped for a site at Cuprite, Nevada, using the feature extraction algorithms on the new Geophysical and Environmental Research 64-channel imaging spectrometer (GERIS) data. The feature extraction routines (written in FORTRAN and C) were interfaced to the expert system (written in PROLOG) to allow both efficient processing of numerical data and logical spectrum analysis.
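To illustrate the kind of processing involved, the sketch below removes a straight-line continuum from a reflectance spectrum and reports the center, depth, and width of the strongest absorption band. It is a minimal Python sketch under simplifying assumptions (a synthetic spectrum and an endpoint continuum instead of a convex hull), not the authors' FORTRAN/C implementation.

```python
# Hedged sketch of automated absorption-feature extraction from a
# reflectance spectrum. All names and the synthetic spectrum are
# illustrative assumptions.
import numpy as np

def continuum_removed(wavelengths, reflectance):
    """Divide the spectrum by a straight-line continuum fitted between
    its endpoints (a simplification of a convex-hull continuum)."""
    w0, w1 = wavelengths[0], wavelengths[-1]
    r0, r1 = reflectance[0], reflectance[-1]
    continuum = r0 + (r1 - r0) * (wavelengths - w0) / (w1 - w0)
    return reflectance / continuum

def absorption_feature(wavelengths, reflectance):
    """Return simple band characteristics: center, depth, and width."""
    cr = continuum_removed(wavelengths, reflectance)
    i_min = int(np.argmin(cr))              # band minimum
    center = wavelengths[i_min]
    depth = 1.0 - cr[i_min]                 # band depth relative to continuum
    half = 1.0 - depth / 2.0                # width at half the band depth
    below = np.where(cr <= half)[0]
    width = wavelengths[below[-1]] - wavelengths[below[0]] if below.size else 0.0
    return {"center_um": center, "depth": depth, "width_um": width}

if __name__ == "__main__":
    # Synthetic spectrum: flat background with a Gaussian absorption near 2.2 um
    wl = np.linspace(2.0, 2.4, 200)
    refl = 0.6 - 0.2 * np.exp(-((wl - 2.2) ** 2) / (2 * 0.01 ** 2))
    print(absorption_feature(wl, refl))
```

In a full system, characteristics extracted this way would be handed to the expert system, which matches them against the tabulated laboratory band characteristics to identify minerals.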
A Case for a Complexity-Effective, Width-partitioned Microarchitecture
Current superscalar processors feature 64-bit datapaths to execute program instructions, regardless of their operand size. Our analysis indicates, however, that most executions comprise a large fraction (40%) of narrow-width operations, i.e., instructions that exclusively process narrow-width operands and results. We further noticed that these operations are well distributed across a program run. In this paper, we exploit these properties to master the hardware complexity of superscalar processors. We propose a width-partitioned microarchitecture (WPM) to decouple the treatment of narrow-width operations from that of the other program instructions. We split a 4-way issue processor into two clusters: one executing 64-bit, load/store, and complex operations, and the other handling the 16-bit operations. We show that revealing the narrow-width operations to the hardware is sufficient to keep the workload balanced and to minimize communication between the clusters. Using a WPM reduces the complexity of several critical processor components: the register file and the bypass network. A WPM also lowers the complexity of the interconnection fabric, since the 16-bit cluster only needs to propagate narrow-width data. We examine simple configurations of WPM and discuss their tradeoffs. We evaluate a speculative heuristic to steer the narrow-width operations towards the clusters. A detailed complexity analysis shows that the WPM model saves power and area with minimal impact on performance.
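To make the steering idea concrete, here is a hedged Python sketch of how an instruction whose source operands and result all fit in 16 bits could be routed to the 16-bit cluster, while memory and complex operations always go to the 64-bit cluster. The instruction encoding and the policy are illustrative assumptions, not the paper's exact heuristic.

```python
# Illustrative sketch: reveal narrow-width operations and steer them
# to a dedicated 16-bit cluster. Instruction format is an assumption.

def is_narrow(value, width=16):
    """True if the value is representable as a signed `width`-bit integer."""
    return -(1 << (width - 1)) <= value < (1 << (width - 1))

def steer(instr):
    """Return the target cluster for a decoded instruction.
    `instr` is (opcode, source operand values, result value)."""
    opcode, operands, result = instr
    if opcode in {"load", "store", "mul64", "div"}:   # memory / complex ops
        return "cluster64"
    if all(is_narrow(v) for v in operands) and is_narrow(result):
        return "cluster16"
    return "cluster64"

# Example: a loop-counter increment is narrow, a pointer update is not.
print(steer(("add", (1000, 1), 1001)))                           # cluster16
print(steer(("add", (0x7fff_ffff_0000, 8), 0x7fff_ffff_0008)))   # cluster64
```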
Minimizing Single-Usage Cache Pollution for Effective Cache Hierarchy Management
Efficient cache hierarchy management is of paramount importance when designing high-performance processors. Upon a miss, the conventional operation mode of a cache hierarchy is to retrieve the missing block from the higher levels and to store it into all hierarchy levels. It is however difficult to assert that storing the block into intermediate levels will really be useful. In the literature, this phenomenon, referred to as cache pollution, is often associated with prefetching techniques: a prefetched block may evict data that is more likely to be reused in the near future. Cache pollution can cause severe performance degradation. This paper addresses this phenomenon in the highest level of the cache hierarchy. Unlike past studies that treat polluting cache blocks as blocks that are never accessed (i.e., only due to prefetching), our proposal attempts to eliminate cache pollution that is inherent to the application. Our observations indeed reveal that cache blocks that are accessed only once - single-usage blocks - are quite significant at runtime, especially in the highest level of the cache hierarchy. In addition, most single-usage cache blocks are data that can be prefetched. We show that a simple prediction mechanism is sufficient to uncover most of the single-usage blocks. For a two-level cache hierarchy, these blocks are sent directly from main memory to the L1 cache. Performing data bypassing on the L2 cache maximizes the utilization of the memory hierarchy and allows hard-to-prefetch memory references to remain in this cache hierarchy level. Our experimental results show that minimizing single-usage cache pollution in the L2 cache leads to a significant decrease in its miss rate, resulting in noticeable performance gains.
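The following Python sketch gives one plausible shape for such a prediction mechanism: a table of saturating counters, indexed here by load PC (an assumption; the paper's exact predictor organization may differ), learns whether the blocks a load misses on tend to be touched only once, and predicted single-usage blocks bypass the L2 and go straight to the L1.

```python
# Hedged sketch of single-usage prediction and L2 bypass.
from collections import defaultdict

class SingleUsagePredictor:
    def __init__(self):
        self.counters = defaultdict(int)     # per-PC 2-bit saturating counter

    def predict_single_usage(self, pc):
        return self.counters[pc] >= 2        # predict bypass when counter is high

    def train(self, pc, was_single_usage):
        # Strengthen the prediction when an evicted block was indeed touched
        # only once, weaken it otherwise.
        c = self.counters[pc]
        self.counters[pc] = min(c + 1, 3) if was_single_usage else max(c - 1, 0)

def handle_miss(pc, block, predictor, l1, l2):
    """On a miss served by memory, fill only the L1 when the block is
    predicted single-usage; otherwise fill both levels as usual."""
    l1.add(block)
    if not predictor.predict_single_usage(pc):
        l2.add(block)
```

The training feedback (`was_single_usage`) would come from observing, at eviction time, whether a block was referenced again after its first use.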
Architecture and significant bits (Architecture et bits significatifs)
Mastering hardware complexity and energy consumption has become essential in the design of modern processors. This thesis proposes two approaches to that end. Their foundation stems from the tendency of programs to execute with a large fraction of narrow-width data, i.e., data with a small number of significant bits. We use this property to manage the datapath's energy consumption at the software level. The strong locality of narrow-width data makes it possible to split a program into regions that can be processed with an optimized execution width. We also show that narrow-width operations, those manipulating exclusively narrow-width data, are frequent and well distributed throughout execution. We therefore propose the WPM model to decouple the processing of these operations onto dedicated clusters. Our analyses reveal reduced complexity and energy consumption for a small performance degradation.
Speculative Software Management of Datapath-width for Energy Optimization
This paper evaluates managing the processor's datapath width at the compiler level by exploiting dynamic narrow-width operands. We capitalize on the large occurrence of these operands in multimedia programs to build static narrow-width regions that can be directly exposed to the compiler. We propose to augment the ISA with instructions that directly expose the datapath and register widths to the compiler. Simple exception management allows this exposure to be purely speculative. In this way, the software can speculatively accommodate the execution of a program on a narrower datapath width in order to save energy. For this purpose, we introduce a novel register file organization, the byte-slice register file, which allows the width of the register file to be dynamically reconfigured, providing both static and dynamic energy savings. We show that by combining the advantages of the byte-slice register file with those provided by clock-gating the datapath on a per-region basis, up to 17% of the datapath dynamic energy can be saved, while a 22% reduction of the register file static energy is achieved.
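The sketch below illustrates the speculative side of this scheme in Python: a region tagged with a speculated width executes at that width, and a width exception triggers re-execution of the region at full width. The instruction-level details, the region structure, and the operation mix are assumptions, not the paper's ISA extension.

```python
# Illustrative sketch of speculative per-region width management with
# an exception-based fallback to full width.

WIDTH_MASKS = {16: 0xFFFF, 32: 0xFFFF_FFFF, 64: 0xFFFF_FFFF_FFFF_FFFF}

class WidthException(Exception):
    pass

def execute_region(ops, speculated_width):
    """Run a list of (a, b) additions at the speculated width; raise on overflow."""
    mask = WIDTH_MASKS[speculated_width]
    results = []
    for a, b in ops:
        r = a + b
        if r & ~mask:                 # result needs more bits than speculated
            raise WidthException(speculated_width)
        results.append(r)
    return results

def run_region(ops, speculated_width=16):
    try:
        return execute_region(ops, speculated_width), speculated_width
    except WidthException:
        # Exception handler: re-execute the region at full 64-bit width.
        return execute_region(ops, 64), 64

print(run_region([(100, 200), (3, 4)]))       # stays at 16 bits
print(run_region([(70_000, 1), (3, 4)]))      # falls back to 64 bits
```

In hardware, committing to the narrow width would let the byte-slice register file and the clock-gated datapath disable the unused byte slices for the whole region, which is where the static and dynamic energy savings come from.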
Geant4 simulation of the new CENBG micro and nanoprobes facility
The Centre d'Etudes Nucléaires de Bordeaux-Gradignan (CENBG) will soon be equipped with a new state-of-the-art accelerator facility including a single-ended HVEE® 3.5 MV Singletron. Its performance in terms of brightness and energy stability will provide the unique opportunity to develop a high spatial resolution nanobeam line. The existing microbeam line, dedicated to ion beam analysis (scanning transmission ion microscopy, particle-induced X-ray emission, and Rutherford backscattering spectrometry) and to cellular irradiation, will be reinstalled on the new machine. In this paper, the expected performance of the two beam lines is presented from ray-tracing simulations using the Geant4 toolkit, whose capabilities at the sub-micron scale have been investigated and validated previously.