    The effectiveness of loop unrolling for modulo scheduling in clustered VLIW architectures

    Clustered organizations are becoming a common trend in the design of VLIW architectures. In this work we propose a novel modulo scheduling approach for such architectures. The proposed technique performs the cluster assignment and the instruction scheduling in a single pass, which is shown to be more effective than performing the assignment first and the scheduling afterwards. We also show that loop unrolling significantly enhances the performance of the proposed scheduler, especially when the communication channel among clusters is the main performance bottleneck. By selectively unrolling some loops, we can obtain the best performance with the minimum increase in code size. Performance evaluation for the SPECfp95 benchmarks shows that the clustered architecture achieves about the same IPC (Instructions Per Cycle) as a unified architecture with the same resources. Moreover, when the cycle time is taken into account, a 4-cluster configuration is 3.6 times faster than the unified architecture. Peer reviewed. Postprint (published version).

    Modulo scheduling for a fully-distributed clustered VLIW architecture

    Clustering is an approach that many microprocessors have recently adopted in order to mitigate the increasing penalties of wire delays. We propose a novel clustered VLIW architecture which has all its resources partitioned among clusters, including the cache memory. A modulo scheduling scheme for this architecture is also proposed. This algorithm takes into account both register and memory inter-cluster communications so that the final schedule results in a cluster assignment that favors cluster locality in cache references and register accesses. It has been evaluated for both 2- and 4-cluster configurations and for differing numbers and latencies of inter-cluster buses. The proposed algorithm produces schedules with very low communication requirements and outperforms previous cluster-oriented schedulers. Peer reviewed. Postprint (published version).

    Fast, accurate and flexible data locality analysis

    This paper presents a tool based on a new approach for analyzing the locality exhibited by data memory references. The tool is very fast because it is based on a static locality analysis enhanced with very simple profiling information, which results in a negligible slowdown. This feature allows the tool to be used for highly time-consuming applications and to be included as a step in a typical iterative analysis-optimization process. The tool can provide a detailed evaluation of the reuse exhibited by a program, quantifying and qualifying the different types of misses either globally or broken down by program sections, data structures, memory instructions, etc. The accuracy of the tool is validated by comparing its results with those provided by a simulator. Peer reviewed. Postprint (published version).

    Flexible compiler-managed L0 buffers for clustered VLIW processors

    Wire delays are a major concern for current and forthcoming processors. One approach to attack this problem is to divide the processor into semi-independent units referred to as clusters. A cluster usually consists of a local register file and a subset of the functional units, while the data cache remains centralized. However, as technology evolves, the latency of such a centralized cache increases, leading to a significant performance impact. In this paper, we propose to include flexible low-latency buffers in each cluster in order to reduce the performance impact of higher cache latencies. The reduced number of entries in each buffer permits the design of flexible ways to map data from L1 to these buffers. The proposed L0 buffers are managed by the compiler, which is responsible for deciding which memory instructions make use of them. Effective instruction scheduling techniques are proposed to generate code that exploits these buffers. Results for the Mediabench benchmark suite show that the performance of a clustered VLIW processor with a unified L1 data cache is improved by 16% when such buffers are used. In addition, the proposed architecture also shows significant advantages over both MultiVLIW processors and clustered processors with a word-interleaved cache, two state-of-the-art designs with a distributed L1 data cache. Peer reviewed. Postprint (published version).

    Analysis of a chemo-repulsion model with nonlinear production: The continuous problem and unconditionally energy stable fully discrete schemes

    We consider the following repulsive-productive chemotaxis model: Let $p\in (1,2)$, find $u \geq 0$, the cell density, and $v \geq 0$, the chemical concentration, satisfying \begin{equation}\label{C5:Am} \left\{ \begin{array} [c]{lll} \partial_t u - \Delta u - \nabla\cdot (u\nabla v)=0 \ \ \mbox{in}\ \Omega,\ t>0,\\ \partial_t v - \Delta v + v = u^p \ \ \mbox{in}\ \Omega,\ t>0, \end{array} \right. \end{equation} in a bounded domain $\Omega\subseteq \mathbb{R}^d$, $d=2,3$. By using a regularization technique, we prove the existence of solutions of this problem. Moreover, we propose three fully discrete Finite Element (FE) nonlinear approximations, where the first one is defined in the variables $(u,v)$, and the second and third ones by introducing ${\boldsymbol\sigma}=\nabla v$ as an auxiliary variable. We prove some unconditional properties such as mass-conservation, energy-stability and solvability of the schemes. Finally, we compare the behavior of the schemes through several numerical simulations and give some conclusions. Comment: arXiv admin note: substantial text overlap with arXiv:1807.0111
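    As a brief illustration of the mass-conservation property mentioned in this abstract, the continuous-level argument can be sketched as follows. It assumes homogeneous Neumann (no-flux) boundary conditions, $\partial u/\partial\mathbf{n} = \partial v/\partial\mathbf{n} = 0$ on $\partial\Omega$, which the abstract itself does not state. Integrating the first equation over $\Omega$ and applying the divergence theorem gives:

```latex
% Sketch under the ASSUMED no-flux boundary conditions (not stated in the abstract).
\frac{d}{dt}\int_\Omega u \, dx
  = \int_\Omega \big( \Delta u + \nabla\cdot(u\nabla v) \big)\, dx
  = \int_{\partial\Omega} \Big( \frac{\partial u}{\partial \mathbf{n}}
      + u\,\frac{\partial v}{\partial \mathbf{n}} \Big)\, ds
  = 0 .
```

    Under that assumption the total cell mass $\int_\Omega u\,dx$ stays constant in time. The discrete schemes in the paper are said to preserve a corresponding property, but this sketch does not reproduce their proofs.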

    A unified modulo scheduling and register allocation technique for clustered processors

    This work presents a modulo scheduling framework for clustered ILP processors that integrates the cluster assignment, instruction scheduling and register allocation steps in a single phase. This unified approach is more effective than traditional approaches based on sequentially performing some (or all) of the three steps, since it allows optimizing the global code generation problem instead of searching for optimal solutions to each individual step. Besides, it avoids the iterative nature of traditional approaches, which require repeated applications of the three steps until a valid solution is found. The proposed framework includes a mechanism to insert spill code on-the-fly and heuristics to evaluate the quality of partial schedules considering simultaneously inter-cluster communications, memory pressure and register pressure. Transformations that allow trading pressure on one type of resource for pressure on another are also included. We show that the proposed technique outperforms previously proposed techniques. For instance, the average speed-up for the SPECfp95 benchmarks is 36% for a 4-cluster configuration. Peer reviewed. Postprint (published version).

    Biological Reference Points for Cod Div. 3NO

    In 2011, the Fisheries Commission Working Group of Fishery Managers and Scientists on Conservation Plans and Rebuilding Strategies (WGFMS-CPRS) reviewed the cod 3NO Conservation Plan and Rebuilding Strategy (CPRS) and proposed a new one that was approved by the Fisheries Commission in 2011. The new reference point values approved for the 3NO cod CPRS were the following: Blim = 60,000 t, Bisr = 120,000 t, Flim = 0.30 and Bmsy = 248,000 t. Concerns were raised about the high uncertainty and the lack of confidence intervals of the reference points. The WGFMS-CPRS agreed that the values of Bisr and Bmsy should be further reviewed by the Scientific Council and the Fisheries Commission. In 2012, the Scientific Council noted that the approach used in the estimation of the maximum sustainable yield (MSY) reference points may not be advisable in the case of Div. 3NO cod due to the high uncertainty in the stock-recruit relationship for this stock. The Scientific Council recommends the use of proxies based on the yield per recruit (YPR) and spawner per recruit (SPR) to estimate the reference points for cod in Div. 3NO. The proxies for the limit reference points estimated through YPR were very similar to the approved limit reference points. However, the Bmsy estimated from the YPR was different from the Bmsy estimated last year. The aim of this document is to revise the values of the Bmsy reference points based on the YPR-SPR. This document presents new Bmsy reference points based on the YPR-SPR, taking into account some ideas expressed during the 2012 SC meeting. A value around F0.1 (0.19) or F35% (0.20) could be proposed as a possible Ftarget. The reason for choosing this value is that a small reduction in the YPR implies a precautionary level of F that has a very low probability (less than 5%) of being higher than Flim = Fmax, and it is similar to SPR F35%.
A good candidate for Btarget based on the YPR estimation could be the equilibrium SSB at the proposed Ftarget (F0.1 or F35%), estimated with all the recruitments produced by SSB larger than Blim. This gives a value around 180,000-185,000 t. In the cod 3NO case, and taking a definition for Bisr similar to the ICES MSY Btrigger, a Bisr candidate could be a value around 120,000 t if we take a very low probability (less than 5%) or 135,000 t if we take a low probability (less than 10%). These values correspond to the biomass level that is expected with low probability in a fully productive stock fished at the proposed Ftarget.

    Biological Reference Points for Cod 3NO

    In 2011, the Fisheries Commission Working Group of Fishery Managers and Scientists on Conservation Plans and Rebuilding Strategies (WGFMS-CPRS) reviewed the cod 3NO Conservation Plan and Rebuilding Strategy (CPRS) and proposed a new one that was approved by the Fisheries Commission in 2011. The new reference point values approved for the 3NO cod CPRS were the following: Blim = 60,000 t, Bisr = 120,000 t, Flim = 0.30 and Bmsy = 248,000 t. Concerns were raised about the high uncertainty and the lack of confidence intervals of the reference points. The WGFMS-CPRS agreed that the values of Bisr and Bmsy should be further reviewed by the Scientific Council and the Fisheries Commission. The aim of this document is to revise the reference point values approved by the Fisheries Commission and provide their confidence intervals. The YPR reference points (Fmax and F0.1) were estimated, as well as the Spawning per Recruit (SPR) reference points for F30%, F35% and F40% of the unfished SSB level. For these reference points, biological uncertainty was incorporated in growth, maturation and in the fishery through variability in the partial recruitment. To incorporate the uncertainty, a bootstrap with 1000 iterations was carried out over the years of the whole period (1959-2009). Maturity, partial recruitment, stock and catch weights were bootstrapped together from the selected year range. The process of calculating the appropriate Maximum Sustainable Yield (MSY) reference point estimates was based on combining the yield per recruit analysis and the stock-recruit relationship. Three stock-recruitment models were analyzed: Beverton-Holt, Ricker and Segmented Regression. To include uncertainty in the stock-recruitment relationships, a non-parametric bootstrap was chosen. Results show that the uncertainty is larger for the reference points estimated with the S/R relationship than for the YPR and SPR reference points, as could be expected.
The lack of fit of the S/R relationships is one of the major problems in 3NO cod. All the functions analyzed have clear fit problems: patterns in the residuals, strong autocorrelation of the errors, errors that are not log-normally distributed, problems in the likelihood profiles for the fit parameters, and function maxima that are not defined within the observed SSB range. Due to these problems, it was proposed to use YPR and SPR reference points as proxies for the MSY reference points in 3NO cod. The use of Fmax (0.30) as a proxy for Fmsy and Flim could be recommended, with Blim set to the biomass level corresponding to the Fmax equilibrium, around 60,000-70,000 t. A value around F0.1 (0.195) could be proposed as a possible Ftarget. A reasonable Btarget could be a value in the upper probability range of the F0.1 equilibrium biomass (120,000 t). A good candidate for Bisr could be 91,000 t, which is the biomass level that has a 20% probability if we fish at F0.1 = Ftarget.
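The two named stock-recruitment curves and the resample-by-year bootstrap described in this abstract can be sketched in a few lines of Python. This is only an illustrative sketch: the parameterizations shown are one common textbook form of each curve, and the function names, parameters `a` and `b`, and the percentile interval are assumptions for the example, not the procedure actually used in the document.

```python
import math
import random


def beverton_holt(ssb, a, b):
    """One common Beverton-Holt form: R = a*S / (b + S); approaches a as S grows."""
    return a * ssb / (b + ssb)


def ricker(ssb, a, b):
    """One common Ricker form: R = a*S*exp(-b*S); peaks at S = 1/b."""
    return a * ssb * math.exp(-b * ssb)


def bootstrap_ci(values, statistic, n_iter=1000, alpha=0.10, seed=0):
    """Percentile interval from a non-parametric bootstrap.

    `values` holds one observation per year (hypothetical input); each
    iteration resamples the years with replacement and recomputes `statistic`.
    """
    rng = random.Random(seed)
    n = len(values)
    stats = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_iter)
    )
    lower = stats[int((alpha / 2) * n_iter)]
    upper = stats[min(n_iter - 1, int((1 - alpha / 2) * n_iter))]
    return lower, upper


def mean(xs):
    return sum(xs) / len(xs)
```

For example, `bootstrap_ci(yearly_values, mean)` would return a 90% percentile interval for the mean of a hypothetical per-year series. In the document the resampled quantities are maturity, partial recruitment, and stock and catch weights, drawn together by year.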

    Quality of the Tuning Series in the Assessment of Greenland Halibut Subarea 2 and Divisions 3KLMNO

    The aim of this paper is to provide an in-depth study of the quality of the tuning series applied in the Greenland halibut assessment of NAFO Subarea 2 and Divisions 3KLMNO, as well as to study the feasibility of including the Spanish 3NO survey as a tuning fleet in future assessments of this stock. Our results may indicate that the Canadian autumn survey has a clear year effect in the 1995 data for ages older than 6 and that this could be due to the lower depth coverage of the survey in 1995 compared to 1996-2003. Shortening the Canadian autumn survey index to 1996-2003 improves the fit of the data and, consequently, we propose to eliminate the 1995 data of this survey from the assessment. The Canadian spring survey showed a strong trend in the log q errors of ages 7 and 8, so it could be convenient to study the possibility of shortening the age range of this tuning index to ages 1 to 6. The study also showed that the fit of the Spanish 3NO survey is not very good for ages below 5, while the information it gives is good for ages between 5 and 12; thus, it could be feasible to include this information in future assessments. The within-survey and between-survey abundance correlations, and the correlations between surveys and XSA abundance, showed that the surveys have considerable difficulty tracking ages 7, 8 and 9. This lack of tracking could be related to age reading problems.