Time-Optimal and Conflict-Free Mappings of Uniform Dependence Algorithms into Lower Dimensional Processor Arrays
Most existing methods of mapping algorithms into processor arrays are restricted to the case where n-dimensional algorithms, or algorithms with n nested loops, are mapped into (n−1)-dimensional arrays. However, in practice, it is interesting to map n-dimensional algorithms into (k−1)-dimensional arrays where k < n. For example, many algorithms at the bit level are at least 4-dimensional (matrix multiplication, convolution, LU decomposition, etc.), and most existing bit-level processor arrays are 2-dimensional. A computational conflict occurs if two or more computations of an algorithm are mapped into the same processor at the same execution time. In this paper, necessary and sufficient conditions are derived to identify all mappings without computational conflicts, based on the Hermite normal form of the mapping matrix. These conditions are used to propose methods of mapping any n-dimensional algorithm into a (k−1)-dimensional array, k < n, such that optimality of the mapping is guaranteed.
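The conflict-freedom property can be checked directly from its definition: a mapping matrix must send no two points of the index set to the same (time, processor) pair. Below is a minimal brute-force sketch of that check in Python; the index set, the example mapping matrix, and the split into one time row plus space rows are illustrative assumptions, not the paper's Hermite-normal-form test.

```python
import itertools

import numpy as np

def is_conflict_free(T, index_set):
    """Check that mapping matrix T sends no two index points of the
    algorithm to the same (time, processor) pair.

    T         -- k x n integer matrix; by convention row 0 is the time
                 schedule and rows 1..k-1 are the space (processor) mapping.
    index_set -- iterable of n-dimensional integer index vectors.
    """
    T = np.asarray(T)
    seen = set()
    for point in index_set:
        image = tuple(T @ np.asarray(point))
        if image in seen:          # same processor AND same time step
            return False
        seen.add(image)
    return True

# Example: a 3-D cubic index set mapped to a 1-D array (k = 2).
N = 4
index_set = itertools.product(range(N), repeat=3)
T = [[1, 1, 1],   # time:  t(i, j, l) = i + j + l
     [1, 0, 0]]   # space: p(i, j, l) = i
print(is_conflict_free(T, index_set))  # False: (0,0,1) and (0,1,0) collide
```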
Partitioning of Uniform Dependency Algorithms for Parallel Execution on MIMD/Systolic Systems
An algorithm can be modeled as an index set and a set of dependence vectors. Each index vector in the index set indexes a computation of the algorithm. If the execution of a computation depends on the execution of another computation, then this dependency is represented as the difference between the index vectors of the two computations. The dependence matrix is the matrix whose columns are the dependence vectors. An independent partition of the index set is one in which there are no dependencies between computations that belong to different blocks of the partition. This report considers uniform dependence algorithms with arbitrary index sets and proposes two very simple methods to find independent partitions of the index set. Each method has advantages over the other for certain kinds of applications, and both outperform previously proposed approaches in terms of computational complexity and/or optimality. Also, lower and upper bounds on the cardinality of maximal independent partitions are given. For some algorithms, it is shown that the cardinality of the maximal partition is equal to the greatest common divisor of some subdeterminants of the dependence matrix. In an MIMD/multiple systolic array computation environment, if different blocks of an independent partition are assigned to different processors/arrays, the communication between processors/arrays is reduced to zero. This is significant because communication usually dominates the overhead in MIMD machines. Some issues of mapping partitioned algorithms into MIMD/systolic systems are addressed. Based on the theory of partitioning, a new method is proposed to test whether a system of linear Diophantine equations has integer solutions.
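For a dependence matrix of full row rank, the gcd-of-subdeterminants quantity mentioned above is easy to compute directly. The sketch below, in Python with SymPy, computes the gcd of all n×n minors of an n×m dependence matrix; reading that gcd as the cardinality of the maximal independent partition is the special case the report proves for some algorithms, not a general rule.

```python
from itertools import combinations
from math import gcd

from sympy import Matrix

def gcd_of_maximal_minors(D):
    """gcd of all n x n subdeterminants of an n x m dependence matrix D
    (columns of D are dependence vectors)."""
    D = Matrix(D)
    n, m = D.shape
    g = 0
    for cols in combinations(range(m), n):
        g = gcd(g, int(D[:, list(cols)].det()))
    return abs(g)

# Example: two dependence vectors in a 2-D index set.
# D = [[2, 0], [0, 2]] -> the only 2x2 minor is 4, matching the four
# independent blocks obtained by splitting index points on the parity
# of each coordinate (both dependence vectors preserve parity).
print(gcd_of_maximal_minors([[2, 0], [0, 2]]))  # 4
```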
Detailed Modeling and Reliability Analysis of Fault-Tolerant Processor Arrays
Recent advances in VLSI/WSI technology have led to the design of processor arrays with a large number of processing elements confined to small areas. The use of redundancy to increase fault tolerance has the effect of reducing the ratio of the area dedicated to processing elements to the area occupied by other resources in the array. The assumption of fault-free hardware support (switches, buses, interconnection links, etc.) leads at best to conservative reliability estimates. However, detailed modeling entails not only an explosive growth in the model state space but also a difficult model construction process. To address the latter problem, a systematic method to construct Markov models for the reliability evaluation of processor arrays is proposed. This method is based on the premise that the fault behavior of a processor array can be modeled by a Stochastic Petri Net (SPN). However, in order to obtain a more compact representation, a set of attributes is associated with each transition in the Petri net model. This representation is referred to as a Modified Stochastic Petri Net (MSPN) model. An MSPN allows the construction of the corresponding Markov model as the reachability graph is being generated. The Markov model generated can include the effect of failures of several different components of the array as well as the effect of a particular distribution of faults at the time of reconfiguration. Specific reconfiguration schemes, such as Successive Row Elimination (SRE), Alternate Row-Column Elimination (ARCE), and Direct Reconfiguration (DR), are analyzed.
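To make the Markov-model endpoint concrete, here is a minimal reliability evaluation of a continuous-time Markov chain in Python with NumPy/SciPy. The three-state model (fully working, one row eliminated, failed) and the failure and coverage rates are invented for illustration; they stand in for the much larger chains the MSPN method generates automatically.

```python
import numpy as np
from scipy.linalg import expm

# Toy CTMC for a row-redundant array: state 0 = fully working,
# state 1 = one faulty row eliminated (spare in use), state 2 = failed.
lam = 1e-3   # per-hour row failure rate (assumed)
c = 0.95     # coverage: probability reconfiguration succeeds (assumed)

# Generator matrix Q: off-diagonal entries are transition rates,
# diagonal entries make each row sum to zero.
Q = np.array([
    [-lam,  c * lam,  (1 - c) * lam],
    [0.0,  -lam,       lam         ],
    [0.0,   0.0,       0.0         ],  # 'failed' is absorbing
])

def reliability(t, p0=np.array([1.0, 0.0, 0.0])):
    """Probability the array is still operational at time t (hours):
    the probability mass outside the absorbing failure state."""
    p_t = p0 @ expm(Q * t)   # solves dp/dt = p Q
    return 1.0 - p_t[2]

for t in (100, 1000, 10000):
    print(f"R({t:>5} h) = {reliability(t):.4f}")
```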
COSMIC: A Model for Multiprocessor Performance Analysis
COSMIC, the Combined Ordering Scheme Model with Isolated Components, describes the execution of specific algorithms on multiprocessors and facilitates analysis of their performance. Building upon previous modeling efforts such as Petri nets, COSMIC structures the modeling of a system along several issues, including computational and overhead costs due to the sequencing of operations, synchronization between operations, and contention for limited resources. This structuring allows us to isolate the performance impact associated with each issue. Studying the performance of a system while it executes a specific algorithm gives insight into its performance under realistic operating conditions. The model also allows us to study realistically sized algorithms with ease, especially when they are regularly structured. During the analysis of a system modeled by COSMIC, a set of timed Petri nets is produced. These Petri nets are then analyzed to determine measures of the system's performance. To facilitate the specification, manipulation, and analysis of large timed Petri nets, a set of tools has been developed. These tools take advantage of several special properties of the timed Petri nets that greatly reduce the computational resources required to calculate the required measures. From this analysis, the performance measures show not only total performance but also a breakdown of the results into several specific categories.
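As a concrete illustration of the kind of object such an analysis consumes, the following Python sketch simulates a tiny deterministic timed Petri net (two parallel computations synchronizing at a join transition) and reports its completion time. The net, the timings, and the earliest-firing semantics are illustrative assumptions, not the COSMIC toolset.

```python
import heapq

# transition -> (input places, output places, firing delay)
NET = {
    "compute_a": (["start_a"], ["done_a"], 3.0),
    "compute_b": (["start_b"], ["done_b"], 5.0),
    "join":      (["done_a", "done_b"], ["finish"], 1.0),  # synchronization
}

def simulate(net, marking):
    """Fire transitions as soon as all input places hold a token;
    each firing deposits its output tokens after a fixed delay.
    Returns (completion time, final marking)."""
    clock, pending = 0.0, []          # pending: (completion_time, outputs)
    while True:
        for name, (ins, outs, delay) in net.items():
            if all(marking.get(p, 0) > 0 for p in ins):
                for p in ins:                     # consume input tokens
                    marking[p] -= 1
                heapq.heappush(pending, (clock + delay, outs))
        if not pending:
            return clock, marking
        clock, outs = heapq.heappop(pending)      # advance to next completion
        for p in outs:                            # deposit output tokens
            marking[p] = marking.get(p, 0) + 1

t, m = simulate(NET, {"start_a": 1, "start_b": 1})
print(t)   # 6.0 -- max(3, 5) for the parallel branches, plus 1 for the join
```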
Destination Tag Routing Techniques Based on a State Model for the IADM Network
A state model is proposed for solving the problem of routing and rerouting messages in the Inverse Augmented Data Manipulator (IADM) network. Using this model, necessary and sufficient conditions for the reroutability of messages are established, and destination tag schemes are then derived. These schemes are simpler, more efficient, and require less complex hardware than previously proposed routing schemes. Two destination tag schemes are proposed. For the first scheme, rerouting is totally transparent to the sender of the message, and any blocked link of a given type can be avoided. Compared with previous work that deals with the same type of blockage, the time × space complexity is reduced from O(log N) to O(1). For the second scheme, rerouting is possible for any type of link blockage. A universal rerouting algorithm is constructed based on the second scheme; it finds a blockage-free path for any combination of multiple blockages if such a path exists, and indicates its absence otherwise. In addition, the state model is used to constructively derive a lower bound on the number of subgraphs of the IADM network that are isomorphic to the Indirect Binary N-Cube network. This knowledge can be used to characterize properties of the IADM network and for permutation routing in it.
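Destination-tag routing itself is easy to state in code: in a log2(N)-stage cube-type network, each stage examines one bit of the destination address. The Python sketch below does this for a generic indirect binary n-cube; it illustrates the flavor of a destination tag scheme only, and is not the IADM-specific scheme derived in the paper (the IADM's redundant paths are precisely what the state model exploits for rerouting).

```python
def destination_tag_route(src, dst, n_stages):
    """Route through a cube-type multistage network by examining one
    destination bit per stage (classic destination-tag routing).

    Returns the switch setting per stage: 'straight' keeps the current
    address bit, 'exchange' flips it so the address moves toward dst.
    """
    settings, current = [], src
    for stage in range(n_stages):
        bit = 1 << stage
        if (current ^ dst) & bit:      # address differs from dst at this bit
            settings.append("exchange")
            current ^= bit             # flip this address bit
        else:
            settings.append("straight")
    assert current == dst
    return settings

# 8-node network (3 stages): route from node 5 (101) to node 3 (011).
print(destination_tag_route(5, 3, 3))  # ['straight', 'exchange', 'exchange']
```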
Archer: A Community Distributed Computing Infrastructure for Computer Architecture Research and Education
This paper introduces Archer, a community-based computing resource for
computer architecture research and education. The Archer infrastructure
integrates virtualization and batch scheduling middleware to seamlessly deliver
high-throughput computing from resources that are distributed across wide-area
networks and owned by different participating entities.
The paper discusses the motivations leading to the design of
Archer, describes its core middleware components, and presents an analysis of
the functionality and performance of a prototype wide-area deployment running a
representative computer architecture simulation workload.
Down-Regulation of hsa-miR-10a in Chronic Myeloid Leukemia CD34+ Cells Increases USF2-Mediated Cell Growth
MicroRNAs (miRNA) are small, noncoding, single-stranded RNAs
that inhibit gene expression at the posttranscriptional level
and whose abnormal expression has been described in different
tumors. The aim of our
study was to identify miRNAs potentially implicated
in chronic myeloid leukemia (CML). We detected an
abnormal miRNA expression profile in mononuclear and
CD34+ cells from patients with CML compared with
healthy controls. Of 157 miRNAs tested, hsa-miR-10a,
hsa-miR-150, and hsa-miR-151 were down-regulated,
whereas hsa-miR-96 was up-regulated in CML cells.
Down-regulation of hsa-miR-10a was not dependent
on BCR-ABL1 activity and contributed to the increased
cell growth of CML cells. We identified the upstream
stimulatory factor 2 (USF2) as a potential target of
hsa-miR-10a and showed that overexpression of USF2
also increases cell growth. The clinical relevance of
these findings was shown in a group of 85 newly
diagnosed patients with CML in which expression of
hsa-miR-10a was down-regulated in 71% of the patients,
whereas expression of USF2 was up-regulated in 60% of
the CML patients, with overexpression of USF2 being
significantly associated with decreased expression of
hsa-miR-10a (P = 0.004). Our results indicate that
down-regulation of hsa-miR-10a may increase USF2 and
contribute to the increase in cell proliferation of CML cells,
implicating a miRNA in the abnormal behavior of CML.
Omecamtiv mecarbil in chronic heart failure with reduced ejection fraction, GALACTIC-HF: baseline characteristics and comparison with contemporary clinical trials
Aims:
The safety and efficacy of the novel selective cardiac myosin activator, omecamtiv mecarbil, in patients with heart failure with reduced ejection fraction (HFrEF) is being tested in the Global Approach to Lowering Adverse Cardiac outcomes Through Improving Contractility in Heart Failure (GALACTIC-HF) trial. Here we describe the baseline characteristics of participants in GALACTIC-HF and how they compare with those of other contemporary trials.
Methods and Results:
Adults with established HFrEF, New York Heart Association (NYHA) functional class ≥ II, EF ≤ 35%, elevated natriuretic peptides, and either a current hospitalization for HF or a history of hospitalization/emergency department visit for HF within a year were randomized to either placebo or omecamtiv mecarbil (pharmacokinetic-guided dosing: 25, 37.5 or 50 mg bid). 8256 patients [male (79%), non-white (22%), mean age 65 years] were enrolled with a mean EF of 27%, ischemic etiology in 54%, NYHA class II in 53% and III/IV in 47%, and median NT-proBNP of 1971 pg/mL. HF therapies at baseline were among the most effectively employed in contemporary HF trials. GALACTIC-HF randomized patients representative of recent HF registries and trials, with substantial numbers of patients also having characteristics understudied in previous trials, including more from North America (n = 1386), enrolled as inpatients (n = 2084), with systolic blood pressure < 100 mmHg (n = 1127), with estimated glomerular filtration rate < 30 mL/min/1.73 m2 (n = 528), and treated with sacubitril-valsartan at baseline (n = 1594).
Conclusions:
GALACTIC-HF enrolled a well-treated, high-risk population from both inpatient and outpatient settings, which will provide a definitive evaluation of the efficacy and safety of this novel therapy, as well as inform its potential future implementation.
Tree-Structured Gröbner Basis Computation on Parallel Machines
With the advent of symbolic mathematical software packages such as Maple, Mathematica, and Macsyma, symbolic computation has become widely used in many scientific applications. Though significant effort has been put into performing numeric computation on multiprocessors, symbolic computation on parallel machines is still largely unexplored. Yet symbolic mathematical applications are ideal candidates for parallel processing because they are computationally intensive. This paper considers the parallel computation of a Gröbner basis, a special basis for a multivariate polynomial ideal over a field that plays a key role in symbolic computation. Large Gröbner basis computations pose a challenging problem due to their dynamic, data-dependent behavior and resource-intensiveness. In an attempt to meet this challenge, a new tree-structured approach to Gröbner basis computation in parallel is proposed in this paper. It constructs the Gröbner basis of a set of polynomials from the Gröbner bases of its subsets. The tree-structured approach lends itself to parallel implementation and significantly reduces the computation time of large Gröbner bases. Finally, experimental results illustrating the effectiveness of the new approach are provided.
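The divide-and-combine idea rests on the identity that the ideal generated by a union of polynomial sets equals the ideal generated by the union of their Gröbner bases, so bases of subsets can be merged pairwise up a tree. Below is a minimal sequential sketch of that tree using SymPy's groebner routine; the example polynomials, the lex order, and the two-way split are illustrative assumptions, and a real implementation would compute the leaves and internal nodes in parallel.

```python
from sympy import groebner, symbols

x, y, z = symbols("x y z")

def tree_groebner(polys, gens, order="lex"):
    """Compute a Groebner basis by recursively combining bases of
    subsets: G(S1 u S2) = G(G(S1) u G(S2)), since both sides generate
    the same ideal. The two recursive calls are independent and could
    run on different processors."""
    if len(polys) <= 2:                     # leaf: small direct computation
        return list(groebner(polys, *gens, order=order))
    mid = len(polys) // 2
    left = tree_groebner(polys[:mid], gens, order)    # parallelizable
    right = tree_groebner(polys[mid:], gens, order)   # parallelizable
    return list(groebner(left + right, *gens, order=order))

polys = [x**2 + y - 1, x*y - z, y**2 + z - 1, x + y + z - 1]
print(tree_groebner(polys, (x, y, z)))
```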
Hardware Support for Data Dependence Speculation in Distributed Shared-Memory Multiprocessors Via Cache-block Reconciliation
Data dependence speculation allows a compiler to relax the constraint of data independence to issue tasks in parallel, increasing the potential for automatic extraction of parallelism from sequential programs. This paper proposes hardware mechanisms to support a data-dependence speculative distributed shared-memory (DDSM) architecture that enables speculative parallelization of programs with irregular data structures and inherent coarse-grain parallelism. Efficient support for coarse-grain tasks requires large buffers for speculative data; DDSM leverages cache and directory structures to provide large buffers that are managed transparently to applications. The proposed cache and directory extensions provide support for distributed speculative versions of cache blocks, run-time detection of dependence violations, and program-order reconciliation of cache blocks. This paper describes the DDSM architecture and presents a simulation-based evaluation of its performance on five benchmarks chosen from the SPEC95 and Olden suites. The proposed system yields simulated speedups of up to 12.5x on a 16-node configuration for programs with coarse-grain speculative windows (millions of instructions and hundreds of kilobytes of speculative data).
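The core run-time check can be illustrated compactly: give each speculative task an epoch number encoding its program order, record which epochs read and wrote each cache block, and flag a violation when a block is written by an older epoch after a younger epoch has already read it. The Python sketch below models that detection rule only; the epoch scheme, block granularity, and API are invented for illustration and are far simpler than the cache/directory extensions the paper proposes.

```python
class SpeculativeBlockTracker:
    """Toy model of per-cache-block dependence-violation detection.

    Each task has an epoch number giving its position in program order.
    A flow dependence is violated when an older task (smaller epoch)
    writes a block that a younger task (larger epoch) has already
    speculatively read, since the younger task consumed stale data.
    """

    def __init__(self):
        self.readers = {}   # block -> set of reader epochs

    def read(self, epoch, block):
        self.readers.setdefault(block, set()).add(epoch)

    def write(self, epoch, block):
        # Any younger epoch that already read this block used a stale
        # version: it (and its successors) must be squashed and rerun.
        return {e for e in self.readers.get(block, ()) if e > epoch}

t = SpeculativeBlockTracker()
t.read(2, "0xA0")                # task 2 speculatively reads block 0xA0
violations = t.write(1, "0xA0")  # older task 1 then writes the same block
print(violations)                # {2} -> task 2 must be squashed
```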