Abstract: Application tasks are complex and diverse, and solving them requires HPC (High Performance Computing). Because different application tasks have different computing features, their computing efficiency differs greatly across architectures. In this paper, the PMC (Processor-Memory-Communication) resource requirements of application tasks are perceived to obtain the PMC operator grain, and the best-matched architecture is assigned to each application. Hypergraphs are used to describe the application structure and the computing architecture, and the principle of hypergraph isomorphism is used to build and demonstrate a super-hybrid heterogeneous architecture. Experiments show that a reconfigurable architecture based on operator grain perception achieves high computing performance and low energy consumption.
I. INTRODUCTION
HPC (High Performance Computing) application problems are becoming more and more complex and diverse. They span many disciplines, including atmospheric science, molecular dynamics, materials science, fluid mechanics, quantum chemistry, signal processing, and bioinformatics, and they involve many kinds of computation, such as linear algebra, Fourier transforms, searching, and sorting [1]. HPC systems fall mainly into two categories:
1) MPP (Massively Parallel Processing) architectures based on homogeneous systems, in which energy consumption is very high and the actual running efficiency is low.
2) Heterogeneous systems that combine advanced CPUs with accelerator processors such as the Cell, GPU, FPGA, and DSP. The special-purpose components accelerate the computation, which greatly raises application performance and effectively reduces power consumption. Heterogeneous architecture is one of the important development trends in HPC. Different dedicated processors differ significantly in performance and application field. For example, a GPU can be 50 to 100 times as efficient as a CPU on image and floating-point operations, and the combination of general-purpose and dedicated processors is the key to high performance in a heterogeneous system [2].
Different computing tasks have different characteristics, so their PMC resource requirements vary. Common resource models include computation-intensive, stream-intensive, data-intensive, and I/O-intensive tasks, as well as tasks with internal mixed parallelism. The same application performs differently on different structures. For example, the time complexity of matrix multiplication is $O(n^3)$ on a serial processor, $O(3n)$ on a parallel Mesh structure, and $O(n)$ on a Torus structure. A single architecture therefore cannot serve many complex application tasks. Reconfigurable computing has been applied in many HPC systems, where the architecture is designed as a variable structure that adapts to different application tasks [3]. Reconfigurable computing makes such HPC feasible: it further improves computing power while offering low power consumption, low cost, and a short development cycle, which provides an effective way past the technology wall.
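The serial bound for matrix multiplication quoted above can be checked by counting operations. The following is a minimal sketch in Python; the function name `matmul_count` is hypothetical and is not taken from the paper. It only illustrates the serial case and does not model the Mesh or Torus structures.

```python
def matmul_count(a, b):
    """Naive serial product of two n x n matrices.

    Returns the product and the number of multiply-add steps performed,
    which is exactly n**3 for the triple loop below.
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]
    ops = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
                ops += 1
    return c, ops


identity = [[1, 0], [0, 1]]
m = [[1, 2], [3, 4]]
product, ops = matmul_count(m, identity)
print(product, ops)  # the 2 x 2 case performs 2**3 = 8 multiply-adds
```

Doubling n multiplies the step count by eight, which is why a serial processor becomes the bottleneck for large matrices.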
This paper starts from a study of commonly used basic algorithms, such as linear algebra, combinatorial optimization, graph algorithms, and image processing. Different computing characteristics call for different algorithms and different running structures. Taking into account the computing characteristics of the various types of processing components, we decompose the algorithm of an application task into sub-algorithms and analyze the granularity of each one. The PMC grain of each sub-algorithm and the current state of the existing components are both computed by a perceptron algorithm. Sub-structures are then designed accordingly, and each sub-algorithm is assigned to the sub-structure that suits its computing characteristics, which yields a computing model with a different system structure for each application task. In this way each sub-algorithm can reach its highest computing performance, and the components reach a high utilization rate.
II. RELATED WORKS
Reconfigurable computing and HPC are active research topics at home and abroad. [2] pointed out that integrated circuits, HPC, the Internet, and memory would all run into an insurmountable information technology wall around 2020. [3] gave a classification model of computer architecture, proposed for heterogeneous system structures at the chip level, node level, and system level according to the interconnection model of architectures built from different hardware granularities. [4] proposed a flow model based on service execution and the system design objective. [5] presented a practical parallel computation model named the LogGP synchronization model, which is based on non-exclusive heterogeneous environments and reflects, at the system level, the impact of heterogeneity and non-exclusive computing environments on the design and analysis of concurrent algorithms. [6] analyzed a reconfigurable architecture and gave a design method for general-purpose and special-purpose multiprocessors at the energy consumption level, reporting a 500-fold performance improvement and 70% energy saving in special applications. [7] pointed out that a reconfigurable architecture can make the behavior of hardware resources adapt to special computing requirements at the hardware resource level, which provides an interactive mechanism to maximize the use of logic resources. [8] proposed a system-level-granularity architecture and put forward reconfigurable computing applications that interconnect in a hybrid of static and dynamic modes. [9] worked on large-scale multiprocessor networks and used optical interconnection equipment to complete a heterogeneous architecture design. [10] addressed the fine-grained and coarse-grained partitioning problem on a multi-FPGA system structure, so that memory space was used effectively.
International Conference on Intelligent Control and Computer Application (ICCA 2016)
[11] put forward the necessity of HPC computation. The works above address the hardware, operating system, and scheduling algorithm levels, for example the interconnection design of reconfigurable architectures, but each uses only a single architecture to solve application tasks of differing complexity and diversity. What is lacking is the viewpoint of the application task: which system structure suits its computing features, with both computing performance and energy consumption taken into account. [12] proposed the viewpoint that "application decides structure, and structure decides effectiveness". By revealing the characteristics of real application problems, a suitable variable system structure model can be designed for each application so that the structure fits the application. In this way the optimization targets of high computing performance and low energy consumption can be reached.
[13] determined the minimum number of hyperedges in a hypergraph and characterized the hyperedges of a k-partition-connected hypergraph. [14] used a hypergraph to assemble all local link structures and employed hMETIS for hypergraph partitioning. [15] proposed a novel algorithm called hypergraph regularized non-negative matrix factorization, which captures the intrinsic geometrical structure by constructing a hypergraph instead of a simple graph. [16] presented a low-rank matrix factorization method that incorporates multiple hypergraph manifold regularization, in which the hypergraph models the local structure of the intrinsic manifold. [17] modeled the haplotype assembly problem using hypergraph partitioning formulations and proposed a novel hypergraph-based haplotype assembly method. [18] introduced the classes of cored hypergraphs and power hypergraphs and investigated the properties of their Laplacian eigenvalues.
III. PMC GRAIN ANALYSIS
For many applications in the field of HPC, the PMC requirements can be analyzed in terms of a set of basic algorithms. In the same way, the PMC grains of components and sub-structures are calculated. A perception algorithm computes the grain characteristics of both the components and the application. Matching components or sub-structures are then assigned to the different applications to reach the targets of high performance and low energy consumption.
A. The basic concept of grain calculating
Grain calculating is applied to perceive the characteristics of tasks and obtain a pattern. The grain features of components and sub-structures also need to be calculated. Basic grains come from numerical algorithms, combinatorial optimization algorithms, the fast Fourier transform, and image processing. For example, matrix multiplication can be regarded as a basic grain.
Grain calculating has many characteristics, such as independence, diversity, and universality. Its intensity features include computation-intensive, data-intensive, communication-intensive, and storage-intensive. Granularity characteristics comprise instruction-level fine granularity, process/function-level granularity, program/process-level coarse granularity, and operation/service-level granularity. Structure characteristics include branch, loop, and sequence structures, and pattern features include cell task pools, stream/serial, and task/data parallelism. Some algorithm grains are shown in Table 1.
b) Analysis of basic high performance computing applications: commonly used basic algorithms such as linear algebra, matrix operations, image processing, and graph algorithms are studied to obtain their computing resource requirements and the basic PMC grains of each kind.
c) Relationships between PMC grains: these include serial and parallel relationships, class relationships between grains, and combination relationships, which take into account the PMC constraints on combined grains.
d) For a given application: the perceived relationships and PMC grains are used to assign matching components or sub-structures. The interconnections between the components are established with reference to the relationships between the sub-algorithms.
e) To represent these relationships effectively: the relationships between sub-algorithms, the relationships between components, and the corresponding matching states are described in this paper using application algorithm hypergraphs and architecture hypergraphs.
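The matching of a PMC grain to a component can be sketched as a nearest-profile lookup. This is only an illustration under stated assumptions: the `PMC` class, the cosine similarity measure, and all numeric profiles below are hypothetical and are not taken from the paper, which does not specify its matching formula.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PMC:
    """Relative processor, memory, and communication weights (assumed non-zero overall)."""
    processor: float
    memory: float
    communication: float

    def unit(self):
        # Normalise so that only the shape of the profile matters, not its scale.
        norm = math.sqrt(self.processor ** 2 + self.memory ** 2 + self.communication ** 2)
        return (self.processor / norm, self.memory / norm, self.communication / norm)


def similarity(a, b):
    """Cosine similarity between two PMC profiles."""
    return sum(x * y for x, y in zip(a.unit(), b.unit()))


def best_match(task, components):
    """Return the name of the component whose profile is closest to the task grain."""
    return max(components, key=lambda name: similarity(task, components[name]))


# Illustrative profiles only; real values would come from the perception step.
components = {
    "GPU": PMC(9.0, 3.0, 2.0),
    "CPU": PMC(4.0, 4.0, 4.0),
    "FPGA": PMC(3.0, 2.0, 8.0),
}
matmul_grain = PMC(8.0, 3.0, 1.0)   # computation-intensive grain
stream_grain = PMC(2.0, 2.0, 9.0)   # communication-intensive grain

print(best_match(matmul_grain, components))
print(best_match(stream_grain, components))
```

With these made-up numbers the computation-intensive grain lands on the GPU profile and the communication-intensive grain on the FPGA profile. Any other distance measure could be substituted without changing the overall flow.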
IV. HYPERGRAPH DESCRIPTION OF THE APPLICATION ALGORITHM AND ARCHITECTURE
A collaborative application task is regarded as a set of sub-algorithms, and an architecture as multiple parts together with their interconnection structure. We give the definitions of the hypergraph and of the system structure.
The attributes of corresponding nodes and edges are equal (or similar), i.e.:
For a given PMC model, the application algorithm hypergraph is obtained through a combination of attribute transformations. The algorithm hypergraph and the system structure hypergraph are then made isomorphic.
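The isomorphism condition can be checked mechanically once a candidate node mapping is known. The sketch below is an assumption-laden illustration: hypergraphs are stored as dictionaries from hyperedges (frozen sets of node names) to a single edge attribute, every node is assumed to appear in at least one hyperedge, and the function name `is_isomorphism` is hypothetical.

```python
def is_isomorphism(alg_edges, arch_edges, mapping):
    """Check that `mapping` carries the algorithm hypergraph onto the architecture hypergraph.

    alg_edges / arch_edges: dict mapping frozenset(nodes) -> edge attribute.
    mapping: dict from algorithm nodes to architecture nodes.
    """
    if len(set(mapping.values())) != len(mapping):
        return False  # two sub-algorithms share one target node, so not a bijection
    image = {}
    for nodes, attr in alg_edges.items():
        if not all(v in mapping for v in nodes):
            return False  # some sub-algorithm has no assigned node
        image[frozenset(mapping[v] for v in nodes)] = attr
    # Edges and their attributes must coincide exactly.
    return image == arch_edges


alg = {
    frozenset({"a1", "a2", "a3"}): "call",
    frozenset({"a3", "a4"}): "data",
}
arch = {
    frozenset({"u1", "u2", "u3"}): "call",
    frozenset({"u3", "u4"}): "data",
}
good = {"a1": "u1", "a2": "u2", "a3": "u3", "a4": "u4"}
bad = {"a1": "u1", "a2": "u2", "a3": "u4", "a4": "u3"}

print(is_isomorphism(alg, arch, good))
print(is_isomorphism(alg, arch, bad))
```

The check only verifies a given mapping. Searching for a mapping is a separate and harder problem that this sketch does not attempt.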
We use a practical application to illustrate how an algorithm is matched with its architecture based on the granularity computing of the application task.
Consider an application in a cloud computing task in which sub-algorithms $a_1$ to $a_8$ work cooperatively. Each sub-algorithm has a different computational granularity. The sub-algorithms call one another or communicate with one another. The relationships between the sub-algorithms are described by a hypergraph, as shown in Fig. 1. In Fig. 1, the white circles represent sub-algorithms, and the black points represent edges, which are the relationships among these sub-algorithms. An application algorithm realized by the cooperation of several sub-algorithms can be divided and combined appropriately according to the correlation between sub-algorithms. Several sub-algorithm clusters can thus be used to form a layered hypergraph.
Granularity perception is performed for each sub-algorithm, and the sub-algorithm is assigned to the processing unit that suits its computational granularity for execution. A sub-algorithm cluster is mapped to a sub-structure of the system structure. Based on hypergraph isomorphism, this yields the layered architecture hypergraph shown in Fig. 2. In Fig. 2, $u_1$ to $u_8$ can each be a processing component or a sub-structure, and they execute the corresponding sub-algorithms $a_1$ to $a_8$. The processing components are GPU, Cell, and FPGA. The sub-structures are the parallel structures Torus and Mesh. A particular sub-algorithm can be assigned to a processing unit or to a sub-structure that suits its computational granularity. A sub-structure corresponds to a sub-algorithm cluster. The relationships within a sub-algorithm cluster are realized by the interconnection within the corresponding sub-structure, and the relationships between sub-algorithm clusters are realized by the interconnection between sub-structures, which produces a layered architecture hypergraph. The system structure can then consist of multiple sub-structures, which may reside at different geographical locations in a distributed heterogeneous environment.
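The two layers described here can be separated programmatically: hyperedges that stay inside one cluster belong to the interconnection within a sub-structure, and hyperedges that cross clusters define links between sub-structures. The following sketch assumes a given cluster assignment; the function name `layered_hypergraph`, the six-node example, and the cluster labels are hypothetical.

```python
def layered_hypergraph(edges, cluster_of):
    """Split hyperedges into a lower layer (inside clusters) and an upper layer (between clusters).

    edges: iterable of sets of sub-algorithm names.
    cluster_of: dict from sub-algorithm name to cluster label.
    Returns (intra, inter), where intra maps a cluster label to its internal
    hyperedges and inter is the set of cluster-level hyperedges.
    """
    intra = {}
    inter = set()
    for edge in edges:
        clusters = frozenset(cluster_of[v] for v in edge)
        if len(clusters) == 1:
            (label,) = clusters
            intra.setdefault(label, []).append(frozenset(edge))
        else:
            inter.add(clusters)
    return intra, inter


edges = [{"a1", "a2"}, {"a2", "a3", "a4"}, {"a5", "a6"}, {"a4", "a5"}]
cluster_of = {"a1": "c1", "a2": "c1", "a3": "c1", "a4": "c1", "a5": "c2", "a6": "c2"}

intra, inter = layered_hypergraph(edges, cluster_of)
print(intra)
print(inter)
```

Here the edge between a4 and a5 is the only one that crosses clusters, so it becomes the single upper-layer link between the sub-structures that host c1 and c2.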
V. A RECONFIGURABLE ARCHITECTURE BASED ON PERCEPTION OF GRAIN
Internet, cloud, and HPC applications can each be realized by a combination of several algorithms, and each algorithm may in turn include several functionally independent sub-algorithms.
To implement application tasks efficiently, the choice of algorithms and sub-algorithms must consider the application features, including task function size, type of service, resource status, the connection topology of the processing units, and transmission bandwidth. To manage multiple sub-algorithms effectively, strongly correlated sub-algorithms are often combined into a sub-algorithm cluster.
For each sub-algorithm, the perception algorithm obtains its computational granularity. At the same time, the state of the related processing components is perceived. The decision algorithm then selects the sub-structure that suits this granularity. A single processor component can also be assigned a parallel Mesh or Torus structure. Among the many-to-many mappings between algorithms and sub-structures, the algorithm looks for the mapping with the highest efficiency to obtain the system structure that best fits the application algorithm. The mapping structure is shown in Fig. 3.
Fig. 3 The application and architecture mapping structure
In Fig. 3, app is an application task, dyn-alg is a dynamic algorithm, sub-clust is a sub-algorithm cluster, sub-stru is a sub-structure that runs a sub-algorithm, and deriv-stru is the derived structure. The architecture mapping transformation model for application algorithms, based on computational granularity, is given below.
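The search for the most efficient mapping can be illustrated with an exhaustive search over all assignments. This is a sketch under assumptions, not the decision algorithm of the paper: efficiency is modeled as the makespan (the largest total time on any one sub-structure), the time table is invented for illustration, and the name `best_mapping` is hypothetical. Exhaustive search is only practical for small instances.

```python
from itertools import product


def best_mapping(times):
    """Try every assignment of sub-algorithms to sub-structures and keep the fastest.

    times: dict alg -> dict stru -> estimated execution time; every algorithm
    is assumed to have an estimate for every sub-structure.
    Returns (mapping, makespan).
    """
    algs = sorted(times)
    strus = sorted({s for row in times.values() for s in row})
    best, best_cost = None, float("inf")
    for choice in product(strus, repeat=len(algs)):
        load = dict.fromkeys(strus, 0.0)
        for alg, stru in zip(algs, choice):
            load[stru] += times[alg][stru]
        cost = max(load.values())  # sub-structures run in parallel
        if cost < best_cost:
            best, best_cost = dict(zip(algs, choice)), cost
    return best, best_cost


# Illustrative estimates only, not measurements from the paper.
times = {
    "knn": {"GPU": 2.0, "Mesh": 5.0, "Torus": 4.0},
    "cmatch": {"GPU": 6.0, "Mesh": 3.0, "Torus": 4.0},
    "bayes": {"GPU": 5.0, "Mesh": 4.0, "Torus": 3.0},
}

mapping, makespan = best_mapping(times)
print(mapping, makespan)
```

For larger problems a greedy or heuristic search would replace the full enumeration, since the number of assignments grows exponentially with the number of sub-algorithms.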
Definition 4: The grain perception transformation model TSM is a 7-tuple TSM = (AH, GH, CM, OG, LD, CE, DS), described as follows:
a) AH = (A, E, W, D) is the application algorithm hypergraph. $A = \{a_1, a_2, \ldots, a_n\}$ is the set of algorithms, E is the edge set, and W and D are the corresponding attribute sets.
b) GH = (G, P, M, N) is the system structure hypergraph. $G = \{g_1, g_2, \ldots, g_n\}$ is the set of sub-structures, and its initial edge set is empty. P, M, and N are the corresponding attribute sets.
The transformation model executes in the following stages:
a) Clustering stage:
An appropriate clustering algorithm uses the scale of the application's sub-algorithms and the component information to obtain the sub-algorithm cluster hypergraph.
b) Perception stage:
The intelligent sensing algorithm calculates the grain characteristics of the sub-algorithms and the characteristics and load of the components.
c) Decision-making stage:
Using the P grain and M grain of each sub-algorithm together with the component characteristics, the decision algorithm matches the grain pattern and assigns the sub-algorithm to a sub-structure.
d) Interconnection stage:
The interconnection between components and sub-structures is established from the C grain attribute of the PMC and the relationships between sub-clusters. Each sub-structure is composed of the interconnected components of a cluster.
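The stages of the transformation process can be chained as a small pipeline. The sketch below is a loose illustration under several assumptions that are not in the paper: clustering is done by grouping sub-algorithms joined by strongly correlated edges, perception reduces to picking the dominant P, M, or C value, the decision is a fixed lookup table, and all function names and numbers are hypothetical.

```python
def cluster(sub_algs, strong_edges):
    """Clustering: group sub-algorithms linked by strong edges (union-find)."""
    parent = {a: a for a in sub_algs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for edge in strong_edges:
        first, *rest = sorted(edge)
        for other in rest:
            parent[find(other)] = find(first)
    groups = {}
    for a in sub_algs:
        groups.setdefault(find(a), []).append(a)
    return sorted(sorted(g) for g in groups.values())


def perceive(pmc):
    """Perception: label a grain by its dominant PMC attribute."""
    return max(("P", "M", "C"), key=lambda k: pmc[k])


def decide(label):
    """Decision: assumed lookup from dominant attribute to a sub-structure."""
    return {"P": "GPU", "M": "Mesh", "C": "Torus"}[label]


def interconnect(clusters, placement):
    """Interconnection: list the sub-structures that each cluster must link."""
    return [sorted({placement[a] for a in c}) for c in clusters]


sub_algs = ["a1", "a2", "a3", "a4"]
strong_edges = [{"a1", "a2"}, {"a3", "a4"}]
pmc = {
    "a1": {"P": 8, "M": 2, "C": 1},
    "a2": {"P": 1, "M": 2, "C": 7},
    "a3": {"P": 2, "M": 9, "C": 3},
    "a4": {"P": 6, "M": 1, "C": 2},
}

clusters = cluster(sub_algs, strong_edges)
placement = {a: decide(perceive(pmc[a])) for a in sub_algs}
links = interconnect(clusters, placement)
print(clusters, placement, links)
```

Each function stands in for one stage and could be swapped for a more realistic implementation without changing the order in which the stages run.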
For a practical application task, the dynamic algorithm is decomposed into several sub-algorithms and clustered into sub-algorithm clusters; the number of clusters is determined by a cluster-aware algorithm and by PMC grain analysis of the sub-algorithms. Each sub-algorithm is assigned to the sub-structure that matches its PMC grain, and the number of sub-structures follows from the principle of hypergraph isomorphism. The relationships between the sub-algorithms within a cluster determine the interconnection inside each sub-structure, which gives the derived structure. In the same manner, the relationships between sub-clusters determine the interconnection between derived structures, and finally a hybrid architecture is obtained.
For a given application, different algorithms and different sub-algorithm granularities lead to different sub-structure assignments, different resulting architectures, and therefore different operating efficiency. Saturated optimal computation without constraints and under different constraints yields multiple hybrid models, which guide the hybrid reconfigurable architecture toward a high-performance implementation of the application.
VI. EXPERIMENTAL RESULTS AND ANALYSIS
To illustrate the efficiency of the heterogeneous reconfigurable architecture model, simulation experiments were designed.
Example 1: A given application algorithm comprises between 10 and 200 sub-algorithm clusters. Each sub-cluster contains three kinds of operation, namely the KNN (K Nearest Neighbor) algorithm, keyword matching (Cmatch), and the Bayesian algorithm. The available components include a general-purpose CPU, a GPU, and other special-purpose processors. Table 2 compares the complexity and execution time of the three algorithms on a single processor, a multiprocessor, and a GPU, where n is the input data dimension, m is the amount of training sample data, and p is the number of processors.
Example 2: A travel service in the cloud consists of three Web services: a mapping service, a travel service, and a weather service. Each service contains several operations, $op_1$ to $op_{10}$. A solid line between operations denotes a call, and a dotted line denotes a dependency. The travel service application architecture is shown in Fig. 4. The sub-algorithms are grouped into 1, 3, 4, and 10 clusters and run on the serial structure, the parallel Mesh, and the super-hybrid structure TSM (TSM1 is assigned one Mesh sub-structure, TSM4 four Mesh sub-structures, and TSM10 ten Mesh sub-structures); the comparison is shown in Fig. 5. In Fig. 5, when the number of clusters is 10, the execution time of TSM1 is comparable with that of TSM10. When the number of clusters is 1, the execution times of TSM1, TSM4, and TSM10 are comparable, because the single sub-algorithm cluster is assigned to only one sub-structure. When the number of clusters is 4, four sub-structures can be allocated to handle them. In this case the combined amount of computation and communication is optimal and the execution time is the shortest.
When the algorithm is decomposed into sub-algorithm clusters, the communication traffic between sub-algorithms must be considered. If the granularity of the subdivision is too large or too small, the overall computing performance is affected. The optimal clustering scheme assigns the appropriate sub-structure to perform each sub-algorithm. Local performance is then optimal, and each sub-structure has basically the same workload. The relationships between sub-algorithms and sub-structures are matched with appropriate communication components to build the reconfigurable system, which enables the application to achieve the overall performance of its algorithms. Meanwhile, the components of the reconfigurable architecture are load balanced, highly utilized, and consume little power.
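The trade-off between granularity that is too coarse and too fine can be illustrated with a toy cost model. This is an assumption made for illustration and is not the model or the data behind Fig. 5: execution time is taken as computation divided evenly over k clusters plus a communication cost that grows linearly with k, and the parameter values are arbitrary.

```python
def execution_time(work, comm_cost, k):
    """Toy model: parallel computation time plus communication overhead for k clusters."""
    return work / k + comm_cost * k


def best_cluster_count(work, comm_cost, candidates):
    """Pick the candidate cluster count with the lowest modeled execution time."""
    return min(candidates, key=lambda k: execution_time(work, comm_cost, k))


# Arbitrary illustrative parameters.
work, comm_cost = 160.0, 10.0
for k in (1, 4, 10):
    print(k, execution_time(work, comm_cost, k))
print(best_cluster_count(work, comm_cost, [1, 4, 10]))
```

Under this model a single cluster is dominated by computation time and ten clusters by communication overhead, so an intermediate count gives the shortest time, which is the qualitative behavior described above.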
VII. CONCLUSION
Based on the characteristics of application computing tasks, a variable architecture can be designed to match them. Basic algorithms are analyzed in terms of grain operators and PMC grains. The application-aware algorithm obtains the task characteristics and the features of computing components and computing sub-structures, and sub-algorithms are assigned to the sub-structures that match their characteristics, which establishes a reconfigurable heterogeneous architecture.
In future work we will delve further into basic algorithm analysis and the extraction of grain patterns, and improve the automatic identification and classification of PMC grains in order to improve the perception and the intelligence of the decision-making algorithm.
