The amount of real-time data available today, such as time series and streaming data, continues to grow. Being able to analyze this data the moment it arrives can bring considerable added value. However, it also requires a lot of computational effort and new acceleration techniques. As a possible solution to this problem, this paper proposes a hardware architecture for the Typicality and Eccentricity Data Analytic (TEDA) algorithm, implemented on Field Programmable Gate Arrays (FPGA), for anomaly detection in data streams. TEDA is based on a new approach to outlier detection in the data stream context. To validate the proposal, results for the occupation and throughput of the proposed hardware are presented, together with bit-accurate simulation results. The project targets the Xilinx Virtex-6 xc6vlx240t-1ff1156 FPGA.
Introduction
Outlier detection, or anomaly detection, consists of detecting rare events in a data set. It is a central problem in many application areas, such as time series forecasting, data mining and industrial process monitoring. Due to the increasing number of sensors in the most diverse areas and applications, the availability of time series data has risen sharply. Thus, outlier detection for temporal data has become a central problem [1], especially when data are captured and processed continuously, online. In this case, the data are considered data streams [2].
Some important aspects need to be considered when choosing an anomaly detection method, such as the computational effort required to handle large data streams, since the received information needs to be stored and analyzed without compromising memory and run time. Many of the solutions presented in the literature require prior knowledge of the process and system, such as mathematical models, data distributions and predefined parameters [3]. Anomaly detection is traditionally performed through statistical analysis, using probability and a series of initial assumptions that, in most cases, do not hold in practice.
A disadvantage of the traditional statistical approach is that it compares a single point with the mean of all points rather than with individual samples or pairs of data points. As a result, the information is no longer point-wise and local. Moreover, probability theory was developed from examples in which processes and variables are purely random. However, real processes are not purely random and show dependency between samples. Thus, real problems end up being addressed offline, where the entire data set needs to be known in advance. This is a further problem of traditional approaches: since all samples must be available from the beginning of the algorithm execution [4], they cannot be used in real-time and data stream applications. This type of data presents new technical challenges and opportunities in new fields of work. Detecting anomalies in real time can provide valuable information in critical scenarios, but it is a computationally demanding problem that still lacks reliable solutions capable of providing high processing capacity.
Typicality and Eccentricity Data Analytic (TEDA) is based on a new approach to outlier detection in the data stream context [5] and can be applied, for example, in an algorithm for the autonomous detection of abnormal behavior in industrial process operation. TEDA analyzes the density of each data sample read, calculated according to the distance from that sample to the samples previously read.
It is an online algorithm that learns autonomously, without the need for prior knowledge about the process or parameters. In addition, the computational effort it requires is small, allowing its use in real-time applications [3].
TEDA can be used as an alternative statistical framework for analyzing most types of data, except for purely random processes. It is based on new metrics, all based on the similarity/proximity of data in the data space rather than on density or entropy, as in traditional methods. The metrics used in TEDA are typicality, defined in [5] as the extent to which objects are "good examples" of a concept, and eccentricity, defined as how distinct an object is from the rest of the group. A data sample with high eccentricity has low typicality and is usually an outlier [3].
Eccentricity can be very useful for anomaly detection, image processing, fault detection, particle physics, etc. It allows the analysis of individual data samples, which can also be done in real time for data streams [6]. It is also relevant in clustering processes, since the elements of a cluster are, by nature, the opposite of atypical samples [5].
Another area in which anomaly detection has been increasingly used is Industry 4.0. One of the challenges of Industry 4.0 is the detection of production failures and defects [7]. New technologies aim to add value and increase process productivity, but face difficulties in performing complex, massive-scale computing due to the large amount of data generated [8]. The huge volume of real-time data flowing through a network, for example, can quickly overload traditional computing systems, due both to the large amount of data originating from the sensors and to the requirement for intensive, high-performance processing. Specialized hardware is a possible solution to these bottlenecks, making it possible to process massive amounts of data while meeting ultra-low-latency, low-power, high-throughput, security and ultra-reliability requirements, which are important for increasing productivity and quality in Industry 4.0 processes.
Motivated by these challenges, this work proposes a specialized hardware architecture of TEDA for anomaly detection. A hardware implementation allows systems to run even faster than their software counterparts, extending their use to situations with more severe time constraints and to applications that process large amounts of data. The works [9, 10, 11, 12, 13] implemented complex algorithms in hardware, specifically on FPGAs, in order to accelerate them.
The development of machine learning algorithms in hardware has grown significantly, justified by the performance gains in system sampling time relative to software equivalents. One of the motivations for this work is the possibility of accelerating the TEDA algorithm to handle large data volumes, such as streaming and real-time data.
In this work, all validation and synthesis results were obtained using a Virtex-6 xc6vlx240t-1ff1156 FPGA, chosen for its high performance. Modern FPGAs can deliver performance and density comparable to Application Specific Integrated Circuits (ASICs) without the disadvantage of long development times, and their flexible architecture allows them to be reprogrammed.
The rest of this paper is organized as follows. This first section introduced the work, explaining its motivation and main contributions. Section 2 discusses related works and the state of the art.
Section 3 presents the theoretical foundations of the TEDA technique. Section 4 details the implementation of the proposed architecture. Section 5 presents the validation and synthesis results of the proposed hardware, as well as comparisons with software implementations. Finally, Section 6 presents the conclusions regarding the obtained results.
Related work
Real-time anomaly detection in data streams has potential applications in many areas, such as preventive maintenance, fault detection, fraud detection and signal monitoring. These concepts can be used in many different industries, such as information technology, finance, medicine, security, energy, e-commerce, agriculture and social media. The literature reports several uses of the TEDA technique for anomaly detection and even for classification.
The article presented in [6] proposes a new TEDA-based anomaly detection algorithm. The proposed method, named σ gap by the author, combines the accumulated proximity information for all samples with the comparison of specific pairs of points suspected of being anomalies, using local spatial distribution information about the vicinity of the suspect point. In that article, TEDA is compared with an approach based on traditional statistical methods, emphasizing that the two rely on different sets of initial assumptions. TEDA was shown to be a generalization of traditional statistics through a comparison with the well-known nσ analysis, a widely used principle for threshold-based anomaly detection. Both approaches produced the same result, although TEDA does not need the initial assumptions. In addition, for various types of proximity measures (such as Euclidean, cosine and Mahalanobis), it was shown that, owing to its recursive nature, TEDA is computationally more efficient and suitable for online and real-time applications.
In [14], a study is presented on the use of TEDA for fault detection …
As a fundamental theoretical innovation, the application areas of TDF and TEDA can range from anomaly detection, clustering, classification, prediction and control to filtering and regression (similar to Kalman filtering). Practical applications may be even broader, so it is difficult to list them all.
The paper presented in [3] proposes the application of TEDA to fault detection in industrial processes. The effectiveness of the proposal was demonstrated with data streams from two real industrial plants and compared with traditional fault detection methods. This paper presents a practical …
The manuscript presented in [2] proposes a new anomaly detection algorithm based on an online sequence memory algorithm called Hierarchical Temporal Memory (HTM). The performance of the proposed algorithm was evaluated and compared with a set of real-time anomaly detection algorithms, as a way to evaluate anomaly detection algorithms for data streaming. All analyses were performed on the Numenta Anomaly Benchmark (NAB) [17], a benchmark of real-world streaming data.
The paper [18] presents a study of anomaly detection in TCP/IP networks. Its purpose is to detect computer network anomalies during the live migration of virtual machines (VMs) from a local environment to the cloud, comparing TEDA, K-Means clustering and static analysis. The authors used the tuple (Source IP, Destination IP, Source Port, Destination Port) to create a signature process and validate errors, including those in traffic flows hidden in legitimate network traffic. Testing was done using the dataset of the SECCRIT (SEcure Cloud Computing for CRitical Infrastructure IT, http://www.seccrit.eu) project, which allows anomalies or environmental attacks to be analyzed with live migration and other background traffic conditions. The results demonstrate that the proposed method can automatically and successfully detect anomalies in attacks, network port scans (NPS) and network scans (NS). A major difficulty is distinguishing, for example, a high-volume attack from a denial of service (DoS) attack. Accuracy and false negative rates were computed to compare K-Means with the proposed solution, with TEDA achieving better rates in almost all measurements.
As the amount of data that needs to be processed grows exponentially, autonomous systems become increasingly important and necessary, and implementations of machine learning and streaming algorithms have been studied in the literature. The work presented in [19] describes how to use run-time reconfiguration on FPGAs to improve the efficiency of streaming data transmission over a shared communication channel in real-time applications. The proposed reconfigurable architecture consists of two subsystems: the reconfiguration subsystem, which runs the modules, and the scheduling subsystem, which controls which modules are loaded into the reconfiguration subsystem.
In addition, many works in the literature have studied fault and anomaly detection in hardware. In [20], an implementation of target and anomaly detection algorithms for real-time hyperspectral imaging was proposed on FPGA.
The algorithms were implemented in a streaming fashion, similar to this work.
The results, obtained on a Kintex-7 FPGA using a fixed-point structure, were very satisfactory and demonstrated that the implementation can be used in different detection circumstances. The work [21] presented a study of the impact of Neural Network architectures compared with statistical methods in the implementation of an Electrocardiogram (ECG) anomaly detection algorithm on FPGA.
The fixed-point implementation helps reduce the amount of required resources. However, the design was made with High-Level Synthesis (HLS), which could not optimize FPGA resource consumption. Regarding the TEDA algorithm, at the time of writing no study in the literature exploring its hardware implementation on FPGA had been identified; this work proposes to accomplish this in a pioneering manner.
TEDA
TEDA was introduced by [22] as a statistical framework influenced by recursive density estimation algorithms. However, unlike algorithms that use data density as a measure of similarity, TEDA uses the concepts of typicality and eccentricity to infer whether a given sample is normal or abnormal with respect to the dataset.
The methodology used in TEDA does not require prior information about the data and can be applied to problems involving fault detection, clustering and classification, among others [22].
TEDA is an anomaly detection algorithm based on the structure of the data that aims to generalize, and avoid the need for, the well-known but very restrictive initial conditions inherent in traditional statistics and probability theory [23]. The approach presented in TEDA has some advantages over traditional statistical anomaly detection methods: its recursive nature allows it to handle large volumes of data, such as data streams, online and at low computational cost, enabling faster processing.
The main features of TEDA include [6]:
• It is entirely based on the data and their distribution in the data space;
• No previous assumptions are made;
• Limits and parameters do not need to be pre-specified;
• Sample independence is not required;
• An infinite number of observations is not required.
In TEDA, typicality is the similarity of a given data sample to the rest of the samples in the dataset to which it belongs. Eccentricity, on the other hand, is the opposite of typicality: it indicates how dissociated a sample is from the other samples in its set. Thus, an outlier can be defined as a sample with high eccentricity and low typicality with respect to an established comparison threshold. It is important to note that the eccentricity and typicality calculations themselves require no parameter or threshold.
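To calculate the eccentricity of each sample, TEDA uses the sum of the geometric distances between the analyzed sample $x_k$ and the other samples in the set. Thus, the higher this value, the greater the eccentricity of the sample and, consequently, the lower its typicality. [6] proposed calculating eccentricity recursively, so that the eccentricity, $\xi$, can be expressed as

$$\xi_k(x) = \frac{1}{k} + \frac{(\mu_k^x - x_k)^T(\mu_k^x - x_k)}{k\,[\sigma^2]_k^x} \tag{1}$$

where $k$ is the discretization instant; $x_k$ is an input vector of $N$ elements at the $k$-th iteration; $\mu_k^x$ is also an $N$-element vector, equal to the mean of $x$ at the $k$-th iteration; and $[\sigma^2]_k^x$ is the variance of $x$ at the $k$-th iteration. The calculations of $\mu_k^x$ and $[\sigma^2]_k^x$ are also done recursively, using the following equation

$$\mu_k^x = \frac{k-1}{k}\,\mu_{k-1}^x + \frac{1}{k}\,x_k, \qquad \mu_1^x = x_1 \tag{2}$$

and

$$[\sigma^2]_k^x = \frac{k-1}{k}\,[\sigma^2]_{k-1}^x + \frac{1}{k-1}\,\lVert x_k - \mu_k^x \rVert^2, \qquad [\sigma^2]_1^x = 0 \tag{3}$$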
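To make the recursion concrete, the following Python sketch implements Equations 2 and 3 for an N-element sample. It is an illustrative software model, not the authors' reference code, and the function name `update_mean_var` is hypothetical. For N > 1 the variance is a single scalar, the mean squared Euclidean distance to the mean (equivalently, the sum of the per-component population variances), which is the quantity the eccentricity denominator uses. Only the previous mean and variance are kept, so memory use stays constant as the stream grows.

```python
from statistics import fmean, pvariance


def update_mean_var(k, x, mu_prev, var_prev):
    """One recursive update of the mean (Equation 2) and variance (Equation 3).

    k        -- 1-based iteration index
    x        -- current sample x_k, a sequence of N floats
    mu_prev  -- mean vector mu_{k-1} (ignored when k == 1)
    var_prev -- scalar variance [sigma^2]_{k-1} (ignored when k == 1)
    Returns (mu_k, var_k).
    """
    if k == 1:
        # first iteration: mean <- x_1, variance <- 0
        return list(x), 0.0
    # Equation 2: mu_k = ((k-1)/k) * mu_{k-1} + x_k / k, element by element
    mu = [(k - 1) / k * m + v / k for m, v in zip(mu_prev, x)]
    # squared Euclidean distance ||x_k - mu_k||^2
    dist2 = sum((v - m) ** 2 for v, m in zip(x, mu))
    # Equation 3: ((k-1)/k) * [sigma^2]_{k-1} + ||x_k - mu_k||^2 / (k-1)
    var = (k - 1) / k * var_prev + dist2 / (k - 1)
    return mu, var


# Usage: feed a scalar stream as one-element vectors.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, var = None, 0.0
for k, v in enumerate(data, start=1):
    mu, var = update_mean_var(k, [v], mu, var)
print(mu[0], fmean(data))     # agree up to rounding
print(var, pvariance(data))   # agree up to rounding
```

The check against Python's `statistics` module shows that the recursion reproduces the batch (population) mean and variance without storing past samples.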
The typicality of a given sample $x_k$ at the $k$-th iteration can be expressed as the complement of eccentricity [6], as follows

$$\tau_k(x) = 1 - \xi_k(x)$$

In addition, [6] defined the normalized eccentricity as

$$\zeta_k(x) = \frac{\xi_k(x)}{2}$$
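The sketch below evaluates eccentricity, typicality and normalized eccentricity for given mean and variance values, assuming $\tau = 1 - \xi$ and $\zeta = \xi/2$ as defined in [6]; the helper names are hypothetical. Evaluating every sample of a small set against that set's mean and population variance illustrates why $\zeta$ is called normalized: summing $\xi$ over all $k$ samples gives $1 + \sum_i \lVert x_i - \mu \rVert^2 / (k\sigma^2) = 2$, so the $\zeta$ values sum to 1.

```python
def eccentricity(x, mu, var, k):
    """Equation 1: xi_k(x) = 1/k + (mu_k - x)^T (mu_k - x) / (k * [sigma^2]_k).

    Requires var > 0 (at k == 1 the variance is still zero).
    """
    dist2 = sum((m - v) ** 2 for m, v in zip(mu, x))
    return 1.0 / k + dist2 / (k * var)


def typicality(xi):
    """Typicality as the complement of eccentricity: tau = 1 - xi."""
    return 1.0 - xi


def normalized_eccentricity(xi):
    """Normalized eccentricity: zeta = xi / 2."""
    return xi / 2.0


# Usage: every sample of a 2-D set against the set's own mean and
# population variance (k = number of samples).
samples = [[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]]
k = len(samples)
mu = [sum(col) / k for col in zip(*samples)]
var = sum(sum((v - m) ** 2 for v, m in zip(s, mu)) for s in samples) / k
xis = [eccentricity(s, mu, var, k) for s in samples]
zetas = [normalized_eccentricity(xi) for xi in xis]
print(xis)                     # the first sample, farthest from the mean, is the most eccentric
print(sum(xis), sum(zetas))    # 2.0 and 1.0 up to rounding
```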
In order to separate normal-state data from abnormal-state data, a comparison threshold must be defined. For anomaly detection, the mσ threshold [24] is widely used. However, this principle requires an assumption about the distribution of the analyzed data, such as a Gaussian distribution [6]. Chebyshev's inequality, in contrast, holds for any data distribution: the probability that a data sample lies more than mσ from the mean is less than or equal to 1/m², where σ is the standard deviation of the data [25].
The condition that produces the same results as Chebyshev's inequality, discarding any assumptions about the data and their independence, can be expressed as [6]

$$\zeta_k(x) > \frac{m^2 + 1}{2k}, \qquad m > 0$$
where m corresponds to the comparison threshold. 
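Putting the recursive mean and variance, the eccentricity and the threshold condition together gives a complete streaming detector. The Python generator below is a minimal software sketch; the name `teda_stream` and the choice to report samples with undefined eccentricity as normal are assumptions of this sketch, not details from the paper. Substituting $\zeta = \xi/2$ and multiplying both sides of the condition by $2k$ shows that a sample is flagged exactly when $\lVert x_k - \mu_k \rVert^2 > m^2 [\sigma^2]_k$, which is how the test mirrors the mσ rule without assuming a distribution. Because $x_k$ is already included in $\mu_k$ and $[\sigma^2]_k$, a single outlier among $k$ samples reaches at most $\lVert x_k - \mu_k \rVert^2 / [\sigma^2]_k = k - 1$, so with m = 3 nothing can be flagged before k = 11.

```python
def teda_stream(samples, m=3.0):
    """Classify each N-element sample of a stream as normal or outlier.

    Yields (k, zeta, is_outlier). zeta is None while the eccentricity is
    undefined (k == 1, or zero variance so far); such samples are reported
    as normal. m > 0 is the threshold parameter of the Chebyshev-style test.
    """
    mu, var = None, 0.0
    for k, x in enumerate(samples, start=1):
        if k == 1:
            # first iteration: mean <- x_1, variance <- 0
            mu, var = list(x), 0.0
            yield k, None, False
            continue
        # recursive mean (Equation 2) and variance (Equation 3)
        mu = [(k - 1) / k * mi + v / k for mi, v in zip(mu, x)]
        dist2 = sum((mi - v) ** 2 for mi, v in zip(mu, x))
        var = (k - 1) / k * var + dist2 / (k - 1)
        if var == 0.0:
            # all samples identical so far: eccentricity is undefined
            yield k, None, False
            continue
        xi = 1.0 / k + dist2 / (k * var)    # eccentricity (Equation 1)
        zeta = xi / 2.0                     # normalized eccentricity
        yield k, zeta, zeta > (m * m + 1.0) / (2.0 * k)


# Usage: 20 samples alternating between 1.0 and 1.2, followed by a spike.
stream = [[1.0 if i % 2 == 0 else 1.2] for i in range(20)] + [[10.0]]
results = list(teda_stream(stream, m=3.0))
print([k for k, zeta, is_outlier in results if is_outlier])  # prints [21]
```

In the example every normal sample stays within about one variance of the current mean, while the spike at k = 21 reaches a squared distance of almost twenty variances, so only k = 21 is reported.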
Implementation description
In this work, the proposed TEDA architecture was implemented on FPGA at the Register Transfer Level (RTL), as in the works presented in [9, 10, 11, 12, 13]. The following sections present the characteristics of the proposal, as well as details regarding processing time. A design overview can be seen in Figure 1.
Architecture proposal overview
As illustrated in Figure 1,
Module I -MEAN
Each n-th MEAN module computes the mean of the n-th element of the vector $x_k$ acquired at run time. The implementation is based on Equation 2 and is detailed in Figure 2. In addition to receiving the n-th element of vector $x_k$ as an input, the MEAN block uses a counter to track the sample iteration number, $k$. The implementation uses a comparator block, identified in Figure 2 as MCOMPn, which verifies whether the system is in the first iteration, as in Line 3 of Algorithm 1. MMUXn is a multiplexer that acts as a conditional evaluation, using the output of the MCOMPn comparator as its select signal. The register MREGn stores the n-th element of $\mu_k^x$ ($\mu_k^n$).
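A behavioural Python model of one MEAN block may help map the named components onto Equation 2. It is a sketch, not cycle-accurate: the internal arrangement of multipliers and adders in the recursive branch is not described above, so the hypothetical `MeanModule` class simply evaluates Equation 2 there and assumes the constants (k − 1)/k and 1/k are available. The Python loop index stands in for the hardware counter, and N instances run side by side, one per vector element.

```python
class MeanModule:
    """Behavioural sketch of the n-th MEAN block (not cycle-accurate)."""

    def __init__(self):
        self.mreg = 0.0  # MREGn: holds the n-th element of the mean

    def step(self, x_n, k):
        first = k == 1                                  # MCOMPn: first iteration?
        recursive = (k - 1) / k * self.mreg + x_n / k   # Equation 2 datapath
        self.mreg = x_n if first else recursive         # MMUXn selects, MREGn latches
        return self.mreg


# Usage: N = 2 MEAN modules working in parallel, one per vector element.
modules = [MeanModule() for _ in range(2)]
stream = [[2.0, 1.0], [4.0, 3.0], [9.0, 8.0]]
for k, x_k in enumerate(stream, start=1):   # k plays the role of the counter
    mu_k = [mod.step(x_n, k) for mod, x_n in zip(modules, x_k)]
print(mu_k)  # approximately [5.0, 4.0]
```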
Module II -VARIANCE
The VARIANCE module is illustrated in Figure 3. It computes the variance of the $x_k$ samples from the vector $x_k$ itself and its mean, $\mu_k^x$, calculated by the preceding MEAN blocks.
Like the MEAN module, the VARIANCE module uses a comparator, identified in Figure 3 as VCOMP1, to verify whether the system is in the first iteration.
As shown in Equation 3, the variance is calculated recursively.
Computing $\lVert x_k - \mu_k^x \rVert^2$ requires N subtractors (VSUBn), N multipliers (VMULT1_n) and an adder (VSUM1) with N inputs. Each element of vector $\mu_k^x$ is subtracted from its respective element in vector $x_k$, and the result of this operation is multiplied by itself (squared) and added to the other results. The value $\lVert x_k - \mu_k^x \rVert^2$ is then multiplied (at VMULT2) by 1/k and added, at the VSUM2 adder, to the variance calculated in the previous iteration, $[\sigma^2]_{k-1}^x$, multiplied (at VMULT3) by (k − 1)/k. From the second iteration on, this value passes through the VMUX1 multiplexer to the VREG1 register, delivering the variance value at the VARIANCE block output. The values of $\lVert x_k - \mu_k^x \rVert^2$ and 1/k are also delivered at the output of the VARIANCE block to avoid redundant operations, since they are used in the next block, the ECCENTRICITY block.
Module III -ECCENTRICITY
The ECCENTRICITY module is simpler than the blocks previously presented, because it reuses operations already performed in the VARIANCE block to calculate the eccentricity. The geometric distance, $(\mu_k^x - x_k)^T(\mu_k^x - x_k)$, is stored in register EREG3, and 1/k is stored in register EREG4. Since the ECCENTRICITY module implements Equation 1 (Algorithm 1, line 9), the variance value $[\sigma^2]_k^x$ is multiplied by k (EMULT1) and used to divide (EDIV1) the geometric distance. The output of this operation is added to 1/k in the ESUM1 adder, yielding the eccentricity of the sample, $\xi_k(x)$, at the ECCENTRICITY block output.
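The ECCENTRICITY datapath can be sketched in the same behavioural style. The function name `eccentricity_block` is hypothetical, and ordinary floating-point operations stand in for the EMULT1, EDIV1 and ESUM1 operators. The sketch illustrates the reuse described above: the squared distance and 1/k arrive precomputed from the VARIANCE block, so only one multiplication, one division and one addition remain.

```python
def eccentricity_block(dist2, inv_k, var, k):
    """Behavioural sketch of the ECCENTRICITY datapath (Equation 1).

    dist2 -- (mu_k - x_k)^T (mu_k - x_k), held in EREG3 (from VARIANCE)
    inv_k -- 1/k, held in EREG4 (from VARIANCE)
    var   -- [sigma^2]_k, the VARIANCE block output (must be > 0)
    k     -- iteration counter
    """
    emult1 = var * k           # EMULT1: k * [sigma^2]_k
    ediv1 = dist2 / emult1     # EDIV1: distance / (k * variance)
    return ediv1 + inv_k       # ESUM1: add 1/k -> eccentricity xi_k(x)


# Usage: third sample of a small 2-D stream [[1, 2], [3, 6], [5, 4]]:
# x_3 = [5, 4], mu_3 = [3, 4], so dist2 = 4 and [sigma^2]_3 = 16/3.
xi_3 = eccentricity_block(dist2=4.0, inv_k=1.0 / 3.0, var=16.0 / 3.0, k=3)
print(xi_3)  # 4/16 + 1/3, approximately 0.5833
```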
Module IV -OUTLIER
Finally, in the OUTLIER block, the samples are classified as abnormal (outlier = true) or normal (outlier = false). The design of this module can be seen in
Processing time
The proposed architecture has an initial delay, $d$, that can be expressed as

$$d = 3\,t_c$$

where $t_c$ is the system critical path time.
The execution time of the circuit implemented for the TEDA algorithm is determined by the system critical path time, $t_c$. So, after the initial delay, the execution time of the proposed TEDA, $t_{TEDA}$, can be expressed as

$$t_{TEDA} = t_c$$

thus, every $t_{TEDA}$ it is possible to obtain the output for an inserted sample, that is, its classification as abnormal or normal.
The throughput of the implementation, $th_{TEDA}$, in samples per second (SPS) can be expressed as

$$th_{TEDA} = \frac{1}{t_{TEDA}}$$
Results
This section presents the validation and synthesis results for the hardware architecture proposed in this work. All cases were validated and synthesized in floating point. Validation results were used to verify the hardware functionality, while synthesis results allow the system to be analyzed with respect to parameters that are important in the design of hardware architectures, such as hardware occupancy and processing time, including throughput and speedup.
Validation results
To validate the hardware architecture of the TEDA algorithm, we used the DAMADICS (Development and Application of Methods for Actuator Diagnosis in Industrial Control Systems) benchmark dataset [26]. The benchmark provides a real data set from the water evaporation process of a Polish sugar factory. It is a plant with three actuators: a control valve, which controls the flow of water in the pipes; a pneumatic motor, which controls the variable valve opening; and a positioner. This dataset contains faults at different times of the day on specific days. There are four different fault types, as shown in Table 1.
Artificial faults were introduced into the plant operation data on specific days.
The dataset contains a set of 19 faults in these 3 actuators. To validate the architecture, the faults of actuator 1 were simulated. Table 2 gives a detailed description of some of the faults introduced for actuator 1. Figure 6 shows the results obtained for the signal of item 1 of Table 2. Figure 6a illustrates the behavior of two input variables simulated in the hardware architecture. It can be observed that a failure occurs between k = 58900 and k = 59800. In Figure 6b, there is a sudden change in the behavior of the eccentricity (black curve), which surpasses the comparison threshold with m = 3 (red curve). Figure 7 shows the results obtained for the signal of item 7 of Table 2; as in Figure 6, a failure occurs, in this case between k = 37700 and k = 38400. The validation results of the hardware architecture were compared with results obtained from a Python software implementation of the TEDA algorithm. The hardware architecture was designed with a floating-point number format.
Synthesis results
After validating the implemented circuit, hardware synthesis was performed to obtain the FPGA resource occupation report, as well as the critical path time used to calculate the processing time of the proposed implementation. The floating-point synthesis results were obtained for a Xilinx Virtex-6 xc6vlx240t-1ff1156 FPGA. Table 3 presents the hardware occupation of the circuit implemented on the target FPGA. The first column shows the number of multipliers used, the second column the number of registers, and the third column the number of logic cells used as LUTs ($n_{LUT}$) throughout the circuit. The data in Table 3 show that, even with floating-point resolution, which demands more hardware resources than a fixed-point implementation, only a small portion of the target FPGA resources was occupied: about 3% of the multipliers, less than 1% of the registers and about 7% of the logic cells used as LUTs. This indicates that the proposed circuit could also be deployed on low-cost FPGAs, where the amount of available hardware resources is even smaller. In addition, multiple TEDA modules could operate in parallel on the same dataset to further reduce processing time.
The data presented in Table 4 are equally significant. The critical path time of the circuit, which also corresponds to the TEDA run time, was only $t_c$ = 138 ns. Thus, after the initial delay of 414 ns, a classified sample is output every 138 ns, which yields a throughput of about 7.2 million classified samples per second. These results indicate the feasibility of using the proposed architecture to process large data flows in real time.
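The timing figures above can be checked with a few lines of arithmetic. This sketch takes $t_c$ = 138 ns from the synthesis report and assumes the initial delay equals three critical-path periods, which is consistent with the reported 414 ns; the constant names are illustrative only.

```python
T_C_NS = 138.0        # critical path time t_c from the synthesis report (ns)
DELAY_PERIODS = 3     # assumption: d = 3 * t_c, consistent with 414 ns / 138 ns

d_ns = DELAY_PERIODS * T_C_NS       # initial delay before the first classification
t_teda_ns = T_C_NS                  # one classified sample per critical-path period
throughput_sps = 1e9 / t_teda_ns    # samples per second

print(f"initial delay: {d_ns:.0f} ns")                              # 414 ns
print(f"throughput: {throughput_sps / 1e6:.2f} million samples/s")  # 7.25
```

The exact value, about 7.25 million samples per second, is the source of the rounded 7.2 million quoted above.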
Platforms comparison
To date, no previous work in the literature has been found to explore hardware implementations of TEDA. This paper therefore presents, for the first time, a proposal to implement the TEDA technique on FPGA. To verify the advantages of the proposed hardware over software implementations on other platforms, the FPGA processing time was compared with the processing times of several software implementations. Table 5 presents the results of these comparisons. The first column indicates the hardware used, the second presents the processing time required to classify each sample, and the third column shows the speedup achieved by the proposal presented in this paper.
The data presented in Table 5 reaffirm the importance of this work: the proposed FPGA implementation achieved speedups of up to 3 million times compared with a Python TEDA implementation.
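As an illustration of what the speedup column expresses, the sketch below defines speedup as the ratio of per-sample software time to per-sample FPGA time and back-calculates the software time implied by the rounded figure of 3 million. The implied value is arithmetic on numbers quoted in the text, not a measurement from Table 5, and the names used are hypothetical.

```python
def speedup(t_software_s, t_fpga_s):
    """Speedup of the FPGA over a software platform, from per-sample times."""
    return t_software_s / t_fpga_s


T_FPGA_S = 138e-9   # per-sample FPGA time (t_TEDA = t_c = 138 ns)

# Back-calculation only: the per-sample software time that a speedup of
# 3 million would correspond to. Derived from the rounded figure in the
# text, not a measured value.
implied_software_s = 3e6 * T_FPGA_S
print(f"{implied_software_s:.3f} s per sample")                         # 0.414
print(f"speedup check: {speedup(implied_software_s, T_FPGA_S):.0f}")    # 3000000
```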
Conclusion
This work presented a proposal for the hardware implementation of the TEDA anomaly detection technique for data streams. The hardware was implemented in RTL using floating-point format. Synthesis results were obtained for a Xilinx Virtex-6 xc6vlx240t-1ff1156 FPGA. The proposed implementation used a small portion of the target FPGA resources while delivering results with a short processing time. The high speedups obtained in comparison with software platforms reaffirm the importance of this work, which pioneers the hardware implementation of the TEDA technique on FPGA. The proposed architecture is suitable for practical fault detection applications in real industrial processes with severe time constraints, as well as for handling large data volumes, such as data streams, with low processing time.
