Reconfigurable Computing Applied to Latency Reduction for the Tactile
  Internet by Junior, José C. V. S. et al.
JOSÉ C. V. S. JUNIOR1, MATHEUS F. TORQUATO2, TOKTAM MAHMOODI3, MISCHA
DOHLER3, (FELLOW, IEEE) AND MARCELO A. C. FERNANDES1,4,5
1Laboratory of Machine Learning and Intelligent Instrumentation (LMLII), nPITI-IMD, Federal University of Rio Grande do Norte, 59078-970, Natal, Brazil.
2College of Engineering, Swansea University, Swansea, Wales SA2 8PP, UK.
3Centre for Telecommunications Research, Department of Engineering, King's College London, London WC2R 2LS, UK.
4Department of Computer and Automation Engineering, Federal University of Rio Grande do Norte, Natal, 59078-970, Brazil.
5(Current address) John A. Paulson School of Engineering and Applied Sciences, Harvard University, 02138, Cambridge, MA, USA.
Corresponding author: Marcelo A. C. Fernandes (mfernandes@dca.ufrn.br or macfernandes@seas.harvard.edu ).
This study was partially financed by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)—Finance Code 001.
ABSTRACT Tactile internet applications allow robotic devices to be remotely controlled over a com-
munication medium with an unnoticeable time delay. In a bilateral communication, the acceptable round
trip latency is usually in the order of 1ms up to 10ms depending on the application requirements. It is
estimated that 70% of the total latency is generated by the communication network, and the remaining
30% is produced by master and slave devices. Thus, this paper aims to propose a strategy to reduce the
30% of total latency that is produced by such devices. The strategy is to apply reconfigurable computation
using FPGAs to minimize the execution time of device-associated algorithms. With this in mind, this
work presents a hardware reference model for modules that implement nonlinear positioning and force
calculations as well as a tactile system formed by two robotic manipulators. In addition to presenting the
implementation details, simulations and experimental tests are performed in order to validate the proposed
model. Results associated with the FPGA sampling rate, throughput, latency and post-synthesis occupancy
area are analyzed.
INDEX TERMS Tactile Internet, Latency Reduction, Haptic Devices, Reconfigurable Computing, FPGA.
I. INTRODUCTION
THE Tactile internet is conceptually defined as the new generation of internet connectivity which will combine
very low latency with extremely high availability, reliability
and security [1]. Another feature pointed out is that this
new generation will be centered around applications that use
human-machine communications (H2M) alongside devices
that are compatible with tactile sensations [2], [3].
A tactile internet environment is basically composed of
a local device (known as a master) and a remote device
(known as a slave), where the master device is responsible
for controlling the slave device over the internet through a
two-way data communication network [4], [5]. Bidirectional
communication is needed to simulate the physical laws of
action and reaction, where action can be represented as send-
ing operational commands and reaction can be represented
as the forces resulting from that action. In tactile internet
applications, the desired time delay for device communi-
cation is characterized by an ultra-low latency. In bilateral
communication, the required round trip latency ranges from
1ms up to 10ms depending on the application requirements
[6]–[9].
According to [10], 30% of the total system latency in a tactile
internet application is generated by the master and slave
devices. These devices demand high processing speeds because a
variety of computationally expensive algorithms and techniques
must be executed repeatedly. These algorithms involve arithmetic
operations and the evaluation of linear and nonlinear equations
that need to be computed at high sampling rates in order
to maintain application fidelity. The remaining 70% of the
latency is caused by the communication network, which
makes current networks unsuitable for such latency constraints [11].
To minimize this problem, some research groups have been
studying prediction techniques, and proposals using artificial
intelligence (AI) have proved to be effective [12]. On the other
hand, the implementation of complex AI-based prediction methods can
further increase the latency of the computer systems present
in master and slave devices.
arXiv:2003.12463v1 [cs.OH] 12 Mar 2020
Alternatively, new approaches such as reconfigurable com-
puting can improve the performance of master and slave
devices in a tactile system environment. Reconfigurable computing
with field-programmable gate arrays (FPGAs) enables
the creation of customizable hardware, which allows algorithms
to be parallelized and optimized at the logic gate
level to speed up their operations. Literature results show that
computationally expensive algorithms can achieve speedups
of up to 1000× over software implementations when custom-
implemented in FPGAs [13]–[19].
In this context, this paper proposes an implementation
that targets the 30% of the total latency related to
tactile devices. The project uses reconfigurable computing
on FPGAs to minimize the execution time of the algorithms
associated with master and slave devices. Reconfigurable
computing allows the algorithms to be parallelized and the
latency to be reduced compared to software systems embedded
in traditional architectures with general purpose processors
and microcontrollers. To validate the proposed
strategy, this paper presents a discrete reference model that
can be adjusted for different types of master and slave devices
in a tactile internet system. Validation results, throughput,
and post-synthesis figures obtained for the proposed FPGA
hardware implementation are presented. Comparisons with other
works in the literature show that the use of reconfigurable
computing can significantly accelerate the processing speed in
tactile devices.
II. RELATED WORK
The authors of [20] presented a tactile internet environment
that used a glove type device in conjunction with a robotic
manipulator. The environment was developed using a gen-
eral purpose processor, which made the execution of the
algorithms sequential. In order to send the data, the tactile
glove produced a latency of approximately 4.82ms, and the
hardware responsible for performing the inverse kinematics
calculations took an interval of 0.95ms. The latency values
obtained in this application could be improved by hardware
structures that allow the algorithms to be parallelized.
Studies in the literature demonstrate the benefit of using
FPGAs to increase the sampling rate for data acquisition from
devices associated with haptic systems. The authors of [21]
presented an implementation for controlling a 3-DoF (degrees
of freedom) device. The technique increased the device
sampling rate using FPGA hardware together with a real-time
operating system (RTOS) in order to increase the acquisition
resolution of the stiffness sensor. The control technique was
developed in 32-bit fixed point, and trigonometric functions
were implemented using lookup tables.
The work described in [22] presented a control system for
one-dimensional haptic devices (1-DoF). The FPGA control
implementation used single-precision floating point repre-
sentation (IEEE std 754) and the algorithms performed all
calculations in 50µs. The processing time was satisfactory;
however, the data frame size to be sent over the network
increased with the number of DoFs. This peculiarity can
increase latency for more complex haptic systems with
many DoFs. Along the same lines, an implementation for
bilateral control of single-dimensional haptic
devices (1-DoF) was presented in [23]. A more accurate
control technique based on sliding mode control (SMC)
was implemented in FPGA, and CORDIC (COordinate Rotation
DIgital Computer) was used to assist in performing the
complex calculations. The hardware was designed to
locally control two devices, one master and one slave. The
implementation used a 24-bit fixed point format, with 9 bits
for the integer part and 14 bits for the fractional part, and
the total execution time of the controllers was 7.2375µs.
The works [21], [22] and [23] presented control schemes that
depend directly on the encoder readings of the device motors.
In commercial models, accessing the device electronics can be
tricky, usually requiring some reverse engineering
and specific knowledge to make the appropriate encoder
connections. On the other hand, some works abstract the data
acquisition and work directly with robotics algorithms. These
algorithms may require high computational power that can
surpass the capabilities of many general-purpose processors
(GPPs), which perform the operations sequentially.
Some studies demonstrate the benefit of using FPGA to
accelerate robotic manipulation algorithms related to haptic
systems. A hardware architecture implemented in FPGA for
performing the forward kinematics of 5-DoF robots using
floating point arithmetic was described in [24]. In this hardware
implementation all the forward kinematics calculations
were performed within 1.24µs, which represents 67 clock
cycles at a clock frequency of 54MHz. The equivalent software
implementation has a total processing time of 1.61036ms.
Overall, the hardware implementation is 1298× faster than
the software implementation, which means a considerable
acceleration in the forward kinematics processing time.
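The relation between cycle counts, clock frequency and execution time used throughout this section can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function names `cycles_to_seconds` and `speedup` are hypothetical helpers, and the input figures are the ones quoted above for [24].

```python
def cycles_to_seconds(cycles: int, clock_hz: float) -> float:
    """Execution time of a design that needs `cycles` clock periods."""
    return cycles / clock_hz


def speedup(t_software_s: float, t_hardware_s: float) -> float:
    """Ratio between software and hardware execution times."""
    return t_software_s / t_hardware_s


# Figures quoted for [24]: 67 cycles at 54 MHz, software time 1.61036 ms.
t_hw = cycles_to_seconds(67, 54e6)
gain = speedup(1.61036e-3, t_hw)
print(f"hardware time: {t_hw * 1e6:.2f} us, speedup: {gain:.0f}x")
```

The same helper reproduces the other cycle counts reported in this section, for example 150 cycles at 50MHz for a 3µs computation.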
The authors of the paper [25] presented an FPGA im-
plementation of inverse kinematics, velocity calculation and
acceleration of a 3-DoF robot. Three systems were created:
the first one did not use any arithmetic co-processor and
floating point operations were performed in software; in
the second system a floating point co-processor was used
which allowed the execution of the four basic mathematical
operations in hardware; lastly, the third system also had a
custom arithmetic co-processor but in this case it allowed
hardware computation of square root. The overall times to
perform the calculations were 2324µs, 560µs and 143µs and
the total logic elements used from the entire device were
4501 (4%), 5840 (5%) and 7219 (6%), respectively. The work
uses a hardware-software approach to implement inverse
kinematics, in which the critical parts were implemented in the
FPGA to accelerate the whole process.
A hardware implementation to control a 6-DoF device was
presented in [26], using a 32-bit fixed point representation in
which 21 bits were used for the fractional part and 11 bits for
the integer part.
In that work, a CORDIC implementation was used to assist
in performing the trigonometric calculations. The total time
spent to compute the forward kinematics was 3µs and for the
inverse kinematics the time was 4.5µs for a clock of 50MHz.
However, in the presented proposal, some calculations were
performed sequentially, that is, the forward kinematics
required 150 clock cycles and the inverse kinematics
required 225 cycles. The use of partial parallelization in
the execution of robotic manipulation algorithms provided
a significant increase in system throughput. Nevertheless, it
is important to note that there is still room for improvement
since all calculations can be computed in parallel.
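Several of the works above rely on fixed point formats such as the 32-bit word with 21 fractional bits mentioned for [26]. As a rough illustration of what such a format implies for resolution, the following sketch quantizes a value to a signed fixed point word. It assumes a two's complement layout with saturation and round-to-nearest, which the cited works do not necessarily use; `to_fixed` and `from_fixed` are hypothetical names.

```python
import math


def to_fixed(value: float, frac_bits: int = 21, total_bits: int = 32) -> int:
    """Quantize `value` to a signed fixed point integer (assumed two's complement)."""
    raw = round(value * (1 << frac_bits))
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, raw))  # saturate instead of wrapping


def from_fixed(raw: int, frac_bits: int = 21) -> float:
    """Convert the fixed point integer back to a real number."""
    return raw / (1 << frac_bits)


angle = math.pi / 4
error = abs(from_fixed(to_fixed(angle)) - angle)
print(f"quantization error: {error:.3e} (step = {2 ** -21:.3e})")
```

With round-to-nearest, the error stays within half a quantization step, which is why the number of fractional bits directly bounds the precision of the angle and position values.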
Another hardware implementation of inverse kinematics
was presented in [27]. The device used was a 10-DoF biped
robot. A CORDIC implementation was used to perform the
trigonometric calculations. The execution time needed to
compute the kinematics of the 10 joints in FPGA was
0.44µs. That work also included a comparison with a software
implementation, in which the time taken to perform
the same calculations was 3342µs, i.e. the gain on execution,
or speedup, on custom FPGA hardware was 7595×. The
resulting error between both implementations was acceptable
for this specific control.
An FPGA implementation of the forward and inverse kinematics
of a 5-DoF device was presented in [28]. The
hardware was developed using a fixed point representation
in which 32 bits were used for the angle representation, 15 of
them for the fractional part. For the device spatial positioning,
16 bits were used, of which 7 bits were for the fractional part.
The trigonometric functions were implemented with a combination
of lookup tables (LUTs) and Taylor series. To perform the
necessary calculations, a finite-state machine (FSM) model was
used to reduce the use of hardware resources; however, the FSM
made the computation of the robotic manipulation algorithms
sequential. In this model, the forward kinematics implementation
achieved a runtime of 680 ns and the inverse kinematics 940 ns,
that is, with the 50MHz clock, the forward kinematics took 34
clock cycles and the inverse kinematics took 47 cycles. Such
approaches reduce the use of hardware resources at the cost of a
longer computation runtime. For tactile device applications, it is
important to optimize the runtime rather than the use of
hardware resources.
Similarly, an FPGA implementation of forward and inverse
kinematics for a 7-DoF device was presented in [29];
however, only the 3 DoF required to control the device
movement were implemented in hardware. The proposal used a
32-bit fixed point representation, and a CORDIC was used to
execute the trigonometric functions. To validate the proposal,
the FPGA was set to receive the three reference angles,
perform the forward kinematics and then the inverse. The model
was pipelined and the operating frequency was 100MHz. As a
result, the model took 2µs to perform the entire kinematics
algorithm, which represented 200 clock cycles.
In this context, it is clear that reconfigurable FPGA-based
computing can accelerate haptic device control algorithms.
Unlike traditional hardware that processes information
sequentially, an FPGA enables parallel information processing.
However, most studies in the literature have developed partially
parallel implementations, that is, implementations in which
parts of the algorithms are executed sequentially. Unlike the
previously mentioned studies, this study presents a new approach
in which the robotic manipulation algorithms are executed in a
fully parallel hardware implementation. The proposed
implementation provides a latency reduction for the
tactile devices and enables tactile internet applications.
III. DISCRETE MODEL OF THE TACTILE INTERNET
A discrete model of the tactile internet system is proposed
and presented in Figure 1. This model consists of seven sub-
systems: the Operator (OP), Master Device (MD), Hardware
of the MD (HMD), Network (NW), Hardware of the SD
(HSD), Slave Device (SD) and the Environment (ENV). It
is assumed that the signals are sampled at time ts.
The OP is an entity responsible for generating stimuli
that can be in the form of position signals, speed, force,
image, sound or any other. These stimuli are sent to the
devices involved so that some kind of task can be performed
in some kind of environment. The environment, the ENV
subsystem, receives the stimuli from the OP and generates
feedback signals associated with sensations such as reactive
force information and tactile information that are sent back
to the OP. The interaction between the OP and the ENV is
performed through the master and slave devices, MD and SD,
respectively.
Specifically in this work, the MD is characterized as a local
device and the SD as a remote one, and both are responsible for
transforming the stimuli and sensations associated with the OP
and ENV into signals to be processed. Tactile devices (MD
and SD) can take the form of robotic manipulators, haptic
devices, tactile gloves and others that may be developed in
the future. In the coming years, new types of sensors and
actuators are expected to be introduced and to form the basis
for the development of new tactile devices.
Although there are no tactile internet standards or products
yet, it is reasonable to expect that future tactile devices will
be integrated with hardware responsible for all operational
metrics and calculations. Under this assumption, this work
adds two modules to the discrete model (as per
Figure 1), called HMD and HSD. The HMD is responsible for
performing all transformations and calculations associated
with the MD, and the HSD performs the equivalent operations for
the SD. Several algorithms associated with transformation,
compression, control and prediction will be the responsibility
of these two modules.
Based on the model presented in Figure 1, the signals
generated by the OP can be characterized by the array a(n)
expressed as
\[ a(n) = \left[ a_1(n), \ldots, a_i(n), \ldots, a_{N_{OP}}(n) \right], \tag{1} \]
FIGURE 1. Proposed discrete model of the tactile internet system. (Blocks: OP, MD, HMD, Network, HSD, SD and ENV; forward signals a(n), b(n), c(n), v(n), u(n) and â(n); backward signals o(n), g(n), h(n), q(n), p(n) and ô(n).)
where $a_i(n)$ is the i-th stimulus at the n-th instant and $N_{OP}$
is the total number of stimulus signals generated by the OP.
At every n-th instant the stimulus array, $a(n)$, is received
by the MD, which transforms the stimuli into a set of $N_{MD}$
signals expressed as
\[ b(n) = \left[ b_1(n), \ldots, b_i(n), \ldots, b_{N_{MD}}(n) \right], \tag{2} \]
where $b_i(n)$ is the i-th signal generated by the MD at the n-th
instant. At each n-th instant, a set of stimuli $a(n)$ generates
a set of signals $b(n)$ that depends on the type of MD and the
sensor set associated with the device. Importantly, the signals
generated by the MD, $b(n)$, are heterogeneous: each
i-th signal $b_i(n)$ can represent an angle, a spatial coordinate,
a pixel of an image, an audio sample or any other information
associated with a stimulus generated by the OP. In practice, the
signals grouped in the $b(n)$ array originate from sensors
coupled to the MD, and the number of signals, $N_{MD}$, may vary
according to the amount of information to be sent.
The set of signals, expressed by b(n) are sent to the
HMD (Figure 1) which has the function of processing this
information before sending it to the NW subsystem. Cal-
culations associated with calibration, linear and nonlinear
transformations and signal compression are performed by the
HMD. Essentially the majority of the computational effort of
MD is in this subsystem. At each n-th instant $t_s$ the HMD
processes the array $b(n)$, generating an information array
$c(n)$ expressed by
\[ c(n) = \left[ c_1(n), \ldots, c_i(n), \ldots, c_{N^f_{HMD}}(n) \right], \tag{3} \]
where $c_i(n)$ is the i-th signal generated by the HMD towards
the NW subsystem at the n-th instant $t_s$ and $N^f_{HMD}$ is the
number of signals. It is expected that $N^f_{HMD} < N_{MD}$, so as
to minimize latency during transmission in the NW subsystem.
The NW subsystem, as shown in Figure 1, characterizes
the communication medium that links OP to ENV. In this
model, the data propagates through two different channels
called the forward channel, that transmits the OP data to-
wards the ENV, and the backwards channel, that transmits
the ENV signals towards the OP. The signal transmitted by
the forward and backwards channels may be disturbed and
delayed. In the case of the forward channel, the received
signal, v(n), may be expressed as
\[ v(n) = \left[ v_1(n), \ldots, v_i(n), \ldots, v_{N^f_{HMD}}(n) \right], \tag{4} \]
where
\[ v_i(n) = c_i\left( n - d^f_i(n) \right) + r^f_i(n) \tag{5} \]
in which $r^f_i(n)$ represents the added noise and $d^f_i(n)$
represents the delay associated with the i-th piece of information
sent in $c(n)$. In this model, the noise can be characterized as a
Gaussian random variable with zero mean and variance $\sigma^2_{r^f}$,
and the delays are characterized as integers, that is, they occur
at a granularity of ts. It is important to note that the NW
subsystem can take the shape of the Internet, a metropolitan
network (MAN), a local area network (LAN), or even a direct
connection between an MD and a workstation or computer.
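To make the forward-channel model in (5) concrete, the sketch below applies an integer delay and zero-mean Gaussian noise to one signal component. It is a minimal illustration under stated assumptions: the function name `forward_channel` is hypothetical, and samples requested before n = 0 are taken as zero, a boundary condition the model above does not specify.

```python
import random


def forward_channel(c, delays, sigma, rng):
    """Apply v_i(n) = c_i(n - d_i(n)) + r_i(n) to one signal component.

    c      : list of transmitted samples c_i(0), c_i(1), ...
    delays : list of integer delays d_i(n), in units of the sampling period
    sigma  : standard deviation of the zero-mean Gaussian noise r_i(n)
    rng    : random.Random instance used to draw the noise
    """
    received = []
    for n, delay in enumerate(delays):
        k = n - delay
        # Assumption: nothing has been received before the first sample.
        sample = c[k] if k >= 0 else 0.0
        received.append(sample + rng.gauss(0.0, sigma))
    return received


rng = random.Random(0)
c = [0.10, 0.11, 0.12, 0.13, 0.14]
v = forward_channel(c, delays=[1, 1, 2, 2, 1], sigma=1e-3, rng=rng)
print(v)
```

Because the delay may change from sample to sample, the received sequence can repeat or skip transmitted samples, which is one reason the receiving hardware cannot assume evenly spaced data.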
As shown in Figure 1, the HSD receives the v(n) signal
through the forward channel and has the role of generating
control signals to the SD through the signal
\[ u(n) = \left[ u_1(n), \ldots, u_i(n), \ldots, u_{N^f_{HSD}}(n) \right], \tag{6} \]
where $N^f_{HSD}$ is the number of control signals and $u_i(n)$ is the
i-th control signal at the n-th instant $t_s$ associated with the
array $u(n)$. It is important to note that there may be various
types of SD, from real robotic manipulators to virtual tools in
computational environments. Thus, it can be stated without
loss of generality that the HSD can perform the inverse of the
processing done by the HMD, in addition to specific algorithms
associated with the type of SD. For example, if the SD is a
robotic manipulator, the HSD must additionally implement
closed-loop control algorithms, whereas if the SD is a virtual
arm, the HSD must implement positioning algorithms for a given
virtual reality platform. The SD does not have to correspond
directly to the MD; e.g., the MD can be a glove while the SD is a
drone. However, it is desirable that the stimulus generated by
the SD be a copy of the stimulus generated by the OP; that is,
within the model presented in Figure 1, the SD generates a signal
expressed as
\[ \hat{a}(n) = \left[ \hat{a}_1(n), \ldots, \hat{a}_i(n), \ldots, \hat{a}_{N_{OP}}(n) \right], \tag{7} \]
where $\hat{a}_i(n)$ is an estimate of the i-th stimulus $a_i(n)$
generated by the OP. Thus, the estimate of the stimulus generated
by the OP, $\hat{a}_i(n)$, is applied to the ENV subsystem, which
represents the real or virtual environment with which the OP is
interacting.
In the backwards direction, the stimulus generated by the OP,
$a(n)$, and applied to the environment as $\hat{a}(n)$, elicits a
group of reactions from the ENV subsystem that can be characterized
in the model by the set of signals expressed by
\[ o(n) = \left[ o_1(n), \ldots, o_i(n), \ldots, o_{N_{ENV}}(n) \right], \tag{8} \]
where $N_{ENV}$ is the number of stimulus signals and $o_i(n)$ is
the i-th stimulus signal at the n-th instant $t_s$. Reaction signals
grouped into $o(n)$ can be in the form of force, touch,
temperature, etc.
Reaction signals are captured by the SD, which turns this
information into electrical signals from real sensors or, if the
SD is in a virtual reality environment, from virtual sensors.
After capturing this information the SD transmits these signals
to the HSD. In the model presented in Figure 1, the signals
generated by the SD are expressed as
\[ g(n) = \left[ g_1(n), \ldots, g_i(n), \ldots, g_{N_{SD}}(n) \right], \tag{9} \]
where $g_i(n)$ is the i-th signal generated by the SD at the
n-th instant of time $t_s$ and $N_{SD}$ is the number of signals. The
HSD in turn processes this information and sends it to the NW
subsystem through the array $h(n)$, expressed by
\[ h(n) = \left[ h_1(n), \ldots, h_i(n), \ldots, h_{N^b_{HSD}}(n) \right], \tag{10} \]
where $h_i(n)$ is the i-th signal generated by the HSD at the n-th
instant of time $t_s$ and $N^b_{HSD}$ is the number of signals.
The signal received by the HMD through the backwards
channel of the NW subsystem can be expressed as
\[ q(n) = \left[ q_1(n), \ldots, q_i(n), \ldots, q_{N^b_{HSD}}(n) \right], \tag{11} \]
where
\[ q_i(n) = h_i\left( n - d^b_i(n) \right) + r^b_i(n) \tag{12} \]
in which $r^b_i(n)$ represents the added noise and $d^b_i(n)$
represents the delay associated with the i-th piece of information
transmitted in $q(n)$ by the backwards channel. As in the forward
channel, the noise can be characterized as a Gaussian random
variable with zero mean and variance $\sigma^2_{r^b}$, and the delays
are characterized as integers with $t_s$ granularity. The HMD processes
the q(n) signal information and generates a set of control
signals that will act on the MD and can be characterized as
\[ p(n) = \left[ p_1(n), \ldots, p_i(n), \ldots, p_{N^b_{HMD}}(n) \right], \tag{13} \]
where $p_i(n)$ is the i-th signal generated by the HMD at the
n-th instant of time $t_s$ and $N^b_{HMD}$ is the number of signals.
The MD in turn will synthesize the reaction stimuli generated
by the environment, i.e. the ENV subsystem. Based on the
model, it is possible to characterize these reaction stimuli as
a signal expressed by
\[ \hat{o}(n) = \left[ \hat{o}_1(n), \ldots, \hat{o}_i(n), \ldots, \hat{o}_{N_{ENV}}(n) \right], \tag{14} \]
where $\hat{o}_i(n)$ is an estimate of the i-th stimulus $o_i(n)$
generated in the ENV subsystem. Examples of reaction stimuli
generated or synthesized by the MD are touch, force and
temperature.
In addition to the latency associated with the NW subsys-
tem that characterizes the communication medium between
the OP and ENV subsystems, the MD, HMD, HSD, and SD
subsystems also add latency to the system. Based on the work
presented in [10], [11] these components represent 30% of
total latency. The latency of the MD and SD subsystems is
associated with sensors and actuators, which can be mechanical,
electrical, electromechanical or of other kinds. HMD and
HSD latencies are associated with the processing time of the
algorithms in these devices and, depending on the type of
hardware and implementation architecture, this latency can be
considerably reduced.
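The 30% share can be turned into a concrete time budget for the MD, HMD, HSD and SD subsystems together. The sketch below simply scales the round trip latency targets quoted in the introduction (1ms to 10ms); the name `device_latency_budget` is a hypothetical helper, and treating the 30% figure as a fixed fraction is a simplifying assumption.

```python
DEVICE_SHARE = 0.30  # fraction of total latency attributed to the devices in [10]


def device_latency_budget(round_trip_s: float, share: float = DEVICE_SHARE) -> float:
    """Time available to all device-side subsystems for one round trip."""
    return round_trip_s * share


for round_trip in (1e-3, 10e-3):
    budget = device_latency_budget(round_trip)
    print(f"round trip {round_trip * 1e3:.0f} ms -> device budget {budget * 1e3:.1f} ms")
```

Under this reading, a 1ms round trip leaves roughly 0.3ms for all device-side processing in both directions, which motivates execution times in the microsecond range.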
IV. PHANTOM OMNI DEVICE MODEL (MD & SD)
Based on the scheme presented in Figure 1, this section
presents details associated with the MD and SD used as
reference for the hardware system proposed in this research.
The MD and SD are characterized as a three degree of
freedom (3-DoF) robotic manipulator called the PHANToM
Omni [30] (Figure 2). The PHANToM Omni has been widely
used in the literature, as presented in [31] and [32]. In this work
two of these devices are used: one as an MD and
the other as an SD.
FIGURE 2. PHANToM Omni - MD and SD. (Diagram labels: tool; axes x, y and z; joint angles θ1, θ2 and θ3.)
As can be seen from Figure 3, the PHANToM Omni phys-
ical structure is formed by a base, an arm with two segments
L1 and L2 which are interconnected by three rotary joints θ1,
θ2 and θ3 and a tool. The variables presented in Figure 3 are
represented by: L1 = 0.135 m, L2 = L1, L3 = 0.025 m and
L4 = L1 + A, where A = 0.035 m, as described in [33]. These
detailed features of the device are essential for performing
the kinematics and dynamic calculations.
FIGURE 3. PHANToM Omni structure - MD and SD. (Diagram labels: joint angles θ1, θ2 and θ3; lengths L1, L2, L3, L4 and A; axes x, y and z.)
A. FORWARD KINEMATICS
The kinematics of manipulators makes use of the
relationship between operational coordinates and joint
coordinates. Forward kinematics (FK) relates the angular
variables of the joints to the Cartesian system. That is,
given an array of joint coordinates it is possible to determine
the spatial position of the tool through the equation that can
be expressed by
\[ x = -\sin(\theta_1)\left( L_2 \sin(\theta_3) + L_1 \cos(\theta_2) \right), \tag{15} \]
\[ y = -L_2 \cos(\theta_3) + L_1 \sin(\theta_2) + L_3, \tag{16} \]
\[ z = L_2 \cos(\theta_1) \sin(\theta_3) + L_1 \cos(\theta_1) \cos(\theta_2) - L_4 \tag{17} \]
where x, y and z are variables that determine the spatial
position of the tool in Cartesian space.
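Equations (15) to (17) translate directly into code. The following sketch is a software illustration only, not the FPGA implementation described in this paper; the function name `forward_kinematics` is hypothetical, and the link lengths are interpreted in metres, which is an assumption about the units of the figures 0.135, 0.025 and 0.035 given above.

```python
import math

# Link lengths from Section IV, interpreted in metres (assumption).
L1 = 0.135
L2 = L1
L3 = 0.025
A = 0.035
L4 = L1 + A


def forward_kinematics(theta1: float, theta2: float, theta3: float):
    """Tool position (x, y, z) from the joint angles, following (15)-(17)."""
    # Common factor of (15) and (17): horizontal reach of the arm.
    reach = L2 * math.sin(theta3) + L1 * math.cos(theta2)
    x = -math.sin(theta1) * reach
    y = -L2 * math.cos(theta3) + L1 * math.sin(theta2) + L3
    z = math.cos(theta1) * reach - L4
    return x, y, z


print(forward_kinematics(0.3, 0.5, 0.2))
```

Once the sines and cosines of the three angles are available, the three coordinates do not depend on each other, which is the property a parallel hardware implementation can exploit.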
B. INVERSE KINEMATICS
In inverse kinematics (IK), the relationship between the
joint angles and the Cartesian system is reversed, that is,
given the spatial position of the tool it may be possible to
determine the joint coordinates. The solution to this problem
is not as straightforward as in forward kinematics. In forward
kinematics, the position of the tool is determined solely
by the displacements of the joints. In inverse kinematics, the
equations are composed of nonlinear calculations formed
by trigonometric functions. Depending on the manipulator
structure, multiple solutions may be possible for the same
tool position, or there may be no solution for a particular set
of tool positions. Based on the works [34], [35] and [33], the
value of θ1 can be defined through the equation expressed by
\[ \theta_1 = -\operatorname{atan2}(x, z + L_4) \tag{18} \]
where x and z represent coordinates in Cartesian space
and $L_4$ corresponds to the size of the arm segments, as
shown in Figure 3.
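To calculate the other two joints θ2 and θ3 it is necessary
to perform intermediate calculations. Thus, one can obtain R,
r, β, γ and α through the equations
\[ R = \sqrt{x^2 + (z + L_4)^2}, \tag{19} \]
\[ r = \sqrt{x^2 + (z + L_4)^2 + (y - L_3)^2}, \tag{20} \]
\[ \gamma = \operatorname{acos}\left( \frac{L_1^2 - L_2^2 + r^2}{2 L_1 r} \right), \tag{21} \]
\[ \beta = \operatorname{atan2}(y - L_3, R), \tag{22} \]
and
\[ \alpha = \operatorname{acos}\left( \frac{L_1^2 + L_2^2 - r^2}{2 L_1 L_2} \right). \tag{23} \]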
After performing the intermediate calculations it is possi-
ble to calculate θ2 through the equation
\[ \theta_2 = \gamma + \beta. \tag{24} \]
Finally, the value corresponding to the θ3 joint can be
obtained through the equation
\[ \theta_3 = \theta_2 + \alpha - \frac{\pi}{2}. \tag{25} \]
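The inverse kinematics in (18) to (25) can be sketched in software and checked against the forward kinematics of (15) to (17). This is an illustrative sketch rather than the hardware module of this paper: the function names are hypothetical, the link lengths are interpreted in metres (an assumption), and no handling of unreachable positions is included, so `math.acos` raises an error for targets outside the workspace.

```python
import math

# Link lengths from Section IV, interpreted in metres (assumption).
L1 = L2 = 0.135
L3 = 0.025
L4 = L1 + 0.035


def forward_kinematics(t1, t2, t3):
    """Equations (15)-(17), used here only to check the inverse."""
    reach = L2 * math.sin(t3) + L1 * math.cos(t2)
    return (-math.sin(t1) * reach,
            -L2 * math.cos(t3) + L1 * math.sin(t2) + L3,
            math.cos(t1) * reach - L4)


def inverse_kinematics(x, y, z):
    """Joint angles (theta1, theta2, theta3) from the tool position."""
    theta1 = -math.atan2(x, z + L4)                                  # (18)
    R = math.sqrt(x ** 2 + (z + L4) ** 2)                            # (19)
    r = math.sqrt(x ** 2 + (z + L4) ** 2 + (y - L3) ** 2)            # (20)
    gamma = math.acos((L1 ** 2 - L2 ** 2 + r ** 2) / (2 * L1 * r))   # (21)
    beta = math.atan2(y - L3, R)                                     # (22)
    alpha = math.acos((L1 ** 2 + L2 ** 2 - r ** 2) / (2 * L1 * L2))  # (23)
    theta2 = gamma + beta                                            # (24)
    theta3 = theta2 + alpha - math.pi / 2                            # (25)
    return theta1, theta2, theta3


target = (0.3, 0.5, 0.2)
recovered = inverse_kinematics(*forward_kinematics(*target))
print(recovered)
```

The round trip recovers the original angles only for configurations on the solution branch selected by (21) to (25); other joint configurations that reach the same tool position map to this branch instead.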
C. KINESTHETIC FEEDBACK FORCE
The kinesthetic feedback force allows the environment to be
"felt", i.e. when the SD comes into physical contact with an
object, the MD will receive a counter force. This model can
be implemented through the equation
\[ \tau = J^T F, \tag{26} \]
where τ defines the torque array that will be applied to each
joint (θ1, θ2 and θ3) of the PHANToM Omni associated with
the MD, $J^T$ is the transpose of the Jacobian matrix and F
is the force array resulting from the interaction of the SD with
the ENV. The torque array τ can be expressed as
\[ \tau = [\tau_1, \tau_2, \tau_3]. \tag{27} \]
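The Jacobian matrix J incorporates structural information
about the manipulator and is defined as
\[ J = \begin{bmatrix} J_{11} & J_{12} & J_{13} \\ J_{21} & J_{22} & J_{23} \\ J_{31} & J_{32} & J_{33} \end{bmatrix}, \tag{28} \]
where
\[ J_{11} = -\cos(\theta_1)\left( L_2 \sin(\theta_3) + L_1 \cos(\theta_2) \right), \tag{29} \]
\[ J_{21} = 0, \tag{30} \]
\[ J_{31} = -L_1 \cos(\theta_2) \sin(\theta_1) - L_2 \sin(\theta_3) \sin(\theta_1), \tag{31} \]
\[ J_{12} = L_1 \sin(\theta_1) \sin(\theta_2), \tag{32} \]
\[ J_{22} = L_1 \cos(\theta_2), \tag{33} \]
\[ J_{32} = -L_1 \sin(\theta_2) \cos(\theta_1), \tag{34} \]
\[ J_{13} = -L_2 \sin(\theta_1) \cos(\theta_3), \tag{35} \]
\[ J_{23} = L_2 \sin(\theta_3), \tag{36} \]
and
\[ J_{33} = L_2 \cos(\theta_3) \cos(\theta_1). \tag{37} \]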
The force array F is expressed as
\[ F = [F_x, F_y, F_z] \tag{38} \]
and can be obtained through sensors internal or external to
the device. According to (26), the torque array τ representing
the resulting force at each joint can be defined as
\[ \tau_1 = J_{11} F_x + J_{21} F_y + J_{31} F_z, \tag{39} \]
\[ \tau_2 = J_{12} F_x + J_{22} F_y + J_{32} F_z, \tag{40} \]
and
\[ \tau_3 = J_{13} F_x + J_{23} F_y + J_{33} F_z. \tag{41} \]
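The torque computation in (26) and (39) to (41) amounts to building the Jacobian of (29) to (37) and multiplying its transpose by the force array. The sketch below is a plain software illustration with hypothetical function names (`jacobian`, `joint_torques`); the link lengths are interpreted in metres, which is an assumption.

```python
import math

# Link lengths from Section IV, interpreted in metres (assumption).
L1 = L2 = 0.135


def jacobian(t1: float, t2: float, t3: float):
    """Jacobian entries (29)-(37), returned as rows [J_i1, J_i2, J_i3]."""
    return [
        [-math.cos(t1) * (L2 * math.sin(t3) + L1 * math.cos(t2)),
         L1 * math.sin(t1) * math.sin(t2),
         -L2 * math.sin(t1) * math.cos(t3)],
        [0.0,
         L1 * math.cos(t2),
         L2 * math.sin(t3)],
        [-L1 * math.cos(t2) * math.sin(t1) - L2 * math.sin(t3) * math.sin(t1),
         -L1 * math.sin(t2) * math.cos(t1),
         L2 * math.cos(t3) * math.cos(t1)],
    ]


def joint_torques(theta, force):
    """tau = J^T F, i.e. tau_j = J_1j*Fx + J_2j*Fy + J_3j*Fz as in (39)-(41)."""
    J = jacobian(*theta)
    return [sum(J[i][j] * force[i] for i in range(3)) for j in range(3)]


print(joint_torques((0.3, 0.5, 0.2), (1.0, 0.0, -0.5)))
```

Each torque is a three-term sum of products, so the three joint torques can be evaluated independently of one another once the Jacobian entries are known.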
V. SIMULATED TACTILE INTERNET MODEL
Figures 1 and 4 detail the structure used for the hardware
design in FPGA, in which a given operator, OP, handles a
PHANToM Omni on the master side, MD, which is connected
to the HMD that, in this case, is dedicated FPGA
hardware. Data is transmitted through the network, the NW
subsystem, to the HSD, which is also dedicated FPGA hardware.
The HSD is also connected to a PHANToM Omni that
interacts with the environment, the ENV subsystem. Figure 4
also details the backwards direction from the ENV to the
OP.
The OP is modeled as an information source responsible
for generating a spatial trajectory through discrete signals
expressed in the $a(n)$ array. At each n-th instant $t_s$ the OP
sends three variables $x^{OP}(n)$, $y^{OP}(n)$ and $z^{OP}(n)$
representing the position of the MD tool (Figures 2 and 3) in
Cartesian space, and this is expressed by
\[ a(n) = \left[ x^{OP}(n), y^{OP}(n), z^{OP}(n) \right]. \tag{42} \]
This step simulates the spatial movement of the MD tool
by the operator, that is, at each instant of time, ts, a spatial
movement is performed and a new signal a(n) is generated
by the OP.
The PHANToM Omni has encoders at its three joints that
translate the spatial position into the three angles θ1, θ2 and θ3
(Figures 2 and 3). Thus, based on Figure 4, it can be said that
the MD converts the signal $a(n)$ into a signal expressed as
\[ b(n) = \left[ \theta^{MD}_1(n), \theta^{MD}_2(n), \theta^{MD}_3(n) \right] \tag{43} \]
and forwards it to the HMD at every n-th instant of time $t_s$.
Then, as can be seen in Figure 4, the b(n) signal propagates
to the HMD, which on receiving the signal transforms
the joint angles, b(n), into a spatial position by
calculating the FK according to (15), (16) and (17). All
equations are implemented in FPGA through a hardware
module called the FK-HMD. The equations are implemented
in parallel, which can significantly reduce the processing
time. The use of FK is motivated by a reduction in the
amount of information transmitted, i.e., an N-DoF robotic
manipulator generates N joint angles, which can be
converted into only three values associated with the spatial
position of the tool, x, y and z. On the other hand, this
strategy increases the number of calculations to be
performed by the MD, which is compensated by the parallel
implementation of the algorithm in FPGA. It is essential to
note that the use of custom hardware operating in parallel
allows the processing time not to be substantially affected by N.
Based on Section III, after the FK calculation by the FK-
HMD hardware module, a new discrete signal is created that
can be expressed by
$$\mathbf{c}(n) = \left[ x^{HMD}(n),\; y^{HMD}(n),\; z^{HMD}(n) \right] \tag{44}$$
where xHMD(n), yHMD(n) and zHMD(n) are the values
of the spatial coordinate array generated by the HMD to be
sent to HSD via the communication medium, NW. The FK-
HMD hardware module generates a new c(n) array every n-
th instant of time.
After the transmission through the forward channel, here
called FC, the signal received by the HSD can be expressed
as
$$\mathbf{v}(n) = \left[ x^{HSD}(n),\; y^{HSD}(n),\; z^{HSD}(n) \right]. \tag{45}$$
Based on (5) the spatial coordinate signal received by HSD
can be expressed as
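$$x^{HSD}(n) = x^{HMD}\left( n - d^{f}_{x}(n) \right) + r^{f}_{x}(n), \tag{46}$$
$$y^{HSD}(n) = y^{HMD}\left( n - d^{f}_{y}(n) \right) + r^{f}_{y}(n), \tag{47}$$
and
$$z^{HSD}(n) = z^{HMD}\left( n - d^{f}_{z}(n) \right) + r^{f}_{z}(n), \tag{48}$$
where $d^{f}_{x}(n)$, $d^{f}_{y}(n)$, $d^{f}_{z}(n)$, $r^{f}_{x}(n)$, $r^{f}_{y}(n)$ and $r^{f}_{z}(n)$ are the
delays and noises associated with the FC.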
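The forward-channel model of (46), (47) and (48) can be illustrated with a minimal Python sketch. It assumes, purely for illustration, a constant per-axis delay measured in samples and zero-mean Gaussian noise; the text fixes neither choice. The function name `forward_channel`, its parameters and the clamping of negative indices to the first sample are hypothetical.

```python
import random

def forward_channel(c, delays, noise_std, seed=0):
    """Apply v(n) = c(n - d) + r(n) per axis, following Eqs. (46)-(48).

    c         : list of (x, y, z) tuples sent by the HMD
    delays    : per-axis delay in samples (assumed constant here)
    noise_std : standard deviation of the additive noise (assumed Gaussian)
    """
    rng = random.Random(seed)
    v = []
    for n in range(len(c)):
        sample = []
        for axis in range(3):
            # Assumption: until the first delayed sample arrives,
            # the receiver sees the initial sample c(0).
            k = max(n - delays[axis], 0)
            sample.append(c[k][axis] + rng.gauss(0.0, noise_std))
        v.append(tuple(sample))
    return v

# Usage: a ramp on all three axes, delayed by 2, 0 and 1 samples, no noise.
c = [(0.001 * n, 0.002 * n, 0.003 * n) for n in range(10)]
v = forward_channel(c, delays=(2, 0, 1), noise_std=0.0)
```

With the noise switched off, each received axis is simply a shifted copy of the transmitted one, which makes the effect of the three independent delays easy to inspect before adding noise.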
As in this case the Slave PHANToM Omni, SD, copies the
movement of the master PHANToM Omni, MD, it is neces-
sary for the HSD to perform a feedback control system on the
three joints of the PHANToM Omni slave, here expressed as
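$$\boldsymbol{\theta}^{SD}(n) = \left[ \theta^{SD}_{1}(n),\; \theta^{SD}_{2}(n),\; \theta^{SD}_{3}(n) \right] \tag{49}$$
that is, $\theta^{SD}_{1}(n)$, $\theta^{SD}_{2}(n)$ and $\theta^{SD}_{3}(n)$ are control variables
associated with the SD. The control system illustrated in Figure 4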
as FCS shall minimize the error, eFCS(n), between θSD(n)
and the reference signal θHSD(n) characterized as
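$$\boldsymbol{\theta}^{HSD}(n) = \left[ \theta^{HSD}_{1}(n),\; \theta^{HSD}_{2}(n),\; \theta^{HSD}_{3}(n) \right] \tag{50}$$
where
$$\mathbf{e}^{FCS}(n) = \boldsymbol{\theta}^{HSD}(n) - \boldsymbol{\theta}^{SD}(n) \tag{51}$$
and
$$\begin{bmatrix} e^{FCS}_{1}(n) \\ e^{FCS}_{2}(n) \\ e^{FCS}_{3}(n) \end{bmatrix} = \begin{bmatrix} \theta^{HSD}_{1}(n) \\ \theta^{HSD}_{2}(n) \\ \theta^{HSD}_{3}(n) \end{bmatrix} - \begin{bmatrix} \theta^{SD}_{1}(n) \\ \theta^{SD}_{2}(n) \\ \theta^{SD}_{3}(n) \end{bmatrix}. \tag{52}$$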
The θSD(n) signal is obtained from the SD via sensors
(encoders) at the SD joints and the θHSD(n) signal is ob-
tained from the IK-HSD hardware module shown in Figure
4. This hardware module implements all inverse kinemat-
ics equations presented in Section IV-B, i.e. (18) through
(25). There are several techniques and approaches that can
be used in the FCS module ranging from more traditional
techniques such as a proportional–integral–derivative
controller [36] to more innovative artificial intelligence based
techniques [37], [38].
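As one concrete instance of the traditional option mentioned above, the sketch below applies a discrete PID law independently to each joint error of (52) and returns three torques. It is only an illustration: the text does not state which controller is used in the FCS, and the class name `JointPID`, the gains and the sampling time are hypothetical.

```python
class JointPID:
    """Independent discrete PID per joint (illustrative, not the paper's FCS)."""

    def __init__(self, kp, ki, kd, ts):
        self.kp, self.ki, self.kd, self.ts = kp, ki, kd, ts
        self.integral = [0.0, 0.0, 0.0]
        self.prev_error = [0.0, 0.0, 0.0]

    def step(self, theta_ref, theta_meas):
        """Map the joint error e(n) = theta_ref - theta_meas to torques."""
        torques = []
        for i in range(3):
            error = theta_ref[i] - theta_meas[i]
            self.integral[i] += error * self.ts
            derivative = (error - self.prev_error[i]) / self.ts
            self.prev_error[i] = error
            torques.append(self.kp * error
                           + self.ki * self.integral[i]
                           + self.kd * derivative)
        return torques

# Usage with hypothetical gains and a 1 ms sampling time.
controller = JointPID(kp=2.0, ki=0.5, kd=0.01, ts=1e-3)
u = controller.step(theta_ref=(0.5, 0.0, 0.0), theta_meas=(0.0, 0.0, 0.0))
```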
The CPD-HSD and JPD-HSD modules, illustrated in Figure 4,
represent the prediction and detection algorithms
in Cartesian space and joint space, respectively. These modules
are responsible for minimizing the latency and noise added
by the FC associated with the tactile internet system (Eqs.
(46), (47) and (48)). Depending on the prediction and detection
technique used, the HSD may use only one of the
modules, namely the CPD-HSD or JPD-HSD. There is still
no consensus about whether the Cartesian space or the joint space is
the best for minimizing latency and noise inserted by the
channel. There are several works in the literature that present
proposals using only one of the spaces and proposals that try
to use the information from both simultaneously.
FIGURE 4. Detailed discrete model of a tactile internet system.
Similarly to the FCS module, approaches ranging from the
more traditional techniques up to more innovative techniques
based on artificial intelligence have been used in the CPD-
HSD and JPD-HSD modules [39]–[43]. Thus, it can be said
that θHSD(n) is an estimate of the b(n) signal generated by
the MD.
At each n-th time, the FCS acts on the SD through the
u(n) signal, detailed in Figures 1 and 4, which in the case of
the PHANToM Omni can be expressed as
$$\mathbf{u}^{HSD}(n) = \left[ \tau^{HSD}_{1}(n),\; \tau^{HSD}_{2}(n),\; \tau^{HSD}_{3}(n) \right] \tag{53}$$
where $\tau^{HSD}_{i}(n)$ is the torque applied to the i-th joint.
The FCS will act as a tracking mechanism, making the SD
follow the path traveled by the MD. Finalizing the data
stream associated with the forward channel, it can be said
that the $\hat{a}(n)$ signal is an estimate of the spatial
position generated by the OP, i.e.
$$\hat{\mathbf{a}}(n) = \left[ \hat{x}^{OP}(n),\; \hat{y}^{OP}(n),\; \hat{z}^{OP}(n) \right]. \tag{54}$$
The interaction of the PHANToM Omni, SD, with ENV
can vary from free movement to physical contact. When
some kind of physical contact occurs, the SD detects the
touch and sends this information back to the HSD. As per
the model detailed in Figure 4 the ENV sends back to SD the
information associated with the contact force in the spatial
plane, expressed here as
$$\mathbf{o}(n) = \left[ F^{ENV}_{x}(n),\; F^{ENV}_{y}(n),\; F^{ENV}_{z}(n) \right]. \tag{55}$$
The value associated with the contact force information can
be measured directly through SD-coupled force sensors or
indirectly estimated through other types of sensors that may
be SD-coupled or inserted into the environment [44]. In the
case of the model presented in Figure 4, the SD sends to HSD
the spatial positions of the objects' surfaces, obtained through sensors spread
in the ENV. The signal expressed as
$$\mathbf{s}^{OBJ}(n) = \left[ x^{OBJ}(n),\; y^{OBJ}(n),\; z^{OBJ}(n) \right] \tag{56}$$
represents the spatial position of the object closest to the
SD tool. Thus, based on the information already described,
every n-th time ts the SD sends to the HSD a signal charac-
terized by the array g(n) expressed as
$$\mathbf{g}(n) = \left[ \boldsymbol{\theta}^{SD}(n),\; \mathbf{s}^{OBJ}(n) \right]. \tag{57}$$
In the HSD, when the signal g(n) is received, the Split
module separates the θSD(n) signal and sends it to the FCS
and the FK-HSD hardware module, while the signal sOBJ(n)
is sent to the FBF-HSD hardware module, as detailed in Figure
4. The FK-HSD hardware module performs the forward kine-
matics calculation similarly to FK-HMD and thus the current
spatial position of the SD tool in the environment, ENV, can
be obtained. Every n-th instant ts FK-HSD generates a signal
expressed as
$$\mathbf{l}(n) = \left[ x^{ENV}(n),\; y^{ENV}(n),\; z^{ENV}(n) \right] \tag{58}$$
where xENV (n), yENV (n) and zENV (n) are the spatial
position of the tool in the ENV module from θSD(n). The
FBF-HSD hardware module implements the calculations as-
sociated with the generation of the feedback force from the
contact between the tool and the object. Based on the work
presented in [44] the contact force, represented by the h(n)
signal, can be expressed as
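$$\mathbf{h}(n) = \left[ F^{HSD}_{x}(n),\; F^{HSD}_{y}(n),\; F^{HSD}_{z}(n) \right], \tag{59}$$
where
$$F^{HSD}_{x}(n) = h_{x}(n) \left( x^{OBJ}(n) - x^{ENV}(n) \right), \tag{60}$$
$$F^{HSD}_{y}(n) = h_{y}(n) \left( y^{OBJ}(n) - y^{ENV}(n) \right), \tag{61}$$
and
$$F^{HSD}_{z}(n) = h_{z}(n) \left( z^{OBJ}(n) - z^{ENV}(n) \right). \tag{62}$$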
In these equations, the constants hx(n), hy(n) and hz(n)
represent the elasticity coefficients associated with the object.
It is important to note that in this model the h(n) signal is a
synthesized version of the real force value here characterized
by the o(n) array.
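The element-wise structure of (60), (61) and (62) can be sketched in a few lines of Python. The function name `feedback_force` and all numerical values below (positions, elasticity coefficients and their units) are hypothetical and serve only to show how the synthesized force follows from the object and tool positions.

```python
def feedback_force(s_obj, l_env, h):
    """Return h(n) = [Fx, Fy, Fz] following Eqs. (60)-(62).

    s_obj : (x, y, z) of the closest object, Eq. (56)
    l_env : (x, y, z) of the SD tool, Eq. (58)
    h     : per-axis elasticity coefficients (hx, hy, hz)
    """
    return tuple(k * (obj - env) for k, obj, env in zip(h, s_obj, l_env))

force = feedback_force(
    s_obj=(0.10, 0.00, -0.05),   # hypothetical object position
    l_env=(0.08, 0.00, -0.05),   # hypothetical tool position from FK-HSD
    h=(500.0, 500.0, 500.0),     # hypothetical elasticity coefficients
)
```

Because the three components are independent, each one maps to a single subtraction followed by a single multiplication, which is what allows the three axes to be evaluated in parallel.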
After the feedback force calculation process, as illustrated
in Figure 4, the h(n) signal is transmitted to the HMD via the
backwards channel (BC) which, similarly to FC, adds latency
and noise. The signal received by the HMD can be expressed
as
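$$\mathbf{q}(n) = \left[ F^{HMD}_{x}(n),\; F^{HMD}_{y}(n),\; F^{HMD}_{z}(n) \right] \tag{63}$$
where
$$F^{HMD}_{x}(n) = F^{HSD}_{x}\left( n - d^{b}_{x}(n) \right) + r^{b}_{x}(n), \tag{64}$$
$$F^{HMD}_{y}(n) = F^{HSD}_{y}\left( n - d^{b}_{y}(n) \right) + r^{b}_{y}(n), \tag{65}$$
and
$$F^{HMD}_{z}(n) = F^{HSD}_{z}\left( n - d^{b}_{z}(n) \right) + r^{b}_{z}(n), \tag{66}$$
with $d^{b}_{x}(n)$, $d^{b}_{y}(n)$, $d^{b}_{z}(n)$, $r^{b}_{x}(n)$, $r^{b}_{y}(n)$ and $r^{b}_{z}(n)$ being the
latencies and the noises associated with the BC.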
Similarly to the HSD, the HMD minimizes the effect of
latency and noise through operations in the Cartesian and joint
spaces. For the HMD, the calculations associated with the Cartesian
space are performed by the CPD-HMD module and
those associated with the joint space by the JPD-HMD module.
In addition to the prediction and detection calculations, the
HMD must transform the force signals received through the
signal q(n) into torques to be applied to the MD joints, which
is accomplished by the KFF-HMD hardware module. KFF-HMD
implements equations (39), (40) and (41), presented
in Section IV-C, and generates the signal expressed as
$$\mathbf{p}(n) = \left[ \tau^{HMD}_{1}(n),\; \tau^{HMD}_{2}(n),\; \tau^{HMD}_{3}(n) \right] \tag{67}$$
where τHMDi (n) is the torque associated with the i-th joint
of the MD. Since the PHANToM Omni is a haptic device,
it already has a built-in control system, FCS, which uses as
reference signal the torques associated with the p(n) array.
After applying the torques to the MD joints via the p(n)
signal, the OP receives the feedback force signal, in other
words, it feels the object touched by the SD in the ENV. This
sensation is identified by the $\hat{o}(n)$ signal, expressed as
$$\hat{\mathbf{o}}(n) = \left[ \hat{F}^{ENV}_{x}(n),\; \hat{F}^{ENV}_{y}(n),\; \hat{F}^{ENV}_{z}(n) \right]. \tag{68}$$
As illustrated in Figure 4, the MD, HMD, NW, HSD, and
SD subsystems have the following runtimes: tMD, tHMD,
tNW, tHSD and tSD, respectively. The sum of these times,
taking into account the forward direction (between OP and
ENV) and the backwards direction (between ENV and OP),
represents the total system latency, which can be expressed as
$$t_{latency} = 2 \left( t_{MD} + t_{HMD} + t_{NW} + t_{HSD} + t_{SD} \right). \tag{69}$$
Some works presented in the literature review agree that the
ideal requirement is tlatency ≤ 1 ms. Other works point out
that the latency requirement can be
expressed as tlatency ≤ 10 ms, depending on the application
[6]–[9], [45]. Considering that 30% of the total latency time
tlatency is spent by MD, HMD, HSD, and SD, it can be
understood that
$$t_{MD} + t_{HMD} + t_{HSD} + t_{SD} \leq \frac{0.3\, t_{latency}}{2}. \tag{70}$$
Assuming an equal time division among MD, HMD, HSD,
and SD, it is possible to affirm that the time associated with the
hardware, thardware, whether of the master device, HMD, or of the slave
device, HSD, can be expressed as
$$t_{HMD} = t_{HSD} = t_{hardware} \leq \frac{0.3\, t_{latency}}{8}. \tag{71}$$
Taking the 1 ms constraint into consideration and substituting
this value in (71), it is possible to affirm that the hardware
time, thardware, must meet the thardware ≤ 37.5 µs constraint for
all cases (1 ms condition) or the thardware ≤ 375 µs constraint
for some specific cases (10 ms condition).
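The arithmetic behind the two bounds can be checked with a short Python sketch of (70) and (71). The function name `hardware_budget` and its parameter names are hypothetical; the 30% device share and the equal split among four subsystems are the assumptions stated in the text.

```python
def hardware_budget(t_latency, device_share=0.3, stages=4):
    """Per-module time budget in seconds, following Eqs. (70) and (71)."""
    # Eq. (70): device time available in one direction of the round trip.
    one_way_device_time = device_share * t_latency / 2
    # Eq. (71): equal split among MD, HMD, HSD and SD.
    return one_way_device_time / stages

for t_latency in (1e-3, 10e-3):
    budget_us = hardware_budget(t_latency) * 1e6
    print(f"t_latency = {t_latency * 1e3:.0f} ms -> t_hardware <= {budget_us:.1f} us")
```

Changing `device_share` or `stages` shows how sensitive the per-module budget is to the 70%/30% split and to the equal-division assumption.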
Recent studies from the literature show that the 1ms
restriction (thardware ≤ 37.5µs) is difficult to achieve using
hardware devices based on embedded systems such as micro-
processors and microcontrollers [46], [47]. The 10ms restric-
tion (thardware ≤ 375µs) is achieved in specific cases where
SD is a virtual environment and HSD is a high performance
processor computer [45]. Thus this work aims to minimize
the execution time in HMD, tHMD, and HSD, tHSD, using
FPGA reconfigurable computation. In other words, the target
is to achieve thardware ≤ 37.5µs.
This paper presents a hardware reference model for the
FK-HMD, KFF-HMD, IK-HSD, FK-HSD, and FBF-HSD
modules illustrated in Figure 4. The complete model that will
be presented in detail in the next section makes use of a parallel
implementation methodology in which high throughput
is prioritized, i.e. the execution times of the modules, tFK, tKFF,
tIK and tFBF, illustrated in Figure 4, are minimized.
This work does not propose dedicated hardware refer-
ence models for the CPD-HSD, JPD-HSD, CPD-HMD, JPD-
HMD and FCS modules as there are several techniques and
algorithms that can be applied to them. However, considering
the hardware time constraints, thardware, it is noted that it
is also important to use dedicated hardware structures with
reconfigurable computing for these modules. Studies in the
literature foresee the use of AI based techniques for these
modules; however, it is essential to note that AI techniques
and algorithms implemented on general purpose processor-
based hardware platforms can lead to higher processing times
[13]–[19].
VI. IMPLEMENTATION DESCRIPTION
The FK-HMD and KFF-HMD hardware modules associated
with the master device (HMD) and the IK-HSD, FK-HSD,
and FBF-HSD hardware modules associated with the slave
device (HSD) (Figure 4) were designed using a parallel
implementation in order to prioritize the processing speed.
The implementations were designed in FPGA using a hybrid
scheme with fixed point and floating point representation in
distinct parts of the proposed architecture. In the portions
that adopt the fixed point format, the variables follow a
notation expressed as [sV.N ] indicating that the variable is
formed by V bits of which N bits are intended for the
fractional part and the s symbol indicates that the variable
is signed. In this case, the number of bits intended for the
integer part is V −N − 1. For the representation of floating
point variables, the notation [F32] is adopted. Most of the
implemented circuits were designed using a 32-bit single
precision (IEEE754) floating point format representation.
The fixed point format was used only on the circuit that
implements the trigonometric function block (TFB) module,
as illustrated in Figure 5. TFB is the module responsible for
performing trigonometric operations through the hardware
implementation of CORDIC (COordinate Rotation DIgital
Computer) [48]. The implemented CORDIC circuit uses
data representation in fixed point format using the [s16.13]
representation.
FIGURE 5. Proposed circuit for calculating trigonometric functions - TFB.
As illustrated in Figure 5, the TFB module receives data
from external circuits in the 32-bit floating point standard.
A conversion to the fixed point numeric representation type
represented by the [s16.13] notation is performed through
the Float to Fixed-point (F2FP) module that has been imple-
mented in hardware. After the CORDIC hardware operations
are performed, the data in the fixed point format is trans-
formed back to the 32-bit floating point through the Fixed-
point to Float (FP2F) module which was also implemented
in hardware.
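The F2FP, CORDIC and FP2F chain can be emulated in software to get a feel for the [s16.13] format. The Python sketch below is only an illustration under several assumptions that the text does not state: rotation-mode CORDIC, 14 iterations, round-to-nearest with saturation in the float-to-fixed conversion, and input angles restricted to [-π/2, π/2]. All function and constant names are hypothetical.

```python
import math

FRAC_BITS = 13                  # N in [s16.13]
SCALE = 1 << FRAC_BITS          # 2**13 steps per unit
ITERATIONS = 14                 # assumed iteration count

def f2fp(value):
    """Float to signed 16-bit fixed point with 13 fractional bits (saturating)."""
    q = int(round(value * SCALE))
    return max(-32768, min(32767, q))

def fp2f(q):
    """Fixed point [s16.13] back to float."""
    return q / SCALE

# Elementary rotation angles atan(2**-i) and the CORDIC gain, both quantized.
ATAN_TABLE = [f2fp(math.atan(2.0 ** -i)) for i in range(ITERATIONS)]
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN /= math.sqrt(1.0 + 2.0 ** (-2 * i))
K = f2fp(GAIN)

def tfb_sin_cos(theta):
    """Emulate F2FP -> CORDIC (rotation mode) -> FP2F for sine and cosine."""
    z = f2fp(theta)             # residual angle, fixed point
    x, y = K, 0                 # start on the x axis, pre-scaled by the gain
    for i in range(ITERATIONS):
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return fp2f(y), fp2f(x)     # (sin, cos)

sin_val, cos_val = tfb_sin_cos(math.pi / 6)
```

Only shifts, additions and a small lookup table are used inside the loop, which is the property that makes CORDIC attractive for FPGA implementation; the price is a small quantization error relative to `math.sin` and `math.cos`.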
Several of the proposed methods to be presented use
the constants L1, L2, L3 and L4. They represent physical
characteristics of the PHANToM Omni device as illustrated
in Figure 2. These constants use the 32-bit floating point
numeric representation.
A. FORWARD KINEMATICS (FK-HMD AND FK-HSD)
As illustrated in Figure 4, both the hardware associated
with the master device (HMD) and the hardware associated
with the slave device (HSD) implement forward kinematics
through the FK-HMD and FK-HSD modules, respectively.
These modules have the same FPGA-implemented circuit,
differing only in the input and output signals. They are
designed to work with three input signals, one for each com-
ponent of the angular positioning of the device’s joints, and
three output signals, one for each component of the
positioning of the device’s tool in the Cartesian system. The input
signals are θ1[F32](n), θ2[F32](n) and θ3[F32](n) and the
output signals are x[F32](n), y[F32](n) and z[F32](n).
For FK-HMD, the input signals represent the $\theta^{MD}_{1}[F32](n)$,
$\theta^{MD}_{2}[F32](n)$ and $\theta^{MD}_{3}[F32](n)$ signals, and the output
signals represent the $x^{HMD}[F32](n)$, $y^{HMD}[F32](n)$ and
$z^{HMD}[F32](n)$ signals. In the case of the FK-HSD module,
the input signals represent the signals $\theta^{SD}_{1}[F32](n)$,
$\theta^{SD}_{2}[F32](n)$ and $\theta^{SD}_{3}[F32](n)$ and the output signals
represent the signals $x^{ENV}(n)$, $y^{ENV}(n)$ and $z^{ENV}(n)$. At
every n-th instant, all the computations required to
calculate the forward kinematics are executed in parallel.
Based on (15), the algorithm used for calculating
x[F32](n) was implemented in FPGA through the generic
circuit illustrated in Figure 6. The circuit was designed to
work with three input signals θ1[F32](n), θ2[F32](n) and
θ3[F32](n) and one output signal. These signals are for-
warded to TFB sub circuits where sine and cosine calcula-
tions are performed. For this process the constants L1 and
L2, three multipliers, one inverter and one adder are used.
FIGURE 6. Proposed forward kinematics circuit for obtaining the x[F32](n)
spatial coordinate (Eq. (15)) - FK-HMD and FK-HSD.
The calculation of y[F32](n) based on (16) was imple-
mented in FPGA through the generic circuit shown in Figure
7. The circuit was designed to work with two input signals
θ2[F32](n) and θ3[F32](n) and one output signal. These
signals are routed to TFB sub circuits to perform sine and
cosine calculations. In the process flow two multipliers, two
adders, one inverter and the constants L1, L2 and L3 are used.
FIGURE 7. Proposed forward kinematics circuit for obtaining the y[F32](n)
spatial coordinate (Eq. (16)) - FK-HMD and FK-HSD.
The generic circuit illustrated in Figure 8 was implemented
in FPGA to perform the calculation of z[F32](n) and it is
based on (17). The circuit is designed to work with three
input signals θ1[F32](n), θ2[F32](n) and θ3[F32](n) and
one output signal. These signals are routed to TFB sub
circuits in order to perform sine and cosine calculations. In
the process flow four multipliers, two adders, one inverter
and the constants L1, L2 and L4 are used.
FIGURE 8. Proposed forward kinematics circuit for obtaining the z[F32](n)
spatial coordinate (Eq. (17)) - FK-HMD and FK-HSD.
In the FK-HMD module, the $\theta^{MD}_{1}[F32](n)$, $\theta^{MD}_{2}[F32](n)$
and $\theta^{MD}_{3}[F32](n)$ input signals are received through the
b(n) array ((43) in Section V), then all calculations are
performed in parallel, resulting in the c(n) array ((44)
in Section V) with the $x^{HMD}[F32](n)$, $y^{HMD}[F32](n)$
and $z^{HMD}[F32](n)$ signals, as shown in Figure 4. For
the FK-HSD module, the $\theta^{SD}_{1}[F32](n)$, $\theta^{SD}_{2}[F32](n)$ and
$\theta^{SD}_{3}[F32](n)$ input signals enter the module via the $\theta^{SD}(n)$
array ((49) in Section V) and, after performing all parallel
computations, the resulting signals $x^{ENV}(n)$, $y^{ENV}(n)$ and
$z^{ENV}(n)$ are output from the module via the l(n) array ((58)
in Section V).
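A software counterpart of the three forward kinematics circuits is sketched below in Python. Equations (15), (16) and (17) are not reproduced in this part of the text, so the expressions are inferred from the block diagrams of Figures 6, 7 and 8 (which trigonometric blocks and constants feed each output) and should be read as an assumption. The link constants are likewise hypothetical: they were back-solved so that the sketch reproduces the tool positions reported for the test trajectory in Section VII, and are not quoted from the paper.

```python
import math

# Hypothetical link constants, back-solved from the Section VII waypoints.
L1, L2, L3, L4 = 0.132, 0.132, 0.025, 0.167

def forward_kinematics(theta1, theta2, theta3):
    """Joint angles -> tool position (x, y, z); inferred form of Eqs. (15)-(17)."""
    # Horizontal reach shared by the x and z outputs.
    reach = L1 * math.cos(theta2) + L2 * math.sin(theta3)
    x = -math.sin(theta1) * reach
    y = L3 - L2 * math.cos(theta3) + L1 * math.sin(theta2)
    z = -L4 + math.cos(theta1) * reach
    return x, y, z

# Usage: the rest pose and the final pose of the Section VII trajectory.
rest = forward_kinematics(0.0, 0.0, 0.0)
final = forward_kinematics(math.pi / 2, math.pi / 4, math.pi / 4)
```

The three outputs share no intermediate result other than `reach`, which is why the hardware version can evaluate them side by side rather than in sequence.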
B. INVERSE KINEMATICS (IK-HSD)
The hardware associated with the slave device (HSD)
implements the inverse kinematics through the IK-HSD
module, as shown in Figure 4. The IK-HSD FPGA-implemented
circuit is designed to work with three input
signals $x^{HSD}[F32](n)$, $y^{HSD}[F32](n)$ and $z^{HSD}[F32](n)$
and three output signals $\theta^{HSD}_{1}[F32](n)$, $\theta^{HSD}_{2}[F32](n)$
and $\theta^{HSD}_{3}[F32](n)$. However, to calculate $\theta^{HSD}_{2}[F32](n)$
(Eq. (24)) and $\theta^{HSD}_{3}[F32](n)$ (Eq. (25)) it is first necessary
to perform intermediate calculations to obtain the
values of R[F32](n), r[F32](n), β[F32](n), γ[F32](n) and
α[F32](n).
Based on (18), (24) and (25), algorithms for calculating
$\theta^{HSD}_{1}[F32](n)$, $\theta^{HSD}_{2}[F32](n)$ and $\theta^{HSD}_{3}[F32](n)$ were
implemented in FPGA through the generic circuits illustrated
in Figures 9, 10 and 11, respectively.
As already described, and according to the illustrations
shown in Figures 10 and 11, to perform the calculations
of $\theta^{HSD}_{2}[F32](n)$ and $\theta^{HSD}_{3}[F32](n)$ it is first necessary
to perform the intermediate calculations of γ[F32](n) (Eq.
(21)), β[F32](n) (Eq. (22)) and α[F32](n) (Eq. (23)).
However, these calculations depend on the calculation of
R[F32](n) and r[F32](n). Then, when the IK-HSD module
receives the input signals at every n-th instant, the circuit
shown in Figure 9 performs the calculation of $\theta^{HSD}_{1}[F32](n)$
in parallel with the generic circuits illustrated in Figures 12
and 13, which were implemented in FPGA to perform the
calculation of R[F32](n) and r[F32](n) based on (19) and
(20).
FIGURE 9. Proposed inverse kinematics circuit for obtaining the
$\theta^{HSD}_{1}[F32](n)$ angular position (Eq. (18)) - IK-HSD.
FIGURE 10. Proposed inverse kinematics circuit for obtaining the
$\theta^{HSD}_{2}[F32](n)$ angular position (Eq. (24)) - IK-HSD.
The circuit shown in Figure 12, used to obtain R[F32](n),
is designed to work with two input signals xHSD[F32](n)
and zHSD[F32](n) and one output signal. This design con-
tains two multipliers, two adders, the L4 constant and a sub-
circuit called Sqrt, which was implemented in hardware to
calculate the square root.
The r[F32](n) calculation is performed through the cir-
cuit shown in Figure 13. This circuit is designed to work
with three input signals xHSD[F32](n), yHSD[F32](n) and
zHSD[F32](n) and one output signal. The circuit consists of
three multipliers, four adders, one inverter, the constants L3
and L4, and, again, the Sqrt sub-circuit.
After the parallel processing of $\theta^{HSD}_{1}[F32](n)$, R[F32](n)
and r[F32](n), the circuits responsible for calculating
γ[F32](n), β[F32](n) and α[F32](n) are also executed in
parallel through the FPGA implementations of the generic
circuits illustrated in Figures 14, 15 and 16. The value of
γ[F32](n) is obtained through the circuit shown in Figure 14,
which is based on (21). The circuit is designed to work with
one input signal r[F32](n) and one output signal. It consists of
five multipliers, two adders, one divider, one TFB sub-circuit
to calculate the arccosine and the constants L1 and L2.
The circuit for obtaining β[F32](n) illustrated in Figure
15 is based on (22) and is designed to work with two
input signals yHSD[F32](n) and R[F32](n) and one output
signal. The circuit is composed of one adder, one inverter, a
TFB sub-circuit to perform the arctangent calculation and the
L3 constant.
The value of α[F32](n) is obtained from the circuit shown
in Figure 16 which is based on (23) and is designed to
work with an input signal r[F32](n) and one output signal.
The circuit is composed of five multipliers, two adders, one
inverter, one divider, one TFB sub-circuit to perform the
arccosine calculation and the constants L1 and L2.
FIGURE 11. Proposed inverse kinematics circuit for obtaining the
$\theta^{HSD}_{3}[F32](n)$ angular position (Eq. (25)) - IK-HSD.
FIGURE 12. Proposed circuit to perform the calculation of R[F32](n) (Eq.
(19)) - IK-HSD.
To complete the process, after performing the calculations
of β[F32](n), γ[F32](n) and α[F32](n), it is possible to obtain
the $\theta^{HSD}_{2}[F32](n)$ and $\theta^{HSD}_{3}[F32](n)$ values in parallel
through the circuits shown in Figures 10 and 11.
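The data dependencies described above (first θ1, R and r, then γ, β and α, and finally θ2 and θ3) can be followed in the Python sketch below. Equations (18) through (25) are not reproduced in this part of the text, so each expression is inferred from the corresponding block diagram (Figures 9 to 16) and from the usual triangle construction for this type of arm; it should be treated as an assumption. The link constants are hypothetical values back-solved from the Section VII waypoints, and the clamping of the arccosine argument is a software guard added here, not something the text describes.

```python
import math

# Hypothetical link constants, back-solved from the Section VII waypoints.
L1, L2, L3, L4 = 0.132, 0.132, 0.025, 0.167

def _acos_clamped(value):
    """acos with its argument clamped to [-1, 1] to absorb rounding error."""
    return math.acos(max(-1.0, min(1.0, value)))

def inverse_kinematics(x, y, z):
    """Tool position -> joint angles; inferred form of Eqs. (18)-(25)."""
    # Stage 1: independent of each other (Figures 9, 12 and 13).
    theta1 = -math.atan2(x, z + L4)
    R = math.sqrt(x * x + (z + L4) ** 2)
    r = math.sqrt(x * x + (y - L3) ** 2 + (z + L4) ** 2)
    # Stage 2: depend only on stage 1 (Figures 14, 15 and 16).
    gamma = _acos_clamped((L1 ** 2 + r ** 2 - L2 ** 2) / (2 * L1 * r))
    beta = math.atan2(y - L3, R)
    alpha = _acos_clamped((L1 ** 2 + L2 ** 2 - r ** 2) / (2 * L1 * L2))
    # Stage 3: final sums (Figures 10 and 11).
    theta2 = beta + gamma
    theta3 = theta2 + alpha - math.pi / 2
    return theta1, theta2, theta3

# Usage: the rest position reported in Section VII maps back to zero angles.
angles = inverse_kinematics(0.0, -0.107, -0.035)
```

The three stages mirror the order in which the hardware circuits can fire: everything within a stage is independent, so only the stage boundaries impose sequencing.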
C. KINESTHETIC FEEDBACK FORCE (KFF-HMD)
As illustrated in Figure 4, the hardware associated with the
master device (HMD) implements the kinesthetic feedback
force through the KFF-HMD module. Based on (26), the
KFF-HMD module was implemented in FPGA through the
generic circuit illustrated in Figure 17. This circuit is com-
posed of sub-circuits that correspond to parts of (26). The
sub-circuit called JM, described in (28), is responsible for
calculating the Jacobian matrix. The KFF sub-circuit combines
the output of the Jacobian matrix (JM) module
with the force array from (38).
The circuit shown in Figure 17 has the input signals
$\theta^{MD}_{1}[F32](n)$, $\theta^{MD}_{2}[F32](n)$ and $\theta^{MD}_{3}[F32](n)$ that
are received from the master device (MD) and also the
Fx[F32](n), Fy[F32](n) and Fz[F32](n) signals that are
received from the hardware associated with the slave device
(HSD). The three output signals are $\tau^{HMD}_{1}[F32](n)$,
$\tau^{HMD}_{2}[F32](n)$ and $\tau^{HMD}_{3}[F32](n)$.
The JM module, the sub-circuit responsible for performing
the Jacobian matrix calculation, produces
nine elements: J11[F32](n), J21[F32](n),
J31[F32](n), J12[F32](n), J22[F32](n), J32[F32](n),
J13[F32](n), J23[F32](n) and J33[F32](n). The calculation
of J21[F32](n), based on (30), does not have an associated circuit
since its value is 0, i.e. J21[F32](n) = 0. Based on (29),
the algorithm for calculating J11[F32](n) was implemented
in FPGA according to the generic circuit illustrated in Figure
18. The circuit was designed to work with three input signals
and one output signal. It uses the constants L1 and L2 and
has three TFB sub-circuits: two for performing the cosine
calculation and one for obtaining the sine value.
FIGURE 13. Proposed circuit to perform the calculation of r[F32](n) (Eq.
(20)) - IK-HSD.
FIGURE 14. Proposed circuit to perform the calculation of γ[F32](n) (Eq.
(21)) - IK-HSD.
The calculation of J31[F32](n), based on (31), was imple-
mented in FPGA according to the generic circuit illustrated
in Figure 19. The circuit was designed to work with three
input signals and one output signal. The circuit has three TFB
modules, two for sine calculation and one for cosine value
and uses the L1 and L2 constants.
The generic circuit illustrated in Figure 20 was imple-
mented in FPGA to perform the calculation of J12[F32](n)
and is based on (32). The circuit was designed to work with
two input signals and one output signal. The circuit has two
TFB sub circuits to perform sine calculation and uses the L1
constant.
Based on (33), the algorithm for calculating J22[F32](n)
was implemented in FPGA according to the generic circuit
illustrated in Figure 21. The circuit was designed to work
with one input signal and one output signal. The circuit has
a TFB sub-circuit to perform cosine calculation and uses the
constant L1.
The calculation of J32[F32](n) based on (34) was imple-
mented in FPGA according to the generic circuit illustrated in
Figure 22. The circuit was designed to work with two input
signals and one output signal. In addition to the use of the
constant L1, the circuit has two TFB sub circuits, one for
performing the cosine calculation and one for the sine.
The generic circuit illustrated in Figure 23 was imple-
mented in FPGA to perform the calculation of J13[F32](n)
and is based on (35). The circuit was designed to work
with two inputs and one output signal. In addition to using
the constant L2, the circuit has two TFB sub circuits, one for
performing cosine calculation and one for the sine.
FIGURE 15. Proposed circuit to perform the calculation of β[F32](n) (Eq.
(22)) - IK-HSD.
FIGURE 16. Proposed circuit to perform the calculation of α[F32](n) (Eq.
(23)) - IK-HSD.
Based on (36), the algorithm for calculating J23[F32](n)
was implemented in FPGA according to the generic circuit
illustrated in Figure 24. The circuit was designed to work
with one input signal and one output signal. The circuit
contains a TFB sub-circuit to perform the sine calculation
and uses the L2 constant.
The calculation of J33[F32](n), based on (37), was imple-
mented in FPGA according to the generic circuit illustrated
in Figure 25. The circuit was designed to work with two
input signals and one output signal. In addition to the use of
constant L2, the circuit has two TFB sub-circuits to perform
the cosine calculation.
All the circuits of the JM sub-circuit shown above are
evaluated in parallel at each n-th instant. The results are then
sent to the KFF module, which also performs the calculations
of $\tau^{HMD}_{1}[F32](n)$, $\tau^{HMD}_{2}[F32](n)$ and $\tau^{HMD}_{3}[F32](n)$ in
parallel. The KFF circuit shown in Figure 17 is designed to
work with twelve input signals and three output signals.
Based on (39), the algorithm for calculating τHMD1 [F32](n)
was implemented in FPGA according to the generic circuit
illustrated in Figure 26. The circuit was designed to work
with six inputs and one output.
The calculation of τHMD2 [F32](n) based on (40) was
implemented in FPGA according to the generic circuit illustrated
in Figure 27. The circuit was designed to work with six inputs
and one output.
The generic circuit illustrated in Figure 28 has been
implemented in FPGA to perform the calculation of
τHMD3 [F32](n) and it is based on (41). The circuit was
designed to work with six inputs and one output.
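The two-stage structure of KFF-HMD (nine Jacobian elements first, then three torque sums) can be sketched in Python as follows. Equations (29) through (41) are not reproduced in this part of the text; the Jacobian entries below are inferred from the trigonometric blocks and signed constants visible in Figures 18 to 25, and the torque sums follow the pairing of inputs shown in Figures 26, 27 and 28 (each torque combines one column of the Jacobian with the three force components). The link constants and all names are hypothetical.

```python
import math

# Hypothetical link constants, back-solved from the Section VII waypoints.
L1, L2 = 0.132, 0.132

def jacobian(t1, t2, t3):
    """Return J as rows [J11, J12, J13], [J21, J22, J23], [J31, J32, J33]."""
    reach = L1 * math.cos(t2) + L2 * math.sin(t3)
    return [
        [-math.cos(t1) * reach,
         L1 * math.sin(t1) * math.sin(t2),
         -L2 * math.sin(t1) * math.cos(t3)],
        [0.0,                                   # J21 = 0, no circuit needed
         L1 * math.cos(t2),
         L2 * math.sin(t3)],
        [-math.sin(t1) * reach,
         -L1 * math.cos(t1) * math.sin(t2),
         L2 * math.cos(t1) * math.cos(t3)],
    ]

def kinesthetic_torques(theta, force):
    """tau_j = J1j*Fx + J2j*Fy + J3j*Fz for j = 1, 2, 3."""
    J = jacobian(*theta)
    return tuple(
        sum(J[row][joint] * force[row] for row in range(3))
        for joint in range(3)
    )

# Usage: a hypothetical received force applied at the rest pose.
tau = kinesthetic_torques(theta=(0.0, 0.0, 0.0), force=(1.0, 2.0, 3.0))
```

Each torque is a three-term dot product, so once the nine Jacobian elements are available the three outputs need only three multipliers and two adders each, consistent with the six-input circuits described above.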
D. FEEDBACK FORCE (FBF-HSD)
As illustrated in Figure 4, the hardware associated with the
slave device (HSD) implements the feedback force via the
FBF-HSD module. The FPGA-implemented circuit of the
FBF-HSD module is designed to work with six input signals
and three output signals. Among the six input variables,
$x^{OBJ}[F32](n)$, $y^{OBJ}[F32](n)$ and $z^{OBJ}[F32](n)$ represent
the spatial position of the object closest to the SD
tool and the other three, $x^{ENV}[F32](n)$, $y^{ENV}[F32](n)$
and $z^{ENV}[F32](n)$, represent the spatial position of the SD
tool in the ENV module. The three outputs $F^{HSD}_{x}[F32](n)$,
$F^{HSD}_{y}[F32](n)$ and $F^{HSD}_{z}[F32](n)$ represent the touch of
the tool on the object. The variables hx, hy and hz represent
the elasticity coefficients associated with the object. All FBF-HSD
module calculations are performed in parallel.
FIGURE 17. Proposed circuit to calculate kinesthetic feedback force (Eq. (26))
- KFF-HMD.
FIGURE 18. Proposed circuit to calculate the Jacobian matrix element J11[F32](n)
(Eq. (29)) - JM.
FIGURE 19. Proposed circuit to calculate the Jacobian matrix element J31[F32](n)
(Eq. (31)) - JM.
FIGURE 20. Proposed circuit to calculate the Jacobian matrix element J12[F32](n)
(Eq. (32)) - JM.
FIGURE 21. Proposed circuit to calculate the Jacobian matrix element J22[F32](n)
(Eq. (33)) - JM.
FIGURE 22. Proposed circuit to calculate the Jacobian matrix element J32[F32](n)
(Eq. (34)) - JM.
FIGURE 23. Proposed circuit to calculate the Jacobian matrix element J13[F32](n)
(Eq. (35)) - JM.
FIGURE 24. Proposed circuit to calculate the Jacobian matrix element J23[F32](n)
(Eq. (36)) - JM.
FIGURE 25. Proposed circuit to calculate the Jacobian matrix element J33[F32](n)
(Eq. (37)) - JM.
FIGURE 26. Proposed circuit to calculate the joint torque $\tau^{HMD}_{1}[F32](n)$
(Eq. (39)) - KFF.
FIGURE 27. Proposed circuit to calculate the joint torque $\tau^{HMD}_{2}[F32](n)$
(Eq. (40)) - KFF.
Based on (60), the algorithm used for calculating
$F^{HSD}_{x}[F32](n)$ was implemented in FPGA according to
the generic circuit illustrated in Figure 29. The circuit was
designed to work with two input signals, $x^{OBJ}[F32](n)$ and
$x^{ENV}[F32](n)$, and one variable hx.
The calculation of $F^{HSD}_{y}[F32](n)$, based on (61), was
implemented in FPGA according to the generic circuit illustrated
in Figure 30. The circuit was designed to work with
two input signals, $y^{OBJ}[F32](n)$ and $y^{ENV}[F32](n)$, and
one variable hy.
The generic circuit shown in Figure 31 was implemented
in FPGA to perform the calculation of $F^{HSD}_{z}[F32](n)$ and
it is based on (62). The circuit was designed to work with
two input signals, $z^{OBJ}[F32](n)$ and $z^{ENV}[F32](n)$, and
one variable hz.
VII. RESULTS
The entire tactile internet model infrastructure presented in
Figure 4 was implemented with the purpose of validating
the FPGA hardware implementation. A spatial trajectory that
represents the data sent by the OP through the a(n) (Eq.
(42)) signal was created to validate the entire developed
environment.
The created trajectory performs a variation in all of the
three angles of the MD articulation. (Figure 3). For this, it
was first considered that the MD is in the initial angular
position expressed as θMD1 (0) = 0, θ
MD
2 (0) = 0 and
14
XJ13[F32](n)
FHMD[F32](n)x
X
J23[F32](n)
FHMD[F32](n)y
X
J33[F32](n)
FHMD[F32](n)z
+
+
τHMD[F32](n)3
FIGURE 28. Proposed circuit to calculate the torque of the τHMD3 [F32](n)
joint (Eq. (41)) - KFF.
X
xOBJ[F32](n)
xENV[F32](n) -
FHSD[F32](n)x
hx
FIGURE 29. Proposed circuit to calculate the feedback force FHSDx [F32](n)
(Eq. (60)) - FBF-HSD.
θMD3 (0) = 0, which corresponds to the spatial position
xOP (0) = 0, yOP (0) = −0.107 and zOP (0) = −0.035
of the tool as illustrated in Figure 32. Initially, the first joint
is moved to θMD1 (vn) = π/2, where v represents the number
of samples corresponding to 4 seconds, thus resulting in the
position xOP (vn) = −0.132, yOP (vn) = −0.107 and
zOP (vn) = −0.167. Then, the second joint is moved to
θMD2 (vn) = π/4, which results in the position xOP (vn) =
−0.093, yOP (vn) = −0.013 and zOP (vn) = −0.167 and,
finally, the third joint moves up to θMD3 (vn) = π/4, thus
resulting in the position xOP (vn) = −0.186, yOP (vn) = 0.025
and zOP (vn) = −0.167. The path created is within
the limits of the device workspace and takes a total time of
t1 = 12 seconds, of which 4 seconds are used to perform the
movement of each joint.
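The validation trajectory can be sketched as three consecutive joint movements. The Python snippet below is only an illustration. It assumes a linear ramp for each joint and 400 samples per joint movement (inferred from Q = 1200 samples over the 12-second path); the actual motion profile is not specified in the text, and the function name is hypothetical.

```python
import math

SAMPLES_PER_JOINT = 400  # assumed: 1200 samples / 3 joint movements (4 s each)
TARGETS = (math.pi / 2, math.pi / 4, math.pi / 4)  # final angles of joints 1, 2, 3


def build_trajectory(targets=TARGETS, steps=SAMPLES_PER_JOINT):
    """Return a list of (theta1, theta2, theta3) tuples, moving one joint at a time."""
    angles = [0.0, 0.0, 0.0]  # initial angular position of the MD
    trajectory = []
    for joint, target in enumerate(targets):
        for k in range(1, steps + 1):
            angles[joint] = target * k / steps  # assumed linear ramp
            trajectory.append(tuple(angles))
    return trajectory


trajectory = build_trajectory()
print(len(trajectory), trajectory[399], trajectory[-1])
```

Each joint keeps its final angle while the next one moves, which reproduces the sequential structure of the path described above.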
In an effort to validate the circuits from the implemented
modules in FPGA, equivalent software models were used to
compare the results of both implementations. The software
models use a 32-bit floating point format while the hard-
ware modules run a parallel implementation with a hybrid
representation which uses both a 32-bit floating point and a
fixed point representation in different parts of the proposed
architecture, as presented in Section VI. In all scenarios, the
signal sampling rate (or throughput) was Rs = 1/ts (samples
per second), where ts is the time between consecutive samples.
From the experimental results, the mean square error
(MSE) between the software model and the hardware implementation
proposed by this work was calculated as
\[
\mathrm{MSE} = \frac{1}{Q} \sum_{n=0}^{Q-1} \left( M^{SW}[F32](n) - M[F32](n) \right)^{2}, \tag{72}
\]
where Q represents the number of tested samples,
MSW [F32](n) corresponds to the variables of the software
FIGURE 30. Proposed circuit to calculate the feedback force FHSDy [F32](n)
(Eq. (61)) - FBF-HSD. [Circuit diagram: a subtractor combines yOBJ[F32](n) and yENV[F32](n), and a multiplier scales the result by hy to produce FHSDy[F32](n).]
FIGURE 31. Proposed circuit to calculate the feedback force FHSDz [F32](n)
(Eq. (62)) - FBF-HSD. [Circuit diagram: a subtractor combines zOBJ[F32](n) and zENV[F32](n), and a multiplier scales the result by hz to produce FHSDz[F32](n).]
model and M [F32](n) corresponds to the variables of the
model implemented in FPGA.
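Equation (72) translates directly into a few lines of code. The Python sketch below is an illustration only: the two short sequences are made-up stand-ins for the software output and the hardware output, and the function name is hypothetical.

```python
def mean_square_error(software, hardware):
    """MSE = (1/Q) * sum over n of (software[n] - hardware[n])**2, as in (72)."""
    if len(software) != len(hardware) or not software:
        raise ValueError("both sequences must be non-empty and of equal length")
    q = len(software)
    return sum((s - h) ** 2 for s, h in zip(software, hardware)) / q


# Made-up samples: the "hardware" values differ from the "software" ones by 1e-4.
software_out = [0.0, 0.1, 0.2, 0.3]
hardware_out = [0.0001, 0.1001, 0.2001, 0.3001]

mse = mean_square_error(software_out, hardware_out)
print(f"MSE = {mse:.3e}")
```

A constant deviation of 1e-4 per sample gives an MSE close to 1e-8, which helps to read the magnitudes reported for each module.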
The quantity of tested samples for the results presented here is Q = 1200, which corresponds to the number of samples of the generated trajectory. The variables that correspond to the hardware model M [F32](n) vary according to the module in which they were implemented. In the case of forward kinematics, as the FK-HMD and FK-HSD modules have the same implementation, the variables x[F32](n), y[F32](n) and z[F32](n) change according to the respective module. For the FK-HMD module, these variables correspond to xHMD[F32](n), yHMD[F32](n) and zHMD[F32](n), and for the FK-HSD module they correspond to xENV [F32](n), yENV [F32](n) and zENV [F32](n), as presented in Section VI. For inverse kinematics, the variables M [F32](n) of the IK-HSD module correspond to θHSD1 [F32](n), θHSD2 [F32](n) and θHSD3 [F32](n). For the kinesthetic feedback force, the variables M [F32](n) of the KFF-HMD module correspond to τHMD1 [F32](n), τHMD2 [F32](n) and τHMD3 [F32](n). For the feedback force, the variables M [F32](n) of the FBF-HSD module correspond to FHSDx [F32](n), FHSDy [F32](n) and FHSDz [F32](n). Finally, in the MSE equation, MSW [F32](n) corresponds to the same variables as computed by the software-implemented model.
Table 1 shows the mean square error between the software models and the hardware ones proposed in this paper. The MSE values are small for every module, showing that the forward kinematics (FK-HMD and FK-HSD), inverse kinematics (IK-HSD), kinesthetic feedback force (KFF-HMD) and feedback force (FBF-HSD) modules had an acceptable response, even when using a hybrid representation, compared to the software model that uses a floating point representation. It can be observed that for the variables of the FK-HMD and FK-HSD modules the error was on the order of 10^−8, for the IK-HSD module the error was on the order of 10^−6, for the variables of the KFF-HMD module the error was at most on the order of 10^−7 and for the FBF-HSD module the error was
FIGURE 32. Trajectory used to validate hardware modules. [3-D plot; axes: xOP(n) [m], yOP(n) [m] and zOP(n) [m].]
on the order of 10^−16. These values demonstrate that the
FPGA implementations behaved equivalently to
the software models.
TABLE 1. Mean squared error (MSE) results for floating-point implementation.
Module     Variable          MSE
FK-HMD     xHMD[F32](n)      2.333 × 10^−8
FK-HMD     yHMD[F32](n)      8.316 × 10^−9
FK-HMD     zHMD[F32](n)      1.656 × 10^−8
KFF-HMD    τHMD1[F32](n)     1.467 × 10^−7
KFF-HMD    τHMD2[F32](n)     5.207 × 10^−9
KFF-HMD    τHMD3[F32](n)     3.350 × 10^−7
FK-HSD     xENV[F32](n)      2.333 × 10^−8
FK-HSD     yENV[F32](n)      8.316 × 10^−9
FK-HSD     zENV[F32](n)      1.656 × 10^−8
IK-HSD     θHSD1[F32](n)     3.731 × 10^−6
IK-HSD     θHSD2[F32](n)     2.847 × 10^−6
IK-HSD     θHSD3[F32](n)     2.702 × 10^−6
FBF-HSD    FHSDx[F32](n)     2.437 × 10^−16
FBF-HSD    FHSDy[F32](n)     1.731 × 10^−16
FBF-HSD    FHSDz[F32](n)     3.360 × 10^−16
In a hardware implementation, it is important to analyze post-synthesis requirements such as hardware resource usage and execution time. In the case of FPGAs, resource usage is measured in terms of lookup tables (LUTs), registers and digital signal processor (DSP) units, to name a few. After validating the hardware-implemented models, synthesis results were obtained for a Xilinx Virtex 6 XC6VLX240T-1FF1156 FPGA. This device has 37,680 slices that group 301,440 flip-flops, 150,720 logical cells that can be used to implement logical functions or memories, and 768 DSP cells with multipliers and accumulators.
Table 2 presents the post-synthesis results related to hardware occupancy, sampling time, and throughput for the modules FK-HMD, KFF-HMD, FK-HSD, IK-HSD, and FBF-HSD. The first column shows the name of the module, and the next three columns (Registers, LUTs and Multipliers) give the amounts of FPGA resources used, each followed by the percentage of the total available. The Registers column gives the number of flip-flops, the LUTs column gives the number of LUTs, and the Multipliers column gives the number of DSP48 internal multipliers. The ts column gives the sampling time, in nanoseconds, obtained for each hardware module. Finally, the Rs column displays the throughput (Rs = 1/ts) in mega-samples per second for the hardware modules.
The synthesis results presented in Table 2 show that the FK-HMD and FK-HSD modules used the same resources. Each of these modules, individually, used 3,041 registers (1.01% of those available), 5.31% of the LUTs, and 1.43% of the embedded DSP48 multipliers. The IK-HSD module consumed 1.04% of the registers, 9.36% of the LUTs and 3.52% of the multipliers. The KFF-HMD module consumed 1.03%, 8.13% and 6.25% of the registers, LUTs and multipliers, respectively. Finally, the FBF-HSD module used 0.11% of the registers, 0.82% of the LUTs and 1.17% of the multipliers.
Based on the data presented in Table 2, the HMD modules (FK-HMD and KFF-HMD), which are associated with the MD device, consumed 6,154 registers (2.04%), 20,259 LUTs (13.44%) and 59 multipliers (7.68%). The HSD modules (FK-HSD, IK-HSD and FBF-HSD), which are associated with the SD device, consumed 6,513 registers (2.16%), 23,351 LUTs (15.49%) and 47 multipliers (6.12%).
The hardware resources consumed by the HMD and HSD hardware modules were low. Even if all modules are implemented on a single FPGA, the consumption remains low. The total hardware resources used by all modules (FK-HMD, KFF-HMD, FK-HSD, IK-HSD and FBF-HSD) were 12,667 registers (4.20%), 43,610 LUTs (28.93%) and 106 multipliers (13.80%). This leaves more than 70% of every resource type free, which allows other, separate implementations to share the same FPGA.
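The totals quoted above follow from summing the per-module figures in Table 2. A short Python check is sketched below; the dictionary layout and the names are illustrative choices, and the device capacities are the ones stated earlier in this section.

```python
# Per-module usage from Table 2: (registers, LUTs, DSP48 multipliers).
USAGE = {
    "FK-HMD": (3041, 8008, 11),
    "KFF-HMD": (3113, 12251, 48),
    "FK-HSD": (3041, 8008, 11),
    "IK-HSD": (3149, 14107, 27),
    "FBF-HSD": (323, 1236, 9),
}
# Virtex 6 XC6VLX240T capacities quoted in the text: flip-flops, LUTs, DSP cells.
CAPACITY = (301440, 150720, 768)


def total_usage(modules):
    """Sum each resource type over the given module names."""
    return tuple(sum(USAGE[m][i] for m in modules) for i in range(3))


def as_percent(used):
    """Express resource counts as a percentage of the device capacity."""
    return tuple(100.0 * u / c for u, c in zip(used, CAPACITY))


hmd = total_usage(["FK-HMD", "KFF-HMD"])
hsd = total_usage(["FK-HSD", "IK-HSD", "FBF-HSD"])
everything = total_usage(USAGE)

for label, used in (("HMD", hmd), ("HSD", hsd), ("All", everything)):
    print(label, used, [f"{p:.2f}%" for p in as_percent(used)])
```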
As per Table 2, the modules achieved high throughput: 21.27 MSps for the FK-HMD and FK-HSD modules, 4.58 MSps for the IK-HSD module, 14.28 MSps for the KFF-HMD module and 47.61 MSps for the FBF-HSD module. These results enable critical applications with strict time constraints, as is the case with tactile internet applications.
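The throughput values follow from Rs = 1/ts. The Python sketch below recomputes them from the ts column of Table 2. The quoted figures appear to be truncated rather than rounded to two decimals; that is an observation of this sketch, not a statement from the synthesis report, and the names used are illustrative.

```python
import math

# Sampling time ts in nanoseconds, from Table 2.
TS_NS = {"FK-HMD": 47, "KFF-HMD": 70, "FK-HSD": 47, "IK-HSD": 218, "FBF-HSD": 21}


def throughput_msps(ts_ns: float) -> float:
    """Rs = 1 / ts; with ts in nanoseconds, 1000 / ts gives mega-samples per second."""
    return 1000.0 / ts_ns


def truncate(value: float, decimals: int = 2) -> float:
    """Drop digits beyond the given number of decimals without rounding."""
    factor = 10 ** decimals
    return math.floor(value * factor) / factor


rates = {name: truncate(throughput_msps(ts)) for name, ts in TS_NS.items()}
print(rates)
```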
Table 3 shows the speedup obtained in
relation to the latency time constraints. The first column presents
the latency constraints of 1 ms and 10 ms that are presented
TABLE 2. Hardware occupancy, sampling rate and throughput results for floating-point format.
Module     Registers (Flip-Flops)   LUTs              Multipliers (DSP48)   ts (ns)   Rs (MSps)
FK-HMD     3,041 (1.01%)            8,008 (5.31%)     11 (1.43%)            47        21.27
KFF-HMD    3,113 (1.03%)            12,251 (8.13%)    48 (6.25%)            70        14.28
FK-HSD     3,041 (1.01%)            8,008 (5.31%)     11 (1.43%)            47        21.27
IK-HSD     3,149 (1.04%)            14,107 (9.36%)    27 (3.52%)            218       4.58
FBF-HSD    323 (0.11%)              1,236 (0.82%)     9 (1.17%)             21        47.61
in the literature. The second column shows the maximum
latency allowed for the hardware under each constraint.
The third column shows the latency of the
hardware implementation presented here, and the fourth column shows the resulting speedup.
TABLE 3. Hardware speedup related to the time limits for the 1 ms and 10 ms latency constraints.
Time Restriction   Latency Limit   thardware   Speedup
1 ms               37.5 µs         403 ns      93×
10 ms              375 µs          403 ns      930×
The 1 ms constraint corresponds to a maximum latency limit of 37.5 µs for acceptable hardware performance. For the 10 ms constraint, the maximum limit is 375 µs. According to (71), the value thardware presented in Table 3 corresponds to the sum of the latencies of the five implemented modules (Table 2). Two modules are associated with the MD device (FK-HMD and KFF-HMD) and three modules are associated with the SD device (FK-HSD, IK-HSD, and FBF-HSD).
Thus, the value of 403 ns presented in Table 3 is the sum of two contributions. The two modules related to the master component total 117 ns, of which 47 ns come from the FK-HMD module and 70 ns from the KFF-HMD module. The three modules related to the slave component total 286 ns, of which 47 ns come from the FK-HSD module, 218 ns from the IK-HSD module and 21 ns from the FBF-HSD module. For the 1 ms constraint, the implementation therefore presented a 93× speedup relative to the 37.5 µs limit, and for the 10 ms constraint, the speedup was 930× relative to the 375 µs limit.
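The arithmetic behind Table 3 can be reproduced in a few lines. The Python sketch below sums the per-module latencies and divides each latency limit by the result; the variable names are illustrative, and the limits of 37.5 µs and 375 µs are taken from Table 3 as given.

```python
# Per-module latency in nanoseconds (ts column of Table 2).
MASTER_NS = {"FK-HMD": 47, "KFF-HMD": 70}
SLAVE_NS = {"FK-HSD": 47, "IK-HSD": 218, "FBF-HSD": 21}

# Hardware latency limits from Table 3, converted to nanoseconds.
LIMIT_NS = {"1 ms": 37_500, "10 ms": 375_000}

t_master = sum(MASTER_NS.values())
t_slave = sum(SLAVE_NS.values())
t_hardware = t_master + t_slave

# Speedup is the latency limit divided by the measured hardware latency.
speedup = {name: limit / t_hardware for name, limit in LIMIT_NS.items()}

print(t_master, t_slave, t_hardware)
for name, value in speedup.items():
    print(f"{name}: {value:.1f}x")
```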
The sampling times obtained for the five modules implemented in this work were short enough for the hardware to meet the time constraints required in a tactile internet environment. The hardware latency was significantly below the required limits, as shown in Table 3. These values are well below the 30% share of the total latency attributed to the devices in the literature. Since the communication medium accounts for the remaining 70% of the application latency, that share can be increased, because the hardware devices use far less than their allotted share. In other words, the latency not spent on the hardware devices can be consumed in the network.
It is important to remember that in a more complex tactile internet environment, several other algorithms must be implemented in hardware, such as prediction algorithms, dynamic control and AI-based techniques. However, as the proposed implementations present low hardware resource consumption, these additional modules could also be implemented in the same shared hardware, since resources would still be available.
Table 4 compares the results obtained by the implementation proposed in this work with equivalent results found in state-of-the-art works. The first column indicates the references of related works. The next two columns show the FPGA platform used and the number of degrees of freedom of the device. The fourth column presents the type of numerical representation used in the implementation and, finally, the last four columns present the latency reported by each reference for the forward kinematics (FK), inverse kinematics (IK), kinesthetic force feedback (KFF) and feedback force (FBF) modules, respectively.
As described in Table 4, a hardware model for calculating the forward kinematics of a 5-DoF device is presented in [24]. The proposed hardware was implemented using a 32-bit floating-point representation. The total time to perform the calculations was 1240 ns. Compared with that model, the 32-bit floating-point forward kinematics (FK) implementation proposed in this work achieved a speedup of 26.38×.
The work presented in [25] shows the results of an implementation of the inverse kinematics module using a 32-bit floating-point representation. The kinematic model was designed to work with a 3-DoF device, and the time required for the calculation is 143000 ns. The inverse kinematics (IK) implementation presented in this work, which also uses a 32-bit floating-point representation, achieved a speedup of 655.96× over the model proposed in [25].
The kinematics models presented in [26], described in
Table 4, report data on forward and inverse
kinematics implementations for controlling a 6-DoF device
using a 32-bit fixed-point representation. The modules
were implemented using 21 bits for the fractional part and
11 bits for the integer part. For the forward kinematics (FK),
3000 ns are required to perform all calculations, and for
inverse kinematics (IK), 4500 ns are required. Based on the
TABLE 4. Comparative table with state of the art works.
Reference   Device       DoF   Data type     FK        IK          KFF     FBF
This work   Virtex 6     3     Floating P.   47 ns     218 ns      70 ns   21 ns
[24]        Virtex 2     5     Floating P.   1240 ns   -           -       -
[25]        Cyclone IV   3     Floating P.   -         143000 ns   -       -
[26]        Unknown      6     Fixed P.      3000 ns   4500 ns     -       -
[27]        Cyclone IV   10    Fixed P.      -         440 ns      -       -
[28]        Cyclone IV   5     Fixed P.      680 ns    940 ns      -       -
[29]        Artix 7      3     Fixed P.      2000 ns (FK and IK combined)  -       -
results of these implementations, the floating-point
implementation proposed in this work
had a speedup of 63.82× for forward kinematics
and 20.64× for inverse kinematics over the model presented in [26].
The research presented in [27] proposed a hardware implementation of inverse kinematics to control a 10-DoF device. The hardware was designed using a 32-bit fixed-point representation; however, the number of bits used in the fractional part was not specified. The proposed architecture requires 440 ns to compute the inverse kinematics. Compared with that model, the 32-bit floating-point inverse kinematics (IK) implementation proposed in this work achieved a speedup of 2.01×.
The authors in [28] present the results of a fixed-point implementation of forward and inverse kinematics to control a 5-DoF device, as described in Table 4. The proposed hardware implementation uses 32-bit (15 bits for the fractional part) and 16-bit (7 bits for the fractional part) representations in different parts of the modules. The time required to perform the calculations is 680 ns and 940 ns for forward and inverse kinematics, respectively. Compared with the model presented in [28], the floating-point implementation proposed in this work achieved a speedup of 14.46× for forward kinematics and 4.31× for inverse kinematics.
Unlike the previous works (Table 4), the authors of [29] present a single hardware module that calculates forward and inverse kinematics together. The proposed model uses a 32-bit fixed-point representation. The total time to perform the calculation is 2000 ns. This time covers the entire process; separate times for each module were not specified. For a like-for-like comparison, adding the ts of the FK module (47 ns) to that of the IK module (218 ns) from this work gives a total of 265 ns, according to Table 4. Hence, the hardware developed in this work achieved a 7.54× speedup over the model presented in [29].
It can be seen from Table 4 that none of the state-of-the-art works presented a hardware implementation of all four robotics algorithms presented here. It is also noted that just two works used a floating-point numerical representation. The floating-point implementation of robotics algorithms proposed in this work showed significant gains when compared to the works presented in the literature, as shown in Table 4. The different numbers of degrees of freedom (DoF) of the devices can influence the sampling time and throughput values. Another factor that can influence these values is the type of FPGA used to perform the synthesis. Because the implementation in this work was designed as a parallel architecture, an increase in the number of DoF does not necessarily cause a significant increase in sampling time.
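The speedups quoted in this comparison are ratios between the reference times in Table 4 and the times of this work. A minimal Python sketch of that calculation follows. The data layout is an illustrative choice, and for [29] the combined FK and IK time of 2000 ns is compared against 47 ns + 218 ns = 265 ns, as described above.

```python
THIS_WORK_NS = {"FK": 47, "IK": 218}

# Reference times in nanoseconds from Table 4 (None where no value is reported).
RELATED_NS = {
    "[24]": {"FK": 1240, "IK": None},
    "[25]": {"FK": None, "IK": 143000},
    "[26]": {"FK": 3000, "IK": 4500},
    "[27]": {"FK": None, "IK": 440},
    "[28]": {"FK": 680, "IK": 940},
}
COMBINED_29_NS = 2000  # [29] reports a single time for FK and IK together


def speedups(reference):
    """Ratio of each reported reference time to the corresponding time of this work."""
    return {k: t / THIS_WORK_NS[k] for k, t in reference.items() if t is not None}


table = {ref: speedups(times) for ref, times in RELATED_NS.items()}
table["[29]"] = {"FK+IK": COMBINED_29_NS / sum(THIS_WORK_NS.values())}

for ref, ratios in table.items():
    print(ref, {k: f"{v:.2f}x" for k, v in ratios.items()})
```

The printed ratios are rounded to two decimals, so the last digit can differ by one from the figures quoted in the text, which appear to be truncated.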
VIII. CONCLUSIONS
This paper presented a reconfigurable hardware reference model for modules that implement four robotics-associated algorithms. The FK-HMD and FK-HSD modules implement the forward kinematics, the IK-HSD module implements the inverse kinematics, the KFF-HMD module implements the kinesthetic feedback force and the FBF-HSD module implements the feedback force. The parallel FPGA implementation of these modules is intended to increase the tactile system's processing speed with the purpose of meeting the latency constraints required for tactile internet applications. The modules were designed using a fully parallel implementation based on a hybrid scheme that uses fixed-point and floating-point representations in distinct parts of the architecture. Compared to the state of the art, this work stands out by presenting the description and implementation of four different robotics algorithms in FPGA. The implementations presented in this work achieve higher module processing speed than equivalent implementations from the state of the art. All of the modules presented here were analyzed based on the synthesis results, which included the hardware occupation area, sampling time and throughput. Based on the synthesis results, it was observed that the implementations achieved processing times far below the latency limit of 1 ms. The hardware modules achieved a speedup of 93× relative to the 37.5 µs time constraint. This demonstrates that using reconfigurable embedded systems on devices such as FPGAs enables parallel implementation of algorithms, thus speeding up data processing and minimizing execution time. Such runtime gains can make processing feasible for critical applications that have tight time constraints or that must process a large amount of data in a short time frame.
ACKNOWLEDGMENT
This work was conducted during a scholarship supported by
the Doctoral Sandwich Program CAPES/PDSE at the Federal
University of Rio Grande do Norte. Financed by CAPES,
the Brazilian Federal Agency for Support and Evaluation
of Graduate Education within the Ministry of Education of
Brazil.
References
[1] M. Dohler, “The tactile internet iot, 5g and cloud on steroids,” in 5G Radio
Technology Seminar. Exploring Technical Challenges in the Emerging 5G
Ecosystem, March 2015, pp. 1–16.
[2] A. Aijaz, M. Dohler, A. H. Aghvami, V. Friderikos, and M. Frodigh,
“Realizing the tactile internet: Haptic communications over next
generation 5g cellular networks,” CoRR, vol. abs/1510.02826, 2015.
[Online]. Available: http://arxiv.org/abs/1510.02826
[3] D. V. D. Berg, R. Glans, D. D. Koning, F. A. Kuipers, J. Lugtenburg,
K. Polachan, P. T. Venkata, C. Singh, B. Turkovic, and B. V. Wijk,
“Challenges in haptic communications over the tactile internet,” IEEE
Access, vol. 5, pp. 23 502–23 518, 2017.
[4] M. Maier, M. Chowdhury, B. P. Rimal, and D. P. Van, “The tactile internet:
vision, recent progress, and open challenges,” IEEE Communications
Magazine, vol. 54, no. 5, pp. 138–145, 2016.
[5] M. Simsek, A. Aijaz, M. Dohler, J. Sachs, and G. Fettweis, “The 5g-
enabled tactile internet: Applications, requirements, and architecture,” in
2016 IEEE Wireless Communications and Networking Conference, April
2016, pp. 1–6.
[6] C. Li, C.-P. Li, K. Hosseini, S. B. Lee, J. Jiang, W. Chen, G. Horn,
T. Ji, J. E. Smee, and J. Li, “5g-based systems design for tactile internet,”
Proceedings of the IEEE, vol. 107, no. 2, pp. 307–324, 2018.
[7] K. Antonakoglou, X. Xu, E. Steinbach, T. Mahmoodi, and M. Dohler, “To-
ward haptic communications over the 5g tactile internet,” IEEE Commu-
nications Surveys Tutorials, vol. 20, no. 4, pp. 3034–3059, Fourthquarter
2018.
[8] A. Nasrallah, A. S. Thyagaturu, Z. Alharbi, C. Wang, X. Shao,
M. Reisslein, and H. ElBakoury, “Ultra-low latency (ull) networks: The
ieee tsn and ietf detnet standards and related 5g ull research,” IEEE
Communications Surveys & Tutorials, vol. 21, no. 1, pp. 88–145, 2018.
[9] M. Simsek, A. Aijaz, M. Dohler, J. Sachs, and G. Fettweis, “5g-enabled
tactile internet,” IEEE Journal on Selected Areas in Communications,
vol. 34, no. 3, pp. 460–473, 2016.
[10] D. Szabo, A. Gulyas, F. H. Fitzek, and D. E. Lucani,
“Towards the tactile internet: Decreasing communication latency with
network coding and software defined networking,” in European Wireless
2015; 21th European Wireless Conference; Proceedings of, May 2015, pp.
1–6.
[11] M. Dohler, T. Mahmoodi, M. A. Lema, M. Condoluci, F. Sardis, K. Anton-
akoglou, and H. Aghvami, “Internet of skills, where robotics meets ai, 5g
and the tactile internet,” in 2017 European Conference on Networks and
Communications (EuCNC), June 2017, pp. 1–5.
[12] Q. Yu, C. Wang, X. Ma, X. Li, and X. Zhou, “A deep learning prediction
process accelerator based fpga,” in 2015 15th IEEE/ACM International
Symposium on Cluster, Cloud and Grid Computing, May 2015, pp. 1159–
1162.
[13] A. C. de Souza and M. A. Fernandes, “Parallel fixed point implementation
of a radial basis function network in an fpga,” Sensors, vol. 14, no. 10, pp.
18 223–18 243, 2014.
[14] A. L. X. Da Costa, C. A. D. Silva, M. F. Torquato, and M. A. C. Fernandes,
“Parallel implementation of particle swarm optimization on fpga,” IEEE
Transactions on Circuits and Systems II: Express Briefs, pp. 1–1, 2019.
[15] M. G. F. Coutinho, M. F. Torquato, and M. A. C. Fernandes, “Deep neural
network hardware implementation based on stacked sparse autoencoder,”
IEEE Access, vol. 7, pp. 40 674–40 694, 2019.
[16] M. F. Torquato and M. A. C. Fernandes, “High-performance parallel
implementation of genetic algorithm on fpga,” Circuits, Systems, and
Signal Processing, Jan 2019.
[17] L. M. D. Da Silva, M. F. Torquato, and M. A. C. Fernandes, “Parallel
implementation of reinforcement learning q-learning technique for fpga,”
IEEE Access, vol. 7, pp. 2782–2798, 2019.
[18] F. F. Lopes, J. C. Ferreira, and M. A. C. Fernandes, “Parallel implementa-
tion on fpga of support vector machines using stochastic gradient descent,”
Electronics, vol. 8, no. 6, 2019.
[19] D. H. Noronha, M. F. Torquato, and M. A. Fernandes, “A parallel imple-
mentation of sequential minimal optimization on fpga,” Microprocessors
and Microsystems, vol. 69, pp. 138 – 151, 2019.
[20] A. N, A. S. M, K. Polachan, P. T. V, and C. Singh, “An end to end tactile
cyber physical system design,” in 2018 4th International Workshop on
Emerging Ideas and Trends in the Engineering of Cyber-Physical Systems
(EITEC), April 2018, pp. 9–16.
[21] M. K. O'Malley, K. S. Sevcik, and E. Kopp, “Improved haptic fidelity
via reduced sampling period with an fpga-based real-time hardware plat-
form,” Journal of Computing and Information Science in Engineering,
vol. 9, no. 1, p. 011002, 2009.
[22] H. Tanaka, K. Ohnishi, and H. Nishi, “Haptic communication system using
fpga and real-time network framework,” in Industrial Electronics, 2009.
IECON’09. 35th Annual Conference of IEEE. IEEE, 2009, pp. 2931–
2936.
[23] M. Franc and A. Hace, “A study on the fpga implementation of the bilat-
eral control algorithm towards haptic teleoperation,” Automatika–Journal
for Control, Measurement, Electronics, Computing and Communications,
vol. 54, no. 1, 2013.
[24] D. F. Sánchez, D. M. Muñoz, C. H. Llanos, and J. M. Motta, “A recon-
figurable system approach to the direct kinematics of a 5 dof robotic
manipulator,” International Journal of Reconfigurable Computing, vol.
2010, 2010.
[25] K. Gac, G. Karpiel, and M. Petko, “Fpga based hardware accelerator
for calculations of the parallel robot inverse kinematics,” in Proceedings
of 2012 IEEE 17th International Conference on Emerging Technologies
Factory Automation (ETFA 2012), Sept 2012, pp. 1–4.
[26] M. Wu, Y. Kung, Y. Huang, and T. Jung, “Fixed-point computation of
robot kinematics in fpga,” in 2014 International Conference on Advanced
Robotics and Intelligent Systems (ARIS), June 2014, pp. 35–40.
[27] C. C. Wong and C. C. Liu, “Fpga realisation of inverse kinematics for
biped robot based on cordic,” Electronics Letters, vol. 49, no. 5, pp. 332–
334, February 2013.
[28] H. Linh, B. Thi, and Y.-S. Kung, “Digital hardware realization of forward
and inverse kinematics for a five-axis articulated robot arm,” Mathematical
Problems in Engineering, vol. 2015, 2015.
[29] Z. Jiang, Y. Dai, J. Zhang, and S. He, “Kinematics calculation of mini-
mally invasive surgical robot based on fpga,” in 2017 IEEE International
Conference on Robotics and Biomimetics (ROBIO), Dec 2017, pp. 1726–
1730.
[30] Geomagic, Phantom Omni, Device Guide.
[31] G. Song, S. Guo, and Q. Wang, “A tele-operation system based on
haptic feedback,” in 2006 IEEE International Conference on Information
Acquisition, Aug 2006, pp. 1127–1131.
[32] T. Sansanayuth, I. Nilkhamhang, and K. Tungpimolrat, “Teleoperation
with inverse dynamics control for phantom omni haptic device,” in 2012
Proceedings of SICE Annual Conference (SICE), Aug 2012, pp. 2121–
2126.
[33] A. J. Silva, O. A. D. Ramirez, V. P. Vega, and J. P. O. Oliver, “Phan-
tom omni haptic device: Kinematic and manipulability,” in Electronics,
Robotics and Automotive Mechanics Conference, 2009. CERMA’09.
IEEE, 2009, pp. 193–198.
[34] M. C. Cavusoglu and D. Feygin, “Kinematics and dynamics of phantom
(tm) model 1.5 haptic interface,” 2001.
[35] J. San Martin and G. Triviño, “A study of the manipulability of the
phantom omni haptic interface.” in VRIPHYS, 2006, pp. 127–128.
[36] A. Kumar, P. J. Gaidhane, and V. Kumar, “A nonlinear fractional order
pid controller applied to redundant robot manipulator,” in 2017 6th Inter-
national Conference on Computer Applications In Electrical Engineering-
Recent Advances (CERA), Oct 2017, pp. 527–532.
[37] C. Yang, H. Ma, and M. Fu, Intelligent Control of Robot Manipulator.
Singapore: Springer Singapore, 2016, pp. 49–96.
[38] H. Rahimi and M. Nazemizadeh, “Dynamic analysis and intelligent con-
trol techniques for flexible manipulators: a review,” Advanced Robotics,
vol. 28, no. 2, pp. 63–76, 2014.
[39] S. H. Tang, C. K. Ang, M. K. A. B. M. Ariffin, and S. B. Mashohor,
“Predicting the motion of a robot manipulator with unknown trajectories
based on an artificial neural network,” International Journal of Advanced
Robotic Systems, vol. 11, no. 10, p. 176, 2014.
[40] Y. Chen and L. Li, “Predictable trajectory planning of industrial robots
with constraints,” Applied Sciences, vol. 8, no. 12, 2018. [Online].
Available: https://www.mdpi.com/2076-3417/8/12/2648
[41] Y. Xiang, “Simulation and analysis of three-dimensional space path
prediction for six-degree-of-freedom (sdof) manipulator,” 3D Research,
vol. 10, no. 2, p. 15, Apr 2019.
[42] B. Bócsi, D. Nguyen-Tuong, L. Csató, B. Schölkopf, and J. Peters,
“Learning inverse kinematics with structured prediction,” in 2011
IEEE/RSJ International Conference on Intelligent Robots and Systems,
Sep. 2011, pp. 698–703.
[43] S. Shen, A. Song, and T. Li, “Predictor-based motion tracking control for
cloud robotic systems with delayed measurements,” Electronics, vol. 8,
no. 4, 2019.
[44] C. Yang, Y. Xie, S. Liu, and D. Sun, “Force modeling, identification,
and feedback control of robot-assisted needle insertion: A survey of the
literature,” in Sensors, 2018.
[45] J. C. V. S. Junior, M. F. Torquato, D. H. Noronha, S. N. Silva, and M. A. C.
Fernandes, “Proposal of the tactile glove device,” Sensors, vol. 19, no. 22,
2019. [Online]. Available: https://www.mdpi.com/1424-8220/19/22/5029
[46] P. Weber, E. Rueckert, R. Calandra, J. Peters, and P. Beckerle, “A low-
cost sensor glove with vibrotactile feedback and multiple finger joint and
hand motion sensing for human-robot interaction,” in 2016 25th IEEE In-
ternational Symposium on Robot and Human Interactive Communication
(RO-MAN). IEEE, 2016, pp. 99–104.
[47] N. Arjun, S. Ashwin, K. Polachan, T. Prabhakar, and C. Singh, “An end
to end tactile cyber physical system design,” in 2018 4th International
Workshop on Emerging Ideas and Trends in the Engineering of Cyber-
Physical Systems (EITEC). IEEE, 2018, pp. 9–16.
[48] J. E. Volder, “The cordic trigonometric computing technique,” IRE Trans-
actions on Electronic Computers, no. 3, pp. 330–334, 1959.
