Abstract-This paper presents the realization of a surface plasmon resonance (SPR) sensor readout system using a tricolor red, green, and blue (RGB) light emitting diode (LED) light source. Time domain intensity modulation of each color channel is applied to interrogate three bands of interest in the SPR spectrum using a single photodiode detector. A classification approach with low computing resource requirements is obtained by combining k-nearest neighbor (kNN) classification with an adapted clustering using representatives (CURE) approach. An optimized number of representatives is chosen in the validation process to reduce the amount of data required for the kNN classification. This scheme was used to classify the concentrations of different glucose solutions. The sensor readout system hardware is based on a field programmable gate array (FPGA), and the glucose solution classification is developed and undertaken on a personal computer using the Python open source programming language.
I. INTRODUCTION
THE design of optical sensor systems that support instrument portability is increasingly desirable, with the main aim of adopting a "bringing the lab to the sample" approach. New optical sensor designs now have the advantage of reduced physical size and weight, which aids this move towards portability for both the sensor and the interrogating system (using physically small electronic circuits that operate at low power from a battery and provide high-speed embedded digital signal processing (DSP) capabilities). However, the majority of these designs still require a computer (PC or laptop) connection for interfacing the sensor to the sensor electronics and for sensor data processing [1]. In addition, the performance of any sensing device depends on the sensor response as well as on the supporting instrumentation and the analytical tools for sensor data processing [1], [2]. Optical sensors typically include a light source, photodetector, processing (control) unit, communications and power supply. Achieving a portable design requires careful consideration of factors such as the unit cost, sample characteristics, user technical skills and operational constraints (such as system cost, cost and availability of consumables, system maintenance and power requirements) [1].
At the core of any portable sensing device is the processing (control) unit that coordinates the operation of the subsystems so that they perform their operations both effectively and efficiently. In many laboratory prototype arrangements, and even some field prototypes, it is typical for a standard PC (desktop or laptop) to perform this control function using a suitable software application. Different studies have proposed a miniaturized sensor system in which the PC is replaced with a microcontroller (µC) for light source control, sensor signal acquisition, data analysis, calibration and communication to an external system [3]. These µC based systems implement their required operations in software as a set of sequential operations and can be chosen so that they operate with low power. However, the sequential nature of the µC software code, typically written in a version of the 'C' structured programming language, can limit the potential capabilities of the system, particularly when the system would work more effectively and efficiently if parallel (concurrent) operation were possible. For example, where multiple operations are implemented in parallel, the operations can be completed in a reduced time. Such parallel operations would include concurrent sensor data acquisition, processing, storage and communication. In some specific optical sensor systems, concurrent processing is suited to tasks that include fluorescent measurement [4], real time data classification [5], lock-in amplification (a technique often used for noise reduction) [6] and time division multiplexing [7]. This concurrency is readily available in hardware configured (programmed) devices such as the field programmable gate array (FPGA) [8]. The FPGA can perform many required DSP operations internally (i.e., within the device) using its array of programmable logic and embedded memory (i.e., RAM, random access memory).
This capability can eliminate the need for external devices, as would be typical in a microprocessor (µP) system designed to operate with a set of external integrated circuits (ICs) such as RAM and communications ICs [9]. Huang et al. presented a sensor system that demonstrated a high level of integration using the FPGA. Embedded operations within the FPGA could therefore facilitate real-time, remote and in-situ monitoring. For a field study (i.e., an experiment carried out outside of the laboratory), the FPGA as a reconfigurable device is also useful in resource-limited environments where different complicated analyses must be performed on the digitized sensor samples [10]. Resources would include physical size, power supply requirements and timing of operations. Shi et al. utilized the FPGA's reconfigurable capability to perform analysis requiring multiple computations. In addition to being internally reconfigurable, the FPGA can also accommodate multiple I/O (input/output) interface protocols, which is beneficial when connecting the FPGA to peripheral devices in an embedded sensor system [11]. For example, this allows for ease of communication between a single FPGA and multiple sensor modules. For any sensor readout system design with an attached sensor, a calibration process is also required to correlate the measured signal with the change in the measurand. Depending on the nature of the measured signal and the statistical samples, different approaches to data analysis can be used, such as classification [2], regression [12] and clustering [13], or other machine learning techniques. These approaches carry different constraints that determine the implementation platform to which they are best suited. A combination of Clustering Using REpresentatives (CURE) clustering and k-nearest neighbor (kNN) classification (both simple to implement and robust) has been used in this study [14], [15].
The CURE approach can effectively minimize the computational cost by reducing the size of the data sets. A data set is a group of sensor data readings taken within a particular sample period. An advantage of CURE is that it is also robust against the presence of outlier data points and does not assume a cluster shape [15]. kNN is effective as a classification approach, but it is typically not feasible for resource-limited computing environments because of its large data set storage requirements [16]. Such storage needs may result in large memory requirements within the embedded sensor electronics. Although a kNN implementation can suffer from these memory requirements, it is considered here because the data sets are first reduced using CURE, which makes kNN feasible for the FPGA-PC based approach described in this work.
In this paper, an optical sensor system based on the BU-SPR sensor configuration [17] and utilizing the Xilinx Artix-7 FPGA [18] is presented. The proposed system aims to be portable, modular and simple to use, with minimal maintenance requirements. The FPGA is used as the main processing (control) unit. It modulates the RGB LED light source and coordinates the photodetector's timing in order to achieve the characteristics of a broad band visible light source. A ZigBee [19] wireless interface between the sensor electronics and a PC allows for system calibration and for assessment of the feasibility of the classification approach combination in software using Python [20]. The algorithm is considered initially as a purely software implementation, with the aim of later incorporating it into the FPGA as an embedded hardware module. Hence, when using the FPGA in this type of sensor system, the designer can initially decide which functions to implement in software (i.e., using a software application running on the PC) and subsequently which functions to implement in hardware (i.e., within the FPGA). In addition, with the ability to embed one or more processor cores within the FPGA, the FPGA design itself can be either hardware only or a hardware-software co-design. The digital design within the FPGA was developed using VHDL (VHSIC (Very High Speed Integrated Circuit) HDL (Hardware Description Language)) [21]. The system was also designed to be modular in that functions implemented within the FPGA can be applied to other optical sensors.
The paper is structured as follows. Section II discusses the sensor and readout principle. Section III provides a system overview with a focus on the electronic components and the FPGA based design. Section IV presents the classification operations applied in the system. Section V provides conclusions and identifies future work.
II. SENSOR READOUT PRINCIPLE

A. Surface Plasmon Resonance Sensor
Surface plasmon resonance (SPR) is an optical phenomenon in which the momentum of an incident photon matches that of the surface charge oscillations (surface plasmons) at the interface between a dielectric substrate and a surface coated metal (typically gold). This condition is sensitive to changes in the surrounding environment, the excitation wavelength and the material properties of the sensing chip. Hence, a change in the refractive index of the dielectric medium produces a change in the propagation constant of the surface plasmon. This modifies the SPR condition and results in a change of the attenuation spectrum [22]. These factors make SPR the basis for many label free detection applications, including gas detection and bio-sensing. In this study, an SPR sensor based on the BU-SPR configuration [17] has been used. The principle of operation is shown in Fig. 1.
The sensor enclosure was built using a 3D printing method to ensure accurate mechanical alignment between the input fiber, the SPR chip and the output fiber. Optical interfacing between the sensor and the sensor electronics is via a low-cost, 1 mm diameter, plastic optical fiber (POF). The SPR chip comprises a glass slide coated with a 50 nm gold film and is attached to the housing using hot glue. A linearly polarizing film is placed immediately after the input fiber.
In typical SPR system operation, wavelength or angular scans are performed to locate the shift of the dip corresponding to a change of the measurand [23]. Other studies have utilized intensity modulation in one color directly [24] or through imaging schemes [25]. In the scheme of this investigation, the intensity modulation of three color bands (red, green and blue) is recorded using a single photodiode in a near simultaneous manner. This places the scheme somewhere between intensity modulation and wavelength scanning, while the need for potentially expensive spectrometry is eliminated.
In Fig. 1, the input and output fibers are fixed at an angle of 20° (which results in an angle of 13.18° with the normal to the gold surface inside the SPR chip). Using this configuration, the SPR response was calculated using the T-Matrix method for a TM polarized wave [26]. The shift of the dip in the reflectance when the environment's refractive index is increased from 1.33 to 1.36 is depicted in the inset of Fig. 2. The corresponding normalized intensity levels, calculated by averaging the reflectance over the three color ranges, are depicted in Fig. 2. Blue, green and red correspond to the spectral ranges 448 nm-494 nm, 495 nm-570 nm and 620 nm-750 nm respectively. These results indicate that the change in intensity is minimal for the blue band and maximal for the red band. Hence, blue can be used as a reference to compensate for intensity fluctuations. POF is used in this setup as it has a relatively large core diameter (1 mm) and a wide angle of acceptance (NA: 0.5). This eliminates the need for potentially expensive optical couplers and simplifies the optical setup. The optical fiber length for both the sensor's input and output is approximately 50 cm.
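The 13.18° internal angle quoted above is consistent with Snell's law if the chip glass is assumed to have a refractive index of about 1.5 and the light is assumed to enter the glass from air at 20° to the surface normal. Neither assumption is stated in the text, so the following is only a plausibility check; the function name and the index value are illustrative.

```python
import math


def internal_angle_deg(external_angle_deg, n_inside, n_outside=1.0):
    """Refraction angle from Snell's law: n_outside*sin(a_out) = n_inside*sin(a_in)."""
    sine_inside = n_outside * math.sin(math.radians(external_angle_deg)) / n_inside
    return math.degrees(math.asin(sine_inside))


# Assumption: glass slide index of 1.5 (not given in the text).
angle_in_glass = internal_angle_deg(20.0, n_inside=1.5)
print(f"Angle inside the glass: {angle_in_glass:.2f} deg")
```

Under these assumptions the computed angle agrees with the quoted value to two decimal places.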
B. Readout System
The SPR sensor readout system is shown schematically in Fig. 3. In the setup, an RGB LED is used as the light source. Each LED color is driven with a maximum current of 20 mA, and a photodiode is used to read out the output signal from the SPR chip. A series of red, green, and blue light pulses is generated from the LED by turning ON one color for a specific period whilst keeping the other two OFF. Because the photodetector output is synchronized with each color pulse, a different attenuation is expected for each color, according to the predicted SPR response depicted in Fig. 2. This approach is able to accommodate more sensing mechanisms than a single spectral band approach, especially in applications that have multiple spectral bands of interest [27].
The RGB color light pulses are generated by the FPGA using pulse width modulation (PWM). The light stimulus generation timing is synchronized with the embedded analog-to-digital converter (ADC) timing, where a voltage representing the photodiode current level is sampled using the in-built 12-bit Xilinx ADC (XADC) within the Artix-7 FPGA. The color of each sample is identified by header information attached to each ADC sample. Fig. 4 shows an example timing diagram for the color pulses and the corresponding voltage output from the photodetector circuit. The voltage output corresponds to a different attenuation level for each color.
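The header tagging described above can be illustrated with a short Python sketch. The actual header format is not given in the text, so the 16-bit word layout, the 2-bit color codes and all function names below are hypothetical; the sketch only shows how a header lets a single sample stream be separated back into color channels.

```python
# Hypothetical layout: bits 13..12 hold the color code, bits 11..0 the ADC code.
COLOR_CODES = {"R": 0b01, "G": 0b10, "B": 0b11}
CODE_TO_COLOR = {code: color for color, code in COLOR_CODES.items()}


def pack_sample(color, adc_code):
    """Attach the color header to one 12-bit ADC sample."""
    if not 0 <= adc_code < 4096:
        raise ValueError("ADC code must fit in 12 bits")
    return (COLOR_CODES[color] << 12) | adc_code


def unpack_sample(word):
    """Recover (color, adc_code) from a tagged word."""
    return CODE_TO_COLOR[(word >> 12) & 0b11], word & 0x0FFF


def demultiplex(words):
    """Split a time-multiplexed stream of tagged words into color channels."""
    channels = {"R": [], "G": [], "B": []}
    for word in words:
        color, adc_code = unpack_sample(word)
        channels[color].append(adc_code)
    return channels


stream = [pack_sample("R", 2400), pack_sample("G", 2100), pack_sample("B", 1800)]
print(demultiplex(stream))
```

Carrying the color identity with each sample means the receiver does not have to rely on sample order, which matters if samples are dropped on the communication link.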
III. FPGA BASED ELECTRONIC SYSTEM DESIGN AND SYSTEM OPERATION
A. System Overview
The system is depicted in Fig. 5 and is composed of three main subsystems: (i) the light source and photodetector, (ii) the FPGA based processing (control) unit, and (iii) the sensor. The thin lines connected to the system's electronics represent the optical fiber. The sensing element can be either intrinsic or extrinsic [28]. Electrical signal communications between components are shown in the schematic diagram as single direction arrows. A thin single direction arrow represents a signal within a subsystem and a thicker arrow represents the interface between subsystems. A bi-directional arrow represents communication between the sensor system and the personal computer. Data transmission between the readout system and the PC is implemented using the universal asynchronous receiver-transmitter (UART) protocol through either a wired (USB) or a wireless (ZigBee) connection. A terminal program was created using Python for control of the sensor system, sensor data visualization and results storage. The graphical user interface (GUI) shows the live data from the sensor system and allows the user to initialize the system, set the LED brightness, and label data for storage.
B. Light Source and Photodetector Design
The light source driver comprises three current control circuits that drive the individual color channels of the RGB LED. A voltage-to-current (VI) converter converts the FPGA output voltage pulses (+3.3 V and 0 V) into a corresponding LED current for each color, where +3.3 V corresponds to 20 mA and 0 V corresponds to zero current, as shown in Fig. 6(a). This is used as the light source to generate the sequence of red, green, and blue color pulses. The feedback voltage from the current set resistor, $R_E$, ensures that the voltage remains constant and hence that the LED current is accurately controlled. The LED output light intensity is proportional to the current and can be accurately controlled (5% tolerance from the FPGA's I/O) in the range 0 to 20 mA by applying an FPGA generated PWM signal to the LED driver circuit. A hardware module embedded within the FPGA provides this function for each color. The three colors are independently controlled by three identical parallel PWM generator circuits within the FPGA, using the concurrent operation capabilities of the FPGA. An arbitrary light waveform could also be generated by using a digital waveform stored within the FPGA combined with a digital-to-analog converter (DAC) and current control circuit [4]. The digital waveforms stored in the FPGA's RAM are generated beforehand and addressed during operation [29]. An arbitrary light waveform may be useful in applications such as lock-in signal detection [30], fluorescent measurement [27] and laser diode wavelength tuning. In this investigation, the three PWM output configuration was used solely for switching and intensity control purposes. The photodetector is operated in photovoltaic mode and the acquisition of its output signal is performed in synchronization with the LED control signal. The in-built 12-bit ADC within the FPGA acquires the voltage reading.
The header information as described above is added to each sample during data storage in the FPGA in order to identify the correct color channel and correctly read its intensity value.
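The PWM intensity control described in this subsection can be modelled behaviourally in Python. This is a sketch only: the PWM resolution used in the FPGA is not stated in the text, so the 8-bit counter, the counter-compare structure and the function names are assumptions, and the model simply treats the average LED current as the duty cycle multiplied by the 20 mA full-scale current.

```python
def pwm_period(duty_code, resolution_bits=8):
    """One PWM period from a counter-compare generator (1 = LED on, 0 = LED off)."""
    period = 1 << resolution_bits
    if not 0 <= duty_code <= period:
        raise ValueError("duty code out of range")
    return [1 if count < duty_code else 0 for count in range(period)]


def mean_led_current_ma(duty_code, resolution_bits=8, full_scale_ma=20.0):
    """Average LED current over one period, assuming 20 mA while the output is high."""
    wave = pwm_period(duty_code, resolution_bits)
    return full_scale_ma * sum(wave) / len(wave)


# Assumed 8-bit duty codes for three brightness settings.
for code in (64, 128, 256):
    print(code, mean_led_current_ma(code), "mA")
```

In hardware, three such generators would run concurrently, one per color, with the sequencing logic enabling only one output at a time.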
During the PCB design, electronic circuit shielding and voltage regulator circuits were included to reduce external noise effects and to ensure signal accuracy. The op-amp used for the transimpedance amplifier was required to have low offset and low bias errors so that the op-amp circuit did not mask the low-level optical signal received by the photodiode. Fig. 7 depicts the response of the photodetector circuit as a function of time, where each data point shown is an average of 28 raw data points collected over a time period of 20 ms. The initial transient signals (the first 200 data points) are identified and removed prior to data classification.
Each color shows a similar response: the signal at the transimpedance amplifier output initially falls slightly before settling to a constant value. Red shows the largest change whilst blue shows the smallest. However, it should be noted that the change in signal is small compared with the absolute value. The intensity on the vertical axis is the output code of the 12-bit XADC, whose reference voltage is set to +1.25 V. Hence, for this set-up, 1 LSB (least significant bit) is $1.25 / 2^{12}$ V = 305 µV, which represents a change of 1007 µV in the average amplifier output voltage, because the amplifier output voltage is first applied to a potential divider circuit that scales an applied voltage in the range 0 V to +3.3 V down to the range 0 V to +1.0 V. In the figure, red changes by 14 LSBs, which represents a change of 14.098 mV in the average amplifier output voltage, with the absolute voltage falling from 1.97372 V to 1.95962 V. In this arrangement, the change was considered negligible, as it did not affect the clustering of the results or the resulting classification. The scaling of the vertical axis in the graph therefore gives the appearance that the effect is greater than it actually is. However, the cause of this effect warrants further investigation, and the effect should be removed if possible.
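The LSB arithmetic above can be reproduced with a few lines of Python. The constants are taken from the text; the variable and function names are illustrative.

```python
ADC_BITS = 12
V_REF = 1.25                 # XADC reference voltage in volts (from the text)
DIVIDER_RATIO = 1.0 / 3.3    # potential divider: 0..3.3 V scaled to 0..1.0 V

lsb_at_adc_v = V_REF / 2**ADC_BITS              # one code step at the ADC input
lsb_at_amplifier_v = lsb_at_adc_v / DIVIDER_RATIO  # same step referred to the amplifier output


def codes_to_amplifier_volts(n_codes):
    """Convert a change in ADC codes to a change in amplifier output voltage."""
    return n_codes * lsb_at_amplifier_v


print(f"1 LSB at ADC input:        {lsb_at_adc_v * 1e6:.0f} uV")
print(f"1 LSB at amplifier output: {lsb_at_amplifier_v * 1e6:.0f} uV")
print(f"14 LSB at amplifier output: {codes_to_amplifier_volts(14) * 1e3:.2f} mV")
```

The 14 LSB result agrees with the quoted 14.098 mV to within rounding, since the text multiplies the already rounded 1007 µV step by 14.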
The remaining data points (from 200 to 300) are averaged into a single point that serves as one measurement point for classification. After each measurement point is sampled, data collection is paused for 1 minute before the next data set is collected.
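The transient removal and averaging step can be sketched in Python as follows. The function name and the list-based data layout are assumptions; the 200-point transient and 300-point record length are taken from the text.

```python
from statistics import fmean


def measurement_point(samples, n_transient=200, n_total=300):
    """Discard the initial transient and average the settled samples into one value."""
    if len(samples) < n_total:
        raise ValueError("record is shorter than the expected number of data points")
    settled = samples[n_transient:n_total]
    return fmean(settled)


# Illustrative record only: a decaying transient followed by a constant level.
record = [2420 - i // 10 for i in range(200)] + [2400] * 100
print(measurement_point(record))
```

The same function would be applied to each color channel separately to obtain one red, one green and one blue value per measurement.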
IV. CLASSIFICATION OPERATION
A combination of the clustering using representatives (CURE) approach and the k-nearest neighbor (kNN) approach was used in this study for data classification. This combination was implemented as a Python script running on a PC for feasibility assessment. CURE is similar to the centroid approach, but multiple representative points are used to represent a single cluster that contains $n_t$ training vectors; it was adapted as a supervised method for this work. The $J$ representatives were identified using the farthest neighbor algorithm, instead of the random sampling of the original algorithm, in order to preserve information about the geometry of the clusters, and the representatives were then shrunk towards the centroid by a fraction. The farthest neighbor algorithm ensured that the representatives were well dispersed across the cluster, and shrinking by a fraction reduced the influence of outliers, since the farther a data point was from the centroid, the larger its shrink distance. This makes the CURE approach more robust, as it is able to fit most sample data cluster shapes. It is a compromise between two extremes, the centroid approach and the kNN approach. The time complexity of kNN without a data structure is $O(n_t^2)$. The combination is not as computationally expensive as a plain kNN algorithm with the full data set and not as problematic as the centroid approach [15], [31]. The time complexity of this adapted CURE algorithm is $O(n_t J - (J^2 + J)/2)$, which is dominated by the farthest neighbor algorithm, and the algorithm is executed only once to generate the representatives. The time complexity effectively becomes $O(n_t J)$ for small $J$, and the $O((J^2 + J)/2)$ term manifests only for a large training set with a large $J$. The farthest neighbor algorithm was intended only to capture the cluster shape with a minimal number of representative points, and therefore the time complexity should be $O(n_t J)$.
The execution time could be further reduced by randomly sampling the training set to reduce its size ($n_t$) before the farthest neighbor algorithm is applied. The randomly sampled output must be large enough to preserve the information about the geometry of the clusters. CURE is an unsupervised method, but it was adapted into a supervised method here since the data are labelled during acquisition. The adapted CURE effectively reduced the kNN execution time by reducing the training set from $n_t$ to $J$ points for every cluster, but the time complexity remains quadratic, as this work does not modify kNN itself.
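The adapted CURE step described above (farthest neighbor selection followed by shrinking towards the centroid) can be sketched in plain Python. This is an illustration under stated assumptions rather than the script used in this work: the shrink fraction of 0.2, the seeded random first representative, the function names and the dictionary data layout are all hypothetical, and the points in a cluster are assumed to be distinct.

```python
import math
import random


def centroid(points):
    """Mean position of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def cure_representatives(points, n_reps, shrink=0.2, seed=0):
    """Pick n_reps well-scattered points, then shrink them towards the centroid."""
    if not 1 <= n_reps <= len(points):
        raise ValueError("n_reps must be between 1 and the number of points")
    first = random.Random(seed).randrange(len(points))  # random first representative
    chosen = [first]
    # min_dist[i]: distance from point i to its nearest already-chosen representative
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(chosen) < n_reps:
        # Farthest neighbor step: take the point farthest from all chosen points.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    c = centroid(points)
    # Shrinking by a fraction moves distant (possibly outlying) points the most.
    return [tuple(x + shrink * (cx - x) for x, cx in zip(points[i], c)) for i in chosen]


def representatives_per_class(training, n_reps, shrink=0.2):
    """Supervised adaptation: run the selection separately on each labelled cluster."""
    return {label: cure_representatives(pts, n_reps, shrink) for label, pts in training.items()}


cluster = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5)]
print(cure_representatives(cluster, n_reps=4))
```

Because the labels are known, each class is processed on its own and no merging of clusters is needed, which is what reduces the procedure to a representative selection step.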
The total collected data are shown in Table 1 and were split into three sets (training, validation, and test) with a ratio of 50:25:25 respectively for hold-out validation [32].
The splitting process was randomized using the sample function of the Python pandas library. Representatives were found using the training set and the success rate was determined using the validation set. Validation was applied to find the optimum number of representative points, J, such that J points would represent each cluster and be used for kNN classification on the validation and test sets. Fig. 8 shows the representatives generated from the training set for the J = 2, 4, and 6 scenarios and for the K-means scenario.
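A randomized 50:25:25 hold-out split can be sketched as follows. To keep the example self-contained it uses the standard library shuffle as a stand-in for the pandas sample function used in this work; the function name, the seed and the list-based data layout are assumptions.

```python
import random


def holdout_split(rows, ratios=(0.50, 0.25, 0.25), seed=0):
    """Shuffle the rows and cut them into training, validation and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # remainder, so no row is lost to rounding
    return train, validation, test


train, validation, test = holdout_split(range(100))
print(len(train), len(validation), len(test))
```

With pandas, a comparable shuffle of a DataFrame `df` can be obtained with `df.sample(frac=1, random_state=seed)` before slicing.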
The x-axis, G-B, and the y-axis, R-B, were used to simplify the analysis by reducing the 3D data to 2D data. The axes were chosen in this way because the sensor is less sensitive to blue light, and hence the blue light provides the reference signal. However, this method might not apply to other sensor systems that require more parameters, and in such cases dimension reduction techniques such as principal component analysis may need to be applied. User knowledge about the sensor behavior can help to minimize the computational burden by simplifying the analysis. This subtraction based referencing method is also effective in counteracting constant ambient light, as the same ambient contribution is present in both colors and is removed by the subtraction. In the current system, ambient light is minimized by masking the component exterior with black tape, and the LED timing is regulated by the FPGA to ensure that no two LEDs are switched on at the same time. This timing regulation prevents mixing of the LED colors, which could yield an unexpected output because the colors are attenuated differently.
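The blue-referenced feature reduction, and the way a constant ambient offset cancels in it, can be shown with a minimal Python sketch. The function name and the numeric values are illustrative only.

```python
def to_features(red, green, blue):
    """Reduce one RGB measurement to the 2D feature (G-B, R-B), using blue as the reference."""
    return (green - blue, red - blue)


# Illustrative ADC codes only; a constant ambient offset is added to every channel.
red, green, blue = 2400, 2100, 1800
ambient = 37

clean = to_features(red, green, blue)
with_ambient = to_features(red + ambient, green + ambient, blue + ambient)
print(clean, with_ambient)
```

The two feature vectors are identical because the offset appears in both terms of each subtraction; this cancellation holds only for an ambient contribution that is the same during all three color pulses.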
The K-means scenario used the cluster centroid as the representative, instead of a randomly selected value, for comparison with the CURE algorithm. However, the centroid approach failed to describe the shape of data that was elongated. For low numbers of representatives, CURE performs less well, as the first representative is randomized and this results in variations in the shapes formed by the representatives. The J = 2 scenario has a slightly lower total success rate (98.43%) than K-means (98.96%). As the number of representatives increases, the representatives gradually take on the shape of the test set, giving a higher success rate in the cases with larger numbers of representatives.
Validation was performed using the kNN approach with representatives from CURE and K-means. For K-means of 2, k is set as 1; while for J larger than 2, k is set as 3. An odd number was used for the parameter k in order to prevent ties, as a boundary is always formed between no more than 2 clusters for this sensor. Once the k nearest neighbors (representatives) were located, the neighbors voted for their class attribute and the majority vote was used as the prediction. Validation was applied to find the optimum number of representatives for each cluster. The success rates for the scenarios with different numbers of representatives are shown in Table 2.
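The majority vote over the k nearest representatives, and the success rate used for validation, can be sketched in Python as follows. The function names, class labels and coordinates are hypothetical and the values are not taken from the measured data.

```python
import math
from collections import Counter


def knn_predict(sample, representatives, k=3):
    """Classify a sample by majority vote among its k nearest representatives.

    representatives: list of (point, label) pairs.
    """
    nearest = sorted(representatives, key=lambda rep: math.dist(sample, rep[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def success_rate(labelled_samples, representatives, k=3):
    """Percentage of labelled samples whose predicted class matches the label."""
    hits = sum(knn_predict(point, representatives, k) == label
               for point, label in labelled_samples)
    return 100.0 * hits / len(labelled_samples)


# Hypothetical representatives for two classes in the (G-B, R-B) plane.
reps = [((0, 0), "low"), ((0, 1), "low"), ((1, 0), "low"),
        ((10, 10), "high"), ((10, 11), "high"), ((11, 10), "high")]
validation = [((0.5, 0.5), "low"), ((9.0, 9.0), "high")]
print(success_rate(validation, reps, k=3))
```

Only the representatives are searched, so the cost of each prediction scales with the number of representatives rather than with the full training set.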
A low number of representatives was found to be insufficiently reliable for classifying the data, as the representatives were not dispersed well enough to cover the entire cluster. In Fig. 8(b), for J = 2, the representatives of the RI = 1.3436 cluster (green dots) resemble a straight line rather than the elongated, irregularly shaped cluster, and they cover only half of the cluster's y-axis range. The success rate improved when the number of representatives was increased. The success rate reached 100% with 4 representative points (J = 4), so a further increase in the number of representative points could not provide further improvement, only an additional computational burden. Using this process, it was possible to select an optimum number of representatives, which is crucial for minimizing the computational workload in low power applications, especially in the case of field based systems. The system is also suited to installation as a stationary system that uses long optical fiber cables connected to the sensor for in-situ monitoring. Table 3 shows the test success rate for J = 4 and indicates that J = 4 is sufficient for accurate prediction.
V. CONCLUSIONS AND FUTURE WORK
In this paper, a portable sensor system design was proposed and elaborated. The target system was a resonance based optical fiber sensor. The SPR sensor readout system was designed as an optical sensor coupled to an electronic data acquisition and analysis system based on the FPGA, with external communications to a PC. The sensor data analysis approach used a combination of the CURE data clustering approach and the kNN algorithm for data classification. Both clustering and classification were found to be mathematically robust yet simple to implement, and this combination is particularly well suited to low power applications where computing resources are scarce.
The CURE data clustering approach reduced the kNN algorithm's execution time significantly by reducing the training set. Validation of the results produced by a Python test script was presented and its optimum case was tested. For the optimum number of generated representatives, the approach provides a 100% classification success rate with reduced execution time, while K-means provides only a 98.96% classification success rate. Future work will continue to refine the classification approach and its implementation, towards a purely hardware implementation within an FPGA.
DISCLOSURE
The authors declare that there are no conflicts of interest to disclose. 
