I. INTRODUCTION
A critical overview of two of the most widely used communication protocols, SPI and I²C, raises some considerations about how well they are suited for Internet of Things (IoT) applications [1]; moreover, this analysis can be used to understand what can be done to improve performance and possibly to obtain a new standard that combines the advantages of both. As observed, I²C has many of the features required by the IoT paradigm, with an abstraction layer that provides flexibility and different services. However, although improved by its latest revisions, it still lacks important features and, above all, the strict link between the number of devices and the maximum bit rate forces designers to create restricted smart objects. Finally, the goal of limiting energy consumption is not met because of the open-drain connection with a pull-up resistor. SPI, on the other hand, is based on a push-pull output stage: it consumes less power and its throughput depends only on the capabilities of the devices on the bus. However, its simplicity means that it lacks the features necessary to let subsystems communicate efficiently, which makes its implementation in a smart object difficult [2] [3].
The main features of these two protocols, compared in some of their most important aspects, are reported in figure 1. Although it is not reported there, it must be pointed out that, while SPI is a purely physical protocol, I²C is typically implemented with a more complicated software structure; this aspect can be seen as an advantage or an obstacle depending on the designer's preferences during the development of an application [4] [5]. Combining the benefits of these two protocols would lead to a communication standard particularly suited for smart objects and, therefore, for IoT applications. A fixed number of pins would simplify the development of subsystems, making it easier to extend the capabilities of a smart object; furthermore, by assigning a long address to every device, it would be possible to avoid the use of multiplexers, reducing circuit complexity. However, this can be done only if a very high speed is supported, so as not to degrade performance during a communication session; the protocol should also be capable of recognizing new devices connected to the bus and eventually possible via TI wireless evaluation module headers or EZ430-RF2500T headers. With respect to the photo in figure 2a, the positions of these peripherals on the board are shown in figure 2b [8] [9].
Figure 2. View of the MSP-EXP430F5438 Experimenter Board (a) and its functional overview (b).
The TI MSP430F5438A microcontroller is very well suited for applications requiring thorough energy management thanks to its architecture that, combined with several low-power modes, makes it possible to achieve an extended battery life [7] [8] [10]. Typical applications for this device include analog and digital sensor systems, digital motor control, remote controls, thermostats, digital timers, and hand-held meters [11] [12]. Among the features of this microcontroller, it is worth mentioning the powerful 16-bit RISC architecture of the CPU that, combined with constant generators, contributes to maximizing code efficiency with a system clock of up to 25 MHz. The CPU is highly transparent to the application: all operations, other than program-flow instructions, are performed as register operations, thanks to seven addressing modes for the source operand and four addressing modes for the destination operand [13] [14]. This architecture is reported in figure 3. The CPU is equipped with 16 registers that provide reduced instruction execution time: the register-to-register operation execution time is, in fact, one cycle of the CPU clock. Four registers, R0 to R3, are dedicated as program counter, stack pointer, status register and constant generator, respectively; the remaining registers are general-purpose registers. The peripherals of the microcontroller are connected to the CPU using several kinds of buses and can be handled with all instructions, which operate on word and byte data.
Another important feature is the digitally controlled oscillator, capable of waking up the device from one of the low-power modes to active mode in about 3.5 µs. The MSP430F5438A has one active mode and six software-selectable low-power modes of operation: the software can be written so that an interrupt event wakes up the device from any of the low-power modes, the required instructions are performed, and the device then returns to the low-power mode.
Regarding its storage capability, the MSP430F5438A is equipped with a 256 KB flash memory and a 16 KB RAM; memory is organized to store, among other things, the code 
b. Overview of the software architectures for embedded systems
One of the principles FlexSPI is based on is abstraction from the physical layer, in order to obtain not only a shared SPI communication but also many optional procedures; this approach, however, can only be achieved with a proper software architecture for the firmware. Any firmware should be able to respond quickly to external events, even if it is in the middle of some processing; in some applications a simple software architecture will be good enough, while in other cases a more elaborate approach should be preferred. A proper analysis can therefore be conducted only after a detailed study of interrupts and the problems that they generate.
An interrupt is a signal from the hardware regarding an event that needs immediate assistance from the microprocessor. The interrupt service routine (ISR) is the code with the instructions that must be performed as soon as an interrupt is requested; before its execution, the microprocessor saves on the stack the address of the instruction that would normally have been executed next. Interrupt routines should always be short and quick, so that the device can resume its previous processing by restoring the address of the next instruction from the stack. The classic situation that involves an interrupt in an embedded system firmware is described in the example in figure 5. One of the major problems when working with interrupts is that ISRs may modify a variable used by the main code, leading to the so-called shared-data problem. This bug can be encountered when, for example, a situation like the one represented in figure 6 is faced: the temperatures from two sensors are asynchronously acquired by an ISR, and the values are compared in the main code. If an interrupt updates the measurements after the main code has copied the first value but before it has copied the second, the two "iTemp" variables will differ even though the two measured temperatures are equal, leading to a false alarm.
Figure 6. Example of code affected by the shared-data bug.
It should be pointed out that this kind of bug seldom shows up, which makes it difficult to locate, and it can seriously compromise the integrity of the firmware. One could solve this bug by disabling interrupts before copying the values and re-enabling them afterwards; this solution works because it makes the affected portion of code atomic, i.e. it cannot be interrupted.
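The critical-section fix described above can be sketched in C as follows. This is a minimal host-side sketch, not the code of figure 6: `disable_interrupts()` and `enable_interrupts()` are hypothetical stand-ins for the target's interrupt-control intrinsics, and the ISR is an ordinary function that a test can call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the interrupt-control intrinsics of the target.
   Here they only track the state, so that the sketch can run on a host PC. */
static volatile bool interrupts_enabled = true;
static void disable_interrupts(void) { interrupts_enabled = false; }
static void enable_interrupts(void)  { interrupts_enabled = true; }

/* Shared data: written by the ISR, read by the main code. */
static volatile int iTemperatures[2];

/* Simulated ISR: both sensors report the same reading. */
void temperature_isr(int reading)
{
    /* An ISR can only run while interrupts are enabled. */
    assert(interrupts_enabled);
    iTemperatures[0] = reading;
    iTemperatures[1] = reading;
}

/* Main-code check: the two copies are taken inside a critical section,
   so no ISR can update the array between the two reads. */
bool temperatures_match(void)
{
    int iTemp0, iTemp1;

    disable_interrupts();          /* start of the atomic portion */
    iTemp0 = iTemperatures[0];
    iTemp1 = iTemperatures[1];
    enable_interrupts();           /* end of the atomic portion */

    return iTemp0 == iTemp1;
}
```

The critical section is kept as short as possible (two reads), since every instruction executed with interrupts disabled adds to the worst-case response time.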
Although this approach solves the problem, it brings some disadvantages related to interrupt latency. The interrupt latency is the time a system needs to respond to an interrupt, and it should be as low as possible; as said before, writing short interrupt routines can help. Latency greatly increases when interrupts are disabled because the system can handle an interrupt (an event that will in any case be registered) only when interrupts are enabled again, which slows the system response time and therefore decreases the performance of the embedded system. The shared-data bug, the interrupt latency and other similar aspects related to the firmware of embedded systems cannot be discussed without having a precise idea of the application one is building: in some cases, these problems can simply be ignored since they will not affect system performance. This is the reason why several software architectures have been developed, with the idea of giving more control over the system response when needed.
The simplest possible architecture is called round-robin, prototyped with the example code in figure 7. This architecture, characterized by the total lack of interrupts, simply checks in turn every device connected to the microcontroller, handling the data read and possibly performing other operations; an example of firmware based on this architecture is the one implemented in a digital multimeter. This architecture is suited to applications in which timing is not very important and the number of connected I/O devices is small. Although its simplicity can be enough for some systems, it remains inadequate for responsive embedded systems, especially from an IoT perspective, which is characterized by devices whose response should be as fast as possible since they constantly interact [15]. A more sophisticated architecture is round-robin with interrupts. This architecture is similar to the previous one, with the important difference that interrupts can be exploited to signal an urgent need for assistance from the microprocessor. The possibility of giving a different priority to certain instructions improves firmware performance, since interrupts themselves can also be handled with different priorities; this feature is illustrated in figure 8, where the two architectures are compared.
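The round-robin with interrupts architecture can be sketched as below. This is an illustrative host-side sketch and not the code of figure 7 or figure 8: the names `device_isr`, `round_robin_pass` and `NUM_DEVICES` are hypothetical, and the actual device handling is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_DEVICES 3   /* hypothetical number of I/O devices */

/* Flags set by the ISRs and cleared by the main loop. */
static volatile bool service_needed[NUM_DEVICES];
static int handled_count[NUM_DEVICES];

/* ISR: kept short, it only records that the device needs attention. */
void device_isr(int dev)
{
    assert(dev >= 0 && dev < NUM_DEVICES);
    service_needed[dev] = true;
}

/* One pass of the main loop: every device is checked in turn and the
   longer processing is done here, outside the ISR.
   Returns the number of devices serviced during this pass. */
int round_robin_pass(void)
{
    int serviced = 0;

    for (int dev = 0; dev < NUM_DEVICES; dev++) {
        if (service_needed[dev]) {
            service_needed[dev] = false;
            handled_count[dev]++;      /* placeholder for the real handling */
            serviced++;
        }
    }
    return serviced;
}
```

On a target, the main code would call `round_robin_pass()` inside an endless loop; the plain round-robin variant is obtained by dropping the flags and polling each device directly.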
Once interrupts have been implemented, however, all the related problems discussed earlier arise, requiring much more care when designing the code and possibly leading to the accidental introduction of bugs that are difficult to track down. The last software architecture proposed is the one that uses real-time operating systems (RTOSs), sometimes called kernels. As in the previous cases, interrupts have the highest priority and they signal to the main code that an action is required; the big difference is that this signaling is handled by the operating system itself which, moreover, decides which task needs to be executed next. As can be seen in figure 9, in fact, the so-called main code is itself divided into several tasks, each with a priority assigned according to the designed application. In this way, regular operations are also subject to priority mechanisms. The most important component of an RTOS is the scheduler, responsible for keeping track of the state of each task and deciding which one should be executing: it constantly monitors the ready tasks, running the one with the highest priority and recording whether a task has been unblocked or has blocked itself. A task, in fact, apart from being stopped if a higher-priority task becomes ready, can block itself if it has no more actions to execute, waiting for further events in order to complete or restart its operations. According to the implementation of the scheduler, an RTOS can be classified as:
• Preemptive, when the scheduler immediately stops the running task if a higher-priority one has woken up.
• Non-preemptive, when the scheduler waits for the running task to complete its operations first.
Every task, as shown in figure 11, has its own private context that includes register values, a program counter and a stack; all the other data, such as global or static variables, are shared among all tasks. This configuration, however, leads to the shared-data problem discussed above, since the scheduler could stop a task that is using a variable in favor of another task accessing the same data, possibly causing memory corruption. A first way to solve this problem is to write functions that are reentrant, i.e. that can be safely called by several tasks because they do not use shared variables in a non-atomic way; however, this aspect sometimes depends on the chosen compiler, so the solution may not be universal. RTOSs, however, provide a powerful tool to deal efficiently with the shared-data problem and to optimize the firmware: semaphores.
A semaphore is a tool that ensures a secure management of shared data among different tasks: before modifying a shared variable, the task can be forced to first attempt to take the semaphore, waiting if it is not available. As soon as another task releases the semaphore, the invoking task will subsequently take it, thus becoming able to safely modify the data; if other tasks want to modify the same variable, their attempt to take the semaphore will fail and they will have to wait, giving the task that holds the semaphore the chance to safely complete its operations. When this task completes its work, it will release the semaphore, giving other tasks the opportunity to modify the variable. There are several different kinds of semaphores in RTOSs, used not only for dealing with shared-data problems but also to optimize the timing of the whole firmware. These tools, together with many others, make it possible to write firmware composed of independently written tasks whose execution is autonomously regulated by the scheduler. This approach is particularly useful when abstraction from the purely physical world, e.g. pin toggling, is necessary, since one can safely design the code focusing on the different aspects of a particular layer in the stack, having to worry only that the interactions among layers are coherent. Abstraction also eases code management and extension, since modules are weakly linked to each other, giving developers the chance to add functions and procedures.
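The take/modify/give discipline described above can be illustrated with a small model of a binary semaphore. This is a sketch built on C11 atomics, not an RTOS API: the names `bin_sem_t`, `sem_try_take`, `sem_give` and `increment_shared` are hypothetical, and where an RTOS task would block waiting for the semaphore, the sketch simply reports failure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal binary semaphore built on a C11 atomic flag. */
typedef struct {
    atomic_flag taken;
} bin_sem_t;

static bin_sem_t data_sem = { ATOMIC_FLAG_INIT };  /* initially available */
static int shared_counter;                         /* the protected data  */

/* Try to take the semaphore; returns false if another task holds it. */
bool sem_try_take(bin_sem_t *s)
{
    assert(s);
    return !atomic_flag_test_and_set(&s->taken);
}

/* Release the semaphore so that another task can take it. */
void sem_give(bin_sem_t *s)
{
    assert(s);
    atomic_flag_clear(&s->taken);
}

/* Modify the shared variable only while holding the semaphore.
   An RTOS task would block here instead of returning false. */
bool increment_shared(void)
{
    if (!sem_try_take(&data_sem)) {
        return false;
    }
    shared_counter++;
    sem_give(&data_sem);
    return true;
}

int shared_value(void) { return shared_counter; }
```

Unlike the interrupt-disabling approach, only the tasks that compete for the same data are delayed; interrupt latency is not affected.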
The tools offered and the advantages described make it possible to precisely tailor the firmware to its application, leading to the consideration that RTOSs should be the primary choice when writing firmware for any IoT application. Since the FlexSPI firmware has been implemented on top of an open-source real-time operating system, namely FreeRTOS, its main features will now be discussed.
c. The used Kernel: FreeRTOS
FreeRTOS is a real-time kernel on top of which embedded applications can be built to meet their hard real-time requirements. Since the MSP430F5438A, like many microcontrollers, has only one core, only a single task can be in the running state at any time: the scheduler decides which thread should be executed by examining the assigned priorities. It is common practice to assign higher priorities to tasks that implement hard real-time requirements, and lower priorities to tasks that implement soft real-time requirements. This choice ensures that hard real-time threads are always executed before soft real-time threads, but priority assignment decisions are generally more complex and strongly linked to the application.
The building blocks of FreeRTOS, as in all kernels, are tasks, implemented as C functions that must never return. Any task must be explicitly created in the main portion of the code through a given API that specifies, among other parameters, its priority. Once the scheduler is started, the task with the highest priority will be executed and, when its processing ends, a tick interrupt is processed, letting the scheduler decide which task should be executed next; an example of this situation is given in figure 12. FreeRTOS supports all the classic task states plus another one called "suspended", in which a task is not available to the scheduler until it is explicitly resumed. Once a proper handle has been declared, any task can suspend and resume itself or other tasks; it is also possible to change a task priority during the execution of the firmware. When no thread has to be executed, an "idle" task with the lowest priority and no instructions is running. FreeRTOS, however, makes it possible to exploit this idle task to perform some minor actions or to put the microcontroller in a low-power mode, obtaining the scenario in figure 13. As soon as a task becomes ready, the scheduler quickly switches execution by preempting the idle task.
Figure 13. Example timing diagram with the idle task.
Interrupts are handled with a great variety of semaphores: typically, an ISR is synchronized with a task called "interrupt handler" by giving it the semaphore, as can be seen in figure 14. Thanks to this approach, it is possible to obtain very short interrupt routines, while the remaining processing can be comfortably done outside them. Semaphores can also be used to synchronize different tasks or to perform other functions; for this reason, apart from binary ones, in FreeRTOS there are:
• Counting semaphores, which can be given and taken more than once, making it possible to latch multiple interrupt events.
• Mutexes, a special type of binary semaphore used to control access to a shared resource, preventing the shared-data problem through mutual exclusion.
FreeRTOS semaphores are, actually, a special type of another instrument called "queues". A queue is a First In First Out (FIFO) buffer used by tasks to exchange data with each other; in this sense, a binary semaphore is simply a one-item-long queue that transfers a token granting the possibility of execution. When a task writes an item to the queue, unless explicitly specified otherwise, it will be placed at the tail of the queue, while data are extracted from the head. If a task tries to write to a full queue or to read from an empty queue, it will block for a configurable amount of time, and then it will try to perform the same action again. An example of how tasks that use queues are executed is reported in figure 15. Queues can be used to transfer data encapsulated in structures from multiple sources; in this case, some identification mechanism should be provided. However, if the size of the stored data is very large, it is preferable to use the queue to transfer a pointer to the data, leaving its recovery and processing to the receiver task. With this approach, it is possible to speed up the data exchange and save RAM.
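The FIFO behavior of a queue can be sketched with a small ring buffer. This is an illustrative model and not the FreeRTOS implementation: `fifo_t`, `fifo_send`, `fifo_receive` and `QUEUE_LENGTH` are hypothetical names, and where a task would block on a full or empty queue, the sketch returns `false`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_LENGTH 4   /* hypothetical capacity, in items */

typedef struct {
    int    items[QUEUE_LENGTH];
    size_t head;    /* index of the oldest item */
    size_t count;   /* number of stored items   */
} fifo_t;

/* Write an item at the tail. Returns false if the queue is full
   (an RTOS task would block here for a configurable time). */
bool fifo_send(fifo_t *q, int item)
{
    assert(q != NULL);
    if (q->count == QUEUE_LENGTH) {
        return false;
    }
    q->items[(q->head + q->count) % QUEUE_LENGTH] = item;
    q->count++;
    return true;
}

/* Extract the item at the head. Returns false if the queue is empty. */
bool fifo_receive(fifo_t *q, int *item)
{
    assert(q != NULL && item != NULL);
    if (q->count == 0) {
        return false;
    }
    *item = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_LENGTH;
    q->count--;
    return true;
}
```

With `QUEUE_LENGTH` set to 1 and the item value ignored, the same structure behaves like the binary semaphore described in the text: sending gives the token and receiving takes it.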
When pointers are passed through a queue, it is preferable to dynamically allocate the necessary memory before passing the pointer; moreover, the code structure must ensure that only one task accesses the memory while it is being allocated, modified and eventually freed. For this reason, FreeRTOS provides two API functions, pvPortMalloc and vPortFree, which offer a safer approach than the classic C malloc and free functions. In the developed FlexSPI firmware, among the different possibilities, the definition of these two functions given in the "Heap_4.c" file has been chosen: the safety of memory allocation and release is guaranteed by temporarily suspending the scheduler before the necessary operations and resuming it when processing ends. This description of FreeRTOS has been obtained not only through the analysis of its manual [6]: the frameworks used to characterize the peripherals on the MSP-EXP430F5438 experimenter board were also written using this operating system. Although listings are not reported, the considerations expressed in this description of FreeRTOS are also the result of experimental firmware development based on this advanced software architecture.
III. DESCRIPTION OF THE FLEXSPI FUNDAMENTALS
The purpose that led to the design of this communication protocol is to obtain a fully shared SPI bus, with a fixed number of wires, without giving up the advantages of a push-pull output stage, and to obtain an architecture capable of great flexibility [16]. All four signals of a classic SPI protocol are entirely shared by the slaves on the bus: when a master wants to communicate with a particular device, it performs addressing at packet level, whose specifications will be discussed later. All slaves receive the transmission and check whether they are the addressed receivers: if so, they continue to process the packet; otherwise, they discard it. Thanks to this approach, it is possible to provide not only unicast addressing but also multicast and broadcast messaging.
It is possible to obtain this behavior by redefining the role of Chip Select in the communication session: instead of framing single bytes, this signal is used to window entire messaging sessions. This line is, in fact, used to wake up the slaves and let them know that an incoming transmission is about to begin. Further transmissions will be handled by slaves according to some built-in finite state machines, updated according to the content of the previous packet. This approach shows no problems when applied to the SIMO signal but can cause a large number of dangerous conflicts on the SOMI line if an arbitration rule is not applied; moreover, slaves are allowed to use this line to asynchronously request the master's attention. Potential malfunctions are prevented by configuring the SOMI pin, when not employed in a data transfer, as a high-impedance input, while the master uses a pull-up resistor to keep the line in a recessive state.
As soon as the master needs to receive data from a particular slave, it explicitly grants possession of the SOMI line and disables the pull-up resistor, giving only that device the right to occupy the channel with its transmission. An example of a possible bus using FlexSPI is shown in figure 16 [17], where, for example, the master is simply used as a sink collecting data from slaves, e.g. sensors [18] [19]; with this configuration, any device can signal to the master that it needs the bus, and the master will perform a procedure to understand which slave produced the signal. Once the slave has been identified, the master will give it possession of the channel, downloading the content of the slave's pending queue and possibly transmitting data. FlexSPI, in fact, is based on the SPI protocol and therefore supports full-duplex communications, doubling the available bandwidth of the channel with respect to I²C; this eventuality, however, must be properly signaled in order to let devices correctly configure their pins and available memory. It is obviously possible to have just half-duplex data exchanges but, if the transmitter is the slave, the master must be made aware of how many clock pulses it has to send to fully download the packet.
Analyzing this protocol from a more physical point of view, the principal feature of FlexSPI is that it takes advantage of both the GPIO and SPI modules available on microcontrollers: in the default situation, in fact, the SIMO and SOMI signals are inputs for both the master and the slaves, while the remaining signals are outputs for the master and inputs for the slaves.
As soon as a particular communication session is desired, the firmware will disconnect the necessary pins from the GPIO module and connect them to the SPI one, performing the required operations. This model of the physical interface is represented in figure 17, showing that the constant multiplexing between modules is completely transparent to the application layer. An important implementation recommendation has been followed in the developed firmware: it is preferable for the master never to connect the Chip Select line to the SPI module, since this signal is used to window command sessions and not bytes, and its behavior would otherwise be less predictable. It is therefore suggested to leave the Chip Select pin always connected to the GPIO module, lowering and raising its output logic level when needed in the implementation.
a. Packet structures and addressing strategies
FlexSPI uses two different categories of addresses: Universal Device Address (UDA) and Device Short Address (DSA). UDAs are universal addresses, several bytes long, that unambiguously identify every single device that can be connected to a FlexSPI bus; these addresses are fixed and cannot be altered during the entire device lifetime. Only two addresses are reserved: all zeroes, used if a UDA is not assigned, and all ones, for broadcast sessions.
DSAs, instead, are much shorter than UDAs and are therefore used to reduce protocol overhead; these addresses are leased by the master that controls the channel by means of a dedicated procedure. Even in this case, the all-zeroes and all-ones addresses are reserved with analogous functions. Two types of addressing strategies can be used, In-Packet Addressing understanding of frame formats, it is sufficient to say that while the first case performs addressing at packet level, the second one exploits a different strategy. Since addressing is performed only towards slaves, an asymmetric packet format was created to reduce overhead and power consumption, especially in slaves; this is the reason why only masters can have two different frame formats. If IPA is used, the master packet will have the structure in figure 18, where the names of the various fields and their lengths expressed in bytes are reported.
Figure 18. IPA Master packet format.
The packet structure is composed of different fields and sub-fields, whose roles are reported here:
• Opcode: specifies the packet content or the command; it is composed of:
  ▪ CMD: marker of the command.
  ▪ M: indicates whether the packet contains a MASK field.
  ▪ S: if set, the address is a DSA; otherwise it is a UDA.
• LEN: represents the residual length of the packet, i.e. the number of bytes following the LEN field itself.
• Addressing Field (MPH-AF): the field used to address one or multiple slaves; it may be omitted in some procedures. This field is composed of:
  ▪ DEST: the address of the receiver; it must always be set.
  ▪ MASK: used to send unicast, multicast or broadcast packets.
• Master Packet Payload (MPP): the informative content that the master wants to send to one or more slaves; it can be empty if the master is sending commands.
• Master Packet Footer (MPF): optional field that can be used to make the communication safer by using a CRC signature and check.
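The way a slave could interpret the Opcode flags and the addressing field can be sketched as follows. This is only a sketch under several assumptions that are not stated in the text: the Opcode is taken to be one byte with CMD in the upper six bits, M in bit 1 and S in bit 0; the DSA is taken to be one byte long; and the MASK is assumed to select which address bits are compared, which is a common masking scheme but not necessarily the FlexSPI one. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED opcode layout: CMD = bits 7..2, M = bit 1, S = bit 0. */
#define OPCODE_S_BIT  0x01u
#define OPCODE_M_BIT  0x02u

/* ASSUMED one-byte DSA; all ones is the reserved broadcast address. */
#define DSA_BROADCAST 0xFFu

typedef struct {
    uint8_t cmd;       /* marker of the command             */
    bool    has_mask;  /* M: a MASK field follows DEST      */
    bool    is_dsa;    /* S: DEST is a DSA (otherwise a UDA) */
} opcode_t;

/* Split a raw opcode byte into its sub-fields. */
opcode_t decode_opcode(uint8_t raw)
{
    opcode_t op;
    op.cmd      = (uint8_t)(raw >> 2);
    op.has_mask = (raw & OPCODE_M_BIT) != 0;
    op.is_dsa   = (raw & OPCODE_S_BIT) != 0;
    return op;
}

/* Decide whether a slave with address own_dsa must process the packet.
   ASSUMED mask semantics: only the address bits set in mask are compared,
   so several slaves can match the same DEST (multicast). */
bool slave_accepts(uint8_t own_dsa, uint8_t dest, bool has_mask, uint8_t mask)
{
    assert(own_dsa != DSA_BROADCAST);  /* reserved, never leased to a slave */

    if (dest == DSA_BROADCAST) {
        return true;                   /* broadcast session */
    }
    if (!has_mask) {
        return dest == own_dsa;        /* unicast */
    }
    return ((own_dsa ^ dest) & mask) == 0;  /* multicast group match */
}
```

Under these assumptions, a packet with DEST 0x10 and MASK 0xF0 would be accepted by every slave whose DSA lies between 0x10 and 0x1F.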
Since in OPA addressing is not performed at packet level, there is no need for addressing fields in the packet header (figure 19, with reduced header). The only field that differs from the IPA format is ELEN, a two-bit field that indicates a total packet length greater than 257 bytes.
Figure 19. OPA Master packet format.
Slaves are not allowed to perform addressing and therefore their packet structure is the one in figure 20. The asymmetry in the frame structure has been chosen because, in sophisticated systems, slaves are typically energy-constrained devices that would benefit from a shorter transmission. All fields have the same meaning as in an IPA master packet, the only difference being in the last two bits of the OPCODE:
• The penultimate bit is reserved and always 0.
• P, instead of S, is used to signal the presence of pending data in the slave's queue that should be sent to the master in another session.
figure 21 and it is articulated in three steps. The first one is the initialization, highlighted in light gray in the figure: it is a falling edge of the clock with the Chip Select high, i.e. when no communication is in progress.
This event is detected by all the slaves on the bus, which will perform the necessary operations to get ready for the beginning of the CSP procedure. Addressing is performed by toggling the Chip Select line a number of times equal to the DSA of the addressed slave. This count goes on until the clock returns high, signaling to slaves that they can stop counting the Chip Select pulses. The device that finds the count equal to its address will consider itself the receiver, while the others will discard further communications.
The addressed slave will take possession of the SOMI line and start a communication session according to what has been required by the application layer of the devices; the possession is held until the reset sequence is received. Highlighted in dark gray, the reset signal is a clock falling edge followed immediately by a rising edge, always with the Chip Select high; this event is an invalid zero count of Chip Select pulses that announces, to all the slaves on the bus, that the channel reservation has ended. The master can therefore address a new slave with the same procedure, knowing that all the devices are ready to listen to the channel again.
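The three steps described above (initialization, pulse counting, reset) can be modeled on the slave side as a small state machine. This is a sketch and not the firmware of the paper: the type and function names are hypothetical, a one-byte DSA is assumed, and the edge detection that would be done by GPIO interrupts is replaced by explicit function calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    CSP_IDLE,       /* listening, no slave owns the channel     */
    CSP_COUNTING,   /* clock is low, Chip Select pulses counted */
    CSP_SELECTED,   /* this slave owns the SOMI line            */
    CSP_IGNORING    /* another slave was addressed              */
} csp_state_t;

typedef struct {
    uint8_t     own_dsa;  /* ASSUMED one-byte short address */
    uint8_t     pulses;   /* Chip Select pulses seen so far */
    csp_state_t state;
} csp_slave_t;

/* Clock falling edge with Chip Select high: start a new count. */
void csp_on_clock_fall(csp_slave_t *s)
{
    assert(s);
    s->pulses = 0;
    s->state  = CSP_COUNTING;
}

/* One Chip Select pulse while the clock is held low. */
void csp_on_cs_pulse(csp_slave_t *s)
{
    assert(s);
    if (s->state == CSP_COUNTING && s->pulses < UINT8_MAX) {
        s->pulses++;
    }
}

/* Clock rising edge with Chip Select high: the count is complete. */
void csp_on_clock_rise(csp_slave_t *s)
{
    assert(s);
    if (s->state != CSP_COUNTING) {
        return;
    }
    if (s->pulses == 0) {
        s->state = CSP_IDLE;        /* zero count: reset sequence */
    } else if (s->pulses == s->own_dsa) {
        s->state = CSP_SELECTED;    /* count matches the own DSA  */
    } else {
        s->state = CSP_IGNORING;    /* addressed to another slave */
    }
}
```

Treating the reset as a zero count means that the same two edge handlers cover both addressing and release of the channel.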
b. Data Transfer Mechanisms
The main features and advanced procedures of FlexSPI are enabled by sharing the SPI bus among slave devices, but this technique needs a mechanism to avoid collisions in order to work properly. This is the reason why the SOMI line is always kept free by all slaves on the bus while the master marks it with a recessive state by applying a pull-up resistor, which, moreover, gives slaves open-drain signaling capabilities. This signaling is performed in two different ways thanks to many procedures that will be analyzed in the following section. The concession of the channel is given by the master with a poll-slave packet: this command is used to inform the addressed slave that in the following communication session it will have to send data.
Both full-duplex and half-duplex messaging are allowed in SPI peripherals, and therefore in FlexSPI, doubling the available channel bandwidth. It must be considered, however, that since the master is responsible for injecting clock pulses into the bus, it must be made aware of the number of clock pulses necessary to download the slave packet. Therefore, if the communication is half-duplex and the slave is the device that must send data, the master will first download just the first byte of the slave packet, containing the residual length of the frame, and then perform the necessary operations to send as many clock pulses as required. In practical implementations, this behavior can be obtained by configuring the master to send dummy data: it is the slave's job to ignore the incoming transmission. This dependency on frame length must be carefully taken into account if the transmission is full-duplex: data loss can occur if devices are not ready to react to different packet lengths. The presence of the LEN field in packet headers helps devices to properly identify which one needs to send a longer frame and to react promptly:
• If the slave is sending a longer packet, the master must keep sending bytes although its packet ended with the CRC bytes, padding the transmission with a count-down to 0 of the remaining clock cycles that will be sent.
• If the master, on the other hand, is supposed to send a longer packet, the slave will simply start sending padding bytes as long as clock cycles are received.
This behavior is summarized in figure 22 , where the added bytes in both cases are at the receiver side after the CRC field, with the "PB" label used to identify slave's padded bytes.
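The master-side padding rule can be sketched as follows. This is an illustration under assumptions that the text does not fix: lengths are counted in bytes rather than in single clock cycles, and the count-down is assumed to start from the number of remaining padding bytes minus one so that the last padding byte is 0. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes the master must clock during a full-duplex session:
   the session lasts as long as the longer of the two packets. */
size_t session_length(size_t master_len, size_t slave_len)
{
    return master_len > slave_len ? master_len : slave_len;
}

/* Fill out[] with the padding the master appends after its own packet
   when the slave packet is longer. ASSUMED format: a count-down in bytes
   ending with 0. Returns the number of padding bytes written. */
size_t master_padding(size_t master_len, size_t slave_len,
                      uint8_t *out, size_t out_size)
{
    if (slave_len <= master_len) {
        return 0;                       /* the master packet is not shorter */
    }

    size_t pad = slave_len - master_len;
    assert(out != NULL && pad <= out_size && pad <= 256);

    for (size_t i = 0; i < pad; i++) {
        out[i] = (uint8_t)(pad - 1 - i);   /* pad-1, pad-2, ..., 0 */
    }
    return pad;
}
```

For a 4-byte master packet and a 7-byte slave packet, the sketch produces the three padding bytes 2, 1, 0, and the session lasts 7 bytes in total.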
As soon as a communication session is over, the master deselects the slave by raising the Chip Select. The SOMI line is then freed by the slave and the master applies the pull-up resistor, restoring the idle situation in which it is ready to listen to open-drain signals or to start a new session. Commands sent by the master must be separated by a proper delay to let slaves process the received packet; however, an ad hoc procedure can be employed to avoid the use of imprecise delays.
IV. OVERVIEW ON THE AVAILABLE PROCEDURES
Having described the mechanisms involved in the creation of a shared SPI bus and, consequently, the fundamentals of a FlexSPI-based data exchange, some of the available procedures are described here. Thanks to its features, in fact, FlexSPI is capable of providing a series of advanced tools to help monitor the bus condition, to ensure an efficient communication that is fully supported by all devices and, above all, to create a dynamic master-slave architecture with the possibility of safely hot-plugging new devices.
a. MAster Device Somi based IRQ Synchronization (MADIS)
The MAster Device Somi based IRQ Synchronization (MADIS) procedure has been designed to create a synchronization mechanism between the master and the slave at the end of a communication session, in order to avoid imprecise delays between the transmission of different packets. When developing the application layer of the firmware, the possibility that some devices on the bus may not support this sub-procedure must be taken into account; on the other hand, the master knows not only the addresses of the slaves on the bus but also their FlexSPI capabilities. Once a transmission between the master and a slave has successfully ended, instead of waiting for a defined amount of time, the master can be configured to wait for a signal on the SOMI line, already marked with a pull-up resistor. The slave, on the other hand, as soon as it finishes processing the received packet, is allowed to produce a short open-drain signal on its output line. In this way, the master will be informed that the last addressed slave is ready to start a new communication session. The master waiting time cannot be infinite: it is the developer's job to size a proper wait time for a SOMI event according to the designed application, in order not to slow down the bus excessively. This waiting time, however, does not excessively affect the master's energy consumption, since the device can be put in a low-power mode with interrupts enabled, ready to react to the expected signal.
If the transmission from the master was addressed to more than one slave, the open-drain signaling guarantees that all slaves can produce a SOMI pulse without any kind of conflict.
However, since slaves may have different CPU capabilities, in case of multicast transmissions it is recommended to let slaves wait for a random delay before pulling down the SOMI line: in this way, slower devices can safely complete their processing, avoiding conflicts and data loss.
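The bounded wait of the master can be illustrated with a minimal Python sketch. It is only a behavioural model, not the firmware: the function name wait_for_somi and the 50 ms timeout are assumptions made for the example, and a semaphore stands in for the SOMI interrupt.

```python
import threading

def wait_for_somi(somi_event: threading.Semaphore, timeout_s: float) -> bool:
    """Block until a SOMI pulse is signalled or the timeout expires.

    Returns True if the slave signalled in time, False on timeout.
    """
    return somi_event.acquire(timeout=timeout_s)

# The semaphore models the interrupt raised by the falling edge on SOMI.
sem = threading.Semaphore(0)
sem.release()                              # simulated SOMI pulse from the slave
ready = wait_for_somi(sem, 0.05)           # pulse already arrived: no waiting
timed_out = not wait_for_somi(sem, 0.05)   # no further pulse: the wait times out
```

In a real device the timeout would be sized by the developer according to the application, as discussed above, and the wait would be spent in a low-power mode.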
b. SIMO/SOMI Link Indicator
The SOMI signal is typically disconnected from the SPI module and used as a GPIO pin for
As can be observed in figure 23, when the SOMI logical level is high, slaves deduce that the link is valid. As soon as a transition is detected, every slave starts a timer: if the SOMI line does not return high within a certain amount of time, the link is considered broken. A similar technique can be applied to the SIMO line when Chip Select is inactive, enabling a monitoring procedure for slaves called SIMO Link Indicator (SIMO-LI). In this procedure, as shown in figure 24, a slave produces a short pulse on the SIMO line, to which the master has already applied a pull-up resistor for monitoring purposes. Similarly to the SOMI-LI procedure, if the logical value returns high within a defined amount of time, the slave deduces that a valid master is currently managing the channel. On the other hand, if the timeout expires, the link is considered broken; this event can be seen as a warning of excessive fan-out or of a serious master malfunction.
Regardless of the payload, once the SPEEDACK packet has been received by the slave, the slave sends a SPEEDACK packet too; this is done to ensure that the proposed speed has been recognized and granted by all devices. To ensure a safe data transfer, if the master supports different speeds for different slaves, any multicast communication session is performed at the lowest speed among the negotiated ones.
d. Procedure of Master Device Solicitation
A classic SPI communication is based on a master that sends data but is also responsible for downloading them from slaves, which are not able to signal that they need the master's assistance to empty their pending buffer.
a PINGREQ command to every slave and a subsequent POLLSLAVE command to concede the channel and let the addressed slave reply. The PINGACK command used by slaves to reply has its P field set to 1 if the polled slave has pending data: as soon as one of the replies to the master's ping has this feature, the master recognizes the source of the SOMI pulse and therefore starts the procedure to download the pending data. If no slave replies with a PINGACK having the P field set, the master deduces that a new device has been connected to the bus and, consequently, starts a proper procedure to identify its address.
It must be noted that, since SOMI pulses do not collide, more than one slave can safely start a SIMP procedure; however, only the first slave polled by the master and having pending data is served, leaving the others unattended. For this reason, if a slave finds its SIMP request unsatisfied, it has to start the same procedure again. Once the master has recognized the requesting slave, it promptly issues a POLLSLAVE command to reserve the channel for it and then downloads the pending packet. As in the previous procedure, if for some reason the master does not find the slave address in its memory, it is forced to start a polling session to identify the source of the message. When devices on the bus do not support either of these two procedures, a third one can be used: called Periodic Slave devices Polling, it periodically asks slaves whether they have pending data to be transmitted.
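The polling step that follows a SOMI pulse can be sketched as below. This is a simplified Python model under stated assumptions: find_soliciting_slave and the ping callback are hypothetical names, and the callback hides the PINGREQ/POLLSLAVE/PINGACK exchange, returning only the P field of the reply.

```python
from typing import Callable, List, Optional

def find_soliciting_slave(known_addrs: List[int],
                          ping: Callable[[int], Optional[bool]]) -> Optional[int]:
    """Ping known slaves in order and return the first one with pending data.

    `ping(addr)` models PINGREQ + POLLSLAVE and returns the P field of the
    PINGACK (None when no reply is received).
    """
    for addr in known_addrs:
        if ping(addr):
            return addr        # this slave is served first
    return None                # nobody pending: a discovery procedure is needed

# Hypothetical bus state: P field of each slave.
pending = {0x12: False, 0x34: True, 0x56: True}
served = find_soliciting_slave([0x12, 0x34, 0x56], pending.get)
nobody = find_soliciting_slave([0x12], pending.get)
```

Slave 0x56 also has pending data but is left unattended in this round, which is why an unsatisfied slave has to repeat its request.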
SIMO enabled Slave IRQ Signaling (S

e. Slave Discovery Procedure
One of the most powerful features of FlexSPI is a mechanism to register the presence of new devices on the bus, identifying their addresses and, possibly, their capabilities in order to optimize future communication sessions. The Slave Discovery Procedure (SDP) is responsible for the complex task of identifying new addresses and can be entered, for example, when the master is solicited by a slave to grant the channel but is not able to identify it because its identity is not stored in memory. The basic principle of this procedure is to obtain multiple responses to particular ping requests, exploiting the open-drain connection of devices. This mechanism can be safely used without collisions if slaves, when responding to a multicast ping, reply with a special PINGACK packet whose header fields are all zeros. To speed up the procedure, slaves do not need to wait for a POLLSLAVE before replying: the master promptly begins to inject clock pulses in order to retrieve any null frame.
Supposing that more than one slave is on the bus, the first step to identify the address of a new device is to write an address conflict table: this analysis is used to investigate whether slaves share the same value in some positions of their addresses. This preliminary step requires, apart from the PINGREQ packet, the BCASTSHUT command, which tells the addressed slaves not to reply to the following ping request. Among its options, the BCASTSHUT command can silence devices having particular address structures or addresses greater than a certain threshold.
The conflict table is filled with the following procedure:
1. A BCASTSHUT aimed to silence slaves whose addresses have the LSB equal to 1 is sent, followed by a PINGREQ broadcast command.
2. If an answer is received, at least one slave with a 0 as LSB in its address is present on the bus. The master then sets the cell in the column corresponding to the LSB, on a row called zeros; if no answer is detected, the cell is left clear.
3. The same procedure is repeated, this time silencing slaves with a 0 as LSB. A reply sets the LSB column in the ones row; otherwise the cell is left clear.
4. The procedure is repeated, shifting the investigated bit to the left until all the bits composing the address have been investigated.
In this way, a table like the one in figure 28 is obtained. From its analysis, it is possible to deduce that where no conflict results, i.e. when in a column there is a 0 and a 1, all slaves share the same value of the bit in that position of their address.
Figure 28. Example of conflict table.
The number of conflicting bit positions can be calculated with the following formula:
where zeros_ij and ones_ij are the i-th bit of the j-th byte of the respective row. Given this result, it is possible to conclude that the number of bits of the address that must be investigated is
x #byteAdd -#(c).
This preliminary step exponentially reduces the number of possible addresses that a slave can have, lightening the search algorithm by applying it only to the conflicting positions.
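The filling of the conflict table can be illustrated with a short Python simulation. It is only a sketch: the function names are hypothetical, the open-drain reply is modelled with any() (a reply is seen if at least one slave that was not silenced answers), and a position is assumed to be conflicting when both the zeros and the ones rows are set in that column.

```python
def build_conflict_table(slave_addrs, addr_bits):
    """Return the (zeros, ones) rows; index 0 is the LSB column."""
    zeros = [False] * addr_bits
    ones = [False] * addr_bits
    for bit in range(addr_bits):
        # BCASTSHUT silences slaves with this bit at 1, then PINGREQ:
        # a reply means at least one slave has a 0 in this position.
        zeros[bit] = any(((a >> bit) & 1) == 0 for a in slave_addrs)
        # Same step, silencing slaves with this bit at 0.
        ones[bit] = any(((a >> bit) & 1) == 1 for a in slave_addrs)
    return zeros, ones

def conflicting_positions(zeros, ones):
    """Bit positions in which slaves disagree (assumption: both rows set)."""
    return [i for i, (z, o) in enumerate(zip(zeros, ones)) if z and o]

# Two hypothetical 4-bit addresses that differ only in the two lowest bits.
zeros, ones = build_conflict_table([0b1010, 0b1001], addr_bits=4)
conflicts = conflicting_positions(zeros, ones)
```

With these two addresses only two of the four positions are conflicting, so the subsequent search works on 4 candidate values instead of 16.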
The shared bus architecture is exploited with a binary search applied in a distributed way: each slave device actively participates in the discovery, since slaves are the only ones possessing the required pieces of information. This mechanism is called distributed Binary Search Algorithm (dBSA) and it uses once again the PINGREQ and BCASTSHUT commands; this time, however, addresses greater than a certain value are silenced. A proper set of flag variables ensures that, when ping answers are reactivated, only slaves having an undiscovered address consider themselves authorized to reply, so that discovered slaves no longer take part in the process. In this way, the algorithm becomes faster as more devices are found. This cooperative discovery needs a proper mapping function to apply the binary search only to conflicting bits. The function is the following:
where Z_i and O_i are the i-th members of the zeros and ones rows, respectively, #(a) is the length in bits of addresses and j ∈ [1; #(c)] is an index that must be updated every time the first condition of the previous equation is applied. Thanks to this equation, it is possible to place the investigated values of the binary search, i.e. the bits in conflicting positions, in a valid candidate address and to verify whether it belongs to the bus. The resolution of conflict bits through dBSA to discover unknown addresses of new devices is performed by the algorithm described in the flow chart of figure 29.
Two indexes are initialized, HSide and LSide, and their mean value, Pivot, is calculated; this value is mapped with the previous equation and then used by the BCASTSHUT and PINGREQ commands to find whether that address belongs to the bus. The value of Pivot is continuously updated with the classic approach of a binary search, until a new address is finally discovered.
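A possible reading of the mapping function and of the search loop is sketched below in Python. It is an illustration under explicit assumptions, not the algorithm of figure 29: map_candidate and discover_lowest are hypothetical names, a conflicting position is one where both rows are set, the index j is 0-based, and the BCASTSHUT/PINGREQ pair is modelled as "some undiscovered slave has an address not greater than the threshold".

```python
def map_candidate(value, zeros, ones):
    """Spread the bits of `value` over the conflicting positions only.

    Non-conflicting positions keep the value shared by all slaves.
    """
    addr, j = 0, 0
    for i in range(len(zeros)):
        if zeros[i] and ones[i]:        # conflict: take the next search bit
            bit = (value >> j) & 1
            j += 1
        elif ones[i]:                   # every slave has a 1 here
            bit = 1
        else:                           # every slave has a 0 here
            bit = 0
        addr |= bit << i
    return addr

def discover_lowest(undiscovered, zeros, ones):
    """Binary search for the lowest undiscovered address (set must be non-empty)."""
    n_conf = sum(1 for z, o in zip(zeros, ones) if z and o)
    lside, hside = 0, (1 << n_conf) - 1
    while lside < hside:
        pivot = (lside + hside) // 2
        threshold = map_candidate(pivot, zeros, ones)
        # BCASTSHUT silences addresses above the threshold, then PINGREQ.
        replied = any(a <= threshold for a in undiscovered)
        if replied:
            hside = pivot
        else:
            lside = pivot + 1
    return map_candidate(lside, zeros, ones)

# Conflict table of two hypothetical slaves, 0b1010 and 0b1001 (LSB first).
zeros = [True, True, True, False]
ones = [True, True, False, True]
remaining = {0b1010, 0b1001}
found = []
while remaining:
    addr = discover_lowest(remaining, zeros, ones)
    found.append(addr)
    remaining.discard(addr)   # discovered slaves stop replying
```

Because non-conflicting bits are fixed, the mapping preserves ordering, so comparing addresses with the mapped pivot is equivalent to a binary search over the 2^#(c) compact values.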
Once the new slave address is registered, nothing is known regarding its FlexSPI capabilities; the protocol, however, implements an ad-hoc command, GETOPT, to collect these pieces of information. The master can also decide to start the procedure to assign a DSA address, to lighten the overhead of future communications, or to directly start a communication session with the new slave.
Figure 29. Algorithm flow chart related to conflicting bit resolution.
V. IMPLEMENTATION OF FLEXSPI AUXILIARY PROCEDURES
This paragraph focuses on the implementation and testing of some of the advanced procedures that FlexSPI enables; in particular, three procedures among those outlined in paragraph IV have been selected. By adding these features to the developed framework, it is possible to explore and appreciate the expandability of this communication protocol, which makes it suitable to meet the advanced requirements of smart objects. Some preliminary improvements have been applied to the firmware in order to increase its efficiency:
• The FreeRTOS tick rate has been reduced from 1 kHz to 2 Hz. This variable sets the frequency of the RTOS tick interrupt that, when it occurs, wakes up the scheduler to check whether a task has become ready; this situation is encountered, for example, when the device is executing the idle task while waiting for a delay to end. Although this modification does not particularly affect the procedures, it becomes very important in terms of energy consumption: in this way, the device remains in low-power mode for the entire wait period.
• Short delays are implemented by keeping the device active, while long delays exploit the on-board Real-Time Clock. This peripheral can be used to produce a time interval that relies on the hardware counter module of the microcontroller. In this way, it is possible to wait for long delays without CPU intervention, so that the CPU can be operated in low-power mode.
The three chosen advanced procedures are MADSIS, SOMI-LI and PSN; they are described together with their implementations and with the results of experimental validation tests. As can be noted, however, when a SOMI interrupt is detected by the master, the MAC layer is informed in two different ways. This behavior has been implemented with the listing reported below, which shows the PORT1 Interrupt Service Routine of the master.
a. MADSIS procedure: code design and validation
In order to speed up the system, in fact, a SOMI interrupt is interpreted according to macState:
• If unemployed, the master has received a totally asynchronous event. For this reason, since the master is supposed to react by potentially performing complex actions, the event is communicated to the MAC layer via its listening queue.
• On the other hand, when the master is moving across its finite state machine, this event can be generated only by a slave at the end of a communication session. For this reason, it is used to wake up the master, letting it continue its operations.
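The two-way notification described above can be modelled with the following Python sketch. It does not reproduce the firmware ISR: somi_isr, MAC_UNEMPLOYED and the "SOMI_EVENT" message are hypothetical names, while a queue and a semaphore stand in for the MAC listening queue and for xSemMadsis.

```python
import queue
import threading

MAC_UNEMPLOYED = "unemployed"   # hypothetical label for the idle macState

def somi_isr(mac_state, mac_queue, sem_madsis):
    """Model of the master reaction to a falling edge on SOMI."""
    if mac_state == MAC_UNEMPLOYED:
        # Asynchronous event: hand it to the MAC layer through its queue.
        mac_queue.put("SOMI_EVENT")
    else:
        # End of a communication session: just wake up the waiting MAC layer.
        sem_madsis.release()

mac_queue = queue.Queue()
sem_madsis = threading.Semaphore(0)

somi_isr(MAC_UNEMPLOYED, mac_queue, sem_madsis)
queued = mac_queue.get_nowait()

somi_isr("waiting_madsis", mac_queue, sem_madsis)
woken = sem_madsis.acquire(blocking=False)
```

Releasing a semaphore is cheaper than building and queueing a message, which is consistent with the stated goal of speeding up the reaction during an ongoing session.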
The last situation is the one involved in the MADSIS procedure. After restoring the idle condition of its SPI pins, the MAC layer of the master receives a PlmeTXdisableRequest or PlmeRXdisableRequest primitive and begins to wait for a SOMI event from the previously addressed slave. The master, in fact, attempts to take a semaphore called xSemMadsis and blocks. As soon as the SOMI event is generated by the slave, the ISR releases the semaphore and the MAC layer returns to the running state, thus resuming its operations and possibly beginning a new data exchange. The MADSIS procedure also requires a modification of the slave portion of the firmware: as soon as the slave completes its processing of the received packet, a SOMI event is generated by invoking a function called madsisSlave. The implementation of this simple function is reported in the next listing. The SOMI pin, left floating by the slave, is configured as an output and driven to a low logical level; after that, the pin returns to its previous condition. When this function is invoked, the pull-up resistor has already been applied by the master; for this reason, no other operations are required.
The MADSIS procedure also requires hardware-level operations on the master: interrupts on the SOMI line can be obtained by short-circuiting this pin with a GPIO pin. In this way, it is possible to extend PORT1 interrupts to the SPI pins. The model of the experimental setup used to verify the MADSIS procedure, together with a detailed photo showing the short circuit on the master, is represented in figure 31; a red wire and clamps were used to perform the connection on the master.
As can be noted from the listing, the detection of a falling edge on the SOMI line causes the release of a semaphore, which in turn unblocks the parallel task vSomiLiTask. In this way, the slave can start the operations required to confirm a malfunction in the link with the master, excluding a false alarm. Before confirming that the link is broken, the vSomiLiTask parallel task has to wait for a given amount of time, almost one second in this framework. Once this timeout expires, the slave checks the SOMI pin logical level again: if it has returned high, a false alarm has been detected; if it remains low, the link is considered broken. LEDs have also been used to monitor the different phases of this procedure: the red LED turns on immediately after the task is entered.
Listing. Slave PORT1 Interrupt Service Routine (ISR).
As reported in the implementation of this task in the following listing, two LEDs are turned on or off according to the outcome of the verification; finally, to perform reliable tests, when the link is considered broken, the slave scheduler is immediately suspended.
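The verification performed after a falling edge can be summarized with a small Python model. It is a sketch of the decision only: somi_link_check, read_somi and wait are hypothetical names injected as parameters so that the example runs without hardware, and the one-second default simply mirrors the timeout mentioned above.

```python
def somi_link_check(read_somi, wait, timeout_s=1.0):
    """Decide between a false alarm and a broken link after a falling edge.

    `read_somi()` returns True when the SOMI line is high;
    `wait(seconds)` suspends the task for the given time.
    """
    wait(timeout_s)                 # give the line time to return high
    if read_somi():
        return "false_alarm"        # line recovered: the link is still valid
    return "link_broken"            # line still low: the link is lost

# Simulated outcomes, with a no-op wait so the example runs instantly.
recovered = somi_link_check(lambda: True, lambda s: None)
broken = somi_link_check(lambda: False, lambda s: None)
```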
From a hardware point of view, as reported in figure 35, the implementation of this procedure has required two different actions. The first one is a short circuit between the SOMI pin and a GPIO pin of the slave to enable interrupt capabilities. The second modification ensures that the SOMI line is not left floating when the bus is deliberately removed: a floating line, in fact, could cause spurious pulses that may be misinterpreted by the device. For this reason, a pull-down resistor has been placed between the SOMI line and GND: in this way, when the link gets broken, the line is held at a low logical level.
return high when a spurious pulse is detected: for this reason, the time the device must wait before declaring that the link is broken must be sized according, among other things, to the pulling resistors in play. The described scheme was used to build the experimental setup reported in figure 36, where the pull-down resistor has been connected using a breadboard. This procedure has been verified by manually breaking the link: the first time, the connector used to plug all available SPI pins was removed and reinserted before the timeout expired, verifying the false alarm recognition. The second time, instead, the connector was not reinserted after its removal: since the link remained broken, the orange LED turned on and the device stopped working. In this way, it has been possible to verify that the system reacted consistently both in case of a simulated false alarm and when the link was actually broken. It must be taken into account that real malfunctions or link interruptions may not always be so clear-cut.
Since this procedure is started by the slave, two things happen before the packet exchange of figure 25 takes place: the slave, through its application layer, requests a new speed and triggers the generation of a SOMI pulse; the master, supposing that it knows who produced this pulse, sends a POLLSLAVE packet to listen to the slave request. The packet exchange used to perform the PSN procedure exploits the finite state machines that set the macState register, conveniently expanded for both the master and the slave. An auxiliary task, called vPsnSTask, is responsible for updating macState, assisting the MAC layer when necessary; this task has a different implementation depending on whether the device is the master or a slave.
The master finite state machine is expanded as reported in figure 38a: as can be seen, its extension is backward compatible with the ping procedure. The presence of the auxiliary task in the MAC layer ensures that the SPEEDACK packet is sent twice: after the first acknowledgement, in fact, the master sets the new speed and tests the bus to verify that data exchanges at the required clock frequency are supported.
The SPEEDREQ packet sent by the slave to the master has a payload containing the speed that the slave wants to be used for future data exchanges. However, in a generic scenario, the master may not always be able to grant the slave request, and the decision has to be made according to a specific criterion. Since the metric that the master uses depends heavily on the bus itself and on the application, in this framework the master decides according to a threshold: if the requested speed is lower than 3 MHz it can be granted, otherwise it is rejected. The slave finite state machine is extended as reported in figure 38b: the diagram is considerably expanded to support all the different phases of the required packet exchange. As explained before, the transmission of the SPEEDREQ packet is triggered by the application layer, which activates the operations necessary to request the channel from the master; the master sends a POLLSLAVE packet and downloads the slave request. The details of all the packets exchanged to perform the procedure are reported in figures 40 and 41. After the slave SOMI pulse and the POLLSLAVE packet, the slave is authorized to transmit its SPEEDREQ packet; the payload, composed of 2 bytes, contains the desired speed. The master, using its parallel task, evaluates the request and, once it is found acceptable, sends a SPEEDACK packet having as payload the speed that is going to be set. The master then sends a POLLSLAVE packet in order to retrieve the SPEEDACK command from the slave. Once this exchange has correctly taken place, the master modifies the bus speed and tests the channel. Figure 41 represents the data exchange that occurs at the negotiated speed: because of the higher speed, the resolution has been increased in order to make the waveforms recognizable. The master starts the channel test by sending, once again, a SPEEDACK packet; after granting the channel to the addressed slave with a POLLSLAVE command, the SPEEDACK reply of the slave is downloaded. This exchange completes the procedure, and the successful outcome of all the exchanged packets is used by the master to verify that the speed is supported. Finally, the change of speed is shown in figure 42, which represents the detail of the two SPEEDACK commands that the master sends during the procedure. As described before, in fact, the master sends this packet to the slave twice, once to acknowledge the request and once to test the bus.
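The master side of the negotiation can be condensed into the Python sketch below. It is a model under assumptions rather than the firmware logic: negotiate and exchange_speedack are hypothetical names, the callback hides the SPEEDACK/POLLSLAVE/SPEEDACK round trip at a given clock, and keeping the current speed after a rejection or a failed test is an assumption, since the source does not describe that case.

```python
SPEED_LIMIT_HZ = 3_000_000   # threshold used in this framework

def negotiate(requested_hz, current_hz, exchange_speedack):
    """Return the bus speed in use for the slave after a SPEEDREQ.

    `exchange_speedack(clock_hz, speed_hz)` models one SPEEDACK round trip
    performed at `clock_hz` and returns True when the slave reply is received.
    """
    if requested_hz >= SPEED_LIMIT_HZ:
        return current_hz                               # request rejected
    if not exchange_speedack(current_hz, requested_hz):
        return current_hz                               # acknowledgement lost
    if not exchange_speedack(requested_hz, requested_hz):
        return current_hz                               # bus test failed (assumed fallback)
    return requested_hz                                 # new speed confirmed

always_ok = lambda clock_hz, speed_hz: True
only_slow = lambda clock_hz, speed_hz: clock_hz == 250_000

granted = negotiate(1_500_000, 250_000, always_ok)
rejected = negotiate(4_000_000, 250_000, always_ok)
unsupported = negotiate(1_500_000, 250_000, only_slow)
```

The second round trip is the one performed at the new clock, which is why the SPEEDACK packet appears twice in the captured waveforms.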
d. Example of Master smartness Improvement
A final test has been executed to assess the benefits of FlexSPI in enhancing the smartness of the devices that use this communication protocol to exchange data. This framework shows how the master can adapt the data exchange speed to the addressed slave, becoming aware of the bus structure and reacting dynamically to changes. Both the ping and the speed negotiation procedures have been used, triggering their activation with the buttons available on the experimenter board; the MADSIS procedure keeps being used by the devices, too. The master, during its initialization phase, creates a table of the devices on the bus, associating with every address a fixed speed set to 250 kHz; a first ping procedure is then performed. After that, the slave used before performs the speed negotiation procedure, requesting a 1.5 MHz bus speed, and the master updates its table accordingly. The master then performs the ping procedure once again; this time, however, different speeds are used to communicate with the two devices.
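The per-slave speed table used in this test can be modelled with a few lines of Python. The class and method names are hypothetical; the 250 kHz default and the 1.5 MHz update come from the test described above, and the multicast rule (lowest negotiated speed) is the one stated for speed negotiation in paragraph IV.

```python
DEFAULT_SPEED_HZ = 250_000

class SpeedTable:
    """Master-side table associating each slave address with a bus speed."""

    def __init__(self, addresses):
        self.speeds = {addr: DEFAULT_SPEED_HZ for addr in addresses}

    def update(self, addr, speed_hz):
        """Record the speed negotiated with one slave."""
        self.speeds[addr] = speed_hz

    def speed_for(self, addresses):
        """Speed to use for a unicast or multicast session (lowest wins)."""
        return min(self.speeds[addr] for addr in addresses)

table = SpeedTable([0x01, 0x02])       # two hypothetical slave addresses
table.update(0x01, 1_500_000)          # first slave negotiated 1.5 MHz

fast = table.speed_for([0x01])
slow = table.speed_for([0x02])
multicast = table.speed_for([0x01, 0x02])
```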
The scheme of the experimental setup used for this test is represented in figure 43a. The master also has the SOMI short circuit needed to perform the MADSIS procedure; the slave requesting the new speed is the one in the center. This model has been followed to connect the devices, obtaining the setup shown in figure 43b.
Figure 43. Scheme (a) and realized experimental setup for smartness improvement test (b).
As usual, the exchanged signals have been collected using the logic analyzer. Figure 44 shows the time intervals in which the two ping procedures are performed, before and after the slave PSN. By focusing on the CLK line and looking at the width of the white stripe, it is possible to deduce that the master correctly reads and updates its table: a faster bus speed is, in fact, used for the first slave during the second ping, and the clock speed changes accordingly when the second slave is polled.
Figure 44. Detail of the two ping procedures performed before and after the first slave negotiated a new speed.
All the presented procedures highlight the advantages of using a communication protocol based on a structured software architecture. It has been possible, in fact, to analyze the advantages of a synchronization mechanism among the devices on the bus, using extra signaling; it has also been observed how devices can be made "channel aware", constantly monitoring the bus and, at the same time, protecting themselves from false alarms. Thanks to speed negotiation, it is possible to build a bus made of devices that differ deeply from one another: the master, in fact, can be made smart and adapt its behavior to the task that it must fulfill. These aspects closely match the IoT principles, proving how beneficial adopting this protocol can be when developing advanced smart objects. Obviously, nothing comes without a price: apart from software complexity, an investigation in terms of energy consumption has to be made in order to understand whether the advantages deriving from this protocol can be obtained without increasing power consumption to levels that harm the energy autonomy of the device. This is the aim of future research work, in which the FlexSPI framework will be used to quantify the energy consumption.
VI. CONCLUSIONS
All the explained procedures, together with its physical architecture, stress the potential of the developed FlexSPI protocol, which is capable of providing many functions tailored for dynamic systems and of optimizing communication sessions. The dynamic switch among the peripherals present in microcontrollers also gives slaves the possibility to safely trigger events captured by the master, going beyond the rigid scheme of classic master-slave architectures in which only the master is responsible for every action on the bus. The fixed number of wires, together with the push-pull data exchange provided by SPI, makes it possible to create simpler circuit layouts for devices with several components without giving up a high-speed throughput [17]. The I 2 C protocol, the natural candidate for a comparison, although it provides some variants of these procedures, is limited by its RC connection, as explained in [1]. FlexSPI can be built like a MAC layer above the SPI bus to process all the pieces of information necessary to perform packet-level addressing, using a stack with a layered architecture. This is the idea followed in the firmware development to implement this communication protocol, experimentally verifying that it is possible to obtain a shared push-pull based bus. Some of the discussed procedures have also been implemented to directly check their working mechanisms and advantages. In order to give some context to the obtained results, the developed firmware has also been used to make a direct comparison with I 2 C, providing an indication of which devices could benefit most from this protocol and of how its implementation can be improved.
