City University of New York (CUNY)

CUNY Academic Works
Dissertations and Theses

City College of New York

2012

Rapid Decoding of Digital Data Streams Using Field
Programmable Gate Arrays
Andrew Hernandez
CUNY City College

More information about this work at: https://academicworks.cuny.edu/cc_etds_theses/89
This work is made publicly available by the City University of New York (CUNY).

Rapid Decoding of Digital Data Streams
Using Field Programmable Gate Arrays

THESIS
Submitted in partial fulfillment of the requirement for the degree
Master of Engineering (Computer Science)
At
The City College
of the
City University of New York
by
Andrew Hernandez
December 2010

Approved:

_______________ __________________
Professor Izidor Gertner, Thesis Advisor

________________ ________________
Professor Douglas Troger, Chairman
Department of Computer Science

Contents
Acknowledgements
Introduction
Literature Review
Core Concepts
    Securities Trading
    NYSE Arca
        Market Share
        Arcabook
        Message Rates
        Data Rates
    Time Matters
    Operations
        Parsing
        Storage
        Retrieval
    Storage Message Types
        Add
        Modify
        Delete
    Processing Options
    Binary vs. Compacted
Methodology
    Data Homogeny
    CPU
        Specification
        Procedure
    GPU
        Specification
        Procedure
        CUDA
    FPGA
Hernandez, Rapid Decoding of Digital Data Streams Using Field Programmable Gate Arrays | 2

        Specifications
        Procedure
        VHDL
        Parsing
        Storage
        Retrieval
    Complete Design
Results
    CPU
        Parsing
        Storage
        Retrieval
    GPU
        Parsing
        Storing
        Retrieval
    FPGA
        Parsing
        Storing
        Retrieval
Conclusion
Abstract
References
Figure List
Table List
Appendix A – C# Codes
    Parsing
    Storage
    Retrieval
Appendix C – VHDL Codes
    Parsing Modules
        Load Counter
        BufferRam
        ByteOrder
        Parser
    Storage Modules
        Add
        Delete
        Modify
        ASMMux
    Retrieval Modules
        RetrieveCounter
        ParsedRam
Appendix D – Arcabook Messages
    Message Header Format
    Add Message Body Format
    Modify Message Body Format
    Delete Message Body Format

Acknowledgements
Axiom Markets LLC
Tag Team Trading LLC
New York Stock Exchange Arca
Professor Izidor Gertner (CCNY)

Introduction
In today’s marketplace, the overwhelming majority of financial exchange transactions
are completed entirely on computers. These transactions range from shopping online to
transferring money between accounts, and they even include high-frequency securities trading.
This is especially true in competitive marketplaces like the New York Stock Exchange (NYSE) or
the Chicago Mercantile Exchange (CME). The NYSE provides an environment for buyers and
sellers to trade stock in companies that are registered for public ownership. The CME does the
same for commodity futures contracts like oil or gold. In recent years, matching buyers and
sellers has been taken out of the hands of human regulators and given to computers. Now the
overwhelming majority of these transactions are handled completely electronically. Going even
further, even the human trader, who initiates the trades, is being replaced by black box
computer algorithms. It has become a race to show who has the fastest systems [14]. As a result,
these exchanges are fast moving and can be very chaotic. Securities trade hands multiple times
per second and prices change just as quickly. On the micro level, prices change seemingly
without warning or reason. In order to remain competitive, rapid access to accurate market
data is mandatory. A superior electronic trading platform offers many advantages.
One obvious advantage is higher profits. If I have access to a timelier snapshot of marketplace
data than a competitor, I will be in a position to react sooner, taking advantage of opportunities
before anyone else. I could execute trades before competitors and subsequently capture the
profits before they do. The ability to do this repeatedly over the course of days and weeks would
make me a leader in the industry and make my company more valuable [6]. Many high-profile
companies are engaged in this type of work, including Citigroup,
Goldman Sachs, and Merrill Lynch. Over the last few years, these large financial institutions
have been employing highly intelligent researchers in the fields of math and science, all in
an effort to create even faster and more advanced trading platforms. Typically, these
platforms have been developed and deployed on commodity hardware, that is, general
purpose x86 microprocessors. These processors are excellent for many functions, but they do
not offer many advantages in terms of speed or efficiency. I want to implement these
operations on specially designed hardware.
A well designed hardware solution will outperform an x86 microprocessor solution since it has
far less overhead and a much shorter critical path. A design such as mine would be
immediately deployable in industry and provide any company with a technological distinction
and advantage over its competitors.
There are a large number of considerations that must be taken into account when designing a
hardware solution for these types of operations. The chief concern is that of data
manipulation. Financial data is typically provided in a binary or encoded stream that must first
be parsed, then stored, and finally retrieved. Optimizing all three of these operations is
essential. A bottleneck at any point will slow down the entire system.

Literature Review
String parsing, storage, and retrieval are not new concepts, nor are efforts to speed up
electronic trading platforms. My investigation is primarily focused on the advantages gained
using streams of financial data, but the same concepts can apply over many different fields.
This technology could benefit any industry that needs rapid manipulation of
raw data in real time.
IBM has a high performance XML parser solution called lxml [2]. This is a library of functions
written in Python that focuses on decoding XML data rapidly [7]. My research does not utilize XML
directly, but many of the same principles apply. XML is used to provide metadata and
processing information for many real-time data streams. The XML language is largely composed
of plain text strings, so the ability to rapidly parse those strings to determine their exact
nature, and store the gleaned content, would be highly useful. The lxml library allows for the
processing of extremely large data sets, 2 GB or more. IBM’s library is a good starting point,
but my focus is on speed, and a pure software solution is not sufficient.
Another project dealing with these same concepts is the human genome project. Completed in
2003, the human genome project was under way during the late 1990s and early
2000s. During those years, a number of different agencies and scientists were able to
achieve some great goals [5]. They identified the many genes and determined the
sequence of the chemical base pairs that make up human DNA. They stored this information
about DNA and chemical base pairs using a series of string characters. The human genome
project required a string parser and string storage application. This led to another goal of the
project: to improve tools for data analysis of human genes and DNA. In general, DNA is
represented by a series of repeating strings or characters. To optimize this process, the project
developed software that handled string manipulation. The tools developed by the human
genome project applied directly to their specific application and do not necessarily translate to
optimizations for more general purpose data sets, such as those I dealt with in my
research. Again, their solutions were developed as software and are not as optimal as the
hardware solutions I am proposing.
Microsoft markets a product called BizTalk Server [8]. BizTalk is a specialized service bus that
speeds up certain types of business communication. Through the use of adapters, data streams
can be translated from one technology into another. It is also useful for extracting information
from data feeds and converting it to plain English. These types of operations are very similar to
what I am trying to accomplish with my research. It is a fine solution, but as with the other existing
systems I mentioned, it is a software solution implemented on commodity hardware.
My project deals with a data protocol called FIX, or Financial Information eXchange. This format
is ubiquitous within the industry [16]. Almost every exchange and financial service provider has at
least some data transmitted in FIX. It is so commonplace that there are many companies that
provide systems that exclusively handle data in FIX format. These companies include Rapid
Addition, SAVVIS, and Realtime Systems Group (RTS) [18]. I have extensive experience with the RTS
solutions. They provide software that parses and even stores data feeds that are identical to
what I deal with in my thesis. Their software is fast, reliable, and efficient, but again, it is
software. The key difference between these solutions and my
research is that my solution is implemented directly in hardware, while the others use software. A
proper hardware solution will be much faster, make better use of resources, and create a
competitive edge that would be difficult to match.

Core Concepts
Securities Trading
The core impetus for this thesis is to improve on already existing technology as it is related to
the field of securities trading. A security in this context is a fungible and negotiable instrument
that has financial worth itself or abstractly represents that value. Typical types of securities are
stocks, bonds, futures contracts, options, and swaps. Securities trading began centuries ago and
has evolved in parallel with business practices, government regulations, and technology
innovations. There are myriad reasons an individual or institution may want to trade
securities. These range from speculation and a desire to make money to a hedge against
wealth-eroding factors like inflation. Regardless of the reasons, in order to deal in securities
one must understand the market, what instruments are available, and how technology
drives the whole system. As with the reasons for doing so, there are numerous ways to trade
securities. I can deal with an opaque market like a bank or credit union. I can trade over the
counter with other individuals or companies, or I can go to an open exchange. My project deals
with trading on an open exchange. There are many exchanges, such as the New York Stock
Exchange, the NASDAQ, and the Chicago Mercantile Exchange. These exchanges are open to
anyone who wants to trade. They are completely transparent, meaning anyone can see details
of any transaction. Due to this open and transparent structure, the security markets on these
exchanges are very competitive. Technology advantages can go a long way. Different
exchanges can have slightly different features. I focused on the New York Stock Exchange
(NYSE) Arca, due to its completely electronic nature.

NYSE Arca
NYSE Arca, formerly known as ArcaEx (an abbreviation of Archipelago Exchange), is a securities
exchange on which both stocks and options are traded [9]. NYSE Arca is an ideal exchange for my
experimentation. Unlike many other exchanges, NYSE Arca is fully electronic. This means no
trades occur on an open outcry trading floor. Trades done on a trading floor are often not
disseminated through the electronic reporting system in real time. A fully electronic exchange levels the
playing field for everyone involved, and technology plays an even larger role in
gaining a trading advantage.
Market Share
As of 1 March 2007, NYSE Arca was the second largest Electronic Communication Network in
terms of shares traded [9]. Approximately one out of every six shares traded on the American
financial markets is traded on the system. For New York Stock Exchange-listed securities, or
Tape A, it accounts for just over 10% of the shares traded. For NASDAQ-listed securities, NYSE
Arca accounts for approximately 20% of the trading volume. For exchange-traded funds, NYSE
Arca accounts for 30-40% of the traded volume [9]. This volume makes NYSE Arca an even more
ideal place to do my experimentation. A large amount of volume means a large amount of
data.

Arcabook
The data distribution service for NYSE Arca is called Arcabook [11]. This service is provided directly
by NYSE Arca and contains all the raw data of orders and transactions that have taken place
on the exchange. As underscored by the previous section, a tremendous number of
orders are placed and executed on NYSE Arca. This equates to a substantial amount of streaming
data from Arcabook. The following two sections outline the message rates (a message being a
complete unit of data) and the raw data rates.
Message Rates [10]

Arcabook                                                     Current          2011 end of year projected
Peak message per second rate                                 310,000          450,000
Packet size                                                  Variable         Variable
Maximum number of packets in a day                           300,000,000      500,000,000
Maximum total number of individual book messages in a day    1,250,000,000    2,000,000,000

Table 1 - Arcabook Message Rates
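
To put the peak rates in Table 1 in perspective, the short Python sketch below converts each rate into the average time available to handle a single message. It assumes, for illustration only, that messages are processed strictly one at a time with no batching or parallelism; the function name is my own and does not come from the Arcabook documentation.

```python
# Peak message rates taken from Table 1 (messages per second).
PEAK_RATES = {
    "current": 310_000,
    "2011 end of year projected": 450_000,
}


def time_budget_us(messages_per_second: int) -> float:
    """Average time available per message, in microseconds, at a given rate."""
    return 1_000_000 / messages_per_second


for label, rate in PEAK_RATES.items():
    print(f"{label}: {time_budget_us(rate):.2f} microseconds per message")
```

Under this assumption, parsing, storage, and retrieval together have only a few microseconds per message at peak load.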

Data Rates [10]

Stream                                     Current Rate (Mbps)    2011 end of year projected
OTC only                                   13                     22
LX only                                    33.5                   54.5
ETF only                                   28                     45.5
BB only                                    2                      3
All depth of book subscriptions (total)    77                     125
Refresh Interval                           40                     65
Retransmission                             40                     65
Refresh Request                            20                     32.5

Table 2 - Arcabook Data Rates

Time Matters
Successful and profitable trading is largely determined by two factors: strategy and
connectivity. Strategy includes concepts such as when to buy, when to sell, and how much to
buy and sell. For my purposes, I am not concerned with these concepts as they lie outside the
scope of this thesis. I assume that any hypothetical strategy is sound. Instead, I am more
focused on the second factor, connectivity. Primarily, I am concerned with how quickly I can
interpret and act upon incoming data. Reducing time required to make a decision leads to
advantages over competitors.
For example, say I have a strategy that tells me to buy 1000 shares of XYZ every time it reaches
a certain price. If the price is very good, there may be many investors that would like to
purchase at that price. Likely, only a limited number of shares exist at the price at which I want
to execute. I need to act faster than the others that also want to buy so I can secure my
position. If I have a computer system that is able to act more quickly than my competitors I can
ensure I get the shares I want at the price I want.
There are many components that determine exactly how much time is required from the
beginning of a favorable market event to the execution of an order. First, market conditions are
broadcast out by an exchange (NYSE Arca via Arcabook in my case). These messages include the
current state of the order book, that is, how many buy and sell orders exist, and at what quantity and
price, for a specific product. There is network latency between the exchange and the local
computer which will implement my strategy. This value is determined by my distance from the
exchange and the type of connection I have. Next, I must process the data. This data comes in
on a network stream; it may be encoded or raw. Either way, I must parse it out to get to the
relevant information I require for my strategy. This is the process I hope to accelerate. After I
have actionable data, I determine if anything is to be done, create a response to the
exchange, and send it off. Figure 1 depicts, in general terms, how this process
happens.

Figure 1 - General Flow of Data

Operations
There are three major operations that must be performed when decoding digital data streams
of financial information. These operations relate to message types that flow into the system
from the Arcabook data feeds. A more complete look at Arcabook message types and message
field breakdowns is shown in Appendix D – Arcabook Messages.
Parsing
The first operation is parsing. It primarily involves taking a message from the data stream in its
raw format, stripping away the pieces of data that I do not need, and pulling out the pieces of
data I am concerned with. Depending on the objective of the trading strategy there are
different pieces of information I need. The data fields in their raw form can be found in
Appendix D. For example, I will almost always be interested in fields that concern prices,
quantities and side, that is, buy or sell. I have less interest in fields like exchange timestamps
and sequence codes. This can be a time consuming operation since I am required to randomly
access different parts of a buffer array that holds each individual message and reform it into a
smaller structure that contains just the information that I am concerned with.
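
As an illustration only, the parsing step can be sketched in Python as follows. The field layout here is hypothetical: it is a made-up fixed-width, big-endian record chosen to show the idea, and it does not reproduce the Arcabook formats given in Appendix D. The names RAW_LAYOUT, ParsedOrder, and parse_message are likewise my own.

```python
import struct
from typing import NamedTuple

# Hypothetical fixed-width layout, big-endian, no padding (23 bytes):
#   msg_type:u16, sequence:u32, timestamp:u32,
#   order_id:u32, price:u32, quantity:u32, side:char
RAW_LAYOUT = struct.Struct(">HIIIIIc")


class ParsedOrder(NamedTuple):
    """The smaller structure that keeps only the fields a strategy needs."""
    order_id: int
    price: int
    quantity: int
    side: str  # "B" for buy, "S" for sell (assumed encoding)


def parse_message(buffer: bytes) -> ParsedOrder:
    """Unpack one raw message and discard the fields that are not needed."""
    (_msg_type, _sequence, _timestamp,
     order_id, price, quantity, side) = RAW_LAYOUT.unpack_from(buffer)
    return ParsedOrder(order_id, price, quantity, side.decode("ascii"))


# Build a sample raw message and parse it back.
raw = RAW_LAYOUT.pack(100, 7, 34_200_000, 42, 2550, 1000, b"B")
print(parse_message(raw))
```

The sequence number and timestamp are read from the buffer but dropped, which mirrors the stripping described above.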

Storage
Once the raw data has been parsed, it is ready for storage. The storage phase consists of three
sub operations: add, modify, and delete. The message types for those sub operations are
analyzed in the following section. Each is handled a little bit differently, but they each
essentially operate in the same way by making a change to the locally stored order book. Each
performs the specified operation on the underlying data structure, in my case, a dictionary
structure. Depending on the type of message, the data structures are adjusted to reflect
current market conditions.
Retrieval
The final operation is retrieval. Data that flows in is parsed out, and stored in a data structure.
I must be able to retrieve that data structure at any moment in order for the human trader or a
computer program to see the complete snapshot of market conditions. That snapshot is based
on what is in the data structure, and using it, decisions can be made about whether to buy or
sell.
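
A minimal sketch of retrieval, assuming the stored order book is a dictionary keyed by order number, is shown below. The sample orders, the tuple layout, and the snapshot function are hypothetical and serve only to show how a snapshot of market conditions could be read out of such a structure.

```python
from collections import defaultdict

# Hypothetical stored book: order number -> (side, price, quantity).
order_book = {
    1: ("B", 2550, 1000),
    2: ("B", 2550, 500),
    3: ("B", 2549, 200),
    4: ("S", 2552, 300),
    5: ("S", 2553, 700),
}


def snapshot(book):
    """Aggregate resting orders into total quantity per price level, per side."""
    levels = {"B": defaultdict(int), "S": defaultdict(int)}
    for side, price, quantity in book.values():
        levels[side][price] += quantity
    bids = sorted(levels["B"].items(), reverse=True)  # highest bid first
    asks = sorted(levels["S"].items())                # lowest ask first
    return bids, asks


bids, asks = snapshot(order_book)
print("best bid:", bids[0], "best ask:", asks[0])
```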

Storage Message Types
Add
The first and most common message is the add type. A detailed breakdown of the message can
be found in Appendix D. The add message type is disseminated over the data stream every time
a new order is placed at the exchange. When an add message occurs I create a new entry in
the order book data structure. That entry holds the specific information about the order, such
as price, quantity, and order number.
Modify
The second and least common of the three types is the modify message. Again, the details of
the message are shown in Appendix D. This message is very similar to the add type, but rather
than representing a brand new order at the exchange, it’s an update to an already existing
order. Modifications can occur in price or quantity. Modifications occur infrequently because
the typical method of modification is to delete an existing order and add a new one (called a
cancel/replace). When this message is received I use the order number to reference the stored
order in the data structure and make whatever modifications are necessary.

Delete
The last message is the delete type. This occurs very frequently. Delete happens every time an
order is removed from the exchange. This removal can happen because it was deleted by the
originator, filled by counter-party, or canceled by the exchange. Also, removals can happen if
the order expires because the market is too far away from it. As with the others, the specific
details can be seen in Appendix D. When I receive this message, I use the order number to
reference the entry in the data structure, and I remove it.
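
The three storage message types can be sketched against a dictionary keyed by order number, as in the Python example below. This is an illustrative sketch rather than the implementation used in this thesis; the class and method names are assumptions, and the fields are reduced to price, quantity, and side.

```python
from dataclasses import dataclass


@dataclass
class Order:
    price: int
    quantity: int
    side: str


class OrderBook:
    """Locally stored order book: a dictionary keyed by order number."""

    def __init__(self):
        self.orders = {}

    def add(self, order_number, price, quantity, side):
        # Add message: create a new entry for a newly placed order.
        self.orders[order_number] = Order(price, quantity, side)

    def modify(self, order_number, price, quantity):
        # Modify message: look up the existing order and update it in place.
        order = self.orders.get(order_number)
        if order is None:
            return False  # unknown order number; nothing to update
        order.price = price
        order.quantity = quantity
        return True

    def delete(self, order_number):
        # Delete message: remove the entry if it exists.
        return self.orders.pop(order_number, None) is not None


book = OrderBook()
book.add(42, 2550, 1000, "B")
book.add(43, 2552, 300, "S")
book.modify(42, 2551, 800)
book.delete(43)
print(book.orders)
```

Each operation is a single dictionary lookup by order number, so the cost per message does not depend on which of the three types arrives.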

Processing Options
There are many hardware and software combinations that could potentially process the
Arcabook data streams. I explored three different possibilities. A typical and easily employable
solution is pure software running on an x86 processor. This application could be nearly
universally deployed and maintained. However, this type of processor has little specialization
for the operations I require. The second option is a graphics processing unit or GPU. This
processor has distinct processing components to optimize certain types of calculations. This
option is intriguing and worthwhile to explore due to new interest in General Purpose GPU
(GPGPU), but this hardware is more appropriate for parallel operations. I am more concerned
with fast serial processing. The final option is a field programmable gate array or FPGA. This
should be the best since I can make the hardware gates function in any way I desire.

Binary vs. Compacted
The NYSE Arcabook data service offers two different types of data streams, binary and
compacted. Binary streams are simply the raw data presented in a predictable, unadjusted
format. Compacted streams implement the Financial Information eXchange (FIX) FAST
protocol to reduce the total number of bytes sent to each client. FIX is a messaging standard
used for real-time electronic transfer of securities transactions [16]. FAST (FIX Adapted for
STreaming) adds a compression layer on top of FIX that shrinks each message before it is sent.
Each stream has its advantages. The major tradeoff between the two is bandwidth versus
processing time. The binary streams do not need to be decoded first, but the compacted streams
use less bandwidth. My experimentation focuses on one core concept: speed. For my purposes,
bandwidth is not an issue. Therefore, the compacted streams are ruled out immediately. The
extra step of decoding FAST-compacted data will always make processing those streams slower
than processing the binary streams.
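
To illustrate the kind of extra work a compacted stream requires, the sketch below decodes a single stop-bit encoded unsigned integer, the variable-length integer encoding used by FAST. It is an illustration in Python under that assumption only; it is not the ArcaBook compacted message format, and the function name is hypothetical.

```python
def decode_stop_bit_uint(data, offset=0):
    """Decode one stop-bit encoded unsigned integer.

    Each byte carries 7 payload bits; a set high bit marks the final
    byte of the field. Returns (value, next_offset).
    """
    value = 0
    while True:
        byte = data[offset]
        offset += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:  # stop bit set: last byte of this field
            return value, offset


# Three encoded bytes carry one integer. A fixed-width binary field
# would instead be read in place with no loop at all.
encoded = bytes([0x39, 0x45, 0xA3])
value, next_offset = decode_stop_bit_uint(encoded)
```

The per-byte loop and the data-dependent field length are the decoding step that the binary stream avoids.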

Methodology
While there exist many ways to effectively speed up the processing of this financial data, my
research focused on the three core operations of data manipulation previously outlined:
parsing, storage and retrieval. These operations must be performed thousands of times a
second and millions, even billions, of times per day. At that rate, even a single feed can provide
an overwhelming amount of data. There exist countless ways to realize these
operations. One highly efficient method is the use of an FPGA. In order to demonstrate that
efficiency, I also developed comparable programs for a general purpose x86 CPU and a
specialized GPU.

Data Homogeny
After initial testing, I discovered that having each solution run using a live Arcabook feed was
not ideal. Since the stream never repeats, results from these tests could never be accurately
compared with one another. Since financial markets are always changing, the data provided
was, for my purposes, effectively random. This means that one solution may have
been given an easier load of data to process. In order to correct this problem, I recorded live
data from an Arcabook feed. I then created a simulated feed using this data that could be
replayed. This allowed for each solution to be tested on the same set of data and permitted a
more exact comparison of the solutions.
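
The record-and-replay idea can be sketched as follows. This Python example is an assumption-laden illustration: the two-byte length prefix, the function names and the sample payloads are all hypothetical, and an in-memory buffer stands in for the recorded capture file.

```python
import io
import struct


def record(messages, sink):
    """Write each captured message with a 2-byte big-endian length prefix."""
    for msg in messages:
        sink.write(struct.pack(">H", len(msg)))
        sink.write(msg)


def replay(source):
    """Yield the recorded messages back in their original order."""
    while True:
        header = source.read(2)
        if len(header) < 2:
            return  # end of the recording
        (length,) = struct.unpack(">H", header)
        yield source.read(length)


# Stand-in for messages captured from a live feed (hypothetical payloads).
captured = [b"\x01add-order", b"\x02modify-order", b"\x03delete-order"]

recording = io.BytesIO()
record(captured, recording)

# Every solution under test can now consume an identical stream.
recording.seek(0)
first_run = list(replay(recording))
recording.seek(0)
second_run = list(replay(recording))
```

Because each replay yields the same messages in the same order, timing differences between solutions can be attributed to the solutions rather than to the data.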

CPU
The first and most obvious choice for these operations is a commodity x86 microprocessor. This
type of hardware is universally available and provides all the necessary, albeit general,
elements to carry out the core operations. Specifically, I used an Intel Xeon quad-core
processor running at a speed of 2.5 GHz (more detailed specifications are provided in Table 3).
Basic parsing and storage functions are coded in the C# programming language. C# is an
excellent language, and is commonly used in the financial industry. It provides a good balance
of ease of use and efficiency. There are five operations that I am concerned with: parse, add,
modify, delete, and retrieve.
Specification [3]
The specifications for the CPU used in my procedure are as follows:

CPU Name                 Intel Xeon E5420
CPU Code Name            Harpertown
CPU Socket Type          Socket 771 LGA
CPU Technology           45 nm
CPU Core Speed           2.50 GHz
CPU Bus Speed            332.5 MHz
CPU Total Cores          4
Motherboard Name         Dell 0RW199
Motherboard Chipset      Intel 5400B
Motherboard Total RAM    4096 MB

Table 3 - CPU Specifications

Procedure
Each of the five operations was measured individually. This helped me to identify bottlenecks in
the processes and it served as a guide in designing my hardware solution. In order to reduce the
impact of measurement on the overall time, measurements were taken every fifty operations
and averaged. With a CPU solution there are many factors to consider. First, the core speed
and capabilities of the CPU itself play a large role in the overall speed. I purposely chose a more
powerful processor for this portion of the experiment. While even faster CPUs do exist, I think
this one gives a good general sense of what a CPU solution can do. Second is the efficiency of the
code itself. Better code will give faster results. Again, more optimal solutions are likely to
exist, but the code I have created provides a good general sense of what this solution is like.
Finally, on a general purpose CPU, it is difficult to filter out how much delay is caused by the
other duties of the microprocessor. These include running the operating system, background
programs, and anything else that is running concurrently with the program. In my experiment, I
strove to reduce the total amount of extraneous work done by the CPU so that it could focus on
the solution.
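
The batched measurement approach can be sketched as below. This is a Python illustration under stated assumptions, not the C# measurement code used in the experiment; the function names and the dictionary insert being timed are hypothetical.

```python
import time


def time_in_batches(operation, inputs, batch_size=50):
    """Return the average seconds per call for each batch of calls.

    The clock is read only at batch boundaries, so the cost of taking
    a measurement is spread over batch_size operations.
    """
    averages = []
    for start in range(0, len(inputs), batch_size):
        batch = inputs[start:start + batch_size]
        t0 = time.perf_counter()
        for item in batch:
            operation(item)
        elapsed = time.perf_counter() - t0
        averages.append(elapsed / len(batch))
    return averages


# Hypothetical operation under test: inserting into a dictionary.
store = {}


def add(order_id):
    store[order_id] = order_id


samples = time_in_batches(add, list(range(500)))
```

Each entry in samples is one averaged data point of the kind that would feed a latency histogram.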

GPU
The second solution is to use a GPU (also called a visual processing unit or VPU) for the five
functions described in the previous section. A GPU is a specialized processor that offloads 3D
graphics rendering from the microprocessor. A GPU solution consists of many of the same
elements, and therefore carries many of the same caveats, as the CPU solution. GPUs are built
for different kinds of operations and accordingly offer different functionality than a CPU. A GPU
is typically good at performing operations in parallel. Parallel operations can offer an
advantage over serial operations in terms of speed and efficiency [17].
Specification [4]
The specifications for the GPU used in my procedure are as follows:

GPU Name                 NVIDIA GeForce GTS 250
GPU Code Name            G92
GPU Technology           65 nm
GPU Core Speed           738 MHz
GPU Total Memory         1024 MB
GPU Memory Type          GDDR3
GPU Interface Type       PCI-Express x16

Table 4 - GPU Specifications

Procedure
The procedure is similar to the procedure with the CPU. I measured each of the five operations
independently, and recorded the results of the tests. These results are reported in a
subsequent section.
CUDA
NVIDIA (the manufacturer of the GPU previously specified) provides a special computing
architecture that runs on its chips. It is called the Compute Unified Device Architecture, or
CUDA [15]. A main feature of CUDA is its parallel execution model. With CUDA, many general
purpose, i.e. non-graphical, problems can be solved on a GPU. CUDA has been shown to be
very useful for accelerating problems in biology, cryptography and other complex fields. CUDA
focuses on running programs on many threads in parallel rather than running one thread very
fast. An additional feature of a GPU is access to shared memory. A shared memory region is
accessible by all threads in a thread block, allowing for faster throughput. CUDA can be used
from a number of different languages, including C, Fortran, Java, .NET and MATLAB, either
directly or through language bindings.

FPGA
The final solution is an implementation of the abovementioned functions using a field
programmable gate array (FPGA). FPGAs are integrated circuits with a feature of being
configurable after manufacture. FPGAs offer many advantages over CPUs and even GPUs,
primarily with regard to speed. Using a hardware description language, such as VHDL, I can
specify the physical layout of a circuit with the ability to parse and store the data I am
concerned with. Implementing a solution at the hardware level is faster due to specialization,
and by having a dedicated device to provide the operations. A CPU or a GPU must perform
other functions needed to maintain an operating system and other running programs, while an
FPGA does not. The FPGA should provide the fastest string parsing and data storage among the
three tests. Altera, a producer of a large number of FPGAs with a wide range of functionality,
manufactures the chip I used. I chose the latest Stratix IV chip to take advantage of its speed
and functionality, the core features of which are shown in Table 5 below.
Specifications [12]
The specifications for the FPGA used in my procedure are as follows:

FPGA Name                Stratix IV
FPGA Data Rates          Up to 11.3 Gbps
FPGA Power Consumption   100 mW
FPGA RAM Support         DDR, DDR2, DDR3, SDRAM
FPGA Interface Type      PCIe

Table 5 - FPGA Specifications

Procedure
Again, the procedure is similar to the experiments with the CPU and GPU. I measured each of
the five operations separately. I recorded the results of the tests. These results are reported in
a subsequent section.
VHDL
There exist a few different languages with which I could program my FPGA solution. I chose
VHSIC hardware description language, or VHDL. VHDL was developed in the 1980s by the US
Department of Defense to remedy a number of issues relating to hardware design [1]. Over the
years it has been updated and revised and is now a top choice when designing hardware [13]. I
used Altera’s Quartus software to build my hardware solution with VHDL. Inside the software,
individual modules can be prototyped, modeled, and unit tested. The modules are then
interconnected inside the program and the complete hardware solution can be simulated. The
software also provides an interface with the FPGA chip itself.
Parsing
There are a total of eight modules that are used in the parsing process. Ideally, data flows
through them freely and is independent of a clock signal; this allows for maximum speed. The
first is similar to a random access memory (RAM) module, but is more narrowly focused. Its
function is to act as a buffer that stores each incoming message from the network socket. It is
capable of storing 16 words of 32 bits each, for a total of 64 bytes. Each message is loaded into
the RAM module one word at a time. Once the entire message is loaded, the module is disabled
for writing, and the parsing process begins. The segmentation of data in this module allows
for a finer degree of retrieval during the parsing process. During parsing, some parts of the
message play a more important role than others. This segmentation enables me to more quickly
grab important data that may be embedded in the middle of the message. Figure 2 below shows
the block diagram as it appears in the Quartus software; its underlying VHDL code can be found
in Appendix C.

Figure 2 - Ram Buffer Module
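
The behavior of the buffer can be modeled in software as in the sketch below. This is a Python model written for illustration, not a translation of the VHDL in Appendix C; the class and method names are assumptions, while the 16 x 32 bit geometry is taken from the text.

```python
WORDS = 16          # buffer depth stated in the text
WORD_BYTES = 4      # 32-bit words


class MessageBuffer:
    """Software model of a 16 x 32-bit message buffer."""

    def __init__(self):
        self.words = [0] * WORDS
        self.write_enable = True

    def load(self, message):
        """Load a message one 32-bit word at a time, then lock writes."""
        if len(message) > WORDS * WORD_BYTES:
            raise ValueError("message exceeds the 64-byte buffer")
        padded = message.ljust(WORDS * WORD_BYTES, b"\x00")
        for i in range(WORDS):
            chunk = padded[i * WORD_BYTES:(i + 1) * WORD_BYTES]
            self.words[i] = int.from_bytes(chunk, "big")
        self.write_enable = False  # parsing may now begin

    def read_word(self, index):
        """Fetch a single word without touching the rest of the message."""
        return self.words[index]


buf = MessageBuffer()
buf.load(bytes(range(64)))
third_word = buf.read_word(2)   # bytes 8..11 of the message
```

Addressing individual words is what allows a field in the middle of the message to be read without stepping through everything before it.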

The next block allows me to change the order of the data stream from network byte order to
host byte order. As in the previous parsing techniques on the CPU and GPU, this step is
important so I can deal with the message in the correct order. Since the operation simply
switches the order of bytes on the bus wires, there is no cost in processing time. It can be wired
so that the last byte becomes the first byte and vice versa. Data can flow straight through
without the need for an extra clock cycle. The block diagram is shown in Figure 3 below, and its
underlying code can be found in Appendix C.

Figure 3 - Network to Byte Order Module
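
For comparison, the same byte reversal in software takes explicit work on every word. The following Python sketch shows the operation on a single 32 bit word; the function name is an assumption made for the example.

```python
def swap_byte_order(word):
    """Reverse the four bytes of a 32-bit word."""
    return int.from_bytes(word.to_bytes(4, "big"), "little")


network_word = 0x12345678
host_word = swap_byte_order(network_word)
```

In the hardware version this mapping is fixed wiring, so it contributes no clock cycles.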

The next phase in the parsing process requires me to look at the incoming message for the
chunk of data that identifies the message type. There are three possibilities: add, modify and
delete. The function of this block, shown in Figure 4 below, is to strip out that specific part of
the message and enable the next phase accordingly. This module will read in the message, look
at just the part that is of concern (the message type), send the message through the output,
and send an activation signal to the module that corresponds with the message type. As with
the previous block, there is no time cost here since data flows straight through without clock
advancement. The code is found in Appendix C.

Figure 4 - Message Type Parser Module
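
The pass-through and activation behavior of this block can be sketched as follows. This Python model is illustrative only: the type codes and the position of the type field are hypothetical placeholders, since the real values are defined by the message formats in Appendix D.

```python
# Hypothetical type codes and field position; the real values are
# defined by the message formats in Appendix D.
MSG_ADD, MSG_MODIFY, MSG_DELETE = 1, 2, 3
TYPE_OFFSET = 0


def parse_message_type(message):
    """Return the untouched message plus one enable flag per handler."""
    msg_type = message[TYPE_OFFSET]
    enables = {
        "add": msg_type == MSG_ADD,
        "modify": msg_type == MSG_MODIFY,
        "delete": msg_type == MSG_DELETE,
    }
    return message, enables


message_out, enables = parse_message_type(bytes([MSG_MODIFY, 0, 0, 0]))
```

Exactly one flag is raised for a valid message, which corresponds to activating exactly one of the three downstream modules.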

The next three modules in the parsing process are designed specifically to handle the three
message types and, depending on the message, set up the subsequent storage phase. The
simplest of these three blocks is the delete module. Since the delete message does not require
me to extract any additional data from the incoming message, on the next clock signal I can
simply clear out the storage location at the corresponding address. The add and modify
modules work in similar ways. They prepare a storage area in RAM to accept new data, or
modify existing data, on the next clock signal. They also activate the data extraction module so
it is prepared for either an add or a modify message. The three individual blocks are shown in
Figures 5, 6 and 7 below, and their code can be found in Appendix C.

Figure 5 - Add Module

Figure 6 - Modify Module

Figure 7 - Delete Module

The output of the previous three modules needs to be fed into the storage component. Since
the component can only take one input, I need a way to select from the outputs of the add,
delete, and modify modules. The next piece of the circuit is a multiplexer, shown in Figure 8,
with underlying code in Appendix C. This takes the three outputs from the previous stage and
reduces them to a single output line. Using a two bit selection signal, I can select which input
goes to the output and on to the subsequent storage module.

Figure 8 - Multiplexer
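
A three-input multiplexer with a two bit select can be modeled as below. This Python sketch is not the VHDL from Appendix C; in particular, the assignment of select values to inputs is an assumption made for the example.

```python
def mux3(select, add_out, modify_out, delete_out):
    """Route one of three inputs to the single output.

    select is a two-bit value; the encoding below is an assumption.
    """
    if select == 0b00:
        return add_out
    if select == 0b01:
        return modify_out
    if select == 0b10:
        return delete_out
    raise ValueError("select value 0b11 is unused in this sketch")


routed = mux3(0b01, add_out=0xAAAA, modify_out=0xBBBB, delete_out=0xCCCC)
```

Two select bits give four codes, so one code is left unused when only three inputs are routed.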

The final stage before storage is a simple latch. This allows me to momentarily hold the
message ID number obtained during parsing while the storage phase runs. Without this latch,
as soon as the input to the parsing stage changed, the change would propagate into the storage
stage. The latch prevents that problem and keeps the essential data in place to be processed.
The data inside the latch remains there until a signal is fed to the gate input as shown in
Figure 9 below.

Figure 9 - Latch
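
The hold behavior can be illustrated with a small software model. This Python sketch assumes a level-sensitive latch whose output follows the input while the gate is high and holds otherwise; the class name and the sample values are hypothetical.

```python
class GatedLatch:
    """Model of a level-sensitive latch with a gate input."""

    def __init__(self):
        self.q = 0

    def update(self, d, gate):
        """When gate is high the output follows d; otherwise it holds."""
        if gate:
            self.q = d
        return self.q


latch = GatedLatch()
latch.update(d=1234, gate=True)          # capture the ID from parsing
held = latch.update(d=9999, gate=False)  # parser input changes; latch holds
```

While the gate is low, changes at the parser output do not reach the storage stage.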

Storage
Apart from the signals received at the end of the parsing process, two modules are used in the
storage process. At the end of the storage process, data is stored in a block called parsed RAM;
this block is simply another random access memory component. Using real world data, order
books can grow quite large, especially for a large centralized exchange like NYSE Arca.
Being able to store each order individually requires a substantial amount of RAM. Here the
RAM is only 65536 words long. In reality that length should be much larger, somewhere on the
order of gigabytes. Word length is 32 bits; Arca specifies that its order IDs have a length of 4
bytes. Since every order is unique, I can take advantage of the 32 bit word length by simply
storing each order at the RAM location that corresponds to its order ID. This allows for maximum
processing speed, since I am not required to perform any additional calculation, such as a hash,
to come up with a unique storage location. This RAM module also allows for simultaneous
reading and writing. This is ideal since I want to access this storage element asynchronously
for both reads and writes. Reading can take place at any time, and a real world system
would likely be reading almost constantly. Writing only takes place when it is enabled by one of
the three activation modules. The block diagram is shown below in Figure 10; underlying code
is in Appendix C.

Figure 10 - Parsed RAM Storage
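
Direct addressing by order ID can be sketched as follows. This Python model is an illustration, not the VHDL from Appendix C. It assumes that order IDs are reduced to the address space of the small simulated RAM by keeping their low 16 bits; that mapping and the function names are assumptions made for the example.

```python
DEPTH = 65536  # number of words in the simulated parsed RAM

# One slot per address; None marks an empty location.
parsed_ram = [None] * DEPTH


def address_of(order_id):
    """Map an order ID straight to a RAM address, with no hash step.

    Assumption: the ID is reduced to this small RAM's address space by
    keeping its low 16 bits. A full-size RAM would use the whole ID.
    """
    return order_id & (DEPTH - 1)


def write(order_id, data, write_enable):
    """Store data only when an activation module enables writing."""
    if write_enable:
        parsed_ram[address_of(order_id)] = data


def read(order_id):
    """Reads are always allowed and do not wait on writes."""
    return parsed_ram[address_of(order_id)]


write(42, data=0xDEADBEEF, write_enable=True)
write(43, data=0x12345678, write_enable=False)  # ignored: not enabled
```

The address is obtained with a single mask rather than a hash computation, which is the source of the speed advantage described above.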

The final component used in the storage operation is the data extraction module. The key
function here is to cut down the total message length. Initially, each individual message is
quite long, almost 64 bytes. I want to remove any superfluous data and
just store the information I am concerned about. In reality, there are only a handful of fields of
real concern, such as the symbol index (which identifies the underlying security), the quantity,
the order price, and the side (buy or sell). All other data, such as source time, exchange IDs, and
firm indices, can be safely discarded. Depending on the input from the activation modules, the
data extractor can operate in one of two modes, add or modify. Since add and modify messages
are slightly different, the essential data lies in a different portion of each message, and
therefore it must be extracted differently. In both cases, essential data can be pulled out from
the message and reduced down to a 64 bit word. Those 64 bit words are then fed into the
parsed RAM module at the location that corresponds to the order ID. This is one of the slower
operations in my system since it has to take time to read each of the words that make up the
message from the segmented buffer. The block diagram is shown in Figure 11 below; see
Appendix C for the VHDL code.

Figure 11 - Data Extractor
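
Reducing the essential fields to a single 64 bit word can be sketched as below. This Python example is illustrative only: the field widths are hypothetical values chosen to sum to 64 bits, and the real field positions and sizes come from the message formats in Appendix D.

```python
# Hypothetical field widths that sum to 64 bits; the real positions
# and sizes come from the message formats in Appendix D.
SYMBOL_BITS, QTY_BITS, PRICE_BITS, SIDE_BITS = 16, 16, 31, 1


def pack_order(symbol_index, quantity, price, is_buy):
    """Pack the essential fields into a single 64-bit word."""
    word = symbol_index & ((1 << SYMBOL_BITS) - 1)
    word = (word << QTY_BITS) | (quantity & ((1 << QTY_BITS) - 1))
    word = (word << PRICE_BITS) | (price & ((1 << PRICE_BITS) - 1))
    word = (word << SIDE_BITS) | (1 if is_buy else 0)
    return word


def unpack_order(word):
    """Recover the fields from a packed 64-bit word."""
    is_buy = bool(word & 1)
    word >>= SIDE_BITS
    price = word & ((1 << PRICE_BITS) - 1)
    word >>= PRICE_BITS
    quantity = word & ((1 << QTY_BITS) - 1)
    word >>= QTY_BITS
    symbol_index = word & ((1 << SYMBOL_BITS) - 1)
    return symbol_index, quantity, price, is_buy


packed = pack_order(symbol_index=517, quantity=300, price=254900, is_buy=True)
```

Add and modify messages would feed this packing step from different offsets in the buffered message, which is why the extractor needs two modes.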

Retrieval
Retrieval is the final step in my system, and is also the most costly. There are two blocks
involved in the reading process. One has already been covered: the parsed RAM module that
stores the complete order book. The second is a simple counter module that cycles through the
contents of the parsed RAM, causing it to sequentially output the entire order book. Whenever a
request for the order book is made, the module takes that single signal and then sets the read
address to each memory address, one by one, at each new clock signal. This is a time
consuming process requiring one clock cycle for each memory location. The exact time to
complete the entire read is determined by the total length of the RAM. It can be made more
intelligent by storing the last order ID number received and stopping at that point, reducing the
number of clock signals wasted reading empty data. The block diagram is shown below in
Figure 12.

Figure 12 - Retrieval Counter
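
The cost of the counter-driven scan can be estimated with a short model. This Python sketch assumes one read per clock cycle, the 65536 word RAM described in the Storage section, and the 100 MHz design clock stated in the Conclusion; the function and parameter names are hypothetical.

```python
CLOCK_HZ = 100_000_000   # 100 MHz design clock (stated in the Conclusion)
DEPTH = 65536            # words in the simulated parsed RAM


def retrieve_all(ram, last_address=None):
    """Step a counter through the RAM, one address per clock cycle.

    Returns the words read and the number of clock cycles consumed.
    last_address models the suggested optimization of stopping at the
    highest address written so far.
    """
    stop = len(ram) if last_address is None else last_address + 1
    output = []
    cycles = 0
    for address in range(stop):   # the counter drives the read address
        output.append(ram[address])
        cycles += 1
    return output, cycles


ram = [0] * DEPTH
ram[10] = 0xABCD

_, full_cycles = retrieve_all(ram)
_, short_cycles = retrieve_all(ram, last_address=10)

full_scan_seconds = full_cycles / CLOCK_HZ
```

Under these assumptions a full scan costs 65536 cycles, or roughly 6.6x10^-4 seconds at 100 MHz, which is the same order of magnitude as the simulated retrieval time reported in the Results. Stopping at the last written address shortens the scan in proportion to how much of the RAM is in use.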

Complete Design
The complete design with all the inter-module connections is shown in Figure 13. Data
flows from the left side to the right side. On the right side, a single output pin is provided to
supply the retrieved message. All the data is written out in parallel for maximum speed.

Figure 13 - Complete Design with Interconnects

Results
The results of each solution are presented, analyzed and compared in the following sections.

CPU
The results for the CPU solution are presented first and will be used as a baseline against which I
will compare the other solutions.
Parsing
The first step in the process is parsing; its results are displayed first, in Figure 14. The results
are about what I expected. There is a large concentration at the left, pushing down toward 10^-6
seconds, followed by a steep and then gentler decline out to about 10^-2 seconds. This makes
sense; for the most part the operation is very quick, unless the CPU is otherwise engaged. Since
there are numerous other activities the CPU could be performing, the delay varies greatly. Most
of the delays are not very long, but some can last one hundredth of a second.

Figure 14 - Parsing Data from Network Stream (CPU)

Storage
Add
The next operation to illustrate is add. The histogram of the combined trials is shown in Figure
15. The results here are a bit different from those of the parse operation. There are two areas of
note. The first is the spike on the left side: about half the time the operation takes between
10^-7 and 10^-6 seconds. This is the optimal performance on the CPU. The other half of the
results are in the 10^-6 and 10^-5 ranges. This is caused by the CPU having to perform other
operations. Similar to the delays during parsing, if the CPU has to perform other tasks, adding
to the data structure may be delayed.

Figure 15 - Adding to the Data Structure (CPU)

Delete
The results of the delete operation are very similar to those of the add operation. Again, there
are two notable areas. The first is the spike on the left, in the same range as the one in the add
histogram. The left area again represents optimal performance. There is also a group of trials
further to the right. This group is a little bit slower than the same area in the add histogram,
since deleting an entry from the data structure can be a little bit slower. To add, I can just put
the new entry into the structure, but to delete I must first retrieve the entry and then clear it.

Figure 16 - Deleting from the Data Structure (CPU)

Modify
The last of the storage operations is modify. The modify trials were much different from add
and delete, as shown in Figure 17. There is only one concentration of results, which falls nearly
entirely into the 10^-5 to 10^-4 range. This makes the modify operation the slowest of the three.
This is to be expected; the modify operation is more complex than add or delete. Here I have to
retrieve an entry in the data structure, update it with the new content and reinsert it into the
structure.

Figure 17 - Modifying the Data Structure (CPU)

Retrieval
The last set of data is for the retrieval operation. The results here are quite typical. The data
follows a regular bell shape centered just below 10^-5 seconds. The complete graph is in Figure
18.

Figure 18 - Reading from the Data Structure (CPU)

GPU
The GPU results are very similar to those of the CPU. This is expected since a GPU and CPU
would approach the procedures in a similar way. GPU advantages are more significant when
performing parallel operations. My stream of financial data is serial by nature, so those
advantages are not very pronounced in my results. Nevertheless, a GPU does offer a few other
advantages. The GPU is not concerned with many of the operations the CPU must perform, like
those to maintain an operating system and other running programs. Clock speeds and overall
transistor topologies are different as well.

Parsing
As before, the first graph presented is for parsing; it is shown in Figure 19. I found these results
particularly interesting. First, the elapsed time for this operation on the GPU seems to push
harder down into the 10^-6 range. Overall, more results are down in that area, meaning that
there is some speed gain on the GPU. The shape of the histogram as a whole is different from
that of the CPU. The CPU showed a smooth curve with results tapering off as they approached the
10^-4 range. Here there are two distinct peaks, one around 10^-6 and one at 10^-5. There are
fewer outliers beyond those ranges. This suggests that other factors take the focus of the GPU
off parsing almost half the time. This could be any number of things. The
GPU has many other responsibilities, such as maintaining a graphical user interface (GUI) and
any other tasks relating to graphics or video.

Figure 19 - Parsing Data from the Network Stream (GPU)

Storing
Next I present the three graphs that I created from the measurement data of the storage
operations. The parsing operation was comparable to the CPU solution as far as time is
concerned, but the storage operations as a whole seem to be slower on the GPU. I am
beginning to conclude that the GPU solution is not an optimal one.
Add
First is the add operation. Here the results seem to be contained mostly in the 10^-6 to 10^-5
range. The CPU solution was mostly pushing into the sub 10^-6 territory. Again I see a
dual peak pattern similar to the parsing results. To me, this suggests the normal operation of
the GPU affects the latency of the add operation about half of the time.

Figure 20 - Adding to the Data Structure (GPU)

Delete
The delete operation is next. The aggregated data is shown in histogram form in Figure 21.
Again, I see the dual peaks, although these are a little bit different. One of the peaks, the one
showing the slower trials, is much less pronounced. As with the CPU, most of the trials fall
below 10^-6, but they are a bit more spread out. Overall, as with add, the operation is slightly
slower.

Figure 21 - Deleting from the Data Structure (GPU)

Modify
The data gathered from the last of the storage operations, modify, is shown in Figure 22.
This operation is a little bit slower than delete and slower than the same operation on
the CPU. One significant detail is that the dual peaks of the previous operations are now almost
gone. This suggests that the modify operation is interrupted less often by other tasks.

Figure 22 - Modifying the Data Structure (GPU)

Retrieval
Finally, I will look at the results of the retrieval operation on the GPU. The data here is very
interesting. This operation is slightly faster than in the CPU solution. The data still falls in the
same 10^-6 to 10^-5 range but it is skewed more toward the former. Also, as with the modify
operation, the double peaks are gone. This suggests that this type of operation is not affected
by other operations performed by the GPU itself. The graph is shown in Figure 23.

Figure 23 - Reading from the Data Structure (GPU)

FPGA
Due to the sheer expense of a prototype board with a Stratix IV type FPGA, one was not made
available to me for testing. Therefore, the results presented here were gathered on the hardware
simulator in Altera’s Quartus software. While not one hundred percent realistic (it provides
more of a test in theory), simulation gives a good approximation of latencies. These results at
least allow for comparison against the results of the GPU and CPU tests.
Parsing
The first result graph is for parsing. As with previous parsing tests, this graph shows the elapsed
time for each parsing operation versus the number of times that elapsed time occurred. This
graph is a little different; here the elapsed time axis is on a linear rather than a logarithmic
scale. It also shows that one hundred percent of the operations lasted the same amount of
time, specifically about 1x10^-4 seconds. This is one of the great features of an FPGA. There is
no overhead to run this and the subsequent operations. The hardware chip has only to do the
operations I specify. This allows for a static and predictable latency, which helps to optimize
strategies since I can know exactly how long an action will take to complete. While this is a
desirable feature of the FPGA, it should be noted that these results are overall slower than the
GPU and CPU tests. Those showed latencies in the 10^-5 and 10^-6 ranges. This is due to those
processors running at much higher clock speeds. Here my simulation is limited to a slower
speed. This obstacle can be overcome, however, by using a faster clock speed, since the
elapsed time for this operation is directly related to the clock speed. These results are in
line with what would be expected. Apart from retrieval, the parsing process is the slowest of the
operations.

Figure 24 - Parsing Data from the Network Stream (FPGA)

Storing
Next, the storage operations are discussed. Again, these are simulation results so the graphs
show a single static value occurring over and over.

Add
Below, in Figure 25, I have displayed the results of the add operation simulation on the FPGA.
Here I am beginning to see the real improvements of the FPGA over the CPU and GPU solutions.
The time required to store a new message into RAM is about 1.6x10^-7 seconds. CPU and GPU
solutions were in the 10^-6 range. Since this is the most common of the storage operations,
this gain in speed will be very significant.

Figure 25 - Adding to the Data Structure (FPGA)

Delete
Now I will look at the results for delete. Delete is an extremely fast operation on the FPGA.
Essentially, to perform a delete I need only to identify a single number from the incoming
message (the order number) and clear that location in memory. The results in Figure 26 show
this operation taking about 2x10^-8 seconds. This is much faster than both the CPU and GPU.
Deletes happen very often, so having the operation sped up by that much provides a nice
advantage.

Figure 26 - Deleting from the Data Structure (FPGA)

Modify
Again, the last storage operation is modify. The results of the simulation of this operation are
shown in Figure 27. As with add and delete, the modify operation is much faster on the FPGA
than on the CPU or GPU. This operation takes 1.6x10^-7 seconds. Notably, this is the same
duration as the add operation. This is expected since the modify operation is very similar to the
add operation. Furthermore, in my design the operation for both is almost identical. The only
difference is that the modify message modifies content already in memory, and the add loads
new content. Either way, the data is parsed the same way and travels down the circuit via the
same path.

Figure 27 - Modifying the Data Structure (FPGA)

Retrieval
Finally, I will discuss the last operation, retrieval. This result was surprising to me. This
operation overall is very slow compared to the CPU and GPU. The data graph is shown in Figure
28. The elapsed time for this step is 6x10^-4 seconds. The CPU and GPU perform this same
operation down in the 10^-6 area. This increase in time is due to the limitations of the Stratix IV
chip itself. The clock speed is limited on the Stratix IV chip and is substantially slower than that
of the GPU or CPU. Given that my implementation retrieves data serially, this could be a
drawback when many retrievals must be performed.

Figure 28 - Reading from the Data Structure (FPGA)

Conclusion
The FPGA solution is effective. It can outperform the CPU and GPU solutions by implementing
all necessary functionality in hardware rather than software. There is one stipulation: the
entire process depends on the clock speed of the hardware design. I used a 100 MHz clock
in my design. While this performed admirably, it did not speed up every operation; retrieval
in particular remained slower than on the CPU and GPU. This can be addressed by increasing
the clock speed. If I were to use 500 MHz or even 1 GHz, I would expect substantial speed
improvements. In fact, for all three solutions, performance depends on clock speed. By
nature, parsing data from an incoming data stream is a purely serial operation. I receive
data at a fixed speed, and I have to process messages as they come in. The only true way to
improve speed is to increase the rate at which the processor performs calculations, which
means increasing the clock speed. There are myriad solutions for parsing financial data
feeds, or any data feed, implemented in many different ways using many different
technologies. By performing these operations directly in hardware, I eliminated bottlenecks
and extraneous calculations. This research shows that data feed parsing at this level is
fast and efficient.
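
The clock-speed argument above can be made concrete with a small calculation. This is a minimal sketch under two assumptions that the thesis does not establish: that an operation takes a fixed number of clock cycles regardless of frequency, and that the design would still meet timing at the higher clocks. The function `projected_time` is a hypothetical helper, and the 6e-4 input is the retrieval figure from the Retrieval section, assumed to be in seconds.

```python
def projected_time(measured_s: float, measured_hz: float, target_hz: float) -> float:
    """Scale a measured time to another clock, assuming a fixed cycle count."""
    return measured_s * measured_hz / target_hz


MEASURED_S = 6e-4      # assumption: FPGA retrieval time in seconds
MEASURED_HZ = 100e6    # 100 MHz clock used in the design

if __name__ == "__main__":
    for target_hz in (500e6, 1e9):
        t = projected_time(MEASURED_S, MEASURED_HZ, target_hz)
        print(f"{target_hz / 1e6:.0f} MHz -> {t:.2e} s")
```

The projected values can be compared directly with the CPU and GPU timings reported for the same operation.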

Abstract
My study reduces the time required to parse, store, and retrieve information from high-bandwidth
digital data streams. I achieve this reduction with the creation of unique hardware based on
field programmable gate arrays (FPGAs). I first created software that performs the
aforementioned operations using a standard x86 microprocessor, as well as a dedicated
graphics processing unit. Measurements taken using this software provided a baseline to
establish the improvements achieved with my FPGA solution. I then created a virtual prototype
of FPGA based hardware that performs the same operations. As this design was tested using
real world data, I made improvements and eliminated bottlenecks in the operations.
Ultimately, I achieved a design that converts a data stream into usable information as quickly as
possible. My research has many potential benefits. I primarily focused on advantages gained
using streams of financial data. By reducing the time needed to have usable information from a
financial exchange, investors would have an advantage over competitors. Similarly, this
technology provides benefit to any industry that has need of rapid decoding of real-time data.
Much of my research is a continuation of similar efforts to increase data stream decoding
performance, such as those of the Human Genome Project or the lxml project described in IBM's
developerWorks library. My work is distinctive in introducing the FPGA hardware element.
Working at the hardware level allows a greater reduction in time.

References
1. "A Brief History of VHDL." Doulos - Global Independent Leaders in Design and Verification Know-how.
Web. 23 May 2010.
<http://www.doulos.com/knowhow/vhdl_designers_guide/a_brief_history_of_vhdl>.
2. Daly, Liza. "High-performance XML Parsing in Python with Lxml." IBM - United States. Web. 07 Sept.
2010. <http://www.ibm.com/developerworks/xml/library/x-hiperfparse/>.
3. "Dell Precision T7400 Workstation." Dell – The Official Site. Web. 3 July 2010.
<http://www.dell.com/us/en/dfb/desktops/precn_t7400/pd.aspx?refid=precn_t7400&cs=28&s=dfb>.
4. "GeForce GTS 250." NVIDIA. Web. 02 Mar. 2010.
<http://www.nvidia.com/object/product_geforce_gts_250_us.html>.
5. "Human Genome Project Information." Oak Ridge National Laboratory. Web. 03 July 2010.
<http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml>.
6. Kroft, Steve. "How Speed Traders Are Changing Wall Street - 60 Minutes - CBS News." Breaking News
Headlines: Business, Entertainment & World News - CBS News. Web. 01 June 2010.
<http://www.cbsnews.com/stories/2010/10/07/60minutes/main6936075.shtml>.
7. "Lxml - Processing XML and HTML with Python." Codespeak Home Page. Web. 07 Aug. 2010.
<http://codespeak.net/lxml/>.
8. "Microsoft BizTalk Server." Microsoft BizTalk Server. Web. 12 May 2010.
<http://www.microsoft.com/biztalk/>.
9. "NYSE Arca Equities Overview." NYSE, New York Stock Exchange. Web. 15 Feb. 2010.
<http://www.nyse.com/equities/nysearcaequities/1156241406908.html>.
10. NYSE ArcaBook Multicast for Equities Client Specification. NYSE. PDF.
11. "NYSE ArcaBook." NYXdata. Web. 6 June 2010. <http://www.nyxdata.com/arcabook>.
12. "Stratix IV FPGA: High Density, High Performance AND Low Power." FPGA CPLD and ASIC from Altera.
Web. 07 June 2010. <http://www.altera.com/products/devices/stratix-fpgas/stratix-iv/stxiv-index.jsp>.

13. "VASG: VHDL Analysis and Standardization Group." EDA-STDS.ORG Home Page. Web. 23 May 2010.
<http://www.eda.org/vhdl-200x/>.
14. "Wall Street's Speed War." Investoholic Blog. Web. 07 May 2010. <http://investoholic.net/general/99wall-streets-speed-war>.
15. "What Is CUDA?" NVIDIA. Web. 02 Mar. 2010.
<http://www.nvidia.com/object/what_is_cuda_new.html>.
16. "What Is FIX?" The FIX Protocol Organization. Web. 07 Jan. 2010. <http://www.fixprotocol.org/what-isfix.shtml>.
17. "What Is GPU Computing?" NVIDIA. Web. 2 Mar. 2010.
<http://www.nvidia.com/object/GPU_Computing.html>.
18. "Who Uses FIX?" The FIX Protocol Organization. Web. 07 Jan. 2010.
<http://www.fixprotocol.org/adopters/>.

Figure List
Figure 1 - General Flow of Data .................................................................................................... 11
Figure 2 - Ram Buffer Module....................................................................................................... 18
Figure 3 - Network to Byte Order Module .................................................................................... 19
Figure 4 - Message Type Parser Module....................................................................................... 20
Figure 5 - Add Module .................................................................................................................. 20
Figure 6 - Modify Module ............................................................................................................. 21
Figure 7 - Delete Module .............................................................................................................. 21
Figure 8 - Multiplexer.................................................................................................................... 22
Figure 9 - Latch .............................................................................................................................. 22
Figure 10 - Parsed RAM Storage ................................................................................................... 23
Figure 11 - Data Extractor ............................................................................................................. 24
Figure 12 - Retrieval Counter ........................................................................................................ 25
Figure 13 - Complete Design with Interconnects ......................................................................... 25
Figure 14 - Parsing Data from Network Stream (CPU).................................................................. 26
Figure 15 - Adding to the Data Structure (CPU)............................................................................ 27
Figure 16 - Deleting from the Data Structure (CPU) ..................................................................... 28
Figure 17 - Modifying the Data Structure (CPU) ........................................................................... 29
Figure 18 - Reading from the Data Structure (CPU) ..................................................................... 30
Figure 19 - Parsing Data from the Network Stream (GPU) ........................................................... 31
Figure 20 - Adding to the Data Structure (GPU) ........................................................................... 32
Figure 21 - Deleting from the Data Structure (GPU)..................................................................... 33
Figure 22 - Modifying the Data Structure (GPU) .......................................................................... 34
Figure 23 - Reading from the Data Structure (GPU) ..................................................................... 35
Figure 24 - Parsing Data from the Network Stream (FPGA) ......................................................... 36
Figure 25 - Adding to the Data Structure (FPGA).......................................................................... 37
Figure 26 - Deleting from the Data Structure (FPGA) ................................................................... 38
Figure 27 - Modifying the Data Structure (FPGA) ......................................................................... 39
Figure 28 - Reading from the Data Structure (FPGA) ................................................................... 40

Table List
Table 1 - Arcabook Message Rates ................................................................................................. 9
Table 2 - Arcabook Data Rates ........................................................................................................ 9
Table 3 - CPU Specifications .......................................................................................................... 15
Table 4 - GPU Specifications ......................................................................................................... 16
Table 5 - FPGA Specifications ........................................................................................................ 17

Appendix A – C# Codes
Parsing
Stopwatch stop = new Stopwatch();
stop.Start();
while (start <= end)
{
try
{
_messageRetransSocket.ReceiveFrom(bufRefReceive, ref
_ep);
_secondaryMessageRetransSocket.ReceiveFrom(secondaryBufRefReceive, ref _ep);
} catch (SocketException e)
{
_messageRetransSocket.ReceiveBufferSize = 0;
_secondaryMessageRetransSocket.ReceiveBufferSize = 0;
break;
}
int seqNum =
(int)IPAddress.NetworkToHostOrder(BitConverter.ToInt32(bufRefReceive, 4));
int sSeqNum =
(int)IPAddress.NetworkToHostOrder(BitConverter.ToInt32(secondaryBufRefReceive, 4));
if (seqNum == start)
{
int bMsgType = BigEndToInt16(bufRefReceive, 2);
sw.WriteLine(System.Text.Encoding.ASCII.GetString(bufRefReceive, 0,
bufRefReceive.Length));
//messag++;
//if (messag >= 10000)
//{
//    sw.Close();
//    sw = new StreamWriter("C:\\Temp\\file" + files++.ToString() + ".txt");
//    messag = 0;
//}
if (bMsgType == Defs.msgType_GenericBook)
GenericBookHandler(bufRefReceive);
else
NonBookHandler(bufRefReceive);
start++;
_backupData.TotalRecovered++;
} else if (sSeqNum == start)
{
int bMsgType = BigEndToInt16(secondaryBufRefReceive,
2);
sw.WriteLine(System.Text.Encoding.ASCII.GetString(secondaryBufRefReceive, 0,
secondaryBufRefReceive.Length));
//messag++;
//if (messag >= 10000)
//{
//    sw.Close();
//    sw = new StreamWriter("C:\\Temp\\file" + files++.ToString() + ".txt");
//    messag = 0;
//}
if (bMsgType == Defs.msgType_GenericBook)
GenericBookHandler(secondaryBufRefReceive);
else
NonBookHandler(secondaryBufRefReceive);
start++;
_backupData.TotalRecovered++;
}
bufRefReceive = new byte[2048];
secondaryBufRefReceive = new byte[2048];
i++;
if (i > 1000 || stop.ElapsedMilliseconds > 500)
{
_messageRetransSocket.ReceiveBufferSize = 0;
_secondaryMessageRetransSocket.ReceiveBufferSize = 0;
break;
}
}
_messageRetransSocket.SetSocketOption(SocketOptionLevel.IP,
SocketOptionName.DropMembership,
moPrimary);

_secondaryMessageRetransSocket.SetSocketOption(SocketOptionLevel.IP,
SocketOptionName.DropMembership,
moSecondary);
while (_messageRetransSocket.Available > 0)
_messageRetransSocket.ReceiveFrom(bufRefReceive, ref _ep);
while (_secondaryMessageRetransSocket.Available > 0)
_secondaryMessageRetransSocket.ReceiveFrom(secondaryBufRefReceive, ref _ep);
}
int msgType = BigEndToInt16(bufReceive, 2);
sw.WriteLine(System.Text.Encoding.ASCII.GetString(bufReceive, 0,
bufReceive.Length));
//messag++;
//if (messag >= 10000)
//{
//    sw.Close();
//    sw = new StreamWriter("C:\\Temp\\file" + files++.ToString() + ".txt");
//    messag = 0;
//}
if (msgType == Defs.msgType_GenericBook)
GenericBookHandler(bufReceive);
else
NonBookHandler(bufReceive);
} else if (missed == 0)
{
int msgType = BigEndToInt16(bufReceive, 2);
sw.WriteLine(System.Text.Encoding.ASCII.GetString(bufReceive, 0,
bufReceive.Length));
//messag++;
//if (messag >= 10000)
//{
//    sw.Close();
//    sw = new StreamWriter("C:\\Temp\\file" + files++.ToString() + ".txt");
//    messag = 0;
//}
if (msgType == Defs.msgType_GenericBook)
GenericBookHandler(bufReceive);
else
NonBookHandler(bufReceive);
//sw.Stop();

int start = 0;
Order workingOrder;
int numBodyEntries = (int)buffer[14];
for (int j = 0; j < numBodyEntries; j++)
{
workingOrder.symbolIndex = BigEndToInt16(buffer, start + 16);
int innerMsgType = BigEndToInt16(buffer, start + 18);
workingOrder.sequenceNumber = BigEndToInt32(buffer, start + 20);
switch (innerMsgType)
{
case Defs.msgType_Add:
long orderID = BigEndToInt64(buffer, start + 28);
workingOrder.volume = BigEndToInt32(buffer, start + 36);
int priceNumerator = BigEndToInt32(buffer, start + 40);
int priceScaleCode = (int)buffer[start + 44];
workingOrder.price = PriceScale(priceNumerator,
priceScaleCode);
workingOrder.side = ASCIIEncoding.ASCII.GetString(buffer,
start + 45, 1);
workingOrder.firmIndex = BigEndToInt16(buffer, start + 48);
workingOrder.sessionID = (int)buffer[start + 50];
start += 36;
//pw.GWrite("parse", sp.Elapsed.ToString());
//if (MakeKey(workingOrder.sessionID, workingOrder.symbolIndex) == 6.692)
_orderBook[MakeKey(workingOrder.sessionID,
workingOrder.symbolIndex)].AddEntryPrimary(orderID, workingOrder);
break;
case Defs.msgType_Modify:
orderID = BigEndToInt64(buffer, start + 28);
workingOrder.volume = BigEndToInt32(buffer, start + 36);
priceNumerator = BigEndToInt32(buffer, start + 40);
priceScaleCode = (int)buffer[start + 44];
workingOrder.price = PriceScale(priceNumerator,
priceScaleCode);
workingOrder.side = ASCIIEncoding.ASCII.GetString(buffer,
start + 45, 1);
workingOrder.firmIndex = BigEndToInt16(buffer, start + 48);
workingOrder.sessionID = (int)buffer[start + 50];
start += 36;
//pw.GWrite("parse", sp.Elapsed.ToString());
//if (MakeKey(workingOrder.sessionID, workingOrder.symbolIndex) == 6.692)
_orderBook[MakeKey(workingOrder.sessionID,
workingOrder.symbolIndex)].ModifyEntryPrimary(orderID, workingOrder);
break;
case Defs.msgType_Delete:
orderID = BigEndToInt64(buffer, start + 28);

workingOrder.side = ASCIIEncoding.ASCII.GetString(buffer,
start + 36, 1);
workingOrder.sessionID = (int)buffer[start + 39];
workingOrder.firmIndex = BigEndToInt16(buffer, start + 40);
start += 28;
// if (MakeKey(workingOrder.sessionID, workingOrder.symbolIndex) == 6.692)
//{
//pw.GWrite("parse", sp.Elapsed.ToString());
if (_isInitialized == true)
_orderBook[MakeKey(workingOrder.sessionID,
workingOrder.symbolIndex)].DeleteEntryPrimary(orderID, workingOrder.sequenceNumber);
//}
break;
case Defs.msgType_Imbalance:
//volume = BigEndToInt32(buffer, start + 28);
//int totalImbalance = BigEndToInt32(buffer, start + 32);
//int marketImbalance = BigEndToInt32(buffer, start + 36);
//priceNumerator = BigEndToInt32(buffer, start + 40);
//priceScaleCode = (int)buffer[44];
//string auctionType = ASCIIEncoding.ASCII.GetString(buffer, start + 45, 1);
//exchangeID = ASCIIEncoding.ASCII.GetString(buffer, start + 46, 1);
//securityType = ASCIIEncoding.ASCII.GetString(buffer, start + 47, 1);
workingOrder.sessionID = (int)buffer[start + 48];
//int auctionTime = BigEndToInt16(buffer, start + 50);
start += 36;
//if (MakeKey(workingOrder.sessionID, workingOrder.symbolIndex) == 6.692)
_orderBook[MakeKey(workingOrder.sessionID,
workingOrder.symbolIndex)].Imbalance(workingOrder.sequenceNumber);
break;
default:
//you shouldn't get here
break;
}
}
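
The fixed-offset decoding performed in the Add case above can be sketched compactly in Python. The byte offsets (16, 18, 20, 28, 36, 40, 44, 45, 48, 50) are taken from the C# listing. Everything else is an assumption made for illustration: the function name `parse_add_entry`, the sample values, and in particular the price rule `numerator / 10 ** scale`, since the body of `PriceScale` is not shown in the listing.

```python
import struct


def parse_add_entry(buf: bytes, start: int = 0) -> dict:
    """Decode one Add body entry using the offsets from the C# listing."""
    # Big-endian 16-bit symbol index, 16-bit message type, 32-bit sequence number.
    symbol_index, msg_type, seq_num = struct.unpack_from(">hhi", buf, start + 16)
    # 64-bit order ID, 32-bit volume, 32-bit price numerator,
    # 8-bit price scale code, and a one-byte ASCII side marker.
    order_id, volume, numerator, scale, side = struct.unpack_from(
        ">qiiBc", buf, start + 28
    )
    # 16-bit firm index at offset 48 and 8-bit session ID at offset 50.
    firm_index, session_id = struct.unpack_from(">hB", buf, start + 48)
    return {
        "symbol_index": symbol_index,
        "msg_type": msg_type,
        "sequence_number": seq_num,
        "order_id": order_id,
        "volume": volume,
        # Assumption: the scale code is the number of implied decimal places.
        "price": numerator / 10 ** scale,
        "side": side.decode("ascii"),
        "firm_index": firm_index,
        "session_id": session_id,
    }


if __name__ == "__main__":
    # Build a synthetic 51-byte packet; all values are made up for the demo.
    packet = bytearray(51)
    struct.pack_into(">hhi", packet, 16, 7, 100, 42)
    struct.pack_into(">qiiBc", packet, 28, 123456789, 500, 25375, 2, b"B")
    struct.pack_into(">hB", packet, 48, 9, 3)
    print(parse_add_entry(bytes(packet)))
```

Reading several adjacent fields with one format string keeps the offsets in a single place, which makes it easier to check them against the feed specification.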

Storage
/// <summary>
/// Adds entry to both the order book and consolidated prices list
/// </summary>
/// <param name="id">Order ID</param>
/// <param name="tmp">Order data</param>
public void AddEntryPrimary(long id, Order tmp)
{
Stopwatch sw = new Stopwatch();
sw.Start();
int check = _sequencer.CheckSource(tmp.sequenceNumber);
if (check > 0)
_isCorrupted = true;
if (check != -1)
{

if (tmp.side == "B")
{
if (!_bids.ContainsKey(tmp.price))
{
_bids.Add(tmp.price, new ConsolodatedEntry());
_bids[tmp.price].orderList = new SortedDictionary<long,
int>();
}
if (!_bids[tmp.price].orderList.ContainsKey(id))
{
_bids[tmp.price].orderList.Add(id, tmp.volume);
_bids[tmp.price].totalVolume =
_bids[tmp.price].orderList.Sum(ent => ent.Value);
}
}
else if (tmp.side == "S")
{
if (!_asks.ContainsKey(tmp.price))
{
_asks.Add(tmp.price, new ConsolodatedEntry());
_asks[tmp.price].orderList = new SortedDictionary<long,
int>();
}
if (!_asks[tmp.price].orderList.ContainsKey(id))
{
_asks[tmp.price].orderList.Add(id, tmp.volume);
_asks[tmp.price].totalVolume =
_asks[tmp.price].orderList.Sum(ent => ent.Value);
}
}
_primaryDataStore.Add(id, tmp);
_totalAdds++;
}
sw.Stop();
//pw.GWrite("storeAdd", sw.Elapsed.ToString());
}
/// <summary>
/// Removes entry from both the order book and consolidated prices list
/// </summary>
/// <param name="id">Order ID</param>
/// <param name="sequenceNumber">Source Sequence Number</param>
public void DeleteEntryPrimary(long id, int sequenceNumber)
{
//Stopwatch sw = new Stopwatch();
//sw.Start();
int check = _sequencer.CheckSource(sequenceNumber);
if (check > 0)
_isCorrupted = true;
if (check != -1)
{
if (_primaryDataStore.ContainsKey(id))
{
if (_primaryDataStore[id].side == "B")
{
_bids[_primaryDataStore[id].price].orderList.Remove(id);
if (_bids[_primaryDataStore[id].price].orderList.Count == 0)
_bids.Remove(_primaryDataStore[id].price);
else

_bids[_primaryDataStore[id].price].totalVolume =
_bids[_primaryDataStore[id].price].orderList.Sum(ent => ent.Value);
}
else if (_primaryDataStore[id].side == "S")
{
_asks[_primaryDataStore[id].price].orderList.Remove(id);
if (_asks[_primaryDataStore[id].price].orderList.Count == 0)
_asks.Remove(_primaryDataStore[id].price);
else
_asks[_primaryDataStore[id].price].totalVolume =
_asks[_primaryDataStore[id].price].orderList.Sum(ent => ent.Value);
}
_primaryDataStore.Remove(id);
_totalDeletes++;
}
else
_isCorrupted = true;
}
//sw.Stop();
//pw.GWrite("storeDel", sw.Elapsed.ToString());
}
/// <summary>
/// Modifies entry in both the order book and consolidated prices list
/// </summary>
/// <param name="id">Order ID</param>
/// <param name="tmp">Order Data</param>
public void ModifyEntryPrimary(long id, Order tmp)
{
//Stopwatch sw = new Stopwatch();
//sw.Start();
int check = _sequencer.CheckSource(tmp.sequenceNumber);
if (check > 0)
_isCorrupted = true;
if (check != -1)
{
if (_primaryDataStore.ContainsKey(id))
{
if (_primaryDataStore[id].side == "B")
{
_bids[_primaryDataStore[id].price].orderList.Remove(id);
if (_bids[_primaryDataStore[id].price].orderList.Count == 0)
_bids.Remove(_primaryDataStore[id].price);
else
_bids[_primaryDataStore[id].price].totalVolume =
_bids[_primaryDataStore[id].price].orderList.Sum(ent => ent.Value);
}
else if (_primaryDataStore[id].side == "S")
{
_asks[_primaryDataStore[id].price].orderList.Remove(id);
if (_asks[_primaryDataStore[id].price].orderList.Count == 0)
_asks.Remove(_primaryDataStore[id].price);
else
_asks[_primaryDataStore[id].price].totalVolume =
_asks[_primaryDataStore[id].price].orderList.Sum(ent => ent.Value);
}
if (tmp.side == "B")
{

if (!_bids.ContainsKey(tmp.price))
{
_bids.Add(tmp.price, new ConsolodatedEntry());
_bids[tmp.price].orderList = new SortedDictionary<long,
int>();
}
_bids[tmp.price].orderList.Add(id, tmp.volume);
_bids[tmp.price].totalVolume =
_bids[tmp.price].orderList.Sum(ent => ent.Value);
}
else if (tmp.side == "S")
{
if (!_asks.ContainsKey(tmp.price))
{
_asks.Add(tmp.price, new ConsolodatedEntry());
_asks[tmp.price].orderList = new SortedDictionary<long,
int>();
}
_asks[tmp.price].orderList.Add(id, tmp.volume);
_asks[tmp.price].totalVolume =
_asks[tmp.price].orderList.Sum(ent => ent.Value);
}
_primaryDataStore.Remove(id);
_primaryDataStore.Add(id, tmp);
}
else
_isCorrupted = true;
}
//sw.Stop();
//pw.GWrite("storeMod", sw.Elapsed.ToString());
}
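
The add, delete, and modify methods above all maintain the same structure: orders grouped by side and price, plus a lookup from order ID to the stored order. The following Python sketch shows that structure in a reduced form. It is not a translation of the thesis code: the class name `PriceLevelBook` and its methods are hypothetical, sequence checking is omitted, only sides "B" and "S" are handled, and the per-price total is computed on demand instead of being recomputed with `Sum` after every change.

```python
class PriceLevelBook:
    """Orders grouped by side and price, loosely mirroring _bids and _asks."""

    def __init__(self):
        self.levels = {"B": {}, "S": {}}  # side -> price -> {order_id: volume}
        self.orders = {}                  # order_id -> (side, price)

    def add(self, order_id, side, price, volume):
        if order_id in self.orders:
            return False                  # ignore duplicate order IDs
        self.levels[side].setdefault(price, {})[order_id] = volume
        self.orders[order_id] = (side, price)
        return True

    def delete(self, order_id):
        if order_id not in self.orders:
            return False                  # the C# listing sets _isCorrupted here
        side, price = self.orders.pop(order_id)
        level = self.levels[side][price]
        del level[order_id]
        if not level:                     # drop the price level once it is empty
            del self.levels[side][price]
        return True

    def modify(self, order_id, side, price, volume):
        # As in the listing, a modify is a removal followed by a fresh insert.
        if not self.delete(order_id):
            return False
        return self.add(order_id, side, price, volume)

    def total_volume(self, side, price):
        return sum(self.levels[side].get(price, {}).values())


if __name__ == "__main__":
    book = PriceLevelBook()
    book.add(1, "B", 10.00, 100)
    book.add(2, "B", 10.00, 50)
    book.modify(1, "B", 10.25, 70)
    print(book.total_volume("B", 10.00), book.total_volume("B", 10.25))
```

Keeping the order-ID lookup alongside the price levels is what lets a delete message, which carries no price, find the level it has to update.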

Retrieval
/// <summary>
/// Retrieves the bid side of the order book for the specified symbol
/// </summary>
/// <param name="symbol">The symbol being requested</param>
/// <returns>Bid side of the orderbook sorted by price</returns>
public SortedDictionary<double, ConsolodatedEntry> GetBidPrices(string symbol)
{
lock (_orderBook)
{
//Stopwatch sw = new Stopwatch();
//sw.Start();
double value;
_symbolMap.TryGetValue(symbol, out value);
//tmp dictionary to facilitate stopwatch;
SortedDictionary<double, ConsolodatedEntry> tmp =
_orderBook[value].ConsolodatedBidPrices;
//sw.Stop();
//pw.GWrite("read", sw.Elapsed.ToString());
//if (tmp.Count > 5)
//    _guiLatency = sw.ElapsedMilliseconds;
return tmp;
}
}

Appendix B – CUDA Codes
Parsing
public void Updater()
{
byte[] bufReceive = new byte[2048];
//Stopwatch sw = new Stopwatch();
string test;
GASS.CUDA.Types.Int1 inx;
inx.x = 1;
while (true)
{
string file = "C://Temp/file" + inx.x.ToString() + ".txt";
StreamReader sr = new StreamReader(file);
inx.x++;
if (inx.x > 500)
break;
while ((test = sr.ReadLine()) != null)
{
if (test == "")
continue;
sp = new Stopwatch();
sp.Reset();
sp.Start();
System.Text.UTF8Encoding encoding = new
System.Text.UTF8Encoding();
bufReceive = encoding.GetBytes(test);
if (bufReceive.Length < 4)
continue;
GASS.CUDA.Types.Int1 msgType;
msgType.x = BigEndToInt16(bufReceive, 2);
if (msgType.x == Defs.msgType_GenericBook)
GenericBookHandler(bufReceive);
else
NonBookHandler(bufReceive);
}
sr.Close();
}
getter.Abort();
return;
}
private void GenericBookHandler(byte[] buffer)
{
if (buffer.Length < 40)
return;
GASS.CUDA.Types.Int1 start;
start.x = 0;
Order currentOrder;
GASS.CUDA.Types.Int1 numBodyEntries;
numBodyEntries.x = (int)buffer[14];
GASS.CUDA.Types.Int1 j;
for (j.x = 0; j.x < numBodyEntries.x; j.x++)
{
// Entry fields are read up to index start.x + 50, so that index must exist.
if (start.x + 50 >= buffer.Length)
return;
GASS.CUDA.Types.Int1 innerMsgType;
currentOrder.symbolIndex = BigEndToInt16(buffer, start.x + 16);
innerMsgType.x = BigEndToInt16(buffer, start.x + 18);
currentOrder.sequenceNumber = BigEndToInt32(buffer, start.x + 20);
switch (innerMsgType.x)
{
case Defs.msgType_Add:
GASS.CUDA.Types.Long1 orderID;
orderID.x = BigEndToInt64(buffer, start.x + 28);
currentOrder.volume = BigEndToInt32(buffer, start.x + 36);
GASS.CUDA.Types.Int1 priceNumerator;
priceNumerator.x = BigEndToInt32(buffer, start.x + 40);
GASS.CUDA.Types.Int1 priceScaleCode;
priceScaleCode.x = (int)buffer[start.x + 44];
currentOrder.price = PriceScale(priceNumerator.x,
priceScaleCode.x);
currentOrder.side = ASCIIEncoding.ASCII.GetString(buffer,
start.x + 45, 1);
currentOrder.firmIndex = BigEndToInt16(buffer, start.x + 48);
currentOrder.sessionID = (int)buffer[start.x + 50];
start.x += 36;
parseWrite.GWrite("parse", sp.Elapsed.ToString());
if (!_orderBook.ContainsKey(MakeKey(currentOrder.sessionID,
currentOrder.symbolIndex)))
_orderBook.Add(MakeKey(currentOrder.sessionID,
currentOrder.symbolIndex), new OrderBook());
_orderBook[MakeKey(currentOrder.sessionID,
currentOrder.symbolIndex)].AddEntryPrimary(orderID.x, currentOrder);
break;
case Defs.msgType_Modify:
orderID.x = BigEndToInt64(buffer, start.x + 28);
currentOrder.volume = BigEndToInt32(buffer, start.x + 36);
priceNumerator.x = BigEndToInt32(buffer, start.x + 40);
priceScaleCode.x = (int)buffer[start.x + 44];
currentOrder.price = PriceScale(priceNumerator.x,
priceScaleCode.x);
currentOrder.side = ASCIIEncoding.ASCII.GetString(buffer,
start.x + 45, 1);
currentOrder.firmIndex = BigEndToInt16(buffer, start.x + 48);
currentOrder.sessionID = (int)buffer[start.x + 50];
start.x += 36;
parseWrite.GWrite("parse", sp.Elapsed.ToString());
//if (MakeKey(currentOrder.sessionID, currentOrder.symbolIndex) == 6.692)
if (!_orderBook.ContainsKey(MakeKey(currentOrder.sessionID,
currentOrder.symbolIndex)))
return;
_orderBook[MakeKey(currentOrder.sessionID,
currentOrder.symbolIndex)].ModifyEntryPrimary(orderID.x, currentOrder);
break;
case Defs.msgType_Delete:
orderID.x = BigEndToInt64(buffer, start.x + 28);
currentOrder.side = ASCIIEncoding.ASCII.GetString(buffer,
start.x + 36, 1);
currentOrder.sessionID = (int)buffer[start.x + 39];
currentOrder.firmIndex = BigEndToInt16(buffer, start.x + 40);
start.x += 28;

// if (MakeKey(currentOrder.sessionID, currentOrder.symbolIndex) == 6.692)
//{
parseWrite.GWrite("parse", sp.Elapsed.ToString());
if (!_orderBook.ContainsKey(MakeKey(currentOrder.sessionID,
currentOrder.symbolIndex)))
return;
_orderBook[MakeKey(currentOrder.sessionID,
currentOrder.symbolIndex)].DeleteEntryPrimary(orderID.x, currentOrder.sequenceNumber);
//}
break;
default:
//you shouldn't get here
break;
}
}
}

Storage
/// <summary>
/// Adds entry to both the order book and consolidated prices list
/// </summary>
/// <param name="id">Order ID</param>
/// <param name="tmp">Order data</param>
public void AddEntryPrimary(long id, Order tmp)
{
Stopwatch sw = new Stopwatch();
sw.Start();
if (tmp.side == "B")
{
if (!_bids.ContainsKey(tmp.price))
{
_bids.Add(tmp.price, new ConsolodatedEntry());
_bids[tmp.price].orderList = new SortedDictionary<long, int>();
}
if (!_bids[tmp.price].orderList.ContainsKey(id))
{
_bids[tmp.price].orderList.Add(id, tmp.volume);
_bids[tmp.price].totalVolume = _bids[tmp.price].orderList.Sum(ent
=> ent.Value);
}
} else if (tmp.side == "S")
{
if (!_asks.ContainsKey(tmp.price))
{
_asks.Add(tmp.price, new ConsolodatedEntry());
_asks[tmp.price].orderList = new SortedDictionary<long, int>();
}
if (!_asks[tmp.price].orderList.ContainsKey(id))
{
_asks[tmp.price].orderList.Add(id, tmp.volume);
_asks[tmp.price].totalVolume = _asks[tmp.price].orderList.Sum(ent
=> ent.Value);
}
}
if (!_primaryDataStore.ContainsKey(id))
_primaryDataStore.Add(id, tmp);

sw.Stop();
storeWrite.GWrite("storeAdd", sw.Elapsed.ToString());
}
/// <summary>
/// Removes entry from both the order book and consolidated prices list
/// </summary>
/// <param name="id">Order ID</param>
/// <param name="sequenceNumber">Source Sequence Number</param>
public void DeleteEntryPrimary(long id, int sequenceNumber)
{
Stopwatch sw = new Stopwatch();
sw.Start();
if (_primaryDataStore.ContainsKey(id))
{
if (_primaryDataStore[id].side == "B")
{
_bids[_primaryDataStore[id].price].orderList.Remove(id);
if (_bids[_primaryDataStore[id].price].orderList.Count == 0)
_bids.Remove(_primaryDataStore[id].price);
else
_bids[_primaryDataStore[id].price].totalVolume =
_bids[_primaryDataStore[id].price].orderList.Sum(ent => ent.Value);
}
else if (_primaryDataStore[id].side == "S")
{
_asks[_primaryDataStore[id].price].orderList.Remove(id);
if (_asks[_primaryDataStore[id].price].orderList.Count == 0)
_asks.Remove(_primaryDataStore[id].price);
else
_asks[_primaryDataStore[id].price].totalVolume =
_asks[_primaryDataStore[id].price].orderList.Sum(ent => ent.Value);
}
_primaryDataStore.Remove(id);
_totalDeletes++;
}
else
_isCorrupted = true;

sw.Stop();
storeWrite.GWrite("storeDel", sw.Elapsed.ToString());
}
/// <summary>
/// Modifies entry in both the order book and consolidated prices list
/// </summary>
/// <param name="id">Order ID</param>
/// <param name="tmp">Order Data</param>
public void ModifyEntryPrimary(long id, Order tmp)
{
Stopwatch sw = new Stopwatch();
sw.Start();
if (_primaryDataStore.ContainsKey(id))
{
if (_primaryDataStore[id].side == "B")
{
_bids[_primaryDataStore[id].price].orderList.Remove(id);
if (_bids[_primaryDataStore[id].price].orderList.Count == 0)
_bids.Remove(_primaryDataStore[id].price);

else
_bids[_primaryDataStore[id].price].totalVolume =
_bids[_primaryDataStore[id].price].orderList.Sum(ent => ent.Value);
}
else if (_primaryDataStore[id].side == "S")
{
_asks[_primaryDataStore[id].price].orderList.Remove(id);
if (_asks[_primaryDataStore[id].price].orderList.Count == 0)
_asks.Remove(_primaryDataStore[id].price);
else
_asks[_primaryDataStore[id].price].totalVolume =
_asks[_primaryDataStore[id].price].orderList.Sum(ent => ent.Value);
}
if (tmp.side == "B")
{
if (!_bids.ContainsKey(tmp.price))
{
_bids.Add(tmp.price, new ConsolodatedEntry());
_bids[tmp.price].orderList = new SortedDictionary<long,
int>();
}
if (!_bids[tmp.price].orderList.ContainsKey(id))
_bids[tmp.price].orderList.Add(id, tmp.volume);
_bids[tmp.price].totalVolume =
_bids[tmp.price].orderList.Sum(ent => ent.Value);
}
else if (tmp.side == "S")
{
if (!_asks.ContainsKey(tmp.price))
{
_asks.Add(tmp.price, new ConsolodatedEntry());
_asks[tmp.price].orderList = new SortedDictionary<long,
int>();
}
if (!_asks[tmp.price].orderList.ContainsKey(id))
_asks[tmp.price].orderList.Add(id, tmp.volume);
_asks[tmp.price].totalVolume =
_asks[tmp.price].orderList.Sum(ent => ent.Value);
}
_primaryDataStore.Remove(id);
_primaryDataStore.Add(id, tmp);
}
else
_isCorrupted = true;
sw.Stop();
storeWrite.GWrite("storeMod", sw.Elapsed.ToString());
}

Retrieval
/// <summary>
/// Repeatedly retrieves the bid side of one order book (the entry at index 5)
/// and logs the time taken by each read
/// </summary>
public void GetBidPrices()
{
while (true)
{

if (_orderBook.Count < 6)
continue;
lock (_orderBook)
{
Stopwatch sw = new Stopwatch();
sw.Start();
double value;
//tmp dictionary to facilitate stopwatch;
SortedDictionary<double, ConsolodatedEntry> tmp =
_orderBook.ElementAt(5).Value.ConsolodatedBidPrices;
sw.Stop();
parseWrite.GWrite("read", sw.Elapsed.ToString());
//if (tmp.Count > 5)
//    _guiLatency = sw.ElapsedMilliseconds;

}
Thread.Sleep(5);
}
}

Appendix C – VHDL Codes
Parsing Modules
Load Counter
LIBRARY ieee;
USE ieee.std_logic_1164.all;
LIBRARY lpm;
USE lpm.all;
ENTITY LoadCounter IS
PORT
(
clock : IN STD_LOGIC ;
cnt_en : IN STD_LOGIC ;
q : OUT STD_LOGIC_VECTOR (3 DOWNTO 0)
);
END LoadCounter;

ARCHITECTURE SYN OF loadcounter IS
SIGNAL sub_wire0 : STD_LOGIC_VECTOR (3 DOWNTO 0);

COMPONENT lpm_counter
GENERIC (
lpm_direction : STRING;
lpm_port_updown : STRING;
lpm_type : STRING;
lpm_width : NATURAL
);
PORT (
clock : IN STD_LOGIC ;
q : OUT STD_LOGIC_VECTOR (3 DOWNTO 0);
cnt_en : IN STD_LOGIC
);
END COMPONENT;
BEGIN
q <= sub_wire0(3 DOWNTO 0);

lpm_counter_component : lpm_counter
GENERIC MAP (
lpm_direction => "UP",
lpm_port_updown => "PORT_UNUSED",
lpm_type => "LPM_COUNTER",
lpm_width => 4
)
PORT MAP (
clock => clock,
cnt_en => cnt_en,
q => sub_wire0
);

END SYN;
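
The behaviour of the Load Counter, a 4-bit up counter with a count-enable input, can be illustrated with a short software model. This is a rough sketch in Python, not a simulation of the `lpm_counter` megafunction: the function `load_counter` is hypothetical, and the starting value of zero is an assumption about the power-up state.

```python
def load_counter(enables, width=4):
    """Model an up counter: one list entry per clock edge, returns q after each edge."""
    q = 0                                  # assumption: counter powers up at zero
    outputs = []
    for cnt_en in enables:
        if cnt_en:
            q = (q + 1) % (1 << width)     # wrap around after 2**width - 1
        outputs.append(q)
    return outputs


if __name__ == "__main__":
    # Enable high for two edges, low for one, then high again.
    print(load_counter([1, 1, 0, 1]))
```

With a width of 4 the count covers 16 values, which matches the 16-word depth of the buffer RAM declared in the next listing.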

BufferRam
LIBRARY ieee;
USE ieee.std_logic_1164.all;
LIBRARY altera_mf;
USE altera_mf.all;
ENTITY Ram IS
PORT
(
clock : IN STD_LOGIC ;
data : IN STD_LOGIC_VECTOR (31 DOWNTO 0);
rdaddress : IN STD_LOGIC_VECTOR (3 DOWNTO 0);
wraddress : IN STD_LOGIC_VECTOR (3 DOWNTO 0);
wren : IN STD_LOGIC := '1';
q : OUT STD_LOGIC_VECTOR (31 DOWNTO 0)
);
END Ram;

ARCHITECTURE SYN OF ram IS
SIGNAL sub_wire0 : STD_LOGIC_VECTOR (31 DOWNTO 0);

COMPONENT altsyncram
GENERIC (
address_aclr_b : STRING;
address_reg_b : STRING;
clock_enable_input_a : STRING;
clock_enable_input_b : STRING;
clock_enable_output_b : STRING;
intended_device_family : STRING;
lpm_type : STRING;
numwords_a : NATURAL;
numwords_b : NATURAL;
operation_mode : STRING;
outdata_aclr_b : STRING;
outdata_reg_b : STRING;
power_up_uninitialized : STRING;
read_during_write_mode_mixed_ports : STRING;
widthad_a : NATURAL;
widthad_b : NATURAL;
width_a : NATURAL;
width_b : NATURAL;
width_byteena_a : NATURAL
);
PORT (
wren_a : IN STD_LOGIC ;
clock0 : IN STD_LOGIC ;
address_a : IN STD_LOGIC_VECTOR (3 DOWNTO 0);
address_b : IN STD_LOGIC_VECTOR (3 DOWNTO 0);
q_b : OUT STD_LOGIC_VECTOR (31 DOWNTO 0);
data_a : IN STD_LOGIC_VECTOR (31 DOWNTO 0)
);
END COMPONENT;
BEGIN
q <= sub_wire0(31 DOWNTO 0);

altsyncram_component : altsyncram
GENERIC MAP (
address_aclr_b => "NONE",
address_reg_b => "CLOCK0",
clock_enable_input_a => "BYPASS",
clock_enable_input_b => "BYPASS",
clock_enable_output_b => "BYPASS",
intended_device_family => "Stratix IV",
lpm_type => "altsyncram",
numwords_a => 16,
numwords_b => 16,
operation_mode => "DUAL_PORT",
outdata_aclr_b => "NONE",
outdata_reg_b => "CLOCK0",
power_up_uninitialized => "FALSE",
read_during_write_mode_mixed_ports => "DONT_CARE",
widthad_a => 4,
widthad_b => 4,
width_a => 32,
width_b => 32,
width_byteena_a => 1
)
PORT MAP (
wren_a => wren,
clock0 => clock,
address_a => wraddress,
address_b => rdaddress,
data_a => data,
q_b => sub_wire0
);

END SYN;

ByteOrder
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;

entity byteOrder is
    port (
        data : in std_logic_vector(31 downto 0);
        q    : out std_logic_vector(31 downto 0)
    );
end byteOrder;

architecture behv of byteOrder is
begin

    q(0) <= data(31);
    q(1) <= data(30);
    q(2) <= data(29);
    q(3) <= data(28);
    q(4) <= data(27);
    q(5) <= data(26);
    q(6) <= data(25);
    q(7) <= data(24);
    q(8) <= data(23);
    q(9) <= data(22);
    q(10) <= data(21);
    q(11) <= data(20);
    q(12) <= data(19);
    q(13) <= data(18);
    q(14) <= data(17);
    q(15) <= data(16);
    q(16) <= data(15);
    q(17) <= data(14);
    q(18) <= data(13);
    q(19) <= data(12);
    q(20) <= data(11);
    q(21) <= data(10);
    q(22) <= data(9);
    q(23) <= data(8);
    q(24) <= data(7);
    q(25) <= data(6);
    q(26) <= data(5);
    q(27) <= data(4);
    q(28) <= data(3);
    q(29) <= data(2);
    q(30) <= data(1);
    q(31) <= data(0);

end behv;

Parser
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;

entity Parser is
    port (
        data    : in std_logic_vector(31 downto 0);
        q       : out std_logic_vector(31 downto 0);
        add     : out std_logic;
        modifiy : out std_logic;
        delete  : out std_logic
    );
end Parser;

architecture behv of Parser is
begin
    -- The input word is passed through unchanged.
    q(31 downto 0) <= data(31 downto 0);

    -- Decode the message type in the low 16 bits into one-hot flags.
    process (data)
    begin
        if data(15 downto 0) = "1100100" then      -- 100: Add Order
            add <= '1';
            modifiy <= '0';
            delete <= '0';
        elsif data(15 downto 0) = "1100101" then   -- 101: Modify Order
            add <= '0';
            modifiy <= '1';
            delete <= '0';
        elsif data(15 downto 0) = "1100110" then   -- 102: Delete Order
            add <= '0';
            modifiy <= '0';
            delete <= '1';
        else                                       -- any other value
            add <= '0';
            modifiy <= '0';
            delete <= '0';
        end if;
    end process;
end behv;

Storage Modules
Add
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;

entity Add is
    port (
        act  : in std_logic;
        data : in std_logic_vector(31 downto 0);
        en   : out std_logic;
        q    : out std_logic_vector(15 downto 0)
    );
end Add;

architecture behv of Add is
begin
    process (act)
    begin
        if act = '1' then
            -- Enable the output and copy data(31 downto 16) to q.
            en <= '1';
            q(0) <= data(16);
            q(1) <= data(17);
            q(2) <= data(18);
            q(3) <= data(19);
            q(4) <= data(20);
            q(5) <= data(21);
            q(6) <= data(22);
            q(7) <= data(23);
            q(8) <= data(24);
            q(9) <= data(25);
            q(10) <= data(26);
            q(11) <= data(27);
            q(12) <= data(28);
            q(13) <= data(29);
            q(14) <= data(30);
            q(15) <= data(31);
        else
            -- q is not assigned here, so it keeps its previous value.
            en <= '0';
        end if;
    end process;
end behv;

Delete
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;

entity Delete is
    port (
        act  : in std_logic;
        data : in std_logic_vector(31 downto 0);
        en   : out std_logic;
        q    : out std_logic_vector(15 downto 0)
    );
end Delete;

architecture behv of Delete is
begin
    process (act)
    begin
        if act = '1' then
            -- Enable the output and copy data(31 downto 16) to q.
            en <= '1';
            q(0) <= data(16);
            q(1) <= data(17);
            q(2) <= data(18);
            q(3) <= data(19);
            q(4) <= data(20);
            q(5) <= data(21);
            q(6) <= data(22);
            q(7) <= data(23);
            q(8) <= data(24);
            q(9) <= data(25);
            q(10) <= data(26);
            q(11) <= data(27);
            q(12) <= data(28);
            q(13) <= data(29);
            q(14) <= data(30);
            q(15) <= data(31);
        else
            -- q is not assigned here, so it keeps its previous value.
            en <= '0';
        end if;
    end process;
end behv;

Modify
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;

entity Modify is
    port (
        act  : in std_logic;
        data : in std_logic_vector(31 downto 0);
        en   : out std_logic;
        q    : out std_logic_vector(15 downto 0)
    );
end Modify;

architecture behv of Modify is
begin
    process (act)
    begin
        if act = '1' then
            -- Enable the output and copy data(31 downto 16) to q.
            en <= '1';
            q(0) <= data(16);
            q(1) <= data(17);
            q(2) <= data(18);
            q(3) <= data(19);
            q(4) <= data(20);
            q(5) <= data(21);
            q(6) <= data(22);
            q(7) <= data(23);
            q(8) <= data(24);
            q(9) <= data(25);
            q(10) <= data(26);
            q(11) <= data(27);
            q(12) <= data(28);
            q(13) <= data(29);
            q(14) <= data(30);
            q(15) <= data(31);
        else
            -- q is not assigned here, so it keeps its previous value.
            en <= '0';
        end if;
    end process;
end behv;

ASMMux
LIBRARY ieee;
USE ieee.std_logic_1164.all;
LIBRARY lpm;
USE lpm.lpm_components.all;

ENTITY ASMMux IS
    PORT
    (
        data0x : IN STD_LOGIC_VECTOR (15 DOWNTO 0);
        data1x : IN STD_LOGIC_VECTOR (15 DOWNTO 0);
        data2x : IN STD_LOGIC_VECTOR (15 DOWNTO 0);
        sel    : IN STD_LOGIC_VECTOR (1 DOWNTO 0);
        result : OUT STD_LOGIC_VECTOR (15 DOWNTO 0)
    );
END ASMMux;

ARCHITECTURE SYN OF asmmux IS
--  type STD_LOGIC_2D is array (NATURAL RANGE <>, NATURAL RANGE <>) of STD_LOGIC;
    SIGNAL sub_wire0 : STD_LOGIC_VECTOR (15 DOWNTO 0);
    SIGNAL sub_wire1 : STD_LOGIC_VECTOR (15 DOWNTO 0);
    SIGNAL sub_wire2 : STD_LOGIC_2D (2 DOWNTO 0, 15 DOWNTO 0);
    SIGNAL sub_wire3 : STD_LOGIC_VECTOR (15 DOWNTO 0);
    SIGNAL sub_wire4 : STD_LOGIC_VECTOR (15 DOWNTO 0);

BEGIN
    sub_wire4 <= data0x(15 DOWNTO 0);
    sub_wire3 <= data1x(15 DOWNTO 0);
    result    <= sub_wire0(15 DOWNTO 0);
    sub_wire1 <= data2x(15 DOWNTO 0);
    sub_wire2(2, 0) <= sub_wire1(0);
    sub_wire2(2, 1) <= sub_wire1(1);
    sub_wire2(2, 2) <= sub_wire1(2);
    sub_wire2(2, 3) <= sub_wire1(3);
    sub_wire2(2, 4) <= sub_wire1(4);
    sub_wire2(2, 5) <= sub_wire1(5);
    sub_wire2(2, 6) <= sub_wire1(6);
    sub_wire2(2, 7) <= sub_wire1(7);
    sub_wire2(2, 8) <= sub_wire1(8);
    sub_wire2(2, 9) <= sub_wire1(9);
    sub_wire2(2, 10) <= sub_wire1(10);
    sub_wire2(2, 11) <= sub_wire1(11);
    sub_wire2(2, 12) <= sub_wire1(12);
    sub_wire2(2, 13) <= sub_wire1(13);
    sub_wire2(2, 14) <= sub_wire1(14);
    sub_wire2(2, 15) <= sub_wire1(15);
    sub_wire2(1, 0) <= sub_wire3(0);
    sub_wire2(1, 1) <= sub_wire3(1);
    sub_wire2(1, 2) <= sub_wire3(2);
    sub_wire2(1, 3) <= sub_wire3(3);
    sub_wire2(1, 4) <= sub_wire3(4);
    sub_wire2(1, 5) <= sub_wire3(5);
    sub_wire2(1, 6) <= sub_wire3(6);
    sub_wire2(1, 7) <= sub_wire3(7);
    sub_wire2(1, 8) <= sub_wire3(8);
    sub_wire2(1, 9) <= sub_wire3(9);
    sub_wire2(1, 10) <= sub_wire3(10);
    sub_wire2(1, 11) <= sub_wire3(11);
    sub_wire2(1, 12) <= sub_wire3(12);
    sub_wire2(1, 13) <= sub_wire3(13);
    sub_wire2(1, 14) <= sub_wire3(14);
    sub_wire2(1, 15) <= sub_wire3(15);
    sub_wire2(0, 0) <= sub_wire4(0);
    sub_wire2(0, 1) <= sub_wire4(1);
    sub_wire2(0, 2) <= sub_wire4(2);
    sub_wire2(0, 3) <= sub_wire4(3);
    sub_wire2(0, 4) <= sub_wire4(4);
    sub_wire2(0, 5) <= sub_wire4(5);
    sub_wire2(0, 6) <= sub_wire4(6);
    sub_wire2(0, 7) <= sub_wire4(7);
    sub_wire2(0, 8) <= sub_wire4(8);
    sub_wire2(0, 9) <= sub_wire4(9);
    sub_wire2(0, 10) <= sub_wire4(10);
    sub_wire2(0, 11) <= sub_wire4(11);
    sub_wire2(0, 12) <= sub_wire4(12);
    sub_wire2(0, 13) <= sub_wire4(13);
    sub_wire2(0, 14) <= sub_wire4(14);
    sub_wire2(0, 15) <= sub_wire4(15);

    lpm_mux_component : lpm_mux
    GENERIC MAP (
        lpm_size => 3,
        lpm_type => "LPM_MUX",
        lpm_width => 16,
        lpm_widths => 2
    )
    PORT MAP (
        sel => sel,
        data => sub_wire2,
        result => sub_wire0
    );

END SYN;

Retrieval Modules
RetrieveCounter
LIBRARY ieee;
USE ieee.std_logic_1164.all;
LIBRARY lpm;
USE lpm.all;

ENTITY RetrieveCounter IS
    PORT
    (
        clock  : IN STD_LOGIC;
        cnt_en : IN STD_LOGIC;
        q      : OUT STD_LOGIC_VECTOR (15 DOWNTO 0)
    );
END RetrieveCounter;

ARCHITECTURE SYN OF retrievecounter IS
    SIGNAL sub_wire0 : STD_LOGIC_VECTOR (15 DOWNTO 0);

    COMPONENT lpm_counter
    GENERIC (
        lpm_direction   : STRING;
        lpm_port_updown : STRING;
        lpm_type        : STRING;
        lpm_width       : NATURAL
    );
    PORT (
        clock  : IN STD_LOGIC;
        q      : OUT STD_LOGIC_VECTOR (15 DOWNTO 0);
        cnt_en : IN STD_LOGIC
    );
    END COMPONENT;

BEGIN
    q <= sub_wire0(15 DOWNTO 0);

    lpm_counter_component : lpm_counter
    GENERIC MAP (
        lpm_direction   => "UP",
        lpm_port_updown => "PORT_UNUSED",
        lpm_type        => "LPM_COUNTER",
        lpm_width       => 16
    )
    PORT MAP (
        clock  => clock,
        cnt_en => cnt_en,
        q      => sub_wire0
    );

END SYN;

ParsedRam
LIBRARY ieee;
USE ieee.std_logic_1164.all;
LIBRARY altera_mf;
USE altera_mf.all;

ENTITY ParsedRam IS
    PORT
    (
        clock     : IN STD_LOGIC;
        data      : IN STD_LOGIC_VECTOR (63 DOWNTO 0);
        rdaddress : IN STD_LOGIC_VECTOR (15 DOWNTO 0);
        wraddress : IN STD_LOGIC_VECTOR (15 DOWNTO 0);
        wren      : IN STD_LOGIC := '1';
        q         : OUT STD_LOGIC_VECTOR (63 DOWNTO 0)
    );
END ParsedRam;

ARCHITECTURE SYN OF parsedram IS
    SIGNAL sub_wire0 : STD_LOGIC_VECTOR (63 DOWNTO 0);

    COMPONENT altsyncram
    GENERIC (
        address_aclr_b                     : STRING;
        address_reg_b                      : STRING;
        clock_enable_input_a               : STRING;
        clock_enable_input_b               : STRING;
        clock_enable_output_b              : STRING;
        intended_device_family             : STRING;
        lpm_type                           : STRING;
        numwords_a                         : NATURAL;
        numwords_b                         : NATURAL;
        operation_mode                     : STRING;
        outdata_aclr_b                     : STRING;
        outdata_reg_b                      : STRING;
        power_up_uninitialized             : STRING;
        read_during_write_mode_mixed_ports : STRING;
        widthad_a                          : NATURAL;
        widthad_b                          : NATURAL;
        width_a                            : NATURAL;
        width_b                            : NATURAL;
        width_byteena_a                    : NATURAL
    );
    PORT (
        wren_a    : IN STD_LOGIC;
        clock0    : IN STD_LOGIC;
        address_a : IN STD_LOGIC_VECTOR (15 DOWNTO 0);
        address_b : IN STD_LOGIC_VECTOR (15 DOWNTO 0);
        q_b       : OUT STD_LOGIC_VECTOR (63 DOWNTO 0);
        data_a    : IN STD_LOGIC_VECTOR (63 DOWNTO 0)
    );
    END COMPONENT;

BEGIN
    q <= sub_wire0(63 DOWNTO 0);

    altsyncram_component : altsyncram
    GENERIC MAP (
        address_aclr_b => "NONE",
        address_reg_b => "CLOCK0",
        clock_enable_input_a => "BYPASS",
        clock_enable_input_b => "BYPASS",
        clock_enable_output_b => "BYPASS",
        intended_device_family => "Stratix IV",
        lpm_type => "altsyncram",
        numwords_a => 65536,
        numwords_b => 65536,
        operation_mode => "DUAL_PORT",
        outdata_aclr_b => "NONE",
        outdata_reg_b => "CLOCK0",
        power_up_uninitialized => "FALSE",
        read_during_write_mode_mixed_ports => "DONT_CARE",
        widthad_a => 16,
        widthad_b => 16,
        width_a => 64,
        width_b => 64,
        width_byteena_a => 1
    )
    PORT MAP (
        wren_a => wren,
        clock0 => clock,
        address_a => wraddress,
        address_b => rdaddress,
        data_a => data,
        q_b => sub_wire0
    );

END SYN;


Appendix D – Arcabook Messages
Message Header Format
All messages are preceded by a standard header format with the exception of
the Order Book Refresh Message. The table below describes the
header fields of an NYSE ARCA Quote message.
Field           Offset  Size (Bytes)  Format           Description
MsgSize         0       2             Binary Integer   This field indicates the minimum size of the message body in
                                                       bytes. Total size can vary with the number of bodies in the
                                                       message:
                                                       Sequence Number Reset – ‘18 Bytes’
                                                       Heartbeat Message – ‘14 Bytes’
                                                       Heartbeat Response Message – ‘34 Bytes’
                                                       Message Unavailable – ‘22 Bytes’
                                                       Retransmission Request Message – ‘42 Bytes’
                                                       Retransmission Response Message – ‘42 Bytes’
                                                       Book Refresh Request Message – ‘38 Bytes’
                                                       Imbalance Refresh Request Message – ‘38 Bytes’
                                                       Book Refresh Message – ‘46 Bytes’
                                                       Imbalance Refresh Message – ‘50 Bytes’
                                                       Symbol Index Mapping Request Message – ‘38 Bytes’
                                                       Symbol Index Mapping Message – ‘34 Bytes’
                                                       Firm Index Mapping Request Message – ‘38 Bytes’
                                                       Firm Index Mapping Message – ‘26 Bytes’
                                                       Symbol Clear – ‘22 Bytes’
                                                       Add Order Message – ‘46 Bytes’
                                                       Modify Order Message – ‘46 Bytes’
                                                       Delete Order Message – ‘38 Bytes’
MsgType         2       2             Binary Integer   This field identifies the type of message:
                                                       ‘1’ – Sequence Number Reset
                                                       ‘2’ – Heartbeat Message
                                                       ‘5’ – Message Unavailable
                                                       ‘10’ – Retransmission Response Message
                                                       ‘20’ – Retransmission Request Message
                                                       ‘24’ – Heartbeat Response Message
                                                       ‘30’ – Book Refresh Request Message
                                                       ‘31’ – Imbalance Refresh Request Message
                                                       ‘32’ – Book Refresh Message
                                                       ‘33’ – Imbalance Refresh Message
                                                       ‘34’ – Symbol Index Mapping Request Message
                                                       ‘35’ – Symbol Index Mapping Message
                                                       ‘36’ – Symbol Clear
                                                       ‘37’ – Firm Index Mapping Message
                                                       ‘38’ – Firm Index Mapping Request Message
                                                       ‘99’ – Generic Book Message for Add, Modify, Deletes,
                                                       Imbalances
MsgSeqNum       4       4             Binary Integer   This field contains the message sequence number assigned by
                                                       PDP for each product. It is used for gap detection. Also
                                                       known as Line Sequence Number (LSN).
SendTime        8       4             Binary Integer   This field specifies the time the message was created by PDP.
                                                       The number represents the number of milliseconds since
                                                       midnight of the same day.
ProductId       12      1             Binary Integer   ‘115’ is the product value used in the PDP header to identify
                                                       the ArcaBook feed.
RetransFlag     13      1             Binary Integer   A flag that indicates whether this is an original,
                                                       retransmitted, or ‘replayed’ message. Valid values include:
                                                       ‘1’ – Original message
                                                       ‘2’ – Retransmitted message
                                                       ‘5’ – Refresh Retransmission
                                                       ‘6’ – Failover Retransmission
                                                       ‘7’ – Start of Update
                                                       ‘8’ – End of Update
                                                       ‘9’ – Only one packet in update
NumBodyEntries  14      1             Binary Integer   The number of times the message body repeats in the message.
                                                       For example, if the body consists of a field (named Volume)
                                                       and the “NumBodyEntries” field is 2, the number of bytes in
                                                       the message body will be 8.
FILLER          15      1             ASCII String     This is a filler, reserved for future use.
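
The header layout above amounts to a fixed 16-byte structure (offsets 0 through 15). As an illustration only, the following Python sketch unpacks those fields with the standard `struct` module. The big-endian byte order, the `Header` tuple, the `parse_header` helper and the sample values are assumptions made for this example; they are not taken from the thesis or from the NYSE Arca specification.

```python
import struct
from collections import namedtuple

# Assumption: multi-byte binary integers are big-endian (">").
# Field order follows the header table: MsgSize, MsgType, MsgSeqNum,
# SendTime, ProductId, RetransFlag, NumBodyEntries, FILLER (skipped, "x").
HEADER_FORMAT = ">HHIIBBBx"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 2+2+4+4+1+1+1+1 bytes

Header = namedtuple(
    "Header",
    "msg_size msg_type msg_seq_num send_time "
    "product_id retrans_flag num_body_entries",
)


def parse_header(packet):
    """Unpack the header found at the start of a packet."""
    if len(packet) < HEADER_SIZE:
        raise ValueError("packet is shorter than the message header")
    return Header._make(struct.unpack_from(HEADER_FORMAT, packet, 0))


# Hypothetical sample values, packed only to exercise the parser.
sample = struct.pack(HEADER_FORMAT, 46, 99, 1, 34_200_000, 115, 1, 1)
header = parse_header(sample)
print(header)
```

The FILLER byte is skipped rather than returned, since the table marks it as reserved. If the feed turns out to use a different byte order, only the leading `>` in `HEADER_FORMAT` would need to change.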

Add Message Body Format
The table below describes the body fields of an ArcaBook Add message (MsgType
= ‘100’). ArcaBook sends this message for a new open order.
Field           Offset  Size (Bytes)  Format           Description
SymbolIndex     16      2             Binary Integer   This field identifies the numerical representation of the
                                                       symbol. User can combine this value with the session id to
                                                       obtain a unique key.
MsgType         18      2             Binary Integer   This field identifies the type of message:
                                                       ‘100’ – Add Order Message
SourceSeqNum    20      4             Binary Integer   This field contains the sequence number assigned by the
                                                       source system to this message. The sequence number is unique
                                                       only to a given stock. Hence orders for two different stocks
                                                       may share the same source sequence number. Please note that
                                                       the sequence number, while it increases serially, does not
                                                       increase monotonically.
SourceTime      24      4             Binary Integer   This field specifies the quote generation time. The number in
                                                       this field represents the number of milliseconds since
                                                       midnight of the same day.
OrderId         28      4             Binary Integer   The Order ID identifies a unique order and will allow
                                                       customers of NYSE Arca Trades to correlate trades to the
                                                       order.
Volume          32      4             Binary Integer   This field contains the size of the order. Please note we do
                                                       not send Odd Lot (<100) quotes.
PriceNumerator  36      4             Binary Integer   This field specifies the price of the order.
PriceScaleCode  40      1             Binary Integer   Decimal Placement
Side            41      1             ASCII Character  This field indicates the side of the order Buy/sell.
                                                       Valid Values:
                                                       ‘B’ – Buy
                                                       ‘S’ – Sell
ExchangeID      42      1             ASCII Character  The id of the originating exchange of the quote.
                                                       Valid values:
                                                       ‘N’ – NYSE (not used)
                                                       ‘P’ – NYSE ARCA
                                                       ‘B’ – NYSE ARCA BB
SecurityType    43      1             ASCII Character  This field specifies the security type for this message.
                                                       Valid values:
                                                       ‘E’ – Equity
                                                       ‘B’ – BB
FirmIndex       44      2             Binary Integer   This field identifies the numerical representation of the
                                                       firm sending the quote if attributed.
SessionID       46      1             Binary Integer   Identifies the Source Session of the Symbol.
FILLER          47      1             ASCII String     This is a filler, reserved for future use.
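
The Add body occupies offsets 16 through 47, directly after the 16-byte header. The sketch below is a hedged illustration of how those fields could be read in software, not the decoder used in this work. It assumes big-endian integers and assumes that PriceScaleCode gives the number of decimal places to apply to PriceNumerator (the table only says "Decimal Placement"). The names `AddOrder`, `parse_add_body` and `price_of`, and all sample values, are hypothetical.

```python
import struct
from collections import namedtuple
from decimal import Decimal

BODY_OFFSET = 16  # the body starts right after the 16-byte header

# Assumption: big-endian integers. Field order follows the Add table:
# SymbolIndex, MsgType, SourceSeqNum, SourceTime, OrderId, Volume,
# PriceNumerator, PriceScaleCode, Side, ExchangeID, SecurityType,
# FirmIndex, SessionID, FILLER (skipped, "x").
ADD_BODY_FORMAT = ">HHIIIIIBcccHBx"

AddOrder = namedtuple(
    "AddOrder",
    "symbol_index msg_type source_seq_num source_time order_id volume "
    "price_numerator price_scale_code side exchange_id security_type "
    "firm_index session_id",
)


def parse_add_body(message):
    """Unpack the Add body fields from a complete message (header + body)."""
    return AddOrder._make(
        struct.unpack_from(ADD_BODY_FORMAT, message, BODY_OFFSET)
    )


def price_of(order):
    """Scale the price, assuming PriceScaleCode counts decimal places."""
    return Decimal(order.price_numerator).scaleb(-order.price_scale_code)


# Hypothetical message: a zeroed header followed by one Add body.
sample = bytes(BODY_OFFSET) + struct.pack(
    ADD_BODY_FORMAT,
    7, 100, 1, 34_200_000, 42, 300, 123450, 4, b"B", b"P", b"E", 0, 1,
)
order = parse_add_body(sample)
print(order.order_id, order.volume, order.side, price_of(order))
```

`Decimal` is used for the scaled price so that the value stays exact; a binary float would introduce rounding that is usually unwanted for prices.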

Modify Message Body Format
The table below describes the body fields of an ArcaBook Modify message (MsgType =
‘101’). ArcaBook sends this message when an order in an ArcaBook is modified. The order
id refers to the original order sent in the add order message. The following events trigger a
modify order message.
• The price of an order changes
• The size of an order changes
• An order is partially filled
• An order is routed to an away market with some shares remaining in the ArcaBook.
Note: If an away market declines the NYSE Arca preference, a Modify
Order message is sent to “add” the declined shares back to the Archipelago
book.
Field           Offset  Size (Bytes)  Format           Description
SymbolIndex     16      2             Binary Integer   This field identifies the numerical representation of the
                                                       symbol. User can combine this value with the session id to
                                                       obtain a unique key.
MsgType         18      2             Binary Integer   This field identifies the type of message:
                                                       ‘101’ – Modify Order Message
SourceSeqNum    20      4             Binary Integer   This field contains the sequence number assigned by the
                                                       source system to this message. The sequence number is unique
                                                       only to a given stock. Hence orders for two different stocks
                                                       may share the same source sequence number. Please note that
                                                       the sequence number, while it increases serially, does not
                                                       increase monotonically.
SourceTime      24      4             Binary Integer   This field specifies the quote generation time. The number in
                                                       this field represents the number of milliseconds since
                                                       midnight of the same day.
OrderId         28      4             Binary Integer   The Order ID identifies a unique order and will allow
                                                       customers of NYSE Arca Trades to correlate trades to the
                                                       order.
Volume          32      4             Binary Integer   This field contains the size of the order. Please note we do
                                                       not send Odd Lot (<100) quotes.
PriceNumerator  36      4             Binary Integer   This field specifies the price of the order.
PriceScaleCode  40      1             Binary Integer   Decimal Placement
Side            41      1             ASCII Character  This field indicates the side of the order Buy/sell.
                                                       Valid Values:
                                                       ‘B’ – Buy
                                                       ‘S’ – Sell
ExchangeID      42      1             ASCII Character  The id of the originating exchange of the quote.
                                                       Valid values:
                                                       ‘N’ – NYSE (not used)
                                                       ‘P’ – NYSE ARCA
                                                       ‘B’ – NYSE ARCA BB
SecurityType    43      1             ASCII Character  This field specifies the security type for this message.
                                                       Valid values:
                                                       ‘E’ – Equity
                                                       ‘B’ – BB
FirmIndex       44      2             Binary Integer   This field identifies the numerical representation of the
                                                       firm sending the quote if attributed.
SessionID       46      1             Binary Integer   Identifies the Source Session of the Symbol.
FILLER          47      1             ASCII String     This is a filler, reserved for future use.
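
Because a Modify message carries the order id of the original Add message, a receiver can apply it by looking up the stored order and updating it. The following Python sketch shows one possible way to do this with plain dictionaries. The helper names, the dictionary layout and the choice to ignore a Modify for an unknown order are assumptions made for illustration and do not describe the FPGA storage modules in Appendix C.

```python
def symbol_key(msg):
    """Combine SymbolIndex and SessionID into a single key."""
    return (msg["symbol_index"], msg["session_id"])


def apply_add(book, msg):
    """Store a new open order under its symbol key and OrderId."""
    orders = book.setdefault(symbol_key(msg), {})
    orders[msg["order_id"]] = {
        "volume": msg["volume"],
        "price_numerator": msg["price_numerator"],
        "price_scale_code": msg["price_scale_code"],
        "side": msg["side"],
    }


def apply_modify(book, msg):
    """Overwrite size and price of a stored order; False if it is unknown."""
    order = book.get(symbol_key(msg), {}).get(msg["order_id"])
    if order is None:
        # Assumption: a Modify for an unknown order is ignored, not created.
        return False
    order["volume"] = msg["volume"]
    order["price_numerator"] = msg["price_numerator"]
    order["price_scale_code"] = msg["price_scale_code"]
    return True


# Hypothetical messages, already decoded into dictionaries.
book = {}
add_msg = {
    "symbol_index": 7, "session_id": 1, "order_id": 42,
    "volume": 500, "price_numerator": 123450, "price_scale_code": 4,
    "side": "B",
}
apply_add(book, add_msg)

# A partial fill leaves 300 shares of the original 500 on the book.
modified = apply_modify(book, dict(add_msg, volume=300))
unknown = apply_modify(book, dict(add_msg, order_id=99, volume=100))
print(book[(7, 1)][42]["volume"], modified, unknown)
```

Keying the outer dictionary on the symbol index and session id follows the SymbolIndex description, which suggests combining the two values to obtain a unique key.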

Delete Message Body Format
The table below describes the body fields of an ArcaBook Delete message
(MsgType = ‘102’). ArcaBook sends this message when an order is taken off of
the NYSE Arca open order book. The following events will trigger the
transmission of a delete order message.
• An order is cancelled
• An order expires
• An order is routed to an away market. Note: If the away market declines
the NYSE ARCA preference, an Add Order message with the original
order id will be sent to return the order to the ArcaBook.
• An order is filled

Field           Offset  Size (Bytes)  Format           Description
SymbolIndex     16      2             Binary Integer   This field identifies the numerical representation of the
                                                       symbol. User can combine this value with the session id to
                                                       obtain a unique key.
MsgType         18      2             Binary Integer   This field identifies the type of message:
                                                       ‘102’ – Delete Order Message
SourceSeqNum    20      4             Binary Integer   This field contains the sequence number assigned by the
                                                       source system to this message. The sequence number is unique
                                                       only to a given stock. Hence orders for two different stocks
                                                       may share the same source sequence number. Please note that
                                                       the sequence number, while it increases serially, does not
                                                       increase monotonically.
SourceTime      24      4             Binary Integer   This field specifies the quote generation time. The number in
                                                       this field represents the number of milliseconds since
                                                       midnight of the same day.
OrderId         28      4             Binary Integer   The Order ID identifies a unique order and will allow
                                                       customers of NYSE Arca Trades to correlate trades to the
                                                       order.
Side            41      1             ASCII Character  This field indicates the side of the order Buy/sell.
                                                       Valid Values:
                                                       ‘B’ – Buy
                                                       ‘S’ – Sell
ExchangeID      42      1             ASCII Character  The id of the originating exchange of the quote.
                                                       Valid values:
                                                       ‘N’ – NYSE (not used)
                                                       ‘P’ – NYSE ARCA
                                                       ‘B’ – NYSE ARCA BB
SecurityType    43      1             ASCII Character  This field specifies the security type for this message.
                                                       Valid values:
                                                       ‘E’ – Equity
                                                       ‘B’ – BB
SessionID       46      1             Binary Integer   Identifies the Source Session of the Symbol.
FirmIndex       44      2             Binary Integer   This field identifies the numerical representation of the
                                                       firm sending the quote if attributed.
FILLER          47      1             ASCII String     This is a filler, reserved for future use.
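
All three body formats place MsgType at offset 18 and OrderId at offset 28, so a receiver can classify a message before decoding the rest of it. The sketch below mirrors, in software, the add/modify/delete flags produced by the Parser module in Appendix C and then removes a deleted order from a small in-memory table. It is an illustration under stated assumptions: big-endian integers, a table keyed on (SymbolIndex, OrderId) only, and hypothetical names and sample values throughout.

```python
import struct

# Offsets shared by the Add, Modify and Delete body tables.
SYMBOL_INDEX_OFFSET = 16
MSG_TYPE_OFFSET = 18
ORDER_ID_OFFSET = 28

ADD, MODIFY, DELETE = 100, 101, 102


def classify(message):
    """Return one flag per message type, at most one of which is True."""
    (msg_type,) = struct.unpack_from(">H", message, MSG_TYPE_OFFSET)
    return {
        "add": msg_type == ADD,
        "modify": msg_type == MODIFY,
        "delete": msg_type == DELETE,
    }


def apply_delete(open_orders, message):
    """Remove the referenced order; return its stored value or None."""
    (symbol_index,) = struct.unpack_from(">H", message, SYMBOL_INDEX_OFFSET)
    (order_id,) = struct.unpack_from(">I", message, ORDER_ID_OFFSET)
    return open_orders.pop((symbol_index, order_id), None)


# Hypothetical message: only the fields read above are filled in.
message = bytearray(32)
struct.pack_into(">H", message, SYMBOL_INDEX_OFFSET, 7)
struct.pack_into(">H", message, MSG_TYPE_OFFSET, DELETE)
struct.pack_into(">I", message, ORDER_ID_OFFSET, 42)

open_orders = {(7, 42): 300}  # (SymbolIndex, OrderId) -> remaining volume
flags = classify(message)
removed = apply_delete(open_orders, message) if flags["delete"] else None
print(flags, removed, open_orders)
```

Only the fields up to OrderId are read here because their offsets are contiguous and identical in all three tables; the remaining Delete fields are left undecoded in this sketch.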


