Cloud based Real-Time and Low Latency Scientific Event Analysis
Astronomy is widely recognized as a big-data-driven science. As new
observation infrastructure is developed, sky survey cycles have been
shortened from a few days to a few seconds, shifting the data processing
pressure from offline to online. However, existing scientific databases focus
on offline analysis of long-term historical data, not real-time, low-latency
analysis of large-scale newly arriving data.
In this paper, a cloud-based method is proposed to efficiently analyze
scientific events on large-scale newly arriving data. The solution is
implemented as a highly efficient system named Aserv. A set of compact data
store and index structures is proposed to describe scientific events, and a
typical analysis pattern is formalized as a set of query operations. Four key
methods optimize the performance of Aserv: a domain-aware filter,
accuracy-aware data partitioning, a highly efficient index, and a design for
frequently used statistical data. Experimental results in a typical cloud
environment show that these optimizations meet the low-latency demand for
both large data insertion and scientific event analysis. Aserv can insert 3.5
million rows of data within 3 seconds and
perform the heaviest query on 6.7 billion rows of data also within 3 seconds.
Furthermore, a performance model is given to help Aserv choose the right cloud
resource setup to meet the guaranteed real-time performance requirement.
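
To put the reported figures in perspective, the following minimal sketch
converts them into lower bounds on sustained rates. It is plain arithmetic on
the numbers quoted above, not part of Aserv; the helper name `min_rate` is
hypothetical, and the query figure is an effective rate over the rows covered,
since the abstract does not say how many rows the indexes allow a query to skip.

```python
def min_rate(rows: int, seconds: float) -> float:
    """Lower bound on rows handled per second when `rows` finish within `seconds`."""
    return rows / seconds

# Figures quoted in the abstract: both operations complete within 3 seconds.
insert_rate = min_rate(3_500_000, 3.0)       # bulk insertion
query_rate = min_rate(6_700_000_000, 3.0)    # heaviest query, rows covered

print(f"insert: at least {insert_rate:,.0f} rows/s")
print(f"query:  at least {query_rate:,.0f} rows/s (effective)")
```

Because both operations are reported as finishing "within" 3 seconds, these
values are floors rather than measured throughput.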