Digitization and data-driven manufacturing processes are needed in today's
industry. The term Industry 4.0 refers to today's industrial digitization, which
is defined as a new level of organization and control over the entire value
chain of the product life cycle; it is geared towards increasingly
individualized, high-quality customer expectations. However, due to the
increase in the number of connected devices and the variety of data, it has
become difficult to store and analyze data with conventional systems. The
motivation of this paper is to provide an overview of the big data pipeline,
covering real-time on-premise data acquisition, data compression, data storage,
and processing with Apache Kafka and Apache Spark implemented on an Apache
Hadoop cluster, and to identify the challenges and issues that arise when
implementing it at Farplas, a manufacturing company that is one of the biggest
Tier 1 automotive suppliers in Turkey, in order to study the new
trends and developments related to Industry 4.0.