Problem: I have a smart environment-monitoring device at home. It has a few sensors: temperature, humidity, carbon monoxide, noise level, and air quality. It is also connected to the internet, and every 30 seconds it sends its readings to a server. In addition to the environment data, it also sends device status data every hour, which reports on the overall condition of the device.
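To make the use case concrete, here is a rough sketch of what the two kinds of messages might look like. The field names, units, and values are assumptions for illustration only, not the device's actual payload format.

```python
# Hypothetical environment reading, sent every 30 seconds.
# Field names and units are assumed for illustration.
environment_reading = {
    "device_id": "env-monitor-01",
    "timestamp": "2024-05-01T10:15:30Z",  # ISO-8601, UTC
    "temperature_c": 22.4,
    "humidity_pct": 48.0,
    "co_ppm": 0.6,
    "noise_db": 41.2,
    "aqi": 35,
}

# Hypothetical device status message, sent once an hour.
device_status = {
    "device_id": "env-monitor-01",
    "timestamp": "2024-05-01T10:00:00Z",
    "battery_pct": 87,
    "firmware_version": "1.4.2",
    "sensor_health": "ok",
}
```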
This is a typical problem for a stream processing system to handle: high-velocity, high-volume data that our data processing system needs to ingest and process continuously.
The Kappa architecture
An architectural pattern for building such a stream processing system is the Kappa architecture, which treats streams as first-class citizens. Apache Spark offers Spark Streaming, with its concept of micro-batches, and Structured Streaming to process data from stream-oriented systems.
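As a rough sketch of what consuming this stream could look like with Spark Structured Streaming, the snippet below reads sensor readings from a Kafka topic and parses the JSON payload into typed columns. The topic name, broker address, and schema fields are assumptions carried over from the example payload above, not a definitive design; we will refine this in part two.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)

spark = SparkSession.builder.appName("env-monitor-stream").getOrCreate()

# Schema matching the assumed environment reading payload.
schema = StructType([
    StructField("device_id", StringType()),
    StructField("timestamp", TimestampType()),
    StructField("temperature_c", DoubleType()),
    StructField("humidity_pct", DoubleType()),
    StructField("co_ppm", DoubleType()),
    StructField("noise_db", DoubleType()),
    StructField("aqi", DoubleType()),
])

# Read the raw stream from a Kafka topic (broker and topic are assumed).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "env-readings")
       .load())

# Kafka delivers the payload as bytes; cast to string and parse the JSON.
readings = (raw.selectExpr("CAST(value AS STRING) AS json")
            .select(from_json(col("json"), schema).alias("r"))
            .select("r.*"))

# For the sketch, just print each micro-batch to the console.
query = (readings.writeStream
         .format("console")
         .outputMode("append")
         .start())
query.awaitTermination()
```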
In part two, we will take a look at the Kappa architecture in detail and come up with an initial design of a system that handles such a use case.