Here are some of the things I am consuming this week. They could be talks I am watching, podcast episodes, articles, or book reviews. These are a few that I found interesting.
- Real Life is Messy. We Should Design For It – Marco Altini (Medium)
- Do Organizations need Data Governance as a Service – Dennis D. McDonald (Website)
- A Thorough Introduction to Apache Kafka – Stanislav Kozlovski (@stankozlovski)
- Rethinking Microservices with Stateful Streams – Ben Stopford (@benstopford)
Article: Real Life is Messy. We Should Design For It. (Link)
Author: Marco Altini
I liked this article because it addresses the idea of measuring your data quality. It focuses on wearables, and on times when the signal between the wearable and another device is weak and data may be getting lost. This is an interesting challenge, especially as applications become more distributed and data is synchronized via messaging and other types of transports. The question for the receiver is always, “Do I have all the data that I should?”
We need to be more open about data quality and build quality checking into our pipelines. For example, as a consumer of a stream of data, I should be able to consume not only the data itself but also a stream of data quality that tells me when my data might be missing, degraded, or simply wrong. Then, as the consumer, I can make choices about whether and how to display that data to the user. Planning for missing data is more prevalent in high-volume consumption such as sensors and web traffic, where overall aggregate numbers tend to be acceptable. We need to think about this more and more as systems and data continue to become more distributed.
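To make the idea of a data quality stream a little more concrete, here is a minimal sketch in Python. It is my own illustration, not something from the article. It assumes the producer stamps every reading with an increasing sequence number, and the names `Reading`, `QualityEvent`, and `quality_stream` are made up for the example. It only reports missing data; degraded or wrong data would need other checks.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Reading:
    seq: int      # assumption: the producer adds an increasing sequence number
    value: float  # the measurement itself, e.g. a heart rate


@dataclass(frozen=True)
class QualityEvent:
    kind: str       # only "missing" in this sketch
    first_seq: int  # first sequence number in the gap
    last_seq: int   # last sequence number in the gap


def quality_stream(readings: Iterable[Reading]) -> Iterator[QualityEvent]:
    """Yield one QualityEvent for every gap in the sequence numbers."""
    expected = None
    for reading in readings:
        if expected is not None and reading.seq > expected:
            yield QualityEvent("missing", expected, reading.seq - 1)
        # max() keeps late or duplicate readings from moving the cursor backwards
        if expected is None:
            expected = reading.seq + 1
        else:
            expected = max(expected, reading.seq + 1)


# Hypothetical readings where sequence numbers 3 and 4 never arrived.
readings = [Reading(1, 61.0), Reading(2, 62.5), Reading(5, 64.0), Reading(6, 63.0)]
gaps = list(quality_stream(readings))
missing = sum(gap.last_seq - gap.first_seq + 1 for gap in gaps)
print(gaps)
print(f"{missing} readings missing")
```

A consumer could read both streams side by side and decide, for instance, to grey out a chart for the range covered by a "missing" event rather than drawing a line straight through the gap.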
Article: Do Organizations need Data Governance as a Service (DGaaS) (Link)
Author: Dennis D. McDonald
To me, the key takeaway from this article is the idea that Data Governance is no longer an IT problem to solve; it is an organizational problem to solve. Gone are the days of data documentation being done by a central data team. Because data is now so pervasive in organizations, the responsibility for maintaining metadata and documentation about data has to be spread throughout the entire organization. The hard part of this problem tends to be convincing leaders outside of data or IT that this process is important and that they should dedicate people in their organization to it. Data Governance is no longer just an IT problem.
Article: A Thorough Introduction to Apache Kafka (Link)
Author: Stanislav Kozlovski
Kafka is seemingly everywhere these days when you are talking about distributed data and systems. This article does a good job of introducing Kafka, with some really helpful diagrams that explain things. I am not sure I would call it thorough, because there is still a lot of detail left for the reader to work through, but it would give someone the basics.
Video: Rethinking Microservices with Stateful Streams (Link)
Presenter: Ben Stopford
I definitely need to go back and watch this one again. There is so much goodness in this talk that a repeated viewing is warranted. If your company is building microservices, this is a really great talk to listen to. It does a great job of presenting the different types of application architectures and the challenges associated with them from a data perspective. I am definitely bought into the possibilities of the streaming platform being able to serve data as a central platform, but I need to think about the details of implementing it. I think an additional viewing is definitely in order.