Software maintenance is often framed as a technical problem: refactoring code, fixing bugs, or upgrading dependencies. In this conversation, Robby Russell talks with Rein Henrichs about a different lens, one centered on understanding.
Rein is a Principal Software Engineer at Procore, where he works within a large, long-lived system used across the construction industry. Rather than focusing on tooling, Rein emphasizes that well-maintained software is software that makes sense to the people maintaining it.
To explain this, Rein introduces the idea of the line of representation, drawing on the work of Richard Cook. Engineers do not interact directly with systems. They rely on representations such as logs, dashboards, and code. These are approximations, not reality, echoing ideas from Plato’s Allegory of the Cave.
When those representations break down, teams lose shared understanding, what Rein describes as “common ground.” This often shows up as weak signals: subtle indicators that something is not quite right. They are easy to ignore, but over time they lead to confusion and slower decision-making.
Incidents make this especially visible. Rein explains how teams build alignment under pressure, highlighting that the role of an incident commander is coordination, not control. Clear communication matters as much as technical correctness.
The conversation also explores how large systems behave in practice. They rarely fail completely. Instead, they degrade in multiple ways at once. While SLOs can help teams respond to customer-facing issues, they do not capture internal clarity or alignment.
Rein references W. Edwards Deming to highlight a common trap: assuming that everything that matters can be measured. High-performing teams often rely on judgment, experience, and shared context.
Toward the end, Rein connects these ideas to The Field Guide to Understanding Human Error by Sidney Dekker, challenging the idea that incidents are simply caused by mistakes. Instead, they emerge from the same behaviors that usually lead to success, just under different conditions.
For teams working in complex systems, the takeaway is straightforward. Maintaining software depends on maintaining understanding.
Links & Resources
Concepts & References
- How Complex Systems Fail – Richard Cook
- The Field Guide to Understanding Human Error – Sidney Dekker
- W. Edwards Deming
- The Secrets of Consulting – Gerald Weinberg
Referenced in this Conversation
- Kent Beck: You’re Ignoring Optionality and Paying for It
- Charity Majors: Deploys Are Just the Beginning
- Heidi Helfand: The Art and Wisdom of Changing Teams
Thanks to Our Sponsor!
Turn hours of debugging into just minutes! AppSignal is a performance monitoring and error-tracking tool designed for Ruby, Elixir, Python, Node.js, JavaScript, and other frameworks.
It offers six powerful features with one simple interface, providing developers with real-time insights into the performance and health of web applications.
Keep your coding cool and error-free, one line at a time!
Use the code maintainable to get a 10% discount for your first year. Check them out!