Application Modernization - Part 4: Engineering the Delivery Platform


Modernizing applications toward a cloud-native approach, which breaks monolithic components into multiple smaller microservices, increases the complexity of IT infrastructure operations. While each microservice becomes simpler and potentially more scalable, the number of moving pieces to be managed increases drastically.

The traditional Operations approach, in which teams individually manage and monitor each Server/VM and apply security fixes, middleware patches, and upgrades to it, cannot scale effectively.

A different approach is needed, based on large-scale automation, to keep control and scale effectively. It is commonly explained through the "Cattle vs Pets" metaphor, where:

  • Pets are the traditional Servers/VMs, individually and carefully managed by the operations team. This is a labor-intensive activity that must be prioritized based on the criticality of components, leading to an increasingly divergent set of software component combinations, at different patch levels, on each server.
  • Cattle are cloud-native VMs and Containers, managed through large-scale automation. A failing Container is not repaired but replaced as soon as possible, while the automated procedures are continuously improved to reduce faults.
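
To make the "replace, don't repair" idea concrete, here is a minimal Python sketch of a reconciliation loop, assuming a hypothetical pool that only knows a desired instance count and a health flag per instance. The names `Instance`, `Pool`, `reconcile`, and the `app-N` naming scheme are illustrative inventions, not any orchestrator's API; real platforms implement this pattern with far more machinery.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List


@dataclass
class Instance:
    """A hypothetical, disposable container instance ("cattle")."""
    name: str
    healthy: bool = True


@dataclass
class Pool:
    """Hypothetical pool that keeps a desired number of healthy instances."""
    desired: int
    instances: List[Instance] = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1))

    def _spawn(self) -> Instance:
        # Every new instance comes from the same template; nothing is hand-tuned.
        return Instance(name=f"app-{next(self._ids)}")

    def reconcile(self) -> List[str]:
        """Replace unhealthy instances instead of repairing them.

        Returns the names of the instances that were discarded.
        """
        removed = [i.name for i in self.instances if not i.healthy]
        # Drop failed instances outright: no in-place repair ("cattle", not "pets").
        self.instances = [i for i in self.instances if i.healthy]
        # Create fresh instances until the desired count is met again.
        while len(self.instances) < self.desired:
            self.instances.append(self._spawn())
        # Trim any surplus if the desired count was lowered.
        del self.instances[self.desired:]
        return removed
```

For example, after `pool = Pool(desired=3)` and a first `pool.reconcile()`, marking `app-2` as unhealthy and reconciling again discards `app-2` and adds a brand-new `app-4`, rather than attempting to fix the failed instance.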

Two common approaches to automation have emerged: DevOps (or, even better, DevSecOps, though in this context I will not focus on security) and SRE (Site Reliability Engineering).

DevOps and SRE

These two approaches might seem like two sides of the same coin, since both rely on comparable tools and practices rooted in automation and Agile methods. At the same time, their focus differs slightly, mostly because of who originally sponsored each approach:

  • The DevOps approach originates from Development teams looking for ways to bring their applications to production faster and more safely. It automates builds, testing, and deployments through a continuous improvement process that brings business value to production and removes operational impediments along the way.
  • SRE, as the name implies, focuses on reliability, a key concern for Operations teams that strive to keep infrastructure and servers running while the applications and software installed on them change. The focus is less on business value directly and more on improving the manageability of applications, keeping systems running or restoring them as quickly as possible.
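
The DevOps idea of automating builds, testing, and deployments can be sketched as a pipeline of stages that stops at the first failure, so a broken build never reaches production. This is a minimal illustration under simplifying assumptions: the `run_pipeline` function and the stage names are hypothetical, and each stage is modeled as a plain callable returning `True` on success rather than as a real CI/CD tool's job definition.

```python
from typing import Callable, List, Tuple

# A hypothetical stage: a name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure.

    Returns (succeeded, names_of_completed_stages).
    """
    completed: List[str] = []
    for name, step in stages:
        if not step():
            # Fail fast: later stages (e.g. deploy) never run after a failure.
            return False, completed
        completed.append(name)
    return True, completed
```

With stages `build`, `test`, and `deploy`, a failing `test` stage returns `(False, ["build"])`, so `deploy` is skipped; the automation, not a manual gate, enforces the "faster and safer" path to production.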
DevOps vs SRE approach

From this perspective, the two approaches might almost seem antithetical. Why is that? It boils down to the consequences of Conway's law, and specifically to the fact that in many organizations the Dev and Ops structures have different success metrics.

The stakeholders

Dev and Ops are led by two main types of manager profiles, with differing approaches on several topics, which I summarize below.

Development vs Operation mindset

The summary presented here is deliberately polarized; real situations are rarely so black and white. The more closely the goals of both types of stakeholders are aligned, the simpler it is to collaborate.

Converging on an end-to-end approach

To make this possible, the first step is to better align the goals of the Development and Operations structures. To achieve a successful Digital Transformation and modernize applications, both Business KPIs and service level goals must be shared across the organization.

Platform Engineering supporting Dev & Ops

To implement this, a Platform Engineering approach (see also) is critical to support both Dev and Ops activities, providing common methods, platforms, and services that boost productivity and collaboration within and across both teams.

Conclusion

Platform Engineering becomes the critical enabler for effectively managing a Hybrid Multi-Cloud at scale. It is the technological foundation for effectively developing any kind of Business platform that can drive a real Digital Transformation.