Application Modernization - Part 3: Unravel the data
Data in applications has a powerful gravitational pull, so transforming and modernizing applications is heavily impacted by existing data. Digital transformations have focused on enabling new cloud applications and front ends while remaining connected to existing transactional backends. This approach has highlighted issues in handling shifts of traffic load between reading and writing transactional data. I had direct experience with this on a couple of projects:
- Online ticketing: moving from traditional sales at stations to online apps and services leads to a significant reduction in the conversion rate from travel searches to tickets sold (e.g. an online user plans several trip options before committing to a purchase), increasing resource consumption.
- Online banking: the adoption of online banking greatly increased the volume of query operations (e.g. end users reading their account statements online) compared to actual financial transactions. To contain the increased load and cost on the backend transactional systems, a data replication approach (a.k.a. Copy Banking) can be used to offload query traffic from the expensive mainframe to a hybrid cloud containerized infrastructure.
To continue the cloud journey, we also need to refactor the backend transactional systems, with their persistent data (mission-critical and financial in nature), and make the new services able to scale adequately and cost-effectively.
Breaking down the monolith
Domain-Driven Design (DDD) is a very good tool to break down the business domain and identify the parts of the business applications that can be modernized and operated as independent microservice components. Analysis of the data is very helpful in this process, to identify contexts based on different data relationship types:
- Foreign key relationships: Group tables that are related to the same entity
- Transactional relationships: Group tables that are updated in the same transaction
These groups of related tables are indications of candidate DDD aggregates. A group is a good candidate when its tables are tightly coupled to each other but have few relationships with tables outside the group.
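As an illustration (not a tool referenced in this article), the sketch below groups tables connected by foreign-key or transactional relationships into connected components; each component is a candidate aggregate. All table names are hypothetical.

```java
import java.util.*;

public class AggregateCandidates {

    // Treat foreign-key (or same-transaction) relationships as undirected edges between tables
    static Map<String, Set<String>> graph = new HashMap<>();

    static void relate(String a, String b) {
        graph.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        graph.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Each connected component is a group of tightly related tables
    static List<Set<String>> connectedComponents() {
        List<Set<String>> components = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        for (String table : graph.keySet()) {
            if (visited.contains(table)) continue;
            Set<String> component = new HashSet<>();
            Deque<String> stack = new ArrayDeque<>(List.of(table));
            while (!stack.isEmpty()) {
                String current = stack.pop();
                if (!visited.add(current)) continue;
                component.add(current);
                stack.addAll(graph.getOrDefault(current, Set.of()));
            }
            components.add(component);
        }
        return components;
    }

    public static void main(String[] args) {
        relate("ACCOUNT", "ACCOUNT_ENTRY");   // foreign key
        relate("ACCOUNT", "ACCOUNT_HOLDER");  // foreign key
        relate("TRIP_SEARCH", "TRIP_OPTION"); // updated in the same transaction
        // Components with few links to tables outside the group are the best aggregate candidates
        connectedComponents().forEach(System.out::println);
    }
}
```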
This is such a good strategy that the largest chapter of the "Monolith to Microservices" book by Sam Newman focuses explicitly on patterns for managing data handling.
The identified contexts can then be extracted (code and data together) using variations of the "Strangler Pattern" to create the independent replacement components.
Coexistence architecture
While the strangler pattern can be applied to the front-end and application components using HTTP proxy/routing techniques to dynamically replace sections of the application (as in the sketch below), the need to see consistent data requires other techniques.
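As a rough illustration of that routing idea, here is a minimal, simplified sketch using only the JDK: requests for an already-modernized context (a hypothetical "/tickets" path) go to the new service, everything else still hits the legacy backend. It forwards only simple GET requests and is not the cookbook's implementation.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class StranglerProxy {
    private static final HttpClient client = HttpClient.newHttpClient();

    public static void main(String[] args) throws Exception {
        HttpServer proxy = HttpServer.create(new InetSocketAddress(8080), 0);
        proxy.createContext("/", exchange -> {
            String path = exchange.getRequestURI().getPath();
            // Route modernized contexts to the new service, the rest to the legacy backend
            String target = path.startsWith("/tickets")
                    ? "http://modernized-service:9090"
                    : "http://legacy-backend:9091";
            try {
                HttpRequest request = HttpRequest.newBuilder(URI.create(target + path)).GET().build();
                HttpResponse<byte[]> response =
                        client.send(request, HttpResponse.BodyHandlers.ofByteArray());
                exchange.sendResponseHeaders(response.statusCode(), response.body().length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(response.body());
                }
            } catch (InterruptedException e) {
                exchange.sendResponseHeaders(502, -1); // upstream call interrupted
                exchange.close();
            }
        });
        proxy.start();
    }
}
```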
Rolling out a modernized application is rarely a big-bang event; instead, there are long periods of parallel deployment and testing of the mission-critical applications. This need for "coexistence" requires a suitable architecture, and at IBM we have a reference cookbook that provides some key fundamental patterns.
These patterns help in moving out the data and managing the transition in either direction:
- Current to Modernized
- Modernized to Current
Let's have a look at these patterns, focusing on the Current to Modernized direction.
Change Data Capture
Current system events are discovered by consuming changes in the system's existing artifacts (typically database files and/or data sources). This pattern is used when changes to the Current programs are not affordable or strategically desirable; in that case, however, additional effort must usually be spent to adapt the events for the destination systems, with on-demand transformations that often duplicate existing legacy logic.
Key tools for implementing this pattern are:
- IBM® Change Data Capture (CDC Replication)
- Oracle GoldenGate
- Debezium, as an open source CDC option
Pros | Cons |
---|---|
No changes to the Current programs are required | On-demand transformations often have to duplicate existing legacy logic |
Changes are captured directly from existing artifacts (database files and/or data sources) | CDC emits every low-level data change, so downstream filtering is usually needed |
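As a minimal sketch of what consuming such change events can look like, the snippet below reads Debezium-style records from a Kafka topic. The topic name, broker address and group id are hypothetical, and Debezium's default JSON envelope (a "payload" holding "op", "before", "after") is assumed.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class CdcConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "modernized-accounts");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        ObjectMapper mapper = new ObjectMapper();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("dbserver1.banking.accounts"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    if (record.value() == null) continue; // tombstone record, nothing to transform
                    JsonNode payload = mapper.readTree(record.value()).path("payload");
                    String op = payload.path("op").asText(); // c=create, u=update, d=delete, r=snapshot read
                    JsonNode after = payload.path("after");  // row image after the change
                    // The on-demand transformation to the modernized data model would happen here,
                    // which is where legacy logic tends to get duplicated.
                    System.out.printf("op=%s after=%s%n", op, after);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```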
Application Event Streaming
This pattern is used to replicate the Current system application's state through application changes that expose all business events to one or more event streams. Events are sourced to destination systems with on-demand transformations (usually limited to adapting model schemas, not replicating business logic).
Events are exposed after the business operation is completed, to avoid phantom events. To support this, it is advised to use a message-passing middleware with transactional guarantees (either local transactions or exactly-once/at-least-once delivery guarantees), as in the sketch at the end of this section. Such middleware can be:
- Queue-based systems (JMS, IBM MQ, etc...)
- Topic/Partition based (Kafka)
Further articles in this series will go more in-depth on events/messaging.
Pros | Cons |
---|---|
Exposes real business events; transformations are usually limited to adapting model schemas, without replicating business logic | Requires changes to the Current system's applications |
Phantom events are avoided by exposing events only after the business operation completes | Depends on a message-passing middleware with transactional guarantees |
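To illustrate the transactional-guarantee point with Kafka, here is a minimal sketch of a transactional producer: the business event becomes visible to read_committed consumers only once the send is committed. The topic name, key and payload are hypothetical.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class BusinessEventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "account-events-publisher");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            try {
                producer.beginTransaction();
                // Publish the business event only after the Current system has completed
                // the underlying business operation, to avoid phantom events.
                producer.send(new ProducerRecord<>("account-business-events",
                        "IT0000123", "{\"type\":\"PaymentExecuted\",\"amount\":100.00}"));
                producer.commitTransaction();
            } catch (Exception e) {
                producer.abortTransaction(); // read_committed consumers never see the aborted event
            }
        }
    }
}
```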
Filtering & Transformation engine
This pattern is used to reduce event streams to the relevant events by applying transformation and filtering rules and by caching processed events.
Filtering rules are used to:
- Drop irrelevant events (CDC, for instance, sends every change)
- Route to different/multiple event destinations
Event cache is used to:
- Filter out duplicate events (e.g. coming from multiple sources)
- Guarantee idempotent retries of event transformations
- Aggregate low-level events
Transformations can become complex in both CDC and Event Streaming scenarios, especially if they need to duplicate the business logic of the Current system. EAI pattern-aware tools such as Camel/Fuse are particularly useful for these kinds of transformations.
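As an example of what such an engine can look like, here is a minimal sketch of an Apache Camel route, assuming Camel 3.x with the camel-kafka component; the topic names, header names and routing rule are hypothetical.

```java
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.support.processor.idempotent.MemoryIdempotentRepository;

public class FilterTransformRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("kafka:cdc-raw-events?brokers=kafka:9092")
            // Filtering rule: drop irrelevant events (CDC sends every change)
            .filter(header("eventType").isNotEqualTo("TECHNICAL"))
            // Event cache: skip duplicates from multiple sources and keep retries idempotent
            .idempotentConsumer(header("eventId"),
                    MemoryIdempotentRepository.memoryIdempotentRepository(10_000))
            // Transformation step: adapt the legacy record to the modernized schema
            .process(exchange -> {
                String legacy = exchange.getMessage().getBody(String.class);
                exchange.getMessage().setBody(toModernizedModel(legacy));
            })
            // Route to different/multiple event destinations
            .choice()
                .when(header("domain").isEqualTo("accounts")).to("kafka:modernized-accounts")
                .otherwise().to("kafka:modernized-other");
    }

    private static String toModernizedModel(String legacyPayload) {
        // Placeholder for the (possibly complex) transformation logic
        return legacyPayload;
    }
}
```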
Pros | Cons |
---|---|
Reduces event streams to the relevant events (filtering, routing, deduplication, aggregation) | Transformation logic can become complex, especially when it has to duplicate the Current system's business logic |
The event cache makes retries of event transformations idempotent | Adds another component (engine and event cache) to build and operate |
Choosing an approach
Ultimately, CDC is not a silver bullet for modernizing systems. All these patterns are tools for modernizing applications, each with its own strengths and complexity, and each, in the wrong context, can become an anti-pattern.
We need to choose the right balance of complexity for the solution we want to build. As a criterion for this litmus test, I would use the envisioned target architecture, which can either completely replace the old system or partially reuse it:
- Replace: The old system no longer has strategic value, so the goal is to move out of the legacy and transition completely to a new solution. In this scenario, CDC with Filtering and Transformation logic eases the transition to the new data model, with the goal of removing these duplicated-logic steps as soon as the transition is complete. This scenario fits better with a unidirectional data flow.
- Coexist: The old system still has strategic value, so the goal is to regain ownership of the old legacy application and integrate it with the new components. In this scenario, Event Streaming works best at decoupling the systems and replicating data based on a shared semantic model, but CDC might still be used effectively if the two data models do not require significant transformation logic to be maintained. It is also a more appropriate target solution for bidirectional data flows.
Especially with the second approach, it is better to avoid a two-speed-architecture organizational model and instead include the legacy application teams in the overall IT process, to foster an effective Agile/DevSecOps environment for maintaining the two systems and their data flows.