Modernization Pressures
In the last 4 years, the Data Management space has seen more modernization than the prior 20 years combined. ETL vendors are pressing for disruptive changes; database vendors are touting dramatically faster analytics; cloud vendors are making the case for standardization to client C-Suites. Wherever you look in the data management space, there seems to be movement. But with that movement comes a big question: What do we do with our legacy systems?
Some have the luxury of starting fresh and rewriting their data applications, but most have to deal with the prospect of modernizing their existing jobs onto their newly selected platforms. This involves converting the old code into new code that the modern system can understand. That “code” might come in the form of metadata (in the case of an ETL tool), or of SQL scripts, or something in the middle, like a stored procedure. In any case, for the purposes of this whitepaper, we’ll call it “code”.
There is no “magic button” for code modernization. Hearing that statement from somebody who works for a code modernization automator might come as a surprise. But it’s the truth. Those claiming high first-run conversion percentages tend to shy away from putting that in a legal agreement without big strings attached. Why is that? Why isn’t code conversion a push-button event? Simply put, it’s a complex system.
A complex system is described as a system composed of many components which may interact with each other.
In fact, code conversion involves two complex systems: one legacy system and one future system. The longer a legacy system has been in operation, the more “interaction” it has with other systems. For an ETL tool, these can be connections, embedded SQL jobs, enterprise schedulers, custom programs, etc. These interactions need to be mimicked in the future system, which of course has its own limitations and capabilities. So an equally thorough architectural understanding of the future system has to be established. Presuming you can capture all of that, then you’re ready to convert your code.
Converting by Hand
Organizations and system integrators will often take to executing code conversion by hand. There are 4 cases where hand conversion can be worthwhile, or even preferable, when all four hold:
- The quantity of code is less than 50 jobs
- There is little programmatic pattern between jobs
- The number of technologies being converted is low (ETL, databases, schedulers)
- The distance between the legacy and future technologies is either very low, or so high that the effort is basically a rewrite
Low Job Quantity
The number of jobs needing to be converted is a critical count (think of a job as an ETL mapping, or a .sql file) because each job increases the number of interconnected components in the complex system. Developers can handle a small number of these interconnected components, but when the count gets above 50-100, they begin to lose the ability to sustain the development patterns in the legacy code. Additionally, when the number of jobs is low, rework isn’t much of an issue. Organizations can afford to make mistakes, whether in predicting the future platform’s functionality or in missing a flag during conversion, because the impact is fairly manageable.
Low Patterning
Patterns are the basis of automating a conversion. If the patterns simply are not there, developers have no reason to use automation. In some code, each event is bespoke. This is true for custom coding languages, where the developer can declare a near-infinite number of patterns. In data management code, this is sometimes the case for stored procedures, so the pattern types for stored procedures can be quite extensive. Even so, nine times out of ten it still makes sense to convert them using automation.
If the patterns are very sparse, even at a low level, then conversion by hand might be better.
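To illustrate why patterning matters, here is a minimal sketch (the rules and jobs are hypothetical, and this is not BladeBridge’s actual configuration format) of rule-based substitution applied across a batch of legacy jobs. The leverage comes from repetition: every job that shares a pattern is converted by the same rule.

```python
import re

# Hypothetical substitution rules mapping legacy patterns to target syntax.
# Real conversion configurations are far richer; this only shows the principle.
RULES = [
    (re.compile(r"\bSEL\b", re.IGNORECASE), "SELECT"),  # Teradata shorthand
    (re.compile(r"\bINS\b", re.IGNORECASE), "INSERT"),
]

def convert_job(legacy_sql: str) -> str:
    """Apply every substitution rule to one legacy job."""
    converted = legacy_sql
    for pattern, replacement in RULES:
        converted = pattern.sub(replacement, converted)
    return converted

jobs = {
    "load_orders.sql": "SEL * FROM staging.orders;",
    "load_items.sql": "SEL * FROM staging.items;",
}
converted = {name: convert_job(sql) for name, sql in jobs.items()}
```

If hundreds of jobs share these patterns, two rules convert them all; if every job is bespoke, each rule pays for itself only once, and hand conversion wins.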
Low Technology Variety
If the number of technologies delivering the solution is low, this is another point that can favor conversion by hand. When there are multiple interlinked handshakes that need to be projected onto a future platform, they can introduce complexity that is difficult to code consistently by hand. A simple example is enterprise scheduling tags in SQL, or even a simple datatype flag in an ETL tool. These little environmental considerations seem trivial until an organization is mid-conversion.
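To make the datatype flag example concrete, here is a minimal sketch (the mappings are illustrative assumptions, not any vendor’s actual conversion matrix). The point is that one small environmental setting has to be translated identically everywhere it appears, across potentially thousands of jobs.

```python
# Hypothetical legacy-to-target datatype mappings. A single missed entry,
# repeated across thousands of jobs, is exactly the kind of snag that
# surfaces mid-conversion.
DATATYPE_MAP = {
    "DECIMAL(18,0)": "NUMBER(18,0)",
    "BYTEINT": "SMALLINT",        # assumed: the target has no BYTEINT type
    "TIMESTAMP(0)": "TIMESTAMP",  # assumed: the target ignores this precision
}

def map_datatype(legacy_type: str) -> str:
    if legacy_type not in DATATYPE_MAP:
        # Fail loudly on unknown types instead of silently passing them through.
        raise KeyError(f"No mapping defined for legacy type: {legacy_type}")
    return DATATYPE_MAP[legacy_type]
```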
Distance of Technology
There are some technology distances so far apart that rewriting from scratch is better than attempting to convert. On the other hand, there are technologies so close together that they vary only slightly. SQL is a good example: the overlap of most SQL dialects with the ANSI standard makes SQL a candidate for hand conversion in low job count cases. The capabilities of future platforms provide a lot of flexibility and processing prowess which legacy technologies simply had no access to. However, to presume that future technologies are simply more functional would be a little naive. Some of these legacy technologies have had 20 years of features built into their code bases, so baking those gaps into the future target is often difficult to do repetitively by hand. These complex use cases are easier to leave to a pattern conversion automation tool. The key is that the technology patterns are at least somewhat comparable to each other. If the gaps are so distant that there are no logical hooks between them, then automation might not be viable.
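One rough way to gauge dialect distance is to run sample statements through an open-source transpiler and see how much maps cleanly. The sketch below uses the sqlglot Python library (one of several such tools; the dialect names are sqlglot’s, and complex statements will fare worse than this simple one):

```python
import sqlglot

# A simple statement maps nearly one-to-one between dialects, which is what
# makes "close" technologies hand-convertible; vendor-specific features are
# where the distance, and the automation payoff, shows up.
legacy_sql = "SELECT TOP 10 order_id, total FROM orders ORDER BY total DESC"
converted = sqlglot.transpile(legacy_sql, read="teradata", write="spark")[0]
print(converted)  # expect a LIMIT-style equivalent in Spark SQL
```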
Low Variance vs High Variance
Part of what determines the feasibility of conversion is the variance of the code. Different platforms have logical components used for coding functionality. In ETL tools, those components are called transformations. In databases, they are a host of command structures. But the number of components, and the variations of their use, in a database is far higher than in an ETL tool. Most ETL tools will have between 50-100 transformations available. Databases, on the other hand, handle logic at such a low level that there could be thousands of different ways to code the stored procedures, views, and common table expressions that transform data. Thus it is easier to go from low-variance code (ETL) to high-variance code (SQL or PySpark) than it is to go from high-variance code to low-variance code, simply because the low-variance target likely doesn’t map directly to many of the transformation methods being used in the source.
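A sketch of why the low-to-high direction is easier (the component names and templates are illustrative assumptions): each ETL component in a small, fixed palette can be emitted as a SQL template, while the reverse direction has to recognize one of thousands of hand-written SQL variants before it can pick a component.

```python
# Hypothetical palette of low-variance ETL components mapped to SQL templates.
# ETL -> SQL is a table lookup; SQL -> ETL requires classifying free-form
# code against a fixed palette, which is why that direction is harder.
SQL_TEMPLATES = {
    "FILTER": "SELECT * FROM ({input}) WHERE {condition}",
    "AGGREGATOR": "SELECT {group_by}, {aggregates} FROM ({input}) GROUP BY {group_by}",
    "JOINER": "SELECT * FROM ({left}) l JOIN ({right}) r ON {condition}",
}

def emit_sql(component: str, **params: str) -> str:
    """Generate target SQL for one ETL component instance."""
    return SQL_TEMPLATES[component].format(**params)

sql = emit_sql("FILTER", input="SELECT * FROM orders", condition="status = 'OPEN'")
```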
Been There, Done That
A good portion of automation deals comes from companies that are already in the middle of converting code by hand. This selection bias is driven by first-hand experience of the “complex system”, which pushes the decision past the purchasing department. Here are a few of the complexities of conducting a code conversion by hand.
Code Sharding
Legacy code usually has patterns in it simply because developers don't like to start from a blank canvas, but rather will reuse existing work. These patterns are gold for sustaining the code base because they allow for global management of the code. If something needs to be globally changed, the organization at least has a chance of doing so if the patterns are there.
When a large conversion project is undertaken by hand, code sharding is an inevitable problem because each person doing conversion work has their own way of writing the target jobs. Even the same person might not rewrite the same piece of code the same way at different points in time. So if a team of 8 people is converting code, you will get 8 sharding patterns, with potentially different iterations of those shards even within one person’s work. The result is that any later effort to promote or audit global governance of the code patterns is impossible after the conversion is done.
Code Custody
With any environment that has nearly a decade of use, it becomes that “complex system” with many inputs, which must now be converted to a complex target with many inputs. Even for experienced architects, it is nearly impossible to pre-determine all the snags a team will run into mapping the complexity of the source to the complexity of the target. In hand conversion, the “potential energy” of these snags grows the further along the team is with the project, because there is no one place to go to make global edits. There are a few good stories that drive this point home. The first is from a BladeBridge partner manager:
A few weeks back, one of my partner SIs sheepishly called me to let me know they had decided to hand convert some ETL jobs from a legacy tool to a new one. During this conversion, they were also changing database vendors. After about 8 months, they had finished the syntactical conversion successfully and were testing data. They noticed a consistent issue with a large number of jobs that had been changed to the new cloud database. The error had to do with a datatype change which would be required in the ETL tool in order to get the data to load correctly. These datatype variables resided in thousands of jobs across the ETL deployment. Unfortunately, because this error was found so late in the conversion, the relationship with the customer was in a very tight spot, and it required a sizable scope change which the SI had to partially swallow.
The case study above illustrates the “potential energy” that builds as a hand conversion project goes on. As snags are found, the rework can be far-reaching, causing the project to go way over budget. The cause of this far-reaching work is the lack of centralized code custody via a code generator. The following case, provided by a BladeBridge SI partner, illustrates the contrast when a code generator is present:
During one of my first code conversion projects 4 years back, we were using BladeBridge to migrate off of Teradata. We had finished the conversion after about 5 months and we were ready to submit the code for promotion to production. The team responsible for that was located in India. After sending them the code, the Indian team reached out to us asking where the enterprise scheduling tags were in the SQL. The team in the US wasn’t aware of this enterprise scheduler, but it was a critical component to productionizing the new SQL code. The client in the US panicked thinking there would be a massive scope change. However, our team had used BladeBridge to configure the code conversion so we simply had to take the tagging patterns communicated by the team in India and bake them into our configuration files. It took us about 2 weeks of unit testing and another 2 weeks of data testing but we were done with the needed changes very quickly.
The centralization of logic ensures the organization doesn’t lose custody of the code during the conversion project, and the logic can be adapted throughout the lifetime of the project. This eliminates the “potential energy” problem, as the bulk of the code patterns is addressed globally in the configuration files.
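Here is a minimal sketch of what centralized custody buys (the configuration format and tag syntax are invented for illustration; they are not BladeBridge’s actual files). In the scenario above, only the configuration had to change, and every job could then be regenerated:

```python
# Hypothetical conversion configuration: one global edit here is reflected in
# every regenerated job, instead of being hand-applied thousands of times.
CONFIG = {
    "scheduler_tag_template": "/* SCHED:{job_name}:{frequency} */",
    "default_frequency": "DAILY",
}

def publish_job(job_name: str, body_sql: str, config: dict) -> str:
    """Prepend the enterprise scheduling tag required for production promotion."""
    tag = config["scheduler_tag_template"].format(
        job_name=job_name, frequency=config["default_frequency"]
    )
    return f"{tag}\n{body_sql}"

# Re-publishing all jobs after a configuration change is a loop, not a rewrite.
jobs = {"load_orders": "SELECT ...", "load_items": "SELECT ..."}
published = {name: publish_job(name, sql, CONFIG) for name, sql in jobs.items()}
```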
Automation Categories
Black Box Converters
Most automated code conversion vendors use a black box automation tool for executing the code conversion. These black box solutions fall into 3 primary categories:
Consulting as a Product
These are vendors that sell their black box accelerator combined with their consulting services to make conversions happen.
Upside:
- They have specialized team members for doing code conversion work, so they can move quickly in a sales cycle with little friction.
- They can centralize a lot of metadata to optimize their conversion process.
Downside:
- Their market scale is limited to their in-house staff.
- Clients must provide all their corporate metadata to the 3rd party conversion vendor so the code can be converted. This puts a target on a small vendor’s back as a “honeypot” repository of metadata, thus the legal agreements can be extensive.
- There is no driving force outside of the vendor’s own pursuit for building new conversion routes, so the list of available conversion scenarios is usually quite limited.
- The scope of conversion is usually limited to whatever converts in a limited set of iterations and the client is left to convert any remaining code. The scope agreements really matter in this type of conversion.
One Hit Wonders
These are usually conversion utilities designed by the target vendors themselves to bring their client base to the cloud. The category also includes companies that hyperfocus on one conversion type.
Upside:
- Their converter is highly specialized for its niche, creating solid conversion rates for pure-play product deployments.
- The costs are often subsidized by the target of the conversion as the conversion tooling is often owned by them.
Downside:
- They get stuck easily in scenarios where interactions traverse the functionality of other tools and features.
- If a feature does not work, there is typically no way to configure around the error until another release comes out, so projects are forced to hand convert outside of the automation.
- No cross-functional skills can be carried from one conversion to another, as the tool can only be used for its one source and target.
Consulting Army
This is probably the most conniving of the 3 go-to-market solutions. These are consultancies that speak to automation, but are really just using an army of consultants to execute the code conversion project.
Upside:
- Quick to execute small conversions that fit hand-conversion cases described earlier
Downside:
- Sharded patterns during execution
- Lost custody of code as project progresses
- Inconsistent conversion practices
- Lots of rework
- No pattern-based scalability
- Bogus automation claims
White Box Converters
A white box converter should be able to both demo the conversion and expose the configuration so it can be edited by the client or systems integrator. The evidence that a vendor has a white box converter is whether it is implementable by other system integrator partners, or even by the client themselves. This can be confusing, as vendors will often share case studies where system integrators were involved. However, if the system integrator only tested output, that doesn’t mean the system integrator operated the tooling.
Upside:
- Editable configuration files that drive how code is converted make the conversion adaptable and extendable.
- A larger system integrator community makes for more configuration routes simply due to market exposure.
- Conversions can be conducted within the 4 walls of the organization and no metadata needs to be shared with outside parties.
Downside:
- There is a software cost to the consulting effort.
- There is a steep learning curve for understanding the substitution logic between legacy and target configuration files.
- Systems integrators may or may not have mature teams in the code conversion space.
Force of Revenue
“Follow the money” is probably one of the most overused, yet most useful, axioms. How code conversion companies behave is basically a function of where their revenue-generating opportunities come from. There is nothing nefarious about this, of course; it simply informs where the focus of the conversion efforts will be. Conversion vendors don’t have unlimited resources for supporting more technologies, so these driving forces really matter to their evolution as a platform. Building new connectors is often a “chicken and egg” conundrum for automation vendors: the vendor needs not only access to the legacy platform, but also the ability to do effective R&D on that platform.
Vendor-Sponsored Conversions
In the case of a vendor-sponsored tool set, the incentives are narrowed to the strategic goals of the hosting vendor. For example, prior to Google’s acquisition of CompilerWorks, CompilerWorks was heavily influenced to focus its efforts on migrations to BigQuery, as it had become a strategic partner to Google’s sales team for driving conversion projects. The strategy of such code conversion technologies tends to be an effort to latch onto a single vendor to ensure a reliable flow of opportunities. The upside is that the target technology gets a solid handshake; the downside is that it narrows the number of technology targets the conversion vendor can support. Thus for more narrow conversions this approach can be ideal, but these tools suffer from a less adapted platform for exceptions to the rule.
Systems Integrator Sponsored Conversions
In the case of a systems integrator, the integrator acts as a broker for its clients’ desired projects. Their incentive is to sell consulting hours to their clients. They need a solution that makes them competitive enough with the black box conversion solutions, without completely eliminating their consulting revenue. Their sponsorship is highly tactical to their clients’ needs, so they rarely seek fidelity to a single source or target technology. This approach requires the conversion technology to be as loosely coupled as possible to new technology requirements. Thus for multi-technology conversions this approach can be ideal; however, these tools suffer from less focus on any single technology stack. The upside is that they are forced to be rapidly adaptable, as there is no predicting the types of conversions the systems integrators will require.
No Automation Available
If you find yourself in a situation where there is no automation path, that does not mean you should, by default, forge ahead with a hand conversion. Even with the additional effort of having an automation vendor build a path for your legacy technology, the project will still likely be shorter than converting the code by hand. This is where you will want to seek out an automation vendor that has a broad base of configurations and a regular practice of building new migration routes. This tends to favor the system integrator sponsored vendors.
Show Me
Of the current disciplines in technology, code conversion automation is probably among the most complex. Folks who are good at it need to be good at so many technologies that it can seem overwhelming. For this very reason, organizations should not feel satisfied with automation proofs until they actually see that the code was auto-generated and not converted by hand. Usually, when companies are asked, “did you see them actually run this through their tool?”, the answer is no. Organizations need to be careful that they aren’t getting a back-office parlor trick. But this gets at the soft underbelly of code conversion tools: they aren’t designed for small-scale conversions, which makes them appear on par with hand conversion in terms of effort.
Yes, it is true that these code conversion tools could be run “blind” to see what the output would be, but doing so with no questions asked is not a real-world simulation. Questions like “how would you like the SQL to be wrapped, if at all?”, “what naming convention do you want on the target jobs?”, “what enterprise scheduling tags need to be included in the target code?”, or “what orchestration method will you be using?” don’t get asked and answered in a blind conversion. And those are just the tip-of-the-iceberg questions; far more technical questions ultimately surface at first glance at the legacy code. Even spending just a day conditioning an automation suite can result in far higher conversion rates. Converting 3 jobs by hand and 3 jobs with an automation tool might take a similar amount of time. But scale that up to 200 jobs and the two approaches aren’t even in the same ballpark in terms of speed and accuracy.
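A back-of-the-envelope model makes the scaling point concrete (all the numbers below are illustrative assumptions, not benchmarks): hand conversion costs roughly a fixed effort per job, while automation carries a one-time conditioning cost plus a much smaller per-job cost.

```python
# Illustrative effort model, in hours. The constants are assumptions chosen
# only to show the shape of the curve, not measured figures.
HAND_HOURS_PER_JOB = 8
CONDITIONING_HOURS = 40   # roughly a day-to-week spent conditioning the suite
AUTO_HOURS_PER_JOB = 0.5  # review/touch-up per generated job

def hand_effort(jobs: int) -> float:
    return jobs * HAND_HOURS_PER_JOB

def automated_effort(jobs: int) -> float:
    return CONDITIONING_HOURS + jobs * AUTO_HOURS_PER_JOB

for n in (3, 200):
    print(n, hand_effort(n), automated_effort(n))
# At 3 jobs the efforts are comparable (24 vs 41.5 hours); at 200 jobs they
# diverge dramatically (1600 vs 140 hours).
```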
Pilots
One of the knee-jerk requests from conversion prospects is to hunt down the most complex job they have for a pilot. While this might be a fun showcase, the reality is that this complex job probably occurs only 2-3 times in the deployment, if it repeats at all. From an automation perspective, this isn’t the kind of job you want to prioritize for a pilot. Rather, it is the job with the most repetitions in the code base, as this will mean that the entire population of such jobs could be automated. These types of jobs may occur hundreds of times and have elements that may occur thousands of times.
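One way to find those high-repetition candidates is to group jobs by a normalized structural signature and pilot the largest group. The sketch below is a crude text-level fingerprint (a real analyzer would parse the jobs rather than hash their text):

```python
import hashlib
import re
from collections import defaultdict

def signature(job_sql: str) -> str:
    """Crude structural fingerprint: strip literals so that jobs built from
    the same template hash to the same signature."""
    normalized = re.sub(r"'[^']*'", "'?'", job_sql)   # string literals
    normalized = re.sub(r"\b\d+\b", "?", normalized)  # numeric literals
    return hashlib.md5(normalized.encode()).hexdigest()

def pilot_candidates(jobs: dict[str, str]) -> list[tuple[str, int]]:
    """Return one representative job per pattern group, largest group first."""
    groups = defaultdict(list)
    for name, sql in jobs.items():
        groups[signature(sql)].append(name)
    return sorted(((names[0], len(names)) for names in groups.values()),
                  key=lambda pair: -pair[1])
```

Automating the representative of a 300-job group converts 300 jobs; automating the one-off showcase converts one.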
Modernization is Not Cheap
Those expecting a “push button” code modernization journey are setting themselves up for disappointment. The truth is that code modernization is complex work that requires a combination of configurable automation tooling and solid legacy/target architecture experience. If a vendor attempts to pitch their tools as “push button” then READ THE LEGAL AGREEMENT.
The complexity of converting code also means that there is a real cost to making it happen, and that cost is largely connected to how extensive and complex the legacy code is. To determine this, most conversion automation solutions have analysis tools which evaluate the legacy code to determine the size of the conversion. This analysis builds an inventory of the elements being used in the deployment, along with a mechanism to predict the scope of complexity. The nice part about this is that the scope is a known quantity. So while modernization can cost real money, at the very least it isn’t “pie in the sky” in terms of scope. This is really the key to why modernization of these “complex systems” is even feasible: there is a beginning and an end to their scope, and the permutations of their functionality are not infinite. As long as you have enough flexibility to adapt at scale through the automation solution, code conversion is very feasible.
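A sketch of the inventory idea (the element names and difficulty weights are illustrative assumptions, not any vendor’s scoring model): count each element type in the legacy deployment and weight it by conversion difficulty to produce a bounded scope estimate.

```python
from collections import Counter

# Illustrative difficulty weights per element type. A real analyzer derives
# its inventory by parsing the legacy export, not from a hardcoded list.
WEIGHTS = {"expression": 1, "lookup": 2, "stored_procedure": 8, "custom_script": 20}

def scope_estimate(inventory: Counter) -> int:
    """Weighted element count: a rough but bounded measure of conversion scope."""
    return sum(WEIGHTS.get(element, 5) * count  # unknown elements get a middle weight
               for element, count in inventory.items())

inventory = Counter({"expression": 1200, "lookup": 300,
                     "stored_procedure": 40, "custom_script": 3})
print(scope_estimate(inventory))  # 1200 + 600 + 320 + 60 = 2180 weighted units
```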
Don't Forget Testing
Having a code conversion tool is the flashy part (if that’s even a thing). But don’t forget that once the code is converted, it must be tested. At the most basic level, it needs to be syntactically compatible with the future platform, which is verified in a unit test.
The harder test is the data test. This is where the legacy database gets compared to the future database. Both should be able to run the same queries and return the same resulting data. Some organizations take testing to a much deeper level and seek to test each component. These test cases can get a little crazy, but in some organizations they are mandatory.
Thus data testing can be a large piece of the project, and each organization must determine how it will conduct it. There are tools that facilitate these parallel tests, firing off queries to both databases and consolidating any mismatches. Depending on the level of testing, the cost of data testing can eclipse the cost of code conversion, so a plan must be in place to deal with that. This is also the place to point out the need for a code conversion platform that allows for endless iteration. You may finish the conversion, then find something later in your data testing which requires you to return to the conversion tools, make a global change, and republish. Having that flexibility is key.
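A minimal sketch of the parallel-test idea (connection setup is omitted; it assumes two open DB-API connections and illustrative probe queries): run the same aggregate probes on both platforms and record any mismatches for triage.

```python
# Assumes `legacy_conn` and `future_conn` are open DB-API connections to the
# legacy and future databases. Probe queries are illustrative; real suites
# compare row counts, checksums, and sampled rows per table.
PROBES = {
    "orders_rowcount": "SELECT COUNT(*) FROM orders",
    "orders_total": "SELECT SUM(total) FROM orders",
}

def run_probe(conn, sql: str):
    cursor = conn.cursor()
    cursor.execute(sql)
    return cursor.fetchone()[0]

def compare(legacy_conn, future_conn) -> list[str]:
    mismatches = []
    for name, sql in PROBES.items():
        legacy_val = run_probe(legacy_conn, sql)
        future_val = run_probe(future_conn, sql)
        if legacy_val != future_val:
            mismatches.append(f"{name}: legacy={legacy_val} future={future_val}")
    return mismatches
```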
Don't Forget Promotion to Production
Some organizations have a very loose production promotion process. However, the kinds of organizations that have 10-20 years of code to convert don’t usually fit in that category. Runbooks, regression tests, and documentation of the future environment all need to be budgeted for the full project costs to be accounted for.
Wrangling the Complex System
Breaking down the complexity of conversion is done one step at a time.
- Decide the future state architecture (do this before you evaluate code migration solutions)
- Analyze the legacy code
- Decide on your automation platform
- Condition the automation suite to your organization’s base patterns
- Convert legacy code to the future platform
- Adapt the automation platform to high-population error patterns (see the sketch below)
- Compare data tests between legacy and future databases
- Adapt automation platform to high population data error patterns
- Handle bespoke errors by tweaking output jobs by hand (adapting the automation for these doesn’t make sense due to their low population)
- Run through promotion to production
These phases can overlap one another as code moves through a conversion project. As you can see, the process is not a “push button” event. It requires skill and focus.
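For the error-pattern adaptation steps above, a minimal sketch (the normalization rules are illustrative): bucket conversion errors by a normalized message, so that high-population patterns get configuration fixes while the long tail gets hand-tweaked.

```python
import re
from collections import Counter

def normalize(error: str) -> str:
    """Collapse job-specific details so identical error patterns bucket together."""
    generic = re.sub(r"'[^']*'", "'?'", error)  # quoted identifiers/literals
    generic = re.sub(r"\b\d+\b", "N", generic)  # line numbers, counts
    return generic

def triage(errors: list[str], threshold: int = 10):
    """Split error patterns into config-worthy fixes and hand-fix candidates."""
    buckets = Counter(normalize(e) for e in errors)
    fix_in_config = [(p, n) for p, n in buckets.most_common() if n >= threshold]
    fix_by_hand = [(p, n) for p, n in buckets.most_common() if n < threshold]
    return fix_in_config, fix_by_hand
```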
If you would like to learn more about how BladeBridge approaches code conversion go to BladeBridge.com and send us your information to get in touch.
Who is Intricity?
Intricity is a specialized selection of over 100 Data Management Professionals, with offices located across the USA and headquarters in New York City. Our team of experts has implemented solutions in a variety of industries, including Healthcare, Insurance, Manufacturing, Financial Services, Media, Pharmaceutical, Retail, and others. Intricity is uniquely positioned as a partner to the business that deeply understands what makes the data tick. This joint knowledge and acumen has positioned Intricity to beat out its Big 4 competitors time and time again. Intricity’s area of expertise spans the entirety of the information lifecycle. This means when your problem involves data, Intricity will be a trusted partner. Intricity’s services cover a broad range of data-to-information engineering needs.
What Makes Intricity Different?
While Intricity conducts highly intricate and complex data management projects, Intricity is first and foremost a Business User Centric consulting company. Our internal slogan is to Simplify Complexity. This means that we take complex data management challenges and not only make them understandable to the business, but also make them easier to operate. Intricity does this by using tools and techniques that are familiar to business people but adapted for IT content.
Thought Leadership
Intricity authors a highly sought-after Data Management Video Series targeted towards Business Stakeholders at https://www.intricity.com/videos. These videos are used as a teaching tool in universities across the world.
Talk With a Specialist
If you would like to talk with an Intricity Specialist about your particular scenario, don’t hesitate to reach out to us. You can write us an email: specialist@intricity.com
(C) 2023 by Intricity, LLC
This content is the sole property of Intricity LLC. No reproduction can be made without Intricity's explicit consent.
Intricity, LLC. 244 Fifth Avenue Suite 2026 New York, NY 10001
Phone: 212.461.1100 • Fax: 212.461.1110 • Website: www.intricity.com