Object-Z Systems

    

  

 Best Paper of 2000

The Realities of Language Conversions
January 2, 2001

By Andrey A. Terekhov, St. Petersburg State University
Chris Verhoef, Free University of Amsterdam

Billions of lines written in Cobol, PL/I, and other mature languages are still in active use. Many developers have tried to convert these languages to more modern ones, but few have succeeded. This article sheds light on the realities of language conversions and discusses the possibilities and limitations of automated language converters.

The most influential cost drivers of software engineering relate to management, personnel, and team capability—not software tools, although software vendors and academic researchers emphasize them. Many managers become victims of quack software modification tools and services. This mechanism, also known as the silver-bullet syndrome, is what anthropologists call name magic: you just say the name of the thing—“Cobol to Java”—and you have its full power at your disposal.

As the term name magic indicates, you don’t need proof to support your claims. Anyone desperate for solutions will wholeheartedly accept those unsupported claims. (1) In fact, many software industry decision-makers find themselves trapped in huge amounts of legacy code that is in dire need of modification. At the same time, software engineering educators deliver people skilled in contemporary development rather than enhancement programming, let alone geriatric care for aging legacy applications. Capers Jones (2) even ranks this phenomenon among the 60 most important risks in software engineering. You do not have to be a rocket scientist to figure out that a language converter could solve your personnel problems: a language conversion is supposed to bridge the gap between the knowledge people have and the knowledge needed to solve legacy problems. Is it that easy? This article sheds light on the realities of language conversions; it also provides simple examples that expose the problems, examples that anyone involved in software engineering can understand.

So How Hard Is Language Conversion?

We encountered a company that said of its Cobol-to-Visual Basic converter, “The converter runs as a simple wizard: it is intended to be something a secretary can run.” Furthermore, at a panel of the International Conference on Software Maintenance, a researcher discussing the next century’s challenges implied that automated conversion is solved and that the next challenge is paradigm-shift conversion. This means not only converting Cobol to C++ but also converting text-based procedural code into Web-enabled object-oriented C++. From such observations, we could conclude that automated language conversions are easy. And let’s face it, converting the Cobol calculation ADD 1.005 TO A to its equivalent in VB, A = A + 1.005, is indeed very easy. Any simple tool can do it.

On the other hand, language conversion seems to be risky business. We know of three companies that went bankrupt and two departments of large software-intensive enterprises that were dismantled, all because of failed language conversion projects. Tom Holmes from Reasoning states that if success means making a profit, then most conversion projects are not successful. A reader of a preliminary version of this article told us of an enterprise that spent $50 million on failed language conversion projects. In his book on computing calamities, Robert Glass mentions the failure of a translation system that would convert software from an obsolete system to a new one. Management told the developers that the conversion problem was limited in scope and that they would need only to convert a limited set of constructs. This turned out not to be true. The postmortem analysis indicated that the converter was perhaps 10 times more complicated than expected; suddenly, what had been technically feasible became economically and technically infeasible. (3)
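To make the "easy" case concrete, here is a minimal sketch of a purely syntactic rule for the ADD 1.005 TO A example mentioned above. It is written in Python for illustration only; the function name `convert_add`, the regular expression, and the hyphen-to-underscore renaming are assumptions of this sketch, not part of any converter discussed in this article. The rule handles exactly one statement form and knows nothing about the data type of A.

```python
import re

# Hypothetical rule covering one Cobol form only:
#   ADD <numeric literal> TO <identifier>
ADD_TO = re.compile(
    r"^\s*ADD\s+(-?\d+(?:\.\d+)?)\s+TO\s+([A-Z][A-Z0-9-]*)\s*\.?\s*$",
    re.IGNORECASE,
)

def convert_add(cobol_line):
    """Return a VB-style assignment, or None if the line is outside the rule."""
    match = ADD_TO.match(cobol_line)
    if match is None:
        return None
    literal, name = match.groups()
    # Cobol names may contain hyphens; this sketch maps them to underscores.
    vb_name = name.replace("-", "_")
    return f"{vb_name} = {vb_name} + {literal}"

print(convert_add("ADD 1.005 TO A"))          # A = A + 1.005
print(convert_add("ADD 1 TO TOTAL-AMOUNT."))  # TOTAL_AMOUNT = TOTAL_AMOUNT + 1
print(convert_add("ADD A B TO C GIVING D"))   # None: not covered by the rule
```

The third call shows how quickly such a rule runs out: every further variant of the statement needs its own pattern, and none of the patterns says anything about whether the converted arithmetic behaves like the original.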

In the news group alt.folklore.computers, S.C. Sprong, who ported software from Fortran to C, stated, “Low-level porting, even with provided documentation, is one of the blackest software arts; knowing how to do it well will surely get you a first-class ticket to hell.”

Needless to say, there is a lot of disagreement on the complexity of language conversion. Perhaps it is theoretically possible to automatically improve a software system’s structure to achieve a paradigm shift, for example, by introducing OO concepts. However, this is very difficult to automate and requires a lot of human intervention. (4) Still, the payoff is commensurately large, especially for systems with indefinite life spans. Because of these automation problems, most conversion tools settle for syntactic conversion. Even with this seemingly simple and low-level approach, many difficulties occur, and the scale of those intricacies is not yet fully understood.


Despite the lack of solid publications about language conversions (we listed the most useful references, (5–13) but some are difficult to find or are in German, so contact us for copies or pointers), lots of vendors advertise migration tools and services. Many companies say they can convert your systems to whatever language you want. Although some of them may be doing a great job, others who claim to have the technology and skills do not. For instance, one company advertising on the Internet provided examples of converted code that appeared not even to be compilable. Another company claimed to be able to convert PowerBuilder to Java, but our inquiries revealed that they had neither the experience nor the tools to do the job. Instead, they claimed to have a process! In his book on OO software migration, Harry Sneed summarizes the state of the practice in the transformation marketplace as follows (our translation from German): (12) “The reality looks different. Those who can read between the lines recognize that the problems are grossly simplified and that the advertised products are far from being ripe for use in practice.”

Requirements for Language Conversion Aids

The problem statement for language conversion is simple: convert this system to that language without changing the external behavior. From an abstract point of view, a language migration project seems deceptively simple. Consequently, requirements for language converters are often not formulated.

Usually, the availability of constructions that facilitate the expression of a solution determines how easy it is to formulate a solution for a particular problem. We could call such elements in a programming language native language constructs. For instance, if we wish to express a conditional problem, a language that supports a conditional language construct is more convenient than a language lacking an if-then-else construct. If we must use the latter language, we must simulate the conditional construct. Such code fragments are called simulated language constructs. We can, for instance, simulate object orientation in a language lacking OO support.
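The difference between a native and a simulated construct can be sketched as follows. The example is in Python purely for illustration, and the function names are hypothetical. The first function uses the language's own conditional; the second pretends that no if-then-else exists and selects a branch by indexing a table with the outcome of the test.

```python
def classify_native(x):
    # Native construct: the language supplies if/else directly.
    if x < 0:
        return "negative"
    else:
        return "non-negative"

def classify_simulated(x):
    # Simulated construct: no if/else is used. The comparison yields
    # False or True, which indexes a two-entry table of branches.
    branches = (lambda: "non-negative", lambda: "negative")
    return branches[int(x < 0)]()

for value in (-3, 0, 7):
    print(value, classify_native(value), classify_simulated(value))
```

Both functions behave the same, but only the first announces itself as a conditional. A converter that is to produce a native conditional in the target language first has to recognize that the table lookup in the second function is one.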

The language conversion problem for a serious software system amounts to mapping its native and simulated constructs to hopefully only native constructs available in the target language(s), as Figure 1 shows. 


Figure 1. A mapping of constructions between languages.
 

There are at least six possible categories in this mapping (expressed as the six arrows in the figure), all of which we have encountered. Requirement specifications usually mention only the native-to-native part of the mapping, in the form of a statement-by-statement conversion. This phenomenon is in accordance with people’s tendency to focus first on the easiest problems. One of us (Chris Verhoef) was an external reviewer of several large-scale language conversion projects. Most of them failed because the hard problems were avoided in the requirements. In one case, for instance, about 80% of the requirements specification dealt with the graphical user interface, while the actual language converter was represented by a single arrow. Someone underestimated the problem and thus omitted the hard parts from the requirements.  

Underestimation often leads to runaway projects. The first project failure delays the actual software conversion, which in turn increases the pressure on the next development team to deliver the converter. This pressure makes it easy for the new team to skip requirements altogether; after a few failure cycles, total breakdown occurs, which explains the demise of the earlier-mentioned language conversion companies and departments.

To develop a source-to-source converter, you need, at a minimum, the following requirement specifications:

  • You must inventory the native and simulated language constructs of the system that needs conversion.
  • You must develop a conversion strategy for each language construct. Specifically, you must list the input and output fragments that describe the converter’s desired behavior.
  • You must make an explicit statement about whether the converted system should be functionally equivalent to the original system. Intuitively, you would think this is always so, but in practice, a sophisticated automated modification effort usually exposes faults and unsafe code in the original system. Often, the customer then requires the developers to fix these faults as well. Thus, the requirements must deal with potential requirements creep. Note that fixing faults also hampers testing the new system, because regression testing is based on equivalence.
  • You must include a statement about whether the original system’s test sets are to be converted. If the conversion exposes errors in the tests, the requirements must state the policy toward modifying them.
  • You must set as your goal maximum automation of the conversion (enabling minimal human interference). 
  • If you plan to maintain the converted system, you must make it maintainable. For instance, if the original maintenance team is going to continue in its role, the new system should be as similar as possible to the original system so that the team can recognize the original code. If, on the other hand, the maintenance team is new, the conversion should try to use the target language’s idiom so that maintainers recognize the code as normal for that language. In other cases, the original source code (say, in Cobol) is used for modifications even after a successful conversion (say, to Java), because the translation process is so automated that it’s easier to regenerate final Java code than to make changes in Java. This also helps maintainers if they are familiar with the original language but not yet with the target one.
  • You must make the converted system’s efficiency adequate both in compilation and execution time.
  • If you will use the converter many times, the time required for conversion is relevant. It is not always feasible to optimize this without distributing calculations over many machines.
  • If you plan to maintain the converted system, its size must not exceed the original system's size by much (if at all). 
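The first two requirements in the list above, an inventory of constructs and a conversion strategy expressed as input and output fragments, can be written down as plain data. The following Python sketch is an illustration under strong assumptions: the names `ConversionRule`, `inventory`, and `uncovered` are hypothetical, and classifying a statement by its first word is a gross simplification that a real inventory would replace with a parser. Simulated constructs in particular cannot be found this way.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class ConversionRule:
    verb: str             # leading keyword of the source construct
    kind: str             # "native" or "simulated"
    input_fragment: str   # example source code
    output_fragment: str  # desired converted code for that example

# One rule, taken from the ADD example earlier in this article.
RULES = [
    ConversionRule("ADD", "native", "ADD 1.005 TO A", "A = A + 1.005"),
]

def inventory(source_lines):
    """Count statements by their leading keyword (a crude stand-in for parsing)."""
    counts = Counter()
    for line in source_lines:
        words = line.split()
        if words:
            counts[words[0].upper()] += 1
    return counts

def uncovered(counts, rules):
    """List inventoried keywords for which no conversion rule exists yet."""
    covered = {rule.verb for rule in rules}
    return sorted(verb for verb in counts if verb not in covered)

source = ["ADD 1.005 TO A", "MOVE A TO B", "ADD 1 TO C", "PERFORM PARA-1"]
counts = inventory(source)
print(counts["ADD"])             # 2
print(uncovered(counts, RULES))  # ['MOVE', 'PERFORM']
```

Keeping the input and output fragments next to each rule means the same table can later serve as a set of test cases for the converter.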

Apart from these explicit requirements, there is always an implicit understanding of how the converter will work. This reflects the customer’s expectations of the advantages associated with transferring the system to a more contemporary environment. These imaginary advantages often motivate the conversion process but they are rarely realized. A popular misconception is that after conversion the system is change-enabled so that totally new features can be implemented easily. The problem is aggravated by companies marketing conversion software as yet another silver bullet; the quality of such converters is often less than optimal, and in some cases even nonexistent. An automatically converted program is usually not as good as a new one developed within the full range of a contemporary programming language. Although the outcome of conversion should ideally be code that acts as if it is written in the target language and uses the target language’s features and idiom, actual converted programs often retain the idiom of the source language (more on this later). Some people expect that the structure of the program will surely improve after automated conversion, but conceptual changes to the application will always remain labor-intensive and require human interference. (5,13) For instance, imagine replacing mainframe CICS with C++ using Microsoft Transaction Server.

Technical Problems

A list of input and output patterns is very helpful when converting from one language to another—in fact, this is the hard part of language conversion.

Converting data types

One of the first problems is converting data types. Although we do not always realize it, programming languages usually have idiosyncratic data type conventions. For instance, many people consider C++ and Java to be similar, so a native-to-native conversion seems a simple task. Yet, their data types reveal many differences. For example, C++ has pointer-type variables but Java doesn’t; Java’s boolean is a separate type that cannot be mixed with integers, whereas C++’s bool converts implicitly to and from integers. Also, C++ data type sizes vary from platform to platform, whereas they are fixed in Java. So even when converting between C++ and Java, we run immediately into the problem of representing idiosyncratic data types.


Thus, you should not be surprised that the differences between languages like Cobol or PL/I and languages such as Java, VB, or C++ are perhaps insurmountable. For example, consider this PL/I data type:

DECLARE C FIXED DECIMAL (4,-1);

The variable occupies three bytes, with the decimal point assumed to be one position to the right of the number. Thus, C may contain values 123450 and 123460 but not 123456. C ranges from -99999*10 to 99999*10, and all values assigned will be truncated at the last digit, so the assignment C = 123456 is equivalent to C = 123450. Since the last digit is always a zero, it is not stored at all. Clearly, neither this data type nor the assignment operator corresponds to any standard C++ data type and assignment operation. (The article by Kostas Kontogiannis and colleagues (5) converts a PL/I dialect to C++ but does not address this problem.)
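To show what a converter would have to emulate, here is a rough Python model of a fixed decimal variable with a negative scale. Everything in it is an assumption of the sketch: the class name `ScaledDecimal`, the restriction to integer values and to scales of zero or less, and the choice to raise `OverflowError` when a value does not fit (what PL/I itself does in that case is not modeled). The precision of 5 is also an assumption, chosen so that the values quoted in the text (123450 and a range of ±99999*10) fit in the stored digits.

```python
class ScaledDecimal:
    """Illustrative model of a fixed decimal variable with scale <= 0."""

    def __init__(self, precision, scale):
        if scale > 0:
            raise ValueError("this sketch only models scale <= 0")
        self.precision = precision   # number of stored decimal digits
        self.unit = 10 ** (-scale)   # scale -1 means values are multiples of 10
        self.digits = 0              # stored digits; the trailing zero is implied

    def assign(self, value):
        """Assign an integer, truncating the digits that are not stored."""
        sign = -1 if value < 0 else 1
        digits = abs(value) // self.unit      # truncate toward zero
        if digits >= 10 ** self.precision:
            raise OverflowError("value does not fit in the stored digits")
        self.digits = sign * digits

    @property
    def value(self):
        return self.digits * self.unit

c = ScaledDecimal(precision=5, scale=-1)
c.assign(123456)
print(c.value)   # 123450
c.assign(-123456)
print(c.value)   # -123450
```

Even this small model needs a class of its own with a custom assignment, which illustrates why a statement-by-statement mapping onto a built-in type of the target language does not preserve the behavior.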

