The Realities of Language Conversions
January 2, 2001
By Andrey A. Terekhov, St. Petersburg State University, and Chris Verhoef, Free University of Amsterdam
Billions of lines written in Cobol, PL/I, and other mature languages are still in active use. Many developers have tried to convert these languages to more modern ones, but few have succeeded. This article sheds light on the realities of language conversions and discusses the possibilities and limitations of automated language converters.
The most influential cost drivers of software engineering relate to
management, personnel, and team capability, not to
software tools, although software vendors and academic researchers
emphasize the latter. Many managers become victims of quack software
modification tools and services. This mechanism, also known as the
silver-bullet syndrome, is what anthropologists call name magic:
you just say the name of the thing (“Cobol to Java”) and you have
its full power at your disposal.
As the term name magic indicates, you don’t need proof to support
your claims. Anyone desperate for solutions will wholeheartedly accept
those unsupported claims. (1)
In fact, many software industry decision-makers find
themselves trapped in huge amounts of legacy code, in dire need of
modification. At the same time, software engineering educators deliver
people skilled in contemporary development rather than enhancement
programming, let alone geriatric care for aging legacy applications.
Capers Jones (2)
even thinks that this phenomenon is one of the 60 most
important risks in software engineering. You do not have to be a rocket
scientist to figure out that a language converter could solve your
personnel problems: a language conversion is supposed to bridge the gap
between available knowledge of people and the knowledge needed to solve
legacy problems. Is it that easy? This article sheds light on the
realities of language conversions; it also provides simple examples that
expose the problems, examples that anyone involved in software engineering can
understand.
So How Hard Is Language Conversion?
We encountered a company that said of its Cobol-to-Visual Basic converter,
“The converter runs as a simple wizard: it is intended to be something a
secretary can run.” Furthermore, at a panel of the International
Conference on Software Maintenance, a researcher discussing the next
century’s challenges implied that automated conversion is solved and
that the next challenge is paradigm-shift conversion. This means not only
converting Cobol to C++ but also text-based procedural code into
Web-enabled object-oriented C++. From such observations, we could
conclude that automated language conversions are easy. And let’s face
it, converting the Cobol calculation ADD 1.005 TO A to its equivalent in
VB, A = A + 1.005, is indeed
very easy. Any simple tool can do it. On the other hand, language
conversion seems to be risky business. We know of three companies that
went bankrupt and two departments of large software-intensive enterprises that were dismantled, all because of
failed language conversion projects. Tom Holmes from Reasoning states that
if success means making a profit, then most conversion projects are not
successful. A reader of a preliminary version of this article told us of
an enterprise that spent $50 million on failed language conversion
projects. In his book on computing calamities, Robert Glass mentions the
failure of a translation system that would convert software from an
obsolete system to a new one. Management told the developers that the
conversion problem was limited in scope and that they would need only to
convert a limited set of constructs. This turned out not to be true. The
postmortem analysis indicated that the converter was perhaps 10 times more
complicated than expected; suddenly, what had been technically feasible
became economically and technically infeasible. (3)
In the news group alt.folklore.computers, S.C. Sprong,
who ported software from Fortran to C, stated, “Low-level porting, even
with provided documentation, is one of the blackest software arts; knowing
how to do it well will surely get you a first-class ticket to hell.”
Needless to say, there is a lot of disagreement on
the complexity of language conversion. Perhaps it is theoretically
possible to automatically improve a software system’s structure to
achieve a paradigm shift—for example, by introducing OO concepts.
However, this is very difficult to automate and it implies a lot of human
intervention. (4) Still,
the payoff is commensurately large, especially for systems with indefinite
life spans. Due to automation problems, most conversion tools apply the
technology of syntactic conversion. Even with this seemingly simple and
low-level approach, many difficulties occur, and the scale of those
intricacies is not yet fully understood.
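To make the idea of syntactic conversion concrete, here is a minimal sketch of a one-rule converter for the ADD 1.005 TO A example mentioned earlier. The function name convert_add and the regular expression are our own illustrative assumptions, not part of any real conversion tool. The rule rewrites text only; it says nothing about whether the converted statement behaves like the original, because that depends on how A is declared.

```python
import re

# Hypothetical single rule: Cobol "ADD <operand> TO <name>" becomes a
# VB-style assignment. This is an illustrative assumption, not a real tool.
ADD_TO = re.compile(
    r"^\s*ADD\s+(\S+)\s+TO\s+([A-Za-z][A-Za-z0-9-]*)\s*\.?\s*$",
    re.IGNORECASE,
)


def vb_name(token: str) -> str:
    """Replace hyphens in Cobol data names; leave numeric literals alone."""
    return token.replace("-", "_") if token[0].isalpha() else token


def convert_add(statement: str) -> str:
    """Rewrite one ADD ... TO statement; pass anything else through unchanged."""
    match = ADD_TO.match(statement)
    if match is None:
        return statement  # no rule applies to this line
    operand, target = (vb_name(t) for t in match.groups())
    return f"{target} = {target} + {operand}"


print(convert_add("ADD 1.005 TO A"))  # A = A + 1.005
```

The rule never looks at the declaration of A, which is exactly the kind of information a statement-by-statement rewrite leaves out.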
Despite
the lack of solid publications about language conversions (we listed the
most useful references, (5–13)
but some are
difficult to find or are in German, so contact us for copies or pointers),
lots of vendors advertise migration tools and services. Many companies say
they can convert your systems to whatever language you want. Although some
of them may be doing a great job, others that claim to have the technology and
skills do not. For instance, one company advertising on the Internet
provided examples of converted code that appeared not even to be
compilable. Another company claimed to be able to convert PowerBuilder to
Java, but our inquiries revealed that it had neither the experience nor the
tools to do the job. Instead, it claimed to have a process! In a
quote (which we translated from German to English) from his book on OO
software migration, Harry Sneed summarizes the state of the practice in
the transformation marketplace as follows: (12) “The
reality looks different. Those who can read between the lines recognize
that the problems are grossly simplified and that the advertised products are far from being ripe for use in practice.”
Requirements for Language Conversion Aids
The
problem statement for language conversion is simple: convert this system
to that language without changing the external behavior. From an abstract
point of view, a language migration project seems deceptively simple.
Consequently, requirements for language converters are often not
formulated.
Usually,
the availability of constructions that facilitate the expression of a
solution determines how easy it is to formulate a solution for a
particular problem. We could call such elements in a programming language native
language constructs. For instance, if we wish to express a conditional
problem, a language that supports a conditional language construct is more
convenient than a language lacking an if-then-else construct. If we must
use the latter language, we must simulate the conditional construct. Such
code fragments are called simulated language constructs. We can,
for instance, simulate object orientation in a language lacking OO
support.
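As a small illustration of a simulated construct, suppose a language had no if-then-else. The sketch below selects a branch by indexing into a pair of functions instead. The helper names simulated_if and sign_label are hypothetical, and Python of course has a native conditional; we avoid it only to show what simulation looks like. A converter would have to recognize this pattern as a conditional before it could map it to a native construct in the target language.

```python
def simulated_if(condition, then_branch, else_branch):
    """Choose a branch by tuple indexing instead of a native conditional."""
    branches = (else_branch, then_branch)  # index 0 = false, index 1 = true
    # bool is a subclass of int in Python, so False indexes 0 and True indexes 1.
    return branches[bool(condition)]()


def sign_label(x):
    """Example client code written against the simulated construct."""
    return simulated_if(x < 0, lambda: "negative", lambda: "non-negative")


print(sign_label(-3))  # negative
print(sign_label(7))   # non-negative
```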
The language conversion
problem for a serious software system amounts to mapping its native and
simulated constructs to hopefully only native constructs available in the
target language(s), as Figure 1 shows.

Figure 1. A mapping of constructions between languages.
There
are at least six possible categories in this mapping (expressed as the six
arrows in the figure), all of which we have encountered. Requirement
specifications usually mention only the native-to-native part of the
mapping, in the form of a statement-by-statement conversion. This
phenomenon is in accordance with people’s tendency to focus first on the
easiest problems. One of us (Chris Verhoef) was an external reviewer of
several large-scale language conversion projects. Most of them failed
because the hard problems were avoided in the requirements. In one case,
for instance, about 80% of the requirements specification dealt with the
graphical user interface, while the actual language converter was
represented by a single arrow. Someone underestimated the problem and thus
omitted the hard parts from the requirements.
Underestimation
often leads to runaway projects. The first project failure delays the
actual software conversion, which in turn triggers increased pressure on
the next development team to deliver the converter. This pressure makes it
easy for the new team to skip requirements altogether; after a few failure
cycles, total breakdown occurs, which explains the demise of the
earlier-mentioned language conversion companies and departments.
To
develop a source-to-source converter, you need at a minimum several
requirement specifications:
- You must inventory the native and simulated language constructs of
the system that needs conversion.
- You must develop a conversion strategy for each language construct.
Specifically, you must list the input and output fragments that
describe the converter’s desired behavior.
- You must make an explicit statement about whether the converted
system should be functionally equivalent to the original system.
Intuitively, you would think this is always so, but in practice, a
sophisticated automated modification effort usually exposes faults and
unsafe code in the original system. Often, the customer then requires
the developers to fix these faults as well. Thus, potential
requirements creep must be dealt with in the requirements. Note that
fixing faults during conversion also hampers testing the new system because
regression testing is based on equivalence.
- You must include a statement about whether the original system’s
test sets are to be converted. If you expose errors in the tests, the
requirements must state the policy toward modification.
- You must set as your goal maximum automation of the conversion
(enabling minimal human interference).
- If you plan to maintain the converted system, you must make it
maintainable. For instance, if the original maintenance team is going
to continue in its role, the new system should be as similar as
possible to the original system so that the team can recognize the
original code. If, on the other hand, the maintenance team is new, the
conversion should try to use the target language’s idiom so that
maintainers recognize the code as normal for that language. In other
cases, the original source code (say, in Cobol) is used for
modifications even after a successful conversion (say, to Java),
because the translation process is so automated that it’s easier to
regenerate final Java code than to make changes in Java. This also
helps maintainers if they are familiar with the original language but
not yet with the target one.
- You must make the converted system’s efficiency adequate both in
compilation and execution time.
- If you will use the converter many times, the time required for conversion is relevant. It is not always feasible to optimize this without distributing calculations over many machines.
- If you plan to maintain the converted system, its size must not exceed the original system's size by much (if at all).
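The first two requirements, an inventory of constructs and a strategy with input and output fragments for each, can be recorded as plain data. The following is a minimal sketch under our own assumptions: the Strategy record, the uncovered helper, and the three Cobol constructs are purely illustrative and do not come from any real project. It shows only how such a list exposes constructs for which no conversion strategy exists yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    """One entry of a hypothetical conversion-strategy list."""
    construct: str        # name of the source-language construct
    kind: str             # "native" or "simulated"
    input_fragment: str   # example source code
    output_fragment: str  # desired converted code


def uncovered(inventory, strategies):
    """Return the inventoried constructs that have no strategy yet."""
    covered = {s.construct for s in strategies}
    return sorted(set(inventory) - covered)


# Illustrative inventory of constructs found in the system to convert.
inventory = ["ADD ... TO", "PERFORM ... TIMES", "GO TO DEPENDING ON"]
strategies = [
    Strategy("ADD ... TO", "native", "ADD 1.005 TO A", "A = A + 1.005"),
]

print(uncovered(inventory, strategies))
# ['GO TO DEPENDING ON', 'PERFORM ... TIMES']
```

The same input and output fragments could later double as test cases for the converter itself.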
Apart
from these explicit requirements, there is always an implicit
understanding of how the converter will work. This reflects the
customer’s expectations of the advantages associated with transferring
the system to a more contemporary environment. These imaginary advantages
often motivate the conversion process but they are rarely realized. A
popular misconception is that after conversion the system is
change-enabled so that totally new features can be implemented easily. The
problem is aggravated by companies marketing conversion software as yet
another silver bullet; the quality of such converters is often less than
optimal, and in some cases even nonexistent. An automatically converted
program is usually not as good as a new one developed within the full
range of a contemporary programming language. Although the outcome of
conversion should ideally be code that acts as if it is written in the
target language and uses the target language’s features and idiom,
actual converted programs often retain the idiom of the source language
(more on this later). Some people expect that the structure of the program
will surely improve after automated conversion, but conceptual changes to
the application will always remain labor-intensive and require human
interference. (5,13)
For instance,
imagine replacing mainframe CICS with C++ using Microsoft Transaction
Server.
Technical Problems
A list of input and output patterns is very helpful when converting from one
language to another; in fact, drawing up this list is the hard part of language
conversion.
Converting data types
One
of the first problems is converting data types. Although we do not always
realize it, programming languages usually have idiosyncratic data type
conventions. For instance,
many
people consider C++ and Java to be similar, so a native-to-native
conversion seems a simple task. Yet, their data types reveal many
differences. For example, C++ has pointer-type variables but Java
doesn’t; Java’s boolean never converts to or from integers, whereas C++’s bool
does so implicitly. Also, C++ data type sizes
vary from platform to platform, whereas they are fixed in Java. So even
when converting between C++ and Java, we run immediately into the problem
of representing idiosyncratic data types.
Thus,
you should not be surprised that the differences between languages like
Cobol or PL/I and languages such as Java, VB, or C++ are perhaps
insurmountable. For example, consider this PL/I data type:
DECLARE C FIXED DECIMAL (4,-1);
The variable C occupies three bytes, with the decimal point assumed to be one
position to the right of the number. Thus, C may contain values 123450 and
123460 but not 123456. C ranges from
-99999*10 to 99999*10, and all values assigned will be truncated at the
last digit, so the assignment
C = 123456
is equivalent
to C = 123450. Since
the last digit is always a zero, it is not stored at all. Clearly, neither
this data type nor the assignment operator corresponds to any standard C++
data type or assignment operation. (The article by Kostas Kontogiannis
and colleagues (5)
converts a PL/I dialect to C++ but does not address this
problem.)
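To show what a converter would have to simulate in a target language without such a type, here is a rough sketch of the truncating assignment. The class ScaledDecimal is a hypothetical helper of our own, not a PL/I or library API. It models only the negative scale factor and the dropped digit; it deliberately ignores the range limit and any overflow behavior, and it assumes truncation toward zero for negative values.

```python
class ScaledDecimal:
    """Rough model of a fixed-point decimal with a negative scale factor.

    Only the digits left of the assumed decimal point are stored, so the
    value is always a multiple of 10 ** (-scale). Hypothetical helper.
    """

    def __init__(self, scale: int = -1):
        if scale >= 0:
            raise ValueError("this sketch only models negative scale factors")
        self._unit = 10 ** (-scale)  # scale -1 gives a unit of 10
        self._stored = 0             # the digits that are actually kept

    def assign(self, value: int) -> None:
        """Truncate toward zero, dropping the digits that are not stored."""
        sign = -1 if value < 0 else 1
        # Python's // floors, so truncate on the absolute value instead.
        self._stored = sign * (abs(value) // self._unit)

    @property
    def value(self) -> int:
        return self._stored * self._unit


c = ScaledDecimal(scale=-1)
c.assign(123456)
print(c.value)  # 123450
```

Every assignment and arithmetic operation on such a variable would need to go through code like this, which is why a plain statement-by-statement rewrite is not enough.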