CORRAL

(Code Optimization & Runtime Reconfiguration using Active Linking)

Summary

The idea of CORRAL is to create a system of simple components that each perform a certain dask on data, and that can be connected in series or complex combinations in order to create useful systems. These systems then will be easily modified, even at run-time.

For instance, a simple system would consist of three elements: a random number generator, an output element that prints to the screen, and a Ďpumpí in between that pulls random data out of the first element and passes it to the second. More complex systems could rise in size up to an effective web router.

System used

Linux 2.4.20 #5

(also UML for testing since part of CORRAL works from the kernel)

RPC is required

Terminology

  1. Provide data to the system; current examples include a random number generator and an IP message grabber.
  2. Perform specific operations on the data, such as encryption (currently using simple XOR), adding two numbers or storing data messages in a queue for later retrieval.
  3. Outputs data; current possibilities include terminal output using printf and playing raw wave data over /dev/audio. An obvious use would be to output IP messages over a network connection; however with the regular Linux kernel this isnít possible.

Goals

Issues

It is possible to create infinite loops or deadlocks by ill-considered creation or linking of elements. This, however, is not considered an issue but left to user responsibility. Indeed it is possible to create the CORRAL equivalent of " while (1); " but thatís plainly bad programming on the userís part. However an idea for future extension is to create a simple scripting language using YACC or BISON that locates this kind of error.

It is important that the system shuts down properly on user request, and cleans up. This is especially true for the kernel part, which will crash the computer if not properly shut down. Obviously, then, CORRAL must adequately respond to SIGTERM and other critical signals. It is not advisable to shutdown corral using SIGKILL as this may in some situations leave elements such as the IP message grabber in place.

One important issue throughout development has been that of mutices. The original idea was to allow plumbing on any element thatís not currently active. This does require extensive use of mutices on each element, which degrades performance. Also it means that if an element is active for a long time, plumbing will be deferred until the element is done. Then a second idea was to suspend the entire system during plumbing operations. But itís not a good idea to suspend a router for long as this will cause message loss. See below.

Plumbing solutions

The plumbing problem was solved by the following approach. First, creating new elements can always be done concurrently to any running datapaths. Removing existing elements is not Ďsafeí when any active datapath uses the element, as a pointer to a port that has already been freed tends to cause panic the kernel when dereferenced. So instead, on request an element is tagged for removal, then removed some time after that by the garbage collector.

The same holds true for ports: adding a port is safe, but removing one is not. Similarly changing a port to a different destination is not safe. Rather than using mutices, a more theoretically sound approach is taken. Each element has pointers to a number of ports. When a datapath requires a port, it copies the pointer to local memory and works from there. This means that in the meantime, the pointer in the element may change to some other value, and this will not affect running datapaths. Since changing a pointer is an atomic operation, each port pointer will always be meaningful.

The ports themselves are not freed after use, but tagged like the elements and removed by the garbage collector. The garbage collector does use a semaphore to ensure it never runs concurrently with a datapath.

Optimization

One way of optimizing involves not double-checking everything. For instance, a function that takes an element-pointer as its parameter, will in most cases not check if this pointer is actually valid or non-NULL. This is safe if we can guarantee that the calling function always supplies a valid pointer. Of course the plumbing functions arenít optimized in this way since they donít need to be as fast as possible.

Element creation

A number of sample elements already exist. Creating a new element would involve writing a number of functions for the element, then supplying the program with a struct that contains pointers to these functions.

The functions handle creation and destruction of the element, as well as checking if a certain port connection is legal (for instance certain elements only take integer inputs) and handling how data is pushed into or pulled out of the element. All of these functions can be omitted (substituting a NULL in the struct) and indeed most elements only use roughly half of them. The only mandatory function is the one that checks if connections are legal, for without that, the element cannot be connected to anything and that would be rather pointless.

If the user requests to create a port between two elements, first a query is sent to the element that inputs data into the port. This element should return the datatype of the port (or, it should indicate the port is not valid). Then, the element that the port outputs data to is queried if it will accept data of this type. If so, the connection is made.

Scripting

Currently the user can create, remove, connect and disconnect elements via a simple program that sends RPCs to the main program. It is therefore irrelevent if this main program is on the same computer or not.

A system can be defined using a standard shellscript that calls for the creation and connection of certain elements, and may use environment variables for the computer that the system is to be created on. Note that the element that intercepts IP messages also intercepts RPC messages and can therefore render the system inaccessible.