Streamline is a Vrije Universiteit Amsterdam research project.

All code is made available under a mixed GNU Lesser General Public (LGPL) and Simplified BSD license.

:: developer manual

Introduction

Most developers will write applications on top of Streamline. For you, the first part of this manual explains the application APIs. Underneath, Streamline is structured as an extensible communication toolbox. Applications construct datapaths by connecting functions through streams. Streams, in turn, are implemented through buffers and channels. If you want to extend Streamline itself, you will need to learn one of the internal APIs. Each type of tool (function, channel, buffer) has its own interface.

This documentation is based on version 1.6.4 of the software. Some interfaces may be different between versions. If you have questions or comments after reading this information, send an email to the development mailing list (ffpf-devel *at* lists.sourceforge.net).


1 Applications

A Streamline application can be written in as little as 15 lines of C. Take the following example:

 1 char cdata[MAX_BUF];
 2 int fd, bd, len;
 3
 4 fd = streamline_request_insert("(tcpsock,expression=5000) > (compress,export,name=mybuf)");
 5 if (fd >= 0) {
 6 	bd = open_stream("mybuf", O_RDONLY);
 7	if (bd >= 0) {
 8		len = read_stream(bd, cdata, MAX_BUF);
 9		if (len >= 0)
10 			printf("compressed into %d bytes\n", len);
11		close_stream(bd);
12	}
13	streamline_request_remove(fd);
14 }

The first statement (line 4) inserts a processing request using streamline_request_insert. Then, if all went well, we open a datastream using open_stream in line 6. This function's semantics are identical to those of the POSIX open system call. Similarly, the call to read_stream in line 8 behaves like read, and close_stream in line 11 behaves like close. After we have finished accessing the datastream, we remove the processing request in line 13 using streamline_request_remove.

Even with so few lines, the application involves a bit of magic. How do we know in open_stream that we want to access the datastream "mybuf"? If you look closely at the expression passed to streamline_request_insert you will see that the last function, compress, has the option export set and a name parameter "mybuf". This tells Streamline that all data produced by compress must be saved to a buffer accessible through our POSIX interface with the given name.

The example shows that Streamline has separate interfaces for setting up the datapath and for accessing the actual datastreams at runtime. It also has a third interface for accessing runtime datapath state, such as counters or function-local stateful memory. We will now discuss each in turn.

1.1 Inserting Requests

Inserting a Streamline request is trivial. Only two function calls are needed, both of which we've already seen in the example above.

  • int streamline_request_insert(const char *request); sets up a datapath. request must be a Streamline Request Language (SRL) expression. The function returns -1 on error, or a request id otherwise. Keep this id, as you will need it to remove the request later on.
  • int streamline_request_remove(int request_id); takes a request id and tries to remove the given request. It returns 0 on success. Any other value indicates an error.

If you have the Graphviz and gv packages installed, you can also use a helper function.

  • void streamline_request_plot(const char *request); uses the dot tool to translate an SRL expression into an image and calls gv to display it. This function is easily changed to output formats other than PostScript, such as SVG or PNG. To do so, edit the function print_graph in parser/parser.c.

These functions are declared in interfaces/api.h.
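
The insert/remove pairing can be wrapped so that a request is always removed again. The following is a minimal sketch under stated assumptions: the two streamline_request_* bodies are invented stand-in stubs so that the code compiles without Streamline, and with_request and count_call are hypothetical helper names that are not part of the Streamline API.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in stubs so the sketch compiles without Streamline. The real
 * functions are declared in interfaces/api.h; these bodies are invented. */
static int stub_next_id;

static int streamline_request_insert(const char *request)
{
	if (request == NULL || request[0] != '(')
		return -1;	/* pretend the SRL parser rejected it */
	return stub_next_id++;
}

static int streamline_request_remove(int request_id)
{
	return (request_id >= 0 && request_id < stub_next_id) ? 0 : -1;
}

/*
 * Hypothetical helper: insert a request, run `work` while the datapath
 * exists, and always remove the request afterwards.
 * Returns -1 if insertion or removal failed, else the result of `work`.
 */
static int with_request(const char *srl, int (*work)(void *), void *arg)
{
	int id, ret;

	assert(work != NULL);
	id = streamline_request_insert(srl);
	if (id < 0)
		return -1;
	ret = work(arg);
	if (streamline_request_remove(id) != 0)
		return -1;
	return ret;
}

/* Example work function: counts how often it was invoked */
static int count_call(void *arg)
{
	++*(int *)arg;
	return 0;
}
```

In a real application the work function would open and read the exported stream, as in the example at the start of this section.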

1.2 Accessing Data

We can be brief in explaining how to access streams: if you can work with POSIX File I/O (FIO), you can work with Streamline streams. The interface, declared in interfaces/transport.h, is a replica of FIO with a twist. Each function has _stream as suffix to its name. Thus, open becomes open_stream, etcetera.

As an extension to the standard, Streamline also implements peek_stream, a faster alternative to read_stream. Whereas read_stream copies data into a buffer supplied by the application, peek_stream gives the application direct access to Streamline's internal buffer. This improves performance through a reduction in copying. But you need to be cautious when using this function; data may be overwritten as it is accessed. This is very rare (depending on underlying buffer size), but the exception must be checked each time. check_stream can be used for that. It compares a timestamp with the current state of a buffer. The complete interfaces are:

  • ssize_t peek_stream(int bd, char **dest, size_t count) has the same interface as read_stream, aside from the second argument. Instead of pointing to a user-supplied datablock, it holds a pointer to a pointer. If the call is successful, dest will point inside the Streamline buffer and the length of the block is returned. Otherwise, the function returns -1.
  • int64_t check_stream(int bd, int64_t timestamp) compares argument timestamp with the current state of the buffer. If the function returns -1, data accessed since timestamp was obtained may have been overwritten. Otherwise, a new timestamp is generated. Call this function with timestamp -1 to receive a first timestamp. Suggested use is to call check_stream before and after each peek_stream call. The interface is designed so that, if you loop around peek_stream, a single call to check_stream inside the same loop suffices, as shown in this example:
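    char *ptr;		/* will point inside the Streamline buffer */
    int64_t timestamp;
    ssize_t len = 0;

    timestamp = check_stream(bd, -1);	/* obtain a first timestamp */
    do {
    	len = peek_stream(bd, &ptr, INT_MAX);
    	[... more processing ...]
    	/* returns -1 if the data just accessed may have been overwritten */
    	timestamp = check_stream(bd, timestamp);
    } while (len > 0 && timestamp > 0);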
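
To illustrate why a zero-copy read needs such a check, here is a toy ring buffer with an overwrite test. It is only a sketch of the general idea and makes no claim about how Streamline implements its buffers or timestamps: struct toy_ring, ring_write, ring_peek and ring_check are hypothetical names, and the "timestamp" is simply the reader's byte position.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define RING_SIZE 8

/* Toy ring buffer; not Streamline's buffer layout */
struct toy_ring {
	char data[RING_SIZE];
	int64_t written;	/* total bytes ever written */
	int64_t consumed;	/* total bytes handed to the reader */
};

/* The writer never blocks: it overwrites old data when it laps the reader */
static void ring_write(struct toy_ring *r, const char *src, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		r->data[r->written % RING_SIZE] = src[i];
		r->written++;
	}
}

/* Zero-copy read: point dest into the ring, return the contiguous length */
static int64_t ring_peek(struct toy_ring *r, char **dest, size_t count)
{
	int64_t avail, off, len;

	assert(r != NULL && dest != NULL);
	avail = r->written - r->consumed;
	if (avail <= 0)
		return 0;
	off = r->consumed % RING_SIZE;
	len = RING_SIZE - off;		/* contiguous bytes up to the wrap */
	if (len > avail)
		len = avail;
	if ((uint64_t)len > count)
		len = (int64_t)count;
	*dest = r->data + off;
	r->consumed += len;
	return len;
}

/* Return -1 if bytes peeked since `timestamp` may have been overwritten,
 * otherwise a fresh timestamp (the current reader position) */
static int64_t ring_check(const struct toy_ring *r, int64_t timestamp)
{
	if (timestamp >= 0 && r->written - timestamp > RING_SIZE)
		return -1;
	return r->consumed;
}
```

In this toy, a timestamp of 0 is valid, so a caller would loop while the timestamp is not negative.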

1.3 Accessing Metadata

[to be filled in]

1.4 Alternative Interfaces

Besides our own interface, Streamline also supports the PCAP and MAPI interfaces. We are also busy implementing the BSD Sockets interface on top of Streamline.

1.5 More Examples

In the distribution, look under src/apps for other applications of Streamline. Especially the files in src/apps/demo are instructive, as they are intentionally kept brief.


2 Internal APIs

2.1 Functions

Functions are the processing nodes in a Streamline graph. To differentiate them from C functions we will call those C-functions. Technically, each function is implemented as a callback C-function that is called from Streamline whenever a block arrives at one of its input streams. The interface that a function must adhere to is a bit more complex than that, however.

Besides the main callback C-function (process2), it also contains C-functions for function initialization and teardown. We further differentiate between initialization of a function class and that of a function instance. The first happens when Streamline starts, the second whenever a request demands the instantiation of a new function somewhere in the runtime system. The full interface that a function must adhere to is defined in interfaces/function.h and also shown below:

struct function {
	char name[FUNC_MAX_NAMELEN];
	const char *info; 	
	enum ffpf_op opcode;
	
	int (*init)(void);
	int (*tryex)(const char *expr, unsigned int elen, unsigned int mlen);
	int (*newex)(struct functiondata *);
	int (*process2)(struct iptr *iptr, struct fptr *fptr, const struct functiondata *fdata);
	int (*delex)(struct functiondata *);
	int (*cleanup)(void);

	enum edgetype intype;
	enum edgetype outtype;
	enum edgetype classtype;
};

The name is the string by which the function can be found, e.g., "compress"; info is an optional explanation of what the function does. We will skip explanation of opcode for now. Always use STREAMLINE_OP_EXTERN. Equally, we will not discuss the edgetypes here; leave these 0 in regular functions. That leaves the member C-functions. Aside from process2, all of them are optional.

  • int (*process2)(iptr, fptr, fdata) This is the main callback C-function, which gets called for each block arriving on one of the function's inputs. The arguments appear non-trivial, but for most functions you only need to understand what a struct fptr is.
    struct fptr {
    	char *pdu;
    	uint32_t plen;
    };
    
    As you can see, an fptr is simply a combination of a pointer and a length argument, delineating the block for which the C-function was called. The iptr is a more complex rich pointer that also works across memory domains and whose use can result in fewer data copies. Few functions need to use it directly. Finally, fdata contains pointers to information that persists between calls, such as the expression parameter passed in the SRL request and function-local persistent storage. You need persistent storage if you want to keep some state between process2 invocations, for example a counter that keeps track of how often the C-function has been called.
  • init This C-function is called once in each space for each function class. You can use it to verify that the environment is able to execute your functions. If it returns anything but 0, the function class will not be registered. This can for example happen when class ethernet notices that no ethernet network interface cards can be found.
  • cleanup The teardown counterpart to init, called just before the space closes. Only implement this if init has affected state, to undo its actions.
  • newex Called every time a function class is instantiated into a function. It can be used to set up function-local persistent storage, or to otherwise prepare the environment. For example, (tcpsock) will open a TCP socket for communication.
  • delex Roll back the modifications to the environment made by newex. Note that even if newex failed and returned a value other than 0, delex will still be called. A common mistake is to use free and other destructors here without checking whether newex actually affected state in the first place.
  • tryex Newex has to interpret its arguments to verify that it can instantiate the function with them. If something is amiss newex will fail. But by then system state may already have been changed by other functions that are part of the same request. To allow a function to return -1 before state is affected a developer is encouraged to also implement this stateless version of newex. Tryex is called for all functions before any are instantiated. If tryex fails for some reason (e.g., the expression passed to function (bpf) is not a valid BPF expression) the entire request fails without having to call delex for already instantiated functions.
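
The interplay between newex, process2 and delex can be sketched with the call counter mentioned above. This is a sketch under assumptions, not Streamline code: struct toy_fdata stands in for struct functiondata, its state field is an invented name, and counter_process takes only the fdata argument to keep the example standalone.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for struct functiondata; the real layout is defined by
 * Streamline and the field name `state` is an assumption. */
struct toy_fdata {
	void *state;	/* function-local persistent storage */
};

/* newex: allocate the per-instance counter */
static int counter_newex(struct toy_fdata *fdata)
{
	assert(fdata != NULL);
	fdata->state = calloc(1, sizeof(uint64_t));
	return fdata->state ? 0 : -1;
}

/* process2 analogue: count the call and forward the block (non-zero) */
static int counter_process(const struct toy_fdata *fdata)
{
	uint64_t *calls = fdata->state;

	(*calls)++;
	return 1;
}

/*
 * delex: also runs after a failed newex, so only undo what was done.
 * free(NULL) would be harmless, but other destructors (close,
 * pthread_join, ...) are not, so the guard is written out explicitly.
 */
static int counter_delex(struct toy_fdata *fdata)
{
	if (fdata->state) {
		free(fdata->state);
		fdata->state = NULL;
	}
	return 0;
}
```

Resetting the pointer after freeing also makes a second delex call harmless.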

All control members return 0 on success; any other value signals an error. The exception to this rule is the datapath member process2. That C-function returns an integer, of which the lower 16 bits (interpreted as unsigned) are used for classification. Each function is conditionally dependent on its inputs. This means that blocks are only forwarded along a stream from one function to the next if the value returned by the first is within the range accepted by the second. By default, if you write (a) > (b), all blocks for which (a) returns anything but 0 are forwarded to (b). See the SRL documentation for more details on condition ranges. Suffice it to say here that you should use the return value of process2 as a classification value, where 0 signals that a packet should be dropped.
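
The forwarding rule can be sketched as follows. The helper names are hypothetical, and two points are assumptions rather than documented behaviour: that the default "anything but 0" test applies to the lower 16 bits only, and that a condition range is an inclusive interval. The SRL documentation is the authority on both.

```c
#include <assert.h>
#include <stdint.h>

/* Classification value: the lower 16 bits of process2's return value */
static uint16_t classification(int ret)
{
	return (uint16_t)((unsigned int)ret & 0xffffu);
}

/* Default rule for (a) > (b): forward unless the classification is 0 */
static int forwards_by_default(int ret)
{
	return classification(ret) != 0;
}

/* Hypothetical range rule: forward if lo <= classification <= hi */
static int forwards_in_range(int ret, uint16_t lo, uint16_t hi)
{
	uint16_t c = classification(ret);

	assert(lo <= hi);
	return c >= lo && c <= hi;
}
```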

2.1.1 Example

In case the previous section gave the impression that functions are difficult to write, let me give a counterexample. The following code block implements a crude HTTP blocker. The function drops all blocks containing the string "http". We conveniently expect all blocks to end with \0 here, but this is not true in general.

static int httpblock_process(struct iptr *iptr,
			     struct fptr *fptr,
			     const struct functiondata *fdata)
{
	return strstr(fptr->pdu, "http") ? 0 : 1;
}

struct function httpblock_func = {
	.name = "httpblock",
	.info = "block everything that contains the string \"http\"",
	.opcode = STREAMLINE_OP_EXTERN,
	.process2 = &httpblock_process
};

To make function writing even easier, the distribution comes with a template on which you can build, in src/functions/template.[ch].
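
Because blocks are not guaranteed to end with \0, a more careful blocker would bound its search by the block length. The following is a minimal sketch of that idea: block_contains and httpblock_bounded are hypothetical names, struct fptr is repeated from above so the code stands alone, and the iptr and fdata arguments are omitted for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct fptr {		/* as shown earlier in this manual */
	char *pdu;
	uint32_t plen;
};

/* Return 1 if needle occurs within the first len bytes of buf */
static int block_contains(const char *buf, uint32_t len, const char *needle)
{
	size_t nlen = strlen(needle);
	uint32_t i;

	assert(buf != NULL || len == 0);
	if (nlen == 0)
		return 1;
	if (nlen > len)
		return 0;
	for (i = 0; i <= len - nlen; i++)
		if (memcmp(buf + i, needle, nlen) == 0)
			return 1;
	return 0;
}

/* Bounded variant of the blocker: 0 drops the block, 1 forwards it */
static int httpblock_bounded(const struct fptr *fptr)
{
	return block_contains(fptr->pdu, fptr->plen, "http") ? 0 : 1;
}
```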

2.1.2 Function Registration

Just writing a function is not enough to use it. The other task is to integrate the function with the Streamline build. The current infrastructure for this is a bit crude, unfortunately. Bear with me.

Integrating your function requires two actions. First, add your function to the Streamline build process. If your function was added to a file that is already included, you can skip this step. To include a new file in the userspace build, add the .c file to the FUNC_OB target list in src/Makefile. To also add it to the kernelspace build, add it to modules/Makefile at target streamline-y.

Next, have Streamline register your function (and call your init member if that exists) at runtime. In src/core/process.c, find the list of BOOTTIME_REGISTER_FUNC statements and add your own, as in:

BOOTTIME_REGISTER_FUNC(httpblock_func)

This task has one subtle choice. If you look at the statement list you will see that some functions are only initialized in a subset of all spaces. Specifically, many functions are not initialized in the kernel. If your function happens to rely on a userspace third-party library and cannot be built in kernelspace, make sure you add your statement inside the #ifndef __KERNEL__ tags.

Also, don't forget to add the header file in which your function is declared to the list of #include statements in src/core/process.c.

2.1.3 Input Functions

Most Streamline functions only need to be executed when data arrives on their inputs. But how does data enter the graph in the first place? This is where input functions come in. These are not called as callbacks through their process2 member. Instead, they explicitly call the Streamline function that bootstraps a graph. To be precise, they call the C-function
void datapath_process_children(struct instance *, char *, int),
which will call the callback C-functions of all instances that are dependent on the one passed as the first argument. The second and third arguments are the pointer to a datablock and the length of the block. Let's look at an example. Say you want to wake up and generate a block every second. The following (userspace) function accomplishes that using POSIX threads.


static struct instance * myself;
static pthread_t thr;
static volatile int die = 0;	/* set by delex, polled by the thread */

/** thread that bootstraps processing once every second */
static void * wakeup_callback(void *unused)
{
	static char mystring[] = "wake up!";	/* non-const: the callee takes a char * */
	
	while (!die) {
		sleep(1);
		datapath_process_children(myself, mystring, strlen(mystring));
	}
	return NULL;
}

/** affect state: start the thread */
static int wakeup_newex(struct functiondata *fdata)
{
	myself = fdata->self;
	return pthread_create(&thr, NULL, &wakeup_callback, NULL);
}

/** affect state: stop the thread */
static int wakeup_delex(struct functiondata *fdata)
{	
	die = 1;
	return pthread_join(thr, NULL);
}

static int wakeup_process(struct iptr *iptr,
			  struct fptr *fptr,
			  const struct functiondata *fdata)
{
	return 0; // should never be called; but must be defined
}

struct function wakeup_func = {
	.name = "wakeup",
	.opcode = STREAMLINE_OP_EXTERN,
	.newex = wakeup_newex,
	.process2 = wakeup_process,
	.delex = wakeup_delex
};

If you need maximal performance, datapath_process_children also has a sibling function, datapath_process_children_fast. That C-function demands that you have filled in the iptr and fptr structures, so it is only useful in very specific situations (when you know the iptr).

2.1.4 Modifying Blocks

So far, we have only shown functions that classify or analyze datastreams. A large class, however, modifies block contents. We differentiate between three methods of editing. The simplest is in-place, whereby a function directly modifies the contents of a block. This can be accomplished through accessing the fptr->pdu pointer.

One caveat is that not all buffers are shared read/write. If you access kernel memory as an unprivileged process, editing a buffer may cause a segmentation fault. I have yet to write a C-function that returns the access policy for a given block.

Another point involving editing is that changes are globally visible to all functions that access the same buffer. Streamline calls functions in breadth-first order, thus in "(input) > ( (edit) | (view) )", view will see the modified content. In "(input) > ( (view) | (edit) )" it will see the original content. Also, if edit and view do not share the same fptr (if they have been instantiated in different spaces) view will not see the modified contents irrespective of calling order. This situation is ambiguous and needs improvement.
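
The effect of calling order on a shared block can be demonstrated with a toy model. The sketch below is not Streamline code: toy_func and run_siblings are hypothetical names, and the two siblings are simply called left to right on the same memory.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef void (*toy_func)(char *pdu, size_t plen);

static char seen;	/* first byte observed by view */

/* In-place modification: uppercase the whole block */
static void edit(char *pdu, size_t plen)
{
	size_t i;

	for (i = 0; i < plen; i++)
		pdu[i] = (char)toupper((unsigned char)pdu[i]);
}

/* Read-only observer: remember the first byte */
static void view(char *pdu, size_t plen)
{
	seen = plen ? pdu[0] : '\0';
}

/* Call two siblings in order on the same shared block and
 * return what view observed */
static char run_siblings(toy_func first, toy_func second,
			 char *pdu, size_t plen)
{
	assert(first && second && pdu);
	first(pdu, plen);
	second(pdu, plen);
	return seen;
}
```

Running edit before view yields the modified byte, whereas running view before edit yields the original byte even though the block ends up modified in both cases.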

Replacing Buffers

Next up are operations that change block size, such as compression. If the block shrinks, it suffices to update the metadata: the pdu member of fptr and the len member of iptr if defined. If more memory may be needed than the underlying buffer supplies, the buffer must be replaced.

To replace a buffer, a new buffer (or more precisely, buffer area) must be allocated and some iptr and fptr elements must be updated. This is error-prone, which is why I am developing a support C-function to take care of these details, along the lines of
block_get(struct iptr **iptr, struct fptr **fptr, int len); together with block_put(struct iptr *iptr, struct fptr *fptr); to release the old block after the transformation is completed. This is work in progress, however.

Out-of-order Rewriting

The third editing mode is where input and output streams appear asynchronous. Thus, arrival of a block does not necessitate output and output may be generated at any time. This model is only needed in a very small set of functions. Currently, it is used only for TCP stream reconstruction.

This mode is so rare that I did not develop a clean interface for it. Instead, what the TCP function does is drop all blocks in process2 and independently call datapath_process_children when new data becomes available on one of the TCP streams. It is no accident that out-of-order functions use the same interface as input functions: you could say that input functions are out-of-order rewriting functions that take an empty input.
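
The same pattern can be sketched with a much simpler stand-in for TCP reconstruction: a function that reassembles lines from arbitrarily split input. Everything here is illustrative. The datapath_process_children body is an invented stub that merely records its last call, lines_process is a hypothetical name with a simplified argument list, and TOY_LINE_MAX is an arbitrary limit.

```c
#include <assert.h>
#include <string.h>

#define TOY_LINE_MAX 64

struct instance;	/* opaque here */

/* Stand-in stub: the real C-function forwards the block to dependent
 * instances; this one only records the most recent emission. */
static char emitted[TOY_LINE_MAX + 1];
static int emit_count;

static void datapath_process_children(struct instance *self,
				      char *data, int len)
{
	(void)self;
	memcpy(emitted, data, (size_t)len);
	emitted[len] = '\0';
	emit_count++;
}

static struct instance *myself;	/* NULL here; newex would set it */
static char pending[TOY_LINE_MAX];
static int pending_len;

/* process2 analogue: buffer the input, emit each completed line
 * independently, and drop the input block itself */
static int lines_process(const char *pdu, int plen)
{
	int i;

	assert(pdu != NULL && plen >= 0);
	for (i = 0; i < plen; i++) {
		if (pending_len < TOY_LINE_MAX)
			pending[pending_len++] = pdu[i];
		if (pdu[i] == '\n' || pending_len == TOY_LINE_MAX) {
			datapath_process_children(myself, pending, pending_len);
			pending_len = 0;
		}
	}
	return 0;	/* 0 drops the incoming block */
}
```

Output is generated only when a line is complete, so one input block may produce no output, one line, or several.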

2.2 Buffers

[to be filled in]

2.3 Channels

[to be filled in]

2.4 Translators

[to be filled in]

3 Contributing to the main tree

If you want to contribute back to the project, great! Please send your changes through the mailing list. We accept two types of changes. If you have built an individual function or application, just email us the source files. If your changes affect many files, submit a patch instead (of the type used by, for example, the Linux kernel). Make sure that your changes are against the latest Subversion development tree or the latest stable release.

3.1 Licensing

We can only accept code that is licensed under the LGPL or that can coexist with the LGPL (such as BSD code). You do not have to give us ownership of the code. Make sure that your name and contact information as well as the chosen license are clearly displayed at the top of each submitted file.

Are you a student interested in systems research? Streamline is a Vrije Universiteit Amsterdam research project. We're always looking for exceptional candidates for our Master's program.