AutoC Tools

03/06/2012

About

Recently I brushed up another version of my AutoC libraries, this time for Lua. On first impressions Lua looks quite odd (indices start from 1?! The hash symbol isn't for comments?! Local variables must be declared explicitly?!) but really these are minor details, and on the whole it is a fantastic little language. It was a lot of fun to work with.

With the AutoC tools now working well across two languages, I thought I would give a short overview of how they work, since at first glance I was hoping they would seem indistinguishable from magic. I won't go into much detail, but I'll try to explain some of the more interesting aspects.

Types

Standard C has no runtime introspection, so at the heart of any system for interaction with dynamically typed programming languages is a method for storing and inspecting type information at runtime.

And at the heart of any runtime type system is a type-id. To get the type-ids of functions and structs, I make the programmer declare the types of the functions and structures via a set of simple macros. These macros use the stringification operator (#) to turn the raw type tokens into a string, which is fed into a hashing function that returns a type-id I can use in the internals of the system. Every AutoC macro that takes raw type tokens works this way.

/* A typical PyAutoC Registration macro */
#define PyAutoStruct_Register(type) PyAutoStruct_Register_TypeId(PyTypeId(type))

/* The stringification macro */
#define PyTypeId(type) PyAutoType_Register(#type, sizeof(type))

/* The declaration of the type hashing function */
PyAutoType PyAutoType_Register(char* type, int size);

The hashing function is nothing special. It simply returns an auto-incrementing integer for each new unique string it encounters. While this means the actual type-id for a type may differ between runs, it does ensure that identical types always get the same type-id within a single run. This was enough for my purposes. With a little macro trickery we can even ensure that the programmer always supplies valid types, which is great for catching typos and misordered arguments.

This system also allows me to store type names and type sizes in a table. This is useful for other parts of the system, including allocating stack space when calling functions and printing readable error messages.
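A minimal sketch of this kind of registry in plain C might look like the following. The names (`TypeId`, `type_register`, `TYPE_ID`, `type_name`, `type_size`) and the fixed-size tables are hypothetical illustrations, not the actual PyAutoC or LuaAutoC API. Putting `sizeof(type)` inside the macro is one way of getting the compiler to reject a misspelled type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout: a sketch, not the real AutoC API. */
typedef int TypeId;

#define MAX_TYPES 256

static const char* type_names[MAX_TYPES];
static size_t type_sizes[MAX_TYPES];
static int type_count = 0;

/* Return the id for a type name, handing out the next integer the first
   time a string is seen. Ids are only stable within a single run. */
TypeId type_register(const char* name, size_t size) {
  for (int i = 0; i < type_count; i++) {
    if (strcmp(type_names[i], name) == 0) {
      return i;
    }
  }
  assert(type_count < MAX_TYPES);
  type_names[type_count] = name; /* #type yields a literal with static storage */
  type_sizes[type_count] = size;
  return type_count++;
}

/* Stringify the raw type tokens. sizeof(type) must compile, so a
   misspelled type is rejected by the compiler rather than at runtime. */
#define TYPE_ID(type) type_register(#type, sizeof(type))

/* Table lookups, e.g. for readable error messages or sizing buffers. */
const char* type_name(TypeId id) {
  assert(id >= 0 && id < type_count);
  return type_names[id];
}

size_t type_size(TypeId id) {
  assert(id >= 0 && id < type_count);
  return type_sizes[id];
}
```

Stringification keeps the spelling of the tokens (apart from collapsing whitespace), so under this sketch `int*` and `int *` would receive different ids.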

Conversions

With some type information available at runtime, I could set up a system to create associations between C types and Python or Lua types. For Python this meant creating functions that convert between C types and PyObjects; for Lua it meant creating functions that push C values onto the Lua stack and interpret values taken off it. I could use a void pointer in C to reference a generic location for the type conversion functions to act upon.

The scripting APIs generally provided functions for converting the native C types out of the box, so these just needed to be wrapped and associated with their appropriate C type name/id. More importantly, the system allowed users to write their own conversion functions and register their own types. This was an essential step for making the whole system programmable and extensible, and it works around the lack of overloading and templates in vanilla C. Using a void pointer and the type information gathered earlier, we could now create an interface for proper conversion between generic types.
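A minimal sketch of such a conversion registry follows. To stay self-contained it keys converters by the stringified type name rather than by type-id, and it uses a hypothetical `ScriptValue` struct in place of a real `PyObject*` or Lua stack slot; `convert_to`, `convert_from`, `CONVERT_REGISTER` and `Celsius` are all illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a PyObject* or a slot on the Lua stack. */
typedef enum { SV_INT, SV_FLOAT } ScriptTag;
typedef struct {
  ScriptTag tag;
  long i;
  double f;
} ScriptValue;

/* Every converter works through void pointers, so one signature fits all types. */
typedef ScriptValue (*ToScriptFn)(const void* c_in);
typedef void (*FromScriptFn)(ScriptValue v, void* c_out);

typedef struct {
  const char* type;
  ToScriptFn to;
  FromScriptFn from;
} Converter;

#define MAX_CONVERTERS 64
static Converter converters[MAX_CONVERTERS];
static int converter_count = 0;

void convert_register_name(const char* type, ToScriptFn to, FromScriptFn from) {
  assert(converter_count < MAX_CONVERTERS);
  converters[converter_count].type = type;
  converters[converter_count].to = to;
  converters[converter_count].from = from;
  converter_count++;
}

/* Built-in and user-defined types are registered the same way. */
#define CONVERT_REGISTER(type, to, from) convert_register_name(#type, to, from)

static const Converter* convert_find(const char* type) {
  for (int i = 0; i < converter_count; i++) {
    if (strcmp(converters[i].type, type) == 0) {
      return &converters[i];
    }
  }
  return NULL;
}

/* Generic entry points: look up the converter by type name, then apply it. */
ScriptValue convert_to(const char* type, const void* c_in) {
  const Converter* c = convert_find(type);
  assert(c != NULL);
  return c->to(c_in);
}

void convert_from(const char* type, ScriptValue v, void* c_out) {
  const Converter* c = convert_find(type);
  assert(c != NULL);
  c->from(v, c_out);
}

/* Converters for a built-in type and a user-defined one. */
static ScriptValue int_to(const void* in) {
  ScriptValue v = { SV_INT, *(const int*)in, 0.0 };
  return v;
}
static void int_from(ScriptValue v, void* out) { *(int*)out = (int)v.i; }

typedef struct { double degrees; } Celsius;
static ScriptValue celsius_to(const void* in) {
  ScriptValue v = { SV_FLOAT, 0, ((const Celsius*)in)->degrees };
  return v;
}
static void celsius_from(ScriptValue v, void* out) { ((Celsius*)out)->degrees = v.f; }
```

In a real binding the `to` and `from` functions would call into the scripting API instead of filling in a struct, but the shape of the registry could stay the same.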

Structs

With this system it is now possible to give the scripting language access to data in C variables and, via a simple extension, to structs. This can be considered an extension of the automatic conversion functionality. Users can register structs and their members, and the registration macros record each member's name (as a string), offset and type-id.

Using the member name as a string, this information can then be looked up and used to provide a pointer to the struct member in memory, as well as the appropriate type conversion functions to apply to it.
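A minimal sketch of member registration and lookup, assuming members are keyed by struct-name and member-name strings. `MemberInfo`, `MEMBER_REGISTER`, `member_ptr` and the `Monster` struct are hypothetical; `offsetof` from `<stddef.h>` supplies the byte offset, and the recorded type is kept as a string where a fuller system would store a type-id.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical member registry; the names are illustrative only. */
typedef struct {
  const char* struct_name;
  const char* member_name;
  const char* member_type; /* would be a type-id in a fuller system */
  size_t offset;
} MemberInfo;

#define MAX_MEMBERS 128
static MemberInfo members[MAX_MEMBERS];
static int member_count = 0;

void member_register(const char* s, const char* m, const char* t, size_t off) {
  assert(member_count < MAX_MEMBERS);
  members[member_count].struct_name = s;
  members[member_count].member_name = m;
  members[member_count].member_type = t;
  members[member_count].offset = off;
  member_count++;
}

/* offsetof fails to compile if the member does not exist. The declared
   member type is only recorded here, not checked against the struct. */
#define MEMBER_REGISTER(stype, mtype, member) \
  member_register(#stype, #member, #mtype, offsetof(stype, member))

/* Find a member by name and return a pointer to it inside `instance`,
   or NULL if it was never registered. */
void* member_ptr(const char* stype, void* instance, const char* member,
                 const char** type_out) {
  for (int i = 0; i < member_count; i++) {
    if (strcmp(members[i].struct_name, stype) == 0 &&
        strcmp(members[i].member_name, member) == 0) {
      if (type_out != NULL) {
        *type_out = members[i].member_type;
      }
      return (char*)instance + members[i].offset;
    }
  }
  return NULL;
}

/* Example struct a user might expose to a script. */
typedef struct {
  int hp;
  float speed;
} Monster;
```

The returned pointer and the recorded type would then be handed to the matching conversion function.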

Functions

This is the final step in the system and by far the most complicated. At first it seems like it should be easy: provided the user has registered a function, we can look up all the information we need at runtime with no problem. We can get a pointer to the function we want to call. We can get the number of arguments and, for each one, its size and even the appropriate conversion function. Using these conversion functions we can even extract the raw C data we want to call the function with.
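Because every argument's size is known, the buffer that will hold the raw argument data can be laid out at runtime. A minimal sketch, assuming arguments are packed back to back with no padding (the same compact layout the transformation further down assumes); `FunctionInfo`, `arg_offset` and `args_size` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ARGS 8

/* Hypothetical per-function record filled in at registration time. */
typedef struct {
  const char* name;
  size_t ret_size;
  int arg_count;
  size_t arg_sizes[MAX_ARGS];
} FunctionInfo;

/* Total bytes needed to hold every argument, packed back to back. */
size_t args_size(const FunctionInfo* f) {
  size_t total = 0;
  for (int k = 0; k < f->arg_count; k++) {
    total += f->arg_sizes[k];
  }
  return total;
}

/* Byte offset of argument i within that packed buffer. */
size_t arg_offset(const FunctionInfo* f, int i) {
  assert(i >= 0 && i < f->arg_count);
  size_t offset = 0;
  for (int k = 0; k < i; k++) {
    offset += f->arg_sizes[k];
  }
  return offset;
}
```

Each argument's conversion function could then write its raw C value at `arg_offset`, with `ret_size` bytes reserved for the result.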

Yet C fails us at the final hurdle. There is no standard library function we can call that looks like this:

void call_function(void* func_ptr, void* return_data, void* arg_data, int arg_data_size);

The issue is that the way a function is called is not as black and white as C might make it appear. It isn't simply a case of copying a block of argument data from one place to another. It cannot easily be performed at runtime, and it requires detailed work by the compiler and linker. In short, the calling convention is platform dependent and complex, and there are many different ways in which data is pushed onto and popped off the program stack.

So is it impossible? Well, we still have all the information we need. Perhaps we will have to replicate some aspects of what the compiler does, but there is no reason why it shouldn't be conceptually possible. So I did some research into calling conventions and looked into performing the call via assembly. Unfortunately this path ultimately looked like a dead end. It was becoming very complicated, and using assembly to perform the call would throw away any real portability that might have existed. Not to mention it would introduce a host of subtle, complex bugs that I lacked the experience to track down.

I then realized that I could sidestep this issue altogether if I could wrap a function in my own calling convention, defined in C. More specifically, it would work if I could transform a function of any type into a function of one specific type:

void wrapped_function(void* out, void* args);

Any function of this type could be called using all the runtime data I had gathered, and provided it semantically did what was intended, everything would work perfectly! So the final trick was this transformation.
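For example, a hand-written wrapper for a hypothetical `float scale(int, float)` could look like this. It is a sketch assuming the converted arguments have already been packed into one buffer, in order and without padding; `WrappedFn`, `scale_wrapped` and `pack_arg` are illustrative names. It copies through `memcpy` rather than dereferencing cast pointers, since a packed buffer gives no alignment guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The single calling convention every registered function is squeezed into. */
typedef void (*WrappedFn)(void* out, void* args);

/* An ordinary C function we would like to call from a script. */
float scale(int count, float factor) {
  return (float)count * factor;
}

/* Hand-written wrapper: unpack the arguments, call, store the result. */
static void scale_wrapped(void* out, void* args) {
  int a0;
  float a1;
  memcpy(&a0, (char*)args, sizeof(int));
  memcpy(&a1, (char*)args + sizeof(int), sizeof(float));
  float r = scale(a0, a1);
  memcpy(out, &r, sizeof(float));
}

/* Append one argument's raw bytes to the buffer; returns the next offset. */
size_t pack_arg(char* buf, size_t offset, const void* data, size_t size) {
  memcpy(buf + offset, data, size);
  return offset + size;
}
```

At runtime only a `WrappedFn` pointer and two buffers are needed, so every wrapped function can be invoked in exactly the same way.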

I'd already decided that one of my goals was to leave the existing source unedited, so a source transformation, although easy, wasn't really an option. Luckily this particular transformation turns out to be possible using only macros and nested function declarations. In general, if we assume argument data is packed compactly and in FIFO order, we can imagine the transformation macro looking something like this:

ret_t function(arg0_t, arg1_t, arg2_t, ...);

/* == Transforms To ==> */

void wrapped_function(void* out, void* args) {
  /* Cast to char* for byte offsets: arithmetic on void* is not standard C */
  arg0_t a0 = *(arg0_t*)args;
  arg1_t a1 = *(arg1_t*)((char*)args + sizeof(arg0_t));
  arg2_t a2 = *(arg2_t*)((char*)args + sizeof(arg0_t) + sizeof(arg1_t));
  ...
  *(ret_t*)out = function(a0, a1, a2, ...);
}

The C macro system can express such a transformation, with two limitations. Firstly, the variadic macro system is not powerful enough to express the transformation needed on each argument, and secondly, if a function has a void return type, then writing its result through the "out" pointer is invalid C. Both issues can be overcome by writing a variant of the macro for each argument count, with and without a return value.
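A sketch of what those variants might look like, assuming the compact, FIFO argument layout described above. The macro names (`WRAP_ARG`, `WRAP_FUNCTION_ARGS0`, `WRAP_FUNCTION_ARGS2`, `WRAP_FUNCTION_VOID_ARGS1`) and the `_wrapped` suffix are hypothetical, only three of the variants a real library would need are shown, and this sketch expands each macro into an ordinary file-scope function definition rather than a nested one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Declare a local of `type` and copy its bytes from `args` at byte `off`. */
#define WRAP_ARG(type, name, args, off) \
  type name;                            \
  memcpy(&name, (char*)(args) + (off), sizeof(type))

/* Zero arguments, non-void return. */
#define WRAP_FUNCTION_ARGS0(func, ret_t)              \
  static void func##_wrapped(void* out, void* args) { \
    (void)args;                                       \
    ret_t r = func();                                 \
    memcpy(out, &r, sizeof(ret_t));                   \
  }

/* Two arguments, non-void return: the second sits right after the first. */
#define WRAP_FUNCTION_ARGS2(func, ret_t, arg0_t, arg1_t) \
  static void func##_wrapped(void* out, void* args) {    \
    WRAP_ARG(arg0_t, a0, args, 0);                       \
    WRAP_ARG(arg1_t, a1, args, sizeof(arg0_t));          \
    ret_t r = func(a0, a1);                              \
    memcpy(out, &r, sizeof(ret_t));                      \
  }

/* One argument, void return: nothing is written to `out`. */
#define WRAP_FUNCTION_VOID_ARGS1(func, arg0_t)        \
  static void func##_wrapped(void* out, void* args) { \
    (void)out;                                        \
    WRAP_ARG(arg0_t, a0, args, 0);                    \
    func(a0);                                         \
  }

/* Example functions and their generated wrappers. */
static int answer(void) { return 42; }
static int add(int a, int b) { return a + b; }
static int counter = 0;
static void bump(int by) { counter += by; }

WRAP_FUNCTION_ARGS0(answer, int)
WRAP_FUNCTION_ARGS2(add, int, int, int)
WRAP_FUNCTION_VOID_ARGS1(bump, int)
```

In a fuller system, a registration macro could record each generated `func##_wrapped` pointer alongside the type-ids of the return value and arguments.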

All Together

Stacking the above systems on top of one another, we end up with runtime introspection of types in C, as well as a way of converting between types in different languages. We even have the ability to call arbitrary C functions with argument data from another language. With all the heavy macro use it might not be seen as a pretty solution, but I hope that after this explanation of the internals it is clear that nothing too horrible is going on, and that most of it is just there to save typing.

If you want more in-depth information, looking through the source code is probably a good bet. Otherwise, feel free to contact me.