Tag Archives: language implementation

COLAs or CLOAs? : are lambda systems fundamentally simpler than object systems?

Should Combined-Object-Lambda-Architecture really be Combined-Lambda-Object-Architecture?

Ian Piumarta’s IDST bootstraps a object-system, then a compiler, then a lisp evaluator. Maru bootstraps a lisp evaluator, then crafts an object system, then a compiler. Maru is much smaller and elegant than IDST.

Are object systems necessarily more complex than lambda evaluators? Or is this just another demonstration of how Lisp code/data unification is more powerful?

If message send and function calls are decomposed into lookup() and apply(), the only difference between basic OO message-passing and function calling is lookup(): the former is late-bound, the latter is early bound (in the link-editor, for example). Is OO lookup() the sole complicating factor? Is a lambda-oriented compiler fundamentally less complex than a OO compiler?

Ruby Internals: Why RUBY_FIXNUM_FLAG should be 0x00

Type tags in MRI Ruby VALUE

Internally, values in MRI Ruby are 32-bit (at least for 32-bit processors). Some of the least-significant bits are used to store type information. See the VALUE definition in include/ruby/ruby.h. Using type tag bits avoids allocating additional memory for commonly-used immutable values, like integers.

Ruby uses a single-bit tag of 0x01 as the Fixnum type tag. The remaining bits, are used to store the Fixnum’s signed value. This is an immediate value; it doesn’t require storage for the Fixnum value to be allocated, unlike the heap space that would be required for a String. Other types will use different tag bits.

Since Ruby uses 31 bits to store the Fixnum’s value, all the other types use 0 for least-significant bit; Ruby uses dynamic length tags:

 3         2         1
10987654321098765432109876543210 bit index 
sxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx1 Fixnum
00000000000000000000000000000000 false
00000000000000000000000000000010 true
00000000000000000000000000000100 nil
00000000000000000000000000000110 undef
xxxxxxxxxxxxxxxxxxxxxxxx00001110 Symbol
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx0x0 All other types

A non-zero Fixnum type tag requires that the tag must be masked or shifted away on Fixnum operands before numeric operations. In a running program, Fixnum is probably the most common numeric type and quite possibly the most common type of all; it makes sense to make the most common operations on Fixnum: +, -, *, / as fast and as simple as possible. It’s also likely that addition is the most common operation applied to Fixnum.

Imagine the following bit of Ruby code:

x = 1
y = 3
puts x + y

Internally, INT2FIX(X) is used to create a Ruby value from a C int X:

#define INT2FIX(X) (((X) << 1) | RUBY_FIXNUM_FLAG)
#define FIX2INT(X) ((X) >> 1)


INT2FIX(1) => 3
INT2FIX(3) => 7

FIX2INT(X) shifts one bit down which removes the tag and returns the original integer.

To compute x + y, Ruby internally, in numeric.c: fix_plus(), does the following:

x + y                             =>
INT2FIX(FIX2INT(x) + FIX2INT(y))  =>
((3 >> 1) + (7 >> 1)) << 1 | 1    =>

FIX2INT(9)                        => 
(9 >> 1)                          =>

If a type-tag of 0x00 was used for Fixnums, there is no tag to remove and addition or subtraction on Fixnums requires no tag manipulations. Assume:

#define INT2FIX(X) ((X) << 1)
#define FIX2INT(X) ((X) >> 1)

The Ruby expression x + y would simply be x + y in C code, assuming no underflow or overflow into Bignum.

Multiplication with zero Fixnum tags is very simple: only one of the operands needs to be shifted down:

#define FIXMUL(X, Y)((X) >> 1) * (Y))

Fixnum division: the result of the division is shifted up:

#define FIXDIV(X, Y) (((X) / (Y)) << 1)

Two-bit type tags on 32-bit architectures

Oaklisp and LL (http://kurtstephens.com/pub/ll/) use 2-bit tags for all values. LL uses the following tag scheme:

 3         2         1
10987654321098765432109876543210 bit index 
sxxxxxxxxxxxxxxxxxxxxxxxxxxxxx00 <fixnum>
pppppppppppppppppppppppppppppp01 <locative>
seeeeeeeemmmmmmmmmmmmmmmmmmmmm10 <flonum>
pppppppppppppppppppppppppppppp11 All other types

Floating-point (<flonum>) values are C floats which sacrifice the 2 least-significant mantissa bits for the type tag. Locatives are safe pointers to other values. Oaklisp stores floating-point values as allocated objects and uses a tag for other common immediate values: characters, etc.

The rationale for choosing a fixed-size lower 2-bit type tag, opposed to a dynamic-length type tag, as in Ruby, or high-bit tags, like some older Lisp implementations, is as follows:

C compilers and dynamic memory allocators will align allocations to word boundaries for performance reasons, so there cannot not be a pointer to an object that would require some of the lower bits of a pointer, except for sub-word access, e.g.: char *. 32-bit words are 4 bytes long; the lower 2 bits of any object pointer will always be zero, and are free to be used for type tagging.

If references to allocated objects are encoded using a 0x03 type tag, tag removal could be:

  #define RBASIC(X) ((struct RBasic*)((X) - 3))

Assuming that most of the time the interpreter is referencing structure members of the object, and does not need the actual address of the object:

  struct RBasic {
      VALUE flags; /* struct offset: + 0 */
      VALUE klass; /* struct offset: + 4 */

  RBASIC(X)->klass =>
  ((struct RBasic*)((X) - 3))->klass

C compilers convert the pointer->member expression into an offset from an address. For 32-bit VALUEs:

  RBASIC(X)->flags => *(VALUE*)((X) - 3 + 0)
  RBASIC(X)->klass => *(VALUE*)((X) - 3 + 4)

Using subtraction as the tag removal operation, instead of (X & ~3),
allows the C compiler to constant fold the tag removal and the structure offset:

  RBASIC(X)->flags => *(VALUE*)((X) - 3)
  RBASIC(X)->klass => *(VALUE*)((X) + 1)

Therefore, there is no additional tag removal cost to reference structure members with non-zero offsets. One could reorder the members depending on which is “hotter”.

Research shows that tag manipulation is a heavy cost, esp. for numerics; any tagging scheme should be as simple and consistent as possible.

For example, determining the class of a VALUE could be inlined:

  #define CLASS_OF(v) ((v) & 3) ? RBASIC(v)->klass : rb_cFixnum)

Two-bit tags naturally align with word boundaries on 32-bit processors. Thus, zero tags for integers on 32-bit processors allows pointer arithmetic on VALUE*, as in the Array#[] method, to require no tag manipulation or multiplication to offset into the allocated Array’s elements.

Thanks to Gary Wright for inspiring me to write about this.

Embedding a properly tail-recursive, stack-based interpreter in C

See /pub/tail_call_interp/tail_call_interp.c

The Scheme programming language requires proper implementation of tail calls, because all looping in Scheme is implemented as function calls — there is a very fundamental relationship between tail-calls and iteration.

Many Scheme interpreters implement proper tail-recursion by forcing everything to “happen” within a tightly looping byte-code VM. The VM loops on op-codes and manipulates a stack (or two).

Tail-calls typically reuse the current activation record with new arguments. Oaklisp, a pure OO Scheme interpreter, uses two stacks, one for values (call arguments and locals) and one for activation records, and a tightly tuned bytecode VM loop. Use of two stacks, allows more efficent control flow in tail-calls and continuations.

To embed a stack-based interpreter in C, calling out to C functions from within interpreted code and calling into interpreter functions from C need to be allowed without too much gnashing of teeth; we cannot force the developer or extender of the interpreter to put all activity in a byte-code loop.

Thus extending a properly tail-recursive bytecode VM with C call-ins and call-outs can be cumbersome.

However, we want to support a byte-code execution environment in our interpreter, otherwise it’s not much of an interpreter.

We want C code to execute in its enviroment and byte-code to execute in its environement while allowing the two execution enviroments to commnunicate efficiently and seamlessly as possible, because the interpreter will spend most of its time executing interpreted code and we want the core of the interpreter to be written and extendable in C

Since C (almost always) uses the same stack for activation records and values, we will have some problems getting proper tail-recursion out of C code (although there is some support in GCC for tail-recursion as long as there are no pointers to stack locals).

However we can do something about tail-calls occuring at the boundaries of C and interpreted code.

The example puts the tail-call loop at all call sites in the C code of the interpreter, thus allowing efficient tail-calls to and from C code, while still allowing the interpreter to be developed as a somewhat normal C program.

It directly supports the intention to iterate using tail-calls in the interpreter as an explicit iteration on activation records at the call sites in the interpreter’s implementation.

It allows different bytecode VMs to be embedded whereever and however the extender of the interpreter wants without subverting proper tail-calling.

The example is a simple interpreter that has only integer values and no interpreted function name lookups. The interpreted function names are hard-coded in C to keep this example (somewhat) short.

A more complete implementation of this technique is in an embeddable pure OO Scheme implementation, based on Oaklisp, named ll .