$ cat life.c
main(){char z[2][24][80],*a,*b,*c,*d,*e;int q,p=-1,x,y,i,k;srand(time(0));while
(1){for(strcpy(d=z[b=e=z[q=p][22],a=b-80,c=z[p],p=!p][y=23]," life.c (c) 1999 \
kurtstephens.com \033[H");y--;c=b,b=a,a=y-1?a-80:e)for((d-=80)[x=79]=10;x--;)i
=x?x-1:78,k=x-78?x+1:0,d[x]=" *\n"[q<0?rand()>>8&1:(k=a[i]+a[x]+a[k]+b[i]+b[k]+
c[i]+c[x]+c[k])-276?!(k-286):b[x]/42];write(1,z+p,1878);sleep(1);}}
Category Archives: C
ctry – try/catch/finally macros for C
https://github.com/kstephens/ctry
This library supports basic try/catch/finally blocks in C. It’s pthread-safe, and the syntax is simple:
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <assert.h>
#include "ctry.h"

static FILE* open_file(const char *path, const char *mode)
{
  FILE *f;
  if ( ! (f = fopen(path, mode)) ) {
    ctry_raise(errno, 2, path, mode);
  }
  return f;
}

static int do_it()
{
  ctry_BEGIN {
    ctry_BODY {
      FILE *f = open_file("non-existent.txt", "r");
      char buf[1024] = { 0 };
      fread(buf, sizeof(buf[0]), sizeof(buf) - 1, f);
    }
    ctry_CATCH_ANY {
      ctry_exc_t *exc = ctry_exc();
      fprintf(stderr, "Error: %s (%d): %s(\"%s\", \"%s\")\n",
              strerror(exc->code), exc->code,
              exc->cntx.func, exc->data[0], exc->data[1]);
      ctry_RETURN(1);
    }
  } ctry_END;
  return 0;
}

int main(int argc, char **argv)
{
  assert(do_it() == 1);
  return 0;
}
Luxeed LED Keyboard Driver for Linux
Hardware Review
This is a really cool keyboard, except that the ESC and function keys do not light up and are strangely located and difficult to press. I also don’t understand why the spacebar does not light up; it seems like the most important key. The construction is not very good and the feel is a bit mushy. However, this keyboard is fun for hacking and could be really great for user training.
Device Driver and Source Code
After much troubleshooting, I have created a simple C driver library for the Luxeed deTA 100/200 keyboard using libusb. This driver allows the LEDs under each key to be set from C on most Unix-like operating systems. It is my first attempt at reverse-engineering the protocol of a USB device.
Source is located here: http://github.com/kstephens/luxeed/tree/master
Overview
There is a user-space, single-threaded, non-blocking I/O daemon using libevent that accepts multiple TCP connections on port 12345 to execute pixel-update commands. I’ve been able to get the time between frames down to 0.025 seconds.
Commands
Commands sent from clients via TCP to the server:
set <RED> <GREEN> <BLUE> <KEY>...
update
wait <SECONDS>
get <KEY>
Keys can be specified using names (e.g. “TAB”, “LSHIFT”, “A”), key IDs (e.g. “#20”, the same as “Y”), or positions relative to the top-left corner (e.g. “2@1”, the same as “W”). The “get” command returns the current color of a key. The key colors are atomically uploaded to the keyboard with the “update” command.
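To illustrate the wire protocol, here is a minimal sketch of a C client. The command lines it sends are the commands described above; everything else (the address handling, the choice of key and color) is illustrative and not part of the luxeed source:

/* Minimal sketch of a TCP client for the luxeed daemon's command protocol. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  struct sockaddr_in addr = { 0 };
  addr.sin_family = AF_INET;
  addr.sin_port   = htons(12345);                 /* the daemon's TCP port */
  inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
  if ( connect(fd, (struct sockaddr*) &addr, sizeof(addr)) < 0 ) {
    perror("connect");
    return 1;
  }
  /* Set the W key (position "2@1") to green, then atomically push the frame. */
  const char *cmds = "set 00 ff 00 2@1\nupdate\n";
  write(fd, cmds, strlen(cmds));
  close(fd);
  return 0;
}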
Building and Running the Server
> cd luxeed/src
> make install-prereqs
> make
> sudo ./luxeed --server &   # start the server
Basic Commands
Make the number keys light up blue:
> netcat localhost 12345
set 00 00 ff 1 2 3 4 5 6 7 8 9 0
update
^D
Simple client that flashes keys:
> sudo ./luxeed --server &   # start the server
> cat flash_keys
#!/bin/bash
t=0.5
c="ff ff ff"
c2="00 00 00"
while [ $# -gt 0 ]
do
  case "$1" in
    -t)  t="$2"; shift 2 ;;
    -c)  c="$2"; shift 2 ;;
    -c2) c2="$2"; shift 2 ;;
    *)   break ;;
  esac
done
keys="$*"
while true
do
  netcat localhost 12345 <<END
set $c $keys
update
wait $t
set $c2 $keys
update
wait $t
END
done
> ./flash_keys ENTER   # flash the ENTER key
Since the server can service multiple clients, we can flash multiple keys at different rates and colors using multiple clients:
> ./flash_keys -t 0.2 -c 'ff 00 00' TAB & > ./flash_keys ENTER &
Maybe it’s time for a demo video.
Things left:
- Pausing updates to the keyboard for a short time period after any keypress would allow animations but not let them interfere with the typist. The keyboard I/O channel is too slow to handle typing and animations (~10 frame/sec) at the same time.
- A bulk update command that maps a PNM image representing the keyboard would be helpful.
- udev rules to prevent the need for running the server as root.
- Debian packaging.
There are some I/O throttling issues; sometimes the keyboard disconnects (or requires a power cycle) if data is sent to it too quickly. The only message I could decipher is a bulk keyboard update message of five 65-byte blocks with an embedded checksum. Maybe there is a smaller differential update payload that updates only specific keys, which would be more useful for faster animation. There may also be a response message that needs to be read.
If you have this keyboard, give it a try!
LL source on github, LL Design Talk Slides
The source for LL is available at http://github.com/kstephens/ll/tree/master.
Also located there are the slides from my June 2008 talk at the Chicago Lisp Users Group on the design of the LL object-oriented Scheme interpreter:
http://github.com/kstephens/ll/tree/master%2Fsrc%2Fll%2Fdoc%2Fll_system_…
TM : The implementation of a real-time, single-threaded, type-segmented, conservative garbage collector
Introduction
TM is a real-time, non-relocating, conservative allocator using type- and color-segregated lists, based on Henry Baker’s “Treadmill” allocator. The source code for TM is located at https://github.com/kstephens/tredmill.
TM interleaves marking, scanning and sweeping during each mutator’s call to tm_alloc().
TM attempts to limit the amount of work in the collector to avoid stopping the world for long periods of time. It does not require a separate thread for the collector, as all collection activities are interleaved with allocation.
Full documentation is located at: http://kurtstephens.com/pub/tredmill/current/doc/html/index.html
Data Structures
An allocation unit is a tm_node. Each tm_node is allocated from a block of system memory (tm_block) sized and aligned to a multiple of the operating system’s page size. Each tm_block is dedicated to a particular type (tm_type). Each tm_type represents a tm_node allocation size, thus each tm_block is uniformly parceled into tm_nodes of the same size. In general, each power-of-two size has a tm_type. However, tm_types can be created for any size.
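As a rough illustration of how these pieces relate, here is a minimal sketch of the data structures; the field names are assumptions for illustration, not the definitions in the TM source:

/* Hypothetical sketch of the relationships described above. */
#include <stddef.h>

typedef struct tm_node {
  struct tm_node *_next;   /* color-list links; the low bits of _prev can  */
  struct tm_node *_prev;   /* also encode the node's color (see below)     */
  /* ... user data is parceled immediately after the header ... */
} tm_node;

typedef struct tm_type tm_type;

typedef struct tm_block {
  struct tm_block *next, *prev;  /* block list links */
  tm_type         *type;         /* all nodes in this block have this type's size */
  char            *alloc_ptr;    /* next address to parcel a tm_node from */
  /* ... page-aligned node storage follows ... */
} tm_block;

struct tm_type {
  size_t    size;          /* node allocation size, usually a power of two */
  tm_node  *colors[4];     /* tm_WHITE, tm_ECRU, tm_GREY, tm_BLACK lists */
  tm_block *blocks;        /* blocks dedicated to this type */
  tm_block *alloc_block;   /* block currently being parceled */
};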
Node Coloring
Each tm_node has a “color” tag describing its current allocation state:
- tm_WHITE: free, unused.
- tm_ECRU: allocated, potentially unused.
- tm_GREY: allocated, marked used, not scanned for pointers.
- tm_BLACK: allocated, marked used, scanned for pointers.
Each tm_type has its own tm_WHITE, tm_ECRU, tm_GREY, and tm_BLACK doubly-linked lists. Each tm_node is a member of one, and only one, of its type’s color lists. To keep the tm_node header small, the node’s color is encoded in the lower two bits of the tm_node._prev pointer.
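Since tm_node headers are word-aligned, the low two bits of _prev are normally zero and can carry the color. A minimal sketch of such an encoding, with illustrative macro names (not taken from the TM source):

/* Hypothetical sketch: a 2-bit color stored in the low bits of the aligned _prev pointer. */
#include <stdint.h>

typedef struct tm_node { struct tm_node *_next, *_prev; } tm_node;

enum { tm_WHITE = 0, tm_ECRU = 1, tm_GREY = 2, tm_BLACK = 3 };

#define tm_node_color(n)  ((int)((uintptr_t)(n)->_prev & 3))
#define tm_node_prev(n)   ((tm_node*)((uintptr_t)(n)->_prev & ~(uintptr_t)3))
#define tm_node_set_prev(n, p, c) \
  ((n)->_prev = (tm_node*)((uintptr_t)(p) | (uintptr_t)(c)))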
When a tm_node’s color changes, it is moved to a different colored list in its tm_type for processing in a different allocation phase.
Node types can be explicitly created for common sizes that are not a power of two, to reduce memory fragmentation.
Allocation Phases
The allocator interleaves the following phases with allocation:
- tm_UNMARK: Unmarking scanned blocks for subsequent rooting. (tm_BLACK -> tm_ECRU)
- tm_ROOT: Root scanning and marking nodes for scanning. (tm_ECRU -> tm_GREY)
- tm_SCAN: Scanning marked nodes for interior pointers. (tm_GREY -> tm_BLACK)
- tm_SWEEP: Freeing unmarked nodes. (tm_ECRU -> tm_WHITE)
During each call to tm_alloc(), each phase will do some collection work before returning a newly allocated tm_node (a sketch of this interleaving follows the list):
- tm_UNMARK: Allocate tm_nodes from the tm_type’s tm_WHITE list until it is empty. A fixed number of tm_BLACK nodes are returned to the tm_ECRU list. Memory pressure may cause switching to the tm_ROOT phase.
- tm_ROOT: A fixed number of roots are scanned for pointers into allocated space. tm_ECRU nodes with valid references are moved to tm_GREY. After all roots are scanned, the next phase is tm_SCAN.
- tm_SCAN: A fixed number of node bytes of tm_GREY nodes are scanned for interior pointers and moved to the tm_BLACK list. Once all tm_GREY nodes are scanned, the next phase is tm_SWEEP.
- tm_SWEEP: If there are no more nodes to be scanned, sweep a fixed number of tm_ECRU nodes to the tm_WHITE list. Once all tm_ECRU nodes are swept, the collector returns to the tm_UNMARK phase.
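A rough sketch of how this interleaving might look inside the allocator; the helper names are illustrative, not the TM API:

/* Hypothetical sketch of incremental collection work interleaved with allocation. */
typedef enum { tm_UNMARK, tm_ROOT, tm_SCAN, tm_SWEEP } tm_phase;

static tm_phase phase = tm_UNMARK;

/* Each helper does a bounded amount of work and returns non-zero when its
   phase is finished (or, for tm_UNMARK, when memory pressure forces a switch). */
extern int tm_unmark_some(void);
extern int tm_root_scan_some(void);
extern int tm_scan_some(void);
extern int tm_sweep_some(void);

static void tm_alloc_do_some_work(void)
{
  switch ( phase ) {
  case tm_UNMARK: if ( tm_unmark_some() )    phase = tm_ROOT;   break;
  case tm_ROOT:   if ( tm_root_scan_some() ) phase = tm_SCAN;   break;
  case tm_SCAN:   if ( tm_scan_some() )      phase = tm_SWEEP;  break;
  case tm_SWEEP:  if ( tm_sweep_some() )     phase = tm_UNMARK; break;
  }
}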
As nodes are allocated, marked, scanned, swept and unmarked, they are moved between the colored lists as follows:
In general, newly allocated tm_nodes are taken from the tm_type’s tm_WHITE list and placed on its tm_ECRU list. If the type’s tm_WHITE list is empty, an allocation tm_block is requested from the operating system and is scheduled for parceling new tm_WHITE nodes for the type. If the allocation tm_block becomes empty, the process is repeated: another allocation block is requested and parceled.
A limited number of new free nodes are parceled from the type’s current allocation block as needed and are initialized as new tm_WHITE nodes.
New tm_blocks may be requested from the operating system during all phases, if the type’s tm_WHITE list or allocation block is empty. The reasoning is that the operating system should be able to allocate a new allocation block faster than a collection that would need to completely “stop the world”.
All phases, except the tm_SWEEP phase, allocate nodes by moving them from the tm_WHITE list to the tm_ECRU list. The tm_SWEEP phase allocates nodes from tm_WHITE to tm_GREY, because tm_ECRU nodes are considered to be unmarked garbage during the tm_SWEEP phase, and tm_BLACK nodes are considered “in-use” but are not scanned for interior pointers during the tm_SWEEP phase. Using tm_GREY during tm_SWEEP is also related to tm_BLACK node mutation, which gives rise to requiring a write barrier on tm_BLACK nodes.
The tm_SWEEP phase will always attempt to scan any tm_GREY nodes before continuing to sweep any tm_ECRU nodes.
The tm_BLACK color might seem a better choice for tm_SWEEP allocations, but this would force young nodes, which are more likely to be garbage, to be kept until the next tm_SWEEP phase. Coloring tm_SWEEP allocations tm_BLACK would also prevent any new interior pointers stored in nodes that may reference tm_ECRU nodes from being scanned before tm_SWEEP is complete.
To prevent thrashing the operating system with tm_block allocation and free requests, a limited number of unused blocks are kept on a global free list.
Aligned Blocks
Aligned blocks allow determining whether a generic word is a pointer to a heap-allocated tm_node with simple pointer arithmetic and a bit-vector check.
To determine a pointer’s allocation (a sketch follows the list):
- Check a bit vector indexed by the pointer’s page number. The bit is set when tm_nodes are parceled and cleared when the entire tm_block is unused and returned to the free block list or operating system.
- Mask off the insignificant page bits to construct an aligned tm_block address.
- Determine the size of tm_nodes in the tm_block from the block’s tm_type.
- Determine if the pointer resides in the data portion of the tm_node by considering the tm_node and tm_block linked-list headers to be “holes” in the address space.
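A minimal sketch of the first two steps, assuming one OS page per tm_block and a global page-indexed bit vector; the names and sizes here are illustrative, not the TM source:

#include <stdint.h>

#define PAGE_SIZE 4096u
#define N_PAGES   (1u << 20)                 /* covers a 32-bit address space */

static uint8_t page_parceled[N_PAGES / 8];   /* bit set when a page holds parceled tm_nodes */

/* Return the aligned tm_block address that might contain ptr, or 0 if the
   page was never parceled into tm_nodes. */
static uintptr_t tm_block_of_possible_ptr(uintptr_t ptr)
{
  uintptr_t page = ptr / PAGE_SIZE;
  if ( ! (page_parceled[page / 8] & (1u << (page % 8))) )
    return 0;
  return ptr & ~(uintptr_t)(PAGE_SIZE - 1);  /* mask off the insignificant page bits */
}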
Type Segregation
Nodes are segregated by type. Each type has a size. By default, type sizes are rounded up to the next power of two. A specific allocation type can be requested with tm_adesc_for_size(). The allocation descriptor can then be used by tm_alloc_desc(). The opaque element can be used to store additional mutator data.
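A minimal usage sketch; the exact signatures of tm_adesc_for_size() and tm_alloc_desc() are assumptions based on the description above, not taken from the TM headers:

/* Sketch only: descriptor and API shapes are assumed for illustration. */
#include <stddef.h>

typedef struct tm_adesc { size_t size; void *opaque; } tm_adesc;
extern tm_adesc *tm_adesc_for_size(size_t size, int create);
extern void     *tm_alloc_desc(tm_adesc *desc);

typedef struct widget { int id; double weight; char name[40]; } widget;

widget *widget_new(void)
{
  static tm_adesc *desc;
  if ( ! desc ) {
    /* Ask for a type of exactly sizeof(widget), instead of the next power of two. */
    desc = tm_adesc_for_size(sizeof(widget), 1);
    desc->opaque = "widget";   /* mutator-defined data */
  }
  return (widget*) tm_alloc_desc(desc);
}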
Each node type has its own colored lists, allocated block lists and accounting. Segregating node types allows allocation requests to be done without scanning tm_WHITE nodes for best fit. However, since types and their blocks are segregated, and nodes of a larger size are not scavenged for smaller sizes, this could lead to poor actual memory utilization in mutators with small numbers of allocations for many sizes, since a single node allocation for a given size will cause at least one block to be requested from the operating system.
The color lists are logically combined from all types for iteration using nested type and node iterators.
Write Barriers
During the tm_SCAN and tm_SWEEP phases, any tm_BLACK node that is mutated must be rescanned due to the possible introduction of new references from the tm_BLACK node to tm_ECRU nodes. This is achieved by calling tm_write_barrier(&R) in the mutator after modifying R’s contents.
There are two versions of the write barrier:
- tm_write_barrier(R)
- tm_write_barrier_pure(R)
tm_write_barrier_pure(R) can be called when R is guaranteed to be a pointer to the head of a node allocated by tm_alloc(). tm_write_barrier_pure(R) cannot be used if R might be an interior pointer or a pointer to a stack- or root-allocated object. tm_write_barrier(R) should be used if the address of R is not known to be a pointer to the heap, the stack or global roots.
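A minimal sketch of the barrier protocol in a mutator; the pair type and setter functions are illustrative, and only the two barrier calls above come from the description:

/* Sketch of when each barrier variant applies. */
#include <stddef.h>

typedef struct pair { void *car, *cdr; } pair;

extern void *tm_alloc(size_t size);
extern void  tm_write_barrier(void *ptr);
extern void  tm_write_barrier_pure(void *node);

void set_car(pair *p, void *value)
{
  p->car = value;
  /* p is known to be the head of a node returned by tm_alloc(),
     so the cheaper "pure" barrier is safe. */
  tm_write_barrier_pure(p);
}

void set_slot(void **slot, void *value)
{
  *slot = value;
  /* slot might be an interior pointer, a stack location, or a global root,
     so the general barrier must be used. */
  tm_write_barrier(slot);
}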
If R points into global roots, tm_write_barrier(R) will cause global root rescanning, if the collector is in the tm_SCAN phase.
Stack writes are not barriered, because stack scanning occurs atomically at the end of tm_ROOT.
Unfriendly Mutators
When entering code where the write barrier protocol is not followed, tm_disable_write_barrier() can be called to put the collector into a “stop-the-world” collection mode until tm_enable_write_barrier() is called.
A virtual-memory write barrier based on mprotect() might be easier to manage than requiring the mutator to call the write barrier. Recoloring and marking root-set pages can be done in hardware, assuming the overhead of mprotect() and the SIGSEGV signal handler is low when changing phases and colors.
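A minimal sketch of that idea, assuming POSIX mprotect() and a SIGSEGV handler; this is not part of TM, and note_mutated_page() is a hypothetical hook:

/* Sketch of a page-protection write barrier. */
#include <signal.h>
#include <string.h>
#include <stdint.h>
#include <sys/mman.h>

#define PAGE_SIZE 4096u                       /* assumed page size */

extern void note_mutated_page(void *page);    /* hypothetical: queue page for rescanning */

/* Write-protect a heap page once its nodes have been scanned (colored tm_BLACK). */
static void protect_scanned_page(void *page)
{
  mprotect(page, PAGE_SIZE, PROT_READ);
}

/* The first write to a protected page traps here: record it for rescanning
   and unprotect it so the mutator can continue. */
static void on_sigsegv(int sig, siginfo_t *info, void *ctx)
{
  void *page = (void*)((uintptr_t)info->si_addr & ~(uintptr_t)(PAGE_SIZE - 1));
  note_mutated_page(page);
  mprotect(page, PAGE_SIZE, PROT_READ | PROT_WRITE);
  (void) sig; (void) ctx;
}

static void install_vm_write_barrier(void)
{
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_sigaction = on_sigsegv;
  sa.sa_flags = SA_SIGINFO;
  sigaction(SIGSEGV, &sa, 0);
}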
Issues
- TM is not currently thread-safe.
- TM does not currently support allocations larger than a tm_block. This will be fixed by using another page-indexed bit vector. A “block-header-in-page” bit vector marks the page of each tm_block header. This bit vector will be scanned backwards to locate the first page that contains the allocation’s block header.
- TM does not currently support requests for page-aligned allocations. This could be achieved by using a hash table to map page-aligned allocations to their tm_block.
References
- The Treadmill: Real-Time Garbage Collection Without Motion Sickness, Henry G. Baker, http://home.pipeline.com/~hbaker1/NoMotionGC.html
Ruby Internals: Why RUBY_FIXNUM_FLAG should be 0x00
Type tags in MRI Ruby VALUE
Internally, values in MRI Ruby are 32 bits wide (at least on 32-bit processors). Some of the least-significant bits are used to store type information; see the VALUE definition in include/ruby/ruby.h. Using type tag bits avoids allocating additional memory for commonly-used immutable values, like integers.
Ruby uses a single-bit tag of 0x01 as the Fixnum type tag. The remaining bits are used to store the Fixnum’s signed value. This is an immediate value; it doesn’t require storage for the Fixnum value to be allocated, unlike the heap space that would be required for a String. Other types use different tag bits.
Since Ruby uses 31 bits to store the Fixnum’s value, all the other types use 0 for the least-significant bit; Ruby uses dynamic-length tags:
 3         2         1
10987654321098765432109876543210  bit index
--------------------------------
sxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx1  Fixnum
00000000000000000000000000000000  false
00000000000000000000000000000010  true
00000000000000000000000000000100  nil
00000000000000000000000000000110  undef
xxxxxxxxxxxxxxxxxxxxxxxx00001110  Symbol
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx0x0  All other types
A non-zero Fixnum type tag requires that the tag be masked or shifted away on Fixnum operands before numeric operations. In a running program, Fixnum is probably the most common numeric type and quite possibly the most common type of all; it makes sense to make the most common operations on Fixnum (+, -, *, /) as fast and as simple as possible. It’s also likely that addition is the most common operation applied to Fixnums.
Imagine the following bit of Ruby code:
x = 1
y = 3
puts x + y
Internally, INT2FIX(X) is used to create a Ruby value from a C int X:
#define INT2FIX(X) (((X) << 1) | RUBY_FIXNUM_FLAG)
#define FIX2INT(X) ((X) >> 1)
Thus:
INT2FIX(1) => 3
INT2FIX(3) => 7
FIX2INT(X) shifts down by one bit, which removes the tag and returns the original integer.
To compute x + y, Ruby internally (in numeric.c: fix_plus()) does the following:
x + y => INT2FIX(FIX2INT(x) + FIX2INT(y))
      => ((3 >> 1) + (7 >> 1)) << 1 | 1
      => 9
FIX2INT(9) => (9 >> 1) => 4
If a type tag of 0x00 were used for Fixnums, there would be no tag to remove, and addition or subtraction on Fixnums would require no tag manipulations. Assume:
#define INT2FIX(X) ((X) << 1)
#define FIX2INT(X) ((X) >> 1)
The Ruby expression x + y would simply be x + y in C code, assuming no underflow or overflow into Bignum.
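A minimal sketch of what zero-tag addition could look like with an overflow check; VALUE here is just a long for illustration, the Bignum fallback is hypothetical, and the overflow check uses a GCC/Clang builtin as an assumed convenience:

/* Sketch only: with a 0x00 tag, (a << 1) + (b << 1) == (a + b) << 1,
   so tagged addition is one machine add with no tag manipulation. */
typedef long VALUE;

extern VALUE rb_big_plus_from_longs(long a, long b);   /* hypothetical fallback */

#define INT2FIX(X) ((VALUE)(X) << 1)
#define FIX2INT(X) ((long)(X) >> 1)

static VALUE fix_plus_zero_tag(VALUE x, VALUE y)
{
  VALUE sum;
  if ( __builtin_add_overflow(x, y, &sum) )
    return rb_big_plus_from_longs(FIX2INT(x), FIX2INT(y));
  return sum;
}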
Multiplication with zero Fixnum tags is very simple: only one of the operands needs to be shifted down:
#define FIXMUL(X, Y) (((X) >> 1) * (Y))
Fixnum division: the result of the division is shifted up:
#define FIXDIV(X, Y) (((X) / (Y)) << 1)
Two-bit type tags on 32-bit architectures
Oaklisp and LL (http://kurtstephens.com/pub/ll/) use 2-bit tags for all values. LL uses the following tag scheme:
 3         2         1
10987654321098765432109876543210  bit index
--------------------------------
sxxxxxxxxxxxxxxxxxxxxxxxxxxxxx00  <fixnum>
pppppppppppppppppppppppppppppp01  <locative>
seeeeeeeemmmmmmmmmmmmmmmmmmmmm10  <flonum>
pppppppppppppppppppppppppppppp11  All other types
Floating-point (<flonum>) values are C floats which sacrifice the 2 least-significant mantissa bits for the type tag. Locatives are safe pointers to other values. Oaklisp stores floating-point values as allocated objects and uses a tag for other common immediate values: characters, etc.
The rationale for choosing a fixed-size lower 2-bit type tag, as opposed to a dynamic-length type tag as in Ruby, or high-bit tags like some older Lisp implementations, is as follows:
C compilers and dynamic memory allocators will align allocations to word boundaries for performance reasons, so there cannot be a pointer to an object that would require some of the lower bits of a pointer, except for sub-word access, e.g. char *. 32-bit words are 4 bytes long; the lower 2 bits of any object pointer will always be zero and are free to be used for type tagging.
If references to allocated objects are encoded using a 0x03 type tag, tag removal could be:
#define RBASIC(X) ((struct RBasic*)((X) - 3))
Assuming that most of the time the interpreter is referencing structure members of the object, and does not need the actual address of the object:
struct RBasic {
  VALUE flags;  /* struct offset: + 0 */
  VALUE klass;  /* struct offset: + 4 */
};

RBASIC(X)->klass => ((struct RBasic*)((X) - 3))->klass
C compilers convert the pointer->member expression into an offset from an address. For 32-bit VALUEs:
RBASIC(X)->flags => *(VALUE*)((X) - 3 + 0)
RBASIC(X)->klass => *(VALUE*)((X) - 3 + 4)
Using subtraction as the tag removal operation, instead of (X & ~3), allows the C compiler to constant-fold the tag removal and the structure offset:
RBASIC(X)->flags => *(VALUE*)((X) - 3)
RBASIC(X)->klass => *(VALUE*)((X) + 1)
Therefore, there is no additional tag removal cost to reference structure members with non-zero offsets. One could reorder the members depending on which is “hotter”.
Research shows that tag manipulation is a heavy cost, esp. for numerics; any tagging scheme should be as simple and consistent as possible.
For example, determining the class of a VALUE could be inlined:
#define CLASS_OF(v) (((v) & 3) ? RBASIC(v)->klass : rb_cFixnum)
Two-bit tags naturally align with word boundaries on 32-bit processors. Thus, zero tags for integers on 32-bit processors allow pointer arithmetic on VALUE*, as in the Array#[] method, to require no tag manipulation or multiplication to offset into the allocated Array’s elements.
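A sketch of why this works, assuming 4-byte VALUEs and a 00 fixnum tag; the macro names are illustrative:

/* With a 2-bit 00 tag, a fixnum i is encoded as i << 2, which on a 32-bit
   machine is exactly the byte offset of element i in a VALUE array. */
typedef unsigned long VALUE;    /* illustrative 32-bit VALUE */

#define INT2FIX2(I)  ((VALUE)(I) << 2)           /* hypothetical 2-bit-tag encoding */
#define ARY_AREF(base, fixidx) \
  (*(VALUE*)((char*)(base) + (fixidx)))          /* tagged index == byte offset */

For example, ARY_AREF(ptr, INT2FIX2(3)) reads element 3 directly, with no shift and no multiply.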
Thanks to Gary Wright for inspiring me to write about this.
Scheme: New release of LL 0.15
Download: http://github.com/kstephens/ll/tree/master
LL is:
An embeddable, pure, class-based object Lisp system as a C library, with multiple inheritance and mix-in support, based on ideas from Scheme, Oaklisp and Dylan. It has a clean namespace and proper tail calls in C.
Version 0.15:
- Adds a method lookup cache at all call sites, including primitive C code, reducing full method lookups by over 80%.
- Relies on Boehm GC 7.0 (included).
- Passes R4RS tests.
- Improved bytecode compiler and constant folding on non-side-effecting methods.
- Compiles with GCC 4.1.3.
- call/cc is partially supported.
- Stack buffers for activation records and values are dynamically allocated.
- (write) and (display) detect loops.
- Inline ll_call() macros are more optimized.
- More posix: methods.
- Better <error> attribute handling.
- More hooks for default error handling behavior.
- Debugger no longer pollutes the global namespace.
- Locatives to read-only globals correctly raise errors.
- Objects with binary data slots will not cause (equal?) to crash; see <not-equal-mixin>.
- Fixes memory problems.
- Improved build process.
- Improved boot process.
- llt options: -e, -d, -p.
Ruby : Touching The Obj-C Void : nil is nil
A long time ago, in Objective-C on the NeXT, one could often remove nil checks, because all messages to nil would immediately return nil (or 0, depending on the caller’s method signature).
How many times have we seen this in Ruby?:
def foo
  bar && bar.baz && bar.baz.caz("x")
end
Or even worse, avoiding redundant execution?:
def foo
  (temp = @bar) && (temp = temp.baz) && temp.caz("x")
end
In Objective-C this could be written as:
- foo { return [[bar baz] caz: "x"]; }
So in Ruby:
class ::NilClass
  def method_missing(*args)
    nil
  end
end

@bar = nil

def foo
  @bar.baz.caz("x")
end

foo # => nil
Assuming that most of the time bar is not nil, NilClass#method_missing => nil makes for cleaner code that also runs faster than checking for nil along the way.
An additional benefit is that nil can also be used as an immutable empty collection sink by defining NilClass#size => 0, NilClass#empty? => true, etc.
Obviously, it breaks code that expects exceptions to be thrown for messages to nil.
Introduce a method that explicitly checks for nil:
module ::Kernel
  def not_nil; self; end
end

class ::NilClass
  def not_nil; raise("not_nil failed"); end
end

@bar = nil

def foo
  @bar.baz.caz("x").not_nil
end

foo # => RuntimeError: not_nil failed
Comments?
Ruby: Date / Rational / Fixnum#gcd hack increased app performance by 15%
UPDATE: Fixnum#gcd was accepted into MRI 1.8; see https://devdriven.com/2010/02/ruby-fixnumgcd-accepted-into-mri/ .
Ruby’s Date class uses Rational heavily, which calls Integer#gcd for every new Rational. The Integer#gcd method is generic to handle Bignums, but performs terribly for Fixnum#gcd(Fixnum), which is probably the most common case.
This RubyInline hack saved 15% execution time in a large Rails application:
require 'inline'

class Fixnum
  inline do | builder |
    builder.c_raw '
      static VALUE gcd(int argc, VALUE *argv, VALUE self) {
        if ( argc != 1 ) {
          rb_raise(rb_eArgError, "wrong number of arguments (%d for %d)", argc, 1);
        }
        /* Handle Fixnum#gcd(Fixnum) case directly. */
        if ( FIXNUM_P(argv[0]) ) {
          /* fprintf(stderr, "Using Fixnum#gcd(Fixnum)\n"); */
          long a = FIX2LONG(self);
          long b = FIX2LONG(argv[0]);
          long min = a < 0 ? - a : a;
          long max = b < 0 ? - b : b;
          while ( min > 0 ) {
            long tmp = min;
            min = max % min;
            max = tmp;
          }
          return LONG2FIX(max);
        } else {
          /* fprintf(stderr, "Using super#gcd\n"); */
          return rb_call_super(1, argv);
        }
      }
    '
  end
end
Update:
Sorry for the late reply. If the code above does not work via cut-and-paste, download it from here.
This will be released soon as a gem dynamic library called speedfreaks,
with other performance-enhancing snippets.
Thanks for the feedback!