RedSteak
cabar – An Extensible Software Component Management System
Cabar – an extensible software component backplane for managing software components.
Ruby: Performance of Symbol Construction
Measurements of Symbol Constructor Expressions.
n = 10_000_000 iterations per expression.
ruby 1.8.6 (2008-08-08 patchlevel 286) [i686-linux]:
> /cnu/bin/ruby symbol_benchmark.rb
                               user     system      total        real
Null Test                  0.740000   0.000000   0.740000 (  0.742914)
'foo_bar'                  1.670000   0.000000   1.670000 (  1.661374)
"foo_bar"                  1.620000   0.000000   1.620000 (  1.625221)
:foo_bar                   0.890000   0.000000   0.890000 (  0.886903)
:'foo_bar'                 0.880000   0.000000   0.880000 (  0.878555)
:"foo_bar"                 0.860000   0.000000   0.860000 (  0.867110)
"foo_bar".to_sym           2.830000   0.000000   2.830000 (  2.830536)
str.to_sym                 2.050000   0.000000   2.050000 (  2.052756)
:"{str}"                   0.880000   0.000000   0.880000 (  0.881367)
:"foo_#{'bar'}"            0.860000   0.000000   0.860000 (  0.854944)
"foo_#{'bar'}".to_sym      2.880000   0.000000   2.880000 (  2.942499)
:"foo_#{bar}"              4.280000   0.000000   4.280000 (  4.290880)
"foo_#{bar}".to_sym        4.930000   0.000000   4.930000 (  4.929801)
ruby 1.8.7 (2008-08-11 patchlevel 72) [i486-linux]:
> /usr/bin/ruby symbol_benchmark.rb
                               user     system      total        real
Null Test                  0.780000   0.000000   0.780000 (  0.782060)
'foo_bar'                  3.480000   0.770000   4.250000 (  4.263535)
"foo_bar"                  3.400000   0.820000   4.220000 (  4.224401)
:foo_bar                   2.280000   0.950000   3.230000 (  3.223590)
:'foo_bar'                 2.370000   0.860000   3.230000 (  3.249133)
:"foo_bar"                 2.310000   0.940000   3.250000 (  3.283796)
"foo_bar".to_sym           5.010000   0.980000   5.990000 (  6.029690)
str.to_sym                 3.740000   0.930000   4.670000 (  4.706827)
:"{str}"                   2.440000   0.810000   3.250000 (  3.254253)
:"foo_#{'bar'}"            2.470000   0.780000   3.250000 (  3.245739)
"foo_#{'bar'}".to_sym      4.800000   0.970000   5.770000 (  5.767003)
:"foo_#{bar}"              7.490000   0.990000   8.480000 (  8.506735)
"foo_#{bar}".to_sym        8.640000   0.890000   9.530000 (  9.532481)
ruby 1.9.0 (2008-11-25 revision 20347) [i686-linux]:
> ~/local/ruby/trunk/bin/ruby symbol_benchmark.rb
                               user     system      total        real
Null Test                  0.690000   0.000000   0.690000 (  0.695087)
'foo_bar'                  1.760000   0.000000   1.760000 (  1.772417)
"foo_bar"                  1.780000   0.000000   1.780000 (  1.772031)
:foo_bar                   0.720000   0.000000   0.720000 (  0.764095)
:'foo_bar'                 0.710000   0.000000   0.710000 (  0.710952)
:"foo_bar"                 0.720000   0.000000   0.720000 (  0.717041)
"foo_bar".to_sym           3.370000   0.000000   3.370000 (  3.371836)
str.to_sym                 2.340000   0.010000   2.350000 (  2.347026)
:"{str}"                   0.720000   0.000000   0.720000 (  0.723032)
:"foo_#{'bar'}"            0.710000   0.000000   0.710000 (  0.712813)
"foo_#{'bar'}".to_sym      3.370000   0.020000   3.390000 (  3.399618)
:"foo_#{bar}"              5.030000   0.000000   5.030000 (  5.037542)
"foo_#{bar}".to_sym        5.060000   0.010000   5.070000 (  5.112535)
rubinius 0.10.0 (ruby 1.8.6 compatible) (c458077eb) (12/31/2009) [i686-pc-linux-gnu]:
> ~/local/ruby/rubininus/bin/rbx symbol_benchmark.rb
                               user     system      total        real
Null Test                  2.369585   0.000000   2.369585 (  2.369602)
'foo_bar'                  3.616918   0.000000   3.616918 (  3.616949)
"foo_bar"                  3.672719   0.000000   3.672719 (  3.672752)
:foo_bar                   2.407764   0.000000   2.407764 (  2.407796)
:'foo_bar'                 2.462789   0.000000   2.462789 (  2.462824)
:"foo_bar"                 2.401784   0.000000   2.401784 (  2.401812)
"foo_bar".to_sym           7.127442   0.000000   7.127442 (  7.127472)
str.to_sym                15.128944   0.000000  15.128944 ( 15.128974)
:"{str}"                   2.411760   0.000000   2.411760 (  2.411789)
:"foo_#{'bar'}"            2.437595   0.000000   2.437595 (  2.437629)
"foo_#{'bar'}".to_sym      7.120185   0.000000   7.120185 (  7.120209)
:"foo_#{bar}"             20.144837   0.000000  20.144837 ( 20.144864)
"foo_#{bar}".to_sym       20.222530   0.000000  20.222530 ( 20.222891)
jruby 1.1.5 (ruby 1.8.6 patchlevel 114) (2008-11-03 rev 7996) [i386-java]:
> ~/local/ruby/jruby-1.1.5/bin/jruby symbol_benchmark.rb
                               user     system      total        real
Null Test                  0.936000   0.000000   0.936000 (  0.936172)
'foo_bar'                  1.886000   0.000000   1.886000 (  1.886079)
"foo_bar"                  1.880000   0.000000   1.880000 (  1.880344)
:foo_bar                   1.379000   0.000000   1.379000 (  1.379957)
:'foo_bar'                 5.842000   0.000000   5.842000 (  5.842242)
:"foo_bar"                 5.834000   0.000000   5.834000 (  5.833263)
"foo_bar".to_sym           2.597000   0.000000   2.597000 (  2.597294)
str.to_sym                 2.306000   0.000000   2.306000 (  2.306082)
:"{str}"                   5.655000   0.000000   5.655000 (  5.655672)
:"foo_#{'bar'}"            5.785000   0.000000   5.785000 (  5.785285)
"foo_#{'bar'}".to_sym      2.585000   0.000000   2.585000 (  2.584877)
:"foo_#{bar}"              7.532000   0.000000   7.532000 (  7.531846)
"foo_#{bar}".to_sym        8.361000   0.000000   8.361000 (  8.360630)
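Apart from the two String literals and :"{str}", every expression measured above produces the same Symbol; they differ only in how that Symbol is obtained. The standalone check below (not part of symbol_benchmark.rb; it just reuses the script's str and bar locals) confirms this. It also shows that :"{str}", which has no '#', is an ordinary literal Symbol named "{str}" rather than an interpolation of str, which is consistent with it timing like the quoted literal Symbols :'foo_bar' and :"foo_bar" on every implementation above.

```ruby
str = 'foo_bar'
bar = 'bar'

# Every Symbol-producing expression from the benchmark except :"{str}".
results = [
  :foo_bar,
  :'foo_bar',
  :"foo_bar",
  "foo_bar".to_sym,
  str.to_sym,
  :"foo_#{'bar'}",
  "foo_#{'bar'}".to_sym,
  :"foo_#{bar}",
  "foo_#{bar}".to_sym,
]

# Symbols are interned, so every expression yields the very same object.
p results.all? { |s| s.equal?(:foo_bar) }   # prints true

# Without a '#', nothing is interpolated: this is a literal Symbol.
p :"{str}".to_s                             # prints "{str}"
```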
symbol_benchmark.rb:
require 'benchmark'

# Generate the benchmark code as a string and eval it, so that each
# expression is compiled directly into its own n.times loop body.
def bm_expr(x, expr, title = nil)
  e = <<"END"
n = 10_000_000
str = 'foo_bar'
bar = 'bar'
#begin; GC.enable; GC.start; GC.disable; rescue StandardError, NotImplementedError; end
begin; GC.start; rescue StandardError, NotImplementedError; end
x.report(#{(title || expr).inspect}) { n.times { #{expr} } }
END
  eval e
end

Benchmark.bm(25) do |x|
  bm_expr x, "", 'Null Test'
  bm_expr x, "'foo_bar'"
  bm_expr x, '"foo_bar"'
  bm_expr x, ':foo_bar'
  bm_expr x, ":'foo_bar'"
  bm_expr x, ':"foo_bar"'
  bm_expr x, '"foo_bar".to_sym'
  bm_expr x, 'str.to_sym'
  bm_expr x, ':"{str}"'   # no '#': a literal Symbol named "{str}", not an interpolation
  bm_expr x, ':"foo_#{\'bar\'}"'
  bm_expr x, '"foo_#{\'bar\'}".to_sym'
  bm_expr x, ':"foo_#{bar}"'
  bm_expr x, '"foo_#{bar}".to_sym'
end
The first time I ran the benchmarks without forcing and disabling GC, :'foo_bar' appeared faster than :foo_bar. Reminder to self: always disable GC around benchmarks, unless you are benchmarking GC.
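Taking that reminder literally, a benchmark helper could force a collection, disable GC while the block is timed, and then restore the previous GC state. The helper below is a sketch; its name, realtime_without_gc, is hypothetical, and it is not part of symbol_benchmark.rb or of Ruby's Benchmark library.

```ruby
require 'benchmark'

# Hypothetical helper: force a full collection first, then keep GC disabled
# while the block runs so collection pauses are not timed.
def realtime_without_gc
  GC.start
  already_disabled = GC.disable   # GC.disable returns true if GC was already off
  begin
    Benchmark.realtime { yield }
  ensure
    GC.enable unless already_disabled   # restore the caller's GC state
  end
end

n = 1_000_000
t = realtime_without_gc { n.times { :foo_bar } }
puts format('%-10s %.6f sec', ':foo_bar', t)
```

With GC disabled, allocation-heavy expressions such as 'foo_bar' keep growing the heap for the whole timed loop, so n may need to be smaller than the 10_000_000 used above.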
Ruby Internals: Why RUBY_FIXNUM_FLAG should be 0x00
Type tags in MRI Ruby VALUE
Internally, values in MRI Ruby are word-sized (32 bits on 32-bit processors). Some of the least-significant bits are used to store type information; see the VALUE definition in include/ruby/ruby.h. Using type tag bits avoids allocating additional memory for commonly used immutable values, like integers.
Ruby uses a single-bit tag of 0x01 as the Fixnum type tag. The remaining bits are used to store the Fixnum's signed value. This is an immediate value: no storage has to be allocated for the Fixnum's value, unlike the heap space that a String requires. Other types use different tag bits.
Since a Fixnum uses the least-significant bit as its tag and the remaining 31 bits for its value, all other types must have 0 in the least-significant bit; Ruby therefore uses variable-length tags:
 3         2         1
10987654321098765432109876543210  bit index
--------------------------------
sxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx1  Fixnum
00000000000000000000000000000000  false
00000000000000000000000000000010  true
00000000000000000000000000000100  nil
00000000000000000000000000000110  undef
xxxxxxxxxxxxxxxxxxxxxxxx00001110  Symbol
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx00  All other types
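To make the layout concrete, the following Ruby sketch decodes a 32-bit word according to these tags. The method names classify_value and fixnum_value are hypothetical; the code only manipulates integers standing in for VALUE bit patterns and does not inspect real Ruby objects.

```ruby
# Hypothetical decoder for the 32-bit MRI 1.8/1.9 tag layout.
def classify_value(v)
  return :fixnum if (v & 0x1) == 0x1        # low bit 1: 31-bit signed integer
  case v
  when 0x0 then :false
  when 0x2 then :true
  when 0x4 then :nil
  when 0x6 then :undef
  else
    if (v & 0xff) == 0x0e then :symbol      # low byte 00001110
    elsif (v & 0x3) == 0  then :object_pointer  # heap objects are word-aligned
    else :unknown
    end
  end
end

# Recover the integer from a 32-bit Fixnum word.
def fixnum_value(v)
  v -= 1 << 32 if (v & 0x8000_0000) != 0    # sign-extend the 32-bit word
  v >> 1                                    # arithmetic shift drops the tag bit
end

[7, 0, 2, 4, 6, 0x0001_230e, 0x0804_0a10].each do |v|
  puts format('0x%08x  %s', v, classify_value(v))
end
p fixnum_value(7)            # prints 3 (7 is INT2FIX(3))
p fixnum_value(0xffff_ffff)  # prints -1
```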
A non-zero Fixnum type tag means that the tag must be masked or shifted away from Fixnum operands before numeric operations. In a running program, Fixnum is probably the most common numeric type, and quite possibly the most common type of all, so it makes sense to make the most common Fixnum operations (+, -, *, /) as fast and as simple as possible. Addition is likely the most common operation applied to Fixnums.
Imagine the following bit of Ruby code:
x = 1
y = 3
puts x + y
Internally, INT2FIX(X) is used to create a Ruby value from a C int X:
#define INT2FIX(X) (((X) << 1) | RUBY_FIXNUM_FLAG)
#define FIX2INT(X) ((X) >> 1)
Thus:
INT2FIX(1) => 3
INT2FIX(3) => 7
FIX2INT(X) shifts right by one bit, which removes the tag and returns the original integer.
To compute x + y, Ruby's fix_plus() in numeric.c internally does the following:
x + y => INT2FIX(FIX2INT(x) + FIX2INT(y))
      => ((3 >> 1) + (7 >> 1)) << 1 | 1
      => 9
FIX2INT(9) => (9 >> 1) => 4
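The same arithmetic can be replayed with plain Ruby Integers standing in for C longs. This is a simulation only: int2fix, fix2int and fix_plus below are hypothetical Ruby stand-ins for the C macros and function named above, not MRI's implementation.

```ruby
# Simulation only: Ruby Integers stand in for C longs.
FIXNUM_FLAG = 0x01   # RUBY_FIXNUM_FLAG

def int2fix(i); (i << 1) | FIXNUM_FLAG; end
def fix2int(v); v >> 1; end            # arithmetic shift drops the tag bit

# fix_plus has to untag both operands, add, then retag the sum.
def fix_plus(x, y)
  int2fix(fix2int(x) + fix2int(y))
end

x = int2fix(1)
y = int2fix(3)
sum = fix_plus(x, y)
p [x, y, sum, fix2int(sum)]   # prints [3, 7, 9, 4]
```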
If a type tag of 0x00 were used for Fixnums, there would be no tag to remove, and addition or subtraction on Fixnums would require no tag manipulation. Assume:
#define INT2FIX(X) ((X) << 1)
#define FIX2INT(X) ((X) >> 1)
The Ruby expression x + y would simply be x + y in C code, assuming no underflow or overflow into a Bignum.
Multiplication with zero Fixnum tags is very simple: only one of the operands needs to be shifted down:
#define FIXMUL(X, Y) (((X) >> 1) * (Y))
Fixnum division: the result of the division is shifted up:
#define FIXDIV(X, Y) (((X) / (Y)) << 1)
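A Ruby simulation of this hypothetical zero-tag scheme makes it easy to check that the simplified operations agree with ordinary integer arithmetic. The helper names are illustrative. Ruby's Integer#/ rounds toward negative infinity while C division truncates, but in both languages (2a)/(2b) equals a/b, so FIXDIV needs only the final shift.

```ruby
# Hypothetical zero-tag Fixnum encoding, simulated with Ruby Integers.
def int2fix0(i); i << 1; end
def fix2int0(v); v >> 1; end

def fix_add0(x, y); x + y; end          # no tag manipulation at all
def fix_sub0(x, y); x - y; end
def fix_mul0(x, y); (x >> 1) * y; end   # FIXMUL: untag one operand only
def fix_div0(x, y); (x / y) << 1; end   # FIXDIV: (2a)/(2b) == a/b, then retag

a, b = 13, -4
x, y = int2fix0(a), int2fix0(b)
p [fix_add0(x, y), fix_sub0(x, y), fix_mul0(x, y), fix_div0(x, y)].map { |v| fix2int0(v) }
# prints [9, 17, -52, -4]   (Ruby's 13 / -4 floors to -4)
```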
Two-bit type tags on 32-bit architectures
Oaklisp and LL (http://kurtstephens.com/pub/ll/) use 2-bit tags for all values. LL uses the following tag scheme:
 3         2         1
10987654321098765432109876543210  bit index
--------------------------------
sxxxxxxxxxxxxxxxxxxxxxxxxxxxxx00  <fixnum>
pppppppppppppppppppppppppppppp01  <locative>
seeeeeeeemmmmmmmmmmmmmmmmmmmmm10  <flonum>
pppppppppppppppppppppppppppppp11  All other types
Floating-point (<flonum>) values are C floats that sacrifice the 2 least-significant mantissa bits for the type tag. Locatives are safe pointers to other values. Oaklisp stores floating-point values as allocated objects and uses a tag for other common immediate values, such as characters.
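The LL layout can be simulated the same way. In the sketch below (all names are hypothetical), fixnums are shifted left by 2, and a flonum is the bit pattern of a 32-bit float with its two lowest mantissa bits overwritten by the 10 tag, so boxing a float can lose a small amount of precision. Array#pack and String#unpack with the 'e' (little-endian single-precision float) and 'L<' (32-bit unsigned, little-endian) directives supply the bit patterns.

```ruby
# Hypothetical simulation of an LL-style 2-bit low tag:
# 00 fixnum, 01 locative, 10 flonum, 11 other.
TAG_MASK   = 0b11
TAG_FLONUM = 0b10

def fixnum_box(i);   i << 2; end
def fixnum_unbox(v); v >> 2; end

# Keep a 32-bit float's bit pattern but overwrite its 2 lowest mantissa bits.
def flonum_box(f)
  bits = [f].pack('e').unpack('L<').first
  (bits & ~TAG_MASK) | TAG_FLONUM
end

def flonum_unbox(v)
  [v & ~TAG_MASK].pack('L<').unpack('e').first
end

def tag_of(v)
  case v & TAG_MASK
  when 0b00 then :fixnum
  when 0b01 then :locative
  when 0b10 then :flonum
  else           :other
  end
end

v = flonum_box(1.5)
p tag_of(v)                           # prints :flonum
p flonum_unbox(v)                     # prints 1.5 (its low mantissa bits were already 0)
p fixnum_unbox(fixnum_box(-7))        # prints -7
p flonum_unbox(flonum_box(0.1)) - 0.1 # small nonzero error from float32 and the lost bits
```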
The rationale for choosing a fixed-size lower 2-bit type tag, as opposed to a dynamic-length type tag, as in Ruby, or high-bit tags, like some older Lisp implementations, is as follows:
C compilers and dynamic memory allocators align allocations to word boundaries for performance reasons, so, except for sub-word access (e.g. char *), no object pointer ever needs the lower bits of an address. 32-bit words are 4 bytes long; the lower 2 bits of any object pointer will always be zero and are free to be used for type tagging.
If references to allocated objects are encoded using a 0x03 type tag, tag removal could be:
#define RBASIC(X) ((struct RBasic*)((X) - 3))
Assuming that most of the time the interpreter is referencing structure members of the object, and does not need the actual address of the object:
struct RBasic {
  VALUE flags;  /* struct offset: + 0 */
  VALUE klass;  /* struct offset: + 4 */
};

RBASIC(X)->klass => ((struct RBasic*)((X) - 3))->klass
C compilers convert the pointer->member expression into an offset from an address. For 32-bit VALUEs:
RBASIC(X)->flags => *(VALUE*)((X) - 3 + 0)
RBASIC(X)->klass => *(VALUE*)((X) - 3 + 4)
Using subtraction as the tag removal operation, instead of (X & ~3), allows the C compiler to constant fold the tag removal and the structure offset:
RBASIC(X)->flags => *(VALUE*)((X) - 3)
RBASIC(X)->klass => *(VALUE*)((X) + 1)
Therefore, there is no additional tag removal cost to reference structure members with non-zero offsets. One could reorder the members depending on which is “hotter”.
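The constant folding can be checked with a small Ruby simulation in which a Hash plays the role of byte-addressed memory. The addresses, offsets and stored placeholder words are illustrative assumptions; the point is that (ref - 3) + offset and ref + (offset - 3) name the same address, so a compiler can emit the second form with a single constant displacement.

```ruby
# Hypothetical simulation of 0x03-tagged object references on a 32-bit
# machine. "Memory" is a Hash from byte address to a placeholder word.
OBJ_TAG      = 0x03
FLAGS_OFFSET = 0     # struct RBasic { VALUE flags; VALUE klass; };
KLASS_OFFSET = 4

memory   = {}
obj_addr = 0x1000                      # word-aligned allocation: low 2 bits are 0
memory[obj_addr + FLAGS_OFFSET] = :flags_word
memory[obj_addr + KLASS_OFFSET] = :klass_word

ref = obj_addr + OBJ_TAG               # the tagged VALUE the interpreter holds

# Mask-based untagging: clear the tag, then add the member offset.
klass_by_mask = memory[(ref & ~OBJ_TAG) + KLASS_OFFSET]

# Subtraction-based untagging: (ref - 3) + offset folds into one constant.
flags_by_sub = memory[ref + (FLAGS_OFFSET - OBJ_TAG)]   # ref - 3
klass_by_sub = memory[ref + (KLASS_OFFSET - OBJ_TAG)]   # ref + 1

p [flags_by_sub, klass_by_sub, klass_by_mask]
# prints [:flags_word, :klass_word, :klass_word]
```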
Research shows that tag manipulation is a heavy cost, especially for numerics; any tagging scheme should be as simple and consistent as possible.
For example, determining the class of a VALUE could be inlined:
#define CLASS_OF(v) (((v) & 3) ? RBASIC(v)->klass : rb_cFixnum)
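Here is a Ruby sketch of that CLASS_OF test under the same hypothetical scheme (Fixnums tagged 00, object references tagged 11), again with a Hash standing in for memory and a Symbol standing in for rb_cFixnum. Like the macro, it ignores any other immediate tags.

```ruby
# Hypothetical CLASS_OF for a 2-bit scheme: Fixnums tagged 00,
# heap references tagged 11.
OBJ_TAG      = 0b11
KLASS_OFFSET = 4

def class_of(v, memory)
  if (v & OBJ_TAG) != 0
    memory.fetch(v - OBJ_TAG + KLASS_OFFSET)   # RBASIC(v)->klass
  else
    :Fixnum                                    # stands in for rb_cFixnum
  end
end

memory = { 0x2000 + KLASS_OFFSET => :String }  # one fake object at 0x2000
p class_of(0x2000 + OBJ_TAG, memory)  # prints :String
p class_of(42 << 2, memory)           # prints :Fixnum
```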
Two-bit tags naturally align with word boundaries on 32-bit processors. Thus, zero tags for integers on 32-bit processors allow pointer arithmetic on VALUE*, as in the Array#[] method, to require no tag manipulation or multiplication to offset into the allocated Array's elements: a tagged integer i is stored as i << 2, which is already the byte offset of the i-th 4-byte element.
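The indexing argument can be checked numerically. The sketch assumes 4-byte words and a 2-bit scheme in which a Fixnum i is encoded as i << 2; the array base address and the fake element storage are made up for the example.

```ruby
# Hypothetical setup: 4-byte words, 2-bit tags, Fixnum tag 00 (i is stored as i << 2).
WORD_SIZE = 4

def fix(i); i << 2; end

base     = 0x3000                              # made-up address of the element storage
elements = { base => :a, base + 4 => :b, base + 8 => :c }

idx = fix(2)                                   # the tagged index the interpreter holds
addr_untagged = base + (idx >> 2) * WORD_SIZE  # untag, then scale by word size
addr_tagged   = base + idx                     # use the tagged value directly

p [addr_untagged == addr_tagged, elements[addr_tagged]]   # prints [true, :c]
```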
Thanks to Gary Wright for inspiring me to write about this.
Ruby : Touching The Obj-C Void : nil is nil
A long time ago, in Objective-C on the NeXT, one could often remove nil checks, because all messages to nil would immediately return nil (or 0, depending on the caller's method signature).
How many times have we seen this in Ruby?
def foo
  bar && bar.baz && bar.baz.caz("x")
end
Or, even worse, when avoiding redundant calls:
def foo
  (temp = @bar) && (temp = temp.baz) && temp.caz("x")
end
In Objective-C this could be written as:
- foo
{
  return [[bar baz] caz: "x"];
}
So in Ruby:
class ::NilClass
  def method_missing(*args)
    nil
  end
end

@bar = nil
def foo
  @bar.baz.caz("x")
end
foo # => nil
Assuming that most of the time bar is not nil, NilClass#method_missing => nil makes for cleaner code that also runs faster than checking for nil along the way.
An additional benefit is that nil can also be used as an immutable empty collection sink by defining NilClass#size => 0, NilClass#empty? => true, etc.
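A minimal sketch of the collection-sink idea, including the NilClass#method_missing patch from above so it runs on its own. The size and empty? methods are the ones suggested in the text; each and the total_length helper are hypothetical additions. Reopening NilClass affects every nil in the process, so this is for experimentation rather than library code.

```ruby
class ::NilClass
  def method_missing(*args)   # from the post: any unknown message returns nil
    nil
  end

  def size;   0;    end
  def empty?; true; end
  def each;   self; end       # hypothetical addition: iterates over nothing
end

# Works the same whether items is an Array or nil.
def total_length(items)
  n = 0
  items.each { |s| n += s.size }
  n
end

@names = nil
p total_length(@names)        # prints 0
p total_length(%w[ab c])      # prints 3
p @names.empty?               # prints true
p @names.first.upcase         # prints nil
```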
Obviously, this breaks code that expects an exception (NoMethodError) to be raised for messages to nil.
Introduce a method that explicitly checks for nil:
# Assumes the NilClass#method_missing patch above is loaded.
module ::Kernel
  def not_nil; self; end
end

class ::NilClass
  def not_nil; raise("not_nil failed"); end
end

@bar = nil
def foo
  @bar.baz.caz("x").not_nil
end
foo # => RuntimeError: not_nil failed
Comments?
Ruby: Date / Rational / Fixnum#gcd hack increased app performance by 15%
UPDATE: Fixnum#gcd was accepted into MRI 1.8: see http://devdriven.com/2010/02/ruby-fixnumgcd-accepted-into-mri/ .
Ruby Date uses Rational heavily, which calls Integer#gcd for every new Rational. The Integer#gcd method is generic in order to handle Bignums, but it performs terribly for Fixnum#gcd(Fixnum), which is probably the most common case.
This RubyInline hack saved 15% execution time in a large Rails application:
require 'inline'

class Fixnum
  inline do | builder |
    builder.c_raw '
static VALUE gcd(int argc, VALUE *argv, VALUE self) {
  if ( argc != 1 ) {
    rb_raise(rb_eArgError, "wrong number of arguments (%d for %d)", argc, 1);
  }

  /* Handle Fixnum#gcd(Fixnum) case directly. */
  if ( FIXNUM_P(argv[0]) ) {
    /* fprintf(stderr, "Using Fixnum#gcd(Fixnum)\n"); */
    long a = FIX2LONG(self);
    long b = FIX2LONG(argv[0]);
    long min = a < 0 ? - a : a;
    long max = b < 0 ? - b : b;
    while ( min > 0 ) {
      long tmp = min;   /* long, not int: avoids truncation on 64-bit */
      min = max % min;
      max = tmp;
    }
    /* LONG2NUM, not LONG2FIX: the gcd of the most negative Fixnum
       and 0 does not fit in a Fixnum. */
    return LONG2NUM(max);
  } else {
    /* fprintf(stderr, "Using super#gcd\n"); */
    return rb_call_super(1, argv);
  }
}
'
  end
end
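The Euclidean loop in the C code can be checked independently of RubyInline. The plain-Ruby function below (euclid_gcd is a hypothetical name) mirrors the C loop step for step and compares it with the built-in Integer#gcd on a few edge cases, including zero and negative operands; it is a test aid, not a faster gcd.

```ruby
# Plain-Ruby mirror of the C loop: same variables, same update order.
def euclid_gcd(a, b)
  min = a.abs
  max = b.abs
  while min > 0
    min, max = max % min, min   # tmp = min; min = max % min; max = tmp
  end
  max
end

pairs = [[12, 18], [-12, 18], [0, 7], [7, 0], [0, 0], [1_000_003, 2_000_006]]
pairs.each do |a, b|
  raise "mismatch for #{a}, #{b}" unless euclid_gcd(a, b) == a.gcd(b)
end
puts 'euclid_gcd agrees with Integer#gcd'
```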
Update:
Sorry for the late reply. If the code above does not work via cut-and-paste, download it from here.
This will be released soon as a gem dynamic library called speedfreaks, with other performance-enhancing snippets.
Thanks for the feedback!
Currency: Ruby Package for FX and Money
See: http://github.com/kstephens/currency
The rubygems package currency implements an object-oriented representation of currencies, monetary values, foreign exchanges and rates.
Currency::Money uses a scaled integer representation of the monetary value and performs accurate conversions from string values.
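To illustrate what a scaled integer representation means, here is a small sketch that stores amounts as an integer count of cents and parses decimal strings without passing through Float. The ScaledMoney class, its fixed scale of 100 and its parsing rules are assumptions for this example; it is not the currency gem's Currency::Money API.

```ruby
# Hypothetical scaled-integer money value (NOT the currency gem's API).
class ScaledMoney
  SCALE = 100            # assumed: 2 decimal places (cents)

  attr_reader :rep       # integer number of 1/SCALE units

  def initialize(rep)
    @rep = rep
  end

  # Parse a decimal string like "-12.3" exactly, without going through Float.
  def self.parse(str)
    m = /\A\s*(-)?(\d+)(?:\.(\d{1,2}))?\s*\z/.match(str)
    raise ArgumentError, "invalid amount: #{str.inspect}" unless m
    units = m[2].to_i * SCALE + (m[3] || '0').ljust(2, '0').to_i
    new(m[1] ? -units : units)
  end

  def +(other)
    ScaledMoney.new(rep + other.rep)
  end

  def to_s
    sign = rep < 0 ? '-' : ''
    whole, frac = rep.abs.divmod(SCALE)
    format('%s%d.%02d', sign, whole, frac)
  end
end

a = ScaledMoney.parse('0.10')
b = ScaledMoney.parse('0.20')
puts((a + b).to_s)   # prints 0.30
puts 0.1 + 0.2       # Float arithmetic prints 0.30000000000000004
```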
UserQuery: Ruby Package Simplifies General Searching in Rails
See: https://github.com/kstephens/userquery
The Ruby package userquery allows users to do general queries on SQL database table columns using a simple query language. The package parses tokens from the user's query and generates SQL WHERE clauses immune to SQL injection attacks.
For example, if a user wants to search for all records in the entries table on a DATETIME field named date, the user can enter “11/1/2006” into a text field associated with searching on the date column.
UserQuery will intuitively convert this query into an SQL WHERE clause fragment:
( (entries.date >= '2006-11-01 00:00:00') AND (entries.date < '2006-11-02 00:00:00') )
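The date case amounts to turning one calendar day into a half-open range [day, day + 1). The helper below (date_range_sql is hypothetical, not part of UserQuery) produces the fragment shown above from the same input using Ruby's standard Date class. Only the table and column names are interpolated as text, and they are assumed to come from the schema rather than from the user; the date values are reformatted from a parsed Date.

```ruby
require 'date'

# Hypothetical helper, NOT UserQuery's implementation: turn a single
# M/D/YYYY date typed by a user into a half-open day range for SQL.
def date_range_sql(table, column, input)
  day = Date.strptime(input.strip, '%m/%d/%Y')  # raises an ArgumentError subclass if malformed
  fmt = '%Y-%m-%d 00:00:00'
  col = "#{table}.#{column}"                    # trusted schema names, not user input
  "( (#{col} >= '#{day.strftime(fmt)}') AND (#{col} < '#{(day + 1).strftime(fmt)}') )"
end

puts date_range_sql('entries', 'date', '11/1/2006')
```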
The user query syntax includes “NOT”, “OR” and “AND” operators, grouping with parentheses, as well as relational operators like “LESS THAN 5” or “>= $500”. Keyword searching, like “foo AND NOT ‘bar baz’”, using SQL LIKE operators is configurable.
To use UserQuery:
- a Schema specifies how the query parser will interpret the user’s query for each column,
- a Parameters object binds the query values to the schema and provides a Rails-compatible domain object,
- the query strings are parsed and the SQL WHERE clause is generated.
require 'user_query'

# This specifies the query schema.
s = UserQuery::Schema.new(:table => 'entries',
                          :field => [
                            # col, type
                            [ :id,     :number ],
                            [ :date,   :datetime ],
                            [ :memo,   :string ],
                            [ :amount, :money ]
                          ])

# This represents the user's query input.
user_input = { :date => '11/1/2006' }
params = UserQuery::Parameters.new(user_input)

# Query is parsed and the SQL WHERE clause is generated.
puts s.sql(user_input, params)
The :money column type above refers to the Currency::Money class.
The UserQuery Parser object uses a recursive-descent parser which recognizes tokens in the user query string based on the column's type. It generates an abstract parse tree, which a Generator object then converts to SQL using the column bindings from the Schema object and the data-value token constants found by the parser. Syntax errors in the user's input are returned in the Parameters object.
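The recursive-descent technique can be sketched with one method per precedence level: OR binds loosest, then AND, then NOT, then parentheses and terms. TinyQueryParser below is a hypothetical, much smaller grammar than UserQuery's (no column types or relational operators); it only shows how such a parser builds a parse tree that a generator could later turn into SQL.

```ruby
# Hypothetical recursive-descent parser for NOT / AND / OR, parentheses,
# bare words and 'quoted phrases'. NOT UserQuery's parser.
class TinyQueryParser
  TOKEN = /\s*(\(|\)|'[^']*'|[^\s()']+)/

  def initialize(input)
    @tokens = input.scan(TOKEN).flatten
    @pos = 0
  end

  def parse
    tree = parse_or
    raise ArgumentError, "unexpected #{peek.inspect}" if peek
    tree
  end

  private

  def peek
    @tokens[@pos]
  end

  def next_token
    tok = @tokens[@pos]
    @pos += 1
    tok
  end

  def keyword?(word)
    peek && peek.upcase == word
  end

  def parse_or                      # or_expr := and_expr ('OR' and_expr)*
    left = parse_and
    while keyword?('OR')
      next_token
      left = [:or, left, parse_and]
    end
    left
  end

  def parse_and                     # and_expr := not_expr ('AND' not_expr)*
    left = parse_not
    while keyword?('AND')
      next_token
      left = [:and, left, parse_not]
    end
    left
  end

  def parse_not                     # not_expr := 'NOT' not_expr | primary
    if keyword?('NOT')
      next_token
      [:not, parse_not]
    else
      parse_primary
    end
  end

  def parse_primary                 # primary := '(' or_expr ')' | PHRASE | WORD
    tok = next_token
    case tok
    when nil then raise ArgumentError, 'unexpected end of query'
    when ')' then raise ArgumentError, "unexpected ')'"
    when '('
      tree = parse_or
      raise ArgumentError, "expected ')'" unless next_token == ')'
      tree
    when /\A'(.*)'\z/ then [:phrase, $1]
    else [:word, tok]
    end
  end
end

p TinyQueryParser.new("foo AND NOT 'bar baz'").parse
# prints [:and, [:word, "foo"], [:not, [:phrase, "bar baz"]]]
```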
A Parameters object can be used in a Rails controller to collect query input from text fields in a form template:
require 'user_query'

class EntriesController < ActionController::Base
  model :entry

  def index
    list
    render :action => 'list'
  end

  def list
    @query = UserQuery::Parameters.new(params[:query] ||= { })
    q_sql = UserQuery::Schema.new(:table => Entry,
                                  :field => [
                                    # Override ActiveRecord::Base introspection:
                                    [ :amount, :money ]
                                  ]).sql(@query)

    @entries_pages, @entries = paginate :entries,
      :class_name => 'Entry',
      :per_page => 20,
      :conditions => [ q_sql ? q_sql.gsub(/%/, '%%') : '1' ],
      :order => 'id'
  end

  # Other methods ...
end
When an ActiveRecord::Base subclass is used as the Schema's :table option, the Schema attempts to infer query fields by using the cls.columns() method. Column type mappings can be overridden.
The corresponding views/entries/list.rhtml view:
<%= start_form_tag :action => 'list' %>
<%= error_messages_for 'query' %>
<table class="entries_list">
  <tr>
    <td align="center"><%= text_field 'query', 'id', :size => 4 %></td>
    <td align="center"><%= text_field 'query', 'name', :size => 10 %></td>
    <td align="center"><%= text_field 'query', 'date', :size => 10 %></td>
    <td align="center"><%= text_field 'query', 'memo', :size => 20 %></td>
    <td align="center"><%= text_field 'query', 'amount', :size => 10 %></td>
    <td align="center"><%= submit_tag 'Search' %></td>
  </tr>
  <tr>
    <th>ID</th>
    <th>Name</th>
    <th>Date</th>
    <th>Memo</th>
    <th>Amount</th>
  </tr>
<% for entry in @entries %>
  <tr>
    <td align="right"><%= entry.id %></td>
    <td><%=h entry.name %></td>
    <td><%=h entry.date.strftime("%Y/%m/%d") %></td>
    <td><%=h entry.memo %></td>
    <td align="right"><%=h entry.amount.format %></td>
  </tr>
<% end %>
</table>
<%= end_form_tag %>
A Parameters object interfaces to Rails as an ActiveRecord::Base object to simplify collecting input from view template forms and reporting errors back to the user. However, UserQuery can be used without Rails.
The first file release is pending, so please clone the git repository:
git clone https://github.com/kstephens/userquery.git
For more examples, see:
- The test app
- The test cases
Bruce Burdick contributed the idea of inferring the Schema :fields by introspecting the underlying ActiveRecord::Base subclass.
I’d love to hear your feedback on this module.