Reference: http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/26869
I have a proof-of-concept patch to MRI that caches #to_s
values for immutable values. It is implemented using a few fixed-size hash tables. http://github.com/kstephens/ruby/commits/to_s_maybe_frozen/
It reduces the number of #to_s
result objects by 1890 during the MRI test suite for NilClass#to_s
, TrueClass#to_s
, FalseClass#to_s
, Symbol#to_s
, and Float#to_s
.
It requires a minor semantic change to Ruby core. This minor change could cascade into a huge performance improvement for all Ruby implementations — as will be illustrated later:
#to_s may return frozen Strings
.
This appears to not be a problem since any callers of #to_s
are likely to anticipate that the receiver may already be a String
and are not going to mutate it — #to_s
is a coercion. The current MRI test suite passes if some #to_s
results are frozen.
For code that may expect #to_s
to return a mutable, an Object#dup_if_frozen
method might be helpful. This method will return self.dup
if the receiver is #frozen?
and is not an immediate or an immutable. (Aside: a fast #dup_unless_frozen
method might be helpful for general memoization of computations!)
This caching technique could be extended into other immutables (e.g.: the Numerics
) and objects whose #to_s
representations never change (e.g.: Class
, Module
?) and for #inspect
under similar constraints.
In the patch, Fixnum#to_s
is not cached because Fixnums
are often incremented during long loops; any cache for it is quickly churned. However, this could be enabled if it proves useful in practice.
If this new semantic for #to_s
is reasonable, I recommend explicitly storing frozen strings for true.to_s
, false.to_s
, nil.to_s
and storing Symbol#to_s
with each Symbol
, likewise for #inspect
.
In practice, most Ruby String
literals become garbage immediately. If Symbol#to_s
was guaranteed to be always be cached, this would enable the use of:
puts :"some string"
instead of
puts "some string"
as an in-line memoized frozen String that creates no garbage when calling puts
which will call #to_s
on its argument, but never mutate the result. A parser or compiler could recognize Symbol#to_s
as an operation with no side-effect and elide it, providing a true String
constant. This idiom would eliminate the pointless String
garbage created by the evaluation of every String
literal.
This is far more expressive and concise than:
SOME_STRING = "some string".freeze
...
puts SOME_STRING
The alternative to :"some string"
might be to memoize all String
literals as frozen. This is a superior syntax and semantic — old code would need to change on a massive scale, but any issues would be easy to diagnose:
str = '' # Make a mutable empty string.
str << "foo" # "foo" is garbage
str << "bar" # "bar" is garbage
would become:
str = ''.dup # Make a mutable empty string.
str << "foo" # "foo" is not garbage
str << "bar" # "bar" is not garbage
The latter is backwards-compatible with the current String
literal semantics.