June #TMIL - Ruby Under a Microscope

This post marks the halfway point of fulfilling my New Year's resolution to post about coding once a month this year. As the published date suggests, summer got in the way of timeliness! The curious reader can calculate the stats on how many days after the first day of the month it's taken to post. (Internal lawyer says: The resolution was to create a post for each month, not publish them on time.) Alright, let's get down to business.

In June, I completed Pat Shaughnessy's book, Ruby Under a Microscope.

The writing is clear and concise, and the layout of each chapter is extremely helpful in reinforcing the material. The chapters are interspersed with illustrations, "Experiments" (usually benchmarking code) and definitions (which start simple and grow more complete as nuances are explained). The book is also a great way to get comfortable with fundamental programming concepts, like how a compiler works. Below are a few highlights from my notes.

The book starts with an explanation of Ruby's tokenization and parsing process (take a peek with the Ripper class), and of how code is compiled into instructions that YARV can execute. While the first few chapters go pretty deep into topics such as how Ruby pushes values onto the stack frame and how the environment and stack pointers deal with scope, there are still plenty of things to play with. If you run the following snippet in irb and would like to know what the output means, this book is your guide.

# Compile a small Ruby snippet into YARV instructions and print them in readable form
stuff = <<STR
5.times do
  puts "foo!!!"
end
STR
puts RubyVM::InstructionSequence.compile(stuff).disasm
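The Ripper class mentioned above lets you watch the two steps that come before compilation. This is a small sketch, assuming a Ruby that ships Ripper in its standard library (1.9 and later do); the variable name `code` is just for illustration. It prints the token stream from Ripper.lex and then the parse tree from Ripper.sexp for a one-line version of the same loop. The exact token states and tree shape vary a bit between Ruby versions.

```ruby
require 'ripper'
require 'pp'

code = "5.times { puts 'foo!!!' }"

# Step 1: tokenization. Each entry looks like
# [[line, column], token_type, token_string, lexer_state]
# (older Rubies omit the lexer state, which then destructures to nil).
Ripper.lex(code).each do |(pos, type, token, _state)|
  puts "#{pos.inspect} #{type} #{token.inspect}"
end

# Step 2: parsing. Ripper.sexp returns the parse tree as nested arrays,
# starting with :program.
pp Ripper.sexp(code)
```

Comparing this output with the disasm output above shows the full path from source text to tokens, to a tree, to YARV instructions.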


After the deep dive down to the metal, the author walks through the Ruby object model and its representation in C structures, then method and constant lookup, before arriving at "The Hash Table: The Workhorse of Ruby Internals." Like the early chapters, it presents a fundamental computer science concept through the lens of Ruby, enhancing understanding of both. For example, when you create

hash = {}
hash[:some_key] = "foo"


the basic idea is that Ruby takes the key, runs it through a hash function, and stores the key/value pair in a "bin" chosen by taking the hashed key modulo the number of bins. Then, when retrieving a value, Ruby recalculates the hash of the key and looks only in the corresponding bin for the item you're retrieving.
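To make the bin idea concrete, here is a toy hash table written in Ruby itself. It is a sketch, not MRI's actual C implementation: the class name ToyHash and the starting size of 11 bins are illustrative choices, and keys that land in the same bin simply share that bin as a list of [key, value] pairs.

```ruby
# Toy hash table: a fixed array of bins, each bin a list of [key, value] pairs.
# (Hypothetical illustration, not how MRI lays out its C structs.)
class ToyHash
  def initialize(bin_count = 11)   # 11 is an arbitrary starting size here
    @bins = Array.new(bin_count) { [] }
  end

  # Which bin a key belongs in: its hash value modulo the number of bins.
  def bin_index(key)
    key.hash % @bins.size
  end

  def []=(key, value)
    bin = @bins[bin_index(key)]
    pair = bin.find { |k, _| k.eql?(key) }
    if pair
      pair[1] = value        # key already present: overwrite its value
    else
      bin << [key, value]    # new key: append it to this bin's list
    end
  end

  def [](key)
    # Recompute the hash, jump straight to one bin, compare keys only there.
    pair = @bins[bin_index(key)].find { |k, _| k.eql?(key) }
    pair && pair[1]
  end
end

hash = ToyHash.new
hash[:some_key] = "foo"
puts hash[:some_key]         # prints "foo"
puts hash[:missing].inspect  # prints "nil"
```

The point of the design is that a lookup only ever scans one bin, so as long as bins stay short, finding a key costs about the same no matter how many keys the table holds.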

Eventually, two keys will land in the same bin, resulting in a collision. Ruby has built-in constants that determine when to allocate more bins so that bins stay short, and Pat's experiments and corresponding graphs show the spike in milliseconds when this reallocation happens. (There's also a spike after inserting the 7th item into a hash, since Ruby 2 stores up to six items in a simple array rather than a full hash table. For these small hashes it simply compares keys one by one during lookup, instead of using the hash function to find the right bin to search.) Pat generously posted a draft of this chapter while writing it, which you can find here. There's also a great explanation of how hashing works, using Ruby code to rebuild the basic functionality, posted here.
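Both behaviours in the paragraph above can be sketched by extending the same bin idea. Everything here is hypothetical: GrowingHash, MAX_DENSITY, the 11 starting bins and the "double the bins" growth rule are assumptions chosen for illustration, not MRI's actual constants or algorithm; only the six-item small-hash limit comes from the text. The table keeps up to six pairs in a flat array searched linearly, switches to bins on the 7th insertion, and rebuilds its bins whenever the average number of pairs per bin passes a threshold.

```ruby
class GrowingHash
  PACKED_LIMIT = 6   # small-hash limit described in the text
  MAX_DENSITY  = 5   # assumption: illustrative "pairs per bin" threshold

  attr_reader :rehash_count

  def initialize
    @packed = []       # small-hash mode: [[key, value], ...]
    @bins = nil        # table mode: array of bins, each a list of pairs
    @size = 0
    @rehash_count = 0
  end

  def []=(key, value)
    pair = find_pair(key)
    return pair[1] = value if pair   # existing key: just overwrite

    @size += 1
    if @bins.nil? && @size <= PACKED_LIMIT
      @packed << [key, value]
      return value
    end
    convert_to_bins if @bins.nil?    # the 7th insertion switches modes
    @bins[key.hash % @bins.size] << [key, value]
    grow if @size > @bins.size * MAX_DENSITY
    value
  end

  def [](key)
    pair = find_pair(key)
    pair && pair[1]
  end

  def packed?
    @bins.nil?
  end

  def bin_count
    @bins ? @bins.size : 0
  end

  private

  def find_pair(key)
    if @bins.nil?
      @packed.find { |k, _| k.eql?(key) }   # linear scan, no hashing
    else
      @bins[key.hash % @bins.size].find { |k, _| k.eql?(key) }
    end
  end

  def convert_to_bins
    @bins = Array.new(11) { [] }   # assumption: 11 starting bins
    @packed.each { |k, v| @bins[k.hash % @bins.size] << [k, v] }
    @packed = nil
  end

  # Double the bin count and re-insert every pair. This is the slow step
  # that would show up as a spike in an insertion-time graph.
  def grow
    old_pairs = @bins.flatten(1)
    @bins = Array.new(@bins.size * 2) { [] }
    old_pairs.each { |k, v| @bins[k.hash % @bins.size] << [k, v] }
    @rehash_count += 1
  end
end

h = GrowingHash.new
6.times { |i| h[i] = i * i }
puts h.packed?     # true: still a flat array
h[6] = 36
puts h.packed?     # false: the 7th item switched to bins
puts h.bin_count   # 11
```

With these made-up numbers, 11 bins hold up to 55 pairs before the first rebuild, so timing each insertion into a GrowingHash would show one cheap-looking curve interrupted by occasional expensive rehash steps, which is the same shape as the book's graphs.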

And finally, a few other fun snippets that get unpacked and explained in the book:
  • Each level on the stack can have a different self
  • instance_eval uses self and an environment pointer to access variables in different scopes: when you call instance_eval, a closure and a new lexical scope are created, and self becomes the receiver
  • Disabling garbage collection when running benchmarks can help avoid skewed results
  • There are lots of goodies in Ruby's core and standard library that don't get much attention, like ObjectSpace
  • Object structures like RString, RArray and friends are defined in include/ruby/ruby.h 
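Several of these bullets are easy to try in irb. This sketch uses only core Ruby plus the standard benchmark library; the Point class and the timed_without_gc helper are hypothetical names made up for the example. It shows a block passed to instance_eval seeing both a surrounding local variable (because the block is a closure) and the receiver's instance variables (because self has changed), times some string allocation with GC switched off, and asks ObjectSpace for a count of heap objects by type.

```ruby
require 'benchmark'

# 1. instance_eval: the block still sees the local `greeting` from the
#    surrounding scope, but inside it self is the Point instance, so the
#    instance variables @x and @y are reachable directly.
class Point
  def initialize(x, y)
    @x = x
    @y = y
  end
end

greeting = "Point is"
point = Point.new(3, 4)
description = point.instance_eval { "#{greeting} (#{@x}, #{@y}), self is a #{self.class}" }
puts description   # prints "Point is (3, 4), self is a Point"

# 2. Benchmark with garbage collection disabled, so a GC run that happens
#    to land mid-measurement doesn't skew the timing. (Hypothetical helper.)
def timed_without_gc
  GC.start           # start from a freshly collected heap
  GC.disable
  Benchmark.realtime { yield }
ensure
  GC.enable          # always turn collection back on, even on errors
end

elapsed = timed_without_gc { 100_000.times { |i| "string #{i}" } }
puts "took #{elapsed.round(4)}s"

# 3. ObjectSpace: count objects on the heap, grouped by internal type.
counts = ObjectSpace.count_objects
puts "strings on the heap: #{counts[:T_STRING]}"
```

Disabling GC only makes sense for short measurements, since memory keeps growing until collection is re-enabled; the ensure clause guarantees GC.enable runs even if the benchmarked block raises.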
This book covers much more than this post can do justice to. If you've wondered what happens behind the scenes when you create a lambda and later call it, been confused about class variables (shared with subclasses) versus class instance variables (not shared; each class and subclass has its own), wondered how JRuby and Rubinius fit into the picture, or been curious about what RObject, RString and the other C structs that make up Ruby objects have in common, then this is the book for you. And if you haven't wondered about those things, there's no better way to get started!
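The class variable distinction fits in a few lines. In this sketch, Animal, Dog and Cat are made-up classes: @@registry is a class variable shared by the whole hierarchy, while @sound is a class instance variable stored separately on each class object, so a subclass neither sees nor overwrites its parent's value.

```ruby
class Animal
  @@registry = []   # class variable: one copy shared with every subclass
  @sound = "..."    # class instance variable: belongs to Animal itself

  class << self
    attr_accessor :sound   # reader/writer for the class instance variable
  end

  def self.register(name)
    @@registry << name
  end

  def self.registry
    @@registry
  end
end

class Dog < Animal
  @sound = "woof"   # Dog gets its own, separate @sound
end

class Cat < Animal
end

Dog.register("Rex")
Cat.register("Tom")

p Animal.registry  # ["Rex", "Tom"]: the same @@registry everywhere
p Animal.sound     # "..."
p Dog.sound        # "woof"
p Cat.sound        # nil: Cat never set its own @sound, and it isn't inherited
```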
