dtracing the Ruby GVL

Nate Berkopec posted an interesting question on twitter:

Puma office hours: we want to measure what % of the time the global VM lock is held in a Ruby process. anyone got an idea on how to even start to observe that, performantly? #railsconf

I thought about using ‘dtrace’ immediately, since I already experimented with adding custom probes to MRI Ruby a few years back.

With that I set out to patch MRI to allow for this. And it worked - Ruby with the patch can be installed using

ruby-install ruby-2.6.3 --patch https://balazs.kutilovi.cz/gvl-tracing.patch

The patch consists of:

  1. Adding the probe definitions and the hooks.
  2. Changing the VM struct to include a ‘waiting’ counter. While Ruby 2.5 used a counter to track threads waiting for the GVL, the later versions switched to a linked list. So I re-introduced the counter to avoid walking the list every time.

With these changes applied, I could get to tracing.

The traced program was Puma (Version 3.12.1 (ruby 2.6.3-p62), running a config.ru that consisted of:

puts Process.pid
run ->(env) { [200, {"Content-Type" => "text/html"}, ["Hello World!"]] }

Puma in the default settings (for puma config.ru), ran in the development environment and with 0 min and 16 max threads. The workload was generated using wrk:

wrk -c 20 -d 10 -t 10 --latency http://localhost:9292

And here are some results:

Quantized/binned times (in microseconds) that the GVL was held:

➜  ~ sudo dtrace -p 95059 -s gvl.d
dtrace: script 'gvl.d' matched 8 probes

 value  ------------- Distribution ------------- count
     1 |                                         0
     2 |                                         1
     4 |@@                                       14102
     8 |@@@@@@@@@@@@@@@                          123107
    16 |@@@@@@@@@@@@@@@@@@                       146116
    32 |@@@@@                                    40426
    64 |                                         1996
   128 |                                         258
   256 |                                         47
   512 |                                         118
  1024 |                                         96
  2048 |                                         11
  4096 |                                         0

And a number of threads waiting to get the GVL:

➜  ~ sudo dtrace -p 95059 -n 'ruby::vm-gvl-acquire { @waits = quantize(arg0) }'

dtrace: description 'ruby::vm-gvl-acquire ' matched 1 probe

 value  ------------- Distribution ------------- count
    -1 |                                         0
     0 |                                         497
     1 |                                         87
     2 |                                         341
     4 |                                         1553
     8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 346829
    16 |                                         0

So most of the time, 8-15 threads were waiting for the GVL - wrk was saturating all threads and they had to wait for the lock. (quantize increments the value in the highest bucket < arg0)

It should be also possible to trace how long a particular thread spent in the GVL:

sudo dtrace -p 11864 -n 'ruby::vm-gvl-acquire { start[tid] = timestamp } ruby::vm-gvl-release { @total[tid] = sum((timestamp - start[tid])) }'