DTracing the Ruby GVL
Nate Berkopec posted an interesting question on twitter:
Puma office hours: we want to measure what % of the time the global VM lock is held in a Ruby process. anyone got an idea on how to even start to observe that, performantly? #railsconf
I thought about using ‘dtrace’ immediately, since I already experimented with adding custom probes to MRI Ruby a few years back.
With that I set out to patch MRI to allow for this. And it worked - Ruby with the patch can be installed using
ruby-install ruby-2.6.3 --patch https://balazs.kutilovi.cz/gvl-tracing.patch
The patch consists of:
- Adding the probe definitions and the hooks.
- Changing the VM struct to include a ‘waiting’ counter. While Ruby 2.5 used a counter to track threads waiting for the GVL, the later versions switched to a linked list. So I re-introduced the counter to avoid walking the list every time.
With these changes applied, I could get to tracing.
The traced program was Puma (Version 3.12.1 (ruby 2.6.3-p62)
, running a config.ru
that consisted of:
puts Process.pid
run ->(env) { [200, {"Content-Type" => "text/html"}, ["Hello World!"]] }
Puma in the default settings (for puma config.ru
), ran in the development environment and with 0
min and 16 max threads. The workload was generated using wrk:
wrk -c 20 -d 10 -t 10 --latency http://localhost:9292
And here are some results:
Quantized/binned times (in microseconds) that the GVL was held:
➜ ~ sudo dtrace -p 95059 -s gvl.d
dtrace: script 'gvl.d' matched 8 probes
value ------------- Distribution ------------- count
1 | 0
2 | 1
4 |@@ 14102
8 |@@@@@@@@@@@@@@@ 123107
16 |@@@@@@@@@@@@@@@@@@ 146116
32 |@@@@@ 40426
64 | 1996
128 | 258
256 | 47
512 | 118
1024 | 96
2048 | 11
4096 | 0
And a number of threads waiting to get the GVL:
➜ ~ sudo dtrace -p 95059 -n 'ruby::vm-gvl-acquire { @waits = quantize(arg0) }'
dtrace: description 'ruby::vm-gvl-acquire ' matched 1 probe
value ------------- Distribution ------------- count
-1 | 0
0 | 497
1 | 87
2 | 341
4 | 1553
8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 346829
16 | 0
So most of the time, 8-15 threads were waiting for the GVL - wrk was saturating all threads and they had to wait for the lock. (quantize increments the value in the highest bucket < arg0)
It should be also possible to trace how long a particular thread spent in the GVL:
sudo dtrace -p 11864 -n 'ruby::vm-gvl-acquire { start[tid] = timestamp } ruby::vm-gvl-release { @total[tid] = sum((timestamp - start[tid])) }'