The same problem also happens with $.flow.rerun() (i.e. sometimes one of the reruns doesn’t start/resume).
In those cases, we use the data store and a monitoring workflow to detect reruns that are not happening when they should, and we call the resume URL (from the data store) to force the rerun to resume.