When starting a large amount of threads, the loop was inefficient. It
would loop over the threads and if one wasn't yet ready it would sleep a
bit and then reevaluate all the threads. This reevaluation of threads
already checked was inefficient, and could lead to the time budget
running out.
This patch splits the check, and keeps track of the threads that have
already passed. This avoids the rescanning of already checked threads.