I actually didn't measure 21MHz, that was the figure quoted in the random documentation I was looking at. You are right, the actual value is 20MHz.
I've switched over to using my old (but fast) HP 54845A Infiniium scope, and taken some time to set up the probes with the smallest possible ground loops. This is still probing on the Pi connector, with the MUX enabled. The limiting factor here is the scope probes: they are 150MHz HP 10074C.
The following scope traces are all showing configuration from FLASH @ 20MHz. The top trace is the clock, the bottom trace is data:
Zooming in, it actually looks pretty decent:
Zooming in some more, the rising edge of the clock is being used by the ICE40.
So that looks like a good 20ns setup time and 25ns hold time, which should be fine.
As far as I can tell from glitch triggering, there are no significant glitches on the clock (of either polarity).
Given this, I really can't explain why configuration is unreliable on my board @ 20MHz.
All I can say is that dropping to 10MHz makes a world of difference to reliability.
It also makes very little difference to the overall configuration time, because there is so much dead time between bytes being sent:
I'm observing about 13us per byte.
As for overall configuration time, this is dominated by the gaps between the bytes.
I'm measuring here from the rising edge of CRST to CDONE:
FLASH @ 20MHz: 1.728s
FLASH @ 10MHz: 1.725s
Serial @ 20MHz: 1.156s
Serial @ 10MHz: 1.216s
The bit file itself is 135104 bytes, which is 1080832 bits.
So the transmission time should be 54ms @ 20MHz and 108ms @ 10MHz if there were no overheads.
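As a quick sanity check of that arithmetic, here is a minimal Python sketch. The function names are made up for illustration, and it assumes the roughly 13us per byte I observed holds across the whole transfer, which I have not verified:

```python
# Sanity check of the configuration timing figures quoted in this post.
# Assumption: the ~13us-per-byte pacing applies uniformly to every byte.

BITFILE_BYTES = 135104  # size of the bit file


def ideal_transfer_time(n_bytes, clock_hz):
    """Seconds to clock out n_bytes at one bit per clock, with no gaps."""
    return n_bytes * 8 / clock_hz


def paced_transfer_time(n_bytes, seconds_per_byte):
    """Seconds to send n_bytes when each byte occupies a fixed time slot."""
    return n_bytes * seconds_per_byte


if __name__ == "__main__":
    for clock_hz in (20e6, 10e6):
        t = ideal_transfer_time(BITFILE_BYTES, clock_hz)
        print(f"ideal @ {clock_hz / 1e6:.0f}MHz: {t * 1e3:.1f} ms")
    paced = paced_transfer_time(BITFILE_BYTES, 13e-6)
    print(f"paced @ 13us/byte: {paced:.3f} s")
```

At 13us per byte the total comes out at roughly 1.76s regardless of the bit clock, which is in the same range as the FLASH measurements above and consistent with the gaps between bytes dominating.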
Clearly something unexpected is occurring here!
Can you see if you are observing similar values?