Yes, I thought about using quad-SPI.
I'm sampling my 8-bit ADC at 100 MHz (one read every cycle) and writing to SRAM every other cycle, storing 2 samples in each 16-bit word. I'm collecting 20 us of data, so 16 KBytes, or 8K 16-bit words.
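A rough sketch of the two-samples-per-word packing described above, written as host-side C++ so it can also be used to unpack the words after they are read back. The function names are made up for illustration, and putting the earlier sample in the high byte is an assumption; swap the bytes if the FPGA stores them the other way round.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: the earlier sample sits in the high byte of each 16-bit word.
constexpr uint16_t packSamples(uint8_t first, uint8_t second) {
    return static_cast<uint16_t>((static_cast<uint16_t>(first) << 8) | second);
}

// Split one 16-bit SRAM word back into its two 8-bit samples.
inline void unpackSamples(uint16_t word, uint8_t& first, uint8_t& second) {
    first = static_cast<uint8_t>(word >> 8);
    second = static_cast<uint8_t>(word & 0xFF);
}

// Number of 16-bit words needed to hold a given number of 8-bit samples
// (rounded up so an odd trailing sample still gets a word).
constexpr std::size_t wordsForSamples(std::size_t samples) {
    return (samples + 1) / 2;
}
```

With these, `wordsForSamples(16384)` gives the 8K words mentioned above.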
My STM32 acts as SPI master and clocks the SPI bus at 10 MHz. The Arduino code just starts a transaction and goes into a loop calling SPI.transfer16(). I used 16-bit words since the SRAM is 16-bit. I don't care too much about disabling the mux as I have an 8-LED PMOD I used to show status.
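For reference, the read loop described above might look roughly like this. It is only a sketch: `readCapture` and `CountingBus` are hypothetical names, the loop is templated on the bus type so it compiles off-target, and the SPI mode, bit order and dummy value sent are assumptions rather than what the actual code uses.

```cpp
#include <cstddef>
#include <cstdint>

// Reads `count` 16-bit words into `dest`.
// `Bus` is any type with a `uint16_t transfer16(uint16_t)` member, such as
// the Arduino `SPI` object. The caller is expected to have already done
// something like
//   SPI.beginTransaction(SPISettings(10000000, MSBFIRST, SPI_MODE0));
// and asserted chip select; the mode and bit order there are assumptions.
template <typename Bus>
void readCapture(Bus& bus, uint16_t* dest, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        // Assumption: the value sent is a dummy and only the word
        // clocked back from the FPGA matters.
        dest[i] = bus.transfer16(0x0000);
    }
}

// Hypothetical host-side stand-in for the SPI bus, handy for trying the
// loop without hardware: it returns 0, 1, 2, ... on successive transfers.
struct CountingBus {
    uint16_t next = 0;
    uint16_t transfer16(uint16_t /*tx*/) { return next++; }
};
```

On the STM32 this would be called as `readCapture(SPI, buffer, 8192);` followed by releasing chip select and `SPI.endTransaction();`.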
I will most likely use a prescaler on the ADC clock and allow the STM32 to control the sample rate and buffer size. I don't anticipate the buffer ever being larger than about 16 KBytes, so 8K SPI transfers. Right now that takes about 20 ms. I was dumping the data from the FPGA over the UART before.
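As a sanity check on that 20 ms figure, here is a small calculation of the minimum wire time for a transfer. The helper name `minTransferMs` is made up, and it counts only the bits on the wire, with no gap between words and no chip-select or function-call overhead.

```cpp
#include <cstddef>

// Minimum time in milliseconds to clock `words` 16-bit words over SPI.
// `sckHz` is the SPI clock in Hz; `dataLines` is 1 for plain SPI and 4
// for quad-SPI. Overhead between words is deliberately ignored.
constexpr double minTransferMs(std::size_t words, double sckHz, int dataLines = 1) {
    return static_cast<double>(words) * 16.0 / (sckHz * dataLines) * 1000.0;
}
```

`minTransferMs(8192, 10e6)` comes out at about 13.1 ms, so if the 20 ms covers all 8K transfers, roughly a third of it is overhead between words rather than clocking. `minTransferMs(8192, 10e6, 4)` is about 3.3 ms, which is the best case quad-SPI could offer at the same clock.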