A bit of background on why the video RAM ended up the way it is.
In a first version of the design, the 6KB Video RAM was somewhat simpler (i.e. there was no state machine):
On a Xilinx part, a dual-port RAM like this would use up 6KB of block RAM.
On the Lattice iCE40, it ends up using 12KB, because it has two read ports and one write port (i.e. to provide two independent read ports, the memory is simply duplicated).
I wanted to add a SID, and there wasn't enough block RAM left for it (only 16KB in total).
To shrink the Video RAM down to 6KB, I needed to reduce it to a single read port, hence the state machine. This shares Port B between the CPU and the Video. It works along the same lines as the noise killers in a real Atom, by adding an additional 8-bit latch for the data the CPU reads from the Video RAM.
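To make the port sharing a bit more concrete, here is a rough behavioural sketch in Python (not the actual HDL). It assumes the state machine simply alternates Port B between a video slot and a CPU slot, which may well not match the real timing; the class and field names (SharedReadPortRam, cpu_latch, video_latch, cpu_slot) are made up for illustration.

```python
class SharedReadPortRam:
    """Toy model: one write port (A) and one read port (B) shared by CPU and video."""

    def __init__(self, size=6 * 1024):
        self.mem = bytearray(size)
        self.cpu_latch = 0     # extra 8-bit register holding the last CPU read (assumed)
        self.video_latch = 0   # byte most recently fetched for the video side
        self.cpu_slot = False  # state: whose turn it is on Port B (video goes first)

    def write(self, addr, value):
        # Port A: CPU writes go straight in and need no arbitration.
        self.mem[addr] = value & 0xFF

    def tick(self, cpu_addr, video_addr):
        # Port B: exactly one read per tick, alternating between the two users.
        if self.cpu_slot:
            self.cpu_latch = self.mem[cpu_addr]
        else:
            self.video_latch = self.mem[video_addr]
        self.cpu_slot = not self.cpu_slot


ram = SharedReadPortRam()
ram.write(0x10, 0xAA)
ram.write(0x20, 0x55)
ram.tick(cpu_addr=0x10, video_addr=0x20)  # video slot
ram.tick(cpu_addr=0x10, video_addr=0x20)  # CPU slot
print(hex(ram.video_latch), hex(ram.cpu_latch))
```

The point of the extra register is that the CPU sees the value captured in its own slot, so the video side can keep using the port in between without disturbing what the CPU reads.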
Try changing the state machine back to running off the vga_clk; this might be all that's needed.