So, there's definitely some concept I'm not grasping here. In an attempt to learn more about retro computing, I'm trying to build a rudimentary PPU and a text frame buffer. Let's focus on the frame buffer. Currently, its code looks like this:
module textbuffer(input clk, input reset,
    input cs, input rw, input [$clog2(WIDTH*HEIGHT):0] addr, input [7:0] di, output reg [7:0] dout,
    input [7:0] hpos, input [6:0] vpos, input vsync, input hsync,
    output reg [3:0] color);
    parameter WIDTH = 20;
    parameter HEIGHT = 15;
    reg [7:0] videoram [0:(1<<12)-1];
    initial $readmemh("videoram.hex", videoram, 0);
    initial $readmemh("attrram.hex", videoram, 512);
    initial $readmemh("font_cp437_8x8.hex", videoram, 2048);
    reg [$clog2(WIDTH*HEIGHT)-1:0] pos;
    reg [7:0] char;
    reg [7:0] attr;
    always @(posedge clk)
    begin
        if (cs & rw) videoram[{2'b00, addr}] <= di;
        if (cs & ~rw) dout <= videoram[{2'b00, addr}];
        pos = vpos[6:3] * WIDTH + { 4'b0000, hpos[7:3] };
        char = videoram[{3'b000, pos }];
        attr = videoram[{3'b001, pos }];
        color <= videoram[{1'b1, char, vpos[2:0]}][~hpos[2:0]] ? attr[3:0] : attr[7:4];
    end
endmodule
If left alone (i.e., with nothing connected to the data and address lines), this produces the following stats (I'm synthesising this together with other components not mentioned in this post, so take the numbers only as a point of reference):
DFFs: 117
LUTs: 1202
CARRYs: 89
BRAMs: 2
IOBs: 9
PLLs: 1
GLBs: 8
Now, I use a rudimentary control unit (a module mocking the yet-to-exist CPU) and access the FB via memory mapping:
module control(input clk, input reset, input vsync, output reg [15:0] addr, output reg [7:0] data, output reg rw);
    reg [7:0] letter;
    reg [3:0] color;
    reg [7:0] pos;
    reg [2:0] delay;
    always @(posedge clk or posedge reset)
    begin
        if (reset) begin
            letter <= 0;
            color <= 0;
            pos <= 80;
            delay <= 3'b111;
        end
        else if (vsync) begin
            delay <= delay - 1;
            // Update Character RAM
            case (delay)
                2'b00: begin
                    addr <= 16'hF003 + 16'h200;
                    color <= color + 1;
                    data <= { 4'b0000, color };
                    rw <= 1;
                end
                2'b10: begin
                    addr <= 16'hEFF8;
                    pos <= pos - 2;
                    data <= pos;
                    rw <= 1;
                end
                2'b01: begin
                    addr <= 16'hF005;
                    letter <= letter + 1;
                    data <= letter;
                    rw <= 1;
                end
                default: rw <= 0;
            endcase
        end
    end
endmodule
This changes three bytes at three addresses, two of which are mapped into the FB (one holding the ASCII character and the other the attribute/color). These mappings and the corresponding cs/rw signals are mediated at the top module by something like this:
// Memory Mapping
wire [15:0] addr;
reg rw;
wire [7:0] cpu_do;
wire [7:0] cpu_di = tb_oe ? tb_do : (sp_oe ? sp_do : 8'h0);
wire tb_cs = addr === 16'b1111_xxxx_xxxx_xxxx;
wire tb_oe = tb_cs & ~rw;
wire [7:0] tb_do;
wire sp_cs = addr === 16'b1110_1111_1111_xxxx;
wire sp_oe = sp_cs & ~rw;
wire [7:0] sp_do;
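For comparison, here is a minimal sketch of the same decode written with plain bit-slice comparisons instead of `===` against `x` bits. It assumes the intent is "top nibble is F" for the text buffer and "top twelve bits are EFF" for the second device. The case-equality operator is mainly a simulation construct, and as far as I can tell synthesis tools do not treat `x` there as a wildcard, so this form may be the safer one. The module name `addr_decode` is made up for illustration.

```verilog
// Hypothetical helper (not part of the original design): derives the two
// chip selects and output enables from the 16-bit address.
module addr_decode(
    input  wire [15:0] addr,
    input  wire        rw,     // 1 = write, 0 = read (same convention as textbuffer)
    output wire        tb_cs,
    output wire        tb_oe,
    output wire        sp_cs,
    output wire        sp_oe
);
    // 0xF000-0xFFFF: text buffer
    assign tb_cs = (addr[15:12] == 4'hF);
    assign tb_oe = tb_cs & ~rw;

    // 0xEFF0-0xEFFF: second device (the sp_* signals)
    assign sp_cs = (addr[15:4] == 12'hEFF);
    assign sp_oe = sp_cs & ~rw;
endmodule
```

With this decode, the three addresses written by the control module (16'hF203, 16'hEFF8 and 16'hF005) select the text buffer, the second device and the text buffer respectively.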
So what is the problem? Well, every time I add some logic to the control unit, the LUT count explodes. The above code, which manipulates 3 addresses, results in these stats:
DFFs: 721
LUTs: 3046
CARRYs: 122
BRAMs: 2
IOBs: 9
PLLs: 1
GLBs: 8
A roughly 6x explosion in DFFs and a 2.5x explosion in LUTs. My current main hypothesis is that I'm using BRAM inappropriately. I'm also wary of the triple access to videoram[] in the same clock cycle. I've tried separating each access into its own always block, but that results in more or less the same usage.
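To make the "triple access" concern concrete, here is a hedged sketch of what a one-read-per-memory-per-clock version could look like. It rests on the assumption that block RAM inference generally wants a registered read with a single address per port per cycle, so the chained lookup (cell, then font row) is split over pipeline stages and over three separate arrays. The names `textbuffer_pipe`, `charram`, `attrram` and `fontrom` are hypothetical, the hex files are assumed to fit the smaller arrays, and the CPU port is left out entirely.

```verilog
// Sketch only, not a drop-in replacement: video read path of the text buffer
// with one registered read per memory per clock.
module textbuffer_pipe #(
    parameter WIDTH = 20
)(
    input  wire       clk,
    input  wire [7:0] hpos,
    input  wire [6:0] vpos,
    output reg  [3:0] color    // valid three clocks after hpos/vpos
);
    reg [7:0] charram [0:511];   // character codes
    reg [7:0] attrram [0:511];   // attributes (fg in [3:0], bg in [7:4])
    reg [7:0] fontrom [0:2047];  // 256 glyphs x 8 rows

    initial $readmemh("videoram.hex", charram);
    initial $readmemh("attrram.hex", attrram);
    initial $readmemh("font_cp437_8x8.hex", fontrom);

    // Cell index of the character under the beam (same formula as the original).
    wire [8:0] pos = vpos[6:3] * WIDTH + hpos[7:3];

    reg [7:0] char, attr, attr_d, row;
    reg [2:0] y_d, x_d, x_dd;

    always @(posedge clk) begin
        // Stage 1: fetch character and attribute, delay the pixel coordinates.
        char <= charram[pos];
        attr <= attrram[pos];
        y_d  <= vpos[2:0];
        x_d  <= hpos[2:0];

        // Stage 2: fetch the font row for that character, keep attr aligned.
        row    <= fontrom[{char, y_d}];
        attr_d <= attr;
        x_dd   <= x_d;

        // Stage 3: pick the pixel, foreground or background.
        color <= row[~x_dd] ? attr_d[3:0] : attr_d[7:4];
    end
endmodule
```

The price is latency: `color` lags the beam position by three clocks, so the sync signals would need the same delay to stay aligned.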
I also understand that deciding what to render synchronously, at the very position of the "beam", is asking for trouble; my next attempt would be to pre-fill a buffer of size WIDTH with line VPOS+1 during VPOS, but before going down that rabbit hole, I would really like to understand what is going on.
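As a rough illustration of that pre-fill idea, the sketch below copies the character codes of the text row belonging to scanline VPOS+1 into a small line buffer while scanline VPOS is being displayed. Everything beyond the names used above is an assumption: the `line_start` pulse, the module name `linefill`, the separate `charram` array, and the two-bank layout (selected by bit 0 of the text row) that keeps the fill from overwriting the row currently on screen. Frame wrap-around and CPU writes are not handled.

```verilog
// Sketch only: prefetches one text row of character codes per scanline.
module linefill #(
    parameter WIDTH = 20
)(
    input  wire       clk,
    input  wire       line_start, // assumed: one-clock pulse at the start of each scanline
    input  wire [7:0] hpos,
    input  wire [6:0] vpos,
    output reg  [7:0] char        // character code under the beam, one clock late
);
    reg [7:0] charram [0:511];  // character codes only
    reg [7:0] linebuf [0:63];   // two banks of 32 entries; bank = text row bit 0

    initial $readmemh("videoram.hex", charram);

    wire [6:0] vnext    = vpos + 7'd1;
    wire [3:0] next_row = vnext[6:3];

    reg [8:0] src;                        // read address into charram
    reg [4:0] col, col_d;                 // column being fetched / being stored
    reg       bank, bank_d;
    reg       busy = 1'b0, busy_d = 1'b0;
    reg [7:0] fetched;

    always @(posedge clk) begin
        if (line_start) begin
            src  <= next_row * WIDTH;
            col  <= 5'd0;
            bank <= next_row[0];
            busy <= 1'b1;
        end else if (busy) begin
            src <= src + 9'd1;
            col <= col + 5'd1;
            if (col == WIDTH - 1) busy <= 1'b0;
        end

        // One registered read of charram per clock.
        fetched <= charram[src];
        col_d   <= col;
        bank_d  <= bank;
        busy_d  <= busy;

        // Store the fetched code one clock later, when it is valid.
        if (busy_d) linebuf[{bank_d, col_d}] <= fetched;

        // Display side: read the bank that belongs to the current text row.
        char <= linebuf[{vpos[3], hpos[7:3]}];
    end
endmodule
```

On seven out of eight scanlines the next line is in the same text row, so the fill simply rewrites the bank with identical data; only on the last scanline of a row does it target the other bank.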
Thanks!