Expanding VTACH

May 12, 2013


I’ve recently been building vtach. The code isn’t complete yet, and like all first versions it has some bugs and some incomplete features. This time around, I want to squash a bug as a way to show how you “debug” Verilog hardware designs (or, at least, one way to do so).

Recall that vtach uses one-hot encoding for its four states: the state register goes from 1 to 2 to 4 to 8, so a single bit uniquely identifies each state. The states have the following purposes:

1 – Load next instruction

2 – Hold next instruction, load “bug” (the program counter) for next address, load operand address

4 – Get result of instruction

8 – Store result
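To make the encoding concrete, here is a minimal sketch of a four-state one-hot sequencer. It is not taken from vtach; the module name `phase_seq`, its port names, and the synchronous active-high `reset` are assumptions for illustration. Each rising clock edge rotates the single set bit left, so the register walks 1, 2, 4, 8 and back to 1.

```verilog
// Hypothetical one-hot phase sequencer (illustrative sketch, not vtach's code).
module phase_seq (
    input  wire       clk,
    input  wire       reset,  // assumed synchronous, active high
    output reg  [3:0] phase   // exactly one bit set: 1, 2, 4, or 8
);
    always @(posedge clk) begin
        if (reset)
            phase <= 4'b0001;                 // state 1: load next instruction
        else
            phase <= {phase[2:0], phase[3]};  // rotate left: 1 -> 2 -> 4 -> 8 -> 1
    end
endmodule
```

Because only one bit is ever set, logic that cares about a particular state can test a single bit (for example, `phase[1]` for state 2) instead of decoding a binary count.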

The test program in vtach_tb.v (tb, by the way, stands for “test bench,” a common term for a test fixture that doesn’t end up in the final product but is useful for verification) looks like this:

   dut.mem.row0[0]=13'h101;  // load location 1 into acc (acc=500)
   dut.mem.row0[1]=13'h500;  // output location 0 (print 101)
   dut.mem.row0[2]=13'h033;  // Input to location 33 (X)
   dut.mem.row0[3]=13'h533;  // output location 33
   dut.mem.row0[4]=13'h200;  // add acc + location 0 (500+101=601)
   dut.mem.row0[5]=13'h733;  // sub acc - location 33 (601-X)
   dut.mem.row0[6]=13'h610;  // store acc to location 10
   dut.mem.row0[7]=13'h510;  // output location 10
   dut.mem.row0[8]=13'h820;  // goto location 20
   dut.mem.row2[0]=13'h599;  // output return address from jump (should be 9)
   dut.mem.row2[1]=13'h900;  // halt!
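Judging from the comments, each word holds a one-digit opcode followed by a two-digit address (13'h120 means “load location 20”). Here is a minimal sketch of splitting a word into those fields. The layout is my assumption (a sign bit in bit 12, then three BCD digits), so check vtach’s source for the real one; the module name `decode_word` is hypothetical.

```verilog
// Hypothetical field split for the test program's words.
// Assumed layout: [12] sign, [11:8] opcode digit, [7:0] two BCD address digits.
module decode_word (
    input  wire [12:0] word,
    output wire [3:0]  opcode,  // e.g. 1 = load (CLA), 2 = add, 5 = output
    output wire [7:0]  addr     // BCD: 8'h20 means location 20
);
    assign opcode = word[11:8];
    assign addr   = word[7:0];
endmodule
```

With that layout, 13'h733 splits into opcode 7 (subtract) and address 33, which matches the comment on that line of the test program.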

When you run the “go” script, the first line dutifully loads 500 into the accumulator (you can tell because the later math output is correct). That script is just a lazy way to run iverilog (Icarus Verilog), which compiles your Verilog into an executable simulation, and then run that simulation. Simulation isn’t the final goal, of course. The final goal is to use the Verilog to configure an FPGA, but it is easier to test with a software simulation first.

Like all test cases, this one is only as good as the data it tests. Just because you get the correct output doesn’t mean the design will work for any data. For example, location 20 contains 599 (a different instruction). If you change the first instruction to load location 20 instead, you’ll find the first bug:

   dut.mem.row0[0]=13'h120;  // load location 20 into acc (acc=599)
   dut.mem.row0[1]=13'h500;  // output location 0 (print 120)

The output from both runs with 0 input appears below, side by side:

Uh oh. The original computed 500+101=601. The new code should compute 599+120=719, but it prints 620 instead. If you are good at balancing a checkbook, you might be able to see what must be happening, but for the sake of argument, let’s say the cause is a mystery. How do you figure out what’s happening?

The vtach_tb.v file contains two important lines that start with dollar signs ($). These lines tell the simulator to do something special; they don’t affect the hardware a synthesis tool produces from the design:


These lines tell the simulator to write all the variables out to vtach_tb.vcd (the vcd extension indicates a Verilog value change dump). Several tools can read this format, but I’ll use GTKWave.
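Verilog’s standard system tasks for dumping waveforms are `$dumpfile`, which names the VCD file, and `$dumpvars`, which chooses the signals to record. Here is a hedged sketch of where such lines typically sit in a test bench. The module `dump_tb` and its counter are hypothetical stand-ins for the real design, and the exact arguments in vtach_tb.v may differ.

```verilog
// Hypothetical minimal test bench showing the two dump directives.
module dump_tb;
    reg       clk   = 1'b0;
    reg [3:0] count = 4'd0;  // stand-in for the design's real signals

    always #5 clk = ~clk;                        // 10-time-unit clock period
    always @(posedge clk) count <= count + 4'd1;

    initial begin
        $dumpfile("vtach_tb.vcd");  // name of the value change dump file
        $dumpvars(0, dump_tb);      // 0 = record every signal in dump_tb and below
        #100 $finish;               // end the simulation after ten clock periods
    end
endmodule
```

After a run, `gtkwave vtach_tb.vcd` opens the dump so you can pick which recorded signals to display as waveforms.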

The figure below shows the vcd file from the run that erroneously outputs 620. From top to bottom, you can see the clock trace, the clock phase, the instruction register, and the accumulator value. Below that is the data bus and the current memory address. The final trace shows the reset signal. You can ignore everything that happens while reset is high. Note that until the first load, the simulator didn’t know what was in the accumulator, so it shows it as xxxx and in red.

It is tempting to think of the clock phase as “lasting” for the entire span of each number in the phase row of the figure. That’s not right, though. Remember that the flip-flops in vtach sample their inputs at the rising edge of the clock. So the phase 2 actions, for example, happen right at the rising clock edge where the phase is already 2. That same edge immediately sets the phase to 4, but the phase 4 actions won’t happen until the next rising clock edge.

Armed with that, you should be able to trace the execution of the program, and you can see that even though we asked the program to load the value in address 20 (ir=120), it is still loading 500! The memory address does show 20, but not at the moment the accumulator changes. By the time it changes, the memory address is 01 (the next instruction), and the data at address 1 is what gets loaded into the accumulator.

The Verilog logic that loads the accumulator is in alu.v. Specifically:

   4'b0001:   // CLA (load the contents of the memory address into the accumulator)
      if (phase==4'b0010)

Can you spot the error? Loading the accumulator is really part of executing the instruction, but here it happens in phase 2, the same phase that loads the operand address. It needs to happen in the next phase (the third phase, which is number 4 because of the one-hot encoding). Changing 4'b0010 to 4'b0100 gives this output:


That’s correct, of course. The figure below shows the correct timing diagram after fixing the phase in the CLA instruction.
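To see why moving the load one phase later fixes the problem, here is a deliberately simplified, hypothetical model (not vtach’s alu.v). It assumes the address register still points at the next instruction (01) during phase 2 and only switches to the operand address (20) at the phase 2 clock edge; vtach’s real address sequencing may differ. The module name `cla_timing` and the parameter `LOAD_PHASE` are inventions for this sketch: 4'b0010 reproduces the bug and 4'b0100 is the fix.

```verilog
// Simplified, hypothetical model of the CLA timing bug (not vtach's source).
module cla_timing #(
    parameter [3:0] LOAD_PHASE = 4'b0100  // 4'b0010 reproduces the bug
) (
    input  wire        clk,
    input  wire        reset,  // synchronous, active high (assumption)
    output reg  [11:0] acc
);
    reg  [3:0] phase;
    reg  [7:0] addr;  // memory address register (BCD)
    // Tiny stand-in memory: location 01 holds 500, location 20 holds 599.
    wire [11:0] data = (addr == 8'h01) ? 12'h500 :
                       (addr == 8'h20) ? 12'h599 : 12'h000;

    always @(posedge clk) begin
        if (reset) begin
            phase <= 4'b0001;
            addr  <= 8'h00;
            acc   <= 12'h000;
        end else begin
            phase <= {phase[2:0], phase[3]};       // one-hot: 1 -> 2 -> 4 -> 8
            if (phase == 4'b0001) addr <= 8'h01;   // point at the next instruction
            if (phase == 4'b0010) addr <= 8'h20;   // load operand address of "CLA 20"
            // Nonblocking: acc samples data as it was before this edge's addr update.
            if (phase == LOAD_PHASE) acc <= data;  // CLA: accumulator <= memory
        end
    end
endmodule
```

With `LOAD_PHASE` set to 4'b0010, the accumulator picks up 500 from location 01; with 4'b0100 it picks up 599 from location 20. The model makes the same point as the waveform: a flip-flop loaded at a clock edge sees the bus as it was just before that edge, so the load has to wait for the phase after the operand address is set.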

A simple mistake, but potentially very difficult to find without simulation and visualization tools. GTKWave lets you “probe” any signal the simulator dumped, anywhere in your circuit, with almost no effort.

This exercise also gives you an idea of how the edge-triggered nature of flip-flops works. I’ll talk more about that next time.