C for FPGAs

As often as I remind people that you should clearly understand a question before you try to answer it, I still fall into that very trap once in a while. The other day, a buddy of mine asked me how to use C with an FPGA. I launched into a talk about something I’d been looking at for a while but hadn’t had a chance to play with: C compilers that generate either FPGA bitstreams or HDL (like Verilog or VHDL) that you can then put on an FPGA.

After about five minutes, I noticed he was getting glassy-eyed. A few probing questions revealed that he was really asking how to put a CPU on an FPGA and run C code on it. Whoops. Wrong answer.

But, still, an interesting answer. There are a few options out there for going from C to an FPGA (not counting SystemC, which is a whole different topic). For example, there is an open source compiler, although it hasn’t been updated since 2006, so that’s not a great sign. There’s also Impulse, a commercial product that started out as Streams-C, a compiler that originated at Los Alamos and whose original page is now a broken link.

However, the one I have tried a few times (mainly because it is easy to get started with) is the free online compiler at http://www.c-to-verilog.com. Simply go to the web site and click on the ridiculously large blue “Try it now” button.

A text box will start you out with some example code, and you’ll have a chance to customize some of the FPGA environment. The compiler assumes you have an array of data in memory, so the parameters let you customize things like the address width, the size of each word, and the number of memory ports. It also lets you select multiple ALUs and options such as how to unroll loops and whether you want pipelining enabled.

Really, it is probably easier to just try it out than for me to explain it. The example code you start with is:

#define ITEMS (1024) 
//returns the number of 1's in a word 
static inline unsigned int popCnt(unsigned int input) { 
    unsigned int sum = 0; 
    for (int i = 0; i 
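
The listing above is cut off partway through the loop. Going only by the description of what it does (count the 1 bits in each word of B and store the result in A), a minimal sketch of a complete version might look like the following. The 32-bit loop bound and the loop body are assumptions, and the name my_main with its two pointer parameters is inferred from the mangled module name _Z7my_mainPjS_ in the generated Verilog, not copied from the site.

```c
#define ITEMS (1024)

// Returns the number of 1 bits in a word.
// Assumption: words are 32 bits wide, so the loop runs 32 times.
static inline unsigned int popCnt(unsigned int input) {
    unsigned int sum = 0;
    for (int i = 0; i < 32; i++) {
        sum += input & 1;   // add the lowest bit to the count
        input >>= 1;        // move the next bit into position
    }
    return sum;
}

// Assumed top-level function: for each of the ITEMS words in B,
// store its bit count at the same index in A.
void my_main(unsigned int *A, unsigned int *B) {
    for (int i = 0; i < ITEMS; i++) {
        A[i] = popCnt(B[i]);
    }
}
```

On a PC you can check the logic by filling B with known values, calling my_main(A, B), and inspecting A before you hand anything to the compiler.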

This takes 1024 items in the B array, counts the number of 1 bits in each word, and stores the result in the A array. With the configuration set for no unrolling and no pipelining, you get a download of a Verilog file that has the requisite code and a simple test bench. It is really too long to reproduce here, but here are a few snippets:

module _Z7my_mainPjS_  (clk, reset, rdy,// control 
	mem_A_out0, mem_A_in0, mem_A_addr0, mem_A_mode0, // memport for: A 
	mem_B_out0, mem_B_in0, mem_B_addr0, mem_B_mode0, // memport for: B 
	p_A, p_B, return_value); // params 
. . .
// Control 
case (eip)

If you skim the code, it is pretty easy to see how each "cycle" causes eip to change, and each new value of eip selects the next operation. Not rocket science, but it is handy to be able to write C code instead of dealing with your own state machine. Also, when you turn on unrolling and pipelining, it gets much more complex very quickly, even though the principle is the same.
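
To make the eip idea concrete, here is a rough software model, in C, of that style of state machine applied to the bit count. It is only a sketch: the state names, the PopFsm struct, and the split into four states are all hypothetical and are not taken from the generated Verilog, which uses its own numbering and its own breakdown of the work. The struct is fine here because this is a PC-side model, not input for the compiler.

```c
#include <stdbool.h>

// Hypothetical state labels for illustration only.
enum { S_INIT, S_TEST, S_ADD, S_DONE };

// Registers of the modeled machine.
typedef struct {
    unsigned int eip;    // current state, named after the register in the Verilog
    unsigned int input;  // word being examined
    unsigned int sum;    // running count of 1 bits
    unsigned int i;      // number of bits processed so far
    bool rdy;            // set once the result in sum is valid
} PopFsm;

static void fsm_reset(PopFsm *f, unsigned int word) {
    f->eip = S_INIT;
    f->input = word;
    f->sum = 0;
    f->i = 0;
    f->rdy = false;
}

// One call models one clock edge: do the work for the
// current state, then pick the next state.
static void fsm_clock(PopFsm *f) {
    switch (f->eip) {
    case S_INIT:                      // clear the counters
        f->sum = 0;
        f->i = 0;
        f->eip = S_TEST;
        break;
    case S_TEST:                      // loop condition (assumes 32-bit words)
        f->eip = (f->i < 32) ? S_ADD : S_DONE;
        break;
    case S_ADD:                       // loop body
        f->sum += f->input & 1;
        f->input >>= 1;
        f->i++;
        f->eip = S_TEST;
        break;
    case S_DONE:                      // signal completion and stay here
    default:
        f->rdy = true;
        break;
    }
}

// Clock the machine until it raises rdy; report how many clocks it took.
static unsigned int fsm_run(unsigned int word, unsigned int *cycles) {
    PopFsm f;
    fsm_reset(&f, word);
    *cycles = 0;
    while (!f.rdy) {
        fsm_clock(&f);
        (*cycles)++;
    }
    return f.sum;
}
```

In this model every word costs the same number of clocks because the loop condition and the loop body each get their own state. Unrolling and pipelining are, loosely speaking, ways of packing more of that work into each clock.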

According to the FAQ, you can't use recursive functions, float, double, structures, pointers to functions, or library calls (like printf or malloc). You also are not allowed to create global variables or local arrays. Makes sense. You can find additional details in the FAQ about the generated interface. It is as you'd expect. Clock, reset, return value, ready, and any user-defined parameters are automatically added to the Verilog module.
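
Those restrictions mostly push you toward plain integer code that works on arrays passed in as pointers. As one illustration, here is a sketch of scaling every word by 0.75 without float or double, using 8-bit fixed point instead. The function names and the scalar n parameter are hypothetical, and I have not verified this exact code against the online compiler.

```c
// Multiply x by the fraction factor_q8 / 256 using only integer math.
// Caveat: x * factor_q8 wraps around if the product does not fit
// in an unsigned int, so keep x small enough for your word size.
unsigned int scale_q8(unsigned int x, unsigned int factor_q8) {
    return (x * factor_q8) >> 8;
}

// Hypothetical top level: no globals, no local arrays, no library calls.
// A receives each word of B scaled by 192/256, which is 0.75.
void scale_all(unsigned int *A, unsigned int *B, unsigned int n) {
    for (unsigned int i = 0; i < n; i++) {
        A[i] = scale_q8(B[i], 192);
    }
}
```

The result is truncated rather than rounded, which is usually the price of staying in integers.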

There are other examples you can try: a CRC32 generator, an Ethernet processor, wavelets, YUV to RGB, and SHA1, among others.
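
I am not reproducing the site's versions of these, but to give a feel for what fits inside the restrictions, here is a sketch of a bit-at-a-time CRC-32. It needs no lookup table, so it avoids both global variables and local arrays. The function name and the one-byte-per-word input layout are my own assumptions, and the code assumes unsigned int is 32 bits wide.

```c
// Bitwise CRC-32 using the reflected polynomial 0xEDB88320.
// Assumption: each input word carries one byte in its low 8 bits.
unsigned int crc32_bitwise(const unsigned int *data, unsigned int n) {
    unsigned int crc = 0xFFFFFFFFu;               // standard initial value
    for (unsigned int i = 0; i < n; i++) {
        crc ^= data[i] & 0xFFu;                   // fold in the next byte
        for (int bit = 0; bit < 8; bit++) {
            // mask is all ones when the low bit is set, otherwise zero
            unsigned int mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;                     // standard final inversion
}
```

Feeding it the nine ASCII characters "123456789", one per word, should give the well-known check value 0xCBF43926, which makes a handy sanity test before you look at any generated Verilog.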

I haven't found an excuse to use this in real life yet, but it seems like it could be useful. Have you used it? Or a different C-to-HDL (or C-to-bitstream) product? Leave a comment and let me know about your experience with it.