Archive for February, 2009

FPGA introductory workshop at /tmp/lab, March 21st

I’m organizing an introductory workshop for people who wish to start designing with FPGAs, at the /tmp/lab hackerspace near Paris. The event will take place on Saturday, March 21st, from 14:30 to 23:30, and is free of charge.

Agenda:

  1. Presentation of the FPGA technology
  2. Project examples
  3. Basics of synchronous logic design
  4. Hands-on: implementation of a simple audio generator
  5. If time allows: Verilog introduction
  6. Implementation of the audio generator using Verilog

More info on this page: http://www.tmplab.org/wiki/index.php/Workshop_Introduction_aux_FPGA
If you wish to participate, send me an email at sebastien dot bourdeauducq at gmail. The workshop will be held in French unless there is demand for English.


re2c and Lemon, an elegant alternative to Flex and Bison

Now that the Milkymist hardware is sufficiently advanced, it’s time to run some real software on it.

I have started designing the subsystem that evaluates the per-frame and per-point equations, a central component of the rendering process. Doing this requires parsing the preset code, a task usually done with a so-called compiler-compiler (or parser generator). Lex and Yacc (often in the form of their GNU equivalents, Flex and Bison) are perhaps the most popular tools, and they were what I tried first.

But it turned out that the code they generate is laden with ugly global variables and, more importantly, with not-so-portable glibc calls that would cause problems in the minimalistic Milkymist software environment and would require rather dirty hacks to work around. The cleanest option would have been to modify Flex and Bison but, as is often the case with GNU software, the code readability standard is pretty low, and I would then have had to maintain and distribute my modified tools, turning a little technical problem into a development and management nightmare.

Fortunately, after some searching on the web, I found these two tools: re2c, a lexer generator, and Lemon, a parser generator.

Neither of them uses global variables or makes glibc calls, unless you enable assertions or debug output. I would say their only major problem is the scarcity of documentation. I ran into two issues that the documentation could point out better:

  • Lemon assigns a number to each token type and generates an include file listing them. You must use that include file. If you try to supply your own numbers instead (coming straight from the lexer, for instance), parsing will fail, because the generated parser hardcodes the numeric values rather than referring to the identifiers from the include file.
  • Lemon uses a stack onto which it pushes tokens that have not yet caused a rule reduction. If you want to read the token string in the parser (and you often do), you will probably pass a pointer to the string to the Parse() function, and that pointer will be pushed onto the parser stack. You then have to make sure that the data it points to is not modified until the parser is destroyed. One way to solve this problem is to make copies of the data and use the %token_destructor directive so that the parser frees those copies automatically.
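
The second pitfall can be illustrated without Lemon itself. What follows is a minimal Python model, not Lemon's API: `ToyParser`, `lex` and the 8-byte scratch buffer are hypothetical names made up for this sketch. A (buffer, length) pair plays the role of the C pointer passed to Parse(). When the lexer reuses its buffer, every stacked token ends up aliasing the latest text; copying each token first keeps the stack intact.

```python
class ToyParser:
    """Hypothetical stand-in for a generated parser: it only stacks tokens."""

    def __init__(self):
        self.stack = []

    def push(self, buf, length):
        # Like a C parser, keep the "pointer" as given; no copy is made here.
        self.stack.append((buf, length))

    def token_texts(self):
        # Dereference every stacked "pointer", as a rule reduction would.
        return [bytes(buf[:length]) for buf, length in self.stack]


def lex(source, parser, copy_tokens):
    """Feed whitespace-separated tokens (at most 8 bytes each) to the parser."""
    scratch = bytearray(8)  # one buffer reused for every token
    for word in source.split():
        raw = word.encode("ascii")
        scratch[:len(raw)] = raw  # overwrites the previous token's text
        if copy_tokens:
            parser.push(bytearray(scratch), len(raw))  # private copy
        else:
            parser.push(scratch, len(raw))  # shared buffer, soon clobbered


aliased = ToyParser()
lex("sin x +", aliased, copy_tokens=False)
print(aliased.token_texts())  # [b'+in', b'+', b'+']

copied = ToyParser()
lex("sin x +", copied, copy_tokens=True)
print(copied.token_texts())   # [b'sin', b'x', b'+']
```

Python's garbage collector reclaims the copies here; in the C parser, releasing them is the job that %token_destructor takes over.
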
My re2c+Lemon parser dissecting a MilkDrop preset


Big ideas (Don’t get any) Radiohead cover

You may already have seen this video, but I just love it :)


AC97 controller completed

I have now completed the AC97 controller that will be used to record audio in Milkymist. Although the design is relatively simple, the important features are there: DMA buffers, interrupt-driven mode, full duplex (simultaneous playback and recording) and codec register access. The only missing parts are suspend modes and variable sample rates (only the standard 48 kHz rate is supported).

While this is not one of the hardest parts, it’s good to have this working (and use the ML401 board to listen to music ;) )

Using the Microsoft Plug and Play AC97 extension to detect the LM4550 codec of the ML401 (PnP IDs 0x4e5343 0x50)
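
The two numbers in the caption are easier to read once decoded. This is a small sketch, assuming the first value packs three ASCII characters with the most significant byte first; `decode_vendor_id` is a hypothetical helper, not part of any driver.

```python
def decode_vendor_id(vendor, fourth_byte):
    """Split the two caption values into ASCII letters and a trailing byte."""
    letters = vendor.to_bytes(3, "big").decode("ascii")
    return letters, fourth_byte


print(decode_vendor_id(0x4E5343, 0x50))  # ('NSC', 80)
```

The letters "NSC" match National Semiconductor, the maker of the LM4550.
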


It’s working!

By now, all known bugs are fixed in the image warping engine, and I can already make some visual effects using the FPGA board :)

I had to resort to GPL Cver and good old PLI to find the source of the problems (mostly due to a stupid typo in the code of the DMA write engine) since working around Verilator bugs would have been too tedious. Cver is many times slower, but more reliable.

I also moved the CSRs to a dedicated bus, since Wishbone caused some problems:

- Wishbone is made to support variable latency through the “ack” signal, whereas CSRs are, as their name suggests, registers that can usually be accessed in one clock cycle on FPGA architectures. In that case, the signal mostly causes useless complications and timing problems. The new bus requires all accesses to complete within one cycle and removes the “ack” signal.

- Wishbone has two signals for qualifying a cycle, “cyc” and “stb”, which serve no real purpose when accessing CSRs. There is usually a single master, and the address decoding can be done at the slave. So those signals can be, and have been, removed.

- Wishbone requires a multiplexer in the slave to master data path. By adding the requirement that a slave puts out a zero value when it is not addressed, that multiplexer can be replaced by a potentially distributed OR that can improve timing and make chip layout easier (since the CSR bus is made to connect many peripherals that can cover a large area of the chip, timing is important).

- All CSR bus signals are explicitly shared between the slaves except the slave to master data. This makes the interconnect code much more readable.

- This issue is not specific to Wishbone, but the Wishbone to CSR bus bridge implemented in the Milkymist SoC registers all signals passing between the two buses, in order to improve timing when more devices are added to Wishbone (and that is what will happen in Milkymist, since the shader and audio DMA still need to be added to this bus).
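
The OR-based read path described in the list above can be modelled in a few lines. This is a behavioural Python sketch under assumed names, not the actual Verilog: `CsrSlave`, `csr_read` and the address map are hypothetical. Each slave decodes the address itself and returns zero when it is not addressed, so the master only has to OR all the slave outputs together.

```python
class CsrSlave:
    """Hypothetical CSR slave owning a small block of registers."""

    def __init__(self, base, regs):
        self.base = base
        self.regs = list(regs)

    def read(self, addr):
        # Address decoding happens in the slave; unaddressed slaves drive 0.
        offset = addr - self.base
        if 0 <= offset < len(self.regs):
            return self.regs[offset]
        return 0


def csr_read(slaves, addr):
    """Combine the slave outputs with a plain OR instead of a multiplexer."""
    result = 0
    for slave in slaves:
        result |= slave.read(addr)
    return result


slaves = [CsrSlave(0x00, [0x11, 0x22]), CsrSlave(0x10, [0xA5])]
print(hex(csr_read(slaves, 0x01)))  # 0x22
print(hex(csr_read(slaves, 0x10)))  # 0xa5
print(hex(csr_read(slaves, 0x20)))  # 0x0
```

The OR only gives the right answer because at most one slave answers a given address; overlapping address ranges would merge the bits of two registers.
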

To sum up, I have made this diagram of the system architecture, with the current progress.

soc_architecture


Youtube…

No comment.

sarkozy


Warp engine (almost) working on hardware

I finally finished coding the FML DMA engines for the warper, ran some simulations, integrated the warper into the SoC and synthesized everything. And it even met timing at 100 MHz, after slight modifications :)

Now comes the tedious bug-hunting phase. At this point, everything works, except that approximately one time out of 400, the source pixel seems not to be read correctly, resulting in visible black and white dots on the image. It sounds like some corner case involving pipeline handshakes and cache status is not handled properly. To make things worse, Verilator is prone to a serious Heisenbug: adding a $display in a clocked “always” block to monitor a suspicious signal sometimes prints a different value from the one written to the registers. Also, registering data with blocking assignments in one always block and then recapturing it with non-blocking assignments in another block sometimes yields incorrect results.

After these issues are sorted out, the next steps are implementing bilinear filtering (optional, only improves image quality) and negative off-screen coordinates (to be able to implement effects like centered zooming of the screen). The last major steps to a fully fledged MilkDrop implementation would then be a fast FPU and the software.

As for performance, the fill rate in the fully running SoC with a VGA output at 640×480@60Hz has dropped to 30 MPixels/s. Analyzing the pipeline handshakes reveals that most of this performance hit is due to the read latency of the memory system. In the future, this could be improved by using a dual-port RAM for the source image cache, so that in the event of a cache miss, the DMA read engine can continue processing the stream of incoming requests while refilling the cache to honor the request that caused the miss. But 30 MPixels/s is still enough to get good visual effects.
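
To put the 30 MPixels/s figure in perspective, here is a rough back-of-the-envelope calculation. It assumes that one warp pass writes one pixel per screen pixel and compares the fill rate with the display refresh rate; neither assumption comes from measurements, and the rendering frame rate may well differ from 60 Hz.

```python
WIDTH, HEIGHT, REFRESH_HZ = 640, 480, 60
FILL_RATE = 30_000_000  # pixels per second, the figure quoted above

pixels_per_frame = WIDTH * HEIGHT                   # 307200
pixels_per_second = pixels_per_frame * REFRESH_HZ   # 18432000
passes_per_refresh = FILL_RATE / pixels_per_second  # full-screen warps per refresh

print(pixels_per_second, round(passes_per_refresh, 2))  # 18432000 1.63
```

Under those assumptions, the warper can redraw the whole screen about 1.6 times per display refresh.
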
