Archive for April, 2010
Milkymist 0.5 “/tmp/lab party” released
I am pleased to announce the release of Milkymist 0.5, which was successfully used for the first time for actual VJing.
It’s slowly getting into place – there are now 42 visual patches distributed with this release, whose rendering matches that of MilkDrop and/or simply looks good.
I do not distribute a binary kit of this release. The reason is that the alpha blending feature broke the Xst synthesis of the TMU, making most of the new features pretty useless (they do work very well when using another synthesizer, though). As it turns out, the Xilinx crapware fails to correctly synthesize arithmetic constructs like “a*x+(E-a)*y”, requiring manual instantiation of DSP48 primitives for those. I have already fixed the problem for the texture filtering, but I feel somewhat demotivated to do it again (the S6 boards are coming soon and hopefully will not tickle this bug). If someone is interested in Xst synthesis for Virtex-4, please send me a patch.
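For readers unfamiliar with the construct, here is a minimal software sketch of a blend of the form a*x+(E-a)*y. The post does not say what E is; the sketch assumes it is the full-scale alpha value, and both the default of 64 and the final division that rescales the result are illustrative assumptions, not the TMU's actual parameters.

```python
def alpha_blend(a, x, y, full_scale=64):
    """Blend two integer channel values as a*x + (full_scale - a)*y.

    `full_scale` plays the role of E in the expression above; its value
    here is a hypothetical choice, not taken from the Milkymist sources.
    """
    if not 0 <= a <= full_scale:
        raise ValueError("alpha must be between 0 and full_scale")
    # Weighted sum of the two inputs, rescaled back to the input range.
    return (a * x + (full_scale - a) * y) // full_scale


if __name__ == "__main__":
    print(alpha_blend(32, 100, 200))  # half-way blend of the two values
```

With a = 0 the result is y, and with a = full_scale the result is x.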
Complete change log:
* System capabilities register
* Memory performance monitoring (retrieves memory bandwidth utilization and average memory access time from the live system)
* New PFPU instructions:
** QUAKE (Quake-III style inverse square root approximation)
** IF (conditional)
** TSIGN (sign manipulation)
* Translucency (alpha) support in TMU
* Faster FastMemoryLink arbiter
* Fixed DRAM write-to-read (tWTR) timing violation on fully pipelined transfers
* New FPVM (Floating Point Virtual Machine) library for runtime compilation of PFPU programs
** High-level API makes it easy to use the PFPU
** Supports addition, subtraction, multiplication, fast inverse square root, square root, division (experimental), modulo (experimental), integer/float conversions, comparisons (above/below/equal), conditional statements (if), absolute value, sine, cosine, integer part, min(), max()
* New patch parser
* New renderer features
** Configurable per-vertex equations
** Video echo
** Warp
** Scale (sx/sy)
** Q variables
* New patches included
* irender command to input patch code on the serial console
* Build host tools using clang instead of GCC
* Software bugfixes
** TFTP boot in QEMU
** Correct placement of motion vectors
** LCD user interface race conditions
** Renderer stop race conditions
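As background for the QUAKE instruction listed above, the following is a sketch of the well-known Quake III style inverse square root approximation in Python. It illustrates the general technique only: whether the PFPU uses this magic constant, or applies a Newton-Raphson refinement step, is not stated in the change log, so both are assumptions here.

```python
import struct


def quake_rsqrt(x, newton_steps=1):
    """Approximate 1/sqrt(x) for a positive single-precision value."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    # Reinterpret the float's bits as a 32-bit unsigned integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Magic-constant trick: roughly halves and negates the exponent.
    i = 0x5F3759DF - (i >> 1)
    # Reinterpret the integer back as a float to get the initial guess.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # Optional Newton-Raphson refinement (assumed, not from the source).
    for _ in range(newton_steps):
        y = y * (1.5 - 0.5 * x * y * y)
    return y


if __name__ == "__main__":
    print(quake_rsqrt(4.0))  # close to 0.5
```

Passing newton_steps=0 returns the raw bit-trick estimate, which is coarser.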
BEC schedule modifications
Posted by lekernel in Breizh Entropy Congress, Milkymist on April 16, 2010
Because of the cancellation of many European flights, some speakers are unfortunately unable to attend the Breizh Entropy Congress. We are therefore making some changes to tomorrow's schedule (Saturday April 17th):
- The talk by António Costa Valente (10:00-11:00) is replaced by a workshop by Sébastien Bourdeauducq on authoring Milkymist visual patches. Beginners welcome, no FPGA programming involved. Bring your laptop with a serial terminal program like GtkTerm (warning: GNU propaganda on this page, but the program itself is good). Language: English+French.
- The talk by Philippe Langlois (13:00-14:00) is replaced by a workshop by Nicolas Brodu about how to set up teleconferencing. Come and learn how to install and run Asterisk! A presentation of the Encours.Org project is also included. Language: English+French.
BEC streaming
Posted by lekernel in Breizh Entropy Congress on April 16, 2010
Thanks to Ubicast (and Flowty), we have video streaming of the conference rooms. Go here to watch the rooms live!
Milkymist presentation at Breizh Entropy Congress
Posted by lekernel in Breizh Entropy Congress, Milkymist on April 13, 2010
Slides for the presentation of Milkymist at Breizh Entropy Congress (Rennes, France, April 15-17) are available here.
This new presentation features only a quick overview of the system on chip design, and then focuses on the software and the Milkymist One product. The presentation will be concluded with a demonstration of the system (on a ML401 board) and the live coding of some visual patches (presets).
HES2010 FPGA reverse engineering results
Posted by lekernel in Uncategorized on April 11, 2010
Even though no one managed to find the correct passwords of the HES2010 FPGA reverse engineering challenge (maybe time was too short), we still congratulate Eric Rannaud who came up with an expertly designed method and very interesting conclusions. He wrote:
At first, I only worked with the binary tarball, and only had a look
at public/ later. I’m not entirely sure why, as I did read about the
sources in the announcement, but I only got around to look at them a
while after that. This made the beginning of the challenge more
entertaining I suppose.

I ran strings on bios.bin, found the “Access granted!…” string,
looked in the disassembly bios.bin for a use of that string’s address,
and a few instructions before that, found the addresses for the
security module registers on the CPU bus.

For level2, I converted the NCD to XDL, which is a much nicer text
version of the NCD. It’s the netlist at a gate level (so it’s somewhat
big and you can get confused or lost, as I probably was — even though
I did use a script to extract the passwords… oh, well). XDL is still
a somewhat unwieldy format, relying on a lot of implicit details
regarding the hardware primitives on the FPGA, so it can be a bit hard
to interpret (all these details are essentially specified in the
Spartan 3 user guide, with a couple of errors and ambiguities).

I located the two 32 bit comparators for pwd1 and pwd2 (the XDL
contains names derived from design wire identifiers, something like
“security/pgood2_cmp_eq0000_wg_cy<1>”). They are built with a bunch of
LUT4 chained with a fast carry-chain. The LUT equations are such that
the output is 1 only when the 4 LUT inputs match 4 bits of pwd1 or
pwd2. There are several possibilities at this point:

1- You figure out pwd1 and pwd2 from the LUT equations, then pass them
to the software. That led to my previous email. Even though I got it
wrong, that should be pretty straightforward.
2- You follow the outputs of the comparators back to the CSR bus,
figuring out the secret by looking at the logic between pgood1 and
pgood2 and the bus port. Your “bus <= {32{pgood1 & pgood2}} & secret"
trick made the synthesized logic a bit tricky, with some folding and
maybe physical synthesis obscuring things a bit. I had a look at that,
but stopped after (I think) figuring the logic for half the bits of
the CSR bus port, i.e. half the secret bits.

Ideally, you would want an XDL simulator: you excise a subnetlist of
the XDL by extracting the graph between the bus input port and bus
output port (dropping all the other logic). You feed the stimuli
corresponding to two writes at the addresses found in bios.bin. Then,
you follow the activity in your simulator to find the location of the
comparators, then you either find the equations, or you simply
monitor pgood1 and pgood2 (separately): you only need to try up to 2 *
2^32 possibilities (instead of 2^64), which with a fast enough
simulator should be feasible (as you only simulate a small subset of
the design).

The advantage of such a (local) simulator is: (i) much easier to see
what's going on as a program does the work for you, (ii) had you used
a much trickier type of password matching device (hash({pwd1, pwd2})
== known; or say, pwd1 and pwd2 is a sequence of bits used to put a
large LFSR in an initial state, then run it for a while, then check
128 bits of the LFSR output for a given sequence), being able to feed
any stimuli you want to a black-box, with the ability to observe any
wire in it, lets you progressively replace known parts with
higher-level code that you can comprehend.

Anyway, with more time (or a faster brain), level2 is quite feasible,
even without any Verilog or C code.

For level5 my approach was going to be the following.
As I know about the two 32 bit comparators, and that they are
synthesized in a pretty obvious way (two long fast carry-chain with
LUT equations in a special form -- i.e. only one input out of 16 gives
an output of 1, or "popcount(eqn) == 1"), it should be relatively easy
to locate the comparators in the device.

As long as the device is not too full, ISE uses a carry-chain with a
constraint to adjacent slices. Locating the LUT configuration bits in
the bitstream, for each LUT in each slice, is a fairly straightforward
task. You can use XDL to force a given physical LUT to be set at
0x0000 and 0xFFFF, say -- only 16 bits and a couple of CRC change in
the bitstream. By looking for the LUT equations with "popcount(eqn) ==
1", you get a list of 4 bit values (such that "lut4(eqn, value) ==
1"). You then have several approaches:

a- If that list of 4 bit values is small-ish, assume that the
synthesizer kept the same grouping of bits per LUT as in level2 (i.e.
one LUT matches candidate_pwd1[3:0], another matches
candidate_pwd2[11:8], etc.), then there are only so many possible
combinations of fixed 4 bit values to form a 64 bit word, (64/4) out
of size_of_the_list. You try all these passwords through the software.
b- You rely on the fact that the comparison LUTs must be next to
each other to get a smaller list (e.g. any isolated LUT with
popcount(eqn)==1 can be ignored).
c- Figure out the bits controlling the carry-chains (again using
XDL). Find enabled carry-chain in the device (there should not be that
many), correlate that with the ideas of a. and b.

If you didn't have the information from level2, or if XST made
different choices than in level2 regarding synthesis decisions, you
pretty much have to know something about routing configuration in the
bitstream (so far, we didn't need anything like that).

The software released by Jean-Baptiste Note at ulogic.org does not
support Spartan 3A. So you either need to add it (should not be too
hard), or you start from scratch. In any case, this work likely takes
more than 48 hours. But it is tractable.

From this you can:
1- Extract some kind of netlist. You don't need everything, enough to
get close to a partial XDL-like around the CPU bus. To find the CPU
bus, follow the outputs of the only 32 bit 1-cycle adder in the design
(look for the longest carry-chain)... you rarely have more than one of
these.
2- More fun, you alter the bitstream to reroute some of the values
coming from the comparator outputs either back to the bus or to
external pins. Then you need to test 2^33 values (the speedup comes
from the fact you can observe pgood1 and pgood2 independently, as
opposed to only "pgood1 & pgood2" -- you guess two independent 32 bit
values, not a single 64 bit value). Depending on the system clock and
the speed of the software, that should take a few hours.

In my view, most of the difficulty in this type of task comes from the
sheer size of the netlists, not really from the obscurity of the
bitstream. The work at ulogic.org pretty much deals with that.
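As a rough illustration of the LUT-based idea in the write-up above, the sketch below decodes a LUT4 equation with "popcount(eqn) == 1" into the 4-bit value it matches, then assembles per-LUT nibbles into a candidate password. Several things are assumptions made for the example: that bit k of the 16-bit truth table is the output for input value k, that LUT input k is wired to password bit 4*pos+k, and the function names themselves, which are hypothetical and come neither from Eric's script nor from any Xilinx tool.

```python
def lut4_match_value(init):
    """Return the single 4-bit input for which a LUT4 outputs 1.

    `init` is the 16-bit truth table. Returns None unless exactly one
    bit is set, i.e. unless popcount(init) == 1.
    """
    if not 0 <= init <= 0xFFFF:
        raise ValueError("a LUT4 truth table is 16 bits wide")
    if bin(init).count("1") != 1:
        return None
    # Assumed convention: bit k of init is the output for input value k.
    return init.bit_length() - 1


def assemble_password(nibbles):
    """Combine 4-bit values into a word; nibbles[0] is bits [3:0]."""
    word = 0
    for pos, nib in enumerate(nibbles):
        word |= (nib & 0xF) << (4 * pos)
    return word


if __name__ == "__main__":
    # Hypothetical truth tables, each with a single bit set.
    tables = [0x8000, 0x4000, 0x4000, 0x0800]
    nibbles = [lut4_match_value(t) for t in tables]
    print(hex(assemble_password(nibbles)))
```

The search-space remark in the email follows the same logic: observing pgood1 and pgood2 separately means at most 2 * 2^32 = 2^33 trials, instead of 2^64 for a single 64 bit guess.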
FPGA security challenge: files released
Posted by lekernel in Uncategorized on April 8, 2010
Download here the files for the challenge:
and see the description of the levels below! Do not miss the presentation at 14:00…
LEVEL 0
When the bitstream is loaded, the FPGA expects a 16-bit password which is shifted in bit by bit using two pins. If the password is good, you are rewarded with an Arduino-style LED show. The participants will have to discover this password. This level is solvable using common techniques and is intended to give a rough overview of how FPGAs work in practice and what hardware security is about. Participants who think it’s trivial are encouraged to skip it and proceed directly to level 1.
Participants are given the bitstream, the NeoCAD Circuit Description (NCD) that they can examine with FPGA Editor, and the Verilog source code (of course, with a different password).
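To make the Level 0 mechanism concrete, here is a small behavioural model in Python of a password check fed through a serial shift register. It is a sketch under assumptions and not the challenge's Verilog: the two pins are assumed to be data and clock, bits are assumed to arrive MSB first, and the class and function names are hypothetical.

```python
class SerialLock:
    """Software model of a password comparator fed one bit per clock."""

    def __init__(self, password, width=16):
        self.width = width
        self.mask = (1 << width) - 1
        self.password = password & self.mask
        self.shift_reg = 0

    def clock_in(self, bit):
        """Model one clock edge: shift the data pin value into the register."""
        self.shift_reg = ((self.shift_reg << 1) | (bit & 1)) & self.mask
        return self.unlocked

    @property
    def unlocked(self):
        # The comparator output: high only when all bits match.
        return self.shift_reg == self.password


def send(lock, value):
    """Shift `value` into the lock MSB first (assumed bit order)."""
    for i in reversed(range(lock.width)):
        lock.clock_in((value >> i) & 1)
    return lock.unlocked


if __name__ == "__main__":
    lock = SerialLock(0xA5C3)      # hypothetical password
    print(send(lock, 0x1234))      # wrong guess
    print(send(lock, 0xA5C3))      # correct guess
```

Passing width=64 gives the Level 1 variant of the same model.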
LEVEL 1
Same as Level 0, but this time, the password is 64-bit. Participants are given the same files.
LEVEL 2
This time, the security device is embedded into a complex system-on-chip (based on Milkymist [6]) comprising a microprocessor, memories and a serial port, all implemented on the same FPGA. A software program running on the FPGA softcore processor will talk to the security device and send it a password to make it reveal a built-in secret. Participants will have to find out that secret. This level is harder than the previous one because the security device will be buried among thousands of FPGA logic cells comprising the system-on-chip and connected to it through an on-chip bus.
Participants are still given the NCD file and the source code, making the task significantly easier.
LEVELS 3-5
They are the same as levels 0-2, but without the NCD! (and different passwords of course).
Those are obviously the most interesting levels, as when you are working with a real security system, they will never give you the NCD. Reverse engineering bitstreams involves good knowledge of the FPGA’s internal structure (the previous levels should have given you some of this), mastery of Boolean algebra and logic function manipulation, and expertise with file format reverse engineering. There is an existing effort [8].
REFERENCES
[1] http://www.cl.cam.ac.uk/~sd410/papers/fpga_security.pdf
[2] http://spectrum.ieee.org/semiconductors/design/the-hunt-for-the-kill-switch
[3] http://www.xilinx.com/products/devkits/aes_sp3a_eval400_avnet.htm
[4] http://lekernel.net/blog/?p=668
[5] http://lekernel.net/blog/?p=429
[6] http://www.milkymist.org
[7] http://www.milkymist.org/wiki/index.php?title=Installing_the_Spartan_3A_evaluation_kit_mini-port
[8] http://www.ulogic.org
[9] http://lekernel.net/blog


