The GD32VF103 RISC-V System-on-chip from Gigadevices fit an amazing price to performance rate. Their 108Mhz speed, on-board RAM, and low cost (parts around $1.30USD with boards like Longnan Nano commonly under $5) make them a favorite of hobbyists.

There’s a nuance buried in the specification of these parts that allows for faster setting and clearing of the GPIO registers than I’ve seen in any of the example code for these. This approach makes no difference if you’re just toggling a “power on” LED or other low frequency signal, but in a multitasking operating system or a high performance application, there is an easy optimization. 

Common practice

We’ll use the Longnan Nano board just to have a tangible example to talk about. GPIO pin 2 is found in the GPIOA register bank. This pin is connected to a blue LED on the board. It’s wired “backward” from the obvious meaning; you turn the bit off to make the light turn on. This means we often see code like this:

if (on) {
            ((GPIO*) GPIOA)->output_control &= ~( LED_BLUE );
} else {
            ((GPIO*) GPIOA)->output_control |= ( LED_BLUE );
}

This is a pretty common idiom in low-level code: we read the output_control register, mask off the blue bit, and store it or we read the output control register, logically or in the blue bit, and we store it. While we can do better if we use dedicated functions to differentiate off and on or if we can rely on inlining and constant propagation, as a matter of perspective, it takes GCC about 44 bytes to implement this.

Hazards lie ahead!

This code also has problems in a multitasking or preemptive environment. What if something ELSE is modifying any other bit in the GPIO A outputs? Maybe the hardware people helpfully put the bit for the LED in the same register as the launch missile bit. (Thanx, guys!) Maybe you have a multitasking OS and something else may interrupt your access to GPIOA between the time you do the load and the time you do the store. (With blinking LEDs and nothing else on the GPIO, as is the case for a Nano with no external hardware, this doesn’t matter). In real life code, you probably need to raise an interrupt priority level or grab a mutex on the GPIO or something else to prevent competing code from stomping on the reads and writes. To help visualize the problem, let’s look at the generated code. (This is for the red LED that’s on pin 13 of GPIOC, but follow the problem.)

0x08008e1a <+28>:	lui	a4,0x40011
0x08008e1e <+32>:	lw	a5,12(a4) # (MARK A) offset 12 at 0x40011 is the GPIO C register. Read that into A5
0x08008e20 <+34>:	lw	s0,12(sp) # this is just the compiler restoring the saved s0 register so we can return later.
0x08008e22 <+36>:	lui	a3,0x2.   # Since this is bit #13 and we can only load immediate 12 bits, load upper of a3 here.
0x08008e24 <+38>:	or	a5,a5,a3. # or the bits in A5 (that we read out of the chip) with or 0x20000 to set bit 13
0x08008e26 <+40>:	sw	a5,12(a4) # (MARK B) store that into the output register.

If anything else touches that register between MARK A and MARK B, Bad Things are going to happen and you may risk launching missiles instead of blinking a light depending on what else is in that register. This is why you probably need to brace it with a mutex or whatever is appropriate for your system.

There must be a better way!

There is a better way and it’s unique to the GPIO registers, but it seems like something that Gigadevices brought forward from ARM-land when they “found inspiration” in the GPIO system of Blue Pill, which is very similar. Join us now on page 104 of the 536 page hymnal, GD32VF103 User Manual EN V1.0.

There is no need to read-then-write when programming the GPIOx_OCTL at bit level, user can modify only one or several bits in a single atomic APB2 write access by programming ‘1’ to the bit operate register (GPIOx_BOP, or for clearing only GPIOx_BC). The other bits will not be affected.

That’s pretty awesome! The chip will guarantee atomicity. All we have to do is write the bit number into the GPIOx_BOP to set the bit or the bit number into GPIOx_BC to clear that GPIO line. Going back to our example of the blue LED in GPIOA that’s on bit 2, we can thus write 1 << 2, which is 4 into GPIOA_BOP to turn off the LED (remember, on the demo board, they’re backward) or write a 4 into GPIOA_BC to turn it on.

((GPIO*) GPIOA)->bit_op &= ~( LED_BLUE );

We can’t affect any other bits in the register and that means we don’t have to read it and we don’t have to worry about atomicity issues needing to grab a mutex or raise the spl. When we look at the equivalent of the code above, once all the conditional stuff is stripped away in the same way.

0x08008db0 <+6>: lui a5,0x40011 # 0x40011 << 12 - 2028 = 0x40010814
0x08008db2 <+8>: li a4,4 # load up our bit number into A4
0x08008db4 <+10>: sw a4,-2028(a5) # store a4 into  40010814

The same store to 0x40010814, bit_clear, would turn off that GPIO pin.

This appears to be unique to the GPIO registers in the GD32V line.  The comparable GPIO registers in competing parts like the Kendryte K210 don’t have this feature. 

In a standalone, general purpose function like this, the measurements are small. If you’re able to reduce these to functions or templates that have constant arguments and can be inlined, but don’t need to gra a mutex, it’s a potentially large difference.

It’s easy to argue that if saving a few clock cycles on GPIO accesses in 2020 is a priority, that you’ve lead a bad life and are being punished. That may be true, but that’s the life of an embedded systems engineer. A store of a constant to a constant address is usually “better” than a read, a modify, and a write. If that GPIO access is controlling the laser that’s cutting into your eyeball, you may appreciate the code being as streamlined as you can get.

Longnan Nano with GD32V MCU and an OLED display.

I haven’t talked much about it (this being my first post and all…) but I’m pretty smitten with the current wave of RISC-V processors. I really like that it’s an open development platform, which means chip companies can buy or create the core that works with the basic instruction set and the programmers have a consistent and shared set of tools for the entire line ranging from $.10 parts for a home thermostat to a workstation class part. Rising tides and all that.

As a software guy, there are chip vendors that I’ve never heard of, but Bouffalolab Lab seems to be a pretty new name to most of us in the U.S. They’re making waves with the announcement of the BL602 and BL604 Systems-on-Chip (SOC). This part is just becoming available in the final week of October 2020 and it’s looking pretty interesting.

There are two parts in the immediate family: the BL602 and the BL604. The two parts differ by the number of GPIO pins and thus, the size of the external package. BL602 has 16 GPIOs and comes in a 32-pin QFP. The BL604 bumps to 23 GPIOs and rides in a 40-pin QFP. The 32-pin part should come in around 5mm per edge, so it’ll fit in really small applications. The integration is high enough that on the 32-pin part, half the pins are devoted to GPIO with the rest being the mandatory 3.3V, ground, crystal in, grounding, and other housekeeping. Outside the party, but similar enough that the SDK references them sometimes by accident, in the BL606 and 608. On those you go to ARM cores and trade wireless for audio, but that’s not our focus today.

It has 2.4Ghz radios, so that covers BTLE 5.0 and 802.11 b/g/n. Wi-Fi through WPA3 is supported. The microprocessor core runs at 192Mhz and comes from the SiFive cores. JTAG support is included and it claims support for the Segger family of debugging pods and software. It has 276K of SRAM, which is a really nice step up from the closest competing part, the GD32V’s which pack a mere 32K. Both parts have 128K of flash. There are two UARTs, but no USB controller like the GD32V. It has the standard alphabet soup that we expect in a post ESP8266 including SDIO, SPI, I2C, and such.

Bouffalo is a Chinese company and the documentation reflects that. Google Translate helps bring the doc and some code comments to us, but the reality is that many, many sections in the manual are just blank for now. The 34 page datasheet is oriented toward hardware developers, but does throw the smallest of bones to programmers. For example, it doesn’t tell you WHAT the UARTs are (it’s not a 16554) but it tells you where they are in memory.

A few companies are racing to put boards in the hands of developers. Both SiPeed and Pine64 seem to be early movers with prices hovering around $5USD with shipping being about as much again. I’m sure we’ll see an ocean of reference designs on Aliexpress and such soon. Currently, we have three such boards:

  • Pine64’s PineCone64: USB-C (yay!) via a CH340N and with RGB LED.
  • Doi.am’s DT-BL10: Micro USB
  • SiPeed’s BL602: Micro USB. Rumored to have FTDI serial interface. Might be same as Doi.am’s

Development environments include Eclipse + OpenOCD or Freedom Studio + OpenOCD. The company provides an app called DevCube (unrelated to the VR tool of the same name) to configure devices for production. This helps IoT makers prepare the flash for partitioning and OTA upgrades. It seems inevitable that the popular PlatformIO IDE will be supported. Since it’s a familiar SiFive core, just submitting the needed flash layout and programming data is probably on the order of dozens of lines of code to support PlatformIO for BL602/BL606.

If we look past the large number of blank doc pages and the high percentage of Chinese in the SDK, what do we find? This is actually the jackpot for RISC-V designs – there are working Mac, Linux, and Windows (Cygwin) GNU-based toolchains right in the tree at launch. Filenames are a bizarre mix of CamelCase and underscore_separators, even sometimes in the same directory. It ships with GCC 8.3, but since RISC-V was upstreamed a while ago, I’d expect upgrading to later versions to be easy. It relies heavily on Amazon’s FreeRTOS as the core. The entire package relies on external source (FreeRTOS, GNU Tool binaries, compression libs, etc.) by copying and not via git-referencing the modules by inclusion. This means you’re probably always on a stable build, but it also almost insures that you’ll always be behind releases of third party code, which can be bad in network-connected devices.  Amazon’s MQTT and Jobs interfaces are well represented. The expected libraries for a high-functioning 32-bit core are all there, with a pretty full ANSI libc with printf, compression and a filesystem for the flash. Startup code is a pretty straight SiFive example for GCC. There are 31 separately buildable apps for demonstrating features of the device, including a CLI for interfacing with it somewhat like boot monitors of days past. “Hello, World” is there, of course, but ironically doesn’t contain the string “hello” as it gets printed in a callback from the initialization state machine as it advances.

The network modules are, unfortunately, binary blobs at this time. I expect the community to make short work of changing that.

Overall, this looks like a good entry for the bottom/low-end space for wirelessly connected (or not) devices for hobbyist and commercial applications. The SiFive core is well regarded for performance and the performance should be better than the GD32V’s 108Mhz while the price lower than the AI-centric dual-core 64-bit Kendryte K210.

Start your engines with:

Official Boufallo SDK
Pine64 is offering hardware to developers that help get BL602 started.
SiPeed (makers of Longnan and Maix lines) has a BL602 SDK.
Doit has a $4 eval board with a BL602 SDK.
All three of those are forks from the Boufallo code. We don’t know yet how different the hardware is.

RJL