The GD32VF103 RISC-V system-on-chip from GigaDevice hits an amazing price-to-performance ratio. Its 108 MHz speed, on-board RAM, and low cost (parts around $1.30 USD, with boards like the Longan Nano commonly under $5) make it a favorite of hobbyists.

There’s a nuance buried in the specification of these parts that allows for faster setting and clearing of the GPIO registers than I’ve seen in any of the example code for these. This approach makes no difference if you’re just toggling a “power on” LED or other low frequency signal, but in a multitasking operating system or a high performance application, there is an easy optimization. 

Common practice

We’ll use the Longan Nano board just to have a tangible example to talk about. GPIO pin 2 is found in the GPIOA register bank. This pin is connected to a blue LED on the board. It’s wired “backward” from the obvious meaning; you turn the bit off to make the light turn on. This means we often see code like this:

if (on) {
    ((GPIO*) GPIOA)->output_control &= ~( LED_BLUE );
} else {
    ((GPIO*) GPIOA)->output_control |= ( LED_BLUE );
}

This is a pretty common idiom in low-level code: we read the output_control register, mask off the blue bit, and store it; or we read the output_control register, logically OR in the blue bit, and store it. We can do better if we use dedicated functions to differentiate off and on, or if we can rely on inlining and constant propagation. As a matter of perspective, though, it takes GCC about 44 bytes to implement this.

Hazards lie ahead!

This code also has problems in a multitasking or preemptive environment. What if something ELSE is modifying any other bit in the GPIO A outputs? Maybe the hardware people helpfully put the bit for the LED in the same register as the launch missile bit. (Thanx, guys!) Maybe you have a multitasking OS and something else may interrupt your access to GPIOA between the time you do the load and the time you do the store. (With blinking LEDs and nothing else on the GPIO, as is the case for a Nano with no external hardware, this doesn’t matter). In real life code, you probably need to raise an interrupt priority level or grab a mutex on the GPIO or something else to prevent competing code from stomping on the reads and writes. To help visualize the problem, let’s look at the generated code. (This is for the red LED that’s on pin 13 of GPIOC, but follow the problem.)

0x08008e1a <+28>:	lui	a4,0x40011
0x08008e1e <+32>:	lw	a5,12(a4) # (MARK A) offset 12 from 0x40011000 is the GPIOC output control register. Read that into a5.
0x08008e20 <+34>:	lw	s0,12(sp) # this is just the compiler restoring the saved s0 register so we can return later.
0x08008e22 <+36>:	lui	a3,0x2    # bit 13 doesn't fit in a 12-bit immediate, so load the upper bits of a3: 0x2 << 12 = 0x2000.
0x08008e24 <+38>:	or	a5,a5,a3  # OR a5 (what we read out of the chip) with 0x2000 to set bit 13.
0x08008e26 <+40>:	sw	a5,12(a4) # (MARK B) store that into the output register.

If anything else touches that register between MARK A and MARK B, Bad Things are going to happen and you may risk launching missiles instead of blinking a light depending on what else is in that register. This is why you probably need to brace it with a mutex or whatever is appropriate for your system.
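To make the window between MARK A and MARK B concrete, here is a minimal host-side sketch that replays the bad interleaving by hand. Nothing in it touches real hardware: fake_octl, OTHER_BIT, and simulate_lost_update are names made up for illustration, and the "interrupt" is just a line of code placed between the read and the write.

```c
#include <stdint.h>

/* Hypothetical stand-in for the GPIOC output control register. On real
   hardware this would be a volatile memory-mapped location. */
static uint32_t fake_octl;

#define LED_RED_BIT (1u << 13) /* red LED on pin 13, as in the listing above */
#define OTHER_BIT   (1u << 5)  /* some unrelated output owned by other code (assumed) */

/* Replays the interleaving step by step and returns the final register value. */
static uint32_t simulate_lost_update(void)
{
    fake_octl = 0;

    uint32_t a5 = fake_octl; /* MARK A: main code reads the register        */

    fake_octl |= OTHER_BIT;  /* an interrupt runs here and sets its own bit */

    a5 |= LED_RED_BIT;       /* main code ORs bit 13 into its stale copy    */
    fake_octl = a5;          /* MARK B: the store wipes out OTHER_BIT       */

    return fake_octl;
}
```

After the call, only bit 13 is set. The update made by the "interrupt" is gone even though neither piece of code meant to touch the other's bit.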

There must be a better way!

There is a better way and it’s unique to the GPIO registers, but it seems like something that GigaDevice brought forward from ARM-land when they “found inspiration” in the GPIO system of Blue Pill, which is very similar. Join us now on page 104 of the 536 page hymnal, GD32VF103 User Manual EN V1.0.

There is no need to read-then-write when programming the GPIOx_OCTL at bit level, user can modify only one or several bits in a single atomic APB2 write access by programming ‘1’ to the bit operate register (GPIOx_BOP, or for clearing only GPIOx_BC). The other bits will not be affected.

That’s pretty awesome! The chip will guarantee atomicity. All we have to do is write a 1 in that bit’s position into GPIOx_BOP to set the GPIO line, or into GPIOx_BC to clear it. Going back to our example of the blue LED in GPIOA that’s on bit 2, we can thus write 1 << 2, which is 4, into GPIOA_BOP to turn off the LED (remember, on the demo board, they’re backward) or write a 4 into GPIOA_BC to turn it on.

((GPIO*) GPIOA)->bit_op = LED_BLUE;     // plain store, pin high: LED off
((GPIO*) GPIOA)->bit_clear = LED_BLUE;  // plain store, pin low: LED on

We can’t affect any other bits in the register, and that means we don’t have to read it and we don’t have to worry about atomicity issues, grab a mutex, or raise the spl. Here is the equivalent of the generated code above, with all the conditional stuff stripped away in the same way:

0x08008db0 <+6>: lui a5,0x40011 # (0x40011 << 12) - 2028 = 0x40010814
0x08008db2 <+8>: li a4,4 # load up our bit mask (1 << 2) into a4
0x08008db4 <+10>: sw a4,-2028(a5) # store a4 into 0x40010814

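That store goes to 0x40010814, bit_clear, so it turns off that GPIO pin (which, on this board, lights the LED). The same store to 0x40010810, bit_op, would turn the pin back on.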
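If the negative offset in that listing looks odd, the address arithmetic can be checked with a small sketch. The helper names lui and effective_address are made up here; they only model how a RISC-V lui fills bits 31:12 and how a load or store adds a signed 12-bit offset to its base register.

```c
#include <stdint.h>

/* lui places a 20-bit immediate in bits 31:12 and zeroes the low 12 bits. */
static uint32_t lui(uint32_t imm20)
{
    return imm20 << 12;
}

/* Loads and stores add a sign-extended 12-bit offset (-2048..2047)
   to the base register to form the address. */
static uint32_t effective_address(uint32_t base, int32_t offset12)
{
    return base + (uint32_t)offset12;
}
```

With these, effective_address(lui(0x40011), -2028) comes out to 0x40010814. The compiler reaches down from 0x40011000 because the distance up from 0x40010000 would be 0x814 (2068), which does not fit in a signed 12-bit offset.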

This appears to be unique to the GPIO registers in the GD32V line.  The comparable GPIO registers in competing parts like the Kendryte K210 don’t have this feature. 

In a standalone, general purpose function like this, the measured differences are small. If you’re able to reduce these to functions or templates that have constant arguments and can be inlined, and that don’t need to grab a mutex, it’s a potentially large difference.
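As a rough sketch of what such inlinable helpers might look like, assuming a register layout that follows the offsets in the user manual (OCTL at 0x0C, BOP at 0x10, BC at 0x14): the type gpio_regs_t and the function names below are hypothetical, not taken from any vendor header, and the helpers take the port as a pointer so they can be exercised against a plain struct on a host machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of one GPIO port; field names are illustrative. */
typedef struct {
    volatile uint32_t ctl0;           /* 0x00 */
    volatile uint32_t ctl1;           /* 0x04 */
    volatile uint32_t input_status;   /* 0x08 */
    volatile uint32_t output_control; /* 0x0C, GPIOx_OCTL */
    volatile uint32_t bit_op;         /* 0x10, GPIOx_BOP  */
    volatile uint32_t bit_clear;      /* 0x14, GPIOx_BC   */
} gpio_regs_t;

/* Catch layout mistakes at compile time. */
_Static_assert(offsetof(gpio_regs_t, bit_op) == 0x10, "BOP offset");
_Static_assert(offsetof(gpio_regs_t, bit_clear) == 0x14, "BC offset");

#define LED_BLUE (1u << 2) /* blue LED on pin 2 */

/* Each helper is a single store: no read, no mask, no lock. */
static inline void gpio_set_bits(gpio_regs_t *port, uint32_t mask)
{
    port->bit_op = mask;
}

static inline void gpio_clear_bits(gpio_regs_t *port, uint32_t mask)
{
    port->bit_clear = mask;
}

/* The LED is wired "backward": clearing the pin lights it. */
static inline void blue_led(gpio_regs_t *port, int on)
{
    if (on) {
        gpio_clear_bits(port, LED_BLUE);
    } else {
        gpio_set_bits(port, LED_BLUE);
    }
}
```

On the chip, port would point at the GPIOA register block (0x40010800, if bit_clear really sits at 0x40010814 as in the listing above). When the on argument is a compile-time constant, the compiler has a chance to fold each call down to a load of the mask and one store.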

It’s easy to argue that if saving a few clock cycles on GPIO accesses in 2020 is a priority, you’ve led a bad life and are being punished. That may be true, but that’s the life of an embedded systems engineer. A store of a constant to a constant address is usually “better” than a read, a modify, and a write. If that GPIO access is controlling the laser that’s cutting into your eyeball, you may appreciate the code being as streamlined as you can get.

Longan Nano with GD32V MCU and an OLED display.

4 thoughts on “Faster GPIO reads and writes on GD32V RISC-V cores”

  1. Robert, doesn’t this just punt the problem to protection of the GPIOA_BOP/BC registers instead?
    Not saying that isn’t better, just saying that it seems we just move the issue.
    I will be the first to admit that I have not read the full hymnal, though.

    • That’s a good point, Andy. If synchronization across bits on the port is required and you need them to toggle on the same edge, you’re indeed going to need a larger “lock” on writes. The bit-level set and clear operations would let different threads control independent bits without locking or other synchronization, but only if that’s the only thread honking on those bits.

      If you have multiple threads or contexts controlling a single output bit without synchronization, you have bigger problems, of course. :-) This trick is mostly useful if you have different code controlling different bits – they don’t have to read/mask/hit an accessor to get state from another module. Thread A can send sets and clears to the Door Lock bit while Thread B can control the lighting because they are independent. You’re saved from possible locking issues since you don’t have to lock the entire GPIO register or mess with shadowing registers.

      As with all things timing and synchronization related, some dizziness may occur. If side effects worsen or a rash of timing incidents occurs, consult professional help and temporarily treat liberally with alcohol. 🙂
