Many of us have been pretty disappointed in the long lead time it takes to get chips from specification into production.  For RISC-V devotees, this was brought into clearest focus this year where November of 2021 brought us ratified specification for Vector Computing 1.0, in particular, but we’ve mostly developed via emulated cores in software or FPGAs  or through chips like Allwinner’s D1 family of parts which paired a single core with a pre-release version of the Vector spec that was already over a year old when the device shipped. Lucky for us, we may see history repeating in the one year part of that with first Vector 1.0 silicon coming late this calendar year, so likely November or December.

Many of us had hopes that StarFive, with their close ties to IP vendor SiFive, and their collective “dry-run” experience with shipping many hundreds of chips through BeagleV, Starlight, VisionFive which in the upcoming JH-7110 iteration DOES bring around 3D graphics, and four 1.5Ghz cores along with a comfortable (2-8GB) headroom of RAM. The Kickstarter from StarFive was successful with over 2,000 units and that’s easily one of the most anxiously awaited parts of 2022, with Pine64’s fast-following board adding PCIe graphics/other expansion slot,

The new part has generated less buzz, because while it has been known for a few months, it was under press embargo until now,  It comes from Shenzhen’s Bouffalo Lab which is relatively unknown outside of RISC-V developer circles. They’re very much a Chinese company and their Western presence can be pretty tricky to find a pulse for, but they have a family of developer tools with (mostly) enough English documentation, tools, and support. While they have really inexpensive I/O chips, their chips will be mostly known by readers of this page as being the brains of Pine64’s Pinecone and reduced pin count Pine Nut. In broad strokes, those BL602 and BL604 chips are comparable to the ESP32-C3, with a SiFive E24 core and a basket of I/O, including Bluetooth and WiFi. Cousins BL702 and 706 add more GPIO, may trade WiFi for Zigbee in certain models, and have cost/performance models that make it possible to emulate an FTDI in software, suitable for a $3.59 JTAG board ir drive full size panel displays while feeding WiFi services, GPIO monitoring, and such. They’re very flexible parts.

The zinger here is that for BL808, their newest chip (expected “soon”) we leave behind the SiFive cores and go with the cores that were open sourced by Alibaba’s chip division, T-Head about last year. Bouffalo was able to pair T-Head’s experience in high-speed cores with their own experience in fabbing high-volume/high-volume parts, and fuse in value like the new Vector 1.0 specifiction. Now that we have ~18 months or more of experience in simulating and building software for those parts via LLVM and, less so, GCC, that seems like a great partnership.

The coarse-level datasheet is almost self-deprecating. “Take four marginally related compute nodes and attach everything to everything” look:

Bouffalo did what they did best, and Sipeed is on deck to do for this chip what they did to the (then) ground-breaking GD32VF103 (zillions of <$10 RISC-V boards without cables and a very usable SDK) or the K210 – which they morphed into a dozen form factors and married an early Rocket design with a numeric computation unit made FL acceleration/AI  accessible to the < $20USD developer in many packages. So what makes BL808 a good date to bring to the computing ball of 202x? 

Integration. The likes of Sipeed, Pine64, and others will mount the board to a variety of backing form factors so people wanting access to these can just use them without having to wire-wrap them or hire a high speed digital logic team to take all the high speed timing craziness.

Tool stability. RISC-V is probably the first real ocean of silicon tech that’s had the software team delivering on high before the hardware team could make wafers. RISC-V is simulated, the tools are validated, and these tools are all available at the risk/scale/price point you want to pick.

ZZZZZZ TODO: Insert 3-wide frame of chip cut-ways and QR’s here.

There are already hundreds of pages of documentation available online. It’s probably not the best place, but it’s the first place I’ve seen that’s publicized in a way that doesn’t look like like a leak. :–)

Of course, the chips themselves have RealTimeCounters, 20-channel Direct Memory Access Controllers (as we do) , USB2,  JTAG, SPI, four UARTs and all those other creature comforts that we essentially expect to see in our $10 chips these days. (Pricing hasn’t been announced…)  This part has so many processing/IO cores that it’s actually hard to distinguish them.

“The wireless subsystem includes a RISC-V 32-bit high-performance CPU, integrated Wi-Fi /BT/Zigbee wireless…”
“The multimedia subsystem includes a RISC-V 64-bit ultra-high-performance CPU and integrates video processing modules such as DVP/CSI/ H264/NPU, which can be widely used in various AI fields such as video surveillance/smart speakers….”
“NPU (numeric processing unit) HW NN (hardware neural networking) co-processor (BLAI-100 – Bouffalo Logic Artificial Intellligence) generally used for AI applications
Of course, there’s also a low-power 32-bit RISC-V unit to babyset THOSE four compute modules, because it’s 2020 and why the hell not!!!

You literally end up with M0 having “32-bit RISC-V CPU with a 5-stage pipeline structure, supports RISC-V 32/16-bit mixed instruction set, contains 64 external interrupt sources, and 4 bits can be used to configure interrupt priority.”
D0 has “a 64-bit RISC-V CPU with a 5-stage pipeline structure, supports the RISC-V RV64IMAFCV instruction architec- ture, contains 67 external interrupt sources, and 3 bits can be used to configure the interrupt priority.”

As a software engineer, your job as a shepherd is to keep all the computing power your customers have being asked to pay for busy, but not overloaded. Don’t awaken a 64-bit core with an FPU fi you can service your immediate need (maybe it’s a temperature sensore recognizing something is hell-bound)  can be handled by a mostly 16-bit, integer-only RISC-V part. Of course, lighting up the numeric inference cores brings on a very different source of power and performance tradeoffs.

Of course, the chip has the mandatory boat of timers, PWMs, ethernet (10-100Mbps only)  and more. It really is quite ridiculous what a couple of dollars and 88 pins will buy in modern time. It’s an added bonus that these parts are expected to be available with less than a 104-week lead time. 🙂

These look like very cool chips and I look forward to seeing board from the likes o Sipeed, and maybe Pine64 or BeagleV very soon. I haven’t seem formal pricing yet, but I expect to see full boards for less than comparable D1 boards, but to have the added benefits of standard compliance (ahem, those page table bits and jumping the gun on V without pushing it into the reserved opcode space…) over the Allwinner parts. These should be priced way under the JH-7110’s, but have the edge of NPU’s (particularly when pairdd with Sipeed’s new MaixWHATISTHATCALLED?LOOKITUPROBERT) library that makes NPU/Tensor-style programming pretty easy..

Programmers, what tools do you need to see to takme these boards?
Hardware types, what playgrounds can you build for the programmers to fill?

Eventually: cc to lupyuen, caesar, bouffalo team, others for comments…

The much anticipated products from Sipeed, The M1S Dock and M0 Sense are now being delivered to customers. Mine arrived in the U.S on December 20, to my surprise as the tracking number never fired on USPS Informed Delivery and Fedex did not announce the delivery. These were purchased boards and are not prerelease.

M1S Dock

M1S Dock is a board with the Bouffalo BL808 Processor. It features three RISC-V cores: one 480Mhz 64-bit -T-head D906 variant that’s similar to the one in Allwinner’s D1 (including the outdated 0.7.1 vector unit, alas), one 320Mhz T-Head 32-bit E907 for coprocessing, and one low-power 150 Mhz T-Head RV32EMC core for super low power use, such as keyword recognition to awaken the others on demand. As a bonus, it contains NPU BLAI-100 (Bouffalo Lab AI engine) for video/audio detection/recognition.

The M1S Dock starts at $10.80 for the board with headers and ranges to $24 with camera, LCD, and case.

The device supports:

  • 2.4 GHz 802.11 b/g/n Wi-Fi  4
  • Bluetooth 5.x dual mode (classic + BLE)
  • IEEE 802.15.4 for Zigbee
  • 10/100M Ethernet through add-on board

There is 64MB of RAM and a “real” MMU with RV32, so while you’re not going to run your favorite Fedora workstation-class configuration on it, a ‘normal’ embedded Linux kernel and supporting utilities is quite practical.

Optional peripherals from Sipeed, pictured below, include the display, a debug board (which features yet another RISC-V part, the BL706, to bit-bang the debug protocol (which appears to NOT be JTAG), a camera, and a hard plastic case.

Image of M1SDock and M0Sense
M1SDock and M0Sense

Assembling the case is best described as painful. While it looks like a flexible silicone case, it’s not. It’s a hard plastic with a rubbery texture. The screen has to be removed from the double-stick tape holding it to the board, have the screen passed through the hole, have the screen fastened to the board, and then the board threaded into the case. Since the double-sided tape for the screen has a small area, I’m not expecting to be able to remove and re-insert the screen very many times.  If I’d known what a pain it was, I wouild have certainly soldered down the provided .100 posts before mounting it.

Image of Back of M1s Dock
Back of M1s Dock
Image of Front of assembled Sipeed M1s Dock
Front of assembled Sipeed M1s Dock

 

Sipeed has done well providing documentation for the M1S Dock, including pinouts, a full SDK (with Bouffalo Labs) , AI Model and Framework, and a handy drag & drop approach to burning firmware. and many M1S Dock demos.

M0 Sense

Also delivered are the M0Sense boards. These are a lovable little alternative to nRF52480-class hardware. The featured processor is the BL702 at 144Mhz. Twelve of the sixteen pins are available I/Os and the board comes with Bluetooth, including BLE. The SiFive core is attached to 132K of ram and 512K of flash. The board provides an IMU and a USB Full-speed (12Mbps) interface. Computationally they may not take the dual-cores (and PIO) of the RP2040 products, but these are great alternatives in the RISC-V world that offer easy programming and plenty of powerful I/O.

The board starts at $4.50 USD. Adding the .96 screen makes it $5.99.

Sipeed has done well providing documentation for the M0Sense, including pinouts, a full SDK (with Bouffalo Labs) , AI Model and Framework, and a handy drag & drop approach to burning firmware. and many M0 Sense demos.

Summary

Between these boards, you have a very low-end sensor board with ML abilities for $4 that includes I2C, SPI, and all the normal things to connect to your own sensors AND a relatively high-end MCU with a dedicated ML coprocessor. With M1S Dock being a cousin to Pine64’s OX64, we’re sure to see a ton of software development around them. They’ve taken the sharp edges of Bouffalo’s unpleasant boot loader by providing a drag-and-drop capable boot loader. The BL808’s available RAM, performance, and price really makes it difficult to lean into the Kendryte K210 class of boards as we enter 2023.

I really look forward to exploring these boards in coming weeks and months. What do you plan to do with them?

I don’t have enough LilyGO products in my lab; I should probably have more. They seem to make clever products aimed at the developer/hobbyist market (that’s me!) but it seems that they get undercut by features on on one product, then are months late to market on the next, It seems they manage to remain on my radar while escaping a place on my bench. I learned of T-LilyGO, which is an improved version of what’s best known as a Speed Longnan Nano (GD32V) just a few weeks after I bought a bucket of Nanos.

LilyGO’s latest product, the T-PicoC3, manages to pull a unique development twist. The product marriage is “obvious”, though I haven’t seen it done. I’m writing about this because of one obscure feature. First, let’s explore roots of what makes it awesome.

The two chips that make T-Pico3 great

T-Pico3 is about $15USD shipped to the US. It manages to use not one, but TWO of the season’s most deservedly buzz-worthy MCUs. Both the ESP32-C3, a RISC-V part with really great pin density and development SDK support, and the Pico 2040 – as well as an onboard antenna, an external IPEX connector, a 7789 display controller w/ 1.14″ LCD, tons of GPIO, and way more in the familiar dev board size. Barrels of (virtual) ink have been written on whether the ESP32-C3’s SiFive-backed RISC-V core or the RP2040’s dual ARM Cortex M0’s are “better”. The answer, of course, is “it depends” and I’m not taking sides in this article. But what if you’re a student wanting to learn both ARM and RISC-V and you don’t want to choose between your peanut butter and chocolate, but just want one yummy(!) product that mixes them both?

T-PicoC3_en.jpg

So on one tiny PCB, they deliver the dual-core+specialized bit-banging capacity of RP-2040 (which has been used to bitbang DVI (!) and countless light chasers based on WS2812, which is a protocol with odd timing requirements) with the ESP32-C3 seemingly left to handle a full TCP-IP stack, Bluetooth, compression, security, and other such tasks while “additionally” being a quite capable 160Mhz RISC-V core of its own.

Putting “the two great tastes that taste great together” sounds like a good idea (let one chip specialize in networking, two RP2040 PIO bit cores blink out a CAN bus or 2812 (or other) LED blinkies) and hold it all down with some MicroPython on either (or both) of the Cortex M0 cores. Whether you’re a student that really just wants to backpack a single board for both ARM and RISC-V programming or you’re building a robotics or IoT thing, it’s just easy to imagine these going well together. You can make crazy combinations of dedicated interrupt controllers, GPIO controllers, interrupt domains, etc. MIx them up as you see fit.

…because I’m writing while hungry.

Debugging your creation

Normally, for ease of debugging, I nod to the ESP32-C3. Inside the part are dedicated cores that run a FTDI-like parallel controller that can be used for JTAG debugging AND a USB communications class, so your device’s serial console can appear on the same plug you’re powering the unit from. For someone that values mobility while debugging, it’s awesome. So those are the obvious connections for the USB lines.

But how do you debug the RP2040?

I’m normally a big fan of USB-C. Beyond the speed, the power – both in literal volts and amps and in the capacity of devices you can attach – I dig that they can be flipped end for end (no “host” and “target” end) and that they can be flipped from top to bottom, making them impossible to plug in upside down. The receptacles are symmetric. Compliant USB-C cables are only kind symmetrical.  It’s lesser known that the tops and the bottoms of the plugs are actually not fully asymmetrical.  The controllers actually engage in strategic lies that pull off the flippability trick. These same strategic lies are what allows the cables to smuggle additional kinds of data, such as Thunderbolt signals or GPIO and JTAG pins in the case of the breakout board for Pine64‘s Pinecil. Even with these mistruths, it’s common practice to uphold the guidelines for the cable to work either way.

Do you hate your users?

For a hybrid board like we’ve described above where we’re trying to attach two quite different SoC’s to the host, the “obvious” thing to do would be to add something like CH340 to give USB powers to the RP2040 console. Then you add a USB hub and tie both chips behind the hub, allowing all three devices to appear to the host. A more sophsticated design might tie the 2040 instead to a serial from the ESP32-C, but then you lose USB-master mode for the 2040. A circuit-layout Jenga master may have been able to find a USB-C pinout that let one device ride shotgun on the bus of another along the approach of the Pinecil’s exposed JTAG lines but I think that has compromise if you’re pretending to be a host instead of a target. Instead, we’re left to imagine this conversation happening within LilyGO’s engineering:

“What if we built an interface that worked completely differently if plugged upside down?”

“Why would you build a Cursed USB-C device? Do you hate your users?”

“What if we made it blink different colors, but unreliably?”

“OK.”

So our imagined engineers dutifully run off and after a little USB Selective Disobedience, successfully deliver power and ground – safely – in either orientation. (That’s the red and green in the below diagram and that’s intentionally made infallible.) With one mating between the cable and the board, the ESP32 has ownership of D+ and D- so the JTAG and serial ports associated with the RISC-V side of the house are presented to the host. You can use esptool to program and manage that device or program it via JTAG. In the other orientation, RP2040 gets the port and it’s either a USB mass storage device awaiting a .uf2 boot file to chomp on or a connection from the Thonny Python IDE.

Imagine the top (“blue led”/RP2040) being attached to the top D+/D= pair on the right and the bottom (“Red LED”/ESP32-C3) being attached to the bottom D-/D+ pair.

USB Type-C connector Pinout

Great. Now we’ve created a product that works exactly like a ‘normal’ user would never expect it to work, but, given the target audience, this is probably OK. Only it leads to hilarious disclaimers like this:

When connecting, the onboard LED lamp will be indicated according to the connected chip (due to cable problems, it is possible that the indicator light is opposite to the actual connected chip, or even two LED lights at the same time, please replace another cable when two led lights up at the same time)

 

The moral(s) of this story

Morel Mushroom In Leaves Close-up   No, no, no. Those are morels. They’re different.

  1. LilyGo makes some pretty cool stuff. Their products aren’t necessarily destined for “Raspberry Pi” levels of creativity and ubiquitousness, but they have some nifty and fun mixups that can save a budding EE (or a struggling SWE) from rolling their own designs. Many of their products are straight-forward mashups of existing low-cost circuits, but on one convenient board. Not everything needs to be complicated to be useful.
  2. Sometimes, there just are not points awarded for style. It’s easy to imagine a $15 product that may spend a semester or two inside a students backpack or a one-off that’s inside your IoT robot that needs both WiFi and finely controlled WS2812 “lasers” – or, heck, real lasers cutting into or measuring something. These may be programmed less than a dozen times before they’re retired (the first case) or sent into duty and unlikely to be connected to a PC ever again (the second case) As long as the users are in on the ‘joke’ that a cable that should never need to be flipped sometimes needs to be flipped, maybe that’s OK. Making that connection twice as reliable but requiring twice as many cables, a hub, defining the interaction of all these devices potentially controlling  the bus at the same time, etc. is money you’ll never get back.
  3. It’s absolutely not, however, OK to do this to an end-user, mass produced product unless you do, in fact, hate your users. (Hint: they will retaliate somehow….) Consumers want standard things to work in standard ways.
For a while, SiFive and LLVM were both developing support for RISC-V Vector 1.0. LLVM is now the only one in active development.

Jim Wilson, a GNU developer of decades, works for SiFive and while he wasn’t the one doing the work, it seemed likely he knew who did, so this is authoritative. He recently said:

There is no actively maintained gcc rvv support, and no ongoing gcc rvv development. Current work is all in LLVM, and LLVM is recommended if you want rvv support. … SiFive abandoned the gcc rvv work and is doing only llvm rvv work now. The gcc rvv branch is badly out of date.

Reading deeper into the GCC development list, this is really just a form of tough love as work on GCC’s RISC-V vector (and auto-vectorization in general) has been talked down before in July, 2021: 

 It isn’t up to date with the evolving RVV ISA spec, it isn’t up to date with the evolving RVV intrinsics spec, there are ugly hacks in the vectorizing optimization passes required to make it work, there is no autovectorization support, it is missing basic optimizations like eliminating duplicate vsetvli instructions, etc. The current status is that it is only useful as a toy for demos. SiFive and a few other organizations are contributing to the LLVM vector support, but no one is contributing to the gcc vector support. Alibaba has expressed some interest in contributing recently but it isn’t clear how we will handle their patches yet. The current stuff was mostly done by SiFive, but SiFive is not currently interested in funding this work.

I’m less sure of rjiejie‘s credentials, but they may work for Alibaba/T-Head, the maker of the cores used by Allwinner in D1 and D1s. That may be the “we” in his comment:

We have also supported/maintained the RVV v1.0 feature, you could download prebuilt gcc toolchain from Alibaba website[1].

Registration for the site is required and Google Translate doesn’t handle it well, so I’m not sure, but that may be a path for someone really needing GCC with Vector 1.0. It’s not clear if that work handles 0.7.1 of Vector as was used in D1. The C910-based products also seem to support 0.7.1, so their 1.0 support must be for future chips.

LLVM is one of several projects that has struggled with the issues of handling multiple V versions in the same code, but their resolution wasn’t clear. Simulator QEMU “solved” them problem when adding 1.0 by dropping support for Vector 0.7.1.

SiFive is “only” one of many core vendors and that they can’t be expected to carry the development/maintenance/support for such things by themselves, but it’s surprising (to me) that they’ve halted development.

T-Head has binaries (and maybe source) for GCC that supports V1.0 and probably 0.7.1, though that may be in a branch as it’s pretty clearly a dead end now that V1.0 has been ratified. LLVM and SiFive were, at least at one time, partnering in LLVM development. LLVM seems to have an active plan and are shipping V1.0 support now.

For GCC’s status to be “useful as a toy for demos”, it’s probably a disservice to even have it in the default builds of GCC until someone is willing to fund the couple of person-years that Jim mentions to get it on track. At least bugreports to LLVM are likely to get traction as it’s actively developed.

For now, if you’re developing vector code on RISC-V, prepare to pair your toolchain with the chip/simulator you’re using. It’s likely to be finicky for a while.

We’re all familiar with the fable of the boiling frogs, unable to sense the change they’re (literally!) immersed in. Enthusiasts of RISC-V architecture may be encountering the same right now: late 2020 gave us a steady stream of new hardware announcements, but we may not have a great sense of us since the hardware isn’t always possible to order yet. Let’s review some of the upcoming products in this market, duly nothing that products can change or get canceled before they even ship.

We had two major new families of entries in the iOT category. Both use the RISC-V to drive WiFi and Bluetooth radio stacks. Bouffalo Lab’s BL602 is available in quantity now. Starting around $2.50 for a module with multiple development boards in the $5-$10 range (including Pine64’s Nutcracker for PineCone and the DoIt DT-BL10), this chip starts with a core from SiFive and has 802.11 b/g/n and Bluetooth 5. The upcoming BL-702 family adds Zigbee radios. There is enough compute resources (CPU, RAM, Timers, etc.) that you can build your own software right onto the radio chip via their multitasking OS and open development kits. You may recognize this as the basic model popularized by Espressif in their ESP8266 in recent years.

Espressif also embraced RISC-V with their upcoming ESP32-C3 family. It’s interesting that this chip doesn’t even get a distinct name at this point as Espressif apparently sees the CPU core as only a small part of the product. Still, by volume, the ESP32-C3 is likely to become an extremely popular choice.

Moving up a step computationally, we enter more traditional chips and single-board computers. Alibaba’s Xuantie 910 is widening into a family of chips. The C906 is being marketed for more entry level class, but still featuring a load of I/O, multiple cores, support for the still-not-ratified Vector extensions, and more. Press releases tend to mix up the 910 and the 906, but they both seem pretty hot.  In late January, anAndroid Open Source Port of C910 was demonstrated. Embedded specialists Sipeed have announced a C906 development board that’ll run Debian and that starts at $12.50. If Sipeed does for that what they’ve done for GD32V and K210, we should see lots of interesting SBC projects from them.

Sipeed teases C906 RISC-V board

Rios is bringing us a claimed competitor to the Raspberry Pi called the PicoRio. It’s coming inthree stages:

  • PicoRio 1.0 is a headless, four-core RV64GC that’s capable of running Linux at 500Mhz. It’s been used from 2020H2 to an expectation of beta in 2021H1.
  • PicoRio 2.0 adds Imagination’s PowerVR GE7800 XE series GPU, which may finally bring a GPU-capable RISC-V development board into casual hobbyist price points.
  • PicoRio 3.0 strives to bring the performance to be comparable to a tablet or desktop computer.

Another entry in the Pi-class of hardware, though not at Pi Price, is the Beagle V from the group that brought us the famed Beagle Bone. It uses two of SiFive’s U74 cores at 1Ghz includes 8GiB of LPDDR4 RAM, gigabit Ethernet, an 802.11n Wi-Fi + Bluetooth 4.2 chipset, and a dedicated hardware video transcoder supporting H.264 and H.265 at 4K and 60fps.The system also offers four USB 3.0 ports, a full-size HDMI out, 3.5mm conventional audio jack, and a 40-pin GPIO header. As a snack for those interested in AI applications, it also features  a Tensilica Vision VP6 DSP for machine-vision applications, a Neural Network Engine, and a single-core NVDLA (Nvidia Deep Learning Accelerator).

Core provider SiFive is bolting Freedom U740 cores to a min-ITX design in HiFive Unmatched. X16 PCIe expansion, 16GB of DDR4 RAM, NVME M.2 slot, Gigabit ethernet, and four cores at 1.4Ghz should make this a entry-level desktop-class system, including host-CPU class of building for native applications at full scale. For professional developers, the $665 entry ticket should be more appealing that the $999 for the board’s predecessor, Unleashed.

The PicoRio V1 and Unmatched have already slipped from Q4 into 2021.

Still, while we’re not bathing in fresh alternatives to the GD32V and K210, we have several alternatives on the proverbial launching pad and several options to bring excitement into lives and toolboxes of RISC-V aficionados.

What do you see coming up? What are you most anxious to work with?

 

The GD32VF103 RISC-V System-on-chip from Gigadevices fit an amazing price to performance rate. Their 108Mhz speed, on-board RAM, and low cost (parts around $1.30USD with boards like Longnan Nano commonly under $5) make them a favorite of hobbyists.

There’s a nuance buried in the specification of these parts that allows for faster setting and clearing of the GPIO registers than I’ve seen in any of the example code for these. This approach makes no difference if you’re just toggling a “power on” LED or other low frequency signal, but in a multitasking operating system or a high performance application, there is an easy optimization. 

Common practice

We’ll use the Longnan Nano board just to have a tangible example to talk about. GPIO pin 2 is found in the GPIOA register bank. This pin is connected to a blue LED on the board. It’s wired “backward” from the obvious meaning; you turn the bit off to make the light turn on. This means we often see code like this:

if (on) {
            ((GPIO*) GPIOA)->output_control &= ~( LED_BLUE );
} else {
            ((GPIO*) GPIOA)->output_control |= ( LED_BLUE );
}

This is a pretty common idiom in low-level code: we read the output_control register, mask off the blue bit, and store it or we read the output control register, logically or in the blue bit, and we store it. While we can do better if we use dedicated functions to differentiate off and on or if we can rely on inlining and constant propagation, as a matter of perspective, it takes GCC about 44 bytes to implement this.

Hazards lie ahead!

This code also has problems in a multitasking or preemptive environment. What if something ELSE is modifying any other bit in the GPIO A outputs? Maybe the hardware people helpfully put the bit for the LED in the same register as the launch missile bit. (Thanx, guys!) Maybe you have a multitasking OS and something else may interrupt your access to GPIOA between the time you do the load and the time you do the store. (With blinking LEDs and nothing else on the GPIO, as is the case for a Nano with no external hardware, this doesn’t matter). In real life code, you probably need to raise an interrupt priority level or grab a mutex on the GPIO or something else to prevent competing code from stomping on the reads and writes. To help visualize the problem, let’s look at the generated code. (This is for the red LED that’s on pin 13 of GPIOC, but follow the problem.)

0x08008e1a <+28>:	lui	a4,0x40011
0x08008e1e <+32>:	lw	a5,12(a4) # (MARK A) offset 12 at 0x40011 is the GPIO C register. Read that into A5
0x08008e20 <+34>:	lw	s0,12(sp) # this is just the compiler restoring the saved s0 register so we can return later.
0x08008e22 <+36>:	lui	a3,0x2.   # Since this is bit #13 and we can only load immediate 12 bits, load upper of a3 here.
0x08008e24 <+38>:	or	a5,a5,a3. # or the bits in A5 (that we read out of the chip) with or 0x20000 to set bit 13
0x08008e26 <+40>:	sw	a5,12(a4) # (MARK B) store that into the output register.

If anything else touches that register between MARK A and MARK B, Bad Things are going to happen and you may risk launching missiles instead of blinking a light depending on what else is in that register. This is why you probably need to brace it with a mutex or whatever is appropriate for your system.

There must be a better way!

There is a better way and it’s unique to the GPIO registers, but it seems like something that Gigadevices brought forward from ARM-land when they “found inspiration” in the GPIO system of Blue Pill, which is very similar. Join us now on page 104 of the 536 page hymnal, GD32VF103 User Manual EN V1.0.

There is no need to read-then-write when programming the GPIOx_OCTL at bit level, user can modify only one or several bits in a single atomic APB2 write access by programming ‘1’ to the bit operate register (GPIOx_BOP, or for clearing only GPIOx_BC). The other bits will not be affected.

That’s pretty awesome! The chip will guarantee atomicity. All we have to do is write the bit number into the GPIOx_BOP to set the bit or the bit number into GPIOx_BC to clear that GPIO line. Going back to our example of the blue LED in GPIOA that’s on bit 2, we can thus write 1 << 2, which is 4 into GPIOA_BOP to turn off the LED (remember, on the demo board, they’re backward) or write a 4 into GPIOA_BC to turn it on.

((GPIO*) GPIOA)->bit_op &= ~( LED_BLUE );

We can’t affect any other bits in the register and that means we don’t have to read it and we don’t have to worry about atomicity issues needing to grab a mutex or raise the spl. When we look at the equivalent of the code above, once all the conditional stuff is stripped away in the same way.

0x08008db0 <+6>: lui a5,0x40011 # 0x40011 << 12 - 2028 = 0x40010814
0x08008db2 <+8>: li a4,4 # load up our bit number into A4
0x08008db4 <+10>: sw a4,-2028(a5) # store a4 into  40010814

The same store to 0x40010814, bit_clear, would turn off that GPIO pin.

This appears to be unique to the GPIO registers in the GD32V line.  The comparable GPIO registers in competing parts like the Kendryte K210 don’t have this feature. 

In a standalone, general purpose function like this, the measurements are small. If you’re able to reduce these to functions or templates that have constant arguments and can be inlined, but don’t need to gra a mutex, it’s a potentially large difference.

It’s easy to argue that if saving a few clock cycles on GPIO accesses in 2020 is a priority, that you’ve lead a bad life and are being punished. That may be true, but that’s the life of an embedded systems engineer. A store of a constant to a constant address is usually “better” than a read, a modify, and a write. If that GPIO access is controlling the laser that’s cutting into your eyeball, you may appreciate the code being as streamlined as you can get.

Longnan Nano with GD32V MCU and an OLED display.

I haven’t talked much about it (this being my first post and all…) but I’m pretty smitten with the current wave of RISC-V processors. I really like that it’s an open development platform, which means chip companies can buy or create the core that works with the basic instruction set and the programmers have a consistent and shared set of tools for the entire line ranging from $.10 parts for a home thermostat to a workstation class part. Rising tides and all that.

As a software guy, there are chip vendors that I’ve never heard of, but Bouffalolab Lab seems to be a pretty new name to most of us in the U.S. They’re making waves with the announcement of the BL602 and BL604 Systems-on-Chip (SOC). This part is just becoming available in the final week of October 2020 and it’s looking pretty interesting.

There are two parts in the immediate family: the BL602 and the BL604. The two parts differ by the number of GPIO pins and thus, the size of the external package. BL602 has 16 GPIOs and comes in a 32-pin QFP. The BL604 bumps to 23 GPIOs and rides in a 40-pin QFP. The 32-pin part should come in around 5mm per edge, so it’ll fit in really small applications. The integration is high enough that on the 32-pin part, half the pins are devoted to GPIO with the rest being the mandatory 3.3V, ground, crystal in, grounding, and other housekeeping. Outside the party, but similar enough that the SDK references them sometimes by accident, in the BL606 and 608. On those you go to ARM cores and trade wireless for audio, but that’s not our focus today.

It has 2.4Ghz radios, so that covers BTLE 5.0 and 802.11 b/g/n. Wi-Fi through WPA3 is supported. The microprocessor core runs at 192Mhz and comes from the SiFive cores. JTAG support is included and it claims support for the Segger family of debugging pods and software. It has 276K of SRAM, which is a really nice step up from the closest competing part, the GD32V’s which pack a mere 32K. Both parts have 128K of flash. There are two UARTs, but no USB controller like the GD32V. It has the standard alphabet soup that we expect in a post ESP8266 including SDIO, SPI, I2C, and such.

Bouffalo is a Chinese company and the documentation reflects that. Google Translate helps bring the doc and some code comments to us, but the reality is that many, many sections in the manual are just blank for now. The 34 page datasheet is oriented toward hardware developers, but does throw the smallest of bones to programmers. For example, it doesn’t tell you WHAT the UARTs are (it’s not a 16554) but it tells you where they are in memory.

A few companies are racing to put boards in the hands of developers. Both SiPeed and Pine64 seem to be early movers with prices hovering around $5USD with shipping being about as much again. I’m sure we’ll see an ocean of reference designs on Aliexpress and such soon. Currently, we have three such boards:

  • Pine64’s PineCone64: USB-C (yay!) via a CH340N and with RGB LED.
  • Doi.am’s DT-BL10: Micro USB
  • SiPeed’s BL602: Micro USB. Rumored to have FTDI serial interface. Might be same as Doi.am’s

Development environments include Eclipse + OpenOCD or Freedom Studio + OpenOCD. The company provides an app called DevCube (unrelated to the VR tool of the same name) to configure devices for production. This helps IoT makers prepare the flash for partitioning and OTA upgrades. It seems inevitable that the popular PlatformIO IDE will be supported. Since it’s a familiar SiFive core, just submitting the needed flash layout and programming data is probably on the order of dozens of lines of code to support PlatformIO for BL602/BL606.

If we look past the large number of blank doc pages and the high percentage of Chinese in the SDK, what do we find? This is actually the jackpot for RISC-V designs – there are working Mac, Linux, and Windows (Cygwin) GNU-based toolchains right in the tree at launch. Filenames are a bizarre mix of CamelCase and underscore_separators, even sometimes in the same directory. It ships with GCC 8.3, but since RISC-V was upstreamed a while ago, I’d expect upgrading to later versions to be easy. It relies heavily on Amazon’s FreeRTOS as the core. The entire package relies on external source (FreeRTOS, GNU Tool binaries, compression libs, etc.) by copying and not via git-referencing the modules by inclusion. This means you’re probably always on a stable build, but it also almost insures that you’ll always be behind releases of third party code, which can be bad in network-connected devices.  Amazon’s MQTT and Jobs interfaces are well represented. The expected libraries for a high-functioning 32-bit core are all there, with a pretty full ANSI libc with printf, compression and a filesystem for the flash. Startup code is a pretty straight SiFive example for GCC. There are 31 separately buildable apps for demonstrating features of the device, including a CLI for interfacing with it somewhat like boot monitors of days past. “Hello, World” is there, of course, but ironically doesn’t contain the string “hello” as it gets printed in a callback from the initialization state machine as it advances.

The network modules are, unfortunately, binary blobs at this time. I expect the community to make short work of changing that.

Overall, this looks like a good entry for the bottom/low-end space for wirelessly connected (or not) devices for hobbyist and commercial applications. The SiFive core is well regarded for performance and the performance should be better than the GD32V’s 108Mhz while the price lower than the AI-centric dual-core 64-bit Kendryte K210.

Start your engines with:

Official Boufallo SDK
Pine64 is offering hardware to developers that help get BL602 started.
SiPeed (makers of Longnan and Maix lines) has a BL602 SDK.
Doit has a $4 eval board with a BL602 SDK.
All three of those are forks from the Boufallo code. We don’t know yet how different the hardware is.

RJL