For a while, SiFive and LLVM were both developing support for RISC-V Vector 1.0. LLVM is now the only one in active development.

Jim Wilson, a GNU developer of decades, works for SiFive, and while he wasn’t the one doing the work, it seems likely he knows who was, so I’m treating this as authoritative. He recently said:

There is no actively maintained gcc rvv support, and no ongoing gcc rvv development. Current work is all in LLVM, and LLVM is recommended if you want rvv support. … SiFive abandoned the gcc rvv work and is doing only llvm rvv work now. The gcc rvv branch is badly out of date.

Reading deeper into the GCC development list, this is really just a form of tough love as work on GCC’s RISC-V vector (and auto-vectorization in general) has been talked down before in July, 2021: 

 It isn’t up to date with the evolving RVV ISA spec, it isn’t up to date with the evolving RVV intrinsics spec, there are ugly hacks in the vectorizing optimization passes required to make it work, there is no autovectorization support, it is missing basic optimizations like eliminating duplicate vsetvli instructions, etc. The current status is that it is only useful as a toy for demos. SiFive and a few other organizations are contributing to the LLVM vector support, but no one is contributing to the gcc vector support. Alibaba has expressed some interest in contributing recently but it isn’t clear how we will handle their patches yet. The current stuff was mostly done by SiFive, but SiFive is not currently interested in funding this work.

I’m less sure of rjiejie’s credentials, but they may work for Alibaba/T-Head, the maker of the cores used by Allwinner in the D1 and D1s. That may be the “we” in their comment:

We have also supported/maintained the RVV v1.0 feature, you could download prebuilt gcc toolchain from Alibaba website[1].

Registration for the site is required and Google Translate doesn’t handle it well, so I’m not sure, but that may be a path for someone really needing GCC with Vector 1.0. It’s not clear if that work handles 0.7.1 of Vector as was used in D1. The C910-based products also seem to support 0.7.1, so their 1.0 support must be for future chips.

LLVM is one of several projects that have struggled with handling multiple V versions in the same code, but their resolution wasn’t clear. The QEMU simulator “solved” the problem when adding 1.0 by dropping support for Vector 0.7.1.

SiFive is “only” one of many core vendors and can’t be expected to carry the development/maintenance/support for such things by themselves, but it’s surprising (to me) that they’ve halted development.

T-Head has binaries (and maybe source) for a GCC that supports V1.0 and probably 0.7.1, though the 0.7.1 support may be in a branch as it’s pretty clearly a dead end now that V1.0 has been ratified. LLVM and SiFive were, at least at one time, partnering in LLVM development. LLVM seems to have an active plan and is shipping V1.0 support now.

For GCC’s status to be “useful as a toy for demos”, it’s probably a disservice to even have it in the default builds of GCC until someone is willing to fund the couple of person-years that Jim mentions to get it on track. At least bug reports to LLVM are likely to get traction as it’s actively developed.

For now, if you’re developing vector code on RISC-V, prepare to pair your toolchain with the chip/simulator you’re using. It’s likely to be finicky for a while.
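
One cheap way to catch a toolchain/chip mismatch early is to ask the compiler what it thinks it is building for. This is only a minimal sketch: it assumes the compiler follows the RISC-V C API convention of defining `__riscv_vector` when V is enabled and `__riscv_v` as a version code (1000000 for 1.0). Vendor toolchains for 0.7.1 may define something else entirely, so check your compiler’s documentation. `rvv_build_version` and `print_rvv_status` are hypothetical helper names, not part of any library.

```c
#include <stdio.h>

/* Hypothetical helper: reports which Vector version this file was
 * compiled for, or 0 if the compiler was not told to use V.
 * Assumes the RISC-V C API macros __riscv_vector and __riscv_v. */
static long rvv_build_version(void)
{
#if defined(__riscv_vector) && defined(__riscv_v)
    return (long)__riscv_v;   /* e.g. 1000000 for V 1.0 */
#elif defined(__riscv_vector)
    return 1;                 /* V enabled, but no version macro */
#else
    return 0;                 /* no vector support in this build */
#endif
}

/* Hypothetical helper: print the result in human-readable form. */
static void print_rvv_status(void)
{
    long v = rvv_build_version();
    if (v == 0)
        puts("built without RISC-V Vector support");
    else
        printf("built with RISC-V Vector, version code %ld\n", v);
}
```

Calling `print_rvv_status()` at startup (or turning the check into an `#error`) at least tells you what the compiler believes before you find out the hard way on real silicon.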

I’ve been away from writing for a bit for personal reasons and I’ve missed talking much about many events in the RISC-V world this year. Here’s a jumble of thoughts from October of 2021.

Low Points: BeagleV Starlight canceled, Nezha/D1 launch issues

I was lucky to have tinkered with the prerelease BeagleV board (codenamed Starlight) that featured the StarFive JH-7100 SoC. It was well documented, had good tooling, and had an amazing technical group of active participants from both corporate and hobbyist backgrounds, all working together on merit. Antmicro’s ‘Renode’ emulator made developing on these parts a breeze.

Unfortunately for the business, BeagleV/Starlight and StarFive were unable to reach a production agreement and the BeagleV Starlight project was cancelled. I did some software and hardware work that went into the proverbial chipper, but I managed to learn and refine some skills along the way. I remain hopeful that the low-volume (two-core) JH-7100 and later (quad-core, embedded GPU, PCIe) JH-7110 will be delivered at a similar price point by the likes of Antmicro or Radxa, the latter of which has already missed its ship date. It’s all resulted in some thrash, but it’s possible that all the players (Beagle, Antmicro, Radxa, StarFive) dust themselves off and ship RISC-V boards.

I have possession of a Nezha developer board. This is the official development board made by Allwinner as a vehicle for their D1 chip. By contrast to StarFive, documentation on this device and board is poor. The maker of the chip and the board, Allwinner, has a pretty poor record of playing nicely with open source developers, with license violations being common. When it was at a price point similar to BeagleV, it seemed an underdog as a single-core device, but it did have the claim to fame of being the first shipping device supporting the RISC-V Vector extension. I’ve been a member of a few different Discord/Slack/Telegram groups for this device and they’ve all been dominated by people stuck at the starting line: just finding a maintained distro that doesn’t require a login in Chinese and a phone number in China is a common challenge.

Unfortunately for many developers, D1 supported only 0.7.1 of Vector, which has source and binary incompatibilities with the final 1.0 version of that extension which is currently (October 2021) in final stages of public review. This part also really requires Allwinner’s own use of GCC/Binutils to use these extensions well. Interestingly, the RISC-V part of this SoC comes from Alibaba’s XuanTie C906 line, which was itself recently open-sourced, though there have been serious issues trying to land Alibaba’s incompatible work in upstream projects like GCC and QEMU.

I’d love to be able to comment more on the actual development board, but can’t, as my board appears to be totally DOA. I hope to be able to write more about it soon.

This board gets the (somewhat deserved) criticism of being overpriced when compared to high-volume devices like Pi and the (awkward) criticism of having a single 1.0GHz core and relying on an old version of the Vector specification. As the final version still doesn’t exist and it just plain takes a while to get from Verilog to real silicon, we can be only so mad at the first chip to support even a pre-release V spec. We can be more upset that the chip requires violating the RISC-V specification on reserved bits in the paging machinery. All this does lead to an up-looking highlight to finish up this catch-up.

On the Horizon: Allwinner D1s/F133

This week, there’s been interest in a new revision of the D1. The Allwinner D1s (sometimes called the “F133” for reasons I haven’t yet grasped) is a cost-optimized version of the original D1. Where the D1 really seemed to ship only with their own development board, Nezha, the D1s seems to come out of the gate ready for the likes of Seeed Studio and Mango Pi’s ~$10USD RISC-V board, or in low quantity to put on your own open-source boards, like Xassette.

It’s a slightly confusing product, but some of that may just be translation/documentation issues. It’s cost-reduced, and that filters through to the boards we’ve seen so far. There’s 64MB of RAM on board, but it’s sold as “Linux ready”. The removal of HDMI signaling means no monitor and 64MB will require a very stripped down system. Cramming Linux into the 8MB on a K210 was (barely) possible, so this must be possible, even if cramped. Still, for a single-purpose or educational environment, that’s probably OK. The Allwinner F133 overview avoids any comparison to D1, refers to itself as a “video decoding platform”, and even avoids use of the phrase “RISC-V” completely.

It’s interesting that one of the most controversial RISC-V chips of 2021 managed to ship a second revision this year while we have so many that have just seemingly collapsed under their own weight or never found their legs beyond original announcements. (Blink twice if you’re alive, PicoRio!) 

As we approach the end of the year, we’ve had quite some changes in the RISC-V ecosystem. It’s likely that the product families that have most met or exceeded my expectations are the BL602/706 family and Espressif’s menagerie of ESP32-C3 and ESP32-C6.

What have been your biggest disappointments or surprises in RISC-Ville? 


We’re all familiar with the fable of the boiling frogs, unable to sense the change they’re (literally!) immersed in. Enthusiasts of the RISC-V architecture may be encountering the same right now: late 2020 gave us a steady stream of new hardware announcements, but we may not have a great sense of it since the hardware isn’t always possible to order yet. Let’s review some of the upcoming products in this market, duly noting that products can change or get canceled before they even ship.

We had two major new families of entries in the IoT category. Both use the RISC-V to drive WiFi and Bluetooth radio stacks. Bouffalo Lab’s BL602 is available in quantity now. Starting around $2.50 for a module with multiple development boards in the $5-$10 range (including Pine64’s Nutcracker for PineCone and the DoIt DT-BL10), this chip starts with a core from SiFive and has 802.11 b/g/n and Bluetooth 5. The upcoming BL-702 family adds Zigbee radios. There are enough compute resources (CPU, RAM, timers, etc.) that you can build your own software right onto the radio chip via their multitasking OS and open development kits. You may recognize this as the basic model popularized by Espressif in their ESP8266 in recent years.

Espressif also embraced RISC-V with their upcoming ESP32-C3 family. It’s interesting that this chip doesn’t even get a distinct name at this point as Espressif apparently sees the CPU core as only a small part of the product. Still, by volume, the ESP32-C3 is likely to become an extremely popular choice.

Moving up a step computationally, we enter more traditional chips and single-board computers. Alibaba’s Xuantie 910 is widening into a family of chips. The C906 is being marketed for a more entry-level class, but still features a load of I/O, multiple cores, support for the still-not-ratified Vector extensions, and more. Press releases tend to mix up the 910 and the 906, but they both seem pretty hot. In late January, an Android Open Source Port of C910 was demonstrated. Embedded specialists Sipeed have announced a C906 development board that’ll run Debian and that starts at $12.50. If Sipeed does for that what they’ve done for GD32V and K210, we should see lots of interesting SBC projects from them.

Sipeed teases C906 RISC-V board

Rios is bringing us a claimed competitor to the Raspberry Pi called the PicoRio. It’s coming in three stages:

  • PicoRio 1.0 is a headless, four-core RV64GC that’s capable of running Linux at 500MHz. It’s been in use since 2020H2, with a beta expected in 2021H1.
  • PicoRio 2.0 adds Imagination’s PowerVR GE7800 XE series GPU, which may finally bring a GPU-capable RISC-V development board into casual hobbyist price points.
  • PicoRio 3.0 strives to bring the performance to be comparable to a tablet or desktop computer.

Another entry in the Pi-class of hardware, though not at Pi price, is the Beagle V from the group that brought us the famed Beagle Bone. It uses two of SiFive’s U74 cores at 1GHz and includes 8GiB of LPDDR4 RAM, gigabit Ethernet, an 802.11n Wi-Fi + Bluetooth 4.2 chipset, and a dedicated hardware video transcoder supporting H.264 and H.265 at 4K and 60fps. The system also offers four USB 3.0 ports, a full-size HDMI out, 3.5mm conventional audio jack, and a 40-pin GPIO header. As a snack for those interested in AI applications, it also features a Tensilica Vision VP6 DSP for machine-vision applications, a Neural Network Engine, and a single-core NVDLA (Nvidia Deep Learning Accelerator).

Core provider SiFive is bolting Freedom U740 cores to a mini-ITX design in HiFive Unmatched. x16 PCIe expansion, 16GB of DDR4 RAM, an NVMe M.2 slot, Gigabit Ethernet, and four cores at 1.4GHz should make this an entry-level desktop-class system, including host-CPU class of building for native applications at full scale. For professional developers, the $665 entry ticket should be more appealing than the $999 for the board’s predecessor, Unleashed.

The PicoRio V1 and Unmatched have already slipped from Q4 into 2021.

Still, while we’re not bathing in fresh alternatives to the GD32V and K210, we have several alternatives on the proverbial launching pad and several options to bring excitement into lives and toolboxes of RISC-V aficionados.

What do you see coming up? What are you most anxious to work with?


It is not an exaggeration that the current wave of IoT devices owes a lot to the Espressif ESP8266 family of devices. That means a new member of this family is a big deal and it’s pretty exciting that the newest, the ESP32-C3, moves to a RISC-V core. We have a draft of the ESP32-C3 data sheet for those ready to dig in.

ESP8266, Quick History

In 2014, the ESP8266 came onto the scene, bundling a full WiFi package, including antenna, ROM, RAM, and a CPU, into a package that integrated with a Hayes modem-like command set for communicating with a host that could be as simple as an Arduino or less. Eventually, enough was learned about the core, a Tensilica Xtensa Diamond Standard 106Micro running at 80 MHz, that hackers were able to run their own code on board and often eliminate the “host” processor completely, often for under $10 at that time and in decline since.

ESP32 was the 2016 successor, bringing in Bluetooth and a more powerful integrated CPU. Available as a chip or a (FCC-tested) module that included antennas, the most common configuration was dual-core, allowing a less cramped balance of a developer’s own code with the integrated feeding of the radio stack. The Xtensa LX6 CPU core was still not widely loved by programmers, with toolchain issues remaining common.

ESP32-C3: Now with more RISC-V

Early in November 2020, we first got hints of a RISC-V design, the Bouffalo Labs BL602 family, making an attack on that market of low pin count, high integration devices striking a blow at the ESP32 price point of about $5. Late in November, we now have confirmation that (awkwardly named) ESP32-C3 is being released by Espressif as the newest member of their family, though details are only slowly coming out of China, as they do.

ESP32-C3 will be pin-compatible with the large ESP8266 family. It includes a 160MHz 32-bit RISC-V core to replace the Tensilica CPU. As you’d expect in 2020, b/g/n WiFi and Bluetooth Low-Energy (BLE) are table stakes. ESP32-C3 brings 400 kB of SRAM and 384 kB ROM.

We don’t yet know what RISC-V core they are using (SiFive, Nuclei, etc.) or if they’ve created their own.  As this is likely to be a relatively humble RV32IMAC (or less!) design, we’d expect high degrees of compatibility with the wide variety of RISC-V tools that we already have. We don’t know if the trend of binary blobs (a problem being tackled by Pine64) will remain, but it’s likely they will given the regulatory landmine around radios.

With access to the wealth of dev tools, socket compatibility with ESP8266, and Espressif’s embrace of the maker communities, this device is sure to be a hit. Unfortunately, it’s a little too early for a stocking stuffer this year, but it’s one of a series of parts that’ll make RISC-V fun to follow in 2021.

The GD32VF103 RISC-V System-on-chip from Gigadevices hits an amazing price-to-performance ratio. Its 108MHz speed, on-board RAM, and low cost (parts around $1.30USD with boards like Longan Nano commonly under $5) make it a favorite of hobbyists.

There’s a nuance buried in the specification of these parts that allows for faster setting and clearing of the GPIO registers than I’ve seen in any of the example code for these. This approach makes no difference if you’re just toggling a “power on” LED or other low frequency signal, but in a multitasking operating system or a high performance application, there is an easy optimization. 

Common practice

We’ll use the Longan Nano board just to have a tangible example to talk about. GPIO pin 2 is found in the GPIOA register bank. This pin is connected to a blue LED on the board. It’s wired “backward” from the obvious meaning; you turn the bit off to make the light turn on. This means we often see code like this:

if (on) {
    /* LED is active-low: clear the bit to light it */
    ((GPIO*) GPIOA)->output_control &= ~( LED_BLUE );
} else {
    /* set the bit to turn the LED off */
    ((GPIO*) GPIOA)->output_control |= ( LED_BLUE );
}

This is a pretty common idiom in low-level code: we read the output_control register, mask off the blue bit, and store it, or we read the output_control register, logically OR in the blue bit, and store it. While we can do better if we use dedicated functions to differentiate off and on or if we can rely on inlining and constant propagation, as a matter of perspective, it takes GCC about 44 bytes to implement this.

Hazards lie ahead!

This code also has problems in a multitasking or preemptive environment. What if something ELSE is modifying any other bit in the GPIO A outputs? Maybe the hardware people helpfully put the bit for the LED in the same register as the launch missile bit. (Thanx, guys!) Maybe you have a multitasking OS and something else may interrupt your access to GPIOA between the time you do the load and the time you do the store. (With blinking LEDs and nothing else on the GPIO, as is the case for a Nano with no external hardware, this doesn’t matter). In real life code, you probably need to raise an interrupt priority level or grab a mutex on the GPIO or something else to prevent competing code from stomping on the reads and writes. To help visualize the problem, let’s look at the generated code. (This is for the red LED that’s on pin 13 of GPIOC, but follow the problem.)

0x08008e1a <+28>:	lui	a4,0x40011
0x08008e1e <+32>:	lw	a5,12(a4) # (MARK A) offset 12 from 0x40011000 is the GPIO C output register. Read that into a5
0x08008e20 <+34>:	lw	s0,12(sp) # this is just the compiler restoring the saved s0 register so we can return later.
0x08008e22 <+36>:	lui	a3,0x2    # Since this is bit #13 and we can only load immediate 12 bits, load upper of a3 here.
0x08008e24 <+38>:	or	a5,a5,a3  # or the bits in a5 (that we read out of the chip) with 0x2000 to set bit 13
0x08008e26 <+40>:	sw	a5,12(a4) # (MARK B) store that into the output register.

If anything else touches that register between MARK A and MARK B, Bad Things are going to happen and you may risk launching missiles instead of blinking a light depending on what else is in that register. This is why you probably need to brace it with a mutex or whatever is appropriate for your system.
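
To make the window between MARK A and MARK B concrete, here is a small host-side simulation. It’s a sketch, not driver code: `fake_octl` stands in for the real memory-mapped register, `MISSILE_BIT` is an invented stand-in for whatever else lives in that register, and the “interrupt” is just a function call placed exactly where a real one could land.

```c
#include <stdint.h>

/* Simulated output-control register. On real hardware this would be a
 * volatile memory-mapped register; every name here is hypothetical. */
static uint32_t fake_octl;

#define LED_RED_BIT  (1u << 13)  /* the LED bit from the listing above */
#define MISSILE_BIT  (1u << 0)   /* stand-in for "someone else's" bit */

/* Pretend interrupt handler that sets a different bit in the same register. */
static void other_task_sets_bit(void)
{
    fake_octl |= MISSILE_BIT;
}

/* Read-modify-write of the LED bit, with the "interrupt" deliberately
 * fired between the load (MARK A) and the store (MARK B). */
static uint32_t racy_led_set(void)
{
    uint32_t tmp = fake_octl;  /* MARK A: load */
    other_task_sets_bit();     /* preemption lands here */
    tmp |= LED_RED_BIT;        /* modify the now-stale copy */
    fake_octl = tmp;           /* MARK B: store wipes out the other write */
    return fake_octl;
}
```

Starting from a register value of zero, the final value has only the LED bit set: the other task’s write has silently vanished. In this toy the lost bit is harmless; in real hardware it’s whatever that other bit controlled.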

There must be a better way!

There is a better way and it’s unique to the GPIO registers, but it seems like something that Gigadevices brought forward from ARM-land when they “found inspiration” in the GPIO system of Blue Pill, which is very similar. Join us now on page 104 of the 536 page hymnal, GD32VF103 User Manual EN V1.0.

There is no need to read-then-write when programming the GPIOx_OCTL at bit level, user can modify only one or several bits in a single atomic APB2 write access by programming ‘1’ to the bit operate register (GPIOx_BOP, or for clearing only GPIOx_BC). The other bits will not be affected.

That’s pretty awesome! The chip will guarantee atomicity. All we have to do is write a mask with the pin’s bit set into GPIOx_BOP to set that GPIO line, or into GPIOx_BC to clear it. Going back to our example of the blue LED in GPIOA that’s on bit 2, we can thus write 1 << 2, which is 4, into GPIOA_BOP to turn off the LED (remember, on the demo board, they’re backward) or write a 4 into GPIOA_BC to turn it on.

((GPIO*) GPIOA)->bit_op = LED_BLUE;  /* plain store; no read-modify-write */

We can’t affect any other bits in the register, which means we don’t have to read it and we don’t have to worry about atomicity, grabbing a mutex, or raising the spl. Here’s the equivalent of the code above, once all the conditional stuff is stripped away in the same way:

0x08008db0 <+6>: lui a5,0x40011 # 0x40011 << 12 - 2028 = 0x40010814
0x08008db2 <+8>: li a4,4 # load up our bit number into A4
0x08008db4 <+10>: sw a4,-2028(a5) # store a4 into  40010814

That store goes to 0x40010814, bit_clear, which drives that GPIO pin low. The same store to bit_op, four bytes lower at 0x40010810, would drive it high.
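
The address arithmetic in that listing is worth a second look, since the negative offset is a little surprising. A store offset on RISC-V is a signed 12-bit value, so anything past 0x7FF can’t be reached going forward; the compiler instead loads the next 4K boundary up with lui and reaches backward. A minimal sketch of the calculation (`lui_store_address` is a hypothetical helper name for illustration only):

```c
#include <stdint.h>

/* Effective address of a RISC-V "lui rd, hi20" + "sw rs, off12(rd)" pair.
 * hi20 is the 20-bit lui operand; off12 is the signed 12-bit store offset. */
static uint32_t lui_store_address(uint32_t hi20, int32_t off12)
{
    /* lui places hi20 in bits 31..12; the offset is added with sign. */
    return (hi20 << 12) + (uint32_t)off12;
}
```

`lui_store_address(0x40011, -2028)` gives 0x40010814, matching the comment in the listing, and `lui_store_address(0x40011, 12)` gives 0x4001100C, the GPIOC output register from the earlier read-modify-write example.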

This appears to be unique to the GPIO registers in the GD32V line.  The comparable GPIO registers in competing parts like the Kendryte K210 don’t have this feature. 

In a standalone, general purpose function like this, the measured differences are small. If you’re able to reduce these to functions or templates that have constant arguments and can be inlined, and that don’t need to grab a mutex, it’s a potentially large difference.
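
Here’s roughly what such inlinable helpers could look like. This is a sketch under assumptions: `gpio_regs` is a cut-down, hypothetical struct that only mirrors the three field names used in this post (the real register block has more registers in front of these, so don’t overlay this on hardware as-is), and the function names are mine.

```c
#include <stdint.h>

/* Hypothetical, simplified register block using this post's field names.
 * NOT the real hardware layout; for illustration and host testing only. */
typedef struct {
    volatile uint32_t output_control; /* read/write output state */
    volatile uint32_t bit_op;         /* write 1s to drive pins high */
    volatile uint32_t bit_clear;      /* write 1s to drive pins low */
} gpio_regs;

/* Drive one pin high: a single store, no load, nothing to lock. */
static inline void gpio_pin_high(gpio_regs *port, unsigned pin)
{
    port->bit_op = 1u << pin;
}

/* Drive one pin low: again a single store. */
static inline void gpio_pin_low(gpio_regs *port, unsigned pin)
{
    port->bit_clear = 1u << pin;
}

/* The Nano's blue LED is on GPIOA pin 2 and is active-low,
 * so "on" means driving the pin low. */
static inline void blue_led(gpio_regs *gpioa, int on)
{
    if (on)
        gpio_pin_low(gpioa, 2);
    else
        gpio_pin_high(gpioa, 2);
}
```

With a constant argument, as in `blue_led(port, 1)`, a decent compiler should be able to fold the branch and the shift away and leave just the store of a constant. Note that neither helper ever touches `output_control`.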

It’s easy to argue that if saving a few clock cycles on GPIO accesses in 2020 is a priority, you’ve led a bad life and are being punished. That may be true, but that’s the life of an embedded systems engineer. A store of a constant to a constant address is usually “better” than a read, a modify, and a write. If that GPIO access is controlling the laser that’s cutting into your eyeball, you may appreciate the code being as streamlined as you can get.

Longan Nano with GD32V MCU and an OLED display.

I haven’t talked much about it (this being my first post and all…) but I’m pretty smitten with the current wave of RISC-V processors. I really like that it’s an open development platform, which means chip companies can buy or create the core that works with the basic instruction set and the programmers have a consistent and shared set of tools for the entire line ranging from $.10 parts for a home thermostat to a workstation class part. Rising tides and all that.

As a software guy, I’m sure there are chip vendors that I’ve never heard of, but Bouffalo Lab seems to be a pretty new name to most of us in the U.S. They’re making waves with the announcement of the BL602 and BL604 Systems-on-Chip (SoC). This part is just becoming available in the final week of October 2020 and it’s looking pretty interesting.

There are two parts in the immediate family: the BL602 and the BL604. The two parts differ by the number of GPIO pins and thus, the size of the external package. BL602 has 16 GPIOs and comes in a 32-pin QFP. The BL604 bumps to 23 GPIOs and rides in a 40-pin QFP. The 32-pin part should come in around 5mm per edge, so it’ll fit in really small applications. The integration is high enough that on the 32-pin part, half the pins are devoted to GPIO with the rest being the mandatory 3.3V, ground, crystal in, grounding, and other housekeeping. Outside the party, but similar enough that the SDK references them sometimes by accident, are the BL606 and 608. On those you go to ARM cores and trade wireless for audio, but that’s not our focus today.

It has 2.4GHz radios, so that covers BTLE 5.0 and 802.11 b/g/n. Wi-Fi through WPA3 is supported. The microprocessor core runs at 192MHz and comes from SiFive. JTAG support is included and it claims support for the Segger family of debugging pods and software. It has 276K of SRAM, which is a really nice step up from the closest competing part, the GD32V, which packs a mere 32K. Both parts have 128K of flash. There are two UARTs, but no USB controller like the GD32V has. It has the standard alphabet soup that we expect in a post-ESP8266 part, including SDIO, SPI, I2C, and such.

Bouffalo is a Chinese company and the documentation reflects that. Google Translate helps bring the doc and some code comments to us, but the reality is that many, many sections in the manual are just blank for now. The 34 page datasheet is oriented toward hardware developers, but does throw the smallest of bones to programmers. For example, it doesn’t tell you WHAT the UARTs are (it’s not a 16554) but it tells you where they are in memory.

A few companies are racing to put boards in the hands of developers. Both SiPeed and Pine64 seem to be early movers with prices hovering around $5USD with shipping being about as much again. I’m sure we’ll see an ocean of reference designs on Aliexpress and such soon. Currently, we have three such boards:

  • Pine64’s PineCone64: USB-C (yay!) via a CH340N and with RGB LED.
  • DoIt’s DT-BL10: Micro USB
  • SiPeed’s BL602: Micro USB. Rumored to have FTDI serial interface. Might be same as DoIt’s

Development environments include Eclipse + OpenOCD or Freedom Studio + OpenOCD. The company provides an app called DevCube (unrelated to the VR tool of the same name) to configure devices for production. This helps IoT makers prepare the flash for partitioning and OTA upgrades. It seems inevitable that the popular PlatformIO IDE will be supported. Since it’s a familiar SiFive core, just submitting the needed flash layout and programming data is probably on the order of dozens of lines of code to support PlatformIO for BL602/BL606.

If we look past the large number of blank doc pages and the high percentage of Chinese in the SDK, what do we find? This is actually the jackpot for RISC-V designs – there are working Mac, Linux, and Windows (Cygwin) GNU-based toolchains right in the tree at launch. Filenames are a bizarre mix of CamelCase and underscore_separators, even sometimes in the same directory. It ships with GCC 8.3, but since RISC-V was upstreamed a while ago, I’d expect upgrading to later versions to be easy. It relies heavily on Amazon’s FreeRTOS as the core. The entire package relies on external source (FreeRTOS, GNU Tool binaries, compression libs, etc.) by copying and not via git-referencing the modules by inclusion. This means you’re probably always on a stable build, but it also almost ensures that you’ll always be behind releases of third party code, which can be bad in network-connected devices. Amazon’s MQTT and Jobs interfaces are well represented. The expected libraries for a high-functioning 32-bit core are all there, with a pretty full ANSI libc with printf, compression and a filesystem for the flash. Startup code is a pretty straight SiFive example for GCC. There are 31 separately buildable apps for demonstrating features of the device, including a CLI for interfacing with it somewhat like boot monitors of days past. “Hello, World” is there, of course, but ironically doesn’t contain the string “hello” as it gets printed in a callback from the initialization state machine as it advances.

The network modules are, unfortunately, binary blobs at this time. I expect the community to make short work of changing that.

Overall, this looks like a good entry for the bottom/low-end space for wirelessly connected (or not) devices for hobbyist and commercial applications. The SiFive core is well regarded and the performance should be better than the GD32V’s 108MHz while the price is lower than the AI-centric dual-core 64-bit Kendryte K210.

Start your engines with:

Official Bouffalo SDK
Pine64 is offering hardware to developers that help get BL602 started.
SiPeed (makers of Longan and Maix lines) has a BL602 SDK.
Doit has a $4 eval board with a BL602 SDK.
All three of those are forks from the Bouffalo code. We don’t know yet how different the hardware is.