I don’t have enough LilyGO products in my lab; I should probably have more. They seem to make clever products aimed at the developer/hobbyist market (that’s me!) but it seems that they get undercut by features on on one product, then are months late to market on the next, It seems they manage to remain on my radar while escaping a place on my bench. I learned of T-LilyGO, which is an improved version of what’s best known as a Speed Longnan Nano (GD32V) just a few weeks after I bought a bucket of Nanos.

LilyGO’s latest product, the T-PicoC3, manages to pull a unique development twist. The product marriage is “obvious”, though I haven’t seen it done. I’m writing about this because of one obscure feature. First, let’s explore roots of what makes it awesome.

The two chips that make T-Pico3 great

T-Pico3 is about $15USD shipped to the US. It manages to use not one, but TWO of the season’s most deservedly buzz-worthy MCUs. Both the ESP32-C3, a RISC-V part with really great pin density and development SDK support, and the Pico 2040 – as well as an onboard antenna, an external IPEX connector, a 7789 display controller w/ 1.14″ LCD, tons of GPIO, and way more in the familiar dev board size. Barrels of (virtual) ink have been written on whether the ESP32-C3’s SiFive-backed RISC-V core or the RP2040’s dual ARM Cortex M0’s are “better”. The answer, of course, is “it depends” and I’m not taking sides in this article. But what if you’re a student wanting to learn both ARM and RISC-V and you don’t want to choose between your peanut butter and chocolate, but just want one yummy(!) product that mixes them both?

T-PicoC3_en.jpg

So on one tiny PCB, they deliver the dual-core+specialized bit-banging capacity of RP-2040 (which has been used to bitbang DVI (!) and countless light chasers based on WS2812, which is a protocol with odd timing requirements) with the ESP32-C3 seemingly left to handle a full TCP-IP stack, Bluetooth, compression, security, and other such tasks while “additionally” being a quite capable 160Mhz RISC-V core of its own.

Putting “the two great tastes that taste great together” sounds like a good idea (let one chip specialize in networking, two RP2040 PIO bit cores blink out a CAN bus or 2812 (or other) LED blinkies) and hold it all down with some MicroPython on either (or both) of the Cortex M0 cores. Whether you’re a student that really just wants to backpack a single board for both ARM and RISC-V programming or you’re building a robotics or IoT thing, it’s just easy to imagine these going well together. You can make crazy combinations of dedicated interrupt controllers, GPIO controllers, interrupt domains, etc. MIx them up as you see fit.

…because I’m writing while hungry.

Debugging your creation

Normally, for ease of debugging, I nod to the ESP32-C3. Inside the part are dedicated cores that run a FTDI-like parallel controller that can be used for JTAG debugging AND a USB communications class, so your device’s serial console can appear on the same plug you’re powering the unit from. For someone that values mobility while debugging, it’s awesome. So those are the obvious connections for the USB lines.

But how do you debug the RP2040?

I’m normally a big fan of USB-C. Beyond the speed, the power – both in literal volts and amps and in the capacity of devices you can attach – I dig that they can be flipped end for end (no “host” and “target” end) and that they can be flipped from top to bottom, making them impossible to plug in upside down. The receptacles are symmetric. Compliant USB-C cables are only kind symmetrical.  It’s lesser known that the tops and the bottoms of the plugs are actually not fully asymmetrical.  The controllers actually engage in strategic lies that pull off the flippability trick. These same strategic lies are what allows the cables to smuggle additional kinds of data, such as Thunderbolt signals or GPIO and JTAG pins in the case of the breakout board for Pine64‘s Pinecil. Even with these mistruths, it’s common practice to uphold the guidelines for the cable to work either way.

Do you hate your users?

For a hybrid board like we’ve described above where we’re trying to attach two quite different SoC’s to the host, the “obvious” thing to do would be to add something like CH340 to give USB powers to the RP2040 console. Then you add a USB hub and tie both chips behind the hub, allowing all three devices to appear to the host. A more sophsticated design might tie the 2040 instead to a serial from the ESP32-C, but then you lose USB-master mode for the 2040. A circuit-layout Jenga master may have been able to find a USB-C pinout that let one device ride shotgun on the bus of another along the approach of the Pinecil’s exposed JTAG lines but I think that has compromise if you’re pretending to be a host instead of a target. Instead, we’re left to imagine this conversation happening within LilyGO’s engineering:

“What if we built an interface that worked completely differently if plugged upside down?”

“Why would you build a Cursed USB-C device? Do you hate your users?”

“What if we made it blink different colors, but unreliably?”

“OK.”

So our imagined engineers dutifully run off and after a little USB Selective Disobedience, successfully deliver power and ground – safely – in either orientation. (That’s the red and green in the below diagram and that’s intentionally made infallible.) With one mating between the cable and the board, the ESP32 has ownership of D+ and D- so the JTAG and serial ports associated with the RISC-V side of the house are presented to the host. You can use esptool to program and manage that device or program it via JTAG. In the other orientation, RP2040 gets the port and it’s either a USB mass storage device awaiting a .uf2 boot file to chomp on or a connection from the Thonny Python IDE.

Imagine the top (“blue led”/RP2040) being attached to the top D+/D= pair on the right and the bottom (“Red LED”/ESP32-C3) being attached to the bottom D-/D+ pair.

USB Type-C connector Pinout

Great. Now we’ve created a product that works exactly like a ‘normal’ user would never expect it to work, but, given the target audience, this is probably OK. Only it leads to hilarious disclaimers like this:

When connecting, the onboard LED lamp will be indicated according to the connected chip (due to cable problems, it is possible that the indicator light is opposite to the actual connected chip, or even two LED lights at the same time, please replace another cable when two led lights up at the same time)

 

The moral(s) of this story

Morel Mushroom In Leaves Close-up   No, no, no. Those are morels. They’re different.

  1. LilyGo makes some pretty cool stuff. Their products aren’t necessarily destined for “Raspberry Pi” levels of creativity and ubiquitousness, but they have some nifty and fun mixups that can save a budding EE (or a struggling SWE) from rolling their own designs. Many of their products are straight-forward mashups of existing low-cost circuits, but on one convenient board. Not everything needs to be complicated to be useful.
  2. Sometimes, there just are not points awarded for style. It’s easy to imagine a $15 product that may spend a semester or two inside a students backpack or a one-off that’s inside your IoT robot that needs both WiFi and finely controlled WS2812 “lasers” – or, heck, real lasers cutting into or measuring something. These may be programmed less than a dozen times before they’re retired (the first case) or sent into duty and unlikely to be connected to a PC ever again (the second case) As long as the users are in on the ‘joke’ that a cable that should never need to be flipped sometimes needs to be flipped, maybe that’s OK. Making that connection twice as reliable but requiring twice as many cables, a hub, defining the interaction of all these devices potentially controlling  the bus at the same time, etc. is money you’ll never get back.
  3. It’s absolutely not, however, OK to do this to an end-user, mass produced product unless you do, in fact, hate your users. (Hint: they will retaliate somehow….) Consumers want standard things to work in standard ways.
For a while, SiFive and LLVM were both developing support for RISC-V Vector 1.0. LLVM is now the only one in active development.

Jim Wilson, a GNU developer of decades, works for SiFive and while he wasn’t the one doing the work, it seemed likely he knew who did, so this is authoritative. He recently said:

There is no actively maintained gcc rvv support, and no ongoing gcc rvv development. Current work is all in LLVM, and LLVM is recommended if you want rvv support. … SiFive abandoned the gcc rvv work and is doing only llvm rvv work now. The gcc rvv branch is badly out of date.

Reading deeper into the GCC development list, this is really just a form of tough love as work on GCC’s RISC-V vector (and auto-vectorization in general) has been talked down before in July, 2021: 

 It isn’t up to date with the evolving RVV ISA spec, it isn’t up to date with the evolving RVV intrinsics spec, there are ugly hacks in the vectorizing optimization passes required to make it work, there is no autovectorization support, it is missing basic optimizations like eliminating duplicate vsetvli instructions, etc. The current status is that it is only useful as a toy for demos. SiFive and a few other organizations are contributing to the LLVM vector support, but no one is contributing to the gcc vector support. Alibaba has expressed some interest in contributing recently but it isn’t clear how we will handle their patches yet. The current stuff was mostly done by SiFive, but SiFive is not currently interested in funding this work.

I’m less sure of rjiejie‘s credentials, but they may work for Alibaba/T-Head, the maker of the cores used by Allwinner in D1 and D1s. That may be the “we” in his comment:

We have also supported/maintained the RVV v1.0 feature, you could download prebuilt gcc toolchain from Alibaba website[1].

Registration for the site is required and Google Translate doesn’t handle it well, so I’m not sure, but that may be a path for someone really needing GCC with Vector 1.0. It’s not clear if that work handles 0.7.1 of Vector as was used in D1. The C910-based products also seem to support 0.7.1, so their 1.0 support must be for future chips.

LLVM is one of several projects that has struggled with the issues of handling multiple V versions in the same code, but their resolution wasn’t clear. Simulator QEMU “solved” them problem when adding 1.0 by dropping support for Vector 0.7.1.

SiFive is “only” one of many core vendors and that they can’t be expected to carry the development/maintenance/support for such things by themselves, but it’s surprising (to me) that they’ve halted development.

T-Head has binaries (and maybe source) for GCC that supports V1.0 and probably 0.7.1, though that may be in a branch as it’s pretty clearly a dead end now that V1.0 has been ratified. LLVM and SiFive were, at least at one time, partnering in LLVM development. LLVM seems to have an active plan and are shipping V1.0 support now.

For GCC’s status to be “useful as a toy for demos”, it’s probably a disservice to even have it in the default builds of GCC until someone is willing to fund the couple of person-years that Jim mentions to get it on track. At least bugreports to LLVM are likely to get traction as it’s actively developed.

For now, if you’re developing vector code on RISC-V, prepare to pair your toolchain with the chip/simulator you’re using. It’s likely to be finicky for a while.

We’re all familiar with the fable of the boiling frogs, unable to sense the change they’re (literally!) immersed in. Enthusiasts of RISC-V architecture may be encountering the same right now: late 2020 gave us a steady stream of new hardware announcements, but we may not have a great sense of us since the hardware isn’t always possible to order yet. Let’s review some of the upcoming products in this market, duly nothing that products can change or get canceled before they even ship.

We had two major new families of entries in the iOT category. Both use the RISC-V to drive WiFi and Bluetooth radio stacks. Bouffalo Lab’s BL602 is available in quantity now. Starting around $2.50 for a module with multiple development boards in the $5-$10 range (including Pine64’s Nutcracker for PineCone and the DoIt DT-BL10), this chip starts with a core from SiFive and has 802.11 b/g/n and Bluetooth 5. The upcoming BL-702 family adds Zigbee radios. There is enough compute resources (CPU, RAM, Timers, etc.) that you can build your own software right onto the radio chip via their multitasking OS and open development kits. You may recognize this as the basic model popularized by Espressif in their ESP8266 in recent years.

Espressif also embraced RISC-V with their upcoming ESP32-C3 family. It’s interesting that this chip doesn’t even get a distinct name at this point as Espressif apparently sees the CPU core as only a small part of the product. Still, by volume, the ESP32-C3 is likely to become an extremely popular choice.

Moving up a step computationally, we enter more traditional chips and single-board computers. Alibaba’s Xuantie 910 is widening into a family of chips. The C906 is being marketed for more entry level class, but still featuring a load of I/O, multiple cores, support for the still-not-ratified Vector extensions, and more. Press releases tend to mix up the 910 and the 906, but they both seem pretty hot.  In late January, anAndroid Open Source Port of C910 was demonstrated. Embedded specialists Sipeed have announced a C906 development board that’ll run Debian and that starts at $12.50. If Sipeed does for that what they’ve done for GD32V and K210, we should see lots of interesting SBC projects from them.

Sipeed teases C906 RISC-V board

Rios is bringing us a claimed competitor to the Raspberry Pi called the PicoRio. It’s coming inthree stages:

  • PicoRio 1.0 is a headless, four-core RV64GC that’s capable of running Linux at 500Mhz. It’s been used from 2020H2 to an expectation of beta in 2021H1.
  • PicoRio 2.0 adds Imagination’s PowerVR GE7800 XE series GPU, which may finally bring a GPU-capable RISC-V development board into casual hobbyist price points.
  • PicoRio 3.0 strives to bring the performance to be comparable to a tablet or desktop computer.

Another entry in the Pi-class of hardware, though not at Pi Price, is the Beagle V from the group that brought us the famed Beagle Bone. It uses two of SiFive’s U74 cores at 1Ghz includes 8GiB of LPDDR4 RAM, gigabit Ethernet, an 802.11n Wi-Fi + Bluetooth 4.2 chipset, and a dedicated hardware video transcoder supporting H.264 and H.265 at 4K and 60fps.The system also offers four USB 3.0 ports, a full-size HDMI out, 3.5mm conventional audio jack, and a 40-pin GPIO header. As a snack for those interested in AI applications, it also features  a Tensilica Vision VP6 DSP for machine-vision applications, a Neural Network Engine, and a single-core NVDLA (Nvidia Deep Learning Accelerator).

Core provider SiFive is bolting Freedom U740 cores to a min-ITX design in HiFive Unmatched. X16 PCIe expansion, 16GB of DDR4 RAM, NVME M.2 slot, Gigabit ethernet, and four cores at 1.4Ghz should make this a entry-level desktop-class system, including host-CPU class of building for native applications at full scale. For professional developers, the $665 entry ticket should be more appealing that the $999 for the board’s predecessor, Unleashed.

The PicoRio V1 and Unmatched have already slipped from Q4 into 2021.

Still, while we’re not bathing in fresh alternatives to the GD32V and K210, we have several alternatives on the proverbial launching pad and several options to bring excitement into lives and toolboxes of RISC-V aficionados.

What do you see coming up? What are you most anxious to work with?

 

The GD32VF103 RISC-V System-on-chip from Gigadevices fit an amazing price to performance rate. Their 108Mhz speed, on-board RAM, and low cost (parts around $1.30USD with boards like Longnan Nano commonly under $5) make them a favorite of hobbyists.

There’s a nuance buried in the specification of these parts that allows for faster setting and clearing of the GPIO registers than I’ve seen in any of the example code for these. This approach makes no difference if you’re just toggling a “power on” LED or other low frequency signal, but in a multitasking operating system or a high performance application, there is an easy optimization. 

Common practice

We’ll use the Longnan Nano board just to have a tangible example to talk about. GPIO pin 2 is found in the GPIOA register bank. This pin is connected to a blue LED on the board. It’s wired “backward” from the obvious meaning; you turn the bit off to make the light turn on. This means we often see code like this:

if (on) {
            ((GPIO*) GPIOA)->output_control &= ~( LED_BLUE );
} else {
            ((GPIO*) GPIOA)->output_control |= ( LED_BLUE );
}

This is a pretty common idiom in low-level code: we read the output_control register, mask off the blue bit, and store it or we read the output control register, logically or in the blue bit, and we store it. While we can do better if we use dedicated functions to differentiate off and on or if we can rely on inlining and constant propagation, as a matter of perspective, it takes GCC about 44 bytes to implement this.

Hazards lie ahead!

This code also has problems in a multitasking or preemptive environment. What if something ELSE is modifying any other bit in the GPIO A outputs? Maybe the hardware people helpfully put the bit for the LED in the same register as the launch missile bit. (Thanx, guys!) Maybe you have a multitasking OS and something else may interrupt your access to GPIOA between the time you do the load and the time you do the store. (With blinking LEDs and nothing else on the GPIO, as is the case for a Nano with no external hardware, this doesn’t matter). In real life code, you probably need to raise an interrupt priority level or grab a mutex on the GPIO or something else to prevent competing code from stomping on the reads and writes. To help visualize the problem, let’s look at the generated code. (This is for the red LED that’s on pin 13 of GPIOC, but follow the problem.)

0x08008e1a <+28>:	lui	a4,0x40011
0x08008e1e <+32>:	lw	a5,12(a4) # (MARK A) offset 12 at 0x40011 is the GPIO C register. Read that into A5
0x08008e20 <+34>:	lw	s0,12(sp) # this is just the compiler restoring the saved s0 register so we can return later.
0x08008e22 <+36>:	lui	a3,0x2.   # Since this is bit #13 and we can only load immediate 12 bits, load upper of a3 here.
0x08008e24 <+38>:	or	a5,a5,a3. # or the bits in A5 (that we read out of the chip) with or 0x20000 to set bit 13
0x08008e26 <+40>:	sw	a5,12(a4) # (MARK B) store that into the output register.

If anything else touches that register between MARK A and MARK B, Bad Things are going to happen and you may risk launching missiles instead of blinking a light depending on what else is in that register. This is why you probably need to brace it with a mutex or whatever is appropriate for your system.

There must be a better way!

There is a better way and it’s unique to the GPIO registers, but it seems like something that Gigadevices brought forward from ARM-land when they “found inspiration” in the GPIO system of Blue Pill, which is very similar. Join us now on page 104 of the 536 page hymnal, GD32VF103 User Manual EN V1.0.

There is no need to read-then-write when programming the GPIOx_OCTL at bit level, user can modify only one or several bits in a single atomic APB2 write access by programming ‘1’ to the bit operate register (GPIOx_BOP, or for clearing only GPIOx_BC). The other bits will not be affected.

That’s pretty awesome! The chip will guarantee atomicity. All we have to do is write the bit number into the GPIOx_BOP to set the bit or the bit number into GPIOx_BC to clear that GPIO line. Going back to our example of the blue LED in GPIOA that’s on bit 2, we can thus write 1 << 2, which is 4 into GPIOA_BOP to turn off the LED (remember, on the demo board, they’re backward) or write a 4 into GPIOA_BC to turn it on.

((GPIO*) GPIOA)->bit_op &= ~( LED_BLUE );

We can’t affect any other bits in the register and that means we don’t have to read it and we don’t have to worry about atomicity issues needing to grab a mutex or raise the spl. When we look at the equivalent of the code above, once all the conditional stuff is stripped away in the same way.

0x08008db0 <+6>: lui a5,0x40011 # 0x40011 << 12 - 2028 = 0x40010814
0x08008db2 <+8>: li a4,4 # load up our bit number into A4
0x08008db4 <+10>: sw a4,-2028(a5) # store a4 into  40010814

The same store to 0x40010814, bit_clear, would turn off that GPIO pin.

This appears to be unique to the GPIO registers in the GD32V line.  The comparable GPIO registers in competing parts like the Kendryte K210 don’t have this feature. 

In a standalone, general purpose function like this, the measurements are small. If you’re able to reduce these to functions or templates that have constant arguments and can be inlined, but don’t need to gra a mutex, it’s a potentially large difference.

It’s easy to argue that if saving a few clock cycles on GPIO accesses in 2020 is a priority, that you’ve lead a bad life and are being punished. That may be true, but that’s the life of an embedded systems engineer. A store of a constant to a constant address is usually “better” than a read, a modify, and a write. If that GPIO access is controlling the laser that’s cutting into your eyeball, you may appreciate the code being as streamlined as you can get.

Longnan Nano with GD32V MCU and an OLED display.

I haven’t talked much about it (this being my first post and all…) but I’m pretty smitten with the current wave of RISC-V processors. I really like that it’s an open development platform, which means chip companies can buy or create the core that works with the basic instruction set and the programmers have a consistent and shared set of tools for the entire line ranging from $.10 parts for a home thermostat to a workstation class part. Rising tides and all that.

As a software guy, there are chip vendors that I’ve never heard of, but Bouffalolab Lab seems to be a pretty new name to most of us in the U.S. They’re making waves with the announcement of the BL602 and BL604 Systems-on-Chip (SOC). This part is just becoming available in the final week of October 2020 and it’s looking pretty interesting.

There are two parts in the immediate family: the BL602 and the BL604. The two parts differ by the number of GPIO pins and thus, the size of the external package. BL602 has 16 GPIOs and comes in a 32-pin QFP. The BL604 bumps to 23 GPIOs and rides in a 40-pin QFP. The 32-pin part should come in around 5mm per edge, so it’ll fit in really small applications. The integration is high enough that on the 32-pin part, half the pins are devoted to GPIO with the rest being the mandatory 3.3V, ground, crystal in, grounding, and other housekeeping. Outside the party, but similar enough that the SDK references them sometimes by accident, in the BL606 and 608. On those you go to ARM cores and trade wireless for audio, but that’s not our focus today.

It has 2.4Ghz radios, so that covers BTLE 5.0 and 802.11 b/g/n. Wi-Fi through WPA3 is supported. The microprocessor core runs at 192Mhz and comes from the SiFive cores. JTAG support is included and it claims support for the Segger family of debugging pods and software. It has 276K of SRAM, which is a really nice step up from the closest competing part, the GD32V’s which pack a mere 32K. Both parts have 128K of flash. There are two UARTs, but no USB controller like the GD32V. It has the standard alphabet soup that we expect in a post ESP8266 including SDIO, SPI, I2C, and such.

Bouffalo is a Chinese company and the documentation reflects that. Google Translate helps bring the doc and some code comments to us, but the reality is that many, many sections in the manual are just blank for now. The 34 page datasheet is oriented toward hardware developers, but does throw the smallest of bones to programmers. For example, it doesn’t tell you WHAT the UARTs are (it’s not a 16554) but it tells you where they are in memory.

A few companies are racing to put boards in the hands of developers. Both SiPeed and Pine64 seem to be early movers with prices hovering around $5USD with shipping being about as much again. I’m sure we’ll see an ocean of reference designs on Aliexpress and such soon. Currently, we have three such boards:

  • Pine64’s PineCone64: USB-C (yay!) via a CH340N and with RGB LED.
  • Doi.am’s DT-BL10: Micro USB
  • SiPeed’s BL602: Micro USB. Rumored to have FTDI serial interface. Might be same as Doi.am’s

Development environments include Eclipse + OpenOCD or Freedom Studio + OpenOCD. The company provides an app called DevCube (unrelated to the VR tool of the same name) to configure devices for production. This helps IoT makers prepare the flash for partitioning and OTA upgrades. It seems inevitable that the popular PlatformIO IDE will be supported. Since it’s a familiar SiFive core, just submitting the needed flash layout and programming data is probably on the order of dozens of lines of code to support PlatformIO for BL602/BL606.

If we look past the large number of blank doc pages and the high percentage of Chinese in the SDK, what do we find? This is actually the jackpot for RISC-V designs – there are working Mac, Linux, and Windows (Cygwin) GNU-based toolchains right in the tree at launch. Filenames are a bizarre mix of CamelCase and underscore_separators, even sometimes in the same directory. It ships with GCC 8.3, but since RISC-V was upstreamed a while ago, I’d expect upgrading to later versions to be easy. It relies heavily on Amazon’s FreeRTOS as the core. The entire package relies on external source (FreeRTOS, GNU Tool binaries, compression libs, etc.) by copying and not via git-referencing the modules by inclusion. This means you’re probably always on a stable build, but it also almost insures that you’ll always be behind releases of third party code, which can be bad in network-connected devices.  Amazon’s MQTT and Jobs interfaces are well represented. The expected libraries for a high-functioning 32-bit core are all there, with a pretty full ANSI libc with printf, compression and a filesystem for the flash. Startup code is a pretty straight SiFive example for GCC. There are 31 separately buildable apps for demonstrating features of the device, including a CLI for interfacing with it somewhat like boot monitors of days past. “Hello, World” is there, of course, but ironically doesn’t contain the string “hello” as it gets printed in a callback from the initialization state machine as it advances.

The network modules are, unfortunately, binary blobs at this time. I expect the community to make short work of changing that.

Overall, this looks like a good entry for the bottom/low-end space for wirelessly connected (or not) devices for hobbyist and commercial applications. The SiFive core is well regarded for performance and the performance should be better than the GD32V’s 108Mhz while the price lower than the AI-centric dual-core 64-bit Kendryte K210.

Start your engines with:

Official Boufallo SDK
Pine64 is offering hardware to developers that help get BL602 started.
SiPeed (makers of Longnan and Maix lines) has a BL602 SDK.
Doit has a $4 eval board with a BL602 SDK.
All three of those are forks from the Boufallo code. We don’t know yet how different the hardware is.

RJL