I don’t have enough LilyGO products in my lab; I should probably have more. They seem to make clever products aimed at the developer/hobbyist market (that’s me!) but it seems that they get undercut by features on on one product, then are months late to market on the next, It seems they manage to remain on my radar while escaping a place on my bench. I learned of T-LilyGO, which is an improved version of what’s best known as a Speed Longnan Nano (GD32V) just a few weeks after I bought a bucket of Nanos.

LilyGO’s latest product, the T-PicoC3, manages to pull a unique development twist. The product marriage is “obvious”, though I haven’t seen it done. I’m writing about this because of one obscure feature. First, let’s explore roots of what makes it awesome.

The two chips that make T-Pico3 great

T-Pico3 is about $15USD shipped to the US. It manages to use not one, but TWO of the season’s most deservedly buzz-worthy MCUs. Both the ESP32-C3, a RISC-V part with really great pin density and development SDK support, and the Pico 2040 – as well as an onboard antenna, an external IPEX connector, a 7789 display controller w/ 1.14″ LCD, tons of GPIO, and way more in the familiar dev board size. Barrels of (virtual) ink have been written on whether the ESP32-C3’s SiFive-backed RISC-V core or the RP2040’s dual ARM Cortex M0’s are “better”. The answer, of course, is “it depends” and I’m not taking sides in this article. But what if you’re a student wanting to learn both ARM and RISC-V and you don’t want to choose between your peanut butter and chocolate, but just want one yummy(!) product that mixes them both?

T-PicoC3_en.jpg

So on one tiny PCB, they deliver the dual-core+specialized bit-banging capacity of RP-2040 (which has been used to bitbang DVI (!) and countless light chasers based on WS2812, which is a protocol with odd timing requirements) with the ESP32-C3 seemingly left to handle a full TCP-IP stack, Bluetooth, compression, security, and other such tasks while “additionally” being a quite capable 160Mhz RISC-V core of its own.

Putting “the two great tastes that taste great together” sounds like a good idea (let one chip specialize in networking, two RP2040 PIO bit cores blink out a CAN bus or 2812 (or other) LED blinkies) and hold it all down with some MicroPython on either (or both) of the Cortex M0 cores. Whether you’re a student that really just wants to backpack a single board for both ARM and RISC-V programming or you’re building a robotics or IoT thing, it’s just easy to imagine these going well together. You can make crazy combinations of dedicated interrupt controllers, GPIO controllers, interrupt domains, etc. MIx them up as you see fit.

…because I’m writing while hungry.

Debugging your creation

Normally, for ease of debugging, I nod to the ESP32-C3. Inside the part are dedicated cores that run a FTDI-like parallel controller that can be used for JTAG debugging AND a USB communications class, so your device’s serial console can appear on the same plug you’re powering the unit from. For someone that values mobility while debugging, it’s awesome. So those are the obvious connections for the USB lines.

But how do you debug the RP2040?

I’m normally a big fan of USB-C. Beyond the speed, the power – both in literal volts and amps and in the capacity of devices you can attach – I dig that they can be flipped end for end (no “host” and “target” end) and that they can be flipped from top to bottom, making them impossible to plug in upside down. The receptacles are symmetric. Compliant USB-C cables are only kind symmetrical.  It’s lesser known that the tops and the bottoms of the plugs are actually not fully asymmetrical.  The controllers actually engage in strategic lies that pull off the flippability trick. These same strategic lies are what allows the cables to smuggle additional kinds of data, such as Thunderbolt signals or GPIO and JTAG pins in the case of the breakout board for Pine64‘s Pinecil. Even with these mistruths, it’s common practice to uphold the guidelines for the cable to work either way.

Do you hate your users?

For a hybrid board like we’ve described above where we’re trying to attach two quite different SoC’s to the host, the “obvious” thing to do would be to add something like CH340 to give USB powers to the RP2040 console. Then you add a USB hub and tie both chips behind the hub, allowing all three devices to appear to the host. A more sophsticated design might tie the 2040 instead to a serial from the ESP32-C, but then you lose USB-master mode for the 2040. A circuit-layout Jenga master may have been able to find a USB-C pinout that let one device ride shotgun on the bus of another along the approach of the Pinecil’s exposed JTAG lines but I think that has compromise if you’re pretending to be a host instead of a target. Instead, we’re left to imagine this conversation happening within LilyGO’s engineering:

“What if we built an interface that worked completely differently if plugged upside down?”

“Why would you build a Cursed USB-C device? Do you hate your users?”

“What if we made it blink different colors, but unreliably?”

“OK.”

So our imagined engineers dutifully run off and after a little USB Selective Disobedience, successfully deliver power and ground – safely – in either orientation. (That’s the red and green in the below diagram and that’s intentionally made infallible.) With one mating between the cable and the board, the ESP32 has ownership of D+ and D- so the JTAG and serial ports associated with the RISC-V side of the house are presented to the host. You can use esptool to program and manage that device or program it via JTAG. In the other orientation, RP2040 gets the port and it’s either a USB mass storage device awaiting a .uf2 boot file to chomp on or a connection from the Thonny Python IDE.

Imagine the top (“blue led”/RP2040) being attached to the top D+/D= pair on the right and the bottom (“Red LED”/ESP32-C3) being attached to the bottom D-/D+ pair.

USB Type-C connector Pinout

Great. Now we’ve created a product that works exactly like a ‘normal’ user would never expect it to work, but, given the target audience, this is probably OK. Only it leads to hilarious disclaimers like this:

When connecting, the onboard LED lamp will be indicated according to the connected chip (due to cable problems, it is possible that the indicator light is opposite to the actual connected chip, or even two LED lights at the same time, please replace another cable when two led lights up at the same time)

 

The moral(s) of this story

Morel Mushroom In Leaves Close-up   No, no, no. Those are morels. They’re different.

  1. LilyGo makes some pretty cool stuff. Their products aren’t necessarily destined for “Raspberry Pi” levels of creativity and ubiquitousness, but they have some nifty and fun mixups that can save a budding EE (or a struggling SWE) from rolling their own designs. Many of their products are straight-forward mashups of existing low-cost circuits, but on one convenient board. Not everything needs to be complicated to be useful.
  2. Sometimes, there just are not points awarded for style. It’s easy to imagine a $15 product that may spend a semester or two inside a students backpack or a one-off that’s inside your IoT robot that needs both WiFi and finely controlled WS2812 “lasers” – or, heck, real lasers cutting into or measuring something. These may be programmed less than a dozen times before they’re retired (the first case) or sent into duty and unlikely to be connected to a PC ever again (the second case) As long as the users are in on the ‘joke’ that a cable that should never need to be flipped sometimes needs to be flipped, maybe that’s OK. Making that connection twice as reliable but requiring twice as many cables, a hub, defining the interaction of all these devices potentially controlling  the bus at the same time, etc. is money you’ll never get back.
  3. It’s absolutely not, however, OK to do this to an end-user, mass produced product unless you do, in fact, hate your users. (Hint: they will retaliate somehow….) Consumers want standard things to work in standard ways.

History

The Kendryte K210 seems to have been one of the early success stories for RISC-V, if not in mainstream computing, certainly in maker mindshare. The 64-bit device had two cores, enough support peripherals to be useful for your robotics project, enough AI to recognize faces or do image detection and following for your self-driving robot project, and ran a chopped-down Linux if you really needed it, though this was all pretty precarious in 8MB of core. Obviously, the successor device should address these and bring up some 2020 level specs from the 2018-ish design we saw with K210. That device even had a name leaked or rumored: “Kendryte K510.”

I can find rumors and predictions for K510 as far back as 2019. Canaan (another name for Kendryte, as best I can tell) themselves talked about K510 in December of 2019:

Zhang said that the new generation of K510 chip has been greatly optimized in algorithm and architecture. Compared with the first-generation chip, the K510’s computing power will increase by 5-10 times, and it will be developed for 5G scenarios.

Finally, almost nine months ago, K510 was formally announced, but no reference designs or availability was given, so it stayed in my “hype” folder. While I still haven’t seen hardware shipping, we now have some faith that hardware is now purchasable.

Enter the new developer’s reference board

AnalogLamb is offering the DEV-AI0002, a K510 Dual RISC-V64 Core AI Board with Dual Camera and LCD. 

 

It looks like a substantial board, offering dual-core RISC-V64 CPU with frequency up to 800 MHz. They claim 3 TeraFLOPS is possible. (Editor’s note: the K510 doc repeatedly says “800Mhz”, but that’ll be hard to do with a 5 stage, in order CPU like this…)   If true, that would put the device on par with the fastest GPUs from 2009, a large Xeon from 2015, or a beefy gaming machine from 2020. That said, this power isn’t coming from a 3D GPU; only a 2D GPU is cited for this board.

Beyond the power of the SoC itself, the reference boards add 512MB LPDDR3@1600MHz, a Camera Board with two camera sensors and Base Board. There are services for an LCD display, 1000M ethernet RJ45, HDMI, USB, TF Card, GPIO, UART and Audio Interface. CRB adds:

  • K510 integrate the dual-core RISC-V64 CPU and DSP up to 800MHz
  • Up to 3 TFLOPS AI, Ultra low-power wake-up VAD
  • Input high-definition triple camera, MIPI CSI/DVP interface;
  • Output: 4Video Layer + 3 OSD Layer;
  • High-quality H264 video encoding, 2 channels 1080P@60;
  • 2D image accelerator: zoom, crop, rotate, OSD overlay.
  • Camera Sensor Board with two sensors
  • 512MB LPDDR3@1600MHz
  • 1000M Ethernet RJ45 Interface and Wireless Module
  • HDMI and a LCD Display
  • USB OTG and USB Type-C Power Supply
  • USB to UART for Debug
  • TF Card Interface and GPIOs

It’s following the model of D1 and Raspberry Pi Compute Model in using a main board to carry the SoC and a larger board to bring out I/O connectors like TF (“TransFlash” is the term for uncertified SD Cards) sockets, USB 2.0, GPIO, Gig Eth, HDMI, and such. The unspoken theory is that decoupling these allows smaller (pronounced “cheaper”) carrier boards and replacing the CPU modules with newer ones as they come to market. It’s the future we were promised with Pentium-II “cartridges”. The K510 CRB Hardware Guide is one of the few in the initial doc release that’s Google Translate handle well to convert to English. The acronym isn’t known to me (yet – comments welcome!) but I’m assuming it’s “Customer Reference Board”. In some fantasy land, the K1020

40-pin GPIO connector – with a twist

Though the 40 pin connector may make you think of a Raspberry Pi-like expansion bus, but the pinouts are incompatible. It seems that a direct link to section 3.15 doesn’t work, so I’ll just repeat it here:

Figure 3-18 40P pin header expansion interface Table 3-4 Expansion interface definition

Numbering definition Numbering definition
1 VDD_1V8 2 GND
3 VDD_1V8 4 GND
5 VDD_3V3 6 GND
7 VDD_3V3 8 GND
9 VDD_5V 10 GND
11 VDD_5V 12 GPIO_1V8_95
13 GPIO_3V3_114 14 GPIO_3V3_115
15 GPIO_1V8_92 16 GPIO_1V8_96
17 GPIO_1V8_105 18 GPIO_1V8_107
19 GPIO_1V8_104 20 GPIO_1V8_106
twenty one GPIO_1V8_118 twenty two GPIO_1V8_119
twenty three GPIO_1V8_93 twenty four GPIO_1V8_94
25 GPIO_3V3_125 26 GPIO_3V3_124
27 GPIO_3V3_127 28 GPIO_3V3_126
29 GND 30 GND

(I kept twenty one through twenty-four as words because that’s how Google Translate presents them to English readers. Is there some significance to this in the original Chinese?)

While there are a few 3.3Volt lines, a majority of them are 1.8V. While this board doesn’t really seem to target IoT hobbyist style projects, this will provide a challenge for anyone that DOES want to attach their favorite Adafruit or Sparkfun gizmoid of the Pi or Arduino-class products that are almost universally 3.3v these days. There are few 3.3V lines available on this connector, so they might run out quickly.  If you’re connecting a 3.3V device to a 1.8v host, you’ll need to brush up on the details of level shifting or find a component that’s better suited.  Most 3.3v devices will read the maximum high of a 1.8V signal as a “low”, meaning it would be unable to recognize any change in voltage reliably. A $200 board really isn’t meant for running robotics servos and air sensors. Save those projects for a Dr. Who HiFive (RISC-V, of course!) Inventor Kit

Andestar V5, the primary core (two, actually) of the K510

The processor itself takes a big step up from the RISC-V Rocket design that was used at the heart of the K210. The tech docs show that they’re using an Andestar V5 design from Andes Technology, but clearly updated from the Andestar V5 they announced in September of 2019.  Of particular note, we see the Vector (presumably 1.0) support which was only ratified in December of 2021. That’s pretty exciting. There’s a collection of doc that we can hope will grow and we hope that “zh” grows sibling directories of English versions. (You’re free to hope for your own favorite languages, too – I’m just being selfish. 🙂 ) Google Translate handles a few docs OK, but the majority of them Translate will handle only a few lines at a time.

The chip is rich in I/O. Seven i2c and three SPI ports are generous, but I’d be careful with that voltage level peering issue. A 2D GPU will help most desktop applications once appropriate drivers are refined. All the RAM seems to be on the SoC itself, so don’t count on user upgrades. In the processor block below, we see three blocks of processing and a mailbox unit to let them pass messages (like interrupts) between them. The two RV64G+ class units should be familiar to readers here. The Kendryte Processing Unit in the K210 was the Tensor-style processing unit so we’ll refer to K210 KPU FAQ. Let’s hand-wave the details of that for now, but that lets us know what the KPU can basically do.

Kendryte passes the doc hot potato back to Andes for some chip-level documentation. This is fine, since they would be the experts on some aspects, but it can be a bit of paper chase not knowing exactly which revisions of the doc corresponds to the cores in these chips. We’ve already discussed that “Andestar V5” isn’t exactly a tight version number scheme as it apparently covers at least some range of parts from 2018 to 2021. But we’ll work through what I can.

I’m inferring that the AndeStar V5 Instruction Extension Specification is in play right through version 1.4, the most recent there. As 1.3 added the not-quite-final Vector extensions and 1.4 added vector for bfloat16, both of which are listed as features of K510, we seem quite up to date. We get 73 pages of (English) doc covering features on the chip that are in the extended feature sets. Andes is new to me, so it’s worth a moment for me – and hopefully, the reader – to make a quick romp on what extensions above the common RISC-V opcode set allows. I won’t go into great detail as just knowing these are a thing and that they have the possibility to improve your code – BUT making your code not work on other branded architectures – is enough.

Extensions beyond stock RISC-V

The Andestar V5 ISA, as used in K510, isn’t targeting embedded or super low cost devices.  These may be deployed as part of a fleet, in managed racks, or as workstation class devices and if using compiler magic to get magic opcodes results means that you need one fewer rows of compute-bound number crunchers in a data center, that’s probably OK. So what have they done?

Start with the basics, but extend the extensions.  The Andestar V5m ISA is a superset of RV-IMAC. Some of the things that were vendor extensions (Vector wasn’t ratified until December of the second 2020) are now part of the official RISC-V extensions. Depending on the age of the doc we’re looking at, this can get a bit confusing, but I’m guessing that a smart decoder can perform compatibility with a customer base that was exclusively theirs and users of the newly ratified parts. One example of managed change is in the handling of “half floats”. We’ve long (sorry) had support for (32bit) floats and (64bit) doubles often in graphics work and machine learning, 32 bits is overkill. If you CAN use 16bit floats, you can effectively double the size of your caches, halve the number of data transfers, and handle more data inside a Vector operation. It looks like they’ve added half-floats to all the places that make sense.

Bit ops. Branch on a bit being set or clear. Match can be in opcode. Sign-extend a bitfield.

Address Scaling. It’s a somewhat frequent complaint (esp. from developers coming from ARM or x86) that address scaling has to be done by the programmer.  The examples given by that former ARM engineer are compelling.  Assembly programmers know it’s a bit of a pain to burn an extra temp register just to keep constantly multiplying (or tallying) the index by the size of the structure you’re traversing. Andes adds addressing modes to compute familiar LEA operations like “lea.d t3, t1, t2” which is “t3 = t1 * t2*8”. I think I recall Alibaba/T-Head adding similar extensions to their C904 and C910 designs.

Various performance enhancements like “find first byte” will help many algorithms. There are also opcodes for loading a number of words into consecutive registers and converting common data types to and from the 16-bit floats.

Tooling support from Andes

Fortunately (?) Andes maintains their own fork of Andes RISC-V GCC with their own GDB and binutils in order to support optimizer and debugging the chip extensions above. It’s not clear why they are keeping their own entire versions instead of mainstreaming them. I rarely see @andestech.com listed in the ChangeLogs or in the mailing lists of those tools. 

Andes provides these tools in their Andesight Eclipse IDEf for Windows and Linux users. They provide binaries of their Andes Development Kit which is probably possible to build for MacOS as the source is there. The Andes Github repos are a bit of a circular resolution mess and it can be challenging to find current, maintained sources for each of the pieces. Hopefully a mainstream component in a high volume, open market will help drive some consolidation and more code sharing in this area.

Kendryte partnering with an experienced RISC-V core maker should eliminate a lot of the birthing pains we experienced with K210. The RISC-V standards are more developed and plentiful AndesTech RISC-V documentation (in English) and having Linux kernels, boot managers, and drivers already in place should be awesome. Deep in the docs, we learn they used the Andes AX25MP as a base and wrapped up the features of the Andestar 5 ISA as:

  • RISC-V RV64I base integer instruction set
  • RISC-V RVC standard extension for compressed instructions
  • RISC-V RVM standard extension for integer multiplication and division
  • Optional RISC-V RVA standard extension for atomic instruction
  • Optional RISC-V “F” and “D” standard extensions for single/double-precision floating-point
  • Optional AndeStar DSP extension
  • Andes Performance extension
  • Andes CoDense extension

and Andestar extensions as:

  • StackSafe hardware stack protection extension
  • PowerBrake simple power/performance scaling extension
  • Custom performance counter events(My read is that these are “optional” features beyond RISC-V ratified sets that they have opted into when building K510.)

Kendryte themselves have already published much in the Kendryte Github repo. Buildroot, Berkeley Boot Loader BBL and Proxy Kernel pk, and a Docker image to compile K510 are already there in addition to the K510 docs. Prominently, the 575 pages of the K510 Technical Reference Manual will provide us register maps, descriptions, and electrical traits of the chip itself.  (It’s stamped ‘confidential’ all over it. /shrug)

Back to the K510 CRB features

The K510 CRB ships with 4GB of bootable eMMC that can be loaded with your favorite OS, or your OS can be kept on a TF card for easy loading from another computer. The 128MB of NAND flash can store the boot loader and small amounts of storage, like a $HOME or configuration files. (In some places, it declares 16GB of eMMC and others call out 4. We’ll know once we see boards!) 

The documentation is conflicting on the number and type of onboard LEDs. A WS2812 “Neopixel” is present and visible in the photos. Another LED of some type (Power?) may or may not be present.

Two switches allow booting from UART, SD, NAND, or eMMC. On other chips of similar capacity, we’ve seen SD and eMMC flashed with images that allow yet more boot sources, such as netboot via tftpboot or USB-attached storage.

The USB OTG socket seems to be of the old Mini-B variety and not contemporary USB-C, though the UART console appears to be USB-C. (Remember that USB-C is the connector and it IS legal to pair it with USB 2.0 signaling, as they’ve done here.)  That interface is provided via a common CH340 USB/Serial adapter on the board.

An AP6212 can be seen on the sheets.  That seems a bit of a dated choice, even for a 2.4Ghz-only product. 802.11 b/g/n tops out at 70Mbps and it’s Bluetooth 4.0. That’s fast enough for moderate network use and a pair of headphones, but seems like another choice to target this in compute lab or rack style environments – indeed, in environments like those the radios would be largely unused in favor of the provided copper ethernet jack.

There are plenty of video choices.  You can drive a 1080p TFT display or HDMI, but not both at the same time. It’s a standard HDMI socket. MIPI video input is provided and a 30pint FPC connector provides LCD panel video output. The encoder claims to do H.264 Baseline Main/High Profile with 8Kx8K JPEG and a maximum support of 1080p/60fps. It is not a 3D accelerator.

Wrapup

This board looks like an interesting compliment to the VisionFive by StarFive and the Allwinner Nezha. Perhaps it can follow the precedent for Nezha and lead with a deluxe developer kit and later offer smaller docking boards, perhaps even using the same CRB, that are a lower cost but offer little more than a power and ethernet cable or other combinations as demanded.

Will you be ordering one? What are your plans with it?

Personally, my VisionFive just arrived, so it’s already in my review queue. Exciting times for RISC-V!

 

In a recent blog post, Espressif announced the ESP32-C2.  The Twitter thread from John Lee revealed an interesting twist; more on that in a moment. ESP32-C2 is a WiFi4 + BLE5.0 device with a single RISC-V core and 272MB of memory. It uses the familiar Espressif tools like ESP-IDF and frameworks such as ESP-Jumpstart and ESP-RainMaker.  It has on-chip ROM to reduce the need for common routines in flash.

Espressif is proud of the cost and radio performance of this device. Reducing power consumption was also a goal, which should help deliver this part into more IoT class projects.

ESP32-C2  is a low-cost WiFi chip supporting the Matter standard.

“Matter” is a royalty-free home automation connectivity standard, introduced late in 2019. Matter aims to reduce fragmentation across different vendors, and achieve interoperability among smart home devices and Internet of things and is backed by Amazon, Apple, Google, and other big names in that space. In the soon-to-be-released Matter 1st release, it supports WiFi, Thread, and Ethernet protocols.

With WiFi being so pervasive, devices like this supporting both WiFi and Matter Thread will be important for many years.

Doc for ESP32-C2 is available now. Chips are just starting to sample, with no availability date yet given.

But now, back to the scoop…

John Lee is a Senior Customer Support Representative at Espressif. He runs the @espressif Twitter account and is a good read. He originally posted the above announcement. I knew that Espressif’s last few chips (ESP32-C3, ESP32-C6) had been RISC-V, but I also knew they had a long run with CPU cores from Tensillica. One of the great tricks that Espressif pulled off with ESP32-C3 was treating replacing the CPU core as such a minor point that it barely was mentioned in the marketing doc and hardly even reflected in the chip’s name. I asked John for a clarification: “Are all C and H series going to be RISC-V?”. John quickly answered Yes. In fact all of the subsequent chips are RISC V.” One of the things we gave up between ESP32 and ESP32-C3 was going down from two Tensillica cores to a single RISC-V core, so the natural question of multiple cores quickly followed.  John put that to rest with Expecting to go up to 4 one of these days.” Hairs continued to be split and he confirmed that meant ““All” as on actually all next generation Espressif chips across all product lines will be RISC-V instead of Tensilica Xtensa? Not just the Cx series”.

That’s actually pretty big news on its own for RISC-V. Espressif is committing that all next generation products will be RISC-V and that at least some of them will be as large as four cores. Since Espressif has long been a leader of SoCs and Modules that package the SoCs with antennas, oscillators, and such into a single (usually certification-approved) package for both hobbyists and in the commercial space, this should result in a huge number of RISC-V cores hitting the market, even though they’re somewhat invisible as the user “just” wants to open a radio connection and not necessarily program the radio themselves.

Thank you for that scoop, John!

 

About this article: This preliminary attack on Buttons on BL60x with Nuttx can be thought of as an article that’s part of Lup’s Book on BL606 generally and his notes on Nuttx on BL60x specifically. As I was the one that made this experiment, I documented it for the rest of you. As a spoiler, the experiment failed, but we learned important lessons along the way and THOSE lessons are worth sharing more than the actual resulting button work.

Electrical switches, or in their more passing form, buttons, are as simple as it gets electrically. A button is like a piece of wire: it’s connected or it is not. It closes the circuit or it doesn’t. Mechanically, switches can take many forms like normally open (the wire is missing until it’s physically operated) or normally closed (pressing it removes the connection).

On the PineDio Stack, we have one push button that is connected to our BL604 SoC.. The push button is next to the internal LEDs and is connected internally to GPIO12.

Schematic of GPIO_12 on PineDio

From the schematic, we see that GPIO12 is connected via a 4.7k resistor to the power rail. When open, the naturally resting position of this button, GPIO is left to float high because it’s wired to VCC via the R48 pullup resistor. This provides enough resistance to deliver voltage to prevent that pin from floating and is enough resistance that when we close pushbutton, driving GPIO12 to ground, we don’t risk the steadiness of our power source by shorting it even temporarily to ground.

PineDio Stack Bootstrap schematic

There is actually a second switch available in PineDio stack, but a bit subtle – in fact, by default, it’s missing! The GPIO8 pin that we jumper on boot is actually a form of a button. Whether by a button or a jumper, it can be connected to either the voltage source or the ground, Natively, that jumper/switch is read exactly once during bootup so the flash firmware can decide whether to run the flash reader or to run your code. As this tale isn’t about GPIO8 – indeed, using GPIO8 in your own designs would be questionable as closing that switch during power-on would result in your product “not booting” to the untrained eye – we shall ignore the GPIO8 pseudo-switch.

From the view of the BL604, our button on GPIO12 is an input and it is upon us to (somehow) configure it as such. We’ll take responsibility for that in a minute. We either read the +3.3V in the normal case or we read the 0V of ground when the button is pressed.

Each GPIO (16 on the BL602 and 23 on the BL604) can be configured as:

• Floating input
• Pull-up input
• Pull down input

• Pull-up interrupt input
• Pull-down interrupt input • Floating interrupt input
• Pull-up output
• Pull-down output

Our hardware designer here has helpfully provided us with external pull-ups to +3.3V, so we’ll configure it first as floating input and just read the button by polling it. This is OK if you’re accessing the button frequently or it’s a major component of your application’s life cycle. For example, the joystick buttons on PacMan are pretty much always being pressed in one direction and the game is doing little if it’s not, so it’s OK to dedicate the CPU to checking the buttons. A more typical application, which we’ll attempt later, lets the CPU receive an interrupt when the button status changes. For a stopwatch button or a screen menu change, that is a much more typical use as it frees your program execution from polling the button all the time.

Elsewhere in the schematic, we also see that the GPIO_12 pin can be used as an output to control the vibrator. We’ve since learned that option isn’t actually populated on the devices in our hands, so we’ll largely ignore the output options on GPIO_12.

Our BL602_BL604_RM_1.2_en Reference Manual has many dozens of pages dedicated to explaining how the GPIO pins work in great detail. While it’s perhaps helpful to know all the details (could the board designer have saved the cost of the pullup resistor if “Pull-up input” mode were known?) we will instead rely not only upon the GPIO functions of Nuttx, we will rely on the “Button” specializations.

In general speaking, there are two ways for a CPU to notice a change on a signal: it can generate an interrupt or it can poll that signal. For super precise timing or when the CPU has nothing else to do, polling is often preferred. For thing like a pushbutton that change quite infrequently, a processor interrupt is usually a designer’s choice.

The Nuttx Apps project provides an example Buttons app in apps/examples/buttons/, which is quite rich in features, but it can also be a bit overwhelming. We’ll instead create a smaller case more specialized for our hardware.

We set out to create a Nuttx application (not a driver) to learn about the button state. As such, we’d interface with the buttons through special files in /dev instead of using BL602-specific functions.

First, we confirm that we have Nuttx building and runnable on our hardware. Our /dev entry contains generic GPIO, but we need to specialize it.

ls /dev
/dev:
console
gpio0
gpio1
gpio2
i2c0
lcd0
null
spi0
spitest0
timer0
urandom
zero

Because we’re several episodes deep into these tutorials, we’ll touch on the steps, but not the details to wire up a new example. The recipe is very much the same as in the other chapters of the BL602 book.

$ cd apps/examples
$ mkdir button_test
$ cp tinycbor_test/* button_test
[ do a bunch of mechanical edits to make a “new” program - we’re sharing that here, so you don’t have to repeated it. ]
KConfig, Makefile are nearly a search and replace.
Button_test_main.c starts empty, with only a main() returning 0.

Instead of hand-editing things, we turn ourselves into the build process for now.
$ kconfig-tweak –enable CONFIG_EXAMPLES_BUTTON_TEST
$ make olddefconfig
$ make -j20

Perform a flash update, upload the program, and restart the demo
On the device, confirm that we’ve successfully linked our new build. Notice the presence of button_test:
# Builtin Apps:
bas i2c sh
bl602_adc_test ikea_air_quality_sensor spi
button_test lorawan_test spi_test
[ … ]

Now let’s start configuring our hardware.

Because GPIO in BL60x is currently in a transitional state, we’re just going to brute-force ourselves into the first entries. So in ./boards/risc-v/bl602/bl602evb/include/board.h we’ll just temporarily take over that slot from PineDio Stack. This is clearly not great for interoperability, but it sidesteps a number of issues that MisterTechBlog is already working on .


kconfig-tweak --enable CONFIG_ARCH_BUTTONS
kconfig-tweak --enable CONFIG_ARCH_IRQBUTTONS

N.B. These are included in our provided defconfig for this board, but for reasons I don’t understand, we still have to manually set them here to be effective.

make oldconfig

Rebuild Nuttx and reflash it to the board as you have in the other articles to follow along.

The best-laid plans of mice and men often go awry

Our original plan was to interface with the switch in all three ways that Nuttx knows how to do this, but the wheels fell off that idea while we were building it. (Yes, we did have wheels while we were building it because Lup and and I were consulting with each other and tag-teaming development, each working on different aspects.) If we did all this right, there would actually be nothing BL602-specific exposed in our test application and we’d have validated all our internal private handling. That latter bit was a success, in an awkward way – we validated that they didn’t work.

The three approaches are:

    1. Read the GPIO pin “raw”. Just open the device, read it, and report the status.
      Configure the GPIO interrupt facility to let main() in our application do something else – or nothing else, such as just being in a sleep().
    2. Configure the Nuttx GPIO interrupt infrastructure. Success ultimately relies on an upper half running in application space and a lower half running in kernel space to deliver this interruption of event flow to the application to hop out to registered function names and handle these events.
    3. Configure the Nuttx button infrastructure, configured via CONFIG_ARCH_IRQBUTTONS to deliver an asynchronous event into the application to interrupt the flow and tell it that a button close or open event has been made. This actually relies on the above internally to work.

For any of these to work, we have to tell Nuttx where our buttons are we do this in board.h with an entry like this:

    #define BOARD_GPIO_INT1 (GPIO_INPUT | GPIO_PULLUP | \
        GPIO_FUNC_SWGPIO | GPIO_PIN12)

Get to the code!

While the order in the provided sample program flows slightly differently than is described, it’s hopefully recognizable. (The code is structured as it was to reduce repetition when we presented this in three different approaches.)

There’s no magic in dump_buffer(). It’s fortified to protect a (human) debugger from printing control characters or lengthy buffers directly to the screen, but it’s quite simple:

static void dump_buffer(const int buf_size, const char* buf) {
    for (int i = 0; i < buf_size; i++) {
        printf("%02x(%c) ", buf[i], isalnum(buf[i]) ? buf[i] : '.');
     }
}

Raw GPIO reads is the simplest.


int fd = open(INPUT_DEV_NAME, O_RDONLY);
for (int pass = 0; pass < count; pass++) {
  char ibuf[20];
  printf("Pass %d of %d:", pass, count);
  int c = read(fd, ibuf, sizeof(ibuf) - 1);
  dump_buffer(c, ibuf);
  if (c > 0) {
    if (ibuf[0] == '0') {
        printf("- Pressed");
    }
  putchar('\n');
}
lseek(fd, 0L, SEEK_SET);
usleep(500000);
close(fd);

 

This simply checks if the GPIO pin is active, printing anything we get from the GPIO port in hex and in ASCII and adds “active” if so. By default, we check the button rather arbitrarily 20 times and we sleep half a second between passes. This provides a nice feedback loop allowing you to press and release the button a few times and see the screen change in response.

There are really only two lines that may be worthy of surprise. First, the data as we display in dump_buffer() and as we test in the zeroth byte of ibuf[] is not a binary 0 and 1 as you might expect. They are ASCII ‘0’ and ‘1’ (0x30 and 0x31) respectively. This might be a bit surprising to those experienced with device driver handling as you might expect a more raw 0 and 1 there. This is actually a peace offering to command-line users of the GPIO drivers; it’s simply convenient to be able to cat (or hexdump or read…) a port and see its status. It’s similarly convenient to be able to write to it via ‘echo 1 > /dev/whatever’ to blink an LED or start a motor or anything else that may be an output on this same driver. So the ASCII convention actually is convenient here.

The second potential sharp edge is that streaming reads of the GPIO node will not stream reads. You may expect to ‘cat /dev/gpioin0’ and see a stream of 1s until you press the button, at which point you’d see a stream of zeroes until you released the button. Adjust your expectation. Again, presumably for compatibility with command line tools that keep a short lifecycle of a device’s file descriptor, only the very first byte of that potential bytestream is ever valid. You could close and reopen the device to get back to the beginning, but that’s a bit costly as it increases the total number of potential system calls, the transitioning edge between OS application code and kernel mode. We thus use lseek() to just to back to the beginning and read it again.

This is all jolly well and very satisfying. We’ve hooked up a button logically to the operating system and we’re now able to read it and do something useful with it.

“And then, the murders began…”

Filled with confidence, I proceeded to code up the approach of using GPIO interrupts into user applications. Knowing that we needed to ultimately allow for device with way more than the single button on PineDio Stack, we thought about the configuration scheme. The existing scheme is a series of entries in board.h like this:

#define BOARD_GPIO_INT1 (GPIO_INPUT | GPIO_PULLUP | \
GPIO_FUNC_SWGPIO | GPIO_PIN12)

Initially, we ran into problems if the same pin were configured to be both an output and in input. On PindDio Stack, sharing the button with the vibe didn’t seem completely unreasonable. We could, perhaps, keep the port as an input most of the time and only change the direction when we knew we needed that GPIO line to be an interrupt. We’d lose button functionality while vibing, but that didn’t seem so bad. We put a TODO in the code and vowed to come back to that. Still, that killed most of a day to learn that lesson. (Spoiler: you just can’t do that on this chip. You HAVE to reverse the pin.)

We knew the dance between
#define BOARD_NGPIOIN 1 /* Amount of GPIO Input pins */
#define BOARD_NGPIOOUT 1 /* Amount of GPIO Output pins */
#define BOARD_NGPIOINT 1 /* Amount of GPIO Input w/ Interruption pins */

And

#define BOARD_GPIO_IN1 (GPIO_INPUT | GPIO_FLOAT | \
    GPIO_FUNC_SWGPIO | GPIO_PIN10)
#define BOARD_GPIO_OUT1 (GPIO_OUTPUT | GPIO_PULLUP | \
    GPIO_FUNC_SWGPIO | GPIO_PIN15)
#define BOARD_GPIO_INT1 (GPIO_INPUT | GPIO_FLOAT | \
    GPIO_FUNC_SWGPIO | GPIO_PIN19)

…and we knew those blocks were precarious. Keeping them in sync is awkward. We’d debugged those before and fixed several issues there. It certainly killed our demo app to not be able to have a pin readable as both an _IN1 and _INT1 device, but we thought we’d proceed and come back to it. Another TODO.

We talked about the potentially large numbers of buttons (even if multiplexed into a keyboard multiplexing layer, as is possible on the BL702/704/706) of this and we thought about the number of places in the BL602 code that were passing around bitmaps of the available pins in uint8_t’s. We fixed as many as we could, but that hung in our mind of needing consideration. Add a TODO.

We knew that several stars had to align in order to actually receive an interrupt on a pin at the hardware level. The interrupt source needs to be present, e.g. by pressing a button. The GPIO register itself has to have that port configured as an interrupt source. The GPIO global register has to unmask that interrupt. The CPU has to enable interrupts for the GPIO by setting the correct bit in BL602_IRQ_GPIO_INT0. The mask on the CPU core itself needs that interrupt enabled. Of course, an interrupt vector has to be present for the CPU core and successfully jumped through and that code then has to find an appropriate function registered at BL602 portability layer which is then responsible for calling the function registered in user layers. It just didn’t work.

We found that sometimes, replacing the portable interrupt or GPIO abstractions with the BL602-specific layers would sometimes help – and sometimes made them worse. It was definitely making the code less maintainable and simply doing unnatural things to the (otherwise sensible) abstraction models.

We started thinking through cases of pins being shared, such as in our vibe + button case and our interrupt + traditional read case. We also started having issues modeling hardware that was similar, but not quite the same and figuring out how that would map into shared apps that needed different configurations and thus, different board.h entries.

The TODOs kept piling up for code that was missing or just wrong. It was not pretty…and we weren’t getting particularly close to working code for what should have been a simple demo. The BL602 layer was, amongst many problems, just not compatible with the shared upper/lower split model that was needed for the final two approaches we sat out to write.

The good news is that there was light in the proverbial tunnel for us.

The current BL602 implementation used a very simple model of GPIO pins that was expecting a low count of input, output, and interrupt pins that were all independent and manually configured. We were clearly outgrowing that model. The other model offer in Nuttx was already on our radar as something we were going to have to implement soon-ish. Interrupt Expanders in Nuttx allow a 1:1 mapping between a device’s physical pin and its name in the /dev tree. They do away with the entries in config.h

While I was struggling with this code, Lup was coming off wrangling the SPIO driver for the display and working on the touch driver. Both of those were ALSO running into related issues in the BL602 port of Nuttx. Lup had already recognized that we were falling into the “sunken cost” development fallacy.

For example, we were each implementing hacks in the BL602 port (such as copying entire sections of code just to manipulate a single bit differently because the common code didn’t have access to the needed info to know the direction and type of the port) and the needed types were static and private.

This was our breaking point.

I had to take a few days away from the code for personal reasons and Lup reprioritized the next chapter in his book to be “Implement GPIO and Interrupt Expander” so we could get all three of these drivers (screen, touch, button) back on track with portable code being portable and possibly all working at the same time – something we couldn’t really do with the board.h model.

This article is both a bridge between some of the gaps in recent articles to explain the issues that necessitated the development of the GPIO Expander in Nuttx and to act as a placeholder until we can roll in a sensible button handler.

Thank you for reading this far and thank you for your patience while we sort this all out. Enhancing and fixing the bottom parts of the Nuttx BL02 part has been challenging and and distracting relative to the projects we’ve set out to undertake, but we hope you’ll find the results useful. We hope to provide enough encouragement and background for others to help in that journey and build upon it for both the public tree and in your own projects.

The Bouffalo BL602 family of parts is a very popular low-end RISC-V part. It has WiFi, Bluetooth, and a handful of GPIO parts with 1928K of RAM and 128K of ROMso it’s able to hold. The 192Mhz part with 276K of RAM and 128K of RAM is low cost (<$2 in bulk) making it popular for individuals with homemade prototypes or commercial use. Development boards like Pine64’s Pinecone and Pine nut series are easy ways to get FCC-certified radios in handy breadboard-ready packages.

But…

There’s always a “but”, isn’t there? 

The development process can be frustrating. There are several code uploaders that simply don’t work as expected, particularly on a MacOS environment. For this article, we’ll even fast-forward over that unpleasant fishing lessing and just give you a fish. Even Bouffalo’s own BLDevCube, if available for your OS, doesn’t get high marks. I’ve spent hours working with Bouffalo engineering and still don’t have it working.

Use https://github.com/spacemeowx2/blflash. Rust apparently doesn’t know how to set the bit rate above 230kbps (where POSIX ends) on MacOS, so you have to upload more slowly than our Linux peers. The command to use is cargo run flash /tmp/sdk_app_st7789.bin –baud-rate 230400 –initial-baud-rate 230400 –port /dev/tty.usbserial-1440. Poof. You now have an upload command that works, is scriptable, and is easy to recall from command line history.

The thing that’s harder to script is the amount of physical fiddling that’s required. You have to move a jumper on IO8 from L to H, press the reset to start the board’s native code downloader, then move the jumper back and press the reset again. It’s very easy to miss one of those steps while debugging a binary, so you end up looking at the source for version N, but running version N-1 on the device. As you can guess, it’s frustrating.

I’ve long had it in my mind that the jumper was a pin on the address bus and the CPU needed to be connected to one block to program it and another to run it. (That sounds wrong now that I’m typing that, but I have had hardware in my past that required this.) The schematic for the Pine64 board is dead simple as all the ‘magic’ is in the canned castellated board.

There is no magic to this pin. This pin is connected to GPIO8 on the BL602. The state of GPIO8 is polled exactly once in the bootup sequence. Though there are pullups via the jump to high or low, the pin floats in the ‘low’ state, which allows the device to boot to the flashed code by default. So if you just remove the jumper, the board runs the last code you squirted into it. That seems a nice default. How can we use this to our advantage?

What if we scavenged a momentary contact switch for this? PC power supplies haven’t had “real” power buttons in decades. There’s a momentary pushbutton that sends a request to the power supply to kindly turn off or on, based on the momentary push of a button that just usually happens to feel clicky. They usually just happen to have a header that’s on .100 posts, so they’ll just slide right on.

With this “hack” in place (“Number 143 will blow your mind!”) resetting the board can become a fluid motion that you can commit to muscle memory:

  1. Press and hold your newly attached button.
  2. Press the reset button.
  3. Release the reset button.
  4. Release your new ‘boot button’.
  5. Start your code download.
  6. Press the reset button to begin running your code

It’s probably possible to release the button too fast as several opcodes have to be executed to configure the processor and ultimately poll that button, but it’s my experience that as long as you treat it Press A-B Release B-A, it’ll come up executing the downloader every time. 

Also, to summarize the key parts of the BL engineering spec for the bootloader, it’s helpful to recognize the boot flow.

  1. Reset Vector -> chip setup -> check GPIO 8. Is it low? Jump to user code. Else, start bootloader.
  2. The bootloader will start spraying ‘.’ (period) character while it’s listening to the serial port. These are at 2,000,000 bps by default. This is unfortunate because it’s a high enough rate that many programs can’t listen at this spec. I’ve not counted them or put them on a scope, but I’d say there are 8-10 of these a second.
  3. Inside this same loop, it’s also listening to the serial port. If it receives a ‘U’ (0x55 – chosen to maximize bit toggles so it can sample widths) it will try to reset the serial bit rate to match that speed and the ‘.’ pattern will continue at the new rate. Depending on exactly when that ‘U’ is received, it may take a couple of these to sync up at another bit rate.
  4. Now that the initial bit rate has been agreed upon, a program like blflash can do “protocol stuff” (documented elsewhere) to send the download to the BL device.

If you’re running a program like CoolTerm to actually talk to the device, it’s useful to set it at the matching bitrate. You’ll know you’ve missed a step if  you see the streaming period characters because that means the device is listening to you typing, awaiting “protocol stuff” packets, instead of the upload program, like blflash. Starting and stopping (‘connecting’ and ‘disconnecting’ in CoolTerm) the application you may be viewing the serial port is another step to synchronize with this. It’s for this reason you should try to quickly get your app to a state where it can communicate via a screen or blinking lights just so you don’t have another step. That’s just an unfortunate reality of hardware makers giving us one port to use as both code uploader and as console.

Enjoy jumper-free life!

P.S. just attach it with one leg dangling free so you’re less likely to lose it.

Because of the difficulty downloading disk images from geographic distance or sites that may not translate well, this is a collection of important boot images for members of the D1 RISC-V processor. We have collected images and information from Fedora RISC-V Project and Debian RISC-V project  as well as some information by Sunxi.  The information is mostly targeting the Nezha class of boards, but may be useful for other boards based on the Allwinner D1or D1s/F133 chips, particularly those built by Sipeed.

There is a security measure enforced by our hosting service that everything has to be served as a zip file. Thus images that are already a compressed image (.zst, .gz, .img, etc.) are zipped again. Sorry.

Fedora

Debian

 

 

 

 

 

For a while, SiFive and LLVM were both developing support for RISC-V Vector 1.0. LLVM is now the only one in active development.

Jim Wilson, a GNU developer of decades, works for SiFive and while he wasn’t the one doing the work, it seemed likely he knew who did, so this is authoritative. He recently said:

There is no actively maintained gcc rvv support, and no ongoing gcc rvv development. Current work is all in LLVM, and LLVM is recommended if you want rvv support. … SiFive abandoned the gcc rvv work and is doing only llvm rvv work now. The gcc rvv branch is badly out of date.

Reading deeper into the GCC development list, this is really just a form of tough love as work on GCC’s RISC-V vector (and auto-vectorization in general) has been talked down before in July, 2021: 

 It isn’t up to date with the evolving RVV ISA spec, it isn’t up to date with the evolving RVV intrinsics spec, there are ugly hacks in the vectorizing optimization passes required to make it work, there is no autovectorization support, it is missing basic optimizations like eliminating duplicate vsetvli instructions, etc. The current status is that it is only useful as a toy for demos. SiFive and a few other organizations are contributing to the LLVM vector support, but no one is contributing to the gcc vector support. Alibaba has expressed some interest in contributing recently but it isn’t clear how we will handle their patches yet. The current stuff was mostly done by SiFive, but SiFive is not currently interested in funding this work.

I’m less sure of rjiejie‘s credentials, but they may work for Alibaba/T-Head, the maker of the cores used by Allwinner in D1 and D1s. That may be the “we” in his comment:

We have also supported/maintained the RVV v1.0 feature, you could download prebuilt gcc toolchain from Alibaba website[1].

Registration for the site is required and Google Translate doesn’t handle it well, so I’m not sure, but that may be a path for someone really needing GCC with Vector 1.0. It’s not clear if that work handles 0.7.1 of Vector as was used in D1. The C910-based products also seem to support 0.7.1, so their 1.0 support must be for future chips.

LLVM is one of several projects that has struggled with the issues of handling multiple V versions in the same code, but their resolution wasn’t clear. Simulator QEMU “solved” them problem when adding 1.0 by dropping support for Vector 0.7.1.

SiFive is “only” one of many core vendors and that they can’t be expected to carry the development/maintenance/support for such things by themselves, but it’s surprising (to me) that they’ve halted development.

T-Head has binaries (and maybe source) for GCC that supports V1.0 and probably 0.7.1, though that may be in a branch as it’s pretty clearly a dead end now that V1.0 has been ratified. LLVM and SiFive were, at least at one time, partnering in LLVM development. LLVM seems to have an active plan and are shipping V1.0 support now.

For GCC’s status to be “useful as a toy for demos”, it’s probably a disservice to even have it in the default builds of GCC until someone is willing to fund the couple of person-years that Jim mentions to get it on track. At least bugreports to LLVM are likely to get traction as it’s actively developed.

For now, if you’re developing vector code on RISC-V, prepare to pair your toolchain with the chip/simulator you’re using. It’s likely to be finicky for a while.

I’ve been away from writing for a bit for personal reasons and I’ve missed talking much about many events in the RISC-V world this year. Here’s a jumble of thoughts from October of 2021.

Low Points: BeagleV Starlight canceled, Nezha/D1 launch issues

I was lucky to have tinkered with the prerelease BeagleV board (codenamed Starlight) that featured the StarFive
JH-7100 SoC. It was well documented, had an amazing technical group of active participants from both corporate and hobbyist backgrounds, all working together on merit, and good tooling. Antmicro’s ‘Renode‘ emulator made developing on these parts a breeze.

Unfortunately for the business, BeagleV/Starlight and StarFive were unable to reach a production agreement and BeagleV Starlight project was cancelled. I did some software and hardware work that went into the proverbial chipper, but I managed to learn and refine some skills along the way. I remain hopeful that the low-volume (Two core) JH-7100 and later (Quad core, embedded GPU, PCIe) JH-7110 will be delivered at a similar price point by the likes of Antmicro or Radxa, which has already missed their ship date. It’s all resulted in some thrash, but it’s possible that all the players (Beagle, Antmicro, Radxa, Starfive) dust off and ship RISC-V boards.

I have possession of a Nezha developer board. This is the official development board made by Allwinner as a vehicle for their D1 chip. By contrast to StarFive, documentation on this device and board is poor.  The maker of the chip and the board, Allwinner,
having a pretty poor record playing nicely with open source developers with license violations being common. When it was at a price point similar to BeagleV, it seemed an underdog as a single core device, but it did have the claim to fame of being the first shipping device of supporting the RISC-V Vector extension. I’ve been a member of a few different discord/slack/telegram groups for this device and they’ve all been dominated by people stuck at the starting line: just finding a maintained distro that doesn’t require a login in Chinese and a phone number in China is a common challenge.

Unfortunately for many developers, D1 supported only 0.7.1 of Vector, which has source and binary incompatibilities with the final 1.0 version of that extension which is currently (October 2021) in final stages of public review. This part also really requires Allwinner’s own use of GCC/Binutils to use these extensions well. Interestingly, the RISC-V part of this SoC comes from Alibaba’s XuanTie C906 line, which was itself recently open-sourced, though there have been serious issues trying to land Alibaba’s incompatible work in upstream projects like GCC and QEMU.

I’d love to be able to comment more on the actual development board, but can’t as it appears my board is apparently totally DOA. I hope to be able to write more about it soon.

This board gets the (somewhat deserved) criticism of being overpriced when compared to high-volume devices like Pi and the (awkward) criticism of  having a single 1.0Ghz core and relying on an old version of the Vector specification. As the final version still doesn’t exist and fab times just plain take a while to get from Verilog  to real silicon, we can be only so mad at the first chip to support even a pre-release V spec. We can be more upset that the chip requires violating the RISC-V specification on reserved bits in the paging machinery. All this does lead to an up-looking highlight to finish up this catch-up.

On the Horizon: Allwinner D1s/F133

This week, there’s been interest in a new revision of the D1. The Allwinner D1s (sometimes called the “F133” for reasons I haven’t yet grasped) is a cost-optimized version of the original D1. Where the D1 really seemed to ship only with their own development board, Nezha, the D1s seems to come out of the gate ready for the likes of SeedStudio and Mango Pi’s ~$10USD RISC-V board  or in low quantity to put on your own open-source boards, like Xassette

It’s a slightly confusing product, but some of that may just be translation/documentation issues.  It’s cost-reduced, and that filters through to the boards we’ve seen so far. There’s 64MB of RAM on board, but it’s sold as “Linux ready”.  The removal of HDMI signaling means no monitor and 64MB will require a very stripped down system. Cramming Linux into the 8MB on a K210 was (barely) possible, so this must be possible, even if cramped.  Still, for a single-purpose or educational environment, that’s probably OK. The Allwinner F133 overview avoids any comparison to D1, refers to itself as “video decoding platform”, and even avoids use of the phrase “RISC-V” completely.

It’s interesting that one of the most controversial RISC-V chips of 2021 managed to ship a second revision this year while we have so many that have just seemingly collapsed under their own weight or never found their legs beyond original announcements. (Blink twice if you’re alive, PicoRio!) 

As we approach the end of the year, we’ve had quite some changes in the RISC-V ecosystem. It’s likely that the product families that have most met or exceeded my expectations are the BL602/706 family and Espressif’s menagerie of ESP32-C3 and ESP32-C6.

What have been your biggest disappointments or surprises in RISC-Ville? 

 

I normally don’t do “scoops”, but as I write this, I can find no other pages on Google in English mentioning the Bouffalolabs BL562 and BL564 RISC-V chips. Even Bouffalab’s own page is pretty scant right now. (I’m writing this late on 2021-03-31 and no, this isn’t April fool. Maybe it is and I’ve fallen for it, but it seems terribly non-funny…) However, this seems like a very interesting contender in the low-power RISC-V processor market. It’s very likely a subset of the already-successful BL602/BL602, but without the 2.4Ghz radios that give it WiFi or Bluetooth.  This also means the parts of the chip that have the most contentious NDA requirements for certification are simply not there.

Comparing Boufallolab’s own overview sheets of the BL562/4 and BL602/4 really highlights that only the yellow block, the RF radios, are different. The pin counts are the same as BL602/4, with at 32 or 40 pin QFN packages. It’s very likely the same RISC-V core running at speeds up to 192Mhz and with 276KB of RAM and 128KB of flash ROM.

It seems likely they’re pin-compatible, but we don’t yet have specification sheets with that level of information that I can find.

BL602 is already a price leading choice for low-end designs, with single-piece pricing of about $1USD. It’s easy to imagine that bulk orders can reduce by that a third or more. We can probably look to the BL602 for real-world performance measurements. The clock speed over the 108Mhz Gigadevices GD32VF103 family has given it a hand up in my own measurement. (I don’t have formal numbers.) GD32V, probably the most natural device to compare these two, ships in QFN36, LQFP48, LQFP64, and LQFP100 packages, so it has more I/O, notably USB support, which is absent in BL562.

This entry is a bit of a surprise as the medium (“runs Linux”) and high end(“runs a graphical desktop”) developments in RISC-V have been much publicized, it’s important to remember that not everything is IoT or needs to be able to render Netflix at 4K. With a smaller size, lower pin count, we score another gain of modernization. While GD32V’s 32K (max – there are smaller ones) of memory can feel a bit cramped, the 276KB of RAM may feel downright luxurious in some designs.

BL562

General purpose RISC/V SoC

BL602/604

RISC-V core with 802.11 and Bluetooth

As a general-purpose RISC-V processor, this is sure to score some commercial design wins where pennies count and hobby interest, where good development tools matter. Bouffalo Labs, in cooperation with SiFive, have an established SDK that’s been picked up by Pine64  and SeedStudio.  There has been some jockeying lately at high-end hobbyist or media-player class devices, so it’s refreshing to see another player come back, wearing a slightly different costume, with a solid part in the dollar (or less?) market that’ll keep our rectangles blinking.

Assuming it’s the same RISC-V core (surely!) as Bl602, it’ll build on the established SDK provided by Boufallo and forked by Pine64 for their  PineCone and PineNut lines and by Sipeed for their BL602 product and DoIt for DT-BL10.

Epilogue

Is it a scoop? I don’t really care. I’m always astonished how quickly things get to the likes of CNX, Reddit’s/r/risc-v, and the Twitter buzz. There is, of course, the time dilation between tech in China and the Western World. I was clicking around on Boufallo’s site, trying to find information on yet another part, and fiddled with the URL when I landed on BL-562.

At least for some short time, I think I have a reasonable claim on “first” and maybe even “most comprehensive”. 🙂

We’re all familiar with the fable of the boiling frogs, unable to sense the change they’re (literally!) immersed in. Enthusiasts of RISC-V architecture may be encountering the same right now: late 2020 gave us a steady stream of new hardware announcements, but we may not have a great sense of us since the hardware isn’t always possible to order yet. Let’s review some of the upcoming products in this market, duly nothing that products can change or get canceled before they even ship.

We had two major new families of entries in the iOT category. Both use the RISC-V to drive WiFi and Bluetooth radio stacks. Bouffalo Lab’s BL602 is available in quantity now. Starting around $2.50 for a module with multiple development boards in the $5-$10 range (including Pine64’s Nutcracker for PineCone and the DoIt DT-BL10), this chip starts with a core from SiFive and has 802.11 b/g/n and Bluetooth 5. The upcoming BL-702 family adds Zigbee radios. There is enough compute resources (CPU, RAM, Timers, etc.) that you can build your own software right onto the radio chip via their multitasking OS and open development kits. You may recognize this as the basic model popularized by Espressif in their ESP8266 in recent years.

Espressif also embraced RISC-V with their upcoming ESP32-C3 family. It’s interesting that this chip doesn’t even get a distinct name at this point as Espressif apparently sees the CPU core as only a small part of the product. Still, by volume, the ESP32-C3 is likely to become an extremely popular choice.

Moving up a step computationally, we enter more traditional chips and single-board computers. Alibaba’s Xuantie 910 is widening into a family of chips. The C906 is being marketed for more entry level class, but still featuring a load of I/O, multiple cores, support for the still-not-ratified Vector extensions, and more. Press releases tend to mix up the 910 and the 906, but they both seem pretty hot.  In late January, anAndroid Open Source Port of C910 was demonstrated. Embedded specialists Sipeed have announced a C906 development board that’ll run Debian and that starts at $12.50. If Sipeed does for that what they’ve done for GD32V and K210, we should see lots of interesting SBC projects from them.

Sipeed teases C906 RISC-V board

Rios is bringing us a claimed competitor to the Raspberry Pi called the PicoRio. It’s coming inthree stages:

  • PicoRio 1.0 is a headless, four-core RV64GC that’s capable of running Linux at 500Mhz. It’s been used from 2020H2 to an expectation of beta in 2021H1.
  • PicoRio 2.0 adds Imagination’s PowerVR GE7800 XE series GPU, which may finally bring a GPU-capable RISC-V development board into casual hobbyist price points.
  • PicoRio 3.0 strives to bring the performance to be comparable to a tablet or desktop computer.

Another entry in the Pi-class of hardware, though not at Pi Price, is the Beagle V from the group that brought us the famed Beagle Bone. It uses two of SiFive’s U74 cores at 1Ghz includes 8GiB of LPDDR4 RAM, gigabit Ethernet, an 802.11n Wi-Fi + Bluetooth 4.2 chipset, and a dedicated hardware video transcoder supporting H.264 and H.265 at 4K and 60fps.The system also offers four USB 3.0 ports, a full-size HDMI out, 3.5mm conventional audio jack, and a 40-pin GPIO header. As a snack for those interested in AI applications, it also features  a Tensilica Vision VP6 DSP for machine-vision applications, a Neural Network Engine, and a single-core NVDLA (Nvidia Deep Learning Accelerator).

Core provider SiFive is bolting Freedom U740 cores to a min-ITX design in HiFive Unmatched. X16 PCIe expansion, 16GB of DDR4 RAM, NVME M.2 slot, Gigabit ethernet, and four cores at 1.4Ghz should make this a entry-level desktop-class system, including host-CPU class of building for native applications at full scale. For professional developers, the $665 entry ticket should be more appealing that the $999 for the board’s predecessor, Unleashed.

The PicoRio V1 and Unmatched have already slipped from Q4 into 2021.

Still, while we’re not bathing in fresh alternatives to the GD32V and K210, we have several alternatives on the proverbial launching pad and several options to bring excitement into lives and toolboxes of RISC-V aficionados.

What do you see coming up? What are you most anxious to work with?

 

It is not an exaggeration that the current wave of IoT devices owes a lot to the Espressif ESP8266  family of devices. That means a new member of this family is a big deal and it’s pretty exciting that the newest, the ESP32-C3, moves to a RISC-V core. We have a draft of the ESP32-C3 data sheetfor those ready to dig in.

ESP8266, Quick History

In 2014, The ESP8266 came to the scene, bundling a full WiFi package, including antenna, ROM, RAM, and a CPU into a package that integrated with Hayes modem-like command set for communicating with a host that could be as simple as an Arduino or less. Eventually, enough was learned about the core, a Tensilica  Xtensa Diamond Standard 106Micro running at 80 MHz, that hackers were able to run their own code on board and often eliminate the “host” processor completely, often for under $10 at that time and in decline since.

ESP32 was the 2016 successor, bringing in Bluetooth and more powerful integrated CPU. Available as a chip or a (FCC-tested) module that included antennas, the most common configuration was dual-core, allowing a less cramped balance of a developer’s own code with the integrated feeding of the radio stack. The Xtensa LX6 cpu core was still not widely loved by programmers with toolchain issues remaining common.

Esp32-C3: Now with more RISC-V

Early in November 2020, we first got hints of a RISC-V design, the Bouffalo Labs BL602 family, making an attack on that market of low pin count, high integration devices striking a blow at the ESP32 price point of about $5. Late in November, we now have confirmation that (awkwardly named) ESP32-C3 is being released by Espressif as the newest member of their family, though details are only slowly coming out of China, as they do.

ESP32-C3 will be pin-compatible with the large ESP8266 family. It includes a 160Mhz 32-bit RISC-V core toreplace the Tensilica CPU. As you’d expect in 2020, b/g/n WiFi and Bluetooth Low-Energy (BLE) are table stakes. ESP32-C3 brings 400 kB of SRAM and 384 kB ROM.  

We don’t yet know what RISC-V core they are using (SiFive, Nuclei, etc.) or if they’ve created their own.  As this is likely to be a relatively humble RV32IMAC (or less!) design, we’d expect high degrees of compatibility with the wide variety of RISC-V tools that we already have. We don’t know if the trend of binary blobs (a problem being tackled by Pine64) will remain, but it’s likely they will given the regulatory landmine around radios.

With access to the wealth of dev tools, socket compatibility with ESP8266, and Espressif’s embrace of the maker communities, this device is sure to be a hit. Unfortunately, it’s a little too early for a stocking stuffer this year, but it’s one of a series of parts that’ll make RISC-V fun to follow in 2021.

The GD32VF103 RISC-V System-on-chip from Gigadevices fit an amazing price to performance rate. Their 108Mhz speed, on-board RAM, and low cost (parts around $1.30USD with boards like Longnan Nano commonly under $5) make them a favorite of hobbyists.

There’s a nuance buried in the specification of these parts that allows for faster setting and clearing of the GPIO registers than I’ve seen in any of the example code for these. This approach makes no difference if you’re just toggling a “power on” LED or other low frequency signal, but in a multitasking operating system or a high performance application, there is an easy optimization. 

Common practice

We’ll use the Longnan Nano board just to have a tangible example to talk about. GPIO pin 2 is found in the GPIOA register bank. This pin is connected to a blue LED on the board. It’s wired “backward” from the obvious meaning; you turn the bit off to make the light turn on. This means we often see code like this:

if (on) {
            ((GPIO*) GPIOA)->output_control &= ~( LED_BLUE );
} else {
            ((GPIO*) GPIOA)->output_control |= ( LED_BLUE );
}

This is a pretty common idiom in low-level code: we read the output_control register, mask off the blue bit, and store it or we read the output control register, logically or in the blue bit, and we store it. While we can do better if we use dedicated functions to differentiate off and on or if we can rely on inlining and constant propagation, as a matter of perspective, it takes GCC about 44 bytes to implement this.

Hazards lie ahead!

This code also has problems in a multitasking or preemptive environment. What if something ELSE is modifying any other bit in the GPIO A outputs? Maybe the hardware people helpfully put the bit for the LED in the same register as the launch missile bit. (Thanx, guys!) Maybe you have a multitasking OS and something else may interrupt your access to GPIOA between the time you do the load and the time you do the store. (With blinking LEDs and nothing else on the GPIO, as is the case for a Nano with no external hardware, this doesn’t matter). In real life code, you probably need to raise an interrupt priority level or grab a mutex on the GPIO or something else to prevent competing code from stomping on the reads and writes. To help visualize the problem, let’s look at the generated code. (This is for the red LED that’s on pin 13 of GPIOC, but follow the problem.)

0x08008e1a <+28>:	lui	a4,0x40011
0x08008e1e <+32>:	lw	a5,12(a4) # (MARK A) offset 12 at 0x40011 is the GPIO C register. Read that into A5
0x08008e20 <+34>:	lw	s0,12(sp) # this is just the compiler restoring the saved s0 register so we can return later.
0x08008e22 <+36>:	lui	a3,0x2.   # Since this is bit #13 and we can only load immediate 12 bits, load upper of a3 here.
0x08008e24 <+38>:	or	a5,a5,a3. # or the bits in A5 (that we read out of the chip) with or 0x20000 to set bit 13
0x08008e26 <+40>:	sw	a5,12(a4) # (MARK B) store that into the output register.

If anything else touches that register between MARK A and MARK B, Bad Things are going to happen and you may risk launching missiles instead of blinking a light depending on what else is in that register. This is why you probably need to brace it with a mutex or whatever is appropriate for your system.

There must be a better way!

There is a better way and it’s unique to the GPIO registers, but it seems like something that Gigadevices brought forward from ARM-land when they “found inspiration” in the GPIO system of Blue Pill, which is very similar. Join us now on page 104 of the 536 page hymnal, GD32VF103 User Manual EN V1.0.

There is no need to read-then-write when programming the GPIOx_OCTL at bit level, user can modify only one or several bits in a single atomic APB2 write access by programming ‘1’ to the bit operate register (GPIOx_BOP, or for clearing only GPIOx_BC). The other bits will not be affected.

That’s pretty awesome! The chip will guarantee atomicity. All we have to do is write the bit number into the GPIOx_BOP to set the bit or the bit number into GPIOx_BC to clear that GPIO line. Going back to our example of the blue LED in GPIOA that’s on bit 2, we can thus write 1 << 2, which is 4 into GPIOA_BOP to turn off the LED (remember, on the demo board, they’re backward) or write a 4 into GPIOA_BC to turn it on.

((GPIO*) GPIOA)->bit_op &= ~( LED_BLUE );

We can’t affect any other bits in the register and that means we don’t have to read it and we don’t have to worry about atomicity issues needing to grab a mutex or raise the spl. When we look at the equivalent of the code above, once all the conditional stuff is stripped away in the same way.

0x08008db0 <+6>: lui a5,0x40011 # 0x40011 << 12 - 2028 = 0x40010814
0x08008db2 <+8>: li a4,4 # load up our bit number into A4
0x08008db4 <+10>: sw a4,-2028(a5) # store a4 into  40010814

The same store to 0x40010814, bit_clear, would turn off that GPIO pin.

This appears to be unique to the GPIO registers in the GD32V line.  The comparable GPIO registers in competing parts like the Kendryte K210 don’t have this feature. 

In a standalone, general purpose function like this, the measurements are small. If you’re able to reduce these to functions or templates that have constant arguments and can be inlined, but don’t need to gra a mutex, it’s a potentially large difference.

It’s easy to argue that if saving a few clock cycles on GPIO accesses in 2020 is a priority, that you’ve lead a bad life and are being punished. That may be true, but that’s the life of an embedded systems engineer. A store of a constant to a constant address is usually “better” than a read, a modify, and a write. If that GPIO access is controlling the laser that’s cutting into your eyeball, you may appreciate the code being as streamlined as you can get.

Longnan Nano with GD32V MCU and an OLED display.

I haven’t talked much about it (this being my first post and all…) but I’m pretty smitten with the current wave of RISC-V processors. I really like that it’s an open development platform, which means chip companies can buy or create the core that works with the basic instruction set and the programmers have a consistent and shared set of tools for the entire line ranging from $.10 parts for a home thermostat to a workstation class part. Rising tides and all that.

As a software guy, there are chip vendors that I’ve never heard of, but Bouffalolab Lab seems to be a pretty new name to most of us in the U.S. They’re making waves with the announcement of the BL602 and BL604 Systems-on-Chip (SOC). This part is just becoming available in the final week of October 2020 and it’s looking pretty interesting.

There are two parts in the immediate family: the BL602 and the BL604. The two parts differ by the number of GPIO pins and thus, the size of the external package. BL602 has 16 GPIOs and comes in a 32-pin QFP. The BL604 bumps to 23 GPIOs and rides in a 40-pin QFP. The 32-pin part should come in around 5mm per edge, so it’ll fit in really small applications. The integration is high enough that on the 32-pin part, half the pins are devoted to GPIO with the rest being the mandatory 3.3V, ground, crystal in, grounding, and other housekeeping. Outside the party, but similar enough that the SDK references them sometimes by accident, in the BL606 and 608. On those you go to ARM cores and trade wireless for audio, but that’s not our focus today.

It has 2.4Ghz radios, so that covers BTLE 5.0 and 802.11 b/g/n. Wi-Fi through WPA3 is supported. The microprocessor core runs at 192Mhz and comes from the SiFive cores. JTAG support is included and it claims support for the Segger family of debugging pods and software. It has 276K of SRAM, which is a really nice step up from the closest competing part, the GD32V’s which pack a mere 32K. Both parts have 128K of flash. There are two UARTs, but no USB controller like the GD32V. It has the standard alphabet soup that we expect in a post ESP8266 including SDIO, SPI, I2C, and such.

Bouffalo is a Chinese company and the documentation reflects that. Google Translate helps bring the doc and some code comments to us, but the reality is that many, many sections in the manual are just blank for now. The 34 page datasheet is oriented toward hardware developers, but does throw the smallest of bones to programmers. For example, it doesn’t tell you WHAT the UARTs are (it’s not a 16554) but it tells you where they are in memory.

A few companies are racing to put boards in the hands of developers. Both SiPeed and Pine64 seem to be early movers with prices hovering around $5USD with shipping being about as much again. I’m sure we’ll see an ocean of reference designs on Aliexpress and such soon. Currently, we have three such boards:

  • Pine64’s PineCone64: USB-C (yay!) via a CH340N and with RGB LED.
  • Doi.am’s DT-BL10: Micro USB
  • SiPeed’s BL602: Micro USB. Rumored to have FTDI serial interface. Might be same as Doi.am’s

Development environments include Eclipse + OpenOCD or Freedom Studio + OpenOCD. The company provides an app called DevCube (unrelated to the VR tool of the same name) to configure devices for production. This helps IoT makers prepare the flash for partitioning and OTA upgrades. It seems inevitable that the popular PlatformIO IDE will be supported. Since it’s a familiar SiFive core, just submitting the needed flash layout and programming data is probably on the order of dozens of lines of code to support PlatformIO for BL602/BL606.

If we look past the large number of blank doc pages and the high percentage of Chinese in the SDK, what do we find? This is actually the jackpot for RISC-V designs – there are working Mac, Linux, and Windows (Cygwin) GNU-based toolchains right in the tree at launch. Filenames are a bizarre mix of CamelCase and underscore_separators, even sometimes in the same directory. It ships with GCC 8.3, but since RISC-V was upstreamed a while ago, I’d expect upgrading to later versions to be easy. It relies heavily on Amazon’s FreeRTOS as the core. The entire package relies on external source (FreeRTOS, GNU Tool binaries, compression libs, etc.) by copying and not via git-referencing the modules by inclusion. This means you’re probably always on a stable build, but it also almost insures that you’ll always be behind releases of third party code, which can be bad in network-connected devices.  Amazon’s MQTT and Jobs interfaces are well represented. The expected libraries for a high-functioning 32-bit core are all there, with a pretty full ANSI libc with printf, compression and a filesystem for the flash. Startup code is a pretty straight SiFive example for GCC. There are 31 separately buildable apps for demonstrating features of the device, including a CLI for interfacing with it somewhat like boot monitors of days past. “Hello, World” is there, of course, but ironically doesn’t contain the string “hello” as it gets printed in a callback from the initialization state machine as it advances.

The network modules are, unfortunately, binary blobs at this time. I expect the community to make short work of changing that.

Overall, this looks like a good entry for the bottom/low-end space for wirelessly connected (or not) devices for hobbyist and commercial applications. The SiFive core is well regarded for performance and the performance should be better than the GD32V’s 108Mhz while the price lower than the AI-centric dual-core 64-bit Kendryte K210.

Start your engines with:

Official Boufallo SDK
Pine64 is offering hardware to developers that help get BL602 started.
SiPeed (makers of Longnan and Maix lines) has a BL602 SDK.
Doit has a $4 eval board with a BL602 SDK.
All three of those are forks from the Boufallo code. We don’t know yet how different the hardware is.

RJL