I was recently reading https://wp.josh.com/2020/01/22/wtf-sonoson-the-evil-companies-do-and-good-people-coming-together-to-stand-up-to-them/ where the author expressed an unpopular opinion on what was then the early problem of hardware becoming unusable when the companies lost interest. This is a bit off-brand for my posts here, but by definition, it’s my stage and I’ll write what I want.

(I’ll also come back to this article and link it up a little more than the original format of a response to a blog post would allow.)

It’s easy to seem smug here in the future of 2024, four years after this was written, but it’s a bit funny that one particular example was given:

Someday Spotify might kill an API and then you will not be able to Spotify with your old Sonos anymore. Disrupted access to services and overall functionality? Yes. But not bricking.

Since this article was written, Spotify DID build a hardware thing (literally named "the Spotify Car Thing"), reliant upon an API, and DID kill software support and thus, that product. They probably learned from the noise and smoke around this and just offered refunds to purchasers, who were then free to dispose of them.

“But wait! i am mad h4x0r! I’ll port ${SOFTWARE} to it and it shall regain former glory!”

Knock yourself out. There was even a burst of such reverse engineering effort. The reality is that a 4-core, undocumented ARM SoC was already underpowered in 2018 and would have been on par with a humble phone from about 2013, but with half the RAM. Talented people would rather work on a NEW $9 SBC that's WAY more powerful and better documented.

The result was that a few of them surely landed in hacker junk drawers but it’s likely that most of them went to that same landfill.

Google did something very similar with Stadia and Q, also. They couldn't keep the required services up, so they just refunded purchases wholesale. They actually released new firmware to make the joysticks and headsets turn into plain old Bluetooth versions of the same. Lots of people were grumpy, but it was hard to see it as a financial loss.

For every company that offers a refund, an apology that it didn't work out, and a recommendation to recycle the product, we have a dozen IoT and smart-home product makers that disappear in the night and silently DO brick the product. If it's your thermostat, you probably notice. How long can your camera or security system be offline without you noticing? It happened to thousands of people. In some cases it's tempting to wish the companies had prolonged the device's life *for some users* by providing enough source code and doc that hackers could, say, keep the security cams streaming to a local NAS, but even providing that may require resources a company facing an abrupt end is unable to muster.

This week, Fisker Auto is in bankruptcy and liquidating. What happens to firmware that routes the vehicles, monitors safety, and may need changes to comply with laws or recalls when the server running 'secretserver.fiskerauto.com' gets recycled itself? (A Tesla would be WAY less awesome without Tesla's servers being around for traffic and charging updates. Certainly FSD – such as it is – relies heavily on the company remaining interested in providing the service. We certainly have nightmares of a Fisker-like ending.) I think I remember Peloton facing a similar scare.

Insteon took a lot of heat for this, bricking every product they ever shipped. A group of Insteon customers actually purchased the remaining rights – domains, servers, and all – to bring the lights back on, literally! This is a very rare instance.

Amazon is pulling support for the business edition of their Astro robots, leaving 10-month-old, $2,400 robots useless. https://www.theverge.com/2024/7/3/24190410/amazon-astro-business-robot-discontinued-refunds That's not very awesome. Amazon didn't shutter; they could do better.

It's a relatively new era of physical devices outlasting the online services they depend on. As consumers, we all want our purchases to last as long as possible, so there's some natural disappointment. Some companies handle this well and some don't.

Kind of by definition, if you're reading anything I wrote, you're probably one to take things apart, look for JTAG pins, look for schematics, and see if you can carry on on your own. That's fine for a $35 camera whose demise merely made you sad. But what's your expectation when you have a lot invested in that product or company, like the guy who had invested thousands in Insteon products?

Companies able to issue full refunds are pretty hard to get too mad at, but that clearly doesn't work in every case – products sold through retailers, say, or a Peloton-style situation where the company itself just folds.

I don't have great suggestions either. It's messy and it's a fairly new problem for us in technology. I think the lifetime subscription for our first-gen TiVo was about my earliest encounter with a device that actually relied on an online presence.

What are your expectations?

One of the nifty things about common open source software is that lots of people write down answers to their own problems. (Like I'm doing now.) The problem is that commands change, software gets moved between packages, etc., and it's really hard to figure out what actually applies to anything. PHP seems particularly prone to this problem because it's old (in web-years) and because it's popular in the blogging community, so it gets blogged about a lot. (Like I'm doing now.)

I have a Pi 3 running DietPi for some internal IoT/network monitoring duty. DietPi works really well in a headless configuration, so I've used it forever.

I also have an internal page with some super-simple PHP that cranks off cURL calls to check connections to other pages. After finally finding where the logs live THIS month (/var/log/apache2/error.log), I was getting:

PHP Fatal error: Uncaught Error: Call to undefined function curl_init() in /var/www/index.php:81\nStack trace:\n#0 /var/www/index.php(101): checkOnline()\n#1 /var/www/index.php(194): checkAlive()\n#2 {main}\n thrown in /var/www/index.php on line 81

php.ini definitely had extension=curl (not libcurl, not libcurl.so, not php8.3-curl, not the hundred other suggestions) and it was definitely being recognized, because I could change the spelling of it and get different errors in the log. But what I wasn't getting was … curl.

Fast-forward about three hours.

dietpi@DietPi:~$ sudo apt-get install php8.3-curl php 8.2-curl
(and I notice now that it even fixed my typo. What a rare act of kindness from Linux CLI administration. 🙂)

This is a total act of desperation because I can absolutely see the curl.so object in the /usr/lib/php/20220829/ extension_dir that's helpfully provided by phpinfo(). But let's try…

Reading package lists… Done
Building dependency tree… Done
Reading state information… Done
Note, selecting ‘php8.2-curl’ for regex ‘8.2-curl’
Note, selecting ‘php8.2-curl-dbgsym’ for regex ‘8.2-curl’
php8.3-curl is already the newest version (8.3.3-1+0~20240216.17+debian11~1.gbp87e37b).
php8.3-curl set to manually installed.
The following NEW packages will be installed:
php php8.2-curl php8.2-curl-dbgsym php8.3
0 upgraded, 4 newly installed, 0 to remove and 0 not upgraded.
Need to get 223 kB of archives.
After this operation, 370 kB of additional disk space will be used.
Get:1 https://packages.sury.org/php bullseye/main armhf php8.3 all 8.3.3-1+0~20240216.17+debian11~1.gbp87e37b [27.6 kB]
Get:2 https://packages.sury.org/php bullseye/main armhf php all 2:8.3+94+0~20240205.51+debian11~1.gbp6faa2e [7,404 B]
Get:3 https://packages.sury.org/php bullseye/main armhf php8.2-curl armhf 8.2.16-1+0~20240216.40+debian11~1.gbp6cbea3 [31.3 kB]
Get:4 https://packages.sury.org/php bullseye/main armhf php8.2-curl-dbgsym armhf 8.2.16-1+0~20240216.40+debian11~1.gbp6cbea3 [157 kB]
Fetched 223 kB in 2s (146 kB/s)
Selecting previously unselected package php8.3.
(Reading database … 65559 files and directories currently installed.)
Preparing to unpack …/php8.3_8.3.3-1+0~20240216.17+debian11~1.gbp87e37b_all.deb …
Unpacking php8.3 (8.3.3-1+0~20240216.17+debian11~1.gbp87e37b) …
Selecting previously unselected package php.
Preparing to unpack …/php_2%3a8.3+94+0~20240205.51+debian11~1.gbp6faa2e_all.deb …
Unpacking php (2:8.3+94+0~20240205.51+debian11~1.gbp6faa2e) …
Selecting previously unselected package php8.2-curl.
Preparing to unpack …/php8.2-curl_8.2.16-1+0~20240216.40+debian11~1.gbp6cbea3_armhf.deb …
Unpacking php8.2-curl (8.2.16-1+0~20240216.40+debian11~1.gbp6cbea3) …
Selecting previously unselected package php8.2-curl-dbgsym.
Preparing to unpack …/php8.2-curl-dbgsym_8.2.16-1+0~20240216.40+debian11~1.gbp6cbea3_armhf.deb …
Unpacking php8.2-curl-dbgsym (8.2.16-1+0~20240216.40+debian11~1.gbp6cbea3) …
Setting up php8.2-curl (8.2.16-1+0~20240216.40+debian11~1.gbp6cbea3) …

Creating config file /etc/php/8.2/mods-available/curl.ini with new version
Setting up php8.3 (8.3.3-1+0~20240216.17+debian11~1.gbp87e37b) …
Setting up php (2:8.3+94+0~20240205.51+debian11~1.gbp6faa2e) …
Setting up php8.2-curl-dbgsym (8.2.16-1+0~20240216.40+debian11~1.gbp6cbea3) …
Processing triggers for libapache2-mod-php8.2 (8.2.16-1+0~20240216.40+debian11~1.gbp6cbea3) …
Processing triggers for php8.2-cli (8.2.16-1+0~20240216.40+debian11~1.gbp6cbea3) …
dietpi@DietPi:~$ sudo systemctl restart apache2

Voilà! Success. (Looking back at the output, the 'Processing triggers for libapache2-mod-php8.2' line seems to be the tell: Apache was serving PHP 8.2, so the curl extension I'd been carefully installing and spell-checking for 8.3 was never going to show up there.)

35+ years of administering UNIX-y systems and the darned things can still stump me sometimes. In this case, it's actually distracting just how much documentation there is for queries like "dietpi Call to undefined function curl_init()" – but the answers are for distros that no longer exist and PHP versions ranging from 7 to 5 to even 4, with, of course, the names of everything changing along the way, the preferred web server changing from Nginx back to Apache, etc.
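Before the next round of this, the mismatch shows up in seconds if you compare which PHP the CLI runs against which mod_php Apache actually loads. These are stock Debian/DietPi commands; adjust paths if your install differs.

dietpi@DietPi:~$ php -v                                             # version the CLI uses (8.3 on this box)
dietpi@DietPi:~$ ls /etc/apache2/mods-enabled/ | grep php           # version Apache's mod_php actually loads (8.2 here)
dietpi@DietPi:~$ php -r 'var_dump(function_exists("curl_init"));'   # does the CLI PHP see curl at all?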

So here’s my attempt to litter the web with yet more search results, but hopefully this helps the next “me” that’s trying to get an updated system working again at 3am.

This blog has always leaned heavily into RISC-V, but this year has been a bit of a RISC-V low point for me. That's probably worth an article of its own, but since I was just asked what I've been using lately, I have to admit my time has been spent on a family that's not RISC-V: ESP32.

“But wait, there are ESP32 RISC-V parts!”

Yes. And ESP-IDF is pretty awesome to work with, but so far Espressif hasn't announced a dual-core RISC-V part with radios and they haven't shipped their dual-core part without radios. I don't have high expectations for ESP32-P4 (see also: 2023's lowlights for this class of parts), but I'm loving the part they announced at the same time as their first RISC-V part, the ESP32-C3 – the (Tensilica, not RISC-V) ESP32-S3. Since I was recently asked what I was using for my own hobbyist projects this year and why, let me capture that here:

I'm really liking the ESP32-S3-N16R8 lately.

  • They're dirt cheap at around $5 USD on AliExpress.
  • 16MB flash and 8MB RAM is "enough" for everything I do. I don't buy specialized boards; I stock a bunch of these.
  • One USB-C cable gives me power, debug, serial console AND JTAG.
  • A second USB-C cable gives me the latter two on a port that doesn't reset when the CPU is reset. :-/
  • Lots of GPIOs, but the rules can be dumb. More in a moment.
  • Fastest device that Espressif makes. Yes, the P4 will ship someday… but it will have no WiFi.
  • Floating point AND a vector unit that's usable in AI or compute-heavy things like speech recognition. More compute in AES and other dedicated blocks.
  • Unlike previous ESP32s, PSRAM is fast and actually works well.
  • Specialized peripherals that are useful for things other than their intended purposes. Using solder pads as buttons is cool (OK, that's actually intended…), but using RMT to drive strings of addressable LEDs is pretty cool too.
  • Dedicated GPIO lets you program weird devices FAST, in almost single-opcode timings. Almost. Far from the many hundreds of clock cycles per write while arbiters swing across busses like in previous parts. (There's a small sketch after this list.)
  • USB peripheral on board (don't spend on a CH340 or mess with jumper posts and cables), so the flash can be a "disk drive" or the device can be a mouse or a keyboard or a MIDI controller or whatever you want it to be. (It's USB 1.1 full speed, so it's not a FAST disk drive, but it's a convenient one.)
  • USB host – you can plug a keyboard or a mouse or a MIDI device into IT.
  • A ton of GPIO (45), though a bunch of them are taken for USB, PSRAM, bootstrapping, etc. It can be hard to figure out what's safe. [Atomic14](https://github.com/atomic14/esp32-s3-pinouts) is helping with that. There is a big ole pin multiplexer that lets you reassign a bunch of pins to a bunch of other pins, but not quite randomly.
  • Cheap!
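Since the dedicated GPIO bullet is the one that surprises people, here's a minimal sketch of what it looks like with ESP-IDF's dedic_gpio driver on an S3. The pin numbers and function name are my own placeholders – check a "safe pins" map for your specific board before borrowing them – so treat this as a shape, not a drop-in.

#include <stddef.h>
#include <stdint.h>
#include "esp_err.h"
#include "driver/gpio.h"
#include "driver/dedic_gpio.h"

void fast_gpio_demo(void)
{
    const int pins[] = {4, 5, 6, 7};                    // hypothetical spare pins
    for (size_t i = 0; i < sizeof(pins) / sizeof(pins[0]); i++) {
        // Pad setup still goes through the regular GPIO driver.
        gpio_set_direction((gpio_num_t)pins[i], GPIO_MODE_OUTPUT);
    }

    dedic_gpio_bundle_handle_t bundle = NULL;
    dedic_gpio_bundle_config_t cfg = {
        .gpio_array = pins,
        .array_size = sizeof(pins) / sizeof(pins[0]),
        .flags = { .out_en = 1 },                       // output-only bundle
    };
    ESP_ERROR_CHECK(dedic_gpio_new_bundle(&cfg, &bundle));

    // Each write lands on the CPU's dedicated GPIO channel instead of
    // crossing the slower peripheral bus, so it's close to one instruction.
    for (uint32_t v = 0; v < 16; v++) {
        dedic_gpio_bundle_write(bundle, 0x0F, v);       // mask = all four pins, value = low 4 bits of v
    }
}

The RMT-driven addressable LEDs mentioned above get similar treatment from stock drivers, so neither trick requires hand-rolled timing loops.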

I think the ESP32-S3-N16R8 boards are similar to parts like www.vcc-gnd.com's YD boards or Espressif's own dev kits, but cheaper.

Downsides? There seem to be only a few road hazards that trip people up.

  • "Safe" pins are hard to figure out. Really. See above.
  • The onboard WS2812B isn't actually hooked to GPIO48 (letting you use it…) until you solder-blob that tiny "RGB" pad.
  • 5V isn't provided to USB-C peripherals ("OTG" mode) until you put a solder blob on the other side of the board to connect the VCC rails of the USB sockets.
  • There's confusion between the ESP32-S3's own USB/serial bridge on the USB-C connector on the left and the one on the right that goes to the USB serial bridge from WCH … which definitely requires a shady-looking driver for reliable communications on macOS.

Why are these boards so interesting?

  • There have been a bunch of C906 boards this year, but none have radios and none at this price point.
  • There have been a few JH-7110 boards this year. They’re far from $5 and simply aimed at a totally different market than these.

If Espressif offered a RISC-V part with the S3 feature and peripheral set, that would be awesome, but that’s just not a point in the price/performance/feature curves that anyone has tried to hit so far. I’d love to see it, though. 

Should we talk more about any of the above?

What are YOU building with? For this discussion, I'm asking about $5 WITH WIFI kinds of products, so not a NAS or an edge firewall or something, but rather holiday lights or an internet radio. What's your IoT GOTO device right now?

Beagle-V welcomes you to 2012.

I wanted to like the new Beagle-V. It was the developer edition of the original Beagle-V – abruptly canceled during development – that was my introduction to "big" RV64G systems, and I loved working on that board. Beagle wandered around in the weeds for a few years, but they finally announced a new RISC-V Beagle-V that they intend to actually ship.

Since it's basically another TH-1520 reference design, there's not a huge amount to say about it, so I'd planned to say nothing. (I just can't get excited about the TH-1520 and other C906/C910 derivatives.)

Then I looked closely at their picture.

USB 3.1 Gen 1 Cable sighting ... in 2023

They're actually using a USB 3.1 Gen 1 Micro-B SuperSpeed cable (… "Pro Plus Dominator 2000 'on a steeeck'" edition) in 2023, almost ten years after USB-C became mainstream.

I can think of only three explanations.

1) Someone in purchasing found a "deal" on thousands of these much-hated and largely forgotten-about cables.
2) Someone has a surplus inventory of adapters from that hated connector to USB-C so it can actually connect to your machine, and is hoping to liquidate them.
3) Common drug use in the workplace.

This connector was hated and almost immediately retired because it was the worst of all worlds. It had the flimsiness of Micro-B (the break-away design was actually a feature, not a bug – it was meant to sacrifice the cable instead of your $1000 phone if pulled at an angle). It could be used with either a normal Micro-B plug (with all the problems that connector had, including random power capabilities) or the Micro-B SuperSpeed plug, which was a very expensive and bulky connector, ensuring you couldn't plug anything next to it. You never REALLY knew if all the pairs in the cable worked, so you'd get a speed that was either Super or not. Seagate and Western Digital used them on a generation or two of MyBook-class drives, and some laptop monitors used them because they needed the USB 3.x bandwidth and USB-C hadn't arrived yet in volume. There's no mechanical latching, so you never know if it's fully seated. As they were only popular for a few months, the odds of having spares are slim. The SuperSpeed branding was so weak you never really knew if that was a 5 or 10Gbps connection, though they'd sometimes degrade to 6 or 8 Gbps if a pair was sensed to fail.

I wouldn't be surprised if these cables and accessories are out of manufacture. It's not like you're going to get a new pod for your protocol analyzer or an extension cord or other niceties.

That connector is the USB equivalent of the CCS connector for charging EVs: they had an existing connector they wanted to stay compatible with, but they needed extra pins/wires, so they bolted a sidecar onto it.

CCS Connector. Human hand for scale.

Some will say it's a bit silly to get worked up about the cable when you'll power the board from the barrel jack (which might be 5.5×2.1mm or 5.5×2.5mm, each slightly incompatible with the other, and in a variety of voltages and currents – instead of the perfectly lovely USB-C Power Delivery, which would give you the computer AND power connections in a single cable) or will never rely on a computer connection at all because WiFi and copper Ethernet are provided. But details matter.

If this were an Amazon review, incompatibility with common, standard, relevant connectors would be an immediate two stars off.

Still, BeagleBoard is a reputable company known for community-building. Lots of people prefer working with them over the competitors that just lay down schematics on fiberglass, throw the result over a wall, and compete primarily on price. Certainly, the few months I spent working on the original Beagle-V were very pleasant for exactly that reason. I wish them luck with this board.

…but don’t do it again, team, mmmkay?

Many of us have been pretty disappointed by the long lead time it takes to get chips from specification into production. For RISC-V devotees, this was brought into clearest focus by the Vector 1.0 specification, ratified in November of 2021; since then we've mostly developed via cores emulated in software or FPGAs, or through chips like Allwinner's D1 family, which paired a single core with a pre-release version of the Vector spec that was already over a year old when the device shipped. Lucky for us, we may see history repeating on roughly that one-year cadence, with the first Vector 1.0 silicon coming late this calendar year, so likely November or December.

Many of us had high hopes for StarFive, with their close ties to IP vendor SiFive and their collective "dry-run" experience shipping many hundreds of chips through BeagleV, Starlight, and VisionFive. The upcoming JH-7110 iteration DOES bring 3D graphics and four 1.5GHz cores, along with a comfortable (2-8GB) headroom of RAM. The StarFive Kickstarter was successful with over 2,000 units, and that's easily one of the most anxiously awaited parts of 2022, with Pine64's fast-following board adding a PCIe slot for graphics or other expansion.

The new part has generated less buzz because, while it has been known for a few months, it was under press embargo until now. It comes from Shenzhen's Bouffalo Lab, which is relatively unknown outside of RISC-V developer circles. They're very much a Chinese company and their Western presence can be pretty tricky to find a pulse for, but they have a family of developer tools with (mostly) enough English documentation, tools, and support. While they have really inexpensive I/O chips, their chips will be mostly known to readers of this page as the brains of Pine64's Pinecone and the reduced-pin-count Pine Nut. In broad strokes, those BL602 and BL604 chips are comparable to the ESP32-C3, with a SiFive E24 core and a basket of I/O, including Bluetooth and WiFi. Cousins BL702 and 706 add more GPIO, may trade WiFi for Zigbee in certain models, and have cost/performance profiles that make it possible to emulate an FTDI in software, suitable for a $3.59 JTAG board, or drive full-size panel displays while feeding WiFi services, GPIO monitoring, and such. They're very flexible parts.

The zinger here is that for BL808, their newest chip (expected "soon"), we leave behind the SiFive cores and go with the cores that were open sourced by Alibaba's chip division, T-Head, about a year ago. Bouffalo was able to pair T-Head's experience in high-speed cores with their own experience fabbing high-volume parts, and fuse in value like vector support. Now that we have ~18 months or more of experience simulating and building software for those cores via LLVM and, less so, GCC, that seems like a great partnership.

The coarse-level datasheet is almost self-deprecating, with a "take four marginally related compute nodes and attach everything to everything" look:

Bouffalo did what they do best, and Sipeed is on deck to do for this chip what they did for the (then) ground-breaking GD32VF103 (zillions of <$10 RISC-V boards without cables and a very usable SDK) or the K210 – which they morphed into a dozen form factors, marrying an early Rocket design with a numeric computation unit that made ML/AI acceleration accessible to the <$20 USD developer in many packages. So what makes BL808 a good date to bring to the computing ball of 202x?

Integration. The likes of Sipeed, Pine64, and others will mount the chip on a variety of backing form factors so people wanting access to these can just use them without having to wire-wrap them or hire a high-speed digital logic team to tame all the high-speed timing craziness.

Tool stability. RISC-V is probably the first real ocean of silicon tech that's had the software teams delivering well before the hardware teams could make wafers. RISC-V is simulated, the tools are validated, and these tools are all available at the risk/scale/price point you want to pick.


There are already hundreds of pages of documentation available online. It's probably not the best place, but it's the first place I've seen that's publicized in a way that doesn't look like a leak. :-)

Of course, the chips themselves have real-time counters, 20-channel direct memory access controllers (as we do), USB 2.0, JTAG, SPI, four UARTs, and all those other creature comforts that we essentially expect to see in our $10 chips these days. (Pricing hasn't been announced…) This part has so many processing/IO cores that it's actually hard to keep them straight.

“The wireless subsystem includes a RISC-V 32-bit high-performance CPU, integrated Wi-Fi /BT/Zigbee wireless…”
“The multimedia subsystem includes a RISC-V 64-bit ultra-high-performance CPU and integrates video processing modules such as DVP/CSI/ H264/NPU, which can be widely used in various AI fields such as video surveillance/smart speakers….”
"NPU (numeric processing unit) HW NN (hardware neural networking) co-processor (BLAI-100 – Bouffalo Logic Artificial Intelligence) generally used for AI applications"
Of course, there's also a low-power 32-bit RISC-V unit to babysit THOSE four compute modules, because it's 2020 and why the hell not!!!

You literally end up with M0 having “32-bit RISC-V CPU with a 5-stage pipeline structure, supports RISC-V 32/16-bit mixed instruction set, contains 64 external interrupt sources, and 4 bits can be used to configure interrupt priority.”
D0 has "a 64-bit RISC-V CPU with a 5-stage pipeline structure, supports the RISC-V RV64IMAFCV instruction architecture, contains 67 external interrupt sources, and 3 bits can be used to configure the interrupt priority."

As a software engineer, your job as a shepherd is to keep all the computing power your customers are being asked to pay for busy, but not overloaded. Don't awaken a 64-bit core with an FPU if your immediate need (maybe it's a temperature sensor recognizing something is hell-bound) can be handled by a mostly 16-bit, integer-only RISC-V part. Of course, lighting up the numeric inference cores brings on a very different set of power and performance tradeoffs.

Of course, the chip has the mandatory boatload of timers, PWMs, Ethernet (10/100Mbps only), and more. It really is quite ridiculous what a couple of dollars and 88 pins will buy in modern times. It's an added bonus that these parts are expected to be available with less than a 104-week lead time. 🙂

These look like very cool chips and I look forward to seeing boards from the likes of Sipeed, and maybe Pine64 or BeagleV, very soon. I haven't seen formal pricing yet, but I expect to see full boards for less than comparable D1 boards, with the added benefit of standards compliance (ahem, those page table bits and jumping the gun on V without pushing it into the reserved opcode space…) over the Allwinner parts. These should be priced way under the JH-7110s, but have the edge of an NPU (particularly when paired with Sipeed's new MaixWHATISTHATCALLED?LOOKITUPROBERT library) that makes NPU/Tensor-style programming pretty easy.

Programmers, what tools do you need to see to take to these boards?
Hardware types, what playgrounds can you build for the programmers to fill?


The much-anticipated products from Sipeed, the M1S Dock and M0Sense, are now being delivered to customers. Mine arrived in the U.S. on December 20, to my surprise, as the tracking number never fired on USPS Informed Delivery and FedEx did not announce the delivery. These were purchased boards and are not prerelease.

M1S Dock

M1S Dock is a board built around the Bouffalo BL808 processor. It features three RISC-V cores: one 480MHz 64-bit T-Head C906 variant that's similar to the one in Allwinner's D1 (including the outdated 0.7.1 vector unit, alas), one 320MHz 32-bit T-Head E907 for coprocessing, and one low-power 150MHz T-Head RV32EMC core for super-low-power use, such as keyword recognition to awaken the others on demand. As a bonus, it contains the BLAI-100 NPU (Bouffalo Lab AI engine) for video/audio detection/recognition.

The M1S Dock starts at $10.80 for the board with headers and ranges to $24 with camera, LCD, and case.

The device supports:

  • 2.4 GHz 802.11 b/g/n Wi-Fi 4
  • Bluetooth 5.x dual mode (classic + BLE)
  • IEEE 802.15.4 for Zigbee
  • 10/100M Ethernet through an add-on board

There is 64MB of RAM and a "real" MMU on the 64-bit core, so while you're not going to run your favorite Fedora workstation-class configuration on it, a 'normal' embedded Linux kernel and supporting utilities are quite practical.

Optional peripherals from Sipeed, pictured below, include the display, a debug board (which features yet another RISC-V part, the BL706, to bit-bang the debug protocol – which appears to NOT be JTAG), a camera, and a hard plastic case.

M1SDock and M0Sense

Assembling the case is best described as painful. While it looks like a flexible silicone case, it's not; it's a hard plastic with a rubbery texture. The screen has to be removed from the double-stick tape holding it to the board, passed through the hole in the case, fastened back to the board, and then the board threaded into the case. Since the double-sided tape for the screen has a small area, I'm not expecting to be able to remove and re-insert the screen very many times. If I'd known what a pain it was, I would have certainly soldered down the provided .100″ posts before mounting it.

Back of M1s Dock
Front of assembled Sipeed M1s Dock

 

Sipeed has done well providing documentation for the M1S Dock, including pinouts, a full SDK (with Bouffalo Lab), an AI model and framework, a handy drag & drop approach to burning firmware, and many M1S Dock demos.

M0 Sense

Also delivered are the M0Sense boards. These are a lovable little alternative to nRF52840-class hardware. The featured processor is the BL702 at 144MHz. Twelve of the sixteen pins are available as I/Os and the board comes with Bluetooth, including BLE. The SiFive core is attached to 132KB of RAM and 512KB of flash. The board provides an IMU and a USB full-speed (12Mbps) interface. Computationally they may not match the dual cores (and PIO) of the RP2040 products, but these are great alternatives in the RISC-V world that offer easy programming and plenty of powerful I/O.

The board starts at $4.50 USD. Adding the 0.96″ screen makes it $5.99.

Sipeed has done well providing documentation for the M0Sense, too, including pinouts, a full SDK (with Bouffalo Lab), an AI model and framework, a handy drag & drop approach to burning firmware, and many M0Sense demos.

Summary

Between these boards, you have a very low-end sensor board with ML abilities for $4 that includes I2C, SPI, and all the normal things to connect to your own sensors, AND a relatively high-end MCU with a dedicated ML coprocessor. With the M1S Dock being a cousin to Pine64's Ox64, we're sure to see a ton of software development around them. They've taken the sharp edges off Bouffalo's unpleasant boot loader by providing a drag-and-drop-capable boot loader. The BL808's available RAM, performance, and price really make it difficult to lean into the Kendryte K210 class of boards as we enter 2023.

I really look forward to exploring these boards in coming weeks and months. What do you plan to do with them?

Issues with USB-C powered development boards

Below – Hall of Shame:

The symptom: dead boards

More than once in my development time, I've been an early recipient of a development board that powers via USB-C. Invariably, the board is so new that there is no documentation provided with the board and little to nothing already existing on the web about it – after all, helping create some of that documentation is probably why I have the board. There may be no source code, no schematics, and no doc. I'm even pretty liberal in accepting early development documentation in Chinese – Google Translate is pretty amazing on technical material, and Google Lens can help crack the all-too-common case of Chinese text baked into images. (This is bad for accessibility, such as screen readers or even visual assists that may need to expand text and be able to do a reflow… but that's a different rant.)

For first power-up, there's not really an expectation of any code being flashed into whatever kind of flash memory is available. With SMT LEDs bragging about being 0.8 or even 0.65mm, it's far from given that I'll recognize an LED on the board at all just by visual inspection, and even at that scale, silk-screens don't much work, so the parts aren't likely labeled. Similarly, there are often no recognizable chips on the boards that set expectations of how it should act if successfully connected. Sure, if there's an FTDI 232H or a CH340, we know to look for a USB serial device enumerating on the host. However, microcontrollers like the ESP32-C3 or BL706 integrate USB right on the chip and may or may not implement the USB CDC protocol. While that's awesome for development (hooray, it's a disk drive and I can just copy firmware to it!), it means you can't depend on the visual cue of a Finder window popping open when the board is recognized.

This is a lot of words to provide background to the lede I’ve already buried in the title. The short version is that it’s not totally unexpected to connect a board and have exactly no visual confirmation the board is running and no recognizable signs of life from the computer.

In addition to all the above possible causes (no LEDs, no code flashed, an external boot-knocking sequence required, no boot device present, device permanently in reset because it's a jumper not a button (yes, really)), there's another that's far more frustrating: the board developer did not read and understand the USB-C specification.

The cause: a poor understanding of USB-C

USB-C is more than USB 3.1 in a flippy connector, though that is undeniably nifty. It fundamentally changes how power is transferred over the wire because it allows bidirectional charging as well as bidirectional signaling (formerly "USB On The Go"), but because there's way more power potentially involved – particularly because it's the first time more than 5V may be present on the power rails – failure to handle USB-C's power requirements correctly can result in 20VDC being sent to your 25-year-old floppy drive that was built in a 5V-only world, and that's bad.

For older USB, you could always count on at least 100mA of 5V on the power rails. As a practical matter, you could count on 500mA from most hosts even without explicitly enumerating on the bus, or even 900mA starting with USB 3.0. For lots of tiny development boards, that's plentiful.

On USB-C, there are two new pins on the bus named CC1 and CC2. If you have a device that may either provide or receive power (your laptop or your phone can charge your earbuds, but they can also be charged over the same plug), then you need a real USB-C controller chip like an STM32 or a MAX77958 on the bus, and you need a real EE who can read and understand the relevant specs in order to implement your Dual Role Port, as that's a more complicated case than a Sink Port (a load) or a Source Port.

A compliant USB-C Power Delivery Source Port (e.g. a high-quality USB-C charger or a laptop with actual USB-C jacks) will monitor the CC1 and CC2 pins for voltages, nominally set by a pair of resistors forming a voltage divider. The presence of a pair of 5.1k resistors (at a cost of a fraction of a penny in quantity) tying each of CC1 and CC2 to ground tells the power supply that a sink is attached and that it should turn on 5V, at up to the 3A the source may be advertising.
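To make the divider concrete, here's a tiny back-of-envelope program – my own illustration, not anything from the spec or a vendor SDK – assuming a source that implements Rp as a pull-up to 5V (56k/22k/10k for default/1.5A/3.0A, as I read the Type-C spec). The source only needs to notice that Rd has dragged CC low to know a sink is attached; the sink reads the divided voltage to learn how much current it's allowed to draw.

/* CC voltage seen by a sink with the 5.1k pulldowns, for each Rp the source
 * may present. Illustration only; real designs must follow the spec tables. */
#include <stdio.h>

int main(void)
{
    const double vbus = 5.0, rd = 5100.0;
    const struct { double rp; const char *advertises; } sources[] = {
        { 56000.0, "default USB power" },
        { 22000.0, "1.5 A" },
        { 10000.0, "3.0 A" },
    };

    for (int i = 0; i < 3; i++) {
        double vcc = vbus * rd / (sources[i].rp + rd);  /* simple divider: Rd against Rp */
        printf("Rp = %5.0f ohm (%s): sink sees %.2f V on CC\n",
               sources[i].rp, sources[i].advertises, vcc);
    }
    return 0;
}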

If your device has a USB-C jack and does NOT provide those two resistors, the USB Power Source is under no obligation to provide power to your device. Your device will be unpowered and, most likely, will not work.

The full USB-C and related specs run into thousands of pages, and this is probably just caused by a misunderstanding that USB-C is just like its predecessors. Fortunately, there are good descriptions like this primer from ST on implementing USB-C power.

“But it works on my PC”

If you have a USB-A to USB-C cable, there can be no bidirectional charging, so the resistor is built into the cable itself and a legacy A port simply supplies 5V.

“Can I save an eighth of a cent and use one 5.1K resistor instead of two?”

No. The Raspberry Pi Foundation learned this very publicly and frustrated thousands of customers with the Raspberry Pi 4's defective USB-C implementation.

“But it almost works if I do it…”

It works except when it doesn't. It may fail on e-marked cables (very common amongst users of high-quality, high-power gear), but actual experts can explain why you need two individual 5.1k resistors on USB-C devices. Googler Benson Leung made a substantial name for himself in the early days of USB-C popularity by buying a large variety of USB-C cables and devices, finding that spec compliance was farcical, and working with Amazon to improve the quality of devices in the marketplace.

“But that’s an old problem. Nobody would build a board without those resistors today.”

Pi 4 was just a high-profile early victim. Boards with this problem are still rolling out.  

Allwinner Nezha

Sipeed engineering confirmed that the Allwinner Nezha RISC-V development board does not pull down CC1 and CC2, so it will not boot from a USB-C power source. (That port is theoretically bidirectional and should really use an actual USB PD controller chip.)

Bouffalo Labs BL706 AVB

Dev Kit for Bouffalo Lab BL706 Audio Video Board
The Bouffalo Labs BL706 Audio Video Board does not provide those resistors, presumably for similar reasons. The board will not boot from USB-C.

WCH CH32V307 EVT 

CH32V307V-EVT-R1 RISC-V development board
The WCH CH32V307 EVT board provides empty solder pads on the back of the board at R9 and R10 for you to add your own resistors to the debug port. Without them, the CH32V307 EVT will not boot from a USB-C power source. That design choice is a bit strange because, while similar pads are provided for the full-speed and high-speed jacks (which could be hosts or devices), the WCH-Link port looks like it can only ever be a device.

I have a board like that! I'm not an EE. What can I do?

As ridiculous as this sounds, the lowest-cost, easiest way to work around this is to use a USB-C to USB-A adapter (a hub will work, too, but will be more expensive if you're shopping) and then a USB-A to USB-C cable. The result is a cable with USB-C on both ends, but a male and female USB-A in the middle. That adds the resistors in question, and all three of the boards above will then successfully boot from a USB-C Power Delivery power supply OR directly from the port on a MacBook Pro.

Share your war tales

Do you have a board like this? Share below to help get the word out that it’s not 2014 and partially implementing USB-C is Not OK.

Though it was just announced last week, people are talking about Bouffalo Lab's BL808 like it's a Symmetric Multi-Processing (SMP) system. (This is the chip used in Pine64's SBC, the Ox64.) I just don't see that happening. The opcodes for the 32 and 64-bit encodings of RISC-V are quite similar, which is why so much code is shared between them – except for those #defines for SW/SD and LW/LD you see in all the programs meant to run on both. The attached program snippet shows a trivial example of a needed change to preserve sign extension.
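If you haven't run into that #define trick, it's usually a tiny header shared by the .S files – something like this sketch. The macro names are illustrative (the Linux kernel's asm.h does the same job with REG_L/REG_S); __riscv_xlen is the compiler's built-in.

/* Pick the natural-width load/store per target so one .S file serves both. */
#if __riscv_xlen == 64
#  define REG_L ld
#  define REG_S sd
#else
#  define REG_L lw
#  define REG_S sw
#endif

/* Then the same source assembles for either target:
 *     REG_S ra, 0(sp)
 *     ...
 *     REG_L ra, 0(sp)
 */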

It was a conscious decision that the RV32 and RV64 RISC-V opcodes are encoded differently and are NOT compatible. That's a deliberate departure from systems like x86, where everything from the 8086 to a Xeon has reasonable(ish) source and binary compatibility. Even proposals to address this before there was an installed base were dismissed – see quotes like "For embedded systems, it's hard to see why running RV32 binaries on RV64 systems is compelling." Yet BL808 is a compelling case that really blurs the lines between an MCU and a CPU.

I’m not sure (yet) how address space in BL808 will work, but it’s likely that there will be a way to compile/link RV32 and RV64 objects or executables together for the upload case and have the primary processor point the secondary processor(s) to the other segments using different encodings. It’s likely that RV32 and RV64 address spaces and text segments will remain relatively isolated with a yarn fence between them[1], and assigned to different tasks with different stacks and “process spaces” even if they’re not processes in the UNIX sense.

I just don't think that the equivalent of do_runrun() or run_queue() – the bit of the scheduler that picks the next task off the run queue – is going to be deciding whether to run any given task on the primary or a secondary core. The cores, beyond their obvious capability and clock speed differences, just plain aren't compatible enough for that.

I suspect we’ll think of this system more like M1 with dedicated coprocessors. You’ll likely spin up a coprocessor that does, say, MPEG encoding and communicates with the Big Computer via DMA or shared memory queues or something. It’s even possible that the big and little cores may run the “same” operating system, say Nuttx, built in different ways and communicating via message queues or fifos or other established IPC mechanisms.

May you live in interesting times, indeed!

[1] A weak enforcement.

➜ blisp git:(master) ✗ cat x.s
main:
li a0, 0x1234
ret
➜ blisp git:(master) ✗ riscv64-unknown-elf-gcc -mabi=ilp32 -march=rv32g -c -s x.s && riscv64-unknown-elf-objdump --disassemble x.o
x.o: file format elf32-littleriscv
Disassembly of section .text:
00000000 <main>:
0: 00001537 lui a0,0x1
4: 23450513 addi a0,a0,564 # 1234 <main+0x1234>
8: 00008067 ret
➜ blisp git:(master) ✗ riscv64-unknown-elf-gcc -mabi=lp64 -march=rv64g -c -s x.s && riscv64-unknown-elf-objdump --disassemble x.o
x.o: file format elf64-littleriscv
Disassembly of section .text:

0000000000000000 <main>:
0: 00001537 lui a0,0x1
4: 2345051b addiw a0,a0,564 # Note THIS OPCODE IS DIFFERENT!
8: 00008067 ret

I don't have enough LilyGO products in my lab; I should probably have more. They seem to make clever products aimed at the developer/hobbyist market (that's me!), but it seems they get undercut on features on one product, then are months late to market on the next. Somehow they manage to remain on my radar while escaping a place on my bench. I learned of the T-LilyGO, an improved version of what's best known as the Sipeed Longan Nano (GD32V), just a few weeks after I bought a bucket of Nanos.

LilyGO's latest product, the T-PicoC3, manages to pull off a unique development twist. The product marriage is "obvious", though I haven't seen it done before. I'm writing about this because of one obscure feature. First, let's explore the roots of what makes it awesome.

The two chips that make the T-PicoC3 great

The T-PicoC3 is about $15 USD shipped to the US. It manages to use not one, but TWO of the season's most deservedly buzz-worthy MCUs: the ESP32-C3, a RISC-V part with really great pin density and development SDK support, and the RP2040 – as well as an onboard antenna, an external IPEX connector, an ST7789 display controller w/ 1.14″ LCD, tons of GPIO, and way more in the familiar dev board size. Barrels of (virtual) ink have been written on whether the ESP32-C3's RISC-V core or the RP2040's dual ARM Cortex-M0+ cores are "better". The answer, of course, is "it depends" and I'm not taking sides in this article. But what if you're a student wanting to learn both ARM and RISC-V and you don't want to choose between your peanut butter and chocolate, but just want one yummy(!) product that mixes them both?

T-PicoC3_en.jpg

So on one tiny PCB, they deliver the dual-core + specialized bit-banging capacity of the RP2040 (which has been used to bit-bang DVI (!) and countless light chasers based on the WS2812, a protocol with odd timing requirements) with the ESP32-C3 seemingly left to handle a full TCP/IP stack, Bluetooth, compression, security, and other such tasks while "additionally" being a quite capable 160MHz RISC-V core of its own.

Putting together "the two great tastes that taste great together" sounds like a good idea: let one chip specialize in networking, let the two RP2040 PIO bit cores blink out a CAN bus or 2812 (or other) LED blinkies, and hold it all down with some MicroPython on either (or both) of the Cortex-M0+ cores. Whether you're a student that really just wants to backpack a single board for both ARM and RISC-V programming or you're building a robotics or IoT thing, it's just easy to imagine these going well together. You can make crazy combinations of dedicated interrupt controllers, GPIO controllers, interrupt domains, etc. Mix them up as you see fit.

…because I’m writing while hungry.

Debugging your creation

Normally, for ease of debugging, I nod to the ESP32-C3. Inside the part is dedicated hardware that presents an FTDI-like controller usable for JTAG debugging AND a USB communications-class device, so your device's serial console can appear on the same plug you're powering the unit from. For someone who values mobility while debugging, it's awesome. So those are the obvious connections for the USB lines.

But how do you debug the RP2040?

I'm normally a big fan of USB-C. Beyond the speed, the power – both in literal volts and amps and in the capacity of devices you can attach – I dig that the cables can be flipped end for end (no "host" and "target" end) and that the plugs can be flipped top to bottom, making them impossible to plug in upside down. The receptacles are symmetric. Compliant USB-C cables are only kind of symmetrical. It's lesser known that the tops and the bottoms of the plugs are actually not fully symmetrical. The controllers actually engage in strategic lies that pull off the flippability trick. These same strategic lies are what allow the cables to smuggle additional kinds of data, such as Thunderbolt signals, or GPIO and JTAG pins in the case of the breakout board for Pine64's Pinecil. Even with these mistruths, it's common practice to uphold the guidelines so the cable works either way.

Do you hate your users?

For a hybrid board like we've described above, where we're trying to attach two quite different SoCs to the host, the "obvious" thing to do would be to add something like a CH340 to give USB powers to the RP2040 console. Then you add a USB hub and tie both chips behind the hub, allowing all three devices to appear to the host. A more sophisticated design might tie the 2040 instead to a serial port from the ESP32-C3, but then you lose USB master mode for the 2040. A circuit-layout Jenga master may have been able to find a USB-C pinout that let one device ride shotgun on the bus of another, along the lines of the Pinecil's exposed JTAG lines, but I think that has compromises if you're pretending to be a host instead of a target. Instead, we're left to imagine this conversation happening within LilyGO's engineering:

“What if we built an interface that worked completely differently if plugged upside down?”

“Why would you build a Cursed USB-C device? Do you hate your users?”

“What if we made it blink different colors, but unreliably?”

“OK.”

So our imagined engineers dutifully run off and after a little USB Selective Disobedience, successfully deliver power and ground – safely – in either orientation. (That’s the red and green in the below diagram and that’s intentionally made infallible.) With one mating between the cable and the board, the ESP32 has ownership of D+ and D- so the JTAG and serial ports associated with the RISC-V side of the house are presented to the host. You can use esptool to program and manage that device or program it via JTAG. In the other orientation, RP2040 gets the port and it’s either a USB mass storage device awaiting a .uf2 boot file to chomp on or a connection from the Thonny Python IDE.

Imagine the top ("blue LED"/RP2040) being attached to the top D+/D- pair on the right and the bottom ("red LED"/ESP32-C3) being attached to the bottom D-/D+ pair.

USB Type-C connector Pinout

Great. Now we've created a product that works in exactly the way a 'normal' user would never expect it to, but, given the target audience, this is probably OK. Only it leads to hilarious disclaimers like this:

When connecting, the onboard LED lamp will be indicated according to the connected chip (due to cable problems, it is possible that the indicator light is opposite to the actual connected chip, or even two LED lights at the same time, please replace another cable when two led lights up at the same time)

 

The moral(s) of this story

Morel Mushroom In Leaves Close-up   No, no, no. Those are morels. They’re different.

  1. LilyGO makes some pretty cool stuff. Their products aren't necessarily destined for "Raspberry Pi" levels of creativity and ubiquity, but they have some nifty and fun mixups that can save a budding EE (or a struggling SWE) from rolling their own designs. Many of their products are straightforward mashups of existing low-cost circuits, but on one convenient board. Not everything needs to be complicated to be useful.
  2. Sometimes, there just are no points awarded for style. It's easy to imagine a $15 product that may spend a semester or two inside a student's backpack, or a one-off that's inside your IoT robot that needs both WiFi and finely controlled WS2812 "lasers" – or, heck, real lasers cutting into or measuring something. These may be programmed less than a dozen times before they're retired (the first case) or sent into duty and unlikely to be connected to a PC ever again (the second case). As long as the users are in on the 'joke' that a cable that should never need to be flipped sometimes needs to be flipped, maybe that's OK. Making that connection twice as reliable but requiring twice as many cables, a hub, defining the interaction of all these devices potentially controlling the bus at the same time, etc. is money you'll never get back.
  3. It's absolutely not, however, OK to do this to an end-user, mass-produced product unless you do, in fact, hate your users. (Hint: they will retaliate somehow….) Consumers want standard things to work in standard ways.

History

The Kendryte K210 seems to have been one of the early success stories for RISC-V, if not in mainstream computing, certainly in maker mindshare. The 64-bit device had two cores, enough support peripherals to be useful for your robotics project, enough AI to recognize faces or do image detection and following for your self-driving robot project, and ran a chopped-down Linux if you really needed it, though this was all pretty precarious in 8MB of core. Obviously, the successor device should address these and bring up some 2020 level specs from the 2018-ish design we saw with K210. That device even had a name leaked or rumored: “Kendryte K510.”

I can find rumors and predictions for K510 as far back as 2019. Canaan (another name for Kendryte, as best I can tell) themselves talked about K510 in December of 2019:

Zhang said that the new generation of K510 chip has been greatly optimized in algorithm and architecture. Compared with the first-generation chip, the K510’s computing power will increase by 5-10 times, and it will be developed for 5G scenarios.

Finally, almost nine months ago, K510 was formally announced, but no reference designs or availability were given, so it stayed in my "hype" folder. While I still haven't seen hardware shipping, we now have some faith that hardware is purchasable.

Enter the new developer’s reference board

AnalogLamb is offering the DEV-AI0002, a K510 Dual RISC-V64 Core AI Board with Dual Camera and LCD. 

 

It looks like a substantial board, offering a dual-core RISC-V64 CPU running at up to 800MHz. They claim 3 TeraFLOPS is possible. (Editor's note: the K510 doc repeatedly says "800MHz", and 3 TFLOPS will be hard to get out of a 5-stage, in-order CPU at that clock…) If true, that would put the device on par with the fastest GPUs from 2009, a large Xeon from 2015, or a beefy gaming machine from 2020. That said, this power isn't coming from a 3D GPU; only a 2D GPU is cited for this board.

Beyond the power of the SoC itself, the reference board adds 512MB LPDDR3@1600MHz, a camera board with two camera sensors, and a base board. There are connectors for an LCD display, 1000M Ethernet RJ45, HDMI, USB, TF card, GPIO, UART, and audio. The CRB adds:

  • K510 integrates the dual-core RISC-V64 CPU and DSP at up to 800MHz
  • Up to 3 TFLOPS AI, ultra-low-power wake-up VAD
  • Input: high-definition triple camera, MIPI CSI/DVP interface
  • Output: 4 video layers + 3 OSD layers
  • High-quality H264 video encoding, 2 channels of 1080P@60
  • 2D image accelerator: zoom, crop, rotate, OSD overlay
  • Camera Sensor Board with two sensors
  • 512MB LPDDR3@1600MHz
  • 1000M Ethernet RJ45 Interface and Wireless Module
  • HDMI and a LCD Display
  • USB OTG and USB Type-C Power Supply
  • USB to UART for Debug
  • TF Card Interface and GPIOs

It's following the model of the D1 and the Raspberry Pi Compute Module in using a small board to carry the SoC and a larger board to bring out I/O connectors like TF ("TransFlash" is the term for uncertified SD cards) sockets, USB 2.0, GPIO, Gig Ethernet, HDMI, and such. The unspoken theory is that decoupling these allows smaller (pronounced "cheaper") carrier boards and replacing the CPU modules with newer ones as they come to market. It's the future we were promised with Pentium-II "cartridges". The K510 CRB Hardware Guide is one of the few in the initial doc release that Google Translate handles well when converting to English. The acronym isn't known to me (yet – comments welcome!) but I'm assuming it's "Customer Reference Board". In some fantasy land, the K1020

40-pin GPIO connector – with a twist

The 40-pin connector may make you think of a Raspberry Pi-like expansion bus, but the pinouts are incompatible. It seems that a direct link to section 3.15 doesn't work, so I'll just repeat it here:

Figure 3-18 40P pin header expansion interface
Table 3-4 Expansion interface definition

Numbering      Definition       Numbering      Definition
1              VDD_1V8          2              GND
3              VDD_1V8          4              GND
5              VDD_3V3          6              GND
7              VDD_3V3          8              GND
9              VDD_5V           10             GND
11             VDD_5V           12             GPIO_1V8_95
13             GPIO_3V3_114     14             GPIO_3V3_115
15             GPIO_1V8_92      16             GPIO_1V8_96
17             GPIO_1V8_105     18             GPIO_1V8_107
19             GPIO_1V8_104     20             GPIO_1V8_106
twenty one     GPIO_1V8_118     twenty two     GPIO_1V8_119
twenty three   GPIO_1V8_93      twenty four    GPIO_1V8_94
25             GPIO_3V3_125     26             GPIO_3V3_124
27             GPIO_3V3_127     28             GPIO_3V3_126
29             GND              30             GND

(I kept twenty one through twenty-four as words because that’s how Google Translate presents them to English readers. Is there some significance to this in the original Chinese?)

While there are a few 3.3-volt lines, the majority of them are 1.8V. While this board doesn't really seem to target IoT hobbyist-style projects, this will provide a challenge for anyone that DOES want to attach their favorite Adafruit or Sparkfun gizmoid from the Pi- or Arduino-class ecosystems, which are almost universally 3.3V these days. There are few 3.3V lines available on this connector, so they might run out quickly. If you're connecting a 3.3V device to a 1.8V host, you'll need to brush up on the details of level shifting or find a component that's better suited. Most 3.3V inputs won't reliably register a 1.8V signal's maximum high as a logic "high", meaning they can't recognize changes in the signal reliably. A $200 board really isn't meant for running robotics servos and air sensors anyway. Save those projects for a Dr. Who HiFive (RISC-V, of course!) Inventor Kit.

Andestar V5, the primary core (two, actually) of the K510

The processor itself takes a big step up from the RISC-V Rocket design that was used at the heart of the K210. The tech docs show that they're using an AndeStar V5 design from Andes Technology, clearly updated from the AndeStar V5 they announced in September of 2019. Of particular note, we see the Vector (presumably 1.0) support, which was only ratified in late 2021. That's pretty exciting. There's a collection of doc that we can hope will grow, and we hope that "zh" grows sibling directories of English versions. (You're free to hope for your own favorite languages, too – I'm just being selfish. 🙂 ) Google Translate handles a few docs OK, but for the majority of them, Translate will handle only a few lines at a time.

The chip is rich in I/O. Seven I2C and three SPI ports are generous, but I'd be careful with that voltage-level pairing issue mentioned above. A 2D GPU will help most desktop applications once appropriate drivers are refined. All the RAM seems to be on the SoC itself, so don't count on user upgrades. In the processor block below, we see three blocks of processing and a mailbox unit to let them pass messages (like interrupts) between them. The two RV64G+-class units should be familiar to readers here. The Kendryte Processing Unit in the K210 was the Tensor-style processing unit, so we'll refer to the K210 KPU FAQ. Let's hand-wave the details of that for now, but it lets us know what the KPU can basically do.

Kendryte passes the doc hot potato back to Andes for some chip-level documentation. This is fine, since they would be the experts on some aspects, but it can be a bit of a paper chase not knowing exactly which revisions of the doc correspond to the cores in these chips. We've already discussed that "AndeStar V5" isn't exactly a tight version number scheme, as it apparently covers at least some range of parts from 2018 to 2021. But we'll work through what we can.

I'm inferring that the AndeStar V5 Instruction Extension Specification is in play right through version 1.4, the most recent there. As 1.3 added the not-quite-final Vector extensions and 1.4 added vector support for bfloat16, both of which are listed as features of K510, we seem quite up to date. We get 73 pages of (English) doc covering features on the chip that are in the extended feature sets. Andes is new to me, so it's worth a moment for me – and hopefully the reader – to take a quick romp through what extensions above the common RISC-V opcode set allow. I won't go into great detail; just knowing these are a thing and that they have the potential to improve your code – while making your code not work on other vendors' architectures – is enough.

Extensions beyond stock RISC-V

The AndeStar V5 ISA, as used in K510, isn't targeting embedded or super-low-cost devices. These may be deployed as part of a fleet, in managed racks, or as workstation-class devices, and if using compiler magic to emit magic opcodes means you need one fewer row of compute-bound number crunchers in a data center, that's probably OK. So what have they done?

Start with the basics, but extend the extensions. The Andestar V5m ISA is a superset of RV-IMAC. Some of the things that were vendor extensions (Vector wasn’t ratified until December of 2021) are now part of the official RISC-V extensions. Depending on the age of the doc we’re looking at, this can get a bit confusing, but I’m guessing that a smart decoder can maintain compatibility with a customer base that was exclusively theirs as well as with users of the newly ratified parts. One example of managed change is in the handling of “half floats”. We’ve long (sorry) had support for (32-bit) floats and (64-bit) doubles; in graphics work and machine learning, 32 bits is often overkill. If you CAN use 16-bit floats, you can effectively double the size of your caches, halve the number of data transfers, and handle more data inside a Vector operation. It looks like they’ve added half-floats to all the places that make sense.
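To make the cache and bandwidth argument concrete, here’s a minimal sketch (not K510-specific) of the same dot product in 32-bit and 16-bit floats. The _Float16 type assumes a toolchain and target that support half floats (for example via the Zfh extension or software conversion); the function names are purely illustrative.

#include <stddef.h>

/* 32-bit floats: 4 bytes per element moved through the caches. */
float dot_f32(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* 16-bit floats: half the memory traffic, twice the elements per cache
   line or vector register. We still accumulate in 32 bits for accuracy. */
float dot_f16(const _Float16 *a, const _Float16 *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += (float)a[i] * (float)b[i];
    return sum;
}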

Bit ops. Branch on a bit being set or clear. Match can be in opcode. Sign-extend a bitfield.

Address Scaling. It’s a somewhat frequent complaint (esp. from developers coming from ARM or x86) that address scaling has to be done by the programmer. The examples given by that former ARM engineer are compelling. Assembly programmers know it’s a bit of a pain to burn an extra temp register just to keep multiplying (or tallying) the index by the size of the structure you’re traversing. Andes adds addressing modes to compute familiar LEA operations like “lea.d t3, t1, t2”, which is “t3 = t1 + t2*8”. I think I recall Alibaba/T-Head adding similar extensions to their C906 and C910 designs.

Various performance enhancements like “find first byte” will help many algorithms. There are also opcodes for loading a number of words into consecutive registers and converting common data types to and from the 16-bit floats.

Tooling support from Andes

Fortunately (?) Andes maintains their own fork of Andes RISC-V GCC with their own GDB and binutils in order to support optimizing for, and debugging, the chip extensions above. It’s not clear why they are keeping their own entire versions instead of mainstreaming them. I rarely see @andestech.com listed in the ChangeLogs or on the mailing lists of those tools.

Andes provides these tools in their AndeSight Eclipse IDE for Windows and Linux users. They provide binaries of their Andes Development Kit, which is probably possible to build for MacOS as the source is there. The Andes Github repos are a bit of a circular resolution mess and it can be challenging to find current, maintained sources for each of the pieces. Hopefully a mainstream component in a high volume, open market will help drive some consolidation and more code sharing in this area.

Kendryte partnering with an experienced RISC-V core maker should eliminate a lot of the birthing pains we experienced with K210. The RISC-V standards are more developed, AndesTech’s RISC-V documentation (in English) is plentiful, and having Linux kernels, boot managers, and drivers already in place should be awesome. Deep in the docs, we learn they used the Andes AX25MP as a base and wrapped up the features of the AndeStar V5 ISA as:

  • RISC-V RV64I base integer instruction set
  • RISC-V RVC standard extension for compressed instructions
  • RISC-V RVM standard extension for integer multiplication and division
  • Optional RISC-V RVA standard extension for atomic instructions
  • Optional RISC-V “F” and “D” standard extensions for single/double-precision floating-point
  • Optional AndeStar DSP extension
  • Andes Performance extension
  • Andes CoDense extension

and Andestar extensions as:

  • StackSafe hardware stack protection extension
  • PowerBrake simple power/performance scaling extension
  • Custom performance counter events

(My read is that these are “optional” features beyond the RISC-V ratified sets that they have opted into when building the K510.)

Kendryte themselves have already published much in the Kendryte Github repo. Buildroot, the Berkeley Boot Loader (BBL) and Proxy Kernel (pk), and a Docker image to compile for K510 are already there in addition to the K510 docs. Prominently, the 575 pages of the K510 Technical Reference Manual provide us with register maps, descriptions, and electrical characteristics of the chip itself. (It’s stamped ‘confidential’ all over it. /shrug)

Back to the K510 CRB features

The K510 CRB ships with 4GB of bootable eMMC that can be loaded with your favorite OS, or your OS can be kept on a TF card for easy loading from another computer. The 128MB of NAND flash can hold the boot loader and small amounts of data, like a $HOME or configuration files. (In some places the documentation declares 16GB of eMMC and in others it calls out 4GB. We’ll know once we see boards!)

The documentation is conflicting on the number and type of onboard LEDs. A WS2812 “Neopixel” is present and visible in the photos. Another LED of some type (Power?) may or may not be present.

Two switches allow booting from UART, SD, NAND, or eMMC. On other chips of similar capacity, we’ve seen SD and eMMC flashed with images that allow yet more boot sources, such as netboot via tftpboot or USB-attached storage.

The USB OTG socket seems to be of the old Mini-B variety and not contemporary USB-C, though the UART console appears to be USB-C. (Remember that USB-C is the connector and it IS legal to pair it with USB 2.0 signaling, as they’ve done here.)  That interface is provided via a common CH340 USB/Serial adapter on the board.

An AP6212 can be seen on the sheets. That seems a bit of a dated choice, even for a 2.4GHz-only product. Its 802.11 b/g/n tops out at about 70Mbps and it’s Bluetooth 4.0. That’s fast enough for moderate network use and a pair of headphones, but it seems like another choice that targets this board at compute-lab or rack style environments – indeed, in environments like those the radios would be largely unused in favor of the provided copper ethernet jack.

There are plenty of video choices. You can drive a 1080p TFT display or HDMI, but not both at the same time. It’s a standard HDMI socket. MIPI video input is provided and a 30-pin FPC connector provides LCD panel video output. The encoder claims to do H.264 Baseline/Main/High Profile with 8Kx8K JPEG and a maximum of 1080p/60fps. It is not a 3D accelerator.

Wrapup

This board looks like an interesting complement to the VisionFive by StarFive and the Allwinner Nezha. Perhaps it can follow the Nezha precedent: lead with a deluxe developer kit and later offer smaller docking boards, perhaps even using the same CRB, that cost less but offer little more than power and an ethernet jack – or other combinations as demand dictates.

Will you be ordering one? What are your plans with it?

Personally, my VisionFive just arrived, so it’s already in my review queue. Exciting times for RISC-V!

 

In a recent blog post, Espressif announced the ESP32-C2. The Twitter thread from John Lee revealed an interesting twist; more on that in a moment. ESP32-C2 is a WiFi4 + BLE 5.0 device with a single RISC-V core and 272KB of memory. It uses the familiar Espressif tools like ESP-IDF and frameworks such as ESP-Jumpstart and ESP-RainMaker. It has on-chip ROM to reduce the need for common routines in flash.

Espressif is proud of the cost and radio performance of this device. Reducing power consumption was also a goal, which should help deliver this part into more IoT class projects.

ESP32-C2  is a low-cost WiFi chip supporting the Matter standard.

“Matter” is a royalty-free home automation connectivity standard, introduced late in 2019. Matter aims to reduce fragmentation across different vendors and achieve interoperability among smart home devices and the Internet of Things, and is backed by Amazon, Apple, Google, and other big names in that space. The soon-to-be-released first Matter release supports WiFi, Thread, and Ethernet as transports.

With WiFi being so pervasive, devices like this one that support Matter over WiFi will be important for many years.

Doc for ESP32-C2 is available now. Chips are just starting to sample, with no availability date yet given.

But now, back to the scoop…

John Lee is a Senior Customer Support Representative at Espressif. He runs the @espressif Twitter account and is a good read. He originally posted the above announcement. I knew that Espressif’s last few chips (ESP32-C3, ESP32-C6) had been RISC-V, but I also knew they had a long run with CPU cores from Tensilica. One of the great tricks that Espressif pulled off with ESP32-C3 was treating the replacement of the CPU core as such a minor point that it was barely mentioned in the marketing doc and hardly even reflected in the chip’s name. I asked John for a clarification: “Are all C and H series going to be RISC-V?”. John quickly answered “Yes. In fact all of the subsequent chips are RISC V.” One of the things we gave up between ESP32 and ESP32-C3 was going down from two Tensilica cores to a single RISC-V core, so the natural question of multiple cores quickly followed. John put that to rest with “Expecting to go up to 4 one of these days.” Hairs continued to be split, and he confirmed that meant “all” as in actually all next generation Espressif chips across all product lines will be RISC-V instead of Tensilica Xtensa – not just the Cx series.

That’s actually pretty big news on its own for RISC-V. Espressif is committing that all next generation products will be RISC-V and that at least some of them will be as large as four cores. Since Espressif has long been a leader in SoCs and modules that package the SoCs with antennas, oscillators, and such into a single (usually certification-approved) package for both hobbyists and the commercial space, this should result in a huge number of RISC-V cores hitting the market, even though they’re somewhat invisible as the user “just” wants to open a radio connection and not necessarily program the radio themselves.

Thank you for that scoop, John!

About this article: This preliminary attack on Buttons on BL60x with Nuttx can be thought of as an article that’s part of Lup’s book on the BL602 generally and his notes on Nuttx on BL60x specifically. As I was the one who ran this experiment, I documented it for the rest of you. As a spoiler, the experiment failed, but we learned important lessons along the way and THOSE lessons are worth sharing more than the actual resulting button work.

Electrical switches, or in their momentary form, buttons, are as simple as it gets electrically. A button is like a piece of wire: it’s connected or it is not. It closes the circuit or it doesn’t. Mechanically, switches can take many forms, like normally open (the connection is missing until it’s physically operated) or normally closed (pressing it removes the connection).

On the PineDio Stack, we have one push button that is connected to our BL604 SoC. The push button is next to the internal LEDs and is connected internally to GPIO12.

Schematic of GPIO_12 on PineDio

From the schematic, we see that GPIO12 is connected via a 4.7k resistor to the power rail. When open – the natural resting position of this button – GPIO12 is pulled high because it’s wired to VCC via the R48 pullup resistor. That resistor delivers enough voltage to keep the pin from floating, yet provides enough resistance that when we close the pushbutton, driving GPIO12 to ground, we don’t risk the steadiness of our power source by shorting it, even temporarily, to ground.
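For a rough sense of scale (my arithmetic, not from the schematic notes): while the button is held down, the 4.7k pull-up drops the full 3.3V, so only about 3.3V / 4700Ω ≈ 0.7mA flows through it – a trivial load on the rail.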

PineDio Stack Bootstrap schematic

There is actually a second switch available on the PineDio Stack, but a bit subtle – in fact, by default, it’s missing! The GPIO8 pin that we jumper on boot is actually a form of a button. Whether by a button or a jumper, it can be connected to either the voltage source or ground. Natively, that jumper/switch is read exactly once during bootup so the boot firmware can decide whether to run the flash downloader or to run your code. As this tale isn’t about GPIO8 – indeed, using GPIO8 in your own designs would be questionable, as closing that switch during power-on would result in your product “not booting” to the untrained eye – we shall ignore the GPIO8 pseudo-switch.

From the view of the BL604, our button on GPIO12 is an input and it is up to us to (somehow) configure it as such. We’ll take responsibility for that in a minute. We either read the +3.3V in the normal case or we read the 0V of ground when the button is pressed.

Each GPIO (16 on the BL602 and 23 on the BL604) can be configured as:

• Floating input
• Pull-up input
• Pull-down input
• Floating interrupt input
• Pull-up interrupt input
• Pull-down interrupt input
• Pull-up output
• Pull-down output

Our hardware designer here has helpfully provided us with external pull-ups to +3.3V, so we’ll configure it first as floating input and just read the button by polling it. This is OK if you’re accessing the button frequently or it’s a major component of your application’s life cycle. For example, the joystick buttons on PacMan are pretty much always being pressed in one direction and the game is doing little if it’s not, so it’s OK to dedicate the CPU to checking the buttons. A more typical application, which we’ll attempt later, lets the CPU receive an interrupt when the button status changes. For a stopwatch button or a screen menu change, that is a much more typical use as it frees your program execution from polling the button all the time.

Elsewhere in the schematic, we also see that the GPIO_12 pin can be used as an output to control the vibrator. We’ve since learned that option isn’t actually populated on the devices in our hands, so we’ll largely ignore the output options on GPIO_12.

Our BL602_BL604_RM_1.2_en Reference Manual has many dozens of pages dedicated to explaining how the GPIO pins work in great detail. While it’s perhaps helpful to know all the details (could the board designer have saved the cost of the pullup resistor if “Pull-up input” mode were known?), we will rely not only on the GPIO functions of Nuttx but also on the “Button” specializations.

Generally speaking, there are two ways for a CPU to notice a change on a signal: it can receive an interrupt or it can poll that signal. For super precise timing or when the CPU has nothing else to do, polling is often preferred. For things like a pushbutton that change quite infrequently, a processor interrupt is usually a designer’s choice.

The Nuttx Apps project provides an example Buttons app in apps/examples/buttons/, which is quite rich in features, but it can also be a bit overwhelming. We’ll instead create a smaller case more specialized for our hardware.

We set out to create a Nuttx application (not a driver) to learn about the button state. As such, we’d interface with the buttons through special files in /dev instead of using BL602-specific functions.

First, we confirm that we have Nuttx building and runnable on our hardware. Our /dev entry contains generic GPIO, but we need to specialize it.

ls /dev
/dev:
console
gpio0
gpio1
gpio2
i2c0
lcd0
null
spi0
spitest0
timer0
urandom
zero

Because we’re several episodes deep into these tutorials, we’ll touch on the steps, but not the details to wire up a new example. The recipe is very much the same as in the other chapters of the BL602 book.

$ cd apps/examples
$ mkdir button_test
$ cp tinycbor_test/* button_test
[ do a bunch of mechanical edits to make a “new” program - we’re sharing that here, so you don’t have to repeat it. ]
Kconfig and Makefile are nearly a search-and-replace.
button_test_main.c starts empty, with only a main() returning 0.

Instead of hand-editing the configuration, we wire ourselves into the build process for now.
$ kconfig-tweak --enable CONFIG_EXAMPLES_BUTTON_TEST
$ make olddefconfig
$ make -j20

Perform a flash update, upload the program, and restart the demo
On the device, confirm that we’ve successfully linked our new build. Notice the presence of button_test:
# Builtin Apps:
bas i2c sh
bl602_adc_test ikea_air_quality_sensor spi
button_test lorawan_test spi_test
[ … ]

Now let’s start configuring our hardware.

Because GPIO in BL60x is currently in a transitional state, we’re just going to brute-force ourselves into the first entries. So in ./boards/risc-v/bl602/bl602evb/include/board.h we’ll just temporarily take over that slot from PineDio Stack. This is clearly not great for interoperability, but it sidesteps a number of issues that MisterTechBlog is already working on.


kconfig-tweak --enable CONFIG_ARCH_BUTTONS
kconfig-tweak --enable CONFIG_ARCH_IRQBUTTONS

N.B. These are included in our provided defconfig for this board, but for reasons I don’t understand, we still have to manually set them here to be effective.

make oldconfig

Rebuild Nuttx and reflash it to the board as you have in the other articles to follow along.

The best-laid plans of mice and men often go awry

Our original plan was to interface with the switch in all three ways that Nuttx knows how to do this, but the wheels fell off that idea while we were building it. (Yes, we did have wheels while we were building it because Lup and I were consulting with each other and tag-teaming development, each working on different aspects.) If we did all this right, there would actually be nothing BL602-specific exposed in our test application and we’d have validated all our internal private handling. That latter bit was a success, in an awkward way – we validated that they didn’t work.

The three approaches are:

    1. Read the GPIO pin “raw”. Just open the device, read it, and report the status.
    2. Configure the Nuttx GPIO interrupt infrastructure to let main() in our application do something else – or nothing else, such as just sitting in a sleep(). Success ultimately relies on an upper half running in application space and a lower half running in kernel space to deliver this interruption of event flow to the application, hopping out to registered functions to handle these events.
    3. Configure the Nuttx button infrastructure, via CONFIG_ARCH_IRQBUTTONS, to deliver an asynchronous event into the application to interrupt the flow and tell it that a button close or open event has occurred. This actually relies on the above internally to work.

For any of these to work, we have to tell Nuttx where our buttons are. We do this in board.h with an entry like this:

    #define BOARD_GPIO_INT1 (GPIO_INPUT | GPIO_PULLUP | \
        GPIO_FUNC_SWGPIO | GPIO_PIN12)

Get to the code!

While the order in the provided sample program flows slightly differently than is described, it’s hopefully recognizable. (The code is structured as it was to reduce repetition when we presented this in three different approaches.)

There’s no magic in dump_buffer(). It’s fortified to protect a (human) debugger from printing control characters or lengthy buffers directly to the screen, but it’s quite simple:

static void dump_buffer(const int buf_size, const char* buf) {
    for (int i = 0; i < buf_size; i++) {
        printf("%02x(%c) ", buf[i], isalnum(buf[i]) ? buf[i] : '.');
     }
}

Raw GPIO reads are the simplest.


int fd = open(INPUT_DEV_NAME, O_RDONLY);
for (int pass = 0; pass < count; pass++) {
  char ibuf[20];
  printf("Pass %d of %d:", pass, count);
  int c = read(fd, ibuf, sizeof(ibuf) - 1);
  dump_buffer(c, ibuf);
  if (c > 0) {
    if (ibuf[0] == '0') {
      printf("- Pressed");
    }
  }
  putchar('\n');
  lseek(fd, 0L, SEEK_SET);  /* rewind so the next read() sees the first byte again */
  usleep(500000);           /* half a second between passes */
}
close(fd);

 

This simply checks if the GPIO pin is active, printing anything we get from the GPIO port in hex and in ASCII, and adds “- Pressed” if so. By default, we check the button rather arbitrarily 20 times and we sleep half a second between passes. This provides a nice feedback loop allowing you to press and release the button a few times and see the screen change in response.

There are really only two lines that may be worthy of surprise. First, the data as we display in dump_buffer() and as we test in the zeroth byte of ibuf[] is not a binary 0 and 1 as you might expect. They are ASCII ‘0’ and ‘1’ (0x30 and 0x31) respectively. This might be a bit surprising to those experienced with device driver handling as you might expect a more raw 0 and 1 there. This is actually a peace offering to command-line users of the GPIO drivers; it’s simply convenient to be able to cat (or hexdump or read…) a port and see its status. It’s similarly convenient to be able to write to it via ‘echo 1 > /dev/whatever’ to blink an LED or start a motor or anything else that may be an output on this same driver. So the ASCII convention actually is convenient here.

The second potential sharp edge is that reads of the GPIO node will not actually stream. You may expect to ‘cat /dev/gpioin0’ and see a stream of 1s until you press the button, at which point you’d see a stream of zeroes until you released the button. Adjust your expectation. Again, presumably for compatibility with command line tools that keep a short lifecycle of a device’s file descriptor, only the very first byte of that potential bytestream is ever valid. You could close and reopen the device to get back to the beginning, but that’s a bit costly as it adds system calls – the transition between application code and kernel mode. We thus use lseek() to jump back to the beginning and read it again.

This is all jolly well and very satisfying. We’ve hooked up a button logically to the operating system and we’re now able to read it and do something useful with it.

“And then, the murders began…”

Filled with confidence, I proceeded to code up the approach of using GPIO interrupts in user applications. Knowing that we needed to ultimately allow for devices with way more than the single button on PineDio Stack, we thought about the configuration scheme. The existing scheme is a series of entries in board.h like this:

#define BOARD_GPIO_INT1 (GPIO_INPUT | GPIO_PULLUP | \
GPIO_FUNC_SWGPIO | GPIO_PIN12)

Initially, we ran into problems if the same pin were configured to be both an output and an input. On PineDio Stack, sharing the button with the vibe didn’t seem completely unreasonable. We could, perhaps, keep the port as an input most of the time and only change the direction when we knew we needed that GPIO line to be an interrupt. We’d lose button functionality while vibing, but that didn’t seem so bad. We put a TODO in the code and vowed to come back to that. Still, that killed most of a day to learn that lesson. (Spoiler: you just can’t do that on this chip. You HAVE to reverse the pin.)

We knew the dance between
#define BOARD_NGPIOIN 1 /* Amount of GPIO Input pins */
#define BOARD_NGPIOOUT 1 /* Amount of GPIO Output pins */
#define BOARD_NGPIOINT 1 /* Amount of GPIO Input w/ Interruption pins */

And

#define BOARD_GPIO_IN1 (GPIO_INPUT | GPIO_FLOAT | \
    GPIO_FUNC_SWGPIO | GPIO_PIN10)
#define BOARD_GPIO_OUT1 (GPIO_OUTPUT | GPIO_PULLUP | \
    GPIO_FUNC_SWGPIO | GPIO_PIN15)
#define BOARD_GPIO_INT1 (GPIO_INPUT | GPIO_FLOAT | \
    GPIO_FUNC_SWGPIO | GPIO_PIN19)

…and we knew those blocks were precarious. Keeping them in sync is awkward. We’d debugged those before and fixed several issues there. It certainly killed our demo app to not be able to have a pin readable as both an _IN1 and _INT1 device, but we thought we’d proceed and come back to it. Another TODO.

We talked about the potentially large numbers of buttons (even if multiplexed into a keyboard matrix, as is possible on the BL702/704/706) and we thought about the number of places in the BL602 code that were passing around bitmaps of the available pins in uint8_t’s. We fixed as many as we could, but that hung in our minds as needing consideration. Add a TODO.

We knew that several stars had to align in order to actually receive an interrupt on a pin at the hardware level:

  • The interrupt source needs to be present, e.g. by pressing a button.
  • The GPIO register itself has to have that port configured as an interrupt source.
  • The GPIO global register has to unmask that interrupt.
  • The CPU has to enable interrupts for the GPIO by setting the correct bit in BL602_IRQ_GPIO_INT0.
  • The mask on the CPU core itself needs that interrupt enabled.
  • An interrupt vector has to be present for the CPU core and successfully jumped through, and that code then has to find an appropriate function registered at the BL602 portability layer, which is then responsible for calling the function registered in user layers.

It just didn’t work.

We found that replacing the portable interrupt or GPIO abstractions with the BL602-specific layers would sometimes help – and sometimes made things worse. It was definitely making the code less maintainable and simply doing unnatural things to the (otherwise sensible) abstraction models.

We started thinking through cases of pins being shared, such as in our vibe + button case and our interrupt + traditional read case. We also started having issues modeling hardware that was similar, but not quite the same and figuring out how that would map into shared apps that needed different configurations and thus, different board.h entries.

The TODOs kept piling up for code that was missing or just wrong. It was not pretty…and we weren’t getting particularly close to working code for what should have been a simple demo. The BL602 layer was, amongst many problems, just not compatible with the shared upper/lower split model that was needed for the final two approaches we set out to write.

The good news is that there was light in the proverbial tunnel for us.

The current BL602 implementation used a very simple model of GPIO pins that was expecting a low count of input, output, and interrupt pins that were all independent and manually configured. We were clearly outgrowing that model. The other model offered in Nuttx was already on our radar as something we were going to have to implement soon-ish. GPIO Expanders in Nuttx allow a 1:1 mapping between a device’s physical pin and its name in the /dev tree. They do away with the entries in board.h.

While I was struggling with this code, Lup was coming off wrangling the SPI driver for the display and working on the touch driver. Both of those were ALSO running into related issues in the BL602 port of Nuttx. Lup had already recognized that we were falling into the “sunk cost” development fallacy.

For example, we were each implementing hacks in the BL602 port (such as copying entire sections of code just to manipulate a single bit differently because the common code didn’t have access to the needed info to know the direction and type of the port) and the needed types were static and private.

This was our breaking point.

I had to take a few days away from the code for personal reasons and Lup reprioritized the next chapter in his book to be “Implement GPIO and Interrupt Expander” so we could get all three of these drivers (screen, touch, button) back on track with portable code being portable and possibly all working at the same time – something we couldn’t really do with the board.h model.

This article is both a bridge across some of the gaps in recent articles – explaining the issues that necessitated the development of the GPIO Expander in Nuttx – and a placeholder until we can roll in a sensible button handler.

Thank you for reading this far and thank you for your patience while we sort this all out. Enhancing and fixing the bottom parts of the Nuttx BL602 port has been challenging and distracting relative to the projects we set out to undertake, but we hope you’ll find the results useful. We hope to provide enough encouragement and background for others to help in that journey and build upon it for both the public tree and in your own projects.

The Bouffalo BL602 family of parts is a very popular low-end RISC-V part. It has WiFi, Bluetooth, and a handful of GPIO pins. The 192MHz part, with 276KB of RAM and 128KB of ROM, is low cost (<$2 in bulk), making it popular both for individuals with homemade prototypes and for commercial use. Development boards like Pine64’s PineCone and PineNut series are easy ways to get FCC-certified radios in handy breadboard-ready packages.

But…

There’s always a “but”, isn’t there? 

The development process can be frustrating. There are several code uploaders that simply don’t work as expected, particularly on a MacOS environment. For this article, we’ll even fast-forward over that unpleasant fishing lesson and just give you a fish. Even Bouffalo’s own BLDevCube, if available for your OS, doesn’t get high marks. I’ve spent hours working with Bouffalo engineering and still don’t have it working.

Use https://github.com/spacemeowx2/blflash. Rust apparently doesn’t know how to set the bit rate above 230kbps (where POSIX ends) on MacOS, so you have to upload more slowly than our Linux peers. The command to use is cargo run flash /tmp/sdk_app_st7789.bin --baud-rate 230400 --initial-baud-rate 230400 --port /dev/tty.usbserial-1440. Poof. You now have an upload command that works, is scriptable, and is easy to recall from command line history.

The thing that’s harder to script is the amount of physical fiddling that’s required. You have to move a jumper on IO8 from L to H, press the reset to start the board’s native code downloader, then move the jumper back and press the reset again. It’s very easy to miss one of those steps while debugging a binary, so you end up looking at the source for version N, but running version N-1 on the device. As you can guess, it’s frustrating.

I’ve long had it in my mind that the jumper was a pin on the address bus and the CPU needed to be connected to one block to program it and another to run it. (That sounds wrong now that I’m typing that, but I have had hardware in my past that required this.) The schematic for the Pine64 board is dead simple as all the ‘magic’ is in the canned castellated board.

There is no magic to this pin. This pin is connected to GPIO8 on the BL602. The state of GPIO8 is polled exactly once in the bootup sequence. Though there are pullups via the jumper to high or low, the pin floats in the ‘low’ state, which allows the device to boot to the flashed code by default. So if you just remove the jumper, the board runs the last code you squirted into it. That seems a nice default. How can we use this to our advantage?

What if we scavenged a momentary contact switch for this? PC power supplies haven’t had “real” power buttons in decades. There’s a momentary pushbutton that sends a request to the power supply to kindly turn off or on, based on the momentary push of a button that just usually happens to feel clicky. They usually just happen to have a header that’s on 0.100″ posts, so they’ll just slide right on.

With this “hack” in place (“Number 143 will blow your mind!”) resetting the board can become a fluid motion that you can commit to muscle memory:

  1. Press and hold your newly attached button.
  2. Press the reset button.
  3. Release the reset button.
  4. Release your new ‘boot button’.
  5. Start your code download.
  6. Press the reset button to begin running your code.

It’s probably possible to release the button too fast, as several opcodes have to be executed to configure the processor and ultimately poll that button, but it’s my experience that as long as you press A then B and release B then A, it’ll come up executing the downloader every time.

Also, to summarize the key parts of the BL engineering spec for the bootloader, it’s helpful to recognize the boot flow.

  1. Reset Vector -> chip setup -> check GPIO 8. Is it low? Jump to user code. Else, start bootloader.
  2. The bootloader will start spraying ‘.’ (period) characters while it’s listening to the serial port. These are at 2,000,000 bps by default. This is unfortunate because it’s a high enough rate that many programs can’t listen at that speed. I’ve not counted them or put them on a scope, but I’d say there are 8-10 of these a second.
  3. Inside this same loop, it’s also listening to the serial port. If it receives a ‘U’ (0x55 – chosen to maximize bit toggles so it can sample widths), it will try to reset the serial bit rate to match that speed and the ‘.’ pattern will continue at the new rate. Depending on exactly when that ‘U’ is received, it may take a couple of these to sync up at another bit rate. (A rough host-side sketch of this handshake follows the list.)
  4. Now that the initial bit rate has been agreed upon, a program like blflash can do “protocol stuff” (documented elsewhere) to send the download to the BL device.
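For the curious, here’s a minimal host-side sketch of that autobaud handshake, assuming a POSIX serial port. The device path and the retry count are placeholders; real tools like blflash handle this (plus the actual “protocol stuff”) for you.

#include <fcntl.h>
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

int main(void) {
    /* Hypothetical device path – substitute your own USB/serial adapter. */
    int fd = open("/dev/ttyUSB0", O_RDWR | O_NOCTTY);
    if (fd < 0) { perror("open"); return 1; }

    struct termios tio;
    tcgetattr(fd, &tio);
    cfmakeraw(&tio);                 /* raw bytes, no line discipline */
    cfsetispeed(&tio, B230400);      /* the rate we want to negotiate */
    cfsetospeed(&tio, B230400);
    tcsetattr(fd, TCSANOW, &tio);

    /* Send 'U' (0x55) so the boot ROM can measure our bit widths, then
       watch for its '.' heartbeat to resume at the new rate. */
    for (int i = 0; i < 50; i++) {
        char out = 'U', in;
        write(fd, &out, 1);
        if (read(fd, &in, 1) == 1 && in == '.') {
            printf("bootloader synced at 230400 bps\n");
            break;
        }
        usleep(10000);
    }
    close(fd);
    return 0;
}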

If you’re running a program like CoolTerm to actually talk to the device, it’s useful to set it to the matching bit rate. You’ll know you’ve missed a step if you see the streaming period characters, because that means the device is listening to you typing, awaiting “protocol stuff” packets, instead of listening to the upload program, like blflash. Starting and stopping (‘connecting’ and ‘disconnecting’ in CoolTerm) the application you’re using to view the serial port is another step to synchronize with this. It’s for this reason you should try to quickly get your app to a state where it can communicate via a screen or blinking lights, just so you don’t have another step. That’s just an unfortunate reality of hardware makers giving us one port to use as both code uploader and console.

Enjoy jumper-free life!

P.S. just attach it with one leg dangling free so you’re less likely to lose it.

Because of the difficulty of downloading disk images across geographic distance or from sites that may not translate well, this is a collection of important boot images for members of the D1 RISC-V processor family. We have collected images and information from the Fedora RISC-V Project and the Debian RISC-V project as well as some information from Sunxi. The information mostly targets the Nezha class of boards, but may be useful for other boards based on the Allwinner D1 or D1s/F133 chips, particularly those built by Sipeed.

There is a security measure enforced by our hosting service that everything has to be served as a zip file. Thus images that are already a compressed image (.zst, .gz, .img, etc.) are zipped again. Sorry.

Fedora

Debian

  • Debian riscv64/D1 0.6.1 image, RVBoards
  • Debian riscv64/D1 0.4.1 LXDE RVBoards
  • Debian riscv64/D1 0.4 LXDE image, RVBoards
  • Debian riscv64/D1 v0.3 MIPI card
  • Debian riscv64/D1 v0.3 HDMI card
  • Debian riscv64/D1 v0.2 LXDE image, RVBoards
  • Debian riscv64/D1 v0.2 console

For a while, both GCC and LLVM had in-progress support for RISC-V Vector 1.0. LLVM is now the only one in active development.

Jim Wilson, a GNU developer for decades, works for SiFive, and while he wasn’t the one doing the work, it seemed likely he knew who was, so this is authoritative. He recently said:

There is no actively maintained gcc rvv support, and no ongoing gcc rvv development. Current work is all in LLVM, and LLVM is recommended if you want rvv support. … SiFive abandoned the gcc rvv work and is doing only llvm rvv work now. The gcc rvv branch is badly out of date.

Reading deeper into the GCC development list, this is really just a form of tough love, as work on GCC’s RISC-V vector support (and auto-vectorization in general) had been talked down before, in July 2021:

 It isn’t up to date with the evolving RVV ISA spec, it isn’t up to date with the evolving RVV intrinsics spec, there are ugly hacks in the vectorizing optimization passes required to make it work, there is no autovectorization support, it is missing basic optimizations like eliminating duplicate vsetvli instructions, etc. The current status is that it is only useful as a toy for demos. SiFive and a few other organizations are contributing to the LLVM vector support, but no one is contributing to the gcc vector support. Alibaba has expressed some interest in contributing recently but it isn’t clear how we will handle their patches yet. The current stuff was mostly done by SiFive, but SiFive is not currently interested in funding this work.

I’m less sure of rjiejie‘s credentials, but they may work for Alibaba/T-Head, the maker of the cores used by Allwinner in D1 and D1s. That may be the “we” in his comment:

We have also supported/maintained the RVV v1.0 feature, you could download prebuilt gcc toolchain from Alibaba website[1].

Registration for the site is required and Google Translate doesn’t handle it well, so I’m not sure, but that may be a path for someone really needing GCC with Vector 1.0. It’s not clear if that work handles 0.7.1 of Vector as was used in D1. The C910-based products also seem to support 0.7.1, so their 1.0 support must be for future chips.

LLVM is one of several projects that have struggled with the issue of handling multiple V versions in the same code, but their resolution wasn’t clear. The QEMU simulator “solved” the problem when adding 1.0 by dropping support for Vector 0.7.1.

SiFive is “only” one of many core vendors and they can’t be expected to carry the development/maintenance/support for such things by themselves, but it’s surprising (to me) that they’ve halted development.

T-Head has binaries (and maybe source) for GCC that support V1.0 and probably 0.7.1, though that may be in a branch as 0.7.1 is pretty clearly a dead end now that V1.0 has been ratified. LLVM and SiFive were, at least at one time, partnering in LLVM development. LLVM seems to have an active plan and is shipping V1.0 support now.

With GCC’s vector support “only useful as a toy for demos”, it’s probably a disservice to even have it in the default builds of GCC until someone is willing to fund the couple of person-years that Jim mentions to get it on track. At least bug reports to LLVM are likely to get traction, as it’s actively developed.

For now, if you’re developing vector code on RISC-V, prepare to pair your toolchain with the chip/simulator you’re using. It’s likely to be finicky for a while.

I’ve been away from writing for a bit for personal reasons and I’ve missed talking much about many events in the RISC-V world this year. Here’s a jumble of thoughts from October of 2021.

Low Points: BeagleV Starlight canceled, Nezha/D1 launch issues

I was lucky to have tinkered with the prerelease BeagleV board (codenamed Starlight) that featured the StarFive JH-7100 SoC. It was well documented, had an amazing technical group of active participants from both corporate and hobbyist backgrounds all working together on merit, and had good tooling. Antmicro’s ‘Renode‘ emulator made developing on these parts a breeze.

Unfortunately for the business, BeagleV/Starlight and StarFive were unable to reach a production agreement and BeagleV Starlight project was cancelled. I did some software and hardware work that went into the proverbial chipper, but I managed to learn and refine some skills along the way. I remain hopeful that the low-volume (Two core) JH-7100 and later (Quad core, embedded GPU, PCIe) JH-7110 will be delivered at a similar price point by the likes of Antmicro or Radxa, which has already missed their ship date. It’s all resulted in some thrash, but it’s possible that all the players (Beagle, Antmicro, Radxa, Starfive) dust off and ship RISC-V boards.

I have possession of a Nezha developer board. This is the official development board made by Allwinner as a vehicle for their D1 chip. By contrast to StarFive, documentation on this device and board is poor. The maker of the chip and the board, Allwinner, has a pretty poor record of playing nicely with open source developers, with license violations being common. When it was at a price point similar to BeagleV, it seemed an underdog as a single core device, but it did have the claim to fame of being the first shipping device to support the RISC-V Vector extension. I’ve been a member of a few different discord/slack/telegram groups for this device and they’ve all been dominated by people stuck at the starting line: just finding a maintained distro that doesn’t require a login in Chinese and a phone number in China is a common challenge.

Unfortunately for many developers, D1 supported only 0.7.1 of Vector, which has source and binary incompatibilities with the final 1.0 version of that extension, which is currently (October 2021) in the final stages of public review. This part also really requires Allwinner’s own builds of GCC/Binutils to use these extensions well. Interestingly, the RISC-V part of this SoC comes from Alibaba’s XuanTie C906 line, which was itself recently open-sourced, though there have been serious issues trying to land Alibaba’s incompatible work in upstream projects like GCC and QEMU.

I’d love to be able to comment more on the actual development board, but can’t, as my board appears to be totally DOA. I hope to be able to write more about it soon.

This board gets the (somewhat deserved) criticism of being overpriced when compared to high-volume devices like Pi and the (awkward) criticism of having a single 1.0GHz core and relying on an old version of the Vector specification. As the final version still doesn’t exist and fab times just plain take a while to get from Verilog to real silicon, we can be only so mad at the first chip to support even a pre-release V spec. We can be more upset that the chip requires violating the RISC-V specification on reserved bits in the paging machinery. All this does lead to a more upbeat highlight to finish this catch-up.

On the Horizon: Allwinner D1s/F133

This week, there’s been interest in a new revision of the D1. The Allwinner D1s (sometimes called the “F133” for reasons I haven’t yet grasped) is a cost-optimized version of the original D1. Where the D1 really seemed to ship only with their own development board, Nezha, the D1s seems to come out of the gate ready for the likes of Seeed Studio and Mango Pi’s ~$10USD RISC-V board, or in low quantity to put on your own open-source boards, like Xassette.

It’s a slightly confusing product, but some of that may just be translation/documentation issues.  It’s cost-reduced, and that filters through to the boards we’ve seen so far. There’s 64MB of RAM on board, but it’s sold as “Linux ready”.  The removal of HDMI signaling means no monitor and 64MB will require a very stripped down system. Cramming Linux into the 8MB on a K210 was (barely) possible, so this must be possible, even if cramped.  Still, for a single-purpose or educational environment, that’s probably OK. The Allwinner F133 overview avoids any comparison to D1, refers to itself as “video decoding platform”, and even avoids use of the phrase “RISC-V” completely.

It’s interesting that one of the most controversial RISC-V chips of 2021 managed to ship a second revision this year while we have so many that have just seemingly collapsed under their own weight or never found their legs beyond original announcements. (Blink twice if you’re alive, PicoRio!) 

As we approach the end of the year, we’ve had quite some changes in the RISC-V ecosystem. It’s likely that the product families that have most met or exceeded my expectations are the BL602/706 family and Espressif’s menagerie of ESP32-C3 and ESP32-C6.

What have been your biggest disappointments or surprises in RISC-Ville? 

 

I normally don’t do “scoops”, but as I write this, I can find no other pages on Google in English mentioning the Bouffalo Lab BL562 and BL564 RISC-V chips. Even Bouffalo Lab’s own page is pretty scant right now. (I’m writing this late on 2021-03-31 and no, this isn’t an April Fools’ joke. Maybe it is and I’ve fallen for it, but it seems terribly non-funny…) However, this seems like a very interesting contender in the low-power RISC-V processor market. It’s very likely a subset of the already-successful BL602/BL604, but without the 2.4GHz radios that give it WiFi or Bluetooth. This also means the parts of the chip that have the most contentious NDA requirements for certification are simply not there.

Comparing Bouffalo Lab’s own overview sheets of the BL562/4 and BL602/4 really highlights that only the yellow block, the RF radios, is different. The pin counts are the same as BL602/4, with 32 or 40 pin QFN packages. It’s very likely the same RISC-V core running at speeds up to 192MHz and with 276KB of RAM and 128KB of flash ROM.

It seems likely they’re pin-compatible, but we don’t yet have specification sheets with that level of information that I can find.

BL602 is already a price-leading choice for low-end designs, with single-piece pricing of about $1USD. It’s easy to imagine that bulk orders can reduce that by a third or more. We can probably look to the BL602 for real-world performance measurements. The clock speed advantage over the 108MHz GigaDevice GD32VF103 family has given it a hand up in my own measurements. (I don’t have formal numbers.) GD32V, probably the most natural device to compare these against, ships in QFN36, LQFP48, LQFP64, and LQFP100 packages, so it has more I/O, notably USB support, which is absent in BL562.

This entry is a bit of a surprise. While the medium (“runs Linux”) and high-end (“runs a graphical desktop”) developments in RISC-V have been much publicized, it’s important to remember that not everything is IoT or needs to be able to render Netflix at 4K. With a smaller size and lower pin count, we score another gain for modernizing the low end. While GD32V’s 32K (max – there are smaller ones) of memory can feel a bit cramped, the 276KB of RAM here may feel downright luxurious in some designs.

BL562 – general purpose RISC-V SoC

BL602/604 – RISC-V core with 802.11 and Bluetooth

As a general-purpose RISC-V processor, this is sure to score some commercial design wins where pennies count, and hobby interest, where good development tools matter. Bouffalo Lab, in cooperation with SiFive, has an established SDK that’s been picked up by Pine64 and Seeed Studio. There has been some jockeying lately around high-end hobbyist or media-player class devices, so it’s refreshing to see another player come back, wearing a slightly different costume, with a solid part in the dollar (or less?) market that’ll keep our rectangles blinking.

Assuming it’s the same RISC-V core (surely!) as BL602, it’ll build on the established SDK provided by Bouffalo and forked by Pine64 for their PineCone and PineNut lines, by Sipeed for their BL602 product, and by DoIt for the DT-BL10.

Epilogue

Is it a scoop? I don’t really care. I’m always astonished how quickly things get to the likes of CNX, Reddit’s /r/risc-v, and the Twitter buzz. There is, of course, the time dilation between tech in China and the Western world. I was clicking around on Bouffalo’s site, trying to find information on yet another part, and fiddled with the URL when I landed on BL-562.

At least for some short time, I think I have a reasonable claim on “first” and maybe even “most comprehensive”. 🙂

We’re all familiar with the fable of the boiling frogs, unable to sense the change they’re (literally!) immersed in. Enthusiasts of the RISC-V architecture may be encountering the same right now: late 2020 gave us a steady stream of new hardware announcements, but we may not have a great sense of it since the hardware isn’t always possible to order yet. Let’s review some of the upcoming products in this market, duly noting that products can change or get canceled before they even ship.

We had two major new families of entries in the IoT category. Both use RISC-V cores to drive WiFi and Bluetooth radio stacks. Bouffalo Lab’s BL602 is available in quantity now. Starting around $2.50 for a module, with multiple development boards in the $5-$10 range (including Pine64’s Nutcracker for PineCone and the DoIt DT-BL10), this chip starts with a core from SiFive and has 802.11 b/g/n and Bluetooth 5. The upcoming BL-702 family adds Zigbee radios. There are enough compute resources (CPU, RAM, timers, etc.) that you can build your own software right onto the radio chip via their multitasking OS and open development kits. You may recognize this as the basic model popularized by Espressif in their ESP8266 in recent years.

Espressif also embraced RISC-V with their upcoming ESP32-C3 family. It’s interesting that this chip doesn’t even get a distinct name at this point as Espressif apparently sees the CPU core as only a small part of the product. Still, by volume, the ESP32-C3 is likely to become an extremely popular choice.

Moving up a step computationally, we enter more traditional chips and single-board computers. Alibaba’s Xuantie 910 is widening into a family of chips. The C906 is being marketed for the more entry-level class, but still features a load of I/O, multiple cores, support for the still-not-ratified Vector extensions, and more. Press releases tend to mix up the 910 and the 906, but they both seem pretty hot. In late January, an Android Open Source port to the C910 was demonstrated. Embedded specialists Sipeed have announced a C906 development board that’ll run Debian and that starts at $12.50. If Sipeed does for that what they’ve done for GD32V and K210, we should see lots of interesting SBC projects from them.

Sipeed teases C906 RISC-V board

RIOS is bringing us a claimed competitor to the Raspberry Pi called the PicoRio. It’s coming in three stages:

  • PicoRio 1.0 is a headless, four-core RV64GC that’s capable of running Linux at 500MHz. Its schedule has moved from 2020H2 to an expected beta in 2021H1.
  • PicoRio 2.0 adds Imagination’s PowerVR GE7800 XE series GPU, which may finally bring a GPU-capable RISC-V development board into casual hobbyist price points.
  • PicoRio 3.0 strives to bring performance comparable to a tablet or desktop computer.

Another entry in the Pi-class of hardware, though not at Pi price, is the BeagleV from the group that brought us the famed BeagleBone. It uses two of SiFive’s U74 cores at 1GHz and includes 8GiB of LPDDR4 RAM, gigabit Ethernet, an 802.11n Wi-Fi + Bluetooth 4.2 chipset, and a dedicated hardware video transcoder supporting H.264 and H.265 at 4K and 60fps. The system also offers four USB 3.0 ports, a full-size HDMI out, a 3.5mm conventional audio jack, and a 40-pin GPIO header. As a snack for those interested in AI applications, it also features a Tensilica Vision VP6 DSP for machine-vision applications, a Neural Network Engine, and a single-core NVDLA (Nvidia Deep Learning Accelerator).

Core provider SiFive is bolting Freedom U740 cores to a mini-ITX design in the HiFive Unmatched. x16 PCIe expansion, 16GB of DDR4 RAM, an NVMe M.2 slot, gigabit ethernet, and four cores at 1.4GHz should make this an entry-level desktop-class system, including building native applications at full scale on a host-class CPU. For professional developers, the $665 entry ticket should be more appealing than the $999 for the board’s predecessor, Unleashed.

The PicoRio V1 and Unmatched have already slipped from Q4 into 2021.

Still, while we’re not bathing in fresh alternatives to the GD32V and K210, we have several candidates on the proverbial launching pad and several options to bring excitement into the lives and toolboxes of RISC-V aficionados.

What do you see coming up? What are you most anxious to work with?

 

It is not an exaggeration to say that the current wave of IoT devices owes a lot to the Espressif ESP8266 family of devices. That means a new member of this family is a big deal, and it’s pretty exciting that the newest, the ESP32-C3, moves to a RISC-V core. We have a draft of the ESP32-C3 data sheet for those ready to dig in.

ESP8266, Quick History

In 2014, the ESP8266 came to the scene, bundling a full WiFi package, including antenna, ROM, RAM, and a CPU, into a package that a host could talk to via a Hayes modem-like command set – and that host could be as simple as an Arduino or less. Eventually, enough was learned about the core, a Tensilica Xtensa Diamond Standard 106Micro running at 80 MHz, that hackers were able to run their own code on board and often eliminate the “host” processor completely, often for under $10 at that time, with prices declining since.

ESP32 was the 2016 successor, bringing in Bluetooth and a more powerful integrated CPU. Available as a chip or as a (FCC-tested) module that included antennas, the most common configuration was dual-core, allowing a less cramped balance of a developer’s own code with the integrated feeding of the radio stack. The Xtensa LX6 CPU core was still not widely loved by programmers, with toolchain issues remaining common.

ESP32-C3: Now with more RISC-V

Early in November 2020, we first got hints of a RISC-V design, the Bouffalo Lab BL602 family, making an attack on that market of low pin count, high-integration devices and striking a blow at the ESP32 price point of about $5. Late in November, we now have confirmation that the (awkwardly named) ESP32-C3 is being released by Espressif as the newest member of their family, though details are only slowly coming out of China, as they do.

ESP32-C3 will be pin-compatible with the large ESP8266 family. It includes a 160MHz 32-bit RISC-V core to replace the Tensilica CPU. As you’d expect in 2020, b/g/n WiFi and Bluetooth Low-Energy (BLE) are table stakes. ESP32-C3 brings 400 kB of SRAM and 384 kB of ROM.

We don’t yet know what RISC-V core they are using (SiFive, Nuclei, etc.) or if they’ve created their own.  As this is likely to be a relatively humble RV32IMAC (or less!) design, we’d expect high degrees of compatibility with the wide variety of RISC-V tools that we already have. We don’t know if the trend of binary blobs (a problem being tackled by Pine64) will remain, but it’s likely they will given the regulatory landmine around radios.

With access to the wealth of dev tools, socket compatibility with ESP8266, and Espressif’s embrace of the maker communities, this device is sure to be a hit. Unfortunately, it’s a little too early for a stocking stuffer this year, but it’s one of a series of parts that’ll make RISC-V fun to follow in 2021.

The GD32VF103 RISC-V system-on-chip from GigaDevice hits an amazing price-to-performance point. Their 108MHz speed, on-board RAM, and low cost (parts around $1.30USD with boards like the Longan Nano commonly under $5) make them a favorite of hobbyists.

There’s a nuance buried in the specification of these parts that allows for faster setting and clearing of the GPIO registers than I’ve seen in any of the example code for these. This approach makes no difference if you’re just toggling a “power on” LED or other low frequency signal, but in a multitasking operating system or a high performance application, there is an easy optimization. 

Common practice

We’ll use the Longan Nano board just to have a tangible example to talk about. GPIO pin 2 is found in the GPIOA register bank. This pin is connected to a blue LED on the board. It’s wired “backward” from the obvious meaning; you turn the bit off to make the light turn on. This means we often see code like this:

if (on) {
    ((GPIO*) GPIOA)->output_control &= ~( LED_BLUE );
} else {
    ((GPIO*) GPIOA)->output_control |= ( LED_BLUE );
}

This is a pretty common idiom in low-level code: we read the output_control register, mask off the blue bit, and store it or we read the output control register, logically or in the blue bit, and we store it. While we can do better if we use dedicated functions to differentiate off and on or if we can rely on inlining and constant propagation, as a matter of perspective, it takes GCC about 44 bytes to implement this.

Hazards lie ahead!

This code also has problems in a multitasking or preemptive environment. What if something ELSE is modifying any other bit in the GPIO A outputs? Maybe the hardware people helpfully put the bit for the LED in the same register as the launch missile bit. (Thanx, guys!) Maybe you have a multitasking OS and something else may interrupt your access to GPIOA between the time you do the load and the time you do the store. (With blinking LEDs and nothing else on the GPIO, as is the case for a Nano with no external hardware, this doesn’t matter). In real life code, you probably need to raise an interrupt priority level or grab a mutex on the GPIO or something else to prevent competing code from stomping on the reads and writes. To help visualize the problem, let’s look at the generated code. (This is for the red LED that’s on pin 13 of GPIOC, but follow the problem.)

0x08008e1a <+28>:	lui	a4,0x40011
0x08008e1e <+32>:	lw	a5,12(a4) # (MARK A) offset 12 at 0x40011 is the GPIO C output register. Read that into a5.
0x08008e20 <+34>:	lw	s0,12(sp) # this is just the compiler restoring the saved s0 register so we can return later.
0x08008e22 <+36>:	lui	a3,0x2    # Since this is bit #13 and we can only load a 12-bit immediate, load the upper bits of a3 here.
0x08008e24 <+38>:	or	a5,a5,a3  # or the bits in a5 (that we read out of the chip) with 0x2000 to set bit 13.
0x08008e26 <+40>:	sw	a5,12(a4) # (MARK B) store that into the output register.

If anything else touches that register between MARK A and MARK B, Bad Things are going to happen and you may risk launching missiles instead of blinking a light depending on what else is in that register. This is why you probably need to brace it with a mutex or whatever is appropriate for your system.
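For illustration only, here’s a minimal sketch of what that bracing might look like; the lock calls are placeholders, not a real API from any particular RTOS:

/* Hypothetical guard around the read-modify-write sequence. */
acquire_gpio_lock();                              /* or raise the interrupt priority level */
((GPIO*) GPIOC)->output_control |= (1 << 13);     /* read, set bit 13, write back */
release_gpio_lock();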

There must be a better way!

There is a better way and it’s unique to the GPIO registers, but it seems like something that Gigadevices brought forward from ARM-land when they “found inspiration” in the GPIO system of Blue Pill, which is very similar. Join us now on page 104 of the 536 page hymnal, GD32VF103 User Manual EN V1.0.

There is no need to read-then-write when programming the GPIOx_OCTL at bit level, user can modify only one or several bits in a single atomic APB2 write access by programming ‘1’ to the bit operate register (GPIOx_BOP, or for clearing only GPIOx_BC). The other bits will not be affected.

That’s pretty awesome! The chip will guarantee atomicity. All we have to do is write a 1 in the corresponding bit position of GPIOx_BOP to set that bit, or of GPIOx_BC to clear that GPIO line. Going back to our example of the blue LED in GPIOA that’s on bit 2, we can thus write 1 << 2, which is 4, into GPIOA_BOP to turn off the LED (remember, on the demo board, they’re backward) or write a 4 into GPIOA_BC to turn it on.

((GPIO*) GPIOA)->bit_clear = LED_BLUE;

We can’t affect any other bits in the register, and that means we don’t have to read it and we don’t have to worry about atomicity issues, grabbing a mutex, or raising the spl. Here’s the equivalent of the code above, once all the conditional stuff is stripped away in the same way:

0x08008db0 <+6>: lui a5,0x40011 # 0x40011 << 12 - 2028 = 0x40010814
0x08008db2 <+8>: li a4,4 # load up our bit number into A4
0x08008db4 <+10>: sw a4,-2028(a5) # store a4 into  40010814

That single store to 0x40010814, bit_clear, turns off (drives low) that GPIO pin – which, thanks to the backward wiring, turns the LED on.

This appears to be unique to the GPIO registers in the GD32V line.  The comparable GPIO registers in competing parts like the Kendryte K210 don’t have this feature. 

In a standalone, general purpose function like this, the savings are small. If you’re able to reduce these to functions or templates that have constant arguments and can be inlined, but don’t need to grab a mutex, it’s a potentially large difference.
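As a rough illustration, here’s a minimal sketch of such inlinable helpers. It assumes the same GPIO struct used in the examples above, with bit_op (BOP) and bit_clear (BC) members; your actual header names may differ.

#include <stdint.h>

/* Single write to BOP sets the masked pins; no read-modify-write, no lock. */
static inline void gpio_set_bits(volatile GPIO *g, uint32_t mask)
{
    g->bit_op = mask;
}

/* Single write to BC clears the masked pins. */
static inline void gpio_clear_bits(volatile GPIO *g, uint32_t mask)
{
    g->bit_clear = mask;
}

/* The blue LED on PA2 is wired active-low, so:                    */
/*   gpio_clear_bits((GPIO*) GPIOA, LED_BLUE);   -> LED on         */
/*   gpio_set_bits((GPIO*) GPIOA, LED_BLUE);     -> LED off        */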

It’s easy to argue that if saving a few clock cycles on GPIO accesses in 2020 is a priority, you’ve led a bad life and are being punished. That may be true, but that’s the life of an embedded systems engineer. A store of a constant to a constant address is usually “better” than a read, a modify, and a write. If that GPIO access is controlling the laser that’s cutting into your eyeball, you may appreciate the code being as streamlined as you can get it.

Longan Nano with GD32V MCU and an OLED display.