The Kendryte K210 seems to have been one of the early success stories for RISC-V, if not in mainstream computing, certainly in maker mindshare. The 64-bit device had two cores, enough support peripherals to be useful for your robotics project, enough AI to recognize faces or do image detection and following for your self-driving robot project, and ran a chopped-down Linux if you really needed it, though this was all pretty precarious in 8MB of core. Obviously, the successor device should address these and bring up some 2020 level specs from the 2018-ish design we saw with K210. That device even had a name leaked or rumored: “Kendryte K510.”
I can find rumors and predictions for K510 as far back as 2019. Canaan (another name for Kendryte, as best I can tell) themselves talked about K510 in December of 2019:
Zhang said that the new generation of K510 chip has been greatly optimized in algorithm and architecture. Compared with the first-generation chip, the K510’s computing power will increase by 5-10 times, and it will be developed for 5G scenarios.
Finally, almost nine months ago, K510 was formally announced, but no reference designs or availability were given, so it stayed in my “hype” folder. While I still haven’t seen hardware shipping, we now have some faith that hardware is purchasable.
Enter the new developer’s reference board
AnalogLamb is offering the DEV-AI0002, a K510 Dual RISC-V64 Core AI Board with Dual Camera and LCD.
It looks like a substantial board, offering dual-core RISC-V64 CPU with frequency up to 800 MHz. They claim 3 TeraFLOPS is possible. (Editor’s note: the K510 doc repeatedly says “800Mhz”, but that’ll be hard to do with a 5 stage, in order CPU like this…) If true, that would put the device on par with the fastest GPUs from 2009, a large Xeon from 2015, or a beefy gaming machine from 2020. That said, this power isn’t coming from a 3D GPU; only a 2D GPU is cited for this board.
Beyond the power of the SoC itself, the reference boards add 512MB LPDDR3@1600MHz, a Camera Board with two camera sensors and Base Board. There are services for an LCD display, 1000M ethernet RJ45, HDMI, USB, TF Card, GPIO, UART and Audio Interface. CRB adds:
- K510 integrate the dual-core RISC-V64 CPU and DSP up to 800MHz
- Up to 3 TFLOPS AI, Ultra low-power wake-up VAD
- Input high-definition triple camera, MIPI CSI/DVP interface;
- Output: 4Video Layer + 3 OSD Layer;
- High-quality H264 video encoding, 2 channels 1080P@60;
- 2D image accelerator: zoom, crop, rotate, OSD overlay.
- Camera Sensor Board with two sensors
- 512MB LPDDR3@1600MHz
- 1000M Ethernet RJ45 Interface and Wireless Module
- HDMI and a LCD Display
- USB OTG and USB Type-C Power Supply
- USB to UART for Debug
- TF Card Interface and GPIOs
It’s following the model of D1 and Raspberry Pi Compute Module in using a main board to carry the SoC and a larger board to bring out I/O connectors like TF sockets (“TransFlash” is the original name for microSD, still commonly used for uncertified card slots), USB 2.0, GPIO, Gig Eth, HDMI, and such. The unspoken theory is that decoupling these allows smaller (pronounced “cheaper”) carrier boards and replacing the CPU modules with newer ones as they come to market. It’s the future we were promised with Pentium-II “cartridges”. The K510 CRB Hardware Guide is one of the few in the initial doc release that Google Translate handles well when converting to English. The acronym isn’t known to me (yet – comments welcome!) but I’m assuming it’s “Customer Reference Board”. In some fantasy land, the K1020
40-pin GPIO connector – with a twist
Though the 40-pin connector may make you think of a Raspberry Pi-like expansion bus, the pinouts are incompatible. It seems that a direct link to section 3.15 doesn’t work, so I’ll just repeat it here:
Figure 3-18 40P pin header expansion interface
Table 3-4 Expansion interface definition

| Numbering | Definition | Numbering | Definition |
|---|---|---|---|
| 1 | VDD_1V8 | 2 | GND |
| 3 | VDD_1V8 | 4 | GND |
| 5 | VDD_3V3 | 6 | GND |
| 7 | VDD_3V3 | 8 | GND |
| 9 | VDD_5V | 10 | GND |
| 11 | VDD_5V | 12 | GPIO_1V8_95 |
| 13 | GPIO_3V3_114 | 14 | GPIO_3V3_115 |
| 15 | GPIO_1V8_92 | 16 | GPIO_1V8_96 |
| 17 | GPIO_1V8_105 | 18 | GPIO_1V8_107 |
| 19 | GPIO_1V8_104 | 20 | GPIO_1V8_106 |
| twenty one | GPIO_1V8_118 | twenty two | GPIO_1V8_119 |
| twenty three | GPIO_1V8_93 | twenty four | GPIO_1V8_94 |
| 25 | GPIO_3V3_125 | 26 | GPIO_3V3_124 |
| 27 | GPIO_3V3_127 | 28 | GPIO_3V3_126 |
| 29 | GND | 30 | GND |
(I kept twenty one through twenty-four as words because that’s how Google Translate presents them to English readers. Is there some significance to this in the original Chinese?)
While there are a few 3.3V lines, a majority of them are 1.8V. While this board doesn’t really seem to target IoT hobbyist-style projects, this will be a challenge for anyone who DOES want to attach their favorite Adafruit or Sparkfun gizmoid from the Pi or Arduino class of products, which are almost universally 3.3V these days. The few 3.3V lines available on this connector might run out quickly. If you’re connecting a 3.3V device to a 1.8V host, you’ll need to brush up on the details of level shifting or find a component that’s better suited. Many 3.3V devices won’t reliably read even the maximum high of a 1.8V signal as a “high”, meaning they would be unable to recognize changes in voltage reliably. A $200 board really isn’t meant for running robotics servos and air sensors. Save those projects for a Dr. Who HiFive (RISC-V, of course!) Inventor Kit.
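To put rough numbers on that mismatch, here is a minimal sketch in Python. It assumes the common CMOS rule of thumb that an input needs at least 0.7 × VDD to register as high and at most 0.3 × VDD to register as low. Those ratios and the function name are illustrative assumptions, not values from the K510 docs or any particular device’s datasheet, so check the real part before wiring anything.

```python
# Assumed rule-of-thumb CMOS input thresholds, as fractions of the supply.
# Real parts vary; always check the datasheet of the device you are wiring.
VIH_RATIO = 0.7  # minimum input voltage guaranteed to read as high
VIL_RATIO = 0.3  # maximum input voltage guaranteed to read as low


def classify_input(signal_volts: float, receiver_vdd: float) -> str:
    """Classify how a receiver powered at receiver_vdd sees signal_volts."""
    if signal_volts >= VIH_RATIO * receiver_vdd:
        return "high"
    if signal_volts <= VIL_RATIO * receiver_vdd:
        return "low"
    return "undefined"


# A 1.8V GPIO driving a 3.3V input: even a full-swing high is ambiguous.
print(classify_input(1.8, 3.3))

# The reverse direction is logically fine, but 3.3V into a 1.8V pin may
# exceed that pin's maximum rating, so a level shifter is still needed.
print(classify_input(3.3, 1.8))
```

Under these assumed thresholds, the 1.8V high lands in the no-man’s-land between “low” and “high” for a 3.3V part, which is the practical reason to reach for a level shifter.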
Andestar V5, the primary core (two, actually) of the K510
The processor itself takes a big step up from the RISC-V Rocket design that was used at the heart of the K210. The tech docs show that they’re using an Andestar V5 design from Andes Technology, but clearly updated from the Andestar V5 they announced in September of 2019. Of particular note, we see the Vector (presumably 1.0) support, which was only ratified in December of 2021. That’s pretty exciting. There’s a collection of docs that we can hope will grow, and we hope that “zh” gains sibling directories of English versions. (You’re free to hope for your own favorite languages, too – I’m just being selfish. 🙂 ) Google Translate handles a few docs OK, but for the majority of them it will handle only a few lines at a time.
The chip is rich in I/O. Seven I2C and three SPI ports are generous, but I’d be careful with that voltage-level mismatch. A 2D GPU will help most desktop applications once appropriate drivers are refined. All the RAM seems to be on the SoC itself, so don’t count on user upgrades. In the processor block below, we see three blocks of processing and a mailbox unit to let them pass messages (like interrupts) between them. The two RV64G+ class units should be familiar to readers here. The Kendryte Processing Unit in the K210 was the Tensor-style processing unit, so we’ll refer to the K210 KPU FAQ. Let’s hand-wave the details of that for now, but that lets us know what the KPU can basically do.
Kendryte passes the doc hot potato back to Andes for some chip-level documentation. This is fine, since they would be the experts on some aspects, but it can be a bit of a paper chase, not knowing exactly which revisions of the doc correspond to the cores in these chips. We’ve already discussed that “Andestar V5” isn’t exactly a tight version number scheme as it apparently covers at least some range of parts from 2018 to 2021. But we’ll work through what I can.
I’m inferring that the AndeStar V5 Instruction Extension Specification is in play right through version 1.4, the most recent there. As 1.3 added the not-quite-final Vector extensions and 1.4 added vector for bfloat16, both of which are listed as features of K510, we seem quite up to date. We get 73 pages of (English) doc covering features on the chip that are in the extended feature sets. Andes is new to me, so it’s worth a moment for me – and hopefully, the reader – to take a quick romp through what the extensions above the common RISC-V opcode set allow. I won’t go into great detail; just knowing that these are a thing and that they have the potential to improve your code – at the cost of your code not working on other vendors’ architectures – is enough.
Extensions beyond stock RISC-V
The Andestar V5 ISA, as used in K510, isn’t targeting embedded or super low cost devices. These may be deployed as part of a fleet, in managed racks, or as workstation-class devices, and if using compiler magic to emit magic opcodes means that you need one fewer row of compute-bound number crunchers in a data center, that’s probably OK. So what have they done?
Start with the basics, but extend the extensions. The Andestar V5m ISA is a superset of RV-IMAC. Some of the things that were vendor extensions (Vector wasn’t ratified until December of 2021) are now part of the official RISC-V extensions. Depending on the age of the doc we’re looking at, this can get a bit confusing, but I’m guessing that a smart decoder can maintain compatibility with both a customer base that was exclusively theirs and users of the newly ratified parts. One example of managed change is in the handling of “half floats”. We’ve long (sorry) had support for (32-bit) floats and (64-bit) doubles, but often in graphics work and machine learning, 32 bits is overkill. If you CAN use 16-bit floats, you can effectively double the capacity of your caches, halve the number of data transfers, and handle more data inside a Vector operation. It looks like they’ve added half-floats to all the places that make sense.
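The storage half of that half-float argument is easy to see on any machine. This is only a sketch using Python’s standard `struct` module, whose `e` format is an IEEE 754 16-bit float; it says nothing about how the K510 itself stores or converts these values, and the sample data is made up for illustration.

```python
import struct

# Made-up sample data: 1024 values between 0 and roughly 102.
values = [0.1 * i for i in range(1024)]

# Pack the same values as 32-bit floats ("f") and as 16-bit half floats ("e").
as_f32 = struct.pack(f"<{len(values)}f", *values)
as_f16 = struct.pack(f"<{len(values)}e", *values)

print(len(as_f32), len(as_f16))  # 4096 2048

# The saving costs precision: read the halves back and find the worst error.
back_f16 = struct.unpack(f"<{len(values)}e", as_f16)
worst = max(abs(a - b) for a, b in zip(values, back_f16))
print(worst)
```

Half the bytes means twice as many values per cache line or vector register, at the price of a rounding error that grows with the magnitude of the data. Whether that error is acceptable depends entirely on the workload.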
Bit ops. Branch on a bit being set or clear. The value to match can be encoded in the opcode itself. Sign-extend a bitfield.
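For readers who haven’t met these operations, here is what they compute, written as a plain Python sketch. This models only the generic arithmetic; the function names are hypothetical and it does not reflect the Andes mnemonics, encodings, or operand order.

```python
def extract_signed_field(word: int, msb: int, lsb: int) -> int:
    """Return bits msb..lsb of word, interpreted as a two's-complement value."""
    width = msb - lsb + 1
    field = (word >> lsb) & ((1 << width) - 1)
    sign_bit = 1 << (width - 1)
    # Subtracting 2**width when the top bit is set performs the sign extension.
    return field - (1 << width) if field & sign_bit else field


def bit_is_set(word: int, bit: int) -> bool:
    """The test a branch-on-bit-set instruction would make before branching."""
    return ((word >> bit) & 1) == 1


word = 0b1011_0000
print(extract_signed_field(word, 7, 4))  # bits 1011 as a signed nibble: -5
print(extract_signed_field(word, 6, 4))  # bits 011 as a signed field: 3
print(bit_is_set(word, 7), bit_is_set(word, 6))  # True False
```

Without dedicated opcodes, each of these takes a short sequence of shifts and masks, which is exactly the kind of busywork a single instruction can absorb.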
Address Scaling. It’s a somewhat frequent complaint (esp. from developers coming from ARM or x86) that address scaling has to be done by the programmer. The examples given by that former ARM engineer are compelling. Assembly programmers know it’s a bit of a pain to burn an extra temp register just to keep multiplying (or tallying) the index by the size of the structure you’re traversing. Andes adds instructions to compute familiar LEA operations like “lea.d t3, t1, t2” which is “t3 = t1 + t2*8”. I think I recall Alibaba/T-Head adding similar extensions to their C906 and C910 designs.
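A scaled address calculation is just “base plus index times element size” done in one step. The sketch below models that arithmetic in Python against a small fake memory; `lea_d`, `load64`, and the memory layout are all made up for illustration and are not Andes APIs.

```python
import struct

ELEM_SIZE = 8  # bytes per 64-bit element, the ".d" in lea.d


def lea_d(base: int, index: int) -> int:
    """Model of the scaled add: base + index * 8."""
    return base + index * ELEM_SIZE


# A tiny fake "memory" holding four 64-bit values starting at offset 16.
memory = bytearray(64)
base = 16
for i, value in enumerate([10, 20, 30, 40]):
    struct.pack_into("<q", memory, base + i * ELEM_SIZE, value)


def load64(addr: int) -> int:
    """Read one little-endian 64-bit value from the fake memory."""
    return struct.unpack_from("<q", memory, addr)[0]


# Without a scaled add: shift the index into a temporary, then add.
index = 2
temp = index << 3
addr_two_step = base + temp

# With the fused form: one step, no temporary.
addr_fused = lea_d(base, index)

print(addr_two_step == addr_fused, load64(addr_fused))  # True 30
```

In a tight loop over an array of structures, dropping that temporary frees a register and an instruction per access, which is the whole appeal.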
Various performance enhancements like “find first byte” will help many algorithms. There are also opcodes for loading a number of words into consecutive registers and converting common data types to and from the 16-bit floats.
Tooling support from Andes
Fortunately (?) Andes maintains their own fork of RISC-V GCC with their own GDB and binutils in order to support optimizing for and debugging the chip extensions above. It’s not clear why they are keeping their own entire versions instead of mainstreaming them. I rarely see @andestech.com listed in the ChangeLogs or in the mailing lists of those tools.
Andes provides these tools in their AndeSight Eclipse IDE for Windows and Linux users. They provide binaries of their Andes Development Kit, which is probably possible to build for macOS as the source is there. The Andes Github repos are a bit of a circular resolution mess and it can be challenging to find current, maintained sources for each of the pieces. Hopefully a mainstream component in a high volume, open market will help drive some consolidation and more code sharing in this area.
Kendryte partnering with an experienced RISC-V core maker should eliminate a lot of the birthing pains we experienced with K210. The RISC-V standards are more developed, AndesTech RISC-V documentation (in English) is plentiful, and having Linux kernels, boot managers, and drivers already in place should be awesome. Deep in the docs, we learn they used the Andes AX25MP as a base and wrapped up the features of the Andestar V5 ISA as:
- RISC-V RV64I base integer instruction set
- RISC-V RVC standard extension for compressed instructions
- RISC-V RVM standard extension for integer multiplication and division
- Optional RISC-V RVA standard extension for atomic instruction
- Optional RISC-V “F” and “D” standard extensions for single/double-precision floating-point
- Optional AndeStar DSP extension
- Andes Performance extension
- Andes CoDense extension
and Andestar extensions as:
- StackSafe hardware stack protection extension
- PowerBrake simple power/performance scaling extension
- Custom performance counter events
(My read is that these are “optional” features beyond RISC-V ratified sets that they have opted into when building K510.)
Kendryte themselves have already published much in the Kendryte Github repo. Buildroot, Berkeley Boot Loader BBL and Proxy Kernel pk, and a Docker image to compile K510 are already there in addition to the K510 docs. Prominently, the 575 pages of the K510 Technical Reference Manual will provide us register maps, descriptions, and electrical traits of the chip itself. (It’s stamped ‘confidential’ all over it. /shrug)
Back to the K510 CRB features
The K510 CRB ships with 4GB of bootable eMMC that can be loaded with your favorite OS, or your OS can be kept on a TF card for easy loading from another computer. The 128MB of NAND flash can store the boot loader and small amounts of storage, like a $HOME or configuration files. (In some places, it declares 16GB of eMMC and others call out 4. We’ll know once we see boards!)
The documentation is conflicting on the number and type of onboard LEDs. A WS2812 “Neopixel” is present and visible in the photos. Another LED of some type (Power?) may or may not be present.
Two switches allow booting from UART, SD, NAND, or eMMC. On other chips of similar capacity, we’ve seen SD and eMMC flashed with images that allow yet more boot sources, such as netboot via tftpboot or USB-attached storage.
The USB OTG socket seems to be of the old Mini-B variety and not contemporary USB-C, though the UART console appears to be USB-C. (Remember that USB-C is the connector and it IS legal to pair it with USB 2.0 signaling, as they’ve done here.) That interface is provided via a common CH340 USB/Serial adapter on the board.
An AP6212 can be seen on the sheets. That seems a bit of a dated choice, even for a 2.4GHz-only product. 802.11 b/g/n tops out at 70Mbps and it’s Bluetooth 4.0. That’s fast enough for moderate network use and a pair of headphones, but seems like another choice that targets compute lab or rack-style environments – indeed, in environments like those the radios would be largely unused in favor of the provided copper ethernet jack.
There are plenty of video choices. You can drive a 1080p TFT display or HDMI, but not both at the same time. It’s a standard HDMI socket. MIPI video input is provided and a 30-pin FPC connector provides LCD panel video output. The encoder claims to do H.264 Baseline Main/High Profile with 8Kx8K JPEG and a maximum support of 1080p/60fps. It is not a 3D accelerator.
This board looks like an interesting complement to the VisionFive by StarFive and the Allwinner Nezha. Perhaps it can follow the precedent for Nezha and lead with a deluxe developer kit and later offer smaller docking boards, perhaps even using the same CRB, that are lower cost but offer little more than a power and ethernet cable or other combinations as demanded.
Will you be ordering one? What are your plans with it?
Personally, my VisionFive just arrived, so it’s already in my review queue. Exciting times for RISC-V!