DUMONT CYBERNETICS - CLI Basics

With a proven UART interface and not a lot else, I started working on a basic command line interface. I've written systems like this a few times before for commercial products, but those implementations are all locked away in private code bases. The CLI is pretty simple: it takes bytes from stdin and echoes or modifies them, then when you press enter it goes back through the last line of text it received and looks for a command.

Getting STDIN/STDOUT working

I'm using the GCC ARM none toolchain, meaning it's based on the GCC compiler for the ARM architecture and it has bindings for operating system "none", i.e. it doesn't have any. The way this is handled in newlib (the C library that provides printf, strlen, malloc etc.) is that things that would talk to the operating system, like open() or gettimeofday(), are just stubs that you have to fill in yourself. If you use them without doing so, the linker stage will complain that they don't exist.

To get the UART to do stdin and stdout, so we can use printf, scanf and other stdio functions, we need to provide real implementations of the _read() and _write() stub functions. For now I've set these functions up to only allow reading from stdin and writing to stdout/stderr, and to always forward the bytes to the console UART. Later on we'll make these more complex so you can read and write files that are on one of the SD cards, as well as directing console output to the UART or to the local display.

I've used the STM32 Low-Level library here instead of the higher level Hardware Abstraction Layer (HAL). I want my UART to present each individual byte to the software as quickly as possible, and to use FreeRTOS queues for buffering, since these are thread safe and will correctly block/unblock tasks waiting on input from stdin. I used queues rather than the newer stream buffers in FreeRTOS because I find it useful to be able to printf from multiple tasks without having to buffer somewhere else and then write to the UART. Stream buffers can only work with one task writing to them, so they don't support this, whereas queues carry a little more overhead but can be written to from any task.

int _read(int fd, void *buf, size_t count)
{
    int nread = -1;
    if (fd == STDIN_FILENO)
    {
        char *char_buf = buf;
        for (size_t n = 0; n < count; n++)
        {
            xQueueReceive(uart_stdio_rx_q, &char_buf[n], portMAX_DELAY);
        }
        nread = count;
    }
    else
    {
        errno = EINVAL;
    }
    return nread;
}

The read function checks to see if you're trying to read from stdin, the only readable file at the moment, and returns an error if not. Then it blocks until it has read the required number of bytes from the FreeRTOS queue. This is really simple: it doesn't support non-blocking reads or other features for now, but it does mean we can write an efficient console application which reads a single byte from stdin and blocks until a UART receive interrupt is fired.

int _write(int fd, const void *buf, size_t count)
{
    int nwritten = -1;
    if ((fd == STDOUT_FILENO) || (fd == STDERR_FILENO))
    {
        const char *char_buf = (const char *)buf;
        char last_char = '\0';
        for (size_t n = 0; n < count; n++)
        {
            if ((char_buf[n] == '\n') && (last_char != '\r'))
            {
                xQueueSendToBack(uart_stdio_tx_q, "\r", portMAX_DELAY);
            }
            xQueueSendToBack(uart_stdio_tx_q, &char_buf[n], portMAX_DELAY);
            last_char = char_buf[n];
            LL_USART_EnableIT_TXE(uart_stdio_uart);
        }
        nwritten = count;
    }
    else
    {
        errno = EINVAL;
    }
    return nwritten;
}

Writing is a tiny bit more complex because it has to enable the transmit data register empty (TXE) interrupt and does newline expansion. First it checks again that you're trying to write to stdout or stderr, and if not returns an error. After that it starts processing the bytes you sent: if it finds a newline character \n that wasn't preceded by a carriage return \r, it inserts the carriage return. This is standard behaviour on Linux machines and you'll find it on other embedded platforms like the Raspberry Pi Pico as well. If you don't send the \r, terminals like PuTTY will move down a line because of the \n but won't go back to the left of the screen. Once a character has been added to the queue, the transmit buffer empty interrupt is enabled to make sure we keep checking for more data to send.
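The newline expansion is easy to try out on a host machine, away from the queues and the UART. Here's a minimal sketch of just that step. The helper name expand_newlines and its output buffer are made up for illustration; the real _write() pushes bytes straight onto the TX queue instead.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the project code): copies `count` bytes
 * from `in` to `out`, inserting '\r' before any '\n' that is not already
 * preceded by '\r'. Returns the number of bytes written to `out`.
 */
size_t expand_newlines(const char *in, size_t count, char *out, size_t out_cap)
{
    /* Worst case: every input byte is a bare '\n' and doubles in size. */
    assert(out_cap >= 2 * count);

    size_t written = 0;
    char last_char = '\0';
    for (size_t n = 0; n < count; n++)
    {
        if ((in[n] == '\n') && (last_char != '\r'))
        {
            out[written++] = '\r';
        }
        out[written++] = in[n];
        last_char = in[n];
    }
    return written;
}
```

Feeding it the six bytes "a\nb\r\nc" gives the seven bytes "a\r\nb\r\nc": the bare newline gains a carriage return and the existing pair is left alone. As in _write(), last_char starts fresh on every call, so a \r at the end of one write followed by a \n at the start of the next would get a second, harmless \r.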

void uart_stdio_init(USART_TypeDef *uart)
{
    uart_stdio_rx_q = xQueueCreate(UART_STDIO_RX_BUF_SIZE, 1);
    uart_stdio_tx_q = xQueueCreate(UART_STDIO_TX_BUF_SIZE, 1);

    LL_USART_EnableIT_RXNE(uart);
    uart_stdio_uart = uart;
}

There's a tiny bit of setup to do, which is called at the beginning of the default task. This just creates the read and write queues and enables the receive interrupt on the UART.

void uart_stdio_irq_handler(USART_TypeDef *uart)
{
    BaseType_t woken = pdFALSE;
    if (LL_USART_IsActiveFlag_RXNE(uart))
    {
        char val = uart->RDR;
        xQueueSendToBackFromISR(uart_stdio_rx_q, &val, &woken);
    }
    if (LL_USART_IsActiveFlag_TXE(uart))
    {
        char val;
        if (xQueueReceiveFromISR(uart_stdio_tx_q, &val, &woken) == pdTRUE)
        {
            uart->TDR = val;
        }
        else
        {
            LL_USART_DisableIT_TXE(uart);
        }
    }
    portEND_SWITCHING_ISR(woken);
}

Finally there's the UART interrupt handler. This is called from the CubeMX generated stm32h7xx_it.c file (in CM7/Core/Src) which has the UART3 ISR handler defined. It's a really slim function to make the handling of UART interrupts as fast and efficient as possible. If there are bytes waiting in the receive buffer, it reads them and puts them in the RX queue. If there is empty space in the transmit buffer, it checks to see if there's anything waiting in the TX queue: if there is, it stuffs it in; if not, it masks the transmit interrupt. The interrupt gets unmasked again the next time we have something to add to the TX queue.
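The mask/unmask handshake between _write() and the interrupt handler can be modelled on a host without any hardware. The following is only a sketch of the pattern: struct fake_uart, the ring buffer and both function names are hypothetical stand-ins for the real UART registers and the FreeRTOS TX queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TX_RING_SIZE 8   /* hypothetical size, stands in for the TX queue depth */

struct fake_uart
{
    bool txe_irq_enabled;       /* models the TXE interrupt enable bit */
    char sent[32];              /* bytes "transmitted" so far */
    size_t sent_len;
    char ring[TX_RING_SIZE];    /* models uart_stdio_tx_q */
    size_t head, tail, used;
};

/* Task side: queue one byte, then unmask the TXE interrupt. */
bool fake_uart_write(struct fake_uart *u, char c)
{
    if (u->used == TX_RING_SIZE)
    {
        return false;           /* the real code blocks on the queue instead */
    }
    u->ring[u->head] = c;
    u->head = (u->head + 1) % TX_RING_SIZE;
    u->used++;
    u->txe_irq_enabled = true;
    return true;
}

/* ISR side: runs when the transmit register is empty and TXE is unmasked. */
void fake_uart_txe_isr(struct fake_uart *u)
{
    assert(u->txe_irq_enabled);
    if (u->used > 0)
    {
        assert(u->sent_len < sizeof u->sent);
        u->sent[u->sent_len++] = u->ring[u->tail];
        u->tail = (u->tail + 1) % TX_RING_SIZE;
        u->used--;
    }
    else
    {
        u->txe_irq_enabled = false;   /* nothing to send: mask until next write */
    }
}
```

Writing two bytes and then running the ISR three times sends both bytes on the first two calls and masks the interrupt on the third, which is the point where the real handler calls LL_USART_DisableIT_TXE().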

With that all working, I swapped the loop to use printf instead of calling the HAL_UART routine and still got "Hello World" out once a second, but now I could use stdin as well.

Writing a command line dispatcher

I've written a pretty simple command line dispatcher. It reads a character from STDIN and decides what to do with it; mostly it just puts it in a buffer and waits for the next one. The only special cases so far are carriage return and newline, which are handled the same way and trigger parsing of the line buffer (unless they come together, in which case the newline is ignored because the carriage return came first). Backspace is also handled: it removes the last character from the line buffer and sends "backspace - space - backspace" to the terminal, which makes it go back a space, overwrite what was there with a space, then go back again so you can see the last character has been erased. In future it would be good to add cursor key handling (for editing the command and scrolling history) and tab completion.
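The character handling described above boils down to a small state machine. This is not the DRASH source, just a minimal host-side sketch of the same rules. The struct, the function name and the buffer size are invented for the example, and treating DEL (0x7f) as backspace is my assumption, since many terminals send it for the backspace key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_LINE_LEN 64   /* hypothetical line buffer size */

struct line_editor
{
    char buf[SKETCH_LINE_LEN];
    size_t len;
    char last;               /* previous input byte, used to spot CR LF pairs */
};

/*
 * Feed one input byte. `echo` receives a NUL-terminated string to send back
 * to the terminal (at most 3 characters). Returns true when `buf` holds a
 * complete, NUL-terminated line; it stays valid until the next byte is fed.
 */
bool line_editor_feed(struct line_editor *ed, char c, char echo[4])
{
    assert(ed != NULL && echo != NULL);
    bool line_ready = false;
    echo[0] = '\0';

    if (c == '\r' || c == '\n')
    {
        /* A '\n' straight after '\r' is the tail of a CR LF pair: ignore it. */
        if (!(c == '\n' && ed->last == '\r'))
        {
            ed->buf[ed->len] = '\0';
            ed->len = 0;
            echo[0] = '\n'; echo[1] = '\0';
            line_ready = true;
        }
    }
    else if (c == '\b' || c == 0x7f)
    {
        if (ed->len > 0)
        {
            ed->len--;
            /* back one column, overwrite with a space, back again */
            echo[0] = '\b'; echo[1] = ' '; echo[2] = '\b'; echo[3] = '\0';
        }
    }
    else if (ed->len < SKETCH_LINE_LEN - 1)
    {
        ed->buf[ed->len++] = c;
        echo[0] = c; echo[1] = '\0';
    }
    ed->last = c;
    return line_ready;
}
```

Feeding it 'l', 'x', backspace, 's', '\r' leaves "ls" in the buffer and returns true on the carriage return; a '\n' arriving straight afterwards is swallowed.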

Once a line buffer is sent to be executed, I use the standard library strtok_r to break it up into tokens (note using the thread safe version!). If the string is empty strtok_r can get confused, so there's a blank line check first. The tokens are put into a list and a count is kept, so we can pass the count as argc and the token list as argv to the command. Once the token list is prepared, the code searches through two lists of commands, each a compile-time generated table pairing a command string with the function that carries out the work. The first list is the built-in commands which are part of the shell implementation. At the moment there's just a dummy command for helping to debug tokenisation, but in time that should include the usual suspects of cd, ls, echo etc. The second list is linked in from a file outside the shell code itself and holds the commands that are implemented by the application. There's room in the search function to handle further sources of commands: I'm thinking of searching a PATH for binaries that could be loaded from SD card, or even files that could be read and fed into the shell as scripts. At the moment the focus is on simplicity.
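To make the tokenise-then-search flow concrete, here's a rough sketch that runs on any POSIX host. It is not taken from DRASH: the struct layout, the function names, the NULL-terminated tables and the two placeholder commands are all assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L   /* for strtok_r */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_ARGS 8         /* hypothetical argv limit */

typedef int (*cmd_fn)(int argc, char **argv);

struct cmd_entry
{
    const char *name;
    cmd_fn fn;
};

/* Split `line` in place on spaces and tabs; returns the number of tokens. */
int tokenise(char *line, char *argv[], int max_args)
{
    assert(line != NULL);
    int argc = 0;
    char *save = NULL;
    for (char *tok = strtok_r(line, " \t", &save);
         tok != NULL && argc < max_args;
         tok = strtok_r(NULL, " \t", &save))
    {
        argv[argc++] = tok;
    }
    return argc;
}

/* Search each table in order; a NULL name marks the end of a table. */
cmd_fn find_command(const char *name,
                    const struct cmd_entry *const tables[], size_t n_tables)
{
    for (size_t t = 0; t < n_tables; t++)
    {
        for (const struct cmd_entry *e = tables[t]; e->name != NULL; e++)
        {
            if (strcmp(e->name, name) == 0)
            {
                return e->fn;
            }
        }
    }
    return NULL;
}

/* Placeholder commands and tables, purely for illustration. */
static int cmd_args(int argc, char **argv) { (void)argv; return argc; }
static int cmd_hello(int argc, char **argv) { (void)argc; (void)argv; return 42; }

static const struct cmd_entry builtin_cmds[] = { { "args", cmd_args }, { NULL, NULL } };
static const struct cmd_entry app_cmds[] = { { "hello", cmd_hello }, { NULL, NULL } };

/* Returns the command's result, or -1 for a blank line or unknown command. */
int run_line(char *line)
{
    static const struct cmd_entry *const tables[] = { builtin_cmds, app_cmds };
    char *argv[SKETCH_MAX_ARGS];
    int argc = tokenise(line, argv, SKETCH_MAX_ARGS);
    if (argc == 0)
    {
        return -1;
    }
    cmd_fn fn = find_command(argv[0], tables, sizeof tables / sizeof tables[0]);
    return fn ? fn(argc, argv) : -1;
}
```

Because the search walks an array of tables, adding another source of commands later is a matter of adding another entry to that array. Note that strtok_r modifies the line in place, so it must be given a writable buffer rather than a string literal.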

I've created the shell as a separate project called "DRASH" or "Dragon Shell". You can find the shell library on GitHub. There's an example project inside it which sets up a normal Linux host to pass keystrokes through to the application so it can emulate the character by character processing that's done in the embedded system on the end of a UART.

Putting it all together

I've pulled the shell program into the existing project and tweaked a few settings in the CubeMX file (like switching the UART to LL driver and enabling the RTC). I added a first couple of commands:

check-clocks: Samples the clocks, pauses for 10 seconds, then prints how many ticks each clock counted in that time. It checks the RTOS timer against the RTC and the high-frequency timer which is configured for CPU usage profiling in the RTOS. Since each of these clocks is independent, a misconfigured one should show a tick count that is wrong compared to the other two.
ps: This is a basic Process Status command which just presents the output from a standard utility that's built into FreeRTOS. It gives a list of the current tasks on the processor and their state (blocked, running, etc.). It's handy for checking if a task has got stuck somewhere and isn't blocking on the RTOS as expected. It will also be useful for checking memory usage in the tasks.
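The "odd one out" reasoning behind check-clocks can also be done in code rather than by eye. The real command only prints the tick counts; the sketch below is a hypothetical extension, and the struct, the function names and the frequencies used in the example are all made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct clock_sample
{
    const char *name;
    uint32_t ticks;   /* ticks counted during the pause */
    uint32_t hz;      /* frequency the clock is supposed to run at */
};

/* Elapsed time in milliseconds according to one clock. */
static int64_t elapsed_ms(const struct clock_sample *c)
{
    assert(c->hz != 0);
    return ((int64_t)c->ticks * 1000) / c->hz;
}

/*
 * Returns the index (0..2) of the clock that disagrees with the other two by
 * more than `tol_ms`, or -1 if all three agree or there is no clear outlier.
 */
int find_bad_clock(const struct clock_sample clocks[3], int64_t tol_ms)
{
    int64_t ms[3];
    for (int i = 0; i < 3; i++)
    {
        ms[i] = elapsed_ms(&clocks[i]);
    }
    for (int i = 0; i < 3; i++)
    {
        int a = (i + 1) % 3;
        int b = (i + 2) % 3;
        bool others_agree = llabs(ms[a] - ms[b]) <= tol_ms;
        bool i_differs = llabs(ms[i] - ms[a]) > tol_ms &&
                         llabs(ms[i] - ms[b]) > tol_ms;
        if (others_agree && i_differs)
        {
            return i;
        }
    }
    return -1;
}
```

If the RTC is only read to whole seconds, it can be off by up to a second over the pause, so the tolerance would need to be somewhat more than 1000 ms.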

To get the clock check working I had to enable the RTC and write a little glue layer like I did for stdio. The newlib callback to get the current time of day is _gettimeofday(), which returns a UNIX timestamp in a struct, so I had to write some code to convert the STM32 RTC output into a UNIX timestamp that could be returned. Now I can use time(NULL) in any code where I need wall-clock time, just like I would when working on Linux.
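The conversion from calendar fields to a UNIX timestamp is the fiddly part of that glue. The project's actual code isn't shown here, so the following is one possible way to do it, using a well-known days-from-civil-date calculation. The struct and function names are hypothetical, and it assumes the RTC is kept in UTC and that the caller has already turned the RTC's two-digit year into a full year (for example by adding 2000).

```c
#include <assert.h>
#include <stdint.h>

/* Calendar fields as read from an RTC (hypothetical struct). */
struct rtc_datetime
{
    int year;     /* full year, e.g. 2024 */
    int month;    /* 1..12 */
    int day;      /* 1..31 */
    int hour;     /* 0..23 */
    int minute;   /* 0..59 */
    int second;   /* 0..59 */
};

/* Days since 1970-01-01 for a Gregorian calendar date. */
static int64_t days_from_civil(int y, int m, int d)
{
    y -= (m <= 2);   /* count Jan/Feb as the end of the previous year */
    int64_t era = (y >= 0 ? y : y - 399) / 400;
    int yoe = (int)(y - era * 400);                             /* 0..399 */
    int doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  /* 0..365, from 1 March */
    int doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            /* 0..146096 */
    return era * 146097 + doe - 719468;
}

/* Seconds since 1970-01-01 00:00:00 UTC. */
int64_t rtc_to_unix(const struct rtc_datetime *dt)
{
    assert(dt->month >= 1 && dt->month <= 12);
    return days_from_civil(dt->year, dt->month, dt->day) * 86400
         + dt->hour * 3600 + dt->minute * 60 + dt->second;
}
```

As a sanity check, 2000-01-01 00:00:00 comes out as 946684800 and 1970-01-01 00:00:00 as 0. A _gettimeofday() implementation could put this value in the tv_sec field of the struct it fills in.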

All in all this section seems to have gone pretty smoothly. I've worked with newlib a lot before and was pretty familiar with the stubs I needed to create. They make a good building platform for extensible frameworks: the POSIX API that they facilitate has lasted for decades and scales well from embedded systems up to supercomputers.