NanoOs

30-Nov-2025 - “Open the Pod Bay Doors, HAL.”

One thing that’s bugged me about my OS is that it required a small amount of C++. I’m great at C, but I’m mediocre at C++. The main reason for me doing this project was to improve my embedded C skills. The fact that I’ve had to use C++ to interface with Arduino®’s libraries is just a nuisance. There wasn’t a lot of it in the code, but just enough so that all of the main libraries had to be .cpp files instead of .c. It was just aggravating.

Now, because of the way the Arduino® libraries work, it’s not possible to completely get away from C++. It was possible, however, to isolate it. The only thing that I really needed Arduino®-specific functionality for was interacting with the hardware. Everything else was OS or application logic that was completely independent of that. Since the part that I wanted to isolate was all hardware specific, the solution here was obvious: Rearchitect the code with a Hardware Abstraction Layer (HAL).

Having a HAL also provides a means to do a few other things I’ve been wanting to do. One of the things I’ve been thinking about is porting NanoOs to a different system - one that isn’t Arduino®-based. Having a HAL would allow that to be possible. I’ve also been wanting to run it in a simulator as a regular Linux® application. That would (a) allow my development to go much faster and (b) allow me to write tests and do analysis on the code with proper development tools. So, with multiple motivating factors before me, I set out to make a proper HAL.

At this point in the life of the project, there was a fair bit to consider, especially if I intended to also be able to run the OS as a regular userspace application. The basics of abstracting simple IO were pretty straightforward, but more involved things like simulating peripherals and running overlay code out of memory were going to require some thought. As an old IBM® manual said, “Some design may be required.”

The first thing I had to consider was how to represent the HAL. The whole point of this exercise is to require as little change to the core logic as possible in order to support a new environment in the future. That meant I really needed to think about how the rest of the code was going to interact with it and what the integration points would be.

I did some research on how different modern operating system kernels do hardware abstraction. The most famous one is, of course, modern NT-based Windows®. On Windows®, the HAL is a DLL, which obviously wouldn’t work here, but the general idea of having a single location for a collection of swappable functions might. I looked into other operating systems as well. Somewhat disappointingly, Linux®, which is a monolithic kernel, uses distributed abstractions instead of a monolithic HAL. Basically, each subsystem layer is responsible for its own form of hardware abstraction. The XNU kernel (of macOS®) has something called “I/O Kit”, which is a formal, object-oriented framework that manages most hardware abstraction, but not all. And, of course, microkernels do almost zero hardware management in kernel space, at least in general. I didn’t dig into any one microkernel in detail, but I would suspect that at a basic level, they’re either tightly coupled to the hardware or they have some form of small, monolithic abstraction if they intend to be portable.

Anyway, I outright rejected the idea of distributed abstractions since it flew in the face of my goal of isolating the hardware-specific code to a single part of the codebase. Having processes manage some portion of abstraction the way a microkernel does is inevitable given that my SD card interface is implemented as a process. For basic functionality, though, I decided to go with a structure of function pointers that are initialized by something outside of the main OS logic (i.e. something that’s done before the scheduler is started). This provided the extra benefit that I could have the HAL hold data as well. This became important immediately.

Since it was fresh on my mind from my recent work with overlays, the first thing I did was to remove the hardcoded constants for the overlay map and put them into the HAL. In the new HAL structure, I defined two variables: A pointer to a NanoOsOverlayMap structure and a size of the overlay. I created a .cpp file to hold the implementation of the HAL (which has to be C++ because all of the Arduino®-specific code will eventually be in there) that had the values set in a static member variable and a halArduinoNano33IotInit function that returned a pointer to it. This would allow me to have different implementations of the HAL just by changing the function that’s called to initialize the global HAL pointer. I then went about changing all the hardcoded constants to reference the member elements of the global HAL pointer.
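As a rough illustration of that layout (not the actual NanoOs code), a HAL of this shape could be sketched as below. The contents of NanoOsOverlayMap, the member names, and the halExampleBoardInit function are all assumptions made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for the real NanoOsOverlayMap structure. Its actual
// fields are not described here, so this is only a placeholder.
typedef struct NanoOsOverlayMap {
  uint32_t placeholder;
} NanoOsOverlayMap;

// The HAL is a plain C structure that holds data as well as function
// pointers. Only a few illustrative members are shown.
typedef struct Hal {
  NanoOsOverlayMap *overlayMap;     // Where overlays get loaded.
  size_t overlaySize;               // Size of the overlay region in bytes.
  int (*getNumSerialPorts)(void);   // One example function pointer.
} Hal;

static int exampleGetNumSerialPorts(void) {
  return 2;
}

// Each target provides one init function that returns a pointer to its own
// statically-allocated HAL. Supporting a new target means writing a new
// function like this one and calling it instead.
const Hal *halExampleBoardInit(void) {
  static NanoOsOverlayMap overlayRegion;  // Assumption: stands in for a fixed address.
  static Hal hal = {
    .overlayMap = &overlayRegion,
    .overlaySize = sizeof(overlayRegion),
    .getNumSerialPorts = exampleGetNumSerialPorts,
  };
  return &hal;
}

// The global pointer that the rest of the OS dereferences.
const Hal *HAL = NULL;
```

With something like this in place, startup code would do `HAL = halExampleBoardInit();` once, and everything else would read `HAL->overlayMap` and `HAL->overlaySize` instead of hardcoded constants.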

The next thing I did was to take on serial port initialization. That was currently being done directly in the root Arduino® project file (“NanoOs.ino” in this case). To make the code more portable, I needed a way to do it in the HAL. So, I added an initializeSerialPort function pointer to the HAL structure, added an arduinoNano33IotInitializeSerialPort function to the implementation library, and set the function pointer to the new function in the HAL implementation. This was the point at which I formally included Arduino®-specific code into the HAL. The serial ports are referenced externally by zero-based integers which map into a static array of pointers to Arduino® HardwareSerial objects internally. I also added a getNumSerialPorts function to the HAL that just returned the number of elements in the array. I changed the code in the setup function of the NanoOs.ino file to iterate over the available serial ports and call the initializeSerialPort function on each of them.
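A minimal sketch of that arrangement follows. The parameter lists (including the baud rate argument), the return codes, and the boolean array that stands in for the HardwareSerial pointers are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

#define EXAMPLE_NUM_SERIAL_PORTS 2

// Stand-in for the static array of HardwareSerial pointers that the real
// Arduino implementation keeps internally.
static bool examplePortOpen[EXAMPLE_NUM_SERIAL_PORTS];

static int exampleGetNumSerialPorts(void) {
  return EXAMPLE_NUM_SERIAL_PORTS;
}

// Ports are addressed by zero-based index. Returns 0 on success and -1 on a
// bad argument. (Hypothetical signature.)
static int exampleInitializeSerialPort(int port, long baud) {
  if ((port < 0) || (port >= EXAMPLE_NUM_SERIAL_PORTS) || (baud <= 0)) {
    return -1;
  }
  examplePortOpen[port] = true;  // Real code would start the UART here.
  return 0;
}

// Only the HAL members relevant to this step.
typedef struct Hal {
  int (*getNumSerialPorts)(void);
  int (*initializeSerialPort)(int port, long baud);
} Hal;

static const Hal exampleHal = {
  .getNumSerialPorts = exampleGetNumSerialPorts,
  .initializeSerialPort = exampleInitializeSerialPort,
};
const Hal *HAL = &exampleHal;

// Roughly what setup() does: walk every port the HAL reports and
// initialize it. Returns the number of ports that came up.
int exampleInitializeAllSerialPorts(void) {
  int numInitialized = 0;
  for (int port = 0; port < HAL->getNumSerialPorts(); port++) {
    if (HAL->initializeSerialPort(port, 115200) == 0) {
      numInitialized++;
    }
  }
  return numInitialized;
}
```

The caller never sees a HardwareSerial object, only the integer index, which is what lets the same loop run unchanged on a different HAL.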

After that, I took on reading and writing data from/to a serial port. For the rest of the OS, interactions with the serial ports were abstracted through the Console library, but the Console library itself still directly utilized Arduino®-specific functions (and was therefore a C++ file). I added a pollSerialPort function to the HAL that would check for a single character and a writeSerialPort function that would write a series of bytes. I then changed all the existing Arduino®-specific code in the Console library to the equivalent logic that used the HAL. This broke the Console library’s dependence on Arduino® code and consequently allowed it to be converted to a pure C library!
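To make the shape of those two calls concrete, here is a small sketch in which Console-style code touches the port only through HAL function pointers. The signatures, the -1 "nothing pending" convention, and the in-memory buffers that replace a real UART are assumptions, not the actual NanoOs definitions.

```c
#include <stddef.h>
#include <string.h>

// Only the two HAL members discussed here; the signatures are assumptions.
typedef struct Hal {
  // Returns the next pending byte (0-255), or -1 if nothing is waiting.
  int (*pollSerialPort)(int port);
  // Returns the number of bytes written.
  int (*writeSerialPort)(int port, const void *data, size_t length);
} Hal;

// In-memory stand-ins for a real UART so the sketch runs anywhere.
static const char *exampleInput = "ls\n";
static size_t exampleInputPos = 0;
static char exampleOutput[64];
static size_t exampleOutputLength = 0;

static int examplePollSerialPort(int port) {
  (void) port;
  if (exampleInput[exampleInputPos] == '\0') {
    return -1;
  }
  return (unsigned char) exampleInput[exampleInputPos++];
}

static int exampleWriteSerialPort(int port, const void *data, size_t length) {
  (void) port;
  size_t room = sizeof(exampleOutput) - exampleOutputLength;
  if (length > room) {
    length = room;
  }
  memcpy(&exampleOutput[exampleOutputLength], data, length);
  exampleOutputLength += length;
  return (int) length;
}

static const Hal exampleHal = {
  .pollSerialPort = examplePollSerialPort,
  .writeSerialPort = exampleWriteSerialPort,
};
const Hal *HAL = &exampleHal;

// Console-side logic with no hardware knowledge: echo every pending byte
// back to the port and return how many bytes were handled.
int exampleConsoleEchoPending(int port) {
  int count = 0;
  int c;
  while ((c = HAL->pollSerialPort(port)) >= 0) {
    char byte = (char) c;
    HAL->writeSerialPort(port, &byte, 1);
    count++;
  }
  return count;
}
```

Because nothing in exampleConsoleEchoPending refers to an Arduino type, a file containing only code like it can be compiled as plain C.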

The next round of abstraction changes was considerably more involved. I needed to break the SD card library’s dependence on Arduino® code. The existing SD card implementation relied on SPI, so I would obviously have to abstract the Arduino® SPI mechanisms. However, as should come as no surprise, SPI relies on Arduino®’s GPIO mechanisms, so I would have to come up with an abstraction for that as well.

I started at the bottom, GPIO, and worked my way upward. If I were going to do a full HAL implementation, there would be six (6) things I would need to implement in addition to getting the number of available GPIOs: DIO configuration, digital read, digital write, AIO configuration, analog read, and analog write. However, (a) I’m not going for a full implementation and (b) I would have no way to test one. So, I restricted myself to only getting the number of DIOs (and omitting the number of AIOs), configuring a DIO, and doing a digital write since that’s all that my SD card library was using.

Ultimately, the implementation was pretty simple. However, I did have to think about how to organize the calls to configure a DIO and write to one. I made both of them take an integer DIO pin number as their first parameter (which has to begin at 2 because 0 and 1 are used for Serial1 RX and TX, respectively). For arduinoNano33IotConfigureDio, I made the second parameter a boolean that determined whether or not to configure the pin for output. For arduinoNano33IotWriteDio, I made the second parameter a boolean that determined whether or not the write was to set the pin high. Once I had the functions in place, I replaced all the existing calls to the Arduino® pinMode and digitalWrite functions with their HAL equivalents.
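A sketch of that two-function interface might look like the following. The pin count, the return codes, and the arrays that stand in for the real pinMode and digitalWrite calls are assumptions for the example.

```c
#include <stdbool.h>

#define EXAMPLE_NUM_DIOS  14  // Assumed pin count, for illustration only.
#define EXAMPLE_FIRST_DIO 2   // Pins 0 and 1 are taken by Serial1 RX/TX.

// Stand-ins for real pin state so the sketch runs off-target.
static bool exampleDioIsOutput[EXAMPLE_NUM_DIOS];
static bool exampleDioIsHigh[EXAMPLE_NUM_DIOS];

static bool exampleDioIsValid(int dio) {
  return (dio >= EXAMPLE_FIRST_DIO) && (dio < EXAMPLE_NUM_DIOS);
}

// Second parameter: true configures the pin for output, false for input.
// Returns 0 on success, -1 on an invalid pin.
int exampleConfigureDio(int dio, bool output) {
  if (!exampleDioIsValid(dio)) {
    return -1;
  }
  exampleDioIsOutput[dio] = output;  // Hardware version would call pinMode.
  return 0;
}

// Second parameter: true drives the pin high, false drives it low.
// Returns 0 on success, -1 on an invalid or non-output pin.
int exampleWriteDio(int dio, bool high) {
  if (!exampleDioIsValid(dio) || !exampleDioIsOutput[dio]) {
    return -1;
  }
  exampleDioIsHigh[dio] = high;  // Hardware version would call digitalWrite.
  return 0;
}
```

Using plain booleans keeps Arduino constants such as OUTPUT and HIGH out of the callers, so they stay compilable as C.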

Abstracting SPI was not as simple. The first question was how to address a SPI device. It was clear that I was going to have to have state information for a device, but I really didn’t like the idea of exposing that. The whole point of abstracting the hardware was to simplify what’s required to interact with it. Since this is an embedded system with limited resources and a limited number of DIOs, I decided to simply address them by a numerical ID that corresponded to an index into an array of state structures.

There were four (4) operations that I decided I needed to abstract: initializing a SPI device, starting a SPI transfer, ending a SPI transfer, and transferring 8 bits. I thought about abstracting 16-bit or 32-bit transfers but (a) I have nothing in my code today that uses those sizes and so would have no way to test them and (b) I REALLY didn’t want to get into issues of endianness. So, for now, I don’t have transfers for those bit widths.

In the interest of general-purpose abstraction, I decided that my initSpiDevice would take a user-defined SPI device number in addition to the numerical IDs of the four SPI line DIOs (chip select, clock, controller-out-peripheral-in, and controller-in-peripheral-out), despite the fact that all of the SPI pins except for chip select are hardcoded in the Arduino® implementation. This provided some flexibility for the caller to define their own layout of device numbers in addition to future-proofing things a bit. The startSpiTransfer and endSpiTransfer functions both take a previously-configured SPI device ID as the sole parameter and the spiTransfer8 function takes one as its first parameter and the data to transfer as the second parameter. That made for a pretty clean and simple interface.
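The interface described above could be sketched roughly as follows, with the per-device state hidden in a private array indexed by the device ID. The state fields, the return codes, and the loopback behavior of the transfer are assumptions; a real implementation would drive the bus instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_MAX_SPI_DEVICES 2

// Private per-device state. Callers only ever hold the integer index.
typedef struct ExampleSpiDevice {
  bool configured;
  bool transferActive;
  int chipSelect;
  int clock;
  int copi;  // Controller-out-peripheral-in.
  int cipo;  // Controller-in-peripheral-out.
} ExampleSpiDevice;

static ExampleSpiDevice exampleSpiDevices[EXAMPLE_MAX_SPI_DEVICES];

static ExampleSpiDevice *exampleGetSpiDevice(int spi) {
  if ((spi < 0) || (spi >= EXAMPLE_MAX_SPI_DEVICES)) {
    return NULL;
  }
  return &exampleSpiDevices[spi];
}

// Returns 0 on success, -1 if the device number is out of range.
int exampleInitSpiDevice(int spi, int chipSelect, int clock,
  int copi, int cipo
) {
  ExampleSpiDevice *device = exampleGetSpiDevice(spi);
  if (device == NULL) {
    return -1;
  }
  device->chipSelect = chipSelect;
  device->clock = clock;
  device->copi = copi;
  device->cipo = cipo;
  device->transferActive = false;
  device->configured = true;  // Hardware version would also set up chip select.
  return 0;
}

int exampleStartSpiTransfer(int spi) {
  ExampleSpiDevice *device = exampleGetSpiDevice(spi);
  if ((device == NULL) || !device->configured || device->transferActive) {
    return -1;
  }
  device->transferActive = true;  // Hardware version would assert chip select.
  return 0;
}

int exampleEndSpiTransfer(int spi) {
  ExampleSpiDevice *device = exampleGetSpiDevice(spi);
  if ((device == NULL) || !device->transferActive) {
    return -1;
  }
  device->transferActive = false;  // Hardware version would release chip select.
  return 0;
}

// Returns the byte received (0-255), or -1 if no transfer is active.
// This stand-in just loops the outgoing byte back.
int exampleSpiTransfer8(int spi, uint8_t data) {
  ExampleSpiDevice *device = exampleGetSpiDevice(spi);
  if ((device == NULL) || !device->transferActive) {
    return -1;
  }
  return data;
}
```

A caller would bracket its byte transfers with exampleStartSpiTransfer and exampleEndSpiTransfer and never touch the state structure itself.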

With those abstractions in place, I was able to remove all of the Arduino®-specific code from my SD card library and convert it to a pure C library! In the end, I actually didn’t use any of the DIO functionality directly. In the implementation of my initSpiDevice function, I called the configureDio function for the chip select pin, thereby eliminating the need for the caller to meddle with DIOs directly. While I was at it, I pulled all the SPI-specific stuff into its own library and just left the generic process-level interface in the base library.

That completed the last of the kernel-level code that needed to be converted. At that point, only a few user-level libraries remained. The first one was the user library that abstracts basic IO functions. I had a number of simple functions that wrote to the main serial port. I changed them over to use the new HAL->writeSerialPort function. I also had some minor tweaks to make to allow the library to be fully converted from C++ to C.

The second userspace library - and the last library that needed to be converted from C++ to C - was the library that housed the bulk of the standard C implementation for user space. The functionality that needed to change here was related to time. I had directly used Arduino®’s millis function to keep track of the system time and elapsed time. Functionality related to the system clock is obviously something that belongs in a HAL.

I wanted to do a little bit better than I had done up to that point in terms of time plus do a little future-proofing in preparation for porting to a different architecture. I decided to implement four (4) functions: setSystemTime, getElapsedMilliseconds, getElapsedMicroseconds, and getElapsedNanoseconds. The getElapsed* functions return the number of units that have elapsed since a provided start time where a start time of 0 represents midnight, Jan 1, 1970 and any other time is a value that’s been returned from a previous call to the same function. setSystemTime is pre-enablement work so that I can, at some point, get the current time of day and then start reporting values that are closer to the true current time. With those functions in place, I was able to convert the last of the OS code to being dependent only on the HAL, and also remove the last of the need for C++ outside of the HAL.
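The start-time convention is easier to see in code. The sketch below shows two of the getElapsed* functions on top of the POSIX clock_gettime call, which is an assumption about the backing clock (on the Arduino the equivalent logic would sit on top of millis). The 64-bit signed return type and the -1 error value are also assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

// Current wall-clock time in nanoseconds since midnight, Jan 1, 1970, or
// -1 if the clock cannot be read.
static int64_t exampleNowNanoseconds(void) {
  struct timespec now;
  if (clock_gettime(CLOCK_REALTIME, &now) != 0) {
    return -1;
  }
  return ((int64_t) now.tv_sec * 1000000000LL) + (int64_t) now.tv_nsec;
}

// A startTime of 0 means "since the epoch", so the return value is the
// current time. Any other startTime is a value previously returned by this
// same function, so the return value is the time elapsed since that call.
int64_t exampleGetElapsedNanoseconds(int64_t startTime) {
  int64_t now = exampleNowNanoseconds();
  return (now < 0) ? -1 : (now - startTime);
}

int64_t exampleGetElapsedMilliseconds(int64_t startTime) {
  int64_t now = exampleNowNanoseconds();
  return (now < 0) ? -1 : ((now / 1000000LL) - startTime);
}
```

Typical use would be `int64_t start = exampleGetElapsedMilliseconds(0);` before some work and `exampleGetElapsedMilliseconds(start)` afterward to get the duration.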

Then came the acid test. Now that I had successfully isolated hardware-specific code to the HAL, it should be possible to make a simulation environment that would allow my OS to run as a regular Linux® application. It was time to put that idea to the test!

I’m going to skip over the details of the nightmare that followed. I started my changes in support of a simulator on November 19th. I didn’t get a mostly-working environment until November 22nd. During those three days, I radically restructured my headers. To summarize, my headers weren’t independent enough to work correctly in an application context. I switched from headers including other headers to most headers being explicitly included in the .c files. There are only a very few exceptions to this now.

In addition to the application driver, there were two libraries I had to write: One for the HAL simulator and one for the SD card simulator. This was pretty straightforward. I redirected all of the serial output to stderr and configured stdin to be non-blocking so that I could poll it for a byte at a time from the Console library in the same way that I do on hardware. I changed the time functions to use standard POSIX time calls. I just put in dummy functions for all of the DIO and SPI calls. For the SD card simulator, I just opened a Unix® device node and did direct block read and write calls. That way, the filesystem process on top stayed 100% identical.
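For the stdin part, a sketch of byte-at-a-time polling on a non-blocking file descriptor could look like this. The function names and return conventions are hypothetical; fcntl, read, and write are the standard POSIX calls.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

// Put a file descriptor into non-blocking mode.
// Returns 0 on success, -1 on error.
int exampleSetNonBlocking(int fd) {
  int flags = fcntl(fd, F_GETFL, 0);
  if (flags < 0) {
    return -1;
  }
  return (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) ? -1 : 0;
}

// Poll for a single byte. Returns the byte (0-255) if one is available and
// -1 otherwise (nothing pending, end of file, or error).
int examplePollFd(int fd) {
  unsigned char byte;
  ssize_t numRead = read(fd, &byte, 1);
  return (numRead == 1) ? (int) byte : -1;
}

// Serial output in the simulator goes to stderr.
int exampleWriteStderr(const void *data, size_t length) {
  return (int) write(STDERR_FILENO, data, length);
}
```

A simulator HAL would call `exampleSetNonBlocking(STDIN_FILENO)` once at startup and then have its pollSerialPort implementation call `examplePollFd(STDIN_FILENO)`. Note that non-blocking mode by itself does not turn off the terminal's own line buffering; that is a separate termios setting.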

The bigger challenge in simulation, however, was getting the overlay logic to work correctly. This was the first time I had ever read compiled code into memory and attempted to execute it directly from within an application. My initial thought was that I could do something similar to the way I compiled the userspace programs and tell the linker to put an 8K buffer at the correct address. Sadly, that was not the case. For security reasons, instructions given to the linker about the addresses to place things at are only “suggestions” to the OS at runtime. The OS randomizes load addresses (address space layout randomization) to avoid putting data at known locations and creating a security hole. After fumbling around with this for a bit, I discovered that the solution was to do an mmap call that specifies the appropriate address and marks the memory as readable, writeable, and executable.
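A sketch of such an mmap call is below. The function name is hypothetical, and the choice to pass the address as a hint and then verify it (rather than forcing it with MAP_FIXED) is an assumption about how one might do this safely, not a description of the actual simulator code.

```c
#define _DEFAULT_SOURCE 1
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

// Ask the kernel for a readable, writeable, and executable anonymous region
// at a specific address. Without MAP_FIXED the address is only a hint, so
// the result is checked and the mapping is undone if it landed elsewhere.
// (MAP_FIXED would force the address but would silently replace anything
// already mapped there.) Returns NULL on failure.
void *exampleMapOverlayRegion(uintptr_t address, size_t size) {
  void *region = mmap((void *) address, size,
    PROT_READ | PROT_WRITE | PROT_EXEC,
    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (region == MAP_FAILED) {
    return NULL;
  }
  if ((uintptr_t) region != address) {
    munmap(region, size);
    return NULL;
  }
  return region;
}
```

The requested address has to be page-aligned, and some hardened systems refuse mappings that are both writeable and executable, in which case this returns NULL.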

I was initially using the same SD card as the Arduino®, just loaded with x86_64 binaries instead of ARM® binaries. That got annoying pretty fast. I figured a much better solution would be to just have a small RAM drive with the binaries loaded into it. So, I made a loop device with a 100 MB file under /tmp (which is on a tmpfs), formatted it as exFAT, and copied the userspace files there. That worked beautifully and greatly sped up my development process. That also left the SD card to be used exclusively by the hardware, so I could actually test things both places pretty quickly.
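Since the SD card simulator just does direct block reads and writes against a device node, the same code works whether that node is a real card or a loop device like this one. A sketch of what those block calls might look like follows; the 512-byte block size, the function names, and the return codes are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define EXAMPLE_BLOCK_SIZE 512  // Assumed block size.

// Read one whole block from an already-open file descriptor.
// Returns 0 on success, -1 on a short read or error.
int exampleReadBlock(int fd, uint32_t blockNumber, uint8_t *buffer) {
  off_t offset = (off_t) blockNumber * EXAMPLE_BLOCK_SIZE;
  ssize_t numRead = pread(fd, buffer, EXAMPLE_BLOCK_SIZE, offset);
  return (numRead == EXAMPLE_BLOCK_SIZE) ? 0 : -1;
}

// Write one whole block to an already-open file descriptor.
// Returns 0 on success, -1 on a short write or error.
int exampleWriteBlock(int fd, uint32_t blockNumber, const uint8_t *buffer) {
  off_t offset = (off_t) blockNumber * EXAMPLE_BLOCK_SIZE;
  ssize_t numWritten = pwrite(fd, buffer, EXAMPLE_BLOCK_SIZE, offset);
  return (numWritten == EXAMPLE_BLOCK_SIZE) ? 0 : -1;
}
```

The file descriptor would come from an ordinary open call on the device node, so pointing the simulator at a different backing store is only a matter of changing the path.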

One change I made during development of the simulator was to dynamically determine the number of consoles the Console process manages. I had previously hardcoded it to always have two consoles for two reasons: (1) the Arduino® supported two serial ports (one over USB and one over hardware UART) and (2) my goal is to simulate an early version of Unix® with two consoles. In simulation, though, two consoles didn’t make a lot of sense. Sure, I could just send the output of both consoles to stdout or stderr, but why bother? Also, that clutters up the screen. So, I changed it so that the Console queries the HAL for the number of serial ports that are available and spawns UP TO two instead of always assuming two. I then changed the scheduler to only spawn as many shells as there are console ports. That made things much cleaner.

Another change I made was to properly handle the backspace key in the console. Up to this point, I’d been mostly relying on the Arduino® IDE, which lets you type in a line of text at a time before it’s sent over the USB serial port. Yeah, I had the one over hardware UART too, but it was mostly just a proof of concept - I didn’t do any real debugging in that environment. Since the simulator only had one console and that console was analogous to the hardware UART, I really needed to be able to correct my typos on the command line. So, hooray! I have a working backspace key now, too!!
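One common way to handle backspace in a line editor is sketched below. The buffer size, the function name, and the choice to echo backspace-space-backspace to erase the character on screen are all assumptions, not a description of the actual Console code.

```c
#include <stddef.h>

typedef struct ExampleLineBuffer {
  char data[64];
  size_t length;
} ExampleLineBuffer;

// Feed one input character into the line buffer. Returns 1 when a complete
// line is ready in line->data and 0 otherwise. The echo parameter must have
// room for 4 bytes; it receives the NUL-terminated text to send back to the
// terminal for this keystroke.
int exampleConsoleInput(ExampleLineBuffer *line, char c, char *echo) {
  echo[0] = '\0';

  if ((c == '\b') || (c == 0x7f)) {  // Backspace or delete.
    if (line->length > 0) {
      line->length--;
      // Step back, overwrite the character with a space, step back again.
      echo[0] = '\b';
      echo[1] = ' ';
      echo[2] = '\b';
      echo[3] = '\0';
    }
    return 0;
  }

  if ((c == '\n') || (c == '\r')) {  // End of line.
    line->data[line->length] = '\0';
    return 1;
  }

  if (line->length < (sizeof(line->data) - 1)) {
    line->data[line->length++] = c;
    echo[0] = c;
    echo[1] = '\0';
  }
  return 0;
}
```

Typing `lx`, then backspace, then `s` and Enter leaves "ls" in the buffer, and a backspace on an empty line is simply ignored.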

I’m sure I’ll make more changes to the HAL over time. I’m not entirely happy with the way that the SD card process is currently abstracted. I thought about putting the code directly into the HAL but decided against it. Having the SD card run as an independent process and minimizing the HAL’s footprint keeps things more isolated and keeps the architecture closer to a microkernel design. I can’t really claim that NanoOs is a microkernel, but I want to avoid the monolithic design as much as possible. The older I get, the more I dislike monolithic kernels. There’s just too much opportunity for unintended interactions that way. Anyway, I don’t know what the answer is to further abstracting the SD card process but I’m sure I’ll change it when I come up with something I think works better.

I’ve spent the past week doing some code cleanup and bug fixes. There’s still more to do. I noticed today there’s a problem when running commands with pipes that I’ll have to look into at some point. I’d like to get to the point of having some unit tests that can run as an application as well and start really getting the quality of this thing up. There’s always something else on the radar.

To be continued…
