

Machine Learning for resource-constrained embedded systems

Contact Prof. Dr. rer. nat. Volker Turau
Staff Dr. Marcus Venzke
Start 1 January 2019
End 31 December 2022
Financing Hamburg University of Technology

Project Description

After decades of intensive research, Machine Learning (ML) is increasingly finding its way into real-world applications. This development has been driven by the progress of computing systems with enormous processing power and by the availability of large amounts of data. ML has great potential to enhance the performance of devices and machines in a wide variety of application areas. In the next five to ten years, ML will also find its way into embedded systems. Today, such applications can already be created for machines with powerful processing units (e.g., industrial PCs). Smaller devices, however, usually only have inexpensive, low-power microcontrollers. Some microcontrollers already offer a 32-bit architecture, floating-point arithmetic, and instructions for vector processing, but 8-bit architectures without floating-point arithmetic are still widely used, with typical RAM sizes between 2 kB and 512 kB. In addition, many devices are battery-powered. Using ML on small microcontrollers therefore poses the major challenge of matching its power, computation, and memory requirements to the reality of such systems.

The project therefore investigates how machine learning - mainly in the form of artificial neural networks (ANNs) - can be used efficiently on low-power microcontrollers. In this context, ML is applied to typical tasks of embedded systems, such as the evaluation of sensor signals or the optimization of communication in sensor networks. We also examine which problem sizes can be handled on resource-constrained microcontrollers of a given performance.

The resource requirements are reduced by combining different approaches. First, the ML problem is reduced to what is strictly necessary for the application, e.g., by preprocessing inputs or removing unnecessary ones. The ANNs used must not be chosen too large (number of layers, neurons, connections between neurons); alternatively, a large ANN is trained first and then compressed. Bit widths of weights and intermediate results can be reduced (e.g., fixed-point, binary or ternary values, hierarchical quantization). Subproblems that need not run on the embedded system (such as training, optimization runs, and preprocessing) are moved to a powerful host system as far as possible.

The first application is an embedded system that recognizes hand gestures (e.g., a hand moving from left to right or from top to bottom) with an ANN. It should be producible in mass production for a price below one Euro and be installed as a component in devices. Light sensors arranged in a 3x3 matrix serve as a kind of compound eye. In the prototype, their signals are processed by an ATmega328P microcontroller (8-bit, 16 MHz, 2 kB RAM, 32 kB flash memory), which executes an ANN to classify detected movements into gestures.
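To illustrate how such an ANN can run without floating-point hardware, here is a hypothetical sketch of one fully connected layer using 8-bit weights and pure integer arithmetic, as it might execute on an ATmega328P; the layer sizes, ReLU activation, and shift-based rescaling are illustrative assumptions, not the project's actual network:

```c
#include <stdint.h>

#define N_IN  9   /* inputs from the 3x3 light-sensor matrix */
#define N_OUT 4   /* e.g., four gesture classes (assumed) */

/* ReLU plus fixed-point rescaling: negative sums clamp to zero,
 * positive sums are shifted down and saturated to int16. */
int16_t relu_rescale(int32_t acc, uint8_t shift) {
    if (acc < 0) return 0;
    acc >>= shift;
    if (acc > INT16_MAX) acc = INT16_MAX;
    return (int16_t)acc;
}

/* One fully connected layer: int16 activations, int8 weights,
 * 32-bit accumulator so the multiply-accumulate cannot overflow. */
void fc_layer(const int16_t in[N_IN],
              const int8_t w[N_OUT][N_IN],
              const int16_t bias[N_OUT],
              uint8_t shift,
              int16_t out[N_OUT]) {
    for (uint8_t o = 0; o < N_OUT; o++) {
        int32_t acc = bias[o];
        for (uint8_t i = 0; i < N_IN; i++) {
            acc += (int32_t)w[o][i] * in[i]; /* integer MAC */
        }
        out[o] = relu_rescale(acc, shift);
    }
}
```

With int8 weights, this layer needs only N_OUT * N_IN = 36 bytes of flash for its weight matrix, which shows why reduced bit widths matter on a device with 32 kB of flash and 2 kB of RAM.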


Publications

Florian Meyer and Volker Turau. QMA: A Resource-efficient, Q-learning-based Multiple Access Scheme for the IIoT. In 2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS), IEEE, October 2021, pp. 864–874. Washington DC, USA / Virtually.
@InProceedings{Telematik_icdcs_2021, author = {Florian Meyer and Volker Turau}, title = {QMA: A Resource-efficient, Q-learning-based Multiple Access Scheme for the IIoT}, booktitle = {2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS)}, pages = {864-874}, publisher = {IEEE}, day = {7-10}, month = oct, year = 2021, location = {Washington DC, USA / Virtually}, }
Abstract: Many MAC protocols for the Industrial Internet of Things, such as IEEE 802.15.4 and its extensions, require contention-based channel access for management traffic, e.g., for slot (de)allocations and broadcasts. In many cases, subtle but hidden patterns characterize this secondary traffic, but present contention-based protocols are unaware of these patterns and therefore cannot exploit them. Especially in dense networks, these protocols often do not provide sufficient throughput and reliability for primary traffic, i.e., they cannot allocate transmission slots in time. In this paper, we propose QMA, a contention-based multiple access scheme based on Q-learning. It dynamically adjusts transmission times to avoid collisions by learning patterns in contention-based traffic. We show that QMA solves the hidden node problem without the overhead for RTS/CTS messages and, for example, increases throughput from 10 packets/s to 50 packets/s in a hidden three-node scenario without sacrificing reliability. Additionally, QMA's scalability is evaluated in a realistic scenario for slot (de)allocation in IEEE 802.15.4 DSME, where it achieves up to twice more slot (de)allocations per second.
Marcus Venzke, Daniel Klisch, Philipp Kubik, Asad Ali, Jesper Dell Missier and Volker Turau. Artificial Neural Networks for Sensor Data Classification on Small Embedded Systems. Technical Report arXiv:2012.08403, e-Print Archive - Computing Research Repository (CoRR), Cornell University, December 2020.
@TechReport{Telematik_Venzke_ANNsES, author = {Marcus Venzke and Daniel Klisch and Philipp Kubik and Asad Ali and Jesper Dell Missier and Volker Turau}, title = {Artificial Neural Networks for Sensor Data Classification on Small Embedded Systems}, number = {arXiv:2012.08403}, institution = {e-Print Archive - Computing Research Repository (CoRR)}, address = {Cornell University}, month = dec, year = 2020, }
Abstract: In this paper we investigate the usage of machine learning for interpreting measured sensor values in sensor modules. In particular we analyze the potential of artificial neural networks (ANNs) on low-cost microcontrollers with a few kilobytes of memory to semantically enrich data captured by sensors. The focus is on classifying temporal data series with a high level of reliability. Design and implementation of ANNs are analyzed considering Feed Forward Neural Networks (FFNNs) and Recurrent Neural Networks (RNNs). We validate the developed ANNs in a case study of optical hand gesture recognition on an 8-bit microcontroller. The best reliability was found for an FFNN with two layers and 1493 parameters requiring an execution time of 36 ms. We propose a workflow to develop ANNs for embedded devices.
Florian Meyer and Volker Turau. Towards Delay-Minimal Scheduling through Reinforcement Learning in IEEE 802.15.4 DSME. In Proceedings of the First GI/ITG KuVS Fachgespräche Machine Learning and Networking, February 2020. München, Germany.
@InProceedings{Telematik_meyer_FGMLVS, author = {Florian Meyer and Volker Turau}, title = {Towards Delay-Minimal Scheduling through Reinforcement Learning in IEEE 802.15.4 DSME}, booktitle = {Proceedings of the First GI/ITG KuVS Fachgespr{\"a}che Machine Learning and Networking}, day = {20-21}, month = feb, year = 2020, location = {M{\"u}nchen, Germany}, }
Abstract: The rise of wireless sensor networks (WSNs) in industrial applications imposes novel demands on existing wireless protocols. The deterministic and synchronous multi-channel extension (DSME) is a recent amendment to the IEEE 802.15.4 standard, which aims for highly reliable, deterministic traffic in these industrial environments. It offers TDMA-based channel access, where slots are allocated in a distributed manner. In this work, we propose a novel scheduling algorithm for DSME which minimizes the delay in time-critical applications by employing reinforcement learning (RL) on deep neural networks (DNNs).