print page

ToleranceZone

Fault tolerant middle-ware idioms based on self-stabilizing techniques

Start 1. February 2012
Financing German Research Foundation (DFG)

Project Description

Wireless sensor network applications are inherently difficult to develop due to their tight integration into the physical world, scarce resources and notoriously unreliable radio communication that frequently lead to transient faults. This combined with the long-lasting unattended operation mode is an enormous challenge. This research project pursues the goal of furnishing essential operations of wireless sensor networks with fault tolerance based on self-stabilizing algorithms and thus to lay foundation for a permanent and uninterrupted operation of these networks. The project will be based on two fundamental assumptions. Firstly, fault tolerance is not an add-on to an existing middleware, but a recurrent theme throughout the design of the software infrastructure that glues together all components. Secondly, fault tolerance must be a self-organizing property. The latter assumption demands for a highly decentralized mode of operation. The project will substantiate the claim that the employment of self-stabilizing algorithms will bring about the same degree of fault tolerance, as this can be achieved with state of the art middleware platforms. We will evidence that with the new approach the consumption of resources is considerably reduced and that adaption to new types of errors is an inherent feature. The quantitative analysis of these claims will be carried out through a comparison of existing middleware platforms and a prototypical implementation of our approach. The anticipated results will considerably enhance the field of fault tolerance for wireless sensor networks. At the end of the project new algorithms and new methodologies will be available to bring this type of network closer to real world applications.

Publications

Volker Turau. A Distributed Algorithm for Finding Hamiltonian Cycles in Random Graphs in O(log n) Time. In Lecture Notes in Computer Science - Proceedings of 25th International Colloquium on Structural Information and Communication Complexity (Scirocco 2018), June 2018, pp. 72–87. Ma'ale HaHamisha, Israel.
@InProceedings{Telematik_Scirocco_2018, author = {Volker Turau}, title = {A Distributed Algorithm for Finding Hamiltonian Cycles in Random Graphs in O(log n) Time}, booktitle = {Lecture Notes in Computer Science - Proceedings of 25th International Colloquium on Structural Information and Communication Complexity (Scirocco 2018)}, pages = {72-87}, day = {18-21}, month = jun, year = 2018, location = {Ma'ale HaHamisha, Israel}, }
Abstract: It is known for some time that a random graph G(n, p) con- tains w.h.p. a Hamiltonian cycle if p is larger than the critical value pcrit = (log n+log log n+ωn)/n. The determination of a concrete Hamil- tonian cycle is even for values much larger than pcrit a nontrivial task. In this paper we consider random graphs G(n, p) with p in Ω ̃(1/√n), where Ω ̃ hides poly-logarithmic factors in n. For this range of p we present a distributed algorithm AHC that finds w.h.p. a Hamiltonian cycle in O(log n) rounds. The algorithm works in the synchronous model and uses messages of size O(log n) and O(log n) memory per node.
Volker Turau. Computing the Fault-Containment-Time of Self-Stabilizing Algorithms using Markov Chains and Lumping. In Stabilization, Safety, and Security of Distributed Systems - 19th International Symposium(SSS 2017), November 2017, pp. 62–77. Boston, USA.
@InProceedings{Telematik_SSS_2017, author = {Volker Turau}, title = {Computing the Fault-Containment-Time of Self-Stabilizing Algorithms using Markov Chains and Lumping}, booktitle = {Stabilization, Safety, and Security of Distributed Systems - 19th International Symposium(SSS 2017)}, pages = {62-77}, day = {5-8}, month = nov, year = 2017, location = {Boston, USA}, }
Abstract: The analysis of self-stabilizing algorithms is in the vast majority of all cases limited to the worst case stabilization time starting from an arbitrary configuration. Considering the fact that these algorithms are intended to provide fault tolerance in the long run this is not the most relevant metric. From a practical point of view the worst case time to recover in case of a single fault is much more crucial. This paper presents techniques to derive upper bounds for the mean time to recover from a single fault for self-stabilizing algorithms Markov chains in combination with lumping. To illustrate the applicability of the techniques they are applied to a self-stabilizing coloring algorithm.
Volker Turau and Gerry Siegemund. Scalable Routing for Topic-based Publish/Subscribe Systems under Fluctuations. In Proceedings of International Conference on Distributed Computing Systems - ICDCS 2017, June 2017, pp. 1608–1617. Atlanta, USA.
@InProceedings{Telematik_ICDS_2017, author = {Volker Turau and Gerry Siegemund}, title = {Scalable Routing for Topic-based Publish/Subscribe Systems under Fluctuations}, booktitle = {Proceedings of International Conference on Distributed Computing Systems - ICDCS 2017}, pages = {1608-1617}, day = {5-8}, month = jun, year = 2017, location = {Atlanta, USA}, }
Abstract: The loose coupling and the inherent scalability make publish/subscribe systems an ideal candidate for event-driven services for wireless networks using low power protocols such as IEEE 802.15.4. This work introduces a distributed algorithm to build and maintain a routing structure for such networks.The algorithm dynamically maintains a multicast tree for each node. While previous work focused on minimizing these trees we aim to keep the effort to maintain them in case of fluctuations of subscribers low. The multicast trees are implicitly defined by a novel structure called augmented virtual ring. The main contribution is a distributed algorithm to build and maintain this augmented virtual ring. Maintenance operations after sub-and unsubscriptions require message exchange in a limited region only. We compare the average lengths of the constructed forwarding paths with an almost ideal approach. As a result of independent interest we present a distributed algorithm using messages of size O(logn) for constructing virtual rings of graphs that are on average shorter than rings based on depth first search.
Gerry Siegemund and Volker Turau. PSVR - Self-Stabilizing Publish/Subscribe Communication for Ad-Hoc Networks (Short Paper). In Proceedings of Stabilization, Safety, and Security of Distributed Systems - 18th International Symposium, November 2016, pp. 346–351. Lyon, France.
@InProceedings{Telematik_SSS_2016, author = {Gerry Siegemund and Volker Turau}, title = {PSVR - Self-Stabilizing Publish/Subscribe Communication for Ad-Hoc Networks (Short Paper)}, booktitle = {Proceedings of Stabilization, Safety, and Security of Distributed Systems - 18th International Symposium}, pages = {346-351}, day = {7-10}, month = nov, year = 2016, location = {Lyon, France}, }
Abstract: PSVR is a novel routing algorithm for pub/sub systems in ad-hoc networks focusing on scenarios where communications links are unstable and nodes frequently change subscriptions. It is a compromise of size and maintenance effort for routing tables due to sub- and unsubscriptions and the length of routing paths. Designed in a self-stabilizing manner it scales well with network size. The evaluation with real world deployment reveals that PSVR only needs slightly more messages than a close to optimal routing structure for publication delivery, and creates shorter routing paths than an existing self-stabilizing algorithm.
Stefan Lohs, Jörg Nolte, Gerry Siegemund and Volker Turau. Influence of Topology-Fluctuations on Self-Stabilizing Algorithms. In 2016 International Conference on Distributed Computing in Sensor Systems DCOSS 2016 Poster Abstract, May 2016, pp. 122–124. Washington, DC, USA.
@InProceedings{DCOSS_POSTER_2016, author = {Stefan Lohs and J{\"o}rg Nolte and Gerry Siegemund and Volker Turau}, title = {Influence of Topology-Fluctuations on Self-Stabilizing Algorithms}, booktitle = {2016 International Conference on Distributed Computing in Sensor Systems DCOSS 2016 Poster Abstract}, pages = {122-124}, day = {26-28}, month = may, year = 2016, location = {Washington, DC, USA}, }
Abstract: Self-stabilizing systems have in theory the unique and provable ability, to always return to a valid system state even in the face of failures. These properties are certainly desirable for domains like wireless ad-hoc networks with numerous unpredictable faults. Unfortunately, the time in which the system returns to a valid state is not predictable and potentially unbound. The failure rate typically depends on physical phenomena and in self-stabilizing systems each node tries to react to failures in an inherently adaptive fashion by the cyclic observation of the states of its neighbors. When state changes are either too quick or too slow the system might never reach a state that is sufficiently stable for a specific task. In this paper, we investigate the influences of the error rate on the (stability) convergence time on the basis of topology information acquired in real network experiments. This allows us to asses the asymptotic behavior of relevant self-stabilizing algorithms in typical wireless networks.
Sandra Beyer, Stefan Lohs, Jörg Nolte, Reinhardt Karnapke and Gerry Siegemund. Self-Stabilizing Structures for Data Gathering in Wireless Sensor Networks. In Proceedings of the International Conference on Sensor Technologies and Applications (Sensorcomm), August 2015, pp. 1–8. Venice, Italy.
@InProceedings{Cottbus_Sensorcomm_2015, author = {Sandra Beyer and Stefan Lohs and J{\"o}rg Nolte and Reinhardt Karnapke and Gerry Siegemund}, title = {Self-Stabilizing Structures for Data Gathering in Wireless Sensor Networks}, booktitle = {Proceedings of the International Conference on Sensor Technologies and Applications (Sensorcomm)}, pages = {1-8}, month = aug, year = 2015, location = {Venice, Italy}, }
Abstract: Wireless Sensor Networks (WSN) enable a number of applications, with monitoring of habitats, office buildings, or restricted areas most prominent among them. All of these applications have one thing in common: the need to commu- nicate. However, the nature of the wireless medium results in quite a few problems. Lossy communication links with transient faults require acknowledgments, retransmissions, and route re- pair mechanisms. Tree- or similar structures for data gathering scenarios lead to increased load closer to the sink, with congestion, higher buffer space requirements, and energy drain as results. The second problem is often addressed by aggregation and reduction schemes. These schemes are bound to fail, however, when the underlying structure is compromised due to changes in the connectivity between nodes. Therefore, it is necessary to focus on the structures first of all. We address the problem of transient faults by using the inherent fault tolerance of self- stabilizing algorithms when building and using tree- or tiers (communication-) structures. In this paper we show that self- stabilizing structures are suitable for data gathering scenarios in WSN by comparison of the connectivity achieved by our self- stabilizing tiers algorithm and the tree algorithm from Dolev with that of Collection Tree Protocol (CTP), the standard data- gathering protocol for TinyOS.
Gerry Siegemund, Volker Turau and Christoph Weyer. A Dynamic Topology Control Algorithm for Wireless Sensor Networks. In Proceedings of the International Conference on Ad-hoc, Mobile and Wireless Networks, ADHOC-NOW 2015, June 2015, pp. 3–18. Athens, Greece.
@InProceedings{Telematik_Adhoc-Now_2015, author = {Gerry Siegemund and Volker Turau and Christoph Weyer}, title = {A Dynamic Topology Control Algorithm for Wireless Sensor Networks}, booktitle = {Proceedings of the International Conference on Ad-hoc, Mobile and Wireless Networks, ADHOC-NOW 2015}, pages = {3-18}, month = jun, year = 2015, location = {Athens, Greece}, }
Abstract: Topology control algorithms (TCAs) are used in wireless sensor networks to reduce interference by carefully choosing communication links. Since the quality of the wireless channel is subject to fluctuations over time TCAs must repeatedly recompute the topology. TCAs ensure quick adjustment to new or deteriorating links while preventing precipitant changes due to transient faults. This paper contributes a novel dynamic TCA that provides a compromise between agility and stability, and constructs connected topologies for low latency routing. Furthermore, it enforces memory restrictions and is of high practical relevance for real sensor network hardware.
Gerry Siegemund, Volker Turau and Khaled Maâmra. A Self-stabilizing Publish/Subscribe Middleware for Wireless Sensor Networks. In Proceedings of the International Conference on Networked Systems (NetSys), March 2015, pp. 1–8. Cottbus, Germany.
@InProceedings{Telematik_NetSys_2015, author = {Gerry Siegemund and Volker Turau and Khaled Maâmra}, title = {A Self-stabilizing Publish/Subscribe Middleware for Wireless Sensor Networks}, booktitle = {Proceedings of the International Conference on Networked Systems (NetSys)}, pages = {1-8}, month = mar, year = 2015, location = {Cottbus, Germany}, }
Abstract: This paper presents a scalable, self-stabilizing middleware for channel-based publish/subscribe systems for wireless sensor networks. The middleware eventually provides safety and liveness properties such as the guaranteed delivery of all published messages to all subscribers of the corresponding channel and the correct handling of subscriptions and unsubscriptions, while no error occurs. We consider transient message and memory corruptions and also respect dynamic network changes such as node and link removals and additions. We assume the message passing model and guarantee delivery of publications to new subscribers after O(n) steps.
Brahim Negazzi, Volker Turau, Mohammed Haddad and Hamamache Kheddouci. A Self-Stabilizing Algorithm for Edge Monitoring Problem. In Proceedings of the 16th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS'14), September 2014, pp. 93–105. Paderborn, Germany.
@InProceedings{Telematik_SSS_2014_Edge_Monitoring_Problem, author = {Brahim Negazzi and Volker Turau and Mohammed Haddad and Hamamache Kheddouci}, title = {A Self-Stabilizing Algorithm for Edge Monitoring Problem}, booktitle = {Proceedings of the 16th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS'14)}, pages = {93-105}, day = {28-30}, month = sep, year = 2014, location = {Paderborn, Germany}, }
Abstract: Self-monitoring is a simple and effective mechanism for the security of wireless sensor networks(WSNs), especially to cope against compromised nodes. A node v can monitor an edge e if both end-nodes of e are neighbors of v: i.e., e together with v forms a triangle in the graph. Moreover, some edges need more than one monitor. Finding a set of monitoring nodes satisfying all monitoring constraints is called the edge-monitoring problem. The minimum edge-monitoring problem is long known to be NP-complete. In this paper, we present a novel silent self-stabilizing algorithm for computing a minimal edge-monitoring set. Correctness and termination are proven for the unfair distributed daemon.
Gerry Siegemund, Volker Turau and Khaled Maâmra. Brief Announcement: Publish/Subscribe on Virtual Rings. In Proceedings of the 16th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS'14), September 2014, pp. 343–345. Paderborn, Germany.
@InProceedings{Telematik_SSS_2014_Virtual_Ring, author = {Gerry Siegemund and Volker Turau and Khaled Maâmra}, title = {Brief Announcement: Publish/Subscribe on Virtual Rings}, booktitle = {Proceedings of the 16th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS'14)}, pages = {343-345}, day = {28-30}, month = sep, year = 2014, location = {Paderborn, Germany}, }
Abstract: This paper introduces a scalable, self-stabilizing, channel-based publish/subscribe system for wireless sensor networks. As base structure a virtual ring is maintained. We consider message and memory corruptions and also respect dynamic network changes, such as, node and link removals and additions.
Gerry Siegemund, Volker Turau, Christoph Weyer, Stefan Lobs and Jörg Nolte. Brief Announcement: Agile and Stable Neighborhood Protocol for WSNs. In Proceedings of the 15th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS'13), November 2013, pp. 376–378. Osaka, Japan.
@InProceedings{Telematik_SSS_2013_Neighborhood, author = {Gerry Siegemund and Volker Turau and Christoph Weyer and Stefan Lobs and J{\"o}rg Nolte}, title = {Brief Announcement: Agile and Stable Neighborhood Protocol for WSNs}, booktitle = {Proceedings of the 15th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS'13)}, pages = {376-378}, day = {13-16}, month = nov, year = 2013, location = {Osaka, Japan}, }
Abstract: Self-stabilizing algorithms (SSA) are defined on the assumption that either the system's topology is fixed over time or topology changes are isolated events occurring at a very low rate. These assumptions are not valid in wireless sensor networks (WSNs) where link qualities change rapidly. The contribution of this paper is a neighbourhood management protocol (NMP) providing a neighbourhood relation sufficiently stable to apply existing SSAs in WSNs.
Brahim Negazzi, Volker Turau, Mohammed Haddad and Hamamache Kheddouci. A Self-Stabilizing Algorithm for Maximal p-Star Decomposition of General Graphs. In Proceedings of the 15th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS'13), November 2013, pp. 74–85. Osaka, Japan.
@InProceedings{Telematik_SSS_2013_MaxPStar, author = {Brahim Negazzi and Volker Turau and Mohammed Haddad and Hamamache Kheddouci}, title = {A Self-Stabilizing Algorithm for Maximal p-Star Decomposition of General Graphs}, booktitle = {Proceedings of the 15th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS'13)}, pages = {74-85}, day = {13-16}, month = nov, year = 2013, location = {Osaka, Japan}, }
Abstract: A p-star is a complete bipartite graph K{1,p} with one center node and p leaf nodes. In this paper we propose the first distributed self-stabilizing algorithm for graph decomposition into p-stars. For a graph G and an integer p>=1, this decomposition provides disjoint components of G where each component forms a p-star. We prove convergence and correctness of the algorithm under an unfair distributed daemon. The stabilization time is 2*floor(n/p+1)+2 rounds.
Gerry Siegemund, Volker Turau, Stefan Lohs and Jörg Nolte. Directed Link Utilization with Mahalle+. In Proceedings of the 12th GI/ITG KuVS Fachgespräch "Drahtlose Sensornetze" (FGSN'13), September 2013. Cottbus, Germany.
@InProceedings{Telematik_STLN_2013_DirectedLinkUtil, author = {Gerry Siegemund and Volker Turau and Stefan Lohs and J{\"o}rg Nolte}, title = {Directed Link Utilization with Mahalle+}, booktitle = {Proceedings of the 12th GI/ITG KuVS Fachgespr{\"a}ch "Drahtlose Sensornetze" (FGSN'13)}, day = {12-13}, month = sep, year = 2013, location = {Cottbus, Germany}, }
Abstract: Self-stabilization provides non-masking fault toler- ance in distributed systems. Self-stabilizing algorithms (SSA) are defined on the assumption that either the system’s topology is fixed over time or topology changes are isolated events occurring at a very low rate. These assumptions are not valid in wireless sensor networks (WSNs) where link qualities change rapidly. Therefore, neighborhood management protocols (NMP) are used to ensure the stability of the network topology for a longer time period. Furthermore, symmetrical links between nodes are more desirable than unsymmetrical ones, therefore, unidirectional links are often omitted. This paper presents an augmentation of the NMP Mahalle+to transparently utilize certain unidirectional links to increase the performance of SSAs running on top of Mahalle+.
Gerry Siegemund and Stefan Lohs. GlueAPI - Joining REFLEX and CometOS. Technical Report urn:nbn:de:gbv:830-tubdok-12285, Hamburg University of Technology, Hamburg, Germany, 2013.
@TechReport{Telematik_Siegemund_2013_GlueAPI, author = {Gerry Siegemund and Stefan Lohs}, title = {GlueAPI - Joining REFLEX and CometOS}, number = {urn:nbn:de:gbv:830-tubdok-12285}, institution = {Hamburg University of Technology}, address = {Hamburg, Germany}, year = 2013, }
Abstract: Wireless sensor networks (WSN) are build for various tasks in arbitrary environments. Operating systems for sensor nodes, the components of WSNs, appear in multiple forms, from a variety of vendors, universities, or private persons. The GlueAPI is a merge layer to combine the usability of two such operating systems: REFLEX of the Brandenburg University of Technology and CometOS of the Hamburg University of Technology. Both operating systems are suitable for simulations using the OMNeT++ framework and several hardware platforms.
Stefan Lohs, Gerry Siegemund, Jörg Nolte and Volker Turau. Mission Statement: ToleranceZone A Self-Stabilizing Middleware for Wireless Sensor Netzworks. In Proceedings of the 11th GI/ITG KuVS Fachgespräch "Drahtlose Sensornetze" (FGSN'12), September 2012. Darmstadt, Germany.
@InProceedings{Telematik_LSNT_2012_TZone_Mission, author = {Stefan Lohs and Gerry Siegemund and J{\"o}rg Nolte and Volker Turau}, title = {Mission Statement: ToleranceZone A Self-Stabilizing Middleware for Wireless Sensor Netzworks}, booktitle = {Proceedings of the 11th GI/ITG KuVS Fachgespr{\"a}ch "Drahtlose Sensornetze" (FGSN'12)}, day = {13-14}, month = sep, year = 2012, location = {Darmstadt, Germany}, }
Abstract: Wireless sensor networks (WSN) can be used in a wide range of monitoring and controlling applications. These networks consist of nodes with sparse resources, which makes application implementation challenging. Therefore, many mid- dleware systems were developed in the last decade. Furthermore, unattended and long-living deployments of WSNs need fault-tolerant software architectures. The goal of the TOLERANCEZONE project is to design a self-stabilizing middleware, which supports the development of autonomously recovering and highly fault-tolerant WSN applications.

Students' theses

Completed Theses