WP2 – Monitoring EdgeCompute Infrastructures

Realizing factory automation in which industrial processes are managed from the edge or cloud means that real-time requirements must be supported not only by the network but also by the edge and cloud infrastructure and by the control applications themselves. For reasons of flexibility, scalability and cost, such edge and cloud infrastructures are being realized through lightweight virtualization technologies such as Docker containers, managed by orchestration frameworks such as Kubernetes (K8s). When industrial control applications that previously ran on dedicated hardware are transformed into a collection of containerized microservices running in the edge/cloud, ensuring that real-time guarantees are met across the entire communication and processing chain becomes a challenge. In this context, performance monitoring is crucial for understanding system behavior, detecting service violations, and initiating the necessary system reconfigurations and optimizations. Effective monitoring requires capturing both the performance of the various infrastructure components involved and the performance of the user traffic or IoT application, as well as methods to correlate behaviors and derive results from the collected data.

This WP contributes to answering the following research questions concerning real-time performance monitoring and optimization of container-based edge services:

  • How can we design a distributed real-time performance monitoring framework for container-based edge services?
  • How can we apply technologies such as BPF for flexible and scalable performance monitoring of containers?
  • How can we extend network telemetry to capture fine-grained timing of application traffic across container chains?
  • How can we use the monitoring data to model, validate, predict and optimize system performance so that the timing requirements of real-time industrial IoT applications are met?

The following activities will be carried out in WP2:
Activity 2.1: Design of a real-time monitoring framework
Activity 2.2: System monitoring
Activity 2.3: Application traffic performance monitoring
Activity 2.4: Data analytics and optimization