Fleet Management
This use case describes how to manage a large fleet of IoT devices across multiple tenants using Complete Device Management.
Stack context
The device management plane (ThingsBoard, hawkBit, WireGuard) lives in the Tenant-Stack. Platform-wide fleet dashboards live in the Provider-Stack (Grafana → Provider TimescaleDB).
Scenario
An industrial equipment manufacturer ships 500 Linux-based controllers to customers across three regions. Each customer is a separate tenant. The manufacturer needs to:
- Provision all devices automatically when they first power on at the customer site.
- Push firmware updates in a controlled, staged manner without disrupting production.
- Monitor device health in real time.
- Allow on-call engineers to remotely debug devices without VPN client software on their laptops.
Setup
1. Deploy a Tenant-Stack per Customer
Each customer operates an independent Tenant-Stack. Follow the Tenant Onboarding use case to provision the stack and connect it to the Provider-Stack via the JOIN workflow.
2. Assign Operators
Add operator users to the tenant Keycloak realm. They receive:
- cdm-operator role → ThingsBoard Customer User, hawkBit read + trigger, Grafana Editor.
3. Pre-configure Device Images
Bake the following into the Yocto OS image before shipping:
- /opt/cdm/enroll.sh: enrollment script
- /opt/cdm/ca-fingerprint: Tenant step-ca Sub-CA fingerprint
- /etc/cdm/device-config: device configuration (TENANT_API_URL, TB_MQTT_HOST, HAWKBIT_URL, TSDB_URL)
The device ID is derived from the hardware serial number at first boot.
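The first-boot step can be sketched roughly as follows. This is a minimal illustration, not the actual enroll.sh: the KEY=VALUE layout of device-config, the cdm- prefix, and the serial normalization rules are all assumptions made for the example.

```python
# Hypothetical sketch: KEY=VALUE config format and ID scheme are assumptions.
REQUIRED_KEYS = ("TENANT_API_URL", "TB_MQTT_HOST", "HAWKBIT_URL", "TSDB_URL")


def parse_device_config(text):
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed config line: {raw!r}")
        config[key.strip()] = value.strip().strip('"')
    missing = [k for k in REQUIRED_KEYS if k not in config]
    if missing:
        raise ValueError(f"missing keys: {', '.join(missing)}")
    return config


def derive_device_id(serial, prefix="cdm"):
    """Build a stable device ID from a hardware serial number.

    Keeps only alphanumeric characters and lowercases them so the same
    hardware always maps to the same ID (assumed normalization).
    """
    cleaned = "".join(ch for ch in serial.strip().lower() if ch.isalnum())
    if not cleaned:
        raise ValueError("serial number is empty")
    return f"{prefix}-{cleaned}"
```

Failing fast on a missing key means a mis-baked image is detected at first boot rather than after a partial enrollment.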
Day-to-Day Operations
Viewing the Fleet
- Open ThingsBoard → Devices (filter by tenant or device profile cdm-x509).
- The device list shows:
  - Online/offline status (last activity timestamp)
  - Current firmware version (sw_version attribute)
  - WireGuard IP
  - Active alarm count
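To illustrate how online/offline status follows from the last activity timestamp, the sketch below summarizes a fleet from plain records. The record field names (last_activity, sw_version) and the 10-minute inactivity timeout are assumptions for the example; this does not call the ThingsBoard API.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def is_online(last_activity, now, timeout=timedelta(minutes=10)):
    """A device counts as online if it reported within the timeout window."""
    return now - last_activity <= timeout


def summarize_fleet(devices, now, timeout=timedelta(minutes=10)):
    """Count online/offline devices and devices per firmware version."""
    status = Counter(
        "online" if is_online(d["last_activity"], now, timeout) else "offline"
        for d in devices
    )
    versions = Counter(d["sw_version"] for d in devices)
    return {
        "online": status["online"],
        "offline": status["offline"],
        "versions": dict(versions),
    }
```

A summary like this makes it easy to spot devices that are still on an old sw_version after a rollout.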
Triggering a Fleet-Wide Firmware Update
- Build and sign the RAUC bundle in CI/CD.
- Upload to hawkBit (automate via REST API in your CI pipeline).
- Create a rollout:
  - Group 1: 5% of devices (canary), actionType: soft (device installs at next reboot)
  - Group 2: 25%, activated after Group 1 reaches 95% success
  - Group 3: 70%, activated after Group 2 reaches 95% success
- Monitor in hawkBit Rollout view and Grafana OTA dashboard.
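The staged rollout arithmetic can be sketched as below. This is only an illustration of the 5% / 25% / 70% split and the 95% success gate; the function names are hypothetical, and hawkBit evaluates group thresholds itself once the rollout is created.

```python
def plan_groups(total_devices, percentages=(5, 25, 70)):
    """Split a fleet into rollout groups by percentage.

    Rounding leftovers go to the last (largest) group so that every
    device is assigned to exactly one group.
    """
    if sum(percentages) != 100:
        raise ValueError("group percentages must add up to 100")
    sizes = [total_devices * p // 100 for p in percentages]
    sizes[-1] += total_devices - sum(sizes)
    return sizes


def may_activate_next(succeeded, group_size, threshold_percent=95):
    """Return True once the current group has reached the success threshold."""
    if group_size == 0:
        return True
    # Integer comparison avoids floating-point edge cases at the threshold.
    return succeeded * 100 >= threshold_percent * group_size


sizes = plan_groups(200)            # -> [10, 50, 140]
gate = may_activate_next(9, 10)     # -> False (90% is below 95%)
```

For a canary group of 10 devices, a single failure already holds back Group 2, which is the intended behavior for a small first stage.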
Handling a Failed Update
If a device reports ota_status: failure:
- ThingsBoard raises an OTA Failure alarm.
- The operator opens the Terminal Widget and inspects the device logs.
- If the bundle was corrupt, re-upload a corrected version and re-trigger the deployment.
- RAUC automatically reverts to the previous slot after failed boot attempts.
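The automatic revert relies on an A/B slot scheme with a boot-attempt counter. The following is a simplified model of that idea, not RAUC or bootloader code: the Slot class, the three-attempt budget, and the function names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str
    attempts_left: int


def choose_slot(updated, previous):
    """Pick the slot to boot.

    The freshly updated slot is tried while it has attempts left; each
    try consumes one attempt. Once the budget is used up, the previous
    known-good slot is booted instead.
    """
    if updated.attempts_left > 0:
        updated.attempts_left -= 1
        return updated
    return previous


def mark_good(slot, attempts=3):
    """After a successful boot, restore the slot's attempt budget."""
    slot.attempts_left = attempts


# Simulate an update to slot B that never boots successfully.
slot_a = Slot("A", attempts_left=3)
slot_b = Slot("B", attempts_left=3)
boot_sequence = [choose_slot(slot_b, slot_a).name for _ in range(4)]
# boot_sequence == ["B", "B", "B", "A"]
```

In this model the device returns to slot A on the fourth boot without operator intervention, which is why a corrupt bundle normally needs only a corrected re-upload rather than a site visit.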
Scaling Considerations
| Scale | Recommendation |
|---|---|
| < 100 devices | Single Docker Compose node is sufficient |
| 100–1000 devices | Separate DB nodes (managed PostgreSQL, MySQL); keep app containers on Docker Compose |
| 1000–10,000 devices | Move to Kubernetes with Helm charts; scale ThingsBoard and TimescaleDB horizontally |
| > 10,000 devices | Consider ThingsBoard PE (cluster mode), TimescaleDB distributed hypertables, and hawkBit cluster |