# pve2: Power MQTT Agent

The agent reads CPU power via Intel RAPL and GPU power via `nvidia-smi`, then publishes both over MQTT to Home Assistant.
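
RAPL exposes a cumulative energy counter rather than a power value, so a reader has to sample it twice and divide the difference by the elapsed time. The following is a minimal sketch of that arithmetic, assuming the Linux powercap files `energy_uj` and `max_energy_range_uj`. The helper name `rapl_watts` is hypothetical and not taken from the agent's source.

```bash
# Convert two RAPL energy counter readings (microjoules) into average watts.
# Args: e0_uj e1_uj dt_seconds max_range_uj
# Hypothetical helper for illustration; not taken from the agent's source.
rapl_watts() {
  awk -v e0="$1" -v e1="$2" -v dt="$3" -v max="$4" 'BEGIN {
    d = e1 - e0
    if (d < 0) d += max            # counter wrapped around between the readings
    printf "%.2f\n", d / 1e6 / dt  # microjoules -> joules, then J/s = W
  }'
}

# Usage on a host exposing the powercap interface (usually needs root):
#   zone=/sys/class/powercap/intel-rapl:0
#   e0=$(cat "$zone/energy_uj"); sleep 1; e1=$(cat "$zone/energy_uj")
#   rapl_watts "$e0" "$e1" 1 "$(cat "$zone/max_energy_range_uj")"
```

The wraparound branch matters because the counter resets to zero once it reaches `max_energy_range_uj`.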

## Installation

| Component | Path |
|-----------|------|
| Binary | `/usr/local/bin/pve-power-mqtt` |
| systemd unit | `/etc/systemd/system/pve-power-mqtt.service` |
| Env file | `/etc/pve-power-mqtt.env` |
| Source code | `/root/code/pve-power-mqtt` |
| Repo | https://git.jeanavril.com/jean/server-power.git |
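
The unit file itself is not reproduced in this document. As a rough sketch of how the binary and the env file from the table could be wired together, the helper below writes a minimal unit. Everything inside the heredoc apart from the two paths is an assumption, and `write_unit` is a hypothetical name.

```bash
# Write a minimal unit file to the given path.
# Hypothetical content for illustration; the real unit on pve2 may differ.
write_unit() {
  cat > "$1" <<'EOF'
[Unit]
Description=pve-power-mqtt agent (RAPL + nvidia-smi to MQTT)
After=network-online.target
Wants=network-online.target

[Service]
EnvironmentFile=/etc/pve-power-mqtt.env
ExecStart=/usr/local/bin/pve-power-mqtt
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
}
```

It would be used as `write_unit /etc/systemd/system/pve-power-mqtt.service`, followed by `systemctl daemon-reload` and `systemctl enable --now pve-power-mqtt`.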

## Configuration `/etc/pve-power-mqtt.env`

```ini
POWER_MQTT_BROKER=tcp://homeassistant.iot:1883
POWER_MQTT_USER=server
POWER_MQTT_PASSWORD="<redacted>"
POWER_MQTT_HOSTNAME=pve2
POWER_MQTT_DISCOVERY=true
```

Client ID: **`pve-power-mqtt-pve2`**
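
A quick sanity check of the env file can catch a missing key before the service starts. The sketch below assumes that the four keys listed in it are mandatory and that the client ID is simply `pve-power-mqtt-` plus the hostname. Both are inferred from the values above, not confirmed by the agent's source, and both function names are hypothetical.

```bash
# Print every required key that is not assigned in the given env file.
# The list of required keys is an assumption based on the documented file.
missing_env_keys() {
  local file="$1" key
  for key in POWER_MQTT_BROKER POWER_MQTT_USER POWER_MQTT_PASSWORD POWER_MQTT_HOSTNAME; do
    grep -q "^${key}=" "$file" || echo "$key"
  done
}

# Derive the client ID from the hostname (naming pattern inferred, not confirmed).
client_id() {
  printf 'pve-power-mqtt-%s\n' "$1"
}
```

For example, `missing_env_keys /etc/pve-power-mqtt.env` prints nothing when all four keys are present.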

## HA Sensors

| Entity | Source |
|--------|--------|
| `sensor.pve2_cpu_power` | RAPL |
| `sensor.pve2_gpu0_power` | GPU 0 |
| `sensor.pve2_gpu1_power` | GPU 1 |
| `sensor.pve2_estimated_total` | CPU + GPU0 + GPU1 |

`estimated_total` is **not** a PSU or PDU measurement; it is only the sum of the three readings above.

Broker details: [../shared/mqtt-homeassistant.md](../shared/mqtt-homeassistant.md)
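
The `estimated_total` row is plain addition of the three component readings. A minimal sketch of that sum, with a hypothetical helper name and made-up sample values:

```bash
# Sum per-component power readings in watts, one value per line on stdin.
# Hypothetical helper; the agent's own implementation is not shown here.
sum_watts() {
  awk '{ total += $1 } END { printf "%.1f\n", total }'
}

# Example with made-up readings for CPU, GPU 0 and GPU 1:
#   printf '42.5\n8.6\n8.9\n' | sum_watts
```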

## Build & Deploy

```bash
cd /root/code/pve-power-mqtt
git pull
# Make the Go toolchain available in this shell
export PATH="/usr/local/go/bin:$PATH"
go build -o pve-power-mqtt ./cmd/pve-power-mqtt
# Install the new binary and restart the service to pick it up
install -m 755 pve-power-mqtt /usr/local/bin/pve-power-mqtt
systemctl restart pve-power-mqtt
```

## NVIDIA Persistence (prerequisite for meaningful GPU idle readings)

```bash
# The persistence daemon should be active
systemctl status nvidia-persistenced
# Show power draw, performance state and persistence mode per GPU
nvidia-smi --query-gpu=power.draw,pstate,persistence_mode --format=csv
```

Expected at idle: **P8**, ~8–9 W per GTX 1080.

See [09_GPU-Idle-vollstaendig.md](09_GPU-Idle-vollstaendig.md)
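
The idle expectation can be turned into an automated check. The sketch below parses rows as printed by `nvidia-smi --format=csv,noheader,nounits` and flags any GPU that is not in P8, has persistence mode disabled, or draws more than a limit. The 10 W default limit is an assumption derived from the ~8–9 W figure, and `check_gpu_idle` is a hypothetical name.

```bash
# Read "index, power.draw, pstate, persistence_mode" CSV rows on stdin and
# flag GPUs that are not idle as expected. Exits non-zero if any row fails.
# Optional arg: power limit in watts (default 10, an assumed threshold).
check_gpu_idle() {
  awk -F', *' -v limit="${1:-10}" '
    {
      ok = ($3 == "P8" && $4 == "Enabled" && $2 + 0 <= limit)
      printf "GPU %s: %s W, %s, persistence %s -> %s\n", $1, $2, $3, $4, (ok ? "ok" : "CHECK")
      if (!ok) bad = 1
    }
    END { exit bad + 0 }
  '
}
```

On the host it would be fed with `nvidia-smi --query-gpu=index,power.draw,pstate,persistence_mode --format=csv,noheader,nounits | check_gpu_idle`.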

## Agent vs. GPU Idle

Polling `nvidia-smi` every **5 s** can briefly wake the GPUs out of P8. For a pure idle measurement, stop the agent first:

```bash
systemctl stop pve-power-mqtt
sleep 60   # give the GPUs time to settle back into P8
nvidia-smi --query-gpu=power.draw,pstate --format=csv
systemctl start pve-power-mqtt
```

Optional: increase the GPU polling interval in the code (e.g. to 60 s); see the server-power repo.
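
Decoupling the GPU interval from the CPU interval comes down to remembering when the GPU was last polled. The agent is built with Go, so this shell sketch only illustrates the scheduling idea. `gpu_poll_due`, `poll_cpu` and `poll_gpu` are hypothetical names.

```bash
# Decide whether the GPU is due for a poll.
# Args: now_s last_poll_s interval_s (all in seconds). Returns 0 when due.
gpu_poll_due() {
  [ $(( $1 - $2 )) -ge "$3" ]
}

# Sketch of a loop: CPU every 5 s, GPU only every 60 s.
# poll_cpu and poll_gpu are hypothetical placeholders.
#   last_gpu=0
#   while sleep 5; do
#     now=$(date +%s)
#     poll_cpu
#     if gpu_poll_due "$now" "$last_gpu" 60; then poll_gpu; last_gpu=$now; fi
#   done
```

With this split, RAPL readings stay at fine resolution while `nvidia-smi` runs twelve times less often.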

## Operation

```bash
systemctl status pve-power-mqtt
journalctl -u pve-power-mqtt -f   # follow the agent log
```

## Fixes (History)

- Removed `expire_after` / `availability_topic` from discovery (sensors showed as "unavailable" in HA)
- Unique client IDs per host
- Keepalive 120 s, ping timeout 30 s
- MQTT reconnect logging
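
To illustrate the first fix, here is a sketch of a Home Assistant MQTT discovery payload for one power sensor that contains neither `expire_after` nor `availability_topic`. The state topic layout `power/<host>/<key>` and the helper name are assumptions. The agent's real topics are defined in the server-power repo.

```bash
# Print a Home Assistant MQTT discovery payload for one power sensor.
# Args: hostname sensor_key display_name
# The state topic layout "power/<host>/<key>" is a hypothetical choice.
discovery_payload() {
  printf '{"name":"%s","unique_id":"%s_%s","state_topic":"power/%s/%s","unit_of_measurement":"W","device_class":"power","state_class":"measurement"}\n' \
    "$3" "$1" "$2" "$1" "$2"
}
```

With the default discovery prefix, such a payload would be published retained to a topic like `homeassistant/sensor/pve2_cpu_power/config`, for example with `mosquitto_pub -h homeassistant.iot -u server -P "$POWER_MQTT_PASSWORD" -r -t homeassistant/sensor/pve2_cpu_power/config -m "$(discovery_payload pve2 cpu_power 'pve2 CPU Power')"`.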

## Troubleshooting

| Problem | Solution |
|---------|----------|
| GPU unavailable in HA | Is the agent running? Does `nvidia-smi` work on the host? |
| High GPU idle readings | Check persistence mode and LXC mounts (CT 101 without NVIDIA) |
| MQTT timeout | VLAN 10→40; is the broker `homeassistant.iot` reachable? |
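
For the MQTT timeout case, a reachability test needs the host and port from `POWER_MQTT_BROKER`. A minimal sketch of that split, assuming the URL always has the form `scheme://host:port` (the helper name is hypothetical):

```bash
# Split a broker URL such as tcp://host:1883 into "host port".
# Assumes the form scheme://host:port; no IPv6 literals or paths.
broker_host_port() {
  local rest="${1#*://}"         # drop the scheme
  echo "${rest%:*} ${rest##*:}"  # host is before the last colon, port after it
}
```

From the agent host, the result can be passed to a TCP probe, e.g. `set -- $(broker_host_port tcp://homeassistant.iot:1883); nc -zv -w 3 "$1" "$2"`.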