Migrate Ansible into the repo and centralize SSH keys in shared/ssh.

Playbooks now live under pve1/ansible and pve2/ansible; authorized_keys is kept as fragments, with a deploy script and a target matrix for Proxmox, VM 101 and the CTs.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-28 11:24:31 +02:00
parent 842e66996f
commit e98e3a2b84
27 changed files with 876 additions and 5 deletions
@@ -14,17 +14,25 @@ Instead:
 ```
 /etc/cron.weekly/pve-lxc-disk-maintenance
        ↓ (Symlink)
-/root/ansible/run-disk-maintenance.sh
+/root/ansible/run-disk-maintenance.sh   ← /root/ansible is a symlink to /root/docu/pve2/ansible
        ↓
 ansible-playbook playbooks/disk-maintenance.yml
        ↓ SSH
   docker (101) · media (109) · AIDEV (110)
 ```
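Each hop in the chain above has to resolve to a real, executable file, or the weekly job fails. A minimal check, assuming GNU coreutils `readlink -f`; `check_chain` is a hypothetical helper, not part of the repo:

```bash
# Hypothetical helper (not in the repo): follow every symlink of a cron entry
# and print the final target if it is an existing, executable file.
# Assumes GNU coreutils readlink (-f = canonicalize, all but the last component must exist).
check_chain() {
  local entry=$1 target
  target=$(readlink -f -- "$entry") || return 1   # fails if an intermediate hop is missing
  [ -f "$target" ] && [ -x "$target" ] || return 1 # last hop must exist and be executable
  printf '%s\n' "$target"
}
```

On pve2, `check_chain /etc/cron.weekly/pve-lxc-disk-maintenance` should print the real script path under `/root/docu/pve2/ansible` once the symlink is in place.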

-## Directory structure
+## Directory structure (Git)
+
+Source lives in the **`docu`** repo; deploy it on pve2:
+
+```bash
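+cd /root/docu && git pull
+# First migration only: move an existing real /root/ansible directory aside (backup name is arbitrary),
+# otherwise ln creates the link inside it; -n only applies to an existing symlink
+[ -d /root/ansible ] && [ ! -L /root/ansible ] && mv /root/ansible /root/ansible.pre-git
+ln -sfn /root/docu/pve2/ansible /root/ansible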
+```

 ```
-/root/ansible/
+/root/docu/pve2/ansible/          # (= /root/ansible via the symlink)
+├── README.md
 ├── ansible.cfg
 ├── run-disk-maintenance.sh      → called by cron.weekly
 ├── inventory/
@@ -39,6 +47,8 @@ ansible-playbook playbooks/disk-maintenance.yml
        └── handlers/main.yml
 ```

+SSH keys for Ansible → [../shared/ssh/README.md](../shared/ssh/README.md)
+
 ## Managed hosts

 | Ansible host | VMID | IP | Notes |
@@ -47,7 +57,7 @@ ansible-playbook playbooks/disk-maintenance.yml
 | media | 109 | 192.168.20.6 | Jellyfin cache path |
 | aidev | 110 | 10.100.2.13 | optional dev tooling cleanup |

-SSH as `root` from the Proxmox host; key auth was already set up.
+SSH as `root` from the Proxmox host; the public key of `root@pve2` must be in each CT's `authorized_keys` ([shared/ssh](../shared/ssh/README.md)).
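Whether the `root@pve2` key actually reached a CT can be checked by matching the key type and base64 blob while ignoring comments and leading options. A sketch; `has_pubkey` is hypothetical, and it assumes options contain no spaces (for example a quoted `command="..."` with spaces is not handled):

```bash
# Hypothetical helper: succeed if the key in $1 (a .pub file) appears in the
# authorized_keys file $2. Compares key type + base64 blob only, so comments
# and simple leading options (no embedded spaces) do not matter.
has_pubkey() {
  local pubfile=$1 authfile=$2 type= blob=
  read -r type blob _ < "$pubfile" || true   # tolerate a missing trailing newline
  [ -n "$blob" ] || return 1
  awk -v t="$type" -v b="$blob" '
    NF == 0 || $1 ~ /^#/ { next }             # skip blank lines and comments
    { for (i = 1; i < NF; i++) if ($i == t && $(i + 1) == b) { found = 1; exit } }
    END { exit (found ? 0 : 1) }
  ' "$authfile"
}
```

For example, `has_pubkey /root/.ssh/id_ed25519.pub ./authorized_keys-from-ct` against a copy fetched from the CT; both file names are assumptions.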

 ## What the playbook does

@@ -101,7 +111,7 @@ echo '0 3 * * * root /root/ansible/run-disk-maintenance.sh' > /etc/cron.d/pve-lx

 ## Adjusting the configuration

-Global values: `/root/ansible/inventory/group_vars/all.yml`
+Global values: `/root/docu/pve2/ansible/inventory/group_vars/all.yml` (or `/root/ansible/…` via the symlink)

 ```yaml
 journal_max_size: 200M
@@ -0,0 +1,42 @@
+# Ansible on pve2: LXC disk maintenance
+
+Weekly maintenance for CTs **101 docker**, **109 media** and **110 AIDEV**, run via SSH from the Proxmox host.
+
+| Path | Contents |
+|------|----------|
+| [ansible.cfg](ansible.cfg) | Defaults |
+| [inventory/hosts.yml](inventory/hosts.yml) | Hosts and per-CT variables |
+| [inventory/group_vars/all.yml](inventory/group_vars/all.yml) | Thresholds |
+| [playbooks/disk-maintenance.yml](playbooks/disk-maintenance.yml) | Playbook |
+| [roles/disk_cleanup/](roles/disk_cleanup/) | Tasks (journal, Docker, fstrim, …) |
+| [run-disk-maintenance.sh](run-disk-maintenance.sh) | Cron entry point |
+
+Docs: [../06_Ansible-Automatisierung.md](../06_Ansible-Automatisierung.md)
+
+## Running
+
+```bash
+cd /root/docu/pve2/ansible   # or /root/ansible (symlink)
+./run-disk-maintenance.sh    # output goes to /var/log/pve-lxc-disk-maintenance.log
+# or, with output on the terminal:
+ansible-playbook playbooks/disk-maintenance.yml
+```
+
+## Cron (pve2)
+
+```text
+/etc/cron.weekly/pve-lxc-disk-maintenance → /root/ansible/run-disk-maintenance.sh
+```
+
+Once /root/ansible is a symlink to this directory, the existing cron entry keeps working without changes.
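On Debian-based hosts such as Proxmox VE, `/etc/cron.weekly` is executed by `run-parts`, which by default skips entries whose names contain anything other than ASCII letters, digits, underscores and hyphens; that is why the cron link has no `.sh` suffix. A small check mirroring that default rule; `runparts_name_ok` is a hypothetical helper:

```bash
# Hypothetical helper: mirror Debian run-parts' default name rule
# (without --lsbsysinit/--regex only [A-Za-z0-9_-] is accepted).
# Accepts a bare name or a path; only the last path component is checked.
runparts_name_ok() {
  local name=${1##*/}
  [[ -n $name && $name =~ ^[A-Za-z0-9_-]+$ ]]
}
```

`runparts_name_ok /etc/cron.weekly/pve-lxc-disk-maintenance` succeeds, while a link named `run-disk-maintenance.sh` would be silently ignored.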
+
+## Deploy
+
+```bash
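+cd /root/docu && git pull
+# First migration only: move an existing real /root/ansible directory aside (backup name is arbitrary),
+# otherwise ln creates the link inside it; -n only applies to an existing symlink
+[ -d /root/ansible ] && [ ! -L /root/ansible ] && mv /root/ansible /root/ansible.pre-git
+ln -sfn /root/docu/pve2/ansible /root/ansible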
+```
+
+## SSH
+
+Ansible connects to the CTs as **root**; the public key of `root@pve2` must be in each CT's `authorized_keys` → [../../shared/ssh/README.md](../../shared/ssh/README.md).
@@ -0,0 +1,12 @@
+[defaults]
+inventory = inventory/hosts.yml
+roles_path = roles
+remote_user = root
+host_key_checking = False
+retry_files_enabled = False
+gathering = implicit
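+stdout_callback = default
+# YAML-formatted results via the builtin callback (ansible-core >= 2.13);
+# replaces the deprecated community.general "yaml" callback
+callback_result_format = yaml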
+interpreter_python = auto_silent
@@ -0,0 +1,33 @@
+---
+# Disk maintenance defaults — tune per host in inventory if needed
+disk_maintenance_enabled: true
+
+# systemd journal
+journal_max_size: 200M
+
+# Docker
+docker_prune_stopped_containers_older_than: 168h   # 7 days
+docker_prune_dangling_images: true
+docker_prune_unused_images_older_than: 336h        # 14 days (aggressive tag)
+docker_prune_build_cache_older_than: 336h
+docker_prune_dangling_volumes: true
+docker_log_truncate_threshold: 50M
+docker_log_truncate_target: 10M
+
+# LVM thin provisioning — critical on Proxmox local-lvm / nvme_second
+fstrim_enabled: true
+
+# Frigate recordings on docker CT (matches config.yaml retain.days: 30)
+frigate_recordings_retain_days: 30
+frigate_clips_retain_days: 14
+
+# Jellyfin transcode/image cache (not metadata — that is library artwork)
+jellyfin_cache_max_age_days: 30
+
+# Optional dev tooling (AIDEV)
+npm_cache_clean: false
+apt_clean: true
+
+# Alert thresholds for summary output
+disk_warn_percent: 80
+thin_pool_warn_percent: 85
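`disk_warn_percent` and `thin_pool_warn_percent` are declared here, but no task in the role compares against them yet. One way to evaluate the first one from the `df -hT / | tail -1` line the role already registers as `disk_before`/`disk_after`; a sketch, `disk_over_threshold` is hypothetical:

```bash
# Hypothetical helper: $1 = one data line of `df -hT`, $2 = warn threshold in percent.
# Returns 0 if Use% >= threshold, 1 if below, 2 if the line cannot be parsed.
disk_over_threshold() {
  local line=$1 warn=$2 pct
  # df -hT columns: Filesystem Type Size Used Avail Use% Mounted-on -> Use% is field 6
  pct=$(awk '{ sub(/%$/, "", $6); print $6 }' <<< "$line")
  [[ $pct =~ ^[0-9]+$ ]] || return 2
  (( pct >= warn ))
}
```

Inside the role, the same comparison could feed a `when:` on a warning task; wiring it in is left open here.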
@@ -0,0 +1,17 @@
+all:
+  children:
+    lxc_containers:
+      hosts:
+        docker:
+          ansible_host: 192.168.10.101
+          proxmox_vmid: 101
+          frigate_recordings_path: /mnt/records/recordings
+          frigate_clips_path: /mnt/records/clips
+        media:
+          ansible_host: 192.168.20.6
+          proxmox_vmid: 109
+          jellyfin_cache_path: /opt/stacks/jellyfin/config/cache
+        aidev:
+          ansible_host: 10.100.2.13
+          proxmox_vmid: 110
+          dev_tooling_cleanup: true
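The `proxmox_vmid` values are not used by any task yet. They would fit a host-side trim: `fstrim` inside an unprivileged container is normally refused (the FITRIM ioctl needs `CAP_SYS_ADMIN` on the host), while Proxmox offers `pct fstrim <vmid>` to trim a CT and its mount points from the host. A sketch, assuming it runs as root on pve2; `trim_cts` and the `PCT` override are hypothetical, the override only exists to allow a dry run:

```bash
# Hypothetical helper: run `pct fstrim` for each VMID given (mirror proxmox_vmid
# from inventory/hosts.yml). PCT is a hypothetical override, e.g. PCT=echo for a dry run.
# Continues after a failure and returns 1 if any VMID failed or was invalid.
trim_cts() {
  local pct=${PCT:-pct} rc=0 vmid
  for vmid in "$@"; do
    [[ $vmid =~ ^[0-9]+$ ]] || { echo "invalid vmid: $vmid" >&2; rc=1; continue; }
    "$pct" fstrim "$vmid" || rc=1
  done
  return "$rc"
}
```

Usage on pve2: `trim_cts 101 109 110`, or `PCT=echo trim_cts 101 109 110` to only print the commands.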
@@ -0,0 +1,37 @@
+---
+# Weekly disk maintenance for Proxmox LXC containers
+# Run from the Proxmox host: ansible-playbook playbooks/disk-maintenance.yml
+#
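+# Tags (selectors only; a run without --tags executes every task, "aggressive" included):
+#   aggressive  : prune unused images older than 14 days (skip with --skip-tags aggressive)
+#   frigate     : enforce recording/clip retention on docker CT
+#   jellyfin    : clean stale transcode/image cache on media CT
+#   dev-tooling : npm cache clean on AIDEV (also gated by npm_cache_clean, false by default)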
+
+- name: LXC disk maintenance
+  hosts: lxc_containers
+  become: true
+  gather_facts: true
+  roles:
+    - role: disk_cleanup
+      # no play-level vars here: they would override a per-host value from the inventory
+      when: disk_maintenance_enabled | default(true) | bool
+
+- name: Report Proxmox thin pool usage
+  hosts: localhost
+  connection: local
+  gather_facts: false
+  tasks:
+    - name: Get LVM thin pool stats
+      ansible.builtin.shell: lvs pve/data nvme_second/nvme_second -o vg_name,lv_name,data_percent 2>/dev/null --noheadings
+      register: thin_pools
+      changed_when: false
+
+    - name: Thin pool summary
+      ansible.builtin.debug:
+        msg: |
+          Proxmox thin pools after maintenance:
+          {{ thin_pools.stdout }}
+
+          Schedule: see /etc/cron.weekly/pve-lxc-disk-maintenance
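The localhost play prints the raw `lvs` output but does not compare it with `thin_pool_warn_percent`. A minimal filter over the same `--noheadings -o vg_name,lv_name,data_percent` columns; `thin_pools_over` is hypothetical:

```bash
# Hypothetical helper: read `lvs --noheadings -o vg_name,lv_name,data_percent`
# on stdin and print every pool at or above $1 percent as "vg/lv pct%".
# Exit status 1 means at least one pool crossed the threshold.
thin_pools_over() {
  local warn=$1
  awk -v warn="$warn" '
    NF >= 3 && ($3 + 0) >= (warn + 0) { printf "%s/%s %s%%\n", $1, $2, $3; hit = 1 }
    END { exit (hit ? 1 : 0) }
  '
}
```

On pve2 this could be fed with `lvs --noheadings -o vg_name,lv_name,data_percent pve/data nvme_second/nvme_second | thin_pools_over 85`.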
@@ -0,0 +1,17 @@
+---
+journal_max_size: 200M
+docker_prune_stopped_containers_older_than: 168h
+docker_prune_dangling_images: true
+docker_prune_unused_images_older_than: 336h
+docker_prune_build_cache_older_than: 336h
+docker_prune_dangling_volumes: true
+docker_log_truncate_threshold: 50M
+docker_log_truncate_target: 10M
+fstrim_enabled: true
+frigate_recordings_retain_days: 30
+frigate_clips_retain_days: 14
+jellyfin_cache_max_age_days: 30
+npm_cache_clean: false
+apt_clean: true
+disk_warn_percent: 80
+thin_pool_warn_percent: 85
@@ -0,0 +1,5 @@
+---
+- name: Restart docker
+  ansible.builtin.service:
+    name: docker
+    state: restarted
@@ -0,0 +1,224 @@
+---
+- name: Disk usage before maintenance
+  ansible.builtin.shell: df -hT / | tail -1
+  register: disk_before
+  changed_when: false
+
+- name: Show disk before
+  ansible.builtin.debug:
+    msg: "{{ inventory_hostname }} before: {{ disk_before.stdout }}"
+
+- name: Vacuum systemd journal
+  ansible.builtin.command: "journalctl --vacuum-size={{ journal_max_size }}"
+  register: journal_vacuum
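+  # journalctl prints its vacuum report on stderr; only count runs that deleted files
+  changed_when: "'Deleted archived journal' in (journal_vacuum.stdout ~ journal_vacuum.stderr)"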
+  failed_when: false
+
+- name: Clean apt cache
+  ansible.builtin.apt:
+    autoclean: true
+    autoremove: true
+    clean: true
+  when: apt_clean | bool
+
+- name: Check if docker is available
+  ansible.builtin.command: docker info
+  register: docker_info
+  changed_when: false
+  failed_when: false
+  tags: [always, docker]
+
+- name: Truncate oversized Docker container logs
+  ansible.builtin.shell: |
+    # -print lists each truncated file, so stdout stays empty when nothing was oversized
+    find /var/lib/docker/containers -name '*-json.log' -size +{{ docker_log_truncate_threshold }} \
+      -print -exec truncate -s {{ docker_log_truncate_target }} {} \;
+  args:
+    executable: /bin/bash
+  register: log_truncate
+  changed_when: log_truncate.stdout | length > 0
+  when: docker_info is defined and docker_info.rc == 0
+
+- name: Prune stopped containers
+  ansible.builtin.command: >-
+    docker container prune -f --filter until={{ docker_prune_stopped_containers_older_than }}
+  register: container_prune
+  changed_when: "'Total reclaimed space' in container_prune.stdout and '0B' not in container_prune.stdout.split('Total reclaimed space')[1].split('\n')[0]"
+  when: docker_info is defined and docker_info.rc == 0
+
+- name: Prune dangling images
+  ansible.builtin.command: docker image prune -f
+  register: image_prune_dangling
+  changed_when: "'Total reclaimed space' in image_prune_dangling.stdout and '0B' not in image_prune_dangling.stdout.split('Total reclaimed space')[1].split('\n')[0]"
+  when:
+    - docker_info is defined
+    - docker_info.rc == 0
+    - docker_prune_dangling_images | bool
+
+- name: Prune unused images older than threshold
+  ansible.builtin.command: >-
+    docker image prune -af --filter until={{ docker_prune_unused_images_older_than }}
+  register: image_prune_old
+  changed_when: "'Total reclaimed space' in image_prune_old.stdout and '0B' not in image_prune_old.stdout.split('Total reclaimed space')[1].split('\n')[0]"
+  when:
+    - docker_info is defined
+    - docker_info.rc == 0
+    - docker_prune_unused_images_older_than | length > 0
+  tags:
+    - aggressive
+
+- name: Prune docker build cache
+  ansible.builtin.command: >-
+    docker builder prune -af --filter until={{ docker_prune_build_cache_older_than }}
+  register: builder_prune
+  changed_when: "'Total:' in builder_prune.stdout"
+  failed_when: false
+  when: docker_info is defined and docker_info.rc == 0
+
+- name: Prune dangling docker volumes
+  ansible.builtin.command: docker volume prune -f
+  register: volume_prune
+  changed_when: "'Total reclaimed space' in volume_prune.stdout and '0B' not in volume_prune.stdout.split('Total reclaimed space')[1].split('\n')[0]"
+  when:
+    - docker_info is defined
+    - docker_info.rc == 0
+    - docker_prune_dangling_volumes | bool
+
+- name: Check for existing Docker daemon.json
+  ansible.builtin.stat:
+    path: /etc/docker/daemon.json
+  register: docker_daemon_json
+  when: docker_info is defined and docker_info.rc == 0
+
+- name: Ensure Docker log rotation defaults
+  ansible.builtin.copy:
+    dest: /etc/docker/daemon.json
+    owner: root
+    group: root
+    mode: "0644"
+    force: false
+    content: |
+      {
+        "log-driver": "json-file",
+        "log-opts": {
+          "max-size": "10m",
+          "max-file": "3"
+        }
+      }
+  notify: Restart docker
+  when:
+    - docker_info is defined
+    - docker_info.rc == 0
+    - not docker_daemon_json.stat.exists
+
+- name: Remove old Frigate recording day folders
+  ansible.builtin.shell: |
+    set -euo pipefail
+    retain={{ frigate_recordings_retain_days }}
+    cutoff=$(date -d "-${retain} days" +%Y-%m-%d)
+    for d in "{{ frigate_recordings_path }}"/20??-??-??; do
+      [ -d "$d" ] || continue
+      day=$(basename "$d")
+      # YYYY-MM-DD names sort lexicographically, so a string compare is enough
+      if [[ "$day" < "$cutoff" ]]; then
+        rm -rf "$d"
+        echo "removed $day"
+      fi
+    done
+  args:
+    executable: /bin/bash
+  register: frigate_recording_cleanup
+  changed_when: frigate_recording_cleanup.stdout | length > 0
+  when:
+    - frigate_recordings_path is defined
+    - frigate_recordings_path | length > 0
+  tags:
+    - frigate
+
+- name: Find old Frigate clip previews
+  ansible.builtin.find:
+    paths: "{{ frigate_clips_path | default('') }}/previews"
+    age: "{{ frigate_clips_retain_days }}d"
+    # files only: a directory with an old mtime can still contain newer files
+    file_type: file
+    recurse: true
+  register: old_frigate_clips
+  when:
+    - frigate_clips_path is defined
+    - frigate_clips_path | length > 0
+  tags:
+    - frigate
+
+- name: Delete old Frigate clip files
+  ansible.builtin.file:
+    path: "{{ item.path }}"
+    state: absent
+  loop: "{{ old_frigate_clips.files | default([]) }}"
+  when:
+    - frigate_clips_path is defined
+    - frigate_clips_path | length > 0
+  loop_control:
+    label: "{{ item.path }}"
+  tags:
+    - frigate
+
+- name: Find stale Jellyfin cache files
+  ansible.builtin.find:
+    paths: "{{ jellyfin_cache_path | default('') }}"
+    age: "{{ jellyfin_cache_max_age_days }}d"
+    file_type: file
+    recurse: true
+  register: old_jellyfin_cache
+  when:
+    - jellyfin_cache_path is defined
+    - jellyfin_cache_path | length > 0
+  tags:
+    - jellyfin
+
+- name: Delete stale Jellyfin cache
+  ansible.builtin.file:
+    path: "{{ item.path }}"
+    state: absent
+  loop: "{{ old_jellyfin_cache.files | default([]) }}"
+  when:
+    - jellyfin_cache_path is defined
+    - jellyfin_cache_path | length > 0
+  loop_control:
+    label: "{{ item.path }}"
+  tags:
+    - jellyfin
+
+- name: Clean npm cache on dev hosts
+  ansible.builtin.command: npm cache clean --force
+  when:
+    - dev_tooling_cleanup | default(false) | bool
+    - npm_cache_clean | bool
+  changed_when: true
+  failed_when: false
+  tags:
+    - dev-tooling
+
+- name: Run fstrim on root filesystem
+  ansible.builtin.command: fstrim -v /
+  register: fstrim_result
+  changed_when: "'trimmed' in fstrim_result.stdout and '0 B' not in fstrim_result.stdout"
+  when: fstrim_enabled | bool
+
+- name: Docker disk summary
+  ansible.builtin.command: docker system df
+  register: docker_df
+  changed_when: false
+  failed_when: false
+  when: docker_info is defined and docker_info.rc == 0
+
+- name: Disk usage after maintenance
+  ansible.builtin.shell: df -hT / | tail -1
+  register: disk_after
+  changed_when: false
+
+- name: Maintenance summary
+  ansible.builtin.debug:
+    msg: |
+      {{ inventory_hostname }}:
+        before: {{ disk_before.stdout }}
+        after:  {{ disk_after.stdout }}
+        fstrim: {{ fstrim_result.stdout | default('skipped') }}
+        docker: {{ docker_df.stdout | default('n/a') }}
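The Frigate recording cleanup relies on the `YYYY-MM-DD` day folders sorting lexicographically in date order, so a plain string comparison against the cutoff date is enough. The same selection as a dry-run function (GNU `date -d` assumed, as in the task); `frigate_expired_days` and its explicit reference-date argument are hypothetical, added so the logic can be tested:

```bash
# Hypothetical dry-run helper: list day folders older than the retention window.
# $1 = recordings path, $2 = retain days, $3 = reference date YYYY-MM-DD (default: today)
# Nothing is deleted; GNU date is required for -d.
frigate_expired_days() {
  local root=$1 retain=$2 today=${3:-$(date +%F)} cutoff d day
  cutoff=$(date -d "$today -$retain days" +%F) || return 1
  for d in "$root"/20??-??-??; do
    [ -d "$d" ] || continue          # an unmatched glob stays literal and is skipped
    day=${d##*/}
    [[ $day < $cutoff ]] && printf '%s\n' "$day"
  done
  return 0
}
```

Running it with the real recordings path before enabling the `frigate` tag shows which folders the task would remove.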
@@ -0,0 +1,9 @@
+#!/bin/bash
+# Weekly disk maintenance — runs Ansible playbook from Proxmox host
+set -euo pipefail
+export ANSIBLE_CONFIG=/root/ansible/ansible.cfg
+LOG=/var/log/pve-lxc-disk-maintenance.log
+exec >>"$LOG" 2>&1
+echo "=== $(date -Is) disk maintenance start ==="
+ansible-playbook /root/ansible/playbooks/disk-maintenance.yml
+echo "=== $(date -Is) disk maintenance done ==="
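With `set -e`, a failing playbook ends the wrapper before the `done` line, so the log shows a start without an end. A variant that records the exit status either way; a sketch, `run_logged` is a hypothetical helper and not the committed script:

```bash
# Hypothetical helper: run "$@", append all output to the log file $1,
# and always write a closing line with the result. Returns the command's status.
run_logged() {
  local log=$1 rc=0
  shift
  {
    echo "=== $(date -Is) start: $* ==="
    "$@" || rc=$?                    # capture the failure instead of aborting
    if (( rc == 0 )); then
      echo "=== $(date -Is) done ==="
    else
      echo "=== $(date -Is) FAILED rc=$rc ==="
    fi
  } >>"$log" 2>&1
  return "$rc"
}
```

In the wrapper this would become `run_logged "$LOG" ansible-playbook /root/ansible/playbooks/disk-maintenance.yml`, replacing the `exec` redirect and the two `echo` lines.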