# Configuration
This project uses `config.yaml`. Start from `config.example.yaml`.
## telegram
- `token` (string, required): Telegram bot token.
- `admin_id` (int, required): Telegram user id with admin access.
- `admin_ids` (list<int>): Optional list of admins (first is primary for alerts).
## paths
- `artifact_state` (string): JSON file for artifact state.
- `runtime_state` (string): File for runtime state (mutes, metrics, etc.).
- `restic_env` (string): Path to a file with RESTIC_* environment variables.
## thresholds
- `disk_warn` (int, percent): Disk usage warning threshold.
- `load_warn` (float): Load warning threshold.
- `high_load_warn` (float): Critical load threshold.
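A minimal sketch of the `thresholds` section. The numbers are illustrative placeholders, not documented defaults; tune them to the host's core count and disk layout.

```yaml
thresholds:
  disk_warn: 85        # percent; illustrative value
  load_warn: 2.0       # illustrative value
  high_load_warn: 4.0  # illustrative value; should be above load_warn
```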
## alerts
- `enabled` (bool): Enable resource alerts.
- `interval_sec` (int): Poll interval.
- `cooldown_sec` (int): Cooldown between alerts.
- `notify_cooldown_sec` (int): Global alert dedup cooldown (defaults to `cooldown_sec`).
- `load_only_critical` (bool): Only send critical load alerts (no warn/OK).
- `quiet_hours` (object): Quiet hours for noncritical alerts.
  - `enabled` (bool): Enable quiet hours.
  - `start` (string): Start time `HH:MM` (e.g. `23:00`).
  - `end` (string): End time `HH:MM` (e.g. `08:00`).
  - `allow_critical` (bool): Allow critical alerts during quiet hours.
- `auto_mute` (list): Per-category auto mutes by time window.
  - `category` (string): load/disk/smart/raid/ssl/docker/test.
  - `start` (string): Start `HH:MM`.
  - `end` (string): End `HH:MM` (can wrap over midnight).
- `auto_mute_on_high_load_sec` (int): Auto-mute the `load` category for N seconds after a critical load alert (`0` disables).
- `notify_recovery` (bool): Send recovery notifications.
- `smart_enabled` (bool): Enable SMART health polling.
- `smart_interval_sec` (int): SMART poll interval.
- `smart_cooldown_sec` (int): SMART alert cooldown.
- `smart_temp_warn` (int): SMART temperature warning threshold (°C).
- `raid_enabled` (bool): Enable md RAID polling (`/proc/mdstat`).
- `raid_interval_sec` (int): RAID poll interval.
- `raid_cooldown_sec` (int): RAID alert cooldown.
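The sketch below shows how the `alerts` keys above might fit together, including the nested `quiet_hours` object and an `auto_mute` list entry. The nesting is inferred from the `object` and `list` types in the descriptions, and all values are illustrative rather than defaults. Times are quoted so that YAML parsers keep them as strings.

```yaml
alerts:
  enabled: true
  interval_sec: 60          # illustrative value
  cooldown_sec: 900         # illustrative value
  load_only_critical: false
  quiet_hours:
    enabled: true
    start: "23:00"
    end: "08:00"            # window wraps over midnight
    allow_critical: true
  auto_mute:
    - category: "load"
      start: "02:00"        # illustrative window, e.g. during nightly jobs
      end: "04:30"
  auto_mute_on_high_load_sec: 0   # 0 disables
  notify_recovery: true
  smart_enabled: true
  smart_interval_sec: 3600  # illustrative value
  smart_cooldown_sec: 21600 # illustrative value
  smart_temp_warn: 50       # illustrative value
  raid_enabled: false
```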
## disk_report
- `threshold` (int): Disk usage threshold for auto snapshot.
- `cooldown_sec` (int): Cooldown between snapshots.
- `top_dirs` (int): How many directories to show.
- `docker_dir` (string): Path to docker data.
- `logs_dir` (string): Path to logs.
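A possible `disk_report` section, assuming a typical Linux layout. The two paths are common locations rather than documented defaults, and the numbers are illustrative.

```yaml
disk_report:
  threshold: 90                   # illustrative value
  cooldown_sec: 21600             # illustrative value
  top_dirs: 10                    # illustrative value
  docker_dir: "/var/lib/docker"   # assumed path
  logs_dir: "/var/log"            # assumed path
```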
## audit
- `enabled` (bool): Enable audit logging.
- `path` (string): Log file path. Default `/var/server-bot/audit.log`.
- `rotate_when` (string): Rotation schedule for `TimedRotatingFileHandler`. Example `W0` for weekly on Monday.
- `backup_count` (int): How many rotated files to keep.
## incidents
- `enabled` (bool): Enable incidents logging.
- `path` (string): Log file path. Default `/var/server-bot/incidents.log`.
- `rotate_when` (string): Rotation schedule for `TimedRotatingFileHandler`. Example `W0` for weekly on Monday.
- `backup_count` (int): How many rotated files to keep.
## logging
- `enabled` (bool): Enable bot logging.
- `path` (string): Log file path. Default `/var/server-bot/bot.log`.
- `rotate_when` (string): Rotation schedule for `TimedRotatingFileHandler`. Example `W0` for weekly on Monday.
- `backup_count` (int): How many rotated files to keep.
- `level` (string): Log level (`INFO`, `WARNING`, `ERROR`).
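The `audit`, `incidents`, and `logging` sections share the same rotation keys. A sketch for `logging`, assuming `rotate_when` is passed through unchanged as the `when` argument of Python's `TimedRotatingFileHandler` (in which case values such as `midnight` or `W0`..`W6` would be accepted); `backup_count` here is an illustrative value.

```yaml
logging:
  enabled: true
  path: "/var/server-bot/bot.log"
  rotate_when: "W0"    # weekly, on Monday
  backup_count: 8      # illustrative value
  level: "INFO"
```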
## safety
- `dry_run` (bool): If `true`, dangerous actions (upgrade/reboot/backup) are skipped.
## reports
- `weekly.enabled` (bool): Enable weekly report.
- `weekly.day` (string): Weekday `Mon`..`Sun` (default `Sun`).
- `weekly.time` (string): Local time `HH:MM` (default `08:00`).
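The dotted names above presumably map to a nested `weekly` mapping; that nesting is an assumption based on the key names. A sketch using the documented defaults:

```yaml
reports:
  weekly:
    enabled: true
    day: "Sun"
    time: "08:00"   # local time, quoted to stay a string
```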
## selftest
- `schedule.enabled` (bool): Enable auto self-test.
- `schedule.time` (string): Local time `HH:MM` (default `03:30`).
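As with `reports`, the dotted names are assumed to describe a nested `schedule` mapping. A sketch using the documented default time:

```yaml
selftest:
  schedule:
    enabled: true
    time: "03:30"
```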
## queue
- `max_pending_alert` (int): Alert if pending tasks >= this value.
- `avg_wait_alert` (int): Alert if average wait exceeds N seconds.
- `cooldown_sec` (int): Cooldown between queue alerts (default 300s).
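A sketch of the `queue` section. Only `cooldown_sec` uses a documented default; the other two numbers are illustrative.

```yaml
queue:
  max_pending_alert: 5    # illustrative value
  avg_wait_alert: 120     # seconds; illustrative value
  cooldown_sec: 300       # documented default
```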
## external_checks
- `enabled` (bool): Enable background checks.
- `state_path` (string): State file for uptime, default `/var/server-bot/external_checks.json`.
- `timeout_sec` (int): Check timeout in seconds.
- `interval_sec` (int): Background check interval.
- `services` (list): List of checks.
  - `name` (string): Service name.
  - `type` (string): `http`, `tcp`, `ping`.
  - `url` (string): URL for `http`.
  - `host` (string): Host for `tcp`/`ping`.
  - `port` (int): Port for `tcp`.
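A sketch with one service per check type, showing which keys each `type` uses. The service names are hypothetical, the addresses are reused from other examples in this document, and the timing values are illustrative.

```yaml
external_checks:
  enabled: true
  state_path: "/var/server-bot/external_checks.json"
  timeout_sec: 5        # illustrative value
  interval_sec: 300     # illustrative value
  services:
    - name: "gitea"            # hypothetical name
      type: "http"
      url: "http://localhost:3000"
    - name: "router-ssh"       # hypothetical name
      type: "tcp"
      host: "10.10.10.1"
      port: 22
    - name: "router-ping"      # hypothetical name
      type: "ping"
      host: "10.10.10.1"
```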
## arcane
- `base_url` (string): Arcane API base url.
- `api_key` (string): Arcane API key.
- `env_id` (int): Arcane environment id.
## npmplus
Used for SSL certificate status.
- `base_url` (string): NPMplus API base url, for example `https://10.10.10.10:81/api`.
- `identity` (string): Login email.
- `secret` (string): Login password.
- `token` (string): Optional static token (not recommended if it expires).
- `verify_tls` (bool): Verify the server's TLS certificate. Set to `false` for self-signed certificates.
- `alerts.enabled` (bool): Enable expiry notifications.
- `alerts.days` (list): Thresholds in days (e.g. 30/14/7/1).
- `alerts.cooldown_sec` (int): Cooldown between identical alerts.
- `alerts.interval_sec` (int): Check interval.
Token flow:
- First token: `POST /api/tokens` with `identity` and `secret`.
- Refresh: `GET /api/tokens` using the cached token.
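A sketch of the `npmplus` section with expiry alerts enabled. The `alerts.*` keys are assumed to nest under an `alerts` mapping, and the cooldown and interval values are illustrative.

```yaml
npmplus:
  base_url: "https://10.10.10.10:81/api"
  identity: "your@email.com"
  secret: "yourPassword"
  verify_tls: false          # self-signed certificate
  alerts:
    enabled: true
    days: [30, 14, 7, 1]     # notify at these remaining-day thresholds
    cooldown_sec: 86400      # illustrative value
    interval_sec: 3600       # illustrative value
```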
## gitea
- `base_url` (string): Gitea base url, for example `http://localhost:3000`.
- `token` (string): Optional API token.
- `verify_tls` (bool): Verify the server's TLS certificate. Set to `false` for self-signed certificates.
## openwrt
- `host` (string): Router address, for example `10.10.10.1`.
- `user` (string): SSH user (usually `root`).
- `port` (int): SSH port (usually `22`).
- `identity_file` (string): Path to SSH key (optional).
- `strict_host_key_checking` (bool): Set to `false` to skip SSH host key confirmation.
- `timeout_sec` (int): SSH request timeout.
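A sketch of the `openwrt` section for key-based SSH access. The key path is hypothetical and the timeout is an illustrative value.

```yaml
openwrt:
  host: "10.10.10.1"
  user: "root"
  port: 22
  identity_file: "/root/.ssh/id_ed25519"   # hypothetical path
  strict_host_key_checking: true
  timeout_sec: 10                          # illustrative value
```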
## security
- `reboot_password` (string): Password required before reboot.
## docker
- `autodiscovery` (bool): Discover containers by name/label.
- `watchdog` (bool): Enable container watchdog notifications.
- `label` (string): Optional label filter `key=value`.
- `match` (list): Name substrings used for discovery.
- `aliases` (map): Alias -> real container name.
- `containers` (map): Explicit container list (legacy modules). Each item can define:
  - `name` (string): Container name.
  - `url` (string): Health URL for the URLs check.
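A sketch of the `docker` section. The label, match strings, alias, and the `gitea` key under `containers` are all hypothetical; the exact shape of the `containers` map is inferred from the description above.

```yaml
docker:
  autodiscovery: true
  watchdog: true
  label: "monitor=true"      # hypothetical key=value label
  match:
    - "gitea"                # hypothetical name substrings
    - "npmplus"
  aliases:
    git: "gitea"             # alias -> real container name
  containers:
    gitea:                   # hypothetical map key
      name: "gitea"
      url: "http://localhost:3000"
```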
Example:
```yaml
telegram:
  token: "YOUR_TELEGRAM_BOT_TOKEN"
  admin_id: 123456789
paths:
  artifact_state: "/opt/tg-bot/state.json"
  restic_env: "/etc/restic/restic.env"
audit:
  enabled: true
  path: "/var/server-bot/audit.log"
  rotate_when: "W0"
  backup_count: 8
npmplus:
  base_url: "https://10.10.10.10:81/api"
  identity: "your@email.com"
  secret: "yourPassword"
  verify_tls: false
```