Commit Graph

218 Commits

Author SHA1 Message Date
Claude
4162186ae0 Merge remote-tracking branch 'upstream/main' into feat/network-probes
# Conflicts:
#	agent/connection_manager.go
2026-04-18 01:19:49 +00:00
Uğur Tafralı
a71617e058 feat(agent): Add EXIT_ON_DNS_ERROR environment variable (#1929)
Co-authored-by: henrygd <hank@henrygd.me>
2026-04-17 19:26:11 -04:00
xiaomiku01
578ba985e9 Merge branch 'main' into feat/network-probes
Resolved conflict in internal/records/records.go:
- Upstream refactor moved deletion code to records_deletion.go and
  switched averaging functions from package-level globals to local
  variables (var row StatsRecord / params := make(dbx.Params, 1)).
- Kept AverageProbeStats and rewrote it to match the new local-variable
  pattern.
- Dropped duplicated deletion helpers from records.go (they now live in
  records_deletion.go).
- Added "network_probe_stats" to the collections list in
  records_deletion.go:deleteOldSystemStats so probe stats keep the same
  retention policy.
2026-04-17 13:49:18 +08:00
henrygd
981c788d6f agent: make sure prefixed ALL_PROXY env var works (#1919) 2026-04-14 14:46:43 -04:00
Rafael Marmelo
f5576759de agent: Allow agent to connect to hub via SOCKS5 proxy 2026-04-14 14:46:43 -04:00
xiaomiku01
485830452e fix(agent): exclude DNS resolution from TCP probe latency
Resolve the target hostname before starting the timer so the
measurement reflects pure TCP handshake time only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:21:15 +08:00
xiaomiku01
2fd00cd0b5 feat(agent): use native ICMP sockets with fallback to system ping
Replace the ping-command-only implementation with a three-tier
approach using golang.org/x/net/icmp:

1. Raw socket (ip4:icmp) — works with root or CAP_NET_RAW
2. Unprivileged datagram socket (udp4) — works on Linux/macOS
   without special privileges
3. System ping command — fallback when neither socket works

The method is auto-detected on first probe and cached for all
subsequent calls, avoiding repeated failed attempts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:09:12 +08:00
xiaomiku01
9a5959b57e fix: address network probe code quality issues
- Use shared http.Client in ProbeManager to avoid connection/transport leak
- Skip probe goroutine and agent request when system has no enabled probes
- Validate HTTP probe target URL scheme (http:// or https://) on creation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 18:40:27 +08:00
Lars Lehtonen
1556e53926 fix(agent): dropped linux battery error (#1908) 2026-04-10 18:33:42 -04:00
xiaomiku01
865e6db90f feat(agent): add ProbeManager with ICMP/TCP/HTTP probes and handlers
Implements the core probe execution engine (ProbeManager) that runs
network probes on configurable intervals, collects latency samples,
and aggregates results over a 60s sliding window. Adds two new
WebSocket handlers (SyncNetworkProbes, GetNetworkProbeResults) for
hub-agent communication and integrates probe lifecycle into the agent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 01:21:38 +08:00
FlintyLemming
3793b27958 fix(agent): use nvme_total_capacity fallback for NVMe disk size (#1899)
Some enterprise NVMe drives (e.g. Dell Ent NVMe CM7 U.2) report capacity
via nvme_total_capacity instead of user_capacity.bytes in smartctl output.
The NVMe SMART parser now falls back to nvme_total_capacity when
user_capacity.bytes is zero.
2026-04-09 15:50:59 -04:00
henrygd
0ae8c42ae0 fix(hub): System.HasUser - return true if SHARE_ALL_SYSTEMS=true (#1891)
- move hub's GetEnv function to new utils package to more easily share
across different hub packages
- change System.HasUser to take core.Record instead of user ID string
- add tests
2026-04-08 20:13:39 -04:00
henrygd
ea80f3c5a2 fix(agent): add safety check for read returning negative bytes (#1799) 2026-04-07 18:41:22 -04:00
Sven van Ginkel
c4009f2b43 feat: add more disk I/O metrics (#1866)
Co-authored-by: henrygd <hank@henrygd.me>
2026-04-04 18:28:05 -04:00
henrygd
6b5e6ffa9a agent: small refactoring and tests for battery package (#1872) 2026-04-02 21:07:14 -04:00
henrygd
d656036d3b agent: refactor new battery package (#1872) 2026-04-02 21:07:14 -04:00
svenvg93
80b73c7faf feat: implement the battery diectly instead of depency 2026-04-02 21:07:14 -04:00
Sven van Ginkel
7f565a3086 fix(agent): show correct NVMe capacity for Apple SSDs (#1873)
Co-authored-by: henrygd <hank@henrygd.me>
2026-04-02 15:36:05 -04:00
henrygd
f670e868e4 agent: add SENSORS_TIMEOUT env var (#1871) 2026-04-02 15:10:49 -04:00
henrygd
7f4f14b505 fix(agent,windows): raise timeout on first sensor collection to allow LHM to start 2026-03-31 16:10:59 -04:00
henrygd
2fda4ff264 agent: update LibreHardwareMonitorLib to 0.9.6 2026-03-31 15:55:02 -04:00
henrygd
cef09d7cb1 fix(agent): fix windows root disk detection if exe not running on root disk (#1863) 2026-03-31 12:58:42 -04:00
Sven van Ginkel
80135fdad3 fix(agent): exclude nested virtual fs when mounting host root to /extra-filesystems in Docker (#1859) 2026-03-30 13:48:54 -04:00
henrygd
6a207c33fa agent: change disk.Partitions(false) to true - likely fixes empty partition list in docker as of gopsutil 4.26.2 2026-03-29 12:33:45 -04:00
henrygd
afdc3f7779 fix(agent): allow GPU_COLLECTOR=nvml without nvidia-smi (#1849) 2026-03-28 18:58:16 -04:00
henrygd
a227c77526 agent: detect podman correctly when using socket proxy (#1846) 2026-03-28 17:43:29 -04:00
henrygd
b53fdbe0ef fix(agent): find macmon if /opt/homebrew/bin is not in path (#1746) 2026-03-27 13:52:22 -04:00
Jim Haff
ad21cab457 Prevent temperature collection from blocking agent stats (#1839) 2026-03-26 20:03:51 -04:00
henrygd
e3e453140e fix(agent): isolate container network rate tracking per cache interval
Previously, the agent shared a single PrevReadTime timestamp across all
collection intervals (e.g., 1s and 60s). This caused the 60s collector
to divide its accumulated 60s byte delta by the tiny time elapsed since
the last 1s collection, resulting in astronomically inflated network
rates. The fix introduces per-cache-time read time tracking, ensuring
calculations for each interval use their own independent timing context.
2026-03-24 13:07:56 -04:00
henrygd
be70840609 test: update tests that use os.Setenv to t.Setenv 2026-03-20 15:00:28 -04:00
henrygd
48ddc96a0d systemd: allow timer monitoring with SERVICE_PATTERNS (#1820) 2026-03-17 15:11:44 -04:00
henrygd
ed50367f70 fix(agent): add fallback for podman container health (#1475) 2026-03-15 17:59:59 -04:00
henrygd
5bfe4f6970 agent: include ip in container port if not 0.0.0.0 or :: 2026-03-15 14:58:21 -04:00
henrygd
380d2b1091 add ports column to containers table (#1481) 2026-03-14 19:29:39 -04:00
henrygd
a7f99e7a8c agent: support new Docker API Health field (#1475) 2026-03-14 15:26:44 -04:00
henrygd
bd94a9d142 agent: improve disk discovery / IO mapping and add tests (#1811) 2026-03-13 16:03:27 -04:00
henrygd
e527534016 ensure deprecated system fields are migrated to newer structures
also removes refs to legacy load avg fields (l1, l5, l15) that were
around for a very short period
2026-03-10 18:46:57 -04:00
VACInc
963fce5a33 agent: mark mdraid rebuild as warning, not failed (#1797) 2026-03-09 17:54:53 -04:00
Sven van Ginkel
d38c0da06d fix: bypass NIC auto-filter when interface is explicitly whitelisted via NICS (#1805)
Co-authored-by: henrygd <hank@henrygd.me>
2026-03-09 17:47:59 -04:00
henrygd
6b1ff264f2 gpu(amd): add workaround for misreported sysfs filesize (#1799) 2026-03-09 14:53:52 -04:00
henrygd
5e1b028130 refactor(smart): improve perf by skipping ata_device_statistics parsing if unnecessary 2026-03-08 15:19:50 -04:00
henrygd
638e7dc12a fix(smart): handle negative ATA device statistics values (#1791) 2026-03-08 13:34:16 -04:00
henrygd
73c262455d refactor(agent): move GetEnv to utils package 2026-03-07 14:12:17 -05:00
henrygd
0c4d2edd45 refactor(agent): add utils package; rm utils.go and fs_utils.go 2026-03-07 13:50:49 -05:00
henrygd
8f23fff1c9 refactor: mdraid comments and organization
also hide serial / firmware in smart details if empty, remove a few
unnecessary ops, and add a few more passed state values
2026-02-27 14:23:10 -05:00
VACInc
02c1a0c13d Add Linux mdraid health monitoring (#1750) 2026-02-27 13:42:47 -05:00
henrygd
69fdcb36ab support ZFS ARC on freebsd 2026-02-26 18:38:54 -05:00
henrygd
b91eb6de40 improve root I/O device detection and fallback (#1772)
- Match FILESYSTEM directly against I/O devices if partition lookup
fails
- Fall back to the most active I/O device if no root device is detected
- Add WARN logs in final fallback case to most active device
2026-02-26 18:11:33 -05:00
henrygd
ec69f6c6e0 improve disk I/O device matching for partition-to-disk mismatches (#1772)
findIoDevice now normalizes device names and falls back to prefix-based
matching when partition names differ from IOCounter names (e.g. nda0p2 →
nda0 on FreeBSD). The most-active prefix-related device is selected,
avoiding the broad "most active of all" heuristic that caused Docker
misattribution in #1737.
2026-02-26 16:59:12 -05:00
henrygd
004841717a add checks for non-empty CPU times during initialization (#401) 2026-02-25 19:04:29 -05:00