97 Commits

Author SHA1 Message Date
henrygd
c38d04b34b Add health command for hub and align agent health command 2025-03-15 00:23:12 -04:00
henrygd
edefc6f53e add health check for agent
- Updated command-line flag parsing.
- Moved GetAddress and GetNetwork to server.go
2025-03-14 03:33:25 -04:00
henrygd
521be05bc1 gpu.go refactoring and jetson fixes
- Fixed usage and power values
- Added new test cases
- Moved some variables to constants
2025-03-13 21:32:53 -04:00
henrygd
f397ab0797 fix: improve error logging for temperature sensor retrieval 2025-03-06 05:38:49 -05:00
henrygd
d25c7c58c1 fix: SYS_SENSORS context error (#643) 2025-03-06 05:36:20 -05:00
henrygd
6767392ea8 refactor: update some types in docker.go 2025-03-05 23:40:23 -05:00
henrygd
0443a85015 fix: correct typo in Docker stats collection variable name 2025-03-04 17:39:49 -05:00
henrygd
c4d8deb986 feat: agent data cache to support connections to multiple hubs (#341) 2025-03-04 16:25:45 -05:00
henrygd
681286eb4f fix: add User-Agent to resolve Docker Desktop bug (#513, #603)
- also added body closure I forgot earlier whoops
2025-03-04 01:56:22 -05:00
henrygd
31431fd211 refactor: improve GPU data parsing
- Use byte-based regex matching instead of string-based matching
- Increase buffer size for GPU data
- Switch to `bufio.Scanner`
2025-03-04 00:15:10 -05:00
henrygd
ba7db28e80 test(gpu): add case for AMD multi-GPU and different power property (#414) 2025-02-22 12:45:47 -05:00
henrygd
6b41a98338 gpu: add tests and refactor to support amd on windows 2025-02-21 00:56:40 -05:00
henrygd
baf56fe83b fix: refresh interfaces if agent starts before network online (#466) 2025-02-21 00:21:47 -05:00
henrygd
96f9128d1a agent: add lock for gatherStats 2025-02-21 00:20:41 -05:00
henrygd
7485f79071 refactor(agent): refactor option parsing logic for agent command 2025-02-19 19:39:24 -05:00
henrygd
d170e7a00d feat(agent): NETWORK env var and support for multiple keys
- merges agent.Run with agent.NewAgent
- separates StartServer method
- bumps go version to 1.24
- add tests
2025-02-19 00:32:27 -05:00
henrygd
5ea6eb08a1 feat: PRIMARY_SENSOR env var to choose dashboard temp 2025-02-11 15:11:46 -05:00
henrygd
3afab00937 feat: display peak GPU usage in dashboard 2025-02-08 19:24:38 -05:00
henrygd
e6054058b9 feat: add temperatures to dashboard
- Refactor temperature related code and move to standalone function
2025-02-07 21:27:15 -05:00
Henry Dollman
83668e5727 fix(gpu): handle power for dedicated amd gpus (#414) 2025-01-30 20:28:31 -05:00
Henry Dollman
120aff0d18 config: prefix environment variables with BESZEL_AGENT_ (#502) 2025-01-29 20:13:07 -05:00
hank
76347f25e5 fix(gpu): prevent nvidia-smi from running on tegra devices 2025-01-24 23:12:39 -05:00
hank
c157f38957 gpu: Add closure for Jetson and improve compatibility 2025-01-24 22:07:37 -05:00
Links
d185dfdef8 get Jetson GPU Information 2025-01-24 19:17:33 -05:00
Henry Dollman
1ac165d7d3 include stats in error log when encoding stats fails 2025-01-05 17:58:38 -05:00
Henry Dollman
8e531e6b3c fix: handle duplicate GPU names (#361) 2025-01-05 16:40:22 -05:00
Henry Dollman
b08219dacf refactor agent gpu code to make it easier to add intel / jetson 2024-12-17 17:12:58 -05:00
Henry Dollman
b4bc8a31aa add check / reset for invalid disk i/o rates 2024-11-24 15:56:12 -05:00
Henry Dollman
4cb7b97416 change podman socket path to use current uid 2024-11-12 18:14:43 -05:00
Henry Dollman
b1db450e00 enable gpu monitoring by default 2024-11-12 18:13:57 -05:00
Henry Dollman
2e8ac98924 Improve disk discovery slightly by checking partition labels 2024-11-12 18:11:44 -05:00
Henry Dollman
3cd11d6bc4 improve podman support (#211) 2024-11-12 11:59:56 -05:00
Henry Dollman
03de73560c add gpu power consumption chart 2024-11-08 20:31:22 -05:00
Henry Dollman
cd10727795 gpu usage and vram charts 2024-11-08 18:00:30 -05:00
Henry Dollman
8262a9a45b progress on gpu metrics 2024-11-08 16:52:50 -05:00
Henry Dollman
655bfc95ca add ability to specify partition for extra disk using folder name 2024-11-04 20:52:27 -05:00
Henry Dollman
741575df15 revert tweaks for old docker. needs more testing. 2024-11-02 14:43:35 -04:00
Henry Dollman
df0f3a154f rtl layout progress and updates to arabic translations 2024-10-31 16:48:28 -04:00
Henry Dollman
f8fc74116c rm *sensors.Warnings conversion - gopsutil windows uses different type 2024-10-26 14:02:19 -04:00
Henry Dollman
4094df3a61 fix: skip temperature collection if SENSORS is empty string (#196) 2024-10-24 15:10:20 -04:00
Henry Dollman
4a78ce1b16 skip temperatures code if sensors whitelist is set to empty string 2024-10-23 18:37:38 -04:00
Henry Dollman
539c0ccb1d retry failed containers separately so we can run them in parallel (#58) 2024-10-21 17:00:13 -04:00
Henry Dollman
b5c158d1b3 update debug logs 2024-10-19 18:12:25 -04:00
Henry Dollman
8bf7a0e1d6 add DOCKER_TIMEOUT env var 2024-10-19 16:33:33 -04:00
Henry Dollman
ee92e338cb update debug log locations 2024-10-16 18:12:43 -04:00
Henry Dollman
59d541dd1d fix edge case overwriting extra filesystem with root io fallback 2024-10-16 15:26:12 -04:00
Henry Dollman
6c31263e60 add bandwidth alerts 2024-10-12 17:22:25 -04:00
Henry Dollman
6cf6661f2e raise docker client timeout to 8 seconds if version <= 24 2024-10-12 12:24:53 -04:00
Henry Dollman
5b0fac429b move update functions to agent / hub packages 2024-10-10 18:36:01 -04:00
Henry Dollman
efca56ceca add temp debug logs to troubleshoot #196 2024-10-10 18:28:24 -04:00