Posh1698
Posted ... (edited)

Environment:
- OS: Debian 13 (Trixie)
- Kernel: 6.12.57+deb13-amd64
- AirVPN Suite: 2.0.0 (22 July 2025)
- Docker installed with 15 bridge networks

Issue:
Bluetit fails to start every time, with the following in journalctl:

    Bluetit daemon started with PID XXXXXX
    Reading run control directives from file /etc/airvpn/bluetit.rc
    Network check mode is gateway
    getGatewayFromRouteTable(): Received invalid packet from socket
    Bluetit successfully terminated

Setting networkcheck off in bluetit.rc does not resolve the issue; the error still occurs.

Root cause (suspected):
The machine has 15 Docker bridge interfaces, each with its own route entry in the kernel routing table. The netlink socket dump of all routes is therefore significantly larger than on a typical system. Bluetit appears to fail parsing this response in `getGatewayFromRouteTable()`, possibly due to a fixed buffer size or unexpected iteration behavior over virtual interfaces.

Confirmed by comparison:
On a second machine running kernel 6.12.73+deb13-amd64 with a clean routing table (no Docker bridges), Bluetit starts without issue. Stopping the Docker service does not resolve the issue, as the bridge interfaces and their routes persist in the routing table until reboot.

Request:
Please investigate `getGatewayFromRouteTable()` for robustness when the routing table contains a large number of entries or virtual/bridge interfaces.
Staff
Posted ...

@Posh1698

Hello and thank you for your report!

The function you mention loops through all routing messages requested from the kernel, until NLMSG_DONE is returned. It does not store anything in arrays or fixed memory. The error message you see is thrown only if the kernel returns an empty message or NLMSG_ERROR.

Can you please tell us whether the problem persists when running dockerd alone, with only one or no virtual interfaces?

The problem will be investigated soon in order to reproduce and address it for the next Suite release, thank you again!

Kind regards
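For readers following the thread, the kind of loop described above (walk every routing message, stop at NLMSG_DONE, treat an empty read or NLMSG_ERROR as a failure) can be sketched roughly as follows. This is a minimal illustration and not the Bluetit source: `DumpState` and `walkDatagram` are hypothetical names, and the sketch only counts RTM_NEWROUTE messages instead of extracting a gateway. It uses the standard `NLMSG_OK` and `NLMSG_NEXT` macros from `<linux/netlink.h>`.

```cpp
#include <cassert>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

// Hypothetical result type for this sketch; not taken from the Bluetit source.
enum class DumpState { More, Done, Error };

// Walk every netlink message contained in ONE received datagram.
// `buf` must be suitably aligned for nlmsghdr and `len` is the byte count
// returned by recv(). `routes` is incremented for each RTM_NEWROUTE seen.
DumpState walkDatagram(char *buf, int len, int &routes)
{
    assert(buf != nullptr);

    if (len <= 0)
        return DumpState::Error;            // empty read: nothing to parse

    nlmsghdr *nh = reinterpret_cast<nlmsghdr *>(buf);

    if (!NLMSG_OK(nh, len))
        return DumpState::Error;            // truncated or malformed header

    for (; NLMSG_OK(nh, len); nh = NLMSG_NEXT(nh, len)) {
        if (nh->nlmsg_type == NLMSG_DONE)
            return DumpState::Done;         // kernel signals end of dump
        if (nh->nlmsg_type == NLMSG_ERROR)
            return DumpState::Error;        // kernel reported an error
        if (nh->nlmsg_type == RTM_NEWROUTE)
            ++routes;                       // one routing table entry
    }

    return DumpState::More;                 // dump continues in next datagram
}
```

A caller would typically call recv() into a buffer, pass the buffer and the returned length to `walkDatagram`, and repeat while the result is `DumpState::More`.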
Posh1698
Posted ...

(FYI: I let Claude run over the code; I am not a C++ developer.)

Thank you for the quick response.

Regarding your question about running dockerd with zero or one virtual interface: stopping dockerd does not remove the bridge interfaces or their routes; they persist until reboot. We have not yet tested with only one interface, but the comparison with our second machine (same 6.12 kernel series, clean routing table, Bluetit works) strongly suggests the route count is the deciding factor.

Regarding the buffer, we believe we have identified the precise failure mechanism. The socket is opened as SOCK_DGRAM (PF_NETLINK, SOCK_DGRAM). The kernel delivers the RTM_GETROUTE dump as multiple datagrams of approximately 4096 bytes each. The code at line 369 accumulates these into a fixed 8192-byte stack buffer:

https://gitlab.com/AirVPN/AirVPN-Suite/-/blob/master/src/network.cpp#L369

The failure sequence:
- Datagram 1: recv() returns ~4096 bytes → msgLen = 4096, buffer has 4096 bytes remaining
- Datagram 2: recv() returns ~4096 bytes → msgLen = 8192, buffer has 0 bytes remaining
- Datagram 3: recv(sock, bufPtr, sizeof(msgBuf) - msgLen) = recv(sock, bufPtr, 0) → returns 0
- NLMSG_OK(nlHdr, 0) is false, because the remaining length (0) is smaller than a netlink header (16 bytes) → throws "Received invalid packet from socket"

A system with few routes requires only 1-2 datagrams and fits comfortably in the 8192-byte buffer. A system with many routes (15+ Docker bridge interfaces) requires 3+ datagrams and hits this limit.

The fix would be to use a dynamically allocated buffer that grows as needed. (Switching to SOCK_RAW is unlikely to help on its own: according to netlink(7), the netlink protocol does not distinguish between datagram and raw sockets.)

We are happy to provide any additional information or testing to help reproduce and fix this.
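The arithmetic of the failure sequence above can be checked with a small simulation that uses no real sockets. It assumes the buffer handling is exactly as described in this post, which has not been verified against the actual source. All names (`fakeRecv`, `firstFailureAccumulating`, `firstFailurePerDatagram`) are hypothetical. The second function sketches one possible alternative, reusing the buffer for each datagram, which relies on the assumption that a netlink message never spans two datagrams.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simulated recv(): delivers at most `capacity` bytes of the next datagram.
// As with a real datagram socket, a read into a too-small buffer is
// truncated, and a read into a zero-length buffer yields 0 bytes.
static std::size_t fakeRecv(std::size_t datagramSize, std::size_t capacity)
{
    return std::min(datagramSize, capacity);
}

// Accumulate all datagrams into ONE fixed buffer of `bufSize` bytes.
// Returns the 1-based index of the first datagram that could not be read
// in full, or 0 if every datagram fit.
std::size_t firstFailureAccumulating(const std::vector<std::size_t> &datagrams,
                                     std::size_t bufSize)
{
    std::size_t msgLen = 0;                       // bytes stored so far

    for (std::size_t i = 0; i < datagrams.size(); ++i) {
        assert(msgLen <= bufSize);
        std::size_t n = fakeRecv(datagrams[i], bufSize - msgLen);
        if (n < datagrams[i])
            return i + 1;                         // short or empty read
        msgLen += n;
    }
    return 0;
}

// Alternative: read each datagram into the start of the buffer, parse it,
// then reuse the buffer. Only a single oversized datagram can fail.
std::size_t firstFailurePerDatagram(const std::vector<std::size_t> &datagrams,
                                    std::size_t bufSize)
{
    for (std::size_t i = 0; i < datagrams.size(); ++i) {
        std::size_t n = fakeRecv(datagrams[i], bufSize);
        if (n < datagrams[i])
            return i + 1;
    }
    return 0;
}
```

With three 4096-byte datagrams and an 8192-byte buffer, `firstFailureAccumulating` reports the third datagram as the failing one, matching the sequence listed above, while `firstFailurePerDatagram` reports no failure.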