Walid MOGHRABI
2018-08-13 10:34:00 UTC
package: x2gobroker-agent
version: 0.0.4.0-0~1038~ubuntu16.04.1
priority: bug
Since the latest fixes I no longer get a "0" value, so the loadchecker process no longer crashes, but something is still strange.
Here is a fragment of my loadchecker logs from this morning.
To give you some context: I have 22 servers which are all started automatically at 6 AM (wake-on-LAN), and they are strictly identical (blade servers with the same CPU, memory amount, BIOS version, ...).
I checked our monitoring to see whether users were correctly distributed over the farm. At 7:30 AM, I had about 7 or 8 users connected, but 4 of them were on tce-server-21, whereas I would have expected 1 user on each of 8 servers.
Here is the loadchecker log fragment:
***@tce-manager-01 [~] # grep -B 1 'loadavgXX:1;' /var/log/x2gobroker/loadchecker.log
...
2018-07-24 07:15:01,200 - loadchecker - INFO - Executing agent command on remote host tce-server-21 (10.50.0.221): sh -c '/usr/lib/x2go/x2gobroker-agent foo checkload'
2018-07-24 07:15:01,622 - loadchecker - INFO - Broker agent answered: OK; loadavgXX:1; memAvail:23684; myMemAvail:23810; numCPU:16; typeCPU:2400;
--
2018-07-24 07:17:50,354 - loadchecker - INFO - Executing agent command on remote host tce-server-21 (10.50.0.221): sh -c '/usr/lib/x2go/x2gobroker-agent foo checkload'
2018-07-24 07:17:50,779 - loadchecker - INFO - Broker agent answered: OK; loadavgXX:1; memAvail:23686; myMemAvail:23812; numCPU:16; typeCPU:2400;
--
2018-07-24 07:20:32,550 - loadchecker - INFO - Executing agent command on remote host tce-server-21 (10.50.0.221): sh -c '/usr/lib/x2go/x2gobroker-agent foo checkload'
2018-07-24 07:20:32,964 - loadchecker - INFO - Broker agent answered: OK; loadavgXX:1; memAvail:23683; myMemAvail:23809; numCPU:16; typeCPU:2400;
--
2018-07-24 07:23:21,610 - loadchecker - INFO - Executing agent command on remote host tce-server-21 (10.50.0.221): sh -c '/usr/lib/x2go/x2gobroker-agent foo checkload'
2018-07-24 07:23:22,034 - loadchecker - INFO - Broker agent answered: OK; loadavgXX:1; memAvail:23685; myMemAvail:23811; numCPU:16; typeCPU:2400;
--
2018-07-24 07:26:03,872 - loadchecker - INFO - Executing agent command on remote host tce-server-21 (10.50.0.221): sh -c '/usr/lib/x2go/x2gobroker-agent foo checkload'
2018-07-24 07:26:04,286 - loadchecker - INFO - Broker agent answered: OK; loadavgXX:1; memAvail:23684; myMemAvail:23809; numCPU:16; typeCPU:2400;
--
2018-07-24 07:28:52,917 - loadchecker - INFO - Executing agent command on remote host tce-server-21 (10.50.0.221): sh -c '/usr/lib/x2go/x2gobroker-agent foo checkload'
2018-07-24 07:28:53,338 - loadchecker - INFO - Broker agent answered: OK; loadavgXX:1; memAvail:23684; myMemAvail:23809; numCPU:16; typeCPU:2400;
--
2018-07-24 07:31:35,252 - loadchecker - INFO - Executing agent command on remote host tce-server-21 (10.50.0.221): sh -c '/usr/lib/x2go/x2gobroker-agent foo checkload'
2018-07-24 07:31:35,670 - loadchecker - INFO - Broker agent answered: OK; loadavgXX:1; memAvail:23685; myMemAvail:23811; numCPU:16; typeCPU:2400;
--
2018-07-24 07:34:24,424 - loadchecker - INFO - Executing agent command on remote host tce-server-21 (10.50.0.221): sh -c '/usr/lib/x2go/x2gobroker-agent foo checkload'
2018-07-24 07:34:24,842 - loadchecker - INFO - Broker agent answered: OK; loadavgXX:1; memAvail:23683; myMemAvail:23809; numCPU:16; typeCPU:2400;
As you can see, there is only 1 server with loadavgXX = 1 (which means that we in fact got a zero value from the broker agent).
This is not normal: at 7:34, there were already 4 users connected to this server, while most of my other servers were empty.
Restarting x2gobroker-loadchecker service fixed the issue.
I think there is a problem in retrieving this information. Even memAvail seems strange to me on this server: with 4 connected users, it should have been lower than that.
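To make this kind of check repeatable, here is a minimal sketch that scans loadchecker log lines and flags hosts that keep reporting loadavgXX:1 over several consecutive polls. The function names and the `min_repeats` threshold are my own, not part of x2gobroker, and the sketch assumes that each "Executing agent command" line is directly followed by the matching "Broker agent answered" line, as in the fragment above.

```python
import re

# "key:number;" pairs as they appear in a "Broker agent answered" line.
PAIR_RE = re.compile(r"(\w+):(\d+);")
# Host name in an "Executing agent command on remote host ..." line.
HOST_RE = re.compile(r"remote host (\S+) \(")
ANSWER_MARK = "Broker agent answered:"


def parse_answer(line):
    """Return the numeric fields of a 'Broker agent answered' line as a dict."""
    _, _, payload = line.partition(ANSWER_MARK)
    return {key: int(value) for key, value in PAIR_RE.findall(payload)}


def stuck_hosts(lines, min_repeats=3):
    """Return hosts that reported loadavgXX == 1 in at least
    min_repeats consecutive polls (hypothetical threshold)."""
    current_host = None
    streak = {}
    flagged = set()
    for line in lines:
        match = HOST_RE.search(line)
        if match:
            current_host = match.group(1)
            continue
        if ANSWER_MARK in line and current_host is not None:
            if parse_answer(line).get("loadavgXX") == 1:
                streak[current_host] = streak.get(current_host, 0) + 1
                if streak[current_host] >= min_repeats:
                    flagged.add(current_host)
            else:
                # Any other value resets the streak for this host.
                streak[current_host] = 0
            current_host = None
    return sorted(flagged)
```

Usage would be along the lines of `stuck_hosts(open("/var/log/x2gobroker/loadchecker.log"))`. If the loadchecker ever interleaves commands and answers from different hosts, the pairing logic above would need to be adapted.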
I also think the number of connected users should be taken into account when calculating the load factor (maybe this is already the case, I am not sure).
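To illustrate what I mean, here is a purely hypothetical sketch. It is not how x2gobroker computes its load factor (I do not know the actual formula); it only assumes that a higher load factor means a more attractive server, and scales it down for each session already running. The `penalty` weight, the function names, and the second host name are made up for the example.

```python
def adjusted_load_factor(base_factor, sessions, penalty=0.25):
    """Scale a 'higher is better' load factor down as sessions accumulate.

    base_factor: load factor as computed today (assumed higher = better).
    sessions:    number of sessions currently running on the server.
    penalty:     hypothetical weight applied per running session.
    """
    if sessions < 0:
        raise ValueError("sessions must not be negative")
    return base_factor / (1.0 + penalty * sessions)


def pick_server(servers):
    """Return the host with the highest adjusted load factor.

    servers maps a host name to a (base_factor, sessions) tuple.
    """
    return max(servers, key=lambda host: adjusted_load_factor(*servers[host]))


# A server stuck at a high base factor but already hosting 4 sessions
# loses against an empty server with a slightly lower base factor.
farm = {"tce-server-21": (1000, 4), "tce-server-01": (900, 0)}
print(pick_server(farm))
```

With something like this, a server that keeps reporting a stale, overly optimistic load would still stop attracting new sessions once a few users are connected to it.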