- Priority - Low
- Affecting Server - RBX8
We've blocked outgoing ports 25 and 26 on RBX8 (server8) to prevent spam from being sent.
Normal email flow won't be affected, since it goes through another port to our outgoing mail relay.
If you connect to external SMTP servers, please use port 587.
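For scripts that send mail through an external provider, here is a minimal sketch of submitting a message on port 587 with Python's standard `smtplib`. It assumes the provider accepts authenticated submission and supports STARTTLS on that port; `smtp.example.com`, the addresses and the credentials are placeholders, not real settings.

```python
import smtplib
import ssl
from email.message import EmailMessage

# Outgoing ports 25/26 are blocked on server8; 587 is the mail submission port.
SUBMISSION_PORT = 587


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build a plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_submission(host: str, username: str, password: str,
                        msg: EmailMessage, port: int = SUBMISSION_PORT) -> None:
    """Send msg through an external SMTP server on the submission port,
    upgrading the connection with STARTTLS before authenticating."""
    context = ssl.create_default_context()
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.ehlo()
        smtp.starttls(context=context)
        smtp.ehlo()  # re-identify after the TLS upgrade
        smtp.login(username, password)
        smtp.send_message(msg)


if __name__ == "__main__":
    message = build_message("you@example.com", "friend@example.org",
                            "Test", "Sent via port 587.")
    # Placeholder host and credentials: replace with your provider's details.
    # send_via_submission("smtp.example.com", "you@example.com", "secret", message)
    print(message["Subject"])
```

Upgrading with STARTTLS before `login()` keeps the credentials off the wire in plain text; check your provider's documentation in case it expects implicit TLS on a different port instead.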
- Date - 13/03/2018 10:59
- Last Updated - 13/03/2018 11:00
- Priority - Critical
- Affecting System - All servers
Due to recently discovered security vulnerabilities in many x86 CPUs, we have to upgrade kernels across our infrastructure and reboot our systems.
We've already patched a few systems where the software update is available. For our hosting infrastructure, we're waiting until the kernel has been released to "production" and has been in production for roughly 48 hours, to ensure stability.
We'll reboot systems one by one during the evenings. We don't have a specific start date yet, but downtime is expected: hopefully only 5-10 minutes per server if no issues occur.
Servers might be down longer depending on how each system behaves during the reboot, but we'll do everything we can to prevent reboot issues like the one we recently had with server3a.
This post will be updated as we patch our web servers; other infrastructure is patched in the background where there's no direct customer impact.
The patches introduce a slight performance degradation in the kernel. The actual impact varies with each server's workload, so we're unsure what effect it will have on individual customers; we will monitor it after patching.
Update 05/01/2018 5.23pm:
We'll update a few servers this evening. This update fixes 2 of the 3 vulnerabilities, so we'll have to reboot the servers again next week once the remaining update is available.
We try to keep downtime to an absolute minimum, but given the impact of these vulnerabilities, we would rather perform the additional reboot of our infrastructure to keep the systems secure.
We're sorry for the inconvenience caused by this.
Update 05/01/2018 6.25pm:
We'll patch as many servers as possible this evening. If there are no surprises (e.g. servers that fail to boot), everything should be patched fairly quickly. We'll start with the highest-numbered server and work down to the lowest, in the following order:
These 6 servers are the only ones that directly impact customers, which is why these restarts are performed in the evening (after 10pm) to minimize the impact on visitors.
Other services, such as the support system, mail relays, statistics and backups, will be rebooted as well; where possible, we'll redirect traffic to other systems.
Expected downtime per host is roughly 5 minutes if the kernel upgrades go as planned. Longer downtime can occur if a system ends up in a state where we have to recover it manually.
Update 05/01/2018 8.34pm:
server4.hosting4real.net is postponed until tomorrow (06/01/2018) at the earliest, since the CloudLinux kernel is still in "beta". Depending on the outcome, we'll either perform the upgrade tomorrow or postpone it to Sunday.
For the other servers, we plan to start today at 10pm with server8, then proceed with server7 and so on.
Update 05/01/2018 9.57pm: We start with server8 in a few minutes.
Update 05/01/2018 10.07pm: Server8 done with 4 minutes of downtime - proceeding with server7.
Update 05/01/2018 10.15pm: Server7 done with 3-4 minutes of downtime - proceeding with server6.
Update 05/01/2018 10.39pm: Server6 done with 9 minutes of downtime (high PHP/Apache load) - we have to redo server7, since the microcode wasn't applied.
Update 05/01/2018 11.00pm: Server5 done with 3 minutes of downtime - proceeding with server3a.
Update 05/01/2018 11.13pm: Server3a done with 5 minutes of downtime - we'll proceed with server4 tomorrow, when the CloudLinux 6 patch is expected to be available.
Update 05/01/2018 11.49pm: Server5 experienced an issue with MySQL. The LVE mounts were mounted before the MySQL partition (/var/lib/mysql) instead of after it. As a result, sites connecting to MySQL via the socket (which most sites do) could not connect, while sites connecting via 127.0.0.1 connected just fine.
The monitoring page we run on every server did not check that MySQL accepts both TCP and socket connections. As a result, the monitoring system didn't detect this error and did not trigger an alarm.
We'll update our monitoring page with an additional check that connects both via TCP and via the socket; we expect this change to be completed by noon tomorrow.
We're sorry for the inconvenience caused by the extended downtime on server5.
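A minimal sketch of the kind of dual TCP/socket check mentioned in this update, using only Python's standard library. The endpoint 127.0.0.1:3306 and the socket path `/var/lib/mysql/mysql.sock` are assumptions (common MySQL defaults), not our actual configuration, and the function names are hypothetical. It only verifies that something accepts connections on each transport; a production check would also authenticate and run a query with a MySQL client.

```python
import socket

MYSQL_TCP = ("127.0.0.1", 3306)             # assumed default MySQL TCP endpoint
MYSQL_SOCKET = "/var/lib/mysql/mysql.sock"  # assumed socket path


def check_tcp(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_unix_socket(path: str, timeout: float = 2.0) -> bool:
    """Return True if a connection to the Unix domain socket at path succeeds."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:  # missing socket file, refused connection, timeout
        return False
    finally:
        sock.close()


def mysql_status(tcp=MYSQL_TCP, unix_path=MYSQL_SOCKET) -> dict:
    """Run both checks. The overall status is OK only if both transports work,
    so a socket-only failure (like the server5 incident) still raises an alarm."""
    results = {
        "tcp": check_tcp(*tcp),
        "socket": check_unix_socket(unix_path),
    }
    results["ok"] = results["tcp"] and results["socket"]
    return results


if __name__ == "__main__":
    print(mysql_status())
```

Reporting each transport separately, rather than a single "MySQL up" flag, makes it obvious from the monitoring output which connection path has failed.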
Update 08/01/2018 8.47pm: We'll patch server4 today, starting at 10pm. We'll try to keep downtime as short as possible; however, the change required here is slightly more complicated, which increases the risk.
We're still waiting for some microcode updates that we have to apply to all servers once they're available; we hope they will arrive by the end of the week.
Update 08/01/2018 9.58pm: We'll start the update of server4 in a few minutes.
Update 08/01/2018 10.15pm: We're reverting to the old kernel, since the new kernel has boot issues. We're currently loading the rescue image to boot the old kernel.
Update 08/01/2018 11.02pm: While we're trying to get server4 back online, we've initiated our backup procedure and started restoring accounts from the latest backup onto another server, to get customers back online as fast as possible.
Update 08/01/2018 11.22pm: We've restored about 10% of the accounts on a new server.
Update 09/01/2018 00.56am: Information about the outage of server4 can be found here: https://shop.hosting4real.net/serverstatus.php?view=resolved - with title "Outage of server4 (Resolved)"
Update 10/01/2018 06.31am: A new microcode version will soon be released to fix more vulnerabilities. When it's ready, we'll update a single server (server3a) to verify that it enables the new features.
If the features are enabled, we'll upgrade the remaining servers (excluding server4) 24 hours later.
Update 16/01/2018 8.32am: We will perform a microcode update on server3a today to apply a fix for the CPU. This requires a reboot, with an expected downtime of about 3-5 minutes. We will reboot after today's hosting account migrations, which start at 9pm, so the server3a update will happen around 9.30pm or 10pm.
Update 16/01/2018 9.28pm: We'll reboot server3a.
Update 16/01/2018 9.33pm: Server has been rebooted - total downtime of 2 minutes.
- Date - 05/01/2018 07:00 - 19/01/2018 23:59
- Last Updated - 16/01/2018 21:33