I won’t sugarcoat this.
The recent batch of speculative execution flaws, nicknamed Meltdown and Spectre, will likely stand as one of the most widespread and all-encompassing security events in modern history. Notice that I didn’t say they are the worst flaws we’ve ever seen. If you are keeping up with the constant news updates on the topic, you will find many system and security professionals recommending that you remain calm while still being active and systematic. I agree with that recommendation. This isn’t an attempt to hand-wave away or hide the severity of the situation – but it is important to understand how these flaws affect you and your company and to develop a plan accordingly.
If you aren’t familiar with how speculative execution works and why it has become a problem, I recommend pausing for a moment to read this excellent explanation from Linux operating system provider Red Hat – https://www.redhat.com/en/blog/what-are-meltdown-and-spectre-heres-what-you-need-know
If you prefer the CliffsNotes version, I will summarize.
Modern processors make smart guesses about the work they will need to do next, and carry it out ahead of time, to perform tasks as quickly as possible. However, given the right circumstances, this speculative work can unintentionally leak data from memory. Because of this, computer systems are now at risk of exposing potentially confidential data to malicious software. This data could include everything from passwords to personally identifiable information (PII).
Despite this, it is important to be clear about a few things:
First, this is not a flaw that can be exploited remotely on its own. Unlike other security events we’ve had recently – like Heartbleed (http://heartbleed.com/) – this kind of attack cannot be carried out across the Internet without an additional interaction. Successfully reading privileged data from an unpatched system requires an executable to run inside that system (we’ll get to shared servers in a minute!). In the world of IT security, remote code execution (RCE) vulnerabilities always rank highest, and by themselves neither Meltdown nor Spectre qualifies for this classification.
Secondly, these attacks are very complicated to carry out. An attacker would first have to succeed with some other exploit, through a known RCE or a social engineering attempt. Assuming they were successful, they would then need to read chunks of memory from an unpatched system. Reading memory like this requires processing a considerable amount of data, especially without knowing exactly what to look for or when to look for it. Additionally, because the flaw is bound to the physical processor the malicious code runs on, there is no way for this attack to read directly from other physical systems and their processors in any environment. All this means that any actual attack would likely be extremely targeted. Again, this does not mean that you shouldn’t be concerned, but the contents of memory read in real time amount to a substantial volume of information to parse through.
So, yes, there is a real flaw that is a concern and you do need to take action.
After going through all this information and understanding what we are dealing with, I believe there are two classifications of systems that should be prioritized:
1. Cloud servers hosted with hyperscalers such as Amazon AWS or Microsoft Azure.
2. Desktops and personal devices (phones, laptops, tablets)
Let’s talk about cloud servers.
The Red Hat article calls out this same concern, and for good reason.
- These flaws can read data through the physical processor, which breaks the normal isolation a virtual machine is held in – meaning one virtual server may read the contents of another virtual server on the same physical host.
- In cloud platforms this would mean that if one customer’s server is successfully exploited then it could read information from a different customer’s system. This is obviously a huge concern and it’s why the big hyperscalers like AWS and Azure were granted early access to details of these flaws and then rushed to patch.
The Zumasys Cloud is built on Intel platforms running VMware and Microsoft products and so, like everyone else, we have work to do to get our systems patched and protected.
- However, unlike the hyperscalers, The Zumasys Cloud does not have an automated online system that allows clients (or an unknown entity) to purchase and build VMs. There are no “swipe and go” instant-on servers anywhere in our environment.
- In the world of big hyperscalers, where anyone with a valid credit card can build one or one hundred VMs anywhere, this is a much larger concern because it is nearly impossible to control who can run VMs on the platform without fundamentally breaking how normal, well-intentioned customers interact with it.
In a separate communication we will outline Zumasys’ plan and timeline and will communicate how we are rolling out our mitigation response.
It is important to understand that the risk levels are different between Zumasys’ virtual private cloud design model and the hyperscalers of the world like AWS and Azure.
- The current best practices for protecting desktops and personal devices are the same as they have always been:
- Educate your users on identifying phishing scams and malicious links; use multi-factor authentication for sensitive logins; filter web requests against malware or virus sites.
These basic strategies still work because, as stated above, the remote attack vector is unchanged.
OK, so you know what you need to do, and you are ready to build an action plan, right?
But there is one last sticking point – performance impact after patching. Early patch adopters have reported performance degradation as high as 40% immediately after making the recommended changes. Will the patch have a negative performance impact on your environment? The honest answer right now is “it depends.” There is no way to predict every system’s needs, but the feedback so far suggests that database servers will likely be impacted the most.
The best advice is to plan to patch all systems and be sure that there is a roll-back plan for those that are performance-critical. So far, these performance impacts are happening when the operating system and/or application itself is patched.
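One way to make the roll-back decision concrete is to measure a workload before and after patching and compare the two. The following is only a minimal sketch: the function names and the 10% review threshold are hypothetical choices for illustration, and the throughput numbers would come from your own benchmarks.

```python
def regression_pct(baseline: float, patched: float) -> float:
    """Percent drop in throughput after patching (higher throughput is better).

    A negative result means the patched system was faster.
    """
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return (baseline - patched) * 100.0 / baseline


def needs_rollback_review(baseline: float, patched: float,
                          threshold_pct: float = 10.0) -> bool:
    """Flag a system for roll-back review.

    threshold_pct is a hypothetical cut-off; pick one that fits
    how performance-critical the system is.
    """
    return regression_pct(baseline, patched) >= threshold_pct


if __name__ == "__main__":
    # Hypothetical numbers: transactions per second before and after patching.
    before, after = 1000.0, 600.0
    print(f"regression: {regression_pct(before, after):.1f}%")
    print("review roll-back plan:", needs_rollback_review(before, after))
```

Running the same benchmark under the same load before and after the patch matters more than the exact threshold, since the impact reported so far varies widely by workload.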
There is an abundance of information out there regarding which vendors and systems are affected and what you should be doing to protect your IT infrastructure. Below is an aggregation of the most current and important articles we’ve read.
While this started with Intel/ARM, there are potential vulnerabilities for both AMD and AIX POWER systems. Additionally, some of the patches may have unknown side effects on AMD systems.
Some general insight on how different processors and versions of Windows may perform after patching:
These are the two VMware articles that are absolute must-reads for anyone running vSphere environments.
ESXi and the microcode updates in this patch will be one of your main lines of defense!
It is also very important to note the potential impact on vMotion/EVC and the classification of these patches in VUM.
The primary source for Microsoft patching. This article is updated frequently and now includes info on how to check systems via PowerShell.
Microsoft halted patch deployment because of incompatibility with AV vendors. You may need to adjust or amend your deployment strategy for it to complete successfully.
In the Linux world there are many variants depending on your distribution source, but you should be watching for the latest kernel patches against 4.4 and 4.13.
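On Linux, patched kernels report their mitigation status as small text files under /sys/devices/system/cpu/vulnerabilities, with contents such as "Not affected", "Vulnerable" or "Mitigation: PTI". Below is a minimal Python sketch that reads them. It assumes your kernel exposes that directory (older, unpatched kernels do not), and the function names and category labels are purely illustrative.

```python
from pathlib import Path

# Directory where patched Linux kernels publish per-flaw status files.
SYSFS_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def classify(status: str) -> str:
    """Map a kernel status line to a coarse, illustrative category."""
    text = status.strip()
    if text.startswith("Not affected"):
        return "not_affected"
    if text.startswith("Mitigation"):
        return "mitigated"
    if text.startswith("Vulnerable"):
        return "vulnerable"
    return "unknown"


def read_vulnerabilities(base_dir=SYSFS_DIR) -> dict:
    """Return {file name: (category, raw text)} for each status file.

    Returns an empty dict when the directory is missing, which usually
    means the kernel predates this reporting interface.
    """
    base = Path(base_dir)
    if not base.is_dir():
        return {}
    report = {}
    for entry in sorted(base.iterdir()):
        if entry.is_file():
            raw = entry.read_text().strip()
            report[entry.name] = (classify(raw), raw)
    return report


if __name__ == "__main__":
    report = read_vulnerabilities()
    if not report:
        print("No status files found; this kernel may not be patched yet.")
    for name, (category, raw) in report.items():
        print(f"{name}: {category} ({raw})")
```

An empty result is not proof that a system is safe or unsafe; treat it as a prompt to check your distribution's advisory for the kernel version you are running.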
A useful breakdown of the potential impact to MSSQL performance and advice on the patching process.
Here is a comprehensive breakdown of all Google products (including Android and Cloud services) with their resolution status.
The Browser Breakdown – the patch status and defensive mechanisms of each browser explained:
Apple’s consolidated blog post for all products
And finally – the genesis for all of this, with all the PhD-level research