Wednesday, April 12, 2017

SharePoint 2016 - Zero-Downtime Patching - The biggest improvements

 

Disclaimer: this article only covers single aspects of the upgrade process. For a complete list of upgrade approaches including all details please check the following articles:

Patching SharePoint servers has always been a challenge. A major culprit for administrators is the fact that installing SharePoint fixes will cause downtime to the SharePoint farm. For SharePoint Server 2016 it was announced that we will have zero-downtime patching. This article intends to outline the reasons for downtimes with previous versions of SharePoint and the changes implemented in SharePoint Server 2016 to allow to perform zero-downtime patching including its constraints.

Historically this downtime can occur in two different steps:

  1. Installing the binaries
  2. Running the SharePoint configuration wizard.

Installing the binaries

While installing the actual binaries it is required to stop/restart SharePoint specific windows services and IIS websites to ensure that the updated files (e.g. dlls) are loaded.

The server specific downtime usually starts with running the hotfix executable till the hotfix installation is done.

With the huge hotfix files we are used to from SP2013 this can take several hours. Most of this time accounts for the extraction of the exe and cab file and then applying the various installer msp files which are included in the hotfix package.

Russ Maxwell has created a script which helps to significantly reduce the amount of time required to install the actual fixes but you will still experience a downtime on the specific server during the installation of the fix.

To overcome this issue we recommended to have high availability per role in your farm (at least two servers per role) to ensure that you can patch one server while the other one is online and continues to serve the requests. That means during patching you can remove one server from load balancing, patch it and add it back to load balancing. Then repeat the same steps for the other server(s).

While the patches are applied using this steps your farm will have servers with two different patch levels but that is fine as SharePoint will run in backward compatible mode if the database schema is older than the patch level of the servers in the farm.

So using redundant servers, you can achieve zero-downtime during the hotfix installation even with SharePoint 2013.

After the patch has been installed on all server you need to run the SharePoint configuration wizard.

Running the SharePoint configuration wizard

The SharePoint configuration wizard needs to be run on all machines in the farm to complete the hotfix installation.

You can use either the command line version (psconfig.exe) or the UI version (psconfigui.exe). I have discussed benefits and caveats of using the different methods in a separate article.

It is recommended to first run the configuration wizard on one of the app servers in the farm to perform the database upgrade. After this step is completed the configuration wizard can be started on all the other servers in the farm to perform the server specific upgrade operations (copy dlls, install features, apply security settings, …) on each of these servers (again you should remove the servers where psconfig is running from load balancing as services are restarted).

With SharePoint 2013 and before the database upgrade step will cause downtime on all servers in the farm as psconfig is updating databases which are used by all servers in the SharePoint farm. Depending on the specific updates applied to the database stored procedures, views, triggers and constraints could be dropped and recreated and other database content could be updated. SQL Queries issued by the SharePoint servers during this upgrade can fail if (e.g.) a stored procedure was called while the upgrade job was dropping it to replace it with an updated version or a fix contains (e.g.) two different changes to two different stored procedures where the changes depend on each other but only one stored proc was updated when the request comes in.

These limitations could also cause failed upgrades or excessive slowdowns in the upgrade process because of resource contention and locking. For these reasons accessing SharePoint content while the database is upgraded is unsupported and untested.

A partial workaround for this exists in SharePoint 2013 for updating the content database: After installing the patch and before running the SharePoint configuration wizard customers can run

Upgrade-SPContentDatabase -UseSnapshot …

Upgrade-SPContentDatabase can perform the same upgrade for a content database as the SharePoint configuration wizard. The benefit of using this Cmdlet is that you can run it in parallel in several powershell windows against different content databases. That can help to significantly reduce the database upgrade time after installing a hotfix as it allows to upgrade several content databases in parallel. The SharePoint configuration wizard would upgrade the content databases sequentially one after the other.

Another benefit is the -UseSnapshot parameter listed above.

This parameter will create a snapshot of the specified database and then perform all upgrade operations that apply to the database. Existing connections to the content database from the different SharePoint servers in the farm will be set to use the snapshot for the duration of the upgrade, and then switched back after successful completion of upgrade. Be aware that this parameter can only be used with versions of SQL Server that support creation and use of snapshots, for example, SQL Server Enterprise edition.

As all regular SharePoint operations are executed against the snapshot while the content database is upgraded the problems listed above will not occur. The caveat here is that the snapshot is read-only. That means although it prevents the above listed problems related to dropping stored procedures while they are in use they will prevent all write operations to the SQL database.

After upgrading the SharePoint content databases you still have to run the SharePoint configuration wizard to upgrade all the other SharePoint databases – and this can also cause downtime to services accessing these databases during the upgrade.

So it is not possible to achieve zero-downtime for read-write operations during the hotfix installation with SharePoint 2013.

 

Changes in SharePoint Server 2016

With SharePoint Server 2016 the amount of MSP files has been significantly reduced. This will ensure that the time to patch the servers will be much shorter. Aside of that you still have to ensure that you have more than one server per role (high availability) to guarantee zero-downtime patching as the windows services have to be restarted during patching. MinRole is not required for this! All the details on how to technically apply the patches can be found in the following Technet article:

The biggest improvements over SP2013 are on the upgrade side. Several improvements have been made to ensure that upgrading the SharePoint databases does not lead to a downtime for the end user. These changes include changes on how the changes are applied but also restricting what changes are allowed to be done in a hotfix request. E.g. all stored procedures have to be backward compatible to ensure that if one stored procedure is updated with a hotfix it can still be called from older stored procedures which are updated through a later step in the update cycle. We also update the stored procedures without dropping them to prevent outages if a stored proc is called while it is in the limbo state between drop and recreation.

These changes are long tested in SharePoint online where upgrades are performed for all SharePoint online servers every couple of weeks while the service is live without any read-only windows for customers.

These improvements apply as well to content databases and to other SharePoint databases that need to be upgraded. You can still use Upgrade-SPContentDatabase to speed up the database upgrade step but using the -SnapShot parameter will not bring a benefit when trying to minimize the downtime. It can actually be counter-productive as it leads to a read-only content database.

With the improvements implemented and using redundant servers for each role it is possible to achieve zero-downtime during the hotfix installation with SharePoint Server 2016.

No comments:

Post a Comment