This week I hit an issue where the ConfigMgr cloud management gateway (CMG) stopped communicating with the primary site. The CMG essentially looked dead, even though the site itself looked completely healthy. Microsoft is still investigating the root cause, but we found a workaround that did not lose any of the devices that were connected only over the internet.
SMS_CLOUD_ProxyConnector.log contained the errors below. These were strange, as this is a single-instance CMG, so it should only be communicating over port 443.
ERROR: Failed to build Tcp connection <GUID> with server XXXX.XXX.COM:10140. Exception: System.Net.WebException: Unexpected status code End from proxy server than Continue~~ at
ERROR: Failed to build Http connection <GUID> with server XXXX.XXX.COM:443. Exception: System.Net.WebException: The remote server returned an error: (990) BGB Session Ended.~~ at System.Net.HttpWebRequest.GetResponse()~~ at
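To see at a glance which endpoints the connector was failing against, the log can be scanned for these messages. Below is a minimal sketch in Python, assuming the message format matches the excerpt above. The function name `failed_endpoints` and the regular expression are my own and are not part of ConfigMgr.

```python
import re

# Pattern inferred from the single log excerpt above (an assumption):
# "ERROR: Failed to build <Tcp|Http> connection <id> with server <host>:<port>"
FAILURE_RE = re.compile(
    r"ERROR: Failed to build (?P<kind>\w+) connection \S+ "
    r"with server (?P<host>[^:\s]+):(?P<port>\d+)"
)


def failed_endpoints(log_text):
    """Return (kind, host, port) for every connection failure found in log_text."""
    return [
        (m.group("kind"), m.group("host"), int(m.group("port")))
        for m in FAILURE_RE.finditer(log_text)
    ]


# The redacted excerpt from SMS_CLOUD_ProxyConnector.log shown above.
sample = (
    "ERROR: Failed to build Tcp connection <GUID> with server XXXX.XXX.COM:10140. "
    "Exception: System.Net.WebException: Unexpected status code End from proxy "
    "server than Continue~~ at "
    "ERROR: Failed to build Http connection <GUID> with server XXXX.XXX.COM:443. "
    "Exception: System.Net.WebException: The remote server returned an error: "
    "(990) BGB Session Ended."
)

for kind, host, port in failed_endpoints(sample):
    print(f"{kind} connection to {host}:{port} failed")
```

Run against the excerpt, this lists both the 10140 and the 443 attempt, which is what made the error look odd for a single instance.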
Microsoft support advised us that this error indicated that IIS on the CMG had tried to update its certificate information but then failed to start.
ERROR: System.ArgumentException: No TimeStamp node~~ at Microsoft.ConfigurationManager.AzureRoles.ProxyService.ProxyConnectorInfo.Parse(String xml)~~ at Microsoft.ConfigurationManager.AzureRoles.ProxyService.WebRole.HandleProxyConnectorInfoChanges(String connectorInfoXml)~~ at Microsoft.ConfigurationManager.AzureRoles.ProxyService.WebRole.OnStart() CMGSetup 7/4/2018 12:43:48 AM 6 (0x0006)
CMG Event Log
When we checked the event log on the CMG, it showed errors relating to MSDTC and to the W3WP process crashing. This is still being investigated, but it is never a good sign on a server that you technically shouldn’t need to touch.
Log Name: Application
Source: Microsoft-Windows-MSDTC Client 2
Date: 7/4/2018 12:43:31 AM
Event ID: 4879
Task Category: CM
Level: Warning
Keywords: Classic
User: N/A
Computer: RD00155D52B74C
Description: MSDTC encountered an error (HR=0x80000171) while attempting to establish a secure connection with system RD00155D52B74C.
The primary concern with the CMG and the internet-only devices was to make sure we didn’t lose connectivity to them. That connectivity depends primarily on the MutualAuth URL, which contains an ID that clients connect to. So to get around the issue, we pressed the big red button and rebuilt the CMG without losing the MutualAuth…
- Made a note of the MutualAuthPath
SELECT [MutualAuthPath] FROM [dbo].[vProxy_Roles] WHERE [RoleTypeID] = '6'
- Copied the proxy settings
SELECT * FROM PROXYSETTINGS
- Provisioned a new CMG (leaving the connection point role intact)
- Flipped the connection point on the primary server to the new CMG from the drop-down
- Made a note of the MutualAuth and ProxyConnectionInfo (still the same ID, just a different server name)
- Removed the original CMG
- Confirmed MutualAuth hadn’t changed
- Recreated the original CMG (note: check the cloud service in Azure, as it disappears from ConfigMgr but I have seen it not completely remove the resource)
- Flipped the connection point back to the recreated original CMG
- Compared the MutualAuth and ProxyConnectionInfo with the original values; they appear to be the same
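The "make a note, then compare" checks in the steps above can also be done mechanically rather than by eye. Here is a rough sketch in Python, assuming the values from the two queries have been copied by hand into dictionaries before and after each change. The helper `diff_snapshots` and every value shown are hypothetical placeholders, including the format of ProxyConnectionInfo, which is invented for illustration.

```python
def diff_snapshots(before, after):
    """Return {key: (before_value, after_value)} for every key whose value differs."""
    keys = sorted(set(before) | set(after))
    return {
        key: (before.get(key), after.get(key))
        for key in keys
        if before.get(key) != after.get(key)
    }


# Placeholder values only (hypothetical, not real data or a real format).
before = {
    "MutualAuthPath": "EXAMPLE-MUTUALAUTH-ID",
    "ProxyConnectionInfo": "EXAMPLE-ID@original-cmg.example.com",
}
after = {
    "MutualAuthPath": "EXAMPLE-MUTUALAUTH-ID",
    "ProxyConnectionInfo": "EXAMPLE-ID@temporary-cmg.example.com",
}

changes = diff_snapshots(before, after)
if "MutualAuthPath" in changes:
    print("STOP: MutualAuthPath changed:", changes["MutualAuthPath"])
else:
    print("MutualAuthPath unchanged; other differences:", sorted(changes))
```

The placeholder data mirrors what we saw after flipping to the new CMG: the MutualAuth value stayed the same and only the server name differed.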
Working through these steps restored management connectivity without losing the connection string. It still doesn’t explain why the CMG died in the first place, though.
Update: a colleague told me that, according to the product team, you “should” be able to delete a CMG as long as you leave the MP setting intact, but I’ve yet to test this 🙂