Cause
An Azure service outage affecting Azure Automation meant that automation jobs were not running on a Hybrid Worker; they either went to a failed state or hung.
The issue was raised with Microsoft, who advised us to remove a certificate from the Microsoft Monitoring Agent store. That advice came before they were fully aware of the outage, and by the time it was clear this was a wider outage, the certificate removal had completely broken the Hybrid Worker.
Once the service outage was rectified, the Hybrid Worker never came back up again…
Issue
To try to bring the Hybrid Worker back online, I went through the usual steps for removing the Hybrid Worker from the troubleshooting article. Every attempt to add or remove the Hybrid Worker on the server returned the error below:
Add-HybridRunbookWorker : Failed to create a self-signed certificate.
At line:1 char:1
+ Add-HybridRunbookWorker -Name $HybridGroupName -EndPoint $AutomationE ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (AgentService.Hy...RunbookWorkerV2:AddRunbookWorkerV2) [Add-HybridRunbookWorker], CertificateCreationException
+ FullyQualifiedErrorId : Failed to create a self-signed certificate.,AgentService.HybridRegistration.PowerShell.Commandlets.AddRunbookWorker
Resolution
To resolve the issue, I used the good old PsExec tool to run the commands as SYSTEM.
- Stopped the service “HealthService”
- Removed HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\HybridRunbookWorker
- Downloaded PSEXEC
- Ran PSEXEC64.exe -i -s -d CMD (you could launch PowerShell directly here instead)
- Opened PowerShell from the System CMD
- Ran the Add-HybridRunbookWorker command first. This added another Hybrid Worker of the same name
- Ran Remove-HybridRunbookWorker, which completely removed the stale record
- Reran Add-HybridRunbookWorker and confirmed everything was back online
- Restarted the “HealthService” service
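For reference, the steps above could be sketched in PowerShell roughly as follows. This is a minimal sketch rather than the exact commands I ran. It assumes PsExec64.exe is in the current directory and that the HybridRegistration module sits under the Microsoft Monitoring Agent install folder. The `<version>` folder, `$AutomationUrl`, `$AutomationKey` and the `-Token`, `-Url` and `-Key` parameters are assumptions for illustration. Parameter names differ between module versions, so check the syntax on your own server before running anything.

```powershell
# --- From an elevated PowerShell prompt ---

# Stop the Microsoft Monitoring Agent service.
Stop-Service -Name HealthService

# Remove the stale Hybrid Runbook Worker registration from the registry.
Remove-Item -Path 'HKLM:\SOFTWARE\Microsoft\HybridRunbookWorker' -Recurse -Force

# Open an interactive PowerShell window running as SYSTEM.
.\PsExec64.exe -i -s -d powershell.exe

# --- Inside the new SYSTEM window ---

# Confirm the session really is running as the SYSTEM account before continuing.
whoami

# Assumption: placeholder path, the <version> folder differs per agent install.
Import-Module 'C:\Program Files\Microsoft Monitoring Agent\Agent\AzureAutomation\<version>\HybridRegistration\HybridRegistration.psd1'

# Parameter names vary between module versions, so list them first.
Get-Command Add-HybridRunbookWorker, Remove-HybridRunbookWorker -Syntax

# Placeholder values (assumptions): take these from the Automation account's Keys blade.
$HybridGroupName = '<hybrid worker group name>'
$AutomationUrl   = '<automation account URL>'
$AutomationKey   = '<automation account primary key>'

# 1. Register again; this creates a second worker with the same name.
Add-HybridRunbookWorker -Name $HybridGroupName -EndPoint $AutomationUrl -Token $AutomationKey

# 2. Remove the registration, which clears the stale record.
Remove-HybridRunbookWorker -Url $AutomationUrl -Key $AutomationKey

# 3. Register once more to bring the worker back online.
Add-HybridRunbookWorker -Name $HybridGroupName -EndPoint $AutomationUrl -Token $AutomationKey

# Restart the agent service (Restart-Service also starts it if it is still stopped).
Restart-Service -Name HealthService
```

Running the registration as SYSTEM is the key step here: the same commands from a normal elevated session kept failing with the self-signed certificate error shown earlier.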