Guidance: Missing Domain Controller Heartbeat
Cyber Security may receive an automated ticket stating that a domain controller (DC) is no longer sending heartbeats to Azure Monitor/Sentinel. It is important to establish the root cause of every such incident. Because the affected machines are DCs, a heartbeat loss caused by a malicious actor would be very high risk.
There are a number of possible causes, for example:
1. Scheduled/Unscheduled Power Outage
2. Issues with the Server Hardware or Hypervisor
3. Windows Update
4. Scheduled/Unscheduled Long Reboot
5. Microsoft Monitoring Agent/Azure Monitor Issues
6. Device Compromise
Process if the DC heartbeat is still failing:
- If the device is accessible via Jumpbox-prd01, restart the Microsoft Monitoring Agent (MMA) service and the Microsoft Monitoring Agent Azure VM Extension Heartbeat service.
- If it is not accessible, escalate to ITTS so that they can restart the server via iLO.
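The service restart above can be scripted. The following is a minimal Python sketch that builds, and optionally runs, a PowerShell command on the DC. It assumes the Windows service names HealthService (Microsoft Monitoring Agent) and MMAExtensionHeartbeatService (Azure VM Extension Heartbeat service); confirm both with Get-Service on the DC before relying on them. By default it performs a dry run that only prints the command.

```python
import subprocess
import sys

# Assumed service names; verify on the DC with Get-Service before use.
AGENT_SERVICES = ("HealthService", "MMAExtensionHeartbeatService")


def build_restart_command(services=AGENT_SERVICES):
    """Build a PowerShell command that restarts the agent services and shows their status."""
    names = ", ".join(f"'{name}'" for name in services)
    script = (
        # -Force lets Restart-Service proceed if the service has dependents.
        f"Restart-Service -Name {names} -Force; "
        f"Get-Service -Name {names} | Select-Object Name, Status"
    )
    return ["powershell.exe", "-NoProfile", "-Command", script]


def restart_agent_services(dry_run=True):
    """Print the command, and execute it only when dry_run is False on Windows."""
    command = build_restart_command()
    if dry_run or sys.platform != "win32":
        print("Would run:", " ".join(command))
        return None
    # check=True raises CalledProcessError if PowerShell reports a failure.
    return subprocess.run(command, capture_output=True, text=True, check=True)
```

The final Get-Service call gives an immediate confirmation that both services are back in the Running state before you check Sentinel for new heartbeats.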
Things to check for Root Cause Analysis:
1. Search ServiceNow (SNOW) for PRTG alerts for the DC.
2. Check whether any power outages have been reported on site (the Service Desk should do this).
3. Check Redcentric Support for any tickets related to the DC or to the Hyper-V host that runs it.
4. Ping the DC to check connectivity.
5. Run a Sentinel query to confirm when heartbeats stopped and whether they are now being received from the DC. Be aware of time zones: Sentinel logs are in UTC.
6. Log on to Jumpbox-prd01 -> RDP to the affected DC -> CMD. Check when the device was last restarted by running systeminfo and reading the System Boot Time line.
7. Log on to Jumpbox-prd01 -> RDP to the affected DC -> PowerShell. Check the device's update history to see whether any update installation lines up with the heartbeat failure.
Are there any open incidents, requests or changes relating to the affected server or to -Site-HYPERV-01? Search for them by typing the server's name into the search box in the top right-hand corner of ServiceNow. Note any you find in the ticket, together with your analysis of whether they would have made the DC unavailable.
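The timing checks in steps 5 and 6 come down to lining up UTC heartbeat times from Sentinel with the DC's local boot time. The following is a minimal Python sketch under stated assumptions. "DC01" in the example KQL is a hypothetical computer name. The 5-minute gap threshold is an assumed value. The systeminfo date is assumed to use the UK format dd/mm/yyyy, HH:MM:SS, which differs on other locales. The helper names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Example KQL for step 5; TimeGenerated values are in UTC.
# "DC01" is a hypothetical computer name: substitute the affected DC.
HEARTBEAT_QUERY = """
Heartbeat
| where TimeGenerated > ago(2d)
| where Computer startswith "DC01"
| project TimeGenerated
| order by TimeGenerated asc
"""

# Assumed threshold: a silence longer than this counts as a heartbeat gap.
GAP_THRESHOLD = timedelta(minutes=5)


def find_gaps(heartbeats_utc, threshold=GAP_THRESHOLD):
    """Return (last_seen, next_seen) pairs where heartbeats stopped for longer than threshold."""
    ordered = sorted(heartbeats_utc)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > threshold]


def parse_boot_time(systeminfo_line, local_tz):
    """Convert a systeminfo 'System Boot Time:' line from local time to UTC.

    Assumes the UK date format dd/mm/yyyy, HH:MM:SS; other locales differ.
    """
    value = systeminfo_line.split(":", 1)[1].strip()
    naive = datetime.strptime(value, "%d/%m/%Y, %H:%M:%S")
    return naive.replace(tzinfo=local_tz).astimezone(timezone.utc)


def events_in_gap(gap, events):
    """Return the names of events (name -> UTC datetime) that fall inside the gap."""
    start, end = gap
    return [name for name, when in events.items() if start <= when <= end]


def describe_gap(gap, local_tz):
    """Render a gap in local time so it can be compared with on-site reports."""
    start, end = (t.astimezone(local_tz).strftime("%Y-%m-%d %H:%M %Z") for t in gap)
    return f"No heartbeat from {start} to {end}"
```

Assuming the DCs run on UK time, pass zoneinfo.ZoneInfo("Europe/London") as local_tz on a machine with time zone data so that GMT/BST changes are handled. If events_in_gap reports the boot time inside a heartbeat gap, the loss most likely coincides with a reboot, which can then be checked against the update history from step 7.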