
Online

December 20th, 2024 — Downtime

DNS Outage

Monitoring — resolved
Impacted systems: DNS / Limited number of Dedicated Servers / Shared / Virtual / Semi-dedicated.
Impact was felt across email, web & internal services.
Incident
  • Impact: Zone misconfigurations knocked components of our DNS cluster offline, causing DNS resolution failures across the fleet. Users on any system relying on the affected DNS servers were impacted, including email, websites, and internal software.
  • Length of Impact: Ranged from 15 minutes to 4 hours, depending on the impacted client or system.

Root Cause
  • Primary Issue:
    • Misconfiguration in zone files caused the named service to crash.
    • The corrupted zone files led to improper loading of DNS zones, triggering the service failure.
  • Cascading Issues:
    • Custom code external to the DNS stack exacerbated the issue by propagating blank configurations to some dedicated/virtual servers.
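
The cascading failure above came from blank configurations being pushed to servers. As a rough illustration only, a propagation step could refuse to overwrite a working file with an empty one. This is a minimal sketch, not our actual tooling: the function names, the comment prefixes, and the write-then-rename approach are all assumptions made for the example.

```python
from pathlib import Path


def is_propagatable(config_text: str) -> bool:
    """Return True only if the config has at least one non-blank, non-comment line."""
    for line in config_text.splitlines():
        stripped = line.strip()
        # Assumption: ';', '#' and '//' introduce comments in the config format.
        if stripped and not stripped.startswith((";", "#", "//")):
            return True
    return False


def propagate(config_text: str, destination: Path) -> bool:
    """Write the config to destination, keeping the existing file if the new one is blank.

    Hypothetical helper: returns True if the file was written, False if refused.
    """
    if not is_propagatable(config_text):
        # Fail safe: never replace a working config with an empty one.
        return False
    # Write to a temporary sibling first, then rename over the target so a
    # partially written file is never left in place.
    tmp = destination.with_name(destination.name + ".tmp")
    tmp.write_text(config_text)
    tmp.replace(destination)
    return True
```

With a guard like this, an upstream failure that yields an empty configuration leaves the previously deployed file untouched instead of replacing it.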

Resolution
  • The corrupted zone files were identified and corrected.
  • The named service was restarted successfully after all DNS zone files were validated.
  • Enhanced monitoring has been implemented to track DNS health and service uptime.

Mitigation & Next Steps
  1. Immediate Actions Taken:

    • Verified and corrected all zone files across affected systems.
    • Restarted the named service on all impacted servers.
    • Monitored closely for stability post-fix.
  2. Preventative Measures:

    • Add additional validation for zone file integrity before deployment.
    • Update custom DNS stack code to include fail-safes and stricter error handling.
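
To make the validation measure above concrete, here is one way a pre-deployment check could look. It is a sketch under stated assumptions rather than our deployment pipeline: it assumes BIND's `named-checkzone` utility is installed and treats a zero exit status as a clean zone, and the function names and the zone-name-to-path mapping are hypothetical.

```python
import subprocess
from typing import Callable, Dict, List


def check_with_named_checkzone(zone_name: str, zone_path: str) -> bool:
    """Run BIND's named-checkzone; exit status 0 means the zone loaded cleanly."""
    try:
        result = subprocess.run(
            ["named-checkzone", zone_name, zone_path],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        # Fail closed: if the checker is missing, treat the zone as unvalidated.
        return False
    return result.returncode == 0


def failing_zones(
    zones: Dict[str, str],
    check: Callable[[str, str], bool] = check_with_named_checkzone,
) -> List[str]:
    """Return the sorted names of zones that fail validation.

    `zones` maps a zone name to the path of its zone file (hypothetical layout).
    """
    return sorted(name for name, path in zones.items() if not check(name, path))


def safe_to_deploy(
    zones: Dict[str, str],
    check: Callable[[str, str], bool] = check_with_named_checkzone,
) -> bool:
    """Allow deployment only when every zone passes validation."""
    return not failing_zones(zones, check)
```

The checker is passed in as a parameter so that the gate can be exercised without BIND installed, for example with a stub that always returns True or False.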

Conclusion

This incident highlighted gaps in our zone file validation. More frequent DNS zone scanning and linting is required as a long-term mitigation. The planned mitigations aim to prevent recurrence and ensure faster resolution if similar issues arise.


November 17th, 2022 — Network / Power outage

Resolved

IMPACT

Shared / Dedicated

3h 52m of downtime.

Of utmost importance — thank you for your patience. As elements of our infrastructure continue to return to service, let’s break down what’s occurred:

During routine maintenance, a breaker was tripped in our datacenter, resulting in large chunks of our fleet going offline. We’re still actively working with the datacenter to bring 100% of our clients up, but many are already back online. Fused.com & anything hosted on it are still offline.

Moving forward, we’ll be offloading the bulk of our external site, callback system & helpdesk to a third-party network so that communication stays as attentive as you’re accustomed to, regardless of fleet status.

Thank you.

David McKendrick
Fused