Monitoring - Incident Report – Interim RCA
Date: April 8, 2025
Prepared by: Alberto Fernandez, Director Platform Engineering

Overview:

On Monday, April 7, 2025, we identified an issue impacting our New Jersey (NJ) data center, which was experiencing unusually high bandwidth utilization. Upon discovery, our engineering team took immediate action to expand the circuits in the NJ facility and initiated an in-depth investigation to determine the source of the increased traffic.

Initial Response:

We promptly ran sanity checks on the existing circuits to rule out physical degradation or faults, then upgraded the circuits. Despite these mitigations, bandwidth consumption remained abnormally high even though customer usage appeared to be lower than normal.

Root Cause Analysis – Ongoing:

Further investigation revealed that a network interface located in our Las Vegas (Vegas) data center was introducing packet drops. These packet losses negatively affected the SBus messaging system, which facilitates communication between all three of our cores. As a result, the SBus queue began to grow excessively, causing retransmissions and creating systemic side effects, including degraded performance in data centers that were otherwise healthy.
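
The chain described above (packet loss, then queue growth, then retransmissions) can be illustrated with a toy model. This is a minimal sketch and not a description of how SBus is actually implemented: the function name, the per-tick arrival and capacity figures, and the assumption that every dropped message is retransmitted until delivered are all hypothetical.

```python
def simulate_queue(arrivals_per_tick, capacity_per_tick, loss_rate, ticks):
    """Toy model of a message queue feeding a lossy link.

    Each tick, new messages arrive, up to `capacity_per_tick` messages are
    sent, and a `loss_rate` fraction of the sent messages is dropped. Dropped
    messages stay in the queue and are sent again later (retransmission).
    Returns (backlog, total_sent, total_delivered).
    """
    backlog = 0.0
    total_sent = 0.0
    total_delivered = 0.0
    for _ in range(ticks):
        backlog += arrivals_per_tick
        sent = min(backlog, capacity_per_tick)   # everything sent uses bandwidth
        delivered = sent * (1.0 - loss_rate)     # only this part leaves the queue
        backlog -= delivered
        total_sent += sent
        total_delivered += delivered
    return backlog, total_sent, total_delivered


if __name__ == "__main__":
    # Same offered load, with and without packet loss (hypothetical numbers).
    print(simulate_queue(80, 100, 0.0, 10))
    print(simulate_queue(80, 100, 0.25, 10))
```

In this model the link keeps up with the offered load when there is no loss, but with 25% loss the effective capacity falls below the arrival rate, so the backlog grows every tick and the link carries more traffic than it delivers. That is one plausible way for bandwidth to stay high while useful throughput drops.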

Corrective Action:

To prevent the Vegas site from continuing to destabilize the broader infrastructure, we made the strategic decision to take the Vegas core offline. Prior to this, we carefully migrated customers out of the Vegas environment to avoid service disruption. During the migration, however, we observed that some customers were manually reverting their configurations to route traffic back through the Vegas data center. These actions inadvertently nullified our remediation efforts.

To preserve the integrity of the infrastructure and accelerate recovery, we made the decision to disable the portal feature allowing customers to manually select their data center location. This control will remain disabled while we restore all environments to a healthy and fully optimized state.

Next Steps:

We are actively developing a remediation and validation plan to address the networking issues within the Vegas data center. This plan includes a comprehensive round of testing to ensure long-term stability before reintegrating the Vegas site into our production ecosystem.

We strongly recommend that all customers whitelist the IP addresses of all three data centers within their firewall configurations. This ensures seamless service continuity in the event that we need to rehome services to a different location as part of our platform’s stability and automated recovery processes.
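
As a rough aid for auditing a firewall allowlist, the sketch below checks whether every data center is covered by at least one allowed network. The addresses are placeholders taken from the RFC 5737 documentation ranges and are not real data center addresses. The site names and the `uncovered_sites` helper are assumptions made for illustration, and the real address list should come from support.

```python
import ipaddress

# Placeholder addresses (RFC 5737 documentation ranges), NOT real addresses.
SITE_ADDRESSES = {
    "New Jersey": ["192.0.2.10"],
    "Florida": ["198.51.100.10"],
    "Las Vegas": ["203.0.113.10"],
}


def uncovered_sites(allowlist_cidrs, site_addresses):
    """Return the sites that have at least one address outside the allowlist."""
    networks = [ipaddress.ip_network(cidr, strict=False) for cidr in allowlist_cidrs]
    missing = []
    for site, addresses in site_addresses.items():
        for address in addresses:
            ip = ipaddress.ip_address(address)
            if not any(ip in network for network in networks):
                missing.append(site)
                break  # one uncovered address is enough to flag the site
    return missing


if __name__ == "__main__":
    # Hypothetical firewall allowlist that forgot the third site.
    allowlist = ["192.0.2.0/24", "198.51.100.10/32"]
    print(uncovered_sites(allowlist, SITE_ADDRESSES))
```

An empty result means all three sites are covered, so traffic can be rehomed without a firewall change on the customer side.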

Additionally, we advise against manually configuring phones, as this can lead to inconsistencies and potential service disruptions. Instead, we encourage the use of our automated provisioning systems and DNS SRV records, which are designed with built-in self-healing capabilities to enhance reliability and minimize downtime.
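
To show why DNS SRV records allow a device to recover without manual reconfiguration, here is a minimal sketch of SRV target ordering, loosely following RFC 2782: lower priority values are tried first, and targets with equal priority are ordered by weighted random selection. The host names, priorities, and weights are hypothetical, and real phones implement this in firmware rather than in Python.

```python
import random
from collections import namedtuple

SrvRecord = namedtuple("SrvRecord", "priority weight port target")


def order_srv(records, rng=None):
    """Order SRV records: lowest priority first, weighted random within a priority."""
    rng = rng or random.Random()
    ordered = []
    for priority in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == priority]
        group.sort(key=lambda r: r.weight != 0)  # zero-weight records go first
        while group:
            total = sum(r.weight for r in group)
            pick = rng.randint(0, total)
            running = 0
            for index, record in enumerate(group):
                running += record.weight
                if running >= pick:
                    ordered.append(group.pop(index))
                    break
    return ordered


def first_reachable(ordered, is_up):
    """Return the first target for which is_up(target) is true, or None."""
    for record in ordered:
        if is_up(record.target):
            return record.target
    return None


if __name__ == "__main__":
    # Hypothetical answer for a lookup such as _sip._udp.example.com
    records = [
        SrvRecord(10, 100, 5060, "core-a.example.com"),
        SrvRecord(20, 50, 5060, "core-b.example.com"),
        SrvRecord(20, 50, 5060, "core-c.example.com"),
    ]
    ordered = order_srv(records)
    # Pretend the preferred core is offline; the device moves to a backup.
    print(first_reachable(ordered, lambda target: target != "core-a.example.com"))
```

A phone configured with a single hard-coded address has no such fallback list, which is why manual configuration can leave devices stranded when a core is taken offline.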

Transparency and Forward Commitment:

This document is an interim incident report, not the final Root Cause Analysis (RCA). A formal RCA will be published once all issues are fully resolved and the affected infrastructure is confirmed to be stable.

Our commitment to service excellence includes maintaining transparency during incidents. We will continue to provide timely updates and documentation to ensure our partners and customers are fully informed throughout the process.

We appreciate your patience and understanding as we work diligently to restore full operational integrity across our environment.

Apr 09, 2025 - 13:09 EDT
Update - As part of our continued commitment to enhancing the stability, reliability, and overall performance of our platform, we are implementing an important change that will affect how data center locations are assigned for your domains.

To reduce latency, minimize inter-region and geo-redundant calls, and streamline backend operations, we will be removing the option to manually select the data center location for your domains through the portal. Going forward, all domains will be automatically assigned to the optimal data center based on a combination of system performance metrics, geographic proximity, and redundancy best practices.

This change is being made with your service experience in mind. By centralizing and automating data center assignment, we can:
• Improve overall service reliability and uptime
• Reduce the risk of cross-region routing issues
• Enhance failover efficiency and response times
• Simplify management and reduce potential for configuration conflicts

We understand that some users may have previously utilized manual data center selection to align with specific internal preferences. However, our new intelligent assignment model ensures a more consistent and high-performing experience for all customers without requiring manual input.
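
The actual assignment logic is not published, so the following is only an illustrative sketch of how performance, proximity, and redundancy could be combined. The `Site` fields, the equal weights, the distance cap, and the sample numbers are all assumptions and do not describe the production algorithm.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    healthy: bool
    load: float         # 0.0 (idle) to 1.0 (saturated), hypothetical metric
    distance_km: float  # approximate distance from the customer


def assign_sites(sites, load_weight=0.5, distance_weight=0.5, max_distance_km=5000.0):
    """Return (primary, backup) site names; unhealthy sites are never chosen."""
    candidates = [site for site in sites if site.healthy]

    def score(site):
        # Lower is better: lightly loaded and close to the customer.
        proximity = min(site.distance_km / max_distance_km, 1.0)
        return load_weight * site.load + distance_weight * proximity

    ranked = sorted(candidates, key=score)
    primary = ranked[0].name if ranked else None
    backup = ranked[1].name if len(ranked) > 1 else None  # redundancy
    return primary, backup


if __name__ == "__main__":
    sites = [
        Site("New Jersey", healthy=True, load=0.4, distance_km=100.0),
        Site("Florida", healthy=True, load=0.2, distance_km=1500.0),
        Site("Las Vegas", healthy=False, load=0.1, distance_km=3500.0),
    ]
    print(assign_sites(sites))
```

The point of the sketch is that a site marked unhealthy drops out of consideration automatically, whereas a manual selection would keep pointing at it.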

This update will take effect today. There will be no impact to your existing domain functionality, and you will continue to receive the same level of service and support you expect from us.

If you have any questions or concerns about this update, please don’t hesitate to reach out to our support team at support@viirtue.com or call 1-833-VIIRTUE.

Thank you for your understanding and continued trust in our platform. We’re confident this improvement will contribute to an even more stable and seamless experience moving forward.

Apr 08, 2025 - 17:02 EDT
Update - Engineers are relocating devices away from the Las Vegas core to safely troubleshoot the network issues affecting this node. Updates will follow.
Apr 08, 2025 - 15:52 EDT
Update - Service has been restored to the Las Vegas core.
Apr 08, 2025 - 09:11 EDT
Update - Due to network issues at the Las Vegas data center, the Las Vegas core will remain offline. The server will be brought back online once the network has been tested and validated.
Apr 07, 2025 - 23:38 EDT
Update - Service has been restored in the New Jersey data center.
Apr 07, 2025 - 22:15 EDT
Update - We will be temporarily redirecting traffic from the New Jersey data center to perform network tests. Updates to follow.
Apr 07, 2025 - 21:29 EDT
Update - The New Jersey data center experienced a network event earlier this morning that has since cleared. This event could have caused delays with the Xima integration. Updates to follow.
Apr 07, 2025 - 12:40 EDT
Update - The network issue has been identified and corrected. Services in Las Vegas have been fully restored and are now operating normally.
Apr 05, 2025 - 17:28 EDT
Update - We are continuing to work on a fix for this issue.
Apr 05, 2025 - 17:26 EDT
Identified - We have identified the cause of the network issue and are working on a solution. Updates to follow.
Apr 04, 2025 - 18:21 EDT
Investigating - Due to network connectivity issues between the New Jersey and Las Vegas data centers, we have decided to have the Las Vegas core respond with 503 (Service Unavailable). All traffic directed to the Las Vegas core will be redirected to the Florida core. If you have any questions or concerns, please reach out to support@viirtue.com or call 1-833-VIIRTUE.
Apr 04, 2025 - 09:45 EDT

About This Site

Welcome to the Viirtue status page. This page will communicate any service impairments across all platforms.

Uptime over the past 90 days: 100.0 % for every component listed below.

Apollo: Under Maintenance
Manager Portal: Operational
API: Operational
FL2 Core Server: Operational
Messaging (SMS/MMS): Operational
WebMeetings: Operational
WebPhone: Operational
Provisioning Server: Operational
NJ2 Core Server: Operational
Mobile App - iOS: Operational
Mobile App - Android: Operational
Analytics QoS: Operational
Recording NJ: Operational
Recording FL: Operational
LV Core Server: Under Maintenance
LV QoS Server: Under Maintenance
LV Recording Server: Under Maintenance
Zeus: Operational
API: Operational
Manager Portal: Operational
FL Core Server: Operational
NJ Core Server: Operational
Messaging (SMS/MMS): Operational
WebPhone: Operational
WebMeetings: Operational
Provisioning Server: Operational
Mobile App - iOS: Operational
Mobile App - Android: Operational
Analytics QoS: Operational
Recording-Cloud: Operational
Recording FL: Operational
ViiBE: Operational
Fax: Operational
MobileConnect: Operational
iOS: Operational
Android: Operational
Windows: Operational
MacOS: Operational
Cloudflare: Operational
Data Centers: Under Maintenance
Florida Data Center: Operational
New Jersey Data Center: Operational
Las Vegas Data Center: Under Maintenance
Microsoft Teams Integration: Operational
Direct SBC Integration: Operational
TeamMates Integration: Operational
SIP Trunk Cluster: Operational
NJ SIP Trunk Cluster Core: Operational
FL SIP Trunk Cluster Core: Operational

Apr 26, 2025

No incidents reported today.

Apr 25, 2025

No incidents reported.

Apr 24, 2025

No incidents reported.

Apr 23, 2025

No incidents reported.

Apr 22, 2025

No incidents reported.

Apr 21, 2025

No incidents reported.

Apr 20, 2025

No incidents reported.

Apr 19, 2025

No incidents reported.

Apr 18, 2025

No incidents reported.

Apr 17, 2025

No incidents reported.

Apr 16, 2025

No incidents reported.

Apr 15, 2025
Completed - The scheduled maintenance has been completed.
Apr 15, 23:00 EDT
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Apr 15, 22:00 EDT
Scheduled - Our team will be performing scheduled maintenance on the Zeus cluster. We do not anticipate any service disruptions during the maintenance window.

If you have any questions or require assistance, please contact our support team at support@viirtue.com or call 1-833-VIIRTUE.

Apr 15, 12:05 EDT

Apr 14, 2025

No incidents reported.

Apr 13, 2025

No incidents reported.

Apr 12, 2025

No incidents reported.