Slow Page Loading and Intermittent Login Issues

Incident Report for Incident IQ Platform Status

Postmortem

On Monday, December 2, 2024, Incident IQ experienced a service interruption that degraded service for several customers.

Timeline of events:

9:39 AM ET - Began receiving customer reports of slowness and degraded service in portions of the platform, including login authentication.
11:41 AM ET - All services and platform responsiveness levels restored.

Cause of Incident:

As part of normal operating procedures, Incident IQ scales platform capacity to meet current and predicted customer demand.
Incident IQ uses the Microsoft Azure platform to scale infrastructure automatically. On Monday, December 2, the scaling process encountered errors and took five times longer than normal to complete.
Because the infrastructure did not scale on its normal schedule, the load exceeded what the available infrastructure could handle, and this combination caused the outage.

Remediation:

Services were ultimately restored by a combination of manually adding infrastructure resources and the completion of the automated scaling job.
Incident IQ is working alongside Microsoft and has identified that an unforeseen, abnormally high volume of transactions in the SQL environment during scaling caused the process to run longer than normal.
We are continuing to work with Microsoft to prevent a recurrence of this issue.

The reliability of our platform remains of the utmost importance to us, and we understand the impact these moments have on our customers. The remediations put in place to prevent a recurrence of this particular incident, together with the processes we have in place to continuously improve the platform, give us confidence that we can stay ahead of unexpected surges in traffic.

Once again, we sincerely apologize for this disruption and thank you for your patience and partnership as we worked through this issue.

Posted Dec 09, 2024 - 09:00 EST

Resolved

This incident has been resolved and will be updated with a postmortem as soon as our investigation is complete.

We understand how essential our platform is for your schools’ day-to-day activities, and we are committed to ensuring its reliability. We deeply regret any inconvenience this may have caused and appreciate your understanding as we worked to resolve the issue.

Posted Dec 02, 2024 - 17:09 EST

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Dec 02, 2024 - 11:49 EST

Identified

We are experiencing some intermittent site responsiveness issues that are causing slow page loading times and intermittent login problems. Our Engineering Team is currently investigating the cause and working on a solution. We will provide updates here as soon as they are available. Apologies for the inconvenience and thanks for your patience.

Posted Dec 02, 2024 - 11:20 EST

Update

During our continued platform monitoring, we have identified additional intermittent loading and responsiveness issues. Our Engineering team is focused 100% on returning the platform to optimal operational status. We will post updates here as they become available.

Posted Dec 02, 2024 - 10:55 EST

Monitoring

We have identified the issue and implemented a resolution. We are continuing to monitor the platform for stability and responsiveness. Additional updates regarding the cause of the issue and remediation steps will be posted here as soon as they are available.

Posted Dec 02, 2024 - 10:16 EST

Investigating

Posted Dec 02, 2024 - 09:46 EST

This incident affected: Platform.