Infinite loop during sign-in
Incident Report for Factbird
Postmortem

On November 21st, 2024, at 5:28 PM CET, the Factbird application and its authorization server experienced a communication mismatch caused by a faulty health check that incorrectly marked the interface between the two services as operational. As a result, users who attempted to sign in were redirected repeatedly between the two services, which appeared as a blinking error screen.
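As an illustration of how a client could detect this kind of redirect loop instead of bouncing indefinitely, here is a minimal sketch. It is not Factbird's implementation; the class name `RedirectLoopGuard`, the threshold of 3 redirects, and the 10-second window are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RedirectLoopGuard:
    """Hypothetical guard: flags a loop when too many sign-in redirects
    happen within a short time window."""

    max_redirects: int = 3        # assumed threshold, not from the report
    window_seconds: float = 10.0  # assumed window, not from the report
    _timestamps: List[float] = field(default_factory=list)

    def record(self, now: float) -> bool:
        """Record a redirect at time `now` (seconds).

        Returns True when the number of redirects inside the window
        exceeds `max_redirects`, i.e. the caller should stop redirecting
        and show an error page instead.
        """
        cutoff = now - self.window_seconds
        # Keep only redirects that are still inside the window.
        self._timestamps = [t for t in self._timestamps if t > cutoff]
        self._timestamps.append(now)
        return len(self._timestamps) > self.max_redirects
```

With these assumed defaults, the fourth redirect within ten seconds would be flagged, and the counter effectively resets once the window has passed.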

Efforts to stabilize the services using an older protocol revealed the root cause: a version mismatch in the authorization server. Unlike our continuous integration testing environments, the production server had been pinned to an incorrect version. This caused it to invalidate client application requests, forcing users back to the application to initiate a new handshake.
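One way a health check could catch this class of problem is to compare the deployed authorization server version against the version the application expects, rather than only confirming that the server responds. The sketch below assumes this approach; `HealthResult`, `check_auth_interface`, and the version strings are hypothetical names for illustration, not part of Factbird's systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthResult:
    healthy: bool
    reason: str


def check_auth_interface(
    reachable: bool,
    deployed_version: str,
    expected_version: str,
) -> HealthResult:
    """Hypothetical health check for the app/authorization-server interface.

    A liveness-only check would stop after the `reachable` test. This
    version also fails when the deployed version differs from the one the
    application was tested against.
    """
    if not reachable:
        return HealthResult(False, "authorization server unreachable")
    if deployed_version != expected_version:
        return HealthResult(
            False,
            f"version mismatch: deployed {deployed_version}, "
            f"expected {expected_version}",
        )
    return HealthResult(True, "ok")
```

Under this assumption, a server that is reachable but pinned to the wrong version reports unhealthy, so the mismatch surfaces before users reach the sign-in flow.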

To resolve the issue, we synchronized the authorization server to the correct version, beginning at 7:06 PM CET on the first production environment. After thorough validation and monitoring, we proceeded to update the remaining environments, completing the deployment at 7:54 PM CET.

We deeply regret the inconvenience caused and remain committed to enhancing our processes to prevent similar incidents in the future.

Posted Nov 21, 2024 - 20:43 UTC

Resolved
Both our application and authorization service have now stabilized and users are able to sign in again.
Posted Nov 21, 2024 - 20:20 UTC
Monitoring
We have rolled back the change in our authorization service and are actively monitoring our services for any further problems related to this incident.
Posted Nov 21, 2024 - 18:21 UTC
Identified
We are working on rolling back a release that results in an infinite loop of redirects between our sign-in page and our application.
Posted Nov 21, 2024 - 17:04 UTC
This incident affected: Authentication Service.