On November 21st, 2024, at 5:28 PM CET, the Factbird application and its authorization server experienced a communication mismatch caused by a faulty health check that incorrectly marked the interface between the two services as operational. As a result, users attempting to sign in were redirected repeatedly between the two services, which appeared as a blinking error screen.
Efforts to stabilize the services using an older protocol revealed the root cause: a version mismatch in the authorization server. Unlike our continuous integration testing environments, the production server had been pinned to an incorrect version. Because of this, the authorization server treated requests from the client application as invalid and sent users back to the application to initiate a new handshake, which produced the repeated redirects.
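To illustrate the kind of check that could surface such a mismatch, here is a minimal sketch of a health check that compares versions instead of testing reachability alone. It does not describe Factbird's actual health check, which this report does not detail. The names `HealthResult` and `check_auth_interface` and the version strings are all hypothetical, and the sketch assumes the authorization server can report its running version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthResult:
    """Outcome of a single health check (hypothetical type)."""
    healthy: bool
    reason: str


def check_auth_interface(
    reachable: bool,
    expected_version: str,
    reported_version: str,
) -> HealthResult:
    """Report healthy only if the server is reachable AND runs the expected version.

    Assumption: the caller obtains `reported_version` from the authorization
    server and `expected_version` from the version validated in testing.
    """
    if not reachable:
        return HealthResult(False, "authorization server unreachable")
    if reported_version != expected_version:
        # A reachable server on the wrong version is still unhealthy.
        return HealthResult(
            False,
            f"version mismatch: expected {expected_version}, got {reported_version}",
        )
    return HealthResult(True, "ok")
```

A check that only tested reachability would pass in the second case below, while this sketch fails it: `check_auth_interface(True, "2.4.0", "2.4.0")` is healthy, whereas `check_auth_interface(True, "2.4.0", "2.3.1")` is not.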
To resolve the issue, we synchronized the authorization server to the correct version, beginning at 7:06 PM CET on the first production environment. After thorough validation and monitoring, we proceeded to update the remaining environments, completing the deployment at 7:54 PM CET.
We deeply regret the inconvenience caused and remain committed to enhancing our processes to prevent similar incidents in the future.