
[2022-08-11]: Cloning Spider failed to start after deployment

Date

2022-08-11 22:28 CEST

Summary

  • Cloning Spider failed to start after deployment
  • Why?
  • It tried to apply migrations and failed
  • Why?
  • Migrations were out of date for the deployed image
  • Why?
  • TF Cloud applied an old image to it during deployment
  • Why?
  • TF Cloud had stored the old image in its state and was applying it
  • Why?
  • When creating the TF files, we forgot to add a "lifecycle->ignore_changes" section

Authors

Rafal Kuc, Stepan Maksymchuk

Impact

The information about the impact of the issue including:

  • affected infrastructure elements
  • affected product features
  • affected users

Detection

Dariusz Krol brought it up on the #-outages channel in Slack. It is worth noting that we had been hitting this issue for quite a while but never invested the effort to find and resolve the root cause. As a workaround, we deployed Cloning Spider via GitHub Actions, which was sufficient to clear each occurrence of the issue.

Resolution

We've added a "lifecycle->ignore_changes" section for the Cloning Spider that includes template[0].spec[0].containers[0].image. This tells TF to ignore changes to the image and not to deploy the one stored in its state.
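
For illustration, a fix of this kind might look roughly like the sketch below. The attribute path matches the one named above, but everything else is an assumption: the resource type (google_cloud_run_service, guessed from the shape of the path), the resource name, the location, and the image reference are all hypothetical and are not taken from the actual TF files.

```hcl
# Hypothetical sketch: resource type, name, location, and image are assumptions.
resource "google_cloud_run_service" "cloning_spider" {
  name     = "cloning-spider" # hypothetical service name
  location = "europe-west1"   # hypothetical region

  template {
    spec {
      containers {
        # Only used when the service is first created. Later deployments
        # (for example via GitHub Actions) replace the image outside of TF.
        image = "gcr.io/example-project/cloning-spider:initial"
      }
    }
  }

  lifecycle {
    # Tell TF not to treat a differing image as drift, so an apply
    # does not roll the service back to the image TF knows about.
    ignore_changes = [
      template[0].spec[0].containers[0].image,
    ]
  }
}
```

With a block like this in place, TF still manages the rest of the service definition, while the image is owned by whatever pipeline deploys it.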

Timeline

The timeline of the events related to the issue, in the form of a table:

Time                 Description
2022-08-10 22:30:00  The issue was reported via the outages channel in Slack
2022-08-10 23:15:00  An initial investigation was done and the results were shared
2022-08-11 17:00:00  A deeper investigation was done; a workaround was found and applied, unblocking Cloning Spider
2022-08-11 22:00:00  The root cause was found and the issue was fixed

Action Items (optional)

Issue: #3927 PR: #3930

Lessons Learned (optional)

When working with TF, we need to keep in mind that it stores the current state. If we change that state outside of TF (as in this case, via GitHub Actions), we need to make sure that TF ignores those changes.