[2022-08-11]: Cloning Spider failed to start after deployment
Date
2022-08-11 22:28 CEST
Summary
- Cloning Spider failed to start after deployment
- Why?
- It tried to apply migrations and failed
- Why?
- Migrations were out of date for the deployed image
- Why?
- TF Cloud applied an old image to it during deployment
- Why?
- TF Cloud had an old image stored in its state and kept applying it
- Why?
- When creating the TF files, we forgot to add the "lifecycle->ignore_changes" section
Authors
Rafal Kuc, Stepan Maksymchuk
Impact
Information about the impact of the issue, including:
- affected infrastructure elements
- affected product features
- affected users
Detection
Dariusz Krol brought it up in the #-outages channel in Slack. It is worth noting that we
had been seeing this issue for quite a while, but never invested the effort to find and
resolve the root cause. As a workaround, we deployed Cloning Spider via GitHub Actions,
which was sufficient to fix each occurrence of the issue.
Resolution
We've added a "lifecycle->ignore_changes" section for the Cloning Spider that includes
template[0].spec[0].containers[0].image
which tells TF to ignore changes to the image and
not deploy the one stored in its state.
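A minimal sketch of what such a section can look like is shown below. It assumes the Cloning Spider is defined as a `google_cloud_run_service` resource, which the attribute path suggests but this report does not confirm. The resource name, service name, location, and image are hypothetical placeholders.

```hcl
# Hypothetical sketch: the resource type, names, location, and image are
# assumptions for illustration, not the actual Cloning Spider configuration.
resource "google_cloud_run_service" "cloning_spider" {
  name     = "cloning-spider"
  location = "europe-west1"

  template {
    spec {
      containers {
        # Used only when the service is first created; later images are
        # rolled out outside of TF (here, by the GitHub Actions deployment).
        image = "gcr.io/example-project/cloning-spider:initial"
      }
    }
  }

  lifecycle {
    # Tell TF not to treat a different deployed image as drift, so an apply
    # does not roll the service back to the image recorded in its state.
    ignore_changes = [
      template[0].spec[0].containers[0].image,
    ]
  }
}
```

With this in place, a TF run should still manage the other attributes of the service, while the image is left to whichever process deployed it last.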
Timeline
The timeline of the events related to the issue, in the form of a table:
Time | Description |
---|---|
2022-08-10 22:30:00 | The issue was reported via the outages channel in Slack |
2022-08-10 23:15:00 | An initial investigation was done and the results were shared |
2022-08-11 17:00:00 | A deeper investigation was done; a workaround was found and applied, unblocking Cloning Spider |
2022-08-11 22:00:00 | The root cause was found and the issue was fixed |
Action Items (optional)
Lessons Learned (optional)
When working with TF, we need to keep in mind that it stores the last applied configuration in its state, so if we change a resource outside of TF (as in this case, via GitHub Actions), we need to make sure that TF ignores these changes.