Quick Comparison
| Product | Best For | Rating |
|---|---|---|
| LaunchDarkly | Best Overall | 4.7/5 |
| Flagsmith | Best Budget | 4.6/5 |
| Split.io | Best Premium | 4.7/5 |
| Optimizely Rollouts | Best for Teams | 4.5/5 |
| Unleash | Best Compact | 4.6/5 |
I have implemented feature flags at three companies (early-stage startup, mid-size SaaS, large enterprise) and learned which patterns scale and which create technical debt. Here's the walkthrough that actually works.
Why Feature Flags
Without flags: Deploy code = release feature. Risky.
With flags: Deploy code dark → enable in production for testing → enable for 1% of users → ramp to 10% → 50% → 100% → remove flag.
Benefits:
- Deploy without releasing (continuous deployment, separate from feature launches)
- Kill switches for emergent issues
- A/B testing infrastructure
- Gradual rollouts to catch regressions
- Beta programs and entitlements
Flag Types
Release Flags (short-lived): Control when a new feature becomes visible. Lifetime: 1-4 weeks until 100% rollout. Then remove.
Permission Flags: Long-lived. Control which users/orgs see a feature. Examples: enterprise features, beta access. Permanent.
Experiment Flags: Control A/B/n test variations. Lifetime: 2-8 weeks until statistical significance. Then ramp winner to 100% and remove.
Kill Switch Flags (operational): Long-lived but rarely flipped. Disable a feature in production emergencies. Permanent.
Configuration Flags: Anti-pattern - don't use flags for static config. Use environment variables or config files.
Implementation Patterns
Simple boolean
if feature_flag_enabled("new_checkout_flow", user_id):
    return new_checkout_flow(order)
return legacy_checkout_flow(order)
Use for: simple on/off rollouts.
Percentage rollout
if feature_flag_percentage("new_search", user_id, percentage=10):
    # 10% of users hash to this branch
    return new_search(query)
return legacy_search(query)
Hash user_id consistently so the same user gets the same treatment across requests.
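One way the consistent hashing could look, as a minimal sketch rather than any vendor's actual algorithm. The helper name bucket_for and the choice of SHA-256 are my assumptions; the feature_flag_percentage signature mirrors the call above.

```python
import hashlib


def bucket_for(flag_name: str, user_id: str, buckets: int = 100) -> int:
    """Map a (flag, user) pair to a stable bucket in [0, buckets)."""
    # Salting with the flag name keeps different flags independent, so the
    # same users are not always first in line for every new feature.
    key = f"{flag_name}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def feature_flag_percentage(flag_name: str, user_id: str, percentage: int) -> bool:
    """Return True if the user's bucket falls inside the rollout percentage."""
    return bucket_for(flag_name, user_id) < percentage
```

Because a user's bucket never changes, everyone enabled at 10% stays enabled when you ramp to 50%. Python's built-in hash() is not a good fit here: string hashes are randomized per process by default, so the same user could flip between branches across servers or restarts.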
Targeted rollout
if feature_flag_targeted("premium_features", user_id,
                         targeting={'plan': ['enterprise', 'pro']}):
    return premium_view(data)
return standard_view(data)
Use for: entitlements, beta programs.
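A rough sketch of what the targeting check might do internally. USER_ATTRIBUTES is a hypothetical in-memory stand-in for a user service, and the matching rule (every targeted attribute must match one of its allowed values) is an assumption, not a description of a particular product.

```python
from typing import Mapping, Sequence

# Hypothetical user directory; a real system would query a user service.
USER_ATTRIBUTES = {
    "u1": {"plan": "enterprise", "email": "ana@example.com"},
    "u2": {"plan": "free", "email": "bo@example.com"},
}


def feature_flag_targeted(
    flag_name: str,
    user_id: str,
    targeting: Mapping[str, Sequence[str]],
) -> bool:
    """Return True if the user matches every targeting rule."""
    # flag_name is unused in this sketch; a flag service would use it to
    # look up the rules instead of receiving them as an argument.
    attrs = USER_ATTRIBUTES.get(user_id)
    if attrs is None:
        return False  # unknown users get the default (off) experience
    # AND across attributes, OR within each attribute's allowed values.
    return all(attrs.get(key) in allowed for key, allowed in targeting.items())
```

Defaulting unknown users to off is the conservative choice for entitlements: a lookup failure should never grant access to a paid feature.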
Multi-variant
variant = feature_flag_variant("checkout_variant", user_id)
if variant == "A":
    return checkout_a(order)
elif variant == "B":
    return checkout_b(order)
return checkout_control(order)
Use for: A/B/n testing.
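Variant assignment can reuse the same hashing idea. The sketch below assumes a hypothetical EXPERIMENTS table with percentage weights; the 50/25/25 split is purely illustrative.

```python
import hashlib

# Hypothetical experiment definition: (variant name, weight in percent).
# Weights are assumed to sum to 100.
EXPERIMENTS = {
    "checkout_variant": [("control", 50), ("A", 25), ("B", 25)],
}


def feature_flag_variant(flag_name: str, user_id: str) -> str:
    """Deterministically assign a user to one weighted variant."""
    variants = EXPERIMENTS[flag_name]
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    upper = 0
    for name, weight in variants:
        upper += weight
        if bucket < upper:
            return name
    # Only reached if the weights sum to less than 100.
    return variants[0][0]
```

A user keeps the same variant for as long as the weights stay fixed. Changing weights mid-experiment moves some users between variants, which is one reason to settle the split before the test starts.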
Rollout Strategy
Week 1-2: Internal testing
- Flag enabled for employees only
- Targeting by email domain or user IDs
- Real-world testing in production with low risk
Week 3: Beta users
- Flag enabled for 100 beta program users
- Targeted by user opt-in or specific user IDs
- Active monitoring and feedback collection
Week 4: 1% rollout
- Random 1% of users
- Monitor metrics dashboards intensely
- Roll back if any regression detected
Week 5: 10% rollout
- Watch for issues at scale
- Confirm metrics align with expectations
- Roll back if needed
Week 6: 50% rollout
- Stress test infrastructure
- Confirm no infrastructure scaling issues
- Final regression checks
Week 7: 100% rollout
- Full release
- Remove flag from code in next sprint
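The percentage stages of this schedule can be written down as data with a simple guardrail. This is a sketch under assumptions: the StageMetrics fields, the 10% tolerated relative increase in error rate, and the roll-back-to-0 rule are illustrative choices, and the internal and beta stages are left out because they are targeted rather than percentage-based.

```python
from dataclasses import dataclass

# Percentage stages from the schedule above (weeks 4 through 7).
STAGES = [1, 10, 50, 100]


@dataclass
class StageMetrics:
    baseline_error_rate: float   # error rate for users on the legacy path
    treatment_error_rate: float  # error rate for users on the new path


def next_percentage(
    current: int,
    metrics: StageMetrics,
    max_relative_increase: float = 0.10,
) -> int:
    """Return the next rollout percentage, or 0 to roll back on a regression."""
    allowed = metrics.baseline_error_rate * (1 + max_relative_increase)
    if metrics.treatment_error_rate > allowed:
        return 0  # regression detected: roll back
    later = [p for p in STAGES if p > current]
    # Stay at the current percentage once the final stage is reached.
    return later[0] if later else current
```

In practice you would guard on more than one metric (latency, conversion, support tickets), but the shape stays the same: decide the thresholds before the rollout, then let them decide whether to ramp.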
Kill Switch Pattern
For risky features, build a kill switch in from day 1:
if feature_flag_enabled("payment_v2_killswitch"):
    raise FeatureDisabledError("Payment v2 temporarily disabled")
# Normal payment v2 code
The kill switch is normally OFF. Turn it on only when issues occur: it disables the feature immediately, with no code deploy.
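Here is a fuller sketch of how the pieces might fit together. FeatureDisabledError is named above but never defined, so the definition here is an assumption, as are the in-memory FLAGS store and the process_payment functions.

```python
class FeatureDisabledError(Exception):
    """Raised when a kill switch has turned a feature off."""


# Hypothetical in-memory flag store. In practice this would be read from a
# flag service or database so it can change without a deploy.
FLAGS = {"payment_v2_killswitch": False}


def feature_flag_enabled(flag_name: str) -> bool:
    # Unknown flags default to False, so a missing kill switch never blocks traffic.
    return FLAGS.get(flag_name, False)


def process_payment_v2(amount_cents: int) -> str:
    if feature_flag_enabled("payment_v2_killswitch"):
        raise FeatureDisabledError("Payment v2 temporarily disabled")
    return f"charged {amount_cents} cents via v2"


def process_payment(amount_cents: int) -> str:
    """Try v2 first and fall back to the legacy path when it is switched off."""
    try:
        return process_payment_v2(amount_cents)
    except FeatureDisabledError:
        return f"charged {amount_cents} cents via legacy"
```

Whether the caller falls back to a legacy path or surfaces the error to the user depends on the feature. For payments, a fallback is usually the safer default as long as the legacy path is still maintained.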
Cleanup Discipline
Dead flags create technical debt, so audit and remove them on a schedule.
Quarterly flag audit:
- List all flags in code
- Identify which have been at 100% for 30+ days
- Identify which have been at 0% for 90+ days
- Schedule cleanup in next sprint
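The audit rules above are easy to automate once flag metadata is available. The FlagRecord shape below is hypothetical (commercial services expose similar data through their APIs, but the field names here are mine), and the permanent marker is an assumption that exempts permission flags and kill switches.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class FlagRecord:
    name: str
    rollout_percentage: int
    last_changed: date       # date the rollout percentage last changed
    permanent: bool = False  # permission flags and kill switches


def cleanup_candidates(flags, today):
    """Return (flag name, reason) pairs for flags that are due for removal."""
    candidates = []
    for flag in flags:
        if flag.permanent:
            continue  # long-lived by design, never a cleanup candidate
        age = today - flag.last_changed
        if flag.rollout_percentage == 100 and age >= timedelta(days=30):
            candidates.append((flag.name, "at 100% for 30+ days"))
        elif flag.rollout_percentage == 0 and age >= timedelta(days=90):
            candidates.append((flag.name, "at 0% for 90+ days"))
    return candidates
```

Running something like this in CI or as a scheduled report turns the quarterly audit into a list of tickets instead of a meeting.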
Cleanup process:
- Confirm flag is at 100% rollout (or at 0% if the feature was abandoned)
- Remove flag check from code
- Delete the unused code path (legacy at 100%, new at 0%)
- Remove flag definition from service
- Verify in production
Avoid: Keeping flags "just in case" after rollout. They become bug magnets and confuse future engineers.
Common Mistakes
Nested flag dependencies: Flag A controls a feature, Flag B modifies its behavior when A is enabled. This creates state explosion: two boolean flags already mean 4 combinations to test, and each additional flag doubles the count. Avoid.
Flag as config: Using flags for things that don't change in production. Use environment variables for config.
Forgetting cleanup: 18-month-old flags at 100% rollout are common in codebases without discipline. Audit quarterly.
No metrics: Rolling out without monitoring impact. Define success metrics before flag rollout, monitor during.
Cascading rollback complexity: Adding flag B that depends on flag A's state. Makes rollback messy. Keep flags orthogonal.
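To see why dependent flags get expensive, it helps to count the states a test suite would have to cover. The helper below is a small illustration; flag_combinations is a name I made up, and it assumes every flag is a plain boolean.

```python
from itertools import product


def flag_combinations(flag_names):
    """Return every on/off assignment for the given boolean flags."""
    return [
        dict(zip(flag_names, values))
        for values in product([False, True], repeat=len(flag_names))
    ]


for n in range(1, 5):
    names = [f"flag_{i}" for i in range(n)]
    print(n, "flags ->", len(flag_combinations(names)), "combinations")
```

The count goes 2, 4, 8, 16: it doubles with each flag that interacts with the others. Orthogonal flags avoid this because each one can be tested on its own.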
Service vs Custom
Custom (config file or database):
- 0-5 flags
- Single team
- Low frequency of changes
- Minimal cost
- High maintenance burden as flags grow
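For the custom route, a config-file store can be very small. The sketch below assumes a JSON file mapping flag names to booleans and reloads it when the file's modification time changes; FileFlagStore and the file layout are illustrative, not a standard.

```python
import json
from pathlib import Path


class FileFlagStore:
    """Reads boolean flags from a JSON file, reloading when the file changes."""

    def __init__(self, path):
        self._path = Path(path)
        self._mtime = None
        self._flags = {}

    def _reload_if_changed(self):
        try:
            mtime = self._path.stat().st_mtime_ns
        except FileNotFoundError:
            # A missing file means every flag falls back to its default.
            self._flags, self._mtime = {}, None
            return
        if mtime != self._mtime:
            self._flags = json.loads(self._path.read_text(encoding="utf-8"))
            self._mtime = mtime

    def is_enabled(self, flag_name, default=False):
        self._reload_if_changed()
        return bool(self._flags.get(flag_name, default))
```

This checks the file on every evaluation, which is fine for a handful of flags on a single host. The maintenance burden shows up later: no audit trail, no targeting, and every server needs the updated file.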
Service (LaunchDarkly, Statsig, Split.io):
- 10+ flags
- Multi-team
- Frequent changes (multiple per day)
- Cost:-thousands/month
- Better visibility, audit trail, targeting
Open source (Unleash, GrowthBook):
- Self-hosted alternative to commercial services
- Free but requires ops investment
- Good for cost-conscious teams with ops capability
My Recommendations
For startups:
- Custom flags initially via config file
- Migrate to Unleash (self-hosted) at 10+ flags
- Migrate to Statsig (free tier <100K MAU) at experimentation scale
For mid-size SaaS:
- LaunchDarkly or Statsig
- Establish flag governance from day 1
- Quarterly cleanup discipline
For enterprise:
- LaunchDarkly for breadth of features
- Centralized flag service team
- Strong governance and approval flows
- Integration with deploy pipelines
Metrics to Track
Per flag, track:
- Rollout percentage over time
- Affected users
- Performance impact (latency, errors)
- Business metrics impact
- Flag lifetime
Per system, track:
- Total active flags
- Flags >90 days old at 100%
- Flag-related incidents
- Time spent debugging flag issues
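As a starting point for the per-flag numbers, evaluations and errors can be counted per variant in-process. FlagMetrics is a hypothetical helper; a production setup would more likely emit these as tagged metrics to an existing monitoring system.

```python
from collections import Counter


class FlagMetrics:
    """Counts evaluations and errors per (flag, variant) pair."""

    def __init__(self) -> None:
        self.evaluations = Counter()
        self.errors = Counter()

    def record(self, flag_name: str, variant: str, error: bool = False) -> None:
        key = (flag_name, variant)
        self.evaluations[key] += 1
        if error:
            self.errors[key] += 1

    def error_rate(self, flag_name: str, variant: str) -> float:
        key = (flag_name, variant)
        total = self.evaluations[key]
        # Report 0.0 for variants that have not been evaluated yet.
        return self.errors[key] / total if total else 0.0
```

Comparing error_rate for the "on" and "off" variants of the same flag gives the performance-impact signal from the list above without waiting for a dashboard to be built.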
Frequently asked questions
What are feature flags?
Conditional code paths that enable or disable features without redeploying. They decouple deployment from release: you ship code dark, then turn it on for specific users or percentages later. Critical for safe deployments and A/B testing.
Build my own or use a service?
For 0-5 flags, build your own (config file or database). For 10+ flags or organization-wide use, dedicated services (LaunchDarkly, Statsig, Split.io, Unleash) are worth the cost. Custom systems become unmanageable past ~20 flags.
How long should flags live?
Release flags: 1-4 weeks. Permission flags (entitlement): permanent. Experiment flags: 2-8 weeks, then either ramp the winner to 100% or remove. Release or experiment flags that have lived 6+ months are usually candidates for cleanup.
Common mistakes?
Forgetting to clean up flags after 100% rollout (creates dead code). Cascading flag dependencies that become un-debuggable. Using flags as configuration rather than feature toggles. Not having a kill switch for new features.
Cost of feature flag services?
LaunchDarkly:+/month depending on scale. Statsig: free tier for <100K MAUs. Split.io:. Unleash: free open source + paid SaaS. For early-stage startups, free tiers cover initial needs.