GCP

Gemini 3.5 Flash region toggle removed — migrate to Vertex AI endpoints & traffic-split

Google removed the Gemini 3.5 Flash region-scoped feature toggle in mid‑June 2026, forcing teams to use endpoints, model versions, and traffic-split controls.

June 20, 2026·3 min read·AI researched · AI written · AI reviewed

Google just changed the control plane for Gemini 3.5 Flash: the feature-management toggle was removed on June 16, 2026 across Global, US, and EU multi-region deployments. That’s not a cosmetic tweak — it collapses a rollout primitive many teams used for staged experiments, region-specific opt-outs, and compliance-based gating.

If your team relied on toggling behavior at the region level to do slow rollouts, soft launches, or emergency feature kills, you now have to switch patterns. The removal simplifies parity across regions (fewer accidental drift scenarios), and it’s the kind of governance move enterprise platform teams should have pushed for five years ago. But make no mistake: it’s going to break workflows that treated region toggles as a lightweight safety valve.

What just changed

  • Effective June 16, 2026, the region-scoped feature-management toggle for Gemini 3.5 Flash has been removed for multi-region endpoints (global, us, eu). You must now manage behavior at the endpoint, deployed model, or model-version level using Vertex AI's Model and Endpoint APIs and the endpoint traffic-splitting controls.

Why this matters for platform teams

Two simple operational patterns collapse when toggles go away:

  1. Safe, region-by-region canaries. Teams that turned new model behaviors on in a single region and ramped by geography can’t do that via a single toggle anymore.
  2. Emergency region-scoped disables. If a region had a compliance or cost exception, operators previously switched that behavior off locally. That lever is removed.

The right response is to manage deployment topology explicitly: create separate endpoints or model versions per rollout cohort and use Vertex AI endpoint traffic-splitting, blue/green endpoints, or region-aware routing in your edge/proxy layer. Treat model configuration as immutable build artifacts in CI/CD. If you haven’t already automated endpoint creation and trafficSplit management in your Terraform/Argo/Cloud Build pipelines, this change will force that work — which is good for reproducibility but painful the first week.

And a word on cost/governance: uniform behavior across regions reduces the risk of inconsistent inference billing or policy drift. If you’ve wrestled with surprise token costs and cross-region agent billing, the simplification is helpful — albeit blunt.

Other mid‑June GCP updates worth folding into planning

  • GKE and TPU tooling: Updates in mid‑June include improved telemetry for TPU workflows and changes to node auto-provisioning behavior when you run mixed CPU/TPU workloads. If you use TPU node types, double-check node pool templates, taints/tolerations, and startup scripts for TPU drivers; autoscaler expectations may shift.

  • BigQuery + Gemini-based Assistant (Preview): BigQuery now has a preview integration with a Gemini-based assistant surfaced through Vertex AI for query help and schema/lineage hints. AI-assisted lineage can help impact analysis for SQL changes and model retraining triggers; plan to fold these previews into CI for dataset migrations and schema evolution workflows.

  • Managed Spark runtimes: Google delayed a mid‑June rollout of minor Spark runtime updates by about a week. If you scheduled benchmarks or cost/perf comparisons around the earlier window, shift your upgrade testing; the delay affects upgrade gating and budget forecasts for batch ML jobs.

  • GA and incremental releases: Several location and platform APIs reached GA or had minor updates in mid‑June. Also expect incremental updates to contact-center and CCaaS integrations—include those downstream impacts in capacity and placement planning.

Opinion: This is the right call, executed bluntly

Removing the Gemini 3.5 Flash toggle is the right call for enterprise governance and predictable billing. Region-scoped feature toggles create invisible surfaces that leak compliance, auditability, and cost assumptions. However, Google chose a blunt instrument rather than a migration runway: teams now have to upgrade their deployment topology immediately. If your model delivery pipelines aren’t treating endpoints and versions as first-class, immutable artifacts, you’ll be firefighting.

Final thought

Treat this as an operational deadline: bake model endpoints, versioning, and traffic-split automation into your CI/CD this quarter. The era of invisible, region-scoped model feature toggles is ending — and the teams that win will be those that treated models like services from day one. For background on cost and runtime patterns for Gemini-era Vertex AI, our earlier piece on Vertex AI Gemini 3.x agent billing and token costs is still relevant.

Sources

gemini-3-5vertex-aigkebigquery
← All articles
GCP

Vertex AI Agent Engine: Sessions, Memory Bank & Code Execution billing begins 2026-01-28

Vertex AI Agent Engine will charge for Sessions, Memory Bank, and Code Execution starting 2026-01-28. Teams must rethink agent state and cost telemetry.

Jun 19, 2026·3mvertex-aigemini
GCP

Google Gemini Enterprise Agent Platform pricing: AI Cost Summary Agent (Preview) and token-rate details

Google Cloud added an AI Cost Summary Agent (Preview) and published Gemini Enterprise pricing with explicit storage, per-session, and token rates and discounts.

Jun 18, 2026·3mgemini-enterprise-agent-platformgcp
GCP

Vertex AI Gemini 3.x: agent billing, token costs, and Cloud Run GPU patterns

Gemini 3.x on Vertex AI is billed by input and output tokens; agent orchestrations can generate multiple billable events. Track tokens, retrieval, and compute.

Jun 16, 2026·3mvertex-aigemini