DevOps Lab

GCP — GitOps networking


☁️ Google Cloud Platform Portfolio

GCP Cloud Engineer — Enterprise Landing Zone · Networking · Security · IaC

Design and operate enterprise GCP environments with a landing zone, Shared VPC, Private Service Connect, VPC Service Controls, and all infrastructure managed as code with Terraform.

Compute Engine · GKE · Cloud Run · Cloud Functions · Terraform · Cloud Build · Cloud Armor · VPC Service Controls · Shared VPC · Private Service Connect · Cloud Interconnect · Identity-Aware Proxy

  • 4 GCP projects (hierarchy)
  • 3 Shared VPC host projects
  • 100% Terraform-managed
  • 0 public IPs (PROD workloads)

GCP System Overview

🏗️
Enterprise Landing Zone
Resource hierarchy: Organization → Folders → Projects. Org Policies, billing budgets, centralized logging sink into BigQuery. Follows the GCP Cloud Adoption Framework.
🌐
Shared VPC Architecture
The host project centrally manages subnets and firewall rules. Service projects attach to the Shared VPC. Private Service Connect for the internal service mesh.
🔒
Zero-Trust Security
VPC Service Controls create a security perimeter. Cloud Armor WAF + DDoS. Identity-Aware Proxy for internal apps. No workload has a public IP in PROD.
Hybrid Connectivity
Cloud Interconnect (Dedicated/Partner) for on-prem. HA VPN as backup. Cloud DNS with DNS peering between VPCs and the on-prem resolver.
🤖
Infrastructure as Code
100% Terraform, with a module hierarchy mirroring the resource hierarchy. Remote state on GCS with object versioning. Cloud Build pipeline for automated applies.
📊
Observability
Cloud Monitoring dashboards, log-based alerting, Cloud Trace. Security posture via Security Command Center. Cost visibility via Billing Export → BigQuery → Looker Studio.
📐 Key Design Decisions
Shared VPC — Separates network governance (host project) from workloads (service projects). The networking team controls all subnets and firewall rules — dev teams cannot open ports on their own.
VPC-SC — A security perimeter wraps the PROD project: Cloud Storage, BigQuery, Artifact Registry, Secret Manager. Data exfiltration is blocked even if IAM is compromised.
PSC — Private Service Connect replaces VPC peering. No transitive routing, no IP-overlap issues. GKE → Cloud SQL, Memorystore, and internal APIs all go through PSC endpoints.
Org Policy — Disable external IPs (compute.vmExternalIpAccess), restrict resource locations (gcp.resourceLocations), enforce uniform bucket-level access. Enforced at the Org level — cannot be overridden below.
Interconnect HA — 2 Dedicated Interconnect circuits × 2 edge availability zones = 99.99% SLA. HA VPN as cold standby. Automatic BGP failover in under 1 minute.
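The Org Policy guardrails above can be sketched in Terraform — a minimal sketch using the `google_organization_policy` resource, with `var.org_id` as a placeholder:

```terraform
# Deny external IPs on all VMs, org-wide (list constraint, deny all)
resource "google_organization_policy" "no_external_ip" {
  org_id     = var.org_id
  constraint = "compute.vmExternalIpAccess"

  list_policy {
    deny {
      all = true
    }
  }
}

# Restrict where resources may be created (list constraint, allow-list)
resource "google_organization_policy" "resource_locations" {
  org_id     = var.org_id
  constraint = "gcp.resourceLocations"

  list_policy {
    allow {
      values = ["in:asia-locations"]
    }
  }
}

# Enforce uniform bucket-level access on all GCS buckets (boolean constraint)
resource "google_organization_policy" "uniform_bucket_access" {
  org_id     = var.org_id
  constraint = "storage.uniformBucketLevelAccess"

  boolean_policy {
    enforced = true
  }
}
```

The same constraints can also be expressed with the newer `google_org_policy_policy` resource; either way, enforcement at the Org node means no folder or project below can override it.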

Technologies Used

⚙️ Compute & Serverless
Service | Use Case | Config
GKE Autopilot | Production workloads | asia-southeast1
Compute Engine | Bastion, build agents, legacy VMs | Spot VMs enabled
Cloud Run | Stateless microservices, APIs | VPC egress via PSC
Cloud Functions v2 | Event-driven, Pub/Sub triggers | Gen2 (Cloud Run backed)
Cloud Storage | Terraform state, artifacts, data lake | VPC-SC protected
🌐 Networking
Service | Use Case | Detail
Shared VPC | Centralized network governance | Host + 4 service projects
Cloud Load Balancing | Global L7 HTTPS, regional L4 TCP | GLB + ILB
Private Service Connect | Internal service endpoints | No transitive routing
Cloud Interconnect | On-prem hybrid connectivity | 2×10 Gbps HA
Cloud DNS | Private zones + DNS peering | On-prem forwarding
🔒 Security
Service | Role | Scope
IAM + Workload Identity | Identity & access management | Org + Project level
VPC Service Controls | Data exfiltration prevention | PROD perimeter
Cloud Armor | WAF + DDoS protection | GLB security policy
Identity-Aware Proxy | Zero-trust app access | Internal tools
Secret Manager | Secrets & credential management | CMEK encrypted
🤖 IaC & CI/CD
Tool | Role | Detail
Terraform | Infrastructure provisioning | Modules + remote GCS backend
Cloud Build | Terraform CI/CD pipeline | Trigger on PR → plan, merge → apply
GitHub Actions | App CI + Workload Identity | OIDC → no long-lived keys
Ansible | VM config management, OS hardening | Dynamic inventory via GCP plugin
Artifact Registry | Docker images, Terraform modules | VPC-SC + CMEK
📋 GCP Services — Quick Reference Map
Compute
  • GKE Autopilot
  • Compute Engine
  • Cloud Run
  • Cloud Functions v2
  • Batch
Storage & Data
  • Cloud Storage
  • Cloud SQL (HA)
  • Memorystore (Redis)
  • BigQuery
  • Pub/Sub
Security
  • IAM + Org Policy
  • VPC Service Controls
  • Cloud Armor
  • Identity-Aware Proxy
  • Secret Manager (CMEK)
Ops & Observability
  • Cloud Monitoring
  • Cloud Logging
  • Cloud Trace
  • Security Command Center
  • Billing Export → BQ

Enterprise Landing Zone — Resource Hierarchy

🎯 Why a Landing Zone?

The Landing Zone is the foundation laid before deploying any workload. Without one, every team creates its own projects → uncontrolled billing, chaotic IAM, a flat network, no audit trail.

With a Landing Zone: centralized governance, Org Policies as guardrails, consistent networking, automatic audit logs and billing budgets — following the GCP Cloud Adoption Framework (Learn · Lead · Scale · Secure).

Diagram — Resource hierarchy: Organization (duynguyen.com) → Folders (PROD / Non-Prod / Shared) → Projects: Host/Network (Shared VPC) · Workload projects (GKE · Cloud Run · Data) · Security/Audit (SCC · Logging).
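A minimal Terraform sketch of this hierarchy — `var.org_id` and `var.billing_account` are placeholders, and project IDs must be globally unique:

```terraform
# Folders group projects under shared governance; Org Policies and IAM inherit downward.
resource "google_folder" "prod" {
  display_name = "PROD"
  parent       = "organizations/${var.org_id}"
}

resource "google_folder" "non_prod" {
  display_name = "Non-Prod"
  parent       = "organizations/${var.org_id}"
}

# Host/network project placed under the PROD folder
resource "google_project" "prod_net" {
  name            = "prj-prod-net"
  project_id      = "prj-prod-net" # must be globally unique in practice
  folder_id       = google_folder.prod.name
  billing_account = var.billing_account
}
```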

👥 Multi-Team IAM — Production-Ready Access Design (GCP)

🧭 Two IAM Tiers — Landing Zone + Project Level
Tier 1 — Landing Zone (Org-level)
Google Groups synced from Google Workspace/Okta → IAM bindings at the Folder level → Org Policies as guardrails. Defines who may do what in which folder.
Tier 2 — Project Level
Fine-grained IAM bindings for GKE, Cloud SQL, GCS, VMs + K8s RBAC per namespace. GCP has no SCPs like AWS, but Org Policies serve as an equivalent guardrail.
🏢 Layer 1 — Google Workspace Groups + Org Policies
  • sre@ — SRE / Platform
  • dev@ — App Engineers
  • data@ — Data / ML
  • security@ — Audit / Compliance
  • finops@ — Billing / Cost
📁 PROD folder
sre@ → editor (scoped) · security@ → viewer (all) · dev@ → NO folder access
📁 Non-Prod folder
dev@ → editor (staging) · data@ → viewer · sre@ → owner (full)
🔒 Org Policies
disableSAKeyCreation · requireOsLogin · no primitive roles in PROD
🗂️ Layer 2+3 — Team × Resource Access Matrix
Team | GKE PROD | Cloud SQL | GCS | BigQuery | VM / IAP | Secret Mgr
sre@ | container.admin → cluster-admin | cloudsql.admin (full DBA) | storage.admin (all) | bigquery.admin | osAdminLogin + MFA (PROD) | secretmanager.admin
dev@ | container.developer → ns-developer (own ns) | cloudsql.client · no DROP | objectCreator (app bucket) | — | osLogin (DEV only) | secretAccessor /dev/*
data@ | container.viewer (data-ns) | cloudsql.viewer · SELECT only | objectViewer /data/ | dataViewer | ❌ Denied | ❌ Denied
security@ | container.viewer (all ns) | cloudsql.viewer (read-only) | objectViewer (all) | metadataViewer | Describe only | List (no value)
finops@ | — | — | getBucketTagging | — | Describe (cost labels) | —
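Tier 1 folder bindings from the matrix can be sketched like this — `google_folder.prod` is assumed to be a folder resource defined elsewhere, and the group addresses are illustrative:

```terraform
# security@ can read everything in the PROD folder, nothing more
resource "google_folder_iam_member" "security_viewer_prod" {
  folder = google_folder.prod.name
  role   = "roles/viewer"
  member = "group:security@duynguyen.com"
}

# sre@ gets scoped editor on PROD
resource "google_folder_iam_member" "sre_editor_prod" {
  folder = google_folder.prod.name
  role   = "roles/editor"
  member = "group:sre@duynguyen.com"
}

# dev@ gets NO binding on the PROD folder — the absence of a binding is the deny.
```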

GCP Networking Deep Dive

Shared VPC separates network governance (the host project manages subnets, firewalls, routes) from the service projects that host workloads. Private Service Connect replaces VPC peering for internal service connectivity; Cloud NAT + Cloud Router provide controlled egress. Shared VPC details and the peering comparison table are in the Shared VPC Deep Dive section.


Shared VPC — Enterprise Production Architecture

🤔 What are Folders for? Why Shared VPC?
📁 Folders — Grouping + governance boundary

A Folder is a logical container grouping projects under the same governance. Org Policies and IAM are inherited top-down — set once at the folder, and every project inside inherits.

  • Environment isolation: Prod / Non-Prod / Management folders → separate blast radius.
  • IAM delegation: grant a folder-level role → it applies to every project in the folder.
  • compute.restrictSharedVpcSubnetworks per folder → a team can only use its assigned subnets.
🌐 Shared VPC — Why not VPC Peering?
 | VPC Peering | Shared VPC
Routing | Non-transitive (A↔B, B↔C ≠ A↔C) | All service projects share one VPC → automatic full mesh
Network governance | Each project manages its own firewall | Centralized — the host project team controls everything
IP overlap | CIDRs must not overlap | Not an issue — same VPC
Scale | Max 25 peerings per VPC | No limit on service projects
Interconnect sharing | Must be replicated per project | One Interconnect → shared by all service projects

Key insight: Shared VPC removes the need to replicate the same solution in every project — for example, one Cloud Interconnect serves all service projects.
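A minimal Terraform sketch of the host/service split — project and network names follow the architecture diagram and are assumptions about this environment:

```terraform
# Host project exposes its VPC; service projects attach to it.
resource "google_compute_shared_vpc_host_project" "host" {
  project = "prj-prod-net"
}

resource "google_compute_shared_vpc_service_project" "app" {
  host_project    = google_compute_shared_vpc_host_project.host.project
  service_project = "prj-prod-app"
}

# Subnet with secondary ranges for GKE pods/services (CIDRs from the diagram)
resource "google_compute_subnetwork" "app" {
  name                     = "subnet-app"
  project                  = "prj-prod-net"
  region                   = "asia-southeast1"
  network                  = "vpc-prod-shared"
  ip_cidr_range            = "10.10.0.0/20"
  private_ip_google_access = true # reach Google APIs without public IPs

  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "100.64.0.0/17"
  }
  secondary_ip_range {
    range_name    = "services"
    ip_cidr_range = "100.96.0.0/22"
  }
}
```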

🗺️ Production-Ready Shared VPC — Full Architecture (asia-southeast1)
Diagram — Production-ready Shared VPC, full architecture (asia-southeast1):
  • Edge: Internet (users · partners · external APIs) → Global LB (EXTERNAL_MANAGED L7) + Cloud Armor (WAF · DDoS · Geo).
  • HOST PROJECT prj-prod-net — centralized network governance. VPC vpc-prod-shared: custom mode, global routing, 10.0.0.0/8, no default routes, no auto subnets. Subnets: subnet-lb-proxy 10.10.100.0/24 (REGIONAL_MANAGED_PROXY, required for ILB L7) · subnet-app 10.10.0.0/20 (GKE nodes + Cloud Run; PGA + Flow Logs; secondary ranges pods /17, services /22) · subnet-data 10.10.16.0/20 (Cloud SQL · Memorystore; PGA + Flow Logs) · subnet-psc 10.10.200.0/28 (PSC consumer endpoints, PRIVATE_SERVICE_CONNECT) · subnet-nat 10.10.250.0/28 (optional PRIVATE_NAT static egress pool) · BGP link IPs 169.254.x.x/29 (Interconnect VLAN attachments, not routable in the VPC). Firewall rules (VPC-level, tag-based, host-project team ONLY): deny-all-ingress (65534) · allow-internal (10.0.0.0/8 → tag:internal) · allow-ilb-health (130.211.0.0/22, 35.191.0.0/16 → tag:ilb-backend) · allow-iap-ssh (35.235.240.0/20 → tag:ssh) · allow-nat-egress (tag:nat → 0.0.0.0/0).
  • SERVICE PROJECT prj-prod-app — workloads only, no network admin rights, uses subnet-app. GKE Autopilot gke-prod-ase1 (go-api and api-app pods on 10.10.0.x; pod CIDR 100.64.0.0/17, service CIDR 100.96.0.0/22). Cloud Run (VPC egress ALL_TRAFFIC, private ingress, no external IP, NEG → GLB backend). Internal L7 LB (INTERNAL_MANAGED) 10.10.0.50:443 on subnet-lb-proxy for internal microservice traffic. Cloud Functions v2 (Pub/Sub triggers, VPC connector → subnet-app, tag:nat). Workload Identity: go-api-sa → Secret Manager, gke-sa → Storage/AR; no SA key files.
  • SERVICE PROJECT prj-prod-data — managed data services on subnet-data, VPC-SC protected. Cloud SQL HA PostgreSQL 15 (primary 10.10.16.2, standby 10.10.16.3; private IP only; PSC producer attachment; CMEK via Cloud KMS; backups to gs://prod-sql-bk). Memorystore Redis 7.0 HA (10.10.16.10/.11; private IP only; PSC producer attachment; in-transit encryption; AOF persistence). Cloud Storage in the VPC-SC perimeter: gs://prod-app-data (Standard, CMEK, UBL, versioning), gs://prod-sql-backup (Nearline, CMEK, UBL, 7-day retention lock); access ONLY via PSC endpoint 10.10.200.4, no public googleapis.com. BigQuery datasets billing_export · app_analytics · audit_logs (VPC-SC protected, column-level security, authorized views).
  • SERVICE PROJECT prj-prod-platform — CI/CD · monitoring · Artifact Registry · DNS. PSC consumer endpoints (subnet-psc): psc-cloudsql 10.10.200.2 → Cloud SQL service attachment · psc-redis 10.10.200.3 → Memorystore · psc-all-apis 10.10.200.4 → Google APIs (all-apis bundle) · psc-artifact-reg 10.10.200.5 → Artifact Registry; DNS overrides *.googleapis.com → 10.10.200.4 and *.pkg.dev → 10.10.200.5 (private zone dns-hub). Artifact Registry (docker, go-modules; CMEK, VPC-SC, asia-southeast1). Cloud Build private pool (VPC) → push to AR → deploy to GKE. Cloud Monitoring · Logging · SCC: log sink from ALL projects → BigQuery + 1-year GCS archive; dashboards for GKE, Cloud Run, Cloud SQL, cost per project. Cloud DNS (dns-hub): private zone internal.duynguyen.com (all VPCs via peering), forwarding zone corp.duynguyen.local → 192.168.1.53 (on-prem via Interconnect), PSC zones for googleapis.com and pkg.dev.
  • Egress & hybrid (host project): Cloud NAT (asia-southeast1) with static IPs 34.x.x.1/34.x.x.2, tag:nat — GKE nodes + Cloud Run → internet, egress only; no VM has an external IP (Org Policy vmExternalIpAccess = DENY). Cloud Router + Dedicated Interconnect 2×10 Gbps (BGP ASN 65001, priority 100) + HA VPN (2 tunnels, priority 200), failover <60 s; advertises 10.10.0.0/16, receives 192.168.0.0/16. On-premises 192.168.0.0/16: DNS 192.168.1.53, corp devices (BeyondCorp), LDAP/AD; Dedicated Interconnect colocation Equinix SG2, BGP ASN 65002.
  • ORG-LEVEL CONTROLS: Org Policies (8 constraints) · Access Policy (VPC-SC) · billing budgets · SCC Premium. Bootstrap SA terraform@prj-bootstrap; Org Admin role bound only on bootstrap.

🎯 GCP Interview Deep Dive — Frequently Asked

❓ Q1 — What are GCP's equivalents of AWS's NAT GW / Internet GW?
GCP: no explicit "Internet Gateway"
  • A GCP VPC is global — no IGW attachment. Subnets have Private Google Access to reach GCP APIs without a public IP.
  • Inbound internet: GLB + public IP → forwards to private backends. There is no "public subnet" as in AWS.
  • Outbound from an instance with a public IP: the 0.0.0.0/0 route goes through the default internet gateway (implicit, no configuration needed).
  • Outbound from an instance without a public IP: must use Cloud NAT.
Cloud NAT — outbound only
  • Attaches to a Cloud Router (not to a subnet as in AWS). The Cloud Router is the BGP speaker; Cloud NAT uses it for routing.
  • Auto-allocated external IPs or reserved IPs. SNAT: private IP → NAT external IP.
  • GCP APIs do not go through Cloud NAT — use Private Google Access or a PSC endpoint instead.
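The Cloud Router + Cloud NAT attachment described above might look like this in Terraform (project and network names are assumptions from the architecture diagram):

```terraform
# Reserved static IP so downstream systems can allow-list NAT egress
resource "google_compute_address" "nat" {
  name    = "nat-ip-1"
  project = "prj-prod-net"
  region  = "asia-southeast1"
}

# Cloud NAT attaches to a Cloud Router, not to a subnet.
resource "google_compute_router" "prod" {
  name    = "rtr-prod-ase1"
  project = "prj-prod-net"
  region  = "asia-southeast1"
  network = "vpc-prod-shared"
}

resource "google_compute_router_nat" "egress" {
  name                               = "nat-prod-ase1"
  project                            = "prj-prod-net"
  region                             = "asia-southeast1"
  router                             = google_compute_router.prod.name
  nat_ip_allocate_option             = "MANUAL_ONLY"         # use the reserved IP above
  nat_ips                            = [google_compute_address.nat.self_link]
  source_subnetwork_ip_ranges_to_nat = "LIST_OF_SUBNETWORKS" # only named subnets get NAT

  subnetwork {
    name                    = "subnet-app"
    source_ip_ranges_to_nat = ["ALL_IP_RANGES"]
  }

  log_config {
    enable = true
    filter = "ERRORS_ONLY"
  }
}
```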
Diagram — GCP networking interview flows. Inbound (internet → GKE pod): Internet → GLB → Cloud Armor → Backend Service → NEG (pod IP) → GKE pod (private IP, no public IP). Outbound (pod → internet: PyPI, GitHub): GKE pod → Cloud NAT + Cloud Router → Internet (no inbound path). Outbound (pod → GCP APIs: Secret Manager, GCS): GKE pod → PSC endpoint 10.10.200.4 (via DNS) → GCP APIs over the Google backbone (no NAT, no internet). Mapping: AWS IGW ↔ GCP implicit default internet gateway + GLB for inbound · AWS NAT GW ↔ GCP Cloud NAT (attached to a Cloud Router, not a subnet) · AWS VPC Endpoint ↔ GCP Private Service Connect + Private Google Access · AWS public subnet ↔ no GCP equivalent — the GLB receives traffic, pods stay private. Key difference: a GCP VPC is global — one VPC spans regions, unlike AWS's regional VPCs.
❓ Q2 — PSC vs VPC Peering vs Shared VPC — when to use which?
Diagram — PSC vs Peering vs Shared VPC. Private Service Connect: one producer service (NEG/ILB endpoint behind a forwarding rule) → N consumers, CIDR overlap OK, cross-account OK — best for exposing one service or consuming managed services (Cloud SQL, Redis). VPC Peering: full routing between 2 VPCs, NOT transitive (VPC C cannot reach VPC A through B); N VPCs = N² peerings and CIDRs must not overlap — does not scale (prefer Shared VPC). Shared VPC ⭐ (used in this project): one host VPC with N service projects sharing subnets — host project prj-prod-net owns subnet-app 10.10.0.0/20 and subnet-data 10.10.16.0/20; service projects prj-prod-app (GKE · Cloud Run), prj-prod-data (Cloud SQL · Redis), prj-prod-platform (CI/CD · Registry). Pros: centralized firewall (host project controls), shared Interconnect / Cloud NAT, Org Policy subnet restriction per folder. Cons: not for full cross-org isolation; service-project workloads are coupled to the host VPC.
Scenario | Use | Why
GKE + Cloud SQL + Cloud Run in the same org, shared network needed | Shared VPC | Centralized firewall, subnet governance, Interconnect sharing
Expose Cloud SQL to another service project without sharing subnets | PSC | Managed service endpoint, CIDR isolation, cross-project OK
2 VPCs need to talk, no shared host project | VPC Peering | Simple two-way routing, non-transitive, CIDRs must not overlap
GCP managed services (Secret Manager, GCS) from private pods | PSC (all-APIs endpoint) | No internet, no NAT, VPC-SC enforced
N projects reach on-prem over one Interconnect | Shared VPC | Host project owns the Cloud Router + Interconnect; service projects inherit
💡 GCP vs AWS equivalents: Shared VPC ≈ Transit Gateway (at the network-ownership level). PSC ≈ AWS PrivateLink. VPC Peering ≈ AWS VPC Peering (same non-transitive limitation).
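The all-APIs PSC endpoint from the table, plus its DNS override, can be sketched in Terraform — assuming the host project and the 10.10.200.4 address from the architecture section; `var.network_self_link` is a placeholder:

```terraform
# Reserved internal address for the PSC endpoint
resource "google_compute_global_address" "psc_all_apis" {
  name         = "psc-all-apis"
  project      = "prj-prod-net"
  network      = var.network_self_link
  address_type = "INTERNAL"
  purpose      = "PRIVATE_SERVICE_CONNECT"
  address      = "10.10.200.4"
}

# Forwarding rule targeting Google's all-apis bundle (name must be 1-20 chars)
resource "google_compute_global_forwarding_rule" "psc_all_apis" {
  name                  = "pscallapis"
  project               = "prj-prod-net"
  network               = var.network_self_link
  ip_address            = google_compute_global_address.psc_all_apis.id
  target                = "all-apis"
  load_balancing_scheme = "" # must be empty for PSC-for-Google-APIs
}

# Private zone overriding *.googleapis.com to the PSC endpoint
resource "google_dns_managed_zone" "googleapis" {
  name       = "googleapis-psc"
  project    = "prj-prod-net"
  dns_name   = "googleapis.com."
  visibility = "private"

  private_visibility_config {
    networks {
      network_url = var.network_self_link
    }
  }
}

resource "google_dns_record_set" "googleapis_wildcard" {
  name         = "*.googleapis.com."
  project      = "prj-prod-net"
  managed_zone = google_dns_managed_zone.googleapis.name
  type         = "A"
  ttl          = 300
  rrdatas      = ["10.10.200.4"]
}
```

With this in place, pods resolve any `*.googleapis.com` name to 10.10.200.4 and reach Google APIs over the backbone — no NAT, no internet.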
❓ Q3 — Cloud SQL logs + GKE workload logs → Cloud Logging / Elasticsearch

Cloud SQL: enable audit/slow-query flags → logs flow automatically into Cloud Logging → sink to BigQuery or Pub/Sub → ES. GKE: built-in Fluent Bit (Autopilot) ships stdout to Cloud Logging; Vector/Alloy can be added for multi-sink delivery to Loki.

Cloud SQL → Cloud Logging
  • Flags: cloudsql.enable_pgaudit, log_min_duration_statement, log_connections — logs ship automatically to Cloud Logging (no on-instance agent, unlike RDS + CW agent).
  • Log sinks: Logging → BigQuery (SQL analysis) or Pub/Sub → Dataflow/Logstash → Elasticsearch.
GKE workload logs
  • GKE Autopilot: the Fluent Bit DaemonSet ships stdout/stderr into Cloud Logging; label pods for filtering.
  • RabbitMQ / sidecars: log to stdout → same pipeline; self-hosted ES via a sink + Pub/Sub if needed.
terraform (Cloud SQL + sinks)
# Cloud SQL — enable pgAudit, slow-query and connection logging
resource "google_sql_database_instance" "prod" {
  name             = "prod-pg"
  database_version = "POSTGRES_15"
  region           = "asia-southeast1"

  settings {
    tier = "db-custom-4-16384"

    database_flags {
      name  = "cloudsql.enable_pgaudit"
      value = "on"
    }
    database_flags {
      name  = "log_min_duration_statement"
      value = "1000" # ms — log statements slower than 1s
    }
    database_flags {
      name  = "log_connections"
      value = "on"
    }
  }
}

# Log sink → BigQuery (SQL analysis on audit logs)
resource "google_logging_project_sink" "cloudsql_bq" {
  name                   = "cloudsql-to-bigquery"
  destination            = "bigquery.googleapis.com/projects/${var.project}/datasets/audit_logs"
  filter                 = "resource.type=\"cloudsql_database\""
  unique_writer_identity = true # grant this SA bigquery.dataEditor on the dataset
}

# Log sink → Pub/Sub (feeds Dataflow/Logstash → Elasticsearch)
resource "google_logging_project_sink" "to_pubsub" {
  name                   = "logs-to-pubsub"
  destination            = "pubsub.googleapis.com/projects/${var.project}/topics/logs-export"
  filter                 = "resource.type=\"k8s_container\" OR resource.type=\"cloudsql_database\""
  unique_writer_identity = true # grant this SA pubsub.publisher on the topic
}
Diagram — Cloud SQL and GKE log paths on GCP. Cloud SQL → Cloud Logging → BigQuery, or Pub/Sub → Dataflow/Logstash → Elasticsearch. GKE / RabbitMQ: pod stdout → Fluent Bit (built into Autopilot) → Cloud Logging → Loki / ES.
❓ Q4 — Internet → private GKE service + attack handling (GCP)

GCP: Internet → Cloud Armor → GLB (HTTPS) → NEG (direct pod IP). No separate Ingress controller is needed, unlike the AWS LBC — the GLB is GCP's managed edge.

Diagram — GLB + Cloud Armor + IAP to GKE. Internet → private pod: Client → Cloud Armor (WAF + DDoS) → GLB HTTPS (TLS termination) → IAP (Google OIDC) → Backend + NEG (pod IP) → GKE pod (private IP). vs AWS: ALB → LBC → ClusterIP → Pod. Internal (svc → svc): Pod A → Service ClusterIP → Pod B (kube-proxy / eBPF); NetworkPolicy: deny-all, allow explicitly.
Common attacks & GCP defense layers
Type | Layer | Defense | Detect
Volumetric DDoS | L3/L4 before the GLB | Built-in, always-on GCP DDoS Protection. Cloud Armor Adaptive Protection — ML creates rules on anomalies. | Cloud Armor logs → Monitoring alert
SQLi / XSS | L7 through the GLB | Cloud Armor WAF: sqli-v33-stable, xss-v33-stable. Policy attached to the backend service. | Cloud Armor sampled requests
Unauthorized access | GLB → backend | IAP — Google OIDC before traffic reaches pods; domain/device conditions. | IAP audit — rejected requests
GLB bypass → pod | VPC firewall | Only allow GLB health-check ranges (130.211.0.0/22, 35.191.0.0/16). Low-priority deny-all-ingress. NetworkPolicy deny-all except the ingress namespace. | VPC Flow Logs → BigQuery
Malicious image | GKE admission | Binary Authorization + Artifact Analysis CVE scanning. Kyverno allows images only from Artifact Registry. | SCC Container Threat Detection
Container escape | Node | GKE Autopilot hardened — COS, Shielded VM, gVisor per workload. Minimal node SA. | SCC GKE Threat Detection → Pub/Sub
Data exfil | VPC-SC | VPC Service Controls — exfiltration of BigQuery/GCS outside the perimeter is blocked even with valid IAM. | SCC anomalous access
💡 Short answer: L1 built-in GCP DDoS. L2 Cloud Armor WAF. L3 IAP. L4 Firewall + NetworkPolicy. L5 Binary Auth + Kyverno. L6 VPC-SC. L7 Workload Identity (no JSON keys).
📋 Q5 — Quick Reference — "When to use what" (GCP)
Scenario | GCP solution | Avoid
GKE pods calling GCP APIs (Secret Manager, GCS) without the internet | PSC endpoint + Cloud DNS override for *.googleapis.com | Cloud NAT ($/GB) for Google APIs
N projects sharing subnets + firewall | Shared VPC (host owns the network) | VPC Peering — N², non-transitive
Expose Cloud SQL to another project with CIDR overlap | PSC (psc-cloudsql) | Peering (CIDRs must be unique)
Internet → private GKE pod | GLB HTTPS + Cloud Armor + IAP | Direct NodePort/LoadBalancer (bypasses the WAF)
Pull images from private Artifact Registry | PSC + node SA with artifactregistry.reader | Uncontrolled public registry
On-prem → GCP, high bandwidth | Dedicated Interconnect + HA VPN standby | HA VPN alone (variable latency)
SSH without a public bastion | IAP TCP tunneling (gcloud compute ssh --tunnel-through-iap) | Public IP + SSH
Cloud SQL credentials without key files | Workload Identity → cloudsql.client | JSON keys on pods
Prevent exfil when IAM is compromised | VPC Service Controls perimeter | IAM alone (key theft bypasses it)
Multi-env without drift | Flux ResourceSet — one template, per-env defaultValues | Helm values copy-pasted per env
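The IAP SSH pattern from the table needs only one firewall rule admitting Google's IAP TCP forwarding range — a sketch, with project/network/tag names taken from the architecture section:

```terraform
# Only Google's IAP range may open SSH; no VM needs a public IP.
resource "google_compute_firewall" "allow_iap_ssh" {
  name      = "allow-iap-ssh"
  project   = "prj-prod-net"
  network   = "vpc-prod-shared"
  direction = "INGRESS"

  allow {
    protocol = "tcp"
    ports    = ["22"]
  }

  source_ranges = ["35.235.240.0/20"] # IAP TCP forwarding range
  target_tags   = ["ssh"]
}
```

Then connect with `gcloud compute ssh <vm> --zone=asia-southeast1-a --tunnel-through-iap` — no public IP, no bastion.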

GCP Security Deep Dive — Defense in Depth

The diagram below shows the order of controls from the Internet to the workload: Cloud Armor on the GLB → IAP (internal zero-trust) → VPC Service Controls + firewall → IAM / Workload Identity → Secret Manager + CMEK + Binary Authorization.

🛡️ Security Architecture — Defense in Depth (overview)
Diagram — GCP security defense in depth. Layer 1 (Edge) Cloud Armor: WAF + DDoS + Geo — OWASP Top 10 rules (SQLi · XSS · RCE blocked), rate limiting 1000 req/min per IP, geo restriction (allow VN, SG, US only), adaptive DDoS auto-detection (L3/L4/L7), preconfigured ModSecurity CRS 3.3 rules; attached to the GLB backend service. Layer 2 (Identity) Identity-Aware Proxy: zero-trust app access — Google identity verification (OAuth2 OIDC flow), IAM Conditions (email · group · time-of-day), corp-device policy (BeyondCorp), SSH via IAP tunnel from 35.235.240.0/20 only; internal apps, no VPN needed. Layer 3 (Network) VPC Service Controls: data-exfil perimeter around GCS · BQ · AR · Secret Manager, Access Levels (IP range · device · identity), VPC-SC bridge DEV ↔ PROD, tag-based deny-default firewall rules — even if IAM is bypassed, data stays inside the perimeter. Layer 4 (Auth) IAM + Workload Identity Federation: K8s SA → GCP SA with no keys, IAM Conditions (resource.name · time · IP), Org Policy forbidding SA keys, one minimal SA per workload — no long-lived credentials, OIDC tokens only. Layer 5 (Data) Secret Manager (versioned · CMEK · audited), Cloud KMS CMEK (GCS · BQ · GKE etcd), TLS everywhere with mTLS in the GKE mesh, Binary Authorization — only signed images deploy.
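Layer 3's perimeter can be sketched in Terraform — `var.access_policy_id` and `var.prod_data_project_number` are placeholders:

```terraform
# VPC-SC perimeter around the PROD data services: even a valid IAM identity
# cannot move data in these services outside the perimeter.
resource "google_access_context_manager_service_perimeter" "prod" {
  parent = "accessPolicies/${var.access_policy_id}"
  name   = "accessPolicies/${var.access_policy_id}/servicePerimeters/prod_perimeter"
  title  = "prod_perimeter"

  status {
    resources = ["projects/${var.prod_data_project_number}"]

    restricted_services = [
      "storage.googleapis.com",
      "bigquery.googleapis.com",
      "artifactregistry.googleapis.com",
      "secretmanager.googleapis.com",
    ]
  }
}
```

Test changes with a dry-run spec (`use_explicit_dry_run_spec`) before enforcing — a perimeter misconfiguration can lock legitimate pipelines out.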

Terraform Modules — Resource Labeling Convention

🏷️ GCP Resource Labeling — Cost Allocation & Module Attribution

Every Terraform module follows a consistent labeling pattern: labels derive from basename(abspath(path.module)) — giving an accurate cost breakdown per team/product, and the terraform-module label updates itself when a folder is renamed.

modules/_common/labels.tf (pattern)
locals {
  default_labels = {
    (basename(abspath(path.module))) = var.name
    "terraform-module" = basename(abspath(path.module))
    "product"          = var.product
    "team"             = var.team
    "environment"      = var.environment
  }
  merged_labels = merge(local.default_labels, var.labels)
}
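How a module consumes `merged_labels` — a sketch of a hypothetical `modules/gcs_bucket` resource using the pattern above:

```terraform
# Every resource in the module attaches merged_labels, so billing export
# can group cost by team, product, environment, and originating module.
resource "google_storage_bucket" "this" {
  name                        = var.name
  location                    = "ASIA-SOUTHEAST1"
  uniform_bucket_level_access = true
  labels                      = local.merged_labels
}
```

Caller-supplied `var.labels` win over the defaults because they are the second argument to `merge()`.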

VictoriaMetrics — Cluster → Operator Migration on GCP

⚡ Why migrate from VM Cluster (Helm) → vm-operator on GCP?
❌ Before: Helm per project
  • Many Helm values files for dev/staging/prod — drift hard to detect.
  • Different vminsert/vmstorage/vmselect replica counts per env — easy to miscount.
  • Helm upgrades rolled out cluster by cluster — a wrong sequence can split-brain vmstorage.
  • Scrape rules in hand-maintained ConfigMaps — no validation, diffs hard to review.
  • GCP metrics (Cloud SQL, Cloud Run, Pub/Sub) scraped separately — ~8 h/month of maintenance.
✅ After: victoria-metrics-operator
  • VMCluster CRD — the operator manages the lifecycle, zero-downtime rolling upgrades.
  • VMServiceScrape / VMPodScrape — discovery by K8s labels, schema-validated.
  • VMRule in Git — alerts validated on apply, drift corrected by the operator.
  • VMAgent with WIF — no SA key files; Flux ResourceSet keeps every env in parity.
  • Operations cut sharply (~45 min/month vs ~8 hours) once GitOps is standardized.
⚔️ VictoriaMetrics on GCP vs AWS — Key Differences
Component | AWS (EKS) | GCP (GKE Autopilot) | Why different
vmstorage disk | gp3 · EBS | premium-rwo · PD-SSD | Autopilot is RWO-only; premium-rwo has enough IOPS for write-heavy vmstorage.
VMAgent identity | IRSA — EKS annotation → IAM role | WIF — iam.gke.io/gcp-service-account annotation | Same no-key-file pattern; Terraform workloadIdentityUser binding.
VMAgent queue | S3 persistent queue | GCS persistent queue | remoteWrite.persistentQueue supports both S3 and GCS.
GCP-managed metrics | CloudWatch exporter | Stackdriver exporter + monitoring.viewer | API rate limits — usually a 60 s interval in VMServiceScrape.
Grafana access | ALB + ACM + Cognito/OIDC | ILB L7 + IAP + Google OIDC | IAP replaces the edge auth layer (BeyondCorp) in front of Grafana.
Security findings | Security Hub → SNS → Alertmanager | SCC Premium → Pub/Sub → webhook | Same alerting flow; different managed services.
Billing alerts | Cost Explorer → CloudWatch alarm | Billing Export → BigQuery + VMRule | Full cost data lands in BQ; billing metrics can alert via VMRule.

The wiki HTML also includes the full VMCluster diagram (~1020×620) and the Vector/Fluent Bit pipeline — not embedded verbatim to keep the bundle light; the table and bullets above preserve the same semantics (Tier A).


Secrets Management — OpenBao/Vault + ESO trên GCP

GCP Secret Manager vs OpenBao/Vault + ESO — Comparison and Use Cases

Secret Manager is enough for static secrets and is WIF-native. Choose OpenBao/Vault when you need dynamic secrets, cross-cloud reach, or finer-grained policy than IAM. External Secrets Operator is the standard bridge from secret store → Kubernetes Secret.

Criterion | GCP Secret Manager | OpenBao / Vault | ESO
Dynamic secrets | Static only | Cloud SQL, IAM, PKI — per-lease TTL | Depends on backend
WIF / GKE auth | Native — annotation | K8s auth method | Uses WIF for the SM backend
Secret versioning | Yes, with rollback | KV v2 versioned | remoteRef.version
Cross-cloud | GCP only | Aggregates clouds / on-prem | Multi-store
Audit trail | Cloud Audit Logs | Vault audit log | Events on ExternalSecret
Emergency revoke | Disable version | lease revoke-prefix | N/A
Cost / Ops | Managed, per-call | Self-hosted — HA to operate | Open-source operator
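A Secret Manager sketch matching the static + CMEK column above — `var.cmek_key` and the SA email are illustrative:

```terraform
# Static secret, regionally replicated, encrypted with a customer-managed key
resource "google_secret_manager_secret" "db_password" {
  secret_id = "prod-db-password"

  replication {
    user_managed {
      replicas {
        location = "asia-southeast1"
        customer_managed_encryption {
          kms_key_name = var.cmek_key
        }
      }
    }
  }
}

# Grant accessor to the workload's GCP SA only — no key files involved
resource "google_secret_manager_secret_iam_member" "app_access" {
  secret_id = google_secret_manager_secret.db_password.id
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:go-api-sa@prj-prod-app.iam.gserviceaccount.com"
}
```

Emergency revoke from the table maps to disabling the secret version; every access shows up in Cloud Audit Logs.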

The wiki has the OpenBao + ESO + SM on GKE diagram; this section keeps the comparison table and the equivalent operational flow (Tier A).


GCP Workload Identity Federation — Deep Dive

🤔 What does GCP use? Comparison with AWS IRSA and Pod Identity

Workload Identity binds a Kubernetes ServiceAccount to a GCP Service Account via OIDC: the metadata server on the node intercepts requests and exchanges tokens through Cloud STS — no JSON key on the pod.

Runtime flow (abridged)
Diagram — WIF runtime: Pod (annotated K8s SA) → metadata GET → GKE Metadata Server (node-local intercept, OIDC token) → Cloud STS (token exchange via the workload pool) → GCP Service Account (access token ≈1 h, IAM binding) → GCP APIs (SM · GCS · AR · SQL). No JSON key on the pod — Terraform binding: workloadIdentityUser on the GCP SA.
 | AWS IRSA | AWS Pod Identity | GCP WIF
Intercept | Webhook injects token | IMDS agent DaemonSet | GKE Metadata Server
Token exchange | STS AssumeRoleWithWebIdentity | EKS → STS | Cloud STS token exchange
Trust config | IAM OIDC trust | pod_identity_association | workloadIdentityUser IAM binding
SA annotation | eks.amazonaws.com/role-arn | Not needed | iam.gke.io/gcp-service-account
DaemonSet needed | No | eks-pod-identity-agent | Built into GKE (metadata server)
Token TTL | 24 h projected | 1 h | 1 h
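The workloadIdentityUser binding from the table, sketched in Terraform — SA and namespace names follow the architecture diagram:

```terraform
# GCP SA that the pod will act as
resource "google_service_account" "go_api" {
  account_id   = "go-api-sa"
  project      = "prj-prod-app"
  display_name = "go-api workload identity SA"
}

# Allow the K8s SA prod/go-api-sa to impersonate it — this is the WIF trust.
# Member format: serviceAccount:PROJECT.svc.id.goog[K8S_NAMESPACE/K8S_SA]
resource "google_service_account_iam_member" "wif" {
  service_account_id = google_service_account.go_api.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:prj-prod-app.svc.id.goog[prod/go-api-sa]"
}
```

On the Kubernetes side, the ServiceAccount carries the annotation `iam.gke.io/gcp-service-account: go-api-sa@prj-prod-app.iam.gserviceaccount.com`; from then on the metadata server hands the pod short-lived tokens for that GCP SA.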

The wiki HTML also includes the runtime flow (metadata server → STS → GCP API) and the Terraform setup steps — open the original file for step-by-step detail; the table above covers the main semantics (Tier A).

GCP Cloud Engineer portfolio · React port · DevOps Lab