# Kubernetes Deployment (EKS)

Deploy Data Stream on Amazon EKS with autoscaling using KEDA.
## Architecture

## AWS EKS Setup

### Prerequisites

- AWS CLI configured with appropriate permissions
- kubectl installed
- eksctl (optional, for cluster creation)
- Helm 3.x
### Create EKS Cluster (if needed)

```bash
# Using eksctl
eksctl create cluster \
  --name datastream-prod \
  --region us-east-1 \
  --nodegroup-name workers \
  --node-type t3.medium \
  --nodes 3 \
  --nodes-min 2 \
  --nodes-max 5 \
  --managed

# Configure kubectl
aws eks update-kubeconfig --name datastream-prod --region us-east-1
```
### Verify Connection

```bash
kubectl cluster-info
kubectl get nodes
```
## Prerequisites

- Kubernetes 1.25+ (EKS 1.28+ recommended)
- KEDA 2.x installed
- AWS Load Balancer Controller (for ALB Ingress)
- kubectl configured
## Install KEDA

```bash
# Using Helm
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace

# Verify installation
kubectl get pods -n keda
```
## Deployment

### 1. Create Namespace

```yaml
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: datastream
```

```bash
kubectl apply -f namespace.yaml
```
### 2. Install AWS Load Balancer Controller

The AWS Load Balancer Controller provisions Application Load Balancers for your Ingress resources:

```bash
# Add EKS Helm repo
helm repo add eks https://aws.github.io/eks-charts
helm repo update

# Install AWS Load Balancer Controller
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=datastream-prod \
  --set serviceAccount.create=true \
  --set serviceAccount.name=aws-load-balancer-controller
```
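Note that `serviceAccount.create=true` by itself does not grant the controller AWS API permissions; on EKS this is normally done via IRSA. A minimal sketch of the annotated ServiceAccount the controller would use instead of a Helm-created one (the role name and account ID placeholder are assumptions; the IAM role and policy must already exist):

```yaml
# ServiceAccount with IRSA annotation (sketch; role ARN is an assumption)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-load-balancer-controller
  namespace: kube-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<account-id>:role/AWSLoadBalancerControllerRole
```

If you manage the ServiceAccount this way, install the chart with `--set serviceAccount.create=false` so Helm reuses it.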
### 3. Deploy Services

```bash
# Apply all manifests from EKS base
kubectl apply -k provider/k8s/eks/base/

# Check deployments
kubectl get deploy -n datastream

# Expected output:
# NAME       READY   UP-TO-DATE   AVAILABLE   AGE
# backend    3/3     3            3           5m
# consumer   2/2     2            2           5m
# redis      1/1     1            1           5m
# nats       1/1     1            1           5m
```
### 4. Verify Ingress

```bash
# Check ALB creation
kubectl get ingress -n datastream

# Get ALB DNS
kubectl get ingress datastream -n datastream -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
```
## EKS-Specific Configuration

### Ingress with ALB

```yaml
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: datastream
  namespace: datastream
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:xxx:certificate/xxx
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
    alb.ingress.kubernetes.io/ssl-redirect: '443'
spec:
  rules:
    - host: datastream.hypetech.games
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: backend
                port:
                  number: 3000
```
### External Secrets (AWS Secrets Manager)

```yaml
# external-secret.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: datastream-secrets
  namespace: datastream
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: datastream-secrets
  data:
    - secretKey: REDIS_URL
      remoteRef:
        key: datastream/prod
        property: redis_url
    - secretKey: DATABASE_URL
      remoteRef:
        key: datastream/prod
        property: database_url
```
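The ExternalSecret references a ClusterSecretStore named `aws-secrets-manager`, which must be created first. A minimal sketch, assuming the External Secrets Operator is installed with an IRSA-enabled service account (the `external-secrets-sa` name and `external-secrets` namespace are assumptions):

```yaml
# cluster-secret-store.yaml (sketch)
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa
            namespace: external-secrets
```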
### Resource Requests for EKS

Properly sized resources for EKS nodes:

```yaml
# backend deployment resources
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
```

```yaml
# consumer deployment resources
resources:
  requests:
    cpu: "50m"
    memory: "64Mi"
  limits:
    cpu: "200m"
    memory: "256Mi"
```
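As a sanity check on node sizing, a quick sketch of how many of these pods fit on a t3.medium node (2 vCPU, 4 GiB). Real allocatable capacity is lower because of kubelet/system reserves and DaemonSets, so treat these as upper bounds:

```python
# Rough bin-packing check against t3.medium capacity (2 vCPU, 4 GiB).
# Allocatable capacity on a real node is lower; this is an upper bound.
NODE_CPU_M = 2000    # millicores
NODE_MEM_MI = 4096   # MiB

backend = {"cpu_m": 100, "mem_mi": 128}    # backend requests
consumer = {"cpu_m": 50, "mem_mi": 64}     # consumer requests

def max_pods(node_cpu_m, node_mem_mi, req):
    """Pods per node, limited by whichever resource runs out first."""
    return min(node_cpu_m // req["cpu_m"], node_mem_mi // req["mem_mi"])

print(max_pods(NODE_CPU_M, NODE_MEM_MI, backend))   # 20 (CPU-bound)
print(max_pods(NODE_CPU_M, NODE_MEM_MI, consumer))  # 40 (CPU-bound)
```

Both workloads are CPU-bound at these requests, so scaling headroom on a 3-node group is generous relative to the `maxReplicaCount` of 5 used later.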
## Autoscaling with KEDA

Data Stream uses KEDA for intelligent autoscaling based on workload metrics.

### Consumer Scaling (NATS JetStream Lag)

The consumer scales based on pending messages in the NATS JetStream consumer:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: consumer-scaledobject
  namespace: datastream
spec:
  scaleTargetRef:
    name: consumer
  pollingInterval: 15
  cooldownPeriod: 60
  minReplicaCount: 1
  maxReplicaCount: 5
  triggers:
    - type: nats-jetstream
      metadata:
        natsServerMonitoringEndpoint: "nats.datastream.svc.cluster.local:8222"
        account: "$G"
        stream: "DebeziumStream"
        consumer: "cdc-public-rounds"
        lagThreshold: "100"            # Scale up when lag > 100
        activationLagThreshold: "10"   # Start scaling at lag > 10
```
**How it works:**

- Lag ≤ 10: stays at the minimum (1 replica)
- Lag > 10: the trigger activates and KEDA scales gradually, targeting an average lag of 100 per replica
- Lag well above 100 (roughly > 500): scales to the maximum (5 replicas)
- Cooldown: 60 seconds before scaling down
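The replica counts above follow the standard HPA calculation that KEDA feeds: for an average-value metric, desired replicas ≈ ceil(total lag / `lagThreshold`), clamped to the min/max. A minimal sketch using the values from the ScaledObject:

```python
import math

def desired_replicas(total_lag, lag_threshold=100, min_r=1, max_r=5):
    """HPA-style calculation for an average-value metric:
    desired = ceil(total_lag / lag_threshold), clamped to [min_r, max_r]."""
    desired = math.ceil(total_lag / lag_threshold)
    return max(min_r, min(max_r, desired))

for lag in (0, 50, 250, 1200):
    print(lag, desired_replicas(lag))
# 0 -> 1, 50 -> 1, 250 -> 3, 1200 -> 5
```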
### Backend Scaling (CPU/Memory)

The backend scales based on resource utilization:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: backend-scaledobject
  namespace: datastream
spec:
  scaleTargetRef:
    name: backend
  pollingInterval: 15
  cooldownPeriod: 60
  minReplicaCount: 1
  maxReplicaCount: 5
  triggers:
    - type: cpu
      metricType: Utilization
      metadata:
        value: "70"   # Scale up at 70% CPU
    - type: memory
      metricType: Utilization
      metadata:
        value: "80"   # Scale up at 80% Memory
```
**How it works:**

- CPU < 70%: maintain current replicas
- CPU > 70%: add replicas
- Memory > 80%: add replicas (WebSocket connections consume memory)
- Both triggers: KEDA creates a single HPA with both metrics; the higher resulting replica count wins
## Monitoring Autoscaling

### Check ScaledObjects

```bash
# View ScaledObjects status
kubectl get scaledobjects -n datastream

# Example output:
# NAME                    SCALETARGETKIND      SCALETARGETNAME   MIN   MAX   READY   ACTIVE
# consumer-scaledobject   apps/v1.Deployment   consumer          1     5     True    False
# backend-scaledobject    apps/v1.Deployment   backend           1     5     True    True
```
### Check HPA (created by KEDA)

```bash
# View HPA metrics
kubectl get hpa -n datastream

# Example output:
# NAME                            REFERENCE             TARGETS                          MINPODS   MAXPODS   REPLICAS
# keda-hpa-backend-scaledobject   Deployment/backend    cpu: 42%/70%, memory: 28%/80%    1         5         1
# keda-hpa-consumer-scaledobject  Deployment/consumer   0/100 (avg)                      1         5         1
```
### Check NATS Consumer Lag

```bash
# Get NATS JetStream info
kubectl exec -n datastream deploy/consumer -- \
  wget -qO- "http://nats:8222/jsz?consumers=true" | jq '.account_details[0].stream_detail[0].consumer_detail'

# Key metrics:
# - num_pending: Messages waiting to be processed
# - num_ack_pending: Messages being processed
```
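If `jq` is not available, the same extraction can be done client-side. A sketch that walks the `/jsz` response shape shown above (field names match the NATS monitoring output; the sample values are invented):

```python
import json

# Trimmed sample of a /jsz?consumers=true payload (values invented).
sample = json.loads("""
{
  "account_details": [{
    "stream_detail": [{
      "name": "DebeziumStream",
      "consumer_detail": [{
        "name": "cdc-public-rounds",
        "num_pending": 42,
        "num_ack_pending": 3
      }]
    }]
  }]
}
""")

def consumer_lag(jsz, stream, consumer):
    """Walk the /jsz structure and return num_pending for one consumer."""
    for acct in jsz.get("account_details", []):
        for s in acct.get("stream_detail", []):
            if s.get("name") != stream:
                continue
            for c in s.get("consumer_detail", []):
                if c.get("name") == consumer:
                    return c["num_pending"]
    return None

print(consumer_lag(sample, "DebeziumStream", "cdc-public-rounds"))  # 42
```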
### Watch Scaling Events

```bash
# Watch HPA events
kubectl get events -n datastream --field-selector reason=SuccessfulRescale -w

# Watch pods scaling
kubectl get pods -n datastream -w
```
## Scaling Scenarios

### High CDC Volume

When database changes spike:

- Debezium captures more events
- The NATS JetStream queue grows
- Consumer lag increases (`num_pending` > 100)
- KEDA triggers a consumer scale-up
- More consumers process events in parallel
- Lag decreases
- After the cooldown, KEDA scales back down
### High API Traffic

When client requests spike:

- More WebSocket/HTTP connections arrive
- Backend CPU/memory usage increases
- KEDA triggers a backend scale-up
- Load is distributed across replicas
- After traffic decreases, KEDA scales back down
## Configuration Reference

### Consumer ScaledObject

| Parameter | Value | Description |
|---|---|---|
| `minReplicaCount` | 1 | Minimum pods |
| `maxReplicaCount` | 5 | Maximum pods |
| `pollingInterval` | 15s | Metric check interval |
| `cooldownPeriod` | 60s | Wait before scale-down |
| `lagThreshold` | 100 | Scale-up threshold |
| `activationLagThreshold` | 10 | Activation threshold |
### Backend ScaledObject

| Parameter | Value | Description |
|---|---|---|
| `minReplicaCount` | 1 | Minimum pods |
| `maxReplicaCount` | 5 | Maximum pods |
| `pollingInterval` | 15s | Metric check interval |
| `cooldownPeriod` | 60s | Wait before scale-down |
| `cpu.value` | 70% | CPU threshold |
| `memory.value` | 80% | Memory threshold |
## Troubleshooting

### ScaledObject Shows "Unknown"

```bash
# Check KEDA operator logs
kubectl logs -n keda deploy/keda-operator --tail=50

# Check ScaledObject status
kubectl describe scaledobject consumer-scaledobject -n datastream
```
### Metrics Not Available

```bash
# Verify metrics-server is running
kubectl get pods -n kube-system | grep metrics-server

# Check NATS monitoring endpoint
kubectl exec -n datastream deploy/consumer -- wget -qO- http://nats:8222/healthz
```
### Consumer Not Scaling

```bash
# Check NATS stream info
kubectl exec -n datastream deploy/consumer -- \
  wget -qO- "http://nats:8222/jsz" | jq '.streams, .consumers'

# Verify consumer name matches:
# the consumer name in the KEDA trigger must match the NATS consumer name exactly
```
## Best Practices

- **Set appropriate thresholds**: start conservative, then tune based on actual load
- **Monitor cooldown periods**: avoid thrashing (rapid scale up/down cycles)
- **Use resource requests/limits**: KEDA's CPU/memory scaling requires them
- **Test scaling behavior**: simulate load before going to production
- **Set up alerts**: get notified when the maximum replica count is reached