btheo.com > press start to play
CI/CD · 6 MIN READ

My CI/CD Pipeline: GitHub Actions Zero-Downtime

WARNING · DRAGON AHEAD

Your deploy takes 15 minutes. You hold your breath. Tests fail. You revert. Friday night.

A real CI/CD pipeline is tight: lint in parallel, test in parallel, build once, push once, deploy with zero downtime. When it fails, it fails fast. When it succeeds, you’re proud.

Here’s the pipeline I actually use. It has held up at every company I’ve worked for.

The Pipeline Stages

1. Lint & Format (parallel)
2. Test (parallel, matrix strategy)
3. Build Docker Image (once, cached layers)
4. Push to Registry
5. Deploy (pull image → health check → swap → stop old container)
6. Notify on Success/Failure

Total time: 8-12 minutes. A single lint or test failure stops everything, and each failure message points at one concrete issue.
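The same fail-fast ordering is easy to mirror locally before you push. A minimal sketch; the `pnpm` and `docker` invocations in the comment are placeholders for whatever your project actually runs:

```shell
#!/bin/bash
# run_stage: run one pipeline stage, announce it, and fail fast
# with a single clear message, mirroring the CI behavior above.
run_stage() {
  local name=$1; shift
  echo "▶ $name"
  if "$@"; then
    echo "✔ $name passed"
  else
    echo "✗ $name failed, stopping" >&2
    return 1
  fi
}

# Example run with a harmless command:
run_stage "Echo check" echo "hello"

# In a real pre-push hook you would chain your own stages, e.g.:
#   run_stage "Lint" pnpm lint &&
#   run_stage "Test" pnpm test &&
#   run_stage "Build" docker build -t myapp .
```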

The GitHub Actions Workflow

Create .github/workflows/deploy.yml:

name: CI/CD

on:
  push:
    branches:
      - main
      - develop
  pull_request:
    branches:
      - main

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
  NODE_VERSION: "20"

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup pnpm
        uses: pnpm/action-setup@v2
        with:
          version: 9
      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: "pnpm"
      - name: Install dependencies
        run: pnpm install --frozen-lockfile
      - name: Lint
        run: pnpm lint
      - name: Format check
        run: pnpm format:check

  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: ["18", "20"]
    steps:
      - uses: actions/checkout@v4
      - name: Setup pnpm
        uses: pnpm/action-setup@v2
        with:
          version: 9
      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
          cache: "pnpm"
      - name: Install dependencies
        run: pnpm install --frozen-lockfile
      - name: Run tests
        run: pnpm test --coverage
      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage/coverage-final.json

  build:
    needs: [lint, test]
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    outputs:
      # "version" is the primary tag (branch, semver, or sha): a single
      # value. "tags" is a multi-line list, unsafe to splice into a
      # shell command on the deploy side.
      image-tag: ${{ steps.meta.outputs.version }}
    steps:
      - uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Log in to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=sha
      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: ${{ github.event_name == 'push' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy:
    needs: build
    runs-on: ubuntu-latest
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to production
        env:
          DEPLOY_KEY: ${{ secrets.DEPLOY_KEY }}
          DEPLOY_HOST: ${{ secrets.DEPLOY_HOST }}
          IMAGE_TAG: ${{ needs.build.outputs.image-tag }}
        run: |
          mkdir -p ~/.ssh
          echo "$DEPLOY_KEY" > ~/.ssh/deploy_key
          chmod 600 ~/.ssh/deploy_key
          ssh-keyscan -H "$DEPLOY_HOST" >> ~/.ssh/known_hosts
          ssh -i ~/.ssh/deploy_key deploy@"$DEPLOY_HOST" \
            "cd /app && ./deploy.sh $IMAGE_TAG"
      - name: Notify on failure
        if: failure()
        uses: actions/github-script@v7
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            // This job only runs on push, so there is no PR to comment
            // on; leave a commit comment instead.
            github.rest.repos.createCommitComment({
              commit_sha: context.sha,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: 'Deploy failed. Check logs: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}'
            })

The Dockerfile: Multi-Stage Build

Layer caching is the secret. Install deps once; when only source code changes, Docker reuses the cached install layers.

# Shared base with pnpm installed
FROM node:20-alpine AS base
RUN npm install -g pnpm
WORKDIR /app

# Production dependencies only; cached unless the lockfile changes
FROM base AS dependencies
COPY pnpm-lock.yaml package.json ./
RUN pnpm install --frozen-lockfile --prod

# Full install (including dev deps) for the build
FROM base AS builder
COPY pnpm-lock.yaml package.json ./
RUN pnpm install --frozen-lockfile
COPY . .
RUN pnpm build

# Final image: runtime deps + built output only
FROM base AS runtime
COPY --from=dependencies /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package.json .
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD node -e "require('http').get('http://localhost:3000/health', (r) => { if (r.statusCode !== 200) throw new Error(r.statusCode) })"
CMD ["node", "dist/server.js"]

Why this works:

  • ✔ Dependencies layer cached across builds
  • ✔ Build layer with dev deps doesn’t ship
  • ✔ Final image ~200MB (node_modules + dist only)
  • ✔ Health check catches startup failures

Zero-Downtime Deploy Script

On your server, ./deploy.sh:

#!/bin/bash
set -e

IMAGE_TAG=$1
REGISTRY=ghcr.io/yourorg/yourapp
CONTAINER_NAME=myapp
NEW_PORT=3001
HEALTH_URL=http://localhost:$NEW_PORT/health

# Pull new image
docker pull "$REGISTRY:$IMAGE_TAG"
echo "✔ Image pulled"

# Start new container on a different host port
docker run -d \
  --name "${CONTAINER_NAME}_new" \
  -p $NEW_PORT:3000 \
  -e DATABASE_URL="$DATABASE_URL" \
  -e REDIS_URL="$REDIS_URL" \
  "$REGISTRY:$IMAGE_TAG"
echo "✔ New container started on port $NEW_PORT"

# Wait for health check
MAX_ATTEMPTS=30
ATTEMPT=0
while [ $ATTEMPT -lt $MAX_ATTEMPTS ]; do
  if curl -f "$HEALTH_URL" >/dev/null 2>&1; then
    echo "✔ Health check passed"
    break
  fi
  ATTEMPT=$((ATTEMPT + 1))
  sleep 1
  if [ $ATTEMPT -eq $MAX_ATTEMPTS ]; then
    echo "✗ Health check failed. Cleaning up."
    docker stop "${CONTAINER_NAME}_new"
    docker rm "${CONTAINER_NAME}_new"
    exit 1
  fi
done

# Swap: point your load balancer / reverse proxy at the new port,
# then retire the old container
docker stop "$CONTAINER_NAME" || true
docker rm "$CONTAINER_NAME" || true
docker rename "${CONTAINER_NAME}_new" "$CONTAINER_NAME"
echo "✔ Deploy complete"
docker ps --filter "name=$CONTAINER_NAME" --format "{{.Names}} {{.Status}}"

Health check is mandatory. Never flip traffic to a container you haven’t verified.
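That retry loop generalizes into a small helper worth keeping around. A sketch in plain bash; the probe command is whatever makes sense for your service (the curl line in the comment assumes the port and path used above):

```shell
#!/bin/bash
# wait_for MAX DELAY CMD...: retry CMD up to MAX times, DELAY seconds
# apart; returns 0 as soon as CMD succeeds, 1 if it never does.
wait_for() {
  local max=$1 delay=$2; shift 2
  local i=0
  while [ "$i" -lt "$max" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "✗ gave up after $max attempts" >&2
  return 1
}

# Usage in a deploy script:
#   wait_for 30 1 curl -fsS http://localhost:3001/health >/dev/null
```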

Handling Secrets

Environment variables hard-coded in a workflow are not secrets. GitHub Actions secrets are encrypted.

- name: Deploy
  env:
    DATABASE_URL: ${{ secrets.DATABASE_URL }}
    REDIS_URL: ${{ secrets.REDIS_URL }}
  run: ./deploy.sh

GitHub encrypts secrets at rest and masks them in logs. Good enough.

For sensitive operations (database migrations), require manual approval:

deploy:
  needs: build
  environment: production
  steps:
    - name: Deploy
      run: ./deploy.sh

Navigate to “Environments” in your repo settings and add required reviewers. The deploy job pauses and waits for approval.

Rollback Strategy

Keep the previous image tag. If something melts:

docker pull $REGISTRY:previous-stable
docker run -d --name myapp -p 3000:3000 $REGISTRY:previous-stable

Tag your images with semantic versioning: v1.2.3, v1.2.2-rc1, etc. Always know which version is live.

- name: Tag image
  run: |
    VERSION=$(git describe --tags --always)
    docker tag $IMAGE:latest $IMAGE:$VERSION
    docker push $IMAGE:$VERSION
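Knowing which of two tags is older also lets a rollback script pick its target automatically. A minimal bash comparison, assuming plain vMAJOR.MINOR.PATCH tags (pre-release suffixes like -rc1 are not handled here):

```shell
#!/bin/bash
# semver_lt A B: exit 0 if version A is strictly older than B.
# Accepts tags with or without a leading "v".
semver_lt() {
  local a=${1#v} b=${2#v}
  local a1 a2 a3 b1 b2 b3
  IFS=. read -r a1 a2 a3 <<EOF
$a
EOF
  IFS=. read -r b1 b2 b3 <<EOF
$b
EOF
  if [ "$a1" -ne "$b1" ]; then [ "$a1" -lt "$b1" ]; return; fi
  if [ "$a2" -ne "$b2" ]; then [ "$a2" -lt "$b2" ]; return; fi
  [ "$a3" -lt "$b3" ]
}

# e.g. the second-newest tag is a candidate for previous-stable:
#   git tag --sort=-v:refname | head -2 | tail -1
```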

Matrix Strategy: Test Multiple Node Versions

Your app runs on Node 18 and 20. Test both:

test:
  strategy:
    matrix:
      node-version: ["18", "20"]

Creates two parallel test jobs. Catches version-specific bugs early.

Parallel Execution

Lint and test run simultaneously. Only build and deploy wait for both to pass. Total time saved: the length of the shorter of the two jobs.

jobs:
  lint:
    runs-on: ubuntu-latest
    steps: [...]
  test:
    runs-on: ubuntu-latest
    steps: [...]
  build:
    needs: [lint, test] # Wait for BOTH
    steps: [...]
  deploy:
    needs: build # Wait for build only
    steps: [...]

Fast Feedback

The pipeline is tight but verbose. Each step logs what it’s doing. When it fails:

✗ Test failed: src/auth.test.ts line 42
Expected "admin" to equal "user"
Run: pnpm test -- --grep "auth" to debug locally

No guessing. No 10-minute rebuild loops. You know the exact issue in seconds.

The Real Metrics

  • ✔ Lint: 2 minutes
  • ✔ Test: 4 minutes (parallel matrix)
  • ✔ Build: 3 minutes (cached layers)
  • ✔ Push: 1 minute
  • ✔ Deploy: 2 minutes (health check)

Total: 10 minutes from push to live.

A single failure stops everything immediately. You fix it, push again, 10 minutes later you’re live. No manual deploys. No hoping it works. No Friday night fear.

That’s production engineering.
