My CI/CD Pipeline: Zero-Downtime Deploys with GitHub Actions
Your deploy takes 15 minutes. You hold your breath. Tests fail. You revert. Friday night.
A real CI/CD pipeline is tight: lint in parallel, test in parallel, build once, push once, deploy with zero downtime. When it fails, it fails fast. When it succeeds, you’re proud.
Here’s the pipeline I actually use. It has worked at every company I’ve been at.
The Pipeline Stages
1. Lint & Format (parallel)
2. Test (parallel, matrix strategy)
3. Build Docker Image (once, cached layers)
4. Push to Registry
5. Deploy (pull image → health check → swap → stop old container)
6. Notify on Success/Failure

Total time: 8-12 minutes. A single lint or test failure stops everything, with one clear issue per failure message.
The GitHub Actions Workflow
Create .github/workflows/deploy.yml:
```yaml
name: CI/CD

on:
  push:
    branches:
      - main
      - develop
  pull_request:
    branches:
      - main

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
  NODE_VERSION: "20"

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup pnpm
        uses: pnpm/action-setup@v2
        with:
          version: 9

      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: "pnpm"

      - name: Install dependencies
        run: pnpm install --frozen-lockfile

      - name: Lint
        run: pnpm lint

      - name: Format check
        run: pnpm format:check

  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: ["18", "20"]
    steps:
      - uses: actions/checkout@v4

      - name: Setup pnpm
        uses: pnpm/action-setup@v2
        with:
          version: 9

      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
          cache: "pnpm"

      - name: Install dependencies
        run: pnpm install --frozen-lockfile

      - name: Run tests
        run: pnpm test --coverage

      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage/coverage-final.json

  build:
    needs: [lint, test]
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    outputs:
      # "version" is a single tag; the "tags" output is a newline-separated
      # list, which would break when passed as one argument to deploy.sh
      image-tag: ${{ steps.meta.outputs.version }}
    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=sha

      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: ${{ github.event_name == 'push' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy:
    needs: build
    runs-on: ubuntu-latest
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4

      - name: Deploy to production
        env:
          DEPLOY_KEY: ${{ secrets.DEPLOY_KEY }}
          DEPLOY_HOST: ${{ secrets.DEPLOY_HOST }}
          IMAGE_TAG: ${{ needs.build.outputs.image-tag }}
        run: |
          mkdir -p ~/.ssh
          echo "$DEPLOY_KEY" > ~/.ssh/deploy_key
          chmod 600 ~/.ssh/deploy_key
          ssh-keyscan -H "$DEPLOY_HOST" >> ~/.ssh/known_hosts
          ssh -i ~/.ssh/deploy_key "deploy@$DEPLOY_HOST" \
            "cd /app && ./deploy.sh $IMAGE_TAG"

      - name: Notify on failure
        if: failure()
        uses: actions/github-script@v7
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            // This job runs on push events, which have no associated
            // issue/PR, so open a new issue instead of commenting on one
            github.rest.issues.create({
              owner: context.repo.owner,
              repo: context.repo.repo,
              title: `Deploy failed: ${context.sha.slice(0, 7)}`,
              body: 'Deploy failed. Check logs: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}'
            })
```

The Dockerfile: Multi-Stage Build
Layer caching is the secret. Dependencies install once; when only source code changes, Docker reuses the cached dependency layer and skips the install entirely.
```dockerfile
# Base stage shared by every other stage
FROM node:20-alpine AS base
RUN npm install -g pnpm
WORKDIR /app

# Production dependencies only
FROM base AS dependencies
COPY pnpm-lock.yaml .
COPY package.json .
RUN pnpm install --frozen-lockfile --prod

# Build with dev dependencies included
FROM base AS builder
COPY pnpm-lock.yaml .
COPY package.json .
RUN pnpm install --frozen-lockfile
COPY . .
RUN pnpm build

# Runtime stage ships only what production needs
FROM base AS runtime
COPY --from=dependencies /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package.json .

EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD node -e "require('http').get('http://localhost:3000/health', (r) => { if (r.statusCode !== 200) throw new Error(r.statusCode) })"

CMD ["node", "dist/server.js"]
```

Why this works:
- ✔ Dependencies layer cached across builds
- ✔ Build layer with dev deps doesn’t ship
- ✔ Final image ~200MB (node_modules + dist only)
- ✔ Health check catches startup failures
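One assumption worth making explicit: the dependency layer only stays cached if the build context doesn't invalidate it. A minimal `.dockerignore` (a sketch — adjust the entries to your repo) keeps local artifacts out of `COPY . .` so incidental changes don't bust the cache:

```
node_modules
dist
.git
*.log
.env
```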
Zero-Downtime Deploy Script
On your server, ./deploy.sh:
```bash
#!/bin/bash
set -e

IMAGE_TAG=$1
REGISTRY=ghcr.io/yourorg/yourapp
CONTAINER_NAME=myapp
PORT=3000
HEALTH_URL=http://localhost:$PORT/health

# Pull new image
docker pull $REGISTRY:$IMAGE_TAG
echo "✔ Image pulled"

# Start new container on a different port
docker run -d \
  --name ${CONTAINER_NAME}_new \
  -p 3001:3000 \
  -e DATABASE_URL=$DATABASE_URL \
  -e REDIS_URL=$REDIS_URL \
  $REGISTRY:$IMAGE_TAG

echo "✔ New container started on port 3001"

# Wait for health check
MAX_ATTEMPTS=30
ATTEMPT=0
while [ $ATTEMPT -lt $MAX_ATTEMPTS ]; do
  if curl -f http://localhost:3001/health >/dev/null 2>&1; then
    echo "✔ Health check passed"
    break
  fi
  ATTEMPT=$((ATTEMPT + 1))
  sleep 1
  if [ $ATTEMPT -eq $MAX_ATTEMPTS ]; then
    echo "✗ Health check failed. Cleaning up."
    docker stop ${CONTAINER_NAME}_new
    docker rm ${CONTAINER_NAME}_new
    exit 1
  fi
done

# Swap containers: point your load balancer / reverse proxy at port 3001,
# then retire the old container
docker stop $CONTAINER_NAME || true
docker rm $CONTAINER_NAME || true
docker rename ${CONTAINER_NAME}_new $CONTAINER_NAME

echo "✔ Deploy complete"
docker ps --filter "name=$CONTAINER_NAME" --format "{{.Names}} {{.Status}}"
```

The health check is mandatory. Never flip traffic to a container you haven’t verified.
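The wait loop in deploy.sh generalizes. A sketch of a reusable `wait_for` helper (the name and interface are mine, not from the script above) that retries any check command once per second until it passes or a limit is hit:

```shell
#!/bin/bash
# wait_for CMD MAX_ATTEMPTS — retry CMD once per second until it succeeds.
# Returns 0 on success, 1 once MAX_ATTEMPTS is exhausted.
wait_for() {
  local cmd=$1 max=$2 attempt=0
  while [ "$attempt" -lt "$max" ]; do
    if eval "$cmd" >/dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
  return 1
}

# Usage: block until the new container answers its health endpoint
# wait_for 'curl -f http://localhost:3001/health' 30 || exit 1
```

The same helper works for waiting on databases, queues, or anything else that comes up asynchronously during a deploy.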
Handling Secrets
Hard-coded environment variables in your workflow file are not secrets — anyone who can read the repo can read them. GitHub’s secrets are encrypted and injected at runtime.
```yaml
- name: Deploy
  env:
    DATABASE_URL: ${{ secrets.DATABASE_URL }}
    REDIS_URL: ${{ secrets.REDIS_URL }}
  run: ./deploy.sh
```

GitHub encrypts secrets at rest and masks them in logs. Good enough.
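One caveat (my addition, not from the workflow above): automatic masking only covers values stored as repository secrets. If a step derives a credential at runtime — a short-lived session token, say — tell the runner to redact it with the `::add-mask::` workflow command:

```shell
#!/bin/bash
# SESSION_TOKEN stands in for a hypothetical runtime-derived credential;
# values like this are not auto-masked because they never passed
# through secrets.*
SESSION_TOKEN="tok_$(date +%s)"

# The Actions runner parses this line and redacts the value from all
# subsequent log output (outside of Actions it is just a harmless echo)
echo "::add-mask::$SESSION_TOKEN"
echo "Token acquired; the value will appear as *** in Actions logs"
```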
For sensitive operations (database migrations), require manual approval:
```yaml
deploy:
  needs: build
  environment: production
  steps:
    - name: Deploy
      run: ./deploy.sh
```

Navigate to “Environments” in your repo settings and add required reviewers. The deploy pauses and waits for approval.
Rollback Strategy
Keep the previous image tag. If something melts:
```bash
docker pull $REGISTRY:previous-stable
docker run -d --name myapp -p 3000:3000 $REGISTRY:previous-stable
```

Tag your images with semantic versioning: v1.2.3, v1.2.2-rc1, etc. Always know which version is live.
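For a tag like `previous-stable` to mean anything, something has to record what was live before each deploy. A sketch (the state-file path and function names are assumptions, not part of deploy.sh) that snapshots the current tag so a rollback knows exactly what to pull:

```shell
#!/bin/bash
# Track the live tag in a plain file on the host. Before deploying a new
# tag, the old "current" becomes "previous" — that is the rollback target.
STATE_FILE="${STATE_FILE:-/tmp/myapp-deploy-state}"
rm -f "$STATE_FILE" "${STATE_FILE}.previous"   # clean slate for the demo

record_deploy() {
  local new_tag=$1
  if [ -f "$STATE_FILE" ]; then
    cp "$STATE_FILE" "${STATE_FILE}.previous"
  fi
  echo "$new_tag" > "$STATE_FILE"
}

rollback_tag() {
  # Print the tag to roll back to; fails if there is no history yet
  cat "${STATE_FILE}.previous" 2>/dev/null
}

record_deploy "v1.2.2"
record_deploy "v1.2.3"
echo "live: $(cat "$STATE_FILE"), rollback target: $(rollback_tag)"
# → live: v1.2.3, rollback target: v1.2.2
```

In deploy.sh you would call `record_deploy "$IMAGE_TAG"` right before the container swap, and `docker pull "$REGISTRY:$(rollback_tag)"` when things melt.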
```yaml
- name: Tag image
  run: |
    VERSION=$(git describe --tags --always)
    docker tag $IMAGE:latest $IMAGE:$VERSION
    docker push $IMAGE:$VERSION
```

Matrix Strategy: Test Multiple Node Versions
Your app runs on Node 18 and 20. Test both:
```yaml
test:
  strategy:
    matrix:
      node-version: ["18", "20"]
```

This creates two parallel test jobs and catches version-specific bugs early.
Parallel Execution
Lint and test run simultaneously; only build and deploy wait for both to pass. Run serially, those stages would cost lint + test; in parallel they cost max(lint, test), so you save the duration of the shorter job.
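The arithmetic is worth a quick sanity check. A sketch using this pipeline’s own stage timings (lint 2 min, test 4 min, build 3, push 1, deploy 2 — assumed from the metrics in this post):

```shell
#!/bin/bash
# Wall-clock math: lint and test overlap, so the parallel graph costs
# max(lint, test) where a fully serial pipeline costs lint + test.
lint=2; tests=4; build=3; push=1; deploy=2   # minutes

serial=$((lint + tests + build + push + deploy))
parallel=$(( (lint > tests ? lint : tests) + build + push + deploy ))

echo "serial: ${serial}m, parallel: ${parallel}m, saved: $((serial - parallel))m"
# → serial: 12m, parallel: 10m, saved: 2m
```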
```yaml
jobs:
  lint:
    runs-on: ubuntu-latest
    steps: [...]

  test:
    runs-on: ubuntu-latest
    steps: [...]

  build:
    needs: [lint, test]  # Wait for BOTH
    steps: [...]

  deploy:
    needs: build  # Wait for build only
    steps: [...]
```

Fast Feedback
The pipeline is tight but verbose. Each step logs what it’s doing. When it fails:
```
✗ Test failed: src/auth.test.ts line 42
  Expected "admin" to equal "user"

Run: pnpm test -- --grep "auth" to debug locally
```

No guessing. No 10-minute rebuild loops. You know the exact issue in seconds.
The Real Metrics
- ✔ Lint: 2 minutes
- ✔ Test: 4 minutes (parallel matrix)
- ✔ Build: 3 minutes (cached layers)
- ✔ Push: 1 minute
- ✔ Deploy: 2 minutes (health check)
Total: 10 minutes from push to live.
A single failure stops everything immediately. You fix it, push again, 10 minutes later you’re live. No manual deploys. No hoping it works. No Friday night fear.
That’s production engineering.