Common Issues and Fixes

Frequently encountered problems and resolution steps: Bun vs Node.js runtime mismatch, fakeroot on macOS, FOD hash mismatches, Cloudflare Worker versioning, Scaleway container errors, Clerk SSL, tsconfig test-file leaks, Docker network DNS, the Synology proxy, and PostCSS in the Nix sandbox.

Overview

This page documents common issues encountered during SanMarcSoft development and operations, with root causes and resolution steps.


1. Bun vs Node.js Runtime Mismatch

Symptoms

  • Service crashes on startup with “bun: not found” or “node: not found”
  • Different behavior between local development and container

Root Cause

Nix-built containers may not have the expected runtime in PATH, and shebangs like #!/usr/bin/env bun fail because /usr/bin/env does not exist in the Nix build sandbox.

Fix

Use absolute Nix store paths in entrypoints:

# Instead of a shebang script
entrypoint = pkgs.writeShellScriptBin "service-name" ''
  exec ${pkgs.bun}/bin/bun ${appFiles}/app/index.js
'';

For container config:

config = {
  Cmd = [ "${pkgs.bun}/bin/bun" "${appFiles}/app/index.js" ];
};
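A defensive entrypoint can also make this failure mode obvious at startup instead of crashing with a bare "not found". A minimal sketch (the store path here is illustrative; in the Nix expression it would be ${pkgs.bun}/bin/bun):

```shell
# Sketch: prefer the absolute Nix store path, fall back to whatever is on PATH,
# and report clearly when neither resolves.
BUN_STORE_PATH="/nix/store/example-bun/bin/bun"   # illustrative path
if [ -x "$BUN_STORE_PATH" ]; then
  BUN="$BUN_STORE_PATH"
else
  BUN="$(command -v bun || true)"
fi
echo "using: ${BUN:-<none found>}"
```

Logging the resolved interpreter once at startup makes a local-vs-container runtime mismatch visible in the very first log line.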

2. fakeroot Error on macOS

Symptoms

error: builder for '/nix/store/...-...oci-image.drv' failed: fakeroot: not found

Root Cause

pkgs.dockerTools.buildLayeredImage uses fakeroot internally, which is not available on macOS (aarch64-darwin).

Fix

Build for the target architecture explicitly:

# This delegates the build to the x86_64-linux builder (OrbStack)
nix build .#packages.x86_64-linux.oci-image

Ensure the flake outputs are under packages.x86_64-linux, not packages.aarch64-darwin:

# Correct
system = "x86_64-linux";

# Wrong (will fail on macOS with fakeroot error)
system = builtins.currentSystem;  # resolves to aarch64-darwin on Mac
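To see what builtins.currentSystem would resolve to on the local machine, it can be evaluated directly; a sketch (falls back to "unknown" if nix is not installed):

```shell
# Sketch: print the host system next to the intended build target.
# On an Apple Silicon Mac the host resolves to aarch64-darwin, which is why
# the explicit packages.x86_64-linux attribute path is required.
host_system=$(nix eval --impure --raw --expr builtins.currentSystem 2>/dev/null || echo unknown)
echo "host:   $host_system"
echo "target: x86_64-linux"
```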

3. FOD Hash Mismatch

Symptoms

error: hash mismatch in fixed-output derivation
  specified: sha256-AAAA...
  got:       sha256-BBBB...

Root Cause

The Fixed-Output Derivation hash no longer matches the build output. This happens when:

  • Dependencies changed (bun.lock, go.sum, package.json)
  • Build-time environment variables changed
  • Source files included in the FOD changed

Fix

  1. Set the hash attribute to pkgs.lib.fakeHash in flake.nix
  2. Build and capture the correct hash:
    
    nix build .#packages.x86_64-linux.oci-image 2>&1 | grep "got:"
    
  3. Replace fakeHash with the new hash
  4. Rebuild
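Steps 2 and 3 can be scripted. A sketch that pulls the correct hash out of a captured build log (the log text below is a stand-in for real nix build output, and the sed line assumes the old hash appears verbatim in flake.nix):

```shell
# Sketch: extract the "got:" hash from a failed FOD build log.
build_log='error: hash mismatch in fixed-output derivation
  specified: sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
  got:       sha256-BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB='

new_hash=$(printf '%s\n' "$build_log" | awk '/got:/ { print $2 }')
echo "new hash: $new_hash"
# sed -i "s|sha256-AAAA[A-Za-z0-9+/=]*|$new_hash|" flake.nix  # then rebuild
```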

4. Cloudflare Worker Versioning Issues

Symptoms

  • Deploy succeeds but old code still serves
  • Dashboard shows multiple versions with gradual rollout percentages
  • /__debug endpoint returns old version timestamp

Root Cause

Cloudflare's Worker versioning system keeps multiple versions of each script. A new deployment may be created as a fresh version receiving 0% of traffic instead of replacing the active version.

Fix

  1. Check versions:

    
    CF_TOKEN=$(pass cloudflare/api-token)
    ACCOUNT_ID=$(pass cloudflare/account-id)
    curl -s "https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/workers/scripts/<worker-name>/versions" \
      -H "Authorization: Bearer ${CF_TOKEN}" | jq '.result'
    
  2. Set 100% traffic to the latest version via dashboard or API

  3. If stuck, delete and recreate the worker:

    
    curl -s -X DELETE \
      "https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/workers/scripts/<worker-name>" \
      -H "Authorization: Bearer ${CF_TOKEN}"
    npx wrangler deploy
    
  4. Re-set all secrets after recreation (secrets are deleted with the worker)
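When promoting a version in step 2, the newest version id can be picked out of the step-1 response with jq; a sketch where the JSON stands in for the API's .result array (the field names are illustrative, so check them against the actual response):

```shell
# Sketch: select the most recently created version id from the versions list.
versions='[{"id":"v-old","metadata":{"created_on":"2024-01-01T00:00:00Z"}},
           {"id":"v-new","metadata":{"created_on":"2024-02-01T00:00:00Z"}}]'
printf '%s' "$versions" | jq -r 'sort_by(.metadata.created_on) | last | .id'
# → v-new
```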


5. Scaleway Container Stuck in Error State

Symptoms

  • Container status shows “error”
  • Requests return 502 or timeout
  • Repeated pulumi up does not fix it

Root Cause

Common causes:

  • Image not found in registry (wrong tag, deleted image)
  • Port mismatch between container config and application
  • Entrypoint script crash (missing binary, permission error)
  • Memory exhausted during startup

Fix

  1. Check error message:

    
    SCW_TOKEN=$(pass sanmarcsoft/scaleway/api-secret)
    curl -s -H "X-Auth-Token: ${SCW_TOKEN}" \
      "https://api.scaleway.com/containers/v1beta1/regions/fr-par/containers/<id>" | jq '.error_message'
    
  2. Verify image exists:

    
    skopeo inspect "docker://rg.fr-par.scw.cloud/sanmarcsoft/<name>:<tag>" \
      --creds "nologin:$(pass sanmarcsoft/scaleway/api-secret)"
    
  3. Delete and recreate via Pulumi:

    
    pulumi destroy --stack <env>
    pulumi up --stack <env>
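The step-1 check can be condensed into one line that surfaces both the status and the error together; a sketch where the JSON stands in for the Scaleway API response:

```shell
# Sketch: summarize a container's state from the API response (sample JSON).
resp='{"status":"error","error_message":"image not found: rg.fr-par.scw.cloud/sanmarcsoft/app:v1"}'
printf '%s' "$resp" | jq -r '"\(.status): \(.error_message)"'
# → error: image not found: rg.fr-par.scw.cloud/sanmarcsoft/app:v1
```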
    

6. Clerk SSL Certificate Not Provisioning

Symptoms

  • Clerk dashboard shows “DNS verification pending” for custom domain
  • Custom domain returns SSL error
  • Works on default Clerk domain but not custom domain

Root Cause

Cloudflare DNS records for Clerk are set with proxied: true. Clerk needs direct access to the DNS records to provision SSL certificates.

Fix

Set all 5 Clerk DNS records to proxied: false:

CF_TOKEN=$(pass cloudflare/api-token)
ZONE_ID=$(pass cloudflare/zones/verifieddit-com)

# List Clerk records
curl -s "https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/dns_records?name=clerk" \
  -H "Authorization: Bearer ${CF_TOKEN}" | jq '.result[] | {id, name, proxied}'

# Update each record to proxied: false
RECORD_ID="<record-id>"
curl -s -X PATCH \
  "https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/dns_records/${RECORD_ID}" \
  -H "Authorization: Bearer ${CF_TOKEN}" \
  -H "Content-Type: application/json" \
  --data '{"proxied": false}'

After fixing, wait up to 24 hours for Clerk to verify and provision the SSL certificate.
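Rather than patching record ids one at a time, the ids from the list call can be looped over. A sketch where record_list stands in for the "List Clerk records" curl output above, with the real PATCH left commented (same CF_TOKEN and ZONE_ID variables):

```shell
# Sketch: collect every Clerk record id, then unproxy each one.
# record_list is a stand-in for the list-records curl output.
record_list='{"result":[{"id":"rec-1","name":"clerk","proxied":true},
                        {"id":"rec-2","name":"clerkstage","proxied":true}]}'
for RECORD_ID in $(printf '%s' "$record_list" | jq -r '.result[].id'); do
  echo "would PATCH ${RECORD_ID} -> proxied: false"
  # curl -s -X PATCH "https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/dns_records/${RECORD_ID}" \
  #   -H "Authorization: Bearer ${CF_TOKEN}" -H "Content-Type: application/json" \
  #   --data '{"proxied": false}'
done
```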


7. tsconfig Test File Leaks into Build

Symptoms

  • tsc --noEmit fails with errors in test files
  • Build includes test files that should be excluded
  • Type errors from test utilities (jest, vitest) in production build

Root Cause

tsconfig.json does not properly exclude test files, or a wildcard include ("include": ["src/**/*"]) captures test files.

Fix

Add explicit exclusions to tsconfig.json:

{
  "compilerOptions": {
    "strict": true,
    "noEmit": true
  },
  "include": ["src/**/*"],
  "exclude": [
    "node_modules",
    "dist",
    "**/*.test.ts",
    "**/*.test.tsx",
    "**/*.spec.ts",
    "**/*.spec.tsx",
    "src/__tests__/**",
    "src/__mocks__/**",
    "vitest.config.ts",
    "jest.config.ts"
  ]
}

Or create a separate tsconfig.build.json for production builds:

{
  "extends": "./tsconfig.json",
  "exclude": ["**/*.test.*", "**/*.spec.*", "src/__tests__", "src/__mocks__"]
}

Then type-check with:

tsc --project tsconfig.build.json --noEmit
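A quick sanity check after adding the exclusions: tsc's --listFilesOnly flag prints the files the project actually resolves, and a grep over that list should come back empty. A sketch where file_list stands in for the output of `npx tsc --project tsconfig.build.json --listFilesOnly`:

```shell
# Sketch: flag any test file that survives into the resolved file list.
file_list='src/index.ts
src/util.ts
src/util.test.ts'
leaks=$(printf '%s\n' "$file_list" | grep -E '\.(test|spec)\.' || true)
if [ -z "$leaks" ]; then
  echo "clean"
else
  echo "leaked:"
  echo "$leaks"
fi
```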

8. Docker Network DNS Failure (NAS)

Symptoms

  • Inter-container communication fails with connection refused or DNS resolution error
  • curl http://container-name:port fails from another container
  • HTTP 502 from reverse proxy

Root Cause

Containers on the NAS are on different Docker networks. Docker DNS only resolves container names within the same network.

Fix

Ensure all related containers are on the same network:

# Create network if it doesn't exist
docker network create phenom-net 2>/dev/null || true

# Connect existing containers
docker network connect phenom-net container-1
docker network connect phenom-net container-2

# Or start containers with --network flag
docker run --network phenom-net ...

This was identified as a root cause of 502 errors in the Phenom Drop ecosystem and is now covered by preflight check #11.
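After connecting the containers, membership can be verified. A sketch where net_members stands in for the output of `docker network inspect phenom-net --format '{{range .Containers}}{{.Name}} {{end}}'`:

```shell
# Sketch: check that the expected containers are attached to phenom-net.
# net_members is a stand-in for the docker network inspect output.
net_members="container-1 container-2 reverse-proxy"
for c in container-1 container-2; do
  case " $net_members " in
    *" $c "*) echo "$c: attached" ;;
    *)        echo "$c: MISSING from phenom-net" ;;
  esac
done
```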


9. Synology Proxy vs Container Nginx Confusion

Symptoms

  • Debugging nginx configuration but changes have no effect
  • Wrong nginx version reported
  • Reverse proxy rules seem to not apply

Root Cause

The Synology NAS has its own reverse proxy (DSM built-in) in addition to any nginx running inside Docker containers. Changes to the container’s nginx have no effect if the Synology proxy is the one handling the request.

Fix

  1. Identify which nginx is serving the request:

    
    curl -sI https://site.matthewstevens.org | grep "server:"
    
  2. If the Synology proxy is involved, configure it in DSM > Control Panel > Application Portal > Reverse Proxy

  3. If the container nginx should handle the request directly, ensure the container port is exposed and that the Synology proxy is not intercepting the traffic


10. PostCSS in Nix Sandbox (Hugo Docs)

Symptoms

  • Hugo build fails with “PostCSS not found”
  • Docsy theme fails to compile SCSS/CSS

Root Cause

The Nix sandbox does not have /usr/bin/env, so PostCSS CLI shebangs fail. Additionally, npm-installed binaries in the sandbox may have broken shebangs.

Workaround

Create a wrapper script in the Nix build:

buildPhase = ''
  npm install --no-save postcss postcss-cli autoprefixer
  export NODE_PATH=$PWD/node_modules

  # Create wrapper
  WRAPPER=$TMPDIR/bin/postcss
  mkdir -p $TMPDIR/bin
  NODE_BIN=$(command -v node)
  BASH_BIN=$(command -v bash)
  echo "#!$BASH_BIN" > $WRAPPER
  echo "exec $NODE_BIN $PWD/node_modules/postcss-cli/index.js \"\$@\"" >> $WRAPPER
  chmod +x $WRAPPER
  export PATH=$TMPDIR/bin:$PATH
'';

Status: This is a known workaround. The Hugo docs build in verifieddit-www currently skips PostCSS processing.
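The wrapper trick from the buildPhase above can be exercised outside Nix to confirm the generated script is well formed; a sketch (the module path is illustrative, and the node fallback path is only a placeholder for machines without node):

```shell
# Sketch: generate the same PostCSS wrapper in a temp dir and inspect its shebang.
workdir=$(mktemp -d)
mkdir -p "$workdir/bin"
NODE_BIN=$(command -v node || echo /usr/bin/node)   # fallback is illustrative
BASH_BIN=$(command -v bash)
printf '#!%s\nexec %s %s "$@"\n' \
  "$BASH_BIN" "$NODE_BIN" "$PWD/node_modules/postcss-cli/index.js" \
  > "$workdir/bin/postcss"
chmod +x "$workdir/bin/postcss"
head -1 "$workdir/bin/postcss"   # shebang points at a real bash, not /usr/bin/env
```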