Overview
SanMarcSoft services are monitored via health check endpoints, Scaleway container status checks, Cloudflare Worker analytics, and AWS App Runner service status.
Health Check Endpoints
Verifieddit (Scaleway Container)
| curl -s -o /dev/null -w "%{http_code}" https://verifieddit.com/
# Expected: 200
|
Stripe Backend (Scaleway Container)
| curl -s https://<stripe-backend-url>/health
# Expected: {"status": "ok"}
|
Badge Signer (Scaleway Container)
| curl -s -o /dev/null -w "%{http_code}" https://<badge-signer-url>/health
# Expected: 200
|
Badges Worker (Cloudflare)
| curl -s https://verifieddit.com/api/__debug | jq .
# Expected: {"version": "...", "deployed": "..."}
|
Phenom Drop (AWS App Runner)
| curl -s -o /dev/null -w "%{http_code}" https://<phenom-drop-url>/health
# Expected: 200
|
Percy TTS (ai.matthewstevens.org)
| curl -s http://ai.matthewstevens.org:8086/health
# Expected: {"status": "ok"}
|
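Several of the endpoints above are expected to return a JSON body with "status": "ok" rather than just a 200. A minimal sketch of checking that body in a script follows; `health_body_ok` is a hypothetical helper name, and the sketch assumes the body is small enough to match with a single regular expression.

```shell
# Hypothetical helper: succeed only if a health response body read from
# stdin reports "status": "ok". Reading stdin keeps it testable offline.
health_body_ok() {
  # Tolerate optional whitespace around the colon.
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Example (not run here; substitute the real URL for the placeholder):
#   curl -s https://<stripe-backend-url>/health | health_body_ok && echo healthy
```

Where jq is installed, `jq -e '.status == "ok"'` is a stricter alternative, since it parses the JSON instead of pattern-matching it.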
Scaleway Container Monitoring
Check Container Status
| SCW_TOKEN=$(pass sanmarcsoft/scaleway/api-secret)
# List all containers
curl -s -H "X-Auth-Token: ${SCW_TOKEN}" \
"https://api.scaleway.com/containers/v1beta1/regions/fr-par/containers" | \
jq '.containers[] | {name, status, domain_name, min_scale, max_scale}'
|
Container Status Values
| Status | Meaning | Action |
|---|---|---|
| ready | Container is deployed and serving | Normal |
| pending | Container is being deployed | Wait |
| error | Container failed to deploy | Investigate (see below) |
| locked | Container is locked | Contact Scaleway support |
| deleting | Container is being removed | Wait |
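The status table can be encoded as a small lookup so that a script prints the recommended action next to each container. This is a sketch only: `container_action` is a hypothetical helper name, and it covers just the five statuses listed in the table.

```shell
# Map a Scaleway container status to the action from the status table.
# container_action is a hypothetical helper, not part of any CLI.
container_action() {
  case "$1" in
    ready)            echo "Normal" ;;
    pending|deleting) echo "Wait" ;;
    error)            echo "Investigate" ;;
    locked)           echo "Contact Scaleway support" ;;
    *)                echo "Unknown status: $1" ;;
  esac
}

# Example (not run here), fed by the container listing call:
#   ... | jq -r '.containers[] | "\(.name) \(.status)"' |
#     while read -r name status; do
#       echo "$name: $status -> $(container_action "$status")"
#     done
```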
Diagnosing Error State
| # Get detailed error info
curl -s -H "X-Auth-Token: ${SCW_TOKEN}" \
"https://api.scaleway.com/containers/v1beta1/regions/fr-par/containers/<container-id>" | \
jq '{status, error_message, description}'
|
Common error causes:
- Image not found in registry
- Port mismatch
- Entrypoint crash
- Memory exceeded during startup
Resolution: Redeploy
| cd infra
pulumi up --stack <environment>
|
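If the container is still in the error state, delete and recreate it. Note that pulumi destroy removes every resource in the stack, not only the failing container: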
| pulumi destroy --stack <environment>
pulumi up --stack <environment>
|
Cloudflare Worker Analytics
View Worker Analytics via API
| CF_TOKEN=$(pass cloudflare/api-token)
ACCOUNT_ID=$(pass cloudflare/account-id)
# Get worker analytics (last 24 hours)
# Note: "date -d" is GNU-only; on macOS/BSD use: date -u -v-24H +%Y-%m-%dT%H:%M:%SZ
curl -s -X POST \
"https://api.cloudflare.com/client/v4/graphql" \
-H "Authorization: Bearer ${CF_TOKEN}" \
-H "Content-Type: application/json" \
--data '{
"query": "query { viewer { accounts(filter: {accountTag: \"'${ACCOUNT_ID}'\"}) { workersInvocationsAdaptive(limit: 10, filter: {datetime_geq: \"'$(date -u -d '-24 hours' +%Y-%m-%dT%H:%M:%SZ)'\", datetime_leq: \"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'\"}) { sum { requests errors subrequests } dimensions { scriptName status } } } } }"
}' | jq '.data.viewer.accounts[0].workersInvocationsAdaptive'
|
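The query above splices shell expansions into a single-quoted JSON string, which is easy to break when editing. One possible way to separate the pieces is sketched below. `since_24h` and `analytics_payload` are hypothetical helper names, the date handling assumes either GNU or BSD `date`, and the payload builder assumes its arguments contain no quotes or backslashes.

```shell
# Print the UTC timestamp from 24 hours ago in ISO 8601 form.
# Assumes GNU date (detected via --version) or BSD/macOS date.
since_24h() {
  if date --version >/dev/null 2>&1; then
    date -u -d '-24 hours' +%Y-%m-%dT%H:%M:%SZ   # GNU coreutils
  else
    date -u -v-24H +%Y-%m-%dT%H:%M:%SZ           # BSD / macOS
  fi
}

# Build the GraphQL request body. Arguments: account tag, start, end.
# printf does no JSON escaping, so the arguments must be plain tokens
# (true for account IDs and the timestamps produced above).
analytics_payload() {
  printf '{"query": "query { viewer { accounts(filter: {accountTag: \\"%s\\"}) { workersInvocationsAdaptive(limit: 10, filter: {datetime_geq: \\"%s\\", datetime_leq: \\"%s\\"}) { sum { requests errors subrequests } dimensions { scriptName status } } } } }"}\n' "$1" "$2" "$3"
}

# Example (not run here):
#   curl -s -X POST "https://api.cloudflare.com/client/v4/graphql" \
#     -H "Authorization: Bearer ${CF_TOKEN}" \
#     -H "Content-Type: application/json" \
#     --data "$(analytics_payload "$ACCOUNT_ID" "$(since_24h)" "$(date -u +%Y-%m-%dT%H:%M:%SZ)")"
```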
Tail Worker Logs (Real-time)
| npx wrangler tail verifieddit-badges
npx wrangler tail verifieddit-badges --status error
|
AWS App Runner Monitoring
Check Service Status
| SERVICE_ARN=$(pass aws/phenom-drop/apprunner-arn)
aws apprunner describe-service --service-arn "${SERVICE_ARN}" \
--query 'Service.{Status:Status,URL:ServiceUrl,Updated:UpdatedAt,Instance:InstanceConfiguration}'
|
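For scripted checks it can help to turn the reported status into an exit code. The sketch below treats RUNNING as healthy and OPERATION_IN_PROGRESS as transitional; `apprunner_healthy` is a hypothetical helper name, and the choice of return codes is an assumption of this example rather than anything App Runner defines.

```shell
# Return 0 if the App Runner service status is RUNNING, 2 if an operation
# is still in progress, and 1 for anything else (for example PAUSED or
# CREATE_FAILED). apprunner_healthy is a hypothetical helper name.
apprunner_healthy() {
  case "$1" in
    RUNNING)
      return 0 ;;
    OPERATION_IN_PROGRESS)
      echo "operation in progress, check again shortly" >&2
      return 2 ;;
    *)
      echo "unexpected status: $1" >&2
      return 1 ;;
  esac
}

# Example (not run here):
#   apprunner_healthy "$(aws apprunner describe-service \
#     --service-arn "${SERVICE_ARN}" --query 'Service.Status' --output text)"
```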
View Recent Service Operations
| aws apprunner list-operations --service-arn ${SERVICE_ARN} \
--query 'OperationSummaryList[0:5]'
|
Automated Monitoring Checklist
Run this script to check all services:
| #!/bin/bash
echo "=== SanMarcSoft Service Health Check ==="
echo ""
echo -n "Verifieddit: "
curl -s -o /dev/null -w "%{http_code}" https://verifieddit.com/
echo ""
echo -n "Badges Worker: "
curl -s -o /dev/null -w "%{http_code}" https://verifieddit.com/api/__debug
echo ""
echo -n "Phenom Drop: "
curl -s -o /dev/null -w "%{http_code}" https://<phenom-drop-url>/health
echo ""
echo -n "Percy TTS: "
curl -s -o /dev/null -w "%{http_code}" http://ai.matthewstevens.org:8086/health
echo ""
echo ""
echo "=== Scaleway Containers ==="
SCW_TOKEN=$(pass sanmarcsoft/scaleway/api-secret)
curl -s -H "X-Auth-Token: ${SCW_TOKEN}" \
"https://api.scaleway.com/containers/v1beta1/regions/fr-par/containers" | \
jq -r '.containers[] | "\(.name): \(.status)"'
|
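The script above prints status codes but always exits 0, so it cannot drive a cron alert or CI step by itself. A possible refinement is sketched here: a wrapper that compares each code with the expected value and counts failures. `check` and `FAILURES` are hypothetical names, and the 10-second timeout is an assumed value.

```shell
# check NAME URL [EXPECTED]: print the HTTP status for URL and record a
# failure if it differs from EXPECTED (default 200).
FAILURES=0
check() {
  local name="$1" url="$2" expected="${3:-200}" code
  # --max-time keeps one hung service from stalling the whole run;
  # curl reports 000 when no HTTP response was received.
  code=$(curl -s -o /dev/null --max-time 10 -w "%{http_code}" "$url")
  if [ "$code" = "$expected" ]; then
    echo "$name: $code"
  else
    echo "$name: $code (expected $expected)"
    FAILURES=$((FAILURES + 1))
  fi
}

# Example (not run here):
#   check "Verifieddit"   https://verifieddit.com/
#   check "Badges Worker" https://verifieddit.com/api/__debug
#   [ "$FAILURES" -eq 0 ] || exit 1
```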
Troubleshooting
- Health check timeout: Service may be in cold start. Wait 10 seconds and retry.
- 502 from Scaleway: Container is starting or has crashed. Check container status and logs.
- Worker returning old data: Version mismatch. Check the deployed version with the /__debug endpoint (https://verifieddit.com/api/__debug). See Cloudflare Workers SOP.
- All services down: Check the Cloudflare, Scaleway, and AWS status pages.
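The cold-start advice (wait, then retry) can be automated with a small retry wrapper. This is a minimal sketch: `retry` is a hypothetical helper name, and the attempt count and delay in the example are assumptions based on the 10-second guidance above.

```shell
# retry ATTEMPTS DELAY COMMAND...: run COMMAND until it succeeds, sleeping
# DELAY seconds between attempts. Returns non-zero if every attempt fails.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Do not sleep after the final failed attempt.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example (not run here): up to 3 attempts, 10 seconds apart. The -f flag
# makes curl exit non-zero on HTTP errors such as 502.
#   retry 3 10 curl -sf --max-time 10 -o /dev/null https://verifieddit.com/
```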