Engineering
From 17 minutes to 2 minutes: how we made bulk credential imports 6x faster
By Rajat Ravinder Varuni, Founder
A walkthrough of seven patterns that took our worst-case onboarding from 17 minutes down to 2, plus the observability we built to prove it worked.
The setup
CertScore lets users import their entire Credly portfolio in one shot. A user pastes their profile URL, the server fetches all their badges, verifies ownership cryptographically (Open Badges Infrastructure email-hash matching), and saves the verified credentials.
Two real users, four days apart, told the story:
| User | Date | Credentials | End-to-end time |
|---|---|---|---|
| User A | May 4 | 16 | 17 minutes 2 seconds |
| User B | May 8 | 12 | 2 minutes 7 seconds |
Per-cert processing time dropped from 64 seconds to 10.6. A 6x improvement.
This post walks through each optimization that landed between those two signups, plus the observability layer we built so we could actually see the win. Most of the patterns are generic. The CertScore-specific bits are flagged.
Pattern 1: parallelize wherever wall-clock matters more than server load
The problem
For ownership verification, the server fetches the OBI badge assertion for the first 5 badges and checks the SHA-256-hashed recipient identity against each of the user's verified emails. Originally these ran sequentially in a loop:
for (const badge of badgesToCheck) {
  const obiData = await fetch(
    `https://api.credly.com/v1/obi/v2/badge_assertions/${badge.id}`
  );
  // hash + compare
  if (matched) break;
}

Five round-trips at 1 to 3 seconds each = 5 to 15 seconds of opaque waiting. The "break on first match" optimization felt clever, but it was actually hurting us: because the work was sequential, every additional badge added latency.
The fix
Wrap the work in Promise.all and run all five in parallel. Wall-clock time becomes O(slowest) instead of O(sum):
const ownershipResults = await Promise.all(
  badgesToCheck.map(async (badge) => {
    const obiData = await fetchObi(badge.id);
    const { matched, matchedEmail } = await checkEmailMatch(
      allEmails,
      obiData.recipient.identity,
      obiData.recipient.salt,
    );
    return matched
      ? { badgeId: badge.id, matchedEmail, recipient: obiData.recipient }
      : null;
  })
);
const ownershipMatch = ownershipResults.find((r) => r !== null);

Result: 1 to 3 seconds instead of 5 to 15. The "break on first match" optimization we used to do sequentially is moot. The third-party API call is cheap, and we care about UX latency, not server load.
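The checkEmailMatch helper referenced above is a few lines of Web Crypto. A sketch, assuming Open Badges-style hashed identities of the form sha256$&lt;hex&gt; computed over email + salt (the real helper may differ in details):

// Illustrative sketch of checkEmailMatch for OBI hashed recipient identities:
// identity = "sha256$" + hex(sha256(email + salt)).
async function sha256Hex(input: string): Promise<string> {
  const digest = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(input));
  return [...new Uint8Array(digest)].map((b) => b.toString(16).padStart(2, '0')).join('');
}

async function checkEmailMatch(
  emails: string[],
  identity: string, // e.g. "sha256$4f0c..."
  salt = '',
): Promise<{ matched: boolean; matchedEmail?: string }> {
  const expected = identity.replace(/^sha256\$/i, '').toLowerCase();
  for (const email of emails) {
    if ((await sha256Hex(email + salt)) === expected) {
      return { matched: true, matchedEmail: email };
    }
  }
  return { matched: false };
}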
When this pattern applies
Any time you have N independent network calls and the user is waiting. The mental cost of Promise.all is near zero. The wall-clock win is large. The case where it doesn't apply: when each call is expensive on the third party's side and you have a contractual obligation not to fan out.
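And when a full fan-out is off the table, a bounded-concurrency map keeps most of the wall-clock win while capping pressure on the third party. A sketch (mapWithLimit is a hypothetical helper, not something from our codebase):

// Run tasks with at most `limit` in flight; results come back in input order.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  const workers = Array.from({ length: Math.min(limit, items.length) }, async () => {
    while (next < items.length) {
      const i = next++; // single-threaded JS: no race on the shared counter
      results[i] = await fn(items[i]);
    }
  });
  await Promise.all(workers);
  return results;
}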
Pattern 2: INSERT ON CONFLICT collapses N transactions into one
The problem
After ownership verification, we save N selected credentials. The original client did this:
const total = selectedBadges.length;
for (const [i, badge] of selectedBadges.entries()) {
  await saveCredential(badge); // ~1 to 2s per round-trip
  setProgress({ current: i + 1, total });
}

For 16 badges: about 30 seconds of save time, plus the overhead of each individual transaction. Smart duplicate detection added more pain. Each save had to fetch existing credentials, pattern-match by issuer + name (with abbreviation awareness), decide insert vs update, run the write, and wait for the trigger that recomputes total points. Repeated 16 times.
The fix: a single Postgres function with INSERT ON CONFLICT
CREATE OR REPLACE FUNCTION bulk_save_credentials(
  p_user_id uuid,
  p_badges jsonb
) RETURNS TABLE (cred_id uuid, action text, matched_input_index int)
LANGUAGE plpgsql SECURITY DEFINER
SET search_path = public, pg_temp
AS $$
BEGIN
  RETURN QUERY
  WITH input AS (
    SELECT b, ord
    FROM jsonb_array_elements(p_badges) WITH ORDINALITY AS t(b, ord)
  ),
  upserted AS (
    INSERT INTO credentials (
      user_id, issuer, credential_name, tier, base_points,
      issued_date, expiry_date, verification_method, verification_status,
      badge_url, image_url, raw_extraction_data
    )
    SELECT
      p_user_id,
      b->>'issuer',
      b->>'credential_name',
      b->>'tier',
      (b->>'base_points')::int,
      NULLIF(b->>'issued_date','')::date,
      NULLIF(b->>'expiry_date','')::date,
      b->>'verification_method',
      b->>'verification_status',
      b->>'badge_url',
      b->>'image_url',
      b->'raw_extraction_data'
    FROM input
    ON CONFLICT (user_id, lower(trim(issuer)), lower(trim(credential_name)))
    DO UPDATE SET
      expiry_date = EXCLUDED.expiry_date,
      verification_status = EXCLUDED.verification_status,
      updated_at = now()
    -- RETURNING only sees the target table's columns, so expose enough of
    -- them to join each result row back to its input element
    RETURNING credentials.id, credentials.issuer, credentials.credential_name,
              (xmax = 0) AS was_inserted
  )
  SELECT u.id,
         CASE WHEN u.was_inserted THEN 'inserted' ELSE 'updated' END,
         (i.ord - 1)::int
  FROM upserted u
  JOIN input i
    ON lower(trim(i.b->>'issuer')) = lower(trim(u.issuer))
   AND lower(trim(i.b->>'credential_name')) = lower(trim(u.credential_name));
END;
$$;

Two tricks worth knowing:
- The unique constraint is what enables ON CONFLICT. You need a unique index on the conflict-target columns, otherwise Postgres rejects the statement:

CREATE UNIQUE INDEX credentials_user_issuer_name_unique
ON credentials (user_id, lower(trim(issuer)), lower(trim(credential_name)));

- xmax = 0 distinguishes insert vs update on the same row. After an upsert, you can't tell whether each row was inserted or updated unless you check xmax. Postgres leaves xmax = 0 on freshly inserted rows and a non-zero transaction id on updated ones.
Now 16 badges save in one round-trip, in one transaction, in one statement. Atomicity is free.
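On the client, the whole batch now goes up in one call. A sketch assuming supabase-js v2 and the function above:

// One round-trip for the whole batch; p_badges is serialized to jsonb.
const { data: results, error } = await supabase.rpc('bulk_save_credentials', {
  p_user_id: userId,
  p_badges: selectedBadges,
});
if (error) throw error;
// results: [{ cred_id, action: 'inserted' | 'updated', matched_input_index }, ...]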
Watch out: WITH ORDINALITY returns bigint
A subtle gotcha that cost me 20 minutes of debugging. jsonb_array_elements WITH ORDINALITY returns (element jsonb, ordinality bigint). If your function declares matched_input_index int and you return ord directly:
ERROR: 42804: structure of query does not match function result type
DETAIL: Returned type bigint does not match expected type integer in column 3.

Cast it explicitly: (ord - 1)::int. Postgres won't auto-narrow.
Pattern 3: statement-level triggers, not row-level, for aggregations
The problem
User points are aggregated from credentials. Every credential write fires a row-level trigger that recomputes users.total_points:
CREATE TRIGGER trigger_update_points
AFTER INSERT OR UPDATE OR DELETE ON credentials
FOR EACH ROW
EXECUTE FUNCTION update_user_total_points();

For 16 inserts in a bulk save, this trigger fires 16 times. Each time, it re-aggregates the same user's credentials. The Nth firing produces the same total once the dust settles, but the N-1 intermediate aggregations are wasted work.
The fix: convert to statement-level using transition tables
CREATE OR REPLACE FUNCTION update_user_total_points_statement()
RETURNS TRIGGER LANGUAGE plpgsql AS $$
DECLARE
  affected uuid[];
BEGIN
  -- Only the transition tables declared on the firing trigger exist,
  -- so branch on TG_OP instead of referencing both unconditionally.
  IF TG_OP = 'INSERT' THEN
    SELECT array_agg(DISTINCT user_id) INTO affected FROM new_table;
  ELSIF TG_OP = 'DELETE' THEN
    SELECT array_agg(DISTINCT user_id) INTO affected FROM old_table;
  ELSE
    SELECT array_agg(DISTINCT t.user_id) INTO affected
    FROM (
      SELECT user_id FROM new_table
      UNION
      SELECT user_id FROM old_table
    ) t;
  END IF;

  UPDATE users
  SET total_points = (
    SELECT COALESCE(SUM(base_points), 0)
    FROM credentials
    WHERE user_id = users.id
      AND verification_status IN ('verified', 'expired')
  )
  WHERE id = ANY(affected);
  RETURN NULL;
END;
$$;
-- One trigger per event type, because transition table availability differs:
CREATE TRIGGER trigger_update_points_insert
AFTER INSERT ON credentials
REFERENCING NEW TABLE AS new_table
FOR EACH STATEMENT
EXECUTE FUNCTION update_user_total_points_statement();
CREATE TRIGGER trigger_update_points_update
AFTER UPDATE ON credentials
REFERENCING NEW TABLE AS new_table OLD TABLE AS old_table
FOR EACH STATEMENT
EXECUTE FUNCTION update_user_total_points_statement();
CREATE TRIGGER trigger_update_points_delete
AFTER DELETE ON credentials
REFERENCING OLD TABLE AS old_table
FOR EACH STATEMENT
EXECUTE FUNCTION update_user_total_points_statement();

Saves 1 to 3 seconds on a 16-row bulk save. The trigger now fires once per statement, not once per row.
Why three separate triggers?
You'd expect one trigger handling all three event types, but transition table availability differs:
- INSERT only has NEW TABLE (no rows existed before)
- DELETE only has OLD TABLE (no rows exist after)
- UPDATE has both
Postgres rejects a trigger definition that references a transition table its event can't supply, and a function body that touches an undeclared transition table fails at runtime. Hence three triggers, one per event, sharing one function that branches on TG_OP.
Pattern 4: single batched audit event for batch operations
Every credential creation writes to audit_log. With a 16-cert import, that's 16 audit_log inserts, 16 trigger firings that post 16 messages to Slack, and a wall of repetitive entries that drowns out the signal.
When the bulk path runs, fire ONE batched event:
logAuditEvent(supabase, {
  actorId: userId,
  action: 'credential.bulk_create',
  category: 'credential',
  entityType: 'user',
  entityId: userId,
  metadata: {
    source: 'bulk-save-selected-credentials',
    total: 16,
    inserted: 12,
    updated: 4,
    skill_links: 102,
    points_before: 0,
    points_after: 970,
    points_gained: 970,
  },
});

One row in audit_log, one Slack post, one cleaner timeline. The audit log is a story for humans; don't make humans read 16 lines that tell the same story.
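logAuditEvent itself is nothing fancy. A hedged sketch, assuming a Supabase client and the audit_log columns used elsewhere in this post (the helper's real shape may differ):

import type { SupabaseClient } from '@supabase/supabase-js';

interface AuditEvent {
  actorId: string;
  action: string;
  category: string;
  entityType: string;
  entityId: string;
  metadata: Record<string, unknown>;
}

// Single-row audit write; failures are logged, not thrown, so auditing
// never takes down the main flow.
async function logAuditEvent(supabase: SupabaseClient, e: AuditEvent) {
  const { error } = await supabase.from('audit_log').insert({
    actor_id: e.actorId,
    action: e.action,
    category: e.category,
    entity_type: e.entityType,
    entity_id: e.entityId,
    metadata: e.metadata,
  });
  if (error) console.error('audit_log insert failed', error);
}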
Pattern 5: NDJSON streaming for long-running operations
The problem
Even after all the speedups, the user still sees an opaque spinner for 2 to 3 seconds during profile fetch + ownership verification. We have rich progress information server-side. We just don't expose it to the UI.
The fix: stream progress events as newline-delimited JSON
Why NDJSON over Server-Sent Events (SSE)? NDJSON is just newline-delimited JSON: easier to parse, easier to handle cross-platform. SSE adds data: and event: framing, which buys nothing when both ends are your own code.
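On the wire, the whole protocol is one JSON object per line. The shapes below mirror the events later in this post:

{"type":"progress","stage":"fetching_profile"}
{"type":"progress","stage":"verifying_ownership","current":2,"total":5}
{"type":"complete","status":200}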
Server-side wrapper:
serve(async (req) => {
  const isStreaming = new URL(req.url).searchParams.get('stream') === '1';
  if (!isStreaming) {
    return runImport(req, () => {}); // existing flow, no progress
  }
  const stream = new ReadableStream({
    async start(controller) {
      const send = (event) =>
        controller.enqueue(new TextEncoder().encode(JSON.stringify(event) + '\n'));
      try {
        const response = await runImport(req, send);
        const body = await response.json();
        send({ type: 'complete', status: response.status, data: body });
      } catch (e) {
        send({ type: 'error', message: e.message });
      }
      controller.close();
    },
  });
  return new Response(stream, {
    headers: {
      'Content-Type': 'application/x-ndjson',
      'Cache-Control': 'no-cache',
      'X-Accel-Buffering': 'no', // tells nginx-style proxies not to buffer
    },
  });
});

Inside runImport, the existing logic gets send() calls at milestones:
send({ type: 'progress', stage: 'fetching_profile' });
const badges = await (await fetch(profileUrl)).json();
send({
  type: 'progress',
  stage: 'profile_loaded',
  totalBadges: badges.length,
});

let done = 0;
const results = await Promise.all(
  badges.map(async (b) => {
    const result = await verifyOwnership(b);
    done++;
    send({
      type: 'progress',
      stage: 'verifying_ownership',
      current: done,
      total: badges.length,
    });
    return result;
  })
);

Note the closure-captured counter. Each parallel promise increments and emits as it finishes, so progress events arrive in completion order, not input order. For UX this is exactly what you want: "verified 1 out of 5", "verified 2 out of 5", as fast as the server can confirm.
Web client
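The response object below comes from a plain fetch against the streaming endpoint. A sketch; FUNCTIONS_URL, token, and the body shape are placeholders:

const response = await fetch(`${FUNCTIONS_URL}/import-credly-profile?stream=1`, {
  method: 'POST',
  headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
  body: JSON.stringify({ profileUrl }),
});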
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? ''; // keep incomplete trailing line
  for (const line of lines) {
    if (line.trim()) onProgress(JSON.parse(line));
  }
}
if (buffer.trim()) onProgress(JSON.parse(buffer));

The buffer-and-split-on-newline pattern is critical. Chunks rarely arrive on JSON-line boundaries, so hold the trailing partial line until the next chunk completes it.
Mobile (React Native): use XMLHttpRequest, not fetch
fetch().body.getReader() works on RN 0.81 in theory, but its chunking behavior varies across Hermes vs JSC and across iOS vs Android. The reliable cross-platform pattern is XHR with onprogress:
await new Promise<void>((resolve, reject) => {
  const xhr = new XMLHttpRequest();
  xhr.open('POST', url);
  xhr.setRequestHeader('Authorization', `Bearer ${token}`);
  xhr.setRequestHeader('Content-Type', 'application/json');

  let lastIndex = 0;
  let buffer = '';
  xhr.onprogress = () => {
    // Only the new tail of responseText since the last event
    const newText = xhr.responseText.substring(lastIndex);
    lastIndex = xhr.responseText.length;
    buffer += newText;
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';
    for (const line of lines) {
      if (line.trim()) onProgress(JSON.parse(line));
    }
  };
  xhr.onload = () => {
    if (buffer.trim()) onProgress(JSON.parse(buffer));
    resolve();
  };
  xhr.onerror = () => reject(new Error('network error'));
  xhr.send(JSON.stringify(body));
});

XHR has been stable since RN 0.20, and onprogress fires incrementally with partial responseText on every modern RN runtime. It's the closest thing to a guaranteed cross-platform streaming primitive.
Prove streaming actually streams
Bare HTTP can buffer at any layer (the runtime, the proxy, the CDN). After deploying, build a probe that records t_ms of each chunk. For a 5-badge profile, we captured 11 events across 12 chunks over 1.9 seconds, with distinct timestamps for each event:
| chunk | t_ms | event |
|---|---|---|
| 1 | 470 | fetching_profile |
| 3 | 1418 | profile_loaded (totalBadges: 5) |
| 4 | 1487 | verifying_ownership 0/5 |
| 5 | 1634 | verifying_ownership 1/5 |
| 6 | 1634 | verifying_ownership 2/5 |
| 7 | 1636 | verifying_ownership 3/5 |
| 9 | 1637 | verifying_ownership 4/5 |
| 10 | 1819 | verifying_ownership 5/5 |
| 10 | 1819 | ownership_verified |
| 11 | 1875 | processing_badges |
| 12 | 1943 | complete (status 200, full payload) |

12 chunks for 11 events with separated timestamps means chunks ARE flushing. Not buffered. Verified. If your probe shows all events arriving at the same timestamp at the end, something between you and the client is buffering. Common culprits: nginx without proxy_buffering off, Cloudflare without Cache-Control: no-cache, RN's response handling on certain Android builds.
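The probe's client half is short. A sketch, reusing the same placeholder endpoint and token as above:

// Record the wall-clock arrival time of every chunk to prove flushing.
const t0 = Date.now();
const res = await fetch(`${FUNCTIONS_URL}/import-credly-profile?stream=1`, {
  method: 'POST',
  headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
  body: JSON.stringify({ profileUrl }),
});
const reader = res.body!.getReader();
const decoder = new TextDecoder();
let chunk = 0;
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  chunk++;
  console.log(`chunk ${chunk} t_ms=${Date.now() - t0}`, decoder.decode(value, { stream: true }).trim());
}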
Pattern 6: don't trust over-the-air updates for critical paths
After shipping the streaming code, we pushed it to mobile via Expo OTA. Both iOS and Android builds had runtimeVersion: "1" baked in. The OTA published successfully. Then we tested on a real device. The mobile call hit:
POST | 200 | https://api.example.com/functions/v1/import-credly-profile (1632ms)

No ?stream=1 query parameter. The app was running the OLD bundle.
Why
Expo's default OTA behavior:
- App opens with cached (old) bundle
- New bundle downloads silently in the background
- New bundle activates on the NEXT app launch
So your "first relaunch after publishing" still uses the cached bundle. You need to relaunch twice (once to download, once to apply). And there's no UI feedback telling you which state you're in. The user flow looks identical between "OTA hasn't reached you yet" and "OTA reached you and is broken."
The lesson
For critical paths, ship a native build. The certainty is worth the 15 to 25 minute build cycle. Native binary. Streaming and bulk-save baked in. No cache to fight. No "next launch" lottery. OTA is great for low-stakes JS-only fixes that aren't on the critical path. For anything you're going to eyeball-verify on a device, ship native.
Pattern 7: build observability before you need it
The optimizations above are mechanical; they pay off whether or not you measure them. But without observability you'd be hoping, not knowing. Here's the layered approach we built.
Layer 1: a last_seen_at column with a trigger
ALTER TABLE users ADD COLUMN last_seen_at TIMESTAMPTZ;
CREATE INDEX users_last_seen_at_idx
ON users (last_seen_at DESC NULLS LAST);
CREATE OR REPLACE FUNCTION bump_user_last_seen()
RETURNS TRIGGER LANGUAGE plpgsql SECURITY DEFINER AS $$
BEGIN
  IF NEW.actor_id IS NOT NULL THEN
    UPDATE users
    SET last_seen_at = GREATEST(
      COALESCE(last_seen_at, '-infinity'::timestamptz),
      NEW.created_at
    )
    WHERE id = NEW.actor_id;
  END IF;
  RETURN NEW;
END;
$$;

CREATE TRIGGER trg_bump_user_last_seen
AFTER INSERT ON audit_log
FOR EACH ROW
EXECUTE FUNCTION bump_user_last_seen();

-- Backfill from history
UPDATE users u
SET last_seen_at = a.last_event
FROM (
  SELECT actor_id, MAX(created_at) AS last_event
  FROM audit_log
  WHERE actor_id IS NOT NULL
  GROUP BY actor_id
) a
WHERE a.actor_id = u.id;

Now last_seen_at is always current. Five minutes of work. Unlocks "active in last 7 days," "dormant 14 to 30 days," "churned 30+ days" with a single column scan.
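Those buckets are then one-liners from a dashboard script. A sketch with supabase-js (the 7-day window is arbitrary):

// Count users active in the last 7 days, straight off the indexed column.
const sevenDaysAgo = new Date(Date.now() - 7 * 24 * 3600 * 1000).toISOString();
const { count } = await supabase
  .from('users')
  .select('id', { count: 'exact', head: true })
  .gte('last_seen_at', sevenDaysAgo);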
Layer 2: a daily_user_metrics table + cron
Aggregations get expensive. Pre-compute them once a day and cache the result in a daily metrics table. Schedule via pg_cron, hit an edge function via net.http_post, post a Slack digest. Result: every morning, you get the previous day's snapshot in Slack without thinking about it.
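The edge function on the receiving end can be tiny. A hedged sketch; the table name daily_user_metrics, its day column, and the env var names are assumptions:

import { createClient } from 'npm:@supabase/supabase-js@2';

const supabase = createClient(
  Deno.env.get('SUPABASE_URL')!,
  Deno.env.get('SUPABASE_SERVICE_ROLE_KEY')!,
);

Deno.serve(async (req) => {
  // Reject anything that isn't the cron job.
  if (req.headers.get('x-cron-secret') !== Deno.env.get('CRON_SECRET')) {
    return new Response('forbidden', { status: 403 });
  }
  const { data: snapshot } = await supabase
    .from('daily_user_metrics')
    .select('*')
    .order('day', { ascending: false })
    .limit(1)
    .single();
  await fetch(Deno.env.get('SLACK_WEBHOOK_URL')!, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text: `Daily metrics: ${JSON.stringify(snapshot)}` }),
  });
  return new Response('ok');
});

And the schedule that drives it: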
SELECT cron.schedule(
'daily-metrics-summary',
'30 14 * * *', -- 14:30 UTC
$$
SELECT net.http_post(
url := 'https://your-project.supabase.co/functions/v1/daily-metrics-summary',
headers := jsonb_build_object('x-cron-secret', 'YOUR_SECRET'),
body := '{}'::jsonb
);
$$
);

Layer 3: track explicit logins via a database trigger
Most apps don't track "logins." They track "activity." To distinguish, hook into Supabase's auth.sessions table:
CREATE OR REPLACE FUNCTION log_auth_session_start()
RETURNS TRIGGER LANGUAGE plpgsql SECURITY DEFINER AS $$
DECLARE
  v_public_user_id uuid;
BEGIN
  SELECT id INTO v_public_user_id
  FROM users
  WHERE supabase_auth_id = NEW.user_id;

  IF v_public_user_id IS NOT NULL THEN
    INSERT INTO audit_log (actor_id, actor_type, action, category, entity_type, entity_id, metadata)
    VALUES (
      v_public_user_id, 'user', 'auth.session.start', 'auth', 'session', NEW.id,
      jsonb_build_object('aal', NEW.aal::text, 'ip', NEW.ip::text)
    );
  END IF;
  RETURN NEW;
END;
$$;

CREATE TRIGGER trg_log_auth_session_start
AFTER INSERT ON auth.sessions
FOR EACH ROW
EXECUTE FUNCTION log_auth_session_start();

Now every actual sign-in (not session refresh, not background activity) gets logged. Zero client changes.
Layer 4: funnel time-to-value
For users whose first verified credential was on this date, what's the median signup-to-first-cred time?
WITH first_creds AS (
  SELECT user_id, MIN(created_at) AS first_cred_at
  FROM credentials
  WHERE verification_status IN ('verified','expired')
  GROUP BY user_id
),
cohort AS (
  SELECT EXTRACT(EPOCH FROM (fc.first_cred_at - u.created_at))::int AS seconds_to_first
  FROM users u
  JOIN first_creds fc ON fc.user_id = u.id
  WHERE fc.first_cred_at >= v_day_start
    AND fc.first_cred_at < v_day_end
    AND fc.first_cred_at >= u.created_at
)
SELECT
  PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY seconds_to_first) AS p50,
  PERCENTILE_DISC(0.9) WITHIN GROUP (ORDER BY seconds_to_first) AS p90,
  COUNT(*) AS cohort_size
FROM cohort;

(v_day_start and v_day_end bound the day being computed.) Now you can detect activation regressions automatically. If p50 jumps from 127s to 600s, something broke.
The receipts
User B's audit_log entry confirms the path:
{
  "action": "credential.bulk_create",
  "metadata": {
    "source": "bulk-save-selected-credentials",
    "total": 12,
    "inserted": 12,
    "updated": 0,
    "skill_links": 102,
    "points_before": 0,
    "points_after": 970,
    "points_gained": 970
  }
}

The source: "bulk-save-selected-credentials" is the new edge function. The inserted: 12 shows the ON CONFLICT path firing as inserts. The single audit row replaced what would have been 12 individual credential.create events plus 12 Slack posts.
Takeaways
- Parallelize wherever wall-clock matters more than server load.
- INSERT ... ON CONFLICT with a unique index collapses N transactions into one.
- Statement-level triggers fire once per statement, not once per row.
- NDJSON over chunked HTTP is the simplest streaming protocol. Use XHR for React Native.
- Don't trust OTA for critical paths. Native build for anything you'll eyeball-verify on a device.
- Observability has tiers. Start cheap. A last_seen_at column unlocks 80% of retention questions.
- Build the dashboard before you need it. Without a daily metrics table, you have anecdotes.
- Prove streaming actually streams. Bare HTTP can buffer at any layer.
What I'd do differently
- Add observability earlier. I built the perf improvements, then realized I couldn't measure them. Layer 1 takes five minutes. Worth doing before any optimization work, not after.
- Skip OTA entirely for major changes. The 30 minutes I spent debugging why streaming wasn't working on the first device test would have been a native build. Save OTA for fixes you wouldn't even mention in a release note.
- Backfill aggressively. When you ship a new metric, immediately compute it for the last 14 to 30 days. The historical view turns "interesting numbers" into a real trend you can act on.
- One probe per claim. When I said "OTA is published, mobile streaming should work," I didn't actually verify it. The probe pattern (mint a token via auth.admin.generateLink + verifyOtp, hit the endpoint server-to-server, validate the chunked response) is universal and cheap. Build it the moment you make a claim that's hard to falsify by clicking around. A sketch of the token-minting step is below.
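For reference, the token-minting step of that probe, sketched against supabase-js v2 (the probe account email and env var names are placeholders):

import { createClient } from '@supabase/supabase-js';

// Mint a real user session server-to-server, no UI involved.
const admin = createClient(process.env.SUPABASE_URL!, process.env.SERVICE_ROLE_KEY!);
const { data: link, error } = await admin.auth.admin.generateLink({
  type: 'magiclink',
  email: 'probe@example.com', // a dedicated probe account
});
if (error) throw error;

const anon = createClient(process.env.SUPABASE_URL!, process.env.ANON_KEY!);
const { data: verified } = await anon.auth.verifyOtp({
  type: 'magiclink',
  token_hash: link.properties.hashed_token,
});
const accessToken = verified.session!.access_token; // Bearer token for the probe request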
About CertScore
CertScore verifies professional certifications cryptographically (Open Badges email-hash matching) and ranks them on a global leaderboard. Hiring managers can ask any candidate to verify their stack in two minutes, not seventeen.
Verify my certifications