Engineering
From 17 minutes to 2 minutes: how we made bulk credential imports 6x faster
By Rajat Ravinder Varuni, Founder
A walkthrough of seven patterns that took our worst-case onboarding from 17 minutes down to 2, plus the observability we built to prove it worked.
The setup
CertScore lets users import their entire Credly portfolio in one shot. A user pastes their profile URL, the server fetches all their badges, verifies ownership cryptographically (Open Badges Infrastructure email-hash matching), and saves the verified credentials.
Two real users, four days apart, told the story:
| User | Date | Credentials | End-to-end time |
|---|---|---|---|
| User A | May 4 | 16 | 17 minutes 2 seconds |
| User B | May 8 | 12 | 2 minutes 7 seconds |
Per-cert processing time dropped from 64 seconds to 10.6. A 6x improvement.
This post walks through each optimization that landed between those two signups, plus the observability layer we built so we could actually see the win. Most of the patterns are generic. The CertScore-specific bits are flagged.
Pattern 1: parallelize wherever wall-clock matters more than server load
The problem
For ownership verification, the server fetches the OBI badge assertion for the first 5 badges and checks the SHA-256-hashed recipient identity against each of the user's verified emails. Originally these ran sequentially in a loop:
for (const badge of badgesToCheck) {
  const obiData = await fetch(
    `https://api.credly.com/v1/obi/v2/badge_assertions/${badge.id}`
  );
  // hash + compare
  if (matched) break;
}

Five round-trips at 1 to 3 seconds each = 5 to 15 seconds of opaque waiting. The "break on first match" optimization felt clever, but it was actually hurting us: because the work was sequential, every additional badge added latency.
The fix
Wrap the work in Promise.all and run all five in parallel. Wall-clock time becomes O(slowest) instead of O(sum):
const ownershipResults = await Promise.all(
  badgesToCheck.map(async (badge) => {
    const obiData = await fetchObi(badge.id);
    const { matched, matchedEmail } = await checkEmailMatch(
      allEmails,
      obiData.recipient.identity,
      obiData.recipient.salt,
    );
    return matched
      ? { badgeId: badge.id, matchedEmail, recipient: obiData.recipient }
      : null;
  })
);
const ownershipMatch = ownershipResults.find((r) => r !== null);

Result: 1 to 3 seconds instead of 5 to 15. The "break on first match" optimization we used to do sequentially is moot. The third-party API call is cheap, and we care about UX latency, not server load.
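The checkEmailMatch helper referenced above is a few lines of Web Crypto. A sketch, assuming Open Badges-style hashed identities of the form sha256$&lt;hex&gt; computed over email + salt (the real helper may differ in details):

// Illustrative sketch of checkEmailMatch for OBI hashed recipient identities:
// identity = "sha256$" + hex(sha256(email + salt)).
async function sha256Hex(input: string): Promise<string> {
  const digest = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(input));
  return [...new Uint8Array(digest)].map((b) => b.toString(16).padStart(2, '0')).join('');
}

async function checkEmailMatch(
  emails: string[],
  identity: string, // e.g. "sha256$4f0c..."
  salt = '',
): Promise<{ matched: boolean; matchedEmail?: string }> {
  const expected = identity.replace(/^sha256\$/i, '').toLowerCase();
  for (const email of emails) {
    if ((await sha256Hex(email + salt)) === expected) {
      return { matched: true, matchedEmail: email };
    }
  }
  return { matched: false };
}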
When this pattern applies
Any time you have N independent network calls and the user is waiting. The mental cost of Promise.all is near zero. The wall-clock win is large. The case where it doesn't apply: when each call is expensive on the third party's side and you have a contractual obligation not to fan out.
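And when a full fan-out is off the table, a bounded-concurrency map keeps most of the wall-clock win while capping pressure on the third party. A sketch (mapWithLimit is a hypothetical helper, not something from our codebase):

// Run tasks with at most `limit` in flight; results come back in input order.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  const workers = Array.from({ length: Math.min(limit, items.length) }, async () => {
    while (next < items.length) {
      const i = next++; // single-threaded JS: no race on the shared counter
      results[i] = await fn(items[i]);
    }
  });
  await Promise.all(workers);
  return results;
}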
Pattern 2: INSERT ON CONFLICT collapses N transactions into one
The problem
After ownership verification, we save N selected credentials. The original client did this:
const total = selectedBadges.length;
for (const [i, badge] of selectedBadges.entries()) {
  await saveCredential(badge); // ~1 to 2s per round-trip
  setProgress({ current: i + 1, total });
}

For 16 badges: about 30 seconds of save time, plus the overhead of each individual transaction. Smart duplicate detection added more pain. Each save had to fetch existing credentials, pattern-match by issuer + name (with abbreviation awareness), decide insert vs update, run the write, and wait for the trigger that recomputes total points. Repeated 16 times.
The fix: a single Postgres function with INSERT ON CONFLICT
CREATE OR REPLACE FUNCTION bulk_save_credentials(
  p_user_id uuid,
  p_badges jsonb
) RETURNS TABLE (cred_id uuid, action text, matched_input_index int)
LANGUAGE plpgsql SECURITY DEFINER
SET search_path = public, pg_temp
AS $$
BEGIN
  RETURN QUERY
  WITH input AS (
    SELECT b, ord
    FROM jsonb_array_elements(p_badges) WITH ORDINALITY AS t(b, ord)
  ),
  upserted AS (
    INSERT INTO credentials (
      user_id, issuer, credential_name, tier, base_points,
      issued_date, expiry_date, verification_method, verification_status,
      badge_url, image_url, raw_extraction_data
    )
    SELECT
      p_user_id,
      b->>'issuer',
      b->>'credential_name',
      b->>'tier',
      (b->>'base_points')::int,
      NULLIF(b->>'issued_date','')::date,
      NULLIF(b->>'expiry_date','')::date,
      b->>'verification_method',
      b->>'verification_status',
      b->>'badge_url',
      b->>'image_url',
      b->'raw_extraction_data'
    FROM input
    ON CONFLICT (user_id, lower(trim(issuer)), lower(trim(credential_name)))
    DO UPDATE SET
      expiry_date = EXCLUDED.expiry_date,
      verification_status = EXCLUDED.verification_status,
      updated_at = now()
    -- RETURNING only sees the target table's columns, so expose enough of
    -- them to join each result row back to its input element
    RETURNING credentials.id, credentials.issuer, credentials.credential_name,
              (xmax = 0) AS was_inserted
  )
  SELECT u.id,
         CASE WHEN u.was_inserted THEN 'inserted' ELSE 'updated' END,
         (i.ord - 1)::int
  FROM upserted u
  JOIN input i
    ON lower(trim(i.b->>'issuer')) = lower(trim(u.issuer))
   AND lower(trim(i.b->>'credential_name')) = lower(trim(u.credential_name));
END;
$$;

Two tricks worth knowing:
- The unique constraint is what enables ON CONFLICT. You need a unique index on the conflict-target columns, otherwise Postgres rejects the statement:

CREATE UNIQUE INDEX credentials_user_issuer_name_unique
ON credentials (user_id, lower(trim(issuer)), lower(trim(credential_name)));

- xmax = 0 distinguishes insert vs update on the same row. After an upsert, you can't tell whether each row was inserted or updated unless you check xmax. Postgres leaves xmax = 0 on freshly inserted rows and a non-zero transaction id on updated ones.
Now 16 badges save in one round-trip, in one transaction, in one statement. Atomicity is free.
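On the client, the whole batch now goes up in one call. A sketch assuming supabase-js v2 and the function above:

// One round-trip for the whole batch; p_badges is serialized to jsonb.
const { data: results, error } = await supabase.rpc('bulk_save_credentials', {
  p_user_id: userId,
  p_badges: selectedBadges,
});
if (error) throw error;
// results: [{ cred_id, action: 'inserted' | 'updated', matched_input_index }, ...]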
Watch out: WITH ORDINALITY returns bigint
A subtle gotcha that cost me 20 minutes of debugging. jsonb_array_elements WITH ORDINALITY returns (element jsonb, ordinality bigint). If your function declares matched_input_index int and you return ord directly:
ERROR: 42804: structure of query does not match function result type
DETAIL: Returned type bigint does not match expected type integer in column 3.

Cast it explicitly: (ord - 1)::int. Postgres won't auto-narrow.
Pattern 3: statement-level triggers, not row-level, for aggregations
The problem
User points are aggregated from credentials. Every credential write fires a row-level trigger that recomputes users.total_points:
CREATE TRIGGER trigger_update_points
AFTER INSERT OR UPDATE OR DELETE ON credentials
FOR EACH ROW
EXECUTE FUNCTION update_user_total_points();

For 16 inserts in a bulk save, this trigger fires 16 times. Each time, it re-aggregates the same user's credentials. The Nth firing produces the same total once the dust settles, but the N-1 intermediate aggregations are wasted work.
The fix: convert to statement-level using transition tables
CREATE OR REPLACE FUNCTION update_user_total_points_statement()
RETURNS TRIGGER LANGUAGE plpgsql AS $$
DECLARE
  affected uuid[];
BEGIN
  -- Only the transition tables declared on the firing trigger exist,
  -- so branch on TG_OP instead of referencing both unconditionally.
  IF TG_OP = 'INSERT' THEN
    SELECT array_agg(DISTINCT user_id) INTO affected FROM new_table;
  ELSIF TG_OP = 'DELETE' THEN
    SELECT array_agg(DISTINCT user_id) INTO affected FROM old_table;
  ELSE
    SELECT array_agg(DISTINCT t.user_id) INTO affected
    FROM (
      SELECT user_id FROM new_table
      UNION
      SELECT user_id FROM old_table
    ) t;
  END IF;

  UPDATE users
  SET total_points = (
    SELECT COALESCE(SUM(base_points), 0)
    FROM credentials
    WHERE user_id = users.id
      AND verification_status IN ('verified', 'expired')
  )
  WHERE id = ANY(affected);
  RETURN NULL;
END;
$$;
-- One trigger per event type, because transition table availability differs:
CREATE TRIGGER trigger_update_points_insert
AFTER INSERT ON credentials
REFERENCING NEW TABLE AS new_table
FOR EACH STATEMENT
EXECUTE FUNCTION update_user_total_points_statement();
CREATE TRIGGER trigger_update_points_update
AFTER UPDATE ON credentials
REFERENCING NEW TABLE AS new_table OLD TABLE AS old_table
FOR EACH STATEMENT
EXECUTE FUNCTION update_user_total_points_statement();
CREATE TRIGGER trigger_update_points_delete
AFTER DELETE ON credentials
REFERENCING OLD TABLE AS old_table
FOR EACH STATEMENT
EXECUTE FUNCTION update_user_total_points_statement();

Saves 1 to 3 seconds on a 16-row bulk save. The trigger now fires once per statement, not once per row.
Why three separate triggers?
You'd expect one trigger handling all three event types, but transition table availability differs:
- INSERT only has NEW TABLE (no rows existed before)
- DELETE only has OLD TABLE (no rows exist after)
- UPDATE has both
Postgres rejects a trigger definition that references a transition table its event can't supply, and a function body that touches an undeclared transition table fails at runtime. Hence three triggers, one per event, sharing one function that branches on TG_OP.
Pattern 4: single batched audit event for batch operations
Every credential creation writes to audit_log. With a 16-cert import, that's 16 audit_log inserts, 16 trigger firings that post 16 messages to Slack, and a wall of repetitive entries that drowns out the signal.
When the bulk path runs, fire ONE batched event:
logAuditEvent(supabase, {
  actorId: userId,
  action: 'credential.bulk_create',
  category: 'credential',
  entityType: 'user',
  entityId: userId,
  metadata: {
    source: 'bulk-save-selected-credentials',
    total: 16,
    inserted: 12,
    updated: 4,
    skill_links: 102,
    points_before: 0,
    points_after: 970,
    points_gained: 970,
  },
});

One row in audit_log, one Slack post, one cleaner timeline. The audit log is a story for humans; don't make humans read 16 lines that tell the same story.
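logAuditEvent itself is nothing fancy. A hedged sketch, assuming a Supabase client and the audit_log columns used elsewhere in this post (the helper's real shape may differ):

import type { SupabaseClient } from '@supabase/supabase-js';

interface AuditEvent {
  actorId: string;
  action: string;
  category: string;
  entityType: string;
  entityId: string;
  metadata: Record<string, unknown>;
}

// Single-row audit write; failures are logged, not thrown, so auditing
// never takes down the main flow.
async function logAuditEvent(supabase: SupabaseClient, e: AuditEvent) {
  const { error } = await supabase.from('audit_log').insert({
    actor_id: e.actorId,
    action: e.action,
    category: e.category,
    entity_type: e.entityType,
    entity_id: e.entityId,
    metadata: e.metadata,
  });
  if (error) console.error('audit_log insert failed', error);
}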
Pattern 5: NDJSON streaming for long-running operations
The problem
Even after all the speedups, the user still sees an opaque spinner for 2 to 3 seconds during profile fetch + ownership verification. We have rich progress information server-side. We just don't expose it to the UI.
The fix: stream progress events as newline-delimited JSON
Why NDJSON over Server-Sent Events (SSE)? NDJSON is just newline-delimited JSON: easier to parse, easier to handle cross-platform. SSE adds data: and event: framing, which buys nothing when both ends are your own code.
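On the wire, the whole protocol is one JSON object per line. The shapes below mirror the events later in this post:

{"type":"progress","stage":"fetching_profile"}
{"type":"progress","stage":"verifying_ownership","current":2,"total":5}
{"type":"complete","status":200}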
Server-side wrapper:
serve(async (req) => {
  const isStreaming = new URL(req.url).searchParams.get('stream') === '1';
  if (!isStreaming) {
    return runImport(req, () => {}); // existing flow, no progress
  }
  const stream = new ReadableStream({
    async start(controller) {
      const send = (event) =>
        controller.enqueue(new TextEncoder().encode(JSON.stringify(event) + '\n'));
      try {
        const response = await runImport(req, send);
        const body = await response.json();
        send({ type: 'complete', status: response.status, data: body });
      } catch (e) {
        send({ type: 'error', message: e.message });
      }
      controller.close();
    },
  });
  return new Response(stream, {
    headers: {
      'Content-Type': 'application/x-ndjson',
      'Cache-Control': 'no-cache',
      'X-Accel-Buffering': 'no', // tells nginx-style proxies not to buffer
    },
  });
});

Inside runImport, the existing logic gets send() calls at milestones:
send({ type: 'progress', stage: 'fetching_profile' });
const badges = await (await fetch(profileUrl)).json();
send({
  type: 'progress',
  stage: 'profile_loaded',
  totalBadges: badges.length,
});

let done = 0;
const results = await Promise.all(
  badges.map(async (b) => {
    const result = await verifyOwnership(b);
    done++;
    send({
      type: 'progress',
      stage: 'verifying_ownership',
      current: done,
      total: badges.length,
    });
    return result;
  })
);

Note the closure-captured counter. Each parallel promise increments and emits as it finishes, so progress events arrive in completion order, not input order. For UX this is exactly what you want: "verified 1 out of 5", "verified 2 out of 5", as fast as the server can confirm.
Web client
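The response object below comes from a plain fetch against the streaming endpoint. A sketch; FUNCTIONS_URL, token, and the body shape are placeholders:

const response = await fetch(`${FUNCTIONS_URL}/import-credly-profile?stream=1`, {
  method: 'POST',
  headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
  body: JSON.stringify({ profileUrl }),
});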
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? ''; // keep incomplete trailing line
  for (const line of lines) {
    if (line.trim()) onProgress(JSON.parse(line));
  }
}
if (buffer.trim()) onProgress(JSON.parse(buffer));

The buffer-and-split-on-newline pattern is critical. Chunks rarely arrive on JSON-line boundaries, so hold the trailing partial line until the next chunk completes it.
Mobile (React Native): use XMLHttpRequest, not fetch
fetch().body.getReader() works on RN 0.81 in theory, but its chunking behavior varies across Hermes vs JSC and across iOS vs Android. The reliable cross-platform pattern is XHR with onprogress:
await new Promise<void>((resolve, reject) => {
  const xhr = new XMLHttpRequest();
  xhr.open('POST', url);
  xhr.setRequestHeader('Authorization', `Bearer ${token}`);
  xhr.setRequestHeader('Content-Type', 'application/json');

  let lastIndex = 0;
  let buffer = '';
  xhr.onprogress = () => {
    // Only the new tail of responseText since the last event
    const newText = xhr.responseText.substring(lastIndex);
    lastIndex = xhr.responseText.length;
    buffer += newText;
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';
    for (const line of lines) {
      if (line.trim()) onProgress(JSON.parse(line));
    }
  };
  xhr.onload = () => {
    if (buffer.trim()) onProgress(JSON.parse(buffer));
    resolve();
  };
  xhr.onerror = () => reject(new Error('network error'));
  xhr.send(JSON.stringify(body));
});

XHR has been stable since RN 0.20, and onprogress fires incrementally with partial responseText on every modern RN runtime. It's the closest thing to a guaranteed cross-platform streaming primitive.
Prove streaming actually streams
Bare HTTP can buffer at any layer (the runtime, the proxy, the CDN). After deploying, build a probe that records t_ms of each chunk. For a 5-badge profile, we captured 11 events across 12 chunks over 1.9 seconds, with distinct timestamps for each event:
| chunk | t_ms | event |
|---|---|---|
| 1 | 470 | fetching_profile |
| 3 | 1418 | profile_loaded (totalBadges: 5) |
| 4 | 1487 | verifying_ownership 0/5 |
| 5 | 1634 | verifying_ownership 1/5 |
| 6 | 1634 | verifying_ownership 2/5 |
| 7 | 1636 | verifying_ownership 3/5 |
| 9 | 1637 | verifying_ownership 4/5 |
| 10 | 1819 | verifying_ownership 5/5 |
| 10 | 1819 | ownership_verified |
| 11 | 1875 | processing_badges |
| 12 | 1943 | complete (status 200, full payload) |

12 chunks for 11 events with separated timestamps means chunks ARE flushing. Not buffered. Verified. If your probe shows all events arriving at the same timestamp at the end, something between you and the client is buffering. Common culprits: nginx without proxy_buffering off, Cloudflare without Cache-Control: no-cache, RN's response handling on certain Android builds.
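The probe's client half is short. A sketch, reusing the same placeholder endpoint and token as above:

// Record the wall-clock arrival time of every chunk to prove flushing.
const t0 = Date.now();
const res = await fetch(`${FUNCTIONS_URL}/import-credly-profile?stream=1`, {
  method: 'POST',
  headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
  body: JSON.stringify({ profileUrl }),
});
const reader = res.body!.getReader();
const decoder = new TextDecoder();
let chunk = 0;
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  chunk++;
  console.log(`chunk ${chunk} t_ms=${Date.now() - t0}`, decoder.decode(value, { stream: true }).trim());
}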
Pattern 6: don't trust over-the-air updates for critical paths
After shipping the streaming code, we pushed it to mobile via Expo OTA. Both iOS and Android builds had runtimeVersion: "1" baked in. The OTA published successfully. Then we tested on a real device. The mobile call hit:
POST | 200 | https://api.example.com/functions/v1/import-credly-profile (1632ms)

No ?stream=1 query parameter. The app was running the OLD bundle.
Why
Expo's default OTA behavior:
- App opens with cached (old) bundle
- New bundle downloads silently in the background
- New bundle activates on the NEXT app launch
So your "first relaunch after publishing" still uses the cached bundle. You need to relaunch twice (once to download, once to apply). And there's no UI feedback telling you which state you're in. The user flow looks identical between "OTA hasn't reached you yet" and "OTA reached you and is broken."
The lesson
For critical paths, ship a native build. The certainty is worth the 15 to 25 minute build cycle. Native binary. Streaming and bulk-save baked in. No cache to fight. No "next launch" lottery. OTA is great for low-stakes JS-only fixes that aren't on the critical path. For anything you're going to eyeball-verify on a device, ship native.
Pattern 7: build observability before you need it
The optimizations above are mechanical; they pay off whether or not you measure them. But without observability you'd be hoping, not knowing. Here's the layered approach we built.
Layer 1: a last_seen_at column with a trigger
ALTER TABLE users ADD COLUMN last_seen_at TIMESTAMPTZ;
CREATE INDEX users_last_seen_at_idx
ON users (last_seen_at DESC NULLS LAST);
CREATE OR REPLACE FUNCTION bump_user_last_seen()
RETURNS TRIGGER LANGUAGE plpgsql SECURITY DEFINER AS $$
BEGIN
  IF NEW.actor_id IS NOT NULL THEN
    UPDATE users
    SET last_seen_at = GREATEST(
      COALESCE(last_seen_at, '-infinity'::timestamptz),
      NEW.created_at
    )
    WHERE id = NEW.actor_id;
  END IF;
  RETURN NEW;
END;
$$;

CREATE TRIGGER trg_bump_user_last_seen
AFTER INSERT ON audit_log
FOR EACH ROW
EXECUTE FUNCTION bump_user_last_seen();

-- Backfill from history
UPDATE users u
SET last_seen_at = a.last_event
FROM (
  SELECT actor_id, MAX(created_at) AS last_event
  FROM audit_log
  WHERE actor_id IS NOT NULL
  GROUP BY actor_id
) a
WHERE a.actor_id = u.id;

Now last_seen_at is always current. Five minutes of work. Unlocks "active in last 7 days," "dormant 14 to 30 days," "churned 30+ days" with a single column scan.
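Those buckets are then one-liners from a dashboard script. A sketch with supabase-js (the 7-day window is arbitrary):

// Count users active in the last 7 days, straight off the indexed column.
const sevenDaysAgo = new Date(Date.now() - 7 * 24 * 3600 * 1000).toISOString();
const { count } = await supabase
  .from('users')
  .select('id', { count: 'exact', head: true })
  .gte('last_seen_at', sevenDaysAgo);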
Layer 2: a daily_user_metrics table + cron
Aggregations get expensive. Pre-compute them once a day and cache the result in a daily metrics table. Schedule via pg_cron, hit an edge function via net.http_post, post a Slack digest. Result: every morning, you get the previous day's snapshot in Slack without thinking about it.
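The edge function on the receiving end can be tiny. A hedged sketch; the table name daily_user_metrics, its day column, and the env var names are assumptions:

import { createClient } from 'npm:@supabase/supabase-js@2';

const supabase = createClient(
  Deno.env.get('SUPABASE_URL')!,
  Deno.env.get('SUPABASE_SERVICE_ROLE_KEY')!,
);

Deno.serve(async (req) => {
  // Reject anything that isn't the cron job.
  if (req.headers.get('x-cron-secret') !== Deno.env.get('CRON_SECRET')) {
    return new Response('forbidden', { status: 403 });
  }
  const { data: snapshot } = await supabase
    .from('daily_user_metrics')
    .select('*')
    .order('day', { ascending: false })
    .limit(1)
    .single();
  await fetch(Deno.env.get('SLACK_WEBHOOK_URL')!, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text: `Daily metrics: ${JSON.stringify(snapshot)}` }),
  });
  return new Response('ok');
});

And the schedule that drives it: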
SELECT cron.schedule(
'daily-metrics-summary',
'30 14 * * *', -- 14:30 UTC
$$
SELECT net.http_post(
url := 'https://your-project.supabase.co/functions/v1/daily-metrics-summary',
headers := jsonb_build_object('x-cron-secret', 'YOUR_SECRET'),
body := '{}'::jsonb
);
$$
);

Layer 3: track explicit logins via a database trigger
Most apps don't track "logins." They track "activity." To distinguish, hook into Supabase's auth.sessions table:
CREATE OR REPLACE FUNCTION log_auth_session_start()
RETURNS TRIGGER LANGUAGE plpgsql SECURITY DEFINER AS $$
DECLARE
  v_public_user_id uuid;
BEGIN
  SELECT id INTO v_public_user_id
  FROM users
  WHERE supabase_auth_id = NEW.user_id;

  IF v_public_user_id IS NOT NULL THEN
    INSERT INTO audit_log (actor_id, actor_type, action, category, entity_type, entity_id, metadata)
    VALUES (
      v_public_user_id, 'user', 'auth.session.start', 'auth', 'session', NEW.id,
      jsonb_build_object('aal', NEW.aal::text, 'ip', NEW.ip::text)
    );
  END IF;
  RETURN NEW;
END;
$$;

CREATE TRIGGER trg_log_auth_session_start
AFTER INSERT ON auth.sessions
FOR EACH ROW
EXECUTE FUNCTION log_auth_session_start();

Now every actual sign-in (not session refresh, not background activity) gets logged. Zero client changes.
Layer 4: funnel time-to-value
For users whose first verified credential was on this date, what's the median signup-to-first-cred time?
WITH first_creds AS (
  SELECT user_id, MIN(created_at) AS first_cred_at
  FROM credentials
  WHERE verification_status IN ('verified','expired')
  GROUP BY user_id
),
cohort AS (
  SELECT EXTRACT(EPOCH FROM (fc.first_cred_at - u.created_at))::int AS seconds_to_first
  FROM users u
  JOIN first_creds fc ON fc.user_id = u.id
  WHERE fc.first_cred_at >= v_day_start
    AND fc.first_cred_at < v_day_end
    AND fc.first_cred_at >= u.created_at
)
SELECT
  PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY seconds_to_first) AS p50,
  PERCENTILE_DISC(0.9) WITHIN GROUP (ORDER BY seconds_to_first) AS p90,
  COUNT(*) AS cohort_size
FROM cohort;

(v_day_start and v_day_end bound the day being computed.) Now you can detect activation regressions automatically. If p50 jumps from 127s to 600s, something broke.
The receipts
User B's audit_log entry confirms the path:
{
  "action": "credential.bulk_create",
  "metadata": {
    "source": "bulk-save-selected-credentials",
    "total": 12,
    "inserted": 12,
    "updated": 0,
    "skill_links": 102,
    "points_before": 0,
    "points_after": 970,
    "points_gained": 970
  }
}

The source: "bulk-save-selected-credentials" is the new edge function. The inserted: 12 shows the ON CONFLICT path firing as inserts. The single audit row replaced what would have been 12 individual credential.create events plus 12 Slack posts.
Takeaways
- Parallelize wherever wall-clock matters more than server load.
- INSERT ... ON CONFLICT with a unique index collapses N transactions into one.
- Statement-level triggers fire once per statement, not once per row.
- NDJSON over chunked HTTP is the simplest streaming protocol. Use XHR for React Native.
- Don't trust OTA for critical paths. Native build for anything you'll eyeball-verify on a device.
- Observability has tiers. Start cheap. A last_seen_at column unlocks 80% of retention questions.
- Build the dashboard before you need it. Without a daily metrics table, you have anecdotes.
- Prove streaming actually streams. Bare HTTP can buffer at any layer.
What I'd do differently
- Add observability earlier. I built the perf improvements, then realized I couldn't measure them. Layer 1 takes five minutes. Worth doing before any optimization work, not after.
- Skip OTA entirely for major changes. The 30 minutes I spent debugging why streaming wasn't working on the first device test would have been a native build. Save OTA for fixes you wouldn't even mention in a release note.
- Backfill aggressively. When you ship a new metric, immediately compute it for the last 14 to 30 days. The historical view turns "interesting numbers" into a real trend you can act on.
- One probe per claim. When I said "OTA is published, mobile streaming should work," I didn't actually verify it. The probe pattern (mint a token via auth.admin.generateLink + verifyOtp, hit the endpoint server-to-server, validate the chunked response) is universal and cheap. Build it the moment you make a claim that's hard to falsify by clicking around. A sketch of the token-minting step is below.
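For reference, the token-minting step of that probe, sketched against supabase-js v2 (the probe account email and env var names are placeholders):

import { createClient } from '@supabase/supabase-js';

// Mint a real user session server-to-server, no UI involved.
const admin = createClient(process.env.SUPABASE_URL!, process.env.SERVICE_ROLE_KEY!);
const { data: link, error } = await admin.auth.admin.generateLink({
  type: 'magiclink',
  email: 'probe@example.com', // a dedicated probe account
});
if (error) throw error;

const anon = createClient(process.env.SUPABASE_URL!, process.env.ANON_KEY!);
const { data: verified } = await anon.auth.verifyOtp({
  type: 'magiclink',
  token_hash: link.properties.hashed_token,
});
const accessToken = verified.session!.access_token; // Bearer token for the probe request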
About CertScore
CertScore verifies professional certifications cryptographically (Open Badges email-hash matching) and ranks them on a global leaderboard. Hiring managers can ask any candidate to verify their stack in two minutes, not seventeen.
Verify my certifications