Designing Auth That Survives Real Clients

Auth looks small until the product asks it to behave like a system.

A login endpoint is easy. The trouble starts when the same backend has to serve a browser and a mobile app, support Google and LinkedIn sign-in, keep tokens away from JavaScript, verify email addresses, and stop users from reaching routes their account should not touch.

That was the shape of the auth problem on SkillBridge during HNG. The backend team built the system. I designed the architecture, wrote the design doc, and defined the API contract the team implemented.

This is the article I wish I had before writing that spec. Not a checklist of endpoints. A guide to the decisions behind them.

Start with what can go wrong

Before choosing JWTs, tables, or providers, ask a less glamorous question: what failure are you trying to prevent?

For SkillBridge, the risks were ordinary. That is what made them important.

A script injection reads tokens from browser storage
A refresh token leaks and keeps producing new access tokens
A user signs up with email, then signs in with Google and accidentally gets a second account
A new account skips onboarding and hits role-specific routes directly
A password reset leaves old sessions alive on other devices

Those risks shaped the design more than any framework preference.

Problem	Design choice	What it teaches
XSS can read browser storage	Use httpOnly cookies	Keep bearer tokens out of JavaScript reach
Refresh tokens live longer	Rotate and revoke them	Make replay detectable instead of silently useful
OAuth can duplicate accounts	Link provider identities to users	Model identity providers separately from users
New users have incomplete state	Gate access through onboarding	Do not authorize an account before its role exists
Password reset should kill old sessions	Revoke refresh tokens by user	Treat password reset as a session reset too

Put JWTs in httpOnly cookies, not localStorage

The first real decision was where tokens live after login.

The common tutorial pattern returns a JWT in the response body and stores it in localStorage. It is simple, and it works right up until an XSS bug turns into account takeover.

SkillBridge stores both tokens in httpOnly cookies so the browser can send them, but page JavaScript cannot read them.

TS
res.cookie("access_token", accessToken, {
  httpOnly: true,
  secure: true,
  sameSite: "strict",
  maxAge: 15 * 60 * 1000,
});
 
res.cookie("refresh_token", refreshToken, {
  httpOnly: true,
  secure: true,
  sameSite: "strict",
  maxAge: 7 * 24 * 60 * 60 * 1000,
});

The access token lasts 15 minutes. The refresh token lasts 7 days. The response body contains the user object, not raw credentials.

That split matters.

The client still needs to know who just logged in so it can route the user. It can read role, email, and onboardingComplete from the response body. It does not need the token value.
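A minimal sketch of that response shape, assuming a hypothetical `UserRecord` type and `toAuthResponse` helper. Only role, email, and onboardingComplete come from the spec. The other field names are illustrative.

```typescript
// Hypothetical shapes: names other than role, email, and onboardingComplete are assumptions.
type Role = "candidate" | "employer" | "admin";

interface UserRecord {
  id: string;
  email: string;
  role: Role | null; // null until onboarding assigns one
  passwordHash: string | null;
  onboardingComplete: boolean;
}

interface AuthResponseBody {
  user: { id: string; email: string; role: Role | null; onboardingComplete: boolean };
}

// Copy only what the client needs for routing. Tokens travel in cookies,
// and internal fields such as passwordHash never leave the server.
function toAuthResponse(user: UserRecord): AuthResponseBody {
  return {
    user: {
      id: user.id,
      email: user.email,
      role: user.role,
      onboardingComplete: user.onboardingComplete,
    },
  };
}
```

Building the body from an explicit allowlist, rather than deleting fields from the database row, means a new internal column does not leak by default.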

The browser path is straightforward because browsers already understand cookies.

Mobile is where this gets annoying.

React Native and Flutter do not give you the same automatic cookie jar behavior. The backend can stay the same, but the mobile client needs interceptors that read Set-Cookie from auth responses, store cookie values in secure device storage, and attach a Cookie header on later requests.
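The parsing half of such an interceptor could look roughly like this. It is a sketch, not the SkillBridge mobile code: `parseSetCookie` and `buildCookieHeader` are hypothetical helper names, and the secure storage layer is left out.

```typescript
// Hypothetical helpers. Persisting the jar in secure device storage is out of scope here.
type CookieJar = Record<string, string>;

// Pull name=value pairs out of raw Set-Cookie header lines, ignoring attributes
// such as HttpOnly, Secure, SameSite, and Max-Age.
function parseSetCookie(headers: string[]): CookieJar {
  const jar: CookieJar = {};
  for (const header of headers) {
    const firstPart = header.split(";")[0] ?? "";
    const eq = firstPart.indexOf("=");
    if (eq <= 0) continue; // skip malformed entries with no name
    const name = firstPart.slice(0, eq).trim();
    const value = firstPart.slice(eq + 1).trim();
    jar[name] = value;
  }
  return jar;
}

// Build the Cookie request header from the stored values.
function buildCookieHeader(jar: CookieJar): string {
  return Object.entries(jar)
    .map(([name, value]) => `${name}=${value}`)
    .join("; ");
}

const jar = parseSetCookie([
  "access_token=aaa.bbb.ccc; Max-Age=900; HttpOnly; Secure; SameSite=Strict",
  "refresh_token=r3fr3sh; Max-Age=604800; HttpOnly; Secure; SameSite=Strict",
]);
console.log(buildCookieHeader(jar));
// prints "access_token=aaa.bbb.ccc; refresh_token=r3fr3sh"
```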

Same API. Different client plumbing.

Refresh token rotation makes replay visible

A refresh token is more dangerous than an access token because it lives longer.

A stolen 15-minute access token is bad. A stolen 7-day refresh token is worse if the attacker can keep trading it for new access tokens.

That is why SkillBridge used refresh token rotation.

Rotation means every successful refresh consumes the old refresh token and creates a new one. The old token cannot be used again.

TXT
Client calls POST /auth/refresh with the refresh_token cookie
API hashes the presented token
API looks up that hash in refresh_tokens
API rejects the request if the token is missing, expired, or revoked
API marks the old token as revoked
API issues a new access token and a new refresh token
API stores the hash of the new refresh token
API sets both cookies again

The important part is not the endpoint. The important part is the state transition.

A normal refresh moves a token from active to revoked and creates a replacement. If the old token shows up again, something is wrong. It might be a race between two client requests. It might be theft. Either way, the system now has a signal it can respond to.
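The steps above can be sketched against an in-memory map standing in for the refresh_tokens table. The function names are hypothetical, and SHA-256 is my assumption here: the spec only says the token is hashed.

```typescript
import { createHash, randomBytes } from "node:crypto";

interface RefreshTokenRow {
  userId: string;
  tokenHash: string;
  expiresAt: number; // epoch milliseconds
  revoked: boolean;
}

type RotateResult =
  | { ok: true; userId: string; refreshToken: string }
  | { ok: false; reason: "missing" | "expired" | "revoked" };

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// In-memory stand-in for the refresh_tokens table, keyed by token hash.
const refreshTokens = new Map<string, RefreshTokenRow>();

// Assumption: SHA-256 over a high-entropy random token.
function hashToken(raw: string): string {
  return createHash("sha256").update(raw).digest("hex");
}

function issueRefreshToken(userId: string, now: number): string {
  const raw = randomBytes(32).toString("hex");
  const tokenHash = hashToken(raw);
  refreshTokens.set(tokenHash, {
    userId,
    tokenHash,
    expiresAt: now + SEVEN_DAYS_MS,
    revoked: false,
  });
  return raw; // the raw value goes to the client; only the hash is stored
}

function rotateRefreshToken(presented: string, now: number): RotateResult {
  const row = refreshTokens.get(hashToken(presented));
  if (!row) return { ok: false, reason: "missing" };
  if (row.revoked) return { ok: false, reason: "revoked" };
  if (row.expiresAt <= now) return { ok: false, reason: "expired" };

  row.revoked = true; // the state transition: active -> revoked
  const refreshToken = issueRefreshToken(row.userId, now);
  return { ok: true, userId: row.userId, refreshToken };
}
```

Against a real database, the revoke and the insert would normally share one transaction so that two concurrent refreshes cannot both succeed.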

The database stores a hash of the refresh token, not the raw token.

SQL
CREATE TABLE refresh_tokens (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL REFERENCES users(id),
  token_hash TEXT NOT NULL UNIQUE,
  expires_at TIMESTAMPTZ NOT NULL,
  revoked BOOLEAN NOT NULL DEFAULT false,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  revoked_at TIMESTAMPTZ
);

This follows the same principle as password storage. Plaintext refresh tokens in a database dump would be immediately usable. With hashes, the dump alone is not enough: the attacker would still need the original token value.

Password reset uses the same table in a blunt but useful way: revoke every refresh token for that user. If you change your password, every other device has to log in again.
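A sketch of that blunt revoke, using an array in place of the table. `revokeAllForUser` is a hypothetical name, and the SQL in the comment is my guess at the equivalent statement for the schema above.

```typescript
interface RefreshTokenRow {
  id: string;
  userId: string;
  revoked: boolean;
  revokedAt: number | null; // epoch milliseconds
}

// Roughly equivalent to:
//   UPDATE refresh_tokens
//   SET revoked = true, revoked_at = now()
//   WHERE user_id = $1 AND revoked = false;
function revokeAllForUser(rows: RefreshTokenRow[], userId: string, now: number): number {
  let count = 0;
  for (const row of rows) {
    if (row.userId === userId && !row.revoked) {
      row.revoked = true;
      row.revokedAt = now;
      count += 1;
    }
  }
  return count; // number of sessions that were ended
}
```

Access tokens already issued stay valid until they expire, which is one reason to keep the 15-minute lifetime short.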

OAuth should link identities, not create duplicate users

OAuth looks like a login button. In the backend, it is identity mapping.

The awkward case is not a brand new Google user. The awkward case is someone who already registered with email and password, then clicks "Continue with Google" using the same email address.

Rejecting them is technically defensible and bad UX. Creating a second user is worse because the same person now has two SkillBridge accounts with different state.

SkillBridge used auto-linking.

TXT
Case 1: Provider account already exists
  Sign the user in.
 
Case 2: Provider account does not exist, but email exists
  If the provider confirms the email is verified, link the provider identity to the existing user.
  Sign the user in.
 
Case 3: Email does not exist
  Create a new user with password_hash null. Set is_verified true only if the provider confirms the email.
  Link the provider identity.
  Send the user to onboarding.

The schema uses a separate user_oauth_accounts table.

SQL
CREATE TABLE user_oauth_accounts (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL REFERENCES users(id),
  provider TEXT NOT NULL,
  provider_user_id TEXT NOT NULL,
  provider_email TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE (provider, provider_user_id)
);

Do not put google_id, linkedin_id, and a column for every later provider directly on users. That works for one provider, then the table starts collecting provider columns forever.

A separate table keeps the model honest. A user is the person in your system. An OAuth account is one way that person proves identity.

Email signup

The user provides name, email, and password. The password is hashed. The account starts with is_verified = false until the OTP check passes.

OAuth signup

The provider supplies the profile and email. The account starts verified only if the provider confirms that the email is verified.

That last sentence is doing real work. Auto-linking depends on trusting the provider email. For Google, that means checking the provider's email verification claim, not just accepting whatever email appears in the profile payload.

If the provider cannot prove the email is verified, do not auto-link. Make the user verify ownership first.
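The linking cases and the verified-email rule fit in one pure decision function. This is a sketch under assumptions: `decideOAuthAction` and the decision names are hypothetical, and `emailVerified` stands for whatever the provider exposes (for Google, the `email_verified` claim in the ID token).

```typescript
interface ProviderProfile {
  provider: "google" | "linkedin";
  providerUserId: string;
  email: string;
  emailVerified: boolean; // taken from the provider's verification claim
}

type LinkDecision =
  | { action: "sign_in"; userId: string }
  | { action: "link_and_sign_in"; userId: string }
  | { action: "require_email_verification"; userId: string }
  | { action: "create_user"; isVerified: boolean };

function decideOAuthAction(
  profile: ProviderProfile,
  linkedUserId: string | undefined, // match in user_oauth_accounts on (provider, provider_user_id)
  userIdByEmail: string | undefined, // match in users on email
): LinkDecision {
  // Case 1: this provider identity is already linked.
  if (linkedUserId !== undefined) return { action: "sign_in", userId: linkedUserId };

  // Case 2: the email exists. Auto-link only when the provider vouches for it.
  if (userIdByEmail !== undefined) {
    return profile.emailVerified
      ? { action: "link_and_sign_in", userId: userIdByEmail }
      : { action: "require_email_verification", userId: userIdByEmail };
  }

  // Case 3: new user. Verification status follows the provider's claim.
  return { action: "create_user", isVerified: profile.emailVerified };
}
```

Keeping the decision separate from the database calls makes each case easy to unit test.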

OTP verification keeps secrets out of URLs

For email and password registration, SkillBridge used OTP verification instead of verification links.

The OTP has a short TTL, usually 5 to 15 minutes, and the database stores only a hash of it.

SQL
CREATE TABLE verification_otps (
  id UUID PRIMARY KEY,
  email TEXT NOT NULL,
  otp_hash TEXT NOT NULL,
  purpose TEXT NOT NULL,
  expires_at TIMESTAMPTZ NOT NULL,
  consumed_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
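A row in that table might be created and checked roughly like this. The helper names, the six-digit format, the 10-minute TTL, and SHA-256 are all assumptions for illustration. The spec only fixes a short TTL and hashed storage.

```typescript
import { createHash, randomInt, timingSafeEqual } from "node:crypto";

interface OtpRow {
  email: string;
  otpHash: string;
  purpose: string;
  expiresAt: number; // epoch milliseconds
  consumedAt: number | null;
}

const OTP_TTL_MS = 10 * 60 * 1000; // assumed value inside the 5 to 15 minute range

function hashOtp(otp: string): string {
  return createHash("sha256").update(otp).digest("hex");
}

// Returns the plain code (to email to the user) and the row to persist.
function issueOtp(email: string, purpose: string, now: number): { otp: string; row: OtpRow } {
  const otp = randomInt(0, 1000000).toString().padStart(6, "0");
  const row: OtpRow = {
    email,
    otpHash: hashOtp(otp),
    purpose,
    expiresAt: now + OTP_TTL_MS,
    consumedAt: null,
  };
  return { otp, row };
}

function verifyOtp(row: OtpRow, submitted: string, now: number): boolean {
  if (row.consumedAt !== null || row.expiresAt <= now) return false;
  const expected = Buffer.from(row.otpHash, "hex");
  const actual = Buffer.from(hashOtp(submitted), "hex");
  if (!timingSafeEqual(expected, actual)) return false;
  row.consumedAt = now; // single use: a second attempt with the same code fails
  return true;
}
```

A six-digit code has little entropy, so the hash mainly guards against casual reads of the table. The short TTL and attempt limits do most of the protective work.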

The obvious alternative is a verification link:

TXT
https://api.example.com/auth/verify-email?token=abc123

That is convenient, but the token now lives in the URL. URLs end up in browser history, server access logs, analytics tools, crash reports, and sometimes Referer headers. You can reduce that risk, but you have to remember every place a URL might leak.

An OTP submitted in a form avoids that class of leak. The user receives a code, types it into the app, and the secret travels in the request body instead of the URL.

The resend endpoint also needs limits. SkillBridge capped resends at 3 attempts per hour per email. Without that, the endpoint becomes a cheap way to annoy users or burn through email provider quota.
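One way to enforce that cap is a sliding window per email. This sketch keeps timestamps in memory. `allowResend` is a hypothetical name, and a real deployment would more likely count rows in the database or use a shared cache.

```typescript
const MAX_RESENDS = 3;
const WINDOW_MS = 60 * 60 * 1000; // one hour

// email -> timestamps (epoch milliseconds) of recent resends
const resendLog = new Map<string, number[]>();

function allowResend(email: string, now: number): boolean {
  const key = email.trim().toLowerCase();
  // Keep only the attempts that still fall inside the window.
  const recent = (resendLog.get(key) ?? []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= MAX_RESENDS) {
    resendLog.set(key, recent);
    return false;
  }
  recent.push(now);
  resendLog.set(key, recent);
  return true;
}
```

Normalizing the email before using it as a key stops `User@Example.com` and `user@example.com` from getting separate budgets.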

Role selection belongs behind onboarding, not registration

SkillBridge had three account roles:

Role	What the role can reach
`candidate`	Assessment flow, candidate dashboard, verified profile
`employer`	Discovery dashboard and eligible candidate profiles
`admin`	Moderation, submission review, scoring oversight

Admin is not a public signup option. If your public registration form can create admins, your auth design has already lost the plot.

For candidates and employers, the role is selected during onboarding, not during initial registration.

There are two reasons.

First, both signup methods need to converge. An OAuth user may arrive with only a name and email. An email signup user may arrive with more form fields. Onboarding gives both users one place to choose the account shape.

Second, role selection changes authorization. The system should not pretend a user has a complete account until the role exists.

Before onboarding completes, the token payload looks conceptually like this:

JSON
{
  "sub": "user-uuid",
  "email": "user@email.com",
  "onboardingComplete": false,
  "iat": 1234567890,
  "exp": 1234568790
}

After POST /onboarding/role, the backend persists the role and reissues the token:

JSON
{
  "sub": "user-uuid",
  "role": "candidate",
  "email": "user@email.com",
  "onboardingComplete": true,
  "iat": 1234567890,
  "exp": 1234568790
}
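Both payloads can come from one claims builder. In this sketch `buildAccessClaims` is a hypothetical name, and deriving onboardingComplete from the presence of a role is my simplification, not something the spec states. Signing the claims is left to whatever JWT library the backend uses.

```typescript
type Role = "candidate" | "employer" | "admin";

interface AccessClaims {
  sub: string;
  role?: Role; // absent until onboarding completes
  email: string;
  onboardingComplete: boolean;
  iat: number; // seconds since epoch
  exp: number; // seconds since epoch
}

const ACCESS_TTL_SECONDS = 15 * 60;

function buildAccessClaims(
  user: { id: string; email: string; role: Role | null },
  nowSeconds: number,
): AccessClaims {
  const base = {
    sub: user.id,
    email: user.email,
    iat: nowSeconds,
    exp: nowSeconds + ACCESS_TTL_SECONDS,
  };
  // Assumption: onboarding is complete exactly when a role has been persisted.
  return user.role === null
    ? { ...base, onboardingComplete: false }
    : { ...base, role: user.role, onboardingComplete: true };
}
```

Omitting the role key entirely, instead of sending `role: null`, means a guard that checks for a role cannot mistake an incomplete account for a complete one.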

The client can route from the user object returned by the auth response.

TXT
If onboardingComplete is false, send the user to /onboarding/role-select
If role is candidate, send the user to /dashboard
If role is employer, send the user to /discovery
If role is admin, send the user to /admin
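On the client, those rules reduce to a small function over the user object. `routeFor` and `SessionUser` are hypothetical names.

```typescript
type Role = "candidate" | "employer" | "admin";

interface SessionUser {
  email: string;
  role: Role | null;
  onboardingComplete: boolean;
}

function routeFor(user: SessionUser): string {
  if (!user.onboardingComplete) return "/onboarding/role-select";
  switch (user.role) {
    case "candidate":
      return "/dashboard";
    case "employer":
      return "/discovery";
    case "admin":
      return "/admin";
    default:
      // No role despite completed onboarding: fall back to role selection.
      return "/onboarding/role-select";
  }
}
```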

Do not rely on client routing for security. The API still needs guards on protected routes. The client routing is for user experience. The backend authorization layer is where access is actually enforced.
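A backend guard can be sketched as a pure check over the verified token claims. `requireRole` and the error codes are hypothetical, and wiring it into a framework's middleware is left out.

```typescript
type Role = "candidate" | "employer" | "admin";

interface AccessClaims {
  sub: string;
  role?: Role;
  onboardingComplete: boolean;
}

type GuardResult =
  | { allowed: true }
  | { allowed: false; status: 401 | 403; error: string };

// claims is null when the access token is missing or failed verification.
function requireRole(claims: AccessClaims | null, allowed: readonly Role[]): GuardResult {
  if (claims === null) {
    return { allowed: false, status: 401, error: "unauthenticated" };
  }
  if (!claims.onboardingComplete || claims.role === undefined) {
    return { allowed: false, status: 403, error: "onboarding_incomplete" };
  }
  if (!allowed.includes(claims.role)) {
    return { allowed: false, status: 403, error: "forbidden" };
  }
  return { allowed: true };
}
```

A handler for an employer-only route would call `requireRole(claims, ["employer"])` before touching any data, regardless of what the client router did.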

The schema decisions are the architecture

Auth design often gets described as routes: register, login, refresh, logout, forgot password.

The routes matter, but the schema carries the rules.

Table or column	Design job
`users.password_hash` nullable	Allows OAuth-only accounts without fake password values
`users.is_verified`	Blocks meaningful access until email verification succeeds
`user_oauth_accounts`	Lets one user sign in with multiple providers
`refresh_tokens.token_hash`	Avoids storing reusable refresh tokens in plaintext
`refresh_tokens.revoked`	Supports logout, refresh rotation, and password reset invalidation
`verification_otps.consumed_at`	Prevents the same OTP from being reused

If you cannot explain an auth rule from the schema, the rule probably lives in scattered handler logic. That is where bugs breed.

A good schema makes the safe path easy. A poor schema makes every endpoint remember one more special case.

What I would spec differently next time

The original SkillBridge spec was good enough for the team to build from. It still had gaps.

The first gap was refresh token reuse detection. Rotation was specified, but the escalation path was not. If a revoked refresh token appears again, the system should treat that as suspicious and revoke the whole token family. A minimal version needs one extra field on refresh_tokens: either family_id to group related tokens, or replaced_by_token_id to trace the chain.
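Here is a minimal sketch of the family_id variant. It was not part of the original spec. `handleRefresh` is a hypothetical name, and hashing and expiry checks are left out to keep the escalation logic visible.

```typescript
interface FamilyTokenRow {
  tokenHash: string;
  familyId: string; // shared by every token descended from one login
  userId: string;
  revoked: boolean;
}

type RefreshOutcome = "rotated" | "reuse_detected" | "unknown";

function handleRefresh(
  rows: FamilyTokenRow[],
  presentedHash: string,
  newHash: string,
): RefreshOutcome {
  const row = rows.find((r) => r.tokenHash === presentedHash);
  if (!row) return "unknown";

  if (row.revoked) {
    // A revoked token came back: treat it as replay and end the whole family.
    for (const r of rows) {
      if (r.familyId === row.familyId) r.revoked = true;
    }
    return "reuse_detected";
  }

  row.revoked = true;
  rows.push({ tokenHash: newHash, familyId: row.familyId, userId: row.userId, revoked: false });
  return "rotated";
}
```

After a reuse is detected, the newest token in the family is revoked too, so whoever holds it, attacker or user, has to log in again.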

The second gap was audit logging.

Every auth system should emit structured logs for authentication attempts: user id if known, email if submitted, IP address, user agent, timestamp, route, and outcome. Not for console debugging. For incident response. When someone asks "who tried to access this account?" you need better evidence than vibes and nginx logs.
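A sketch of what one such log record could look like. The field list follows the paragraph above. The type name, the `reason` field, and the outcome values are assumptions.

```typescript
interface AuthAuditEvent {
  timestamp: string; // ISO 8601
  route: string;
  outcome: "success" | "failure";
  reason: string | null; // e.g. "invalid_password"; never the password itself
  userId: string | null; // null when the account could not be resolved
  email: string | null; // as submitted, if any
  ip: string;
  userAgent: string;
}

function buildAuditEvent(input: Omit<AuthAuditEvent, "timestamp">, now: Date): AuthAuditEvent {
  return { timestamp: now.toISOString(), ...input };
}

// One JSON object per line is easy to ship to most log pipelines.
function toLogLine(event: AuthAuditEvent): string {
  return JSON.stringify(event);
}

const line = toLogLine(
  buildAuditEvent(
    {
      route: "POST /auth/login",
      outcome: "failure",
      reason: "invalid_password",
      userId: null,
      email: "user@email.com",
      ip: "203.0.113.7",
      userAgent: "ExampleClient/1.0",
    },
    new Date(),
  ),
);
console.log(line);
```

Passwords, OTPs, and token values have no field here on purpose. The audit log should record that an attempt happened, not the secret that was tried.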

The third gap was authorization depth.

SkillBridge started with RBAC, which was fine for the MVP. Candidates, employers, and admins have different surfaces, so roles are a reasonable first boundary.

But some questions are not role questions. For example: can this employer see this specific candidate? That is closer to attribute-based authorization. It depends on candidate status, employer permissions, maybe subscription state, maybe moderation state. If every handler answers that question on its own, the system will drift.
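One way to stop that drift is a single policy function every handler calls. This is a sketch only: the attribute names and the order of the checks are hypothetical, not rules SkillBridge defined.

```typescript
// All attribute names below are illustrative assumptions.
interface EmployerContext {
  id: string;
  canViewCandidates: boolean;
  subscriptionActive: boolean;
}

interface CandidateContext {
  id: string;
  status: "pending" | "verified";
  underModeration: boolean;
}

type Decision = { allowed: true } | { allowed: false; reason: string };

function deny(reason: string): Decision {
  return { allowed: false, reason };
}

function canEmployerViewCandidate(employer: EmployerContext, candidate: CandidateContext): Decision {
  if (!employer.canViewCandidates) return deny("employer_lacks_permission");
  if (!employer.subscriptionActive) return deny("subscription_inactive");
  if (candidate.status !== "verified") return deny("candidate_not_verified");
  if (candidate.underModeration) return deny("candidate_under_moderation");
  return { allowed: true };
}
```

Returning a reason alongside the decision also gives the audit log something more useful to record than a bare 403.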

The durable part is the boring part

None of the useful parts are clever. That is the point.

Durable auth is mostly boring rules applied consistently. The hard part is writing those rules down before the team has ten endpoints, two clients, and three slightly different ideas of what "logged in" means.

If you are speccing something similar, start with the failure modes. Then design the tables and token lifecycle that make those failures harder to turn into incidents.