HarmonyPIR — the hint row

The client's local state after the offline phase is one long row of 2N cells, called the hint row. Half the cells hold real database indices, permuted randomly; the other half are empty (written ). The permutation is private to the client — the server never learns which cell holds which index.

The row is then split into M = 2N/T contiguous segments of T cells each. For every segment, the client stores one small value, the hint parity: the XOR of the database entries at the real indices in that segment. That's it — M hint parities total, each the size of one database row. For a balanced choice T ≈ √(2N) the hint store comes out to √N rows' worth of data.

Hint row — 2N = 16 cells, grouped into M = 4 segments of T = 4 3 6 7 4 5 0 1 2 Seg 0Seg 1Seg 2Seg 3 Hint parities (stored locally by the client) H[0] = DB[3] ⊕ DB[6] from segment 0 H[1] = DB[7] ⊕ DB[4] from segment 1 H[2] = DB[5] ⊕ DB[0] from segment 2 H[3] = DB[1] ⊕ DB[2] from segment 3 For N ≈ 2²⁰ with T ≈ √(2N) ≈ 1448: the client keeps ~1448 hint parities, each one row wide — √N storage total.
Toy example: N=8, T=4. The client's hint row has 16 cells; 8 hold real indices 0..7, permuted; 8 are empty. One hint parity per segment.

One online query, from start to finish

Say the client wants database row q = 6. It looks up where the value 6 lives in its hint row: segment s = 0, position r = 1. Segment 0's contents are [3, 6, ⊥, ⊥] — only two of its four cells hold real indices, the other two are empty. The client builds the request Q by copying segment 0's contents straight through, except it replaces slot r with the value at a uniformly sampled cell outside segment 0 — say cell 10, which holds 0. (The algorithm picks that cell uniformly from the whole hint row outside segment s; there's no constraint on its within-segment position.)

The resulting request Q = [3, 0, ⊥, ⊥] goes to the server as plain database indices, with marking slots the server should skip. The server looks up the non-empty entries and returns R[0] = DB[3] and R[1] = DB[0]; positions 2 and 3 have no response. The client XORs its stored parity against every returned value at a position other than r:

answer = H[0] ⊕ R[0]
       = (DB[3] ⊕ DB[6]) ⊕ DB[3]
       = DB[6]

R[1] is skipped because that was the swap position — its value, DB[0], has nothing to do with q. R[2] and R[3] simply don't exist; if you prefer to write them, treat them as 0, which leaves the XOR unchanged. In a full segment of four real values, the answer line would have four terms instead of one — the same formula, just more non-⊥ slots to XOR.

1. Locate q = 6 → cell 1 (segment 0, position 1) 3 6 7 4 5 0 1 2 2. Build request Q — copy segment 0's contents, swap position r = 1 with cell 10's value Q[0]=3 Q[1]=0 Q[2]=⊥ Q[3]=⊥ sampled from cell 10 ⊥ slots come from the empty cells of segment 0 — server skips them 3. Server looks up each non-⊥ slot and returns those rows R[0] = DB[3], R[1] = DB[0] (R[2], R[3] absent — Q had ⊥ there) 4. XOR H[0] with every returned R[i] at a position other than r = 1 H[0] ⊕ R[0] = (DB[3] ⊕ DB[6]) ⊕ DB[3] = DB[6] ✓
Request Q is a length-T batch built from segment s, with position r swapped to a freshly sampled random cell. The client already holds H[s]; the server's response completes the XOR that isolates DB[q].

Two things are worth pausing on. First, privacy: the server sees a length-T batch of plain database indices (some of them ). Because the hint row is a uniformly random permutation hidden from the server — and because slot r was replaced by the value at an independently sampled random cell — the distribution of that batch is indistinguishable from a uniform sample of T cells from the whole hint row. The batch contains q, but the server can't point to which slot it is, and can't even rule out slots whose values came from segment s's other cells. Second, efficiency: the request has at most T indices, the response at most T rows, and the client's online work is a handful of XORs plus one hint lookup. With T ≈ √(2N) that's √N work online, against a total database of N rows.

An implementation detail. On the wire, HarmonyPIR does not actually transmit the markers shown above. Sending only the non-empty slots would let the server count how many cells in segment s are still occupied — a count that drifts upward as the client consumes hints, leaking both query progress and which slot is the real query each round. The fix replaces every with a freshly sampled random index and XOR-cancels the dummies client-side; the Hiding the empty count page walks through it.

We've also glossed over one bigger thing: after the query runs, the hint row has to change — otherwise a second query into the same segment would leak. The next page handles that, and the crypto choice that makes the whole scheme run fast enough to matter.