HarmonyPIR — the hint row
The client's local state after the offline phase is one long row of
2N cells, called the hint row. Half the
cells hold real database indices, permuted randomly; the other half are
empty (written ⊥). The permutation is private to the client
— the server never learns which cell holds which index.
The row is then split into M = 2N/T contiguous
segments of T cells each. For every
segment, the client stores one small value, the hint parity:
the XOR of the database entries at the real indices in that segment.
That's it — M hint parities total, each the size of one
database row. For a balanced choice T ≈ √(2N) the hint
store comes out to √N rows' worth of data.
One online query, from start to finish
Say the client wants database row q = 6. It looks up where
the value 6 lives in its hint row: segment s = 0,
position r = 1. Segment 0's contents are
[3, 6, ⊥, ⊥] — only two of its four cells hold real
indices, the other two are empty. The client builds the request
Q by copying segment 0's contents straight
through, except it replaces slot r with the
value at a uniformly sampled cell outside segment 0 —
say cell 10, which holds 0. (The algorithm
picks that cell uniformly from the whole hint row outside segment
s; there's no constraint on its within-segment position.)
The resulting request Q = [3, 0, ⊥, ⊥] goes to the server
as plain database indices, with ⊥ marking slots the
server should skip. The server looks up the non-empty entries and
returns R[0] = DB[3] and R[1] = DB[0];
positions 2 and 3 have no response. The
client XORs its stored parity against every returned value at a
position other than r:
answer = H[0] ⊕ R[0]
= (DB[3] ⊕ DB[6]) ⊕ DB[3]
= DB[6] R[1] is skipped because that was the swap position — its
value, DB[0], has nothing to do with q.
R[2] and R[3] simply don't exist; if you
prefer to write them, treat them as 0, which leaves the
XOR unchanged. In a full segment of four real values, the
answer line would have four terms instead of one — the same formula,
just more non-⊥ slots to XOR.
Two things are worth pausing on. First, privacy: the server sees a
length-T batch of plain database indices (some of them
⊥). Because the hint row is a uniformly random
permutation hidden from the server — and because slot r
was replaced by the value at an independently sampled random cell —
the distribution of that batch is indistinguishable from a uniform
sample of T cells from the whole hint row. The batch
contains q, but the server can't point to which slot it
is, and can't even rule out slots whose values came from segment
s's other cells. Second, efficiency: the request has at
most T indices, the response at most T rows,
and the client's online work is a handful of XORs plus one hint
lookup. With T ≈ √(2N) that's √N work
online, against a total database of N rows.
An implementation detail. On the wire, HarmonyPIR
does not actually transmit the ⊥ markers shown above.
Sending only the non-empty slots would let the server count how
many cells in segment s are still occupied — a count
that drifts upward as the client consumes hints, leaking both
query progress and which slot is the real query each round. The
fix replaces every ⊥ with a freshly sampled random
index and XOR-cancels the dummies client-side; the
Hiding the empty count page walks through it.
We've also glossed over one bigger thing: after the query runs, the hint row has to change — otherwise a second query into the same segment would leak. The next page handles that, and the crypto choice that makes the whole scheme run fast enough to matter.