(ICML 2026) PLUREL: Synthetic Data unlocks Scaling laws for Relational Foundation Models.
In this blog post, I only record the methodology for how PluRel generates the schema, the row-level foreign-key connectivity, and the temporal attribute.
Some examples are summarized from the PluRel code repository with the help of Claude Code.
1. Schema sampling
Stage 1 picks a DAG over tables. Each node = one table; each edge A → B means “table B has a FK column pointing to table A.”
Three layouts are sampled at the schema level:
- BarabasiAlbert
- ReverseRandomTree
- WattsStrogatz
After the DAG is built, every node with no outgoing edges becomes an Activity table (large, gets a date column); everything else is an Entity table.
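The Activity/Entity labeling rule can be sketched in a few lines. This sketch assumes the DAG is given as a list of (parent, child) edges, where (A, B) means table B holds an FK into table A. The function name `label_tables` and the toy schema are hypothetical and do not come from the PluRel code.

```python
def label_tables(nodes, edges):
    """Label each table: no outgoing edges (a sink) -> Activity, otherwise Entity.
    An edge (a, b) means table b has an FK column pointing to table a."""
    has_outgoing = {a for a, _ in edges}
    return {n: ("Entity" if n in has_outgoing else "Activity") for n in nodes}


# Hypothetical toy schema: customers/products -> promos -> orders.
nodes = ["customers", "products", "promos", "orders"]
edges = [("customers", "promos"), ("products", "promos"), ("promos", "orders")]
print(label_tables(nodes, edges))
# {'customers': 'Entity', 'products': 'Entity', 'promos': 'Entity', 'orders': 'Activity'}
```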
1.1 BarabasiAlbert — scale-free, hubs, multiple sinks
- Preferential attachment: each new node connects to existing nodes with probability proportional to their current degree.
- Result: power-law degree distribution → a few hub tables, most tables sparse.
- Sinks are thinned (some incoming edges to leaf tables are dropped) so activity tables don’t get implausibly wide fan-in.
- Many edges and multiple sinks; diamond patterns are common.
             users          ← hub
           ╱   │   ╲
      posts    │    follows
      │   ╲    │    ╱     │
      │    ▼   ▼   ▼      │
      │     comments      │
      ▼         │         ▼
    likes       ▼        tags     ← 3 sinks = 3 Activity tables
            reactions
The diamond formed by users → posts → comments and users → comments means two different paths lead from users to comments, so features can fan in through more than one route.
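Below is a plain-Python sketch of the BarabasiAlbert layout. It illustrates the idea and is not PluRel's implementation. It makes three assumptions. First, edges point from older to newer nodes, so early hubs become parent tables. Second, each new table takes `m` parents, which gives roughly 2n edges for m = 2. Third, sink thinning uses a hypothetical `max_sink_in` cap and never removes a parent's last outgoing edge.

```python
import random


def barabasi_albert_dag(n, m=2, max_sink_in=1, seed=0):
    """Sketch of a preferential-attachment schema DAG.
    Edge (a, b) means table b has an FK column pointing to table a.
    Assumptions: edges point from older to newer nodes, and sink thinning
    caps a sink's incoming edges at max_sink_in (hypothetical parameter)."""
    rng = random.Random(seed)
    degree = [0] * n
    edges = set()
    for i in range(1, m + 1):                    # seed graph: a chain of m + 1 nodes
        edges.add((i - 1, i))
        degree[i - 1] += 1
        degree[i] += 1
    for new in range(m + 1, n):
        existing = list(range(new))
        weights = [degree[v] for v in existing]
        targets = set()
        while len(targets) < m:                  # degree-proportional choice of m distinct parents
            targets.add(rng.choices(existing, weights=weights)[0])
        for t in targets:
            edges.add((t, new))                  # the newer table gets an FK into the older one
            degree[t] += 1
            degree[new] += 1
    # Sink thinning: drop some incoming edges of each sink, but never remove
    # a parent's last outgoing edge (that would silently create a new sink).
    out_deg = [0] * n
    for a, _ in edges:
        out_deg[a] += 1
    for sink in [v for v in range(n) if out_deg[v] == 0]:
        incoming = sorted(a for a, b in edges if b == sink)
        rng.shuffle(incoming)
        in_deg = len(incoming)
        for a in incoming:
            if in_deg <= max_sink_in:
                break
            if out_deg[a] > 1:
                edges.discard((a, sink))
                out_deg[a] -= 1
                in_deg -= 1
    return sorted(edges)


edges = barabasi_albert_dag(12)
sinks = sorted(set(range(12)) - {a for a, _ in edges})   # these become Activity tables
print(len(edges), sinks)
```

Orienting edges by node age makes the result acyclic by construction. Choosing parents by degree is what concentrates FK references on a few hub tables.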
1.2 ReverseRandomTree — fan-in to a single sink
- A uniformly random tree with all edges oriented toward a randomly chosen root.
- Exactly n − 1 edges, exactly one sink, no diamonds.
customers   products   employees   coupons
    │           │          │          │
    └────┐  ┌───┘          └────┐  ┌──┘
         ▼  ▼                   ▼  ▼
        promos                  pay
          │                      │
          └─────────┐  ┌─────────┘
                    ▼  ▼
                   orders     ← the single sink
Classic star / snowflake schema: many dimension tables feed exactly one fact table.
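One way to draw a uniformly random tree is to decode a uniformly random Prüfer sequence and then orient every edge toward a random root with a BFS. The Prüfer route is a choice made for this sketch; the PluRel code may sample the tree differently. The helper names are hypothetical. As above, edge (a, b) means table b holds an FK into table a, so the root ends up as the only sink.

```python
import heapq
import random
from collections import deque


def prufer_decode(seq, n):
    """Decode a Prufer sequence of length n - 2 into the n - 1 edges of a labeled tree."""
    degree = [1] * n
    for x in seq:
        degree[x] += 1
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for x in seq:
        leaf = heapq.heappop(leaves)          # the smallest current leaf attaches to x
        edges.append((leaf, x))
        degree[x] -= 1
        if degree[x] == 1:
            heapq.heappush(leaves, x)
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges


def reverse_random_tree(n, seed=0):
    """Uniform random labeled tree with every edge oriented toward a random root.
    Edge (a, b) means table b has an FK column pointing to table a."""
    rng = random.Random(seed)
    tree = prufer_decode([rng.randrange(n) for _ in range(n - 2)], n)
    adj = {v: [] for v in range(n)}
    for u, v in tree:
        adj[u].append(v)
        adj[v].append(u)
    root = rng.randrange(n)
    edges, seen, queue = [], {root}, deque([root])
    while queue:                              # BFS outward from the root
        p = queue.popleft()
        for c in adj[p]:
            if c not in seen:
                seen.add(c)
                edges.append((c, p))          # points toward the root: p holds an FK into c
                queue.append(c)
    return root, edges


root, edges = reverse_random_tree(7, seed=1)
sinks = set(range(7)) - {a for a, _ in edges}
print(root, sinks, len(edges))
```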
1.3 WattsStrogatz — small-world ring + shortcuts
- Start from a ring where each node connects to its k nearest neighbors, then randomly rewire a small fraction of edges.
- Result: mostly local/sequential edges plus a few long-range shortcuts.
- High local clustering, short average path length.
raw_events ──> sessions ──> daily_agg ──> weekly_agg ──> reports
    │             │             ▲                           ▲
    └─────────────┼─ shortcut ──┘                           │
                  │                                         │
                  └─────────────── shortcut ────────────────┘
Feel: staged pipeline, mostly stage-by-stage with occasional cross-stage joins.
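Here is a sketch of the WattsStrogatz layout. It assumes k is even, with k/2 neighbors on each side of the ring. It also assumes the undirected small-world graph becomes a DAG by orienting every edge from the lower to the higher node index. That orientation rule is an illustrative assumption, and under it the ring's wrap-around edge becomes one long-range edge. Rewiring keeps the edge count at nk/2.

```python
import random


def watts_strogatz_dag(n, k=2, p=0.2, seed=0):
    """Ring lattice (each node linked to its k nearest neighbours, k/2 per side),
    with each lattice edge rewired with probability p.
    Assumption: edges are then oriented from lower to higher index to get a DAG;
    edge (a, b) means table b has an FK column pointing to table a."""
    assert k % 2 == 0 and n > k
    rng = random.Random(seed)
    edges = {frozenset((i, (i + j) % n)) for i in range(n) for j in range(1, k // 2 + 1)}
    for i in range(n):
        for j in range(1, k // 2 + 1):
            e = frozenset((i, (i + j) % n))
            if e in edges and rng.random() < p:
                # Keep endpoint i; move the other endpoint to a node not yet linked to i.
                free = [v for v in range(n) if v != i and frozenset((i, v)) not in edges]
                if free:
                    edges.remove(e)
                    edges.add(frozenset((i, rng.choice(free))))
    return sorted(tuple(sorted(e)) for e in edges)


edges = watts_strogatz_dag(10, k=2, p=0.2, seed=3)
print(len(edges), edges)
```

With p = 0 the result is just the ring, so most edges link neighboring stages. Each rewired edge adds one cross-stage shortcut.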
1.4 Comparison
| | BarabasiAlbert | ReverseRandomTree | WattsStrogatz |
|---|---|---|---|
| Edge count | ≈ 2n | n − 1 | ≈ nk/2 |
| # of sinks (Activity tables) | many | exactly 1 | varies |
| Diamond / multi-path joins | yes | no | rare |
| Hub location | anywhere | root only | none |
| Real-world analogue | social graph | star/snowflake | staged pipeline |
2. Connectivity generation
The schema DAG says “table C has a FK into table P” but not which specific parent row each child row links to. Stage 2 fills in those FK values using a hierarchical stochastic block model (HSBM) on the bipartite (parent, child) graph.
2.1 Pipeline
For each (parent, child) relationship in the schema:
- Sample HSBM hyperparameters.
  - num_levels (a small integer, e.g. 1–5).
  - For each level, a cluster count for each side (a small integer, e.g. 1–3).
  - Fresh draw per FK relationship, so block structures are independent.
- Assign a hierarchical cluster label to every row on each side. Rows are split contiguously by row index into base clusters; each row gets an L-tuple of cluster IDs across the L levels.
- Sample a block-probability matrix per level. For each level, a small matrix indexed by (parent cluster, child cluster):
  - off-diagonal entries: very small (e.g. ~0.001).
  - diagonal entries: large (e.g. 0.9). Strong same-cluster bias.
- Score every (parent row a, child row b) pair as a product across levels:
  score(a, b) = Π_l P_l[ cluster_a[a, l], cluster_b[b, l] ]
  With L = 2 levels, there are four levels of “distance”:

  | case | example | score |
  |---|---|---|
  | same fine cluster | 0.9 × 0.9 | 0.81 |
  | same coarse, different fine | 0.9 × 0.001 | ≈ 9e-4 |
  | different coarse, same fine | 0.001 × 0.9 | ≈ 9e-4 |
  | totally different | 0.001 × 0.001 | ≈ 1e-6 |

- Normalize per child row. For each child row b, normalize the scores across all candidate parents so they sum to 1. Children are normalized independently.
- Sample one parent per child from that distribution (inverse-CDF / categorical sampling).
- Write the sampled parent indices into the child’s FK column.
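The pipeline can be sketched with NumPy as follows. The sketch rests on a few assumptions:

- The hyperparameter ranges are the example ranges quoted above.
- The block matrices use fixed values (0.9 on the diagonal, 0.001 elsewhere) instead of sampled ones.
- `sample_hyperparams`, `contiguous_labels`, `block_matrix` and `hsbm_fk` are hypothetical helper names.

Rows are split into near-equal contiguous base clusters. Each row's base-cluster index is then expanded into one cluster ID per level, coarsest level first. The sketch builds the dense parent × child score matrix, which is fine for small tables; section 2.3 describes the cheaper route.

```python
import numpy as np


def sample_hyperparams(rng):
    """Hypothetical ranges from the examples above: 1-5 levels, 1-3 clusters per level per side."""
    num_levels = int(rng.integers(1, 6))
    parent_k = [int(k) for k in rng.integers(1, 4, size=num_levels)]
    child_k = [int(k) for k in rng.integers(1, 4, size=num_levels)]
    return parent_k, child_k


def contiguous_labels(n_rows, clusters_per_level):
    """Split rows contiguously into prod(clusters_per_level) base clusters and
    return an (n_rows, L) array with each row's cluster ID at every level."""
    n_base = int(np.prod(clusters_per_level))
    base = np.arange(n_rows) * n_base // n_rows           # near-equal contiguous blocks
    labels = np.zeros((n_rows, len(clusters_per_level)), dtype=int)
    rem = base.copy()
    for l in reversed(range(len(clusters_per_level))):    # mixed-radix digits, finest level last
        labels[:, l] = rem % clusters_per_level[l]
        rem //= clusters_per_level[l]
    return labels


def block_matrix(k_parent, k_child, p_in=0.9, p_out=0.001):
    """Fixed (not sampled) block matrix: large diagonal, tiny off-diagonal."""
    same = np.arange(k_parent)[:, None] == np.arange(k_child)[None, :]
    return np.where(same, p_in, p_out)


def hsbm_fk(n_parent, n_child, parent_k, child_k, rng):
    la = contiguous_labels(n_parent, parent_k)            # (n_parent, L)
    lb = contiguous_labels(n_child, child_k)              # (n_child, L)
    score = np.ones((n_parent, n_child))
    for l in range(len(parent_k)):                        # product across levels
        P = block_matrix(parent_k[l], child_k[l])
        score *= P[la[:, l][:, None], lb[:, l][None, :]]
    probs = score / score.sum(axis=0, keepdims=True)      # normalize per child (column)
    cdf = np.cumsum(probs, axis=0)
    u = rng.random(n_child)                               # one uniform per child
    fk = (cdf < u[None, :]).sum(axis=0)                   # inverse CDF: first index with cdf >= u
    return np.minimum(fk, n_parent - 1)                   # guard against float round-off


rng = np.random.default_rng(0)
print(sample_hyperparams(rng))
# Setting of the worked example in 2.2: 8 users (parents), 12 posts (children), 2x2 clusters.
fk = hsbm_fk(8, 12, [2, 2], [2, 2], rng)
print(fk)
```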
2.2 Worked example — posts(12) → users(8)
With num_levels = 2 and 2 clusters per level on each side (4 clusters total per side):
users:            posts:
  0,1 → (0,0)       0,1,2   → (0,0)
  2,3 → (0,1)       3,4,5   → (0,1)
  4,5 → (1,0)       6,7,8   → (1,0)
  6,7 → (1,1)       9,10,11 → (1,1)
Block-probability matrix (same shape at both levels):
| | cluster_b=0 | cluster_b=1 |
|---|---|---|
| cluster_a=0 | 0.9 | 0.001 |
| cluster_a=1 | 0.001 | 0.9 |
Multiplying the two levels gives the cluster-level score matrix. Only pairs whose labels match at both levels score 0.81, so the bipartite join is approximately block-diagonal:
                       posts
             (0,0)  (0,1)  (1,0)  (1,1)
       (0,0) ████     ·      ·      ·
users  (0,1)   ·    ████     ·      ·
       (1,0)   ·      ·    ████     ·
       (1,1)   ·      ·      ·    ████
One possible draw (post 8 shows a rare cross-cluster stray):
post 0 → user 0       post 6  → user 4
post 1 → user 1       post 7  → user 5
post 2 → user 0       post 8  → user 1   ← rare cross-cluster stray
post 3 → user 3       post 9  → user 7
post 4 → user 2       post 10 → user 6
post 5 → user 3       post 11 → user 7
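How rare is a stray like post 8? The arithmetic below uses only the numbers of the worked example: the 2 × 2 matrix at both levels, 2 users per cluster, and 12 posts. For a post in cluster (0,0), the probability mass on each user cluster is the cluster size times the cluster-level score. By symmetry, every post has the same stray probability.

```python
import numpy as np

P = np.array([[0.9, 0.001],
              [0.001, 0.9]])                  # same block matrix at both levels
clusters = [(0, 0), (0, 1), (1, 0), (1, 1)]   # (coarse, fine) labels on both sides
users_per_cluster = 2                         # 8 users / 4 clusters

# Cluster-level score: product of the per-level entries (rows = users, cols = posts).
S = np.array([[P[u[0], p[0]] * P[u[1], p[1]] for p in clusters] for u in clusters])

# For a post in cluster (0,0): mass on each user cluster = cluster size x score.
mass = users_per_cluster * S[:, 0]
p_stray = 1 - mass[0] / mass.sum()            # probability one post leaves its block
p_any_stray = 1 - (1 - p_stray) ** 12         # at least one stray among all 12 posts
print(S)
print(p_stray, p_any_stray)
```

Each post strays with probability of roughly 0.2%. A full draw of 12 posts therefore contains any stray only about 2.6% of the time, so the stray on post 8 is there for illustration.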
2.3 Determine the PK-FK pair
Every parent row in the same cluster has the same score for a given child. So sampling directly from the per-row distribution is mathematically equivalent to:
- Sample a parent cluster with probability proportional to (cluster size) × score(parent_cluster, child_cluster).
- Sample a parent row uniformly within that cluster.
So you only need the small K_a × K_b cluster-block score matrix (plus the cluster sizes); you don’t need to compute scores for every row pair.
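The equivalence can be checked numerically for one child. The sketch computes the row-level distribution both ways and also shows the two-stage sampler. It assumes the user clusters and the cluster scores of the 2.2 example, for a child in cluster (0,0). The helper names are hypothetical.

```python
import numpy as np


def per_row_probs(parent_cluster, scores):
    """Direct route: score every parent row for one child, then normalize."""
    s = scores[parent_cluster]
    return s / s.sum()


def two_stage_probs(parent_cluster, scores):
    """Cluster first (size x score), then uniform within the chosen cluster."""
    sizes = np.bincount(parent_cluster, minlength=len(scores))
    cluster_p = sizes * scores / (sizes * scores).sum()
    return cluster_p[parent_cluster] / sizes[parent_cluster]


def sample_parent(rng, parent_cluster, scores):
    sizes = np.bincount(parent_cluster, minlength=len(scores))
    cluster_p = sizes * scores / (sizes * scores).sum()
    c = rng.choice(len(scores), p=cluster_p)                        # stage 1: pick a cluster
    return int(rng.choice(np.flatnonzero(parent_cluster == c)))     # stage 2: uniform row


parent_cluster = np.array([0, 0, 1, 1, 2, 2, 3, 3])   # users' base clusters from 2.2
scores = np.array([0.81, 9e-4, 9e-4, 1e-6])           # child in (0,0) vs each user cluster
print(np.allclose(per_row_probs(parent_cluster, scores), two_stage_probs(parent_cluster, scores)))
print(sample_parent(np.random.default_rng(0), parent_cluster, scores))
```

Stage 1 needs only one score per cluster pair, and stage 2 needs only the member list of the chosen cluster. Memory therefore scales with the number of clusters rather than with parent rows × child rows.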
2.4 Properties
- Block-diagonal joins. Most child rows link to parents in the same fine cluster (~0.9 mass per level), with rare off-block strays from the small off-diagonal probability.
- Hierarchical bleed-through. Off-block penalties multiply across levels. With more levels, totally-different cluster pairs become effectively unreachable, while “same coarse / different fine” pairs stay possible.
- Many-to-one is natural. Children sample independently, so one parent can be picked by many children; some parents may be picked by none. No constraints enforce “every parent used.”
- Multiple FKs are independent. If a child has FKs into two parents, each FK gets its own freshly sampled HSBM. There is no joint coupling across FK columns at this stage.
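The many-to-one property can be quantified for the worked example. Children sample independently, so a parent's expected fan-in is the sum of its selection probabilities over children. Its chance of being unused is the product of (1 − p) over children. The sketch reuses the 2.2 setting and assumes base cluster c maps to labels (c // 2, c % 2).

```python
import numpy as np

P = np.array([[0.9, 0.001], [0.001, 0.9]])
users = np.array([0, 0, 1, 1, 2, 2, 3, 3])               # base cluster per user (2.2)
posts = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3])   # base cluster per post
# Base cluster c -> (coarse, fine) = (c // 2, c % 2); score is the product over levels.
score = (P[users[:, None] // 2, posts[None, :] // 2]
         * P[users[:, None] % 2, posts[None, :] % 2])
probs = score / score.sum(axis=0, keepdims=True)         # column b: distribution over users

expected_fan_in = probs.sum(axis=1)                      # expected number of posts per user
p_unused = np.prod(1 - probs, axis=1)                    # P(no post picks this user)
print(expected_fan_in.round(3), p_unused.sum().round(2))
```

Even with only 8 users and 12 posts, the expected number of users that receive no post comes out to roughly one. Nothing in the sampler forces every parent to be used.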
3. Temporal attribute
3.1 How date is generated
- Only Activity tables (sinks) get a date column. Entity tables have no timestamp.
- The dataset picks a random sub-window [min_ts, max_ts] from a wide configured range. This same window is shared by every Activity table in the database.
- For each Activity table with N rows, the date column is N evenly spaced timestamps between min_ts and max_ts.
- val_timestamp and test_timestamp are placed at the 80% and 90% points of the window and serve as evaluation split markers.
Key consequence: row index encodes time order. Row 0 is the earliest, row N−1 is the latest. Rows are never shuffled.
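Here is a standard-library sketch of the date logic. It assumes the wide configured range is 2000-01-01 to 2025-01-01 (a made-up value) and that the two sub-window endpoints are drawn uniformly inside it. `sample_window`, `activity_dates` and `split_markers` are hypothetical names. Row i simply receives the i-th evenly spaced timestamp, which is why row order and time order coincide.

```python
import random
from datetime import datetime, timedelta


def sample_window(rng, lo, hi):
    """Pick a random sub-window [min_ts, max_ts] inside a wider configured range."""
    span = (hi - lo).total_seconds()
    a, b = sorted(rng.uniform(0, span) for _ in range(2))
    return lo + timedelta(seconds=a), lo + timedelta(seconds=b)


def activity_dates(n_rows, min_ts, max_ts):
    """N evenly spaced timestamps; row 0 gets min_ts and row N-1 gets max_ts."""
    if n_rows == 1:
        return [min_ts]
    return [min_ts + (max_ts - min_ts) * (i / (n_rows - 1)) for i in range(n_rows)]


def split_markers(min_ts, max_ts):
    """val/test split markers at the 80% and 90% points of the window."""
    width = max_ts - min_ts
    return min_ts + width * 0.8, min_ts + width * 0.9


rng = random.Random(0)
min_ts, max_ts = sample_window(rng, datetime(2000, 1, 1), datetime(2025, 1, 1))
dates = activity_dates(5, min_ts, max_ts)     # the same window is reused for every Activity table
val_ts, test_ts = split_markers(min_ts, max_ts)
print(dates[0], dates[-1], val_ts, test_ts)
```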
3.2 Time-series-flavored features
Independently of date, some feature columns can be generated as a function of row_idx with three components:
value(row_idx) = trend(row_idx) + cycle(row_idx) + AR(1)-noise(row_idx)
Activity tables get non-zero trend and cycle scales (so features look genuinely temporal); Entity tables get zero trend/cycle and high noise (so features look like pure noise — appropriate, since entities have no time).
When such a feature is plotted against the date column, it looks like a real time series — but this is purely a side-effect of indexing the generator by row position.
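Here is a NumPy sketch of the three-component generator. The linear trend, the sine cycle, `period`, and the AR(1) coefficient `phi` are all assumptions. The text only says that the three components exist, and that Entity tables get zero trend and cycle with high noise. Note that the function never sees a timestamp, only `row_idx`.

```python
import numpy as np


def temporal_feature(n_rows, trend_scale, cycle_scale, noise_scale,
                     period=30, phi=0.8, seed=0):
    """value(row_idx) = trend + cycle + AR(1) noise, indexed by row position only.
    Assumed shapes: linear trend, sinusoidal cycle, AR(1) coefficient phi."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_rows)
    trend = trend_scale * t / max(n_rows - 1, 1)
    cycle = cycle_scale * np.sin(2 * np.pi * t / period)
    eps = rng.normal(0.0, noise_scale, size=n_rows)
    ar = np.empty(n_rows)
    ar[0] = eps[0]
    for i in range(1, n_rows):
        ar[i] = phi * ar[i - 1] + eps[i]        # AR(1): x_i = phi * x_{i-1} + eps_i
    return trend + cycle + ar


activity_col = temporal_feature(200, trend_scale=5.0, cycle_scale=1.0, noise_scale=0.3)
entity_col = temporal_feature(200, trend_scale=0.0, cycle_scale=0.0, noise_scale=2.0)
```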
3.3 Emergent temporal cohorts
Two facts combine into an interesting consequence:
- HSBM cluster assignment is contiguous by row index.
- The date column is monotonic with row index.
Therefore HSBM “communities” automatically become temporal cohorts:
posts cluster (0,0) → earliest 25% of the time window
posts cluster (0,1) → next 25%
posts cluster (1,0) → next 25%
posts cluster (1,1) → latest 25%
Because user-cluster c connects almost exclusively to post-cluster c, each user effectively gets a lifetime window:
| user cluster | when those users post |
|---|---|
| (0, 0) | early period |
| (0, 1) | early-middle |
| (1, 0) | late-middle |
| (1, 1) | late |
Rare off-diagonal links become “old user comes back” events. With more HSBM levels, cohorts nest into super-cohorts and sub-cohorts.
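The cohort effect can be reproduced from the draw in 2.2. The sketch assumes one post per day as illustrative evenly spaced dates, and keeps the contiguous post clusters used there. Each post cluster covers one quarter of the window. A user's "lifetime" is the span of the posts pointing at them, so user 1, hit by the stray post 8, gets a window that reaches into a later cohort.

```python
import numpy as np

fk = np.array([0, 1, 0, 3, 2, 3, 4, 5, 1, 7, 6, 7])   # post -> user, the draw from 2.2
n_posts, n_users, n_base = 12, 8, 4
dates = np.arange("2024-01-01", "2024-01-13", dtype="datetime64[D]")  # hypothetical: one post per day
post_cluster = np.arange(n_posts) * n_base // n_posts  # contiguous clusters, as in 2.2

# Each contiguous post cluster occupies one quarter of the time window.
cohort_span = {c: (dates[post_cluster == c].min(), dates[post_cluster == c].max())
               for c in range(n_base)}

# A user's lifetime is the span of the posts that reference them.
lifetime = {u: (dates[fk == u].min(), dates[fk == u].max())
            for u in range(n_users) if (fk == u).any()}

for u, (first, last) in lifetime.items():
    print(f"user {u}: {first} .. {last}")
```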
3.4 Disadvantages
- No event bursts, no business-hour or weekday effects. Real activity logs are spiky; here every inter-event gap is identical.
- No referential-time consistency. A child row at time t1 can reference a parent activity row at a later time t2 > t1, because the HSBM ignores timestamps.
- Hard cohort boundaries. Off-diagonal probabilities are tiny, so cohort boundaries are clear. Real users have long tails of activity.
- Independent FK cohorts. Each FK’s HSBM is sampled independently, so on a child table with two FKs the two cohort timings are uncorrelated. No “users from 2015 wrote about topics from 2015” coupling.
- date is post-hoc, not causal. The timestamp column is added after features are generated; it is not a node in the causal DAG. Temporal-looking features are an artifact of generator indexing, not of the timestamp itself.