Rethinking Tabular Modeling: From Feature Embedding to In-Context Learning

Tabular data remains the dominant data format in real-world analytics and database systems. Despite strong empirical performance of modern models, feature embedding continues to be the fundamental bottleneck in tabular learning. This post summarizes recent insights on tabular foundation models and motivates a unified solution inspired by image inpainting.

1. Traditional Tabular Learning Paradigm

In the classical setting like the figure shows, a tabular model learns a fixed mapping:

\[f: \mathbf{x} \rightarrow y\]

where $\mathbf{x}$ is a feature vector and $y$ is the target attribute.
The model compresses knowledge into parameters during training and predicts each instance independently at inference time. Inference and training can be shown like this

Limitations

Ignores relationships among instances (assume IID between instances)
Relies heavily on task-specific feature engineering
*Poor generalization across datasets and tasks

2. Tabular Foundation Models: Learning with Context

In tabular foundation models, the input is extended to include a context set of sampled training instances, as the figure shows

\[\mathbf{x} = (\textbf{x}_{ctx}, \textbf{y}_{ctx}) || (\textbf{x}_{test}, \textbf{y}_{test}) \;\; \text{where} \;(\textbf{x}_{ctx}, \textbf{y}_{ctx})\sim(X_{train}, Y_{train})\]

This enables the model to learn not only the feature-to-target mapping, but also instance-level relationships within the context.

Key advantage: improved generalizability, as predictions adapt dynamically to contextual signals instead of relying solely on fixed embeddings. We can apply the scaling laws on these problem and use this model for different prediction tasks.

3. Analogy to Image Inpainting

This paradigm closely resembles image inpainting in computer vision:

Image Inpainting	Tabular Foundation Modeling
Image Inpainting	Tabular Foundation Modeling
Masked pixels are inferred	Target / missing values are inferred
Context = surrounding pixels	Context = sampled instances
Encoder–decoder architecture	Context-aware encoder

Both problems share a common structure: predicting masked values using contextual information. Image inpainting typically employs encoder–decoder or Transformer-based architectures, suggesting a shared high-level solution for tabular learning.

In fact, image inpainting is a sophisticated research area where Vision Transformers (ViT) achieve state-of-the-art performance. Bridging these two problem formulations could significantly accelerate progress in tabular modeling.

4. Difference between tabular data and image

4.1 Feature Embedding vs. Pixel Embedding

Pixel embedding
- Native RGB representation
- Uniform and task-agnostic
- Shared across all images
Feature embedding
- Depends on heterogeneous data types
- Often task- or dataset-specific
- Limited transferability

Feature embedding is therefore the core challenge in tabular learning.

4.2 Tabular Format vs. Image Format

Property	Tabular Data	Image Data
Ordering	Permutation-invariant	Fixed spatial order
Semantics	Orthogonal (rows & columns)	Local (neighbor pixels)
Structure	Schema-driven	Grid-based

While self-attention is permutation-invariant, tabular data requires explicit modeling of orthogonal semantics across rows and columns.

5. Limitations of Existing Tabular Foundation Models

Most current models (e.g., TabPFN, TabICL) adopt simplifying assumptions:

Only numerical and categorical features are supported
Categorical features are handled via per-column ordinal encoding

This leads to:

Artificial partial-order noise for categorical values
Duplicated embeddings for identical values in different columns
Poor scalability to high-cardinality features

As a result, learned embeddings overfit to specific datasets and tasks.

6. Toward a Unified Feature-to-Embedding Mechanism

Feature embedding—not model architecture—is the fundamental obstacle in tabular learning.

By:

Introducing context-aware learning,
Designing a unified feature-to-embedding mechanism,
Drawing inspiration from image inpainting,

we move closer to general, transferable, and robust tabular foundation models.

Rethinking Tabular Modeling: From Feature Embedding to In-Context Learning

1. Traditional Tabular Learning Paradigm

2. Tabular Foundation Models: Learning with Context

3. Analogy to Image Inpainting

4. Difference between tabular data and image

4.1 Feature Embedding vs. Pixel Embedding

4.2 Tabular Format vs. Image Format

5. Limitations of Existing Tabular Foundation Models

6. Toward a Unified Feature-to-Embedding Mechanism

You may also enjoy:

[IDEA] Value-Based Position Encoding for Numerical Features

TabPFN for downstream unsupervised task [tabpfn-extension]

Tabular Foundation Model: Synthesis Data Generation