Written by James Bowden, with thanks to Jenn and Sergey, Cat Glossop, Aileen Zhang, and Shreshth Srivastava for feedback.
Choose your own adventure (pick the one best suited to your expertise):
In the era of AI-driven science and engineering, we often want to design proteins, materials, circuits, etc. in silico according to user-specified properties (e.g., that a protein binds its target; above: top left). Given a property predictive model, \(f(x)\), in silico design typically involves training a generative model over the design space to concentrate on designs with the desired properties (above: bottom left). This is in general challenging due to the combinatorial nature of the design space. However, many property predictors in scientific applications are decomposable—for example, the amino acids contacting a binding target may need to only loosely interact with the rest of the protein (above: top right). In cases where this decomposability is sufficiently simple (e.g., fully separate components), it can be straightforward to account for manually, but in the general case of arbitrarily complex decompositions, as represented by an interaction graph on design variables, a more systematic approach is needed. We present our method, DADO, which leverages decomposability to identify promising designs more quickly (above: bottom right), and discuss how one might obtain an appropriate decomposition in practice.
Design in discrete spaces, such as searching for the sequence of a protein that will bind a target (above: top left), is made difficult by the combinatorially large space that must be searched (above: bottom left). We argue that scientific design problems often exhibit some level of decomposability (above: top right) and explain how harnessing this can massively shrink the effective design space, allowing one to find desirable designs more efficiently. To take advantage of this smaller, decomposed design space, we present DADO, a method that infuses the standard distributional optimization algorithm with awareness of the decomposition such that it operates in the decomposed design space (above: bottom right), and uses message-passing to coordinate optimization across the search distribution factors. DADO operates on any decomposition, as represented by an interaction graph over design variables.
RL typically assumes access to a reward function which decomposes in a linear additive manner across timesteps in an MDP—allowing for efficient policy optimization via dynamic programming. In the typical scientific design setting, our task is to train a distribution over an object, such as a protein sequence, whose variables interact in complex, nonlinear ways to influence the reward function. As such, rather than maximizing a time-decomposed reward (i.e., a chain interaction graph), we generalize policy optimization to reward functions with arbitrary interaction graphs. After converting an input interaction graph into a junction tree, our method, DADO, factorizes its policy (or "search distribution") according to the junction tree and performs weighted regression with value functions estimated over the junction tree. Compared to a naive policy optimization algorithm (analogous to using a trajectory-level reward function; above: left), DADO's update is more statistically efficient and in practice, finds optimal designs more quickly (above: right).
This blog post will focus on protein design for concreteness. Optional details are collapsed (like math or domain-specific particulars); without these, it should be a ~5 min read.
[optional]How can we formalize protein sequence design as an optimization problem? Though our method can be used for any optimization problem over a discrete design space, as a concrete example, let's consider only the problem of designing a protein sequence. We can set this up as follows:
The kind of discrete object that we'd like to design, \(x\), is a protein sequence of length \(L\) in which each design variable (or "position"), \(x_i\), is an amino acid from an alphabet, \(A_i\). For simplicity, we'll assume that every position uses the same standard amino acid alphabet of size 20, \(A\).
Given \(L\) and \(A\), we can write the design space as \(X := A^L\), i.e., all amino acid sequences of length \(L\). It follows that there are \(\lvert A\rvert^L=20^L\) possible proteins to choose from.
Specify a property function to design toward, \(f(x)\), such as binding affinity to a target, or gene editing efficiency. In practice, this may be a predictive model fit on limited assay-labeled data.
Putting this all together, our design problem is to find a sequence that maximizes our specification: \(x^{\ast}=\arg\max_{x\in X} f(x)\).
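To make the combinatorics concrete, here is a toy brute-force version of this argmax (the alphabet and property function below are hypothetical stand-ins; real problems have \(|A| = 20\) and much larger \(L\), which is exactly why enumeration fails):

```python
import itertools

# Toy brute-force design: enumerate all of X = A^L and take the argmax.
# The alphabet and property function are hypothetical stand-ins; with the
# real |A| = 20 and L in the tens to hundreds, |A|^L is far too large.
A = "ACDE"   # tiny 4-letter "amino acid" alphabet for illustration
L = 3

def f(x):
    # Stand-in property function: reward matching a secret ideal sequence.
    return sum(a == b for a, b in zip(x, "ADE"))

X = ["".join(s) for s in itertools.product(A, repeat=L)]
assert len(X) == len(A) ** L          # |A|^L = 4^3 = 64 designs
x_star = max(X, key=f)                # x* = argmax_x f(x)
print(x_star)                         # "ADE"
```

Even at \(|A| = 20\) and \(L = 15\), this loop would need \(20^{15} \approx 3 \times 10^{19}\) evaluations, which is what motivates the distributional approach below.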
[optional]How are discrete optimization problems solved with standard distributional optimization? Distributional optimization is a way of solving design problems that can be written as \(x^{\ast}=\arg\max_{x\in X} f(x)\); estimation of distribution algorithms (EDAs) and policy optimization in reinforcement learning are two common instantiations. Compared to naively evaluating one protein, then the next, until all of \(X\) has been considered, distributional optimization algorithms navigate the design space using a probability distribution, \(p_\theta(x)\), often referred to as a "search distribution" or a "policy". Intuitively, the search distribution is like a spotlight that moves through the design space toward regions where \(f(x)\) is larger, by maximizing \(\mathbb{E}_{p_\theta(x)}[f(x)]\) (schematic: bottom left). In modern times, \(p_\theta(x)\) is typically parameterized as a highly expressive neural network generative model, like an autoregressive model or diffusion model, allowing for pretty arbitrarily shaped spotlights.

Initializing \(p_\theta(x)\): \(p_\theta(x)\) might also be initialized as some pre-trained model, in which case we would in effect be implementing a kind of RL fine-tuning (with \(f\) as the reward signal). Alternatively, one might initialize \(p_\theta(x)\) to be a uniform distribution on a certain set of designs, e.g., those tested in an initial experiment, or initialize it completely at random.

Standard EDA pseudocode
Sample from \(p_\theta(x)\) up to your experimental budget and test in the lab!
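For reference, here is a minimal, self-contained sketch of one standard EDA variant (a cross-entropy-method-style loop with an independent categorical distribution per position; the property function, target, and hyperparameters are toy stand-ins, not from the paper):

```python
import random

# Sketch of a standard EDA: sample from the search distribution, score with
# f, keep the top quantile ("elites"), and refit the distribution to them.
# Here p_theta(x) is an independent categorical per position; the property
# function f and its optimum TARGET are hypothetical stand-ins.
A = "ACDE"
L = 5
TARGET = "ADECA"

def f(x):
    return sum(a == b for a, b in zip(x, TARGET))

random.seed(0)
probs = [{a: 1.0 / len(A) for a in A} for _ in range(L)]  # uniform init

for _ in range(30):
    # 1) Sample a batch of designs from p_theta(x).
    batch = ["".join(random.choices(list(p), weights=list(p.values()))[0]
                     for p in probs)
             for _ in range(200)]
    # 2) Score with f and keep the top 10% as elites.
    elites = sorted(batch, key=f, reverse=True)[:20]
    # 3) Weighted maximum-likelihood refit of each positional categorical.
    for i in range(L):
        counts = {a: 1e-3 for a in A}     # small smoothing
        for x in elites:
            counts[x[i]] += 1
        total = sum(counts.values())
        probs[i] = {a: c / total for a, c in counts.items()}

mode = "".join(max(p, key=p.get) for p in probs)
print(mode, f(mode))   # the search distribution concentrates near TARGET
```

Swapping the per-position categorical for an autoregressive or diffusion model, and the top-quantile selection for an exponentiated-\(f\) weighting, recovers more general EDA variants of the kind discussed above.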
There's much more discussion of EDAs, their derivation, relevant hyperparameters, and the important ways they can be extended in our paper.
Decomposing the design space
A crux of many difficult design problems is that there are a huge number of possible designs—far too many to consider them all. However, we often have information that can help us decompose the design space and thereby search a much smaller area. For example, protein sequence design workflows assume, roughly, that the binding interface of a protein and the scaffold can be designed separately (sometimes called a scaffolding problem; schematic: top right). We call this a decomposition, and represent it with a graph in which nodes are design variables (amino acid positions) and edges denote coupling with respect to the property function. So, our example decomposition would be represented by a graph in which all interface amino acids are connected by edges, and all scaffold amino acids are connected by edges, but there are no edges between the interface and scaffold amino acids. We don't expect such clean-cut decomposability in most problems. Our core contribution is a design algorithm that can leverage any decomposability in the property function.
A crux of many difficult design problems is that there are a huge number of possible designs—far too many to consider them all. Imagine if, however, a (particularly simple) decomposition held—yielding two completely separate, smaller design spaces representing respectively the interface amino acids and the scaffold amino acids (schematic: top right). This decomposition corresponds to a linear additive form of the property function to be maximized, \(f(x_i, x_s) = f_i(x_i) + f_s(x_s)\), from which we can see that this is actually two separate optimization problems: \[ [x_i^{\ast}, x_s^{\ast}] = \arg\max_{x_i,x_s} f(x_i, x_s) = [\arg\max_{x_i} f_i(x_i), \arg\max_{x_s} f_s(x_s)] \] If \(x_i\) has \(6\) design variables and \(x_s\) has \(9\) (as in the schematic), this yields a massive reduction in the size of the search space from \(20^{15}\) to \(20^6 + 20^9\). We don't expect such clean-cut decomposability in most problems. To allow more complex forms of decomposability, we represent a decomposition as a graph in which nodes are design variables and edges denote coupling with respect to \(f(x)\); the simple example corresponds to two disconnected components. Our core contribution is a search algorithm that can leverage arbitrarily complex decomposition graphs.
Unlike in typical RL settings, for scientific design we can omit both time and the state-action distinction. There's no stochastic environment and the reward function is defined only on design variables, which you can think of as the actions. In RL, the decomposition graph would have as its nodes state-action pairs for each timestep, and edges only between subsequent timesteps, resulting in a chain. Ignoring this chain decomposition and performing policy optimization would amount to searching the space of all possible trajectories, which grows exponentially with the number of nodes in the graph (timesteps). By analogy, our decomposition graph has instead design variables for nodes (amino acid positions in the case of protein sequence design), and these variables can be coupled with respect to the reward function by arbitrarily complex graph topologies (i.e., not chains, or even trees). Our core contribution is a policy optimization algorithm that can leverage any decomposition graph topology.
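The fully separable interface/scaffold case can be checked numerically. The sketch below (toy alphabet and toy block functions \(f_i\) and \(f_s\), all hypothetical) confirms that blockwise optimization recovers the joint optimum while evaluating \(|A|^2 + |A|^3\) designs instead of \(|A|^5\):

```python
import itertools

# Additive decomposition f(x_i, x_s) = f_i(x_i) + f_s(x_s): optimizing each
# block separately recovers the joint argmax. Alphabet, block lengths, and
# the block functions are toy stand-ins for interface/scaffold design.
A = "ACDE"

def f_i(x):  # "interface" block, length 2
    return sum(a == b for a, b in zip(x, "AD"))

def f_s(x):  # "scaffold" block, length 3
    return sum(a == b for a, b in zip(x, "CEE"))

def seqs(n):
    return ["".join(s) for s in itertools.product(A, repeat=n)]

# Joint search: |A|^(2+3) = 1024 evaluations.
joint = max(((xi, xs) for xi in seqs(2) for xs in seqs(3)),
            key=lambda p: f_i(p[0]) + f_s(p[1]))

# Decomposed search: |A|^2 + |A|^3 = 80 evaluations.
decomposed = (max(seqs(2), key=f_i), max(seqs(3), key=f_s))

assert joint == decomposed
print(len(A) ** 5, len(A) ** 2 + len(A) ** 3)   # 1024 80
```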
What do realistic decompositions look like? Let’s look at decomposition graphs derived from two proteins, AAV and CreiLOV.
In the first figure, we show one way to obtain a decomposition graph for a protein design problem—by computing a contact graph. For two proteins, AAV VP1 (which co-assembles into a virus capsid) and CreiLOV (an oxygen-independent fluorophore), we first obtain a 3D structure from AlphaFold3 (left). To extract a decomposition graph from the 3D structure, we compute distances between all pairs of designable positions and create an edge if they’re within 4.5Å of each other (right).
a, AAV
b, CreiLOV
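The thresholding step is simple to sketch in code. Below, the coordinates are made up (a single representative point per position, rather than the all-atom distances a real pipeline would compute against an AlphaFold3 structure):

```python
import itertools
import math

# Contact-graph sketch: connect two designable positions with an edge when
# they fall within 4.5 Angstroms. Coordinates are hypothetical; a real
# pipeline would take minimum heavy-atom distances from a predicted
# structure rather than one point per position.
CONTACT_THRESHOLD = 4.5  # Angstroms

coords = {
    0: (0.0, 0.0, 0.0),
    1: (3.8, 0.0, 0.0),
    2: (7.6, 0.0, 0.0),
    3: (3.8, 3.8, 0.0),
}

edges = {
    (i, j)
    for i, j in itertools.combinations(sorted(coords), 2)
    if math.dist(coords[i], coords[j]) <= CONTACT_THRESHOLD
}
print(sorted(edges))   # [(0, 1), (1, 2), (1, 3)]
```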
Notice that the decomposition graph for AAV (right) has few edges and is relatively chain-like. This suggests that we will be able to realize a large efficiency gain by operating in its decomposed design space. On the other hand, CreiLOV looks a lot more like a fully-connected graph. In this case, we can’t expect to improve over a naive optimization method which considers all variables jointly. One could always choose a more sparsely-connected decomposition (e.g., by lowering the contact distance threshold) such that it yields a larger potential efficiency gain. This hints at a key tradeoff in practice: the more decomposed the problem, the more efficiently it can be optimized, but if the chosen decomposition is too aggressive, it might preclude performant designs from being found.
Briefly, we can easily convert any undirected graph into a directed junction tree, which has no cycles, allowing DADO to draw inspiration from exact message-passing (next section). The corresponding junction tree for each protein is shown below (right), with additional visualizations to highlight its relation to the original decomposition graph (left, middle).
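For intuition, one standard recipe for building a junction tree (not necessarily the exact pipeline used in the paper) is: triangulate the decomposition graph, list its maximal cliques, and connect the cliques with a maximum-weight spanning tree, where an edge's weight is the size of the two cliques' intersection. A stdlib-only sketch, with hand-picked cliques for a small hypothetical chordal graph:

```python
import itertools

# Junction tree via maximum-weight spanning tree over clique intersections
# (Kruskal's algorithm with a union-find). The cliques below are hand-picked
# maximal cliques of a small hypothetical chordal decomposition graph.
cliques = [frozenset(c) for c in ({0, 1, 2}, {1, 2, 3}, {3, 4}, {4, 5})]

parent = list(range(len(cliques)))  # union-find forest
def find(i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path compression
        i = parent[i]
    return i

# Candidate edges between cliques, heaviest intersection first.
candidates = sorted(
    itertools.combinations(range(len(cliques)), 2),
    key=lambda e: -len(cliques[e[0]] & cliques[e[1]]),
)

tree_edges = []
for a, b in candidates:
    ra, rb = find(a), find(b)
    if ra != rb:            # add the edge unless it would close a cycle
        parent[ra] = rb
        tree_edges.append((a, b))

print(tree_edges)   # [(0, 1), (1, 2), (2, 3)]: a chain over the cliques
```

Rooting the resulting tree at any node then gives a directed junction tree of the kind described above.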
To infuse distributional optimization with knowledge of a decomposition graph, our method has two core components:
First, we perform search with a generative model, \(p_\theta(x)\), factorized according to the decomposition junction tree. Each factor distribution searches a subset of design variables corresponding to a node in the tree. This factorization makes it so that DADO only "sees" the smaller decomposed space¹; whereas the standard EDA searches all dimensions of \(x\) together.
First, we perform search with a generative model, \(p_\theta(x)\), factorized according to the decomposition junction tree. Each factor distribution searches a subset of design variables corresponding to a node in the tree. This factorization makes it so that DADO only "sees" the smaller decomposed space¹; whereas the standard EDA searches all dimensions of \(x\) together.
First, we perform search with a policy, \(p_\theta(x)\), factorized not in time but according to the decomposition junction tree. Each policy factor \(p_\theta(x_i \mid x_p)\) searches a subset of design variables corresponding to node \(i\) in the tree, conditioned on its parent node \(p\)'s variables.
Second, we coordinate the generative model's factor distributions via message-passing. In essence, the messages (also called "value functions" and denoted \(Q_i(\tilde{x}_i, \tilde{x}_p)\)) describe how changing a single factor distribution affects its descendants in the tree, so that each factor can be updated in a manner that considers the design variables it has no direct control over. For more details, see the machine learning explanation (or the paper). The upshot is that this leads to a more statistically efficient optimization algorithm that navigates the space of designs more effectively. Below, we show a schematic comparison of a standard EDA to DADO (panel b) for a particular tree decomposition (panel a); on synthetic problems DADO consistently finds higher-\(f(x)\) designs (panel c).
Second, we coordinate the factor distributions via graph message-passing so that they can each be trained separately. The messages are called value functions and denoted \(Q_i(\tilde{x}_i, \tilde{x}_p)\); they are passed from the leaves of the junction tree to the root, communicating to each parent node the status of its children. Each \(Q_i(\tilde{x}_i, \tilde{x}_p)\) describes the partial value of \(f\) on the subtree rooted at node \(i\), in expectation over its descendants' search distributions. Each node aggregates all of its children’s value functions into its own and then uses it to shift its search distribution optimally with respect to its children. We can use the value functions to perform a weighted maximum likelihood update to each factor distribution separately, which makes DADO more statistically efficient than the standard EDA: each lower-dimensional distribution is updated using the full sample budget. These decentralized updates are possible because the value functions provide explicit coordination across all design variables (most importantly, those outside a given node's scope). The conditioning of each search distribution factor and value function on its parent closes the loop: each node responds to whichever partial designs are sampled from its parent’s search distribution. As a consequence, all coordination flows through the root node, which indirectly aggregates value functions from all other nodes in the junction tree and upon whose samples all other nodes are indirectly conditioned. Below, we show a schematic comparison of a standard EDA to DADO (panel b) for a particular tree decomposition (panel a); on synthetic problems DADO consistently finds higher-\(f(x)\) designs (panel c).
Second, we coordinate the policy factors with value functions, \(Q_i(\tilde{x}_i, \tilde{x}_p)\), which describe the partial value of \(f\) on the subtree rooted at node \(i\), in expectation over its descendants' policies. Each node aggregates all of its children’s value functions into its own and then uses it to shift its policy optimally with respect to its children. Then we perform an RWR/AWR-style weighted maximum likelihood update to each policy factor separately. This makes DADO more statistically efficient than the naive policy optimization algorithm (EDA), as each lower-dimensional distribution is updated using the full sample budget. Below, we show a schematic comparison of a standard EDA to DADO (panel b) for a particular tree decomposition (panel a); on synthetic problems DADO consistently finds higher-\(f(x)\) designs (panel c).
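Schematically, the per-factor update is a reward-weighted maximum-likelihood fit. Here is a hedged sketch for a single categorical factor with softmax weights (the samples, values, and temperature are made up, and parent-conditioning and message-passing are omitted; this is not the paper's exact estimator):

```python
import math

# RWR/AWR-style weighted MLE update for one categorical factor. The sampled
# values of x_i and their estimated values Q_i are hypothetical; a real run
# would estimate Q_i via the message-passing described above.
alphabet = "ACDE"
samples = ["A", "A", "C", "D", "E", "A", "C"]     # draws of x_i
values = [2.0, 1.5, 0.5, 0.1, 0.2, 1.8, 0.4]     # Q_i for each draw
beta = 1.0                                        # inverse temperature

# Exponentiate and normalize the values (softmax weights over samples).
raw = [math.exp(beta * q) for q in values]
w = [r / sum(raw) for r in raw]

# Weighted maximum likelihood for a categorical: weighted letter frequencies.
probs = {a: sum(wi for xi, wi in zip(samples, w) if xi == a)
         for a in alphabet}
print(probs)   # mass concentrates on the high-value letter "A"
```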
DADO’s optimization efficiency gain holds up for messier, real-world design problems too. Recall the two protein design problems we introduced earlier, AAV and CreiLOV, and their contrasting decomposition graphs (sparse vs. dense). Below², we observe that DADO finds much better designs than decomposition-unaware methods for AAV design, whereas for CreiLOV, knowledge of the decomposition doesn’t help, as one would expect.
a, AAV
b, CreiLOV
What now?
Finding an accurate decomposition for a design problem is not always straightforward. The real world is often structured, though, and even very approximate decompositions can be useful. One might try to infer decomposability from labeled data or auxiliary information, or devise some other creative scheme. This is an exciting research direction, both for proteins and for scientific design in general.
For concreteness, here are some examples of scientific design problems with modularity that might lend itself to decomposition.
In natural systems, useful physical models often approximate interactions as primarily local, giving rise to sparse decompositions. In designing a crystal to minimize formation energy or maximize stability, for example, one might model atoms in a lattice as only directly interacting with their nearest neighbors. Or, consider the problem of designing a layered material for thermal or acoustic insulation. When choosing the material of each layer, we might assume that it mainly interacts with its direct neighbors—heat or sound has to pass through one layer to reach the next. As a result, one could optimize a thick stack of layers as decomposing according to a chain graph instead of a fully-connected one.
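As a concrete payoff of such a chain decomposition: when the objective decomposes over neighboring layers as \(f(x) = \sum_k g(x_k, x_{k+1})\), the optimal stack can be found exactly by dynamic programming in \(O(L \cdot |M|^2)\) pairwise evaluations rather than \(|M|^L\). A sketch with hypothetical materials and pairwise scores:

```python
import itertools

# Viterbi-style DP for a chain-decomposed objective f(x) = sum_k g(x_k, x_{k+1}).
# The materials and the pairwise insulation scores g are hypothetical.
materials = ["foam", "steel", "glass"]
L = 6

def g(a, b):
    table = {("foam", "steel"): 3, ("steel", "foam"): 3,
             ("foam", "glass"): 2, ("glass", "foam"): 2}
    return table.get((a, b), 1)

def f(x):
    return sum(g(a, b) for a, b in zip(x, x[1:]))

# Forward pass: best[m] = best score of a partial stack ending in material m.
best = {m: 0 for m in materials}
back = []
for _ in range(L - 1):
    step, choice = {}, {}
    for b in materials:
        a_best = max(materials, key=lambda a: best[a] + g(a, b))
        step[b] = best[a_best] + g(a_best, b)
        choice[b] = a_best
    back.append(choice)
    best = step

# Backtrack the optimal stack of layers.
stack = [max(best, key=best.get)]
for choice in reversed(back):
    stack.append(choice[stack[-1]])
stack.reverse()

# Sanity check against brute force on this small instance.
assert f(stack) == max(f(x) for x in itertools.product(materials, repeat=L))
print(stack, f(stack))
```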
Modularity abounds in man-made systems. In the case of circuit design, usually not all components are directly connected by wires—so if trying to choose what type of component to put in each slot (assuming a pre-fixed topology) in order to minimize capacitance, one might model the system as sparsely connected according to the wires. That is, components interact only indirectly when there is no direct wire between them. Similarly, in the optical design of, e.g., a research telescope, one might minimize wavefront error with respect to the physical parameters of a set of lenses/mirrors laid out in a pre-specified topology. For a given scene, interactions take place primarily via light passed between neighboring components, which in turn propagate a light field to the next components. One can imagine similar relationships between components of a mechanical system (e.g., a robot's limbs and actuators, with respect to torque costs or material stress).
There's no reason why DADO can't be used for optimization in continuous design spaces; we simply didn't investigate it in our paper. Everything should extend straightforwardly.
We hope you’ll read (and enjoy) our paper! If you’d like, you can return to the top and re-read from a different perspective :) Feel free to email me with any questions, comments or feedback. I’d also be excited to discuss applying our method to your problem, or potential collaboration.
Footnotes
¹ In our paper, we include a baseline—a modernized version of the factorized distribution algorithm, or FDA—that only uses a factorization of the search distribution without the message-passing coordination. In FDA, the factorized search distribution is updated the same way as the standard EDA, with a per-sample weight, \(f(x)\), instead of a per-node weight. It's interesting that for a few problems, FDA performs as well as or better than DADO, despite its search distribution update being less statistically efficient. We suspect this is due to an additional source of variance—the sample-based approximation of DADO's value functions—which can outweigh the benefit of a per-node update. We only observed this when the junction tree nodes were relatively large, which is exactly when estimating a value function from finite samples is most difficult. It would be interesting to more carefully characterize this behavior, and one might adapt variance-reduction techniques from RL (like learned value functions) here. ↩
² The y-axes between the two plots are not comparable; \(f(x)\) is protein-specific. For AAV, \(f\) represents viral viability (whether the capsid packages), and for CreiLOV, \(f\) represents fluorescence intensity. The upshot being that DADO can find better protein sequences than decomposition-unaware methods for AAV packaging, but cannot for CreiLOV fluorescence intensity. One should not conclude that the methods perform better on CreiLOV than on AAV or vice versa, as \(f\) is not comparable between them. ↩