Causal Representation Learning for Latent Space Optimization

Wenlin Chen

August 2021

Abstract

In this thesis, we study causal representation learning for latent space optimization, which allows for robust and efficient generation of novel synthetic data with maximal target value. We assume that the observed data were generated by a small number of latent factors, some of which are causally related to the target, while the others are only spuriously correlated with the target and are confounded by an environment variable. Our proposed method consists of three steps that exploit the structure of the causal graph describing the assumed underlying data generating process. In the first step, we recover the true data representation (i.e., the latent factors from which the observed data originated). We develop novel identifiability theory, showing that the true data representation can be recovered up to simple transformations by a generalized version of identifiable variational auto-encoders. In the second step, we identify the causal latent factors of the target, for which we propose a practical causal inference scheme that employs (conditional) independence tests and causal discovery algorithms. Our method does not require access to the true environment variable, which overcomes a major limitation of existing causal representation learning approaches in the literature. In the final step, we query latent points that correspond to data points with high target values by intervening upon the causal latent factors using standard latent space optimization techniques. We empirically evaluate and thoroughly analyze our method on three different tasks, including a chemical design task. We show that our method can successfully recover the true data representation in the finite data regime and correctly identify the causal latent factors of the target, which results in state-of-the-art performance for black-box optimization.
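To make the final step concrete, the following is a minimal sketch of optimizing a target while perturbing only the latent coordinates flagged as causal. It is an illustration under stated assumptions, not the thesis's implementation: the function `optimize_causal_latents`, the toy objective, and the mask are all hypothetical, and plain random search stands in for whichever latent space optimization technique is actually used.

```python
import numpy as np


def optimize_causal_latents(objective, z_init, causal_mask,
                            n_iters=200, step=0.5, seed=0):
    """Random-search hill climbing over the causal latent coordinates only.

    Hypothetical helper: `objective` maps a latent vector to a scalar target
    value (in practice this would decode the latent point and evaluate the
    black-box target). Non-causal coordinates are never modified.
    """
    rng = np.random.default_rng(seed)
    causal_mask = np.asarray(causal_mask, dtype=bool)
    z_best = np.array(z_init, dtype=float)
    f_best = objective(z_best)
    for _ in range(n_iters):
        proposal = z_best.copy()
        # Intervene on the causal coordinates; leave the others fixed.
        proposal[causal_mask] += step * rng.standard_normal(causal_mask.sum())
        f_new = objective(proposal)
        if f_new > f_best:  # keep the proposal only if the target improves
            z_best, f_best = proposal, f_new
    return z_best, f_best


# Toy objective (an assumption for illustration): the target depends only on
# the first two latent dimensions and is maximized at (1, 1).
def toy_objective(z):
    return -np.sum((z[:2] - 1.0) ** 2)


z0 = np.zeros(4)
mask = [True, True, False, False]  # assumed output of the causal identification step
z_star, f_star = optimize_causal_latents(toy_objective, z0, mask)
```

Restricting the search to the masked coordinates means the spuriously correlated latent factors stay at their initial values, so any gain in the objective comes from the factors assumed to be causal.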

Type

Thesis

Publication

MPhil Thesis, University of Cambridge

Causality, AI for Science

Wenlin Chen

PhD Student in Machine Learning