While deep learning has achieved remarkable performance in modelling complex patterns in structured data, a key challenge is its reliance on large datasets. In contrast, probabilistic inference excels in data-scarce settings but is computationally inefficient for high-dimensional data and struggles to model structured data where representation learning is crucial. This thesis focuses on the synergies between deep learning and probabilistic inference, bridging these gaps from two complementary perspectives and yielding novel machine learning methods with improved data efficiency, identifiability, and sampling scalability.

In the first part of this thesis, we investigate how probabilistic inference can enhance deep learning. We first introduce a data-efficient meta-learning framework that combines Gaussian processes and deep neural networks to improve representation learning across related low-data tasks. By formulating this problem as a novel bilevel optimisation problem and solving it with the implicit function theorem, this approach enhances the generalisation capabilities of deep neural networks for few-shot molecular property prediction and optimisation tasks. Next, we analyse the theoretical properties of neural network representations learned across multiple tasks within a probabilistic framework, establishing conditions under which neural networks recover canonical feature representations that reflect the underlying ground-truth data-generating process. Our framework not only ensures linear identifiability in the general multi-task regression setting, but also offers a simple probabilistic inference approach to recovering point-wise identifiable feature representations under certain assumptions on task structure, yielding stronger theoretical guarantees and better empirical identifiability than previous methods on real-world molecular data.

In the second part of this thesis, we explore the reverse direction of this reciprocal relationship: utilising deep learning to improve probabilistic inference. Inspired by diffusion-based modelling techniques, we propose a novel approach for training deep generative models to emulate sampling-based probabilistic inference for unnormalised probability distributions. This enables efficient sampling from multi-modal distributions, such as the Boltzmann distributions of many-body particle systems. Our approach outperforms previous neural samplers while achieving faster training and inference.

Together, these contributions demonstrate how deep learning and probabilistic inference can be integrated in a mutually reinforcing manner.