Isha Puri, Shivchander Sudalairaj, et al.
NeurIPS 2025
Molecular property prediction has greatly benefited from learned embeddings such as SMILES-based, SELFIES-based, and graph-derived representations. However, existing approaches often rely on a single modality or naïvely concatenate multiple modalities, which limits robustness and fails under missing-modality conditions. In this work, we propose a novel self-supervised fusion framework, dynamic fusion, that dynamically integrates multiple molecular embeddings. The proposed framework employs intra-modal gating for feature selection, inter-modal attention for adaptive weighting, and cross-modal reconstruction to ensure information exchange. Through progressive modality masking during training, the dynamic fusion approach learns to generate fused embeddings that are resilient to missing modalities. We conduct preliminary evaluations of the proposed approach on MoleculeNet benchmarks and demonstrate superior performance in reconstruction, modality alignment, and downstream property prediction compared to unimodal baselines. Our findings highlight the importance of feature-level gating, entropy-regularized attention, and cross-modal reconstruction in achieving robust fusion.
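The abstract only names the building blocks (intra-modal gating, inter-modal attention, cross-modal reconstruction, modality masking), so the following is a minimal PyTorch sketch of how such a module could be wired, not the authors' implementation. The layer shapes, the sigmoid gate parameterization, the scalar per-modality attention scores, the masking scheme, and the 0.01 entropy weight are all illustrative assumptions.

```python
# Hypothetical sketch of a dynamic-fusion module; every design choice below is
# an assumption made for illustration, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicFusion(nn.Module):
    def __init__(self, modality_dims: dict[str, int], fused_dim: int = 256):
        super().__init__()
        self.names = list(modality_dims)
        # Project each modality (e.g. SMILES / SELFIES / graph embedding) to a shared width.
        self.proj = nn.ModuleDict({m: nn.Linear(d, fused_dim) for m, d in modality_dims.items()})
        # Intra-modal gating: per-feature sigmoid gate for feature selection.
        self.gate = nn.ModuleDict({m: nn.Linear(fused_dim, fused_dim) for m in self.names})
        # Inter-modal attention: one scalar score per modality, softmax-normalized.
        self.score = nn.ModuleDict({m: nn.Linear(fused_dim, 1) for m in self.names})
        # Cross-modal reconstruction: decode every modality back from the fused vector.
        self.decoder = nn.ModuleDict({m: nn.Linear(fused_dim, d) for m, d in modality_dims.items()})

    def forward(self, inputs: dict[str, torch.Tensor], mask_prob: float = 0.0):
        gated, scores = [], []
        for m in self.names:
            h = self.proj[m](inputs[m])
            h = torch.sigmoid(self.gate[m](h)) * h                      # intra-modal gating
            keep = 1.0 if (not self.training or torch.rand(()) > mask_prob) else 0.0
            gated.append(h * keep)                                      # modality masking
            scores.append(self.score[m](h) + (0.0 if keep else -1e9))   # masked modalities get no attention
        h = torch.stack(gated, dim=1)                                   # (B, M, D)
        attn = torch.softmax(torch.cat(scores, dim=-1), dim=-1)         # (B, M) inter-modal weights
        fused = (attn.unsqueeze(-1) * h).sum(dim=1)                     # adaptively weighted fusion
        # Entropy regularizer discourages attention from collapsing onto one modality.
        entropy = -(attn * attn.clamp_min(1e-9).log()).sum(-1).mean()
        # Cross-modal reconstruction loss: recover every input modality from the fused embedding.
        recon_loss = sum(F.mse_loss(self.decoder[m](fused), inputs[m]) for m in self.names)
        return fused, recon_loss - 0.01 * entropy
```

A usage sketch under the same assumptions: "progressive" masking would correspond to raising `mask_prob` over the course of training so the model gradually learns to cope with absent modalities.

```python
dims = {"smiles": 768, "selfies": 768, "graph": 300}   # hypothetical embedding sizes
model = DynamicFusion(dims)
batch = {m: torch.randn(8, d) for m, d in dims.items()}
model.train()
fused, aux_loss = model(batch, mask_prob=0.3)           # increase mask_prob over epochs
```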
Minghao Guo, Bohan Wang, et al.
NeurIPS 2024
Djallel Bouneffouf, Matthew Riemer, et al.
NeurIPS 2025
Jannis Born, Filip Skogh, et al.
NeurIPS 2025