Benefits from Variational Regularization in Language Models

C. Ferner*, S. Wegenkittl

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Representations from common pre-trained language models have been shown to suffer from the degeneration problem, i.e., they occupy a narrow cone in latent space. This problem can be addressed by enforcing isotropy in latent space. In analogy with variational autoencoders, we suggest applying a token-level variational loss to a Transformer architecture and optimizing the standard deviation of the prior distribution in the loss function as a model parameter to increase isotropy. The resulting latent space is complete and interpretable: any given point is a valid embedding and can be decoded into text again. This allows for text manipulations such as paraphrase generation directly in latent space. Surprisingly, features extracted at the sentence level also show competitive results on benchmark classification tasks.
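
The abstract describes a token-level KL penalty whose prior standard deviation is a trainable model parameter. The following is a minimal sketch of that idea, assuming PyTorch; the class name TokenLevelVariationalHead, the linear projections, and the averaging scheme are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn


class TokenLevelVariationalHead(nn.Module):
    """Maps Transformer hidden states to per-token Gaussian posteriors and
    computes a KL penalty against a zero-mean prior whose standard deviation
    is itself a trainable parameter (illustrative sketch)."""

    def __init__(self, d_model: int, d_latent: int):
        super().__init__()
        self.to_mu = nn.Linear(d_model, d_latent)
        self.to_logvar = nn.Linear(d_model, d_latent)
        # Learnable log standard deviation of the prior N(0, sigma_p^2 I).
        self.prior_log_sigma = nn.Parameter(torch.zeros(1))

    def forward(self, hidden_states: torch.Tensor):
        # hidden_states: (batch, seq_len, d_model)
        mu = self.to_mu(hidden_states)
        logvar = self.to_logvar(hidden_states)

        # Reparameterization trick: z = mu + sigma_q * eps, sampled per token.
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)

        # Closed-form KL( N(mu, sigma_q^2) || N(0, sigma_p^2) ),
        # averaged over batch, tokens, and latent dimensions.
        var_p = torch.exp(2.0 * self.prior_log_sigma)
        kl = (self.prior_log_sigma - 0.5 * logvar
              + (logvar.exp() + mu.pow(2)) / (2.0 * var_p)
              - 0.5)
        return z, kl.mean()
```

In training, this KL term would typically be added to the decoder's cross-entropy reconstruction loss, optionally with a weighting factor; because the prior standard deviation enters the loss as a parameter, it is optimized jointly with the rest of the model, which is the mechanism the abstract credits with increasing isotropy.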
Original language: English
Pages (from-to): 542-555
Number of pages: 14
Journal: Mach. Learn. Knowl. Extr.
Volume: 4
Issue number: 2
DOIs
Publication status: Published - 9 Jun 2022

Keywords

  • generalizability
  • isotropy
  • language models
  • regularization
  • semantic reasoning
