Abstract
Representations from common pre-trained language models have been shown to suffer from the degeneration problem, i.e., they occupy a narrow cone in latent space. This problem can be addressed by enforcing isotropy in latent space. In analogy with variational autoencoders, we suggest applying a token-level variational loss to a Transformer architecture and optimizing the standard deviation of the prior distribution in the loss function as the model parameter to increase isotropy. The resulting latent space is complete and interpretable: any given point is a valid embedding and can be decoded into text again. This allows for text manipulations such as paraphrase generation directly in latent space. Surprisingly, features extracted at the sentence level also show competitive results on benchmark classification tasks.
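The abstract describes a token-level variational loss with a trainable prior standard deviation. The snippet below is a minimal sketch of how such a regularizer could look in PyTorch; it is not the authors' implementation, and the class name `TokenLevelVariationalBottleneck`, the `beta` weight, and all tensor shapes are illustrative assumptions based only on the abstract.

```python
# Hedged sketch: token-level variational regularization of Transformer states.
# NOT the authors' code; names, shapes, and the beta weight are assumptions.

import torch
import torch.nn as nn


class TokenLevelVariationalBottleneck(nn.Module):
    """Maps each token representation to a Gaussian posterior, samples from it,
    and regularizes it towards an isotropic prior N(0, sigma_prior^2 * I)."""

    def __init__(self, hidden_dim: int, init_prior_std: float = 1.0):
        super().__init__()
        self.to_mu = nn.Linear(hidden_dim, hidden_dim)
        self.to_logvar = nn.Linear(hidden_dim, hidden_dim)
        # Prior standard deviation as a trainable model parameter (stored as
        # log-std for positivity), following the abstract's description.
        self.log_prior_std = nn.Parameter(torch.tensor(float(init_prior_std)).log())

    def forward(self, h: torch.Tensor):
        # h: (batch, seq_len, hidden_dim) token representations from the encoder.
        mu = self.to_mu(h)
        logvar = self.to_logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)  # reparameterization trick

        # Closed-form KL( N(mu, sigma^2) || N(0, sigma_p^2) ) per dimension:
        # 0.5 * ( (sigma^2 + mu^2)/sigma_p^2 - 1 + log(sigma_p^2) - log(sigma^2) )
        prior_var = torch.exp(2.0 * self.log_prior_std)
        kl = 0.5 * ((logvar.exp() + mu.pow(2)) / prior_var
                    - 1.0 + torch.log(prior_var) - logvar)
        kl = kl.sum(dim=-1).mean()  # sum over dimensions, mean over tokens/batch
        return z, kl


# Minimal usage with random stand-in hidden states and an assumed beta weight,
# as in beta-VAE-style objectives; the task/reconstruction loss is omitted.
if __name__ == "__main__":
    bottleneck = TokenLevelVariationalBottleneck(hidden_dim=128)
    hidden_states = torch.randn(2, 16, 128)  # (batch, seq_len, hidden_dim)
    z, kl_loss = bottleneck(hidden_states)
    beta = 0.1                                # illustrative weight only
    total_loss = beta * kl_loss               # + task loss in a real setup
    print(z.shape, kl_loss.item())
```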
| Original language | English |
|---|---|
| Pages (from-to) | 542-555 |
| Number of pages | 14 |
| Journal | Mach. Learn. Knowl. Extr. |
| Volume | 4 |
| Issue number | 2 |
| DOIs | 10.3390/make4020025 |
| Publication status | Published - 9 Jun 2022 |
Keywords
- generalizability
- isotropy
- language models
- regularization
- semantic reasoning
Cite this
Benefits from Variational Regularization in Language Models. / Ferner, C.; Wegenkittl, S. In: Mach. Learn. Knowl. Extr., Vol. 4, No. 2, 09.06.2022, p. 542-555.
Research output: Contribution to journal › Article › peer-review
TY - JOUR
T1 - Benefits from Variational Regularization in Language Models
AU - Ferner, C.
AU - Wegenkittl, S.
N1 - Cited By: 3. Correspondence Address: Ferner, C.; Information Technology and Systems Management, Urstein Sued 1, Austria; email: [email protected]. Funding details: 20102-F1901166-KZP, WISS 2025. Funding text: This project is partially funded by the Science and Innovation Strategy Salzburg (WISS 2025) project “IDA-Lab Salzburg”, Grant Number 20102-F1901166-KZP.
PY - 2022/6/9
Y1 - 2022/6/9
AB - Representations from common pre-trained language models have been shown to suffer from the degeneration problem, i.e., they occupy a narrow cone in latent space. This problem can be addressed by enforcing isotropy in latent space. In analogy with variational autoencoders, we suggest applying a token-level variational loss to a Transformer architecture and optimizing the standard deviation of the prior distribution in the loss function as the model parameter to increase isotropy. The resulting latent space is complete and interpretable: any given point is a valid embedding and can be decoded into text again. This allows for text manipulations such as paraphrase generation directly in latent space. Surprisingly, features extracted at the sentence level also show competitive results on benchmark classification tasks. © 2022 by the authors.
KW - generalizability
KW - isotropy
KW - language models
KW - regularization
KW - semantic reasoning
U2 - 10.3390/make4020025
DO - 10.3390/make4020025
M3 - Article
SN - 2504-4990
VL - 4
SP - 542
EP - 555
JO - Mach. Learn. Knowl. Extr.
JF - Mach. Learn. Knowl. Extr.
IS - 2
ER -