TY - JOUR
T1 - When BERT Started Traveling: TourBERT—A Natural Language Processing Model for the Travel Industry
AU - Arefeva, V.
AU - Egger, R.
N1 - Correspondence Address: Egger, R.; Department of Innovation and Management in Tourism, Austria; email: [email protected]
Funding text 1: This project was carried out without funding.
PY - 2022/11/11
Y1 - 2022/11/11
AB - In recent years, Natural Language Processing (NLP) has become increasingly important for extracting new insights from unstructured text data, and pre-trained language models now achieve state-of-the-art performance on tasks such as topic modeling, text classification, and sentiment analysis. BERT is currently the most widely used of these models, and it has been shown that BERT can be optimized for domain-specific contexts. While a number of BERT models that improve downstream-task performance in other domains already exist, no BERT model optimized for tourism had yet been published. This study therefore aimed to develop and evaluate TourBERT, a BERT model pre-trained for the tourism industry. It was trained from scratch and outperforms BERT-Base in all tourism-specific evaluations. This study thus makes an essential contribution to the growing role of NLP in tourism by providing an open-source BERT model adapted to the requirements and particularities of the domain. © 2022 by the authors.
KW - BERT
KW - natural language model
KW - TourBERT
KW - tourism
U2 - 10.3390/digital2040030
DO - 10.3390/digital2040030
M3 - Article
SN - 2673-6470
VL - 2
SP - 546
EP - 559
JO - Digital
JF - Digital
IS - 4
ER -