A Semi-discriminative Approach for Sub-sentence Level Topic Classification on a Small Dataset

C. Ferner*, S. Wegenkittl

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper aims to identify sequences of words related to specific product components in online product reviews. A Max Entropy classifier, which assumes that subsequent topics are independent, provides a reliable baseline for this topic classification problem. However, the reviews exhibit an inherent structure at the document level, which allows the task to be framed as a sequence classification problem. Since more flexible models from the class of Conditional Random Fields were not competitive because of the limited amount of training data available, we propose using a Hidden Markov Model instead and decoupling the training of transition and emission probabilities. The discriminating power of the Max Entropy approach is used for the latter. Besides outperforming both standalone methods as well as more generic models such as linear-chain Conditional Random Fields, the combined classifier is able to assign topics at the sub-sentence level, although labels in the training data are only available at the sentence level. © Springer Nature Switzerland AG 2020.
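The decoupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes transitions are estimated by smoothed counting over label sequences, that some discriminative classifier (for example a Max Entropy model) supplies per-position posteriors p(y | x_t), and that these are turned into emission scores by dividing by the label prior, a common hybrid-model convention that is an assumption here. The function names `estimate_transitions` and `viterbi`, the smoothing constant `alpha`, and the toy labels are all hypothetical.

```python
import math
from collections import Counter


def estimate_transitions(label_sequences, labels, alpha=1.0):
    """Estimate start and transition log-probabilities by add-alpha counting.

    This part uses only the label sequences, independently of any
    emission model (the "decoupled" training step, as assumed here).
    """
    start, trans = Counter(), Counter()
    for seq in label_sequences:
        if not seq:
            continue
        start[seq[0]] += 1
        for prev, cur in zip(seq, seq[1:]):
            trans[(prev, cur)] += 1
    k = len(labels)
    n_start = sum(start.values())
    log_start = {
        y: math.log((start[y] + alpha) / (n_start + alpha * k)) for y in labels
    }
    log_trans = {}
    for prev in labels:
        row_total = sum(trans[(prev, cur)] for cur in labels)
        for cur in labels:
            prob = (trans[(prev, cur)] + alpha) / (row_total + alpha * k)
            log_trans[(prev, cur)] = math.log(prob)
    return log_start, log_trans


def viterbi(posteriors, priors, log_start, log_trans, labels):
    """Most likely label path given classifier posteriors per position.

    posteriors: list of dicts mapping label -> p(label | x_t), all > 0.
    priors:     dict mapping label -> p(label), all > 0.
    Emission score (assumption): log p(y | x_t) - log p(y).
    """
    if not posteriors:
        return []

    def emit(t, y):
        return math.log(posteriors[t][y]) - math.log(priors[y])

    score = [{y: log_start[y] + emit(0, y) for y in labels}]
    back = []
    for t in range(1, len(posteriors)):
        row, ptr = {}, {}
        for y in labels:
            # Best predecessor for label y at position t.
            best = max(labels, key=lambda p: score[-1][p] + log_trans[(p, y)])
            row[y] = score[-1][best] + log_trans[(best, y)] + emit(t, y)
            ptr[y] = best
        score.append(row)
        back.append(ptr)
    # Backtrack from the best final label.
    path = [max(labels, key=lambda y: score[-1][y])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    labels = ["battery", "screen"]  # hypothetical product components
    train = [
        ["battery", "battery", "screen", "screen"],
        ["battery", "battery", "battery", "screen"],
    ]
    log_start, log_trans = estimate_transitions(train, labels)
    priors = {"battery": 0.5, "screen": 0.5}
    posteriors = [
        {"battery": 0.9, "screen": 0.1},
        {"battery": 0.45, "screen": 0.55},
        {"battery": 0.8, "screen": 0.2},
        {"battery": 0.1, "screen": 0.9},
    ]
    print(viterbi(posteriors, priors, log_start, log_trans, labels))
```

In this toy setting the classifier alone would flip to "screen" at the second position. The transition model can override such an isolated, weakly supported switch, which is the kind of document-level structure the abstract refers to.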
Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases
Subtitle of host publication: European Conference, ECML PKDD 2019, Würzburg, Germany, September 16–20, 2019, Proceedings, Part II
Publisher: Springer
Number of pages: 14
Volume: 11907 LNAI
ISBN (Electronic): 978-3-030-46147-8
ISBN (Print): 978-3-030-46146-1
Publication status: Published - Apr 2020
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2019 - Würzburg, Germany
Duration: 16 Sept 2019 – 20 Sept 2019
https://ecmlpkdd2019.org/

Conference

Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2019
Abbreviated title: ECML PKDD 2019
Country/Territory: Germany
City: Würzburg
Period: 16/09/19 – 20/09/19

Keywords

  • Hidden Markov Model
  • Small data
  • Topic classification
  • Entropy
  • Hidden Markov models
  • Information retrieval systems
  • Machine learning
  • Base-line performance
  • Combined classifiers
  • Conditional random field
  • Discriminating power
  • Discriminative approach
  • Emission probabilities
  • Online product reviews
  • Sequence classification
  • Classification (of information)
