Nearly Minimax Optimal Regret for Multinomial Logistic Bandit

Lee, Joongkyu; Oh, Min-hwan

arXiv:2405.09831 (stat)

[Submitted on 16 May 2024 (v1), last revised 22 May 2024 (this version, v2)]

Abstract: In this paper, we study the contextual multinomial logit (MNL) bandit problem, in which a learning agent sequentially selects an assortment based on contextual information and user feedback follows an MNL choice model. There has been a significant gap between the known lower and upper regret bounds, particularly in their dependence on the feature dimension $d$ and the maximum assortment size $K$. Moreover, existing lower and upper bounds have been derived under different reward structures, which complicates the quest for optimality. Under uniform rewards, where all items have the same expected reward, we establish a regret lower bound of $\Omega(d\sqrt{T/K})$ and propose a constant-time algorithm, OFU-MNL+, that achieves a matching upper bound of $\tilde{O}(d\sqrt{T/K})$. Under non-uniform rewards, we prove a lower bound of $\Omega(d\sqrt{T})$ and an upper bound of $\tilde{O}(d\sqrt{T})$, the latter also achieved by OFU-MNL+. Our empirical studies support these theoretical findings. To the best of our knowledge, this is the first work in the contextual MNL bandit literature to prove minimax optimality (for either the uniform or the non-uniform reward setting) and to propose a computationally efficient algorithm that achieves this optimality up to logarithmic factors.
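To make the feedback model concrete, the following is a minimal sketch of the standard MNL choice model, assuming linear utilities $x_i^\top \theta$ for each offered item and an outside (no-purchase) option with utility 0. The function names (`mnl_choice_probs`, `expected_reward`, `sample_choice`) are hypothetical, the parametrization is an assumption rather than the paper's exact formulation, and OFU-MNL+ itself is not shown. Under uniform rewards (all rewards equal to 1), the expected reward of an assortment reduces to one minus the no-purchase probability.

```python
import numpy as np

def mnl_choice_probs(X, theta):
    """Choice probabilities under an assumed MNL model.

    X: (K, d) features of the offered items; theta: (d,) parameter.
    Returns a (K+1,) array; index 0 is the outside option (utility 0).
    """
    utilities = X @ theta
    # Shift by the largest utility (including the outside option's 0)
    # so that every exponent is <= 0, avoiding overflow.
    m = max(0.0, float(utilities.max()))
    w = np.exp(utilities - m)
    w0 = np.exp(-m)
    return np.concatenate(([w0], w)) / (w0 + w.sum())

def expected_reward(X, theta, rewards):
    """Expected reward of the assortment: sum_i P(choose i) * r_i."""
    p = mnl_choice_probs(X, theta)
    return float(p[1:] @ rewards)

def sample_choice(X, theta, rng):
    """Draw one user choice; 0 means no purchase, i >= 1 is item i."""
    p = mnl_choice_probs(X, theta)
    return int(rng.choice(len(p), p=p))
```

For example, with `theta = 0` and an assortment of $K = 4$ items, every option (including no purchase) has probability $1/5$, so the uniform-reward expected reward is $4/5$.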

Comments:	Preprint. Under review
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2405.09831 [stat.ML]
	(or arXiv:2405.09831v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2405.09831

Submission history

From: Joongkyu Lee
[v1] Thu, 16 May 2024 06:07:31 UTC (148 KB)
[v2] Wed, 22 May 2024 15:43:53 UTC (156 KB)
