Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithms - Université de Lille
Article Dans Une Revue Transactions on Machine Learning Research Journal Année : 2024

Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithms

Résumé

We study the Bandits with Stochastic Corruption problem, i.e. a stochastic multi-armed bandit problem with $k$ unknown reward distributions, which are heavy-tailed and corrupted by a history-independent stochastic adversary or Nature. To be specific, the reward obtained by playing an arm comes from corresponding heavy-tailed reward distribution with probability $1-\varepsilon \in (0.5,1]$ and an arbitrary corruption distribution of unbounded support with probability $\varepsilon \in [0,0.5)$. First, we provide \textit{a problem-dependent lower bound on the regret} of any corrupted bandit algorithm. The lower bounds indicate that the Bandits with Stochastic Corruption problem is harder than the classical stochastic bandit problem with sub-Gaussian or heavy-tail rewards. Following that, we propose a novel UCB-type algorithm for Bandits with Stochastic Corruption, namely \texttt{HubUCB}, that builds on Huber's estimator for robust mean estimation. Leveraging a novel concentration inequality of Huber's estimator, we prove that \texttt{HubUCB} achieves a near-optimal regret upper bound. Since computing Huber's estimator has quadratic complexity, we further introduce a sequential version of Huber's estimator that exhibits linear complexity. We leverage this sequential estimator to design \texttt{SeqHubUCB} that enjoys similar regret guarantees while reducing the computational burden. Finally, we experimentally illustrate the efficiency of \texttt{HubUCB} and \texttt{SeqHubUCB} in solving Bandits with Stochastic Corruption for different reward distributions and different levels of corruptions.
Fichier principal
Vignette du fichier
530_bandits_corrupted_by_nature_lo.pdf (1.15 Mo) Télécharger le fichier
Origine Fichiers produits par l'(les) auteur(s)

Dates et versions

hal-04615733 , version 1 (18-06-2024)

Licence

Identifiants

  • HAL Id : hal-04615733 , version 1

Citer

Timothée Mathieu, Debabrota Basu, Odalric-Ambrym Maillard. Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithms. Transactions on Machine Learning Research Journal, 2024. ⟨hal-04615733⟩
97 Consultations
38 Téléchargements

Partager

More