Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithms

Timothée Mathieu; Debabrota Basu; Odalric-Ambrym Maillard

Journal article, Transactions on Machine Learning Research Journal, year: 2024

Timothée Mathieu

Role: Author
PersonId: 1130096
IdHAL: timothee-mathieu

Scool

Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189

Debabrota Basu

Role: Author
PersonId: 742129
IdHAL: debabrota-basu

Scool

Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189

Université de Lille

Centrale Lille

Odalric-Ambrym Maillard

Role: Author
PersonId: 5563
IdHAL: odalric-ambrym-maillard
ORCID: 0000-0001-7935-7026
IdRef: 158055594

Scool

Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189

Université de Lille

Abstract

We study the Bandits with Stochastic Corruption problem, i.e., a stochastic multi-armed bandit problem with $k$ unknown reward distributions that are heavy-tailed and corrupted by a history-independent stochastic adversary, or Nature. Specifically, the reward obtained by playing an arm is drawn from the corresponding heavy-tailed reward distribution with probability $1-\varepsilon \in (0.5,1]$ and from an arbitrary corruption distribution with unbounded support with probability $\varepsilon \in [0,0.5)$. First, we provide \textit{a problem-dependent lower bound on the regret} of any corrupted bandit algorithm. The lower bounds indicate that the Bandits with Stochastic Corruption problem is harder than the classical stochastic bandit problem with sub-Gaussian or heavy-tailed rewards. Next, we propose \texttt{HubUCB}, a novel UCB-type algorithm for Bandits with Stochastic Corruption that builds on Huber's estimator for robust mean estimation. Leveraging a novel concentration inequality for Huber's estimator, we prove that \texttt{HubUCB} achieves a near-optimal regret upper bound. Since computing Huber's estimator has quadratic complexity, we further introduce a sequential version of Huber's estimator with linear complexity. We leverage this sequential estimator to design \texttt{SeqHubUCB}, which enjoys similar regret guarantees while reducing the computational burden. Finally, we experimentally illustrate the efficiency of \texttt{HubUCB} and \texttt{SeqHubUCB} in solving Bandits with Stochastic Corruption for different reward distributions and different levels of corruption.
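The abstract describes using Huber's estimator as the mean estimate inside an optimistic (UCB-type) index. The following is a minimal Python sketch of that general idea, not the paper's HubUCB or SeqHubUCB. It assumes a fixed Huber threshold `beta`, solves the Huber location equation by iteratively reweighted averaging, and uses a placeholder exploration bonus $\sqrt{2 \log t / n}$ instead of the confidence width derived in the paper. The helper names `huber_mean`, `corrupted_gaussian` and `huber_ucb_sketch`, as well as the corruption model (Gaussian outliers centred at 100), are hypothetical illustrations.

```python
import math
import random


def huber_mean(xs, beta, n_iter=100, tol=1e-10):
    """Huber M-estimator of location with threshold beta.

    Solves sum_i psi(x_i - mu) = 0, where psi(u) = u if |u| <= beta and
    beta * sign(u) otherwise, by iteratively reweighted averaging (IRLS).
    """
    mu = sorted(xs)[len(xs) // 2]  # robust starting point: a sample median
    for _ in range(n_iter):
        # Huber weights psi(r)/r: 1 inside the threshold, beta/|r| outside
        weights = [1.0 if abs(x - mu) <= beta else beta / abs(x - mu) for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def corrupted_gaussian(mean, eps, outlier=100.0):
    """Hypothetical arm: N(mean, 1) w.p. 1 - eps, else a far-away corruption."""
    def draw(rng):
        if rng.random() < eps:
            return rng.gauss(outlier, 1.0)  # Nature's corruption (illustrative choice)
        return rng.gauss(mean, 1.0)
    return draw


def huber_ucb_sketch(arms, horizon, beta, seed=0):
    """Play the arm maximising Huber mean + sqrt(2 log t / n).

    The bonus is a hypothetical sub-Gaussian-style placeholder, not the
    paper's confidence width. Huber's estimate is recomputed from scratch
    over all past rewards each round, so total cost grows quadratically.
    """
    rng = random.Random(seed)
    rewards = [[] for _ in arms]
    for t in range(1, horizon + 1):
        if t <= len(arms):
            a = t - 1  # initialization: pull every arm once
        else:
            a = max(
                range(len(arms)),
                key=lambda i: huber_mean(rewards[i], beta)
                + math.sqrt(2 * math.log(t) / len(rewards[i])),
            )
        rewards[a].append(arms[a](rng))
    return [len(r) for r in rewards]  # number of pulls per arm


if __name__ == "__main__":
    arms = [corrupted_gaussian(0.0, eps=0.1), corrupted_gaussian(2.0, eps=0.1)]
    print(huber_ucb_sketch(arms, horizon=200, beta=1.5))
```

On the toy data `[0, 0, 0, 0, 1000]` with `beta = 1`, `huber_mean` returns 0.25 while the empirical mean is 200, which is the robustness the index relies on. Recomputing the estimator over the full history at every round is what makes this naive loop quadratic; the sequential estimator behind SeqHubUCB is not reproduced here.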

Keywords

Bandits, Robust statistics, Unbounded corruption, Huber's estimator, Lower bounds, Regret bounds, Heavy tail distributions

Domains

Machine Learning [stat.ML], Learning [cs.LG], Statistics [math.ST], Theory [stat.TH]

Main file

530_bandits_corrupted_by_nature_lo.pdf (1.15 MB)

Origin: Files produced by the author(s)

Contributor: Debabrota Basu

https://hal.science/hal-04615733

Submitted on: Tuesday, June 18, 2024, 13:12:08

Last modified on: Thursday, June 20, 2024, 03:18:53

Dates and versions

hal-04615733, version 1 (18-06-2024)

License

Attribution

Identifiers

HAL Id: hal-04615733, version 1

Cite

Timothée Mathieu, Debabrota Basu, Odalric-Ambrym Maillard. Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithms. Transactions on Machine Learning Research Journal, 2024. ⟨hal-04615733⟩

Collections

CNRS, INRIA, CRISTAL, INRIA2, UNIV-LILLE, CRISTAL-SCOOL, ANR, PEPR_IA
