General offline contextual bandits paper: "Learning from Logged Implicit Exploration Data"
We trained an LSTM with the following parameters (sequence length, number of hidden units, dropout):
Generalized linear models
We relax our previous assumption, under which forced-choice trials were discarded. We now include forced-choice trials, both successful and violation trials, in our CART model to examine the feature importance of forced-choice trials and of violation lookback.
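One way to keep forced-choice and violation trials in the model is to encode them as lookback features rather than dropping them from the trial history. The sketch below is illustrative only and rests on several assumptions that are not from the original analysis: the `Trial` record and its fields, the encoding of a violation's choice as `None` (mapped to `-1`), the lookback depth `k`, and the choice of free, non-violation trials as prediction targets. The helpers `lookback_features` and `feature_names` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Trial:
    # Hypothetical trial record; field names and encodings are assumptions.
    choice: Optional[int]  # 0 = left, 1 = right, None = no valid choice (e.g. violation)
    forced: bool           # True if this was a forced-choice trial
    violation: bool        # True if the animal committed a violation


def feature_names(k: int) -> List[str]:
    """Column names matching the row layout built by lookback_features."""
    return [f"{name}_lag{lag}"
            for lag in range(1, k + 1)
            for name in ("choice", "forced", "violation")]


def lookback_features(trials: List[Trial], k: int) -> Tuple[List[List[int]], List[int]]:
    """Build k-trial lookback rows. Forced-choice and violation trials stay in
    the history (as features) instead of being thrown out."""
    X: List[List[int]] = []
    y: List[int] = []
    for t in range(k, len(trials)):
        current = trials[t]
        # Assumption: only free-choice trials with a valid response are targets.
        if current.forced or current.violation or current.choice is None:
            continue
        row: List[int] = []
        for lag in range(1, k + 1):
            past = trials[t - lag]
            row.append(-1 if past.choice is None else past.choice)  # -1 marks "no choice"
            row.append(int(past.forced))
            row.append(int(past.violation))
        X.append(row)
        y.append(current.choice)
    return X, y
```

The resulting `X`, `y` can be passed to a CART implementation such as scikit-learn's `DecisionTreeClassifier`; after `fit(X, y)`, its `feature_importances_` attribute gives impurity-based importances that line up with `feature_names(k)`, so the `forced_lag*` and `violation_lag*` columns can be compared directly with the choice-history columns.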