Week 8 LSTM

We trained an LSTM over a grid of sequence lengths {10, 15, 20}, hidden-node counts {2, 3, 4, 5, 6}, and dropout rates {0.2, 0.35, 0.5}, and got the following accuracies on the test set (rounded to four decimal places):

| Sequence length | Hidden nodes | Dropout 0.2 | Dropout 0.35 | Dropout 0.5 |
|---|---|---|---|---|
| 10 | 2 | 0.7343 | 0.7454 | 0.7348 |
| 10 | 3 | 0.7433 | 0.6329 | 0.6705 |
| 10 | 4 | 0.7313 | 0.7356 | 0.7329 |
| 10 | 5 | 0.7425 | 0.7442 | 0.7424 |
| 10 | 6 | 0.7436 | 0.7415 | 0.7425 |
| 15 | 2 | 0.7399 | 0.7313 | 0.7323 |
| 15 | 3 | 0.7411 | 0.7434 | 0.5253 |
| 15 | 4 | 0.7439 | 0.7413 | 0.7358 |
| 15 | 5 | 0.7427 | 0.7475 | 0.7080 |
| 15 | 6 | 0.7515 | 0.7394 | 0.7407 |
| 20 | 2 | 0.7441 | 0.7324 | 0.7421 |
| 20 | 3 | 0.7467 | 0.7402 | 0.7423 |
| 20 | 4 | 0.7404 | 0.7282 | 0.7458 |
| 20 | 5 | 0.7375 | 0.7433 | 0.7450 |
| 20 | 6 | 0.7484 | 0.7391 | 0.7439 |
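
The training code isn't included in this post, so here is a minimal sketch of how such a sweep might look, assuming a Keras-style LSTM binary classifier. The `load_data` helper, vocabulary size, embedding size, training settings, and the placement of dropout (here, after the LSTM layer) are all assumptions, not the actual setup.

```python
# Hypothetical sketch of the hyperparameter sweep described above.
# load_data, VOCAB_SIZE, the embedding size, and the training settings
# are assumptions, not the actual configuration behind these results.
import itertools

import numpy as np
from tensorflow.keras.layers import Dense, Dropout, Embedding, LSTM
from tensorflow.keras.models import Sequential

SEQ_LENS = [10, 15, 20]
HIDDEN_NODES = [2, 3, 4, 5, 6]
DROPOUTS = [0.2, 0.35, 0.5]
VOCAB_SIZE = 5000  # assumed vocabulary size


def build_model(n_hidden, dropout):
    """Small LSTM binary classifier; dropout is applied after the LSTM layer."""
    model = Sequential([
        Embedding(VOCAB_SIZE, 32),
        LSTM(n_hidden),
        Dropout(dropout),
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


# accuracies[i, j, k] = test accuracy for sequence length i, hidden-node
# count j, dropout rate k (same layout as the results table above).
accuracies = np.zeros((len(SEQ_LENS), len(HIDDEN_NODES), len(DROPOUTS)))
for (i, seq_len), (j, n_hidden), (k, dropout) in itertools.product(
        enumerate(SEQ_LENS), enumerate(HIDDEN_NODES), enumerate(DROPOUTS)):
    # load_data is an assumed helper that pads/truncates sequences to seq_len.
    (x_train, y_train), (x_test, y_test) = load_data(seq_len)
    model = build_model(n_hidden, dropout)
    model.fit(x_train, y_train, epochs=5, batch_size=64, verbose=0)
    _, test_acc = model.evaluate(x_test, y_test, verbose=0)
    accuracies[i, j, k] = test_acc
```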

Apart from a few bad choices of parameters (or outliers), the LSTM robustly obtains 73-75% accuracy. What if we use just 1 hidden node? Sweeping the same sequence lengths {10, 15, 20} and dropout rates {0.2, 0.35, 0.5}, the LSTM can still reach about 73% accuracy, so increasing the number of hidden nodes does not help much:

| Sequence length | Dropout 0.2 | Dropout 0.35 | Dropout 0.5 |
|---|---|---|---|
| 10 | 0.5253 | 0.5253 | 0.7238 |
| 15 | 0.7127 | 0.7339 | 0.7316 |
| 20 | 0.6702 | 0.5253 | 0.7327 |
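
As a quick sanity check on that claim, the `accuracies` array from the sketch above can be summarized per hidden-node count; this snippet reuses the hypothetical variables from the previous block and is likewise only a sketch.

```python
# Best and mean test accuracy for each hidden-node count, aggregating over
# sequence length and dropout (uses the hypothetical `accuracies` array above).
best_per_hidden = accuracies.max(axis=(0, 2))
mean_per_hidden = accuracies.mean(axis=(0, 2))
for n_hidden, best, mean in zip(HIDDEN_NODES, best_per_hidden, mean_per_hidden):
    print(f"hidden nodes = {n_hidden}: best = {best:.4f}, mean = {mean:.4f}")
```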

Kevin’s Link: https://www.dropbox.com/s/ezw5odk2gewjw3j/hierarchical_models2.pdf?dl=0

Written on November 18, 2015