[1,mpirank:0,algo-1]<stderr>:09/21/2022 13:37:39 - INFO - __main__ - Distributed environment: MULTI_GPU Backend: smddp
[1,mpirank:0,algo-1]<stderr>:Num processes: 8
[1,mpirank:0,algo-1]<stderr>:Process index: 0
[1,mpirank:0,algo-1]<stderr>:Local process index: 0
[1,mpirank:0,algo-1]<stderr>:Device: cuda:0
[1,mpirank:0,algo-1]<stderr>:
...
[1,mpirank:0,algo-1]<stderr>:09/21/2022 13:38:31 - INFO - __main__ - ***** Running training *****
[1,mpirank:0,algo-1]<stderr>:09/21/2022 13:38:31 - INFO - __main__ - Num examples = 10000
[1,mpirank:0,algo-1]<stderr>:09/21/2022 13:38:31 - INFO - __main__ - Num Epochs = 1
[1,mpirank:0,algo-1]<stderr>:09/21/2022 13:38:31 - INFO - __main__ - Instantaneous batch size per device = 16
[1,mpirank:0,algo-1]<stderr>:09/21/2022 13:38:31 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 128
[1,mpirank:0,algo-1]<stderr>:09/21/2022 13:38:31 - INFO - __main__ - Gradient Accumulation steps = 1
[1,mpirank:0,algo-1]<stderr>:09/21/2022 13:38:31 - INFO - __main__ - Total optimization steps = 79
...
[1,mpirank:0,algo-1]<stderr>:#015100%|██████████| 79/79 [00:19<00:00, 4.79it/s]
[1,mpirank:0,algo-1]<stderr>:09/21/2022 13:38:50 - INFO - __main__ - Epoch 0 training took 19.50162172317505 seconds
[1,mpirank:0,algo-1]<stdout>:starting evaluation
[1,mpirank:0,algo-1]<stderr>:09/21/2022 13:38:55 - INFO - __main__ - printing few sample generations and corresponding labels from eval set
[1,mpirank:0,algo-1]<stderr>:09/21/2022 13:38:55 - INFO - __main__ - prompt | generated | label
[1,mpirank:0,algo-1]<stderr>:09/21/2022 13:38:55 - INFO - __main__ - chatbot:your persona: i work as an electrician. i always sleep 8 hours a day. </s> <s> Which level are you at?</s> | I'm at the top of the ladder. I work for an electrical company. | I received on-the-job training when i first started
[1,mpirank:0,algo-1]<stderr>:09/21/2022 13:38:55 - INFO - __main__ - chatbot:your persona: i work as an electrician. i always sleep 8 hours a day. </s> <s> Which level are you at? </s> <s> I received on-the-job training when i first started </s> <s> Thats great! How long have you been doing this work? </s> | I've been working as an Electrician for about 5 years now. It's a great job. | For a good number of years now.
...
[1,mpirank:0,algo-1]<stderr>:09/21/2022 13:39:02 - INFO - __main__ - chatbot:your persona: i'm a painter and love to create art. i'm a talented singer and have won several competitions. </s> <s> I love the Doors! They have such a unique sound. Do you have a favorite Doors song?</s> | My favorite song of theirs is "When I Was Your Man". What's yours? | It's a tie between People are strange and Love me two times. What's your favorite?
[1,mpirank:0,algo-1]<stderr>:09/21/2022 13:39:02 - INFO - __main__ - chatbot:your persona: i'm a painter and love to create art. i'm a talented singer and have won several competitions. </s> <s> I think my favorite is Love Street. It has such a haunting melody. Have you heard that one?</s> | No, I haven't. I'll have to check it out. What genre is it? | Yeah, it's a pretty great song, Jim Morrison was just an amazing songwriter, I aspire to make songs as good as his one day.
...
[1,mpirank:0,algo-1]<stderr>:09/21/2022 13:39:17 - INFO - __main__ - {'bleu': 1.7067114414104911}
[1,mpirank:0,algo-1]<stdout>:evaluation completed
[1,mpirank:0,algo-1]<stderr>:09/21/2022 13:39:17 - INFO - __main__ - Epoch 0 evaluation took 24.294514417648315 seconds
[1,mpirank:0,algo-1]<stderr>:Configuration saved in /opt/ml/model/config.json
[1,mpirank:0,algo-1]<stderr>:Model weights saved in /opt/ml/model/pytorch_model.bin
[1,mpirank:0,algo-1]<stderr>:tokenizer config file saved in /opt/ml/model/tokenizer_config.json
[1,mpirank:0,algo-1]<stderr>:Special tokens file saved in /opt/ml/model/special_tokens_map.json
[1,mpirank:0,algo-1]<stderr>:#015100%|██████████| 79/79 [00:47<00:00, 1.65it/s]
2022-09-21 13:39:27,753 sagemaker-training-toolkit INFO Waiting for the process to finish and give a return code.
2022-09-21 13:39:27,753 sagemaker-training-toolkit INFO Done waiting for a return code. Received 0 from exiting process.
2022-09-21 13:39:27,754 sagemaker-training-toolkit INFO Reporting training SUCCESS