
GitHub Repository: leechanwoo-kor/coursera
Path: blob/main/deep-learning-specialization/course-5-sequence-models/Week 1 Quiz - Recurrent Neural Networks.md

Week 1 Quiz - Recurrent Neural Networks

1. Suppose your training examples are sentences (sequences of words). Which of the following refers to the $l^{th}$ word in the $k^{th}$ training example?

  • $x^{(k)<l>}$

  • ...

πŸ“Œ The parentheses represent the training example and the brackets represent the word. You should choose the training example and then the word.
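As a concrete illustration of this indexing convention (the data below is made up, and the helper name is hypothetical):

```python
# Parentheses pick the training example, angle brackets pick the word.
# The course notation is 1-indexed; Python lists are 0-indexed.
training_examples = [
    ["the", "cat", "sat"],              # training example 1
    ["dogs", "love", "long", "walks"],  # training example 2
]


def get_word(k, l):
    """Return x^{(k)<l>}: the l-th word of the k-th training example."""
    return training_examples[k - 1][l - 1]


print(get_word(2, 3))  # "long": 3rd word of the 2nd example
```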

2. Consider this RNN:

image

True/False: This specific type of architecture is appropriate when $T_x > T_y$.

  • True

  • False

πŸ“Œ This type of architecture is for applications where the input and output sequences have the same length.

3. Select the combination of two tasks that could be addressed by a many-to-one RNN model architecture from the following:

  • Task 1: Speech recognition. Task 2: Gender recognition

  • Task 1: Image classification. Task 2: Sentiment classification.

  • Task 1: Gender recognition from audio. Task 2: Movie review (positive/negative) classification.

  • Task 1: Gender recognition from audio. Task 2: Image classification.

πŸ“Œ Gender recognition from audio and movie review classification are two examples of the many-to-one RNN architecture.

4. Using the training model below, answer the following:

image

True/False: At the $t^{th}$ time step the RNN is estimating $P(y^{<t>} \mid y^{<1>}, y^{<2>}, \ldots, y^{<t-1>})$.

  • True

  • False

πŸ“Œ In a training model we try to predict the next step based on knowledge of all prior steps.

5. You have finished training a language model RNN and are using it to sample random sentences, as follows:

image

True/False: When sampling this sentence, step $t$ uses the probabilities output by the RNN to randomly sample a word for that time-step, then passes the selected word to the next time-step.

  • True

  • False

πŸ“Œ Step $t$ uses the probabilities output by the RNN to randomly sample a word for that time-step, then passes the selected word to the next time-step.
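The sampling loop described above can be sketched as follows. This is a minimal illustration, not the assignment's code: `rnn_step` is a hypothetical stand-in for the trained network, and the vocabulary size is made up.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 5  # illustrative; a real model would use the full vocabulary


def rnn_step(prev_word, a_prev):
    """Hypothetical stand-in for the trained RNN: returns a probability
    distribution over the vocabulary and the next hidden state."""
    logits = rng.normal(size=vocab_size)
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax
    return probs, a_prev


word, a = 0, None
sentence = []
for t in range(4):
    probs, a = rnn_step(word, a)
    # Randomly sample this step's word from the output distribution...
    word = rng.choice(vocab_size, p=probs)
    # ...and pass the selected word on to the next time-step.
    sentence.append(int(word))

print(sentence)
```

The key point is that the sampled word, not the most likely word, is fed back in, which is what makes each generated sentence random.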

6. True/False: If you are training an RNN model, and find that your weights and activations are all taking on the value of NaN (β€œNot a Number”) then you have an exploding gradient problem.

  • True

  • False

πŸ“Œ Exploding gradients happen when large error gradients accumulate and result in very large updates to the NN model weights during training. These weights can become too large and cause an overflow, identified as NaN.
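The standard remedy is gradient clipping. A minimal NumPy sketch, assuming gradients are kept in a dict (the names, values, and threshold are illustrative):

```python
import numpy as np


def clip_gradients(grads, max_norm=5.0):
    """Rescale any gradient whose norm exceeds max_norm back down to it,
    preventing the very large weight updates that lead to NaN."""
    clipped = {}
    for name, g in grads.items():
        norm = np.linalg.norm(g)
        if norm > max_norm:
            g = g * (max_norm / norm)
        clipped[name] = g
    return clipped


grads = {
    "dWax": np.array([30.0, 40.0]),  # norm 50: will be clipped
    "dba": np.array([0.1, 0.2]),     # small: passes through unchanged
}
clipped = clip_gradients(grads, max_norm=5.0)
print(np.linalg.norm(clipped["dWax"]))  # ≈ 5.0
```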

7. Suppose you are training an LSTM. You have an 80,000-word vocabulary, and are using an LSTM with 800-dimensional activations $a^{<t>}$. What is the dimension of $\Gamma_u$ at each time step?

  • 800

  • ...

πŸ“Œ $\Gamma_u$ is a vector of dimension equal to the number of hidden units in the LSTM.
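A shape check makes this concrete. The sketch below uses scaled-down stand-ins (8 hidden units, 80 words) for the quiz's 800 units and 80,000 words; the weight names are illustrative:

```python
import numpy as np

n_a, n_x = 8, 80  # stand-ins for 800 hidden units / 80,000-word vocabulary

Wu = np.zeros((n_a, n_a + n_x))  # gate weights act on [a^{<t-1>}; x^{<t>}]
bu = np.zeros((n_a, 1))

a_prev = np.zeros((n_a, 1))
x_t = np.zeros((n_x, 1))         # one-hot word vector
concat = np.vstack([a_prev, x_t])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


gamma_u = sigmoid(Wu @ concat + bu)
print(gamma_u.shape)  # (8, 1): dimension n_a, independent of the vocabulary size
```

The gate's dimension comes from the number of hidden units alone; the vocabulary size only affects the width of the weight matrix.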

8. True/False: In order to simplify the GRU without introducing vanishing gradient problems even when training on very long sequences, you should remove the $\Gamma_r$, i.e., set $\Gamma_r = 1$ always.

  • True

  • False

πŸ“Œ If $\Gamma_u \approx 0$ for a timestep, the gradient can propagate back through that timestep without much decay. For the signal to backpropagate without vanishing, we need $c^{<t>}$ to be highly dependent on $c^{<t-1>}$.
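The memory-cell update that this explanation refers to can be sketched element-wise (the values below are made up for illustration):

```python
import numpy as np


def gru_memory_step(c_prev, c_tilde, gamma_u):
    """Simplified GRU memory update:
    c^{<t>} = Gamma_u * c~^{<t>} + (1 - Gamma_u) * c^{<t-1>}"""
    return gamma_u * c_tilde + (1 - gamma_u) * c_prev


c_prev = np.array([1.0, -2.0])   # previous cell state
c_tilde = np.array([0.5, 0.5])   # candidate replacement value

# With Gamma_u near 0, c^{<t>} stays very close to c^{<t-1>}, so the
# signal (and its gradient) survives across many timesteps.
print(gru_memory_step(c_prev, c_tilde, gamma_u=0.01))  # ≈ [0.995, -1.975]
```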

9. True/False: Using the equations for the GRU and LSTM below, the Update Gate and Forget Gate in the LSTM play a role similar to $1 - \Gamma_u$ and $\Gamma_u$.

image

  • True

  • False

πŸ“Œ No. Instead of using $\Gamma_u$ to compute $1 - \Gamma_u$, the LSTM uses two separate gates ($\Gamma_u$ and $\Gamma_f$) to compute the final value of the hidden state. So, $\Gamma_f$ is used instead of $1 - \Gamma_u$.
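Placing the two update rules side by side makes the difference explicit (gate values are made up; a real cell computes them from the inputs):

```python
import numpy as np


def gru_update(c_prev, c_tilde, gamma_u):
    # GRU: the two coefficients are coupled and sum to 1 by construction.
    return gamma_u * c_tilde + (1 - gamma_u) * c_prev


def lstm_update(c_prev, c_tilde, gamma_u, gamma_f):
    # LSTM: an independent forget gate Gamma_f replaces (1 - Gamma_u),
    # so both coefficients can be large (or small) at the same time.
    return gamma_u * c_tilde + gamma_f * c_prev


c_prev, c_tilde = np.array([1.0]), np.array([0.2])
print(gru_update(c_prev, c_tilde, gamma_u=0.3))                 # coefficients 0.3 and 0.7
print(lstm_update(c_prev, c_tilde, gamma_u=0.3, gamma_f=0.9))   # 0.3 and 0.9, independent
```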

10. Your mood is heavily dependent on the current and past few days’ weather. You’ve collected data for the past 365 days on the weather, which you represent as a sequence $x^{<1>}, \dots, x^{<365>}$. You’ve also collected data on your mood, which you represent as $y^{<1>}, \dots, y^{<365>}$. You’d like to build a model to map from x → y. Should you use a Unidirectional RNN or Bidirectional RNN for this problem?

  • Unidirectional RNN, because the value of $y^{<t>}$ depends only on $x^{<1>}, \dots, x^{<t>}$, but not on $x^{<t+1>}, \dots, x^{<365>}$.