
GitHub Repository: leechanwoo-kor/coursera
Path: blob/main/deep-learning-specialization/course-5-sequence-models/Week 1 Quiz - Recurrent Neural Networks.md

Week 1 Quiz - Recurrent Neural Networks

1. Suppose your training examples are sentences (sequences of words). Which of the following refers to the $l^{th}$ word in the $k^{th}$ training example?

  • $x^{(k)<l>}$

  • ...

πŸ“Œ The parentheses represent the training example and the brackets represent the word. You should choose the training example and then the word.
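As a concrete illustration of this indexing convention (the data below is made up, and the helper name is hypothetical):

```python
# Parentheses pick the training example, angle brackets pick the word.
# The course notation is 1-indexed; Python lists are 0-indexed.
training_examples = [
    ["the", "cat", "sat"],              # training example 1
    ["dogs", "love", "long", "walks"],  # training example 2
]


def get_word(k, l):
    """Return x^{(k)<l>}: the l-th word of the k-th training example."""
    return training_examples[k - 1][l - 1]


print(get_word(2, 3))  # "long": 3rd word of the 2nd example
```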

2. Consider this RNN:

image

True/False: This specific type of architecture is appropriate when $T_x > T_y$.

  • True

  • False

πŸ“Œ This type of architecture is for applications where the input and output sequences have the same length.

3. Select the combination of two tasks that could be addressed by a many-to-one RNN model architecture from the following:

  • Task 1: Speech recognition. Task 2: Gender recognition

  • Task 1: Image classification. Task 2: Sentiment classification.

  • Task 1: Gender recognition from audio. Task 2: Movie review (positive/negative) classification.

  • Task 1: Gender recognition from audio. Task 2: Image classification.

πŸ“Œ Gender recognition from audio and movie review classification are two examples of the many-to-one RNN architecture.

4. Using the training model below, answer the following:

image

True/False: At the $t^{th}$ time step the RNN is estimating $P(y^{<t>} \mid y^{<1>}, y^{<2>}, \ldots, y^{<t-1>})$.

  • True

  • False

πŸ“Œ In a training model we try to predict the next step based on knowledge of all prior steps.

5. You have finished training a language model RNN and are using it to sample random sentences, as follows:

image

True/False: When sampling this sentence, step $t$ uses the probabilities output by the RNN to randomly sample a word for that time-step, then passes the selected word to the next time-step.

  • True

  • False

πŸ“Œ Step $t$ uses the probabilities output by the RNN to randomly sample a word for that time-step, then passes the selected word to the next time-step.
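The sampling loop described above can be sketched as follows. This is a minimal illustration, not the assignment's code: `rnn_step` is a hypothetical stand-in for the trained network, and the vocabulary size is made up.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 5  # illustrative; a real model would use the full vocabulary


def rnn_step(prev_word, a_prev):
    """Hypothetical stand-in for the trained RNN: returns a probability
    distribution over the vocabulary and the next hidden state."""
    logits = rng.normal(size=vocab_size)
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax
    return probs, a_prev


word, a = 0, None
sentence = []
for t in range(4):
    probs, a = rnn_step(word, a)
    # Randomly sample this step's word from the output distribution...
    word = rng.choice(vocab_size, p=probs)
    # ...and pass the selected word on to the next time-step.
    sentence.append(int(word))

print(sentence)
```

The key point is that the sampled word, not the most likely word, is fed back in, which is what makes each generated sentence random.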

6. True/False: If you are training an RNN model, and find that your weights and activations are all taking on the value of NaN (β€œNot a Number”) then you have an exploding gradient problem.

  • True

  • False

πŸ“Œ Exploding gradients happen when large error gradients accumulate and result in very large updates to the NN model weights during training. These weights can become too large and cause an overflow, identified as NaN.
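The standard remedy is gradient clipping. A minimal NumPy sketch, assuming gradients are kept in a dict (the names, values, and threshold are illustrative):

```python
import numpy as np


def clip_gradients(grads, max_norm=5.0):
    """Rescale any gradient whose norm exceeds max_norm back down to it,
    preventing the very large weight updates that lead to NaN."""
    clipped = {}
    for name, g in grads.items():
        norm = np.linalg.norm(g)
        if norm > max_norm:
            g = g * (max_norm / norm)
        clipped[name] = g
    return clipped


grads = {
    "dWax": np.array([30.0, 40.0]),  # norm 50: will be clipped
    "dba": np.array([0.1, 0.2]),     # small: passes through unchanged
}
clipped = clip_gradients(grads, max_norm=5.0)
print(np.linalg.norm(clipped["dWax"]))  # ≈ 5.0
```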

7. Suppose you are training an LSTM. You have an 80,000-word vocabulary, and are using an LSTM with 800-dimensional activations $a^{<t>}$. What is the dimension of $\Gamma_u$ at each time step?

  • 800

  • ...

πŸ“Œ $\Gamma_u$ is a vector of dimension equal to the number of hidden units in the LSTM.
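A shape check makes this concrete. The sketch below uses scaled-down stand-ins (8 hidden units, 80 words) for the quiz's 800 units and 80,000 words; the weight names are illustrative:

```python
import numpy as np

n_a, n_x = 8, 80  # stand-ins for 800 hidden units / 80,000-word vocabulary

Wu = np.zeros((n_a, n_a + n_x))  # gate weights act on [a^{<t-1>}; x^{<t>}]
bu = np.zeros((n_a, 1))

a_prev = np.zeros((n_a, 1))
x_t = np.zeros((n_x, 1))         # one-hot word vector
concat = np.vstack([a_prev, x_t])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


gamma_u = sigmoid(Wu @ concat + bu)
print(gamma_u.shape)  # (8, 1): dimension n_a, independent of the vocabulary size
```

The gate's dimension comes from the number of hidden units alone; the vocabulary size only affects the width of the weight matrix.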

8. True/False: In order to simplify the GRU without introducing vanishing gradient problems even when training on very long sequences, you should remove the $\Gamma_r$, i.e., set $\Gamma_r = 1$ always.

  • True

  • False

πŸ“Œ If $\Gamma_u \approx 0$ for a timestep, the gradient can propagate back through that timestep without much decay. For the signal to backpropagate without vanishing, we need $c^{<t>}$ to be highly dependent on $c^{<t-1>}$.
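The memory-cell update that this explanation refers to can be sketched element-wise (the values below are made up for illustration):

```python
import numpy as np


def gru_memory_step(c_prev, c_tilde, gamma_u):
    """Simplified GRU memory update:
    c^{<t>} = Gamma_u * c~^{<t>} + (1 - Gamma_u) * c^{<t-1>}"""
    return gamma_u * c_tilde + (1 - gamma_u) * c_prev


c_prev = np.array([1.0, -2.0])   # previous cell state
c_tilde = np.array([0.5, 0.5])   # candidate replacement value

# With Gamma_u near 0, c^{<t>} stays very close to c^{<t-1>}, so the
# signal (and its gradient) survives across many timesteps.
print(gru_memory_step(c_prev, c_tilde, gamma_u=0.01))  # ≈ [0.995, -1.975]
```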

9. True/False: Using the equations for the GRU and LSTM below, the Update Gate and Forget Gate in the LSTM play a role similar to $1 - \Gamma_u$ and $\Gamma_u$.

image

  • True

  • False

πŸ“Œ No. Instead of using $\Gamma_u$ to compute $1 - \Gamma_u$, the LSTM uses two separate gates ($\Gamma_u$ and $\Gamma_f$) to compute the final value of the hidden state. So, $\Gamma_f$ is used instead of $1 - \Gamma_u$.
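Placing the two update rules side by side makes the difference explicit (gate values are made up; a real cell computes them from the inputs):

```python
import numpy as np


def gru_update(c_prev, c_tilde, gamma_u):
    # GRU: the two coefficients are coupled and sum to 1 by construction.
    return gamma_u * c_tilde + (1 - gamma_u) * c_prev


def lstm_update(c_prev, c_tilde, gamma_u, gamma_f):
    # LSTM: an independent forget gate Gamma_f replaces (1 - Gamma_u),
    # so both coefficients can be large (or small) at the same time.
    return gamma_u * c_tilde + gamma_f * c_prev


c_prev, c_tilde = np.array([1.0]), np.array([0.2])
print(gru_update(c_prev, c_tilde, gamma_u=0.3))                 # coefficients 0.3 and 0.7
print(lstm_update(c_prev, c_tilde, gamma_u=0.3, gamma_f=0.9))   # 0.3 and 0.9, independent
```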

10. Your mood is heavily dependent on the current and past few days’ weather. You’ve collected data for the past 365 days on the weather, which you represent as a sequence $x^{<1>}, \dots, x^{<365>}$. You’ve also collected data on your mood, which you represent as $y^{<1>}, \dots, y^{<365>}$. You’d like to build a model to map from x → y. Should you use a Unidirectional RNN or Bidirectional RNN for this problem?

  • Unidirectional RNN, because the value of $y^{<t>}$ depends only on $x^{<1>}, \dots, x^{<t>}$, but not on $x^{<t+1>}, \dots, x^{<365>}$.