
GitHub Repository: leechanwoo-kor/coursera
Path: blob/main/deep-learning-specialization/course-2-deep-neural-network/Week 3 Quiz - Hyperparameter tuning, Batch Normalization, Programming Frameworks.md

Week 3 Quiz - Hyperparameter tuning, Batch Normalization, Programming Frameworks

1. If searching among a large number of hyperparameters, you should try values in a grid rather than random values, so that you can carry out the search more systematically and not rely on chance. True or False?

  • True

  • False
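Random values usually beat a grid because you rarely know in advance which hyperparameter matters most. The following minimal NumPy sketch assumes a hypothetical search over two hyperparameters with illustrative ranges: the learning rate α, which usually matters a lot, and Adam's ε, which usually matters little. Both are sampled on a log scale. With the same budget of 25 trials, the sketch counts how many distinct learning rates each strategy actually tries.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Illustrative search space (assumed ranges), both on a log scale:
#   alpha   (learning rate, usually important):    1e-4 .. 1e-1
#   epsilon (Adam's epsilon, usually unimportant): 1e-8 .. 1e-6
budget = 25  # same number of trials for both strategies

# Grid search: a 5 x 5 grid over the two hyperparameters.
grid_alpha = 10.0 ** np.linspace(-4, -1, 5)
grid_eps = 10.0 ** np.linspace(-8, -6, 5)
grid_trials = [(a, e) for a in grid_alpha for e in grid_eps]

# Random search: 25 independent draws from the same ranges.
random_trials = [
    (10.0 ** rng.uniform(-4, -1), 10.0 ** rng.uniform(-8, -6))
    for _ in range(budget)
]

# How many distinct learning rates does each strategy actually try?
n_alpha_grid = len({a for a, _ in grid_trials})
n_alpha_random = len({a for a, _ in random_trials})
print(f"grid: {n_alpha_grid} distinct alphas, random: {n_alpha_random}")
```

The grid tries only 5 learning rates. If ε barely affects the result, most of its 25 trials just repeat those 5 values, while random search explores 25 different learning rates for the same cost.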

2. If limited computational resources only allow you to tune two of the following hyperparameters, which two would you choose?

  • $\alpha$

📌 This might be the hyperparameter that most impacts the results of a model.

  • The $\beta$ parameter of momentum in gradient descent.

📌 This hyperparameter can speed up convergence during training, so it is worth tuning.

  • $\beta_1, \beta_2$ in Adam.

  • $\epsilon$ in Adam.
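To see where α and the momentum β enter training, here is a minimal NumPy sketch of gradient descent with momentum in its exponentially weighted average form, $v = \beta v + (1 - \beta)\,dw$ followed by $w = w - \alpha v$. The toy quadratic objective, the starting point, and the values α = 0.05 and β = 0.9 are assumptions chosen for illustration. α scales every update, while β controls how much of the past gradients is carried into each step.

```python
import numpy as np


def minimize_with_momentum(grad_fn, w0, alpha, beta, n_steps=300):
    """Gradient descent with momentum (exponentially weighted average form).

    alpha: learning rate, scales every update.
    beta:  momentum coefficient, weight given to past gradients.
    """
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)  # velocity: running average of gradients
    for _ in range(n_steps):
        dw = grad_fn(w)
        v = beta * v + (1 - beta) * dw
        w = w - alpha * v
    return w


# Toy ill-conditioned quadratic f(w) = 0.5 * (w[0]**2 + 25 * w[1]**2) (assumed example).
CURVATURE = np.array([1.0, 25.0])


def grad_quadratic(w):
    """Gradient of the toy quadratic."""
    return CURVATURE * w


w_final = minimize_with_momentum(grad_quadratic, w0=[1.0, 1.0], alpha=0.05, beta=0.9)
print(w_final)  # both coordinates end up close to the minimum at 0
```

Changing α or β here changes how fast, or whether, the iterates reach the minimum. Adam's $\beta_1, \beta_2$ and $\epsilon$ are usually left at their defaults.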

3. Even if enough computational power is available for hyperparameter tuning, it is always better to babysit one model ("Panda" strategy), since this will result in a more custom model. True/False?

  • True

  • False

📌 Although it is possible to create good models using the "Panda" strategy, the "Caviar" strategy (training many models in parallel) is more likely to produce better results, because it runs more experiments and deep learning is an iterative cycle of idea, code, and experiment.

  • r = np.random.rand()
    beta = 1 - 10 ** (- r - 1)

  • ...
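The sampling option above draws $r$ uniformly from $[0, 1)$ and sets $1 - \beta = 10^{-r-1}$. This gives $1 - \beta \in (0.01, 0.1]$ and $\beta \in [0.9, 0.99)$, with samples spread evenly on a log scale of $1 - \beta$. The minimal NumPy sketch below, with an assumed sample size and seed, compares this with drawing β uniformly from the same interval. It counts how often each method lands in β ≥ 0.98. In that region the averaging window of roughly $1/(1-\beta)$ steps is 50 or more, and small changes in β matter most.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 10_000  # number of samples (assumed)

# Log-scale sampling (the option above): r ~ U[0, 1), 1 - beta = 10 ** (-r - 1).
r = rng.random(n)
beta_log = 1 - 10 ** (-r - 1)

# Naive linear sampling of beta over the same interval, for comparison.
beta_lin = rng.uniform(0.9, 0.99, n)

# Fraction of samples in the sensitive region beta >= 0.98.
frac_log = np.mean(beta_log >= 0.98)
frac_lin = np.mean(beta_lin >= 0.98)
print(f"log scale: {frac_log:.2f}, linear: {frac_lin:.2f}")
```

Expect roughly 30% of the log-scale samples in that region (since $r \ge 1 - \log_{10} 2$), versus about 11% for linear sampling ($0.01 / 0.09$).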

5. Once good values of hyperparameters have been found, those values should be changed if new data is added or a change in computational power occurs. True/False?

  • True

  • False

📌 The best choice of some hyperparameters, such as the batch size, depends on conditions such as the hardware and the amount of data.

6. When using batch normalization, it is OK to drop the parameter $b^{[l]}$ from the forward propagation since it will be subtracted out when we compute $\tilde{z}^{[l]}_{\text{normalize}} = \gamma^{[l]}\hat{z}^{[l]} + \beta^{[l]}$. True/False?

  • True

  • False

📌 Normalization subtracts the mini-batch mean of $z^{[l]}$, re-centering it at the origin, so any constant $b^{[l]}$ added beforehand cancels out. The shift $\beta^{[l]}$ takes over its role.
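The cancellation can be checked numerically. This minimal NumPy sketch assumes a hypothetical layer with 3 units, 4 inputs, and a mini-batch of 8 examples, stored one column per example as in the course notation. It compares batch-normalized outputs computed with and without the bias $b^{[l]}$.

```python
import numpy as np

rng = np.random.default_rng(seed=0)


def batch_norm(z, gamma, beta, eps=1e-8):
    """Normalize each unit over the mini-batch (axis 1), then scale and shift."""
    mu = z.mean(axis=1, keepdims=True)
    var = z.var(axis=1, keepdims=True)
    z_hat = (z - mu) / np.sqrt(var + eps)
    return gamma * z_hat + beta


# Hypothetical layer: 3 units, 4 inputs, mini-batch of 8 examples.
W = rng.normal(size=(3, 4))
b = rng.normal(size=(3, 1))
A_prev = rng.normal(size=(4, 8))
gamma = np.ones((3, 1))
beta = np.zeros((3, 1))

with_bias = batch_norm(W @ A_prev + b, gamma, beta)
without_bias = batch_norm(W @ A_prev, gamma, beta)
print(np.allclose(with_bias, without_bias))  # True: b is removed by the mean subtraction
```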

7. In the normalization formula $z_{norm}^{(i)} = \frac{z^{(i)} - \mu}{\sqrt{\sigma^2 + \varepsilon}}$, why do we use epsilon?

  • In case $\mu$ is too small

  • To speed up convergence

  • To have a more accurate normalization

  • To avoid division by zero
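Suppose a unit's pre-activation is identical for every example in a mini-batch. Then $\sigma^2 = 0$, which is exactly the case ε guards against. The sketch below uses NumPy and an illustrative constant mini-batch, and normalizes it with and without ε. `np.errstate` only silences the floating-point warnings that the ε = 0 case triggers.

```python
import numpy as np


def normalize(z, eps):
    """Normalize a 1-D mini-batch of one unit's pre-activations."""
    mu = z.mean()
    var = z.var()
    return (z - mu) / np.sqrt(var + eps)


# A mini-batch where the unit outputs the same value for every example: variance is 0.
z = np.full(4, 3.0)

with np.errstate(invalid="ignore", divide="ignore"):
    unsafe = normalize(z, eps=0.0)  # 0 / 0 -> nan
safe = normalize(z, eps=1e-8)       # 0 / 1e-4 -> 0
print(unsafe, safe)
```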

8. Which of the following is true about batch normalization?

  • The parameters $\gamma^{[l]}$ and $\beta^{[l]}$ set the mean and variance of $\tilde{z}^{[l]}$.

  • $z_{norm}^{(i)} = \dfrac{z^{(i)} - \mu}{\sqrt{\sigma^2}}$

  • The parameters $\gamma^{[l]}$ and $\beta^{[l]}$ can be learned only using gradient descent.

  • The optimal values to use for $\gamma^{[l]}$ and $\beta^{[l]}$ are $\gamma^{[l]}=\sqrt{\sigma^2 + \varepsilon}$ and $\beta^{[l]}=\mu$
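Here is a minimal NumPy sketch, on assumed synthetic data for one unit, of what γ and β do. After normalization, the output $\tilde{z}$ has mean β and standard deviation approximately γ (variance γ²). The sketch also shows that $\gamma^{[l]} = \sqrt{\sigma^2 + \varepsilon}$, $\beta^{[l]} = \mu$ simply undoes the normalization. That makes it one possible setting rather than an optimal one, since in practice γ and β are learned.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
eps = 1e-8

# Pre-activations of one unit over a mini-batch of 1000 examples (assumed data).
z = rng.normal(loc=5.0, scale=2.0, size=1000)
mu, var = z.mean(), z.var()
z_hat = (z - mu) / np.sqrt(var + eps)  # mean 0, variance ~1

# Arbitrary example values: the output has mean beta and standard deviation ~gamma.
gamma, beta = 3.0, -1.0
z_tilde = gamma * z_hat + beta
print(z_tilde.mean(), z_tilde.std())  # ~ -1.0, ~3.0

# Special case: gamma = sqrt(var + eps), beta = mu undoes the normalization.
z_identity = np.sqrt(var + eps) * z_hat + mu
print(np.allclose(z_identity, z))  # True
```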

9. After training a neural network with Batch Norm, at test time, to evaluate the neural network on a new example you should:

  • If you implemented Batch Norm on mini-batches of (say) 256 examples, then to evaluate on one test example, duplicate that example 256 times so that you're working with a mini-batch the same size as during training.

  • Use the most recent mini-batch's values of $\mu$ and $\sigma^2$ to perform the needed normalizations.

  • Skip the step where you normalize using $\mu$ and $\sigma^2$ since a single test example cannot be normalized.

  • Perform the needed normalizations using $\mu$ and $\sigma^2$ estimated with an exponentially weighted average across the mini-batches seen during training.
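The last option can be sketched as follows. During training, keep exponentially weighted averages of each mini-batch's $\mu$ and $\sigma^2$. At test time, reuse them to normalize a single example. This is a minimal NumPy sketch for one unit, and several pieces are assumptions: the decay of 0.9, the initial estimates, the synthetic mini-batches, and the placeholder γ and β. Deep learning frameworks implement the same idea with their own parameter names and defaults.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
decay = 0.9  # weight of the running average (assumed value)
eps = 1e-8

running_mu, running_var = 0.0, 1.0  # initial estimates (assumed)

# Training: update exponentially weighted averages of each mini-batch's statistics.
for _ in range(200):
    z_batch = rng.normal(loc=4.0, scale=0.5, size=256)  # stand-in for one unit's pre-activations
    running_mu = decay * running_mu + (1 - decay) * z_batch.mean()
    running_var = decay * running_var + (1 - decay) * z_batch.var()

# Test time: normalize a single example with the running estimates,
# so no mini-batch statistics are needed.
gamma, beta = 1.0, 0.0  # learned parameters (placeholder values)
z_test = 4.5
z_tilde = gamma * (z_test - running_mu) / np.sqrt(running_var + eps) + beta
print(running_mu, running_var, z_tilde)
```

Because the estimates are fixed after training, the prediction for a test example does not depend on which other examples happen to be evaluated alongside it.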

10. If a project is open source, it is guaranteed to remain open source in the long run and will never be modified to benefit only one company. True/False?

  • True

  • False

📌 To remain open source in the long run, a project must also have a good governance body.