# Week 3 Quiz - Hyperparameter tuning, Batch Normalization, Programming Frameworks
1. If searching among a large number of hyperparameters, you should try values in a grid rather than random values, so that you can carry out the search more systematically and not rely on chance. True or False?

- True
- **False**
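Random search tends to beat a grid because, with many hyperparameters, usually only a few matter, and random sampling tries far more distinct values of each one than a grid of the same size does. A minimal numpy sketch contrasting the two budgets (the ranges and names here are illustrative, not from the quiz):

```python
import numpy as np

# Grid search with 25 trials: a 5 x 5 grid tries only 5 distinct
# values of each hyperparameter.
grid_alpha = np.logspace(-4, 0, 5)        # learning rates 1e-4 ... 1
grid_beta = np.linspace(0.9, 0.99, 5)     # momentum values
grid_trials = [(a, b) for a in grid_alpha for b in grid_beta]

# Random search with the same 25 trials: 25 distinct values of each.
rng = np.random.default_rng(0)
random_trials = [
    (10 ** rng.uniform(-4, 0),            # alpha sampled on a log scale
     1 - 10 ** rng.uniform(-2, -1))       # beta in (0.9, 0.99], log scale on 1 - beta
    for _ in range(25)
]
```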
2. If it is only possible to tune two parameters from the following due to limited computational resources, which two would you choose?

- **$\alpha$ (the learning rate).**
  📌 This might be the hyperparameter that most impacts the results of a model.
- **The parameter $\beta$ of the momentum in gradient descent.**
  📌 This hyperparameter can increase the speed of convergence of the training, and is thus worth tuning.
- $\beta_1$ in Adam.
- $\varepsilon$ in Adam.
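For context on the second choice, here is a minimal sketch of how $\beta$ enters gradient descent with momentum (the function and names are illustrative):

```python
import numpy as np

def momentum_step(w, grad, v, alpha=0.01, beta=0.9):
    """One parameter update of gradient descent with momentum.

    alpha is the learning rate; beta controls how strongly past
    gradients are remembered, which smooths the update direction
    and can speed up convergence.
    """
    v = beta * v + (1 - beta) * grad   # exponentially weighted average of gradients
    w = w - alpha * v                  # update with the smoothed gradient
    return w, v

# Example: one step on a 3-parameter model.
w, v = np.zeros(3), np.zeros(3)
w, v = momentum_step(w, np.array([1.0, -2.0, 0.5]), v)
```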
3. Even if enough computational power is available for hyperparameter tuning, it is always better to babysit one model ("Panda" strategy), since this will result in a more custom model. True/False?

- True
- **False**

📌 Although it is possible to create good models using the "Panda" strategy, obtaining better results is more likely with the "Caviar" strategy, due to the larger number of experiments it runs and the iterative idea, code, experiment cycle of deep learning.
4. If you think $\beta$ (hyperparameter for momentum) is between 0.9 and 0.99, which of the following is the recommended way to sample a value for beta?

```python
import numpy as np

r = np.random.rand()           # r is uniform in [0, 1)
beta = 1 - 10 ** (-r - 1)      # beta lands in [0.9, 0.99)
```
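This works because it samples $\beta$ uniformly on a log scale of $1 - \beta$, where momentum is most sensitive. Since $r \in [0, 1)$:

$$1 - \beta = 10^{-r-1} \in (10^{-2},\, 10^{-1}], \qquad \text{so} \qquad \beta = 1 - 10^{-r-1} \in [0.9,\, 0.99).$$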
5. Once good values of hyperparameters have been found, those values should be changed if new data is added or a change in computational power occurs. True/False?

- **True**
- False

📌 The choice of some hyperparameters, such as the batch size, depends on conditions such as the hardware and the quantity of data available.
6. When using batch normalization it is OK to drop the parameter $b^{[l]}$ from the forward propagation, since it will be subtracted out when we compute $z_{norm}^{[l]} = \frac{z^{[l]} - \mu}{\sqrt{\sigma^2 + \varepsilon}}$. True/False?

- **True**
- False

📌 Since the normalization process re-centers the values of $z^{[l]}$ at the origin, it is irrelevant to add the parameter $b^{[l]}$.
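To see why $b^{[l]}$ cancels, note that it shifts every example in the mini-batch, and therefore the mini-batch mean, by the same constant. With $z^{[l](i)} = W^{[l]} a^{[l-1](i)} + b^{[l]}$ and $\mu = \frac{1}{m} \sum_j z^{[l](j)}$:

$$z^{[l](i)} - \mu = \Big( W^{[l]} a^{[l-1](i)} + b^{[l]} \Big) - \Big( \tfrac{1}{m} \textstyle\sum_j W^{[l]} a^{[l-1](j)} + b^{[l]} \Big),$$

so $b^{[l]}$ drops out, and its role as a learned offset is taken over by $\beta^{[l]}$.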
7. In the normalization formula $z_{norm}^{(i)} = \frac{z^{(i)} - \mu}{\sqrt{\sigma^2 + \varepsilon}}$, why do we use epsilon?

- In case $\sigma^2$ is too small
- To speed up convergence
- To have a more accurate normalization
- **To avoid division by zero**
8. Which of the following is true about batch normalization?

- **The parameters $\gamma$ and $\beta$ set the mean and variance of $\tilde{z}^{[l]}$.**
- The parameters $\gamma$ and $\beta$ can be learned only using gradient descent.
- The optimal values to use for $\gamma$ and $\beta$ are $\gamma = \sqrt{\sigma^2 + \varepsilon}$ and $\beta = \mu$.
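A minimal numpy sketch of the batch-norm forward step behind questions 6-8 (the function name and shapes are illustrative assumptions):

```python
import numpy as np

def batchnorm_forward(z, gamma, beta, eps=1e-8):
    """Batch-normalize pre-activations z of shape (n_units, m_examples)."""
    mu = z.mean(axis=1, keepdims=True)       # per-unit mean over the mini-batch
    var = z.var(axis=1, keepdims=True)       # per-unit variance over the mini-batch
    z_norm = (z - mu) / np.sqrt(var + eps)   # eps guards against division by zero
    return gamma * z_norm + beta             # gamma, beta set the output variance and mean

# Adding a constant bias to z changes nothing: it shifts z and mu equally.
z = np.random.randn(4, 256)
gamma, beta = np.ones((4, 1)), np.zeros((4, 1))
assert np.allclose(batchnorm_forward(z, gamma, beta),
                   batchnorm_forward(z + 3.0, gamma, beta))
```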
9. After training a neural network with Batch Norm, at test time, to evaluate the neural network on a new example you should:

- If you implemented Batch Norm on mini-batches of (say) 256 examples, then to evaluate on one test example, duplicate that example 256 times so that you're working with a mini-batch the same size as during training.
- Use the most recent mini-batch's values of $\mu$ and $\sigma^2$ to perform the needed normalizations.
- Skip the step where you normalize using $\mu$ and $\sigma^2$, since a single test example cannot be normalized.
- **Perform the needed normalizations, using $\mu$ and $\sigma^2$ estimated with an exponentially weighted average across mini-batches seen during training.**
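A sketch of how those exponentially weighted estimates might be maintained during training and reused at test time (the class, names, and decay value are illustrative assumptions, not the course's code):

```python
import numpy as np

class RunningMoments:
    """Exponentially weighted averages of mu and sigma^2 across mini-batches."""

    def __init__(self, n_units, decay=0.9):
        self.decay = decay
        self.mu = np.zeros((n_units, 1))
        self.var = np.ones((n_units, 1))

    def update(self, z_batch):
        # Call once per training mini-batch of shape (n_units, m).
        mu = z_batch.mean(axis=1, keepdims=True)
        var = z_batch.var(axis=1, keepdims=True)
        self.mu = self.decay * self.mu + (1 - self.decay) * mu
        self.var = self.decay * self.var + (1 - self.decay) * var

    def normalize(self, z, eps=1e-8):
        # Use at test time, even for a single example of shape (n_units, 1).
        return (z - self.mu) / np.sqrt(self.var + eps)
```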
10. If a project is open-source, it is a guarantee that it will remain open source in the long run and will never be modified to benefit only one company. True/False?

- True
- **False**

📌 To ensure that a project will remain open source in the long run, it must also have a good governance body.