Week 2 Quiz - Natural Language Processing & Word Embeddings
1. True/False: Suppose you learn a word embedding for a vocabulary of 20000 words. Then the embedding vectors could be 1000 dimensional, so as to capture the full range of variation and meaning in those words.
True
False
📌 The dimension of word vectors is usually smaller than the size of the vocabulary. Most common sizes for word vectors range between 50 and 1000.
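As a quick illustration (not part of the quiz; the 300-dimensional size is just one common choice in that 50–1000 range), an embedding matrix for a 20000-word vocabulary, using the course's column-per-word layout, might look like:

```python
import numpy as np

vocab_size = 20000   # words in the vocabulary
embed_dim = 300      # typical embedding size, well below vocab_size

# One column of E per vocabulary word; small random initialization.
E = np.random.randn(embed_dim, vocab_size) * 0.01
print(E.shape)  # (300, 20000)
```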
2. What is t-SNE?
A non-linear dimensionality reduction technique
...
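A minimal sketch (assuming scikit-learn is available; the array below is random stand-in data, not real embeddings) of how t-SNE is typically used to project word vectors down to 2-D for visualization:

```python
import numpy as np
from sklearn.manifold import TSNE  # requires scikit-learn

# Pretend `vectors` holds word embeddings for 500 words (one row per word).
vectors = np.random.randn(500, 100)

# t-SNE non-linearly maps the 100-D vectors down to 2-D for plotting.
vectors_2d = TSNE(n_components=2, perplexity=30, init="random").fit_transform(vectors)
print(vectors_2d.shape)  # (500, 2)
```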
3. Suppose you download a pre-trained word embedding which has been trained on a huge corpus of text. You then use this word embedding to train an RNN for a language task of recognizing if someone is happy from a short snippet of text, using a small training set.
| x (input text)         | y (happy?) |
| ----------------------- | ---------- |
| Having a great time!    | 1          |
| I'm sad it's raining.   | 0          |
| I'm feeling awesome!    | 1          |
Even if the word “wonderful” does not appear in your small training set, what label might be reasonably expected for the input text “I feel wonderful!”?
y=1
y=0
📌 Yes, word vectors empower your model with an incredible ability to generalize. The vector for "wonderful" would contain a positive/happy connotation which will probably make your model classify the sentence as a "1".
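A rough sketch of why this generalization works: pre-trained vectors for words with similar meaning point in similar directions, so the classifier treats "wonderful" much like "awesome". The three vectors below are made-up toy values, purely for illustration:

```python
import numpy as np

def cosine_similarity(u, v):
    # Cosine similarity between two word vectors.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy stand-ins for pre-trained vectors (real ones would be 50-300 dimensional).
e_awesome   = np.array([0.9, 0.8, 0.1])
e_wonderful = np.array([0.85, 0.75, 0.2])
e_sad       = np.array([-0.7, -0.6, 0.3])

print(cosine_similarity(e_wonderful, e_awesome))  # high -> similar meaning
print(cosine_similarity(e_wonderful, e_sad))      # low  -> different meaning
```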
4. Which of these equations do you think should hold for a good word embedding? (Check all that apply)
5. Let $E$ be an embedding matrix, and let $o_{1234}$ be a one-hot vector corresponding to word 1234. Then to get the embedding of word 1234, why don't we call $E * o_{1234}$ in Python?
The correct formula is $E^T * o_{1234}$.
This doesn't handle unknown words (`<UNK>`).
None of the above: calling the Python snippet as described above is fine.
It is computationally wasteful.
📌 The element-wise multiplication will be extremely inefficient.
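A minimal NumPy sketch of the inefficiency, using the course's column-per-word layout for $E$ (the specific sizes here are just for illustration):

```python
import numpy as np

vocab_size, embed_dim = 10000, 300
E = np.random.randn(embed_dim, vocab_size)      # embedding matrix, one column per word

o_1234 = np.zeros(vocab_size)
o_1234[1234] = 1.0                              # one-hot vector for word 1234

# Mathematically correct but wasteful: multiplies and sums mostly zeros.
emb_slow = E @ o_1234

# Equivalent and far cheaper: just read out the relevant column.
emb_fast = E[:, 1234]

print(np.allclose(emb_slow, emb_fast))  # True
```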
6. When learning word embeddings, we create an artificial task of estimating $P(\text{target} \mid \text{context})$. It is okay if we do poorly on this artificial prediction task; the more important by-product of this task is that we learn a useful set of word embeddings.
True
False
7. In the word2vec algorithm, you estimate $P(t \mid c)$, where $t$ is the target word and $c$ is a context word. How are $t$ and $c$ chosen from the training set? Pick the best answer.
$c$ is a sequence of several words immediately before $t$.
$c$ is the one word that comes immediately before $t$.
$c$ is the sequence of all the words in the sentence before $t$.
$c$ and $t$ are chosen to be nearby words.
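For reference, a small sketch of one common way skip-gram style (context, target) pairs are sampled: pick a context word, then a target within a window around it (the window size and sentence below are arbitrary choices for illustration):

```python
import random

sentence = "the quick brown fox jumps over the lazy dog".split()
window = 4  # sample a target within +/- 4 words of the context word

def sample_pair(tokens, window):
    # Pick a context word, then a nearby word (within the window) as the target.
    c_idx = random.randrange(len(tokens))
    lo = max(0, c_idx - window)
    hi = min(len(tokens) - 1, c_idx + window)
    t_idx = c_idx
    while t_idx == c_idx:
        t_idx = random.randint(lo, hi)
    return tokens[c_idx], tokens[t_idx]

print(sample_pair(sentence, window))  # e.g. ('fox', 'over')
```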
8. Suppose you have a 10000 word vocabulary, and are learning 100-dimensional word embeddings. The word2vec model uses the following softmax function:

$$P(t \mid c) = \frac{e^{\theta_t^T e_c}}{\sum_{t'=1}^{10000} e^{\theta_{t'}^T e_c}}$$
Which of these statements are correct? Check all that apply.
After training, we should expect $\theta_t$ to be very close to $e_c$ when $t$ and $c$ are the same word.
$\theta_t$ and $e_c$ are both trained with an optimization algorithm.
$\theta_t$ and $e_c$ are both 100 dimensional vectors.
$\theta_t$ and $e_c$ are both 10000 dimensional vectors.
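A minimal NumPy sketch of this softmax, with one $\theta_t$ and one $e_c$ vector per vocabulary word (randomly initialized here just to make the code runnable):

```python
import numpy as np

vocab_size, embed_dim = 10000, 100
theta = np.random.randn(vocab_size, embed_dim) * 0.01  # one theta_t per target word
E     = np.random.randn(vocab_size, embed_dim) * 0.01  # one e_c per context word

def p_target_given_context(t, c):
    # Softmax over all 10000 possible target words: P(t | c).
    logits = theta @ E[c]          # theta_t' . e_c for every candidate t'
    logits -= logits.max()         # subtract max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return probs[t]

print(p_target_given_context(t=42, c=1234))
```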
9. Suppose you have a 10000 word vocabulary, and are learning 500-dimensional word embeddings. The GloVe model minimizes this objective:

$$\min \sum_{i=1}^{10000} \sum_{j=1}^{10000} f(X_{ij})\left(\theta_i^T e_j + b_i + b_j' - \log X_{ij}\right)^2$$

True/False: $\theta_i$ and $e_j$ should be initialized to 0 at the beginning of training.
True
False
📌 $\theta_i$ and $e_j$ should be initialized randomly at the beginning of training.
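A small sketch of a single $(i, j)$ term of this objective, with $\theta$ and $e$ initialized randomly as the note above says. The weighting function $f$ below uses the commonly cited GloVe defaults ($x_{max} = 100$, $\alpha = 0.75$), and the co-occurrence count $X_{ij}$ is an arbitrary example value, both assumptions for illustration only:

```python
import numpy as np

vocab_size, embed_dim = 10000, 500

# Random (not zero!) initialization for the word vectors.
theta = np.random.randn(vocab_size, embed_dim) * 0.01
e     = np.random.randn(vocab_size, embed_dim) * 0.01
b     = np.zeros(vocab_size)    # bias b_i
b_p   = np.zeros(vocab_size)    # bias b'_j

def glove_term(i, j, X_ij, f):
    # One (i, j) term: f(X_ij) * (theta_i . e_j + b_i + b'_j - log X_ij)^2
    return f(X_ij) * (theta[i] @ e[j] + b[i] + b_p[j] - np.log(X_ij)) ** 2

f = lambda x: min((x / 100.0) ** 0.75, 1.0)  # typical GloVe weighting function
print(glove_term(3, 7, X_ij=12.0, f=f))
```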
10. You have trained word embeddings using a text dataset of $m_1$ words. You are considering using these word embeddings for a language task, for which you have a separate labeled dataset of $m_2$ words. Keeping in mind that using word embeddings is a form of transfer learning, under which of these circumstances would you expect the word embeddings to be helpful?
$m_1 >> m_2$
📌 What is learned from the large unlabeled corpus ($m_1$ words) should transfer to the task with the much smaller labeled dataset ($m_2$ words), so the embeddings help when $m_1 >> m_2$.