
Week 2 Quiz - Natural Language Processing & Word Embeddings


1. True/False: Suppose you learn a word embedding for a vocabulary of 20000 words. Then the embedding vectors could be 1000 dimensional, so as to capture the full range of variation and meaning in those words.

  • True

  • False

📌 The dimension of word vectors is usually smaller than the size of the vocabulary. Most common sizes for word vectors range between 50 and 1000.
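For a rough sense of scale (an illustrative sketch, not part of the quiz; the 300-dimensional size is just an example choice within the common range):

```python
import numpy as np

vocab_size = 20_000     # words in the vocabulary
embedding_dim = 300     # typical embedding size, usually in the 50-1000 range

# Each column of E is the embedding vector of one word; E is 300 x 20000,
# far smaller than a 20000-dimensional vector per word would require.
E = np.random.randn(embedding_dim, vocab_size) * 0.01
print(E.shape)  # (300, 20000)
```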


2. What is t-SNE?

  • A non-linear dimensionality reduction technique

  • ...
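As a minimal illustration of what t-SNE does with word embeddings (a sketch assuming scikit-learn; the array of embeddings here is random placeholder data):

```python
import numpy as np
from sklearn.manifold import TSNE

# Placeholder embeddings: 1,000 words, 300 dimensions each.
embeddings = np.random.randn(1_000, 300)

# Non-linear reduction from 300-D down to 2-D for plotting; perplexity is a tunable knob.
points_2d = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(embeddings)
print(points_2d.shape)  # (1000, 2)
```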


3. Suppose you download a pre-trained word embedding which has been trained on a huge corpus of text. You then use this word embedding to train an RNN for a language task of recognizing if someone is happy from a short snippet of text, using a small training set.

| x (input text) | y (happy?) |
| --- | --- |
| Having a great time! | 1 |
| I'm sad it's raining. | 0 |
| I'm feeling awesome! | 1 |

Even if the word “wonderful” does not appear in your small training set, what label might be reasonably expected for the input text “I feel wonderful!”?

  • y=1

  • y=0

📌 Yes, word vectors empower your model with an incredible ability to generalize. The vector for "wonderful" would carry a positive/happy connotation, which will probably lead your model to classify the sentence as a "1".
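One way to see the mechanism (a hedged sketch, not the course's exact RNN setup; the tiny embedding dict is a stand-in for real pre-trained vectors): an unseen word such as "wonderful" still has a pre-trained vector close to "awesome" and "great", so the features the classifier sees look much like those of the happy training examples.

```python
import numpy as np

# Stand-in for pre-trained embeddings (word -> vector) learned on a huge corpus.
emb = {w: np.random.randn(50) for w in ["i", "feel", "great", "awesome", "wonderful"]}

def snippet_features(text):
    """Average the pre-trained vectors of the known words in the snippet."""
    words = text.lower().replace("!", "").split()
    return np.mean([emb[w] for w in words if w in emb], axis=0)

x = snippet_features("I feel wonderful!")  # resembles the "happy" training snippets
```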


4. Which of these equations do you think should hold for a good word embedding? (Check all that apply)

  • $e_{man} - e_{king} \approx e_{queen} - e_{woman}$

  • $e_{man} - e_{woman} \approx e_{king} - e_{queen}$

  • $e_{man} - e_{king} \approx e_{woman} - e_{queen}$

  • $e_{man} - e_{woman} \approx e_{queen} - e_{king}$
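Analogy relations like these are usually tested with vector arithmetic plus cosine similarity, roughly as in the sketch below (helper names are made up; `embeddings` is assumed to be a dict mapping words to numpy vectors): for a good embedding, $e_{king} - e_{man} + e_{woman}$ should land closest to $e_{queen}$.

```python
import numpy as np

def cosine_similarity(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(a, b, c, embeddings):
    """Return the word whose vector is closest to e_b - e_a + e_c (excluding a, b, c)."""
    target = embeddings[b] - embeddings[a] + embeddings[c]
    candidates = (w for w in embeddings if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine_similarity(embeddings[w], target))

# With real pre-trained vectors, analogy("man", "king", "woman", embeddings)
# is expected to return "queen".
```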


5. Let $E$ be an embedding matrix, and let $o_{1234}$ be a one-hot vector corresponding to word 1234. Then to get the embedding of word 1234, why don't we call $E * o_{1234}$ in Python?

  • The correct formula is $E^T * o_{1234}$

  • This doesn't handle unknown words (<UNK>).

  • None of the above: calling the Python snippet as described above is fine.

  • It is computationally wasteful.

📌 The multiplication will be extremely inefficient: almost every entry of $o_{1234}$ is zero, so in practice the embedding is read directly from the corresponding column of $E$ with a specialized lookup instead.
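A small numpy sketch of why the multiplication is wasteful (shapes chosen arbitrarily for illustration): the product spends millions of operations multiplying by zeros, while indexing the matching column of $E$ returns the same vector directly.

```python
import numpy as np

vocab_size, embedding_dim = 10_000, 300
E = np.random.randn(embedding_dim, vocab_size)   # embedding matrix

o_1234 = np.zeros(vocab_size)
o_1234[1234] = 1.0                               # one-hot vector for word 1234

wasteful = E @ o_1234      # ~3 million multiply-adds, almost all with zeros
efficient = E[:, 1234]     # direct column lookup, no arithmetic at all

assert np.allclose(wasteful, efficient)
```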


6. When learning word embeddings, we create an artificial task of estimating $P(target \mid context)$. It is okay if we do poorly on this artificial prediction task; the more important by-product of this task is that we learn a useful set of word embeddings.

  • True

  • False


7. In the word2vec algorithm, you estimate $P(t \mid c)$, where $t$ is the target word and $c$ is a context word. How are $t$ and $c$ chosen from the training set? Pick the best answer.

  • $c$ is a sequence of several words immediately before $t$

  • $c$ is the one word that comes immediately before $t$

  • $c$ is the sequence of all the words in the sentence before $t$

  • $c$ and $t$ are chosen to be nearby words.
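Roughly how the (context, target) pairs are sampled in skip-gram word2vec (a sketch; the window size of 5 is just an example value):

```python
import random

def sample_pairs(tokens, window=5):
    """Yield (context, target) pairs of nearby words, skip-gram style."""
    for i, c in enumerate(tokens):
        # Pick a target uniformly from within +/- window of the context word.
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        j = random.choice([k for k in range(lo, hi) if k != i])
        yield c, tokens[j]

pairs = list(sample_pairs("the quick brown fox jumps over the lazy dog".split()))
```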


8. Suppose you have a 10000 word vocabulary, and are learning 100-dimensional word embeddings. The word2vec model uses the following softmax function:

$$P(t \mid c) = \frac{e^{\theta_t^T e_c}}{\sum_{t'=1}^{10000} e^{\theta_{t'}^T e_c}}$$

Which of these statements are correct? Check all that apply.

  • After training, we should expect $\theta_t$ to be very close to $e_c$ when $t$ and $c$ are the same word.

  • $\theta_t$ and $e_c$ are both trained with an optimization algorithm.

  • $\theta_t$ and $e_c$ are both 100-dimensional vectors.

  • $\theta_t$ and $e_c$ are both 10000-dimensional vectors.
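A minimal numerical sketch of this softmax (variable names and random values are placeholders; shapes follow the question): $\theta_t$ and $e_c$ are each 100-dimensional, and the normalization sums over all 10,000 possible target words.

```python
import numpy as np

vocab_size, embedding_dim = 10_000, 100
theta = np.random.randn(vocab_size, embedding_dim) * 0.01  # one theta_t per target word
E = np.random.randn(embedding_dim, vocab_size) * 0.01      # one e_c per context word

def p_target_given_context(t, c):
    e_c = E[:, c]                                    # 100-dimensional context embedding
    logits = theta @ e_c                             # theta_{t'}^T e_c for every t'
    return np.exp(logits[t]) / np.exp(logits).sum()  # softmax over the 10,000 words

print(p_target_given_context(42, 1234))              # roughly 1/10000 before training
```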


9. Suppose you have a 10000 word vocabulary, and are learning 500-dimensional word embeddings. The GloVe model minimizes this objective:

$$\min \sum_{i=1}^{10,000} \sum_{j=1}^{10,000} f(X_{ij}) \left(\theta_i^T e_j + b_i + b_j' - \log X_{ij}\right)^2$$

True/False: $\theta_i$ and $e_j$ should be initialized to 0 at the beginning of training.

  • True

  • False

📌 $\theta_i$ and $e_j$ should be initialized randomly at the beginning of training.
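A small sketch of the random initialization (parameter names and the scale are assumptions): all of $\theta_i$, $e_j$, $b_i$, and $b_j'$ start with small random values so that gradient descent can break symmetry, rather than starting at 0.

```python
import numpy as np

vocab_size, embedding_dim = 10_000, 500
rng = np.random.default_rng(0)

theta = rng.normal(scale=0.1, size=(vocab_size, embedding_dim))  # theta_i
e     = rng.normal(scale=0.1, size=(vocab_size, embedding_dim))  # e_j
b     = rng.normal(scale=0.1, size=vocab_size)                   # b_i
b_p   = rng.normal(scale=0.1, size=vocab_size)                   # b_j'

def objective_term(i, j, X_ij, f):
    """One (i, j) term of the GloVe objective for co-occurrence count X_ij."""
    return f(X_ij) * (theta[i] @ e[j] + b[i] + b_p[j] - np.log(X_ij)) ** 2
```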


10. You have trained word embeddings using a text dataset of $s_1$ words. You are considering using these word embeddings for a language task, for which you have a separate labeled dataset of $s_2$ words. Keeping in mind that using word embeddings is a form of transfer learning, under which of these circumstances would you expect the word embeddings to be helpful?

  • $s_1 \ll s_2$

  • $s_1 \gg s_2$

📌 Transfer learning is most helpful when you transfer from a task with a large dataset ($s_1$) to one with a smaller dataset ($s_2$), i.e. when $s_1 \gg s_2$.