GitHub Repository: huggingface/notebooks
Path: blob/main/course/fr/chapter2/section4_tf.ipynb
Kernel: Python 3

Tokenizers (TensorFlow)

Install the 🤗 Transformers library to run this notebook.

!pip install transformers[sentencepiece]
# Word-based tokenization: split the raw text on whitespace
tokenized_text = "Jim Henson était marionnettiste.".split()
print(tokenized_text)
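A model needs numbers, not words, so each word from a split like the one above has to be mapped to an ID. The following is a minimal sketch of that idea, not part of the original notebook: the helper names `build_vocab` and `encode` and the `[UNK]` placeholder are hypothetical choices made for illustration.

```python
def build_vocab(words, unk_token="[UNK]"):
    """Assign an integer ID to every distinct word; ID 0 is reserved for unknown words."""
    vocab = {unk_token: 0}
    for word in words:
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def encode(words, vocab, unk_token="[UNK]"):
    """Look up each word's ID, falling back to the unknown-token ID."""
    return [vocab.get(word, vocab[unk_token]) for word in words]


tokenized_text = "Jim Henson était marionnettiste.".split()
vocab = build_vocab(tokenized_text)

ids = encode(tokenized_text, vocab)
# A word that was never seen when building the vocabulary maps to [UNK]
unseen = encode("Jim Henson était acteur.".split(), vocab)

print(ids)
print(unseen)
```

Note that punctuation stays attached to the last word ("marionnettiste."), and that any unseen word collapses to the same unknown ID. These are two of the limitations that subword tokenizers are meant to reduce.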
# Load the tokenizer with the model-specific class
from transformers import CamembertTokenizer

tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
# AutoTokenizer selects the appropriate tokenizer class from the checkpoint name
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("camembert-base")
tokenizer("Utiliser un Transformer est simple")
tokenizer.save_pretrained("répertoire_sur_mon_ordinateur")
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("camembert-base")

sequence = "Utiliser un Transformer est simple"
# Split the sequence into subword tokens
tokens = tokenizer.tokenize(sequence)

print(tokens)
# Map each token to its ID in the tokenizer's vocabulary
ids = tokenizer.convert_tokens_to_ids(tokens)

print(ids)
# Convert the IDs back into a readable string
decoded_string = tokenizer.decode(ids)

print(decoded_string)
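The three steps above (tokens, then IDs, then decoded text) can be mimicked without downloading a model. The sketch below is only an illustration under stated assumptions: `toy_vocab` is a hypothetical hand-written vocabulary, not the real camembert-base one, and the split of "Transformer" into two pieces is invented for the example. The "▁" prefix follows the SentencePiece convention of marking the start of a word.

```python
# Hypothetical toy vocabulary (NOT the real camembert-base vocabulary)
toy_vocab = {
    "<unk>": 0,
    "▁Utiliser": 1,
    "▁un": 2,
    "▁Transform": 3,
    "er": 4,
    "▁est": 5,
    "▁simple": 6,
}
id_to_token = {token_id: token for token, token_id in toy_vocab.items()}


def toy_tokens_to_ids(tokens):
    """Look up each token's ID, using the <unk> ID for anything missing."""
    return [toy_vocab.get(token, toy_vocab["<unk>"]) for token in tokens]


def toy_decode(ids):
    """Join the token strings, then turn word-start markers back into spaces."""
    text = "".join(id_to_token[token_id] for token_id in ids)
    return text.replace("▁", " ").strip()


# Assumed subword split, written by hand for the example
toy_tokens = ["▁Utiliser", "▁un", "▁Transform", "er", "▁est", "▁simple"]
toy_ids = toy_tokens_to_ids(toy_tokens)
toy_decoded = toy_decode(toy_ids)

print(toy_ids)
print(toy_decoded)
```

Decoding does more than map IDs back to tokens: it also merges pieces that belong to the same word, which is why "▁Transform" and "er" come back as a single word.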