Lecture 24: Machine Learning Compiler and Deployment
In this lecture, we walk through some example usage of the machine learning compiler Apache TVM. To learn more, check out https://tvm.apache.org/
The content of this lecture is adapted from TVM's tutorials.
Install package
To get started, we need to obtain a version of TVM. For a quick demo, we use the following command to install a recent nightly build of the TVM Unity compiler and the related language model package.
Looking in links: https://mlc.ai/wheels
Collecting mlc-ai-nightly-cu121
Downloading https://github.com/mlc-ai/package/releases/download/v0.9.dev0/mlc_ai_nightly_cu121-0.18.dev184-cp310-cp310-manylinux_2_28_x86_64.whl (1115.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 GB 1.2 MB/s eta 0:00:00
Collecting mlc-llm-nightly-cu121
Downloading https://github.com/mlc-ai/package/releases/download/v0.9.dev0/mlc_llm_nightly_cu121-0.18.dev33-cp310-cp310-manylinux_2_28_x86_64.whl (167.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 167.5/167.5 MB 6.1 MB/s eta 0:00:00
Requirement already satisfied: attrs in /usr/local/lib/python3.10/dist-packages (from mlc-ai-nightly-cu121) (24.2.0)
Requirement already satisfied: cloudpickle in /usr/local/lib/python3.10/dist-packages (from mlc-ai-nightly-cu121) (3.1.0)
Requirement already satisfied: decorator in /usr/local/lib/python3.10/dist-packages (from mlc-ai-nightly-cu121) (4.4.2)
Requirement already satisfied: ml-dtypes in /usr/local/lib/python3.10/dist-packages (from mlc-ai-nightly-cu121) (0.4.1)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from mlc-ai-nightly-cu121) (1.26.4)
Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from mlc-ai-nightly-cu121) (24.2)
Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from mlc-ai-nightly-cu121) (5.9.5)
Requirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages (from mlc-ai-nightly-cu121) (1.13.1)
Requirement already satisfied: tornado in /usr/local/lib/python3.10/dist-packages (from mlc-ai-nightly-cu121) (6.3.3)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from mlc-ai-nightly-cu121) (4.12.2)
Collecting fastapi (from mlc-llm-nightly-cu121)
Downloading fastapi-0.115.5-py3-none-any.whl.metadata (27 kB)
Collecting uvicorn (from mlc-llm-nightly-cu121)
Downloading uvicorn-0.32.1-py3-none-any.whl.metadata (6.6 kB)
Collecting shortuuid (from mlc-llm-nightly-cu121)
Downloading shortuuid-1.0.13-py3-none-any.whl.metadata (5.8 kB)
Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from mlc-llm-nightly-cu121) (2.5.1+cu121)
Requirement already satisfied: safetensors in /usr/local/lib/python3.10/dist-packages (from mlc-llm-nightly-cu121) (0.4.5)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from mlc-llm-nightly-cu121) (2.32.3)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from mlc-llm-nightly-cu121) (4.66.6)
Collecting tiktoken (from mlc-llm-nightly-cu121)
Downloading tiktoken-0.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Requirement already satisfied: prompt-toolkit in /usr/local/lib/python3.10/dist-packages (from mlc-llm-nightly-cu121) (3.0.48)
Requirement already satisfied: openai in /usr/local/lib/python3.10/dist-packages (from mlc-llm-nightly-cu121) (1.54.4)
Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-packages (from mlc-llm-nightly-cu121) (4.46.2)
Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from mlc-llm-nightly-cu121) (2.2.2)
Collecting datasets (from mlc-llm-nightly-cu121)
Downloading datasets-3.1.0-py3-none-any.whl.metadata (20 kB)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from datasets->mlc-llm-nightly-cu121) (3.16.1)
Requirement already satisfied: pyarrow>=15.0.0 in /usr/local/lib/python3.10/dist-packages (from datasets->mlc-llm-nightly-cu121) (17.0.0)
Collecting dill<0.3.9,>=0.3.0 (from datasets->mlc-llm-nightly-cu121)
Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets->mlc-llm-nightly-cu121)
Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets->mlc-llm-nightly-cu121)
Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2024.9.0,>=2023.1.0 (from fsspec[http]<=2024.9.0,>=2023.1.0->datasets->mlc-llm-nightly-cu121)
Downloading fsspec-2024.9.0-py3-none-any.whl.metadata (11 kB)
Requirement already satisfied: aiohttp in /usr/local/lib/python3.10/dist-packages (from datasets->mlc-llm-nightly-cu121) (3.11.2)
Requirement already satisfied: huggingface-hub>=0.23.0 in /usr/local/lib/python3.10/dist-packages (from datasets->mlc-llm-nightly-cu121) (0.26.2)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from datasets->mlc-llm-nightly-cu121) (6.0.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->mlc-llm-nightly-cu121) (3.4.0)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->mlc-llm-nightly-cu121) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->mlc-llm-nightly-cu121) (2.2.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->mlc-llm-nightly-cu121) (2024.8.30)
Collecting starlette<0.42.0,>=0.40.0 (from fastapi->mlc-llm-nightly-cu121)
Downloading starlette-0.41.3-py3-none-any.whl.metadata (6.0 kB)
Requirement already satisfied: pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4 in /usr/local/lib/python3.10/dist-packages (from fastapi->mlc-llm-nightly-cu121) (2.9.2)
Requirement already satisfied: anyio<5,>=3.5.0 in /usr/local/lib/python3.10/dist-packages (from openai->mlc-llm-nightly-cu121) (3.7.1)
Requirement already satisfied: distro<2,>=1.7.0 in /usr/local/lib/python3.10/dist-packages (from openai->mlc-llm-nightly-cu121) (1.9.0)
Requirement already satisfied: httpx<1,>=0.23.0 in /usr/local/lib/python3.10/dist-packages (from openai->mlc-llm-nightly-cu121) (0.27.2)
Requirement already satisfied: jiter<1,>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from openai->mlc-llm-nightly-cu121) (0.7.1)
Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from openai->mlc-llm-nightly-cu121) (1.3.1)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas->mlc-llm-nightly-cu121) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->mlc-llm-nightly-cu121) (2024.2)
Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.10/dist-packages (from pandas->mlc-llm-nightly-cu121) (2024.2)
Requirement already satisfied: wcwidth in /usr/local/lib/python3.10/dist-packages (from prompt-toolkit->mlc-llm-nightly-cu121) (0.2.13)
Requirement already satisfied: regex>=2022.1.18 in /usr/local/lib/python3.10/dist-packages (from tiktoken->mlc-llm-nightly-cu121) (2024.9.11)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch->mlc-llm-nightly-cu121) (3.4.2)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch->mlc-llm-nightly-cu121) (3.1.4)
Requirement already satisfied: sympy==1.13.1 in /usr/local/lib/python3.10/dist-packages (from torch->mlc-llm-nightly-cu121) (1.13.1)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from sympy==1.13.1->torch->mlc-llm-nightly-cu121) (1.3.0)
Requirement already satisfied: tokenizers<0.21,>=0.20 in /usr/local/lib/python3.10/dist-packages (from transformers->mlc-llm-nightly-cu121) (0.20.3)
Requirement already satisfied: click>=7.0 in /usr/local/lib/python3.10/dist-packages (from uvicorn->mlc-llm-nightly-cu121) (8.1.7)
Requirement already satisfied: h11>=0.8 in /usr/local/lib/python3.10/dist-packages (from uvicorn->mlc-llm-nightly-cu121) (0.14.0)
Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.5.0->openai->mlc-llm-nightly-cu121) (1.2.2)
Requirement already satisfied: aiohappyeyeballs>=2.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->mlc-llm-nightly-cu121) (2.4.3)
Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->mlc-llm-nightly-cu121) (1.3.1)
Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->mlc-llm-nightly-cu121) (1.5.0)
Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->mlc-llm-nightly-cu121) (6.1.0)
Requirement already satisfied: propcache>=0.2.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->mlc-llm-nightly-cu121) (0.2.0)
Requirement already satisfied: yarl<2.0,>=1.17.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->mlc-llm-nightly-cu121) (1.17.2)
Requirement already satisfied: async-timeout<6.0,>=4.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->mlc-llm-nightly-cu121) (4.0.3)
Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.10/dist-packages (from httpx<1,>=0.23.0->openai->mlc-llm-nightly-cu121) (1.0.7)
Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.10/dist-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi->mlc-llm-nightly-cu121) (0.7.0)
Requirement already satisfied: pydantic-core==2.23.4 in /usr/local/lib/python3.10/dist-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi->mlc-llm-nightly-cu121) (2.23.4)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.2->pandas->mlc-llm-nightly-cu121) (1.16.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch->mlc-llm-nightly-cu121) (3.0.2)
Downloading datasets-3.1.0-py3-none-any.whl (480 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 480.6/480.6 kB 13.4 MB/s eta 0:00:00
Downloading fastapi-0.115.5-py3-none-any.whl (94 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 94.9/94.9 kB 7.9 MB/s eta 0:00:00
Downloading shortuuid-1.0.13-py3-none-any.whl (10 kB)
Downloading tiktoken-0.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 50.3 MB/s eta 0:00:00
Downloading uvicorn-0.32.1-py3-none-any.whl (63 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.8/63.8 kB 6.7 MB/s eta 0:00:00
Downloading dill-0.3.8-py3-none-any.whl (116 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 116.3/116.3 kB 12.2 MB/s eta 0:00:00
Downloading fsspec-2024.9.0-py3-none-any.whl (179 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 179.3/179.3 kB 16.6 MB/s eta 0:00:00
Downloading multiprocess-0.70.16-py310-none-any.whl (134 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 134.8/134.8 kB 13.5 MB/s eta 0:00:00
Downloading starlette-0.41.3-py3-none-any.whl (73 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 73.2/73.2 kB 7.3 MB/s eta 0:00:00
Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 194.1/194.1 kB 18.4 MB/s eta 0:00:00
Installing collected packages: xxhash, uvicorn, shortuuid, fsspec, dill, tiktoken, starlette, multiprocess, mlc-ai-nightly-cu121, fastapi, datasets, mlc-llm-nightly-cu121
Attempting uninstall: fsspec
Found existing installation: fsspec 2024.10.0
Uninstalling fsspec-2024.10.0:
Successfully uninstalled fsspec-2024.10.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gcsfs 2024.10.0 requires fsspec==2024.10.0, but you have fsspec 2024.9.0 which is incompatible.
Successfully installed datasets-3.1.0 dill-0.3.8 fastapi-0.115.5 fsspec-2024.9.0 mlc-ai-nightly-cu121-0.18.dev184 mlc-llm-nightly-cu121-0.18.dev33 multiprocess-0.70.16 shortuuid-1.0.13 starlette-0.41.3 tiktoken-0.8.0 uvicorn-0.32.1 xxhash-3.5.0
Loop-level representation and transformations
Let us start with a vector add example. The following code snippet creates a vector add program and stores it in a container called IRModule.
An IRModule contains a collection of low-level functions. We can use the script function to inspect the functions inside an IRModule.
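For intuition, the low-level function in this example amounts to an element-wise sum written as a single explicit loop. The sketch below is plain Python, not TVMScript, and it does not use the TVM API. The function name `vector_add` and the length 128 are assumptions chosen for illustration.

```python
def vector_add(a, b, c):
    """Write a[i] + b[i] into c[i] using one explicit loop."""
    n = len(c)
    for i in range(n):
        c[i] = a[i] + b[i]


n = 128  # assumed vector length, for illustration only
a = [float(i) for i in range(n)]
b = [1.0] * n
c = [0.0] * n  # output buffer, filled in place
vector_add(a, b, c)
```

The output is passed in as a third buffer rather than returned, which mirrors the way loop-level programs typically take their inputs and outputs as explicit arrays.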
Build and run
We can turn the programs in an IRModule into runnable functions by calling a build function.
After the build, mod contains a collection of runnable functions. We can retrieve each function by its name.
To invoke the function, we create three NDArrays in the TVM runtime and pass them to the generated function.
Transform the code
The IRModule is the central data structure for program optimization. It can be transformed by a helper class called Schedule, which provides primitive methods for interactively transforming the program. Each primitive rewrites the program in a specific way, such as splitting or reordering loops, to enable additional performance optimizations.
Let us try to transform the module. We start by creating a Schedule instance.
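One such primitive is loop splitting, which replaces a single loop with an outer and an inner loop that together visit the same iterations. The sketch below shows the effect of a split in plain Python rather than through the Schedule class. The function names and the split factor of 8 are assumptions for illustration.

```python
def vector_add_reference(a, b, c):
    """Original program: a single loop over all elements."""
    for i in range(len(c)):
        c[i] = a[i] + b[i]


def vector_add_split(a, b, c, factor=8):
    """Same computation after splitting loop i into (i_outer, i_inner)."""
    n = len(c)
    assert n % factor == 0, "this sketch assumes factor divides n"
    for i_outer in range(n // factor):
        for i_inner in range(factor):
            i = i_outer * factor + i_inner  # recover the original index
            c[i] = a[i] + b[i]


n = 128
a = [float(i) for i in range(n)]
b = [2.0] * n
c_ref = [0.0] * n
c_split = [0.0] * n
vector_add_reference(a, b, c_ref)
vector_add_split(a, b, c_split, factor=8)
```

The split changes only the loop structure, not the result. It matters because the inner loop can then be treated separately, for example as a candidate for vectorization.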
Transforming a matrix multiplication program
In the above example, we showed how to transform a vector add. Now let us apply the same idea to a slightly more complicated program (matrix multiplication).
We can transform the loop access pattern to make it more cache-friendly. Let us use the following schedule.
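To make the idea of a cache-friendly access pattern concrete, the sketch below tiles the i and j loops of a matrix multiplication by a block size bn and moves the reduction loop k outside the inner tile loops. It is plain Python written for illustration, not the schedule used in this lecture and not TVM API. The function names, the matrix size 16, and the block size 4 are assumptions.

```python
def matmul_naive(A, B, n):
    """Reference triple loop: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_tiled(A, B, n, bn):
    """Same computation with i and j split by bn and loops reordered."""
    assert n % bn == 0, "this sketch assumes bn divides n"
    C = [[0.0] * n for _ in range(n)]
    for i_outer in range(0, n, bn):
        for j_outer in range(0, n, bn):
            for k in range(n):
                # Each (i_outer, j_outer) pair updates one bn x bn tile of C.
                for i in range(i_outer, i_outer + bn):
                    a_ik = A[i][k]
                    for j in range(j_outer, j_outer + bn):
                        C[i][j] += a_ik * B[k][j]
    return C


n = 16
A = [[float(i + j) for j in range(n)] for i in range(n)]
B = [[float(i - j) for j in range(n)] for i in range(n)]
C_ref = matmul_naive(A, B, n)
C_tiled = matmul_tiled(A, B, n, bn=4)
```

Within a tile, the same small region of C and the same row segments of B are reused many times before the loops move on, which is what makes the pattern friendlier to caches in a compiled implementation. In interpreted Python the timing benefit is not representative; the sketch only shows the loop structure.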
Try changing the value of bn to see what performance you can get. In practice, we leverage an automated system that searches over a set of possible transformations to find the best-performing one.
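As a toy version of such a search, the sketch below times a tiled matrix multiplication for a few candidate values of bn and keeps the fastest. It is plain Python and does not use TVM's tuning facilities. The helper names, the candidate list, and the matrix size are assumptions, and the measured times depend on the machine.

```python
import time


def matmul_tiled(A, B, n, bn):
    """Matrix multiplication with the i and j loops tiled by bn."""
    assert n % bn == 0, "this sketch assumes bn divides n"
    C = [[0.0] * n for _ in range(n)]
    for i_outer in range(0, n, bn):
        for j_outer in range(0, n, bn):
            for k in range(n):
                for i in range(i_outer, i_outer + bn):
                    a_ik = A[i][k]
                    for j in range(j_outer, j_outer + bn):
                        C[i][j] += a_ik * B[k][j]
    return C


def search_block_size(n, candidates):
    """Time each candidate bn once and return (fastest bn, all timings)."""
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for bn in candidates:
        start = time.perf_counter()
        matmul_tiled(A, B, n, bn)
        timings[bn] = time.perf_counter() - start
    best_bn = min(timings, key=timings.get)
    return best_bn, timings


candidates = [1, 4, 8, 16]  # assumed search space, for illustration
best_bn, timings = search_block_size(32, candidates)
print("fastest bn:", best_bn)
```

A real tuner explores a much larger space (loop orders, vectorization, parallelization) and measures each candidate several times on the target hardware; this sketch covers only the single parameter bn with one measurement per candidate.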
There are other optimizations that can be applied here, such as vectorization, parallelization and data layout optimization. Please checkout
End-to-end model deployment
Finally, let us walk through an example flow for end-to-end model deployment.