
GitHub Repository: dlsyscourse/public_notebooks
Path: blob/main/24_machine_learning_compilation_deployment_implementation.ipynb

Lecture 24: Machine Learning Compiler and Deployment

In this lecture, we will walk through some example usage of the machine learning compiler Apache TVM. To learn more, check out https://tvm.apache.org/

The content of this lecture is adapted from TVM's tutorials.

Install package

To get started, we need to obtain a version of TVM. For quick demo purposes, we will use the following command to install the latest nightly build of the TVM Unity compiler and the related language model dependencies.

!pip install --pre mlc-ai-nightly-cu118 mlc-chat-nightly-cu118 -f https://mlc.ai/wheels
Looking in links: https://mlc.ai/wheels
Collecting mlc-ai-nightly-cu118
  Downloading mlc_ai_nightly_cu118-0.12.dev1880-cp310-cp310-manylinux_2_28_x86_64.whl (544.5 MB)
Collecting mlc-chat-nightly-cu118
  Downloading mlc_chat_nightly_cu118-0.1.dev646-cp310-cp310-manylinux_2_28_x86_64.whl (60.4 MB)
...
Successfully installed fastapi-0.104.1 h11-0.14.0 mlc-ai-nightly-cu118-0.12.dev1880 mlc-chat-nightly-cu118-0.1.dev646 shortuuid-1.0.11 starlette-0.27.0 typing-extensions-4.9.0rc1 uvicorn-0.24.0.post1

Loop-level representation and transformations

Let us start with a vector add example. The following code snippet creates a vector add program and stores it in a container called an IRModule.

import tvm
from tvm.ir.module import IRModule
from tvm.script import tir as T
import numpy as np
def lnumpy_add(a, b, c):
    for i in range(128):
        c[i] = a[i] + b[i]
from tvm import te

A = te.placeholder(shape=(128,), dtype="float32", name="A")
B = te.placeholder(shape=(128,), dtype="float32", name="B")
C = te.compute((128,), lambda i: A[i] + B[i], name="C")
func = te.create_prim_func([A, B, C])
func = func.with_attr("global_symbol", "main")
ir_module = IRModule({"main": func})

An IRModule contains a collection of low-level functions. We can use the show method to inspect the functions inside an IRModule.

ir_module.show()

Build and run

We can turn the programs in an IRModule into runnable functions by calling the tvm.build function.

rt_mod = tvm.build(ir_module, target="llvm")  # The module for CPU backends.
print(type(rt_mod))
<class 'tvm.driver.build_module.OperatorModule'>

After the build, rt_mod contains a collection of runnable functions. We can retrieve each function by its name.

func = rt_mod["main"]
func
a = tvm.nd.array(np.arange(128, dtype="float32"))
b = tvm.nd.array(np.ones(128, dtype="float32"))
c = tvm.nd.empty((128,), dtype="float32")
c
<tvm.nd.NDArray shape=(128,), cpu(0)>
array([ 1.5414283e-44,  0.0000000e+00, -1.8209923e+38, ...,
        0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32)

To invoke the function, we pass in the three NDArrays we created in the TVM runtime. Note that c was created with tvm.nd.empty, so it contains uninitialized values until the generated function writes to it.

func(a, b, c)
print(a)
print(b)
print(c)
[  0.   1.   2. ... 125. 126. 127.]
[1. 1. 1. ... 1. 1. 1.]
[  1.   2.   3. ... 126. 127. 128.]
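
As a sanity check, we can compare the result against plain numpy. This is a minimal sketch; assert_allclose raises an error if any element differs.

# Verify the TVM result against a plain numpy reference.
np.testing.assert_allclose(c.numpy(), a.numpy() + b.numpy())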

Transform the code

The IRModule is the central data structure for program optimization. It can be transformed by a helper class called Schedule. A schedule exposes multiple primitive methods that interactively transform the program; each primitive transforms the program in a particular way to enable additional performance optimizations.

Let us try to transform the module. We can do so by creating a Schedule instance.

sch = tvm.tir.Schedule(ir_module)
sch.mod.show()
blockC = sch.get_block("C")
i, = sch.get_loops(blockC)
i0, i1 = sch.split(i, factors=[None, 8])
sch.mod.show()
sch.reorder(i1, i0)
sch.mod.show()
sch.reorder(i0, i1)
sch.mod.show()
sch.parallel(i0)
sch.vectorize(i1)
sch.mod.show()
sch = tvm.tir.Schedule(ir_module)
print(type(sch))
<class 'tvm.tir.schedule.schedule.Schedule'>
ir_module.show()
sch = tvm.tir.Schedule(ir_module)
block_C = sch.get_block("C")
i, = sch.get_loops(block_C)
i0, i1 = sch.split(i, [None, 8])
sch.reorder(i1, i0)
sch.parallel(i1)
sch.mod.show()

Let us first try to split the loop.

# Start from a fresh schedule
sch = tvm.tir.Schedule(ir_module)
# Get block by its name
block_c = sch.get_block("C")
# Get loops surrounding the block
(i,) = sch.get_loops(block_c)
# Tile the loop nest.
i_0, i_1, i_2 = sch.split(i, factors=[None, 4, 4])
sch.mod.show()

We can also reorder the loops, swapping the order of i_0 and i_1.

sch.reorder(i_1, i_0, i_2)
sch.mod.show()

Finally, we can add a hint to the code generator that we want to vectorize the innermost loop.

sch.vectorize(i_2)
sch.mod.show()
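
Conceptually, the transformed program still computes exactly the same result as before; only the loop structure changes. A numpy-style equivalent of the split, reordered, and vectorized program (a hand-written sketch, not compiler output) looks like this:

def lnumpy_add_transformed(a, b, c):
    # Loop structure after split(i, factors=[None, 4, 4]) and
    # reorder(i_1, i_0, i_2); the i_2 loop is the one marked for vectorization.
    for i_1 in range(4):
        for i_0 in range(8):
            for i_2 in range(4):
                i = i_0 * 16 + i_1 * 4 + i_2
                c[i] = a[i] + b[i]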

Transforming a matrix multiplication program

In the above example, we showed how to transform a vector add. Now let us try to apply that to a slightly more complicated program (matrix multiplication).

M = 1024
K = 1024
N = 1024

# The default tensor type in tvm
dtype = "float32"

target = "llvm"
dev = tvm.device(target, 0)

# Algorithm
k = te.reduce_axis((0, K), "k")
A = te.placeholder((M, K), name="A")
B = te.placeholder((K, N), name="B")
C = te.compute((M, N), lambda m, n: te.sum(A[m, k] * B[k, n], axis=k), name="C")

# Default schedule
func = te.create_prim_func([A, B, C])
func = func.with_attr("global_symbol", "main")
ir_module = IRModule({"main": func})
ir_module.show()
func = tvm.build(ir_module, target="llvm")  # The module for CPU backends.

a_np = np.random.rand(M, K).astype(dtype)
b_np = np.random.rand(K, N).astype(dtype)
a = tvm.nd.array(a_np, dev)
b = tvm.nd.array(b_np, dev)
c = tvm.nd.array(np.zeros((M, N), dtype=dtype), dev)
func(a, b, c)

evaluator = func.time_evaluator("main", dev, number=3)
print("Baseline time cost %g sec" % evaluator(a, b, c).mean)
Baseline time cost 3.08041 sec
c.numpy() - a_np @ b_np
array([[ 0.0000000e+00, -1.2207031e-04, 0.0000000e+00, ..., 3.0517578e-05, 0.0000000e+00, 1.2207031e-04], [ 6.1035156e-05, -9.1552734e-05, 3.0517578e-05, ..., -9.1552734e-05, -1.5258789e-05, 0.0000000e+00], [-3.0517578e-05, 1.5258789e-04, 3.0517578e-05, ..., 6.1035156e-05, -1.5258789e-04, 0.0000000e+00], ..., [-6.1035156e-05, -1.2207031e-04, 9.1552734e-05, ..., -1.2207031e-04, -1.3732910e-04, 1.2207031e-04], [-9.1552734e-05, 0.0000000e+00, 6.1035156e-05, ..., 3.0517578e-05, -1.6784668e-04, -3.0517578e-05], [ 2.4414062e-04, -1.5258789e-04, 1.5258789e-04, ..., -1.3732910e-04, -1.2207031e-04, -7.6293945e-05]], dtype=float32)
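
The differences are all at the level of float32 rounding error, since the two implementations accumulate the sum in different orders. Equivalently, as a small sketch, we can assert numerical closeness:

# Allow a small tolerance because float32 summation order differs.
np.testing.assert_allclose(c.numpy(), a_np @ b_np, rtol=1e-5)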
ir_module.show()

We can transform the loop access pattern to make it more cache-friendly. Let us use the following schedule, which tiles the two spatial loops.

def transform(sch, tile_m, tile_n):
    block_C = sch.get_block("C")
    m, n, k = sch.get_loops(block_C)
    mo, mi = sch.split(m, [None, tile_m])
    no, ni = sch.split(n, [None, tile_n])
    sch.reorder(mo, no, k, mi, ni)
    return sch

sch = tvm.tir.Schedule(ir_module)
sch = transform(sch, 8, 8)
sch.mod.show()
sch = tvm.tir.Schedule(ir_module)
sch = transform(sch, 32, 32)
sch.mod.show()

mod = tvm.build(sch.mod, target="llvm")
new_eval = mod.time_evaluator("main", number=3, dev=tvm.cpu())
print("Transformed time cost %g sec" % new_eval(a, b, c).mean)
Transformed time cost 0.300325 sec

Try to change the tile sizes tile_m and tile_n to see what performance you can get. In practice, we will leverage an automated system to search over a set of possible transformations to find an optimal one, as sketched below.
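
Here is a minimal sketch of such a search, built only on the transform helper defined above: a brute-force sweep over a small, arbitrarily chosen grid of tile sizes, timing each candidate.

import itertools

best = None
for tile_m, tile_n in itertools.product([8, 16, 32, 64], repeat=2):
    s = transform(tvm.tir.Schedule(ir_module), tile_m, tile_n)
    mod = tvm.build(s.mod, target="llvm")
    # Measure the mean runtime of this candidate schedule.
    cost = mod.time_evaluator("main", dev=tvm.cpu(), number=3)(a, b, c).mean
    if best is None or cost < best[0]:
        best = (cost, tile_m, tile_n)
print("Best: %g sec with tile_m=%d, tile_n=%d" % best)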

There are other optimizations that can be applied here, such as vectorization, parallelization, and data layout optimization. Please check out the TVM tutorials at https://tvm.apache.org/ to learn more.
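
As one example, here is a small sketch (reusing the transform helper above, untuned) that parallelizes the outermost loop across CPU threads and vectorizes the innermost loop:

sch = transform(tvm.tir.Schedule(ir_module), 32, 32)
block_C = sch.get_block("C")
mo, no, k, mi, ni = sch.get_loops(block_C)
sch.parallel(mo)   # distribute outer row tiles across threads
sch.vectorize(ni)  # use SIMD instructions for the inner column loop
sch.mod.show()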

End-to-end model deployment

Finally, let us walk through an example flow for end-to-end model deployment.

!git lfs install
Git LFS initialized.
!mkdir -p dist/prebuilt
!git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/prebuilt/lib
Cloning into 'dist/prebuilt/lib'... remote: Enumerating objects: 389, done. remote: Counting objects: 100% (115/115), done. remote: Compressing objects: 100% (38/38), done. remote: Total 389 (delta 94), reused 93 (delta 77), pack-reused 274 Receiving objects: 100% (389/389), 126.00 MiB | 14.01 MiB/s, done. Resolving deltas: 100% (279/279), done. Updating files: 100% (100/100), done.
!cd dist/prebuilt && git clone https://huggingface.co/mlc-ai/mlc-chat-Llama-2-7b-chat-hf-q4f16_1
Cloning into 'mlc-chat-Llama-2-7b-chat-hf-q4f16_1'... remote: Enumerating objects: 129, done. remote: Counting objects: 100% (3/3), done. remote: Compressing objects: 100% (3/3), done. remote: Total 129 (delta 0), reused 0 (delta 0), pack-reused 126 Receiving objects: 100% (129/129), 500.53 KiB | 19.25 MiB/s, done. Filtering content: 100% (116/116), 3.53 GiB | 58.34 MiB/s, done.
from mlc_chat import ChatModule
from mlc_chat.callback import StreamToStdout

cm = ChatModule(model="Llama-2-7b-chat-hf-q4f16_1")
output = cm.generate(
    prompt="When was Python released?",
    progress_callback=StreamToStdout(callback_interval=2),
)
Hello! I'm glad you asked! Python was first released in 1991 by Guido van Rossum. It was initially called "Python" because van Rossum was a fan of the British comedy group Monty Python's Flying Circus. The language was created as a hobby project, and it quickly gained popularity among computer programmers due to its simplicity and ease of use. Since its initial release, Python has undergone many updates and improvements, and it has become one of the most popular programming languages in the world. Is there anything else you would like to know?
prompt = input("Prompt: ")
output = cm.generate(prompt=prompt, progress_callback=StreamToStdout(callback_interval=2))
Prompt: tell me about cmu Carnegie Mellon University (CMU) is a private research university located in Pittsburgh, Pennsylvania, United States. It was founded in 1900 as the Carnegie Technical Schools by Andrew Carnegie, and later merged with the Mellon Institute of Industrial Research in 1967. CMU is highly regarded for its academic excellence, innovative research, and strong connections with industry leaders. Here are some key points about Carnegie Mellon University: 1. Academics: CMU offers a wide range of undergraduate and graduate degree programs in fields such as engineering, computer science, robotics, artificial intelligence, business, public policy, and the arts. The university is known for its interdisciplinary approach to education, which allows students to explore multiple areas of interest. 2. Research: CMU is a leading research university, with a strong focus on interdisciplinary collaboration and innovation. The university has a long history of groundbreaking research in fields such as computer science, robotics, artificial intelligence, and engineering. 3. Location: CMU is located in Pittsburgh, Pennsylvania, which offers a unique blend of urban and suburban living. Pittsburgh is known for its vibrant cultural scene, rich history, and affordable cost of living. 4. Size: CMU has a relatively small student body, with around 14,000 students overall, including 8,000 undergraduates. This allows for a more personalized and intimate learning experience. 5. Campus culture: CMU has a diverse and inclusive campus culture, with a strong focus on community engagement and social responsibility. The university is committed to creating a welcoming and supportive environment for all students, faculty, and staff. 6. Athletics: CMU has a strong athletics program, with teams competing in the NCAA Division III and the University Athletic Association (UAA). The university is known for its men's and women's basketball teams, as well as its men's and women's soccer teams. 7. Extracurriculars: CMU has a wide range of extracurricular activities and clubs, including student organizations, cultural groups, and community service organizations. 8. Career outcomes: CMU has a strong reputation in the job market, with
print(cm.stats())
prefill: 506.3 tok/s, decode: 49.1 tok/s
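
Here prefill measures how quickly the model processes the input prompt, while decode measures the rate at which new tokens are generated.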