Lecture 24: Machine Learning Compiler and Deployment
In this lecture, we will walk through some example uses of Apache TVM, a machine learning compiler. To learn more, check out https://tvm.apache.org/
The content of this lecture is adapted from TVM's tutorials.
Install packages
To get started, we need a version of TVM. For a quick demo, we install the latest nightly builds of the TVM Unity compiler (mlc-ai-nightly-cu118) and of the related language model deployment package (mlc-chat-nightly-cu118) from https://mlc.ai/wheels. The cu118 suffix indicates builds for CUDA 11.8.
Loop-level representation and transformations
Let us start with a vector add example. The following code snippet creates a vector add function and stores it in a container called IRModule.
An IRModule contains a collection of low-level functions. We can use its script function to inspect the functions inside an IRModule.
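For reference, the computation such a vector add function expresses, C[i] = A[i] + B[i], can be written as a single plain-Python loop. This sketch is not TVM code and does not show the TensorIR syntax; the name vector_add and the buffer sizes are illustrative assumptions. The output buffer c is passed in by the caller, mirroring how the low-level function receives all three buffers as arguments.

```python
def vector_add(a, b, c):
    """Compute c[i] = a[i] + b[i] for every index i, writing into the caller's buffer c."""
    n = len(c)
    assert len(a) == n and len(b) == n, "all three buffers must have the same length"
    for i in range(n):  # one spatial loop over the vector
        c[i] = a[i] + b[i]


a = [float(i) for i in range(8)]
b = [2.0 * i for i in range(8)]
c = [0.0] * 8
vector_add(a, b, c)
print(c)  # [0.0, 3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0]
```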
Build and run
We can turn the programs in an IRModule into runnable functions by calling a build function.
After the build, mod contains a collection of runnable functions, and we can retrieve each one by its name.
To invoke a function, we create three NDArrays in the TVM runtime (two inputs and one output that receives the result) and then call the generated function on them.
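The shape of this build-and-run flow can be mimicked in plain Python: a built module behaves like a table of runnable functions looked up by name, and each function writes its result into an output array that the caller allocates beforehand. In the TVM version installed above, the corresponding steps typically go through calls such as tvm.build and tvm.nd.array; the dictionary, the name "vector_add", and the float32 array.array buffers below are stand-ins chosen for this sketch, not TVM objects.

```python
from array import array


def _vector_add(a, b, c):
    # Destination-passing style: the result goes into c and nothing is returned.
    for i in range(len(c)):
        c[i] = a[i] + b[i]


# Hypothetical stand-in for a built module: runnable functions keyed by name.
built_module = {"vector_add": _vector_add}

# Three float32 buffers: two inputs and one preallocated output.
a = array("f", [1.0, 2.0, 3.0, 4.0])
b = array("f", [10.0, 20.0, 30.0, 40.0])
c = array("f", [0.0] * len(a))

func = built_module["vector_add"]  # retrieve a runnable function by its name
func(a, b, c)
print(list(c))  # [11.0, 22.0, 33.0, 44.0]
```

Passing the output buffer in, rather than returning a fresh one, lets the caller decide where results live, for example reusing one buffer across many calls.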
Transform the code
The IRModule is the central data structure for program optimization. It can be transformed through a helper class called Schedule, which provides multiple primitive methods for interactively transforming the program. Each primitive transforms the program in a specific way to enable additional performance optimizations.
Let us transform the module by creating a Schedule instance.
Let us first split the loop.
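Splitting rewrites one loop over i into two nested loops, an outer loop i_0 and an inner loop i_1, with i = i_0 * factor + i_1. The plain-Python sketch below writes out the resulting loop nest for vector add and checks that it computes the same result as the unsplit loop. The tail guard for factors that do not divide the length is one common way to handle that case and is an assumption of this sketch; the factor values are arbitrary.

```python
def vector_add_split(a, b, c, factor):
    """Vector add after splitting loop i into i_0 (outer) and i_1 (inner)."""
    n = len(c)
    outer = (n + factor - 1) // factor  # ceil(n / factor) outer iterations
    for i_0 in range(outer):
        for i_1 in range(factor):
            i = i_0 * factor + i_1  # recover the original index
            if i < n:  # skip the out-of-range tail when factor does not divide n
                c[i] = a[i] + b[i]


n = 10
a = list(range(n))
b = [10 * x for x in a]
expected = [x + y for x, y in zip(a, b)]

for factor in (1, 2, 3, 4, 5, 10):
    c = [0] * n
    vector_add_split(a, b, c, factor)
    assert c == expected, factor
print("every split factor reproduces the unsplit result")
```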
We can also reorder the loops, swapping the order of i_0 and i_1.
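Reordering changes the order in which iterations run, not which iterations run. The plain-Python sketch below, with an assumed length of 8 and factor of 4, records the original index i visited by the split loop nest before and after swapping i_0 and i_1. Because every vector add iteration is independent, both orders produce the same C; what changes is the memory access pattern, which goes from consecutive elements to a stride of factor.

```python
def visit_order(n, factor, swapped):
    """Return the original indices i in the order the split loop nest visits them.

    Assumes factor divides n evenly.
    """
    order = []
    if not swapped:
        for i_0 in range(n // factor):      # outer loop i_0
            for i_1 in range(factor):       # inner loop i_1
                order.append(i_0 * factor + i_1)
    else:
        for i_1 in range(factor):           # i_1 is now the outer loop
            for i_0 in range(n // factor):  # i_0 is now the inner loop
                order.append(i_0 * factor + i_1)
    return order


original = visit_order(8, 4, swapped=False)
reordered = visit_order(8, 4, swapped=True)
print(original)   # [0, 1, 2, 3, 4, 5, 6, 7]
print(reordered)  # [0, 4, 1, 5, 2, 6, 3, 7]

# Same set of iterations, so an element-wise computation gives the same result.
assert sorted(reordered) == original
a = [float(i) for i in range(8)]
b = [1.0] * 8
c = [0.0] * 8
for i in reordered:
    c[i] = a[i] + b[i]
assert c == [x + 1.0 for x in a]
```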
Finally, we can hint to the code generator that we want to vectorize the innermost loop.
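Vectorization asks the code generator to execute all iterations of the innermost loop together, as one SIMD-style operation over several lanes. Plain Python has no SIMD instructions, so the sketch below only models the idea: each outer iteration processes a whole chunk of lanes elements with a single slice operation instead of an element-by-element loop. It assumes the innermost loop covers contiguous elements (as in the split loop before reordering) and that lanes divides the length; a strided innermost loop would need gather-style loads instead.

```python
import operator


def vector_add_vectorized(a, b, c, lanes):
    """Split-loop vector add whose inner loop of length `lanes` runs as one chunk operation."""
    n = len(c)
    assert n % lanes == 0, "this sketch assumes lanes divides the vector length"
    for i_0 in range(n // lanes):
        start, stop = i_0 * lanes, (i_0 + 1) * lanes
        # Model of one vector instruction: all lanes of the chunk are added together.
        c[start:stop] = list(map(operator.add, a[start:stop], b[start:stop]))


a = [float(i) for i in range(16)]
b = [100.0] * 16
c = [0.0] * 16
vector_add_vectorized(a, b, c, lanes=4)
print(c[:4])  # [100.0, 101.0, 102.0, 103.0]
```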
Transforming a matrix multiplication program
In the above example, we showed how to transform a vector add. Now let us apply the same ideas to a slightly more complicated program: matrix multiplication.
We can transform the loop access pattern to make it more cache-friendly. Let us use the following schedule.
Try changing the value of bn to see what performance you get. In practice, we leverage an automated system that searches over a set of possible transformations to find the best-performing one.
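The blocking idea can be sketched in plain Python. The version below computes C one bn x bn tile at a time and, for each tile, runs the reduction over k so that one bn-wide slice of a row of B is reused for every row of the tile. This is one possible blocked loop order, assumed for illustration, not necessarily the schedule used in the lecture. The loop at the end is a brute-force stand-in for automated search: it tries several bn values, checks each result against a naive reference, and keeps the fastest. In pure Python, interpreter overhead dominates the timings, so the winning bn here says little about cache behavior; TVM's automated search tools (such as MetaSchedule) explore far larger spaces of compiled programs.

```python
import math
import random
import time


def matmul_naive(A, B, n):
    """Reference C = A @ B with the textbook i, j, k loop order."""
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += A[i][k] * B[k][j]
            C[i][j] = acc
    return C


def matmul_blocked(A, B, n, bn):
    """C = A @ B computed one bn x bn tile of C at a time."""
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, bn):            # tile row
        for j0 in range(0, n, bn):        # tile column
            for k in range(n):            # reduction
                row_b = B[k]
                for i in range(i0, min(i0 + bn, n)):
                    a_ik = A[i][k]
                    row_c = C[i]
                    for j in range(j0, min(j0 + bn, n)):
                        row_c[j] += a_ik * row_b[j]
    return C


n = 48
rng = random.Random(0)
A = [[rng.random() for _ in range(n)] for _ in range(n)]
B = [[rng.random() for _ in range(n)] for _ in range(n)]
reference = matmul_naive(A, B, n)

timings = {}
for bn in (4, 8, 16, 48):
    start = time.perf_counter()
    C = matmul_blocked(A, B, n, bn)
    timings[bn] = time.perf_counter() - start
    # Every candidate must compute the same matrix; only its speed may differ.
    assert all(
        math.isclose(C[i][j], reference[i][j], rel_tol=1e-9)
        for i in range(n)
        for j in range(n)
    )

best_bn = min(timings, key=timings.get)
print(f"fastest bn in this run: {best_bn}")
```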
There are other optimizations that can be applied here, such as vectorization, parallelization, and data layout optimization. Please check out
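Of the optimizations listed, data layout optimization is easy to sketch. In row-major storage, a naive matmul reads B column by column, jumping a whole row between consecutive k. Packing B once into a transposed copy makes the inner reduction read both operands along rows. Python lists do not expose memory layout, so this sketch only shows the transformation and checks correctness; the unit-stride benefit applies to contiguous arrays in compiled code. The function name matmul_packed_b is an illustrative assumption.

```python
def matmul_packed_b(A, B, n):
    """C = A @ B after repacking B so each of its columns becomes a row."""
    # Layout transformation, done once: BT[j][k] == B[k][j].
    BT = [[B[k][j] for k in range(n)] for j in range(n)]
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        row_a = A[i]
        for j in range(n):
            row_bt = BT[j]
            # Both operands are now read along a row as k increases.
            C[i][j] = sum(row_a[k] * row_bt[k] for k in range(n))
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_packed_b(A, B, 2))  # [[19, 22], [43, 50]]
```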
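End-to-end model deployment
Finally, let us walk through an example flow for end-to-end model deployment. The flow first clones the prebuilt model libraries into dist/prebuilt/lib and then the Llama-2-7b-chat model weights in the q4f16_1 quantization format (mlc-chat-Llama-2-7b-chat-hf-q4f16_1).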
Cloning into 'dist/prebuilt/lib'...
remote: Enumerating objects: 389, done.
remote: Counting objects: 100% (115/115), done.
remote: Compressing objects: 100% (38/38), done.
remote: Total 389 (delta 94), reused 93 (delta 77), pack-reused 274
Receiving objects: 100% (389/389), 126.00 MiB | 14.01 MiB/s, done.
Resolving deltas: 100% (279/279), done.
Updating files: 100% (100/100), done.
Cloning into 'mlc-chat-Llama-2-7b-chat-hf-q4f16_1'...
remote: Enumerating objects: 129, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 129 (delta 0), reused 0 (delta 0), pack-reused 126
Receiving objects: 100% (129/129), 500.53 KiB | 19.25 MiB/s, done.
Filtering content: 100% (116/116), 3.53 GiB | 58.34 MiB/s, done.