高性能 :OpenAI Triton Open-source GPU programming Language LINUX 环境配置
目录
- 配置triton环境
- cuda
- build-essential
- 带有pip的python环境
- 直接安装pip
- anaconda
- 安装 triton 环境
- pip install triton
- pip install torch
- 运行test示例
- vector-add.py
- launch.json
配置triton环境
cuda
- wget http://developer.download.nvidia.com/compute/cuda/11.0.2/local_installers/cuda_11.0.2_450.51.05_linux.run
- sudo sh cuda_11.0.2_450.51.05_linux.run
build-essential
sudo apt-get install build-essential
:不安装可能会有报错,RuntimeError: Failed to find C compiler. Please specify via CC environment variable.
(tritonenv) cifi@cifi-666666:~/Downloads$ sudo apt-get install build-essential
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:cpp-9 dpkg-dev fakeroot g++ g++-9 gcc gcc-10-base gcc-9 gcc-9-base libalgorithm-diff-perl libalgorithm-diff-xs-perl libalgorithm-merge-perl libasan5 libatomic1 libc-dev-bin libc6 libc6-dbg libc6-devlibcc1-0 libcrypt-dev libfakeroot libgcc-9-dev libgcc-s1 libgomp1 libitm1 liblsan0 libquadmath0 libstdc++-9-dev libstdc++6 libtsan0 libubsan1 linux-libc-dev make manpages-dev
Suggested packages:gcc-9-locales debian-keyring g++-multilib g++-9-multilib gcc-9-doc gcc-multilib autoconf automake libtool flex bison gcc-doc gcc-9-multilib glibc-doc libstdc++-9-doc make-doc
The following NEW packages will be installed:build-essential dpkg-dev fakeroot g++ g++-9 gcc gcc-9 libalgorithm-diff-perl libalgorithm-diff-xs-perl libalgorithm-merge-perl libasan5 libc-dev-bin libc6-dev libcrypt-dev libfakeroot libgcc-9-devlibitm1 liblsan0 libquadmath0 libstdc++-9-dev libtsan0 libubsan1 linux-libc-dev make manpages-dev
The following packages will be upgraded:cpp-9 gcc-10-base gcc-9-base libatomic1 libc6 libc6-dbg libcc1-0 libgcc-s1 libgomp1 libstdc++6
10 upgraded, 25 newly installed, 0 to remove and 395 not upgraded.
Need to get 65.6 MB of archives.
After this operation, 157 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 libc6-dbg amd64 2.31-0ubuntu9.17 [20.2 MB]
Get:2 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 libatomic1 amd64 10.5.0-1ubuntu1~20.04 [9,284 B]
Get:3 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 gcc-10-base amd64 10.5.0-1ubuntu1~20.04 [20.8 kB]
Get:4 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 libstdc++6 amd64 10.5.0-1ubuntu1~20.04 [501 kB]
Get:5 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 libgomp1 amd64 10.5.0-1ubuntu1~20.04 [102 kB]
Get:6 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 libcc1-0 amd64 10.5.0-1ubuntu1~20.04 [48.8 kB]
Get:7 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 libgcc-s1 amd64 10.5.0-1ubuntu1~20.04 [41.8 kB]
Get:8 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 libc6 amd64 2.31-0ubuntu9.17 [2,721 kB]
Get:9 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 libc-dev-bin amd64 2.31-0ubuntu9.17 [71.8 kB]
Get:10 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 linux-libc-dev amd64 5.4.0-205.225 [1,125 kB]
Get:11 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal/main amd64 libcrypt-dev amd64 1:4.4.10-10ubuntu4 [104 kB]
Get:12 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 libc6-dev amd64 2.31-0ubuntu9.17 [2,521 kB]
Get:13 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 cpp-9 amd64 9.4.0-1ubuntu1~20.04.2 [7,502 kB]
Get:14 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 gcc-9-base amd64 9.4.0-1ubuntu1~20.04.2 [18.9 kB]
Get:15 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 libitm1 amd64 10.5.0-1ubuntu1~20.04 [26.2 kB]
Get:16 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 libasan5 amd64 9.4.0-1ubuntu1~20.04.2 [2,752 kB]
Get:17 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 liblsan0 amd64 10.5.0-1ubuntu1~20.04 [835 kB]
Get:18 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 libtsan0 amd64 10.5.0-1ubuntu1~20.04 [2,016 kB]
Get:19 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 libubsan1 amd64 10.5.0-1ubuntu1~20.04 [785 kB]
Get:20 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 libquadmath0 amd64 10.5.0-1ubuntu1~20.04 [146 kB]
Get:21 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 libgcc-9-dev amd64 9.4.0-1ubuntu1~20.04.2 [2,359 kB]
Get:22 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 gcc-9 amd64 9.4.0-1ubuntu1~20.04.2 [8,276 kB]
Get:23 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal/main amd64 gcc amd64 4:9.3.0-1ubuntu2 [5,208 B]
Get:24 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 libstdc++-9-dev amd64 9.4.0-1ubuntu1~20.04.2 [1,722 kB]
Get:25 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 g++-9 amd64 9.4.0-1ubuntu1~20.04.2 [8,421 kB]
Get:26 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal/main amd64 g++ amd64 4:9.3.0-1ubuntu2 [1,604 B]
Get:27 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal/main amd64 make amd64 4.2.1-1.2 [162 kB]
Get:28 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 dpkg-dev all 1.19.7ubuntu3.2 [679 kB]
Get:29 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal-updates/main amd64 build-essential amd64 12.8ubuntu1.1 [4,664 B]
Get:30 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal/main amd64 libfakeroot amd64 1.24-1 [25.7 kB]
Get:31 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal/main amd64 fakeroot amd64 1.24-1 [62.6 kB]
Get:32 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal/main amd64 libalgorithm-diff-perl all 1.19.03-2 [46.6 kB]
Get:33 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal/main amd64 libalgorithm-diff-xs-perl amd64 0.04-6 [11.3 kB]
Get:34 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal/main amd64 libalgorithm-merge-perl all 0.08-3 [12.0 kB]
Get:35 https://mirrors.tuna.tsinghua.edu.cn/ubuntu focal/main amd64 manpages-dev all 5.05-1 [2,266 kB]
Fetched 65.6 MB in 8s (8,219 kB/s)
Extracting templates from packages: 100%
Preconfiguring packages ...
(Reading database ... 147792 files and directories currently installed.)
Preparing to unpack .../libc6-dbg_2.31-0ubuntu9.17_amd64.deb ...
Unpacking libc6-dbg:amd64 (2.31-0ubuntu9.17) over (2.31-0ubuntu9.9) ...
Preparing to unpack .../libatomic1_10.5.0-1ubuntu1~20.04_amd64.deb ...
Unpacking libatomic1:amd64 (10.5.0-1ubuntu1~20.04) over (10.3.0-1ubuntu1~20.04) ...
Preparing to unpack .../gcc-10-base_10.5.0-1ubuntu1~20.04_amd64.deb ...
Unpacking gcc-10-base:amd64 (10.5.0-1ubuntu1~20.04) over (10.3.0-1ubuntu1~20.04) ...
Setting up gcc-10-base:amd64 (10.5.0-1ubuntu1~20.04) ...
(Reading database ... 147790 files and directories currently installed.)
Preparing to unpack .../libstdc++6_10.5.0-1ubuntu1~20.04_amd64.deb ...
Unpacking libstdc++6:amd64 (10.5.0-1ubuntu1~20.04) over (10.3.0-1ubuntu1~20.04) ...
Setting up libstdc++6:amd64 (10.5.0-1ubuntu1~20.04) ...
(Reading database ... 147790 files and directories currently installed.)
Preparing to unpack .../libgomp1_10.5.0-1ubuntu1~20.04_amd64.deb ...
Unpacking libgomp1:amd64 (10.5.0-1ubuntu1~20.04) over (10.3.0-1ubuntu1~20.04) ...
Preparing to unpack .../libcc1-0_10.5.0-1ubuntu1~20.04_amd64.deb ...
Unpacking libcc1-0:amd64 (10.5.0-1ubuntu1~20.04) over (10.3.0-1ubuntu1~20.04) ...
Preparing to unpack .../libgcc-s1_10.5.0-1ubuntu1~20.04_amd64.deb ...
Unpacking libgcc-s1:amd64 (10.5.0-1ubuntu1~20.04) over (10.3.0-1ubuntu1~20.04) ...
Setting up libgcc-s1:amd64 (10.5.0-1ubuntu1~20.04) ...
(Reading database ... 147790 files and directories currently installed.)
Preparing to unpack .../libc6_2.31-0ubuntu9.17_amd64.deb ...
Unpacking libc6:amd64 (2.31-0ubuntu9.17) over (2.31-0ubuntu9.9) ...
Setting up libc6:amd64 (2.31-0ubuntu9.17) ...
Selecting previously unselected package libc-dev-bin.
(Reading database ... 147790 files and directories currently installed.)
Preparing to unpack .../00-libc-dev-bin_2.31-0ubuntu9.17_amd64.deb ...
Unpacking libc-dev-bin (2.31-0ubuntu9.17) ...
Selecting previously unselected package linux-libc-dev:amd64.
Preparing to unpack .../01-linux-libc-dev_5.4.0-205.225_amd64.deb ...
Unpacking linux-libc-dev:amd64 (5.4.0-205.225) ...
Selecting previously unselected package libcrypt-dev:amd64.
Preparing to unpack .../02-libcrypt-dev_1%3a4.4.10-10ubuntu4_amd64.deb ...
Unpacking libcrypt-dev:amd64 (1:4.4.10-10ubuntu4) ...
Selecting previously unselected package libc6-dev:amd64.
Preparing to unpack .../03-libc6-dev_2.31-0ubuntu9.17_amd64.deb ...
Unpacking libc6-dev:amd64 (2.31-0ubuntu9.17) ...
Preparing to unpack .../04-cpp-9_9.4.0-1ubuntu1~20.04.2_amd64.deb ...
Unpacking cpp-9 (9.4.0-1ubuntu1~20.04.2) over (9.4.0-1ubuntu1~20.04.1) ...
Preparing to unpack .../05-gcc-9-base_9.4.0-1ubuntu1~20.04.2_amd64.deb ...
Unpacking gcc-9-base:amd64 (9.4.0-1ubuntu1~20.04.2) over (9.4.0-1ubuntu1~20.04.1) ...
Selecting previously unselected package libitm1:amd64.
Preparing to unpack .../06-libitm1_10.5.0-1ubuntu1~20.04_amd64.deb ...
Unpacking libitm1:amd64 (10.5.0-1ubuntu1~20.04) ...
Selecting previously unselected package libasan5:amd64.
Preparing to unpack .../07-libasan5_9.4.0-1ubuntu1~20.04.2_amd64.deb ...
Unpacking libasan5:amd64 (9.4.0-1ubuntu1~20.04.2) ...
Selecting previously unselected package liblsan0:amd64.
Preparing to unpack .../08-liblsan0_10.5.0-1ubuntu1~20.04_amd64.deb ...
Unpacking liblsan0:amd64 (10.5.0-1ubuntu1~20.04) ...
Selecting previously unselected package libtsan0:amd64.
Preparing to unpack .../09-libtsan0_10.5.0-1ubuntu1~20.04_amd64.deb ...
Unpacking libtsan0:amd64 (10.5.0-1ubuntu1~20.04) ...
Selecting previously unselected package libubsan1:amd64.
Preparing to unpack .../10-libubsan1_10.5.0-1ubuntu1~20.04_amd64.deb ...
Unpacking libubsan1:amd64 (10.5.0-1ubuntu1~20.04) ...
Selecting previously unselected package libquadmath0:amd64.
Preparing to unpack .../11-libquadmath0_10.5.0-1ubuntu1~20.04_amd64.deb ...
Unpacking libquadmath0:amd64 (10.5.0-1ubuntu1~20.04) ...
Selecting previously unselected package libgcc-9-dev:amd64.
Preparing to unpack .../12-libgcc-9-dev_9.4.0-1ubuntu1~20.04.2_amd64.deb ...
Unpacking libgcc-9-dev:amd64 (9.4.0-1ubuntu1~20.04.2) ...
Selecting previously unselected package gcc-9.
Preparing to unpack .../13-gcc-9_9.4.0-1ubuntu1~20.04.2_amd64.deb ...
Unpacking gcc-9 (9.4.0-1ubuntu1~20.04.2) ...
Selecting previously unselected package gcc.
Preparing to unpack .../14-gcc_4%3a9.3.0-1ubuntu2_amd64.deb ...
Unpacking gcc (4:9.3.0-1ubuntu2) ...
Selecting previously unselected package libstdc++-9-dev:amd64.
Preparing to unpack .../15-libstdc++-9-dev_9.4.0-1ubuntu1~20.04.2_amd64.deb ...
Unpacking libstdc++-9-dev:amd64 (9.4.0-1ubuntu1~20.04.2) ...
Selecting previously unselected package g++-9.
Preparing to unpack .../16-g++-9_9.4.0-1ubuntu1~20.04.2_amd64.deb ...
Unpacking g++-9 (9.4.0-1ubuntu1~20.04.2) ...
Selecting previously unselected package g++.
Preparing to unpack .../17-g++_4%3a9.3.0-1ubuntu2_amd64.deb ...
Unpacking g++ (4:9.3.0-1ubuntu2) ...
Selecting previously unselected package make.
Preparing to unpack .../18-make_4.2.1-1.2_amd64.deb ...
Unpacking make (4.2.1-1.2) ...
Selecting previously unselected package dpkg-dev.
Preparing to unpack .../19-dpkg-dev_1.19.7ubuntu3.2_all.deb ...
Unpacking dpkg-dev (1.19.7ubuntu3.2) ...
Selecting previously unselected package build-essential.
Preparing to unpack .../20-build-essential_12.8ubuntu1.1_amd64.deb ...
Unpacking build-essential (12.8ubuntu1.1) ...
Selecting previously unselected package libfakeroot:amd64.
Preparing to unpack .../21-libfakeroot_1.24-1_amd64.deb ...
Unpacking libfakeroot:amd64 (1.24-1) ...
Selecting previously unselected package fakeroot.
Preparing to unpack .../22-fakeroot_1.24-1_amd64.deb ...
Unpacking fakeroot (1.24-1) ...
Selecting previously unselected package libalgorithm-diff-perl.
Preparing to unpack .../23-libalgorithm-diff-perl_1.19.03-2_all.deb ...
Unpacking libalgorithm-diff-perl (1.19.03-2) ...
Selecting previously unselected package libalgorithm-diff-xs-perl.
Preparing to unpack .../24-libalgorithm-diff-xs-perl_0.04-6_amd64.deb ...
Unpacking libalgorithm-diff-xs-perl (0.04-6) ...
Selecting previously unselected package libalgorithm-merge-perl.
Preparing to unpack .../25-libalgorithm-merge-perl_0.08-3_all.deb ...
Unpacking libalgorithm-merge-perl (0.08-3) ...
Selecting previously unselected package manpages-dev.
Preparing to unpack .../26-manpages-dev_5.05-1_all.deb ...
Unpacking manpages-dev (5.05-1) ...
Setting up manpages-dev (5.05-1) ...
Setting up libalgorithm-diff-perl (1.19.03-2) ...
Setting up linux-libc-dev:amd64 (5.4.0-205.225) ...
Setting up libgomp1:amd64 (10.5.0-1ubuntu1~20.04) ...
Setting up libfakeroot:amd64 (1.24-1) ...
Setting up libc6-dbg:amd64 (2.31-0ubuntu9.17) ...
Setting up fakeroot (1.24-1) ...
update-alternatives: using /usr/bin/fakeroot-sysv to provide /usr/bin/fakeroot (fakeroot) in auto mode
Setting up make (4.2.1-1.2) ...
Setting up libquadmath0:amd64 (10.5.0-1ubuntu1~20.04) ...
Setting up libatomic1:amd64 (10.5.0-1ubuntu1~20.04) ...
Setting up libubsan1:amd64 (10.5.0-1ubuntu1~20.04) ...
Setting up libcrypt-dev:amd64 (1:4.4.10-10ubuntu4) ...
Setting up libc-dev-bin (2.31-0ubuntu9.17) ...
Setting up libalgorithm-diff-xs-perl (0.04-6) ...
Setting up libcc1-0:amd64 (10.5.0-1ubuntu1~20.04) ...
Setting up liblsan0:amd64 (10.5.0-1ubuntu1~20.04) ...
Setting up libitm1:amd64 (10.5.0-1ubuntu1~20.04) ...
Setting up gcc-9-base:amd64 (9.4.0-1ubuntu1~20.04.2) ...
Setting up libalgorithm-merge-perl (0.08-3) ...
Setting up libtsan0:amd64 (10.5.0-1ubuntu1~20.04) ...
Setting up dpkg-dev (1.19.7ubuntu3.2) ...
Setting up libasan5:amd64 (9.4.0-1ubuntu1~20.04.2) ...
Setting up cpp-9 (9.4.0-1ubuntu1~20.04.2) ...
Setting up libc6-dev:amd64 (2.31-0ubuntu9.17) ...
Setting up libgcc-9-dev:amd64 (9.4.0-1ubuntu1~20.04.2) ...
Setting up gcc-9 (9.4.0-1ubuntu1~20.04.2) ...
Setting up libstdc++-9-dev:amd64 (9.4.0-1ubuntu1~20.04.2) ...
Setting up gcc (4:9.3.0-1ubuntu2) ...
Setting up g++-9 (9.4.0-1ubuntu1~20.04.2) ...
Setting up g++ (4:9.3.0-1ubuntu2) ...
update-alternatives: using /usr/bin/g++ to provide /usr/bin/c++ (c++) in auto mode
Setting up build-essential (12.8ubuntu1.1) ...
Processing triggers for man-db (2.9.1-1) ...
Processing triggers for libc-bin (2.31-0ubuntu9.9) ...
带有pip的python环境
直接安装pip
sudo apt install python3-pip
anaconda
wget https://repo.anaconda.com/archive/Anaconda3-2024.02-1-Linux-x86_64.sh
,从网页https://www.anaconda.com/download/success下载速度可能更快bash Anaconda3-2024.02-1-Linux-x86_64.sh
conda create -n tritonenv python=3.8
conda activate tritonenv
安装 triton 环境
pip install triton
pip install triton
https://triton-lang.org/main/getting-started/installation.html
(base) cifi@cifi-666666:~/Downloads$ conda activate tritonenv
(tritonenv) cifi@cifi-666666:~/Downloads$ pip install triton
Collecting tritonDownloading triton-3.1.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.3 kB)
Collecting filelock (from triton)Downloading filelock-3.16.1-py3-none-any.whl.metadata (2.9 kB)
Downloading triton-3.1.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (209.4 MB)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 209.4/209.4 MB 8.1 kB/s eta 0:00:00
Downloading filelock-3.16.1-py3-none-any.whl (16 kB)
Installing collected packages: filelock, triton
Successfully installed filelock-3.16.1 triton-3.1.0
pip install torch
pip install torch
(tritonenv) cifi@cifi-666666:~/Downloads$ pip install torch
Collecting torchDownloading torch-2.4.1-cp38-cp38-manylinux1_x86_64.whl.metadata (26 kB)
Requirement already satisfied: filelock in /home/cifi/anaconda3/envs/tritonenv/lib/python3.8/site-packages (from torch) (3.16.1)
Collecting typing-extensions>=4.8.0 (from torch)Downloading typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Collecting sympy (from torch)Downloading sympy-1.13.3-py3-none-any.whl.metadata (12 kB)
Collecting networkx (from torch)Downloading networkx-3.1-py3-none-any.whl.metadata (5.3 kB)
Collecting jinja2 (from torch)Downloading jinja2-3.1.5-py3-none-any.whl.metadata (2.6 kB)
Collecting fsspec (from torch)Downloading fsspec-2025.2.0-py3-none-any.whl.metadata (11 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch)Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch)Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch)Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch)Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.1.3.1 (from torch)Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.0.2.54 (from torch)Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-curand-cu12==10.3.2.106 (from torch)Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cusolver-cu12==11.4.5.107 (from torch)Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cusparse-cu12==12.1.0.106 (from torch)Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-nccl-cu12==2.20.5 (from torch)Downloading nvidia_nccl_cu12-2.20.5-py3-none-manylinux2014_x86_64.whl.metadata (1.8 kB)
Collecting nvidia-nvtx-cu12==12.1.105 (from torch)Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.7 kB)
Collecting triton==3.0.0 (from torch)Downloading triton-3.0.0-1-cp38-cp38-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.3 kB)
Collecting nvidia-nvjitlink-cu12 (from nvidia-cusolver-cu12==11.4.5.107->torch)Downloading nvidia_nvjitlink_cu12-12.8.61-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB)
Collecting MarkupSafe>=2.0 (from jinja2->torch)Downloading MarkupSafe-2.1.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.0 kB)
Collecting mpmath<1.4,>=1.1.0 (from sympy->torch)Downloading mpmath-1.3.0-py3-none-any.whl.metadata (8.6 kB)
Downloading torch-2.4.1-cp38-cp38-manylinux1_x86_64.whl (797.1 MB)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 797.1/797.1 MB 566.0 kB/s eta 0:00:00
Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 958.2 kB/s eta 0:00:00
Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 1.1 MB/s eta 0:00:00
Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 1.5 MB/s eta 0:00:00
Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 kB 1.8 MB/s eta 0:00:00
Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl (664.8 MB)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 656.9/664.8 MB 867.7 kB/s eta 0:00:10Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl (664.8 MB)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 664.8/664.8 MB 2.4 MB/s eta 0:00:00
Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 2.8 MB/s eta 0:00:00
Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 2.2 MB/s eta 0:00:00
Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 2.6 MB/s eta 0:00:00
Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 3.0 MB/s eta 0:00:00
Downloading nvidia_nccl_cu12-2.20.5-py3-none-manylinux2014_x86_64.whl (176.2 MB)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 176.2/176.2 MB 3.0 MB/s eta 0:00:00
Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
Downloading triton-3.0.0-1-cp38-cp38-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (209.4 MB)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 209.4/209.4 MB 3.1 MB/s eta 0:00:00
Downloading typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Downloading fsspec-2025.2.0-py3-none-any.whl (184 kB)
Downloading jinja2-3.1.5-py3-none-any.whl (134 kB)
Downloading networkx-3.1-py3-none-any.whl (2.1 MB)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 3.5 MB/s eta 0:00:00
Downloading sympy-1.13.3-py3-none-any.whl (6.2 MB)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.2/6.2 MB 3.2 MB/s eta 0:00:00
Downloading MarkupSafe-2.1.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26 kB)
Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 kB 4.8 MB/s eta 0:00:00
Downloading nvidia_nvjitlink_cu12-12.8.61-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.2 MB)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.2/39.2 MB 3.0 MB/s eta 0:00:00
Installing collected packages: mpmath, typing-extensions, triton, sympy, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, networkx, MarkupSafe, fsspec, nvidia-cusparse-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torchAttempting uninstall: tritonFound existing installation: triton 3.1.0Uninstalling triton-3.1.0:Successfully uninstalled triton-3.1.0
Successfully installed MarkupSafe-2.1.5 fsspec-2025.2.0 jinja2-3.1.5 mpmath-1.3.0 networkx-3.1 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.8.61 nvidia-nvtx-cu12-12.1.105 sympy-1.13.3 torch-2.4.1 triton-3.0.0 typing-extensions-4.12.2
运行test示例
vector-add.py
# https://github.com/triton-lang/triton/blob/main/python/tutorials/01-vector-add.py
import torch
import triton
import triton.language as tl@triton.jit
def add_kernel(x_ptr, # *Pointer* to first input vector. 指向第一个输入向量的指针。y_ptr, # *Pointer* to second input vector. 指向第二个输入向量的指针。output_ptr, # *Pointer* to output vector. 指向输出向量的指针。n_elements, # Size of the vector. 向量的大小。BLOCK_SIZE: tl.constexpr, # Number of elements each program should process. 每个程序应处理的元素数量。# NOTE: `constexpr` so it can be used as a shape value. 注意:`constexpr` 因此它可以用作形状值。):# There are multiple 'programs' processing different data. We identify which program# 有多个“程序”处理不同的数据。需要确定是哪一个程序:pid = tl.program_id(axis=0) # We use a 1D launch grid so axis is 0. 使用 1D 启动网格,因此轴为 0。# This program will process inputs that are offset from the initial data.# 该程序将处理相对初始数据偏移的输入。# For instance, if you had a vector of length 256 and block_size of 64, the programs would each access the elements [0:64, 64:128, 128:192, 192:256].# 例如,如果有一个长度为 256, 块大小为 64 的向量,程序将各自访问 [0:64, 64:128, 128:192, 192:256] 的元素。# Note that offsets is a list of pointers:# 注意 offsets 是指针列表:block_start = pid * BLOCK_SIZEoffsets = block_start + tl.arange(0, BLOCK_SIZE)# Create a mask to guard memory operations against out-of-bounds accesses.# 创建掩码以防止内存操作超出边界访问。mask = offsets < n_elements# Load x and y from DRAM, masking out any extra elements in case the input is not a multiple of the block size.# 从 DRAM 加载 x 和 y,如果输入不是块大小的整数倍,则屏蔽掉任何多余的元素。x = tl.load(x_ptr + offsets, mask=mask)y = tl.load(y_ptr + offsets, mask=mask)output = x + y# Write x + y back to DRAM.# 将 x + y 写回 DRAM。tl.store(output_ptr + offsets, output, mask=mask)def add(x: torch.Tensor, y: torch.Tensor):# We need to preallocate the output.# 需要预分配输出。output = torch.empty_like(x)assert x.is_cuda and y.is_cuda and output.is_cudan_elements = output.numel()# The SPMD launch grid denotes the number of kernel instances that run in parallel.# SPMD 启动网格表示并行运行的内核实例的数量。# It is analogous to CUDA launch grids. It can be either Tuple[int], or Callable(metaparameters) -> Tuple[int].# 它类似于 CUDA 启动网格。它可以是 Tuple[int],也可以是 Callable(metaparameters) -> Tuple[int]。# In this case, we use a 1D grid where the size is the number of blocks:# 在这种情况下,使用 1D 网格,其中大小是块的数量:grid = lambda meta: (triton.cdiv(n_elements, meta['BLOCK_SIZE']), )# NOTE:# 注意:# - Each torch.tensor object is implicitly converted into a pointer to its first element.# - 每个 torch.tensor 对象都会隐式转换为其第一个元素的指针。# - `triton.jit`'ed functions can be indexed with a launch grid to obtain a callable GPU kernel.# - `triton.jit` 函数可以通过启动网格索引来获得可调用的 GPU 内核。# - Don't forget to pass meta-parameters as keywords arguments.# - 不要忘记以关键字参数传递元参数。add_kernel[grid](x, y, output, n_elements, BLOCK_SIZE=1024)# We return a handle to z but, since `torch.cuda.synchronize()` hasn't been called, the kernel is still running asynchronously at this point.# 返回 z 的句柄,但由于 `torch.cuda.synchronize()` 尚未被调用,此时内核仍在异步运行。return outputtorch.manual_seed(0)
size = 98432
x = torch.rand(size, device='cuda')
y = torch.rand(size, device='cuda')
output_torch = x + y
output_triton = add(x, y)
print("Torch计算结果:")
print(output_torch)
print("Triton计算结果:")
print(output_triton)
print(f'Torch 和 Triton 之间的最大差异是'f'{torch.max(torch.abs(output_torch - output_triton))}')
launch.json
{// Use IntelliSense to learn about possible attributes.// Hover to view descriptions of existing attributes.// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387"version": "0.2.0","configurations": [{"name":"Python Debugger: Current File","type":"debugpy","request":"launch","program":"${file}","console":"integratedTerminal"},{"name": "Python Debugger: Current File","type": "debugpy","request": "launch","program": "${file}","python": "/home/cifi/anaconda3/envs/tritonenv/bin/python3","console": "integratedTerminal"}]
}
相关文章:

高性能 :OpenAI Triton Open-source GPU programming Language LINUX 环境配置
目录 配置triton环境cudabuild-essential带有pip的python环境直接安装pipanaconda 安装 triton 环境pip install tritonpip install torch 运行test示例vector-add.pylaunch.json 配置triton环境 cuda wget http://developer.download.nvidia.com/compute/cuda/11.0.2/local_…...

TCP 端口号为何位于首部前四个字节?协议设计的智慧与启示
知乎的一个问题很有意思:“为什么在TCP首部中要把TCP的端口号放入最开始的四个字节?” 这种问题很适合我这种搞历史的人,大年初一我给出了一个简短的解释,但仔细探究这个问题,我们将会获得 TCP/IP 被定义的过程。 文…...
HTML之JavaScript函数声明
HTML之JavaScript函数声明 1. function 函数名(){}2. var 函数名 function(){}<!DOCTYPE html> <html lang"en"><head><meta charset"UTF-8"><meta name"viewport" content"widthdevice-width, initial-scale1…...
R 数组:高效数据处理的基础
R 数组:高效数据处理的基础 引言 在数据科学和统计分析领域,R 语言以其强大的数据处理和分析能力而备受推崇。R 数组是 R 语言中用于存储和操作数据的基本数据结构。本文将详细介绍 R 数组的创建、操作和优化,帮助读者掌握 R 数组的使用技巧…...

git服务器搭建,gitea服务搭建,使用systemclt管理服务
文章目录 页面展示使用二进制文件安装git服务下载选择架构使用wget下载安装 验证 GPG 签名服务器设置准备环境创建systemctl文件 备份与恢复备份命令 (dump)恢复命令 (restore) 页面展示 使用二进制文件安装git服务 所有打包的二进制程序均包含 SQLite,MySQL 和 Po…...

Pdf手册阅读(1)--数字签名篇
原文阅读摘要 PDF支持的数字签名, 不仅仅是公私钥签名,还可以是指纹、手写、虹膜等生物识别签名。PDF签名的计算方式,可以基于字节范围进行计算,也可以基于Pdf 对象(pdf object)进行计算。 PDF文件可能包…...

嵌入式WebRTC压缩至670K,目标将so动态库压缩至500K,.a静态库还可以更小
最近把EasyRTC的效果发布出去给各大IPC厂商体验了一下,直接就用EasyRTC与各个厂商的负责人进行的通话,在通话中,用户就反馈效果确实不错! 这两天有用户要在海思hi3516cv610上使用EasyRTC,工具链是:gcc-2024…...

百度高德地图坐标转换
百度地图和高德地图的侧重点不太一样。同样一个地名,在百度地图网站上搜索到的地点可能是商业网点,在高德地图网站上搜索到的地点可能是自然行政地点。 高德地图api 在高德地图中,搜索地名,如“乱石头川”,该地名会出…...
ES 索引结构
ES 既不像 MySQL 这样有严格的 Schema,也不像 MongoDB 那样完全无 Schema,而是介于两者之间。 1️⃣ ES 的 Schema 模式 ES 默认是 Schema-less(无模式) 的,允许动态添加字段。 但 ES 也支持 Schema(映射 …...

HPM_SDK应用本地化——基于6750evkmini
文章目录 前言一、准备工作1、下载官方的SDK2、解压SDK 二、实操1、新建目标工程文件夹2、回到SDK中将相关文件复制1、Borad文件夹2、hello_world文件夹 三、实验现象总结 前言 为什么要对sdk进行应用本地化?在嵌入式开发中我们一般将官方提供的SDK作为参考&#x…...

【deepseek-r1本地部署】
首先需要安装ollama,之前已经安装过了,这里不展示细节 在cmd中输入官网安装命令:ollama run deepseek-r1:32b,开始下载 出现success后,下载完成 接下来就可以使用了,不过是用cmd来运行使用 可以安装UI可视化界面&a…...

查询语句来提取 detail 字段中包含 xxx 的 URL 里的 commodity/ 后面的数字串
您可以使用以下 SQL 查询语句来提取 detail 字段中包含 oss.kxlist.com 的 URL 里的 commodity/ 后面的数字串: <p><img style"max-width:100%;" src"https://oss.kxlist.com//8a989a0c55e4a7900155e7fd7971000b/commodity/20170925/20170…...

堆排序
目录 堆排序(不稳定): 代码实现: 思路分析: 总结: 堆排序(不稳定): 如果想要一段数据从小到大进行排序,则要先建立大根堆,因为这样每次堆顶上都能…...

【MySQL】我在广州学Mysql 系列—— 数据备份与还原
ℹ️大家好,我是练小杰,今天周一,过两天就是元宵节了,今年元宵节各位又要怎么过呢!! 本文主要对Mysql数据库中的数据备份与还原内容进行讨论!! 回顾:👉【MySQ…...
【LeetCode Hot100 双指针】移动零、盛最多水的容器、三数之和、接雨水
双指针 1. 移动零题目描述解题思路关键思路:步骤:时间复杂度:空间复杂度: 代码实现 2. 盛最多水的容器题目解析解题思路代码实现 3. 三数之和问题描述:解题思路:算法步骤:代码实现: …...
HTML应用指南:利用POST请求获取接入比亚迪业态的充电桩位置信息
在新能源汽车快速发展的今天,充电桩的分布和可用性成为了影响用户体验的关键因素之一。比亚迪作为全球领先的新能源汽车制造商,不仅在车辆制造方面取得了卓越成就,也在充电基础设施建设上投入了大量资源。为了帮助用户更方便地找到比亚迪充电桩的位置,本篇文章,我们将探究…...
Android车机DIY开发之软件篇(十二) AOSP12下载编译
Android车机DIY开发之软件篇(十二) AOSP12下载编译 sudo apt-get update sudo apt-get install git-core gnupg flex bison gperf build-essential zip curl zlib1g-dev gcc-multilib gmultilib libc6-dev-i386 lib32ncurses5-dev libx11-dev lib32z-dev ccache libgl1-mesa-…...

Jenkins+gitee 搭建自动化部署
Jenkinsgitee 搭建自动化部署 环境说明: 软件版本备注CentOS8.5.2111JDK1.8.0_211Maven3.8.8git2.27.0Jenkins2.319最好选稳定版本,不然安装插件有点麻烦 一、安装Jenkins程序 1、到官网下载相应的版本war或者直接使用yum安装 Jenkins官网下载 直接…...

【文本处理】如何在批量WORD和txt文本提取手机号码,固话号码,提取邮箱,删除中文,删除英文,提取车牌号等等一些文本提取固定格式的操作,基于WPF的解决方案
企业的应用场景 数据清洗:在进行数据导入或分析之前,往往需要对大量文本数据进行预处理,比如去除文本中的无关字符(中文、英文),只保留需要的联系信息(手机号码、固话号码、邮箱)。…...

Linux系统引导与服务管理
目录 一、Linux引导过程 1、引导过程概述 1.1、BIOS开机自检 1.2、MBR读取 1.3、加载引导加载程序(GRUB) 1.4、内核加载 1.5、初始化进程(init) 二、服务 2.1、服务类型 2.2、服务管理工具 三、运行级别 四、systemd …...

AI-调查研究-01-正念冥想有用吗?对健康的影响及科学指南
点一下关注吧!!!非常感谢!!持续更新!!! 🚀 AI篇持续更新中!(长期更新) 目前2025年06月05日更新到: AI炼丹日志-28 - Aud…...

大数据学习栈记——Neo4j的安装与使用
本文介绍图数据库Neofj的安装与使用,操作系统:Ubuntu24.04,Neofj版本:2025.04.0。 Apt安装 Neofj可以进行官网安装:Neo4j Deployment Center - Graph Database & Analytics 我这里安装是添加软件源的方法 最新版…...
基于大模型的 UI 自动化系统
基于大模型的 UI 自动化系统 下面是一个完整的 Python 系统,利用大模型实现智能 UI 自动化,结合计算机视觉和自然语言处理技术,实现"看屏操作"的能力。 系统架构设计 #mermaid-svg-2gn2GRvh5WCP2ktF {font-family:"trebuchet ms",verdana,arial,sans-…...

(十)学生端搭建
本次旨在将之前的已完成的部分功能进行拼装到学生端,同时完善学生端的构建。本次工作主要包括: 1.学生端整体界面布局 2.模拟考场与部分个人画像流程的串联 3.整体学生端逻辑 一、学生端 在主界面可以选择自己的用户角色 选择学生则进入学生登录界面…...
【Linux】C语言执行shell指令
在C语言中执行Shell指令 在C语言中,有几种方法可以执行Shell指令: 1. 使用system()函数 这是最简单的方法,包含在stdlib.h头文件中: #include <stdlib.h>int main() {system("ls -l"); // 执行ls -l命令retu…...
ffmpeg(四):滤镜命令
FFmpeg 的滤镜命令是用于音视频处理中的强大工具,可以完成剪裁、缩放、加水印、调色、合成、旋转、模糊、叠加字幕等复杂的操作。其核心语法格式一般如下: ffmpeg -i input.mp4 -vf "滤镜参数" output.mp4或者带音频滤镜: ffmpeg…...

DBAPI如何优雅的获取单条数据
API如何优雅的获取单条数据 案例一 对于查询类API,查询的是单条数据,比如根据主键ID查询用户信息,sql如下: select id, name, age from user where id #{id}API默认返回的数据格式是多条的,如下: {&qu…...
Spring AI 入门:Java 开发者的生成式 AI 实践之路
一、Spring AI 简介 在人工智能技术快速迭代的今天,Spring AI 作为 Spring 生态系统的新生力量,正在成为 Java 开发者拥抱生成式 AI 的最佳选择。该框架通过模块化设计实现了与主流 AI 服务(如 OpenAI、Anthropic)的无缝对接&…...
【python异步多线程】异步多线程爬虫代码示例
claude生成的python多线程、异步代码示例,模拟20个网页的爬取,每个网页假设要0.5-2秒完成。 代码 Python多线程爬虫教程 核心概念 多线程:允许程序同时执行多个任务,提高IO密集型任务(如网络请求)的效率…...

如何在最短时间内提升打ctf(web)的水平?
刚刚刷完2遍 bugku 的 web 题,前来答题。 每个人对刷题理解是不同,有的人是看了writeup就等于刷了,有的人是收藏了writeup就等于刷了,有的人是跟着writeup做了一遍就等于刷了,还有的人是独立思考做了一遍就等于刷了。…...