计算着色器#
对于某些类型的计算,GPU上的计算着色器可能比仅在CPU上快数千倍。
在本教程中,我们将使用“N-Body模拟”来模拟恒星场。每颗恒星都受到其他恒星引力的影响。对于1,000颗恒星,这意味着我们要对每一帧执行1,000 x 1,000=1,000亿次计算。这段视频有6.5万颗恒星,每帧需要计算42亿次引力。在高端硬件上,它仍然可以运行在60fps!
这是怎么回事?该计划包括三个主要部分:
分配缓冲区并将所有内容粘合在一起的Python代码
可视化着色器,让我们可以看到缓冲区中的数据
计算着色器,用于移动所有对象
缓冲区#
我们需要一个地方来存储我们要可视化的数据。为此,我们将创建两个 Shader Storage Buffer Objects (SSBO)来自我们的Python代码中的浮点数。一个将保存前一帧的起始位置,另一个将用于存储计算下一帧的位置。
每个缓冲区必须能够为每个恒星存储以下内容:
存储的每颗恒星的x、y和半径
恒星的速度,它将不会被可视化所使用
星的浮点RGBA颜色
生成对齐的数据#
为了避免GPU内存对齐问题,我们将使用下面的函数来生成准备加载到SSBO中的对齐良好的数据。文档字符串和注释更详细地解释了原因:
def gen_initial_data(
screen_size: Tuple[int, int],
num_stars: int = NUM_STARS,
use_color: bool = False
) -> array:
"""
Generate an :py:class:`~array.array` of randomly positioned star data.
Some of this data is wasted as padding because:
1. GPUs expect SSBO data to be aligned to multiples of 4
2. GLSL's vec3 is actually a vec4 with compiler-side restrictions,
so we have to use 4-length vectors anyway.
:param screen_size: A (width, height) of the area to generate stars in
:param num_stars: How many stars to generate
:param use_color: Whether to generate white or randomized pastel stars
:return: an array of star position data
"""
width, height = screen_size
color_channel_min = 0.5 if use_color else 1.0
def _data_generator() -> Generator[float, None, None]:
"""Inner generator function used to illustrate memory layout"""
for i in range(num_stars):
# Position/radius
yield random.randrange(0, width)
yield random.randrange(0, height)
yield 0.0 # z (padding, unused by shaders)
yield 6.0
# Velocity (unused by visualization shaders)
yield 0.0
yield 0.0
yield 0.0 # vz (padding, unused by shaders)
yield 0.0 # vw (padding, unused by shaders)
# Color
yield random.uniform(color_channel_min, 1.0) # r
yield random.uniform(color_channel_min, 1.0) # g
yield random.uniform(color_channel_min, 1.0) # b
yield 1.0 # a
# Use the generator function to fill an array in RAM
return array('f', _data_generator())
分配缓冲区#
# --- Create buffers
# Create pairs of buffers for the compute & visualization shaders.
# We will swap which buffer instance is the initial value and
# which is used as the current value to write to.
# ssbo = shader storage buffer object
initial_data = gen_initial_data(self.get_size(), use_color=USE_COLORED_STARS)
self.ssbo_previous = self.ctx.buffer(data=initial_data)
self.ssbo_current = self.ctx.buffer(data=initial_data)
# vao = vertex array object
# Format string describing how to interpret the SSBO buffer data.
# 4f = position and size -> x, y, z, radius
# 4x4 = Four floats used for calculating velocity. Not needed for visualization.
# 4f = color -> rgba
buffer_format = "4f 4x4 4f"
# Attribute variable names for the vertex shader
attributes = ["in_vertex", "in_color"]
self.vao_previous = self.ctx.geometry(
[BufferDescription(self.ssbo_previous, buffer_format, attributes)],
mode=self.ctx.POINTS,
)
self.vao_current = self.ctx.geometry(
[BufferDescription(self.ssbo_current, buffer_format, attributes)],
mode=self.ctx.POINTS,
)
可视化着色器#
现在我们有了数据,我们需要能够将其可视化。我们将通过应用顶点、几何体和碎片着色器将SSBO中的数据转换为像素来实现这一点。对于阵列中的每个恒星的12个浮点,将发生以下数据流:
顶点着色器#
在本教程中,顶点着色器将对每个恒星的12个浮点长的原始填充数据在 self.ssbo_current
。每次执行都会将干净的类型化数据输出到几何体着色器的一个实例。
数据按如下方式读入:
每颗恒星的x、y和半径可通过
in_vertex
星的浮点RGBA颜色,VIA
in_color
1#version 330
2
3in vec4 in_vertex;
4in vec4 in_color;
5
6out vec2 vertex_pos;
7out float vertex_radius;
8out vec4 vertex_color;
9
10void main()
11{
12 vertex_pos = in_vertex.xy;
13 vertex_radius = in_vertex.w;
14 vertex_color = in_color;
15}
然后将以下变量作为输入传递到几何体着色器:
vertex_pos
vertex_radius
vertex_color
几何体着色器#
这个 geometry shader 将单个点转换为四边形,在本例中为正方形,可由GPU渲染。它通过以输入点为中心发射四个点来实现这一点。
1#version 330
2
3layout (points) in;
4layout (triangle_strip, max_vertices = 4) out;
5
6// Use arcade's global projection UBO
7uniform Projection {
8 uniform mat4 matrix;
9} proj;
10
11
12// The outputs from the vertex shader are used as inputs
13in vec2 vertex_pos[];
14in float vertex_radius[];
15in vec4 vertex_color[];
16
17// These are used with EmitVertex to generate four points of
18// a quad centered around vertex_pos[0].
19out vec2 g_uv;
20out vec3 g_color;
21
22void main() {
23 vec2 center = vertex_pos[0];
24 vec2 hsize = vec2(vertex_radius[0]);
25
26 g_color = vertex_color[0].rgb;
27
28 gl_Position = proj.matrix * vec4(vec2(-hsize.x, hsize.y) + center, 0.0, 1.0);
29 g_uv = vec2(0, 1);
30 EmitVertex();
31
32 gl_Position = proj.matrix * vec4(vec2(-hsize.x, -hsize.y) + center, 0.0, 1.0);
33 g_uv = vec2(0, 0);
34 EmitVertex();
35
36 gl_Position = proj.matrix * vec4(vec2(hsize.x, hsize.y) + center, 0.0, 1.0);
37 g_uv = vec2(1, 1);
38 EmitVertex();
39
40 gl_Position = proj.matrix * vec4(vec2(hsize.x, -hsize.y) + center, 0.0, 1.0);
41 g_uv = vec2(1, 0);
42 EmitVertex();
43
44 // End geometry emmission
45 EndPrimitive();
46}
片段着色器#
A fragment shader 对四边形中的每个像素运行。它将四边形内的UV坐标转化为浮点RGBA值。在本教程中,着色器在每个恒星的四边形的表面上生成柔和的发光圆圈。
1#version 330
2
3in vec2 g_uv;
4in vec3 g_color;
5
6out vec4 out_color;
7
8void main()
9{
10 float l = length(vec2(0.5, 0.5) - g_uv.xy);
11 if ( l > 0.5)
12 {
13 discard;
14 }
15 float alpha;
16 if (l == 0.0)
17 alpha = 1.0;
18 else
19 alpha = min(1.0, .60-l * 2);
20
21 vec3 c = g_color.rgb;
22 // c.xy += v_uv.xy * 0.05;
23 // c.xy += v_pos.xy * 0.75;
24 out_color = vec4(c, alpha);
25}
计算着色器#
现在我们有了一种显示数据的方法,我们应该更新它。
我们在前面创建了多对缓冲区。我们将使用一个SSBO作为 input buffer 保存前一帧的数据,另一帧作为我们的 output 要将结果写入的缓冲区。
然后,我们在绘制后的每一帧中交换缓冲区,使用输出作为下一帧的输入,并重复该过程,直到程序停止运行。
1#version 430
2
3// Set up our compute groups.
4// The COMPUTE_SIZE_X and COMPUTE_SIZE_Y values will be replaced
5// by the Python code with actual values. This does not happen
6// automatically, and must be called manually.
7layout(local_size_x=COMPUTE_SIZE_X, local_size_y=COMPUTE_SIZE_Y) in;
8
9// Input uniforms would go here if you need them.
10// The examples below match the ones commented out in main.py
11//uniform vec2 screen_size;
12//uniform float frame_time;
13
14// Structure of the star data
15struct Star
16{
17 vec4 pos;
18 vec4 vel;
19 vec4 color;
20};
21
22// Input buffer
23layout(std430, binding=0) buffer stars_in
24{
25 Star stars[];
26} In;
27
28// Output buffer
29layout(std430, binding=1) buffer stars_out
30{
31 Star stars[];
32} Out;
33
34void main()
35{
36 int curStarIndex = int(gl_GlobalInvocationID);
37
38 Star in_star = In.stars[curStarIndex];
39
40 vec4 p = in_star.pos.xyzw;
41 vec4 v = in_star.vel.xyzw;
42
43 // Move the star according to the current force
44 p.xy += v.xy;
45
46 // Calculate the new force based on all the other bodies
47 for (int i=0; i < In.stars.length(); i++) {
48 // If enabled, this will keep the star from calculating gravity on itself
49 // However, it does slow down the calcluations do do this check.
50 // if (i == x)
51 // continue;
52
53 // Calculate distance squared
54 float dist = distance(In.stars[i].pos.xyzw.xy, p.xy);
55 float distanceSquared = dist * dist;
56
57 // If distance is too small, extremely high forces can result and
58 // fling the star into escape velocity and forever off the screen.
59 // Using a reasonable minimum distance to prevents this.
60 float minDistance = 0.02;
61 float gravityStrength = 0.3;
62 float simulationSpeed = 0.002;
63 float force = min(minDistance, gravityStrength / distanceSquared) * -simulationSpeed;
64
65 vec2 diff = p.xy - In.stars[i].pos.xyzw.xy;
66 // We should normalize this I think, but it doesn't work.
67 // diff = normalize(diff);
68 vec2 delta_v = diff * force;
69 v.xy += delta_v;
70 }
71
72
73 Star out_star;
74 out_star.pos.xyzw = p.xyzw;
75 out_star.vel.xyzw = v.xyzw;
76
77 vec4 c = in_star.color.xyzw;
78 out_star.color.xyzw = c.xyzw;
79
80 Out.stars[curStarIndex] = out_star;
81}
完成的Python程序#
该代码包括详细的文档字符串和解释其工作原理的注释。
1"""
2N-Body Gravity with Compute Shaders & Buffers
3"""
4import random
5from array import array
6from pathlib import Path
7from typing import Generator, Tuple
8
9import arcade
10from arcade.gl import BufferDescription
11
12# Window dimensions in pixels
13WINDOW_WIDTH = 800
14WINDOW_HEIGHT = 600
15
16# Size of performance graphs in pixels
17GRAPH_WIDTH = 200
18GRAPH_HEIGHT = 120
19GRAPH_MARGIN = 5
20
21NUM_STARS: int = 4000
22USE_COLORED_STARS: bool = True
23
24
25def gen_initial_data(
26 screen_size: Tuple[int, int],
27 num_stars: int = NUM_STARS,
28 use_color: bool = False
29) -> array:
30 """
31 Generate an :py:class:`~array.array` of randomly positioned star data.
32
33 Some of this data is wasted as padding because:
34
35 1. GPUs expect SSBO data to be aligned to multiples of 4
36 2. GLSL's vec3 is actually a vec4 with compiler-side restrictions,
37 so we have to use 4-length vectors anyway.
38
39 :param screen_size: A (width, height) of the area to generate stars in
40 :param num_stars: How many stars to generate
41 :param use_color: Whether to generate white or randomized pastel stars
42 :return: an array of star position data
43 """
44 width, height = screen_size
45 color_channel_min = 0.5 if use_color else 1.0
46
47 def _data_generator() -> Generator[float, None, None]:
48 """Inner generator function used to illustrate memory layout"""
49
50 for i in range(num_stars):
51 # Position/radius
52 yield random.randrange(0, width)
53 yield random.randrange(0, height)
54 yield 0.0 # z (padding, unused by shaders)
55 yield 6.0
56
57 # Velocity (unused by visualization shaders)
58 yield 0.0
59 yield 0.0
60 yield 0.0 # vz (padding, unused by shaders)
61 yield 0.0 # vw (padding, unused by shaders)
62
63 # Color
64 yield random.uniform(color_channel_min, 1.0) # r
65 yield random.uniform(color_channel_min, 1.0) # g
66 yield random.uniform(color_channel_min, 1.0) # b
67 yield 1.0 # a
68
69 # Use the generator function to fill an array in RAM
70 return array('f', _data_generator())
71
72
73class NBodyGravityWindow(arcade.Window):
74
75 def __init__(self):
76 # Ask for OpenGL context supporting version 4.3 or greater when
77 # calling the parent initializer to make sure we have compute shader
78 # support.
79 super().__init__(
80 WINDOW_WIDTH, WINDOW_HEIGHT,
81 "N-Body Gravity with Compute Shaders & Buffers",
82 gl_version=(4, 3),
83 resizable=False
84 )
85 # Attempt to put the window in the center of the screen.
86 self.center_window()
87
88 # --- Create buffers
89
90 # Create pairs of buffers for the compute & visualization shaders.
91 # We will swap which buffer instance is the initial value and
92 # which is used as the current value to write to.
93
94 # ssbo = shader storage buffer object
95 initial_data = gen_initial_data(self.get_size(), use_color=USE_COLORED_STARS)
96 self.ssbo_previous = self.ctx.buffer(data=initial_data)
97 self.ssbo_current = self.ctx.buffer(data=initial_data)
98
99 # vao = vertex array object
100 # Format string describing how to interpret the SSBO buffer data.
101 # 4f = position and size -> x, y, z, radius
102 # 4x4 = Four floats used for calculating velocity. Not needed for visualization.
103 # 4f = color -> rgba
104 buffer_format = "4f 4x4 4f"
105
106 # Attribute variable names for the vertex shader
107 attributes = ["in_vertex", "in_color"]
108
109 self.vao_previous = self.ctx.geometry(
110 [BufferDescription(self.ssbo_previous, buffer_format, attributes)],
111 mode=self.ctx.POINTS,
112 )
113 self.vao_current = self.ctx.geometry(
114 [BufferDescription(self.ssbo_current, buffer_format, attributes)],
115 mode=self.ctx.POINTS,
116 )
117
118 # --- Create the visualization shaders
119
120 vertex_shader_source = Path("shaders/vertex_shader.glsl").read_text()
121 fragment_shader_source = Path("shaders/fragment_shader.glsl").read_text()
122 geometry_shader_source = Path("shaders/geometry_shader.glsl").read_text()
123
124 # Create the complete shader program which will draw the stars
125 self.program = self.ctx.program(
126 vertex_shader=vertex_shader_source,
127 geometry_shader=geometry_shader_source,
128 fragment_shader=fragment_shader_source,
129 )
130
131 # --- Create our compute shader
132
133 # Load in the raw source code safely & auto-close the file
134 compute_shader_source = Path("shaders/compute_shader.glsl").read_text()
135
136 # Compute shaders use groups to parallelize execution.
137 # You don't need to understand how this works yet, but the
138 # values below should serve as reasonable defaults. Later, we'll
139 # preprocess the shader source by replacing the templating token
140 # with its corresponding value.
141 self.group_x = 256
142 self.group_y = 1
143
144 self.compute_shader_defines = {
145 "COMPUTE_SIZE_X": self.group_x,
146 "COMPUTE_SIZE_Y": self.group_y
147 }
148
149 # Preprocess the source by replacing each define with its value as a string
150 for templating_token, value in self.compute_shader_defines.items():
151 compute_shader_source = compute_shader_source.replace(templating_token, str(value))
152
153 self.compute_shader = self.ctx.compute_shader(source=compute_shader_source)
154
155 # --- Create the FPS graph
156
157 # Enable timings for the performance graph
158 arcade.enable_timings()
159
160 # Create a sprite list to put the performance graph into
161 self.perf_graph_list = arcade.SpriteList()
162
163 # Create the FPS performance graph
164 graph = arcade.PerfGraph(GRAPH_WIDTH, GRAPH_HEIGHT, graph_data="FPS")
165 graph.position = GRAPH_WIDTH / 2, self.height - GRAPH_HEIGHT / 2
166 self.perf_graph_list.append(graph)
167
168 def on_draw(self):
169 # Clear the screen
170 self.clear()
171 # Enable blending so our alpha channel works
172 self.ctx.enable(self.ctx.BLEND)
173
174 # Bind buffers
175 self.ssbo_previous.bind_to_storage_buffer(binding=0)
176 self.ssbo_current.bind_to_storage_buffer(binding=1)
177
178 # If you wanted, you could set input variables for compute shader
179 # as in the lines commented out below. You would have to add or
180 # uncomment corresponding lines in compute_shader.glsl
181 # self.compute_shader["screen_size"] = self.get_size()
182 # self.compute_shader["frame_time"] = self.frame_time
183
184 # Run compute shader to calculate new positions for this frame
185 self.compute_shader.run(group_x=self.group_x, group_y=self.group_y)
186
187 # Draw the current star positions
188 self.vao_current.render(self.program)
189
190 # Swap the buffer pairs.
191 # The buffers for the current state become the initial state,
192 # and the data of this frame's initial state will be overwritten.
193 self.ssbo_previous, self.ssbo_current = self.ssbo_current, self.ssbo_previous
194 self.vao_previous, self.vao_current = self.vao_current, self.vao_previous
195
196 # Draw the graphs
197 self.perf_graph_list.draw()
198
199
200
201if __name__ == "__main__":
202 app = NBodyGravityWindow()
203 arcade.run()
支持3D的本教程的扩展版本可在以下位置获得:https://github.com/pvcraven/n-body