(MTL S01E06) Blit Operations

Oct 13, 2024

Metal offers a broad range of blit operations, which are essential for efficient data transfer between memory locations. A blit operation (the name comes from BitBLT, "bit block transfer") copies all or part of a buffer or texture from one place in memory to another.

These operations let you move data without setting up additional render or compute passes. So if you need to copy data, fill a buffer, or generate mipmaps, reach for a blit operation first.

Encoder

Blit is a GPU operation that runs as part of a command buffer, so you’ll need a corresponding command encoder to perform it. Here’s a simple example:

if let encoder = commandBuffer.makeBlitCommandEncoder() {
    // Perform your blit operations here
    encoder.endEncoding()
}

This encoder handles the actual blit operations, and once you’re done, you end the encoding to complete the process.
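
For context, here is a minimal sketch of the whole round trip around that encoder: creating the command buffer, encoding, committing, and waiting for the GPU. The helper name `runBlitPass` is my own and not part of Metal, and blocking with `waitUntilCompleted()` is only for illustration; real code would more likely use a completion handler.

```swift
import Metal

// Hypothetical helper (not a Metal API): encodes one blit pass,
// submits it, and blocks until the GPU has finished.
func runBlitPass(on queue: MTLCommandQueue,
                 _ encode: (MTLBlitCommandEncoder) -> Void) -> Bool {
    guard let commandBuffer = queue.makeCommandBuffer(),
          let encoder = commandBuffer.makeBlitCommandEncoder() else {
        return false
    }
    encode(encoder)          // the caller records its blit commands here
    encoder.endEncoding()    // nothing more can be encoded after this
    commandBuffer.commit()   // hand the work to the GPU
    commandBuffer.waitUntilCompleted()
    return commandBuffer.status == .completed
}
```

Given an existing `queue` and `buffer`, usage would look like `runBlitPass(on: queue) { $0.fill(buffer: buffer, range: 0..<64, value: 0x42) }`.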

Operations

The blit operations in Metal are pretty simple and well documented. However, I’ll briefly explain some of them:

Copying

There is a whole family of .copy(…) methods.

Note that you can copy between areas of the same texture or buffer, but be careful: overlapping source and destination areas may result in undefined behaviour.
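
Since Metal won't guard against this for you, a cheap check on the CPU side before encoding a same-buffer copy can help. The sketch below is plain Swift; `copyRangesOverlap` is a hypothetical helper name, not a Metal API.

```swift
// Hypothetical helper (not a Metal API): true if copying `size` bytes from
// `sourceOffset` to `destinationOffset` within one buffer would overlap.
func copyRangesOverlap(sourceOffset: Int, destinationOffset: Int, size: Int) -> Bool {
    precondition(size >= 0, "size must not be negative")
    let source = sourceOffset..<(sourceOffset + size)
    let destination = destinationOffset..<(destinationOffset + size)
    // Half-open ranges: ranges that merely touch do not count as overlapping.
    return source.overlaps(destination)
}

print(copyRangesOverlap(sourceOffset: 20, destinationOffset: 40, size: 20)) // false
print(copyRangesOverlap(sourceOffset: 20, destinationOffset: 30, size: 20)) // true
```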

From buffer to buffer

Since buffers don’t have many additional attributes and are simply blocks of memory, copying them is straightforward:

// Copying the 20 bytes at offset 20 to offset 40 of the same buffer (the ranges don't overlap)
encoder.copy(
    from: srcBuffer,
    sourceOffset: 20,
    to: srcBuffer,
    destinationOffset: 40,
    size: 20
)

From buffer to texture

These operations can be particularly useful when uploading a raw image buffer to a GPU texture — at least, that’s the most common case in my experience.

// Uploading an image with size 640x480 and 4 bytes per pixel to a texture.
encoder.copy(
    from: buffer,
    sourceOffset: 0,
    sourceBytesPerRow: 640 * 4,
    sourceBytesPerImage: 640 * 480 * 4,
    sourceSize: MTLSize(width: 640, height: 480, depth: 1),
    to: texture,
    destinationSlice: 0,
    destinationLevel: 0,
    destinationOrigin: MTLOrigin(x: 0, y: 0, z: 0))
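
The two byte counts in that call follow directly from the image dimensions. Here is a small plain-Swift sketch of the arithmetic, assuming a tightly packed image (no padding between rows) and a 4-byte pixel format such as `.rgba8Unorm`. `LinearImageLayout` is a hypothetical helper type, not something Metal provides.

```swift
// Hypothetical helper type: byte layout of a tightly packed 2D image.
struct LinearImageLayout {
    let width: Int
    let height: Int
    let bytesPerPixel: Int

    // Bytes from the start of one row to the start of the next (no padding assumed).
    var bytesPerRow: Int { width * bytesPerPixel }
    // Bytes occupied by one full 2D image.
    var bytesPerImage: Int { bytesPerRow * height }
}

let layout = LinearImageLayout(width: 640, height: 480, bytesPerPixel: 4)
print(layout.bytesPerRow)   // 2560
print(layout.bytesPerImage) // 1228800
```

The source buffer has to hold at least `bytesPerImage` bytes starting at `sourceOffset`.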

From texture to buffer

Personally, I’ve mostly used these operations for downloading textures.

// Downloading texture from previous example back to the buffer.
encoder.copy(
    from: texture, 
    sourceSlice: 0, 
    sourceLevel: 0, 
    sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0), 
    sourceSize: MTLSize(width: 640, height: 480, depth: 1), 
    to: buffer, 
    destinationOffset: 0, 
    destinationBytesPerRow: 640 * 4, 
    destinationBytesPerImage: 640 * 480 * 4)
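
Once the command buffer has completed, the downloaded pixels sit in the buffer row by row. The sketch below shows how to locate a pixel in that layout. It uses a plain `[UInt8]` array as a stand-in for the buffer contents, and `pixelByteOffset` is a hypothetical helper name.

```swift
// Hypothetical helper: byte offset of pixel (x, y) in a linear image buffer.
func pixelByteOffset(x: Int, y: Int, bytesPerRow: Int, bytesPerPixel: Int) -> Int {
    y * bytesPerRow + x * bytesPerPixel
}

// Stand-in for the downloaded bytes: a 4x2 image with 4 bytes per pixel.
var pixels = [UInt8](repeating: 0, count: 4 * 2 * 4)
let offset = pixelByteOffset(x: 3, y: 1, bytesPerRow: 4 * 4, bytesPerPixel: 4)
pixels[offset] = 0xFF // first channel of the last pixel in the second row
print(offset) // 28
```

With a real `MTLBuffer` in shared storage, the same offset applies to the pointer returned by `buffer.contents()`.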

From texture to texture

Straight copy of the whole texture:

// Copying the whole content of `srcTexture` into `dstTexture`.
encoder.copy(from: srcTexture, to: dstTexture)

Copying whole slices and levels:

// Copying mip level 3 of slice 1 of `srcTexture` to mip level 3 of slice 2 of `dstTexture`
encoder.copy(
    from: srcTexture,
    sourceSlice: 1,
    sourceLevel: 3,
    to: dstTexture,
    destinationSlice: 2,
    destinationLevel: 3,
    sliceCount: 1,
    levelCount: 1)

Copying an area of one texture to an area of the destination texture:

encoder.copy(
    from: srcTexture,
    sourceSlice: 0,
    sourceLevel: 0,
    sourceOrigin: MTLOrigin(x: 10, y: 20, z: 0),
    sourceSize: MTLSize(width: 120, height: 90, depth: 1),
    to: dstTexture,
    destinationSlice: 0,
    destinationLevel: 0,
    destinationOrigin: MTLOrigin(x: 20, y: 30, z: 0))
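
For a region copy, the source region must lie inside the source texture, and the same-sized region at the destination origin must lie inside the destination. A rough plain-Swift sketch of that check follows. `Region2D`, `mipDimension`, and `regionFits` are hypothetical names, and the texture sizes (640x480 and 128x128) are assumptions chosen for illustration.

```swift
// Hypothetical plain-Swift stand-in for an MTLOrigin + MTLSize pair (2D only).
struct Region2D {
    var x: Int, y: Int, width: Int, height: Int
}

// Size of one dimension at a given mip level: halved per level, never below 1.
func mipDimension(_ base: Int, level: Int) -> Int {
    max(1, base >> level)
}

// True if `region` lies entirely inside a texture of the given base size at `level`.
func regionFits(_ region: Region2D, textureWidth: Int, textureHeight: Int, level: Int = 0) -> Bool {
    let w = mipDimension(textureWidth, level: level)
    let h = mipDimension(textureHeight, level: level)
    return region.x >= 0 && region.y >= 0
        && region.width > 0 && region.height > 0
        && region.x + region.width <= w
        && region.y + region.height <= h
}

let source = Region2D(x: 10, y: 20, width: 120, height: 90)
let destination = Region2D(x: 20, y: 30, width: 120, height: 90)
print(regionFits(source, textureWidth: 640, textureHeight: 480))      // true
print(regionFits(destination, textureWidth: 128, textureHeight: 128)) // false
```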

Filling Buffer

If you need to fill a buffer with the same constant bytes, use something like the following:

encoder.fill(buffer: buffer, range: 0..<bufferLength, value: 0x42)

Generating Mipmap

This is a pretty simple way to generate mipmaps for a given texture:

encoder.generateMipmap(for: texture)
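
Keep in mind that `generateMipmap(for:)` only has levels to fill if the texture was created with a `mipmapLevelCount` greater than 1. As a sketch, the length of a full mip chain for a 2D texture can be computed like this. `fullMipLevelCount` is a hypothetical helper, and `MTLTextureDescriptor.texture2DDescriptor(pixelFormat:width:height:mipmapped:)` with `mipmapped: true` sets the count for you.

```swift
// Hypothetical helper: number of mip levels in a full chain for a 2D texture,
// i.e. floor(log2(max(width, height))) + 1, computed with integer bit math.
func fullMipLevelCount(width: Int, height: Int) -> Int {
    let largest = max(width, height, 1)
    return Int.bitWidth - largest.leadingZeroBitCount
}

print(fullMipLevelCount(width: 640, height: 480)) // 10
print(fullMipLevelCount(width: 1, height: 1))     // 1
```

For the 640x480 example, the chain is 640, 320, 160, 80, 40, 20, 10, 5, 2, and 1 pixels wide.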

Optimising Contents

To be honest, I’ve never used these functions myself, but they do exist and seem like a good reason to experiment.

The purpose of these functions is to realign a texture’s memory for more efficient access by the GPU or CPU:

encoder.optimizeContentsForGPUAccess(texture: texture)
encoder.optimizeContentsForCPUAccess(texture: texture)

Synchronisation

Here we have a function for bringing the CPU copy of a managed resource up to date with its GPU copy. It's not relevant on Apple silicon, where the CPU and GPU share memory, but we still have users with Intel machines:

encoder.synchronize(texture: texture, slice: 0, level: 0)
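
Because the managed storage mode only exists on macOS, one way to keep shared code tidy is to encode the synchronisation conditionally. This is a sketch under that assumption; `encodeSynchronizeIfManaged` is a hypothetical helper name, and it uses the whole-resource variant `synchronize(resource:)` rather than the per-slice call.

```swift
import Metal

// Hypothetical helper: encodes a synchronize command only for managed textures.
// Returns true if a command was encoded.
func encodeSynchronizeIfManaged(_ texture: MTLTexture,
                                with encoder: MTLBlitCommandEncoder) -> Bool {
    #if os(macOS)
    if texture.storageMode == .managed {
        // Makes GPU-side changes visible to the CPU copy once the pass completes.
        encoder.synchronize(resource: texture)
        return true
    }
    #endif
    return false
}
```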

Sometimes, you need to synchronize operations within a pass using fences (MTLFence). Metal provides the following functions to handle this in a blit pass (and yes, similar fence synchronization is available in render and compute encoders):

// ...
// Wait for the fence to be updated
encoder.waitForFence(fence)
// ...
// Update the fence and continue processing:
encoder.updateFence(fence)

NOTE: Within a single encoder, call `waitForFence()` before `updateFence()`. Calling them the other way round can cause a GPU deadlock, so be careful!

Other functions

There are several other operations, but I’ve never used them. Feel free to check them out in the documentation.

Conclusion

As you can see, there’s a comprehensive set of functions for fast operations with GPU buffers and textures.
These operations are highly optimized, so there’s no need to replicate them using compute or rendering encoders.
You can even perform certain image manipulations just by copying (it may seem a bit unconventional, but in some cases, it’s an effective solution).