WWDC25
Optimize memory usage and performance of your Swift code

    Learn how to improve the performance and memory management of your Swift code. We'll explore multiple ways to optimize your code, from making high-level algorithmic changes to adopting the new InlineArray and Span types for more fine-grained control over memory and allocations.

    Chapters

    • 0:00 - Introduction & Agenda
    • 1:19 - QOI format & parser app
    • 2:25 - Algorithms
    • 8:17 - Allocations
    • 16:30 - Exclusivity
    • 19:12 - Stack versus heap
    • 21:08 - Reference counting
    • 29:52 - Swift Binary Parsing library
    • 31:03 - Next steps

    Resources

    • Performance and metrics
    • Swift Binary Parsing
    • The Swift Programming Language
    • The Swift website

    Related Videos

    WWDC25

    • Analyze and optimize your app's power usage
    • Optimize CPU performance with Instruments
    • What's new in Swift

    WWDC24

    • Explore Swift performance

    Hello! I’m Nate Cook, and I work on the Swift Standard Library. Today we’re going to explore how to understand and improve your code’s performance, in part by using some new additions to the language and the standard library in Swift 6.2. We'll get to use the new InlineArray and Span types, try out value generics, and learn about non-escapable types. We'll use all those new tools to eliminate retains and releases, exclusivity and uniqueness checks, and other extra work. I'll also debut a new open source library that uses all these tools to make writing binary parsers fast and safe. It's called Swift Binary Parsing. The library focuses on speed and provides tools for managing several different kinds of safety.

    We all want our code to be fast, and Swift gives us the tools to make that happen. But sometimes, things just aren't as speedy as we expect. In this session, we'll practice figuring out where our code is spending its time, and then try out several kinds of performance optimizations: picking the right algorithms, getting rid of extra allocations, eliminating exclusivity checks, moving from heap to stack allocations, and cutting down on reference counting.

    In our exploration, we’ll be looking at a little app I built. It’s a viewer for an image format called QOI, and includes a hand-written parser for the format. QOI is a lossless image format that is simple enough that its specification fits on a single page, making it useful for trying out different approaches and seeing their performance. The QOI format uses a standard idiom for binary formats, with a fixed-size header followed by a data section, which includes a dynamic number of differently sized encoded pixels. The encoded pixels take several forms. A pixel can be an RGB or RGBA value, a difference from the previous pixel, a lookup into a cache of previously seen pixels, or just the number of times to repeat the previous pixel.

    Alright, let’s try out my QOI Parser App!
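    The four encoding forms described above can be sketched as a small Swift enum. This is an illustrative model, not the session's actual parser types; the cache-index formula is the one given in the QOI specification.

    ```swift
    /// Illustrative model of the four QOI pixel encodings; these names
    /// are hypothetical, not the session's actual types.
    enum EncodedPixel {
        case rgba(r: UInt8, g: UInt8, b: UInt8, a: UInt8)  // literal pixel value
        case diff(dr: Int8, dg: Int8, db: Int8)            // small delta from the previous pixel
        case index(UInt8)                                  // lookup into a 64-pixel cache
        case run(count: UInt8)                             // repeat the previous pixel
    }

    /// The QOI specification's hash for choosing a cache slot:
    /// (r * 3 + g * 5 + b * 7 + a * 11) % 64
    func cacheIndex(r: UInt8, g: UInt8, b: UInt8, a: UInt8) -> Int {
        (Int(r) * 3 + Int(g) * 5 + Int(b) * 7 + Int(a) * 11) % 64
    }
    ```

    The cache-lookup encoding works because both encoder and decoder compute the same slot from the same pixel values, so a single index byte can stand in for a full pixel.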
I can open this icon file, which is only a few kilobytes, and it loads immediately.

    This photo of a bird is a bit bigger. It can take a few seconds to load… And there it is. What's taking so long? When you see a noticeable slowdown working with real-world data, it’s often a sign of incorrect usage of an algorithm or data structure. Let’s use Instruments to find and tackle the source of this problem. In my parsing library, I’ve written a test that parses that same bird image that was slow to load. I can click the run button to run the test...

    And it passes after a few seconds. In addition to using this test to check for correctness, I can also profile the test to see its performance in Instruments. This time, I use a secondary click on the run button. There's an option in the menu to profile the test.

    I love this feature. When profiling a test, I can focus on the specific part of my code that I'm interested in. I’ll select that option now to launch Instruments.

    Instruments opens with its template chooser, showing all the different ways that it can help you understand your code’s performance. We're going to use two different instruments today, so I'll start out with the blank template.

    I can add instruments by clicking on the add instrument button. I'll add the Allocations instrument to help understand how my parser is allocating memory. And since I'm really interested in knowing where my app is spending its time, I'll add the “Time Profiler” instrument.

    The Time Profiler is a great place to start for performance questions.

    Let's hide the sidebar to make a little more room for the results. And then use the record button to kick off the test.

    We can see a few things in the results window.

    The instruments included in the profile are listed at the top of the window. We'll use the “Time Profiler” first, so I’ll keep that selected. At the bottom is the detail view for the selected tool. On the left is a list of captured calls...

    And on the right is the heaviest stack trace for the currently selected call.

    I’d like to see the most frequently captured calls first, regardless of how they're reached, so I’ll click on the Call Tree button and then check the “Invert Call Tree” checkbox. I can switch to a graphical view by using this button at the top of the detail view. When I click, the view switches to show the profile as a flame graph.

    Each of the bars in the flame graph shows the proportion of time a call was captured during the profile. In this case, there's a huge bar dominating the process, labeled “platform_memmove”. That same symbol also shows up in the stack trace. memmove is a system call for copying data, so that huge bar indicates that the parser is spending most of its time copying data around, instead of just reading it. But that should not be happening. Let’s figure out which part of my code is causing all that copying. I want to see all the frames in the stack trace, so I’ll click the “show all frames” button at the top of the view.

    At the top of the trace are system calls, including platform_memmove, and then some specialized versions of methods provided by the Foundation Data type. You might have seen specialized methods like these in a stack trace, or when debugging. These specialized methods are type-specific versions of generic code that the Swift compiler generates for you.

    Finally, we come to a method that I defined, readByte.

    Since this is the closest function in my code to the problem, it's the right place to start. To jump straight to this method, I can use a secondary click and then choose “Reveal in Xcode.” And here’s the declaration of the readByte method in Xcode. Instruments sent me right to this line, where I’m dropping the first byte and then calling the Data initializer. With Instruments, I was able to identify all those memmove calls as a potential source of slowness in my library, and then jump straight to the specific line of code that is causing all of that copying.

    This helper method is really important... because my parsing code calls readByte over and over while consuming the raw binary data.

    I thought that this would just shrink the data, returning the first byte and moving the start of the data forward every time I called the readByte method. Instead, it actually copies the entire contents of the data into a new allocation, each time I read a byte. That's a lot more work than I expected. Let's fix this mistake. I'm back in Xcode now, editing the readByte method.

    Because the Data type is designed to shrink from both ends, we actually have access to a collection method called `popFirst()`. popFirst() returns the first byte in `data`, and then slides the front of the collection forward, shrinking it by one byte. Just what we want.

    With that fixed, I can switch back to my test, and run the profile again.

    Instruments opens automatically, with the test already running with the same profiling configuration. Excellent! That huge platform_memmove bar is gone from the flame graph.

    When I benchmark my code, I can also see an absolutely huge speed up as a result of that change! That's fantastic, but with an algorithmic change like this, the absolute change isn't necessarily the whole story. In my original version, the relationship between the size of the image and the time it took to parse was quadratic. As the images I was parsing got larger, the time it took to parse got drastically longer. With the copying fix in place, the relationship is now linear. There's more of a direct match between the size of the image and the time it takes to parse. We have a bunch more improvements coming up that will improve the linear performance, and we'll be able to compare those improvements more directly.

    With that issue out of the way, let's take a look at another common performance pitfall: extra allocations.

    Let’s look at what the heaviest stack trace is now. These calls show that we’re seeing a lot of traffic to the methods that allocate and deallocate a Swift array. Allocating and deallocating memory can be expensive. My parser will be faster if I can figure out where these extra allocations are coming from, and eliminate them. To see the allocations that my parser is making, I can use the Allocations instrument that we added earlier. There are a couple different indicators that my code is probably causing unnecessary allocations.

    First is the sheer number: nearly a million allocations on the way to parsing one image? I think we can do better. Second, we can see that nearly all of those allocations are transient allocations, which are marked as short-lived by the Allocations instrument. To find the source of the problem, I’ll switch the detail panel to the Call Tree view. First, I click on the pop-up button labeled Statistics. And then I’ll choose Call Trees.

    With the top thread selected, I’ll look at the stack trace to find the part of my code that’s closest to the problem. Since this stack trace is not inverted, I need to start looking at the bottom of the trace. The first symbol from my parser is this RGBAPixel.data method.

    When I click on that method, it’s revealed in the call tree detail window. And when I use secondary click on the method there, I can choose Reveal in Xcode to jump right to the source.

    This method seems to be the source of the extra allocations. I can see that every time it’s called, it returns an array with either the RGB values or the RGBA values of the pixel. That means it’s going to create an array and allocate room for at least three elements every time it’s called.

    To find out where it’s being used, I’ll use a secondary click on the function name and choose “Show callers.” The caller is this closure in our main parsing function, which is just one part of this big flatMap and prefix chain. To understand why this code is making so many separate allocations, let’s look at how the allocations pile up, step by step.

    First, the readEncodedPixels method parses the binary data into encoded pixels – these are the different pixel types I mentioned earlier – and it needs to allocate enough space to store them.

    Next, decodePixels is called for each encoded pixel, to produce one or more RGBA pixels. Most encodings just turn into a single pixel, but there’s one encoding that says we need to repeat the previous pixel a certain number of times. To support that, decodePixels always returns an array. Each one of those arrays needs to be allocated.

    The “flattening” part of flatMap takes all of those little arrays that we just created and merges them into one much larger array. That’s a new allocation, and all the small arrays we just created are deallocated.

    This prefix method puts a cap on the number of pixels that we can produce.

    The second flatMap starts by calling RGBAPixel.data, the method that we flagged when we used the Allocations instrument. We saw earlier that it returns an array with either three or four elements. What we’re seeing now means that one of those 3- or 4-element arrays is being created for every single pixel in the final image. Sometimes the compiler is able to optimize some of these extra allocations away, but as we saw in the trace, that won't always happen.

    Next, the small arrays are flattened into one big new array again.

    And finally, that big array of RGB or RGBA pixel data is copied into a new Data instance so that it can be returned.

    There is a certain elegance to these lines of code. They pack a lot of power into a few short, chained method calls. But just because it’s shorter, doesn’t mean it’s faster. Instead of working through all those different steps, and eventually ending up with a Data instance to return, what if we allocate the data first, and then write each pixel as we decode from the binary source data. That way, we can do all the same processing without needing any of those intermediate allocations. I’m back in my parsing function. Let’s rewrite this method to eliminate all those extra allocations.

    The first thing we’ll do is calculate "totalBytes": the final size for the result data. Then we'll allocate "pixelData", with just the right amount of storage. The "offset" variable keeps track of how much data we've written. This up-front allocation means we won’t have to make additional allocations as we work our way through the binary data.

    Next, we’ll parse each piece of data and process it immediately. We can use a switch statement to deal with the parsed pixel.

    For the encoded pixels that indicate a run, we’ll loop the required number of times, writing out the pixel data each time.

    For any other kind of pixel, we'll decode and write that directly into the data. That’s the full rewrite, with no allocations other than the data that we need to return. Let’s verify that we’ve fixed this issue by profiling our test again.

    We can see right away that the number of allocations is far lower. To see the actual number of allocations in my code, I can use the filter. I'll click into the filter field at the bottom of the window. And type “QOI.init.” This filters out any call tree that doesn’t include QOI.init somewhere in the stack trace.

    The remaining lines show that now our parser code only makes a handful of allocations, for a total of just under two megabytes. When I hold down option and click on the disclosure triangle, the call tree expands.

    The expanded tree shows what we want.

    The only thing we're really allocating is the Data that stores our resulting image.

    Looking at benchmarks, that’s another great improvement! By cutting out those extra allocations, we’ve reduced the execution time by over half.

    So far we’ve made two algorithmic changes to our parser, eliminating a lot of accidental copying and then reducing the number of allocations. For our next few improvements, we’ll use some more advanced techniques to allow the Swift compiler to eliminate a lot of the automatic memory management work that happens at runtime.

    First, let’s talk about how arrays and other collection types work. Swift’s Array type is one of the most common tools in our toolbox, because it’s fast, safe, and easy to use. Arrays can grow or shrink as needed, so you don’t have to know in advance how many items you’ll be working with. Swift handles the memory for you behind the scenes. Arrays are also value types, which means that changes to one copy of an array don’t affect other copies. If you make a copy of an array, by assigning it to a different variable, or passing it off to a function, Swift doesn’t immediately duplicate the elements. Instead, it uses an optimization called copy-on-write, which delays that duplication until you actually change one of the arrays.

    These features make arrays a great general-purpose collection, but they also have some trade-offs. To support its dynamic size and multiple references, Array stores its contents in a separate allocation, often on the heap. The Swift runtime uses reference counting to keep track of the number of copies of each array, and when you make a change, arrays make a uniqueness check to see if they need to copy their elements. Finally, to make sure your code stays safe, Swift enforces exclusivity, which means that two different things can’t modify the same data at the same time. While this rule is often enforced at compile time, it sometimes can only be enforced at runtime. Now that we’ve learned about these low-level concepts, let’s look at how they show up in our profiling. We’ll start by looking for runtime checks for exclusivity, which can add work to your program and get in the way of optimizations. Before we can start looking for exclusivity checks, we actually have a good problem. We've improved our performance enough that Instruments isn't really getting enough time to inspect the parser process. We can give it a little more to look at by looping over the parsing code – 50 times ought to do the trick.

    Let’s take a look at this richer profile.

    Exclusivity tests show up in a trace as the ‘swift_beginAccess’ and ‘swift_endAccess’ symbols. Once again, I’ll click in the filter box at the bottom of the window. Then enter the symbol name.

    At the top of the flame graph, swift_beginAccess appears a few times, with the symbols that require this checking right below. Those symbols are the accessors for the previous pixel and the pixel cache, which are stored in my parser's State class. I’ll switch back to Xcode and find that declaration. Here it is… State is a class with those two properties we saw in the flame graph. Modifying a class instance is one of the situations where Swift has to check for exclusivity at runtime, so this declaration is the reason we’re seeing what we’re seeing. We can eliminate that checking by moving these properties out of the class, and putting them directly in the parser type.
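    The shape of that refactor can be sketched with hypothetical types (RGBAPixel and the method names here are illustrative, not the session's exact code): moving the mutable state out of a class and into the parser struct lets the compiler prove exclusive access at compile time instead of checking it at runtime.

    ```swift
    struct RGBAPixel {
        var r: UInt8 = 0, g: UInt8 = 0, b: UInt8 = 0, a: UInt8 = 255
    }

    // Before: mutating properties through a class reference requires runtime
    // exclusivity checks (swift_beginAccess / swift_endAccess).
    final class State {
        var previousPixel = RGBAPixel()
        var pixelCache = [RGBAPixel](repeating: RGBAPixel(), count: 64)
    }

    // After (sketch): stored directly in the parser struct, exclusive access
    // is enforced at compile time, and methods that touch the state become
    // mutating, just as the compiler's fix-it suggests.
    struct QOIParser {
        var previousPixel = RGBAPixel()
        var pixelCache = [RGBAPixel](repeating: RGBAPixel(), count: 64)

        mutating func remember(_ pixel: RGBAPixel) {
            previousPixel = pixel
        }
    }
    ```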

    Next, we’ll do a find-replace, to remove the `state.` accesses for previousPixel and pixelCache.

    When I build, the compiler lets me know there’s a little more work to do.

    Since the state properties aren’t nested in a class anymore, I can’t modify them in a non-mutating method.

    I’ll accept this fix to make the method mutating.

    There's one more to fix...

    And we're done. With that change in place, let’s go back to the test.

    And re-record a profile to see the change.

    I’ll filter on swift_beginAccess again.

    There's nothing there! We’ve completely removed the runtime exclusivity checking. Let’s take another look at those state variables. This is a good place to use a new Swift feature to move data from heap memory to stack memory and make sure that those exclusivity checks can’t creep back in. The pixel cache in our parser is an array of RGBAPixels – it’s initialized with 64 elements, and never changes size. This cache would be a great place to use the new InlineArray type. InlineArray is a new standard library type in Swift 6.2. Like a regular array, it stores multiple elements of the same kind in contiguous memory, but it has some important differences. First, inline arrays have a fixed size that you set at compile time. Unlike regular arrays that you can append to or remove from, InlineArray uses the new value generics feature to make its size a part of its type. That means that while you can make changes to the elements of an inline array, you can’t append or remove, or assign an inline array to one of a different size.

    Second, as the name implies, when you use an InlineArray, the elements are always stored inline instead of in a separate allocation. Inline arrays don’t share storage between copies, and don’t use copy-on-write. Instead, they’re copied whenever you make a copy. This eliminates the need for all the reference counting and uniqueness and exclusivity checks that regular arrays require. This different copying behavior of InlineArray is a bit of a double-edged sword: if your usage of an Array requires making copies or sharing references between different variables or classes, an InlineArray might not be the right choice. In this case, however, the pixel cache is a fixed-size array that is modified in place, but never copied. A perfect place to use `InlineArray`.

    For our final optimization, we’ll use the standard library’s new span types to eliminate most of the reference counting while parsing. Back in the time profiler flame graph, let’s use filtering again to only look at our QOI parser. I’ll add QOI.init in the filter box.

    The view changes to focus on only the stack traces that include our parsing initializer. Let’s look for the retain and release symbols. swift_retain is this pink bar, showing up in 7% of samples, and swift_release is this one, showing up in another 7%. The uniqueness check that we talked about earlier also shows up here, in another 3% of samples.

    To figure out where these are coming from, I’ll click back onto swift_release and just like we’ve done before, I’ll scan down the heaviest stack trace to find the first user-defined method. It looks like it’s the same readByte method that we started with.

    This time, it isn’t an algorithmic issue that we’re dealing with, but the use of `Data` itself. Just like `Array`, `Data` usually stores its memory on the heap, and needs to be reference counted.

    These reference counting operations - retain and release - are very efficient, but can add up to a significant chunk of time when they happen in a tight loop, just like this method. To deal with this, we want to move from working with a high-level collection type like `Data` or `Array` to a type that doesn’t cause this explosion of reference counting. Up until Swift 6.2, you might have used a method like `withUnsafeBufferPointer` to access the underlying storage of a collection. Those methods let you manually manage memory, with no reference counting, but they introduce unsafety into your code.

    It's worth asking: why are pointers unsafe? Swift calls them unsafe because they get around many of the language's safety guarantees. They can point to both initialized and uninitialized memory, they drop some type guarantees, and they can escape from their context, leading to a risk of accessing memory that's no longer allocated. When you use unsafe pointers, you are fully responsible for keeping your code's safety intact. The compiler can't help you out. This processUsingBuffer function does use unsafe pointers correctly. The usage stays entirely within the unsafe buffer pointer closure, with only the result of the calculation returned at the end. On the other hand, this `getPointerToBytes()` function is dangerous. It contains two major programming errors. The function creates an array of bytes and calls the withUnsafeBufferPointer method, but instead of limiting use of the pointer to the closure, it returns the pointer to the outer scope. Error number 1. Even worse, the code then returns that no-longer-valid pointer from the function itself. Error number 2! Both of these errors extend the pointer's life beyond the lifetime of what it's pointing to, creating a dangerous leftover reference to moved or deallocated memory.

    To help with this, Swift 6.2 introduces a new group of types called Spans. Spans are a new way to work with the contiguous memory belonging to a collection. Importantly, spans use the new “non-escapable” language feature, which allows the compiler to tie their lifetimes to the collection that provides them. The memory that a span provides access to is guaranteed to live as long as the span, with no chance of a lingering reference. Because every span type is declared as non-Escapable, the compiler prevents you from escaping or returning a span outside the context where you retrieved it.

    This "processUsingSpan" method shows how you can use a span to write simpler, safer code than pointers allow. To get a Span over the array's elements, just use the span property. Without using a closure, we have access to the array's storage that’s just as efficient as unsafe pointers, without any of the unsafety. We can see the non-escapable language feature in action if we try to rewrite the dangerous function from before. The first thing we'll run into is that we can’t even write this same function signature with `Span`. Because a span's lifetime is tied to the collection that provides it, without any collection or span being passed in, there’s nowhere to get a lifetime for the span being passed out.
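    The processUsingSpan method described above might look something like this. A sketch assuming Swift 6.2's `span` property on Array (SE-0456); the per-element work is inlined here instead of the session's unspecified `calculate` helper.

    ```swift
    /// Sum the elements of an array through a Span: direct access to the
    /// array's storage with no closure and no unsafe pointers.
    func processUsingSpan(_ array: [Int]) -> Int {
        let span = array.span       // non-escapable view of the array's storage
        var result = 0
        for i in 0..<span.count {   // bounds-checked, but with no retain/release traffic
            result += span[i]
        }
        return result
    }
    ```

    Because `span` is non-escapable, the compiler guarantees it cannot outlive `array`, which is what makes the closure-free access safe.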

    What if we try to hide the span from the compiler by capturing it in a closure? In this function, I’ll create an array, access its span, and then try to return a closure that captures that span. But even that doesn’t work. The compiler recognizes that capturing the span lets it escape, and points out that its lifetime depends on the local array.

    This compiler-checked requirement that a span doesn't escape its scope means that retains and releases aren’t necessary. We get the performance of using an unsafe buffer without any of the unsafety. The `Span` family includes typed and raw versions of both read-only and mutable spans, for working with existing collections, as well as an output span that you can use to initialize a new collection. The family also includes UTF8Span, a new type that's designed for safe and efficient Unicode processing.

    Back in our code, let’s implement this same readByte method for RawSpan.

    We'll start by adding a RawSpan extension...

    and defining the readByte method.

    The API for RawSpan is a little different from Data, but it does the same thing as our implementation above. It loads the first byte, shrinks the RawSpan, and then returns the loaded value. Note that this unsafeLoad method is named this way only because it can be unsafe to load certain kinds of types. Loading a built-in integer type, like we're doing here, is always safe.
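    A sketch of that idea, with one deliberate difference: this variant tracks an offset alongside the span rather than shrinking the span in place, since that keeps the lifetime rules simple. It assumes Swift 6.2's RawSpan API (`unsafeLoad(fromByteOffset:as:)`, `byteCount`) and Foundation's `bytes` property on Data.

    ```swift
    import Foundation

    extension RawSpan {
        /// Read the byte at `offset` and advance the offset, or return nil
        /// when the span is exhausted. Loading a fixed-width integer like
        /// UInt8 this way is always safe, despite the "unsafe" in the name.
        func readByte(at offset: inout Int) -> UInt8? {
            guard offset < byteCount else { return nil }
            defer { offset += 1 }
            return unsafeLoad(fromByteOffset: offset, as: UInt8.self)
        }
    }

    // Usage: get a RawSpan over a Data's storage and consume it byte by byte.
    let data = Data([0x71, 0x6F])
    var offset = 0
    let span = data.bytes
    let first = span.readByte(at: &offset)   // 0x71
    ```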

    Next, we'll update our parsing methods.

    These two parsing methods should use RawSpan instead of Data as a parameter.

    I'll also need to make a change at the call site.

    Instead of passing the data itself, we’ll get the data's RawSpan and pass that into the parsing method. I'll access Data's RawSpan using the `bytes` property. This rawBytes value is non-escapable. I wouldn't be able to return it from this function, but I can pass it into the parsing method without any problems.

    With that change, I'm all done with the update to use RawSpan. To save even more low-level work, we can also adopt the new OutputSpan in our parsing method.

    Instead of creating a zero-initialized Data, we'll use the new rawCapacity initializer, which provides an OutputSpan to gradually fill in the uninitialized data.

    OutputSpan keeps track of how much data you've written, so we can use its count property instead of this separate offset variable.

    And we'll use a different variation of our write-to method that writes to the outputSpan instead of a Data instance.

    Let's take a look at that method's implementation.

    The write(to:) method is able to call OutputSpan's append method for each channel in the pixel. Since OutputSpan is a non-escapable type that is designed for this kind of use, this is both simpler and more efficient than writing into the `Data` instance, and safer than dropping down to an unsafe buffer pointer. With those changes finished, I’ll jump back to my test. and record a new profile.
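    The append-into-uninitialized-capacity pattern can be sketched with the Array initializer from SE-0467 (an assumption; the session's own Data rawCapacity initializer isn't shown in the surviving code listings). The closure receives an OutputSpan, and the collection's final length is however many elements were appended.

    ```swift
    // Sketch: build the bytes for one RGBA pixel by appending into an
    // OutputSpan over uninitialized capacity, instead of zero-filling first.
    let pixelBytes = [UInt8](capacity: 4) { output in
        output.append(255)  // r
        output.append(128)  // g
        output.append(0)    // b
        output.append(255)  // a
    }
    ```

    OutputSpan tracks its own `count` as you append, which is why the separate offset variable in the earlier rewrite becomes unnecessary.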

    I'll filter on QOI.init.

    And in the flame graph we can see that those swift_retain and swift_release blocks are gone! That actually looks great. Let's stop there and see the results of adopting InlineArray and RawSpan.

    With these latest changes, our memory management work has made our parsing six times as fast, without resorting to any unsafe code. That's 16 times faster than we were after getting rid of the quadratic algorithm, and over 700 times faster than what we started with! We've covered a lot in this session. While revising this image parsing library, we made two algorithmic changes to operate more efficiently and to reduce allocations. We used new standard library types, InlineArray and RawSpan, to eliminate runtime memory management, and learned about the new non-escapable language feature. The new Swift Binary Parsing library is built on top of these same features. The library is designed for building safe, efficient parsers of binary formats, and supports developers in handling multiple different kinds of safety. The library provides a whole set of parsing initializers and other tools that guide you to safely consume values from raw, binary data.

    Here’s an example of a parser for the QOI header, written using the new library. This shows several of its features, including ParserSpan, a customized raw span type for parsing binary data. And parsing initializers that prevent integer overflow and let you specify signedness, bit width, and byte order. The library also provides validating parsers for your own custom raw-representable types, and optional-producing operators, for safely making calculations with untrusted, newly parsed values.

    We're already using the Binary Parsing library inside Apple, and it's publicly available today! We encourage you to take a look and try it out. You can join the community by posting in the Swift forums or opening issues or pull requests on GitHub.

    Thanks so much for joining me on this journey through optimizing our Swift code! Try using Xcode and Instruments to profile a test of the performance-critical parts of your own app. You can explore the new InlineArray and Span types in the documentation, or by downloading the new version of Xcode. Have a great WWDC!

    • 7:01 - Corrected Data.readByte() method

      import Foundation
      
      extension Data {
        /// Consume a single byte from the start of this data.
        mutating func readByte() -> UInt8? {
          guard !isEmpty else { return nil }
          return self.popFirst()
        }
      }
    • 9:56 - RGBAPixel.data(channels:) method

      extension RGBAPixel {
        /// Returns the RGB or RGBA values for this pixel, as specified
        /// by the given channels information.
        func data(channels: QOI.Channels) -> some Collection<UInt8> {
          switch channels {
          case .rgb:
            [r, g, b]
          case .rgba:
            [r, g, b, a]
          }
        }
      }
    • 10:21 - Original QOIParser.parseQOI(from:) method

      extension QOIParser {
        /// Parses an image from the given QOI data.
        func parseQOI(from input: inout Data) -> QOI? {
          guard let header = QOI.Header(parsing: &input) else { return nil }
          
          let pixels = readEncodedPixels(from: &input)
            .flatMap { decodePixels(from: $0) }
            .prefix(header.pixelCount)
            .flatMap { $0.data(channels: header.channels) }
      
          return QOI(header: header, data: Data(pixels))
        }
      }
    • 12:53 - Revised QOIParser.parseQOI(from:) method

      extension QOIParser {
        /// Parses an image from the given QOI data.
        func parseQOI(from input: inout Data) -> QOI? {
          guard let header = QOI.Header(parsing: &input) else { return nil }
          
          let totalBytes = header.pixelCount * Int(header.channels.rawValue)
          var pixelData = Data(repeating: 0, count: totalBytes)
          var offset = 0
          
          while offset < totalBytes {
            guard let nextPixel = parsePixel(from: &input) else { break }
            
            switch nextPixel {
            case .run(let count):
              for _ in 0..<count {
                state.previousPixel
                  .write(to: &pixelData, at: &offset, channels: header.channels)
              }
            default:
              decodeSinglePixel(from: nextPixel)
                .write(to: &pixelData, at: &offset, channels: header.channels)
            }
          }
          
          return QOI(header: header, data: pixelData)
        }
      }
    • 15:07 - Array behavior

      var array = [1, 2, 3]
      array.append(4)
      array.removeFirst()
      // array == [2, 3, 4]
      
      var copy = array
      copy[0] = 10      // copy happens on mutation
      // array == [2, 3, 4]
      // copy == [10, 3, 4]
    • 19:47 - InlineArray behavior (part 1)

      var array: InlineArray<3, Int> = [1, 2, 3]
      array[0] = 4
      // array == [4, 2, 3]
      
      // Can't append or remove elements
      array.append(4)
      // error: Value of type 'InlineArray<3, Int>' has no member 'append'
      
      // Can only assign to a same-sized inline array
      let bigger: InlineArray<6, Int> = array
      // error: Cannot assign value of type 'InlineArray<3, Int>' to type 'InlineArray<6, Int>'
    • 20:23 - InlineArray behavior (part 2)

      var array: InlineArray<3, Int> = [1, 2, 3]
      array[0] = 4
      // array == [4, 2, 3]
      
      var copy = array    // copy happens on assignment
      for i in copy.indices {
          copy[i] += 10
      }
      // array == [4, 2, 3]
      // copy == [14, 12, 13]
    • 23:13 - processUsingBuffer() function

      // Safe usage of a buffer pointer
      func processUsingBuffer(_ array: [Int]) -> Int {
          array.withUnsafeBufferPointer { buffer in
              var result = 0
              for i in 0..<buffer.count {
                  result += calculate(using: buffer, at: i)
              }
              return result
          }
      }
    • 23:34 - Dangerous getPointerToBytes() function

      // Dangerous - DO NOT USE!
      func getPointerToBytes() -> UnsafePointer<UInt8> {
          let array: [UInt8] = Array(repeating: 0, count: 128)
          // DANGER: The next line escapes a pointer
          let pointer = array.withUnsafeBufferPointer { $0.baseAddress! }
          // DANGER: The next line returns the escaped pointer
          return pointer
      }
    • 24:46 - processUsingSpan() function

      // Safe usage of a span
      @available(macOS 16.0, *)
      func processUsingSpan(_ array: [Int]) -> Int {
          let intSpan = array.span
          var result = 0
          for i in 0..<intSpan.count {
              result += calculate(using: intSpan, at: i)
          }
          return result
      }
    • 25:07 - getHiddenSpanOfBytes() function (attempt 1)

      @available(macOS 16.0, *)
      func getHiddenSpanOfBytes() -> Span<UInt8> { }
      // error: Cannot infer lifetime dependence...
    • 25:28 - getHiddenSpanOfBytes() function (attempt 2)

      @available(macOS 16.0, *)
      func getHiddenSpanOfBytes() -> () -> Int {
          let array: [UInt8] = Array(repeating: 0, count: 128)
          let span = array.span
          return { span.count }
      }
    • 26:27 - RawSpan.readByte() method

      @available(macOS 16.0, *)
      extension RawSpan {
        mutating func readByte() -> UInt8? {
          guard !isEmpty else { return nil }
          
          let value = unsafeLoadUnaligned(as: UInt8.self)
          self = self._extracting(droppingFirst: 1)
          return value
        }
      }
    • 28:02 - Final QOIParser.parseQOI(from:) method

      /// Parses an image from the given QOI data.
      mutating func parseQOI(from input: inout RawSpan) -> QOI? {
        guard let header = QOI.Header(parsing: &input) else { return nil }
        
        let totalBytes = header.pixelCount * Int(header.channels.rawValue)
        
        let pixelData = Data(rawCapacity: totalBytes) { outputSpan in
          while outputSpan.count < totalBytes {
            guard let nextPixel = parsePixel(from: &input) else { break }
            
            switch nextPixel {
            case .run(let count):
              for _ in 0..<count {
                previousPixel
                  .write(to: &outputSpan, channels: header.channels)
              }
              
            default:
              decodeSinglePixel(from: nextPixel)
                .write(to: &outputSpan, channels: header.channels)
              
            }
          }
        }
        
        return QOI(header: header, data: pixelData)
      }
    • 28:31 - RGBAPixel.write(to:channels:) method

      @available(macOS 16.0, *)
      extension RGBAPixel {
        /// Writes this pixel's RGB or RGBA data into the given output span.
        @lifetime(&output)
        func write(to output: inout OutputRawSpan, channels: QOI.Channels) {
          output.append(r)
          output.append(g)
          output.append(b)
          
          if channels == .rgba {
            output.append(a)
          }
        }
      }
    • 0:00 - Introduction & Agenda
    • Learn how to optimize the memory use and performance of Swift apps and libraries using Swift 6.2. The new 'InlineArray' and 'Span' types reduce allocations, exclusivity checks, and reference counting. A new open-source library, Swift Binary Parsing, is introduced for fast, safe binary parsing.

    • 1:19 - QOI format & parser app
    • This WWDC25 session's app loads images in the QOI format, a simple, lossless format whose specification fits on a single page. The app's image parser handles the format's various pixel encodings. In the demo, the app loads a small icon instantly but takes several seconds to load a larger photo of a bird.
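The QOI header described here has a fixed 14-byte layout per the spec: the magic bytes "qoif", big-endian 32-bit width and height, then one byte each for channel count and colorspace. The following is a minimal, hedged sketch of parsing that header; the `QOIHeader` and `parseHeader` names are illustrative and not the session's actual types.

```swift
// Hypothetical sketch (not the session's code): parsing the fixed
// 14-byte QOI header: "qoif", big-endian width and height, then
// channels and colorspace bytes.
struct QOIHeader: Equatable {
    var width: UInt32
    var height: UInt32
    var channels: UInt8
    var colorspace: UInt8
}

func parseHeader(_ bytes: [UInt8]) -> QOIHeader? {
    guard bytes.count >= 14,
          bytes[0...3].elementsEqual("qoif".utf8) else { return nil }
    // Assemble a big-endian UInt32 from four consecutive bytes.
    func readUInt32(at i: Int) -> UInt32 {
        bytes[i..<i+4].reduce(0) { ($0 << 8) | UInt32($1) }
    }
    return QOIHeader(width: readUInt32(at: 4),
                     height: readUInt32(at: 8),
                     channels: bytes[12],
                     colorspace: bytes[13])
}

let header: [UInt8] = Array("qoif".utf8) + [0, 0, 0, 2, 0, 0, 0, 3, 4, 0]
let parsed = parseHeader(header)
assert(parsed == QOIHeader(width: 2, height: 3, channels: 4, colorspace: 0))
```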

    • 2:25 - Algorithms
    • When apps work with real-world data, performance issues often stem from a poor choice of algorithm or data structure. To identify these problems, you can use Instruments, which provides templates for analyzing allocations and releases and for finding inefficient code with profilers. The Time Profiler instrument is particularly useful: by analyzing the captured calls and stack traces, you can pinpoint where an app spends the most time. In the example, a significant amount of time was spent in a system call that copies data, 'platform_memmove'. Instruments traced the copying to a custom 'readByte' method added in an extension on the 'Data' type, which copied the remaining binary data on every call. The example replaces that implementation with the more efficient 'popFirst()' method, which shrinks the data from the front of a sequence without copying. After the change, the profile no longer shows the large 'platform_memmove' bar in the flame graph, benchmarking shows a substantial speedup, and the relationship between image size and parsing time changes from quadratic to linear, indicating a more efficient algorithm.
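The quadratic-versus-linear distinction comes down to how the front of the data is consumed. A hedged sketch using an `ArraySlice` in place of `Data` (the types and values here are illustrative, not the session's code):

```swift
// Hypothetical sketch: consuming bytes from the front without copying.
let data: [UInt8] = [0x71, 0x6F, 0x69, 0x66]
var slice = data[...]          // ArraySlice shares storage with `data`

// popFirst() on a slice just advances its start index: O(1), no copy.
// Rebuilding the remaining bytes into a fresh collection on every read
// would instead cost O(n) per byte, making the whole parse quadratic.
let first = slice.popFirst()
assert(first == 0x71)
assert(Array(slice) == [0x6F, 0x69, 0x66])
```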

    • 8:17 - Allocations
    • Profiling the app again reveals that the image parser performs excessive memory allocations and deallocations, particularly of arrays. Nearly a million allocations for parsing a single image indicates a serious problem, and most of them are transient and short-lived, which suggests they can be eliminated. The Allocations instrument identifies a method called 'RGBAPixel.data(channels:)' as a major culprit: it creates a new array every time it's called. The code's structure, a chain of 'flatMap' and 'prefix' calls, compounds the problem, because each step in the chain allocates new arrays as data is created, flattened, and copied. The approach is concise, but not memory-efficient. To address this, the example rewrites the parsing function: instead of relying on intermediate allocations, it calculates the total size of the result upfront and allocates a single buffer, eliminating repeated allocations during decoding.
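The single-buffer idea can be sketched in miniature: compute the output size first, reserve it once, then fill. This is a simplified illustration, not the session's parser; the `pixels` data is made up.

```swift
// Hypothetical sketch: one preallocated buffer instead of a chain of
// intermediate allocations.
let pixels: [[UInt8]] = [[1, 2, 3], [4, 5, 6]]

// Chained approach: flatMap allocates a fresh intermediate array.
let chained = pixels.flatMap { $0 }

// Preallocated approach: compute the total size up front, then append
// into a single buffer so no intermediate arrays are created.
var buffer: [UInt8] = []
buffer.reserveCapacity(pixels.reduce(0) { $0 + $1.count })
for pixel in pixels {
    buffer.append(contentsOf: pixel)
}
assert(buffer == chained)
```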

    • 16:30 - Exclusivity
    • The app's performance improved so much that the profiling run needed more data, so the parsing code is looped 50 times. The new results show 'swift_beginAccess' and 'swift_endAccess' symbols, which indicate runtime exclusivity checks. The checks were caused by properties on a 'State' class nested within the 'QOIParser' structure; the example moves those properties directly into the parent parser type to eliminate them. After a few compiler adjustments, a new profile run verifies that the exclusivity checking is completely gone.
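The before-and-after shape of that change can be sketched as follows; `ParserState`, `Parser`, and `previousPixel` here are illustrative stand-ins, not the session's exact declarations.

```swift
// Hypothetical sketch: moving mutable state from a nested class into
// the parent struct so exclusivity can be enforced at compile time.

// Before: reference-type state. Mutations through a class reference
// require dynamic swift_beginAccess/swift_endAccess checks at runtime.
final class ParserState {
    var previousPixel: UInt32 = 0
}

// After: the state lives directly in the struct. Exclusive access is
// proven statically, so no runtime exclusivity checks are emitted.
struct Parser {
    var previousPixel: UInt32 = 0

    mutating func record(_ pixel: UInt32) {
        previousPixel = pixel
    }
}

var parser = Parser()
parser.record(42)
assert(parser.previousPixel == 42)
```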

    • 19:12 - Stack versus heap
    • The example replaces the app's use of 'Array' with 'InlineArray', a fixed-size collection stored inline that eliminates reference counting and exclusivity checks. It's ideal for the pixel cache, a 64-element array that never changes size and is modified in place, so performance improves without any copying or reference sharing.
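The 64-entry pixel cache uses the QOI spec's hash, (r*3 + g*5 + b*7 + a*11) % 64. The sketch below illustrates that cache with a plain fixed-count `Array` standing in for the session's `InlineArray<64, _>` (so it compiles on pre-6.2 toolchains); the `Pixel` type and `cacheIndex` helper are illustrative names.

```swift
// Hypothetical sketch of the 64-entry pixel cache, hashed per the QOI
// spec. The session stores this in an InlineArray<64, _>; a fixed-count
// Array stands in here.
struct Pixel: Equatable {
    var r, g, b, a: UInt8
}

// QOI spec hash: always in 0..<64, so it indexes the cache directly.
func cacheIndex(_ p: Pixel) -> Int {
    (Int(p.r) * 3 + Int(p.g) * 5 + Int(p.b) * 7 + Int(p.a) * 11) % 64
}

var cache = Array(repeating: Pixel(r: 0, g: 0, b: 0, a: 0), count: 64)
let red = Pixel(r: 255, g: 0, b: 0, a: 255)
cache[cacheIndex(red)] = red      // modified in place, never resized
assert(cache[cacheIndex(red)] == red)
```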

    • 21:08 - Reference counting
    • In the final optimization, the example uses the new 'Span' types to improve performance and memory safety. The Time Profiler flame graph, focused on the 'QOIParser', shows significant time spent in reference-counting operations, particularly on the 'Data' type due to its copy-on-write semantics. 'Span' and its related types are a new way to work with the contiguous memory of a collection. They use Swift's nonescapable ('~Escapable') feature, which binds their lifetimes to the collection they view, ensuring memory safety without manual memory management and allowing efficient access to memory without the risks of unsafe pointers. The example rewrites the parsing methods with these types, making them simpler, safer, and more performant: 'Data' is replaced with 'RawSpan', greatly reducing reference-counting overhead, and 'OutputRawSpan' is adopted for writing the decoded pixels, making the parsing operation six times faster than before without resorting to unsafe pointers.

    • 29:52 - Swift Binary Parsing library
    • Swift Binary Parsing enables you to create safe, efficient parsers for binary formats. It provides tools for managing several kinds of safety, including preventing integer overflow, requiring explicit signedness, bit width, and byte order, and validating custom types. The library is already in use at Apple and is publicly available; you can try it out and contribute via the Swift forums and GitHub.

    • 31:03 - Next steps
    • Key takeaways: use Xcode and Instruments to profile your apps; analyze the performance of your algorithms to identify bottlenecks; and explore solutions with the new 'InlineArray' and 'Span' types introduced in Swift 6.2.
