Bug 1479145 - Give RGB textures a 32-byte aligned stride on macOS in order to improve texture upload efficiency on certain drivers. r=mattwoodrow a=pascalc
author Markus Stange <mstange@themasta.com>
Fri, 29 Mar 2019 20:11:12 +0000
changeset 525964 f64e6108bdfc6462983667170768450c320efaf7
parent 525963 233aa5cf03be6c45ed388f2a7b3b73563de06e2c
child 525965 81e863ea76d5add70e24fc5134498cc21dd95eb2
push id 2032
push user ffxbld-merge
push date Mon, 13 May 2019 09:36:57 +0000
treeherder mozilla-release@455c1065dcbe
reviewers mattwoodrow, pascalc
bugs 1479145
milestone 67.0
Bug 1479145 - Give RGB textures a 32-byte aligned stride on macOS in order to improve texture upload efficiency on certain drivers. r=mattwoodrow a=pascalc

In particular, it looks like this alignment is required by the Intel driver on macOS if you want to avoid CPU copies.

It was already known that the efficiency gains from client storage only materialize if you follow certain restrictions:

 - The textures need to use the TEXTURE_RECTANGLE_ARB texture target.
 - The textures' format, internalFormat and type need to be chosen from a small list of supported configurations. Unsupported configurations will trigger format conversions on the CPU.
 - The GL_TEXTURE_STORAGE_HINT_APPLE may need to be set to shared or cached.
 - glTextureRangeAPPLE may or may not make a difference.

It now appears that the stride alignment is another requirement: When uploading textures which otherwise comply with the above requirements, the Intel driver will still make copies using the CPU if the texture's stride is not 32-byte aligned.

These CPU copies are reflected in a high CPU usage (as observed in Activity Monitor) and they show up in profiles as time spent inside _platform_memmove under glrUpdateTexture.

However, when uploading 32-byte stride aligned textures which comply with the above requirements, this CPU usage goes away. There might still be hardware copies behind the scenes, but they no longer take up CPU time.

Differential Revision: https://phabricator.services.mozilla.com/D25316
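
To make the stride requirement concrete, here is a minimal sketch of rounding a row's byte length up to a power-of-two alignment. AlignedStrideSketch is a hypothetical helper written for illustration; it is not the GetAlignedStride template used in the patch, whose exact behavior (for example on overflow) may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only; not Gecko's GetAlignedStride.
// Rounds the byte length of one pixel row up to the next multiple of
// `alignment`, which must be a power of two.
// Returns 0 for non-positive inputs or if the result does not fit in int32_t.
inline int32_t AlignedStrideSketch(int32_t width, int32_t bytesPerPixel,
                                   int32_t alignment) {
  // The mask trick below only works for power-of-two alignments.
  assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
  if (width <= 0 || bytesPerPixel <= 0) {
    return 0;
  }
  // Do the arithmetic in 64 bits so width * bytesPerPixel cannot overflow.
  const int64_t rowBytes = int64_t(width) * bytesPerPixel;
  const int64_t mask = int64_t(alignment) - 1;
  const int64_t stride = (rowBytes + mask) & ~mask;
  if (stride > INT32_MAX) {
    return 0;
  }
  return int32_t(stride);
}
```

For a 100-pixel-wide texture at 4 bytes per pixel, a row occupies 400 bytes. With 4-byte alignment the stride stays 400, while with 32-byte alignment it becomes 416, so each row carries 16 bytes of padding.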
gfx/layers/ImageDataSerializer.cpp
--- a/gfx/layers/ImageDataSerializer.cpp
+++ b/gfx/layers/ImageDataSerializer.cpp
@@ -17,17 +17,22 @@
 
 namespace mozilla {
 namespace layers {
 namespace ImageDataSerializer {
 
 using namespace gfx;
 
 int32_t ComputeRGBStride(SurfaceFormat aFormat, int32_t aWidth) {
+#ifdef XP_MACOSX
+  // Some drivers require an alignment of 32 bytes for efficient texture upload.
+  return GetAlignedStride<32>(aWidth, BytesPerPixel(aFormat));
+#else
   return GetAlignedStride<4>(aWidth, BytesPerPixel(aFormat));
+#endif
 }
 
 int32_t GetRGBStride(const RGBDescriptor& aDescriptor) {
   return ComputeRGBStride(aDescriptor.format(), aDescriptor.size().width);
 }
 
 uint32_t ComputeRGBBufferSize(IntSize aSize, SurfaceFormat aFormat) {
   MOZ_ASSERT(aSize.height >= 0 && aSize.width >= 0);