I don't think you're using the same definition of "optimization" as everyone else. Just designing things efficiently isn't really the same.
It sounds like you came up with the flyweight pattern.
Interesting, I had to look up what a "flyweight" pattern was. I may not quite understand the concept correctly from the definition I saw, though. To me "flyweight" just describes every program ever written in C, or just relegating the functionality of objects to the things that manage them. I don't know if that's right. ..?
Tell that to the stupid compiler.
...
I don't have the original assembly file anymore, but it was fairly bad, let me tell you. Now, I'm not very good at digesting that kind of stuff because I'm not very good at reading and understanding it in depth, but it was dumb. The obvious offenders were reads from too many random points in memory that never change, likely causing cache issues (though like I said, vcp didn't want to track them for some reason), and I just had to pull anything that was confusing it out of the inner loop. Pretty not-hard things to do, but it just seems like a waste of time. However, in this case there was a large improvement, so it was well worth it IMO.
I can post the newer code listing though. I did add a few good ideas in there.
Good enough for me. Now I don't have to worry about it at all.
Code:
; 255 :
; 256 : float32 tilePosX = (x1 * tileWidth) + (minX - x1) * tileWidth;

  0005e 8b 45 10          mov   eax, DWORD PTR _minX$[ebp]
  00061 0f af 45 08       imul  eax, DWORD PTR _tileWidth$[ebp]

; 257 : float32 tilePosY = (y1 * tileHeight) + (minY - y1) * tileHeight;
; 258 : float32 currentTilePositionX = tilePosX;
; 259 : float32 currentTilePositionY = tilePosY;
; 260 :
; 261 : Quad2D* vertexPointer = (Quad2D*)spriteBatch->PushCurrentVertexArrayPointer();

  00065 8b 7d 0c          mov   edi, DWORD PTR _spriteBatch$[ebp]
  00068 89 45 08          mov   DWORD PTR tv870[ebp], eax
  0006b 8b c6             mov   eax, esi
  0006d 0f af 45 fc       imul  eax, DWORD PTR _tileHeight$[ebp]
  00071 db 45 08          fild  DWORD PTR tv870[ebp]
  00074 d9 5d 14          fstp  DWORD PTR _tilePosX$[ebp]
  00077 d9 45 14          fld   DWORD PTR _tilePosX$[ebp]
  0007a d9 5d ec          fstp  DWORD PTR _currentTilePositionX$[ebp]
  0007d 89 45 08          mov   DWORD PTR tv867[ebp], eax
  00080 8b 47 28          mov   eax, DWORD PTR [edi+40]
  00083 89 45 18          mov   DWORD PTR _vertexPointer$[ebp], eax
  00086 db 45 08          fild  DWORD PTR tv867[ebp]
  00089 83 c4 30          add   esp, 48 ; 00000030H

; 262 :
; 263 : spriteBatch->SetBlendMode(GetBlendMode());

  0008c 8d 43 2c          lea   eax, DWORD PTR [ebx+44]
  0008f 50                push  eax
  00090 8b cf             mov   ecx, edi
  00092 d9 5d fc          fstp  DWORD PTR _currentTilePositionY$[ebp]
  00095 e8 00 00 00 00    call  ?SetBlendMode@SpriteBatch@@QAEXABVBlendMode@@@Z ; SpriteBatch::SetBlendMode

; 264 : spriteBatch->SetTextureID(GetTileset()->GetTextureID());

  0009a 8b 43 14          mov   eax, DWORD PTR [ebx+20]
  0009d e8 00 00 00 00    call  ?GetTextureID@Tileset@@QBEIXZ ; Tileset::GetTextureID
  000a2 50                push  eax
  000a3 8b cf             mov   ecx, edi
  000a5 e8 00 00 00 00    call  ?SetTextureID@SpriteBatch@@QAEXI@Z ; SpriteBatch::SetTextureID

; 265 :
; 266 : const Color layerColor = GetColor();

  000aa 8b 4b 30          mov   ecx, DWORD PTR [ebx+48]

; 267 :
; 268 : for(int32 y(minY); y != maxY; ++y)

  000ad 89 75 08          mov   DWORD PTR _y$16981[ebp], esi
  000b0 3b 75 1c          cmp   esi, DWORD PTR _maxY$[ebp]
  000b3 0f 84 16 01 00 00 je    $LN7@InternalDr@2

; 269 : {
; 270 : const TileMapLayerCell* currentCell = &m_tiles(y, minX);

  000b9 8b 45 f0          mov   eax, DWORD PTR _maxX$[ebp]
  000bc d9 45 fc          fld   DWORD PTR _currentTilePositionY$[ebp]
  000bf 2b 45 10          sub   eax, DWORD PTR _minX$[ebp]
  000c2 d9 45 ec          fld   DWORD PTR _currentTilePositionX$[ebp]
  000c5 c1 e0 03          shl   eax, 3
  000c8 89 45 f0          mov   DWORD PTR tv393[ebp], eax
  000cb eb 02             jmp   SHORT $LN9@InternalDr@2
$LN73@InternalDr@2:

; 267 :
; 268 : for(int32 y(minY); y != maxY; ++y)

  000cd d9 c9             fxch  ST(1)
$LN9@InternalDr@2:

; 269 : {
; 270 : const TileMapLayerCell* currentCell = &m_tiles(y, minX);

  000cf 8b 43 24          mov   eax, DWORD PTR [ebx+36]
  000d2 0f af 45 08       imul  eax, DWORD PTR _y$16981[ebp]
  000d6 03 45 10          add   eax, DWORD PTR _minX$[ebp]
  000d9 8b 53 18          mov   edx, DWORD PTR [ebx+24]
  000dc 8d 14 c2          lea   edx, DWORD PTR [edx+eax*8]

; 271 : const TileMapLayerCell* lastCell = currentCell + (maxX - minX);

  000df 8b 45 f0          mov   eax, DWORD PTR tv393[ebp]
  000e2 03 c2             add   eax, edx
  000e4 89 45 ec          mov   DWORD PTR _lastCell$16986[ebp], eax

; 272 :
; 273 : for( ; currentCell != lastCell; ++currentCell)

  000e7 3b d0             cmp   edx, eax
  000e9 0f 84 c3 00 00 00 je    $LN54@InternalDr@2
$LN6@InternalDr@2:

; 274 : {
; 275 : const Tile* tile = currentCell->tile;

  000ef 8b 02             mov   eax, DWORD PTR [edx]

; 276 :
; 277 : if(tile != null)

  000f1 85 c0             test  eax, eax
  000f3 0f 84 a6 00 00 00 je    $LN74@InternalDr@2

; 278 : {
; 279 : Rectf textureCoords = tile->uv;

  000f9 8d 70 08          lea   esi, DWORD PTR [eax+8]

; 280 :
; 281 : // flip
; 282 : if(currentCell->flags & 1) Swap(textureCoords.min.x, textureCoords.max.x);

  000fc 8a 42 06          mov   al, BYTE PTR [edx+6]
  000ff 8d 7d dc          lea   edi, DWORD PTR _textureCoords$16992[ebp]
  00102 a5                movsd
  00103 a5                movsd
  00104 a5                movsd
  00105 a5                movsd
  00106 a8 01             test  al, 1
  00108 74 0c             je    SHORT $LN32@InternalDr@2
  0010a d9 45 dc          fld   DWORD PTR _textureCoords$16992[ebp]
  0010d d9 45 e4          fld   DWORD PTR _textureCoords$16992[ebp+8]
  00110 d9 5d dc          fstp  DWORD PTR _textureCoords$16992[ebp]
  00113 d9 5d e4          fstp  DWORD PTR _textureCoords$16992[ebp+8]
$LN32@InternalDr@2:

; 283 : if(currentCell->flags & 2) Swap(textureCoords.min.y, textureCoords.max.y);

  00116 a8 02             test  al, 2
  00118 74 0c             je    SHORT $LN34@InternalDr@2
  0011a d9 45 e0          fld   DWORD PTR _textureCoords$16992[ebp+4]
  0011d d9 45 e8          fld   DWORD PTR _textureCoords$16992[ebp+12]
  00120 d9 5d e0          fstp  DWORD PTR _textureCoords$16992[ebp+4]
  00123 d9 5d e8          fstp  DWORD PTR _textureCoords$16992[ebp+12]
$LN34@InternalDr@2:

; 284 :
; 285 : float vertices[4] = {
; 286 : currentTilePositionX,
; 287 : currentTilePositionY,
; 288 : currentTilePositionX + floatTileWidth,
; 289 : currentTilePositionY + floatTileHeight
; 290 : };
; 291 :
; 292 : const Color color = layerColor;
; 293 :
; 294 : vertexPointer->SetVertexUVColorData((float32*)vertices, (float32*)&textureCoords, color);

  00126 8b 45 18          mov   eax, DWORD PTR _vertexPointer$[ebp]
  00129 d9 45 f8          fld   DWORD PTR _floatTileWidth$[ebp]
  0012c d8 c1             fadd  ST(0), ST(1)

; 295 : ++vertexPointer;

  0012e 8b 7d 0c          mov   edi, DWORD PTR _spriteBatch$[ebp]
  00131 d9 c2             fld   ST(2)
  00133 89 48 10          mov   DWORD PTR [eax+16], ecx
  00136 d8 45 f4          fadd  DWORD PTR _floatTileHeight$[ebp]
  00139 89 48 24          mov   DWORD PTR [eax+36], ecx
  0013c d9 ca             fxch  ST(2)
  0013e 89 48 38          mov   DWORD PTR [eax+56], ecx
  00141 d9 10             fst   DWORD PTR [eax]
  00143 89 48 4c          mov   DWORD PTR [eax+76], ecx
  00146 d9 cb             fxch  ST(3)
  00148 83 c0 50          add   eax, 80 ; 00000050H
  0014b d9 50 b4          fst   DWORD PTR [eax-76]
  0014e 89 45 18          mov   DWORD PTR _vertexPointer$[ebp], eax
  00151 d9 45 dc          fld   DWORD PTR _textureCoords$16992[ebp]
  00154 d9 58 b8          fstp  DWORD PTR [eax-72]
  00157 d9 45 e0          fld   DWORD PTR _textureCoords$16992[ebp+4]
  0015a d9 58 bc          fstp  DWORD PTR [eax-68]
  0015d d9 cb             fxch  ST(3)
  0015f d9 50 c4          fst   DWORD PTR [eax-60]
  00162 d9 ca             fxch  ST(2)
  00164 d9 50 c8          fst   DWORD PTR [eax-56]
  00167 d9 45 dc          fld   DWORD PTR _textureCoords$16992[ebp]
  0016a d9 58 cc          fstp  DWORD PTR [eax-52]
  0016d d9 45 e8          fld   DWORD PTR _textureCoords$16992[ebp+12]
  00170 d9 58 d0          fstp  DWORD PTR [eax-48]
  00173 d9 c9             fxch  ST(1)
  00175 d9 50 d8          fst   DWORD PTR [eax-40]
  00178 d9 c9             fxch  ST(1)
  0017a d9 58 dc          fstp  DWORD PTR [eax-36]
  0017d d9 45 e4          fld   DWORD PTR _textureCoords$16992[ebp+8]
  00180 d9 58 e0          fstp  DWORD PTR [eax-32]
  00183 d9 45 e8          fld   DWORD PTR _textureCoords$16992[ebp+12]
  00186 d9 58 e4          fstp  DWORD PTR [eax-28]
  00189 d9 58 ec          fstp  DWORD PTR [eax-20]
  0018c d9 c9             fxch  ST(1)
  0018e d9 50 f0          fst   DWORD PTR [eax-16]
  00191 d9 45 e4          fld   DWORD PTR _textureCoords$16992[ebp+8]
  00194 d9 58 f4          fstp  DWORD PTR [eax-12]
  00197 d9 45 e0          fld   DWORD PTR _textureCoords$16992[ebp+4]
  0019a d9 58 f8          fstp  DWORD PTR [eax-8]
  0019d eb 02             jmp   SHORT $LN3@InternalDr@2
$LN74@InternalDr@2:

; 267 :
; 268 : for(int32 y(minY); y != maxY; ++y)

  0019f d9 c9             fxch  ST(1)
$LN3@InternalDr@2:

; 272 :
; 273 : for( ; currentCell != lastCell; ++currentCell)

  001a1 83 c2 08          add   edx, 8

; 296 : }
; 297 :
; 298 : currentTilePositionX += floatTileWidth;

  001a4 d9 c9             fxch  ST(1)
  001a6 d8 45 f8          fadd  DWORD PTR _floatTileWidth$[ebp]
  001a9 3b 55 ec          cmp   edx, DWORD PTR _lastCell$16986[ebp]
  001ac 0f 85 3d ff ff ff jne   $LN6@InternalDr@2
$LN54@InternalDr@2:

; 267 :
; 268 : for(int32 y(minY); y != maxY; ++y)

  001b2 ff 45 08          inc   DWORD PTR _y$16981[ebp]

; 272 :
; 273 : for( ; currentCell != lastCell; ++currentCell)

  001b5 dd d8             fstp  ST(0)

; 299 : }
; 300 :
; 301 : currentTilePositionX = tilePosX;

  001b7 d9 45 14          fld   DWORD PTR _tilePosX$[ebp]
  001ba 8b 45 08          mov   eax, DWORD PTR _y$16981[ebp]

; 302 : currentTilePositionY += floatTileHeight;

  001bd d9 c9             fxch  ST(1)
  001bf d8 45 f4          fadd  DWORD PTR _floatTileHeight$[ebp]
  001c2 3b 45 1c          cmp   eax, DWORD PTR _maxY$[ebp]
  001c5 0f 85 02 ff ff ff jne   $LN73@InternalDr@2
  001cb dd d9             fstp  ST(1)
  001cd dd d8             fstp  ST(0)
$LN7@InternalDr@2:

; 303 : }
; 304 :
; 305 : // This will simply increment the current vertex pointer in the array.
; 306 : // Since we validate storage beforehand this is extremely fast.
; 307 : spriteBatch->PopCurrentVertexArrayPointer(vertexPointer);

  001cf 8b 45 18          mov   eax, DWORD PTR _vertexPointer$[ebp]
  001d2 2b 47 28          sub   eax, DWORD PTR [edi+40]
  001d5 6a 50             push  80 ; 00000050H
  001d7 59                pop   ecx
  001d8 99                cdq
  001d9 f7 f9             idiv  ecx
  001db 6b c0 50          imul  eax, 80 ; 00000050H
  001de 01 47 28          add   DWORD PTR [edi+40], eax
...There's probably not much more I can do anyway; too many FPU loads and stores going on.
[edit] Tip of the day: Don't ever trust the compiler to do things for you.
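For anyone wondering what "pulling things out of the inner loop" looks like in practice, here is a rough sketch. It is not the engine code from the listing: `Layer`, `emitNaive` and `emitHoisted` are made-up names, and the layout is an assumption chosen to keep the example short. The idea is only that values which never change during the loop get copied into locals once, so the compiler does not have to prove they are invariant.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a tile layer; not the types from the listing.
struct Layer {
    int tileWidth;
    int tileHeight;
    int columns;
    std::vector<int> cells;  // row-major tile indices, negative means empty
};

// Naive version: every iteration reads the layer fields through the reference
// and recomputes the row offset and the y position.
std::size_t emitNaive(const Layer& layer, int row, float* out) {
    std::size_t n = 0;
    for (int x = 0; x < layer.columns; ++x) {
        if (layer.cells[row * layer.columns + x] < 0) continue;
        out[n++] = static_cast<float>(x * layer.tileWidth);
        out[n++] = static_cast<float>(row * layer.tileHeight);
    }
    return n;
}

// Hoisted version: loop-invariant values live in locals, and the loop walks
// a pointer range instead of re-indexing the vector each time.
std::size_t emitHoisted(const Layer& layer, int row, float* out) {
    const int columns = layer.columns;
    const float tileW = static_cast<float>(layer.tileWidth);
    const float y = static_cast<float>(row * layer.tileHeight);
    const int* cell = layer.cells.data() + row * columns;
    const int* const last = cell + columns;
    float x = 0.0f;
    std::size_t n = 0;
    for (; cell != last; ++cell, x += tileW) {
        if (*cell < 0) continue;  // x still advances via the loop increment
        out[n++] = x;
        out[n++] = y;
    }
    return n;
}
```

Both functions write the same positions; whether the second one is actually faster depends on the compiler and flags, so it is worth checking the generated assembly rather than assuming.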
Last edited by Gleeok; 07-07-2015 at 06:00 AM.
This post contains the official Gleeok seal of approval. Look for these and other posts in an area near you.
Looking at it again, I think I misread that before. You're probably using the flyweight pattern, but that's not what you were describing.
Interesting, I had to look up what a "flyweight" pattern was. I may not quite understand the concept correctly from the definition I saw, though. To me "flyweight" just describes every program ever written in C, or just relegating the functionality of objects to the things that manage them. I don't know if that's right. ..?
It's basically deduplication. When you've got a lot of objects that are largely identical, don't give every one of them its own copy of the common data. Just keep one copy of each and give each instance a pointer. It's the same way combos work in ZC, for instance; each one on the screen is just a combo number rather than a separate copy of the definition.
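A minimal sketch of that idea, assuming a tile map where each cell stores only an index into a shared table of definitions. The names `TileDef`, `Cell` and `TileMap` and their fields are invented for illustration; they are not ZC's or anyone's real structures.

```cpp
#include <cstdint>
#include <vector>

// Shared, immutable definition: one copy per kind of tile (hypothetical fields).
struct TileDef {
    float u0, v0, u1, v1;  // texture rectangle
    bool solid;
};

// Per-instance state: an index into the shared table plus a few cheap flags.
struct Cell {
    std::uint16_t def;    // which TileDef this cell uses
    std::uint8_t flags;   // e.g. bit 0 = flip X, bit 1 = flip Y
};

struct TileMap {
    std::vector<TileDef> defs;  // the shared copies (the "flyweights")
    std::vector<Cell> cells;    // width * height instances, row-major
    int width;

    // Every cell with the same `def` resolves to the same TileDef object.
    const TileDef& defAt(int x, int y) const {
        return defs[cells[y * width + x].def];
    }
};

// The point of the pattern: an instance is much smaller than a definition.
static_assert(sizeof(Cell) < sizeof(TileDef), "cells should stay small");
```

A thousand cells then cost a thousand small `Cell` records plus one `TileDef` per distinct tile, instead of a thousand full copies.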
What I mean to say is that the "root of all evil" optimization isn't high-level design. It's stuff like rewriting a function in assembly to save a few clock cycles. Small things that make the code harder to understand and maintain for relatively little performance gain.
Tell that to the stupid compiler.
I think I see what you mean though. [side note; awful explanation: https://en.wikipedia.org/wiki/Flyweight_pattern ]
It's like a chess program. Each piece has no information about itself, not even its position. Then you get to the board, which defines all the pieces together as bit states. Then you have to go up to a component that manages boards just to see if there's something at square 31, and so on. Very efficient.
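Roughly what I have in mind, as a sketch only: one 64-bit word per kind of piece, where bit n set means a piece of that kind sits on square n. The `Board` struct and its four fields are made up to keep this short; a real engine would track every piece type.

```cpp
#include <cstdint>

// The pieces themselves store nothing; the board owns all the state as bits.
struct Board {
    std::uint64_t whitePawns;
    std::uint64_t whiteKing;
    std::uint64_t blackPawns;
    std::uint64_t blackKing;

    // Union of every piece set: one bit per occupied square.
    std::uint64_t occupied() const {
        return whitePawns | whiteKing | blackPawns | blackKing;
    }

    // "Is there something at square 31?" becomes a shift and a mask.
    // `square` is assumed to be in the range 0..63.
    bool hasPieceAt(int square) const {
        return ((occupied() >> square) & 1u) != 0;
    }
};
```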
Yep, ZC does do a lot of things well. Which is why I always think that rewriting it would be easy, because it's easy to see where the bare bones of it are very sane, and where they aren't.
There are probably many different definitions people have of what "optimization" is, I guess. Trying to save a few cycles from a function that gets called 1000 times is stupid to me. Trying to stop a potential L2 cache miss 1000 times on different particles, entities, and collision stuff is not stupid. That's just the way I see it.
Let's just put it this way:
I have a 1.8 GHz CPU, which at 60 frames per second comes out to *roughly* 30,000,000 CYCLES/FRAME. If I was taking 3% of that and then optimized it down to 1% of that, that comes out to an improvement of 600,000 CYCLES/FRAME, or 36,000,000 CYCLES per SECOND.
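The cycle budget is easy to sanity-check with a few constants. This is just the arithmetic, assuming 60 frames per second (that number is my assumption; it is what makes 1.8 GHz come out to about 30,000,000 cycles per frame). All names here are made up for the example.

```cpp
#include <cstdint>

constexpr std::int64_t kClockHz = 1800000000;   // 1.8 GHz
constexpr std::int64_t kFramesPerSecond = 60;   // assumption, see above
constexpr std::int64_t kCyclesPerFrame = kClockHz / kFramesPerSecond;

// Cycles per frame used by something that takes `percent` of the frame.
constexpr std::int64_t cyclesForPercent(std::int64_t percent) {
    return kCyclesPerFrame * percent / 100;
}

// Going from 3% of the frame down to 1% of the frame.
constexpr std::int64_t kSavedPerFrame = cyclesForPercent(3) - cyclesForPercent(1);
constexpr std::int64_t kSavedPerSecond = kSavedPerFrame * kFramesPerSecond;

static_assert(kCyclesPerFrame == 30000000, "1.8 GHz at 60 fps");
```

With these inputs the saving works out to 2% of the frame budget, which is 600,000 cycles per frame.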
....I wonder if I can SIMD that? ..hmm.
Last edited by Gleeok; 07-09-2015 at 07:22 AM. Reason: math is hard when you are tired