Thread: Musings of a Tilemap Junky

  1. #1
    Is this the end?
    ZC Developer
    Saffith's Avatar
    Join Date
    Jan 2001
    Age
    41
    Posts
    3,389
    Mentioned
    178 Post(s)
    Tagged
    6 Thread(s)
    I don't think you're using the same definition of "optimization" as everyone else. Just designing things efficiently isn't really the same.
    It sounds like you came up with the flyweight pattern.

  2. #2
    The Time-Loop Continues ZC Developer
    Gleeok's Avatar
    Join Date
    Apr 2007
    Posts
    4,826
    Mentioned
    259 Post(s)
    Tagged
    10 Thread(s)
    Quote Originally Posted by Saffith
    It sounds like you came up with the flyweight pattern.
    Interesting, I had to look up what a "flyweight" pattern was. I may not quite understand the concept correctly from the definition I saw, though. To me "flyweight" just describes every program ever written in C, or just relegating the functionality of objects to the things that manage them. I don't know if that's right...?

    Quote Originally Posted by Saffith
    I don't think you're using the same definition of "optimization" as everyone else. Just designing things efficiently isn't really the same.
    Tell that to the stupid compiler.

    ...
    I don't have the original assembly file anymore, but it was fairly bad, let me tell you. Now, I'm not very good at digesting that kind of stuff because I'm not very good at reading and understanding it in depth, but it was dumb. The obvious offenders were reading from too many random points in memory that never change, likely causing cache issues (though like I said, vcp didn't want to track them for some reason), and just having to pull anything that was confusing to it out of the inner loop. Pretty not-hard things to do, but it just seems like a waste of time. However, in this case there was a large improvement, so it was well worth it IMO.
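
    A minimal sketch of the "pull it out of the inner loop" idea, using hypothetical names (`Layer`, `positionsNaive`, `positionsHoisted`) that are not from the actual engine. Because `out` is a `float*`, the compiler may have to assume a store through it could change `layer.tileWidth` or `layer.originX` and reload them on every iteration; copying them into locals first removes that doubt.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a tile layer; not the engine's real type.
struct Layer {
    std::vector<float> cells;
    float tileWidth;  // never changes while drawing
    float originX;    // never changes while drawing
};

// Naive version: reads the layer's members through the reference on
// every iteration while also writing floats through `out`.
void positionsNaive(const Layer& layer, float* out) {
    assert(out != nullptr);
    for (std::size_t i = 0; i != layer.cells.size(); ++i)
        out[i] = layer.originX + static_cast<float>(i) * layer.tileWidth;
}

// Hoisted version: invariants are copied into locals once, and the
// position is advanced by addition instead of being recomputed.
void positionsHoisted(const Layer& layer, float* out) {
    assert(out != nullptr);
    const float tileWidth = layer.tileWidth;
    const std::size_t count = layer.cells.size();
    float x = layer.originX;
    for (std::size_t i = 0; i != count; ++i) {
        out[i] = x;
        x += tileWidth;
    }
}
```

    Whether this helps at all depends on the compiler and the surrounding code, and the running sum can round differently from the multiply for tile sizes that are not exactly representable, so it is worth checking the generated assembly rather than assuming.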

    I can post the newer code listing though. I did add a few good ideas in there.

    Code:
    ; 255  : 	
    ; 256  : 	float32 tilePosX = (x1 * tileWidth) + (minX - x1) * tileWidth;
    
      0005e	8b 45 10	 mov	 eax, DWORD PTR _minX$[ebp]
      00061	0f af 45 08	 imul	 eax, DWORD PTR _tileWidth$[ebp]
    
    ; 257  : 	float32 tilePosY = (y1 * tileHeight) + (minY - y1) * tileHeight;
    ; 258  : 	float32 currentTilePositionX = tilePosX;
    ; 259  : 	float32 currentTilePositionY = tilePosY;
    ; 260  :  
    ; 261  : 	Quad2D* vertexPointer = (Quad2D*)spriteBatch->PushCurrentVertexArrayPointer();
    
      00065	8b 7d 0c	 mov	 edi, DWORD PTR _spriteBatch$[ebp]
      00068	89 45 08	 mov	 DWORD PTR tv870[ebp], eax
      0006b	8b c6		 mov	 eax, esi
      0006d	0f af 45 fc	 imul	 eax, DWORD PTR _tileHeight$[ebp]
      00071	db 45 08	 fild	 DWORD PTR tv870[ebp]
      00074	d9 5d 14	 fstp	 DWORD PTR _tilePosX$[ebp]
      00077	d9 45 14	 fld	 DWORD PTR _tilePosX$[ebp]
      0007a	d9 5d ec	 fstp	 DWORD PTR _currentTilePositionX$[ebp]
      0007d	89 45 08	 mov	 DWORD PTR tv867[ebp], eax
      00080	8b 47 28	 mov	 eax, DWORD PTR [edi+40]
      00083	89 45 18	 mov	 DWORD PTR _vertexPointer$[ebp], eax
      00086	db 45 08	 fild	 DWORD PTR tv867[ebp]
      00089	83 c4 30	 add	 esp, 48			; 00000030H
    
    ; 262  : 
    ; 263  : 	spriteBatch->SetBlendMode(GetBlendMode());
    
      0008c	8d 43 2c	 lea	 eax, DWORD PTR [ebx+44]
      0008f	50		 push	 eax
      00090	8b cf		 mov	 ecx, edi
      00092	d9 5d fc	 fstp	 DWORD PTR _currentTilePositionY$[ebp]
      00095	e8 00 00 00 00	 call	 ?SetBlendMode@SpriteBatch@@QAEXABVBlendMode@@@Z ; SpriteBatch::SetBlendMode
    
    ; 264  : 	spriteBatch->SetTextureID(GetTileset()->GetTextureID());
    
      0009a	8b 43 14	 mov	 eax, DWORD PTR [ebx+20]
      0009d	e8 00 00 00 00	 call	 ?GetTextureID@Tileset@@QBEIXZ ; Tileset::GetTextureID
      000a2	50		 push	 eax
      000a3	8b cf		 mov	 ecx, edi
      000a5	e8 00 00 00 00	 call	 ?SetTextureID@SpriteBatch@@QAEXI@Z ; SpriteBatch::SetTextureID
    
    ; 265  : 
    ; 266  : 	const Color layerColor = GetColor();
    
      000aa	8b 4b 30	 mov	 ecx, DWORD PTR [ebx+48]
    
    ; 267  : 
    ; 268  : 	for(int32 y(minY); y != maxY; ++y)
    
      000ad	89 75 08	 mov	 DWORD PTR _y$16981[ebp], esi
      000b0	3b 75 1c	 cmp	 esi, DWORD PTR _maxY$[ebp]
      000b3	0f 84 16 01 00
    	00		 je	 $LN7@InternalDr@2
    
    ; 269  : 	{
    ; 270  : 		const TileMapLayerCell* currentCell = &m_tiles(y, minX);
    
      000b9	8b 45 f0	 mov	 eax, DWORD PTR _maxX$[ebp]
      000bc	d9 45 fc	 fld	 DWORD PTR _currentTilePositionY$[ebp]
      000bf	2b 45 10	 sub	 eax, DWORD PTR _minX$[ebp]
      000c2	d9 45 ec	 fld	 DWORD PTR _currentTilePositionX$[ebp]
      000c5	c1 e0 03	 shl	 eax, 3
      000c8	89 45 f0	 mov	 DWORD PTR tv393[ebp], eax
      000cb	eb 02		 jmp	 SHORT $LN9@InternalDr@2
    $LN73@InternalDr@2:
    
    ; 267  : 
    ; 268  : 	for(int32 y(minY); y != maxY; ++y)
    
      000cd	d9 c9		 fxch	 ST(1)
    $LN9@InternalDr@2:
    
    ; 269  : 	{
    ; 270  : 		const TileMapLayerCell* currentCell = &m_tiles(y, minX);
    
      000cf	8b 43 24	 mov	 eax, DWORD PTR [ebx+36]
      000d2	0f af 45 08	 imul	 eax, DWORD PTR _y$16981[ebp]
      000d6	03 45 10	 add	 eax, DWORD PTR _minX$[ebp]
      000d9	8b 53 18	 mov	 edx, DWORD PTR [ebx+24]
      000dc	8d 14 c2	 lea	 edx, DWORD PTR [edx+eax*8]
    
    ; 271  : 		const TileMapLayerCell* lastCell = currentCell + (maxX - minX);
    
      000df	8b 45 f0	 mov	 eax, DWORD PTR tv393[ebp]
      000e2	03 c2		 add	 eax, edx
      000e4	89 45 ec	 mov	 DWORD PTR _lastCell$16986[ebp], eax
    
    ; 272  : 
    ; 273  : 		for( ; currentCell != lastCell; ++currentCell)
    
      000e7	3b d0		 cmp	 edx, eax
      000e9	0f 84 c3 00 00
    	00		 je	 $LN54@InternalDr@2
    $LN6@InternalDr@2:
    
    ; 274  : 		{
    ; 275  : 			const Tile* tile = currentCell->tile;
    
      000ef	8b 02		 mov	 eax, DWORD PTR [edx]
    
    ; 276  : 
    ; 277  : 			if(tile != null)
    
      000f1	85 c0		 test	 eax, eax
      000f3	0f 84 a6 00 00
    	00		 je	 $LN74@InternalDr@2
    
    ; 278  : 			{
    ; 279  : 				Rectf textureCoords = tile->uv;
    
      000f9	8d 70 08	 lea	 esi, DWORD PTR [eax+8]
    
    ; 280  : 
    ; 281  : 				// flip
    ; 282  : 				if(currentCell->flags & 1) Swap(textureCoords.min.x, textureCoords.max.x);
    
      000fc	8a 42 06	 mov	 al, BYTE PTR [edx+6]
      000ff	8d 7d dc	 lea	 edi, DWORD PTR _textureCoords$16992[ebp]
      00102	a5		 movsd
      00103	a5		 movsd
      00104	a5		 movsd
      00105	a5		 movsd
      00106	a8 01		 test	 al, 1
      00108	74 0c		 je	 SHORT $LN32@InternalDr@2
      0010a	d9 45 dc	 fld	 DWORD PTR _textureCoords$16992[ebp]
      0010d	d9 45 e4	 fld	 DWORD PTR _textureCoords$16992[ebp+8]
      00110	d9 5d dc	 fstp	 DWORD PTR _textureCoords$16992[ebp]
      00113	d9 5d e4	 fstp	 DWORD PTR _textureCoords$16992[ebp+8]
    $LN32@InternalDr@2:
    
    ; 283  : 				if(currentCell->flags & 2) Swap(textureCoords.min.y, textureCoords.max.y);
    
      00116	a8 02		 test	 al, 2
      00118	74 0c		 je	 SHORT $LN34@InternalDr@2
      0011a	d9 45 e0	 fld	 DWORD PTR _textureCoords$16992[ebp+4]
      0011d	d9 45 e8	 fld	 DWORD PTR _textureCoords$16992[ebp+12]
      00120	d9 5d e0	 fstp	 DWORD PTR _textureCoords$16992[ebp+4]
      00123	d9 5d e8	 fstp	 DWORD PTR _textureCoords$16992[ebp+12]
    $LN34@InternalDr@2:
    
    ; 284  : 
    ; 285  : 				float vertices[4] = {
    ; 286  : 					currentTilePositionX,
    ; 287  : 					currentTilePositionY,
    ; 288  : 					currentTilePositionX + floatTileWidth,
    ; 289  : 					currentTilePositionY + floatTileHeight
    ; 290  : 				};
    ; 291  : 
    ; 292  : 				const Color color = layerColor;
    ; 293  : 
    ; 294  : 				vertexPointer->SetVertexUVColorData((float32*)vertices, (float32*)&textureCoords, color);
    
      00126	8b 45 18	 mov	 eax, DWORD PTR _vertexPointer$[ebp]
      00129	d9 45 f8	 fld	 DWORD PTR _floatTileWidth$[ebp]
      0012c	d8 c1		 fadd	 ST(0), ST(1)
    
    ; 295  : 				++vertexPointer;
    
      0012e	8b 7d 0c	 mov	 edi, DWORD PTR _spriteBatch$[ebp]
      00131	d9 c2		 fld	 ST(2)
      00133	89 48 10	 mov	 DWORD PTR [eax+16], ecx
      00136	d8 45 f4	 fadd	 DWORD PTR _floatTileHeight$[ebp]
      00139	89 48 24	 mov	 DWORD PTR [eax+36], ecx
      0013c	d9 ca		 fxch	 ST(2)
      0013e	89 48 38	 mov	 DWORD PTR [eax+56], ecx
      00141	d9 10		 fst	 DWORD PTR [eax]
      00143	89 48 4c	 mov	 DWORD PTR [eax+76], ecx
      00146	d9 cb		 fxch	 ST(3)
      00148	83 c0 50	 add	 eax, 80			; 00000050H
      0014b	d9 50 b4	 fst	 DWORD PTR [eax-76]
      0014e	89 45 18	 mov	 DWORD PTR _vertexPointer$[ebp], eax
      00151	d9 45 dc	 fld	 DWORD PTR _textureCoords$16992[ebp]
      00154	d9 58 b8	 fstp	 DWORD PTR [eax-72]
      00157	d9 45 e0	 fld	 DWORD PTR _textureCoords$16992[ebp+4]
      0015a	d9 58 bc	 fstp	 DWORD PTR [eax-68]
      0015d	d9 cb		 fxch	 ST(3)
      0015f	d9 50 c4	 fst	 DWORD PTR [eax-60]
      00162	d9 ca		 fxch	 ST(2)
      00164	d9 50 c8	 fst	 DWORD PTR [eax-56]
      00167	d9 45 dc	 fld	 DWORD PTR _textureCoords$16992[ebp]
      0016a	d9 58 cc	 fstp	 DWORD PTR [eax-52]
      0016d	d9 45 e8	 fld	 DWORD PTR _textureCoords$16992[ebp+12]
      00170	d9 58 d0	 fstp	 DWORD PTR [eax-48]
      00173	d9 c9		 fxch	 ST(1)
      00175	d9 50 d8	 fst	 DWORD PTR [eax-40]
      00178	d9 c9		 fxch	 ST(1)
      0017a	d9 58 dc	 fstp	 DWORD PTR [eax-36]
      0017d	d9 45 e4	 fld	 DWORD PTR _textureCoords$16992[ebp+8]
      00180	d9 58 e0	 fstp	 DWORD PTR [eax-32]
      00183	d9 45 e8	 fld	 DWORD PTR _textureCoords$16992[ebp+12]
      00186	d9 58 e4	 fstp	 DWORD PTR [eax-28]
      00189	d9 58 ec	 fstp	 DWORD PTR [eax-20]
      0018c	d9 c9		 fxch	 ST(1)
      0018e	d9 50 f0	 fst	 DWORD PTR [eax-16]
      00191	d9 45 e4	 fld	 DWORD PTR _textureCoords$16992[ebp+8]
      00194	d9 58 f4	 fstp	 DWORD PTR [eax-12]
      00197	d9 45 e0	 fld	 DWORD PTR _textureCoords$16992[ebp+4]
      0019a	d9 58 f8	 fstp	 DWORD PTR [eax-8]
      0019d	eb 02		 jmp	 SHORT $LN3@InternalDr@2
    $LN74@InternalDr@2:
    
    ; 267  : 
    ; 268  : 	for(int32 y(minY); y != maxY; ++y)
    
      0019f	d9 c9		 fxch	 ST(1)
    $LN3@InternalDr@2:
    
    ; 272  : 
    ; 273  : 		for( ; currentCell != lastCell; ++currentCell)
    
      001a1	83 c2 08	 add	 edx, 8
    
    ; 296  : 			}
    ; 297  : 
    ; 298  : 			currentTilePositionX += floatTileWidth;
    
      001a4	d9 c9		 fxch	 ST(1)
      001a6	d8 45 f8	 fadd	 DWORD PTR _floatTileWidth$[ebp]
      001a9	3b 55 ec	 cmp	 edx, DWORD PTR _lastCell$16986[ebp]
      001ac	0f 85 3d ff ff
    	ff		 jne	 $LN6@InternalDr@2
    $LN54@InternalDr@2:
    
    ; 267  : 
    ; 268  : 	for(int32 y(minY); y != maxY; ++y)
    
      001b2	ff 45 08	 inc	 DWORD PTR _y$16981[ebp]
    
    ; 272  : 
    ; 273  : 		for( ; currentCell != lastCell; ++currentCell)
    
      001b5	dd d8		 fstp	 ST(0)
    
    ; 299  : 		}
    ; 300  : 
    ; 301  : 		currentTilePositionX = tilePosX;
    
      001b7	d9 45 14	 fld	 DWORD PTR _tilePosX$[ebp]
      001ba	8b 45 08	 mov	 eax, DWORD PTR _y$16981[ebp]
    
    ; 302  : 		currentTilePositionY += floatTileHeight;
    
      001bd	d9 c9		 fxch	 ST(1)
      001bf	d8 45 f4	 fadd	 DWORD PTR _floatTileHeight$[ebp]
      001c2	3b 45 1c	 cmp	 eax, DWORD PTR _maxY$[ebp]
      001c5	0f 85 02 ff ff
    	ff		 jne	 $LN73@InternalDr@2
      001cb	dd d9		 fstp	 ST(1)
      001cd	dd d8		 fstp	 ST(0)
    $LN7@InternalDr@2:
    
    ; 303  : 	}
    ; 304  : 
    ; 305  : 	// This will simply increment the current vertex pointer in the array.
    ; 306  : 	// Since we validate storage beforehand this is extremely fast.
    ; 307  : 	spriteBatch->PopCurrentVertexArrayPointer(vertexPointer);
    
      001cf	8b 45 18	 mov	 eax, DWORD PTR _vertexPointer$[ebp]
      001d2	2b 47 28	 sub	 eax, DWORD PTR [edi+40]
      001d5	6a 50		 push	 80			; 00000050H
      001d7	59		 pop	 ecx
      001d8	99		 cdq
      001d9	f7 f9		 idiv	 ecx
      001db	6b c0 50	 imul	 eax, 80			; 00000050H
      001de	01 47 28	 add	 DWORD PTR [edi+40], eax
    Good enough for me. Now I don't have to worry about it at all.

    ...There's probably not much more I can do anyway; too many FPU loads and stores going on.


    [edit] Tip of the day: Don't ever trust the compiler to do things for you.
    Last edited by Gleeok; 07-07-2015 at 06:00 AM.
    This post contains the official Gleeok seal of approval. Look for these and other posts in an area near you.

  3. #3
    Saffith's Avatar
    Quote Originally Posted by Gleeok
    Interesting, I had to look up what a "flyweight" pattern was. I may not quite understand the concept correctly from the definition I saw, though. To me "flyweight" just describes every program ever written in C, or just relegating the functionality of objects to the things that manage them. I don't know if that's right...?
    Looking at it again, I think I misread that before. You're probably using the flyweight pattern, but that's not what you were describing.
    It's basically deduplication. When you've got a lot of objects that are largely identical, don't give every one of them its own copy of the common data. Just keep one copy of each and give each instance a pointer. It's the same way combos work in ZC, for instance; each one on the screen is just a combo number rather than a separate copy of the definition.
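
    A minimal sketch of that deduplication, assuming a tile map. The names (`TileDef`, `Cell`, `TileMap`, `defAt`) are made up for illustration and are not ZC's or anyone's actual types. Each cell stores only a small index plus its own per-instance flags, and the shared definition lives once in a table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Shared, heavier definition: stored once per distinct kind of tile.
struct TileDef {
    float u0, v0, u1, v1;  // texture coordinates
    bool solid;
};

// Lightweight per-cell instance: an index into the shared table,
// analogous to a combo number, plus state unique to this cell.
struct Cell {
    std::uint16_t defIndex;
    std::uint8_t flags;  // per-instance state such as flip bits
};

struct TileMap {
    std::vector<TileDef> defs;  // one entry per distinct tile
    std::vector<Cell> cells;    // one entry per map position

    // Look up the shared definition behind a given cell.
    const TileDef& defAt(std::size_t cell) const {
        assert(cell < cells.size());
        assert(cells[cell].defIndex < defs.size());
        return defs[cells[cell].defIndex];
    }
};
```

    An index is used here instead of a raw pointer so the table can be reallocated or saved to disk without fixing up every cell; a pointer, as described above, works the same way conceptually.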

    Quote Originally Posted by Gleeok
    Tell that to the stupid compiler.
    What I mean to say is that the "root of all evil" optimization isn't high-level design. It's stuff like rewriting a function in assembly to save a few clock cycles. Small things that make the code harder to understand and maintain for relatively little performance gain.

  4. #4
    Gleeok's Avatar
    Quote Originally Posted by Saffith
    Looking at it again, I think I misread that before. You're probably using the flyweight pattern, but that's not what you were describing.
    It's basically deduplication. When you've got a lot of objects that are largely identical, don't give every one of them its own copy of the common data. Just keep one copy of each and give each instance a pointer. It's the same way combos work in ZC, for instance; each one on the screen is just a combo number rather than a separate copy of the definition.


    What I mean to say is that the "root of all evil" optimization isn't high-level design. It's stuff like rewriting a function in assembly to save a few clock cycles. Small things that make the code harder to understand and maintain for relatively little performance gain.
    I think I see what you mean though. [side note; awful explanation: https://en.wikipedia.org/wiki/Flyweight_pattern ]
    It's like a chess program. Each piece has no information about itself, not even its position. Then you get to the board, which defines all the pieces together as bit states. Then you have to go up to a component that manages boards just to see if there's something at square 31, and so on. Very efficient.
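
    A rough sketch of that chess layout, assuming the usual bitboard representation (one 64-bit word per piece kind; the `Board` type and its member names are invented for this example). No piece object exists anywhere; asking "is there something at square 31" means asking the board.

```cpp
#include <cassert>
#include <cstdint>

enum PieceKind { Pawn, Knight, Bishop, Rook, Queen, King, PieceKindCount };

// One 64-bit word per piece kind: bit n set means a piece of that
// kind stands on square n. Pieces themselves carry no data.
struct Board {
    std::uint64_t bits[PieceKindCount] = {};

    void place(PieceKind kind, int square) {
        assert(square >= 0 && square < 64);
        bits[kind] |= std::uint64_t{1} << square;
    }

    // True if any kind of piece occupies the square.
    bool occupied(int square) const {
        assert(square >= 0 && square < 64);
        std::uint64_t all = 0;
        for (std::uint64_t b : bits) all |= b;
        return ((all >> square) & 1u) != 0;
    }
};
```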

    Yep, ZC does do a lot of things well. Which is why I always think that rewriting it would be easy, because it's easy to see where the bare bones of it are very sane, and where they aren't.


    There are probably many different definitions people have of what "optimization" is, I guess. To me, trying to save a few cycles in a function that gets called 1000 times is stupid. Trying to stop a potential L2 cache miss 1000 times on different particles, entities, and collision stuff is not stupid. That's just the way I see it.

    Let's just put it this way:
    I have a 1.8 GHz CPU, which comes out to *roughly* 30,000,000 CYCLES/FRAME at 60 frames per second. If I was taking 3% of that and then optimized it down to 1%, the improvement comes out to 600,000 CYCLES/FRAME, or 36,000,000 CYCLES per SECOND.
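
    The budget arithmetic can be checked with a short sketch. The 60 frames per second figure is an assumption implied by the 30,000,000 cycles/frame number, and the function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Cycles available per frame for a given clock rate and frame rate.
std::int64_t cyclesPerFrame(std::int64_t clockHz, std::int64_t framesPerSecond) {
    assert(framesPerSecond > 0);
    return clockHz / framesPerSecond;
}

// Cycles saved per frame when a task drops from one share of the
// frame budget to a smaller one (shares given in whole percent).
std::int64_t cyclesSavedPerFrame(std::int64_t budget, int beforePercent, int afterPercent) {
    assert(beforePercent >= afterPercent);
    return budget * (beforePercent - afterPercent) / 100;
}
```

    With `cyclesPerFrame(1800000000, 60)` as the budget, `cyclesSavedPerFrame(budget, 3, 1)` gives the per-frame saving, and multiplying that by the frame rate gives the per-second figure.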

    ....I wonder if I can SIMD that? ..hmm.
    Last edited by Gleeok; 07-09-2015 at 07:22 AM. Reason: math is hard when you are tired

About us
Armageddon Games is a game development group founded in 1997. We are extremely passionate about our work and our inspirations are mostly drawn from games of the 8-bit and 16-bit era.