Copyright ©Andrew Towers 2003
Following is a whole bunch of notes on the TIA I made while I was trying to work out how the whole thing is put together. You'll need a copy of the TIA schematics to understand the more complicated bits of this since I was looking at them when I wrote it. According to my copy they were scanned in by Mark De Smet. They are now available for [http://www.atariage.com/2600/archives/schematics_tia/index.html|download from AtariAge].
I started out searching through the stella archives for any info on triggering the players more than once per scanline (at the time I wanted to draw more than the 12 copies possible by flicker and 3-repeat) - and I came across Eckhard Stolberg's 'grid2' demo from Oct 1998, followed by a long series of threads over several
months discussing how the technique actually manages to work =) From all the articles it looked like a complete black art and no-one had a theory that would explain it fully.
Then I came across the TIA-1A schematics, and proceeded to spend the next 3-4 days solid drinking copious amounts of coffee and taking very little sleep while I tried to figure the whole mess out from the gate level up. (hey, the 2600 is a new hobby, I can splurge ;) In the end I found that as usual writing a 'few quick notes' turned into 'writing a tutorial' or, 'a small opus on the TIA'. So, here we go.
Almost all of the timing and counting within the TIA is implemented in the form of "polynomial counters", so this seems a good place to start. If you've never come across these before (I hadn't) they seem a really obscure way to go about counting things, but they're very small and simple to implement and therefore cheap on silicon. They also have the useful property that 'adding 1' takes linear time (unlike a ripple-carry adder/counter) - as long as you don't want to know where you're up to in traditional binary numeric form, they're perfect ;)
Actually, as the TIA designers pointed out, you can use a small lookup table to convert from one to the other, and you can compare two counter states to see if you're up to the same count without knowing the numeric values. But this is getting off track and hand-wavery. If you want to know the maths behind polynomial counters I suggest you look elsewhere, I'm no mathematician ;) These things seem to be used as pseudo-random number generators or noise generators (see the TIA sound generator, ditto for the GBA) more than anything else.
A polynomial counter (actually a form of "Linear Feedback Shift Register") consists of a shift register, as the name suggests, with some sort of feedback logic - in this case a single two- input XNOR gate obfuscated in NOR logic. They have the property that they will step through up to (2^n)-1 unique states when optimally wired up, from any starting state except for the illegal state (and of course it's possible to power-up in the illegal state =) so for a 6-bit shift register there can be at most 63 valid states.
The TIA uses the same polynomial counter circuit for all of its horizontal counters - there is a HSync counter, two Player Position and two Missile Position counters, and the Ball Position counter. The sound generator has a more complex design involving another polynomial counter or two - I haven't delved into the workings of this one yet.
Beside each counter there is a two-phase clock generator. This takes the incoming 3.58 MHz colour clock (CLK) and divides by 4 using a couple of flip-flops. Two AND gates are then used to generate two independent clock signals thusly:
. __ __ __ _| |_________| |_________| |_________ PHASE-1 (H@1) __ __ __ _______| |_________| |_________| |___ PHASE-2 (H@2)
The two clock lines are used to perform a two-step increment of the counter, as well as being used independently to move data through the supporting clocked logic.
This concept seems to come up a -lot- in the TIA, I think it's some sort of Zen NMOS thing, it seems to go hand and hand with storing data in back of inverters all over the place (a * is used in the TIA schematics to denote this), and building bit-shifting chains into your data storage so you don't need addressing ;p If you've ever wondered why the Playfield bit order is so obscure, now you know.
Each counter has a wired-AND counter state decode matrix (woo..) connected in parallel with the shift register. In every case, the top line of the decoder on the schematics checks for '111111' and forces a Reset if it is encountered. This is to prevent the counter getting stuck in the illegal state when it powers up as mentioned earlier.
Also in every case, the next decoder line is the 'wrap-around' value - when this state comes up, the counter does a self-reset to 000000 on the next phase-2 clock, and usually does something useful like generating a START signal for graphics output.
The rest of the counter decodes depend entirely on which counter we're looking at, set let's get into 'em.
The HSync counter counts from 0 to 56 once for every TV scan-line before wrapping around, a period of 57 counts at 1/4 CLK (57*4=228 CLK). The counter decodes shown below provide all the horizontal timing for the control lines used to construct a valid TV signal.
This table shows the elapsed number of CLK, CPU cycles, Playfield (PF) bits and Playfield pixels at the start of each counter state (ie when the counter changes to this state on the rising edge of the H@2 clock). The decoded control lines are usually clocked into other logic blocks during the next H@1-H@2 cycle (within 4 CLK).
. Value | HCount | CLK | CPU | PF | Pixel | Control 000000 | 0 | 0 | 0 | | | 100000 | 1 | 4 | 1.3 | | | 110000 | 2 | 8 | 2.6 | | | 111000 | 3 | 12 | 4 | | | 111100 | 4 | 16 | 5.3 | | | Set H-SYNC [SHS] 111110 | 5 | 20 | 6.6 | | | 011111 | 6 | 24 | 8 | | | 101111 | 7 | 28 | 9.3 | | | 110111 | 8 | 32 | 10.6 | | | Reset H-SYNC [RHS] 111011 | 9 | 36 | 12 | | | 111101 | 10 | 40 | 13.3 | | | 011110 | 11 | 44 | 14.6 | | | 001111 | 12 | 48 | 16 | | | ColourBurst [RCB] 100111 | 13 | 52 | 17.3 | | | 110011 | 14 | 56 | 18.6 | | | 111001 | 15 | 60 | 20 | | | 011100 | 16 | 64 | 21.3 | | | Reset H-BLANK [RHB] 101110 | 17 | 68 | 22.6 | 0 | 0 | 010111 | 18 | 72 | 24 | 1 | 4 | Late RHB [LRHB] 101011 | 19 | 76 | 25.3 | 2 | 8 | 110101 | 20 | 80 | 26.6 | 3 | 12 | 011010 | 21 | 84 | 28 | 4 | 16 | 001101 | 22 | 88 | 29.3 | 5 | 20 | 000110 | 23 | 92 | 30.6 | 6 | 24 | 000011 | 24 | 96 | 32 | 7 | 28 | 100001 | 25 | 100 | 33.3 | 8 | 32 | 010000 | 26 | 104 | 34.6 | 9 | 36 | 101000 | 27 | 108 | 36 | 10 | 40 | 110100 | 28 | 112 | 37.3 | 11 | 44 | 111010 | 29 | 116 | 38.6 | 12 | 48 | 011101 | 30 | 120 | 40 | 13 | 52 | 001110 | 31 | 124 | 41.3 | 14 | 56 | 000111 | 32 | 128 | 42.6 | 15 | 60 | 100011 | 33 | 132 | 44 | 16 | 64 | 110001 | 34 | 136 | 45.3 | 17 | 68 | 011000 | 35 | 140 | 46.6 | 18 | 72 | 101100 | 36 | 144 | 48 | 19 | 76 | Center [CNT] 110110 | 37 | 148 | 49.3 | 20 | 80 | 011011 | 38 | 152 | 50.6 | 21 | 84 | 101101 | 39 | 156 | 52 | 22 | 88 | 010110 | 40 | 160 | 53.3 | 23 | 92 | 001011 | 41 | 164 | 54.6 | 24 | 96 | 100101 | 42 | 168 | 56 | 25 | 100 | 010010 | 43 | 172 | 57.3 | 26 | 104 | 001001 | 44 | 176 | 58.6 | 27 | 108 | 000100 | 45 | 180 | 60 | 28 | 112 | 100010 | 46 | 184 | 61.3 | 29 | 116 | 010001 | 47 | 188 | 62.6 | 30 | 120 | 001000 | 48 | 192 | 64 | 31 | 124 | 100100 | 49 | 196 | 65.3 | 32 | 128 | 110010 | 50 | 200 | 66.6 | 33 | 132 | 011001 | 51 | 204 | 68 | 34 | 136 | 001100 | 52 | 208 | 69.3 | 35 | 140 | 100110 | 53 | 212 | 70.6 | 36 | 144 | 010011 | 54 | 216 | 72 | 37 | 148 | 101001 | 55 | 220 | 73.3 | 38 | 152 | 010100 | 56 | 224 | 74.6 | 39 | 156 | RESET, HBLANK [SHB] 101010 | 57 | (228) | (76) | (40) | (160) | (already at 000000) 010101 | 58 | 232 | - | - | - | 001010 | 59 | 236 | - | - | - | 000101 | 60 | 240 | - | - | - | 000010 | 61 | 244 | - | - | - | 000001 | 62 | 248 | - | - | - | 000000 | 0 | 0 | - | - | - | (cycle) 111111 | - | - | - | - | - | ERROR (Reset to 000 00 0) Value | HCount | CLK | CPU | PF | Pixel | Control Key: SHS Turn on the TV HSYNC signal to start Horizontal flyback. RHS Turn off the HSYNC signal, delayed 4 CLK. RCB Reset Colour Burst, delayed 4 CLK latching [CB]. RHB Reset HBlank (enable output), delayed 4 CLK latching [HB]. LRHB Late RHB, used instead of RHB when [HMOVE] latch is set. CNT Center screen, start copy/reflect PF, delayed 4 CLK for [CNTD]. SHB Start HBlank (disable output), Reset HCount to 000000. .
The HSync counter resets itself after 57 counts; the decode on HCount=56 performs a reset to 000000 delayed by 4 CLK, so HCount=57 becomes HCount=0. This gives a period of 57 counts or 228 CLK.
Playfield pixels start on the RHB control line at CLK=64, but the first visible pixel won't appear until CLK=68 due to the clocking on its output. The CNT control line either starts the Playfield again as normal, or starts a reverse-shifted copy when reflect-playfield REF is enabled.
RSYNC resets the two-phase clock for the HSync counter to the H@1 rising edge when strobed. It looks like this could be used to move the HSync counter into phase with the CPU on any cycle (although there is some auto-synchronisation between the two-phase clock and the div-by-3 counter for the CPU clock, I haven't looked into this yet.) A full H@1-H@2 cycle after RSYNC is strobed, the HSync counter is also reset to 000000 and HBlank is turned on. This one requires more investigation.
There are two independent Player Horizontal Position Counters, one each for player 0 and player 1. The counters are identical; only one is drawn in the schematics. This section describes only the player 0 counter.
The player position counter controls the position of the player graphics object (P0) on each scanline. The player counter counts from 0 to 39 and then wraps around, giving a period of 40 counts at 1/4 CLK (160 CLK) - also the number of visible pixels on a scanline.
This table shows the elapsed number of CLK and CPU cycles at the beginning of each counter state (the CPU column isn't particularly relevant). Each START decode is delayed by 4 CLK in decoding, plus a further 1 CLK to latch the STARTat the graphics scan counter. The START decodes are ANDed with flags from the NUSIZ register before being latched, to determine whether to draw that copy. Actual graphics output is shown in parentheses for non-stretched copies of the player.
. Value PCount CPU CLK Event 000000 | 0 | 0 | 0 | (draw -012) 100000 | 1 | 1.3 | 4 | (draw 3456) 110000 | 2 | 2.6 | 8 | (draw 7---) 111000 | 3 | 4 | 12 | START DRAWING (NUSIZ=001,011) 111100 | 4 | 5.3 | 16 | (draw -012) 111110 | 5 | 6.6 | 20 | (draw 3456) 011111 | 6 | 8 | 24 | (draw 7---) 101111 | 7 | 9.3 | 28 | START DRAWING (NUSIZ=011,010,110) 110111 | 8 | 10.6 | 32 | (draw -012) 111011 | 9 | 12 | 36 | (draw 3456) 111101 | 10 | 13.3 | 40 | (draw 7---) 011110 | 11 | 14.6 | 44 | 001111 | 12 | 16 | 48 | 100111 | 13 | 17.3 | 52 | 110011 | 14 | 18.6 | 56 | 111001 | 15 | 20 | 60 | START DRAWING (NUSIZ=100,110) 011100 | 16 | 21.3 | 64 | (draw -012) 101110 | 17 | 22.6 | 68 | (draw 3456) 010111 | 18 | 24 | 72 | (draw 7---) 101011 | 19 | 25.3 | 76 | 110101 | 20 | 26.6 | 80 | 011010 | 21 | 28 | 84 | 001101 | 22 | 29.3 | 88 | 000110 | 23 | 30.6 | 92 | 000011 | 24 | 32 | 96 | 100001 | 25 | 33.3 | 100 | 010000 | 26 | 34.6 | 104 | 101000 | 27 | 36 | 108 | 110100 | 28 | 37.3 | 112 | 111010 | 29 | 38.6 | 116 | 011101 | 30 | 40 | 120 | 001110 | 31 | 41.3 | 124 | 000111 | 32 | 42.6 | 128 | 100011 | 33 | 44 | 132 | 110001 | 34 | 45.3 | 136 | 011000 | 35 | 46.6 | 140 | 101100 | 36 | 48 | 144 | 110110 | 37 | 49.3 | 148 | 011011 | 38 | 50.6 | 152 | 101101 | 39 | 52 | 156 | RESET, START DRAWING (always) 010110 | 40 | 53.3 | 160 | (already at 000000) 001011 | 41 | 54.6 | | 100101 | 42 | 56 | | 010010 | 43 | 57.3 | | 001001 | 44 | 58.6 | | 000100 | 45 | 60 | | 100010 | 46 | 61.3 | | 010001 | 47 | 62.6 | | 001000 | 48 | 64 | | 100100 | 49 | 65.3 | | 110010 | 50 | 66.6 | | 011001 | 51 | 68 | | 001100 | 52 | 69.3 | | 100110 | 53 | 70.6 | | 010011 | 54 | 72 | | 101001 | 55 | 73.3 | | 010100 | 56 | 74.6 | | 101010 | 57 | 76 | | 010101 | 58 | - | | 001010 | 59 | - | | 000101 | 60 | - | | 000010 | 61 | - | | 000001 | 62 | - | | 000000 | 0 | - | | (cycle) 111111 | - | - | | ERROR (Reset to 000000) Value PCount CPU CLK Event .
The graphics output for players contains some extra clocking logic not present for the Playfield or other screen objects. It takes 1 additional CLK to latch the player START signal. The rest of the clocking logic is in common with the other graphics objects; therefore we can say that player graphics are delayed by 1 CLK (this is why the leftmost possible start position for a RESP0 is pixel 1, not pixel 0. You can HMOVE the player further left though, if necessary.)
The most important thing to note about the player counter is that it only receives CLK signals during the visible part of each scanline, when HBlank is off; exactly 160 CLK per scanline (except during HMOVE). During the other 68 CLK per line, the counter lies dormant on the exact 1/4 phase it was up to. The [MOTCK] (motion clock?) line supplies the CLK signals for all movable graphics objects during the visible part of
the scanline. It is an inverted (out of phase) CLK signal.
This arrangement means that resetting the player counter on any visible pixel will cause the main copy of the player to appear at that same pixel position on the next and subsequent scanlines. There are 5 CLK worth of clocking/latching to take into account, so the actual position ends up 5 pixels to the right of the reset pixel (approx. 9 pixels after the start of STA RESP0).
For better or worse, the manual 'reset' signal (RESP0) does not generate a START signal for graphics output. This means that you must always do a 'reset' then wait for the counter to wrap around (160 CLK later) before the main copy of the player will appear. However, if you have any of the 'close', 'medium' or 'far' copies of the player enabled in NUSIZ, these will be drawn on the current and subsequent scanlines as the appropriate decodes are reached and generate their START signals.
The Player Graphics Scan Counters are 3-bit binary ripple counters attached to the player objects, used to determine which pixel of the player is currently being drawn by generating a 3-bit source pixel address. These are the only binary ripple counters in the TIA.
The Scan Counters are never reset, so once the counter receives the Start signal it will count fully from 0 to 7. Counting is only performed during the visible part of the scanline since it is driven by the [MOTCK] line used to advance the Player Position Counter. This gives rise to "sprite wrapping" whereby a player positioned so it ends past the righthand side of the screen will finish drawing at the beginning of the next scanline. Note that a HMOVE can gobble up the wrapped player graphics - see below.
The count frequency is determined by the NUSIZ register for that player; this is used to selectively mask off the clock signals to the Graphics Scan Counter. Depending on the player stretch mode, one clock signal is allowed through every 1, 2 or 4 graphics CLK. The stretched modes are derived from the two-phase clock; the H@2 phase allows 1 in 4 CLK through (4x stretch), both phases ORed together allow 1 in 2 CLK through (2x stretch).
The NUSIZ register can be changed at any time in order to alter the counting frequency, since it is read every graphics CLK. This should allow possible player graphics warp effects etc.
Player Reflect bit - this is read every time a pixel is generated, and used to conditionally invert the bits of the source pixel address. This has the effect of flipping the player image drawn. This flag could potentially be changed during the rendering of the player, for example this might be used to draw bits 01233210.
Player graphics registers - there are four 8-bit registers in the TIA for storing Player graphics, two for each player. Only two of these are ever directly accessible; these are labelled the "new" player graphics registers on the schematics. Unless the Player Vertical Delay (VDELPn) is set, the "new" registers are always drawn.
Writes to GRP0 always modify the "new" P0 value, and the contents of the "new" P0 are copied into "old" P0 whenever GRP1 is written. (Likewise, writes to GRP1 always modify the "new" P1 value, and the contents of the "new" P1 are copied into "old" P1 whenever GRP0 is written). It is safe to modify GRPn at any time, with immediate effect.
Vertical Delay bit - this is also read every time a pixel is generated and used to select which of the "new" (0) or "old" (1) Player Graphics registers is used to generate the pixel. (ie the pixel is retrieved from both registers in parallel, and this flag used to choose between them at the graphics output). It is safe to modify VDELPn at any time, with immediate effect.
There are also two individual Horizontal Position Counters for missile 0 and missile 1. The counters are independent and identical.
These counters use exactly the same counter decodes as the players, but without the extra 1 CLK delay to start writing out graphics.
Missiles use the same control lines as the player from the NUISZ register to determine the number of copies drawn, although they ignore the player scaling options (you'll just get a single copy for the scaled player modes).
Missile width is implemented in the same way as the ball width; it appears to be exactly the same gate arrangement (see below).
The Missile-to-player reset is implemented by resetting the M0 counter when the P0 graphics scan counter is at %100 (in the middle of drawing the player graphics) AND the main copy of P0 is being drawn (ie the missile counter will not be reset when a subsequent copy is drawn, if any). This second condition is generated from a latch outputting FSTOB that is reset when the P0 counter wraps around, and set when the START signal is decoded for a 'close', 'medium' or 'far' copy of P0.
The ball position counter controls the position of the ball graphics object (BL) on each scanline. The ball counter counts from 0 to 39 and then wraps around, giving a period of 40 counts at 1/4 CLK (160 CLK).
Ball width is given by combining clock signals of different widths based on the state of the two size bits (the gates form an AND -> OR -> AND -> OR -> out arrangement, with a hanger-on AND gate).
See notes later for all the messy details ;p
It seems a shame to have a whole polynomial counter for the ball, and no special effects aside from its size - except for one small detail.
If you look closely at the START signal for the ball, unlike all the other position counters - the ball reset RESBL does send a START signal for graphics output! This makes the ball incredibly useful since you can trigger it as many times as you like across the same scanline and it will start drawing immediately each time :)
So it's good for cutting holes in things, drawing background details, clipping the edges off things, etc. It can even be used to draw simple sprites, or used as the background colour (because it's behind
everything else) for a two-colour sprite.
Actually on my 2600jr (long rainbow), setting the ball size to 8 pixels results in solid colour when it's reset every 9 pixels (this might just be colour bleeding, I'm not sure).
. Value BCount CPU CLK Event | 000000 | 0 | 0 | 0 | (draw 0123) | 100000 | 1 | 1.3 | 4 | (draw 4567) | 110000 | 2 | 2.6 | 8 | | 111000 | 3 | 4 | 12 | | 111100 | 4 | 5.3 | 16 | | 111110 | 5 | 6.6 | 20 | | 011111 | 6 | 8 | 24 | | 101111 | 7 | 9.3 | 28 | | 110111 | 8 | 10.6 | 32 | | 111011 | 9 | 12 | 36 | | 111101 | 10 | 13.3 | 40 | | 011110 | 11 | 14.6 | 44 | | 001111 | 12 | 16 | 48 | | 100111 | 13 | 17.3 | 52 | | 110011 | 14 | 18.6 | 56 | | 111001 | 15 | 20 | 60 | | 011100 | 16 | 21.3 | 64 | | 101110 | 17 | 22.6 | 68 | | 010111 | 18 | 24 | 72 | | 101011 | 19 | 25.3 | 76 | | 110101 | 20 | 26.6 | 80 | | 011010 | 21 | 28 | 84 | | 001101 | 22 | 29.3 | 88 | | 000110 | 23 | 30.6 | 92 | | 000011 | 24 | 32 | 96 | | 100001 | 25 | 33.3 | 100 | | 010000 | 26 | 34.6 | 104 | | 101000 | 27 | 36 | 108 | | 110100 | 28 | 37.3 | 112 | | 111010 | 29 | 38.6 | 116 | | 011101 | 30 | 40 | 120 | | 001110 | 31 | 41.3 | 124 | | 000111 | 32 | 42.6 | 128 | | 100011 | 33 | 44 | 132 | | 110001 | 34 | 45.3 | 136 | | 011000 | 35 | 46.6 | 140 | | 101100 | 36 | 48 | 144 | | 110110 | 37 | 49.3 | 148 | | 011011 | 38 | 50.6 | 152 | | 101101 | 39 | 52 | 156 | RESET, START DRAWING | 010110 | 40 | 53.3 | | | 001011 | 41 | 54.6 | | | 100101 | 42 | 56 | | | 010010 | 43 | 57.3 | | | 001001 | 44 | 58.6 | | | 000100 | 45 | 60 | | | 100010 | 46 | 61.3 | | | 010001 | 47 | 62.6 | | | 001000 | 48 | 64 | | | 100100 | 49 | 65.3 | | | 110010 | 50 | 66.6 | | | 011001 | 51 | 68 | | | 001100 | 52 | 69.3 | | | 100110 | 53 | 70.6 | | | 010011 | 54 | 72 | | | 101001 | 55 | 73.3 | | | 010100 | 56 | 74.6 | | | 101010 | 57 | 76 | | | 010101 | 58 | - | | | 001010 | 59 | - | | | 000101 | 60 | - | | | 000010 | 61 | - | | | 000001 | 62 | - | | | 000000 | 0 | - | | (cycle) | 111111 | - | - | | ERROR (Reset to 000000) Value BCount CPU CLK Event .
Vertical Delay bit - the VDELBL control bit works in the same manner as the player VDEL bits; the state of VDELBL is used every CLK to determine which of the "new" (0) or "old" (1) ENABL values to use at the graphics output. Writes to ENABL always modify the "new" value, and whenever GRP1 is written the "new" value is copied into the "old". It is safe to modify VDELBL and ENABL at any time, with immediate effects.
The documented way to use a player position counter is to reset it with RESPn on any CPU cycle divisible by 5 during the visible scanline (5 is a convenient number for DEX-BNE loops), set up HMPn to adjust the position by +7 (left) to -8 (right) pixels, and hit HMOVE immediately after the next WSYNC. Then configure NUSIZn for the number and spacing of copies required, and let the hardware go about its business. Once this is set up, you can just change the graphics in GRPn every scanline to get one, two or three copies at fixed spacing.
In fact the hardware has hard-wired requirements for almost none of the above =) The fixed spacing between copies is hard-wired and HMOVE is largely not negotiable, but the rest is complete tosh.
The TIA renders each movable graphics object according to independent position counters running at 1/4 CLK with a period of 40 increments, and synchronised to the last RESPn/RESMn/RESBL strobe. Each and every time a counter wraps around, the 'main' copy of the object starts to draw. Since it takes 4 CLK to reset the counter to zero and 4 CLK to increment the counter, the image can be expected to appear after exactly 40 full counts, or 160 CLK.
The counters are normally only running during the 'visible' part of a scanline, unless you're doing a HMOVE. Since the scanline has 160 visible pixels, this yields the documented behaviour that a RESPn/etc sets the position for the next scanline. It's out by 5 pixels when you set it, but who's counting?
Due to extra clocking logic for Player graphics output, the first player pixel won't appear until 1 CLK later than for any other grahpics object once rendering 'starts'. See the HSync/Player Counter info above for an explanation of this.
During the horizontal blank (see the Horizontal Counter info above) the Player, Missile and Ball counters stop receiving CLK signals, so they pause on the exact 1/4 CLK they're up to and resume where they left off at the first visible pixel on the next scanline. This gives rise to the 'wrap around' effect, to the point of splitting a copy of the player image in half because it happened to start too near the right edge of the screen ;)
The object counters are running at the same 1/4 CLK rate as the HSync counter, but you can set them out of phase with the HSync counter (and therefore the Playfield) by resetting any of them on a CPU cycle that isn't divisible by 4. (If this were not the case, there would only be 40 possible positions along the scanline and we could all go home early). You can also use the HMOVE command, which is described below.
In principle the operation of HMOVE is quite straight-forward; if a HMOVE is initiated immediately after HBlank starts, which is the case when HMOVE is used as documented, the HMOVE signalis latched and used to delay the end of the HBlank by exactly 8 CLK, or two counts of the HSync Counter. This is achieved in the TIA by resetting the HB (HBlank) latch on the LRHB (Late Reset H-Blank) counter decode rather than the normal RHB (Reset H-Blank) decode.
The extra HBlank time shifts everything except the Playfield right by 8 pixels, because the position counters will now resume counting 8 CLK later than they would have without the HMOVE. This is also the source of the HMOVE 'comb' effect; the extended HBlank hides the normal playfield output for the first 8 pixels of the line.
In order to move less than 8 pixels right the TIA performs 'clock stuffing' on the Player, Missile and Ball position counters, whereby a number of clock pulses between 0 and 15 are sent to the counters during HMOVE. Each extra clock pulse eats up 1/4 count in the object's horizontal position counter, and thereby moves the object left one pixel. This must be done during HBlank because it is sending these extra clock pulses down the same clock lines that usually receive MOTCK pulses during the visible part of the scanline.
The Stella Programmer's Guide states that "the motion registers should not be modified for at least 24 machine cycles after an HMOVE command". This is indeed for internal hardware considerations, although perhaps not entirely mysterious. After several attempts, I finally got my head around the heavily obfuscated logic in the schematics. It turns out to be fairly simple, and quite elegant :)
The HMOVE values set by the programmer are stored in a matrix of 4-bit data latches with built-in comparators - each latch effectively contains a wired-XOR gate, and the 4 latches for a given HMxx register are arranged in a wired-NOR formation to give a 4-bit comparator.
Beside the matrix of HMxx latches is a 4-bit binary ripple counter. It begins at 15 and decrements down to zero during the HMOVE at a rate of 1 decrement every 4 CLK (it's built from 2-phase clocked logic). The counter is wired in parallel to the comparators in all 5 HMxx registers.
At the beginning of the HMOVE, a latch is set for each movable object to indicate that it requires more motion to the left. When the comparator for a given object detects that none of the 4 bits match the bits in the counter state, it clears this latch (a clever exercise in reverse logic!) Until this time, the output of the latch is sent through to the movable object once every 4 CLK (on every H@1 signal from the HSync two-phase clock) as an extra "stuffed" clock signal.
Since one extra CLK pulse is sent every 4 CLK, this takes at most 4*16=64 CLK (including counter reset at the end), or 64/3=21 CPU cycles. It takes 3 CLK after the HMOVE command is received to decode the SEC signal (at most 6 CLK depending on the timing of STA HMOVE) and a further 4 CLK to set the "more movement required" latches. So we need to wait at least 71/3=23.66 CPU cycles before the HMOVE operation is complete. For a normal HMOVE after WSYNC, it might be safe by cycle 23 (this has not been tested).
The first compare (against 15) will be sampled 15 CLK after STA HMOVE begins and every 4 CLK thereafter. The first counter decrement will happen at CLK 17, and every 4 CLK thereafter.
You may have noticed that the above discussion ignores the fact that HMxx values are specified in the range +7 to -8. In an odd twist, this was done purely for the convenience of the programmer! The comparator for D7 in each HMxx latch is wired up in reverse, costing nothing in silicon and effectively inverting this bit so that the value can be treated as a simple 0-15 count for movement left. It might be easier to think of this as having D7 inverted when it is stored in the first place.
In theory then the side effects of modifying the HMxx registers during HMOVE should be quite straight-forward. If the internal counter has not yet reached the value in HMxx, a new value greater than this (in 0-15 terms) will work normally. Conversely, if the counter has already reached the value in HMxx, new values will have no effect because the latch will have been cleared.
Much more interesting is this: if the counter has not yet reached the value in HMxx (or has reached it but not yet commited the comparison) and a value with at least one bit in common with all remaining internal counter states is written (zeros or ones), the stopping condition will never be reached and the object will be moved a full 15 pixels left. In addition to this, the HMOVE will complete without clearing the "more movement required" latch, and so will continue to send an additional clock signal every 4 CLK (during visible and non-visible parts of the scanline) until another HMOVE operation clears the latch. The HMCLR command does not reset these latches.
The Cosmic Ark stars effect achieved this by writing the value $60 to HMM0, 21 cycles after HMOVE starts. See this message in the archives: http://www.biglist.com/lists/stella/archives/199705/msg00024.html
Following is how I believe it works: at 21 cycles in, the internal counter has just decremented to %0000 and is about to test this against the HMxx registers (2 CLK from now, if my timings are correct). If we flip the top bit of $60 as described above, we have the binary pattern %1110. This pattern has at least one bit in common with the final remaining state (the bottom zero bit), and also has bits in common with the default counter state
%1111 which will arise when the counter resets. This means the compare will pass now and forever more :) For this to work, I expect that they must have set HMM0 to $70 before using the trick (binary %0111, or %1111 with the bit flipped), but after a cursory glance at Thomas' commented Cosmic Ark code I haven't found this.
Looking at the archives relating to Cosmic Arc and Rabbit Transit tricks, I also notice that a HMCLR 20 cycles in has the same effect. In this case it will be resetting HMxx to %1000 (bit-flipped) which also obeys the rules for bypassing the stopping condition.
Also of note, the HMOVE latch used to extend the HBlank time is cleared when the HSync Counter wraps around. This fact is exploited by the trick that invloves hitting HMOVE on the 74th CPU cycle of the scanline; the CLK stuffing will still take place during the HBlank and the HSYNC latch will be set just before the counter wraps around. It will then be cleared again immediately (and therefore ignored) when the counter wraps, preventing the HMOVE comb effect. Since the extended HBlank is needed to move all objects right 8 pixels, this has the limitation that objects can only be moved left, and the normal HMOVE numbering no longer applies. Instead the HMOVE value is interpreted as (8 + value) pixels to the left, ie:
-8 = 0 -4 = 4 0 = 8 4 = 12
-7 = 1 -3 = 5 1 = 9 5 = 13
-6 = 2 -2 = 6 2 = 10 6 = 14
-5 = 3 -1 = 7 3 = 11 7 = 15
This means that all objects will be moved 8 pixels left unless you set their HMxx value to -8 for zero movement.
I've recently found a post in the Stella mailing list archives that gave these results by exhaustive testing, posted by Brad Mott: http://www.biglist.com/lists/stella/archives/199804/msg00198.html
Since the Graphics Scan Counters are never reset, player graphics output can wrap around as mentioned above.
A HMOVE 8 pixels right (-8 << 4), has no effect on the scan counter since it will perform no "clock stuffing" of the player counters for that player (the extended HBlank time moves everything right 8 pixels).
Any other HMOVE value will gobble up at least one pixel, or more proportional to the HMOVE value. Since a HMOVE value really represents a count from 0 (for -8) to 15 (for +7) with the top bit inverted, this is the number of player pixels that will be gobbled up by the HMOVE.
This means that a HMOVE of 0 will gobble up all remaining wrapped output for the non-stretched player modes, since it sends 8 extra clocks to the player. (Note that this is only true if HMOVE was actually strobed for the scanline, otherwise the configured HMxx registers never have any effect). For the stretched player modes there could be some output left - it takes 16 stuffed clocks to eat up a full 2X player, and 32 clocks to eat up a full 4X player.
I mentioned above that HMOVE sends extra clock pulses down the same clock lines that are usually used during the visible part of the scanline. In theory this means that performing a HMOVE during the visible part of the scanline should have no effect. However, looking at how the various clock signals interact, I suspect it is possible. I did some preliminary experiments (on a 2600 Jr) at some point, and I seem to remember having some success.
In this case the extra HMOVE clock pulses act to perform 'plugging' instead of the normal 'stuffing'; by this I mean that the extra pulses plug up the gaps in the normal [MOTCK] pulses, preventing them from counting as clock pulses. This only works because the extra HMOVE pulses are derived from the two-phase clock on the HSync counter, which is itself derived from CLK (the TIA colour clock input), whereas [MOTCK] is an inverted CLK signal - so they are more or less precisely out of phase :)
I'm not sure how universal (or reliable!) this might turn out to be, but I haven't seen it mentioned before. Also of note, this technique can only be used to effect a move to the right, at a rate of 1 pixel every 4 CLK (since this is the rate that HMOVE generates the extra clock pulses).
I've read some theories suggesting that re-triggering is a hack, possibly dependent on chip revision, where you trick the TIA into rendering more than three copies by hitting RESP0/RESP1 during the rendering of a 'legitimate' copy, or some other method to confuse the poor chip. Through extensive coffee consumption, I have determined that this is not the case. Perhaps peering at the TIA schematics for countless hours on end, until I fell asleep (two days in a row), may have helped also.
The behaviour of the TIA positioning registers is quite predictable and completely independent from its graphics output logic, as documented above. What remains are issues involving the timing of RESPn commands, given that the TIA counts things at 1/4 clock and the CPU runs at 1/3 CLK =)
Following is a table of the cycle decodes for the Player counters, starting from CLK=0 when the counter resets. This is an excerpt from the Player Counter table listed elsewhere in the document (I recommend you go have a look, the spacing between events should look oddly familiar ;)
|111000||3||4||12||START DRAWING (NUSIZ=001,011) Close|
|101111||7||9.3||28||START DRAWING (011,010,110) Medium|
|111001||15||20||60||START DRAWING (100,110) Far|
|101101||39||52||156||RESET, START DRAWING (always) Main|
The columns from the left are: the polynomial counter state, (see notes above), the decimal value that the player counter is up to, the number of CPU cycles since the counter reset, and the number of CLKs elapsed since the counter reset.
You'll notice I'm now talking about everything relative to RESPn on the current scanline, rather than the beginning of the scanline. This is because this is all that matters. You should understand the following point:
If you hit RESPn at least twice on every scanline, you will never see the 'main' copy of that player, ever, on any scanline.
This is because the counter will always be reset before it manages to complete a full 40 counts (160 CLK), and so the 'main' copy will never start drawing.
This is tricky to test, especially if you don't reset a few things when you stop (eg, for VSync) - whenever you stop hitting RESPn, you will start to get the normal output on the next and subsequent scanlines, including the 'main' copy. The very top visible scanline is a perfectly valid 'subsequent scanline' after the very bottom visible scanline, once you get past the first frame ;)
If you've set up NUSIZn for 3 copies close (011), you'll be getting four copies on every scanline on which you hit RESPn twice, as long as they are far enough apart. This works because it doesn't take a counter wrap-around to get to the 'close' and 'medium' copies as shown in the table above. They will appear 4+12+1=17 and 4+28+1=33 pixels after each RESPn CLK arrives in the TIA (it takes 4 extra CLK to reset the counter, and 1 extra CLK to start the graphics output).
It's important to note, that as long as the second RESPn on the line causes a reset after the 'start' signal has been generated for the 'medium' copy of the first RESPn, you will get four copies regardless of how far apart the RESPn hits are. If you do the second RESPn too soon you'll end up with only three copies - the 'close' from the first RESPn, followed by the the 'close' and the 'medium' from the second RESPn. If you do the second RESPn before the first 'close' copy, you'll only end up with the 'close' and 'medium' from the second RESPn.
From this it follows that if you set NUSIZ0 to 011, hit RESPn and wait until the 'medium' copy has started, then change NUSIZ0 to 100 or 110, you will get all of 'close', 'medium' and 'far'.
These are special cases only because resetting a Position Counter (RSPn, RESMn, RESBL) also resets the two-phase clock attached to it, and this in turn affects the clocked logic on the output of the counter decodes.
For the player counters, this affects the four decodes that produce the Start signal for copies of the player graphics. These are generated 12, 28, 60, and 160 CLK after the Position Counter has been reset, in order to trigger the 'close', 'medium', 'far' and 'main' copies.
These decodes pass through a block of logic that requires a full cycle of the two-phase clock (hence the normal 4 CLK delay before graphics output common to all movable graphics objects). If the Position Counter and therefore the two-phase clock are reset during this decoding process, the Start signal will either be lost or delayed up to 3 CLK depending on exact timing.
This effect is most evident when attempting to re-trigger the player graphics over and over again. For example, examine this retriggering technique:
. STA RESP0 ;3 reset P0, call this 0 CLK. CMP $EA ;3 nop STA RESP0 ;3 reset P0 again, after 18 CLK. CMP $EA ;3 nop STA RESP0 ;3 reset P0 again, after 18 CLK. .
The visible result of this will be a 'close' copy of P0 shifted right by two pixels from the expected position, followed by a second 'close' copy shifted right by two pixels, and finally a third 'close' copy, not shifted right. There will be an 18 pixel gap between the first two copies of P0, and only a 16 pixel gap before the third copy.
In order to fix up the spacing of the final copy, it is necessary to trigger P0 yet again exactly 18 CLK later, but clear GRP0 in the mean time so nothing is drawn.
If the retriggering will be continuing onto the next line there is no need to do this; just ensure that the first re-trigger on the next line happens 18 visible pixels after the last RESP0 on the previous line (ie 18 CLK later, minus HBlank time).
Just to reiterate, ball width is given by combining clock signals of different widths based on the state of the two size bits (the gates form an AND -> OR -> AND -> OR -> out arrangement, with a hanger-on AND gate).
The Enable (output) signal is built in two halves, arranged back-to-back at the final OR gate.
The first half comes from one of three sources combined through the earlier OR gate and then AND-ed with the Start signal:
(1) If D4 and D5 are both clear, one of the two-phase clock signals
(active 1 in 4 colour CLK) yields a single pixel of output.
(2) If D4 is set, a line active 2 in every 4 colour CLK is borrowed
from the two-phase clock generator (this yields 2 pixels).
(3) Finally D5 itself is used directly - the Start signal is active
for 4 CLK so this generates 4 pixels.
The second half is added if both D4 and D5 are set; a delayed copy of the Start signal (4 colour CLK wide again) is OR-ed into the Enable signal at the final OR gate.
I hope someone had as much fun building this little circuit as I had pulling it apart again ;p
The Player Position Counter can be reset to zero (with RESP0/1) on any CPU cycle as shown below, and copies will appear at the pixel positions listed for 'close', 'medium' and/or 'far' depending on the flags in NUSIZ; 1, 2, 3 or (if you change NUSIZ at the right time) 4 copies at hard-wired positions after the reset. If the counter is allowed to wrap around, the 'main' copy will appear on the next line.
Resetting the counter takes 4 CLK, decoding the 'start drawing' signal takes 4 CLK, latching the 'start' takes a further 1 CLK giving a total 9 CLK delay after a RESP0/1. Since the playfield takes 4 CLK to start drawing the player is visibly delayed by exactly 5 CLK - hence the magic '5' :)
NOTE: The player counter can be safely reset 18 CLK after the previous reset and the previous copy will still be drawn. BUT the 'start' signal for the previous copy will be delayed a further 2 CLK due to the 2- phase clock being reset before the 'start' signal has been clocked through to the 'start' latch.
. | CPU | CLK | Pixel | Main | Close | Medium | Far | PF | 0 | 0 | - | 1 | 17 | 33 | 65 | - . . . | 22 | 66 | - | 1 | 17 | 33 | 65 | - | 22.6 |-------------------------------------------| | 23 | 69 | 1 | 6 | 22 | 38 | 70 | 0.25 | 24 | 72 | 4 | 9 | 25 | 41 | 73 | 1 | 25 | 75 | 7 | 12 | 28 | 44 | 76 | 1.75 | 26 | 78 | 10 | 15 | 31 | 47 | 79 | 2.5 | 27 | 81 | 13 | 18 | 34 | 50 | 82 | 3.25 | 28 | 84 | 16 | 21 | 37 | 53 | 85 | 3 | 29 | 87 | 19 | 24 | 40 | 56 | 88 | | 30 | 90 | 22 | 27 | 43 | 59 | 91 | | 31 | 93 | 25 | 30 | 46 | 62 | 94 | | 32 | 96 | 28 | 33 | 49 | 65 | 97 | | 33 | 99 | 31 | 36 | 52 | 68 | 100 | | 34 | 102 | 34 | 39 | 55 | 71 | 103 | | 35 | 105 | 37 | 42 | 58 | 74 | 106 | | 36 | 108 | 40 | 45 | 61 | 77 | 109 | | 37 | 111 | 43 | 48 | 64 | 80 | 112 | | 38 | 114 | 46 | 51 | 67 | 83 | 115 | | 39 | 117 | 49 | 54 | 70 | 86 | 118 | | 40 | 120 | 52 | 57 | 73 | 89 | 121 | | 41 | 123 | 55 | 60 | 76 | 92 | 124 | | 42 | 126 | 58 | 63 | 79 | 95 | 127 | | 43 | 129 | 61 | 66 | 82 | 98 | 130 | | 44 | 132 | 64 | 69 | 85 | 101 | 133 | | 45 | 135 | 67 | 72 | 88 | 104 | 136 | | 46 | 138 | 70 | 75 | 91 | 107 | 139 | | 47 | 141 | 73 | 78 | 94 | 110 | 142 | | 48 | 144 | 76 | 81 | 97 | 113 | 145 | | 49 | 147 | 79 | 84 | 100 | 116 | 148 | | 50 | 150 | 82 | 87 | 103 | 119 | 151 | | 51 | 153 | 85 | 90 | 106 | 122 | 154 | | 52 | 156 | 88 | 93 | 109 | 125 | 157 | | 53 | 159 | 91 | 96 | 112 | 128 | 0 | | 54 | 162 | 94 | 99 | 115 | 131 | 3 | | 55 | 165 | 97 | 102 | 118 | 134 | 6 | | 56 | 168 | 100 | 105 | 121 | 137 | 9 | | 57 | 171 | 103 | 108 | 124 | 140 | 12 | | 58 | 174 | 106 | 111 | 127 | 143 | 15 | | 59 | 177 | 109 | 114 | 130 | 146 | 18 | | 60 | 180 | 112 | 117 | 133 | 149 | 21 | | 61 | 183 | 115 | 120 | 136 | 152 | 24 | | 62 | 186 | 118 | 123 | 139 | 155 | 27 | | 63 | 189 | 121 | 126 | 142 | 158 | 30 | | 64 | 192 | 124 | 129 | 145 | 1 | 33 | | 65 | 195 | 127 | 132 | 148 | 4 | 36 | | 66 | 198 | 130 | 135 | 151 | 7 | 39 | | 67 | 201 | 133 | 138 | 154 | 10 | 42 | | 68 | 204 | 136 | 141 | 157 | 13 | 45 | | 69 | 207 | 139 | 144 | 0 | 16 | 48 | | 70 | 210 | 142 | 147 | 3 | 19 | 51 | | 71 | 213 | 145 | 150 | 6 | 22 | 54 | | 72 | 216 | 148 | 153 | 9 | 25 | 57 | | 73 | 219 | 151 | 156 | 12 | 28 | 60 | | 74 | 222 | 154 | 159 | 15 | 31 | 63 | | 75 | 225 | 157 | 2 | 18 | 34 | 66 | | 76 | 228 | 0 | 5 | 21 | 37 | 69 | |-------------------------------------------| Start HBLANK | CPU | CLK | Pixel | Main | Close | Medium | Far | PF .
Also note that hitting RESP0 before HBLANK has finished will reset the counter immediately, but it will only start counting again when HBLANK goes off. Due to output clocking, this will produce player graphics at playfield pixel 1.
The 6-digit score trick involves putting both players into 3-repeat mode (011 or 110 in NUSIZ0/1) and resetting them such that all the player 2 images are positioned exactly between all the player 1 images, ergo:
1 2 1 2 1 2
Then you need to set the graphics up (GRP0/1) for the first two digits, and write some very precise timing code to wait until the scan-line is just about to start drawing the first copy of P1. While you're waiting, get the rest of the graphics loaded into the registers (A, X, and Y).
At this point you need to start storing all the graphics you've loaded into GRP0 and GRP1 as fast as you can - it will look like this because there's only one way to do it fast enough:
. STA GRP0 ; 3 STX GRP1 ; 3 STY GRP0 ; 3 ST? GRP1 ; 3 we've run out of registers! .
Notice that each one takes 3 cycles to execute (which is 9 pixels) and makes the change on the -end- of the 3rd cycle. We could use the stack pointer register (S) for the last one and do a TSX, but that would take 5 cycles (that's 15 pixels) which is too long.
To get it working you need to turn on VDELP0/1 (vertical delay) which allows you to set up the first 3 digits in the TIA's graphics registers before the beginning of the scanline, and requires only the 3 remaining registers to hold the last 3 digits.
I've found a post in the Stellar archives that explains this technique in great detail, so I'll stop here.
Please note that these notes are my own, and are made available without any warranties of any kind. They may include errors, omissions and much that is apocryphal; use at your own risk.
Please let me know if you spot anything that is blatantly wrong and I'll update the document. I'm also happy to answer any questions about this stuff.
Copyright © Andrew Towers 2003
alternate source TIA Hardware Notes