; The CPU is a VLIW (very long instruction word) processor.
; It contains 9 "units", and each VLIW instruction specifies an
; operation for each of the 9 units independently and simultaneously.
; Most units are registers, and the instruction indicates which of
; several values to load that register with.

; Four primary 12-bit registers, A, B, C, D:
;    A: first operand to ALU, also reads IO input
;    B: second operand to ALU, also can be loaded with a constant
;    C: loop counter
;    D: data storage, can exchange with A
;
; Additional registers:
;    W: 12-bit write-only register, IO output always comes from here
;    F: 1-bit comparison result flag
;
; Additional units:
;    branch control unit
;    IO output
;    ALU: choose between computing 13 different functions 
;
; Each of the units above is independently targetable in the
; VLIW instruction set. For each unit's opcode, specify one of
; the below strings (exactly; the assembler does string matching),
; or no string at all for a no-op in that unit. Separate each of the
; unit opcodes using commas; the unit opcodes can appear in any
; order. Note that all operations occur simultaneously, so each
; operation refers to the value that was in the registers at the
; beginning of the instruction, so the order the unit opcodes are
; listed in has no effect.
;
;  A unit:    B unit:      C unit:       D unit:
;   A=D        B=A          C=alu         D=A
;   A=IN1      B=constant   DEC
;   A=IN2      B=alu        DECNZ label
;   A=alu
;
;  W unit:       F unit:       branch:       output:
;   W=A           F=ZERO(alu)   JMP label     OUT1=W
;   W=constant    F=NEG(alu)    JMPT label    OUT2=W
;   W=alu         F=POS(alu)    JMPF label
;
; constant = -2048..2047, or -$800..$7FF
; replace 'alu' with one of:
;    0
;    -A
;    B
;    C
;    A+B
;    B-A
;    A+B+F     ; F is used as a carry
;    B-A-F     ; F is used as a borrow
;    A>>1      ; shift bits right
;    A|B       ; bitwise or
;    A&B       ; bitwise and
;    A^B       ; bitwise xor
;    ~A        ; bitwise inverse
;
; All appearances of 'alu' in a single instruction must expand to the
; same string. You can use shorthand like 'A=B=B-A' if multiple registers
; are getting the alu result. You can use F=ZERO(),F=NEG(),F=POS() to
; implicitly use the alu string specified elsewhere in the same instruction.
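;
; For illustration, a sketch of how the shorthand applies:
;   A=A+B,B=A+B,F=ZERO(A+B)   ; accepted: every 'alu' expands to A+B
;   A=B=A+B,F=ZERO()          ; the same instruction written in shorthand
;   A=A+B,W=B-A               ; rejected: two different alu strings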
;
; All constants in a single instruction must be identical. All labels in
; a single instruction must be identical. If both a label and a constant
; are present, they must have the same value, or the constant must be in
; the range -32..31 and the label must refer to an instruction in the range
; 0..63. (Instructions are numbered starting at 0, and you are limited to at
; most 256 instructions.)
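;
; For illustration, a sketch of how these rules apply (the values and the
; labels x and y are arbitrary, hypothetical choices):
;   B=5,W=5           ; accepted: both constants are 5
;   B=5,W=6           ; rejected: two different constants
;   JMPT x,DECNZ x    ; accepted: both labels are x
;   JMPT x,DECNZ y    ; rejected: two different labels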
;
; If both IN and OUT appear in the same instruction, they must have the same
; index (i.e. IN1 & OUT1 or IN2 & OUT2, but not IN1 & OUT2 or IN2 & OUT1).
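;
; For illustration:
;   A=IN1,OUT1=W      ; accepted: both use index 1
;   A=IN2,OUT2=W      ; accepted: both use index 2
;   A=IN1,OUT2=W      ; rejected: mismatched indices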
;
; F=ZERO(v) sets F to true if v is 0, false otherwise.
;
; F=POS(v) sets F to true if v is positive, false otherwise. v is computed with
; extra precision, so F=POS(B-A) is equivalent to F=(B>A). Likewise, F=NEG(v)
; sets F to true if v is negative, false otherwise, with extended precision for v.
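;
; A worked example of the extended precision, with arbitrarily chosen values:
; if B = 2047 and A = -2048, then B-A = 4095, which does not fit in 12 bits
; (assuming two's-complement wraparound, a 12-bit register would hold -1).
; F=POS(B-A) still sets F to true, because B > A.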
;
; A>>1 places the bottom bit into the hidden sign bit normally used
; to compute F=ZERO/POS/NEG. This means that F=NEG(A>>1) sets F if
; A was odd, and otherwise clears F. Because F is computed from all
; the bits, this means other F functions on A>>1 make little sense.
; Specifically:
;     F=ZERO(A>>1) is true if and only if A was zero before the shift.
;     F=POS(A>>1) is true if and only if A was positive and even before the shift.
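;
; A sketch of a parity test built on this (the label 'odd' is hypothetical):
;   A=A>>1,F=NEG()    ; shift A right by one bit; F is true if A was odd
;   JMPT odd          ; taken if A was odd
; The JMPT sits in the following instruction because all operations read the
; values held at the beginning of the instruction, so a JMPT in the same
; instruction would test the old F.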
;
; JMPT means jump if F is true, JMPF means jump if F is false.
;
; DEC means 'C=C-1'. DECNZ means 'C=C-1', then branch if the result is non-zero
; (similar to 8086 "LOOP" instruction).
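;
; A sketch of a counted loop (the label 'again' and the count 3 are arbitrary):
;   B=3               ; B loads the constant 3
;   C=B               ; C = 3, loaded through the ALU function 'B'
; again:
;   A=A+B,DECNZ again ; runs 3 times: C goes 3->2 and 2->1 (branch taken),
;                     ; then 1->0 (falls through)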
;
; Instructions must always begin with a space character. If there is no
; space character, the line is assumed to begin with a label. Labels are
; terminated with the ':' character. A label can appear on a line by itself,
; in which case the line does not count as an instruction.
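;
; For illustration, a sketch of the three line forms (the labels 'start' and
; 'next' are hypothetical; the leading "; " below is only this comment's
; prefix, so read each line as if it began right after it):
;
; start:              ; a label alone on a line, not counted as an instruction
;  A=IN1              ; begins with a space, so it is an instruction
; next: B=A           ; a label followed by an instruction on the same line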
;
; Example instructions:
;
;   A=B,B=A           ; swaps A & B. Note that A=B implicitly uses the ALU to compute B.
;
;   A=B-A,F=ZERO()    ; computes B-A into A, and sets F to true if A & B were equal
;
;   A=IN1,W=A,OUT1=W  ; writes old value of W, loads W from A, and loads A from input.
;
; label2: B=D=A,C=A+B ; can use B=D=A notation even when non-ALU
;
;   A=0,B=1           ; B loads immediate constant 1, ALU is set to 0 for A
;
;   A=IN1,B=B-A,DECNZ label2,D=A,F=ZERO(),JMPT label2,W=A,OUT1=W ; uses every unit
;
;
; Conventional instruction ordering:
;
; The standard convention for ordering of operations within a single instruction
; is to sequence them (as much as possible) so that if they were split into separate
; instructions, they would still produce the same result. E.g. if one operation
; reads from A and another writes to A, then put the first before the second.
; Branches go at the end.
;
; There are two cases that cannot be ordered so that they behave the same as separate instructions:
;   1. A cycle or swap like "D=A,A=D"
;   2. An assignment to F in the same instruction as JMPT or JMPF
;
; The first case should be put after everything else except branches.
;
; In the second case, if the assignment to F is explicit, e.g. F=ZERO(B-A),
; then it should appear before any assignments to the ALU inputs, here A and B.
; If it is implicit, e.g. F=ZERO(), then it can be placed arbitrarily, but before
; any branches.
; 
; So the last example above:
;   A=IN1,B=B-A,DECNZ label2,D=A,F=ZERO(),JMPT label2,W=A,OUT1=W
; could be written:
;   OUT1=W,W=A,D=A,B=B-A,A=IN1,F=ZERO(),DECNZ label2,JMPT label2
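;
; Similarly, as a sketch of the same convention applied to a shorter instruction:
;   A=IN1,W=A,OUT1=W
; could be written:
;   OUT1=W,W=A,A=IN1
; so that each register is read before the operation that overwrites it.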
;
; ============================================================================

;;;;;;;;;;;;;
;;;;;;;;;;;;;
;;;
;;;   Example problem: compute OUT1 as 8*IN1
;;;

; straightforward logic:

loop1:
   A=IN1
   B=A
   A=B=A+B
   A=B=A+B
   W=A+B
   OUT1=W
   JMP loop1

; load 'A' one cycle earlier; this reads one extra IN1 value past the end of the input, which is harmless

   A=IN1
loop2:
   B=A
   A=B=A+B
   A=B=A+B
   W=A+B
   OUT1=W
   A=IN1
   JMP loop2

; combine instructions where possible to get a 5-cycle loop

   A=IN1
loop3:
   B=A
   A=B=A+B
   A=B=A+B
   W=A+B
   OUT1=W,A=IN1,JMP loop3

; compute A=IN1 as soon as possible:

   A=IN1
loop4:
   B=A
   A=B=A+B
   A=B=A+B
   W=A+B,A=IN1
   OUT1=W,JMP loop4

; compute B earlier to get a 4-cycle loop

   A=IN1
   B=A
loop5:
   A=B=A+B
   A=B=A+B
   W=A+B,A=IN1
   OUT1=W,B=A,JMP loop5
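
; A worked trace of loop5, assuming IN1 supplies 3 and then 5 (arbitrary
; values). Each row shows the register contents after the instruction has
; executed; '-' means the value has not been set yet.
;
;   instruction              A    B    W    OUT1
;   A=IN1                    3    -    -    -
;   B=A                      3    3    -    -
;   A=B=A+B                  6    6    -    -
;   A=B=A+B                 12   12    -    -
;   W=A+B,A=IN1              5   12   24    -
;   OUT1=W,B=A,JMP loop5     5    5   24   24
;
; The value written to OUT1 is 24 = 8*3, and A and B already hold the next
; input (5) when the loop body starts again.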