VxWorks / Tornado II FAQ


2.2.1 Power PC

Q: I added memory to my board, increasing the size from 32 MB to 64 MB, and now I get an error when I load the file.

A: This is a consequence of the PowerPC architecture.
memLib allocates memory on a first-fit basis, walking from high memory to low memory, so your newly ld'd .o image is placed in high memory. Your vxWorks kernel, on the other hand, is located in low memory. The code emitted by your compiler for function calls uses a "relative jump" encoding rather than a jump to an absolute address. The relative jump encoding has only 26 effective bits of signed displacement, which gives a range of 32 Mbytes of "jumpability" in either direction.
On your 64 Mbyte setup, the distance between your newly ld'd .o and the vxWorks kernel is more than 32 Mbytes, so relative jumps don't work. This is what the dynamic linker is complaining about.
When you compile your code with the flag -mlongcall (see the gcc manual for information about this), a 32-bit jump address is used instead.
(From: Bob Schulman, bob@seaweed.com)
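
The range limit described above can be checked with a few lines of arithmetic. The following is only a minimal sketch, assuming 32-bit addresses; the helper name ppc_rel_branch_reachable and the two macros are made up for illustration and are not part of VxWorks.

```c
#include <stdint.h>
#include <stdbool.h>

/* Reach of a PowerPC relative branch ("b"/"bl"): a signed 26-bit byte
 * displacement, i.e. -32 MB up to +32 MB minus one instruction word. */
#define PPC_REL_BRANCH_MIN (-(int64_t)0x02000000)
#define PPC_REL_BRANCH_MAX ((int64_t)0x01FFFFFC)

/* Hypothetical helper: true if a branch located at address 'from'
 * can reach address 'to' with the relative encoding. */
bool ppc_rel_branch_reachable(uint32_t from, uint32_t to)
{
    int64_t disp = (int64_t)to - (int64_t)from;
    return disp >= PPC_REL_BRANCH_MIN && disp <= PPC_REL_BRANCH_MAX;
}
```

For example, a module loaded near 0x03F00000 calling a kernel routine near 0x00100000 is out of range, which is the 64 Mbyte situation described in the question.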

Another solution is to configure the system with only 32 MB and, after loading all the modules, add the remaining 32 MB to the system using the command:

memAddToPool(sysMemTop(),0x2000000)
This adds the upper 32MB to available memory, thus ending up with a 64MB system. The drawback is that no more code can be downloaded unless a reboot is done, but that's just a minor inconvenience.
(From: Robert G Fenske Jr, fenske@rocke.electro.swri.edu)


Q: When I run VxWorks from a "ROM"-ed version it runs a lot slower than when it runs from a RAM version loaded over the network.

A: The problem is caused by the fact that the initialisation is different for a "rom" and a "ram" version.

The "rom" version does (later on) enable ICACHE and DCACHE, but when I examined the HID0 register while my application was running, I saw some bit differences:

"ROM version" (using romInit.s and bootInit.c):

  1. EMCP bit is set (DRAM ECC detection)
  2. SIED bit cleared (serial instr exec disable)
  3. BHTE bit cleared (branch history table enable)

"RAM version" (using sysAlib.s):

  1. EMCP bit cleared
  2. SIED bit set (in sysInit() )
  3. BHTE bit set (in sysInit() )

Having the SIED and BHTE bits set strongly improves the PPC performance (of course). Setting these bits in the ROM version resulted in the same performance figures as for the RAM version.
(From: Bert Pleijsier (pleysier@nlr.nl) and John Fusco ())
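
The bit comparison above can be expressed as a small check on a HID0 value. This is only a sketch: the mask values below are an assumption (an MPC604-style HID0 layout, with IBM bit 0 as the most significant bit) and must be verified against the user's manual for your CPU. The function names are hypothetical, and reading or writing HID0 itself (mfspr/mtspr) is not shown.

```c
#include <stdint.h>
#include <stdbool.h>

/* ASSUMED masks for an MPC604-style HID0 (IBM bit 0 = MSB).
 * Check your CPU manual before using these values. */
#define HID0_EMCP 0x80000000u  /* bit 0:  machine check pin enable */
#define HID0_SIED 0x00000080u  /* bit 24: serial instr exec disable */
#define HID0_BHTE 0x00000004u  /* bit 29: branch history table enable */

/* True if both performance-related bits named above are set. */
bool hid0_has_fast_bits(uint32_t hid0)
{
    uint32_t want = HID0_SIED | HID0_BHTE;
    return (hid0 & want) == want;
}

/* Return the HID0 value with SIED and BHTE set, other bits untouched. */
uint32_t hid0_with_fast_bits(uint32_t hid0)
{
    return hid0 | HID0_SIED | HID0_BHTE;
}
```

Printing hid0_has_fast_bits() for the value read in both the ROM and the RAM image is a quick way to confirm whether the two start-up paths really differ on your board.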


Q: I am using the PPC860 / 850 BSP on a custom board and from time to time I receive the error message "uninitialized interrupt".
The offending interrupt is a Level 7 interrupt (interrupt #15 in VxWorks). As the Level 7 interrupt is masked off, I assume it is not a hardware problem. It seems to be a software problem, because Level 7 is the default encoding in the SIVEC register. The code in ppc860Intr.c seems to be OK.

A: We had a similar problem with the PPC860SAR when we enabled the D-cache - turned out it was the CPM Error interrupt.
We fixed it by changing a couple of lines in init.s in the bootrom...

 /* Initialise instruction support control register (ICTRL) */
 lis     r5,HIADJ(0x00000007)      /* was 0x00000006 */
 addi    r5,r5,LO(0x00000007)      /* was 0x00000006 */
 mtspr   ICTRL,r5
I understand that this disables the placing of certain debug information on the bus.
(From: Will Fookes, will@noSpam.wfookes.co.uk)


Q: How can I measure the time a function takes to execute?

A: Here's my favorite time measuring routine on a PPC using the timebase, with a precision of 60 nsec, assuming the decrementer is running at 16.666666 MHz.

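#include <vxWorks.h>
#include <vxLib.h>		/* vxTimeBaseGet() */
#include <stdio.h>

#define TIMEBASE_HZ		16666666	/* timebase ticks per second */
#define TIMEBASE_PERIOD		(1.0 / TIMEBASE_HZ)	/* seconds per tick */

/* Return the current timebase value converted to seconds. */
double double_time(void)
{
	UINT32 tbu, tbl;
	vxTimeBaseGet(&tbu, &tbl);	/* Get 64-bit value */
	return (tbu * 4294967296.0 + tbl) * TIMEBASE_PERIOD;	/* tbu * 2^32 + tbl */
}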
It simply returns the time in seconds; for example:
void benchmark(void)
{
	double ts, te;

	ts = double_time();
	my_code();
	te = double_time();

	printf("Elapsed time = %f sec\n", te - ts);
	printf("             = %f usec\n", (te - ts) * 1.0e6);
}
If you're measuring a very short interval, in order to get accurate results, you may want to surround the benchmark with il=intLock()...intUnlock(il), assuming my_code() allows it.
You could also avoid using doubles without too much trouble, but doubles happen to be an ideal type for manipulating time values on this kind of processor.
(From: Curt McDowell, csm@broadcom.com)
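
For readers who do want to avoid doubles, here is one possible integer-only variant. It is a sketch, not part of the original routine: the function names are made up, plain C99 types stand in for the VxWorks ones, and it assumes the same 16.666666 MHz timebase rate as above. The two halves would come from vxTimeBaseGet() on the target.

```c
#include <stdint.h>

#define TIMEBASE_HZ 16666666u   /* assumed timebase rate, as above */

/* Combine the upper and lower timebase words into one 64-bit count. */
uint64_t timebase_ticks(uint32_t tbu, uint32_t tbl)
{
    return ((uint64_t)tbu << 32) | tbl;
}

/* Convert ticks to whole microseconds using integers only.
 * Splitting into seconds and remainder keeps every intermediate
 * product below 2^64 for any tick count. */
uint64_t ticks_to_usec(uint64_t ticks)
{
    uint64_t sec = ticks / TIMEBASE_HZ;
    uint64_t rem = ticks % TIMEBASE_HZ;
    return sec * 1000000u + (rem * 1000000u) / TIMEBASE_HZ;
}
```

An elapsed time is then ticks_to_usec(end - start), where start and end are the values of timebase_ticks() taken before and after the code under test. The result is truncated to whole microseconds.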


Q: I have a problem getting the Ethernet controller working with the PPC860 FEC code.

A: I recently found a problem with the VxWorks FEC code. Motorola changed the 860T Ethernet chip interface between the B5 and the D3 versions, and this required a modification to the VxWorks code. I believe I made changes to both sysLib.c and the FEC driver code to get the D3 chip to work. Take a look on the Motorola web site for a migration document for the 860T.
(From: Keith Galloway, kgalloway@cinci.rr.com)


Q: How is the stack for Power organised?

A: Here's something I wrote up for the PPC604, then modified for the PPC860. The stack frames under vxWorks/GNU are the same.
(From: Vic Sperry, sperry.family@gte.net)


Q: Is there a high-speed clock available?

A: As you're using a PowerPC, vxLib.h declares an undocumented function (the prototype is actually in arch/ppc/vxPpcLib.h):

IMPORT void     vxTimeBaseGet (UINT32 * pTbu, UINT32 * pTbl);
This gives you access to the CPU's timebase register, which counts at a rate derived from your bus clock (on many PowerPC parts, one tick every four bus clocks), and I think that will meet your requirements. This counter might get reset on every system clock tick, but I've not checked that.
(From: Andrew Johnson, anj@aps.anl.gov)
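
For short intervals it is often enough to sample only the lower timebase word. The sketch below assumes the timebase runs freely between the two samples (the answer above leaves open whether it is reset) and that the interval is shorter than 2^32 ticks. The function name is hypothetical; the two arguments would be the pTbl values returned by two calls to vxTimeBaseGet().

```c
#include <stdint.h>

/* Elapsed ticks between two samples of the lower timebase word (TBL).
 * Unsigned subtraction wraps modulo 2^32, so the result is still
 * correct if TBL rolled over once between the two samples. */
uint32_t tbl_elapsed(uint32_t start_tbl, uint32_t end_tbl)
{
    return end_tbl - start_tbl;
}
```

Because of the modulo arithmetic, no special case is needed when the end sample is numerically smaller than the start sample.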


Q: Function cacheArchInvalidate does not comply with the EABI specification.

A: Please take a look at the following output, which I obtained by doing

arppc x F:\TORNADO\target\lib\libPPC604gnuvx.a cacheALib.o
objdumpppc -Sr cacheALib.o
in my $WIND_BASE/target/lib directory.
;// 000000c8  add     r5,r5,r4
;// 000000cc  rlwinm  r4,r4,0,0,26
;// 000000d0  cmpwi   cr3,r3,1
;// 000000d4  beq     cr3,000000e8
;// 000000d8  cmpwi   r3,0
;// 000000dc  bne     0000014c
;// 000000e0  icbi    r0,r4
;// 000000e4  b       000000ec
;// 000000e8  dcbi    r0,r4
;// 000000ec  addi    r4,r4,32
;// 000000f0  cmplw   r4,r5
;// 000000f4  bge     0000016c
;// 000000f8  beq     cr3,000000e8
;// 000000fc  b       000000e0
Now, according to the EABI spec., condition code register fields 2, 3 and 4 are non-volatile, and any routine that wants to use them must save them on entry and restore them on exit. The cacheArchInvalidate routine above clearly does no such thing. [I've also checked the code path through cacheInvalidate that leads here, and it neither saves nor uses cr3].
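
To spot this kind of problem in other disassembly, it can help to decode which CR field a conditional branch tests. The following is a minimal sketch with hypothetical function names. It relies on the standard encoding of the bc instruction (primary opcode 16, with the 5-bit BI field selecting one CR bit, so BI / 4 is the field number) and does not inspect the BO field, so branch forms that ignore the condition are not filtered out.

```c
#include <stdint.h>
#include <stdbool.h>

/* Return the CR field (0..7) selected by a PowerPC "bc" instruction
 * word, or -1 if the word is not a bc (primary opcode 16). */
int bc_cr_field(uint32_t insn)
{
    uint32_t bi;

    if ((insn >> 26) != 16u)
        return -1;
    bi = (insn >> 16) & 0x1Fu;  /* BI: which CR bit is tested */
    return (int)(bi >> 2);      /* four bits per CR field */
}

/* CR fields 2, 3 and 4 are the non-volatile ones named above. */
bool cr_field_is_nonvolatile(int field)
{
    return field >= 2 && field <= 4;
}
```

For instance, the word 0x418E0014 (a beq on cr3 with a forward displacement of 0x14) decodes to field 3, which is non-volatile, while 0x419A0014 (the same branch on cr6) decodes to field 6, which a routine may clobber freely.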
This bug only shows up under very limited circumstances. To the best of my knowledge, the cr fields are only used for long-term storage when you have the optimizer turned on, and perhaps only at high levels. Use of the -g flag (and probably -fvolatile too, though I haven't checked) kills this optimization.
When you do get bitten by this one, though, it's going to be nasty. The apparent symptom is liable to be something such as an if .. else statement branching the wrong way, where the test in the condition is an expression that's used repeatedly in the enclosing function. Something like
void somefunction(lots of args)
{
  if (some condition)
    do something;
  else
    do something else;

  ... more code ...

  if (same condition as before)
    do something;
  else
    do something else;

  ... more code, including some that does IO or for other reasons invalidates the cache ...

  if (same condition again)
    do something;
  else
    do something else;
}
...and you find that the third if (..) sometimes takes the opposite decision to the first two, despite there being no code in between that could alter the values of the variables on which the condition depends.
The answer is simple enough. Here's a little bit of assembler code that hotpatches your OS to make the code above use cr6 in place of cr3. It checks that all the instructions it is about to patch are where they should be, and won't do anything if it doesn't recognize the hex values corresponding to the instructions. You might want to remove the comments if you aren't using the C preprocessor on your .S files, and you might need to convert crX and rX into plain X.
// int FixOSProblem(void)
//  0 = fixed, 1 = not able to fix - code didn't match pattern.

                .globl  FixOSProblem
FixOSProblem:
                b       .go

.insns:
.insnc8:        add     r5,r5,r4
.insncc:        rlwinm  r4,r4,0,0,26
.insnd0:        cmpwi   cr3,r3,1                // change me
.insnd4:        beq     cr3,.insne8             // and me
.insnd8:        cmpwi   r3,0
.insndc:        bne     .insn14c
.insne0:        icbi    r0,r4
.insne4:        b       .insnec
.insne8:        dcbi    r0,r4
.insnec:        addi    r4,r4,32
.insnf0:        cmplw   r4,r5
.insnf4:        bge     .insn16c
.insnf8:        beq     cr3,.insne8             // and me
.insnfc:        b       .insne0
.insn14c:       ori     r0,r0,r0
.insn16c:       ori     r0,r0,r0

.repls:
.replc8:        add     r5,r5,r4
.replcc:        rlwinm  r4,r4,0,0,26
.repld0:        cmpwi   cr6,r3,1                // replacement
.repld4:        beq     cr6,.reple8             // replacement
.repld8:        cmpwi   r3,0
.repldc:        bne     .repl14c
.reple0:        icbi    r0,r4
.reple4:        b       .replec
.reple8:        dcbi    r0,r4
.replec:        addi    r4,r4,32
.replf0:        cmplw   r4,r5
.replf4:        bge     .repl16c
.replf8:        beq     cr6,.reple8             // replacement
.replfc:        b       .reple0
.repl14c:       ori     r0,r0,r0
.repl16c:       ori     r0,r0,r0

// first verify the insns are as we expect.
.go:            lis     r3,cacheArchInvalidate@ha
                addi    r3,r3,cacheArchInvalidate@l

                lis     r4,.insns@ha
                addi    r4,r4,.insns@l

                lis     r5,.repls@ha
                addi    r5,r5,.repls@l

                lwz     r6,.insnd0-.insns(r3)
                lwz     r7,.insnd0-.insns(r4)
                xor.    r6,r6,r7
                bne     cr0,.err

                lwz     r6,.insnd4-.insns(r3)
                lwz     r7,.insnd4-.insns(r4)
                xor.    r6,r6,r7
                bne     cr0,.err

                lwz     r6,.insnf8-.insns(r3)
                lwz     r7,.insnf8-.insns(r4)
                xor.    r6,r6,r7
                bne     cr0,.err

                lwz     r6,.repld4-.repls(r5)
                stw     r6,.repld4-.repls(r3)

                lwz     r6,.replf8-.repls(r5)
                stw     r6,.replf8-.repls(r3)

                lwz     r6,.repld0-.repls(r5)
                stw     r6,.repld0-.repls(r3)

// flush d cache back to mem
                li      r4,.repld4-.repls
                li      r5,.replf8-.repls
                li      r6,.repld0-.repls
// make no assumptions about cache lines
// just flush all 3 modified insns
                dcbst   r6,r3
                dcbst   r4,r3
                dcbst   r5,r3
// wait for mem to update
                sync
// invalidate I cache
                icbi    r6,r3
                icbi    r4,r3
                icbi    r5,r3
// and context sync to ensure I cache invalidation completes.
                isync

                xor     r3,r3,r3        // return success
                blr

.err:           li      r3,1            // return failure
                blr

(From: Dave Korn)
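
The "verify first, then patch" logic of the routine above can be modelled in C and tried out on a host before touching a target. This is only a sketch of the idea, not a replacement for the assembler: the structure and function names are hypothetical, and it deliberately leaves out the cache maintenance (dcbst, sync, icbi, isync) that the real patch needs on a PowerPC target.

```c
#include <stdint.h>
#include <stddef.h>

/* One instruction word to patch: its word index within the routine,
 * the value expected there now, and the value to write instead. */
struct patch_entry {
    size_t   index;
    uint32_t expect;
    uint32_t repl;
};

/* Returns 0 if every entry matched and all were patched.
 * Returns 1 on the first mismatch, in which case nothing is written. */
int patch_if_match(uint32_t *code, const struct patch_entry *p, size_t n)
{
    size_t i;

    /* Pass 1: check every word before modifying anything. */
    for (i = 0; i < n; i++)
        if (code[p[i].index] != p[i].expect)
            return 1;

    /* Pass 2: all words matched, so apply the replacements. */
    for (i = 0; i < n; i++)
        code[p[i].index] = p[i].repl;

    return 0;
}
```

Doing the comparison in a separate pass means a library version with different code is left completely untouched, which matches the behaviour of the assembler routine (0 = fixed, 1 = pattern not recognised). Calling it a second time on already patched code returns 1, since the old values are no longer present.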