
Re: [patches] powerpc 8xx dcbz problem



Mark Mitchell wrote:
> Steven Munroe wrote:
>
>   
>> Memset is performance sensitive code and the dynamic __cache_line_size
>> check is slowing down 970/power4/power5/power5+/power6/power6x. When the
>> processor can retire (up to) 5 instructions per cycle, the dependent
>> sequence to address the GOT and check the __cache_line_size is very
>> noticeable.
>>     
>
> I'm just trying to understand this better; please be patient if I ask
> dumb questions.
>
> I understand what you're saying above.  I'm assuming that
> --with-cpu=power6 (say) gives a version of memset that doesn't have the
> cache check, and therefore goes faster?
>
>   
Actually the power4 implementation removes the dynamic __cache_line_size
check, and there are Implies files in 970/power5/power5+, so this
implementation is picked up for ppc970, power5, and power5+. Power6 needs
additional changes because the CPU is in-order and its interactions with
the L1/L2 cache are different. So power6 has its own memset, and power6x
has Implies files that pick up the memset.S from power6.

Basically, Implies files can be used to customize the search order, which
allows a single implementation to cover multiple CPUs.
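
To make the search-order idea concrete, here is a rough toy model in
Python. It is only a sketch: the short directory names, the contents of
IMPLIES and FILES, and the "generic" fallback are assumptions made up for
illustration, not a copy of the real sysdeps tree or of how configure
actually walks it.

```python
# Toy model of a sysdeps-style search order expanded through Implies files.
# All names below are hypothetical stand-ins for real sysdeps directories.

# Which directories each CPU directory "implies" (assumed layout).
IMPLIES = {
    "970": ["power4"],
    "power5": ["power4"],
    "power5+": ["power4"],
    "power6x": ["power6"],
}

# Which directories actually contain a given source file (assumed).
FILES = {
    "power4": {"memset.S"},
    "power6": {"memset.S"},
    "generic": {"memset.S"},
}


def search_order(cpu_dir):
    """Return the directories searched for cpu_dir, most specific first."""
    order = []

    def visit(directory):
        if directory in order:
            return  # already queued; avoids duplicates and cycles
        order.append(directory)
        for implied in IMPLIES.get(directory, []):
            visit(implied)

    visit(cpu_dir)
    visit("generic")  # assumed platform-wide fallback, searched last
    return order


def locate(cpu_dir, filename):
    """Return the first directory in the search order holding filename."""
    for directory in search_order(cpu_dir):
        if filename in FILES.get(directory, set()):
            return directory
    return None


if __name__ == "__main__":
    for cpu in ("970", "power5", "power5+", "power6", "power6x"):
        print(cpu, "->", locate(cpu, "memset.S"))
```

In this model 970, power5, and power5+ all resolve memset.S to the power4
directory, and power6x resolves it to power6, without any of them carrying
their own copy.
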
>> This is chip specific and 32-bit specific so does not belong in the
>> trunk. I think this should be a hard and fast rule. If you insist on the
>> dynamic approach then copy libc-start.c to
>> ports/sysdeps/unix/sysv/linux/powerpc/powerpc32/ and add the 8xx
>> specific hack there to zero the __cache_line_size.
>>     
>
> It certainly makes sense not to penalize 64-bit CPUs for something that
> only affects 32-bit CPUs.  So, I like this suggestion.
>
> The more general question is how to deal with CPU errata, which do seem
> to happen with surprising regularity.  In GCC, we can invent
> -mavoid-xyz-bug options, so that:
>
>   -march=603 -mavoid-8xx-bugs
>
> can generate code for 603 (or later) CPUs, but avoid bugs in 8xx
> processors.  Unfortunately, for bits of assembly code in GLIBC (or
> elsewhere) that doesn't do any good.
>
> So, when a user does --with-cpu=603 in GLIBC, we have two possible meanings:
>
> 1. The user has a 603, and wants code only to run on the 603, with
> maximum performance there.
>
> 2. The user wants code that will run on a 603 and any later processor.
>
> For (2), it makes sense to be dynamic and work around bugs that we know
> of in later CPUs.  For (1), that's just wasted code.  Does that suggest
> that we should have some additional configuration option to specify
> which model you want?
>   

It should be dynamic in the narrowest possible way. Memset on 64-bit
chips is just one example of the dynamic fix having performance
repercussions. sqrt() is another.

--with-cpu=<cpu-type> implies -mcpu=<cpu-type>, where -mcpu implies a
specific ISA (version and feature set, like Altivec or DFP). Any additional
code picked up by --with-cpu must conform to the implied ISA. The tuning
can be adjusted with CFLAGS=-mtune=<cpu-type>. For example, if I want
support back to power4 but want to perform well on power6:
"CFLAGS=-mtune=power6 configure ... --with-cpu=power4".

--with-cpu= basically adds another level (to that selected for the
platform/ABI) to the search order. For this case we might like a
--with-fixes=cpu-type1,cpu-type2 where we could select overrides from a
fixes add-on?
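
As a rough sketch of what that could look like, the Python below models a
search order with an extra "fixes" layer in front. Everything here is
hypothetical: --with-fixes does not exist, and the "fixes/<cpu>" and
"cpu/<cpu>" directory names and the rule that fixes are searched first
are assumptions chosen only to illustrate the override idea.

```python
# Hypothetical model of a --with-fixes layer on top of the --with-cpu layer.
# Directory naming ("fixes/...", "cpu/...") is invented for illustration.


def build_search_order(platform_dirs, with_cpu=None, with_fixes=()):
    """Return the search order: fixes first, then CPU, then platform/ABI."""
    order = [f"fixes/{cpu}" for cpu in with_fixes]
    if with_cpu is not None:
        order.append(f"cpu/{with_cpu}")
    order.extend(platform_dirs)
    return order


def pick(order, available, filename):
    """Return the first directory in order that provides filename."""
    for directory in order:
        if filename in available.get(directory, ()):
            return directory
    return None


if __name__ == "__main__":
    # Assumed tree: an 8xx override for libc-start.c, nothing 603-specific.
    available = {
        "fixes/8xx": {"libc-start.c"},
        "powerpc32": {"libc-start.c", "memset.S"},
    }
    platform = ["powerpc32", "generic"]

    plain = build_search_order(platform, with_cpu="603")
    fixed = build_search_order(platform, with_cpu="603", with_fixes=["8xx"])

    print(pick(plain, available, "libc-start.c"))
    print(pick(fixed, available, "libc-start.c"))
```

Only a build that asks for the 8xx fixes picks up the override; a plain
--with-cpu=603 build, or any 64-bit build, never sees it.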