
Re: [patches] powerpc 8xx dcbz problem



Mark Mitchell wrote:
> Steven Munroe wrote:
>
>   
>> --with-cpu= basically adds another level (to that selected for the
>> platform/ABI) to the search order. For this case we might like a
>> --with-fixes=cpu-type1,cpu-type2 where we could select overrides from a
>> fixes add-on?
>>     
>
> Yes, perhaps something like that would work.  For convenience, one would
> want a "--with-fixes=all" that would pull in all applicable fixes, so
> that you could really say "for 603 and all allegedly compatible chips
> that we know about, working around bugs in those chips".
>
> I'm not quite sure where the line between "fixes" and "optimizations"
> lies; if you're really trying to build a "generic" GLIBC, you want to
> work on all CPUs, and the costs of some level of dynamism might be worth
> it, so as to get optimized versions of particular routines.
>
>   
Fixes work around a functional problem that only occurs on some
chips. Fixes may have a negative performance impact, especially for
chips that don't need the "fix" functionally.

Optimizations don't affect or change the function, but do improve
performance, at least for some chips.
> So, maybe what we really want is "--with-tune=generic --with-arch=XXX"
> to mean "run on XXX and later CPUs, trying to get good performance
> across all of them, including dynamism where necessary to work around
> bugs or to go fast"?  And, maybe we could spell that "--with-cpu=603+"
> to use the existing mechanism?  By default, the "+" variant is exactly
> like the plain variant -- but now, if we populate the "power603+"
> directory with additional files, those take precedence over plain
> "power603", so we can include the necessary dynamism there?
>   
The current --with-cpu=<cpu-type> mechanism simply asserts
-mcpu=<cpu-type>, so we are restricted to <cpu-types> that are known to
gcc. Also, for power5+ the "+" is actually an ISA statement (power5+ has
more instructions than power5) and not a general modifier. We are also
trying to standardise on -mcpu being an ISA selector and -mtune being an
instruction scheduling (micro-arch tuning) selector. So what would
--with-tune=generic mean? In the general case it can't be done (CELL PPE
with 2 pipes vs 970 with 10 pipes, both ISA V2.02, but no single
scheduler works for both)!

So be careful. This kind of "performance tuning" can only hold over
some small set of chips you are interested in. In your world (embedded,
with low-order micro-architectures and limited instruction parallelism)
a single binary with dynamic selection may improve performance for some
cases, but the dynamic test is not free. So there will always be costs
and losses in this strategy. In your case other considerations may
outweigh the performance issues.
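
As a rough illustration of where the cost of the dynamic test comes
from, here is a minimal C sketch of per-call selection. Everything in it
is hypothetical: the flag cpu_needs_dcbz_workaround, the functions
zero_fast, zero_safe and zero_block, and the use of memset as a stand-in
for a dcbz-based fast path are assumptions made for the example, not
code from glibc or from the patch under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag: assumed to be set once at startup by some CPU
   detection step that is not shown here. */
static int cpu_needs_dcbz_workaround = 0;

/* Stand-in for the fast path. On real hardware this is where a
   dcbz-based cache-line clear would go; memset keeps the sketch
   portable. */
static void zero_fast(void *p, size_t n)
{
    memset(p, 0, n);
}

/* Stand-in for the workaround path: plain byte stores, no dcbz. */
static void zero_safe(void *p, size_t n)
{
    unsigned char *b = p;
    for (size_t i = 0; i < n; i++)
        b[i] = 0;
}

/* Dynamic selection: every call pays for a load of the flag and a
   conditional branch, including on chips that never need the fix. */
void zero_block(void *p, size_t n)
{
    assert(p != NULL);
    if (cpu_needs_dcbz_workaround)
        zero_safe(p, n);
    else
        zero_fast(p, n);
}
```

Both paths produce the same result; only the per-call test differs from
a build where the right variant is chosen when the library is selected.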

In my world this would almost always mean I am leaving performance
untapped or seeing negative performance (the cycle cost of the dynamic
test may exceed any gain). And these issues do show up in the SPECcpu
benchmarks. That is why dynamic library (multilib) selection is the
only real option.

That is why I don't want this stuff in the trunk. It fixes or improves
things only over a very limited range.