On 03/04/13 16:08, Joseph S. Myers wrote:
> I was previously told by people at ARM that NEON memcpy wasn't a good idea
> in practice because of raised power consumption, context switch costs etc.
> from using NEON in processes that otherwise didn't use it, even if it
> appeared superficially beneficial in benchmarks.
What really matters is system power increase vs performance gain and
what you might be able to save if you finish sooner. If a 10%
improvement to memcpy performance comes at a 12% increase in CPU
power, then that might seem like a net loss. But if the CPU is only
50% of the system power, then the increase in system power
is just half of that (i.e. 6%), but the performance improvement will
still be 10%. Note that 50% is just an example to make the figures
easier here, I've no idea what the real numbers are, and they will be
highly dependent on the other components in the system: a back-lit
display, in particular, will use a significant amount of power.
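
To make the arithmetic concrete, here is a rough sketch in Python using
the same made-up figures. It assumes (my simplification, not a
measurement) that a 10% performance gain means the same work finishes
in 1/1.10 of the time, and that energy for a fixed job is simply
average power times time. The function names are purely illustrative.

```python
def system_power_ratio(cpu_power_increase, cpu_share):
    """Relative system power when only the CPU's share of it rises."""
    return 1.0 + cpu_power_increase * cpu_share


def energy_ratio(power_ratio, speedup):
    """Relative energy for a fixed amount of work (power times time)."""
    return power_ratio / (1.0 + speedup)


# Example figures only; the real numbers are unknown.
speedup = 0.10             # 10% better memcpy performance
cpu_power_increase = 0.12  # 12% more CPU power
cpu_share = 0.50           # CPU is 50% of system power

# Looking at the CPU alone: more energy per job, so it looks like a loss.
cpu_only = energy_ratio(1.0 + cpu_power_increase, speedup)

# Looking at the whole system: power rises by 6%, energy per job falls.
whole = energy_ratio(system_power_ratio(cpu_power_increase, cpu_share), speedup)

print(f"CPU-only energy per job:    {cpu_only:.3f}x")
print(f"Whole-system energy per job: {whole:.3f}x")
```

With these inputs the CPU-only ratio comes out above 1.0 and the
whole-system ratio below 1.0, which is the point: the same change can
look like a loss or a win depending on where you draw the boundary.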
It's also necessary to think about how the Neon unit in the processor
is managed. Is it power gated or simply clock gated? Power-gated
regions are likely to have long power-up times (relative to normal
CPU operations), but clock-gated regions are typically
instantaneously available.
Finally, you need to consider whether the unit is likely to be
already in use. With the increasing trend to using the hard-float
ABI, VFP (and Neon) are generally much more widely used in code now
than they were, so the other potential cost of using Neon (lazy
context switching) is also likely to be much less of an issue than it
would be if the unit were almost never touched.