
Re: [Patches] [PATCH] ARM: NEON detected memcpy.



On Mon, Apr 08, 2013 at 10:11:59AM +0100, Will Newton wrote:
> On 4 April 2013 07:37, Ondřej Bílka <neleai@xxxxxxxxx> wrote:
> > On Thu, Apr 04, 2013 at 12:15:17PM +0800, Shih-Yuan Lee (FourDollars) wrote:
> >> Hi Ondrej,
> >>
> >> I do have some benchmark data.
> >>
> > Hi,
> >
> > Try also benchmark with real world data (20MB). I put it on
> > http://kam.mff.cuni.cz/~ondra/dryrun_memcpy.tar.bz2
> 
> Hi Ondrej,
> 
> How was the workload chosen for this test run? Is it a known "memcpy
> hot" workload?
> 
It was collected during a day of normal usage.

The majority of memcpy calls are hot; see how the delays between calls are distributed in:
http://kam.mff.cuni.cz/~ondra/benchmark_string/profile/result.html
There, more than 95% of calls come less than 2^15 = 32768 cycles after the
previous call.
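
As a rough illustration of how such a percentage could be computed, here is a minimal sketch. It assumes a hypothetical text file with one cycle-counter timestamp per memcpy call, one per line, in ascending order; this is not the format of record.rec, and the function name hot_fraction is made up for the example.

```shell
# Print the percentage of calls that follow the previous call by fewer
# than LIMIT cycles (default 2^15 = 32768).
# Usage: hot_fraction FILE [LIMIT]
# Assumption: FILE holds one ascending timestamp (in cycles) per line.
hot_fraction() {
    LC_ALL=C awk -v limit="${2:-32768}" '
        NR > 1 { total++; if ($1 - prev < limit) hot++ }  # gap to previous call
        { prev = $1 }
        END {
            if (total) printf "%.1f\n", 100 * hot / total
            else print "0.0"
        }
    ' "$1"
}
```

For example, timestamps 0, 100, 200, 100000, 100010 give the gaps 100, 100, 99800 and 10, so three of the four calls count as hot.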

> Also it looks like the data was captured on x86_64? I suspect we
Yes.
> should use a specific data set for each architecture - the alignment
> of data will change depending on the ABI alignment rules and different
> compilers inline e.g. constant sized memcpys in different ways. Last
> time I looked gcc seemed to be much more aggressive with inlining
> string functions on x86 than arm for example.
> 
If you want to capture data for ARM, do the following:

rm record.rec # Otherwise you would append to the x86_64 data.
make

# I did not test on ARM, so record, for example, make or anything else of interest.
LD_PRELOAD=./record.so make

# Then check that data were really recorded.
./show # Displays alignment and lengths of recorded data.

# Finally, you can enable recording globally (as root) with
echo $PWD/record.so >> /etc/ld.so.preload
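
Since a global preload affects every dynamically linked program, it helps to be able to switch it off again. The following is only a sketch: the helper names enable_recording and disable_recording and the PRELOAD_FILE variable are invented for this example, and writing to the real /etc/ld.so.preload needs root. Note that "sudo echo ... >> file" does not help there, because the redirection is performed by the unprivileged shell.

```shell
# File listing libraries to preload; overridable so the helpers can be
# tried on a scratch file first (PRELOAD_FILE is a name chosen here).
PRELOAD_FILE=${PRELOAD_FILE:-/etc/ld.so.preload}

# Add a library to the preload list unless it is already listed.
enable_recording() {
    grep -qxF "$1" "$PRELOAD_FILE" 2>/dev/null ||
        printf '%s\n' "$1" >> "$PRELOAD_FILE"
}

# Remove every line that exactly matches the library path.
disable_recording() {
    [ -f "$PRELOAD_FILE" ] || return 0
    # grep -v exits non-zero when nothing is left; that is not an error here.
    grep -vxF "$1" "$PRELOAD_FILE" > "$PRELOAD_FILE.tmp" || true
    mv "$PRELOAD_FILE.tmp" "$PRELOAD_FILE"
}
```

Usage would be enable_recording "$PWD/record.so" before the run and disable_recording "$PWD/record.so" once enough data has been collected.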

> Thanks,
> 
> -- 
> Will Newton
> Toolchain Working Group, Linaro

_______________________________________________
Patches mailing list
Patches@xxxxxxxxxx
http://eglibc.org/cgi-bin/mailman/listinfo/patches