
Re: [Patches] [PATCH] ARM: NEON detected memcpy.



On Thu, Apr 04, 2013 at 12:15:17PM +0800, Shih-Yuan Lee (FourDollars) wrote:
> Hi Ondrej,
> 
> I do have some benchmark data.
>
Hi, 

Please also try a benchmark with real-world data (20MB). I put it at
http://kam.mff.cuni.cz/~ondra/dryrun_memcpy.tar.bz2

To add the NEON version, copy the test_generic.c file and add a step that
compiles the NEON implementation to the benchmark script.

It currently measures only the total time.
I would need something like a timestamp counter to get more detailed results.
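
As a rough sketch of what finer-grained timing could look like, assuming
clock_gettime(CLOCK_MONOTONIC) is an acceptable stand-in for a hardware
timestamp counter: the helper names (now_ns, mb_per_s, time_copies) are
hypothetical and are not part of the dryrun tarball. The throughput helper
uses 1 MB = 1000000 bytes, the same convention as the quoted results.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime under strict -std= modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Monotonic time in nanoseconds (assumed substitute for a timestamp counter). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Throughput in MB/s, with 1 MB = 1000000 bytes.
   bytes per nanosecond times 1000 gives MB per second. */
static double mb_per_s(size_t bytes, uint64_t ns)
{
    if (ns == 0)
        return 0.0;               /* interval too short to measure */
    return (double)bytes * 1000.0 / (double)ns;
}

/* Signature shared by memcpy and any replacement under test. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/* Run `reps` copies of `n` bytes through `fn`; return elapsed nanoseconds. */
static uint64_t time_copies(copy_fn fn, void *dst, const void *src,
                            size_t n, unsigned reps)
{
    uint64_t start = now_ns();
    for (unsigned i = 0; i < reps; i++)
        fn(dst, src, n);
    return now_ns() - start;
}
```

Calling time_copies once per size class and passing n * reps together with the
returned interval to mb_per_s would give a per-size breakdown instead of a
single total. For very small copies the clock overhead dominates, so reps
needs to be large.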

> --- Running benchmarks (average case/perfect alignment case) ---
> 
> very small data test:
> memcpy_arm     :  (3 bytes copy) =   86.2 MB/s /   88.3 MB/s
> memcpy_neon    :  (3 bytes copy) =   53.4 MB/s /   54.5 MB/s
> memcpy_arm     :  (4 bytes copy) =   79.8 MB/s /   62.9 MB/s
> memcpy_neon    :  (4 bytes copy) =   72.5 MB/s /   73.9 MB/s
> memcpy_arm     :  (5 bytes copy) =   91.0 MB/s /   78.7 MB/s
> memcpy_neon    :  (5 bytes copy) =   90.2 MB/s /   91.0 MB/s
> memcpy_arm     :  (7 bytes copy) =  109.5 MB/s /  104.7 MB/s
> memcpy_neon    :  (7 bytes copy) =  122.1 MB/s /  126.6 MB/s
> memcpy_arm     :  (8 bytes copy) =  122.4 MB/s /  122.4 MB/s
> memcpy_neon    :  (8 bytes copy) =  142.0 MB/s /  148.2 MB/s
> memcpy_arm     :  (11 bytes copy) =  157.8 MB/s /  161.3 MB/s
> memcpy_neon    :  (11 bytes copy) =  193.8 MB/s /  196.2 MB/s
> memcpy_arm     :  (12 bytes copy) =  170.1 MB/s /  172.7 MB/s
> memcpy_neon    :  (12 bytes copy) =  206.8 MB/s /  212.5 MB/s
> memcpy_arm     :  (15 bytes copy) =  204.0 MB/s /  209.6 MB/s
> memcpy_neon    :  (15 bytes copy) =  247.5 MB/s /  270.3 MB/s
> memcpy_arm     :  (16 bytes copy) =  212.2 MB/s /  225.6 MB/s
> memcpy_neon    :  (16 bytes copy) =  175.3 MB/s /  252.2 MB/s
> memcpy_arm     :  (24 bytes copy) =  274.6 MB/s /  326.5 MB/s
> memcpy_neon    :  (24 bytes copy) =  244.7 MB/s /  367.8 MB/s
> memcpy_arm     :  (31 bytes copy) =  333.3 MB/s /  399.2 MB/s
> memcpy_neon    :  (31 bytes copy) =  304.3 MB/s /  463.5 MB/s
> 
> L1 cached data:
> memcpy_arm     :  (4096 bytes copy) = 1295.5 MB/s / 2691.8 MB/s
> memcpy_neon    :  (4096 bytes copy) = 1826.3 MB/s / 2021.8 MB/s
> memcpy_arm     :  (6144 bytes copy) = 1306.5 MB/s / 2724.1 MB/s
> memcpy_neon    :  (6144 bytes copy) = 1857.8 MB/s / 2053.2 MB/s
> 
> L2 cached data:
> memcpy_arm     :  (65536 bytes copy) = 1291.5 MB/s / 2304.8 MB/s
> memcpy_neon    :  (65536 bytes copy) = 1866.5 MB/s / 2441.7 MB/s
> memcpy_arm     :  (98304 bytes copy) = 1285.6 MB/s / 2283.8 MB/s
> memcpy_neon    :  (98304 bytes copy) = 1860.7 MB/s / 2454.7 MB/s
> 
> SDRAM:
> memcpy_arm     :  (2097152 bytes copy) =  466.7 MB/s /  736.5 MB/s
> memcpy_neon    :  (2097152 bytes copy) =  727.5 MB/s /  868.8 MB/s
> memcpy_arm     :  (3145728 bytes copy) =  507.9 MB/s /  854.7 MB/s
> memcpy_neon    :  (3145728 bytes copy) =  852.9 MB/s / 1038.0 MB/s
> 
> (*) 1 MB = 1000000 bytes
> (*) 'memcpy_arm' - an implementation for older ARM cores from glibc-ports
> 
> The similar benchmark is at
> http://sourceware.org/ml/libc-ports/2009-07/msg00000.html .
> 
> Regards,
> $4
>
_______________________________________________
Patches mailing list
Patches@xxxxxxxxxx
http://eglibc.org/cgi-bin/mailman/listinfo/patches