Omgili, forum search, forums search, search forums, discussion search,discussions search, search discussions, board search, boards search, search boards
  Advanced Search

[boost] unordered_map 32bit VS 64bit

On Wed, 25 Jan 2012 09:52:33 +0200, Avikam Bara-nes <...@gmail.com

Hi,
I'm porting a project from Linux 32bit to Linux 64bit. While doing so I
notice that I have a dramatic performance degradation with unordered_map.
My platform is Ubuntu 10.4.3 64bit running on Intel Xeon X3220 but, I
tested similar program on WindowsXP SP2 64bit which running on Intel i5 and
I get the same degradation. My boost version is 1_46_1.
I use the following program in order to test unordered_map - Hop I didn't
make any copy mistakes :-).

#include <stdio.h#include <sys/time.h#include <boost/unordered_map.hpp
int main(int argc, char *argv[])
{
struct timeval tv_before, tv_after;
boost::unordered_map<boost::int_32_t, boost::int32_t
boost_map[1] = 2;
boost_map[3] = 4;

gettimeofday(&tv_before, NULL);
for(int i = ; i < 100000; i++)
boost_map.find(1);
gettimeofday(&tv_after, NULL);

printf("Boost time is %f \n",
(double)(((tv_after.tv_sec - tv_before.tv_sec) * 1000000 +
(double)(tv_after.tv_usec - tv_before.tv_usec)) / 100000));

return 0;
}

I use the following compile commands:
64bit - g++ -O3 perf_test -o perf_test_64bit
32bit - g++ -m32 -O3 perf_test -o perf_test_32bit

When I run the 32bit program I get 0.5 sec
When I run the 64bit program I get 1.54 sec

Any ideas ???

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost



On Wed, 25 Jan 2012 15:55:14 +0100, Olaf van der Spek <...@vdspek.org

On Wed, Jan 25, 2012 at 8:52 AM, Avikam Bara-nes <...@gmail.com
Why doesn't the optimizer prune the entire loop? What compiler version
are you running?

--
Olaf

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Wed, 25 Jan 2012 16:59:00 +0200, Avikam Bara-nes <...@gmail.com

I used both gcc 4.4.3 and 4.6.2 and got the same results
On Jan 25, 2012 4:56 PM, "Olaf van der Spek" <...@vdspek.org

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Wed, 25 Jan 2012 12:33:01 +0000, "Stewart, Robert" <...@sig.com

I presume that you've ensured that there's sufficient memory available for the 64b process? That is, it isn't suffering from swapping, is it?

Out of curiosity, what happens if you use int64_t instead of int32_t?

Have you tried -O2 or -O1 to see if they make any difference?

_____
Rob Stewart robe...@sig.com
Software Engineer using std::disclaimer;
Dev Tools & Components
Susquehanna International Group, LLP http://www.sig.com

________________________________

IMPORTANT: The information contained in this email and/or its attachments is confidential. If you are not the intended recipient, please notify the sender immediately by reply and immediately delete this message and all its attachments. Any review, use, reproduction, disclosure or dissemination of this message or any attachment by an unintended recipient is strictly prohibited. Neither this message nor any attachment is intended as or should be construed as an offer, solicitation or recommendation to buy or sell any security or other financial instrument. Neither the sender, his or her employer nor any of their respective affiliates makes any warranties as to the completeness or accuracy of any of the information contained herein or that this message or any of its attachments is free of viruses.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Wed, 25 Jan 2012 13:15:20 +0000, Daniel James <...@gmail.com

On 25 January 2012 12:33, Stewart, Robert <...@sig.com
FWIW I had a quick look at this earlier, and it appears to be mainly
caused by a check to see if the container is empty before searching
(after removing it, the benchmark was about the same for 32 and 64
bit). The check is there because an empty unordered_map doesn't
allocate any memory (a feature request from a while back). I'm a bit
surprised that it's so costly, considering the other things that find
has to do. I suspect it's something to do with the optimizer. I'll
look into it in more detail later.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Wed, 25 Jan 2012 19:05:01 +0000, "Grund, Holger (ISGT)" <...@morganstanley.com

My guess would be that there is 64-bit division for transforming the hash to a bucket index, which is killing perf (though one would wonder why there are no peeps for an instructions like MOD(1,x) in the codegen).

I guess, you could try out by changing the bucket count and hash code to a uint32_t in the 64-bit build.

Most compilers will transform a div by constant to a mul. Maybe some toolchains would replace a div by bucketcount with a MUL when doing value-range PGO and numbers are reasonably stable (i.e. there is some geometric growth with a factor of ~2 or so, so that there's only a handful of divisors).

-hg

--------------------------------------------------------------------------
NOTICE: Morgan Stanley is not acting as a municipal advisor and the opinions or views contained herein are not intended to be, and do not constitute, advice within the meaning of Section 975 of the Dodd-Frank Wall Street Reform and Consumer Protection Act. If you have received this communication in error, please destroy all electronic and paper copies and notify the sender immediately. Mistransmission is not intended to waive confidentiality or privilege. Morgan Stanley reserves the right, to the extent permitted under applicable law, to monitor electronic communications. This message is subject to terms available at the following link: http://www.morganstanley.com/disclaimers. If you cannot access these links, please notify us by reply message and we will send the contents to you. By messaging with Morgan Stanley you consent to the foregoing.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Wed, 25 Jan 2012 23:35:06 +0000, Daniel James <...@gmail.com

On 25 January 2012 19:05, Grund, Holger (ISGT)
<...@morganstanley.com
You're right. I tried removing all the modulus calculations from the
container (fine in this case since it's only ever looking for hash
values < 4 - I'm also using a modified benchmark which looks up i % 4
rather than 1, to stop things getting optimised away), and the
difference became much smaller. I didn't realise how expensive they
are on 64 bit computers, I think I might change the design of the
container to reduce the number of times they're used, although that
will make some other things a bit slower.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Thu, 26 Jan 2012 11:11:54 +0400, Andrey Semashev <...@gmail.com

On Thu, Jan 26, 2012 at 3:35 AM, Daniel James <...@gmail.com
Is it possible to always use power-of-2 number of buckets and bitwise
operations instead of modulus division?

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Thu, 26 Jan 2012 07:29:05 +0000, Daniel James <...@gmail.com

On 26 January 2012 07:11, Andrey Semashev <...@gmail.com
Since the container can use arbitrary hash functions, and the
standard's quality requirements for hash functions aren't very high,
it's problematic. It might be possible to use a mix function on the
hash value and then use a power of 2, although that adds to the
platform specific issues, which I've been trying to avoid - I've had
enough problems dealing with varying compilers. I'll have to look into
how much faster it is (if at all). I originally developed this on 32
bit machines and it didn't seem worth it there.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Thu, 26 Jan 2012 11:51:50 +0400, Andrey Semashev <...@gmail.com

On Thu, Jan 26, 2012 at 11:29 AM, Daniel James <...@gmail.com
I think, the problem of insufficient quality of a hash function should
not be solved by the container itself. Those users who provide good
hash functions will just waste CPU cycles if additional hash values
mixing is done within the container.

IMHO, the best way to address this problem is to provide a set of
"good" hash functions for common types (I believe functional/hash
already does this) and possibly a wrapper function that just does bit
mixing in a user-provided hash function. Boost.Unordered docs should
mention these tools and advise to use them to get better performance.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Thu, 26 Jan 2012 08:16:12 +0000, Daniel James <...@gmail.com

On 26 January 2012 07:51, Andrey Semashev <...@gmail.com
That's really an issue for the standard (a cop out I know, but I don't
want to get into this debate).

Actually functional/hash doesn't. It's good enough for the standard,
but no better. For numbers that fit into the hash value, it just
returns them unchanged which is fine for a prime number of buckets but
not for power of 2 containers.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Thu, 26 Jan 2012 15:16:47 +0400, Andrey Semashev <...@gmail.com

On Thu, Jan 26, 2012 at 12:16 PM, Daniel James <...@gmail.com
Well, we probably better fix functional/hash then?

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Thu, 26 Jan 2012 11:29:53 +0000, Daniel James <...@gmail.com

On 26 January 2012 11:16, Andrey Semashev <...@gmail.com
No, it's deliberate. It meets the standard's requirements. No more, no less.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Thu, 26 Jan 2012 16:46:56 +0400, Andrey Semashev <...@gmail.com

On Thu, Jan 26, 2012 at 3:29 PM, Daniel James <...@gmail.com
I don't see why it shouldn't do better than required by the standard.
It's quite normal for Boost components to extend the standard and
provide superior solutions.

Anyway, if existing functional/hash functions are not suitable for the
task, we can add the bit mixing wrapper and recommend its usage with
Boost.Unordered.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Thu, 26 Jan 2012 16:26:48 +0000, Daniel James <...@gmail.com

On 26 January 2012 12:46, Andrey Semashev <...@gmail.com
A few emails ago you were worried about wasted cycles.

boost::hash should work well with std::unordered_map as it's defined
in the standard and boost::unordered_map should work well with
std::hash as it's defined in the standard. If that isn't the case then
they're failures. These are generic components, they should just work
without a wrapper or any such fuss.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Mon, 6 Feb 2012 11:10:14 +0200, Avikam Bara-nes <...@gmail.com

Hi,
Sorry but I"m a bit confuse,

1. What should I do ?
2. Is there a solution ?
3. Is there a workaround ?

Thanks in advance.

On Thu, Jan 26, 2012 at 6:26 PM, Daniel James <...@gmail.com

--
Avikam Bara-Nes

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Tue, 7 Feb 2012 12:31:54 +0000, Daniel James <...@gmail.com

On 6 February 2012 09:10, Avikam Bara-nes <...@gmail.com
I'm going to try to do something about this for 1.50 (i.e. to be
released in 3 or 4 months). I tried gcc 4.7's implementation and it's
pretty similar to boost 1.46, a tad faster at lookup. Your benchmark
might perform better but it wouldn't if you always look up different
values (as you would in real code). I don't know about libc++ or
Visual C++'s implementation.

If you can't wait for 1.50 then your best bet is to use something like
google's dense hash containers, and try to make sure your hash
function works well with them. Or if you feel like being brave, I'll
add the new version to trunk around the time 1.49 is released.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Discussion Title: [boost] unordered_map 32bit VS 64bit
Title Keywords: [boost]  unordered_map  32bit  64bit