If you spot a problem with this page, click here to create a Bugzilla issue.

Quickly fork, edit online, and submit a pull request for this page. Requires a signed-in GitHub account. This works well for small changes. If you'd like to make larger changes you may want to consider using a local clone.

std.digest.murmurhash

Computes MurmurHash hashes of arbitrary data. MurmurHash is a non-cryptographic hash function suitable for general hash-based lookup. It is optimized for x86 but can be used on all architectures.

The current version is MurmurHash3, which yields a 32-bit or 128-bit hash value. The older MurmurHash 1 and 2 are currently not supported.

MurmurHash3 comes in three flavors, listed in increasing order of throughput:

MurmurHash3!32 produces a 32-bit value and is optimized for 32-bit architectures
MurmurHash3!(128, 32) produces a 128-bit value and is optimized for 32-bit architectures
MurmurHash3!(128, 64) produces a 128-bit value and is optimized for 64-bit architectures

Note

MurmurHash3!(128, 32) and MurmurHash3!(128, 64) produce different values.
The current implementation is optimized for little endian architectures. It will exhibit different results on big endian architectures and a slightly less uniform distribution.

This module conforms to the APIs defined in std.digest.

This module publicly imports std.digest and can be used as a stand-alone module.

Source std/digest/murmurhash.d

License:

Boost License 1.0.

Authors:

Guillaume Chatelet

References Reference implementation
Wikipedia

Examples:

// MurmurHash3!32, MurmurHash3!(128, 32) and MurmurHash3!(128, 64) implement
// the std.digest Template API.
static assert(isDigest!(MurmurHash3!32));
// The convenient digest template allows for quick hashing of any data.
ubyte[4] hashed = digest!(MurmurHash3!32)([1, 2, 3, 4]);
writeln(hashed); // [0, 173, 69, 68]

Examples:

// One can also hash ubyte data piecewise by instanciating a hasher and call
// the 'put' method.
const(ubyte)[] data1 = [1, 2, 3];
const(ubyte)[] data2 = [4, 5, 6, 7];
// The incoming data will be buffered and hashed element by element.
MurmurHash3!32 hasher;
hasher.put(data1);
hasher.put(data2);
// The call to 'finish' ensures:
// - the remaining bits are processed
// - the hash gets finalized
auto hashed = hasher.finish();
writeln(hashed); // [181, 151, 88, 252]

Examples:

// Using `putElements`, `putRemainder` and `finalize` you gain full
// control over which part of the algorithm to run.
// This allows for maximum throughput but needs extra care.

// Data type must be the same as the hasher's element type:
// - uint for MurmurHash3!32
// - uint[4] for MurmurHash3!(128, 32)
// - ulong[2] for MurmurHash3!(128, 64)
const(uint)[] data = [1, 2, 3, 4];
// Note the hasher starts with 'Fast'.
MurmurHash3!32 hasher;
// Push as many array of elements as you need. The less calls the better.
hasher.putElements(data);
// Put remainder bytes if needed. This method can be called only once.
hasher.putRemainder(ubyte(1), ubyte(1), ubyte(1));
// Call finalize to incorporate data length in the hash.
hasher.finalize();
// Finally get the hashed value.
auto hashed = hasher.getBytes();
writeln(hashed); // [188, 165, 108, 2]

struct MurmurHash3(uint size, uint opt = size_t.sizeof == 8 ? 64 : 32);

Implements the MurmurHash3 functions. You can specify the size of the hash in bit. For 128 bit hashes you can specify whether to optimize for 32 or 64 bit architectures. If you don't specify the opt value it will select the fastest version of the host platform.

This hasher is compatible with the Digest API:

void start()
void put(scope const(ubyte)[] data...)
ubyte[Element.sizeof] finish()

It also provides a faster, low level API working with data of size Element.sizeof:

void putElements(scope const(Element[]) elements...)
void putRemainder(scope const(ubyte[]) data...)
void finalize()
Element get()
ubyte[Element.sizeof] getBytes()

Examples:

The convenient digest template allows for quick hashing of any data.

ubyte[4] hashed = digest!(MurmurHash3!32)([1, 2, 3, 4]);
writeln(hashed); // [0, 173, 69, 68]

Examples:

One can also hash ubyte data piecewise by instanciating a hasher and call the 'put' method.

const(ubyte)[] data1 = [1, 2, 3];
const(ubyte)[] data2 = [4, 5, 6, 7];
// The incoming data will be buffered and hashed element by element.
MurmurHash3!32 hasher;
hasher.put(data1);
hasher.put(data2);
// The call to 'finish' ensures:
// - the remaining bits are processed
// - the hash gets finalized
auto hashed = hasher.finish();
writeln(hashed); // [181, 151, 88, 252]

alias Element = uint;: The element type for 32-bit implementation.
pure nothrow @nogc void putElement(uint block);: Adds a single Element of data without increasing element_count. Make sure to increase element_count by Element.sizeof for each call to putElement.
pure nothrow @nogc void putRemainder(scope const(ubyte[]) data...);: Put remainder bytes. This must be called only once after putElement and before finalize.
pure nothrow @nogc void finalize();: Incorporate element_count and finalizes the hash.
pure nothrow @nogc Element get();: Returns the hash as an uint value.
pure nothrow @nogc ubyte[4] getBytes();: Returns the current hashed value as an ubyte array.
pure nothrow @nogc void putElements(scope const(Element[]) elements...);: Pushes an array of elements at once. It is more efficient to push as much data as possible in a single call. On platforms that do not support unaligned reads (MIPS or old ARM chips), the compiler may produce slower code to ensure correctness.
pure nothrow void put(scope const(ubyte)[] data...);: Adds data to the digester. This function can be called many times in a row after start but before finish.
pure nothrow ubyte[Element.sizeof] finish();: Finalizes the computation of the hash and returns the computed value. Note that finish can be called only once and that no subsequent calls to put is allowed.

Library Reference

std.digest.murmurhash