core.simd

Builtin SIMD intrinsics

Source core/simd.d

License:

Boost License 1.0.

Authors:

Walter Bright

template Vector(T)

Create a vector type.

Parameters:

T	one of double[2], float[4], void[16], byte[16], ubyte[16], short[8], ushort[8], int[4], uint[4], long[2], ulong[2]. For 256-bit vectors, one of double[4], float[8], void[32], byte[32], ubyte[32], short[16], ushort[16], int[8], uint[8], long[4], ulong[4]

alias void16 = __vector(void[16]);

alias double2 = __vector(double[2]);

alias float4 = __vector(float[4]);

alias byte16 = __vector(byte[16]);

alias ubyte16 = __vector(ubyte[16]);

alias short8 = __vector(short[8]);

alias ushort8 = __vector(ushort[8]);

alias int4 = __vector(int[4]);

alias uint4 = __vector(uint[4]);

alias long2 = __vector(long[2]);

alias ulong2 = __vector(ulong[2]);
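As a minimal sketch of how these aliases are used, the function below adds two float4 values lane by lane with the ordinary + operator. The name addLanes is hypothetical and not part of core.simd. The static if guard is an assumption made so the module still compiles on targets where float4 is not defined.

```d
import core.simd;

// float4 is only declared on targets with 128-bit vector support,
// so the whole sketch is compiled conditionally.
static if (is(float4))
{
    /// Hypothetical helper: element-wise sum of two four-float vectors.
    float4 addLanes(float4 a, float4 b)
    {
        // Vector types support arithmetic operators directly.
        return a + b;
    }
}
```

Individual lanes of the result can be read through the .array property, as the printf example for __simd on this page does.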

enum XMM: int;

XMM opcodes that conform to the following:

opcode xmm1,xmm2/mem

and do not have side effects (i.e. do not write to memory).

STOSS: MOVSS xmm1/m32, xmm2
STOSD: MOVSD xmm1/m64, xmm2
STOAPS: MOVAPS xmm2/m128, xmm1
STOAPD: MOVAPD xmm2/m128, xmm1
STODQA: MOVDQA xmm2/m128, xmm1
STOD: MOVD reg/mem64, xmm 66 0F 7E /r
STOQ: MOVQ xmm2/m64, xmm1
LODSS: MOVSS xmm1, xmm2/m32
LODSD: MOVSD xmm1, xmm2/m64
LODAPS: MOVAPS xmm1, xmm2/m128
LODAPD: MOVAPD xmm1, xmm2/m128
LODDQA: MOVDQA xmm1, xmm2/m128
LODD: MOVD xmm, reg/mem64 66 0F 6E /r
LODQ: MOVQ xmm1, xmm2/m64
LODDQU: MOVDQU xmm1, xmm2/mem128 F3 0F 6F /r
STODQU: MOVDQU xmm1/mem128, xmm2 F3 0F 7F /r
MOVDQ2Q: MOVDQ2Q mmx, xmm F2 0F D6 /r
MOVHLPS: MOVHLPS xmm1, xmm2 0F 12 /r
LODHPD: MOVHPD xmm1, m64
STOHPD: MOVHPD mem64, xmm1 66 0F 17 /r
LODHPS: MOVHPS xmm1, m64
STOHPS: MOVHPS m64, xmm1
MOVLHPS: MOVLHPS xmm1, xmm2
LODLPD: MOVLPD xmm1, m64
STOLPD: MOVLPD m64, xmm1
LODLPS: MOVLPS xmm1, m64
STOLPS: MOVLPS m64, xmm1
MOVMSKPD: MOVMSKPD reg, xmm
MOVMSKPS: MOVMSKPS reg, xmm
MOVNTDQ: MOVNTDQ m128, xmm1
MOVNTI: MOVNTI m32, r32
MOVNTPD: MOVNTPD m128, xmm1
MOVNTPS: MOVNTPS m128, xmm1
MOVNTQ: MOVNTQ m64, mm
MOVQ2DQ: MOVQ2DQ
LODUPD: MOVUPD xmm1, xmm2/m128
STOUPD: MOVUPD xmm2/m128, xmm1
LODUPS: MOVUPS xmm1, xmm2/m128
STOUPS: MOVUPS xmm2/m128, xmm1

pure nothrow @nogc @safe void16 __simd(XMM opcode, void16 op1, void16 op2);

Generate a two-operand instruction with 128-bit XMM operands.

This is a compiler magic function - it doesn't behave like regular D functions.

Parameters:

opcode	any of the XMM opcodes; it must be a compile-time constant
op1	first operand
op2	second operand

Returns:

result of opcode

Example

import core.simd;
import core.stdc.stdio;

void main()
{
    float4 A = [2.34f, -70000.0f, 0.00001f, 345.5f];
    float4 R = A;
    R = cast(float4) __simd(XMM.RCPSS, R, A);
    printf("%g %g %g %g\n", R.array[0], R.array[1], R.array[2], R.array[3]);
}

Prints 0.427368 -70000 1e-05 345.5. The use of the two operand form for XMM.RCPSS is necessary because the result of the instruction contains elements of both operands.

Example

double[2] A = [56.0, -75.0];
double2 R = cast(double2) __simd(XMM.LODUPD, *cast(double2*)A.ptr);

The cast to double2* is necessary because the type of *A.ptr is double.

Examples:

float4 a;
a = cast(float4)__simd(XMM.PXOR, a, a);

pure nothrow @nogc @safe void16 __simd(XMM opcode, void16 op1);

Unary SIMD instructions.

pure nothrow @nogc @safe void16 __simd(XMM opcode, double d);

pure nothrow @nogc @safe void16 __simd(XMM opcode, float f);

Examples:

float4 a;
a = cast(float4)__simd(XMM.LODSS, a);

pure nothrow @nogc @safe void16 __simd(XMM opcode, void16 op1, void16 op2, ubyte imm8);

For instructions: CMPPD, CMPSS, CMPSD, CMPPS, PSHUFD, PSHUFHW, PSHUFLW, BLENDPD, BLENDPS, DPPD, DPPS, MPSADBW, PBLENDW, ROUNDPD, ROUNDPS, ROUNDSD, ROUNDSS

Parameters:

opcode	any of the above XMM opcodes; it must be a compile-time constant
op1	first operand
op2	second operand
imm8	third operand; must be a compile-time constant

Returns:

result of opcode

Examples:

float4 a;
a = cast(float4)__simd(XMM.CMPPD, a, a, 0x7A);

pure nothrow @nogc @safe void16 __simd_ib(XMM opcode, void16 op1, ubyte imm8);

For instructions with the imm8 version: PSLLD, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLD, PSRLQ, PSRLW, PSRLDQ, PSLLDQ

Parameters:

opcode	any of the above XMM opcodes; it must be a compile-time constant
op1	first operand
imm8	second operand; must be a compile-time constant

Returns:

result of opcode

Examples:

float4 a;
a = cast(float4) __simd_ib(XMM.PSRLQ, a, 0x7A);
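For an example with an observable result, the sketch below uses PSLLD with an immediate of 1 to shift each 32-bit lane left by one bit, which doubles every element of an int4. The helper name doubleLanes is hypothetical. The version (D_SIMD) guard is an assumption that __simd_ib is only available where that version identifier is set.

```d
import core.simd;

version (D_SIMD)
{
    /// Hypothetical helper: doubles each 32-bit lane of v.
    int4 doubleLanes(int4 v)
    {
        // The opcode and the shift count (imm8) are both compile-time constants.
        return cast(int4) __simd_ib(XMM.PSLLD, v, 1);
    }
}
```

The cast back to int4 is needed because __simd_ib returns void16.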

pure nothrow @nogc @safe void16 __simd_sto(XMM opcode, void16 op1, void16 op2);

For "store" operations of the form: op1 op= op2 such as MOVLPS.

Returns:

op2

Note: These cannot be marked as pure, as semantic() doesn't check them.

pure nothrow @nogc @safe void16 __simd_sto(XMM opcode, double op1, void16 op2);

pure nothrow @nogc @safe void16 __simd_sto(XMM opcode, float op1, void16 op2);

pure nothrow @nogc @safe void16 __simd_sto(XMM opcode, void16 op1, long op2);

Examples:

void16 a;
float f = 1;
double d = 1;

cast(void)__simd_sto(XMM.STOUPS, a, a);
cast(void)__simd_sto(XMM.STOUPS, f, a);
cast(void)__simd_sto(XMM.STOUPS, d, a);

void prefetch(bool writeFetch, ubyte locality)(const(void)* address);

Emit prefetch instruction.

Parameters:

const(void)* `address`	address to be prefetched
writeFetch	true for write fetch, false for read fetch
locality	0..3 (0 meaning least local, 3 meaning most local)

Note: The Intel mappings are:

writeFetch	locality	Instruction
false	0	prefetchnta
false	1	prefetcht2
false	2	prefetcht1
false	3	prefetcht0
true	0	prefetchw
true	1	prefetchw
true	2	prefetchw
true	3	prefetchw
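The following sketch shows one way prefetch might be called: a read fetch with the highest locality, issued a fixed distance ahead of the element being summed. The function name sumWithPrefetch and the look-ahead distance of 64 elements are hypothetical choices for illustration, not tuned values. The version (D_SIMD) guard is an assumption about where prefetch is available.

```d
import core.simd;

/// Hypothetical helper: sums a slice while hinting upcoming data into cache.
long sumWithPrefetch(const(int)[] data)
{
    enum size_t lookAhead = 64; // illustrative distance, not a tuned value
    long total = 0;
    foreach (i, x; data)
    {
        version (D_SIMD)
        {
            // writeFetch = false (read), locality = 3 (most local).
            if (i + lookAhead < data.length)
                prefetch!(false, 3)(&data[i + lookAhead]);
        }
        total += x;
    }
    return total;
}
```

Both writeFetch and locality are template arguments, so they must be known at compile time. Only the address is a run-time value.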

V loadUnaligned(V)(const V* p) if (is(V == void16) || is(V == byte16) || is(V == ubyte16) || is(V == short8) || is(V == ushort8) || is(V == int4) || is(V == uint4) || is(V == long2) || is(V == ulong2) || is(V == double2) || is(V == float4));

Load unaligned vector from address. This is a compiler intrinsic.

Parameters:

V* `p`	pointer to vector

Returns:

vector

V storeUnaligned(V)(V* p, V value) if (is(V == void16) || is(V == byte16) || is(V == ubyte16) || is(V == short8) || is(V == ushort8) || is(V == int4) || is(V == uint4) || is(V == long2) || is(V == ulong2) || is(V == double2) || is(V == float4));

Store vector to unaligned address. This is a compiler intrinsic.

Parameters:

V* `p`	pointer to vector
V `value`	value to store

Returns:

value
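A minimal sketch of the two intrinsics used together: load four floats from an address that need not be 16-byte aligned, double them, and store them back. The helper name doubleFour is hypothetical. The version (D_SIMD) guard is an assumption about where these functions are available. The caller is responsible for making sure four floats are readable and writable at p.

```d
import core.simd;

version (D_SIMD)
{
    /// Hypothetical helper: doubles the four floats starting at p.
    /// p may have any alignment valid for float.
    void doubleFour(float* p)
    {
        // Reinterpret the address as a vector pointer; no alignment is assumed.
        float4 v = loadUnaligned(cast(float4*) p);
        v = v + v;
        // storeUnaligned returns the stored value, which is not needed here.
        cast(void) storeUnaligned(cast(float4*) p, v);
    }
}
```

A typical call passes the address of an interior array element, for example doubleFour(&buf[1]) for a float[6] buf.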
