core.simd

Builtin SIMD intrinsics

Source core/simd.d

Authors: Walter Bright
template Vector(T)
Create a vector type.

Parameters:
T = one of double[2], float[4], void[16], byte[16], ubyte[16], short[8], ushort[8], int[4], uint[4], long[2], ulong[2]. For 256-bit vectors, one of double[4], float[8], void[32], byte[32], ubyte[32], short[16], ushort[16], int[8], uint[8], long[4], ulong[4].
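
A minimal usage sketch (not part of the original documentation; assumes an x86-64 target where 128-bit vectors are supported):

import core.simd;
import core.stdc.stdio;

void main()
{
    Vector!(float[4]) v = [1.0f, 2.0f, 3.0f, 4.0f]; // same type as the float4 alias below
    v += v;                                         // element-wise addition
    printf("%g %g %g %g\n", v.array[0], v.array[1], v.array[2], v.array[3]);
}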

alias void16 = __vector(void[16]);
alias double2 = __vector(double[2]);
alias float4 = __vector(float[4]);
alias byte16 = __vector(byte[16]);
alias ubyte16 = __vector(ubyte[16]);
alias short8 = __vector(short[8]);
alias ushort8 = __vector(ushort[8]);
alias int4 = __vector(int[4]);
alias uint4 = __vector(uint[4]);
alias long2 = __vector(long[2]);
alias ulong2 = __vector(ulong[2]);
enum XMM: int;
XMM opcodes that conform to the following:
opcode xmm1,xmm2/mem
and do not have side effects (i.e. do not write to memory).
STOSS
MOVSS xmm1/m32, xmm2
STOSD
MOVSD xmm1/m64, xmm2
STOAPS
MOVAPS xmm2/m128, xmm1
STOAPD
MOVAPD xmm2/m128, xmm1
STODQA
MOVDQA xmm2/m128, xmm1
STOD
MOVD reg/mem64, xmm 66 0F 7E /r
STOQ
MOVQ xmm2/m64, xmm1
LODSS
MOVSS xmm1, xmm2/m32
LODSD
MOVSD xmm1, xmm2/m64
LODAPS
MOVAPS xmm1, xmm2/m128
LODAPD
MOVAPD xmm1, xmm2/m128
LODDQA
MOVDQA xmm1, xmm2/m128
LODD
MOVD xmm, reg/mem64 66 0F 6E /r
LODQ
MOVQ xmm1, xmm2/m64
LODDQU
MOVDQU xmm1, xmm2/mem128 F3 0F 6F /r
STODQU
MOVDQU xmm1/mem128, xmm2 F3 0F 7F /r
MOVDQ2Q
MOVDQ2Q mmx, xmm F2 0F D6 /r
MOVHLPS
MOVHLPS xmm1, xmm2 0F 12 /r
LODHPD
MOVHPD xmm1, m64
STOHPD
MOVHPD mem64, xmm1 66 0F 17 /r
LODHPS
MOVHPS xmm1, m64
STOHPS
MOVHPS m64, xmm1
MOVLHPS
MOVLHPS xmm1, xmm2
LODLPD
MOVLPD xmm1, m64
STOLPD
MOVLPD m64, xmm1
LODLPS
MOVLPS xmm1, m64
STOLPS
MOVLPS m64, xmm1
MOVMSKPD
MOVMSKPD reg, xmm
MOVMSKPS
MOVMSKPS reg, xmm
MOVNTDQ
MOVNTDQ m128, xmm1
MOVNTI
MOVNTI m32, r32
MOVNTPD
MOVNTPD m128, xmm1
MOVNTPS
MOVNTPS m128, xmm1
MOVNTQ
MOVNTQ m64, mm
MOVQ2DQ
MOVQ2DQ xmm, mmx F3 0F D6 /r
LODUPD
MOVUPD xmm1, xmm2/m128
STOUPD
MOVUPD xmm2/m128, xmm1
LODUPS
MOVUPS xmm1, xmm2/m128
STOUPS
MOVUPS xmm2/m128, xmm1
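
As an illustration of the LOD/STO naming used above (not part of the original documentation; __simd and __simd_sto are declared below), LODUPS and STOUPS are the load and store forms of MOVUPS, so together they copy 16 unaligned bytes:

import core.simd;

void copy16(float* src, float* dst)
{
    void16 v = __simd(XMM.LODUPS, *cast(void16*)src);        // MOVUPS xmm, m128
    cast(void) __simd_sto(XMM.STOUPS, *cast(void16*)dst, v); // MOVUPS m128, xmm
}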
pure nothrow @nogc @safe void16 __simd(XMM opcode, void16 op1, void16 op2);
Generate two operand instruction with XMM 128 bit operands.
This is a compiler magic function; it does not behave like regular D functions.

Parameters:
opcode = any of the XMM opcodes; it must be a compile time constant
op1 = first operand
op2 = second operand

Returns:
result of opcode

Example

import core.simd;
import core.stdc.stdio;

void main()
{
    float4 A = [2.34f, -70000.0f, 0.00001f, 345.5f];
    float4 R = A;
    R = cast(float4) __simd(XMM.RCPSS, R, A);
    printf("%g %g %g %g\n", R.array[0], R.array[1], R.array[2], R.array[3]);
}
Prints 0.427368 -70000 1e-05 345.5. The two-operand form is necessary for XMM.RCPSS because the result of the instruction contains elements of both operands.

Example

double[2] A = [56.0, -75.0];
double2 R = cast(double2) __simd(XMM.LODUPD, *cast(double2*)A.ptr);
The cast to double2* is necessary because the type of *A.ptr is double.

Examples:
float4 a;
a = cast(float4)__simd(XMM.PXOR, a, a);
pure nothrow @nogc @safe void16 __simd(XMM opcode, void16 op1);
Unary SIMD instructions.
pure nothrow @nogc @safe void16 __simd(XMM opcode, double d);
pure nothrow @nogc @safe void16 __simd(XMM opcode, float f);
Examples:
float4 a;
a = cast(float4)__simd(XMM.LODSS, a);
pure nothrow @nogc @safe void16 __simd(XMM opcode, void16 op1, void16 op2, ubyte imm8);
For instructions: CMPPD, CMPSS, CMPSD, CMPPS, PSHUFD, PSHUFHW, PSHUFLW, BLENDPD, BLENDPS, DPPD, DPPS, MPSADBW, PBLENDW, ROUNDPD, ROUNDPS, ROUNDSD, ROUNDSS

Parameters:
opcode = any of the above XMM opcodes; it must be a compile time constant
op1 = first operand
op2 = second operand
imm8 = third operand; must be a compile time constant

Returns:
result of opcode
Examples:
float4 a;
a = cast(float4)__simd(XMM.CMPPD, a, a, 0x7A);
pure nothrow @nogc @safe void16 __simd_ib(XMM opcode, void16 op1, ubyte imm8);
For instructions with the imm8 version: PSLLD, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLD, PSRLQ, PSRLW, PSRLDQ, PSLLDQ

Parameters:
opcode = any of the XMM opcodes; it must be a compile time constant
op1 = first operand
imm8 = second operand; must be a compile time constant

Returns:
result of opcode
Examples:
float4 a;
a = cast(float4) __simd_ib(XMM.PSRLQ, a, 0x7A);
pure nothrow @nogc @safe void16 __simd_sto(XMM opcode, void16 op1, void16 op2);
For "store" operations of the form: op1 op= op2 such as MOVLPS.
Returns:
op2
Note: these cannot be marked as pure, as semantic() doesn't check them.
pure nothrow @nogc @safe void16 __simd_sto(XMM opcode, double op1, void16 op2);
pure nothrow @nogc @safe void16 __simd_sto(XMM opcode, float op1, void16 op2);
pure nothrow @nogc @safe void16 __simd_sto(XMM opcode, void16 op1, long op2);
Examples:
void16 a;
float f = 1;
double d = 1;

cast(void)__simd_sto(XMM.STOUPS, a, a);
cast(void)__simd_sto(XMM.STOUPS, f, a);
cast(void)__simd_sto(XMM.STOUPS, d, a);
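
As an additional sketch (not from the original documentation), the store form can write a vector to memory through a dereferenced pointer; this is roughly how storeUnaligned below is implemented on DMD:

import core.simd;

void main()
{
    float[4] buf = 0;
    float4 v = [1.0f, 2.0f, 3.0f, 4.0f];
    // op1 op= op2: STOUPS stores v, unaligned, into the memory designated by op1
    cast(void) __simd_sto(XMM.STOUPS, *cast(void16*)buf.ptr, cast(void16)v);
    assert(buf[0] == 1.0f && buf[3] == 4.0f);
}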
void prefetch(bool writeFetch, ubyte locality)(const(void)* address);
Emit prefetch instruction.
Parameters:
const(void)* address    address to be prefetched
bool writeFetch         true for write fetch, false for read fetch
ubyte locality          0..3 (0 meaning least local, 3 meaning most local)

Note: The Intel mappings are:

writeFetch  locality  Instruction
false       0         prefetchnta
false       1         prefetcht2
false       2         prefetcht1
false       3         prefetcht0
true        0         prefetchw
true        1         prefetchw
true        2         prefetchw
true        3         prefetchw
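
A usage sketch (not part of the original documentation; the 16-element lookahead is illustrative only):

import core.simd;

float sum(const(float)[] data)
{
    float s = 0;
    foreach (i; 0 .. data.length)
    {
        if (i + 16 < data.length)
            prefetch!(false, 3)(data.ptr + i + 16); // read fetch, most local (prefetcht0)
        s += data[i];
    }
    return s;
}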

V loadUnaligned(V)(const V* p)
if (is(V == void16) || is(V == byte16) || is(V == ubyte16) || is(V == short8) || is(V == ushort8) || is(V == int4) || is(V == uint4) || is(V == long2) || is(V == ulong2) || is(V == double2) || is(V == float4));
Load unaligned vector from address. This is a compiler intrinsic.
Parameters:
V* p pointer to vector
Returns:
vector
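
A brief sketch (not part of the original documentation):

import core.simd;
import core.stdc.stdio;

void main()
{
    float[8] buf = [0, 1, 2, 3, 4, 5, 6, 7];
    // buf.ptr + 1 is only 4-byte aligned; a plain vector load would require 16
    float4 v = loadUnaligned(cast(const(float4)*)(buf.ptr + 1));
    printf("%g %g %g %g\n", v.array[0], v.array[1], v.array[2], v.array[3]);
}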
V storeUnaligned(V)(V* p, V value)
if (is(V == void16) || is(V == byte16) || is(V == ubyte16) || is(V == short8) || is(V == ushort8) || is(V == int4) || is(V == uint4) || is(V == long2) || is(V == ulong2) || is(V == double2) || is(V == float4));
Store vector to unaligned address. This is a compiler intrinsic.
Parameters:
V* p pointer to vector
V value value to store
Returns:
value
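
And the matching sketch for the store direction (not part of the original documentation):

import core.simd;

void main()
{
    float[8] buf = 0;
    float4 v = [1.0f, 2.0f, 3.0f, 4.0f];
    // store to an address that is 4-byte aligned but not 16-byte aligned
    storeUnaligned(cast(float4*)(buf.ptr + 1), v);
    assert(buf[1] == 1.0f && buf[4] == 4.0f);
}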