gcc - SSE2 instructions not working in inline assembly with C++


I have a function that uses SSE2 to add values. It is supposed to add lhs and rhs and store the result back in lhs:

    template<typename t>
    void simdadd(t *lhs, t *rhs)
    {
        asm volatile("movups %0,%%xmm0"::"m"(lhs));
        asm volatile("movups %0,%%xmm1"::"m"(rhs));

        switch(sizeof(t))
        {
            case sizeof(uint8_t):
            asm volatile("paddb %%xmm0,%%xmm1":);
            break;

            case sizeof(uint16_t):
            asm volatile("paddw %%xmm0,%%xmm1":);
            break;

            case sizeof(float):
            asm volatile("addps %%xmm0,%%xmm1":);
            break;

            case sizeof(double):
            asm volatile("addpd %%xmm0,%%xmm1":);
            break;

            default:
            std::cout<<"error"<<std::endl;
            break;
        }

        asm volatile("movups %%xmm0,%0":"=m"(lhs));
    }

The code that uses the function looks like this:

    float *values=new float[4];
    float *values2=new float[4];

    values[0]=1.0f;
    values[1]=2.0f;
    values[2]=3.0f;
    values[3]=4.0f;

    values2[0]=1.0f;
    values2[1]=2.0f;
    values2[2]=3.0f;
    values2[3]=4.0f;

    simdadd(values,values2);
    for(uint32_t count=0;count<4;count++)
        std::cout<<values[count]<<std::endl;

However, it isn't working: when the code runs, it outputs 1, 2, 3, 4 instead of 2, 4, 6, 8.

I've found that inline assembly support isn't reliable in modern compilers (as in, the implementations are just plain buggy). You are generally better off using compiler intrinsics, which are declarations that look like C functions but compile to a specific opcode.

Intrinsics let you specify an exact sequence of opcodes but leave the register coloring to the compiler. That makes them much more reliable than trying to move data between C variables and asm registers, which is where inline assemblers have always fallen down for me. It also lets the compiler schedule your instructions, which can give better performance if it works around pipeline hazards. That is, in this case you could do:

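    #include <xmmintrin.h>  // SSE intrinsics: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps

    void simdadd(float *lhs, float *rhs)
    {
        // Load 4 floats from each pointer, add them, and store the sum back to lhs.
        _mm_storeu_ps(lhs, _mm_add_ps(_mm_loadu_ps(lhs), _mm_loadu_ps(rhs)));
    }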
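As a slightly fuller sketch of the same idea, the size-based switch from the question could be replaced by one overload per element type. The name `simd_add` and the set of overloads are assumptions made for illustration; only the intrinsics themselves (from `<emmintrin.h>`) are real. Each call processes exactly 16 bytes, so both pointers must have at least that many valid bytes behind them.

```cpp
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics; also provides the SSE float intrinsics

// Hypothetical helper: adds 16 bytes' worth of elements from rhs into lhs.
// The element type picks the overload, so no runtime switch is needed.
inline void simd_add(float *lhs, const float *rhs) {
    _mm_storeu_ps(lhs, _mm_add_ps(_mm_loadu_ps(lhs), _mm_loadu_ps(rhs)));
}

inline void simd_add(double *lhs, const double *rhs) {
    _mm_storeu_pd(lhs, _mm_add_pd(_mm_loadu_pd(lhs), _mm_loadu_pd(rhs)));
}

inline void simd_add(std::uint8_t *lhs, const std::uint8_t *rhs) {
    __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i *>(lhs));
    __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i *>(rhs));
    // paddb: sixteen 8-bit lanes, wrapping on overflow
    _mm_storeu_si128(reinterpret_cast<__m128i *>(lhs), _mm_add_epi8(a, b));
}

inline void simd_add(std::uint16_t *lhs, const std::uint16_t *rhs) {
    __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i *>(lhs));
    __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i *>(rhs));
    // paddw: eight 16-bit lanes, wrapping on overflow
    _mm_storeu_si128(reinterpret_cast<__m128i *>(lhs), _mm_add_epi16(a, b));
}
```

Calling `simd_add(values, values2)` on the two arrays from the question should leave 2, 4, 6, 8 in `values`.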

In your case, anyway, you have two problems:

  1. The terrible GCC inline assembly syntax creates great confusion about the difference between pointers and values. Use *lhs and *rhs instead of just lhs and rhs; apparently the "=m" syntax means "implicitly use a pointer to this thing I'm passing you instead of the thing itself."
  2. GCC uses a source,destination operand order: addps stores its result in the second parameter, so you need to output xmm1, not xmm0.

I've put a fixed example on codepad (to avoid cluttering this answer, and to demonstrate that it works).
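For reference, here is a minimal sketch of what applying both fixes might look like for the float case. It is an independent illustration, not the codepad code. The name `simd_add_asm` is hypothetical, and folding everything into a single asm statement with a clobber list is an additional assumption on my part, made so the compiler cannot reuse xmm0 or xmm1 between separate statements. It assumes GCC or Clang on an x86 target with SSE.

```cpp
// Hypothetical float-only version of the question's function with both fixes applied.
inline void simd_add_asm(float *lhs, const float *rhs) {
    asm volatile(
        "movups %0, %%xmm0\n\t"      // load 4 floats starting at *lhs
        "movups %1, %%xmm1\n\t"      // load 4 floats starting at *rhs
        "addps  %%xmm0, %%xmm1\n\t"  // source,destination order: xmm1 = xmm1 + xmm0
        "movups %%xmm1, %0"          // store the result register (xmm1) back to *lhs
        : "+m"(*lhs)                 // fix 1: pass the pointee, not the pointer variable
        : "m"(*rhs)
        : "xmm0", "xmm1", "memory"); // "memory" covers the bytes beyond the first float
}
```

The "m" operands only describe the first float behind each pointer, which is why the sketch adds the "memory" clobber: it tells the compiler that the full 16 bytes may be read and written.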

