Will template function typedef specifier be properly inlined when creating each instance of template function?

user2895 Published on April 25, 2018, 6:04 pm

I have written a function that operates on several streams of data at the same time and writes its result to a destination stream. A huge amount of time has gone into optimizing the performance of this function (OpenMP, intrinsics, etc.), and it performs beautifully. There is a lot of math involved, so needless to say it is a very long function.

Now I want to reuse the same function with the math code replaced for each variant, without writing out each version of the function by hand. I want to differentiate between the variants using only #defines or inlined functions (the code has to be inlined in each version).

I went for templates, but as far as I can tell templates allow only type specifiers, and I realized that #defines can't be used here. The remaining solution would be inlined math functions, so the simplified idea is to create a header like this:


#pragma once

typedef struct ALM_DATA
{
  int l, t, r, b;
  int scan;
  BYTE* data;
} ALM_DATA;

typedef BYTE (*MATH_FX)(BYTE&, BYTE&);
// etc

inline BYTE math_a1(BYTE& A, BYTE& B){ return ((BYTE)((B > A) ? B:A)); }
inline BYTE math_a2(BYTE& A, BYTE& B){ return ((BYTE)(255 - ((long)((long)(255 - A) * (255 - B)) >> 8))); }
inline BYTE math_a3(BYTE& A, BYTE& B){ return ((BYTE)((B < 128)?(2*(((long)A>>1)+64))*((float)B/255):(255-(2*(255-(((long)A>>1)+64))*(float)(255-B)/255)))); }
// etc

template <typename MATH>
inline int const template_math_av (MATH math, ALM_DATA& a, ALM_DATA& b)
{
  // ultra simplified version of very complex code
  for (int y = a.t; y <= a.b; y++)
  {
    int yoffset = y * a.scan;
    for (int x = a.l; x <= a.r; x++)
    {
      int xoffset = yoffset + x;
      a.data[xoffset] = math(a.data[xoffset], b.data[xoffset]);
    }
  }
  return 0;
}

ALM_API int math_caller(int condition, ALM_DATA& a, ALM_DATA& b);

math_caller is then defined in 'alm_quasimodo.cpp' as follows:

#include "stdafx.h"
#include "alm_quazimodo.h"

ALM_API int math_caller(int condition, ALM_DATA& a, ALM_DATA& b)
{
  switch (condition)
  {
    case 1: return template_math_av<MATH_FX>(math_a1, a, b);
    case 2: return template_math_av<MATH_FX>(math_a2, a, b);
    case 3: return template_math_av<MATH_FX>(math_a3, a, b);
    // etc
  }
  return -1;
}
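
A note on the calls above: MATH is MATH_FX in all three cases, so they share a single instantiation of template_math_av and the math function arrives as an ordinary run-time pointer argument. Whether it gets inlined then depends on the compiler inlining the whole template body into each case. As a minimal sketch of one possible alternative (not taken from the original code), the function can be made a non-type template parameter, so that each operation gets its own instantiation. The names apply_math, demo_apply, math_max and math_screen are made up for illustration, the loop is flattened to a plain buffer, and BYTE is assumed to be unsigned char:

```cpp
#include <cstddef>

typedef unsigned char BYTE;  // assumption: BYTE is unsigned char, as in <windows.h>

typedef BYTE (*MATH_FX)(BYTE&, BYTE&);

// Same formulas as math_a1 and math_a2, renamed for this sketch.
inline BYTE math_max(BYTE& A, BYTE& B) { return (BYTE)((B > A) ? B : A); }
inline BYTE math_screen(BYTE& A, BYTE& B) { return (BYTE)(255 - (((long)(255 - A) * (255 - B)) >> 8)); }

// The operation is a compile-time constant here (a non-type template
// parameter), so every operation produces a separate instantiation.
template <MATH_FX math>
inline int apply_math(BYTE* dst, BYTE* src, std::size_t count)
{
  for (std::size_t i = 0; i < count; i++)
    dst[i] = math(dst[i], src[i]);
  return 0;
}

// Hypothetical dispatcher, playing the role of math_caller.
int demo_apply(int condition, BYTE* dst, BYTE* src, std::size_t count)
{
  switch (condition)
  {
    case 1: return apply_math<math_max>(dst, src, count);
    case 2: return apply_math<math_screen>(dst, src, count);
  }
  return -1;
}
```

Inlining still remains the compiler's decision in either form, so inspecting the generated assembly is the only way to be sure.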

My main concern here is optimization, mainly the inlining of the MATH function code, without breaking the existing optimizations of the original code. And without writing out each instance of the function for a specific math operation, of course ;)

So does this template properly inline all the math functions? And are there any suggestions for how to optimize this function template?
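
For illustration, one commonly used variation is to wrap each operation in a small functor type. MATH is then a distinct type per operation, and the call inside the loop is a direct call to a known function rather than a call through a pointer, which compilers generally find easier to inline. A minimal sketch, assuming BYTE is unsigned char and using a flat buffer instead of ALM_DATA; MathMax, MathScreen, apply_functor and demo_functor are hypothetical names, not part of the original code:

```cpp
typedef unsigned char BYTE;  // assumption: BYTE is unsigned char, as in <windows.h>

// Hypothetical functor with the same formula as math_a1.
struct MathMax
{
  BYTE operator()(BYTE A, BYTE B) const { return (BYTE)((B > A) ? B : A); }
};

// Hypothetical functor with the same formula as math_a2.
struct MathScreen
{
  BYTE operator()(BYTE A, BYTE B) const
  {
    return (BYTE)(255 - (((long)(255 - A) * (255 - B)) >> 8));
  }
};

// MATH is deduced from the argument, so MathMax and MathScreen each
// get their own instantiation of this loop.
template <typename MATH>
inline int apply_functor(MATH math, BYTE* dst, const BYTE* src, int count)
{
  for (int i = 0; i < count; i++)
    dst[i] = math(dst[i], src[i]);
  return 0;
}

// Hypothetical dispatcher, playing the role of math_caller.
int demo_functor(int condition, BYTE* dst, const BYTE* src, int count)
{
  switch (condition)
  {
    case 1: return apply_functor(MathMax(), dst, src, count);
    case 2: return apply_functor(MathScreen(), dst, src, count);
  }
  return -1;
}
```

This only makes inlining more likely; it does not guarantee it, so the generated assembly would still need to be checked.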

If nothing else, thanks for reading this lengthy question.
