Friday, January 30, 2015

Function Pointers and Callbacks in C

Function pointers are among the most powerful tools in C, but are a bit of a pain during the initial stages of learning. This article demonstrates the basics of function pointers, and how to use them to implement function callbacks in C. C++ takes a slightly different route for callbacks, which is another journey altogether.
A pointer is a special kind of variable that holds the address of another variable. The same concept applies to function pointers, except that instead of pointing to variables, they point to functions. If you declare an array, say, int a[10]; then the array name a will in most contexts (in an expression or passed as a function parameter) “decay” to a non-modifiable pointer to its first element (even though pointers and arrays are not equivalent while declaring/defining them, or when used as operands of the sizeof operator). In the same way, for int func();, func decays to a non-modifiable pointer to a function. You can think of func as a const pointer for the time being.
But can we declare a non-constant pointer to a function? Yes, we can — just like we declare a non-constant pointer to a variable:
int (*ptrFunc) ();
Here, ptrFunc is a pointer to a function that returns an integer. Before C23, empty parentheses leave the parameters unspecified; write int (*ptrFunc)(void); to state explicitly that the function takes no arguments. Do not forget the parentheses around *ptrFunc. Without them, the compiler treats ptrFunc as an ordinary function name, for a function that returns a pointer to an integer.
Let’s try some code. Check out the following simple program:
#include<stdio.h>
 
/* function prototype */
int func(int, int);
int main(void)
{
    int result;
    /* calling a function named func */
    result = func(10,20);       
    printf("result = %d\n",result);
    return 0;
}
 
/* func definition goes here */
int func(int x, int y)             
{
    return x+y;
}
As expected, when we compile it with gcc -g -o example1 example1.c and invoke it with ./example1, the output is as follows:
result = 30
The above program calls func() the simple way. Let’s modify the program to call it through a pointer to a function. Here’s the changed program:
#include<stdio.h>
int func(int, int);
int main(void)
{
    int result1,result2;
    /* declaring a pointer to a function which takes
       two int arguments and returns an integer as result */
    int (*ptrFunc)(int,int);
 
    /* assigning func's address to ptrFunc */
    ptrFunc=func;
 
    /* calling func() through explicit dereference */
    result1 = (*ptrFunc)(10,20);
 
    /* calling func() through implicit dereference */         
    result2 = ptrFunc(10,20);               
    printf("result1 = %d result2 = %d\n",result1,result2);
    return 0;
}
 
int func(int x, int y)
{
    return x+y;
}
The output has no surprises:
result1 = 30 result2 = 30

A simple callback function

At this stage, we have enough knowledge to deal with function callbacks. According to Wikipedia, “In computer programming, a callback is a reference to executable code, or a piece of executable code, that is passed as an argument to other code. This allows a lower-level software layer to call a subroutine (or function) defined in a higher-level layer.”
Let’s try one simple program to demonstrate this. The complete program has three files: callback.c, reg_callback.h and reg_callback.c.
/* callback.c */
#include<stdio.h>
#include"reg_callback.h"
 
/* callback function definition goes here */
void my_callback(void)
{
    printf("inside my_callback\n");
}
 
int main(void)
{
    /* initialize function pointer to
    my_callback */
    callback ptr_my_callback=my_callback;                           
    printf("This is a program demonstrating function callback\n");
    /* register our callback function */
    register_callback(ptr_my_callback);                             
    printf("back inside main program\n");
    return 0;
}
/* reg_callback.h */
typedef void (*callback)(void);
void register_callback(callback ptr_reg_callback);
/* reg_callback.c */
#include<stdio.h>
#include"reg_callback.h"
 
/* registration goes here */
void register_callback(callback ptr_reg_callback)
{
    printf("inside register_callback\n");
    /* calling our callback function my_callback */
    (*ptr_reg_callback)();                                  
}
Compile, link and run the program with gcc -Wall -o callback callback.c reg_callback.c and ./callback:
This is a program demonstrating function callback
inside register_callback
inside my_callback
back inside main program
The code needs a little explanation. Suppose a callback function has to do some useful work (error handling, last-minute clean-up before exiting, etc.) after an event occurs in another part of the program. The first step is to register the callback, which simply means passing a function pointer as an argument to the function (here, register_callback) that will call it.
We could have written the above code in a single file, but the callback and the function that invokes it are kept in separate files to simulate real-life cases, where the callback lives in the top layer and the function that invokes it lives in a lower layer. The program flow is shown in Figure 1.
Figure 1: Program flow
The higher layer function calls a lower layer function as a normal call and the callback mechanism allows the lower layer function to call the higher layer function through a pointer to a callback function.
This is exactly what the Wikipedia definition states.

Use of callback functions

One use of callback mechanisms can be seen here:
/* This code catches the alarm signal generated by the kernel
   asynchronously */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
 
struct sigaction act;
 
/* signal handler definition goes here */
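void sig_handler(int signo, siginfo_t *si, void *ucontext)
{
   /* si and ucontext are not needed in this demonstration */
   (void)si;
   (void)ucontext;
   /* note: printf() is not async-signal-safe; it is used here
      only to keep the example short */
   printf("Got alarm signal %d\n",signo);
   /* do the required stuff here */
}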
 
int main(void)
{
    act.sa_sigaction = sig_handler;
    act.sa_flags = SA_SIGINFO;
 
    /* register signal handler */
    sigaction(SIGALRM, &act, NULL);  
    /* set the alarm for 10 sec */       
    alarm(10);   
    /* wait for any signal from kernel */                                        
    pause();  
    /* after signal handler execution */                                             
    printf("back to main\n");                     
    return 0;
}
Signals are software interrupts delivered by the kernel, and are very useful for handling asynchronous events. A signal-handling function is registered with the kernel, and is invoked asynchronously from the rest of the program when the signal is delivered to the user process. Figure 2 represents this flow.
Figure 2: Kernel callback
Callback functions can also be used to create a library that is called from an upper-layer program and that, in turn, calls user-defined code when some event occurs. The following source code (insertion_main.c, insertion_sort.c and insertion_sort.h) uses this mechanism to implement a trivial insertion sort library. Because the comparison is a callback, users can pass in any comparison function they want.
/* insertion_sort.h */
 
typedef int (*callback)(int, int);
void insertion_sort(int *array, int n, callback comparison);
/* insertion_main.c */
 
#include<stdio.h>
#include<stdlib.h>
#include"insertion_sort.h"
 
int ascending(int a, int b)
{
    return a > b;
}
 
int descending(int a, int b)
{
    return a < b;
}
 
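int even_first(int a, int b)
{
    /* one possible rule: an odd value must move behind an even one */
    return (a % 2 != 0) && (b % 2 == 0);
}
 
int odd_first(int a, int b)
{
    /* one possible rule: an even value must move behind an odd one */
    return (a % 2 == 0) && (b % 2 != 0);
}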
 
int main(void)
{
    int i;
    int choice = 0;    /* stays 0 ("no such option") if scanf fails */
    int array[10] = {22,66,55,11,99,33,44,77,88,0};
    printf("ascending 1: descending 2: even_first 3: odd_first 4: quit 5\n");
    printf("enter your choice = ");
    scanf("%d",&choice);
    switch(choice)
    {
        case 1:
            insertion_sort(array,10, ascending);
            break;
        case 2:
            insertion_sort(array,10, descending);
            break;
        case 3:
            insertion_sort(array,10, even_first);
            break;
        case 4:
            insertion_sort(array,10, odd_first);
            break;
        case 5:
            exit(0);
        default:
            printf("no such option\n");
    }
 
    printf("after insertion_sort\n");
    for(i=0;i<10;i++)
        printf("%d\t", array[i]);
    printf("\n");
    return 0;
}
19
/* insertion_sort.c */
 
#include"insertion_sort.h"
 
void insertion_sort(int *array, int n, callback comparison)
{
    int i, j, key;
    for(j=1; j<=n-1;j++)
    {
        key=array[j];
        i=j-1;
        while(i >=0 && comparison(array[i], key))
        {
            array[i+1]=array[i];
            i=i-1;
        }
        array[i+1]=key;
    }
}

Monday, January 19, 2015

Memory optimizing for embedded system products

Optimization is important to embedded software developers because they always face limited resources, so being able to control the trade-off between code size and speed is critical. Less thought is usually given to the optimization of data, where there can be a similar speed-versus-size tension. This article looks at how this conflict comes about and what the developer can do about it.

A key difference between embedded and desktop system programming is variability: every Windows PC is essentially the same, whereas every embedded system is different. This variability has a number of implications: tools need to be more sophisticated and flexible; programmers need to be ready to accommodate the specific requirements of their system; standard programming languages are mostly non-ideal for the job. This last point leads to a key issue: control of optimization.

Optimization is a set of processes and algorithms that enable a compiler to advance from translating code from (say) C into assembly language to translating an algorithm expressed in C into a functionally identical one expressed in assembly. This is a subtle but important difference.

Data/memory optimization
A key aspect of optimization is memory utilization. Typically, a decision has to be made in the trade-off between having fast code or small code; it is rare to have the best of both worlds. This decision also applies to data. The way data is stored in memory affects its access time. With a 32-bit CPU, if everything is aligned with word boundaries, access time is fast; this is termed ‘unpacked’ data. Alternatively, if bytes of data are stored as compactly as possible, it may take more effort to retrieve data and hence the access time is slower; this is ‘packed’ data. So you have a choice much the same as with code: compact data that is slow to access, or some wasted memory but fast access to data.

For example, this structure:

   struct
   {
      short two_byte;
      char one_byte;
   } my_array[4];


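could be mapped into memory in a number of ways. The C standard fixes the order of the members but leaves any padding between and after them to the compiler. Two possibilities are packed, where each element occupies three consecutive bytes with no padding (12 bytes for the whole array), or unpacked, where one padding byte follows each one_byte member so that every two_byte member starts on a 16-bit boundary (16 bytes for the whole array).

Unpacked could be even more wasteful. The unpacked layout just described uses word (16-bit) alignment. If every member were instead placed on a long word (32-bit) boundary, 5 bytes would be wasted for every 3 bytes of data!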

Most embedded compilers have a switch to select what kind of code generation and optimization is required. However, there may be a situation where you decide to have all your data unpacked for speed, but have certain data structures where you would rather save memory by packing. In this case, a language extension keyword such as packed may be applied (this is not standard C, and the exact spelling varies between compilers), thus:

   packed struct
   {
      short two_byte;
      char one_byte;
   } my_array[4];


This overrides the optimization setting for this one object.

Alternatively, you may need to pack all the data to save memory, and have certain items that you want unpacked either for speed or for sharing with other software. This is where the unpacked extension keyword applies.

It is unlikely that you would use both packed and unpacked keywords in one program, as only one of the two code generation options can be active at any one time.

Other data optimizations
Space optimization. As previously discussed, modern embedded compilers provide the opportunity to minimize the space used by data objects; this may be controlled quite well by the developer. However, this optimization is only to the level of bytes, which might not be good enough.

For example, imagine an application that uses a large table of values, each of which is in the range 0 to 15. Clearly this requires 4 bits of storage (a nibble), so keeping them in bytes would only be 50% efficient. It is the developer’s job to do better (if memory footprint is deemed to be of greater importance than access time). There are broadly two ways to address this problem.

One way is to use bit fields in structures. This has the advantage that a compiler can readily optimize memory usage, if the target CPU offers a convenient capability. The downside is that bit fields within a structure cannot be indexed without writing additional code, but this is not too difficult. The following code shows how to access nibbles in an array of structures:

   struct nibbles
   {
      unsigned n0 : 4;
      unsigned n1 : 4;
      unsigned n2 : 4;
      unsigned n3 : 4;
   } mydata[100];

   unsigned get_nibble(struct nibbles words[], unsigned index)
   {
      unsigned nibble;

      nibble = index % 4;
      index /= 4;
      switch (nibble)
      {
      case 0:
         return words[index].n0;
      case 1:
         return words[index].n1;
      case 2:
         return words[index].n2;
      default: /* nibble == 3 */
         return words[index].n3;
      }
   }


A similar put_nibble() function would be required, of course.

The other way to code a solution would be to perform all the bit shifting explicitly in the code, which is really just emulating what the compiler might generate. It is unlikely that a human programmer could produce code substantially more efficient than a modern compiler.

Speed optimization. There is little a developer can do to improve speed of access to data beyond the optimization that the compiler does (i.e., not packing the data for fast access). But one option is to locate data in the fastest available memory. An embedded toolchain includes a linker, which will normally have the flexibility to effect this optimization. This opens up a few possibilities for consideration:

The fastest place to keep data is in a CPU register, but these are in short supply and should be used sparingly. Most compilers make smart choices for register optimization.

RAM is the fastest type of memory in most systems. Obviously, variables tend to be located in RAM, but it may be worthwhile to ensure that constant data is copied into RAM as well. On systems where the program image is copied from flash to RAM for execution, this happens automatically; where code runs directly from flash, the copy has to be arranged explicitly.

Microcontrollers typically have on-chip RAM, which is faster than external memory. So ensuring that speed-critical data is located there makes sense.

Memory is commonly cached into an internal buffer for fast access. Some CPUs permit locking of a cache so that the contents are always immediately available.