GCC optimizes large switch statements using SSE instructions, so we
could clobber the user's floating-point registers when they make a
syscall.
Reproducer:
```
#include <stdio.h>
#include <stdlib.h>

/* Gives access to the raw bits of the float. */
union num {
    float f;
    unsigned long long i;
};

#define WORKSIZE (1024 * 1024 * 32)

int main(int argc, char **argv) {
    char *work = malloc(WORKSIZE);
    char *fromaddr;
    char sink;
    union num r;
    unsigned long long int offset;

    r.i = 0; /* zero the bytes that the float member does not cover */
    r.f = drand48();
    printf("r: %llx\n", r.i);
    /* floating-point computation after syscalls (malloc, printf) */
    offset = (long long int)(r.f * (double)WORKSIZE);
    fromaddr = work + offset;
    printf("%e %llx %p\n", r.f, offset, (void *)fromaddr);
    sink = *fromaddr;
    return 0;
}
```
Change-Id: I7bb0883ec8ef2f245ab98064e308025422afc115
- use $(MAKE) instead of a bare make in recipes
- add + in front of recipe lines that spawn a long-running make process
in a subshell. (This would not be needed with $(MAKE) -C .. target, but
our makefiles cannot handle that because they use $(PWD).)
- split the main 'all' rule, as all 4 targets are independent
- fix dependencies where appropriate for parallelism
Extra, not speed-related changes:
- remove the double colon from some targets that do not need it
This cuts build time from 5s to 1.5s on a laptop with -j4, and more
importantly from 85s to 35s on a KNL node.
As a bonus, the fixed dependencies remove the need to clean before
every rebuild. Probably.
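
For illustration, the changes listed above could look roughly like this
in a top-level Makefile. This is only a sketch: the directory names and
the tools-on-lib dependency are placeholders, not taken from the actual
tree, and recipe lines must start with a tab.

```make
# Hypothetical subdirectories; the real target names are not given above.
SUBDIRS := kernel lib tools tests

.PHONY: all clean $(SUBDIRS)

# One prerequisite per independent target, so `make -j4` can build them at once.
all: $(SUBDIRS)

# An explicit dependency (assumed here) keeps the ordering correct under -j.
tools: lib

# $(MAKE) instead of a bare `make` lets the sub-make join the parent's jobserver.
# The leading + forces make to treat the subshell line as a recursive invocation.
$(SUBDIRS):
	+(cd $@ && $(MAKE))

# Single colon: this target has only one rule, so `::` is unnecessary.
clean:
	+for d in $(SUBDIRS); do (cd $$d && $(MAKE) clean) || exit 1; done
```

With a layout like this, `make -j4 all` builds kernel, lib and tests
concurrently, and starts tools once lib is done.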