Fixed the problem of "return error/goto out" while
locking the memory_range_lock in mbind().
Change-Id: I980a7a440f652b60379acae3cb3575211a749774
Fujitsu: POSTK_DEBUG_TEMP_FIX_100
Fix a problem that does not result in an error even
if MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES are
simultaneously specified in set_mempolicy() mode.
Change-Id: I06e695baf869daee8bc64179748cac27b64e914b
Fujitsu: POSTK_DEBUG_TEMP_FIX_99
Check interrupt enabled state in set_cputime() instead of enabling
them unconditionally on exit.
Change-Id: I99212855f33f5535f67f045665bf5e025c55b690
Fujitsu: POSTK_DEBUG_TEMP_FIX_98
Fixed an issue where errors generated in arch_cpu_read_write_register()
are not transmitted to the caller.
Change-Id: I05d7d872eab834918220cf18f628aee37208a156
Fujitsu: POSTK_DEBUG_TEMP_FIX_94
Since 4.17.0, kernel cannot call syscalls directly because the calling
convention can be different on x86_64, as explained in this email:
https://lore.kernel.org/lkml/20180325162527.GA17492@light.dominikbrodowski.net
Use the ksys_* alternatives instead when possible, or for readlink use
do_readlinkat (and use readlinkat all the time to simplify ifdefs)
It might be possible to change some of these without ifdefs, but for
example ksys_unshare only got introduced in 4.17 so we need to keep some
syscall calling...
Change-Id: Ic47e184b29ef8b21731b2eae6193b0af2548b872
In arm64, glibc-open of /dev/xpmem is hooked in sys_openat. This
commit adds xpmem_openat which is called by sys_openat.
This commit silently applies copy_from_user fix to sys_open as well.
Change-Id: I3b4f7bf0e152c359250bb2b56910db9192390cb1
Fujitsu: POSTK_DEBUG_ARCH_DEP_46, POSTK_DEBUG_ARCH_DEP_62
Since McKernel allocates hugepages by default, we could consider that
madvise call with MADV_HUGEPAGE is supported.
Change-Id: Ibdaa6f77416d029a1d17210773ef79539ba04b1c
envs are stuck after args which are now possibly unaligned, and used
from a non-aligned pointer in prepare_process_ranges_args_envs (env)
The memory immediately after args/envs is copied anyway with memcpy_long,
so make sure the bits are initialized and realign env correctly
Fixes: 70e52faf36 ("flatten_strings: do not return unused trailing bits")
Change-Id: Ic747e947d151c0eea65dec36bc9c888cf6e0c394
Add "-T 0" to mcreboot.sh if you want to turn off time sharing. When
it's turned off, McKernel doesn't activate interval timer when the
length of per-CPU run-queue is larger than one.
Change-Id: I2cedc1b30a9cd9a0f4608a32ecec0a0d58c6225e
it was meant for 3.10 kernels, so the regular < 4.0.0 check
will work for el7 and older kernels as well
Change-Id: I807f030f6303c9c3d17b0d80de55c256a3479486
the libc takes care of trying execve as many times as needed for
execvp, it's not a kernel call.
Also, sneak a double-free fix (desc was not reset properly in case
load_elf_desc_shebang failed)
Fixes: b1681f4a3affff ("mcexec/execve: fix shebangs handling")
Change-Id: If8e3d7ae53acdeffc0331ae8621e0832fcfa406f
Running "mcexec dfsafds" did not print any message in normal use.
Rather than looking for which message shows in debug and turn in into
eprintf, add a single coherent message (more shell-like) at the end and
turn other messages off.
There is a small loss of information but this is equivalent to what
shells give (a single errno value with no details), and it is now easy
to add --debug to mcexec to see more information if required
Change-Id: Id2c3a47880b7d1d7467883351e6e7af561f91bbf
rhel8 is a 4.18 kernel but they've already backported some later fixes.
Instead of relying on the kernel version, the changes removed some defines so
we can check for the define presence to make the code more robust to kernel
version wilderness instead
Change-Id: I6cf5548a7b73a7394405daf850f715a1e20ab0b4
This newer version is much simpler than the old ones:
- the options are noop, this lets the code simplify all the allocating
of a new option struct and passing it around
- ovl_reset_ovl_entry was added and called all the time, but the
mechanism that made this required is gone in this kernel version
On the other hand, one new thing in this version:
- newer kernel check the stacking depth of filesystems now, and we are
reaching the default limit of two with our setup. Bump it to three here.
Also, while we are here, make make fail if requested directory does not
exist, instead of infinitely recurse into make modules in the mcoverlayfs
directory...
Change-Id: I45050d693a0aa6fd3027deaf417c29876ef6a1ea
newer version of this function no longer return an error on the basis
that "no-one checks what it returns anyway"........
See linux 4.18's 27d036e33237e ("mm: Remove return value of zap_vma_ptes()")
Change-Id: I8fb9f060e3e145cc2db21738585c9ee7f1445f74
strncat must not look at the appendee's length, but at how much
is left where we're appending.
This API is stupid anyway, where is strlcat when we need it...
Change-Id: Icdf418083146420a06f8ba5ffdf882982610d39b
newer linux got a 5 level page table now, try to handle that.
Some of the macros will be no-op (e.g. loop only on one iteration) on
architecture/kernels with only 4 levels but the code needs to be there
to compile
Change-Id: Ifc6304cbb066dce7d4e30962687ae05d7e034730
The headers defines __task_cred and other macroes we use, and always
existed; we must have gotten it indirectly on older kernels, it doesn't
hurt to always include
Change-Id: Iacfff0365e7a21e6247eea42606bbbf1dfccc077
on newer x64 kernels (config option?), syscalls can be renamed to allow
both x64 and ia32 versions to coexist. Lookup either names
Change-Id: I2f55cc804d3eee948ee1ed6d18c69c75bd2f652c
The symbol appears in some header in some linux version,
it's still not exported so we need our own lookup anyway; just rename it.
Change-Id: Ia4bce85988641c96fa3f5a0ae1d42c25c713b6c2
In addition to that, mcctrl_perf_set is modified so that it updates
usrdata->perf_event_num with number of registered events.
Change-Id: I3f343176f55b06d3baab0b0fe34e240f39706cf6
Fujitsu: POSTK_DEBUG_TEMP_FIX_80
Trailing bits were displayed in proc->saved_cmdline, displaying
uninitialized data to the user in /proc/<pid>/cmdline
Change-Id: I74831c8c68dd2f2197b35e9b49aaaae29c4c1dd5
This would incorrectly make "mcexec sh -c './script.sh'" run with
/bin/bash instead of /bin/sh (which is important, because bash behaviour
changes depending on how it is invoked)
Change-Id: I80610cf442c6c3ecacfa23e8ed15652bc8d4e3f7
This reverts commit b70d470e20.
That commit had been landed too fast after a mistake during migration
from old to new gerrit that didn't keep -1 vote ; it needs some fix
Change-Id: Ifc8a23e42449dfe471049270b4706e9b137e096e
the 'num_processors' symbol is also used by linux, so trying to load all
symbols from linux and mckernel at the same time renders either symbol
inaccessible (the first to be seen is kept by default).
This provides an alternate name for the mckernel symbol, thus letting us
access both more easily if required.
Change-Id: I8074d4f9f9ac45717df9a8df16be710ff762e161
these functions are more logical to keep together there as they depend
on each other.
Also add a comment about the __printf attribute, if we have a quiet
period it would be useful to enable and clear the thousands of
warnings...
Change-Id: I47d3891c9cd87da28b2883c29384959f5abd1459
Hugetlbfs file mappings are handled differently than regular files:
- pager_req_create will tell us the file is in a hugetlbfs
- allocate memory upfront, we need to fail if not enough memory
- the memory needs to be given again if another process maps the same
file
This implementation still has some hacks, in particular, the memory
needs to be freed when all mappings are done and the file has been
deleted/closed by all processes.
We cannot know when the file is closed/unlinked easily, so clean up
memory when all processes have exited.
To test, install libhugetlbfs and link a program with the additional
LDFLAGS += -B /usr/share/libhugetlbfs -Wl,--hugetlbfs-align
Then run with HUGETLB_ELFMAP=RW set, you can check this works with
HUGETLB_DEBUG=1 HUGETLB_VERBOSE=2
Change-Id: I327920ff06efd82e91b319b27319f41912169af1
These macros are needed to make sure the compiler does not optimize away
atomic constructs such as "while (!READ_ONCE(foo))" loops that do not
modify foo within the loop
Also move the barrier() define where it belongs while we are here, it is
needed for READ_ONCE/WRITE_ONCE and including ihk/cpu.h here causes
include loops
Change-Id: Ia533a849ed674719ccbc0495be47d22a3c47b8f8
- remove unused MF_END (that only makes sense for enums without holes,
this one is a set of bits masks)
- remove useless goto in pager_req_create()
- init maxprot to 0 from the start, it's not used in the error cases
(except for debug print)
Change-Id: Ic56c0754824b99f8a7e45fa8e99b8fe3e7c7e592