Fix the issue where the value of no_preempt gets unexpected value
(-1, 1, 2 etc.) after process ends when running the UTI tests.
Change-Id: I7d9c08b754a171ea3fdec20ab2e635df3b607cbd
Fujitsu MPI tries to attach a segment with the size of the source range size plus one.
Change-Id: Iab3801727f938dfb6242b6b90c88e4986b84d08e
Refs: #1507
Previously, futex code of McKerenl was called by mccontrol,
but there ware some problems with this method.
(Mainly, location of McKernel image on memory)
Call futex code in mcctrl instead of the one in McKernel image,
giving the following benefits:
1. Not relying on shared kernel virtual address space with Linux any more
2. The cpu id store / retrieve is not needed and resulting in the code
Change-Id: Ic40929b64a655b270c435859fa287fedb713ee5c
refe: #1428
1. When nodes array is NULL, move_pages doesn't move any pages,
instead will return the node where each page
currently resides by status array.
2. Check whether all specified node is online or not.
Change-Id: Ie3534997833d797e2a9f595d1107b07d46e1c6cf
Refs: #1523
If flags specifies both MPOL_F_NODE and MPOL_F_ADDR,
get_mempolicy() will return the node ID of the node on
which the address addr is allocated into the location pointed to by mode.
Change-Id: Id485e3f4838e3679d877a95e53b21e3421cac88a
Don't use kernel_module_package not to create a separate
kmod-mckernel-*.rpm containing .ko files.
Change-Id: I25b7ff662476bfc735d319b57cdf2da82f2c6aa7
This integrates some of the changes of the following commit:
1cf0bd5a ("TO RESET: add debug instruments, map Linux areas for tofu")
Change-Id: Iffd8432d5a7b35f20bd45829a125583a0363dbf0
To prevent "TO RESET: send SIGSTOP instead of SIGV in PF" from making
some tests expecting SIGSEGV fail.
Change-Id: I8bb111cff59fe5b0b2bf6bc652dfd2fa308321ed
The steps of the technique to replace stack with hugetlbfs map are as
follows:
(1) Prepare a hugetlbfs map with the size of rlim_cur
(2) Copy the active region of the stack to the hugetlbfs map.
The range to copy is determined by reading /proc/[pid]/maps.
(3) Replace the stack map with the hugetlbfs map
The step (2) tries to copy a huge region if McKernel doesn't grow the
stack at run-time.
Change-Id: I5858c35b5c26dd0a42cccf9e3cc4c64b1a81f160
Formerly, if tgid is specified as -1, tgkill() was equivalent to tkill().
Now it is treated as an error EINVAL.
Change-Id: I47bc75d439662a36dc6167c4446a5277422de507
Refs: 1380
* Mapping attached part of segment is done at attach time instead of
make time to work with runtimes (e.g. OpenMPI) xpmem_make-ing the
entire user-space
* Mapping attached part of segment at attach time can be turned off by
specifying xpmem_remote_on_demand in kernel argument
* Mapping attachment chooses appropriate page-sizes, i.e., largest
allowed by memory range and segment page boundary
Fixes: a8696d8 "xpmem: Support large page attachment"
Change-Id: I44663865204036520e5f62fe22b9134ee4629f9b
* manage pages by an array
* fix mmap of fd created by memfd_create() populates the map
* refactor pgsize and pgshift handling
Change-Id: Icaf015b10afc35f2b95f93059adf1a1b6b92e14e
refs: #1475
Fixes: 8ee1d61d "Revert "Detect hang of McKernel in mcexec""
Fixes: b87ac8b8 "reproductible builds: remove most install paths in c code"
Change-Id: I8ef9ab81cd0a41ccd0e227ebc3e45c0745c150e9
- Specify AT_SYMLINK_NOFOLLOW in faccessat only when
the symbolic-link is analyzed by overlay_path().
Change-Id: Ie3b1f7fedef7441fd4b39c5c8b2ef0f73cba770e
Refs: #1370
make sure the caller thread holds migration queue lock with IRQs disabled
until it notifies the target CPU so that an interrupt can not deschedule
it in the middle of the request.
Change-Id: I85995018ca1e8478ccc9723985b6e8efc9c3acfb
Symptom:
A thread could call schedule() twice.
Cause:
(1) The migrator raises rescheduling flag
(2) The thread calls check_need_resched() for other
reason than the migrate IPI, e.g, response to system call
offload. And it finds that the flag is set and it's trying to
call schedule().
(3) The thread is interrupted by the migrate IPI and it finds that
the flag is set and calls schedule() in the interrupt context.
(4) The thread resumes the execution and call schedule()
Solution:
(1) Reset the rescheduling flag when checking it and it's set
(2) Set it again if it's decided not to call schedule()
Change-Id: I5376662d0b02ca4ebb29b42732e347f3b82d766d
Refs: #1400
Symptom and analysis:
runq_lock of the migration source is acquired on
the migration destination CPU.
This happens in the following steps:
(1) The thread stores value of cpu_local_var(runq_lock)
to its register when trying to perform
ihk_mc_spinlock_lock() on the lock variable.
(2) The thread takes IPI and migrates to another CPU.
(3) The thread resumes execution and acquires the wrong lock.
Solution:
* Disable interrupts before getting the value of
cpu_local_var(runq_lock)
Change-Id: Ia0ea450b97f872dd6116252537e4a79f85adfc88
Refs: #1400
rhel 7.5 and later kernels have a page offset that is no longer
necessarily 0xffff880000000000, leading to kernel panics if we
use the wrong address
Change-Id: I3572fde1c31303a937855c23fbd3815ce0f96c64
Add similar protection to clear_host_pte than to set_host_vma (see #986)
Also make the page fault handler only skip taking lock if the munmap
happened on the same cpu id
Change-Id: I6d9e68e8f8905b20bb2ccfa72848e04fe6404ab6
This reverts commit 0d3ef65092.
Reason for revert: This fix causes circular dependency with memory_range manipulation and TLB flush. See #1394.
Change-Id: I4774e81ff300c199629e283e538c0a30ad0eeaae
- Fixed the problem that instruction rewriting by PTRACE_POKETEXT is not reflected.
The cause is that the instruction cache was not flushed.
- Add instruction chache flush in ptrace_report_signal().
Change-Id: Ie9d34d3d33e1fd85aef5fe419345d82c6ca781fb
This fixes ostest-mem_limits.001 which tries to anonymous-mmap 95% of
total memory. It reports a failure because:
(1) McKernel tries to allocate physically contiguous area and
fails
(2) It turns on demand-paging
(3) It tries to obtain a page from zeroobj and fails
(4) It allocates a new page
(5) It performs COW on the page, which is unnecessary
Change-Id: Iddf0548bb9216f9bf91fb03fa21f890e599bfdad
This fixes ostest-mem_limits.005 which tries to move brk by 95% of
total memory. It reports a failure because McKernel tries to allocate
physically contiguous area and fails.
Change-Id: I50a61cb7103fdbdbe051f0ae276a79e8e2dcdda3
Consider "arm64" to be "aarch64".
It mistakenly considers cross-compilation when compiled through spack.
Change-Id: I914df482e21517adc1105512ea3d8919ef1577b1
Fix the error handling of the following two functions:
ihk_device_get_cpu_topology: Returns NULL when not found,
valid non-NULL pointer when found
get_cache_topology: Returns NULL when not found,
valid non-NULL pointer when found,
minus error number on error
Change-Id: Ied13a61d4ab0c314477c45ea659ff2b798ad97ee
Fujitsu: POSTK_DEBUG_TEMP_FIX_21
Determine V2PHYS_OFFSET dynamically.
Fix x86 hole handling in 64 bit address space.
Fix ARM64 virtual address handling and support separate user-space
and kernel-space translation tables (i.e., TTRB0 and TTRB1).
Fix page table walker's lookup functionality.
Change-Id: I6b281693cdc88bd1b8fe3f4b8f40a6af3ca95cc0
It was telling the vmap allocator to release a wrong address range
(physical address range).
Change-Id: I82236ac0086b5da24ac49219166abf363672d838
Refs: #985
Fujitsu: #11
Document known major (e.g. linux crash) bugs that have not been
fixed downstream and might require workarounds on specific
hardware configurations
Change-Id: I51e5d23243afd4489ce1ae25e736afc27b2c8202
Using a drop-in instead of an extra service avoids having to juggle
between both services (especially since irqbalance_mck did not have a
Conflict=irqbalance.service statement)
That way, we only have a single service to check for (irqbalance.service),
and system administrators should find this less confusing if they normally
rely on irqbalance.
The drop-in is also installed in /run so will automatically disappear in
the event of a linux crash or a reboot without shutting down mckernel
Change-Id: I004f4f25d9ca037e411e0bc91f4555db138ecfef
Note: the original fujitsu implementation didn't rename the various
save_fs function/desc to save_tls for some reason, might as well go all
the way though...
Change-Id: Ic362c15c8b320c4d258d2ead8c5fd4eafd9d0ae9
Fujitsu: POSTK_DEBUG_ARCH_DEP_91
arm64 performs context-switch in kernel space instead of user space as in
x86_64.
Change-Id: Ib119b9ff014effb970183ee86cfac67fab773cba
Futjitsu: POSTK_DEBUG_ARCH_DEP_99
Some old commit before -Werror was enabled got merged,
blocking other builds. Quickly fix before anyone notices
Change-Id: I5a034cef6f79e3e99b381bb1a5d97088e33a6718
mcoverlayfs code is now unused (technically should work on top of the
soft emulation but not well tested, and untested unused code is bad).
Remove it.
Left the unshare/bind_mount_recursive code in mcexec in a new
MCEXEC_BIND_MOUNT ifdef (only in config.h.in directly to discourage use.
it disables the ioctl as well, but the main code is still compiled to
keep up to date with linux api changes... although it's using kallsyms
lookup so it does not validate much more than "the symbol still exists")
I honestly think this should go as well (people who would want to use it
are root and could do it manually), but will give up for now.
Change-Id: I832b6a8ab19e24ed67a1a5044b1c6c32381ae0aa
Target commit:
Test "Direct access to McKernel memory from Linux." on arm64
Test "Scalable Vector Extension (SVE) support." on arm64
Change-Id: Ia9dc97c5cf0c4cf223423b4257745ea2101bee1d
In order to speed up test bot work it would be helpful to check for
identical build outputs and skip tests if required.
This removes most use of the install path in c code:
- ql_mpi uses /proc/self/exe and looks for talker/server in same
directory as itself
- mcexec looks for libihk.so in /proc/self/maps and use that path for
LD_PRELOAD prefix path
- rootfsdir is not used right now but until a better fix happens just
hardcode it, someone who wants to change it can set it through cmake
There is one last occurence of the install directory, MCEXEC_PATH in
mcctrl's binfmt code, for which the build system will just overwrite it
to a constant string at build time instead of trying to remove it too
hard. It would be possible to pass it as a kernel parameter or look for
mcexec in PATH but this is too much work for now.
Change-Id: I5d1352bc5748a1ea10dcae4be630f30a07609296
Use lsof to check for processes that still open /dev/mcosX at shutdown
time.
If lsof is not installed then the check is just not done (empty PROCS
result)
If -k is not passed, print a message listing pids of users and exit
(taking bets someone will use that and sed to kill out of mcstop+release
and rerun the stop script instead of passing -k at some point)
Change-Id: Idba7486fdede4990d9885d23f8077f33839daeed
Setting CMAKE_C_FLAGS_DEBUG does not work as first expected:
- set(... CACHE) didn't do anything because the variables were
initialized previously
- We could set with FORCE but then users could not change the value
- There is a way to only do that on initial cmake run but it has the
same problem
Thus, use a new regular cache variable directly instead
Change-Id: I20741fb385c171c6c1088bbd6c25666067e07288
Since arm64 shares the return value with the area of
the first argument, rewriting the return value before
the system call execution completes destroys the first argument.
Change-Id: I959944879254d8dd3a29489a65d8f274d45338e6
Fujitsu: POSTK_DEBUG_ARCH_DEP_110
Some memobjs (e.g. devobj) will not be considered 'in memobj' by
page_is_in_memobj.
Instead of trying to play whack-a-mole with the non-fileobj memobjs,
base the copy check on range's memobj and VR_PRIVATE (do not copy
MAP_SHARED mappings, so the fault handler will do the right thing™
when required)
Change-Id: Ic32cdc7766754f6559753b34845eb8c5cff6ed13
Refs: #1255
old gcc versions are stupid with nested structs and need us
to initialize .tickets.head and .tickets.tail in one go
Change-Id: I0d4caf8236066e7edf4a12e3270114132ced9585
The midr_* part of the struct was never used, and confuses older gcc
with partially uninitialized assignments that were not correct.
Just flatten the struct
Change-Id: I7a9cfe064ab97cdcd5ac50ce4fb713c4d7983bd3
These variables cannot be used uninitialized, and newer gcc versions
correctly do not bring the warning up, but this will shut up older ones
Change-Id: I2b2ea9b557196a3e7eea1e04dd1f160bd12d6e54
use generic struct zero initializer instead.
Older gcc used on arm also seem to have trouble with '{}',
so use '{ 0 }' instead
Change-Id: I83d43b05f8d1d44e1dd86502b48e28fe242e1db2
arch_setup_vdso() needs to return something even on panic to please gcc.
In theory, flagging panic() with __attribute__((noreturn)) should work
just the same and is a much better solution but for some reason on older
gcc versions setting the flag leads to the weak memset() symbol not
being found !?
Change-Id: Ifed100df5440ca24bb495817db9afc79f0ba6751
The arrays first init every fields to invalid op then override a few
fields, since this is not something we want to allow everywhere use
a GCC pragma to only ignore the warning there.
Change-Id: I498546fe60d60d4b000d711e22e04c8c360b5b83
We do not need two debug.h files.
Take Fujitsu's STATIC_ASSERT over BUILD_BUG_ON because it is more used
Change-Id: If04c17fbb7406ab15fe86267fed8d6da460cec62
Fujitsu: POSTK_DEBUG_ARCH_DEP_9
Fujitsu added this ifdef together with ifndef __arch64__ and thus disabled
the option for both archs in practice; it probably does not hurt to restore...
I'm not sure I see the point of disabling the option at mcexec level though,
but who am I to care.
Change-Id: I0d4bffb6ed325edac8ae577773e19c0fff6ca2ed
Fujitsu: POSTK_DEBUG_ARCH_DEP_53
Don't ask me why this shares POSTK_DEBUG_ARCH_DEP_50 with the ksym lookups...
Change-Id: Ic3db2cd77ca88be361cefec85d8ed9deb21ffcd8
Fujitsu: POSTK_DEBUG_ARCH_DEP_50
There are valid use cases where a remote page fault has no available
thread data/packet available to use, e.g. when device driver threads
need to access the data (BXI).
Do the per thread data lookup to use the right channel/tid if available,
and use mcctrl_ikc_send_wait with a new message number directly.
The fault is no longer handled in mckernel syscall forwarding code but
in the ikc handler directly in irq, this should be ok because page
faults are interrupts anyway so the code should be irq-safe.
Change-Id: Ie60f413cdaee6c1a824b4a2c93637899cb9bf9c9
Everything already uses kallsyms_lookup_name or similar, this
was leftover from when the build system was ported ages ago
Change-Id: I09dd0249845df90ab2e0adc28d0eb285c0ebb64b
Fujitsu: POSTK_DEBUG_ARCH_DEP_50
coredump() proceeds as follows:
1. coredump() calls gencore()
2. gencore() allocates ELF header to stack
3. gencore() prepares the core table and record the address of the ELF
header to the table and return to coredump()
4. coredump() offloads __NR_coredump with the address of the core
table
This fix prevents the ELF header from getting destroyed in the 3rd
step.
Change-Id: I770418c1658a6fdb640bb491fc076a31dfd41c22
Fujitsu: POSTK_TEMP_FIX_39
CPack takes the source dir as is, so if it was used to build something
it will incorrectly grab the temporary CMakeCache file and cmake will
complain during rpmbuild later on.
The BuildRequires should be a separate patch but logic behind the change
is that the dependencies need to be installed in the sysroot, and
rpmbuild cannot test this, so just move them all to only enforce
BuildRequires for native build.
And while we are here, also add a new kernel_dir specfile option.
Change-Id: Ie67932798f632e6d307f8ead93bdbe043e6e8898
- arm64: Get TSC corresponding to boot time from IHK.
- x86_64: Calculate the current time using vdso.
Refs: #1186
Fujitsu: POSTK_DEBUG_ARCH_DEP_52
Change-Id: I293ba4bbe5390d50dea44b8a5b7471f59237daff
user code also needs these defines; there was a hard-coded
definition left out from debugging that didn't get cleaned up
Change-Id: I951fcd6a3d6bc1d1f1c3e897058908167520f7bc
the application processor trampoline needs the trampoline physical
address to be mapped for the few instructions between loading the
page table and jumping to normal memory area; setup a new pt for them.
Also make it use its stack where it needs to be directly.
With that, x86 can finally remove the 0 page from its init mapping
Change-Id: Iab3f33a2ed22570eeb47b5ab6e068c9a17c25413
The attribute would impose 64-bytes alignment that we do not
respect later because the whole structures (e.g. process/thread)
are allocated at 32bytes boundaries with kmalloc
These are however justified for performance reason as we do not want
them on same page cache line, so just accept slower performance for
UBSAN only
Change-Id: Ia28968257675b7ae97b0391471986e6bf6485b7b
sprintf is implemented as snprintf(..., INT_MAX, ...) which will overflow
the argument pointer for the end, then fix the end to be -1.
This technically works but we know the actual buffer size in all these
call sites, might as well do this properly
Change-Id: I807d09f46a0221f539063fda515e1c504e658d40
A signed integer cannot be shifted in a way that will flip the
sign bit; make such arguments unsigned to be safe
Change-Id: Iafc060f98f899ae3ffb876ba22fdd6183fbb6e57
The linker maps parts of libs with different access flags,
so we cannot prepopulate the whole file.
[dominique.martinet@cea.fr: moved min and friends in compiler.h]
Change-Id: Ifbeddc0908699099cfae5ce9cc2adc578221db31
vmf_insert_pfn got added as a wrapper around vm_insert_pfn in 4.17
1c8f422059ae5da ("mm: change return type to vm_fault_t") and totally
replaced the later in 4.20 ae2b01f37044c ("mm: remove vm_insert_pfn()")
Compare with 4.18 here specifically to avoid troubles when rhel
backports this change later, and avoid adding a rhel version check down
the road.
Change-Id: Ibf108e2fb6f1199f89cde6a7973f4eb55447260b
This fix is rejected because it only makes the setfsuid test in ostest
pass and doesn't fix the other issues including the one in which file
I/O could be done with the old fsuid because an mcexec thread with an
arbitrary tid could handle the system-call offload request.
Explanation of the rejected fix:
setfsuid() proceeds as follows:
1. McKernel asks mcexec for __NR_setfsuid (set)
2. mcexec calls setfsuid, reports the id to McKernel
3. McKernel asks mcexec for __NR_setfsuid (get)
4. mcexec calls mcexec_getcred(), reports the id to Mckernel
5. McKernel sets proc->fsuid to the obtained value
tid of mcexec on the 2nd and 4th step could be different. So this
fix lets mcexec report its tid on the 2nd step and McKernel specify
it in the 3rd step.
Change-Id: Id5cfeed18c64430d576a56e961bbca1ecb2e39ad
Fujitsu: POSTK_DEBUG_TEMP_FIX_45
The original fujitsu code added a whole new ihk_mc_perfctr_stop_first
function, duplicating a lot of code - add a flag to existing function
instead.
Change-Id: Ic9ce0236d68f967ff72cf88e5d9f1bda5c98aa1b
Fujitsu: POSTK_DEBUG_ARCH_DEP_107
finalize_process().
The process of making a child process zombie and the process of setting
the parent of the child process to process ID 1 are excluded.
Refs: #1257
Change-Id: Ic95d4d8ee92d6a4a63847e5eda20ec1ba92566ac
Fix that process will remain even if signal is received between PPD
registration and release_handler registration.
Refs: #1201
Fujitsu: POSTK_DEBUG_TEMP_FIX_64
Change-Id: I571781963578df8cedb327f19298f595cfb137a3
Since interrupts are disabled on panic, linux cannot reset a
panic'd core when NMI are disabled (for e.g. mcreboot/mcstop)
Just always offline it, so linux can get it back
Change-Id: If8107172375f2924e02bd4c36e24645ec38a8999
We need to separate the two because the heap of a PIE is created in
the area to which it is mapped.
Related commits:
b1309a5d: PIE is mapped at map_end instead of at
user_start
c4219655: Interpreter is mapped to map_start to make a
system call that dereferences a NULL pointer fail
[dominique.martinet@cea.fr: Also add ULONG_MAX and friend macroes,
used for data_min]
[ken.sato.ty@hitachi-solutions.com: fix execve]
Change-Id: I8ecaf22b7965090ab67bebece57c68283ba23664
Add the following patterns of symlinks:
- /sys/bus/cpu/drivers/processor/cpu*
- /sys/bus/node/devices/node*
And slightly change how /sys/devices/system/cpu/cpu*/node* are created
to avoid duplicate lookups
Change-Id: Id94a4d157da06d75f6bd450d5bd9a9e7709a1414
Submodule check used to match any file containing submodule name (e.g.
lib/include/ihk/foo would match ihk and incorrectly be identified as
a submodule change) -- properly check for full name with anchors instead
Change-Id: Ib4330aec97e9da713cd3ab9e791962f2e0c8d396
mcoverlayfs has a high maintenance burden and does not work on rhel8's 4.18
kernel (while it works on vanilla 4.18...); instead of debugging this further
time is better spent making it independent from overlayfs.
Change-Id: I7454ae95b0fbb3373c256aa2fd83cdfec466c009
* Merge cd7ab307fae9bc8aa49d23b32becf37368a1603e
* Merge commit is changed to one commit for gerrit
Change-Id: I75f0f4cf6b8b3286284638ac2c7816c5257551e4
Its call site is moved before numa_init() as well because
monitor_init() defines ihk_os_monitor that was used in
rusage_total_memory_add() called from numa_init().
I didn't revert this modification because I don't want to touch the
working code.
Change-Id: I602467284581ce45989dd071cfe59d3fc4827e29
Fujitsu: POSTK_DEBUG_TEMP_FIX_73
Apparently /proc needs it; it's normally implemented using get_link if
readlink isn't implemented but proc's get_link crashes the kernel in
this case (because nameidata is only defined for open* paths)
Change-Id: I1864d6c948db879d33ea29b1b281bf84ff8eeec6
zap_vma_ptes no longer returns an error code as of Linux's
27d036e33237e4 ("mm: Remove return value of zap_vma_ptes()"),
where they decided nobody is interested in it....
Just copy the check out of the function.
Change-Id: I2eda0f91ec55a34bba96f45cc3d887bc80132a82
Originally-by: Kagawa Kodai <fj1731iw@aa.jp.fujitsu.com>
Separate copyright bumps in a different commit.
A lot of files only had the copyright change at this point; these
were probably changes I added separatly in other patches but just
split these in a different commit instead to simplify git stats
Change-Id: I93cf3fc1c0fa04ee743a79c3fe9768933e6bd0d2
arch perf counters are placed at start, so offset all
other counters (because placing arch perf counters at the end
wouldn't have been intrusive enough?)
Change-Id: Ifab1047872384927d9cfa0a0212327ee73545c29
Fujitsu: POSTK_DEBUG_ARCH_DEP_86
some archs do not have the simple open/unlink variants, while the *at
is always available -- this is simpler than making these arch-dependent
functions
Change-Id: Ic16ae5683e6e375210b1744538d291585e67a2fa
Fujitsu: POSTK_DEBUG_ARCH_DEP_78
at least funlinkat is needed because these macros define __NR_x for mckernel
side and we will use funlinkat in a later commit
Change-Id: I6b6a2eee11e2fa1e42f97eab4b67e1128cd83ddf
A later version would probably want to check some mask for arm64...
Change-Id: I67e13a852c3ed406fbf8ae1688539b9e069c0e81
Fujitsu: POSTK_DEBUG_ARCH_DEP_87
Check we mapped the correct region with a magic header in the struct
Original commit: d246b93a3bced92d0ac2a4a337118091b010658a
Fujitsu: POSTK_DEBUG_TEMP_FIX_76
Change-Id: If848be64af5d76844ba65b48493021637c8114f4
The following test set:
execve: fix memory leak
add: NULL check for master_channel at IKC interrupt_handler.
Fix the check routine for elf sections (Fujitsu: POSTK_TEMP_FIX_77)
Change-Id: I16c2a341c48f6df10a4839be08b93ea16bda8fbe
Refs: #727
Refs: #873
Refs: #1011
The following test set:
fix: Bug for getrusage return incorrect ru_maxrss
fix: Bug for getrusage(RUSAGE_CHILDREN) return parent info (POSTK_DEBUG_TEMP_FIX_72)
fix: Bug for getrusage often return incorrect ru_stime
Change-Id: I6734b1e34565d5d2715f9901a04ba5b6f0278032
Refs: #1032
Refs: #1033
Refs: #1034
init_linux_kernel_mapping is called in setup_x86_phase1 way
before arguments are setup, but we can access kernel boot args
directly and use that, so ugly fix for now.
Change-Id: I285ecc31c6646d6d18566d411b09ae3190e8101e
Refs: #1228
1. Catch up with the interface change in
ihk_os_destroy_pseudofs() and ihk_os_create_pseudofs()
2. Expect ihk_os_shutdown() to return zero when the OS had been shut
down
Refs: #898
Refs: #928
Change-Id: Ic430550ebfd5cd21164eefaed155fe769adf8395
The previous commit made BUILDID use git for submodule, but for complex
git setup (e.g. worktree) and older git version or dead .git 'link' it
would blindly rely on the existence of the .git file even if git does not
actually find anything.
This would lead to possibly empty BUILDID which would fail building.
Just always run the git command, and echo the version string if it failed
Change-Id: Ied268d2150a30dc1146498e15fa8394afc8a8d0d
While we are here:
- fix uname -r (single quote?!)
- add compat for rhel8 (el kernel and version is 4.18)
- also remove linux version check in mcreboot.sh, trust configure check
Change-Id: I14726d4374b0dfd941640096044ea1d5d88bfcb8
ihk_ikc_release_packet takes the channel and puts the packet into its
free-list. This fix makes it easy and safe to identify the proper
channel.
Change-Id: I5584b1e8a3ed675c2f9d68f0b5ed331b909197f6
Fujitsu: POSTK_DEBUG_TEMP_FIX_89
Fixed the problem of "return error/goto out" while
locking the memory_range_lock in mbind().
Change-Id: I980a7a440f652b60379acae3cb3575211a749774
Fujitsu: POSTK_DEBUG_TEMP_FIX_100
Fix a problem that does not result in an error even
if MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES are
simultaneously specified in set_mempolicy() mode.
Change-Id: I06e695baf869daee8bc64179748cac27b64e914b
Fujitsu: POSTK_DEBUG_TEMP_FIX_99
Check interrupt enabled state in set_cputime() instead of enabling
them unconditionally on exit.
Change-Id: I99212855f33f5535f67f045665bf5e025c55b690
Fujitsu: POSTK_DEBUG_TEMP_FIX_98
Fixed an issue where errors generated in arch_cpu_read_write_register()
are not transmitted to the caller.
Change-Id: I05d7d872eab834918220cf18f628aee37208a156
Fujitsu: POSTK_DEBUG_TEMP_FIX_94
Since 4.17.0, kernel cannot call syscalls directly because the calling
convention can be different on x86_64, as explained in this email:
https://lore.kernel.org/lkml/20180325162527.GA17492@light.dominikbrodowski.net
Use the ksys_* alternatives instead when possible, or for readlink use
do_readlinkat (and use readlinkat all the time to simplify ifdefs)
It might be possible to change some of these without ifdefs, but for
example ksys_unshare only got introduced in 4.17 so we need to keep some
syscall calling...
Change-Id: Ic47e184b29ef8b21731b2eae6193b0af2548b872
In arm64, glibc-open of /dev/xpmem is hooked in sys_openat. This
commit adds xpmem_openat which is called by sys_openat.
This commit silently applies copy_from_user fix to sys_open as well.
Change-Id: I3b4f7bf0e152c359250bb2b56910db9192390cb1
Fujitsu: POSTK_DEBUG_ARCH_DEP_46, POSTK_DEBUG_ARCH_DEP_62
Since McKernel allocates hugepages by default, we could consider that
madvise call with MADV_HUGEPAGE is supported.
Change-Id: Ibdaa6f77416d029a1d17210773ef79539ba04b1c
envs are stuck after args which are now possibly unaligned, and used
from a non-aligned pointer in prepare_process_ranges_args_envs (env)
The memory immediately after args/envs is copied anyway with memcpy_long,
so make sure the bits are initialized and realign env correctly
Fixes: 70e52faf36 ("flatten_strings: do not return unused trailing bits")
Change-Id: Ic747e947d151c0eea65dec36bc9c888cf6e0c394
Add "-T 0" to mcreboot.sh if you want to turn off time sharing. When
it's turned off, McKernel doesn't activate interval timer when the
length of per-CPU run-queue is larger than one.
Change-Id: I2cedc1b30a9cd9a0f4608a32ecec0a0d58c6225e
it was meant for 3.10 kernels, so the regular < 4.0.0 check
will work for el7 and older kernels as well
Change-Id: I807f030f6303c9c3d17b0d80de55c256a3479486
the libc takes care of trying execve as many times as needed for
execvp, it's not a kernel call.
Also, sneak a double-free fix (desc was not reset properly in case
load_elf_desc_shebang failed)
Fixes: b1681f4a3affff ("mcexec/execve: fix shebangs handling")
Change-Id: If8e3d7ae53acdeffc0331ae8621e0832fcfa406f
Running "mcexec dfsafds" did not print any message in normal use.
Rather than looking for which message shows in debug and turn in into
eprintf, add a single coherent message (more shell-like) at the end and
turn other messages off.
There is a small loss of information but this is equivalent to what
shells give (a single errno value with no details), and it is now easy
to add --debug to mcexec to see more information if required
Change-Id: Id2c3a47880b7d1d7467883351e6e7af561f91bbf
rhel8 is a 4.18 kernel but they've already backported some later fixes.
Instead of relying on the kernel version, the changes removed some defines so
we can check for the define presence to make the code more robust to kernel
version wilderness instead
Change-Id: I6cf5548a7b73a7394405daf850f715a1e20ab0b4
This newer version is much simpler than the old ones:
- the options are noop, this lets the code simplify all the allocating
of a new option struct and passing it around
- ovl_reset_ovl_entry was added and called all the time, but the
mechanism that made this required is gone in this kernel version
On the other hand, one new thing in this version:
- newer kernel check the stacking depth of filesystems now, and we are
reaching the default limit of two with our setup. Bump it to three here.
Also, while we are here, make make fail if requested directory does not
exist, instead of infinitely recurse into make modules in the mcoverlayfs
directory...
Change-Id: I45050d693a0aa6fd3027deaf417c29876ef6a1ea
newer version of this function no longer return an error on the basis
that "no-one checks what it returns anyway"........
See linux 4.18's 27d036e33237e ("mm: Remove return value of zap_vma_ptes()")
Change-Id: I8fb9f060e3e145cc2db21738585c9ee7f1445f74
strncat must not look at the appendee's length, but at how much
is left where we're appending.
This API is stupid anyway, where is strlcat when we need it...
Change-Id: Icdf418083146420a06f8ba5ffdf882982610d39b
newer linux got a 5 level page table now, try to handle that.
Some of the macros will be no-op (e.g. loop only on one iteration) on
architecture/kernels with only 4 levels but the code needs to be there
to compile
Change-Id: Ifc6304cbb066dce7d4e30962687ae05d7e034730
The headers defines __task_cred and other macroes we use, and always
existed; we must have gotten it indirectly on older kernels, it doesn't
hurt to always include
Change-Id: Iacfff0365e7a21e6247eea42606bbbf1dfccc077
on newer x64 kernels (config option?), syscalls can be renamed to allow
both x64 and ia32 versions to coexist. Lookup either names
Change-Id: I2f55cc804d3eee948ee1ed6d18c69c75bd2f652c
The symbol appears in some header in some linux version,
it's still not exported so we need our own lookup anyway; just rename it.
Change-Id: Ia4bce85988641c96fa3f5a0ae1d42c25c713b6c2
In addition to that, mcctrl_perf_set is modified so that it updates
usrdata->perf_event_num with number of registered events.
Change-Id: I3f343176f55b06d3baab0b0fe34e240f39706cf6
Fujitsu: POSTK_DEBUG_TEMP_FIX_80
Trailing bits were displayed in proc->saved_cmdline, displaying
uninitialized data to the user in /proc/<pid>/cmdline
Change-Id: I74831c8c68dd2f2197b35e9b49aaaae29c4c1dd5
This would incorrectly make "mcexec sh -c './script.sh'" run with
/bin/bash instead of /bin/sh (which is important, because bash behaviour
changes depending on how it is invoked)
Change-Id: I80610cf442c6c3ecacfa23e8ed15652bc8d4e3f7
This reverts commit b70d470e20.
That commit had been landed too fast after a mistake during migration
from old to new gerrit that didn't keep -1 vote ; it needs some fix
Change-Id: Ifc8a23e42449dfe471049270b4706e9b137e096e
the 'num_processors' symbol is also used by linux, so trying to load all
symbols from linux and mckernel at the same time renders either symbol
inaccessible (the first to be seen is kept by default).
This provides an alternate name for the mckernel symbol, thus letting us
access both more easily if required.
Change-Id: I8074d4f9f9ac45717df9a8df16be710ff762e161
these functions are more logical to keep together there as they depend
on each other.
Also add a comment about the __printf attribute, if we have a quiet
period it would be useful to enable and clear the thousands of
warnings...
Change-Id: I47d3891c9cd87da28b2883c29384959f5abd1459
Hugetlbfs file mappings are handled differently than regular files:
- pager_req_create will tell us the file is in a hugetlbfs
- allocate memory upfront, we need to fail if not enough memory
- the memory needs to be given again if another process maps the same
file
This implementation still has some hacks, in particular, the memory
needs to be freed when all mappings are done and the file has been
deleted/closed by all processes.
We cannot know when the file is closed/unlinked easily, so clean up
memory when all processes have exited.
To test, install libhugetlbfs and link a program with the additional
LDFLAGS += -B /usr/share/libhugetlbfs -Wl,--hugetlbfs-align
Then run with HUGETLB_ELFMAP=RW set, you can check this works with
HUGETLB_DEBUG=1 HUGETLB_VERBOSE=2
Change-Id: I327920ff06efd82e91b319b27319f41912169af1
These macros are needed to make sure the compiler does not optimize away
atomic constructs such as "while (!READ_ONCE(foo))" loops that do not
modify foo within the loop
Also move the barrier() define where it belongs while we are here, it is
needed for READ_ONCE/WRITE_ONCE and including ihk/cpu.h here causes
include loops
Change-Id: Ia533a849ed674719ccbc0495be47d22a3c47b8f8
- remove unused MF_END (that only makes sense for enums without holes,
this one is a set of bits masks)
- remove useless goto in pager_req_create()
- init maxprot to 0 from the start, it's not used in the error cases
(except for debug print)
Change-Id: Ic56c0754824b99f8a7e45fa8e99b8fe3e7c7e592
There were mainly two problems with shebangs:
- Suffix arguments handling e.g. '#!/bin/sh -x'
- Recursive handling e.g. script1 fetchs '#!/path/to/script2'
and script2 itself has a shebang
- (did I say two?) running shebang would replace argv[optind] instead
of appending e.g. script with '#!/bin/sh' and running './script -c'
would run '/bin/sh -c' instead of '/bin/sh ./script -c'
There also are two places where this needs parsing:
- starting a fresh program from mcexec
- starting a new program from execve in mcexec
The first was easy to fix as we already had argv around, but the later
required a new way to transfer the 'new argv elements from the script'
to mckernel to append before its argv -- it used to be 'desc->shell_path'
but that was no longer used at some point and just one keyword is not
enough to handle this properly.
This commit does:
- Refactors the lookup_path + load_elf_desc that was only done at most
twice in its own function that loops indefinitely and use that in both
situations described above
- Transmits the argv addition in the transfer to mckernel after the
desc; mckernel allocates 4 pages (hardcoded) for the descs and we will
hopefully have room for the script arguments on top of that... (there is
no guard!!!)
- Change flatten_strings to allow prepending a flattened string instead
of a single string.
Note that the flatten_string change also brought in a difference in the
format, to have the full length embedded within the string, the latest
slot that used to be zeroes now contains the position of the end of the
buffer (where the last+1 string would be if there had been one)
This required a trivial change in mckernel prepare args function that
used this property for no real reason.
Hopefully things work™, this probably warrants adding a couple of new
ostests...
- create a couple of scripts with recursive invocation/arguments and
check their own argv.
- execute "mcexec script args" and "mcexec sh -c 'script args'"
Change-Id: I2cf9cde5c07c9293f730de89c9731bd93dbfa789
Refs: #1115
Do not return from fork() until mcctrl side has created mckernel's
procfs entries for the child PID.
This fixes programs doing fork() immediately followed by opening
/proc/<child pid>/something, and would get some error
Refs: #1189
Change-Id: Ie10ea56b65c55f59e96a1ab6ef83a1070e36048d
For get_user_pages_remote in binfmt_mcexec.c:
In 4.10 with 5b56d49fc31d ("mm: add locked parameter to
get_user_pages_remote()")
In 4.9 with 9beae1ea8930 ("mm: replace get_user_pages_remote()
write/force parameters with gup_flags")
For vmf in syscall.c, these two patches in 4.10:
82b0f8c39a38 ("mm: join struct fault_env and vm_fault")
1a29d85eb0f1 ("mm: use vmf->address instead of
vmf->virtual_address")
Fujitsu: POSTK_DEBUG_ARCH_DEP_41
Change-Id: I89a02d03169a2162ea186da1804bf48910446d11
user_space/swapout/swapout_copy_to_01.sh:
* Use ~/.mck_test_config
* Fix checking if McKernel version is written in swap-file
user_space/futex/futex_test.sh:
* Use ~/.mck_test_config
user_space/perf_event_open/perf_event_open_test.sh
* Use ~/.mck_test_config
Change-Id: Id93b207ed0e3e9ebf307073db81b40335bc5b140
Add new file with common functions for tests to use.
- loads config file
- checks for mcexec etc
- checks for LTP and OSTEST if required
- handle mcstop / mcreboot if required, and provide function for it
At the same time, make a few changes to mck_test_config:
- move to ~/.mck_test_config
- add boot params to the config, tests the require specific params can
overwite it
- make the config "set-if-variable-is-empty", so someone can overwrite
any param by setting the environment value e.g. LTP=.... ./test.sh
will use the value given
Change-Id: Ib04112043e3eb89615dc7afaa8842a98571fab93
This includes the following fix:
send_syscall, do_syscall: remove argument pid
Fujitsu: POSTK_TEMP_FIX_26
Refs: #1165
Change-Id: I702362c07a28f507a5e43dd751949aefa24bc8c0
We had a deadlock between:
- free_process_memory_range (take lock) -> ihk_mc_pt_free_range ->
... -> remote_flush_tlb_array_cpumask -> "/* Wait for all cores */"
and
- obj_list_lookup() under fileobj_list_lock that disabled irqs
and thus never ack'd the remote flush
The rework is quite big but removes the need for the big lock,
although devobj and shmobj needed a new smaller lock to be
introduced - the new locks are used much more locally and
should not cause problems.
On the bright side, refcounting being moved to memobj level means
we could remove refcounting implemented separately in all object
types and simplifies code a bit.
Change-Id: I6bc8438a98b1d8edddc91c4ac33c11b88e097ebb
This optimization make the offloading thread quickly yield to
another thread. Without this, it yileded only after the interval timer
set the rescheduling flag.
Change-Id: Ida3b17ed94782d5d1af0185a96b1f50d9db8d244
Defining C structures for the following objects:
(1) Remote and local context
(2) Stack of system call arguments / return values
Change-Id: Iafbb6c795bd765e3c78c54a255d8a1e4d4536288
(1) Add --enable-uti option. The binary-patch library is
preloaded with this option.
(2) Binary-patching is done by syscall_intercept developed by Intel
This commit includes the following fixes:
(1) Fix do_exit() and terminate() handling
(2) Fix timing of killing mcexec threads when McKernel thread calls terminate()
Change-Id: Iad885e1e5540ed79f0808debd372463e3b8fecea
Set PROT_EXEC to host VMA because uti needs PROT_EXEC for text VMAs.
Meanings of prot bits of Host VMA has been changed as follows.
RWX: No mapping or RW mapping
RX: Read only mapping
This is because this is a normal case since terminate() is changed so
that it first kills all mcexec threads and then kill McKernel threads.
Change-Id: I88380bf28b60645d361baded525d71105235c16f
It is accompanied by the following fixes:
(1) Fix put ppd locations in mcexec_wait_syscall()
(2) Move put ptd to end of mcexec_terminate_thread_unsafe() and mcexec_ret_syscall()
(3) Add debug messages for ptd add/get/put
(4) Fix ptd-add/get/put matching in mcexec_wait_syscall()
* Skip put when woken-up from wait_event_interruptible() by signal
Change-Id: Ib9be3f5e62a7a370197cb36c9fa7c4d79f44c314
(1) Masquerade clv
(2) Fix timeout
(3) Let mcexec thread with the same tid as McKernel thread migrating
to Linux handles the migration request
(4) Call create_tracer() before creating proxy related objects
Change-Id: I6b2689b70db49827f10aa7d5a4c581aa81319b55
One CPU could be chosen by concurrent forks because CPU selection and
runq addition are not done atomicly. So this fix makes the two steps
atomic.
Change-Id: Ib6b75ad655789385d13207e0a47fa4717dec854a
Add check for start/end being larger than the range we're checking.
Fix corner case where the access_check() was done on last vm range, and
we would be looking beyond last element (null deref)
It's enabled by adding -s to mcreboot.sh.
Cherry-pick of the following commit:
commit b5c13ce51a5a4926c2cf11c817cd0d369ac4402d
Author: Katsuya Horigome <katsuya.horigome.rj@ps.hitachi-solutions.com>
Date: Mon Nov 20 09:40:41 2017 +0900
Include measures to prevent memory destruction on Linux side (This is rebase commit for merging to development+hfi)
(1) Cherry-pick of 644afd8b45fc253ad7b90849e99aae354bac5b17
(2) Pass length to functions with arguments of variable length
* POSTK_DEBUG_ARCH_DEP_38
(3) Separate architecture dependent functions/structures
* POSTK_DEBUG_ARCH_DEP_34
(4) Fix include path
* POSTK_DEBUG_ARCH_DEP_76
(5) Include config.h
* POSTK_DEBUG_ARCH_DEP_33
Device map with MAP_PRIVATE is copied when forking using copy_user_pte.
So the map isn't copied by those statements.
Futjitsu: POSTK_TEMP_FIX_14
Refs: #1039
Change-Id: I1a697ed2e003055d66a8eebd3e8d5e9e49d094ad
the pagers are all destroyed when linux thinks there is no process left,
but there is no synchronisation with mcexec on that and some new process
might have spawned and started using these pagers in the meantime,
leading to weird crashes because an invalid pager was used.
The reason we're cleaning up pagers when no process is left is that
mcctrl does not handle pager_req_release is the linux-side process got
killed or died before the mckernel one for some reason, so:
- move pager_req_release to a new __do_in_kernel_irq_syscall() helper
- have free_all_process_memory_range not set MF_HOST_RELEASED on the
memobj
- just in case, clean up everything like before on mcctrl shutdown
instead of when no process is left.
Change-Id: I53b8b9b81b1e5b807593850af17b5ea5e8471174
Refs: #1154
-ESRCH from mcctrl doesn't mean an error but the file is not a regular
file and mcctrl wants McKernel to treat it as a device file.
Change-Id: Ie121f0e6a8b1f0a29c2f2cf193a51f4f52337809
Report timeout when McKernel doesn't respond to prevent the caller
from waiting forever.
Refs: #1167
Change-Id: I8bd87e43aafffdd0952198224e44195af4368883
The size of tid table needs to be more than #CPUs when CPU oversubscription
is needed.
Note that the max number of simultaneous threads are the min of the
following two:
(1) Number of mcexec worker threads
(2) NR_TID defined in kernel/syscall.c
Change-Id: I425189da415e1d3a763ad62567950d001850cf0d
schedule_timeout() with idle_halt should use spin sleep because sleep
with timeout is not implemented.
Change-Id: Ia0bebcc10ddfb872bffeece7f13fb35a4791db18
vfs_read has been unexported in bd8df82be66 ("fs: unexport vfs_read and vfs_write")
in kernel 4.14.
kernel_read has always™ existed and is actually more appropriate: we can
remove the set_fs calls that are done in kernel_read.
The downside is that the function prototype also changed in 4.14 with
bdd1d2d3d251 ("fs: fix kernel_read prototype")...
(same with kernel_write e13ec939e96b ("fs: fix kernel_write prototype"))
Change-Id: I6f76a6387ae02b4d33bd62952d995a90b1952fc9
The functions now take a bitmask in argument since commit d7416c6f79
("perf_event: Specify counter by bit_mask on start/stop")...
Thanksfully the change also induced a type modification so it was easy
to notice.
(On the other hand I'm building with --disable-perf so why the hell is
that file compiled?!)
Change-Id: Ie16367cc94e81068b70e1b80142a6394de896c4f
struct sched_param is defined differently since headers changed in
linux ae7e81c07 ("sched/headers...")
Change-Id: I22af79bf3d9df69d09903b2830d99426309cf911
Some enums were redefined in lib/include/mc_perf_event.h in commit
1284060 ("support PERF_TYPE_{HARDWARE|HWCACHE} in perf_event_open")
Change-Id: I1a98699955ca7fd6135b2a7dde72ed4df77b1974
The function was between two perf functions when perf functions don't
use it...
It seemed simpler to move the function than to add an extra ifdef
Use that occasion to fix style warnings, no actual code changes were
made.
Change-Id: Ie8b5fa7968a3d5e54a690d079874db54f5e6c8c9
This function was added for x86 by commit 140f813d77 ("fix:
differences in behavior of sigaction between Linux and Mckernel")
The x86 and arm files are actually pretty close and could use
factoring...
Change-Id: Ia8820fd2f824d898610b384a3e137c96aadbc911
Instead of parsing System.map, use kallsyms_lookup_name() to
get unexported symbols addresses at module loading time.
This lets mckernel work with kaslr enabled (it gets enabled by
default from el7.5 onwards)
Change-Id: Ie4349fc1145ebce44f37f1f40c16f9d75584074d
Probably only needed for recent system, see ihk's 3271b5e6 ("fix
compilation with recent glibc (cpu_set define change)")
The root of the problem really is that we rely on system headers for
mckernel that ought to be independent...
Change-Id: Ieb9a017e5a7697ad767087370ced7b615efc917e
vma is part of vmf and isn't needed, so type changed (see linux 11bac80
("mm, fs: reduce fault, [...] to take only vmf"))
Change-Id: I4c023e23c7e7416ad2df2dcc0698a0032e574e4c
See linux's commit 0ee931c4 ("mm: treewide: remove GFP_TEMPORARY
allocation flag") for a long explanation, but basically that flag
"is just cargo cult" and should be removed
Change-Id: I2147cd65b6b9ec509a72e11cc3abf1fe1561c10b
Similarily, pgoff << PAGE_SHIFT would need pgoff to be unsigned to fit,
but off_t is signed.
The reason for this shift was to truncate the offset argument to be
aligned to page boundaries, do that instead
Change-Id: I36c3de34b1834fdb0503942a6f3212e94986effd
Heavily inspired off linux kernel's dynamic debug:
* add a /sys/kernel/debug/dynamic_debug/control file
(accessible from linux side in /sys/class/mcos/mcos0/sys/kernel/debug/dynamic_debug/control)
* read from file to list debug statements (currently limited to 4k in size)
* write to file with '[file foo ][func bar ][line [x][-[y]]] [+-]p' to change values
Side effects:
* reindented all linker scripts, there is a new __verbose section
* added string function strpbrk
Change-Id: I36d7707274dcc3ecaf200075a31a2f0f76021059
The standard UNIX tool to get processes information, need to have the
process id inside /proc/<PID>/status.
Using ps without PID in /proc/<PID>/status gives :
PID TTY TIME CMD
2551 pts/0 00:00:00 bash
0 pts/0 00:00:00 exe
0 pts/0 00:00:00 exe
With this patch:
PID TTY TIME CMD
2551 pts/0 00:00:00 bash
11966 pts/0 00:00:00 exe
12619 pts/0 00:00:00 exe
Change-Id: Ic9d255cbef4d49e49bdaedcfc8e3545d9c144325
Some are important, e.g. the seemingly harmless braces around if with dprintf,
since that dprintf is defined as empty, will screw things up and grab the next
line
Change-Id: Ie5e1cf813178ad708ff42ae5e477fbc96034471c
search_free_space changed since this was implemented and the code is
no longer compatible
Looking at it again, the function is not used anywhere other than syscall.c
and the second function does not seem to fix anything specific so this
just removes the untested side.
Change-Id: If28d35ec4da083a40dc6936fcb21f05fb64e378a
Fujitsu: POSTK_DEBUG_ARCH_DEP_27
do_frees is allowed to be NULL only if free_addrs_count is 0, but that
is increased to account for the wakeup_desc itself before this failure
Change-Id: Iab33712c76ae452df7044558a12745a89adb47ac
Surplus refs on the linux side will not change anything, so spare
ourselves a message.
The final message will free all refs at once when the object is
destroyed.
Change-Id: Ie086b9dda663729962037c67e8233370509234a5
init_normal_area was mapping identity lookups (phys = virt) from 0,
leading to many undetected null pointer dereferences in init_pt (but
not in new process page tables leading to odd behaviour)
This also makes the code use the set_pt_large_page() function, cleaning
it up a bit
Change-Id: I22889031de26a7e48501b0eb4d453ca62e671835
This helps catching errors like accessing a field that no longer exists
in a debug print that wasn't compiled...
Change-Id: If6c862ea2b866f819195aae93c7fd68e610fe48e
Was done in x86_64 for fileobj in commit 249bda4aef ("fileobj: use
MCS locks for per-file page hash")
Change-Id: I61957de336b6657687803e6288afed9360a42032
GCC optimizes big switches with sse so we could clobber users floating
point registers when they would do a syscall
Reproducer:
```
#include <stdio.h>
#include <stdlib.h>
union num {
float f;
unsigned long long i;
};
#define WORKSIZE (1024 * 1024 * 32)
int main(int argc, char **argv) {
char *work = malloc(WORKSIZE);
char *fromaddr;
char sink;
union num r;
unsigned long long int offset;
r.f = drand48();
printf("r: %llx\n", (long long)r.i);
offset = (long long int)(r.f * (double)WORKSIZE);
fromaddr = work + offset;
printf("%e %llx %llx\n", r.f, offset, fromaddr);
sink = *fromaddr;
return 0;
}
```
Change-Id: I7bb0883ec8ef2f245ab98064e308025422afc115
Behave in the same way as Linux which returns old_address when
old_size == new_size && !MREMAP_FIXED.
Refs: #1112
Change-Id: Ice1421a8a77f962d087de8475aa2cd40c59be5f7
(1) Check if rlim's address is valid
(2) Check if soft-limit does not exceed hard-limit
Fujitsu: POSTK_DEBUG_TEMP_FIX_3
Refs: #1050
Change-Id: I5bf1008ce172f9dff64ec89b1f97614926abaf13
(1) Check if size is large enough
(2) Check if size is positive
Fujitsu: POSTK_DEBUG_TEMP_FIX_5
Refs: #1121
Change-Id: I3e41720c89ef89294820f7f4fa8df1a69a7011b0
While we are here, also optimize code a bit: perf_desc does not need
to be allocated for every cpu; and fix coding style.
Change-Id: Iad19fed08205d38594fd3f1b7ddf2b19a9cf0d9d
Many ikc messages expecting a reply use wait_event_interruptible
incorrectly, freeing memory that could still be used on the other side.
This commit implements a generic ikc send and wait helper that helps
with memory management and ownership properly:
- if the message succeeds and a reply comes back normally, the memory
is freed by the caller as usual
- if the wait fails (signal before the reply comes or timeout) then the
memory is set as owner by ikc and will be free when the reply comes back
later
- if the reply never comes, the memory is freed at shutdown when
destroying ikc channels
Refs: #1076
Change-Id: I7f348d9029a6ad56ba9a50c836105ec39fa14943
This fixes crashes _without_ oversubscribing with a process doing
fork() execve() / wait() in a loop
Issue: #1132
Change-Id: I98531f4643ad6b6a8f750a1a3f05b9ff3ebfd50f
This is a response to a CEA's request.
- The tools directory is created under the mckernel directory.
- Some include files are now installed in the install directory,
but we should rethink of it.
The gencore() and freecore() code in gencore.c is guarded by
POSTK_DEBUG_ARCH_DEP_18, so the call to these functions should
also be guarded, otherwise linking fails.
Previously the allocator would return all availble memory for a
request of 0 pages. This is rather counter-intuitive and left no
memory for subsequent allocations.
This is for linking only. visit_pte_range_safe() is required only
for memdump, as far as i can tell. Since memdump is disabled anyway
I think it's ok to leave this function empty for now.
This lets one specify arbitrary kernel parameters, instead of manually
fiddling with the script.
Could ultimately replace params like -t (turbo) and -d (dump_level) that
do not have any side effect (logmode starts a userland daemon)
There is no good reason to map these low addresses (userspace could with
mmap fixed, but that is grounds for many exploits...);
the main advantage however is if we do a null deref or close to (0->foo)
within a pagefault we will get a panic stack instead of getting a hang
because we cannot get some locks.
- make should be $(MAKE)
- add + in front of rules spawning long-lasted make process in a
subshell. (This would not be needed with $(MAKE) -C .. target, but our
makefiles do not handle that because they use $(PWD))
- split the main 'all' rule as all 4 targets are independant
- fix dependencies where appropriate for parallelism
Extra, not speed-related changes:
- remove some double-colon for targets as they do not need it
This cuts build time from 5s to 1.5s on a laptop with -j4, and more
importantly from 85s to 35s on a KNL node.
As a bonus, the fixed dependencies removes the need to clean before
rebuilding all the time. Probably.
McKernel would terminate() running program on terminal resizing
It actually looks like there is nothing for us to do when we
get that anyway (tested with `dialog`)
- myfree in pager.c was called with an argument, so add one to the
dummy definition
- pgoff is offset_t (unsigned) and doesn't need to be compared to 0
- clang says '*(int *)0 = 0' will be optimized away instead of keeping
the segfault without a volatile hint (?! that is wrong!), but it causes
no harm to add anyway.
This replaces the chained list used to keep track of all memory ranges
of a process by a standard rbtree (no need of interval tree here
because there is no overlap)
Accesses that were done directly through vm_range_list before were
replaced by lookup_process_memory_range, even full list scan (e.g.
coredump).
The full scans will thus be less efficient because calls to rb_next()
will not be inlined, but these are rarer calls that can probably afford
this compared to code simplicity.
The only reference to the actual backing structure left outside of
process.c is a call to rb_erase in xpmem_free_process_memory_range.
v2: fix lookup_process_memory_range with small start address
v3: make vm_range_insert error out properly
Panic does not lead to easy debug, all error paths
are handled to just return someting on error
v4: fix lookup_process_memory_range (again)
That optimistically going left was a more serious bug than just
last iteration, we could just pass by a match and continue down
the tree if the match was not a leaf.
v5: some users actually needed leftmost match, so restore behavior
without the breakage (hopefully)
<premap_size> of stack is pre-mapped on creating a process.
And its max size of stack is set to <max>.
This replaces MCKERNEL_RLIMIT_STACK=<premap_size>,<max>.
# Overlay /proc, /sys with McKernel specific contents
#
# Revert any state that has been initialized before the error occured.
#
if[ -z "$(declare -f error_exit)"];then
error_exit(){
localstatus=$1
case$status in
mcos_sys_mounted)
if["$enable_mcoverlay"=="yes"];then
umount /tmp/mcos/mcos0_sys
fi
;&
mcos_proc_mounted)
if["$enable_mcoverlay"=="yes"];then
umount /tmp/mcos/mcos0_proc
fi
;&
mcoverlayfs_loaded)
if["$enable_mcoverlay"=="yes"];then
rmmod mcoverlay 2>/dev/null
fi
;&
linux_proc_bind_mounted)
if["$enable_mcoverlay"=="yes"];then
umount /tmp/mcos/linux_proc
fi
;&
tmp_mcos_mounted)
if["$enable_mcoverlay"=="yes"];then
umount /tmp/mcos
fi
;&
tmp_mcos_created)
if["$enable_mcoverlay"=="yes"];then
rm -rf /tmp/mcos
fi
;&
initial)
# Nothing more to revert
;;
esac
exit1
}
fi
if[ ! -e /tmp/mcos ];then
mkdir -p /tmp/mcos;
fi
if ! mount -t tmpfs tmpfs /tmp/mcos;then
echo"error: mount /tmp/mcos" >&2
error_exit "tmp_mcos_created"
fi
if[ ! -e /tmp/mcos/linux_proc ];then
mkdir -p /tmp/mcos/linux_proc;
fi
if ! mount --bind /proc /tmp/mcos/linux_proc;then
echo"error: mount /tmp/mcos/linux_proc" >&2
error_exit "tmp_mcos_mounted"
fi
if ! taskset -c 0 insmod @KMODDIR@/mcoverlay.ko 2>/dev/null;then
echo"error: inserting mcoverlay.ko" >&2
error_exit "linux_proc_bind_mounted"
fi
while[ ! -e /proc/mcos0 ]
do
sleep 0.1
done
if[ ! -e /tmp/mcos/mcos0_proc ];then
mkdir -p /tmp/mcos/mcos0_proc;
fi
if[ ! -e /tmp/mcos/mcos0_proc_upper ];then
mkdir -p /tmp/mcos/mcos0_proc_upper;
fi
if[ ! -e /tmp/mcos/mcos0_proc_work ];then
mkdir -p /tmp/mcos/mcos0_proc_work;
fi
if ! mount -t mcoverlay mcoverlay -o lowerdir=/proc/mcos0:/proc,upperdir=/tmp/mcos/mcos0_proc_upper,workdir=/tmp/mcos/mcos0_proc_work,nocopyupw,nofscheck /tmp/mcos/mcos0_proc;then
echo"error: mounting /tmp/mcos/mcos0_proc" >&2
error_exit "mcoverlayfs_loaded"
fi
# TODO: How de we revert this in case of failure??
if ! mount -t mcoverlay mcoverlay -o lowerdir=/sys/devices/virtual/mcos/mcos0/sys:/sys,upperdir=/tmp/mcos/mcos0_sys_upper,workdir=/tmp/mcos/mcos0_sys_work,nocopyupw,nofscheck /tmp/mcos/mcos0_sys;then
echo"error: mount /tmp/mcos/mcos0_sys" >&2
error_exit "mcos_proc_mounted"
fi
# TODO: How de we revert this in case of failure??
mount --make-rprivate /sys
touch /tmp/mcos/mcos0_proc/mckernel
rm -rf /tmp/mcos/mcos0_sys/setup_complete
# Hide NUMA related files which are outside the LWK partition
for cpuid in `find /sys/devices/system/cpu/* -maxdepth 0 -name "cpu[0123456789]*" -printf "%f "`;do
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.