Commit Graph

107 Commits

Author SHA1 Message Date
c78803ac08 madvise: Support MADV_REMOVE on tmpfs
Change-Id: Ic99d374c4d2630944c7bc838937d7f45601783c6
refs: #1371
2020-03-23 13:06:26 +09:00
569dc33a9c mmap: fail and set -ENODEV when map to unmappable special file
mappable special files are /dev/mem and /dev/zero

Change-Id: Id1d4317104f901644e565007913e320d287e376f
2019-12-05 07:22:17 +00:00
18412616e1 munmap: Change permission of VMA back to RWX on unmap
Change-Id: Ic02098e7458dd8fa2961fb03dc32e37fb18c5dc5
Refs: #988
2019-09-26 03:49:50 +00:00
c371fbf13b file map: cause SIGBUS when access to a page beyond EOF
Change-Id: Iaf7d792413e674267fd1c05c382212c8f67d8f5b
Refs: #1291
2019-09-26 03:41:23 +00:00
beac6c3e80 make checking write-combine arch-dependent
Change-Id: I4c0fca7d34e69b4774141e115b8ebc03c5c1e8b3
Fujitsu: POSTK_DEBUG_ARCH_DEP_12
Refs: #1355
2019-09-23 16:42:26 +09:00
11ef2f8092 coredump: Support threads
Change-Id: Id75ade6c87b15abcff5d772d90f77950376a32c1
Refs: #1219
2019-08-09 04:00:15 +00:00
207eba93ea uti: syscall_backward: Use kmalloc area to pass syscall arguments
Change-Id: I478a9b40b75f3d1d68c4446810a6236fe2f3a96c
Fujitsu: POSTK_DEBUG_ARCH_DEP_106
Refs: #1320
2019-07-22 03:52:44 +00:00
80f964e44f rus_vm_fault(): cleanup and early exit on NULL access
Change-Id: I90b18988989d4e377ed9c35df6b2e6bcdddd13b6
2019-04-23 08:53:59 +00:00
cc07d6e017 mcctrl_get_per_thread_data: Un-inline
Change-Id: I881db244ca551b3ca232918cb0b4245776f17295
Fujitsu: POSTK_DEBUG_ARCH_DEP_56
2019-04-18 02:35:52 +00:00
a5d5baf8a8 rus_vm_fault: always use a packet on the stack
There are valid use cases where a remote page fault has no available
thread data/packet available to use, e.g. when device driver threads
need to access the data (BXI).

Do the per thread data lookup to use the right channel/tid if available,
and use mcctrl_ikc_send_wait with a new message number directly.

The fault is no longer handled in mckernel syscall forwarding code but
in the ikc handler directly in irq, this should be ok because page
faults are interrupts anyway so the code should be irq-safe.

Change-Id: Ie60f413cdaee6c1a824b4a2c93637899cb9bf9c9
2019-03-01 05:08:03 +00:00
2a63c962fc build system switch to cmake
Remove old build system at the same time

Change-Id: Ifdffe1fcd4cfece05f036d8de6e7cb74aca65f62
2019-02-14 16:44:09 +09:00
366e95856c Null-check ihk_os_t and mcctrl_usrdata pointers
Change-Id: I941c58d4ab6a0c1ce6bd53c24b552218a1716750
Refs: #1216
2019-02-14 16:26:19 +09:00
9cfc373538 Refactor "do write back only MAP_SHARED pages"
* free_process_memory_range() always passes memobj to
  ihk_mc_pt_free_range()
* clear_range_*() don't flush page in fileobj with MF_PRIVATE flag

Fujitsu: POSTK_DEBUG_TEMP_FIX_87
Change-Id: I8d46d029b3fc51ca6f0e59d748a2fe93e324a374
2019-02-14 16:25:58 +09:00
207d653b41 mcctrl: use vmf_insert_pfn for kernel >= 4.18
vmf_insert_pfn got added as a wrapper around vm_insert_pfn in 4.17
1c8f422059ae5da ("mm: change return type to vm_fault_t") and totally
replaced the later in 4.20 ae2b01f37044c ("mm: remove vm_insert_pfn()")

Compare with 4.18 here specifically to avoid troubles when rhel
backports this change later, and avoid adding a rhel version check down
the road.

Change-Id: Ibf108e2fb6f1199f89cde6a7973f4eb55447260b
2019-02-14 16:25:49 +09:00
97e0219f50 Make Linux handler run when mmap to procfs.
Change-Id: I98a3d098c5c676f33c83fa4354c623988ee591f2
Refs: #1222
2019-02-06 11:54:50 +00:00
452d93f14d mcctrl_clear_pte_range: fix zap_page for kernel >= 4.18
zap_vma_ptes no longer returns an error code as of Linux's
27d036e33237e4 ("mm: Remove return value of zap_vma_ptes()"),
where they decided nobody is interested in it....

Just copy the check out of the function.

Change-Id: I2eda0f91ec55a34bba96f45cc3d887bc80132a82
Originally-by: Kagawa Kodai <fj1731iw@aa.jp.fujitsu.com>
2019-02-01 13:18:58 +09:00
516ab87ab9 Copyrights: fujitsu 2018 bump
Separate copyright bumps in a different commit.
A lot of files only had the copyright change at this point; these
were probably changes I added separatly in other patches but just
split these in a different commit instead to simplify git stats

Change-Id: I93cf3fc1c0fa04ee743a79c3fe9768933e6bd0d2
2019-02-01 13:18:52 +09:00
0e895478a1 mcctrl rus_mmap: make vma->vm_flags arch-dependent
[Dominique: renamed arch_vm_flags to arch_rus_vm_flags]
Change-Id: I5ec89b3ff80af6bf0ede342eb5816df8c78de348
Fujitsu: POSTK_DEBUG_ARCH_DEP_100
2019-02-01 13:18:07 +09:00
19659aa908 mcctrl: move translate_rva_to_rpa to archdep
Change-Id: I0efa51468a7ff4d776d8340a612e6f44eac2ed53
Fujitsu: POSTK_DEBUG_ARCH_DEP_83
2019-02-01 13:18:06 +09:00
ae9a1f39df ihk_ikc_recv: Record channel to packet for release
ihk_ikc_release_packet takes the channel and puts the packet into its
free-list.  This fix makes it easy and safe to identify the proper
channel.

Change-Id: I5584b1e8a3ed675c2f9d68f0b5ed331b909197f6
Fujitsu: POSTK_DEBUG_TEMP_FIX_89
2018-11-21 17:01:58 +09:00
6581f9b4b2 mcctrl syscall: compat for newer zap_vma_ptes
newer version of this function no longer return an error on the basis
that "no-one checks what it returns anyway"........

See linux 4.18's 27d036e33237e ("mm: Remove return value of zap_vma_ptes()")

Change-Id: I8fb9f060e3e145cc2db21738585c9ee7f1445f74
2018-11-21 16:06:31 +09:00
03802052ed mcctrl: add handling for one more level of page tables
newer linux got a 5 level page table now, try to handle that.

Some of the macros will be no-op (e.g. loop only on one iteration) on
architecture/kernels with only 4 levels but the code needs to be there
to compile

Change-Id: Ifc6304cbb066dce7d4e30962687ae05d7e034730
2018-11-21 07:03:24 +00:00
39f9d7fdff Handle hugetlbfs file mapping
Hugetlbfs file mappings are handled differently than regular files:
 - pager_req_create will tell us the file is in a hugetlbfs
 - allocate memory upfront, we need to fail if not enough memory
 - the memory needs to be given again if another process maps the same
   file

This implementation still has some hacks, in particular, the memory
needs to be freed when all mappings are done and the file has been
deleted/closed by all processes.
We cannot know when the file is closed/unlinked easily, so clean up
memory when all processes have exited.

To test, install libhugetlbfs and link a program with the additional
LDFLAGS += -B /usr/share/libhugetlbfs -Wl,--hugetlbfs-align

Then run with HUGETLB_ELFMAP=RW set, you can check this works with
HUGETLB_DEBUG=1 HUGETLB_VERBOSE=2

Change-Id: I327920ff06efd82e91b319b27319f41912169af1
2018-10-11 08:54:13 +00:00
13e71ac9dc pager: minor cleanups
- remove unused MF_END (that only makes sense for enums without holes,
  this one is a set of bits masks)
- remove useless goto in pager_req_create()
- init maxprot to 0 from the start, it's not used in the error cases
  (except for debug print)

Change-Id: Ic56c0754824b99f8a7e45fa8e99b8fe3e7c7e592
2018-10-11 08:54:13 +00:00
42b9b31606 mcctrl: Propagate writecore()'s return value to caller
Fujitsu: POSTK_DEBUG_TEMP_FIX_62
Change-Id: I847dd520187cbf66fbad8140f79f62c6d5d9d5fc
2018-09-20 11:01:22 +09:00
29c5c68761 coredump: Change type of coretable.len to loff_t from int
Fujitsu: POSTK_DEBUG_TEMP_FIX_61
Change-Id: I6a27a8d477c3b3dcc12be772a15dfcff370bd2a8
2018-09-20 11:01:22 +09:00
38c08a6663 coredump: Add O_TRUNC to flags opening corefile
Fujitsu: POSTK_DEBUG_TEMP_FIX_59
Change-Id: I36c89fa894dfc0cdd170781e8ca4aab6149d4928
2018-09-20 11:01:20 +09:00
8c33c92720 mcctrl: Switch Linux functions/structures according to the version
For get_user_pages_remote in binfmt_mcexec.c:
In 4.10 with 5b56d49fc31d ("mm: add locked parameter to
get_user_pages_remote()")
In 4.9 with 9beae1ea8930 ("mm: replace get_user_pages_remote()
write/force parameters with gup_flags")

For vmf in syscall.c, these two patches in 4.10:
82b0f8c39a38 ("mm: join struct fault_env and vm_fault")
1a29d85eb0f1 ("mm: use vmf->address instead of
vmf->virtual_address")

Fujitsu: POSTK_DEBUG_ARCH_DEP_41
Change-Id: I89a02d03169a2162ea186da1804bf48910446d11
2018-09-20 01:50:04 +00:00
a269d96978 coredump: Exclude special areas
Fujitsu: POSTK_DEBUG_TEMP_FIX_38
Refs: #1005
Change-Id: I8934d2aecf06a09469afe131347e42b48b6f67f6
2018-09-20 01:48:17 +00:00
7e342751a2 do_syscall: Delegate system calls to the mcexec with the same pid
This includes the following fix:
send_syscall, do_syscall: remove argument pid

Fujitsu: POSTK_TEMP_FIX_26
Refs: #1165
Change-Id: I702362c07a28f507a5e43dd751949aefa24bc8c0
2018-09-13 16:59:47 +09:00
c25fb2aa39 memobj: transform memobj lock to refcounting
We had a deadlock between:
 - free_process_memory_range (take lock) -> ihk_mc_pt_free_range ->
... -> remote_flush_tlb_array_cpumask -> "/* Wait for all cores */"
and
 - obj_list_lookup() under fileobj_list_lock that disabled irqs
and thus never ack'd the remote flush

The rework is quite big but removes the need for the big lock,
although devobj and shmobj needed a new smaller lock to be
introduced - the new locks are used much more locally and
should not cause problems.

On the bright side, refcounting being moved to memobj level means
we could remove refcounting implemented separately in all object
types and simplifies code a bit.

Change-Id: I6bc8438a98b1d8edddc91c4ac33c11b88e097ebb
2018-09-12 18:03:25 +09:00
e42c414454 uti: Hook system calls by binary-patching glibc
(1) Add --enable-uti option. The binary-patch library is
    preloaded with this option.
(2) Binary-patching is done by syscall_intercept developed by Intel

This commit includes the following fixes:

(1) Fix do_exit() and terminate() handling
(2) Fix timing of killing mcexec threads when McKernel thread calls terminate()

Change-Id: Iad885e1e5540ed79f0808debd372463e3b8fecea
2018-09-04 19:53:02 +09:00
c0271f4727 Add debug messages for per-process data 2018-09-04 19:53:02 +09:00
07db4a80a7 __do_in_kernel_syscall: Move ihk_ikc_release_packet from mcexec_wait_syscall
Change-Id: Ieeb5fda42dbddc9da27242f4b547c2143659f97a
2018-09-04 19:52:14 +09:00
b8bacdd2de Reference counting per-thread data
It is accompanied by the following fixes:
(1) Fix put ppd locations in mcexec_wait_syscall()
(2) Move put ptd to end of mcexec_terminate_thread_unsafe() and mcexec_ret_syscall()
(3) Add debug messages for ptd add/get/put
(4) Fix ptd-add/get/put matching in mcexec_wait_syscall()
    * Skip put when woken-up from wait_event_interruptible() by signal

Change-Id: Ib9be3f5e62a7a370197cb36c9fa7c4d79f44c314
2018-09-04 19:52:14 +09:00
a121ffc785 uti: Release packet of reply from McKernel in backward_offload() 2018-09-04 19:52:14 +09:00
e29f579061 uti: Prevent user space vma from getting copied when forking 2018-09-04 19:52:12 +09:00
63703589e5 uti: Clear user space PTEs after first fork in create_tracer()
Change-Id: I60755f0cb5e84c3a5a5cd91515411a30f0995822
2018-09-04 19:52:12 +09:00
439dc0928b uti: Streamline syscall_backward() 2018-09-04 19:52:11 +09:00
52afbbbc98 uti: Call into McKernel futex()
(1) Masquerade clv
(2) Fix timeout
(3) Let mcexec thread with the same tid as McKernel thread migrating
    to Linux handles the migration request
(4) Call create_tracer() before creating proxy related objects

Change-Id: I6b2689b70db49827f10aa7d5a4c581aa81319b55
2018-09-04 19:52:10 +09:00
460917c4a0 remote_page_fault,syscall_backward: Zero-clear waitq entry
Change-Id: I151a35004183e911aaba766a8749830e1768bfe6
2018-09-04 19:52:10 +09:00
7803468afe remote_page_fault,syscall_backward: Retry when interrupted by signal
Change-Id: Ic7d72ad9ca32bb3c8e3522e00fef1d98caf3c049
2018-09-04 19:52:10 +09:00
8f2c7d2265 Fix thread-safety issue in rus_vm_fault
Change-Id: I8640a8e0de8a0dfaee700b25e5f9e2941ac98fc8
2018-09-04 19:52:10 +09:00
5a7ca14fcc rus_vm_fault: Return VM_FAULT_SIGBUS when per-process data is not found 2018-09-04 19:52:10 +09:00
2337832e4c pager_req_release(): Correct debug messages 2018-09-04 19:52:10 +09:00
82914c6a2e remote_page_fault: Retry when interrupted
Change-Id: Ib71a87ad03420e1918dc97da43351cb93e7d0754
2018-09-04 19:51:11 +09:00
e531ee626e mcctrl pager: handle pagers more properly
the pagers are all destroyed when linux thinks there is no process left,
but there is no synchronisation with mcexec on that and some new process
might have spawned and started using these pagers in the meantime,
leading to weird crashes because an invalid pager was used.

The reason we're cleaning up pagers when no process is left is that
mcctrl does not handle pager_req_release is the linux-side process got
killed or died before the mckernel one for some reason, so:
 - move pager_req_release to a new __do_in_kernel_irq_syscall() helper
 - have free_all_process_memory_range not set MF_HOST_RELEASED on the
memobj
 - just in case, clean up everything like before on mcctrl shutdown
instead of when no process is left.

Change-Id: I53b8b9b81b1e5b807593850af17b5ea5e8471174
Refs: #1154
2018-08-24 09:18:20 +09:00
9b8424523a mcctrl: remove rus page cache
Change-Id: Ieed7a2a0077ffde3fec8a64d2051e56a53924a42
2018-08-23 02:10:44 +00:00
ea35954613 linux side: replace vfs_read by kernel_read
vfs_read has been unexported in bd8df82be66 ("fs: unexport vfs_read and vfs_write")
in kernel 4.14.
kernel_read has always™ existed and is actually more appropriate: we can
remove the set_fs calls that are done in kernel_read.

The downside is that the function prototype also changed in 4.14 with
bdd1d2d3d251 ("fs: fix kernel_read prototype")...
(same with kernel_write e13ec939e96b ("fs: fix kernel_write prototype"))

Change-Id: I6f76a6387ae02b4d33bd62952d995a90b1952fc9
2018-08-22 06:27:12 +00:00
e8f8660b73 mcctrl: lookup unexported symbols at runtime
Instead of parsing System.map, use kallsyms_lookup_name() to
get unexported symbols addresses at module loading time.

This lets mckernel work with kaslr enabled (it gets enabled by
default from el7.5 onwards)

Change-Id: Ie4349fc1145ebce44f37f1f40c16f9d75584074d
2018-08-08 06:00:20 +00:00