Commit Graph

95 Commits

Author SHA1 Message Date
9cfc373538 Refactor "do write back only MAP_SHARED pages"
* free_process_memory_range() always passes memobj to
  ihk_mc_pt_free_range()
* clear_range_*() don't flush page in fileobj with MF_PRIVATE flag

Fujitsu: POSTK_DEBUG_TEMP_FIX_87
Change-Id: I8d46d029b3fc51ca6f0e59d748a2fe93e324a374
2019-02-14 16:25:58 +09:00
207d653b41 mcctrl: use vmf_insert_pfn for kernel >= 4.18
vmf_insert_pfn got added as a wrapper around vm_insert_pfn in 4.17
1c8f422059ae5da ("mm: change return type to vm_fault_t") and totally
replaced the later in 4.20 ae2b01f37044c ("mm: remove vm_insert_pfn()")

Compare with 4.18 here specifically to avoid troubles when rhel
backports this change later, and avoid adding a rhel version check down
the road.

Change-Id: Ibf108e2fb6f1199f89cde6a7973f4eb55447260b
2019-02-14 16:25:49 +09:00
97e0219f50 Make Linux handler run when mmap to procfs.
Change-Id: I98a3d098c5c676f33c83fa4354c623988ee591f2
Refs: #1222
2019-02-06 11:54:50 +00:00
452d93f14d mcctrl_clear_pte_range: fix zap_page for kernel >= 4.18
zap_vma_ptes no longer returns an error code as of Linux's
27d036e33237e4 ("mm: Remove return value of zap_vma_ptes()"),
where they decided nobody is interested in it....

Just copy the check out of the function.

Change-Id: I2eda0f91ec55a34bba96f45cc3d887bc80132a82
Originally-by: Kagawa Kodai <fj1731iw@aa.jp.fujitsu.com>
2019-02-01 13:18:58 +09:00
516ab87ab9 Copyrights: fujitsu 2018 bump
Separate copyright bumps in a different commit.
A lot of files only had the copyright change at this point; these
were probably changes I added separatly in other patches but just
split these in a different commit instead to simplify git stats

Change-Id: I93cf3fc1c0fa04ee743a79c3fe9768933e6bd0d2
2019-02-01 13:18:52 +09:00
0e895478a1 mcctrl rus_mmap: make vma->vm_flags arch-dependent
[Dominique: renamed arch_vm_flags to arch_rus_vm_flags]
Change-Id: I5ec89b3ff80af6bf0ede342eb5816df8c78de348
Fujitsu: POSTK_DEBUG_ARCH_DEP_100
2019-02-01 13:18:07 +09:00
19659aa908 mcctrl: move translate_rva_to_rpa to archdep
Change-Id: I0efa51468a7ff4d776d8340a612e6f44eac2ed53
Fujitsu: POSTK_DEBUG_ARCH_DEP_83
2019-02-01 13:18:06 +09:00
ae9a1f39df ihk_ikc_recv: Record channel to packet for release
ihk_ikc_release_packet takes the channel and puts the packet into its
free-list.  This fix makes it easy and safe to identify the proper
channel.

Change-Id: I5584b1e8a3ed675c2f9d68f0b5ed331b909197f6
Fujitsu: POSTK_DEBUG_TEMP_FIX_89
2018-11-21 17:01:58 +09:00
6581f9b4b2 mcctrl syscall: compat for newer zap_vma_ptes
newer version of this function no longer return an error on the basis
that "no-one checks what it returns anyway"........

See linux 4.18's 27d036e33237e ("mm: Remove return value of zap_vma_ptes()")

Change-Id: I8fb9f060e3e145cc2db21738585c9ee7f1445f74
2018-11-21 16:06:31 +09:00
03802052ed mcctrl: add handling for one more level of page tables
newer linux got a 5 level page table now, try to handle that.

Some of the macros will be no-op (e.g. loop only on one iteration) on
architecture/kernels with only 4 levels but the code needs to be there
to compile

Change-Id: Ifc6304cbb066dce7d4e30962687ae05d7e034730
2018-11-21 07:03:24 +00:00
39f9d7fdff Handle hugetlbfs file mapping
Hugetlbfs file mappings are handled differently than regular files:
 - pager_req_create will tell us the file is in a hugetlbfs
 - allocate memory upfront, we need to fail if not enough memory
 - the memory needs to be given again if another process maps the same
   file

This implementation still has some hacks, in particular, the memory
needs to be freed when all mappings are done and the file has been
deleted/closed by all processes.
We cannot know when the file is closed/unlinked easily, so clean up
memory when all processes have exited.

To test, install libhugetlbfs and link a program with the additional
LDFLAGS += -B /usr/share/libhugetlbfs -Wl,--hugetlbfs-align

Then run with HUGETLB_ELFMAP=RW set, you can check this works with
HUGETLB_DEBUG=1 HUGETLB_VERBOSE=2

Change-Id: I327920ff06efd82e91b319b27319f41912169af1
2018-10-11 08:54:13 +00:00
13e71ac9dc pager: minor cleanups
- remove unused MF_END (that only makes sense for enums without holes,
  this one is a set of bits masks)
- remove useless goto in pager_req_create()
- init maxprot to 0 from the start, it's not used in the error cases
  (except for debug print)

Change-Id: Ic56c0754824b99f8a7e45fa8e99b8fe3e7c7e592
2018-10-11 08:54:13 +00:00
42b9b31606 mcctrl: Propagate writecore()'s return value to caller
Fujitsu: POSTK_DEBUG_TEMP_FIX_62
Change-Id: I847dd520187cbf66fbad8140f79f62c6d5d9d5fc
2018-09-20 11:01:22 +09:00
29c5c68761 coredump: Change type of coretable.len to loff_t from int
Fujitsu: POSTK_DEBUG_TEMP_FIX_61
Change-Id: I6a27a8d477c3b3dcc12be772a15dfcff370bd2a8
2018-09-20 11:01:22 +09:00
38c08a6663 coredump: Add O_TRUNC to flags opening corefile
Fujitsu: POSTK_DEBUG_TEMP_FIX_59
Change-Id: I36c89fa894dfc0cdd170781e8ca4aab6149d4928
2018-09-20 11:01:20 +09:00
8c33c92720 mcctrl: Switch Linux functions/structures according to the version
For get_user_pages_remote in binfmt_mcexec.c:
In 4.10 with 5b56d49fc31d ("mm: add locked parameter to
get_user_pages_remote()")
In 4.9 with 9beae1ea8930 ("mm: replace get_user_pages_remote()
write/force parameters with gup_flags")

For vmf in syscall.c, these two patches in 4.10:
82b0f8c39a38 ("mm: join struct fault_env and vm_fault")
1a29d85eb0f1 ("mm: use vmf->address instead of
vmf->virtual_address")

Fujitsu: POSTK_DEBUG_ARCH_DEP_41
Change-Id: I89a02d03169a2162ea186da1804bf48910446d11
2018-09-20 01:50:04 +00:00
a269d96978 coredump: Exclude special areas
Fujitsu: POSTK_DEBUG_TEMP_FIX_38
Refs: #1005
Change-Id: I8934d2aecf06a09469afe131347e42b48b6f67f6
2018-09-20 01:48:17 +00:00
7e342751a2 do_syscall: Delegate system calls to the mcexec with the same pid
This includes the following fix:
send_syscall, do_syscall: remove argument pid

Fujitsu: POSTK_TEMP_FIX_26
Refs: #1165
Change-Id: I702362c07a28f507a5e43dd751949aefa24bc8c0
2018-09-13 16:59:47 +09:00
c25fb2aa39 memobj: transform memobj lock to refcounting
We had a deadlock between:
 - free_process_memory_range (take lock) -> ihk_mc_pt_free_range ->
... -> remote_flush_tlb_array_cpumask -> "/* Wait for all cores */"
and
 - obj_list_lookup() under fileobj_list_lock that disabled irqs
and thus never ack'd the remote flush

The rework is quite big but removes the need for the big lock,
although devobj and shmobj needed a new smaller lock to be
introduced - the new locks are used much more locally and
should not cause problems.

On the bright side, refcounting being moved to memobj level means
we could remove refcounting implemented separately in all object
types and simplifies code a bit.

Change-Id: I6bc8438a98b1d8edddc91c4ac33c11b88e097ebb
2018-09-12 18:03:25 +09:00
e42c414454 uti: Hook system calls by binary-patching glibc
(1) Add --enable-uti option. The binary-patch library is
    preloaded with this option.
(2) Binary-patching is done by syscall_intercept developed by Intel

This commit includes the following fixes:

(1) Fix do_exit() and terminate() handling
(2) Fix timing of killing mcexec threads when McKernel thread calls terminate()

Change-Id: Iad885e1e5540ed79f0808debd372463e3b8fecea
2018-09-04 19:53:02 +09:00
c0271f4727 Add debug messages for per-process data 2018-09-04 19:53:02 +09:00
07db4a80a7 __do_in_kernel_syscall: Move ihk_ikc_release_packet from mcexec_wait_syscall
Change-Id: Ieeb5fda42dbddc9da27242f4b547c2143659f97a
2018-09-04 19:52:14 +09:00
b8bacdd2de Reference counting per-thread data
It is accompanied by the following fixes:
(1) Fix put ppd locations in mcexec_wait_syscall()
(2) Move put ptd to end of mcexec_terminate_thread_unsafe() and mcexec_ret_syscall()
(3) Add debug messages for ptd add/get/put
(4) Fix ptd-add/get/put matching in mcexec_wait_syscall()
    * Skip put when woken-up from wait_event_interruptible() by signal

Change-Id: Ib9be3f5e62a7a370197cb36c9fa7c4d79f44c314
2018-09-04 19:52:14 +09:00
a121ffc785 uti: Release packet of reply from McKernel in backward_offload() 2018-09-04 19:52:14 +09:00
e29f579061 uti: Prevent user space vma from getting copied when forking 2018-09-04 19:52:12 +09:00
63703589e5 uti: Clear user space PTEs after first fork in create_tracer()
Change-Id: I60755f0cb5e84c3a5a5cd91515411a30f0995822
2018-09-04 19:52:12 +09:00
439dc0928b uti: Streamline syscall_backward() 2018-09-04 19:52:11 +09:00
52afbbbc98 uti: Call into McKernel futex()
(1) Masquerade clv
(2) Fix timeout
(3) Let mcexec thread with the same tid as McKernel thread migrating
    to Linux handles the migration request
(4) Call create_tracer() before creating proxy related objects

Change-Id: I6b2689b70db49827f10aa7d5a4c581aa81319b55
2018-09-04 19:52:10 +09:00
460917c4a0 remote_page_fault,syscall_backward: Zero-clear waitq entry
Change-Id: I151a35004183e911aaba766a8749830e1768bfe6
2018-09-04 19:52:10 +09:00
7803468afe remote_page_fault,syscall_backward: Retry when interrupted by signal
Change-Id: Ic7d72ad9ca32bb3c8e3522e00fef1d98caf3c049
2018-09-04 19:52:10 +09:00
8f2c7d2265 Fix thread-safety issue in rus_vm_fault
Change-Id: I8640a8e0de8a0dfaee700b25e5f9e2941ac98fc8
2018-09-04 19:52:10 +09:00
5a7ca14fcc rus_vm_fault: Return VM_FAULT_SIGBUS when per-process data is not found 2018-09-04 19:52:10 +09:00
2337832e4c pager_req_release(): Correct debug messages 2018-09-04 19:52:10 +09:00
82914c6a2e remote_page_fault: Retry when interrupted
Change-Id: Ib71a87ad03420e1918dc97da43351cb93e7d0754
2018-09-04 19:51:11 +09:00
e531ee626e mcctrl pager: handle pagers more properly
the pagers are all destroyed when linux thinks there is no process left,
but there is no synchronisation with mcexec on that and some new process
might have spawned and started using these pagers in the meantime,
leading to weird crashes because an invalid pager was used.

The reason we're cleaning up pagers when no process is left is that
mcctrl does not handle pager_req_release is the linux-side process got
killed or died before the mckernel one for some reason, so:
 - move pager_req_release to a new __do_in_kernel_irq_syscall() helper
 - have free_all_process_memory_range not set MF_HOST_RELEASED on the
memobj
 - just in case, clean up everything like before on mcctrl shutdown
instead of when no process is left.

Change-Id: I53b8b9b81b1e5b807593850af17b5ea5e8471174
Refs: #1154
2018-08-24 09:18:20 +09:00
9b8424523a mcctrl: remove rus page cache
Change-Id: Ieed7a2a0077ffde3fec8a64d2051e56a53924a42
2018-08-23 02:10:44 +00:00
ea35954613 linux side: replace vfs_read by kernel_read
vfs_read has been unexported in bd8df82be66 ("fs: unexport vfs_read and vfs_write")
in kernel 4.14.
kernel_read has always™ existed and is actually more appropriate: we can
remove the set_fs calls that are done in kernel_read.

The downside is that the function prototype also changed in 4.14 with
bdd1d2d3d251 ("fs: fix kernel_read prototype")...
(same with kernel_write e13ec939e96b ("fs: fix kernel_write prototype"))

Change-Id: I6f76a6387ae02b4d33bd62952d995a90b1952fc9
2018-08-22 06:27:12 +00:00
e8f8660b73 mcctrl: lookup unexported symbols at runtime
Instead of parsing System.map, use kallsyms_lookup_name() to
get unexported symbols addresses at module loading time.

This lets mckernel work with kaslr enabled (it gets enabled by
default from el7.5 onwards)

Change-Id: Ie4349fc1145ebce44f37f1f40c16f9d75584074d
2018-08-08 06:00:20 +00:00
794684985f mcctrl syscall: remove unused walk page debug function
This saves looking up one symbol for a debug function that is not
used anywhere

Change-Id: I6a3a480ce8067b4f6f0faf9aa837119ea46888ad
2018-08-08 05:57:46 +00:00
6cf89076dc mcctrl handle_mm_fault compat: add el7.5 support
Change-Id: I8c7738b70ca914e857be119b7720cdc22e61ae0e
2018-08-08 05:36:35 +00:00
1543119139 mcctrl rus_vm_fault: tpe changed with kernel >= 4.11
vma is part of vmf and isn't needed, so type changed (see linux 11bac80
("mm, fs: reduce fault, [...] to take only vmf"))

Change-Id: I4c023e23c7e7416ad2df2dcc0698a0032e574e4c
2018-07-27 02:31:39 +00:00
0a0a78ac2e mcctrl: replace GFP_TEMPORARY by GFP_KERNEL
See linux's commit 0ee931c4 ("mm: treewide: remove GFP_TEMPORARY
allocation flag") for a long explanation, but basically that flag
"is just cargo cult" and should be removed

Change-Id: I2147cd65b6b9ec509a72e11cc3abf1fe1561c10b
2018-07-27 02:31:00 +00:00
dc8d6b740c pager_req_read: handle short read
Change-Id: Iff89046041e012a65c80a29b485ddbb636435dd0
2018-07-26 04:37:54 +00:00
992705d465 pager_get_path: Append \0 to path
Change-Id: Iaabd89a649bb20b37b35cd345da0f468fd5dd0b5
2018-07-10 02:10:19 +00:00
f148863586 pager_req_map(): do not take mmap_sem if not needed 2018-06-07 07:17:41 +09:00
ec375da27a pager_req_create(): prefetch libiomp, libpthread and libc 2018-06-07 07:17:31 +09:00
f3d18eb9de fileobj/devobj: record path name (originally by Takagi-san) 2018-05-14 17:46:52 +09:00
bfb5080b71 pager_req_unmap: Put per-process data at exit 2018-04-10 11:35:03 +09:00
642520f80c rus_vm_fault: If page fault occurs in a thread that has not processed system call offloading, incorrectly return to normal.
refs #923
2018-03-07 10:22:47 +09:00
a9dfcd9a89 translate_rva_to_rpa(): use 2MB blocks in 1GB pages on x86 2018-01-31 11:16:44 +09:00