Compare commits

269 Commits
1.5.0 ... 1.6.0

Author SHA1 Message Date
270dd28b51 Merge branch 'development' 2018-11-08 17:28:28 +09:00
85c936a6cb mcexec: fix terminating zero after readlink()
Change-Id: Icb5432f157ceb2182d93e2d327cfa63ad02a8c0e
2018-11-08 17:01:22 +09:00
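
Note on the readlink() fix above: readlink() does not append a terminating null byte, so the caller has to reserve room for one and write it. The following is a minimal sketch of that pattern, not mcexec's actual code; the helper name `readlink_terminated` is an assumption made for illustration.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from mcexec): read a symlink target into
 * buf and always leave buf null-terminated.
 * Returns the target length, or -1 with errno set by readlink().
 */
ssize_t readlink_terminated(const char *path, char *buf, size_t size)
{
	ssize_t n;

	assert(size > 0);
	/* Reserve one byte: readlink() never writes a terminating '\0'. */
	n = readlink(path, buf, size - 1);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return n;
}
```

Passing `size - 1` means an overly long target is silently truncated; a caller that cares can compare the return value against `size - 1`.
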
bfff009f7c configure.ac: Update version number to 1.6.0
Change-Id: I91148618c8db8035c8a2f11a20898df48607ad1f
2018-11-08 10:54:58 +09:00
a1fef219ad Merge tag '1.6.0-rc1' into master-1.6.0
Release target Nov 11, 2018

Conflicts:
	configure
2018-11-08 10:49:38 +09:00
6f9fef2b13 procfs: Make /proc/<PID>/mem unwritable
refs: #1177
Change-Id: Ibb319221155547febf9126e05a9e322bd9f140cc
2018-10-26 08:58:31 +00:00
cc1d39e55d mcctrl_perf_enable: Fix type of integer constant
Change-Id: Ib98eca85a9962520dafdd08b8fc223a6a83bafd0
2018-10-24 14:56:26 +09:00
fd8bed670e ihk_os_setperfevent: Return number of registered events
In addition to that, mcctrl_perf_set is modified so that it updates
usrdata->perf_event_num with number of registered events.

Change-Id: I3f343176f55b06d3baab0b0fe34e240f39706cf6
Fujitsu: POSTK_DEBUG_TEMP_FIX_80
2018-10-24 06:16:41 +00:00
24a3b236a0 Update .gitmodules to point IHK at github
Change-Id: I712f4cf2fb012d2b268f0881a156268024df57b9
2018-10-24 11:20:13 +09:00
27e55b8cf1 mcreboot.sh: Fix error reporting for missing argument
Change-Id: I3af99d7a117d4401c2e0a143fa74513094a53302
2018-10-18 12:06:58 +09:00
70e52faf36 flatten_strings: do not return unused trailing bits
Trailing bits were displayed in proc->saved_cmdline, displaying
uninitialized data to the user in /proc/<pid>/cmdline

Change-Id: I74831c8c68dd2f2197b35e9b49aaaae29c4c1dd5
2018-10-15 08:35:50 +00:00
8db36c3828 mcexec: do not resolve links in lookup_exec_path
This would incorrectly make "mcexec sh -c './script.sh'" run with
/bin/bash instead of /bin/sh (which is important, because bash behaviour
changes depending on how it is invoked)

Change-Id: I80610cf442c6c3ecacfa23e8ed15652bc8d4e3f7
2018-10-15 08:35:41 +00:00
06dd71a7e0 Revert "procfs: add '/proc/pid/stat' to mckernel side and fix its comm"
This reverts commit b70d470e20.

That commit landed too fast after a mistake during the migration from the
old to the new gerrit, which did not keep the -1 vote; it needs some fixes

Change-Id: Ifc8a23e42449dfe471049270b4706e9b137e096e
2018-10-12 10:54:14 +09:00
01fe83dcb3 do_mmap: change addr to uintptr_t
Change-Id: I7df45e125387083aef7e62b046c20b7422f60f22
2018-10-11 09:24:23 +00:00
c86d168165 procfs: handle 'comm' on mckernel side
Change-Id: Ie68514ba3e5161b931b88eeee9e8a2267ee69354
2018-10-11 09:19:42 +00:00
a032dc3d1b procfs: use length from snprintf instead of recomputing
Change-Id: I75ba4cf5c2e94798d183728c11bb34032cdddf5a
2018-10-11 09:17:58 +00:00
201fa7fb55 fork: copy saved_cmdline from parent process
This fixes empty children names for forked children.

Change-Id: I9512f0981d2a241c106ee3e8500f2084ef61a660
2018-10-11 09:14:14 +00:00
dd676f7149 saved_cmdline: only allocate necessary space
Change-Id: Ibb3fe66b46485a28c15e45dca9213f42f5afaa1c
2018-10-11 09:13:15 +00:00
a751e96b1a Add mck_num_processors symbol pointing to num_processors
the 'num_processors' symbol is also used by linux, so trying to load all
symbols from linux and mckernel at the same time renders either symbol
inaccessible (the first to be seen is kept by default).

This provides an alternate name for the mckernel symbol, thus letting us
access both more easily if required.

Change-Id: I8074d4f9f9ac45717df9a8df16be710ff762e161
2018-10-11 09:12:04 +00:00
c3bfa3f6a9 move BUG_ON, panic and kprintf define to debug.h; add BUILD_BUG_ON
these functions are more logical to keep together there as they depend
on each other.

Also add a comment about the __printf attribute, if we have a quiet
period it would be useful to enable and clear the thousands of
warnings...

Change-Id: I47d3891c9cd87da28b2883c29384959f5abd1459
2018-10-11 09:03:53 +00:00
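
Note on BUILD_BUG_ON above: the usual idea is to turn a false compile-time condition into an invalid array size so the build fails. The sketch below shows one common form of the macro next to a runtime BUG_ON; it is an assumption about the general technique, not the definition added to debug.h. `struct packet_header` and `checked_header_len` are hypothetical, and BUG_ON is modelled with assert() where the kernel would panic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compile-time check: a true condition yields a negative array size. */
#define BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/* Runtime check, modelled with assert(); a kernel would panic instead. */
#define BUG_ON(cond) assert(!(cond))

struct packet_header {		/* hypothetical structure */
	uint32_t type;
	uint32_t len;
};

uint32_t checked_header_len(const struct packet_header *hdr)
{
	/* Breaks the build if the layout ever changes size. */
	BUILD_BUG_ON(sizeof(struct packet_header) != 8);
	/* Caught only when the code actually runs. */
	BUG_ON(hdr == NULL);
	return hdr->len;
}
```
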
1e1fa4f70d trivial warnings fixes (unused variable/function)
Change-Id: I71cedd2c09eeb5d2c2fd2e988dfdde0877627abc
2018-10-11 09:03:53 +00:00
39f9d7fdff Handle hugetlbfs file mapping
Hugetlbfs file mappings are handled differently than regular files:
 - pager_req_create will tell us the file is in a hugetlbfs
 - allocate memory upfront, we need to fail if not enough memory
 - the memory needs to be given again if another process maps the same
   file

This implementation still has some hacks, in particular, the memory
needs to be freed when all mappings are done and the file has been
deleted/closed by all processes.
We cannot know when the file is closed/unlinked easily, so clean up
memory when all processes have exited.

To test, install libhugetlbfs and link a program with the additional
LDFLAGS += -B /usr/share/libhugetlbfs -Wl,--hugetlbfs-align

Then run with HUGETLB_ELFMAP=RW set, you can check this works with
HUGETLB_DEBUG=1 HUGETLB_VERBOSE=2

Change-Id: I327920ff06efd82e91b319b27319f41912169af1
2018-10-11 08:54:13 +00:00
3e3ccf377c compiler.h: add READ_ONCE/WRITE_ONCE macro
These macros are needed to make sure the compiler does not optimize away
atomic constructs such as "while (!READ_ONCE(foo))" loops that do not
modify foo within the loop

Also move the barrier() define where it belongs while we are here, it is
needed for READ_ONCE/WRITE_ONCE and including ihk/cpu.h here causes
include loops

Change-Id: Ia533a849ed674719ccbc0495be47d22a3c47b8f8
2018-10-11 08:54:13 +00:00
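
Note on READ_ONCE/WRITE_ONCE above: a simplified way to get this effect is to access the variable through a volatile-qualified pointer, which forces the compiler to emit a real load or store each time. The sketch below assumes GCC or Clang (`__typeof__`) and scalar variables; it is not the definition added to compiler.h, and `spin_until_set` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified forms for scalar types; relies on the __typeof__ extension. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/*
 * Poll *flag until it becomes non-zero or the spin budget runs out.
 * Without READ_ONCE the compiler may hoist the load out of the loop,
 * because nothing inside the loop modifies *flag.
 * Returns 1 if the flag was seen set, 0 on giving up.
 */
int spin_until_set(int *flag, unsigned long max_spins)
{
	assert(flag != NULL);
	while (!READ_ONCE(*flag)) {
		if (max_spins-- == 0)
			return 0;
	}
	return 1;
}
```

These macros only constrain the compiler; they do not by themselves order accesses between CPUs.
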
13e71ac9dc pager: minor cleanups
- remove unused MF_END (that only makes sense for enums without holes,
  this one is a set of bits masks)
- remove useless goto in pager_req_create()
- init maxprot to 0 from the start, it's not used in the error cases
  (except for debug print)

Change-Id: Ic56c0754824b99f8a7e45fa8e99b8fe3e7c7e592
2018-10-11 08:54:13 +00:00
b1681f4a3a mcexec/execve: fix shebangs handling
There were mainly two problems with shebangs:
 - Suffix arguments handling e.g. '#!/bin/sh -x'
 - Recursive handling e.g. script1 fetches '#!/path/to/script2'
and script2 itself has a shebang
 - (did I say two?) running shebang would replace argv[optind] instead
of appending e.g. script with '#!/bin/sh' and running './script -c'
would run '/bin/sh -c' instead of '/bin/sh ./script -c'

There also are two places where this needs parsing:
 - starting a fresh program from mcexec
 - starting a new program from execve in mcexec

The first was easy to fix as we already had argv around, but the latter
required a new way to transfer the 'new argv elements from the script'
to mckernel to append before its argv -- it used to be 'desc->shell_path'
but that was no longer used at some point and just one keyword is not
enough to handle this properly.

This commit does:
 - Refactors the lookup_path + load_elf_desc sequence, previously done at
most twice, into its own function that loops indefinitely, and uses it in
both situations described above
 - Transmits the argv addition in the transfer to mckernel after the
desc; mckernel allocates 4 pages (hardcoded) for the descs and we will
hopefully have room for the script arguments on top of that... (there is
no guard!!!)
 - Change flatten_strings to allow prepending a flattened string instead
of a single string.
Note that the flatten_strings change also brought in a difference in the
format: to have the full length embedded within the string, the last
slot, which used to be zeroes, now contains the position of the end of the
buffer (where the last+1 string would be if there had been one).
This required a trivial change in mckernel prepare args function that
used this property for no real reason.

Hopefully things work™, this probably warrants adding a couple of new
ostests...
 - create a couple of scripts with recursive invocation/arguments and
check their own argv.
 - execute "mcexec script args" and "mcexec sh -c 'script args'"

Change-Id: I2cf9cde5c07c9293f730de89c9731bd93dbfa789
Refs: #1115
2018-10-04 14:31:02 +09:00
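
Note on the shebang fix above: the two argv rules it describes (keep the suffix argument, and prepend the interpreter instead of replacing the script name) can be sketched as follows. This is an illustration under assumptions, not the mcexec implementation: `parse_shebang` and `prepend_shebang` are hypothetical names, and everything after the interpreter is treated as one single argument, which is how Linux handles it. Recursive shebangs would be handled by repeating both steps on the interpreter until it is not a script.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split "#!interp [arg]" in place.
 * Returns 0 on success, -1 if the line is not a usable shebang.
 */
int parse_shebang(char *line, char **interp, char **arg)
{
	char *p, *end;

	*interp = NULL;
	*arg = NULL;
	if (strncmp(line, "#!", 2) != 0)
		return -1;
	line[strcspn(line, "\n")] = '\0';	/* keep the first line only */

	p = line + 2;
	p += strspn(p, " \t");
	if (*p == '\0')
		return -1;
	*interp = p;

	p += strcspn(p, " \t");
	if (*p == '\0')
		return 0;			/* no suffix argument */
	*p++ = '\0';
	p += strspn(p, " \t");

	/* Trim trailing blanks; what remains is one single argument. */
	end = p + strlen(p);
	while (end > p && (end[-1] == ' ' || end[-1] == '\t'))
		*--end = '\0';
	if (*p != '\0')
		*arg = p;
	return 0;
}

/*
 * Build "interp [arg] script script-args... NULL" in out[].
 * argv[0] is the script path. Returns the new argc, or -1 if out[]
 * cannot hold the result.
 */
int prepend_shebang(char *interp, char *arg, char **argv, int argc,
		    char **out, int out_len)
{
	int n = 0, i;

	assert(interp != NULL);
	if (argc + 3 > out_len)
		return -1;
	out[n++] = interp;
	if (arg)
		out[n++] = arg;
	for (i = 0; i < argc; i++)
		out[n++] = argv[i];	/* script name is kept, not replaced */
	out[n] = NULL;
	return n;
}
```

With a script starting with '#!/bin/sh -x', running './script -c' yields '/bin/sh -x ./script -c'.
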
1226e692d9 mcstat: Install mcstat.1
Change-Id: Id5af2f56ef9cc9c444bfc0500190f52ffc779936
2018-10-04 02:52:18 +00:00
73ea4b1ce9 ihk_os_getperfevent,setperfevent: Return -ETIME when IKC timeouts
Change the return value from -EINVAL to -ETIME.

Refs: #1167
Change-Id: I87fa57bb45d0036b7e4b25366aa7b7ce6fb2c764
2018-10-04 02:44:22 +00:00
09f663c246 mcctrl procfs: check entry was returned before using it
Change-Id: If66e95d217d1045e2e65bc5978bba020e3fa7c0d
Refs: #1116
2018-10-04 02:41:16 +00:00
9b77630c8b mcexec: readlink and use full path for reexec
This fixes comm on linux side, showing mcexec instead of 'exe'

Change-Id: I9345d7a23dccb36b3a1e17fd3e7491eaeca54e5b
2018-10-04 01:03:10 +00:00
b70d470e20 procfs: add '/proc/pid/stat' to mckernel side and fix its comm
This lets ps show the proper executable name instead of mcexec's comm
on linux side

Change-Id: I62732037451f129fc2e905357ebdc351bf7f6d2d
Refs: #1114
2018-10-04 01:01:19 +00:00
ecc850dfef procfs/do_fork: wait until procfs entries are registered
Do not return from fork() until mcctrl side has created mckernel's
procfs entries for the child PID.

This fixes programs doing fork() immediately followed by opening
/proc/<child pid>/something, and would get some error

Refs: #1189
Change-Id: Ie10ea56b65c55f59e96a1ab6ef83a1070e36048d
2018-10-04 01:00:52 +00:00
b11377f2e9 Increase IKC master channel size
Change-Id: I183878bb22b848e1230f8028947cf46485293471
2018-10-03 06:23:17 +00:00
ed1edb152b ptrace supports threads
Fujitsu: POSTK_DEBUG_TEMP_FIX_53, POSTK_DEBUG_ARCH_DEP_44
Refs: #771, #1179, #1143
Change-Id: Ie17ece6864f0eeb0c0e550f4e369abb77980a0d0
2018-10-01 03:57:16 +00:00
28c434a230 test: Fix test for 898 and 928
Change-Id: If939dda7ccdcf568abfa42ccab7ff6be2b983cc2
2018-09-28 02:55:55 +00:00
daa234d8b9 mcexec_create_per_process_data: use copy_from_user
Refs: #1205
Change-Id: Idced73a7f88aada5fc2462b490d56603f8fe2472
2018-09-27 15:42:01 +00:00
e803698618 test: Refactor test programs
Change-Id: I77fec2f5f30f6fda3bda6f85ce00f1c2e7f7a9b3
2018-09-25 12:45:20 +09:00
c862b29d65 sched_setaffinity: Check migration after decrementing in_interrupt
refs: #1180
Change-Id: I2b3fb03066812ecc802406297084977e757092fe
2018-09-25 01:52:54 +00:00
dd58d366c3 procfs: Fix pread/pwrite to procfs failing when the specified size is bigger than 4MB
Fujitsu: POSTK_DEBUG_TEMP_FIX_43
Refs: #1018
Change-Id: I736ac69885695ef8eeababc3fcfe69a6258b4e16
2018-09-20 02:06:17 +00:00
ab284b0531 test: Add test programs for #1158
refs: #1158
Change-Id: I853dd84f5433a01da510813e9fb1276e5477f73f
2018-09-20 02:05:55 +00:00
42b9b31606 mcctrl: Propagate writecore()'s return value to caller
Fujitsu: POSTK_DEBUG_TEMP_FIX_62
Change-Id: I847dd520187cbf66fbad8140f79f62c6d5d9d5fc
2018-09-20 11:01:22 +09:00
29c5c68761 coredump: Change type of coretable.len to loff_t from int
Fujitsu: POSTK_DEBUG_TEMP_FIX_61
Change-Id: I6a27a8d477c3b3dcc12be772a15dfcff370bd2a8
2018-09-20 11:01:22 +09:00
38c08a6663 coredump: Add O_TRUNC to flags opening corefile
Fujitsu: POSTK_DEBUG_TEMP_FIX_59
Change-Id: I36c89fa894dfc0cdd170781e8ca4aab6149d4928
2018-09-20 11:01:20 +09:00
57258e7f59 coredump: Don't dump when MCK_RLIMIT_CORE is zero
Fujitsu: POSTK_DEBUG_ARCH_DEP_67
Change-Id: Ic85c793b052cde9d7fa4fe510c5daee303d370c4
2018-09-20 01:51:18 +00:00
8c33c92720 mcctrl: Switch Linux functions/structures according to the version
For get_user_pages_remote in binfmt_mcexec.c:
In 4.10 with 5b56d49fc31d ("mm: add locked parameter to
get_user_pages_remote()")
In 4.9 with 9beae1ea8930 ("mm: replace get_user_pages_remote()
write/force parameters with gup_flags")

For vmf in syscall.c, these two patches in 4.10:
82b0f8c39a38 ("mm: join struct fault_env and vm_fault")
1a29d85eb0f1 ("mm: use vmf->address instead of
vmf->virtual_address")

Fujitsu: POSTK_DEBUG_ARCH_DEP_41
Change-Id: I89a02d03169a2162ea186da1804bf48910446d11
2018-09-20 01:50:04 +00:00
a269d96978 coredump: Exclude special areas
Fujitsu: POSTK_DEBUG_TEMP_FIX_38
Refs: #1005
Change-Id: I8934d2aecf06a09469afe131347e42b48b6f67f6
2018-09-20 01:48:17 +00:00
2910818f06 execve: Fix calling ptrace_report_signal after preemption is disabled
Change-Id: I451d28d985ab330d855501597713e982b8febf4e
Refs: 1194
2018-09-20 01:31:31 +00:00
3df82d61ce test: Fix tests of "user_space"
user_space/swapout/swapout_copy_to_01.sh:
* Use ~/.mck_test_config
* Fix checking if McKernel version is written in swap-file

user_space/futex/futex_test.sh:
* Use ~/.mck_test_config

user_space/perf_event_open/perf_event_open_test.sh
* Use ~/.mck_test_config

Change-Id: Id93b207ed0e3e9ebf307073db81b40335bc5b140
2018-09-19 08:54:08 +00:00
159092c58e rusage: Refactor test programs
Change-Id: I846a6416acf903f7fa19db98d4d937c51c10b4af
2018-09-18 18:42:19 +09:00
60011718d2 add common test framework
Add new file with common functions for tests to use.

 - loads config file
 - checks for mcexec etc
 - checks for LTP and OSTEST if required
 - handle mcstop / mcreboot if required, and provide function for it

At the same time, make a few changes to mck_test_config:
 - move to ~/.mck_test_config
 - add boot params to the config, tests that require specific params can
   overwrite it
 - make the config "set-if-variable-is-empty", so someone can overwrite
   any param by setting the environment value e.g. LTP=.... ./test.sh
   will use the value given

Change-Id: Ib04112043e3eb89615dc7afaa8842a98571fab93
2018-09-14 03:30:06 +00:00
7e342751a2 do_syscall: Delegate system calls to the mcexec with the same pid
This includes the following fix:
send_syscall, do_syscall: remove argument pid

Fujitsu: POSTK_TEMP_FIX_26
Refs: #1165
Change-Id: I702362c07a28f507a5e43dd751949aefa24bc8c0
2018-09-13 16:59:47 +09:00
c23bc8d401 syscall_time: Handle by McKernel
refs: #1036
Change-Id: Ifa81b613c7ee8d95ae7cdf3dd54643f60526fa73
2018-09-13 07:44:02 +00:00
5e760db417 syscall: the signal received during system call processing is not processed.
Refs: #1176
Fujitsu: POSTK_DEBUG_TEMP_FIX_56
Change-Id: I410160ccbcef3ef49a0e37611a608bc87c97e63b
2018-09-13 07:04:11 +00:00
e4da71010c check_signal: system call restart is done only once
Fujitsu: POSTK_TEMP_FIX_66
Refs: #1009
Change-Id: Ic0f04ac6b7f6c6bb01b55fb389bf9befd56b1dd9
2018-09-13 07:00:49 +00:00
c25fb2aa39 memobj: transform memobj lock to refcounting
We had a deadlock between:
 - free_process_memory_range (take lock) -> ihk_mc_pt_free_range ->
... -> remote_flush_tlb_array_cpumask -> "/* Wait for all cores */"
and
 - obj_list_lookup() under fileobj_list_lock that disabled irqs
and thus never ack'd the remote flush

The rework is quite big but removes the need for the big lock,
although devobj and shmobj needed a new smaller lock to be
introduced - the new locks are used much more locally and
should not cause problems.

On the bright side, refcounting being moved to memobj level means
we could remove refcounting implemented separately in all object
types and simplifies code a bit.

Change-Id: I6bc8438a98b1d8edddc91c4ac33c11b88e097ebb
2018-09-12 18:03:25 +09:00
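
Note on the memobj refcounting change above: the general pattern is that every user takes a reference, and whoever drops the last one runs the release callback, so no global lock has to be held across the object's lifetime. The sketch below uses C11 atomics and a hypothetical `struct refobj`; it is not the memobj code, and the `released` field exists only to make the effect visible.

```c
#include <assert.h>
#include <stdatomic.h>

struct refobj {			/* hypothetical stand-in for a memobj */
	atomic_int refcnt;
	int released;		/* set by the release callback, for illustration */
	void (*release)(struct refobj *obj);
};

/* Example release callback: a real one would free the object's resources. */
void refobj_mark_released(struct refobj *obj)
{
	obj->released = 1;
}

void refobj_init(struct refobj *obj, void (*release)(struct refobj *))
{
	atomic_init(&obj->refcnt, 1);	/* the creator holds the first reference */
	obj->released = 0;
	obj->release = release;
}

void refobj_get(struct refobj *obj)
{
	int old = atomic_fetch_add(&obj->refcnt, 1);

	assert(old > 0);	/* taking a reference on a dead object is a bug */
	(void)old;
}

/* Returns 1 if this call dropped the last reference and ran release(). */
int refobj_put(struct refobj *obj)
{
	if (atomic_fetch_sub(&obj->refcnt, 1) != 1)
		return 0;
	obj->release(obj);
	return 1;
}
```
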
b51886421e uti: Don't compile syscall_intercept related stuff when not specified with configure option
Change-Id: I9be8cb9b3fcae78d33a33b057c43caee23a81fc1
2018-09-05 16:29:20 +09:00
22c6c5c736 do_syscall: Call schedule() when runq_len > 1
This optimization makes the offloading thread quickly yield to
another thread. Without this, it yielded only after the interval timer
set the rescheduling flag.

Change-Id: Ida3b17ed94782d5d1af0185a96b1f50d9db8d244
2018-09-04 19:53:03 +09:00
cd00fc3a78 set_timer: Start timer when runnable thread count is bigger than one
Change-Id: Ie32799fff2936ffc057f166db5681edccdbf5920
2018-09-04 19:53:03 +09:00
00a34a8ba3 uti: util_thread: Hoist uti_desc check
Change-Id: I8c4b75140df2fe149dfe20e0a8f0bf323b5f1763
2018-09-04 19:53:03 +09:00
8900c2cec5 uti: mcexec_uti_attr: Fix CPU binding decision
Change-Id: I4047858895503ae912e5575bb232dbbb2f915722
2018-09-04 19:53:03 +09:00
fca02ee248 uti: Add error checks to kmalloc of struct uti_attr 2018-09-04 19:53:03 +09:00
781a69617b uti: Replace data types represented as arrays with C structures
Defining C structures for the following objects:
(1) Remote and local context
(2) Stack of system call arguments / return values

Change-Id: Iafbb6c795bd765e3c78c54a255d8a1e4d4536288
2018-09-04 19:53:03 +09:00
04d4145b3e uti: Replace dead uti thread with new mcexec thread in proc->tids
Change-Id: Ic6e906dd1bfac1b07f1317732cbe0a5191831cd8
2018-09-04 19:53:03 +09:00
96aab7e215 uti: Cosmetic change in util_thread
Change-Id: I8aa75efa4dbfb798e40e75f76bacbd184dae23b8
2018-09-04 19:53:02 +09:00
98ee584ab6 uti: Change field name of release_user_space_desc
Change-Id: I18ada86ec3835198c1a947d8ceb36075d6ff2e94
2018-09-04 19:53:02 +09:00
6b031c5472 uti: Fix condition for pthread_join of mcexec threads
Change-Id: Iaeee91c197b84436f84ce4380768aa79e7f9419e
2018-09-04 19:53:02 +09:00
e42c414454 uti: Hook system calls by binary-patching glibc
(1) Add --enable-uti option. The binary-patch library is
    preloaded with this option.
(2) Binary-patching is done by syscall_intercept developed by Intel

This commit includes the following fixes:

(1) Fix do_exit() and terminate() handling
(2) Fix timing of killing mcexec threads when McKernel thread calls terminate()

Change-Id: Iad885e1e5540ed79f0808debd372463e3b8fecea
2018-09-04 19:53:02 +09:00
e613483bee uti: Add system call profile 2018-09-04 19:53:02 +09:00
c0271f4727 Add debug messages for per-process data 2018-09-04 19:53:02 +09:00
4969762f15 uti: Add usage of uti specific options to mcexec 2018-09-04 19:53:02 +09:00
09d3648e43 uti: Set PROT_EXEC to host VMA when PROT_READ is set
Set PROT_EXEC to host VMA because uti needs PROT_EXEC for text VMAs.

Meanings of prot bits of Host VMA have been changed as follows.
   RWX: No mapping or RW mapping
   RX: Read only mapping
2018-09-04 19:53:02 +09:00
4e905cd412 uti: do_syscall: Don't warn when proxy is gone
This is a normal case now that terminate() first kills all mcexec
threads and then kills McKernel threads.

Change-Id: I88380bf28b60645d361baded525d71105235c16f
2018-09-04 19:53:01 +09:00
8c11daf726 uti: Fix signal relay from mcexec to McKernel
Change-Id: I2ffd8049a0fb1637cfc6bab7fe24c6a85e5e53fc
2018-09-04 19:53:01 +09:00
5cb8a1f10f uti: Workaround not to share CPU with OpenMP threads
* Assign uti thread to the last idle CPU so that it's not shared with
  an OpenMP thread

Change-Id: Ia42cae056ce81fde9b6dab6286b39a52f3c9e172
2018-09-04 19:53:01 +09:00
dbba7dea18 uti: Allow only the first do_fork() call to create a uti thread 2018-09-04 19:53:01 +09:00
b6ab5911b7 uti: Identify uti thread by clone count
--uti-thread-count <count> is added to mcexec.

Change-Id: Id9ec464412a5bb71e4d9e87d05f79de22d35b067
2018-09-04 19:53:01 +09:00
b0d7f890d0 uti: Reverse-offload msync() 2018-09-04 19:53:01 +09:00
b9c0cdddab uti: Cosmetic change 2018-09-04 19:52:14 +09:00
7ee7dd5e2c uti: Allow tracer to call release_handler() for the main process
Change-Id: I934a6eefbcb87473e87c109d6b4d32c7ab486894
2018-09-04 19:52:14 +09:00
07db4a80a7 __do_in_kernel_syscall: Move ihk_ikc_release_packet from mcexec_wait_syscall
Change-Id: Ieeb5fda42dbddc9da27242f4b547c2143659f97a
2018-09-04 19:52:14 +09:00
f04e5c24ab uti: Don't call mcexec_terminate_thread() when McKernel asks mcexec to interrupt system call 2018-09-04 19:52:14 +09:00
b8bacdd2de Reference counting per-thread data
It is accompanied by the following fixes:
(1) Fix put ppd locations in mcexec_wait_syscall()
(2) Move put ptd to end of mcexec_terminate_thread_unsafe() and mcexec_ret_syscall()
(3) Add debug messages for ptd add/get/put
(4) Fix ptd-add/get/put matching in mcexec_wait_syscall()
    * Skip put when woken-up from wait_event_interruptible() by signal

Change-Id: Ib9be3f5e62a7a370197cb36c9fa7c4d79f44c314
2018-09-04 19:52:14 +09:00
a121ffc785 uti: Release packet of reply from McKernel in backward_offload() 2018-09-04 19:52:14 +09:00
88f9693390 uti: Return -ENOSYS without offloading for set_robust_list()
Change-Id: I43466e3850fd2ad68e5754d1d460438fa47f3ed4
2018-09-04 19:52:13 +09:00
124ec580a0 uti: Call do_exit when tracer isn't working and do_syscall returned -ERESTARTSYS 2018-09-04 19:52:13 +09:00
af7f61db49 uti: mcexec: Fix error check of pthread_detach
Change-Id: Idda8e060641bbd7b01c50163140a2c5f7466d193
2018-09-04 19:52:13 +09:00
ee299b5780 uti: Check size of syscall arguments for syscall_intercept
Change-Id: I747b90e1f521b08266cfc021ef4b23e2e3c7ba4c
2018-09-04 19:52:13 +09:00
c60a778c8d uti: Zero-clear struct mckernel_exec_file before initialization
Change-Id: I315008b7f5c9e66a93b80da87d1a6332d717c2aa
2018-09-04 19:52:13 +09:00
25a129ea6a uti: Disable jumping to McKernel futex code 2018-09-04 19:52:13 +09:00
8e9924c523 uti: Lock per_thread_data_hash_lock in mcctrl_put_per_proc_data() 2018-09-04 19:52:13 +09:00
c71291a429 mcctrl: Add mcexec_terminate_thread_unsafe()
Change-Id: I6ca54cdac2ab9449d40b22f7329f1a215e5aa33b
2018-09-04 19:52:13 +09:00
ba93b83d68 uti: Add __user to mcexec_terminate_thread argument
Change-Id: Ic96a91e6a892a1bd2f1d333580e28bced6a40dc0
2018-09-04 19:52:13 +09:00
c2f41ca9ad uti: Replace hand-made list of host_threads with Linux macro
Change-Id: Ib46cc9fcdd2854b7bbe21c2cc885beeb22d16dd2
2018-09-04 19:52:13 +09:00
062d7ecae3 uti: Use copy_from_user() in mcexec_terminate_thread() 2018-09-04 19:52:12 +09:00
58d038fcac uti: Fix wrong argument passed to ihk_ikc_release_packet() in mcexec_terminate_thread() 2018-09-04 19:52:12 +09:00
510310342c uti: Use fresh struct syscall_request instance when replying to syscall_backward() 2018-09-04 19:52:12 +09:00
a6198f267b uti: Offload set_robust_list to McKernel 2018-09-04 19:52:12 +09:00
5e78bd85ab uti: Fix tracer exit code for the case when create_tracer() isn't called 2018-09-04 19:52:12 +09:00
85c0c8a01f uti: Add debug messages for syscall
Change-Id: I2f96e71d5384f883f7dc568122c57d92bc1cd818
2018-09-04 19:52:12 +09:00
e29f579061 uti: Prevent user space vma from getting copied when forking 2018-09-04 19:52:12 +09:00
63703589e5 uti: Clear user space PTEs after first fork in create_tracer()
Change-Id: I60755f0cb5e84c3a5a5cd91515411a30f0995822
2018-09-04 19:52:12 +09:00
5c8c1986b5 uti: Add comment on ppd life cycle
Change-Id: Id16cf036b2d919444e8634b536fd701d996bcef2
2018-09-04 19:52:12 +09:00
e4370d235c uti: Make tracer not call mcexec_terminate_thread() when tracee is killed by signal
Change-Id: I5878c7d623ce182a7cb9578c9d5c430c1bee8e1e
2018-09-04 19:52:12 +09:00
31ac007cb5 uti: Increase CPU_HZ to 1000
Change-Id: I8619263845fd8ebabe6fc7de619a5b51ac04470a
2018-09-04 19:52:11 +09:00
56da7e2de9 uti: Allocate memory area directly to uti_desc->wp
Change-Id: Ia5a1dbf56b937d9d05cd7fa1c5eec4a5b4b7b196
2018-09-04 19:52:11 +09:00
35300e7b4f uti: Create tracer when forking
Change-Id: Ic66cf6289ac6f32a884ba1266e641ce61620a239
2018-09-04 19:52:11 +09:00
439dc0928b uti: Streamline syscall_backward() 2018-09-04 19:52:11 +09:00
4b3e58fd3d uti: Call terminate only when exit_group is called
Tracer tells McKernel side to call do_exit() in WIFSIGNALED case.

Change-Id: If85c6cbb4856036b406b11335f1384e57f26292d
2018-09-04 19:52:11 +09:00
b7cdbd6c42 uti: Enforce mcexec is destroyed and then McKernel process is destroyed 2018-09-04 19:52:11 +09:00
77f5cac2bf uti: Make tracer exit when not used
Change-Id: I3d3b2f92fa2b160ffce633c46d1b60e9079e7f1b
2018-09-04 19:52:11 +09:00
9102b176c4 uti: Make per_proc_data of tracee survive over the signal-kill of the tracee
Change-Id: I8ff1dddb526ef2fd948cfe1b8f3aa8403c2006d6
2018-09-04 19:52:11 +09:00
bb4317beaf uti: futex: Propagate -ERESTARTSYS returned by wait_event_interruptible()
Change-Id: Id36c4df0e0a8e1f64b12c635c0502f63552ba50b
2018-09-04 19:52:11 +09:00
d24b7585b7 uti: Make tracee pthread-detached
Change-Id: I672ee18739b956980901b63e55ee3ebc192b4e56
2018-09-04 19:52:11 +09:00
4438f994dc uti: Add/Modify test programs
Change-Id: I27a39d6b11af5243f93d07c31c2ef80f6727dd53
2018-09-04 19:52:11 +09:00
52afbbbc98 uti: Call into McKernel futex()
(1) Masquerade clv
(2) Fix timeout
(3) Let the mcexec thread with the same tid as the McKernel thread
    migrating to Linux handle the migration request
(4) Call create_tracer() before creating proxy related objects

Change-Id: I6b2689b70db49827f10aa7d5a4c581aa81319b55
2018-09-04 19:52:10 +09:00
460917c4a0 remote_page_fault,syscall_backward: Zero-clear waitq entry
Change-Id: I151a35004183e911aaba766a8749830e1768bfe6
2018-09-04 19:52:10 +09:00
7803468afe remote_page_fault,syscall_backward: Retry when interrupted by signal
Change-Id: Ic7d72ad9ca32bb3c8e3522e00fef1d98caf3c049
2018-09-04 19:52:10 +09:00
8f2c7d2265 Fix thread-safety issue in rus_vm_fault
Change-Id: I8640a8e0de8a0dfaee700b25e5f9e2941ac98fc8
2018-09-04 19:52:10 +09:00
c6c3a84a46 syscall: Add missing definition of thread to access thread->sigpending 2018-09-04 19:52:10 +09:00
5a7ca14fcc rus_vm_fault: Return VM_FAULT_SIGBUS when per-process data is not found 2018-09-04 19:52:10 +09:00
d7b882855a Correct comments in declaration of struct ikc_scd_packet 2018-09-04 19:52:10 +09:00
2337832e4c pager_req_release(): Correct debug messages 2018-09-04 19:52:10 +09:00
be635ceb19 terminate: Fix counting of non-leader threads
Change-Id: I8399ad553bb8e09bef508ac976e8cd56cdae8013
2018-09-04 19:51:11 +09:00
0b0b7b03d7 Prevent one CPU from getting chosen by concurrent forks
One CPU could be chosen by concurrent forks because CPU selection and
runq addition are not done atomically. So this fix makes the two steps
atomic.

Change-Id: Ib6b75ad655789385d13207e0a47fa4717dec854a
2018-09-04 19:51:11 +09:00
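
Note on the concurrent-fork fix above: the race disappears once choosing a CPU and accounting for the new thread happen inside the same critical section. The sketch below is a user-space illustration under assumptions, not the McKernel scheduler: `NR_CPUS`, `runq_len`, and both function names are hypothetical, and a C11 `atomic_flag` spinlock stands in for the kernel lock.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS 4	/* hypothetical CPU count */

static atomic_flag sched_lock = ATOMIC_FLAG_INIT;
static int runq_len[NR_CPUS];

static void sched_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&sched_lock,
						 memory_order_acquire))
		;	/* spin */
}

static void sched_lock_release(void)
{
	atomic_flag_clear_explicit(&sched_lock, memory_order_release);
}

/* Pick the least loaded CPU and enqueue on it in one critical section. */
int pick_cpu_and_enqueue(void)
{
	int cpu, best = 0;

	sched_lock_acquire();
	for (cpu = 1; cpu < NR_CPUS; cpu++)
		if (runq_len[cpu] < runq_len[best])
			best = cpu;
	/* Still under the lock: no other fork can see "best" as idle. */
	runq_len[best]++;
	sched_lock_release();
	return best;
}

void dequeue_thread(int cpu)
{
	sched_lock_acquire();
	assert(cpu >= 0 && cpu < NR_CPUS && runq_len[cpu] > 0);
	runq_len[cpu]--;
	sched_lock_release();
}
```

If the increment were done after releasing the lock, two forks could both observe the same CPU as least loaded and pick it.
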
82914c6a2e remote_page_fault: Retry when interrupted
Change-Id: Ib71a87ad03420e1918dc97da43351cb93e7d0754
2018-09-04 19:51:11 +09:00
f127dfdf1e mcexec_create_per_process_data: Zero ppd on allocation
Change-Id: I06306f30ce30ad6ddc6e8b8cab46ee39be0e4940
2018-09-04 19:51:11 +09:00
567dcd3846 Fix deadlock involving mmap_sem and memory_range_lock
Change-Id: I187246271163e708af6542c057d0a8dfde5b211e
Fujitsu: TEMP_FIX_1
Refs: #986
2018-09-04 19:51:10 +09:00
b080e0f301 spinlock: Add trylock
Change-Id: If349d7c0065609615f5df229f70c59f92bf97adf
2018-09-04 19:51:10 +09:00
ff383d96ba spinlock: rewrite spinlock to use Linux ticket head/tail format
This is a cherry-pick of 2964302d094f035242d6257d8af5450f72f9b5a7.

Change-Id: Ie8b7e825b28415dd41cc232fbeceb4653251f9e3
2018-09-04 19:51:10 +09:00
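
Note on the two spinlock commits above: a ticket lock hands out increasing ticket numbers at the tail and serves them in order at the head, and trylock succeeds only when nobody holds or waits for the lock. The sketch below is a user-space illustration with C11 atomics, not the McKernel code; it keeps head and tail in two separate fields for readability, whereas the Linux format packs them into one word.

```c
#include <assert.h>
#include <stdatomic.h>

struct ticket_lock {
	atomic_uint head;	/* ticket currently being served */
	atomic_uint tail;	/* next ticket to hand out */
};

void ticket_lock_init(struct ticket_lock *l)
{
	atomic_init(&l->head, 0);
	atomic_init(&l->tail, 0);
}

void ticket_lock(struct ticket_lock *l)
{
	unsigned int me = atomic_fetch_add(&l->tail, 1);

	while (atomic_load(&l->head) != me)
		;	/* wait for our turn, first come first served */
}

/* Returns 1 if the lock was taken, 0 if it is held or contended. */
int ticket_trylock(struct ticket_lock *l)
{
	unsigned int head = atomic_load(&l->head);
	unsigned int expected = head;

	/* Take ticket "head" only if tail == head, i.e. the lock is free. */
	return atomic_compare_exchange_strong(&l->tail, &expected, head + 1);
}

void ticket_unlock(struct ticket_lock *l)
{
	/* The lock must be held: head and tail differ while it is. */
	assert(atomic_load(&l->head) != atomic_load(&l->tail));
	atomic_fetch_add(&l->head, 1);
}
```
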
0bcd3d5de3 unimap: update ihk to unimap
Change-Id: I5b23270f9253d26031ad90bb38721a6234bd98e1
2018-09-04 19:51:10 +09:00
9d6e0319f7 atobytes(): restore postfix before return 2018-09-04 19:51:10 +09:00
0e50eb44a9 process/vm/access_ok: fix edge checks.
Add check for start/end being larger than the range we're checking.
Fix corner case where the access_check() was done on last vm range, and
we would be looking beyond last element (null deref)
2018-09-04 19:51:10 +09:00
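
Note on the access_ok edge checks above: the two pitfalls mentioned (a request reaching past the ranges being checked, and walking beyond the last range) can be illustrated with a range walk like the one below. This is a sketch under assumptions, not the McKernel function: `struct vm_range` here is a hypothetical two-field type, the array is assumed sorted and non-overlapping, and `range_access_ok` is an invented name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct vm_range {		/* hypothetical: covers [start, end) */
	uintptr_t start;
	uintptr_t end;
};

/*
 * Returns 1 if [addr, addr + len) is fully covered by the sorted,
 * non-overlapping ranges[0..nr-1], 0 otherwise.
 */
int range_access_ok(const struct vm_range *ranges, size_t nr,
		    uintptr_t addr, size_t len)
{
	uintptr_t end = addr + len;
	size_t i;

	assert(ranges != NULL || nr == 0);
	if (end < addr)
		return 0;		/* address arithmetic wrapped around */
	if (len == 0)
		return 1;

	for (i = 0; i < nr; i++) {
		if (ranges[i].end <= addr)
			continue;	/* range lies entirely below addr */
		if (ranges[i].start > addr)
			return 0;	/* hole before the next range */
		if (ranges[i].end >= end)
			return 1;	/* request fully covered */
		addr = ranges[i].end;	/* continue from the end of this range */
	}
	/* Ran out of ranges: never look beyond the last element. */
	return 0;
}
```
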
2db69d0f24 process/vm: implement access_ok() 2018-09-04 19:51:10 +09:00
a697f5e98d partitioned execution: pass process rank to LWK
Cherry-pick of d2d134d5e6a4b16a34d55d31b14614a2a91ecf47

Conflicts:
	kernel/include/process.h
2018-09-04 19:51:10 +09:00
4439b04d9f ihk_mc_get_linux_kernel_pgt(): add declaration
Cherry-pick of caff967a442907dd75f8cd878b9f2ea7608c77b2
2018-09-04 19:51:10 +09:00
38c3b2358a Exclude areas not assigned to McKernel from direct map of all phys. memory
It's enabled by adding -s to mcreboot.sh.

Cherry-pick of the following commit:

commit b5c13ce51a5a4926c2cf11c817cd0d369ac4402d
Author: Katsuya Horigome <katsuya.horigome.rj@ps.hitachi-solutions.com>
Date:   Mon Nov 20 09:40:41 2017 +0900

    Include measures to prevent memory destruction on Linux side (This is rebase commit for merging to development+hfi)
2018-09-04 19:51:10 +09:00
221ce34da2 eclair: fix MAP_KERNEL_START and apply Fujitsu's proposals
(1) Cherry-pick of 644afd8b45fc253ad7b90849e99aae354bac5b17
(2) Pass length to functions with arguments of variable length
    * POSTK_DEBUG_ARCH_DEP_38
(3) Separate architecture dependent functions/structures
    * POSTK_DEBUG_ARCH_DEP_34
(4) Fix include path
    * POSTK_DEBUG_ARCH_DEP_76
(5) Include config.h
    * POSTK_DEBUG_ARCH_DEP_33
2018-09-04 19:51:09 +09:00
4246d41007 kmalloc_header: use signed integer for target CPU id
Cherry-pick of bdb2d4d8fa94f9c0268cdfdb21af1a2a5c2bcae5
2018-09-04 19:51:09 +09:00
65df9c8084 ihk_mc_get_processor_id(): return -1 for non-McKernel CPUs
Cherry-pick of c45641e97add9fde467844d9272f2626cf4317de
2018-09-04 19:51:09 +09:00
7836aa0136 Map LWK TEXT to the end of Linux modules section (0xFFFFFFFFFE800000) 2018-09-04 19:51:09 +09:00
1cf7fad15a virt_to_phys(): fix debug messages
Cherry-pick of 46eb3b73dac75b28ead62476f017ad0f29ec4b0a
2018-09-04 19:51:09 +09:00
0076e1f5e0 mem: make McKernel kernel heap virtual addresses Linux compatible
Cherry-pick of e5334c646d2dc6fb11d419918d8139a0de583fde
2018-09-04 19:51:09 +09:00
cae6b9f154 move McKernel out of Linux kernel virtual 2018-09-04 19:51:09 +09:00
5fcbfa2eb5 page_fault_process_memory_range: Remove ihk_mc_map_virtual for CoW of device map
Device map with MAP_PRIVATE is copied when forking using copy_user_pte.
So the map isn't copied by those statements.

Fujitsu: POSTK_TEMP_FIX_14
Refs: #1039
Change-Id: I1a697ed2e003055d66a8eebd3e8d5e9e49d094ad
2018-08-30 02:21:42 +00:00
9a20cfaefb mem: Check if phys-mem is within the range of McKernel memory
Fujitsu: POSTK_DEBUG_TEMP_FIX_52
Refs: #1164
Change-Id: Idb9a6eac1d2e1df4c663c3171925c774421177fd
2018-08-30 02:18:37 +00:00
f57b0c5d4f wait: Delay wake-up parent within switch context
Fujitsu: POSTK_DEBUG_TEMP_FIX_41
Refs: #1006
Change-Id: Ia98e896505ad0f6549766604ade84550eee8bd2d
2018-08-30 02:13:51 +00:00
0fdeb254b3 switch context: Move to arch-dependent (arch_switch_context())
Fujitsu: POSTK_DEBUG_ARCH_DEP_22
Change-Id: I6faf8d9daa1e639350c2cd83db9bb27b9d37ba01
2018-08-30 02:13:34 +00:00
895a8c4099 procfs: Support multiple reads of e.g. /proc/*/maps
Refs: #1021
Change-Id: If36e1a0f3f41f0215868daf578e96775d96a59a3
2018-08-30 01:48:06 +00:00
e531ee626e mcctrl pager: handle pagers more properly
the pagers are all destroyed when linux thinks there is no process left,
but there is no synchronisation with mcexec on that and some new process
might have spawned and started using these pagers in the meantime,
leading to weird crashes because an invalid pager was used.

The reason we're cleaning up pagers when no process is left is that
mcctrl does not handle pager_req_release if the linux-side process got
killed or died before the mckernel one for some reason, so:
 - move pager_req_release to a new __do_in_kernel_irq_syscall() helper
 - have free_all_process_memory_range not set MF_HOST_RELEASED on the
memobj
 - just in case, clean up everything like before on mcctrl shutdown
instead of when no process is left.

Change-Id: I53b8b9b81b1e5b807593850af17b5ea5e8471174
Refs: #1154
2018-08-24 09:18:20 +09:00
94d093f058 fileobj_create: Suppress message on getting -ESRCH
-ESRCH from mcctrl doesn't mean an error but the file is not a regular
file and mcctrl wants McKernel to treat it as a device file.

Change-Id: Ie121f0e6a8b1f0a29c2f2cf193a51f4f52337809
2018-08-23 04:01:20 +00:00
9b8424523a mcctrl: remove rus page cache
Change-Id: Ieed7a2a0077ffde3fec8a64d2051e56a53924a42
2018-08-23 02:10:44 +00:00
ebc702624b devobj: fix object size (POSTK_DEBUG_TEMP_FIX_36)
Fujitsu: POSTK_DEBUG_TEMP_FIX_36
Change-Id: I5f020708f97b7468f19496b44c98e164d856598d
2018-08-22 07:26:50 +00:00
ea125cb58c checkpatch: remove warning on LINUX_KERNEL_VERSION and split strings
Change-Id: Ia22f3106208c6ddf46a767e142b8842373e9d6b5
2018-08-22 07:14:48 +00:00
689a799bb9 mcctrl prepare_image: return reserve_user_space error
Change-Id: I00556cb58b12acca888f9512c144a3ce3f5332b1
2018-08-22 07:14:40 +00:00
802b1ac14b ihk_os_getperfevent,setperfevent: Timeout IKC sent by mcctrl
Report timeout when McKernel doesn't respond to prevent the caller
from waiting forever.

Refs: #1167
Change-Id: I8bd87e43aafffdd0952198224e44195af4368883
2018-08-22 06:43:27 +00:00
affe3e9010 do_fork: Increase tid table size when allowing oversubscription
The size of tid table needs to be more than #CPUs when CPU oversubscription
is needed.

Note that the max number of simultaneous threads is the min of the
following two:
(1) Number of mcexec worker threads
(2) NR_TID defined in kernel/syscall.c

Change-Id: I425189da415e1d3a763ad62567950d001850cf0d
2018-08-22 06:42:13 +00:00
0b2169964a futex_wait_queue_me: Spin-sleep when timeout and idle_halt is specified
schedule_timeout() with idle_halt should use spin sleep because sleep
with timeout is not implemented.

Change-Id: Ia0bebcc10ddfb872bffeece7f13fb35a4791db18
2018-08-22 06:36:43 +00:00
f18d1f5383 __sched_wakeup_thread: Notify interrupt_exit() of re-schedule
Change-Id: I438eb168f818eb5649857e22bdc7e68a145872f7
2018-08-22 06:33:23 +00:00
ea35954613 linux side: replace vfs_read by kernel_read
vfs_read has been unexported in bd8df82be66 ("fs: unexport vfs_read and vfs_write")
in kernel 4.14.
kernel_read has always™ existed and is actually more appropriate: we can
remove the set_fs calls that are done in kernel_read.

The downside is that the function prototype also changed in 4.14 with
bdd1d2d3d251 ("fs: fix kernel_read prototype")...
(same with kernel_write e13ec939e96b ("fs: fix kernel_write prototype"))

Change-Id: I6f76a6387ae02b4d33bd62952d995a90b1952fc9
2018-08-22 06:27:12 +00:00
61a942acdc arm64 vdso/gettimeofday: add new includes for cpu_set_t and pte_t
Change-Id: I4035b179a173a6b29c34c73670d68a38d4dc5dc4
2018-08-22 06:17:56 +00:00
c4b4b7222e arm64: ihk_mc_perfctr_start/stop: fix prototype that was changed in x86
The functions now take a bitmask in argument since commit d7416c6f79
("perf_event: Specify counter by bit_mask on start/stop")...
Thankfully the change also induced a type modification so it was easy
to notice.

(On the other hand I'm building with --disable-perf so why the hell is
that file compiled?!)

Change-Id: Ie16367cc94e81068b70e1b80142a6394de896c4f
2018-08-22 06:14:15 +00:00
21af0351d1 arm64 syscall.c needs uio.h for struct iovec
Change-Id: I9d070d0e148636be1d9ecec8ec4dfb72f93c4ed6
2018-08-22 06:08:27 +00:00
1e1c91962e mcctrl: add missing sched_param include for newer linux
struct sched_param is defined differently since headers changed in
linux ae7e81c07 ("sched/headers...")

Change-Id: I22af79bf3d9df69d09903b2830d99426309cf911
2018-08-22 06:04:35 +00:00
b1aa94d417 arm64 arch-perfctr.h: remove duplicate enums
Some enums were redefined in lib/include/mc_perf_event.h in commit
1284060 ("support PERF_TYPE_{HARDWARE|HWCACHE} in perf_event_open")

Change-Id: I1a98699955ca7fd6135b2a7dde72ed4df77b1974
2018-08-22 06:04:08 +00:00
a6a9bac5b7 Protect more code by #ifdef PERF_ENABLE
Change-Id: I20a67c56c4d7817fdb87cc6a2aa47d68fe3eae8d
2018-08-22 06:03:12 +00:00
240a23a21b arch-lock: tentative implementation of irqflags_can_interrupt for arm64
Change-Id: I814e02e757039cab8c142c0b774ad470154454c1
2018-08-22 06:02:06 +00:00
d5108dba80 arm64 eclair build: add missing explicit libs
Change-Id: I5b6f8825430c2d495da50d868a3f54fc0b354d84
2018-08-22 05:56:20 +00:00
20368dd317 syscall: move sync_child_event up a bit
The function was between two perf functions when perf functions don't
use it...
It seemed simpler to move the function than to add an extra ifdef

Use that occasion to fix style warnings, no actual code changes were
made.

Change-Id: Ie8b5fa7968a3d5e54a690d079874db54f5e6c8c9
2018-08-22 05:55:26 +00:00
b93e14f695 arm64 signal.h: add valid_signal() function
This function was added for x86 by commit 140f813d77 ("fix:
differences in behavior of sigaction between Linux and Mckernel")

The x86 and arm files are actually pretty close and could use
factoring...

Change-Id: Ia8820fd2f824d898610b384a3e137c96aadbc911
2018-08-22 05:54:31 +00:00
3e3f3c5590 mcoverlayfs: vfs_readdir -> iterate_dir compat for el7.5
Also enable mcoverlay for new kernel version / actually build it

Change-Id: I80bc043c65cf99c3b41a54a5666ea7652e6c2bbd
2018-08-09 04:30:24 +00:00
e8f8660b73 mcctrl: lookup unexported symbols at runtime
Instead of parsing System.map, use kallsyms_lookup_name() to
get unexported symbols addresses at module loading time.

This lets mckernel work with kaslr enabled (it gets enabled by
default from el7.5 onwards)

Change-Id: Ie4349fc1145ebce44f37f1f40c16f9d75584074d
2018-08-08 06:00:20 +00:00
794684985f mcctrl syscall: remove unused walk page debug function
This saves looking up one symbol for a debug function that is not
used anywhere

Change-Id: I6a3a480ce8067b4f6f0faf9aa837119ea46888ad
2018-08-08 05:57:46 +00:00
625607e6db mcctrl sysfs_files: cleanup vfs_readdir -> iterate_dir compat
Cleanup the fix suggested by Fujitsu a bit

Change-Id: I95165b834e32a01f43eb3b4fcaca039e4d04fe86
2018-08-08 05:41:04 +00:00
05afa8b6dd mcctrl sysfs_files: vfs_readdir -> iterate_dir compat
vfs_readdir got removed in recent kernels

Change-Id: Iac9a9954afefa0f6dbcdc2c94786cf747e21e1fe
Fujitsu: POSTK_DEBUG_TEMP_FIX_22
2018-08-08 05:39:07 +00:00
6cf89076dc mcctrl handle_mm_fault compat: add el7.5 support
Change-Id: I8c7738b70ca914e857be119b7720cdc22e61ae0e
2018-08-08 05:36:35 +00:00
29a658716b configure: Create config file for test programs
Change-Id: I3ec90fed348ff535b24c8116416c6b89636c532c
2018-08-02 02:29:19 +00:00
a7c9988aeb schedule: Don't reschedule immediately when wake up on migrate
Refs: #1027
Change-Id: Ibe563c45c42611170273f1e437566c20fbef68d3
2018-08-02 02:28:25 +00:00
d4fa953975 test: Add testcase for #1001
Refs: #1001
Change-Id: I3edd750108bd3f887af1f0afe3f2651f1243062b
2018-08-02 02:24:41 +00:00
786649d2a3 perf_event: Move changing monitoring-status into perf_stop
Change-Id: I84a13c2a825de24bfdada533c7049e8770a07061
2018-08-02 02:23:38 +00:00
d7416c6f79 perf_event: Specify counter by bit_mask on start/stop
Fujitsu: POSTK_DEBUG_TEMP_FIX_30
Refs: #1002
Change-Id: Iea51e9aef78927a5033e3a226d5efc6298da056a
2018-08-02 11:22:28 +09:00
cb1522ca92 perf_event: Handle fixed-pmc in arch-dep part
Fujitsu: POSTK_DEBUG_TEMP_FIX_31
Refs: #1003
Change-Id: I66c7d18b9137894cf5764464482e2ebd5ecb9d52
2018-08-02 02:14:04 +00:00
14660a10c3 Fix to procfs read returns EIO
Refs: #1152
Change-Id: I48b330953fd7674ba1a3ac35744f9f50a5712730
2018-08-02 01:48:51 +00:00
1387c9687b Add test cases for #765
Refs: #765
Change-Id: I50d70a15d5d5ce31227cacbed4eccd49b218713b
2018-08-02 01:42:46 +00:00
ec99adde4a Add test cases for #998 and #999
Refs: #998 #999
Change-Id: I86f8857594b2446c833c1e59d53b484ef022a9ee
2018-08-02 01:42:11 +00:00
c716e87c53 execve: Clear sigaltstack and fp_regs
Fujitsu: POSTK_DEBUG_TEMP_FIX_19
Refs: #976
Change-Id: I16895eab13eecbb47b7e6da961fae82ee5e570ee
2018-08-01 15:11:05 +09:00
d898f18293 mcexec: Do not close fd returned to mckernel side
Fixes: 9a79920ef9 ("Static analysis fixes")
Change-Id: I2b51d6e288e7bb2b0f4bff579fa237d575dcb026
Reported-by: Tomoki Shirasawa <tomoki.shirasawa.kk@hitachi-solutions.com>
2018-07-30 23:27:17 +00:00
bc0759e2dc arm64 arch-lock: add missing include for cpu_set
Probably only needed for recent systems, see ihk's 3271b5e6 ("fix
compilation with recent glibc (cpu_set define change)")

The root of the problem really is that we rely on system headers for
mckernel that ought to be independent...

Change-Id: Ieb9a017e5a7697ad767087370ced7b615efc917e
2018-07-27 02:33:03 +00:00
1aa429d4f5 init_normal_area: fix warnings
 - unused variable pt_phys
 - undeclared function set_pt_large_page (move definition lower)

Change-Id: I4625b70efe8e914160b17064078c42b86a461d3e
2018-07-27 02:32:23 +00:00
1543119139 mcctrl rus_vm_fault: type changed with kernel >= 4.11
vma is part of vmf and isn't needed, so type changed (see linux 11bac80
("mm, fs: reduce fault, [...] to take only vmf"))

Change-Id: I4c023e23c7e7416ad2df2dcc0698a0032e574e4c
2018-07-27 02:31:39 +00:00
0a0a78ac2e mcctrl: replace GFP_TEMPORARY by GFP_KERNEL
See linux's commit 0ee931c4 ("mm: treewide: remove GFP_TEMPORARY
allocation flag") for a long explanation, but basically that flag
"is just cargo cult" and should be removed

Change-Id: I2147cd65b6b9ec509a72e11cc3abf1fe1561c10b
2018-07-27 02:31:00 +00:00
6999d0a3f9 bind_mount_recursive: Use lstat instead of d_type of readdir
Change-Id: I0eb8d6c7e1fa5df6dbc5962a639901546a159d04
2018-07-26 18:38:48 +09:00
f01a883971 devobj: fix out of bounds shift
Similarly, pgoff << PAGE_SHIFT would need pgoff to be unsigned to fit,
but off_t is signed.
The reason for this shift was to truncate the offset argument to be
aligned to page boundaries, do that instead
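
A minimal sketch of truncating a signed offset to a page boundary with a
mask rather than a shift. The 4 KiB page size and the helper name are
assumptions for illustration, not taken from devobj:

```c
#include <sys/types.h>

/* Assumed page size for this example only. */
#define EXAMPLE_PAGE_SIZE 4096L

/* Hypothetical helper: clear the in-page bits of a non-negative offset.
 * No left shift of a signed value is involved, so nothing can overflow. */
static inline off_t page_align_down(off_t off)
{
	return off & ~((off_t)EXAMPLE_PAGE_SIZE - 1);
}
```

With these assumptions, an offset of 4097 is truncated to 4096 and an
already aligned offset is returned unchanged.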

Change-Id: I36c3de34b1834fdb0503942a6f3212e94986effd
2018-07-26 05:20:19 +00:00
3185334c1c debug messages: implement dynamic debug
Heavily inspired off linux kernel's dynamic debug:
 * add a /sys/kernel/debug/dynamic_debug/control file
 (accessible from linux side in /sys/class/mcos/mcos0/sys/kernel/debug/dynamic_debug/control)
 * read from file to list debug statements (currently limited to 4k in size)
 * write to file with '[file foo ][func bar ][line [x][-[y]]] [+-]p' to change values

Side effects:
 * reindented all linker scripts, there is a new __verbose section
 * added string function strpbrk
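
A rough sketch of how a Linux-side tool could build and send a query in
the format above. Only the control path and the 'file foo +p' shape come
from this message; the helper names are hypothetical, and the exact
whitespace or newline handling expected by the control file is an
assumption:

```c
#include <stdio.h>

/* Linux-side path quoted in the commit message above. */
#define DDEBUG_CONTROL \
	"/sys/class/mcos/mcos0/sys/kernel/debug/dynamic_debug/control"

/* Hypothetical helper: format "file <name> +p" or "file <name> -p".
 * Returns 0 on success, -1 if the buffer is too small. */
static int ddebug_format_query(char *buf, size_t len,
			       const char *file, int enable)
{
	int n = snprintf(buf, len, "file %s %cp", file, enable ? '+' : '-');

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: write one query to a control file.
 * Returns 0 on success, -1 on any I/O error. */
static int ddebug_apply(const char *control_path, const char *query)
{
	FILE *fp = fopen(control_path, "w");

	if (!fp)
		return -1;
	if (fputs(query, fp) == EOF) {
		fclose(fp);
		return -1;
	}
	return fclose(fp) == 0 ? 0 : -1;
}
```

A caller would format a query for one source file and pass it to
ddebug_apply() together with DDEBUG_CONTROL.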

Change-Id: I36d7707274dcc3ecaf200075a31a2f0f76021059
2018-07-26 14:16:31 +09:00
bc887aab44 x86 futex: fix out of bounds shift
8 << 28 needs unsigned to fit, other shifts were done to truncate
the input, use a mask instead
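
A minimal sketch of both ideas, assuming a 32-bit int. The helper names
are hypothetical and are not the ones used in the futex code:

```c
#include <stdint.h>

/* 8 << 28 is 2^31, which does not fit in a 32-bit signed int.
 * Shifting an unsigned operand keeps the result well defined. */
static inline uint32_t high_flag(void)
{
	return UINT32_C(8) << 28;
}

/* Keep only the low `bits` bits of `value` with a mask, instead of
 * shifting left and right again to drop the upper bits. */
static inline uint32_t low_bits(uint32_t value, unsigned int bits)
{
	if (bits >= 32)
		return value;
	return value & ((UINT32_C(1) << bits) - 1);
}
```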

Change-Id: I81ba41595f4629f1df554e34392116440ff3b641
2018-07-26 05:10:36 +00:00
6f7c428a34 terminate: fix oversubscribe hang when waiting for other threads on same CPU to die
Change-Id: I8c4fbdd3aab9d0567ce5457a4a6405490608925d
2018-07-26 05:02:13 +00:00
68c702d024 process_procfs_request: Add Pid to /proc/<PID>/status
Standard UNIX tools that report process information, such as ps, need
the process ID in /proc/<PID>/status.

Using ps without Pid in /proc/<PID>/status gives:

  PID TTY          TIME CMD
 2551 pts/0    00:00:00 bash
    0 pts/0    00:00:00 exe
    0 pts/0    00:00:00 exe

With this patch:
  PID TTY          TIME CMD
 2551 pts/0    00:00:00 bash
11966 pts/0    00:00:00 exe
12619 pts/0    00:00:00 exe

Change-Id: Ic9d255cbef4d49e49bdaedcfc8e3545d9c144325
2018-07-26 05:00:21 +00:00
97273adcc5 x86_64 move_pages_smp_handler: rework initialisation
- add missing break statement
- remove duplicate memset for mpsr->status

Change-Id: I1fd1a8b2bb7bbabb32db9e7d3fc84102d9b0ff82
2018-07-26 04:59:23 +00:00
ad2cb6375a kprintf: only call eventfd() if it is safe to interrupt
Missing ARM64 implementation, cannot test right now

Change-Id: Ia05e8b7952b19bcd8fdac1f920d9bfe341be8b97
2018-07-26 04:57:30 +00:00
6df4bd8f8c Fix a few more warnings
Some are important, e.g. the seemingly harmless braces around if with dprintf,
since that dprintf is defined as empty, will screw things up and grab the next
line

Change-Id: Ie5e1cf813178ad708ff42ae5e477fbc96034471c
2018-07-26 04:52:17 +00:00
0994c3300e search_free_space: remove POSTK_DEBUG_ARCH_DEP_27 side
search_free_space changed since this was implemented and the code is
no longer compatible.
Looking at it again, the function is not used anywhere other than syscall.c
and the second function does not seem to fix anything specific so this
just removes the untested side.

Change-Id: If28d35ec4da083a40dc6936fcb21f05fb64e378a
Fujitsu: POSTK_DEBUG_ARCH_DEP_27
2018-07-26 04:43:05 +00:00
a5c3e48843 search_free_space(): manage region->map_end internally
Change-Id: If9176773868c44fa1eb801c0815c35cea9f4b54b
2018-07-26 04:43:05 +00:00
df2c993721 fileobj_create: only allocate new object if one wasn't found
Change-Id: I5e12439333bf0c9cc7dad6e3cf410bfee616f77e
2018-07-26 04:41:03 +00:00
dc8d6b740c pager_req_read: handle short read
Change-Id: Iff89046041e012a65c80a29b485ddbb636435dd0
2018-07-26 04:37:54 +00:00
c2e1b8d694 mcctrl_ikc_send_wait: fix interrupt with do_frees == NULL
do_frees is allowed to be NULL only if free_addrs_count is 0, but that
is increased to account for the wakeup_desc itself before this failure

Change-Id: Iab33712c76ae452df7044558a12745a89adb47ac
2018-07-26 04:34:03 +00:00
f6d8138e05 mcexec_wait_syscall: requeue potential request on interrupted wait
Change-Id: Id7a324f18ebb8c81f05bd8362e19d9314a445308
2018-07-26 04:31:34 +00:00
9d587dcbe8 fileobj_release: do not notify linux of surplus refs
Surplus refs on the linux side will not change anything, so spare
ourselves a message.
The final message will free all refs at once when the object is
destroyed.

Change-Id: Ie086b9dda663729962037c67e8233370509234a5
2018-07-26 04:08:43 +00:00
eb675818c7 x86 mmap: fix out of bounds shift
0x3F << MAP_HUGE_SHIFT is too big to fit in signed int,
make it unsigned

Change-Id: I0e476b80ff51a8e141c90da6f985ba18a3438752
2018-07-26 03:50:44 +00:00
3ce7763715 x86 mem init: do not map identity mapping
init_normal_area was mapping identity lookups (phys = virt) from 0,
leading to many undetected null pointer dereferences in init_pt (but
not in new process page tables leading to odd behaviour)

This also makes the code use the set_pt_large_page() function, cleaning
it up a bit

Change-Id: I22889031de26a7e48501b0eb4d453ca62e671835
2018-07-26 03:50:44 +00:00
fd429ecc5b rusage_private: fix null pointer dereference
Change-Id: Id1f066699a41c249203073c5937e34012f5fe6c3
2018-07-26 03:50:44 +00:00
ed7f5abc28 schedule: fix null pointer dereferences
Change-Id: I1d4b0a2fabb5810a89cca4c6a0a837db3a9813ee
2018-07-26 03:50:44 +00:00
79e5026f01 x86 mem init: fix clearing of init_pt
memset(init_pt...) had the wrong size.

Change-Id: Idb5d0d53b3c70ee4a16a101dd265d0854cfd3b72
2018-07-26 03:50:31 +00:00
a1b50051ed mcexec: always compile debug statements
This helps catching errors like accessing a field that no longer exists
in a debug print that wasn't compiled...

Change-Id: If6c862ea2b866f819195aae93c7fd68e610fe48e
2018-07-26 03:38:00 +00:00
9a79920ef9 Static analysis fixes
Change-Id: I7bc42545a1c497f704d7bfa6ea1b7e3893acc697
2018-07-26 03:36:50 +00:00
141fa5120e git hooks: use correct directory for submodule
Change-Id: I7a39021dc02212065612b21cafcb6c653e2280f0
2018-07-26 03:29:43 +00:00
699cb4f88c arm64/arch-lock: typedef mcs_lock_t
Was done in x86_64 for fileobj in commit 249bda4aef ("fileobj: use
MCS locks for per-file page hash")

Change-Id: I61957de336b6657687803e6288afed9360a42032
2018-07-26 03:28:40 +00:00
bc3e6ded65 disable sse for everyone
GCC optimizes big switches with sse, so we could clobber users' floating
point registers when they do a syscall

Reproducer:
```
 #include <stdio.h>
 #include <stdlib.h>

 union num {
 	float f;
 	unsigned long long i;
 };

 #define WORKSIZE (1024 * 1024 * 32)

 int main(int argc, char **argv) {
 	char *work = malloc(WORKSIZE);
 	char *fromaddr;
 	char sink;
 	union num r;
 	unsigned long long int offset;

 	r.f = drand48();
 	printf("r: %llx\n", (long long)r.i);
 	offset = (long long int)(r.f * (double)WORKSIZE);
 	fromaddr = work + offset;
 	printf("%e %llx %llx\n", r.f, offset, fromaddr);
 	sink = *fromaddr;

 	return 0;
 }
```

Change-Id: I7bb0883ec8ef2f245ab98064e308025422afc115
2018-07-26 03:26:25 +00:00
eae5c40f60 init_process_stack: Support "ulimit -s unlimited"
Refs: #1109
Change-Id: I395f012fd747cb6a2f93be71e34c7f6f3666ed67
2018-07-26 02:40:27 +00:00
0c7384f980 Add test cases for #840
Refs: #840
Change-Id: Ie29867d29ba6a25cfac77b95b8effc2f057aae14
2018-07-26 02:39:24 +00:00
67ebcca74d Fix to VMAP virtual address leak
Fujitsu: POSTK_DEBUG_TEMP_FIX_51
Refs: #1024
Change-Id: I1692ee4f004cb4d1f725baf47a8ed31fce1bf42a
2018-07-26 02:17:55 +00:00
3d365b0d7a add ihk as submodule
Change-Id: I512255a96d0d95795bd0d803289fffe4394eb7ec
2018-07-26 01:50:48 +00:00
94e96927a6 mremap: Do nothing when no size change and !MREMAP_FIXED
Behave in the same way as Linux which returns old_address when
old_size == new_size && !MREMAP_FIXED.
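
A small user-space check of the Linux behaviour described above, as a
sketch only. The helper name is hypothetical; it uses the standard
mmap/mremap/munmap calls on an anonymous mapping:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical check: returns 1 if mremap() with an unchanged size and
 * no flags hands back the original address, 0 if it does not, and -1
 * if the initial mapping could not be created. */
static int mremap_same_size_is_noop(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	void *q;
	int same;

	if (p == MAP_FAILED)
		return -1;

	q = mremap(p, len, len, 0);	/* old_size == new_size, !MREMAP_FIXED */
	same = (q == p);

	/* Unmap whichever address is still valid. */
	munmap(q == MAP_FAILED ? p : q, len);
	return same;
}
```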

Refs: #1112
Change-Id: Ice1421a8a77f962d087de8475aa2cd40c59be5f7
2018-07-26 01:49:01 +00:00
3636c8e7e4 setrlimit: Check arguments in the same order as in Linux
(1) Check if rlim's address is valid
(2) Check if soft-limit does not exceed hard-limit

Fujitsu: POSTK_DEBUG_TEMP_FIX_3
Refs: #1050
Change-Id: I5bf1008ce172f9dff64ec89b1f97614926abaf13
2018-07-26 01:48:05 +00:00
b920da5103 execve: Use interp in shebang as is
Fujitsu: POSTK_DEBUG_TEMP_FIX_9
Refs: #995
Change-Id: I09751d13c4fecd68087d47815029c0b65e51f18a
2018-07-26 01:46:22 +00:00
f1a40a409f perf_event: Include list.h by itself
Fujitsu: POSTK_DEBUG_TEMP_FIX_32
Refs: #1004
Change-Id: I8670477cf498ac98df971f2c0288f335a989f675
2018-07-26 00:45:57 +00:00
4ce4c9f264 init_process: Inherit parent cpu_set
Fujitsu: POSTK_DEBUG_TEMP_FIX_69
Refs: #1028
Change-Id: I1628bb5bf35fa670bb0019e1f3ae295277b1566e
2018-07-26 00:44:41 +00:00
e770a22fa5 scripts: add checkpatch.pl & git hooks
Change-Id: I29e5f7a99e8dd92511c0b1d099f3e1a2f37d7a72
2018-07-12 00:55:58 +00:00
9bb8076dc0 shmget: Make shmobj that underwent IPC_RMID invisible to shmget
Refs: #926
Change-Id: I16120623b581da5d5d484fd05d5111788c8ad5e2
2018-07-10 02:13:00 +00:00
229b041320 test: Add testcase for #1122
Refs: #1122
Change-Id: Ieafee7469d1397461abf05552ffad0bfea1dd6cd
2018-07-10 02:12:23 +00:00
e1f204de4a test: Add testcase for #1112
Refs: #1112
Change-Id: I0041366d8dcf035a09fbb59a5dbd5c94cae0d65e
2018-07-10 02:12:04 +00:00
c6cc0bf07a test: Add testcase for #1111
Refs: #1111
Change-Id: Ifdf25a9ce98ef495200daf1c24d7ac2c81b3ef17
2018-07-10 02:11:45 +00:00
04e54ead5d test: Add testcase for #1031
Refs: #1031
Change-Id: I6a51596b84a97329ba7d5b765c8471246dcf85df
2018-07-10 02:11:13 +00:00
992705d465 pager_get_path: Append \0 to path
Change-Id: Iaabd89a649bb20b37b35cd345da0f468fd5dd0b5
2018-07-10 02:10:19 +00:00
ae09d979b6 Add testcases for #1141
Refs: #1141
Change-Id: I50d1ac6248e9dfc33c372b825c10cf0bd8b61d3e
2018-07-10 02:09:38 +00:00
14d819eff4 configure.ac: Update version number
Change-Id: Ia497306551aa103d80eb5a307ca7196940ea7e14
2018-07-06 18:28:26 +09:00
1cbe389879 do_fork: Propagate error code returned by mcexec
Refs: #731
Change-Id: I7eb52c1c76103d65d108b18b7beaf8041b51cd03
2018-07-03 09:19:54 +00:00
0758f6254e headers: declare void arguments for functions
Declaring a function with an empty parameter list means that any
arguments are accepted; this is not what is meant here.
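
For illustration, a minimal sketch of the difference, with hypothetical
function names. It applies to C standards before C23; from C23 on, empty
parentheses mean the same as (void):

```c
static int calls;

/* Before C23: empty parentheses leave the parameters unspecified, so
 * the compiler does not reject a call such as bump_unspecified(1, 2). */
int bump_unspecified();

/* (void) is a real prototype: passing any argument is a compile error. */
int bump_strict(void);

int bump_unspecified()
{
	return ++calls;
}

int bump_strict(void)
{
	return ++calls;
}
```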

Change-Id: Ide651c1dec973d4b8709cf00646988f4c4f3acdd
2018-07-03 09:18:25 +00:00
db732a245c execve: Reinitialize vm_regions's map area on execve
Reinitialize vm->region.map_end in sys_execve()
in the same way as when creating a new process.

Change-Id: I7fc048a187e619ba4b5a578976e2a6774d13a6a7
2018-07-03 08:58:50 +00:00
08f2840f7d procfs: Show file names in /proc/<PID>/maps
Refs: #1065
Change-Id: I2f1603b02d12e60972c8f2e5f059d0025f4ceaea
2018-07-03 08:56:44 +00:00
521bdc6181 mremap: Fix type of size arguments (from ssize_t to size_t)
Refs: #1112
Change-Id: I3987d3a20a1e7c4b60f3880e91a670bc0bdc240f
2018-07-03 08:54:14 +00:00
e7b6a3472b sched_getaffinity: Check arguments in the same order as in Linux
(1) Check if size is large enough
(2) Check if size is positive

Fujitsu: POSTK_DEBUG_TEMP_FIX_5
Refs: #1121
Change-Id: I3e41720c89ef89294820f7f4fa8df1a69a7011b0
2018-07-03 08:53:30 +00:00
11756d96ef mmap, mremap: Check arguments in the same order as in Linux
Refs: #1137
Change-Id: I4fd2ac83b013a2741a3facce4dd7e0c37b14fd25
2018-07-03 08:41:30 +00:00
f185be06eb mcoverlay-create.sh, mcoverlay-destroy.sh: Return -EINVAL on failure
Change-Id: I0561df33e8068327bf2d921c8facac7b18ac8866
2018-07-03 05:19:55 +00:00
854bc85602 mcctrl: convert send_signal to mcctrl_ihk_send_wait
Change-Id: Ibd2fc834444d83341a96579f0c9c22080a53e8fa
2018-07-02 16:11:01 +09:00
ab8fe0bbbf mcctrl: convert perf ctrl ioctls to mcctrl_ihk_send_wait
While we are here, also optimize code a bit: perf_desc does not need
to be allocated for every cpu; and fix coding style.

Change-Id: Iad19fed08205d38594fd3f1b7ddf2b19a9cf0d9d
2018-07-02 16:11:01 +09:00
b87c06cbcb mcctrl_ikc_send_wait: give possibility to use pre-allocated desc
Change-Id: I1afbabe792648bbf2c5a9a38ebbfba8ea9060d06
2018-07-02 16:11:01 +09:00
b939ca9370 mcctrl: refactor prepare_image into new generic ikc send&wait
Many ikc messages expecting a reply use wait_event_interruptible
incorrectly, freeing memory that could still be used on the other side.

This commit implements a generic ikc send and wait helper that handles
memory management and ownership properly:
 - if the message succeeds and a reply comes back normally, the memory
is freed by the caller as usual
 - if the wait fails (signal before the reply comes or timeout) then the
memory is owned by ikc and will be freed when the reply comes back
later
 - if the reply never comes, the memory is freed at shutdown when
destroying ikc channels
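
The ownership rules above can be modelled in user space roughly as
follows. This is a simplified sketch and not the mcctrl code; every type
and function name here is hypothetical:

```c
#include <stdlib.h>

/* Who is currently responsible for freeing the request buffer. */
enum buf_owner { OWNED_BY_CALLER, OWNED_BY_CHANNEL, ALREADY_FREED };

struct wait_desc {
	enum buf_owner owner;
	void *buf;
};

static void desc_free(struct wait_desc *d)
{
	free(d->buf);
	d->buf = NULL;
	d->owner = ALREADY_FREED;
}

/* Caller side: wait_failed != 0 models a signal or timeout that
 * arrives before the reply. */
static int send_wait(struct wait_desc *d, int wait_failed)
{
	if (wait_failed) {
		/* Hand the buffer over; the caller must not free it. */
		d->owner = OWNED_BY_CHANNEL;
		return -1;
	}
	desc_free(d);	/* normal case: the caller frees as usual */
	return 0;
}

/* Channel side: run when a late reply arrives and again when the
 * channel is destroyed, so an orphaned buffer is freed exactly once. */
static void channel_release(struct wait_desc *d)
{
	if (d->owner == OWNED_BY_CHANNEL)
		desc_free(d);
}
```

The point of the model is that exactly one side owns the buffer at any
time, so a late reply never touches memory the caller already freed.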

Refs: #1076
Change-Id: I7f348d9029a6ad56ba9a50c836105ec39fa14943
2018-07-02 04:34:44 +00:00
ec202a1ca9 execve: fix execve with oversubscribing
Issue: #1072
Change-Id: I88446e075b60de3c94cad2a19a4731e58037ea63
2018-07-02 13:31:23 +09:00
d4471df94e execve: use thread variable instead of cpu_local_var(current)
This fixes crashes _without_ oversubscribing with a process doing
fork() execve() / wait() in a loop

Issue: #1132
Change-Id: I98531f4643ad6b6a8f750a1a3f05b9ff3ebfd50f
2018-07-02 04:28:23 +00:00
a6ac4acf40 rusage: Fix initialization of rusage->num_processors
Refs: #1064
Change-Id: I4c04127a766b9c71f726113b8b7d6416ff971bff
2018-06-28 11:24:47 +09:00
8ff754c466 test: delete garbage files 2018-06-21 13:50:40 +09:00
90dba00742 fix return value of sched_getaffinity (POSTK_DEBUG_TEMP_FIX_58) refs#1122
Change-Id: I3d7b9b74eec268dd49b703600ca56df1d2933bd9
2018-06-21 09:15:22 +09:00
86ae1380e4 configure.ac: Move man directory to share/man
Change-Id: Idaa5c0f61fbbe3bda4697bc59487f562e09ff2d6
2018-06-11 13:13:13 +09:00
9bb48186e6 add testcases for #732 #1065 #1102 2018-06-07 10:11:23 +09:00
139123dc12 move test programs 2018-06-07 10:08:48 +09:00
6602cf442c add test cases 2018-06-07 10:04:33 +09:00
f148863586 pager_req_map(): do not take mmap_sem if not needed 2018-06-07 07:17:41 +09:00
ec375da27a pager_req_create(): prefetch libiomp, libpthread and libc 2018-06-07 07:17:31 +09:00
c50e7c1029 prepare_process_ranges_args_envs(): fix saving cmdline 2018-06-07 07:17:21 +09:00
5f4dbb2c71 mprotect: Fix early exit condition on page table attribute 2018-06-06 01:39:44 +09:00
328609269b Clean up "Detect hang of McKernel in mcexec"
* Clean up error checks
2018-06-01 14:51:07 +09:00
056fdb2633 Fix "Detect hang of McKernel in mcexec"
1. Call exit() when detecting hang
2. Clean up error checks
2018-06-01 14:21:19 +09:00
09d0a59e22 Detect hang of McKernel in mcexec
mcexec spawns a thread which detects hang of McKernel by using
ihk_os_get_eventfd().

Change-Id: I6cf0ee0c1f0c2c31a8422224b2105f64a9b9ab93
2018-06-01 10:44:34 +09:00
511555c8cb fix: /proc/<PID>/maps outputs an unnecessary NULL character 2018-05-30 16:38:28 +09:00
81699345cc mprotect: do not set page table writable for cow pages
Change-Id: If8b0bb56e7dae59aa9dc3d745a4cc4e43bf4bf9a
2018-05-30 13:29:55 +09:00
130751ff66 fileobj: avoid memory leak in path recording 2018-05-14 17:46:52 +09:00
f3d18eb9de fileobj/devobj: record path name (originally by Takagi-san) 2018-05-14 17:46:52 +09:00
249bda4aef fileobj: use MCS locks for per-file page hash 2018-05-14 17:46:52 +09:00
aaa246f86f mcexec: change debug printf macros to be more tolerant to trivial format
Enabling DEBUG fails to compile. It'd be easy to fix the dprintf to dprint
but this is just as generic and we can now use dprintf everywhere
2018-05-11 09:23:46 +09:00
c52f7a5b49 syscall wait4: add _WALL (POSTK_DEBUG_ARCH_DEP_44)
Needed by strace -f
2018-05-11 09:22:54 +09:00
90a34f54c9 mcreboot.sh,mcstop+release.sh: Disable irqbalance_mck forcefully 2018-04-26 15:06:53 +09:00
bfb5080b71 pager_req_unmap: Put per-process data at exit 2018-04-10 11:35:03 +09:00
535 changed files with 47968 additions and 7468 deletions

.gitignore

@ -1,3 +1,4 @@
*~
*.o
*.elf
*.bin

.gitmodules

@ -0,0 +1,3 @@
[submodule "ihk"]
path = ihk
url = https://github.com/RIKEN-SysSoft/ihk.git

View File

@ -30,6 +30,7 @@
#include <debug-monitors.h>
#include <sysreg.h>
#include <cpufeature.h>
#include <debug.h>
#ifdef POSTK_DEBUG_ARCH_DEP_65
#include <hwcap.h>
#endif /* POSTK_DEBUG_ARCH_DEP_65 */
@ -39,16 +40,10 @@
#include "postk_print_sysreg.c"
#ifdef DEBUG_PRINT_CPU
#define dkprintf kprintf
#define ekprintf kprintf
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf kprintf
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
#define BUG_ON(condition) do { if (condition) { kprintf("PANIC: %s: %s(line:%d)\n",\
__FILE__, __FUNCTION__, __LINE__); panic(""); } } while(0)
struct cpuinfo_arm64 cpuinfo_data[NR_CPUS]; /* index is logical cpuid */
static unsigned int per_cpu_timer_val[NR_CPUS] = { 0 };
@ -1283,7 +1278,6 @@ int ihk_mc_interrupt_cpu(int cpu, int vector)
return 0;
}
#ifdef POSTK_DEBUG_ARCH_DEP_22
/*
* @ref.impl linux-linaro/arch/arm64/kernel/process.c::tls_thread_switch()
*/
@ -1309,14 +1303,13 @@ struct thread *arch_switch_context(struct thread *prev, struct thread *next)
extern void perf_start(struct mc_perf_event *event);
extern void perf_reset(struct mc_perf_event *event);
struct thread *last;
#ifdef POSTK_DEBUG_TEMP_FIX_41 /* early to wait4() wakeup for ptrace, fix. */
struct mcs_rwlock_node_irqsave lock;
#endif /* POSTK_DEBUG_TEMP_FIX_41 */
/* Set up new TLS.. */
dkprintf("[%d] arch_switch_context: tlsblock_base: 0x%lX\n",
ihk_mc_get_processor_id(), next->tlsblock_base);
#ifdef ENABLE_PERF
/* Performance monitoring inherit */
if(next->proc->monitoring_event) {
if(next->proc->perf_status == PP_RESET)
@ -1326,10 +1319,10 @@ struct thread *arch_switch_context(struct thread *prev, struct thread *next)
perf_start(next->proc->monitoring_event);
}
}
#endif /*ENABLE_PERF*/
if (likely(prev)) {
tls_thread_switch(prev, next);
#ifdef POSTK_DEBUG_TEMP_FIX_41 /* early to wait4() wakeup for ptrace, fix. */
mcs_rwlock_writer_lock(&prev->proc->update_lock, &lock);
if (prev->proc->status & (PS_DELAY_STOPPED | PS_DELAY_TRACED)) {
switch (prev->proc->status) {
@ -1343,11 +1336,12 @@ struct thread *arch_switch_context(struct thread *prev, struct thread *next)
break;
}
mcs_rwlock_writer_unlock(&prev->proc->update_lock, &lock);
/* Wake up the parent who tried wait4 and sleeping */
waitq_wakeup(&prev->proc->parent->waitpid_q);
} else {
mcs_rwlock_writer_unlock(&prev->proc->update_lock, &lock);
}
#endif /* POSTK_DEBUG_TEMP_FIX_41 */
last = ihk_mc_switch_context(&prev->ctx, &next->ctx, prev);
}
@ -1357,7 +1351,6 @@ struct thread *arch_switch_context(struct thread *prev, struct thread *next)
return last;
}
#endif /* POSTK_DEBUG_ARCH_DEP_22 */
/*@
@ requires \valid(thread);
@ -1439,8 +1432,7 @@ void copy_fp_regs(struct thread *from, struct thread *to)
}
}
void
clear_fp_regs(struct thread *thread)
void clear_fp_regs(void)
{
if (likely(elf_hwcap & (HWCAP_FP | HWCAP_ASIMD))) {
#ifdef CONFIG_ARM64_SVE
@ -1477,7 +1469,7 @@ restore_fp_regs(struct thread *thread)
if (likely(elf_hwcap & (HWCAP_FP | HWCAP_ASIMD))) {
if (!thread->fp_regs) {
// only clear fpregs.
clear_fp_regs(thread);
clear_fp_regs();
return;
}
thread_fpsimd_load(thread);

View File

@ -9,20 +9,16 @@
#include <prctl.h>
#include <cpufeature.h>
#include <kmalloc.h>
#include <debug.h>
#include <process.h>
//#define DEBUG_PRINT_FPSIMD
#ifdef DEBUG_PRINT_FPSIMD
#define dkprintf kprintf
#define ekprintf kprintf
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf kprintf
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
#define BUG_ON(condition) do { if (condition) { kprintf("PANIC: %s: %s(line:%d)\n",\
__FILE__, __FUNCTION__, __LINE__); panic(""); } } while(0)
#ifdef CONFIG_ARM64_SVE
/* Maximum supported vector length across all CPUs (initially poisoned) */
@ -73,9 +69,6 @@ static int get_nr_threads(struct process *proc)
return nr_threads;
}
extern void save_fp_regs(struct thread *thread);
extern void clear_fp_regs(struct thread *thread);
extern void restore_fp_regs(struct thread *thread);
/* @ref.impl arch/arm64/kernel/fpsimd.c::sve_set_vector_length */
int sve_set_vector_length(struct thread *thread,
unsigned long vl, unsigned long flags)
@ -129,7 +122,7 @@ int sve_set_vector_length(struct thread *thread,
/* for self at prctl syscall */
if (thread == cpu_local_var(current)) {
save_fp_regs(thread);
clear_fp_regs(thread);
clear_fp_regs();
thread_sve_to_fpsimd(thread, &fp_regs);
sve_free(thread);

View File

@ -7,6 +7,7 @@
#include <process.h>
#include <string.h>
#include <elfcore.h>
#include <debug.h>
#define align32(x) ((((x) + 3) / 4) * 4)
#define alignpage(x) ((((x) + (PAGE_SIZE) - 1) / (PAGE_SIZE)) * (PAGE_SIZE))
@ -14,11 +15,8 @@
//#define DEBUG_PRINT_GENCORE
#ifdef DEBUG_PRINT_GENCORE
#define dkprintf(...) kprintf(__VA_ARGS__)
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
/*

View File

@ -6,6 +6,8 @@
#include <ihk/cpu.h>
#include <ihk/atomic.h>
#include "affinity.h"
#include <lwk/compiler.h>
//#define DEBUG_SPINLOCK
//#define DEBUG_MCS_RWLOCK
@ -152,6 +154,8 @@ typedef struct mcs_lock_node {
unsigned long irqsave;
} __attribute__((aligned(64))) mcs_lock_node_t;
typedef mcs_lock_node_t mcs_lock_t;
static void mcs_lock_init(struct mcs_lock_node *node)
{
node->locked = 0;
@ -602,4 +606,16 @@ __mcs_rwlock_reader_unlock(struct mcs_rwlock_lock *lock, struct mcs_rwlock_node_
#endif
}
static inline int irqflags_can_interrupt(unsigned long flags)
{
#ifdef CONFIG_HAS_NMI
#warning irqflags_can_interrupt needs testing/fixing on such a target
return flags > ICC_PMR_EL1_MASKED;
#else
// PSTATE.DAIF I bit clear means interrupt is possible
return !(flags & (1 << 7));
#endif
}
#endif /* !__HEADER_ARM64_COMMON_ARCH_LOCK_H */

View File

@ -35,38 +35,4 @@ void arm64_disable_pmu(void);
int armv8pmu_init(struct arm_pmu* cpu_pmu);
/* TODO[PMU]: This definition could live in the common part. Watch how things develop, then remove the definition here. */
/*
* Generalized hardware cache events:
*
* { L1-D, L1-I, LLC, ITLB, DTLB, BPU, NODE } x
* { read, write, prefetch } x
* { accesses, misses }
*/
enum perf_hw_cache_id {
PERF_COUNT_HW_CACHE_L1D = 0,
PERF_COUNT_HW_CACHE_L1I = 1,
PERF_COUNT_HW_CACHE_LL = 2,
PERF_COUNT_HW_CACHE_DTLB = 3,
PERF_COUNT_HW_CACHE_ITLB = 4,
PERF_COUNT_HW_CACHE_BPU = 5,
PERF_COUNT_HW_CACHE_NODE = 6,
PERF_COUNT_HW_CACHE_MAX, /* non-ABI */
};
enum perf_hw_cache_op_id {
PERF_COUNT_HW_CACHE_OP_READ = 0,
PERF_COUNT_HW_CACHE_OP_WRITE = 1,
PERF_COUNT_HW_CACHE_OP_PREFETCH = 2,
PERF_COUNT_HW_CACHE_OP_MAX, /* non-ABI */
};
enum perf_hw_cache_op_result_id {
PERF_COUNT_HW_CACHE_RESULT_ACCESS = 0,
PERF_COUNT_HW_CACHE_RESULT_MISS = 1,
PERF_COUNT_HW_CACHE_RESULT_MAX, /* non-ABI */
};
#endif

View File

@ -9,6 +9,11 @@
#define _NSIG_BPW 64
#define _NSIG_WORDS (_NSIG / _NSIG_BPW)
static inline int valid_signal(unsigned long sig)
{
return sig <= _NSIG ? 1 : 0;
}
typedef unsigned long int __sigset_t;
#define __sigmask(sig) (((__sigset_t) 1) << ((sig) - 1))

View File

@ -114,14 +114,18 @@ SYSCALL_HANDLED(236, get_mempolicy)
SYSCALL_HANDLED(237, set_mempolicy)
SYSCALL_HANDLED(238, migrate_pages)
SYSCALL_HANDLED(239, move_pages)
#ifdef PERF_ENABLE
SYSCALL_HANDLED(241, perf_event_open)
#endif // PERF_ENABLE
SYSCALL_HANDLED(260, wait4)
SYSCALL_HANDLED(270, process_vm_readv)
SYSCALL_HANDLED(271, process_vm_writev)
#ifdef PERF_ENABLE
SYSCALL_HANDLED(601, pmc_init)
SYSCALL_HANDLED(602, pmc_start)
SYSCALL_HANDLED(603, pmc_stop)
SYSCALL_HANDLED(604, pmc_reset)
#endif // PERF_ENABLE
SYSCALL_HANDLED(700, get_cpu_id)
#ifdef PROFILE_ENABLE
SYSCALL_HANDLED(__NR_profile, profile)

View File

@ -7,15 +7,13 @@
#include <arch/cpu.h>
#include <memory.h>
#include <syscall.h>
#include <debug.h>
// #define DEBUG_GICV2
#ifdef DEBUG_GICV2
#define dkprintf(...) kprintf(__VA_ARGS__)
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...)
#define ekprintf(...) kprintf(__VA_ARGS__)
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
void *dist_base;

View File

@ -7,17 +7,15 @@
#include <cputype.h>
#include <process.h>
#include <syscall.h>
#include <debug.h>
//#define DEBUG_GICV3
#define USE_CAVIUM_THUNDER_X
#ifdef DEBUG_GICV3
#define dkprintf(...) kprintf(__VA_ARGS__)
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...)
#define ekprintf(...) kprintf(__VA_ARGS__)
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
#ifdef USE_CAVIUM_THUNDER_X

View File

@ -14,9 +14,7 @@
#include <context.h>
#include <kmalloc.h>
#include <vdso.h>
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#include <debug.h>
#define NOT_IMPLEMENTED() do { kprintf("%s is not implemented\n", __func__); while(1);} while(0)
@ -2924,17 +2922,12 @@ int read_process_vm(struct process_vm *vm, void *kdst, const void *usrc, size_t
return error;
}
#ifdef POSTK_DEBUG_TEMP_FIX_52 /* NUMA support(memory area determination) */
if (!is_mckernel_memory(pa)) {
#else
if (pa < ihk_mc_get_memory_address(IHK_MC_GMA_MAP_START, 0) ||
pa >= ihk_mc_get_memory_address(IHK_MC_GMA_MAP_END, 0)) {
#endif /* POSTK_DEBUG_TEMP_FIX_52 */
if (!is_mckernel_memory(pa, pa + cpsize)) {
dkprintf("%s: pa is outside of LWK memory, to: %p, pa: %p,"
"cpsize: %d\n", __FUNCTION__, to, pa, cpsize);
va = ihk_mc_map_virtual(pa, 1, PTATTR_ACTIVE);
memcpy(to, va, cpsize);
ihk_mc_unmap_virtual(va, 1, 1);
ihk_mc_unmap_virtual(va, 1);
}
else {
va = phys_to_virt(pa);
@ -3007,17 +3000,12 @@ int write_process_vm(struct process_vm *vm, void *udst, const void *ksrc, size_t
return error;
}
#ifdef POSTK_DEBUG_TEMP_FIX_52 /* NUMA support(memory area determination) */
if (!is_mckernel_memory(pa)) {
#else
if (pa < ihk_mc_get_memory_address(IHK_MC_GMA_MAP_START, 0) ||
pa >= ihk_mc_get_memory_address(IHK_MC_GMA_MAP_END, 0)) {
#endif /* POSTK_DEBUG_TEMP_FIX_52 */
if (!is_mckernel_memory(pa, pa + cpsize)) {
dkprintf("%s: pa is outside of LWK memory, from: %p,"
"pa: %p, cpsize: %d\n", __FUNCTION__, from, pa, cpsize);
va = ihk_mc_map_virtual(pa, 1, PTATTR_WRITABLE|PTATTR_ACTIVE);
memcpy(va, from, cpsize);
ihk_mc_unmap_virtual(va, 1, 1);
ihk_mc_unmap_virtual(va, 1);
}
else {
va = phys_to_virt(pa);
@ -3078,17 +3066,12 @@ int patch_process_vm(struct process_vm *vm, void *udst, const void *ksrc, size_t
return error;
}
#ifdef POSTK_DEBUG_TEMP_FIX_52 /* NUMA support(memory area determination) */
if (!is_mckernel_memory(pa)) {
#else
if (pa < ihk_mc_get_memory_address(IHK_MC_GMA_MAP_START, 0) ||
pa >= ihk_mc_get_memory_address(IHK_MC_GMA_MAP_END, 0)) {
#endif /* POSTK_DEBUG_TEMP_FIX_52 */
if (!is_mckernel_memory(pa, pa + cpsize)) {
dkprintf("%s: pa is outside of LWK memory, from: %p,"
"pa: %p, cpsize: %d\n", __FUNCTION__, from, pa, cpsize);
va = ihk_mc_map_virtual(pa, 1, PTATTR_WRITABLE|PTATTR_ACTIVE);
memcpy(va, from, cpsize);
ihk_mc_unmap_virtual(va, 1, 1);
ihk_mc_unmap_virtual(va, 1);
}
else {
va = phys_to_virt(pa);

View File

@ -93,21 +93,50 @@ int ihk_mc_perfctr_init(int counter, uint64_t config, int mode)
return ret;
}
int ihk_mc_perfctr_start(int counter)
int ihk_mc_perfctr_start(unsigned long counter_mask)
{
int ret;
ret = cpu_pmu.enable_counter(counter);
return ret;
int ret = 0;
int counter;
unsigned long counter_bit;
for (counter = 0, counter_bit = 1;
counter_bit < counter_mask;
counter++, counter_bit <<= 1) {
if (!(counter_mask & counter_bit))
continue;
ret = cpu_pmu.enable_counter(counter_mask);
if (ret < 0)
break;
}
return ret < 0 ? ret : 0;
}
int ihk_mc_perfctr_stop(int counter)
int ihk_mc_perfctr_stop(unsigned long counter_mask)
{
cpu_pmu.disable_counter(counter);
int ret = 0;
int counter;
unsigned long counter_bit;
// When ihk_mc_perfctr_start is called, the init functions are
// called as well, so set this to disabled here.
cpu_pmu.disable_intens(counter);
return 0;
for (counter = 0, counter_bit = 1;
counter_bit < counter_mask;
counter++, counter_bit <<= 1) {
if (!(counter_mask & counter_bit))
continue;
ret = cpu_pmu.disable_counter(counter);
if (ret < 0)
break;
// When ihk_mc_perfctr_start is called, the init functions are
// called as well, so set this to disabled here.
ret = cpu_pmu.disable_intens(counter);
if (ret < 0)
break;
}
return ret < 0 ? ret : 0;
}
int ihk_mc_perfctr_reset(int counter)
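
The start/stop hunks above move from a single counter index to a bit mask of counters. The following is a minimal sketch of walking such a mask, not the McKernel implementation: `counter_op_t`, `for_each_counter`, `visited` and `record_counter` are hypothetical names introduced here for illustration. The sketch shifts the mask itself and passes the counter index to the callback. The loop in the diff instead compares `counter_bit < counter_mask`, which as written appears not to reach the body for a mask with exactly one bit set, and its start path hands `counter_mask` to `enable_counter`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-counter callback; returns < 0 on error. */
typedef int (*counter_op_t)(int counter);

/*
 * Call op() once for every bit set in mask, lowest bit first.
 * Stops at the first error and returns it, otherwise returns 0.
 */
static int for_each_counter(unsigned long mask, counter_op_t op)
{
	int counter;

	assert(op != NULL);

	/* Shifting the mask ends the loop once no set bits remain. */
	for (counter = 0; mask != 0; counter++, mask >>= 1) {
		int ret;

		if (!(mask & 1UL))
			continue;

		ret = op(counter);
		if (ret < 0)
			return ret;
	}

	return 0;
}

/* Demonstration only: remember which counter indices were visited. */
static unsigned long visited;

static int record_counter(int counter)
{
	visited |= 1UL << counter;
	return 0;
}
```

Calling `for_each_counter(0x4UL, record_counter)` visits counter 2 only, and a mask of `0x15UL` visits counters 0, 2 and 4.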

View File

@ -4,16 +4,14 @@
#include <ihk/perfctr.h>
#include <errno.h>
#include <ihk/debug.h>
#include <debug.h>
#define BIT(nr) (1UL << (nr))
//#define DEBUG_PRINT_PMU
#ifdef DEBUG_PRINT_PMU
#define dkprintf(...) kprintf(__VA_ARGS__)
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif

View File

@ -21,15 +21,13 @@
#include <ihk/debug.h>
#include <compiler.h>
#include <lwk/compiler.h>
#include <debug.h>
//#define DEBUG_PRINT_PSCI
#ifdef DEBUG_PRINT_PSCI
#define dkprintf(...) kprintf(__VA_ARGS__)
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
#define PSCI_POWER_STATE_TYPE_POWER_DOWN 1

View File

@ -11,22 +11,17 @@
#include <hwcap.h>
#include <string.h>
#include <thread_info.h>
#include <debug.h>
//#define DEBUG_PRINT_SC
#ifdef DEBUG_PRINT_SC
#define dkprintf kprintf
#define ekprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
#define NOT_IMPLEMENTED() do { kprintf("%s is not implemented\n", __func__); while(1);} while(0)
#define BUG_ON(condition) do { if (condition) { kprintf("PANIC: %s: %s(line:%d)\n",\
__FILE__, __FUNCTION__, __LINE__); panic(""); } } while(0)
extern void save_debugreg(unsigned long *debugreg);
extern unsigned long do_kill(struct thread *thread, int pid, int tid, int sig, struct siginfo *info, int ptracecont);
extern int interrupt_from_user(void *);
@ -959,11 +954,7 @@ void ptrace_report_signal(struct thread *thread, int sig)
}
thread->exit_status = sig;
/* Transition thread state */
#ifdef POSTK_DEBUG_TEMP_FIX_41 /* early to wait4() wakeup for ptrace, fix. */
proc->status = PS_DELAY_TRACED;
#else /* POSTK_DEBUG_TEMP_FIX_41 */
proc->status = PS_TRACED;
#endif /* POSTK_DEBUG_TEMP_FIX_41 */
thread->status = PS_TRACED;
proc->ptrace &= ~PT_TRACE_SYSCALL;
if (sig == SIGSTOP || sig == SIGTSTP ||
@ -982,10 +973,6 @@ void ptrace_report_signal(struct thread *thread, int sig)
info._sifields._sigchld.si_pid = thread->tid;
info._sifields._sigchld.si_status = thread->exit_status;
do_kill(cpu_local_var(current), parent_pid, -1, SIGCHLD, &info, 0);
#ifndef POSTK_DEBUG_TEMP_FIX_41 /* early to wait4() wakeup for ptrace, fix. */
/* Wake parent (if sleeping in wait4()) */
waitq_wakeup(&proc->parent->waitpid_q);
#endif /* !POSTK_DEBUG_TEMP_FIX_41 */
dkprintf("ptrace_report_signal,sleeping\n");
/* Sleep */

View File

@ -14,6 +14,8 @@
#include <prctl.h>
#include <limits.h>
#include <syscall.h>
#include <uio.h>
#include <debug.h>
extern void ptrace_report_signal(struct thread *thread, int sig);
extern void clear_single_step(struct thread *thread);
@ -27,18 +29,12 @@ static void __check_signal(unsigned long rc, void *regs, int num, int irq_disabl
//#define DEBUG_PRINT_SC
#ifdef DEBUG_PRINT_SC
#define dkprintf kprintf
#define ekprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
#define NOT_IMPLEMENTED() do { kprintf("%s is not implemented\n", __func__); while(1);} while(0)
#define BUG_ON(condition) do { if (condition) { kprintf("PANIC: %s: %s(line:%d)\n",\
__FILE__, __FUNCTION__, __LINE__); panic(""); } } while(0)
uintptr_t debug_constants[] = {
sizeof(struct cpu_local_var),
offsetof(struct cpu_local_var, current),
@ -59,7 +55,7 @@ static int cpuid_head = 1;
extern int num_processors;
int obtain_clone_cpuid(cpu_set_t *cpu_set) {
int obtain_clone_cpuid(cpu_set_t *cpu_set, int use_last) {
int min_queue_len = -1;
int i, min_cpu = -1;
@ -1177,19 +1173,10 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
/* Reap and set new signal_flags */
proc->signal_flags = SIGNAL_STOP_STOPPED;
#ifdef POSTK_DEBUG_TEMP_FIX_41 /* early to wait4() wakeup for ptrace, fix. */
proc->status = PS_DELAY_STOPPED;
#else /* POSTK_DEBUG_TEMP_FIX_41 */
proc->status = PS_STOPPED;
#endif /* POSTK_DEBUG_TEMP_FIX_41 */
thread->status = PS_STOPPED;
mcs_rwlock_writer_unlock(&proc->update_lock, &lock);
#ifndef POSTK_DEBUG_TEMP_FIX_41 /* early to wait4() wakeup for ptrace, fix. */
/* Wake up the parent who tried wait4 and sleeping */
waitq_wakeup(&proc->parent->waitpid_q);
#endif /* !POSTK_DEBUG_TEMP_FIX_41 */
dkprintf("do_signal(): pid: %d, tid: %d SIGSTOP, sleeping\n",
proc->pid, thread->tid);
/* Sleep */
@ -1206,19 +1193,10 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
/* Update thread state in fork tree */
mcs_rwlock_writer_lock(&proc->update_lock, &lock);
thread->exit_status = SIGTRAP;
#ifdef POSTK_DEBUG_TEMP_FIX_41 /* early to wait4() wakeup for ptrace, fix. */
proc->status = PS_DELAY_TRACED;
#else /* POSTK_DEBUG_TEMP_FIX_41 */
proc->status = PS_TRACED;
#endif /* POSTK_DEBUG_TEMP_FIX_41 */
thread->status = PS_TRACED;
mcs_rwlock_writer_unlock(&proc->update_lock, &lock);
#ifndef POSTK_DEBUG_TEMP_FIX_41 /* early to wait4() wakeup for ptrace, fix. */
/* Wake up the parent who tried wait4 and sleeping */
waitq_wakeup(&thread->proc->parent->waitpid_q);
#endif /* !POSTK_DEBUG_TEMP_FIX_41 */
/* Sleep */
dkprintf("do_signal,SIGTRAP,sleeping\n");
@ -1594,7 +1572,7 @@ done:
return 0;
}
if (tthread->thread_offloaded) {
if (tthread->uti_state == UTI_STATE_RUNNING_IN_LINUX) {
interrupt_syscall(tthread, sig);
release_thread(tthread);
return 0;
@ -1729,7 +1707,7 @@ SYSCALL_DECLARE(mmap)
| MAP_NONBLOCK // 0x10000
;
const intptr_t addr0 = ihk_mc_syscall_arg0(ctx);
const uintptr_t addr0 = ihk_mc_syscall_arg0(ctx);
const size_t len0 = ihk_mc_syscall_arg1(ctx);
const int prot = ihk_mc_syscall_arg2(ctx);
const int flags0 = ihk_mc_syscall_arg3(ctx);
@ -1738,7 +1716,7 @@ SYSCALL_DECLARE(mmap)
struct thread *thread = cpu_local_var(current);
struct vm_regions *region = &thread->vm->region;
int error;
intptr_t addr = 0;
uintptr_t addr = 0;
size_t len;
int flags = flags0;
size_t pgsize;
@ -1801,8 +1779,9 @@ SYSCALL_DECLARE(mmap)
goto out;
}
if ((flags & MAP_FIXED) && ((addr < region->user_start)
|| (region->user_end <= addr))) {
if (addr < region->user_start
|| region->user_end <= addr
|| len > (region->user_end - region->user_start)) {
ekprintf("sys_mmap(%lx,%lx,%x,%x,%x,%lx):ENOMEM\n",
addr0, len0, prot, flags0, fd, off0);
error = -ENOMEM;
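
The mmap hunk above makes the address variables unsigned and adds a length test to the range check. A minimal sketch of the same three-part predicate follows, assuming a hypothetical helper `mmap_range_ok` that takes the user region bounds as plain integers (in the diff they come from `region->user_start` and `region->user_end`). Like the diff, it does not verify that `addr + len` stays below the end of the region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 when addr lies inside [user_start, user_end) and len does
 * not exceed the size of the user region, 0 otherwise.
 * Hypothetical helper mirroring the condition in the hunk above.
 */
static int mmap_range_ok(uintptr_t addr, size_t len,
			 uintptr_t user_start, uintptr_t user_end)
{
	assert(user_start < user_end);

	if (addr < user_start
	    || user_end <= addr
	    || len > (size_t)(user_end - user_start)) {
		return 0;	/* the diff reports -ENOMEM here */
	}

	return 1;
}
```

With all operands unsigned, an address with the top bit set simply compares as a very large value and fails the `user_end <= addr` test.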

View File

@ -14,15 +14,13 @@
#include <ihk/debug.h>
#include <ikc/queue.h>
#include <vdso.h>
#include <debug.h>
//#define DEBUG_PRINT_VDSO
#ifdef DEBUG_PRINT_VDSO
#define dkprintf(...) kprintf(__VA_ARGS__)
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
#ifdef POSTK_DEBUG_ARCH_DEP_52

View File

@ -1,5 +1,7 @@
/* gettimeofday.c COPYRIGHT FUJITSU LIMITED 2016 */
#include <affinity.h>
#include <arch-memory.h>
#include <time.h>
#include <syscall.h>
#include <registers.h>

View File

@ -9,29 +9,29 @@ PHDRS
SECTIONS
{
. = SIZEOF_HEADERS;
. = ALIGN(4096);
. = ALIGN(4096);
.text : {
*(.text)
*(.text)
} :text
.data : {
*(.data)
*(.data.*)
*(.data)
*(.data.*)
} :data
.rodata : {
*(.rodata .rodata.*)
*(.rodata .rodata.*)
} :data
. = ALIGN(8);
.bss : {
_bss_start = .;
*(.bss .bss.*)
_bss_end = .;
. = ALIGN(4096);
_stack_end = .;
} :data
_bss_start = .;
*(.bss .bss.*)
_bss_end = .;
. = ALIGN(4096);
_stack_end = .;
} :data
/DISCARD/ : {
*(.eh_frame)
*(.note.gnu.build-id)
*(.eh_frame)
*(.note.gnu.build-id)
}
}
}

View File

@ -31,6 +31,7 @@
#include <prctl.h>
#include <page.h>
#include <kmalloc.h>
#include <debug.h>
#define LAPIC_ID 0x020
#define LAPIC_TIMER 0x320
@ -69,11 +70,8 @@
//#define DEBUG_PRINT_CPU
#ifdef DEBUG_PRINT_CPU
#define dkprintf kprintf
#define ekprintf kprintf
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf kprintf
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
static void *lapic_vp;
@ -96,6 +94,8 @@ int gettime_local_support = 0;
extern int ihk_mc_pt_print_pte(struct page_table *pt, void *virt);
extern int kprintf(const char *format, ...);
extern int interrupt_from_user(void *);
extern void perf_start(struct mc_perf_event *event);
extern void perf_reset(struct mc_perf_event *event);
static struct idt_entry{
uint32_t desc[4];
@ -847,9 +847,6 @@ void setup_x86_ap(void (*next_func)(void))
}
void arch_show_interrupt_context(const void *reg);
void set_signal(int sig, void *regs, struct siginfo *info);
void check_signal(unsigned long, void *, int);
void check_sig_pending();
extern void tlb_flush_handler(int vector);
void __show_stack(uintptr_t *sp) {
@ -877,7 +874,7 @@ void interrupt_exit(struct x86_user_context *regs)
cpu_enable_interrupt();
check_sig_pending();
check_need_resched();
check_signal(0, regs, 0);
check_signal(0, regs, -1);
}
else {
check_sig_pending();
@ -1010,6 +1007,12 @@ void handle_interrupt(int vector, struct x86_user_context *regs)
set_cputime(interrupt_from_user(regs)? 0: 1);
--v->in_interrupt;
/* for migration by IPI */
if (v->flags & CPU_FLAG_NEED_MIGRATE) {
schedule();
check_signal(0, regs, 0);
}
}
void gpe_handler(struct x86_user_context *regs)
@ -1644,12 +1647,10 @@ int ihk_mc_interrupt_cpu(int cpu, int vector)
return 0;
}
#ifdef POSTK_DEBUG_ARCH_DEP_22
extern void perf_start(struct mc_perf_event *event);
extern void perf_reset(struct mc_perf_event *event);
struct thread *arch_switch_context(struct thread *prev, struct thread *next)
{
struct thread *last;
struct mcs_rwlock_node_irqsave lock;
dkprintf("[%d] schedule: tlsblock_base: 0x%lX\n",
ihk_mc_get_processor_id(), next->tlsblock_base);
@ -1668,7 +1669,7 @@ struct thread *arch_switch_context(struct thread *prev, struct thread *next)
}
#ifdef PROFILE_ENABLE
if (prev->profile && prev->profile_start_ts != 0) {
if (prev && prev->profile && prev->profile_start_ts != 0) {
prev->profile_elapsed_ts +=
(rdtsc() - prev->profile_start_ts);
prev->profile_start_ts = 0;
@ -1680,6 +1681,28 @@ struct thread *arch_switch_context(struct thread *prev, struct thread *next)
#endif
if (prev) {
mcs_rwlock_writer_lock(&prev->proc->update_lock, &lock);
if (prev->proc->status & (PS_DELAY_STOPPED | PS_DELAY_TRACED)) {
switch (prev->proc->status) {
case PS_DELAY_STOPPED:
prev->proc->status = PS_STOPPED;
break;
case PS_DELAY_TRACED:
prev->proc->status = PS_TRACED;
break;
default:
break;
}
mcs_rwlock_writer_unlock(&prev->proc->update_lock,
&lock);
/* Wake up the parent who tried wait4 and sleeping */
waitq_wakeup(&prev->proc->parent->waitpid_q);
} else {
mcs_rwlock_writer_unlock(&prev->proc->update_lock,
&lock);
}
last = ihk_mc_switch_context(&prev->ctx, &next->ctx, prev);
}
else {
@ -1687,7 +1710,6 @@ struct thread *arch_switch_context(struct thread *prev, struct thread *next)
}
return last;
}
#endif
/*@
@ requires \valid(thread);
@ -1762,14 +1784,6 @@ void copy_fp_regs(struct thread *from, struct thread *to)
}
}
#ifdef POSTK_DEBUG_TEMP_FIX_19
void
clear_fp_regs(struct thread *thread)
{
return;
}
#endif /* POSTK_DEBUG_TEMP_FIX_19 */
/*@
@ requires \valid(thread);
@ assigns thread->fp_regs;
@ -1777,8 +1791,11 @@ clear_fp_regs(struct thread *thread)
void
restore_fp_regs(struct thread *thread)
{
if (!thread->fp_regs)
if (!thread->fp_regs) {
// only clear fpregs.
clear_fp_regs();
return;
}
if (xsave_available) {
unsigned int low, high;
@ -1797,6 +1814,13 @@ restore_fp_regs(struct thread *thread)
//release_fp_regs(thread);
}
void clear_fp_regs(void)
{
struct cpu_local_var *v = get_this_cpu_local_var();
restore_fp_regs(&v->idle);
}
ihk_mc_user_context_t *lookup_user_context(struct thread *thread)
{
ihk_mc_user_context_t *uctx = thread->uctx;

View File

@ -6,6 +6,7 @@
#include <process.h>
#include <string.h>
#include <elfcore.h>
#include <debug.h>
#define align32(x) ((((x) + 3) / 4) * 4)
#define alignpage(x) ((((x) + (PAGE_SIZE) - 1) / (PAGE_SIZE)) * (PAGE_SIZE))
@ -13,13 +14,16 @@
//#define DEBUG_PRINT_GENCORE
#ifdef DEBUG_PRINT_GENCORE
#define dkprintf(...) kprintf(__VA_ARGS__)
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
/* Exclude reserved (mckernel's internal use), device file,
* hole created by mprotect
*/
#define GENCORE_RANGE_IS_INACCESSIBLE(range) \
((range->flag & (VR_RESERVED | VR_MEMTYPE_UC | VR_DONTDUMP)))
/*
* Generate a core file image, which consists of many chunks.
* Returns an allocated table, an entry of which is a pair of the address
@ -309,12 +313,10 @@ int gencore(struct thread *thread, void *regs,
dkprintf("start:%lx end:%lx flag:%lx objoff:%lx\n",
range->start, range->end, range->flag, range->objoff);
/* We omit reserved areas because they are only for
mckernel's internal use. */
if (range->flag & VR_RESERVED)
continue;
if (range->flag & VR_DONTDUMP)
if (GENCORE_RANGE_IS_INACCESSIBLE(range)) {
continue;
}
/* We need a chunk for each page for a demand paging area.
This can be optimized for space complexity but we would
lose simplicity instead. */
@ -403,8 +405,9 @@ int gencore(struct thread *thread, void *regs,
unsigned long flag = range->flag;
unsigned long size = range->end - range->start;
if (range->flag & VR_RESERVED)
if (GENCORE_RANGE_IS_INACCESSIBLE(range)) {
continue;
}
ph[i].p_type = PT_LOAD;
ph[i].p_flags = ((flag & VR_PROT_READ) ? PF_R : 0)
@ -446,8 +449,9 @@ int gencore(struct thread *thread, void *regs,
unsigned long phys;
if (range->flag & VR_RESERVED)
if (GENCORE_RANGE_IS_INACCESSIBLE(range)) {
continue;
}
if (range->flag & VR_DEMAND_PAGING) {
/* Just an ad hoc kluge. */
unsigned long p, start, phys;

View File

@ -64,12 +64,13 @@ static inline int futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval,
return oldval;
}
static inline int futex_atomic_op_inuser(int encoded_op, int __user *uaddr)
static inline int futex_atomic_op_inuser(int encoded_op,
int __user *uaddr)
{
int op = (encoded_op >> 28) & 7;
int cmp = (encoded_op >> 24) & 15;
int oparg = (encoded_op << 8) >> 20;
int cmparg = (encoded_op << 20) >> 20;
int oparg = (encoded_op & 0x00fff000) >> 12;
int cmparg = encoded_op & 0xfff;
int oldval = 0, ret, tem;
if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
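
The hunk above replaces the shift-based extraction of `oparg` and `cmparg` with masks. The sketch below, under the assumption that the fields follow the Linux `FUTEX_OP()` layout (4-bit op at bit 28, 4-bit cmp at bit 24, 12-bit oparg at bit 12, 12-bit cmparg at bit 0), shows what the masked form returns. `struct futex_op_fields`, `futex_op_encode`, `futex_op_decode_masked` and `sign_extend12` are hypothetical names used only here. The masked form yields values in 0..4095, whereas a shift pair such as `(encoded_op << 8) >> 20` on a signed `int` typically sign-extends the 12-bit field; `sign_extend12` reproduces that interpretation without relying on signed shifts.

```c
#include <assert.h>
#include <stdint.h>

struct futex_op_fields {
	int op;
	int cmp;
	int oparg;
	int cmparg;
};

/* Pack the four fields the way FUTEX_OP() does. */
static uint32_t futex_op_encode(unsigned int op, unsigned int cmp,
				unsigned int oparg, unsigned int cmparg)
{
	assert(op <= 0xfu && cmp <= 0xfu);
	assert(oparg <= 0xfffu && cmparg <= 0xfffu);

	return ((op & 0xfu) << 28) | ((cmp & 0xfu) << 24)
		| ((oparg & 0xfffu) << 12) | (cmparg & 0xfffu);
}

/* Mask-based decode, as in the new lines of the hunk. */
static struct futex_op_fields futex_op_decode_masked(uint32_t encoded_op)
{
	struct futex_op_fields f;

	f.op = (int)((encoded_op >> 28) & 7u);
	f.cmp = (int)((encoded_op >> 24) & 15u);
	f.oparg = (int)((encoded_op & 0x00fff000u) >> 12);
	f.cmparg = (int)(encoded_op & 0xfffu);
	return f;
}

/* Interpret a 12-bit field as a signed two's-complement value. */
static int sign_extend12(uint32_t v)
{
	v &= 0xfffu;
	return (int)(v ^ 0x800u) - 0x800;
}
```

For a field holding `0xfff`, the masked decode returns 4095 while `sign_extend12` returns -1, so callers that pass negative arguments see a difference between the two forms.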

View File

@ -6,6 +6,7 @@
#include <ihk/cpu.h>
#include <ihk/atomic.h>
#include <lwk/compiler.h>
//#define DEBUG_SPINLOCK
//#define DEBUG_MCS_RWLOCK
@ -14,7 +15,17 @@
int __kprintf(const char *format, ...);
#endif
typedef int ihk_spinlock_t;
typedef unsigned short __ticket_t;
typedef unsigned int __ticketpair_t;
typedef struct ihk_spinlock {
union {
__ticketpair_t head_tail;
struct __raw_tickets {
__ticket_t head, tail;
} tickets;
};
} ihk_spinlock_t;
extern void preempt_enable(void);
extern void preempt_disable(void);
@ -23,9 +34,61 @@ extern void preempt_disable(void);
static void ihk_mc_spinlock_init(ihk_spinlock_t *lock)
{
*lock = 0;
lock->head_tail = 0;
}
#define SPIN_LOCK_UNLOCKED { .head_tail = 0 }
#ifdef DEBUG_SPINLOCK
#define ihk_mc_spinlock_trylock_noirq(l) { int rc; \
__kprintf("[%d] call ihk_mc_spinlock_trylock_noirq %p %s:%d\n", ihk_mc_get_processor_id(), (l), __FILE__, __LINE__); \
rc = __ihk_mc_spinlock_trylock_noirq(l); \
__kprintf("[%d] ret ihk_mc_spinlock_trylock_noirq\n", ihk_mc_get_processor_id()); rc; \
}
#else
#define ihk_mc_spinlock_trylock_noirq __ihk_mc_spinlock_trylock_noirq
#endif
static int __ihk_mc_spinlock_trylock_noirq(ihk_spinlock_t *lock)
{
ihk_spinlock_t cur = { .head_tail = lock->head_tail };
ihk_spinlock_t next = { .tickets.head = cur.tickets.head, .tickets.tail = cur.tickets.tail + 2 };
int success;
if (cur.tickets.head != cur.tickets.tail) {
return 0;
}
preempt_disable();
/* Use the same increment amount as other functions! */
success = __sync_bool_compare_and_swap((__ticketpair_t*)lock, cur.head_tail, next.head_tail);
if (!success) {
preempt_enable();
}
return success;
}
#ifdef DEBUG_SPINLOCK
#define ihk_mc_spinlock_trylock(l, result) ({ unsigned long rc; \
__kprintf("[%d] call ihk_mc_spinlock_trylock %p %s:%d\n", ihk_mc_get_processor_id(), (l), __FILE__, __LINE__); \
rc = __ihk_mc_spinlock_trylock(l, result); \
__kprintf("[%d] ret ihk_mc_spinlock_trylock\n", ihk_mc_get_processor_id()); rc;\
})
#else
#define ihk_mc_spinlock_trylock __ihk_mc_spinlock_trylock
#endif
static unsigned long __ihk_mc_spinlock_trylock(ihk_spinlock_t *lock, int *result)
{
unsigned long flags;
flags = cpu_disable_interrupt_save();
*result = __ihk_mc_spinlock_trylock_noirq(lock);
return flags;
}
#define SPIN_LOCK_UNLOCKED 0
#ifdef DEBUG_SPINLOCK
#define ihk_mc_spinlock_lock_noirq(l) { \
@ -39,40 +102,24 @@ __kprintf("[%d] ret ihk_mc_spinlock_lock_noirq\n", ihk_mc_get_processor_id()); \
static void __ihk_mc_spinlock_lock_noirq(ihk_spinlock_t *lock)
{
int inc = 0x00010000;
int tmp;
#if 0
asm volatile("lock ; xaddl %0, %1\n"
"movzwl %w0, %2\n\t"
"shrl $16, %0\n\t"
"1:\t"
"cmpl %0, %2\n\t"
"je 2f\n\t"
"rep ; nop\n\t"
"movzwl %1, %2\n\t"
"jmp 1b\n"
"2:"
: "+Q" (inc), "+m" (*lock), "=r" (tmp) : : "memory", "cc");
#endif
register struct __raw_tickets inc = { .tail = 0x0002 };
preempt_disable();
asm volatile("lock; xaddl %0, %1\n"
"movzwl %w0, %2\n\t"
"shrl $16, %0\n\t"
"1:\t"
"cmpl %0, %2\n\t"
"je 2f\n\t"
"rep ; nop\n\t"
"movzwl %1, %2\n\t"
/* don't need lfence here, because loads are in-order */
"jmp 1b\n"
"2:"
: "+r" (inc), "+m" (*lock), "=&r" (tmp)
:
: "memory", "cc");
asm volatile ("lock xaddl %0, %1\n"
: "+r" (inc), "+m" (*(lock)) : : "memory", "cc");
if (inc.head == inc.tail)
goto out;
for (;;) {
if (*((volatile __ticket_t *)&lock->tickets.head) == inc.tail)
goto out;
cpu_pause();
}
out:
barrier(); /* make sure nothing creeps before the lock is taken */
}
#ifdef DEBUG_SPINLOCK
@ -106,8 +153,11 @@ __kprintf("[%d] ret ihk_mc_spinlock_unlock_noirq\n", ihk_mc_get_processor_id());
#endif
static void __ihk_mc_spinlock_unlock_noirq(ihk_spinlock_t *lock)
{
asm volatile ("lock incw %0" : "+m"(*lock) : : "memory", "cc");
__ticket_t inc = 0x0002;
asm volatile ("lock addw %1, %0\n"
: "+m" (lock->tickets.head) : "ri" (inc) : "memory", "cc");
preempt_enable();
}
@ -134,6 +184,8 @@ typedef struct mcs_lock_node {
unsigned long irqsave;
} __attribute__((aligned(64))) mcs_lock_node_t;
typedef mcs_lock_node_t mcs_lock_t;
static void mcs_lock_init(struct mcs_lock_node *node)
{
node->locked = 0;
@ -600,4 +652,9 @@ __mcs_rwlock_reader_unlock(struct mcs_rwlock_lock *lock, struct mcs_rwlock_node_
#endif
}
static inline int irqflags_can_interrupt(unsigned long flags)
{
return !!(flags & 0x200);
}
#endif
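
The spinlock hunks above replace the plain `int` lock with a head/tail ticket pair and add a trylock. The following is a portable sketch of that scheme using C11 atomics rather than the x86 inline assembly in the diff. All names (`ticket_lock_t`, `ticket_lock`, `ticket_trylock`, `ticket_unlock`, `TICKET_INC`) are hypothetical, and the sketch leaves out the `preempt_disable()` and interrupt handling that the real functions perform. Unlock uses a compare-and-swap loop on the whole word so that a wrap of the 16-bit head cannot carry into the tail; the diff gets the same effect with a 16-bit `lock addw` on the head field.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Ticket step; the diff uses 2, any nonzero step works if used everywhere. */
#define TICKET_INC 2u

typedef struct {
	/* low 16 bits: head (now serving), high 16 bits: tail (next ticket) */
	_Atomic uint32_t head_tail;
} ticket_lock_t;

static inline uint16_t tl_head(uint32_t v) { return (uint16_t)(v & 0xffffu); }
static inline uint16_t tl_tail(uint32_t v) { return (uint16_t)(v >> 16); }

static void ticket_lock_init(ticket_lock_t *l)
{
	atomic_init(&l->head_tail, 0);
}

static void ticket_lock(ticket_lock_t *l)
{
	/* Take a ticket: bump the tail and keep the previous value. */
	uint32_t old = atomic_fetch_add_explicit(&l->head_tail,
			(uint32_t)TICKET_INC << 16, memory_order_acquire);
	uint16_t mine = tl_tail(old);

	/* Spin until the head reaches our ticket. */
	while (tl_head(atomic_load_explicit(&l->head_tail,
					    memory_order_acquire)) != mine)
		;
}

static int ticket_trylock(ticket_lock_t *l)
{
	uint32_t cur = atomic_load_explicit(&l->head_tail,
					    memory_order_relaxed);

	/* head != tail means the lock is held or has waiters. */
	if (tl_head(cur) != tl_tail(cur))
		return 0;

	/* Succeeds only if nobody took a ticket in the meantime. */
	return atomic_compare_exchange_strong_explicit(&l->head_tail, &cur,
			cur + ((uint32_t)TICKET_INC << 16),
			memory_order_acquire, memory_order_relaxed);
}

static void ticket_unlock(ticket_lock_t *l)
{
	uint32_t old = atomic_load_explicit(&l->head_tail,
					    memory_order_relaxed);
	uint32_t next;

	assert(tl_head(old) != tl_tail(old));	/* must be held */

	do {
		/* Advance head only; keep whatever tail is at this moment. */
		next = (old & 0xffff0000u)
			| (uint16_t)(tl_head(old) + TICKET_INC);
	} while (!atomic_compare_exchange_weak_explicit(&l->head_tail, &old,
			next, memory_order_release, memory_order_relaxed));
}
```

Because the trylock compares and swaps the full head/tail word, it must use the same ticket step as the lock path, which is the point of the "same increment amount" comment in the diff.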

View File

@ -42,16 +42,34 @@
#define USER_END 0x0000800000000000UL
#define TASK_UNMAPPED_BASE 0x00002AAAAAA00000UL
/*
* Canonical negative addresses (i.e., the smallest kernel virtual addresses)
* in x86 64-bit mode (in its most restricted 48-bit format) start from
* 0xffff800000000000, but Linux starts mapping physical memory at 0xffff880000000000.
* The 0x80000000000 long gap (8 TB, i.e., 16 PGD-level entries in the page tables)
* is used for the Xen hypervisor (see arch/x86/include/asm/page.h) and that is
* what we utilize for McKernel.
* This gives us the benefit of being able to use Linux kernel virtual
* addresses identically as in Linux.
*
* NOTE: update these also in eclair.c when modified!
*/
#define MAP_ST_START 0xffff800000000000UL
#define MAP_VMAP_START 0xfffff00000000000UL
#define MAP_FIXED_START 0xffffffff70000000UL
#define MAP_KERNEL_START 0xffffffff80000000UL
#define MAP_VMAP_START 0xffff850000000000UL
#define MAP_FIXED_START 0xffff860000000000UL
#define LINUX_PAGE_OFFSET 0xffff880000000000UL
/*
* MAP_KERNEL_START is 8MB below MODULES_END in Linux.
* Placing the LWK image in the virtual address space at the end of
* the Linux modules section enables us to map the LWK TEXT in Linux
* as well, so that Linux can also call into LWK text.
*/
#define MAP_KERNEL_START 0xFFFFFFFFFE800000UL
#define STACK_TOP(region) ((region)->user_end)
#define MAP_VMAP_SIZE 0x0000000100000000UL
#define KERNEL_PHYS_OFFSET MAP_ST_START
#define PTL4_SHIFT 39
#define PTL4_SIZE (1UL << PTL4_SHIFT)
#define PTL3_SHIFT 30
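
The address-map hunk above moves the McKernel regions into the gap below `LINUX_PAGE_OFFSET`. As a rough illustration of how the new constants partition the kernel half of the address space, the sketch below mirrors the branch order of the updated `virt_to_phys()` that appears in the page-table diff of this changeset. `sketch_virt_to_phys` and its `kernel_phys_base` parameter are hypothetical stand-ins (the real function reads `x86_kernel_phys_base`), and the constants are copied from the hunk.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_ST_START      0xffff800000000000ULL
#define MAP_FIXED_START   0xffff860000000000ULL
#define LINUX_PAGE_OFFSET 0xffff880000000000ULL
#define MAP_KERNEL_START  0xFFFFFFFFFE800000ULL

/*
 * Translate a kernel virtual address to a physical address by testing
 * the regions from the highest base downwards.
 */
static uint64_t sketch_virt_to_phys(uint64_t va, uint64_t kernel_phys_base)
{
	assert(va >= MAP_ST_START);	/* kernel half only */

	if (va >= MAP_KERNEL_START)
		return va - MAP_KERNEL_START + kernel_phys_base;
	if (va >= LINUX_PAGE_OFFSET)
		return va - LINUX_PAGE_OFFSET;
	if (va >= MAP_FIXED_START)
		return va - MAP_FIXED_START;
	return va - MAP_ST_START;
}
```

The distance between `MAP_ST_START` and `LINUX_PAGE_OFFSET` is 2^43 bytes, which matches the 8 TB (16 PGD entries of 512 GB each) stated in the comment.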

View File

@ -114,7 +114,7 @@ SYSCALL_HANDLED(160, setrlimit)
SYSCALL_HANDLED(164, settimeofday)
SYSCALL_HANDLED(186, gettid)
SYSCALL_HANDLED(200, tkill)
SYSCALL_DELEGATED(201, time)
SYSCALL_HANDLED(201, time)
SYSCALL_HANDLED(202, futex)
SYSCALL_HANDLED(203, sched_setaffinity)
SYSCALL_HANDLED(204, sched_getaffinity)
@ -161,6 +161,7 @@ SYSCALL_HANDLED(__NR_profile, profile)
SYSCALL_HANDLED(730, util_migrate_inter_kernel)
SYSCALL_HANDLED(731, util_indicate_clone)
SYSCALL_HANDLED(732, get_system)
SYSCALL_HANDLED(733, util_register_desc)
/* McKernel Specific */
SYSCALL_HANDLED(801, swapout)

View File

@ -107,9 +107,17 @@ void init_boot_processor_local(void)
@ ensures \result == %gs;
@ assigns \nothing;
*/
extern int num_processors;
int ihk_mc_get_processor_id(void)
{
int id;
void *gs;
gs = (void *)rdmsr(MSR_GS_BASE);
if (gs < (void *)locals ||
gs > ((void *)locals + LOCALS_SPAN * num_processors)) {
return -1;
}
asm volatile("movl %%gs:0, %0" : "=r"(id));

View File

@ -25,15 +25,13 @@
#include <cls.h>
#include <kmalloc.h>
#include <rusage_private.h>
#include <debug.h>
//#define DEBUG
#ifdef DEBUG
#define dkprintf(...) do { kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) do { kprintf(__VA_ARGS__); } while (0)
#else
#define dkprintf(...) do { } while (0)
#define ekprintf(...) do { kprintf(__VA_ARGS__); } while (0)
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
static char *last_page;
@ -41,6 +39,8 @@ extern char _head[], _end[];
extern unsigned long x86_kernel_phys_base;
int safe_kernel_map = 0;
/* Arch specific early allocation routine */
void *early_alloc_pages(int nr_pages)
{
@ -109,6 +109,7 @@ struct page_table {
};
static struct page_table *init_pt;
static int init_pt_loaded = 0;
static ihk_spinlock_t init_pt_lock;
static int use_1gb_page = 0;
@ -167,30 +168,6 @@ static unsigned long setup_l3(struct page_table *pt,
return virt_to_phys(pt);
}
static void init_normal_area(struct page_table *pt)
{
unsigned long map_start, map_end, phys, pt_phys;
int ident_index, virt_index;
map_start = ihk_mc_get_memory_address(IHK_MC_GMA_MAP_START, 0);
map_end = ihk_mc_get_memory_address(IHK_MC_GMA_MAP_END, 0);
kprintf("map_start = %lx, map_end = %lx\n", map_start, map_end);
ident_index = map_start >> PTL4_SHIFT;
virt_index = (MAP_ST_START >> PTL4_SHIFT) & (PT_ENTRIES - 1);
memset(pt, 0, sizeof(struct page_table));
for (phys = (map_start & ~(PTL4_SIZE - 1)); phys < map_end;
phys += PTL4_SIZE) {
pt_phys = setup_l3(ihk_mc_alloc_pages(1, IHK_MC_AP_CRITICAL), phys,
map_start, map_end);
pt->entry[ident_index++] = pt_phys | PFL4_PDIR_ATTR;
pt->entry[virt_index++] = pt_phys | PFL4_PDIR_ATTR;
}
}
static struct page_table *__alloc_new_pt(ihk_mc_ap_flag ap_flag)
{
struct page_table *newpt = ihk_mc_alloc_pages(1, ap_flag);
@ -258,6 +235,11 @@ static unsigned long attr_to_l1attr(enum ihk_mc_pt_attribute attr)
}
}
#define PTLX_SHIFT(index) PTL ## index ## _SHIFT
#define GET_VIRT_INDEX(virt, index, dest) \
dest = ((virt) >> PTLX_SHIFT(index)) & (PT_ENTRIES - 1)
#define GET_VIRT_INDICES(virt, l4i, l3i, l2i, l1i) \
l4i = ((virt) >> PTL4_SHIFT) & (PT_ENTRIES - 1); \
l3i = ((virt) >> PTL3_SHIFT) & (PT_ENTRIES - 1); \
@ -1518,12 +1500,12 @@ static int clear_range_l1(void *args0, pte_t *ptep, uint64_t base,
if (page) {
dkprintf("%s: page=%p,is_in_memobj=%d,(old & PFL1_DIRTY)=%lx,memobj=%p,args->memobj->flags=%x\n", __FUNCTION__, page, page_is_in_memobj(page), (old & PFL1_DIRTY), args->memobj, args->memobj ? args->memobj->flags : -1);
}
if (page && page_is_in_memobj(page) && (old & PFL1_DIRTY) && (args->memobj) &&
!(args->memobj->flags & MF_ZEROFILL)) {
if (page && page_is_in_memobj(page) && pte_is_dirty(&old, PTL1_SIZE) &&
args->memobj && !(args->memobj->flags & MF_ZEROFILL)) {
memobj_flush_page(args->memobj, phys, PTL1_SIZE);
}
if (!(old & PFL1_FILEOFF)) {
if (!pte_is_fileoff(&old, PTL1_SIZE)) {
if(args->free_physical) {
if (!page) {
/* Anonymous || !XPMEM attach */
@ -1585,11 +1567,11 @@ static int clear_range_l2(void *args0, pte_t *ptep, uint64_t base,
page = phys_to_page(phys);
}
if (page && page_is_in_memobj(page) && (old & PFL2_DIRTY)) {
if (page && page_is_in_memobj(page) && pte_is_dirty(&old, PTL2_SIZE)) {
memobj_flush_page(args->memobj, phys, PTL2_SIZE);
}
if (!(old & PFL2_FILEOFF)) {
if (!pte_is_fileoff(&old, PTL2_SIZE)) {
if(args->free_physical) {
if (!page) {
/* Anonymous || !XPMEM attach */
@ -1666,13 +1648,13 @@ static int clear_range_l3(void *args0, pte_t *ptep, uint64_t base,
page = phys_to_page(phys);
}
if (page && page_is_in_memobj(page) && (old & PFL3_DIRTY)) {
if (page && page_is_in_memobj(page) && pte_is_dirty(&old, PTL3_SIZE)) {
memobj_flush_page(args->memobj, phys, PTL3_SIZE);
}
dkprintf("%s: phys=%ld, pte_get_phys(&old),PTL3_SIZE\n", __FUNCTION__, pte_get_phys(&old));
if (!(old & PFL3_FILEOFF)) {
if (!pte_is_fileoff(&old, PTL3_SIZE)) {
if(args->free_physical) {
if (!page) {
/* Anonymous || !XPMEM attach */
@ -2540,6 +2522,82 @@ static void init_fixed_area(struct page_table *pt)
return;
}
static void init_normal_area(struct page_table *pt)
{
unsigned long map_start, map_end, phys;
void *virt;
map_start = ihk_mc_get_memory_address(IHK_MC_GMA_MAP_START, 0);
map_end = ihk_mc_get_memory_address(IHK_MC_GMA_MAP_END, 0);
virt = (void *)MAP_ST_START + map_start;
kprintf("map_start = %lx, map_end = %lx, virt %lx\n",
map_start, map_end, virt);
for (phys = map_start; phys < map_end; phys += LARGE_PAGE_SIZE) {
if (set_pt_large_page(pt, virt, phys, PTATTR_WRITABLE) != 0) {
kprintf("%s: error setting mapping for 0x%lx\n",
__func__, virt);
}
virt += LARGE_PAGE_SIZE;
}
}
static void init_linux_kernel_mapping(struct page_table *pt)
{
unsigned long map_start, map_end, phys;
void *virt;
int nr_memory_chunks, chunk_id, numa_id;
/* In case of safe_kernel_map option (safe_kernel_map == 1),
* processing to prevent destruction of the memory area on Linux side
* is executed */
if (safe_kernel_map == 0) {
kprintf("Straight-map entire physical memory\n");
/* Map 2 TB for now */
map_start = 0;
map_end = 0x20000000000;
virt = (void *)LINUX_PAGE_OFFSET;
kprintf("Linux kernel virtual: 0x%lx - 0x%lx -> 0x%lx - 0x%lx\n",
LINUX_PAGE_OFFSET, LINUX_PAGE_OFFSET + map_end, 0, map_end);
for (phys = map_start; phys < map_end; phys += LARGE_PAGE_SIZE) {
if (set_pt_large_page(pt, virt, phys, PTATTR_WRITABLE) != 0) {
kprintf("%s: error setting mapping for 0x%lx\n", __FUNCTION__, virt);
}
virt += LARGE_PAGE_SIZE;
}
} else {
kprintf("Straight-map physical memory areas allocated to McKernel\n");
nr_memory_chunks = ihk_mc_get_nr_memory_chunks();
if (nr_memory_chunks == 0) {
kprintf("%s: ERROR: No memory chunk available.\n", __FUNCTION__);
return;
}
for (chunk_id = 0; chunk_id < nr_memory_chunks; chunk_id++) {
if (ihk_mc_get_memory_chunk(chunk_id, &map_start, &map_end, &numa_id)) {
kprintf("%s: ERROR: Memory chunk id (%d) out of range.\n", __FUNCTION__, chunk_id);
continue;
}
dkprintf("Linux kernel virtual: 0x%lx - 0x%lx -> 0x%lx - 0x%lx\n",
LINUX_PAGE_OFFSET + map_start, LINUX_PAGE_OFFSET + map_end, map_start, map_end);
virt = (void *)(LINUX_PAGE_OFFSET + map_start);
for (phys = map_start; phys < map_end; phys += LARGE_PAGE_SIZE, virt += LARGE_PAGE_SIZE) {
if (set_pt_large_page(pt, virt, phys, PTATTR_WRITABLE) != 0) {
kprintf("%s: set_pt_large_page() failed for 0x%lx\n", __FUNCTION__, virt);
}
}
}
}
}
void init_text_area(struct page_table *pt)
{
unsigned long __end, phys, virt;
@ -2624,17 +2682,19 @@ void init_page_table(void)
init_pt = ihk_mc_alloc_pages(1, IHK_MC_AP_CRITICAL);
ihk_mc_spinlock_init(&init_pt_lock);
memset(init_pt, 0, sizeof(PAGE_SIZE));
memset(init_pt, 0, sizeof(*init_pt));
/* Normal memory area */
init_normal_area(init_pt);
init_linux_kernel_mapping(init_pt);
init_fixed_area(init_pt);
init_low_area(init_pt);
init_text_area(init_pt);
init_vsyscall_area(init_pt);
load_page_table(init_pt);
kprintf("Page table is now at %p\n", init_pt);
init_pt_loaded = 1;
kprintf("Page table is now at 0x%lx\n", init_pt);
}
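The `memset()` change in this hunk matters because `sizeof(PAGE_SIZE)` is the size of the constant's type, not the page size. A minimal sketch of the difference, assuming a 512-entry table of `unsigned long` (the `sketch_`/`SKETCH_` names are hypothetical stand-ins, not McKernel definitions):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; the real definitions live in McKernel's headers. */
#define SKETCH_PAGE_SIZE  (1UL << 12)
#define SKETCH_PT_ENTRIES 512

struct sketch_page_table {
	unsigned long entry[SKETCH_PT_ENTRIES];
};

/* Bytes cleared by the old call: the size of the constant's type. */
static size_t sketch_old_clear_size(void)
{
	return sizeof(SKETCH_PAGE_SIZE);
}

/* Bytes cleared by the new call: the size of the whole table object. */
static size_t sketch_new_clear_size(const struct sketch_page_table *pt)
{
	return sizeof(*pt);
}

/* Zero a table the way the new line does. */
static void sketch_clear_table(struct sketch_page_table *pt)
{
	memset(pt, 0, sizeof(*pt));
}
```

With the old expression only the first few bytes of the freshly allocated table would be zeroed, leaving the remaining entries with whatever the allocator returned.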
extern void __reserve_arch_pages(unsigned long, unsigned long,
@ -2662,17 +2722,33 @@ void ihk_mc_reserve_arch_pages(struct ihk_page_allocator_desc *pa_allocator,
unsigned long virt_to_phys(void *v)
{
unsigned long va = (unsigned long)v;
if (va >= MAP_KERNEL_START) {
dkprintf("%s: MAP_KERNEL_START <= 0x%lx <= LINUX_PAGE_OFFSET\n",
__FUNCTION__, va);
return va - MAP_KERNEL_START + x86_kernel_phys_base;
} else {
}
else if (va >= LINUX_PAGE_OFFSET) {
return va - LINUX_PAGE_OFFSET;
}
else if (va >= MAP_FIXED_START) {
return va - MAP_FIXED_START;
}
else {
dkprintf("%s: MAP_ST_START <= 0x%lx <= MAP_FIXED_START\n",
__FUNCTION__, va);
return va - MAP_ST_START;
}
}
void *phys_to_virt(unsigned long p)
{
return (void *)(p + MAP_ST_START);
/* Before loading our own PT use straight mapping */
if (!init_pt_loaded) {
return (void *)(p + MAP_ST_START);
}
return (void *)(p + LINUX_PAGE_OFFSET);
}
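The new `virt_to_phys()` picks a base by testing regions from the highest address down, and `phys_to_virt()` switches base once the new page table is loaded. A rough sketch of that logic follows; the `SK_` constants and the load address are made-up values chosen only to preserve the ordering the diff implies, not the real McKernel layout:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, lowest region first; real values come from McKernel. */
#define SK_MAP_ST_START      0xffff800000000000ULL
#define SK_MAP_FIXED_START   0xffff860000000000ULL
#define SK_LINUX_PAGE_OFFSET 0xffff880000000000ULL
#define SK_MAP_KERNEL_START  0xffffffff80000000ULL

static uint64_t sk_kernel_phys_base = 0x1000000ULL; /* made-up load address */
static int sk_init_pt_loaded;                       /* mirrors init_pt_loaded */

/* Test the highest region first so each address hits exactly one branch. */
static uint64_t sk_virt_to_phys(uint64_t va)
{
	if (va >= SK_MAP_KERNEL_START)
		return va - SK_MAP_KERNEL_START + sk_kernel_phys_base;
	if (va >= SK_LINUX_PAGE_OFFSET)
		return va - SK_LINUX_PAGE_OFFSET;
	if (va >= SK_MAP_FIXED_START)
		return va - SK_MAP_FIXED_START;
	return va - SK_MAP_ST_START;
}

/* Before the new page table is loaded, only the straight mapping exists. */
static uint64_t sk_phys_to_virt(uint64_t pa)
{
	if (!sk_init_pt_loaded)
		return pa + SK_MAP_ST_START;
	return pa + SK_LINUX_PAGE_OFFSET;
}
```

Under these assumptions a physical address round-trips through both functions whether or not the flag is set, which is the property the added `LINUX_PAGE_OFFSET` branch appears to be there to keep.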
int copy_from_user(void *dst, const void *src, size_t siz)
@ -2840,17 +2916,12 @@ int read_process_vm(struct process_vm *vm, void *kdst, const void *usrc, size_t
return error;
}
#ifdef POSTK_DEBUG_TEMP_FIX_52 /* NUMA support(memory area determination) */
if (!is_mckernel_memory(pa)) {
#else
if (pa < ihk_mc_get_memory_address(IHK_MC_GMA_MAP_START, 0) ||
pa >= ihk_mc_get_memory_address(IHK_MC_GMA_MAP_END, 0)) {
#endif /* POSTK_DEBUG_TEMP_FIX_52 */
if (!is_mckernel_memory(pa, pa + cpsize)) {
dkprintf("%s: pa is outside of LWK memory, to: %p, pa: %p,"
"cpsize: %d\n", __FUNCTION__, to, pa, cpsize);
va = ihk_mc_map_virtual(pa, 1, PTATTR_ACTIVE);
memcpy(to, va, cpsize);
ihk_mc_unmap_virtual(va, 1, 1);
ihk_mc_unmap_virtual(va, 1);
}
else {
va = phys_to_virt(pa);
@ -2924,17 +2995,12 @@ int write_process_vm(struct process_vm *vm, void *udst, const void *ksrc, size_t
return error;
}
#ifdef POSTK_DEBUG_TEMP_FIX_52 /* NUMA support(memory area determination) */
if (!is_mckernel_memory(pa)) {
#else
if (pa < ihk_mc_get_memory_address(IHK_MC_GMA_MAP_START, 0) ||
pa >= ihk_mc_get_memory_address(IHK_MC_GMA_MAP_END, 0)) {
#endif /* POSTK_DEBUG_TEMP_FIX_52 */
if (!is_mckernel_memory(pa, pa + cpsize)) {
dkprintf("%s: pa is outside of LWK memory, from: %p,"
"pa: %p, cpsize: %d\n", __FUNCTION__, from, pa, cpsize);
va = ihk_mc_map_virtual(pa, 1, PTATTR_ACTIVE);
memcpy(va, from, cpsize);
ihk_mc_unmap_virtual(va, 1, 1);
ihk_mc_unmap_virtual(va, 1);
}
else {
va = phys_to_virt(pa);
@ -2995,17 +3061,12 @@ int patch_process_vm(struct process_vm *vm, void *udst, const void *ksrc, size_t
return error;
}
#ifdef POSTK_DEBUG_TEMP_FIX_52 /* NUMA support(memory area determination) */
if (!is_mckernel_memory(pa)) {
#else
if (pa < ihk_mc_get_memory_address(IHK_MC_GMA_MAP_START, 0) ||
pa >= ihk_mc_get_memory_address(IHK_MC_GMA_MAP_END, 0)) {
#endif /* POSTK_DEBUG_TEMP_FIX_52 */
if (!is_mckernel_memory(pa, pa + cpsize)) {
dkprintf("%s: pa is outside of LWK memory, from: %p,"
"pa: %p, cpsize: %d\n", __FUNCTION__, from, pa, cpsize);
va = ihk_mc_map_virtual(pa, 1, PTATTR_ACTIVE);
memcpy(va, from, cpsize);
ihk_mc_unmap_virtual(va, 1, 1);
ihk_mc_unmap_virtual(va, 1);
}
else {
va = phys_to_virt(pa);


@ -30,7 +30,7 @@ int ihk_mc_ikc_init_first_local(struct ihk_ikc_channel_desc *channel,
memset(channel, 0, sizeof(struct ihk_ikc_channel_desc));
mikc_queue_pages = ((2 * num_processors * MASTER_IKCQ_PKTSIZE)
mikc_queue_pages = ((4 * num_processors * MASTER_IKCQ_PKTSIZE)
+ (PAGE_SIZE - 1)) / PAGE_SIZE;
/* Place both sides in this side */


@ -16,20 +16,16 @@
#include <registers.h>
#include <mc_perf_event.h>
#include <config.h>
#include <debug.h>
extern unsigned int *x86_march_perfmap;
extern int running_on_kvm(void);
#ifdef POSTK_DEBUG_TEMP_FIX_31
int ihk_mc_perfctr_fixed_init(int counter, int mode);
#endif/*POSTK_DEBUG_TEMP_FIX_31*/
//#define PERFCTR_DEBUG
#ifdef PERFCTR_DEBUG
#define dkprintf(...) do { kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) do { kprintf(__VA_ARGS__); } while (0)
#else
#define dkprintf(...) do { } while (0)
#define ekprintf(...) do { kprintf(__VA_ARGS__); } while (0)
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
#define X86_CR4_PCE 0x00000100
@ -43,11 +39,11 @@ int ihk_mc_perfctr_fixed_init(int counter, int mode);
} \
} while(0)
int perf_counters_discovered = 0;
int X86_IA32_NUM_PERF_COUNTERS = 0;
unsigned long X86_IA32_PERF_COUNTERS_MASK = 0;
int X86_IA32_NUM_FIXED_PERF_COUNTERS = 0;
unsigned long X86_IA32_FIXED_PERF_COUNTERS_MASK = 0;
int perf_counters_discovered;
int NUM_PERF_COUNTERS;
unsigned long PERF_COUNTERS_MASK;
int NUM_FIXED_PERF_COUNTERS;
unsigned long FIXED_PERF_COUNTERS_MASK;
void x86_init_perfctr(void)
{
@ -78,17 +74,17 @@ void x86_init_perfctr(void)
op = 0x0a;
asm volatile("cpuid" : "=a"(eax),"=b"(ebx),"=c"(ecx),"=d"(edx):"a"(op));
X86_IA32_NUM_PERF_COUNTERS = ((eax & 0xFF00) >> 8);
X86_IA32_PERF_COUNTERS_MASK = (1 << X86_IA32_NUM_PERF_COUNTERS) - 1;
NUM_PERF_COUNTERS = ((eax & 0xFF00) >> 8);
PERF_COUNTERS_MASK = (1 << NUM_PERF_COUNTERS) - 1;
X86_IA32_NUM_FIXED_PERF_COUNTERS = (edx & 0x0F);
X86_IA32_FIXED_PERF_COUNTERS_MASK =
((1UL << X86_IA32_NUM_FIXED_PERF_COUNTERS) - 1) <<
X86_IA32_BASE_FIXED_PERF_COUNTERS;
NUM_FIXED_PERF_COUNTERS = (edx & 0x0F);
FIXED_PERF_COUNTERS_MASK =
((1UL << NUM_FIXED_PERF_COUNTERS) - 1) <<
BASE_FIXED_PERF_COUNTERS;
perf_counters_discovered = 1;
kprintf("X86_IA32_NUM_PERF_COUNTERS: %d, X86_IA32_NUM_FIXED_PERF_COUNTERS: %d\n",
X86_IA32_NUM_PERF_COUNTERS, X86_IA32_NUM_FIXED_PERF_COUNTERS);
kprintf("NUM_PERF_COUNTERS: %d, NUM_FIXED_PERF_COUNTERS: %d\n",
NUM_PERF_COUNTERS, NUM_FIXED_PERF_COUNTERS);
}
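The counter counts and masks above are decoded from CPUID leaf 0x0A. The decoding can be sketched without the `cpuid` instruction by passing the register values in. Assumptions here: fixed counter 0 sits at bit 32 (the value of `BASE_FIXED_PERF_COUNTERS` is not visible in this diff), and the clamp is this sketch's own guard, not something the patch does:

```c
#include <assert.h>
#include <stdint.h>

#define SK_BASE_FIXED_PERF_COUNTERS 32 /* assumed bit position of fixed counter 0 */

struct sk_pmu_info {
	int num_generic;       /* general-purpose counters per logical CPU */
	int num_fixed;         /* fixed-function counters */
	uint64_t generic_mask; /* bits [num_generic-1:0] */
	uint64_t fixed_mask;   /* bits starting at SK_BASE_FIXED_PERF_COUNTERS */
};

/* Decode the EAX/EDX values returned by CPUID leaf 0x0A, as the hunk does. */
static struct sk_pmu_info sk_decode_cpuid_0a(uint32_t eax, uint32_t edx)
{
	struct sk_pmu_info info;

	info.num_generic = (int)((eax & 0xFF00u) >> 8);
	info.num_fixed = (int)(edx & 0x0Fu);

	/* Sketch-only guard: keep generic bits below the fixed-counter base. */
	if (info.num_generic > SK_BASE_FIXED_PERF_COUNTERS)
		info.num_generic = SK_BASE_FIXED_PERF_COUNTERS;

	/* 64-bit shifts avoid overflowing a plain int for large counts. */
	info.generic_mask = (UINT64_C(1) << info.num_generic) - 1;
	info.fixed_mask = ((UINT64_C(1) << info.num_fixed) - 1)
		<< SK_BASE_FIXED_PERF_COUNTERS;
	return info;
}
```

The two masks never overlap under this layout, which is what lets later code classify a counter with a single bit test against each mask.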
/* Clear Fixed Counter Control */
@ -97,20 +93,20 @@ void x86_init_perfctr(void)
wrmsr(MSR_PERF_FIXED_CTRL, value);
/* Clear Generic Counter Control */
for(i = 0; i < X86_IA32_NUM_PERF_COUNTERS; i++) {
for (i = 0; i < NUM_PERF_COUNTERS; i++) {
wrmsr(MSR_IA32_PERFEVTSEL0 + i, 0);
}
/* Enable PMC Control */
value = rdmsr(MSR_PERF_GLOBAL_CTRL);
value |= X86_IA32_PERF_COUNTERS_MASK;
value |= X86_IA32_FIXED_PERF_COUNTERS_MASK;
value |= PERF_COUNTERS_MASK;
value |= FIXED_PERF_COUNTERS_MASK;
wrmsr(MSR_PERF_GLOBAL_CTRL, value);
}
static int set_perfctr_x86_direct(int counter, int mode, unsigned int value)
{
if (counter < 0 || counter >= X86_IA32_NUM_PERF_COUNTERS) {
if (counter < 0 || counter >= NUM_PERF_COUNTERS) {
return -EINVAL;
}
@ -149,13 +145,14 @@ static int set_pmc_x86_direct(int counter, long val)
val &= 0x000000ffffffffff; // 40bit Mask
cnt_bit = 1UL << counter;
if ( cnt_bit & X86_IA32_PERF_COUNTERS_MASK ) {
if (cnt_bit & PERF_COUNTERS_MASK) {
// set generic pmc
wrmsr(MSR_IA32_PMC0 + counter, val);
}
else if ( cnt_bit & X86_IA32_FIXED_PERF_COUNTERS_MASK ) {
else if (cnt_bit & FIXED_PERF_COUNTERS_MASK) {
// set fixed pmc
wrmsr(MSR_IA32_FIXED_CTR0 + counter - X86_IA32_BASE_FIXED_PERF_COUNTERS, val);
wrmsr(MSR_IA32_FIXED_CTR0 +
counter - BASE_FIXED_PERF_COUNTERS, val);
}
else {
return -EINVAL;
@ -175,10 +172,10 @@ static int set_fixed_counter(int counter, int mode)
{
unsigned long value = 0;
unsigned int ctr_mask = 0xf;
int counter_idx = counter - X86_IA32_BASE_FIXED_PERF_COUNTERS ;
int counter_idx = counter - BASE_FIXED_PERF_COUNTERS;
unsigned int set_val = 0;
if (counter_idx < 0 || counter_idx >= X86_IA32_NUM_FIXED_PERF_COUNTERS) {
if (counter_idx < 0 || counter_idx >= NUM_FIXED_PERF_COUNTERS) {
return -EINVAL;
}
@ -208,14 +205,13 @@ int ihk_mc_perfctr_init_raw(int counter, uint64_t config, int mode)
int ihk_mc_perfctr_init_raw(int counter, unsigned int code, int mode)
#endif /*POSTK_DEBUG_TEMP_FIX_29*/
{
#ifdef POSTK_DEBUG_TEMP_FIX_31
// PAPI_REF_CYC counted by fixed counter
if (counter >= X86_IA32_BASE_FIXED_PERF_COUNTERS) {
if (counter >= BASE_FIXED_PERF_COUNTERS &&
counter < BASE_FIXED_PERF_COUNTERS + NUM_FIXED_PERF_COUNTERS) {
return ihk_mc_perfctr_fixed_init(counter, mode);
}
#endif /*POSTK_DEBUG_TEMP_FIX_31*/
if (counter < 0 || counter >= X86_IA32_NUM_PERF_COUNTERS) {
if (counter < 0 || counter >= NUM_PERF_COUNTERS) {
return -EINVAL;
}
@ -248,7 +244,7 @@ int ihk_mc_perfctr_init(int counter, enum ihk_perfctr_type type, int mode)
}
#endif /*POSTK_DEBUG_TEMP_FIX_29*/
if (counter < 0 || counter >= X86_IA32_NUM_PERF_COUNTERS) {
if (counter < 0 || counter >= NUM_PERF_COUNTERS) {
return -EINVAL;
}
if (type < 0 || type >= PERFCTR_MAX_TYPE) {
@ -300,18 +296,11 @@ int ihk_mc_perfctr_set_extra(struct mc_perf_event *event)
extern void x86_march_perfctr_start(unsigned long counter_mask);
#endif
#ifdef POSTK_DEBUG_TEMP_FIX_30
int ihk_mc_perfctr_start(int counter)
#else
int ihk_mc_perfctr_start(unsigned long counter_mask)
#endif /*POSTK_DEBUG_TEMP_FIX_30*/
{
int ret = 0;
unsigned long value = 0;
unsigned long mask = X86_IA32_PERF_COUNTERS_MASK | X86_IA32_FIXED_PERF_COUNTERS_MASK;
#ifdef POSTK_DEBUG_TEMP_FIX_30
unsigned long counter_mask = 1UL << counter;
#endif /*POSTK_DEBUG_TEMP_FIX_30*/
unsigned long mask = PERF_COUNTERS_MASK | FIXED_PERF_COUNTERS_MASK;
PERFCTR_CHKANDJUMP(counter_mask & ~mask, "counter_mask out of range", -EINVAL);
@ -328,18 +317,11 @@ int ihk_mc_perfctr_start(unsigned long counter_mask)
goto fn_exit;
}
#ifdef POSTK_DEBUG_TEMP_FIX_30
int ihk_mc_perfctr_stop(int counter)
#else
int ihk_mc_perfctr_stop(unsigned long counter_mask)
#endif/*POSTK_DEBUG_TEMP_FIX_30*/
{
int ret = 0;
unsigned long value;
unsigned long mask = X86_IA32_PERF_COUNTERS_MASK | X86_IA32_FIXED_PERF_COUNTERS_MASK;
#ifdef POSTK_DEBUG_TEMP_FIX_30
unsigned long counter_mask = 1UL << counter;
#endif/*POSTK_DEBUG_TEMP_FIX_30*/
unsigned long mask = PERF_COUNTERS_MASK | FIXED_PERF_COUNTERS_MASK;
PERFCTR_CHKANDJUMP(counter_mask & ~mask, "counter_mask out of range", -EINVAL);
@ -376,10 +358,10 @@ int ihk_mc_perfctr_fixed_init(int counter, int mode)
{
unsigned long value = 0;
unsigned int ctr_mask = 0xf;
int counter_idx = counter - X86_IA32_BASE_FIXED_PERF_COUNTERS ;
int counter_idx = counter - BASE_FIXED_PERF_COUNTERS;
unsigned int set_val = 0;
if (counter_idx < 0 || counter_idx >= X86_IA32_NUM_FIXED_PERF_COUNTERS) {
if (counter_idx < 0 || counter_idx >= NUM_FIXED_PERF_COUNTERS) {
return -EINVAL;
}
@ -420,7 +402,7 @@ int ihk_mc_perfctr_read_mask(unsigned long counter_mask, unsigned long *value)
{
int i, j;
for (i = 0, j = 0; i < X86_IA32_NUM_PERF_COUNTERS && counter_mask;
for (i = 0, j = 0; i < NUM_PERF_COUNTERS && counter_mask;
i++, counter_mask >>= 1) {
if (counter_mask & 1) {
value[j++] = rdpmc(i);
@ -440,13 +422,14 @@ unsigned long ihk_mc_perfctr_read(int counter)
cnt_bit = 1UL << counter;
if ( cnt_bit & X86_IA32_PERF_COUNTERS_MASK ) {
if (cnt_bit & PERF_COUNTERS_MASK) {
// read generic pmc
retval = rdpmc(counter);
}
else if ( cnt_bit & X86_IA32_FIXED_PERF_COUNTERS_MASK ) {
else if (cnt_bit & FIXED_PERF_COUNTERS_MASK) {
// read fixed pmc
retval = rdpmc((1 << 30) + (counter - X86_IA32_BASE_FIXED_PERF_COUNTERS));
retval = rdpmc((1 << 30) +
(counter - BASE_FIXED_PERF_COUNTERS));
}
else {
retval = -EINVAL;
@ -468,12 +451,12 @@ unsigned long ihk_mc_perfctr_read_msr(int counter)
cnt_bit = 1UL << counter;
if ( cnt_bit & X86_IA32_PERF_COUNTERS_MASK ) {
if (cnt_bit & PERF_COUNTERS_MASK) {
// read generic pmc
idx = MSR_IA32_PMC0 + counter;
retval = (unsigned long) rdmsr(idx);
}
else if ( cnt_bit & X86_IA32_FIXED_PERF_COUNTERS_MASK ) {
else if (cnt_bit & FIXED_PERF_COUNTERS_MASK) {
// read fixed pmc
idx = MSR_IA32_FIXED_CTR0 + counter;
retval = (unsigned long) rdmsr(idx);
@ -506,8 +489,8 @@ int ihk_mc_perfctr_alloc_counter(unsigned int *type, unsigned long *config, unsi
}
// find avail generic counter
for(i = 0; i < X86_IA32_NUM_PERF_COUNTERS; i++) {
if(!(pmc_status & (1 << i))) {
for (i = 0; i < NUM_PERF_COUNTERS; i++) {
if (!(pmc_status & (1 << i))) {
ret = i;
break;
}


@ -31,12 +31,11 @@
#include <page.h>
#include <limits.h>
#include <syscall.h>
#include <debug.h>
void terminate_mcexec(int, int);
extern long do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact);
long syscall(int num, ihk_mc_user_context_t *ctx);
void set_signal(int sig, void *regs0, siginfo_t *info);
void check_signal(unsigned long rc, void *regs0, int num);
extern unsigned long do_fork(int, unsigned long, unsigned long, unsigned long,
unsigned long, unsigned long, unsigned long);
extern int get_xsave_size();
@ -45,11 +44,8 @@ extern uint64_t get_xsave_mask();
//#define DEBUG_PRINT_SC
#ifdef DEBUG_PRINT_SC
#define dkprintf kprintf
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
uintptr_t debug_constants[] = {
@ -92,33 +88,45 @@ static ptrdiff_t vdso_offset;
extern int num_processors;
int obtain_clone_cpuid(cpu_set_t *cpu_set) {
int obtain_clone_cpuid(cpu_set_t *cpu_set, int use_last) {
int min_queue_len = -1;
int cpu, min_cpu = -1;
int cpu, min_cpu = -1, uti_cpu = -1;
unsigned long irqstate;
irqstate = ihk_mc_spinlock_lock(&runq_reservation_lock);
/* Find the first allowed core with the shortest run queue */
for (cpu = 0; cpu < num_processors; ++cpu) {
struct cpu_local_var *v;
unsigned long irqstate;
if (!CPU_ISSET(cpu, cpu_set)) continue;
v = get_cpu_local_var(cpu);
irqstate = ihk_mc_spinlock_lock(&v->runq_lock);
if (min_queue_len == -1 || v->runq_len < min_queue_len) {
min_queue_len = v->runq_len;
ihk_mc_spinlock_lock_noirq(&v->runq_lock);
dkprintf("%s: cpu=%d,runq_len=%d,runq_reserved=%d\n", __FUNCTION__, cpu, v->runq_len, v->runq_reserved);
if (min_queue_len == -1 || v->runq_len + v->runq_reserved < min_queue_len) {
min_queue_len = v->runq_len + v->runq_reserved;
min_cpu = cpu;
}
ihk_mc_spinlock_unlock(&v->runq_lock, irqstate);
/* Record the last tie CPU */
if (min_cpu != cpu && v->runq_len + v->runq_reserved == min_queue_len) {
uti_cpu = cpu;
}
dkprintf("%s: cpu=%d,runq_len=%d,runq_reserved=%d,min_cpu=%d,uti_cpu=%d\n", __FUNCTION__, cpu, v->runq_len, v->runq_reserved, min_cpu, uti_cpu);
ihk_mc_spinlock_unlock_noirq(&v->runq_lock);
#if 0
if (min_queue_len == 0)
break;
#endif
}
min_cpu = use_last ? uti_cpu : min_cpu;
if (min_cpu != -1) {
if (get_cpu_local_var(min_cpu)->status != CPU_STATUS_RESERVED)
get_cpu_local_var(min_cpu)->status = CPU_STATUS_RESERVED;
__sync_fetch_and_add(&get_cpu_local_var(min_cpu)->runq_reserved, 1);
}
ihk_mc_spinlock_unlock(&runq_reservation_lock, irqstate);
return min_cpu;
}
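The selection loop in `obtain_clone_cpuid()` now counts reserved slots as load and also remembers the last CPU that ties with the minimum. A lock-free sketch of just the selection logic, with a hypothetical `sk_cpu_load` array standing in for the per-CPU variables and the CPU set:

```c
#include <assert.h>

/* Hypothetical per-CPU snapshot; the real code reads cpu_local_var fields. */
struct sk_cpu_load {
	int allowed;       /* stands in for CPU_ISSET(cpu, cpu_set) */
	int runq_len;      /* threads already queued */
	int runq_reserved; /* slots reserved by clones still in flight */
};

/* Return the first least-loaded CPU, or the last tying CPU if use_last. */
static int sk_pick_cpu(const struct sk_cpu_load *cpus, int ncpus, int use_last)
{
	int min_load = -1;
	int min_cpu = -1, tie_cpu = -1;
	int cpu;

	for (cpu = 0; cpu < ncpus; ++cpu) {
		int load;

		if (!cpus[cpu].allowed)
			continue;

		load = cpus[cpu].runq_len + cpus[cpu].runq_reserved;
		if (min_load == -1 || load < min_load) {
			min_load = load;
			min_cpu = cpu;
		}
		/* Record the last CPU that ties with the current minimum */
		if (min_cpu != cpu && load == min_load)
			tie_cpu = cpu;
	}
	return use_last ? tie_cpu : min_cpu;
}
```

As in the patch, asking for the last tie when no tie exists yields -1 in this sketch, so a caller passing `use_last` has to be prepared for that result.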
@ -251,7 +259,7 @@ SYSCALL_DECLARE(rt_sigreturn)
info.si_code = TRAP_TRACE;
set_signal(SIGTRAP, regs, &info);
check_need_resched();
check_signal(0, regs, 0);
check_signal(0, regs, -1);
}
if(ksigsp.fpregs && xsavesize){
@ -276,7 +284,6 @@ SYSCALL_DECLARE(rt_sigreturn)
}
extern struct cpu_local_var *clv;
extern unsigned long do_kill(struct thread *thread, int pid, int tid, int sig, struct siginfo *info, int ptracecont);
extern void interrupt_syscall(struct thread *, int sig);
extern void terminate(int, int);
extern int num_processors;
@ -530,23 +537,32 @@ void ptrace_report_signal(struct thread *thread, int sig)
dkprintf("ptrace_report_signal, tid=%d, pid=%d\n", thread->tid, thread->proc->pid);
mcs_rwlock_writer_lock(&proc->update_lock, &lock);
if(!(proc->ptrace & PT_TRACED)){
if (!(thread->ptrace & PT_TRACED)) {
mcs_rwlock_writer_unlock(&proc->update_lock, &lock);
return;
}
thread->exit_status = sig;
/* Transition thread state */
proc->status = PS_TRACED;
thread->exit_status = sig;
thread->status = PS_TRACED;
proc->ptrace &= ~PT_TRACE_SYSCALL;
if (sig == SIGSTOP || sig == SIGTSTP ||
sig == SIGTTIN || sig == SIGTTOU) {
proc->signal_flags |= SIGNAL_STOP_STOPPED;
} else {
proc->signal_flags &= ~SIGNAL_STOP_STOPPED;
}
parent_pid = proc->parent->pid;
thread->ptrace &= ~PT_TRACE_SYSCALL;
save_debugreg(thread->ptrace_debugreg);
if (sig == SIGSTOP || sig == SIGTSTP ||
sig == SIGTTIN || sig == SIGTTOU) {
thread->signal_flags |= SIGNAL_STOP_STOPPED;
}
else {
thread->signal_flags &= ~SIGNAL_STOP_STOPPED;
}
if (thread == proc->main_thread) {
proc->status = PS_DELAY_TRACED;
parent_pid = proc->parent->pid;
}
else {
parent_pid = thread->report_proc->pid;
waitq_wakeup(&thread->report_proc->waitpid_q);
}
mcs_rwlock_writer_unlock(&proc->update_lock, &lock);
memset(&info, '\0', sizeof info);
@ -555,8 +571,6 @@ void ptrace_report_signal(struct thread *thread, int sig)
info._sifields._sigchld.si_pid = thread->tid;
info._sifields._sigchld.si_status = thread->exit_status;
do_kill(cpu_local_var(current), parent_pid, -1, SIGCHLD, &info, 0);
/* Wake parent (if sleeping in wait4()) */
waitq_wakeup(&proc->parent->waitpid_q);
dkprintf("ptrace_report_signal,sleeping\n");
/* Sleep */
@ -569,9 +583,8 @@ ptrace_arch_prctl(int pid, long code, long addr)
{
long rc = -EIO;
struct thread *child;
struct mcs_rwlock_node_irqsave lock;
child = find_thread(pid, pid, &lock);
child = find_thread(pid, pid);
if (!child)
return -ESRCH;
if (child->proc->status & (PS_TRACED | PS_STOPPED)) {
@ -613,7 +626,7 @@ ptrace_arch_prctl(int pid, long code, long addr)
break;
}
}
thread_unlock(child, &lock);
thread_unlock(child);
return rc;
}
@ -635,11 +648,13 @@ arch_ptrace(long request, int pid, long addr, long data)
static int
isrestart(int num, unsigned long rc, int sig, int restart)
{
if(sig == SIGKILL || sig == SIGSTOP)
if (sig == SIGKILL || sig == SIGSTOP)
return 0;
if(num == 0 || rc != -EINTR)
if (num < 0 || rc != -EINTR)
return 0;
switch(num){
if (sig == SIGCHLD)
return 1;
switch (num) {
case __NR_pause:
case __NR_rt_sigsuspend:
case __NR_rt_sigtimedwait:
@ -660,14 +675,12 @@ isrestart(int num, unsigned long rc, int sig, int restart)
case __NR_io_getevents:
return 0;
}
if(sig == SIGCHLD)
return 1;
if(restart)
if (restart)
return 1;
return 0;
}
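Two things change in `isrestart()`: "not on a system call exit" is now signalled by a negative `num` instead of 0, and the SIGCHLD test moves ahead of the switch. The first point matters because 0 is a valid system call number on x86_64 (`read`). A trimmed sketch, where the `SK_NR_` constants are hypothetical local copies and the switch lists only `pause`, not the full set from the source:

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>

/* Local copies of two x86_64 syscall numbers, for illustration only. */
#define SK_NR_READ  0
#define SK_NR_PAUSE 34

/* Decide whether an interrupted system call should be restarted. */
static int sk_isrestart(int num, long rc, int sig, int sa_restart)
{
	if (sig == SIGKILL || sig == SIGSTOP)
		return 0;
	/* num < 0 means "not called on system call exit" */
	if (num < 0 || rc != -EINTR)
		return 0;
	/* SIGCHLD restarts even the calls listed below */
	if (sig == SIGCHLD)
		return 1;
	switch (num) {
	case SK_NR_PAUSE: /* the source lists more never-restarted calls */
		return 0;
	default:
		break;
	}
	return sa_restart ? 1 : 0;
}
```

With the old `num == 0` test, an interrupted `read` could never be restarted; in this sketch it follows `SA_RESTART` like any other call.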
void
int
do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pending *pending, int num)
{
struct x86_user_context *regs = regs0;
@ -679,14 +692,15 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
int ptraceflag = 0;
struct mcs_rwlock_node_irqsave lock;
struct mcs_rwlock_node_irqsave mcs_rw_node;
int restart = 0;
for(w = pending->sigmask.__val[0], sig = 0; w; sig++, w >>= 1);
dkprintf("do_signal(): tid=%d, pid=%d, sig=%d\n", thread->tid, proc->pid, sig);
orgsig = sig;
if((proc->ptrace & PT_TRACED) &&
pending->ptracecont == 0 &&
sig != SIGKILL) {
if ((thread->ptrace & PT_TRACED) &&
pending->ptracecont == 0 &&
sig != SIGKILL) {
ptraceflag = 1;
sig = SIGSTOP;
}
@ -707,7 +721,7 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
if(k->sa.sa_handler == SIG_IGN){
kfree(pending);
mcs_rwlock_writer_unlock(&thread->sigcommon->lock, &mcs_rw_node);
return;
goto out;
}
else if(k->sa.sa_handler){
unsigned long *usp; /* user stack */
@ -757,9 +771,8 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
memcpy(&ksigsp.sigstack, &thread->sigstack, sizeof(stack_t));
ksigsp.sigrc = rc;
ksigsp.num = num;
ksigsp.restart = isrestart(num, rc, sig, k->sa.sa_flags & SA_RESTART);
if(num != 0 && rc == -EINTR && sig == SIGCHLD)
ksigsp.restart = 1;
restart = isrestart(num, rc, sig, k->sa.sa_flags & SA_RESTART);
ksigsp.restart = restart;
if(xsavesize){
uint64_t xsave_mask = get_xsave_mask();
unsigned int low = (unsigned int)xsave_mask;
@ -772,7 +785,7 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
kfree(_kfpregs);
kprintf("do_signal,no space available\n");
terminate(0, sig);
return;
goto out;
}
kfpregs = (void *)((((unsigned long)_kfpregs) + 63) & ~63);
memset(kfpregs, '\0', xsavesize);
@ -782,7 +795,7 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
kfree(_kfpregs);
kprintf("do_signal,write_process_vm failed\n");
terminate(0, sig);
return;
goto out;
}
ksigsp.fpregs = (void *)fpregs;
kfree(_kfpregs);
@ -794,7 +807,7 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
mcs_rwlock_writer_unlock(&thread->sigcommon->lock, &mcs_rw_node);
kprintf("do_signal,write_process_vm failed\n");
terminate(0, sig);
return;
goto out;
}
usp = (unsigned long *)sigsp;
@ -824,12 +837,13 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
info.si_code = TRAP_TRACE;
set_signal(SIGTRAP, regs, &info);
check_need_resched();
check_signal(0, regs, 0);
check_signal(0, regs, -1);
}
}
else {
int coredumped = 0;
siginfo_t info;
int ptc = pending->ptracecont;
if(ptraceflag){
if(thread->ptrace_recvsig)
@ -856,25 +870,37 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
info.si_code = CLD_STOPPED;
info._sifields._sigchld.si_pid = thread->proc->pid;
info._sifields._sigchld.si_status = (sig << 8) | 0x7f;
do_kill(cpu_local_var(current), thread->proc->parent->pid, -1, SIGCHLD, &info, 0);
dkprintf("do_signal,SIGSTOP,changing state\n");
if (ptc == 2 &&
thread != thread->proc->main_thread) {
thread->signal_flags =
SIGNAL_STOP_STOPPED;
thread->status = PS_STOPPED;
thread->exit_status = SIGSTOP;
do_kill(thread,
thread->report_proc->pid, -1,
SIGCHLD, &info, 0);
waitq_wakeup(
&thread->report_proc->waitpid_q);
}
else {
/* Update thread state in fork tree */
mcs_rwlock_writer_lock(
&proc->update_lock, &lock);
proc->group_exit_status = SIGSTOP;
/* Update thread state in fork tree */
mcs_rwlock_writer_lock(&proc->update_lock, &lock);
proc->group_exit_status = SIGSTOP;
/* Reap and set new signal_flags */
proc->main_thread->signal_flags =
SIGNAL_STOP_STOPPED;
/* Reap and set new signal_flags */
proc->signal_flags = SIGNAL_STOP_STOPPED;
proc->status = PS_DELAY_STOPPED;
thread->status = PS_STOPPED;
mcs_rwlock_writer_unlock(
&proc->update_lock, &lock);
proc->status = PS_STOPPED;
thread->status = PS_STOPPED;
mcs_rwlock_writer_unlock(&proc->update_lock, &lock);
/* Wake up the parent sleeping in wait4() */
waitq_wakeup(&proc->parent->waitpid_q);
dkprintf("do_signal(): pid: %d, tid: %d SIGSTOP, sleeping\n",
proc->pid, thread->tid);
do_kill(thread,
thread->proc->parent->pid, -1,
SIGCHLD, &info, 0);
}
/* Sleep */
schedule();
dkprintf("SIGSTOP(): woken up\n");
@ -882,19 +908,28 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
break;
case SIGTRAP:
dkprintf("do_signal,SIGTRAP\n");
if(!(proc->ptrace & PT_TRACED)) {
if (!(thread->ptrace & PT_TRACED)) {
goto core;
}
/* Update thread state in fork tree */
mcs_rwlock_writer_lock(&proc->update_lock, &lock);
thread->exit_status = SIGTRAP;
proc->status = PS_TRACED;
thread->status = PS_TRACED;
mcs_rwlock_writer_unlock(&proc->update_lock, &lock);
/* Wake up the parent sleeping in wait4() */
waitq_wakeup(&thread->proc->parent->waitpid_q);
if (thread == proc->main_thread) {
mcs_rwlock_writer_lock(&proc->update_lock,
&lock);
proc->group_exit_status = SIGTRAP;
proc->status = PS_DELAY_TRACED;
mcs_rwlock_writer_unlock(&proc->update_lock,
&lock);
do_kill(thread, thread->proc->parent->pid, -1,
SIGCHLD, &info, 0);
}
else {
do_kill(thread, thread->report_proc->pid, -1,
SIGCHLD, &info, 0);
waitq_wakeup(&thread->report_proc->waitpid_q);
}
/* Sleep */
dkprintf("do_signal,SIGTRAP,sleeping\n");
@ -909,7 +944,7 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
info._sifields._sigchld.si_pid = proc->pid;
info._sifields._sigchld.si_status = 0x0000ffff;
do_kill(cpu_local_var(current), proc->parent->pid, -1, SIGCHLD, &info, 0);
proc->signal_flags = SIGNAL_STOP_CONTINUED;
proc->main_thread->signal_flags = SIGNAL_STOP_CONTINUED;
proc->status = PS_RUNNING;
dkprintf("do_signal,SIGCONT,do nothing\n");
break;
@ -938,6 +973,8 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
break;
}
}
out:
return restart;
}
static struct sig_pending *
@ -957,10 +994,12 @@ getsigpending(struct thread *thread, int delflag){
lock = &thread->sigcommon->lock;
head = &thread->sigcommon->sigpending;
for(;;) {
if (delflag)
if (delflag) {
mcs_rwlock_writer_lock(lock, &mcs_rw_node);
else
}
else {
mcs_rwlock_reader_lock(lock, &mcs_rw_node);
}
list_for_each_entry_safe(pending, next, head, list){
for(x = pending->sigmask.__val[0], sig = 0; x; sig++, x >>= 1);
@ -973,19 +1012,23 @@ getsigpending(struct thread *thread, int delflag){
if(delflag)
list_del(&pending->list);
if (delflag)
if (delflag) {
mcs_rwlock_writer_unlock(lock, &mcs_rw_node);
else
}
else {
mcs_rwlock_reader_unlock(lock, &mcs_rw_node);
}
return pending;
}
}
}
if (delflag)
if (delflag) {
mcs_rwlock_writer_unlock(lock, &mcs_rw_node);
else
}
else {
mcs_rwlock_reader_unlock(lock, &mcs_rw_node);
}
if(lock == &thread->sigpendinglock)
return NULL;
@ -1000,6 +1043,11 @@ getsigpending(struct thread *thread, int delflag){
struct sig_pending *
hassigpending(struct thread *thread)
{
if (list_empty(&thread->sigpending) &&
list_empty(&thread->sigcommon->sigpending)) {
return NULL;
}
return getsigpending(thread, 0);
}
@ -1017,6 +1065,12 @@ void save_syscall_return_value(int num, unsigned long rc)
return;
}
/** \brief Check for arrived signals and process them
*
* @param rc return value of syscall
* @param regs0 context
* @param num syscall number (-1: Not called on exiting system call)
*/
void
check_signal(unsigned long rc, void *regs0, int num)
{
@ -1050,6 +1104,11 @@ check_signal(unsigned long rc, void *regs0, int num)
goto out;
}
if (list_empty(&thread->sigpending) &&
list_empty(&thread->sigcommon->sigpending)) {
goto out;
}
for(;;){
pending = getsigpending(thread, 1);
if(!pending) {
@ -1057,7 +1116,9 @@ check_signal(unsigned long rc, void *regs0, int num)
goto out;
}
do_signal(rc, regs, thread, pending, num);
if (do_signal(rc, regs, thread, pending, num)) {
num = -1;
}
}
out:
@ -1137,7 +1198,7 @@ check_sig_pending_thread(struct thread *thread)
}
void
check_sig_pending()
check_sig_pending(void)
{
struct thread *thread;
struct cpu_local_var *v;
@ -1158,7 +1219,7 @@ repeat:
continue;
}
if (thread->proc->exit_status & 0x0000000100000000L) {
if (thread->proc->group_exit_status & 0x0000000100000000L) {
continue;
}
@ -1367,7 +1428,8 @@ done:
return 0;
}
if (tthread->thread_offloaded) {
/* Forward signal to Linux by interrupt_syscall mechanism */
if (tthread->uti_state == UTI_STATE_RUNNING_IN_LINUX) {
if (!tthread->proc->nohost) {
interrupt_syscall(tthread, sig);
}
@ -1384,10 +1446,10 @@ done:
in check_signal */
rc = 0;
k = tthread->sigcommon->action + sig - 1;
if((sig != SIGKILL && (tproc->ptrace & PT_TRACED)) ||
(k->sa.sa_handler != (void *)1 &&
(k->sa.sa_handler != NULL ||
(sig != SIGCHLD && sig != SIGURG)))){
if ((sig != SIGKILL && (tthread->ptrace & PT_TRACED)) ||
(k->sa.sa_handler != (void *)1 &&
(k->sa.sa_handler != NULL ||
(sig != SIGCHLD && sig != SIGURG)))) {
struct sig_pending *pending = NULL;
if (sig < 33) { // SIGRTMIN - SIGRTMAX
list_for_each_entry(pending, head, list){
@ -1471,7 +1533,7 @@ set_signal(int sig, void *regs0, siginfo_t *info)
SYSCALL_DECLARE(mmap)
{
const int supported_flags = 0
const unsigned int supported_flags = 0
| MAP_SHARED // 01
| MAP_PRIVATE // 02
| MAP_FIXED // 10
@ -1479,7 +1541,7 @@ SYSCALL_DECLARE(mmap)
| MAP_LOCKED // 2000
| MAP_POPULATE // 8000
| MAP_HUGETLB // 00040000
| (0x3F << MAP_HUGE_SHIFT) // FC000000
| (0x3FU << MAP_HUGE_SHIFT) // FC000000
;
const int ignored_flags = 0
#ifdef USE_NOCACHE_MMAP
@ -1498,7 +1560,7 @@ SYSCALL_DECLARE(mmap)
| MAP_NONBLOCK // 00010000
;
const intptr_t addr0 = ihk_mc_syscall_arg0(ctx);
const uintptr_t addr0 = ihk_mc_syscall_arg0(ctx);
const size_t len0 = ihk_mc_syscall_arg1(ctx);
const int prot = ihk_mc_syscall_arg2(ctx);
const int flags0 = ihk_mc_syscall_arg3(ctx);
@ -1507,7 +1569,7 @@ SYSCALL_DECLARE(mmap)
struct thread *thread = cpu_local_var(current);
struct vm_regions *region = &thread->vm->region;
int error;
intptr_t addr = 0;
uintptr_t addr = 0;
size_t len;
int flags = flags0;
size_t pgsize;
@ -1570,8 +1632,9 @@ SYSCALL_DECLARE(mmap)
goto out;
}
if ((flags & MAP_FIXED) && ((addr < region->user_start)
|| (region->user_end <= addr))) {
if (addr < region->user_start
|| region->user_end <= addr
|| len > (region->user_end - region->user_start)) {
ekprintf("sys_mmap(%lx,%lx,%x,%x,%x,%lx):ENOMEM\n",
addr0, len0, prot, flags0, fd, off0);
error = -ENOMEM;
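The `mmap` hunks switch the address to an unsigned type, make the huge-page mask an unsigned constant, and extend the range check with a length test that no longer depends on `MAP_FIXED`. A small sketch of the resulting check; `sk_region` is a hypothetical stand-in for the relevant `struct vm_regions` fields, and the shift of 26 matches the `FC000000` comment in the source:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MAP_HUGE_SHIFT 26
/* 0x3F << 26 does not fit in a signed int, hence the U suffix. */
#define SK_HUGE_MASK (0x3FU << SK_MAP_HUGE_SHIFT)

/* Hypothetical stand-in for the user address range of a process. */
struct sk_region {
	uintptr_t user_start;
	uintptr_t user_end;
};

/* Return 1 if the request lies in user space, 0 if it must fail with ENOMEM. */
static int sk_mmap_args_ok(const struct sk_region *r, uintptr_t addr, size_t len)
{
	if (addr < r->user_start
			|| r->user_end <= addr
			|| len > (r->user_end - r->user_start))
		return 0;
	return 1;
}
```

Keeping `addr` unsigned means an address with the top bit set compares as a large value against `user_end` rather than as a negative number.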
@ -1698,6 +1761,11 @@ SYSCALL_DECLARE(arch_prctl)
ihk_mc_syscall_arg1(ctx));
}
SYSCALL_DECLARE(time)
{
return time();
}
static int vdso_get_vdso_info(void)
{
int error;
@ -2080,7 +2148,7 @@ int do_process_vm_read_writev(int pid,
range = lookup_process_memory_range(lthread->vm,
(uintptr_t)local_iov,
(uintptr_t)(local_iov + liovcnt * sizeof(struct iovec)));
(uintptr_t)(local_iov + liovcnt));
if (!range) {
ret = -EFAULT;
@ -2089,7 +2157,7 @@ int do_process_vm_read_writev(int pid,
range = lookup_process_memory_range(lthread->vm,
(uintptr_t)remote_iov,
(uintptr_t)(remote_iov + riovcnt * sizeof(struct iovec)));
(uintptr_t)(remote_iov + riovcnt));
if (!range) {
ret = -EFAULT;
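The two range-end fixes in `do_process_vm_read_writev()` come down to pointer arithmetic: adding `n` to a `struct iovec *` already advances by `n * sizeof(struct iovec)` bytes, so multiplying by the size again overshoots. A sketch with a local `sk_iovec` type (hypothetical, same shape as `struct iovec`); the old end address is reproduced with integer arithmetic so the sketch never forms an out-of-bounds pointer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in with the same layout idea as struct iovec. */
struct sk_iovec {
	void *iov_base;
	size_t iov_len;
};

/* New form: the compiler scales cnt by the element size. */
static uintptr_t sk_range_end(const struct sk_iovec *iov, unsigned long cnt)
{
	return (uintptr_t)(iov + cnt);
}

/* Old form, as integers: the element size is applied twice. */
static uintptr_t sk_range_end_old(const struct sk_iovec *iov, unsigned long cnt)
{
	return (uintptr_t)iov
		+ cnt * sizeof(struct sk_iovec) * sizeof(struct sk_iovec);
}
```

On a typical 64-bit build the old expression would place the end of the range 16 times too far out, so the lookup could fail for a perfectly valid vector.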
@ -2365,8 +2433,6 @@ int move_pages_smp_handler(int cpu_index, int nr_cpus, void *arg)
case 0:
memcpy(mpsr->virt_addr, mpsr->user_virt_addr,
sizeof(void *) * count);
memcpy(mpsr->status, mpsr->user_status,
sizeof(int) * count);
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
memset(mpsr->ptep, 0, sizeof(pte_t) * count);
@ -2386,41 +2452,38 @@ int move_pages_smp_handler(int cpu_index, int nr_cpus, void *arg)
case 0:
memcpy(mpsr->virt_addr, mpsr->user_virt_addr,
sizeof(void *) * count);
memcpy(mpsr->status, mpsr->user_status,
sizeof(int) * count);
case 1:
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
mpsr->nodes_ready = 1;
break;
case 1:
memset(mpsr->ptep, 0, sizeof(pte_t) * count);
memset(mpsr->status, 0, sizeof(int) * count);
memset(mpsr->nr_pages, 0, sizeof(int) * count);
memset(mpsr->dst_phys, 0,
sizeof(unsigned long) * count);
mpsr->nodes_ready = 1;
break;
default:
break;
}
}
else if (nr_cpus >= 4 && nr_cpus < 8) {
else if (nr_cpus >= 4 && nr_cpus < 7) {
switch (cpu_index) {
case 0:
memcpy(mpsr->virt_addr, mpsr->user_virt_addr,
sizeof(void *) * count);
break;
case 1:
memcpy(mpsr->status, mpsr->user_status,
sizeof(int) * count);
break;
case 2:
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
mpsr->nodes_ready = 1;
break;
case 3:
case 2:
memset(mpsr->ptep, 0, sizeof(pte_t) * count);
memset(mpsr->status, 0, sizeof(int) * count);
break;
case 3:
memset(mpsr->nr_pages, 0, sizeof(int) * count);
memset(mpsr->dst_phys, 0,
sizeof(unsigned long) * count);
@ -2430,7 +2493,7 @@ int move_pages_smp_handler(int cpu_index, int nr_cpus, void *arg)
break;
}
}
else if (nr_cpus >= 8) {
else {
switch (cpu_index) {
case 0:
memcpy(mpsr->virt_addr, mpsr->user_virt_addr,
@ -2442,28 +2505,23 @@ int move_pages_smp_handler(int cpu_index, int nr_cpus, void *arg)
sizeof(void *) * (count / 2));
break;
case 2:
memcpy(mpsr->status, mpsr->user_status,
sizeof(int) * count);
break;
case 3:
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
mpsr->nodes_ready = 1;
break;
case 4:
case 3:
memset(mpsr->ptep, 0, sizeof(pte_t) * count);
break;
case 5:
case 4:
memset(mpsr->status, 0, sizeof(int) * count);
break;
case 6:
case 5:
memset(mpsr->nr_pages, 0, sizeof(int) * count);
break;
case 7:
case 6:
memset(mpsr->dst_phys, 0,
sizeof(unsigned long) * count);
break;
default:
break;
}
@ -2671,11 +2729,19 @@ out:
time_t time(void) {
struct syscall_request sreq IHK_DMA_ALIGN;
struct thread *thread = cpu_local_var(current);
time_t ret;
sreq.number = __NR_time;
sreq.args[0] = (uintptr_t)NULL;
ret = (time_t)do_syscall(&sreq, ihk_mc_get_processor_id(), thread->proc->pid);
struct timespec ats;
time_t ret = 0;
if (gettime_local_support) {
calculate_time_from_tsc(&ats);
ret = ats.tv_sec;
}
else {
sreq.number = __NR_time;
sreq.args[0] = (uintptr_t)NULL;
ret = (time_t)do_syscall(&sreq, ihk_mc_get_processor_id());
}
return ret;
}
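When `gettime_local_support` is set, `time()` now derives seconds from the TSC instead of offloading to Linux. The conversion can be sketched as below. The `sk_` names are hypothetical, and the version-counter retry loop that protects the origin against a concurrent `settimeofday()` is deliberately left out:

```c
#include <assert.h>
#include <stdint.h>

#define SK_NS_PER_SEC 1000000000ULL

struct sk_timespec {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Add the time represented by a TSC reading to a known origin. */
static struct sk_timespec sk_time_from_tsc(struct sk_timespec origin,
		uint64_t tsc, uint64_t clocks_per_sec)
{
	struct sk_timespec ts = origin;
	uint64_t sec_delta = tsc / clocks_per_sec;
	/* Multiply only the sub-second remainder to limit overflow risk */
	uint64_t ns_delta = SK_NS_PER_SEC * (tsc % clocks_per_sec)
		/ clocks_per_sec;

	ts.tv_sec += (int64_t)sec_delta;
	ts.tv_nsec += (int64_t)ns_delta;
	/* Both inputs are below one second, so one carry is enough */
	if (ts.tv_nsec >= (int64_t)SK_NS_PER_SEC) {
		ts.tv_nsec -= (int64_t)SK_NS_PER_SEC;
		++ts.tv_sec;
	}
	return ts;
}
```

For `time()` only the `tv_sec` field of the result is used, which matches the `ret = ats.tv_sec` line in the hunk.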


@ -31,51 +31,6 @@ struct tod_data_s tod_data
.version = IHK_ATOMIC64_INIT(0),
};
static inline void cpu_pause_for_vsyscall(void)
{
asm volatile ("pause" ::: "memory");
return;
} /* cpu_pause_for_vsyscall() */
static inline void calculate_time_from_tsc(struct timespec *ts)
{
long ver;
unsigned long current_tsc;
__time_t sec_delta;
long ns_delta;
for (;;) {
while ((ver = ihk_atomic64_read(&tod_data.version)) & 1) {
/* settimeofday() is in progress */
cpu_pause_for_vsyscall();
}
rmb();
*ts = tod_data.origin;
rmb();
if (ver == ihk_atomic64_read(&tod_data.version)) {
break;
}
/* settimeofday() has intervened */
cpu_pause_for_vsyscall();
}
current_tsc = rdtsc();
sec_delta = current_tsc / tod_data.clocks_per_sec;
ns_delta = NS_PER_SEC * (current_tsc % tod_data.clocks_per_sec)
/ tod_data.clocks_per_sec;
/* calc. of ns_delta overflows if clocks_per_sec exceeds 18.44 GHz */
ts->tv_sec += sec_delta;
ts->tv_nsec += ns_delta;
if (ts->tv_nsec >= NS_PER_SEC) {
ts->tv_nsec -= NS_PER_SEC;
++ts->tv_sec;
}
return;
} /* calculate_time_from_tsc() */
int vsyscall_gettimeofday(struct timeval *tv, void *tz)
{
int error;


@ -43,7 +43,8 @@ error_exit() {
;;
esac
exit 1
# Return -EINVAL
exit -22
}
fi
@ -144,3 +145,5 @@ for cpuid in `find /sys/bus/cpu/devices/* -maxdepth 0 -name "cpu[0123456789]*" -
rm -rf /tmp/mcos/mcos0_sys/bus/cpu/devices/$cpuid
fi
done
exit 0
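Note on the `exit -22` above: a process exit status is a single byte, so the caller never sees a negative number. The sketch below shows what a parent shell observes. It assumes bash is available as `bash`; other shells (dash, for example) may reject a negative argument to `exit`.

```shell
# Run a child bash that exits with -22 and capture what the parent sees.
bash -c 'exit -22'
status=$?
echo "child exit status: $status"

# A caller that wants the errno value back has to undo the wrap itself.
errno_value=$((256 - status))
echo "recovered errno: $errno_value"
```

A caller that stores `$?` and passes it on with `exit $ret` therefore propagates the wrapped value, not -22.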


@ -8,6 +8,9 @@ if grep mcoverlay /proc/modules &>/dev/null; then
if [ -e /tmp/mcos ]; then rm -rf /tmp/mcos; fi
if ! rmmod mcoverlay 2>/dev/null; then
echo "error: removing mcoverlay" >&2
exit 1
# Return -EINVAL
exit -22
fi
fi
exit 0


@ -12,6 +12,7 @@
# the same set of resources as it used previously.
# Note that the script does not output anything unless an error occurs.
ret=1
prefix="@prefix@"
BINDIR="${prefix}/bin"
SBINDIR="${prefix}/sbin"
@ -44,11 +45,12 @@ fi
turbo=""
ihk_irq=""
safe_kernel_map=""
umask_old=`umask`
idle_halt=""
allow_oversubscribe=""
while getopts :tk:c:m:o:f:r:q:i:d:e:hO OPT
while getopts stk:c:m:o:f:r:q:i:d:e:hO OPT
do
case ${OPT} in
f) facility=${OPTARG}
@ -61,6 +63,8 @@ do
;;
m) mem=${OPTARG}
;;
s) safe_kernel_map="safe_kernel_map"
;;
r) ikc_map=${OPTARG}
;;
q) ihk_irq=${OPTARG}
@ -77,8 +81,8 @@ do
;;
O) allow_oversubscribe="allow_oversubscribe"
;;
*) echo "invalid option -${OPT}" >&2
exit 1
\?) exit 1
;;
esac
done
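Note on the optstring change above: dropping the leading `:` switches `getopts` from silent mode to its default mode, in which `getopts` prints its own diagnostic and the script only has to handle `?`. A minimal sketch of the two modes, using a shortened, hypothetical optstring (`sm:`) rather than the full one from the script:

```shell
# parse_silent: optstring starts with ':' so getopts stays quiet and the
# script reports errors itself (OPTARG holds the offending option letter).
parse_silent() {
    OPTIND=1
    while getopts :sm: OPT; do
        case ${OPT} in
        s) echo "safe_kernel_map" ;;
        m) echo "mem=${OPTARG}" ;;
        \?) echo "invalid option -${OPTARG}"; return 1 ;;
        :) echo "missing argument for -${OPTARG}"; return 1 ;;
        esac
    done
}

# parse_verbose: no leading ':' so getopts writes its own diagnostic to
# stderr and the script only needs to bail out.
parse_verbose() {
    OPTIND=1
    while getopts sm: OPT; do
        case ${OPT} in
        s) echo "safe_kernel_map" ;;
        m) echo "mem=${OPTARG}" ;;
        \?) return 1 ;;
        esac
    done
}

silent_out=$(parse_silent -x)
silent_rc=$?
verbose_out=$(parse_verbose -x 2>/dev/null)
verbose_rc=$?
ok_out=$(parse_verbose -s -m 1G)
```

In default mode a missing option argument is also reported as `?`, so a single `\?)` branch covers both error cases.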
@ -90,6 +94,16 @@ fi
if [ "${redirect_kmsg}" != "0" -o "${mon_interval}" != "-1" ]; then
${SBINDIR}/ihkmond -f ${facility} -k ${redirect_kmsg} -i ${mon_interval}
fi
disable_irqbalance_mck() {
if [ -f /etc/systemd/system/irqbalance_mck.service ]; then
systemctl disable irqbalance_mck.service >/dev/null 2>/dev/null
# Invalid .service file persists so remove it
rm -f /etc/systemd/system/irqbalance_mck.service
fi
}
#
# Revert any state that has been initialized before the error occurred.
#
@ -103,9 +117,7 @@ error_exit() {
if ! systemctl stop irqbalance_mck.service 2>/dev/null; then
echo "warning: failed to stop irqbalance_mck" >&2
fi
if ! systemctl disable irqbalance_mck.service >/dev/null 2>/dev/null; then
echo "warning: failed to disable irqbalance_mck" >&2
fi
disable_irqbalance_mck
fi
fi
;&
@ -196,7 +208,8 @@ error_exit() {
;;
esac
exit 1
# Propagate exit status if any
exit $ret
}
ihk_ikc_irq_core=0
@ -222,7 +235,7 @@ if [ "${ENABLE_MCOVERLAYFS}" == "yes" ]; then
enable_mcoverlay="yes"
fi
else
if [ ${linux_version_code} -eq 199168 -a ${rhel_release} -ge 327 -a ${rhel_release} -le 693 ]; then
if [ ${linux_version_code} -eq 199168 -a ${rhel_release} -ge 327 -a ${rhel_release} -le 862 ]; then
enable_mcoverlay="yes"
fi
if [ ${linux_version_code} -ge 262144 -a ${linux_version_code} -lt 262400 ]; then
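Note on the magic numbers above: they are consistent with the kernel's usual `KERNEL_VERSION(a, b, c)` encoding, which would make 199168 stand for 3.10.0 and the range 262144 to 262400 cover the 4.0.x series. The sketch below assumes that encoding; how the script actually derives `linux_version_code` is not shown in this hunk, and `kernel_version_code` is a hypothetical helper.

```shell
# KERNEL_VERSION(a, b, c) as commonly defined: (a << 16) + (b << 8) + c
kernel_version_code() {
    echo $(( ($1 << 16) + ($2 << 8) + $3 ))
}

rhel7_code=$(kernel_version_code 3 10 0)
v40_code=$(kernel_version_code 4 0 0)
v41_code=$(kernel_version_code 4 1 0)
echo "$rhel7_code $v40_code $v41_code"
```

Under that reading, raising the upper bound from 693 to 862 extends the accepted RHEL 7 kernel release range to the 3.10.0-862 series.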
@ -247,7 +260,11 @@ fi
# Remove mcoverlay if loaded
if [ "$enable_mcoverlay" == "yes" ]; then
. ${SBINDIR}/mcoverlay-destroy.sh
${SBINDIR}/mcoverlay-destroy.sh
ret=$?
if [ $ret -ne 0 ]; then
error_exit "initial"
fi
fi
# Stop irqbalance
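Note on replacing `. script` with a plain invocation above: a sourced script runs in the caller's shell, so an `exit` inside it ends the caller before any cleanup can run. Running it as a child lets the caller read `$?` and decide what to do. A minimal sketch with a hypothetical stand-in helper (the file name and the status 22 are illustrative only):

```shell
workdir=$(mktemp -d)
cat > "$workdir/helper.sh" <<'EOF'
# Stand-in for a helper script that always fails.
exit 22
EOF

# Sourced: the helper's exit terminates the (sub)shell that sourced it,
# so the echo after the '.' never runs.
sourced_out=$( . "$workdir/helper.sh"; echo "caller continued" )
sourced_rc=$?

# Executed: the helper runs as a child process, the caller survives and
# can inspect the status before running its own cleanup.
executed_out=$( sh "$workdir/helper.sh"; ret=$?; echo "caller continued, ret=$ret" )

rm -rf "$workdir"
```

This is why the new code can hand the failure to `error_exit` with the state reached so far instead of dying inside the helper.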
@ -432,7 +449,7 @@ if ! ${SBINDIR}/ihkosctl 0 load ${KERNDIR}/mckernel.img; then
fi
# Set kernel arguments
if ! ${SBINDIR}/ihkosctl 0 kargs "hidos $turbo $idle_halt dump_level=${DUMP_LEVEL} $extra_kopts $allow_oversubscribe"; then
if ! ${SBINDIR}/ihkosctl 0 kargs "hidos $turbo $safe_kernel_map $idle_halt dump_level=${DUMP_LEVEL} $extra_kopts $allow_oversubscribe"; then
echo "error: setting kernel arguments" >&2
error_exit "os_created"
fi
@ -450,7 +467,11 @@ fi
# Overlay /proc, /sys with McKernel specific contents
if [ "$enable_mcoverlay" == "yes" ]; then
. ${SBINDIR}/mcoverlay-create.sh
${SBINDIR}/mcoverlay-create.sh
ret=$?
if [ $ret -ne 0 ]; then
error_exit "os_created"
fi
fi
# Start irqbalance with CPUs and IRQ for McKernel banned
@ -458,7 +479,9 @@ if [ "${irqbalance_used}" == "yes" ]; then
banirq=`cat /proc/interrupts| perl -e 'while(<>) { if(/^\s*(\d+).*IHK\-SMP\s*$/) {print $1;}}'`
sed "s/%mask%/$smp_affinity_mask/g" $ETCDIR/irqbalance_mck.in | sed "s/%banirq%/$banirq/g" > /tmp/irqbalance_mck
systemctl disable irqbalance_mck.service >/dev/null 2>/dev/null
disable_irqbalance_mck
if ! systemctl link $ETCDIR/irqbalance_mck.service >/dev/null 2>/dev/null; then
echo "error: linking irqbalance_mck" >&2
error_exit "mcos_sys_mounted"


@ -18,6 +18,15 @@ mem=""
cpus=""
irqbalance_used=""
disable_irqbalance_mck() {
if [ -f /etc/systemd/system/irqbalance_mck.service ]; then
systemctl disable irqbalance_mck.service >/dev/null 2>/dev/null
# Invalid .service file persists so remove it
rm -f /etc/systemd/system/irqbalance_mck.service
fi
}
# No SMP module? Exit.
if ! grep ihk_smp_@ARCH@ /proc/modules &>/dev/null; then exit 0; fi
@ -26,9 +35,7 @@ if [ "`systemctl status irqbalance_mck.service 2> /dev/null |grep -E 'Active: ac
if ! systemctl stop irqbalance_mck.service 2>/dev/null; then
echo "warning: failed to stop irqbalance_mck" >&2
fi
if ! systemctl disable irqbalance_mck.service >/dev/null 2>/dev/null; then
echo "warning: failed to disable irqbalance_mck" >&2
fi
disable_irqbalance_mck
fi
# Destroy all LWK instances
@ -93,7 +100,13 @@ if grep mcctrl /proc/modules &>/dev/null; then
fi
# Remove mcoverlay if loaded
. ${SBINDIR}/mcoverlay-destroy.sh
${SBINDIR}/mcoverlay-destroy.sh
ret=$?
if [ $ret -ne 0 ]; then
echo "error: mcoverlay-destroy.sh" >&2
exit $ret
fi
# Remove SMP module
if grep ihk_smp_@ARCH@ /proc/modules &>/dev/null; then


@ -54,48 +54,6 @@
/* Define to 1 if you have the <unistd.h> header file. */
#undef HAVE_UNISTD_H
/* Define to address of kernel symbol __vvar_page, or 0 if exported */
#undef MCCTRL_KSYM___vvar_page
/* Define to address of kernel symbol hpet_address, or 0 if exported */
#undef MCCTRL_KSYM_hpet_address
/* Define to address of kernel symbol hv_clock, or 0 if exported */
#undef MCCTRL_KSYM_hv_clock
/* Define to address of kernel symbol sys_mount, or 0 if exported */
#undef MCCTRL_KSYM_sys_mount
/* Define to address of kernel symbol sys_readlink, or 0 if exported */
#undef MCCTRL_KSYM_sys_readlink
/* Define to address of kernel symbol sys_umount, or 0 if exported */
#undef MCCTRL_KSYM_sys_umount
/* Define to address of kernel symbol sys_unshare, or 0 if exported */
#undef MCCTRL_KSYM_sys_unshare
/* Define to address of kernel symbol vdso_end, or 0 if exported */
#undef MCCTRL_KSYM_vdso_end
/* Define to address of kernel symbol vdso_image_64, or 0 if exported */
#undef MCCTRL_KSYM_vdso_image_64
/* Define to address of kernel symbol vdso_pages, or 0 if exported */
#undef MCCTRL_KSYM_vdso_pages
/* Define to address of kernel symbol vdso_spec, or 0 if exported */
#undef MCCTRL_KSYM_vdso_spec
/* Define to address of kernel symbol vdso_start, or 0 if exported */
#undef MCCTRL_KSYM_vdso_start
/* Define to address of kernel symbol walk_page_range, or 0 if exported */
#undef MCCTRL_KSYM_walk_page_range
/* Define to address of kernel symbol zap_page_range, or 0 if exported */
#undef MCCTRL_KSYM_zap_page_range
/* McKernel specific headers */
#undef MCKERNEL_INCDIR
@ -128,3 +86,6 @@
/* Define to 1 if you have the ANSI C header files. */
#undef STDC_HEADERS
/* whether or not syscall_intercept library is linked */
#undef WITH_SYSCALL_INTERCEPT

configure (vendored, 658 lines changed)

@ -1,6 +1,6 @@
#! /bin/sh
# Guess values for system-dependent variables and create Makefiles.
# Generated by GNU Autoconf 2.69 for mckernel 1.5.0.
# Generated by GNU Autoconf 2.69 for mckernel 1.6.0.
#
#
# Copyright (C) 1992-1996, 1998-2012 Free Software Foundation, Inc.
@ -577,8 +577,8 @@ MAKEFLAGS=
# Identity of this package.
PACKAGE_NAME='mckernel'
PACKAGE_TARNAME='mckernel'
PACKAGE_VERSION='1.5.0'
PACKAGE_STRING='mckernel 1.5.0'
PACKAGE_VERSION='1.6.0'
PACKAGE_STRING='mckernel 1.6.0'
PACKAGE_BUGREPORT=''
PACKAGE_URL=''
@ -628,9 +628,12 @@ IHK_RELEASE_DATE
DCFA_VERSION
MCKERNEL_VERSION
IHK_VERSION
WITH_SYSCALL_INTERCEPT
ENABLE_QLMPI
ENABLE_RUSAGE
ENABLE_MCOVERLAYFS
LDFLAGS_SYSCALL_INTERCEPT
CPPFLAGS_SYSCALL_INTERCEPT
MANDIR
KERNDIR
KMODDIR
@ -702,6 +705,9 @@ enable_option_checking
with_mpi
with_mpi_include
with_mpi_lib
with_syscall_intercept
with_syscall_intercept_include
with_syscall_intercept_lib
with_kernelsrc
with_target
with_system_map
@ -1262,7 +1268,7 @@ if test "$ac_init_help" = "long"; then
# Omit some internal or obsolete options to make the list less imposing.
# This message is too long to be a string in the A/UX 3.1 sh.
cat <<_ACEOF
\`configure' configures mckernel 1.5.0 to adapt to many kinds of systems.
\`configure' configures mckernel 1.6.0 to adapt to many kinds of systems.
Usage: $0 [OPTION]... [VAR=VALUE]...
@ -1323,7 +1329,7 @@ fi
if test -n "$ac_init_help"; then
case $ac_init_help in
short | recursive ) echo "Configuration of mckernel 1.5.0:";;
short | recursive ) echo "Configuration of mckernel 1.6.0:";;
esac
cat <<\_ACEOF
@ -1346,6 +1352,15 @@ Optional Packages:
--with-mpi-include=PATH specify path where mpi include directory can be
found
--with-mpi-lib=PATH specify path where mpi lib directory can be found
--with-syscall_intercept=PATH
specify path where syscall_intercept include
directory and lib directory can be found
--with-syscall_intercept-include=PATH
specify path where syscall_intercept include
directory can be found
--with-syscall_intercept-lib=PATH
specify path where syscall_intercept lib directory
can be found
--with-kernelsrc=path Path to 'kernel src', default is
/lib/modules/uname_r/build
--with-target={attached-mic | builtin-mic | builtin-x86 | smp-x86}
@ -1431,7 +1446,7 @@ fi
test -n "$ac_init_help" && exit $ac_status
if $ac_init_version; then
cat <<\_ACEOF
mckernel configure 1.5.0
mckernel configure 1.6.0
generated by GNU Autoconf 2.69
Copyright (C) 2012 Free Software Foundation, Inc.
@ -1729,7 +1744,7 @@ cat >config.log <<_ACEOF
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.
It was created by mckernel $as_me 1.5.0, which was
It was created by mckernel $as_me 1.6.0, which was
generated by GNU Autoconf 2.69. Invocation command line was
$ $0 $@
@ -2082,11 +2097,13 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu
IHK_VERSION=1.5.0
MCKERNEL_VERSION=1.5.0
IHK_VERSION=1.6.0
MCKERNEL_VERSION=1.6.0
DCFA_VERSION=DCFA_VERSION_m4
IHK_RELEASE_DATE=2018-04-05
MCKERNEL_RELEASE_DATE=2018-04-05
IHK_RELEASE_DATE=2018-11-11
MCKERNEL_RELEASE_DATE=2018-11-11
DCFA_RELEASE_DATE=DCFA_RELEASE_DATE_m4
@ -3513,6 +3530,195 @@ fi
# Check whether --with-syscall_intercept was given.
if test "${with_syscall_intercept+set}" = set; then :
withval=$with_syscall_intercept; case "$withval" in #(
yes|no|'') :
{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: --without-syscall_intercept=PATH expects a valid PATH" >&5
$as_echo "$as_me: WARNING: --without-syscall_intercept=PATH expects a valid PATH" >&2;}
with_syscall_intercept="" ;; #(
*) :
;;
esac
else
with_syscall_intercept=
fi
# Check whether --with-syscall_intercept-include was given.
if test "${with_syscall_intercept_include+set}" = set; then :
withval=$with_syscall_intercept_include; case "$withval" in #(
yes|no|'') :
{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: --without-syscall_intercept-include=PATH expects a valid PATH" >&5
$as_echo "$as_me: WARNING: --without-syscall_intercept-include=PATH expects a valid PATH" >&2;}
with_syscall_intercept_include="" ;; #(
*) :
;;
esac
fi
# Check whether --with-syscall_intercept-lib was given.
if test "${with_syscall_intercept_lib+set}" = set; then :
withval=$with_syscall_intercept_lib; case "$withval" in #(
yes|no|'') :
{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: --without-syscall_intercept-lib=PATH expects a valid PATH" >&5
$as_echo "$as_me: WARNING: --without-syscall_intercept-lib=PATH expects a valid PATH" >&2;}
with_syscall_intercept_lib="" ;; #(
*) :
;;
esac
fi
# The args have been sanitized into empty/non-empty values above.
# Now append -I/-L args to CPPFLAGS/LDFLAGS, with more specific options
# taking priority
if test -n "${with_syscall_intercept_include}"; then :
if echo "$CPPFLAGS_SYSCALL_INTERCEPT" | $FGREP -e "\<-I${with_syscall_intercept_include}\>" >/dev/null 2>&1; then :
echo "CPPFLAGS_SYSCALL_INTERCEPT(='$CPPFLAGS_SYSCALL_INTERCEPT') contains '-I${with_syscall_intercept_include}', not appending" >&5
else
echo "CPPFLAGS_SYSCALL_INTERCEPT(='$CPPFLAGS_SYSCALL_INTERCEPT') does not contain '-I${with_syscall_intercept_include}', appending" >&5
CPPFLAGS_SYSCALL_INTERCEPT="$CPPFLAGS_SYSCALL_INTERCEPT -I${with_syscall_intercept_include}"
fi
else
if test -n "${with_syscall_intercept}"; then :
if echo "$CPPFLAGS_SYSCALL_INTERCEPT" | $FGREP -e "\<-I${with_syscall_intercept}/include\>" >/dev/null 2>&1; then :
echo "CPPFLAGS_SYSCALL_INTERCEPT(='$CPPFLAGS_SYSCALL_INTERCEPT') contains '-I${with_syscall_intercept}/include', not appending" >&5
else
echo "CPPFLAGS_SYSCALL_INTERCEPT(='$CPPFLAGS_SYSCALL_INTERCEPT') does not contain '-I${with_syscall_intercept}/include', appending" >&5
CPPFLAGS_SYSCALL_INTERCEPT="$CPPFLAGS_SYSCALL_INTERCEPT -I${with_syscall_intercept}/include"
fi
fi
fi
if test -n "${with_syscall_intercept_lib}"; then :
if echo "$LDFLAGS_SYSCALL_INTERCEPT" | $FGREP -e "\<-L${with_syscall_intercept_lib} -Wl,-rpath,${with_syscall_intercept_lib}\>" >/dev/null 2>&1; then :
echo "LDFLAGS_SYSCALL_INTERCEPT(='$LDFLAGS_SYSCALL_INTERCEPT') contains '-L${with_syscall_intercept_lib} -Wl,-rpath,${with_syscall_intercept_lib}', not appending" >&5
else
echo "LDFLAGS_SYSCALL_INTERCEPT(='$LDFLAGS_SYSCALL_INTERCEPT') does not contain '-L${with_syscall_intercept_lib} -Wl,-rpath,${with_syscall_intercept_lib}', appending" >&5
LDFLAGS_SYSCALL_INTERCEPT="$LDFLAGS_SYSCALL_INTERCEPT -L${with_syscall_intercept_lib} -Wl,-rpath,${with_syscall_intercept_lib}"
fi
else
if test -n "${with_syscall_intercept}"; then :
if echo "$LDFLAGS_SYSCALL_INTERCEPT" | $FGREP -e "\<-L${with_syscall_intercept}/lib -Wl,-rpath,${with_syscall_intercept}/lib\>" >/dev/null 2>&1; then :
echo "LDFLAGS_SYSCALL_INTERCEPT(='$LDFLAGS_SYSCALL_INTERCEPT') contains '-L${with_syscall_intercept}/lib -Wl,-rpath,${with_syscall_intercept}/lib', not appending" >&5
else
echo "LDFLAGS_SYSCALL_INTERCEPT(='$LDFLAGS_SYSCALL_INTERCEPT') does not contain '-L${with_syscall_intercept}/lib -Wl,-rpath,${with_syscall_intercept}/lib', appending" >&5
LDFLAGS_SYSCALL_INTERCEPT="$LDFLAGS_SYSCALL_INTERCEPT -L${with_syscall_intercept}/lib -Wl,-rpath,${with_syscall_intercept}/lib"
fi
if test -d "${with_syscall_intercept}/lib64"; then :
if echo "$LDFLAGS_SYSCALL_INTERCEPT" | $FGREP -e "\<-L${with_syscall_intercept}/lib64 -Wl,-rpath,${with_syscall_intercept}/lib64\>" >/dev/null 2>&1; then :
echo "LDFLAGS_SYSCALL_INTERCEPT(='$LDFLAGS_SYSCALL_INTERCEPT') contains '-L${with_syscall_intercept}/lib64 -Wl,-rpath,${with_syscall_intercept}/lib64', not appending" >&5
else
echo "LDFLAGS_SYSCALL_INTERCEPT(='$LDFLAGS_SYSCALL_INTERCEPT') does not contain '-L${with_syscall_intercept}/lib64 -Wl,-rpath,${with_syscall_intercept}/lib64', appending" >&5
LDFLAGS_SYSCALL_INTERCEPT="$LDFLAGS_SYSCALL_INTERCEPT -L${with_syscall_intercept}/lib64 -Wl,-rpath,${with_syscall_intercept}/lib64"
fi
fi
fi
fi
if test -n "${with_syscall_intercept}" || test -n "${with_syscall_intercept_include}" || test -n "${with_syscall_intercept_lib}"; then :
WITH_SYSCALL_INTERCEPT=yes
else
WITH_SYSCALL_INTERCEPT=no
fi
if test "x$WITH_SYSCALL_INTERCEPT" == "xno" ; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for syscall_no_intercept in -lsyscall_intercept" >&5
$as_echo_n "checking for syscall_no_intercept in -lsyscall_intercept... " >&6; }
if ${ac_cv_lib_syscall_intercept_syscall_no_intercept+:} false; then :
$as_echo_n "(cached) " >&6
else
ac_check_lib_save_LIBS=$LIBS
LIBS="-lsyscall_intercept -lcapstone -ldl $LIBS"
cat confdefs.h - <<_ACEOF >conftest.$ac_ext
/* end confdefs.h. */
/* Override any GCC internal prototype to avoid an error.
Use char because int might match the return type of a GCC
builtin and then its argument prototype would still apply. */
#ifdef __cplusplus
extern "C"
#endif
char syscall_no_intercept ();
int
main ()
{
return syscall_no_intercept ();
;
return 0;
}
_ACEOF
if ac_fn_c_try_link "$LINENO"; then :
ac_cv_lib_syscall_intercept_syscall_no_intercept=yes
else
ac_cv_lib_syscall_intercept_syscall_no_intercept=no
fi
rm -f core conftest.err conftest.$ac_objext \
conftest$ac_exeext conftest.$ac_ext
LIBS=$ac_check_lib_save_LIBS
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_syscall_intercept_syscall_no_intercept" >&5
$as_echo "$ac_cv_lib_syscall_intercept_syscall_no_intercept" >&6; }
if test "x$ac_cv_lib_syscall_intercept_syscall_no_intercept" = xyes; then :
syscall_intercept_lib_found=yes
else
syscall_intercept_lib_found=no
fi
if test "x$syscall_intercept_lib_found" != "xyes"; then :
{ $as_echo "$as_me:${as_lineno-$LINENO}: libsyscall_intercept.so not found" >&5
$as_echo "$as_me: libsyscall_intercept.so not found" >&6;}
fi
ac_fn_c_check_header_mongrel "$LINENO" "libsyscall_intercept_hook_point.h" "ac_cv_header_libsyscall_intercept_hook_point_h" "$ac_includes_default"
if test "x$ac_cv_header_libsyscall_intercept_hook_point_h" = xyes; then :
syscall_intercept_header_found=yes
else
syscall_intercept_header_found=no
fi
if test "x$syscall_intercept_header_found" != "xyes"; then :
{ $as_echo "$as_me:${as_lineno-$LINENO}: libsyscall_intercept_hook_point.h not found" >&5
$as_echo "$as_me: libsyscall_intercept_hook_point.h not found" >&6;}
fi
if test "x$syscall_intercept_lib_found" == "xyes" && test "x$syscall_intercept_header_found" == "xyes"; then :
WITH_SYSCALL_INTERCEPT=yes
else
WITH_SYSCALL_INTERCEPT=no
fi
fi
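Note on the flag handling above: the generated code appends `-I`/`-L` flags only when they are not already present, and prefers the specific `--with-syscall_intercept-include` and `--with-syscall_intercept-lib` values over the generic prefix. The sketch below illustrates the append-if-missing idea with a `case` pattern match, which is a substitute technique and not what the generated `configure` uses. `append_flag` and the prefix path are hypothetical.

```shell
# append_flag VALUE FLAG: print VALUE with FLAG appended unless FLAG is
# already present as a whole word.
append_flag() {
    case " $1 " in
    *" $2 "*) printf '%s\n' "$1" ;;
    *) printf '%s\n' "${1:+$1 }$2" ;;
    esac
}

# Hypothetical install prefix, used only for illustration.
prefix_dir=/opt/syscall_intercept

cppflags=""
cppflags=$(append_flag "$cppflags" "-I${prefix_dir}/include")
# Second call is a no-op because the flag is already there.
cppflags=$(append_flag "$cppflags" "-I${prefix_dir}/include")

ldflags=$(append_flag "" "-L${prefix_dir}/lib")
ldflags=$(append_flag "$ldflags" "-Wl,-rpath,${prefix_dir}/lib")
```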
# Check whether --with-kernelsrc was given.
if test "${with_kernelsrc+set}" = set; then :
withval=$with_kernelsrc; WITH_KERNELSRC=$withval
@ -4286,7 +4492,7 @@ case $WITH_TARGET in
KMODDIR="$prefix/kmod"
fi
if test "X$MANDIR" = X; then
MANDIR="$prefix/man"
MANDIR="$prefix/share/man"
fi
;;
builtin-mic)
@ -4303,7 +4509,7 @@ case $WITH_TARGET in
KMODDIR="$prefix/attached/kmod"
fi
if test "X$MANDIR" = X; then
MANDIR="$prefix/attached/man"
MANDIR="$prefix/share/man"
fi
;;
builtin-x86)
@ -4320,7 +4526,7 @@ case $WITH_TARGET in
KMODDIR="$prefix/kmod"
fi
if test "X$MANDIR" = X; then
MANDIR="$prefix/attached/man"
MANDIR="$prefix/share/man"
fi
;;
smp-x86)
@ -4352,7 +4558,7 @@ case $WITH_TARGET in
KMODDIR="$prefix/kmod"
fi
if test "X$MANDIR" = X; then
MANDIR="$prefix/smp-x86/man"
MANDIR="$prefix/share/man"
fi
;;
smp-arm64)
@ -4384,7 +4590,7 @@ case $WITH_TARGET in
KMODDIR="$prefix/kmod"
fi
if test "X$MANDIR" = X; then
MANDIR="$prefix/smp-arm64/man"
MANDIR="$prefix/share/man"
fi
;;
*)
@ -4396,399 +4602,6 @@ KDIR="$WITH_KERNELSRC"
UNAME_R="$WITH_UNAME_R"
TARGET="$WITH_TARGET"
MCCTRL_LINUX_SYMTAB=""
case "X$WITH_SYSTEM_MAP" in
Xyes | Xno | X)
MCCTRL_LINUX_SYMTAB=""
;;
*)
MCCTRL_LINUX_SYMTAB="$WITH_SYSTEM_MAP"
;;
esac
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for System.map" >&5
$as_echo_n "checking for System.map... " >&6; }
if test -r "$MCCTRL_LINUX_SYMTAB"; then
MCCTRL_LINUX_SYMTAB="$MCCTRL_LINUX_SYMTAB"
elif test -r "/boot/System.map-`uname -r`"; then
MCCTRL_LINUX_SYMTAB="/boot/System.map-`uname -r`"
elif test -r "$KDIR/System.map"; then
MCCTRL_LINUX_SYMTAB="$KDIR/System.map"
fi
if test "$MCCTRL_LINUX_SYMTAB" == ""; then
as_fn_error $? "could not find" "$LINENO" 5
fi
if test -z "`eval cat $MCCTRL_LINUX_SYMTAB`"; then
as_fn_error $? "could not read System.map file, no read permission?" "$LINENO" 5
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $MCCTRL_LINUX_SYMTAB" >&5
$as_echo "$MCCTRL_LINUX_SYMTAB" >&6; }
MCCTRL_LINUX_SYMTAB_CMD="cat $MCCTRL_LINUX_SYMTAB"
# MCCTRL_FIND_KSYM(SYMBOL)
# ------------------------------------------------------
# Search System.map for address of the given symbol and
# do one of three things in config.h:
# If not found, leave MCCTRL_KSYM_foo undefined
# If found to be exported, "#define MCCTRL_KSYM_foo 0"
# If found not to be exported, "#define MCCTRL_KSYM_foo 0x<value>"
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol sys_mount" >&5
$as_echo_n "checking System.map for symbol sys_mount... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " sys_mount\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_sys_mount\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_sys_mount $mcctrl_addr
_ACEOF
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol sys_umount" >&5
$as_echo_n "checking System.map for symbol sys_umount... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " sys_umount\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_sys_umount\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_sys_umount $mcctrl_addr
_ACEOF
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol sys_unshare" >&5
$as_echo_n "checking System.map for symbol sys_unshare... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " sys_unshare\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_sys_unshare\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_sys_unshare $mcctrl_addr
_ACEOF
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol zap_page_range" >&5
$as_echo_n "checking System.map for symbol zap_page_range... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " zap_page_range\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_zap_page_range\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_zap_page_range $mcctrl_addr
_ACEOF
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol vdso_image_64" >&5
$as_echo_n "checking System.map for symbol vdso_image_64... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " vdso_image_64\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_vdso_image_64\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_vdso_image_64 $mcctrl_addr
_ACEOF
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol vdso_start" >&5
$as_echo_n "checking System.map for symbol vdso_start... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " vdso_start\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_vdso_start\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_vdso_start $mcctrl_addr
_ACEOF
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol vdso_end" >&5
$as_echo_n "checking System.map for symbol vdso_end... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " vdso_end\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_vdso_end\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_vdso_end $mcctrl_addr
_ACEOF
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol vdso_pages" >&5
$as_echo_n "checking System.map for symbol vdso_pages... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " vdso_pages\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_vdso_pages\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_vdso_pages $mcctrl_addr
_ACEOF
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol __vvar_page" >&5
$as_echo_n "checking System.map for symbol __vvar_page... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __vvar_page\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab___vvar_page\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM___vvar_page $mcctrl_addr
_ACEOF
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol hpet_address" >&5
$as_echo_n "checking System.map for symbol hpet_address... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " hpet_address\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_hpet_address\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_hpet_address $mcctrl_addr
_ACEOF
fi
# POSTK_DEBUG_ARCH_DEP_50, add:find kernel symbol.
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol vdso_spec" >&5
$as_echo_n "checking System.map for symbol vdso_spec... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " vdso_spec\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_vdso_spec\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_vdso_spec $mcctrl_addr
_ACEOF
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol hv_clock" >&5
$as_echo_n "checking System.map for symbol hv_clock... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " hv_clock\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_hv_clock\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_hv_clock $mcctrl_addr
_ACEOF
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol sys_readlink" >&5
$as_echo_n "checking System.map for symbol sys_readlink... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " sys_readlink\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_sys_readlink\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_sys_readlink $mcctrl_addr
_ACEOF
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking System.map for symbol walk_page_range" >&5
$as_echo_n "checking System.map for symbol walk_page_range... " >&6; }
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " walk_page_range\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: not found" >&5
$as_echo "not found" >&6; }
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_walk_page_range\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $mcctrl_result" >&5
$as_echo "$mcctrl_result" >&6; }
cat >>confdefs.h <<_ACEOF
#define MCCTRL_KSYM_walk_page_range $mcctrl_addr
_ACEOF
fi
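Note on the removed lookups above: each block follows the same pattern described in the `MCCTRL_FIND_KSYM` comment, namely look the symbol up in System.map, then check for a matching `__ksymtab_` entry to decide between `0` and the raw address. A condensed sketch of that pattern against a made-up three-line symbol table (the addresses and the `find_ksym` helper are hypothetical):

```shell
# Hypothetical symbol table in System.map format: address, type, name.
symtab=$(mktemp)
cat > "$symtab" <<'EOF'
ffffffff81200000 T sys_mount
ffffffff81300000 T zap_page_range
ffffffff81a00000 r __ksymtab_zap_page_range
EOF

# find_ksym SYMBOL: print nothing if the symbol is absent, 0 if it is
# exported, otherwise its address prefixed with 0x.
find_ksym() {
    addr=$(grep " $1\$" "$symtab" | cut -d' ' -f1)
    if [ -z "$addr" ]; then
        return 0
    fi
    if grep " __ksymtab_$1\$" "$symtab" >/dev/null; then
        echo 0
    else
        echo "0x$addr"
    fi
}

mount_addr=$(find_ksym sys_mount)
zap_addr=$(find_ksym zap_page_range)
missing_addr=$(find_ksym hv_clock)
rm -f "$symtab"
```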
case $ENABLE_MEMDUMP in
yes|no|auto)
;;
@ -4986,6 +4799,17 @@ else
$as_echo "$as_me: perf is disabled" >&6;}
fi
if test "x$WITH_SYSCALL_INTERCEPT" = "xyes" ; then
$as_echo "#define WITH_SYSCALL_INTERCEPT 1" >>confdefs.h
{ $as_echo "$as_me:${as_lineno-$LINENO}: syscall_intercept library is linked" >&5
$as_echo "$as_me: syscall_intercept library is linked" >&6;}
else
{ $as_echo "$as_me:${as_lineno-$LINENO}: syscall_intercept library isn't linked" >&5
$as_echo "$as_me: syscall_intercept library isn't linked" >&6;}
fi
if test "x$MCKERNEL_INCDIR" != "x" ; then
cat >>confdefs.h <<_ACEOF
@ -5052,6 +4876,9 @@ fi
@ -5060,9 +4887,14 @@ ac_config_headers="$ac_config_headers config.h"
# POSTK_DEBUG_ARCH_DEP_37
# AC_CONFIG_FILES arch dependfiles separate
ac_config_files="$ac_config_files Makefile executer/user/Makefile executer/user/mcexec.1:executer/user/mcexec.1in executer/user/vmcore2mckdump executer/user/arch/$ARCH/Makefile executer/user/arch/x86_64/Makefile executer/kernel/mcctrl/Makefile executer/kernel/mcctrl/arch/$ARCH/Makefile executer/kernel/mcoverlayfs/Makefile executer/kernel/mcoverlayfs/linux-3.10.0-327.36.1.el7/Makefile executer/kernel/mcoverlayfs/linux-4.0.9/Makefile executer/kernel/mcoverlayfs/linux-4.6.7/Makefile executer/include/qlmpilib.h kernel/Makefile kernel/Makefile.build kernel/include/swapfmt.h arch/x86_64/tools/mcreboot-attached-mic.sh arch/x86_64/tools/mcshutdown-attached-mic.sh arch/x86_64/tools/mcreboot-builtin-x86.sh arch/x86_64/tools/mcreboot-smp-x86.sh arch/x86_64/tools/mcstop+release-smp-x86.sh arch/x86_64/tools/mcoverlay-destroy-smp-x86.sh arch/x86_64/tools/mcoverlay-create-smp-x86.sh arch/x86_64/tools/eclair-dump-backtrace.exp arch/x86_64/tools/mcshutdown-builtin-x86.sh arch/x86_64/tools/mcreboot.1:arch/x86_64/tools/mcreboot.1in arch/x86_64/tools/irqbalance_mck.service arch/x86_64/tools/irqbalance_mck.in tools/mcstat/Makefile"
ac_config_files="$ac_config_files Makefile executer/user/Makefile executer/user/mcexec.1:executer/user/mcexec.1in executer/user/vmcore2mckdump executer/user/arch/$ARCH/Makefile executer/user/arch/x86_64/Makefile executer/kernel/mcctrl/Makefile executer/kernel/mcctrl/arch/$ARCH/Makefile executer/kernel/mcoverlayfs/Makefile executer/kernel/mcoverlayfs/linux-3.10.0-327.36.1.el7/Makefile executer/kernel/mcoverlayfs/linux-4.0.9/Makefile executer/kernel/mcoverlayfs/linux-4.6.7/Makefile executer/include/qlmpilib.h kernel/Makefile kernel/Makefile.build kernel/include/swapfmt.h arch/x86_64/tools/mcreboot-attached-mic.sh arch/x86_64/tools/mcshutdown-attached-mic.sh arch/x86_64/tools/mcreboot-builtin-x86.sh arch/x86_64/tools/mcreboot-smp-x86.sh arch/x86_64/tools/mcstop+release-smp-x86.sh arch/x86_64/tools/mcoverlay-destroy-smp-x86.sh arch/x86_64/tools/mcoverlay-create-smp-x86.sh arch/x86_64/tools/eclair-dump-backtrace.exp arch/x86_64/tools/mcshutdown-builtin-x86.sh arch/x86_64/tools/mcreboot.1:arch/x86_64/tools/mcreboot.1in arch/x86_64/tools/irqbalance_mck.service arch/x86_64/tools/irqbalance_mck.in tools/mcstat/mcstat.1:tools/mcstat/mcstat.1in tools/mcstat/Makefile"
if test -e "${ABS_SRCDIR}/test"; then
ac_config_files="$ac_config_files mck_test_config.sample:test/mck_test_config.sample.in"
fi
if test "$TARGET" = "smp-x86"; then
ac_config_files="$ac_config_files arch/x86_64/kernel/Makefile.arch"
@ -5585,7 +5417,7 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
# report actual input values of CONFIG_FILES etc. instead of their
# values after options handling.
ac_log="
This file was extended by mckernel $as_me 1.5.0, which was
This file was extended by mckernel $as_me 1.6.0, which was
generated by GNU Autoconf 2.69. Invocation command line was
CONFIG_FILES = $CONFIG_FILES
@ -5647,7 +5479,7 @@ _ACEOF
cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
ac_cs_config="`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`"
ac_cs_version="\\
mckernel config.status 1.5.0
mckernel config.status 1.6.0
configured by $0, generated by GNU Autoconf 2.69,
with options \\"\$ac_cs_config\\"
@ -5797,7 +5629,9 @@ do
"arch/x86_64/tools/mcreboot.1") CONFIG_FILES="$CONFIG_FILES arch/x86_64/tools/mcreboot.1:arch/x86_64/tools/mcreboot.1in" ;;
"arch/x86_64/tools/irqbalance_mck.service") CONFIG_FILES="$CONFIG_FILES arch/x86_64/tools/irqbalance_mck.service" ;;
"arch/x86_64/tools/irqbalance_mck.in") CONFIG_FILES="$CONFIG_FILES arch/x86_64/tools/irqbalance_mck.in" ;;
"tools/mcstat/mcstat.1") CONFIG_FILES="$CONFIG_FILES tools/mcstat/mcstat.1:tools/mcstat/mcstat.1in" ;;
"tools/mcstat/Makefile") CONFIG_FILES="$CONFIG_FILES tools/mcstat/Makefile" ;;
"mck_test_config.sample") CONFIG_FILES="$CONFIG_FILES mck_test_config.sample:test/mck_test_config.sample.in" ;;
"arch/x86_64/kernel/Makefile.arch") CONFIG_FILES="$CONFIG_FILES arch/x86_64/kernel/Makefile.arch" ;;
"kernel/config/config.smp-arm64") CONFIG_FILES="$CONFIG_FILES kernel/config/config.smp-arm64" ;;
"arch/arm64/kernel/vdso/Makefile") CONFIG_FILES="$CONFIG_FILES arch/arm64/kernel/vdso/Makefile" ;;


@ -1,9 +1,9 @@
# configure.ac COPYRIGHT FUJITSU LIMITED 2015-2016
AC_PREREQ(2.63)
m4_define([IHK_VERSION_m4],[1.5.0])dnl
m4_define([MCKERNEL_VERSION_m4],[1.5.0])dnl
m4_define([IHK_RELEASE_DATE_m4],[2018-04-05])dnl
m4_define([MCKERNEL_RELEASE_DATE_m4],[2018-04-05])dnl
m4_define([IHK_VERSION_m4],[1.6.0])dnl
m4_define([MCKERNEL_VERSION_m4],[1.6.0])dnl
m4_define([IHK_RELEASE_DATE_m4],[2018-11-11])dnl
m4_define([MCKERNEL_RELEASE_DATE_m4],[2018-11-11])dnl
AC_INIT([mckernel], MCKERNEL_VERSION_m4)
@ -77,6 +77,58 @@ AC_DEFUN([PAC_SET_HEADER_LIB_PATH],[
])
])
AC_DEFUN([PAC_SET_HEADER_LIB_PATH_SYSCALL_INTERCEPT],[
AC_ARG_WITH([$1],
[AC_HELP_STRING([--with-$1=PATH],
[specify path where $1 include directory and lib directory can be found])],
[AS_CASE(["$withval"],
[yes|no|''],
[AC_MSG_WARN([--with[out]-$1=PATH expects a valid PATH])
with_$1=""])],
[with_$1=$2])
AC_ARG_WITH([$1-include],
[AC_HELP_STRING([--with-$1-include=PATH],
[specify path where $1 include directory can be found])],
[AS_CASE(["$withval"],
[yes|no|''],
[AC_MSG_WARN([--with[out]-$1-include=PATH expects a valid PATH])
with_$1_include=""])],
[])
AC_ARG_WITH([$1-lib],
[AC_HELP_STRING([--with-$1-lib=PATH],
[specify path where $1 lib directory can be found])],
[AS_CASE(["$withval"],
[yes|no|''],
[AC_MSG_WARN([--with[out]-$1-lib=PATH expects a valid PATH])
with_$1_lib=""])],
[])
# The args have been sanitized into empty/non-empty values above.
# Now append -I/-L args to CPPFLAGS/LDFLAGS, with more specific options
# taking priority
AS_IF([test -n "${with_$1_include}"],
[PAC_APPEND_FLAG([-I${with_$1_include}],[CPPFLAGS_SYSCALL_INTERCEPT])],
[AS_IF([test -n "${with_$1}"],
[PAC_APPEND_FLAG([-I${with_$1}/include],[CPPFLAGS_SYSCALL_INTERCEPT])])])
AS_IF([test -n "${with_$1_lib}"],
[PAC_APPEND_FLAG([-L${with_$1_lib} -Wl,-rpath,${with_$1_lib}],[LDFLAGS_SYSCALL_INTERCEPT])],
[AS_IF([test -n "${with_$1}"],
dnl is adding lib64 by default really the right thing to do? What if
dnl we are on a 32-bit host that happens to have both lib dirs available?
[PAC_APPEND_FLAG([-L${with_$1}/lib -Wl,-rpath,${with_$1}/lib],[LDFLAGS_SYSCALL_INTERCEPT])
AS_IF([test -d "${with_$1}/lib64"],
[PAC_APPEND_FLAG([-L${with_$1}/lib64 -Wl,-rpath,${with_$1}/lib64],[LDFLAGS_SYSCALL_INTERCEPT])])
])
])
AS_IF([test -n "${with_$1}" || test -n "${with_$1_include}" || test -n "${with_$1_lib}"],
[WITH_SYSCALL_INTERCEPT=yes],
[WITH_SYSCALL_INTERCEPT=no])
])
IHK_VERSION=IHK_VERSION_m4
MCKERNEL_VERSION=MCKERNEL_VERSION_m4
DCFA_VERSION=DCFA_VERSION_m4
@ -95,6 +147,23 @@ AS_IF([test "x$numa_lib_found" != "xyes"],
PAC_SET_HEADER_LIB_PATH([mpi])
PAC_SET_HEADER_LIB_PATH_SYSCALL_INTERCEPT([syscall_intercept])
if test "x$WITH_SYSCALL_INTERCEPT" == "xno" ; then
AC_CHECK_LIB([syscall_intercept],[syscall_no_intercept],[syscall_intercept_lib_found=yes],[syscall_intercept_lib_found=no],[-lcapstone -ldl])
AS_IF([test "x$syscall_intercept_lib_found" != "xyes"],
[AC_MSG_NOTICE([libsyscall_intercept.so not found])])
AC_CHECK_HEADER([libsyscall_intercept_hook_point.h],[syscall_intercept_header_found=yes],[syscall_intercept_header_found=no])
AS_IF([test "x$syscall_intercept_header_found" != "xyes"],
[AC_MSG_NOTICE([libsyscall_intercept_hook_point.h not found])])
AS_IF([test "x$syscall_intercept_lib_found" == "xyes" && test "x$syscall_intercept_header_found" == "xyes"],
[WITH_SYSCALL_INTERCEPT=yes],
[WITH_SYSCALL_INTERCEPT=no])
fi
AC_ARG_WITH([kernelsrc],
AC_HELP_STRING(
[--with-kernelsrc=path],[Path to 'kernel src', default is /lib/modules/uname_r/build]),
@ -229,7 +298,7 @@ case $WITH_TARGET in
KMODDIR="$prefix/kmod"
fi
if test "X$MANDIR" = X; then
MANDIR="$prefix/man"
MANDIR="$prefix/share/man"
fi
;;
builtin-mic)
@ -246,7 +315,7 @@ case $WITH_TARGET in
KMODDIR="$prefix/attached/kmod"
fi
if test "X$MANDIR" = X; then
MANDIR="$prefix/attached/man"
MANDIR="$prefix/share/man"
fi
;;
builtin-x86)
@ -263,7 +332,7 @@ case $WITH_TARGET in
KMODDIR="$prefix/kmod"
fi
if test "X$MANDIR" = X; then
MANDIR="$prefix/attached/man"
MANDIR="$prefix/share/man"
fi
;;
smp-x86)
@ -295,7 +364,7 @@ case $WITH_TARGET in
KMODDIR="$prefix/kmod"
fi
if test "X$MANDIR" = X; then
MANDIR="$prefix/smp-x86/man"
MANDIR="$prefix/share/man"
fi
;;
smp-arm64)
@ -327,7 +396,7 @@ case $WITH_TARGET in
KMODDIR="$prefix/kmod"
fi
if test "X$MANDIR" = X; then
MANDIR="$prefix/smp-arm64/man"
MANDIR="$prefix/share/man"
fi
;;
*)
@ -339,78 +408,6 @@ KDIR="$WITH_KERNELSRC"
UNAME_R="$WITH_UNAME_R"
TARGET="$WITH_TARGET"
MCCTRL_LINUX_SYMTAB=""
case "X$WITH_SYSTEM_MAP" in
Xyes | Xno | X)
MCCTRL_LINUX_SYMTAB=""
;;
*)
MCCTRL_LINUX_SYMTAB="$WITH_SYSTEM_MAP"
;;
esac
AC_MSG_CHECKING([[for System.map]])
if test -r "$MCCTRL_LINUX_SYMTAB"; then
MCCTRL_LINUX_SYMTAB="$MCCTRL_LINUX_SYMTAB"
elif test -r "/boot/System.map-`uname -r`"; then
MCCTRL_LINUX_SYMTAB="/boot/System.map-`uname -r`"
elif test -r "$KDIR/System.map"; then
MCCTRL_LINUX_SYMTAB="$KDIR/System.map"
fi
if test "$MCCTRL_LINUX_SYMTAB" == ""; then
AC_MSG_ERROR([could not find])
fi
if test -z "`eval cat $MCCTRL_LINUX_SYMTAB`"; then
AC_MSG_ERROR([could not read System.map file, no read permission?])
fi
AC_MSG_RESULT([$MCCTRL_LINUX_SYMTAB])
MCCTRL_LINUX_SYMTAB_CMD="cat $MCCTRL_LINUX_SYMTAB"
# MCCTRL_FIND_KSYM(SYMBOL)
# ------------------------------------------------------
# Search System.map for address of the given symbol and
# do one of three things in config.h:
# If not found, leave MCCTRL_KSYM_foo undefined
# If found to be exported, "#define MCCTRL_KSYM_foo 0"
# If found not to be exported, "#define MCCTRL_KSYM_foo 0x<value>"
AC_DEFUN([MCCTRL_FIND_KSYM],[
AC_MSG_CHECKING([[System.map for symbol $1]])
mcctrl_addr=`eval $MCCTRL_LINUX_SYMTAB_CMD | grep " $1\$" | cut -d\ -f1`
if test -z $mcctrl_addr; then
AC_MSG_RESULT([not found])
else
mcctrl_result=$mcctrl_addr
mcctrl_addr="0x$mcctrl_addr"
m4_ifval([$2],[],[
if `eval $MCCTRL_LINUX_SYMTAB_CMD | grep " __ksymtab_$1\$" >/dev/null`; then
mcctrl_result="exported"
mcctrl_addr="0"
fi
])
AC_MSG_RESULT([$mcctrl_result])
AC_DEFINE_UNQUOTED(MCCTRL_KSYM_[]$1,$mcctrl_addr,[Define to address of kernel symbol $1, or 0 if exported])
fi
])
MCCTRL_FIND_KSYM([sys_mount])
MCCTRL_FIND_KSYM([sys_umount])
MCCTRL_FIND_KSYM([sys_unshare])
MCCTRL_FIND_KSYM([zap_page_range])
MCCTRL_FIND_KSYM([vdso_image_64])
MCCTRL_FIND_KSYM([vdso_start])
MCCTRL_FIND_KSYM([vdso_end])
MCCTRL_FIND_KSYM([vdso_pages])
MCCTRL_FIND_KSYM([__vvar_page])
MCCTRL_FIND_KSYM([hpet_address])
# POSTK_DEBUG_ARCH_DEP_50, add:find kernel symbol.
MCCTRL_FIND_KSYM([vdso_spec])
MCCTRL_FIND_KSYM([hv_clock])
MCCTRL_FIND_KSYM([sys_readlink])
MCCTRL_FIND_KSYM([walk_page_range])
case $ENABLE_MEMDUMP in
yes|no|auto)
;;
@ -489,6 +486,13 @@ else
AC_MSG_NOTICE([perf is disabled])
fi
if test "x$WITH_SYSCALL_INTERCEPT" = "xyes" ; then
AC_DEFINE([WITH_SYSCALL_INTERCEPT],[1],[whether or not syscall_intercept library is linked])
AC_MSG_NOTICE([syscall_intercept library is linked])
else
AC_MSG_NOTICE([syscall_intercept library isn't linked])
fi
if test "x$MCKERNEL_INCDIR" != "x" ; then
AC_DEFINE_UNQUOTED(MCKERNEL_INCDIR,"$MCKERNEL_INCDIR",[McKernel specific headers])
fi
@ -526,9 +530,12 @@ AC_SUBST(KMODDIR)
AC_SUBST(KERNDIR)
AC_SUBST(MANDIR)
AC_SUBST(CFLAGS)
AC_SUBST(CPPFLAGS_SYSCALL_INTERCEPT)
AC_SUBST(LDFLAGS_SYSCALL_INTERCEPT)
AC_SUBST(ENABLE_MCOVERLAYFS)
AC_SUBST(ENABLE_RUSAGE)
AC_SUBST(ENABLE_QLMPI)
AC_SUBST(WITH_SYSCALL_INTERCEPT)
AC_SUBST(IHK_VERSION)
AC_SUBST(MCKERNEL_VERSION)
@ -570,9 +577,16 @@ AC_CONFIG_FILES([
arch/x86_64/tools/mcreboot.1:arch/x86_64/tools/mcreboot.1in
arch/x86_64/tools/irqbalance_mck.service
arch/x86_64/tools/irqbalance_mck.in
tools/mcstat/mcstat.1:tools/mcstat/mcstat.1in
tools/mcstat/Makefile
])
if test -e "${ABS_SRCDIR}/test"; then
AC_CONFIG_FILES([
mck_test_config.sample:test/mck_test_config.sample.in
])
fi
if test "$TARGET" = "smp-x86"; then
AC_CONFIG_FILES([
arch/x86_64/kernel/Makefile.arch


@ -55,13 +55,14 @@
#define MCEXEC_UP_SYS_UMOUNT 0x30a02915
#define MCEXEC_UP_SYS_UNSHARE 0x30a02916
#define MCEXEC_UP_UTIL_THREAD1 0x30a02920
#define MCEXEC_UP_UTIL_THREAD2 0x30a02921
#define MCEXEC_UP_UTI_GET_CTX 0x30a02920
#define MCEXEC_UP_UTI_SAVE_FS 0x30a02921
#define MCEXEC_UP_SIG_THREAD 0x30a02922
#define MCEXEC_UP_SYSCALL_THREAD 0x30a02924
#define MCEXEC_UP_TERMINATE_THREAD 0x30a02925
#define MCEXEC_UP_GET_NUM_POOL_THREADS 0x30a02926
#define MCEXEC_UP_UTI_ATTR 0x30a02927
#define MCEXEC_UP_RELEASE_USER_SPACE 0x30a02928
#define MCEXEC_UP_DEBUG_LOG 0x40000000
@ -91,6 +92,7 @@ struct program_image_section {
struct get_cpu_set_arg {
int nr_processes;
int *process_rank;
void *cpu_set;
size_t cpu_set_size; // Size in bytes
int *target_core;
@ -111,10 +113,8 @@ typedef unsigned long __cpu_set_unit;
struct program_load_desc {
int num_sections;
int status;
int cpu;
int pid;
int err;
int stack_prot;
int pgid;
int cred[8];
@ -142,8 +142,10 @@ struct program_load_desc {
unsigned long heap_extension;
long stack_premap;
unsigned long mpol_bind_mask;
int uti_thread_rank; /* N-th clone() spawns a thread on Linux CPU */
int uti_use_last_cpu; /* Work-around not to share CPU with OpenMP thread */
int nr_processes;
char shell_path[SHELL_PATH_MAX_LEN];
int process_rank;
__cpu_set_unit cpu_set[PLD_CPU_SET_SIZE];
int profile;
struct program_image_section sections[0];
@ -244,6 +246,28 @@ struct sys_unshare_desc {
unsigned long unshare_flags;
};
struct release_user_space_desc {
unsigned long user_start;
unsigned long user_end;
};
struct terminate_thread_desc {
int pid;
int tid;
long code;
/* 32------32 31--16 15--------8 7----0
exit_group exit-status signal */
unsigned long tsk; /* struct task_struct * */
};
struct rpgtable_desc {
uintptr_t rpgtable;
uintptr_t start;
uintptr_t len;
};
enum perf_ctrl_type {
PERF_CTRL_SET,
PERF_CTRL_GET,
@ -253,7 +277,7 @@ enum perf_ctrl_type {
struct perf_ctrl_desc {
enum perf_ctrl_type ctrl_type;
int status;
int err;
union {
/* for SET, GET */
struct {
@ -293,6 +317,10 @@ struct perf_ctrl_desc {
#define UTI_FLAG_HIGH_PRIORITY (1ULL<<12)
#define UTI_FLAG_NON_COOPERATIVE (1ULL<<13)
#define UTI_FLAG_PREFER_LWK (1ULL << 14)
#define UTI_FLAG_PREFER_FWK (1ULL << 15)
#define UTI_FLAG_FABRIC_INTR_AFFINITY (1ULL << 16)
/* Linux default value is used */
#define UTI_MAX_NUMA_DOMAINS (1024)
@ -311,6 +339,30 @@ struct kuti_attr {
struct uti_attr_desc {
unsigned long phys_attr;
char *uti_cpu_set_str; /* UTI_CPU_SET environmental variable */
size_t uti_cpu_set_len;
};
struct uti_ctx {
union {
char ctx[4096]; /* TODO: Get the size from config.h */
struct {
int uti_refill_tid;
};
};
};
struct uti_get_ctx_desc {
unsigned long rp_rctx; /* Remote physical address of remote context */
void *rctx; /* Remote context */
void *lctx; /* Local context */
int uti_refill_tid;
unsigned long key; /* OUT: struct task_struct* of mcexec thread, used to search struct host_thread */
};
struct uti_save_fs_desc {
void *rctx; /* Remote context */
void *lctx; /* Local context */
};
#endif
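
The layout comment added to `struct terminate_thread_desc` can be read as a set of bit fields packed into `code`. The sketch below decodes them in user space. It assumes that the three labels map to bit 32 (exit_group), bits 15..8 (exit status) and bits 7..0 (signal), and that bits 31..16 are unused; the column alignment of the original comment is lost in this view, so that mapping is an assumption. The helper names are hypothetical and do not appear in the diff.

```c
#include <stdint.h>

/* Hypothetical helpers; the bit positions follow one reading of the
 * layout comment in struct terminate_thread_desc. */

/* Bits 7..0: terminating signal number (assumed). */
static inline int tt_signal(uint64_t code)
{
	return (int)(code & 0xff);
}

/* Bits 15..8: exit status (assumed). */
static inline int tt_exit_status(uint64_t code)
{
	return (int)((code >> 8) & 0xff);
}

/* Bit 32: set when the whole thread group exits (assumed). */
static inline int tt_is_exit_group(uint64_t code)
{
	return (int)((code >> 32) & 1);
}
```

Taking `uint64_t` rather than `long` keeps the shift by 32 well defined even on targets where `long` is 32 bits wide.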

executer/include/uti.h (new file)

@ -0,0 +1,31 @@
#ifndef UTI_H_INCLUDED
#define UTI_H_INCLUDED
struct syscall_struct {
int number;
unsigned long args[6];
unsigned long ret;
unsigned long uti_clv; /* copy of a clv in McKernel */
};
#define UTI_SZ_SYSCALL_STACK 16
/* Variables accessed by mcexec.c and syscall_intercept.c */
struct uti_desc {
char lctx[4096]; /* TODO: Get the size from config.h */
char rctx[4096]; /* TODO: Get the size from config.h */
int mck_tid; /* TODO: Move this out for multiple migrated-to-Linux threads */
unsigned long key; /* struct task_struct* of mcexec thread, used to search struct host_thread */
int pid, tid; /* Used as the id of tracee when issuing MCEXEC_UP_TERMINATE_THREAD */
unsigned long uti_clv; /* copy of McKernel clv */
int fd; /* /dev/mcosX */
struct syscall_struct syscall_stack[UTI_SZ_SYSCALL_STACK]; /* stack of system call arguments and return values */
int syscall_stack_top; /* stack-pointer of syscall arguments list */
long syscalls[512], syscalls2[512]; /* Syscall profile counters */
int start_syscall_intercept; /* Used to sync between mcexec.c and syscall_intercept.c */
};
#endif
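
`struct uti_desc` pairs a fixed-size `syscall_stack` with a `syscall_stack_top` index, but the code that pushes and pops entries is not part of this hunk. The following is a minimal sketch of how such a bounded stack could be driven, assuming `top` is the index of the next free slot. `struct syscall_stack_sketch`, `stack_push` and `stack_pop` are hypothetical names, not functions from the repository.

```c
#define UTI_SZ_SYSCALL_STACK 16

/* Copied from uti.h in the diff above. */
struct syscall_struct {
	int number;
	unsigned long args[6];
	unsigned long ret;
	unsigned long uti_clv; /* copy of a clv in McKernel */
};

/* Hypothetical container holding only the two stack-related fields. */
struct syscall_stack_sketch {
	struct syscall_struct entries[UTI_SZ_SYSCALL_STACK];
	int top; /* assumed: index of the next free slot */
};

/* Returns 0 on success, -1 when the stack is already full. */
static int stack_push(struct syscall_stack_sketch *s,
		      const struct syscall_struct *e)
{
	if (s->top >= UTI_SZ_SYSCALL_STACK)
		return -1;
	s->entries[s->top++] = *e;
	return 0;
}

/* Returns 0 on success, -1 when the stack is empty. */
static int stack_pop(struct syscall_stack_sketch *s,
		     struct syscall_struct *out)
{
	if (s->top <= 0)
		return -1;
	*out = s->entries[--s->top];
	return 0;
}
```

The explicit bounds checks matter here because the depth is fixed at compile time, so a deeper nesting of intercepted system calls would otherwise overwrite the fields that follow the array.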


@ -1,6 +1,7 @@
/* archdeps.c COPYRIGHT FUJITSU LIMITED 2016 */
#include <linux/version.h>
#include <linux/mm_types.h>
#include <linux/kallsyms.h>
#include <asm/vdso.h>
#include "../../../config.h"
#include "../../mcctrl.h"
@ -17,29 +18,31 @@
#define D(fmt, ...) printk("%s(%d) " fmt, __func__, __LINE__, ##__VA_ARGS__)
#ifdef MCCTRL_KSYM_vdso_start
# if MCCTRL_KSYM_vdso_start
void *vdso_start = (void *)MCCTRL_KSYM_vdso_start;
# endif
#else
# error missing address of vdso_start.
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 0, 0)
void *vdso_start;
void *vdso_end;
static struct vm_special_mapping (*vdso_spec)[2];
#endif
#ifdef MCCTRL_KSYM_vdso_end
# if MCCTRL_KSYM_vdso_end
void *vdso_end = (void *)MCCTRL_KSYM_vdso_end;
# endif
#else
# error missing address of vdso_end.
int arch_symbols_init(void)
{
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 0, 0)
vdso_start = (void *) kallsyms_lookup_name("vdso_start");
if (WARN_ON(!vdso_start))
return -EFAULT;
vdso_end = (void *) kallsyms_lookup_name("vdso_end");
if (WARN_ON(!vdso_end))
return -EFAULT;
vdso_spec = (void *) kallsyms_lookup_name("vdso_spec");
if (WARN_ON(!vdso_spec))
return -EFAULT;
#endif
#ifdef MCCTRL_KSYM_vdso_spec
# if MCCTRL_KSYM_vdso_spec
static struct vm_special_mapping (*vdso_spec)[2] = (void*)MCCTRL_KSYM_vdso_spec;
# endif
#else
# error missing address of vdso_spec.
#endif
return 0;
}
#ifdef POSTK_DEBUG_ARCH_DEP_52
#define VDSO_MAXPAGES 1


@ -1,5 +1,6 @@
/* archdeps.c COPYRIGHT FUJITSU LIMITED 2016 */
#include <linux/version.h>
#include <linux/kallsyms.h>
#include "../../../config.h"
#include "../../mcctrl.h"
@ -13,57 +14,46 @@
#endif
#endif /* POSTK_DEBUG_ARCH_DEP_83 */
#ifdef MCCTRL_KSYM_vdso_image_64
#if MCCTRL_KSYM_vdso_image_64
struct vdso_image *vdso_image = (void *)MCCTRL_KSYM_vdso_image_64;
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 16, 0)
static struct vdso_image *vdso_image_64;
#elif LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 23)
static void *vdso_start;
static void *vdso_end;
static struct page **vdso_pages;
#endif
static void *__vvar_page;
static long *hpet_address;
static void **hv_clock;
int arch_symbols_init(void)
{
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 16, 0)
vdso_image_64 = (void *) kallsyms_lookup_name("vdso_image_64");
if (WARN_ON(!vdso_image_64))
return -EFAULT;
#elif LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 23)
vdso_start = (void *) kallsyms_lookup_name("vdso_start");
if (WARN_ON(!vdso_start))
return -EFAULT;
vdso_end = (void *) kallsyms_lookup_name("vdso_end");
if (WARN_ON(!vdso_end))
return -EFAULT;
vdso_pages = (void *) kallsyms_lookup_name("vdso_pages");
if (WARN_ON(!vdso_pages))
return -EFAULT;
#endif
#ifdef MCCTRL_KSYM_vdso_start
#if MCCTRL_KSYM_vdso_start
void *vdso_start = (void *)MCCTRL_KSYM_vdso_start;
#endif
#endif
__vvar_page = (void *) kallsyms_lookup_name("__vvar_page");
if (WARN_ON(!__vvar_page))
return -EFAULT;
#ifdef MCCTRL_KSYM_vdso_end
#if MCCTRL_KSYM_vdso_end
void *vdso_end = (void *)MCCTRL_KSYM_vdso_end;
#endif
#endif
hpet_address = (void *) kallsyms_lookup_name("hpet_address");
hv_clock = (void *) kallsyms_lookup_name("hv_clock");
return 0;
}
#ifdef MCCTRL_KSYM_vdso_pages
#if MCCTRL_KSYM_vdso_pages
struct page **vdso_pages = (void *)MCCTRL_KSYM_vdso_pages;
#endif
#endif
#ifdef MCCTRL_KSYM___vvar_page
#if MCCTRL_KSYM___vvar_page
void *__vvar_page = (void *)MCCTRL_KSYM___vvar_page;
#endif
#endif
long *hpet_addressp
#ifdef MCCTRL_KSYM_hpet_address
#if MCCTRL_KSYM_hpet_address
= (void *)MCCTRL_KSYM_hpet_address;
#else
= &hpet_address;
#endif
#else
= NULL;
#endif
void **hv_clockp
#ifdef MCCTRL_KSYM_hv_clock
#if MCCTRL_KSYM_hv_clock
= (void *)MCCTRL_KSYM_hv_clock;
#else
= &hv_clock;
#endif
#else
= NULL;
#endif
#ifdef POSTK_DEBUG_ARCH_DEP_52
#define VDSO_MAXPAGES 2
@ -138,7 +128,7 @@ void get_vdso_info(ihk_os_t os, long vdso_rpa)
/* VDSO pages */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,16,0)
size = vdso_image->size;
size = vdso_image_64->size;
vdso->vdso_npages = size >> PAGE_SHIFT;
if (vdso->vdso_npages > VDSO_MAXPAGES) {
@ -148,7 +138,7 @@ void get_vdso_info(ihk_os_t os, long vdso_rpa)
for (i = 0; i < vdso->vdso_npages; ++i) {
vdso->vdso_physlist[i] = virt_to_phys(
vdso_image->data + (i * PAGE_SIZE));
vdso_image_64->data + (i * PAGE_SIZE));
}
#elif LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,23)
size = vdso_end - vdso_start;
@ -185,36 +175,36 @@ void get_vdso_info(ihk_os_t os, long vdso_rpa)
#endif
/* HPET page */
if (hpet_addressp && *hpet_addressp) {
if (hpet_address && *hpet_address) {
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4,5,0)
vdso->hpet_is_global = 0;
vdso->hpet_virt = (void *)(-2 * PAGE_SIZE);
vdso->hpet_phys = *hpet_addressp;
vdso->hpet_phys = *hpet_address;
#elif LINUX_VERSION_CODE >= KERNEL_VERSION(3,17,0)
vdso->hpet_is_global = 0;
vdso->hpet_virt = (void *)(-1 * PAGE_SIZE);
vdso->hpet_phys = *hpet_addressp;
vdso->hpet_phys = *hpet_address;
#elif LINUX_VERSION_CODE >= KERNEL_VERSION(3,16,0)
vdso->hpet_is_global = 0;
vdso->hpet_virt = (void *)((vdso->vdso_npages + 1) * PAGE_SIZE);
vdso->hpet_phys = *hpet_addressp;
vdso->hpet_phys = *hpet_address;
#elif LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,23)
vdso->hpet_is_global = 1;
vdso->hpet_virt = (void *)fix_to_virt(VSYSCALL_HPET);
vdso->hpet_phys = *hpet_addressp;
vdso->hpet_phys = *hpet_address;
#endif
}
/* struct pvlock_vcpu_time_info table */
if (hv_clockp && *hv_clockp) {
if (hv_clock && *hv_clock) {
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4,5,0)
vdso->pvti_is_global = 0;
vdso->pvti_virt = (void *)(-1 * PAGE_SIZE);
vdso->pvti_phys = virt_to_phys(*hv_clockp);
vdso->pvti_phys = virt_to_phys(*hv_clock);
#elif LINUX_VERSION_CODE >= KERNEL_VERSION(3,8,0)
vdso->pvti_is_global = 1;
vdso->pvti_virt = (void *)fix_to_virt(PVCLOCK_FIXMAP_BEGIN);
vdso->pvti_phys = virt_to_phys(*hv_clockp);
vdso->pvti_phys = virt_to_phys(*hv_clock);
#endif
}
@ -289,6 +279,14 @@ get_fs_ctx(void *ctx)
return tctx->fs;
}
unsigned long
get_rsp_ctx(void *ctx)
{
struct trans_uctx *tctx = ctx;
return tctx->rsp;
}
#ifdef POSTK_DEBUG_ARCH_DEP_83 /* arch depend translate_rva_to_rpa() move */
int translate_rva_to_rpa(ihk_os_t os, unsigned long rpt, unsigned long rva,
unsigned long *rpap, unsigned long *pgsizep)


@ -125,7 +125,6 @@ static int load_elf(struct linux_binprm *bprm
for(i = 0, st = 0; mode != 2;){
if(st == 0){
off = p & ~PAGE_MASK;
#ifdef POSTK_DEBUG_ARCH_DEP_41 /* HOST-Linux version switch add */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4,10,0)
rc = get_user_pages_remote(current, bprm->mm,
bprm->p, 1, FOLL_FORCE, &page, NULL, NULL);
@ -141,17 +140,6 @@ static int load_elf(struct linux_binprm *bprm
bprm->p, 1, 0, 1,
&page, NULL);
#endif
#else /* POSTK_DEBUG_ARCH_DEP_41 */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4,6,0)
rc = get_user_pages_remote(current, bprm->mm,
bprm->p, 1, 0, 1,
&page, NULL);
#else
rc = get_user_pages(current, bprm->mm,
bprm->p, 1, 0, 1,
&page, NULL);
#endif
#endif /* POSTK_DEBUG_ARCH_DEP_41 */
if(rc <= 0) {
kfree(pbuf);
return -EFAULT;

File diff suppressed because it is too large


@ -28,6 +28,7 @@
#include <linux/slab.h>
#include <linux/device.h>
#include <linux/delay.h>
#include <linux/kallsyms.h>
#include "mcctrl.h"
#include <ihk/ihk_host_user.h>
@ -43,8 +44,6 @@ extern void mcctrl_syscall_init(void);
extern void procfs_init(int);
extern void procfs_exit(int);
extern void rus_page_hash_init(void);
extern void rus_page_hash_put_pages(void);
extern void uti_attr_finalize(void);
extern void binfmt_mcexec_init(void);
extern void binfmt_mcexec_exit(void);
@ -84,13 +83,14 @@ static struct ihk_os_user_call_handler mcctrl_uchs[] = {
{ .request = MCEXEC_UP_SYS_MOUNT, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_SYS_UMOUNT, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_SYS_UNSHARE, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_UTIL_THREAD1, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_UTIL_THREAD2, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_UTI_GET_CTX, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_UTI_SAVE_FS, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_SIG_THREAD, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_SYSCALL_THREAD, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_TERMINATE_THREAD, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_GET_NUM_POOL_THREADS, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_UTI_ATTR, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_RELEASE_USER_SPACE, .func = mcctrl_ioctl },
{ .request = MCEXEC_UP_DEBUG_LOG, .func = mcctrl_ioctl },
{ .request = IHK_OS_AUX_PERF_NUM, .func = mcctrl_ioctl },
{ .request = IHK_OS_AUX_PERF_SET, .func = mcctrl_ioctl },
@ -178,6 +178,7 @@ int mcctrl_os_shutdown_notifier(int os_index)
mdelay(200);
}
pager_cleanup();
sysfsm_cleanup(os[os_index]);
free_topology_info(os[os_index]);
ihk_os_unregister_user_call_handlers(os[os_index], mcctrl_uc + os_index);
@ -185,9 +186,6 @@ int mcctrl_os_shutdown_notifier(int os_index)
destroy_ikc_channels(os[os_index]);
procfs_exit(os_index);
}
#ifdef POSTK_DEBUG_TEMP_FIX_35 /* in shutdown phase, rus_page_hash_put_pages() call added. */
rus_page_hash_put_pages();
#endif /* POSTK_DEBUG_TEMP_FIX_35 */
os[os_index] = NULL;
@ -214,6 +212,68 @@ static struct ihk_os_notifier mcctrl_os_notifier = {
.ops = &mcctrl_os_notifier_ops,
};
int (*mcctrl_sys_mount)(char *dev_name, char *dir_name, char *type,
unsigned long flags, void *data);
int (*mcctrl_sys_umount)(char *dir_name, int flags);
int (*mcctrl_sys_unshare)(unsigned long unshare_flags);
long (*mcctrl_sched_setaffinity)(pid_t pid, const struct cpumask *in_mask);
int (*mcctrl_sched_setscheduler_nocheck)(struct task_struct *p, int policy,
const struct sched_param *param);
ssize_t (*mcctrl_sys_readlink)(const char *path, char *buf,
size_t bufsiz);
void (*mcctrl_zap_page_range)(struct vm_area_struct *vma,
unsigned long start,
unsigned long size,
struct zap_details *details);
struct inode_operations *mcctrl_hugetlbfs_inode_operations;
static int symbols_init(void)
{
mcctrl_sys_mount = (void *) kallsyms_lookup_name("sys_mount");
if (WARN_ON(!mcctrl_sys_mount))
return -EFAULT;
mcctrl_sys_umount = (void *) kallsyms_lookup_name("sys_umount");
if (WARN_ON(!mcctrl_sys_umount))
return -EFAULT;
mcctrl_sys_unshare = (void *) kallsyms_lookup_name("sys_unshare");
if (WARN_ON(!mcctrl_sys_unshare))
return -EFAULT;
mcctrl_sched_setaffinity =
(void *) kallsyms_lookup_name("sched_setaffinity");
if (WARN_ON(!mcctrl_sched_setaffinity))
return -EFAULT;
mcctrl_sched_setscheduler_nocheck =
(void *) kallsyms_lookup_name("sched_setscheduler_nocheck");
if (WARN_ON(!mcctrl_sched_setscheduler_nocheck))
return -EFAULT;
mcctrl_sys_readlink =
(void *) kallsyms_lookup_name("sys_readlink");
if (WARN_ON(!mcctrl_sys_readlink))
return -EFAULT;
mcctrl_zap_page_range =
(void *) kallsyms_lookup_name("zap_page_range");
if (WARN_ON(!mcctrl_zap_page_range))
return -EFAULT;
mcctrl_hugetlbfs_inode_operations =
(void *) kallsyms_lookup_name("hugetlbfs_inode_operations");
if (WARN_ON(!mcctrl_hugetlbfs_inode_operations))
return -EFAULT;
return arch_symbols_init();
}
static int __init mcctrl_init(void)
{
int ret = 0;
@ -227,10 +287,11 @@ static int __init mcctrl_init(void)
os[i] = NULL;
}
rus_page_hash_init();
binfmt_mcexec_init();
if ((ret = symbols_init()))
goto error;
if ((ret = ihk_host_register_os_notifier(&mcctrl_os_notifier)) != 0) {
printk("mcctrl: error: registering OS notifier\n");
goto error;
@ -241,7 +302,6 @@ static int __init mcctrl_init(void)
error:
binfmt_mcexec_exit();
rus_page_hash_put_pages();
return ret;
}
@ -253,7 +313,6 @@ static void __exit mcctrl_exit(void)
}
binfmt_mcexec_exit();
rus_page_hash_put_pages();
uti_attr_finalize();
printk("mcctrl: unregistered.\n");
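
`symbols_init()` above replaces the configure-time System.map scan with runtime lookups, and it stops at the first symbol that cannot be resolved. The same fail-fast pattern can be expressed as a table walk. The sketch below is a user-space illustration only: `fake_lookup_name` and its table stand in for `kallsyms_lookup_name`, the addresses are made up, and `struct sym_slot` and `resolve_all` are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel symbol table; the addresses are invented. */
struct fake_sym {
	const char *name;
	unsigned long addr;
};

static const struct fake_sym fake_table[] = {
	{ "sys_mount",  0x1000 },
	{ "sys_umount", 0x2000 },
};

/* Stand-in for kallsyms_lookup_name(): returns 0 when the name is unknown. */
static unsigned long fake_lookup_name(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(fake_table) / sizeof(fake_table[0]); i++) {
		if (strcmp(fake_table[i].name, name) == 0)
			return fake_table[i].addr;
	}
	return 0;
}

/* One entry per symbol to resolve and the variable that receives it. */
struct sym_slot {
	const char *name;
	unsigned long *slot;
};

/* Resolve every slot in order; give up with -EFAULT on the first miss. */
static int resolve_all(const struct sym_slot *slots, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		*slots[i].slot = fake_lookup_name(slots[i].name);
		if (*slots[i].slot == 0)
			return -EFAULT;
	}
	return 0;
}
```

The diff keeps the lookups unrolled, which lets each pointer keep its exact function type; the table form trades that type information for less repetition.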


@ -49,15 +49,125 @@
//struct mcctrl_channel *channels;
void mcexec_prepare_ack(ihk_os_t os, unsigned long arg, int err);
static void mcctrl_ikc_init(ihk_os_t os, int cpu, unsigned long rphys, struct ihk_ikc_channel_desc *c);
int mcexec_syscall(struct mcctrl_usrdata *ud, struct ikc_scd_packet *packet);
void sig_done(unsigned long arg, int err);
void mcctrl_perf_ack(ihk_os_t os, struct ikc_scd_packet *packet);
void mcctrl_futex_wake(struct ikc_scd_packet *pisp);
void mcctrl_os_read_write_cpu_response(ihk_os_t os,
struct ikc_scd_packet *pisp);
void mcctrl_eventfd(ihk_os_t os, struct ikc_scd_packet *pisp);
/* Assumes usrdata->wakeup_descs_lock taken */
static void mcctrl_wakeup_desc_cleanup(ihk_os_t os,
struct mcctrl_wakeup_desc *desc)
{
int i;
list_del(&desc->chain);
for (i = 0; i < desc->free_addrs_count; i++) {
kfree(desc->free_addrs[i]);
}
}
static void mcctrl_wakeup_cb(ihk_os_t os, struct ikc_scd_packet *packet)
{
struct mcctrl_wakeup_desc *desc = packet->reply;
WRITE_ONCE(desc->err, packet->err);
/*
* Check if the other side is still waiting, and signal it we're done.
*
* Setting status needs to be done last because the other side could
* wake up opportunistically between this set and the wake_up call.
*
* If the other side is no longer waiting, free the memory that was
* left for us.
*/
if (cmpxchg(&desc->status, 0, 1)) {
struct mcctrl_usrdata *usrdata = ihk_host_os_get_usrdata(os);
unsigned long flags;
spin_lock_irqsave(&usrdata->wakeup_descs_lock, flags);
mcctrl_wakeup_desc_cleanup(os, desc);
spin_unlock_irqrestore(&usrdata->wakeup_descs_lock, flags);
return;
}
wake_up_interruptible(&desc->wq);
}
int mcctrl_ikc_send_wait(ihk_os_t os, int cpu, struct ikc_scd_packet *pisp,
long int timeout, struct mcctrl_wakeup_desc *desc,
int *do_frees, int free_addrs_count, ...)
{
int ret, i;
int alloc_desc = (desc == NULL);
va_list ap;
if (free_addrs_count)
*do_frees = 1;
if (alloc_desc)
desc = kmalloc(sizeof(struct mcctrl_wakeup_desc) +
(free_addrs_count + 1) * sizeof(void *),
GFP_KERNEL);
if (!desc) {
pr_warn("%s: Could not allocate wakeup descriptor", __func__);
return -ENOMEM;
}
pisp->reply = desc;
va_start(ap, free_addrs_count);
for (i = 0; i < free_addrs_count; i++) {
desc->free_addrs[i] = va_arg(ap, void*);
}
va_end(ap);
if (alloc_desc)
desc->free_addrs[free_addrs_count++] = desc;
desc->free_addrs_count = free_addrs_count;
init_waitqueue_head(&desc->wq);
WRITE_ONCE(desc->err, 0);
WRITE_ONCE(desc->status, 0);
ret = mcctrl_ikc_send(os, cpu, pisp);
if (ret < 0) {
pr_warn("%s: mcctrl_ikc_send failed: %d\n", __func__, ret);
kfree(desc);
return ret;
}
if (timeout) {
ret = wait_event_interruptible_timeout(desc->wq,
desc->status, timeout);
} else {
ret = wait_event_interruptible(desc->wq, desc->status);
}
/*
* Check if wait aborted (signal..) or timed out, and notify
* the callback it will need to free things for us
*/
if (!cmpxchg(&desc->status, 0, 1)) {
struct mcctrl_usrdata *usrdata = ihk_host_os_get_usrdata(os);
unsigned long flags;
spin_lock_irqsave(&usrdata->wakeup_descs_lock, flags);
list_add(&desc->chain, &usrdata->wakeup_descs_list);
spin_unlock_irqrestore(&usrdata->wakeup_descs_lock, flags);
if (do_frees)
*do_frees = 0;
return ret < 0 ? ret : -ETIME;
}
ret = READ_ONCE(desc->err);
if (alloc_desc)
kfree(desc);
return ret;
}
/* XXX: this runs in atomic context! */
static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
void *__packet, void *__os)
@ -72,25 +182,16 @@ static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
break;
case SCD_MSG_PREPARE_PROCESS_ACKED:
mcexec_prepare_ack(__os, pisp->arg, 0);
break;
case SCD_MSG_PREPARE_PROCESS_NACKED:
mcexec_prepare_ack(__os, pisp->arg, pisp->err);
case SCD_MSG_PERF_ACK:
case SCD_MSG_SEND_SIGNAL_ACK:
case SCD_MSG_PROCFS_ANSWER:
mcctrl_wakeup_cb(__os, pisp);
break;
case SCD_MSG_SYSCALL_ONESIDE:
mcexec_syscall(usrdata, pisp);
break;
case SCD_MSG_PROCFS_ANSWER:
procfs_answer(usrdata, pisp->pid);
break;
case SCD_MSG_SEND_SIGNAL:
sig_done(pisp->arg, pisp->err);
break;
case SCD_MSG_SYSFS_REQ_CREATE:
case SCD_MSG_SYSFS_REQ_MKDIR:
case SCD_MSG_SYSFS_REQ_SYMLINK:
@ -106,17 +207,14 @@ static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
case SCD_MSG_PROCFS_TID_CREATE:
case SCD_MSG_PROCFS_TID_DELETE:
procfsm_packet_handler(__os, pisp->msg, pisp->pid, pisp->arg);
procfsm_packet_handler(__os, pisp->msg, pisp->pid, pisp->arg,
pisp->resp_pa);
break;
case SCD_MSG_GET_VDSO_INFO:
get_vdso_info(__os, pisp->arg);
break;
case SCD_MSG_PERF_ACK:
mcctrl_perf_ack(__os, pisp);
break;
case SCD_MSG_CPU_RW_REG_RESP:
mcctrl_os_read_write_cpu_response(__os, pisp);
break;
@ -126,6 +224,10 @@ static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
mcctrl_eventfd(__os, pisp);
break;
case SCD_MSG_FUTEX_WAKE:
mcctrl_futex_wake(pisp);
break;
default:
printk(KERN_ERR "mcctrl:syscall_packet_handler:"
"unknown message (%d.%d.%d.%d.%d.%#lx)\n",
@ -157,10 +259,15 @@ static int dummy_packet_handler(struct ihk_ikc_channel_desc *c,
int mcctrl_ikc_send(ihk_os_t os, int cpu, struct ikc_scd_packet *pisp)
{
struct mcctrl_usrdata *usrdata = ihk_host_os_get_usrdata(os);
struct mcctrl_usrdata *usrdata;
if (cpu < 0 || os == NULL || usrdata == NULL ||
cpu >= usrdata->num_channels || !usrdata->channels[cpu].c) {
if (cpu < 0 || os == NULL) {
return -EINVAL;
}
usrdata = ihk_host_os_get_usrdata(os);
if (usrdata == NULL || cpu >= usrdata->num_channels ||
!usrdata->channels[cpu].c) {
return -EINVAL;
}
return ihk_ikc_send(usrdata->channels[cpu].c, pisp, 0);
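Note on the reordered checks in mcctrl_ikc_send(): the new version tests `os` before looking up `usrdata`, presumably so that the lookup is never called with a NULL `os`. A minimal userspace sketch of that ordering follows. The struct layouts and the `get_usrdata()` mock are hypothetical stand-ins, not the real IHK types.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real mcctrl structs. */
struct channel { void *c; };
struct usrdata { int num_channels; struct channel *channels; };
struct os { struct usrdata *usrdata; };

/* Mock lookup: it dereferences os, so os must already be known non-NULL. */
static struct usrdata *get_usrdata(struct os *os)
{
	assert(os != NULL);
	return os->usrdata;
}

/* Check the cheap arguments first, then look up usrdata, then the channel. */
static int send_checked(struct os *os, int cpu)
{
	struct usrdata *ud;

	if (cpu < 0 || os == NULL)
		return -EINVAL;

	ud = get_usrdata(os);
	if (ud == NULL || cpu >= ud->num_channels || !ud->channels[cpu].c)
		return -EINVAL;

	return 0; /* the real code would call ihk_ikc_send() here */
}
```

With the old ordering, the lookup ran in the variable initializer, before any argument had been validated.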
@ -354,6 +461,8 @@ int prepare_ikc_channels(ihk_os_t os)
mutex_init(&usrdata->part_exec.lock);
INIT_LIST_HEAD(&usrdata->part_exec.pli_list);
usrdata->part_exec.nr_processes = -1;
INIT_LIST_HEAD(&usrdata->wakeup_descs_list);
spin_lock_init(&usrdata->wakeup_descs_lock);
return 0;
@ -375,7 +484,9 @@ void __destroy_ikc_channel(ihk_os_t os, struct mcctrl_channel *pmc)
void destroy_ikc_channels(ihk_os_t os)
{
int i;
unsigned long flags;
struct mcctrl_usrdata *usrdata = ihk_host_os_get_usrdata(os);
struct mcctrl_wakeup_desc *mwd_entry, *mwd_next;
if (!usrdata) {
printk("%s: WARNING: no mcctrl_usrdata found\n", __FUNCTION__);
@ -395,6 +506,12 @@ void destroy_ikc_channels(ihk_os_t os)
ihk_ikc_destroy_channel(usrdata->ikc2linux[i]);
}
}
spin_lock_irqsave(&usrdata->wakeup_descs_lock, flags);
list_for_each_entry_safe(mwd_entry, mwd_next,
&usrdata->wakeup_descs_list, chain) {
mcctrl_wakeup_desc_cleanup(os, mwd_entry);
}
spin_unlock_irqrestore(&usrdata->wakeup_descs_lock, flags);
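Note on the cleanup loop above: list_for_each_entry_safe() is the iterator variant meant for walks in which the current entry may be removed. The sketch below shows the same idea in plain C on a hypothetical singly linked list: the successor is saved before the current node is released. `struct desc`, `desc_push()` and `desc_cleanup_all()` are illustrative names, not part of mcctrl.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical descriptor node; the kernel code uses struct list_head. */
struct desc {
	int id;
	struct desc *next;
};

/* Push a new node at the head; returns the old head if allocation fails. */
static struct desc *desc_push(struct desc *head, int id)
{
	struct desc *d = malloc(sizeof(*d));

	if (!d)
		return head;
	d->id = id;
	d->next = head;
	return d;
}

/* Free every node and return how many were freed. */
static int desc_cleanup_all(struct desc **head)
{
	struct desc *cur, *next;
	int n = 0;

	assert(head != NULL);
	for (cur = *head; cur; cur = next) {
		next = cur->next; /* saved first: cur is invalid after free() */
		free(cur);
		n++;
	}
	*head = NULL;
	return n;
}
```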
kfree(usrdata->channels);
kfree(usrdata->ikc2linux);


@ -48,7 +48,6 @@
#define SCD_MSG_PREPARE_PROCESS 0x1
#define SCD_MSG_PREPARE_PROCESS_ACKED 0x2
#define SCD_MSG_PREPARE_PROCESS_NACKED 0x7
#define SCD_MSG_SCHEDULE_PROCESS 0x3
#define SCD_MSG_WAKE_UP_SYSCALL_THREAD 0x14
@ -56,7 +55,8 @@
#define SCD_MSG_INIT_CHANNEL_ACKED 0x6
#define SCD_MSG_SYSCALL_ONESIDE 0x4
#define SCD_MSG_SEND_SIGNAL 0x8
#define SCD_MSG_SEND_SIGNAL 0x7
#define SCD_MSG_SEND_SIGNAL_ACK 0x8
#define SCD_MSG_CLEANUP_PROCESS 0x9
#define SCD_MSG_GET_VDSO_INFO 0xa
@ -67,6 +67,7 @@
#define SCD_MSG_PROCFS_DELETE 0x11
#define SCD_MSG_PROCFS_REQUEST 0x12
#define SCD_MSG_PROCFS_ANSWER 0x13
#define SCD_MSG_PROCFS_RELEASE 0x15
#define SCD_MSG_DEBUG_LOG 0x20
@ -101,23 +102,18 @@
#define SCD_MSG_CPU_RW_REG 0x52
#define SCD_MSG_CPU_RW_REG_RESP 0x53
#define SCD_MSG_FUTEX_WAKE 0x60
#define DMA_PIN_SHIFT 21
#define DO_USER_MODE
#define __NR_coredump 999
#ifdef POSTK_DEBUG_TEMP_FIX_61 /* Core table size and lseek return value to loff_t */
struct coretable {
loff_t len;
unsigned long addr;
};
#else /* POSTK_DEBUG_TEMP_FIX_61 */
struct coretable {
int len;
unsigned long addr;
};
#endif /* POSTK_DEBUG_TEMP_FIX_61 */
enum mcctrl_os_cpu_operation {
MCCTRL_OS_CPU_READ_REGISTER,
@ -125,9 +121,16 @@ enum mcctrl_os_cpu_operation {
MCCTRL_OS_CPU_MAX_OP
};
/* Used to wake-up a Linux thread futex_wait()-ing */
struct uti_futex_resp {
int done;
wait_queue_head_t wq;
};
struct ikc_scd_packet {
int msg;
int err;
void *reply;
union {
/* for traditional SCD_MSG_* */
struct {
@ -146,7 +149,7 @@ struct ikc_scd_packet {
long sysfs_arg3;
};
/* SCD_MSG_SCHEDULE_THREAD */
/* SCD_MSG_WAKE_UP_SYSCALL_THREAD */
struct {
int ttid;
};
@ -162,10 +165,17 @@ struct ikc_scd_packet {
struct {
int eventfd_type;
};
/* SCD_MSG_FUTEX_WAKE */
struct {
void *resp;
int *spin_sleep; /* 1: waiting in linux_wait_event() 0: woken up by someone else */
} futex;
};
char padding[12];
char padding[8];
};
struct mcctrl_priv {
ihk_os_t os;
struct program_load_desc *desc;
@ -211,9 +221,12 @@ struct mcctrl_channel {
};
struct mcctrl_per_thread_data {
struct mcctrl_per_proc_data *ppd;
struct list_head hash;
struct task_struct *task;
void *data;
int tid; /* debug */
atomic_t refcount;
};
#define MCCTRL_PER_THREAD_DATA_HASH_SHIFT 8
@ -231,7 +244,6 @@ struct mcctrl_per_proc_data {
struct list_head wq_list_exact; /* These requests come from IKC IRQ handler targeting a particular thread */
ihk_spinlock_t wq_list_lock;
wait_queue_head_t wq_prepare;
wait_queue_head_t wq_procfs;
struct list_head per_thread_data_hash[MCCTRL_PER_THREAD_DATA_HASH_SIZE];
@ -314,6 +326,7 @@ struct mcctrl_part_exec {
struct mutex lock;
int nr_processes;
int nr_processes_left;
int process_rank;
cpumask_t cpus_used;
struct list_head pli_list;
};
@ -342,6 +355,8 @@ struct mcctrl_usrdata {
wait_queue_head_t wq_procfs;
struct list_head per_proc_data_hash[MCCTRL_PER_PROC_DATA_HASH_SIZE];
rwlock_t per_proc_data_hash_lock[MCCTRL_PER_PROC_DATA_HASH_SIZE];
struct list_head wakeup_descs_list;
spinlock_t wakeup_descs_lock;
void **keys;
struct sysfsm_data sysfsm_data;
@ -367,12 +382,60 @@ int mcctrl_ikc_send(ihk_os_t os, int cpu, struct ikc_scd_packet *pisp);
int mcctrl_ikc_send_msg(ihk_os_t os, int cpu, int msg, int ref, unsigned long arg);
int mcctrl_ikc_is_valid_thread(ihk_os_t os, int cpu);
struct mcctrl_wakeup_desc {
int status;
int err;
wait_queue_head_t wq;
struct list_head chain;
int free_addrs_count;
void *free_addrs[];
};
/* ikc query-and-wait helper
*
* Arguments:
* - os, cpu and pisp as per mcctrl_ikc_send()
* - timeout: time to wait for reply in ms
* - desc: if set, memory area to be used for desc.
* Care must be taken to leave room for the variable-length array.
* - do_free: returns a bool that specifies whether the caller should free
* its memory on error (e.g. if ikc_send failed in the first place,
* the reply has no chance of coming and memory should be freed).
* Always true on success.
* - free_addrs_count & ...: addresses to kmalloc'd pointers that
* are referenced in the message and must be left intact if we
* abort due to timeout/signal.
*/
int mcctrl_ikc_send_wait(ihk_os_t os, int cpu, struct ikc_scd_packet *pisp,
long int timeout, struct mcctrl_wakeup_desc *desc,
int *do_frees, int free_addrs_count, ...);
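Note on the do_free contract described in the comment above: the callers in this diff set their request pointer to NULL when do_free comes back false, so that the shared exit path does not free memory a late reply might still touch. A userspace simulation of that ownership rule is sketched below. `send_wait_sim()`, `parked` and `late_cleanup()` are hypothetical and only mimic the three outcomes (send failure, reply, timeout); they are not the real mcctrl_ikc_send_wait().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

enum outcome { SEND_FAILED, REPLIED, TIMED_OUT };

/* Buffer kept alive after a timeout; stands in for the desc's free_addrs[]. */
static void *parked;

/* Simulated helper: reports through *do_free who owns buf afterwards. */
static int send_wait_sim(enum outcome o, void *buf, int *do_free)
{
	assert(do_free != NULL);
	switch (o) {
	case SEND_FAILED:
		*do_free = 1; /* no reply can ever arrive */
		return -EINVAL;
	case TIMED_OUT:
		parked = buf; /* a late reply may still write to buf */
		*do_free = 0;
		return -ETIME;
	default:
		*do_free = 1;
		return 0;
	}
}

/* Simulated deferred cleanup, e.g. when the reply finally arrives. */
static void late_cleanup(void)
{
	free(parked);
	parked = NULL;
}

/* Caller pattern from the diff: drop the pointer when told not to free. */
static int request(enum outcome o, int *caller_freed)
{
	void *r = malloc(64);
	int do_free = 1;
	int ret;

	*caller_freed = 0;
	if (!r)
		return -ENOMEM;
	ret = send_wait_sim(o, r, &do_free);
	if (!do_free)
		r = NULL;
	if (r) {
		free(r);
		*caller_freed = 1;
	}
	return ret;
}
```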
ihk_os_t osnum_to_os(int n);
/* look up symbols, plus arch-specific ones */
extern int (*mcctrl_sys_mount)(char *dev_name, char *dir_name, char *type,
unsigned long flags, void *data);
extern int (*mcctrl_sys_umount)(char *dir_name, int flags);
extern int (*mcctrl_sys_unshare)(unsigned long unshare_flags);
extern long (*mcctrl_sched_setaffinity)(pid_t pid,
const struct cpumask *in_mask);
extern int (*mcctrl_sched_setscheduler_nocheck)(struct task_struct *p,
int policy,
const struct sched_param *param);
extern ssize_t (*mcctrl_sys_readlink)(const char *path, char *buf,
size_t bufsiz);
extern void (*mcctrl_zap_page_range)(struct vm_area_struct *vma,
unsigned long start,
unsigned long size,
struct zap_details *details);
extern struct inode_operations *mcctrl_hugetlbfs_inode_operations;
/* syscall.c */
void pager_add_process(void);
void pager_remove_process(struct mcctrl_per_proc_data *ppd);
void pager_cleanup(void);
int __do_in_kernel_irq_syscall(ihk_os_t os, struct ikc_scd_packet *packet);
int __do_in_kernel_syscall(ihk_os_t os, struct ikc_scd_packet *packet);
int mcctrl_add_per_proc_data(struct mcctrl_usrdata *ud, int pid,
struct mcctrl_per_proc_data *ppd);
@ -381,20 +444,18 @@ struct mcctrl_per_proc_data *mcctrl_get_per_proc_data(
struct mcctrl_usrdata *ud, int pid);
void mcctrl_put_per_proc_data(struct mcctrl_per_proc_data *ppd);
int mcctrl_add_per_thread_data(struct mcctrl_per_proc_data* ppd,
struct task_struct *task, void *data);
int mcctrl_delete_per_thread_data(struct mcctrl_per_proc_data* ppd,
struct task_struct *task);
int mcctrl_add_per_thread_data(struct mcctrl_per_proc_data *ppd, void *data);
void mcctrl_put_per_thread_data_unsafe(struct mcctrl_per_thread_data *ptd);
void mcctrl_put_per_thread_data(struct mcctrl_per_thread_data* ptd);
#ifdef POSTK_DEBUG_ARCH_DEP_56 /* Strange how to use inline declaration fix. */
static inline struct mcctrl_per_thread_data *mcctrl_get_per_thread_data(
struct mcctrl_per_proc_data *ppd, struct task_struct *task)
inline struct mcctrl_per_thread_data *mcctrl_get_per_thread_data(struct mcctrl_per_proc_data *ppd, struct task_struct *task)
{
struct mcctrl_per_thread_data *ptd_iter, *ptd = NULL;
int hash = (((uint64_t)task >> 4) & MCCTRL_PER_THREAD_DATA_HASH_MASK);
unsigned long flags;
/* Check if data for this thread exists and return it */
read_lock_irqsave(&ppd->per_thread_data_hash_lock[hash], flags);
/* Check if data for this thread exists */
write_lock_irqsave(&ppd->per_thread_data_hash_lock[hash], flags);
list_for_each_entry(ptd_iter, &ppd->per_thread_data_hash[hash], hash) {
if (ptd_iter->task == task) {
@ -403,16 +464,27 @@ static inline struct mcctrl_per_thread_data *mcctrl_get_per_thread_data(
}
}
read_unlock_irqrestore(&ppd->per_thread_data_hash_lock[hash], flags);
return ptd ? ptd->data : NULL;
if (ptd) {
if (atomic_read(&ptd->refcount) <= 0) {
printk("%s: ERROR: use-after-free detected (%d)", __FUNCTION__, atomic_read(&ptd->refcount));
ptd = NULL;
goto out;
}
atomic_inc(&ptd->refcount);
}
out:
write_unlock_irqrestore(&ppd->per_thread_data_hash_lock[hash], flags);
return ptd;
}
#else /* POSTK_DEBUG_ARCH_DEP_56 */
inline struct mcctrl_per_thread_data *mcctrl_get_per_thread_data(
struct mcctrl_per_proc_data *ppd, struct task_struct *task);
inline struct mcctrl_per_thread_data *mcctrl_get_per_thread_data(struct mcctrl_per_proc_data *ppd, struct task_struct *task);
#endif /* POSTK_DEBUG_ARCH_DEP_56 */
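Note on the new lookup above: it now returns the per-thread record itself with its reference count raised, and refuses records whose count has already dropped to zero. A single-threaded sketch of that get/put pairing with C11 atomics follows. `struct ptd`, `ptd_get()` and `ptd_put()` are hypothetical names, and the lock the kernel code holds around the check and the increment is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-thread record; only the fields the sketch needs. */
struct ptd {
	int key;
	atomic_int refcount;
};

/*
 * Look up key and take a reference. The kernel code performs the check and
 * the increment under a write lock; this sketch omits the lock.
 */
static struct ptd *ptd_get(struct ptd *table, size_t n, int key)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (table[i].key != key)
			continue;
		if (atomic_load(&table[i].refcount) <= 0)
			return NULL; /* record is already being torn down */
		atomic_fetch_add(&table[i].refcount, 1);
		return &table[i];
	}
	return NULL;
}

/* Drop a reference; returns the count that remains. */
static int ptd_put(struct ptd *p)
{
	assert(p != NULL);
	return atomic_fetch_sub(&p->refcount, 1) - 1;
}
```

Every successful `ptd_get()` must be balanced by one `ptd_put()`, which is the role mcctrl_put_per_thread_data() appears to play in the prototypes above.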
int mcctrl_clear_pte_range(uintptr_t start, uintptr_t len);
void __return_syscall(ihk_os_t os, struct ikc_scd_packet *packet,
long ret, int stid);
int clear_pte_range(uintptr_t start, uintptr_t len);
int mcctrl_os_alive(void);
@ -424,7 +496,6 @@ struct procfs_read {
int count; /* bytes to read (request) */
int eof; /* if eof is detected, 1 otherwise 0. (answer)*/
int ret; /* read bytes (answer) */
int status; /* non-zero if done (answer) */
int newcpu; /* migrated new cpu (answer) */
int readwrite; /* 0:read, 1:write */
char fname[PROCFS_NAME_MAX]; /* procfs filename (request) */
@ -437,7 +508,8 @@ struct procfs_file {
};
void procfs_answer(struct mcctrl_usrdata *ud, int pid);
int procfsm_packet_handler(void *os, int msg, int pid, unsigned long arg);
int procfsm_packet_handler(void *os, int msg, int pid, unsigned long arg,
unsigned long resp_pa);
void add_tid_entry(int osnum, int pid, int tid);
void add_pid_entry(int osnum, int pid);
void delete_tid_entry(int osnum, int pid, int tid);
@ -473,7 +545,9 @@ struct vdso {
int reserve_user_space(struct mcctrl_usrdata *usrdata, unsigned long *startp,
unsigned long *endp);
int release_user_space(uintptr_t start, uintptr_t len);
void get_vdso_info(ihk_os_t os, long vdso_pa);
int arch_symbols_init(void);
struct get_cpu_mapping_req {
int busy; /* INOUT: */


@ -103,33 +103,6 @@ getpath(struct procfs_list_entry *e, char *buf, int bufsize)
}
}
/**
* \brief Process SCD_MSG_PROCFS_ANSWER message.
*
* \param ud mcctrl_usrdata pointer
* \param pid PID of the requesting process
*/
void procfs_answer(struct mcctrl_usrdata *ud, int pid)
{
struct mcctrl_per_proc_data *ppd = NULL;
if (pid > 0) {
ppd = mcctrl_get_per_proc_data(ud, pid);
if (unlikely(!ppd)) {
kprintf("%s: ERROR: no per-process structure for PID %d\n",
__FUNCTION__, pid);
return;
}
}
wake_up_all(pid > 0 ? &ppd->wq_procfs : &ud->wq_procfs);
if (pid > 0) {
mcctrl_put_per_proc_data(ppd);
}
}
static struct procfs_list_entry *
find_procfs_entry(struct procfs_list_entry *parent, const char *name)
{
@ -321,6 +294,8 @@ get_base_entry(int osnum)
if(!e){
e = add_procfs_entry(NULL, name, S_IFDIR | 0555,
uid, gid, NULL);
if (!e)
return NULL;
e->osnum = osnum;
}
return e;
@ -456,6 +431,8 @@ proc_exe_link(int osnum, int pid, const char *path)
e = add_procfs_entry(parent, "exe", S_IFLNK | 0777, uid, gid,
path);
if (!e)
goto out;
e->data = kmalloc(strlen(path) + 1, GFP_KERNEL);
strcpy(e->data, path);
task = find_procfs_entry(parent, "task");
@ -464,6 +441,7 @@ proc_exe_link(int osnum, int pid, const char *path)
uid, gid, path);
}
}
out:
up(&procfs_file_list_lock);
}
@ -509,7 +487,6 @@ procfs_exit(int osnum)
* This function conforms to the 2) way of fs/proc/generic.c
* from linux-2.6.39.4.
*/
#ifdef POSTK_DEBUG_TEMP_FIX_43 /* Fixed an issue that failed pread / pwrite of size larger than 4MB */
static ssize_t __mckernel_procfs_read_write(
struct file *file,
char __user *buf, size_t nbytes,
@ -520,7 +497,7 @@ static ssize_t __mckernel_procfs_read_write(
int order = 0;
volatile struct procfs_read *r = NULL;
struct ikc_scd_packet isp;
int ret, osnum, pid, retw;
int ret, osnum, pid;
unsigned long pbuf;
size_t count = nbytes;
size_t copy_size = 0;
@ -615,11 +592,11 @@ static ssize_t __mckernel_procfs_read_write(
while (count > 0) {
int this_len = min_t(ssize_t, count, copy_size);
int do_free;
r->pbuf = pbuf;
r->eof = 0;
r->ret = -EIO; /* default */
r->status = 0;
r->offset = offset;
r->count = this_len;
r->readwrite = read_write;
@ -629,50 +606,26 @@ static ssize_t __mckernel_procfs_read_write(
isp.arg = virt_to_phys(r);
isp.pid = pid;
ret = mcctrl_ikc_send(osnum_to_os(e->osnum),
(pid > 0) ? ppd->ikc_target_cpu : 0, &isp);
ret = mcctrl_ikc_send_wait(osnum_to_os(e->osnum),
(pid > 0) ? ppd->ikc_target_cpu : 0,
&isp, HZ, NULL, &do_free, 1, r);
if (!do_free && ret >= 0) {
ret = -EIO;
}
if (ret < 0) {
goto out; /* error */
}
/* Wait for a reply. */
ret = -EIO; /* default exit code */
dprintk("%s: waiting for reply\n", __FUNCTION__);
retry_wait:
/* Wait for the status field of the procfs_read structure,
* wait on per-process or OS specific data depending on
* who the request is for.
*/
if (pid > 0) {
retw = wait_event_interruptible_timeout(ppd->wq_procfs,
r->status != 0, HZ);
}
else {
retw = wait_event_interruptible_timeout(udp->wq_procfs,
r->status != 0, HZ);
}
/* Timeout? */
if (retw == 0 && r->status == 0) {
printk("%s: error: timeout (1 sec)\n", __FUNCTION__);
if (ret == -ETIME) {
pr_info("%s: error: timeout (1 sec)\n",
__func__);
}
else if (ret == -ERESTARTSYS) {
ret = -ERESTART;
}
if (!do_free)
r = NULL;
goto out;
}
/* Interrupted? */
else if (retw == -ERESTARTSYS) {
ret = -ERESTART;
goto out;
}
/* Were we woken up by a reply to another procfs request? */
else if (r->status == 0) {
/* TODO: r->status is not set atomically, we could be woken
* up with status == 0 and it could change to 1 while in this
* code, we could potentially miss the wake_up()...
*/
printk("%s: stale wake-up, retrying\n", __FUNCTION__);
goto retry_wait;
}
/* Wake up and check the result. */
dprintk("%s: woke up. ret: %d, eof: %d\n",
@ -717,193 +670,6 @@ out:
return ret;
}
#else /* POSTK_DEBUG_TEMP_FIX_43 */
static ssize_t __mckernel_procfs_read_write(
struct file *file,
char __user *buf, size_t nbytes,
loff_t *ppos, int read_write)
{
struct inode * inode = file->f_inode;
char *kern_buffer = NULL;
int order = 0;
volatile struct procfs_read *r = NULL;
struct ikc_scd_packet isp;
int ret, osnum, pid, retw;
unsigned long pbuf;
unsigned long count = nbytes;
#if LINUX_VERSION_CODE < KERNEL_VERSION(3,10,0)
struct proc_dir_entry *dp = PDE(inode);
struct procfs_list_entry *e = dp->data;
#else
struct procfs_list_entry *e = PDE_DATA(inode);
#endif
loff_t offset = *ppos;
char pathbuf[PROCFS_NAME_MAX];
char *path, *p;
ihk_os_t os = NULL;
struct mcctrl_usrdata *udp = NULL;
struct mcctrl_per_proc_data *ppd = NULL;
if (count <= 0 || offset < 0) {
return 0;
}
path = getpath(e, pathbuf, PROCFS_NAME_MAX);
dprintk("%s: invoked for %s, offset: %lu, count: %lu\n",
__FUNCTION__, path,
(unsigned long)offset, count);
/* Verify OS number */
ret = sscanf(path, "mcos%d/", &osnum);
if (ret != 1) {
printk("%s: error: couldn't determine OS number\n", __FUNCTION__);
return -EINVAL;
}
if (osnum != e->osnum) {
printk("%s: error: OS numbers don't match\n", __FUNCTION__);
return -EINVAL;
}
/* Is this request for a specific process? */
p = strchr(path, '/') + 1;
ret = sscanf(p, "%d/", &pid);
if (ret != 1) {
pid = -1;
}
os = osnum_to_os(osnum);
if (!os) {
printk("%s: error: no IHK OS data found for OS %d\n",
__FUNCTION__, osnum);
return -EINVAL;
}
udp = ihk_host_os_get_usrdata(os);
if (!udp) {
printk("%s: error: no MCCTRL data found for OS %d\n",
__FUNCTION__, osnum);
return -EINVAL;
}
if (pid > 0) {
ppd = mcctrl_get_per_proc_data(udp, pid);
if (unlikely(!ppd)) {
printk("%s: error: no per-process structure for PID %d",
__FUNCTION__, pid);
return -EINVAL;
}
}
while ((1 << order) < count) ++order;
if (order > 12) {
order -= 12;
}
else {
order = 1;
}
/* NOTE: we need physically contiguous memory to pass through IKC */
kern_buffer = (char *)__get_free_pages(GFP_KERNEL, order);
if (!kern_buffer) {
printk("%s: ERROR: allocating kernel buffer\n", __FUNCTION__);
ret = -ENOMEM;
goto out;
}
pbuf = virt_to_phys(kern_buffer);
r = kmalloc(sizeof(struct procfs_read), GFP_KERNEL);
if (r == NULL) {
ret = -ENOMEM;
goto out;
}
r->pbuf = pbuf;
r->eof = 0;
r->ret = -EIO; /* default */
r->status = 0;
r->offset = offset;
r->count = count;
r->readwrite = read_write;
strncpy((char *)r->fname, path, PROCFS_NAME_MAX);
isp.msg = SCD_MSG_PROCFS_REQUEST;
isp.ref = 0;
isp.arg = virt_to_phys(r);
isp.pid = pid;
ret = mcctrl_ikc_send(osnum_to_os(e->osnum),
(pid > 0) ? ppd->ikc_target_cpu : 0, &isp);
if (ret < 0) {
goto out; /* error */
}
/* Wait for a reply. */
ret = -EIO; /* default exit code */
dprintk("%s: waiting for reply\n", __FUNCTION__);
retry_wait:
/* Wait for the status field of the procfs_read structure,
* wait on per-process or OS specific data depending on
* who the request is for.
*/
if (pid > 0) {
retw = wait_event_interruptible_timeout(ppd->wq_procfs,
r->status != 0, 5 * HZ);
}
else {
retw = wait_event_interruptible_timeout(udp->wq_procfs,
r->status != 0, 5 * HZ);
}
/* Timeout? */
if (retw == 0 && r->status == 0) {
printk("%s: error: timeout (1 sec)\n", __FUNCTION__);
goto out;
}
/* Interrupted? */
else if (retw == -ERESTARTSYS) {
ret = -ERESTART;
goto out;
}
/* Were we woken up by a reply to another procfs request? */
else if (r->status == 0) {
/* TODO: r->status is not set atomically, we could be woken
* up with status == 0 and it could change to 1 while in this
* code, we could potentially miss the wake_up()...
*/
printk("%s: stale wake-up, retrying\n", __FUNCTION__);
goto retry_wait;
}
/* Wake up and check the result. */
dprintk("%s: woke up. ret: %d, eof: %d\n",
__FUNCTION__, r->ret, r->eof);
if (r->ret > 0) {
if (read_write == 0) {
if (copy_to_user(buf, kern_buffer, r->ret)) {
printk("%s: ERROR: copy_to_user failed.\n", __FUNCTION__);
ret = -EFAULT;
goto out;
}
}
*ppos += r->ret;
}
ret = r->ret;
out:
if (ppd)
mcctrl_put_per_proc_data(ppd);
if (kern_buffer)
free_pages((uintptr_t)kern_buffer, order);
if (r)
kfree((void *)r);
return ret;
}
#endif /* POSTK_DEBUG_TEMP_FIX_43 */
static ssize_t mckernel_procfs_read(struct file *file,
char __user *buf, size_t nbytes, loff_t *ppos)
@ -939,33 +705,48 @@ struct procfs_work {
int msg;
int pid;
unsigned long arg;
unsigned long resp_pa;
struct work_struct work;
};
static void procfsm_work_main(struct work_struct *work0)
{
struct procfs_work *work = container_of(work0, struct procfs_work, work);
unsigned long phys;
int *done;
switch (work->msg) {
case SCD_MSG_PROCFS_TID_CREATE:
add_tid_entry(ihk_host_os_get_index(work->os), work->pid, work->arg);
break;
case SCD_MSG_PROCFS_TID_CREATE:
add_tid_entry(ihk_host_os_get_index(work->os),
work->pid, work->arg);
phys = ihk_device_map_memory(ihk_os_to_dev(work->os),
work->resp_pa, sizeof(int));
done = ihk_device_map_virtual(ihk_os_to_dev(work->os),
phys, sizeof(int), NULL, 0);
*done = 1;
ihk_device_unmap_virtual(ihk_os_to_dev(work->os),
done, sizeof(int));
ihk_device_unmap_memory(ihk_os_to_dev(work->os),
phys, sizeof(int));
break;
case SCD_MSG_PROCFS_TID_DELETE:
delete_tid_entry(ihk_host_os_get_index(work->os), work->pid, work->arg);
break;
case SCD_MSG_PROCFS_TID_DELETE:
delete_tid_entry(ihk_host_os_get_index(work->os),
work->pid, work->arg);
break;
default:
printk("%s: unknown work: msg: %d, pid: %d, arg: %lu)\n",
__FUNCTION__, work->msg, work->pid, work->arg);
break;
default:
pr_warn("%s: unknown work: msg: %d, pid: %d, arg: %lu)\n",
__func__, work->msg, work->pid, work->arg);
break;
}
kfree(work);
return;
}
int procfsm_packet_handler(void *os, int msg, int pid, unsigned long arg)
int procfsm_packet_handler(void *os, int msg, int pid, unsigned long arg,
unsigned long resp_pa)
{
struct procfs_work *work = NULL;
@ -979,6 +760,7 @@ int procfsm_packet_handler(void *os, int msg, int pid, unsigned long arg)
work->msg = msg;
work->pid = pid;
work->arg = arg;
work->resp_pa = resp_pa;
INIT_WORK(&work->work, &procfsm_work_main);
schedule_work(&work->work);
@ -997,6 +779,303 @@ static const struct file_operations mckernel_forward = {
.write = mckernel_procfs_write,
};
#define PA_NULL (-1L)
struct mckernel_procfs_buffer_info {
unsigned long top_pa;
unsigned long cur_pa;
ihk_os_t os;
int pid;
char path[0];
};
struct mckernel_procfs_buffer {
unsigned long next_pa;
unsigned long pos;
unsigned long size;
char buf[0];
};
static int mckernel_procfs_buff_open(struct inode *inode, struct file *file)
{
struct mckernel_procfs_buffer_info *info;
int pid;
int ret;
char *path;
char *path_buf;
char *p;
ihk_os_t os;
#if LINUX_VERSION_CODE < KERNEL_VERSION(3, 10, 0)
struct proc_dir_entry *dp = PDE(inode);
struct procfs_list_entry *e = dp->data;
#else
struct procfs_list_entry *e = PDE_DATA(inode);
#endif
os = osnum_to_os(e->osnum);
if (!os) {
return -EINVAL;
}
path_buf = kmalloc(PROCFS_NAME_MAX, GFP_KERNEL);
if (!path_buf) {
return -ENOMEM;
}
path = getpath(e, path_buf, PROCFS_NAME_MAX);
p = strchr(path, '/') + 1;
ret = sscanf(p, "%d/", &pid);
if (ret != 1) {
pid = -1;
}
info = kmalloc(sizeof(struct mckernel_procfs_buffer_info) +
strlen(path) + 1, GFP_KERNEL);
if (!info) {
kfree(path_buf);
return -ENOMEM;
}
info->top_pa = PA_NULL;
info->cur_pa = PA_NULL;
info->os = os;
info->pid = pid;
strcpy(info->path, path);
file->private_data = info;
kfree(path_buf);
return 0;
}
static int mckernel_procfs_buff_release(struct inode *inode, struct file *file)
{
struct mckernel_procfs_buffer_info *info = file->private_data;
int rc = 0;
if (!info) {
return -EIO;
}
file->private_data = NULL;
if (info->top_pa != PA_NULL) {
int ret;
struct procfs_read *r = NULL;
struct ikc_scd_packet isp;
int do_free;
r = kmalloc(sizeof(struct procfs_read), GFP_KERNEL);
if (r == NULL) {
rc = -ENOMEM;
goto out;
}
memset(r, '\0', sizeof(struct procfs_read));
r->pbuf = info->top_pa;
r->ret = -EIO; /* default */
r->fname[0] = '\0';
isp.msg = SCD_MSG_PROCFS_RELEASE;
isp.ref = 0;
isp.arg = virt_to_phys(r);
isp.pid = 0;
rc = -EIO;
ret = mcctrl_ikc_send_wait(info->os, 0,
&isp, 5 * HZ, NULL, &do_free, 1, r);
if (!do_free && ret >= 0) {
ret = -EIO;
}
if (ret < 0) {
rc = ret;
if (ret == -ETIME) {
pr_info("%s: error: timeout (5 sec)\n",
__func__);
}
else if (ret == -ERESTARTSYS) {
rc = -ERESTART;
}
if (!do_free)
r = NULL;
goto out;
}
if (r->ret < 0) {
rc = r->ret;
goto out;
}
rc = 0;
out:
if (r)
kfree((void *)r);
}
kfree(info);
return rc;
}
static ssize_t mckernel_procfs_buff_read(struct file *file, char __user *ubuf,
size_t nbytes, loff_t *ppos)
{
struct mckernel_procfs_buffer_info *info = file->private_data;
unsigned long phys;
struct mckernel_procfs_buffer *buf;
int pos = *ppos;
ssize_t l = 0;
int done = 0;
ihk_os_t os;
if (nbytes <= 0 || *ppos < 0) {
return 0;
}
if (!info) {
return -EIO;
}
os = info->os;
if (info->top_pa == PA_NULL) {
int ret;
int pid = info->pid;
struct procfs_read *r = NULL;
struct ikc_scd_packet isp;
struct mcctrl_usrdata *udp = NULL;
struct mcctrl_per_proc_data *ppd = NULL;
int do_free;
udp = ihk_host_os_get_usrdata(os);
if (!udp) {
pr_err("%s: no MCCTRL data found for OS\n",
__func__);
return -EINVAL;
}
if (pid > 0) {
ppd = mcctrl_get_per_proc_data(udp, pid);
if (unlikely(!ppd)) {
pr_err("%s: no per-process structure for PID %d",
__func__, pid);
return -EINVAL;
}
}
r = kmalloc(sizeof(struct procfs_read), GFP_KERNEL);
if (r == NULL) {
l = -ENOMEM;
done = 1;
goto out;
}
memset(r, '\0', sizeof(struct procfs_read));
r->pbuf = PA_NULL;
r->ret = -EIO; /* default */
strncpy((char *)r->fname, info->path, PROCFS_NAME_MAX);
isp.msg = SCD_MSG_PROCFS_REQUEST;
isp.ref = 0;
isp.arg = virt_to_phys(r);
isp.pid = pid;
l = -EIO;
done = 1;
ret = mcctrl_ikc_send_wait(os,
(pid > 0) ? ppd->ikc_target_cpu : 0,
&isp, 5 * HZ, NULL, &do_free, 1, r);
if (!do_free && ret >= 0) {
ret = -EIO;
}
if (ret < 0) {
l = ret;
if (ret == -ETIME) {
pr_info("%s: error: timeout (5 sec)\n",
__func__);
}
else if (ret == -ERESTARTSYS) {
l = -ERESTART;
}
if (!do_free)
r = NULL;
goto out;
}
if (r->ret < 0) {
l = r->ret;
goto out;
}
done = 0;
l = 0;
info->top_pa = info->cur_pa = r->pbuf;
out:
if (ppd)
mcctrl_put_per_proc_data(ppd);
if (r)
kfree((void *)r);
}
if (info->cur_pa == PA_NULL) {
info->cur_pa = info->top_pa;
}
while (!done && info->cur_pa != PA_NULL) {
long bpos;
long bsize;
phys = ihk_device_map_memory(ihk_os_to_dev(os), info->cur_pa,
PAGE_SIZE);
#ifdef CONFIG_MIC
buf = ioremap_wc(phys, PAGE_SIZE);
#else
buf = ihk_device_map_virtual(ihk_os_to_dev(os), phys,
PAGE_SIZE, NULL, 0);
#endif
if (pos < buf->pos) {
info->cur_pa = info->top_pa;
goto rep;
}
if (pos >= buf->pos + buf->size) {
info->cur_pa = buf->next_pa;
goto rep;
}
bpos = pos - buf->pos;
bsize = (buf->pos + buf->size) - pos;
if (bsize > (nbytes - l)) {
bsize = nbytes - l;
}
if (copy_to_user(ubuf, buf->buf + bpos, bsize)) {
done = 1;
pos = *ppos;
l = -EFAULT;
}
else {
ubuf += bsize;
pos += bsize;
l += bsize;
if (l == nbytes) {
done = 1;
}
}
rep:
#ifdef CONFIG_MIC
iounmap(buf);
#else
ihk_device_unmap_virtual(ihk_os_to_dev(os), buf, PAGE_SIZE);
#endif
ihk_device_unmap_memory(ihk_os_to_dev(os), phys, PAGE_SIZE);
};
*ppos = pos;
return l;
}
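Note on mckernel_procfs_buff_read() above: it walks a chain of buffers, each tagged with the file offset it starts at (`pos`) and the number of valid bytes (`size`), and copies out the part that overlaps the requested range. The sketch below reproduces that walk in userspace under simplifying assumptions: ordinary pointers replace the physical addresses and mappings, memcpy() replaces copy_to_user(), and the chain is assumed sorted by `pos`. `struct chunk` and `chain_read()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory chunk; the kernel version links by physical address. */
struct chunk {
	const struct chunk *next;
	unsigned long pos;  /* file offset of buf[0] */
	unsigned long size; /* valid bytes in buf */
	const char *buf;
};

/* Copy up to nbytes starting at file offset pos; returns bytes copied. */
static long chain_read(const struct chunk *top, unsigned long pos,
		       char *out, size_t nbytes)
{
	const struct chunk *cur = top;
	size_t done = 0;

	assert(out != NULL);
	while (cur && done < nbytes) {
		unsigned long end = cur->pos + cur->size;
		size_t n;

		if (pos >= end) { /* requested offset lies past this chunk */
			cur = cur->next;
			continue;
		}
		if (pos < cur->pos) /* gap in the chain: stop here */
			break;
		n = end - pos;
		if (n > nbytes - done)
			n = nbytes - done;
		memcpy(out + done, cur->buf + (pos - cur->pos), n);
		done += n;
		pos += n;
	}
	return (long)done;
}
```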
static const struct file_operations mckernel_buff_io = {
.llseek = mckernel_procfs_lseek,
.read = mckernel_procfs_buff_read,
.write = NULL,
.open = mckernel_procfs_buff_open,
.release = mckernel_procfs_buff_release,
};
static const struct procfs_entry tid_entry_stuff[] = {
// PROC_REG("auxv", S_IRUSR, NULL),
// PROC_REG("clear_refs", S_IWUSR, NULL),
@ -1006,10 +1085,10 @@ static const struct procfs_entry tid_entry_stuff[] = {
// PROC_LNK("exe", mckernel_readlink),
// PROC_REG("limits", S_IRUSR|S_IWUSR, NULL),
// PROC_REG("maps", S_IRUGO, NULL),
PROC_REG("mem", S_IRUSR|S_IWUSR, NULL),
PROC_REG("mem", 0600, NULL),
// PROC_REG("pagemap", S_IRUGO, NULL),
// PROC_REG("smaps", S_IRUGO, NULL),
PROC_REG("stat", S_IRUGO, NULL),
PROC_REG("stat", 0444, &mckernel_buff_io),
// PROC_REG("statm", S_IRUGO, NULL),
// PROC_REG("status", S_IRUGO, NULL),
// PROC_REG("syscall", S_IRUGO, NULL),
@ -1018,26 +1097,26 @@ static const struct procfs_entry tid_entry_stuff[] = {
};
static const struct procfs_entry pid_entry_stuff[] = {
PROC_REG("auxv", S_IRUSR, NULL),
PROC_REG("auxv", 0400, &mckernel_buff_io),
/* Support the case where McKernel process retrieves its job-id under the Fujitsu TCS suite. */
// PROC_REG("cgroup", S_IXUSR, NULL),
// PROC_REG("clear_refs", S_IWUSR, NULL),
PROC_REG("cmdline", S_IRUGO, NULL),
// PROC_REG("comm", S_IRUGO|S_IWUSR, NULL),
PROC_REG("cmdline", 0444, &mckernel_buff_io),
PROC_REG("comm", 0644, &mckernel_buff_io),
// PROC_REG("coredump_filter", S_IRUGO|S_IWUSR, NULL),
PROC_REG("cpuset", S_IXUSR, NULL),
// PROC_REG("cpuset", S_IRUGO, NULL),
// PROC_REG("environ", S_IRUSR, NULL),
// PROC_LNK("exe", mckernel_readlink),
// PROC_REG("limits", S_IRUSR|S_IWUSR, NULL),
PROC_REG("maps", S_IRUGO, NULL),
PROC_REG("mem", S_IRUSR|S_IWUSR, NULL),
PROC_REG("pagemap", S_IRUGO, NULL),
PROC_REG("smaps", S_IRUGO, NULL),
// PROC_REG("stat", S_IRUGO, NULL),
PROC_REG("maps", 0444, &mckernel_buff_io),
PROC_REG("mem", 0400, NULL),
PROC_REG("pagemap", 0444, NULL),
// PROC_REG("smaps", S_IRUGO, NULL),
// PROC_REG("stat", 0444, &mckernel_buff_io),
// PROC_REG("statm", S_IRUGO, NULL),
PROC_REG("status", S_IRUGO, NULL),
PROC_REG("status", 0444, &mckernel_buff_io),
// PROC_REG("syscall", S_IRUGO, NULL),
PROC_DIR("task", S_IRUGO|S_IXUGO),
PROC_DIR("task", 0555),
// PROC_REG("wchan", S_IRUGO, NULL),
PROC_TERM
};
@ -1045,14 +1124,14 @@ static const struct procfs_entry pid_entry_stuff[] = {
static const struct procfs_entry base_entry_stuff[] = {
// PROC_REG("cmdline", S_IRUGO, NULL),
#ifdef POSTK_DEBUG_ARCH_DEP_42 /* /proc/cpuinfo support added. */
PROC_REG("cpuinfo", S_IRUGO, NULL),
PROC_REG("cpuinfo", 0444, &mckernel_buff_io),
#else /* POSTK_DEBUG_ARCH_DEP_42 */
// PROC_REG("cpuinfo", S_IRUGO, NULL),
#endif /* POSTK_DEBUG_ARCH_DEP_42 */
// PROC_REG("meminfo", S_IRUGO, NULL),
// PROC_REG("pagetypeinfo",S_IRUGO, NULL),
// PROC_REG("softirq", S_IRUGO, NULL),
PROC_REG("stat", S_IRUGO, NULL),
PROC_REG("stat", 0444, &mckernel_buff_io),
// PROC_REG("uptime", S_IRUGO, NULL),
// PROC_REG("version", S_IRUGO, NULL),
// PROC_REG("vmallocinfo",S_IRUSR, NULL),

File diff suppressed because it is too large


@ -790,6 +790,7 @@ out:
return error;
} /* setup_node_files() */
#ifdef SETUP_PCI_FILES
static int read_file(void *buf, size_t size, char *fmt, va_list ap)
{
int error;
@ -798,7 +799,6 @@ static int read_file(void *buf, size_t size, char *fmt, va_list ap)
int n;
struct file *fp = NULL;
loff_t off;
mm_segment_t ofs;
ssize_t ss;
dprintk("read_file(%p,%ld,%s,%p)\n", buf, size, fmt, ap);
@ -824,13 +824,14 @@ static int read_file(void *buf, size_t size, char *fmt, va_list ap)
}
off = 0;
ofs = get_fs();
set_fs(KERNEL_DS);
ss = vfs_read(fp, buf, size, &off);
set_fs(ofs);
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 14, 0)
ss = kernel_read(fp, buf, size, &off);
#else
ss = kernel_read(fp, off, buf, size);
#endif
if (ss < 0) {
error = ss;
eprintk("mcctrl:read_file:vfs_read failed. %d\n", error);
eprintk("mcctrl:read_file:kernel_read failed. %d\n", error);
goto out;
}
if (ss >= size) {
@ -892,16 +893,6 @@ out:
return error;
} /* read_long() */
#ifdef MCCTRL_KSYM_sys_readlink
static ssize_t (*mcctrl_sys_readlink)(const char *path, char *buf,
size_t bufsiz)
#if MCCTRL_KSYM_sys_readlink
= (void *)MCCTRL_KSYM_sys_readlink;
#else
= &sys_readlink;
#endif
#endif
static int read_link(char *buf, size_t bufsize, char *fmt, ...)
{
int error;
@ -951,30 +942,14 @@ out:
return error;
} /* read_link() */
#ifdef POSTK_DEBUG_TEMP_FIX_22 /* iterate_dir() deadlock */
static int setup_one_pci(struct mcctrl_usrdata *udp, const char *name)
{
#else /* POSTK_DEBUG_TEMP_FIX_22 */
static int setup_one_pci(void *arg0, const char *name, int namlen,
loff_t offset, u64 ino, unsigned d_type)
{
struct mcctrl_usrdata *udp = arg0;
#endif /* POSTK_DEBUG_TEMP_FIX_22 */
int error;
char *buf = NULL;
long node;
struct sysfsm_bitmap_param param;
#ifdef POSTK_DEBUG_TEMP_FIX_22 /* iterate_dir() deadlock */
dprintk("setup_one_pci(%p,%s)\n", udp, name);
#else /* POSTK_DEBUG_TEMP_FIX_22 */
dprintk("setup_one_pci(%p,%s,%d,%#lx,%#lx,%d)\n",
arg0, name, namlen, (long)offset, (long)ino, d_type);
if (namlen != 12) {
error = 0;
goto out;
}
#endif /* POSTK_DEBUG_TEMP_FIX_22 */
buf = (void *)__get_free_pages(GFP_KERNEL, 0);
if (!buf) {
@ -1026,26 +1001,39 @@ static int setup_one_pci(void *arg0, const char *name, int namlen,
error = 0;
out:
free_pages((long)buf, 0);
#ifdef POSTK_DEBUG_TEMP_FIX_22 /* iterate_dir() deadlock */
dprintk("setup_one_pci(%p,%s): %d\n", udp, name, error);
#else /* POSTK_DEBUG_TEMP_FIX_22 */
dprintk("setup_one_pci(%p,%s,%d,%#lx,%#lx,%d): %d\n",
arg0, name, namlen, (long)offset, (long)ino, d_type,
error);
#endif /* POSTK_DEBUG_TEMP_FIX_22 */
return error;
} /* setup_one_pci() */
#ifdef POSTK_DEBUG_TEMP_FIX_22 /* iterate_dir() deadlock */
LIST_HEAD(pci_file_name_list);
struct pci_file_name {
char *name;
struct list_head chain;
};
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 11, 0) || \
(defined(RHEL_RELEASE_CODE) && RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7, 5))
struct mcctrl_filler_args {
struct dir_context ctx;
void *buf;
};
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 19, 0)
static int pci_file_name_gen(struct dir_context *ctx, const char *name,
int namlen, loff_t offset, u64 ino, unsigned int d_type)
#else
static int pci_file_name_gen(void *ctx, const char *name,
int namlen, loff_t offset, u64 ino, unsigned int d_type)
#endif
{
struct mcctrl_filler_args *args
= container_of(ctx, struct mcctrl_filler_args, ctx);
void *buf = args->buf;
#else
static int pci_file_name_gen(void *buf, const char *name, int namlen,
loff_t offset, u64 ino, unsigned d_type)
{
#endif
struct pci_file_name *p;
int error = -1;
@ -1083,56 +1071,31 @@ out:
buf, name, namlen, (long)offset, (long)ino, d_type, error);
return error;
}
#endif /* POSTK_DEBUG_TEMP_FIX_22 */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,11,0)
typedef int (*mcctrl_filldir_t)(void *buf, const char *name, int namlen,
loff_t offset, u64 ino, unsigned d_type);
struct mcctrl_filler_args {
struct dir_context ctx;
mcctrl_filldir_t filler;
void *buf;
};
static int mcctrl_filler(struct dir_context *ctx, const char *name,
int namlen, loff_t offset, u64 ino, unsigned d_type)
{
struct mcctrl_filler_args *args
= container_of(ctx, struct mcctrl_filler_args, ctx);
return (*args->filler)(args->buf, name, namlen, offset, ino, d_type);
} /* mcctrl_filler() */
static inline int mcctrl_vfs_readdir(struct file *file,
mcctrl_filldir_t filler, void *buf)
static inline int mcctrl_vfs_readdir(struct file *file, filldir_t filler,
void *buf)
{
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 11, 0) || \
(defined(RHEL_RELEASE_CODE) && RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7, 5))
struct mcctrl_filler_args args = {
.ctx.actor = &mcctrl_filler,
.filler = (void *)filler,
.ctx.actor = filler,
.buf = buf,
};
return iterate_dir(file, &args.ctx);
} /* mcctrl_vfs_readdir() */
#else
static inline int mcctrl_vfs_readdir(struct file *file, filldir_t filler,
void *buf)
{
return vfs_readdir(file, filler, buf);
} /* mcctrl_vfs_readdir() */
#endif
} /* mcctrl_vfs_readdir() */
static int setup_pci_files(struct mcctrl_usrdata *udp)
{
int error;
int er;
struct file *fp = NULL;
#ifdef POSTK_DEBUG_TEMP_FIX_22 /* iterate_dir() deadlock */
int ret = 0;
struct pci_file_name *cur;
struct pci_file_name *next;
#endif /* POSTK_DEBUG_TEMP_FIX_22 */
dprintk("setup_pci_files(%p)\n", udp);
fp = filp_open("/sys/bus/pci/devices", O_DIRECTORY, 0);
@ -1142,18 +1105,13 @@ static int setup_pci_files(struct mcctrl_usrdata *udp)
goto out;
}
#ifdef POSTK_DEBUG_TEMP_FIX_22 /* iterate_dir() deadlock */
error = mcctrl_vfs_readdir(fp, &pci_file_name_gen, udp);
#else /* POSTK_DEBUG_TEMP_FIX_22 */
error = mcctrl_vfs_readdir(fp, &setup_one_pci, udp);
#endif /* POSTK_DEBUG_TEMP_FIX_22 */
if (error) {
eprintk("mcctrl:setup_pci_files:"
"mcctrl_vfs_readdir failed. %d\n", error);
goto out;
}
#ifdef POSTK_DEBUG_TEMP_FIX_22 /* iterate_dir() deadlock */
list_for_each_entry_safe(cur, next, &pci_file_name_list, chain) {
if (!ret) {
ret = setup_one_pci(udp, cur->name);
@ -1162,7 +1120,6 @@ static int setup_pci_files(struct mcctrl_usrdata *udp)
kfree(cur->name);
kfree(cur);
}
#endif /* POSTK_DEBUG_TEMP_FIX_22 */
error = 0;
out:
@ -1176,6 +1133,7 @@ out:
dprintk("setup_pci_files(%p): %d\n", udp, error);
return error;
} /* setup_pci_files() */
#endif // SETUP_PCI_FILES
void setup_sysfs_files(ihk_os_t os)
{
@ -1215,7 +1173,9 @@ void setup_sysfs_files(ihk_os_t os)
setup_cpus_sysfs_files(udp);
setup_node_files(udp);
setup_cpus_sysfs_files_node_link(udp);
//setup_pci_files(udp);
#ifdef SETUP_PCI_FILES
setup_pci_files(udp);
#endif
/* Indicate sysfs files setup completion for boot script */
error = sysfsm_mkdirf(os, NULL, "/sys/setup_complete");
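
The `mcctrl_filler_args` hunks above embed a `struct dir_context` in a larger struct, and the callback uses `container_of()` to get from the context pointer back to its private buffer. A minimal user-space sketch of that pattern follows. All `_sketch` names, `filler_args`, `name_count` and `iterate_sketch()` are hypothetical stand-ins, not the kernel API:

```c
#include <stddef.h>
#include <string.h>

/* User-space stand-ins for the kernel types; for illustration only. */
struct dir_context_sketch;
typedef int (*actor_fn)(struct dir_context_sketch *ctx,
			const char *name, int namlen);
struct dir_context_sketch {
	actor_fn actor;
};

/* Same idea as the kernel's container_of(): step back from a member
 * pointer to the start of the enclosing struct. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The context is embedded, so the callback can reach `buf` from `ctx`. */
struct filler_args {
	struct dir_context_sketch ctx;
	void *buf;
};

/* Hypothetical private state accumulated by the callback. */
struct name_count {
	int entries;
	int longest;
};

static int count_names(struct dir_context_sketch *ctx,
		       const char *name, int namlen)
{
	struct filler_args *args =
		container_of_sketch(ctx, struct filler_args, ctx);
	struct name_count *nc = args->buf;

	(void)name;
	nc->entries++;
	if (namlen > nc->longest)
		nc->longest = namlen;
	return 0;
}

/* Stand-in for iterate_dir(): feeds a fixed list of names to the actor. */
static int iterate_sketch(const char *const *names, int n,
			  struct dir_context_sketch *ctx)
{
	int i, err;

	for (i = 0; i < n; i++) {
		err = ctx->actor(ctx, names[i], (int)strlen(names[i]));
		if (err)
			return err;
	}
	return 0;
}
```

A caller would fill in `struct filler_args args = { .ctx.actor = count_names, .buf = &nc };` and pass `&args.ctx` to the iterator, which mirrors how the diff passes `&args.ctx` to `iterate_dir()`.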


@ -21,7 +21,7 @@ endif
endif
ifeq ($(BUILD_MODULE_TMP),rhel)
ifeq ($(BUILD_MODULE),none)
BUILD_MODULE=$(shell if [ ${LINUX_VERSION_CODE} -eq 199168 -a ${RHEL_RELEASE} -ge 327 -a ${RHEL_RELEASE} -le 693 ]; then echo "linux-3.10.0-327.36.1.el7"; else echo "none"; fi)
BUILD_MODULE=$(shell if [ ${LINUX_VERSION_CODE} -eq 199168 -a ${RHEL_RELEASE} -ge 327 -a ${RHEL_RELEASE} -le 862 ]; then echo "linux-3.10.0-327.36.1.el7"; else echo "none"; fi)
endif
ifeq ($(BUILD_MODULE),none)
BUILD_MODULE=$(shell if [ ${LINUX_VERSION_CODE} -ge 262144 -a ${LINUX_VERSION_CODE} -lt 262400 ]; then echo "linux-4.0.9"; else echo "none"; fi)
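
The magic numbers in these `BUILD_MODULE` tests are packed kernel versions. As a worked check, the classic `KERNEL_VERSION(a, b, c)` encoding from `<linux/version.h>` can be reproduced in user space. `KVER` and `is_linux_4_0()` are hypothetical names used only for this sketch:

```c
/* Same arithmetic as the classic KERNEL_VERSION() macro:
 * major in bits 16 and up, minor in bits 8-15, patch in bits 0-7. */
#define KVER(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical helper mirroring the Makefile test
 * "-ge 262144 -a -lt 262400", i.e. any 4.0.x kernel. */
static int is_linux_4_0(unsigned int code)
{
	return code >= KVER(4, 0, 0) && code < KVER(4, 1, 0);
}
```

With this encoding, 3.10.0 is 199168, 4.0.0 is 262144 and 4.1.0 is 262400, which matches the constants in the Makefile.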


@ -15,6 +15,7 @@
#include <linux/rbtree.h>
#include <linux/security.h>
#include <linux/cred.h>
#include <linux/version.h>
#include "overlayfs.h"
struct ovl_cache_entry {
@ -34,10 +35,18 @@ struct ovl_dir_cache {
struct list_head entries;
};
/* vfs_readdir vs. iterate_dir compat */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 11, 0) || \
(defined(RHEL_RELEASE_CODE) && RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7, 5))
#define USE_ITERATE_DIR 1
#endif
#ifndef USE_ITERATE_DIR
struct dir_context {
const filldir_t actor;
//loff_t pos;
};
#endif
struct ovl_readdir_data {
struct dir_context ctx;
@ -256,7 +265,11 @@ static inline int ovl_dir_read(struct path *realpath,
do {
rdd->count = 0;
rdd->err = 0;
#ifdef USE_ITERATE_DIR
err = iterate_dir(realfile, &rdd->ctx);
#else
err = vfs_readdir(realfile, rdd->ctx.actor, rdd);
#endif
if (err >= 0)
err = rdd->err;
} while (!err && rdd->count);
@ -365,6 +378,22 @@ static struct ovl_dir_cache *ovl_cache_get(struct dentry *dentry)
return cache;
}
#ifdef USE_ITERATE_DIR
struct iterate_wrapper {
struct dir_context ctx;
filldir_t actor;
void *buf;
};
static int ovl_wrap_readdir(void *ctx, const char *name, int namelen,
loff_t offset, u64 ino, unsigned int d_type)
{
struct iterate_wrapper *w = ctx;
return w->actor(w->buf, name, namelen, offset, ino, d_type);
}
#endif
static int ovl_readdir(struct file *file, void *buf, filldir_t filler)
{
struct ovl_dir_file *od = file->private_data;
@ -376,7 +405,16 @@ static int ovl_readdir(struct file *file, void *buf, filldir_t filler)
ovl_dir_reset(file);
if (od->is_real) {
#ifdef USE_ITERATE_DIR
struct iterate_wrapper w = {
.ctx.actor = ovl_wrap_readdir,
.actor = filler,
.buf = buf,
};
res = iterate_dir(od->realfile, &w.ctx);
#else
res = vfs_readdir(od->realfile, filler, buf);
#endif
file->f_pos = od->realfile->f_pos;
return res;


@ -11,22 +11,29 @@ MCKERNEL_INCDIR=@MCKERNEL_INCDIR@
MCKERNEL_LIBDIR=@MCKERNEL_LIBDIR@
KDIR ?= @KDIR@
ARCH=@ARCH@
CFLAGS=-Wall -O -I. -I$(VPATH)/arch/${ARCH} -I${IHKDIR}
CFLAGS=-Wall -O -I. -I$(VPATH)/arch/${ARCH} -I${IHKDIR} -I@abs_builddir@/../../../ihk/linux/include
LDFLAGS=@LDFLAGS@
CPPFLAGS_SYSCALL_INTERCEPT=@CPPFLAGS_SYSCALL_INTERCEPT@
LDFLAGS_SYSCALL_INTERCEPT=@LDFLAGS_SYSCALL_INTERCEPT@
RPATH=$(shell echo $(LDFLAGS)|awk '{for(i=1;i<=NF;i++){if($$i~/^-L/){w=$$i;sub(/^-L/,"-Wl,-rpath,",w);print w}}}')
VPATH=@abs_srcdir@
TARGET=mcexec libsched_yield ldump2mcdump.so
@uncomment_if_ENABLE_MEMDUMP@TARGET+=eclair
LIBS=@LIBS@
IHKDIR ?= $(VPATH)/../../../ihk/linux/include/
MCEXEC_LIBS=-lmcexec -lrt -lnuma -pthread
MCEXEC_LIBS=-lmcexec -lrt -lnuma -pthread -L@abs_builddir@/../../../ihk/linux/user -lihk -Wl,-rpath,$(MCKERNEL_LIBDIR)
ENABLE_QLMPI=@ENABLE_QLMPI@
WITH_SYSCALL_INTERCEPT=@WITH_SYSCALL_INTERCEPT@
ifeq ($(ENABLE_QLMPI),yes)
MCEXEC_LIBS += -lmpi
TARGET+= libqlmpi.so ql_server ql_mpiexec_start ql_mpiexec_finalize ql_talker libqlfort.so
endif
ifeq ($(WITH_SYSCALL_INTERCEPT),yes)
TARGET += syscall_intercept.so
endif
ifeq ($(ARCH), arm64)
CFLAGS += $(foreach i, $(shell seq 1 100), $(addprefix -DPOSTK_DEBUG_ARCH_DEP_, $(i)))
CFLAGS += $(foreach i, $(shell seq 1 100), $(addprefix -DPOSTK_DEBUG_TEMP_FIX_, $(i)))
@ -40,10 +47,10 @@ mcexec: mcexec.c libmcexec.a
# POSTK_DEBUG_ARCH_DEP_34, eclair arch depend separate.
ifeq ($(ARCH), arm64)
eclair: eclair.c arch/$(ARCH)/arch-eclair.c
$(CC) -I.. -I. -I./arch/$(ARCH)/include -I$(VPATH)/.. -I$(VPATH) -I$(VPATH)/arch/$(ARCH)/include $(CFLAGS) -o $@ $^ $(LIBS)
$(CC) -I.. -I. -I./arch/$(ARCH)/include -I$(VPATH)/.. -I$(VPATH) -I$(VPATH)/arch/$(ARCH)/include $(CFLAGS) -o $@ $^ $(LIBS) -ldl -lz
else
eclair: eclair.c
$(CC) $(CFLAGS) -I${IHKDIR} -o $@ $^ $(LIBS)
eclair: eclair.c arch/$(ARCH)/arch-eclair.c
$(CC) -I.. -I$(VPATH) -I$(VPATH)/arch/$(ARCH)/include $(CFLAGS) -o $@ $^ $(LIBS)
endif
ldump2mcdump.so: ldump2mcdump.c
@ -52,6 +59,12 @@ ldump2mcdump.so: ldump2mcdump.c
libsched_yield: libsched_yield.c
$(CC) -shared -fPIC -Wl,-soname,sched_yield.so.1 -o libsched_yield.so.1.0.0 $^ -lc -ldl
syscall_intercept.so: syscall_intercept.c libsyscall_intercept_arch.a
$(CC) $(CPPFLAGS_SYSCALL_INTERCEPT) -g -O2 $(LDFLAGS_SYSCALL_INTERCEPT) -lsyscall_intercept -fpic -shared -L. -lsyscall_intercept_arch $^ -o $@
libsyscall_intercept_arch.a::
+(cd arch/${ARCH}; $(MAKE))
libmcexec.a::
+(cd arch/${ARCH}; $(MAKE))
@ -89,6 +102,7 @@ install::
mkdir -p -m 755 $(MCKERNEL_LIBDIR)
install -m 755 ldump2mcdump.so $(MCKERNEL_LIBDIR)
install -m 755 libsched_yield.so.1.0.0 $(MCKERNEL_LIBDIR)
mkdir -p -m 755 $(MANDIR)/man1
install -m 644 mcexec.1 $(MANDIR)/man1/mcexec.1
ifeq ($(ENABLE_QLMPI),yes)
install -m 644 ../include/qlmpilib.h $(MCKERNEL_INCDIR)
@ -98,6 +112,9 @@ ifeq ($(ENABLE_QLMPI),yes)
install -m 755 ql_mpiexec_start $(BINDIR)
install -m 755 ql_mpiexec_finalize $(BINDIR)
install -m 755 ql_talker $(SBINDIR)
endif
ifeq ($(WITH_SYSCALL_INTERCEPT),yes)
install -m 755 syscall_intercept.so $(MCKERNEL_LIBDIR)
endif
@uncomment_if_ENABLE_MEMDUMP@install -m 755 eclair $(BINDIR)
@uncomment_if_ENABLE_MEMDUMP@install -m 755 vmcore2mckdump $(BINDIR)


@ -4,7 +4,7 @@ BINDIR=@BINDIR@
KDIR ?= @KDIR@
CFLAGS=-Wall -O -I.
VPATH=@abs_srcdir@
TARGET=../../libmcexec.a
TARGET=../../libmcexec.a ../../libsyscall_intercept_arch.a
LIBS=@LIBS@
all: $(TARGET)
@ -18,6 +18,12 @@ archdep.o: archdep.S
arch_syscall.o: arch_syscall.c
$(CC) -c -I${KDIR} $(CFLAGS) $(EXTRA_CFLAGS) -fPIE -pie -pthread $<
../../libsyscall_intercept_arch.a: archdep_c.o
$(AR) cr ../../libsyscall_intercept_arch.a archdep_c.o
archdep_c.o: archdep_c.c
$(CC) -c -I${KDIR} $(CFLAGS) $(EXTRA_CFLAGS) -fPIE -pie -pthread $<
clean:
$(RM) $(TARGET) *.o


@ -42,7 +42,7 @@ int print_kregs(char *rbp, size_t rbp_size, const struct arch_kregs *kregs)
}
for (i = 0; i < sizeof(regs_1)/sizeof(regs_1[0]); i++) { /* rsi, rdi, rbp, rsp */
ret = print_bin(rbp, rbp_size, (void *)regs_1[i], sizeof(regs_1[0]));
ret = print_bin(rbp, rbp_size, regs_1 + i, sizeof(regs_1[0]));
if (ret < 0) {
return ret;
}
@ -62,7 +62,7 @@ int print_kregs(char *rbp, size_t rbp_size, const struct arch_kregs *kregs)
}
for (i = 0; i < sizeof(regs_2)/sizeof(regs_2[0]); i++) { /* r12-r15 */
ret = print_bin(rbp, rbp_size, (void *)regs_2[i], sizeof(regs_2[0]));
ret = print_bin(rbp, rbp_size, regs_2 + i, sizeof(regs_2[0]));
if (ret < 0) {
return ret;
}
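
The `print_kregs()` change replaces `(void *)regs_1[i]` with `regs_1 + i`. The old form turns the register value itself into an address, while the new form passes the address of the array element that holds the value. A small sketch of the corrected pattern, assuming the arrays hold 64-bit register values; `dump_bytes()` and `dump_regs()` are hypothetical stand-ins, not eclair functions:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for print_bin(): hex-encode `size` bytes at `data`. */
static int dump_bytes(char *buf, size_t buf_size, const void *data, size_t size)
{
	const uint8_t *p = data;
	size_t i;

	if (buf_size < 2 * size + 1)
		return -1;
	for (i = 0; i < size; i++)
		snprintf(buf + 2 * i, 3, "%02x", p[i]);
	return (int)(2 * size);
}

/* Encode every register in `regs`, one after another. */
static int dump_regs(char *buf, size_t buf_size, const uint64_t *regs, size_t n)
{
	size_t total = 0;
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		/* regs + i is the address of element i. Writing
		 * (void *)regs[i] instead would treat the register
		 * *value* as an address and read from there. */
		ret = dump_bytes(buf + total, buf_size - total,
				 regs + i, sizeof(regs[0]));
		if (ret < 0)
			return ret;
		total += (size_t)ret;
	}
	return (int)total;
}
```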


@ -67,6 +67,12 @@ get_syscall_arg6(syscall_args *args)
return args->r9;
}
static inline unsigned long
get_syscall_rip(syscall_args *args)
{
return args->rip;
}
static inline void
set_syscall_number(syscall_args *args, unsigned long value)
{


@ -48,7 +48,7 @@ archdep_syscall(struct syscall_wait_desc *w, long *ret)
if (*ret >= PATH_MAX) {
*ret = -ENAMETOOLONG;
}
if (ret < 0) {
if (*ret < 0) {
return 0;
}
__dprintf("open: %s\n", pathbuf);
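
The one-character fix in this hunk matters because `ret` is a `long *`. Comparing the pointer itself against zero with `<` never detects an error, so the value has to be dereferenced. A minimal sketch of the corrected check; `SKETCH_PATH_MAX` and `path_len_ok()` are hypothetical names and the limit is an assumed value:

```c
#include <errno.h>

#define SKETCH_PATH_MAX 4096	/* assumed limit, for illustration only */

/* Returns 1 if *ret holds a usable length, 0 if it now holds an error. */
static int path_len_ok(long *ret)
{
	if (*ret >= SKETCH_PATH_MAX)
		*ret = -ENAMETOOLONG;
	/* Test the value, not the pointer: a valid pointer is
	 * never "less than" a null pointer. */
	if (*ret < 0)
		return 0;
	return 1;
}
```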


@ -1,15 +1,22 @@
/*
arg: rdi, rsi, rdx, rcx, r8, r9
ret: rax
Calling convention:
arg: rdi, rsi, rdx, rcx, r8, r9
ret: rax
rax syscall number
syscall: (rax:num) rdi rsi rdx r10 r8 r9 (rcx:ret addr)
fd, cmd, param
rdi: fd
rsi: cmd
rdx: param
rcx: save area
r8: new thread context
rdi: fd
rsi: cmd
rdx: param
rcx: save area
r8: new thread context
Syscall calling convention:
syscall number: rax
arg: rdi, rsi, rdx, r10, r8, r9
return addr: rcx
rdi: fd
rsi: cmd
rdx: param
*/
.global switch_ctx
@ -91,6 +98,7 @@ switch_ctx:
1:
mov $0xffffffffffffffff,%eax
retq
2:
pushq %rax
movq $158,%rax /* arch_prctl */
@ -146,4 +154,3 @@ compare_and_swap_int:
lock
cmpxchgl %edx,0(%rdi)
retq


@ -0,0 +1,52 @@
/*
function call convention
rdi, rsi, rdx, rcx, r8, r9: IN arguments
rax: OUT return value
syscall convention:
rax: IN syscall number
rdi, rsi, rdx, r10, r8, r9: IN arguments
rax: OUT return value
rcx, r11: CLOBBER
*/
long uti_syscall6(long syscall_number, long arg0, long arg1, long arg2, long arg3, long arg4, long arg5)
{
long ret;
asm volatile ("movq %[arg3],%%r10; movq %[arg4],%%r8; movq %[arg5],%%r9; syscall"
: "=a" (ret)
: "a" (syscall_number),
"D" (arg0), "S" (arg1), "d" (arg2),
[arg3] "g" (arg3), [arg4] "g" (arg4), [arg5] "g" (arg5)
: "rcx", "r11", "r10", "r8", "r9", "memory");
return ret;
}
long uti_syscall3(long syscall_number, long arg0, long arg1, long arg2)
{
long ret;
asm volatile ("syscall"
: "=a" (ret)
: "a" (syscall_number), "D" (arg0), "S" (arg1), "d" (arg2)
: "rcx", "r11", "memory");
return ret;
}
long uti_syscall1(long syscall_number, long arg0)
{
long ret;
asm volatile ("syscall"
: "=a" (ret)
: "a" (syscall_number), "D" (arg0)
: "rcx", "r11", "memory");
return ret;
}
long uti_syscall0(long syscall_number)
{
long ret;
asm volatile ("syscall"
: "=a" (ret)
: "a" (syscall_number)
: "rcx", "r11", "memory");
return ret;
}
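
As a usage sketch for the wrappers above, the zero-argument form can be exercised from an ordinary user program. The function below uses the same register constraints as `uti_syscall0()`. The name `raw_syscall0()` and the non-x86_64 fallback through libc `syscall()` are assumptions added so the example builds elsewhere:

```c
#include <sys/syscall.h>
#include <unistd.h>

#if defined(__x86_64__)
/* rax carries the syscall number in and the result out;
 * the syscall instruction clobbers rcx and r11. */
static long raw_syscall0(long nr)
{
	long ret;

	__asm__ volatile ("syscall"
			  : "=a" (ret)
			  : "a" (nr)
			  : "rcx", "r11", "memory");
	return ret;
}
#else
/* Fallback for other architectures (assumption, not in the diff).
 * Note that libc syscall() reports errors through errno, while the
 * raw instruction returns a negative errno value in rax. */
static long raw_syscall0(long nr)
{
	return syscall(nr);
}
#endif
```

Calling `raw_syscall0(SYS_getpid)` should return the same value as `getpid()`. One plausible reason for hand-written wrappers in an interception library is that they never re-enter the hooked libc entry points.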


@ -2,8 +2,18 @@
#ifndef HEADER_USER_X86_ECLAIR_H
#define HEADER_USER_X86_ECLAIR_H
#define MAP_KERNEL 0xFFFFFFFF80000000
#define MAP_ST 0xFFFF800000000000
#ifndef POSTK_DEBUG_ARCH_DEP_34
#define MAP_ST_START 0xffff800000000000UL
#define MAP_VMAP_START 0xffff850000000000UL
#define MAP_FIXED_START 0xffff860000000000UL
#define LINUX_PAGE_OFFSET 0xffff880000000000UL
#define MAP_KERNEL_START 0xFFFFFFFFFE800000UL
#endif /* POSTK_DEBUG_ARCH_DEP_34 */
/* TODO: these should be updated when McKernel changes */
#define MCKERNEL_ELF_START "0xFFFFFFFFFE801000"
#define MCKERNEL_ELF_LEN "0x0000000000100000"
#define ARCH_CLV_SPAN "x86_cpu_local_variables_span"


@ -1,4 +1,6 @@
extern int switch_ctx(int fd, unsigned long cmd, void **param, void *lctx, void *rctx);
#include "../include/uprotocol.h"
extern int switch_ctx(int fd, unsigned long cmd, struct uti_save_fs_desc *desc, void *lctx, void *rctx);
extern unsigned long compare_and_swap(unsigned long *addr, unsigned long old, unsigned long new);
extern unsigned int compare_and_swap_int(unsigned int *addr, unsigned int old, unsigned int new);
extern int archdep_syscall(struct syscall_wait_desc *w, long *ret);


@ -0,0 +1,5 @@
extern long uti_syscall6(long syscall_number, long arg0, long arg1, long arg2, long arg3, long arg4, long arg5);
extern long uti_syscall3(long syscall_number, long arg0, long arg1, long arg2);
extern long uti_syscall1(long syscall_number, long arg0);
extern long uti_syscall0(long syscall_number);


@ -8,9 +8,7 @@
* Copyright (C) 2015 RIKEN AICS
*/
#ifdef POSTK_DEBUG_ARCH_DEP_33
#include "../config.h"
#endif /* POSTK_DEBUG_ARCH_DEP_33 */
#include <bfd.h>
#include <fcntl.h>
#include <inttypes.h>
@ -22,10 +20,8 @@
#include <arpa/inet.h>
#include <sys/ioctl.h>
#include <ihk/ihk_host_user.h>
#ifdef POSTK_DEBUG_ARCH_DEP_34
#include <eclair.h>
#include <arch-eclair.h>
#endif /* POSTK_DEBUG_ARCH_DEP_34 */
#define CPU_TID_BASE 1000000
@ -85,11 +81,7 @@ static struct thread_info *curr_thread = NULL;
static uintptr_t ihk_mc_switch_context = -1;
#endif /* POSTK_DEBUG_ARCH_DEP_34 */
#ifdef POSTK_DEBUG_ARCH_DEP_34
uintptr_t lookup_symbol(char *name) {
#else /* POSTK_DEBUG_ARCH_DEP_34 */
static uintptr_t lookup_symbol(char *name) {
#endif /* POSTK_DEBUG_ARCH_DEP_34 */
int i;
for (i = 0; i < nsyms; ++i) {
@ -101,22 +93,22 @@ static uintptr_t lookup_symbol(char *name) {
return NOSYMBOL;
} /* lookup_symbol() */
#define NOPHYS ((uintptr_t)-1)
static uintptr_t virt_to_phys(uintptr_t va) {
#ifndef POSTK_DEBUG_ARCH_DEP_34
#define MAP_KERNEL 0xFFFFFFFF80000000
#endif /* POSTK_DEBUG_ARCH_DEP_34 */
if (va >= MAP_KERNEL) {
return (va - MAP_KERNEL + kernel_base);
if (va >= MAP_KERNEL_START) {
return va - MAP_KERNEL_START + kernel_base;
}
#ifndef POSTK_DEBUG_ARCH_DEP_34
#define MAP_ST 0xFFFF800000000000
#endif /* POSTK_DEBUG_ARCH_DEP_34 */
if (va >= MAP_ST) {
return (va - MAP_ST);
else if (va >= LINUX_PAGE_OFFSET) {
return va - LINUX_PAGE_OFFSET;
}
if (0) printf("virt_to_phys(%lx): -1\n", va);
#define NOPHYS ((uintptr_t)-1)
else if (va >= MAP_FIXED_START) {
return va - MAP_FIXED_START;
}
else if (va >= MAP_ST_START) {
return va - MAP_ST_START;
}
return NOPHYS;
} /* virt_to_phys() */
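
The new `virt_to_phys()` checks the regions from the highest base address down, so each address is translated relative to the first base it reaches. A self-contained sketch of that cascade using the constants from `arch-eclair.h`. The function name and the explicit `kernel_base` parameter are assumptions here, since eclair keeps `kernel_base` as a global:

```c
#include <stdint.h>

#define MAP_ST_START      UINT64_C(0xffff800000000000)
#define MAP_FIXED_START   UINT64_C(0xffff860000000000)
#define LINUX_PAGE_OFFSET UINT64_C(0xffff880000000000)
#define MAP_KERNEL_START  UINT64_C(0xFFFFFFFFFE800000)
#define NOPHYS            (~UINT64_C(0))

/* Translate a virtual address by testing the regions from the
 * highest base to the lowest; the first match wins. */
static uint64_t virt_to_phys_sketch(uint64_t va, uint64_t kernel_base)
{
	if (va >= MAP_KERNEL_START)
		return va - MAP_KERNEL_START + kernel_base;
	else if (va >= LINUX_PAGE_OFFSET)
		return va - LINUX_PAGE_OFFSET;
	else if (va >= MAP_FIXED_START)
		return va - MAP_FIXED_START;
	else if (va >= MAP_ST_START)
		return va - MAP_ST_START;
	return NOPHYS;	/* not a kernel-side address */
}
```

The order of the tests matters: if the `MAP_ST_START` test came first, every higher region would be translated with the wrong offset.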
@ -673,11 +665,7 @@ static int setup_dump(char *fname) {
return 0;
} /* setup_dump() */
#ifdef POSTK_DEBUG_ARCH_DEP_38
static ssize_t print_hex(char *buf, size_t buf_size, char *str) {
#else /* POSTK_DEBUG_ARCH_DEP_38 */
static ssize_t print_hex(char *buf, char *str) {
#endif /* POSTK_DEBUG_ARCH_DEP_38 */
char *p;
char *q;
@ -702,11 +690,7 @@ static ssize_t print_hex(char *buf, char *str) {
return (q - buf);
} /* print_hex() */
#if defined(POSTK_DEBUG_ARCH_DEP_34) && defined(POSTK_DEBUG_ARCH_DEP_38)
ssize_t print_bin(char *buf, size_t buf_size, void *data, size_t size) {
#else /* POSTK_DEBUG_ARCH_DEP_34 && POSTK_DEBUG_ARCH_DEP_38*/
static ssize_t print_bin(char *buf, void *data, size_t size) {
#endif /* POSTK_DEBUG_ARCH_DEP_34 && POSTK_DEBUG_ARCH_DEP_38*/
uint8_t *p;
char *q;
int i;
@ -733,13 +717,8 @@ static ssize_t print_bin(char *buf, void *data, size_t size) {
return (q - buf);
} /* print_bin() */
#ifdef POSTK_DEBUG_ARCH_DEP_38
static void command(const char *cmd, char *res, size_t res_size) {
const char *p;
#else /* POSTK_DEBUG_ARCH_DEP_38 */
static void command(char *cmd, char *res) {
char *p;
#endif /* POSTK_DEBUG_ARCH_DEP_38 */
char *rbp;
p = cmd;
@ -801,11 +780,7 @@ static void command(char *cmd, char *res) {
#endif /* POSTK_DEBUG_ARCH_DEP_34 */
rbp += sprintf(rbp, "l");
if (0)
#ifdef POSTK_DEBUG_ARCH_DEP_38
rbp += print_hex(rbp, res_size, str);
#else /* POSTK_DEBUG_ARCH_DEP_38 */
rbp += print_hex(rbp, str);
#endif /* POSTK_DEBUG_ARCH_DEP_38 */
rbp += sprintf(rbp, "%s", str);
}
else if (!strcmp(p, "D")) {
@ -814,20 +789,9 @@ static void command(char *cmd, char *res) {
}
else if (!strcmp(p, "g")) {
if (curr_thread->cpu < 0) {
#ifndef POSTK_DEBUG_ARCH_DEP_34
struct x86_kregs {
uintptr_t rsp, rbp, rbx, rsi;
uintptr_t rdi, r12, r13, r14;
uintptr_t r15, rflags, rsp0;
};
#endif /* POSTK_DEBUG_ARCH_DEP_34 */
int error;
#ifdef POSTK_DEBUG_ARCH_DEP_34
struct arch_kregs kregs;
#else /* POSTK_DEBUG_ARCH_DEP_34 */
struct x86_kregs kregs;
#endif /* POSTK_DEBUG_ARCH_DEP_34 */
error = read_mem(curr_thread->process+K(CTX_OFFSET),
&kregs, sizeof(kregs));
@ -836,36 +800,7 @@ static void command(char *cmd, char *res) {
break;
}
#ifdef POSTK_DEBUG_ARCH_DEP_34
print_kregs(rbp, res_size, &kregs);
#else /* POSTK_DEBUG_ARCH_DEP_34 */
rbp += sprintf(rbp, "xxxxxxxxxxxxxxxx"); /* rax */
rbp += print_bin(rbp, &kregs.rbx, sizeof(uint64_t));
rbp += sprintf(rbp, "xxxxxxxxxxxxxxxx"); /* rcx */
rbp += sprintf(rbp, "xxxxxxxxxxxxxxxx"); /* rdx */
rbp += print_bin(rbp, &kregs.rsi, sizeof(uint64_t));
rbp += print_bin(rbp, &kregs.rdi, sizeof(uint64_t));
rbp += print_bin(rbp, &kregs.rbp, sizeof(uint64_t));
rbp += print_bin(rbp, &kregs.rsp, sizeof(uint64_t));
rbp += sprintf(rbp, "xxxxxxxxxxxxxxxx"); /* r8 */
rbp += sprintf(rbp, "xxxxxxxxxxxxxxxx"); /* r9 */
rbp += sprintf(rbp, "xxxxxxxxxxxxxxxx"); /* r10 */
rbp += sprintf(rbp, "xxxxxxxxxxxxxxxx"); /* r11 */
rbp += print_bin(rbp, &kregs.r12, sizeof(uint64_t));
rbp += print_bin(rbp, &kregs.r13, sizeof(uint64_t));
rbp += print_bin(rbp, &kregs.r14, sizeof(uint64_t));
rbp += print_bin(rbp, &kregs.r15, sizeof(uint64_t));
rbp += print_bin(rbp, &ihk_mc_switch_context,
sizeof(uint64_t)); /* rip */
rbp += print_bin(rbp, &kregs.rflags, sizeof(uint32_t));
rbp += sprintf(rbp, "xxxxxxxx"); /* cs */
rbp += sprintf(rbp, "xxxxxxxx"); /* ss */
rbp += sprintf(rbp, "xxxxxxxx"); /* ds */
rbp += sprintf(rbp, "xxxxxxxx"); /* es */
rbp += sprintf(rbp, "xxxxxxxx"); /* fs */
rbp += sprintf(rbp, "xxxxxxxx"); /* gs */
#endif /* POSTK_DEBUG_ARCH_DEP_34 */
}
else {
int error;
@ -943,11 +878,7 @@ static void command(char *cmd, char *res) {
#endif /* POSTK_DEBUG_ARCH_DEP_34 */
rbp += sprintf(rbp, "l");
if (0)
#ifdef POSTK_DEBUG_ARCH_DEP_38
rbp += print_hex(rbp, res_size, str);
#else /* POSTK_DEBUG_ARCH_DEP_38 */
rbp += print_hex(rbp, str);
#endif /* POSTK_DEBUG_ARCH_DEP_38 */
rbp += sprintf(rbp, "%s", str);
}
else if (!strncmp(p, "T", 1)) {
@ -1039,11 +970,7 @@ static void command(char *cmd, char *res) {
else {
q += sprintf(q, "status=%#x", ti->status);
}
#ifdef POSTK_DEBUG_ARCH_DEP_38
rbp += print_hex(rbp, res_size, buf);
#else /* POSTK_DEBUG_ARCH_DEP_38 */
rbp += print_hex(rbp, buf);
#endif /* POSTK_DEBUG_ARCH_DEP_38 */
}
} while (0);
@ -1272,11 +1199,7 @@ int main(int argc, char *argv[]) {
}
mode = 0;
fputc('+', ofp);
#ifdef POSTK_DEBUG_ARCH_DEP_38
command(lbuf, rbuf, sizeof(rbuf));
#else /* POSTK_DEBUG_ARCH_DEP_38 */
command(lbuf, rbuf);
#endif /* POSTK_DEBUG_ARCH_DEP_38 */
sum = 0;
for (p = rbuf; *p != '\0'; ++p) {
sum += *p;


@ -3,11 +3,7 @@
#ifndef HEADER_USER_COMMON_ECLAIR_H
#define HEADER_USER_COMMON_ECLAIR_H
#ifdef POSTK_DEBUG_ARCH_DEP_76 /* header path fix */
#include "../config.h"
#else /* POSTK_DEBUG_ARCH_DEP_76 */
#include <config.h>
#endif /* POSTK_DEBUG_ARCH_DEP_76 */
#include <stdio.h>
#include <inttypes.h>
#include <arch-eclair.h>


@ -11,7 +11,9 @@
typedef int (*int_void_fn)(void);
#if 0
static int_void_fn orig_sched_yield = 0;
#endif
int sched_yield(void)
{

File diff suppressed because it is too large


@ -0,0 +1,139 @@
#include <libsyscall_intercept_hook_point.h>
#include <errno.h>
#include <stdio.h>
#include <stdint.h>
#include <syscall.h>
#include <sys/time.h>
#include <sys/resource.h>
#include "../include/uprotocol.h"
#include "../include/uti.h"
#include "./archdep_uti.h"
static struct uti_desc uti_desc;
#define DEBUG_UTI
static int
hook(long syscall_number,
long arg0, long arg1,
long arg2, long arg3,
long arg4, long arg5,
long *result)
{
//return 1; /* debug */
int tid = uti_syscall0(__NR_gettid);
struct terminate_thread_desc term_desc;
unsigned long code;
int stack_top;
if (!uti_desc.start_syscall_intercept) {
return 1; /* System call isn't taken over */
}
if (tid != uti_desc.mck_tid) {
if (uti_desc.syscalls2 && syscall_number >= 0 && syscall_number < 512) {
uti_desc.syscalls2[syscall_number]++;
}
return 1;
}
#ifdef DEBUG_UTI
if (uti_desc.syscalls && syscall_number >= 0 && syscall_number < 512) {
uti_desc.syscalls[syscall_number]++;
}
#endif
switch (syscall_number) {
case __NR_gettid:
*result = uti_desc.mck_tid;
return 0;
case __NR_futex:
case __NR_brk:
case __NR_mmap:
case __NR_munmap:
case __NR_mprotect:
case __NR_mremap:
/* Overflow check */
if (uti_desc.syscall_stack_top == -1) {
*result = -ENOMEM;
return 0;
}
/* Sanity check */
if (uti_desc.syscall_stack_top < 0 || uti_desc.syscall_stack_top >= UTI_SZ_SYSCALL_STACK) {
*result = -EINVAL;
return 0;
}
/* Store the return value in the stack to prevent it from getting corrupted
when an interrupt happens just after ioctl() and before copying the return
value to *result */
stack_top = __sync_fetch_and_sub(&uti_desc.syscall_stack_top, 1);
uti_desc.syscall_stack[stack_top].number = syscall_number;
uti_desc.syscall_stack[stack_top].args[0] = arg0;
uti_desc.syscall_stack[stack_top].args[1] = arg1;
uti_desc.syscall_stack[stack_top].args[2] = arg2;
uti_desc.syscall_stack[stack_top].args[3] = arg3;
uti_desc.syscall_stack[stack_top].args[4] = arg4;
uti_desc.syscall_stack[stack_top].args[5] = arg5;
uti_desc.syscall_stack[stack_top].uti_clv = uti_desc.uti_clv;
uti_desc.syscall_stack[stack_top].ret = -EINVAL;
uti_syscall3(__NR_ioctl, uti_desc.fd, MCEXEC_UP_SYSCALL_THREAD, (long)(uti_desc.syscall_stack + stack_top));
*result = uti_desc.syscall_stack[stack_top].ret;
/* push syscall_struct list */
__sync_fetch_and_add(&uti_desc.syscall_stack_top, 1);
return 0; /* System call is taken over */
case __NR_exit_group:
code = 0x100000000;
goto make_remote_thread_exit;
case __NR_exit:
code = 0;
make_remote_thread_exit:
/* Make migrated-to-Linux thread on the McKernel side call do_exit() or terminate() */
term_desc.pid = uti_desc.pid;
term_desc.tid = uti_desc.tid; /* tid of mcexec */
term_desc.code = code | ((arg0 & 255) << 8);
term_desc.tsk = uti_desc.key;
uti_syscall3(__NR_ioctl, uti_desc.fd, MCEXEC_UP_TERMINATE_THREAD, (long)&term_desc);
return 1;
case __NR_clone:
case __NR_fork:
case __NR_vfork:
case __NR_execve:
*result = -ENOSYS;
return 0;
#if 0 /* debug */
case __NR_set_robust_list:
*result = -ENOSYS;
return 0;
#endif
case 888:
*result = (long)&uti_desc;
return 0;
default:
return 1;
}
return 0;
}
static __attribute__((constructor)) void
init(void)
{
/* Set up the callback function */
intercept_hook_point = hook;
/* Initialize uti_desc */
uti_desc.syscall_stack_top = UTI_SZ_SYSCALL_STACK - 1;
/* Pass address of uti_desc to McKernel */
uti_syscall1(733, (unsigned long)&uti_desc);
}
static __attribute__((destructor)) void
dtor(void)
{
}
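
The hook above reserves a slot in `uti_desc.syscall_stack` with `__sync_fetch_and_sub()` and gives it back with `__sync_fetch_and_add()`. The sketch below isolates that idea. It is a variation, not the code from the diff: it folds the overflow check into the reservation instead of testing the index beforehand, and `SLOT_COUNT`, `slot_stack` and the function names are hypothetical:

```c
#define SLOT_COUNT 16	/* hypothetical; the diff uses UTI_SZ_SYSCALL_STACK */

struct slot {
	long number;
	long ret;
};

struct slot_stack {
	int top;	/* index of the next free slot, -1 when exhausted */
	struct slot slots[SLOT_COUNT];
};

static void slot_stack_init(struct slot_stack *s)
{
	s->top = SLOT_COUNT - 1;
}

/* Atomically take the slot at `top`; returns its index or -1 if none left. */
static int slot_acquire(struct slot_stack *s)
{
	int idx = __sync_fetch_and_sub(&s->top, 1);

	if (idx < 0) {
		/* Nothing was free: undo the decrement. */
		__sync_fetch_and_add(&s->top, 1);
		return -1;
	}
	return idx;
}

/* Give the most recently taken slot back. */
static void slot_release(struct slot_stack *s)
{
	__sync_fetch_and_add(&s->top, 1);
}
```

Keeping per-call state in a reserved slot rather than in local variables is what lets the return value survive an interruption between the `ioctl()` and the copy to `*result`, as the comment in the hook explains.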

ihk Submodule

Submodule ihk added at d9c74adf3f


@ -6,7 +6,7 @@ IHKDIR=$(IHKBASE)/$(TARGETDIR)
OBJS = init.o mem.o debug.o mikc.o listeners.o ap.o syscall.o cls.o host.o
OBJS += process.o copy.o waitq.o futex.o timer.o plist.o fileobj.o shmobj.o
OBJS += zeroobj.o procfs.o devobj.o sysfs.o xpmem.o profile.o freeze.o
OBJS += rbtree.o
OBJS += rbtree.o hugefileobj.o
OBJS += pager.o
# POSTK_DEBUG_ARCH_DEP_18 coredump arch separation.
DEPSRCS=$(wildcard $(SRC)/*.c)
@ -19,7 +19,7 @@ endif
CFLAGS += -I$(SRC)/include -I@abs_builddir@/../ -I@abs_builddir@/include -D__KERNEL__ -g -fno-omit-frame-pointer -fno-inline -fno-inline-small-functions
ifneq ($(ARCH), arm64)
CFLAGS += -mcmodel=large -mno-red-zone
CFLAGS += -mcmodel=large -mno-red-zone -mno-sse
endif
LDFLAGS += -e arch_start
IHKOBJ = ihk/ihk.o


@ -29,15 +29,13 @@
#include <time.h>
#include <syscall.h>
#include <rusage_private.h>
#include <debug.h>
//#define DEBUG_PRINT_AP
#ifdef DEBUG_PRINT_AP
#define dkprintf(...) do { kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) do { kprintf(__VA_ARGS__); } while (0)
#else
#define dkprintf(...) do { } while (0)
#define ekprintf(...) do { kprintf(__VA_ARGS__); } while (0)
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
int num_processors = 1;
@ -209,8 +207,10 @@ store_fake_cpu_info(struct sysfs_ops *ops0, void *instance, void *buf,
static struct fake_cpu_info_ops show_fci_online = {
.member = ONLINE,
.ops.show = &show_fake_cpu_info,
.ops.store = &store_fake_cpu_info,
.ops = {
.show = &show_fake_cpu_info,
.store = &store_fake_cpu_info,
},
};
void


@ -1,24 +1,28 @@
PHDRS
{
text PT_LOAD FLAGS(5);
data PT_LOAD FLAGS(7);
data PT_LOAD FLAGS(7);
}
SECTIONS
{
. = 0xffffffff80001000;
_head = .;
_head = .;
.text : {
*(.text);
} : text
.text : {
*(.text);
} : text
. = ALIGN(4096);
. = ALIGN(4096);
.data : {
*(.data)
*(.data.*)
*(.data)
*(.data.*)
. = ALIGN(8);
__start___verbose = .;
*(__verbose);
__stop___verbose = .;
} :data
.rodata : {
*(.rodata .rodata.*)
*(.rodata .rodata.*)
} :data
.vsyscall : ALIGN(0x1000) {
@ -37,14 +41,14 @@ SECTIONS
. = ALIGN(4096);
} : data = 0xf4
.bss : {
*(.bss .bss.*)
}
. = ALIGN(4096);
_end = .;
.bss : {
*(.bss .bss.*)
}
. = ALIGN(4096);
_end = .;
/DISCARD/ : {
*(.eh_frame)
*(.note.gnu.build-id)
*(.eh_frame)
*(.note.gnu.build-id)
}
}


@ -1,24 +1,28 @@
PHDRS
{
text PT_LOAD FLAGS(5);
data PT_LOAD FLAGS(7);
data PT_LOAD FLAGS(7);
}
SECTIONS
{
. = 0xffffffff80001000;
_head = .;
_head = .;
.text : {
*(.text);
} : text
.text : {
*(.text);
} : text
. = ALIGN(4096);
. = ALIGN(4096);
.data : {
*(.data)
*(.data.*)
*(.data)
*(.data.*)
. = ALIGN(8);
__start___verbose = .;
*(__verbose);
__stop___verbose = .;
} :data
.rodata : {
*(.rodata .rodata.*)
*(.rodata .rodata.*)
} :data
.vsyscall : ALIGN(0x1000) {
@ -37,14 +41,14 @@ SECTIONS
. = ALIGN(4096);
} : data = 0xf4
.bss : {
*(.bss .bss.*)
}
. = ALIGN(4096);
_end = .;
.bss : {
*(.bss .bss.*)
}
. = ALIGN(4096);
_end = .;
/DISCARD/ : {
*(.eh_frame)
*(.note.gnu.build-id)
*(.eh_frame)
*(.note.gnu.build-id)
}
}


@ -1,24 +1,28 @@
PHDRS
{
text PT_LOAD FLAGS(5);
data PT_LOAD FLAGS(7);
data PT_LOAD FLAGS(7);
}
SECTIONS
{
. = 0xffffffff80001000;
_head = .;
_head = .;
.text : {
*(.text);
} : text
.text : {
*(.text);
} : text
. = ALIGN(4096);
. = ALIGN(4096);
.data : {
*(.data)
*(.data.*)
*(.data)
*(.data.*)
. = ALIGN(8);
__start___verbose = .;
*(__verbose);
__stop___verbose = .;
} :data
.rodata : {
*(.rodata .rodata.*)
*(.rodata .rodata.*)
} :data
.vsyscall : ALIGN(0x1000) {
@ -37,10 +41,10 @@ SECTIONS
. = ALIGN(4096);
} : data = 0xf4
.bss : {
*(.bss .bss.*)
}
. = ALIGN(4096);
_end = .;
.bss : {
*(.bss .bss.*)
}
. = ALIGN(4096);
_end = .;
}


@ -16,6 +16,10 @@ SECTIONS
.data : {
*(.data)
*(.data.*)
. = ALIGN(8);
__start___verbose = .;
*(__verbose);
__stop___verbose = .;
} :data
.rodata : {
*(.rodata .rodata.*)


@ -16,6 +16,10 @@ SECTIONS
.data : {
*(.data)
*(.data.*)
. = ALIGN(8);
__start___verbose = .;
*(__verbose);
__stop___verbose = .;
} :data
.rodata : {
*(.rodata .rodata.*)


@ -16,6 +16,10 @@ SECTIONS
.data : {
*(.data)
*(.data.*)
. = ALIGN(8);
__start___verbose = .;
*(__verbose);
__stop___verbose = .;
} :data
.rodata : {
*(.rodata .rodata.*)


@ -16,6 +16,10 @@ SECTIONS
.data : {
*(.data)
*(.data.*)
. = ALIGN(8);
__start___verbose = .;
*(__verbose);
__stop___verbose = .;
} :data
.rodata : {
*(.rodata .rodata.*)


@ -1,24 +1,28 @@
PHDRS
{
text PT_LOAD FLAGS(5);
data PT_LOAD FLAGS(7);
data PT_LOAD FLAGS(7);
}
SECTIONS
{
. = 0xffffffff80001000;
_head = .;
. = 0xFFFFFFFFFE801000;
_head = .;
.text : {
*(.text);
} : text
.text : {
*(.text);
} : text
. = ALIGN(4096);
. = ALIGN(4096);
.data : {
*(.data)
*(.data.*)
*(.data)
*(.data.*)
. = ALIGN(8);
__start___verbose = .;
*(__verbose);
__stop___verbose = .;
} :data
.rodata : {
*(.rodata .rodata.*)
*(.rodata .rodata.*)
} :data
.vsyscall : ALIGN(0x1000) {
@ -37,9 +41,9 @@ SECTIONS
. = ALIGN(4096);
} : data = 0xf4
.bss : {
*(.bss .bss.*)
}
. = ALIGN(4096);
_end = .;
.bss : {
*(.bss .bss.*)
}
. = ALIGN(4096);
_end = .;
}


@ -18,6 +18,9 @@
#include <ihk/lock.h>
#include <ihk/monitor.h>
#include <errno.h>
#include <sysfs.h>
#include <debug.h>
#include <limits.h>
struct ihk_kmsg_buf *kmsg_buf;
@ -84,7 +87,8 @@ void kputs(char *buf)
debug_spin_unlock_irqrestore(&kmsg_buf->lock, flags_inner);
kprintf_unlock(flags_outer);
if (DEBUG_KMSG_USED > IHK_KMSG_HIGH_WATER_MARK) {
if (irqflags_can_interrupt(flags_outer) &&
DEBUG_KMSG_USED > IHK_KMSG_HIGH_WATER_MARK) {
eventfd(IHK_OS_EVENTFD_TYPE_KMSG);
ihk_mc_delay_us(IHK_KMSG_NOTIFY_DELAY);
}
@ -123,8 +127,8 @@ int __kprintf(const char *format, ...)
}
debug_spin_unlock_irqrestore(&kmsg_buf->lock, flags_inner);
if (DEBUG_KMSG_USED > IHK_KMSG_HIGH_WATER_MARK) {
if (irqflags_can_interrupt(flags_inner) &&
DEBUG_KMSG_USED > IHK_KMSG_HIGH_WATER_MARK) {
eventfd(IHK_OS_EVENTFD_TYPE_KMSG);
ihk_mc_delay_us(IHK_KMSG_NOTIFY_DELAY);
}
@ -165,7 +169,8 @@ int kprintf(const char *format, ...)
debug_spin_unlock_irqrestore(&kmsg_buf->lock, flags_inner);
kprintf_unlock(flags_outer);
if (DEBUG_KMSG_USED > IHK_KMSG_HIGH_WATER_MARK) {
if (irqflags_can_interrupt(flags_outer) &&
DEBUG_KMSG_USED > IHK_KMSG_HIGH_WATER_MARK) {
eventfd(IHK_OS_EVENTFD_TYPE_KMSG);
ihk_mc_delay_us(IHK_KMSG_NOTIFY_DELAY);
}
@ -178,3 +183,147 @@ void kmsg_init()
{
ihk_mc_spinlock_init(&kmsg_lock);
}
extern struct ddebug __start___verbose[];
extern struct ddebug __stop___verbose[];
static ssize_t dynamic_debug_sysfs_show(struct sysfs_ops *ops,
void *instance, void *buf, size_t size)
{
struct ddebug *dbg;
ssize_t n = 0;
n = snprintf(buf, size, "# filename:lineno function flags format\n");
for (dbg = __start___verbose; dbg < __stop___verbose; dbg++) {
n += snprintf(buf + n, size - n, "%s:%d %s =%s\n",
dbg->file, dbg->line, dbg->func,
dbg->flags ? "p" : "_");
if (n >= size)
break;
}
return n;
}
static ssize_t dynamic_debug_sysfs_store(struct sysfs_ops *ops,
void *instance, void *buf, size_t size)
{
char *cur = buf;
char *file = NULL, *func = NULL;
long int line_start = 0, line_end = INT_MAX;
int set_flag = -1;
struct ddebug *dbg;
// assume line was new-line terminated and squash last newline
cur[size-1] = '\0';
/* basic line parsing, combinations of:
* file <file>
* func <func>
* line <line|line-line|line-|-line>
* and must end with [+-=][p_] (set/clear print flag)
*/
again:
while (cur && cur < ((char *)buf) + size && *cur) {
dkprintf("looking at %.*s, size left %d\n",
size - (cur - (char *)buf), cur,
(char *)buf - cur + size);
if (strncmp(cur, "func ", 5) == 0) {
cur += 5;
func = cur;
} else if (strncmp(cur, "file ", 5) == 0) {
cur += 5;
file = cur;
} else if (strncmp(cur, "line ", 5) == 0) {
cur += 5;
if (*cur != '-') {
line_start = strtol(cur, &cur, 0);
}
if (*cur != '-') {
line_end = line_start;
} else {
cur++;
if (*cur == ' ' || *cur == '\0') {
line_end = INT_MAX;
} else {
line_end = strtol(cur, &cur, 0);
}
}
} else if (strchr("+-=", *cur)) {
switch ((*cur) + 256 * (*(cur+1))) {
case '+' + 256*'p':
case '=' + 256*'p':
set_flag = DDEBUG_PRINT;
break;
case '-' + 256*'p':
case '=' + 256*'_':
set_flag = DDEBUG_NONE;
break;
default:
kprintf("invalid flag: %.*s\n",
size - (cur - (char *)buf), cur);
return -EINVAL;
}
/* XXX check 3rd char is end of input or \n or ; */
cur += 3;
break;
} else {
kprintf("dynamic debug control: unrecognized keyword: %.*s\n",
size - (cur - (char *)buf), cur);
return -EINVAL;
}
cur = strpbrk(cur, " \n");
if (cur) {
*cur = '\0';
cur++;
}
}
dkprintf("func %s, file %s, lines %d-%d, flag %x\n",
func, file, line_start, line_end, set_flag);
if (set_flag < 0) {
kprintf("dynamic debug control: no flag set?\n");
return -EINVAL;
}
if (!func && !file) {
kprintf("at least file or func should be set\n");
return -EINVAL;
}
for (dbg = __start___verbose; dbg < __stop___verbose; dbg++) {
/* TODO: handle wildcards */
if ((!func || strcmp(func, dbg->func) == 0) &&
(!file || strcmp(file, dbg->file) == 0) &&
dbg->line >= line_start &&
dbg->line <= line_end) {
dbg->flags = set_flag;
}
}
if (cur && cur < ((char *)buf) + size && *cur)
goto again;
return size;
}
static struct sysfs_ops dynamic_debug_sysfs_ops = {
.show = &dynamic_debug_sysfs_show,
.store = &dynamic_debug_sysfs_store,
};
void dynamic_debug_sysfs_setup(void)
{
int error;
error = sysfs_createf(&dynamic_debug_sysfs_ops, NULL, 0644,
"/sys/kernel/debug/dynamic_debug/control");
if (error) {
kprintf("%s: ERROR: creating dynamic_debug/control sysfs file\n",
__func__);
}
}
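
Review note: the `line <line|line-line|line-|-line>` grammar handled in dynamic_debug_sysfs_store() can be hard to follow inline. The sketch below isolates that step under stated assumptions: `parse_line_range` is a hypothetical helper, not a function in this tree, and it mirrors the defaults used above (start 0, end INT_MAX). Because `strtol` is called with base 0, a leading `0` or `0x` changes the radix.

```c
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of McKernel): parse "N", "N-M", "N-" or
 * "-M" into the inclusive range [*start, *end].
 * Returns a pointer to the first character that was not consumed.
 */
static const char *parse_line_range(const char *s, long *start, long *end)
{
	char *next = (char *)s;

	/* Same defaults as the store handler: match every line. */
	*start = 0;
	*end = INT_MAX;

	/* A leading '-' means "no lower bound given". */
	if (*next != '-')
		*start = strtol(next, &next, 0);

	if (*next != '-') {
		/* Single line number: the range collapses to one line. */
		*end = *start;
	} else {
		next++;
		/* "N-" keeps the open upper bound, "N-M" sets it. */
		if (*next != ' ' && *next != '\0')
			*end = strtol(next, &next, 0);
	}
	return next;
}
```

For example, calling the helper on "10-" leaves the upper bound open, which matches how the handler treats `line 10-` before the trailing `+p` or `-p` flag.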

@ -36,15 +36,13 @@
#include <syscall.h>
#include <process.h>
#include <rusage_private.h>
#include <debug.h>
//#define DEBUG_PRINT_DEVOBJ
#ifdef DEBUG_PRINT_DEVOBJ
#define dkprintf(...) kprintf(__VA_ARGS__)
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
@ -54,16 +52,15 @@ struct devobj {
uintptr_t handle;
off_t pfn_pgoff;
uintptr_t * pfn_table;
ihk_spinlock_t pfn_table_lock;
size_t npages;
};
static memobj_release_func_t devobj_release;
static memobj_ref_func_t devobj_ref;
static memobj_free_func_t devobj_free;
static memobj_get_page_func_t devobj_get_page;
static struct memobj_ops devobj_ops = {
.release = &devobj_release,
.ref = &devobj_ref,
.free = &devobj_free,
.get_page = &devobj_get_page,
};
@ -88,12 +85,9 @@ int devobj_create(int fd, size_t len, off_t off, struct memobj **objp, int *maxp
int error;
struct devobj *obj = NULL;
const size_t npages = (len + PAGE_SIZE - 1) / PAGE_SIZE;
#ifdef POSTK_DEBUG_TEMP_FIX_36
const size_t uintptr_per_page = (PAGE_SIZE / sizeof(uintptr_t));
const size_t pfn_npages = (npages + uintptr_per_page - 1) / uintptr_per_page;
#else
const size_t pfn_npages = (npages / (PAGE_SIZE / sizeof(uintptr_t))) + 1;
#endif /*POSTK_DEBUG_TEMP_FIX_36*/
const size_t pfn_npages =
(npages + uintptr_per_page - 1) / uintptr_per_page;
dkprintf("%s: fd: %d, len: %lu, off: %lu \n", __FUNCTION__, fd, len, off);
@ -122,6 +116,8 @@ int devobj_create(int fd, size_t len, off_t off, struct memobj **objp, int *maxp
ihk_mc_syscall_arg4(&ctx) = virt_to_phys(&result);
ihk_mc_syscall_arg5(&ctx) = prot | populate_flags;
memset(&result, 0, sizeof(result));
error = syscall_generic_forwarding(__NR_mmap, &ctx);
if (error) {
kprintf("%s: error: fd: %d, len: %lu, off: %lu map failed.\n",
@ -135,11 +131,23 @@ int devobj_create(int fd, size_t len, off_t off, struct memobj **objp, int *maxp
obj->memobj.ops = &devobj_ops;
obj->memobj.flags = MF_HAS_PAGER | MF_DEV_FILE;
obj->memobj.size = len;
ihk_atomic_set(&obj->memobj.refcnt, 1);
obj->handle = result.handle;
obj->ref = 1;
obj->pfn_pgoff = off / PAGE_SIZE;
dkprintf("%s: path=%s\n", __FUNCTION__, result.path);
if (result.path[0]) {
obj->memobj.path = kmalloc(PATH_MAX, IHK_MC_AP_NOWAIT);
if (!obj->memobj.path) {
error = -ENOMEM;
kprintf("%s: ERROR: Out of memory\n", __FUNCTION__);
goto out;
}
strncpy(obj->memobj.path, result.path, PATH_MAX);
}
obj->pfn_pgoff = off >> PAGE_SHIFT;
obj->npages = npages;
ihk_mc_spinlock_init(&obj->memobj.lock);
ihk_mc_spinlock_init(&obj->pfn_table_lock);
error = 0;
*objp = to_memobj(obj);
@ -158,76 +166,50 @@ out:
return error;
}
static void devobj_ref(struct memobj *memobj)
static void devobj_free(struct memobj *memobj)
{
struct devobj *obj = to_devobj(memobj);
dkprintf("devobj_ref(%p %lx):\n", obj, obj->handle);
memobj_lock(&obj->memobj);
++obj->ref;
memobj_unlock(&obj->memobj);
return;
}
static void devobj_release(struct memobj *memobj)
{
struct devobj *obj = to_devobj(memobj);
struct devobj *free_obj = NULL;
uintptr_t handle;
#ifndef POSTK_DEBUG_TEMP_FIX_36
const size_t uintptr_per_page = (PAGE_SIZE / sizeof(uintptr_t));
const size_t pfn_npages =
(obj->npages / (PAGE_SIZE / sizeof(uintptr_t))) + 1;
#endif /*!POSTK_DEBUG_TEMP_FIX_36*/
(obj->npages + uintptr_per_page - 1) / uintptr_per_page;
int error;
ihk_mc_user_context_t ctx;
dkprintf("devobj_release(%p %lx)\n", obj, obj->handle);
dkprintf("%s(%p %lx)\n", __func__, obj, obj->handle);
memobj_lock(&obj->memobj);
--obj->ref;
if (obj->ref <= 0) {
free_obj = obj;
}
handle = obj->handle;
memobj_unlock(&obj->memobj);
if (free_obj) {
if (!(free_obj->memobj.flags & MF_HOST_RELEASED)) {
int error;
ihk_mc_user_context_t ctx;
ihk_mc_syscall_arg0(&ctx) = PAGER_REQ_UNMAP;
ihk_mc_syscall_arg1(&ctx) = handle;
ihk_mc_syscall_arg2(&ctx) = 1;
ihk_mc_syscall_arg0(&ctx) = PAGER_REQ_UNMAP;
ihk_mc_syscall_arg1(&ctx) = handle;
ihk_mc_syscall_arg2(&ctx) = 1;
error = syscall_generic_forwarding(__NR_mmap, &ctx);
if (error) {
kprintf("devobj_release(%p %lx):"
"release failed. %d\n",
free_obj, handle, error);
/* through */
}
}
if (obj->pfn_table) {
// Don't call memory_stat_rss_sub() because devobj related pages don't reside in main memory
#ifdef POSTK_DEBUG_TEMP_FIX_36
const size_t uintptr_per_page = (PAGE_SIZE / sizeof(uintptr_t));
const size_t pfn_npages = (obj->npages + uintptr_per_page - 1) / uintptr_per_page;
ihk_mc_free_pages(obj->pfn_table, pfn_npages);
#else
ihk_mc_free_pages(obj->pfn_table, pfn_npages);
#endif /*POSTK_DEBUG_TEMP_FIX_36*/
}
kfree(free_obj);
error = syscall_generic_forwarding(__NR_mmap, &ctx);
if (error) {
kprintf("%s(%p %lx): release failed. %d\n",
__func__, obj, handle, error);
/* through */
}
dkprintf("devobj_release(%p %lx):free %p\n",
obj, handle, free_obj);
if (obj->pfn_table) {
// Don't call memory_stat_rss_sub() because devobj related
// pages don't reside in main memory
ihk_mc_free_pages(obj->pfn_table, pfn_npages);
}
if (to_memobj(obj)->path) {
kfree(to_memobj(obj)->path);
}
kfree(obj);
dkprintf("%s(%p %lx):free\n", __func__, obj, handle);
return;
}
static int devobj_get_page(struct memobj *memobj, off_t off, int p2align, uintptr_t *physp, unsigned long *flag, uintptr_t virt_addr)
{
const off_t pgoff = off / PAGE_SIZE;
const off_t pgoff = off >> PAGE_SHIFT;
struct devobj *obj = to_devobj(memobj);
int error;
uintptr_t pfn;
@ -245,17 +227,14 @@ static int devobj_get_page(struct memobj *memobj, off_t off, int p2align, uintpt
ix = pgoff - obj->pfn_pgoff;
dkprintf("ix: %ld\n", ix);
memobj_lock(&obj->memobj);
pfn = obj->pfn_table[ix];
#ifdef PROFILE_ENABLE
profile_event_add(PROFILE_page_fault_dev_file, PAGE_SIZE);
#endif // PROFILE_ENABLE
pfn = obj->pfn_table[ix];
if (!(pfn & PFN_VALID)) {
memobj_unlock(&obj->memobj);
ihk_mc_syscall_arg0(&ctx) = PAGER_REQ_PFN;
ihk_mc_syscall_arg1(&ctx) = obj->handle;
ihk_mc_syscall_arg2(&ctx) = pgoff << PAGE_SHIFT;
ihk_mc_syscall_arg2(&ctx) = off & ~(PAGE_SIZE - 1);
ihk_mc_syscall_arg3(&ctx) = virt_to_phys(&pfn);
error = syscall_generic_forwarding(__NR_mmap, &ctx);
@ -286,11 +265,9 @@ static int devobj_get_page(struct memobj *memobj, off_t off, int p2align, uintpt
dkprintf("devobj_get_page(%p %lx,%lx,%d):PFN_PRESENT after %#lx\n", memobj, obj->handle, off, p2align, pfn);
}
memobj_lock(&obj->memobj);
obj->pfn_table[ix] = pfn;
// Don't call memory_stat_rss_add() because devobj related pages don't reside in main memory
}
memobj_unlock(&obj->memobj);
if (!(pfn & PFN_PRESENT)) {
kprintf("devobj_get_page(%p %lx,%lx,%d):not present. %lx\n", memobj, obj->handle, off, p2align, pfn);
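
Review note: the devobj hunks above replace `(npages / per_page) + 1` with a round-up division when sizing the PFN table. A minimal sketch of the difference, assuming a 4096-byte page and 8-byte table entries (both values are illustrative, and the `ex_` names are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define EX_PAGE_SIZE 4096UL
#define EX_ENTRIES_PER_PAGE (EX_PAGE_SIZE / sizeof(uint64_t))

/* Old formula: always adds one page, even when npages divides evenly. */
static size_t ex_pfn_pages_old(size_t npages)
{
	return (npages / EX_ENTRIES_PER_PAGE) + 1;
}

/* New formula: round-up division, allocates exactly what is needed. */
static size_t ex_pfn_pages_new(size_t npages)
{
	return (npages + EX_ENTRIES_PER_PAGE - 1) / EX_ENTRIES_PER_PAGE;
}
```

With 512 entries per page, the two formulas agree except when `npages` is an exact multiple of 512 (including 0), where the old one requests one page more than needed. Using the same formula in both the allocation and the free path matters because ihk_mc_free_pages() is given the page count again.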

@ -27,18 +27,16 @@
#include <string.h>
#include <syscall.h>
#include <rusage_private.h>
#include <debug.h>
//#define DEBUG_PRINT_FILEOBJ
#ifdef DEBUG_PRINT_FILEOBJ
#define dkprintf(...) do { if (1) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
mcs_rwlock_lock_t fileobj_list_lock;
mcs_lock_t fileobj_list_lock;
static LIST_HEAD(fileobj_list);
#define FILEOBJ_PAGE_HASH_SHIFT 9
@ -47,24 +45,21 @@ static LIST_HEAD(fileobj_list);
struct fileobj {
struct memobj memobj; /* must be first */
long sref;
long cref;
uint64_t sref;
uintptr_t handle;
struct list_head list;
struct list_head page_hash[FILEOBJ_PAGE_HASH_SIZE];
mcs_rwlock_lock_t page_hash_locks[FILEOBJ_PAGE_HASH_SIZE];
mcs_lock_t page_hash_locks[FILEOBJ_PAGE_HASH_SIZE];
};
static memobj_release_func_t fileobj_release;
static memobj_ref_func_t fileobj_ref;
static memobj_free_func_t fileobj_free;
static memobj_get_page_func_t fileobj_get_page;
static memobj_flush_page_func_t fileobj_flush_page;
static memobj_invalidate_page_func_t fileobj_invalidate_page;
static memobj_lookup_page_func_t fileobj_lookup_page;
static struct memobj_ops fileobj_ops = {
.release = &fileobj_release,
.ref = &fileobj_ref,
.free = &fileobj_free,
.get_page = &fileobj_get_page,
.copy_page = NULL,
.flush_page = &fileobj_flush_page,
@ -89,7 +84,7 @@ static void fileobj_page_hash_init(struct fileobj *obj)
{
int i;
for (i = 0; i < FILEOBJ_PAGE_HASH_SIZE; ++i) {
mcs_rwlock_init(&obj->page_hash_locks[i]);
mcs_lock_init(&obj->page_hash_locks[i]);
INIT_LIST_HEAD(&obj->page_hash[i]);
}
return;
@ -170,22 +165,22 @@ static void obj_list_remove(struct fileobj *obj)
/* return NULL or locked fileobj */
static struct fileobj *obj_list_lookup(uintptr_t handle)
{
struct fileobj *obj;
struct fileobj *p;
obj = NULL;
list_for_each_entry(p, &fileobj_list, list) {
if (p->handle == handle) {
memobj_lock(&p->memobj);
if (p->cref > 0) {
obj = p;
break;
/* for the interval between last put and fileobj_free
* taking list_lock
*/
if (memobj_ref(&p->memobj) <= 1) {
ihk_atomic_dec(&p->memobj.refcnt);
continue;
}
memobj_unlock(&p->memobj);
return p;
}
}
return obj;
return NULL;
}
/***********************************************************************
@ -198,15 +193,9 @@ int fileobj_create(int fd, struct memobj **objp, int *maxprotp, uintptr_t virt_a
int error;
struct fileobj *newobj = NULL;
struct fileobj *obj;
struct mcs_rwlock_node node;
struct mcs_lock_node node;
dkprintf("fileobj_create(%d)\n", fd);
newobj = kmalloc(sizeof(*newobj), IHK_MC_AP_NOWAIT);
if (!newobj) {
error = -ENOMEM;
kprintf("fileobj_create(%d):kmalloc failed. %d\n", fd, error);
goto out;
}
dkprintf("%s(%d)\n", __func__, fd);
ihk_mc_syscall_arg0(&ctx) = PAGER_REQ_CREATE;
ihk_mc_syscall_arg1(&ctx) = fd;
@ -214,21 +203,43 @@ int fileobj_create(int fd, struct memobj **objp, int *maxprotp, uintptr_t virt_a
memset(&result, 0, sizeof(result));
error = syscall_generic_forwarding(__NR_mmap, &ctx);
if (error) {
dkprintf("fileobj_create(%d):create failed. %d\n", fd, error);
/* -ESRCH doesn't mean an error but requesting a fall
* back to treat the file as a device file
*/
if (error != -ESRCH) {
kprintf("%s(%d):create failed. %d\n",
__func__, fd, error);
}
goto out;
}
if (result.flags & MF_HUGETLBFS) {
return hugefileobj_pre_create(&result, objp, maxprotp);
}
mcs_lock_lock(&fileobj_list_lock, &node);
obj = obj_list_lookup(result.handle);
if (obj)
goto found;
mcs_lock_unlock(&fileobj_list_lock, &node);
// not found: alloc new object and lookup again
newobj = kmalloc(sizeof(*newobj), IHK_MC_AP_NOWAIT);
if (!newobj) {
error = -ENOMEM;
kprintf("%s(%d):kmalloc failed. %d\n", __func__, fd, error);
goto out;
}
memset(newobj, 0, sizeof(*newobj));
newobj->memobj.ops = &fileobj_ops;
newobj->memobj.flags = MF_HAS_PAGER | MF_REG_FILE;
newobj->handle = result.handle;
newobj->sref = 1;
newobj->cref = 1;
fileobj_page_hash_init(newobj);
ihk_mc_spinlock_init(&newobj->memobj.lock);
mcs_rwlock_writer_lock_noirq(&fileobj_list_lock, &node);
fileobj_page_hash_init(newobj);
mcs_lock_lock_noirq(&fileobj_list_lock, &node);
obj = obj_list_lookup(result.handle);
if (!obj) {
obj_list_insert(newobj);
@ -236,10 +247,25 @@ int fileobj_create(int fd, struct memobj **objp, int *maxprotp, uintptr_t virt_a
to_memobj(obj)->size = result.size;
to_memobj(obj)->flags |= result.flags;
to_memobj(obj)->status = MEMOBJ_READY;
ihk_atomic_set(&to_memobj(obj)->refcnt, 1);
obj->sref = 1;
if (to_memobj(obj)->flags & MF_PREFETCH) {
to_memobj(obj)->status = MEMOBJ_TO_BE_PREFETCHED;
}
if (result.path[0]) {
newobj->memobj.path = kmalloc(PATH_MAX, IHK_MC_AP_NOWAIT);
if (!newobj->memobj.path) {
error = -ENOMEM;
kprintf("%s: error: allocating path\n", __FUNCTION__);
mcs_lock_unlock_noirq(&fileobj_list_lock, &node);
goto out;
}
strncpy(newobj->memobj.path, result.path, PATH_MAX);
}
dkprintf("%s: %s\n", __FUNCTION__, obj->memobj.path);
/* XXX: KNL specific optimization for OFP runs */
if ((to_memobj(obj)->flags & MF_PREMAP) &&
(to_memobj(obj)->flags & MF_ZEROFILL)) {
@ -291,24 +317,21 @@ error_cleanup:
}
newobj = NULL;
dkprintf("%s: new obj 0x%lx cref: %d, %s\n",
dkprintf("%s: new obj 0x%lx %s\n",
__FUNCTION__,
obj,
obj->cref,
to_memobj(obj)->flags & MF_ZEROFILL ? "zerofill" : "");
}
else {
++obj->sref;
++obj->cref;
memobj_unlock(&obj->memobj); /* locked by obj_list_lookup() */
dkprintf("%s: existing obj 0x%lx cref: %d, %s\n",
found:
obj->sref++;
dkprintf("%s: existing obj 0x%lx, %s\n",
__FUNCTION__,
obj,
obj->cref,
to_memobj(obj)->flags & MF_ZEROFILL ? "zerofill" : "");
}
mcs_rwlock_writer_unlock_noirq(&fileobj_list_lock, &node);
mcs_lock_unlock_noirq(&fileobj_list_lock, &node);
error = 0;
*objp = to_memobj(obj);
@ -318,147 +341,111 @@ out:
if (newobj) {
kfree(newobj);
}
dkprintf("fileobj_create(%d):%d %p %x\n", fd, error, *objp, *maxprotp);
dkprintf("%s(%d):%d %p %x\n", __func__, fd, error, *objp, *maxprotp);
return error;
}
static void fileobj_ref(struct memobj *memobj)
static void fileobj_free(struct memobj *memobj)
{
struct fileobj *obj = to_fileobj(memobj);
struct mcs_lock_node node;
int error;
ihk_mc_user_context_t ctx;
dkprintf("fileobj_ref(%p %lx):\n", obj, obj->handle);
memobj_lock(&obj->memobj);
++obj->cref;
memobj_unlock(&obj->memobj);
return;
}
static void fileobj_release(struct memobj *memobj)
{
struct fileobj *obj = to_fileobj(memobj);
long free_sref = 0;
uintptr_t free_handle;
struct fileobj *free_obj = NULL;
struct mcs_rwlock_node node;
dkprintf("%s: free obj 0x%lx, %s\n", __func__,
obj, to_memobj(obj)->flags & MF_ZEROFILL ? "zerofill" : "");
dkprintf("fileobj_release(%p %lx)\n", obj, obj->handle);
mcs_lock_lock_noirq(&fileobj_list_lock, &node);
obj_list_remove(obj);
mcs_lock_unlock_noirq(&fileobj_list_lock, &node);
memobj_lock(&obj->memobj);
--obj->cref;
free_sref = obj->sref - 1; /* surplus sref */
if (obj->cref <= 0) {
free_sref = obj->sref;
free_obj = obj;
}
obj->sref -= free_sref;
free_handle = obj->handle;
memobj_unlock(&obj->memobj);
if (obj->memobj.flags & MF_HOST_RELEASED) {
free_sref = 0; // don't call syscall_generic_forwarding
}
/* zap page_list */
for (;;) {
struct page *page;
void *page_va;
uintptr_t phys;
if (free_obj) {
dkprintf("%s: release obj 0x%lx cref: %d, free_obj: 0x%lx, %s\n",
__FUNCTION__,
obj,
obj->cref,
free_obj,
to_memobj(obj)->flags & MF_ZEROFILL ? "zerofill" : "");
mcs_rwlock_writer_lock_noirq(&fileobj_list_lock, &node);
/* zap page_list */
for (;;) {
struct page *page;
void *page_va;
uintptr_t phys;
page = fileobj_page_hash_first(obj);
if (!page) {
break;
}
__fileobj_page_hash_remove(page);
phys = page_to_phys(page);
page_va = phys_to_virt(phys);
/* Count must be one because set to one on the first get_page() invoking fileobj_do_pageio and
incremented by the second get_page() reaping the pageio and decremented by clear_range().
page = fileobj_page_hash_first(obj);
if (!page) {
break;
}
__fileobj_page_hash_remove(page);
phys = page_to_phys(page);
page_va = phys_to_virt(phys);
/* Count must be one because set to one on the first
* get_page() invoking fileobj_do_pageio and incremented by
* the second get_page() reaping the pageio and decremented
* by clear_range().
*/
if (ihk_atomic_read(&page->count) != 1) {
kprintf("%s: WARNING: page count is %d for phys 0x%lx is invalid, flags: 0x%lx\n",
__func__, ihk_atomic_read(&page->count),
page->phys, to_memobj(obj)->flags);
}
else if (page_unmap(page)) {
ihk_mc_free_pages_user(page_va, 1);
/* Track change in page->count for !MF_PREMAP pages.
* It is decremented here or in clear_range()
*/
if (ihk_atomic_read(&page->count) != 1) {
kprintf("%s: WARNING: page count is %d for phys 0x%lx is invalid, flags: 0x%lx\n",
__FUNCTION__,
ihk_atomic_read(&page->count),
page->phys,
to_memobj(free_obj)->flags);
}
else if (page_unmap(page)) {
ihk_mc_free_pages_user(page_va, 1);
/* Track change in page->count for !MF_PREMAP pages. It is decremented here or in clear_range() */
dkprintf("%lx-,%s: calling memory_stat_rss_sub(),phys=%lx,size=%ld,pgsize=%ld\n", phys, __FUNCTION__, phys, PAGE_SIZE, PAGE_SIZE);
rusage_memory_stat_mapped_file_sub(PAGE_SIZE, PAGE_SIZE);
}
#if 0
count = ihk_atomic_sub_return(1, &page->count);
if (!((page->mode == PM_WILL_PAGEIO)
|| (page->mode == PM_DONE_PAGEIO)
|| (page->mode == PM_PAGEIO_EOF)
|| (page->mode == PM_PAGEIO_ERROR)
|| ((page->mode == PM_MAPPED)
&& (count <= 0)))) {
kprintf("fileobj_release(%p %lx): "
"mode %x, count %d, off %lx\n",
obj, obj->handle, page->mode,
count, page->offset);
panic("fileobj_release");
}
page->mode = PM_NONE;
#endif
}
/* Pre-mapped? */
if (to_memobj(free_obj)->flags & MF_PREMAP) {
int i;
for (i = 0; i < to_memobj(free_obj)->nr_pages; ++i) {
if (to_memobj(free_obj)->pages[i]) {
dkprintf("%s: pages[i]=%p\n", __FUNCTION__, i, to_memobj(free_obj)->pages[i]);
// Track change in fileobj->pages[] for MF_PREMAP pages
// Note that page_unmap() isn't called for MF_PREMAP in
// free_process_memory_range() --> ihk_mc_pt_free_range()
dkprintf("%lx-,%s: memory_stat_rss_sub,phys=%lx,size=%ld,pgsize=%ld\n",
virt_to_phys(to_memobj(free_obj)->pages[i]), __FUNCTION__, virt_to_phys(to_memobj(free_obj)->pages[i]), PAGE_SIZE, PAGE_SIZE);
rusage_memory_stat_mapped_file_sub(PAGE_SIZE, PAGE_SIZE);
ihk_mc_free_pages_user(to_memobj(free_obj)->pages[i], 1);
}
}
kfree(to_memobj(free_obj)->pages);
}
obj_list_remove(free_obj);
mcs_rwlock_writer_unlock_noirq(&fileobj_list_lock, &node);
kfree(free_obj);
}
if (free_sref) {
int error;
ihk_mc_user_context_t ctx;
ihk_mc_syscall_arg0(&ctx) = PAGER_REQ_RELEASE;
ihk_mc_syscall_arg1(&ctx) = free_handle;
ihk_mc_syscall_arg2(&ctx) = free_sref;
error = syscall_generic_forwarding(__NR_mmap, &ctx);
if (error) {
kprintf("fileobj_release(%p %lx):"
"release %ld failed. %d\n",
obj, free_handle, free_sref, error);
/* through */
dkprintf("%lx-,%s: calling memory_stat_rss_sub(),phys=%lx,size=%ld,pgsize=%ld\n",
phys, __func__, phys, PAGE_SIZE, PAGE_SIZE);
rusage_memory_stat_mapped_file_sub(PAGE_SIZE,
PAGE_SIZE);
}
}
dkprintf("fileobj_release(%p %lx):free %ld %p\n",
obj, free_handle, free_sref, free_obj);
/* Pre-mapped? */
if (to_memobj(obj)->flags & MF_PREMAP) {
int i;
for (i = 0; i < to_memobj(obj)->nr_pages; ++i) {
if (to_memobj(obj)->pages[i]) {
dkprintf("%s: pages[%d]=%p\n", __func__, i,
to_memobj(obj)->pages[i]);
// Track change in fileobj->pages[] for MF_PREMAP pages
// Note that page_unmap() isn't called for MF_PREMAP in
// free_process_memory_range() --> ihk_mc_pt_free_range()
dkprintf("%lx-,%s: memory_stat_rss_sub,phys=%lx,size=%ld,pgsize=%ld\n",
virt_to_phys(to_memobj(obj)->pages[i]),
__func__,
virt_to_phys(to_memobj(obj)->pages[i]),
PAGE_SIZE, PAGE_SIZE);
rusage_memory_stat_mapped_file_sub(PAGE_SIZE,
PAGE_SIZE);
ihk_mc_free_pages_user(to_memobj(obj)->pages[i],
1);
}
}
kfree(to_memobj(obj)->pages);
}
if (to_memobj(obj)->path) {
dkprintf("%s: %s\n", __func__, to_memobj(obj)->path);
kfree(to_memobj(obj)->path);
}
/* linux side
* sref is necessary because handle is used as key, so there could
* be a new mckernel pager with the same handle being created as
* this one is being destroyed
*/
ihk_mc_syscall_arg0(&ctx) = PAGER_REQ_RELEASE;
ihk_mc_syscall_arg1(&ctx) = obj->handle;
ihk_mc_syscall_arg2(&ctx) = obj->sref;
error = syscall_generic_forwarding(__NR_mmap, &ctx);
if (error) {
kprintf("%s(%p %lx): free failed. %d\n", __func__,
obj, obj->handle, error);
/* through */
}
dkprintf("%s(%p %lx):free\n", __func__, obj, obj->handle);
kfree(obj);
return;
}
struct pageio_args {
@ -481,10 +468,10 @@ static void fileobj_do_pageio(void *args0)
struct page *page;
ihk_mc_user_context_t ctx;
ssize_t ss;
struct mcs_rwlock_node mcs_node;
struct mcs_lock_node mcs_node;
int hash = (off >> PAGE_SHIFT) & FILEOBJ_PAGE_HASH_MASK;
mcs_rwlock_writer_lock_noirq(&obj->page_hash_locks[hash],
mcs_lock_lock_noirq(&obj->page_hash_locks[hash],
&mcs_node);
page = __fileobj_page_hash_lookup(obj, hash, off);
if (!page) {
@ -492,10 +479,10 @@ static void fileobj_do_pageio(void *args0)
}
while (page->mode == PM_PAGEIO) {
mcs_rwlock_writer_unlock_noirq(&obj->page_hash_locks[hash],
mcs_lock_unlock_noirq(&obj->page_hash_locks[hash],
&mcs_node);
cpu_pause();
mcs_rwlock_writer_lock_noirq(&obj->page_hash_locks[hash],
mcs_lock_lock_noirq(&obj->page_hash_locks[hash],
&mcs_node);
}
@ -509,7 +496,7 @@ static void fileobj_do_pageio(void *args0)
}
else {
page->mode = PM_PAGEIO;
mcs_rwlock_writer_unlock_noirq(&obj->page_hash_locks[hash],
mcs_lock_unlock_noirq(&obj->page_hash_locks[hash],
&mcs_node);
ihk_mc_syscall_arg0(&ctx) = PAGER_REQ_READ;
@ -522,7 +509,7 @@ static void fileobj_do_pageio(void *args0)
__FUNCTION__, obj->handle);
ss = syscall_generic_forwarding(__NR_mmap, &ctx);
mcs_rwlock_writer_lock_noirq(&obj->page_hash_locks[hash],
mcs_lock_lock_noirq(&obj->page_hash_locks[hash],
&mcs_node);
if (page->mode != PM_PAGEIO) {
kprintf("fileobj_do_pageio(%p,%lx,%lx):"
@ -549,9 +536,9 @@ static void fileobj_do_pageio(void *args0)
page->mode = PM_DONE_PAGEIO;
}
out:
mcs_rwlock_writer_unlock_noirq(&obj->page_hash_locks[hash],
mcs_lock_unlock_noirq(&obj->page_hash_locks[hash],
&mcs_node);
fileobj_release(&obj->memobj); /* got fileobj_get_page() */
memobj_unref(&obj->memobj); /* got fileobj_get_page() */
kfree(args0);
dkprintf("fileobj_do_pageio(%p,%lx,%lx):\n", obj, off, pgsize);
return;
@ -568,7 +555,7 @@ static int fileobj_get_page(struct memobj *memobj, off_t off,
uintptr_t phys = -1;
struct page *page;
struct pageio_args *args = NULL;
struct mcs_rwlock_node mcs_node;
struct mcs_lock_node mcs_node;
int hash = (off >> PAGE_SHIFT) & FILEOBJ_PAGE_HASH_MASK;
dkprintf("fileobj_get_page(%p,%lx,%x,%x,%p)\n", obj, off, p2align, virt_addr, physp);
@ -619,7 +606,7 @@ static int fileobj_get_page(struct memobj *memobj, off_t off,
goto out_nolock;
}
mcs_rwlock_writer_lock_noirq(&obj->page_hash_locks[hash],
mcs_lock_lock_noirq(&obj->page_hash_locks[hash],
&mcs_node);
page = __fileobj_page_hash_lookup(obj, hash, off);
if (!page || (page->mode == PM_WILL_PAGEIO)
@ -637,7 +624,9 @@ static int fileobj_get_page(struct memobj *memobj, off_t off,
npages = 1 << p2align;
virt = ihk_mc_alloc_pages_user(npages, (IHK_MC_AP_NOWAIT |
(to_memobj(obj)->flags & MF_ZEROFILL) ? IHK_MC_AP_USER : 0), virt_addr);
((to_memobj(obj)->flags & MF_ZEROFILL) ?
IHK_MC_AP_USER : 0)),
virt_addr);
if (!virt) {
error = -ENOMEM;
kprintf("fileobj_get_page(%p,%lx,%x,%x,%p):"
@ -662,9 +651,7 @@ static int fileobj_get_page(struct memobj *memobj, off_t off,
page->mode = PM_WILL_PAGEIO;
}
memobj_lock(&obj->memobj);
++obj->cref; /* for fileobj_do_pageio() */
memobj_unlock(&obj->memobj);
memobj_ref(&obj->memobj);
args->fileobj = obj;
args->objoff = off;
@ -698,7 +685,7 @@ static int fileobj_get_page(struct memobj *memobj, off_t off,
*physp = page_to_phys(page);
virt = NULL;
out:
mcs_rwlock_writer_unlock_noirq(&obj->page_hash_locks[hash],
mcs_lock_unlock_noirq(&obj->page_hash_locks[hash],
&mcs_node);
out_nolock:
if (virt) {
@ -725,10 +712,6 @@ static int fileobj_flush_page(struct memobj *memobj, uintptr_t phys,
return 0;
}
if (memobj->flags & MF_HOST_RELEASED) {
return 0;
}
page = phys_to_page(phys);
if (!page) {
kprintf("%s: warning: tried to flush non-existing page for phys addr: 0x%lx\n",
@ -736,8 +719,6 @@ static int fileobj_flush_page(struct memobj *memobj, uintptr_t phys,
return 0;
}
memobj_unlock(&obj->memobj);
ihk_mc_syscall_arg0(&ctx) = PAGER_REQ_WRITE;
ihk_mc_syscall_arg1(&ctx) = obj->handle;
ihk_mc_syscall_arg2(&ctx) = page->offset;
@ -752,7 +733,6 @@ static int fileobj_flush_page(struct memobj *memobj, uintptr_t phys,
/* through */
}
memobj_lock(&obj->memobj);
return 0;
}
@ -775,7 +755,7 @@ static int fileobj_lookup_page(struct memobj *memobj, off_t off,
struct fileobj *obj = to_fileobj(memobj);
int error = -1;
struct page *page;
struct mcs_rwlock_node mcs_node;
struct mcs_lock_node mcs_node;
int hash = (off >> PAGE_SHIFT) & FILEOBJ_PAGE_HASH_MASK;
dkprintf("fileobj_lookup_page(%p,%lx,%x,%p)\n", obj, off, p2align, physp);
@ -784,7 +764,7 @@ static int fileobj_lookup_page(struct memobj *memobj, off_t off,
return -ENOMEM;
}
mcs_rwlock_reader_lock_noirq(&obj->page_hash_locks[hash],
mcs_lock_lock_noirq(&obj->page_hash_locks[hash],
&mcs_node);
page = __fileobj_page_hash_lookup(obj, hash, off);
@ -796,7 +776,7 @@ static int fileobj_lookup_page(struct memobj *memobj, off_t off,
error = 0;
out:
mcs_rwlock_reader_unlock_noirq(&obj->page_hash_locks[hash],
mcs_lock_unlock_noirq(&obj->page_hash_locks[hash],
&mcs_node);
dkprintf("fileobj_lookup_page(%p,%lx,%x,%p): %d \n",
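
Review note: obj_list_lookup() in the fileobj hunks above takes a reference first and then backs out if the object was already on its way to fileobj_free(). The sketch below shows that pattern with C11 atomics. It assumes memobj_ref() returns the count after the increment, which is inferred from the `<= 1` check and not confirmed by this diff; `struct ex_obj` and `ex_try_get` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an object with an atomic reference count. */
struct ex_obj {
	atomic_int refcnt;
};

/*
 * Try to take a reference on an object found in a shared list.
 * If the count was already zero, the last holder has dropped it and
 * the object is about to be freed, so undo the increment and fail.
 */
static bool ex_try_get(struct ex_obj *o)
{
	/* atomic_fetch_add returns the old value; +1 gives the new one. */
	int new_count = atomic_fetch_add(&o->refcnt, 1) + 1;

	if (new_count <= 1) {
		atomic_fetch_sub(&o->refcnt, 1);
		return false;
	}
	return true;
}
```

In the diff the list lock is held around the lookup, so a failed attempt only tells the caller to skip the dying entry and continue, after which a new object is allocated for the same handle.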

@ -70,15 +70,22 @@
#include <cls.h>
#include <kmsg.h>
#include <timer.h>
#include <debug.h>
#include <syscall.h>
//#define DEBUG_PRINT_FUTEX
#ifdef DEBUG_PRINT_FUTEX
#define dkprintf kprintf
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#define uti_dkprintf(...) do { ((clv_override && linux_printk) ? (*linux_printk) : kprintf)(__VA_ARGS__); } while (0)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define uti_dkprintf(...) do { } while (0)
#endif
#define uti_kprintf(...) do { ((clv_override && linux_printk) ? (*linux_printk) : kprintf)(__VA_ARGS__); } while (0)
unsigned long ihk_mc_get_ns_per_tsc(void);
int futex_cmpxchg_enabled;
/**
@ -108,6 +115,9 @@ struct futex_q {
union futex_key key;
union futex_key *requeue_pi_key;
uint32_t bitset;
/* Used to wake-up a thread running on a Linux CPU */
void *uti_futex_resp;
};
/*
@ -180,11 +190,12 @@ static void drop_futex_key_refs(union futex_key *key)
* lock_page() might sleep, the caller should not hold a spinlock.
*/
static int
get_futex_key(uint32_t *uaddr, int fshared, union futex_key *key)
get_futex_key(uint32_t *uaddr, int fshared, union futex_key *key, struct cpu_local_var *clv_override)
{
unsigned long address = (unsigned long)uaddr;
unsigned long phys;
struct process_vm *mm = cpu_local_var(current)->vm;
struct thread *thread = cpu_local_var_with_override(current, clv_override);
struct process_vm *mm = thread->vm;
/*
* The futex address must be "naturally" aligned.
@ -250,7 +261,7 @@ static int cmpxchg_futex_value_locked(uint32_t __user *uaddr, uint32_t uval, uin
* The hash bucket lock must be held when this is called.
* Afterwards, the futex_q must not be accessed.
*/
static void wake_futex(struct futex_q *q)
static void wake_futex(struct futex_q *q, struct cpu_local_var *clv_override)
{
struct thread *p = q->task;
@ -272,8 +283,31 @@ static void wake_futex(struct futex_q *q)
barrier();
q->lock_ptr = NULL;
dkprintf("wake_futex(): waking up tid %d\n", p->tid);
sched_wakeup_thread(p, PS_NORMAL);
if (q->uti_futex_resp) {
int rc;
uti_dkprintf("wake_futex(): waking up migrated-to-Linux thread (tid %d),uti_futex_resp=%p\n", p->tid, q->uti_futex_resp);
/* TODO: Handle the case where a Linux thread wakes up another Linux thread */
if (clv_override) {
uti_dkprintf("%s: ERROR: A Linux thread is waking up migrated-to-Linux thread\n", __FUNCTION__);
}
if (p->spin_sleep == 0) {
uti_dkprintf("%s: INFO: woken up by someone else\n", __FUNCTION__);
}
struct ikc_scd_packet pckt;
struct ihk_ikc_channel_desc *resp_channel = cpu_local_var_with_override(ikc2linux, clv_override);
pckt.msg = SCD_MSG_FUTEX_WAKE;
pckt.futex.resp = q->uti_futex_resp;
pckt.futex.spin_sleep = &p->spin_sleep;
rc = ihk_ikc_send(resp_channel, &pckt, 0);
if (rc) {
uti_dkprintf("%s: ERROR: ihk_ikc_send returned %d, resp_channel=%p\n", __FUNCTION__, rc, resp_channel);
}
} else {
uti_dkprintf("wake_futex(): waking up McKernel thread (tid %d)\n", p->tid);
sched_wakeup_thread(p, PS_NORMAL);
}
}
/*
@ -303,7 +337,7 @@ double_unlock_hb(struct futex_hash_bucket *hb1, struct futex_hash_bucket *hb2)
/*
* Wake up waiters matching bitset queued on this futex (uaddr).
*/
static int futex_wake(uint32_t *uaddr, int fshared, int nr_wake, uint32_t bitset)
static int futex_wake(uint32_t *uaddr, int fshared, int nr_wake, uint32_t bitset, struct cpu_local_var *clv_override)
{
struct futex_hash_bucket *hb;
struct futex_q *this, *next;
@ -314,7 +348,7 @@ static int futex_wake(uint32_t *uaddr, int fshared, int nr_wake, uint32_t bitset
if (!bitset)
return -EINVAL;
ret = get_futex_key(uaddr, fshared, &key);
ret = get_futex_key(uaddr, fshared, &key, clv_override);
if ((ret != 0))
goto out;
@ -330,7 +364,7 @@ static int futex_wake(uint32_t *uaddr, int fshared, int nr_wake, uint32_t bitset
if (!(this->bitset & bitset))
continue;
wake_futex(this);
wake_futex(this, clv_override);
if (++ret >= nr_wake)
break;
}
@ -348,7 +382,8 @@ out:
*/
static int
futex_wake_op(uint32_t *uaddr1, int fshared, uint32_t *uaddr2,
int nr_wake, int nr_wake2, int op)
int nr_wake, int nr_wake2, int op,
struct cpu_local_var *clv_override)
{
union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
struct futex_hash_bucket *hb1, *hb2;
@ -357,10 +392,10 @@ futex_wake_op(uint32_t *uaddr1, int fshared, uint32_t *uaddr2,
int ret, op_ret;
retry:
ret = get_futex_key(uaddr1, fshared, &key1);
ret = get_futex_key(uaddr1, fshared, &key1, clv_override);
if ((ret != 0))
goto out;
ret = get_futex_key(uaddr2, fshared, &key2);
ret = get_futex_key(uaddr2, fshared, &key2, clv_override);
if ((ret != 0))
goto out_put_key1;
@ -394,7 +429,7 @@ retry_private:
plist_for_each_entry_safe(this, next, head, list) {
if (match_futex (&this->key, &key1)) {
wake_futex(this);
wake_futex(this, clv_override);
if (++ret >= nr_wake)
break;
}
@ -406,7 +441,7 @@ retry_private:
op_ret = 0;
plist_for_each_entry_safe(this, next, head, list) {
if (match_futex (&this->key, &key2)) {
wake_futex(this);
wake_futex(this, clv_override);
if (++op_ret >= nr_wake2)
break;
}
@ -469,7 +504,7 @@ void requeue_futex(struct futex_q *q, struct futex_hash_bucket *hb1,
*/
static int futex_requeue(uint32_t *uaddr1, int fshared, uint32_t *uaddr2,
int nr_wake, int nr_requeue, uint32_t *cmpval,
int requeue_pi)
int requeue_pi, struct cpu_local_var *clv_override)
{
union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
int drop_count = 0, task_count = 0, ret;
@ -477,10 +512,10 @@ static int futex_requeue(uint32_t *uaddr1, int fshared, uint32_t *uaddr2,
struct plist_head *head1;
struct futex_q *this, *next;
ret = get_futex_key(uaddr1, fshared, &key1);
ret = get_futex_key(uaddr1, fshared, &key1, clv_override);
if ((ret != 0))
goto out;
ret = get_futex_key(uaddr2, fshared, &key2);
ret = get_futex_key(uaddr2, fshared, &key2, clv_override);
if ((ret != 0))
goto out_put_key1;
@ -515,7 +550,7 @@ static int futex_requeue(uint32_t *uaddr1, int fshared, uint32_t *uaddr2,
*/
/* RIKEN: no requeue_pi at this moment */
if (++task_count <= nr_wake) {
wake_futex(this);
wake_futex(this, clv_override);
continue;
}
@ -574,7 +609,7 @@ queue_unlock(struct futex_q *q, struct futex_hash_bucket *hb)
* state is implicit in the state of woken task (see futex_wait_requeue_pi() for
* an example).
*/
static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb)
static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb, struct cpu_local_var *clv_override)
{
int prio;
@ -595,7 +630,7 @@ static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb)
q->list.plist.spinlock = &hb->lock;
#endif
plist_add(&q->list, &hb->chain);
q->task = cpu_local_var(current);
q->task = cpu_local_var_with_override(current, clv_override);
ihk_mc_spinlock_unlock_noirq(&hb->lock);
}
@ -658,46 +693,64 @@ retry:
/* RIKEN: this function has been rewritten so that it returns the remaining
* time in case we are woken.
*/
static uint64_t futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
uint64_t timeout)
static int64_t futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
uint64_t timeout, struct cpu_local_var *clv_override)
{
uint64_t time_remain = 0;
int64_t time_remain = 0;
unsigned long irqstate;
struct thread *thread = cpu_local_var(current);
struct thread *thread = cpu_local_var_with_override(current, clv_override);
/*
* The task state is guaranteed to be set before another task can
* wake it.
* queue_me() calls spin_unlock() upon completion, serializing
* access to the hash list and forcing a memory barrier.
*/
xchg4(&(cpu_local_var(current)->status), PS_INTERRUPTIBLE);
xchg4(&(thread->status), PS_INTERRUPTIBLE);
/* Indicate spin sleep */
if (!idle_halt) {
/* Indicate spin sleep. Note that schedule_timeout() with
* idle_halt should use spin sleep because sleep with timeout
* is not implemented.
*/
if (!idle_halt || timeout) {
irqstate = ihk_mc_spinlock_lock(&thread->spin_sleep_lock);
thread->spin_sleep = 1;
ihk_mc_spinlock_unlock(&thread->spin_sleep_lock, irqstate);
}
queue_me(q, hb);
queue_me(q, hb, clv_override);
if (!plist_node_empty(&q->list)) {
if (clv_override) {
uti_dkprintf("%s: tid: %d is trying to sleep\n", __FUNCTION__, thread->tid);
/* Note that the unit of timeout is nsec */
time_remain = (*linux_wait_event)(q->uti_futex_resp, timeout);
/* Note that time_remain == 0 indicates that the condition evaluated to false after the timeout elapsed */
if (time_remain < 0) {
if (time_remain == -ERESTARTSYS) { /* Interrupted by signal */
uti_dkprintf("%s: DEBUG: wait_event returned -ERESTARTSYS\n", __FUNCTION__);
} else {
uti_kprintf("%s: ERROR: wait_event returned %ld\n", __FUNCTION__, time_remain);
}
}
uti_dkprintf("%s: tid: %d woken up\n", __FUNCTION__, thread->tid);
} else {
if (timeout) {
dkprintf("futex_wait_queue_me(): tid: %d schedule_timeout()\n", cpu_local_var(current)->tid);
dkprintf("futex_wait_queue_me(): tid: %d schedule_timeout()\n", thread->tid);
time_remain = schedule_timeout(timeout);
}
else {
dkprintf("futex_wait_queue_me(): tid: %d schedule()\n", cpu_local_var(current)->tid);
dkprintf("futex_wait_queue_me(): tid: %d schedule()\n", thread->tid);
spin_sleep_or_schedule();
time_remain = 0;
}
dkprintf("futex_wait_queue_me(): tid: %d woken up\n", cpu_local_var(current)->tid);
dkprintf("futex_wait_queue_me(): tid: %d woken up\n", thread->tid);
}
}
/* This does not need to be serialized */
cpu_local_var(current)->status = PS_RUNNING;
thread->status = PS_RUNNING;
thread->spin_sleep = 0;
return time_remain;
@ -721,7 +774,8 @@ static uint64_t futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q
* <1 - -EFAULT or -EWOULDBLOCK (uaddr does not contain val) and hb is unlocked
*/
static int futex_wait_setup(uint32_t __user *uaddr, uint32_t val, int fshared,
struct futex_q *q, struct futex_hash_bucket **hb)
struct futex_q *q, struct futex_hash_bucket **hb,
struct cpu_local_var *clv_override)
{
uint32_t uval;
int ret;
@ -744,7 +798,7 @@ static int futex_wait_setup(uint32_t __user *uaddr, uint32_t val, int fshared,
* rare, but normal.
*/
q->key = FUTEX_KEY_INIT;
ret = get_futex_key(uaddr, fshared, &q->key);
ret = get_futex_key(uaddr, fshared, &q->key, clv_override);
if (ret != 0)
return ret;
@ -768,49 +822,59 @@ static int futex_wait_setup(uint32_t __user *uaddr, uint32_t val, int fshared,
}
static int futex_wait(uint32_t __user *uaddr, int fshared,
uint32_t val, uint64_t timeout, uint32_t bitset, int clockrt)
uint32_t val, uint64_t timeout, uint32_t bitset, int clockrt,
struct cpu_local_var *clv_override)
{
struct futex_hash_bucket *hb;
struct futex_q q;
uint64_t time_remain;
int64_t time_remain;
int ret;
if (!bitset)
return -EINVAL;
#ifdef PROFILE_ENABLE
if (cpu_local_var(current)->profile &&
cpu_local_var(current)->profile_start_ts) {
cpu_local_var(current)->profile_elapsed_ts +=
(rdtsc() - cpu_local_var(current)->profile_start_ts);
cpu_local_var(current)->profile_start_ts = 0;
if (cpu_local_var_with_override(current, clv_override)->profile &&
cpu_local_var_with_override(current, clv_override)->profile_start_ts) {
cpu_local_var_with_override(current, clv_override)->profile_elapsed_ts +=
(rdtsc() - cpu_local_var_with_override(current, clv_override)->profile_start_ts);
cpu_local_var_with_override(current, clv_override)->profile_start_ts = 0;
}
#endif
q.bitset = bitset;
q.requeue_pi_key = NULL;
q.uti_futex_resp = cpu_local_var_with_override(uti_futex_resp, clv_override);
retry:
/* Prepare to wait on uaddr. */
ret = futex_wait_setup(uaddr, val, fshared, &q, &hb);
if (ret)
ret = futex_wait_setup(uaddr, val, fshared, &q, &hb, clv_override);
if (ret) {
uti_dkprintf("%s: tid=%d futex_wait_setup returned non-zero, no need to sleep\n", __FUNCTION__, cpu_local_var_with_override(current, clv_override)->tid);
goto out;
}
/* queue_me and wait for wakeup, timeout, or a signal. */
time_remain = futex_wait_queue_me(hb, &q, timeout);
time_remain = futex_wait_queue_me(hb, &q, timeout, clv_override);
/* If we were woken (and unqueued), we succeeded, whatever. */
ret = 0;
if (!unqueue_me(&q))
if (!unqueue_me(&q)) {
uti_dkprintf("%s: tid=%d unqueued\n", __FUNCTION__, cpu_local_var_with_override(current, clv_override)->tid);
goto out_put_key;
}
ret = -ETIMEDOUT;
/* RIKEN: timer expired case (indicated by !time_remain) */
if (timeout && !time_remain)
if (timeout && !time_remain) {
uti_dkprintf("%s: tid=%d timer expired\n", __FUNCTION__, cpu_local_var_with_override(current, clv_override)->tid);
goto out_put_key;
}
if (hassigpending(cpu_local_var(current))) {
/* RIKEN: futex_wait_queue_me() returns -ERESTARTSYS when waiting on Linux CPU and woken up by signal */
if (hassigpending(cpu_local_var_with_override(current, clv_override)) || time_remain == -ERESTARTSYS) {
ret = -EINTR;
uti_dkprintf("%s: tid=%d woken up by signal\n", __FUNCTION__, cpu_local_var_with_override(current, clv_override)->tid);
goto out_put_key;
}
@ -822,19 +886,22 @@ out_put_key:
put_futex_key(fshared, &q.key);
out:
#ifdef PROFILE_ENABLE
if (cpu_local_var(current)->profile) {
cpu_local_var(current)->profile_start_ts = rdtsc();
if (cpu_local_var_with_override(current, clv_override)->profile) {
cpu_local_var_with_override(current, clv_override)->profile_start_ts = rdtsc();
}
#endif
return ret;
}
int futex(uint32_t *uaddr, int op, uint32_t val, uint64_t timeout,
uint32_t *uaddr2, uint32_t val2, uint32_t val3, int fshared)
uint32_t *uaddr2, uint32_t val2, uint32_t val3, int fshared,
struct cpu_local_var *clv_override)
{
int clockrt, ret = -ENOSYS;
int cmd = op & FUTEX_CMD_MASK;
uti_dkprintf("%s: uaddr=%p, op=%x, val=%x, timeout=%ld, uaddr2=%p, val2=%x, val3=%x, fshared=%d, clv=%p\n", __FUNCTION__, uaddr, op, val, timeout, uaddr2, val2, val3, fshared, clv_override);
clockrt = op & FUTEX_CLOCK_REALTIME;
if (clockrt && cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI)
return -ENOSYS;
@ -843,21 +910,21 @@ int futex(uint32_t *uaddr, int op, uint32_t val, uint64_t timeout,
case FUTEX_WAIT:
val3 = FUTEX_BITSET_MATCH_ANY;
case FUTEX_WAIT_BITSET:
ret = futex_wait(uaddr, fshared, val, timeout, val3, clockrt);
ret = futex_wait(uaddr, fshared, val, timeout, val3, clockrt, clv_override);
break;
case FUTEX_WAKE:
val3 = FUTEX_BITSET_MATCH_ANY;
case FUTEX_WAKE_BITSET:
ret = futex_wake(uaddr, fshared, val, val3);
ret = futex_wake(uaddr, fshared, val, val3, clv_override);
break;
case FUTEX_REQUEUE:
ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, NULL, 0);
ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, NULL, 0, clv_override);
break;
case FUTEX_CMP_REQUEUE:
ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, &val3, 0);
ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, &val3, 0, clv_override);
break;
case FUTEX_WAKE_OP:
ret = futex_wake_op(uaddr, fshared, uaddr2, val, val2, val3);
ret = futex_wake_op(uaddr, fshared, uaddr2, val, val2, val3, clv_override);
break;
/* RIKEN: these calls are not supported for now.
case FUTEX_LOCK_PI:


@ -34,13 +34,13 @@
#include <sysfs.h>
#include <ihk/perfctr.h>
#include <rusage_private.h>
#include <debug.h>
//#define DEBUG_PRINT_HOST
#ifdef DEBUG_PRINT_HOST
#define dkprintf kprintf
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
/* Linux channel table, indexed by Linux CPU id */
@ -78,7 +78,6 @@ int prepare_process_ranges_args_envs(struct thread *thread,
unsigned long args_envs_p, args_envs_rp;
unsigned long s, e, up;
char **argv;
char **a;
int i, n, argc, envc, args_envs_npages;
char **env;
int range_npages;
@ -306,7 +305,7 @@ int prepare_process_ranges_args_envs(struct thread *thread,
/* Only unmap remote address if it wasn't specified as an argument */
if (!args) {
ihk_mc_unmap_virtual(args_envs_r, args_envs_npages, 0);
ihk_mc_unmap_virtual(args_envs_r, args_envs_npages);
ihk_mc_unmap_memory(NULL, args_envs_rp, p->args_len);
}
flush_tlb();
@ -341,7 +340,7 @@ int prepare_process_ranges_args_envs(struct thread *thread,
/* Only map remote address if it wasn't specified as an argument */
if (!envs) {
ihk_mc_unmap_virtual(args_envs_r, args_envs_npages, 0);
ihk_mc_unmap_virtual(args_envs_r, args_envs_npages);
ihk_mc_unmap_memory(NULL, args_envs_rp, p->envs_len);
}
flush_tlb();
@ -357,31 +356,32 @@ int prepare_process_ranges_args_envs(struct thread *thread,
proc->saved_cmdline_len = 0;
}
proc->saved_cmdline = kmalloc(p->args_len, IHK_MC_AP_NOWAIT);
proc->saved_cmdline_len = p->args_len - ((argc + 2) * sizeof(char **));
proc->saved_cmdline = kmalloc(proc->saved_cmdline_len,
IHK_MC_AP_NOWAIT);
if (!proc->saved_cmdline) {
goto err;
}
proc->saved_cmdline_len = p->args_len - ((argc + 1) * sizeof(char **));
memcpy(proc->saved_cmdline,
(char *)args_envs + ((argc + 1) * sizeof(char **)),
(char *)args_envs + ((argc + 2) * sizeof(char **)),
proc->saved_cmdline_len);
dkprintf("%s: saved_cmdline: %s\n",
__FUNCTION__,
proc->saved_cmdline);
for (a = argv; *a; a++) {
*a = (char *)addr + (unsigned long)*a; // Process' address space!
for (i = 0; i < argc; i++) {
// Process' address space!
argv[i] = (char *)addr + (unsigned long)argv[i];
}
envc = *((long *)(args_envs + p->args_len));
dkprintf("envc: %d\n", envc);
env = (char **)(args_envs + p->args_len + sizeof(long));
while (*env) {
char **_env = env;
//dkprintf("%s\n", args_envs + p->args_len + (unsigned long)*env);
*env = (char *)addr + p->args_len + (unsigned long)*env;
env = ++_env;
for (i = 0; i < envc; i++) {
env[i] = addr + p->args_len + env[i];
}
env = (char **)(args_envs + p->args_len + sizeof(long));
dkprintf("env OK\n");
@ -446,7 +446,7 @@ static int process_msg_prepare_process(unsigned long rphys)
if((pn = kmalloc(sizeof(struct program_load_desc)
+ sizeof(struct program_image_section) * n,
IHK_MC_AP_NOWAIT)) == NULL){
ihk_mc_unmap_virtual(p, npages, 0);
ihk_mc_unmap_virtual(p, npages);
ihk_mc_unmap_memory(NULL, phys, sz);
return -ENOMEM;
}
@ -457,7 +457,7 @@ static int process_msg_prepare_process(unsigned long rphys)
(unsigned long *)&p->cpu_set,
sizeof(p->cpu_set))) == NULL) {
kfree(pn);
ihk_mc_unmap_virtual(p, npages, 1);
ihk_mc_unmap_virtual(p, npages);
ihk_mc_unmap_memory(NULL, phys, sz);
return -ENOMEM;
}
@ -479,6 +479,7 @@ static int process_msg_prepare_process(unsigned long rphys)
proc->mpol_flags = pn->mpol_flags;
proc->mpol_threshold = pn->mpol_threshold;
proc->nr_processes = pn->nr_processes;
proc->process_rank = pn->process_rank;
proc->heap_extension = pn->heap_extension;
/* Update NUMA binding policy if requested */
@ -501,6 +502,9 @@ static int process_msg_prepare_process(unsigned long rphys)
vm->numa_mem_policy = MPOL_BIND;
}
proc->uti_thread_rank = pn->uti_thread_rank;
proc->uti_use_last_cpu = pn->uti_use_last_cpu;
#ifdef PROFILE_ENABLE
proc->profile = pn->profile;
thread->profile = pn->profile;
@ -539,14 +543,14 @@ static int process_msg_prepare_process(unsigned long rphys)
kfree(pn);
ihk_mc_unmap_virtual(p, npages, 1);
ihk_mc_unmap_virtual(p, npages);
ihk_mc_unmap_memory(NULL, phys, sz);
flush_tlb();
return 0;
err:
kfree(pn);
ihk_mc_unmap_virtual(p, npages, 1);
ihk_mc_unmap_virtual(p, npages);
ihk_mc_unmap_memory(NULL, phys, sz);
destroy_thread(thread);
return -ENOMEM;
@ -559,7 +563,6 @@ static void syscall_channel_send(struct ihk_ikc_channel_desc *c,
}
extern unsigned long do_kill(struct thread *, int, int, int, struct siginfo *, int ptracecont);
extern void process_procfs_request(struct ikc_scd_packet *rpacket);
extern void terminate_host(int pid);
extern void debug_log(long);
@ -570,7 +573,6 @@ static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
struct ikc_scd_packet pckt;
struct ihk_ikc_channel_desc *resp_channel = cpu_local_var(ikc2linux);
int rc;
struct mcs_rwlock_node_irqsave lock;
struct thread *thread;
struct process *proc;
struct mcctrl_signal {
@ -594,14 +596,9 @@ static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
case SCD_MSG_PREPARE_PROCESS:
if((rc = process_msg_prepare_process(packet->arg)) == 0){
pckt.msg = SCD_MSG_PREPARE_PROCESS_ACKED;
pckt.err = 0;
}
else{
pckt.msg = SCD_MSG_PREPARE_PROCESS_NACKED;
pckt.err = rc;
}
pckt.err = process_msg_prepare_process(packet->arg);
pckt.msg = SCD_MSG_PREPARE_PROCESS_ACKED;
pckt.reply = packet->reply;
pckt.ref = packet->ref;
pckt.arg = packet->arg;
syscall_channel_send(resp_channel, &pckt);
@ -612,7 +609,7 @@ static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
case SCD_MSG_SCHEDULE_PROCESS:
thread = (struct thread *)packet->arg;
cpuid = obtain_clone_cpuid(&thread->cpu_set);
cpuid = obtain_clone_cpuid(&thread->cpu_set, 0);
if (cpuid == -1) {
kprintf("No CPU available\n");
ret = -1;
@ -636,14 +633,14 @@ static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
* the waiting thread
*/
case SCD_MSG_WAKE_UP_SYSCALL_THREAD:
thread = find_thread(0, packet->ttid, &lock);
thread = find_thread(0, packet->ttid);
if (!thread) {
kprintf("%s: WARNING: no thread for SCD reply? TID: %d\n",
__FUNCTION__, packet->ttid);
ret = -EINVAL;
break;
}
thread_unlock(thread, &lock);
thread_unlock(thread);
dkprintf("%s: SCD_MSG_WAKE_UP_SYSCALL_THREAD: waking up tid %d\n",
__FUNCTION__, packet->ttid);
@ -655,12 +652,13 @@ static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
pp = ihk_mc_map_memory(NULL, packet->arg, sizeof(struct mcctrl_signal));
sp = (struct mcctrl_signal *)ihk_mc_map_virtual(pp, 1, PTATTR_WRITABLE | PTATTR_ACTIVE);
memcpy(&info, sp, sizeof(struct mcctrl_signal));
ihk_mc_unmap_virtual(sp, 1, 0);
ihk_mc_unmap_virtual(sp, 1);
ihk_mc_unmap_memory(NULL, pp, sizeof(struct mcctrl_signal));
pckt.msg = SCD_MSG_SEND_SIGNAL;
pckt.msg = SCD_MSG_SEND_SIGNAL_ACK;
pckt.err = 0;
pckt.ref = packet->ref;
pckt.arg = packet->arg;
pckt.reply = packet->reply;
syscall_channel_send(resp_channel, &pckt);
rc = do_kill(NULL, info.pid, info.tid, info.sig, &info.info, 0);
@ -669,7 +667,14 @@ static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
break;
case SCD_MSG_PROCFS_REQUEST:
process_procfs_request(packet);
case SCD_MSG_PROCFS_RELEASE:
pckt.msg = SCD_MSG_PROCFS_ANSWER;
pckt.ref = packet->ref;
pckt.arg = packet->arg;
pckt.err = process_procfs_request(packet);
pckt.reply = packet->reply;
pckt.pid = packet->pid;
syscall_channel_send(resp_channel, &pckt);
ret = 0;
break;
@ -706,17 +711,26 @@ static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
if (!pcd->exclude_user) {
mode |= PERFCTR_USER_MODE;
}
ihk_mc_perfctr_init_raw(pcd->target_cntr, pcd->config, mode);
ihk_mc_perfctr_stop(1 << pcd->target_cntr);
ihk_mc_perfctr_reset(pcd->target_cntr);
ret = ihk_mc_perfctr_init_raw(pcd->target_cntr, pcd->config, mode);
if (ret != 0) {
break;
}
ret = ihk_mc_perfctr_stop(1 << pcd->target_cntr);
if (ret != 0) {
break;
}
ret = ihk_mc_perfctr_reset(pcd->target_cntr);
break;
case PERF_CTRL_ENABLE:
ihk_mc_perfctr_start(pcd->target_cntr_mask);
ret = ihk_mc_perfctr_start(pcd->target_cntr_mask);
break;
case PERF_CTRL_DISABLE:
ihk_mc_perfctr_stop(pcd->target_cntr_mask);
ret = ihk_mc_perfctr_stop(pcd->target_cntr_mask);
break;
case PERF_CTRL_GET:
@ -727,15 +741,15 @@ static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
kprintf("%s: SCD_MSG_PERF_CTRL unexpected ctrl_type\n", __FUNCTION__);
}
ihk_mc_unmap_virtual(pcd, 1, 0);
ihk_mc_unmap_virtual(pcd, 1);
ihk_mc_unmap_memory(NULL, pp, sizeof(struct perf_ctrl_desc));
pckt.msg = SCD_MSG_PERF_ACK;
pckt.err = 0;
pckt.err = ret;
pckt.arg = packet->arg;
pckt.reply = packet->reply;
ihk_ikc_send(resp_channel, &pckt, 0);
ret = 0;
break;
case SCD_MSG_CPU_RW_REG:

303
kernel/hugefileobj.c Normal file

@ -0,0 +1,303 @@
#include <memobj.h>
#include <ihk/mm.h>
#include <kmsg.h>
#include <kmalloc.h>
#include <string.h>
#include <debug.h>
#if DEBUG_HUGEFILEOBJ
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
struct hugefilechunk {
struct list_head list;
off_t pgoff;
int npages;
void *mem;
};
struct hugefileobj {
struct memobj memobj;
size_t pgsize;
uintptr_t handle;
unsigned int pgshift;
struct list_head chunk_list;
ihk_spinlock_t chunk_lock;
struct list_head obj_list;
};
static ihk_spinlock_t hugefileobj_list_lock;
static LIST_HEAD(hugefileobj_list);
static struct hugefileobj *to_hugefileobj(struct memobj *memobj)
{
return (struct hugefileobj *)memobj;
}
static struct memobj *to_memobj(struct hugefileobj *obj)
{
return &obj->memobj;
}
static struct hugefileobj *hugefileobj_lookup(uintptr_t handle)
{
struct hugefileobj *p;
list_for_each_entry(p, &hugefileobj_list, obj_list) {
if (p->handle == handle) {
/* for the interval between last put and fileobj_free
* taking list_lock
*/
if (memobj_ref(&p->memobj) <= 1) {
ihk_atomic_dec(&p->memobj.refcnt);
continue;
}
return p;
}
}
return NULL;
}
static int hugefileobj_get_page(struct memobj *memobj, off_t off,
int p2align, uintptr_t *physp,
unsigned long *pflag, uintptr_t virt_addr)
{
struct hugefileobj *obj = to_hugefileobj(memobj);
struct hugefilechunk *chunk;
off_t pgoff;
if (p2align != obj->pgshift - PTL1_SHIFT) {
kprintf("%s: p2align %d but expected %d\n",
__func__, p2align, obj->pgshift - PTL1_SHIFT);
return -ENOMEM;
}
pgoff = off >> obj->pgshift;
ihk_mc_spinlock_lock_noirq(&obj->chunk_lock);
list_for_each_entry(chunk, &obj->chunk_list, list) {
if (pgoff >= chunk->pgoff + chunk->npages)
continue;
if (pgoff >= chunk->pgoff)
break;
kprintf("%s: no segment found for pgoff %lx (obj %p)\n",
__func__, pgoff, obj);
chunk = NULL;
break;
}
ihk_mc_spinlock_unlock_noirq(&obj->chunk_lock);
if (!chunk)
return -EIO;
*physp = virt_to_phys(chunk->mem + (off - chunk->pgoff * PAGE_SIZE));
return 0;
}
static void hugefileobj_free(struct memobj *memobj)
{
struct hugefileobj *obj = to_hugefileobj(memobj);
struct hugefilechunk *chunk, *next;
dkprintf("Destroying hugefileobj %p\n", memobj);
ihk_mc_spinlock_lock_noirq(&hugefileobj_list_lock);
list_del(&obj->obj_list);
ihk_mc_spinlock_unlock_noirq(&hugefileobj_list_lock);
kfree(memobj->path);
/* don't bother with chunk_lock, memobj refcounting makes this safe */
list_for_each_entry_safe(chunk, next, &obj->chunk_list, list) {
ihk_mc_free_pages_user(chunk->mem, chunk->npages);
kfree(chunk);
}
kfree(memobj);
}
struct memobj_ops hugefileobj_ops = {
.free = hugefileobj_free,
.get_page = hugefileobj_get_page,
};
void hugefileobj_cleanup(void)
{
struct hugefileobj *obj;
int refcnt;
while (true) {
ihk_mc_spinlock_lock_noirq(&hugefileobj_list_lock);
if (list_empty(&hugefileobj_list)) {
ihk_mc_spinlock_unlock_noirq(&hugefileobj_list_lock);
break;
}
obj = list_first_entry(&hugefileobj_list, struct hugefileobj,
obj_list);
ihk_mc_spinlock_unlock_noirq(&hugefileobj_list_lock);
if ((refcnt = memobj_unref(to_memobj(obj))) != 0) {
kprintf("%s: obj %p had refcnt %d > 1, destroying anyway\n",
__func__, obj, refcnt + 1);
hugefileobj_free(to_memobj(obj));
}
}
}
int hugefileobj_pre_create(struct pager_create_result *result,
struct memobj **objp, int *maxprotp)
{
struct hugefileobj *obj;
ihk_mc_spinlock_lock_noirq(&hugefileobj_list_lock);
obj = hugefileobj_lookup(result->handle);
if (obj)
goto out_unlock;
obj = kmalloc(sizeof(*obj), IHK_MC_AP_NOWAIT);
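if (!obj) {
/* do not leak hugefileobj_list_lock on the error path */
ihk_mc_spinlock_unlock_noirq(&hugefileobj_list_lock);
return -ENOMEM;
}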
obj->handle = result->handle;
obj->pgsize = result->size;
obj->pgshift = 0;
INIT_LIST_HEAD(&obj->chunk_list);
ihk_mc_spinlock_init(&obj->chunk_lock);
obj->memobj.flags = result->flags;
obj->memobj.status = MEMOBJ_TO_BE_PREFETCHED;
obj->memobj.ops = &hugefileobj_ops;
/* keep mapping around when process is gone */
ihk_atomic_set(&obj->memobj.refcnt, 2);
if (result->path[0]) {
obj->memobj.path = kmalloc(PATH_MAX, IHK_MC_AP_NOWAIT);
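if (!obj->memobj.path) {
kfree(obj);
/* do not leak hugefileobj_list_lock on the error path */
ihk_mc_spinlock_unlock_noirq(&hugefileobj_list_lock);
return -ENOMEM;
}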
strncpy(obj->memobj.path, result->path, PATH_MAX);
}
list_add(&obj->obj_list, &hugefileobj_list);
out_unlock:
ihk_mc_spinlock_unlock_noirq(&hugefileobj_list_lock);
*maxprotp = result->maxprot;
*objp = to_memobj(obj);
return 0;
}
int hugefileobj_create(struct memobj *memobj, size_t len, off_t off,
int *pgshiftp, uintptr_t virt_addr)
{
struct hugefileobj *obj = to_hugefileobj(memobj);
struct hugefilechunk *chunk = NULL, *old_chunk = NULL;
int p2align;
unsigned int pgshift;
int npages, npages_left;
void *v;
off_t pgoff, next_pgoff;
int error;
error = arch_get_smaller_page_size(NULL, obj->pgsize + 1, NULL,
&p2align);
if (error)
return error;
pgshift = p2align + PTL1_SHIFT;
if (1 << pgshift != obj->pgsize) {
dkprintf("invalid hugefileobj pagesize: %d\n",
obj->pgsize);
return -EINVAL;
}
if (len & ((1 << pgshift) - 1)) {
dkprintf("invalid hugetlbfs mmap size %d (pagesize %d)\n",
len, 1 << pgshift);
obj->pgshift = 0;
return -EINVAL;
}
if (off & ((1 << pgshift) - 1)) {
dkprintf("invalid hugetlbfs mmap offset %d (pagesize %d)\n",
off, 1 << pgshift);
obj->pgshift = 0;
return -EINVAL;
}
ihk_mc_spinlock_lock_noirq(&obj->chunk_lock);
if (obj->pgshift && obj->pgshift != pgshift) {
kprintf("pgshift changed between two calls on same inode?! had %d now %d\n",
obj->pgshift, pgshift);
ihk_mc_spinlock_unlock_noirq(&obj->chunk_lock);
return -EINVAL;
}
obj->pgshift = pgshift;
/* Prealloc upfront, we need to fail here if not enough memory. */
if (!list_empty(&obj->chunk_list))
old_chunk = list_first_entry(&obj->chunk_list,
struct hugefilechunk, list);
pgoff = off >> PAGE_SHIFT;
npages_left = len >> PAGE_SHIFT;
npages = npages_left;
while (npages_left) {
while (old_chunk &&
pgoff >= old_chunk->pgoff + old_chunk->npages) {
if (list_is_last(&old_chunk->list, &obj->chunk_list)) {
old_chunk = NULL;
break;
}
old_chunk = list_entry(old_chunk->list.next,
struct hugefilechunk, list);
}
if (old_chunk) {
next_pgoff = old_chunk->pgoff + old_chunk->npages;
if (pgoff >= old_chunk->pgoff && pgoff < next_pgoff) {
npages_left -= next_pgoff - pgoff;
pgoff = next_pgoff;
continue;
}
}
if (!chunk) {
chunk = kmalloc(sizeof(*chunk), IHK_MC_AP_NOWAIT);
}
if (!chunk) {
kprintf("could not allocate hugefileobj chunk\n");
/* do not leak chunk_lock on the error path */
ihk_mc_spinlock_unlock_noirq(&obj->chunk_lock);
return -ENOMEM;
}
if (npages > npages_left)
npages = npages_left;
v = ihk_mc_alloc_aligned_pages_user(npages, p2align,
IHK_MC_AP_NOWAIT | IHK_MC_AP_USER, virt_addr);
if (!v) {
if (npages == 1) {
dkprintf("could not allocate more pages with pgshift %d\n",
pgshift);
kfree(chunk);
/* do not leak chunk_lock on the error path */
ihk_mc_spinlock_unlock_noirq(&obj->chunk_lock);
/* caller will cleanup the rest */
return -ENOMEM;
}
/* exponential backoff, try less aggressive? */
npages /= 2;
continue;
}
memset(v, 0, npages * PAGE_SIZE);
chunk->npages = npages;
chunk->mem = v;
chunk->pgoff = pgoff;
/* ordered list: insert before next (bigger) element */
if (old_chunk)
list_add(&chunk->list, old_chunk->list.prev);
else
list_add(&chunk->list, obj->chunk_list.prev);
pgoff += npages;
npages_left -= npages;
/* the chunk is now owned by the list; allocate a new one next time */
chunk = NULL;
}
obj->memobj.size = len;
ihk_mc_spinlock_unlock_noirq(&obj->chunk_lock);
*pgshiftp = pgshift;
return 0;
}
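
The allocation loop in hugefileobj_create() asks for the whole range first and halves the request whenever the allocator fails, giving up only when a single page cannot be obtained. The following is a minimal user-space sketch of that backoff idea, not the kernel code: `struct chunk`, `alloc_chunks()`, `alloc_max4()`, `count_chunks()` and `SKETCH_PAGE_SIZE` are hypothetical names, malloc()/calloc() stand in for the kernel allocators, and a singly linked list replaces `list_head`.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for the sketch */

struct chunk {
	struct chunk *next;
	size_t pgoff;	/* first page covered by this chunk */
	size_t npages;	/* number of contiguous pages in mem */
	void *mem;
};

/* Hypothetical allocator hook; returns NULL when no contiguous run fits. */
typedef void *(*page_alloc_fn)(size_t npages);

/* Demo allocator that pretends at most 4 contiguous pages are available. */
static void *alloc_max4(size_t npages)
{
	return npages > 4 ? NULL : calloc(npages, SKETCH_PAGE_SIZE);
}

static void free_chunks(struct chunk *c)
{
	while (c) {
		struct chunk *next = c->next;

		free(c->mem);
		free(c);
		c = next;
	}
}

/* Cover npages_left pages starting at pgoff, halving the request on failure. */
static struct chunk *alloc_chunks(size_t pgoff, size_t npages_left,
				  page_alloc_fn alloc)
{
	struct chunk *head = NULL, **tail = &head;
	size_t npages = npages_left;

	assert(alloc != NULL);
	while (npages_left) {
		struct chunk *c;
		void *v;

		if (npages > npages_left)
			npages = npages_left;
		v = alloc(npages);
		if (!v) {
			if (npages == 1)
				goto fail;	/* cannot shrink further */
			npages /= 2;		/* back off and retry */
			continue;
		}
		c = malloc(sizeof(*c));	/* fresh descriptor per chunk */
		if (!c) {
			free(v);
			goto fail;
		}
		c->next = NULL;
		c->pgoff = pgoff;
		c->npages = npages;
		c->mem = v;
		*tail = c;		/* appending keeps the list ordered by pgoff */
		tail = &c->next;
		pgoff += npages;
		npages_left -= npages;
	}
	return head;
fail:
	free_chunks(head);	/* release everything allocated so far */
	return NULL;
}

static size_t count_chunks(const struct chunk *c)
{
	size_t n = 0;

	for (; c; c = c->next)
		n++;
	return n;
}
```

With the demo allocator, `alloc_chunks(0, 10, alloc_max4)` fails for 10 and 5 pages, succeeds for 2, and ends up covering the range with five 2-page chunks. Unlike the kernel function, the sketch frees its partial result itself instead of leaving cleanup to the caller.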


@ -21,7 +21,7 @@
struct kmalloc_header {
unsigned int front_magic;
unsigned int cpu_id;
int cpu_id;
struct list_head list;
int size; /* The size of this chunk without the header */
unsigned int end_magic;
@ -74,6 +74,7 @@ struct cpu_local_var {
struct thread *current;
struct list_head runq;
size_t runq_len;
size_t runq_reserved; /* Number of threads which are about to be added to runq */
struct ihk_ikc_channel_desc *ikc2linux;
@ -99,6 +100,9 @@ struct cpu_local_var {
struct list_head smp_func_req_list;
struct process_vm *on_fork_vm;
/* UTI */
void *uti_futex_resp;
} __attribute__((aligned(64)));
@ -110,4 +114,6 @@ static struct cpu_local_var *get_this_cpu_local_var(void)
#define cpu_local_var(name) get_this_cpu_local_var()->name
#define cpu_local_var_with_override(name, clv_override) (clv_override ? clv_override->name : get_this_cpu_local_var()->name)
#endif
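
The cpu_local_var_with_override() macro above lets the futex code run on behalf of a thread whose per-CPU data is not that of the executing CPU: a non-NULL override is used as is, otherwise the local CPU's structure is looked up. A minimal standalone sketch of that selection follows; the trimmed-down structures, the single static "per-CPU" instance and `current_tid()` are hypothetical stand-ins, and the macro arguments are parenthesized here as a precaution the original does not take.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct thread { int tid; };
struct cpu_local_var { struct thread *current; };

/* Stand-in for the real per-CPU lookup; here a single static instance. */
static struct thread this_cpu_thread = { .tid = 100 };
static struct cpu_local_var this_cpu_clv = { .current = &this_cpu_thread };

static struct cpu_local_var *get_this_cpu_local_var(void)
{
	return &this_cpu_clv;
}

#define cpu_local_var(name) get_this_cpu_local_var()->name
/* Arguments are parenthesized so that expressions are safe as clv_override. */
#define cpu_local_var_with_override(name, clv_override) \
	((clv_override) ? (clv_override)->name : get_this_cpu_local_var()->name)

/* Returns the tid the futex code would see for a given override. */
static int current_tid(struct cpu_local_var *clv_override)
{
	struct thread *t = cpu_local_var_with_override(current, clv_override);

	assert(t != NULL);
	return t->tid;
}
```

Passing NULL keeps the old behaviour (the executing CPU's `current`), which is why existing callers only need to add a trailing NULL argument.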

54
kernel/include/debug.h Normal file

@ -0,0 +1,54 @@
#ifndef DEBUG_H
#define DEBUG_H
#include "lwk/compiler.h"
void panic(const char *);
/* when someone has a lot of time, add attribute __printf(1, 2) to kprintf */
int kprintf(const char *format, ...);
struct ddebug {
const char *file;
const char *func;
const char *fmt;
unsigned int line:24;
unsigned int flags:8;
} __aligned(8);
#define DDEBUG_NONE 0x0
#define DDEBUG_PRINT 0x1
#define DDEBUG_DEFAULT DDEBUG_NONE
#define DDEBUG_SYMBOL() \
static struct ddebug __aligned(8) \
__attribute__((section("__verbose"))) ddebug = { \
.file = __FILE__, \
.func = __func__, \
.line = __LINE__, \
.flags = DDEBUG_DEFAULT, \
}
#define DDEBUG_TEST ddebug.flags
#define dkprintf(fmt, args...) \
do { \
DDEBUG_SYMBOL(); \
if (DDEBUG_TEST) \
kprintf(fmt, ##args); \
} while (0)
#define ekprintf(fmt, args...) kprintf(fmt, ##args)
#define BUG_ON(condition) do { \
if (condition) { \
kprintf("PANIC: %s: %s(line:%d)\n", \
__FILE__, __func__, __LINE__); \
panic(""); \
} \
} while (0)
#define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
#endif
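
The dkprintf() macro above creates one static descriptor per call site and prints only when that descriptor's flags are set, so a file can opt in by redefining DDEBUG_DEFAULT before its first dkprintf(). A minimal user-space sketch of that gating follows, assuming GNU C (`##__VA_ARGS__`); the vprintf()-based `kprintf()`, the `printed` counter and the `noisy()`/`quiet()` functions are hypothetical, and the `__verbose` section placement and alignment attributes are left out.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define DDEBUG_NONE  0x0
#define DDEBUG_PRINT 0x1

/* A file opts in by redefining DDEBUG_DEFAULT before using dkprintf(). */
#define DDEBUG_DEFAULT DDEBUG_PRINT

struct ddebug {
	const char *file;
	const char *func;
	unsigned int line:24;
	unsigned int flags:8;
};

static int printed;	/* counts messages that were actually emitted */

/* Hypothetical stand-in for the kernel's kprintf(). */
static int kprintf(const char *format, ...)
{
	va_list ap;
	int n;

	assert(format != NULL);
	va_start(ap, format);
	n = vprintf(format, ap);
	va_end(ap);
	printed++;
	return n;
}

/* One static descriptor per call site; its flags gate the print. */
#define dkprintf(fmt, ...) \
	do { \
		static struct ddebug ddebug = { \
			.file = __FILE__, \
			.func = __func__, \
			.line = __LINE__, \
			.flags = DDEBUG_DEFAULT, \
		}; \
		if (ddebug.flags) \
			kprintf(fmt, ##__VA_ARGS__); \
	} while (0)

/* Call site compiled while DDEBUG_DEFAULT is DDEBUG_PRINT. */
static void noisy(void)
{
	dkprintf("tid: %d woken up\n", 42);
}

#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_NONE

/* Call site compiled while DDEBUG_DEFAULT is DDEBUG_NONE. */
static void quiet(void)
{
	dkprintf("never shown\n");
}
```

Because DDEBUG_DEFAULT is expanded where dkprintf() is used, `noisy()` prints and `quiet()` does not. In the header above the descriptors are additionally collected in the `__verbose` section, which presumably is what allows flags to be flipped at run time.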


@ -63,7 +63,7 @@
#define FUTEX_OP_ANDN 3 /* *(int *)UADDR2 &= ~OPARG; */
#define FUTEX_OP_XOR 4 /* *(int *)UADDR2 ^= OPARG; */
#define FUTEX_OP_OPARG_SHIFT 8 /* Use (1 << OPARG) instead of OPARG. */
#define FUTEX_OP_OPARG_SHIFT 8U /* Use (1 << OPARG) instead of OPARG. */
#define FUTEX_OP_CMP_EQ 0 /* if (oldval == CMPARG) wake */
#define FUTEX_OP_CMP_NE 1 /* if (oldval != CMPARG) wake */
@ -150,6 +150,7 @@ union futex_key {
extern int futex_init(void);
struct cpu_local_var;
extern int
futex(
uint32_t __user * uaddr,
@ -159,7 +160,8 @@ futex(
uint32_t __user * uaddr2,
uint32_t val2,
uint32_t val3,
int fshared
int fshared,
struct cpu_local_var *clv_override
);


@ -33,6 +33,7 @@ extern void cpu_sysfs_setup(void);
extern void numa_sysfs_setup(void);
extern void rusage_sysfs_setup(void);
extern void status_sysfs_setup(void);
extern void dynamic_debug_sysfs_setup(void);
extern char *find_command_line(char *name);


@ -13,11 +13,9 @@
#ifndef __HEADER_KMALLOC_H
#define __HEADER_KMALLOC_H
#include <ihk/mm.h>
#include <cls.h>
void panic(const char *);
int kprintf(const char *format, ...);
#include "ihk/mm.h"
#include "cls.h"
#include "debug.h"
#define kmalloc(size, flag) ({\
void *r = _kmalloc(size, flag, __FILE__, __LINE__);\
@ -34,7 +32,7 @@ void *__kmalloc(int size, ihk_mc_ap_flag flag);
void __kfree(void *ptr);
int _memcheck(void *ptr, char *msg, char *file, int line, int free);
int memcheckall();
int memcheckall(void);
int freecheck(int runcount);
void kmalloc_consolidate_free_list(void);


@ -12,11 +12,8 @@
/* Optimization barrier */
/* The "volatile" is due to gcc bugs */
/* XXX: barrier is also defined in lib/include/ihk/cpu.h,
* it would be cleaner to restore this here at some point, but we have
* quite a few C files not including either this or kernel's compiler.h
* #define barrier() __asm__ __volatile__("": : :"memory")
*/
#define barrier() __asm__ __volatile__("": : :"memory")
/*
* This version is i.e. to prevent dead stores elimination on @ptr
* where gcc and llvm may behave differently when otherwise using


@ -3,6 +3,8 @@
#ifndef __ASSEMBLY__
#include <types.h>
#ifdef __CHECKER__
# define __user __attribute__((noderef, address_space(1)))
# define __kernel __attribute__((address_space(0)))
@ -175,11 +177,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
# define unlikely(x) __builtin_expect(!!(x), 0)
#endif
/* Optimization barrier */
#ifndef barrier
# define barrier() __memory_barrier()
#endif
#ifndef barrier_data
# define barrier_data(ptr) barrier()
#endif
@ -490,4 +487,62 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
(_________p1); \
})
extern void *memcpy(void *dest, const void *src, size_t n);
static __always_inline void __read_once_size(const volatile void *p, void *res, int size)
{
switch (size) {
case 1: *(unsigned char *)res = *(volatile unsigned char *)p; break;
case 2: *(unsigned short *)res = *(volatile unsigned short *)p; break;
case 4: *(unsigned int *)res = *(volatile unsigned int *)p; break;
case 8: *(unsigned long long *)res = *(volatile unsigned long long *)p; break;
default:
barrier();
memcpy((void *)res, (const void *)p, size);
barrier();
}
}
static __always_inline void __write_once_size(volatile void *p, void *res, int size)
{
switch (size) {
case 1: *(volatile unsigned char *)p = *(unsigned char *)res; break;
case 2: *(volatile unsigned short *)p = *(unsigned short *)res; break;
case 4: *(volatile unsigned int *)p = *(unsigned int *)res; break;
case 8: *(volatile unsigned long long *)p = *(unsigned long long *)res; break;
default:
barrier();
memcpy((void *)p, (const void *)res, size);
barrier();
}
}
/*
* Prevent the compiler from merging or refetching reads or writes. The
* compiler is also forbidden from reordering successive instances of
* READ_ONCE, WRITE_ONCE and ACCESS_ONCE (see below), but only when the
* compiler is aware of some particular ordering. One way to make the
* compiler aware of ordering is to put the two invocations of READ_ONCE,
* WRITE_ONCE or ACCESS_ONCE() in different C statements.
*
* In contrast to ACCESS_ONCE these two macros will also work on aggregate
* data types like structs or unions. If the size of the accessed data
* type exceeds the word size of the machine (e.g., 32 bits or 64 bits)
* READ_ONCE() and WRITE_ONCE() will fall back to memcpy and print a
* compile-time warning.
*
* Their two major use cases are: (1) Mediating communication between
* process-level code and irq/NMI handlers, all running on the same CPU,
* and (2) Ensuring that the compiler does not fold, spindle, or otherwise
* mutilate accesses that either do not require ordering or that interact
* with an explicit memory barrier or atomic instruction that provides the
* required ordering.
*/
#define READ_ONCE(x) \
({ union { typeof(x) __val; char __c[1]; } __u; __read_once_size(&(x), __u.__c, sizeof(x)); __u.__val; })
#define WRITE_ONCE(x, val) \
({ typeof(x) __val = (val); __write_once_size(&(x), &__val, sizeof(__val)); __val; })
#endif /* __LWK_COMPILER_H */
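
Use case (1) from the comment above, code communicating with a handler on the same CPU, can be sketched as follows. This is a simplified illustration that assumes GNU C: the READ_ONCE()/WRITE_ONCE() stand-ins below only handle scalar types through a volatile cast, unlike the size-switching versions in the header, and `ready`, `payload`, `publish()` and `consume()` are hypothetical names. Only compiler reordering is addressed; no CPU memory barrier is issued.

```c
#include <assert.h>

/* Compiler-only barrier, as defined in the header above. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Simplified stand-ins covering only scalar types (GNU C __typeof__). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

static int ready;	/* flag shared with, e.g., an interrupt handler */
static int payload;

static void publish(int value)
{
	assert(value >= 0);	/* -1 is reserved for "nothing published" */
	payload = value;
	barrier();		/* keep the payload store ahead of the flag store */
	WRITE_ONCE(ready, 1);	/* one store, not elided or split by the compiler */
}

static int consume(void)
{
	/* one load per call, never cached across calls or loop iterations */
	if (!READ_ONCE(ready))
		return -1;
	barrier();		/* keep the payload load behind the flag load */
	return payload;
}
```

Before `publish()` runs, `consume()` returns -1; afterwards it returns the published value. Communication between different CPUs would additionally need hardware memory barriers or atomic instructions, as the header comment notes.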


@ -25,7 +25,7 @@
#define FUTEX_OP_ANDN 3 /* *(int *)UADDR2 &= ~OPARG; */
#define FUTEX_OP_XOR 4 /* *(int *)UADDR2 ^= OPARG; */
#define FUTEX_OP_OPARG_SHIFT 8 /* Use (1 << OPARG) instead of OPARG. */
#define FUTEX_OP_OPARG_SHIFT 8U /* Use (1 << OPARG) instead of OPARG. */
#define FUTEX_OP_CMP_EQ 0 /* if (oldval == CMPARG) wake */
#define FUTEX_OP_CMP_NE 1 /* if (oldval != CMPARG) wake */


@ -19,6 +19,7 @@
#include <ihk/lock.h>
#include <errno.h>
#include <list.h>
#include <pager.h>
#ifdef POSTK_DEBUG_ARCH_DEP_18 /* coredump arch separation. */
#else /* POSTK_DEBUG_ARCH_DEP_18 */
@ -44,8 +45,7 @@ enum {
MF_XPMEM = 0x10000, /* To identify XPMEM attachment pages for rusage accounting */
MF_ZEROOBJ = 0x20000, /* To identify pages of anonymous, on-demand paging ranges for rusage accounting */
MF_SHM = 0x40000,
MF_HOST_RELEASED = 0x80000000,
MF_END
MF_HUGETLBFS = 0x100000,
};
#define MEMOBJ_READY 0
@ -56,15 +56,15 @@ struct memobj {
uint32_t flags;
uint32_t status;
size_t size;
ihk_spinlock_t lock;
ihk_atomic_t refcnt;
/* For pre-mapped memobjects */
void **pages;
int nr_pages;
char *path;
};
typedef void memobj_release_func_t(struct memobj *obj);
typedef void memobj_ref_func_t(struct memobj *obj);
typedef void memobj_free_func_t(struct memobj *obj);
typedef int memobj_get_page_func_t(struct memobj *obj, off_t off, int p2align, uintptr_t *physp, unsigned long *flag, uintptr_t virt_addr);
typedef uintptr_t memobj_copy_page_func_t(struct memobj *obj, uintptr_t orgphys, int p2align);
typedef int memobj_flush_page_func_t(struct memobj *obj, uintptr_t phys, size_t pgsize);
@ -72,27 +72,28 @@ typedef int memobj_invalidate_page_func_t(struct memobj *obj, uintptr_t phys, si
typedef int memobj_lookup_page_func_t(struct memobj *obj, off_t off, int p2align, uintptr_t *physp, unsigned long *flag);
struct memobj_ops {
memobj_release_func_t * release;
memobj_ref_func_t * ref;
memobj_get_page_func_t * get_page;
memobj_copy_page_func_t * copy_page;
memobj_flush_page_func_t * flush_page;
memobj_invalidate_page_func_t * invalidate_page;
memobj_lookup_page_func_t * lookup_page;
memobj_free_func_t *free;
memobj_get_page_func_t *get_page;
memobj_copy_page_func_t *copy_page;
memobj_flush_page_func_t *flush_page;
memobj_invalidate_page_func_t *invalidate_page;
memobj_lookup_page_func_t *lookup_page;
};
static inline void memobj_release(struct memobj *obj)
static inline int memobj_ref(struct memobj *obj)
{
if (obj->ops->release) {
(*obj->ops->release)(obj);
}
return ihk_atomic_inc_return(&obj->refcnt);
}
static inline void memobj_ref(struct memobj *obj)
static inline int memobj_unref(struct memobj *obj)
{
if (obj->ops->ref) {
(*obj->ops->ref)(obj);
int cnt;
if ((cnt = ihk_atomic_dec_return(&obj->refcnt)) == 0) {
(*obj->ops->free)(obj);
}
return cnt;
}
static inline int memobj_get_page(struct memobj *obj, off_t off,
@ -139,16 +140,6 @@ static inline int memobj_lookup_page(struct memobj *obj, off_t off,
return -ENXIO;
}
static inline void memobj_lock(struct memobj *obj)
{
ihk_mc_spinlock_lock_noirq(&obj->lock);
}
static inline void memobj_unlock(struct memobj *obj)
{
ihk_mc_spinlock_unlock_noirq(&obj->lock);
}
static inline int memobj_has_pager(struct memobj *obj)
{
return !!(obj->flags & MF_HAS_PAGER);
@ -165,5 +156,10 @@ int shmobj_create(struct shmid_ds *ds, struct memobj **objp);
int zeroobj_create(struct memobj **objp);
int devobj_create(int fd, size_t len, off_t off, struct memobj **objp, int *maxprotp,
int prot, int populate_flags);
int hugefileobj_pre_create(struct pager_create_result *result,
struct memobj **objp, int *maxprotp);
int hugefileobj_create(struct memobj *obj, size_t len, off_t off,
int *pgshiftp, uintptr_t virt_addr);
void hugefileobj_cleanup(void);
#endif /* HEADER_MEMOBJ_H */
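
In this file the per-type `release`/`ref` callbacks give way to a generic atomic reference count: memobj_ref() increments `refcnt`, and memobj_unref() calls `ops->free` when the count drops to zero. The same pattern can be sketched outside the kernel with C11 atomics. Everything named `sketch_*` is hypothetical, and `atomic_int` merely stands in for `ihk_atomic_t`.

```c
#include <assert.h>
#include <stdatomic.h>

struct sketch_obj;
typedef void sketch_free_func_t(struct sketch_obj *obj);

struct sketch_obj {
	atomic_int refcnt;            /* stands in for ihk_atomic_t refcnt */
	sketch_free_func_t *free_cb;  /* stands in for ops->free */
	int freed;                    /* demo aid: set by the free callback */
};

/* Take a reference; returns the new count. */
static int sketch_ref(struct sketch_obj *obj)
{
	return atomic_fetch_add(&obj->refcnt, 1) + 1;
}

/* Drop a reference; whoever reaches zero frees the object. */
static int sketch_unref(struct sketch_obj *obj)
{
	int cnt = atomic_fetch_sub(&obj->refcnt, 1) - 1;

	if (cnt == 0)
		obj->free_cb(obj);
	return cnt;
}

/* Demo callback; a real one would release the object's memory. */
static void sketch_mark_freed(struct sketch_obj *obj)
{
	obj->freed = 1;
}
```

Because the decrement and the zero test come from one atomic operation, only one caller can observe zero, which would explain why memobj_lock()/memobj_unlock() are no longer needed for lifetime management.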

View File

@ -13,6 +13,7 @@
#define HEADER_PAGER_H
#include <ihk/types.h>
#include <limits.h>
enum pager_op {
PAGER_REQ_CREATE = 0x0001,
@ -32,6 +33,7 @@ struct pager_create_result {
int maxprot;
uint32_t flags;
size_t size;
char path[PATH_MAX];
};
/*
@ -46,6 +48,7 @@ struct pager_map_result {
uintptr_t handle;
int maxprot;
int8_t padding[4];
char path[PATH_MAX];
};
/* for pager_req_pfn() */

View File

@ -70,10 +70,8 @@
#define PS_TRACED 0x40 /* Set to "not running" by a ptrace related event */
#define PS_STOPPING 0x80
#define PS_TRACING 0x100
#ifdef POSTK_DEBUG_TEMP_FIX_41 /* early to wait4() wakeup for ptrace, fix. */
#define PS_DELAY_STOPPED 0x200
#define PS_DELAY_TRACED 0x400
#endif /* POSTK_DEBUG_TEMP_FIX_41 */
#define PS_NORMAL (PS_INTERRUPTIBLE | PS_UNINTERRUPTIBLE)
@ -142,9 +140,7 @@
#define WCONTINUED 0x00000008
#define WNOWAIT 0x01000000 /* Don't reap, just poll status. */
#ifdef POSTK_DEBUG_ARCH_DEP_44 /* wait() add support __WALL */
#define __WALL 0x40000000 /* Wait on all children, regardless of type */
#endif /* POSTK_DEBUG_ARCH_DEP_44 */
#define __WCLONE 0x80000000
/* idtype */
@ -246,6 +242,11 @@ enum mpol_rebind_step {
#define SPAWN_TO_REMOTE 1
#define SPAWNING_TO_REMOTE 1001
#define UTI_STATE_DEAD 0
#define UTI_STATE_PROLOGUE 1
#define UTI_STATE_RUNNING_IN_LINUX 2
#define UTI_STATE_EPILOGUE 3
#include <waitq.h>
#include <futex.h>
@ -279,6 +280,7 @@ extern struct list_head resource_set_list;
extern mcs_rwlock_lock_t resource_set_lock;
extern int idle_halt;
extern int allow_oversubscribe;
extern ihk_spinlock_t runq_reservation_lock; /* mutex for cpuid reservation (clv->runq_reserved) */
struct process_hash {
struct list_head list[HASH_SIZE];
@ -462,6 +464,14 @@ struct process {
// threads and children
struct list_head threads_list;
struct list_head report_threads_list;
/*
* main_thread is used to refer to thread information using process ID.
* 1) signal related state in signal_flags
* 2) status of trace
*/
struct thread *main_thread;
mcs_rwlock_lock_t threads_lock; // lock for threads_list
/* TID set of proxy process */
struct mcexec_tid *tids;
@ -490,7 +500,6 @@ struct process {
// V +---- |
// PS_STOPPED -----+
// (PS_TRACED)
unsigned long exit_status; // only for zombie
/* Store exit_status for a group of threads when stopped by SIGSTOP.
exit_status can't be used because values of exit_status of threads
@ -522,22 +531,6 @@ struct process {
long saved_cmdline_len;
cpu_set_t cpu_set;
/* Store ptrace flags.
* The lower 8 bits are PTRACE_O_xxx of the PTRACE_SETOPTIONS request.
* Other bits are for inner use of the McKernel.
*/
int ptrace;
/* Store ptrace event message.
* PTRACE_O_xxx will store event message here.
* PTRACE_GETEVENTMSG will get from here.
*/
unsigned long ptrace_eventmsg;
/* Store event related to signal. For example,
it represents that the process has been resumed by SIGCONT. */
int signal_flags;
/* Store signal sent to parent when the process terminates. */
int termsig;
@ -559,6 +552,9 @@ struct process {
size_t mpol_threshold;
unsigned long heap_extension;
unsigned long mpol_bind_mask;
int uti_thread_rank; /* Spawn on Linux CPU when clone_count reaches this */
int uti_use_last_cpu; /* Work-around not to share CPU with OpenMP thread */
int clone_count;
// perf_event
int perf_status;
@ -574,6 +570,7 @@ struct process {
unsigned long profile_elapsed_ts;
#endif // PROFILE_ENABLE
int nr_processes; /* For partitioned execution */
int process_rank; /* Rank in partition */
};
/*
@ -604,7 +601,7 @@ struct thread {
// thread info
int cpu_id;
int tid;
int status; // PS_RUNNING -> PS_EXITED
int status; // PS_RUNNING -> PS_EXITED (-> ZOMBIE / ptrace)
// | ^ ^
// | | |
// V | |
@ -614,6 +611,14 @@ struct thread {
// PS_UNINTERRUPTIBLE
int exit_status;
/*
* Store event related to signal. For example,
* it represents that the process has been resumed by SIGCONT.
*/
int signal_flags;
int termsig;
// process vm
struct process_vm *vm;
@ -633,6 +638,22 @@ struct thread {
ihk_spinlock_t spin_sleep_lock;
int spin_sleep;
// for ptrace
struct process *report_proc;
struct list_head report_siblings_list; // lock process
/* Store ptrace flags.
* The lower 8 bits are PTRACE_O_xxx of the PTRACE_SETOPTIONS request.
* Other bits are for inner use of the McKernel.
*/
int ptrace;
/* Store ptrace event message.
* PTRACE_O_xxx will store event message here.
* PTRACE_GETEVENTMSG will get from here.
*/
unsigned long ptrace_eventmsg;
ihk_atomic_t refcount;
int *clear_child_tid;
@ -689,10 +710,11 @@ struct thread {
/* Syscall offload wait queue head */
struct waitq scd_wq;
int thread_offloaded;
int uti_state;
int mod_clone;
struct uti_attr *mod_clone_arg;
int parent_cpuid;
int uti_refill_tid;
// for performance counter
unsigned long pmc_alloc_map;
@ -718,6 +740,8 @@ struct process_vm {
// 2. addition of process page table (allocate_pages, update_process_page_table)
// note that physical memory allocator (ihk_mc_alloc_pages, ihk_pagealloc_alloc)
// is protected by its own lock (see ihk/manycore/generic/page_alloc.c)
unsigned long is_memory_range_lock_taken;
/* #986: Fix deadlock between do_page_fault_process_vm() and set_host_vma() */
ihk_atomic_t refcount;
int exiting;
@ -821,14 +845,32 @@ void cpu_clear_and_set(int c_cpu, int s_cpu,
void release_cpuid(int cpuid);
struct thread *find_thread(int pid, int tid, struct mcs_rwlock_node_irqsave *lock);
void thread_unlock(struct thread *thread, struct mcs_rwlock_node_irqsave *lock);
struct thread *find_thread(int pid, int tid);
void thread_unlock(struct thread *thread);
struct process *find_process(int pid, struct mcs_rwlock_node_irqsave *lock);
void process_unlock(struct process *proc, struct mcs_rwlock_node_irqsave *lock);
void chain_process(struct process *);
void chain_thread(struct thread *);
void proc_init();
void set_timer();
void proc_init(void);
void set_timer(int runq_locked);
struct sig_pending *hassigpending(struct thread *thread);
extern int do_signal(unsigned long rc, void *regs0, struct thread *thread,
struct sig_pending *pending, int num);
extern void check_signal(unsigned long rc, void *regs0, int num);
extern unsigned long do_kill(struct thread *thread, int pid, int tid, int sig,
struct siginfo *info, int ptracecont);
extern void set_signal(int sig, void *regs, struct siginfo *info);
extern void check_sig_pending(void);
void clear_single_step(struct thread *thread);
void release_fp_regs(struct thread *proc);
void save_fp_regs(struct thread *proc);
void copy_fp_regs(struct thread *from, struct thread *to);
void restore_fp_regs(struct thread *proc);
void clear_fp_regs(void);
#define VERIFY_READ 0
#define VERIFY_WRITE 1
int access_ok(struct process_vm *vm, int type, uintptr_t addr, size_t len);
#endif

View File

@ -10,6 +10,7 @@
#include <rusage.h>
#include <ihk/ihk_monitor.h>
#include <arch_rusage.h>
#include <debug.h>
#ifdef ENABLE_RUSAGE
@ -55,7 +56,7 @@ rusage_rss_add(unsigned long size)
}
vm->currss += size;
if (vm->currss > vm->proc->maxrss) {
if (vm->proc && vm->currss > vm->proc->maxrss) {
vm->proc->maxrss = vm->currss;
}
}
@ -118,8 +119,9 @@ static inline int rusage_memory_stat_add(struct vm_range *range, uintptr_t phys,
struct page *page = phys_to_page(phys);
/* Is it a file mapping and a CoW page? */
if ((range->memobj->flags & (MF_DEV_FILE | MF_REG_FILE)) &&
!page) {
if ((range->memobj->flags & (MF_DEV_FILE | MF_REG_FILE |
MF_HUGETLBFS)) &&
!page) {
//kprintf("%s: cow,phys=%lx\n", __FUNCTION__, phys);
memory_stat_rss_add(size, pgsize);
return 1;

View File

@ -57,6 +57,7 @@ struct shmobj {
struct shmlock_user * user;
struct shmid_ds ds;
struct list_head page_list;
ihk_spinlock_t page_list_lock;
struct list_head chain; /* shmobj_list */
};
@ -104,7 +105,6 @@ static inline void shmlock_users_unlock(void)
void shmobj_list_lock(void);
void shmobj_list_unlock(void);
int shmobj_create_indexed(struct shmid_ds *ds, struct shmobj **objp);
void shmobj_destroy(struct shmobj *obj);
void shmlock_user_free(struct shmlock_user *user);
int shmlock_user_get(uid_t ruid, struct shmlock_user **userp);

View File

@ -30,7 +30,6 @@
#define SCD_MSG_PREPARE_PROCESS 0x1
#define SCD_MSG_PREPARE_PROCESS_ACKED 0x2
#define SCD_MSG_PREPARE_PROCESS_NACKED 0x7
#define SCD_MSG_SCHEDULE_PROCESS 0x3
#define SCD_MSG_WAKE_UP_SYSCALL_THREAD 0x14
@ -38,8 +37,9 @@
#define SCD_MSG_INIT_CHANNEL_ACKED 0x6
#define SCD_MSG_SYSCALL_ONESIDE 0x4
#define SCD_MSG_SEND_SIGNAL 0x8
#define SCD_MSG_CLEANUP_PROCESS 0x9
#define SCD_MSG_SEND_SIGNAL 0x7
#define SCD_MSG_SEND_SIGNAL_ACK 0x8
#define SCD_MSG_CLEANUP_PROCESS 0x9
#define SCD_MSG_GET_VDSO_INFO 0xa
#define SCD_MSG_GET_CPU_MAPPING 0xc
@ -49,6 +49,7 @@
#define SCD_MSG_PROCFS_DELETE 0x11
#define SCD_MSG_PROCFS_REQUEST 0x12
#define SCD_MSG_PROCFS_ANSWER 0x13
#define SCD_MSG_PROCFS_RELEASE 0x15
#define SCD_MSG_DEBUG_LOG 0x20
@ -82,6 +83,8 @@
#define SCD_MSG_CPU_RW_REG 0x52
#define SCD_MSG_CPU_RW_REG_RESP 0x53
#define SCD_MSG_FUTEX_WAKE 0x60
/* Cloning flags. */
# define CSIGNAL 0x000000ff /* Signal mask to be sent at exit. */
# define CLONE_VM 0x00000100 /* Set if VM shared between processes. */
@ -168,10 +171,8 @@ typedef unsigned long __cpu_set_unit;
struct program_load_desc {
int num_sections;
int status;
int cpu;
int pid;
int err;
int stack_prot;
int pgid;
int cred[8];
@ -199,8 +200,10 @@ struct program_load_desc {
unsigned long heap_extension;
long stack_premap;
unsigned long mpol_bind_mask;
int uti_thread_rank; /* N-th clone() spawns a thread on Linux CPU */
int uti_use_last_cpu; /* Work-around not to share CPU with OpenMP thread */
int nr_processes;
char shell_path[SHELL_PATH_MAX_LEN];
int process_rank;
__cpu_set_unit cpu_set[PLD_CPU_SET_SIZE];
int profile;
struct program_image_section sections[0];
@ -241,6 +244,7 @@ enum mcctrl_os_cpu_operation {
struct ikc_scd_packet {
int msg;
int err;
void *reply;
union {
/* for traditional SCD_MSG_* */
struct {
@ -259,7 +263,7 @@ struct ikc_scd_packet {
long sysfs_arg3;
};
/* SCD_MSG_SCHEDULE_THREAD */
/* SCD_MSG_WAKE_UP_SYSCALL_THREAD */
struct {
int ttid;
};
@ -275,8 +279,14 @@ struct ikc_scd_packet {
struct {
int eventfd_type;
};
/* SCD_MSG_FUTEX_WAKE */
struct {
void *resp;
int *spin_sleep; /* 1: waiting in linux_wait_event() 0: woken up by someone else */
} futex;
};
char padding[12];
char padding[8];
};
#define IHK_SCD_REQ_THREAD_SPINNING 0
@ -337,10 +347,10 @@ struct syscall_post {
SYSCALL_ARG_##a2(2); SYSCALL_ARG_##a3(3); \
SYSCALL_ARG_##a4(4); SYSCALL_ARG_##a5(5);
#define SYSCALL_FOOTER return do_syscall(&request, ihk_mc_get_processor_id(), 0)
#define SYSCALL_FOOTER return do_syscall(&request, ihk_mc_get_processor_id())
extern long do_syscall(struct syscall_request *req, int cpu, int pid);
extern int obtain_clone_cpuid();
extern long do_syscall(struct syscall_request *req, int cpu);
int obtain_clone_cpuid(cpu_set_t *cpu_set, int use_last);
extern long syscall_generic_forwarding(int n, ihk_mc_user_context_t *ctx);
#define DECLARATOR(number,name) __NR_##name = number,
@ -354,17 +364,10 @@ enum {
#undef SYSCALL_DELEGATED
#define __NR_coredump 999 /* pseudo syscall for coredump */
#ifdef POSTK_DEBUG_TEMP_FIX_61 /* Core table size and lseek return value to loff_t */
struct coretable { /* table entry for a core chunk */
off_t len; /* length of the chunk */
unsigned long addr; /* physical addr of the chunk */
};
#else /* POSTK_DEBUG_TEMP_FIX_61 */
struct coretable { /* table entry for a core chunk */
int len; /* length of the chunk */
unsigned long addr; /* physical addr of the chunk */
};
#endif /* POSTK_DEBUG_TEMP_FIX_61 */
#ifdef POSTK_DEBUG_TEMP_FIX_1
void create_proc_procfs_files(int pid, int tid, int cpuid);
@ -384,7 +387,6 @@ struct procfs_read {
int count; /* bytes to read (request) */
int eof; /* if eof is detected, 1 otherwise 0. (answer)*/
int ret; /* read bytes (answer) */
int status; /* non-zero if done (answer) */
int newcpu; /* migrated new cpu (answer) */
int readwrite; /* 0:read, 1:write */
char fname[PROCFS_NAME_MAX]; /* procfs filename (request) */
@ -396,6 +398,8 @@ struct procfs_file {
char fname[PROCFS_NAME_MAX]; /* procfs filename (request) */
};
int process_procfs_request(struct ikc_scd_packet *rpacket);
#define RUSAGE_SELF 0
#define RUSAGE_CHILDREN -1
#define RUSAGE_THREAD 1
@ -458,10 +462,10 @@ static inline unsigned long timespec_to_jiffy(const struct timespec *ats)
return ats->tv_sec * 100 + ats->tv_nsec / 10000000;
}
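
The conversion in timespec_to_jiffy() above implies 100 jiffies per second, that is, one jiffy per 10,000,000 ns, with any remainder below 10 ms truncated. A small worked sketch of the same arithmetic follows; `struct sketch_timespec` is a hypothetical local stand-in for `struct timespec`.

```c
#include <assert.h>

/* Hypothetical local stand-in for struct timespec. */
struct sketch_timespec {
	long tv_sec;
	long tv_nsec;
};

/* Same arithmetic as timespec_to_jiffy(): 100 jiffies per second,
 * one jiffy per 10,000,000 ns, fractions of a jiffy are dropped. */
static unsigned long sketch_timespec_to_jiffy(const struct sketch_timespec *ats)
{
	return ats->tv_sec * 100 + ats->tv_nsec / 10000000;
}
```

With this formula, 1.25 s maps to 125 jiffies, while anything shorter than 10 ms maps to 0.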
void reset_cputime();
void reset_cputime(void);
void set_cputime(int mode);
int do_munmap(void *addr, size_t len);
intptr_t do_mmap(intptr_t addr0, size_t len0, int prot, int flags, int fd,
int do_munmap(void *addr, size_t len, int holding_memory_range_lock);
intptr_t do_mmap(uintptr_t addr0, size_t len0, int prot, int flags, int fd,
off_t off0);
void clear_host_pte(uintptr_t addr, size_t len);
typedef int32_t key_t;
@ -472,7 +476,16 @@ int arch_setup_vdso(void);
int arch_cpu_read_write_register(struct ihk_os_cpu_register *desc,
enum mcctrl_os_cpu_operation op);
struct vm_range_numa_policy *vm_range_policy_search(struct process_vm *vm, uintptr_t addr);
void calculate_time_from_tsc(struct timespec *ts);
time_t time(void);
long do_futex(int n, unsigned long arg0, unsigned long arg1,
unsigned long arg2, unsigned long arg3,
unsigned long arg4, unsigned long arg5,
unsigned long _uti_clv,
void *uti_futex_resp,
void *_linux_wait_event,
void *_linux_printk,
void *_linux_clock_gettime);
#ifndef POSTK_DEBUG_ARCH_DEP_52
#define VDSO_MAXPAGES 2
@ -520,7 +533,7 @@ enum perf_ctrl_type {
struct perf_ctrl_desc {
enum perf_ctrl_type ctrl_type;
int status;
int err;
union {
/* for SET, GET */
struct {
@ -571,6 +584,15 @@ typedef struct uti_attr {
uint64_t flags; /* Representing location and behavior hints by bitmap */
} uti_attr_t;
struct uti_ctx {
union {
char ctx[4096];
struct {
int uti_refill_tid;
};
};
};
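
The anonymous union in struct uti_ctx overlays the named field on the start of a fixed 4096-byte buffer, so the structure keeps the same size no matter how many named fields are added later. A sketch of that layout property, using a hypothetical copy of the structure (anonymous unions and structs require C11 or a GNU C compiler):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical copy of the layout shown above. */
struct sketch_uti_ctx {
	union {
		char ctx[4096];             /* raw fixed-size buffer */
		struct {
			int uti_refill_tid; /* aliases the first bytes of ctx */
		};
	};
};

/* The size is pinned by the buffer, not by the named fields. */
_Static_assert(sizeof(struct sketch_uti_ctx) == 4096,
	       "named fields must fit inside the 4096-byte buffer");
```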
struct move_pages_smp_req {
unsigned long count;
const void **user_virt_addr;
@ -591,4 +613,9 @@ struct move_pages_smp_req {
#define PROCESS_VM_READ 0
#define PROCESS_VM_WRITE 1
/* uti: function pointers to Linux code */
extern long (*linux_wait_event)(void *_resp, unsigned long nsec_timeout);
extern int (*linux_printk)(const char *fmt, ...);
extern int (*linux_clock_gettime)(clockid_t clk_id, struct timespec *tp);
#endif

View File

@ -25,6 +25,8 @@
#define CLOCK_PROCESS_CPUTIME_ID 2
#define CLOCK_THREAD_CPUTIME_ID 3
typedef int clockid_t;
typedef long int __time_t;
/* POSIX.1b structure for a time value. This is like a `struct timeval' but

Some files were not shown because too many files have changed in this diff.