Compare commits

...

405 Commits

Author SHA1 Message Date
21cf953a03 x86: disable zero mapping and add a boot pt for ap trampoline
the application processor trampoline needs the trampoline physical
address to be mapped for the few instructions between loading the
page table and jumping to normal memory area; setup a new pt for them.

Also make it use its stack where it needs to be directly.

With that, x86 can finally remove the 0 page from its init mapping

Change-Id: Iab3f33a2ed22570eeb47b5ab6e068c9a17c25413
2019-02-14 07:59:03 +00:00
c59d8db1b3 CMake: define RHEL_RELEASE_VERSION in config.h for non-rhel kernels
Change-Id: Iaa48e763be71e9cbc8dff6335810d3191bb3c177
2019-02-14 16:44:09 +09:00
abc0a7bdac mcs_rwlock: remove aligned(64) attribute if ENABLE_UBSAN
The attribute would impose 64-bytes alignment that we do not
respect later because the whole structures (e.g. process/thread)
are allocated at 32bytes boundaries with kmalloc

These are however justified for performance reason as we do not want
them on same page cache line, so just accept slower performance for
UBSAN only

Change-Id: Ia28968257675b7ae97b0391471986e6bf6485b7b
2019-02-14 16:44:09 +09:00
2f456b8752 cmake: Add ENABLE_UBSAN for -fsanitize=undefined
Change-Id: I73db5f904a7d86052aae62e67b01281763c83561
2019-02-14 16:44:09 +09:00
2a63c962fc build system switch to cmake
Remove old build system at the same time

Change-Id: Ifdffe1fcd4cfece05f036d8de6e7cb74aca65f62
2019-02-14 16:44:09 +09:00
4bdd9cf512 ubsan: remove most sprintf calls
sprintf is implemented as snprintf(..., INT_MAX, ...) which will overflow
the argument pointer for the end, then fix the end to be -1.
This technically works but we know the actual buffer size in all these
call sites, might as well do this properly

Change-Id: I807d09f46a0221f539063fda515e1c504e658d40
2019-02-14 16:44:09 +09:00
bc2a444828 ubsan: fix undefined shifts
A signed integer cannot be shifted in a way that will flip the
sign bit; make such arguments unsigned to be safe

Change-Id: Iafc060f98f899ae3ffb876ba22fdd6183fbb6e57
2019-02-14 16:44:09 +09:00
d9b2924249 Update patch for "Add test programs for large page"
Change-Id: I6ee96b677c65c5bf4b2312059abd689225c0581d
2019-02-14 16:26:20 +09:00
501531f3b3 shmobj: Don't page_unmap() when count isn't one in shmobj_destroy()
Change-Id: If9d567d61e1dc4db808a2aeee290034acf7be4b5
2019-02-14 16:26:19 +09:00
366e95856c Null-check ihk_os_t and mcctrl_usrdata pointers
Change-Id: I941c58d4ab6a0c1ce6bd53c24b552218a1716750
Refs: #1216
2019-02-14 16:26:19 +09:00
bdf5175d4c invalidate_one_page: Support shmobj and contiguous PTE
Change-Id: I15b74ee4afd8e2dc52c933925aae4a1e0d8bcc72
2019-02-14 16:26:18 +09:00
b174fb8099 move_pages: Check flags argument
Change-Id: Ia74aa463a060ecd43aa56ee08d622421f227dbfe
Fujitsu: POSTK_TEMP_FIX_78
2019-02-14 16:26:16 +09:00
e828398c8b do_mmap: don't pre-populate the whole file when asked for smaller segment
The linker maps parts of libs with different access flags,
so we cannot prepopulate the whole file.

[dominique.martinet@cea.fr: moved min and friends in compiler.h]
Change-Id: Ifbeddc0908699099cfae5ce9cc2adc578221db31
2019-02-14 16:26:15 +09:00
641d9f1b39 clear_range_l1, clear_range_middle: Fix handling contiguous PTE
Change-Id: I2609c94d7f9342fe25aa9a5cfc208375274d46fa
2019-02-14 16:26:14 +09:00
c1270cdf6d fileobj, shmobj: free pages in object destructor (as opposed to page_unmap())
Change-Id: I3ea50fc13ae5c090ba32aad4461f9741a4c35665
2019-02-14 16:26:00 +09:00
022e04b62b shmobj: Clean up code around memory_stat_rss_sub call
Change-Id: I6f678568c3c27799cd2a81f5574b96fd218e942f
2019-02-14 16:26:00 +09:00
9cfc373538 Refactor "do write back only MAP_SHARED pages"
* free_process_memory_range() always passes memobj to
  ihk_mc_pt_free_range()
* clear_range_*() don't flush page in fileobj with MF_PRIVATE flag

Fujitsu: POSTK_DEBUG_TEMP_FIX_87
Change-Id: I8d46d029b3fc51ca6f0e59d748a2fe93e324a374
2019-02-14 16:25:58 +09:00
fb24dcea2e unhandled_page_fault: Refactor architecture dependent parts
Fujitsu: REQ-12
Refs: #1012
Change-Id: I3c61f9cd3f514bdcd4a7f26e7c15043529269cf5
2019-02-14 16:25:57 +09:00
207d653b41 mcctrl: use vmf_insert_pfn for kernel >= 4.18
vmf_insert_pfn got added as a wrapper around vm_insert_pfn in 4.17
1c8f422059ae5da ("mm: change return type to vm_fault_t") and totally
replaced the later in 4.20 ae2b01f37044c ("mm: remove vm_insert_pfn()")

Compare with 4.18 here specifically to avoid troubles when rhel
backports this change later, and avoid adding a rhel version check down
the road.

Change-Id: Ibf108e2fb6f1199f89cde6a7973f4eb55447260b
2019-02-14 16:25:49 +09:00
0a49b6eca5 Add test programs for #1190
Change-Id: Icb63e898d5882e1fab18e6af7859af50448a1d60
2019-02-14 16:25:44 +09:00
950ea678dd Reject "setfsuid: Specify mcexec tid when asking mcexec for fsuid"
This fix is rejected because it only makes the setfsuid test in ostest
pass and doesn't fix the other issues including the one in which file
I/O could be done with the old fsuid because an mcexec thread with an
arbitrary tid could handle the system-call offload request.

Explanation of the rejected fix:

  setfsuid() proceeds as follows:

  1. McKernel asks mcexec for __NR_setfsuid (set)
  2. mcexec calls setfsuid, reports the id to McKernel
  3. McKernel asks mcexec for __NR_setfsuid (get)
  4. mcexec calls mcexec_getcred(), reports the id to Mckernel
  5. McKernel sets proc->fsuid to the obtained value

  tid of mcexec on the 2nd and 4th step could be different. So this
  fix lets mcexec report its tid on the 2nd step and McKernel specify
  it in the 3rd step.

Change-Id: Id5cfeed18c64430d576a56e961bbca1ecb2e39ad
Fujitsu: POSTK_DEBUG_TEMP_FIX_45
2019-02-14 04:42:32 +00:00
cd42d186b7 uti: Report error of offloading ioctl if any
Change-Id: If4218b9fb89f34728c4aaf81bccab2dfbb0d4a87
2019-02-14 04:15:44 +00:00
66bc44f88a Readme.md: move figures to R-CCS server
Change-Id: I6a861c15402c8e925e3692b912a8df3f6f0ffce9
2019-02-13 18:26:18 +09:00
34a995d290 perfctr_stop: add flags to no 'disable_intens'
The original fujitsu code added a whole new ihk_mc_perfctr_stop_first
function, duplicating a lot of code - add a flag to existing function
instead.

Change-Id: Ic9ce0236d68f967ff72cf88e5d9f1bda5c98aa1b
Fujitsu: POSTK_DEBUG_ARCH_DEP_107
2019-02-12 05:18:22 +00:00
d0d99adfb3 Readme.md for github
Change-Id: Ib5aa5cde10acb5f5956212f8c451baedc940d123
2019-02-12 02:37:09 +00:00
d78883c692 fix to missing exclusive processing between terminate() and
finalize_process().

The process of making a child process zombie and the process of setting
the parent of the child process to process ID 1 are excluded.

Refs: #1257
Change-Id: Ic95d4d8ee92d6a4a63847e5eda20ec1ba92566ac
2019-02-08 10:25:20 +09:00
ff0395581c Register PPD and release_handler at the same time.
Fix that process will remain even if signal is received between PPD
registration and release_handler registration.

Refs: #1201
Fujitsu: POSTK_DEBUG_TEMP_FIX_64
Change-Id: I571781963578df8cedb327f19298f595cfb137a3
2019-02-08 10:20:58 +09:00
f5023c9730 page fault handler: protect thread accesses
current cpu's thread can be NULL during init, we don't want null derefs
in the page fault handler

Change-Id: I0a2c22b39cae2c258d211317cffc2408e19f3bbf
2019-02-07 02:41:50 +00:00
fe08ac4a67 arm: turn off cpu on panic
Since interrupts are disabled on panic, linux cannot reset a
panic'd core when NMI are disabled (for e.g. mcreboot/mcstop)

Just always offline it, so linux can get it back

Change-Id: If8107172375f2924e02bd4c36e24645ec38a8999
2019-02-07 02:37:31 +00:00
60dcd0e798 move rusage into kernel ELF image (avoid dynamic alloc before NUMA init)
Change-Id: I7fe86244c8707694b379e567b31de65ee2c56887
2019-02-07 10:43:47 +09:00
4d215de641 Separate mmap area from program loading (relocation) area
We need to separate the two because the heap of a PIE is created in
the area to which it is mapped.

Related commits:

b1309a5d: PIE is mapped at map_end instead of at
          user_start
c4219655: Interpreter is mapped to map_start to make a
          system call that dereferences a NULL pointer fail

[dominique.martinet@cea.fr: Also add ULONG_MAX and friend macroes,
 used for data_min]
[ken.sato.ty@hitachi-solutions.com: fix execve]
Change-Id: I8ecaf22b7965090ab67bebece57c68283ba23664
2019-02-07 09:58:03 +09:00
97e0219f50 Make Linux handler run when mmap to procfs.
Change-Id: I98a3d098c5c676f33c83fa4354c623988ee591f2
Refs: #1222
2019-02-06 11:54:50 +00:00
f9d8d98af1 sysfs: add missing symlinks for cpu/node
Add the following patterns of symlinks:
 - /sys/bus/cpu/drivers/processor/cpu*
 - /sys/bus/node/devices/node*

And slightly change how /sys/devices/system/cpu/cpu*/node* are created
to avoid duplicate lookups

Change-Id: Id94a4d157da06d75f6bd450d5bd9a9e7709a1414
2019-02-06 09:55:54 +00:00
3738b70ad3 git hooks: fix submodule check sloppy match
Submodule check used to match any file containing submodule name (e.g.
lib/include/ihk/foo would match ihk and incorrectly be identified as
a submodule change) -- properly check for full name with anchors instead

Change-Id: Ib4330aec97e9da713cd3ab9e791962f2e0c8d396
2019-02-06 08:34:27 +00:00
9bf225d193 mckernel overlay: replace mcoverlayfs with a soft userspace overlay
mcoverlayfs has a high maintenance burden and does not work on rhel8's 4.18
kernel (while it works on vanilla 4.18...); instead of debugging this further
time is better spent making it independent from overlayfs.

Change-Id: I7454ae95b0fbb3373c256aa2fd83cdfec466c009
2019-02-06 08:27:25 +00:00
6fc9ec1c92 gencore: finish reintegration into arch-independent code
Change-Id: Ic2fc935aeec17c54931817bf43f67ef6da78adc8
Fujitsu: POSTK_DEBUG_ARCH_DEP_18
2019-02-06 17:23:54 +09:00
112ade484a page_table: Fix return value of lookup_pte when ptl4 is blank
Change-Id: I5926fedda182941a4b7a2fe480bffb12d4069713
2019-02-06 07:30:44 +00:00
be708674d3 Reject "do_migrate: Send IPI"
Change-Id: If77a51c9bc6a3caef502dd35a276b0dba22b4d24
Fujitsu: POSTK_TEMP_FIX_57
2019-02-06 04:11:16 +00:00
557f33a705 eliminate futex_cmpxchg_enabled check (not used and dereffed a NULL pointer)
Change-Id: I97b0e79acfd51b57eeaa6556eba880d231330f01
2019-02-06 02:47:31 +00:00
7dd0cbd9a6 ARM: eliminate zero page mapping (i.e, init_low_area())
Change-Id: I89bcce7fb286a4c5983a768534a0d3cea093040c
2019-02-04 04:22:24 +00:00
6ed2e5ffc1 Fix ThunderX2 write-combined PTE flag insanity
Change-Id: I59999a680b556acf3e22ac516f4758e3aee7f355
2019-02-01 21:03:19 +09:00
649059f2d2 contiguous PTE: Fix requested page-shift check
Change-Id: Iafc505457f7e10c94142070113870cd8b8c6922d
2019-02-01 21:01:27 +09:00
312c1168f3 test: XPMEM: Fix Makefile
Change-Id: If7b5887e9dc4d7f94bf18dc5ae95a549baa5fb58
2019-02-01 15:15:47 +09:00
d29419d336 test: Add test programs for #1242
Change-Id: Ib3b5d5b661e0cd027711a815d9da2e308cedeffc
Refs: #1242
2019-02-01 15:15:46 +09:00
9f7425c152 Add test programs for lage page
Tests arm64 specific, contiguous bit based large pages as well.

Change-Id: I09edad8cfde6c23a259f1f32cfc97974d9cb63c3
2019-02-01 15:15:44 +09:00
100754f556 test: add uti tests
Change-Id: Ib59f1c4dab7cec7e67ba35ec1988f6f968a2deaa
2019-02-01 15:15:14 +09:00
6d38c34993 Merge branch 'postk_topic-contiguous_pte' into development
* Merge cd7ab307fae9bc8aa49d23b32becf37368a1603e
* Merge commit is changed to one commit for gerrit

Change-Id: I75f0f4cf6b8b3286284638ac2c7816c5257551e4
2019-02-01 15:15:12 +09:00
7f1c17fc4c tests: add 'postk_master' branch tests
Change-Id: Ie0d4cfd0921aed89d2db6083c9eb068b1cfc1984
2019-02-01 15:15:00 +09:00
25ef4e9261 Merge branch 'postk_master' into development
* Merge 53e436ae7db1ed457692dbe16ccb15511aa6bc64
* Only arm64 stuff are left

Change-Id: I6b79de1f659fa61e75f44811b639d41f9a37d6cc
2019-02-01 15:14:58 +09:00
d4d78e9c61 Following arm64-support to development branch
This includes the following fixes:
* fix build of arch/arm64/kernel/vdso

Change-Id: I73b05034d29f7f8731ac17f9736edbba4fb2c639
2019-02-01 15:14:45 +09:00
e52d748744 new_mcos_handler_info: Propagate kmalloc failure
Change-Id: If484cf32cd0bf096ffd712561dd1f73046c60cd8
Fujitsu: POSTK_TEMP_FIX_64
2019-02-01 15:11:36 +09:00
39b21e7ba9 monitor_init: Use ihk_mc_cpu_info()
Its call site is moved before numa_init() as well because
monitor_init() defines ihk_os_monitor that was used in
rusage_total_memory_add() called from numa_init().
I didn't revert this modification because I don't want to touch the
working code.

Change-Id: I602467284581ce45989dd071cfe59d3fc4827e29
Fujitsu: POSTK_DEBUG_TEMP_FIX_73
2019-02-01 15:11:33 +09:00
8db2d3beec sysfs: use nr_cpu_ids for cpumasks (fixes libnuma parsing error on ARM)
Change-Id: I466ffbaf38fe5fd2b1ca0439fa7ea4a813e226ca
2019-02-01 15:08:49 +09:00
f5320fc2b4 overlayfs: make mcoverlayfs compile for 4.14.0-115 (el7 arm64)
Use the 4.18 module as a base

Change-Id: I6c9ef66399800828e1932573da5a97573545c5da
2019-02-01 15:08:47 +09:00
0fbdcc44b9 mcoverlayfs 4.18: re-define ovl_readlink
Apparently /proc needs it; it's normally implemented using get_link if
readlink isn't implemented but proc's get_link crashes the kernel in
this case (because nameidata is only defined for open* paths)

Change-Id: I1864d6c948db879d33ea29b1b281bf84ff8eeec6
2019-02-01 15:08:45 +09:00
351fdead3b kmalloc: Fix address order in free list
The order is expected by the merger.

Change-Id: I54338caaaa1a203ab5dd39a574a25aac324142a5
Fujitsu: POSTK_TEMP_FIX_46
2019-02-01 06:07:26 +00:00
859e976348 kernel/syscall.c: cleanup? pass virt_to_phys directly to do_futex
Change-Id: I196ebe5d5cdc577fce442bcd2247d07e85d2b9ff
2019-02-01 13:19:02 +09:00
49353e252b Added check of nohost to terminate_host().
Change-Id: I796a0d98b68783dad6ce04b3a80ca01db8f8eee2
Fujitsu: POSTK_DEBUG_TEMP_FIX_103
2019-02-01 13:19:00 +09:00
452d93f14d mcctrl_clear_pte_range: fix zap_page for kernel >= 4.18
zap_vma_ptes no longer returns an error code as of Linux's
27d036e33237e4 ("mm: Remove return value of zap_vma_ptes()"),
where they decided nobody is interested in it....

Just copy the check out of the function.

Change-Id: I2eda0f91ec55a34bba96f45cc3d887bc80132a82
Originally-by: Kagawa Kodai <fj1731iw@aa.jp.fujitsu.com>
2019-02-01 13:18:58 +09:00
9e5472bb94 Fix for PAGE_SIZE / PAGE_MASK magic number.
Change-Id: Icc00594d84a33495af774096ae13f830e29be39f
Fujitsu: POSTK_DEBUG_ARCH_DEP_116
2019-02-01 13:18:56 +09:00
516ab87ab9 Copyrights: fujitsu 2018 bump
Separate copyright bumps in a different commit.
A lot of files only had the copyright change at this point; these
were probably changes I added separatly in other patches but just
split these in a different commit instead to simplify git stats

Change-Id: I93cf3fc1c0fa04ee743a79c3fe9768933e6bd0d2
2019-02-01 13:18:52 +09:00
a9884453e2 vmcore2mckdump: make arm-compatible, 'fix' timeout
Change-Id: Icdb42ff47d9dff5c6a818cb8c9ae94d183b19569
Fujitsu: POSTK_DEBUG_ARCH_DEP_93
Fujitsu: POSTK_DEBUG_ARCH_DEP_102
2019-02-01 13:18:12 +09:00
0f01312040 configure.ac: remove duplicate executer/user/arch/x86_64/Makefile
Change-Id: I6b4b8e636f0194e390871600d6502d3cc94f042b
2019-02-01 13:18:10 +09:00
fb9832af6d perf counters: add arch-specific perf counters
arch perf counters are placed at start, so offset all
other counters (because placing arch perf counters at the end
wouldn't have been intrusive enough?)

Change-Id: Ifab1047872384927d9cfa0a0212327ee73545c29
Fujitsu: POSTK_DEBUG_ARCH_DEP_86
2019-02-01 13:18:09 +09:00
0e895478a1 mcctrl rus_mmap: make vma->vm_flags arch-dependent
[Dominique: renamed arch_vm_flags to arch_rus_vm_flags]
Change-Id: I5ec89b3ff80af6bf0ede342eb5816df8c78de348
Fujitsu: POSTK_DEBUG_ARCH_DEP_100
2019-02-01 13:18:07 +09:00
19659aa908 mcctrl: move translate_rva_to_rpa to archdep
Change-Id: I0efa51468a7ff4d776d8340a612e6f44eac2ed53
Fujitsu: POSTK_DEBUG_ARCH_DEP_83
2019-02-01 13:18:06 +09:00
e5de0b81ca ldump2mcdump: move PAGE_SHIFT to arch-dependent includes
Change-Id: I42e49db87e375f2dc094926e21dfc00e50484855
Fujitsu: POSTK_DEBUG_ARCH_DEP_94
2019-02-01 13:18:04 +09:00
f299fff266 stack: add hwcap auxval
Fix the AUXV_LEN to account for hwcap and remove the ifdefs

Change-Id: I303fc2c5fa4c8cea7ec9823f8580b8a66de2f58f
Fujitsu: POSTK_DEBUG_ARCH_DEP_65
2019-02-01 13:17:58 +09:00
206df33658 perfctr: remove ihk_mc_perfctr_fixed_init from api
ihk_mc_perfctr_fixed_init is only used on x86

Change-Id: I6f25d4237d45b4455ccdaae03b850dd9e8edcc57
Fujitsu: POSTK_DEBUG_TEMP_FIX_31
2019-02-01 13:17:52 +09:00
ad8a3ae962 vsnprintf: reject POSTK_DEBUG_TEMP_FIX_28 return value fix
Change-Id: I23beeca094e1b0ee84211f3ed4c33ef7e2aa62c2
2019-02-01 13:16:45 +09:00
3c1fd54a92 kernel/mem: remove unused page_table struct
Change-Id: I3593bc08206d07d7c07421240f08ac3539ddc81d
Fujitsu: POSTK_DEBUG_ARCH_DEP_89
2019-02-01 13:16:42 +09:00
ca34154a43 mcexec: lookup page_size with sysconf
page size is not defiend in sys/user.h on aarch64

Change-Id: Idbdaef2519792eeb1e1a2794be0a34d67e87907e
Fujitsu: POSTK_DEBUG_ARCH_DEP_35
2019-02-01 13:16:40 +09:00
a10f4b861c do_pageout: fix direct kernel-user access
Change-Id: Ie02faca93fdb0d52d72e1f2aa1384a214c84ebff
Fujitsu: POSTK_DEBUG_ARCH_DEP_46
2019-02-01 13:16:32 +09:00
36d473c5b5 pager linux_open/unlink: always use openat/unlinkat
some archs do not have the simple open/unlink variants, while the *at
is always available -- this is simpler than making these arch-dependent
functions

Change-Id: Ic16ae5683e6e375210b1744538d291585e67a2fa
Fujitsu: POSTK_DEBUG_ARCH_DEP_78
2019-02-01 13:16:30 +09:00
342a2e1287 x86 syscalls: add a bunch of XXat() delegated syscalls
at least funlinkat is needed because these macros define __NR_x for mckernel
side and we will use funlinkat in a later commit

Change-Id: I6b6a2eee11e2fa1e42f97eab4b67e1128cd83ddf
2019-02-01 13:16:29 +09:00
238f563e88 perf: add arch-dependent counter_mask_check function
A later version would probably want to check some mask for arm64...

Change-Id: I67e13a852c3ed406fbf8ae1688539b9e069c0e81
Fujitsu: POSTK_DEBUG_ARCH_DEP_87
2019-02-01 13:16:28 +09:00
03cadbcba2 perf: add arch-dependent get_num_counters function
Change-Id: I2230af87e0c764d97115e833dccb1842946c1b94
Fujitsu: POSTK_DEBUG_ARCH_DEP_109
2019-02-01 13:16:28 +09:00
2b254f02f8 init_process_stack: change premapped stack size based on arch
Avoid consuming a large 512MB page on 64K base page arch

Change-Id: Ice491d43fd998b375ddc24f4eff7faf5d36d9f42
Fujitsu: POSTK_DEBUG_ARCH_DEP_104
2019-02-01 13:16:27 +09:00
960a6f5f90 prepare process: add magic header in program_load_desc
Check we mapped the correct region with a magic header in the struct

Original commit: d246b93a3bced92d0ac2a4a337118091b010658a

Fujitsu: POSTK_DEBUG_TEMP_FIX_76
Change-Id: If848be64af5d76844ba65b48493021637c8114f4
2019-02-01 13:16:25 +09:00
0cc3120a01 freeze(): add cpu_pause() to the frozen state loop
I guess cpu_halt is not enough on arm?... I don't get it.

Change-Id: Ic67113ae474e5b3af91734d763f1498a19f6a948
Fujitsu: POSTK_DEBUG_ARCH_DEP_82
2019-02-01 13:16:23 +09:00
9f31abf402 monitor_init: fix undetected hang on highest numbered core
Original commit: 7d38ead4f ("Fix for bug#99 Change setting value for
monitor->num_processors.")

Change-Id: I437c957fa319c014316a6064cc660e337668bb88
2019-01-29 09:32:25 +09:00
dfd23c3ebe prctl: Add support for PR_SET_THP_DISABLE and PR_GET_THP_DISABLE
Change-Id: I04c5568a9eb78bcac632b734f34bba49cf602c4d
Refs: #1181
2019-01-22 05:40:56 +00:00
eb184419ea shmget: Use transparent huge pages when page size isn't specified
Refs: #1241
Change-Id: Ia111bfeb67d224ad1ab77e5193eac7b7d14a6577
2019-01-22 05:40:56 +00:00
13e29c0da5 mcoverlayfs: fix disabled build
Change-Id: Ia40853432547084329fc034e3942e51954e1ddf5
2019-01-22 02:15:43 +00:00
8aaf0f8551 test: Add test programs for #1166
refs: #1166
Change-Id: I9b6dd8628e8a3dcb2281e31f4b8d116e9c7852d8
2019-01-08 15:15:34 +09:00
ef9fda23a9 mcexec: Set default heap extension amount to sysconf(_SC_PAGESIZE)
Change-Id: I3ac660d33918c1fa28093ab59f3a7ead65d337d7
2018-12-12 00:38:10 +00:00
cd5cb469eb Fix "Test "Error handling improvement" on arm64"
Change-Id: Ie3c835dfe65a9754628ca221f3f563b67b0eb1a0
Refs: #727
Refs: #873
Refs: #1011
Refs: #1232
Refs: #1233
2018-12-10 19:58:15 +09:00
7a8f5043c5 mcstat: Fix test description
Change-Id: I942b351146cabd259eb164b73375a547d0fd0c30
2018-12-10 09:27:28 +00:00
cf6514def9 test: Add descriptions to "user_space" test
Change-Id: Ic14ddbfbf6bfc12d40d3284ec08e040597356963
2018-12-10 12:59:20 +09:00
96b6d773a9 ARMv8.2-LPA support
Change-Id: I12a6eac55af2e7f6a643e4e04ed59a85769f4063
2018-12-07 17:41:50 +09:00
4ba4bbd711 ContiguousPTE[12/12] modify sys_shmget/sys_mmap
Change-Id: Icfbe9fbfa6216735ec20c55da95e5b62a25fdfea
2018-12-07 08:27:51 +00:00
410bf13367 ContiguousPTE[11/12] modify ihk_mc_pt_virt_to_pagemap
Change-Id: Iff0c77cdd08a76b55c2635c6b0163ef2caade71d
2018-12-07 08:24:22 +00:00
7c231928ab ContiguousPTE[10/12] modify split_largepage
Change-Id: I0a8385af9709b11d7917eb34e8612413fefe6931
2018-12-07 08:22:56 +00:00
50de3820ad ContiguousPTE[9/12] modify ihk_mc_pt_clear|free_range
Change-Id: I75d821b81d351f4fdfd504c791543db174634261
2018-12-07 08:21:44 +00:00
c4e5bf6d6b ContiguousPTE[8/12] modify page_fault_process_memory_range
Change-Id: I79ecd08cf83aeacd3e20a7720bad66ef19573402
2018-12-07 08:17:08 +00:00
c319fe08a4 ContiguousPTE[7/12] modify ihk_mc_pt_set_range
Change-Id: Ib38530ce64a01f21107e0a6a73de7c54f214eb5a
2018-12-07 08:12:44 +00:00
24d3da32ed ContiguousPTE[6/12] modify arch_get_smaller_page_size
Change-Id: I4fe8c36cf9561b3ee895f29b112f0ac6f2418f5e
2018-12-07 08:00:32 +00:00
c4fbbb6027 ContiguousPTE[5/12] modify lookup_pte
Change-Id: Ie5aa625e5a13596ff8294699d10114aeba9d991d
2018-12-07 07:59:12 +00:00
0449437c15 ContiguousPTE[4/12] modify invalidate_process_memory_range
Change-Id: Ib59f4c5d78580a1c4344ac632d3d8f68355d7058
2018-12-07 07:56:28 +00:00
639d0e496b ContiguousPTE[3/12] modify move_pte_range
Change-Id: I20878c97bea768d1f09ab0580d744a58c070be2c
2018-12-07 07:54:28 +00:00
b6de164e9a ContiguousPTE[2/12] modify copy_user_pte
Change-Id: Ie696245a8c09e87c48426bc3e74a6f049a085471
2018-12-07 07:52:17 +00:00
d1b36aab62 ContiguousPTE[1/12] add page table access functions
Change-Id: I3291c170e66592c871f316d78d71248d26748501
2018-12-07 07:51:01 +00:00
8a2f4be443 Test "user_space" on arm64
Test: Architecture dependent separation of user space access code.
Add arm64 result files.

Change-Id: I651992c0c8bcd1da8313a35eda03612405b55b89
2018-12-07 07:46:09 +00:00
8a684587fa Fix "Test "Error handling improvement" on arm64"
* Fix test to make mcexec fail to fork()

Change-Id: I9a696787b5d4ce44541a4651622e5be60f9ef355
2018-12-07 07:40:14 +00:00
05c315857c Test "Add mcstat tool" on arm64
Change-Id: I4bf1260e999c16fe7b9c339af3833ea007277889
2018-12-07 06:24:18 +00:00
1422838dd1 sysfs-meminfo: Add page size consideration other than 4KiB.
Change-Id: I88e3aa6b9537dfff21c72b4a247fda24873216cb
2018-12-06 18:45:56 +09:00
c9fc110fc6 do_kill(): fix pids table when nr of threads is larger than num_processors
Change-Id: I0f0120c67a9b0df1cdf7d3fed34dd9c656fd317a
Refs: #1235
2018-12-05 08:17:05 +00:00
ed3c138e1f test: Fix user_space, process_vm_writev01 expected value file.
Fix to check only TPASS. (Delete pagesize)
  test: pvw_003, pvw_012

Change-Id: I4f9c3c42b855d419f3db457fbb5e7865da85eee8
2018-12-05 15:51:52 +09:00
60c97d0e60 Test "mbind support" on arm64
Add arm64 result files.

Change-Id: I32e8d4e1346076683e7d55e8e928d168e439eaca
2018-12-05 11:27:03 +09:00
95e90c727e Test "Error handling improvement" on arm64
The following test set:
  execve: fix memory leak
  add: NULL check for master_channel at IKC interrupt_handler.
  Fix the check routine for elf sections (Fujitsu: POSTK_TEMP_FIX_77)

Change-Id: I16c2a341c48f6df10a4839be08b93ea16bda8fbe
Refs: #727
Refs: #873
Refs: #1011
2018-12-05 02:01:29 +00:00
ec844bb6e3 Test "fix: Bug for getrusage" on arm64
The following test set:
  fix: Bug for getrusage return incorrect ru_maxrss
  fix: Bug for getrusage(RUSAGE_CHILDREN) return parent info (POSTK_DEBUG_TEMP_FIX_72)
  fix: Bug for getrusage often return incorrect ru_stime

Change-Id: I6734b1e34565d5d2715f9901a04ba5b6f0278032
Refs: #1032
Refs: #1033
Refs: #1034
2018-12-05 01:58:44 +00:00
a11d4d7a9d Test "mcexec_destroy_per_process_data: System calls delegation can not be terminated in error when the last process that closed /dev/mcos0 is a child process." on arm64
Change-Id: I6bc3023c1fa6089bc2ca6365b59bbab384b3e1d7
Refs: #882
2018-12-05 01:43:31 +00:00
0ee446923a Test "make sure to context-switch to idle thread when therad's status is PS_EXITED" on arm64
Change-Id: I757d529e49655e9010022f10414e4d6c9eb4c059
Refs: #1029
2018-12-05 01:21:48 +00:00
01b2a1d213 Tests: dust off x86_64 mem_dest_prev
Change-Id: I445ea0e8ae2cd631c775718a86a64fd2ecb90f35
Refs: #1228
2018-12-04 10:05:39 +00:00
52cd57fed2 memory/x86_64: fix linux safe_kernel_map
init_linux_kernel_mapping is called in setup_x86_phase1 way
before arguments are setup, but we can access kernel boot args
directly and use that, so ugly fix for now.

Change-Id: I285ecc31c6646d6d18566d411b09ae3190e8101e
Refs: #1228
2018-12-04 10:05:03 +00:00
bbc39480d2 Fix test programs for "execve: fix memory leak"
* Fix README

Change-Id: I90fe1fbb26569bbab5a34638b5f357d7000eda5d
Refs: #727
2018-12-04 10:02:42 +00:00
8521b98730 execve: Call preempt_enable() before error-exit
Fix "execve: fix execve with oversubscribing".

Change-Id: I4de3f5d44b1703db392f3da75196faa1e12d5845
Refs: #727
Refs: #1072
Refs: #1232
2018-12-04 09:43:19 +00:00
da02f76a25 mcexec: Fix error handling of init_worker_threads
Refs: #1233
Change-Id: Icce49c996d69b3cf64a71e7bd470421f329c881f
2018-12-04 09:40:24 +00:00
dbe5e99cf9 Fix test of "make sure to context-switch to idle thread when therad's status is PS_EXITED"
Change-Id: I62ea813656805b6250b0465853e8fa2918b0c86b
Refs: #1029
Refs: #1227
2018-12-04 08:17:54 +00:00
6b293409e5 mbind: Fix test programs
Refs: #1226
Change-Id: I12bf807812d93b7eca8f452e70e70e7c4e32f6a3
2018-12-04 08:17:13 +00:00
b94247c478 Test "signal: When the process receives a termination signal, it first terminates mcexec." on arm64
Change-Id: I1be32b991a45f0892146d93a9e6d6be9199faf59
Refs: #870
2018-12-04 05:07:32 +00:00
556a64ac5e Test "signal: When the process receives a termination signal, it first terminates mcexec." on arm64
Change-Id: I5c8ab90ffd5c5da30162d606f4d86dca9d387b5a
Refs: #863
2018-12-04 05:06:07 +00:00
3f11c1aee5 Test "Wait for LWK to run at shutdown." on arm64
Change-Id: I96785dda7a1a7eb36ceeb31401d71b4e40efb185
Refs: #898
Refs: #928
2018-12-03 20:06:37 +09:00
de70eac619 mcstat: Fix error propagation
Change-Id: Ib4a053d5b9ba5eb0d32c46be7c7fcd0be10cb97b
2018-11-30 14:29:14 +09:00
2ba3ec8a4c mcstat: Fix memory related stats
Refs: #1237
Change-Id: I0574cd71fe3b07aeda3ef981bd82d04ce5862f4f
2018-11-30 05:18:48 +00:00
394a1ef3c5 mcstat: Fix array of status strings
More error checks are added at the same time.

Refs: #1223
Change-Id: I406066a6ba0853584d6e1820dde74721ce2682dd
2018-11-30 14:05:21 +09:00
1954aec0ea perf_event_open: Propagate return value
Refs: #1236
Change-Id: I61a4683a533fb199a73a99bc7b2e6f2638212000
2018-11-30 04:10:54 +00:00
2b1b82b242 qlmpi: Refactor test programs
Change-Id: I3dd74eda1b77aea529f9cc044177b6c29185b6df
2018-11-29 10:33:11 +00:00
502463ed9e test: Fix user_space, testing use of copy_from_user / copy_to_user
Change-Id: I2caef1ba6597f693dc4f773ef8fedbd837c45ce6
2018-11-29 19:32:04 +09:00
715f67f32f mcreboot.sh: Fix error handling of BUILDID mismatch
Change-Id: I29d78c4739679e0b3229cc6fa28816f1ceee332c
2018-11-29 19:19:09 +09:00
82a57d5f55 test: Add MCK_DIR to mck_test_config.sample.in
Change-Id: I9ed1b0433fc6b8eeb1cb024be2d33263e3283ab7
2018-11-29 12:50:29 +09:00
56abe988f3 test: Fix user_space, testing use of copy_from_user / copy_to_user
Change-Id: I2caef1ba6597f693dc4f773ef8fedbd837c45ce6
2018-11-29 11:32:42 +09:00
68c581f721 test: Fix 898 and 928
1. Catch up with the interface change in
   ihk_os_destroy_pseudofs() and ihk_os_create_pseudofs()
2. Expect ihk_os_shutdown() to return zero when the OS had been shut
   down

Refs: #898
Refs: #928
Change-Id: Ic430550ebfd5cd21164eefaed155fe769adf8395
2018-11-28 02:19:37 +00:00
6ca5aaa1fc configure: Fix BUILDID (again)
The previous commit made BUILDID use git for submodule, but for complex
git setup (e.g. worktree) and older git version or dead .git 'link' it
would blindly rely on the existence of the .git file even if git does not
actually find anything.
This would lead to possibly empty BUILDID which would fail building.

Just always run the git command, and echo the version string if it failed

Change-Id: Ied268d2150a30dc1146498e15fa8394afc8a8d0d
2018-11-27 17:15:27 +09:00
b2a58ce3e3 Test "Confirm build ID of mcexec, ihk, mckernel" on arm64
Change-Id: Ia5fa6d6d062e8d845c7fedca1b6cc50fbeab1860
2018-11-27 08:12:28 +00:00
cfcf0137eb Test "Exclude areas not assigned to Mckernel from direct map of all phys." on arm64
Change-Id: Ida0d1f13f4a14c2ee219325aaa4b2cac1476c991
2018-11-27 05:29:15 +00:00
00395d68d4 Test "mcexec additional options (h, m, n, O, stack-premap)" on arm64
Change-Id: I85d5deb0433cc1208e4b6837dcc6d6dc2a7b7b52
2018-11-27 05:12:43 +00:00
dc1f96fee3 Add set_cputime() kernel to kernel case and mode enum.
Change-Id: Id4584389f39f255335d3bf7b5606f054f108ad51
Fujitsu: POSTK_DEBUG_TEMP_FIX_84
2018-11-27 05:03:39 +00:00
c585a37440 move mcoverlayfs kernel version check from mcexec.c to configure
While we are here:
 - fix uname -r (single quote?!)
 - add compat for rhel8 (el kernel and version is 4.18)
 - also remove linux version check in mcreboot.sh, trust configure check

Change-Id: I14726d4374b0dfd941640096044ea1d5d88bfcb8
2018-11-26 12:09:00 +00:00
98aa633856 add attribute converted flag
Change-Id: I215e42fa87752d16b8c9744b02d063098cba0af7
2018-11-22 06:04:34 +00:00
ddde519263 Test "rus_vm_fault: If page fault occurs in a thread that has not processed system call offloading, incorrectly return to normal." on arm64
Change-Id: I3dc98d8994228ad27cfdf9ca96a0a76e544bc947
Refs: #923
2018-11-22 05:27:56 +00:00
f240671fc8 Test "ptrace: support for attaching child_process to parent" on arm64
Change-Id: I752542b6bfbf023d22e91f909518660afbff813c
Refs: #885
2018-11-22 04:54:29 +00:00
cf113d392a Test "/proc/PID/maps support add" on arm64.
Change-Id: I0585ae6257b5c0269760dd7f23ba75b83dd7ac2c
2018-11-22 04:53:04 +00:00
9e57db5427 Test "sigaction: support for SA_RESETHAND on x86_64" on arm64
Change-Id: I6154134d53d1ee0344e4bc344f302ffaf810c618
Refs: #1031
2018-11-22 04:51:36 +00:00
739472bd86 Test "xpmem: support for fork()" on arm64
Change-Id: I12c628312157f35e239d3c5e67fa38adf156406b
Refs #925
2018-11-22 04:50:58 +00:00
136b749349 configure.ac: Fix BUILDID
Change-Id: Id9717422c3d5d2de51570d4672864dbd271ad0fc
2018-11-21 17:02:45 +09:00
ae9a1f39df ihk_ikc_recv: Record channel to packet for release
ihk_ikc_release_packet takes the channel and puts the packet into its
free-list.  This fix makes it easy and safe to identify the proper
channel.

Change-Id: I5584b1e8a3ed675c2f9d68f0b5ed331b909197f6
Fujitsu: POSTK_DEBUG_TEMP_FIX_89
2018-11-21 17:01:58 +09:00
10dc87dd3f mcreboot: check on SELinux
Change-Id: I2c3706c04c7977ec22407358232d7c3a21abdc14
2018-11-21 07:52:10 +00:00
724e0eb7d0 mbind(): Fix memory_range_lock deadlock.
Fixed the problem of "return error/goto out" while
locking the memory_range_lock in mbind().

Change-Id: I980a7a440f652b60379acae3cb3575211a749774
Fujitsu: POSTK_DEBUG_TEMP_FIX_100
2018-11-21 16:49:48 +09:00
04e0456232 set_mempolicy(): Add mode check.
Fix a problem that does not result in an error even
if MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES are
simultaneously specified in set_mempolicy() mode.

Change-Id: I06e695baf869daee8bc64179748cac27b64e914b
Fujitsu: POSTK_DEBUG_TEMP_FIX_99
2018-11-21 16:49:40 +09:00
6626204c99 set_cputime(): interrupt enable/disable fix.
Check interrupt enabled state in set_cputime() instead of enabling
them unconditionally on exit.

Change-Id: I99212855f33f5535f67f045665bf5e025c55b690
Fujitsu: POSTK_DEBUG_TEMP_FIX_98
2018-11-21 16:49:30 +09:00
190039f5d9 arch_cpu_read_write_register: error return fix.
Fixed an issue where errors generated in arch_cpu_read_write_register()
are not transmitted to the caller.

Change-Id: I05d7d872eab834918220cf18f628aee37208a156
Fujitsu: POSTK_DEBUG_TEMP_FIX_94
2018-11-21 16:49:21 +09:00
583cb94667 mcctrl: remove in-kernel calls to syscalls
Since 4.17.0, kernel cannot call syscalls directly because the calling
convention can be different on x86_64, as explained in this email:
https://lore.kernel.org/lkml/20180325162527.GA17492@light.dominikbrodowski.net

Use the ksys_* alternatives instead when possible, or for readlink use
do_readlinkat (and use readlinkat all the time to simplify ifdefs)

It might be possible to change some of these without ifdefs, but for
example ksys_unshare only got introduced in 4.17 so we need to keep some
syscall calling...

Change-Id: Ic47e184b29ef8b21731b2eae6193b0af2548b872
2018-11-21 16:42:26 +09:00
db4d19e419 Add crash utility extension
Change-Id: Ia3dadecdd4605c3ee74d1b5242f67486c675faa7
2018-11-21 07:40:00 +00:00
04c11f35e9 xpmem: Add xpmem_openat
In arm64, glibc-open of /dev/xpmem is hooked in sys_openat. This
commit adds xpmem_openat which is called by sys_openat.
This commit silently applies copy_from_user fix to sys_open as well.

Change-Id: I3b4f7bf0e152c359250bb2b56910db9192390cb1
Fujitsu: POSTK_DEBUG_ARCH_DEP_46, POSTK_DEBUG_ARCH_DEP_62
2018-11-21 07:39:56 +00:00
e12d5ed341 Expose McKernel version in /proc/mckernel
Change-Id: Ica0fbb0ff70a4ff2559e92738926279a3ae78a21
2018-11-21 07:39:54 +00:00
1253f4d18c mcexec shebang: delete spaces *before* path as well
Apparently, a shebang '#! /bin/sh' should work.
Will add some ostests for these...

Change-Id: Iab8ba8e3cc7e434c98742f71fe7db3c425f08278
2018-11-21 07:39:51 +00:00
527adedaa3 madvise: Add MADV_HUGEPAGE support
Since McKernel allocates hugepages by default, we could consider that
madvise call with MADV_HUGEPAGE is supported.

Change-Id: Ibdaa6f77416d029a1d17210773ef79539ba04b1c
2018-11-21 07:39:26 +00:00
525b90d028 flatten_string/process env: realign env and clear trailing bits
envs are stuck after args which are now possibly unaligned, and used
from a non-aligned pointer in prepare_process_ranges_args_envs (env)

The memory immediately after args/envs is copied anyway with memcpy_long,
so make sure the bits are initialized and realign env correctly

Fixes: 70e52faf36 ("flatten_strings: do not return unused trailing bits")
Change-Id: Ic747e947d151c0eea65dec36bc9c888cf6e0c394
2018-11-21 07:39:16 +00:00
38e68f358a Add kernel argument to turn on/off time sharing
Add "-T 0" to mcreboot.sh if you want to turn off time sharing.  When
it's turned off, McKernel doesn't activate interval timer when the
length of per-CPU run-queue is larger than one.

Change-Id: I2cedc1b30a9cd9a0f4608a32ecec0a0d58c6225e
2018-11-21 07:37:01 +00:00
7a3f4d7501 mcctrl rhel8 compat: remove unneeded RHEL_RELEASE_CODE check
it was meant for 3.10 kernels, so the regular < 4.0.0 check
will work for el7 and older kernels as well

Change-Id: I807f030f6303c9c3d17b0d80de55c256a3479486
2018-11-21 07:36:50 +00:00
1a5b10277f mcexec: load_elf: disable execvp for within-mckernel execs
the libc takes care of trying execve as many times as needed for
execvp, it's not a kernel call.

Also, sneak a double-free fix (desc was not reset properly in case
load_elf_desc_shebang failed)

Fixes: b1681f4a3affff ("mcexec/execve: fix shebangs handling")
Change-Id: If8e3d7ae53acdeffc0331ae8621e0832fcfa406f
2018-11-21 16:17:58 +09:00
a59c55c188 mcexec load_elf_desc: print error after returning
Running "mcexec dfsafds" did not print any message in normal use.
Rather than looking for which message shows in debug and turn in into
eprintf, add a single coherent message (more shell-like) at the end and
turn other messages off.

There is a small loss of information but this is equivalent to what
shells give (a single errno value with no details), and it is now easy
to add --debug to mcexec to see more information if required

Change-Id: Id2c3a47880b7d1d7467883351e6e7af561f91bbf
2018-11-21 16:17:58 +09:00
1d6a078afa mcexec: add --debug-mcexec
We already have debug statements compiled in, add a toggle for it
Also fix case indent for 's'

Change-Id: I1104ee57d571b82ec5e061f22cd44033a5c7fc39
2018-11-21 07:16:54 +00:00
fb98664f49 clone_thread: Add arch_clone_thread()
Fujitsu: POSTK_DEBUG_ARCH_DEP_23
Refs: #969
Change-Id: Ic15765b8c9e956c95fc50b333b01464d87450d3c
2018-11-21 07:10:01 +00:00
9db8d115d9 overlayfs: rhel8 compat for the 4.18 version
rhel8 is a 4.18 kernel but they've already backported some later fixes.
Instead of relying on the kernel version, the changes removed some defines so
we can check for the define presence to make the code more robust to kernel
version wilderness instead

Change-Id: I6cf5548a7b73a7394405daf850f715a1e20ab0b4
2018-11-21 16:06:31 +09:00
e26e693e58 mcoverlayfs: update and compile new overlayfs for 4.18 kernels
This newer version is much simpler than the old ones:
 - the options are noop, this lets the code simplify all the allocating
of a new option struct and passing it around
 - ovl_reset_ovl_entry was added and called all the time, but the
mechanism that made this required is gone in this kernel version

On the other hand, one new thing in this version:
 - newer kernel check the stacking depth of filesystems now, and we are
reaching the default limit of two with our setup. Bump it to three here.

Also, while we are here, make make fail if requested directory does not
exist, instead of infinitely recurse into make modules in the mcoverlayfs
directory...

Change-Id: I45050d693a0aa6fd3027deaf417c29876ef6a1ea
2018-11-21 16:06:31 +09:00
fc2775c932 mcoverlayfs: add new base from 4.18.14
This just lays out new files so the next commit is easier to review;
nothing changes here

Change-Id: I66669877d2d10632f5436c0eeb32248cd4c8b996
2018-11-21 16:06:31 +09:00
6581f9b4b2 mcctrl syscall: compat for newer zap_vma_ptes
newer version of this function no longer return an error on the basis
that "no-one checks what it returns anyway"........

See linux 4.18's 27d036e33237e ("mm: Remove return value of zap_vma_ptes()")

Change-Id: I8fb9f060e3e145cc2db21738585c9ee7f1445f74
2018-11-21 16:06:31 +09:00
3a90521489 mcexec: fix strncat bounding
strncat must not look at the appendee's length, but at how much
is left where we're appending.
This API is stupid anyway, where is strlcat when we need it...

Change-Id: Icdf418083146420a06f8ba5ffdf882982610d39b
2018-11-21 16:06:31 +09:00
03802052ed mcctrl: add handling for one more level of page tables
newer linux got a 5 level page table now, try to handle that.

Some of the macros will be no-op (e.g. loop only on one iteration) on
architecture/kernels with only 4 levels but the code needs to be there
to compile

Change-Id: Ifc6304cbb066dce7d4e30962687ae05d7e034730
2018-11-21 07:03:24 +00:00
c21485d427 mcctrl: include linux/cred.h
The headers defines __task_cred and other macroes we use, and always
existed; we must have gotten it indirectly on older kernels, it doesn't
hurt to always include

Change-Id: Iacfff0365e7a21e6247eea42606bbbf1dfccc077
2018-11-21 06:38:08 +00:00
18d50e48dc mcctrl: lookup for alternate syscall names
on newer x64 kernels (config option?), syscalls can be renamed to allow
both x64 and ia32 versions to coexist. Lookup either names

Change-Id: I2f55cc804d3eee948ee1ed6d18c69c75bd2f652c
2018-11-21 06:38:08 +00:00
a2be475ae4 mcctrl control: replace cpu_isset by cpumask_test_cpu for new kernels
Change-Id: I60635118e5ce7281de97e024c626ac40d1a4aa36
Fujitsu: POSTK_DEBUG_ARCH_DEP_54
2018-11-21 06:38:08 +00:00
38f683d1d0 mcctrl control: task start_time changed to u64 nsec
Change-Id: I1128c20cf836d20b6e84d7ec58cf8dfb075297da
Fujitsu: POSTK_DEBUG_ARCH_DEP_74
2018-11-21 06:38:08 +00:00
59828db5c9 mcctrl archdeps: rename vdso_image_64 to _vdso_image_64
The symbol appears in some header in some linux version,
it's still not exported so we need our own lookup anyway; just rename it.

Change-Id: Ia4bce85988641c96fa3f5a0ae1d42c25c713b6c2
2018-11-21 06:38:08 +00:00
1a3c73468f shmobj: Fix rusage counting for large page
Fujitsu: POSTK_DEBUG_TEMP_FIX_88
Change-Id: I852fe804bddf6da5b93a2ac72b0461ee63c98d46
2018-11-21 04:51:57 +00:00
85c936a6cb mcexec: fix terminating zero after readlink()
Change-Id: Icb5432f157ceb2182d93e2d327cfa63ad02a8c0e
2018-11-08 17:01:22 +09:00
6f9fef2b13 procfs: Make /proc/<PID>/mem unwritable
refs: #1177
Change-Id: Ibb319221155547febf9126e05a9e322bd9f140cc
2018-10-26 08:58:31 +00:00
cc1d39e55d mcctrl_perf_enable: Fix type of integer constant
Change-Id: Ib98eca85a9962520dafdd08b8fc223a6a83bafd0
2018-10-24 14:56:26 +09:00
fd8bed670e ihk_os_setperfevent: Return number of registered events
In addition to that, mcctrl_perf_set is modified so that it updates
usrdata->perf_event_num with number of registered events.

Change-Id: I3f343176f55b06d3baab0b0fe34e240f39706cf6
Fujitsu: POSTK_DEBUG_TEMP_FIX_80
2018-10-24 06:16:41 +00:00
24a3b236a0 Update .gitmodules to point IHK at github
Change-Id: I712f4cf2fb012d2b268f0881a156268024df57b9
2018-10-24 11:20:13 +09:00
27e55b8cf1 mcreboot.sh: Fix error reporting for missing argment
Change-Id: I3af99d7a117d4401c2e0a143fa74513094a53302
2018-10-18 12:06:58 +09:00
70e52faf36 flatten_strings: do not return unused trailing bits
Trailing bits were displayed in proc->saved_cmdline, displaying
uninitialized data to the user in /proc/<pid>/cmdline

Change-Id: I74831c8c68dd2f2197b35e9b49aaaae29c4c1dd5
2018-10-15 08:35:50 +00:00
8db36c3828 mcexec: do not resolve links in lookup_exec_path
This would incorrectly make "mcexec sh -c './script.sh'" run with
/bin/bash instead of /bin/sh (which is important, because bash behaviour
changes depending on how it is invoked)

Change-Id: I80610cf442c6c3ecacfa23e8ed15652bc8d4e3f7
2018-10-15 08:35:41 +00:00
06dd71a7e0 Revert "procfs: add '/proc/pid/stat' to mckernel side and fix its comm"
This reverts commit b70d470e20.

That commit had been landed too fast after a mistake during migration
from old to new gerrit that didn't keep -1 vote ; it needs some fix

Change-Id: Ifc8a23e42449dfe471049270b4706e9b137e096e
2018-10-12 10:54:14 +09:00
01fe83dcb3 do_mmap: change addr to uintptr_t
Change-Id: I7df45e125387083aef7e62b046c20b7422f60f22
2018-10-11 09:24:23 +00:00
c86d168165 procfs: handle 'comm' on mckernel side
Change-Id: Ie68514ba3e5161b931b88eeee9e8a2267ee69354
2018-10-11 09:19:42 +00:00
a032dc3d1b procfs: use length from snprintf instead of recomputing
Change-Id: I75ba4cf5c2e94798d183728c11bb34032cdddf5a
2018-10-11 09:17:58 +00:00
201fa7fb55 fork: copy saved_cmdline from parent process
This fixes empty children names for forked children.

Change-Id: I9512f0981d2a241c106ee3e8500f2084ef61a660
2018-10-11 09:14:14 +00:00
dd676f7149 saved_cmdline: only allocated necessary space
Change-Id: Ibb3fe66b46485a28c15e45dca9213f42f5afaa1c
2018-10-11 09:13:15 +00:00
a751e96b1a Add mck_num_processors symbol pointing to num_processors
the 'num_processors' symbol is also used by linux, so trying to load all
symbols from linux and mckernel at the same time renders either symbol
inaccessible (the first to be seen is kept by default).

This provides an alternate name for the mckernel symbol, thus letting us
access both more easily if required.

Change-Id: I8074d4f9f9ac45717df9a8df16be710ff762e161
2018-10-11 09:12:04 +00:00
c3bfa3f6a9 move BUG_ON, panic and kprintf define to debug.h; add BUILD_BUG_ON
these functions are more logical to keep together there as they depend
on each other.

Also add a comment about the __printf attribute, if we have a quiet
period it would be useful to enable and clear the thousands of
warnings...

Change-Id: I47d3891c9cd87da28b2883c29384959f5abd1459
2018-10-11 09:03:53 +00:00
1e1fa4f70d trivial warnings fixes (unused variable/function)
Change-Id: I71cedd2c09eeb5d2c2fd2e988dfdde0877627abc
2018-10-11 09:03:53 +00:00
39f9d7fdff Handle hugetlbfs file mapping
Hugetlbfs file mappings are handled differently than regular files:
 - pager_req_create will tell us the file is in a hugetlbfs
 - allocate memory upfront, we need to fail if not enough memory
 - the memory needs to be given again if another process maps the same
   file

This implementation still has some hacks, in particular, the memory
needs to be freed when all mappings are done and the file has been
deleted/closed by all processes.
We cannot know when the file is closed/unlinked easily, so clean up
memory when all processes have exited.

To test, install libhugetlbfs and link a program with the additional
LDFLAGS += -B /usr/share/libhugetlbfs -Wl,--hugetlbfs-align

Then run with HUGETLB_ELFMAP=RW set, you can check this works with
HUGETLB_DEBUG=1 HUGETLB_VERBOSE=2

Change-Id: I327920ff06efd82e91b319b27319f41912169af1
2018-10-11 08:54:13 +00:00
3e3ccf377c compiler.h: add READ_ONCE/WRITE_ONCE macro
These macros are needed to make sure the compiler does not optimize away
atomic constructs such as "while (!READ_ONCE(foo))" loops that do not
modify foo within the loop

Also move the barrier() define where it belongs while we are here, it is
needed for READ_ONCE/WRITE_ONCE and including ihk/cpu.h here causes
include loops

Change-Id: Ia533a849ed674719ccbc0495be47d22a3c47b8f8
2018-10-11 08:54:13 +00:00
13e71ac9dc pager: minor cleanups
- remove unused MF_END (that only makes sense for enums without holes,
  this one is a set of bits masks)
- remove useless goto in pager_req_create()
- init maxprot to 0 from the start, it's not used in the error cases
  (except for debug print)

Change-Id: Ic56c0754824b99f8a7e45fa8e99b8fe3e7c7e592
2018-10-11 08:54:13 +00:00
b1681f4a3a mcexec/execve: fix shebangs handling
There were mainly two problems with shebangs:
 - Suffix arguments handling e.g. '#!/bin/sh -x'
 - Recursive handling e.g. script1 fetchs '#!/path/to/script2'
and script2 itself has a shebang
 - (did I say two?) running shebang would replace argv[optind] instead
of appending e.g. script with '#!/bin/sh' and running './script -c'
would run '/bin/sh -c' instead of '/bin/sh ./script -c'

There also are two places where this needs parsing:
 - starting a fresh program from mcexec
 - starting a new program from execve in mcexec

The first was easy to fix as we already had argv around, but the later
required a new way to transfer the 'new argv elements from the script'
to mckernel to append before its argv -- it used to be 'desc->shell_path'
but that was no longer used at some point and just one keyword is not
enough to handle this properly.

This commit does:
 - Refactors the lookup_path + load_elf_desc that was only done at most
twice in its own function that loops indefinitely and use that in both
situations described above
 - Transmits the argv addition in the transfer to mckernel after the
desc; mckernel allocates 4 pages (hardcoded) for the descs and we will
hopefully have room for the script arguments on top of that... (there is
no guard!!!)
 - Change flatten_strings to allow prepending a flattened string instead
of a single string.
Note that the flatten_string change also brought in a difference in the
format, to have the full length embedded within the string, the latest
slot that used to be zeroes now contains the position of the end of the
buffer (where the last+1 string would be if there had been one)
This required a trivial change in mckernel prepare args function that
used this property for no real reason.

Hopefully things work™, this probably warrants adding a couple of new
ostests...
 - create a couple of scripts with recursive invocation/arguments and
check their own argv.
 - execute "mcexec script args" and "mcexec sh -c 'script args'"

Change-Id: I2cf9cde5c07c9293f730de89c9731bd93dbfa789
Refs: #1115
2018-10-04 14:31:02 +09:00
1226e692d9 mcstat: Install mcstat.1
Change-Id: Id5af2f56ef9cc9c444bfc0500190f52ffc779936
2018-10-04 02:52:18 +00:00
73ea4b1ce9 ihk_os_getperfevent,setperfevent: Return -ETIME when IKC timeouts
Change the return value from -EINVAL to -ETIME.

Refs: #1167
Change-Id: I87fa57bb45d0036b7e4b25366aa7b7ce6fb2c764
2018-10-04 02:44:22 +00:00
09f663c246 mcctrl procfs: check entry was returned before using it
Change-Id: If66e95d217d1045e2e65bc5978bba020e3fa7c0d
Refs: #1116
2018-10-04 02:41:16 +00:00
9b77630c8b mcexec: readlink and use full path for reexec
This fixes comm on linux side, showing mcexec instead of 'exe'

Change-Id: I9345d7a23dccb36b3a1e17fd3e7491eaeca54e5b
2018-10-04 01:03:10 +00:00
b70d470e20 procfs: add '/proc/pid/stat' to mckernel side and fix its comm
This lets ps show the proper executable name instead of mcexec's comm
on linux side

Change-Id: I62732037451f129fc2e905357ebdc351bf7f6d2d
Refs: #1114
2018-10-04 01:01:19 +00:00
ecc850dfef procfs/do_fork: wait until procfs entries are registered
Do not return from fork() until mcctrl side has created mckernel's
procfs entries for the child PID.

This fixes programs doing fork() immediately followed by opening
/proc/<child pid>/something, and would get some error

Refs: #1189
Change-Id: Ie10ea56b65c55f59e96a1ab6ef83a1070e36048d
2018-10-04 01:00:52 +00:00
b11377f2e9 Increase IKC master channel size
Change-Id: I183878bb22b848e1230f8028947cf46485293471
2018-10-03 06:23:17 +00:00
ed1edb152b ptrace supports threads
Fujitsu: POSTK_DEBUG_TEMP_FIX_53, POSTK_DEBUG_ARCH_DEP_44
Refs: #771, #1179, #1143
Change-Id: Ie17ece6864f0eeb0c0e550f4e369abb77980a0d0
2018-10-01 03:57:16 +00:00
28c434a230 test: Fix test for 898 and 928
Change-Id: If939dda7ccdcf568abfa42ccab7ff6be2b983cc2
2018-09-28 02:55:55 +00:00
daa234d8b9 mcexec_create_per_process_data: use copy_from_user
Refs: #1205
Change-Id: Idced73a7f88aada5fc2462b490d56603f8fe2472
2018-09-27 15:42:01 +00:00
e803698618 test: Refactor test programs
Change-Id: I77fec2f5f30f6fda3bda6f85ce00f1c2e7f7a9b3
2018-09-25 12:45:20 +09:00
c862b29d65 sched_setaffinity: Check migration after decrementing in_interrupt
refs: #1180
Change-Id: I2b3fb03066812ecc802406297084977e757092fe
2018-09-25 01:52:54 +00:00
dd58d366c3 procfs: Fix pread/pwrite to procfs fail when specified size is bigger than 4MB
Fujitsu: POSTK_DEBUG_TEMP_FIX_43
Refs: #1018
Change-Id: I736ac69885695ef8eeababc3fcfe69a6258b4e16
2018-09-20 02:06:17 +00:00
ab284b0531 test: Add test programs for #1158
refs: #1158
Change-Id: I853dd84f5433a01da510813e9fb1276e5477f73f
2018-09-20 02:05:55 +00:00
42b9b31606 mcctrl: Propagate writecore()'s return value to caller
Fujitsu: POSTK_DEBUG_TEMP_FIX_62
Change-Id: I847dd520187cbf66fbad8140f79f62c6d5d9d5fc
2018-09-20 11:01:22 +09:00
29c5c68761 coredump: Change type of coretable.len to loff_t from int
Fujitsu: POSTK_DEBUG_TEMP_FIX_61
Change-Id: I6a27a8d477c3b3dcc12be772a15dfcff370bd2a8
2018-09-20 11:01:22 +09:00
38c08a6663 coredump: Add O_TRUNC to flags opening corefile
Fujitsu: POSTK_DEBUG_TEMP_FIX_59
Change-Id: I36c89fa894dfc0cdd170781e8ca4aab6149d4928
2018-09-20 11:01:20 +09:00
57258e7f59 coredump: Don't dump when MCK_RLIMIT_CORE is zero
Fujitsu: POSTK_DEBUG_ARCH_DEP_67
Change-Id: Ic85c793b052cde9d7fa4fe510c5daee303d370c4
2018-09-20 01:51:18 +00:00
8c33c92720 mcctrl: Switch Linux functions/structures according to the version
For get_user_pages_remote in binfmt_mcexec.c:
In 4.10 with 5b56d49fc31d ("mm: add locked parameter to
get_user_pages_remote()")
In 4.9 with 9beae1ea8930 ("mm: replace get_user_pages_remote()
write/force parameters with gup_flags")

For vmf in syscall.c, these two patches in 4.10:
82b0f8c39a38 ("mm: join struct fault_env and vm_fault")
1a29d85eb0f1 ("mm: use vmf->address instead of
vmf->virtual_address")

Fujitsu: POSTK_DEBUG_ARCH_DEP_41
Change-Id: I89a02d03169a2162ea186da1804bf48910446d11
2018-09-20 01:50:04 +00:00
a269d96978 coredump: Exclude special areas
Fujitsu: POSTK_DEBUG_TEMP_FIX_38
Refs: #1005
Change-Id: I8934d2aecf06a09469afe131347e42b48b6f67f6
2018-09-20 01:48:17 +00:00
2910818f06 execve: Fix calling ptrace_report_signal after preemption is disabled
Change-Id: I451d28d985ab330d855501597713e982b8febf4e
Refs: 1194
2018-09-20 01:31:31 +00:00
3df82d61ce test: Fix tests of "user_space"
user_space/swapout/swapout_copy_to_01.sh:
* Use ~/.mck_test_config
* Fix checking if McKernel version is written in swap-file

user_space/futex/futex_test.sh:
* Use ~/.mck_test_config

user_space/perf_event_open/perf_event_open_test.sh
* Use ~/.mck_test_config

Change-Id: Id93b207ed0e3e9ebf307073db81b40335bc5b140
2018-09-19 08:54:08 +00:00
159092c58e rusage: Refactor test programs
Change-Id: I846a6416acf903f7fa19db98d4d937c51c10b4af
2018-09-18 18:42:19 +09:00
60011718d2 add common test framework
Add new file with common functions for tests to use.

 - loads config file
 - checks for mcexec etc
 - checks for LTP and OSTEST if required
 - handle mcstop / mcreboot if required, and provide function for it

At the same time, make a few changes to mck_test_config:
 - move to ~/.mck_test_config
 - add boot params to the config, tests the require specific params can
   overwite it
 - make the config "set-if-variable-is-empty", so someone can overwrite
   any param by setting the environment value e.g. LTP=.... ./test.sh
   will use the value given

Change-Id: Ib04112043e3eb89615dc7afaa8842a98571fab93
2018-09-14 03:30:06 +00:00
7e342751a2 do_syscall: Delegate system calls to the mcexec with the same pid
This includes the following fix:
send_syscall, do_syscall: remove argument pid

Fujitsu: POSTK_TEMP_FIX_26
Refs: #1165
Change-Id: I702362c07a28f507a5e43dd751949aefa24bc8c0
2018-09-13 16:59:47 +09:00
c23bc8d401 syscall_time: Handle by McKernel
refs: #1036
Change-Id: Ifa81b613c7ee8d95ae7cdf3dd54643f60526fa73
2018-09-13 07:44:02 +00:00
5e760db417 syscall: the signal received during system call processing is not processed.
Refs: #1176
Fujitsu: POSTK_DEBUG_TEMP_FIX_56
Change-Id: I410160ccbcef3ef49a0e37611a608bc87c97e63b
2018-09-13 07:04:11 +00:00
e4da71010c check_signal: system call restart is done only once
Fujitsu: POSTK_TEMP_FIX_66
Refs: #1009
Change-Id: Ic0f04ac6b7f6c6bb01b55fb389bf9befd56b1dd9
2018-09-13 07:00:49 +00:00
c25fb2aa39 memobj: transform memobj lock to refcounting
We had a deadlock between:
 - free_process_memory_range (take lock) -> ihk_mc_pt_free_range ->
... -> remote_flush_tlb_array_cpumask -> "/* Wait for all cores */"
and
 - obj_list_lookup() under fileobj_list_lock that disabled irqs
and thus never ack'd the remote flush

The rework is quite big but removes the need for the big lock,
although devobj and shmobj needed a new smaller lock to be
introduced - the new locks are used much more locally and
should not cause problems.

On the bright side, refcounting being moved to memobj level means
we could remove refcounting implemented separately in all object
types and simplifies code a bit.

Change-Id: I6bc8438a98b1d8edddc91c4ac33c11b88e097ebb
2018-09-12 18:03:25 +09:00
b51886421e uti: Don't compile syscall_intercept related stuff when not specified with configure option
Change-Id: I9be8cb9b3fcae78d33a33b057c43caee23a81fc1
2018-09-05 16:29:20 +09:00
22c6c5c736 do_syscall: Call schedule() when runq_len > 1
This optimization make the offloading thread quickly yield to
another thread. Without this, it yileded only after the interval timer
set the rescheduling flag.

Change-Id: Ida3b17ed94782d5d1af0185a96b1f50d9db8d244
2018-09-04 19:53:03 +09:00
cd00fc3a78 set_timer: Start timer when runnable thread count is bigger than one
Change-Id: Ie32799fff2936ffc057f166db5681edccdbf5920
2018-09-04 19:53:03 +09:00
00a34a8ba3 uti: util_thread: Hoist uti_desc check
Change-Id: I8c4b75140df2fe149dfe20e0a8f0bf323b5f1763
2018-09-04 19:53:03 +09:00
8900c2cec5 uti: mcexec_uti_attr: Fix CPU binding decision
Change-Id: I4047858895503ae912e5575bb232dbbb2f915722
2018-09-04 19:53:03 +09:00
fca02ee248 uti: Add error checks to kmalloc of struct uti_attr 2018-09-04 19:53:03 +09:00
781a69617b uti: Replace data types represented as arrays with C structures
Defining C structures for the following objects:
(1) Remote and local context
(2) Stack of system call arguments / return values

Change-Id: Iafbb6c795bd765e3c78c54a255d8a1e4d4536288
2018-09-04 19:53:03 +09:00
04d4145b3e uti: Replace dead uti thread with new mcexec thread in proc->tids
Change-Id: Ic6e906dd1bfac1b07f1317732cbe0a5191831cd8
2018-09-04 19:53:03 +09:00
96aab7e215 uti: Cosmetic change in util_thread
Change-Id: I8aa75efa4dbfb798e40e75f76bacbd184dae23b8
2018-09-04 19:53:02 +09:00
98ee584ab6 uti: Change field name of release_user_space_desc
Change-Id: I18ada86ec3835198c1a947d8ceb36075d6ff2e94
2018-09-04 19:53:02 +09:00
6b031c5472 uti: Fix condition for pthread_join of mcexec threads
Change-Id: Iaeee91c197b84436f84ce4380768aa79e7f9419e
2018-09-04 19:53:02 +09:00
e42c414454 uti: Hook system calls by binary-patching glibc
(1) Add --enable-uti option. The binary-patch library is
    preloaded with this option.
(2) Binary-patching is done by syscall_intercept developed by Intel

This commit includes the following fixes:

(1) Fix do_exit() and terminate() handling
(2) Fix timing of killing mcexec threads when McKernel thread calls terminate()

Change-Id: Iad885e1e5540ed79f0808debd372463e3b8fecea
2018-09-04 19:53:02 +09:00
e613483bee uti: Add system call profile 2018-09-04 19:53:02 +09:00
c0271f4727 Add debug messages for per-process data 2018-09-04 19:53:02 +09:00
4969762f15 uti: Add usage of uti specific options to mcexec 2018-09-04 19:53:02 +09:00
09d3648e43 uti: Set PROT_EXEC to host VMA when PROT_READ is set
Set PROT_EXEC to host VMA because uti needs PROT_EXEC for text VMAs.

Meanings of prot bits of Host VMA has been changed as follows.
   RWX: No mapping or RW mapping
   RX: Read only mapping
2018-09-04 19:53:02 +09:00
4e905cd412 uti: do_syscall: Don't warn when proxy is gone
This is because this is a normal case since terminate() is changed so
that it first kills all mcexec threads and then kill McKernel threads.

Change-Id: I88380bf28b60645d361baded525d71105235c16f
2018-09-04 19:53:01 +09:00
8c11daf726 uti: Fix signal relay from mcexec to McKernel
Change-Id: I2ffd8049a0fb1637cfc6bab7fe24c6a85e5e53fc
2018-09-04 19:53:01 +09:00
5cb8a1f10f uti: Workaround not to share CPU with OpenMP threads
* Assign uti thread to the last idle CPU so that it's not shared with
  an OpenMP thread

Change-Id: Ia42cae056ce81fde9b6dab6286b39a52f3c9e172
2018-09-04 19:53:01 +09:00
dbba7dea18 uti: Allow only the first do_fork() call to create a uti thread 2018-09-04 19:53:01 +09:00
b6ab5911b7 uti: Identify uti thread by clone count
--uti-thread-count <count> is added to mcexec.

Change-Id: Id9ec464412a5bb71e4d9e87d05f79de22d35b067
2018-09-04 19:53:01 +09:00
b0d7f890d0 uti: Reverse-offload msync() 2018-09-04 19:53:01 +09:00
b9c0cdddab uti: Cosmetic change 2018-09-04 19:52:14 +09:00
7ee7dd5e2c uti: Allow tracer to call release_handler() for the main process
Change-Id: I934a6eefbcb87473e87c109d6b4d32c7ab486894
2018-09-04 19:52:14 +09:00
07db4a80a7 __do_in_kernel_syscall: Move ihk_ikc_release_packet from mcexec_wait_syscall
Change-Id: Ieeb5fda42dbddc9da27242f4b547c2143659f97a
2018-09-04 19:52:14 +09:00
f04e5c24ab uti: Don't call mcexec_terminate_thread() when McKernel asks mcexec to interrupt system call 2018-09-04 19:52:14 +09:00
b8bacdd2de Reference counting per-thread data
It is accompanied by the following fixes:
(1) Fix put ppd locations in mcexec_wait_syscall()
(2) Move put ptd to end of mcexec_terminate_thread_unsafe() and mcexec_ret_syscall()
(3) Add debug messages for ptd add/get/put
(4) Fix ptd-add/get/put matching in mcexec_wait_syscall()
    * Skip put when woken-up from wait_event_interruptible() by signal

Change-Id: Ib9be3f5e62a7a370197cb36c9fa7c4d79f44c314
2018-09-04 19:52:14 +09:00
a121ffc785 uti: Release packet of reply from McKernel in backward_offload() 2018-09-04 19:52:14 +09:00
88f9693390 uti: Return -ENOSYS without offloading for set_robust_list()
Change-Id: I43466e3850fd2ad68e5754d1d460438fa47f3ed4
2018-09-04 19:52:13 +09:00
124ec580a0 uti: Call do_exit when tracer isn't working and do_syscall returned -ERESTARTSYS 2018-09-04 19:52:13 +09:00
af7f61db49 uti: mcexec: Fix error check of pthread_detach
Change-Id: Idda8e060641bbd7b01c50163140a2c5f7466d193
2018-09-04 19:52:13 +09:00
ee299b5780 uti: Check size of syscall arguments for syscall_intercept
Change-Id: I747b90e1f521b08266cfc021ef4b23e2e3c7ba4c
2018-09-04 19:52:13 +09:00
c60a778c8d uti: Zero-clear struct mckernel_exec_file before initialization
Change-Id: I315008b7f5c9e66a93b80da87d1a6332d717c2aa
2018-09-04 19:52:13 +09:00
25a129ea6a uti: Disable jumping to McKernel futex code 2018-09-04 19:52:13 +09:00
8e9924c523 uti: Lock per_thread_data_hash_lock in mcctrl_put_per_proc_data() 2018-09-04 19:52:13 +09:00
c71291a429 mcctrl: Add mcexec_terminate_thread_unsafe()
Change-Id: I6ca54cdac2ab9449d40b22f7329f1a215e5aa33b
2018-09-04 19:52:13 +09:00
ba93b83d68 uti: Add __user to mcexec_terminate_thread argument
Change-Id: Ic96a91e6a892a1bd2f1d333580e28bced6a40dc0
2018-09-04 19:52:13 +09:00
c2f41ca9ad uti: Replace hand-made list of host_threads with Linux macro
Change-Id: Ib46cc9fcdd2854b7bbe21c2cc885beeb22d16dd2
2018-09-04 19:52:13 +09:00
062d7ecae3 uti: Use copy_from_user() in mcexec_terminate_thread() 2018-09-04 19:52:12 +09:00
58d038fcac uti: Fix wrong argument passed to ihk_ikc_release_packet() in mcexec_terminate_thread() 2018-09-04 19:52:12 +09:00
510310342c uti: Use fresh struct syscall_request instance when replying to syscall_backward() 2018-09-04 19:52:12 +09:00
a6198f267b uti: Offload set_robust_list to McKernel 2018-09-04 19:52:12 +09:00
5e78bd85ab uti: Fix tracer exit code for the case when create_tracer() isn't called 2018-09-04 19:52:12 +09:00
85c0c8a01f uti: Add debug messages for syscall
Change-Id: I2f96e71d5384f883f7dc568122c57d92bc1cd818
2018-09-04 19:52:12 +09:00
e29f579061 uti: Prevent user space vma from getting copied when forking 2018-09-04 19:52:12 +09:00
63703589e5 uti: Clear user space PTEs after first fork in create_tracer()
Change-Id: I60755f0cb5e84c3a5a5cd91515411a30f0995822
2018-09-04 19:52:12 +09:00
5c8c1986b5 uti: Add comment on ppd life cycle
Change-Id: Id16cf036b2d919444e8634b536fd701d996bcef2
2018-09-04 19:52:12 +09:00
e4370d235c uti: Make tracer not call mcexec_terminate_thread() when tracee is killed by signal
Change-Id: I5878c7d623ce182a7cb9578c9d5c430c1bee8e1e
2018-09-04 19:52:12 +09:00
31ac007cb5 uti: Increase CPU_HZ to 1000
Change-Id: I8619263845fd8ebabe6fc7de619a5b51ac04470a
2018-09-04 19:52:11 +09:00
56da7e2de9 uti: Allocate memory area directly to uti_desc->wp
Change-Id: Ia5a1dbf56b937d9d05cd7fa1c5eec4a5b4b7b196
2018-09-04 19:52:11 +09:00
35300e7b4f uti: Create tracer when forking
Change-Id: Ic66cf6289ac6f32a884ba1266e641ce61620a239
2018-09-04 19:52:11 +09:00
439dc0928b uti: Streamline syscall_backward() 2018-09-04 19:52:11 +09:00
4b3e58fd3d uti: Call terminate only when exit_group is called
Tracer tells McKernel side to call do_exit() in WIFSIGNALED case.

Change-Id: If85c6cbb4856036b406b11335f1384e57f26292d
2018-09-04 19:52:11 +09:00
b7cdbd6c42 uti: Enforce mcexec is destroyed and then McKernel process is destroyed 2018-09-04 19:52:11 +09:00
77f5cac2bf uti: Make tracer exit when not used
Change-Id: I3d3b2f92fa2b160ffce633c46d1b60e9079e7f1b
2018-09-04 19:52:11 +09:00
9102b176c4 uti: Make per_proc_data of tracee survive over the signal-kill of the tracee
Change-Id: I8ff1dddb526ef2fd948cfe1b8f3aa8403c2006d6
2018-09-04 19:52:11 +09:00
bb4317beaf uti: futex: Propagate -ERESTARTSYS returned by wait_event_interruptible()
Change-Id: Id36c4df0e0a8e1f64b12c635c0502f63552ba50b
2018-09-04 19:52:11 +09:00
d24b7585b7 uti: Make tracee pthread-detached
Change-Id: I672ee18739b956980901b63e55ee3ebc192b4e56
2018-09-04 19:52:11 +09:00
4438f994dc uti: Add/Modify test programs
Change-Id: I27a39d6b11af5243f93d07c31c2ef80f6727dd53
2018-09-04 19:52:11 +09:00
52afbbbc98 uti: Call into McKernel futex()
(1) Masquerade clv
(2) Fix timeout
(3) Let mcexec thread with the same tid as McKernel thread migrating
    to Linux handles the migration request
(4) Call create_tracer() before creating proxy related objects

Change-Id: I6b2689b70db49827f10aa7d5a4c581aa81319b55
2018-09-04 19:52:10 +09:00
460917c4a0 remote_page_fault,syscall_backward: Zero-clear waitq entry
Change-Id: I151a35004183e911aaba766a8749830e1768bfe6
2018-09-04 19:52:10 +09:00
7803468afe remote_page_fault,syscall_backward: Retry when interrupted by signal
Change-Id: Ic7d72ad9ca32bb3c8e3522e00fef1d98caf3c049
2018-09-04 19:52:10 +09:00
8f2c7d2265 Fix thread-safety issue in rus_vm_fault
Change-Id: I8640a8e0de8a0dfaee700b25e5f9e2941ac98fc8
2018-09-04 19:52:10 +09:00
c6c3a84a46 syscall: Add missing definition of thread to access thread->sigpending 2018-09-04 19:52:10 +09:00
5a7ca14fcc rus_vm_fault: Return VM_FAULT_SIGBUS when per-process data is not found 2018-09-04 19:52:10 +09:00
d7b882855a Correct comments in declaration of struct ikc_scd_packet 2018-09-04 19:52:10 +09:00
2337832e4c pager_req_release(): Correct debug messages 2018-09-04 19:52:10 +09:00
be635ceb19 terminate: Fix coutning of non-leader threads
Change-Id: I8399ad553bb8e09bef508ac976e8cd56cdae8013
2018-09-04 19:51:11 +09:00
0b0b7b03d7 Prevent one CPU from getting chosen by concurrent forks
One CPU could be chosen by concurrent forks because CPU selection and
runq addition are not done atomicly. So this fix makes the two steps
atomic.

Change-Id: Ib6b75ad655789385d13207e0a47fa4717dec854a
2018-09-04 19:51:11 +09:00
82914c6a2e remote_page_fault: Retry when interrupted
Change-Id: Ib71a87ad03420e1918dc97da43351cb93e7d0754
2018-09-04 19:51:11 +09:00
f127dfdf1e mcexec_create_per_process_data: Zero ppd on allocation
Change-Id: I06306f30ce30ad6ddc6e8b8cab46ee39be0e4940
2018-09-04 19:51:11 +09:00
567dcd3846 Fix deadlock involving mmap_sem and memory_range_lock
Change-Id: I187246271163e708af6542c057d0a8dfde5b211e
Fujitsu: TEMP_FIX_1
Refs: #986
2018-09-04 19:51:10 +09:00
b080e0f301 spinlock: Add trylock
Change-Id: If349d7c0065609615f5df229f70c59f92bf97adf
2018-09-04 19:51:10 +09:00
ff383d96ba spinlock: rewrite spinlock to use Linux ticket head/tail format
This is a cherry-pick of 2964302d094f035242d6257d8af5450f72f9b5a7.

Change-Id: Ie8b7e825b28415dd41cc232fbeceb4653251f9e3
2018-09-04 19:51:10 +09:00
0bcd3d5de3 unimap: update ihk to unimap
Change-Id: I5b23270f9253d26031ad90bb38721a6234bd98e1
2018-09-04 19:51:10 +09:00
9d6e0319f7 atobytes(): restore postfix before return 2018-09-04 19:51:10 +09:00
0e50eb44a9 process/vm/access_ok: fix edge checks.
Add check for start/end being larger than the range we're checking.
Fix corner case where the access_check() was done on last vm range, and
we would be looking beyond last element (null deref)
2018-09-04 19:51:10 +09:00
2db69d0f24 process/vm: implement access_ok() 2018-09-04 19:51:10 +09:00
a697f5e98d partitioned execution: pass process rank to LWK
Cherry-pick of d2d134d5e6a4b16a34d55d31b14614a2a91ecf47

Conflicts:
	kernel/include/process.h
2018-09-04 19:51:10 +09:00
4439b04d9f ihk_mc_get_linux_kernel_pgt(): add declaration
Cherry-pick of caff967a442907dd75f8cd878b9f2ea7608c77b2
2018-09-04 19:51:10 +09:00
38c3b2358a Exclude areas not assigned to Mckernel from direct map of all phys. memory
It's enabled by adding -s to mcreboot.sh.

Cherry-pick of the following commit:

commit b5c13ce51a5a4926c2cf11c817cd0d369ac4402d
Author: Katsuya Horigome <katsuya.horigome.rj@ps.hitachi-solutions.com>
Date:   Mon Nov 20 09:40:41 2017 +0900

    Include measures to prevent memory destruction on Linux side (This is rebase commit for merging to development+hfi)
2018-09-04 19:51:10 +09:00
221ce34da2 eclair: fix MAP_KERNEL_START and apply Fujitsu's proposals
(1) Cherry-pick of 644afd8b45fc253ad7b90849e99aae354bac5b17
(2) Pass length to functions with arguments of variable length
    * POSTK_DEBUG_ARCH_DEP_38
(3) Separate architecture dependent functions/structures
    * POSTK_DEBUG_ARCH_DEP_34
(4) Fix include path
    * POSTK_DEBUG_ARCH_DEP_76
(5) Include config.h
    * POSTK_DEBUG_ARCH_DEP_33
2018-09-04 19:51:09 +09:00
4246d41007 kmalloc_header: use signed integer for target CPU id
Cherry-pick of bdb2d4d8fa94f9c0268cdfdb21af1a2a5c2bcae5
2018-09-04 19:51:09 +09:00
65df9c8084 ihk_mc_get_processor_id(): return -1 for non-McKernel CPUs
Cherry-pick of c45641e97add9fde467844d9272f2626cf4317de
2018-09-04 19:51:09 +09:00
7836aa0136 Map LWK TEXT to the end of Linux modules section (0xFFFFFFFFFE800000) 2018-09-04 19:51:09 +09:00
1cf7fad15a virt_to_phys(): fix debug messages
Cherry-pick of 46eb3b73dac75b28ead62476f017ad0f29ec4b0a
2018-09-04 19:51:09 +09:00
0076e1f5e0 mem: make McKernel kernel heap virtual addresses Linux compatible
Cherry-pick of e5334c646d2dc6fb11d419918d8139a0de583fde
2018-09-04 19:51:09 +09:00
cae6b9f154 move McKernel out of Linux kernel virtual 2018-09-04 19:51:09 +09:00
5fcbfa2eb5 page_fault_process_memory_range: Remove ihk_mc_map_virtual for CoW of device map
Device map with MAP_PRIVATE is copied when forking using copy_user_pte.
So the map isn't copied by those statements.

Futjitsu: POSTK_TEMP_FIX_14
Refs: #1039
Change-Id: I1a697ed2e003055d66a8eebd3e8d5e9e49d094ad
2018-08-30 02:21:42 +00:00
9a20cfaefb mem: Check if phys-mem is within the range of McKernel memory
Fujitsu: POSTK_DEBUG_TEMP_FIX_52
Refs: #1164
Change-Id: Idb9a6eac1d2e1df4c663c3171925c774421177fd
2018-08-30 02:18:37 +00:00
f57b0c5d4f wait: Delay wake-up parent within switch context
Fujitsu: POSTK_DEBUG_TEMP_FIX_41
Refs: #1006
Change-Id: Ia98e896505ad0f6549766604ade84550eee8bd2d
2018-08-30 02:13:51 +00:00
0fdeb254b3 switch context: Move to arch-dependent (arch_switch_context())
Fujitsu: POSTK_DEBUG_ARCH_DEP_22
Change-Id: I6faf8d9daa1e639350c2cd83db9bb27b9d37ba01
2018-08-30 02:13:34 +00:00
895a8c4099 procfs: Support multiple reads of e.g. /proc/*/maps
Refs: #1021
Change-Id: If36e1a0f3f41f0215868daf578e96775d96a59a3
2018-08-30 01:48:06 +00:00
e531ee626e mcctrl pager: handle pagers more properly
the pagers are all destroyed when linux thinks there is no process left,
but there is no synchronisation with mcexec on that and some new process
might have spawned and started using these pagers in the meantime,
leading to weird crashes because an invalid pager was used.

The reason we're cleaning up pagers when no process is left is that
mcctrl does not handle pager_req_release is the linux-side process got
killed or died before the mckernel one for some reason, so:
 - move pager_req_release to a new __do_in_kernel_irq_syscall() helper
 - have free_all_process_memory_range not set MF_HOST_RELEASED on the
memobj
 - just in case, clean up everything like before on mcctrl shutdown
instead of when no process is left.

Change-Id: I53b8b9b81b1e5b807593850af17b5ea5e8471174
Refs: #1154
2018-08-24 09:18:20 +09:00
94d093f058 fileobj_create: Suppress message on getting -ESRCH
-ESRCH from mcctrl doesn't mean an error but the file is not a regular
file and mcctrl wants McKernel to treat it as a device file.

Change-Id: Ie121f0e6a8b1f0a29c2f2cf193a51f4f52337809
2018-08-23 04:01:20 +00:00
9b8424523a mcctrl: remove rus page cache
Change-Id: Ieed7a2a0077ffde3fec8a64d2051e56a53924a42
2018-08-23 02:10:44 +00:00
ebc702624b devobj: fix object size (POSTK_DEBUG_TEMP_FIX_36)
Fujitsu: POSTK_DEBUG_TEMP_FIX_36
Change-Id: I5f020708f97b7468f19496b44c98e164d856598d
2018-08-22 07:26:50 +00:00
ea125cb58c checkpatch: remove warning on LINUX_KERNEL_VERSION and split strings
Change-Id: Ia22f3106208c6ddf46a767e142b8842373e9d6b5
2018-08-22 07:14:48 +00:00
689a799bb9 mcctrl prepare_image: return reserve_user_space error
Change-Id: I00556cb58b12acca888f9512c144a3ce3f5332b1
2018-08-22 07:14:40 +00:00
802b1ac14b ihk_os_getperfevent,setperfevent: Timeout IKC sent by mcctrl
Report timeout when McKernel doesn't respond to prevent the caller
from waiting forever.

Refs: #1167
Change-Id: I8bd87e43aafffdd0952198224e44195af4368883
2018-08-22 06:43:27 +00:00
affe3e9010 do_fork: Increase tid table size when allowing oversubscription
The size of tid table needs to be more than #CPUs when CPU oversubscription
is needed.

Note that the max number of simultaneous threads are the min of the
following two:
(1) Number of mcexec worker threads
(2) NR_TID defined in kernel/syscall.c

Change-Id: I425189da415e1d3a763ad62567950d001850cf0d
2018-08-22 06:42:13 +00:00
0b2169964a futex_wait_queue_me: Spin-sleep when timeout and idle_halt is specified
schedule_timeout() with idle_halt should use spin sleep because sleep
with timeout is not implemented.

Change-Id: Ia0bebcc10ddfb872bffeece7f13fb35a4791db18
2018-08-22 06:36:43 +00:00
f18d1f5383 __sched_wakeup_thread: Notify interrupt_exit() of re-schedule
Change-Id: I438eb168f818eb5649857e22bdc7e68a145872f7
2018-08-22 06:33:23 +00:00
ea35954613 linux side: replace vfs_read by kernel_read
vfs_read has been unexported in bd8df82be66 ("fs: unexport vfs_read and vfs_write")
in kernel 4.14.
kernel_read has always™ existed and is actually more appropriate: we can
remove the set_fs calls that are done in kernel_read.

The downside is that the function prototype also changed in 4.14 with
bdd1d2d3d251 ("fs: fix kernel_read prototype")...
(same with kernel_write e13ec939e96b ("fs: fix kernel_write prototype"))

Change-Id: I6f76a6387ae02b4d33bd62952d995a90b1952fc9
2018-08-22 06:27:12 +00:00
61a942acdc arm64 vdso/gettimeofday: add new includes for cpu_set_t and pte_t
Change-Id: I4035b179a173a6b29c34c73670d68a38d4dc5dc4
2018-08-22 06:17:56 +00:00
c4b4b7222e arm64: ihk_mc_perfctr_start/stop: fix prototype that was changed in x86
The functions now take a bitmask in argument since commit d7416c6f79
("perf_event: Specify counter by bit_mask on start/stop")...
Thanksfully the change also induced a type modification so it was easy
to notice.

(On the other hand I'm building with --disable-perf so why the hell is
that file compiled?!)

Change-Id: Ie16367cc94e81068b70e1b80142a6394de896c4f
2018-08-22 06:14:15 +00:00
21af0351d1 arm64 syscall.c needs uio.h for struct iovec
Change-Id: I9d070d0e148636be1d9ecec8ec4dfb72f93c4ed6
2018-08-22 06:08:27 +00:00
1e1c91962e mcctrl: add missing sched_param include for newer linux
struct sched_param is defined differently since headers changed in
linux ae7e81c07 ("sched/headers...")

Change-Id: I22af79bf3d9df69d09903b2830d99426309cf911
2018-08-22 06:04:35 +00:00
b1aa94d417 arm64 arch-perfctr.h: remove duplicate enums
Some enums were redefined in lib/include/mc_perf_event.h in commit
1284060 ("support PERF_TYPE_{HARDWARE|HWCACHE} in perf_event_open")

Change-Id: I1a98699955ca7fd6135b2a7dde72ed4df77b1974
2018-08-22 06:04:08 +00:00
a6a9bac5b7 Protect more code by #ifdef PERF_ENABLE
Change-Id: I20a67c56c4d7817fdb87cc6a2aa47d68fe3eae8d
2018-08-22 06:03:12 +00:00
240a23a21b arch-lock: tentative implementation of irqflags_can_interrupt for arm64
Change-Id: I814e02e757039cab8c142c0b774ad470154454c1
2018-08-22 06:02:06 +00:00
d5108dba80 arm64 eclair build: add missing explicit libs
Change-Id: I5b6f8825430c2d495da50d868a3f54fc0b354d84
2018-08-22 05:56:20 +00:00
20368dd317 syscall: move sync_child_event up a bit
The function was between two perf functions when perf functions don't
use it...
It seemed simpler to move the function than to add an extra ifdef

Use that occasion to fix style warnings, no actual code changes were
made.

Change-Id: Ie8b5fa7968a3d5e54a690d079874db54f5e6c8c9
2018-08-22 05:55:26 +00:00
b93e14f695 arm64 signal.h: add valid_signal() function
This function was added for x86 by commit 140f813d77 ("fix:
differences in behavior of sigaction between Linux and Mckernel")

The x86 and arm files are actually pretty close and could use
factoring...

Change-Id: Ia8820fd2f824d898610b384a3e137c96aadbc911
2018-08-22 05:54:31 +00:00
3e3f3c5590 mcoverlayfs: vfs_readdir -> iterate_dir compat for el7.5
Also enable mcoverlay for new kernel version / actually build it

Change-Id: I80bc043c65cf99c3b41a54a5666ea7652e6c2bbd
2018-08-09 04:30:24 +00:00
e8f8660b73 mcctrl: lookup unexported symbols at runtime
Instead of parsing System.map, use kallsyms_lookup_name() to
get unexported symbols addresses at module loading time.

This lets mckernel work with kaslr enabled (it gets enabled by
default from el7.5 onwards)

Change-Id: Ie4349fc1145ebce44f37f1f40c16f9d75584074d
2018-08-08 06:00:20 +00:00
794684985f mcctrl syscall: remove unused walk page debug function
This saves looking up one symbol for a debug function that is not
used anywhere

Change-Id: I6a3a480ce8067b4f6f0faf9aa837119ea46888ad
2018-08-08 05:57:46 +00:00
625607e6db mcctrl sysfs_files: cleanup vfs_readdir -> iterate_dir compat
Cleanup the fix suggested by Fujitsu a bit

Change-Id: I95165b834e32a01f43eb3b4fcaca039e4d04fe86
2018-08-08 05:41:04 +00:00
05afa8b6dd mcctrl sysfs_files: vfs_readdir -> iterate_dir compat
vfs_readdir got removed in recent kernels

Change-Id: Iac9a9954afefa0f6dbcdc2c94786cf747e21e1fe
Fujitsu: POSTK_DEBUG_TEMP_FIX_22
2018-08-08 05:39:07 +00:00
6cf89076dc mcctrl handle_mm_fault compat: add el7.5 support
Change-Id: I8c7738b70ca914e857be119b7720cdc22e61ae0e
2018-08-08 05:36:35 +00:00
29a658716b configure: Create config file for test programs
Change-Id: I3ec90fed348ff535b24c8116416c6b89636c532c
2018-08-02 02:29:19 +00:00
a7c9988aeb schedule: Don't reschedule immediately when wake up on migrate
Refs: #1027
Change-Id: Ibe563c45c42611170273f1e437566c20fbef68d3
2018-08-02 02:28:25 +00:00
d4fa953975 test: Add testcase for #1001
Refs: #1001
Change-Id: I3edd750108bd3f887af1f0afe3f2651f1243062b
2018-08-02 02:24:41 +00:00
786649d2a3 perf_event: Move changing monitoring-status into perf_stop
Change-Id: I84a13c2a825de24bfdada533c7049e8770a07061
2018-08-02 02:23:38 +00:00
d7416c6f79 perf_event: Specify counter by bit_mask on start/stop
Fujitsu: POSTK_DEBUG_TEMP_FIX_30
Refs: #1002
Change-Id: Iea51e9aef78927a5033e3a226d5efc6298da056a
2018-08-02 11:22:28 +09:00
cb1522ca92 perf_event: Handle fixed-pmc in arch-dep part
Fujitsu: POSTK_DEBUG_TEMP_FIX_31
Refs: #1003
Change-Id: I66c7d18b9137894cf5764464482e2ebd5ecb9d52
2018-08-02 02:14:04 +00:00
14660a10c3 Fix to procfs read returns EIO
Refs: #1152
Change-Id: I48b330953fd7674ba1a3ac35744f9f50a5712730
2018-08-02 01:48:51 +00:00
1387c9687b Add test cases for #765
Refs: #765
Change-Id: I50d70a15d5d5ce31227cacbed4eccd49b218713b
2018-08-02 01:42:46 +00:00
ec99adde4a Add test cases for #998 and #999
Refs: #998 #999
Change-Id: I86f8857594b2446c833c1e59d53b484ef022a9ee
2018-08-02 01:42:11 +00:00
c716e87c53 execve: Clear sigaltstack and fp_regs
Fujitsu: POSTK_DEBUG_TEMP_FIX_19
Refs: #976
Change-Id: I16895eab13eecbb47b7e6da961fae82ee5e570ee
2018-08-01 15:11:05 +09:00
d898f18293 mcexec: Do not close fd returned to mckernel side
Fixes: 9a79920ef9 ("Static analysis fixes")
Change-Id: I2b51d6e288e7bb2b0f4bff579fa237d575dcb026
Reported-by: Tomoki Shirasawa <tomoki.shirasawa.kk@hitachi-solutions.com>
2018-07-30 23:27:17 +00:00
bc0759e2dc arm64 arch-lock: add missing include for cpu_set
Probably only needed for recent system, see ihk's 3271b5e6 ("fix
compilation with recent glibc (cpu_set define change)")

The root of the problem really is that we rely on system headers for
mckernel that ought to be independent...

Change-Id: Ieb9a017e5a7697ad767087370ced7b615efc917e
2018-07-27 02:33:03 +00:00
1aa429d4f5 init_normal_area: fix warnings
- unused variable pt_phys
 - undeclared function set_pt_large_page (move definition lower)

Change-Id: I4625b70efe8e914160b17064078c42b86a461d3e
2018-07-27 02:32:23 +00:00
1543119139 mcctrl rus_vm_fault: tpe changed with kernel >= 4.11
vma is part of vmf and isn't needed, so type changed (see linux 11bac80
("mm, fs: reduce fault, [...] to take only vmf"))

Change-Id: I4c023e23c7e7416ad2df2dcc0698a0032e574e4c
2018-07-27 02:31:39 +00:00
0a0a78ac2e mcctrl: replace GFP_TEMPORARY by GFP_KERNEL
See linux's commit 0ee931c4 ("mm: treewide: remove GFP_TEMPORARY
allocation flag") for a long explanation, but basically that flag
"is just cargo cult" and should be removed

Change-Id: I2147cd65b6b9ec509a72e11cc3abf1fe1561c10b
2018-07-27 02:31:00 +00:00
6999d0a3f9 bind_mount_recursive: Use lstat instead of d_type of readdir
Change-Id: I0eb8d6c7e1fa5df6dbc5962a639901546a159d04
2018-07-26 18:38:48 +09:00
f01a883971 devobj: fix out of bounds shift
Similarily, pgoff << PAGE_SHIFT would need pgoff to be unsigned to fit,
but off_t is signed.
The reason for this shift was to truncate the offset argument to be
aligned to page boundaries, do that instead

Change-Id: I36c3de34b1834fdb0503942a6f3212e94986effd
2018-07-26 05:20:19 +00:00
3185334c1c debug messages: implement dynamic debug
Heavily inspired off linux kernel's dynamic debug:
 * add a /sys/kernel/debug/dynamic_debug/control file
 (accessible from linux side in /sys/class/mcos/mcos0/sys/kernel/debug/dynamic_debug/control)
 * read from file to list debug statements (currently limited to 4k in size)
 * write to file with '[file foo ][func bar ][line [x][-[y]]] [+-]p' to change values

Side effects:
 * reindented all linker scripts, there is a new __verbose section
 * added string function strpbrk

Change-Id: I36d7707274dcc3ecaf200075a31a2f0f76021059
2018-07-26 14:16:31 +09:00
bc887aab44 x86 futex: fix out of bounds shift
8 << 28 needs unsigned to fit, other shifts were done to truncate
the input, use a mask instead

Change-Id: I81ba41595f4629f1df554e34392116440ff3b641
2018-07-26 05:10:36 +00:00
6f7c428a34 terminate: fix oversubscribe hang when waiting for other threads on same CPU to die
Change-Id: I8c4fbdd3aab9d0567ce5457a4a6405490608925d
2018-07-26 05:02:13 +00:00
68c702d024 process_procfs_request: Add Pid to /proc/<PID>/status
The standard UNIX tool to get processes information, need to have the
process id inside /proc/<PID>/status.

Using ps without PID in /proc/<PID>/status gives :

  PID TTY          TIME CMD
 2551 pts/0    00:00:00 bash
    0 pts/0    00:00:00 exe
    0 pts/0    00:00:00 exe

With this patch:
  PID TTY          TIME CMD
 2551 pts/0    00:00:00 bash
11966 pts/0    00:00:00 exe
12619 pts/0    00:00:00 exe

Change-Id: Ic9d255cbef4d49e49bdaedcfc8e3545d9c144325
2018-07-26 05:00:21 +00:00
97273adcc5 x86_64 move_pages_smp_handler: rework initialisation
- add missing break statement
- remove duplicate memset for mpsr->status

Change-Id: I1fd1a8b2bb7bbabb32db9e7d3fc84102d9b0ff82
2018-07-26 04:59:23 +00:00
ad2cb6375a kprintf: only call eventfd() if it is safe to interrupt
Missing ARM64 implementation, cannot test right now

Change-Id: Ia05e8b7952b19bcd8fdac1f920d9bfe341be8b97
2018-07-26 04:57:30 +00:00
6df4bd8f8c Fix a few more warnings
Some are important, e.g. the seemingly harmless braces around if with dprintf,
since that dprintf is defined as empty, will screw things up and grab the next
line

Change-Id: Ie5e1cf813178ad708ff42ae5e477fbc96034471c
2018-07-26 04:52:17 +00:00
0994c3300e search_free_space: remove POSTK_DEBUG_ARCH_DEP_27 side
search_free_space changed since this was implemented and the code is
no longer compatible
Looking at it again, the function is not used anywhere other than syscall.c
and the second function does not seem to fix anything specific so this
just removes the untested side.

Change-Id: If28d35ec4da083a40dc6936fcb21f05fb64e378a
Fujitsu: POSTK_DEBUG_ARCH_DEP_27
2018-07-26 04:43:05 +00:00
a5c3e48843 search_free_space(): manage region->map_end internally
Change-Id: If9176773868c44fa1eb801c0815c35cea9f4b54b
2018-07-26 04:43:05 +00:00
df2c993721 fileobj_create: only allocate new object if one wasn't found
Change-Id: I5e12439333bf0c9cc7dad6e3cf410bfee616f77e
2018-07-26 04:41:03 +00:00
dc8d6b740c pager_req_read: handle short read
Change-Id: Iff89046041e012a65c80a29b485ddbb636435dd0
2018-07-26 04:37:54 +00:00
c2e1b8d694 mcctrl_ikc_send_wait: fix interrupt with do_frees == NULL
do_frees is allowed to be NULL only if free_addrs_count is 0, but that
is increased to account for the wakeup_desc itself before this failure

Change-Id: Iab33712c76ae452df7044558a12745a89adb47ac
2018-07-26 04:34:03 +00:00
f6d8138e05 mcexec_wait_syscall: requeue potential request on interrupted wait
Change-Id: Id7a324f18ebb8c81f05bd8362e19d9314a445308
2018-07-26 04:31:34 +00:00
9d587dcbe8 fileobj_release: do not notify linux of surplus refs
Surplus refs on the linux side will not change anything, so spare
ourselves a message.
The final message will free all refs at once when the object is
destroyed.

Change-Id: Ie086b9dda663729962037c67e8233370509234a5
2018-07-26 04:08:43 +00:00
eb675818c7 x86 mmap: fix out of bounds shift
0x3F << MAP_HUGE_SHIFT is too big to fit in signed int,
make it unsigned

Change-Id: I0e476b80ff51a8e141c90da6f985ba18a3438752
2018-07-26 03:50:44 +00:00
3ce7763715 x86 mem init: do not map identity mapping
init_normal_area was mapping identity lookups (phys = virt) from 0,
leading to many undetected null pointer dereferences in init_pt (but
not in new process page tables leading to odd behaviour)

This also makes the code use the set_pt_large_page() function, cleaning
it up a bit

Change-Id: I22889031de26a7e48501b0eb4d453ca62e671835
2018-07-26 03:50:44 +00:00
fd429ecc5b rusage_private: fix null pointer dereference
Change-Id: Id1f066699a41c249203073c5937e34012f5fe6c3
2018-07-26 03:50:44 +00:00
ed7f5abc28 schedule: fix null pointer dereferences
Change-Id: I1d4b0a2fabb5810a89cca4c6a0a837db3a9813ee
2018-07-26 03:50:44 +00:00
79e5026f01 x86 mem init: fix clearing of init_pt
memset(init_pt...) had the wrong size.

Change-Id: Idb5d0d53b3c70ee4a16a101dd265d0854cfd3b72
2018-07-26 03:50:31 +00:00
a1b50051ed mcexec: always compile debug statements
This helps catching errors like accessing a field that no longer exists
in a debug print that wasn't compiled...

Change-Id: If6c862ea2b866f819195aae93c7fd68e610fe48e
2018-07-26 03:38:00 +00:00
9a79920ef9 Static analysis fixes
Change-Id: I7bc42545a1c497f704d7bfa6ea1b7e3893acc697
2018-07-26 03:36:50 +00:00
141fa5120e git hooks: use correct directory for submodule
Change-Id: I7a39021dc02212065612b21cafcb6c653e2280f0
2018-07-26 03:29:43 +00:00
699cb4f88c arm64/arch-lock: typedef mcs_lock_t
Was done in x86_64 for fileobj in commit 249bda4aef ("fileobj: use
MCS locks for per-file page hash")

Change-Id: I61957de336b6657687803e6288afed9360a42032
2018-07-26 03:28:40 +00:00
bc3e6ded65 disable sse for everyone
GCC optimizes big switches with sse so we could clobber users floating
point registers when they would do a syscall

Reproducer:
```
 #include <stdio.h>
 #include <stdlib.h>

 union num {
 	float f;
 	unsigned long long i;
 };

 #define WORKSIZE (1024 * 1024 * 32)

 int main(int argc, char **argv) {
 	char *work = malloc(WORKSIZE);
 	char *fromaddr;
 	char sink;
 	union num r;
 	unsigned long long int offset;

 	r.f = drand48();
 	printf("r: %llx\n", (long long)r.i);
 	offset = (long long int)(r.f * (double)WORKSIZE);
 	fromaddr = work + offset;
 	printf("%e %llx %llx\n", r.f, offset, fromaddr);
 	sink = *fromaddr;

 	return 0;
 }
```

Change-Id: I7bb0883ec8ef2f245ab98064e308025422afc115
2018-07-26 03:26:25 +00:00
eae5c40f60 init_process_stack: Support "ulimit -s unlimited"
Refs: #1109
Change-Id: I395f012fd747cb6a2f93be71e34c7f6f3666ed67
2018-07-26 02:40:27 +00:00
0c7384f980 Add test cases for #840
Refs: #840
Change-Id: Ie29867d29ba6a25cfac77b95b8effc2f057aae14
2018-07-26 02:39:24 +00:00
67ebcca74d Fix to VMAP virtual address leak
Fujitsu: POSTK_DEBUG_TEMP_FIX_51
Refs: #1024
Change-Id: I1692ee4f004cb4d1f725baf47a8ed31fce1bf42a
2018-07-26 02:17:55 +00:00
3d365b0d7a add ihk as submodule
Change-Id: I512255a96d0d95795bd0d803289fffe4394eb7ec
2018-07-26 01:50:48 +00:00
94e96927a6 mremap: Do nothing when no size change and !MREMAP_FIXED
Behave in the same way as Linux which returns old_address when
old_size == new_size && !MREMAP_FIXED.

Refs: #1112
Change-Id: Ice1421a8a77f962d087de8475aa2cd40c59be5f7
2018-07-26 01:49:01 +00:00
3636c8e7e4 setrlimit: Check arguments in the same order as in Linux
(1) Check if rlim's address is valid
(2) Check if soft-limit does not exceed hard-limit

Fujitsu: POSTK_DEBUG_TEMP_FIX_3
Refs: #1050
Change-Id: I5bf1008ce172f9dff64ec89b1f97614926abaf13
2018-07-26 01:48:05 +00:00
b920da5103 execve: Use interp in shebang as is
Fujitsu: POSTK_DEBUG_TEMP_FIX_9
Refs: #995
Change-Id: I09751d13c4fecd68087d47815029c0b65e51f18a
2018-07-26 01:46:22 +00:00
f1a40a409f perf_event: Include list.h by itself
Fujitsu: POSTK_DEBUG_TEMP_FIX_32
Refs: #1004
Change-Id: I8670477cf498ac98df971f2c0288f335a989f675
2018-07-26 00:45:57 +00:00
4ce4c9f264 init_process: Inherit parent cpu_set
Fujitsu: POSTK_DEBUG_TEMP_FIX_69
Refs: #1028
Change-Id: I1628bb5bf35fa670bb0019e1f3ae295277b1566e
2018-07-26 00:44:41 +00:00
e770a22fa5 scripts: add checkpatch.pl & git hooks
Change-Id: I29e5f7a99e8dd92511c0b1d099f3e1a2f37d7a72
2018-07-12 00:55:58 +00:00
9bb8076dc0 shmget: Make shmobj underwent IPC_RMID invisible to shmget
Refs: #926
Change-Id: I16120623b581da5d5d484fd05d5111788c8ad5e2
2018-07-10 02:13:00 +00:00
229b041320 test: Add testcase for #1122
Refs: #1122
Change-Id: Ieafee7469d1397461abf05552ffad0bfea1dd6cd
2018-07-10 02:12:23 +00:00
e1f204de4a test: Add testcase for #1112
Refs: #1112
Change-Id: I0041366d8dcf035a09fbb59a5dbd5c94cae0d65e
2018-07-10 02:12:04 +00:00
c6cc0bf07a test: Add testcase for #1111
Refs: #1111
Change-Id: Ifdf25a9ce98ef495200daf1c24d7ac2c81b3ef17
2018-07-10 02:11:45 +00:00
04e54ead5d test: Add testcase for #1031
Refs: #1031
Change-Id: I6a51596b84a97329ba7d5b765c8471246dcf85df
2018-07-10 02:11:13 +00:00
992705d465 pager_get_path: Append \0 to path
Change-Id: Iaabd89a649bb20b37b35cd345da0f468fd5dd0b5
2018-07-10 02:10:19 +00:00
ae09d979b6 Add testcases for #1141
Refs: #1141
Change-Id: I50d1ac6248e9dfc33c372b825c10cf0bd8b61d3e
2018-07-10 02:09:38 +00:00
1531 changed files with 100764 additions and 21543 deletions

31
.gitignore vendored
View File

@ -1,3 +1,4 @@
*~
*.o
*.elf
*.bin
@ -8,9 +9,27 @@
Module.symvers
*.order
.tmp_versions
elfboot/elfboot
elfboot/elfboot_test
linux/executer/mcexec
linux/mod_test*
linux/target
old_timestamp
CMakeFiles
CMakeCache.txt
Makefile
Kbuild
cmake_install.cmake
config.h
mcstop+release.sh
mcreboot.sh
mcreboot.1
mcoverlay-destroy.sh
mcoverlay-create.sh
kernel/mckernel.img
kernel/include/swapfmt.h
executer/user/vmcore2mckdump
executer/user/ql_talker
executer/user/mcexec.1
executer/user/mcexec
executer/user/libsched_yield.so.1.0.0
executer/user/libsched_yield.so
executer/user/libmcexec.a
executer/user/libldump2mcdump.so
executer/user/eclair
tools/mcstat/mcstat

3
.gitmodules vendored Normal file
View File

@ -0,0 +1,3 @@
[submodule "ihk"]
path = ihk
url = https://github.com/RIKEN-SysSoft/ihk.git

192
CMakeLists.txt Normal file
View File

@ -0,0 +1,192 @@
cmake_minimum_required(VERSION 2.6)
if (NOT CMAKE_BUILD_TYPE)
set (CMAKE_BUILD_TYPE "Debug" CACHE STRING "Build type: Debug Release..." FORCE)
endif (NOT CMAKE_BUILD_TYPE)
enable_language(C ASM)
project(mckernel C ASM)
set(MCKERNEL_VERSION "1.6.0")
set(CMAKE_MODULE_PATH ${CMAKE_SOURCE_DIR}/cmake/modules)
# for rpmbuild
if(DEFINED SYSCONF_INSTALL_DIR)
set(CMAKE_INSTALL_SYSCONFDIR "${SYSCONF_INSTALL_DIR}")
endif()
include(GNUInstallDirs)
include(CMakeParseArguments)
include(Kbuild)
include(Ksym)
include(CheckCCompilerFlag)
set(CFLAGS_WARNINGS "-Wall -Wextra -Wno-unused-parameter -Wno-sign-compare -Wno-unused-function")
CHECK_C_COMPILER_FLAG(-Wno-implicit-fallthrough IMPLICIT_FALLTHROUGH)
if(IMPLICIT_FALLTHROUGH)
set(CFLAGS_WARNINGS "${CFLAGS_WARNINGS} -Wno-implicit-fallthrough")
endif(IMPLICIT_FALLTHROUGH)
# C flags need to be set before enabling language?
set(CMAKE_C_FLAGS_DEBUG "-g ${CFLAGS_WARNINGS}" CACHE STRING "Debug compiler flags")
set(CMAKE_C_FLAGS_RELEASE "${CFLAGS_WARNINGS}" CACHE STRING "Release compiler flags")
# build options
option(ENABLE_WERROR "Enable -Werror" OFF)
if (ENABLE_WERROR)
add_compile_options("-Werror")
endif(ENABLE_WERROR)
if (CMAKE_SYSTEM_PROCESSOR STREQUAL "x86_64")
set(BUILD_TARGET "smp-x86" CACHE STRING "Build target: smp-x86 | smp-arm64")
elseif (CMAKE_SYSTEM_PROCESSOR STREQUAL "aarch64")
set(BUILD_TARGET "smp-arm64" CACHE STRING "Build target: smp-x86 | smp-arm64")
endif()
if (BUILD_TARGET STREQUAL "smp-x86")
set(ARCH "x86_64")
elseif (BUILD_TARGET STREQUAL "smp-arm64")
set(ARCH "arm64")
foreach(i RANGE 1 120)
add_definitions(-DPOSTK_DEBUG_ARCH_DEP_${i} -DPOSTK_DEBUG_TEMP_FIX_${i})
set(KBUILD_C_FLAGS "${KBUILD_C_FLAGS} -DPOSTK_DEBUG_ARCH_DEP_${i} -DPOSTK_DEBUG_TEMP_FIX_${i}")
endforeach()
add_definitions(-DCONFIG_ARM64_64K_PAGES -DCONFIG_ARM64_VA_BITS=48)
endif()
set_property(CACHE BUILD_TARGET PROPERTY STRINGS smp-x86 smp-arm64)
set(ENABLE_MEMDUMP ON)
option(ENABLE_PERF "Enable perf support" ON)
option(ENABLE_RUSAGE "Enable rusage support" ON)
option(ENABLE_MCOVERLAYFS "Enable overlay filesystem" OFF)
option(ENABLE_QLMPI "Enable qlmpi programs" OFF)
option(ENABLE_UTI "Enable uti support" OFF)
option(ENABLE_UBSAN "Enable undefined behaviour sanitizer on mckernel size" OFF)
find_library(LIBRT rt)
find_library(LIBNUMA numa)
find_library(LIBBFD bfd)
find_library(LIBIBERTY iberty)
if (ENABLE_QLMPI)
find_package(MPI REQUIRED)
endif()
if (ENABLE_UTI)
find_library(LIBSYSCALL_INTERCEPT syscall_intercept)
endif()
string(REGEX REPLACE "^([0-9]+)\\.([0-9]+)\\.([0-9]+)(-([0-9]+)(.*))?" "\\1;\\2;\\3;\\5;\\6" LINUX_VERSION ${UNAME_R})
list(GET LINUX_VERSION 0 LINUX_VERSION_MAJOR)
list(GET LINUX_VERSION 1 LINUX_VERSION_MINOR)
list(GET LINUX_VERSION 2 LINUX_VERSION_PATCH)
list(GET LINUX_VERSION 3 LINUX_VERSION_RELEASE)
math(EXPR LINUX_VERSION_CODE "${LINUX_VERSION_MAJOR} * 65536 + ${LINUX_VERSION_MINOR} * 256 + ${LINUX_VERSION_PATCH}")
ksym(sys_mount PREFIX MCCTRL_)
ksym(sys_umount PREFIX MCCTRL_)
ksym(sys_unshare PREFIX MCCTRL_)
ksym(zap_page_range PREFIX MCCTRL_)
ksym(vdso_image_64 PREFIX MCCTRL_)
ksym(vdso_start PREFIX MCCTRL_)
ksym(vdso_end PREFIX MCCTRL_)
ksym(vdso_pages PREFIX MCCTRL_)
ksym(__vvar_page PREFIX MCCTRL_)
ksym(hpet_address PREFIX MCCTRL_)
# POSTK_DEBUG_ARCH_DEP_50, add:find kernel symbol.
ksym(vdso_spec PREFIX MCCTRL_)
ksym(hv_clock PREFIX MCCTRL_)
ksym(sys_readlink PREFIX MCCTRL_)
ksym(walk_page_range PREFIX MCCTRL_)
# compat with various install paths
set(MCKERNEL_LIBDIR ${CMAKE_INSTALL_FULL_LIBDIR})
set(BINDIR ${CMAKE_INSTALL_FULL_BINDIR})
set(SBINDIR ${CMAKE_INSTALL_FULL_SBINDIR})
set(ETCDIR ${CMAKE_INSTALL_FULL_SYSCONFDIR})
set(ROOTFSDIR "${CMAKE_INSTALL_PREFIX}/rootfs")
if (CMAKE_INSTALL_PREFIX STREQUAL "/usr")
set(KMODDIR "/lib/modules/${UNAME_R}/extra/mckernel")
set(MCKERNELDIR "${CMAKE_INSTALL_FULL_DATADIR}/mckernel/${BUILD_TARGET}")
else()
set(KMODDIR "${CMAKE_INSTALL_PREFIX}/kmod")
set(MCKERNELDIR "${CMAKE_INSTALL_PREFIX}/${BUILD_TARGET}/kernel")
endif()
set(prefix ${CMAKE_INSTALL_PREFIX})
# set rpath for everyone
set(CMAKE_INSTALL_RPATH ${CMAKE_INSTALL_FULL_LIBDIR})
# ihk: ultimately should support extrnal build, but add as subproject for now
if (EXISTS ${PROJECT_SOURCE_DIR}/ihk/CMakeLists.txt)
set(IHK_SOURCE_DIR "ihk" CACHE STRINGS "path to ihk source directory from mckernel sources")
elseif (EXISTS ${PROJECT_SOURCE_DIR}/../ihk/CMakeLists.txt)
set(IHK_SOURCE_DIR "../ihk" CACHE STRINGS "path to ihk source directory from mckernel sources")
else()
set(IHK_SOURCE_DIR "ihk" CACHE STRINGS "path to ihk source directory from mckernel sources")
endif()
if (EXISTS ${PROJECT_SOURCE_DIR}/${IHK_SOURCE_DIR}/CMakeLists.txt)
set(IHK_FULL_SOURCE_DIR ${PROJECT_SOURCE_DIR}/${IHK_SOURCE_DIR})
elseif (EXISTS /${IHK_SOURCE_DIR}/CMakeLists.txt)
set(IHK_FULL_SOURCE_DIR /${IHK_SOURCE_DIR})
else()
message(FATAL_ERROR "Could not find ihk dir, or it does not contain CMakeLists.txt, either clone ihk or run git submodule update --init")
endif()
add_subdirectory(${IHK_SOURCE_DIR} ihk)
configure_file(config.h.in config.h)
# actual build section - just subdirs
add_subdirectory(executer/kernel/mcctrl)
if (ENABLE_MCOVERLAYFS)
add_subdirectory(executer/kernel/mcoverlayfs)
endif()
add_subdirectory(executer/user)
add_subdirectory(kernel)
add_subdirectory(tools/mcstat)
configure_file(arch/x86_64/tools/mcreboot-smp-x86.sh.in mcreboot.sh @ONLY)
configure_file(arch/x86_64/tools/mcstop+release-smp-x86.sh.in mcstop+release.sh @ONLY)
configure_file(arch/x86_64/tools/mcreboot.1in mcreboot.1 @ONLY)
install(PROGRAMS
"${CMAKE_CURRENT_BINARY_DIR}/mcreboot.sh"
"${CMAKE_CURRENT_BINARY_DIR}/mcstop+release.sh"
DESTINATION "${CMAKE_INSTALL_SBINDIR}")
install(FILES
"arch/x86_64/tools/irqbalance_mck.service"
"arch/x86_64/tools/irqbalance_mck.in"
DESTINATION "${CMAKE_INSTALL_SYSCONFDIR}")
install(FILES "${CMAKE_CURRENT_BINARY_DIR}/mcreboot.1"
DESTINATION "${CMAKE_INSTALL_MANDIR}/man1")
configure_file(scripts/mckernel.spec.in scripts/mckernel.spec @ONLY)
set(CPACK_SOURCE_PACKAGE_FILE_NAME "${CMAKE_PROJECT_NAME}-${MCKERNEL_VERSION}")
set(CPACK_SOURCE_IGNORE_FILES "/.git$")
set(CPACK_SOURCE_INSTALLED_DIRECTORIES "${CMAKE_SOURCE_DIR};/;${IHK_FULL_SOURCE_DIR};/ihk;${CMAKE_BINARY_DIR}/scripts;/scripts")
set(CPACK_SOURCE_GENERATOR "TGZ")
include(CPack)
add_custom_target(dist COMMAND ${CMAKE_MAKE_PROGRAM} package_source)
# config report
message("-------------------------------")
message("Option summary")
message("-------------------------------")
message("Build type: ${CMAKE_BUILD_TYPE}")
message("Build target: ${BUILD_TARGET}")
message("IHK_SOURCE_DIR: ${IHK_SOURCE_DIR} (relative to mckernel source tree)")
message("UNAME_R: ${UNAME_R}")
message("KERNEL_DIR: ${KERNEL_DIR}")
message("SYSTEM_MAP: ${SYSTEM_MAP}")
message("VMLINUX: ${VMLINUX}")
message("KBUILD_C_FLAGS: ${KBUILD_C_FLAGS}")
message("ENABLE_MEMDUMP: ${ENABLE_MEMDUMP}")
message("ENABLE_PERF: ${ENABLE_PERF}")
message("ENABLE_RUSAGE: ${ENABLE_RUSAGE}")
message("ENABLE_MCOVERLAYFS: ${ENABLE_MCOVERLAYFS}")
message("ENABLE_QLMPI: ${ENABLE_QLMPI}")
message("ENABLE_UTI: ${ENABLE_UTI}")
message("ENABLE_WERROR: ${ENABLE_WERROR}")
message("ENABLE_UBSAN: ${ENABLE_UBSAN}")
message("-------------------------------")

339
LICENSE Normal file
View File

@ -0,0 +1,339 @@
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991
Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Lesser General Public License instead.) You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.
We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.
Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and
modification follow.
GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".
Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.
1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.
You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.
b) You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.
c) If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.
In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:
a) Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,
b) Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.
If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.
4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.
5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.
7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.
10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.
NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:
Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.
<signature of Ty Coon>, 1 April 1989
Ty Coon, President of Vice
This General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License.

View File

@ -1,81 +0,0 @@
TARGET = @TARGET@
SBINDIR = @SBINDIR@
INCDIR = @INCDIR@
ETCDIR = @ETCDIR@
MANDIR = @MANDIR@
all: executer-mcctrl executer-mcoverlayfs executer-user mckernel mck-tools
executer-mcctrl:
+@(cd executer/kernel/mcctrl; $(MAKE) modules)
executer-mcoverlayfs:
+@(cd executer/kernel/mcoverlayfs; $(MAKE) modules)
executer-user:
+@(cd executer/user; $(MAKE))
mckernel:
+@case "$(TARGET)" in \
attached-mic | builtin-x86 | builtin-mic | smp-x86 | smp-arm64) \
(cd kernel; $(MAKE)) \
;; \
*) \
echo "unknown target $(TARGET)" >&2 \
exit 1 \
;; \
esac
mck-tools:
+@(cd tools/mcstat; $(MAKE))
install:
@(cd executer/kernel/mcctrl; $(MAKE) install)
@(cd executer/kernel/mcoverlayfs; $(MAKE) install)
@(cd executer/user; $(MAKE) install)
@case "$(TARGET)" in \
attached-mic | builtin-x86 | builtin-mic | smp-x86 | smp-arm64) \
(cd kernel; $(MAKE) install) \
;; \
*) \
echo "unknown target $(TARGET)" >&2 \
exit 1 \
;; \
esac
@case "$(TARGET)" in \
smp-x86 | smp-arm64) \
mkdir -p -m 755 $(SBINDIR); \
install -m 755 arch/x86_64/tools/mcreboot-smp-x86.sh $(SBINDIR)/mcreboot.sh; \
install -m 755 arch/x86_64/tools/mcstop+release-smp-x86.sh $(SBINDIR)/mcstop+release.sh; \
install -m 755 arch/x86_64/tools/mcoverlay-destroy-smp-x86.sh $(SBINDIR)/mcoverlay-destroy.sh; \
install -m 755 arch/x86_64/tools/mcoverlay-create-smp-x86.sh $(SBINDIR)/mcoverlay-create.sh; \
install -m 755 arch/x86_64/tools/eclair-dump-backtrace.exp $(SBINDIR)/eclair-dump-backtrace.exp;\
mkdir -p -m 755 $(ETCDIR); \
install -m 644 arch/x86_64/tools/irqbalance_mck.service $(ETCDIR)/irqbalance_mck.service; \
install -m 644 arch/x86_64/tools/irqbalance_mck.in $(ETCDIR)/irqbalance_mck.in; \
mkdir -p -m 755 $(INCDIR); \
install -m 644 kernel/include/swapfmt.h $(INCDIR); \
mkdir -p -m 755 $(MANDIR)/man1; \
install -m 644 arch/x86_64/tools/mcreboot.1 $(MANDIR)/man1/mcreboot.1; \
;; \
*) \
echo "unknown target $(TARGET)" >&2 \
exit 1 \
;; \
esac
@(cd tools/mcstat/; $(MAKE) install)
clean:
@(cd executer/kernel/mcctrl; $(MAKE) clean)
@(cd executer/kernel/mcoverlayfs; $(MAKE) clean)
@(cd executer/user; $(MAKE) clean)
@case "$(TARGET)" in \
attached-mic | builtin-x86 | builtin-mic | smp-x86 | smp-arm64) \
(cd kernel; $(MAKE) clean) \
;; \
*) \
echo "unknown target $(TARGET)" >&2 \
exit 1 \
;; \
esac
@(cd tools/mcstat; $(MAKE) clean)

186
README.md Normal file
View File

@ -0,0 +1,186 @@
![McKernel Logo](https://www.sys.r-ccs.riken.jp/members_files/bgerofi/mckernel-logo.png)
-------------------------
IHK/McKernel is a light-weight multi-kernel operating system designed for high-end supercomputing. It runs Linux and McKernel, a light-weight kernel (LWK), side-by-side inside compute nodes and aims at the following:
- Provide scalable and consistent execution of large-scale parallel scientific applications, but at the same time maintain the ability to rapidly adapt to new hardware features and emerging programming models
- Provide efficient memory and device management so that resource contention and data movement are minimized at the system level
- Eliminate OS noise by isolating OS services in Linux and provide jitter free execution on the LWK
- Support the full POSIX/Linux APIs by selectively offloading (slow-path) system calls to Linux
## Contents
- [Background] (#background)
- [Architectural Overview](#architectural-overview)
- [Installation](#installation)
- [The Team](#the-team)
## Background and Motivation
With the growing complexity of high-end supercomputers, the current system software stack faces significant challenges as we move forward to exascale and beyond. The necessity to deal with extreme degree of parallelism, heterogeneous architectures, multiple levels of memory hierarchy, power constraints, etc., advocates operating systems that can rapidly adapt to new hardware requirements, and that can support novel programming paradigms and runtime systems. On the other hand, a new class of more dynamic and complex applications are also on the horizon, with an increasing demand for application constructs such as in-situ analysis, workflows, elaborate monitoring and performance tools. This complexity relies not only on the rich features of POSIX, but also on the Linux APIs (such as the */proc*, */sys* filesystems, etc.) in particular.
##### Two Traditional HPC OS Approaches
Traditionally, light-weight operating systems specialized for HPC followed two approaches to tackle scalable execution of large-scale applications. In the full weight kernel (FWK) approach, a full Linux environment is taken as the basis, and features that inhibit attaining HPC scalability are removed, i.e., making it light-weight. The pure light-weight kernel (LWK) approach, on the other hand, starts from scratch and effort is undertaken to add sufficient functionality so that it provides a familiar API, typically something close to that of a general purpose OS, while at the same time it retains the desired scalability and reliability attributes. Neither of these approaches yields a fully Linux compatible environment.
##### The Multi-kernel Approach
A hybrid approach recognized recently by the system software community is to run Linux simultaneously with a lightweight kernel on compute nodes and multiple research projects are now pursuing this direction. The basic idea is that simulations run on an HPC tailored lightweight kernel, ensuring the necessary isolation for noiseless execution of parallel applications, but Linux is leveraged so that the full POSIX API is supported. Additionally, the small code base of the LWK can also facilitate rapid prototyping for new, exotic hardware features. Nevertheless, the questions of how to share node resources between the two types of kernels, where do device drivers execute, how exactly do the two kernels interact with each other and to what extent are they integrated, remain subjects of ongoing debate.
## Architectural Overview
At the heart of the stack is a low-level software infrastructure called Interface for Heterogeneous Kernels (IHK). IHK is a general framework that provides capabilities for partitioning resources in a many-core environment (e.g.,CPU cores and physical memory) and it enables management of lightweight kernels. IHK can allocate and release host resources dynamically and no reboot of the host machine is required when altering configuration. IHK also provides a low-level inter-kernel messaging infrastructure, called the Inter-Kernel Communication (IKC) layer. An architectural overview of the main system components is shown below.
![arch](https://www.sys.r-ccs.riken.jp/members_files/bgerofi/mckernel.png)
McKernel is a lightweight kernel written from scratch. It is designed for HPC and is booted from IHK. McKernel retains a binary compatible ABI with Linux, however, it implements only a small set of performance sensitive system calls and the rest are offloaded to Linux. Specifically, McKernel has its own memory management, it supports processes and multi-threading with a simple round-robin cooperative (tick-less) scheduler, and it implements signaling. It also allows inter-process memory mappings and it provides interfaces to hardware performance counters.
### Functionality
An overview of some of the principal functionalities of the IHK/McKernel stack is provided below.
#### System Call Offloading
System call forwarding in McKernel is implemented as follows. When an offloaded system call occurs, McKernel marshals the system call number along with its arguments and sends a message to Linux via a dedicated IKC channel. The corresponding proxy process running on Linux is by default waiting for system call requests through an ioctl() call into IHKs system call delegator kernel module. The delegator kernel modules IKC interrupt handler wakes up the proxy process, which returns to userspace and simply invokes the requested system call. Once it obtains the return value, it instructs the delegator module to send the result back to McKernel, which subsequently passes the value to user-space.
#### Unified Address Space
The unified address space model in IHK/McKernel ensures that offloaded system calls can seamlessly resolve arguments even in case of pointers. This mechanism is depicted below and is implemented as follows.
![unified_ap](https://www.sys.r-ccs.riken.jp/members_files/bgerofi/img/unified_address_space_en.png)
First, the proxy process is compiled as a position independent binary, which enables us to map the code and data segments specific to the proxy process to an address range which is explicitly excluded from McKernels user space. The grey box on the right side of the figure demonstrates the excluded region. Second, the entire valid virtual address range of McKernels application user-space is covered by a special mapping in the proxy process for which we use a pseudo file mapping in Linux. This mapping is indicated by the blue box on the left side of the figure.
## Installation
For a smooth experience, we recommend the following combination of OS distributions and platforms:
- CentOS 7.3+ running on Intel Xeon / Xeon Phi
##### 1. Change SELinux settings
Log in as the root and disable SELinux:
~~~~
vim /etc/selinux/config
~~~~
Change the file to SELINUX=disabled
##### 2. Reboot the host machine
~~~~
sudo reboot
~~~~
##### 3. Prepare packages, kernel symbol table file
You will need the following packages installed:
~~~~
sudo yum install kernel-devel binutils-devel libnuma-devel
~~~~
Grant read permission to the System.map file of your kernel version:
~~~~
sudo chmod a+r /boot/System.map-`uname -r`
~~~~
##### 4. Obtain sources and compile the kernel
Clone the source code and set up ihk symlink (this is currently required):
~~~~
mkdir -p ~/src/ihk+mckernel/
cd ~/src/ihk+mckernel/
git clone -r git@github.com:RIKEN-SysSoft/mckernel.git
~~~~
Configure and compile:
~~~~
mkdir -p build && cd build
cmake -DCMAKE_INSTALL_PREFIX=${HOME}/ihk+mckernel $HOME/src/mckernel
make -j install
~~~~
The IHK kernel modules and McKernel kernel image should be installed under the **ihk+mckernel** folder in your home directory.
##### 5. Boot McKernel
A boot script called mcreboot.sh is provided under sbin in the install folder. To boot on logical CPU 1 with 512MB of memory, use the following invocation:
~~~~
export TOP=${HOME}/ihk+mckernel/
cd ${TOP}
sudo ./sbin/mcreboot.sh -c 1 -m 512m
~~~~
You should see something similar like this if you display the McKernel's kernel message log:
~~~~
./sbin/ihkosctl 0 kmsg
IHK/McKernel started.
[ -1]: no_execute_available: 1
[ -1]: map_fixed: phys: 0xfee00000 => 0xffff860000009000 (1 pages)
[ -1]: setup_x86 done.
[ -1]: ns_per_tsc: 385
[ -1]: KCommand Line: hidos dump_level=24
[ -1]: Physical memory: 0x1ad3000 - 0x21000000, 525520896 bytes, 128301 pages available @ NUMA: 0
[ -1]: NUMA: 0, Linux NUMA: 0, type: 1, available bytes: 525520896, pages: 128301
[ -1]: NUMA 0 distances: 0 (10),
[ -1]: map_fixed: phys: 0x28000 => 0xffff86000000a000 (2 pages)
[ -1]: Trampoline area: 0x28000
[ -1]: map_fixed: phys: 0x0 => 0xffff86000000c000 (1 pages)
[ -1]: # of cpus : 1
[ -1]: locals = ffff880001af6000
[ 0]: BSP: 0 (HW ID: 1 @ NUMA 0)
[ 0]: BSP: booted 0 AP CPUs
[ 0]: Master channel init acked.
[ 0]: vdso is enabled
IHK/McKernel booted.
~~~~
##### 5. Run a simple program on McKernel
The mcexec command line tool (which is also the Linux proxy process) can be used for executing applications on McKernel:
~~~~
./bin/mcexec hostname
centos-vm
~~~~
##### 6. Shutdown McKernel
Finally, to shutdown McKernel and release CPU/memory resources back to Linux use the following command:
~~~~
sudo ./sbin/mcstop+release.sh
~~~~
## The Team
The McKernel project was started at The University of Tokyo and currently it is mainly developed at RIKEN.
Some of our collaborators include:
- Hitachi
- Fujitsu
- CEA (France)
- NEC
## License
McKernel is GPL licensed, as found in the LICENSE file.

View File

@ -1,4 +1,4 @@
# Makefile.arch COPYRIGHT FUJITSU LIMITED 2015-2017
# Makefile.arch.in COPYRIGHT FUJITSU LIMITED 2015-2018
VDSO_SRCDIR = $(SRC)/../arch/$(IHKARCH)/kernel/vdso
VDSO_BUILDDIR = @abs_builddir@/vdso
VDSO_SO_O = $(O)/vdso.so.o
@ -6,23 +6,22 @@ VDSO_SO_O = $(O)/vdso.so.o
IHK_OBJS += assert.o cache.o cpu.o cputable.o context.o entry.o entry-fpsimd.o
IHK_OBJS += fault.o head.o hyp-stub.o local.o perfctr.o perfctr_armv8pmu.o proc.o proc-macros.o
IHK_OBJS += psci.o smp.o trampoline.o traps.o fpsimd.o
IHK_OBJS += debug-monitors.o hw_breakpoint.o ptrace.o
IHK_OBJS += debug-monitors.o hw_breakpoint.o ptrace.o timer.o
IHK_OBJS += $(notdir $(VDSO_SO_O)) memory.o syscall.o vdso.o
IHK_OBJS += irq-gic-v2.o irq-gic-v3.o
IHK_OBJS += memcpy.o memset.o
IHK_OBJS += cpufeature.o
# POSTK_DEBUG_ARCH_DEP_18 coredump arch separation.
# IHK_OBJS added coredump.o
IHK_OBJS += imp-sysreg.o
IHK_OBJS += coredump.o
$(VDSO_SO_O): $(VDSO_BUILDDIR)/vdso.so
$(VDSO_BUILDDIR)/vdso.so: FORCE
$(call echo_cmd,BUILD VDSO,$(TARGET))
@mkdir -p $(O)/vdso
@TARGETDIR="$(TARGETDIR)" $(submake) -C $(VDSO_BUILDDIR) $(SUBOPTS) prepare
@TARGETDIR="$(TARGETDIR)" $(submake) -C $(VDSO_BUILDDIR) $(SUBOPTS)
mkdir -p $(O)/vdso
TARGETDIR="$(TARGETDIR)" $(submake) -C $(VDSO_BUILDDIR) $(SUBOPTS) prepare
TARGETDIR="$(TARGETDIR)" $(submake) -C $(VDSO_BUILDDIR) $(SUBOPTS)
FORCE:

View File

@ -1,4 +1,4 @@
/* assert.c COPYRIGHT FUJITSU LIMITED 2015-2017 */
/* assert.c COPYRIGHT FUJITSU LIMITED 2015-2018 */
#include <process.h>
#include <list.h>
@ -24,6 +24,7 @@ STATIC_ASSERT(offsetof(struct pt_regs, sp) == S_SP);
STATIC_ASSERT(offsetof(struct pt_regs, pc) == S_PC);
STATIC_ASSERT(offsetof(struct pt_regs, pstate) == S_PSTATE);
STATIC_ASSERT(offsetof(struct pt_regs, orig_x0) == S_ORIG_X0);
STATIC_ASSERT(offsetof(struct pt_regs, orig_pc) == S_ORIG_PC);
STATIC_ASSERT(offsetof(struct pt_regs, syscallno) == S_SYSCALLNO);
STATIC_ASSERT(sizeof(struct pt_regs) == S_FRAME_SIZE);
@ -50,3 +51,6 @@ STATIC_ASSERT(sizeof(struct sigcontext) - offsetof(struct sigcontext, __reserved
ALIGN_UP(sizeof(struct _aarch64_ctx), 16) > sizeof(struct extra_context));
STATIC_ASSERT(SVE_PT_FPSIMD_OFFSET == sizeof(struct user_sve_header));
STATIC_ASSERT(SVE_PT_SVE_OFFSET == sizeof(struct user_sve_header));
/* assert for struct arm64_cpu_local_thread member offset define */
STATIC_ASSERT(offsetof(struct arm64_cpu_local_thread, panic_regs) == 160);

View File

@ -1,5 +1,4 @@
/* coredump.c COPYRIGHT FUJITSU LIMITED 2015-2016 */
#ifdef POSTK_DEBUG_ARCH_DEP_18 /* coredump arch separation. */
#include <process.h>
#include <elfcore.h>
#include <string.h>
@ -31,5 +30,3 @@ void arch_fill_prstatus(struct elf_prstatus64 *prstatus, struct thread *thread,
/* copy unaligned prstatus addr */
memcpy(prstatus, &tmp_prstatus, sizeof(*prstatus));
}
#endif /* POSTK_DEBUG_ARCH_DEP_18 */

View File

@ -1,4 +1,4 @@
/* cpu.c COPYRIGHT FUJITSU LIMITED 2015-2017 */
/* cpu.c COPYRIGHT FUJITSU LIMITED 2015-2018 */
#include <ihk/cpu.h>
#include <ihk/debug.h>
#include <ihk/mm.h>
@ -30,27 +30,20 @@
#include <debug-monitors.h>
#include <sysreg.h>
#include <cpufeature.h>
#ifdef POSTK_DEBUG_ARCH_DEP_65
#include <debug.h>
#include <hwcap.h>
#endif /* POSTK_DEBUG_ARCH_DEP_65 */
#include <virt.h>
//#define DEBUG_PRINT_CPU
#include "postk_print_sysreg.c"
#ifdef DEBUG_PRINT_CPU
#define dkprintf kprintf
#define ekprintf kprintf
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf kprintf
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
#define BUG_ON(condition) do { if (condition) { kprintf("PANIC: %s: %s(line:%d)\n",\
__FILE__, __FUNCTION__, __LINE__); panic(""); } } while(0)
struct cpuinfo_arm64 cpuinfo_data[NR_CPUS]; /* index is logical cpuid */
static unsigned int per_cpu_timer_val[NR_CPUS] = { 0 };
static struct list_head handlers[1024];
static void cpu_init_interrupt_handler(void);
@ -60,7 +53,6 @@ void assign_processor_id(void);
void arch_delay(int);
int gettime_local_support = 0;
extern int ihk_mc_pt_print_pte(struct page_table *pt, void *virt);
extern int interrupt_from_user(void *);
extern unsigned long ihk_param_gic_dist_base_pa;
@ -121,137 +113,54 @@ static struct ihk_mc_interrupt_handler cpu_stop_handler = {
.priv = NULL,
};
/* @ref.impl include/clocksource/arm_arch_timer.h */
#define ARCH_TIMER_CTRL_ENABLE (1 << 0)
#define ARCH_TIMER_CTRL_IT_MASK (1 << 1)
#define ARCH_TIMER_CTRL_IT_STAT (1 << 2)
static void physical_timer_handler(void *priv)
extern long freeze_thaw(void *nmi_ctx);
static void multi_nm_interrupt_handler(void *priv)
{
unsigned int ctrl = 0;
int cpu = ihk_mc_get_processor_id();
dkprintf("CPU%d: catch physical timer\n", cpu);
asm volatile("mrs %0, cntp_ctl_el0" : "=r" (ctrl));
if (ctrl & ARCH_TIMER_CTRL_IT_STAT) {
unsigned int zero = 0;
unsigned int val = ctrl;
unsigned int clocks = per_cpu_timer_val[cpu];
unsigned long irqstate;
struct cpu_local_var *v = get_this_cpu_local_var();
/* set resched flag */
irqstate = ihk_mc_spinlock_lock(&v->runq_lock);
v->flags |= CPU_FLAG_NEED_RESCHED;
ihk_mc_spinlock_unlock(&v->runq_lock, irqstate);
/* gen control register value */
val &= ~(ARCH_TIMER_CTRL_IT_STAT | ARCH_TIMER_CTRL_IT_MASK);
val |= ARCH_TIMER_CTRL_ENABLE;
/* set timer re-enable for periodic */
asm volatile("msr cntp_ctl_el0, %0" : : "r" (zero));
asm volatile("msr cntp_tval_el0, %0" : : "r" (clocks));
asm volatile("msr cntp_ctl_el0, %0" : : "r" (val));
}
}
static struct ihk_mc_interrupt_handler phys_timer_handler = {
.func = physical_timer_handler,
.priv = NULL,
};
static void virtual_timer_handler(void *priv)
{
unsigned int ctrl = 0;
int cpu = ihk_mc_get_processor_id();
dkprintf("CPU%d: catch virtual timer\n", cpu);
asm volatile("mrs %0, cntv_ctl_el0" : "=r" (ctrl));
if (ctrl & ARCH_TIMER_CTRL_IT_STAT) {
unsigned int zero = 0;
unsigned int val = ctrl;
unsigned int clocks = per_cpu_timer_val[cpu];
unsigned long irqstate;
struct cpu_local_var *v = get_this_cpu_local_var();
/* set resched flag */
irqstate = ihk_mc_spinlock_lock(&v->runq_lock);
v->flags |= CPU_FLAG_NEED_RESCHED;
ihk_mc_spinlock_unlock(&v->runq_lock, irqstate);
/* gen control register value */
val &= ~(ARCH_TIMER_CTRL_IT_STAT | ARCH_TIMER_CTRL_IT_MASK);
val |= ARCH_TIMER_CTRL_ENABLE;
/* set timer re-enable for periodic */
asm volatile("msr cntv_ctl_el0, %0" : : "r" (zero));
asm volatile("msr cntv_tval_el0, %0" : : "r" (clocks));
asm volatile("msr cntv_ctl_el0, %0" : : "r" (val));
}
}
static struct ihk_mc_interrupt_handler virt_timer_handler = {
.func = virtual_timer_handler,
.priv = NULL,
};
static void memdump_interrupt_handler(void *priv)
{
struct pt_regs *regs;
extern int nmi_mode;
struct pt_regs *regs = (struct pt_regs *)priv;
union arm64_cpu_local_variables *clv;
regs = cpu_local_var(current)->uctx;
clv = get_arm64_this_cpu_local();
switch (nmi_mode) {
case 1:
case 2:
/* mode == 1or2, for FREEZER NMI */
dkprintf("%s: freeze mode NMI catch. (nmi_mode=%d)\n",
__func__, nmi_mode);
freeze_thaw(NULL);
break;
if (regs && interrupt_from_user(regs)) {
memcpy(clv->arm64_cpu_local_thread.panic_regs, regs->regs, sizeof(regs->regs));
clv->arm64_cpu_local_thread.panic_regs[31] = regs->sp;
clv->arm64_cpu_local_thread.panic_regs[32] = regs->pc;
clv->arm64_cpu_local_thread.panic_regs[33] = regs->pstate;
}
else {
asm volatile (
"stp x0, x1, [%3, #16 * 0]\n"
"stp x2, x3, [%3, #16 * 1]\n"
"stp x4, x5, [%3, #16 * 2]\n"
"stp x6, x7, [%3, #16 * 3]\n"
"stp x8, x9, [%3, #16 * 4]\n"
"stp x10, x11, [%3, #16 * 5]\n"
"stp x12, x13, [%3, #16 * 6]\n"
"stp x14, x15, [%3, #16 * 7]\n"
"stp x16, x17, [%3, #16 * 8]\n"
"stp x18, x19, [%3, #16 * 9]\n"
"stp x20, x21, [%3, #16 * 10]\n"
"stp x22, x23, [%3, #16 * 11]\n"
"stp x24, x25, [%3, #16 * 12]\n"
"stp x26, x27, [%3, #16 * 13]\n"
"stp x28, x29, [%3, #16 * 14]\n"
"str x30, [%3, #16 * 15]\n"
"mov %0, sp\n"
"adr %1, 1f\n"
"mrs %2, spsr_el1\n"
"1:"
: "=r" (clv->arm64_cpu_local_thread.panic_regs[31]), /* sp */
"=r" (clv->arm64_cpu_local_thread.panic_regs[32]), /* pc */
"=r" (clv->arm64_cpu_local_thread.panic_regs[33]) /* spsr_el1 */
: "r" (&clv->arm64_cpu_local_thread.panic_regs)
: "memory"
);
}
case 0:
/* mode == 0, for MEMDUMP NMI */
clv = get_arm64_this_cpu_local();
clv->arm64_cpu_local_thread.paniced = 1;
if (regs) {
memcpy(clv->arm64_cpu_local_thread.panic_regs,
regs->regs, sizeof(regs->regs));
clv->arm64_cpu_local_thread.panic_regs[31] = regs->sp;
clv->arm64_cpu_local_thread.panic_regs[32] = regs->pc;
clv->arm64_cpu_local_thread.panic_regs[33] =
regs->pstate;
}
clv->arm64_cpu_local_thread.paniced = 1;
ihk_mc_query_mem_areas();
/* memdump-nmi is halted McKernel, break is unnecessary. */
/* fall through */
case 3:
/* mode == 3, for SHUTDOWN-WAIT NMI */
while (1) {
cpu_halt();
}
break;
while(1)
{
cpu_halt();
default:
ekprintf("%s: Unknown nmi-mode(%d) detected.\n",
__func__, nmi_mode);
break;
}
}
static struct ihk_mc_interrupt_handler memdump_handler = {
.func = memdump_interrupt_handler,
static struct ihk_mc_interrupt_handler multi_nmi_handler = {
.func = multi_nm_interrupt_handler,
.priv = NULL,
};
@ -338,19 +247,160 @@ static void setup_processor(void)
static char *trampoline_va, *first_page_va;
unsigned long is_use_virt_timer(void)
static inline uint64_t pwr_arm64hpc_read_imp_fj_core_uarch_restrection_el1(void)
{
extern unsigned long ihk_param_use_virt_timer;
uint64_t reg;
switch (ihk_param_use_virt_timer) {
case 0: /* physical */
case 1: /* virtual */
break;
default: /* invalid */
panic("PANIC: is_use_virt_timer(): timer select neither phys-timer nor virt-timer.\n");
break;
asm volatile("mrs_s %0, " __stringify(IMP_FJ_CORE_UARCH_RESTRECTION_EL1)
: "=r" (reg) : : "memory");
return reg;
}
static ihk_spinlock_t imp_fj_core_uarch_restrection_el1_lock =
SPIN_LOCK_UNLOCKED;
static inline void pwr_arm64hpc_write_imp_fj_core_uarch_restrection_el1(uint64_t set_bit,
uint64_t clear_bit)
{
uint64_t reg;
unsigned long flags;
flags = ihk_mc_spinlock_lock(&imp_fj_core_uarch_restrection_el1_lock);
reg = pwr_arm64hpc_read_imp_fj_core_uarch_restrection_el1();
reg = (reg & ~clear_bit) | set_bit;
asm volatile("msr_s " __stringify(IMP_FJ_CORE_UARCH_RESTRECTION_EL1) ", %0"
: : "r" (reg) : "memory");
ihk_mc_spinlock_unlock(&imp_fj_core_uarch_restrection_el1_lock, flags);
}
static inline uint64_t pwr_arm64hpc_read_imp_soc_standby_ctrl_el1(void)
{
uint64_t reg;
asm volatile("mrs_s %0, " __stringify(IMP_SOC_STANDBY_CTRL_EL1)
: "=r" (reg) : : "memory");
return reg;
}
static ihk_spinlock_t imp_soc_standby_ctrl_el1_lock = SPIN_LOCK_UNLOCKED;
static inline void pwr_arm64hpc_write_imp_soc_standby_ctrl_el1(uint64_t set_bit,
uint64_t clear_bit)
{
unsigned long flags;
uint64_t reg;
flags = ihk_mc_spinlock_lock(&imp_soc_standby_ctrl_el1_lock);
reg = pwr_arm64hpc_read_imp_soc_standby_ctrl_el1();
reg = (reg & ~clear_bit) | set_bit;
asm volatile("msr_s " __stringify(IMP_SOC_STANDBY_CTRL_EL1) ", %0"
: : "r" (reg) : "memory");
ihk_mc_spinlock_unlock(&imp_soc_standby_ctrl_el1_lock, flags);
}
static unsigned long *retention_state_flag;
static inline int is_hpcpwr_available(void)
{
if (retention_state_flag)
return 1;
else
return 0;
}
static inline void pwr_arm64hpc_map_retention_state_flag(void)
{
extern unsigned long ihk_param_retention_state_flag_pa;
unsigned long size = BITS_TO_LONGS(NR_CPUS) * sizeof(unsigned long);
if (!ihk_param_retention_state_flag_pa) {
return;
}
return ihk_param_use_virt_timer;
retention_state_flag = map_fixed_area(ihk_param_retention_state_flag_pa,
size, 0);
}
static inline int pwr_arm64hpc_retention_state_get(uint64_t *val)
{
unsigned long linux_core_id;
int cpu = ihk_mc_get_processor_id();
int ret;
if (!is_hpcpwr_available()) {
*val = 0;
return 0;
}
ret = ihk_mc_get_core(cpu, &linux_core_id, NULL, NULL);
if (ret) {
return ret;
}
*val = test_bit(linux_core_id, retention_state_flag);
return ret;
}
static inline int pwr_arm64hpc_retention_state_set(uint64_t val)
{
unsigned long linux_core_id;
int cpu = ihk_mc_get_processor_id();
int ret;
if (!is_hpcpwr_available()) {
return 0;
}
ret = ihk_mc_get_core(cpu, &linux_core_id, NULL, NULL);
if (ret) {
return ret;
}
if (val) {
set_bit(cpu, retention_state_flag);
} else {
clear_bit(cpu, retention_state_flag);
}
return ret;
}
static inline int pwr_arm64hpc_enable_retention_state(void)
{
if (!is_hpcpwr_available()) {
return 0;
}
pwr_arm64hpc_write_imp_soc_standby_ctrl_el1(IMP_SOC_STANDBY_CTRL_EL1_RETENTION,
0);
return 0;
}
static inline void init_power_management(void)
{
int state;
uint64_t imp_fj_clear_bit = 0;
uint64_t imp_soc_clear_bit = 0;
if (!is_hpcpwr_available()) {
return;
}
/* retention state */
state = pwr_arm64hpc_retention_state_set(0);
if (state) {
panic("error: initialize power management\n");
}
/* issue state */
imp_fj_clear_bit |= IMP_FJ_CORE_UARCH_RESTRECTION_EL1_ISSUE_RESTRICTION;
/* eco_state */
imp_fj_clear_bit |= IMP_FJ_CORE_UARCH_RESTRECTION_EL1_FL_RESTRICT_TRANS;
imp_soc_clear_bit |= IMP_SOC_STANDBY_CTRL_EL1_ECO_MODE;
/* ex_pipe_state */
imp_fj_clear_bit |= IMP_FJ_CORE_UARCH_RESTRECTION_EL1_EX_RESTRICTION;
/* write */
pwr_arm64hpc_write_imp_fj_core_uarch_restrection_el1(0,
imp_fj_clear_bit);
pwr_arm64hpc_write_imp_soc_standby_ctrl_el1(0, imp_soc_clear_bit);
}
/*@
@ -368,23 +418,19 @@ void ihk_mc_init_ap(void)
kprintf("# of cpus : %d\n", cpu_info->ncpus);
init_processors_local(cpu_info->ncpus);
kprintf("IKC IRQ vector: %d, IKC target CPU APIC: %d\n",
ihk_ikc_irq, ihk_ikc_irq_apicid);
/* Do initialization for THIS cpu (BSP) */
assign_processor_id();
ihk_mc_register_interrupt_handler(INTRID_CPU_STOP, &cpu_stop_handler);
ihk_mc_register_interrupt_handler(INTRID_MEMDUMP, &memdump_handler);
ihk_mc_register_interrupt_handler(INTRID_MULTI_NMI, &multi_nmi_handler);
ihk_mc_register_interrupt_handler(
ihk_mc_get_vector(IHK_TLB_FLUSH_IRQ_VECTOR_START), &remote_tlb_flush_handler);
ihk_mc_get_vector(IHK_TLB_FLUSH_IRQ_VECTOR_START),
&remote_tlb_flush_handler);
ihk_mc_register_interrupt_handler(get_timer_intrid(),
get_timer_handler());
if (is_use_virt_timer()) {
ihk_mc_register_interrupt_handler(get_virt_timer_intrid(), &virt_timer_handler);
} else {
ihk_mc_register_interrupt_handler(get_phys_timer_intrid(), &phys_timer_handler);
}
init_smp_processor();
init_power_management();
}
extern void vdso_init(void);
@ -393,15 +439,13 @@ long (*__arm64_syscall_handler)(int, ihk_mc_user_context_t *);
/* @ref.impl arch/arm64/include/asm/arch_timer.h::arch_timer_get_cntkctl */
static inline unsigned int arch_timer_get_cntkctl(void)
{
unsigned int cntkctl;
asm volatile("mrs %0, cntkctl_el1" : "=r" (cntkctl));
return cntkctl;
return read_sysreg(cntkctl_el1);
}
/* @ref.impl arch/arm64/include/asm/arch_timer.h::arch_timer_set_cntkctl */
static inline void arch_timer_set_cntkctl(unsigned int cntkctl)
{
asm volatile("msr cntkctl_el1, %0" : : "r" (cntkctl));
write_sysreg(cntkctl, cntkctl_el1);
}
#ifdef CONFIG_ARM_ARCH_TIMER_EVTSTREAM
@ -467,19 +511,19 @@ void init_cpu(void)
{
if(gic_enable)
gic_enable();
arm64_init_per_cpu_perfctr();
arm64_enable_pmu();
if (xos_is_tchip()) {
vhbm_barrier_registers_init();
scdrv_registers_init();
hpc_registers_init();
}
arm64_enable_user_access_pmu_regs();
}
#ifdef CONFIG_ARM64_VHE
/* @ref.impl arch/arm64/include/asm/virt.h */
static inline int is_kernel_in_hyp_mode(void)
{
unsigned long el;
asm("mrs %0, CurrentEL" : "=r" (el));
return el == CurrentEL_EL2;
}
/* @ref.impl arch/arm64/kernel/smp.c */
/* Whether the boot CPU is running in HYP mode or not */
static int boot_cpu_hyp_mode;
@ -524,8 +568,12 @@ void setup_arm64(void)
arm64_init_perfctr();
arch_timer_init();
gic_init();
pwr_arm64hpc_map_retention_state_flag();
init_cpu();
init_gettime_support();
@ -568,6 +616,7 @@ void setup_arm64_ap(void (*next_func)(void))
debug_monitors_init();
arch_timer_configure_evtstream();
init_cpu();
init_power_management();
call_ap_func(next_func);
/* BUG */
@ -597,6 +646,7 @@ static void show_context_stack(struct pt_regs *regs)
max_loop = (stack_top - sp) / min_stack_frame_size;
for (i = 0; i < max_loop; i++) {
extern char _head[], _end[];
uintptr_t *fp, *lr;
fp = (uintptr_t *)sp;
lr = (uintptr_t *)(sp + 8);
@ -605,7 +655,8 @@ static void show_context_stack(struct pt_regs *regs)
break;
}
if ((*lr < MAP_KERNEL_START) || (*lr > MAP_KERNEL_START + MAP_KERNEL_SIZE)) {
if ((*lr < (unsigned long)_head) ||
(*lr > (unsigned long)_end)) {
break;
}
@ -630,7 +681,7 @@ void handle_IPI(unsigned int vector, struct pt_regs *regs)
else {
list_for_each_entry(h, &handlers[vector], list) {
if (h->func) {
h->func(h->priv);
h->func(h->priv == NULL ? regs : h->priv);
}
}
}
@ -646,7 +697,18 @@ static void __arm64_wakeup(int hw_cpuid, unsigned long entry)
/** IHK Functions **/
/* send WFI(Wait For Interrupt) instruction */
extern void cpu_do_idle(void);
static inline void cpu_do_idle(void)
{
extern void __cpu_do_idle(void);
uint64_t retention;
int state;
state = pwr_arm64hpc_retention_state_get(&retention);
if ((state == 0) && (retention != 0)) {
pwr_arm64hpc_enable_retention_state();
}
__cpu_do_idle();
}
/* halt by WFI(Wait For Interrupt) */
void cpu_halt(void)
@ -1120,20 +1182,32 @@ long ihk_mc_show_cpuinfo(char *buf, size_t buf_size, unsigned long read_off, int
int j = 0;
/* generate strings */
loff += snprintf(lbuf + loff, lbuf_size - loff, "processor\t: %d\n", cpuinfo->hwid);
loff += snprintf(lbuf + loff, lbuf_size - loff, "Features\t:");
loff += scnprintf(lbuf + loff, lbuf_size - loff,
"processor\t: %d\n", cpuinfo->hwid);
loff += scnprintf(lbuf + loff, lbuf_size - loff, "Features\t:");
for (j = 0; hwcap_str[j]; j++) {
if (elf_hwcap & (1 << j)) {
loff += snprintf(lbuf + loff, lbuf_size - loff, " %s", hwcap_str[j]);
loff += scnprintf(lbuf + loff,
lbuf_size - loff,
" %s", hwcap_str[j]);
}
}
loff += snprintf(lbuf + loff, lbuf_size - loff, "\n");
loff += snprintf(lbuf + loff, lbuf_size - loff, "CPU implementer\t: 0x%02x\n", MIDR_IMPLEMENTOR(midr));
loff += snprintf(lbuf + loff, lbuf_size - loff, "CPU architecture: 8\n");
loff += snprintf(lbuf + loff, lbuf_size - loff, "CPU variant\t: 0x%x\n", MIDR_VARIANT(midr));
loff += snprintf(lbuf + loff, lbuf_size - loff, "CPU part\t: 0x%03x\n", MIDR_PARTNUM(midr));
loff += snprintf(lbuf + loff, lbuf_size - loff, "CPU revision\t: %d\n\n", MIDR_REVISION(midr));
loff += scnprintf(lbuf + loff, lbuf_size - loff, "\n");
loff += scnprintf(lbuf + loff, lbuf_size - loff,
"CPU implementer\t: 0x%02x\n",
MIDR_IMPLEMENTOR(midr));
loff += scnprintf(lbuf + loff, lbuf_size - loff,
"CPU architecture: 8\n");
loff += scnprintf(lbuf + loff, lbuf_size - loff,
"CPU variant\t: 0x%x\n",
MIDR_VARIANT(midr));
loff += scnprintf(lbuf + loff, lbuf_size - loff,
"CPU part\t: 0x%03x\n",
MIDR_PARTNUM(midr));
loff += scnprintf(lbuf + loff, lbuf_size - loff,
"CPU revision\t: %d\n\n",
MIDR_REVISION(midr));
/* check buffer depletion */
if ((i < num_processors - 1) && ((lbuf_size - loff) == 1)) {
@ -1162,7 +1236,6 @@ err:
static int check_and_allocate_fp_regs(struct thread *thread);
void save_fp_regs(struct thread *thread);
#ifdef POSTK_DEBUG_ARCH_DEP_23 /* add arch dep. clone_thread() function */
void arch_clone_thread(struct thread *othread, unsigned long pc,
unsigned long sp, struct thread *nthread)
{
@ -1172,10 +1245,6 @@ void arch_clone_thread(struct thread *othread, unsigned long pc,
asm("mrs %0, tpidr_el0" : "=r" (tls));
othread->tlsblock_base = nthread->tlsblock_base = tls;
if ((othread->fp_regs != NULL) && (check_and_allocate_fp_regs(nthread) == 0)) {
memcpy(nthread->fp_regs, othread->fp_regs, sizeof(fp_regs_struct));
}
/* if SVE enable, takeover lower 128 bit register */
if (likely(elf_hwcap & HWCAP_SVE)) {
fp_regs_struct fp_regs;
@ -1185,7 +1254,6 @@ void arch_clone_thread(struct thread *othread, unsigned long pc,
thread_fpsimd_to_sve(nthread, &fp_regs);
}
}
#endif /* POSTK_DEBUG_ARCH_DEP_23 */
/*@
@ requires \valid(handler);
@ -1205,7 +1273,7 @@ void ihk_mc_delay_us(int us)
arch_delay(us);
}
void arch_print_stack()
void arch_print_stack(void)
{
}
@ -1241,6 +1309,11 @@ void arch_show_interrupt_context(const void *reg)
kprintf(" syscallno : %016lx\n", regs->syscallno);
}
void arch_cpu_stop(void)
{
psci_cpu_off();
}
/*@
@ behavior fs_base:
@ assumes type == IHK_ASR_X86_FS;
@ -1283,7 +1356,6 @@ int ihk_mc_interrupt_cpu(int cpu, int vector)
return 0;
}
#ifdef POSTK_DEBUG_ARCH_DEP_22
/*
* @ref.impl linux-linaro/arch/arm64/kernel/process.c::tls_thread_switch()
*/
@ -1309,14 +1381,13 @@ struct thread *arch_switch_context(struct thread *prev, struct thread *next)
extern void perf_start(struct mc_perf_event *event);
extern void perf_reset(struct mc_perf_event *event);
struct thread *last;
#ifdef POSTK_DEBUG_TEMP_FIX_41 /* early to wait4() wakeup for ptrace, fix. */
struct mcs_rwlock_node_irqsave lock;
#endif /* POSTK_DEBUG_TEMP_FIX_41 */
/* Set up new TLS.. */
dkprintf("[%d] arch_switch_context: tlsblock_base: 0x%lX\n",
ihk_mc_get_processor_id(), next->tlsblock_base);
#ifdef ENABLE_PERF
/* Performance monitoring inherit */
if(next->proc->monitoring_event) {
if(next->proc->perf_status == PP_RESET)
@ -1326,10 +1397,10 @@ struct thread *arch_switch_context(struct thread *prev, struct thread *next)
perf_start(next->proc->monitoring_event);
}
}
#endif /*ENABLE_PERF*/
if (likely(prev)) {
tls_thread_switch(prev, next);
#ifdef POSTK_DEBUG_TEMP_FIX_41 /* early to wait4() wakeup for ptrace, fix. */
mcs_rwlock_writer_lock(&prev->proc->update_lock, &lock);
if (prev->proc->status & (PS_DELAY_STOPPED | PS_DELAY_TRACED)) {
switch (prev->proc->status) {
@ -1343,11 +1414,12 @@ struct thread *arch_switch_context(struct thread *prev, struct thread *next)
break;
}
mcs_rwlock_writer_unlock(&prev->proc->update_lock, &lock);
/* Wake up the parent who tried wait4 and sleeping */
waitq_wakeup(&prev->proc->parent->waitpid_q);
} else {
mcs_rwlock_writer_unlock(&prev->proc->update_lock, &lock);
}
#endif /* POSTK_DEBUG_TEMP_FIX_41 */
last = ihk_mc_switch_context(&prev->ctx, &next->ctx, prev);
}
@ -1357,7 +1429,6 @@ struct thread *arch_switch_context(struct thread *prev, struct thread *next)
return last;
}
#endif /* POSTK_DEBUG_ARCH_DEP_22 */
/*@
@ requires \valid(thread);
@ -1423,6 +1494,10 @@ out:
void
save_fp_regs(struct thread *thread)
{
if (thread == &cpu_local_var(idle)) {
return;
}
if (likely(elf_hwcap & (HWCAP_FP | HWCAP_ASIMD))) {
if (check_and_allocate_fp_regs(thread) != 0) {
// alloc error.
@ -1439,8 +1514,7 @@ void copy_fp_regs(struct thread *from, struct thread *to)
}
}
void
clear_fp_regs(struct thread *thread)
void clear_fp_regs(void)
{
if (likely(elf_hwcap & (HWCAP_FP | HWCAP_ASIMD))) {
#ifdef CONFIG_ARM64_SVE
@ -1477,105 +1551,13 @@ restore_fp_regs(struct thread *thread)
if (likely(elf_hwcap & (HWCAP_FP | HWCAP_ASIMD))) {
if (!thread->fp_regs) {
// only clear fpregs.
clear_fp_regs(thread);
clear_fp_regs();
return;
}
thread_fpsimd_load(thread);
}
}
void
lapic_timer_enable(unsigned int clocks)
{
unsigned int val = 0;
/* gen control register value */
asm volatile("mrs %0, cntp_ctl_el0" : "=r" (val));
val &= ~(ARCH_TIMER_CTRL_IT_STAT | ARCH_TIMER_CTRL_IT_MASK);
val |= ARCH_TIMER_CTRL_ENABLE;
if (is_use_virt_timer()) {
asm volatile("msr cntv_tval_el0, %0" : : "r" (clocks));
asm volatile("msr cntv_ctl_el0, %0" : : "r" (val));
} else {
asm volatile("msr cntp_tval_el0, %0" : : "r" (clocks));
asm volatile("msr cntp_ctl_el0, %0" : : "r" (val));
}
per_cpu_timer_val[ihk_mc_get_processor_id()] = clocks;
}
void
unhandled_page_fault(struct thread *thread, void *fault_addr, void *regs)
{
const uintptr_t address = (uintptr_t)fault_addr;
struct process_vm *vm = thread->vm;
struct vm_range *range;
unsigned long irqflags;
unsigned long error = 0;
irqflags = kprintf_lock();
__kprintf("Page fault for 0x%lx\n", address);
__kprintf("%s for %s access in %s mode (reserved bit %s set), "
"it %s an instruction fetch\n",
(error & PF_PROT ? "protection fault" : "no page found"),
(error & PF_WRITE ? "write" : "read"),
(error & PF_USER ? "user" : "kernel"),
(error & PF_RSVD ? "was" : "wasn't"),
(error & PF_INSTR ? "was" : "wasn't"));
range = lookup_process_memory_range(vm, address, address+1);
if (range) {
__kprintf("address is in range, flag: 0x%lx\n",
range->flag);
ihk_mc_pt_print_pte(vm->address_space->page_table, (void*)address);
} else {
__kprintf("address is out of range! \n");
}
kprintf_unlock(irqflags);
/* TODO */
ihk_mc_debug_show_interrupt_context(regs);
if (!interrupt_from_user(regs)) {
panic("panic: kernel mode PF");
}
//dkprintf("now dump a core file\n");
//coredump(proc, regs);
#ifdef DEBUG_PRINT_MEM
{
uint64_t *sp = (void *)REGS_GET_STACK_POINTER(regs);
kprintf("*rsp:%lx,*rsp+8:%lx,*rsp+16:%lx,*rsp+24:%lx,\n",
sp[0], sp[1], sp[2], sp[3]);
}
#endif
return;
}
void
lapic_timer_disable()
{
unsigned int zero = 0;
unsigned int val = 0;
/* gen control register value */
asm volatile("mrs %0, cntp_ctl_el0" : "=r" (val));
val &= ~(ARCH_TIMER_CTRL_IT_STAT | ARCH_TIMER_CTRL_IT_MASK | ARCH_TIMER_CTRL_ENABLE);
if (is_use_virt_timer()) {
asm volatile("msr cntv_ctl_el0, %0" : : "r" (val));
asm volatile("msr cntv_tval_el0, %0" : : "r" (zero));
} else {
asm volatile("msr cntp_ctl_el0, %0" : : "r" (val));
asm volatile("msr cntp_tval_el0, %0" : : "r" (zero));
}
per_cpu_timer_val[ihk_mc_get_processor_id()] = 0;
}
void init_tick(void)
{
dkprintf("init_tick():\n");
@ -1604,25 +1586,174 @@ void arch_start_pvclock(void)
void
mod_nmi_ctx(void *nmi_ctx, void (*func)())
{
/* TODO: skeleton for rusage */
func();
}
extern void freeze(void);
void __freeze(void)
{
freeze();
}
#define SYSREG_READ_S(sys_reg) case (sys_reg): asm volatile("mrs_s %0, " __stringify(sys_reg) : "=r" (*val)); break
static inline int arch_cpu_mrs(uint32_t sys_reg, uint64_t *val)
{
int ret = 0;
switch (sys_reg) {
SYSREG_READ_S(IMP_FJ_TAG_ADDRESS_CTRL_EL1);
SYSREG_READ_S(IMP_SCCR_CTRL_EL1);
SYSREG_READ_S(IMP_SCCR_ASSIGN_EL1);
SYSREG_READ_S(IMP_SCCR_SET0_L2_EL1);
SYSREG_READ_S(IMP_SCCR_SET1_L2_EL1);
SYSREG_READ_S(IMP_SCCR_L1_EL0);
SYSREG_READ_S(IMP_PF_CTRL_EL1);
SYSREG_READ_S(IMP_PF_STREAM_DETECT_CTRL_EL0);
SYSREG_READ_S(IMP_PF_INJECTION_CTRL0_EL0);
SYSREG_READ_S(IMP_PF_INJECTION_CTRL1_EL0);
SYSREG_READ_S(IMP_PF_INJECTION_CTRL2_EL0);
SYSREG_READ_S(IMP_PF_INJECTION_CTRL3_EL0);
SYSREG_READ_S(IMP_PF_INJECTION_CTRL4_EL0);
SYSREG_READ_S(IMP_PF_INJECTION_CTRL5_EL0);
SYSREG_READ_S(IMP_PF_INJECTION_CTRL6_EL0);
SYSREG_READ_S(IMP_PF_INJECTION_CTRL7_EL0);
SYSREG_READ_S(IMP_PF_INJECTION_DISTANCE0_EL0);
SYSREG_READ_S(IMP_PF_INJECTION_DISTANCE1_EL0);
SYSREG_READ_S(IMP_PF_INJECTION_DISTANCE2_EL0);
SYSREG_READ_S(IMP_PF_INJECTION_DISTANCE3_EL0);
SYSREG_READ_S(IMP_PF_INJECTION_DISTANCE4_EL0);
SYSREG_READ_S(IMP_PF_INJECTION_DISTANCE5_EL0);
SYSREG_READ_S(IMP_PF_INJECTION_DISTANCE6_EL0);
SYSREG_READ_S(IMP_PF_INJECTION_DISTANCE7_EL0);
SYSREG_READ_S(IMP_BARRIER_CTRL_EL1);
SYSREG_READ_S(IMP_BARRIER_BST_BIT_EL1);
SYSREG_READ_S(IMP_BARRIER_INIT_SYNC_BB0_EL1);
SYSREG_READ_S(IMP_BARRIER_INIT_SYNC_BB1_EL1);
SYSREG_READ_S(IMP_BARRIER_INIT_SYNC_BB2_EL1);
SYSREG_READ_S(IMP_BARRIER_INIT_SYNC_BB3_EL1);
SYSREG_READ_S(IMP_BARRIER_INIT_SYNC_BB4_EL1);
SYSREG_READ_S(IMP_BARRIER_INIT_SYNC_BB5_EL1);
SYSREG_READ_S(IMP_BARRIER_ASSIGN_SYNC_W0_EL1);
SYSREG_READ_S(IMP_BARRIER_ASSIGN_SYNC_W1_EL1);
SYSREG_READ_S(IMP_BARRIER_ASSIGN_SYNC_W2_EL1);
SYSREG_READ_S(IMP_BARRIER_ASSIGN_SYNC_W3_EL1);
SYSREG_READ_S(IMP_SOC_STANDBY_CTRL_EL1);
SYSREG_READ_S(IMP_FJ_CORE_UARCH_CTRL_EL2);
SYSREG_READ_S(IMP_FJ_CORE_UARCH_RESTRECTION_EL1);
/* fall through */
default:
return -EINVAL;
}
return ret;
}
static inline int arch_cpu_read_register(struct ihk_os_cpu_register *desc)
{
int ret = 0;
uint64_t value;
if (desc->addr) {
panic("memory mapped register is not supported.\n");
} else if (desc->addr_ext) {
ret = arch_cpu_mrs(desc->addr_ext, &value);
if (ret == 0) {
desc->val = value;
}
} else {
ret = -EINVAL;
}
return ret;
}
#define SYSREG_WRITE_S(sys_reg) case (sys_reg): asm volatile("msr_s " __stringify(sys_reg) ", %0" :: "r" (val)); break
static inline int arch_cpu_msr(uint32_t sys_reg, uint64_t val)
{
int ret = 0;
switch (sys_reg) {
SYSREG_WRITE_S(IMP_FJ_TAG_ADDRESS_CTRL_EL1);
SYSREG_WRITE_S(IMP_SCCR_CTRL_EL1);
SYSREG_WRITE_S(IMP_SCCR_ASSIGN_EL1);
SYSREG_WRITE_S(IMP_SCCR_SET0_L2_EL1);
SYSREG_WRITE_S(IMP_SCCR_SET1_L2_EL1);
SYSREG_WRITE_S(IMP_SCCR_L1_EL0);
SYSREG_WRITE_S(IMP_PF_CTRL_EL1);
SYSREG_WRITE_S(IMP_PF_STREAM_DETECT_CTRL_EL0);
SYSREG_WRITE_S(IMP_PF_INJECTION_CTRL0_EL0);
SYSREG_WRITE_S(IMP_PF_INJECTION_CTRL1_EL0);
SYSREG_WRITE_S(IMP_PF_INJECTION_CTRL2_EL0);
SYSREG_WRITE_S(IMP_PF_INJECTION_CTRL3_EL0);
SYSREG_WRITE_S(IMP_PF_INJECTION_CTRL4_EL0);
SYSREG_WRITE_S(IMP_PF_INJECTION_CTRL5_EL0);
SYSREG_WRITE_S(IMP_PF_INJECTION_CTRL6_EL0);
SYSREG_WRITE_S(IMP_PF_INJECTION_CTRL7_EL0);
SYSREG_WRITE_S(IMP_PF_INJECTION_DISTANCE0_EL0);
SYSREG_WRITE_S(IMP_PF_INJECTION_DISTANCE1_EL0);
SYSREG_WRITE_S(IMP_PF_INJECTION_DISTANCE2_EL0);
SYSREG_WRITE_S(IMP_PF_INJECTION_DISTANCE3_EL0);
SYSREG_WRITE_S(IMP_PF_INJECTION_DISTANCE4_EL0);
SYSREG_WRITE_S(IMP_PF_INJECTION_DISTANCE5_EL0);
SYSREG_WRITE_S(IMP_PF_INJECTION_DISTANCE6_EL0);
SYSREG_WRITE_S(IMP_PF_INJECTION_DISTANCE7_EL0);
SYSREG_WRITE_S(IMP_BARRIER_CTRL_EL1);
SYSREG_WRITE_S(IMP_BARRIER_BST_BIT_EL1);
SYSREG_WRITE_S(IMP_BARRIER_INIT_SYNC_BB0_EL1);
SYSREG_WRITE_S(IMP_BARRIER_INIT_SYNC_BB1_EL1);
SYSREG_WRITE_S(IMP_BARRIER_INIT_SYNC_BB2_EL1);
SYSREG_WRITE_S(IMP_BARRIER_INIT_SYNC_BB3_EL1);
SYSREG_WRITE_S(IMP_BARRIER_INIT_SYNC_BB4_EL1);
SYSREG_WRITE_S(IMP_BARRIER_INIT_SYNC_BB5_EL1);
SYSREG_WRITE_S(IMP_BARRIER_ASSIGN_SYNC_W0_EL1);
SYSREG_WRITE_S(IMP_BARRIER_ASSIGN_SYNC_W1_EL1);
SYSREG_WRITE_S(IMP_BARRIER_ASSIGN_SYNC_W2_EL1);
SYSREG_WRITE_S(IMP_BARRIER_ASSIGN_SYNC_W3_EL1);
SYSREG_WRITE_S(IMP_FJ_CORE_UARCH_CTRL_EL2);
SYSREG_WRITE_S(IMP_FJ_CORE_UARCH_RESTRECTION_EL1);
/* fallthrough */
case IMP_SOC_STANDBY_CTRL_EL1:
asm volatile("msr_s " __stringify(IMP_SOC_STANDBY_CTRL_EL1) ", %0"
: : "r" (val) : "memory");
if (val & IMP_SOC_STANDBY_CTRL_EL1_MODE_CHANGE) {
wfe();
}
break;
default:
return -EINVAL;
}
return ret;
}
static inline int arch_cpu_write_register(struct ihk_os_cpu_register *desc)
{
int ret = 0;
if (desc->addr) {
panic("memory mapped register is not supported.\n");
} else if (desc->addr_ext) {
ret = arch_cpu_msr(desc->addr_ext, desc->val);
} else {
ret = -EINVAL;
}
return ret;
}
int arch_cpu_read_write_register(
struct ihk_os_cpu_register *desc,
enum mcctrl_os_cpu_operation op)
{
/* TODO: skeleton for patch:0676 */
int ret;
if (op == MCCTRL_OS_CPU_READ_REGISTER) {
// desc->val = rdmsr(desc->addr);
ret = arch_cpu_read_register(desc);
}
else if (op == MCCTRL_OS_CPU_WRITE_REGISTER) {
// wrmsr(desc->addr, desc->val);
ret = arch_cpu_write_register(desc);
}
else {
return -1;
ret = -1;
}
return 0;
return ret;
}
int smp_call_func(cpu_set_t *__cpu_set, smp_func_t __func, void *__arg)

View File

@ -1,4 +1,4 @@
/* cpufeature.c COPYRIGHT FUJITSU LIMITED 2017 */
/* cpufeature.c COPYRIGHT FUJITSU LIMITED 2017-2018 */
#include <cpufeature.h>
#include <ihk/debug.h>
@ -10,9 +10,7 @@
#include <ptrace.h>
#include <hwcap.h>
#ifdef POSTK_DEBUG_ARCH_DEP_65
unsigned long elf_hwcap;
#endif /* POSTK_DEBUG_ARCH_DEP_65 */
/* @ref.impl arch/arm64/kernel/cpufeature.c */
#define __ARM64_FTR_BITS(SIGNED, VISIBLE, STRICT, TYPE, SHIFT, WIDTH, SAFE_VAL) \
@ -54,6 +52,19 @@ static const struct arm64_ftr_bits ftr_id_aa64isar0[] = {
ARM64_FTR_END,
};
/* @ref.impl linux4.16.0 arch/arm64/kernel/cpufeature.c */
static const struct arm64_ftr_bits ftr_id_aa64isar1[] = {
ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE,
ID_AA64ISAR1_LRCPC_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE,
ID_AA64ISAR1_FCMA_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE,
ID_AA64ISAR1_JSCVT_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE,
ID_AA64ISAR1_DPB_SHIFT, 4, 0),
ARM64_FTR_END,
};
/* @ref.impl arch/arm64/kernel/cpufeature.c */
static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_SVE_SHIFT, 4, 0),
@ -304,7 +315,7 @@ static const struct __ftr_reg_entry {
/* Op1 = 0, CRn = 0, CRm = 6 */
ARM64_FTR_REG(SYS_ID_AA64ISAR0_EL1, ftr_id_aa64isar0),
ARM64_FTR_REG(SYS_ID_AA64ISAR1_EL1, ftr_raz),
ARM64_FTR_REG(SYS_ID_AA64ISAR1_EL1, ftr_id_aa64isar1),
/* Op1 = 0, CRn = 0, CRm = 7 */
ARM64_FTR_REG(SYS_ID_AA64MMFR0_EL1, ftr_id_aa64mmfr0),
@ -997,9 +1008,7 @@ void setup_cpu_features(void)
setup_elf_hwcaps(arm64_elf_hwcaps);
}
#ifdef POSTK_DEBUG_ARCH_DEP_65
unsigned long arch_get_hwcap(void)
{
return elf_hwcap;
}
#endif /* POSTK_DEBUG_ARCH_DEP_65 */

View File

@ -1,10 +1,11 @@
/* entry.S COPYRIGHT FUJITSU LIMITED 2015-2017 */
/* entry.S COPYRIGHT FUJITSU LIMITED 2015-2018 */
#include <linkage.h>
#include <assembler.h>
#include <asm-offsets.h>
#include <esr.h>
#include <thread_info.h>
#include <asm-syscall.h>
/*
* Bad Abort numbers
@ -77,16 +78,20 @@
.macro kernel_exit, el, need_enable_step = 0
.if \el == 0
mov x0, #0
mov x1, sp
mov x2, #0
bl check_signal // check whether the signal is delivered
bl check_sig_pending
bl check_need_resched // or reschedule is needed.
mov x0, #0
mov x1, sp
mov x2, #0
bl check_signal // check whether the signal is delivered
mov x0, #0
mov x1, sp
mov x2, #0
bl check_signal_irq_disabled // check whether the signal is delivered(for kernel_exit)
.endif
.if \el == 1
bl check_sig_pending
.endif
disable_irq x1 // disable interrupts
.if \need_enable_step == 1
ldr x1, [tsk, #TI_FLAGS]
@ -367,7 +372,12 @@ el0_sync:
b el0_inv
el0_svc:
uxtw scno, w8 // syscall number in w8
stp x0, scno, [sp, #S_ORIG_X0] // save the original x0 and syscall number
cmp scno, #__NR_rt_sigreturn
b.eq 1f
str x0, [sp, #S_ORIG_X0] // save the original x0
ldr x16, [sp, #S_PC]
str x16, [sp, #S_ORIG_PC] // save the original pc
1: str scno, [sp, #S_SYSCALLNO] // save syscall number
enable_nmi
enable_dbg_and_irq x0
adrp x16, __arm64_syscall_handler
@ -550,9 +560,7 @@ ENTRY(ret_from_fork)
blr x19
1: get_thread_info tsk
bl release_runq_lock
bl utilthr_migrate
b ret_to_user
ENDPROC(ret_from_fork)
/* TODO: skeleton for rusage */
ENTRY(__freeze)
ENDPROC(__freeze)

View File

@ -1,4 +1,4 @@
/* fault.c COPYRIGHT FUJITSU LIMITED 2015-2017 */
/* fault.c COPYRIGHT FUJITSU LIMITED 2015-2018 */
#include <ihk/context.h>
#include <ihk/debug.h>
@ -13,7 +13,6 @@
unsigned long __page_fault_handler_address;
extern int interrupt_from_user(void *);
void set_signal(int sig, void *regs, struct siginfo *info);
static void do_bad_area(unsigned long addr, unsigned int esr, struct pt_regs *regs);
static int do_page_fault(unsigned long addr, unsigned int esr, struct pt_regs *regs);
static int do_translation_fault(unsigned long addr, unsigned int esr, struct pt_regs *regs);
@ -105,12 +104,13 @@ void do_mem_abort(unsigned long addr, unsigned int esr, struct pt_regs *regs)
{
const struct fault_info *inf = fault_info + (esr & 63);
struct siginfo info;
const int from_user = interrupt_from_user(regs);
/* set_cputime called in inf->fn() */
if (!inf->fn(addr, esr, regs))
return;
set_cputime(interrupt_from_user(regs)? 1: 2);
set_cputime(from_user ? CPUTIME_MODE_U2K : CPUTIME_MODE_K2K_IN);
kprintf("Unhandled fault: %s (0x%08x) at 0x%016lx\n", inf->name, esr, addr);
info.si_signo = inf->sig;
info.si_errno = 0;
@ -118,7 +118,7 @@ void do_mem_abort(unsigned long addr, unsigned int esr, struct pt_regs *regs)
info._sifields._sigfault.si_addr = (void*)addr;
arm64_notify_die("", regs, &info, esr);
set_cputime(0);
set_cputime(from_user ? CPUTIME_MODE_K2U : CPUTIME_MODE_K2K_OUT);
}
/*
@ -127,21 +127,24 @@ void do_mem_abort(unsigned long addr, unsigned int esr, struct pt_regs *regs)
void do_sp_pc_abort(unsigned long addr, unsigned int esr, struct pt_regs *regs)
{
struct siginfo info;
const int from_user = interrupt_from_user(regs);
set_cputime(interrupt_from_user(regs)? 1: 2);
set_cputime(from_user ? CPUTIME_MODE_U2K : CPUTIME_MODE_K2K_IN);
info.si_signo = SIGBUS;
info.si_errno = 0;
info.si_code = BUS_ADRALN;
info._sifields._sigfault.si_addr = (void*)addr;
arm64_notify_die("", regs, &info, esr);
set_cputime(0);
set_cputime(from_user ? CPUTIME_MODE_K2U : CPUTIME_MODE_K2K_OUT);
}
static void do_bad_area(unsigned long addr, unsigned int esr, struct pt_regs *regs)
{
struct siginfo info;
set_cputime(interrupt_from_user(regs) ? 1: 2);
const int from_user = interrupt_from_user(regs);
set_cputime(from_user ? CPUTIME_MODE_U2K : CPUTIME_MODE_K2K_IN);
/*
* If we are in kernel mode at this point, we have no context to
* handle this fault with.
@ -163,7 +166,7 @@ static void do_bad_area(unsigned long addr, unsigned int esr, struct pt_regs *re
(addr < PAGE_SIZE) ? "NULL pointer dereference" : "paging request", addr);
panic("OOps.");
}
set_cputime(0);
set_cputime(from_user ? CPUTIME_MODE_K2U : CPUTIME_MODE_K2K_OUT);
}
static int is_el0_instruction_abort(unsigned int esr)
@ -192,6 +195,7 @@ static int do_page_fault(unsigned long addr, unsigned int esr,
}
}
/* set_cputime() call in page_fault_handler() */
page_fault_handler = (void *)__page_fault_handler_address;
(*page_fault_handler)((void *)addr, reason, regs);
@ -252,10 +256,10 @@ int do_debug_exception(unsigned long addr, unsigned int esr, struct pt_regs *reg
{
const struct fault_info *inf = debug_fault_info + DBG_ESR_EVT(esr);
struct siginfo info;
int from_user = interrupt_from_user(regs);
const int from_user = interrupt_from_user(regs);
int ret = -1;
set_cputime(from_user ? 1: 2);
set_cputime(from_user ? CPUTIME_MODE_U2K : CPUTIME_MODE_K2K_IN);
if (!inf->fn(addr, esr, regs)) {
ret = 1;
@ -274,7 +278,7 @@ int do_debug_exception(unsigned long addr, unsigned int esr, struct pt_regs *reg
ret = 0;
out:
set_cputime(0);
set_cputime(from_user ? CPUTIME_MODE_K2U : CPUTIME_MODE_K2K_OUT);
return ret;
}
@ -283,7 +287,9 @@ out:
*/
static int do_bad(unsigned long addr, unsigned int esr, struct pt_regs *regs)
{
set_cputime(interrupt_from_user(regs) ? 1: 2);
set_cputime(0);
const int from_user = interrupt_from_user(regs);
set_cputime(from_user ? CPUTIME_MODE_U2K : CPUTIME_MODE_K2K_IN);
set_cputime(from_user ? CPUTIME_MODE_K2U : CPUTIME_MODE_K2K_OUT);
return 1;
}

View File

@ -1,4 +1,4 @@
/* fpsimd.c COPYRIGHT FUJITSU LIMITED 2016-2017 */
/* fpsimd.c COPYRIGHT FUJITSU LIMITED 2016-2018 */
#include <thread_info.h>
#include <fpsimd.h>
#include <cpuinfo.h>
@ -9,20 +9,16 @@
#include <prctl.h>
#include <cpufeature.h>
#include <kmalloc.h>
#include <debug.h>
#include <process.h>
//#define DEBUG_PRINT_FPSIMD
#ifdef DEBUG_PRINT_FPSIMD
#define dkprintf kprintf
#define ekprintf kprintf
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf kprintf
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
#define BUG_ON(condition) do { if (condition) { kprintf("PANIC: %s: %s(line:%d)\n",\
__FILE__, __FUNCTION__, __LINE__); panic(""); } } while(0)
#ifdef CONFIG_ARM64_SVE
/* Maximum supported vector length across all CPUs (initially poisoned) */
@ -73,9 +69,6 @@ static int get_nr_threads(struct process *proc)
return nr_threads;
}
extern void save_fp_regs(struct thread *thread);
extern void clear_fp_regs(struct thread *thread);
extern void restore_fp_regs(struct thread *thread);
/* @ref.impl arch/arm64/kernel/fpsimd.c::sve_set_vector_length */
int sve_set_vector_length(struct thread *thread,
unsigned long vl, unsigned long flags)
@ -129,7 +122,7 @@ int sve_set_vector_length(struct thread *thread,
/* for self at prctl syscall */
if (thread == cpu_local_var(current)) {
save_fp_regs(thread);
clear_fp_regs(thread);
clear_fp_regs();
thread_sve_to_fpsimd(thread, &fp_regs);
sve_free(thread);

View File

@ -1,4 +1,4 @@
/* head.S COPYRIGHT FUJITSU LIMITED 2015-2017 */
/* head.S COPYRIGHT FUJITSU LIMITED 2015-2018 */
#include <linkage.h>
#include <ptrace.h>
@ -11,8 +11,6 @@
#include <arm-gic-v3.h>
#define KERNEL_RAM_VADDR MAP_KERNEL_START
#define EARLY_ALLOC_VADDR MAP_EARLY_ALLOC
#define BOOT_PARAM_VADDR MAP_BOOT_PARAM
//#ifndef CONFIG_SMP
//# define PTE_FLAGS PTE_TYPE_PAGE | PTE_AF
@ -50,16 +48,6 @@
add \ttb0, \ttb0, \virt_to_phys
.endm
#ifdef CONFIG_ARM64_64K_PAGES
# define BLOCK_SHIFT PAGE_SHIFT
# define BLOCK_SIZE PAGE_SIZE
# define TABLE_SHIFT PMD_SHIFT
#else
# define BLOCK_SHIFT SECTION_SHIFT
# define BLOCK_SIZE SECTION_SIZE
# define TABLE_SHIFT PUD_SHIFT
#endif
#define KERNEL_START KERNEL_RAM_VADDR
#define KERNEL_END _end
@ -87,6 +75,7 @@
#define TRAMPOLINE_DATA_CPU_MAP_SIZE_SIZE 0x08
#define TRAMPOLINE_DATA_CPU_MAP_SIZE (NR_CPUS * 8)
#define TRAMPOLINE_DATA_DATA_RDISTS_PA_SIZE (NR_CPUS * 8)
#define TRAMPOLINE_DATA_RETENTION_STATE_FLAG_PA_SIZE 0x08
#define TRAMPOLINE_DATA_NR_PMU_AFFI_SIZE 0x04
#define TRAMPOLINE_DATA_PMU_AFF_SIZE (CONFIG_SMP_MAX_CORES * 4)
@ -105,9 +94,9 @@
.globl ihk_param_gic_percpu_offset, ihk_param_gic_version
.globl ihk_param_lpj, ihk_param_hz, ihk_param_psci_method
.globl ihk_param_cpu_logical_map, ihk_param_gic_rdist_base_pa
.globl ihk_param_pmu_irq_affiniry, ihk_param_nr_pmu_irq_affiniry
.globl ihk_param_pmu_irq_affi, ihk_param_nr_pmu_irq_affi
.globl ihk_param_use_virt_timer, ihk_param_evtstrm_timer_rate
.globl ihk_param_default_vl
.globl ihk_param_retention_state_flag_pa, ihk_param_default_vl
ihk_param_head:
ihk_param_param_addr:
.quad 0
@ -145,9 +134,11 @@ ihk_param_cpu_logical_map:
.skip NR_CPUS * 8 /* array of the MPIDR and the core number */
ihk_param_gic_rdist_base_pa:
.skip NR_CPUS * 8 /* per-cpu re-distributer PA */
ihk_param_pmu_irq_affiniry:
ihk_param_retention_state_flag_pa:
.quad 0
ihk_param_pmu_irq_affi:
.skip CONFIG_SMP_MAX_CORES * 4 /* array of the pmu affinity list */
ihk_param_nr_pmu_irq_affiniry:
ihk_param_nr_pmu_irq_affi:
.word 0 /* number of pmu affinity list elements. */
/* @ref.impl arch/arm64/include/asm/kvm_arm.h */
@ -265,13 +256,17 @@ ENTRY(arch_start)
mov x16, #NR_CPUS /* calc next data */
lsl x16, x16, 3
add x0, x0, x16
/* nr_pmu_irq_affiniry */
/* retention_state_flag_pa */
ldr x16, [x0], #TRAMPOLINE_DATA_RETENTION_STATE_FLAG_PA_SIZE
adr x15, ihk_param_retention_state_flag_pa
str x16, [x15]
/* nr_pmu_irq_affi */
ldr w16, [x0], #TRAMPOLINE_DATA_NR_PMU_AFFI_SIZE
adr x15, ihk_param_nr_pmu_irq_affiniry
adr x15, ihk_param_nr_pmu_irq_affi
str w16, [x15]
/* pmu_irq_affiniry */
/* pmu_irq_affi */
mov x18, x0
adr x15, ihk_param_pmu_irq_affiniry
adr x15, ihk_param_pmu_irq_affi
b 2f
1: ldr w17, [x18], #4
str w17, [x15], #4
@ -410,14 +405,17 @@ __create_page_tables:
* Map the early_alloc_pages area, kernel_img next block
*/
ldr x3, =KERNEL_END
add x3, x3, x28 // __pa(KERNEL_END)
add x3, x3, x28 // __pa(KERNEL_END)
add x3, x3, #BLOCK_SIZE
sub x3, x3, #1
bic x3, x3, #(BLOCK_SIZE - 1) // start PA calc.
ldr x5, =EARLY_ALLOC_VADDR // get start VA
mov x6, #1
lsl x6, x6, #(PAGE_SHIFT + MAP_EARLY_ALLOC_SHIFT)
sub x3, x3, #1
bic x3, x3, #(BLOCK_SIZE - 1) // start PA calc.
ldr x5, =KERNEL_END // get start VA
add x5, x5, #BLOCK_SIZE
sub x5, x5, #1
bic x5, x5, #(BLOCK_SIZE - 1) // start VA calc.
mov x6, #MAP_EARLY_ALLOC_SIZE
add x6, x5, x6 // end VA calc
mov x23, x6 // save end VA
sub x6, x6, #1 // inclusive range
create_block_map x0, x7, x3, x5, x6
@ -425,11 +423,13 @@ __create_page_tables:
* Map the boot_param area
*/
adr x3, ihk_param_param_addr
ldr x3, [x3] // get boot_param PA
ldr x5, =BOOT_PARAM_VADDR // get boot_param VA
mov x6, #1
lsl x6, x6, #MAP_BOOT_PARAM_SHIFT
add x6, x5, x6 // end VA calc
ldr x3, [x3] // get boot_param PA
mov x5, x23 // get start VA
add x5, x5, #BLOCK_SIZE
sub x5, x5, #1
bic x5, x5, #(BLOCK_SIZE - 1) // start VA calc
mov x6, #MAP_BOOT_PARAM_SIZE
add x6, x5, x6 // end VA calc.
sub x6, x6, #1 // inclusive range
create_block_map x0, x7, x3, x5, x6

View File

@ -7,6 +7,7 @@
#include <hw_breakpoint.h>
#include <arch-memory.h>
#include <signal.h>
#include <process.h>
/* @ref.impl arch/arm64/kernel/hw_breakpoint.c::core_num_[brps|wrps] */
/* Number of BRP/WRP registers on this CPU. */

View File

@ -0,0 +1,131 @@
/* imp-sysreg.c COPYRIGHT FUJITSU LIMITED 2018 */
#include <sysreg.h>
/* hpc */
ACCESS_REG_FUNC(fj_tag_address_ctrl_el1, IMP_FJ_TAG_ADDRESS_CTRL_EL1);
ACCESS_REG_FUNC(pf_ctrl_el1, IMP_PF_CTRL_EL1);
ACCESS_REG_FUNC(pf_stream_detect_ctrl_el0, IMP_PF_STREAM_DETECT_CTRL_EL0);
ACCESS_REG_FUNC(pf_injection_ctrl0_el0, IMP_PF_INJECTION_CTRL0_EL0);
ACCESS_REG_FUNC(pf_injection_ctrl1_el0, IMP_PF_INJECTION_CTRL1_EL0);
ACCESS_REG_FUNC(pf_injection_ctrl2_el0, IMP_PF_INJECTION_CTRL2_EL0);
ACCESS_REG_FUNC(pf_injection_ctrl3_el0, IMP_PF_INJECTION_CTRL3_EL0);
ACCESS_REG_FUNC(pf_injection_ctrl4_el0, IMP_PF_INJECTION_CTRL4_EL0);
ACCESS_REG_FUNC(pf_injection_ctrl5_el0, IMP_PF_INJECTION_CTRL5_EL0);
ACCESS_REG_FUNC(pf_injection_ctrl6_el0, IMP_PF_INJECTION_CTRL6_EL0);
ACCESS_REG_FUNC(pf_injection_ctrl7_el0, IMP_PF_INJECTION_CTRL7_EL0);
ACCESS_REG_FUNC(pf_injection_distance0_el0, IMP_PF_INJECTION_DISTANCE0_EL0);
ACCESS_REG_FUNC(pf_injection_distance1_el0, IMP_PF_INJECTION_DISTANCE1_EL0);
ACCESS_REG_FUNC(pf_injection_distance2_el0, IMP_PF_INJECTION_DISTANCE2_EL0);
ACCESS_REG_FUNC(pf_injection_distance3_el0, IMP_PF_INJECTION_DISTANCE3_EL0);
ACCESS_REG_FUNC(pf_injection_distance4_el0, IMP_PF_INJECTION_DISTANCE4_EL0);
ACCESS_REG_FUNC(pf_injection_distance5_el0, IMP_PF_INJECTION_DISTANCE5_EL0);
ACCESS_REG_FUNC(pf_injection_distance6_el0, IMP_PF_INJECTION_DISTANCE6_EL0);
ACCESS_REG_FUNC(pf_injection_distance7_el0, IMP_PF_INJECTION_DISTANCE7_EL0);
static void hpc_prefetch_regs_init(void)
{
uint64_t reg = 0;
/* PF_CTRL_EL1 */
reg = IMP_PF_CTRL_EL1_EL1AE_ENABLE | IMP_PF_CTRL_EL1_EL0AE_ENABLE;
xos_access_pf_ctrl_el1(WRITE_ACCESS, &reg);
/* PF_STREAM_DETECT_CTRL */
reg = 0;
xos_access_pf_stream_detect_ctrl_el0(WRITE_ACCESS, &reg);
/* PF_INJECTION_CTRL */
reg = 0;
xos_access_pf_injection_ctrl0_el0(WRITE_ACCESS, &reg);
xos_access_pf_injection_ctrl1_el0(WRITE_ACCESS, &reg);
xos_access_pf_injection_ctrl2_el0(WRITE_ACCESS, &reg);
xos_access_pf_injection_ctrl3_el0(WRITE_ACCESS, &reg);
xos_access_pf_injection_ctrl4_el0(WRITE_ACCESS, &reg);
xos_access_pf_injection_ctrl5_el0(WRITE_ACCESS, &reg);
xos_access_pf_injection_ctrl6_el0(WRITE_ACCESS, &reg);
xos_access_pf_injection_ctrl7_el0(WRITE_ACCESS, &reg);
/* PF_INJECTION_DISTANCE */
reg = 0;
xos_access_pf_injection_distance0_el0(WRITE_ACCESS, &reg);
xos_access_pf_injection_distance1_el0(WRITE_ACCESS, &reg);
xos_access_pf_injection_distance2_el0(WRITE_ACCESS, &reg);
xos_access_pf_injection_distance3_el0(WRITE_ACCESS, &reg);
xos_access_pf_injection_distance4_el0(WRITE_ACCESS, &reg);
xos_access_pf_injection_distance5_el0(WRITE_ACCESS, &reg);
xos_access_pf_injection_distance6_el0(WRITE_ACCESS, &reg);
xos_access_pf_injection_distance7_el0(WRITE_ACCESS, &reg);
}
static void hpc_tag_address_regs_init(void)
{
uint64_t reg = IMP_FJ_TAG_ADDRESS_CTRL_EL1_TBO0_MASK |
IMP_FJ_TAG_ADDRESS_CTRL_EL1_SEC0_MASK |
IMP_FJ_TAG_ADDRESS_CTRL_EL1_PFE0_MASK;
/* FJ_TAG_ADDRESS_CTRL */
xos_access_fj_tag_address_ctrl_el1(WRITE_ACCESS, &reg);
}
void hpc_registers_init(void)
{
hpc_prefetch_regs_init();
hpc_tag_address_regs_init();
}
/* vhbm */
ACCESS_REG_FUNC(barrier_ctrl_el1, IMP_BARRIER_CTRL_EL1);
ACCESS_REG_FUNC(barrier_bst_bit_el1, IMP_BARRIER_BST_BIT_EL1);
ACCESS_REG_FUNC(barrier_init_sync_bb0_el1, IMP_BARRIER_INIT_SYNC_BB0_EL1);
ACCESS_REG_FUNC(barrier_init_sync_bb1_el1, IMP_BARRIER_INIT_SYNC_BB1_EL1);
ACCESS_REG_FUNC(barrier_init_sync_bb2_el1, IMP_BARRIER_INIT_SYNC_BB2_EL1);
ACCESS_REG_FUNC(barrier_init_sync_bb3_el1, IMP_BARRIER_INIT_SYNC_BB3_EL1);
ACCESS_REG_FUNC(barrier_init_sync_bb4_el1, IMP_BARRIER_INIT_SYNC_BB4_EL1);
ACCESS_REG_FUNC(barrier_init_sync_bb5_el1, IMP_BARRIER_INIT_SYNC_BB5_EL1);
ACCESS_REG_FUNC(barrier_assign_sync_w0_el1, IMP_BARRIER_ASSIGN_SYNC_W0_EL1);
ACCESS_REG_FUNC(barrier_assign_sync_w1_el1, IMP_BARRIER_ASSIGN_SYNC_W1_EL1);
ACCESS_REG_FUNC(barrier_assign_sync_w2_el1, IMP_BARRIER_ASSIGN_SYNC_W2_EL1);
ACCESS_REG_FUNC(barrier_assign_sync_w3_el1, IMP_BARRIER_ASSIGN_SYNC_W3_EL1);
void vhbm_barrier_registers_init(void)
{
uint64_t reg = 0;
reg = IMP_BARRIER_CTRL_EL1_EL1AE_ENABLE |
IMP_BARRIER_CTRL_EL1_EL0AE_ENABLE;
xos_access_barrier_ctrl_el1(WRITE_ACCESS, &reg);
reg = 0;
xos_access_barrier_init_sync_bb0_el1(WRITE_ACCESS, &reg);
xos_access_barrier_init_sync_bb1_el1(WRITE_ACCESS, &reg);
xos_access_barrier_init_sync_bb2_el1(WRITE_ACCESS, &reg);
xos_access_barrier_init_sync_bb3_el1(WRITE_ACCESS, &reg);
xos_access_barrier_init_sync_bb4_el1(WRITE_ACCESS, &reg);
xos_access_barrier_init_sync_bb5_el1(WRITE_ACCESS, &reg);
xos_access_barrier_assign_sync_w0_el1(WRITE_ACCESS, &reg);
xos_access_barrier_assign_sync_w1_el1(WRITE_ACCESS, &reg);
xos_access_barrier_assign_sync_w2_el1(WRITE_ACCESS, &reg);
xos_access_barrier_assign_sync_w3_el1(WRITE_ACCESS, &reg);
}
/* sccr */
ACCESS_REG_FUNC(sccr_ctrl_el1, IMP_SCCR_CTRL_EL1);
ACCESS_REG_FUNC(sccr_assign_el1, IMP_SCCR_ASSIGN_EL1);
ACCESS_REG_FUNC(sccr_set0_l2_el1, IMP_SCCR_SET0_L2_EL1);
ACCESS_REG_FUNC(sccr_l1_el0, IMP_SCCR_L1_EL0);
void scdrv_registers_init(void)
{
uint64_t reg = 0;
reg = IMP_SCCR_CTRL_EL1_EL1AE_MASK;
xos_access_sccr_ctrl_el1(WRITE_ACCESS, &reg);
reg = 0;
xos_access_sccr_assign_el1(WRITE_ACCESS, &reg);
xos_access_sccr_l1_el0(WRITE_ACCESS, &reg);
reg = (14UL << IMP_SCCR_SET0_L2_EL1_L2_SEC0_SHIFT);
xos_access_sccr_set0_l2_el1(WRITE_ACCESS, &reg);
}

View File

@ -1,4 +1,4 @@
/* arch-futex.h COPYRIGHT FUJITSU LIMITED 2015 */
/* arch-futex.h COPYRIGHT FUJITSU LIMITED 2015-2018 */
#ifndef __HEADER_ARM64_COMMON_ARCH_FUTEX_H
#define __HEADER_ARM64_COMMON_ARCH_FUTEX_H
@ -32,12 +32,13 @@
* @ref.impl
* linux-linaro/arch/arm64/include/asm/futex.h:futex_atomic_op_inuser
*/
static inline int futex_atomic_op_inuser(int encoded_op, int __user *uaddr)
static inline int futex_atomic_op_inuser(int encoded_op,
int __user *uaddr)
{
int op = (encoded_op >> 28) & 7;
int cmp = (encoded_op >> 24) & 15;
int oparg = (encoded_op << 8) >> 20;
int cmparg = (encoded_op << 20) >> 20;
int oparg = (encoded_op & 0x00fff000) >> 12;
int cmparg = encoded_op & 0xfff;
int oldval = 0, ret, tmp;
if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))

View File

@ -1,4 +1,4 @@
/* arch-lock.h COPYRIGHT FUJITSU LIMITED 2015-2017 */
/* arch-lock.h COPYRIGHT FUJITSU LIMITED 2015-2018 */
#ifndef __HEADER_ARM64_COMMON_ARCH_LOCK_H
#define __HEADER_ARM64_COMMON_ARCH_LOCK_H
@ -6,6 +6,9 @@
#include <ihk/cpu.h>
#include <ihk/atomic.h>
#include "affinity.h"
#include <lwk/compiler.h>
#include "config.h"
//#define DEBUG_SPINLOCK
//#define DEBUG_MCS_RWLOCK
@ -19,14 +22,14 @@ int __kprintf(const char *format, ...);
/* @ref.impl arch/arm64/include/asm/spinlock_types.h::arch_spinlock_t */
typedef struct {
//#ifdef __AARCH64EB__
// uint16_t next;
// uint16_t owner;
//#else /* __AARCH64EB__ */
#ifdef __AARCH64EB__
uint16_t next;
uint16_t owner;
#else /* __AARCH64EB__ */
uint16_t owner;
uint16_t next;
//#endif /* __AARCH64EB__ */
} ihk_spinlock_t;
#endif /* __AARCH64EB__ */
} __attribute__((aligned(4))) ihk_spinlock_t;
extern void preempt_enable(void);
extern void preempt_disable(void);
@ -34,14 +37,100 @@ extern void preempt_disable(void);
/* @ref.impl arch/arm64/include/asm/spinlock_types.h::__ARCH_SPIN_LOCK_UNLOCKED */
#define SPIN_LOCK_UNLOCKED { 0, 0 }
/* @ref.impl arch/arm64/include/asm/barrier.h::__nops */
#define __nops(n) ".rept " #n "\nnop\n.endr\n"
/* @ref.impl ./arch/arm64/include/asm/lse.h::ARM64_LSE_ATOMIC_INSN */
/* else defined(CONFIG_AS_LSE) && defined(CONFIG_ARM64_LSE_ATOMICS) */
#define ARM64_LSE_ATOMIC_INSN(llsc, lse) llsc
/* initialized spinlock struct */
static void ihk_mc_spinlock_init(ihk_spinlock_t *lock)
{
*lock = (ihk_spinlock_t)SPIN_LOCK_UNLOCKED;
}
/* @ref.impl arch/arm64/include/asm/spinlock.h::arch_spin_lock */
/* spinlock lock */
#ifdef DEBUG_SPINLOCK
#define ihk_mc_spinlock_trylock_noirq(l) { \
int rc; \
__kprintf("[%d] call ihk_mc_spinlock_trylock_noirq %p %s:%d\n", \
ihk_mc_get_processor_id(), (l), __FILE__, __LINE__); \
rc = __ihk_mc_spinlock_trylock_noirq(l); \
__kprintf("[%d] ret ihk_mc_spinlock_trylock_noirq\n", \
ihk_mc_get_processor_id()); \
rc; \
}
#else
#define ihk_mc_spinlock_trylock_noirq __ihk_mc_spinlock_trylock_noirq
#endif
/* @ref.impl arch/arm64/include/asm/spinlock.h::arch_spin_trylock */
/* spinlock trylock */
static int __ihk_mc_spinlock_trylock_noirq(ihk_spinlock_t *lock)
{
unsigned int tmp;
ihk_spinlock_t lockval;
int success;
preempt_disable();
asm volatile(ARM64_LSE_ATOMIC_INSN(
/* LL/SC */
" prfm pstl1strm, %2\n"
"1: ldaxr %w0, %2\n"
" eor %w1, %w0, %w0, ror #16\n"
" cbnz %w1, 2f\n"
" add %w0, %w0, %3\n"
" stxr %w1, %w0, %2\n"
" cbnz %w1, 1b\n"
"2:",
/* LSE atomics */
" ldr %w0, %2\n"
" eor %w1, %w0, %w0, ror #16\n"
" cbnz %w1, 1f\n"
" add %w1, %w0, %3\n"
" casa %w0, %w1, %2\n"
" sub %w1, %w1, %3\n"
" eor %w1, %w1, %w0\n"
"1:")
: "=&r" (lockval), "=&r" (tmp), "+Q" (*lock)
: "I" (1 << TICKET_SHIFT)
: "memory");
success = !tmp;
if (!success) {
preempt_enable();
}
return success;
}
#ifdef DEBUG_SPINLOCK
#define ihk_mc_spinlock_trylock(l, result) ({ \
unsigned long rc; \
__kprintf("[%d] call ihk_mc_spinlock_trylock %p %s:%d\n", \
ihk_mc_get_processor_id(), (l), __FILE__, __LINE__); \
rc = __ihk_mc_spinlock_trylock(l, result); \
__kprintf("[%d] ret ihk_mc_spinlock_trylock\n", \
ihk_mc_get_processor_id()); \
rc; \
})
#else
#define ihk_mc_spinlock_trylock __ihk_mc_spinlock_trylock
#endif
/* spinlock trylock & interrupt disable & PSTATE.DAIF save */
static unsigned long __ihk_mc_spinlock_trylock(ihk_spinlock_t *lock,
int *result)
{
unsigned long flags;
flags = cpu_disable_interrupt_save();
*result = __ihk_mc_spinlock_trylock_noirq(lock);
return flags;
}
#ifdef DEBUG_SPINLOCK
#define ihk_mc_spinlock_lock_noirq(l) { \
__kprintf("[%d] call ihk_mc_spinlock_lock_noirq %p %s:%d\n", ihk_mc_get_processor_id(), (l), __FILE__, __LINE__); \
@ -52,6 +141,8 @@ __kprintf("[%d] ret ihk_mc_spinlock_lock_noirq\n", ihk_mc_get_processor_id()); \
#define ihk_mc_spinlock_lock_noirq __ihk_mc_spinlock_lock_noirq
#endif
/* @ref.impl arch/arm64/include/asm/spinlock.h::arch_spin_lock */
/* spinlock lock */
static void __ihk_mc_spinlock_lock_noirq(ihk_spinlock_t *lock)
{
unsigned int tmp;
@ -61,11 +152,19 @@ static void __ihk_mc_spinlock_lock_noirq(ihk_spinlock_t *lock)
asm volatile(
/* Atomically increment the next ticket. */
ARM64_LSE_ATOMIC_INSN(
/* LL/SC */
" prfm pstl1strm, %3\n"
"1: ldaxr %w0, %3\n"
" add %w1, %w0, %w5\n"
" stxr %w2, %w1, %3\n"
" cbnz %w2, 1b\n"
" cbnz %w2, 1b\n",
/* LSE atomics */
" mov %w2, %w5\n"
" ldadda %w2, %w0, %3\n"
__nops(3)
)
/* Did we get the lock? */
" eor %w1, %w0, %w0, ror #16\n"
" cbz %w1, 3f\n"
@ -85,7 +184,6 @@ static void __ihk_mc_spinlock_lock_noirq(ihk_spinlock_t *lock)
: "memory");
}
/* spinlock lock & interrupt disable & PSTATE.DAIF save */
#ifdef DEBUG_SPINLOCK
#define ihk_mc_spinlock_lock(l) ({ unsigned long rc;\
__kprintf("[%d] call ihk_mc_spinlock_lock %p %s:%d\n", ihk_mc_get_processor_id(), (l), __FILE__, __LINE__); \
@ -95,6 +193,8 @@ __kprintf("[%d] ret ihk_mc_spinlock_lock\n", ihk_mc_get_processor_id()); rc;\
#else
#define ihk_mc_spinlock_lock __ihk_mc_spinlock_lock
#endif
/* spinlock lock & interrupt disable & PSTATE.DAIF save */
static unsigned long __ihk_mc_spinlock_lock(ihk_spinlock_t *lock)
{
unsigned long flags;
@ -106,8 +206,6 @@ static unsigned long __ihk_mc_spinlock_lock(ihk_spinlock_t *lock)
return flags;
}
/* @ref.impl arch/arm64/include/asm/spinlock.h::arch_spin_unlock */
/* spinlock unlock */
#ifdef DEBUG_SPINLOCK
#define ihk_mc_spinlock_unlock_noirq(l) { \
__kprintf("[%d] call ihk_mc_spinlock_unlock_noirq %p %s:%d\n", ihk_mc_get_processor_id(), (l), __FILE__, __LINE__); \
@ -117,12 +215,24 @@ __kprintf("[%d] ret ihk_mc_spinlock_unlock_noirq\n", ihk_mc_get_processor_id());
#else
#define ihk_mc_spinlock_unlock_noirq __ihk_mc_spinlock_unlock_noirq
#endif
/* @ref.impl arch/arm64/include/asm/spinlock.h::arch_spin_unlock */
/* spinlock unlock */
static void __ihk_mc_spinlock_unlock_noirq(ihk_spinlock_t *lock)
{
asm volatile(
" stlrh %w1, %0\n"
: "=Q" (lock->owner)
: "r" (lock->owner + 1)
unsigned long tmp;
asm volatile(ARM64_LSE_ATOMIC_INSN(
/* LL/SC */
" ldrh %w1, %0\n"
" add %w1, %w1, #1\n"
" stlrh %w1, %0",
/* LSE atomics */
" mov %w1, #1\n"
" staddlh %w1, %0\n"
__nops(1))
: "=Q" (lock->owner), "=&r" (tmp)
:
: "memory");
preempt_enable();
@ -150,7 +260,13 @@ typedef struct mcs_lock_node {
unsigned long locked;
struct mcs_lock_node *next;
unsigned long irqsave;
} __attribute__((aligned(64))) mcs_lock_node_t;
#ifndef ENABLE_UBSAN
} __aligned(64) mcs_lock_node_t;
#else
} mcs_lock_node_t;
#endif
typedef mcs_lock_node_t mcs_lock_t;
static void mcs_lock_init(struct mcs_lock_node *node)
{
@ -238,14 +354,22 @@ typedef struct mcs_rwlock_node {
char dmy1; // unused
char dmy2; // unused
struct mcs_rwlock_node *next;
} __attribute__((aligned(64))) mcs_rwlock_node_t;
#ifndef ENABLE_UBSAN
} __aligned(64) mcs_rwlock_node_t;
#else
} mcs_rwlock_node_t;
#endif
typedef struct mcs_rwlock_node_irqsave {
#ifndef SPINLOCK_IN_MCS_RWLOCK
struct mcs_rwlock_node node;
#endif
unsigned long irqsave;
} __attribute__((aligned(64))) mcs_rwlock_node_irqsave_t;
#ifndef ENABLE_UBSAN
} __aligned(64) mcs_rwlock_node_irqsave_t;
#else
} mcs_rwlock_node_irqsave_t;
#endif
typedef struct mcs_rwlock_lock {
#ifdef SPINLOCK_IN_MCS_RWLOCK
@ -254,7 +378,11 @@ typedef struct mcs_rwlock_lock {
struct mcs_rwlock_node reader; /* common reader lock */
struct mcs_rwlock_node *node; /* base */
#endif
} __attribute__((aligned(64))) mcs_rwlock_lock_t;
#ifndef ENABLE_UBSAN
} __aligned(64) mcs_rwlock_lock_t;
#else
} mcs_rwlock_lock_t;
#endif
static void
mcs_rwlock_init(struct mcs_rwlock_lock *lock)
@ -602,4 +730,18 @@ __mcs_rwlock_reader_unlock(struct mcs_rwlock_lock *lock, struct mcs_rwlock_node_
#endif
}
#if defined(CONFIG_HAS_NMI)
#include <arm-gic-v3.h>
static inline int irqflags_can_interrupt(unsigned long flags)
{
return (flags == ICC_PMR_EL1_UNMASKED);
}
#else /* CONFIG_HAS_NMI */
static inline int irqflags_can_interrupt(unsigned long flags)
{
return !(flags & 0x2);
}
#endif /* CONFIG_HAS_NMI */
#endif /* !__HEADER_ARM64_COMMON_ARCH_LOCK_H */

View File

@ -1,96 +1,110 @@
/* arch-memory.h COPYRIGHT FUJITSU LIMITED 2015-2017 */
/* arch-memory.h COPYRIGHT FUJITSU LIMITED 2015-2018 */
#ifndef __HEADER_ARM64_COMMON_ARCH_MEMORY_H
#define __HEADER_ARM64_COMMON_ARCH_MEMORY_H
#include <const.h>
#include <errno.h>
#ifndef __ASSEMBLY__
#include <list.h>
#include <page.h>
void panic(const char *);
#endif /*__ASSEMBLY__*/
#define _SZ4KB (1UL<<12)
#define _SZ16KB (1UL<<14)
#define _SZ64KB (1UL<<16)
#ifdef CONFIG_ARM64_64K_PAGES
# define GRANULE_SIZE _SZ64KB
# define GRANULE_SIZE _SZ64KB
# define BLOCK_SHIFT PAGE_SHIFT
# define BLOCK_SIZE PAGE_SIZE
# define TABLE_SHIFT PMD_SHIFT
#else
# define GRANULE_SIZE _SZ4KB
# define GRANULE_SIZE _SZ4KB
# define BLOCK_SHIFT SECTION_SHIFT
# define BLOCK_SIZE SECTION_SIZE
# define TABLE_SHIFT PUD_SHIFT
#endif
#define VA_BITS CONFIG_ARM64_VA_BITS
/*
* Address define
*/
#define MAP_KERNEL_SHIFT 21
#define MAP_KERNEL_SIZE (UL(1) << MAP_KERNEL_SHIFT)
/* early alloc area address */
/* START:_end, SIZE:512 pages */
#define MAP_EARLY_ALLOC_SHIFT 9
#define MAP_EARLY_ALLOC_SIZE (UL(1) << (PAGE_SHIFT + MAP_EARLY_ALLOC_SHIFT))
#ifndef __ASSEMBLY__
# define ALIGN_UP(x, align) ALIGN_DOWN((x) + (align) - 1, align)
# define ALIGN_DOWN(x, align) ((x) & ~((align) - 1))
extern char _end[];
# define MAP_EARLY_ALLOC (ALIGN_UP((unsigned long)_end, BLOCK_SIZE))
# define MAP_EARLY_ALLOC_END (MAP_EARLY_ALLOC + MAP_EARLY_ALLOC_SIZE)
#endif /* !__ASSEMBLY__ */
/* bootparam area address */
/* START:early alloc area end, SIZE:2MiB */
#define MAP_BOOT_PARAM_SHIFT 21
#define MAP_BOOT_PARAM_SIZE (UL(1) << MAP_BOOT_PARAM_SHIFT)
#ifndef __ASSEMBLY__
# define MAP_BOOT_PARAM (ALIGN_UP(MAP_EARLY_ALLOC_END, BLOCK_SIZE))
# define MAP_BOOT_PARAM_END (MAP_BOOT_PARAM + MAP_BOOT_PARAM_SIZE)
#endif /* !__ASSEMBLY__ */
#if (VA_BITS == 39 && GRANULE_SIZE == _SZ4KB)
#
# define LD_TASK_UNMAPPED_BASE UL(0x0000000400000000)
# define TASK_UNMAPPED_BASE UL(0x0000000800000000)
# define USER_END UL(0x0000002000000000)
# define MAP_VMAP_START UL(0xffffffbdc0000000)
# define MAP_VMAP_SIZE UL(0x0000000100000000)
# define MAP_FIXED_START UL(0xffffffbffbdfd000)
# define MAP_ST_START UL(0xffffffc000000000)
# define MAP_KERNEL_START UL(0xffffffffff800000) // 0xffff_ffff_ff80_0000
# define MAP_ST_SIZE (MAP_KERNEL_START - MAP_ST_START) // 0x0000_003f_ff80_0000
# define MAP_EARLY_ALLOC (MAP_KERNEL_START + MAP_KERNEL_SIZE) // 0xffff_ffff_ffa0_0000
# define MAP_EARLY_ALLOC_END (MAP_EARLY_ALLOC + MAP_EARLY_ALLOC_SIZE)
# define MAP_BOOT_PARAM (MAP_EARLY_ALLOC_END) // 0xffff_ffff_ffc0_0000
# define MAP_BOOT_PARAM_END (MAP_BOOT_PARAM + MAP_BOOT_PARAM_SIZE) // 0xffff_ffff_ffe0_0000
# define MAP_KERNEL_START UL(0xffffffffff800000)
#
#elif (VA_BITS == 42 && GRANULE_SIZE == _SZ64KB)
#
# define LD_TASK_UNMAPPED_BASE UL(0x0000002000000000)
# define TASK_UNMAPPED_BASE UL(0x0000004000000000)
# define USER_END UL(0x0000010000000000)
# define MAP_VMAP_START UL(0xfffffdfee0000000)
# define MAP_VMAP_SIZE UL(0x0000000100000000)
# define MAP_FIXED_START UL(0xfffffdfffbdd0000)
# define MAP_ST_START UL(0xfffffe0000000000)
# define MAP_KERNEL_START UL(0xffffffffe0000000) // 0xffff_ffff_e000_0000
# define MAP_ST_SIZE (MAP_KERNEL_START - MAP_ST_START) // 0x0000_01ff_e000_0000
# define MAP_EARLY_ALLOC (MAP_KERNEL_START + MAP_KERNEL_SIZE) // 0xffff_ffff_e020_0000
# define MAP_EARLY_ALLOC_END (MAP_EARLY_ALLOC + MAP_EARLY_ALLOC_SIZE)
# define MAP_BOOT_PARAM (MAP_EARLY_ALLOC_END) // 0xffff_ffff_e220_0000
# define MAP_BOOT_PARAM_END (MAP_BOOT_PARAM + MAP_BOOT_PARAM_SIZE) // 0xffff_ffff_e240_0000
# define MAP_KERNEL_START UL(0xffffffffe0000000)
#
#elif (VA_BITS == 48 && GRANULE_SIZE == _SZ4KB)
#
# define LD_TASK_UNMAPPED_BASE UL(0x0000080000000000)
# define TASK_UNMAPPED_BASE UL(0x0000100000000000)
# define USER_END UL(0x0000400000000000)
# define MAP_VMAP_START UL(0xffff7bffc0000000)
# define MAP_VMAP_SIZE UL(0x0000000100000000)
# define MAP_FIXED_START UL(0xffff7ffffbdfd000)
# define MAP_ST_START UL(0xffff800000000000)
# define MAP_KERNEL_START UL(0xffffffffff800000) // 0xffff_ffff_ff80_0000
# define MAP_ST_SIZE (MAP_KERNEL_START - MAP_ST_START) // 0x0000_7fff_ff80_0000
# define MAP_EARLY_ALLOC (MAP_KERNEL_START + MAP_KERNEL_SIZE) // 0xffff_ffff_ffa0_0000
# define MAP_EARLY_ALLOC_END (MAP_EARLY_ALLOC + MAP_EARLY_ALLOC_SIZE)
# define MAP_BOOT_PARAM (MAP_EARLY_ALLOC_END) // 0xffff_ffff_ffc0_0000
# define MAP_BOOT_PARAM_END (MAP_BOOT_PARAM + MAP_BOOT_PARAM_SIZE) // 0xffff_ffff_ffe0_0000
#
# define MAP_KERNEL_START UL(0xffffffffff800000)
#
#elif (VA_BITS == 48 && GRANULE_SIZE == _SZ64KB)
#
# define LD_TASK_UNMAPPED_BASE UL(0x0000080000000000)
# define TASK_UNMAPPED_BASE UL(0x0000100000000000)
# define USER_END UL(0x0000400000000000)
# define MAP_VMAP_START UL(0xffff780000000000)
# define MAP_VMAP_SIZE UL(0x0000000100000000)
# define MAP_FIXED_START UL(0xffff7ffffbdd0000)
# define MAP_ST_START UL(0xffff800000000000)
# define MAP_KERNEL_START UL(0xffffffffe0000000) // 0xffff_ffff_e000_0000
# define MAP_ST_SIZE (MAP_KERNEL_START - MAP_ST_START) // 0x0000_7fff_e000_0000
# define MAP_EARLY_ALLOC (MAP_KERNEL_START + MAP_KERNEL_SIZE) // 0xffff_ffff_e020_0000
# define MAP_EARLY_ALLOC_END (MAP_EARLY_ALLOC + MAP_EARLY_ALLOC_SIZE)
# define MAP_BOOT_PARAM (MAP_EARLY_ALLOC_END) // 0xffff_ffff_e220_0000
# define MAP_BOOT_PARAM_END (MAP_BOOT_PARAM + MAP_BOOT_PARAM_SIZE) // 0xffff_ffff_e240_0000
# define MAP_KERNEL_START UL(0xffffffffe0000000)
#
#else
# error address space is not defined.
#endif
#define STACK_TOP(region) ((region)->user_end)
#define MAP_ST_SIZE (MAP_KERNEL_START - MAP_ST_START)
#define STACK_TOP(region) ((region)->user_end)
/*
* pagetable define
@ -104,7 +118,10 @@
# define PTL3_INDEX_MASK PTL4_INDEX_MASK
# define PTL2_INDEX_MASK PTL3_INDEX_MASK
# define PTL1_INDEX_MASK PTL2_INDEX_MASK
# define FIRST_LEVEL_BLOCK_SUPPORT 1
# define __PTL4_CONT_SHIFT (__PTL4_SHIFT + 0)
# define __PTL3_CONT_SHIFT (__PTL3_SHIFT + 4)
# define __PTL2_CONT_SHIFT (__PTL2_SHIFT + 4)
# define __PTL1_CONT_SHIFT (__PTL1_SHIFT + 4)
#elif GRANULE_SIZE == _SZ16KB
# define __PTL4_SHIFT 47
# define __PTL3_SHIFT 36
@ -114,9 +131,12 @@
# define PTL3_INDEX_MASK ((UL(1) << 11) - 1)
# define PTL2_INDEX_MASK PTL3_INDEX_MASK
# define PTL1_INDEX_MASK PTL2_INDEX_MASK
# define FIRST_LEVEL_BLOCK_SUPPORT 0
# define __PTL4_CONT_SHIFT (__PTL4_SHIFT + 0)
# define __PTL3_CONT_SHIFT (__PTL3_SHIFT + 0)
# define __PTL2_CONT_SHIFT (__PTL2_SHIFT + 5)
# define __PTL1_CONT_SHIFT (__PTL1_SHIFT + 7)
#elif GRANULE_SIZE == _SZ64KB
# define __PTL4_SHIFT 0
# define __PTL4_SHIFT 55
# define __PTL3_SHIFT 42
# define __PTL2_SHIFT 29
# define __PTL1_SHIFT 16
@ -124,19 +144,39 @@
# define PTL3_INDEX_MASK ((UL(1) << 6) - 1)
# define PTL2_INDEX_MASK ((UL(1) << 13) - 1)
# define PTL1_INDEX_MASK PTL2_INDEX_MASK
# define FIRST_LEVEL_BLOCK_SUPPORT 0
# define __PTL4_CONT_SHIFT (__PTL4_SHIFT + 0)
# define __PTL3_CONT_SHIFT (__PTL3_SHIFT + 0)
# define __PTL2_CONT_SHIFT (__PTL2_SHIFT + 5)
# define __PTL1_CONT_SHIFT (__PTL1_SHIFT + 5)
#else
# error granule size error.
#endif
#ifndef __ASSEMBLY__
extern int first_level_block_support;
#endif /* __ASSEMBLY__ */
# define __PTL4_SIZE (UL(1) << __PTL4_SHIFT)
# define __PTL3_SIZE (UL(1) << __PTL3_SHIFT)
# define __PTL2_SIZE (UL(1) << __PTL2_SHIFT)
# define __PTL1_SIZE (UL(1) << __PTL1_SHIFT)
# define __PTL4_MASK (~__PTL4_SIZE - 1)
# define __PTL3_MASK (~__PTL3_SIZE - 1)
# define __PTL2_MASK (~__PTL2_SIZE - 1)
# define __PTL1_MASK (~__PTL1_SIZE - 1)
# define __PTL4_MASK (~(__PTL4_SIZE - 1))
# define __PTL3_MASK (~(__PTL3_SIZE - 1))
# define __PTL2_MASK (~(__PTL2_SIZE - 1))
# define __PTL1_MASK (~(__PTL1_SIZE - 1))
# define __PTL4_CONT_SIZE (UL(1) << __PTL4_CONT_SHIFT)
# define __PTL3_CONT_SIZE (UL(1) << __PTL3_CONT_SHIFT)
# define __PTL2_CONT_SIZE (UL(1) << __PTL2_CONT_SHIFT)
# define __PTL1_CONT_SIZE (UL(1) << __PTL1_CONT_SHIFT)
# define __PTL4_CONT_MASK (~(__PTL4_CONT_SIZE - 1))
# define __PTL3_CONT_MASK (~(__PTL3_CONT_SIZE - 1))
# define __PTL2_CONT_MASK (~(__PTL2_CONT_SIZE - 1))
# define __PTL1_CONT_MASK (~(__PTL1_CONT_SIZE - 1))
# define __PTL4_CONT_COUNT (UL(1) << (__PTL4_CONT_SHIFT - __PTL4_SHIFT))
# define __PTL3_CONT_COUNT (UL(1) << (__PTL3_CONT_SHIFT - __PTL3_SHIFT))
# define __PTL2_CONT_COUNT (UL(1) << (__PTL2_CONT_SHIFT - __PTL2_SHIFT))
# define __PTL1_CONT_COUNT (UL(1) << (__PTL1_CONT_SHIFT - __PTL1_SHIFT))
/* calculate entries */
#if (CONFIG_ARM64_PGTABLE_LEVELS > 3) && (VA_BITS > __PTL4_SHIFT)
@ -183,6 +223,22 @@ static const unsigned int PTL4_ENTRIES = __PTL4_ENTRIES;
static const unsigned int PTL3_ENTRIES = __PTL3_ENTRIES;
static const unsigned int PTL2_ENTRIES = __PTL2_ENTRIES;
static const unsigned int PTL1_ENTRIES = __PTL1_ENTRIES;
static const unsigned int PTL4_CONT_SHIFT = __PTL4_CONT_SHIFT;
static const unsigned int PTL3_CONT_SHIFT = __PTL3_CONT_SHIFT;
static const unsigned int PTL2_CONT_SHIFT = __PTL2_CONT_SHIFT;
static const unsigned int PTL1_CONT_SHIFT = __PTL1_CONT_SHIFT;
static const unsigned long PTL4_CONT_SIZE = __PTL4_CONT_SIZE;
static const unsigned long PTL3_CONT_SIZE = __PTL3_CONT_SIZE;
static const unsigned long PTL2_CONT_SIZE = __PTL2_CONT_SIZE;
static const unsigned long PTL1_CONT_SIZE = __PTL1_CONT_SIZE;
static const unsigned long PTL4_CONT_MASK = __PTL4_CONT_MASK;
static const unsigned long PTL3_CONT_MASK = __PTL3_CONT_MASK;
static const unsigned long PTL2_CONT_MASK = __PTL2_CONT_MASK;
static const unsigned long PTL1_CONT_MASK = __PTL1_CONT_MASK;
static const unsigned int PTL4_CONT_COUNT = __PTL4_CONT_COUNT;
static const unsigned int PTL3_CONT_COUNT = __PTL3_CONT_COUNT;
static const unsigned int PTL2_CONT_COUNT = __PTL2_CONT_COUNT;
static const unsigned int PTL1_CONT_COUNT = __PTL1_CONT_COUNT;
#else
# define PTL4_SHIFT __PTL4_SHIFT
# define PTL3_SHIFT __PTL3_SHIFT
@ -200,8 +256,26 @@ static const unsigned int PTL1_ENTRIES = __PTL1_ENTRIES;
# define PTL3_ENTRIES __PTL3_ENTRIES
# define PTL2_ENTRIES __PTL2_ENTRIES
# define PTL1_ENTRIES __PTL1_ENTRIES
# define PTL4_CONT_SHIFT __PTL4_CONT_SHIFT
# define PTL3_CONT_SHIFT __PTL3_CONT_SHIFT
# define PTL2_CONT_SHIFT __PTL2_CONT_SHIFT
# define PTL1_CONT_SHIFT __PTL1_CONT_SHIFT
# define PTL4_CONT_SIZE __PTL4_CONT_SIZE
# define PTL3_CONT_SIZE __PTL3_CONT_SIZE
# define PTL2_CONT_SIZE __PTL2_CONT_SIZE
# define PTL1_CONT_SIZE __PTL1_CONT_SIZE
# define PTL4_CONT_MASK __PTL4_CONT_MASK
# define PTL3_CONT_MASK __PTL3_CONT_MASK
# define PTL2_CONT_MASK __PTL2_CONT_MASK
# define PTL1_CONT_MASK __PTL1_CONT_MASK
# define PTL4_CONT_COUNT __PTL4_CONT_COUNT
# define PTL3_CONT_COUNT __PTL3_CONT_COUNT
# define PTL2_CONT_COUNT __PTL2_CONT_COUNT
# define PTL1_CONT_COUNT __PTL1_CONT_COUNT
#endif/*__ASSEMBLY__*/
#define __page_size(pgshift) (UL(1) << (pgshift))
#define __page_mask(pgsize) (~((pgsize) - 1))
#define __page_offset(addr, size) ((unsigned long)(addr) & ((size) - 1))
#define __page_align(addr, size) ((unsigned long)(addr) & ~((size) - 1))
#define __page_align_up(addr, size) __page_align((unsigned long)(addr) + (size) - 1, size)
@ -210,8 +284,8 @@ static const unsigned int PTL1_ENTRIES = __PTL1_ENTRIES;
* nornal page
*/
#define PAGE_SHIFT __PTL1_SHIFT
#define PAGE_SIZE (UL(1) << __PTL1_SHIFT)
#define PAGE_MASK (~(PTL1_SIZE - 1))
#define PAGE_SIZE __page_size(PAGE_SHIFT)
#define PAGE_MASK __page_mask(PAGE_SIZE)
#define PAGE_P2ALIGN 0
#define page_offset(addr) __page_offset(addr, PAGE_SIZE)
#define page_align(addr) __page_align(addr, PAGE_SIZE)
@ -221,8 +295,8 @@ static const unsigned int PTL1_ENTRIES = __PTL1_ENTRIES;
* large page
*/
#define LARGE_PAGE_SHIFT __PTL2_SHIFT
#define LARGE_PAGE_SIZE (UL(1) << __PTL2_SHIFT)
#define LARGE_PAGE_MASK (~(PTL2_SIZE - 1))
#define LARGE_PAGE_SIZE __page_size(LARGE_PAGE_SHIFT)
#define LARGE_PAGE_MASK __page_mask(LARGE_PAGE_SIZE)
#define LARGE_PAGE_P2ALIGN (LARGE_PAGE_SHIFT - PAGE_SHIFT)
#define large_page_offset(addr) __page_offset(addr, LARGE_PAGE_SIZE)
#define large_page_align(addr) __page_align(addr, LARGE_PAGE_SIZE)
@ -263,6 +337,18 @@ static const unsigned int PTL1_ENTRIES = __PTL1_ENTRIES;
#define PTE_FILEOFF PTE_SPECIAL
#ifdef CONFIG_ARM64_64K_PAGES
# define USER_STACK_PREPAGE_SIZE PAGE_SIZE
# define USER_STACK_PAGE_MASK PAGE_MASK
# define USER_STACK_PAGE_P2ALIGN PAGE_P2ALIGN
# define USER_STACK_PAGE_SHIFT PAGE_SHIFT
#else
# define USER_STACK_PREPAGE_SIZE LARGE_PAGE_SIZE
# define USER_STACK_PAGE_MASK LARGE_PAGE_MASK
# define USER_STACK_PAGE_P2ALIGN LARGE_PAGE_P2ALIGN
# define USER_STACK_PAGE_SHIFT LARGE_PAGE_SHIFT
#endif
#define PT_ENTRIES (PAGE_SIZE >> 3)
#ifndef __ASSEMBLY__
@ -312,6 +398,8 @@ enum ihk_mc_pt_attribute {
PTATTR_FOR_USER = UL(1) << (PHYS_MASK_SHIFT - 1),
/* WriteCombine */
PTATTR_WRITE_COMBINED = PTE_ATTRINDX(2),
/* converted flag */
ARCH_PTATTR_FLIPPED = PTE_PROT_NONE,
};
extern enum ihk_mc_pt_attribute attr_mask;
@ -323,18 +411,23 @@ static inline int pfn_is_write_combined(uintptr_t pfn)
//共通部と意味がするビット定義
#define attr_flip_bits (PTATTR_WRITABLE | PTATTR_LARGEPAGE)
static inline int pgsize_to_tbllv(size_t pgsize);
static inline int pte_is_type_page(const pte_t *ptep, size_t pgsize)
{
int ret = 0; //default D_TABLE
if ((PTL4_SIZE == pgsize && CONFIG_ARM64_PGTABLE_LEVELS > 3) ||
(PTL3_SIZE == pgsize && CONFIG_ARM64_PGTABLE_LEVELS > 2) ||
(PTL2_SIZE == pgsize)) {
int level = pgsize_to_tbllv(pgsize);
switch (level) {
case 4:
case 3:
case 2:
// check D_BLOCK
ret = ((*ptep & PMD_TYPE_MASK) == PMD_TYPE_SECT);
}
else if (PTL1_SIZE == pgsize) {
break;
case 1:
// check D_PAGE
ret = ((*ptep & PTE_TYPE_MASK) == PTE_TYPE_PAGE);
break;
}
return ret;
}
@ -413,21 +506,18 @@ static inline enum ihk_mc_pt_attribute pte_get_attr(pte_t *ptep, size_t pgsize)
static inline void pte_make_null(pte_t *ptep, size_t pgsize)
{
if ((PTL4_SIZE == pgsize && CONFIG_ARM64_PGTABLE_LEVELS > 3) ||
(PTL3_SIZE == pgsize && CONFIG_ARM64_PGTABLE_LEVELS > 2) ||
(PTL2_SIZE == pgsize) ||
(PTL1_SIZE == pgsize)) {
*ptep = PTE_NULL;
}
*ptep = PTE_NULL;
}
static inline void pte_make_fileoff(off_t off,
enum ihk_mc_pt_attribute ptattr, size_t pgsize, pte_t *ptep)
{
if ((PTL4_SIZE == pgsize && CONFIG_ARM64_PGTABLE_LEVELS > 3) ||
(PTL3_SIZE == pgsize && CONFIG_ARM64_PGTABLE_LEVELS > 2) ||
(PTL2_SIZE == pgsize) ||
(PTL1_SIZE == pgsize)) {
if (((PTL4_SIZE == pgsize || PTL4_CONT_SIZE == pgsize)
&& CONFIG_ARM64_PGTABLE_LEVELS > 3) ||
((PTL3_SIZE == pgsize || PTL3_CONT_SIZE == pgsize)
&& CONFIG_ARM64_PGTABLE_LEVELS > 2) ||
(PTL2_SIZE == pgsize || PTL2_CONT_SIZE == pgsize) ||
(PTL1_SIZE == pgsize || PTL1_CONT_SIZE == pgsize)) {
*ptep = PTE_FILEOFF | off | PTE_TYPE_PAGE;
}
}
@ -457,7 +547,260 @@ static inline void pte_set_dirty(pte_t *ptep, size_t pgsize)
}
}
static inline int pte_is_contiguous(const pte_t *ptep)
{
return !!(*ptep & PTE_CONT);
}
static inline int pgsize_is_contiguous(size_t pgsize)
{
int ret = 0;
if ((pgsize == PTL4_CONT_SIZE && CONFIG_ARM64_PGTABLE_LEVELS > 3) ||
(pgsize == PTL3_CONT_SIZE && CONFIG_ARM64_PGTABLE_LEVELS > 2) ||
(pgsize == PTL2_CONT_SIZE) ||
(pgsize == PTL1_CONT_SIZE)) {
ret = 1;
}
return ret;
}
static inline int pgsize_to_tbllv(size_t pgsize)
{
int level = -EINVAL;
if ((pgsize == PTL4_CONT_SIZE || pgsize == PTL4_SIZE)
&& (CONFIG_ARM64_PGTABLE_LEVELS > 3)) {
level = 4;
} else if ((pgsize == PTL3_CONT_SIZE || pgsize == PTL3_SIZE)
&& (CONFIG_ARM64_PGTABLE_LEVELS > 2)) {
level = 3;
} else if (pgsize == PTL2_CONT_SIZE || pgsize == PTL2_SIZE) {
level = 2;
} else if (pgsize == PTL1_CONT_SIZE || pgsize == PTL1_SIZE) {
level = 1;
}
return level;
}
static inline size_t tbllv_to_pgsize(int level)
{
size_t pgsize = 0;
switch (level) {
case 4:
if (CONFIG_ARM64_PGTABLE_LEVELS > 3) {
pgsize = PTL4_SIZE;
} else {
panic("page table level 4 is invalid.");
}
break;
case 3:
if (CONFIG_ARM64_PGTABLE_LEVELS > 2) {
pgsize = PTL3_SIZE;
} else {
panic("page table level 3 is invalid.");
}
break;
case 2:
pgsize = PTL2_SIZE;
break;
case 1:
pgsize = PTL1_SIZE;
break;
default:
panic("page table level is invalid.");
}
return pgsize;
}
static inline size_t tbllv_to_contpgsize(int level)
{
size_t pgsize = 0;
switch (level) {
case 4:
if (CONFIG_ARM64_PGTABLE_LEVELS > 3) {
pgsize = PTL4_CONT_SIZE;
} else {
panic("page table level 4 is invalid.");
}
break;
case 3:
if (CONFIG_ARM64_PGTABLE_LEVELS > 2) {
pgsize = PTL3_CONT_SIZE;
} else {
panic("page table level 3 is invalid.");
}
break;
case 2:
pgsize = PTL2_CONT_SIZE;
break;
case 1:
pgsize = PTL1_CONT_SIZE;
break;
default:
panic("page table level is invalid.");
}
return pgsize;
}
static inline int tbllv_to_contpgshift(int level)
{
int ret = 0;
switch (level) {
case 4:
if (CONFIG_ARM64_PGTABLE_LEVELS > 3) {
ret = PTL4_CONT_SHIFT;
} else {
panic("page table level 4 is invalid.");
}
break;
case 3:
if (CONFIG_ARM64_PGTABLE_LEVELS > 2) {
ret = PTL3_CONT_SHIFT;
} else {
panic("page table level 3 is invalid.");
}
break;
case 2:
ret = PTL2_CONT_SHIFT;
break;
case 1:
ret = PTL1_CONT_SHIFT;
break;
default:
panic("page table level is invalid.");
}
return ret;
}
static inline pte_t *get_contiguous_head(pte_t *__ptep, size_t __pgsize)
{
unsigned long align;
int shift = 0;
switch (pgsize_to_tbllv(__pgsize)) {
case 4:
if (CONFIG_ARM64_PGTABLE_LEVELS > 3) {
shift = PTL4_CONT_SHIFT - PTL4_SHIFT;
} else {
panic("page table level 4 is invalid.");
}
break;
case 3:
if (CONFIG_ARM64_PGTABLE_LEVELS > 2) {
shift = PTL3_CONT_SHIFT - PTL3_SHIFT;
} else {
panic("page table level 3 is invalid.");
}
break;
case 2:
shift = PTL2_CONT_SHIFT - PTL2_SHIFT;
break;
case 1:
shift = PTL1_CONT_SHIFT - PTL1_SHIFT;
break;
default:
panic("page table level is invalid.");
}
align = sizeof(*__ptep) << shift;
return (pte_t *)__page_align(__ptep, align);
}
static inline pte_t *get_contiguous_tail(pte_t *__ptep, size_t __pgsize)
{
unsigned long align;
int shift = 0;
switch (pgsize_to_tbllv(__pgsize)) {
case 4:
if (CONFIG_ARM64_PGTABLE_LEVELS > 3) {
shift = PTL4_CONT_SHIFT - PTL4_SHIFT;
} else {
panic("page table level 4 is invalid.");
}
break;
case 3:
if (CONFIG_ARM64_PGTABLE_LEVELS > 2) {
shift = PTL3_CONT_SHIFT - PTL3_SHIFT;
} else {
panic("page table level 3 is invalid.");
}
break;
case 2:
shift = PTL2_CONT_SHIFT - PTL2_SHIFT;
break;
case 1:
shift = PTL1_CONT_SHIFT - PTL1_SHIFT;
break;
default:
panic("page table level is invalid.");
}
align = sizeof(*__ptep) << shift;
return (pte_t *)__page_align_up(__ptep + 1, align) - 1;
}
static inline int split_contiguous_pages(pte_t *ptep, size_t pgsize)
{
int ret;
pte_t *head = get_contiguous_head(ptep, pgsize);
pte_t *tail = get_contiguous_tail(ptep, pgsize);
pte_t *ptr;
uintptr_t phys;
struct page *page;
phys = pte_get_phys(head);
page = phys_to_page(phys);
if (page && (page_is_in_memobj(page)
|| page_is_multi_mapped(page))) {
ret = -EINVAL;
goto out;
}
for (ptr = head; ptr <= tail; ptr++) {
*ptr &= ~PTE_CONT;
}
ret = 0;
out:
return ret;
}
static inline int page_is_contiguous_head(pte_t *ptep, size_t pgsize)
{
pte_t *ptr = get_contiguous_head(ptep, pgsize);
return (ptr == ptep);
}
static inline int page_is_contiguous_tail(pte_t *ptep, size_t pgsize)
{
pte_t *ptr = get_contiguous_tail(ptep, pgsize);
return (ptr == ptep);
}
/* Return true if PTE doesn't belong to a contiguous PTE group or PTE
* is the head of a contiguous PTE group
*/
static inline int pte_is_head(pte_t *ptep, pte_t *old, size_t cont_size)
{
if (!pte_is_contiguous(old))
return 1;
return page_is_contiguous_head(ptep, cont_size);
}
struct page_table;
void arch_adjust_allocate_page_size(struct page_table *pt,
uintptr_t fault_addr,
pte_t *ptep,
void **pgaddrp,
size_t *pgsizep);
void set_pte(pte_t *ppte, unsigned long phys, enum ihk_mc_pt_attribute attr);
pte_t *get_pte(struct page_table *pt, void *virt, enum ihk_mc_pt_attribute attr);

View File

@ -1,9 +1,16 @@
/* arch-perfctr.h COPYRIGHT FUJITSU LIMITED 2016-2017 */
/* arch-perfctr.h COPYRIGHT FUJITSU LIMITED 2016-2018 */
#ifndef __ARCH_PERFCTR_H__
#define __ARCH_PERFCTR_H__
#include <ihk/types.h>
#include <ihk/cpu.h>
#include <bitops.h>
struct per_cpu_arm_pmu {
int num_events;
#define ARMV8_PMUV3_MAX_COMMON_EVENTS 0x40
DECLARE_BITMAP(pmceid_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS);
};
/* @ref.impl arch/arm64/include/asm/pmu.h */
struct arm_pmu {
@ -19,9 +26,12 @@ struct arm_pmu {
int (*disable_intens)(int);
int (*set_event_filter)(unsigned long*, int);
void (*write_evtype)(int, uint32_t);
int (*get_event_idx)(int, unsigned long);
int (*get_event_idx)(int num_events, unsigned long used_mask,
unsigned long config);
int (*map_event)(uint32_t, uint64_t);
int num_events;
void (*enable_user_access_pmu_regs)(void);
void (*disable_user_access_pmu_regs)(void);
struct per_cpu_arm_pmu *per_cpu;
};
static inline const struct arm_pmu* get_cpu_pmu(void)
@ -29,44 +39,21 @@ static inline const struct arm_pmu* get_cpu_pmu(void)
extern struct arm_pmu cpu_pmu;
return &cpu_pmu;
}
static inline const struct per_cpu_arm_pmu *get_per_cpu_pmu(void)
{
const struct arm_pmu *cpu_pmu = get_cpu_pmu();
return &cpu_pmu->per_cpu[ihk_mc_get_processor_id()];
}
int arm64_init_perfctr(void);
void arm64_init_per_cpu_perfctr(void);
int arm64_enable_pmu(void);
void arm64_disable_pmu(void);
int armv8pmu_init(struct arm_pmu* cpu_pmu);
/* TODO[PMU]: 共通部に定義があっても良い。今後の動向を見てここの定義を削除する */
/*
* Generalized hardware cache events:
*
* { L1-D, L1-I, LLC, ITLB, DTLB, BPU, NODE } x
* { read, write, prefetch } x
* { accesses, misses }
*/
enum perf_hw_cache_id {
PERF_COUNT_HW_CACHE_L1D = 0,
PERF_COUNT_HW_CACHE_L1I = 1,
PERF_COUNT_HW_CACHE_LL = 2,
PERF_COUNT_HW_CACHE_DTLB = 3,
PERF_COUNT_HW_CACHE_ITLB = 4,
PERF_COUNT_HW_CACHE_BPU = 5,
PERF_COUNT_HW_CACHE_NODE = 6,
PERF_COUNT_HW_CACHE_MAX, /* non-ABI */
};
enum perf_hw_cache_op_id {
PERF_COUNT_HW_CACHE_OP_READ = 0,
PERF_COUNT_HW_CACHE_OP_WRITE = 1,
PERF_COUNT_HW_CACHE_OP_PREFETCH = 2,
PERF_COUNT_HW_CACHE_OP_MAX, /* non-ABI */
};
enum perf_hw_cache_op_result_id {
PERF_COUNT_HW_CACHE_RESULT_ACCESS = 0,
PERF_COUNT_HW_CACHE_RESULT_MISS = 1,
PERF_COUNT_HW_CACHE_RESULT_MAX, /* non-ABI */
};
void armv8pmu_per_cpu_init(struct per_cpu_arm_pmu *per_cpu);
void arm64_enable_user_access_pmu_regs(void);
void arm64_disable_user_access_pmu_regs(void);
#endif

View File

@ -1,7 +1,9 @@
/* arch-timer.h COPYRIGHT FUJITSU LIMITED 2016 */
/* arch-timer.h COPYRIGHT FUJITSU LIMITED 2016-2018 */
#ifndef __HEADER_ARM64_COMMON_ARCH_TIMER_H
#define __HEADER_ARM64_COMMON_ARCH_TIMER_H
#include <ihk/cpu.h>
/* @ref.impl include/clocksource/arm_arch_timer.h */
#define ARCH_TIMER_USR_PCT_ACCESS_EN (1 << 0) /* physical counter */
#define ARCH_TIMER_USR_VCT_ACCESS_EN (1 << 1) /* virtual counter */
@ -11,4 +13,19 @@
#define ARCH_TIMER_USR_VT_ACCESS_EN (1 << 8) /* virtual timer registers */
#define ARCH_TIMER_USR_PT_ACCESS_EN (1 << 9) /* physical timer registers */
/* @ref.impl linux4.10.16 */
/* include/clocksource/arm_arch_timer.h */
#define ARCH_TIMER_CTRL_ENABLE (1 << 0)
#define ARCH_TIMER_CTRL_IT_MASK (1 << 1)
#define ARCH_TIMER_CTRL_IT_STAT (1 << 2)
enum arch_timer_reg {
ARCH_TIMER_REG_CTRL,
ARCH_TIMER_REG_TVAL,
};
extern int get_timer_intrid(void);
extern void arch_timer_init(void);
extern struct ihk_mc_interrupt_handler *get_timer_handler(void);
#endif /* __HEADER_ARM64_COMMON_ARCH_TIMER_H */

View File

@ -1,4 +1,4 @@
/* cpu.h COPYRIGHT FUJITSU LIMITED 2016-2017 */
/* cpu.h COPYRIGHT FUJITSU LIMITED 2016-2018 */
#ifndef __HEADER_ARM64_ARCH_CPU_H
#define __HEADER_ARM64_ARCH_CPU_H
@ -12,6 +12,8 @@
#define dmb(opt) asm volatile("dmb " #opt : : : "memory")
#define dsb(opt) asm volatile("dsb " #opt : : : "memory")
#include <registers.h>
#define mb() dsb(sy)
#define rmb() dsb(ld)
#define wmb() dsb(st)
@ -69,12 +71,10 @@ do { \
#define smp_mb__before_atomic() smp_mb()
#define smp_mb__after_atomic() smp_mb()
/* @ref.impl linux-linaro/arch/arm64/include/asm/arch_timer.h::arch_counter_get_cntvct */
#define read_tsc() \
({ \
unsigned long cval; \
isb(); \
asm volatile("mrs %0, cntvct_el0" : "=r" (cval)); \
cval = rdtsc(); \
cval; \
})

View File

@ -21,12 +21,11 @@
/* Bits [26:31] are reserved, see mman-common.h for MAP_HUGETLB usage */
#define MAP_HUGE_SHIFT 26
#if FIRST_LEVEL_BLOCK_SUPPORT
# define MAP_HUGE_FIRST_BLOCK (__PTL3_SHIFT << MAP_HUGE_SHIFT)
#else
# define MAP_HUGE_FIRST_BLOCK -1 /* not supported */
#endif
#define MAP_HUGE_SECOND_BLOCK (__PTL2_SHIFT << MAP_HUGE_SHIFT)
#define MAP_HUGE_FIRST_BLOCK (__PTL3_SHIFT << MAP_HUGE_SHIFT)
#define MAP_HUGE_FIRST_CONT_BLOCK (__PTL3_CONT_SHIFT << MAP_HUGE_SHIFT)
#define MAP_HUGE_SECOND_BLOCK (__PTL2_SHIFT << MAP_HUGE_SHIFT)
#define MAP_HUGE_SECOND_CONT_BLOCK (__PTL2_CONT_SHIFT << MAP_HUGE_SHIFT)
#define MAP_HUGE_THIRD_CONT_BLOCK (__PTL1_CONT_SHIFT << MAP_HUGE_SHIFT)
/*
* for mlockall()

View File

@ -6,12 +6,11 @@
/* shmflg */
#define SHM_HUGE_SHIFT 26
#if FIRST_LEVEL_BLOCK_SUPPORT
# define SHM_HUGE_FIRST_BLOCK (__PTL3_SHIFT << SHM_HUGE_SHIFT)
#else
# define SHM_HUGE_FIRST_BLOCK -1 /* not supported */
#endif
#define SHM_HUGE_SECOND_BLOCK (__PTL2_SHIFT << SHM_HUGE_SHIFT)
#define SHM_HUGE_FIRST_BLOCK (__PTL3_SHIFT << SHM_HUGE_SHIFT)
#define SHM_HUGE_FIRST_CONT_BLOCK (__PTL3_CONT_SHIFT << SHM_HUGE_SHIFT)
#define SHM_HUGE_SECOND_BLOCK (__PTL2_SHIFT << SHM_HUGE_SHIFT)
#define SHM_HUGE_SECOND_CONT_BLOCK (__PTL2_CONT_SHIFT << SHM_HUGE_SHIFT)
#define SHM_HUGE_THIRD_CONT_BLOCK (__PTL1_CONT_SHIFT << SHM_HUGE_SHIFT)
struct ipc_perm {
key_t key;

View File

@ -3,30 +3,29 @@
#include <arch-memory.h>
//#define DEBUG_RUSAGE
#define DEBUG_RUSAGE
extern struct rusage_global *rusage;
#define IHK_OS_PGSIZE_4KB 0
#define IHK_OS_PGSIZE_2MB 1
#define IHK_OS_PGSIZE_1GB 2
#define IHK_OS_PGSIZE_4KB 0
#define IHK_OS_PGSIZE_16KB 1
#define IHK_OS_PGSIZE_64KB 2
extern struct rusage_global rusage;
static inline int rusage_pgsize_to_pgtype(size_t pgsize)
{
int ret = IHK_OS_PGSIZE_4KB;
switch (pgsize) {
case __PTL1_SIZE:
if (pgsize == PTL1_SIZE) {
ret = IHK_OS_PGSIZE_4KB;
break;
case __PTL2_SIZE:
ret = IHK_OS_PGSIZE_16KB;
break;
case __PTL3_SIZE:
ret = IHK_OS_PGSIZE_64KB;
break;
default:
}
else if (pgsize == PTL2_SIZE) {
ret = IHK_OS_PGSIZE_2MB;
}
else if (pgsize == PTL3_SIZE) {
ret = IHK_OS_PGSIZE_1GB;
}
else {
kprintf("%s: Error: Unknown pgsize=%ld\n", __FUNCTION__, pgsize);
break;
}
return ret;
}

View File

@ -60,9 +60,9 @@
#ifdef CONFIG_HAS_NMI
#define GICD_INT_NMI_PRI 0x40
#define GICD_INT_DEF_PRI 0xc0
#define GICD_INT_DEF_PRI 0xc0U
#else
#define GICD_INT_DEF_PRI 0xa0
#define GICD_INT_DEF_PRI 0xa0U
#endif
#define GICD_INT_DEF_PRI_X4 ((GICD_INT_DEF_PRI << 24) |\
(GICD_INT_DEF_PRI << 16) |\

View File

@ -19,6 +19,7 @@
#ifndef __LINUX_IRQCHIP_ARM_GIC_V3_H
#define __LINUX_IRQCHIP_ARM_GIC_V3_H
#include <stringify.h>
/* @ref.impl include/linux/irqchip/arm-gic-v3.h */
#include <sysreg.h>
@ -381,11 +382,4 @@
#define ICH_AP1R2_EL2 __AP1Rx_EL2(2)
#define ICH_AP1R3_EL2 __AP1Rx_EL2(3)
/**
* @ref.impl host-kernel/include/linux/stringify.h
*/
#define __stringify_1(x...) #x
#define __stringify(x...) __stringify_1(x)
#endif /* __LINUX_IRQCHIP_ARM_GIC_V3_H */

View File

@ -15,8 +15,9 @@
#define S_PC 0x100 /* offsetof(struct pt_regs, pc) */
#define S_PSTATE 0x108 /* offsetof(struct pt_regs, pstate) */
#define S_ORIG_X0 0x110 /* offsetof(struct pt_regs, orig_x0) */
#define S_SYSCALLNO 0x118 /* offsetof(struct pt_regs, syscallno) */
#define S_FRAME_SIZE 0x120 /* sizeof(struct pt_regs) */
#define S_ORIG_PC 0x118 /* offsetof(struct pt_regs, orig_pc) */
#define S_SYSCALLNO 0x120 /* offsetof(struct pt_regs, syscallno) */
#define S_FRAME_SIZE 0x130 /* sizeof(struct pt_regs) must be 16 byte align */
#define CPU_INFO_SETUP 0x10 /* offsetof(struct cpu_info, cpu_setup) */
#define CPU_INFO_SZ 0x18 /* sizeof(struct cpu_info) */

View File

@ -0,0 +1,19 @@
/* asm-syscall.h COPYRIGHT FUJITSU LIMITED 2018 */
#ifndef __HEADER_ARM64_ASM_SYSCALL_H
#define __HEADER_ARM64_ASM_SYSCALL_H
#ifdef __ASSEMBLY__
#define DECLARATOR(number, name) .equ __NR_##name, number
#define SYSCALL_HANDLED(number, name) DECLARATOR(number, name)
#define SYSCALL_DELEGATED(number, name) DECLARATOR(number, name)
#include <syscall_list.h>
#undef DECLARATOR
#undef SYSCALL_HANDLED
#undef SYSCALL_DELEGATED
#endif /* __ASSEMBLY__ */
#endif /* !__HEADER_ARM64_ASM_SYSCALL_H */

View File

@ -25,17 +25,78 @@
#define MIDR_PARTNUM(midr) \
(((midr) & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT)
#define MIDR_ARCHITECTURE_SHIFT 16
#define MIDR_ARCHITECTURE_MASK (0xf << MIDR_ARCHITECTURE_SHIFT)
#define MIDR_ARCHITECTURE(midr) \
(((midr) & MIDR_ARCHITECTURE_MASK) >> MIDR_ARCHITECTURE_SHIFT)
#define MIDR_VARIANT_SHIFT 20
#define MIDR_VARIANT_MASK (0xf << MIDR_VARIANT_SHIFT)
#define MIDR_VARIANT(midr) \
(((midr) & MIDR_VARIANT_MASK) >> MIDR_VARIANT_SHIFT)
#define MIDR_IMPLEMENTOR_SHIFT 24
#define MIDR_IMPLEMENTOR_MASK (0xff << MIDR_IMPLEMENTOR_SHIFT)
#define MIDR_IMPLEMENTOR_MASK (0xffU << MIDR_IMPLEMENTOR_SHIFT)
#define MIDR_IMPLEMENTOR(midr) \
(((midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)
#define ARM_CPU_IMP_CAVIUM 0x43
#define MIDR_CPU_MODEL(imp, partnum) \
(((imp) << MIDR_IMPLEMENTOR_SHIFT) | \
(0xf << MIDR_ARCHITECTURE_SHIFT) | \
((partnum) << MIDR_PARTNUM_SHIFT))
#define MIDR_CPU_VAR_REV(var, rev) \
(((var) << MIDR_VARIANT_SHIFT) | (rev))
#define MIDR_CPU_MODEL_MASK (MIDR_IMPLEMENTOR_MASK | MIDR_PARTNUM_MASK | \
MIDR_ARCHITECTURE_MASK)
#define MIDR_IS_CPU_MODEL_RANGE(midr, model, rv_min, rv_max) \
({ \
u32 _model = (midr) & MIDR_CPU_MODEL_MASK; \
u32 rv = (midr) & (MIDR_REVISION_MASK | MIDR_VARIANT_MASK); \
\
_model == (model) && rv >= (rv_min) && rv <= (rv_max); \
})
#define ARM_CPU_IMP_ARM 0x41
#define ARM_CPU_IMP_APM 0x50
#define ARM_CPU_IMP_CAVIUM 0x43
#define ARM_CPU_IMP_BRCM 0x42
#define ARM_CPU_IMP_QCOM 0x51
#define ARM_CPU_PART_AEM_V8 0xD0F
#define ARM_CPU_PART_FOUNDATION 0xD00
#define ARM_CPU_PART_CORTEX_A57 0xD07
#define ARM_CPU_PART_CORTEX_A72 0xD08
#define ARM_CPU_PART_CORTEX_A53 0xD03
#define ARM_CPU_PART_CORTEX_A73 0xD09
#define ARM_CPU_PART_CORTEX_A75 0xD0A
#define APM_CPU_PART_POTENZA 0x000
#define CAVIUM_CPU_PART_THUNDERX 0x0A1
#define CAVIUM_CPU_PART_THUNDERX_81XX 0x0A2
#define CAVIUM_CPU_PART_THUNDERX_83XX 0x0A3
#define CAVIUM_CPU_PART_THUNDERX2 0x0AF
#define BRCM_CPU_PART_VULCAN 0x516
#define QCOM_CPU_PART_FALKOR_V1 0x800
#define QCOM_CPU_PART_FALKOR 0xC00
#define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53)
#define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A57)
#define MIDR_CORTEX_A72 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A72)
#define MIDR_CORTEX_A73 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A73)
#define MIDR_CORTEX_A75 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A75)
#define MIDR_THUNDERX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX)
#define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_81XX)
#define MIDR_THUNDERX_83XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_83XX)
#define MIDR_CAVIUM_THUNDERX2 MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX2)
#define MIDR_BRCM_VULCAN MIDR_CPU_MODEL(ARM_CPU_IMP_BRCM, BRCM_CPU_PART_VULCAN)
#define MIDR_QCOM_FALKOR_V1 MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_FALKOR_V1)
#define MIDR_QCOM_FALKOR MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_FALKOR)
#ifndef __ASSEMBLY__

View File

@ -1,92 +0,0 @@
/* elfcore.h COPYRIGHT FUJITSU LIMITED 2015 */
#ifndef POSTK_DEBUG_ARCH_DEP_18 /* coredump arch separation. */
#ifndef __HEADER_ARM64_COMMON_ELFCORE_H
#define __HEADER_ARM64_COMMON_ELFCORE_H
typedef uint16_t Elf64_Half;
typedef uint32_t Elf64_Word;
typedef uint64_t Elf64_Xword;
typedef uint64_t Elf64_Addr;
typedef uint64_t Elf64_Off;
#define EI_NIDENT 16
typedef struct {
unsigned char e_ident[EI_NIDENT];
Elf64_Half e_type;
Elf64_Half e_machine;
Elf64_Word e_version;
Elf64_Addr e_entry;
Elf64_Off e_phoff;
Elf64_Off e_shoff;
Elf64_Word e_flags;
Elf64_Half e_ehsize;
Elf64_Half e_phentsize;
Elf64_Half e_phnum;
Elf64_Half e_shentsize;
Elf64_Half e_shnum;
Elf64_Half e_shstrndx;
} Elf64_Ehdr;
#define EI_MAG0 0
#define EI_MAG1 1
#define EI_MAG2 2
#define EI_MAG3 3
#define EI_CLASS 4
#define EI_DATA 5
#define EI_VERSION 6
#define EI_OSABI 7
#define EI_ABIVERSION 8
#define EI_PAD 9
#define ELFMAG0 0x7f
#define ELFMAG1 'E'
#define ELFMAG2 'L'
#define ELFMAG3 'F'
#define ELFCLASS64 2 /* 64-bit object */
#define ELFDATA2LSB 1 /* LSB */
#define El_VERSION 1 /* defined to be the same as EV CURRENT */
#define ELFOSABI_NONE 0 /* unspecied */
#define El_ABIVERSION_NONE 0 /* unspecied */
#define ET_CORE 4 /* Core file */
#define EM_X86_64 62 /* AMD x86-64 architecture */
#define EM_K10M 181 /* Intel K10M */
#define EV_CURRENT 1 /* Current version */
typedef struct {
Elf64_Word p_type;
Elf64_Word p_flags;
Elf64_Off p_offset;
Elf64_Addr p_vaddr;
Elf64_Addr p_paddr;
Elf64_Xword p_filesz;
Elf64_Xword p_memsz;
Elf64_Xword p_align;
} Elf64_Phdr;
#define PT_LOAD 1
#define PT_NOTE 4
#define PF_X 1 /* executable bit */
#define PF_W 2 /* writable bit */
#define PF_R 4 /* readable bit */
struct note {
Elf64_Word namesz;
Elf64_Word descsz;
Elf64_Word type;
/* name char[namesz] and desc[descsz] */
};
#define NT_PRSTATUS 1
#define NT_PRFRPREG 2
#define NT_PRPSINFO 3
#define NT_AUXV 6
#define NT_X86_STATE 0x202
#include "elfcoregpl.h"
#endif /* !__HEADER_ARM64_COMMON_ELFCORE_H */
#endif /* !POSTK_DEBUG_ARCH_DEP_18 */

View File

@ -1,98 +0,0 @@
/* elfcoregpl.h COPYRIGHT FUJITSU LIMITED 2015 */
#ifndef POSTK_DEBUG_ARCH_DEP_18 /* coredump arch separation. */
#ifndef __HEADER_ARM64_COMMON_ELFCOREGPL_H
#define __HEADER_ARM64_COMMON_ELFCOREGPL_H
#define pid_t int
/* From /usr/include/linux/elfcore.h of Linux */
#define ELF_PRARGSZ (80)
/* From /usr/include/linux/elfcore.h fro Linux */
struct elf_siginfo
{
int si_signo;
int si_code;
int si_errno;
};
/* From bfd/hosts/x86-64linux.h of gdb. */
typedef uint64_t __attribute__ ((__aligned__ (8))) a8_uint64_t;
typedef a8_uint64_t elf_greg64_t;
struct user_regs64_struct
{
a8_uint64_t r15;
a8_uint64_t r14;
a8_uint64_t r13;
a8_uint64_t r12;
a8_uint64_t rbp;
a8_uint64_t rbx;
a8_uint64_t r11;
a8_uint64_t r10;
a8_uint64_t r9;
a8_uint64_t r8;
a8_uint64_t rax;
a8_uint64_t rcx;
a8_uint64_t rdx;
a8_uint64_t rsi;
a8_uint64_t rdi;
a8_uint64_t orig_rax;
a8_uint64_t rip;
a8_uint64_t cs;
a8_uint64_t eflags;
a8_uint64_t rsp;
a8_uint64_t ss;
a8_uint64_t fs_base;
a8_uint64_t gs_base;
a8_uint64_t ds;
a8_uint64_t es;
a8_uint64_t fs;
a8_uint64_t gs;
};
#define ELF_NGREG64 (sizeof (struct user_regs64_struct) / sizeof(elf_greg64_t))
typedef elf_greg64_t elf_gregset64_t[ELF_NGREG64];
struct prstatus64_timeval
{
a8_uint64_t tv_sec;
a8_uint64_t tv_usec;
};
struct elf_prstatus64
{
struct elf_siginfo pr_info;
short int pr_cursig;
a8_uint64_t pr_sigpend;
a8_uint64_t pr_sighold;
pid_t pr_pid;
pid_t pr_ppid;
pid_t pr_pgrp;
pid_t pr_sid;
struct prstatus64_timeval pr_utime;
struct prstatus64_timeval pr_stime;
struct prstatus64_timeval pr_cutime;
struct prstatus64_timeval pr_cstime;
elf_gregset64_t pr_reg;
int pr_fpvalid;
};
struct elf_prpsinfo64
{
char pr_state;
char pr_sname;
char pr_zomb;
char pr_nice;
a8_uint64_t pr_flag;
unsigned int pr_uid;
unsigned int pr_gid;
int pr_pid, pr_ppid, pr_pgrp, pr_sid;
char pr_fname[16];
char pr_psargs[ELF_PRARGSZ];
};
#endif /* !__HEADER_ARM64_COMMON_ELFCOREGPL_H */
#endif /* !POSTK_DEBUG_ARCH_DEP_18 */

View File

@ -1,5 +1,4 @@
/* hwcap.h COPYRIGHT FUJITSU LIMITED 2017 */
#ifdef POSTK_DEBUG_ARCH_DEP_65
#ifndef _UAPI__ASM_HWCAP_H
#define _UAPI__ASM_HWCAP_H
@ -25,4 +24,3 @@ unsigned long arch_get_hwcap(void);
extern unsigned long elf_hwcap;
#endif /* _UAPI__ASM_HWCAP_H */
#endif /* POSTK_DEBUG_ARCH_DEP_65 */

View File

@ -1,4 +1,4 @@
/* context.h COPYRIGHT FUJITSU LIMITED 2015-2017 */
/* context.h COPYRIGHT FUJITSU LIMITED 2015-2018 */
#ifndef __HEADER_ARM64_IHK_CONTEXT_H
#define __HEADER_ARM64_IHK_CONTEXT_H
@ -27,7 +27,9 @@ struct pt_regs {
};
};
unsigned long orig_x0;
unsigned long orig_pc;
unsigned long syscallno;
unsigned long __padding;
};
typedef struct pt_regs ihk_mc_user_context_t;
@ -65,17 +67,17 @@ static inline void pt_regs_write_reg(struct pt_regs *regs, int r,
}
/* temp */
#define ihk_mc_syscall_arg0(uc) (uc)->regs[0]
#define ihk_mc_syscall_arg1(uc) (uc)->regs[1]
#define ihk_mc_syscall_arg2(uc) (uc)->regs[2]
#define ihk_mc_syscall_arg3(uc) (uc)->regs[3]
#define ihk_mc_syscall_arg4(uc) (uc)->regs[4]
#define ihk_mc_syscall_arg5(uc) (uc)->regs[5]
#define ihk_mc_syscall_arg0(uc) ((uc)->regs[0])
#define ihk_mc_syscall_arg1(uc) ((uc)->regs[1])
#define ihk_mc_syscall_arg2(uc) ((uc)->regs[2])
#define ihk_mc_syscall_arg3(uc) ((uc)->regs[3])
#define ihk_mc_syscall_arg4(uc) ((uc)->regs[4])
#define ihk_mc_syscall_arg5(uc) ((uc)->regs[5])
#define ihk_mc_syscall_ret(uc) (uc)->regs[0]
#define ihk_mc_syscall_number(uc) (uc)->regs[8]
#define ihk_mc_syscall_ret(uc) ((uc)->regs[0])
#define ihk_mc_syscall_number(uc) ((uc)->regs[8])
#define ihk_mc_syscall_pc(uc) (uc)->pc
#define ihk_mc_syscall_sp(uc) (uc)->sp
#define ihk_mc_syscall_pc(uc) ((uc)->pc)
#define ihk_mc_syscall_sp(uc) ((uc)->sp)
#endif /* !__HEADER_ARM64_IHK_CONTEXT_H */

View File

@ -20,13 +20,11 @@ typedef uint64_t size_t;
typedef int64_t ssize_t;
typedef int64_t off_t;
#ifdef POSTK_DEBUG_ARCH_DEP_18 /* coredump arch separation. */
typedef int32_t key_t;
typedef uint32_t uid_t;
typedef uint32_t gid_t;
typedef int64_t time_t;
typedef int32_t pid_t;
#endif /* POSTK_DEBUG_ARCH_DEP_18 */
#endif /* __ASSEMBLY__ */

View File

@ -0,0 +1,102 @@
/* imp-sysreg.h COPYRIGHT FUJITSU LIMITED 2016-2018 */
#ifndef __ASM_IMP_SYSREG_H
#define __ASM_IMP_SYSREG_H
#ifndef __ASSEMBLY__
/* register sys_reg list */
#define IMP_FJ_TAG_ADDRESS_CTRL_EL1 sys_reg(3, 0, 11, 2, 0)
#define IMP_SCCR_CTRL_EL1 sys_reg(3, 0, 11, 8, 0)
#define IMP_SCCR_ASSIGN_EL1 sys_reg(3, 0, 11, 8, 1)
#define IMP_SCCR_SET0_L2_EL1 sys_reg(3, 0, 15, 8, 2)
#define IMP_SCCR_SET1_L2_EL1 sys_reg(3, 0, 15, 8, 3)
#define IMP_SCCR_L1_EL0 sys_reg(3, 3, 11, 8, 2)
#define IMP_PF_CTRL_EL1 sys_reg(3, 0, 11, 4, 0)
#define IMP_PF_STREAM_DETECT_CTRL_EL0 sys_reg(3, 3, 11, 4, 0)
#define IMP_PF_INJECTION_CTRL0_EL0 sys_reg(3, 3, 11, 6, 0)
#define IMP_PF_INJECTION_CTRL1_EL0 sys_reg(3, 3, 11, 6, 1)
#define IMP_PF_INJECTION_CTRL2_EL0 sys_reg(3, 3, 11, 6, 2)
#define IMP_PF_INJECTION_CTRL3_EL0 sys_reg(3, 3, 11, 6, 3)
#define IMP_PF_INJECTION_CTRL4_EL0 sys_reg(3, 3, 11, 6, 4)
#define IMP_PF_INJECTION_CTRL5_EL0 sys_reg(3, 3, 11, 6, 5)
#define IMP_PF_INJECTION_CTRL6_EL0 sys_reg(3, 3, 11, 6, 6)
#define IMP_PF_INJECTION_CTRL7_EL0 sys_reg(3, 3, 11, 6, 7)
#define IMP_PF_INJECTION_DISTANCE0_EL0 sys_reg(3, 3, 11, 7, 0)
#define IMP_PF_INJECTION_DISTANCE1_EL0 sys_reg(3, 3, 11, 7, 1)
#define IMP_PF_INJECTION_DISTANCE2_EL0 sys_reg(3, 3, 11, 7, 2)
#define IMP_PF_INJECTION_DISTANCE3_EL0 sys_reg(3, 3, 11, 7, 3)
#define IMP_PF_INJECTION_DISTANCE4_EL0 sys_reg(3, 3, 11, 7, 4)
#define IMP_PF_INJECTION_DISTANCE5_EL0 sys_reg(3, 3, 11, 7, 5)
#define IMP_PF_INJECTION_DISTANCE6_EL0 sys_reg(3, 3, 11, 7, 6)
#define IMP_PF_INJECTION_DISTANCE7_EL0 sys_reg(3, 3, 11, 7, 7)
#define IMP_BARRIER_CTRL_EL1 sys_reg(3, 0, 11, 12, 0)
#define IMP_BARRIER_BST_BIT_EL1 sys_reg(3, 0, 11, 12, 4)
#define IMP_BARRIER_INIT_SYNC_BB0_EL1 sys_reg(3, 0, 15, 13, 0)
#define IMP_BARRIER_INIT_SYNC_BB1_EL1 sys_reg(3, 0, 15, 13, 1)
#define IMP_BARRIER_INIT_SYNC_BB2_EL1 sys_reg(3, 0, 15, 13, 2)
#define IMP_BARRIER_INIT_SYNC_BB3_EL1 sys_reg(3, 0, 15, 13, 3)
#define IMP_BARRIER_INIT_SYNC_BB4_EL1 sys_reg(3, 0, 15, 13, 4)
#define IMP_BARRIER_INIT_SYNC_BB5_EL1 sys_reg(3, 0, 15, 13, 5)
#define IMP_BARRIER_ASSIGN_SYNC_W0_EL1 sys_reg(3, 0, 15, 15, 0)
#define IMP_BARRIER_ASSIGN_SYNC_W1_EL1 sys_reg(3, 0, 15, 15, 1)
#define IMP_BARRIER_ASSIGN_SYNC_W2_EL1 sys_reg(3, 0, 15, 15, 2)
#define IMP_BARRIER_ASSIGN_SYNC_W3_EL1 sys_reg(3, 0, 15, 15, 3)
#define IMP_SOC_STANDBY_CTRL_EL1 sys_reg(3, 0, 11, 0, 0)
#define IMP_FJ_CORE_UARCH_CTRL_EL2 sys_reg(3, 4, 11, 0, 4)
#define IMP_FJ_CORE_UARCH_RESTRECTION_EL1 sys_reg(3, 0, 11, 0, 5)
/* macros */
#define PWR_REG_MASK(reg, feild) (((UL(1) << ((reg##_##feild##_MSB) - (reg##_##feild##_LSB) + 1)) - 1) << (reg##_##feild##_LSB))
/* IMP_FJ_TAG_ADDRESS_CTRL_EL1 */
#define IMP_FJ_TAG_ADDRESS_CTRL_EL1_TBO0_SHIFT (0)
#define IMP_FJ_TAG_ADDRESS_CTRL_EL1_SEC0_SHIFT (8)
#define IMP_FJ_TAG_ADDRESS_CTRL_EL1_PFE0_SHIFT (9)
#define IMP_FJ_TAG_ADDRESS_CTRL_EL1_TBO0_MASK (1UL << IMP_FJ_TAG_ADDRESS_CTRL_EL1_TBO0_SHIFT)
#define IMP_FJ_TAG_ADDRESS_CTRL_EL1_SEC0_MASK (1UL << IMP_FJ_TAG_ADDRESS_CTRL_EL1_SEC0_SHIFT)
#define IMP_FJ_TAG_ADDRESS_CTRL_EL1_PFE0_MASK (1UL << IMP_FJ_TAG_ADDRESS_CTRL_EL1_PFE0_SHIFT)
/* IMP_SCCR_CTRL_EL1 */
#define IMP_SCCR_CTRL_EL1_EL1AE_SHIFT (63)
#define IMP_SCCR_CTRL_EL1_EL1AE_MASK (1UL << IMP_SCCR_CTRL_EL1_EL1AE_SHIFT)
/* IMP_SCCR_SET0_L2_EL1 */
#define IMP_SCCR_SET0_L2_EL1_L2_SEC0_SHIFT (0)
/* IMP_PF_CTRL_EL1 */
#define IMP_PF_CTRL_EL1_EL1AE_ENABLE (1UL << 63)
#define IMP_PF_CTRL_EL1_EL0AE_ENABLE (1UL << 62)
/* IMP_BARRIER_CTRL_EL1 */
#define IMP_BARRIER_CTRL_EL1_EL1AE_ENABLE (1UL << 63)
#define IMP_BARRIER_CTRL_EL1_EL0AE_ENABLE (1UL << 62)
/* IMP_SOC_STANDBY_CTRL_EL1 */
#define IMP_SOC_STANDBY_CTRL_EL1_ECO_MODE_MSB 2
#define IMP_SOC_STANDBY_CTRL_EL1_ECO_MODE_LSB 2
#define IMP_SOC_STANDBY_CTRL_EL1_MODE_CHANGE_MSB 1
#define IMP_SOC_STANDBY_CTRL_EL1_MODE_CHANGE_LSB 1
#define IMP_SOC_STANDBY_CTRL_EL1_RETENTION_MSB 0
#define IMP_SOC_STANDBY_CTRL_EL1_RETENTION_LSB 0
#define IMP_SOC_STANDBY_CTRL_EL1_ECO_MODE PWR_REG_MASK(IMP_SOC_STANDBY_CTRL_EL1, ECO_MODE)
#define IMP_SOC_STANDBY_CTRL_EL1_MODE_CHANGE PWR_REG_MASK(IMP_SOC_STANDBY_CTRL_EL1, MODE_CHANGE)
#define IMP_SOC_STANDBY_CTRL_EL1_RETENTION PWR_REG_MASK(IMP_SOC_STANDBY_CTRL_EL1, RETENTION)
/* IMP_FJ_CORE_UARCH_RESTRECTION_EL1 */
#define IMP_FJ_CORE_UARCH_RESTRECTION_EL1_FL_RESTRICT_TRANS_MSB 33
#define IMP_FJ_CORE_UARCH_RESTRECTION_EL1_FL_RESTRICT_TRANS_LSB 33
#define IMP_FJ_CORE_UARCH_RESTRECTION_EL1_ISSUE_RESTRICTION_MSB 9
#define IMP_FJ_CORE_UARCH_RESTRECTION_EL1_ISSUE_RESTRICTION_LSB 8
#define IMP_FJ_CORE_UARCH_RESTRECTION_EL1_EX_RESTRICTION_MSB 0
#define IMP_FJ_CORE_UARCH_RESTRECTION_EL1_EX_RESTRICTION_LSB 0
#define IMP_FJ_CORE_UARCH_RESTRECTION_EL1_FL_RESTRICT_TRANS PWR_REG_MASK(IMP_FJ_CORE_UARCH_RESTRECTION_EL1, FL_RESTRICT_TRANS)
#define IMP_FJ_CORE_UARCH_RESTRECTION_EL1_ISSUE_RESTRICTION PWR_REG_MASK(IMP_FJ_CORE_UARCH_RESTRECTION_EL1, ISSUE_RESTRICTION)
#define IMP_FJ_CORE_UARCH_RESTRECTION_EL1_EX_RESTRICTION PWR_REG_MASK(IMP_FJ_CORE_UARCH_RESTRECTION_EL1, EX_RESTRICTION)
void scdrv_registers_init(void);
void hpc_registers_init(void);
void vhbm_barrier_registers_init(void);
#endif /* __ASSEMBLY__ */
#endif /* __ASM_IMP_SYSREG_H */

View File

@ -1,4 +1,4 @@
/* irq.h COPYRIGHT FUJITSU LIMITED 2015-2017 */
/* irq.h COPYRIGHT FUJITSU LIMITED 2015-2018 */
#ifndef __HEADER_ARM64_IRQ_H
#define __HEADER_ARM64_IRQ_H
@ -15,42 +15,15 @@
#define INTRID_CPU_STOP 3
#define INTRID_TLB_FLUSH 4
#define INTRID_STACK_TRACE 6
#define INTRID_MEMDUMP 7
#define INTRID_MULTI_NMI 7
/* use PPI interrupt number */
#define INTRID_PERF_OVF 23
#define INTRID_HYP_PHYS_TIMER 26 /* cnthp */
#define INTRID_VIRT_TIMER 27 /* cntv */
#define INTRID_HYP_VIRT_TIMER 28 /* cnthv */
#define INTRID_PHYS_TIMER 30 /* cntp */
/* timer intrid getter */
static int get_virt_timer_intrid(void)
{
#ifdef CONFIG_ARM64_VHE
unsigned long mmfr = read_cpuid(ID_AA64MMFR1_EL1);
if ((mmfr >> ID_AA64MMFR1_VHE_SHIFT) & 1UL) {
return INTRID_HYP_VIRT_TIMER;
}
#endif /* CONFIG_ARM64_VHE */
return INTRID_VIRT_TIMER;
}
static int get_phys_timer_intrid(void)
{
#ifdef CONFIG_ARM64_VHE
unsigned long mmfr = read_cpuid(ID_AA64MMFR1_EL1);
if ((mmfr >> ID_AA64MMFR1_VHE_SHIFT) & 1UL) {
return INTRID_HYP_PHYS_TIMER;
}
#endif /* CONFIG_ARM64_VHE */
return INTRID_PHYS_TIMER;
}
/* use timer checker */
extern unsigned long is_use_virt_timer(void);
/* Functions for GICv2 */
extern void gic_dist_init_gicv2(unsigned long dist_base_pa, unsigned long size);
extern void gic_cpu_init_gicv2(unsigned long cpu_base_pa, unsigned long size);

View File

@ -72,6 +72,7 @@
#define PMD_SECT_S (UL(3) << 8)
#define PMD_SECT_AF (UL(1) << 10)
#define PMD_SECT_NG (UL(1) << 11)
#define PMD_SECT_CONT (UL(1) << 52)
#define PMD_SECT_PXN (UL(1) << 53)
#define PMD_SECT_UXN (UL(1) << 54)
@ -93,6 +94,7 @@
#define PTE_SHARED (UL(3) << 8) /* SH[1:0], inner shareable */
#define PTE_AF (UL(1) << 10) /* Access Flag */
#define PTE_NG (UL(1) << 11) /* nG */
#define PTE_CONT (UL(1) << 52) /* Contiguous range */
#define PTE_PXN (UL(1) << 53) /* Privileged XN */
#define PTE_UXN (UL(1) << 54) /* User XN */
/* Software defined PTE bits definition.*/

View File

@ -2,6 +2,9 @@
#ifndef __HEADER_ARM64_COMMON_PRCTL_H
#define __HEADER_ARM64_COMMON_PRCTL_H
#define PR_SET_THP_DISABLE 41
#define PR_GET_THP_DISABLE 42
/* arm64 Scalable Vector Extension controls */
#define PR_SVE_SET_VL 48 /* set task vector length */
#define PR_SVE_SET_VL_THREAD (1 << 1) /* set just this thread */

View File

@ -1,9 +1,10 @@
/* registers.h COPYRIGHT FUJITSU LIMITED 2015-2016 */
/* registers.h COPYRIGHT FUJITSU LIMITED 2015-2018 */
#ifndef __HEADER_ARM64_COMMON_REGISTERS_H
#define __HEADER_ARM64_COMMON_REGISTERS_H
#include <types.h>
#include <arch/cpu.h>
#include <sysreg.h>
#define RFLAGS_CF (1 << 0)
#define RFLAGS_PF (1 << 2)
@ -76,15 +77,12 @@ static unsigned long rdmsr(unsigned int index)
return 0;
}
/* @ref.impl linux-linaro/arch/arm64/include/asm/arch_timer.h::arch_counter_get_cntvct */
static unsigned long rdtsc(void)
/* @ref.impl linux4.10.16 */
/* arch/arm64/include/asm/arch_timer.h:arch_counter_get_cntvct() */
static inline unsigned long rdtsc(void)
{
unsigned long cval;
isb();
asm volatile("mrs %0, cntvct_el0" : "=r" (cval));
return cval;
return read_sysreg(cntvct_el0);
}
static void set_perfctl(int counter, int event, int mask)

View File

@ -1,4 +1,4 @@
/* signal.h COPYRIGHT FUJITSU LIMITED 2015-2017 */
/* signal.h COPYRIGHT FUJITSU LIMITED 2015-2018 */
#ifndef __HEADER_ARM64_COMMON_SIGNAL_H
#define __HEADER_ARM64_COMMON_SIGNAL_H
@ -9,6 +9,11 @@
#define _NSIG_BPW 64
#define _NSIG_WORDS (_NSIG / _NSIG_BPW)
static inline int valid_signal(unsigned long sig)
{
return sig <= _NSIG ? 1 : 0;
}
typedef unsigned long int __sigset_t;
#define __sigmask(sig) (((__sigset_t) 1) << ((sig) - 1))
@ -402,8 +407,6 @@ struct ucontext {
};
void arm64_notify_die(const char *str, struct pt_regs *regs, struct siginfo *info, int err);
void set_signal(int sig, void *regs, struct siginfo *info);
void check_signal(unsigned long rc, void *regs, int num);
void check_signal_irq_disabled(unsigned long rc, void *regs, int num);
#endif /* __HEADER_ARM64_COMMON_SIGNAL_H */

View File

@ -1,17 +1,14 @@
/* syscall_list.h COPYRIGHT FUJITSU LIMITED 2015-2017 */
/* syscall_list.h COPYRIGHT FUJITSU LIMITED 2015-2018 */
SYSCALL_DELEGATED(4, io_getevents)
SYSCALL_DELEGATED(17, getcwd)
SYSCALL_DELEGATED(22, epoll_pwait)
SYSCALL_DELEGATED(25, fcntl)
SYSCALL_HANDLED(29, ioctl)
SYSCALL_DELEGATED(35, unlinkat)
SYSCALL_DELEGATED(43, statfs)
SYSCALL_DELEGATED(44, fstatfs)
#ifdef POSTK_DEBUG_ARCH_DEP_62 /* Absorb the difference between open and openat args. */
SYSCALL_HANDLED(56, openat)
#else /* POSTK_DEBUG_ARCH_DEP_62 */
SYSCALL_DELEGATED(56, openat)
#endif /* POSTK_DEBUG_ARCH_DEP_62 */
SYSCALL_HANDLED(57, close)
SYSCALL_DELEGATED(61, getdents64)
SYSCALL_DELEGATED(62, lseek)
@ -114,14 +111,20 @@ SYSCALL_HANDLED(236, get_mempolicy)
SYSCALL_HANDLED(237, set_mempolicy)
SYSCALL_HANDLED(238, migrate_pages)
SYSCALL_HANDLED(239, move_pages)
#ifdef PERF_ENABLE
SYSCALL_HANDLED(241, perf_event_open)
#else // PERF_ENABLE
SYSCALL_DELEGATED(241, perf_event_open)
#endif // PERF_ENABLE
SYSCALL_HANDLED(260, wait4)
SYSCALL_HANDLED(270, process_vm_readv)
SYSCALL_HANDLED(271, process_vm_writev)
#ifdef PERF_ENABLE
SYSCALL_HANDLED(601, pmc_init)
SYSCALL_HANDLED(602, pmc_start)
SYSCALL_HANDLED(603, pmc_stop)
SYSCALL_HANDLED(604, pmc_reset)
#endif // PERF_ENABLE
SYSCALL_HANDLED(700, get_cpu_id)
#ifdef PROFILE_ENABLE
SYSCALL_HANDLED(__NR_profile, profile)
@ -138,9 +141,8 @@ SYSCALL_HANDLED(804, resume_threads)
SYSCALL_HANDLED(811, linux_spawn)
SYSCALL_DELEGATED(1024, open)
SYSCALL_DELEGATED(1026, unlink)
SYSCALL_DELEGATED(1035, readlink)
SYSCALL_HANDLED(1045, signalfd)
SYSCALL_DELEGATED(1049, stat)
SYSCALL_DELEGATED(1060, getpgrp)
SYSCALL_DELEGATED(1062, time)
SYSCALL_HANDLED(1062, time)

View File

@ -1,4 +1,4 @@
/* sysreg.h COPYRIGHT FUJITSU LIMITED 2016-2017 */
/* sysreg.h COPYRIGHT FUJITSU LIMITED 2016-2018 */
/*
* Macros for accessing system registers with older binutils.
*
@ -23,6 +23,7 @@
#include <types.h>
#include <stringify.h>
#include <ihk/types.h>
/*
* ARMv8 ARM reserves the following encoding for system registers:
@ -56,12 +57,6 @@
#define sys_reg_CRm(id) (((id) >> CRm_shift) & CRm_mask)
#define sys_reg_Op2(id) (((id) >> Op2_shift) & Op2_mask)
#ifdef __ASSEMBLY__
#define __emit_inst(x).inst (x)
#else
#define __emit_inst(x)".inst " __stringify((x)) "\n\t"
#endif
#define SYS_MIDR_EL1 sys_reg(3, 0, 0, 0, 0)
#define SYS_MPIDR_EL1 sys_reg(3, 0, 0, 0, 5)
#define SYS_REVIDR_EL1 sys_reg(3, 0, 0, 0, 6)
@ -143,6 +138,12 @@
#define ID_AA64ISAR0_SHA1_SHIFT 8
#define ID_AA64ISAR0_AES_SHIFT 4
/* id_aa64isar1 */
#define ID_AA64ISAR1_LRCPC_SHIFT 20
#define ID_AA64ISAR1_FCMA_SHIFT 16
#define ID_AA64ISAR1_JSCVT_SHIFT 12
#define ID_AA64ISAR1_DPB_SHIFT 0
/* id_aa64pfr0 */
#define ID_AA64PFR0_SVE_SHIFT 32
#define ID_AA64PFR0_GIC_SHIFT 24
@ -178,6 +179,14 @@
#define ID_AA64MMFR0_TGRAN64_SUPPORTED 0x0
#define ID_AA64MMFR0_TGRAN16_NI 0x0
#define ID_AA64MMFR0_TGRAN16_SUPPORTED 0x1
#define ID_AA64MMFR0_PARANGE_48 0x5
#define ID_AA64MMFR0_PARANGE_52 0x6
#ifdef CONFIG_ARM64_PA_BITS_52
#define ID_AA64MMFR0_PARANGE_MAX ID_AA64MMFR0_PARANGE_52
#else
#define ID_AA64MMFR0_PARANGE_MAX ID_AA64MMFR0_PARANGE_48
#endif
/* id_aa64mmfr1 */
#define ID_AA64MMFR1_PAN_SHIFT 20
@ -264,15 +273,46 @@
/* Safe value for MPIDR_EL1: Bit31:RES1, Bit30:U:0, Bit24:MT:0 */
#define SYS_MPIDR_SAFE_VAL (1UL << 31)
#ifdef __ASSEMBLY__
/* SYS_MIDR_EL1 */
//mask
#define SYS_MIDR_EL1_IMPLEMENTER_MASK (0xFFUL)
#define SYS_MIDR_EL1_PPNUM_MASK (0xFFFUL)
//shift
#define SYS_MIDR_EL1_IMPLEMENTER_SHIFT (24)
#define SYS_MIDR_EL1_PPNUM_SHIFT (0x4)
//val
#define SYS_MIDR_EL1_IMPLEMENTER_FJ (0x46)
#define SYS_MIDR_EL1_PPNUM_TCHIP (0x1)
#define READ_ACCESS (0)
#define WRITE_ACCESS (1)
#define ACCESS_REG_FUNC(name, reg) \
static void xos_access_##name(uint8_t flag, uint64_t *reg_value) \
{ \
if (flag == READ_ACCESS) { \
__asm__ __volatile__("mrs_s %0," __stringify(reg) "\n\t" \
:"=&r"(*reg_value)::); \
} \
else if (flag == WRITE_ACCESS) { \
__asm__ __volatile__("msr_s" __stringify(reg) ", %0\n\t" \
::"r"(*reg_value):); \
} else { \
; \
} \
}
#define XOS_FALSE (0)
#define XOS_TRUE (1)
#ifdef __ASSEMBLY__
#define __emit_inst(x).inst (x)
.irp num,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30
.equ .L__reg_num_x\num, \num
.endr
.equ .L__reg_num_xzr, 31
.macro mrs_s, rt, sreg
__emit_inst(0xd5200000|(\sreg)|(.L__reg_num_\rt))
__emit_inst(0xd5200000|(\sreg)|(.L__reg_num_\rt))
.endm
.macro msr_s, sreg, rt
@ -280,7 +320,7 @@
.endm
#else
#define __emit_inst(x)".inst " __stringify((x)) "\n\t"
asm(
" .irp num,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30\n"
" .equ .L__reg_num_x\\num, \\num\n"
@ -296,6 +336,28 @@ asm(
" .endm\n"
);
ACCESS_REG_FUNC(midr_el1, SYS_MIDR_EL1);
static int xos_is_tchip(void)
{
uint64_t reg = 0;
int ret = 0, impl = 0, part = 0;
xos_access_midr_el1(READ_ACCESS, &reg);
impl = (reg >> SYS_MIDR_EL1_IMPLEMENTER_SHIFT) &
SYS_MIDR_EL1_IMPLEMENTER_MASK;
part = (reg >> SYS_MIDR_EL1_PPNUM_SHIFT) & SYS_MIDR_EL1_PPNUM_MASK;
if ((impl == SYS_MIDR_EL1_IMPLEMENTER_FJ) &&
(part == SYS_MIDR_EL1_PPNUM_TCHIP)) {
ret = XOS_TRUE;
}
else {
ret = XOS_FALSE;
}
return ret;
}
#endif
/*
@ -336,4 +398,6 @@ asm(
/* @ref.impl arch/arm64/include/asm/kvm_arm.h */
#define CPTR_EL2_TZ (1 << 8)
#include "imp-sysreg.h"
#endif /* __ASM_SYSREG_H */

View File

@ -1,15 +1,22 @@
/* thread_info.h COPYRIGHT FUJITSU LIMITED 2015-2017 */
/* thread_info.h COPYRIGHT FUJITSU LIMITED 2015-2018 */
#ifndef __HEADER_ARM64_COMMON_THREAD_INFO_H
#define __HEADER_ARM64_COMMON_THREAD_INFO_H
#define KERNEL_STACK_SIZE 32768 /* 8 page */
#define MIN_KERNEL_STACK_SHIFT 15
#include <arch-memory.h>
#if (MIN_KERNEL_STACK_SHIFT < PAGE_SHIFT)
#define KERNEL_STACK_SHIFT PAGE_SHIFT
#else
#define KERNEL_STACK_SHIFT MIN_KERNEL_STACK_SHIFT
#endif
#define KERNEL_STACK_SIZE (UL(1) << KERNEL_STACK_SHIFT)
#define THREAD_START_SP KERNEL_STACK_SIZE - 16
#ifndef __ASSEMBLY__
#define ALIGN_UP(x, align) ALIGN_DOWN((x) + (align) - 1, align)
#define ALIGN_DOWN(x, align) ((x) & ~((align) - 1))
#include <process.h>
#include <prctl.h>
@ -53,8 +60,8 @@ struct thread_info {
struct arm64_cpu_local_thread {
struct thread_info thread_info;
unsigned long paniced; /* 136 */
uint64_t panic_regs[34]; /* 144 */
unsigned long paniced;
uint64_t panic_regs[34];
};
union arm64_cpu_local_variables {

View File

@ -1,8 +1,22 @@
/* virt.h COPYRIGHT FUJITSU LIMITED 2015 */
/* virt.h COPYRIGHT FUJITSU LIMITED 2015-2017 */
#ifndef __HEADER_ARM64_COMMON_VIRT_H
#define __HEADER_ARM64_COMMON_VIRT_H
/* @ref.impl linux-v4.15-rc3 arch/arm64/include/asm/virt.h */
#define BOOT_CPU_MODE_EL1 (0xe11)
#define BOOT_CPU_MODE_EL2 (0xe12)
#ifndef __ASSEMBLY__
#include <sysreg.h>
#include <ptrace.h>
/* @ref.impl linux-v4.15-rc3 arch/arm64/include/asm/virt.h */
static inline int is_kernel_in_hyp_mode(void)
{
return read_sysreg(CurrentEL) == CurrentEL_EL2;
}
#endif /* !__ASSEMBLY__ */
#endif /* !__HEADER_ARM64_COMMON_VIRT_H */

View File

@ -1,21 +1,21 @@
/* irq-gic-v2.c COPYRIGHT FUJITSU LIMITED 2015-2016 */
/* irq-gic-v2.c COPYRIGHT FUJITSU LIMITED 2015-2018 */
#include <ihk/cpu.h>
#include <irq.h>
#include <arm-gic-v2.h>
#include <io.h>
#include <arch/cpu.h>
#include <memory.h>
#include <affinity.h>
#include <syscall.h>
#include <debug.h>
#include <arch-timer.h>
#include <cls.h>
// #define DEBUG_GICV2
#ifdef DEBUG_GICV2
#define dkprintf(...) kprintf(__VA_ARGS__)
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...)
#define ekprintf(...) kprintf(__VA_ARGS__)
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
void *dist_base;
@ -54,17 +54,11 @@ static void arm64_raise_sgi_gicv2(unsigned int cpuid, unsigned int vector)
* arm64_raise_spi_gicv2
* @ref.impl nothing.
*/
extern unsigned int ihk_ikc_irq_apicid;
static void arm64_raise_spi_gicv2(unsigned int cpuid, unsigned int vector)
{
uint64_t spi_reg_offset;
uint32_t spi_set_pending_bitpos;
if (cpuid != ihk_ikc_irq_apicid) {
ekprintf("SPI(irq#%d) cannot send other than the host.\n", vector);
return;
}
/**
* calculates register offset and bit position corresponding to the numbers.
*
@ -111,8 +105,9 @@ extern int interrupt_from_user(void *);
void handle_interrupt_gicv2(struct pt_regs *regs)
{
unsigned int irqstat, irqnr;
const int from_user = interrupt_from_user(regs);
set_cputime(interrupt_from_user(regs)? 1: 2);
set_cputime(from_user ? CPUTIME_MODE_U2K : CPUTIME_MODE_K2K_IN);
do {
// get GICC_IAR.InterruptID
irqstat = readl_relaxed(cpu_base + GIC_CPU_INTACK);
@ -132,7 +127,13 @@ void handle_interrupt_gicv2(struct pt_regs *regs)
*/
break;
} while (1);
set_cputime(0);
set_cputime(from_user ? CPUTIME_MODE_K2U : CPUTIME_MODE_K2K_OUT);
/* for migration by IPI */
if (get_this_cpu_local_var()->flags & CPU_FLAG_NEED_MIGRATE) {
schedule();
check_signal(0, regs, 0);
}
}
void gic_dist_init_gicv2(unsigned long dist_base_pa, unsigned long size)
@ -149,10 +150,6 @@ void gic_enable_gicv2(void)
{
unsigned int enable_ppi_sgi = 0;
if (is_use_virt_timer()) {
enable_ppi_sgi |= GICD_ENABLE << get_virt_timer_intrid();
} else {
enable_ppi_sgi |= GICD_ENABLE << get_phys_timer_intrid();
}
enable_ppi_sgi |= GICD_ENABLE << get_timer_intrid();
writel_relaxed(enable_ppi_sgi, dist_base + GIC_DIST_ENABLE_SET);
}

View File

@ -1,5 +1,4 @@
/* irq-gic-v3.c COPYRIGHT FUJITSU LIMITED 2015-2017 */
/* irq-gic-v3.c COPYRIGHT FUJITSU LIMITED 2015-2018 */
#include <irq.h>
#include <arm-gic-v2.h>
#include <arm-gic-v3.h>
@ -7,17 +6,17 @@
#include <cputype.h>
#include <process.h>
#include <syscall.h>
#include <debug.h>
#include <arch-timer.h>
#include <cls.h>
//#define DEBUG_GICV3
#define USE_CAVIUM_THUNDER_X
#ifdef DEBUG_GICV3
#define dkprintf(...) kprintf(__VA_ARGS__)
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...)
#define ekprintf(...) kprintf(__VA_ARGS__)
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
#ifdef USE_CAVIUM_THUNDER_X
@ -266,6 +265,7 @@ static void arm64_raise_spi_gicv3(uint32_t cpuid, uint32_t vector)
static void arm64_raise_lpi_gicv3(uint32_t cpuid, uint32_t vector)
{
// @todo.impl
ekprintf("%s called.\n", __func__);
}
void arm64_issue_ipi_gicv3(uint32_t cpuid, uint32_t vector)
@ -283,7 +283,7 @@ void arm64_issue_ipi_gicv3(uint32_t cpuid, uint32_t vector)
// send LPI (allow only to host)
arm64_raise_lpi_gicv3(cpuid, vector);
} else {
ekprintf("#%d is bad irq number.", vector);
ekprintf("#%d is bad irq number.\n", vector);
}
}
@ -291,10 +291,11 @@ extern int interrupt_from_user(void *);
void handle_interrupt_gicv3(struct pt_regs *regs)
{
uint64_t irqnr;
const int from_user = interrupt_from_user(regs);
irqnr = gic_read_iar();
cpu_enable_nmi();
set_cputime(interrupt_from_user(regs)? 1: 2);
set_cputime(from_user ? CPUTIME_MODE_U2K : CPUTIME_MODE_K2K_IN);
while (irqnr != ICC_IAR1_EL1_SPURIOUS) {
if ((irqnr < 1020) || (irqnr >= 8192)) {
gic_write_eoir(irqnr);
@ -302,11 +303,51 @@ void handle_interrupt_gicv3(struct pt_regs *regs)
}
irqnr = gic_read_iar();
}
set_cputime(0);
set_cputime(from_user ? CPUTIME_MODE_K2U : CPUTIME_MODE_K2K_OUT);
/* for migration by IPI */
if (get_this_cpu_local_var()->flags & CPU_FLAG_NEED_MIGRATE) {
schedule();
check_signal(0, regs, 0);
}
}
static uint64_t gic_mpidr_to_affinity(unsigned long mpidr)
{
uint64_t aff;
aff = ((uint64_t)MPIDR_AFFINITY_LEVEL(mpidr, 3) << 32 |
MPIDR_AFFINITY_LEVEL(mpidr, 2) << 16 |
MPIDR_AFFINITY_LEVEL(mpidr, 1) << 8 |
MPIDR_AFFINITY_LEVEL(mpidr, 0));
return aff;
}
static void init_spi_routing(uint32_t irq, uint32_t linux_cpu)
{
uint64_t spi_route_reg_val, spi_route_reg_offset;
if (irq < 32 || 1020 <= irq) {
ekprintf("%s: irq is not spi number. (irq=%d)\n",
__func__, irq);
return;
}
/* write to GICD_IROUTER */
spi_route_reg_offset = irq * 8;
spi_route_reg_val = gic_mpidr_to_affinity(cpu_logical_map(linux_cpu));
writeq_relaxed(spi_route_reg_val,
(void *)(dist_base + GICD_IROUTER +
spi_route_reg_offset));
}
void gic_dist_init_gicv3(unsigned long dist_base_pa, unsigned long size)
{
extern int spi_table[];
extern int nr_spi_table;
int i;
dist_base = map_fixed_area(dist_base_pa, size, 1 /*non chachable*/);
#ifdef USE_CAVIUM_THUNDER_X
@ -315,6 +356,14 @@ void gic_dist_init_gicv3(unsigned long dist_base_pa, unsigned long size)
is_cavium_thunderx = 1;
}
#endif
/* initialize spi routing */
for (i = 0; i < nr_spi_table; i++) {
if (spi_table[i] == -1) {
continue;
}
init_spi_routing(spi_table[i], i);
}
}
void gic_cpu_init_gicv3(unsigned long cpu_base_pa, unsigned long size)
@ -351,11 +400,23 @@ void gic_enable_gicv3(void)
void *rd_sgi_base = rbase + 0x10000 /* SZ_64K */;
int i;
unsigned int enable_ppi_sgi = GICD_INT_EN_SET_SGI;
extern int ihk_param_nr_pmu_irq_affi;
extern int ihk_param_pmu_irq_affi[CONFIG_SMP_MAX_CORES];
if (is_use_virt_timer()) {
enable_ppi_sgi |= GICD_ENABLE << get_virt_timer_intrid();
} else {
enable_ppi_sgi |= GICD_ENABLE << get_phys_timer_intrid();
enable_ppi_sgi |= GICD_ENABLE << get_timer_intrid();
if (0 < ihk_param_nr_pmu_irq_affi) {
for (i = 0; i < ihk_param_nr_pmu_irq_affi; i++) {
if ((0 <= ihk_param_pmu_irq_affi[i]) &&
(ihk_param_pmu_irq_affi[i] <
sizeof(enable_ppi_sgi) * BITS_PER_BYTE)) {
enable_ppi_sgi |= GICD_ENABLE <<
ihk_param_pmu_irq_affi[i];
}
}
}
else {
enable_ppi_sgi |= GICD_ENABLE << INTRID_PERF_OVF;
}
/*
@ -368,9 +429,10 @@ void gic_enable_gicv3(void)
/*
* Set priority on PPI and SGI interrupts
*/
for (i = 0; i < 32; i += 4)
for (i = 0; i < 32; i += 4) {
writel_relaxed(GICD_INT_DEF_PRI_X4,
rd_sgi_base + GIC_DIST_PRI + i * 4 / 4);
rd_sgi_base + GIC_DIST_PRI + i);
}
/* sync wait */
gic_do_wait_for_rwp(rbase);
@ -406,9 +468,12 @@ void gic_enable_gicv3(void)
gic_write_bpr1(0);
/* Set specific IPI to NMI */
writeb_relaxed(GICD_INT_NMI_PRI, rd_sgi_base + GIC_DIST_PRI + INTRID_CPU_STOP);
writeb_relaxed(GICD_INT_NMI_PRI, rd_sgi_base + GIC_DIST_PRI + INTRID_MEMDUMP);
writeb_relaxed(GICD_INT_NMI_PRI, rd_sgi_base + GIC_DIST_PRI + INTRID_STACK_TRACE);
writeb_relaxed(GICD_INT_NMI_PRI,
rd_sgi_base + GIC_DIST_PRI + INTRID_CPU_STOP);
writeb_relaxed(GICD_INT_NMI_PRI,
rd_sgi_base + GIC_DIST_PRI + INTRID_MULTI_NMI);
writeb_relaxed(GICD_INT_NMI_PRI,
rd_sgi_base + GIC_DIST_PRI + INTRID_STACK_TRACE);
/* sync wait */
gic_do_wait_for_rwp(rbase);

View File

@ -1,4 +1,4 @@
/* local.c COPYRIGHT FUJITSU LIMITED 2015-2016 */
/* local.c COPYRIGHT FUJITSU LIMITED 2015-2018 */
#include <cpulocal.h>
#include <ihk/atomic.h>
#include <ihk/mm.h>
@ -7,24 +7,31 @@
#include <registers.h>
#include <string.h>
#define LOCALS_SPAN (8 * PAGE_SIZE)
/* BSP initialized stack area */
union arm64_cpu_local_variables init_thread_info __attribute__((aligned(KERNEL_STACK_SIZE)));
/* BSP/AP idle stack pointer head */
static union arm64_cpu_local_variables *locals;
size_t arm64_cpu_local_variables_span = LOCALS_SPAN; /* for debugger */
size_t arm64_cpu_local_variables_span = KERNEL_STACK_SIZE; /* for debugger */
/* allocate & initialize BSP/AP idle stack */
void init_processors_local(int max_id)
{
int i = 0;
const int sz = (max_id + 1) * KERNEL_STACK_SIZE;
union arm64_cpu_local_variables *tmp;
const int npages = ((max_id + 1) *
(ALIGN_UP(KERNEL_STACK_SIZE, PAGE_SIZE) >>
PAGE_SHIFT));
if (npages < 1) {
panic("idle kernel stack allocation failed.");
}
/* allocate one more for alignment */
locals = ihk_mc_alloc_pages(((sz + PAGE_SIZE - 1) / PAGE_SIZE), IHK_MC_AP_CRITICAL);
locals = ihk_mc_alloc_pages(npages, IHK_MC_AP_CRITICAL);
if (locals == NULL) {
panic("idle kernel stack allocation failed.");
}
locals = (union arm64_cpu_local_variables *)ALIGN_UP((unsigned long)locals, KERNEL_STACK_SIZE);
/* clear struct process, struct process_vm, struct thread_info area */

File diff suppressed because it is too large Load Diff

View File

@ -19,7 +19,7 @@ int ihk_mc_ikc_init_first_local(struct ihk_ikc_channel_desc *channel,
memset(channel, 0, sizeof(struct ihk_ikc_channel_desc));
mikc_queue_pages = ((2 * num_processors * MASTER_IKCQ_PKTSIZE)
mikc_queue_pages = ((4 * num_processors * MASTER_IKCQ_PKTSIZE)
+ (PAGE_SIZE - 1)) / PAGE_SIZE;
/* Place both sides in this side */

View File

@ -1,4 +1,4 @@
/* perfctr.c COPYRIGHT FUJITSU LIMITED 2015-2017 */
/* perfctr.c COPYRIGHT FUJITSU LIMITED 2015-2018 */
#include <arch-perfctr.h>
#include <ihk/perfctr.h>
#include <mc_perf_event.h>
@ -6,32 +6,60 @@
#include <ihk/debug.h>
#include <registers.h>
#include <string.h>
#include <ihk/mm.h>
#include <irq.h>
/*
* @ref.impl arch/arm64/kernel/perf_event.c
* Set at runtime when we know what CPU type we are.
*/
struct arm_pmu cpu_pmu;
extern int ihk_param_pmu_irq_affiniry[CONFIG_SMP_MAX_CORES];
extern int ihk_param_nr_pmu_irq_affiniry;
extern int ihk_param_pmu_irq_affi[CONFIG_SMP_MAX_CORES];
extern int ihk_param_nr_pmu_irq_affi;
int arm64_init_perfctr(void)
{
int ret;
int i;
int pages;
const struct ihk_mc_cpu_info *cpu_info;
memset(&cpu_pmu, 0, sizeof(cpu_pmu));
ret = armv8pmu_init(&cpu_pmu);
if (!ret) {
if (ret) {
return ret;
}
for (i = 0; i < ihk_param_nr_pmu_irq_affiniry; i++) {
ret = ihk_mc_register_interrupt_handler(ihk_param_pmu_irq_affiniry[i], cpu_pmu.handler);
cpu_info = ihk_mc_get_cpu_info();
pages = (sizeof(struct per_cpu_arm_pmu) * cpu_info->ncpus +
PAGE_SIZE - 1) >> PAGE_SHIFT;
cpu_pmu.per_cpu = ihk_mc_alloc_pages(pages, IHK_MC_AP_NOWAIT);
if (cpu_pmu.per_cpu == NULL) {
return -ENOMEM;
}
memset(cpu_pmu.per_cpu, 0, pages * PAGE_SIZE);
if (0 < ihk_param_nr_pmu_irq_affi) {
for (i = 0; i < ihk_param_nr_pmu_irq_affi; i++) {
ret = ihk_mc_register_interrupt_handler(ihk_param_pmu_irq_affi[i],
cpu_pmu.handler);
if (ret) {
break;
}
}
}
else {
ret = ihk_mc_register_interrupt_handler(INTRID_PERF_OVF,
cpu_pmu.handler);
}
return ret;
}
void arm64_init_per_cpu_perfctr(void)
{
armv8pmu_per_cpu_init(&cpu_pmu.per_cpu[ihk_mc_get_processor_id()]);
}
int arm64_enable_pmu(void)
{
int ret;
@ -47,11 +75,21 @@ void arm64_disable_pmu(void)
cpu_pmu.disable_pmu();
}
void arm64_enable_user_access_pmu_regs(void)
{
cpu_pmu.enable_user_access_pmu_regs();
}
void arm64_disable_user_access_pmu_regs(void)
{
cpu_pmu.disable_user_access_pmu_regs();
}
extern unsigned int *arm64_march_perfmap;
static int __ihk_mc_perfctr_init(int counter, uint32_t type, uint64_t config, int mode)
{
int ret;
int ret = -1;
unsigned long config_base = 0;
int mapping;
@ -61,17 +99,17 @@ static int __ihk_mc_perfctr_init(int counter, uint32_t type, uint64_t config, in
}
ret = cpu_pmu.disable_counter(counter);
if (!ret) {
if (ret < 0) {
return ret;
}
ret = cpu_pmu.enable_intens(counter);
if (!ret) {
if (ret < 0) {
return ret;
}
ret = cpu_pmu.set_event_filter(&config_base, mode);
if (!ret) {
if (ret) {
return ret;
}
config_base |= (unsigned long)mapping;
@ -93,20 +131,48 @@ int ihk_mc_perfctr_init(int counter, uint64_t config, int mode)
return ret;
}
int ihk_mc_perfctr_start(int counter)
int ihk_mc_perfctr_start(unsigned long counter_mask)
{
int ret;
ret = cpu_pmu.enable_counter(counter);
int ret = 0, i;
for (i = 0; i < sizeof(counter_mask) * BITS_PER_BYTE; i++) {
if (counter_mask & (1UL << i)) {
ret = cpu_pmu.enable_counter(i);
if (ret < 0) {
kprintf("%s: enable failed(idx=%d)\n",
__func__, i);
break;
}
}
}
return ret;
}
int ihk_mc_perfctr_stop(int counter)
int ihk_mc_perfctr_stop(unsigned long counter_mask, int flags)
{
cpu_pmu.disable_counter(counter);
int i = 0;
// ihk_mc_perfctr_startが呼ばれるときには、
// init系関数が呼ばれるのでdisableにする。
cpu_pmu.disable_intens(counter);
for (i = 0; i < sizeof(counter_mask) * BITS_PER_BYTE; i++) {
if (!(counter_mask & (1UL << i)))
continue;
int ret = 0;
ret = cpu_pmu.disable_counter(i);
if (ret < 0) {
continue;
}
if (flags & IHK_MC_PERFCTR_DISABLE_INTERRUPT) {
// when ihk_mc_perfctr_start is called,
// ihk_mc_perfctr_init is also called so disable
// interrupt
ret = cpu_pmu.disable_intens(i);
if (ret < 0) {
continue;
}
}
}
return 0;
}
@ -117,8 +183,7 @@ int ihk_mc_perfctr_reset(int counter)
return 0;
}
//int ihk_mc_perfctr_set(int counter, unsigned long val)
int ihk_mc_perfctr_set(int counter, long val) /* 0416_patchtemp */
int ihk_mc_perfctr_set(int counter, long val)
{
// TODO[PMU]: 共通部でサンプリングレートの計算をして、設定するカウンタ値をvalに渡してくるようになると想定。サンプリングレートの扱いを見てから本実装。
uint32_t v = val;
@ -140,17 +205,45 @@ unsigned long ihk_mc_perfctr_read(int counter)
return count;
}
//int ihk_mc_perfctr_alloc_counter(unsigned long pmc_status)
int ihk_mc_perfctr_alloc_counter(unsigned int *type, unsigned long *config, unsigned long pmc_status) /* 0416_patchtemp */
int ihk_mc_perfctr_alloc_counter(unsigned int *type, unsigned long *config,
unsigned long pmc_status)
{
int ret;
ret = cpu_pmu.get_event_idx(cpu_pmu.num_events, pmc_status);
if (*type == PERF_TYPE_HARDWARE) {
switch (*config) {
case PERF_COUNT_HW_INSTRUCTIONS:
ret = cpu_pmu.map_event(*type, *config);
if (ret < 0) {
return -1;
}
*type = PERF_TYPE_RAW;
break;
default:
// Unexpected config
return -1;
}
}
else if (*type != PERF_TYPE_RAW) {
return -1;
}
ret = cpu_pmu.get_event_idx(get_per_cpu_pmu()->num_events, pmc_status,
*config);
return ret;
}
/* 0416_patchtemp */
/* ihk_mc_perfctr_fixed_init() stub added. */
int ihk_mc_perfctr_fixed_init(int counter, int mode)
int ihk_mc_perf_counter_mask_check(unsigned long counter_mask)
{
return -1;
return 1;
}
int ihk_mc_perf_get_num_counters(void)
{
return cpu_pmu.per_cpu[ihk_mc_get_processor_id()].num_events;
}
int ihk_mc_perfctr_set_extra(struct mc_perf_event *event)
{
/* Nothing to do. */
return 0;
}

File diff suppressed because it is too large Load Diff

View File

@ -30,7 +30,7 @@
*/
#if defined(CONFIG_HAS_NMI)
#include <arm-gic-v3.h>
ENTRY(cpu_do_idle)
ENTRY(__cpu_do_idle)
mrs x0, daif // save I bit
msr daifset, #2 // set I bit
mrs_s x1, ICC_PMR_EL1 // save PMR
@ -41,13 +41,13 @@ ENTRY(cpu_do_idle)
msr_s ICC_PMR_EL1, x1 // restore PMR
msr daif, x0 // restore I bit
ret
ENDPROC(cpu_do_idle)
ENDPROC(__cpu_do_idle)
#else /* defined(CONFIG_HAS_NMI) */
ENTRY(cpu_do_idle)
ENTRY(__cpu_do_idle)
dsb sy // WFI may enter a low-power mode
wfi
ret
ENDPROC(cpu_do_idle)
ENDPROC(__cpu_do_idle)
#endif /* defined(CONFIG_HAS_NMI) */
/*

View File

@ -1,4 +1,4 @@
/* psci.c COPYRIGHT FUJITSU LIMITED 2015-2016 */
/* psci.c COPYRIGHT FUJITSU LIMITED 2015-2018 */
/* @ref.impl arch/arm64/kernel/psci.c */
/*
* This program is free software; you can redistribute it and/or modify
@ -21,15 +21,13 @@
#include <ihk/debug.h>
#include <compiler.h>
#include <lwk/compiler.h>
#include <debug.h>
//#define DEBUG_PRINT_PSCI
#ifdef DEBUG_PRINT_PSCI
#define dkprintf(...) kprintf(__VA_ARGS__)
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
#define PSCI_POWER_STATE_TYPE_POWER_DOWN 1

View File

@ -1,4 +1,4 @@
/* ptrace.c COPYRIGHT FUJITSU LIMITED 2016-2017 */
/* ptrace.c COPYRIGHT FUJITSU LIMITED 2016-2018 */
#include <errno.h>
#include <debug-monitors.h>
#include <hw_breakpoint.h>
@ -11,24 +11,18 @@
#include <hwcap.h>
#include <string.h>
#include <thread_info.h>
#include <debug.h>
//#define DEBUG_PRINT_SC
#ifdef DEBUG_PRINT_SC
#define dkprintf kprintf
#define ekprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
#define NOT_IMPLEMENTED() do { kprintf("%s is not implemented\n", __func__); while(1);} while(0)
#define BUG_ON(condition) do { if (condition) { kprintf("PANIC: %s: %s(line:%d)\n",\
__FILE__, __FUNCTION__, __LINE__); panic(""); } } while(0)
extern void save_debugreg(unsigned long *debugreg);
extern unsigned long do_kill(struct thread *thread, int pid, int tid, int sig, struct siginfo *info, int ptracecont);
extern int interrupt_from_user(void *);
enum aarch64_regset {
@ -953,27 +947,32 @@ void ptrace_report_signal(struct thread *thread, int sig)
}
mcs_rwlock_writer_lock(&proc->update_lock, &lock);
if(!(proc->ptrace & PT_TRACED)){
if (!(thread->ptrace & PT_TRACED)) {
mcs_rwlock_writer_unlock(&proc->update_lock, &lock);
return;
}
thread->exit_status = sig;
/* Transition thread state */
#ifdef POSTK_DEBUG_TEMP_FIX_41 /* early to wait4() wakeup for ptrace, fix. */
proc->status = PS_DELAY_TRACED;
#else /* POSTK_DEBUG_TEMP_FIX_41 */
proc->status = PS_TRACED;
#endif /* POSTK_DEBUG_TEMP_FIX_41 */
thread->exit_status = sig;
thread->status = PS_TRACED;
proc->ptrace &= ~PT_TRACE_SYSCALL;
if (sig == SIGSTOP || sig == SIGTSTP ||
sig == SIGTTIN || sig == SIGTTOU) {
proc->signal_flags |= SIGNAL_STOP_STOPPED;
} else {
proc->signal_flags &= ~SIGNAL_STOP_STOPPED;
}
parent_pid = proc->parent->pid;
thread->ptrace &= ~PT_TRACE_SYSCALL;
save_debugreg(thread->ptrace_debugreg);
if (sig == SIGSTOP || sig == SIGTSTP ||
sig == SIGTTIN || sig == SIGTTOU) {
thread->signal_flags |= SIGNAL_STOP_STOPPED;
}
else {
thread->signal_flags &= ~SIGNAL_STOP_STOPPED;
}
if (thread == proc->main_thread) {
proc->status = PS_DELAY_TRACED;
parent_pid = proc->parent->pid;
}
else {
parent_pid = thread->report_proc->pid;
waitq_wakeup(&thread->report_proc->waitpid_q);
}
mcs_rwlock_writer_unlock(&proc->update_lock, &lock);
memset(&info, '\0', sizeof info);
@ -982,10 +981,6 @@ void ptrace_report_signal(struct thread *thread, int sig)
info._sifields._sigchld.si_pid = thread->tid;
info._sifields._sigchld.si_status = thread->exit_status;
do_kill(cpu_local_var(current), parent_pid, -1, SIGCHLD, &info, 0);
#ifndef POSTK_DEBUG_TEMP_FIX_41 /* early to wait4() wakeup for ptrace, fix. */
/* Wake parent (if sleeping in wait4()) */
waitq_wakeup(&proc->parent->waitpid_q);
#endif /* !POSTK_DEBUG_TEMP_FIX_41 */
dkprintf("ptrace_report_signal,sleeping\n");
/* Sleep */

File diff suppressed because it is too large Load Diff

201
arch/arm64/kernel/timer.c Normal file
View File

@ -0,0 +1,201 @@
/* timer.c COPYRIGHT FUJITSU LIMITED 2018 */
#include <ihk/types.h>
#include <ihk/cpu.h>
#include <ihk/lock.h>
#include <sysreg.h>
#include <kmalloc.h>
#include <cls.h>
#include <cputype.h>
#include <irq.h>
#include <arch-timer.h>
#include <debug.h>
//#define DEBUG_PRINT_TIMER
#ifdef DEBUG_PRINT_TIMER
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
static unsigned int per_cpu_timer_val[NR_CPUS] = { 0 };
static int timer_intrid = INTRID_VIRT_TIMER;
static void arch_timer_virt_reg_write(enum arch_timer_reg reg, uint32_t val);
static void (*arch_timer_reg_write)(enum arch_timer_reg, uint32_t) =
arch_timer_virt_reg_write;
static uint32_t arch_timer_virt_reg_read(enum arch_timer_reg reg);
static uint32_t (*arch_timer_reg_read)(enum arch_timer_reg) =
arch_timer_virt_reg_read;
static void arch_timer_phys_reg_write(enum arch_timer_reg reg, uint32_t val)
{
switch (reg) {
case ARCH_TIMER_REG_CTRL:
write_sysreg(val, cntp_ctl_el0);
break;
case ARCH_TIMER_REG_TVAL:
write_sysreg(val, cntp_tval_el0);
break;
}
isb();
}
static void arch_timer_virt_reg_write(enum arch_timer_reg reg, uint32_t val)
{
switch (reg) {
case ARCH_TIMER_REG_CTRL:
write_sysreg(val, cntv_ctl_el0);
break;
case ARCH_TIMER_REG_TVAL:
write_sysreg(val, cntv_tval_el0);
break;
}
isb();
}
static uint32_t arch_timer_phys_reg_read(enum arch_timer_reg reg)
{
uint32_t val = 0;
switch (reg) {
case ARCH_TIMER_REG_CTRL:
val = read_sysreg(cntp_ctl_el0);
break;
case ARCH_TIMER_REG_TVAL:
val = read_sysreg(cntp_tval_el0);
break;
}
return val;
}
static uint32_t arch_timer_virt_reg_read(enum arch_timer_reg reg)
{
uint32_t val = 0;
switch (reg) {
case ARCH_TIMER_REG_CTRL:
val = read_sysreg(cntv_ctl_el0);
break;
case ARCH_TIMER_REG_TVAL:
val = read_sysreg(cntv_tval_el0);
break;
}
return val;
}
static void timer_handler(void *priv)
{
unsigned long ctrl;
const int cpu = ihk_mc_get_processor_id();
dkprintf("CPU%d: catch %s timer\n", cpu,
((timer_intrid == INTRID_PHYS_TIMER) ||
(timer_intrid == INTRID_HYP_PHYS_TIMER)) ?
"physical" : "virtual");
ctrl = arch_timer_reg_read(ARCH_TIMER_REG_CTRL);
if (ctrl & ARCH_TIMER_CTRL_IT_STAT) {
const unsigned int clocks = per_cpu_timer_val[cpu];
struct cpu_local_var *v = get_this_cpu_local_var();
unsigned long irqstate;
/* set resched flag */
irqstate = ihk_mc_spinlock_lock(&v->runq_lock);
v->flags |= CPU_FLAG_NEED_RESCHED;
ihk_mc_spinlock_unlock(&v->runq_lock, irqstate);
/* gen control register value */
ctrl &= ~ARCH_TIMER_CTRL_IT_STAT;
/* set timer re-enable for periodic */
arch_timer_reg_write(ARCH_TIMER_REG_TVAL, clocks);
arch_timer_reg_write(ARCH_TIMER_REG_CTRL, ctrl);
}
}
static unsigned long is_use_virt_timer(void)
{
extern unsigned long ihk_param_use_virt_timer;
switch (ihk_param_use_virt_timer) {
case 0: /* physical */
case 1: /* virtual */
break;
default: /* invalid */
panic("PANIC: is_use_virt_timer(): timer select neither phys-timer nor virt-timer.\n");
break;
}
return ihk_param_use_virt_timer;
}
static struct ihk_mc_interrupt_handler timer_interrupt_handler = {
.func = timer_handler,
.priv = NULL,
};
/* other source use functions */
struct ihk_mc_interrupt_handler *get_timer_handler(void)
{
return &timer_interrupt_handler;
}
void
lapic_timer_enable(unsigned int clocks)
{
unsigned long ctrl = 0;
/* gen control register value */
ctrl = arch_timer_reg_read(ARCH_TIMER_REG_CTRL);
ctrl |= ARCH_TIMER_CTRL_ENABLE;
ctrl &= ~(ARCH_TIMER_CTRL_IT_MASK | ARCH_TIMER_CTRL_IT_STAT);
arch_timer_reg_write(ARCH_TIMER_REG_TVAL, clocks);
arch_timer_reg_write(ARCH_TIMER_REG_CTRL, ctrl);
per_cpu_timer_val[ihk_mc_get_processor_id()] = clocks;
}
void
lapic_timer_disable()
{
unsigned long ctrl = 0;
ctrl = arch_timer_reg_read(ARCH_TIMER_REG_CTRL);
ctrl &= ~ARCH_TIMER_CTRL_ENABLE;
arch_timer_reg_write(ARCH_TIMER_REG_CTRL, ctrl);
per_cpu_timer_val[ihk_mc_get_processor_id()] = 0;
}
int get_timer_intrid(void)
{
return timer_intrid;
}
void arch_timer_init(void)
{
const unsigned long is_virt = is_use_virt_timer();
#ifdef CONFIG_ARM64_VHE
const unsigned long mmfr = read_cpuid(ID_AA64MMFR1_EL1);
#endif /* CONFIG_ARM64_VHE */
if (is_virt) {
timer_intrid = INTRID_VIRT_TIMER;
arch_timer_reg_write = arch_timer_virt_reg_write;
arch_timer_reg_read = arch_timer_virt_reg_read;
} else {
timer_intrid = INTRID_PHYS_TIMER;
arch_timer_reg_write = arch_timer_phys_reg_write;
arch_timer_reg_read = arch_timer_phys_reg_read;
}
#ifdef CONFIG_ARM64_VHE
if ((mmfr >> ID_AA64MMFR1_VHE_SHIFT) & 1UL) {
if (is_virt) {
timer_intrid = INTRID_HYP_VIRT_TIMER;
} else {
timer_intrid = INTRID_HYP_PHYS_TIMER;
}
}
#endif /* CONFIG_ARM64_VHE */
}

View File

@ -1,4 +1,4 @@
/* traps.c COPYRIGHT FUJITSU LIMITED 2015-2017 */
/* traps.c COPYRIGHT FUJITSU LIMITED 2015-2018 */
#include <ihk/context.h>
#include <ihk/debug.h>
#include <traps.h>
@ -29,12 +29,14 @@ void arm64_notify_die(const char *str, struct pt_regs *regs, struct siginfo *inf
*/
void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs)
{
const int from_user = interrupt_from_user(regs);
// /* TODO: implement lazy context saving/restoring */
set_cputime(1);
set_cputime(from_user ? CPUTIME_MODE_U2K : CPUTIME_MODE_K2K_IN);
// WARN_ON(1);
kprintf("WARNING: CPU: %d PID: %d Trapped FP/ASIMD access.\n",
ihk_mc_get_processor_id(), cpu_local_var(current)->proc->pid);
set_cputime(0);
set_cputime(from_user ? CPUTIME_MODE_K2U : CPUTIME_MODE_K2K_OUT);
}
/*
@ -51,7 +53,9 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
{
siginfo_t info;
unsigned int si_code = 0;
set_cputime(1);
const int from_user = interrupt_from_user(regs);
set_cputime(from_user ? CPUTIME_MODE_U2K : CPUTIME_MODE_K2K_IN);
if (esr & FPEXC_IOF)
si_code = FPE_FLTINV;
@ -70,7 +74,7 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
info._sifields._sigfault.si_addr = (void*)regs->pc;
set_signal(SIGFPE, regs, &info);
set_cputime(0);
set_cputime(from_user ? CPUTIME_MODE_K2U : CPUTIME_MODE_K2K_OUT);
}
/* @ref.impl arch/arm64/kernel/traps.c */
@ -133,8 +137,9 @@ exit:
void do_undefinstr(struct pt_regs *regs)
{
siginfo_t info;
const int from_user = interrupt_from_user(regs);
set_cputime(interrupt_from_user(regs)? 1: 2);
set_cputime(from_user ? CPUTIME_MODE_U2K : CPUTIME_MODE_K2K_IN);
if (call_undef_hook(regs) == 0) {
goto out;
@ -147,7 +152,7 @@ void do_undefinstr(struct pt_regs *regs)
arm64_notify_die("Oops - undefined instruction", regs, &info, 0);
out:
set_cputime(0);
set_cputime(from_user ? CPUTIME_MODE_K2U : CPUTIME_MODE_K2K_OUT);
}
/*
@ -157,7 +162,9 @@ out:
void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)
{
siginfo_t info;
set_cputime(interrupt_from_user(regs)? 1: 2);
const int from_user = interrupt_from_user(regs);
set_cputime(from_user ? CPUTIME_MODE_U2K : CPUTIME_MODE_K2K_IN);
kprintf("entering bad_mode !! (regs:0x%p, reason:%d, esr:0x%x)\n", regs, reason, esr);
kprintf("esr Analyse:\n");
@ -173,5 +180,5 @@ void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)
info._sifields._sigfault.si_addr = (void*)regs->pc;
arm64_notify_die("Oops - bad mode", regs, &info, 0);
set_cputime(0);
set_cputime(from_user ? CPUTIME_MODE_K2U : CPUTIME_MODE_K2K_OUT);
}

View File

@ -1,4 +1,4 @@
/* vdso.c COPYRIGHT FUJITSU LIMITED 2016 */
/* vdso.c COPYRIGHT FUJITSU LIMITED 2016-2018 */
/* @ref.impl arch/arm64/kernel/vdso.c */
#include <arch-memory.h>
@ -14,15 +14,13 @@
#include <ihk/debug.h>
#include <ikc/queue.h>
#include <vdso.h>
#include <debug.h>
//#define DEBUG_PRINT_VDSO
#ifdef DEBUG_PRINT_VDSO
#define dkprintf(...) kprintf(__VA_ARGS__)
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
#ifdef POSTK_DEBUG_ARCH_DEP_52
@ -90,26 +88,8 @@ int arch_setup_vdso(void)
kprintf("Enable Host mapping vDSO.\n");
return 0;
}
kprintf("Enable McK mapping vDSO.\n");
if (memcmp(&vdso_start, "\177ELF", 4)) {
panic("vDSO is not a valid ELF object!\n");
}
vdso.vdso_npages = (&vdso_end - &vdso_start) >> PAGE_SHIFT;
dkprintf("vdso: %ld pages (%ld code @ %p, %ld data @ %p)\n",
vdso.vdso_npages + 1, vdso.vdso_npages, &vdso_start, 1L, &tod_data);
if (vdso.vdso_npages != 1) {
panic("vDSO is not a valid number of pages!\n");
}
vdso.vvar_phys = virt_to_phys((void *)&tod_data);
vdso.vdso_physlist[0] = virt_to_phys((void *)&vdso_start);
vdso.lbase = VDSO_LBASE;
vdso.offset_sigtramp = vdso_offset_sigtramp;
return 0;
panic("Only support host mapping vDSO");
}
static int get_free_area(struct process_vm *vm, size_t len, intptr_t hint,
@ -158,6 +138,7 @@ int arch_map_vdso(struct process_vm *vm)
unsigned long start, end;
unsigned long flag;
int ret;
struct vm_range *range;
vdso_text_len = vdso.vdso_npages << PAGE_SHIFT;
/* Be sure to map the data page */
@ -176,7 +157,7 @@ int arch_map_vdso(struct process_vm *vm)
flag = VR_REMOTE | VR_PROT_READ;
flag |= VRFLAG_PROT_TO_MAXPROT(flag);
ret = add_process_memory_range(vm, start, end, vdso.vvar_phys, flag,
NULL, 0, PAGE_SHIFT, NULL);
NULL, 0, PAGE_SHIFT, &range);
if (ret != 0){
dkprintf("ERROR: adding memory range for tod_data\n");
goto exit;
@ -188,7 +169,7 @@ int arch_map_vdso(struct process_vm *vm)
flag = VR_REMOTE | VR_PROT_READ | VR_PROT_EXEC;
flag |= VRFLAG_PROT_TO_MAXPROT(flag);
ret = add_process_memory_range(vm, start, end, vdso.vdso_physlist[0], flag,
NULL, 0, PAGE_SHIFT, NULL);
NULL, 0, PAGE_SHIFT, &range);
if (ret != 0) {
dkprintf("ERROR: adding memory range for vdso_text\n");

View File

@ -1,33 +0,0 @@
/* vdso.so.S COPYRIGHT FUJITSU LIMITED 2016 */
/* @ref.impl arch/arm64/kernel/vdso/vdso.S */
/*
* Copyright (C) 2012 ARM Limited
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*
* Author: Will Deacon <will.deacon@arm.com>
*/
#include <arch-memory.h>
#include <vdso-so-path.h>
.section ".vdso.txet", "aw"
.globl vdso_start, vdso_end
.balign PAGE_SIZE
vdso_start:
.incbin VDSO_SO_PATH
.balign PAGE_SIZE
vdso_end:
.previous

View File

@ -1,131 +0,0 @@
# Makefile.in COPYRIGHT FUJITSU LIMITED 2016
# @ref.impl arch/arm64/kernel/vdso/Makefile
# Building a vDSO image for AArch64.
HOST_DIR=@KDIR@
HOST_CONFIG=$(HOST_DIR)/.config
HOST_KERNEL_CONFIG_ARM64_4K_PAGES=$(shell grep -E "^CONFIG_ARM64_4K_PAGES=y" $(HOST_CONFIG) | sed 's|CONFIG_ARM64_4K_PAGES=||g')
HOST_KERNEL_CONFIG_ARM64_16K_PAGES=$(shell grep -E "^CONFIG_ARM64_16K_PAGES=y" $(HOST_CONFIG) | sed 's|CONFIG_ARM64_16K_PAGES=||g')
HOST_KERNEL_CONFIG_ARM64_64K_PAGES=$(shell grep -E "^CONFIG_ARM64_64K_PAGES=y" $(HOST_CONFIG) | sed 's|CONFIG_ARM64_64K_PAGES=||g')
VDSOSRC = @abs_srcdir@
VDSOBUILD = @abs_builddir@
INCDIR = $(VDSOSRC)/../include
ECHO_SUFFIX = [VDSO]
VDSOOBJS := gettimeofday.o
DESTOBJS = $(addprefix $(VDSOBUILD)/, $(VDSOOBJS))
VDSOASMOBJS := note.o sigreturn.o
DESTASMOBJS = $(addprefix $(VDSOBUILD)/, $(VDSOASMOBJS))
$(if $(VDSOSRC),,$(error IHK output directory is not specified))
$(if $(TARGET),,$(error Target is not specified))
#CFLAGS := -nostdinc -mlittle-endian -Wall -mabi=lp64 -Wa,-gdwarf-2
CFLAGS := -nostdinc -mlittle-endian -Wall -Wa,-gdwarf-2
CFLAGS += -D__KERNEL__ -I$(SRC)/include
CFLAGS += -I$(SRC)/../lib/include -I$(INCDIR) -I$(IHKBASE)/smp/arm64/include
CFLAGS += $(foreach i, $(shell seq 1 100), $(addprefix -DPOSTK_DEBUG_ARCH_DEP_, $(i)))
CFLAGS += $(foreach i, $(shell seq 1 100), $(addprefix -DPOSTK_DEBUG_TEMP_FIX_, $(i)))
LDFLAGS := -nostdinc -mlittle-endian -Wall -Wundef -Wstrict-prototypes
LDFLAGS += -Wno-trigraphs -fno-strict-aliasing -fno-common
LDFLAGS += -Werror-implicit-function-declaration -Wno-format-security
#LDFLAGS += -std=gnu89 -mgeneral-regs-only -mabi=lp64 -O2
LDFLAGS += -std=gnu89 -mgeneral-regs-only -O2
LDFLAGS += -Wframe-larger-than=2048 -fno-stack-protector
LDFLAGS += -fno-delete-null-pointer-checks -Wno-unused-but-set-variable
LDFLAGS += -fno-omit-frame-pointer -fno-optimize-sibling-calls
LDFLAGS += -fno-var-tracking-assignments -g -Wdeclaration-after-statement
LDFLAGS += -Wno-pointer-sign -fno-strict-overflow -fconserve-stack
LDFLAGS += -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time
LDFLAGS += -shared -fno-common -fno-builtin -nostdlib
LDFLAGS += -Wl,-soname=linux-vdso.so.1 -Wl,--hash-style=sysv -Wl,-n -Wl,-T
LDFLAGS += --param=allow-store-data-races=0 -DCC_HAVE_ASM_GOTO
LDFLAGS += -D"KBUILD_STR(s)=\#s" -D"KBUILD_BASENAME=KBUILD_STR(vdso.so)"
LDFLAGS += -D"KBUILD_MODNAME=KBUILD_STR(vdso.so)" -D__KERNEL__
DEPSRCS = $(wildcard $(VDSOSRC)/*.c $(VDSOSRC)/*.S)
CFLAGS_lds := -E -P -C -U$(ARCH)
CFLAGS_lds += -nostdinc
CFLAGS_lds += -mlittle-endian
CFLAGS_lds += -D__KERNEL__
CFLAGS_lds += -D__ASSEMBLY__
CFLAGS_lds += -DLINKER_SCRIPT
CFLAGS_lds += -DVDSO_LBASE=0
ifeq ($(HOST_KERNEL_CONFIG_ARM64_4K_PAGES), y)
CFLAGS_lds += -DPAGE_SIZE=0x1000
endif
ifeq ($(HOST_KERNEL_CONFIG_ARM64_16K_PAGES), y)
CFLAGS_lds += -DPAGE_SIZE=0x4000
endif
ifeq ($(HOST_KERNEL_CONFIG_ARM64_64K_PAGES), y)
CFLAGS_lds += -DPAGE_SIZE=0x10000
endif
#load mckernel config (append CPPFLAGS)
include @abs_top_builddir@/../ihk/cokernel/$(TARGETDIR)/Makefile.predefines
default: all
.PHONY: all clean depend prepare
all: depend $(VDSOBUILD)/vdso.so $(VDSOBUILD)/../include/vdso-offsets.h $(VDSOBUILD)/../include/vdso-so-path.h
# Strip rule for the .so file
$(VDSOBUILD)/vdso.so: OBJCOPYFLAGS := -S
$(VDSOBUILD)/vdso.so: $(VDSOBUILD)/vdso.so.dbg
$(objcopy_cmd)
# Generate VDSO offsets using helper script
$(VDSOBUILD)/../include/vdso-offsets.h: $(VDSOBUILD)/vdso.so.dbg
$(call echo_cmd,VDSOSYM,$<)
@mkdir -p $(VDSOBUILD)/../include
@nm $< | sh $(VDSOSRC)/gen_vdso_offsets.sh | LC_ALL=C sort > $@
$(VDSOBUILD)/../include/vdso-so-path.h:
@echo "#define VDSO_SO_PATH \"@abs_builddir@/vdso.so\"" > $@
# Link rule for the .so file, .lds has to be first
$(VDSOBUILD)/vdso.so.dbg: $(VDSOBUILD)/vdso.lds $(DESTOBJS) $(DESTASMOBJS)
$(ld_cmd)
$(VDSOBUILD)/vdso.lds: $(VDSOSRC)/vdso.lds.S
$(lds_cmd)
clean:
$(rm_cmd) $(DESTOBJS) $(DESTASMOBJS) $(VDSOBUILD)/Makefile.dep $(VDSOBUILD)/vdso.* -r $(VDSOBUILD)/../include
depend: $(VDSOBUILD)/Makefile.dep
$(VDSOBUILD)/Makefile.dep:
$(call dep_cmd,$(DEPSRCS))
prepare:
@$(RM) $(VDSOBUILD)/Makefile.dep
-include $(VDSOBUILD)/Makefile.dep
# Actual build commands
ifeq ($(V),1)
echo_cmd =
submake = make
else
echo_cmd = @echo ' ($(TARGET))' $1 $(ECHO_SUFFIX) $2;
submake = make --no-print-directory
endif
cc_cmd = $(call echo_cmd,CC,$<)$(CC) $(CFLAGS) -c -o $@
ld_cmd = $(call echo_cmd,LD,$@)$(CC) $(LDFLAGS) $^ -o $@
dep_cmd = $(call echo_cmd,DEPEND,)$(CC) $(CFLAGS) -MM $1 > $@
rm_cmd = $(call echo_cmd,CLEAN,)$(RM)
objcopy_cmd = $(call echo_cmd,OBJCOPY,$<)$(OBJCOPY) $(OBJCOPYFLAGS) $< $@
lds_cmd = $(call echo_cmd,LDS,$<)$(CC) $(CFLAGS_lds) -c -o $@ $<
$(DESTOBJS):
$(cc_cmd) $(addprefix $(VDSOSRC)/, $(notdir $(@:.o=.c)))
$(DESTASMOBJS):
$(cc_cmd) $(addprefix $(VDSOSRC)/, $(notdir $(@:.o=.S))) -D__ASSEMBLY__

View File

@ -1,17 +0,0 @@
#!/bin/sh
# gen_vdso_offsets.sh COPYRIGHT FUJITSU LIMITED 2016
# @ref.impl arch/arm64/kernel/vdso/gen_vdso_offsets.sh
#
# Match symbols in the DSO that look like VDSO_*; produce a header file
# of constant offsets into the shared object.
#
# Doing this inside the Makefile will break the $(filter-out) function,
# causing Kbuild to rebuild the vdso-offsets header file every time.
#
# Author: Will Deacon <will.deacon@arm.com
#
LC_ALL=C
sed -n -e 's/^00*/0/' -e \
's/^\([0-9a-fA-F]*\) . VDSO_\([a-zA-Z0-9_]*\)$/\#define vdso_offset_\2\t0x\1/p'

View File

@ -1,205 +0,0 @@
/* gettimeofday.c COPYRIGHT FUJITSU LIMITED 2016 */
#include <time.h>
#include <syscall.h>
#include <registers.h>
#include <ihk/atomic.h>
extern int __kernel_gettimeofday(struct timeval *tv, void *tz);
static inline void cpu_pause_for_vsyscall(void)
{
asm volatile ("yield" ::: "memory");
return;
}
static inline void calculate_time_from_tsc(struct timespec *ts,
struct tod_data_s *tod_data)
{
long ver;
unsigned long current_tsc;
__time_t sec_delta;
long ns_delta;
for (;;) {
while ((ver = ihk_atomic64_read(&tod_data->version)) & 1) {
/* settimeofday() is in progress */
cpu_pause_for_vsyscall();
}
rmb();
*ts = tod_data->origin;
rmb();
if (ver == ihk_atomic64_read(&tod_data->version)) {
break;
}
/* settimeofday() has intervened */
cpu_pause_for_vsyscall();
}
current_tsc = rdtsc();
sec_delta = current_tsc / tod_data->clocks_per_sec;
ns_delta = NS_PER_SEC * (current_tsc % tod_data->clocks_per_sec)
/ tod_data->clocks_per_sec;
/* calc. of ns_delta overflows if clocks_per_sec exceeds 18.44 GHz */
ts->tv_sec += sec_delta;
ts->tv_nsec += ns_delta;
if (ts->tv_nsec >= NS_PER_SEC) {
ts->tv_nsec -= NS_PER_SEC;
++ts->tv_sec;
}
return;
}
static inline struct tod_data_s *get_tod_data_addr(void)
{
unsigned long addr;
asm volatile("adr %0, _tod_data\n"
: "=r" (addr)
:
: "memory");
return (struct tod_data_s *)addr;
}
int __kernel_gettimeofday(struct timeval *tv, void *tz)
{
long ret;
struct tod_data_s *tod_data;
struct timespec ats;
if(!tv && !tz) {
/* nothing to do */
return 0;
}
tod_data = get_tod_data_addr();
/* DO it locally if supported */
if (!tz && tod_data->do_local) {
calculate_time_from_tsc(&ats, tod_data);
tv->tv_sec = ats.tv_sec;
tv->tv_usec = ats.tv_nsec / 1000;
return 0;
}
/* Otherwize syscall */
asm volatile("mov w8, %w1\n"
"mov x0, %2\n"
"mov x1, %3\n"
"svc #0\n"
"mov %0, x0\n"
: "=r" (ret)
: "r" (__NR_gettimeofday), "r"(tv), "r"(tz)
: "memory");
if (ret) {
*(int *)0 = 0; /* i.e. raise(SIGSEGV) */
}
return (int)ret;
}
/*
* The IDs of the various system clocks (for POSIX.1b interval timers):
* @ref.impl include/uapi/linux/time.h
*/
// #define CLOCK_REALTIME 0
// #define CLOCK_MONOTONIC 1
// #define CLOCK_PROCESS_CPUTIME_ID 2
// #define CLOCK_THREAD_CPUTIME_ID 3
#define CLOCK_MONOTONIC_RAW 4
#define CLOCK_REALTIME_COARSE 5
#define CLOCK_MONOTONIC_COARSE 6
#define CLOCK_BOOTTIME 7
#define CLOCK_REALTIME_ALARM 8
#define CLOCK_BOOTTIME_ALARM 9
#define CLOCK_SGI_CYCLE 10 /* Hardware specific */
#define CLOCK_TAI 11
#define HIGH_RES_NSEC 1 /* nsec. */
#define CLOCK_REALTIME_RES HIGH_RES_NSEC
#define CLOCK_COARSE_RES ((NS_PER_SEC+CONFIG_HZ/2)/CONFIG_HZ) /* 10,000,000 nsec*/
typedef int clockid_t;
int __kernel_clock_gettime(clockid_t clk_id, struct timespec *tp)
{
long ret;
struct tod_data_s *tod_data;
struct timespec ats;
if (!tp) {
/* nothing to do */
return 0;
}
tod_data = get_tod_data_addr();
/* DO it locally if supported */
if (tod_data->do_local && clk_id == CLOCK_REALTIME) {
calculate_time_from_tsc(&ats, tod_data);
tp->tv_sec = ats.tv_sec;
tp->tv_nsec = ats.tv_nsec;
return 0;
}
/* Otherwize syscall */
asm volatile("mov w8, %w1\n"
"mov x0, %2\n"
"mov x1, %3\n"
"svc #0\n"
"mov %0, x0\n"
: "=r" (ret)
: "r" (__NR_clock_gettime), "r"(clk_id), "r"(tp)
: "memory");
return (int)ret;
}
int __kernel_clock_getres(clockid_t clk_id, struct timespec *res)
{
long ret;
if (!res) {
/* nothing to do */
return 0;
}
switch (clk_id) {
case CLOCK_REALTIME:
case CLOCK_MONOTONIC:
res->tv_sec = 0;
res->tv_nsec = CLOCK_REALTIME_RES;
return 0;
break;
case CLOCK_REALTIME_COARSE:
case CLOCK_MONOTONIC_COARSE:
res->tv_sec = 0;
res->tv_nsec = CLOCK_COARSE_RES;
return 0;
break;
default:
break;
}
/* Otherwise syscall */
asm volatile("mov w8, %w1\n"
"mov x0, %2\n"
"mov x1, %3\n"
"svc #0\n"
"mov %0, x0\n"
: "=r" (ret)
: "r" (__NR_clock_getres), "r"(clk_id), "r"(res)
: "memory");
return (int)ret;
}

View File

@ -1,28 +0,0 @@
/* note.S COPYRIGHT FUJITSU LIMITED 2016 */
/* @ref.impl arch/arm64/kernel/vdso/note.S */
/*
* Copyright (C) 2012 ARM Limited
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*
* Author: Will Deacon <will.deacon@arm.com>
*
* This supplies .note.* sections to go into the PT_NOTE inside the vDSO text.
* Here we can supply some information useful to userland.
*/
#include <elfnote.h>
ELFNOTE_START(McKernel, 0, "a")
.long 0x10000 /* MCKERNEL_VERSION_CODE */
ELFNOTE_END

View File

@ -1,39 +0,0 @@
/* sigreturn.S COPYRIGHT FUJITSU LIMITED 2016 */
/* @ref.impl arch/arm64/kernel/vdso/sigreturn.S */
/*
* Sigreturn trampoline for returning from a signal when the SA_RESTORER
* flag is not set.
*
* Copyright (C) 2012 ARM Limited
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*
* Author: Will Deacon <will.deacon@arm.com>
*/
#include <linkage.h>
#include "syscall.h"
.text
nop
ENTRY(__kernel_rt_sigreturn)
.cfi_startproc
.cfi_signal_frame
.cfi_def_cfa x29, 0
.cfi_offset x29, 0 * 8
.cfi_offset x30, 1 * 8
mov x8, #__NR_rt_sigreturn
svc #0
.cfi_endproc
ENDPROC(__kernel_rt_sigreturn)

View File

@ -1,15 +0,0 @@
/* syscall.h COPYRIGHT FUJITSU LIMITED 2016 */
#ifndef __HEADER_ARM64_VDSO_SYSCALL_H
#define __HEADER_ARM64_VDSO_SYSCALL_H
#define DECLARATOR(number,name) .equ __NR_##name, number
#define SYSCALL_HANDLED(number,name) DECLARATOR(number,name)
#define SYSCALL_DELEGATED(number,name) DECLARATOR(number,name)
#include <syscall_list.h>
#undef DECLARATOR
#undef SYSCALL_HANDLED
#undef SYSCALL_DELEGATED
#endif /* !__HEADER_ARM64_VDSO_SYSCALL_H */

View File

@ -1,96 +0,0 @@
/* vdso.lds.S COPYRIGHT FUJITSU LIMITED 2016 */
/* @ref.impl arch/arm64/kernel/vdso/vdso.lds.S */
/*
* GNU linker script for the VDSO library.
*
* Copyright (C) 2012 ARM Limited
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*
* Author: Will Deacon <will.deacon@arm.com>
* Heavily based on the vDSO linker scripts for other archs.
*/
OUTPUT_FORMAT("elf64-littleaarch64", "elf64-bigaarch64", "elf64-littleaarch64")
OUTPUT_ARCH(aarch64)
SECTIONS
{
PROVIDE(_tod_data = . - PAGE_SIZE);
. = VDSO_LBASE + SIZEOF_HEADERS;
.hash : { *(.hash) } :text
.gnu.hash : { *(.gnu.hash) }
.dynsym : { *(.dynsym) }
.dynstr : { *(.dynstr) }
.gnu.version : { *(.gnu.version) }
.gnu.version_d : { *(.gnu.version_d) }
.gnu.version_r : { *(.gnu.version_r) }
.note : { *(.note.*) } :text :note
. = ALIGN(16);
.text : { *(.text*) } :text =0xd503201f
PROVIDE (__etext = .);
PROVIDE (_etext = .);
PROVIDE (etext = .);
.eh_frame_hdr : { *(.eh_frame_hdr) } :text :eh_frame_hdr
.eh_frame : { KEEP (*(.eh_frame)) } :text
.dynamic : { *(.dynamic) } :text :dynamic
.rodata : { *(.rodata*) } :text
_end = .;
PROVIDE(end = .);
/DISCARD/ : {
*(.note.GNU-stack)
*(.data .data.* .gnu.linkonce.d.* .sdata*)
*(.bss .sbss .dynbss .dynsbss)
}
}
/*
* We must supply the ELF program headers explicitly to get just one
* PT_LOAD segment, and set the flags explicitly to make segments read-only.
*/
PHDRS
{
text PT_LOAD FLAGS(5) FILEHDR PHDRS; /* PF_R|PF_X */
dynamic PT_DYNAMIC FLAGS(4); /* PF_R */
note PT_NOTE FLAGS(4); /* PF_R */
eh_frame_hdr PT_GNU_EH_FRAME;
}
/*
* This controls what symbols we export from the DSO.
*/
VERSION
{
LINUX_2.6.39 {
global:
__kernel_rt_sigreturn;
__kernel_gettimeofday;
__kernel_clock_gettime;
__kernel_clock_getres;
local: *;
};
}
/*
* Make the sigreturn code visible to the kernel.
*/
VDSO_sigtramp = __kernel_rt_sigreturn;

View File

@ -9,29 +9,29 @@ PHDRS
SECTIONS
{
. = SIZEOF_HEADERS;
. = ALIGN(4096);
. = ALIGN(4096);
.text : {
*(.text)
*(.text)
} :text
.data : {
*(.data)
*(.data.*)
*(.data)
*(.data.*)
} :data
.rodata : {
*(.rodata .rodata.*)
*(.rodata .rodata.*)
} :data
. = ALIGN(8);
.bss : {
_bss_start = .;
*(.bss .bss.*)
_bss_end = .;
. = ALIGN(4096);
_stack_end = .;
} :data
_bss_start = .;
*(.bss .bss.*)
_bss_end = .;
. = ALIGN(4096);
_stack_end = .;
} :data
/DISCARD/ : {
*(.eh_frame)
*(.note.gnu.build-id)
*(.eh_frame)
*(.note.gnu.build-id)
}
}
}

View File

@ -1,2 +1,2 @@
IHK_OBJS += cpu.o interrupt.o memory.o trampoline.o local.o context.o
IHK_OBJS += perfctr.o syscall.o vsyscall.o
IHK_OBJS += perfctr.o syscall.o vsyscall.o coredump.o

View File

@ -1,4 +1,4 @@
#ifdef POSTK_DEBUG_ARCH_DEP_18 /* coredump arch separation. */
/* coredump.c COPYRIGHT FUJITSU LIMITED 2018 */
#include <process.h>
#include <elfcore.h>
@ -55,5 +55,3 @@ void arch_fill_prstatus(struct elf_prstatus64 *prstatus, struct thread *thread,
prstatus->pr_fpvalid = 0; /* We assume no fp */
}
#endif /* POSTK_DEBUG_ARCH_DEP_18 */

View File

@ -1,3 +1,4 @@
/* cpu.c COPYRIGHT FUJITSU LIMITED 2018 */
/**
* \file cpu.c
* License details are found in the file LICENSE.
@ -31,6 +32,7 @@
#include <prctl.h>
#include <page.h>
#include <kmalloc.h>
#include <debug.h>
#define LAPIC_ID 0x020
#define LAPIC_TIMER 0x320
@ -69,11 +71,8 @@
//#define DEBUG_PRINT_CPU
#ifdef DEBUG_PRINT_CPU
#define dkprintf kprintf
#define ekprintf kprintf
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf kprintf
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
static void *lapic_vp;
@ -93,9 +92,10 @@ void x86_set_warm_reset(unsigned long ip, char *first_page_va);
void x86_init_perfctr(void);
int gettime_local_support = 0;
extern int ihk_mc_pt_print_pte(struct page_table *pt, void *virt);
extern int kprintf(const char *format, ...);
extern int interrupt_from_user(void *);
extern void perf_start(struct mc_perf_event *event);
extern void perf_reset(struct mc_perf_event *event);
static struct idt_entry{
uint32_t desc[4];
@ -824,11 +824,14 @@ void call_ap_func(void (*next_func)(void))
next_func();
}
struct page_table *get_init_page_table(void);
void setup_x86_ap(void (*next_func)(void))
{
unsigned long rsp;
cpu_disable_interrupt();
ihk_mc_load_page_table(get_init_page_table());
assign_processor_id();
init_smp_processor();
@ -847,9 +850,6 @@ void setup_x86_ap(void (*next_func)(void))
}
void arch_show_interrupt_context(const void *reg);
void set_signal(int sig, void *regs, struct siginfo *info);
void check_signal(unsigned long, void *, int);
void check_sig_pending();
extern void tlb_flush_handler(int vector);
void __show_stack(uintptr_t *sp) {
@ -877,7 +877,7 @@ void interrupt_exit(struct x86_user_context *regs)
cpu_enable_interrupt();
check_sig_pending();
check_need_resched();
check_signal(0, regs, 0);
check_signal(0, regs, -1);
}
else {
check_sig_pending();
@ -892,7 +892,8 @@ void handle_interrupt(int vector, struct x86_user_context *regs)
lapic_ack();
++v->in_interrupt;
set_cputime(interrupt_from_user(regs)? 1: 2);
set_cputime(interrupt_from_user(regs) ?
CPUTIME_MODE_U2K : CPUTIME_MODE_K2K_IN);
dkprintf("CPU[%d] got interrupt, vector: %d, RIP: 0x%lX\n",
ihk_mc_get_processor_id(), vector, regs->gpr.rip);
@ -1007,14 +1008,22 @@ void handle_interrupt(int vector, struct x86_user_context *regs)
}
interrupt_exit(regs);
set_cputime(interrupt_from_user(regs)? 0: 1);
set_cputime(interrupt_from_user(regs) ?
CPUTIME_MODE_K2U : CPUTIME_MODE_K2K_OUT);
--v->in_interrupt;
/* for migration by IPI */
if (v->flags & CPU_FLAG_NEED_MIGRATE) {
schedule();
check_signal(0, regs, 0);
}
}
void gpe_handler(struct x86_user_context *regs)
{
set_cputime(interrupt_from_user(regs)? 1: 2);
set_cputime(interrupt_from_user(regs) ?
CPUTIME_MODE_U2K : CPUTIME_MODE_K2K_IN);
kprintf("General protection fault (err: %lx, %lx:%lx)\n",
regs->gpr.error, regs->gpr.cs, regs->gpr.rip);
arch_show_interrupt_context(regs);
@ -1023,7 +1032,8 @@ void gpe_handler(struct x86_user_context *regs)
}
set_signal(SIGSEGV, regs, NULL);
interrupt_exit(regs);
set_cputime(interrupt_from_user(regs)? 0: 1);
set_cputime(interrupt_from_user(regs) ?
CPUTIME_MODE_K2U : CPUTIME_MODE_K2K_OUT);
panic("GPF");
}
@ -1033,7 +1043,8 @@ void debug_handler(struct x86_user_context *regs)
int si_code = 0;
struct siginfo info;
set_cputime(interrupt_from_user(regs)? 1: 2);
set_cputime(interrupt_from_user(regs) ?
CPUTIME_MODE_U2K : CPUTIME_MODE_K2K_IN);
#ifdef DEBUG_PRINT_CPU
kprintf("debug exception (err: %lx, %lx:%lx)\n",
regs->gpr.error, regs->gpr.cs, regs->gpr.rip);
@ -1052,14 +1063,16 @@ void debug_handler(struct x86_user_context *regs)
info.si_code = si_code;
set_signal(SIGTRAP, regs, &info);
interrupt_exit(regs);
set_cputime(interrupt_from_user(regs)? 0: 1);
set_cputime(interrupt_from_user(regs) ?
CPUTIME_MODE_K2U : CPUTIME_MODE_K2K_OUT);
}
void int3_handler(struct x86_user_context *regs)
{
struct siginfo info;
set_cputime(interrupt_from_user(regs)? 1: 2);
set_cputime(interrupt_from_user(regs) ?
CPUTIME_MODE_U2K : CPUTIME_MODE_K2K_IN);
#ifdef DEBUG_PRINT_CPU
kprintf("int3 exception (err: %lx, %lx:%lx)\n",
regs->gpr.error, regs->gpr.cs, regs->gpr.rip);
@ -1070,59 +1083,8 @@ void int3_handler(struct x86_user_context *regs)
info.si_code = TRAP_BRKPT;
set_signal(SIGTRAP, regs, &info);
interrupt_exit(regs);
set_cputime(interrupt_from_user(regs)? 0: 1);
}
void
unhandled_page_fault(struct thread *thread, void *fault_addr, void *regs)
{
const uintptr_t address = (uintptr_t)fault_addr;
struct process_vm *vm = thread->vm;
struct vm_range *range;
unsigned long irqflags;
unsigned long error = ((struct x86_user_context *)regs)->gpr.error;
irqflags = kprintf_lock();
__kprintf("Page fault for 0x%lx\n", address);
__kprintf("%s for %s access in %s mode (reserved bit %s set), "
"it %s an instruction fetch\n",
(error & PF_PROT ? "protection fault" : "no page found"),
(error & PF_WRITE ? "write" : "read"),
(error & PF_USER ? "user" : "kernel"),
(error & PF_RSVD ? "was" : "wasn't"),
(error & PF_INSTR ? "was" : "wasn't"));
range = lookup_process_memory_range(vm, address, address+1);
if (range) {
__kprintf("address is in range, flag: 0x%lx\n",
range->flag);
ihk_mc_pt_print_pte(vm->address_space->page_table, (void*)address);
} else {
__kprintf("address is out of range! \n");
}
kprintf_unlock(irqflags);
/* TODO */
ihk_mc_debug_show_interrupt_context(regs);
if (!(error & PF_USER)) {
panic("panic: kernel mode PF");
}
//dkprintf("now dump a core file\n");
//coredump(proc, regs);
#ifdef DEBUG_PRINT_MEM
{
uint64_t *sp = (void *)REGS_GET_STACK_POINTER(regs);
kprintf("*rsp:%lx,*rsp+8:%lx,*rsp+16:%lx,*rsp+24:%lx,\n",
sp[0], sp[1], sp[2], sp[3]);
}
#endif
return;
set_cputime(interrupt_from_user(regs) ?
CPUTIME_MODE_K2U : CPUTIME_MODE_K2K_OUT);
}
static void outb(uint8_t v, uint16_t port)
@ -1287,7 +1249,7 @@ void ihk_mc_set_page_fault_handler(void (*h)(void *, uint64_t, void *))
}
extern char trampoline_code_data[], trampoline_code_data_end[];
struct page_table *get_init_page_table(void);
struct page_table *get_boot_page_table(void);
unsigned long get_transit_page_table(void);
/* reusable, but not reentrant */
@ -1311,9 +1273,10 @@ void ihk_mc_boot_cpu(int cpuid, unsigned long pc)
memcpy(p, trampoline_code_data,
trampoline_code_data_end - trampoline_code_data);
p[1] = (unsigned long)virt_to_phys(get_init_page_table());
p[1] = (unsigned long)virt_to_phys(get_boot_page_table());
p[2] = (unsigned long)setup_x86_ap;
p[3] = pc;
p[4] = (unsigned long)get_x86_cpu_local_kstack(cpuid);
p[6] = (unsigned long)get_transit_page_table();
if (!p[6]) {
p[6] = p[1];
@ -1431,13 +1394,11 @@ long ihk_mc_show_cpuinfo(char *buf, size_t buf_size, unsigned long read_off, int
}
#endif /* POSTK_DEBUG_ARCH_DEP_42 */
#ifdef POSTK_DEBUG_ARCH_DEP_23 /* add arch dep. clone_thread() function */
void arch_clone_thread(struct thread *othread, unsigned long pc,
unsigned long sp, struct thread *nthread)
{
return;
}
#endif /* POSTK_DEBUG_ARCH_DEP_23 */
void ihk_mc_print_user_context(ihk_mc_user_context_t *uctx)
{
@ -1541,7 +1502,8 @@ void arch_print_pre_interrupt_stack(const struct x86_basic_regs *regs) {
__print_stack(rbp, regs->rip);
}
void arch_print_stack() {
void arch_print_stack(void)
{
struct stack *rbp;
__kprintf("Approximative stack trace:\n");
@ -1589,6 +1551,13 @@ return;
kprintf_unlock(irqflags);
}
void arch_cpu_stop(void)
{
while (1) {
cpu_halt();
}
}
/*@
@ behavior fs_base:
@ assumes type == IHK_ASR_X86_FS;
@ -1644,12 +1613,10 @@ int ihk_mc_interrupt_cpu(int cpu, int vector)
return 0;
}
#ifdef POSTK_DEBUG_ARCH_DEP_22
extern void perf_start(struct mc_perf_event *event);
extern void perf_reset(struct mc_perf_event *event);
struct thread *arch_switch_context(struct thread *prev, struct thread *next)
{
struct thread *last;
struct mcs_rwlock_node_irqsave lock;
dkprintf("[%d] schedule: tlsblock_base: 0x%lX\n",
ihk_mc_get_processor_id(), next->tlsblock_base);
@ -1668,7 +1635,7 @@ struct thread *arch_switch_context(struct thread *prev, struct thread *next)
}
#ifdef PROFILE_ENABLE
if (prev->profile && prev->profile_start_ts != 0) {
if (prev && prev->profile && prev->profile_start_ts != 0) {
prev->profile_elapsed_ts +=
(rdtsc() - prev->profile_start_ts);
prev->profile_start_ts = 0;
@ -1680,6 +1647,28 @@ struct thread *arch_switch_context(struct thread *prev, struct thread *next)
#endif
if (prev) {
mcs_rwlock_writer_lock(&prev->proc->update_lock, &lock);
if (prev->proc->status & (PS_DELAY_STOPPED | PS_DELAY_TRACED)) {
switch (prev->proc->status) {
case PS_DELAY_STOPPED:
prev->proc->status = PS_STOPPED;
break;
case PS_DELAY_TRACED:
prev->proc->status = PS_TRACED;
break;
default:
break;
}
mcs_rwlock_writer_unlock(&prev->proc->update_lock,
&lock);
/* Wake up the parent who tried wait4 and sleeping */
waitq_wakeup(&prev->proc->parent->waitpid_q);
} else {
mcs_rwlock_writer_unlock(&prev->proc->update_lock,
&lock);
}
last = ihk_mc_switch_context(&prev->ctx, &next->ctx, prev);
}
else {
@ -1687,7 +1676,6 @@ struct thread *arch_switch_context(struct thread *prev, struct thread *next)
}
return last;
}
#endif
/*@
@ requires \valid(thread);
@ -1762,14 +1750,6 @@ void copy_fp_regs(struct thread *from, struct thread *to)
}
}
#ifdef POSTK_DEBUG_TEMP_FIX_19
void
clear_fp_regs(struct thread *thread)
{
return;
}
#endif /* POSTK_DEBUG_TEMP_FIX_19 */
/*@
@ requires \valid(thread);
@ assigns thread->fp_regs;
@ -1777,8 +1757,11 @@ clear_fp_regs(struct thread *thread)
void
restore_fp_regs(struct thread *thread)
{
if (!thread->fp_regs)
if (!thread->fp_regs) {
// only clear fpregs.
clear_fp_regs();
return;
}
if (xsave_available) {
unsigned int low, high;
@ -1797,6 +1780,13 @@ restore_fp_regs(struct thread *thread)
//release_fp_regs(thread);
}
void clear_fp_regs(void)
{
struct cpu_local_var *v = get_this_cpu_local_var();
restore_fp_regs(&v->idle);
}
ihk_mc_user_context_t *lookup_user_context(struct thread *thread)
{
ihk_mc_user_context_t *uctx = thread->uctx;

View File

@ -1,546 +0,0 @@
#ifndef POSTK_DEBUG_ARCH_DEP_18 /* coredump arch separation. */
#include <ihk/debug.h>
#include <kmalloc.h>
#include <cls.h>
#include <list.h>
#include <process.h>
#include <string.h>
#include <elfcore.h>
#define align32(x) ((((x) + 3) / 4) * 4)
#define alignpage(x) ((((x) + (PAGE_SIZE) - 1) / (PAGE_SIZE)) * (PAGE_SIZE))
//#define DEBUG_PRINT_GENCORE
#ifdef DEBUG_PRINT_GENCORE
#define dkprintf(...) kprintf(__VA_ARGS__)
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#endif
/*
* Generate a core file image, which consists of many chunks.
* Returns an allocated table, an etnry of which is a pair of the address
* of a chunk and its length.
*/
/**
* \brief Fill the elf header.
*
* \param eh An Elf64_Ehdr structure.
* \param segs Number of segments of the core file.
*/
void fill_elf_header(Elf64_Ehdr *eh, int segs)
{
eh->e_ident[EI_MAG0] = 0x7f;
eh->e_ident[EI_MAG1] = 'E';
eh->e_ident[EI_MAG2] = 'L';
eh->e_ident[EI_MAG3] = 'F';
eh->e_ident[EI_CLASS] = ELFCLASS64;
eh->e_ident[EI_DATA] = ELFDATA2LSB;
eh->e_ident[EI_VERSION] = El_VERSION;
eh->e_ident[EI_OSABI] = ELFOSABI_NONE;
eh->e_ident[EI_ABIVERSION] = El_ABIVERSION_NONE;
eh->e_type = ET_CORE;
#ifdef CONFIG_MIC
eh->e_machine = EM_K10M;
#else
eh->e_machine = EM_X86_64;
#endif
eh->e_version = EV_CURRENT;
eh->e_entry = 0; /* Do we really need this? */
eh->e_phoff = 64; /* fixed */
eh->e_shoff = 0; /* no section header */
eh->e_flags = 0;
eh->e_ehsize = 64; /* fixed */
eh->e_phentsize = 56; /* fixed */
eh->e_phnum = segs;
eh->e_shentsize = 0;
eh->e_shnum = 0;
eh->e_shstrndx = 0;
}
/**
* \brief Return the size of the prstatus entry of the NOTE segment.
*
*/
int get_prstatus_size(void)
{
return sizeof(struct note) + align32(sizeof("CORE"))
+ align32(sizeof(struct elf_prstatus64));
}
/**
* \brief Fill a prstatus structure.
*
* \param head A pointer to a note structure.
* \param thread A pointer to the current thread structure.
* \param regs0 A pointer to a x86_regs structure.
*/
void fill_prstatus(struct note *head, struct thread *thread, void *regs0)
{
void *name;
struct elf_prstatus64 *prstatus;
struct x86_user_context *uctx = regs0;
struct x86_basic_regs *regs = &uctx->gpr;
register unsigned long _r12 asm("r12");
register unsigned long _r13 asm("r13");
register unsigned long _r14 asm("r14");
register unsigned long _r15 asm("r15");
head->namesz = sizeof("CORE");
head->descsz = sizeof(struct elf_prstatus64);
head->type = NT_PRSTATUS;
name = (void *) (head + 1);
memcpy(name, "CORE", sizeof("CORE"));
prstatus = (struct elf_prstatus64 *)(name + align32(sizeof("CORE")));
/*
We ignore following entries for now.
struct elf_siginfo pr_info;
short int pr_cursig;
a8_uint64_t pr_sigpend;
a8_uint64_t pr_sighold;
pid_t pr_pid;
pid_t pr_ppid;
pid_t pr_pgrp;
pid_t pr_sid;
struct prstatus64_timeval pr_utime;
struct prstatus64_timeval pr_stime;
struct prstatus64_timeval pr_cutime;
struct prstatus64_timeval pr_cstime;
*/
prstatus->pr_reg[0] = _r15;
prstatus->pr_reg[1] = _r14;
prstatus->pr_reg[2] = _r13;
prstatus->pr_reg[3] = _r12;
prstatus->pr_reg[4] = regs->rbp;
prstatus->pr_reg[5] = regs->rbx;
prstatus->pr_reg[6] = regs->r11;
prstatus->pr_reg[7] = regs->r10;
prstatus->pr_reg[8] = regs->r9;
prstatus->pr_reg[9] = regs->r8;
prstatus->pr_reg[10] = regs->rax;
prstatus->pr_reg[11] = regs->rcx;
prstatus->pr_reg[12] = regs->rdx;
prstatus->pr_reg[13] = regs->rsi;
prstatus->pr_reg[14] = regs->rdi;
prstatus->pr_reg[15] = regs->rax; /* ??? */
prstatus->pr_reg[16] = regs->rip;
prstatus->pr_reg[17] = regs->cs;
prstatus->pr_reg[18] = regs->rflags;
prstatus->pr_reg[19] = regs->rsp;
prstatus->pr_reg[20] = regs->ss;
prstatus->pr_reg[21] = rdmsr(MSR_FS_BASE);
prstatus->pr_reg[22] = rdmsr(MSR_GS_BASE);
/* There is no ds, es, fs and gs. */
prstatus->pr_fpvalid = 0; /* We assume no fp */
}
/**
* \brief Return the size of the prpsinfo entry of the NOTE segment.
*
*/
int get_prpsinfo_size(void)
{
return sizeof(struct note) + align32(sizeof("CORE"))
+ align32(sizeof(struct elf_prpsinfo64));
}
/**
* \brief Fill a prpsinfo structure.
*
* \param head A pointer to a note structure.
* \param thread A pointer to the current thread structure.
* \param regs A pointer to a x86_regs structure.
*/
void fill_prpsinfo(struct note *head, struct thread *thread, void *regs)
{
void *name;
struct elf_prpsinfo64 *prpsinfo;
head->namesz = sizeof("CORE");
head->descsz = sizeof(struct elf_prpsinfo64);
head->type = NT_PRPSINFO;
name = (void *) (head + 1);
memcpy(name, "CORE", sizeof("CORE"));
prpsinfo = (struct elf_prpsinfo64 *)(name + align32(sizeof("CORE")));
prpsinfo->pr_state = thread->status;
prpsinfo->pr_pid = thread->proc->pid;
/*
We leave most of the fields unfilled.
char pr_sname;
char pr_zomb;
char pr_nice;
a8_uint64_t pr_flag;
unsigned int pr_uid;
unsigned int pr_gid;
int pr_ppid, pr_pgrp, pr_sid;
char pr_fname[16];
char pr_psargs[ELF_PRARGSZ];
*/
}
/**
* \brief Return the size of the AUXV entry of the NOTE segment.
*
*/
int get_auxv_size(void)
{
return sizeof(struct note) + align32(sizeof("CORE"))
+ sizeof(unsigned long) * AUXV_LEN;
}
/**
* \brief Fill an AUXV structure.
*
* \param head A pointer to a note structure.
* \param thread A pointer to the current thread structure.
* \param regs A pointer to a x86_regs structure.
*/
void fill_auxv(struct note *head, struct thread *thread, void *regs)
{
void *name;
void *auxv;
head->namesz = sizeof("CORE");
head->descsz = sizeof(unsigned long) * AUXV_LEN;
head->type = NT_AUXV;
name = (void *) (head + 1);
memcpy(name, "CORE", sizeof("CORE"));
auxv = name + align32(sizeof("CORE"));
memcpy(auxv, thread->proc->saved_auxv, sizeof(unsigned long) * AUXV_LEN);
}
/**
* \brief Return the size of the whole NOTE segment.
*
*/
int get_note_size(void)
{
return get_prstatus_size() + get_prpsinfo_size()
+ get_auxv_size();
}
/**
* \brief Fill the NOTE segment.
*
* \param head A pointer to a note structure.
* \param thread A pointer to the current thread structure.
* \param regs A pointer to a x86_regs structure.
*/
void fill_note(void *note, struct thread *thread, void *regs)
{
fill_prstatus(note, thread, regs);
note += get_prstatus_size();
fill_prpsinfo(note, thread, regs);
note += get_prpsinfo_size();
fill_auxv(note, thread, regs);
}
/**
* \brief Generate an image of the core file.
*
* \param thread A pointer to the current thread structure.
* \param regs A pointer to a x86_regs structure.
* \param coretable(out) An array of core chunks.
* \param chunks(out) Number of the entires of coretable.
*
* A core chunk is represented by a pair of a physical
* address of memory region and its size. If there are
* no corresponding physical address for a VM area
* (an unallocated demand-paging page, e.g.), the address
* should be zero.
*/
/*@
@ requires \valid(thread);
@ requires \valid(regs);
@ requires \valid(coretable);
@ requires \valid(chunks);
@ behavior success:
@ ensures \result == 0;
@ assigns coretable;
@ behavior failure:
@ ensures \result == -1;
@*/
int gencore(struct thread *thread, void *regs,
struct coretable **coretable, int *chunks)
{
struct coretable *ct = NULL;
Elf64_Ehdr eh;
Elf64_Phdr *ph = NULL;
void *note = NULL;
struct vm_range *range, *next;
struct process_vm *vm = thread->vm;
int segs = 1; /* the first one is for NOTE */
int notesize, phsize, alignednotesize;
unsigned int offset = 0;
int i;
*chunks = 3; /* Elf header , header table and NOTE segment */
if (vm == NULL) {
dkprintf("no vm found.\n");
return -1;
}
next = lookup_process_memory_range(vm, 0, -1);
while ((range = next)) {
next = next_process_memory_range(vm, range);
dkprintf("start:%lx end:%lx flag:%lx objoff:%lx\n",
range->start, range->end, range->flag, range->objoff);
/* We omit reserved areas because they are only for
mckernel's internal use. */
if (range->flag & VR_RESERVED)
continue;
if (range->flag & VR_DONTDUMP)
continue;
/* We need a chunk for each page for a demand paging area.
This can be optimized for spacial complexity but we would
lose simplicity instead. */
if (range->flag & VR_DEMAND_PAGING) {
unsigned long p, phys;
int prevzero = 0;
for (p = range->start; p < range->end; p += PAGE_SIZE) {
if (ihk_mc_pt_virt_to_phys(thread->vm->address_space->page_table,
(void *)p, &phys) != 0) {
prevzero = 1;
} else {
if (prevzero == 1)
(*chunks)++;
(*chunks)++;
prevzero = 0;
}
}
if (prevzero == 1)
(*chunks)++;
} else {
(*chunks)++;
}
segs++;
}
dkprintf("we have %d segs and %d chunks.\n\n", segs, *chunks);
{
struct vm_regions region = thread->vm->region;
dkprintf("text: %lx-%lx\n", region.text_start, region.text_end);
dkprintf("data: %lx-%lx\n", region.data_start, region.data_end);
dkprintf("brk: %lx-%lx\n", region.brk_start, region.brk_end);
dkprintf("map: %lx-%lx\n", region.map_start, region.map_end);
dkprintf("stack: %lx-%lx\n", region.stack_start, region.stack_end);
dkprintf("user: %lx-%lx\n\n", region.user_start, region.user_end);
}
dkprintf("now generate a core file image\n");
offset += sizeof(eh);
fill_elf_header(&eh, segs);
/* program header table */
phsize = sizeof(Elf64_Phdr) * segs;
ph = kmalloc(phsize, IHK_MC_AP_NOWAIT);
if (ph == NULL) {
dkprintf("could not alloc a program header table.\n");
goto fail;
}
memset(ph, 0, phsize);
offset += phsize;
/* NOTE segment
* To align the next segment page-sized, we prepare a padded
* region for our NOTE segment.
*/
notesize = get_note_size();
alignednotesize = alignpage(notesize + offset) - offset;
note = kmalloc(alignednotesize, IHK_MC_AP_NOWAIT);
if (note == NULL) {
dkprintf("could not alloc NOTE for core.\n");
goto fail;
}
memset(note, 0, alignednotesize);
fill_note(note, thread, regs);
/* prgram header for NOTE segment is exceptional */
ph[0].p_type = PT_NOTE;
ph[0].p_flags = 0;
ph[0].p_offset = offset;
ph[0].p_vaddr = 0;
ph[0].p_paddr = 0;
ph[0].p_filesz = notesize;
ph[0].p_memsz = notesize;
ph[0].p_align = 0;
offset += alignednotesize;
/* program header for each memory chunk */
i = 1;
next = lookup_process_memory_range(vm, 0, -1);
while ((range = next)) {
next = next_process_memory_range(vm, range);
unsigned long flag = range->flag;
unsigned long size = range->end - range->start;
if (range->flag & VR_RESERVED)
continue;
ph[i].p_type = PT_LOAD;
ph[i].p_flags = ((flag & VR_PROT_READ) ? PF_R : 0)
| ((flag & VR_PROT_WRITE) ? PF_W : 0)
| ((flag & VR_PROT_EXEC) ? PF_X : 0);
ph[i].p_offset = offset;
ph[i].p_vaddr = range->start;
ph[i].p_paddr = 0;
ph[i].p_filesz = size;
ph[i].p_memsz = size;
ph[i].p_align = PAGE_SIZE;
i++;
offset += size;
}
/* coretable to send to host */
ct = kmalloc(sizeof(struct coretable) * (*chunks), IHK_MC_AP_NOWAIT);
if (!ct) {
dkprintf("could not alloc a coretable.\n");
goto fail;
}
ct[0].addr = virt_to_phys(&eh); /* ELF header */
ct[0].len = 64;
dkprintf("coretable[0]: %lx@%lx(%lx)\n", ct[0].len, ct[0].addr, &eh);
ct[1].addr = virt_to_phys(ph); /* program header table */
ct[1].len = phsize;
dkprintf("coretable[1]: %lx@%lx(%lx)\n", ct[1].len, ct[1].addr, ph);
ct[2].addr = virt_to_phys(note); /* NOTE segment */
ct[2].len = alignednotesize;
dkprintf("coretable[2]: %lx@%lx(%lx)\n", ct[2].len, ct[2].addr, note);
i = 3; /* memory segments */
next = lookup_process_memory_range(vm, 0, -1);
while ((range = next)) {
next = next_process_memory_range(vm, range);
unsigned long phys;
if (range->flag & VR_RESERVED)
continue;
if (range->flag & VR_DEMAND_PAGING) {
/* Just an ad hoc kluge. */
unsigned long p, start, phys;
int prevzero = 0;
unsigned long size = 0;
for (start = p = range->start;
p < range->end; p += PAGE_SIZE) {
if (ihk_mc_pt_virt_to_phys(thread->vm->address_space->page_table,
(void *)p, &phys) != 0) {
if (prevzero == 0) {
/* We begin a new chunk */
size = PAGE_SIZE;
start = p;
} else {
/* We extend the previous chunk */
size += PAGE_SIZE;
}
prevzero = 1;
} else {
if (prevzero == 1) {
/* Flush out an empty chunk */
ct[i].addr = 0;
ct[i].len = size;
dkprintf("coretable[%d]: %lx@%lx(%lx)\n", i,
ct[i].len, ct[i].addr, start);
i++;
}
ct[i].addr = phys;
ct[i].len = PAGE_SIZE;
dkprintf("coretable[%d]: %lx@%lx(%lx)\n", i,
ct[i].len, ct[i].addr, p);
i++;
prevzero = 0;
}
}
if (prevzero == 1) {
/* An empty chunk */
ct[i].addr = 0;
ct[i].len = size;
dkprintf("coretable[%d]: %lx@%lx(%lx)\n", i,
ct[i].len, ct[i].addr, start);
i++;
}
} else {
if ((thread->vm->region.user_start <= range->start) &&
(range->end <= thread->vm->region.user_end)) {
if (ihk_mc_pt_virt_to_phys(thread->vm->address_space->page_table,
(void *)range->start, &phys) != 0) {
dkprintf("could not convert user virtual address %lx"
"to physical address", range->start);
goto fail;
}
} else {
phys = virt_to_phys((void *)range->start);
}
ct[i].addr = phys;
ct[i].len = range->end - range->start;
dkprintf("coretable[%d]: %lx@%lx(%lx)\n", i,
ct[i].len, ct[i].addr, range->start);
i++;
}
}
*coretable = ct;
return 0;
fail:
if (ct)
kfree(ct);
if (ph)
kfree(ph);
if (note)
kfree(note);
return -1;
}
/**
* \brief Free all the allocated spaces for an image of the core file.
*
* \param coretable An array of core chunks.
*/
/*@
@ requires \valid(coretable);
@ assigns \nothing;
@*/
void freecore(struct coretable **coretable)
{
struct coretable *ct = *coretable;
kfree(phys_to_virt(ct[2].addr)); /* NOTE segment */
kfree(phys_to_virt(ct[1].addr)); /* ph */
kfree(*coretable);
}
#endif /* !POSTK_DEBUG_ARCH_DEP_18 */

View File

@ -64,12 +64,13 @@ static inline int futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval,
return oldval;
}
static inline int futex_atomic_op_inuser(int encoded_op, int __user *uaddr)
static inline int futex_atomic_op_inuser(int encoded_op,
int __user *uaddr)
{
int op = (encoded_op >> 28) & 7;
int cmp = (encoded_op >> 24) & 15;
int oparg = (encoded_op << 8) >> 20;
int cmparg = (encoded_op << 20) >> 20;
int oparg = (encoded_op & 0x00fff000) >> 12;
int cmparg = encoded_op & 0xfff;
int oldval = 0, ret, tem;
if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))

View File

@ -6,6 +6,8 @@
#include <ihk/cpu.h>
#include <ihk/atomic.h>
#include <lwk/compiler.h>
#include "config.h"
//#define DEBUG_SPINLOCK
//#define DEBUG_MCS_RWLOCK
@ -14,7 +16,17 @@
int __kprintf(const char *format, ...);
#endif
typedef int ihk_spinlock_t;
typedef unsigned short __ticket_t;
typedef unsigned int __ticketpair_t;
typedef struct ihk_spinlock {
union {
__ticketpair_t head_tail;
struct __raw_tickets {
__ticket_t head, tail;
} tickets;
};
} ihk_spinlock_t;
extern void preempt_enable(void);
extern void preempt_disable(void);
@ -23,9 +35,61 @@ extern void preempt_disable(void);
static void ihk_mc_spinlock_init(ihk_spinlock_t *lock)
{
*lock = 0;
lock->head_tail = 0;
}
#define SPIN_LOCK_UNLOCKED { .head_tail = 0 }
#ifdef DEBUG_SPINLOCK
#define ihk_mc_spinlock_trylock_noirq(l) { int rc; \
__kprintf("[%d] call ihk_mc_spinlock_trylock_noirq %p %s:%d\n", ihk_mc_get_processor_id(), (l), __FILE__, __LINE__); \
rc = __ihk_mc_spinlock_trylock_noirq(l); \
__kprintf("[%d] ret ihk_mc_spinlock_trylock_noirq\n", ihk_mc_get_processor_id()); rc; \
}
#else
#define ihk_mc_spinlock_trylock_noirq __ihk_mc_spinlock_trylock_noirq
#endif
static int __ihk_mc_spinlock_trylock_noirq(ihk_spinlock_t *lock)
{
ihk_spinlock_t cur = { .head_tail = lock->head_tail };
ihk_spinlock_t next = { .tickets.head = cur.tickets.head, .tickets.tail = cur.tickets.tail + 2 };
int success;
if (cur.tickets.head != cur.tickets.tail) {
return 0;
}
preempt_disable();
/* Use the same increment amount as other functions! */
success = __sync_bool_compare_and_swap((__ticketpair_t*)lock, cur.head_tail, next.head_tail);
if (!success) {
preempt_enable();
}
return success;
}
#ifdef DEBUG_SPINLOCK
#define ihk_mc_spinlock_trylock(l, result) ({ unsigned long rc; \
__kprintf("[%d] call ihk_mc_spinlock_trylock %p %s:%d\n", ihk_mc_get_processor_id(), (l), __FILE__, __LINE__); \
rc = __ihk_mc_spinlock_trylock(l, result); \
__kprintf("[%d] ret ihk_mc_spinlock_trylock\n", ihk_mc_get_processor_id()); rc;\
})
#else
#define ihk_mc_spinlock_trylock __ihk_mc_spinlock_trylock
#endif
static unsigned long __ihk_mc_spinlock_trylock(ihk_spinlock_t *lock, int *result)
{
unsigned long flags;
flags = cpu_disable_interrupt_save();
*result = __ihk_mc_spinlock_trylock_noirq(lock);
return flags;
}
#define SPIN_LOCK_UNLOCKED 0
#ifdef DEBUG_SPINLOCK
#define ihk_mc_spinlock_lock_noirq(l) { \
@ -39,40 +103,24 @@ __kprintf("[%d] ret ihk_mc_spinlock_lock_noirq\n", ihk_mc_get_processor_id()); \
static void __ihk_mc_spinlock_lock_noirq(ihk_spinlock_t *lock)
{
int inc = 0x00010000;
int tmp;
#if 0
asm volatile("lock ; xaddl %0, %1\n"
"movzwl %w0, %2\n\t"
"shrl $16, %0\n\t"
"1:\t"
"cmpl %0, %2\n\t"
"je 2f\n\t"
"rep ; nop\n\t"
"movzwl %1, %2\n\t"
"jmp 1b\n"
"2:"
: "+Q" (inc), "+m" (*lock), "=r" (tmp) : : "memory", "cc");
#endif
register struct __raw_tickets inc = { .tail = 0x0002 };
preempt_disable();
asm volatile("lock; xaddl %0, %1\n"
"movzwl %w0, %2\n\t"
"shrl $16, %0\n\t"
"1:\t"
"cmpl %0, %2\n\t"
"je 2f\n\t"
"rep ; nop\n\t"
"movzwl %1, %2\n\t"
/* don't need lfence here, because loads are in-order */
"jmp 1b\n"
"2:"
: "+r" (inc), "+m" (*lock), "=&r" (tmp)
:
: "memory", "cc");
asm volatile ("lock xaddl %0, %1\n"
: "+r" (inc), "+m" (*(lock)) : : "memory", "cc");
if (inc.head == inc.tail)
goto out;
for (;;) {
if (*((volatile __ticket_t *)&lock->tickets.head) == inc.tail)
goto out;
cpu_pause();
}
out:
barrier(); /* make sure nothing creeps before the lock is taken */
}
#ifdef DEBUG_SPINLOCK
@ -106,8 +154,11 @@ __kprintf("[%d] ret ihk_mc_spinlock_unlock_noirq\n", ihk_mc_get_processor_id());
#endif
static void __ihk_mc_spinlock_unlock_noirq(ihk_spinlock_t *lock)
{
asm volatile ("lock incw %0" : "+m"(*lock) : : "memory", "cc");
__ticket_t inc = 0x0002;
asm volatile ("lock addw %1, %0\n"
: "+m" (lock->tickets.head) : "ri" (inc) : "memory", "cc");
preempt_enable();
}
@ -132,7 +183,11 @@ typedef struct mcs_lock_node {
unsigned long locked;
struct mcs_lock_node *next;
unsigned long irqsave;
} __attribute__((aligned(64))) mcs_lock_node_t;
#ifndef ENABLE_UBSAN
} __aligned(64) mcs_lock_node_t;
#else
} mcs_lock_node_t;
#endif
typedef mcs_lock_node_t mcs_lock_t;
@ -225,14 +280,22 @@ typedef struct mcs_rwlock_node {
char dmy1; // unused
char dmy2; // unused
struct mcs_rwlock_node *next;
} __attribute__((aligned(64))) mcs_rwlock_node_t;
#ifndef ENABLE_UBSAN
} __aligned(64) mcs_rwlock_node_t;
#else
} mcs_rwlock_node_t;
#endif
typedef struct mcs_rwlock_node_irqsave {
#ifndef SPINLOCK_IN_MCS_RWLOCK
struct mcs_rwlock_node node;
#endif
unsigned long irqsave;
} __attribute__((aligned(64))) mcs_rwlock_node_irqsave_t;
#ifndef ENABLE_UBSAN
} __aligned(64) mcs_rwlock_node_irqsave_t;
#else
} mcs_rwlock_node_irqsave_t;
#endif
typedef struct mcs_rwlock_lock {
#ifdef SPINLOCK_IN_MCS_RWLOCK
@ -241,7 +304,11 @@ typedef struct mcs_rwlock_lock {
struct mcs_rwlock_node reader; /* common reader lock */
struct mcs_rwlock_node *node; /* base */
#endif
} __attribute__((aligned(64))) mcs_rwlock_lock_t;
#ifndef ENABLE_UBSAN
} __aligned(64) mcs_rwlock_lock_t;
#else
} mcs_rwlock_lock_t;
#endif
static void
mcs_rwlock_init(struct mcs_rwlock_lock *lock)
@ -602,4 +669,9 @@ __mcs_rwlock_reader_unlock(struct mcs_rwlock_lock *lock, struct mcs_rwlock_node_
#endif
}
static inline int irqflags_can_interrupt(unsigned long flags)
{
return !!(flags & 0x200);
}
#endif

View File

@ -1,3 +1,4 @@
/* arch-memory.h COPYRIGHT FUJITSU LIMITED 2018 */
/**
* \file arch-memomry.h
* License details are found in the file LICENSE.
@ -40,18 +41,37 @@
#define LARGE_PAGE_MASK (~((unsigned long)LARGE_PAGE_SIZE - 1))
#define LARGE_PAGE_P2ALIGN (LARGE_PAGE_SHIFT - PAGE_SHIFT)
#define USER_END 0x0000800000000000UL
#define TASK_UNMAPPED_BASE 0x00002AAAAAA00000UL
#define USER_END 0x0000800000000000UL
#define LD_TASK_UNMAPPED_BASE 0x0000155555500000UL
#define TASK_UNMAPPED_BASE 0x00002AAAAAA00000UL
/*
* Canonical negative addresses (i.e., the smallest kernel virtual address)
* on x86 64 bit mode (in its most restricted 48 bit format) starts from
* 0xffff800000000000, but Linux starts mapping physical memory at 0xffff880000000000.
* The 0x80000000000 long gap (8TBs, i.e., 16 PGD level entries in the page tables)
* is used for Xen hyervisor (see arch/x86/include/asm/page.h) and that is
* what we utilize for McKernel.
* This gives us the benefit of being able to use Linux kernel virtual
* addresses identically as in Linux.
*
* NOTE: update these also in eclair.c when modified!
*/
#define MAP_ST_START 0xffff800000000000UL
#define MAP_VMAP_START 0xfffff00000000000UL
#define MAP_FIXED_START 0xffffffff70000000UL
#define MAP_KERNEL_START 0xffffffff80000000UL
#define MAP_VMAP_START 0xffff850000000000UL
#define MAP_FIXED_START 0xffff860000000000UL
#define LINUX_PAGE_OFFSET 0xffff880000000000UL
/*
* MAP_KERNEL_START is 8MB below MODULES_END in Linux.
* Placing the LWK image in the virtual address space at the end of
* the Linux modules section enables us to map the LWK TEXT in Linux
* as well, so that Linux can also call into LWK text.
*/
#define MAP_KERNEL_START 0xFFFFFFFFFE800000UL
#define STACK_TOP(region) ((region)->user_end)
#define MAP_VMAP_SIZE 0x0000000100000000UL
#define KERNEL_PHYS_OFFSET MAP_ST_START
#define PTL4_SHIFT 39
#define PTL4_SIZE (1UL << PTL4_SHIFT)
#define PTL3_SHIFT 30
@ -142,6 +162,10 @@ typedef unsigned long pte_t;
#define PM_PRESENT PM_STATUS(4LL)
#define PM_SWAP PM_STATUS(2LL)
#define USER_STACK_PREPAGE_SIZE LARGE_PAGE_SIZE
#define USER_STACK_PAGE_MASK LARGE_PAGE_MASK
#define USER_STACK_PAGE_P2ALIGN LARGE_PAGE_P2ALIGN
#define USER_STACK_PAGE_SHIFT LARGE_PAGE_SHIFT
/* For easy conversion, it is better to be the same as architecture's ones */
enum ihk_mc_pt_attribute {
@ -314,7 +338,93 @@ static inline void pte_set_dirty(pte_t *ptep, size_t pgsize)
return;
}
static inline int pte_is_contiguous(pte_t *ptep)
{
return 0;
}
static inline int pgsize_is_contiguous(size_t pgsize)
{
return 0;
}
static inline int pgsize_to_tbllv(size_t pgsize)
{
switch (pgsize) {
case PTL1_SIZE: return 1;
case PTL2_SIZE: return 2;
case PTL3_SIZE: return 3;
case PTL4_SIZE: return 4;
default:
#if 0 /* XXX: workaround. cannot use panic() here */
panic("pgsize_to_tbllv");
#else
return 0;
#endif
}
return 0;
}
static inline size_t tbllv_to_pgsize(int level)
{
switch (level) {
case 1: return PTL1_SIZE;
case 2: return PTL2_SIZE;
case 3: return PTL3_SIZE;
case 4: return PTL4_SIZE;
default:
#if 0 /* XXX: workaround. cannot use panic() here */
panic("tbllv_to_pgsize");
#else
return 0;
#endif
}
return 0;
}
static inline size_t tbllv_to_contpgsize(int level)
{
return 0;
}
static inline int tbllv_to_contpgshift(int level)
{
return 0;
}
static inline pte_t *get_contiguous_head(pte_t *__ptep, size_t __pgsize)
{
return __ptep;
}
static inline pte_t *get_contiguous_tail(pte_t *__ptep, size_t __pgsize)
{
return __ptep;
}
static inline int split_contiguous_pages(pte_t *ptep, size_t pgsize)
{
return 0;
}
static inline int page_is_contiguous_head(pte_t *ptep, size_t pgsize)
{
return 0;
}
static inline int page_is_contiguous_tail(pte_t *ptep, size_t pgsize)
{
return 0;
}
struct page_table;
static inline void arch_adjust_allocate_page_size(struct page_table *pt,
uintptr_t fault_addr,
pte_t *ptep,
void **pgaddrp,
size_t *pgsizep)
{
}
void set_pte(pte_t *ppte, unsigned long phys, enum ihk_mc_pt_attribute attr);
pte_t *get_pte(struct page_table *pt, void *virt, enum ihk_mc_pt_attribute attr);

View File

@ -7,7 +7,7 @@
#define IHK_OS_PGSIZE_2MB 1
#define IHK_OS_PGSIZE_1GB 2
extern struct rusage_global *rusage;
extern struct rusage_global rusage;
static inline int rusage_pgsize_to_pgtype(size_t pgsize)
{

View File

@ -50,6 +50,7 @@ struct x86_cpu_local_variables {
struct x86_cpu_local_variables *get_x86_cpu_local_variable(int id);
struct x86_cpu_local_variables *get_x86_this_cpu_local(void);
void *get_x86_cpu_local_kstack(int id);
void *get_x86_this_cpu_kstack(void);

View File

@ -1,4 +1,4 @@
#ifdef POSTK_DEBUG_ARCH_DEP_18 /* coredump arch separation. */
/* elf.h COPYRIGHT FUJITSU LIMITED 2018 */
#ifndef __HEADER_X86_COMMON_ELF_H
#define __HEADER_X86_COMMON_ELF_H
@ -56,4 +56,3 @@ struct user_regs64_struct
typedef elf_greg64_t elf_gregset64_t[ELF_NGREG64];
#endif /* __HEADER_S64FX_COMMON_ELF_H */
#endif /* !POSTK_DEBUG_ARCH_DEP_18 */

View File

@ -1,94 +0,0 @@
#ifndef POSTK_DEBUG_ARCH_DEP_18 /* coredump arch separation. */
/*
* Structures and definitions for ELF core file.
* Extracted from
* System V Application Binary Interface - DRAFT - 10 June 2013,
* http://www.sco.com/developers/gabi/latest/contents.html
*/
typedef uint16_t Elf64_Half;
typedef uint32_t Elf64_Word;
typedef uint64_t Elf64_Xword;
typedef uint64_t Elf64_Addr;
typedef uint64_t Elf64_Off;
#define EI_NIDENT 16
typedef struct {
unsigned char e_ident[EI_NIDENT];
Elf64_Half e_type;
Elf64_Half e_machine;
Elf64_Word e_version;
Elf64_Addr e_entry;
Elf64_Off e_phoff;
Elf64_Off e_shoff;
Elf64_Word e_flags;
Elf64_Half e_ehsize;
Elf64_Half e_phentsize;
Elf64_Half e_phnum;
Elf64_Half e_shentsize;
Elf64_Half e_shnum;
Elf64_Half e_shstrndx;
} Elf64_Ehdr;
#define EI_MAG0 0
#define EI_MAG1 1
#define EI_MAG2 2
#define EI_MAG3 3
#define EI_CLASS 4
#define EI_DATA 5
#define EI_VERSION 6
#define EI_OSABI 7
#define EI_ABIVERSION 8
#define EI_PAD 9
#define ELFMAG0 0x7f
#define ELFMAG1 'E'
#define ELFMAG2 'L'
#define ELFMAG3 'F'
#define ELFCLASS64 2 /* 64-bit object */
#define ELFDATA2LSB 1 /* LSB */
#define El_VERSION 1 /* defined to be the same as EV CURRENT */
#define ELFOSABI_NONE 0 /* unspecied */
#define El_ABIVERSION_NONE 0 /* unspecied */
#define ET_CORE 4 /* Core file */
#define EM_X86_64 62 /* AMD x86-64 architecture */
#define EM_K10M 181 /* Intel K10M */
#define EV_CURRENT 1 /* Current version */
typedef struct {
Elf64_Word p_type;
Elf64_Word p_flags;
Elf64_Off p_offset;
Elf64_Addr p_vaddr;
Elf64_Addr p_paddr;
Elf64_Xword p_filesz;
Elf64_Xword p_memsz;
Elf64_Xword p_align;
} Elf64_Phdr;
#define PT_LOAD 1
#define PT_NOTE 4
#define PF_X 1 /* executable bit */
#define PF_W 2 /* writable bit */
#define PF_R 4 /* readable bit */
struct note {
Elf64_Word namesz;
Elf64_Word descsz;
Elf64_Word type;
/* name char[namesz] and desc[descsz] */
};
#define NT_PRSTATUS 1
#define NT_PRFRPREG 2
#define NT_PRPSINFO 3
#define NT_AUXV 6
#define NT_X86_STATE 0x202
#include "elfcoregpl.h"
#endif /* !POSTK_DEBUG_ARCH_DEP_18 */

View File

@ -1,96 +0,0 @@
#ifndef POSTK_DEBUG_ARCH_DEP_18 /* coredump arch separation. */
/*
* Structures and defines from GPLed file.
*/
#define pid_t int
/* From /usr/include/linux/elfcore.h of Linux */
#define ELF_PRARGSZ (80)
/* From /usr/include/linux/elfcore.h fro Linux */
struct elf_siginfo
{
int si_signo;
int si_code;
int si_errno;
};
/* From bfd/hosts/x86-64linux.h of gdb. */
typedef uint64_t __attribute__ ((__aligned__ (8))) a8_uint64_t;
typedef a8_uint64_t elf_greg64_t;
struct user_regs64_struct
{
a8_uint64_t r15;
a8_uint64_t r14;
a8_uint64_t r13;
a8_uint64_t r12;
a8_uint64_t rbp;
a8_uint64_t rbx;
a8_uint64_t r11;
a8_uint64_t r10;
a8_uint64_t r9;
a8_uint64_t r8;
a8_uint64_t rax;
a8_uint64_t rcx;
a8_uint64_t rdx;
a8_uint64_t rsi;
a8_uint64_t rdi;
a8_uint64_t orig_rax;
a8_uint64_t rip;
a8_uint64_t cs;
a8_uint64_t eflags;
a8_uint64_t rsp;
a8_uint64_t ss;
a8_uint64_t fs_base;
a8_uint64_t gs_base;
a8_uint64_t ds;
a8_uint64_t es;
a8_uint64_t fs;
a8_uint64_t gs;
};
#define ELF_NGREG64 (sizeof (struct user_regs64_struct) / sizeof(elf_greg64_t))
typedef elf_greg64_t elf_gregset64_t[ELF_NGREG64];
struct prstatus64_timeval
{
a8_uint64_t tv_sec;
a8_uint64_t tv_usec;
};
struct elf_prstatus64
{
struct elf_siginfo pr_info;
short int pr_cursig;
a8_uint64_t pr_sigpend;
a8_uint64_t pr_sighold;
pid_t pr_pid;
pid_t pr_ppid;
pid_t pr_pgrp;
pid_t pr_sid;
struct prstatus64_timeval pr_utime;
struct prstatus64_timeval pr_stime;
struct prstatus64_timeval pr_cutime;
struct prstatus64_timeval pr_cstime;
elf_gregset64_t pr_reg;
int pr_fpvalid;
};
struct elf_prpsinfo64
{
char pr_state;
char pr_sname;
char pr_zomb;
char pr_nice;
a8_uint64_t pr_flag;
unsigned int pr_uid;
unsigned int pr_gid;
int pr_pid, pr_ppid, pr_pgrp, pr_sid;
char pr_fname[16];
char pr_psargs[ELF_PRARGSZ];
};
#endif /* !POSTK_DEBUG_ARCH_DEP_18 */

View File

@ -1,5 +1,4 @@
/* hwcap.h COPYRIGHT FUJITSU LIMITED 2017 */
#ifdef POSTK_DEBUG_ARCH_DEP_65
/* hwcap.h COPYRIGHT FUJITSU LIMITED 2017-2018 */
#ifndef _UAPI__ASM_HWCAP_H
#define _UAPI__ASM_HWCAP_H
@ -9,4 +8,3 @@ static unsigned long arch_get_hwcap(void)
}
#endif /* _UAPI__ASM_HWCAP_H */
#endif /* POSTK_DEBUG_ARCH_DEP_65 */

View File

@ -1,3 +1,4 @@
/* types.h COPYRIGHT FUJITSU LIMITED 2018 */
/**
* \file types.h
* Licence details are found in the file LICENSE.
@ -29,13 +30,11 @@ typedef uint64_t size_t;
typedef int64_t ssize_t;
typedef int64_t off_t;
#ifdef POSTK_DEBUG_ARCH_DEP_18 /* coredump arch separation. */
typedef int32_t key_t;
typedef uint32_t uid_t;
typedef uint32_t gid_t;
typedef int64_t time_t;
typedef int32_t pid_t;
#endif /* POSTK_DEBUG_ARCH_DEP_18 */
#define NULL ((void *)0)

View File

@ -9,6 +9,9 @@
#ifndef __ARCH_PRCTL_H
#define __ARCH_PRCTL_H
#define PR_SET_THP_DISABLE 41
#define PR_GET_THP_DISABLE 42
#define ARCH_SET_GS 0x1001
#define ARCH_SET_FS 0x1002
#define ARCH_GET_FS 0x1003

View File

@ -1,4 +1,4 @@
/* syscall_list.h COPYRIGHT FUJITSU LIMITED 2017 */
/* syscall_list.h COPYRIGHT FUJITSU LIMITED 2017-2018 */
/**
* \file syscall_list.h
* License details are found in the file LICENSE.
@ -109,12 +109,13 @@ SYSCALL_HANDLED(149, mlock)
SYSCALL_HANDLED(150, munlock)
SYSCALL_HANDLED(151, mlockall)
SYSCALL_HANDLED(152, munlockall)
SYSCALL_HANDLED(157, prctl)
SYSCALL_HANDLED(158, arch_prctl)
SYSCALL_HANDLED(160, setrlimit)
SYSCALL_HANDLED(164, settimeofday)
SYSCALL_HANDLED(186, gettid)
SYSCALL_HANDLED(200, tkill)
SYSCALL_DELEGATED(201, time)
SYSCALL_HANDLED(201, time)
SYSCALL_HANDLED(202, futex)
SYSCALL_HANDLED(203, sched_setaffinity)
SYSCALL_HANDLED(204, sched_getaffinity)
@ -133,9 +134,19 @@ SYSCALL_HANDLED(238, set_mempolicy)
SYSCALL_HANDLED(239, get_mempolicy)
SYSCALL_HANDLED(247, waitid)
SYSCALL_HANDLED(256, migrate_pages)
#ifdef POSTK_DEBUG_ARCH_DEP_62 /* Absorb the difference between open and openat args. */
SYSCALL_HANDLED(257, openat)
#endif /* POSTK_DEBUG_ARCH_DEP_62 */
SYSCALL_DELEGATED(258, mkdirat)
SYSCALL_DELEGATED(259, mknodat)
SYSCALL_DELEGATED(260, fchownat)
SYSCALL_DELEGATED(261, futimesat)
SYSCALL_DELEGATED(262, newfstatat)
SYSCALL_DELEGATED(263, unlinkat)
SYSCALL_DELEGATED(264, renameat)
SYSCALL_DELEGATED(265, linkat)
SYSCALL_DELEGATED(266, symlinkat)
SYSCALL_DELEGATED(267, readlinkat)
SYSCALL_DELEGATED(268, fchmodat)
SYSCALL_DELEGATED(269, faccessat)
SYSCALL_DELEGATED(270, pselect6)
SYSCALL_DELEGATED(271, ppoll)
SYSCALL_HANDLED(273, set_robust_list)
@ -161,6 +172,7 @@ SYSCALL_HANDLED(__NR_profile, profile)
SYSCALL_HANDLED(730, util_migrate_inter_kernel)
SYSCALL_HANDLED(731, util_indicate_clone)
SYSCALL_HANDLED(732, get_system)
SYSCALL_HANDLED(733, util_register_desc)
/* McKernel Specific */
SYSCALL_HANDLED(801, swapout)

View File

@ -49,7 +49,7 @@ struct x86_cpu_local_variables *get_x86_cpu_local_variable(int id)
((char *)locals + (LOCALS_SPAN * id));
}
static void *get_x86_cpu_local_kstack(int id)
void *get_x86_cpu_local_kstack(int id)
{
return ((char *)locals + (LOCALS_SPAN * (id + 1)));
}
@ -107,9 +107,17 @@ void init_boot_processor_local(void)
@ ensures \result == %gs;
@ assigns \nothing;
*/
extern int num_processors;
int ihk_mc_get_processor_id(void)
{
int id;
void *gs;
gs = (void *)rdmsr(MSR_GS_BASE);
if (gs < (void *)locals ||
gs > ((void *)locals + LOCALS_SPAN * num_processors)) {
return -1;
}
asm volatile("movl %%gs:0, %0" : "=r"(id));

View File

@ -1,3 +1,4 @@
/* memory.c COPYRIGHT FUJITSU LIMITED 2018 */
/**
* \file memory.c
* License details are found in the file LICENSE.
@ -25,15 +26,13 @@
#include <cls.h>
#include <kmalloc.h>
#include <rusage_private.h>
#include <debug.h>
//#define DEBUG
#ifdef DEBUG
#define dkprintf(...) do { kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) do { kprintf(__VA_ARGS__); } while (0)
#else
#define dkprintf(...) do { } while (0)
#define ekprintf(...) do { kprintf(__VA_ARGS__); } while (0)
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
static char *last_page;
@ -109,6 +108,8 @@ struct page_table {
};
static struct page_table *init_pt;
static struct page_table *boot_pt;
static int init_pt_loaded = 0;
static ihk_spinlock_t init_pt_lock;
static int use_1gb_page = 0;
@ -167,30 +168,6 @@ static unsigned long setup_l3(struct page_table *pt,
return virt_to_phys(pt);
}
static void init_normal_area(struct page_table *pt)
{
unsigned long map_start, map_end, phys, pt_phys;
int ident_index, virt_index;
map_start = ihk_mc_get_memory_address(IHK_MC_GMA_MAP_START, 0);
map_end = ihk_mc_get_memory_address(IHK_MC_GMA_MAP_END, 0);
kprintf("map_start = %lx, map_end = %lx\n", map_start, map_end);
ident_index = map_start >> PTL4_SHIFT;
virt_index = (MAP_ST_START >> PTL4_SHIFT) & (PT_ENTRIES - 1);
memset(pt, 0, sizeof(struct page_table));
for (phys = (map_start & ~(PTL4_SIZE - 1)); phys < map_end;
phys += PTL4_SIZE) {
pt_phys = setup_l3(ihk_mc_alloc_pages(1, IHK_MC_AP_CRITICAL), phys,
map_start, map_end);
pt->entry[ident_index++] = pt_phys | PFL4_PDIR_ATTR;
pt->entry[virt_index++] = pt_phys | PFL4_PDIR_ATTR;
}
}
static struct page_table *__alloc_new_pt(ihk_mc_ap_flag ap_flag)
{
struct page_table *newpt = ihk_mc_alloc_pages(1, ap_flag);
@ -258,6 +235,11 @@ static unsigned long attr_to_l1attr(enum ihk_mc_pt_attribute attr)
}
}
#define PTLX_SHIFT(index) PTL ## index ## _SHIFT
#define GET_VIRT_INDEX(virt, index, dest) \
dest = ((virt) >> PTLX_SHIFT(index)) & (PT_ENTRIES - 1)
#define GET_VIRT_INDICES(virt, l4i, l3i, l2i, l1i) \
l4i = ((virt) >> PTL4_SHIFT) & (PT_ENTRIES - 1); \
l3i = ((virt) >> PTL3_SHIFT) & (PT_ENTRIES - 1); \
@ -1518,12 +1500,14 @@ static int clear_range_l1(void *args0, pte_t *ptep, uint64_t base,
if (page) {
dkprintf("%s: page=%p,is_in_memobj=%d,(old & PFL1_DIRTY)=%lx,memobj=%p,args->memobj->flags=%x\n", __FUNCTION__, page, page_is_in_memobj(page), (old & PFL1_DIRTY), args->memobj, args->memobj ? args->memobj->flags : -1);
}
if (page && page_is_in_memobj(page) && (old & PFL1_DIRTY) && (args->memobj) &&
!(args->memobj->flags & MF_ZEROFILL)) {
if (page && page_is_in_memobj(page) &&
pte_is_dirty(&old, PTL1_SIZE) && args->memobj &&
!(args->memobj->flags & (MF_ZEROFILL | MF_PRIVATE))) {
memobj_flush_page(args->memobj, phys, PTL1_SIZE);
}
if (!(old & PFL1_FILEOFF)) {
if (!pte_is_fileoff(&old, PTL1_SIZE)) {
if(args->free_physical) {
if (!page) {
/* Anonymous || !XPMEM attach */
@ -1585,11 +1569,13 @@ static int clear_range_l2(void *args0, pte_t *ptep, uint64_t base,
page = phys_to_page(phys);
}
if (page && page_is_in_memobj(page) && (old & PFL2_DIRTY)) {
if (page && page_is_in_memobj(page) &&
pte_is_dirty(&old, PTL2_SIZE) && args->memobj &&
!(args->memobj->flags & (MF_ZEROFILL | MF_PRIVATE))) {
memobj_flush_page(args->memobj, phys, PTL2_SIZE);
}
if (!(old & PFL2_FILEOFF)) {
if (!pte_is_fileoff(&old, PTL2_SIZE)) {
if(args->free_physical) {
if (!page) {
/* Anonymous || !XPMEM attach */
@ -1666,13 +1652,15 @@ static int clear_range_l3(void *args0, pte_t *ptep, uint64_t base,
page = phys_to_page(phys);
}
if (page && page_is_in_memobj(page) && (old & PFL3_DIRTY)) {
if (page && page_is_in_memobj(page) &&
pte_is_dirty(&old, PTL3_SIZE) && args->memobj &&
!(args->memobj->flags & (MF_ZEROFILL | MF_PRIVATE))) {
memobj_flush_page(args->memobj, phys, PTL3_SIZE);
}
dkprintf("%s: phys=%ld, pte_get_phys(&old),PTL3_SIZE\n", __FUNCTION__, pte_get_phys(&old));
if (!(old & PFL3_FILEOFF)) {
if (!pte_is_fileoff(&old, PTL3_SIZE)) {
if(args->free_physical) {
if (!page) {
/* Anonymous || !XPMEM attach */
@ -2259,7 +2247,7 @@ out:
int ihk_mc_pt_set_range(page_table_t pt, struct process_vm *vm, void *start,
void *end, uintptr_t phys, enum ihk_mc_pt_attribute attr,
int pgshift, struct vm_range *range)
int pgshift, struct vm_range *range, int overwrite)
{
int error;
struct set_range_args args;
@ -2468,7 +2456,7 @@ static int move_one_page(void *arg0, page_table_t pt, pte_t *ptep,
attr = apte & ~PT_PHYSMASK;
error = ihk_mc_pt_set_range(pt, args->vm, (void *)dest,
(void *)(dest + pgsize), phys, attr, pgshift, args->range);
(void *)(dest + pgsize), phys, attr, pgshift, args->range, 0);
if (error) {
kprintf("move_one_page(%p,%p,%p %#lx,%p,%d):"
"set failed. %d\n",
@ -2532,6 +2520,11 @@ struct page_table *get_init_page_table(void)
return init_pt;
}
struct page_table *get_boot_page_table(void)
{
return boot_pt;
}
static unsigned long fixed_virt;
static void init_fixed_area(struct page_table *pt)
{
@ -2540,6 +2533,84 @@ static void init_fixed_area(struct page_table *pt)
return;
}
static void init_normal_area(struct page_table *pt)
{
unsigned long map_start, map_end, phys;
void *virt;
map_start = ihk_mc_get_memory_address(IHK_MC_GMA_MAP_START, 0);
map_end = ihk_mc_get_memory_address(IHK_MC_GMA_MAP_END, 0);
virt = (void *)MAP_ST_START + map_start;
kprintf("map_start = %lx, map_end = %lx, virt %lx\n",
map_start, map_end, virt);
for (phys = map_start; phys < map_end; phys += LARGE_PAGE_SIZE) {
if (set_pt_large_page(pt, virt, phys, PTATTR_WRITABLE) != 0) {
kprintf("%s: error setting mapping for 0x%lx\n",
__func__, virt);
}
virt += LARGE_PAGE_SIZE;
}
}
extern char *find_command_line(char *name);
static void init_linux_kernel_mapping(struct page_table *pt)
{
unsigned long map_start, map_end, phys;
void *virt;
int nr_memory_chunks, chunk_id, numa_id;
/* In case of safe_kernel_map option (safe_kernel_map == 1),
* processing to prevent destruction of the memory area on Linux side
* is executed */
if (find_command_line("safe_kernel_map") == NULL) {
kprintf("Straight-map entire physical memory\n");
/* Map 2 TB for now */
map_start = 0;
map_end = 0x20000000000;
virt = (void *)LINUX_PAGE_OFFSET;
kprintf("Linux kernel virtual: 0x%lx - 0x%lx -> 0x%lx - 0x%lx\n",
LINUX_PAGE_OFFSET, LINUX_PAGE_OFFSET + map_end, 0, map_end);
for (phys = map_start; phys < map_end; phys += LARGE_PAGE_SIZE) {
if (set_pt_large_page(pt, virt, phys, PTATTR_WRITABLE) != 0) {
kprintf("%s: error setting mapping for 0x%lx\n", __FUNCTION__, virt);
}
virt += LARGE_PAGE_SIZE;
}
} else {
kprintf("Straight-map physical memory areas allocated to McKernel\n");
nr_memory_chunks = ihk_mc_get_nr_memory_chunks();
if (nr_memory_chunks == 0) {
kprintf("%s: ERROR: No memory chunk available.\n", __FUNCTION__);
return;
}
for (chunk_id = 0; chunk_id < nr_memory_chunks; chunk_id++) {
if (ihk_mc_get_memory_chunk(chunk_id, &map_start, &map_end, &numa_id)) {
kprintf("%s: ERROR: Memory chunk id (%d) out of range.\n", __FUNCTION__, chunk_id);
continue;
}
dkprintf("Linux kernel virtual: 0x%lx - 0x%lx -> 0x%lx - 0x%lx\n",
LINUX_PAGE_OFFSET + map_start, LINUX_PAGE_OFFSET + map_end, map_start, map_end);
virt = (void *)(LINUX_PAGE_OFFSET + map_start);
for (phys = map_start; phys < map_end; phys += LARGE_PAGE_SIZE, virt += LARGE_PAGE_SIZE) {
if (set_pt_large_page(pt, virt, phys, PTATTR_WRITABLE) != 0) {
kprintf("%s: set_pt_large_page() failed for 0x%lx\n", __FUNCTION__, virt);
}
}
}
}
}
void init_text_area(struct page_table *pt)
{
unsigned long __end, phys, virt;
@ -2624,17 +2695,27 @@ void init_page_table(void)
init_pt = ihk_mc_alloc_pages(1, IHK_MC_AP_CRITICAL);
ihk_mc_spinlock_init(&init_pt_lock);
memset(init_pt, 0, sizeof(PAGE_SIZE));
memset(init_pt, 0, sizeof(*init_pt));
/* Normal memory area */
init_normal_area(init_pt);
init_linux_kernel_mapping(init_pt);
init_fixed_area(init_pt);
init_low_area(init_pt);
init_text_area(init_pt);
init_vsyscall_area(init_pt);
/* boot page table: needs zero mapping in order to execute the next
* instruction that jumps into regular regions
*/
boot_pt = ihk_mc_alloc_pages(1, IHK_MC_AP_CRITICAL);
memcpy(boot_pt, init_pt, sizeof(*boot_pt));
init_low_area(boot_pt);
if (memcmp(init_pt, boot_pt, sizeof(*init_pt)) == 0)
panic("init low area for boot pt did not affect toplevel entry");
load_page_table(init_pt);
kprintf("Page table is now at %p\n", init_pt);
init_pt_loaded = 1;
kprintf("Page table is now at 0x%lx\n", init_pt);
}
extern void __reserve_arch_pages(unsigned long, unsigned long,
@ -2662,17 +2743,33 @@ void ihk_mc_reserve_arch_pages(struct ihk_page_allocator_desc *pa_allocator,
unsigned long virt_to_phys(void *v)
{
unsigned long va = (unsigned long)v;
if (va >= MAP_KERNEL_START) {
dkprintf("%s: MAP_KERNEL_START <= 0x%lx <= LINUX_PAGE_OFFSET\n",
__FUNCTION__, va);
return va - MAP_KERNEL_START + x86_kernel_phys_base;
} else {
}
else if (va >= LINUX_PAGE_OFFSET) {
return va - LINUX_PAGE_OFFSET;
}
else if (va >= MAP_FIXED_START) {
return va - MAP_FIXED_START;
}
else {
dkprintf("%s: MAP_ST_START <= 0x%lx <= MAP_FIXED_START\n",
__FUNCTION__, va);
return va - MAP_ST_START;
}
}
void *phys_to_virt(unsigned long p)
{
return (void *)(p + MAP_ST_START);
/* Before loading our own PT use straight mapping */
if (!init_pt_loaded) {
return (void *)(p + MAP_ST_START);
}
return (void *)(p + LINUX_PAGE_OFFSET);
}
int copy_from_user(void *dst, const void *src, size_t siz)
@ -2840,17 +2937,12 @@ int read_process_vm(struct process_vm *vm, void *kdst, const void *usrc, size_t
return error;
}
#ifdef POSTK_DEBUG_TEMP_FIX_52 /* NUMA support(memory area determination) */
if (!is_mckernel_memory(pa)) {
#else
if (pa < ihk_mc_get_memory_address(IHK_MC_GMA_MAP_START, 0) ||
pa >= ihk_mc_get_memory_address(IHK_MC_GMA_MAP_END, 0)) {
#endif /* POSTK_DEBUG_TEMP_FIX_52 */
if (!is_mckernel_memory(pa, pa + cpsize)) {
dkprintf("%s: pa is outside of LWK memory, to: %p, pa: %p,"
"cpsize: %d\n", __FUNCTION__, to, pa, cpsize);
va = ihk_mc_map_virtual(pa, 1, PTATTR_ACTIVE);
memcpy(to, va, cpsize);
ihk_mc_unmap_virtual(va, 1, 1);
ihk_mc_unmap_virtual(va, 1);
}
else {
va = phys_to_virt(pa);
@ -2924,17 +3016,12 @@ int write_process_vm(struct process_vm *vm, void *udst, const void *ksrc, size_t
return error;
}
#ifdef POSTK_DEBUG_TEMP_FIX_52 /* NUMA support(memory area determination) */
if (!is_mckernel_memory(pa)) {
#else
if (pa < ihk_mc_get_memory_address(IHK_MC_GMA_MAP_START, 0) ||
pa >= ihk_mc_get_memory_address(IHK_MC_GMA_MAP_END, 0)) {
#endif /* POSTK_DEBUG_TEMP_FIX_52 */
if (!is_mckernel_memory(pa, pa + cpsize)) {
dkprintf("%s: pa is outside of LWK memory, from: %p,"
"pa: %p, cpsize: %d\n", __FUNCTION__, from, pa, cpsize);
va = ihk_mc_map_virtual(pa, 1, PTATTR_ACTIVE);
memcpy(va, from, cpsize);
ihk_mc_unmap_virtual(va, 1, 1);
ihk_mc_unmap_virtual(va, 1);
}
else {
va = phys_to_virt(pa);
@ -2995,17 +3082,12 @@ int patch_process_vm(struct process_vm *vm, void *udst, const void *ksrc, size_t
return error;
}
#ifdef POSTK_DEBUG_TEMP_FIX_52 /* NUMA support(memory area determination) */
if (!is_mckernel_memory(pa)) {
#else
if (pa < ihk_mc_get_memory_address(IHK_MC_GMA_MAP_START, 0) ||
pa >= ihk_mc_get_memory_address(IHK_MC_GMA_MAP_END, 0)) {
#endif /* POSTK_DEBUG_TEMP_FIX_52 */
if (!is_mckernel_memory(pa, pa + cpsize)) {
dkprintf("%s: pa is outside of LWK memory, from: %p,"
"pa: %p, cpsize: %d\n", __FUNCTION__, from, pa, cpsize);
va = ihk_mc_map_virtual(pa, 1, PTATTR_ACTIVE);
memcpy(va, from, cpsize);
ihk_mc_unmap_virtual(va, 1, 1);
ihk_mc_unmap_virtual(va, 1);
}
else {
va = phys_to_virt(pa);

View File

@ -30,7 +30,7 @@ int ihk_mc_ikc_init_first_local(struct ihk_ikc_channel_desc *channel,
memset(channel, 0, sizeof(struct ihk_ikc_channel_desc));
mikc_queue_pages = ((2 * num_processors * MASTER_IKCQ_PKTSIZE)
mikc_queue_pages = ((4 * num_processors * MASTER_IKCQ_PKTSIZE)
+ (PAGE_SIZE - 1)) / PAGE_SIZE;
/* Place both sides in this side */

View File

@ -1,3 +1,4 @@
/* perfctr.c COPYRIGHT FUJITSU LIMITED 2018 */
/**
* \file perfctr.c
* License details are found in the file LICENSE.
@ -16,20 +17,16 @@
#include <registers.h>
#include <mc_perf_event.h>
#include <config.h>
#include <debug.h>
extern unsigned int *x86_march_perfmap;
extern int running_on_kvm(void);
#ifdef POSTK_DEBUG_TEMP_FIX_31
int ihk_mc_perfctr_fixed_init(int counter, int mode);
#endif/*POSTK_DEBUG_TEMP_FIX_31*/
static int ihk_mc_perfctr_fixed_init(int counter, int mode);
//#define PERFCTR_DEBUG
#ifdef PERFCTR_DEBUG
#define dkprintf(...) do { kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) do { kprintf(__VA_ARGS__); } while (0)
#else
#define dkprintf(...) do { } while (0)
#define ekprintf(...) do { kprintf(__VA_ARGS__); } while (0)
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
#define X86_CR4_PCE 0x00000100
@ -43,11 +40,11 @@ int ihk_mc_perfctr_fixed_init(int counter, int mode);
} \
} while(0)
int perf_counters_discovered = 0;
int X86_IA32_NUM_PERF_COUNTERS = 0;
unsigned long X86_IA32_PERF_COUNTERS_MASK = 0;
int X86_IA32_NUM_FIXED_PERF_COUNTERS = 0;
unsigned long X86_IA32_FIXED_PERF_COUNTERS_MASK = 0;
int perf_counters_discovered;
int NUM_PERF_COUNTERS;
unsigned long PERF_COUNTERS_MASK;
int NUM_FIXED_PERF_COUNTERS;
unsigned long FIXED_PERF_COUNTERS_MASK;
void x86_init_perfctr(void)
{
@ -78,17 +75,17 @@ void x86_init_perfctr(void)
op = 0x0a;
asm volatile("cpuid" : "=a"(eax),"=b"(ebx),"=c"(ecx),"=d"(edx):"a"(op));
X86_IA32_NUM_PERF_COUNTERS = ((eax & 0xFF00) >> 8);
X86_IA32_PERF_COUNTERS_MASK = (1 << X86_IA32_NUM_PERF_COUNTERS) - 1;
NUM_PERF_COUNTERS = ((eax & 0xFF00) >> 8);
PERF_COUNTERS_MASK = (1 << NUM_PERF_COUNTERS) - 1;
X86_IA32_NUM_FIXED_PERF_COUNTERS = (edx & 0x0F);
X86_IA32_FIXED_PERF_COUNTERS_MASK =
((1UL << X86_IA32_NUM_FIXED_PERF_COUNTERS) - 1) <<
X86_IA32_BASE_FIXED_PERF_COUNTERS;
NUM_FIXED_PERF_COUNTERS = (edx & 0x0F);
FIXED_PERF_COUNTERS_MASK =
((1UL << NUM_FIXED_PERF_COUNTERS) - 1) <<
BASE_FIXED_PERF_COUNTERS;
perf_counters_discovered = 1;
kprintf("X86_IA32_NUM_PERF_COUNTERS: %d, X86_IA32_NUM_FIXED_PERF_COUNTERS: %d\n",
X86_IA32_NUM_PERF_COUNTERS, X86_IA32_NUM_FIXED_PERF_COUNTERS);
kprintf("NUM_PERF_COUNTERS: %d, NUM_FIXED_PERF_COUNTERS: %d\n",
NUM_PERF_COUNTERS, NUM_FIXED_PERF_COUNTERS);
}
/* Clear Fixed Counter Control */
@ -97,20 +94,20 @@ void x86_init_perfctr(void)
wrmsr(MSR_PERF_FIXED_CTRL, value);
/* Clear Generic Counter Control */
for(i = 0; i < X86_IA32_NUM_PERF_COUNTERS; i++) {
for (i = 0; i < NUM_PERF_COUNTERS; i++) {
wrmsr(MSR_IA32_PERFEVTSEL0 + i, 0);
}
/* Enable PMC Control */
value = rdmsr(MSR_PERF_GLOBAL_CTRL);
value |= X86_IA32_PERF_COUNTERS_MASK;
value |= X86_IA32_FIXED_PERF_COUNTERS_MASK;
value |= PERF_COUNTERS_MASK;
value |= FIXED_PERF_COUNTERS_MASK;
wrmsr(MSR_PERF_GLOBAL_CTRL, value);
}
static int set_perfctr_x86_direct(int counter, int mode, unsigned int value)
{
if (counter < 0 || counter >= X86_IA32_NUM_PERF_COUNTERS) {
if (counter < 0 || counter >= NUM_PERF_COUNTERS) {
return -EINVAL;
}
@ -149,13 +146,14 @@ static int set_pmc_x86_direct(int counter, long val)
val &= 0x000000ffffffffff; // 40bit Mask
cnt_bit = 1UL << counter;
if ( cnt_bit & X86_IA32_PERF_COUNTERS_MASK ) {
if (cnt_bit & PERF_COUNTERS_MASK) {
// set generic pmc
wrmsr(MSR_IA32_PMC0 + counter, val);
}
else if ( cnt_bit & X86_IA32_FIXED_PERF_COUNTERS_MASK ) {
else if (cnt_bit & FIXED_PERF_COUNTERS_MASK) {
// set fixed pmc
wrmsr(MSR_IA32_FIXED_CTR0 + counter - X86_IA32_BASE_FIXED_PERF_COUNTERS, val);
wrmsr(MSR_IA32_FIXED_CTR0 +
counter - BASE_FIXED_PERF_COUNTERS, val);
}
else {
return -EINVAL;
@ -175,10 +173,10 @@ static int set_fixed_counter(int counter, int mode)
{
unsigned long value = 0;
unsigned int ctr_mask = 0xf;
int counter_idx = counter - X86_IA32_BASE_FIXED_PERF_COUNTERS ;
int counter_idx = counter - BASE_FIXED_PERF_COUNTERS;
unsigned int set_val = 0;
if (counter_idx < 0 || counter_idx >= X86_IA32_NUM_FIXED_PERF_COUNTERS) {
if (counter_idx < 0 || counter_idx >= NUM_FIXED_PERF_COUNTERS) {
return -EINVAL;
}
@ -208,14 +206,13 @@ int ihk_mc_perfctr_init_raw(int counter, uint64_t config, int mode)
int ihk_mc_perfctr_init_raw(int counter, unsigned int code, int mode)
#endif /*POSTK_DEBUG_TEMP_FIX_29*/
{
#ifdef POSTK_DEBUG_TEMP_FIX_31
// PAPI_REF_CYC counted by fixed counter
if (counter >= X86_IA32_BASE_FIXED_PERF_COUNTERS) {
if (counter >= BASE_FIXED_PERF_COUNTERS &&
counter < BASE_FIXED_PERF_COUNTERS + NUM_FIXED_PERF_COUNTERS) {
return ihk_mc_perfctr_fixed_init(counter, mode);
}
#endif /*POSTK_DEBUG_TEMP_FIX_31*/
if (counter < 0 || counter >= X86_IA32_NUM_PERF_COUNTERS) {
if (counter < 0 || counter >= NUM_PERF_COUNTERS) {
return -EINVAL;
}
@ -248,7 +245,7 @@ int ihk_mc_perfctr_init(int counter, enum ihk_perfctr_type type, int mode)
}
#endif /*POSTK_DEBUG_TEMP_FIX_29*/
if (counter < 0 || counter >= X86_IA32_NUM_PERF_COUNTERS) {
if (counter < 0 || counter >= NUM_PERF_COUNTERS) {
return -EINVAL;
}
if (type < 0 || type >= PERFCTR_MAX_TYPE) {
@ -300,18 +297,11 @@ int ihk_mc_perfctr_set_extra(struct mc_perf_event *event)
extern void x86_march_perfctr_start(unsigned long counter_mask);
#endif
#ifdef POSTK_DEBUG_TEMP_FIX_30
int ihk_mc_perfctr_start(int counter)
#else
int ihk_mc_perfctr_start(unsigned long counter_mask)
#endif /*POSTK_DEBUG_TEMP_FIX_30*/
{
int ret = 0;
unsigned long value = 0;
unsigned long mask = X86_IA32_PERF_COUNTERS_MASK | X86_IA32_FIXED_PERF_COUNTERS_MASK;
#ifdef POSTK_DEBUG_TEMP_FIX_30
unsigned long counter_mask = 1UL << counter;
#endif /*POSTK_DEBUG_TEMP_FIX_30*/
unsigned long mask = PERF_COUNTERS_MASK | FIXED_PERF_COUNTERS_MASK;
PERFCTR_CHKANDJUMP(counter_mask & ~mask, "counter_mask out of range", -EINVAL);
@ -328,18 +318,11 @@ int ihk_mc_perfctr_start(unsigned long counter_mask)
goto fn_exit;
}
#ifdef POSTK_DEBUG_TEMP_FIX_30
int ihk_mc_perfctr_stop(int counter)
#else
int ihk_mc_perfctr_stop(unsigned long counter_mask)
#endif/*POSTK_DEBUG_TEMP_FIX_30*/
int ihk_mc_perfctr_stop(unsigned long counter_mask, int flags)
{
int ret = 0;
unsigned long value;
unsigned long mask = X86_IA32_PERF_COUNTERS_MASK | X86_IA32_FIXED_PERF_COUNTERS_MASK;
#ifdef POSTK_DEBUG_TEMP_FIX_30
unsigned long counter_mask = 1UL << counter;
#endif/*POSTK_DEBUG_TEMP_FIX_30*/
unsigned long mask = PERF_COUNTERS_MASK | FIXED_PERF_COUNTERS_MASK;
PERFCTR_CHKANDJUMP(counter_mask & ~mask, "counter_mask out of range", -EINVAL);
@ -372,14 +355,14 @@ int ihk_mc_perfctr_stop(unsigned long counter_mask)
}
// init for fixed counter
int ihk_mc_perfctr_fixed_init(int counter, int mode)
static int ihk_mc_perfctr_fixed_init(int counter, int mode)
{
unsigned long value = 0;
unsigned int ctr_mask = 0xf;
int counter_idx = counter - X86_IA32_BASE_FIXED_PERF_COUNTERS ;
int counter_idx = counter - BASE_FIXED_PERF_COUNTERS;
unsigned int set_val = 0;
if (counter_idx < 0 || counter_idx >= X86_IA32_NUM_FIXED_PERF_COUNTERS) {
if (counter_idx < 0 || counter_idx >= NUM_FIXED_PERF_COUNTERS) {
return -EINVAL;
}
@ -420,7 +403,7 @@ int ihk_mc_perfctr_read_mask(unsigned long counter_mask, unsigned long *value)
{
int i, j;
for (i = 0, j = 0; i < X86_IA32_NUM_PERF_COUNTERS && counter_mask;
for (i = 0, j = 0; i < NUM_PERF_COUNTERS && counter_mask;
i++, counter_mask >>= 1) {
if (counter_mask & 1) {
value[j++] = rdpmc(i);
@ -440,13 +423,14 @@ unsigned long ihk_mc_perfctr_read(int counter)
cnt_bit = 1UL << counter;
if ( cnt_bit & X86_IA32_PERF_COUNTERS_MASK ) {
if (cnt_bit & PERF_COUNTERS_MASK) {
// read generic pmc
retval = rdpmc(counter);
}
else if ( cnt_bit & X86_IA32_FIXED_PERF_COUNTERS_MASK ) {
else if (cnt_bit & FIXED_PERF_COUNTERS_MASK) {
// read fixed pmc
retval = rdpmc((1 << 30) + (counter - X86_IA32_BASE_FIXED_PERF_COUNTERS));
retval = rdpmc((1 << 30) +
(counter - BASE_FIXED_PERF_COUNTERS));
}
else {
retval = -EINVAL;
@ -468,12 +452,12 @@ unsigned long ihk_mc_perfctr_read_msr(int counter)
cnt_bit = 1UL << counter;
if ( cnt_bit & X86_IA32_PERF_COUNTERS_MASK ) {
if (cnt_bit & PERF_COUNTERS_MASK) {
// read generic pmc
idx = MSR_IA32_PMC0 + counter;
retval = (unsigned long) rdmsr(idx);
}
else if ( cnt_bit & X86_IA32_FIXED_PERF_COUNTERS_MASK ) {
else if (cnt_bit & FIXED_PERF_COUNTERS_MASK) {
// read fixed pmc
idx = MSR_IA32_FIXED_CTR0 + counter;
retval = (unsigned long) rdmsr(idx);
@ -506,8 +490,8 @@ int ihk_mc_perfctr_alloc_counter(unsigned int *type, unsigned long *config, unsi
}
// find avail generic counter
for(i = 0; i < X86_IA32_NUM_PERF_COUNTERS; i++) {
if(!(pmc_status & (1 << i))) {
for (i = 0; i < NUM_PERF_COUNTERS; i++) {
if (!(pmc_status & (1 << i))) {
ret = i;
break;
}
@ -515,3 +499,17 @@ int ihk_mc_perfctr_alloc_counter(unsigned int *type, unsigned long *config, unsi
return ret;
}
int ihk_mc_perf_counter_mask_check(unsigned long counter_mask)
{
if ((counter_mask & PERF_COUNTERS_MASK) |
(counter_mask & FIXED_PERF_COUNTERS_MASK)) {
return 1;
}
return 0;
}
int ihk_mc_perf_get_num_counters(void)
{
return NUM_PERF_COUNTERS;
}

View File

@ -1,3 +1,4 @@
/* syscall.c COPYRIGHT FUJITSU LIMITED 2018 */
/**
* \file syscall.c
* License details are found in the file LICENSE.
@ -31,12 +32,11 @@
#include <page.h>
#include <limits.h>
#include <syscall.h>
#include <debug.h>
void terminate_mcexec(int, int);
extern long do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact);
long syscall(int num, ihk_mc_user_context_t *ctx);
void set_signal(int sig, void *regs0, siginfo_t *info);
void check_signal(unsigned long rc, void *regs0, int num);
extern unsigned long do_fork(int, unsigned long, unsigned long, unsigned long,
unsigned long, unsigned long, unsigned long);
extern int get_xsave_size();
@ -45,11 +45,8 @@ extern uint64_t get_xsave_mask();
//#define DEBUG_PRINT_SC
#ifdef DEBUG_PRINT_SC
#define dkprintf kprintf
#define ekprintf(...) kprintf(__VA_ARGS__)
#else
#define dkprintf(...) do { if (0) kprintf(__VA_ARGS__); } while (0)
#define ekprintf(...) kprintf(__VA_ARGS__)
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#endif
uintptr_t debug_constants[] = {
@ -92,33 +89,45 @@ static ptrdiff_t vdso_offset;
extern int num_processors;
int obtain_clone_cpuid(cpu_set_t *cpu_set) {
int obtain_clone_cpuid(cpu_set_t *cpu_set, int use_last) {
int min_queue_len = -1;
int cpu, min_cpu = -1;
int cpu, min_cpu = -1, uti_cpu = -1;
unsigned long irqstate;
irqstate = ihk_mc_spinlock_lock(&runq_reservation_lock);
/* Find the first allowed core with the shortest run queue */
for (cpu = 0; cpu < num_processors; ++cpu) {
struct cpu_local_var *v;
unsigned long irqstate;
if (!CPU_ISSET(cpu, cpu_set)) continue;
v = get_cpu_local_var(cpu);
irqstate = ihk_mc_spinlock_lock(&v->runq_lock);
if (min_queue_len == -1 || v->runq_len < min_queue_len) {
min_queue_len = v->runq_len;
ihk_mc_spinlock_lock_noirq(&v->runq_lock);
dkprintf("%s: cpu=%d,runq_len=%d,runq_reserved=%d\n", __FUNCTION__, cpu, v->runq_len, v->runq_reserved);
if (min_queue_len == -1 || v->runq_len + v->runq_reserved < min_queue_len) {
min_queue_len = v->runq_len + v->runq_reserved;
min_cpu = cpu;
}
ihk_mc_spinlock_unlock(&v->runq_lock, irqstate);
/* Record the last tie CPU */
if (min_cpu != cpu && v->runq_len + v->runq_reserved == min_queue_len) {
uti_cpu = cpu;
}
dkprintf("%s: cpu=%d,runq_len=%d,runq_reserved=%d,min_cpu=%d,uti_cpu=%d\n", __FUNCTION__, cpu, v->runq_len, v->runq_reserved, min_cpu, uti_cpu);
ihk_mc_spinlock_unlock_noirq(&v->runq_lock);
#if 0
if (min_queue_len == 0)
break;
#endif
}
min_cpu = use_last ? uti_cpu : min_cpu;
if (min_cpu != -1) {
if (get_cpu_local_var(min_cpu)->status != CPU_STATUS_RESERVED)
get_cpu_local_var(min_cpu)->status = CPU_STATUS_RESERVED;
__sync_fetch_and_add(&get_cpu_local_var(min_cpu)->runq_reserved, 1);
}
ihk_mc_spinlock_unlock(&runq_reservation_lock, irqstate);
return min_cpu;
}
@ -161,6 +170,38 @@ fault:
return -EFAULT;
}
SYSCALL_DECLARE(prctl)
{
struct process *proc = cpu_local_var(current)->proc;
int option = (int)ihk_mc_syscall_arg0(ctx);
unsigned long arg2 = (unsigned long)ihk_mc_syscall_arg1(ctx);
unsigned long arg3 = (unsigned long)ihk_mc_syscall_arg2(ctx);
unsigned long arg4 = (unsigned long)ihk_mc_syscall_arg3(ctx);
unsigned long arg5 = (unsigned long)ihk_mc_syscall_arg4(ctx);
int ret = 0;
switch (option) {
case PR_SET_THP_DISABLE:
if (arg3 || arg4 || arg5) {
return -EINVAL;
}
proc->thp_disable = arg2;
ret = 0;
break;
case PR_GET_THP_DISABLE:
if (arg2 || arg3 || arg4 || arg5) {
return -EINVAL;
}
ret = proc->thp_disable;
break;
default:
ret = syscall_generic_forwarding(__NR_prctl, ctx);
break;
}
return ret;
}
struct sigsp {
unsigned long flags;
void *link;
@ -251,7 +292,7 @@ SYSCALL_DECLARE(rt_sigreturn)
info.si_code = TRAP_TRACE;
set_signal(SIGTRAP, regs, &info);
check_need_resched();
check_signal(0, regs, 0);
check_signal(0, regs, -1);
}
if(ksigsp.fpregs && xsavesize){
@ -276,7 +317,6 @@ SYSCALL_DECLARE(rt_sigreturn)
}
extern struct cpu_local_var *clv;
extern unsigned long do_kill(struct thread *thread, int pid, int tid, int sig, struct siginfo *info, int ptracecont);
extern void interrupt_syscall(struct thread *, int sig);
extern void terminate(int, int);
extern int num_processors;
@ -530,23 +570,32 @@ void ptrace_report_signal(struct thread *thread, int sig)
dkprintf("ptrace_report_signal, tid=%d, pid=%d\n", thread->tid, thread->proc->pid);
mcs_rwlock_writer_lock(&proc->update_lock, &lock);
if(!(proc->ptrace & PT_TRACED)){
if (!(thread->ptrace & PT_TRACED)) {
mcs_rwlock_writer_unlock(&proc->update_lock, &lock);
return;
}
thread->exit_status = sig;
/* Transition thread state */
proc->status = PS_TRACED;
thread->exit_status = sig;
thread->status = PS_TRACED;
proc->ptrace &= ~PT_TRACE_SYSCALL;
if (sig == SIGSTOP || sig == SIGTSTP ||
sig == SIGTTIN || sig == SIGTTOU) {
proc->signal_flags |= SIGNAL_STOP_STOPPED;
} else {
proc->signal_flags &= ~SIGNAL_STOP_STOPPED;
}
parent_pid = proc->parent->pid;
thread->ptrace &= ~PT_TRACE_SYSCALL;
save_debugreg(thread->ptrace_debugreg);
if (sig == SIGSTOP || sig == SIGTSTP ||
sig == SIGTTIN || sig == SIGTTOU) {
thread->signal_flags |= SIGNAL_STOP_STOPPED;
}
else {
thread->signal_flags &= ~SIGNAL_STOP_STOPPED;
}
if (thread == proc->main_thread) {
proc->status = PS_DELAY_TRACED;
parent_pid = proc->parent->pid;
}
else {
parent_pid = thread->report_proc->pid;
waitq_wakeup(&thread->report_proc->waitpid_q);
}
mcs_rwlock_writer_unlock(&proc->update_lock, &lock);
memset(&info, '\0', sizeof info);
@ -555,8 +604,6 @@ void ptrace_report_signal(struct thread *thread, int sig)
info._sifields._sigchld.si_pid = thread->tid;
info._sifields._sigchld.si_status = thread->exit_status;
do_kill(cpu_local_var(current), parent_pid, -1, SIGCHLD, &info, 0);
/* Wake parent (if sleeping in wait4()) */
waitq_wakeup(&proc->parent->waitpid_q);
dkprintf("ptrace_report_signal,sleeping\n");
/* Sleep */
@ -569,9 +616,8 @@ ptrace_arch_prctl(int pid, long code, long addr)
{
long rc = -EIO;
struct thread *child;
struct mcs_rwlock_node_irqsave lock;
child = find_thread(pid, pid, &lock);
child = find_thread(pid, pid);
if (!child)
return -ESRCH;
if (child->proc->status & (PS_TRACED | PS_STOPPED)) {
@ -613,7 +659,7 @@ ptrace_arch_prctl(int pid, long code, long addr)
break;
}
}
thread_unlock(child, &lock);
thread_unlock(child);
return rc;
}
@ -635,11 +681,13 @@ arch_ptrace(long request, int pid, long addr, long data)
static int
isrestart(int num, unsigned long rc, int sig, int restart)
{
if(sig == SIGKILL || sig == SIGSTOP)
if (sig == SIGKILL || sig == SIGSTOP)
return 0;
if(num == 0 || rc != -EINTR)
if (num < 0 || rc != -EINTR)
return 0;
switch(num){
if (sig == SIGCHLD)
return 1;
switch (num) {
case __NR_pause:
case __NR_rt_sigsuspend:
case __NR_rt_sigtimedwait:
@ -660,14 +708,12 @@ isrestart(int num, unsigned long rc, int sig, int restart)
case __NR_io_getevents:
return 0;
}
if(sig == SIGCHLD)
return 1;
if(restart)
if (restart)
return 1;
return 0;
}
void
int
do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pending *pending, int num)
{
struct x86_user_context *regs = regs0;
@ -679,14 +725,15 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
int ptraceflag = 0;
struct mcs_rwlock_node_irqsave lock;
struct mcs_rwlock_node_irqsave mcs_rw_node;
int restart = 0;
for(w = pending->sigmask.__val[0], sig = 0; w; sig++, w >>= 1);
dkprintf("do_signal(): tid=%d, pid=%d, sig=%d\n", thread->tid, proc->pid, sig);
orgsig = sig;
if((proc->ptrace & PT_TRACED) &&
pending->ptracecont == 0 &&
sig != SIGKILL) {
if ((thread->ptrace & PT_TRACED) &&
pending->ptracecont == 0 &&
sig != SIGKILL) {
ptraceflag = 1;
sig = SIGSTOP;
}
@ -707,7 +754,7 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
if(k->sa.sa_handler == SIG_IGN){
kfree(pending);
mcs_rwlock_writer_unlock(&thread->sigcommon->lock, &mcs_rw_node);
return;
goto out;
}
else if(k->sa.sa_handler){
unsigned long *usp; /* user stack */
@ -757,9 +804,8 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
memcpy(&ksigsp.sigstack, &thread->sigstack, sizeof(stack_t));
ksigsp.sigrc = rc;
ksigsp.num = num;
ksigsp.restart = isrestart(num, rc, sig, k->sa.sa_flags & SA_RESTART);
if(num != 0 && rc == -EINTR && sig == SIGCHLD)
ksigsp.restart = 1;
restart = isrestart(num, rc, sig, k->sa.sa_flags & SA_RESTART);
ksigsp.restart = restart;
if(xsavesize){
uint64_t xsave_mask = get_xsave_mask();
unsigned int low = (unsigned int)xsave_mask;
@ -772,7 +818,7 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
kfree(_kfpregs);
kprintf("do_signal,no space available\n");
terminate(0, sig);
return;
goto out;
}
kfpregs = (void *)((((unsigned long)_kfpregs) + 63) & ~63);
memset(kfpregs, '\0', xsavesize);
@ -782,7 +828,7 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
kfree(_kfpregs);
kprintf("do_signal,write_process_vm failed\n");
terminate(0, sig);
return;
goto out;
}
ksigsp.fpregs = (void *)fpregs;
kfree(_kfpregs);
@ -794,7 +840,7 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
mcs_rwlock_writer_unlock(&thread->sigcommon->lock, &mcs_rw_node);
kprintf("do_signal,write_process_vm failed\n");
terminate(0, sig);
return;
goto out;
}
usp = (unsigned long *)sigsp;
@ -824,12 +870,13 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
info.si_code = TRAP_TRACE;
set_signal(SIGTRAP, regs, &info);
check_need_resched();
check_signal(0, regs, 0);
check_signal(0, regs, -1);
}
}
else {
int coredumped = 0;
siginfo_t info;
int ptc = pending->ptracecont;
if(ptraceflag){
if(thread->ptrace_recvsig)
@ -856,25 +903,37 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
info.si_code = CLD_STOPPED;
info._sifields._sigchld.si_pid = thread->proc->pid;
info._sifields._sigchld.si_status = (sig << 8) | 0x7f;
do_kill(cpu_local_var(current), thread->proc->parent->pid, -1, SIGCHLD, &info, 0);
dkprintf("do_signal,SIGSTOP,changing state\n");
if (ptc == 2 &&
thread != thread->proc->main_thread) {
thread->signal_flags =
SIGNAL_STOP_STOPPED;
thread->status = PS_STOPPED;
thread->exit_status = SIGSTOP;
do_kill(thread,
thread->report_proc->pid, -1,
SIGCHLD, &info, 0);
waitq_wakeup(
&thread->report_proc->waitpid_q);
}
else {
/* Update thread state in fork tree */
mcs_rwlock_writer_lock(
&proc->update_lock, &lock);
proc->group_exit_status = SIGSTOP;
/* Update thread state in fork tree */
mcs_rwlock_writer_lock(&proc->update_lock, &lock);
proc->group_exit_status = SIGSTOP;
/* Reap and set new signal_flags */
proc->main_thread->signal_flags =
SIGNAL_STOP_STOPPED;
/* Reap and set new signal_flags */
proc->signal_flags = SIGNAL_STOP_STOPPED;
proc->status = PS_DELAY_STOPPED;
thread->status = PS_STOPPED;
mcs_rwlock_writer_unlock(
&proc->update_lock, &lock);
proc->status = PS_STOPPED;
thread->status = PS_STOPPED;
mcs_rwlock_writer_unlock(&proc->update_lock, &lock);
/* Wake up the parent who tried wait4 and sleeping */
waitq_wakeup(&proc->parent->waitpid_q);
dkprintf("do_signal(): pid: %d, tid: %d SIGSTOP, sleeping\n",
proc->pid, thread->tid);
do_kill(thread,
thread->proc->parent->pid, -1,
SIGCHLD, &info, 0);
}
/* Sleep */
schedule();
dkprintf("SIGSTOP(): woken up\n");
@ -882,19 +941,28 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
break;
case SIGTRAP:
dkprintf("do_signal,SIGTRAP\n");
if(!(proc->ptrace & PT_TRACED)) {
if (!(thread->ptrace & PT_TRACED)) {
goto core;
}
/* Update thread state in fork tree */
mcs_rwlock_writer_lock(&proc->update_lock, &lock);
thread->exit_status = SIGTRAP;
proc->status = PS_TRACED;
thread->status = PS_TRACED;
mcs_rwlock_writer_unlock(&proc->update_lock, &lock);
/* Wake up the parent who tried wait4 and sleeping */
waitq_wakeup(&thread->proc->parent->waitpid_q);
if (thread == proc->main_thread) {
mcs_rwlock_writer_lock(&proc->update_lock,
&lock);
proc->group_exit_status = SIGTRAP;
proc->status = PS_DELAY_TRACED;
mcs_rwlock_writer_unlock(&proc->update_lock,
&lock);
do_kill(thread, thread->proc->parent->pid, -1,
SIGCHLD, &info, 0);
}
else {
do_kill(thread, thread->report_proc->pid, -1,
SIGCHLD, &info, 0);
waitq_wakeup(&thread->report_proc->waitpid_q);
}
/* Sleep */
dkprintf("do_signal,SIGTRAP,sleeping\n");
@ -909,7 +977,7 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
info._sifields._sigchld.si_pid = proc->pid;
info._sifields._sigchld.si_status = 0x0000ffff;
do_kill(cpu_local_var(current), proc->parent->pid, -1, SIGCHLD, &info, 0);
proc->signal_flags = SIGNAL_STOP_CONTINUED;
proc->main_thread->signal_flags = SIGNAL_STOP_CONTINUED;
proc->status = PS_RUNNING;
dkprintf("do_signal,SIGCONT,do nothing\n");
break;
@ -938,6 +1006,8 @@ do_signal(unsigned long rc, void *regs0, struct thread *thread, struct sig_pendi
break;
}
}
out:
return restart;
}
static struct sig_pending *
@ -957,10 +1027,12 @@ getsigpending(struct thread *thread, int delflag){
lock = &thread->sigcommon->lock;
head = &thread->sigcommon->sigpending;
for(;;) {
if (delflag)
if (delflag) {
mcs_rwlock_writer_lock(lock, &mcs_rw_node);
else
}
else {
mcs_rwlock_reader_lock(lock, &mcs_rw_node);
}
list_for_each_entry_safe(pending, next, head, list){
for(x = pending->sigmask.__val[0], sig = 0; x; sig++, x >>= 1);
@ -973,19 +1045,23 @@ getsigpending(struct thread *thread, int delflag){
if(delflag)
list_del(&pending->list);
if (delflag)
if (delflag) {
mcs_rwlock_writer_unlock(lock, &mcs_rw_node);
else
}
else {
mcs_rwlock_reader_unlock(lock, &mcs_rw_node);
}
return pending;
}
}
}
if (delflag)
if (delflag) {
mcs_rwlock_writer_unlock(lock, &mcs_rw_node);
else
}
else {
mcs_rwlock_reader_unlock(lock, &mcs_rw_node);
}
if(lock == &thread->sigpendinglock)
return NULL;
@ -1000,6 +1076,11 @@ getsigpending(struct thread *thread, int delflag){
struct sig_pending *
hassigpending(struct thread *thread)
{
if (list_empty(&thread->sigpending) &&
list_empty(&thread->sigcommon->sigpending)) {
return NULL;
}
return getsigpending(thread, 0);
}
@ -1017,6 +1098,12 @@ void save_syscall_return_value(int num, unsigned long rc)
return;
}
/** \brief check arrived signals and processing
*
* @param rc return value of syscall
* @param regs0 context
* @param num syscall number (-1: Not called on exiting system call)
*/
void
check_signal(unsigned long rc, void *regs0, int num)
{
@ -1050,6 +1137,11 @@ check_signal(unsigned long rc, void *regs0, int num)
goto out;
}
if (list_empty(&thread->sigpending) &&
list_empty(&thread->sigcommon->sigpending)) {
goto out;
}
for(;;){
pending = getsigpending(thread, 1);
if(!pending) {
@ -1057,7 +1149,9 @@ check_signal(unsigned long rc, void *regs0, int num)
goto out;
}
do_signal(rc, regs, thread, pending, num);
if (do_signal(rc, regs, thread, pending, num)) {
num = -1;
}
}
out:
@ -1137,7 +1231,7 @@ check_sig_pending_thread(struct thread *thread)
}
void
check_sig_pending()
check_sig_pending(void)
{
struct thread *thread;
struct cpu_local_var *v;
@ -1158,7 +1252,7 @@ repeat:
continue;
}
if (thread->proc->exit_status & 0x0000000100000000L) {
if (thread->proc->group_exit_status & 0x0000000100000000L) {
continue;
}
@ -1210,8 +1304,8 @@ do_kill(struct thread *thread, int pid, int tid, int sig, siginfo_t *info,
struct mcs_rwlock_node_irqsave slock;
int pgid = -pid;
int rc = -ESRCH;
int *pids;
int n = 0;
int *pids = NULL;
int n = 0, nr_pids = 0;
int sendme = 0;
if(pid == 0){
@ -1219,10 +1313,41 @@ do_kill(struct thread *thread, int pid, int tid, int sig, siginfo_t *info,
return -ESRCH;
pgid = thread->proc->pgid;
}
pids = kmalloc(sizeof(int) * num_processors, IHK_MC_AP_NOWAIT);
if(!pids)
return -ENOMEM;
// Count nr of pids
for(i = 0; i < HASH_SIZE; i++){
mcs_rwlock_reader_lock(&phash->lock[i], &slock);
list_for_each_entry(p, &phash->list[i], hash_list){
if(pgid != 1 && p->pgid != pgid)
continue;
if(thread && p->pid == thread->proc->pid){
sendme = 1;
continue;
}
++nr_pids;
}
mcs_rwlock_reader_unlock(&phash->lock[i], &slock);
}
if (nr_pids) {
pids = kmalloc(sizeof(int) * nr_pids, IHK_MC_AP_NOWAIT);
if(!pids)
return -ENOMEM;
}
else {
if (sendme) {
goto sendme;
}
return rc;
}
// Collect pids and do the kill
for(i = 0; i < HASH_SIZE; i++){
if (n == nr_pids) {
break;
}
mcs_rwlock_reader_lock(&phash->lock[i], &slock);
list_for_each_entry(p, &phash->list[i], hash_list){
if(pgid != 1 && p->pgid != pgid)
@ -1235,11 +1360,15 @@ do_kill(struct thread *thread, int pid, int tid, int sig, siginfo_t *info,
pids[n] = p->pid;
n++;
if (n == nr_pids) {
break;
}
}
mcs_rwlock_reader_unlock(&phash->lock[i], &slock);
}
for(i = 0; i < n; i++)
rc = do_kill(thread, pids[i], -1, sig, info, ptracecont);
sendme:
if(sendme)
rc = do_kill(thread, thread->proc->pid, -1, sig, info, ptracecont);
@ -1367,7 +1496,8 @@ done:
return 0;
}
if (tthread->thread_offloaded) {
/* Forward signal to Linux by interrupt_syscall mechanism */
if (tthread->uti_state == UTI_STATE_RUNNING_IN_LINUX) {
if (!tthread->proc->nohost) {
interrupt_syscall(tthread, sig);
}
@ -1384,10 +1514,10 @@ done:
in check_signal */
rc = 0;
k = tthread->sigcommon->action + sig - 1;
if((sig != SIGKILL && (tproc->ptrace & PT_TRACED)) ||
(k->sa.sa_handler != (void *)1 &&
(k->sa.sa_handler != NULL ||
(sig != SIGCHLD && sig != SIGURG)))){
if ((sig != SIGKILL && (tthread->ptrace & PT_TRACED)) ||
(k->sa.sa_handler != (void *)1 &&
(k->sa.sa_handler != NULL ||
(sig != SIGCHLD && sig != SIGURG)))) {
struct sig_pending *pending = NULL;
if (sig < 33) { // SIGRTMIN - SIGRTMAX
list_for_each_entry(pending, head, list){
@ -1471,7 +1601,7 @@ set_signal(int sig, void *regs0, siginfo_t *info)
SYSCALL_DECLARE(mmap)
{
const int supported_flags = 0
const unsigned int supported_flags = 0
| MAP_SHARED // 01
| MAP_PRIVATE // 02
| MAP_FIXED // 10
@ -1479,7 +1609,7 @@ SYSCALL_DECLARE(mmap)
| MAP_LOCKED // 2000
| MAP_POPULATE // 8000
| MAP_HUGETLB // 00040000
| (0x3F << MAP_HUGE_SHIFT) // FC000000
| (0x3FU << MAP_HUGE_SHIFT) // FC000000
;
const int ignored_flags = 0
#ifdef USE_NOCACHE_MMAP
@ -1498,7 +1628,7 @@ SYSCALL_DECLARE(mmap)
| MAP_NONBLOCK // 00010000
;
const intptr_t addr0 = ihk_mc_syscall_arg0(ctx);
const uintptr_t addr0 = ihk_mc_syscall_arg0(ctx);
const size_t len0 = ihk_mc_syscall_arg1(ctx);
const int prot = ihk_mc_syscall_arg2(ctx);
const int flags0 = ihk_mc_syscall_arg3(ctx);
@ -1507,7 +1637,7 @@ SYSCALL_DECLARE(mmap)
struct thread *thread = cpu_local_var(current);
struct vm_regions *region = &thread->vm->region;
int error;
intptr_t addr = 0;
uintptr_t addr = 0;
size_t len;
int flags = flags0;
size_t pgsize;
@ -1628,16 +1758,14 @@ SYSCALL_DECLARE(shmget)
dkprintf("shmget(%#lx,%#lx,%#x)\n", key, size, shmflg0);
if (shmflg & SHM_HUGETLB) {
switch (shmflg & (0x3F << SHM_HUGE_SHIFT)) {
case 0:
int hugeshift = shmflg & (0x3F << SHM_HUGE_SHIFT);
if (hugeshift == 0) {
shmflg |= SHM_HUGE_2MB; /* default hugepage size */
break;
case SHM_HUGE_2MB:
case SHM_HUGE_1GB:
break;
default:
} else if (hugeshift == SHM_HUGE_2MB ||
hugeshift == SHM_HUGE_1GB) {
/*nop*/
} else {
error = -EINVAL;
goto out;
}
@ -1699,6 +1827,11 @@ SYSCALL_DECLARE(arch_prctl)
ihk_mc_syscall_arg1(ctx));
}
SYSCALL_DECLARE(time)
{
return time();
}
static int vdso_get_vdso_info(void)
{
int error;
@ -1915,7 +2048,7 @@ int arch_map_vdso(struct process_vm *vm)
s = vm->vdso_addr + (i * PAGE_SIZE);
e = s + PAGE_SIZE;
error = ihk_mc_pt_set_range(pt, vm, s, e,
vdso.vdso_physlist[i], attr, 0, range);
vdso.vdso_physlist[i], attr, 0, range, 0);
if (error) {
ekprintf("ihk_mc_pt_set_range failed. %d\n", error);
goto out;
@ -1947,7 +2080,7 @@ int arch_map_vdso(struct process_vm *vm)
e = s + PAGE_SIZE;
attr = PTATTR_ACTIVE | PTATTR_USER | PTATTR_NO_EXECUTE;
error = ihk_mc_pt_set_range(pt, vm, s, e,
vdso.vvar_phys, attr, 0, range);
vdso.vvar_phys, attr, 0, range, 0);
if (error) {
ekprintf("ihk_mc_pt_set_range failed. %d\n", error);
goto out;
@ -1958,7 +2091,7 @@ int arch_map_vdso(struct process_vm *vm)
e = s + PAGE_SIZE;
attr = PTATTR_ACTIVE | PTATTR_USER | PTATTR_NO_EXECUTE | PTATTR_UNCACHABLE;
error = ihk_mc_pt_set_range(pt, vm, s, e,
vdso.hpet_phys, attr, 0, range);
vdso.hpet_phys, attr, 0, range, 0);
if (error) {
ekprintf("ihk_mc_pt_set_range failed. %d\n", error);
goto out;
@ -1969,7 +2102,7 @@ int arch_map_vdso(struct process_vm *vm)
e = s + PAGE_SIZE;
attr = PTATTR_ACTIVE | PTATTR_USER | PTATTR_NO_EXECUTE;
error = ihk_mc_pt_set_range(pt, vm, s, e,
vdso.pvti_phys, attr, 0, range);
vdso.pvti_phys, attr, 0, range, 0);
if (error) {
ekprintf("ihk_mc_pt_set_range failed. %d\n", error);
goto out;
@ -2081,7 +2214,7 @@ int do_process_vm_read_writev(int pid,
range = lookup_process_memory_range(lthread->vm,
(uintptr_t)local_iov,
(uintptr_t)(local_iov + liovcnt * sizeof(struct iovec)));
(uintptr_t)(local_iov + liovcnt));
if (!range) {
ret = -EFAULT;
@ -2090,7 +2223,7 @@ int do_process_vm_read_writev(int pid,
range = lookup_process_memory_range(lthread->vm,
(uintptr_t)remote_iov,
(uintptr_t)(remote_iov + riovcnt * sizeof(struct iovec)));
(uintptr_t)(remote_iov + riovcnt));
if (!range) {
ret = -EFAULT;
@ -2366,8 +2499,6 @@ int move_pages_smp_handler(int cpu_index, int nr_cpus, void *arg)
case 0:
memcpy(mpsr->virt_addr, mpsr->user_virt_addr,
sizeof(void *) * count);
memcpy(mpsr->status, mpsr->user_status,
sizeof(int) * count);
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
memset(mpsr->ptep, 0, sizeof(pte_t) * count);
@ -2387,41 +2518,38 @@ int move_pages_smp_handler(int cpu_index, int nr_cpus, void *arg)
case 0:
memcpy(mpsr->virt_addr, mpsr->user_virt_addr,
sizeof(void *) * count);
memcpy(mpsr->status, mpsr->user_status,
sizeof(int) * count);
case 1:
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
mpsr->nodes_ready = 1;
break;
case 1:
memset(mpsr->ptep, 0, sizeof(pte_t) * count);
memset(mpsr->status, 0, sizeof(int) * count);
memset(mpsr->nr_pages, 0, sizeof(int) * count);
memset(mpsr->dst_phys, 0,
sizeof(unsigned long) * count);
mpsr->nodes_ready = 1;
break;
default:
break;
}
}
else if (nr_cpus >= 4 && nr_cpus < 8) {
else if (nr_cpus >= 4 && nr_cpus < 7) {
switch (cpu_index) {
case 0:
memcpy(mpsr->virt_addr, mpsr->user_virt_addr,
sizeof(void *) * count);
break;
case 1:
memcpy(mpsr->status, mpsr->user_status,
sizeof(int) * count);
break;
case 2:
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
mpsr->nodes_ready = 1;
break;
case 3:
case 2:
memset(mpsr->ptep, 0, sizeof(pte_t) * count);
memset(mpsr->status, 0, sizeof(int) * count);
break;
case 3:
memset(mpsr->nr_pages, 0, sizeof(int) * count);
memset(mpsr->dst_phys, 0,
sizeof(unsigned long) * count);
@ -2431,7 +2559,7 @@ int move_pages_smp_handler(int cpu_index, int nr_cpus, void *arg)
break;
}
}
else if (nr_cpus >= 8) {
else {
switch (cpu_index) {
case 0:
memcpy(mpsr->virt_addr, mpsr->user_virt_addr,
@ -2443,28 +2571,23 @@ int move_pages_smp_handler(int cpu_index, int nr_cpus, void *arg)
sizeof(void *) * (count / 2));
break;
case 2:
memcpy(mpsr->status, mpsr->user_status,
sizeof(int) * count);
break;
case 3:
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
mpsr->nodes_ready = 1;
break;
case 4:
case 3:
memset(mpsr->ptep, 0, sizeof(pte_t) * count);
break;
case 5:
case 4:
memset(mpsr->status, 0, sizeof(int) * count);
break;
case 6:
case 5:
memset(mpsr->nr_pages, 0, sizeof(int) * count);
break;
case 7:
case 6:
memset(mpsr->dst_phys, 0,
sizeof(unsigned long) * count);
break;
default:
break;
}
@ -2672,11 +2795,19 @@ out:
time_t time(void) {
struct syscall_request sreq IHK_DMA_ALIGN;
struct thread *thread = cpu_local_var(current);
time_t ret;
sreq.number = __NR_time;
sreq.args[0] = (uintptr_t)NULL;
ret = (time_t)do_syscall(&sreq, ihk_mc_get_processor_id(), thread->proc->pid);
struct timespec ats;
time_t ret = 0;
if (gettime_local_support) {
calculate_time_from_tsc(&ats);
ret = ats.tv_sec;
}
else {
sreq.number = __NR_time;
sreq.args[0] = (uintptr_t)NULL;
ret = (time_t)do_syscall(&sreq, ihk_mc_get_processor_id());
}
return ret;
}

View File

@ -31,51 +31,6 @@ struct tod_data_s tod_data
.version = IHK_ATOMIC64_INIT(0),
};
static inline void cpu_pause_for_vsyscall(void)
{
asm volatile ("pause" ::: "memory");
return;
} /* cpu_pause_for_vsyscall() */
static inline void calculate_time_from_tsc(struct timespec *ts)
{
long ver;
unsigned long current_tsc;
__time_t sec_delta;
long ns_delta;
for (;;) {
while ((ver = ihk_atomic64_read(&tod_data.version)) & 1) {
/* settimeofday() is in progress */
cpu_pause_for_vsyscall();
}
rmb();
*ts = tod_data.origin;
rmb();
if (ver == ihk_atomic64_read(&tod_data.version)) {
break;
}
/* settimeofday() has intervened */
cpu_pause_for_vsyscall();
}
current_tsc = rdtsc();
sec_delta = current_tsc / tod_data.clocks_per_sec;
ns_delta = NS_PER_SEC * (current_tsc % tod_data.clocks_per_sec)
/ tod_data.clocks_per_sec;
/* calc. of ns_delta overflows if clocks_per_sec exceeds 18.44 GHz */
ts->tv_sec += sec_delta;
ts->tv_nsec += ns_delta;
if (ts->tv_nsec >= NS_PER_SEC) {
ts->tv_nsec -= NS_PER_SEC;
++ts->tv_sec;
}
return;
} /* calculate_time_from_tsc() */
int vsyscall_gettimeofday(struct timeval *tv, void *tz)
{
int error;

View File

@ -1,3 +1,4 @@
# mcoverlay-create-smp-x86.sh.in COPYRIGHT FUJITSU LIMITED 2018
# Overlay /proc, /sys with McKernel specific contents
#

View File

@ -1,4 +1,5 @@
#!/bin/bash
# mcreboot-attached-mic.sh.in COPYRIGHT FUJITSU LIMITED 2018
# \file arch/x86/tools/mcreboot-attached-mic.sh.in
# License details are found in the file LICENSE.

View File

@ -1,4 +1,5 @@
#!/bin/bash -x
# mcreboot-builtin-x86.sh.in COPYRIGHT FUJITSU LIMITED 2018
# \file arch/x86/tools/mcreboot-builtin-x86.sh.in
# License details are found in the file LICENSE.

View File

@ -1,4 +1,5 @@
#!/bin/bash
# mcreboot-smp-x86.sh.in COPYRIGHT FUJITSU LIMITED 2018
# IHK SMP-x86 example boot script.
# author: Balazs Gerofi <bgerofi@riken.jp>
@ -17,8 +18,8 @@ prefix="@prefix@"
BINDIR="${prefix}/bin"
SBINDIR="${prefix}/sbin"
ETCDIR=@ETCDIR@
KMODDIR="${prefix}/kmod"
KERNDIR="${prefix}/@TARGET@/kernel"
KMODDIR="@KMODDIR@"
KERNDIR="@MCKERNELDIR@"
ENABLE_MCOVERLAYFS="@ENABLE_MCOVERLAYFS@"
MCK_BUILDID=@BUILDID@
@ -31,6 +32,12 @@ if [ "${BASH_VERSINFO[0]}" -lt 4 ]; then
exit 1
fi
# Check SELinux
if which getenforce 1>/dev/null 2>/dev/null && [ "`getenforce | tr '[:upper:]' '[:lower:]'`" == "enforcing" ]; then
echo "error: SELinux must not be enabled when running McKernel (update /etc/selinux/config or see setenforce)"
exit 1
fi
redirect_kmsg=0
mon_interval="-1"
DUMP_LEVEL=24
@ -45,11 +52,13 @@ fi
turbo=""
ihk_irq=""
safe_kernel_map=""
umask_old=`umask`
idle_halt=""
allow_oversubscribe=""
time_sharing="time_sharing"
while getopts :tk:c:m:o:f:r:q:i:d:e:hO OPT
while getopts stk:c:m:o:f:r:q:i:d:e:hOT: OPT
do
case ${OPT} in
f) facility=${OPTARG}
@ -62,6 +71,8 @@ do
;;
m) mem=${OPTARG}
;;
s) safe_kernel_map="safe_kernel_map"
;;
r) ikc_map=${OPTARG}
;;
q) ihk_irq=${OPTARG}
@ -78,8 +89,16 @@ do
;;
O) allow_oversubscribe="allow_oversubscribe"
;;
*) echo "invalid option -${OPT}" >&2
exit 1
T)
case ${OPTARG} in
1) time_sharing="time_sharing"
;;
0) time_sharing=""
;;
esac
;;
\?) exit 1
;;
esac
done
@ -119,32 +138,32 @@ error_exit() {
fi
;&
mcos_sys_mounted)
if [ "$enable_mcoverlay" == "yes" ]; then
if [ "$ENABLE_MCOVERLAYFS" == "ON" ]; then
umount /tmp/mcos/mcos0_sys
fi
;&
mcos_proc_mounted)
if [ "$enable_mcoverlay" == "yes" ]; then
if [ "$ENABLE_MCOVERLAYFS" == "ON" ]; then
umount /tmp/mcos/mcos0_proc
fi
;&
mcoverlayfs_loaded)
if [ "$enable_mcoverlay" == "yes" ]; then
if [ "$ENABLE_MCOVERLAYFS" == "ON" ]; then
rmmod mcoverlay 2>/dev/null
fi
;&
linux_proc_bind_mounted)
if [ "$enable_mcoverlay" == "yes" ]; then
if [ "$ENABLE_MCOVERLAYFS" == "ON" ]; then
umount /tmp/mcos/linux_proc
fi
;&
tmp_mcos_mounted)
if [ "$enable_mcoverlay" == "yes" ]; then
if [ "$ENABLE_MCOVERLAYFS" == "ON" ]; then
umount /tmp/mcos
fi
;&
tmp_mcos_created)
if [ "$enable_mcoverlay" == "yes" ]; then
if [ "$ENABLE_MCOVERLAYFS" == "ON" ]; then
rm -rf /tmp/mcos
fi
;&
@ -221,26 +240,6 @@ if [ "${release}" == "${rhel_release}" ]; then
rhel_release="";
fi
enable_mcoverlay="no"
if [ "${ENABLE_MCOVERLAYFS}" == "yes" ]; then
if [ "${rhel_release}" == "" ]; then
if [ ${linux_version_code} -ge 262144 -a ${linux_version_code} -lt 262400 ]; then
enable_mcoverlay="yes"
fi
if [ ${linux_version_code} -ge 263680 -a ${linux_version_code} -lt 263936 ]; then
enable_mcoverlay="yes"
fi
else
if [ ${linux_version_code} -eq 199168 -a ${rhel_release} -ge 327 -a ${rhel_release} -le 693 ]; then
enable_mcoverlay="yes"
fi
if [ ${linux_version_code} -ge 262144 -a ${linux_version_code} -lt 262400 ]; then
enable_mcoverlay="yes"
fi
fi
fi
# Figure out CPUs if not requested by user
if [ "$cpus" == "" ]; then
# Get the number of CPUs on NUMA node 0
@ -256,7 +255,7 @@ if [ "$cpus" == "" ]; then
fi
# Remove mcoverlay if loaded
if [ "$enable_mcoverlay" == "yes" ]; then
if [ "$ENABLE_MCOVERLAYFS" == "ON" ]; then
${SBINDIR}/mcoverlay-destroy.sh
ret=$?
if [ $ret -ne 0 ]; then
@ -393,7 +392,7 @@ fi
IHK_BUILDID=`${SBINDIR}/ihkconfig 0 get buildid`
if [ "${IHK_BUILDID}" != "${MCK_BUILDID}" ]; then
echo "IHK build-id (${IHK_BUILDID}) didn't match McKernel build-id (${MCK_BUILDID})." >&2
exit 1
error_exit "mcctrl_loaded"
fi
# Destroy all LWK instances
@ -446,7 +445,7 @@ if ! ${SBINDIR}/ihkosctl 0 load ${KERNDIR}/mckernel.img; then
fi
# Set kernel arguments
if ! ${SBINDIR}/ihkosctl 0 kargs "hidos $turbo $idle_halt dump_level=${DUMP_LEVEL} $extra_kopts $allow_oversubscribe"; then
if ! ${SBINDIR}/ihkosctl 0 kargs "hidos $turbo $safe_kernel_map $idle_halt dump_level=${DUMP_LEVEL} $extra_kopts $allow_oversubscribe $time_sharing"; then
echo "error: setting kernel arguments" >&2
error_exit "os_created"
fi
@ -463,7 +462,7 @@ if ! chown ${chown_option} /dev/mcd* /dev/mcos*; then
fi
# Overlay /proc, /sys with McKernel specific contents
if [ "$enable_mcoverlay" == "yes" ]; then
if [ "$ENABLE_MCOVERLAYFS" == "ON" ]; then
${SBINDIR}/mcoverlay-create.sh
ret=$?
if [ $ret -ne 0 ]; then

View File

@ -1,4 +1,5 @@
#!/bin/bash
# mcstop+release-smp-x86.sh.in COPYRIGHT FUJITSU LIMITED 2018
# IHK SMP-x86 example McKernel unload script.
# author: Balazs Gerofi <bgerofi@riken.jp>
@ -99,15 +100,6 @@ if grep mcctrl /proc/modules &>/dev/null; then
fi
fi
# Remove mcoverlay if loaded
${SBINDIR}/mcoverlay-destroy.sh
ret=$?
if [ $ret -ne 0 ]; then
echo "error: mcoverlay-destroy.sh" >&2
exit $ret
fi
# Remove SMP module
if grep ihk_smp_@ARCH@ /proc/modules &>/dev/null; then
if ! rmmod ihk_smp_@ARCH@ 2>/dev/null; then

View File

@ -0,0 +1,46 @@
Cross compilation:
------------------
The standard way of cross compiling with cmake is to give cmake a "toolchain
file" that describes the compiler prefix and where it can find libraries for
the target system, we provide an example in cmake/cross-aarch64.cmake.
This obviously requires installing a toolchain and a rootfs with target
libraries to link against.
In addition, mckernel borrows the Kbuild system from linux kernel, which also
makes the assumption that you can run generated executables (Kbuild uses various
scripts around module building and does not make the distinction between build
target and host target); you can get this working by setting up qemu-user on your
machine which the kernel will transparently use through binfmt magic when trying
to execute other arch binaries.
# yum install gcc-aarch64-linux-gnu qemu-user
(dnf brings in --forcearch, we use this to setup the sysroot ; there are other
ways of building one. It is available on el7 in extras.)
# yum install dnf dnf-plugins-core
(install these separately because most scripts cannot be run, and dependency hell means we need them first)
# dnf download --releasever=7 --forcearch=aarch64 filesystem centos-release
# rpm --root /usr/aarch64-linux-gnu/sys-root --ignorearch --nodeps -ivh filesystem-*.rpm
# rpm --root /usr/aarch64-linux-gnu/sys-root --ignorearch --nodeps -ivh centos-release-*.rpm
# rpm --root /usr/aarch64-linux-gnu/sys-root --import /usr/aarch64-linux-gnu/sys-root/etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
# rpm --root /usr/aarch64-linux-gnu/sys-root --import /usr/aarch64-linux-gnu/sys-root/etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7-aarch64
# dnf install --releasever=7 --forcearch=aarch64 --installroot=/usr/aarch64-linux-gnu/sys-root/ --setopt=tsflags=noscripts glibc-devel kernel-devel numactl-devel systemd-devel binutils-devel
(el7 lacks a binfmt for aarch64... fix this)
# echo ':qemu-aarch64:M::\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\xb7\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/usr/bin/qemu-aarch64:' > /etc/binfmt.d/qemu-aarch64-dynamic.conf
# systemctl restart systemd-binfmt
(optional) test your setup
# gcc -xc - <<<'#include <stdio.h>'$'\n''int main() { printf("ok\n"); return 0; }'
# export QEMU_LD_PREFIX=/usr/aarch64-linux-gnu/sys-root
# file a.out
a.out: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 3.7.0, BuildID[sha1]=5ff445d3353cad2dae0a22550fe4cc572287dd90, not stripped
# ./a.out
ok
finally, build mckernel!
# mkdir build; cd build
# export QEMU_LD_PREFIX=/usr/aarch64-linux-gnu/sys-root
# cmake -DCMAKE_INSTALL_PREFIX=/tmp/install-aarch64 -DCMAKE_TOOLCHAIN_FILE=../cmake/cross-aarch64.cmake -DUNAME_R=4.14.0-115.2.2.el7a.aarch64 -DKERNEL_DIR=/usr/aarch64-linux-gnu/sys-root/usr/src/kernels/4.14.0-115.2.2.el7a.aarch64/ -DBUILD_TARGET=smp-arm64 ..
# make -j

10
cmake/cross-aarch64.cmake Normal file
View File

@ -0,0 +1,10 @@
SET(CMAKE_SYSTEM_NAME Linux)
SET(CMAKE_C_COMPILER /usr/bin/aarch64-linux-gnu-gcc)
SET(CMAKE_CXX_COMPILER /usr/bin/aarch64-linux-gnu-g++)
SET(CMAKE_FIND_ROOT_PATH /usr/aarch64-linux-gnu/sys-root)
SET(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
SET(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
SET(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)

Some files were not shown because too many files have changed in this diff Show More