Compare commits

..

123 Commits

Author SHA1 Message Date
2585c8afaa prerelease: 0.95: add ihk_*_str() functions
Change-Id: I0dc2ff3c8a2b21d167cfff04ccf6d1533555ee1c
2021-02-26 11:24:48 +09:00
82056961cd uti: integrate libuti and redirect to mck/libuti.so
Change-Id: I74e0f677ea8e1cd06e8ab05d92f1d38f9be8fd7a
2021-02-26 11:03:16 +09:00
0848b64c1d uti: integrate syscall_intercept
Change-Id: Ide14341acdca1450b0ad4f8a16cc078d0743afc8
2021-02-26 10:37:56 +09:00
8a9b43fee0 cmake: add -Wno-stringop-truncation
Change-Id: I43d9ba731d0feaf8934d2724ff98072df88a902d
2021-02-26 10:37:56 +09:00
19cb302d5f uti: util_indicate_clone: check --enable-uti mcexec option
Change-Id: Ic7474d01c18acd1edbc07844d7a7b010b2175f71
2021-02-26 10:37:56 +09:00
90895cfb1f test: uti: add tofu examples
Change-Id: I1c55c872d125201e60b4fe744af74106e1c5d3a4
2021-02-26 10:37:55 +09:00
32afa80718 uti: fix handling UTI_CPU_SET env
Change-Id: Icbf8dc7e82bd6983374aefdd0d5b89ad4152c9aa
2021-02-26 10:24:19 +09:00
e3927a0b95 uti: futex: McKernel waker sends IPI to Linux waiter CPU
Change-Id: I6f725b3a6b1b26b9f553d8c58132c0c0a4416683
2021-02-26 10:24:19 +09:00
adc5b7102f uti: futex: cache remote va to remote pa result
Change-Id: Idbbb3f2981b76a0235615fceaa6281d2c7134ca2
2021-02-26 10:24:19 +09:00
5d16ce9dcc uti: identify UTI thread by thread local variable
Change-Id: I64372a932378e4ead09ea27fbf5b52062a109756
2021-02-26 10:24:19 +09:00
a9973e913d uti: futex call function in mcctrl
Previously, the futex code of McKernel was called by mccontrol,
but there were some problems with this method
(mainly the location of the McKernel image in memory).

Call futex code in mcctrl instead of the one in McKernel image,
giving the following benefits:
1. Not relying on shared kernel virtual address space with Linux any more
2. The cpu id store / retrieve is not needed and resulting in the code

Change-Id: Ic40929b64a655b270c435859fa287fedb713ee5c
Refs: #1428
2021-02-26 10:24:19 +09:00
35296c8210 uti: fix syscall response is mis-consumed by __do_in_kernel_irq_syscall
Refs: #1617
Change-Id: Iddd8ccd81d7f692f1f45ec888d31c2a87ec521ce
2021-02-25 01:42:29 +00:00
afea6af667 Send a signal to mcexec after switching to that process.
Change-Id: Ia882ef5027931009ee65febd0cbe22022a755c4a
Refs: #1505
2021-02-19 02:28:29 +00:00
b0bd1feefb remap_file_pages: check file mapping
Change-Id: Ibf145a20181938a9825214253337a423fcd53064
Refs: #1521
2021-02-19 02:23:39 +00:00
e6e66e0392 shmget: make small free numbers reusable.
Change-Id: Ic6670214fa31a309e96794361e3ec2dcc6375f4a
Refs: #1531
2021-02-19 02:22:50 +00:00
b3ddd60277 shmget: don't update refcount when shmid is found.
Change-Id: I3eac47cd67d27efd838190f5a4c21b5d682c5fe9
Refs: #1379
2021-02-19 02:22:33 +00:00
6dce9a2bf9 add_process_memory_range: Change order of update page and insert range.
An unintended page update occurred when inserting the range failed.

Change-Id: I3d117b8613c5fbb64463c759b5fcc81db22bd624
refs: #1512
2021-02-18 16:02:30 +09:00
93dafc5f79 migrate: Don't migrate on in-kernel interrupt
Change-Id: I9c07e0d633687ce232ec3cd0c80439ca2e856293
Refs: #1555
2021-02-18 15:30:22 +09:00
583319125a prerelease: 0.94: fix __mcctrl_os_read_write_cpu_register
Change-Id: Ibcfbe7796347cc9c2148cdea2519fe6c7ca9e97e
2021-02-18 15:23:01 +09:00
9f39d1cd88 move_pages: Fix and support some specs for LTP.
1. When the nodes array is NULL, move_pages doesn't move any pages;
 instead it returns the node where each page
 currently resides in the status array.
2. Check whether all specified nodes are online.

Change-Id: Ie3534997833d797e2a9f595d1107b07d46e1c6cf
Refs: #1523
2021-02-18 06:16:17 +00:00
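The two move_pages rules listed in this commit can be illustrated with a toy model. The sketch below is not McKernel's implementation: `toy_move_pages`, `struct toy_mm`, `page_node` and `node_online` are hypothetical names, and a plain array stands in for real page placement. Returning `-ENODEV` for an offline node follows what the Linux man page documents; whether McKernel returns the same value is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_NODES 4

/* Hypothetical stand-in for NUMA placement: page i lives on page_node[i]. */
struct toy_mm {
	int *page_node;                  /* current node of each page */
	int node_online[TOY_MAX_NODES];  /* 1 if the node is online */
};

/*
 * Toy model of the two rules in the commit message:
 *  - nodes == NULL: move nothing, report each page's current node in status[]
 *  - nodes != NULL: reject the call if any requested node is offline
 * Returns 0 on success or a negative errno value.
 */
static int toy_move_pages(struct toy_mm *mm, size_t count,
			  const size_t *pages, const int *nodes, int *status)
{
	size_t i;

	assert(mm != NULL && pages != NULL && status != NULL);

	if (nodes == NULL) {
		/* Query mode: no page is moved. */
		for (i = 0; i < count; i++)
			status[i] = mm->page_node[pages[i]];
		return 0;
	}

	/* Validate every target node before touching any page. */
	for (i = 0; i < count; i++) {
		if (nodes[i] < 0 || nodes[i] >= TOY_MAX_NODES ||
		    !mm->node_online[nodes[i]])
			return -ENODEV;
	}

	for (i = 0; i < count; i++) {
		mm->page_node[pages[i]] = nodes[i];
		status[i] = nodes[i];
	}
	return 0;
}
```

Checking all target nodes before moving anything keeps a failed call from leaving some pages migrated and others not.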
a0d446b27f smp: make smp_call_func() arch independent
Change-Id: Ib60604ceb3274b173bd7f96cf57c8c35c1889e44
2021-02-18 06:16:17 +00:00
f3c875b8e6 mbind: Use range_policy's numamask as priority on MPOL_BIND
Change-Id: Iaaa7998945c6e2b42d91d34a2f7b05db1f4d696d
2021-02-18 06:16:17 +00:00
9f1e6d707c get_mempolicy: Support (MPOL_F_NODE | MPOL_F_ADDR) specified
If flags specifies both MPOL_F_NODE and MPOL_F_ADDR,
get_mempolicy() will return the node ID of the node on
which the address addr is allocated into the location pointed to by mode.

Change-Id: Id485e3f4838e3679d877a95e53b21e3421cac88a
2021-02-18 06:16:17 +00:00
aef50d710c mempolicy: Support MPOL_INTERLEAVE
Change-Id: I6357892d792b2de8ea859a0a6799250f05066713
Refs: #959
2021-02-18 06:16:17 +00:00
7f0594d784 TO RESET: mbind: do nothing
Fixes: 00007daf ("mbind: do nothing (workaround for Fugaku)")

Change-Id: Id41940bebd2cbcc3e8637eadd4847984627b1c72
2021-02-18 06:16:17 +00:00
866f5c51a0 docs: add limitation of system calls that call copy_to_user()
Change-Id: If449c73f8d5949ab5526ea598b0f713ed4431157
Refs: #1514
2021-02-18 13:04:53 +09:00
48b1d548f2 __mcctrl_os_read_write_cpu_register: fix timeout
Change-Id: Id5a7d316d793bd535f24fd353b214aa12af1dab4
2021-02-15 08:56:04 +00:00
822b64b03c docs: add limitation related to Fujitsu TCS xos_hwb
Change-Id: I83a1ecd7a0b6d3bcde2b902cd526dfd4feb9e23a
2021-02-15 16:03:52 +09:00
aca83bcd3d Tofu: fault stack area if VM range doesn't exist in STAG registration
Change-Id: I407a8954ccaf22019b3082fd6eee68e772d1cb26
2021-02-15 14:46:58 +09:00
c7145c4b38 xpmem: fault stack area of remote process if VM range doesn't yet exist
Change-Id: I2bbb745cc9b79ab4f9ea81b242f35f1b88ad531e
2021-02-15 14:46:58 +09:00
a82d161be8 prerelease: 0.93: investigate smp_ihk_os_panic_notifier
Change-Id: I997b41f80038603261de2e8232b6b8ca200cd8cd
2021-02-09 21:39:49 -05:00
7152269a59 spec: create one rpm including .ko and binaries
Don't use kernel_module_package, so that a separate
kmod-mckernel-*.rpm containing the .ko files is not created.

Change-Id: I25b7ff662476bfc735d319b57cdf2da82f2c6aa7
2021-02-09 20:55:38 -05:00
31c08bcb7d spec, docs: update cmake options
Change-Id: Ib8277413a413b5ce956a48f7e3d9922311937ea8
2021-02-09 20:55:38 -05:00
dffb0918a2 docs: add capstone installation options
Change-Id: I96aa9a6405c17f8d9653f3d3894f0e71a57ab460
2021-02-09 06:10:32 +00:00
23cd14af7d __mcctrl_os_read_write_cpu_register: timeout in 1 sec for when McKernel can't respond
Change-Id: Ia2d5f64e107697dda1f3bae499eb3afb8a7aedba
2021-02-09 06:09:11 +00:00
a5cf2019bc cmake: fix detection of Fugaku native compilation
Change-Id: I4210e9b57223c3869464caea10c2d414e9484e14
2021-02-09 06:06:13 +00:00
11b9fe0377 page_fault_handler: fix missing increment of in_page_fault on SEGV
This integrates some of the changes of the following commit:
1cf0bd5a ("TO RESET: add debug instruments, map Linux areas for tofu")

Change-Id: Iffd8432d5a7b35f20bd45829a125583a0363dbf0
2021-02-09 00:56:15 -05:00
4905c8e638 mcexec: propagate error in __NR_gettid handler
Change-Id: I0e0f06199970fe839065567dcd5418d017b6ec00
2021-02-03 18:53:33 -05:00
3d71c6a8eb mcexec_transfer_image(): map exact size of remote memory (instead of forcing PAGE_SIZE)
Change-Id: Ic66770af6cdb15b7a2e18a08cbcd1736e5558bdf
2021-02-03 18:53:33 -05:00
1cea75dd51 mcexec: fix strncat missing NULL and pclose of uninitialized
Change-Id: I9ce4004580845a983949caa5668b2f950880cd24
2021-02-02 01:51:57 +00:00
661ba0ce4a docs: add editing spec file when building rpm
Change-Id: Ic8dc9d8c6aef6d2180844891d743a09f4a3bdd9d
2021-01-29 01:23:35 +00:00
7e82adc761 prerelease: 0.92: fix uninitialized usrdata->cpu_topology_list
Change-Id: Ia12970bda1225898823a67c2d0461144fc62ebb9
2021-01-29 09:50:53 +09:00
1f9fbe82db mcctrl: fix access to uninitialized usrdata->cpu_topology_list
Change-Id: I25a9182b9b470bb069f4f755a67fb50b88817cd2
2021-01-29 09:34:24 +09:00
aa3d4ba7bd spec: prerelease 0.91 for 4.18.0-240.8.1.el8_3.aarch64 support
Change-Id: I8b33714157b1c68c1fc1eadf0b9d072a3ee59608
2021-01-26 02:34:35 -05:00
c89ac042f9 spec: prerelease 0.9 for testing hidos and cgroup check
Change-Id: I3b04fbf3a1ffa10df9c76da7b2730b9a2521bf98
2021-01-20 13:03:16 +09:00
0f1fc88ce9 spec: prerelease 0.8 for testing hidos and cgroup check
Change-Id: I6261380ab8e99d39191cbd8aac851038cdeb5ce2
2021-01-19 17:34:45 +09:00
bbc6565e7e docs: users: add how to specify boot parameters with Fujitsu TCS
Change-Id: I0216603388780d0e5497373598c3151812238932
2021-01-19 04:03:05 +00:00
1a29f8213f spec: prerelease 0.7 for testing hidos and cgroup check
Change-Id: I17f1608051a8f8ca33d2ba7385b75b8b492d1886
2021-01-19 12:25:06 +09:00
fd21fe7411 copy_user_ranges: copy straight_start of struct vm_range
This fixes the panic in ihk_os_set_ikc_map01 of the ihklib test suite.

Change-Id: Ic03efc81c5ca2c4deaeb06673afef8cef7a1cf92
2021-01-19 00:59:46 +00:00
2460228052 mcctrl: abort on invalid addr in mcexec_transfer_image()
Change-Id: Ic064b6ffc30368ff1d3dfb14403e524cbb837ce5
2021-01-19 00:55:20 +00:00
bf926f234a Tofu: manage stag ranges in VM range split and misc cleanup
Conflicts:
	kernel/process.c

Change-Id: I480850fe93a7963a5bd4d1687fb1e5c43f58057f
2021-01-19 00:55:20 +00:00
507b937509 Tofu: mcctrl side MMU notifier and CQ/BCH cleanup
Conflicts:
	executer/kernel/mcctrl/arch/arm64/archdeps.c
	executer/kernel/mcctrl/syscall.c

Change-Id: Ided8172331a5469c6ced68fa98a42302812efe71
2021-01-19 00:55:20 +00:00
a99cf99396 cmake: add switch to turn on/off krm workaround
Change-Id: I2dfd3d7f3373cce714247f9fc36bf5040a2a8fad
2021-01-19 00:52:53 +00:00
6f373186bf docs: add specifications of IHK and McKernel
Change-Id: I523ad68c5627ca1081c0c8684606a08101982ec9
2021-01-18 08:24:37 +00:00
6667321dc1 spec: prerelease 0.6 for testing capped best-effort memory reservation
Change-Id: Iaa91b311ee6879e84ce862aeabb4bd1fcd95d35f
2021-01-07 11:14:22 +09:00
f849745b60 spec: prerelease 0.5 for testing capped best-effort memory reservation
Change-Id: I139d6e24fbadb7313116029005e115053f31a899
2021-01-07 10:56:27 +09:00
78bc06d998 cmake: set default value of ENABLE_FUGAKU_DEBUG to OFF
Change-Id: I70703410922aa1d1440d61ead6e225d92cf60003
2021-01-07 10:42:36 +09:00
d726bd3d11 profile: fix definition of PROFILE_ENABLE and __NR_profile
Change-Id: I3f9f5870f8380d3668e1ccb06fd0f6d3307e3fa4
2021-01-06 01:03:17 +00:00
df37d6867f docs: add scheduling limitations
Change-Id: Ida4a16efa4d47f448da7417a3b4bdb5fb5304fcd
2021-01-06 09:58:38 +09:00
a4b5410d0c docs: add mlockall/munlockall limitations
Change-Id: I01d1c4eb6955baee89f6827748ac8ce4082884da
2021-01-04 12:57:32 +09:00
d73e6a161c spec: prerelease 0.4 for testing capped best-effort memory reservation
Change-Id: Iec35ea1b7fa6b8930153461c395675f1576042ba
2020-12-29 17:12:14 +09:00
67334b65c3 rus_vm_fault: vmf_insert_pfn: treat VM_FAULT_NOPAGE as success
vmf_insert_pfn was added by the following commit.
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=1c8f422059ae5da07db7406ab916203f9417e396

Refer to the following page for the meaning of VM_FAULT_NOPAGE.
https://lwn.net/Articles/242237/

Change-Id: I2b0144a20a57c74e0e2e0d2fc24281852f49b717
2020-12-29 16:31:41 +09:00
fe3992a3a2 cmake: add switch to turn on/off Fugaku debug modifications
To prevent "TO RESET: send SIGSTOP instead of SIGV in PF" from making
some tests expecting SIGSEGV fail.

Change-Id: I8bb111cff59fe5b0b2bf6bc652dfd2fa308321ed
2020-12-29 16:31:41 +09:00
5d58100c20 cmake: add switch to turn on/off Fugaku hacks
Change-Id: I2a1ac906a19c4e45ee62acdbf0bc6f77f61974f8
2020-12-29 16:31:41 +09:00
1b106d825c Tofu: fix phys addr calculation for contiguous pages in MBPT/BCH update
Change-Id: I70def9d02bdd7e1e969dedfc277a20df6ed2dff8
2020-12-29 16:31:41 +09:00
a680395093 Tofu: kmalloc cache for stag range
Change-Id: Ib5ea12c7c8cdafa7b699308c4eeb6e9ab39905c7
2020-12-29 16:31:41 +09:00
fd5a1c4b0a TO RESET: send SIGSTOP instead of SIGV in PF
Change-Id: I5f7e07cb89f5f38b7c631d838f0eee0a2a98e246
2020-12-29 16:31:40 +09:00
b3b1883ad8 eclair: turn off gdb pagination by default
Change-Id: I7758d97b90705310bc57cb9b6da6f6af436ea7fb
2020-12-29 16:31:40 +09:00
7145c4d383 TO RESET: stack changes
Change-Id: I325420701dfa5e9eac294be086a9d1e7326d95bc
2020-12-29 16:31:40 +09:00
0b82c8942b Tofu: keep track of stags per memory range
Change-Id: I033beaeee3b141dab4485dd3a2a3848eaa84e54e
2020-12-29 16:31:40 +09:00
75694152f0 Tofu: match page sizes to MBPT and fault PTEs if not present
Change-Id: Ia7aa92005a9941d6399063fec9a0776e73fc88fe
2020-12-29 16:31:40 +09:00
1cf0bd5a78 TO RESET: add debug instruments, map Linux areas for tofu
Change-Id: I09880cad3b87182cb663d414041254817c254759
2020-12-29 16:31:39 +09:00
25943634e9 TO RESET: do_mmap: show debug message when profile is turned on
Change-Id: I18f498f3a8660114b5e038e74179df95a645d232
2020-12-29 16:31:39 +09:00
72f95f92f8 TO RESET: hugefileobj: show debug messages
Change-Id: I904c811c13a59c0db74052bc92f6661a3e1b5d34
2020-12-29 16:31:39 +09:00
ab1014863d TO RESET: page_fault_handler: send SIGSTOP instead of SIGSEGV for debug
Change-Id: Ie281dbf43280464c8f412c8444a6861e43f28beb
2020-12-29 16:31:39 +09:00
4cd7051c2d TO RESET: setup_rt_frame: show debug message
Change-Id: I07d4f2dbba9bdb72f8a2892e6b5bd429b8e0aeec
2020-12-29 16:31:39 +09:00
d5716d3c3a TO RESET: mcctrl_get_request_os_cpu and __mcctrl_os_read_write_cpu_register: show debug messages
Change-Id: Ic8430e3fd6a814b888192233b029c942500a2dc9
2020-12-29 16:31:39 +09:00
2a984a12fe TO RESET: unhandled_page_fault: show instruction address
Change-Id: I29a8d30d9b3e5cfbe5e16b1faaa253e794b8fc5b
2020-12-29 16:31:38 +09:00
3949ab65a8 TO RESET: Add kernel argument to toggle on-demand paging for hugetlbfs map
Change-Id: Id748e0a2afc4ea59142fedb652a15b4007c5dee4
2020-12-29 16:31:33 +09:00
ed923ac82f TO RESET: hugefileobj: pre-allocate on mmap
Set this change to "TO RESET" because one of the Fujitsu tests fails.

Change-Id: Iddc30e8452b3d39da4975079d0c6a035e4f3dbde
2020-12-25 11:34:14 +09:00
191e6f7499 TO RESET: preempt_enable: check if no_preempt isn't negative
Change-Id: I1cef2077c50f3b3020870505dd065d10617f440e
2020-12-25 11:34:14 +09:00
4f7fd90300 TO RESET: lock: check if runq lock is held with IRQs disabled
Change-Id: I9a79ceaf9e399ad3695ed8959ca10c587591751a
2020-12-25 11:34:09 +09:00
8f2c8791bf TO RESET: arm64: enable interrupt on panic
Change-Id: I1ceb321de324f307fc82366b162c72f64184247b
2020-12-24 17:18:37 +09:00
bbfb296c26 TO RESET: mcreboot, mcstop+release.sh: add functions
Change-Id: Ic3992dc4e16b7ade00e93edbd107c64a32068c02
2020-12-24 16:53:27 +09:00
10b17e230c TO RESET: physical memory: free memory consistency checker
Change-Id: I15aa59bb81be4d8f2acfe8d161c8255f70f9e7d3
2020-12-24 16:53:12 +09:00
b268c28e7e TO RESET: mmap: ignore MAP_HUGETLB
Change-Id: Ifd50f24de0747b06d71ebba441ae2ef451f66c4d
2020-12-24 16:51:51 +09:00
2fa1c053d7 spec: prerelease 0.3 for testing ihk_reserve_mem and memory policy
Change-Id: I4fbcfa1f93522fd01af42d1ef13d0be075086773
2020-12-24 15:11:01 +09:00
530110e3a9 Tofu: fix ENABLE_TOFU switching
Change-Id: Ib33323d4b59ea8fb4f5f40dff7ea25a36773d5e2
2020-12-24 15:00:14 +09:00
f6ed44aeec spec: prerelease 0.2 for testing ihk_reserve_mem and memory policy
Change-Id: I9ff171c5d65b5f465ce7a2767be1a710de0a0400
2020-12-24 11:23:17 +09:00
33dd2e60b1 mcexec: memory policy control by environmental variable
Refs: #1470
Change-Id: I3d556cae90d31d81572b1c4e5c680e826577d428
2020-12-24 11:18:01 +09:00
ed670c03af spec: prerelease 0.1 for testing ihk_create_os_str
Change-Id: I3c9bbc6f3c9e8951c0ad700b9c02fcdec65018ff
2020-12-23 11:33:31 +09:00
e5f4a4e87d Tofu: proper cleanup of device files when mcexec gets killed
Change-Id: I6cb0290f72d96682700f945b29585e132e525ac1
2020-12-09 13:05:54 +09:00
1918df7765 Tofu: support for barrier gate, kmalloc cache
Change-Id: I6f4cfec2ec404efd03b332fc3f449a775816230e
2020-12-09 13:05:54 +09:00
5d784f3ea4 kernel: increase stack size
Change-Id: I27698149e9206138402dcc65db0078d5dbf548cb
2020-12-09 13:05:53 +09:00
10c09aa10e MM: generic lockless kmalloc and page cache
Change-Id: I71ad498fdd10136d9c72ffe2b16b9122d1bc9673
2020-12-09 13:05:53 +09:00
41f5c0bdde MM: deferred zero cleaning on Linux CPUs
Change-Id: Icdb8ac807688533be7a95b7101edfd904250cd02
2020-12-09 13:05:53 +09:00
e7b8aeb4f7 Tofu: per-fd path memory leak fix
Change-Id: I451472365806333adfac6dae32746195e3c30694
2020-12-09 13:05:53 +09:00
1b3dd45dbc MM: straight mapping memory leak fix
Change-Id: I7d841fbedb1db498b5994eb69b0350df7a5cefb0
2020-12-09 13:05:53 +09:00
623d6f8bc3 arm64: record register state at kernel mode page fault (for eclair)
Change-Id: I066bceecc0377110faaca0b21d45a476d000e684
2020-12-09 13:05:53 +09:00
92902d36fc Tofu: initial version
Change-Id: I9c464d5af883c18715a97ca9e9981cf73b260f90
2020-12-09 13:03:01 +09:00
fe83deb3db profile: make header user-space includable
Change-Id: I4a88d9be7c169f29ef6f6328e8576a3fe3b6e34f
2020-12-08 12:32:10 +09:00
e056cb799f memclear: non-temporal memory clean (arm64)
Change-Id: I8f80ff20e98bc01088450282e1790c27c67c16eb
2020-12-08 12:32:10 +09:00
201f5ce500 MM: straight mapping
Change-Id: I70871f8c382fb00aa719ed501cc5de436d916d7f
2020-12-08 12:32:10 +09:00
100bbe6231 MM: zero memory at free and deferred zero
Change-Id: Ib0055d6f2bdd10d05d749dcd1f3d5c3d318f22f3
2020-12-08 12:32:10 +09:00
fbd121d28c mmap: return -EINVAL for non-anonymous, MAP_HUGETLB map
Change-Id: I2bcbbf0ee9c0f47160eabac4a8d09991c71fe852
2020-12-07 15:23:38 +09:00
d1d93d90cc mcexec: detect mismatch of mcexec -n and mpirun -ppn
Change-Id: I0ce1b2d48cda10713920cb88692e107b8c4d3bab
Refs: #929
2020-12-07 15:23:34 +09:00
45bc6a617a __return_syscall: check input & fix unmap memory in error cases
Change-Id: I5de3ab3acd46770518b79bdc6f1c2e00c1cd5096
2020-11-25 01:58:47 +00:00
924ba7fd65 mcctrl_ikc_send_wait: free desc only if we allocated it internally
Change-Id: I4710ea6bb31f098451347c53ac0ff0be422aec06
2020-11-25 01:58:47 +00:00
2814f7cac4 mcctrl_get_request_os_cpu: check os instance & ret_cpu
Change-Id: I4d3f6fd93eaa183d560c874ba33add83c4308c5a
2020-11-25 01:58:47 +00:00
b510de7bd5 mcctrl_perf_get: check os instance & cpu info
Change-Id: Ic4f9d818b7d58f8ae651e43175fb1c478baec9c1
2020-11-25 01:58:47 +00:00
3e927f61dc mcctrl_perf_disable: check os instance & cpu info
Change-Id: I7195272a65b31db72158f5e5bbfc490bac547b91
2020-11-25 01:58:47 +00:00
64579830dd mcctrl_perf_enable: check os instance & cpu info
Change-Id: I31ab829d63833f924af17445fd9b8488d6eb454f
2020-11-25 01:58:47 +00:00
3cc98883f5 delete_procfs_entries: fix possible crash if top entry has no children
Change-Id: I209842699615f9bb58c12ccd262ae4b17f8f558c
2020-11-25 01:58:47 +00:00
442045a320 mcctrl_ikc_send: validate os and check input packet
Change-Id: I1f8c2228043841685617b665eeeaf2ce15a08703
2020-11-25 01:58:47 +00:00
fe5d8fc71f mcctrl_getrusage: validate os input
Change-Id: I97908069f8bc4703b99f9ffca94f3dd33eb64cc4
2020-11-25 01:58:47 +00:00
550c6cc5fb mcctrl_perf_set : validate os input & check cpu info
Change-Id: If308013746ff6dce03fa8e0eb1ebaca1cb2a4a64
2020-11-25 01:58:47 +00:00
8c0b2ab6ce mcctrl_perf_num: check "os" argument
Change-Id: I13c8b0c337cac9bbb240667808e871defce34aab
2020-11-25 01:58:47 +00:00
239b1b265f release 1.7.0
Change-Id: I8413aa2d051c6164235816bae2823187870efe49
2020-11-25 10:51:40 +09:00
f646fd141b prerelease 0.96: ihk_reserve_mem: balanced, capped best effort
Change-Id: Ia98c87e651d8dd34dfd36bc0c45f1d23e245330d
2020-11-24 03:40:01 +00:00
734d1cc056 ihk submodule update: ihklib: ihk_create_os_str: add ihk_reserve_mem_conf equivalent
Change-Id: Iede1a043b0316d6541656e86091f2288fd299383
2020-11-24 03:40:01 +00:00
040a9c0c7f cmake: set QEMU_LD_PREFIX when cross-compiling
Change-Id: Ie7b86ddba344e02d6f739225e44f3ad4927f5a2f
2020-11-20 07:59:55 +00:00
8784ee4710 spec: prerelease 0.95 for testing /dev/mcosN related fix
Change-Id: I02397984cd5c4c3a3e83968ff03cf9a68e84d200
2020-09-07 16:12:09 +09:00
3a761c138e ihk submodule update: ihklib, ihkmond: fix /dev/mcosN related issues
Change-Id: I533b277f249dc4afc84929dd2bf22c19648e21d1
2020-09-07 16:11:36 +09:00
243 changed files with 22848 additions and 1697 deletions

6
.gitmodules vendored

@ -4,3 +4,9 @@
[submodule "executer/user/lib/libdwarf/libdwarf"]
path = executer/user/lib/libdwarf/libdwarf
url = https://github.com/bgerofi/libdwarf.git
[submodule "executer/user/lib/syscall_intercept"]
path = executer/user/lib/syscall_intercept
url = https://github.com/RIKEN-SysSoft/syscall_intercept.git
[submodule "executer/user/lib/uti"]
path = executer/user/lib/uti
url = https://github.com/RIKEN-SysSoft/uti.git


@ -1,4 +1,4 @@
-cmake_minimum_required(VERSION 2.6)
+cmake_minimum_required(VERSION 3.11)
if (NOT CMAKE_BUILD_TYPE)
set (CMAKE_BUILD_TYPE "Debug" CACHE STRING "Build type: Debug Release..." FORCE)
@ -7,10 +7,10 @@ endif (NOT CMAKE_BUILD_TYPE)
enable_language(C ASM)
project(mckernel C ASM)
-set(MCKERNEL_VERSION "1.7.0")
+set(MCKERNEL_VERSION "1.7.1")
-# See "Fedora Packaging Guidlines -- Versioning"
-set(MCKERNEL_RELEASE "0.94")
+# See "Fedora Packaging Guidelines -- Versioning"
+set(MCKERNEL_RELEASE "0.95")
set(CMAKE_MODULE_PATH ${CMAKE_SOURCE_DIR}/cmake/modules)
# for rpmbuild
@ -41,6 +41,11 @@ if(IMPLICIT_FALLTHROUGH)
set(EXTRA_WARNINGS "-Wno-implicit-fallthrough")
endif(IMPLICIT_FALLTHROUGH)
CHECK_C_COMPILER_FLAG(-Wno-stringop-truncation STRINGOP_TRUNCATION)
if(STRINGOP_TRUNCATION)
list(APPEND EXTRA_WARNINGS "-Wno-stringop-truncation")
endif(STRINGOP_TRUNCATION)
# build options
set(CFLAGS_WARNING "-Wall" "-Wextra" "-Wno-unused-parameter" "-Wno-sign-compare" "-Wno-unused-function" ${EXTRA_WARNINGS} CACHE STRING "Warning flags")
add_compile_options(${CFLAGS_WARNING})
@ -50,6 +55,64 @@ if (ENABLE_WERROR)
add_compile_options("-Werror")
endif(ENABLE_WERROR)
execute_process(COMMAND bash -c "ls -ld /proc/tofu/ 2>/dev/null | wc -l"
OUTPUT_VARIABLE PROC_TOFU OUTPUT_STRIP_TRAILING_WHITESPACE)
if(PROC_TOFU STREQUAL "1")
option(ENABLE_TOFU "Built-in tofu driver support" ON)
else()
option(ENABLE_TOFU "Built-in tofu driver support" OFF)
endif()
if(ENABLE_TOFU)
add_definitions(-DENABLE_TOFU)
set(KBUILD_C_FLAGS "${KBUILD_C_FLAGS} -DENABLE_TOFU")
endif()
# when compiling on a compute-node
execute_process(COMMAND bash -c "grep $(hostname) /etc/opt/FJSVfefs/config/fefs_node1.csv 2>/dev/null | cut -d, -f2 | grep -o CN"
OUTPUT_VARIABLE FUGAKU_NODE_TYPE OUTPUT_STRIP_TRAILING_WHITESPACE)
if(FUGAKU_NODE_TYPE STREQUAL "CN")
option(ENABLE_FUGAKU_HACKS "Fugaku hacks" ON)
else()
option(ENABLE_FUGAKU_HACKS "Fugaku hacks" OFF)
endif()
if(ENABLE_FUGAKU_HACKS)
add_definitions(-DENABLE_FUGAKU_HACKS)
set(KBUILD_C_FLAGS "${KBUILD_C_FLAGS} -DENABLE_FUGAKU_HACKS")
endif()
# krm that mandates reserved memory amount >= available at boot time?
execute_process(COMMAND bash -c "rpm -qi FJSVpxkrm-plugin-mckernel | awk '$1 == \"Version\" && $2 == \":\" { print $3 }'"
OUTPUT_VARIABLE KRM_VERSION OUTPUT_STRIP_TRAILING_WHITESPACE)
message("KRM_VERSION: ${KRM_VERSION}")
if(NOT "${KRM_VERSION}" STREQUAL "" AND "${KRM_VERSION}" VERSION_LESS_EQUAL 4.0.1)
option(ENABLE_KRM_WORKAROUND "krm workaround" ON)
else()
option(ENABLE_KRM_WORKAROUND "krm workaround" OFF)
endif()
if(ENABLE_KRM_WORKAROUND)
add_definitions(-DENABLE_KRM_WORKAROUND)
set(KBUILD_C_FLAGS "${KBUILD_C_FLAGS} -DENABLE_KRM_WORKAROUND")
endif()
# SIGSTOP instead of SIGSEGV, additional IHK Linux kmsg
option(ENABLE_FUGAKU_DEBUG "Fugaku debug instrumentation" OFF)
if(ENABLE_FUGAKU_DEBUG)
add_definitions(-DENABLE_FUGAKU_DEBUG)
set(KBUILD_C_FLAGS "${KBUILD_C_FLAGS} -DENABLE_FUGAKU_DEBUG")
endif()
option(PROFILE_ENABLE "System call profile" ON)
if(PROFILE_ENABLE)
add_definitions(-DPROFILE_ENABLE)
set(KBUILD_C_FLAGS "${KBUILD_C_FLAGS} -DPROFILE_ENABLE")
endif()
option(ENABLE_LINUX_WORK_IRQ_FOR_IKC "Use Linux work IRQ for IKC IPI" ON)
if (ENABLE_LINUX_WORK_IRQ_FOR_IKC)
set(KBUILD_C_FLAGS "${KBUILD_C_FLAGS} -DIHK_IKC_USE_LINUX_WORK_IRQ")
@ -155,11 +218,6 @@ if (ENABLE_QLMPI)
find_package(MPI REQUIRED)
endif()
-if (ENABLE_UTI)
-pkg_check_modules(LIBSYSCALL_INTERCEPT REQUIRED libsyscall_intercept)
-link_directories(${LIBSYSCALL_INTERCEPT_LIBRARY_DIRS})
-endif()
string(REGEX REPLACE "^([0-9]+)\\.([0-9]+)\\.([0-9]+)(-([0-9]+)(.*))?" "\\1;\\2;\\3;\\5;\\6" LINUX_VERSION ${UNAME_R})
list(GET LINUX_VERSION 0 LINUX_VERSION_MAJOR)
list(GET LINUX_VERSION 1 LINUX_VERSION_MINOR)
@ -252,6 +310,11 @@ message("KBUILD_C_FLAGS: ${KBUILD_C_FLAGS}")
message("MAP_KERNEL_START: ${MAP_KERNEL_START}")
message("ENABLE_MEMDUMP: ${ENABLE_MEMDUMP}")
message("ENABLE_PERF: ${ENABLE_PERF}")
message("ENABLE_TOFU: ${ENABLE_TOFU}")
message("ENABLE_FUGAKU_HACKS: ${ENABLE_FUGAKU_HACKS}")
message("ENABLE_FUGAKU_DEBUG: ${ENABLE_FUGAKU_DEBUG}")
message("ENABLE_KRM_WORKAROUND: ${ENABLE_KRM_WORKAROUND}")
message("PROFILE_ENABLE: ${PROFILE_ENABLE}")
message("ENABLE_RUSAGE: ${ENABLE_RUSAGE}")
message("ENABLE_QLMPI: ${ENABLE_QLMPI}")
message("ENABLE_UTI: ${ENABLE_UTI}")


@ -143,6 +143,11 @@ void arch_save_panic_regs(void *irq_regs)
clv = get_arm64_this_cpu_local();
/* If kernel mode PF occurred, unroll the causing call stack */
if (cpu_local_var(kernel_mode_pf_regs)) {
regs = cpu_local_var(kernel_mode_pf_regs);
}
/* For user-space, use saved kernel context */
if (regs->pc < USER_END) {
memset(clv->arm64_cpu_local_thread.panic_regs,
@ -725,6 +730,49 @@ static void show_context_stack(struct pt_regs *regs)
}
}
#ifdef ENABLE_FUGAKU_HACKS
void __show_context_stack(struct thread *thread,
unsigned long pc, uintptr_t sp, int kprintf_locked)
{
uintptr_t stack_top;
unsigned long irqflags = 0;
stack_top = ALIGN_UP(sp, (uintptr_t)KERNEL_STACK_SIZE);
if (!kprintf_locked)
irqflags = kprintf_lock();
__kprintf("TID: %d, call stack (most recent first):\n",
thread->tid);
__kprintf("PC: %016lx, SP: %016lx\n", pc, sp);
for (;;) {
extern char _head[], _end[];
uintptr_t *fp, *lr;
fp = (uintptr_t *)sp;
lr = (uintptr_t *)(sp + 8);
if ((*fp <= sp)) {
break;
}
if ((*fp > stack_top)) {
break;
}
if ((*lr < (unsigned long)_head) ||
(*lr > (unsigned long)_end)) {
break;
}
__kprintf("PC: %016lx, SP: %016lx, FP: %016lx\n", *lr - 4, sp, *fp);
sp = *fp;
}
if (!kprintf_locked)
kprintf_unlock(irqflags);
}
#endif
void handle_IPI(unsigned int vector, struct pt_regs *regs)
{
struct ihk_mc_interrupt_handler *h;
@ -786,6 +834,19 @@ void cpu_safe_halt(void)
cpu_enable_interrupt();
}
#ifdef ENABLE_FUGAKU_HACKS
/*@
@ assigns \nothing;
@ ensures \interrupt_disabled == 0;
@*/
void cpu_halt_panic(void)
{
extern void __cpu_do_idle(void);
cpu_enable_interrupt();
__cpu_do_idle();
}
#endif
#if defined(CONFIG_HAS_NMI)
#include <arm-gic-v3.h>
@ -851,6 +912,21 @@ unsigned long cpu_enable_interrupt_save(void)
return flags;
}
#ifdef ENABLE_FUGAKU_HACKS
int cpu_interrupt_disabled(void)
{
unsigned long flags;
unsigned long masked = ICC_PMR_EL1_MASKED;
asm volatile(
"mrs_s %0, " __stringify(ICC_PMR_EL1)
: "=&r" (flags)
:
: "memory");
return (flags == masked);
}
#endif
#else /* defined(CONFIG_HAS_NMI) */
/* @ref.impl arch/arm64/include/asm/irqflags.h::arch_local_irq_enable */
@ -1372,6 +1448,14 @@ void arch_print_stack(void)
{
}
#ifdef ENABLE_FUGAKU_HACKS
unsigned long arch_get_instruction_address(const void *reg)
{
const struct pt_regs *regs = (struct pt_regs *)reg;
return regs->pc;
}
#endif
void arch_show_interrupt_context(const void *reg)
{
const struct pt_regs *regs = (struct pt_regs *)reg;
@ -1440,6 +1524,11 @@ int ihk_mc_arch_get_special_register(enum ihk_asr_type type,
return -1;
}
int ihk_mc_get_interrupt_id(int cpu)
{
return cpu;
}
/*@
@ requires \valid_cpuid(cpu); // valid CPU logical ID
@ ensures \result == 0
@ -1888,15 +1977,15 @@ int arch_cpu_read_write_register(
return ret;
}
-int smp_call_func(cpu_set_t *__cpu_set, smp_func_t __func, void *__arg)
-{
-/* TODO: skeleton for smp_call_func */
-return -1;
-}
void arch_flush_icache_all(void)
{
asm("ic ialluis");
dsb(ish);
}
+int ihk_mc_get_smp_handler_irq(void)
+{
+return LOCAL_SMP_FUNC_CALL_VECTOR;
+}
/*** end of file ***/
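The loop added in `__show_context_stack()` follows frame records until one of three checks fails: the next frame pointer does not move up the stack, it lies beyond the stack top, or the return address falls outside the kernel text. Below is a minimal user-space sketch of the same walk over a synthetic stack. `walk_frames` and its parameters are hypothetical names, and the text range is passed in explicitly rather than taken from `_head` and `_end`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Walk a chain of AArch64-style frame records. Each record is two words:
 * [0] = address of the caller's frame record, [1] = return address.
 * Stores up to max call-site addresses in out[] and returns how many
 * were stored.
 */
static size_t walk_frames(uintptr_t sp, uintptr_t stack_top,
			  uintptr_t text_start, uintptr_t text_end,
			  uintptr_t *out, size_t max)
{
	size_t n = 0;

	assert(out != NULL);

	while (n < max) {
		const uintptr_t *rec = (const uintptr_t *)sp;
		uintptr_t next_fp = rec[0];
		uintptr_t lr = rec[1];

		/* The chain must move strictly toward the top of the stack. */
		if (next_fp <= sp || next_fp > stack_top)
			break;
		/* A return address outside the text range ends the walk. */
		if (lr < text_start || lr > text_end)
			break;

		out[n++] = lr - 4;	/* the call precedes the return address */
		sp = next_fp;
	}
	return n;
}
```

As in the kernel loop, the record that fails a check is not reported, so the outermost frame is dropped from the trace.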


@ -89,9 +89,6 @@
mov x2, #0
bl check_signal_irq_disabled // check whether the signal is delivered(for kernel_exit)
.endif
-.if \el == 1
-bl check_sig_pending
-.endif
disable_irq x1 // disable interrupts
.if \need_enable_step == 1
ldr x1, [tsk, #TI_FLAGS]


@ -223,7 +223,12 @@ static int do_translation_fault(unsigned long addr,
unsigned int esr,
struct pt_regs *regs)
{
#ifdef ENABLE_TOFU
// XXX: Handle kernel space page faults for Tofu driver
//if (addr < USER_END)
#else
if (addr < USER_END)
#endif
return do_page_fault(addr, esr, regs);
do_bad_area(addr, esr, regs);


@ -7,7 +7,8 @@
* @ref.impl
* linux-linaro/arch/arm64/include/asm/futex.h:__futex_atomic_op
*/
-#define __futex_atomic_op(insn, ret, oldval, uaddr, tmp, oparg) \
+#define ___futex_atomic_op(insn, ret, oldval, uaddr, tmp, oparg) \
+do { \
asm volatile( \
"1: ldxr %w1, %2\n" \
insn "\n" \
@ -26,7 +27,24 @@
" .popsection\n" \
: "=&r" (ret), "=&r" (oldval), "+Q" (*uaddr), "=&r" (tmp) \
: "r" (oparg), "Ir" (-EFAULT) \
-: "memory")
+: "memory"); \
+} while (0);
+#ifndef IHK_OS_MANYCORE
+#include <linux/uaccess.h>
+#define __futex_atomic_op(insn, ret, oldval, uaddr, tmp, oparg) \
+do { \
+uaccess_enable(); \
+___futex_atomic_op(insn, ret, oldval, uaddr, tmp, oparg) \
+uaccess_disable(); \
+} while (0);
+#else
+#define __futex_atomic_op(insn, ret, oldval, uaddr, tmp, oparg) \
+___futex_atomic_op(insn, ret, oldval, uaddr, tmp, oparg) \
+#endif
/*
* @ref.impl
@ -135,12 +153,4 @@ futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
return ret;
}
-static inline int get_futex_value_locked(uint32_t *dest, uint32_t *from)
-{
-*dest = *(volatile uint32_t *)from;
-return 0;
-}
#endif /* !__HEADER_ARM64_COMMON_ARCH_FUTEX_H */
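The futex macros in this file wrap their bodies in `do { ... } while (0)`. The sketch below shows the conventional form of that idiom, where the definition carries no trailing semicolon and the caller supplies it. `SWAP_INT` and `order_pair` are hypothetical names used only for illustration. A definition that ends in `while (0);` expands to two statements at the call site, so it compiles only where the macro is not the unbraced body of an `if` that has an `else`.

```c
#include <assert.h>
#include <stddef.h>

/* Multi-statement macro wrapped in do { ... } while (0), with NO trailing
 * semicolon: the caller's own ';' completes the statement. */
#define SWAP_INT(a, b)		\
do {				\
	int swap_tmp = (a);	\
	(a) = (b);		\
	(b) = swap_tmp;		\
} while (0)

/* Puts the smaller value in *lo. Returns 1 if the values were swapped.
 * The unbraced if/else parses correctly only because SWAP_INT(...);
 * is exactly one statement. */
static int order_pair(int *lo, int *hi)
{
	int swapped = 1;

	assert(lo != NULL && hi != NULL);

	if (*lo > *hi)
		SWAP_INT(*lo, *hi);
	else
		swapped = 0;
	return swapped;
}
```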


@ -9,6 +9,9 @@
#include "affinity.h"
#include <lwk/compiler.h>
#include "config.h"
#ifdef ENABLE_FUGAKU_HACKS
#include <ihk/debug.h>
#endif
//#define DEBUG_SPINLOCK
//#define DEBUG_MCS_RWLOCK
@ -31,6 +34,10 @@ typedef struct {
#endif /* __AARCH64EB__ */
} __attribute__((aligned(4))) ihk_spinlock_t;
#ifdef ENABLE_FUGAKU_HACKS
extern ihk_spinlock_t *get_this_cpu_runq_lock(void);
#endif
extern void preempt_enable(void);
extern void preempt_disable(void);
@ -98,6 +105,18 @@ static int __ihk_mc_spinlock_trylock_noirq(ihk_spinlock_t *lock)
: "memory");
success = !tmp;
#ifdef ENABLE_FUGAKU_HACKS
#if 0
if (success) {
if (get_this_cpu_runq_lock() == lock &&
!cpu_interrupt_disabled()) {
kprintf("%s: WARNING: runq lock held without IRQs disabled?\n", __func__); \
}
}
#endif
#endif
if (!success) {
preempt_enable();
}
@ -182,6 +201,14 @@ static void __ihk_mc_spinlock_lock_noirq(ihk_spinlock_t *lock)
: "=&r" (lockval), "=&r" (newval), "=&r" (tmp), "+Q" (*lock)
: "Q" (lock->owner), "I" (1 << TICKET_SHIFT)
: "memory");
#ifdef ENABLE_FUGAKU_HACKS
#if 0
if (get_this_cpu_runq_lock() == lock &&
!cpu_interrupt_disabled()) {
kprintf("%s: WARNING: runq lock held without IRQs disabled?\n", __func__); \
}
#endif
#endif
}
#ifdef DEBUG_SPINLOCK


@ -94,7 +94,11 @@ extern char _end[];
# define LD_TASK_UNMAPPED_BASE UL(0x0000080000000000)
# define TASK_UNMAPPED_BASE UL(0x0000100000000000)
# define USER_END UL(0x0000400000000000)
#ifdef ENABLE_TOFU
# define MAP_VMAP_START UL(0xffff7bdfffff0000)
#else
# define MAP_VMAP_START UL(0xffff780000000000)
#endif
# define MAP_VMAP_SIZE UL(0x0000000100000000)
# define MAP_FIXED_START UL(0xffff7ffffbdd0000)
# define MAP_ST_START UL(0xffff800000000000)
@ -142,6 +146,7 @@ extern char _end[];
# define __PTL1_SHIFT 16
# define PTL4_INDEX_MASK 0
# define PTL3_INDEX_MASK ((UL(1) << 6) - 1)
# define PTL3_INDEX_MASK_LINUX ((UL(1) << 10) - 1)
# define PTL2_INDEX_MASK ((UL(1) << 13) - 1)
# define PTL1_INDEX_MASK PTL2_INDEX_MASK
# define __PTL4_CONT_SHIFT (__PTL4_SHIFT + 0)
@ -829,7 +834,13 @@ static inline int pte_is_head(pte_t *ptep, pte_t *old, size_t cont_size)
return page_is_contiguous_head(ptep, cont_size);
}
struct page_table;
typedef pte_t translation_table_t;
struct page_table {
translation_table_t* tt;
translation_table_t* tt_pa;
int asid;
};
void arch_adjust_allocate_page_size(struct page_table *pt,
uintptr_t fault_addr,
pte_t *ptep,
@ -849,7 +860,6 @@ void *map_fixed_area(unsigned long phys, unsigned long size, int uncachable);
void set_address_space_id(struct page_table *pt, int asid);
int get_address_space_id(const struct page_table *pt);
-typedef pte_t translation_table_t;
void set_translation_table(struct page_table *pt, translation_table_t* tt);
translation_table_t* get_translation_table(const struct page_table *pt);
translation_table_t* get_translation_table_as_paddr(const struct page_table *pt);
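The index masks in this header, including the new `PTL3_INDEX_MASK_LINUX`, select the bits of a virtual address that index each page-table level. A minimal sketch of that extraction follows. The `TOY_` names and `toy_pt_index` are hypothetical, and the level-2 and level-3 shifts are assumptions derived from the 16-bit page offset (`__PTL1_SHIFT 16`) plus 13 index bits per level; they are not shown in the hunk.

```c
#include <assert.h>
#include <stdint.h>

#define UL(x) ((uint64_t)(x))

/* Mask values mirror the hunk; the PTL2/PTL3 shifts are assumed. */
#define TOY_PTL1_SHIFT 16
#define TOY_PTL2_SHIFT (TOY_PTL1_SHIFT + 13)
#define TOY_PTL3_SHIFT (TOY_PTL2_SHIFT + 13)
#define TOY_PTL3_INDEX_MASK       ((UL(1) << 6) - 1)
#define TOY_PTL3_INDEX_MASK_LINUX ((UL(1) << 10) - 1)
#define TOY_PTL2_INDEX_MASK       ((UL(1) << 13) - 1)
#define TOY_PTL1_INDEX_MASK       TOY_PTL2_INDEX_MASK

/* Extract the table index for one level: shift the level's bits down
 * to bit 0, then keep only as many bits as the level has entries for. */
static inline uint64_t toy_pt_index(uint64_t va, unsigned int shift,
				    uint64_t mask)
{
	assert(shift < 64);
	return (va >> shift) & mask;
}
```

With these values the 6-bit mask keeps only the low six bits of the top-level index, while the 10-bit mask keeps four more bits of the same field.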

View File

@ -10,4 +10,13 @@ extern void *__inline_memcpy(void *to, const void *from, size_t t);
extern void *__inline_memset(void *s, unsigned long c, size_t count);
#define ARCH_MEMCLEAR
extern void __memclear(void *addr, unsigned long len, void *tmp);
inline static void memclear(void *addr, unsigned long len)
{
uint64_t q0q1[4];
__memclear(addr, len, (void *)&q0q1);
}
#endif /* __HEADER_ARM64_COMMON_ARCH_TIMER_H */

View File

@ -80,6 +80,10 @@ static inline uint64_t __raw_readq(const volatile void *addr)
return val;
}
/* IO barriers */
#define __iormb() rmb()
#define __iowmb() wmb()
/*
* Relaxed I/O memory access primitives. These follow the Device memory
* ordering rules but do not guarantee any ordering relative to Normal memory
@ -95,5 +99,20 @@ static inline uint64_t __raw_readq(const volatile void *addr)
#define writel_relaxed(v,c) ((void)__raw_writel((uint32_t)(v),(c)))
#define writeq_relaxed(v,c) ((void)__raw_writeq((uint64_t)(v),(c)))
/*
* I/O memory access primitives. Reads are ordered relative to any
* following Normal memory access. Writes are ordered relative to any prior
* Normal memory access.
*/
#define readb(c) ({ uint8_t __v = readb_relaxed(c); __iormb(); __v; })
#define readw(c) ({ uint16_t __v = readw_relaxed(c); __iormb(); __v; })
#define readl(c) ({ uint32_t __v = readl_relaxed(c); __iormb(); __v; })
#define readq(c) ({ uint64_t __v = readq_relaxed(c); __iormb(); __v; })
#define writeb(v,c) ({ __iowmb(); writeb_relaxed((v),(c)); })
#define writew(v,c) ({ __iowmb(); writew_relaxed((v),(c)); })
#define writel(v,c) ({ __iowmb(); writel_relaxed((v),(c)); })
#define writeq(v,c) ({ __iowmb(); writeq_relaxed((v),(c)); })
#endif /* __KERNEL__ */
#endif /* __ASM_IO_H */
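The ordering contract described in the comment above can be sketched in user space. This is only an illustrative model, assuming a full fence is an acceptable stand-in for `rmb()`/`wmb()` and using an ordinary struct in place of a memory-mapped device; every `model_` name is hypothetical. It shows the typical pattern the non-relaxed accessors exist for: fill a descriptor in Normal memory, then ring a doorbell register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for rmb()/wmb(); a full fence is assumed to be at
 * least as strong as the barriers the real macros expand to. */
static inline void model_rmb(void) { __atomic_thread_fence(__ATOMIC_SEQ_CST); }
static inline void model_wmb(void) { __atomic_thread_fence(__ATOMIC_SEQ_CST); }

/* Relaxed accessors: a plain volatile access with no extra ordering. */
static inline uint32_t model_readl_relaxed(const volatile void *addr)
{
    return *(const volatile uint32_t *)addr;
}

static inline void model_writel_relaxed(uint32_t v, volatile void *addr)
{
    *(volatile uint32_t *)addr = v;
}

/* Ordered read: the barrier follows the load, so later memory accesses
 * cannot be hoisted above it. */
static inline uint32_t model_readl(const volatile void *addr)
{
    uint32_t v = model_readl_relaxed(addr);
    model_rmb();
    return v;
}

/* Ordered write: the barrier precedes the store, so earlier memory writes
 * are visible before the register write is issued. */
static inline void model_writel(uint32_t v, volatile void *addr)
{
    model_wmb();
    model_writel_relaxed(v, addr);
}

/* Fake device with two 32-bit "registers" (ordinary memory in this model). */
struct model_device {
    uint32_t doorbell;
    uint32_t status;
};

/* Publish a descriptor, ring the doorbell, then read back the status. */
static inline uint32_t model_submit(struct model_device *dev,
                                    uint32_t *desc, uint32_t payload)
{
    assert(dev != 0 && desc != 0);
    *desc = payload;                 /* Normal memory write */
    model_writel(1, &dev->doorbell); /* ordered after the descriptor write */
    return model_readl(&dev->status);
}
```

Using `model_writel_relaxed()` for the doorbell would drop the barrier, and the device could then observe the doorbell before the descriptor contents.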

View File

@ -17,6 +17,7 @@
#define INTRID_STACK_TRACE 5
#define INTRID_MULTI_INTR 6
#define INTRID_MULTI_NMI 7
#define LOCAL_SMP_FUNC_CALL_VECTOR 1 /* same as IKC */
/* use PPI interrupt number */
#define INTRID_PERF_OVF 23

View File

@ -124,7 +124,7 @@ SYSCALL_HANDLED(271, process_vm_writev)
SYSCALL_HANDLED(281, execveat)
SYSCALL_HANDLED(700, get_cpu_id)
#ifdef PROFILE_ENABLE
SYSCALL_HANDLED(__NR_profile, profile)
SYSCALL_HANDLED(PROFILE_EVENT_MAX, profile)
#endif // PROFILE_ENABLE
SYSCALL_HANDLED(730, util_migrate_inter_kernel)
SYSCALL_HANDLED(731, util_indicate_clone)

View File

@ -2,7 +2,7 @@
#ifndef __HEADER_ARM64_COMMON_THREAD_INFO_H
#define __HEADER_ARM64_COMMON_THREAD_INFO_H
#define MIN_KERNEL_STACK_SHIFT 15
#define MIN_KERNEL_STACK_SHIFT 18
#include <arch-memory.h>

View File

@ -7,6 +7,9 @@
#include <process.h>
#include <syscall.h>
#include <ihk/debug.h>
#ifdef ENABLE_FUGAKU_HACKS
#include <ihk/monitor.h>
#endif
#include <arch-timer.h>
#include <cls.h>
@ -313,14 +316,27 @@ void handle_interrupt_gicv3(struct pt_regs *regs)
struct cpu_local_var *v = get_this_cpu_local_var();
//unsigned long irqflags;
int do_check = 0;
#ifdef ENABLE_FUGAKU_HACKS
struct ihk_os_cpu_monitor *monitor = cpu_local_var(monitor);
++v->in_interrupt;
#endif
irqnr = gic_read_iar();
cpu_enable_nmi();
set_cputime(from_user ? CPUTIME_MODE_U2K : CPUTIME_MODE_K2K_IN);
while (irqnr != ICC_IAR1_EL1_SPURIOUS) {
if ((irqnr < 1020) || (irqnr >= 8192)) {
gic_write_eoir(irqnr);
#ifndef ENABLE_FUGAKU_HACKS
handle_IPI(irqnr, regs);
#else
/* Once panicked, only allow CPU stop and NMI IRQs */
if (monitor->status != IHK_OS_MONITOR_PANIC ||
irqnr == INTRID_CPU_STOP ||
irqnr == INTRID_MULTI_NMI) {
handle_IPI(irqnr, regs);
}
#endif
}
irqnr = gic_read_iar();
}
@ -328,14 +344,22 @@ void handle_interrupt_gicv3(struct pt_regs *regs)
//irqflags = ihk_mc_spinlock_lock(&v->runq_lock);
/* For migration by IPI or by timesharing */
if (v->flags &
(CPU_FLAG_NEED_MIGRATE | CPU_FLAG_NEED_RESCHED)) {
v->flags &= ~CPU_FLAG_NEED_RESCHED;
do_check = 1;
if (v->flags & CPU_FLAG_NEED_RESCHED) {
if (v->flags & CPU_FLAG_NEED_MIGRATE && !from_user) {
// Don't migrate on K2K schedule
} else {
v->flags &= ~CPU_FLAG_NEED_RESCHED;
do_check = 1;
}
}
//ihk_mc_spinlock_unlock(&v->runq_lock, irqflags);
#ifndef ENABLE_FUGAKU_HACKS
if (do_check) {
#else
--v->in_interrupt;
if (monitor->status != IHK_OS_MONITOR_PANIC && do_check) {
#endif
check_signal(0, regs, 0);
schedule();
}

View File

@ -150,12 +150,6 @@ void flush_tlb_single(unsigned long addr)
arch_flush_tlb_single(asid, addr);
}
struct page_table {
translation_table_t* tt;
translation_table_t* tt_pa;
int asid;
};
extern struct page_table swapper_page_table;
static struct page_table *init_pt = &swapper_page_table;
static ihk_spinlock_t init_pt_lock;
@ -223,6 +217,13 @@ static inline int ptl4_index(unsigned long addr)
int idx = (addr >> PTL4_SHIFT) & PTL4_INDEX_MASK;
return idx;
}
#ifdef ENABLE_TOFU
static inline int ptl3_index_linux(unsigned long addr)
{
int idx = (addr >> PTL3_SHIFT) & PTL3_INDEX_MASK_LINUX;
return idx;
}
#endif
static inline int ptl3_index(unsigned long addr)
{
int idx = (addr >> PTL3_SHIFT) & PTL3_INDEX_MASK;
@ -281,6 +282,40 @@ static inline pte_t* ptl4_offset(const translation_table_t* ptl4, unsigned long
}
return ptep;
}
#ifdef ENABLE_TOFU
static inline pte_t* ptl3_offset_linux(const pte_t* l4p, unsigned long addr)
{
pte_t* ptep = NULL;
pte_t pte = 0;
unsigned long phys = 0;
translation_table_t* ptl3 = NULL;
int idx = 0;
switch (CONFIG_ARM64_PGTABLE_LEVELS)
{
case 4:
pte = ptl4_val(l4p);
phys = pte & PT_PHYSMASK;
ptl3 = phys_to_virt(phys);
idx = ptl3_index_linux(addr);
ptep = (pte_t*)ptl3 + idx;
break;
case 3:
ptl3 = (translation_table_t*)l4p;
idx = ptl3_index_linux(addr);
ptep = (pte_t*)ptl3 + idx;
break;
case 2:
case 1:
/* When there is no PTL3, pass along the page table address rather than an entry. */
ptep = (pte_t*)l4p;
break;
}
return ptep;
}
#endif
static inline pte_t* ptl3_offset(const pte_t* l4p, unsigned long addr)
{
pte_t* ptep = NULL;
@ -959,7 +994,14 @@ static void init_normal_area(struct page_table *pt)
int i;
tt = get_translation_table(pt);
#ifdef ENABLE_TOFU
setup(tt,
arm64_st_phys_base,
arm64_st_phys_base + (1UL << 40));
return;
#endif
for (i = 0; i < ihk_mc_get_nr_memory_chunks(); i++) {
unsigned long map_start, map_end;
int numa_id;
@ -1287,6 +1329,58 @@ out:
return ret;
}
#ifdef ENABLE_TOFU
int ihk_mc_linux_pt_virt_to_phys_size(struct page_table *pt,
const void *virt,
unsigned long *phys,
unsigned long *size)
{
unsigned long v = (unsigned long)virt;
pte_t* ptep;
translation_table_t* tt;
unsigned long paddr;
unsigned long lsize;
tt = get_translation_table(pt);
ptep = ptl4_offset(tt, v);
if (!ptl4_present(ptep)) {
return -EFAULT;
}
ptep = ptl3_offset_linux(ptep, v);
if (!ptl3_present(ptep)) {
return -EFAULT;
}
if (ptl3_type_block(ptep)) {
paddr = ptl3_phys(ptep);
lsize = PTL3_SIZE;
goto out;
}
ptep = ptl2_offset(ptep, v);
if (!ptl2_present(ptep)) {
return -EFAULT;
}
if (ptl2_type_block(ptep)) {
paddr = ptl2_phys(ptep);
lsize = PTL2_SIZE;
goto out;
}
ptep = ptl1_offset(ptep, v);
if (!ptl1_present(ptep)) {
return -EFAULT;
}
paddr = ptl1_phys(ptep);
lsize = PTL1_SIZE;
out:
*phys = paddr | (v & (lsize - 1));
if(size) *size = lsize;
return 0;
}
#endif
int ihk_mc_pt_virt_to_phys_size(struct page_table *pt,
const void *virt,
@ -1348,7 +1442,6 @@ int ihk_mc_pt_virt_to_phys(struct page_table *pt,
return ihk_mc_pt_virt_to_phys_size(pt, virt, phys, NULL);
}
int ihk_mc_pt_print_pte(struct page_table *pt, void *virt)
{
const unsigned long v = (unsigned long)virt;
@ -1360,6 +1453,15 @@ int ihk_mc_pt_print_pte(struct page_table *pt, void *virt)
}
tt = get_translation_table(pt);
__kprintf("%s: 0x%lx, CONFIG_ARM64_PGTABLE_LEVELS: %d, ptl4_index: %ld, ptl3_index: %ld, ptl2_index: %ld, ptl1_index: %ld\n",
__func__,
v,
CONFIG_ARM64_PGTABLE_LEVELS,
ptl4_index(v),
ptl3_index(v),
ptl2_index(v),
ptl1_index(v));
ptep = ptl4_offset(tt, v);
__kprintf("l4 table: 0x%lX l4idx: %d\n", virt_to_phys(tt), ptl4_index(v));
if (!(ptl4_present(ptep))) {
@ -2147,6 +2249,198 @@ static void unmap_free_stat(struct page *page, unsigned long phys,
}
}
/*
* Kernel space page table clearing functions.
*/
struct clear_kernel_range_args {
int free_physical;
};
static int clear_kernel_range_middle(void *args0, pte_t *ptep, uint64_t base,
uint64_t start, uint64_t end, int level);
static int clear_kernel_range_l1(void *args0, pte_t *ptep, uint64_t base,
uint64_t start, uint64_t end)
{
const struct table {
unsigned long pgsize;
unsigned long cont_pgsize;
} tbl = {
.pgsize = PTL1_SIZE,
.cont_pgsize = PTL1_CONT_SIZE
};
struct clear_kernel_range_args *args = args0;
uint64_t phys = 0;
pte_t old;
size_t clear_size;
if (ptl1_null(ptep)) {
return -ENOENT;
}
old = xchg(ptep, PTE_NULL);
if (!pte_is_present(&old))
return 0;
arch_flush_tlb_single(0, base);
clear_size = pte_is_contiguous(&old) ?
tbl.cont_pgsize : tbl.pgsize;
dkprintf("%s: 0x%lx:%lu unmapped\n",
__func__, base, clear_size);
if (args->free_physical) {
phys = ptl1_phys(&old);
ihk_mc_free_pages(phys_to_virt(phys), clear_size >> PAGE_SHIFT);
}
return 0;
}
static int clear_kernel_range_l2(void *args0, pte_t *ptep, uint64_t base,
uint64_t start, uint64_t end)
{
return clear_kernel_range_middle(args0, ptep, base, start, end, 2);
}
static int clear_kernel_range_l3(void *args0, pte_t *ptep, uint64_t base,
uint64_t start, uint64_t end)
{
return clear_kernel_range_middle(args0, ptep, base, start, end, 3);
}
static int clear_kernel_range_l4(void *args0, pte_t *ptep, uint64_t base,
uint64_t start, uint64_t end)
{
return clear_kernel_range_middle(args0, ptep, base, start, end, 4);
}
static int clear_kernel_range_middle(void *args0, pte_t *ptep, uint64_t base,
uint64_t start, uint64_t end, int level)
{
const struct table {
walk_pte_t* walk;
walk_pte_fn_t* callback;
unsigned long pgsize;
unsigned long cont_pgsize;
} table[] = {
{walk_pte_l1, clear_kernel_range_l1, PTL2_SIZE, PTL2_CONT_SIZE}, /*PTL2*/
{walk_pte_l2, clear_kernel_range_l2, PTL3_SIZE, PTL3_CONT_SIZE}, /*PTL3*/
{walk_pte_l3, clear_kernel_range_l3, PTL4_SIZE, PTL4_CONT_SIZE}, /*PTL4*/
};
const struct table tbl = table[level-2];
struct clear_kernel_range_args *args = args0;
uint64_t phys = 0;
translation_table_t *tt;
int error;
pte_t old;
size_t clear_size;
if (ptl_null(ptep, level)) {
return -ENOENT;
}
dkprintf("%s(level: %d): 0x%lx in 0x%lx-0x%lx\n",
__func__, level, base, start, end);
if (ptl_type_page(ptep, level)
&& ((base < start) || (end < (base + tbl.pgsize)))) {
error = -EINVAL;
ekprintf("clear_kernel_range_middle(%p,%p,%lx,%lx,%lx,%d):"
"split page. %d\n",
args0, ptep, base, start, end, level, error);
return error;
}
if (ptl_type_page(ptep, level)) {
old = xchg(ptep, PTE_NULL);
if (!ptl_present(&old, level)) {
return 0;
}
arch_flush_tlb_single(0, base);
clear_size = pte_is_contiguous(&old) ?
tbl.cont_pgsize : tbl.pgsize;
dkprintf("%s(level: %d): 0x%lx:%lu unmapped\n",
__func__, level, base, clear_size);
if (args->free_physical) {
phys = ptl_phys(&old, level);
ihk_mc_free_pages(phys_to_virt(phys), clear_size >> PAGE_SHIFT);
}
return 0;
}
tt = (translation_table_t*)phys_to_virt(ptl_phys(ptep, level));
error = tbl.walk(tt, base, start, end, tbl.callback, args0);
if (error && (error != -ENOENT)) {
return error;
}
if (args->free_physical) {
if ((start <= base) && ((base + tbl.pgsize) <= end)) {
ptl_clear(ptep, level);
arch_flush_tlb_single(0, base);
ihk_mc_free_pages(tt, 1);
}
}
return 0;
}
static int clear_kernel_range(uintptr_t start, uintptr_t end, int free_physical)
{
const struct table {
walk_pte_t* walk;
walk_pte_fn_t* callback;
} tables[] = {
{walk_pte_l2, clear_kernel_range_l2}, /*second*/
{walk_pte_l3, clear_kernel_range_l3}, /*first*/
{walk_pte_l4, clear_kernel_range_l4}, /*zero*/
};
const struct table initial_lookup = tables[CONFIG_ARM64_PGTABLE_LEVELS - 2];
int error;
struct clear_kernel_range_args args;
translation_table_t* tt;
unsigned long irqflags;
dkprintf("%s: start: 0x%lx, end: 0x%lx, free phys: %d\n",
__func__, start, end, free_physical);
if (start <= USER_END)
return -EINVAL;
args.free_physical = free_physical;
irqflags = ihk_mc_spinlock_lock(&init_pt_lock);
tt = get_translation_table(get_init_page_table());
error = initial_lookup.walk(tt, 0,
(start & ~(0xffff000000000000)),
(end & ~(0xffff000000000000)),
initial_lookup.callback, &args);
dkprintf("%s: start: 0x%lx, end: 0x%lx, free phys: %d, ret: %d\n",
__func__, start, end, free_physical, error);
ihk_mc_spinlock_unlock(&init_pt_lock, irqflags);
return error;
}
int ihk_mc_clear_kernel_range(void *start, void *end)
{
#define KEEP_PHYSICAL 0
return clear_kernel_range((uintptr_t)start, (uintptr_t)end, KEEP_PHYSICAL);
}
/*
* User space page table clearing functions.
*/
struct clear_range_args {
int free_physical;
struct memobj *memobj;
@ -2344,6 +2638,14 @@ static int clear_range(struct page_table *pt, struct process_vm *vm,
if (memobj && ((memobj->flags & MF_PREMAP))) {
args.free_physical = 0;
}
if (vm->proc->straight_va &&
(void *)start == vm->proc->straight_va &&
(void *)end == (vm->proc->straight_va +
vm->proc->straight_len)) {
args.free_physical = 0;
}
args.memobj = memobj;
args.vm = vm;

View File

@ -218,3 +218,41 @@ ENTRY(__inline_memset)
ret
ENDPIPROC(__inline_memset)
ENDPROC(____inline_memset)
/*
* Non-temporal vector memory clear
*
* Parameters:
* x0 - buf (assumed to be aligned to page size)
* x1 - n (assumed to be at least page size and a multiple of 512 bytes)
* x2 - tmp (32-byte scratch area used to save and restore q0/q1)
*/
ENTRY(__memclear)
stp q0, q1, [x2] /* Preserve two 128 bit vector regs */
eor v0.16B, v0.16B, v0.16B
eor v1.16B, v1.16B, v1.16B
1:
stnp q0, q1, [x0, #32 * 0]
stnp q0, q1, [x0, #32 * 1]
stnp q0, q1, [x0, #32 * 2]
stnp q0, q1, [x0, #32 * 3]
stnp q0, q1, [x0, #32 * 4]
stnp q0, q1, [x0, #32 * 5]
stnp q0, q1, [x0, #32 * 6]
stnp q0, q1, [x0, #32 * 7]
stnp q0, q1, [x0, #32 * 8]
stnp q0, q1, [x0, #32 * 9]
stnp q0, q1, [x0, #32 * 10]
stnp q0, q1, [x0, #32 * 11]
stnp q0, q1, [x0, #32 * 12]
stnp q0, q1, [x0, #32 * 13]
stnp q0, q1, [x0, #32 * 14]
stnp q0, q1, [x0, #32 * 15]
add x0, x0, #512
subs x1, x1, #512
cmp x1, #0
b.ne 1b
ldp q0, q1, [x2] /* Restore vector regs */
ret
ENDPROC(__memclear)
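The control flow of `__memclear` can be modeled in plain C. This is a rough sketch under stated assumptions: it reproduces only the loop structure (16 stores of 32 bytes per iteration, counter tested after the stores), not the non-temporal hint or the save and restore of `q0`/`q1`. The names `model_memclear` and `MODEL_CLEAR_STRIDE` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_CLEAR_STRIDE 512 /* bytes zeroed per loop iteration (16 x 32) */

/* Zero len bytes at addr in 512-byte strides, like the assembly loop. */
static void model_memclear(void *addr, size_t len)
{
    unsigned char *p = addr;

    /* The assembly stores before it tests the counter and only exits when
     * the counter reaches exactly zero, so len must be a nonzero multiple
     * of the stride. */
    assert(len != 0 && len % MODEL_CLEAR_STRIDE == 0);

    do {
        memset(p, 0, MODEL_CLEAR_STRIDE);
        p += MODEL_CLEAR_STRIDE;
        len -= MODEL_CLEAR_STRIDE;
    } while (len != 0);
}
```

The assertion makes explicit a precondition that the assembly leaves implicit: a length that is not a multiple of 512 would never hit zero and the loop would run past the buffer.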

View File

@ -16,6 +16,7 @@
#include <uio.h>
#include <syscall.h>
#include <rusage_private.h>
#include <memory.h>
#include <ihk/debug.h>
void terminate_mcexec(int, int);
@ -1071,6 +1072,9 @@ static int setup_rt_frame(int usig, unsigned long rc, int to_restart,
if (k->sa.sa_flags & SA_RESTORER){
regs->regs[30] = (unsigned long)k->sa.sa_restorer;
#ifdef ENABLE_FUGAKU_HACKS
kprintf("%s: SA_RESTORER: 0x%lx\n", __func__, regs->regs[30]);
#endif
} else {
regs->regs[30] = (unsigned long)VDSO_SYMBOL(thread->vm->vdso_addr, sigtramp);
}
@ -1723,9 +1727,18 @@ SYSCALL_DECLARE(mmap)
/* check arguments */
pgsize = PAGE_SIZE;
#ifndef ENABLE_FUGAKU_HACKS
if (flags & MAP_HUGETLB) {
int hugeshift = flags & (0x3F << MAP_HUGE_SHIFT);
/* OpenMPI expects -EINVAL when trying to map
* /dev/shm/ file with MAP_SHARED | MAP_HUGETLB
*/
if (!(flags & MAP_ANONYMOUS)) {
error = -EINVAL;
goto out;
}
if (hugeshift == 0) {
/* default hugepage size */
flags |= ihk_mc_get_linux_default_huge_page_shift() <<
@ -1755,6 +1768,11 @@ SYSCALL_DECLARE(mmap)
goto out;
}
}
#else
if (flags & MAP_HUGETLB) {
flags &= ~(MAP_HUGETLB);
}
#endif
#define VALID_DUMMY_ADDR ((region->user_start + PTL3_SIZE - 1) & ~(PTL3_SIZE - 1))
addr = (flags & MAP_FIXED)? addr0: VALID_DUMMY_ADDR;
@ -2233,8 +2251,10 @@ int move_pages_smp_handler(int cpu_index, int nr_cpus, void *arg)
case 0:
memcpy(mpsr->virt_addr, mpsr->user_virt_addr,
sizeof(void *) * count);
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
if (mpsr->user_nodes) {
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
}
memset(mpsr->ptep, 0, sizeof(pte_t) * count);
memset(mpsr->status, 0, sizeof(int) * count);
memset(mpsr->nr_pages, 0, sizeof(int) * count);
@ -2252,8 +2272,10 @@ int move_pages_smp_handler(int cpu_index, int nr_cpus, void *arg)
case 0:
memcpy(mpsr->virt_addr, mpsr->user_virt_addr,
sizeof(void *) * count);
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
if (mpsr->user_nodes) {
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
}
mpsr->nodes_ready = 1;
break;
case 1:
@ -2275,8 +2297,10 @@ int move_pages_smp_handler(int cpu_index, int nr_cpus, void *arg)
sizeof(void *) * count);
break;
case 1:
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
if (mpsr->user_nodes) {
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
}
mpsr->nodes_ready = 1;
break;
case 2:
@ -2305,8 +2329,10 @@ int move_pages_smp_handler(int cpu_index, int nr_cpus, void *arg)
sizeof(void *) * (count / 2));
break;
case 2:
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
if (mpsr->user_nodes) {
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
}
mpsr->nodes_ready = 1;
break;
case 3:
@ -2332,13 +2358,15 @@ int move_pages_smp_handler(int cpu_index, int nr_cpus, void *arg)
}
/* NUMA verification in parallel */
for (i = i_s; i < i_e; i++) {
if (mpsr->nodes[i] < 0 ||
mpsr->nodes[i] >= ihk_mc_get_nr_numa_nodes() ||
!test_bit(mpsr->nodes[i],
mpsr->proc->vm->numa_mask)) {
mpsr->phase_ret = -EINVAL;
break;
if (mpsr->user_nodes) {
for (i = i_s; i < i_e; i++) {
if (mpsr->nodes[i] < 0 ||
mpsr->nodes[i] >= ihk_mc_get_nr_numa_nodes() ||
!test_bit(mpsr->nodes[i],
mpsr->proc->vm->numa_mask)) {
mpsr->phase_ret = -EINVAL;
break;
}
}
}
@ -2370,7 +2398,7 @@ int move_pages_smp_handler(int cpu_index, int nr_cpus, void *arg)
/* PTE valid? */
if (!mpsr->ptep[i] || !pte_is_present(mpsr->ptep[i])) {
mpsr->status[i] = -ENOENT;
mpsr->status[i] = -EFAULT;
mpsr->ptep[i] = NULL;
continue;
}
@ -2434,6 +2462,26 @@ pte_out:
dkprintf("%s: phase %d done\n", __FUNCTION__, phase);
++phase;
/*
* When the nodes array is NULL, move_pages doesn't move any pages;
* instead it reports, in the status array, the node on which each
* page currently resides.
*/
if (!mpsr->user_nodes) {
/* get nid in parallel */
for (i = i_s; i < i_e; i++) {
if (mpsr->status[i] < 0) {
continue;
}
mpsr->status[i] = phys_to_nid(
pte_get_phys(mpsr->ptep[i]));
}
mpsr->phase_ret = 0;
goto out; // return node information
}
/* Processing of move pages */
if (cpu_index == 0) {
/* Allocate new pages on target NUMA nodes */
for (i = 0; i < count; i++) {
@ -2446,8 +2494,11 @@ pte_out:
/* TODO: store pgalign info in an array as well? */
if (mpsr->nr_pages[i] > 1) {
if (mpsr->nr_pages[i] * PAGE_SIZE == PTL2_SIZE)
pgalign = PTL2_SHIFT - PTL1_SHIFT;
int nr_pages;
for (pgalign = 0, nr_pages = mpsr->nr_pages[i];
nr_pages != 1; pgalign++, nr_pages >>= 1) {
}
}
dst = ihk_mc_alloc_aligned_pages_node(mpsr->nr_pages[i],
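The last hunk above replaces a single special case (`PTL2_SIZE`) with a loop that derives the alignment order from the page count. As a minimal sketch of that loop in isolation (the function name `model_page_order` is hypothetical, and the result is only meaningful as an alignment when the count is a power of two):

```c
#include <assert.h>

/* Count how many times nr_pages can be halved before it reaches 1.
 * For a power-of-two count this is log2(nr_pages); for other counts it
 * is the floor of log2(nr_pages). */
static int model_page_order(int nr_pages)
{
    int pgalign;

    /* A count of zero would never reach 1 and the loop would not end. */
    assert(nr_pages >= 1);

    for (pgalign = 0; nr_pages != 1; pgalign++, nr_pages >>= 1) {
    }
    return pgalign;
}
```

For example, a 512-page mapping gives order 9 and a 32-page mapping gives order 5, so contiguous mappings of any power-of-two size get a matching alignment rather than only the level-2 block size.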

View File

@ -174,9 +174,14 @@ void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)
arch_show_interrupt_context(regs);
#ifdef ENABLE_TOFU
info.si_signo = SIGSTOP;
info.si_errno = 0;
#else
info.si_signo = SIGILL;
info.si_errno = 0;
info.si_code = ILL_ILLOPC;
#endif
info._sifields._sigfault.si_addr = (void*)regs->pc;
arm64_notify_die("Oops - bad mode", regs, &info, 0);

View File

@ -80,7 +80,11 @@ static void (*lapic_icr_write)(unsigned int h, unsigned int l);
static void (*lapic_wait_icr_idle)(void);
void (*x86_issue_ipi)(unsigned int apicid, unsigned int low);
int running_on_kvm(void);
static void smp_func_call_handler(void);
void smp_func_call_handler(void);
int ihk_mc_get_smp_handler_irq(void)
{
return LOCAL_SMP_FUNC_CALL_VECTOR;
}
void init_processors_local(int max_id);
void assign_processor_id(void);
@ -868,6 +872,49 @@ void show_context_stack(uintptr_t *rbp) {
return;
}
#ifdef ENABLE_FUGAKU_HACKS
void __show_context_stack(struct thread *thread,
unsigned long pc, uintptr_t sp, int kprintf_locked)
{
uintptr_t stack_top;
unsigned long irqflags = 0;
stack_top = ALIGN_UP(sp, (uintptr_t)KERNEL_STACK_SIZE);
if (!kprintf_locked)
irqflags = kprintf_lock();
__kprintf("TID: %d, call stack (most recent first):\n",
thread->tid);
__kprintf("PC: %016lx, SP: %016lx\n", pc, sp);
for (;;) {
extern char _head[], _end[];
uintptr_t *fp, *lr;
fp = (uintptr_t *)sp;
lr = (uintptr_t *)(sp + 8);
if ((*fp <= sp)) {
break;
}
if ((*fp > stack_top)) {
break;
}
if ((*lr < (unsigned long)_head) ||
(*lr > (unsigned long)_end)) {
break;
}
__kprintf("PC: %016lx, SP: %016lx, FP: %016lx\n", *lr - 4, sp, *fp);
sp = *fp;
}
if (!kprintf_locked)
kprintf_unlock(irqflags);
}
#endif
void interrupt_exit(struct x86_user_context *regs)
{
if (interrupt_from_user(regs)) {
@ -876,20 +923,18 @@ void interrupt_exit(struct x86_user_context *regs)
check_need_resched();
check_signal(0, regs, -1);
}
else {
check_sig_pending();
}
}
void handle_interrupt(int vector, struct x86_user_context *regs)
{
struct ihk_mc_interrupt_handler *h;
struct cpu_local_var *v = get_this_cpu_local_var();
int from_user = interrupt_from_user(regs);
lapic_ack();
++v->in_interrupt;
set_cputime(interrupt_from_user(regs) ?
set_cputime(from_user ?
CPUTIME_MODE_U2K : CPUTIME_MODE_K2K_IN);
dkprintf("CPU[%d] got interrupt, vector: %d, RIP: 0x%lX\n",
@ -1007,15 +1052,18 @@ void handle_interrupt(int vector, struct x86_user_context *regs)
}
interrupt_exit(regs);
set_cputime(interrupt_from_user(regs) ?
set_cputime(from_user ?
CPUTIME_MODE_K2U : CPUTIME_MODE_K2K_OUT);
--v->in_interrupt;
/* for migration by IPI */
if (v->flags & CPU_FLAG_NEED_MIGRATE) {
schedule();
check_signal(0, regs, 0);
// Don't migrate on K2K schedule
if (from_user) {
schedule();
check_signal(0, regs, 0);
}
}
}
@ -1137,6 +1185,17 @@ void cpu_halt(void)
asm volatile("hlt");
}
#ifdef ENABLE_FUGAKU_HACKS
/*@
@ assigns \nothing;
@ ensures \interrupt_disabled == 0;
@*/
void cpu_halt_panic(void)
{
cpu_halt();
}
#endif
/*@
@ assigns \nothing;
@ ensures \interrupt_disabled == 0;
@ -1521,6 +1580,16 @@ void arch_print_stack(void)
__print_stack(rbp, 0);
}
#ifdef ENABLE_FUGAKU_HACKS
unsigned long arch_get_instruction_address(const void *reg)
{
const struct x86_user_context *uctx = reg;
const struct x86_basic_regs *regs = &uctx->gpr;
return regs->rip;
}
#endif
/*@
@ requires \valid(reg);
@ assigns \nothing;
@ -1609,6 +1678,11 @@ int ihk_mc_arch_get_special_register(enum ihk_asr_type type,
}
}
int ihk_mc_get_interrupt_id(int cpu)
{
return get_x86_cpu_local_variable(cpu)->apic_id;
}
/*@
@ requires \valid_cpuid(cpu); // valid CPU logical ID
@ ensures \result == 0
@ -2106,144 +2180,6 @@ int arch_cpu_read_write_register(
return 0;
}
/*
* Generic remote CPU function invocation facility.
*/
static void smp_func_call_handler(void)
{
int irq_flags;
struct smp_func_call_request *req;
int reqs_left;
reiterate:
req = NULL;
reqs_left = 0;
irq_flags = ihk_mc_spinlock_lock(
&cpu_local_var(smp_func_req_lock));
/* Take requests one-by-one */
if (!list_empty(&cpu_local_var(smp_func_req_list))) {
req = list_first_entry(&cpu_local_var(smp_func_req_list),
struct smp_func_call_request, list);
list_del(&req->list);
reqs_left = !list_empty(&cpu_local_var(smp_func_req_list));
}
ihk_mc_spinlock_unlock(&cpu_local_var(smp_func_req_lock),
irq_flags);
if (req) {
req->ret = req->sfcd->func(req->cpu_index,
req->sfcd->nr_cpus, req->sfcd->arg);
ihk_atomic_dec(&req->sfcd->cpus_left);
}
if (reqs_left)
goto reiterate;
}
int smp_call_func(cpu_set_t *__cpu_set, smp_func_t __func, void *__arg)
{
int cpu, nr_cpus = 0;
int cpu_index = 0;
int this_cpu_index = 0;
struct smp_func_call_data sfcd;
struct smp_func_call_request *reqs;
int ret = 0;
int call_on_this_cpu = 0;
cpu_set_t cpu_set;
/* Sanity checks */
if (!__cpu_set || !__func) {
return -EINVAL;
}
/* Make sure it won't change in between */
cpu_set = *__cpu_set;
for_each_set_bit(cpu, (unsigned long *)&cpu_set,
sizeof(cpu_set) * BITS_PER_BYTE) {
if (cpu == ihk_mc_get_processor_id()) {
call_on_this_cpu = 1;
}
++nr_cpus;
}
if (!nr_cpus) {
return -EINVAL;
}
reqs = kmalloc(sizeof(*reqs) * nr_cpus, IHK_MC_AP_NOWAIT);
if (!reqs) {
ret = -ENOMEM;
goto free_out;
}
sfcd.nr_cpus = nr_cpus;
sfcd.func = __func;
sfcd.arg = __arg;
ihk_atomic_set(&sfcd.cpus_left,
call_on_this_cpu ? nr_cpus - 1 : nr_cpus);
/* Add requests and send IPIs */
cpu_index = 0;
for_each_set_bit(cpu, (unsigned long *)&cpu_set,
sizeof(cpu_set) * BITS_PER_BYTE) {
unsigned long irq_flags;
reqs[cpu_index].cpu_index = cpu_index;
reqs[cpu_index].ret = 0;
if (cpu == ihk_mc_get_processor_id()) {
this_cpu_index = cpu_index;
++cpu_index;
continue;
}
reqs[cpu_index].sfcd = &sfcd;
irq_flags =
ihk_mc_spinlock_lock(&get_cpu_local_var(cpu)->smp_func_req_lock);
list_add_tail(&reqs[cpu_index].list,
&get_cpu_local_var(cpu)->smp_func_req_list);
ihk_mc_spinlock_unlock(&get_cpu_local_var(cpu)->smp_func_req_lock,
irq_flags);
ihk_mc_interrupt_cpu(cpu, LOCAL_SMP_FUNC_CALL_VECTOR);
++cpu_index;
}
/* Is this CPU involved? */
if (call_on_this_cpu) {
reqs[this_cpu_index].ret =
__func(this_cpu_index, nr_cpus, __arg);
}
/* Wait for the rest of the CPUs */
while (ihk_atomic_read(&sfcd.cpus_left) > 0) {
cpu_pause();
}
/* Check return values, if error, report the first non-zero */
for (cpu_index = 0; cpu_index < nr_cpus; ++cpu_index) {
if (reqs[cpu_index].ret != 0) {
ret = reqs[cpu_index].ret;
goto free_out;
}
}
ret = 0;
free_out:
kfree(reqs);
return ret;
}
extern int nmi_mode;
extern long freeze_thaw(void *nmi_ctx);

View File

@ -129,12 +129,4 @@ static inline int futex_atomic_op_inuser(int encoded_op,
return ret;
}
static inline int get_futex_value_locked(uint32_t *dest, uint32_t *from)
{
*dest = *(volatile uint32_t *)from;
return 0;
}
#endif

View File

@ -451,4 +451,12 @@ extern unsigned long ap_trampoline;
/* Local is cachable */
#define IHK_IKC_QUEUE_PT_ATTR (PTATTR_NO_EXECUTE | PTATTR_WRITABLE)
#ifdef ENABLE_FUGAKU_HACKS
#ifndef __ASSEMBLY__
# define ALIGN_UP(x, align) ALIGN_DOWN((x) + (align) - 1, align)
# define ALIGN_DOWN(x, align) ((x) & ~((align) - 1))
#endif /* !__ASSEMBLY__ */
#endif
#endif
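The alignment macros added here are used by `__show_context_stack()` to find the top of the current kernel stack from a stack pointer. A small worked sketch, assuming a power-of-two stack size (the `MODEL_` macros mirror the definitions above; `model_stack_top` is a hypothetical helper):

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as ALIGN_DOWN/ALIGN_UP above; align must be a power of two. */
#define MODEL_ALIGN_DOWN(x, align) ((x) & ~((align) - 1))
#define MODEL_ALIGN_UP(x, align)   MODEL_ALIGN_DOWN((x) + (align) - 1, align)

/* Round a stack pointer up to the next stack_size boundary. */
static inline uintptr_t model_stack_top(uintptr_t sp, uintptr_t stack_size)
{
    assert(stack_size != 0 && (stack_size & (stack_size - 1)) == 0);
    return MODEL_ALIGN_UP(sp, stack_size);
}
```

Note that an already aligned value is returned unchanged, so a stack pointer sitting exactly on a boundary yields itself as the computed top.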

View File

@ -53,5 +53,9 @@ struct x86_cpu_local_variables *get_x86_this_cpu_local(void);
void *get_x86_cpu_local_kstack(int id);
void *get_x86_this_cpu_kstack(void);
#ifdef ENABLE_FUGAKU_HACKS
#define LOCALS_SPAN (4 * PAGE_SIZE)
#define KERNEL_STACK_SIZE LOCALS_SPAN
#endif
#endif

View File

@ -168,7 +168,7 @@ SYSCALL_HANDLED(311, process_vm_writev)
SYSCALL_HANDLED(322, execveat)
SYSCALL_HANDLED(700, get_cpu_id)
#ifdef PROFILE_ENABLE
SYSCALL_HANDLED(__NR_profile, profile)
SYSCALL_HANDLED(PROFILE_EVENT_MAX, profile)
#endif // PROFILE_ENABLE
SYSCALL_HANDLED(730, util_migrate_inter_kernel)
SYSCALL_HANDLED(731, util_indicate_clone)

View File

@ -21,7 +21,9 @@
#include <registers.h>
#include <string.h>
#ifndef ENABLE_FUGAKU_HACKS
#define LOCALS_SPAN (4 * PAGE_SIZE)
#endif
struct x86_cpu_local_variables *locals;
size_t x86_cpu_local_variables_span = LOCALS_SPAN; /* for debugger */

View File

@ -1651,6 +1651,14 @@ static int clear_range(struct page_table *pt, struct process_vm *vm,
if (memobj && ((memobj->flags & MF_PREMAP))) {
args.free_physical = 0;
}
if (vm->proc->straight_va &&
(void *)start == vm->proc->straight_va &&
(void *)end == (vm->proc->straight_va +
vm->proc->straight_len)) {
args.free_physical = 0;
}
args.memobj = memobj;
args.vm = vm;

View File

@ -32,6 +32,7 @@
#include <limits.h>
#include <syscall.h>
#include <rusage_private.h>
#include <memory.h>
#include <ihk/debug.h>
void terminate_mcexec(int, int);
@ -1430,6 +1431,14 @@ SYSCALL_DECLARE(mmap)
/* check arguments */
pgsize = PAGE_SIZE;
if (flags & MAP_HUGETLB) {
/* OpenMPI expects -EINVAL when trying to map
* /dev/shm/ file with MAP_SHARED | MAP_HUGETLB
*/
if (!(flags & MAP_ANONYMOUS)) {
error = -EINVAL;
goto out;
}
switch (flags & (0x3F << MAP_HUGE_SHIFT)) {
case 0:
/* default hugepage size */
@ -2294,8 +2303,10 @@ int move_pages_smp_handler(int cpu_index, int nr_cpus, void *arg)
case 0:
memcpy(mpsr->virt_addr, mpsr->user_virt_addr,
sizeof(void *) * count);
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
if (mpsr->user_nodes) {
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
}
memset(mpsr->ptep, 0, sizeof(pte_t) * count);
memset(mpsr->status, 0, sizeof(int) * count);
memset(mpsr->nr_pages, 0, sizeof(int) * count);
@ -2313,8 +2324,10 @@ int move_pages_smp_handler(int cpu_index, int nr_cpus, void *arg)
case 0:
memcpy(mpsr->virt_addr, mpsr->user_virt_addr,
sizeof(void *) * count);
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
if (mpsr->user_nodes) {
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
}
mpsr->nodes_ready = 1;
break;
case 1:
@ -2336,8 +2349,10 @@ int move_pages_smp_handler(int cpu_index, int nr_cpus, void *arg)
sizeof(void *) * count);
break;
case 1:
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
if (mpsr->user_nodes) {
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
}
mpsr->nodes_ready = 1;
break;
case 2:
@ -2366,8 +2381,10 @@ int move_pages_smp_handler(int cpu_index, int nr_cpus, void *arg)
sizeof(void *) * (count / 2));
break;
case 2:
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
if (mpsr->user_nodes) {
memcpy(mpsr->nodes, mpsr->user_nodes,
sizeof(int) * count);
}
mpsr->nodes_ready = 1;
break;
case 3:
@ -2393,13 +2410,15 @@ int move_pages_smp_handler(int cpu_index, int nr_cpus, void *arg)
}
/* NUMA verification in parallel */
for (i = i_s; i < i_e; i++) {
if (mpsr->nodes[i] < 0 ||
mpsr->nodes[i] >= ihk_mc_get_nr_numa_nodes() ||
!test_bit(mpsr->nodes[i],
mpsr->proc->vm->numa_mask)) {
mpsr->phase_ret = -EINVAL;
break;
if (mpsr->user_nodes) {
for (i = i_s; i < i_e; i++) {
if (mpsr->nodes[i] < 0 ||
mpsr->nodes[i] >= ihk_mc_get_nr_numa_nodes() ||
!test_bit(mpsr->nodes[i],
mpsr->proc->vm->numa_mask)) {
mpsr->phase_ret = -EINVAL;
break;
}
}
}
@ -2495,6 +2514,26 @@ pte_out:
dkprintf("%s: phase %d done\n", __FUNCTION__, phase);
++phase;
/*
* When the nodes array is NULL, move_pages doesn't move any pages;
* instead it reports, in the status array, the node on which each
* page currently resides.
*/
if (!mpsr->user_nodes) {
/* get nid in parallel */
for (i = i_s; i < i_e; i++) {
if (mpsr->status[i] < 0) {
continue;
}
mpsr->status[i] = phys_to_nid(
pte_get_phys(mpsr->ptep[i]));
}
mpsr->phase_ret = 0;
goto out; // return node information
}
/* Processing of move pages */
if (cpu_index == 0) {
/* Allocate new pages on target NUMA nodes */
for (i = 0; i < count; i++) {
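The query-only branch added to `move_pages_smp_handler()` can be sketched outside the kernel. This is an illustrative model, not the McKernel code: `model_phys_to_nid` is a hypothetical mapping (1 GiB of physical address space per node) standing in for `phys_to_nid()`, and physical addresses are passed in directly rather than read from PTEs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NODE_SPAN (UINT64_C(1) << 30) /* hypothetical: 1 GiB per node */

/* Hypothetical stand-in for phys_to_nid(): node = physical address / span. */
static int model_phys_to_nid(uint64_t phys)
{
    return (int)(phys / MODEL_NODE_SPAN);
}

/* Fill status[i] with the node of each page in [i_s, i_e).
 * Entries that already hold a negative error code are left untouched,
 * matching the "status[i] < 0 -> continue" check in the handler. */
static void model_query_nodes(const uint64_t *phys, int *status,
                              size_t i_s, size_t i_e)
{
    assert(phys != NULL && status != NULL && i_s <= i_e);

    for (size_t i = i_s; i < i_e; i++) {
        if (status[i] < 0) {
            continue;
        }
        status[i] = model_phys_to_nid(phys[i]);
    }
}
```

Each CPU in the real handler processes its own `[i_s, i_e)` slice, which is why the model takes a range rather than a count; pages whose lookup already failed keep their error code in the status array.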

View File

@ -30,6 +30,9 @@ endif ()
if (NOT "${LINUX_ARCH}" STREQUAL "${CMAKE_HOST_SYSTEM_PROCESSOR}")
string(REGEX REPLACE "ld$" "" CROSS_COMPILE "${CMAKE_LINKER}")
if (CMAKE_CROSSCOMPILING)
list(APPEND KBUILD_MAKE_FLAGS "QEMU_LD_PREFIX=${CMAKE_FIND_ROOT_PATH}")
endif()
list(APPEND KBUILD_MAKE_FLAGS "ARCH=${ARCH}")
list(APPEND KBUILD_MAKE_FLAGS "CROSS_COMPILE=${CROSS_COMPILE}")
endif()

View File

@ -1,11 +1,12 @@
=============================================
Version 1.7.0-0.93 (Aug 1, 2020)
Version 1.7.0 (Nov 25, 2020)
=============================================
----------------------
IHK major updates
----------------------
#. ihklib: add ihk_create_os_str
#. ihklib: ihk_reserve_mem: add capped best effort to balanced
------------------------
IHK major bug fixes

Binary file not shown.

View File

@ -129,11 +129,29 @@ Create the tarball and the spec file:
make dist
cp mckernel-<version>.tar.gz <rpmbuild>/SOURCES
(Optional) Edit the following line in ``scripts/mckernel.spec`` to change
the cmake options. For example:
::
%cmake -DCMAKE_BUILD_TYPE=Release \
-DUNAME_R=%{kernel_version} \
-DKERNEL_DIR=%{kernel_dir} \
%{?cmake_libdir:-DCMAKE_INSTALL_LIBDIR=%{cmake_libdir}} \
%{?build_target:-DBUILD_TARGET=%{build_target}} \
%{?toolchain_file:-DCMAKE_TOOLCHAIN_FILE=%{toolchain_file}} \
-DENABLE_TOFU=ON -DENABLE_FUGAKU_HACKS=ON \
-DENABLE_KRM_WORKAROUND=OFF -DWITH_KRM=ON \
-DENABLE_FUGAKU_DEBUG=OFF \
.
Create the rpm package:
When not cross-compiling:
"""""""""""""""""""""""""
Then build the rpm:
::
rpmbuild -ba scripts/mckernel.spec

BIN
docs/spec/ihk.pdf Normal file

Binary file not shown.

View File

@ -2,17 +2,6 @@
:suffix: .
:depth: 3
External Specs
Specifications
==============
Overview
--------
Function Specs
--------------
Command / Daemon Specs
----------------------
Booting LWK
===========
The specifications pdf is :download:`here <ihk.pdf>`

View File

@ -2,8 +2,6 @@
:suffix: .
:depth: 3
Interfaces
==========
Interface details
=================
Specifications
==============
The specifications pdf is :download:`here <mckernel.pdf>`

View File

@ -34,6 +34,19 @@ For example, with Fujitsu Technical Computing Suite (TCS), you need to specify `
#PJM -L jobenv=mck1
(Optional, Fujitsu TCS only) Specify boot parameters
----------------------------------------------------
You can specify the boot parameters by defining environmental variables and pass them to Fujitsu TCS.
The parameters include the resource reservation settings, resource reservation amount, kernel arguments and routing of message channels between McKernel CPUs and Linux CPUs.
See `IHK Specifications - ihk_create_os_str() <spec/ihk.html>`__ for the parameter names and allowed values.
An example of setting the memory amount is shown below.
.. code-block:: none
export IHK_MEM="7G@4,7G@5,7G@6,7G@7"
pjsub -X run.sh
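As a rough illustration of how the ``IHK_MEM`` value above is structured, the following C sketch splits such a string into chunks. It assumes that each comma-separated entry has the form ``<size>[K|M|G]@<NUMA node>``; the authoritative syntax is the one given in the IHK specifications. ``struct mem_chunk`` and ``parse_ihk_mem()`` are hypothetical names used only for this sketch and are not part of IHK.

```c
#include <stdlib.h>

/* Hypothetical type for this sketch: one "<size>@<node>" entry. */
struct mem_chunk {
	unsigned long long bytes;
	int numa_node;
};

/*
 * Parse a string such as "7G@4,7G@5" into at most max chunks.
 * Assumption: each entry is "<size>[K|M|G]@<NUMA node>".
 * Returns the number of chunks parsed, or -1 on a malformed entry.
 */
static int parse_ihk_mem(const char *spec, struct mem_chunk *out, int max)
{
	const char *p = spec;
	int n = 0;

	while (*p && n < max) {
		char *end;
		unsigned long long size = strtoull(p, &end, 10);
		long node;

		if (end == p)
			return -1;	/* no digits where a size was expected */

		/* Optional binary unit suffix */
		switch (*end) {
		case 'G': size <<= 30; end++; break;
		case 'M': size <<= 20; end++; break;
		case 'K': size <<= 10; end++; break;
		default: break;
		}

		if (*end != '@')
			return -1;
		p = end + 1;

		node = strtol(p, &end, 10);
		if (end == p || node < 0)
			return -1;	/* missing or negative node number */

		out[n].bytes = size;
		out[n].numa_node = (int)node;
		n++;

		if (*end == ',')
			end++;		/* another entry follows */
		else if (*end != '\0')
			return -1;
		p = end;
	}
	return n;
}
```

With the value from the example, this would yield four chunks of 7 GiB each, one for each of NUMA nodes 4 to 7.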
Insert ``mcexec`` into the command line
---------------------------------------
@ -183,3 +196,27 @@ Limitations
26. mmap() allows unlimited overcommit. Note that it corresponds to
setting sysctl ``vm.overcommit_memory`` to 1.
27. mlockall() is not supported and returns -EPERM.
28. munlockall() is not supported and returns zero.
29. Scheduling behavior is not Linux-compatible. For example, sometimes one of the two processes on the same CPU continues to run after yielding.
30. (Fujitsu TCS-only) A job following the one in which __mcctrl_os_read_write_cpu_register() returns ``-ETIME`` fails because xos_hwb related CPU state isn't finalized. You can tell whether the function returned ``-ETIME`` by checking whether the following line appears in the Linux kernel log:
::
__mcctrl_os_read_write_cpu_register: ERROR sending IKC msg: -62
You can re-initialize the xos_hwb related CPU state with the following command:
::
sudo systemctl restart xos_hwb
31. System calls can write to mcexec VMAs that do not have the
PROT_WRITE flag set. This is because PROT_WRITE is never turned
off for the mcexec VMAs, to circumvent the issue "set_host_vma():
do NOT read protect Linux VMA".

View File

@ -4,69 +4,66 @@ Advanced: Enable Utility Thread offloading Interface (UTI)
UTI enables a runtime such as MPI runtime to spawn utility threads such
as MPI asynchronous progress threads to Linux cores.
Install capstone
~~~~~~~~~~~~~~~~~~~~
Install ``capstone`` and ``capstone-devel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When compute nodes don't have access to repositories
""""""""""""""""""""""""""""""""""""""""""""""""""""
When compute nodes don't have access to EPEL repository
"""""""""""""""""""""""""""""""""""""""""""""""""""""""
Install EPEL capstone-devel:
Install EPEL ``capstone`` and ``capstone-devel``:
::
sudo yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
sudo yum install capstone-devel
sudo yum install capstone capstone-devel
When compute nodes don't have access to repositories
""""""""""""""""""""""""""""""""""""""""""""""""""""
When compute nodes don't have access to EPEL repository
"""""""""""""""""""""""""""""""""""""""""""""""""""""""
Ask the system administrator to install ``capstone-devel``. Note that it is in the EPEL repository.
A. Ask the system administrator to install ``capstone`` and ``capstone-devel``. Note that they are in the EPEL repository.
Install syscall_intercept
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
B. Download the rpms on a machine where you have administrator privileges:
::
git clone https://github.com/RIKEN-SysSoft/syscall_intercept.git
mkdir build && cd build
cmake <syscall_intercept>/arch/aarch64 -DCMAKE_INSTALL_PREFIX=<syscall-intercept-install> -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc -DTREAT_WARNINGS_AS_ERRORS=OFF
sudo yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
sudo yum install yum-utils
yumdownloader capstone capstone-devel
Install UTI for McKernel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
and then install them into your home directory on the login node:
Install:
::
.. code-block:: none
cd $HOME/$(uname -p)
rpm2cpio capstone-4.0.1-9.el8.aarch64.rpm | cpio -idv
rpm2cpio capstone-devel-4.0.1-9.el8.aarch64.rpm | cpio -idv
sed -i 's#/usr/#'"$HOME"'/'"$(uname -p)"'/usr/#' $HOME/$(uname -p)/usr/lib64/pkgconfig/capstone.pc
git clone https://github.com/RIKEN-SysSoft/uti.git
mkdir build && cd build
../uti/configure --prefix=<mckernel-install> --with-rm=mckernel
make && make install
Install McKernel
~~~~~~~~~~~~~~~~~~~~
Add ``-DENABLE_UTI=ON`` option to ``cmake``:
``cmake`` with the additional options:
::
CMAKE_PREFIX_PATH=<syscall-intercept-install> cmake -DCMAKE_INSTALL_PREFIX=${HOME}/ihk+mckernel -DENABLE_UTI=ON $HOME/src/ihk+mckernel/mckernel
cmake -DCMAKE_INSTALL_PREFIX=${HOME}/ihk+mckernel -DENABLE_UTI=ON $HOME/src/ihk+mckernel/mckernel
make -j install
Run programs
~~~~~~~~~~~~~~~~
~~~~~~~~~~~~
Add ``--enable-uti`` option to ``mcexec``:
``mcexec`` with ``--enable-uti`` option:
::
mcexec --enable-uti <command>
Install UTI for Linux
~~~~~~~~~~~~~~~~~~~~~~~~~
(Optional) Install UTI for Linux
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You should skip this step if it's already installed as with, for example, Fujitsu Technical Computing Suite.
You can skip this step if you don't want to develop a run-time using UTI, or if it's already installed with, for example, Fujitsu Technical Computing Suite.
Install by make
"""""""""""""""
@ -89,3 +86,9 @@ Install by rpm
rm -f ~/rpmbuild/SOURCES/<version>.tar.gz
rpmbuild -ba ./scripts/uti.spec
rpm -Uvh uti-<version>-<release>-<arch>.rpm
(Optional) Install UTI for McKernel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can skip this step if you don't want to develop a run-time using UTI.
Execute the commands above for installing UTI for Linux, with ``--with-rm=linux`` replaced with ``--with-rm=mckernel``.

View File

@ -4671,7 +4671,7 @@ void cmd_ipcs(void); /* ipcs.c */
/*
* main.c
*/
void main_loop(void);
//void main_loop(void);
void exec_command(void);
struct command_table_entry *get_command_table_entry(char *);
void program_usage(int);

View File

@ -94,6 +94,7 @@ struct get_cpu_set_arg {
char *req_cpu_list; // Requested by user-space
int req_cpu_list_len; // Length of request string
int *process_rank;
pid_t ppid;
void *cpu_set;
size_t cpu_set_size; // Size in bytes
int *target_core;
@ -112,6 +113,18 @@ typedef unsigned long __cpu_set_unit;
#define MPOL_NO_BSS 0x04
#define MPOL_SHM_PREMAP 0x08
/* should be the same as process.h */
#define PLD_PROCESS_NUMA_MASK_BITS 256
enum {
PLD_MPOL_DEFAULT,
PLD_MPOL_PREFERRED,
PLD_MPOL_BIND,
PLD_MPOL_INTERLEAVE,
PLD_MPOL_LOCAL,
PLD_MPOL_MAX, /* always last member of enum */
};
#define PLD_MAGIC 0xcafecafe44332211UL
struct program_load_desc {
@ -146,9 +159,19 @@ struct program_load_desc {
unsigned long heap_extension;
long stack_premap;
unsigned long mpol_bind_mask;
int mpol_mode;
unsigned long mpol_nodemask[PLD_PROCESS_NUMA_MASK_BITS /
(sizeof(unsigned long) * 8)];
int thp_disable;
int enable_uti;
int uti_thread_rank; /* N-th clone() spawns a thread on Linux CPU */
int uti_use_last_cpu; /* Work-around not to share CPU with OpenMP thread */
int straight_map;
size_t straight_map_threshold;
#ifdef ENABLE_TOFU
int enable_tofu;
#endif
int nr_processes;
int process_rank;
__cpu_set_unit cpu_set[PLD_CPU_SET_SIZE];
@ -195,6 +218,9 @@ struct syscall_response {
unsigned long req_thread_status;
long ret;
unsigned long fault_address;
#ifdef ENABLE_TOFU
void *pde_data;
#endif
};
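The ``mpol_nodemask`` field added to ``struct program_load_desc`` is an array of ``unsigned long`` words that together hold ``PLD_PROCESS_NUMA_MASK_BITS`` bits. A minimal sketch of how such a mask can be set and tested is shown below. ``struct nodemask`` and the ``nodemask_*()`` helpers are hypothetical names for this illustration, and the sketch assumes that bit N represents NUMA node N, which the diff itself does not state.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define NUMA_MASK_BITS  256	/* mirrors PLD_PROCESS_NUMA_MASK_BITS */
#define BITS_PER_WORD   (sizeof(unsigned long) * CHAR_BIT)
#define NUMA_MASK_WORDS (NUMA_MASK_BITS / BITS_PER_WORD)

/* Hypothetical wrapper around an array laid out like mpol_nodemask. */
struct nodemask {
	unsigned long bits[NUMA_MASK_WORDS];
};

static void nodemask_clear(struct nodemask *m)
{
	memset(m, 0, sizeof(*m));
}

/* Set the bit for node; returns false if node is out of range. */
static bool nodemask_set(struct nodemask *m, unsigned int node)
{
	if (node >= NUMA_MASK_BITS)
		return false;
	m->bits[node / BITS_PER_WORD] |= 1UL << (node % BITS_PER_WORD);
	return true;
}

/* Test the bit for node; out-of-range nodes are reported as not set. */
static bool nodemask_test(const struct nodemask *m, unsigned int node)
{
	if (node >= NUMA_MASK_BITS)
		return false;
	return (m->bits[node / BITS_PER_WORD] >> (node % BITS_PER_WORD)) & 1UL;
}
```

Sizing the array from the bit count, as the header does, keeps the layout identical on both sides as long as both are built with the same ``unsigned long`` width.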
struct syscall_ret_desc {

View File

@ -5,7 +5,7 @@ struct syscall_struct {
int number;
unsigned long args[6];
unsigned long ret;
unsigned long uti_clv; /* copy of a clv in McKernel */
unsigned long uti_info; /* reference to data in McKernel */
};
#define UTI_SZ_SYSCALL_STACK 16
@ -17,7 +17,7 @@ struct uti_desc {
int mck_tid; /* TODO: Move this out for multiple migrated-to-Linux threads */
unsigned long key; /* struct task_struct* of mcexec thread, used to search struct host_thread */
int pid, tid; /* Used as the id of tracee when issuing MCEXEC_UP_TERMINATE_THREAD */
unsigned long uti_clv; /* copy of McKernel clv */
unsigned long uti_info; /* reference to data in McKernel */
int fd; /* /dev/mcosX */
struct syscall_struct syscall_stack[UTI_SZ_SYSCALL_STACK]; /* stack of system call arguments and return values */
@ -26,6 +26,36 @@ struct uti_desc {
int start_syscall_intercept; /* Used to sync between mcexec.c and syscall_intercept.c */
};
/* Reference to McKernel variables accessed by mcctrl */
struct uti_info {
/* clv info */
unsigned long thread_va;
void *uti_futex_resp;
void *ikc2linux;
unsigned long uti_futex_resp_pa;
unsigned long ikc2linux_pa;
/* thread info */
int tid;
int cpu;
void *status;
void *spin_sleep_lock;
void *spin_sleep;
void *vm;
void *futex_q;
unsigned long status_pa;
unsigned long spin_sleep_lock_pa;
unsigned long spin_sleep_pa;
unsigned long vm_pa;
unsigned long futex_q_pa;
/* global info */
int mc_idle_halt;
void *futex_queue;
void *os; // set by mcctrl
unsigned long futex_queue_pa;
};
#endif

View File

@ -16,13 +16,15 @@ kmod(mcctrl
-I${IHK_FULL_SOURCE_DIR}/include/arch/${ARCH}
-I${PROJECT_SOURCE_DIR}/executer/include
-I${CMAKE_CURRENT_SOURCE_DIR}/arch/${ARCH}/include
-I${CMAKE_CURRENT_SOURCE_DIR}/include
-I${PROJECT_BINARY_DIR}
-I${PROJECT_SOURCE_DIR}/kernel/include
-I${PROJECT_SOURCE_DIR}/arch/${ARCH}/kernel/include
-DMCEXEC_PATH=\\"${MCEXEC_PATH}\\"
${ARCH_C_FLAGS}
SOURCES
driver.c control.c ikc.c syscall.c procfs.c binfmt_mcexec.c
sysfs.c sysfs_files.c arch/${ARCH}/archdeps.c
sysfs.c sysfs_files.c mc_plist.c futex.c arch/${ARCH}/archdeps.c arch/${ARCH}/cpu.c
EXTRA_SYMBOLS
${PROJECT_BINARY_DIR}/ihk/linux/core/Module.symvers
DEPENDS

View File

@ -2,11 +2,16 @@
#include <linux/version.h>
#include <linux/mm_types.h>
#include <linux/kallsyms.h>
#include <linux/delay.h>
#if KERNEL_VERSION(4, 11, 0) <= LINUX_VERSION_CODE
#include <linux/sched/task_stack.h>
#endif /* LINUX_VERSION_CODE >= KERNEL_VERSION(4, 11, 0) */
#include <linux/ptrace.h>
#include <linux/uaccess.h>
#include <linux/mmu_notifier.h>
#include <linux/kref.h>
#include <linux/file.h>
#include <linux/proc_fs.h>
#include <asm/vdso.h>
#include "config.h"
#include "../../mcctrl.h"
@ -27,6 +32,39 @@ void *vdso_end;
static struct vm_special_mapping (*vdso_spec)[2];
#endif
#ifdef ENABLE_TOFU
/* Tofu CQ and barrier gate release functions */
struct file_operations *mcctrl_tof_utofu_procfs_ops_cq;
int (*mcctrl_tof_utofu_release_cq)(struct inode *inode,
struct file *filp);
struct file_operations *mcctrl_tof_utofu_procfs_ops_bch;
int (*mcctrl_tof_utofu_release_bch)(struct inode *inode,
struct file *filp);
int (*mcctrl_tof_core_cq_cacheflush)(int tni, int cqid);
int (*mcctrl_tof_core_disable_bch)(int tni, int bgid);
int (*mcctrl_tof_core_unset_bg)(int tni, int bgid);
typedef void (*tof_core_signal_handler)(int, int, uint64_t, uint64_t);
void (*mcctrl_tof_core_register_signal_bg)(int tni, int bgid,
tof_core_signal_handler handler);
struct tof_utofu_bg;
struct tof_utofu_bg *mcctrl_tof_utofu_bg;
/* Tofu MMU notifier */
struct mmu_notifier_ops *mcctrl_tof_utofu_mn_ops;
struct mmu_notifier_ops __mcctrl_tof_utofu_mn_ops;
static void (*mcctrl_tof_utofu_mn_invalidate_range_end)(
struct mmu_notifier *mn,
struct mm_struct *mm,
unsigned long start,
unsigned long end);
void __mcctrl_tof_utofu_mn_invalidate_range_end(
struct mmu_notifier *mn,
struct mm_struct *mm,
unsigned long start,
unsigned long end);
#endif
int arch_symbols_init(void)
{
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 0, 0)
@ -43,6 +81,71 @@ int arch_symbols_init(void)
return -EFAULT;
#endif
#ifdef ENABLE_TOFU
mcctrl_tof_utofu_procfs_ops_cq =
(void *)kallsyms_lookup_name("tof_utofu_procfs_ops_cq");
if (WARN_ON(!mcctrl_tof_utofu_procfs_ops_cq))
return -EFAULT;
mcctrl_tof_utofu_procfs_ops_bch =
(void *)kallsyms_lookup_name("tof_utofu_procfs_ops_bch");
if (WARN_ON(!mcctrl_tof_utofu_procfs_ops_bch))
return -EFAULT;
mcctrl_tof_utofu_release_cq =
(void *)kallsyms_lookup_name("tof_utofu_release_cq");
if (WARN_ON(!mcctrl_tof_utofu_release_cq))
return -EFAULT;
mcctrl_tof_utofu_release_bch =
(void *)kallsyms_lookup_name("tof_utofu_release_bch");
if (WARN_ON(!mcctrl_tof_utofu_release_bch))
return -EFAULT;
mcctrl_tof_core_cq_cacheflush =
(void *)kallsyms_lookup_name("tof_core_cq_cacheflush");
if (WARN_ON(!mcctrl_tof_core_cq_cacheflush))
return -EFAULT;
mcctrl_tof_core_disable_bch =
(void *)kallsyms_lookup_name("tof_core_disable_bch");
if (WARN_ON(!mcctrl_tof_core_disable_bch))
return -EFAULT;
mcctrl_tof_core_unset_bg =
(void *)kallsyms_lookup_name("tof_core_unset_bg");
if (WARN_ON(!mcctrl_tof_core_unset_bg))
return -EFAULT;
mcctrl_tof_core_register_signal_bg =
(void *)kallsyms_lookup_name("tof_core_register_signal_bg");
if (WARN_ON(!mcctrl_tof_core_register_signal_bg))
return -EFAULT;
mcctrl_tof_utofu_bg =
(void *)kallsyms_lookup_name("tof_utofu_bg");
if (WARN_ON(!mcctrl_tof_utofu_bg))
return -EFAULT;
mcctrl_tof_utofu_mn_ops =
(void *)kallsyms_lookup_name("tof_utofu_mn_ops");
if (WARN_ON(!mcctrl_tof_utofu_mn_ops))
return -EFAULT;
/*
* Copy original content and update redirected function,
* CQ will be pointed to this structure after init ioctl()
*/
memcpy(&__mcctrl_tof_utofu_mn_ops, mcctrl_tof_utofu_mn_ops,
sizeof(*mcctrl_tof_utofu_mn_ops));
__mcctrl_tof_utofu_mn_ops.invalidate_range =
__mcctrl_tof_utofu_mn_invalidate_range_end;
mcctrl_tof_utofu_mn_invalidate_range_end =
(void *)kallsyms_lookup_name("tof_utofu_mn_invalidate_range_end");
if (WARN_ON(!mcctrl_tof_utofu_mn_invalidate_range_end))
return -EFAULT;
#endif
return 0;
}
@ -331,6 +434,15 @@ int translate_rva_to_rpa(ihk_os_t os, unsigned long rpt, unsigned long rva,
// page table to translation_table.
phys = ihk_device_map_memory(ihk_os_to_dev(os), rpt, PAGE_SIZE);
#ifdef ENABLE_FUGAKU_HACKS
if (!phys) {
pr_err("%s(): ERROR: VA: 0x%lx, rpt is NULL for PID %d\n",
__func__, rva, task_tgid_vnr(current));
error = -EFAULT;
goto out;
}
#endif
tbl = ihk_device_map_virtual(ihk_os_to_dev(os), phys, PAGE_SIZE, NULL, 0);
rpa = (unsigned long)tbl->tt_pa;
@ -417,3 +529,488 @@ long arch_switch_ctx(struct uti_switch_ctx_desc *desc)
out:
return rc;
}
#ifdef ENABLE_TOFU
/*
* Tofu CQ and BCH release handlers
*/
int __mcctrl_tof_utofu_release_cq(struct inode *inode, struct file *filp);
int __mcctrl_tof_utofu_release_bch(struct inode *inode, struct file *filp);
void mcctrl_tofu_hijack_release_handlers(void)
{
mcctrl_tof_utofu_procfs_ops_cq->release =
__mcctrl_tof_utofu_release_cq;
mcctrl_tof_utofu_procfs_ops_bch->release =
__mcctrl_tof_utofu_release_bch;
wmb();
}
void mcctrl_tofu_restore_release_handlers(void)
{
mcctrl_tof_utofu_procfs_ops_cq->release =
mcctrl_tof_utofu_release_cq;
mcctrl_tof_utofu_procfs_ops_bch->release =
mcctrl_tof_utofu_release_bch;
wmb();
}
/*
* Tofu cleanup functions
*/
#include <tofu/tof_uapi.h>
#include <tofu/tof_icc.h>
#include <tofu/tofu_generated-tof_core_cq.h>
#include <tofu/tofu_generated-tof_utofu_device.h>
#include <tofu/tofu_generated-tof_utofu_cq.h>
#include <tofu/tofu_generated-tof_utofu_mbpt.h>
#include <tofu/tofu_generated-tof_utofu_bg.h>
#define TOF_UTOFU_VERSION TOF_UAPI_VERSION
#define TOF_UTOFU_NUM_STAG_NTYPES 3
#define TOF_UTOFU_NUM_STAG_BITS(size) ((size) + 13)
#define TOF_UTOFU_NUM_STAG(size) ((uint64_t)1 << TOF_UTOFU_NUM_STAG_BITS(size))
#define TOF_UTOFU_STAG_TRANS_BITS 3
#define TOF_UTOFU_STAG_TRANS_SIZE ((uint64_t)1 << TOF_UTOFU_STAG_TRANS_BITS)
#define TOF_UTOFU_STAG_TRANS_TABLE_LEN(size) (TOF_UTOFU_NUM_STAG(size) * TOF_UTOFU_STAG_TRANS_SIZE)
#define TOF_UTOFU_STEERING_TABLE_LEN(size) (TOF_UTOFU_NUM_STAG(size) * TOF_ICC_STEERING_SIZE)
#define TOF_UTOFU_MB_TABLE_LEN(size) (TOF_UTOFU_NUM_STAG(size) * TOF_ICC_MB_SIZE)
#define TOF_UTOFU_STAG_MEM_LEN(size) (TOF_UTOFU_STEERING_TABLE_LEN(size) * 4)
#define TOF_UTOFU_SPECIAL_STAG 4096
#define TOF_UTOFU_ICC_COMMON_REGISTER (tof_icc_reg_pa + 0x0B000000)
#define TOF_UTOFU_REG_START tof_icc_reg_pa
#define TOF_UTOFU_REG_END (TOF_UTOFU_ICC_COMMON_REGISTER + 0x000FFFFF)
#define TOF_UTOFU_SET_SUBNET_TNI 0 /* This number is kernel TNIs number in setting subnet */
#define TOF_UTOFU_KCQ 11
#define TOF_UTOFU_LINKDOWN_PORT_MASK 0x000003FF
#define TOF_UTOFU_ALLOC_STAG_LPG 0x2
#define TOF_UTOFU_BLANK_MBVA (-1)
#define TOF_UTOFU_MRU_EMPTY (-1)
struct tof_utofu_trans_list {
int16_t prev;
int16_t next;
uint8_t pgszbits;
struct tof_utofu_mbpt *mbpt;
};
/*
* Bit 30 marks a kref as McKernel internal.
* This can be used to distinguish krefs from Linux and
* it also ensures that a non deallocated kref will not
* crash the Linux allocator.
*/
#define MCKERNEL_KREF_MARK (1U << 30)
static inline unsigned int mcctrl_kref_is_mckernel(const struct kref *kref)
{
return (refcount_read(&kref->refcount) & (MCKERNEL_KREF_MARK));
}
/**
* kref_put - decrement refcount for object.
* @kref: object.
* @release: pointer to the function that will clean up the object when the
* last reference to the object is released.
* This pointer is required, and it is not acceptable to pass kfree
* in as this function. If the caller does pass kfree to this
* function, you will be publicly mocked mercilessly by the kref
* maintainer, and anyone else who happens to notice it. You have
* been warned.
*
* Decrement the refcount, and if 0, call release().
* Return 1 if the object was removed, otherwise return 0. Beware, if this
* function returns 0, you still can not count on the kref from remaining in
* memory. Only use the return value if you want to see if the kref is now
* gone, not present.
*/
static inline int mcctrl_kref_put(struct kref *kref, void (*release)(struct kref *kref))
{
if (atomic_dec_return(&kref->refcount.refs) == MCKERNEL_KREF_MARK) {
release(kref);
return 1;
}
return 0;
}
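The marked reference count handled by ``mcctrl_kref_put()`` can be sketched in user space with C11 atomics. The sketch is not the kernel implementation: ``struct marked_ref`` and its helpers are hypothetical stand-ins for ``struct kref``, and the initial value (mark bit plus one reference) is an assumption, since the McKernel-side initialization is not part of this diff.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Same bit position as MCKERNEL_KREF_MARK. */
#define REF_MARK (1U << 30)

/* Hypothetical user-space stand-in for struct kref. */
struct marked_ref {
	atomic_uint refs;
};

/* Assumption: a marked object starts with the mark bit and one reference. */
static void marked_ref_init(struct marked_ref *r)
{
	atomic_init(&r->refs, REF_MARK | 1U);
}

static void marked_ref_get(struct marked_ref *r)
{
	atomic_fetch_add(&r->refs, 1U);
}

static bool marked_ref_is_marked(struct marked_ref *r)
{
	return (atomic_load(&r->refs) & REF_MARK) != 0;
}

/*
 * Drop one reference. The object is released when only the mark bit
 * remains, i.e. when the counting part has reached zero.
 * Returns true if release() was called.
 */
static bool marked_ref_put(struct marked_ref *r,
			   void (*release)(struct marked_ref *))
{
	/* atomic_fetch_sub() returns the old value, hence the extra "- 1U". */
	if (atomic_fetch_sub(&r->refs, 1U) - 1U == REF_MARK) {
		release(r);
		return true;
	}
	return false;
}

/* Demo release callback that only counts how often it was invoked. */
static int release_count;

static void count_release(struct marked_ref *r)
{
	(void)r;
	release_count++;
}
```

Because the counter never drops to plain zero, code that only recognizes a zero count as "last reference gone" does not see a marked object as releasable, which matches the intent stated in the comment above ``MCKERNEL_KREF_MARK``.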
static int tof_utofu_cq_cacheflush(struct tof_utofu_cq *ucq){
return mcctrl_tof_core_cq_cacheflush(ucq->tni, ucq->cqid);
}
static void tof_utofu_trans_mru_delete(struct tof_utofu_cq *ucq, int stag){
struct tof_utofu_trans_list *mru = ucq->trans.mru;
int prev = mru[stag].prev;
int next = mru[stag].next;
if(prev == TOF_UTOFU_MRU_EMPTY || next == TOF_UTOFU_MRU_EMPTY){ /* already deleted */
return;
}
if(prev == stag){ /* a single entry */
ucq->trans.mruhead = TOF_UTOFU_MRU_EMPTY;
}else{
if(ucq->trans.mruhead == stag){
ucq->trans.mruhead = next;
}
mru[prev].next = next;
mru[next].prev = prev;
}
mru[stag].prev = TOF_UTOFU_MRU_EMPTY;
mru[stag].next = TOF_UTOFU_MRU_EMPTY;
}
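``tof_utofu_trans_mru_delete()`` unlinks an entry from a circular doubly linked list whose links are array indices rather than pointers. The following stand-alone sketch reproduces that delete logic on a small table. ``struct mru``, ``MRU_SLOTS`` and ``mru_insert()`` are hypothetical additions made only so the example can be exercised; the insertion logic of the Tofu driver is not shown in this diff.

```c
#include <stdint.h>

#define MRU_EMPTY (-1)	/* same sentinel value as TOF_UTOFU_MRU_EMPTY */
#define MRU_SLOTS 8	/* hypothetical table size for this sketch */

struct mru_entry {
	int16_t prev;
	int16_t next;
};

struct mru {
	struct mru_entry e[MRU_SLOTS];
	int head;
};

static void mru_init(struct mru *m)
{
	for (int i = 0; i < MRU_SLOTS; i++)
		m->e[i].prev = m->e[i].next = MRU_EMPTY;
	m->head = MRU_EMPTY;
}

/* Hypothetical counterpart of the delete: link idx in as the new head. */
static void mru_insert(struct mru *m, int idx)
{
	if (m->head == MRU_EMPTY) {
		/* A single entry points at itself in both directions. */
		m->e[idx].prev = m->e[idx].next = (int16_t)idx;
	} else {
		int head = m->head;
		int tail = m->e[head].prev;

		m->e[idx].next = (int16_t)head;
		m->e[idx].prev = (int16_t)tail;
		m->e[tail].next = (int16_t)idx;
		m->e[head].prev = (int16_t)idx;
	}
	m->head = idx;
}

/* Same steps as tof_utofu_trans_mru_delete() in the listing. */
static void mru_delete(struct mru *m, int idx)
{
	int prev = m->e[idx].prev;
	int next = m->e[idx].next;

	if (prev == MRU_EMPTY || next == MRU_EMPTY)
		return;			/* already deleted */
	if (prev == idx) {
		m->head = MRU_EMPTY;	/* it was the only entry */
	} else {
		if (m->head == idx)
			m->head = next;
		m->e[prev].next = (int16_t)next;
		m->e[next].prev = (int16_t)prev;
	}
	m->e[idx].prev = m->e[idx].next = MRU_EMPTY;
}
```

Using ``int16_t`` indices instead of pointers keeps each entry small and lets the same table be interpreted from either kernel without translating addresses.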
static void tof_utofu_trans_disable(struct tof_utofu_cq *ucq, int stag){
struct tof_trans_table *table = ucq->trans.table;
atomic64_set((atomic64_t *)&table[stag], 0);
tof_utofu_trans_mru_delete(ucq, stag);
}
/* McKernel scatterlist is simply a contiguous buffer. */
struct scatterlist {
void *pages;
unsigned int offset;
unsigned int length;
unsigned long dma_address;
unsigned int dma_length;
};
static uintptr_t tof_utofu_disable_mbpt(struct tof_utofu_mbpt *mbpt, int idx){
int i0, i1;
struct tof_icc_mbpt_entry *ent;
uintptr_t ipa;
i0 = idx / (PAGE_SIZE / TOF_ICC_MBPT_SIZE);
i1 = idx - i0 * (PAGE_SIZE / TOF_ICC_MBPT_SIZE);
//ent = sg_virt(&mbpt->sg[i0]);
ent = mbpt->sg->pages + (i0 * PAGE_SIZE);
if(!ent[i1].enable){
return 0;
}
ent[i1].enable = 0;
ipa = (uint64_t)ent[i1].ipa << 12;
ent[i1].ipa = 0;
return ipa;
}
static void tof_utofu_free_mbpt(struct tof_utofu_cq *ucq, struct tof_utofu_mbpt *mbpt){
int i;
for(i = 0; i < mbpt->nsgents * PAGE_SIZE / sizeof(struct tof_icc_mbpt_entry); i++){
uintptr_t iova;
iova = tof_utofu_disable_mbpt(mbpt, i);
#if 0
/*
* NOTE: Not performed for McKernel managed stags.
*/
if(iova){
tof_smmu_release_ipa_cq(ucq->tni, ucq->cqid, iova, mbpt->pgsz);
}
#endif
}
#if 0
/*
* NOTE: Everything below has been allocated in McKernel, do nothing here!!
* This leaks memory in McKernel, but it doesn't crash Linux.
* Memory will be released once McKernel is unbooted.
*/
tof_smmu_iova_unmap_sg(ucq->tni, ucq->cqid, mbpt->sg, mbpt->nsgents);
for(i = 0; i < mbpt->nsgents; i++){
tof_util_free_pages((unsigned long)sg_virt(&mbpt->sg[i]), 0);
}
tof_util_free(mbpt->sg);
tof_util_free(mbpt);
#endif
}
static void tof_utofu_mbpt_release(struct kref *kref)
{
struct tof_utofu_mbpt *mbpt = container_of(kref, struct tof_utofu_mbpt, kref);
//atomic64_inc((atomic64_t *)&kref_free_count);
tof_utofu_free_mbpt(mbpt->ucq, mbpt);
}
static int tof_utofu_free_stag(struct tof_utofu_cq *ucq, int stag){
if(stag < 0 || stag >= TOF_UTOFU_NUM_STAG(ucq->num_stag) ||
ucq->steering == NULL){
return -EINVAL;
}
if(!(ucq->steering[stag].enable)){
return -ENOENT;
}
if (!mcctrl_kref_is_mckernel(&ucq->trans.mru[stag].mbpt->kref)) {
printk("%s: stag: %d is not an McKernel kref\n", __func__, stag);
return -EINVAL;
}
ucq->steering[stag].enable = 0;
ucq->mb[stag].enable = 0;
tof_utofu_trans_disable(ucq, stag);
dma_wmb();
tof_utofu_cq_cacheflush(ucq);
mcctrl_kref_put(&ucq->trans.mru[stag].mbpt->kref, tof_utofu_mbpt_release);
ucq->trans.mru[stag].mbpt = NULL;
dprintk("%s: TNI: %d, CQ: %d: stag %d deallocated\n",
__func__, ucq->tni, ucq->cqid, stag);
return 0;
}
void mcctrl_mckernel_tof_utofu_release_cq(void *pde_data)
{
struct tof_utofu_cq *ucq;
struct tof_utofu_device *dev;
unsigned long irqflags;
int stag;
dev = (struct tof_utofu_device *)pde_data;
ucq = container_of(dev, struct tof_utofu_cq, common);
if (!ucq->common.enabled) {
return;
}
dprintk("%s: UCQ (PDE: 0x%lx) TNI %d CQ %d\n",
__func__, (unsigned long)pde_data, ucq->tni, ucq->cqid);
/*
* Only release stags here, actual cleanup is still performed
* in the Tofu driver
*/
for (stag = 0; stag < TOF_UTOFU_NUM_STAG(ucq->num_stag); stag++) {
spin_lock_irqsave(&ucq->trans.mru_lock, irqflags);
tof_utofu_free_stag(ucq, stag);
spin_unlock_irqrestore(&ucq->trans.mru_lock, irqflags);
}
}
static inline void tof_core_unregister_signal_bg(int tni, int bgid)
{
return mcctrl_tof_core_register_signal_bg(tni, bgid, NULL);
}
static struct tof_utofu_bg *tof_utofu_bg_get(int tni, int bgid){
if((unsigned int)tni >= TOF_ICC_NTNIS ||
(unsigned int)bgid >= TOF_ICC_NBGS){
return NULL;
}
//return &tof_utofu_bg[tni][bgid];
// Convert [][] notation into pointer arithmetic
return mcctrl_tof_utofu_bg + (tni * TOF_ICC_NBGS) + bgid;
}
static int __tof_utofu_unset_bg(struct tof_utofu_bg *ubg){
if(ubg->common.enabled){
mcctrl_tof_core_unset_bg(ubg->tni, ubg->bgid);
ubg->common.enabled = false;
tof_core_unregister_signal_bg(ubg->tni, ubg->bgid);
}
return 0;
}
static int mcctrl_tof_utofu_disable_bch(struct tof_utofu_bg *ubg){
int ret;
int tni, bgid;
if(!ubg->bch.enabled){
return -EPERM;
}
ret = mcctrl_tof_core_disable_bch(ubg->tni, ubg->bgid);
if(ret < 0){
return ret;
}
for(tni = 0; tni < TOF_ICC_NTNIS; tni++){
uint64_t mask = ubg->bch.bgmask[tni];
for(bgid = 0; bgid < TOF_ICC_NBGS; bgid++){
if((mask >> bgid) & 1){
ret = __tof_utofu_unset_bg(tof_utofu_bg_get(tni, bgid));
if(ret < 0){
/* OK? */
//BUG();
return ret;
}
}
}
}
/* Not performed in McKernel handler */
//tof_smmu_release_ipa_bg(ubg->tni, ubg->bgid, ubg->bch.iova, TOF_ICC_BCH_DMA_ALIGN);
//put_page(ubg->bch.page);
ubg->bch.enabled = false;
smp_mb();
dprintk("%s: tni=%d bgid=%d\n", __func__, ubg->tni, ubg->bgid);
return 0;
}
void mcctrl_mckernel_tof_utofu_release_bch(void *pde_data)
{
struct tof_utofu_bg *ubg;
struct tof_utofu_device *dev = (struct tof_utofu_device *)pde_data;
ubg = container_of(dev, struct tof_utofu_bg, common);
//tof_log_if("tni=%d bgid=%d\n", ubg->tni, ubg->bgid);
dprintk("%s: tni=%d bgid=%d\n", __func__, ubg->tni, ubg->bgid);
mcctrl_tof_utofu_disable_bch(ubg);
}
void mcctrl_tofu_cleanup_file(struct mcctrl_file_to_pidfd *f2pfd)
{
/* Figure out whether CQ or BCH */
if (strstr(f2pfd->tofu_dev_path, "cq")) {
dprintk("%s: PID: %d, fd: %d (%s) -> release CQ\n",
__func__, f2pfd->pid, f2pfd->fd, f2pfd->tofu_dev_path);
mcctrl_mckernel_tof_utofu_release_cq(f2pfd->pde_data);
}
else if (strstr(f2pfd->tofu_dev_path, "bch")) {
dprintk("%s: PID: %d, fd: %d (%s) -> release BCH\n",
__func__, f2pfd->pid, f2pfd->fd, f2pfd->tofu_dev_path);
mcctrl_mckernel_tof_utofu_release_bch(f2pfd->pde_data);
}
}
int __mcctrl_tof_utofu_release_handler(struct inode *inode, struct file *filp,
int (*__release_func)(struct inode *inode, struct file *filp))
{
struct mcctrl_usrdata *usrdata;
struct mcctrl_file_to_pidfd *f2pfd;
struct mcctrl_per_proc_data *ppd;
struct ikc_scd_packet isp;
int ret;
dprintk("%s: current PID: %d, comm: %s \n",
__func__, task_tgid_vnr(current), current->comm);
f2pfd = mcctrl_file_to_pidfd_hash_lookup(filp, current->group_leader);
if (!f2pfd) {
goto out;
}
dprintk("%s: current PID: %d, PID: %d, fd: %d ...\n",
__func__, task_tgid_vnr(current), f2pfd->pid, f2pfd->fd);
usrdata = ihk_host_os_get_usrdata(f2pfd->os);
/* Look up per-process structure */
ppd = mcctrl_get_per_proc_data(usrdata, f2pfd->pid);
if (!ppd) {
pr_err("%s: PID: %d, fd: %d no PPD\n",
__func__, f2pfd->pid, f2pfd->fd);
goto out;
}
dprintk("%s: PID: %d, fd: %d PPD OK\n",
__func__, f2pfd->pid, f2pfd->fd);
/*
* We are in release() due to the process being killed,
* or because the application didn't close the file properly.
* Ask McKernel to clean up this fd.
*/
isp.msg = SCD_MSG_CLEANUP_FD;
isp.pid = f2pfd->pid;
isp.arg = f2pfd->fd;
ret = mcctrl_ikc_send_wait(f2pfd->os, ppd->ikc_target_cpu,
&isp, -20, NULL, NULL, 0);
if (ret != 0) {
pr_err("%s: WARNING: IKC req for PID: %d, fd: %d failed\n",
__func__, f2pfd->pid, f2pfd->fd);
}
/* Disable any remaining STAGs/BCH in mcctrl anyway */
mcctrl_tofu_cleanup_file(f2pfd);
mcctrl_file_to_pidfd_hash_remove(filp, f2pfd->os,
current->group_leader, f2pfd->fd);
mcctrl_put_per_proc_data(ppd);
out:
dprintk("%s: current PID: %d, comm: %s -> calling release\n",
__func__, task_tgid_vnr(current), current->comm);
return __release_func(inode, filp);
}
int __mcctrl_tof_utofu_release_cq(struct inode *inode, struct file *filp)
{
return __mcctrl_tof_utofu_release_handler(inode, filp,
mcctrl_tof_utofu_release_cq);
}
int __mcctrl_tof_utofu_release_bch(struct inode *inode, struct file *filp)
{
return __mcctrl_tof_utofu_release_handler(inode, filp,
mcctrl_tof_utofu_release_bch);
}
/*
* Tofu MMU notifier functions
*/
void __mcctrl_tof_utofu_mn_invalidate_range_end(
struct mmu_notifier *mn,
struct mm_struct *mm,
unsigned long start,
unsigned long end)
{
char tmpname[TASK_COMM_LEN];
/* Not an offloaded syscall? */
if (current->mm != mm) {
goto out_call_real;
}
/* Not mcexec? Just in case.. */
get_task_comm(tmpname, current);
if (strncmp(tmpname, "mcexec", TASK_COMM_LEN)) {
goto out_call_real;
}
/* This is only called for Tofu enabled mcexec processes */
dprintk("%s: skipping tof_utofu_mn_invalidate_range_end() "
"for mcexec PID %d\n",
__func__, task_tgid_vnr(current));
return;
out_call_real:
return mcctrl_tof_utofu_mn_invalidate_range_end(mn, mm, start, end);
}
int __mcctrl_tof_utofu_ioctl_init_cq(struct tof_utofu_device *dev,
unsigned long arg) {
struct tof_utofu_cq *ucq;
ucq = container_of(dev, struct tof_utofu_cq, common);
if (!ucq->common.enabled) {
return -EINVAL;
}
dprintk("%s: Tofu TNI %d CQ %d (PDE: 0x%lx) MMU notifier to be hijacked\n",
__func__, ucq->tni, ucq->cqid, (unsigned long)dev);
/* Override the MMU notifier */
ucq->mn.ops = &__mcctrl_tof_utofu_mn_ops;
return 0;
}
long __mcctrl_tof_utofu_unlocked_ioctl_cq(void *pde_data, unsigned int cmd,
unsigned long arg) {
struct tof_utofu_device *dev = (struct tof_utofu_device *)pde_data;
int ret;
switch (cmd) {
/* We only care about init, where we hijack the MMU notifier */
case TOF_IOCTL_INIT_CQ:
ret = __mcctrl_tof_utofu_ioctl_init_cq(dev, arg);
break;
default:
ret = 0;
}
return ret;
}
#endif

View File

@ -0,0 +1,96 @@
/* cpu.c COPYRIGHT FUJITSU LIMITED 2015-2019 */
#include <cpu.h>
/* arm64 has no "pause" instruction; use the "yield" instruction instead */
void cpu_pause(void)
{
asm volatile("yield" ::: "memory");
}
#if defined(CONFIG_HAS_NMI)
#include <arm-gic-v3.h>
/* restore interrupt (ICC_PMR_EL1 <= flags) */
void cpu_restore_interrupt(unsigned long flags)
{
asm volatile(
"msr_s " __stringify(ICC_PMR_EL1) ",%0"
:
: "r" (flags)
: "memory");
}
/* save ICC_PMR_EL1 & disable interrupt (ICC_PMR_EL1 <= ICC_PMR_EL1_MASKED) */
unsigned long cpu_disable_interrupt_save(void)
{
unsigned long flags;
unsigned long masked = ICC_PMR_EL1_MASKED;
asm volatile(
"mrs_s %0, " __stringify(ICC_PMR_EL1) "\n"
"msr_s " __stringify(ICC_PMR_EL1) ",%1"
: "=&r" (flags)
: "r" (masked)
: "memory");
return flags;
}
/* save ICC_PMR_EL1 & enable interrupt (ICC_PMR_EL1 <= ICC_PMR_EL1_UNMASKED) */
unsigned long cpu_enable_interrupt_save(void)
{
unsigned long flags;
unsigned long masked = ICC_PMR_EL1_UNMASKED;
asm volatile(
"mrs_s %0, " __stringify(ICC_PMR_EL1) "\n"
"msr_s " __stringify(ICC_PMR_EL1) ",%1"
: "=&r" (flags)
: "r" (masked)
: "memory");
return flags;
}
#else /* defined(CONFIG_HAS_NMI) */
/* @ref.impl arch/arm64/include/asm/spinlock.h::arch_local_irq_restore */
/* restore interrupt (PSTATE.DAIF = flags restore) */
void cpu_restore_interrupt(unsigned long flags)
{
asm volatile(
"msr daif, %0 // arch_local_irq_restore"
:
: "r" (flags)
: "memory");
}
/* @ref.impl arch/arm64/include/asm/irqflags.h::arch_local_irq_save */
/* save PSTATE.DAIF & disable interrupt (PSTATE.DAIF I bit set) */
unsigned long cpu_disable_interrupt_save(void)
{
unsigned long flags;
asm volatile(
"mrs %0, daif // arch_local_irq_save\n"
"msr daifset, #2"
: "=r" (flags)
:
: "memory");
return flags;
}
/* save PSTATE.DAIF & enable interrupt (PSTATE.DAIF I bit clear) */
unsigned long cpu_enable_interrupt_save(void)
{
unsigned long flags;
asm volatile(
"mrs %0, daif // arch_local_irq_save\n"
"msr daifclr, #2"
: "=r" (flags)
:
: "memory");
return flags;
}
#endif /* defined(CONFIG_HAS_NMI) */

View File

@ -0,0 +1,142 @@
/* This is a copy of the necessary parts of McKernel, for uti-futex */
/* arch-lock.h COPYRIGHT FUJITSU LIMITED 2015-2018 */
#ifndef __HEADER_ARM64_COMMON_ARCH_LOCK_H
#define __HEADER_ARM64_COMMON_ARCH_LOCK_H
#include <linux/preempt.h>
#include <cpu.h>
#define ihk_mc_spinlock_lock __ihk_mc_spinlock_lock
#define ihk_mc_spinlock_unlock __ihk_mc_spinlock_unlock
#define ihk_mc_spinlock_lock_noirq __ihk_mc_spinlock_lock_noirq
#define ihk_mc_spinlock_unlock_noirq __ihk_mc_spinlock_unlock_noirq
/* @ref.impl arch/arm64/include/asm/spinlock_types.h::TICKET_SHIFT */
#define TICKET_SHIFT 16
/* @ref.impl ./arch/arm64/include/asm/lse.h::ARM64_LSE_ATOMIC_INSN */
/* else defined(CONFIG_AS_LSE) && defined(CONFIG_ARM64_LSE_ATOMICS) */
#define _ARM64_LSE_ATOMIC_INSN(llsc, lse) llsc
/* @ref.impl arch/arm64/include/asm/spinlock_types.h::arch_spinlock_t */
typedef struct {
#ifdef __AARCH64EB__
uint16_t next;
uint16_t owner;
#else /* __AARCH64EB__ */
uint16_t owner;
uint16_t next;
#endif /* __AARCH64EB__ */
} __attribute__((aligned(4))) _ihk_spinlock_t;
/* @ref.impl arch/arm64/include/asm/spinlock.h::arch_spin_lock */
/* spinlock lock */
static inline void
__ihk_mc_spinlock_lock_noirq(_ihk_spinlock_t *lock)
{
unsigned int tmp;
_ihk_spinlock_t lockval, newval;
preempt_disable();
asm volatile(
/* Atomically increment the next ticket. */
_ARM64_LSE_ATOMIC_INSN(
/* LL/SC */
" prfm pstl1strm, %3\n"
"1: ldaxr %w0, %3\n"
" add %w1, %w0, %w5\n"
" stxr %w2, %w1, %3\n"
" cbnz %w2, 1b\n",
/* LSE atomics */
" mov %w2, %w5\n"
" ldadda %w2, %w0, %3\n"
__nops(3)
)
/* Did we get the lock? */
" eor %w1, %w0, %w0, ror #16\n"
" cbz %w1, 3f\n"
/*
* No: spin on the owner. Send a local event to avoid missing an
* unlock before the exclusive load.
*/
" sevl\n"
"2: wfe\n"
" ldaxrh %w2, %4\n"
" eor %w1, %w2, %w0, lsr #16\n"
" cbnz %w1, 2b\n"
/* We got the lock. Critical section starts here. */
"3:"
: "=&r" (lockval), "=&r" (newval), "=&r" (tmp), "+Q" (*lock)
: "Q" (lock->owner), "I" (1 << TICKET_SHIFT)
: "memory");
}
/* spinlock lock & interrupt disable & PSTATE.DAIF save */
static inline unsigned long
__ihk_mc_spinlock_lock(_ihk_spinlock_t *lock)
{
unsigned long flags;
flags = cpu_disable_interrupt_save();
__ihk_mc_spinlock_lock_noirq(lock);
return flags;
}
/* @ref.impl arch/arm64/include/asm/spinlock.h::arch_spin_unlock */
/* spinlock unlock */
static inline void
__ihk_mc_spinlock_unlock_noirq(_ihk_spinlock_t *lock)
{
unsigned long tmp;
asm volatile(_ARM64_LSE_ATOMIC_INSN(
/* LL/SC */
" ldrh %w1, %0\n"
" add %w1, %w1, #1\n"
" stlrh %w1, %0",
/* LSE atomics */
" mov %w1, #1\n"
" staddlh %w1, %0\n"
__nops(1))
: "=Q" (lock->owner), "=&r" (tmp)
:
: "memory");
preempt_enable();
}
static inline void
__ihk_mc_spinlock_unlock(_ihk_spinlock_t *lock, unsigned long flags)
{
__ihk_mc_spinlock_unlock_noirq(lock);
cpu_restore_interrupt(flags);
}
typedef struct mcs_rwlock_lock {
_ihk_spinlock_t slock;
#ifndef ENABLE_UBSAN
} __aligned(64) mcs_rwlock_lock_t;
#else
} mcs_rwlock_lock_t;
#endif
static inline void
mcs_rwlock_writer_lock_noirq(struct mcs_rwlock_lock *lock)
{
ihk_mc_spinlock_lock_noirq(&lock->slock);
}
static inline void
mcs_rwlock_writer_unlock_noirq(struct mcs_rwlock_lock *lock)
{
ihk_mc_spinlock_unlock_noirq(&lock->slock);
}
#endif /* !__HEADER_ARM64_COMMON_ARCH_LOCK_H */
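
The inline assembly in `__ihk_mc_spinlock_lock_noirq()` and `__ihk_mc_spinlock_unlock_noirq()` above follows the ticket-lock pattern: a locker atomically takes the next ticket, then spins until the owner field reaches that ticket, and unlock increments the owner with release semantics. The following is a minimal portable sketch of that idea using C11 atomics. The type and function names (`struct ticket_lock`, `ticket_lock_acquire`, and so on) are hypothetical and not part of McKernel, and the two counters are kept as separate fields for readability, whereas the ARM64 code appears to pack them into a single word via `TICKET_SHIFT`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical illustration type; not the McKernel _ihk_spinlock_t. */
struct ticket_lock {
	_Atomic uint16_t owner; /* ticket currently being served */
	_Atomic uint16_t next;  /* next ticket to hand out */
};

static inline void ticket_lock_init(struct ticket_lock *l)
{
	atomic_init(&l->owner, 0);
	atomic_init(&l->next, 0);
}

/* Take a ticket and wait for our turn; returns the ticket number. */
static inline uint16_t ticket_lock_acquire(struct ticket_lock *l)
{
	/* Atomically fetch-and-increment the next ticket. */
	uint16_t me = atomic_fetch_add_explicit(&l->next, 1,
						memory_order_relaxed);

	/*
	 * Spin on the owner. The acquire load pairs with the release
	 * store in ticket_lock_release(). Real kernel code would wait
	 * with wfe or a cpu_relax()-style hint instead of busy looping.
	 */
	while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
		;
	return me;
}

/* Hand the lock to the next waiter by incrementing the owner. */
static inline void ticket_lock_release(struct ticket_lock *l)
{
	uint16_t cur = atomic_load_explicit(&l->owner, memory_order_relaxed);

	/* The lock must be held: an outstanding ticket means next != owner. */
	assert(atomic_load_explicit(&l->next, memory_order_relaxed) != cur);
	atomic_store_explicit(&l->owner, (uint16_t)(cur + 1),
			      memory_order_release);
}
```

Because tickets are served in the order they were taken, waiters acquire the lock in FIFO order, which is the fairness property that motivates ticket locks over a plain test-and-set lock.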


@@ -38,4 +38,26 @@ static const unsigned long arch_rus_vm_flags = VM_RESERVED | VM_MIXEDMAP | VM_EX
#else
static const unsigned long arch_rus_vm_flags = VM_DONTDUMP | VM_MIXEDMAP | VM_EXEC;
#endif
#define _xchg(ptr, x) \
({ \
__typeof__(*(ptr)) __ret; \
__ret = (__typeof__(*(ptr))) \
__xchg((unsigned long)(x), (ptr), sizeof(*(ptr))); \
__ret; \
})
#define xchg4(ptr, x) _xchg(ptr, x)
#define xchg8(ptr, x) _xchg(ptr, x)
enum arm64_pf_error_code {
PF_PROT = 1 << 0,
PF_WRITE = 1 << 1,
PF_USER = 1 << 2,
PF_RSVD = 1 << 3,
PF_INSTR = 1 << 4,
PF_PATCH = 1 << 29,
PF_POPULATE = 1 << 30,
};
#endif /* __HEADER_MCCTRL_ARM64_ARCHDEPS_H */
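
The `arm64_pf_error_code` enum added above defines one bit per page-fault attribute. As a rough illustration of how such flags are typically combined into a single reason mask, here is a small sketch. The helper `pf_reason()` and its parameters are hypothetical, and the meaning assigned to each bit (for example, `PF_PROT` for a permission fault on a present page) is an assumption based on the usual x86-style convention, not something stated in this diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Copied from the hunk above. */
enum arm64_pf_error_code {
	PF_PROT = 1 << 0,
	PF_WRITE = 1 << 1,
	PF_USER = 1 << 2,
	PF_RSVD = 1 << 3,
	PF_INSTR = 1 << 4,
	PF_PATCH = 1 << 29,
	PF_POPULATE = 1 << 30,
};

/* The two high flags occupy distinct bits, so they can be OR-ed freely. */
static_assert((PF_PATCH & PF_POPULATE) == 0, "flag bits must not overlap");

/*
 * Hypothetical helper: build a fault reason mask from decoded fault
 * attributes. The bit semantics are assumed, see the text above.
 */
static inline unsigned long pf_reason(bool permission_fault, bool write,
				      bool user, bool instr)
{
	unsigned long reason = 0;

	if (permission_fault)
		reason |= PF_PROT;
	if (write)
		reason |= PF_WRITE;
	if (user)
		reason |= PF_USER;
	if (instr)
		reason |= PF_INSTR;
	return reason;
}
```

A caller would test individual bits with a mask, for example `reason & PF_WRITE`, rather than comparing the whole value.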


@@ -0,0 +1,41 @@
#!/bin/bash
SCRIPT="$(readlink -f "${BASH_SOURCE[0]:-}")"
SCRIPT_DIR="$(dirname "${SCRIPT}")"
CURRENT_DIR="$(pwd)"
cd "${SCRIPT_DIR}"
DWARF_TOOL=${SCRIPT_DIR}/../../../../../../../tools/dwarf-extract-struct/dwarf-extract-struct
if [ ! -x ${DWARF_TOOL} ]; then
	echo "error: couldn't find DWARF extractor executable (${DWARF_TOOL}), have you compiled it?"
	cd -
	exit 1
fi
echo "Looking for Tofu driver debug symbols..."
if [ "`find /lib/modules/ -name "tof_module.tar.gz" | xargs -r ls -t | head -n 1 | wc -l`" == "0" ]; then
	echo "error: couldn't find Tofu modules with debug symbols"
	cd -
	exit 1
fi
MODULE_TAR_GZ=`find /lib/modules/ -name "tof_module.tar.gz" | xargs ls -t | head -n 1`
echo "Using Tofu driver debug symbols: ${MODULE_TAR_GZ}"
KMODULE=tof_utofu.ko
if ! tar zxvf ${MODULE_TAR_GZ} ${KMODULE} > /dev/null 2>&1; then
	echo "error: uncompressing kernel module with debug symbols"
	cd -
	exit 1
fi
${DWARF_TOOL} ${KMODULE} tof_utofu_device enabled subnet gpid > tofu_generated-tof_utofu_device.h
${DWARF_TOOL} ${KMODULE} tof_utofu_cq common tni cqid mn trans steering mb num_stag | sed "s/struct FILL_IN_MANUALLY trans;/#include \"tof_utofu_cq_trans.h\"/g" > tofu_generated-tof_utofu_cq.h
${DWARF_TOOL} ${KMODULE} tof_utofu_mbpt ucq iova sg nsgents mbptstart pgsz kref > tofu_generated-tof_utofu_mbpt.h
${DWARF_TOOL} ${KMODULE} tof_utofu_bg common tni bgid bch | sed "s/struct FILL_IN_MANUALLY bch;/#include \"tof_utofu_bg_bch.h\"/g" > tofu_generated-tof_utofu_bg.h
rm ${KMODULE}
#cat tofu_generated*.h
cd - > /dev/null


@@ -0,0 +1,831 @@
#ifndef _TOF_ICC_H_
#define _TOF_ICC_H_
#include <linux/types.h>
#ifdef __KERNEL__
#include <linux/bitops.h>
#else
#include <stdint.h>
typedef uint64_t phys_addr_t;
#endif
/* constants related to the Tofu Interconnect D */
#define TOF_ICC_NTNIS 6
#define TOF_ICC_NCQS 12
#define TOF_ICC_NBGS 48
#define TOF_ICC_NBCHS 16
#define TOF_ICC_NPORTS 10
#define TOF_ICC_NVMSIDS 16
#define TOF_ICC_RH_LEN 8
#define TOF_ICC_ECRC_LEN 4
#define TOF_ICC_FRAME_ALIGN 32
#define TOF_ICC_TLP_LEN(len) (((len) + 1) * TOF_ICC_FRAME_ALIGN)
#define TOF_ICC_TLP_PAYLOAD_MAX (TOF_ICC_TLP_LEN(61) - TOF_ICC_ECRC_LEN)
#define TOF_ICC_FRAME_LEN(len) (TOF_ICC_RH_LEN + TOF_ICC_TLP_LEN(len))
#define TOF_ICC_FRAME_LEN_MIN TOF_ICC_FRAME_LEN(2)
#define TOF_ICC_FRAME_LEN_MAX TOF_ICC_FRAME_LEN(61)
#define TOF_ICC_FRAME_BUF_SIZE_BITS 11
#define TOF_ICC_FRAME_BUF_SIZE (1 << TOF_ICC_FRAME_BUF_SIZE_BITS)
#define TOF_ICC_FRAME_BUF_ALIGN_BITS 8
#define TOF_ICC_FRAME_BUF_ALIGN (1 << TOF_ICC_FRAME_BUF_ALIGN_BITS)
#define TOF_ICC_PB_SIZE_BITS 11
#define TOF_ICC_PB_SIZE (1 << TOF_ICC_PB_SIZE_BITS)
#define TOF_ICC_PB_ALIGN_BITS 11
#define TOF_ICC_PB_ALIGN (1 << TOF_ICC_PB_ALIGN_BITS)
#define TOF_ICC_ST_ALIGN_BITS 8
#define TOF_ICC_ST_ALIGN (1 << TOF_ICC_ST_ALIGN_BITS)
#define TOF_ICC_MBT_ALIGN_BITS 8
#define TOF_ICC_MBT_ALIGN (1 << TOF_ICC_MBT_ALIGN_BITS)
#define TOF_ICC_MBPT_ALIGN_BITS 8
#define TOF_ICC_MBPT_ALIGN (1 << TOF_ICC_MBPT_ALIGN_BITS)
#define TOF_ICC_BG_BSEQ_SIZE_BITS 24
#define TOF_ICC_BG_BSEQ_SIZE (1 << TOF_ICC_BG_BSEQ_SIZE_BITS)
#define TOF_ICC_BCH_DMA_ALIGN_BITS 8
#define TOF_ICC_BCH_DMA_ALIGN (1 << TOF_ICC_BCH_DMA_ALIGN_BITS)
/* This is a CPU-specific constant, but it is referred to in the ICC spec. */
#define TOF_ICC_CACHE_LINE_SIZE_BITS 8
#define TOF_ICC_CACHE_LINE_SIZE (1 << TOF_ICC_CACHE_LINE_SIZE_BITS)
#define TOF_ICC_TOQ_DESC_SIZE_BITS 5
#define TOF_ICC_TOQ_DESC_SIZE (1 << TOF_ICC_TOQ_DESC_SIZE_BITS)
#define TOF_ICC_TCQ_DESC_SIZE_BITS 3
#define TOF_ICC_TCQ_DESC_SIZE (1 << TOF_ICC_TCQ_DESC_SIZE_BITS)
#define TOF_ICC_TCQ_NLINE_BITS (TOF_ICC_CACHE_LINE_SIZE_BITS - TOF_ICC_TCQ_DESC_SIZE_BITS)
#define TOF_ICC_MRQ_DESC_SIZE_BITS 5
#define TOF_ICC_MRQ_DESC_SIZE (1 << TOF_ICC_MRQ_DESC_SIZE_BITS)
#define TOF_ICC_PBQ_DESC_SIZE_BITS 3
#define TOF_ICC_PBQ_DESC_SIZE (1 << TOF_ICC_PBQ_DESC_SIZE_BITS)
#define TOF_ICC_PRQ_DESC_SIZE_BITS 3
#define TOF_ICC_PRQ_DESC_SIZE (1 << TOF_ICC_PRQ_DESC_SIZE_BITS)
#define TOF_ICC_PRQ_NLINE_BITS (TOF_ICC_CACHE_LINE_SIZE_BITS - TOF_ICC_PRQ_DESC_SIZE_BITS)
#define TOF_ICC_TOQ_SIZE_NTYPES 6
#define TOF_ICC_TOQ_SIZE_BITS(size) ((size) * 2 + 11)
#define TOF_ICC_TOQ_SIZE(size) (1 << TOF_ICC_TOQ_SIZE_BITS(size))
#define TOF_ICC_TOQ_LEN(size) (TOF_ICC_TOQ_SIZE(size) * TOF_ICC_TOQ_DESC_SIZE)
#define TOF_ICC_TCQ_LEN(size) (TOF_ICC_TOQ_SIZE(size) * TOF_ICC_TCQ_DESC_SIZE)
#define TOF_ICC_MRQ_SIZE_NTYPES 6
#define TOF_ICC_MRQ_SIZE_BITS(size) ((size) * 2 + 11)
#define TOF_ICC_MRQ_SIZE(size) (1 << TOF_ICC_MRQ_SIZE_BITS(size))
#define TOF_ICC_MRQ_LEN(size) (TOF_ICC_MRQ_SIZE(size) * TOF_ICC_MRQ_DESC_SIZE)
#define TOF_ICC_PBQ_SIZE_NTYPES 6
#define TOF_ICC_PBQ_SIZE_BITS(size) ((size) * 2 + 11)
#define TOF_ICC_PBQ_SIZE(size) (1 << TOF_ICC_PBQ_SIZE_BITS(size))
#define TOF_ICC_PBQ_LEN(size) (TOF_ICC_PBQ_SIZE(size) * TOF_ICC_PBQ_DESC_SIZE)
#define TOF_ICC_PRQ_SIZE_NTYPES 6
#define TOF_ICC_PRQ_SIZE_BITS(size) ((size) * 2 + 11)
#define TOF_ICC_PRQ_SIZE(size) (1 << TOF_ICC_PRQ_SIZE_BITS(size))
#define TOF_ICC_PRQ_LEN(size) (TOF_ICC_PRQ_SIZE(size) * TOF_ICC_PRQ_DESC_SIZE)
#define TOF_ICC_STEERING_TABLE_ALIGN_BITS 8
#define TOF_ICC_STEERING_TABLE_ALIGN (1 << TOF_ICC_STEERING_TABLE_ALIGN_BITS)
#define TOF_ICC_STEERING_SIZE_BITS 4
#define TOF_ICC_STEERING_SIZE (1 << TOF_ICC_STEERING_SIZE_BITS)
#define TOF_ICC_MB_TABLE_ALIGN_BITS 8
#define TOF_ICC_MB_TABLE_ALIGN (1 << TOF_ICC_MB_TABLE_ALIGN_BITS)
#define TOF_ICC_MB_SIZE_BITS 4
#define TOF_ICC_MB_SIZE (1 << TOF_ICC_MB_SIZE_BITS)
#define TOF_ICC_MB_PS_ENCODE(bits) ((bits) % 9 == 3 ? (bits) / 9 - 1 : (bits) / 13 + 3)
#define TOF_ICC_MBPT_ALIGN_BITS 8
#define TOF_ICC_MBPT_ALIGN (1 << TOF_ICC_MBPT_ALIGN_BITS)
#define TOF_ICC_MBPT_SIZE_BITS 3
#define TOF_ICC_MBPT_SIZE (1 << TOF_ICC_MBPT_SIZE_BITS)
#define TOF_ICC_X_BITS 5
#define TOF_ICC_Y_BITS 5
#define TOF_ICC_Z_BITS 5
#define TOF_ICC_A_BITS 1
#define TOF_ICC_B_BITS 2
#define TOF_ICC_C_BITS 1
#define TOF_ICC_MAX_X_SIZE (1 << TOF_ICC_X_BITS)
#define TOF_ICC_MAX_Y_SIZE (1 << TOF_ICC_Y_BITS)
#define TOF_ICC_MAX_Z_SIZE (1 << TOF_ICC_Z_BITS)
#define TOF_ICC_A_SIZE 2
#define TOF_ICC_B_SIZE 3
#define TOF_ICC_C_SIZE 2
#define TOF_ICC_X_MASK ((1 << TOF_ICC_X_BITS) - 1)
#define TOF_ICC_Y_MASK ((1 << TOF_ICC_Y_BITS) - 1)
#define TOF_ICC_Z_MASK ((1 << TOF_ICC_Z_BITS) - 1)
#define TOF_ICC_A_MASK ((1 << TOF_ICC_A_BITS) - 1)
#define TOF_ICC_B_MASK ((1 << TOF_ICC_B_BITS) - 1)
#define TOF_ICC_C_MASK ((1 << TOF_ICC_C_BITS) - 1)
#define TOF_ICC_ABC_SIZE (TOF_ICC_A_SIZE * TOF_ICC_B_SIZE * TOF_ICC_C_SIZE)
static inline int tof_icc_get_framelen(int len)
{
	len = TOF_ICC_RH_LEN + round_up(len + TOF_ICC_ECRC_LEN, TOF_ICC_FRAME_ALIGN);
	if (len < TOF_ICC_FRAME_LEN_MIN) {
		len = TOF_ICC_FRAME_LEN_MIN;
	}
	return len;
}
/** Descriptors **/
/** commands and rcodes **/
enum {
TOF_ICC_TOQ_NOP,
TOF_ICC_TOQ_PUT,
TOF_ICC_TOQ_WRITE_PIGGYBACK_BUFFER,
TOF_ICC_TOQ_PUT_PIGGYBACK,
TOF_ICC_TOQ_GET,
TOF_ICC_TOQ_GETL,
TOF_ICC_TOQ_ATOMIC_READ_MODIFY_WRITE = 0xe,
TOF_ICC_TOQ_TRANSMIT_RAW_PACKET1 = 0x10,
TOF_ICC_TOQ_TRANSMIT_RAW_PACKET2,
TOF_ICC_TOQ_TRANSMIT_SYSTEM_PACKET1,
TOF_ICC_TOQ_TRANSMIT_SYSTEM_PACKET2,
TOF_ICC_TOQ_NCOMMANDS,
};
enum {
TOF_ICC_MRQ_ATOMIC_READ_MODIFY_WRITE_HALFWAY_NOTICE = 0x1,
TOF_ICC_MRQ_ATOMIC_READ_MODIFY_WRITE_NOTICE,
TOF_ICC_MRQ_ATOMIC_READ_MODIFY_WRITE_REMOTE_ERROR,
TOF_ICC_MRQ_PUT_HALFWAY_NOTICE,
TOF_ICC_MRQ_PUT_LAST_HALFWAY_NOTICE,
TOF_ICC_MRQ_GET_HALFWAY_NOTICE,
TOF_ICC_MRQ_GET_LAST_HALFWAY_NOTICE,
TOF_ICC_MRQ_PUT_NOTICE,
TOF_ICC_MRQ_PUT_LAST_NOTICE,
TOF_ICC_MRQ_GET_NOTICE,
TOF_ICC_MRQ_GET_LAST_NOTICE,
TOF_ICC_MRQ_PUT_REMOTE_ERROR,
TOF_ICC_MRQ_PUT_LAST_REMOTE_ERROR,
TOF_ICC_MRQ_GET_REMOTE_ERROR,
TOF_ICC_MRQ_GET_LAST_REMOTE_ERROR,
TOF_ICC_MRQ_NCOMMANDS,
};
enum {
TOF_ICC_PRQ_UNKNOWN_TLP,
TOF_ICC_PRQ_SYSTEM_TLP,
TOF_ICC_PRQ_ADDRESS_RANGE_EXCEPTION = 0x6,
TOF_ICC_PRQ_CQ_EXCEPTION = 0x8,
TOF_ICC_PRQ_ILLEGAL_TLP_FLAGS,
TOF_ICC_PRQ_ILLEGAL_TLP_LENGTH,
TOF_ICC_PRQ_CQ_ERROR = 0xc,
};
/** structures **/
struct tof_icc_steering_entry {
uint64_t res1:6;
uint64_t readonly:1;
uint64_t enable:1;
uint64_t mbva:32;
uint64_t res2:8;
uint64_t mbid:16;
uint64_t length; /* for optimization */
};
struct tof_icc_mb_entry {
uint64_t ps:3;
uint64_t res1:4;
uint64_t enable:1;
uint64_t ipa:32;
uint64_t res2:24;
uint64_t npage; /* for optimization */
};
struct tof_icc_mbpt_entry {
uint64_t res1:7;
uint64_t enable:1;
uint64_t res2:4;
uint64_t ipa:28;
uint64_t res3:24;
};
struct tof_icc_cq_stag_offset {
uint64_t offset:40;
uint64_t stag:18;
uint64_t cqid:6;
};
struct tof_icc_toq_common_header1 {
uint8_t interrupt:1;
uint8_t res1:4;
uint8_t source_type:2;
uint8_t flip:1;
uint8_t command;
union {
uint8_t mtu;
struct {
uint8_t res:4;
uint8_t op:4;
} armw;
} mtuop;
uint8_t sps:4;
uint8_t pa:1;
uint8_t pb:2;
uint8_t pc:1;
uint8_t rx;
uint8_t ry;
uint8_t rz;
uint8_t ra:1;
uint8_t rb:2;
uint8_t rc:1;
uint8_t res3:1;
uint8_t ri:3;
};
struct tof_icc_toq_common_header2 {
uint8_t gap;
uint8_t s:1;
uint8_t r:1;
uint8_t q:1;
uint8_t p:1;
uint8_t res1:1;
uint8_t j:1;
uint8_t res2:2;
uint16_t edata;
union{
struct {
uint32_t length:24;
uint32_t res:8;
} normal;
struct {
uint32_t length:6;
uint32_t res:26;
} piggyback;
} len;
};
struct tof_icc_toq_descriptor {
struct tof_icc_toq_common_header1 head1;
uint64_t res[3];
};
struct tof_icc_toq_nop {
struct tof_icc_toq_common_header1 head1;
uint64_t res[3];
};
struct tof_icc_toq_put {
struct tof_icc_toq_common_header1 head1;
struct tof_icc_toq_common_header2 head2;
struct tof_icc_cq_stag_offset remote;
struct tof_icc_cq_stag_offset local;
};
struct tof_icc_toq_write_piggyback_buffer {
struct tof_icc_toq_common_header1 head1;
uint64_t data[3];
};
struct tof_icc_toq_put_piggyback {
struct tof_icc_toq_common_header1 head1;
struct tof_icc_toq_common_header2 head2;
struct tof_icc_cq_stag_offset remote;
uint64_t data;
};
struct tof_icc_toq_get {
struct tof_icc_toq_common_header1 head1;
struct tof_icc_toq_common_header2 head2;
struct tof_icc_cq_stag_offset remote;
struct tof_icc_cq_stag_offset local;
};
struct tof_icc_toq_atomic_read_modify_write {
struct tof_icc_toq_common_header1 head1;
struct tof_icc_toq_common_header2 head2;
struct tof_icc_cq_stag_offset remote;
uint64_t data;
};
struct tof_icc_toq_transmit_raw_packet1 {
struct tof_icc_toq_common_header1 head1;
uint8_t gap;
uint8_t res4[3];
uint32_t length:12;
uint32_t res5:20;
uint64_t res6;
uint64_t pa:48; /* for optimization */
uint64_t res7:16;
};
struct tof_icc_toq_transmit_raw_packet2 {
uint8_t interrupt:1;
uint8_t res1:4;
uint8_t source_type:2;
uint8_t flip:1;
uint8_t command;
uint8_t res2:7;
uint8_t e:1;
uint8_t res3[4];
uint8_t port:5;
uint8_t res4:1;
uint8_t vc:2;
uint8_t gap;
uint8_t res5[3];
uint32_t length:12;
uint32_t res6:20;
uint64_t res7;
uint64_t pa:48; /* for optimization */
uint64_t res8:16;
};
struct tof_icc_toq_transmit_system_packet {
struct tof_icc_toq_common_header1 head1; /* rx, ry, rz should be rdx, rdy, rdz */
uint8_t gap;
uint8_t res4[3];
uint32_t length:12;
uint32_t res5:20;
uint64_t res6;
uint64_t pa:48; /* for optimization */
uint64_t res7:16;
};
struct tof_icc_tcq_descriptor {
uint8_t res1:5;
uint8_t counter_unmatch:1;
uint8_t res2:1;
uint8_t flip:1;
uint8_t rcode;
uint8_t res3[2];
union{
struct {
uint32_t length:24;
uint32_t res:8;
} normal;
struct {
uint32_t length:6;
uint32_t res:26;
} piggyback;
} len;
};
struct tof_icc_mrq_common_header1 {
uint8_t res1:7;
uint8_t flip:1;
uint8_t id;
uint8_t rcode;
uint8_t res2:4;
uint8_t pa:1;
uint8_t pb:2;
uint8_t pc:1;
uint8_t x;
uint8_t y;
uint8_t z;
uint8_t a:1;
uint8_t b:2;
uint8_t c:1;
uint8_t res3:1;
uint8_t i:3;
};
struct tof_icc_mrq_common_header2 {
uint8_t res1;
uint8_t res2:4;
uint8_t initial:1;
uint8_t res3:3;
uint16_t edata;
union {
struct {
uint32_t length:11;
uint32_t res:21;
} normal;
struct {
uint32_t op:4;
uint32_t res:28;
} armw;
} lenop;
};
struct tof_icc_mrq_atomic_read_modify_write_halfway_notice {
struct tof_icc_mrq_common_header1 head1;
struct tof_icc_mrq_common_header2 head2;
struct tof_icc_cq_stag_offset local;
struct tof_icc_cq_stag_offset remote;
};
struct tof_icc_mrq_descriptor {
struct tof_icc_mrq_common_header1 head1;
struct tof_icc_mrq_common_header2 head2;
struct tof_icc_cq_stag_offset cso1;
struct tof_icc_cq_stag_offset cso2;
};
struct tof_icc_pbq_descriptor {
uint64_t res1:7;
uint64_t f:1;
uint64_t res2:3;
uint64_t pa:29;
uint64_t res3:24;
};
struct tof_icc_prq_descriptor {
uint64_t rcode:7;
uint64_t f:1;
uint64_t res1:3;
uint64_t pa:29;
uint64_t res2:8;
uint64_t w:1;
uint64_t res3:5;
uint64_t l:1;
uint64_t e:1;
uint64_t res4:8;
};
/** Registers **/
/* useful packed structures */
struct tof_icc_reg_subnet {
uint64_t lz:6;
uint64_t sz:6;
uint64_t nz:6;
uint64_t ly:6;
uint64_t sy:6;
uint64_t ny:6;
uint64_t lx:6;
uint64_t sx:6;
uint64_t nx:6;
uint64_t res:10;
};
struct tof_icc_reg_bg_address {
uint32_t bgid:6;
uint32_t tni:3;
uint32_t c:1;
uint32_t b:2;
uint32_t a:1;
uint32_t z:5;
uint32_t y:5;
uint32_t x:5;
uint32_t pc:1;
uint32_t pb:2;
uint32_t pa:1;
};
/* relative offset of interrupt controller registers */
#define TOF_ICC_IRQREG_IRR 0x0
#define TOF_ICC_IRQREG_IMR 0x8
#define TOF_ICC_IRQREG_IRC 0x10
#define TOF_ICC_IRQREG_IMC 0x18
#define TOF_ICC_IRQREG_ICL 0x20
/* TOFU REGISTERS */
#define tof_icc_reg_pa 0x40000000
/* CQ */
#define TOF_ICC_REG_CQ_PA(tni, cqid) (tof_icc_reg_pa + 0 + (tni) * 0x1000000 + (cqid) * 0x10000)
#define TOF_ICC_REG_CQ_TOQ_DIRECT_DESCRIPTOR 0x0
#define TOF_ICC_REG_CQ_TOQ_FETCH_START 0x40
#define TOF_ICC_REG_CQ_MRQ_FULL_POINTER 0x48
#define TOF_ICC_REG_CQ_TOQ_PIGGYBACK_BUFFER0 0x50
#define TOF_ICC_REG_CQ_TOQ_PIGGYBACK_BUFFER1 0x58
#define TOF_ICC_REG_CQ_TOQ_PIGGYBACK_BUFFER2 0x60
#define TOF_ICC_REG_CQ_TCQ_NUM_NOTICE 0x68
#define TOF_ICC_REG_CQ_MRQ_NUM_NOTICE 0x70
#define TOF_ICC_REG_CQ_TX_PAYLOAD_BYTE 0x78
#define TOF_ICC_REG_CQ_RX_PAYLOAD_BYTE 0x80
#define TOF_ICC_REG_CQ_DUMP_START 0x0
#define TOF_ICC_REG_CQ_DUMP_END 0x88
/* BCH */
#define TOF_ICC_REG_BCH_PA(tni, bgid) (tof_icc_reg_pa + 0x0000e00000 + (tni) * 0x1000000 + (bgid) * 0x10000)
#define TOF_ICC_REG_BCH_IDATA 0x800
#define TOF_ICC_REG_BCH_READY 0x840
#define TOF_ICC_REG_BCH_READY_STATE BIT(63)
#define TOF_ICC_REG_BCH_IGNORED_SIGNAL_COUNT 0x848
#define TOF_ICC_REG_BCH_DUMP_START 0x800
#define TOF_ICC_REG_BCH_DUMP_END 0x850
/* CQS */
#define TOF_ICC_REG_CQS_PA(tni, cqid) (tof_icc_reg_pa + 0x0000400000 + (tni) * 0x1000000 + (cqid) * 0x10000)
#define TOF_ICC_REG_CQS_STATUS 0x0
#define TOF_ICC_REG_CQS_STATUS_DESCRIPTOR_PROCESS_STOP BIT(63)
#define TOF_ICC_REG_CQS_STATUS_DESCRIPTOR_FETCH_STOP BIT(62)
#define TOF_ICC_REG_CQS_STATUS_BLANK_ENTRY_FLIP_BIT BIT(61)
#define TOF_ICC_REG_CQS_STATUS_CACHE_FLUSH_BUSY BIT(60)
#define TOF_ICC_REG_CQS_STATUS_CQ_ENABLE BIT(59)
#define TOF_ICC_REG_CQS_STATUS_SESSION_DEAD BIT(58)
#define TOF_ICC_REG_CQS_STATUS_SESSION_OFFSET_OVERFLOW BIT(57)
#define TOF_ICC_REG_CQS_STATUS_SESSION_OFFSET GENMASK(56, 32)
#define TOF_ICC_REG_CQS_STATUS_NEXT_DESCRIPTOR_OFFSET GENMASK(29, 5)
#define TOF_ICC_REG_CQS_ENABLE 0x8
#define TOF_ICC_REG_CQS_CACHE_FLUSH 0x10
#define TOF_ICC_REG_CQS_FETCH_STOP 0x18
#define TOF_ICC_REG_CQS_MODE 0x20
#define TOF_ICC_REG_CQS_MODE_SYSTEM BIT(63)
#define TOF_ICC_REG_CQS_MODE_TRP2_ENABLE BIT(62)
#define TOF_ICC_REG_CQS_MODE_TRP1_ENABLE BIT(61)
#define TOF_ICC_REG_CQS_MODE_SESSION BIT(60)
#define TOF_ICC_REG_CQS_MODE_SUBNET_NX GENMASK(53, 48)
#define TOF_ICC_REG_CQS_MODE_SUBNET_SX GENMASK(47, 42)
#define TOF_ICC_REG_CQS_MODE_SUBNET_LX GENMASK(41, 36)
#define TOF_ICC_REG_CQS_MODE_SUBNET_NY GENMASK(35, 30)
#define TOF_ICC_REG_CQS_MODE_SUBNET_SY GENMASK(29, 24)
#define TOF_ICC_REG_CQS_MODE_SUBNET_LY GENMASK(23, 18)
#define TOF_ICC_REG_CQS_MODE_SUBNET_NZ GENMASK(17, 12)
#define TOF_ICC_REG_CQS_MODE_SUBNET_SZ GENMASK(11, 6)
#define TOF_ICC_REG_CQS_MODE_SUBNET_LZ GENMASK(5, 0)
#define TOF_ICC_REG_CQS_GPID 0x28
#define TOF_ICC_REG_CQS_TOQ_IPA 0x30
#define TOF_ICC_REG_CQS_TOQ_SIZE 0x38
#define TOF_ICC_REG_CQS_TCQ_IPA 0x40
#define TOF_ICC_REG_CQS_TCQ_IPA_CACHE_INJECTION BIT(63)
#define TOF_ICC_REG_CQS_MRQ_IPA 0x48
#define TOF_ICC_REG_CQS_MRQ_IPA_CACHE_INJECTION BIT(63)
#define TOF_ICC_REG_CQS_MRQ_SIZE 0x50
#define TOF_ICC_REG_CQS_MRQ_MASK 0x58
#define TOF_ICC_REG_CQS_TCQ_DESCRIPTOR_COALESCING_TIMER 0x60
#define TOF_ICC_REG_CQS_MRQ_DESCRIPTOR_COALESCING_TIMER 0x68
#define TOF_ICC_REG_CQS_MRQ_INTERRUPT_COALESCING_TIMER 0x70
#define TOF_ICC_REG_CQS_MRQ_INTERRUPT_COALESCING_COUNT 0x78
#define TOF_ICC_REG_CQS_TOQ_DIRECT_SOURCE_COUNT 0x80
#define TOF_ICC_REG_CQS_TOQ_DIRECT_DESCRIPTOR_COUNT 0x88
#define TOF_ICC_REG_CQS_MEMORY_BLOCK_TABLE_ENABLE 0x90
#define TOF_ICC_REG_CQS_MEMORY_BLOCK_TABLE_IPA 0x98
#define TOF_ICC_REG_CQS_MEMORY_BLOCK_TABLE_SIZE 0xa0
#define TOF_ICC_REG_CQS_STEERING_TABLE_ENABLE 0xa8
#define TOF_ICC_REG_CQS_STEERING_TABLE_IPA 0xb0
#define TOF_ICC_REG_CQS_STEERING_TABLE_SIZE 0xb8
#define TOF_ICC_REG_CQS_MRQ_INTERRUPT_MASK 0xc0
#define TOF_ICC_REG_CQS_IRR 0xc8
#define TOF_ICC_REG_CQS_IMR 0xd0
#define TOF_ICC_REG_CQS_IRC 0xd8
#define TOF_ICC_REG_CQS_IMC 0xe0
#define TOF_ICC_REG_CQS_ICL 0xe8
#define TOF_ICC_REG_CQS_DUMP_START 0x0
#define TOF_ICC_REG_CQS_DUMP_END 0xf0
/* BGS */
#define TOF_ICC_REG_BGS_PA(tni, bgid) (tof_icc_reg_pa + 0x0000800000 + (tni) * 0x1000000 + (bgid) * 0x10000)
#define TOF_ICC_REG_BGS_ENABLE 0x0
#define TOF_ICC_REG_BGS_IRR 0x8
#define TOF_ICC_REG_BGS_IMR 0x10
#define TOF_ICC_REG_BGS_IRC 0x18
#define TOF_ICC_REG_BGS_IMC 0x20
#define TOF_ICC_REG_BGS_ICL 0x28
#define TOF_ICC_REG_BGS_STATE 0x30
#define TOF_ICC_REG_BGS_STATE_ENABLE BIT(0)
#define TOF_ICC_REG_BGS_EXCEPTION_INFO_GPID_UNMATCH 0x38
#define TOF_ICC_REG_BGS_EXCEPTION_INFO_GPID_UNMATCH_BG_ADDRESS GENMASK(27, 0)
#define TOF_ICC_REG_BGS_EXCEPTION_INFO_ADDRESS_UNMATCH 0x40
#define TOF_ICC_REG_BGS_EXCEPTION_INFO_ADDRESS_UNMATCH_BG_ADDRESS GENMASK(27, 0)
#define TOF_ICC_REG_BGS_SIGNAL_A 0x48
#define TOF_ICC_REG_BGS_SIGNAL_A_SIG_RECV BIT(63)
#define TOF_ICC_REG_BGS_SIGNAL_A_TLP_RECV BIT(62)
#define TOF_ICC_REG_BGS_SIGNAL_A_SIG_SEND BIT(61)
#define TOF_ICC_REG_BGS_SIGNAL_A_OP_TYPE GENMASK(3, 0)
#define TOF_ICC_REG_BGS_SIGNAL_B 0x50
#define TOF_ICC_REG_BGS_SIGNAL_B_SIG_RECV BIT(63)
#define TOF_ICC_REG_BGS_SIGNAL_B_TLP_RECV BIT(62)
#define TOF_ICC_REG_BGS_SIGNAL_B_SIG_SEND BIT(61)
#define TOF_ICC_REG_BGS_SIGNAL_B_OP_TYPE GENMASK(3, 0)
#define TOF_ICC_REG_BGS_SIGNAL_MASK 0x58
#define TOF_ICC_REG_BGS_SIGNAL_MASK_SIG_RECV BIT(63)
#define TOF_ICC_REG_BGS_SIGNAL_MASK_TLP_RECV BIT(62)
#define TOF_ICC_REG_BGS_SIGNAL_MASK_SIG_SEND BIT(61)
#define TOF_ICC_REG_BGS_SIGNAL_MASK_TLP_SEND BIT(60)
#define TOF_ICC_REG_BGS_LOCAL_LINK 0x60
#define TOF_ICC_REG_BGS_LOCAL_LINK_BGID_RECV GENMASK(37, 32)
#define TOF_ICC_REG_BGS_LOCAL_LINK_BGID_SEND GENMASK(5, 0)
#define TOF_ICC_REG_BGS_REMOTE_LINK 0x68
#define TOF_ICC_REG_BGS_REMOTE_LINK_BG_ADDRESS_RECV GENMASK(59, 32)
#define TOF_ICC_REG_BGS_REMOTE_LINK_BG_ADDRESS_SEND GENMASK(31, 0)
#define TOF_ICC_REG_BGS_SUBNET_SIZE 0x70
#define TOF_ICC_REG_BGS_GPID_BSEQ 0x78
#define TOF_ICC_REG_BGS_DATA_A0 0x108
#define TOF_ICC_REG_BGS_DATA_AE 0x178
#define TOF_ICC_REG_BGS_DATA_B0 0x188
#define TOF_ICC_REG_BGS_DATA_BE 0x1f8
#define TOF_ICC_REG_BGS_BCH_MASK 0x800
#define TOF_ICC_REG_BGS_BCH_MASK_MASK BIT(63)
#define TOF_ICC_REG_BGS_BCH_MASK_STATUS 0x808
#define TOF_ICC_REG_BGS_BCH_MASK_STATUS_RUN BIT(63)
#define TOF_ICC_REG_BGS_BCH_NOTICE_IPA 0x810
#define TOF_ICC_REG_BGS_DUMP_START 0x0
#define TOF_ICC_REG_BGS_DUMP_END 0x818
/* TNI */
#define TOF_ICC_REG_TNI_PA(tni) (tof_icc_reg_pa + 0x0000c00000 + (tni) * 0x1000000)
#define TOF_ICC_REG_TNI_IRR 0x8
#define TOF_ICC_REG_TNI_IMR 0x10
#define TOF_ICC_REG_TNI_IRC 0x18
#define TOF_ICC_REG_TNI_IMC 0x20
#define TOF_ICC_REG_TNI_ICL 0x28
#define TOF_ICC_REG_TNI_STATE 0x30
#define TOF_ICC_REG_TNI_STATE_MASK GENMASK(1, 0)
#define TOF_ICC_REG_TNI_STATE_DISABLE 0
#define TOF_ICC_REG_TNI_STATE_NORMAL 2
#define TOF_ICC_REG_TNI_STATE_ERROR 3
#define TOF_ICC_REG_TNI_ENABLE 0x38
#define TOF_ICC_REG_TNI_CQ_PRESENT 0x40
#define TOF_ICC_REG_TNI_EXCEPTION_INFO_INACTIVE_BG 0x48
#define TOF_ICC_REG_TNI_EXCEPTION_INFO_INACTIVE_BG_DEST_BG GENMASK(37, 32)
#define TOF_ICC_REG_TNI_EXCEPTION_INFO_INACTIVE_BG_SOURCE_BG_ADDRESS GENMASK(27, 0)
#define TOF_ICC_REG_TNI_PRQ_FULL_POINTER 0x100
#define TOF_ICC_REG_TNI_PBQ_PA 0x108
#define TOF_ICC_REG_TNI_PBQ_SIZE 0x110
#define TOF_ICC_REG_TNI_PRQ_PA 0x118
#define TOF_ICC_REG_TNI_PRQ_PA_CACHE_INJECTION BIT(63)
#define TOF_ICC_REG_TNI_PRQ_SIZE 0x120
#define TOF_ICC_REG_TNI_PRQ_MASK 0x128
#define TOF_ICC_REG_TNI_PRQ_ENTRY_COALESCING_TIMER 0x130
#define TOF_ICC_REG_TNI_PRQ_INTERRUPT_COALESCING_TIMER 0x138
#define TOF_ICC_REG_TNI_PRQ_INTERRUPT_COALESCING_COUNT 0x140
#define TOF_ICC_REG_TNI_SEND_COUNT 0x148
#define TOF_ICC_REG_TNI_NO_SEND_COUNT 0x150
#define TOF_ICC_REG_TNI_BLOCK_SEND_COUNT 0x158
#define TOF_ICC_REG_TNI_RECEIVE_COUNT 0x160
#define TOF_ICC_REG_TNI_NO_RECEIVE_COUNT 0x168
#define TOF_ICC_REG_TNI_NUM_SEND_TLP 0x170
#define TOF_ICC_REG_TNI_BYTE_SEND_TLP 0x178
#define TOF_ICC_REG_TNI_NUM_SEND_SYSTEM_TLP 0x180
#define TOF_ICC_REG_TNI_NUM_RECEIVE_TLP 0x188
#define TOF_ICC_REG_TNI_BYTE_RECEIVE_TLP 0x190
#define TOF_ICC_REG_TNI_NUM_RECEIVE_NULLIFIED_TLP 0x198
#define TOF_ICC_REG_TNI_RX_NUM_UNKNOWN_TLP 0x1a0
#define TOF_ICC_REG_TNI_RX_NUM_SYSTEM_TLP 0x1a8
#define TOF_ICC_REG_TNI_RX_NUM_EXCEPTION_TLP 0x1b0
#define TOF_ICC_REG_TNI_RX_NUM_DISCARD_UNKNOWN_TLP 0x1b8
#define TOF_ICC_REG_TNI_RX_NUM_DISCARD_SYSTEM_TLP 0x1c0
#define TOF_ICC_REG_TNI_RX_NUM_DISCARD_EXCEPTION_TLP 0x1c8
#define TOF_ICC_REG_TNI_DUMP_START 0x8
#define TOF_ICC_REG_TNI_DUMP_END 0x1d0
/* Port */
#define TOF_ICC_REG_PORT_PA(port) (tof_icc_reg_pa + 0x0006000000 + (port) * 0x1000)
#define TOF_ICC_REG_PORT_TX_VC0_ZERO_CREDIT_COUNT 0x0
#define TOF_ICC_REG_PORT_TX_VC1_ZERO_CREDIT_COUNT 0x8
#define TOF_ICC_REG_PORT_TX_VC2_ZERO_CREDIT_COUNT 0x10
#define TOF_ICC_REG_PORT_TX_VC3_ZERO_CREDIT_COUNT 0x18
#define TOF_ICC_REG_PORT_FREE_RUN_COUNT 0x80
#define TOF_ICC_REG_PORT_NUM_SEND_DLLP 0xc0
#define TOF_ICC_REG_PORT_NUM_SEND_TLP 0xc8
#define TOF_ICC_REG_PORT_BYTE_SEND_TLP 0xd0
#define TOF_ICC_REG_PORT_NUM_SEND_SYSTEM_TLP 0xd8
#define TOF_ICC_REG_PORT_NUM_SEND_NULLIFIED_TLP 0xe0
#define TOF_ICC_REG_PORT_NUM_TX_DISCARD_SYSTEM_TLP 0xe8
#define TOF_ICC_REG_PORT_NUM_TX_DISCARD_NORMAL_TLP 0xf0
#define TOF_ICC_REG_PORT_NUM_TX_FILTERED_NORMAL_TLP 0xf8
#define TOF_ICC_REG_PORT_NUM_VIRTUAL_CUT_THROUGH_TLP 0x100
#define TOF_ICC_REG_PORT_NUM_GENERATE_NULLIFIED_TLP 0x108
#define TOF_ICC_REG_PORT_NUM_RECEIVE_DLLP 0x110
#define TOF_ICC_REG_PORT_NUM_RECEIVE_TLP 0x118
#define TOF_ICC_REG_PORT_BYTE_RECEIVE_TLP 0x120
#define TOF_ICC_REG_PORT_NUM_RECEIVE_SYSTEM_TLP 0x128
#define TOF_ICC_REG_PORT_NUM_RECEIVE_NULLIFIED_TLP 0x130
#define TOF_ICC_REG_PORT_NUM_RX_DISCARD_SYSTEM_TLP 0x138
#define TOF_ICC_REG_PORT_NUM_RX_DISCARD_NORMAL_TLP 0x140
#define TOF_ICC_REG_PORT_NUM_RX_FILTERED_NORMAL_TLP 0x158
#define TOF_ICC_REG_PORT_NUM_RX_DISCARD_NULLIFIED_TLP 0x160
#define TOF_ICC_REG_PORT_FRAME_LCRC_ERROR_COUNT 0x170
#define TOF_ICC_REG_PORT_TX_RETRY_BUFFER_CE_COUNT 0x180
#define TOF_ICC_REG_PORT_RX_VC_BUFFER_CE_COUNT 0x188
#define TOF_ICC_REG_PORT_XB_CE_COUNT 0x190
#define TOF_ICC_REG_PORT_ACK_NACK_TIME_OUT_COUNT 0x198
#define TOF_ICC_REG_PORT_SLICE0_FCS_ERROR_COUNT 0x1a0
#define TOF_ICC_REG_PORT_SLICE1_FCS_ERROR_COUNT 0x1a8
#define TOF_ICC_REG_PORT_DUMP_START 0x0
#define TOF_ICC_REG_PORT_DUMP_END 0x1b0
/* XB */
#define TOF_ICC_REG_XB_PA (tof_icc_reg_pa + 0x000600f000)
#define TOF_ICC_REG_XB_STQ_ENABLE 0x0
#define TOF_ICC_REG_XB_STQ_UPDATE_INTERVAL 0x8
#define TOF_ICC_REG_XB_STQ_PA 0x10
#define TOF_ICC_REG_XB_STQ_SIZE 0x18
#define TOF_ICC_REG_XB_STQ_NEXT_OFFSET 0x20
#define TOF_ICC_REG_XB_DUMP_START 0x0
#define TOF_ICC_REG_XB_DUMP_END 0x28
#define TOF_ICC_XB_TC_DATA_CYCLE_COUNT(tni) ((tni) * 0x10 + 0x0)
#define TOF_ICC_XB_TC_WAIT_CYCLE_COUNT(tni) ((tni) * 0x10 + 0x8)
#define TOF_ICC_XB_TD_DATA_CYCLE_COUNT(tnr) ((tnr) * 0x10 + 0x60)
#define TOF_ICC_XB_TD_WAIT_CYCLE_COUNT(tnr) ((tnr) * 0x10 + 0x68)
/* Tofu */
#define TOF_ICC_REG_TOFU_PA (tof_icc_reg_pa + 0x0007000000)
#define TOF_ICC_REG_TOFU_NODE_ADDRESS 0x0
#define TOF_ICC_REG_TOFU_NODE_ADDRESS_X GENMASK(22, 18)
#define TOF_ICC_REG_TOFU_NODE_ADDRESS_Y GENMASK(17, 13)
#define TOF_ICC_REG_TOFU_NODE_ADDRESS_Z GENMASK(12, 8)
#define TOF_ICC_REG_TOFU_NODE_ADDRESS_A BIT(7)
#define TOF_ICC_REG_TOFU_NODE_ADDRESS_B GENMASK(6, 5)
#define TOF_ICC_REG_TOFU_NODE_ADDRESS_C BIT(4)
#define TOF_ICC_REG_TOFU_PORT_SETTING 0x8
#define TOF_ICC_REG_TOFU_TD_TLP_FILTER(tnr) ((tnr) * 0x10 + 0x10)
#define TOF_ICC_REG_TOFU_TD_SETTINGS(tnr) ((tnr) * 0x10 + 0x18)
#define TOF_ICC_REG_TOFU_TNR_MSI_BASE 0xc0
#define TOF_ICC_REG_TOFU_TNR_IRR 0xc8
#define TOF_ICC_REG_TOFU_TNR_IMR 0xd0
#define TOF_ICC_REG_TOFU_TNR_IRC 0xd8
#define TOF_ICC_REG_TOFU_TNR_IMC 0xe0
#define TOF_ICC_REG_TOFU_TNR_ICL 0xe8
#define TOF_ICC_REG_TOFU_TNI_VMS(tni, vmsid) ((tni) * 0x100 + (vmsid) * 0x8 + 0x100)
#define TOF_ICC_REG_TOFU_TNI_VMS_CQ00(tni) ((tni) * 0x100 + 0x180)
#define TOF_ICC_REG_TOFU_TNI_VMS_BG00(tni) ((tni) * 0x100 + 0x1a0)
#define TOF_ICC_REG_TOFU_TNI_VMS_BG16(tni) ((tni) * 0x100 + 0x1a8)
#define TOF_ICC_REG_TOFU_TNI_VMS_BG32(tni) ((tni) * 0x100 + 0x1b0)
#define TOF_ICC_REG_TOFU_TNI_MSI_BASE(tni) ((tni) * 0x100 + 0x1c0)
#define TOF_ICC_REG_TOFU_DUMP_START 0x0
#define TOF_ICC_REG_TOFU_DUMP_END 0x6c8
/** Interrupts **/
#define TOF_ICC_IRQ_CQS_TOQ_READ_EXCEPTION BIT(0)
#define TOF_ICC_IRQ_CQS_TOQ_DIRECT_DESCRIPTOR_EXCEPTION BIT(1)
#define TOF_ICC_IRQ_CQS_TOQ_MARKED_UE BIT(2)
#define TOF_ICC_IRQ_CQS_TCQ_WRITE_EXCEPTION BIT(3)
#define TOF_ICC_IRQ_CQS_TOQ_SOURCE_TYPE_EXCEPTION BIT(4)
#define TOF_ICC_IRQ_CQS_TCQ_WRITE_ACKNOWLEDGE BIT(5)
#define TOF_ICC_IRQ_CQS_MRQ_WRITE_ACKNOWLEDGE BIT(7)
#define TOF_ICC_IRQ_CQS_MRQ_WRITE_EXCEPTION BIT(8)
#define TOF_ICC_IRQ_CQS_MRQ_OVERFLOW BIT(9)
#define TOF_ICC_IRQ_CQS_STEERING_READ_EXCEPTION BIT(36)
#define TOF_ICC_IRQ_CQS_MB_READ_EXCEPTION BIT(38)
#define TOF_ICC_IRQ_CQS_PAYLOAD_READ_EXCEPTION BIT(39)
#define TOF_ICC_IRQ_CQS_PAYLOAD_WRITE_EXCEPTION BIT(40)
/* Dummy bit, defined only for convenience when handling IRR values; no CQS CACHEFLUSH_TIMEOUT interrupt actually exists */
#define TOF_ICC_DUMMY_IRQ_CQS_CACHEFLUSH_TIMEOUT BIT(63)
#define TOF_ICC_IRQ_BGS_NODE_ADDRESS_UNMATCH BIT(0)
#define TOF_ICC_IRQ_BGS_BG_RECV_ADDRESS_EXCEPTION BIT(1)
#define TOF_ICC_IRQ_BGS_BG_SEND_ADDRESS_EXCEPTION BIT(2)
#define TOF_ICC_IRQ_BGS_GPID_UNMATCH BIT(3)
#define TOF_ICC_IRQ_BGS_BSEQ_UNMATCH BIT(4)
#define TOF_ICC_IRQ_BGS_SIGNAL_STATE_ERROR BIT(5)
#define TOF_ICC_IRQ_BGS_SYNCHRONIZATION_ACKNOWLEDGE BIT(24)
#define TOF_ICC_IRQ_BGS_ERROR_SYNCHRONIZATION_ACKNOWLEDGE BIT(25)
#define TOF_ICC_IRQ_BGS_DMA_COMPLETION_EXCEPTION BIT(26)
#define TOF_ICC_IRQ_TNI_PBQ_READ_EXCEPTION BIT(0)
#define TOF_ICC_IRQ_TNI_PBQ_MARKED_UE BIT(1)
#define TOF_ICC_IRQ_TNI_PBQ_UNDERFLOW BIT(2)
#define TOF_ICC_IRQ_TNI_PRQ_PACKET_DISCARD BIT(3)
#define TOF_ICC_IRQ_TNI_PRQ_WRITE_ACKNOWLEDGE BIT(4)
#define TOF_ICC_IRQ_TNI_PRQ_WRITE_EXCEPTION BIT(5)
#define TOF_ICC_IRQ_TNI_PRQ_OVERFLOW BIT(6)
#define TOF_ICC_IRQ_TNI_INACTIVE_BG BIT(16)
#define TOF_ICC_IRQ_TNI_STAGE2_TRANSLATION_FAULT BIT(32)
#define TOF_ICC_IRQ_TNR_TNR0_RX_FILTER_OUT BIT(0)
#define TOF_ICC_IRQ_TNR_TNR0_TX_FILTER_OUT BIT(1)
#define TOF_ICC_IRQ_TNR_TNR0_PORT_ERROR BIT(2)
#define TOF_ICC_IRQ_TNR_TNR0_DATELINE_ERROR BIT(3)
#define TOF_ICC_IRQ_TNR_TNR0_ROUTING_ERROR BIT(4)
#define TOF_ICC_IRQ_TNR_TNR1_RX_FILTER_OUT BIT(6)
#define TOF_ICC_IRQ_TNR_TNR1_TX_FILTER_OUT BIT(7)
#define TOF_ICC_IRQ_TNR_TNR1_PORT_ERROR BIT(8)
#define TOF_ICC_IRQ_TNR_TNR1_DATELINE_ERROR BIT(9)
#define TOF_ICC_IRQ_TNR_TNR1_ROUTING_ERROR BIT(10)
#define TOF_ICC_IRQ_TNR_TNR2_RX_FILTER_OUT BIT(12)
#define TOF_ICC_IRQ_TNR_TNR2_TX_FILTER_OUT BIT(13)
#define TOF_ICC_IRQ_TNR_TNR2_PORT_ERROR BIT(14)
#define TOF_ICC_IRQ_TNR_TNR2_DATELINE_ERROR BIT(15)
#define TOF_ICC_IRQ_TNR_TNR2_ROUTING_ERROR BIT(16)
#define TOF_ICC_IRQ_TNR_TNR3_RX_FILTER_OUT BIT(18)
#define TOF_ICC_IRQ_TNR_TNR3_TX_FILTER_OUT BIT(19)
#define TOF_ICC_IRQ_TNR_TNR3_PORT_ERROR BIT(20)
#define TOF_ICC_IRQ_TNR_TNR3_DATELINE_ERROR BIT(21)
#define TOF_ICC_IRQ_TNR_TNR3_ROUTING_ERROR BIT(22)
#define TOF_ICC_IRQ_TNR_TNR4_RX_FILTER_OUT BIT(24)
#define TOF_ICC_IRQ_TNR_TNR4_TX_FILTER_OUT BIT(25)
#define TOF_ICC_IRQ_TNR_TNR4_PORT_ERROR BIT(26)
#define TOF_ICC_IRQ_TNR_TNR4_DATELINE_ERROR BIT(27)
#define TOF_ICC_IRQ_TNR_TNR4_ROUTING_ERROR BIT(28)
#define TOF_ICC_IRQ_TNR_TNR5_RX_FILTER_OUT BIT(30)
#define TOF_ICC_IRQ_TNR_TNR5_TX_FILTER_OUT BIT(31)
#define TOF_ICC_IRQ_TNR_TNR5_PORT_ERROR BIT(32)
#define TOF_ICC_IRQ_TNR_TNR5_DATELINE_ERROR BIT(33)
#define TOF_ICC_IRQ_TNR_TNR5_ROUTING_ERROR BIT(34)
#define TOF_ICC_IRQ_TNR_TNR6_RX_FILTER_OUT BIT(36)
#define TOF_ICC_IRQ_TNR_TNR6_TX_FILTER_OUT BIT(37)
#define TOF_ICC_IRQ_TNR_TNR6_PORT_ERROR BIT(38)
#define TOF_ICC_IRQ_TNR_TNR6_DATELINE_ERROR BIT(39)
#define TOF_ICC_IRQ_TNR_TNR6_ROUTING_ERROR BIT(40)
#define TOF_ICC_IRQ_TNR_TNR7_RX_FILTER_OUT BIT(42)
#define TOF_ICC_IRQ_TNR_TNR7_TX_FILTER_OUT BIT(43)
#define TOF_ICC_IRQ_TNR_TNR7_PORT_ERROR BIT(44)
#define TOF_ICC_IRQ_TNR_TNR7_DATELINE_ERROR BIT(45)
#define TOF_ICC_IRQ_TNR_TNR7_ROUTING_ERROR BIT(46)
#define TOF_ICC_IRQ_TNR_TNR8_RX_FILTER_OUT BIT(48)
#define TOF_ICC_IRQ_TNR_TNR8_TX_FILTER_OUT BIT(49)
#define TOF_ICC_IRQ_TNR_TNR8_PORT_ERROR BIT(50)
#define TOF_ICC_IRQ_TNR_TNR8_DATELINE_ERROR BIT(51)
#define TOF_ICC_IRQ_TNR_TNR8_ROUTING_ERROR BIT(52)
#define TOF_ICC_IRQ_TNR_TNR9_RX_FILTER_OUT BIT(54)
#define TOF_ICC_IRQ_TNR_TNR9_TX_FILTER_OUT BIT(55)
#define TOF_ICC_IRQ_TNR_TNR9_PORT_ERROR BIT(56)
#define TOF_ICC_IRQ_TNR_TNR9_DATELINE_ERROR BIT(57)
#define TOF_ICC_IRQ_TNR_TNR9_ROUTING_ERROR BIT(58)
#endif
/* vim: set noet ts=8 sw=8 sts=0 tw=0 : */
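
To make the arithmetic behind the frame-length, queue-size, and register-address macros in `tof_icc.h` concrete, here is a small user-space sketch. It copies a handful of the constants from the header above. The `round_up()` definition is an assumption standing in for the kernel macro (valid only for power-of-two alignments), and `TOF_ICC_REG_PA_BASE` and `cq_reg_pa()` are hypothetical names that mirror `tof_icc_reg_pa` and `TOF_ICC_REG_CQ_PA()` rather than anything defined by the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from tof_icc.h above. */
#define TOF_ICC_NTNIS 6
#define TOF_ICC_NCQS 12
#define TOF_ICC_RH_LEN 8
#define TOF_ICC_ECRC_LEN 4
#define TOF_ICC_FRAME_ALIGN 32
#define TOF_ICC_TLP_LEN(len) (((len) + 1) * TOF_ICC_FRAME_ALIGN)
#define TOF_ICC_FRAME_LEN(len) (TOF_ICC_RH_LEN + TOF_ICC_TLP_LEN(len))
#define TOF_ICC_FRAME_LEN_MIN TOF_ICC_FRAME_LEN(2)
#define TOF_ICC_FRAME_LEN_MAX TOF_ICC_FRAME_LEN(61)
#define TOF_ICC_TOQ_DESC_SIZE_BITS 5
#define TOF_ICC_TOQ_DESC_SIZE (1 << TOF_ICC_TOQ_DESC_SIZE_BITS)
#define TOF_ICC_TOQ_SIZE_BITS(size) ((size) * 2 + 11)
#define TOF_ICC_TOQ_SIZE(size) (1 << TOF_ICC_TOQ_SIZE_BITS(size))
#define TOF_ICC_TOQ_LEN(size) (TOF_ICC_TOQ_SIZE(size) * TOF_ICC_TOQ_DESC_SIZE)

/* Assumption: user-space stand-in for the kernel's round_up();
 * y must be a power of two. */
#define round_up(x, y) ((((x) - 1) | ((y) - 1)) + 1)

/* Same logic as tof_icc_get_framelen() in the header: add the ECRC,
 * pad to the frame alignment, prepend the routing header, and clamp
 * to the minimum frame length. */
static inline int tof_icc_get_framelen(int len)
{
	len = TOF_ICC_RH_LEN + round_up(len + TOF_ICC_ECRC_LEN, TOF_ICC_FRAME_ALIGN);
	if (len < TOF_ICC_FRAME_LEN_MIN) {
		len = TOF_ICC_FRAME_LEN_MIN;
	}
	return len;
}

/* Hypothetical mirror of tof_icc_reg_pa. */
#define TOF_ICC_REG_PA_BASE 0x40000000ULL

/* Hypothetical function form of TOF_ICC_REG_CQ_PA(tni, cqid): each TNI
 * owns a 16 MiB window and each CQ a 64 KiB slot inside it. */
static inline uint64_t cq_reg_pa(unsigned int tni, unsigned int cqid)
{
	assert(tni < TOF_ICC_NTNIS);
	assert(cqid < TOF_ICC_NCQS);
	return TOF_ICC_REG_PA_BASE + tni * 0x1000000ULL + cqid * 0x10000ULL;
}
```

With these definitions, a 100-byte payload yields a 136-byte frame (8-byte header plus 104 bytes padded to 128), anything up to 92 bytes is clamped to the 104-byte minimum, and the smallest TOQ size code (0) corresponds to 2048 descriptors of 32 bytes, or 64 KiB.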


@@ -0,0 +1,345 @@
#ifndef _TOF_UAPI_H_
#define _TOF_UAPI_H_
#ifdef __KERNEL__
#include <linux/types.h>
#else
#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#endif
enum tof_sig_errno_cq {
TOF_TOQ_DIRECT_DESCRIPTOR_EXCEPTION,
TOF_TOQ_SOURCE_TYPE_EXCEPTION,
TOF_MRQ_OVERFLOW,
TOF_CQS_CACHEFLUSH_TIMEOUT,
};
enum tof_sig_errno_bg {
TOF_NODE_ADDRESS_UNMATCH,
TOF_BSEQ_UNMATCH,
TOF_SIGNAL_STATE_ERROR,
TOF_ERROR_SYNCHRONIZATION_ACKNOWLEDGE,
};
#define TOF_UAPI_VERSION 0x2a00
struct tof_init_cq {
uint16_t version;
uint8_t session_mode;
uint8_t toq_size;
uint8_t mrq_size;
uint8_t num_stag;
uint8_t tcq_cinj;
uint8_t mrq_cinj;
void *toq_mem;
void *tcq_mem;
void *mrq_mem;
};
struct tof_alloc_stag {
uint32_t flags;
int stag;
uint64_t offset;
void *va;
uint64_t len;
};
struct tof_free_stags {
uint16_t num;
int *stags;
};
struct tof_addr {
uint8_t pa;
uint8_t pb;
uint8_t pc;
uint8_t x;
uint8_t y;
uint8_t z;
uint8_t a;
uint8_t b;
uint8_t c;
};
struct tof_set_bg {
int tni;
int gate;
int source_lgate;
struct tof_addr source_raddr;
int source_rtni;
int source_rgate;
int dest_lgate;
struct tof_addr dest_raddr;
int dest_rtni;
int dest_rgate;
};
struct tof_enable_bch {
void *addr;
int bseq;
int num;
struct tof_set_bg *bgs;
};
struct tof_set_subnet {
int res0;
int res1;
uint8_t nx;
uint8_t sx;
uint8_t lx;
uint8_t ny;
uint8_t sy;
uint8_t ly;
uint8_t nz;
uint8_t sz;
uint8_t lz;
};
struct tof_reg_user {
uid_t uid;
uint32_t gpid;
struct tof_set_subnet subnet;
uint64_t *cqmask;
uint64_t *bgmask;
};
struct tof_notify_linkdown {
int num;
struct {
uint8_t x;
uint8_t y;
uint8_t z;
uint8_t a;
uint8_t b;
uint8_t c;
uint16_t ports;
} *items;
};
struct tof_get_port_stat {
int port_no;
uint64_t mask;
uint64_t pa[31];
};
struct tof_get_cq_stat {
int tni;
int cqid;
uint64_t txbyte;
uint64_t rxbyte;
};
struct tof_load_register {
uint64_t pa;
uint64_t len;
void *buf;
};
struct tof_load_resource {
uint64_t rsc_id;
uint64_t offset;
uint64_t len;
void *buf;
};
union tof_trans_table_bitfield {
struct {
uint64_t start:36;
uint64_t len:27;
uint64_t ps_code:1;
} bits;
uint64_t atomic;
};
struct tof_trans_table {
union tof_trans_table_bitfield steering;
union tof_trans_table_bitfield mbpt;
};
void tof_utofu_set_linkdown_callback(void (*callback)(int, const void *));
void tof_utofu_unset_linkdown_callback(void);
#define TOF_MMAP_CQ_REGISTER 0
#ifdef __KERNEL__
#define TOF_MMAP_CQ_TRANSTABLE (PAGE_SIZE)
#else
#define TOF_MMAP_CQ_TRANSTABLE (sysconf(_SC_PAGESIZE))
#endif
#define TOF_MMAP_BCH_REGISTER 0
#define TOF_MMAP_XB_STQ 0
#define TOF_ST_RDWR 0x0
#define TOF_ST_RDONLY 0x1
#define TOF_ST_LPG 0x2
#define TOF_STAG_TRANS_PS_CODE_64KB 0
#define TOF_STAG_TRANS_PS_CODE_2MB 1
#define TOF_IOC_MAGIC 'd'
#define TOF_IOCTL_INIT_CQ _IOWR(TOF_IOC_MAGIC, 0, long)
#define TOF_IOCTL_ALLOC_STAG _IOWR(TOF_IOC_MAGIC, 1, long)
#define TOF_IOCTL_FREE_STAGS _IOWR(TOF_IOC_MAGIC, 2, long)
#define TOF_IOCTL_ENABLE_BCH _IOWR(TOF_IOC_MAGIC, 3, long)
#define TOF_IOCTL_DISABLE_BCH _IOWR(TOF_IOC_MAGIC, 4, long)
#define TOF_IOCTL_SET_RT_SIGNAL _IOWR(TOF_IOC_MAGIC, 5, long)
#define TOF_IOCTL_SET_SUBNET _IOWR(TOF_IOC_MAGIC, 6, long)
#define TOF_IOCTL_REG_USER _IOWR(TOF_IOC_MAGIC, 7, long)
#define TOF_IOCTL_NOTIFY_LINKDOWN _IOWR(TOF_IOC_MAGIC, 8, long)
#define TOF_IOCTL_GET_PORT_STAT _IOWR(TOF_IOC_MAGIC, 9, long)
#define TOF_IOCTL_GET_CQ_STAT _IOWR(TOF_IOC_MAGIC, 10, long)
#define TOF_IOCTL_LOAD_REGISTER _IOWR(TOF_IOC_MAGIC, 11, long)
#define TOF_IOCTL_LOAD_RESOURCE _IOWR(TOF_IOC_MAGIC, 12, long)
#ifndef __KERNEL__
#define TOF_INIT_CQ TOF_IOCTL_INIT_CQ
#define TOF_ALLOC_STAG TOF_IOCTL_ALLOC_STAG
#define TOF_FREE_STAGS TOF_IOCTL_FREE_STAGS
#define TOF_ENABLE_BCH TOF_IOCTL_ENABLE_BCH
#define TOF_DISABLE_BCH TOF_IOCTL_DISABLE_BCH
#define TOF_SET_RT_SIGNAL TOF_IOCTL_SET_RT_SIGNAL
#define TOF_SET_SUBNET TOF_IOCTL_SET_SUBNET
#define TOF_REG_USER TOF_IOCTL_REG_USER
#define TOF_NOTIFY_LINKDOWN TOF_IOCTL_NOTIFY_LINKDOWN
#define TOF_GET_PORT_STAT TOF_IOCTL_GET_PORT_STAT
#define TOF_GET_CQ_STAT TOF_IOCTL_GET_CQ_STAT
#define TOF_LOAD_REGISTER TOF_IOCTL_LOAD_REGISTER
#define TOF_LOAD_RESOURCE TOF_IOCTL_LOAD_RESOURCE
#endif
enum {
/* TOQ (0 - 71) */
TOF_RSC_TNI0_TOQ0 = 0, TOF_RSC_TNI0_TOQ1, TOF_RSC_TNI0_TOQ2, TOF_RSC_TNI0_TOQ3,
TOF_RSC_TNI0_TOQ4, TOF_RSC_TNI0_TOQ5, TOF_RSC_TNI0_TOQ6, TOF_RSC_TNI0_TOQ7,
TOF_RSC_TNI0_TOQ8, TOF_RSC_TNI0_TOQ9, TOF_RSC_TNI0_TOQ10, TOF_RSC_TNI0_TOQ11,
TOF_RSC_TNI1_TOQ0, TOF_RSC_TNI1_TOQ1, TOF_RSC_TNI1_TOQ2, TOF_RSC_TNI1_TOQ3,
TOF_RSC_TNI1_TOQ4, TOF_RSC_TNI1_TOQ5, TOF_RSC_TNI1_TOQ6, TOF_RSC_TNI1_TOQ7,
TOF_RSC_TNI1_TOQ8, TOF_RSC_TNI1_TOQ9, TOF_RSC_TNI1_TOQ10, TOF_RSC_TNI1_TOQ11,
TOF_RSC_TNI2_TOQ0, TOF_RSC_TNI2_TOQ1, TOF_RSC_TNI2_TOQ2, TOF_RSC_TNI2_TOQ3,
TOF_RSC_TNI2_TOQ4, TOF_RSC_TNI2_TOQ5, TOF_RSC_TNI2_TOQ6, TOF_RSC_TNI2_TOQ7,
TOF_RSC_TNI2_TOQ8, TOF_RSC_TNI2_TOQ9, TOF_RSC_TNI2_TOQ10, TOF_RSC_TNI2_TOQ11,
TOF_RSC_TNI3_TOQ0, TOF_RSC_TNI3_TOQ1, TOF_RSC_TNI3_TOQ2, TOF_RSC_TNI3_TOQ3,
TOF_RSC_TNI3_TOQ4, TOF_RSC_TNI3_TOQ5, TOF_RSC_TNI3_TOQ6, TOF_RSC_TNI3_TOQ7,
TOF_RSC_TNI3_TOQ8, TOF_RSC_TNI3_TOQ9, TOF_RSC_TNI3_TOQ10, TOF_RSC_TNI3_TOQ11,
TOF_RSC_TNI4_TOQ0, TOF_RSC_TNI4_TOQ1, TOF_RSC_TNI4_TOQ2, TOF_RSC_TNI4_TOQ3,
TOF_RSC_TNI4_TOQ4, TOF_RSC_TNI4_TOQ5, TOF_RSC_TNI4_TOQ6, TOF_RSC_TNI4_TOQ7,
TOF_RSC_TNI4_TOQ8, TOF_RSC_TNI4_TOQ9, TOF_RSC_TNI4_TOQ10, TOF_RSC_TNI4_TOQ11,
TOF_RSC_TNI5_TOQ0, TOF_RSC_TNI5_TOQ1, TOF_RSC_TNI5_TOQ2, TOF_RSC_TNI5_TOQ3,
TOF_RSC_TNI5_TOQ4, TOF_RSC_TNI5_TOQ5, TOF_RSC_TNI5_TOQ6, TOF_RSC_TNI5_TOQ7,
TOF_RSC_TNI5_TOQ8, TOF_RSC_TNI5_TOQ9, TOF_RSC_TNI5_TOQ10, TOF_RSC_TNI5_TOQ11,
/* TCQ (72 - 143) */
TOF_RSC_TNI0_TCQ0, TOF_RSC_TNI0_TCQ1, TOF_RSC_TNI0_TCQ2, TOF_RSC_TNI0_TCQ3,
TOF_RSC_TNI0_TCQ4, TOF_RSC_TNI0_TCQ5, TOF_RSC_TNI0_TCQ6, TOF_RSC_TNI0_TCQ7,
TOF_RSC_TNI0_TCQ8, TOF_RSC_TNI0_TCQ9, TOF_RSC_TNI0_TCQ10, TOF_RSC_TNI0_TCQ11,
TOF_RSC_TNI1_TCQ0, TOF_RSC_TNI1_TCQ1, TOF_RSC_TNI1_TCQ2, TOF_RSC_TNI1_TCQ3,
TOF_RSC_TNI1_TCQ4, TOF_RSC_TNI1_TCQ5, TOF_RSC_TNI1_TCQ6, TOF_RSC_TNI1_TCQ7,
TOF_RSC_TNI1_TCQ8, TOF_RSC_TNI1_TCQ9, TOF_RSC_TNI1_TCQ10, TOF_RSC_TNI1_TCQ11,
TOF_RSC_TNI2_TCQ0, TOF_RSC_TNI2_TCQ1, TOF_RSC_TNI2_TCQ2, TOF_RSC_TNI2_TCQ3,
TOF_RSC_TNI2_TCQ4, TOF_RSC_TNI2_TCQ5, TOF_RSC_TNI2_TCQ6, TOF_RSC_TNI2_TCQ7,
TOF_RSC_TNI2_TCQ8, TOF_RSC_TNI2_TCQ9, TOF_RSC_TNI2_TCQ10, TOF_RSC_TNI2_TCQ11,
TOF_RSC_TNI3_TCQ0, TOF_RSC_TNI3_TCQ1, TOF_RSC_TNI3_TCQ2, TOF_RSC_TNI3_TCQ3,
TOF_RSC_TNI3_TCQ4, TOF_RSC_TNI3_TCQ5, TOF_RSC_TNI3_TCQ6, TOF_RSC_TNI3_TCQ7,
TOF_RSC_TNI3_TCQ8, TOF_RSC_TNI3_TCQ9, TOF_RSC_TNI3_TCQ10, TOF_RSC_TNI3_TCQ11,
TOF_RSC_TNI4_TCQ0, TOF_RSC_TNI4_TCQ1, TOF_RSC_TNI4_TCQ2, TOF_RSC_TNI4_TCQ3,
TOF_RSC_TNI4_TCQ4, TOF_RSC_TNI4_TCQ5, TOF_RSC_TNI4_TCQ6, TOF_RSC_TNI4_TCQ7,
TOF_RSC_TNI4_TCQ8, TOF_RSC_TNI4_TCQ9, TOF_RSC_TNI4_TCQ10, TOF_RSC_TNI4_TCQ11,
TOF_RSC_TNI5_TCQ0, TOF_RSC_TNI5_TCQ1, TOF_RSC_TNI5_TCQ2, TOF_RSC_TNI5_TCQ3,
TOF_RSC_TNI5_TCQ4, TOF_RSC_TNI5_TCQ5, TOF_RSC_TNI5_TCQ6, TOF_RSC_TNI5_TCQ7,
TOF_RSC_TNI5_TCQ8, TOF_RSC_TNI5_TCQ9, TOF_RSC_TNI5_TCQ10, TOF_RSC_TNI5_TCQ11,
/* MRQ (144 - 215) */
TOF_RSC_TNI0_MRQ0, TOF_RSC_TNI0_MRQ1, TOF_RSC_TNI0_MRQ2, TOF_RSC_TNI0_MRQ3,
TOF_RSC_TNI0_MRQ4, TOF_RSC_TNI0_MRQ5, TOF_RSC_TNI0_MRQ6, TOF_RSC_TNI0_MRQ7,
TOF_RSC_TNI0_MRQ8, TOF_RSC_TNI0_MRQ9, TOF_RSC_TNI0_MRQ10, TOF_RSC_TNI0_MRQ11,
TOF_RSC_TNI1_MRQ0, TOF_RSC_TNI1_MRQ1, TOF_RSC_TNI1_MRQ2, TOF_RSC_TNI1_MRQ3,
TOF_RSC_TNI1_MRQ4, TOF_RSC_TNI1_MRQ5, TOF_RSC_TNI1_MRQ6, TOF_RSC_TNI1_MRQ7,
TOF_RSC_TNI1_MRQ8, TOF_RSC_TNI1_MRQ9, TOF_RSC_TNI1_MRQ10, TOF_RSC_TNI1_MRQ11,
TOF_RSC_TNI2_MRQ0, TOF_RSC_TNI2_MRQ1, TOF_RSC_TNI2_MRQ2, TOF_RSC_TNI2_MRQ3,
TOF_RSC_TNI2_MRQ4, TOF_RSC_TNI2_MRQ5, TOF_RSC_TNI2_MRQ6, TOF_RSC_TNI2_MRQ7,
TOF_RSC_TNI2_MRQ8, TOF_RSC_TNI2_MRQ9, TOF_RSC_TNI2_MRQ10, TOF_RSC_TNI2_MRQ11,
TOF_RSC_TNI3_MRQ0, TOF_RSC_TNI3_MRQ1, TOF_RSC_TNI3_MRQ2, TOF_RSC_TNI3_MRQ3,
TOF_RSC_TNI3_MRQ4, TOF_RSC_TNI3_MRQ5, TOF_RSC_TNI3_MRQ6, TOF_RSC_TNI3_MRQ7,
TOF_RSC_TNI3_MRQ8, TOF_RSC_TNI3_MRQ9, TOF_RSC_TNI3_MRQ10, TOF_RSC_TNI3_MRQ11,
TOF_RSC_TNI4_MRQ0, TOF_RSC_TNI4_MRQ1, TOF_RSC_TNI4_MRQ2, TOF_RSC_TNI4_MRQ3,
TOF_RSC_TNI4_MRQ4, TOF_RSC_TNI4_MRQ5, TOF_RSC_TNI4_MRQ6, TOF_RSC_TNI4_MRQ7,
TOF_RSC_TNI4_MRQ8, TOF_RSC_TNI4_MRQ9, TOF_RSC_TNI4_MRQ10, TOF_RSC_TNI4_MRQ11,
TOF_RSC_TNI5_MRQ0, TOF_RSC_TNI5_MRQ1, TOF_RSC_TNI5_MRQ2, TOF_RSC_TNI5_MRQ3,
TOF_RSC_TNI5_MRQ4, TOF_RSC_TNI5_MRQ5, TOF_RSC_TNI5_MRQ6, TOF_RSC_TNI5_MRQ7,
TOF_RSC_TNI5_MRQ8, TOF_RSC_TNI5_MRQ9, TOF_RSC_TNI5_MRQ10, TOF_RSC_TNI5_MRQ11,
/* PBQ (216 - 221) */
TOF_RSC_TNI0_PBQ, TOF_RSC_TNI1_PBQ, TOF_RSC_TNI2_PBQ, TOF_RSC_TNI3_PBQ,
TOF_RSC_TNI4_PBQ, TOF_RSC_TNI5_PBQ,
/* PRQ (222 - 227) */
TOF_RSC_TNI0_PRQ, TOF_RSC_TNI1_PRQ, TOF_RSC_TNI2_PRQ, TOF_RSC_TNI3_PRQ,
TOF_RSC_TNI4_PRQ, TOF_RSC_TNI5_PRQ,
/* STEERINGTABLE (228 - 299) */
TOF_RSC_TNI0_STEERINGTABLE0, TOF_RSC_TNI0_STEERINGTABLE1, TOF_RSC_TNI0_STEERINGTABLE2,
TOF_RSC_TNI0_STEERINGTABLE3, TOF_RSC_TNI0_STEERINGTABLE4, TOF_RSC_TNI0_STEERINGTABLE5,
TOF_RSC_TNI0_STEERINGTABLE6, TOF_RSC_TNI0_STEERINGTABLE7, TOF_RSC_TNI0_STEERINGTABLE8,
TOF_RSC_TNI0_STEERINGTABLE9, TOF_RSC_TNI0_STEERINGTABLE10, TOF_RSC_TNI0_STEERINGTABLE11,
TOF_RSC_TNI1_STEERINGTABLE0, TOF_RSC_TNI1_STEERINGTABLE1, TOF_RSC_TNI1_STEERINGTABLE2,
TOF_RSC_TNI1_STEERINGTABLE3, TOF_RSC_TNI1_STEERINGTABLE4, TOF_RSC_TNI1_STEERINGTABLE5,
TOF_RSC_TNI1_STEERINGTABLE6, TOF_RSC_TNI1_STEERINGTABLE7, TOF_RSC_TNI1_STEERINGTABLE8,
TOF_RSC_TNI1_STEERINGTABLE9, TOF_RSC_TNI1_STEERINGTABLE10, TOF_RSC_TNI1_STEERINGTABLE11,
TOF_RSC_TNI2_STEERINGTABLE0, TOF_RSC_TNI2_STEERINGTABLE1, TOF_RSC_TNI2_STEERINGTABLE2,
TOF_RSC_TNI2_STEERINGTABLE3, TOF_RSC_TNI2_STEERINGTABLE4, TOF_RSC_TNI2_STEERINGTABLE5,
TOF_RSC_TNI2_STEERINGTABLE6, TOF_RSC_TNI2_STEERINGTABLE7, TOF_RSC_TNI2_STEERINGTABLE8,
TOF_RSC_TNI2_STEERINGTABLE9, TOF_RSC_TNI2_STEERINGTABLE10, TOF_RSC_TNI2_STEERINGTABLE11,
TOF_RSC_TNI3_STEERINGTABLE0, TOF_RSC_TNI3_STEERINGTABLE1, TOF_RSC_TNI3_STEERINGTABLE2,
TOF_RSC_TNI3_STEERINGTABLE3, TOF_RSC_TNI3_STEERINGTABLE4, TOF_RSC_TNI3_STEERINGTABLE5,
TOF_RSC_TNI3_STEERINGTABLE6, TOF_RSC_TNI3_STEERINGTABLE7, TOF_RSC_TNI3_STEERINGTABLE8,
TOF_RSC_TNI3_STEERINGTABLE9, TOF_RSC_TNI3_STEERINGTABLE10, TOF_RSC_TNI3_STEERINGTABLE11,
TOF_RSC_TNI4_STEERINGTABLE0, TOF_RSC_TNI4_STEERINGTABLE1, TOF_RSC_TNI4_STEERINGTABLE2,
TOF_RSC_TNI4_STEERINGTABLE3, TOF_RSC_TNI4_STEERINGTABLE4, TOF_RSC_TNI4_STEERINGTABLE5,
TOF_RSC_TNI4_STEERINGTABLE6, TOF_RSC_TNI4_STEERINGTABLE7, TOF_RSC_TNI4_STEERINGTABLE8,
TOF_RSC_TNI4_STEERINGTABLE9, TOF_RSC_TNI4_STEERINGTABLE10, TOF_RSC_TNI4_STEERINGTABLE11,
TOF_RSC_TNI5_STEERINGTABLE0, TOF_RSC_TNI5_STEERINGTABLE1, TOF_RSC_TNI5_STEERINGTABLE2,
TOF_RSC_TNI5_STEERINGTABLE3, TOF_RSC_TNI5_STEERINGTABLE4, TOF_RSC_TNI5_STEERINGTABLE5,
TOF_RSC_TNI5_STEERINGTABLE6, TOF_RSC_TNI5_STEERINGTABLE7, TOF_RSC_TNI5_STEERINGTABLE8,
TOF_RSC_TNI5_STEERINGTABLE9, TOF_RSC_TNI5_STEERINGTABLE10, TOF_RSC_TNI5_STEERINGTABLE11,
/* MBTABLE (300 - 371) */
TOF_RSC_TNI0_MBTABLE0, TOF_RSC_TNI0_MBTABLE1, TOF_RSC_TNI0_MBTABLE2,
TOF_RSC_TNI0_MBTABLE3, TOF_RSC_TNI0_MBTABLE4, TOF_RSC_TNI0_MBTABLE5,
TOF_RSC_TNI0_MBTABLE6, TOF_RSC_TNI0_MBTABLE7, TOF_RSC_TNI0_MBTABLE8,
TOF_RSC_TNI0_MBTABLE9, TOF_RSC_TNI0_MBTABLE10, TOF_RSC_TNI0_MBTABLE11,
TOF_RSC_TNI1_MBTABLE0, TOF_RSC_TNI1_MBTABLE1, TOF_RSC_TNI1_MBTABLE2,
TOF_RSC_TNI1_MBTABLE3, TOF_RSC_TNI1_MBTABLE4, TOF_RSC_TNI1_MBTABLE5,
TOF_RSC_TNI1_MBTABLE6, TOF_RSC_TNI1_MBTABLE7, TOF_RSC_TNI1_MBTABLE8,
TOF_RSC_TNI1_MBTABLE9, TOF_RSC_TNI1_MBTABLE10, TOF_RSC_TNI1_MBTABLE11,
TOF_RSC_TNI2_MBTABLE0, TOF_RSC_TNI2_MBTABLE1, TOF_RSC_TNI2_MBTABLE2,
TOF_RSC_TNI2_MBTABLE3, TOF_RSC_TNI2_MBTABLE4, TOF_RSC_TNI2_MBTABLE5,
TOF_RSC_TNI2_MBTABLE6, TOF_RSC_TNI2_MBTABLE7, TOF_RSC_TNI2_MBTABLE8,
TOF_RSC_TNI2_MBTABLE9, TOF_RSC_TNI2_MBTABLE10, TOF_RSC_TNI2_MBTABLE11,
TOF_RSC_TNI3_MBTABLE0, TOF_RSC_TNI3_MBTABLE1, TOF_RSC_TNI3_MBTABLE2,
TOF_RSC_TNI3_MBTABLE3, TOF_RSC_TNI3_MBTABLE4, TOF_RSC_TNI3_MBTABLE5,
TOF_RSC_TNI3_MBTABLE6, TOF_RSC_TNI3_MBTABLE7, TOF_RSC_TNI3_MBTABLE8,
TOF_RSC_TNI3_MBTABLE9, TOF_RSC_TNI3_MBTABLE10, TOF_RSC_TNI3_MBTABLE11,
TOF_RSC_TNI4_MBTABLE0, TOF_RSC_TNI4_MBTABLE1, TOF_RSC_TNI4_MBTABLE2,
TOF_RSC_TNI4_MBTABLE3, TOF_RSC_TNI4_MBTABLE4, TOF_RSC_TNI4_MBTABLE5,
TOF_RSC_TNI4_MBTABLE6, TOF_RSC_TNI4_MBTABLE7, TOF_RSC_TNI4_MBTABLE8,
TOF_RSC_TNI4_MBTABLE9, TOF_RSC_TNI4_MBTABLE10, TOF_RSC_TNI4_MBTABLE11,
TOF_RSC_TNI5_MBTABLE0, TOF_RSC_TNI5_MBTABLE1, TOF_RSC_TNI5_MBTABLE2,
TOF_RSC_TNI5_MBTABLE3, TOF_RSC_TNI5_MBTABLE4, TOF_RSC_TNI5_MBTABLE5,
TOF_RSC_TNI5_MBTABLE6, TOF_RSC_TNI5_MBTABLE7, TOF_RSC_TNI5_MBTABLE8,
TOF_RSC_TNI5_MBTABLE9, TOF_RSC_TNI5_MBTABLE10, TOF_RSC_TNI5_MBTABLE11,
TOF_RSC_NUM /* 372 */
};
#define TOF_RSC_TOQ(TNI, CQID) (TOF_RSC_TNI0_TOQ0 + ((TNI) * 12) + (CQID))
#define TOF_RSC_TCQ(TNI, CQID) (TOF_RSC_TNI0_TCQ0 + ((TNI) * 12) + (CQID))
#define TOF_RSC_MRQ(TNI, CQID) (TOF_RSC_TNI0_MRQ0 + ((TNI) * 12) + (CQID))
#define TOF_RSC_PBQ(TNI) (TOF_RSC_TNI0_PBQ + (TNI))
#define TOF_RSC_PRQ(TNI) (TOF_RSC_TNI0_PRQ + (TNI))
#define TOF_RSC_STT(TNI, CQID) (TOF_RSC_TNI0_STEERINGTABLE0 + ((TNI) * 12) + (CQID))
#define TOF_RSC_MBT(TNI, CQID) (TOF_RSC_TNI0_MBTABLE0 + ((TNI) * 12) + (CQID))
#endif
/* vim: set noet ts=8 sw=8 sts=0 tw=0 : */
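
The resource IDs in the header above follow a regular layout: 6 TNIs with 12 CQs each, laid out in the ranges given by the enum comments. The following is a minimal sketch of that arithmetic, not part of the commit. All `sk_`/`SK_` names are hypothetical, and the base values are taken from the range comments rather than from the driver. Inline functions are used so that expression arguments need no extra parentheses.

```c
#include <assert.h>

/* Hypothetical constants mirroring the ranges documented in the enum comments. */
enum {
	SK_NTNIS    = 6,   /* TNI0 .. TNI5 */
	SK_NCQS     = 12,  /* CQ0 .. CQ11 per TNI */
	SK_TOQ_BASE = 0,   /* TOQ (0 - 71) */
	SK_TCQ_BASE = 72,  /* TCQ (72 - 143) */
	SK_MRQ_BASE = 144, /* MRQ (144 - 215) */
	SK_PBQ_BASE = 216, /* PBQ (216 - 221), one per TNI */
	SK_PRQ_BASE = 222, /* PRQ (222 - 227), one per TNI */
	SK_STT_BASE = 228, /* STEERINGTABLE (228 - 299) */
	SK_MBT_BASE = 300, /* MBTABLE (300 - 371) */
	SK_RSC_NUM  = 372
};

/* ID of a per-CQ resource: base of its class + 12 * tni + cqid. */
static inline int sk_rsc_per_cq(int base, int tni, int cqid)
{
	return base + tni * SK_NCQS + cqid;
}

/* ID of a per-TNI resource (PBQ, PRQ): base of its class + tni. */
static inline int sk_rsc_per_tni(int base, int tni)
{
	return base + tni;
}
```

For example, `sk_rsc_per_cq(SK_STT_BASE, 5, 0)` yields 288, the ID the layout assigns to the first steering table of TNI5.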

@ -0,0 +1,6 @@
struct {
bool enabled;
uint64_t bgmask[TOF_ICC_NTNIS];
uintptr_t iova;
void *kaddr;
} bch;

@ -0,0 +1,6 @@
struct {
struct tof_utofu_trans_list *mru;
struct tof_trans_table *table;
int mruhead;
ihk_spinlock_t mru_lock;
} trans;

@ -0,0 +1,21 @@
struct tof_utofu_bg {
union {
char whole_struct[160];
struct {
char padding0[0];
struct tof_utofu_device common;
};
struct {
char padding1[80];
uint8_t tni;
};
struct {
char padding2[81];
uint8_t bgid;
};
struct {
char padding3[88];
#include "tof_utofu_bg_bch.h"
};
};
};

@ -0,0 +1,37 @@
struct tof_utofu_cq {
union {
char whole_struct[384];
struct {
char padding0[0];
struct tof_utofu_device common;
};
struct {
char padding1[80];
uint8_t tni;
};
struct {
char padding2[81];
uint8_t cqid;
};
struct {
char padding3[104];
#include "tof_utofu_cq_trans.h"
};
struct {
char padding4[128];
struct tof_icc_steering_entry *steering;
};
struct {
char padding5[136];
struct tof_icc_mb_entry *mb;
};
struct {
char padding6[186];
uint8_t num_stag;
};
struct {
char padding7[336];
struct mmu_notifier mn;
};
};
};

@ -0,0 +1,17 @@
struct tof_utofu_device {
union {
char whole_struct[80];
struct {
char padding0[0];
bool enabled;
};
struct {
char padding1[12];
uint32_t gpid;
};
struct {
char padding2[24];
uint64_t subnet;
};
};
};

@ -0,0 +1,33 @@
struct tof_utofu_mbpt {
union {
char whole_struct[56];
struct {
char padding0[0];
struct kref kref;
};
struct {
char padding1[8];
struct tof_utofu_cq *ucq;
};
struct {
char padding2[16];
uintptr_t iova;
};
struct {
char padding3[24];
struct scatterlist *sg;
};
struct {
char padding4[32];
size_t nsgents;
};
struct {
char padding5[40];
uintptr_t mbptstart;
};
struct {
char padding6[48];
size_t pgsz;
};
};
};
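
The `tof_utofu_*` structs above pin each field of interest to a fixed byte offset by overlaying anonymous structs inside a union, each starting with a `char paddingN[...]` array. The sketch below is an illustration only, using a hypothetical `struct sk_device` whose offsets mirror those shown for `tof_utofu_device`. It shows how such a layout can be checked at compile time. It assumes a C11 compiler (anonymous structs and unions, `_Static_assert`), and it uses `uint8_t` in place of the kernel's `bool`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a partially known 80-byte kernel structure. */
struct sk_device {
	union {
		char whole_struct[80];   /* fixes the total size */
		struct {
			uint8_t enabled;     /* offset 0 */
		};
		struct {
			char padding1[12];   /* skip bytes 0..11 */
			uint32_t gpid;       /* offset 12 */
		};
		struct {
			char padding2[24];   /* skip bytes 0..23 */
			uint64_t subnet;     /* offset 24 */
		};
	};
};

/* Compile-time checks that the overlay matches the intended layout. */
_Static_assert(sizeof(struct sk_device) == 80, "unexpected size");
_Static_assert(offsetof(struct sk_device, enabled) == 0, "enabled offset");
_Static_assert(offsetof(struct sk_device, gpid) == 12, "gpid offset");
_Static_assert(offsetof(struct sk_device, subnet) == 24, "subnet offset");
```

Because every member lives in its own overlay, fields that the consumer does not need can be left undeclared; the padding arrays must still respect each field's natural alignment, or the compiler will shift the field.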

@ -0,0 +1,51 @@
/* This is a copy of the necessary part of McKernel, for uti-futex */
#include <cpu.h>
/*@
@ assigns \nothing;
@ behavior to_enabled:
@ assumes flags & RFLAGS_IF;
@ ensures \interrupt_disabled == 0;
@ behavior to_disabled:
@ assumes !(flags & RFLAGS_IF);
@ ensures \interrupt_disabled > 0;
@*/
void cpu_restore_interrupt(unsigned long flags)
{
asm volatile("push %0; popf" : : "g"(flags) : "memory", "cc");
}
void cpu_pause(void)
{
asm volatile("pause" ::: "memory");
}
/*@
@ assigns \nothing;
@ ensures \interrupt_disabled > 0;
@ behavior from_enabled:
@ assumes \interrupt_disabled == 0;
@ ensures \result & RFLAGS_IF;
@ behavior from_disabled:
@ assumes \interrupt_disabled > 0;
@ ensures !(\result & RFLAGS_IF);
@*/
unsigned long cpu_disable_interrupt_save(void)
{
unsigned long flags;
asm volatile("pushf; pop %0; cli" : "=r"(flags) : : "memory", "cc");
return flags;
}
unsigned long cpu_enable_interrupt_save(void)
{
unsigned long flags;
asm volatile("pushf; pop %0; sti" : "=r"(flags) : : "memory", "cc");
return flags;
}

@ -0,0 +1,106 @@
/* This is a copy of the necessary part of McKernel, for uti-futex */
#ifndef __HEADER_X86_COMMON_ARCH_LOCK
#define __HEADER_X86_COMMON_ARCH_LOCK
#include <linux/preempt.h>
#include <cpu.h>
#define ihk_mc_spinlock_lock __ihk_mc_spinlock_lock
#define ihk_mc_spinlock_unlock __ihk_mc_spinlock_unlock
#define ihk_mc_spinlock_lock_noirq __ihk_mc_spinlock_lock_noirq
#define ihk_mc_spinlock_unlock_noirq __ihk_mc_spinlock_unlock_noirq
typedef unsigned short __ticket_t;
typedef unsigned int __ticketpair_t;
/* arch/x86/include/asm/spinlock_types.h defines struct __raw_tickets */
typedef struct ihk_spinlock {
union {
__ticketpair_t head_tail;
struct ihk__raw_tickets {
__ticket_t head, tail;
} tickets;
};
} _ihk_spinlock_t;
static inline void ihk_mc_spinlock_init(_ihk_spinlock_t *lock)
{
lock->head_tail = 0;
}
static inline void __ihk_mc_spinlock_lock_noirq(_ihk_spinlock_t *lock)
{
register struct ihk__raw_tickets inc = { .tail = 0x0002 };
preempt_disable();
asm volatile ("lock xaddl %0, %1\n"
: "+r" (inc), "+m" (*(lock)) : : "memory", "cc");
if (inc.head == inc.tail)
goto out;
for (;;) {
if (*((volatile __ticket_t *)&lock->tickets.head) == inc.tail)
goto out;
cpu_pause();
}
out:
barrier(); /* make sure nothing creeps before the lock is taken */
}
static inline void __ihk_mc_spinlock_unlock_noirq(_ihk_spinlock_t *lock)
{
__ticket_t inc = 0x0002;
asm volatile ("lock addw %1, %0\n"
: "+m" (lock->tickets.head)
: "ri" (inc) : "memory", "cc");
preempt_enable();
}
static inline unsigned long __ihk_mc_spinlock_lock(_ihk_spinlock_t *lock)
{
unsigned long flags;
flags = cpu_disable_interrupt_save();
__ihk_mc_spinlock_lock_noirq(lock);
return flags;
}
static inline void __ihk_mc_spinlock_unlock(_ihk_spinlock_t *lock,
unsigned long flags)
{
__ihk_mc_spinlock_unlock_noirq(lock);
cpu_restore_interrupt(flags);
}
typedef struct mcs_rwlock_lock {
_ihk_spinlock_t slock;
#ifndef ENABLE_UBSAN
} __aligned(64) mcs_rwlock_lock_t;
#else
} mcs_rwlock_lock_t;
#endif
static inline void
mcs_rwlock_writer_lock_noirq(struct mcs_rwlock_lock *lock)
{
ihk_mc_spinlock_lock_noirq(&lock->slock);
}
static inline void
mcs_rwlock_writer_unlock_noirq(struct mcs_rwlock_lock *lock)
{
ihk_mc_spinlock_unlock_noirq(&lock->slock);
}
#endif
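
The spinlock above is a ticket lock: a locker atomically takes the current `tail` value as its ticket and spins until `head` reaches it, and unlock advances `head`. A portable sketch of the same idea using C11 atomics follows. It is not the code from the commit: the `sk_` names are hypothetical, it keeps `head` and `tail` as two separate counters instead of one `xaddl` on the combined pair, it steps tickets by 1 rather than 2, and it omits the preemption and interrupt handling.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space ticket lock. */
struct sk_ticket_lock {
	atomic_ushort head; /* ticket currently being served */
	atomic_ushort tail; /* next ticket to hand out */
};

static inline void sk_ticket_init(struct sk_ticket_lock *l)
{
	atomic_init(&l->head, 0);
	atomic_init(&l->tail, 0);
}

static inline void sk_ticket_acquire(struct sk_ticket_lock *l)
{
	/* Take a ticket; counters wrap modulo 65536, which is harmless. */
	unsigned short me = atomic_fetch_add_explicit(&l->tail, 1,
						      memory_order_relaxed);

	/* Spin until our ticket is served; acquire pairs with the release below. */
	while (atomic_load_explicit(&l->head, memory_order_acquire) != me)
		; /* busy-wait */
}

static inline void sk_ticket_release(struct sk_ticket_lock *l)
{
	/* Serve the next ticket and publish the critical section's writes. */
	atomic_fetch_add_explicit(&l->head, 1, memory_order_release);
}
```

Tickets are served in the order they were taken, which gives the FIFO fairness that a plain test-and-set spinlock lacks.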

@ -23,4 +23,26 @@ static const unsigned long arch_rus_vm_flags = VM_RESERVED | VM_MIXEDMAP;
#else
static const unsigned long arch_rus_vm_flags = VM_DONTDUMP | VM_MIXEDMAP;
#endif
#define xchg4(ptr, x) \
({ \
int __x = (x); \
asm volatile("xchgl %k0,%1" \
: "=r" (__x) \
: "m" (*ptr), "0" (__x) \
: "memory"); \
__x; \
})
enum x86_pf_error_code {
PF_PROT = 1 << 0,
PF_WRITE = 1 << 1,
PF_USER = 1 << 2,
PF_RSVD = 1 << 3,
PF_INSTR = 1 << 4,
PF_PATCH = 1 << 29,
PF_POPULATE = 1 << 30,
};
#endif /* __HEADER_MCCTRL_X86_64_ARCHDEPS_H */
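
The `xchg4` macro added above atomically stores a new 32-bit value and returns the previous one (`xchgl` with a memory operand is implicitly locked on x86). As a rough, architecture-neutral illustration, the same operation can be written with C11 atomics. `sk_xchg4` is a hypothetical name, and this sketch is not a drop-in replacement for the kernel-side macro.

```c
#include <assert.h>
#include <stdatomic.h>

/* Atomically store x into *ptr and return the value that was there before. */
static inline int sk_xchg4(atomic_int *ptr, int x)
{
	return atomic_exchange(ptr, x);
}
```

A typical use is handing off a state word: the caller writes the new state and learns, in the same atomic step, which state it replaced.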

@ -36,6 +36,7 @@
#include <linux/semaphore.h>
#include <linux/interrupt.h>
#include <linux/cpumask.h>
#include <linux/delay.h>
#include <asm/uaccess.h>
#include <asm/delay.h>
#include <asm/io.h>
@ -49,6 +50,8 @@
#include <uapi/linux/sched/types.h>
#endif
#include <archdeps.h>
#include <uti.h>
#include <futex.h>
//#define DEBUG
@ -229,6 +232,9 @@ static long mcexec_prepare_image(ihk_os_t os,
dprintk("%s: pid %d, rpgtable: 0x%lx added\n",
__FUNCTION__, ppd->pid, ppd->rpgtable);
#ifdef ENABLE_TOFU
ppd->enable_tofu = pdesc->enable_tofu;
#endif
ret = 0;
@ -266,18 +272,24 @@ int mcexec_transfer_image(ihk_os_t os, struct remote_transfer *__user upt)
return -EFAULT;
}
#ifdef CONFIG_MIC
if (pt.size > PAGE_SIZE) {
printk("mcexec_transfer_image(): ERROR: size exceeds PAGE_SIZE\n");
return -EFAULT;
}
phys = ihk_device_map_memory(ihk_os_to_dev(os), pt.rphys, PAGE_SIZE);
#ifdef CONFIG_MIC
rpm = ioremap_wc(phys, PAGE_SIZE);
#else
rpm = ihk_device_map_virtual(ihk_os_to_dev(os), phys, PAGE_SIZE, NULL, 0);
phys = ihk_device_map_memory(ihk_os_to_dev(os), pt.rphys, pt.size);
rpm = ihk_device_map_virtual(ihk_os_to_dev(os), phys, pt.size, NULL, 0);
#endif
if (!rpm) {
pr_err("%s(): error: invalid remote address\n", __func__);
return -EFAULT;
}
if (pt.direction == MCEXEC_UP_TRANSFER_TO_REMOTE) {
if (copy_from_user(rpm, pt.userp, pt.size)) {
ret = -EFAULT;
@ -295,10 +307,11 @@ int mcexec_transfer_image(ihk_os_t os, struct remote_transfer *__user upt)
#ifdef CONFIG_MIC
iounmap(rpm);
ihk_device_unmap_memory(ihk_os_to_dev(os), phys, PAGE_SIZE);
#else
ihk_device_unmap_virtual(ihk_os_to_dev(os), rpm, PAGE_SIZE);
ihk_device_unmap_virtual(ihk_os_to_dev(os), rpm, pt.size);
ihk_device_unmap_memory(ihk_os_to_dev(os), phys, pt.size);
#endif
ihk_device_unmap_memory(ihk_os_to_dev(os), phys, PAGE_SIZE);
return ret;
@ -378,6 +391,7 @@ static void release_handler(ihk_os_t os, void *param)
int os_ind = ihk_host_os_get_index(os);
unsigned long flags;
struct host_thread *thread;
int ret;
/* Finalize FS switch for uti threads */
write_lock_irqsave(&host_thread_lock, flags);
@ -399,7 +413,13 @@ static void release_handler(ihk_os_t os, void *param)
dprintk("%s: SCD_MSG_CLEANUP_PROCESS, info: %p, cpu: %d\n",
__FUNCTION__, info, info->cpu);
mcctrl_ikc_send(os, info->cpu, &isp);
ret = mcctrl_ikc_send_wait(os, info->cpu,
&isp, -20, NULL, NULL, 0);
if (ret != 0) {
printk("%s: WARNING: failed to send IKC msg: %d\n",
__func__, ret);
}
if (os_ind >= 0) {
delete_pid_entry(os_ind, info->pid);
}
@ -587,13 +607,14 @@ extern int mckernel_cpu_2_linux_cpu(struct mcctrl_usrdata *udp, int cpu_id);
static long mcexec_get_cpuset(ihk_os_t os, unsigned long arg)
{
struct mcctrl_usrdata *udp = ihk_host_os_get_usrdata(os);
struct mcctrl_part_exec *pe;
struct mcctrl_part_exec *pe = NULL, *pe_itr;
struct get_cpu_set_arg req;
struct mcctrl_cpu_topology *cpu_top, *cpu_top_i;
struct cache_topology *cache_top;
int cpu, cpus_assigned, cpus_to_assign, cpu_prev;
int ret = 0;
int mcexec_linux_numa;
int pe_list_len = 0;
cpumask_t *mcexec_cpu_set = NULL;
cpumask_t *cpus_used = NULL;
cpumask_t *cpus_to_use = NULL;
@ -614,7 +635,7 @@ static long mcexec_get_cpuset(ihk_os_t os, unsigned long arg)
}
if (copy_from_user(&req, (void *)arg, sizeof(req))) {
printk("%s: error copying user request\n", __FUNCTION__);
pr_err("%s: error copying user request\n", __func__);
ret = -EINVAL;
goto put_out;
}
@ -691,18 +712,48 @@ static long mcexec_get_cpuset(ihk_os_t os, unsigned long arg)
goto put_out;
}
pe = &udp->part_exec;
mutex_lock(&udp->part_exec_lock);
/* Find part_exec having same node_proxy */
list_for_each_entry_reverse(pe_itr, &udp->part_exec_list, chain) {
pe_list_len++;
if (pe_itr->node_proxy_pid == req.ppid) {
pe = pe_itr;
break;
}
}
mutex_lock(&pe->lock);
if (!pe) {
/* First process to enter CPU partitioning */
pr_debug("%s: pe_list_len:%d\n", __func__, pe_list_len);
if (pe_list_len >= PE_LIST_MAXLEN) {
/* delete head entry of pe_list */
pe_itr = list_first_entry(&udp->part_exec_list,
struct mcctrl_part_exec, chain);
list_del(&pe_itr->chain);
kfree(pe_itr);
}
/* First process to enter CPU partitioning */
if (pe->nr_processes == -1) {
pe = kzalloc(sizeof(struct mcctrl_part_exec), GFP_KERNEL);
if (!pe) {
mutex_unlock(&udp->part_exec_lock);
ret = -ENOMEM;
goto put_out;
}
/* Init part_exec */
mutex_init(&pe->lock);
INIT_LIST_HEAD(&pe->pli_list);
pe->nr_processes = req.nr_processes;
pe->nr_processes_left = req.nr_processes;
pe->nr_processes_joined = 0;
pe->node_proxy_pid = req.ppid;
list_add_tail(&pe->chain, &udp->part_exec_list);
dprintk("%s: nr_processes: %d (partitioned exec starts)\n",
__FUNCTION__,
pe->nr_processes);
__func__, pe->nr_processes);
}
mutex_unlock(&udp->part_exec_lock);
mutex_lock(&pe->lock);
if (pe->nr_processes != req.nr_processes) {
printk("%s: error: requested number of processes"
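
The hunk above replaces the single `part_exec` with a list keyed by `node_proxy_pid`: the list is searched from the newest entry, and if no entry matches, the oldest one is dropped once the list has reached `PE_LIST_MAXLEN` before a new entry is appended. The user-space sketch below illustrates that find-or-create pattern only. The `sk_` names and the cap of 4 are assumptions (the value of `PE_LIST_MAXLEN` is not shown in this diff), and locking is omitted.

```c
#include <assert.h>
#include <stdlib.h>

#define SK_LIST_MAXLEN 4 /* hypothetical cap standing in for PE_LIST_MAXLEN */

struct sk_group {
	int proxy_pid;     /* lookup key */
	int nr_processes;  /* group size requested by the first joiner */
	struct sk_group *prev, *next;
};

struct sk_group_list {
	struct sk_group *head, *tail; /* head = oldest, tail = newest */
	int len;
};

/* Search from the newest entry backwards, as the reverse iteration does. */
static struct sk_group *sk_find(struct sk_group_list *l, int pid)
{
	for (struct sk_group *g = l->tail; g; g = g->prev)
		if (g->proxy_pid == pid)
			return g;
	return NULL;
}

/* Return the group for pid, creating it (and evicting the oldest) if needed. */
static struct sk_group *sk_find_or_create(struct sk_group_list *l,
					  int pid, int nr)
{
	struct sk_group *g = sk_find(l, pid);

	if (g)
		return g;
	if (l->len >= SK_LIST_MAXLEN) { /* drop the head (oldest) entry */
		struct sk_group *old = l->head;

		l->head = old->next;
		if (l->head)
			l->head->prev = NULL;
		else
			l->tail = NULL;
		free(old);
		l->len--;
	}
	g = calloc(1, sizeof(*g));
	if (!g)
		return NULL;
	g->proxy_pid = pid;
	g->nr_processes = nr;
	g->prev = l->tail; /* append at the tail (newest) */
	if (l->tail)
		l->tail->next = g;
	else
		l->head = g;
	l->tail = g;
	l->len++;
	return g;
}
```

Bounding the list this way keeps stale groups from accumulating, at the cost of forgetting the oldest group once the cap is reached.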
@ -712,7 +763,15 @@ static long mcexec_get_cpuset(ihk_os_t os, unsigned long arg)
goto put_and_unlock_out;
}
if (pe->nr_processes_joined >= pe->nr_processes) {
printk("%s: too many processes have joined to the group of %d\n",
__func__, req.ppid);
ret = -EINVAL;
goto put_and_unlock_out;
}
--pe->nr_processes_left;
++pe->nr_processes_joined;
dprintk("%s: nr_processes: %d, nr_processes_left: %d\n",
__FUNCTION__,
pe->nr_processes,
@ -798,8 +857,6 @@ static long mcexec_get_cpuset(ihk_os_t os, unsigned long arg)
wake_up_interruptible(&pli_next->pli_wq);
}
/* Reset process counter to start state */
pe->nr_processes = -1;
ret = -ETIMEDOUT;
goto put_and_unlock_out;
}
@ -1047,16 +1104,8 @@ next_cpu:
/* Commit used cores to OS structure */
memcpy(&pe->cpus_used, cpus_used, sizeof(*cpus_used));
/* Reset if last process */
if (pe->nr_processes_left == 0) {
dprintk("%s: nr_processes: %d (partitioned exec ends)\n",
__FUNCTION__,
pe->nr_processes);
pe->nr_processes = -1;
memset(&pe->cpus_used, 0, sizeof(pe->cpus_used));
}
/* Otherwise wake up next process in list */
else {
/* If not last process, wake up next process in list */
if (pe->nr_processes_left != 0) {
++pe->process_rank;
pli_next = list_first_entry(&pe->pli_list,
struct process_list_item, list);
@ -1221,7 +1270,7 @@ void mcctrl_put_per_proc_data(struct mcctrl_per_proc_data *ppd)
process is gone and the application should be terminated. */
packet = (struct ikc_scd_packet *)ptd->data;
dprintk("%s: calling __return_syscall (hash),target pid=%d,tid=%d\n", __FUNCTION__, ppd->pid, packet->req.rtid);
__return_syscall(ppd->ud->os, packet, -ERESTARTSYS,
__return_syscall(ppd->ud->os, ppd, packet, -ERESTARTSYS,
packet->req.rtid);
ihk_ikc_release_packet((struct ihk_ikc_free_packet *)packet);
@ -1245,13 +1294,14 @@ void mcctrl_put_per_proc_data(struct mcctrl_per_proc_data *ppd)
/* We use ERESTARTSYS to tell the LWK that the proxy
* process is gone and the application should be terminated */
__return_syscall(ppd->ud->os, packet, -ERESTARTSYS,
__return_syscall(ppd->ud->os, ppd, packet, -ERESTARTSYS,
packet->req.rtid);
ihk_ikc_release_packet((struct ihk_ikc_free_packet *)packet);
}
ihk_ikc_spinlock_unlock(&ppd->wq_list_lock, flags);
pager_remove_process(ppd);
futex_remove_process(ppd);
kfree(ppd);
}
@ -1286,7 +1336,7 @@ int mcexec_syscall(struct mcctrl_usrdata *ud, struct ikc_scd_packet *packet)
/* We use ERESTARTSYS to tell the LWK that the proxy
* process is gone and the application should be terminated */
__return_syscall(ud->os, packet, -ERESTARTSYS,
__return_syscall(ud->os, NULL, packet, -ERESTARTSYS,
packet->req.rtid);
ihk_ikc_release_packet((struct ihk_ikc_free_packet *)packet);
@ -1729,7 +1779,7 @@ long mcexec_ret_syscall(ihk_os_t os, struct syscall_ret_desc *__user arg)
ihk_device_unmap_memory(ihk_os_to_dev(os), phys, ret.size);
}
__return_syscall(os, packet, ret.ret, task_pid_vnr(current));
__return_syscall(os, ppd, packet, ret.ret, task_pid_vnr(current));
error = 0;
out:
@ -1852,6 +1902,7 @@ int mcexec_create_per_process_data(ihk_os_t os,
spin_lock_init(&ppd->wq_list_lock);
memset(&ppd->cpu_set, 0, sizeof(cpumask_t));
ppd->ikc_target_cpu = 0;
ppd->rva_to_rpa_cache = RB_ROOT;
/* Final ref will be dropped in release_handler() through
* mcexec_destroy_per_process_data() */
atomic_set(&ppd->refcount, 1);
@ -2172,7 +2223,13 @@ static DECLARE_WAIT_QUEUE_HEAD(perfctrlq);
long mcctrl_perf_num(ihk_os_t os, unsigned long arg)
{
struct mcctrl_usrdata *usrdata = ihk_host_os_get_usrdata(os);
struct mcctrl_usrdata *usrdata;
if (!os || ihk_host_validate_os(os)) {
return -EINVAL;
}
usrdata = ihk_host_os_get_usrdata(os);
if (!usrdata) {
pr_err("%s: error: mcctrl_usrdata not found\n", __func__);
@ -2197,22 +2254,34 @@ struct mcctrl_perf_ctrl_desc {
*/
long mcctrl_perf_set(ihk_os_t os, struct ihk_perf_event_attr *__user arg)
{
struct mcctrl_usrdata *usrdata = ihk_host_os_get_usrdata(os);
struct mcctrl_usrdata *usrdata = NULL;
struct ikc_scd_packet isp;
struct perf_ctrl_desc *perf_desc;
struct ihk_perf_event_attr attr;
struct ihk_cpu_info *info = ihk_os_get_cpu_info(os);
struct ihk_cpu_info *info = NULL;
int ret = 0;
int i = 0, j = 0;
int need_free;
int num_registered = 0;
int err = 0;
if (!os || ihk_host_validate_os(os)) {
return -EINVAL;
}
usrdata = ihk_host_os_get_usrdata(os);
if (!usrdata) {
pr_err("%s: error: mcctrl_usrdata not found\n", __func__);
return -EINVAL;
}
info = ihk_os_get_cpu_info(os);
if (!info) {
pr_err("%s: error: cannot get cpu info\n", __func__);
return -EINVAL;
}
for (i = 0; i < usrdata->perf_event_num; i++) {
ret = copy_from_user(&attr, &arg[i],
sizeof(struct ihk_perf_event_attr));
@ -2272,20 +2341,30 @@ long mcctrl_perf_set(ihk_os_t os, struct ihk_perf_event_attr *__user arg)
long mcctrl_perf_get(ihk_os_t os, unsigned long *__user arg)
{
struct mcctrl_usrdata *usrdata = ihk_host_os_get_usrdata(os);
struct mcctrl_usrdata *usrdata = NULL;
struct ikc_scd_packet isp;
struct perf_ctrl_desc *perf_desc;
struct ihk_cpu_info *info = ihk_os_get_cpu_info(os);
struct ihk_cpu_info *info = NULL;
unsigned long value_sum = 0;
int ret = 0;
int i = 0, j = 0;
int need_free;
if (!os || ihk_host_validate_os(os)) {
return -EINVAL;
}
usrdata = ihk_host_os_get_usrdata(os);
if (!usrdata) {
pr_err("%s: error: mcctrl_usrdata not found\n", __func__);
return -EINVAL;
}
info = ihk_os_get_cpu_info(os);
if (!info || info->n_cpus < 1) {
return -EINVAL;
}
for (i = 0; i < usrdata->perf_event_num; i++) {
perf_desc = kmalloc(sizeof(struct mcctrl_perf_ctrl_desc),
GFP_KERNEL);
@ -2333,15 +2412,20 @@ long mcctrl_perf_get(ihk_os_t os, unsigned long *__user arg)
long mcctrl_perf_enable(ihk_os_t os)
{
struct mcctrl_usrdata *usrdata = ihk_host_os_get_usrdata(os);
struct mcctrl_usrdata *usrdata = NULL;
struct ikc_scd_packet isp;
struct perf_ctrl_desc *perf_desc;
struct ihk_cpu_info *info = ihk_os_get_cpu_info(os);
struct ihk_cpu_info *info = NULL;
unsigned long cntr_mask = 0;
int ret = 0;
int i = 0, j = 0;
int need_free;
if (!os || ihk_host_validate_os(os)) {
return -EINVAL;
}
usrdata = ihk_host_os_get_usrdata(os);
if (!usrdata) {
pr_err("%s: error: mcctrl_usrdata not found\n", __func__);
return -EINVAL;
@ -2364,6 +2448,11 @@ long mcctrl_perf_enable(ihk_os_t os)
isp.msg = SCD_MSG_PERF_CTRL;
isp.arg = virt_to_phys(perf_desc);
info = ihk_os_get_cpu_info(os);
if (!info || info->n_cpus < 1) {
kfree(perf_desc);
return -EINVAL;
}
for (j = 0; j < info->n_cpus; j++) {
ret = mcctrl_ikc_send_wait(os, j, &isp, 0,
wakeup_desc_of_perf_desc(perf_desc),
@ -2391,15 +2480,20 @@ long mcctrl_perf_enable(ihk_os_t os)
long mcctrl_perf_disable(ihk_os_t os)
{
struct mcctrl_usrdata *usrdata = ihk_host_os_get_usrdata(os);
struct mcctrl_usrdata *usrdata = NULL;
struct ikc_scd_packet isp;
struct perf_ctrl_desc *perf_desc;
struct ihk_cpu_info *info = ihk_os_get_cpu_info(os);
struct ihk_cpu_info *info = NULL;
unsigned long cntr_mask = 0;
int ret = 0;
int i = 0, j = 0;
int need_free;
if (!os || ihk_host_validate_os(os)) {
return -EINVAL;
}
usrdata = ihk_host_os_get_usrdata(os);
if (!usrdata) {
pr_err("%s: error: mcctrl_usrdata not found\n", __func__);
return -EINVAL;
@ -2422,6 +2516,11 @@ long mcctrl_perf_disable(ihk_os_t os)
isp.msg = SCD_MSG_PERF_CTRL;
isp.arg = virt_to_phys(perf_desc);
info = ihk_os_get_cpu_info(os);
if (!info || info->n_cpus < 1) {
kfree(perf_desc);
return -EINVAL;
}
for (j = 0; j < info->n_cpus; j++) {
ret = mcctrl_ikc_send_wait(os, j, &isp, 0,
wakeup_desc_of_perf_desc(perf_desc),
@ -2463,6 +2562,10 @@ long mcctrl_getrusage(ihk_os_t ihk_os, struct mcctrl_ioctl_getrusage_desc *__use
unsigned long ut;
unsigned long st;
if (!ihk_os || ihk_host_validate_os(ihk_os)) {
return -EINVAL;
}
ret = copy_from_user(&desc, _desc, sizeof(struct mcctrl_ioctl_getrusage_desc));
if (ret != 0) {
printk("%s: copy_from_user failed\n", __FUNCTION__);
@ -2705,7 +2808,7 @@ static long mcexec_terminate_thread_unsafe(ihk_os_t os, int pid, int tid, long c
__FUNCTION__, tid);
goto no_ptd;
}
__return_syscall(usrdata->os, packet, code, tid);
__return_syscall(usrdata->os, ppd, packet, code, tid);
ihk_ikc_release_packet((struct ihk_ikc_free_packet *)packet);
/* Drop reference for this function */
@ -2792,57 +2895,28 @@ static long mcexec_release_user_space(struct release_user_space_desc *__user arg
#endif
}
static long (*mckernel_do_futex)(int n, unsigned long arg0, unsigned long arg1,
unsigned long arg2, unsigned long arg3,
unsigned long arg4, unsigned long arg5,
unsigned long _uti_clv,
void *uti_futex_resp,
void *_linux_wait_event,
void *_linux_printk,
void *_linux_clock_gettime);
/* Convert phys_addr to virt_addr on Linux */
static void
uti_info_p2v(struct uti_info *info)
{
info->uti_futex_resp =
(void *)phys_to_virt(info->uti_futex_resp_pa);
info->ikc2linux =
(void *)phys_to_virt(info->ikc2linux_pa);
long uti_wait_event(void *_resp, unsigned long nsec_timeout) {
struct uti_futex_resp *resp = _resp;
if (nsec_timeout) {
return wait_event_interruptible_timeout(resp->wq, resp->done, nsecs_to_jiffies(nsec_timeout));
} else {
return wait_event_interruptible(resp->wq, resp->done);
}
}
info->status =
(void *)phys_to_virt(info->status_pa);
info->spin_sleep_lock =
(void *)phys_to_virt(info->spin_sleep_lock_pa);
info->spin_sleep =
(void *)phys_to_virt(info->spin_sleep_pa);
info->vm =
(void *)phys_to_virt(info->vm_pa);
info->futex_q =
(void *)phys_to_virt(info->futex_q_pa);
int uti_printk(const char *fmt, ...) {
int sum = 0, nwritten;
va_list args;
va_start(args, fmt);
nwritten = vprintk(fmt, args);
sum += nwritten;
va_end(args);
return sum;
}
int uti_clock_gettime(clockid_t clk_id, struct timespec *tp) {
int ret = 0;
struct timespec64 ts64;
dprintk("%s: clk_id=%x,REALTIME=%x,MONOTONIC=%x\n", __FUNCTION__, clk_id, CLOCK_REALTIME, CLOCK_MONOTONIC);
switch(clk_id) {
case CLOCK_REALTIME:
getnstimeofday64(&ts64);
tp->tv_sec = ts64.tv_sec;
tp->tv_nsec = ts64.tv_nsec;
dprintk("%s: CLOCK_REALTIME,%ld.%09ld\n", __FUNCTION__, tp->tv_sec, tp->tv_nsec);
break;
case CLOCK_MONOTONIC: {
/* Do not use getrawmonotonic() because it returns different value than clock_gettime() */
ktime_get_ts64(&ts64);
tp->tv_sec = ts64.tv_sec;
tp->tv_nsec = ts64.tv_nsec;
dprintk("%s: CLOCK_MONOTONIC,%ld.%09ld\n", __FUNCTION__, tp->tv_sec, tp->tv_nsec);
break; }
default:
ret = -EINVAL;
break;
}
return ret;
info->futex_queue =
(void *)phys_to_virt(info->futex_queue_pa);
}
long mcexec_syscall_thread(ihk_os_t os, unsigned long arg, struct file *file)
@ -2851,36 +2925,38 @@ long mcexec_syscall_thread(ihk_os_t os, unsigned long arg, struct file *file)
int number;
unsigned long args[6];
unsigned long ret;
unsigned long uti_clv; /* copy of a clv in McKernel */
unsigned long uti_info; /* reference to data in McKernel */
};
struct syscall_struct param;
struct syscall_struct __user *uparam =
(struct syscall_struct __user *)arg;
long rc;
if (copy_from_user(&param, uparam, sizeof param)) {
return -EFAULT;
}
if (param.number == __NR_futex) {
struct uti_futex_resp resp = {
.done = 0
};
init_waitqueue_head(&resp.wq);
if (!mckernel_do_futex) {
if (ihk_os_get_special_address(os, IHK_SPADDR_MCKERNEL_DO_FUTEX,
(unsigned long *)&mckernel_do_futex,
NULL)) {
kprintf("%s: ihk_os_get_special_address failed\n", __FUNCTION__);
return -EINVAL;
}
dprintk("%s: mckernel_do_futex=%p\n", __FUNCTION__, mckernel_do_futex);
}
struct uti_info *_uti_info = NULL;
init_waitqueue_head(&resp.wq);
_uti_info = (struct uti_info *)param.uti_info;
/* Convert phys_addr to virt_addr on Linux */
uti_info_p2v(_uti_info);
_uti_info->os = (void *)os;
rc = do_futex(param.number, param.args[0],
param.args[1], param.args[2],
param.args[3], param.args[4], param.args[5],
(struct uti_info *)param.uti_info,
(void *)&resp);
rc = (*mckernel_do_futex)(param.number, param.args[0], param.args[1], param.args[2],
param.args[3], param.args[4], param.args[5], param.uti_clv, (void *)&resp, (void *)uti_wait_event, (void *)uti_printk, (void *)uti_clock_gettime);
param.ret = rc;
} else {
struct mcctrl_usrdata *usrdata = ihk_host_os_get_usrdata(os);
@ -2940,6 +3016,8 @@ void mcctrl_futex_wake(struct ikc_scd_packet *pisp)
}
resp->done = 1;
dprintk("%s: cpu: %d\n", __func__, ihk_ikc_get_processor_id());
wake_up_interruptible(&resp->wq);
}
@ -3054,7 +3132,7 @@ static long
mcexec_uti_attr(ihk_os_t os, struct uti_attr_desc __user *_desc)
{
struct uti_attr_desc desc;
char *uti_cpu_set_str;
char *uti_cpu_set_str = NULL;
struct kuti_attr *kattr;
cpumask_t *cpuset = NULL, *env_cpuset = NULL;
struct mcctrl_usrdata *ud = ihk_host_os_get_usrdata(os);
@ -3089,22 +3167,33 @@ mcexec_uti_attr(ihk_os_t os, struct uti_attr_desc __user *_desc)
goto out;
}
if (!(uti_cpu_set_str = kmalloc(desc.uti_cpu_set_len, GFP_KERNEL))) {
pr_err("%s: error: allocating uti_cpu_set_str\n",
__func__);
rc = -ENOMEM;
goto out;
}
if (desc.uti_cpu_set_str) {
if (!(uti_cpu_set_str = kmalloc(desc.uti_cpu_set_len, GFP_KERNEL))) {
pr_err("%s: error: allocating uti_cpu_set_str\n",
__func__);
rc = -ENOMEM;
goto out;
}
if ((rc = copy_from_user(uti_cpu_set_str, desc.uti_cpu_set_str, desc.uti_cpu_set_len))) {
pr_err("%s: error: copy_from_user\n",
__func__);
rc = -EFAULT;
goto out;
if ((rc = copy_from_user(uti_cpu_set_str, desc.uti_cpu_set_str, desc.uti_cpu_set_len))) {
pr_err("%s: error: copy_from_user\n",
__func__);
rc = -EFAULT;
goto out;
}
}
kattr = phys_to_virt(desc.phys_attr);
{
int i;
pr_info("%s: flag: %lx\n", __func__, (unsigned long)kattr->attr.flags);
for (i = 0; i < UTI_MAX_NUMA_DOMAINS; i+= 64) {
kprintf("%s: numa_set[%d]: %lx\n", __func__, i, (unsigned long)kattr->attr.numa_set[i / 64]);
}
}
/* Find caller cpu for later resolution of subgroups */
list_for_each_entry(cpu_topo, &ud->cpu_topology_list, chain) {
if (cpu_topo->mckernel_cpu_id == kattr->parent_cpuid) {
@ -3451,7 +3540,7 @@ int mcctrl_get_request_os_cpu(ihk_os_t os, int *ret_cpu)
struct ihk_ikc_channel_desc *ch;
int ret = 0;
if (!os) {
if (!os || ihk_host_validate_os(os) || !ret_cpu) {
return -EINVAL;
}
@ -3493,7 +3582,11 @@ int mcctrl_get_request_os_cpu(ihk_os_t os, int *ret_cpu)
*ret_cpu = ch->send.queue->read_cpu;
ret = 0;
#ifndef ENABLE_FUGAKU_HACKS
pr_info("%s: OS: %lx, CPU: %d\n",
#else
dprintk("%s: OS: %lx, CPU: %d\n",
#endif
__func__, (unsigned long)os, *ret_cpu);
out_put_ppd:
@ -3543,7 +3636,8 @@ int __mcctrl_os_read_write_cpu_register(ihk_os_t os, int cpu,
isp.op = op;
isp.pdesc = virt_to_phys(ldesc);
ret = mcctrl_ikc_send_wait(os, cpu, &isp, 0, NULL, &do_free, 1, ldesc);
/* 1 sec timeout for the case where McKernel can't respond */
ret = mcctrl_ikc_send_wait(os, cpu, &isp, -1000, NULL, &do_free, 1, ldesc);
if (ret != 0) {
printk("%s: ERROR sending IKC msg: %d\n", __FUNCTION__, ret);
goto out;
@ -3557,7 +3651,11 @@ int __mcctrl_os_read_write_cpu_register(ihk_os_t os, int cpu,
/* Notify caller (for future async implementation) */
atomic_set(&desc->sync, 1);
#ifndef ENABLE_FUGAKU_HACKS
dprintk("%s: MCCTRL_OS_CPU_%s_REGISTER: CPU: %d, addr_ext: 0x%lx, val: 0x%lx\n",
#else
printk("%s: MCCTRL_OS_CPU_%s_REGISTER: CPU: %d, addr_ext: 0x%lx, val: 0x%lx\n",
#endif
__FUNCTION__,
(op == MCCTRL_OS_CPU_READ_REGISTER ? "READ" : "WRITE"), cpu,
desc->addr_ext, desc->val);


@ -50,6 +50,9 @@ extern void procfs_exit(int);
extern void uti_attr_finalize(void);
extern void binfmt_mcexec_init(void);
extern void binfmt_mcexec_exit(void);
#ifdef ENABLE_TOFU
extern void mcctrl_file_to_pidfd_hash_init(void);
#endif
extern int mcctrl_os_read_cpu_register(ihk_os_t os, int cpu,
struct ihk_os_cpu_register *desc);
@ -57,6 +60,11 @@ extern int mcctrl_os_write_cpu_register(ihk_os_t os, int cpu,
struct ihk_os_cpu_register *desc);
extern int mcctrl_get_request_os_cpu(ihk_os_t os, int *cpu);
#ifdef ENABLE_TOFU
extern void mcctrl_tofu_hijack_release_handlers(void);
extern void mcctrl_tofu_restore_release_handlers(void);
#endif
static long mcctrl_ioctl(ihk_os_t os, unsigned int request, void *priv,
unsigned long arg, struct file *file)
{
@ -227,7 +235,6 @@ void (*mcctrl_zap_page_range)(struct vm_area_struct *vma,
struct inode_operations *mcctrl_hugetlbfs_inode_operations;
static int symbols_init(void)
{
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4,17,0)
@ -319,10 +326,17 @@ static int __init mcctrl_init(void)
}
binfmt_mcexec_init();
#ifdef ENABLE_TOFU
mcctrl_file_to_pidfd_hash_init();
#endif
if ((ret = symbols_init()))
goto error;
#ifdef ENABLE_TOFU
mcctrl_tofu_hijack_release_handlers();
#endif
if ((ret = ihk_host_register_os_notifier(&mcctrl_os_notifier)) != 0) {
printk("mcctrl: error: registering OS notifier\n");
goto error;
@ -345,6 +359,9 @@ static void __exit mcctrl_exit(void)
binfmt_mcexec_exit();
uti_attr_finalize();
#ifdef ENABLE_TOFU
mcctrl_tofu_restore_release_handlers();
#endif
printk("mcctrl: unregistered.\n");
}

File diff suppressed because it is too large


@ -142,13 +142,35 @@ int mcctrl_ikc_send_wait(ihk_os_t os, int cpu, struct ikc_scd_packet *pisp,
ret = mcctrl_ikc_send(os, cpu, pisp);
if (ret < 0) {
pr_warn("%s: mcctrl_ikc_send failed: %d\n", __func__, ret);
kfree(desc);
if (alloc_desc)
kfree(desc);
return ret;
}
if (timeout) {
ret = wait_event_interruptible_timeout(desc->wq,
desc->status, timeout);
/*
* Negative timeout indicates busy waiting, which can be used
* in situations where wait_event_interruptible_XXX() would
* fail, e.g., in a signal handler, at the time the process
* is being killed, etc.
*/
if (timeout < 0) {
unsigned long timeout_jiffies =
jiffies + msecs_to_jiffies(timeout * -1);
ret = -ETIME;
while (time_before(jiffies, timeout_jiffies)) {
schedule();
if (READ_ONCE(desc->status)) {
ret = 0;
break;
}
}
}
else {
ret = wait_event_interruptible_timeout(desc->wq,
desc->status, msecs_to_jiffies(timeout));
}
} else {
ret = wait_event_interruptible(desc->wq, desc->status);
}
@ -210,6 +232,8 @@ static int syscall_packet_handler(struct ihk_ikc_channel_desc *c,
case SCD_MSG_PROCFS_ANSWER:
case SCD_MSG_REMOTE_PAGE_FAULT_ANSWER:
case SCD_MSG_CPU_RW_REG_RESP:
case SCD_MSG_CLEANUP_PROCESS_RESP:
case SCD_MSG_CLEANUP_FD_RESP:
mcctrl_wakeup_cb(__os, pisp);
break;
@ -280,7 +304,11 @@ int mcctrl_ikc_send(ihk_os_t os, int cpu, struct ikc_scd_packet *pisp)
{
struct mcctrl_usrdata *usrdata;
if (!os || cpu < 0) {
if (!os || ihk_host_validate_os(os) || !pisp) {
return -EINVAL;
}
if (cpu < 0) {
return -EINVAL;
}
@ -508,11 +536,9 @@ int prepare_ikc_channels(ihk_os_t os)
usrdata->os = os;
ihk_host_os_set_usrdata(os, usrdata);
ihk_ikc_listen_port(os, &lp_ikc2linux);
ihk_ikc_listen_port(os, &lp_ikc2mckernel);
init_waitqueue_head(&usrdata->wq_procfs);
mutex_init(&usrdata->reserve_lock);
mutex_init(&usrdata->part_exec_lock);
for (i = 0; i < MCCTRL_PER_PROC_DATA_HASH_SIZE; ++i) {
INIT_LIST_HEAD(&usrdata->per_proc_data_hash[i]);
@ -521,13 +547,21 @@ int prepare_ikc_channels(ihk_os_t os)
INIT_LIST_HEAD(&usrdata->cpu_topology_list);
INIT_LIST_HEAD(&usrdata->node_topology_list);
INIT_LIST_HEAD(&usrdata->part_exec_list);
mutex_init(&usrdata->part_exec.lock);
INIT_LIST_HEAD(&usrdata->part_exec.pli_list);
usrdata->part_exec.nr_processes = -1;
INIT_LIST_HEAD(&usrdata->wakeup_descs_list);
spin_lock_init(&usrdata->wakeup_descs_lock);
/* ihk_ikc_listen_port should be performed after
* usrdata->cpu_topology_list is initialized because the
* function enables syscall_packet_handler which accesses
* the list (the call path is sysfsm_packet_handler -->
* sysfsm_work_main --> sysfsm_setup --> setup_sysfs_files
* --> setup_cpus_sysfs_files).
*/
ihk_ikc_listen_port(os, &lp_ikc2linux);
ihk_ikc_listen_port(os, &lp_ikc2mckernel);
return 0;
error:
@ -580,6 +614,18 @@ void destroy_ikc_channels(ihk_os_t os)
kfree(usrdata->channels);
kfree(usrdata->ikc2linux);
mutex_lock(&usrdata->part_exec_lock);
while (!list_empty(&usrdata->part_exec_list)) {
struct mcctrl_part_exec *pe;
pe = list_first_entry(&usrdata->part_exec_list,
struct mcctrl_part_exec, chain);
list_del(&pe->chain);
kfree(pe);
}
mutex_unlock(&usrdata->part_exec_lock);
kfree(usrdata);
}
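
Note on the `mcctrl_ikc_send_wait()` hunk above: after this change the `timeout` argument carries three meanings. Zero waits until woken, a positive value sleeps for that many milliseconds, and a negative value busy-waits for its absolute value in milliseconds. The user-space sketch below models only that dispatch. `classify_timeout`, `busy_poll`, and the poll-count budget are hypothetical stand-ins for the kernel's jiffies-based loop and are not part of this change.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical labels for the three waiting strategies in the hunk. */
enum wait_mode { WAIT_FOREVER, WAIT_SLEEP_TIMEOUT, WAIT_BUSY_POLL };

/*
 * Interpret the timeout argument the way the hunk does:
 *   0   -> wait until woken (no time budget)
 *   > 0 -> interruptible sleep with a budget of timeout_ms
 *   < 0 -> busy polling with a budget of -timeout_ms
 */
static enum wait_mode classify_timeout(long timeout_ms, long *budget_ms)
{
    assert(budget_ms != NULL);
    if (timeout_ms == 0) {
        *budget_ms = 0;
        return WAIT_FOREVER;
    }
    if (timeout_ms < 0) {
        *budget_ms = -timeout_ms;
        return WAIT_BUSY_POLL;
    }
    *budget_ms = timeout_ms;
    return WAIT_SLEEP_TIMEOUT;
}

/*
 * Model of the busy-wait branch: start from -ETIME and flip to 0 as soon
 * as the status word becomes non-zero. A poll count replaces the
 * time_before(jiffies, ...) deadline here (assumption for illustration).
 */
static int busy_poll(const volatile int *status, long max_polls)
{
    int ret = -ETIME;
    long i;

    for (i = 0; i < max_polls; i++) {
        if (*status) {
            ret = 0;
            break;
        }
    }
    return ret;
}
```

With this convention, the `-1000` passed by `__mcctrl_os_read_write_cpu_register()` earlier in the diff selects the busy-polling branch with a one-second budget, which matches the comment added next to that call.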


@ -0,0 +1,10 @@
/* This is a copy of the necessary part from McKernel, for uti-futex */
#ifndef MC_CPU_H
#define MC_CPU_H
void cpu_restore_interrupt(unsigned long flags);
void cpu_pause(void);
unsigned long cpu_disable_interrupt_save(void);
unsigned long cpu_enable_interrupt_save(void);
#endif


@ -0,0 +1,174 @@
/* This is a copy of the necessary part from McKernel, for uti-futex */
#ifndef _FUTEX_H
#define _FUTEX_H
#include <mc_plist.h>
#include <arch-lock.h>
#include <linux/uaccess.h>
/** \name Futex Commands
* @{
*/
#define FUTEX_WAIT 0
#define FUTEX_WAKE 1
#define FUTEX_FD 2
#define FUTEX_REQUEUE 3
#define FUTEX_CMP_REQUEUE 4
#define FUTEX_WAKE_OP 5
#define FUTEX_LOCK_PI 6
#define FUTEX_UNLOCK_PI 7
#define FUTEX_TRYLOCK_PI 8
#define FUTEX_WAIT_BITSET 9
#define FUTEX_WAKE_BITSET 10
#define FUTEX_WAIT_REQUEUE_PI 11
#define FUTEX_CMP_REQUEUE_PI 12
// @}
#define FUTEX_PRIVATE_FLAG 128
#define FUTEX_CLOCK_REALTIME 256
#define FUTEX_CMD_MASK ~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME)
#define FUTEX_WAIT_PRIVATE (FUTEX_WAIT | FUTEX_PRIVATE_FLAG)
#define FUTEX_WAKE_PRIVATE (FUTEX_WAKE | FUTEX_PRIVATE_FLAG)
#define FUTEX_REQUEUE_PRIVATE (FUTEX_REQUEUE | FUTEX_PRIVATE_FLAG)
#define FUTEX_CMP_REQUEUE_PRIVATE (FUTEX_CMP_REQUEUE | FUTEX_PRIVATE_FLAG)
#define FUTEX_WAKE_OP_PRIVATE (FUTEX_WAKE_OP | FUTEX_PRIVATE_FLAG)
#define FUTEX_LOCK_PI_PRIVATE (FUTEX_LOCK_PI | FUTEX_PRIVATE_FLAG)
#define FUTEX_UNLOCK_PI_PRIVATE (FUTEX_UNLOCK_PI | FUTEX_PRIVATE_FLAG)
#define FUTEX_TRYLOCK_PI_PRIVATE (FUTEX_TRYLOCK_PI | FUTEX_PRIVATE_FLAG)
#define FUTEX_WAIT_BITSET_PRIVATE (FUTEX_WAIT_BITSET | FUTEX_PRIVATE_FLAG)
#define FUTEX_WAKE_BITSET_PRIVATE (FUTEX_WAKE_BITSET | FUTEX_PRIVATE_FLAG)
#define FUTEX_WAIT_REQUEUE_PI_PRIVATE (FUTEX_WAIT_REQUEUE_PI | \
FUTEX_PRIVATE_FLAG)
#define FUTEX_CMP_REQUEUE_PI_PRIVATE (FUTEX_CMP_REQUEUE_PI | \
FUTEX_PRIVATE_FLAG)
/** \name Futex Operations, used for FUTEX_WAKE_OP
* @{
*/
#define FUTEX_OP_SET 0 /* *(int *)UADDR2 = OPARG; */
#define FUTEX_OP_ADD 1 /* *(int *)UADDR2 += OPARG; */
#define FUTEX_OP_OR 2 /* *(int *)UADDR2 |= OPARG; */
#define FUTEX_OP_ANDN 3 /* *(int *)UADDR2 &= ~OPARG; */
#define FUTEX_OP_XOR 4 /* *(int *)UADDR2 ^= OPARG; */
#define FUTEX_OP_OPARG_SHIFT 8U /* Use (1 << OPARG) instead of OPARG. */
#define FUTEX_OP_CMP_EQ 0 /* if (oldval == CMPARG) wake */
#define FUTEX_OP_CMP_NE 1 /* if (oldval != CMPARG) wake */
#define FUTEX_OP_CMP_LT 2 /* if (oldval < CMPARG) wake */
#define FUTEX_OP_CMP_LE 3 /* if (oldval <= CMPARG) wake */
#define FUTEX_OP_CMP_GT 4 /* if (oldval > CMPARG) wake */
#define FUTEX_OP_CMP_GE 5 /* if (oldval >= CMPARG) wake */
// @}
#define FUT_OFF_INODE 1 /* We set bit 0 if key has a reference on inode */
#define FUT_OFF_MMSHARED 2 /* We set bit 1 if key has a reference on mm */
#define FUTEX_HASHBITS 8 /* 256 entries in each futex hash tbl */
#define PS_RUNNING 0x1
#define PS_INTERRUPTIBLE 0x2
#define PS_UNINTERRUPTIBLE 0x4
#define PS_ZOMBIE 0x8
#define PS_EXITED 0x10
#define PS_STOPPED 0x20
static inline int get_futex_value_locked(uint32_t *dest, uint32_t *from)
{
int ret;
pagefault_disable();
ret = __get_user(*dest, from);
pagefault_enable();
return ret ? -EFAULT : 0;
}
union futex_key {
struct {
unsigned long pgoff;
void *phys;
int offset;
} shared;
struct {
unsigned long address;
void *mm; // Actually, process_vm
int offset;
} private;
struct {
unsigned long word;
void *ptr;
int offset;
} both;
};
#define FUTEX_KEY_INIT ((union futex_key) { .both = { .ptr = NULL } })
#define FUTEX_BITSET_MATCH_ANY 0xffffffff
/**
* struct futex_q - The hashed futex queue entry, one per waiting task
* @task: the task waiting on the futex
* @lock_ptr: the hash bucket lock
* @key: the key the futex is hashed on
* @requeue_pi_key: the requeue_pi target futex key
* @bitset: bitset for the optional bitmasked wakeup
*
* We use this hashed waitqueue, instead of a normal wait_queue_t, so
* we can wake only the relevant ones (hashed queues may be shared).
*
* A futex_q has a woken state, just like tasks have TASK_RUNNING.
* It is considered woken when plist_node_empty(&q->list) || q->lock_ptr == 0.
* The order of wakeup is always to make the first condition true, then
* the second.
*
* PI futexes are typically woken before they are removed from the hash list via
* the rt_mutex code. See unqueue_me_pi().
*/
struct futex_q {
struct mc_plist_node list;
void *task; // Actually, struct thread
_ihk_spinlock_t *lock_ptr;
union futex_key key;
union futex_key *requeue_pi_key;
uint32_t bitset;
/* Used to wake-up a thread running on a Linux CPU */
void *uti_futex_resp;
/* Used to send IPI directly to the waiter CPU */
int linux_cpu;
/* Used to wake-up a thread running on a McKernel from Linux */
void *th_spin_sleep;
void *th_status;
void *th_spin_sleep_lock;
void *proc_status;
void *proc_update_lock;
void *runq_lock;
void *clv_flags;
int intr_id;
int intr_vector;
unsigned long th_spin_sleep_pa;
unsigned long th_status_pa;
unsigned long th_spin_sleep_lock_pa;
unsigned long proc_status_pa;
unsigned long proc_update_lock_pa;
unsigned long runq_lock_pa;
unsigned long clv_flags_pa;
};
long do_futex(int n, unsigned long arg0, unsigned long arg1,
unsigned long arg2, unsigned long arg3,
unsigned long arg4, unsigned long arg5,
struct uti_info *uti_info,
void *uti_futex_resp);
void futex_remove_process(struct mcctrl_per_proc_data *ppd);
#endif
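
Note on the `FUTEX_OP_*` constants in the header above: they describe the operation and comparison carried by a `FUTEX_WAKE_OP` request, but the header does not show how the four fields are packed into one 32-bit word. The sketch below assumes the layout of Linux's `FUTEX_OP()` macro (4-bit op, 4-bit cmp, two 12-bit signed arguments). Whether the McKernel copy decodes the word the same way is an assumption, and the `wake_op_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the header above. */
#define FUTEX_OP_SET 0
#define FUTEX_OP_ADD 1
#define FUTEX_OP_OR 2
#define FUTEX_OP_ANDN 3
#define FUTEX_OP_XOR 4
#define FUTEX_OP_OPARG_SHIFT 8U
#define FUTEX_OP_CMP_EQ 0
#define FUTEX_OP_CMP_NE 1
#define FUTEX_OP_CMP_LT 2
#define FUTEX_OP_CMP_LE 3
#define FUTEX_OP_CMP_GT 4
#define FUTEX_OP_CMP_GE 5

struct wake_op { int op; int cmp; int32_t oparg; int32_t cmparg; };

/* Sign-extend the low 12 bits of v. */
static int32_t sign_extend12(uint32_t v)
{
    v &= 0xfffU;
    return (v & 0x800U) ? (int32_t)v - 0x1000 : (int32_t)v;
}

/* Pack the fields: op[31:28] cmp[27:24] oparg[23:12] cmparg[11:0]. */
static uint32_t wake_op_encode(int op, int oparg, int cmp, int cmparg)
{
    return ((uint32_t)(op & 0xf) << 28) | ((uint32_t)(cmp & 0xf) << 24) |
           ((uint32_t)(oparg & 0xfff) << 12) | (uint32_t)(cmparg & 0xfff);
}

static struct wake_op wake_op_decode(uint32_t enc)
{
    struct wake_op w;

    w.op = (int)((enc >> 28) & 0x7U);
    w.cmp = (int)((enc >> 24) & 0xfU);
    w.oparg = sign_extend12(enc >> 12);
    w.cmparg = sign_extend12(enc);
    /* FUTEX_OP_OPARG_SHIFT set: use (1 << oparg) instead of oparg. */
    if (enc & (FUTEX_OP_OPARG_SHIFT << 28))
        w.oparg = (int32_t)(1U << ((uint32_t)w.oparg & 31U));
    return w;
}

/* New value to store at uaddr2; unknown ops leave it unchanged here. */
static uint32_t wake_op_apply(const struct wake_op *w, uint32_t oldval)
{
    assert(w != 0);
    switch (w->op) {
    case FUTEX_OP_SET:  return (uint32_t)w->oparg;
    case FUTEX_OP_ADD:  return oldval + (uint32_t)w->oparg;
    case FUTEX_OP_OR:   return oldval | (uint32_t)w->oparg;
    case FUTEX_OP_ANDN: return oldval & ~(uint32_t)w->oparg;
    case FUTEX_OP_XOR:  return oldval ^ (uint32_t)w->oparg;
    default:            return oldval;
    }
}

/* Non-zero if the comparison against the old value asks for a wake. */
static int wake_op_should_wake(const struct wake_op *w, int32_t oldval)
{
    assert(w != 0);
    switch (w->cmp) {
    case FUTEX_OP_CMP_EQ: return oldval == w->cmparg;
    case FUTEX_OP_CMP_NE: return oldval != w->cmparg;
    case FUTEX_OP_CMP_LT: return oldval < w->cmparg;
    case FUTEX_OP_CMP_LE: return oldval <= w->cmparg;
    case FUTEX_OP_CMP_GT: return oldval > w->cmparg;
    case FUTEX_OP_CMP_GE: return oldval >= w->cmparg;
    default:              return 0;
    }
}
```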


@ -0,0 +1,277 @@
/* This is a copy of the necessary part from McKernel, for uti-futex */
/*
* Descending-priority-sorted double-linked list
*
* (C) 2002-2003 Intel Corp
* Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>.
*
* 2001-2005 (c) MontaVista Software, Inc.
* Daniel Walker <dwalker@mvista.com>
*
* (C) 2005 Thomas Gleixner <tglx@linutronix.de>
*
* Simplifications of the original code by
* Oleg Nesterov <oleg@tv-sign.ru>
*
* Licensed under the FSF's GNU Public License v2 or later.
*
* Based on simple lists (include/linux/list.h).
*
* This is a priority-sorted list of nodes; each node has a
* priority from INT_MIN (highest) to INT_MAX (lowest).
*
* Addition is O(K), removal is O(1), change of priority of a node is
* O(K) and K is the number of RT priority levels used in the system.
* (1 <= K <= 99)
*
* This list is really a list of lists:
*
* - The tier 1 list is the prio_list, different priority nodes.
*
* - The tier 2 list is the node_list, serialized nodes.
*
* Simple ASCII art explanation:
*
* |HEAD |
* | |
* |prio_list.prev|<------------------------------------|
* |prio_list.next|<->|pl|<->|pl|<--------------->|pl|<-|
* |10 | |10| |21| |21| |21| |40| (prio)
* | | | | | | | | | | | |
* | | | | | | | | | | | |
* |node_list.next|<->|nl|<->|nl|<->|nl|<->|nl|<->|nl|<-|
* |node_list.prev|<------------------------------------|
*
* The nodes on the prio_list list are sorted by priority to simplify
* the insertion of new nodes. There are no nodes with duplicate
* priorities on the list.
*
* The nodes on the node_list are ordered by priority and can contain
* entries which have the same priority. Those entries are ordered
* FIFO
*
* Addition means: look for the prio_list node in the prio_list
* for the priority of the node and insert it before the node_list
* entry of the next prio_list node. If it is the first node of
* that priority, add it to the prio_list in the right position and
* insert it into the serialized node_list list
*
* Removal means remove it from the node_list and remove it from
* the prio_list if the node_list list_head is non empty. In case
* of removal from the prio_list it must be checked whether other
* entries of the same priority are on the list or not. If there
* is another entry of the same priority then this entry has to
* replace the removed entry on the prio_list. If the entry which
* is removed is the only entry of this priority then a simple
* remove from both list is sufficient.
*
* INT_MIN is the highest priority, 0 is the medium highest, INT_MAX
* is lowest priority.
*
* No locking is done, up to the caller.
*
*/
#ifndef _MC_PLIST_H_
#define _MC_PLIST_H_
#include <arch-lock.h>
struct mc_plist_head {
struct list_head prio_list;
struct list_head node_list;
#ifdef CONFIG_DEBUG_PI_LIST
raw_spinlock_t *rawlock;
spinlock_t *spinlock;
#endif
};
struct mc_plist_node {
int prio;
struct mc_plist_head plist;
};
#ifdef CONFIG_DEBUG_PI_LIST
# define PLIST_HEAD_LOCK_INIT(_lock) .spinlock = _lock
# define PLIST_HEAD_LOCK_INIT_RAW(_lock) .rawlock = _lock
#else
# define PLIST_HEAD_LOCK_INIT(_lock)
# define PLIST_HEAD_LOCK_INIT_RAW(_lock)
#endif
#define _MCK_PLIST_HEAD_INIT(head) \
.prio_list = LIST_HEAD_INIT((head).prio_list), \
.node_list = LIST_HEAD_INIT((head).node_list)
/**
* PLIST_HEAD_INIT - static struct plist_head initializer
* @head: struct plist_head variable name
* @_lock: lock to initialize for this list
*/
#define MCK_PLIST_HEAD_INIT(head, _lock) \
{ \
_MCK_PLIST_HEAD_INIT(head), \
MCK_PLIST_HEAD_LOCK_INIT(&(_lock)) \
}
/**
* PLIST_HEAD_INIT_RAW - static struct plist_head initializer
* @head: struct plist_head variable name
* @_lock: lock to initialize for this list
*/
#define MCK_PLIST_HEAD_INIT_RAW(head, _lock) \
{ \
_MCK_PLIST_HEAD_INIT(head), \
MCK_PLIST_HEAD_LOCK_INIT_RAW(&(_lock)) \
}
/**
* PLIST_NODE_INIT - static struct plist_node initializer
* @node: struct plist_node variable name
* @__prio: initial node priority
*/
#define MCK_PLIST_NODE_INIT(node, __prio) \
{ \
.prio = (__prio), \
.plist = { _MCK_PLIST_HEAD_INIT((node).plist) }, \
}
/**
* plist_head_init - dynamic struct plist_head initializer
* @head: &struct plist_head pointer
* @lock: spinlock protecting the list (debugging)
*/
static inline void
mc_plist_head_init(struct mc_plist_head *head, _ihk_spinlock_t *lock)
{
INIT_LIST_HEAD(&head->prio_list);
INIT_LIST_HEAD(&head->node_list);
#ifdef CONFIG_DEBUG_PI_LIST
head->spinlock = lock;
head->rawlock = NULL;
#endif
}
/**
* plist_head_init_raw - dynamic struct plist_head initializer
* @head: &struct plist_head pointer
* @lock: raw_spinlock protecting the list (debugging)
*/
static inline void
mc_plist_head_init_raw(struct mc_plist_head *head, _ihk_spinlock_t *lock)
{
INIT_LIST_HEAD(&head->prio_list);
INIT_LIST_HEAD(&head->node_list);
#ifdef CONFIG_DEBUG_PI_LIST
head->rawlock = lock;
head->spinlock = NULL;
#endif
}
/**
* plist_node_init - Dynamic struct plist_node initializer
* @node: &struct plist_node pointer
* @prio: initial node priority
*/
static inline void mc_plist_node_init(struct mc_plist_node *node, int prio)
{
node->prio = prio;
mc_plist_head_init(&node->plist, NULL);
}
extern void mc_plist_add(struct mc_plist_node *node,
struct mc_plist_head *head);
extern void mc_plist_del(struct mc_plist_node *node,
struct mc_plist_head *head);
/**
* plist_for_each - iterate over the plist
* @pos: the type * to use as a loop counter
* @head: the head for your list
*/
#define mc_plist_for_each(pos, head) \
list_for_each_entry(pos, &(head)->node_list, plist.node_list)
/**
* plist_for_each_safe - iterate safely over a plist of given type
* @pos: the type * to use as a loop counter
* @n: another type * to use as temporary storage
* @head: the head for your list
*
* Iterate over a plist of given type, safe against removal of list entry.
*/
#define mc_plist_for_each_safe(pos, n, head) \
list_for_each_entry_safe(pos, n, &(head)->node_list, plist.node_list)
/**
* plist_for_each_entry - iterate over list of given type
* @pos: the type * to use as a loop counter
* @head: the head for your list
* @mem: the name of the list_struct within the struct
*/
#define mc_plist_for_each_entry(pos, head, mem) \
list_for_each_entry(pos, &(head)->node_list, mem.plist.node_list)
/**
* plist_for_each_entry_safe - iterate safely over list of given type
* @pos: the type * to use as a loop counter
* @n: another type * to use as temporary storage
* @head: the head for your list
* @m: the name of the list_struct within the struct
*
* Iterate over list of given type, safe against removal of list entry.
*/
#define mc_plist_for_each_entry_safe(pos, n, head, m) \
list_for_each_entry_safe(pos, n, &(head)->node_list, m.plist.node_list)
/**
* plist_head_empty - return !0 if a plist_head is empty
* @head: &struct plist_head pointer
*/
static inline int mc_plist_head_empty(const struct mc_plist_head *head)
{
return list_empty(&head->node_list);
}
/**
* plist_node_empty - return !0 if plist_node is not on a list
* @node: &struct plist_node pointer
*/
static inline int mc_plist_node_empty(const struct mc_plist_node *node)
{
return mc_plist_head_empty(&node->plist);
}
/* All functions below assume the plist_head is not empty. */
/**
* plist_first_entry - get the struct for the first entry
* @head: the &struct plist_head pointer
* @type: the type of the struct this is embedded in
* @member: the name of the list_struct within the struct
*/
#ifdef CONFIG_DEBUG_PI_LIST
# define mc_plist_first_entry(head, type, member) \
({ \
WARN_ON(mc_plist_head_empty(head)); \
container_of(mc_plist_first(head), type, member); \
})
#else
# define mc_plist_first_entry(head, type, member) \
container_of(mc_plist_first(head), type, member)
#endif
/**
* plist_first - return the first node (and thus, highest priority)
* @head: the &struct plist_head pointer
*
* Assumes the plist is _not_ empty.
*/
static inline struct mc_plist_node *mc_plist_first(
const struct mc_plist_head *head)
{
return list_entry(head->node_list.next,
struct mc_plist_node, plist.node_list);
}
#endif


@ -0,0 +1,100 @@
/* This is a copy of the necessary part from McKernel, for uti-futex */
#include <mc_plist.h>
#include <arch-lock.h>
#ifdef CONFIG_DEBUG_PI_LIST
static void mc_plist_check_prev_next(struct list_head *t, struct list_head *p,
struct list_head *n)
{
WARN(n->prev != p || p->next != n,
"top: %p, n: %p, p: %p\n"
"prev: %p, n: %p, p: %p\n"
"next: %p, n: %p, p: %p\n",
t, t->next, t->prev,
p, p->next, p->prev,
n, n->next, n->prev);
}
static void mc_plist_check_list(struct list_head *top)
{
struct list_head *prev = top, *next = top->next;
mc_plist_check_prev_next(top, prev, next);
while (next != top) {
prev = next;
next = prev->next;
mc_plist_check_prev_next(top, prev, next);
}
}
static void mc_plist_check_head(struct mc_plist_head *head)
{
WARN_ON(!head->rawlock && !head->spinlock);
if (head->rawlock)
WARN_ON_SMP(!raw_spin_is_locked(head->rawlock));
if (head->spinlock)
WARN_ON_SMP(!spin_is_locked(head->spinlock));
mc_plist_check_list(&head->prio_list);
mc_plist_check_list(&head->node_list);
}
#else
# define mc_plist_check_head(h) do { } while (0)
#endif
/**
* plist_add - add @node to @head
*
* @node: &struct plist_node pointer
* @head: &struct plist_head pointer
*/
void mc_plist_add(struct mc_plist_node *node, struct mc_plist_head *head)
{
struct mc_plist_node *iter;
mc_plist_check_head(head);
#if 0
WARN_ON(!plist_node_empty(node));
#endif
list_for_each_entry(iter, &head->prio_list, plist.prio_list) {
if (node->prio < iter->prio)
goto lt_prio;
else if (node->prio == iter->prio) {
iter = list_entry(iter->plist.prio_list.next,
struct mc_plist_node, plist.prio_list);
goto eq_prio;
}
}
lt_prio:
list_add_tail(&node->plist.prio_list, &iter->plist.prio_list);
eq_prio:
list_add_tail(&node->plist.node_list, &iter->plist.node_list);
mc_plist_check_head(head);
}
/**
* plist_del - Remove a @node from plist.
*
* @node: &struct plist_node pointer - entry to be removed
* @head: &struct plist_head pointer - list head
*/
void mc_plist_del(struct mc_plist_node *node, struct mc_plist_head *head)
{
mc_plist_check_head(head);
if (!list_empty(&node->plist.prio_list)) {
struct mc_plist_node *next = mc_plist_first(&node->plist);
list_move_tail(&next->plist.prio_list, &node->plist.prio_list);
list_del_init(&node->plist.prio_list);
}
list_del_init(&node->plist.node_list);
mc_plist_check_head(head);
}
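
Note on the ordering contract of `mc_plist_add()` and `mc_plist_del()` above: nodes are kept in ascending `prio` order (lower value means higher priority), and nodes of equal priority stay in FIFO order. The sketch below is a deliberately simplified single-list model that drops the `prio_list` tier, so insertion is linear in the number of nodes rather than in the number of priority levels. The `pnode` type and `pnode_*` functions are hypothetical and are not the McKernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Circular doubly-linked list node; the head is a sentinel of the same type. */
struct pnode {
    int prio;
    struct pnode *prev, *next;
};

static void pnode_head_init(struct pnode *head)
{
    head->prio = 0;
    head->prev = head->next = head;
}

/*
 * Insert before the first node whose priority value is strictly greater.
 * Skipping nodes with an equal value is what yields FIFO order among
 * entries of the same priority.
 */
static void pnode_add(struct pnode *node, struct pnode *head)
{
    struct pnode *it = head->next;

    while (it != head && it->prio <= node->prio)
        it = it->next;
    node->next = it;
    node->prev = it->prev;
    it->prev->next = node;
    it->prev = node;
}

/* Unlink the node and leave it self-linked, like list_del_init(). */
static void pnode_del(struct pnode *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->prev = node->next = node;
}

/* Highest-priority node, or NULL when the list is empty. */
static struct pnode *pnode_first(struct pnode *head)
{
    return head->next == head ? NULL : head->next;
}
```

As the header comment states for the real list, no locking is done here either; callers are expected to serialize access.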


@ -58,7 +58,8 @@
#define SCD_MSG_SEND_SIGNAL 0x7
#define SCD_MSG_SEND_SIGNAL_ACK 0x8
#define SCD_MSG_CLEANUP_PROCESS 0x9
#define SCD_MSG_GET_VDSO_INFO 0xa
#define SCD_MSG_CLEANUP_PROCESS_RESP 0xa
#define SCD_MSG_GET_VDSO_INFO 0xb
//#define SCD_MSG_GET_CPU_MAPPING 0xc
//#define SCD_MSG_REPLY_GET_CPU_MAPPING 0xd
@ -104,6 +105,8 @@
#define SCD_MSG_CPU_RW_REG 0x52
#define SCD_MSG_CPU_RW_REG_RESP 0x53
#define SCD_MSG_CLEANUP_FD 0x54
#define SCD_MSG_CLEANUP_FD_RESP 0x55
#define SCD_MSG_FUTEX_WAKE 0x60
@ -260,6 +263,9 @@ struct mcctrl_per_proc_data {
struct list_head devobj_pager_list;
struct semaphore devobj_pager_lock;
int enable_tofu;
struct rb_root rva_to_rpa_cache;
};
struct sysfsm_req {
@ -324,13 +330,20 @@ struct process_list_item {
wait_queue_head_t pli_wq;
};
#define PE_LIST_MAXLEN 5
struct mcctrl_part_exec {
struct mutex lock;
int nr_processes;
/* number of processes to let in / out the synchronization point */
int nr_processes_left;
/* number of processes which have joined the partition */
int nr_processes_joined;
int process_rank;
pid_t node_proxy_pid;
cpumask_t cpus_used;
struct list_head pli_list;
struct list_head chain;
};
#define CPU_LONGS (((NR_CPUS) + (BITS_PER_LONG) - 1) / (BITS_PER_LONG))
@ -353,6 +366,7 @@ struct mcctrl_usrdata {
int job_pos;
int mcctrl_dma_abort;
struct mutex reserve_lock;
struct mutex part_exec_lock;
unsigned long last_thread_exec;
wait_queue_head_t wq_procfs;
struct list_head per_proc_data_hash[MCCTRL_PER_PROC_DATA_HASH_SIZE];
@ -368,7 +382,7 @@ struct mcctrl_usrdata {
nodemask_t numa_online;
struct list_head cpu_topology_list;
struct list_head node_topology_list;
struct mcctrl_part_exec part_exec;
struct list_head part_exec_list;
int perf_event_num;
};
@ -453,7 +467,8 @@ struct mcctrl_per_thread_data *mcctrl_get_per_thread_data(struct mcctrl_per_proc
struct task_struct *task);
int mcctrl_clear_pte_range(uintptr_t start, uintptr_t len);
void __return_syscall(ihk_os_t os, struct ikc_scd_packet *packet,
void __return_syscall(ihk_os_t os, struct mcctrl_per_proc_data *ppd,
struct ikc_scd_packet *packet,
long ret, int stid);
int clear_pte_range(uintptr_t start, uintptr_t len);
@ -548,4 +563,34 @@ struct uti_futex_resp {
int done;
wait_queue_head_t wq;
};
#ifdef ENABLE_TOFU
/*
* Hash table to keep track of files and related processes
* and file descriptors.
* NOTE: Used for Tofu driver release handlers.
*/
#define MCCTRL_FILE_2_PIDFD_HASH_SHIFT 4
#define MCCTRL_FILE_2_PIDFD_HASH_SIZE (1 << MCCTRL_FILE_2_PIDFD_HASH_SHIFT)
#define MCCTRL_FILE_2_PIDFD_HASH_MASK (MCCTRL_FILE_2_PIDFD_HASH_SIZE - 1)
struct mcctrl_file_to_pidfd {
struct file *filp;
ihk_os_t os;
struct task_struct *group_leader;
int pid;
int fd;
struct list_head hash;
char tofu_dev_path[128];
void *pde_data;
};
int mcctrl_file_to_pidfd_hash_insert(struct file *filp,
ihk_os_t os, int pid, struct task_struct *group_leader, int fd,
char *path, void *pde_data);
struct mcctrl_file_to_pidfd *mcctrl_file_to_pidfd_hash_lookup(
struct file *filp, struct task_struct *group_leader);
int mcctrl_file_to_pidfd_hash_remove(struct file *filp,
ihk_os_t os, struct task_struct *group_leader, int fd);
#endif
#endif
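
Note on the `MCCTRL_FILE_2_PIDFD_HASH_*` macros above: the insert and lookup functions in this change pick a bucket by masking the `struct file` pointer with `MCCTRL_FILE_2_PIDFD_HASH_MASK`, that is, by its low four bits. The sketch below reproduces that computation in user space. `bucket_shifted` is a hypothetical alternative, not part of the diff, that discards an assumed number of alignment bits first, since the low bits of pointers to aligned objects tend to carry little information.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the header above. */
#define MCCTRL_FILE_2_PIDFD_HASH_SHIFT 4
#define MCCTRL_FILE_2_PIDFD_HASH_SIZE (1 << MCCTRL_FILE_2_PIDFD_HASH_SHIFT)
#define MCCTRL_FILE_2_PIDFD_HASH_MASK (MCCTRL_FILE_2_PIDFD_HASH_SIZE - 1)

/* Bucket index as computed in the diff: low bits of the pointer value. */
static int bucket_low_bits(uintptr_t filp)
{
    return (int)(filp & (uintptr_t)MCCTRL_FILE_2_PIDFD_HASH_MASK);
}

/*
 * Hypothetical variant (assumption, not in the diff): shift out
 * align_bits low bits before masking so that aligned pointers spread
 * over more than one bucket.
 */
static int bucket_shifted(uintptr_t filp, unsigned int align_bits)
{
    return (int)((filp >> align_bits) &
                 (uintptr_t)MCCTRL_FILE_2_PIDFD_HASH_MASK);
}
```

Both variants remain correct as hash functions because lookups compare `filp` and `group_leader` inside the bucket; the choice only affects how long each bucket's list gets.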


@ -126,7 +126,7 @@ find_procfs_entry(struct procfs_list_entry *parent, const char *name)
static void
delete_procfs_entries(struct procfs_list_entry *top)
{
struct procfs_list_entry *e;
struct procfs_list_entry *e = NULL;
struct procfs_list_entry *n;
list_del(&top->list);
@ -136,8 +136,10 @@ delete_procfs_entries(struct procfs_list_entry *top)
}
#if LINUX_VERSION_CODE < KERNEL_VERSION(3,10,0)
e->entry->read_proc = NULL;
e->entry->data = NULL;
if (e) {
e->entry->read_proc = NULL;
e->entry->data = NULL;
}
#endif
remove_proc_entry(top->name, top->parent? top->parent->entry: NULL);
if(top->data)


@ -45,6 +45,9 @@
#include <linux/mount.h>
#include <linux/kdev_t.h>
#include <linux/hugetlb.h>
#include <linux/proc_fs.h>
#include <linux/rbtree.h>
#include <linux/llist.h>
#include <asm/uaccess.h>
#include <asm/delay.h>
#include <asm/io.h>
@ -52,6 +55,7 @@
#include "mcctrl.h"
#include <linux/version.h>
#include <archdeps.h>
#include <asm/pgtable.h>
#define ALIGN_WAIT_BUF(z) (((z + 63) >> 6) << 6)
@ -362,6 +366,7 @@ retry_alloc:
#define STATUS_IN_PROGRESS 0
#define STATUS_SYSCALL 4
#define __NR_syscall_response 8001
req->valid = 0;
if (__notify_syscall_requester(usrdata->os, packet, resp) < 0) {
@ -436,7 +441,7 @@ retry_alloc:
req->valid = 0;
/* check result */
if (req->number != __NR_mmap) {
if (req->number != __NR_syscall_response) {
printk("%s:unexpected response. %lx %lx\n",
__FUNCTION__, req->number, req->args[0]);
syscall_ret = -EIO;
@ -655,6 +660,9 @@ static int rus_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
goto put_and_out;
}
// Force regular page size
pgsize = PAGE_SIZE;
rva = (unsigned long)addr & ~(pgsize - 1);
rpa = rpa & ~(pgsize - 1);
@ -666,7 +674,8 @@ static int rus_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
/* LWK may hold large page based mappings that align rva outside
* Linux' VMA, make sure we don't try to map to those pages */
if (rva + (pix * PAGE_SIZE) < vma->vm_start) {
if (rva + (pix * PAGE_SIZE) < vma->vm_start ||
rva + (pix * PAGE_SIZE) > vma->vm_end) {
continue;
}
@ -677,21 +686,27 @@ static int rus_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
if (error) {
pr_err("%s: error inserting mapping for 0x%#lx "
"(req: TID: %d, syscall: %lu) error: %d,"
" vm_start: 0x%lx, vm_end: 0x%lx\n",
" vm_start: 0x%lx, vm_end: 0x%lx, pgsize: %lu, ind: %lu\n",
__func__,
(unsigned long)addr, packet.fault_tid,
rsysnum, error,
vma->vm_start, vma->vm_end);
vma->vm_start, vma->vm_end, pgsize, pix);
}
}
else
else {
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 18, 0)
error = vmf_insert_pfn(vma, rva+(pix*PAGE_SIZE),
pfn+pix);
if (error == VM_FAULT_NOPAGE) {
dprintk("%s: vmf_insert_pfn returned %d\n",
__func__, error);
error = 0;
}
#else
error = vm_insert_pfn(vma, rva+(pix*PAGE_SIZE),
pfn+pix);
#endif
}
if (error) {
pr_err("%s: vm_insert_pfn returned %d\n",
__func__, error);
@ -1835,20 +1850,165 @@ static long pager_call(ihk_os_t os, struct syscall_request *req)
return ret;
}
void __return_syscall(ihk_os_t os, struct ikc_scd_packet *packet,
#ifdef ENABLE_TOFU
struct list_head mcctrl_file_to_pidfd_hash[MCCTRL_FILE_2_PIDFD_HASH_SIZE];
spinlock_t mcctrl_file_to_pidfd_hash_lock;
void mcctrl_file_to_pidfd_hash_init(void)
{
int hash;
spin_lock_init(&mcctrl_file_to_pidfd_hash_lock);
for (hash = 0; hash < MCCTRL_FILE_2_PIDFD_HASH_SIZE; ++hash) {
INIT_LIST_HEAD(&mcctrl_file_to_pidfd_hash[hash]);
}
}
int mcctrl_file_to_pidfd_hash_insert(struct file *filp,
ihk_os_t os, int pid, struct task_struct *group_leader, int fd,
char *path, void *pde_data)
{
unsigned long irqflags;
struct mcctrl_file_to_pidfd *file2pidfd_iter;
struct mcctrl_file_to_pidfd *file2pidfd;
int hash = (int)((unsigned long)filp &
(unsigned long)MCCTRL_FILE_2_PIDFD_HASH_MASK);
int ret = 0;
file2pidfd = kmalloc(sizeof(*file2pidfd), GFP_ATOMIC);
if (!file2pidfd)
return -ENOMEM;
file2pidfd->filp = filp;
file2pidfd->os = os;
file2pidfd->pid = pid;
file2pidfd->group_leader = group_leader;
file2pidfd->fd = fd;
/* Only copy the name under /proc/tofu/dev/ */
strncpy(file2pidfd->tofu_dev_path, path + 15, 128);
file2pidfd->pde_data = pde_data;
spin_lock_irqsave(&mcctrl_file_to_pidfd_hash_lock, irqflags);
list_for_each_entry(file2pidfd_iter,
&mcctrl_file_to_pidfd_hash[hash], hash) {
if (file2pidfd_iter->filp == filp) {
printk("%s: WARNING: filp: %p, pid: %d, fd: %d exists\n",
__func__, filp, pid, fd);
ret = -EBUSY;
goto free_out;
}
}
list_add_tail(&file2pidfd->hash,
&mcctrl_file_to_pidfd_hash[hash]);
dprintk("%s: filp: %p, pid: %d, fd: %d added\n",
__func__, filp, pid, fd);
spin_unlock_irqrestore(&mcctrl_file_to_pidfd_hash_lock, irqflags);
return ret;
free_out:
kfree(file2pidfd);
spin_unlock_irqrestore(&mcctrl_file_to_pidfd_hash_lock, irqflags);
return ret;
}
/*
* XXX: lookup relies on group_leader to identify the process
* because PIDs might be different across namespaces (e.g.,
* when using Docker)
*/
struct mcctrl_file_to_pidfd *mcctrl_file_to_pidfd_hash_lookup(
struct file *filp, struct task_struct *group_leader)
{
unsigned long irqflags;
struct mcctrl_file_to_pidfd *file2pidfd_iter;
struct mcctrl_file_to_pidfd *file2pidfd = NULL;
int hash = (int)((unsigned long)filp &
(unsigned long)MCCTRL_FILE_2_PIDFD_HASH_MASK);
spin_lock_irqsave(&mcctrl_file_to_pidfd_hash_lock, irqflags);
list_for_each_entry(file2pidfd_iter,
&mcctrl_file_to_pidfd_hash[hash], hash) {
if (file2pidfd_iter->filp == filp &&
file2pidfd_iter->group_leader == group_leader) {
file2pidfd = file2pidfd_iter;
dprintk("%s: filp: %p, pid: %d, fd: %d found\n",
__func__, filp, file2pidfd->pid, file2pidfd->fd);
break;
}
}
spin_unlock_irqrestore(&mcctrl_file_to_pidfd_hash_lock, irqflags);
return file2pidfd;
}
int mcctrl_file_to_pidfd_hash_remove(struct file *filp,
ihk_os_t os, struct task_struct *group_leader, int fd)
{
unsigned long irqflags;
struct mcctrl_file_to_pidfd *file2pidfd_iter;
int hash = (int)((unsigned long)filp &
(unsigned long)MCCTRL_FILE_2_PIDFD_HASH_MASK);
int ret = 0;
spin_lock_irqsave(&mcctrl_file_to_pidfd_hash_lock, irqflags);
list_for_each_entry(file2pidfd_iter,
&mcctrl_file_to_pidfd_hash[hash], hash) {
if (file2pidfd_iter->filp != filp)
continue;
if (file2pidfd_iter->os != os)
continue;
if (file2pidfd_iter->group_leader != group_leader)
continue;
if (file2pidfd_iter->fd != fd)
continue;
list_del(&file2pidfd_iter->hash);
dprintk("%s: filp: %p, pid: %d, fd: %d removed\n",
__func__, filp, file2pidfd_iter->pid, fd);
kfree(file2pidfd_iter);
goto unlock_out;
}
dprintk("%s: filp: %p, fd: %d couldn't be found\n",
__func__, filp, fd);
ret = -ENOENT;
unlock_out:
spin_unlock_irqrestore(&mcctrl_file_to_pidfd_hash_lock, irqflags);
return ret;
}
#endif
void __return_syscall(ihk_os_t os, struct mcctrl_per_proc_data *ppd,
struct ikc_scd_packet *packet,
long ret, int stid)
{
unsigned long phys;
struct syscall_response *res;
if (!os || ihk_host_validate_os(os) || !packet) {
return;
}
phys = ihk_device_map_memory(ihk_os_to_dev(os),
packet->resp_pa, sizeof(*res));
if (!phys) {
return;
}
res = ihk_device_map_virtual(ihk_os_to_dev(os),
phys, sizeof(*res), NULL, 0);
if (!res) {
printk("%s: ERROR: invalid response structure address\n",
__FUNCTION__);
ihk_device_unmap_memory(ihk_os_to_dev(os), phys, sizeof(*res));
return;
}
@ -1856,6 +2016,109 @@ void __return_syscall(ihk_os_t os, struct ikc_scd_packet *packet,
res->ret = ret;
res->stid = stid;
#ifdef ENABLE_TOFU
/* Tofu enabled process? */
if (ppd && ppd->enable_tofu) {
char *pathbuf, *fullpath;
/* Record PDE_DATA after open() calls for Tofu driver */
if (packet->req.number == __NR_openat && ret > 1) {
struct fd f;
int fd;
fd = ret;
f = fdget(fd);
if (!f.file) {
goto out_notify;
}
pathbuf = (char *)__get_free_page(GFP_ATOMIC);
if (!pathbuf) {
goto out_fdput_open;
}
fullpath = d_path(&f.file->f_path, pathbuf, PAGE_SIZE);
if (IS_ERR(fullpath)) {
goto out_free_open;
}
if (!strncmp("/proc/tofu/dev/", fullpath, 15)) {
res->pde_data = PDE_DATA(file_inode(f.file));
dprintk("%s: fd: %d, path: %s, PDE_DATA: 0x%lx\n",
__func__,
fd,
fullpath,
(unsigned long)res->pde_data);
dprintk("%s: pgd_index: %ld, pmd_index: %ld, pte_index: %ld\n",
__func__,
pgd_index((unsigned long)res->pde_data),
pmd_index((unsigned long)res->pde_data),
pte_index((unsigned long)res->pde_data));
dprintk("MAX_USER_VA_BITS: %d, PGDIR_SHIFT: %d\n",
MAX_USER_VA_BITS, PGDIR_SHIFT);
mcctrl_file_to_pidfd_hash_insert(f.file, os,
task_tgid_vnr(current),
current->group_leader, fd,
fullpath, res->pde_data);
}
out_free_open:
free_page((unsigned long)pathbuf);
out_fdput_open:
fdput(f);
}
/* Ioctl on Tofu CQ? */
else if (packet->req.number == __NR_ioctl &&
packet->req.args[0] > 0 && ret == 0) {
struct fd f;
int fd;
int tni, cq;
long __ret;
fd = packet->req.args[0];
f = fdget(fd);
if (!f.file) {
goto out_notify;
}
pathbuf = (char *)__get_free_page(GFP_ATOMIC);
if (!pathbuf) {
goto out_fdput_ioctl;
}
fullpath = d_path(&f.file->f_path, pathbuf, PAGE_SIZE);
if (IS_ERR(fullpath)) {
goto out_free_ioctl;
}
/* Looking for /proc/tofu/dev/tniXcqY pattern */
__ret = sscanf(fullpath, "/proc/tofu/dev/tni%dcq%d", &tni, &cq);
if (__ret == 2) {
extern long __mcctrl_tof_utofu_unlocked_ioctl_cq(void *pde_data,
unsigned int cmd, unsigned long arg);
dprintk("%s: ioctl(): fd: %d, path: %s\n",
__func__,
fd,
fullpath);
__ret = __mcctrl_tof_utofu_unlocked_ioctl_cq(
PDE_DATA(file_inode(f.file)),
packet->req.args[1], packet->req.args[2]);
}
out_free_ioctl:
free_page((unsigned long)pathbuf);
out_fdput_ioctl:
fdput(f);
}
}
out_notify:
#endif
if (__notify_syscall_requester(os, packet, res) < 0) {
printk("%s: WARNING: failed to notify PID %d\n",
__FUNCTION__, packet->pid);
@ -2158,11 +2421,98 @@ int __do_in_kernel_irq_syscall(ihk_os_t os, struct ikc_scd_packet *packet)
if (ret == -ENOSYS)
return -ENOSYS;
__return_syscall(os, packet, ret, 0);
__return_syscall(os, NULL, packet, ret, 0);
return 0;
}
/*
* Memory clearing helpers.
*/
struct node_distance;
#define IHK_RBTREE_ALLOCATOR
#ifdef IHK_RBTREE_ALLOCATOR
struct free_chunk {
unsigned long addr, size;
struct rb_node node;
struct llist_node list;
};
#endif
typedef struct mcs_lock_node {
#ifndef SPIN_LOCK_IN_MCS
unsigned long locked;
struct mcs_lock_node *next;
#endif
unsigned long irqsave;
#ifdef SPIN_LOCK_IN_MCS
ihk_spinlock_t spinlock;
#endif
#ifndef ENABLE_UBSAN
} __aligned(64) mcs_lock_node_t;
#else
} mcs_lock_node_t;
#endif
struct ihk_mc_numa_node {
int id;
int linux_numa_id;
int type;
struct list_head allocators;
struct node_distance *nodes_by_distance;
#ifdef IHK_RBTREE_ALLOCATOR
atomic_t zeroing_workers;
atomic_t nr_to_zero_pages;
struct llist_head zeroed_list;
struct llist_head to_zero_list;
struct rb_root free_chunks;
mcs_lock_node_t lock;
unsigned long nr_pages;
/*
* nr_free_pages: all freed pages, zeroed if zero_at_free
*/
unsigned long nr_free_pages;
unsigned long min_addr;
unsigned long max_addr;
#endif
};
void mcctrl_zero_mckernel_pages(unsigned long arg)
{
struct llist_node *llnode;
struct ihk_mc_numa_node *node =
(struct ihk_mc_numa_node *)arg;
/* Iterate free chunks */
while ((llnode = llist_del_first(&node->to_zero_list))) {
unsigned long addr;
unsigned long size;
struct free_chunk *chunk =
container_of(llnode, struct free_chunk, list);
addr = chunk->addr;
size = chunk->size;
memset(phys_to_virt(addr) + sizeof(*chunk), 0,
chunk->size - sizeof(*chunk));
llist_add(&chunk->list, &node->zeroed_list);
dprintk("%s: zeroed %lu pages @ McKernel NUMA %d (chunk: 0x%lx:%lu)\n",
__func__,
size >> PAGE_SHIFT,
node->id,
addr, size);
barrier();
atomic_sub((int)(size >> PAGE_SHIFT), &node->nr_to_zero_pages);
}
atomic_dec(&node->zeroing_workers);
}
int __do_in_kernel_syscall(ihk_os_t os, struct ikc_scd_packet *packet)
{
struct syscall_request *sc = &packet->req;
@ -2171,6 +2521,28 @@ int __do_in_kernel_syscall(ihk_os_t os, struct ikc_scd_packet *packet)
dprintk("%s: system call: %lx\n", __FUNCTION__, sc->args[0]);
switch (sc->number) {
#ifdef ENABLE_TOFU
case __NR_close: {
struct fd f;
int fd;
fd = (int)sc->args[0];
if (fd > 2) {
f = fdget(fd);
if (f.file) {
mcctrl_file_to_pidfd_hash_remove(f.file, os,
current->group_leader, fd);
fdput(f);
}
}
error = -ENOSYS;
goto out;
break;
}
#endif
case __NR_mmap:
ret = pager_call(os, sc);
break;
@ -2183,6 +2555,14 @@ int __do_in_kernel_syscall(ihk_os_t os, struct ikc_scd_packet *packet)
ret = remap_user_space(sc->args[0], sc->args[1], sc->args[2]);
break;
case __NR_move_pages:
/*
* move_pages is used for zeroing McKernel-side memory;
* applications never offload this call.
*/
mcctrl_zero_mckernel_pages(sc->args[0]);
goto out_no_syscall_return;
case __NR_exit_group: {
/* Make sure the user space handler will be called as well */
@ -2266,7 +2646,9 @@ sched_setparam_out:
break;
}
__return_syscall(os, packet, ret, 0);
__return_syscall(os, NULL, packet, ret, 0);
out_no_syscall_return:
ihk_ikc_release_packet((struct ihk_ikc_free_packet *)packet);
error = 0;
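
Reviewer note: the file-to-(pid, fd) table added in this file picks a bucket by masking the `struct file` pointer and keeps colliding entries on a per-bucket list. Below is a minimal user-space sketch of that pattern, assuming a power-of-two bucket count. `demo_entry`, `DEMO_HASH_SIZE` and the `demo_*` functions are hypothetical names for illustration, not the mcctrl API. Locking is omitted, and the sketch shifts the pointer before masking, which the patch does not do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_PTR_SHIFT 4                 /* assumption: drop low alignment bits */
#define DEMO_HASH_SIZE 16                /* must be a power of two */
#define DEMO_HASH_MASK (DEMO_HASH_SIZE - 1)

struct demo_entry {
	const void *key;                 /* stands in for struct file * */
	int fd;
	struct demo_entry *next;         /* collision chain */
};

static struct demo_entry *demo_table[DEMO_HASH_SIZE];

/* Bucket index derived from the pointer value itself */
unsigned int demo_hash(const void *key)
{
	assert(key != NULL);
	return (unsigned int)(((uintptr_t)key >> DEMO_PTR_SHIFT) &
			      DEMO_HASH_MASK);
}

/* Returns 0 on success, -EBUSY if the key exists, -ENOMEM on failure */
int demo_insert(const void *key, int fd)
{
	unsigned int h = demo_hash(key);
	struct demo_entry *e;

	for (e = demo_table[h]; e; e = e->next) {
		if (e->key == key)
			return -EBUSY;
	}
	e = malloc(sizeof(*e));
	if (!e)
		return -ENOMEM;
	e->key = key;
	e->fd = fd;
	e->next = demo_table[h];
	demo_table[h] = e;
	return 0;
}

/* Returns the entry for key, or NULL if there is none */
struct demo_entry *demo_lookup(const void *key)
{
	struct demo_entry *e;

	for (e = demo_table[demo_hash(key)]; e; e = e->next) {
		if (e->key == key)
			return e;
	}
	return NULL;
}

/* Removes the entry matching both key and fd; -ENOENT if not found */
int demo_remove(const void *key, int fd)
{
	struct demo_entry **pp;

	for (pp = &demo_table[demo_hash(key)]; *pp; pp = &(*pp)->next) {
		struct demo_entry *e = *pp;

		if (e->key == key && e->fd == fd) {
			*pp = e->next;
			free(e);
			return 0;
		}
	}
	return -ENOENT;
}
```

The sketch checks for a duplicate before allocating, so a rejected insert needs no cleanup path. The patch allocates first and frees on the duplicate path instead.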


@ -69,13 +69,18 @@ if (ENABLE_QLMPI)
endif()
if (ENABLE_UTI)
link_directories("${CMAKE_CURRENT_BINARY_DIR}/lib/syscall_intercept")
add_library(mck_syscall_intercept SHARED syscall_intercept.c arch/${ARCH}/archdep_c.c)
# target name is defined by add_library(), not project() or add_subdirectory()
add_dependencies(mck_syscall_intercept syscall_intercept_shared)
if (${ARCH} STREQUAL "arm64")
set_source_files_properties(syscall_intercept.c PROPERTIES COMPILE_FLAGS -mgeneral-regs-only)
endif()
target_link_libraries(mck_syscall_intercept ${LIBSYSCALL_INTERCEPT_LIBRARIES})
target_include_directories(mck_syscall_intercept PRIVATE ${LIBSYSCALL_INTERCEPT_INCLUDE_DIRS})
set_target_properties(mck_syscall_intercept PROPERTIES INSTALL_RPATH_USE_LINK_PATH TRUE)
target_link_libraries(mck_syscall_intercept syscall_intercept)
target_include_directories(mck_syscall_intercept PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/lib/syscall_intercept/include)
set_target_properties(mck_syscall_intercept PROPERTIES INSTALL_RPATH ${CMAKE_INSTALL_PREFIX}/lib64)
install(TARGETS mck_syscall_intercept
DESTINATION "${CMAKE_INSTALL_LIBDIR}")


@ -1179,7 +1179,7 @@ static int start_gdb(void) {
sprintf(buf, "target remote :%d", ntohs(sin.sin_port));
execlp("gdb", "eclair", "-q", "-ex", "set prompt (eclair) ",
"-ex", buf, opt.kernel_path, NULL);
"-ex", buf, opt.kernel_path, "-ex", "set pagination off", NULL);
perror("execlp");
return 3;
}


@ -1,3 +1,49 @@
if (NOT LIBDWARF)
add_subdirectory(libdwarf)
endif()
if (ENABLE_UTI)
if (${ARCH} STREQUAL "arm64")
set(SYSCALL_INTERCEPT_SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/syscall_intercept/arch/aarch64" CACHE STRING "path to syscall_intercept source directory")
elseif (${ARCH} STREQUAL "x86_64")
set(SYSCALL_INTERCEPT_SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/syscall_intercept" CACHE STRING "path to syscall_intercept source directory")
endif()
# syscall_intercept
# change cmake options only in this directory
SET(CMAKE_BUILD_TYPE Release CACHE STRING "release build" FORCE)
SET(TREAT_WARNINGS_AS_ERRORS OFF CACHE BOOL "ignore warnings" FORCE)
add_subdirectory(${SYSCALL_INTERCEPT_SOURCE_DIR} syscall_intercept)
# libuti
find_path(LIBCAP_INCLUDE_DIRS
capability.h
PATHS /usr/include/sys
NO_DEFAULT_PATH)
find_library(LIBCAP_LIBRARIES
NAME cap
PATHS /usr/lib64
NO_DEFAULT_PATH)
if (NOT LIBCAP_INCLUDE_DIRS OR NOT LIBCAP_LIBRARIES)
message(FATAL_ERROR "error: couldn't find libcap")
endif()
include(ExternalProject)
# Install libuti.so.* into <prefix>/mck/ so that mcexec can
# redirect ld*.so's access to it. In this way, a.out created
# by Fujitsu MPI and linked to libuti.so in the standard path
# can use the McKernel version when invoked through mcexec.
ExternalProject_Add(libuti
SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR}/uti
BINARY_DIR ${CMAKE_CURRENT_BINARY_DIR}/uti
INSTALL_DIR ${prefix}
CONFIGURE_COMMAND ${CMAKE_CURRENT_SOURCE_DIR}/uti/configure --prefix=<INSTALL_DIR> --libdir=<INSTALL_DIR>/lib64 --disable-static --with-rm=mckernel
BUILD_COMMAND ${MAKE}
BUILD_IN_SOURCE FALSE
INSTALL_COMMAND ${MAKE} install && bash -c "rm ${prefix}/include/uti.h ${prefix}/lib64/libuti.la && [[ -d ${prefix}/lib64/mck ]] || mkdir ${prefix}/lib64/mck && mv ${prefix}/lib64/libuti.* ${prefix}/lib64/mck"
)
endif()
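
Reviewer note: the comment above says libuti.so is installed under `<prefix>/mck/` so that mcexec can redirect the loader's access to it. The path rewrite this implies can be sketched as follows. It is a minimal sketch, assuming the library directory is already known (mcexec obtains it with `find_libdir()` in this change set). `demo_redirect_libuti` is a hypothetical name, not a function from the tree.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Rewrite a path that names libuti.so into "<libdir>/mck/<basename>".
 * Returns the number of characters written (as snprintf reports it),
 * or -1 if the path does not mention libuti.so or the result does not
 * fit into buf.
 */
int demo_redirect_libuti(const char *path, const char *libdir,
			 char *buf, size_t buflen)
{
	const char *base;
	int n;

	assert(path && libdir && buf);

	if (!strstr(path, "libuti.so"))
		return -1;

	/* Keep only the file name, including any version suffix */
	base = strrchr(path, '/');
	base = base ? base + 1 : path;

	n = snprintf(buf, buflen, "%s/mck/%s", libdir, base);
	if (n < 0 || (size_t)n >= buflen)
		return -1;      /* output error or truncated result */
	return n;
}
```

Keeping the basename intact means versioned names such as `libuti.so.0` map to the file of the same name under `mck/`.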

Submodule executer/user/lib/uti added at 8c5a556814


@ -68,13 +68,13 @@
#include <sys/user.h>
#endif /* !__aarch64__ */
#include <sys/prctl.h>
#include "../../config.h"
#include "../include/uprotocol.h"
#include <ihk/ihk_host_user.h>
#include "../include/uti.h"
#include <getopt.h>
#include "archdep.h"
#include "arch_args.h"
#include "../../config.h"
#include <numa.h>
#include <numaif.h>
#include <spawn.h>
@ -84,7 +84,11 @@
#include "../include/pmi.h"
#include "../include/qlmpi.h"
#include <sys/xattr.h>
#include "../include/defs.h"
#include "../../lib/include/list.h"
#include "../../lib/include/bitops-set_bit.h"
#include "../../lib/include/bitops-clear_bit.h"
#include "../../lib/include/bitops-test_bit.h"
//#define DEBUG
#define ADD_ENVS_OPTION
@ -187,6 +191,8 @@ static int mpol_no_stack = 0;
static int mpol_no_bss = 0;
static int mpol_shm_premap = 0;
static int no_bind_ikc_map = 0;
static int straight_map = 0;
static unsigned long straight_map_threshold = (1024*1024);
static unsigned long mpol_threshold = 0;
static unsigned long heap_extension = -1;
static int profile = 0;
@ -198,6 +204,9 @@ static char *mpol_bind_nodes = NULL;
static int uti_thread_rank = 0;
static int uti_use_last_cpu = 0;
static int enable_uti = 0;
#ifdef ENABLE_TOFU
static int enable_tofu = 0;
#endif
/* Partitioned execution (e.g., for MPI) */
static int nr_processes = 0;
@ -1053,6 +1062,64 @@ static inline cpu_set_t *numa_node_set(int n)
return (cpu_set_t *)(numa_nodes + n * cpu_set_size);
}
static inline void _numa_local(__cpu_set_unit *localset,
unsigned long *nodemask, int nonlocal)
{
int i;
memset(nodemask, 0, PLD_PROCESS_NUMA_MASK_BITS / 8);
for (i = 0; i < nnodes; i++) {
cpu_set_t *nodeset = numa_node_set(i);
int j;
if (nonlocal) {
set_bit(i, nodemask);
}
for (j = 0; j < ncpu; j++) {
if (test_bit(j, localset)) {
__dprintf("%d belongs to local set\n", j);
}
if (CPU_ISSET_S(j, cpu_set_size, nodeset)) {
__dprintf("%d belongs to node %d\n", j, i);
}
if (test_bit(j, localset) &&
CPU_ISSET_S(j, cpu_set_size, nodeset)) {
if (nonlocal) {
clear_bit(i, nodemask);
} else {
set_bit(i, nodemask);
}
}
}
}
}
static inline void numa_local(__cpu_set_unit *localset, unsigned long *nodemask)
{
_numa_local(localset, nodemask, 0);
}
static inline void numa_nonlocal(__cpu_set_unit *localset,
unsigned long *nodemask)
{
_numa_local(localset, nodemask, 1);
}
static inline void numa_all(unsigned long *nodemask)
{
int i;
memset(nodemask, 0, PLD_PROCESS_NUMA_MASK_BITS / 8);
for (i = 0; i < nnodes; i++) {
set_bit(i, nodemask);
}
}
pid_t master_tid;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
@ -1674,6 +1741,18 @@ static struct option mcexec_options[] = {
.flag = NULL,
.val = 'M',
},
{
.name = "enable-straight-map",
.has_arg = no_argument,
.flag = &straight_map,
.val = 1,
},
{
.name = "straight-map-threshold",
.has_arg = required_argument,
.flag = NULL,
.val = 'S',
},
{
.name = "disable-sched-yield",
.has_arg = no_argument,
@ -1710,6 +1789,14 @@ static struct option mcexec_options[] = {
.flag = &enable_uti,
.val = 1,
},
#ifdef ENABLE_TOFU
{
.name = "enable-tofu",
.has_arg = no_argument,
.flag = &enable_tofu,
.val = 1,
},
#endif
{
.name = "debug-mcexec",
.has_arg = no_argument,
@ -1870,14 +1957,14 @@ opendev()
fprintf(stderr, "%s: warning: LD_PRELOAD line is too long\n", __FUNCTION__); \
return; \
} \
strncat(envbuf, elembuf, remainder); \
strncat(envbuf, elembuf, remainder - 1); \
remainder = PATH_MAX - (strlen(envbuf) + 1); \
nelem++; \
} while (0)
static ssize_t find_libdir(char *libdir, size_t len)
{
FILE *filep;
FILE *filep = NULL;
ssize_t rc;
size_t linelen = 0;
char *line = NULL;
@ -1933,7 +2020,9 @@ static ssize_t find_libdir(char *libdir, size_t len)
}
out:
pclose(filep);
if (filep) {
pclose(filep);
}
free(line);
return rc;
}
@ -2095,10 +2184,10 @@ int main(int argc, char **argv)
/* Parse options ("+" denotes stop at the first non-option) */
#ifdef ADD_ENVS_OPTION
while ((opt = getopt_long(argc, argv, "+c:n:t:M:h:e:s:m:u:",
while ((opt = getopt_long(argc, argv, "+c:n:t:M:h:e:s:m:u:S:",
mcexec_options, NULL)) != -1) {
#else /* ADD_ENVS_OPTION */
while ((opt = getopt_long(argc, argv, "+c:n:t:M:h:s:m:u:",
while ((opt = getopt_long(argc, argv, "+c:n:t:M:h:s:m:u:S:",
mcexec_options, NULL)) != -1) {
#endif /* ADD_ENVS_OPTION */
switch (opt) {
@ -2140,6 +2229,10 @@ int main(int argc, char **argv)
heap_extension = atobytes(optarg);
break;
case 'S':
straight_map_threshold = atobytes(optarg);
break;
#ifdef ADD_ENVS_OPTION
case 'e':
add_env_list(&extra_env, optarg);
@ -2554,6 +2647,7 @@ int main(int argc, char **argv)
cpu_set_arg.cpu_set = (void *)&desc->cpu_set;
cpu_set_arg.cpu_set_size = sizeof(desc->cpu_set);
cpu_set_arg.nr_processes = nr_processes;
cpu_set_arg.ppid = getppid();
cpu_set_arg.target_core = &target_core;
cpu_set_arg.process_rank = &process_rank;
cpu_set_arg.mcexec_linux_numa = &mcexec_linux_numa;
@ -2659,6 +2753,7 @@ int main(int argc, char **argv)
desc->heap_extension = heap_extension;
desc->mpol_bind_mask = 0;
desc->mpol_mode = PLD_MPOL_MAX; /* not specified */
if (mpol_bind_nodes) {
struct bitmask *bind_mask;
bind_mask = numa_parse_nodestring_all(mpol_bind_nodes);
@ -2672,11 +2767,66 @@ int main(int argc, char **argv)
}
}
}
/* Fujitsu TCS specific: mempolicy */
else if (getenv("OMPI_MCA_plm_ple_memory_allocation_policy")) {
char *mpol =
getenv("OMPI_MCA_plm_ple_memory_allocation_policy");
__dprintf("OMPI_MCA_plm_ple_memory_allocation_policy: %s\n",
mpol);
if (!strncmp(mpol, "localalloc", 10)) {
/* MPOL_DEFAULT has the same effect as MPOL_LOCAL */
desc->mpol_mode = MPOL_DEFAULT;
}
else if (!strncmp(mpol, "interleave_local", 16)) {
desc->mpol_mode = MPOL_INTERLEAVE;
numa_local(desc->cpu_set, desc->mpol_nodemask);
}
else if (!strncmp(mpol, "interleave_nonlocal", 19)) {
desc->mpol_mode = MPOL_INTERLEAVE;
numa_nonlocal(desc->cpu_set, desc->mpol_nodemask);
}
else if (!strncmp(mpol, "interleave_all", 14)) {
desc->mpol_mode = MPOL_INTERLEAVE;
numa_all(desc->mpol_nodemask);
}
else if (!strncmp(mpol, "bind_local", 10)) {
desc->mpol_mode = MPOL_BIND;
numa_local(desc->cpu_set, desc->mpol_nodemask);
}
else if (!strncmp(mpol, "bind_nonlocal", 13)) {
desc->mpol_mode = MPOL_BIND;
numa_nonlocal(desc->cpu_set, desc->mpol_nodemask);
}
else if (!strncmp(mpol, "bind_all", 8)) {
desc->mpol_mode = MPOL_BIND;
numa_all(desc->mpol_nodemask);
}
else if (!strncmp(mpol, "prefer_local", 12)) {
desc->mpol_mode = MPOL_PREFERRED;
numa_local(desc->cpu_set, desc->mpol_nodemask);
}
else if (!strncmp(mpol, "prefer_nonlocal", 15)) {
desc->mpol_mode = MPOL_PREFERRED;
numa_nonlocal(desc->cpu_set, desc->mpol_nodemask);
}
__dprintf("mpol_mode: %d, mpol_nodemask: %ld\n",
desc->mpol_mode, desc->mpol_nodemask[0]);
}
desc->enable_uti = enable_uti;
desc->uti_thread_rank = uti_thread_rank;
desc->uti_use_last_cpu = uti_use_last_cpu;
desc->thp_disable = get_thp_disable();
desc->straight_map = straight_map;
desc->straight_map_threshold = straight_map_threshold;
#ifdef ENABLE_TOFU
desc->enable_tofu = enable_tofu;
#endif
/* user_start and user_end are set by this call */
if (ioctl(fd, MCEXEC_UP_PREPARE_IMAGE, (unsigned long)desc) != 0) {
perror("prepare");
@ -2813,7 +2963,9 @@ static void kill_thread(unsigned long tid, int sig,
}
}
static long util_thread(struct thread_data_s *my_thread, unsigned long rp_rctx, int remote_tid, unsigned long pattr, unsigned long uti_clv, unsigned long _uti_desc)
static long util_thread(struct thread_data_s *my_thread,
unsigned long rp_rctx, int remote_tid, unsigned long pattr,
unsigned long uti_info, unsigned long _uti_desc)
{
struct uti_get_ctx_desc get_ctx_desc;
struct uti_switch_ctx_desc switch_ctx_desc;
@ -2865,7 +3017,7 @@ static long util_thread(struct thread_data_s *my_thread, unsigned long rp_rctx,
uti_desc->key = get_ctx_desc.key;
uti_desc->pid = getpid();
uti_desc->tid = gettid();
uti_desc->uti_clv = uti_clv;
uti_desc->uti_info = uti_info;
/* Initialize list of syscall arguments for syscall_intercept */
if (sizeof(struct syscall_struct) * 11 > page_size) {
@ -2879,7 +3031,9 @@ static long util_thread(struct thread_data_s *my_thread, unsigned long rp_rctx,
desc.phys_attr = pattr;
desc.uti_cpu_set_str = getenv("UTI_CPU_SET");
desc.uti_cpu_set_len = strlen(desc.uti_cpu_set_str) + 1;
if (desc.uti_cpu_set_str) {
desc.uti_cpu_set_len = strlen(desc.uti_cpu_set_str) + 1;
}
if ((rc = ioctl(fd, MCEXEC_UP_UTI_ATTR, &desc))) {
fprintf(stderr, "%s: error: MCEXEC_UP_UTI_ATTR: %s\n",
@ -3241,6 +3395,29 @@ overlay_path(int dirfd, const char *in, char *buf, int *resolvelinks)
if (!strcmp(path, "/dev/xpmem"))
return "/dev/null";
if (enable_uti && strstr(path, "libuti.so")) {
char libdir[PATH_MAX];
char *basename;
basename = strrchr(path, '/');
if (basename == NULL) {
basename = (char *)path;
} else {
basename++;
}
if (find_libdir(libdir, sizeof(libdir)) < 0) {
fprintf(stderr, "error: failed to find library directory\n");
return in;
}
n = snprintf(buf, PATH_MAX, "%s/mck/%s",
libdir, basename);
__dprintf("%s: %s replaced with %s\n",
__func__, path, buf);
goto checkexist;
}
if (!strncmp(path, "/proc/self", 10) &&
(path[10] == '/' || path[10] == '\0')) {
n = snprintf(buf, PATH_MAX, "/proc/mcos%d/%d%s",
@ -3974,6 +4151,7 @@ int main_loop(struct thread_data_s *my_thread)
#endif
case __NR_gettid:{
int rc = 0;
/*
* Number of TIDs and the remote physical address where TIDs are
* expected are passed in arg 4 and 5, respectively.
@ -3985,6 +4163,7 @@ int main_loop(struct thread_data_s *my_thread)
int *tids = malloc(sizeof(int) * w.sr.args[4]);
if (!tids) {
fprintf(stderr, "__NR_gettid(): error allocating TIDs\n");
rc = -ENOMEM;
goto gettid_out;
}
@ -4005,13 +4184,14 @@ int main_loop(struct thread_data_s *my_thread)
trans.direction = MCEXEC_UP_TRANSFER_TO_REMOTE;
if (ioctl(fd, MCEXEC_UP_TRANSFER, &trans) != 0) {
rc = -EFAULT;
fprintf(stderr, "__NR_gettid(): error transferring TIDs\n");
}
free(tids);
}
gettid_out:
do_syscall_return(fd, cpu, 0, 0, 0, 0, 0);
do_syscall_return(fd, cpu, rc, 0, 0, 0, 0);
break;
}
@ -4671,7 +4851,8 @@ return_execve2:
case __NR_sched_setaffinity:
if (w.sr.args[0] == 0) {
ret = util_thread(my_thread, w.sr.args[1], w.sr.rtid,
w.sr.args[2], w.sr.args[3], w.sr.args[4]);
w.sr.args[2], w.sr.args[3],
w.sr.args[4]);
}
else {
__eprintf("__NR_sched_setaffinity: invalid argument (%lx)\n", w.sr.args[0]);
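
Reviewer note: the `numa_local()` / `numa_nonlocal()` helpers added in this file derive a NUMA node mask from the process CPU set. A node counts as local if it shares at least one CPU with the set, and the nonlocal mask is the complement over the known nodes. The selection logic can be sketched as follows. It is a minimal sketch, assuming at most 64 CPUs and 64 nodes so that plain `uint64_t` bitmasks suffice. `demo_numa_mask` and `DEMO_MAX_NODES` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_NODES 64

/*
 * node_cpus[i] is a bitmask of the CPUs that belong to NUMA node i;
 * local_cpus is a bitmask of the CPUs assigned to the process.
 * With nonlocal == 0 the result has a bit set for every node that
 * shares at least one CPU with local_cpus; with nonlocal != 0 it has
 * a bit set for every node that shares none.
 */
uint64_t demo_numa_mask(const uint64_t *node_cpus, int nnodes,
			uint64_t local_cpus, int nonlocal)
{
	uint64_t mask = 0;
	int i;

	assert(nnodes >= 0 && nnodes <= DEMO_MAX_NODES);

	for (i = 0; i < nnodes; i++) {
		int is_local = (node_cpus[i] & local_cpus) != 0;

		/* Select local nodes, or their complement when nonlocal */
		if (is_local != (nonlocal != 0))
			mask |= (uint64_t)1 << i;
	}
	return mask;
}
```

Intersecting whole masks replaces the per-CPU inner loop of the patch. The result is the same because a single shared CPU is enough to classify a node as local.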


@ -1,3 +1,4 @@
#define _GNU_SOURCE
#include <libsyscall_intercept_hook_point.h>
#include <errno.h>
#include <stdio.h>
@ -5,13 +6,16 @@
#include <syscall.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/types.h> /* for pid_t in uprotocol.h */
#include "../include/uprotocol.h"
#include "../include/uti.h"
#include "./archdep_uti.h"
#define DEBUG_UTI
static struct uti_desc uti_desc;
#define DEBUG_UTI
static __thread int on_linux = -1;
static int
hook(long syscall_number,
@ -20,22 +24,29 @@ hook(long syscall_number,
long arg4, long arg5,
long *result)
{
//return 1; /* debug */
int tid = uti_syscall0(__NR_gettid);
struct terminate_thread_desc term_desc;
unsigned long code;
int stack_top;
long ret;
if (!uti_desc.start_syscall_intercept) {
return 1; /* System call isn't taken over */
}
if (tid != uti_desc.mck_tid) {
/* new thread */
if (on_linux == -1) {
int tid = uti_syscall0(__NR_gettid);
on_linux = (tid == uti_desc.mck_tid) ? 1 : 0;
}
if (on_linux == 0) {
if (uti_desc.syscalls2 && syscall_number >= 0 && syscall_number < 512) {
uti_desc.syscalls2[syscall_number]++;
}
return 1;
}
#ifdef DEBUG_UTI
if (uti_desc.syscalls && syscall_number >= 0 && syscall_number < 512) {
uti_desc.syscalls[syscall_number]++;
@ -76,7 +87,7 @@ hook(long syscall_number,
uti_desc.syscall_stack[stack_top].args[3] = arg3;
uti_desc.syscall_stack[stack_top].args[4] = arg4;
uti_desc.syscall_stack[stack_top].args[5] = arg5;
uti_desc.syscall_stack[stack_top].uti_clv = uti_desc.uti_clv;
uti_desc.syscall_stack[stack_top].uti_info = uti_desc.uti_info;
uti_desc.syscall_stack[stack_top].ret = -EINVAL;
ret = uti_syscall3(__NR_ioctl, uti_desc.fd,
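
Reviewer note: the hook in this file now caches the "is this the migrated thread?" decision in a `__thread` variable instead of issuing `gettid` on every intercepted system call. Below is a minimal sketch of that tri-state cache, assuming a compiler that supports `__thread` (GCC and Clang do). `demo_current_tid()` is a hypothetical stand-in for the real system call and always returns 42.

```c
#include <assert.h>

/* Tri-state per-thread cache: -1 = undecided, 0 = no, 1 = yes */
static __thread int demo_is_target = -1;

/* Counts how often the expensive lookup actually ran (demo only) */
static int demo_lookup_calls;

/* Hypothetical stand-in for a gettid() system call */
static int demo_current_tid(void)
{
	demo_lookup_calls++;
	return 42;
}

/*
 * Returns 1 if the calling thread is the one identified by target_tid.
 * The thread id is queried only on the first call from each thread;
 * later calls return the cached answer.
 */
int demo_on_target_thread(int target_tid)
{
	assert(target_tid > 0);

	if (demo_is_target == -1)
		demo_is_target = (demo_current_tid() == target_tid) ? 1 : 0;
	return demo_is_target;
}

/* Exposes the lookup counter so callers can observe the caching */
int demo_lookup_count(void)
{
	return demo_lookup_calls;
}
```

The cached answer is never invalidated, so the scheme relies on the target thread id staying fixed once a thread has made its first call.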

Submodule ihk updated: 049fe18e36...a98a13ef5f


@ -51,6 +51,12 @@ set(MCKERNEL_SRCS
${IHK_FULL_SOURCE_DIR}/cokernel/smp/${ARCH}/setup.c
)
if (ENABLE_TOFU)
list(APPEND MCKERNEL_SRCS
tofu/tof_utofu_main.c
)
endif()
if (ENABLE_UBSAN)
add_compile_options(-fsanitize=undefined)
list(APPEND MCKERNEL_SRCS ubsan.c)


@ -267,3 +267,154 @@ cpu_sysfs_setup(void)
return;
} /* cpu_sysfs_setup() */
/*
* Generic remote CPU function invocation facility.
*/
void smp_func_call_handler(void)
{
unsigned long irq_flags;
struct smp_func_call_request *req;
int reqs_left;
reiterate:
req = NULL;
reqs_left = 0;
irq_flags = ihk_mc_spinlock_lock(
&cpu_local_var(smp_func_req_lock));
/* Take requests one-by-one */
if (!list_empty(&cpu_local_var(smp_func_req_list))) {
req = list_first_entry(&cpu_local_var(smp_func_req_list),
struct smp_func_call_request, list);
list_del(&req->list);
reqs_left = !list_empty(&cpu_local_var(smp_func_req_list));
}
ihk_mc_spinlock_unlock(&cpu_local_var(smp_func_req_lock),
irq_flags);
if (req) {
req->ret = req->sfcd->func(req->cpu_index,
req->sfcd->nr_cpus, req->sfcd->arg);
ihk_atomic_dec(&req->sfcd->cpus_left);
}
if (reqs_left)
goto reiterate;
}
int smp_call_func(cpu_set_t *__cpu_set, smp_func_t __func, void *__arg)
{
int cpu, nr_cpus = 0;
int cpu_index = 0;
int this_cpu_index = 0;
struct smp_func_call_data sfcd;
struct smp_func_call_request *reqs;
int ret = 0;
int call_on_this_cpu = 0;
cpu_set_t cpu_set;
int max_nr_cpus = 4;
/* Sanity checks */
if (!__cpu_set || !__func) {
return -EINVAL;
}
/* Make sure it won't change in between */
cpu_set = *__cpu_set;
for_each_set_bit(cpu, (unsigned long *)&cpu_set,
sizeof(cpu_set) * BITS_PER_BYTE) {
if (cpu == ihk_mc_get_processor_id()) {
call_on_this_cpu = 1;
}
++nr_cpus;
if (nr_cpus == max_nr_cpus)
break;
}
if (!nr_cpus) {
return -EINVAL;
}
reqs = kmalloc(sizeof(*reqs) * nr_cpus, IHK_MC_AP_NOWAIT);
if (!reqs) {
ret = -ENOMEM;
goto free_out;
}
kprintf("%s: interrupting %d CPUs for SMP call..\n", __func__, nr_cpus);
sfcd.nr_cpus = nr_cpus;
sfcd.func = __func;
sfcd.arg = __arg;
ihk_atomic_set(&sfcd.cpus_left,
call_on_this_cpu ? nr_cpus - 1 : nr_cpus);
smp_wmb();
/* Add requests and send IPIs */
cpu_index = 0;
for_each_set_bit(cpu, (unsigned long *)&cpu_set,
sizeof(cpu_set) * BITS_PER_BYTE) {
unsigned long irq_flags;
reqs[cpu_index].cpu_index = cpu_index;
reqs[cpu_index].ret = 0;
if (cpu == ihk_mc_get_processor_id()) {
this_cpu_index = cpu_index;
++cpu_index;
continue;
}
reqs[cpu_index].sfcd = &sfcd;
irq_flags =
ihk_mc_spinlock_lock(&get_cpu_local_var(cpu)->smp_func_req_lock);
list_add_tail(&reqs[cpu_index].list,
&get_cpu_local_var(cpu)->smp_func_req_list);
ihk_mc_spinlock_unlock(&get_cpu_local_var(cpu)->smp_func_req_lock,
irq_flags);
dkprintf("%s: interrupting IRQ: %d -> CPU: %d\n", __func__,
ihk_mc_get_smp_handler_irq(), cpu);
ihk_mc_interrupt_cpu(cpu, ihk_mc_get_smp_handler_irq());
++cpu_index;
if (cpu_index == max_nr_cpus)
break;
}
/* Is this CPU involved? */
if (call_on_this_cpu) {
reqs[this_cpu_index].ret =
__func(this_cpu_index, nr_cpus, __arg);
}
dkprintf("%s: waiting for remote CPUs..\n", __func__);
/* Wait for the rest of the CPUs */
while (smp_load_acquire(&sfcd.cpus_left.counter) > 0) {
cpu_pause();
}
/* Check return values, if error, report the first non-zero */
for (cpu_index = 0; cpu_index < nr_cpus; ++cpu_index) {
if (reqs[cpu_index].ret != 0) {
ret = reqs[cpu_index].ret;
goto free_out;
}
}
kprintf("%s: all CPUs finished SMP call successfully\n", __func__);
ret = 0;
free_out:
kfree(reqs);
return ret;
}


@ -58,16 +58,43 @@ struct cpu_local_var *get_cpu_local_var(int id)
return clv + id;
}
#ifdef ENABLE_FUGAKU_HACKS
void __show_context_stack(struct thread *thread,
unsigned long pc, uintptr_t sp, int kprintf_locked);
#endif
void preempt_enable(void)
{
#ifndef ENABLE_FUGAKU_HACKS
if (cpu_local_var_initialized)
--cpu_local_var(no_preempt);
#else
if (cpu_local_var_initialized) {
--cpu_local_var(no_preempt);
if (cpu_local_var(no_preempt) < 0) {
//cpu_disable_interrupt();
__kprintf("%s: %d\n", __func__, cpu_local_var(no_preempt));
__kprintf("TID: %d, call stack from builtin frame (most recent first):\n",
cpu_local_var(current)->tid);
__show_context_stack(cpu_local_var(current), (uintptr_t)&preempt_enable,
(unsigned long)__builtin_frame_address(0), 1);
//arch_cpu_stop();
//cpu_halt();
#ifdef ENABLE_FUGAKU_HACKS
panic("panic: negative preemption??");
#endif
}
}
#endif
}
void preempt_disable(void)
{
if (cpu_local_var_initialized)
if (cpu_local_var_initialized) {
++cpu_local_var(no_preempt);
}
}
int add_backlog(int (*func)(void *arg), void *arg)
@ -120,3 +147,10 @@ void do_backlog(void)
}
}
}
#ifdef ENABLE_FUGAKU_HACKS
ihk_spinlock_t *get_this_cpu_runq_lock(void)
{
return &get_this_cpu_local_var()->runq_lock;
}
#endif


@ -113,7 +113,7 @@ int devobj_create(int fd, size_t len, off_t off, struct memobj **objp, int *maxp
__FUNCTION__, fd, len, off, result.handle, result.maxprot);
obj->memobj.ops = &devobj_ops;
obj->memobj.flags = MF_HAS_PAGER | MF_DEV_FILE;
obj->memobj.flags = MF_HAS_PAGER | MF_REMAP_FILE_PAGES | MF_DEV_FILE;
obj->memobj.size = len;
ihk_atomic_set(&obj->memobj.refcnt, 1);
obj->handle = result.handle;


@ -236,6 +236,7 @@ int fileobj_create(int fd, struct memobj **objp, int *maxprotp, int flags,
memset(newobj, 0, sizeof(*newobj));
newobj->memobj.ops = &fileobj_ops;
newobj->memobj.flags = MF_HAS_PAGER | MF_REG_FILE |
MF_REMAP_FILE_PAGES |
((flags & MAP_PRIVATE) ? MF_PRIVATE : 0);
newobj->handle = result.handle;


@ -62,7 +62,7 @@
#include <process.h>
#include <futex.h>
#include <jhash.h>
#include <mc_jhash.h>
#include <ihk/lock.h>
#include <ihk/atomic.h>
#include <list.h>
@ -72,39 +72,27 @@
#include <timer.h>
#include <ihk/debug.h>
#include <syscall.h>
//#define DEBUG_PRINT_FUTEX
#ifdef DEBUG_PRINT_FUTEX
#undef DDEBUG_DEFAULT
#define DDEBUG_DEFAULT DDEBUG_PRINT
#define uti_dkprintf(...) do { ((clv_override && linux_printk) ? (*linux_printk) : kprintf)(__VA_ARGS__); } while (0)
#else
#define uti_dkprintf(...) do { } while (0)
#endif
#define uti_kprintf(...) do { ((clv_override && linux_printk) ? (*linux_printk) : kprintf)(__VA_ARGS__); } while (0)
#include <kmalloc.h>
#include <ikc/queue.h>
unsigned long ihk_mc_get_ns_per_tsc(void);
/*
* Hash buckets are shared by all the futex_keys that hash to the same
* location. Each key may have multiple futex_q structures, one for each task
* waiting on a futex.
*/
struct futex_hash_bucket {
ihk_spinlock_t lock;
struct plist_head chain;
};
struct futex_hash_bucket *futex_queues;
static struct futex_hash_bucket futex_queues[1<<FUTEX_HASHBITS];
extern struct ihk_ikc_channel_desc **ikc2linuxs;
struct futex_hash_bucket *get_futex_queues(void)
{
return futex_queues;
}
/*
* We hash on the keys returned from get_futex_key (see below).
*/
static struct futex_hash_bucket *hash_futex(union futex_key *key)
{
uint32_t hash = jhash2((uint32_t*)&key->both.word,
uint32_t hash = mc_jhash2((uint32_t *)&key->both.word,
(sizeof(key->both.word)+sizeof(key->both.ptr))/4,
key->both.offset);
return &futex_queues[hash & ((1 << FUTEX_HASHBITS)-1)];
@ -157,11 +145,11 @@ static void drop_futex_key_refs(union futex_key *key)
* lock_page() might sleep, the caller should not hold a spinlock.
*/
static int
get_futex_key(uint32_t *uaddr, int fshared, union futex_key *key, struct cpu_local_var *clv_override)
get_futex_key(uint32_t *uaddr, int fshared, union futex_key *key)
{
unsigned long address = (unsigned long)uaddr;
unsigned long phys;
struct thread *thread = cpu_local_var_with_override(current, clv_override);
struct thread *thread = cpu_local_var(current);
struct process_vm *mm = thread->vm;
/*
@ -228,7 +216,7 @@ static int cmpxchg_futex_value_locked(uint32_t __user *uaddr, uint32_t uval, uin
* The hash bucket lock must be held when this is called.
* Afterwards, the futex_q must not be accessed.
*/
static void wake_futex(struct futex_q *q, struct cpu_local_var *clv_override)
static void wake_futex(struct futex_q *q)
{
struct thread *p = q->task;
@ -253,26 +241,30 @@ static void wake_futex(struct futex_q *q, struct cpu_local_var *clv_override)
if (q->uti_futex_resp) {
int rc;
uti_dkprintf("wake_futex(): waking up migrated-to-Linux thread (tid %d),uti_futex_resp=%p\n", p->tid, q->uti_futex_resp);
/* TODO: Add the case when a Linux thread waking up another Linux thread */
if (clv_override) {
uti_dkprintf("%s: ERROR: A Linux thread is waking up migrated-to-Linux thread\n", __FUNCTION__);
}
if (p->spin_sleep == 0) {
uti_dkprintf("%s: INFO: woken up by someone else\n", __FUNCTION__);
struct ikc_scd_packet pckt;
struct ihk_ikc_channel_desc *resp_channel;
dkprintf("%s: waking up migrated-to-Linux thread (tid %d),uti_futex_resp=%p,linux_cpu: %d\n",
__func__, p->tid, q->uti_futex_resp, q->linux_cpu);
/* does this Linux CPU have a connected channel? */
if (ikc2linuxs[q->linux_cpu]) {
resp_channel = ikc2linuxs[q->linux_cpu];
} else {
resp_channel = cpu_local_var(ikc2linux);
}
struct ikc_scd_packet pckt;
struct ihk_ikc_channel_desc *resp_channel = cpu_local_var_with_override(ikc2linux, clv_override);
pckt.msg = SCD_MSG_FUTEX_WAKE;
pckt.futex.resp = q->uti_futex_resp;
pckt.futex.spin_sleep = &p->spin_sleep;
rc = ihk_ikc_send(resp_channel, &pckt, 0);
if (rc) {
uti_dkprintf("%s: ERROR: ihk_ikc_send returned %d, resp_channel=%p\n", __FUNCTION__, rc, resp_channel);
dkprintf("%s: ERROR: ihk_ikc_send returned %d, resp_channel=%p\n",
__func__, rc, resp_channel);
}
} else {
uti_dkprintf("wake_futex(): waking up McKernel thread (tid %d)\n", p->tid);
dkprintf("%s: waking up McKernel thread (tid %d)\n",
__func__, p->tid);
sched_wakeup_thread(p, PS_NORMAL);
}
}
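Reviewer note (illustrative sketch, not part of the change): the new wake path prefers the IKC channel connected to the waiter's Linux CPU and falls back to the waker's CPU-local channel when that table entry is empty. The selection can be sketched in isolation as below. `struct demo_channel`, `demo_pick_channel()` and the `nr_cpus` bounds check are hypothetical names and additions for illustration; the hunk above indexes `ikc2linuxs[q->linux_cpu]` directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ihk_ikc_channel_desc. */
struct demo_channel {
	int id;
};

/*
 * Return the channel registered for linux_cpu if there is one,
 * otherwise the caller's local channel. The range check on linux_cpu
 * is an assumption added for this sketch.
 */
static struct demo_channel *demo_pick_channel(struct demo_channel **per_cpu,
		int nr_cpus, int linux_cpu, struct demo_channel *local)
{
	assert(local != NULL);

	if (per_cpu && linux_cpu >= 0 && linux_cpu < nr_cpus &&
	    per_cpu[linux_cpu]) {
		return per_cpu[linux_cpu];
	}
	return local;
}
```

Keeping the fallback in one helper makes it easy to see that a wake-up is always sent somewhere, even when the waiter's CPU has no dedicated channel.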
@ -304,7 +296,8 @@ double_unlock_hb(struct futex_hash_bucket *hb1, struct futex_hash_bucket *hb2)
/*
* Wake up waiters matching bitset queued on this futex (uaddr).
*/
static int futex_wake(uint32_t *uaddr, int fshared, int nr_wake, uint32_t bitset, struct cpu_local_var *clv_override)
static int futex_wake(uint32_t *uaddr, int fshared, int nr_wake,
uint32_t bitset)
{
struct futex_hash_bucket *hb;
struct futex_q *this, *next;
@ -316,7 +309,7 @@ static int futex_wake(uint32_t *uaddr, int fshared, int nr_wake, uint32_t bitset
if (!bitset)
return -EINVAL;
ret = get_futex_key(uaddr, fshared, &key, clv_override);
ret = get_futex_key(uaddr, fshared, &key);
if ((ret != 0))
goto out;
@ -332,7 +325,7 @@ static int futex_wake(uint32_t *uaddr, int fshared, int nr_wake, uint32_t bitset
if (!(this->bitset & bitset))
continue;
wake_futex(this, clv_override);
wake_futex(this);
if (++ret >= nr_wake)
break;
}
@ -350,8 +343,7 @@ out:
*/
static int
futex_wake_op(uint32_t *uaddr1, int fshared, uint32_t *uaddr2,
int nr_wake, int nr_wake2, int op,
struct cpu_local_var *clv_override)
int nr_wake, int nr_wake2, int op)
{
union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
struct futex_hash_bucket *hb1, *hb2;
@ -360,10 +352,10 @@ futex_wake_op(uint32_t *uaddr1, int fshared, uint32_t *uaddr2,
int ret, op_ret;
retry:
ret = get_futex_key(uaddr1, fshared, &key1, clv_override);
ret = get_futex_key(uaddr1, fshared, &key1);
if ((ret != 0))
goto out;
ret = get_futex_key(uaddr2, fshared, &key2, clv_override);
ret = get_futex_key(uaddr2, fshared, &key2);
if ((ret != 0))
goto out_put_key1;
@ -397,7 +389,7 @@ retry_private:
plist_for_each_entry_safe(this, next, head, list) {
if (match_futex (&this->key, &key1)) {
wake_futex(this, clv_override);
wake_futex(this);
if (++ret >= nr_wake)
break;
}
@ -409,7 +401,7 @@ retry_private:
op_ret = 0;
plist_for_each_entry_safe(this, next, head, list) {
if (match_futex (&this->key, &key2)) {
wake_futex(this, clv_override);
wake_futex(this);
if (++op_ret >= nr_wake2)
break;
}
@ -471,8 +463,8 @@ void requeue_futex(struct futex_q *q, struct futex_hash_bucket *hb1,
* <0 - on error
*/
static int futex_requeue(uint32_t *uaddr1, int fshared, uint32_t *uaddr2,
int nr_wake, int nr_requeue, uint32_t *cmpval,
int requeue_pi, struct cpu_local_var *clv_override)
int nr_wake, int nr_requeue, uint32_t *cmpval,
int requeue_pi)
{
union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
int drop_count = 0, task_count = 0, ret;
@ -480,10 +472,10 @@ static int futex_requeue(uint32_t *uaddr1, int fshared, uint32_t *uaddr2,
struct plist_head *head1;
struct futex_q *this, *next;
ret = get_futex_key(uaddr1, fshared, &key1, clv_override);
ret = get_futex_key(uaddr1, fshared, &key1);
if ((ret != 0))
goto out;
ret = get_futex_key(uaddr2, fshared, &key2, clv_override);
ret = get_futex_key(uaddr2, fshared, &key2);
if ((ret != 0))
goto out_put_key1;
@ -518,7 +510,7 @@ static int futex_requeue(uint32_t *uaddr1, int fshared, uint32_t *uaddr2,
*/
/* RIKEN: no requeue_pi at this moment */
if (++task_count <= nr_wake) {
wake_futex(this, clv_override);
wake_futex(this);
continue;
}
@ -577,9 +569,12 @@ queue_unlock(struct futex_q *q, struct futex_hash_bucket *hb)
* state is implicit in the state of woken task (see futex_wait_requeue_pi() for
* an example).
*/
static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb, struct cpu_local_var *clv_override)
static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb)
{
int prio;
struct thread *thread = cpu_local_var(current);
ihk_spinlock_t *_runq_lock = &cpu_local_var(runq_lock);
unsigned int *_flags = &cpu_local_var(flags);
/*
* The priority used to register this element is
@ -598,7 +593,19 @@ static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb, str
q->list.plist.spinlock = &hb->lock;
#endif
plist_add(&q->list, &hb->chain);
q->task = cpu_local_var_with_override(current, clv_override);
/* Store information about the waiting thread for uti-futex */
q->task = thread;
q->th_spin_sleep_pa = virt_to_phys((void *)&thread->spin_sleep);
q->th_status_pa = virt_to_phys((void *)&thread->status);
q->th_spin_sleep_lock_pa = virt_to_phys((void *)&thread->spin_sleep_lock);
q->proc_status_pa = virt_to_phys((void *)&thread->proc->status);
q->proc_update_lock_pa = virt_to_phys((void *)&thread->proc->update_lock);
q->runq_lock_pa = virt_to_phys((void *)_runq_lock);
q->clv_flags_pa = virt_to_phys((void *)_flags);
q->intr_id = ihk_mc_get_interrupt_id(thread->cpu_id);
q->intr_vector = ihk_mc_get_vector(IHK_GV_IKC);
ihk_mc_spinlock_unlock_noirq(&hb->lock);
}
@ -661,12 +668,12 @@ retry:
/* RIKEN: this function has been rewritten so that it returns the remaining
* time in case we are woken.
*/
static int64_t futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
uint64_t timeout, struct cpu_local_var *clv_override)
static int64_t futex_wait_queue_me(struct futex_hash_bucket *hb,
struct futex_q *q, uint64_t timeout)
{
int64_t time_remain = 0;
unsigned long irqstate;
struct thread *thread = cpu_local_var_with_override(current, clv_override);
struct thread *thread = cpu_local_var(current);
/*
* The task state is guaranteed to be set before another task can
* wake it.
@ -685,25 +692,9 @@ static int64_t futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q
ihk_mc_spinlock_unlock(&thread->spin_sleep_lock, irqstate);
}
queue_me(q, hb, clv_override);
queue_me(q, hb);
if (!plist_node_empty(&q->list)) {
if (clv_override) {
uti_dkprintf("%s: tid: %d is trying to sleep\n", __FUNCTION__, thread->tid);
/* Note that the unit of timeout is nsec */
time_remain = (*linux_wait_event)(q->uti_futex_resp, timeout);
/* Note that time_remain == 0 indicates that the condition evaluated to false after the timeout elapsed */
if (time_remain < 0) {
if (time_remain == -ERESTARTSYS) { /* Interrupted by signal */
uti_dkprintf("%s: DEBUG: wait_event returned -ERESTARTSYS\n", __FUNCTION__);
} else {
uti_kprintf("%s: ERROR: wait_event returned %d\n", __FUNCTION__, time_remain);
}
}
uti_dkprintf("%s: tid: %d woken up\n", __FUNCTION__, thread->tid);
} else {
if (timeout) {
dkprintf("futex_wait_queue_me(): tid: %d schedule_timeout()\n", thread->tid);
time_remain = schedule_timeout(timeout);
@ -714,7 +705,6 @@ static int64_t futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q
time_remain = 0;
}
dkprintf("futex_wait_queue_me(): tid: %d woken up\n", thread->tid);
}
}
/* This does not need to be serialized */
@ -742,8 +732,7 @@ static int64_t futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q
* <1 - -EFAULT or -EWOULDBLOCK (uaddr does not contain val) and hb is unlocked
*/
static int futex_wait_setup(uint32_t __user *uaddr, uint32_t val, int fshared,
struct futex_q *q, struct futex_hash_bucket **hb,
struct cpu_local_var *clv_override)
struct futex_q *q, struct futex_hash_bucket **hb)
{
uint32_t uval;
int ret;
@ -766,7 +755,7 @@ static int futex_wait_setup(uint32_t __user *uaddr, uint32_t val, int fshared,
* rare, but normal.
*/
q->key = FUTEX_KEY_INIT;
ret = get_futex_key(uaddr, fshared, &q->key, clv_override);
ret = get_futex_key(uaddr, fshared, &q->key);
if (ret != 0)
return ret;
@ -790,8 +779,7 @@ static int futex_wait_setup(uint32_t __user *uaddr, uint32_t val, int fshared,
}
static int futex_wait(uint32_t __user *uaddr, int fshared,
uint32_t val, uint64_t timeout, uint32_t bitset, int clockrt,
struct cpu_local_var *clv_override)
uint32_t val, uint64_t timeout, uint32_t bitset, int clockrt)
{
struct futex_hash_bucket *hb;
int64_t time_remain;
@ -802,57 +790,55 @@ static int futex_wait(uint32_t __user *uaddr, int fshared,
if (!bitset)
return -EINVAL;
if (!clv_override) {
q = &lq;
}
else {
q = &cpu_local_var_with_override(current,
clv_override)->futex_q;
}
q = &lq;
#ifdef PROFILE_ENABLE
if (cpu_local_var_with_override(current, clv_override)->profile &&
cpu_local_var_with_override(current, clv_override)->profile_start_ts) {
cpu_local_var_with_override(current, clv_override)->profile_elapsed_ts +=
(rdtsc() - cpu_local_var_with_override(current, clv_override)->profile_start_ts);
cpu_local_var_with_override(current, clv_override)->profile_start_ts = 0;
if (cpu_local_var(current)->profile &&
cpu_local_var(current)->profile_start_ts) {
cpu_local_var(current)->profile_elapsed_ts +=
(rdtsc() - cpu_local_var(current)->profile_start_ts);
cpu_local_var(current)->profile_start_ts = 0;
}
#endif
q->bitset = bitset;
q->requeue_pi_key = NULL;
q->uti_futex_resp = cpu_local_var_with_override(uti_futex_resp,
clv_override);
q->uti_futex_resp = cpu_local_var(uti_futex_resp);
retry:
/* Prepare to wait on uaddr. */
ret = futex_wait_setup(uaddr, val, fshared, q, &hb, clv_override);
ret = futex_wait_setup(uaddr, val, fshared, q, &hb);
if (ret) {
uti_dkprintf("%s: tid=%d futex_wait_setup returns zero, no need to sleep\n", __FUNCTION__, cpu_local_var_with_override(current, clv_override)->tid);
dkprintf("%s: tid=%d futex_wait_setup returns zero, no need to sleep\n",
__func__, cpu_local_var(current)->tid);
goto out;
}
/* queue_me and wait for wakeup, timeout, or a signal. */
time_remain = futex_wait_queue_me(hb, q, timeout, clv_override);
time_remain = futex_wait_queue_me(hb, q, timeout);
/* If we were woken (and unqueued), we succeeded, whatever. */
ret = 0;
if (!unqueue_me(q)) {
uti_dkprintf("%s: tid=%d unqueued\n", __FUNCTION__, cpu_local_var_with_override(current, clv_override)->tid);
dkprintf("%s: tid=%d unqueued\n",
__func__, cpu_local_var(current)->tid);
goto out_put_key;
}
ret = -ETIMEDOUT;
/* RIKEN: timer expired case (indicated by !time_remain) */
if (timeout && !time_remain) {
uti_dkprintf("%s: tid=%d timer expired\n", __FUNCTION__, cpu_local_var_with_override(current, clv_override)->tid);
dkprintf("%s: tid=%d timer expired\n",
__func__, cpu_local_var(current)->tid);
goto out_put_key;
}
/* RIKEN: futex_wait_queue_me() returns -ERESTARTSYS when waiting on Linux CPU and woken up by signal */
if (hassigpending(cpu_local_var_with_override(current, clv_override)) || time_remain == -ERESTARTSYS) {
if (hassigpending(cpu_local_var(current)) ||
time_remain == -ERESTARTSYS) {
ret = -EINTR;
uti_dkprintf("%s: tid=%d woken up by signal\n", __FUNCTION__, cpu_local_var_with_override(current, clv_override)->tid);
dkprintf("%s: tid=%d woken up by signal\n",
__func__, cpu_local_var(current)->tid);
goto out_put_key;
}
@ -864,21 +850,22 @@ out_put_key:
put_futex_key(fshared, &q->key);
out:
#ifdef PROFILE_ENABLE
if (cpu_local_var_with_override(current, clv_override)->profile) {
cpu_local_var_with_override(current, clv_override)->profile_start_ts = rdtsc();
if (cpu_local_var(current)->profile) {
cpu_local_var(current)->profile_start_ts = rdtsc();
}
#endif
return ret;
}
int futex(uint32_t *uaddr, int op, uint32_t val, uint64_t timeout,
uint32_t *uaddr2, uint32_t val2, uint32_t val3, int fshared,
struct cpu_local_var *clv_override)
uint32_t *uaddr2, uint32_t val2, uint32_t val3, int fshared)
{
int clockrt, ret = -ENOSYS;
int cmd = op & FUTEX_CMD_MASK;
uti_dkprintf("%s: uaddr=%p, op=%x, val=%x, timeout=%ld, uaddr2=%p, val2=%x, val3=%x, fshared=%d, clv=%p\n", __FUNCTION__, uaddr, op, val, timeout, uaddr2, val2, val3, fshared, clv_override);
dkprintf("%s: uaddr=%p, op=%x, val=%x, timeout=%ld, uaddr2=%p, val2=%x, val3=%x, fshared=%d\n",
__func__, uaddr, op, val, timeout, uaddr2,
val2, val3, fshared);
clockrt = op & FUTEX_CLOCK_REALTIME;
if (clockrt && cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI)
@ -888,21 +875,23 @@ int futex(uint32_t *uaddr, int op, uint32_t val, uint64_t timeout,
case FUTEX_WAIT:
val3 = FUTEX_BITSET_MATCH_ANY;
case FUTEX_WAIT_BITSET:
ret = futex_wait(uaddr, fshared, val, timeout, val3, clockrt, clv_override);
ret = futex_wait(uaddr, fshared, val, timeout, val3, clockrt);
break;
case FUTEX_WAKE:
val3 = FUTEX_BITSET_MATCH_ANY;
case FUTEX_WAKE_BITSET:
ret = futex_wake(uaddr, fshared, val, val3, clv_override);
ret = futex_wake(uaddr, fshared, val, val3);
break;
case FUTEX_REQUEUE:
ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, NULL, 0, clv_override);
ret = futex_requeue(uaddr, fshared, uaddr2,
val, val2, NULL, 0);
break;
case FUTEX_CMP_REQUEUE:
ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, &val3, 0, clv_override);
ret = futex_requeue(uaddr, fshared, uaddr2,
val, val2, &val3, 0);
break;
case FUTEX_WAKE_OP:
ret = futex_wake_op(uaddr, fshared, uaddr2, val, val2, val3, clv_override);
ret = futex_wake_op(uaddr, fshared, uaddr2, val, val2, val3);
break;
/* RIKEN: these calls are not supported for now.
case FUTEX_LOCK_PI:
@ -942,7 +931,9 @@ int futex_init(void)
{
int i;
for (i = 0; i < ARRAY_SIZE(futex_queues); i++) {
futex_queues = kmalloc(sizeof(struct futex_hash_bucket) *
(1 << FUTEX_HASHBITS), IHK_MC_AP_NOWAIT);
for (i = 0; i < (1 << FUTEX_HASHBITS); i++) {
plist_head_init(&futex_queues[i].chain, &futex_queues[i].lock);
ihk_mc_spinlock_init(&futex_queues[i].lock);
}
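Reviewer note (illustrative sketch, not part of the change): with the bucket table now allocated at init time instead of being a static array, the allocation can fail. A minimal user-space sketch of the same initialization with an explicit failure path follows. `demo_bucket`, `demo_futex_init()` and `DEMO_HASHBITS` are hypothetical; `malloc()` stands in for `kmalloc()`, and the real value of `FUTEX_HASHBITS` is not shown in this diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define DEMO_HASHBITS 8	/* assumption: stands in for FUTEX_HASHBITS */

/* Hypothetical, simplified bucket: a lock word and a chain head. */
struct demo_bucket {
	int lock;
	void *chain;
};

static struct demo_bucket *demo_buckets;

/* Allocate and initialize the bucket table; returns 0 or -ENOMEM. */
static int demo_futex_init(void)
{
	size_t i;
	size_t n = (size_t)1 << DEMO_HASHBITS;

	/* Sketch assumption: initialization runs exactly once. */
	assert(demo_buckets == NULL);

	demo_buckets = malloc(sizeof(*demo_buckets) * n);
	if (!demo_buckets)
		return -ENOMEM;

	for (i = 0; i < n; i++) {
		demo_buckets[i].lock = 0;
		demo_buckets[i].chain = NULL;
	}
	return 0;
}
```

Whether the kernel code should return an error or panic on allocation failure is a design decision this sketch does not settle.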


@ -43,7 +43,7 @@
#endif
/* Linux channel table, indexed by Linux CPU id */
static struct ihk_ikc_channel_desc **ikc2linuxs = NULL;
struct ihk_ikc_channel_desc **ikc2linuxs;
void check_mapping_for_proc(struct thread *thread, unsigned long addr)
{
@ -542,10 +542,43 @@ static int process_msg_prepare_process(unsigned long rphys)
}
vm->numa_mem_policy = MPOL_BIND;
}
else if (pn->mpol_mode != MPOL_MAX) {
int bit;
vm->numa_mem_policy = pn->mpol_mode;
memset(&vm->numa_mask, 0, sizeof(vm->numa_mask));
for_each_set_bit(bit, pn->mpol_nodemask,
PLD_PROCESS_NUMA_MASK_BITS) {
if (bit >= ihk_mc_get_nr_numa_nodes()) {
kprintf("%s: error: NUMA id %d is larger than mask size!\n",
__func__, bit);
return -EINVAL;
}
set_bit(bit, &vm->numa_mask[0]);
}
dkprintf("%s: numa_mem_policy: %d, numa_mask: %ld\n",
__func__, vm->numa_mem_policy, vm->numa_mask[0]);
}
proc->enable_uti = pn->enable_uti;
proc->uti_thread_rank = pn->uti_thread_rank;
proc->uti_use_last_cpu = pn->uti_use_last_cpu;
proc->straight_map = pn->straight_map;
proc->straight_map_threshold = pn->straight_map_threshold;
#ifdef ENABLE_TOFU
proc->enable_tofu = pn->enable_tofu;
if (proc->enable_tofu) {
extern void tof_utofu_finalize(void);
tof_utofu_finalize();
}
#endif
#ifdef PROFILE_ENABLE
proc->profile = pn->profile;
thread->profile = pn->profile;
@ -756,7 +789,11 @@ out_remote_pf:
syscall_channel_send(resp_channel, &pckt);
rc = do_kill(NULL, info.pid, info.tid, info.sig, &info.info, 0);
#ifndef ENABLE_FUGAKU_HACKS
dkprintf("SCD_MSG_SEND_SIGNAL: do_kill(pid=%d, tid=%d, sig=%d)=%d\n", info.pid, info.tid, info.sig, rc);
#else
kprintf("SCD_MSG_SEND_SIGNAL: do_kill(pid=%d, tid=%d, sig=%d)=%d\n", info.pid, info.tid, info.sig, rc);
#endif
ret = 0;
break;
@ -766,12 +803,36 @@ out_remote_pf:
ret = 0;
break;
case SCD_MSG_CLEANUP_PROCESS:
case SCD_MSG_CLEANUP_PROCESS: {
extern int process_cleanup_before_terminate(int pid);
dkprintf("SCD_MSG_CLEANUP_PROCESS pid=%d, thread=0x%llx\n",
packet->pid, packet->arg);
pckt.msg = SCD_MSG_CLEANUP_PROCESS_RESP;
pckt.err = process_cleanup_before_terminate(packet->pid);
pckt.ref = packet->ref;
pckt.arg = packet->arg;
pckt.reply = packet->reply;
syscall_channel_send(resp_channel, &pckt);
terminate_host(packet->pid, (struct thread *)packet->arg);
ret = 0;
break;
}
case SCD_MSG_CLEANUP_FD: {
extern int process_cleanup_fd(int pid, int fd);
pckt.msg = SCD_MSG_CLEANUP_FD_RESP;
pckt.err = process_cleanup_fd(packet->pid, packet->arg);
dkprintf("SCD_MSG_CLEANUP_FD pid=%d, fd=%d -> err: %d\n",
packet->pid, packet->arg, pckt.err);
pckt.ref = packet->ref;
pckt.arg = packet->arg;
pckt.reply = packet->reply;
syscall_channel_send(resp_channel, &pckt);
ret = 0;
break;
}
case SCD_MSG_DEBUG_LOG:
dkprintf("SCD_MSG_DEBUG_LOG code=%lx\n", packet->arg);


@ -85,7 +85,11 @@ static int hugefileobj_get_page(struct memobj *memobj, off_t off,
}
memset(obj->pages[pgind], 0, obj->pgsize);
#ifndef ENABLE_FUGAKU_HACKS
dkprintf("%s: obj: 0x%lx, allocated page for off: %lu"
#else
kprintf("%s: obj: 0x%lx, allocated page for off: %lu"
#endif
" (ind: %d), page size: %lu\n",
__func__, obj, off, pgind, obj->pgsize);
}
@ -274,13 +278,51 @@ int hugefileobj_create(struct memobj *memobj, size_t len, off_t off,
obj->nr_pages = nr_pages;
obj->pages = pages;
#ifndef ENABLE_FUGAKU_HACKS
dkprintf("%s: obj: 0x%lx, VA: 0x%lx, page array allocated"
#else
kprintf("%s: obj: 0x%lx, VA: 0x%lx, page array allocated"
#endif
" for %d pages, pagesize: %lu\n",
__func__,
obj,
virt_addr,
nr_pages,
obj->pgsize);
#ifdef ENABLE_FUGAKU_HACKS
if (!hugetlbfs_on_demand) {
int pgind;
int npages;
#ifndef ENABLE_FUGAKU_HACKS
for (pgind = 0; pgind < obj->nr_pages; ++pgind) {
#else
/* Map in only the last 8 pages */
for (pgind = ((obj->nr_pages > 8) ? (obj->nr_pages - 8) : 0);
pgind < obj->nr_pages; ++pgind) {
#endif
if (obj->pages[pgind]) {
continue;
}
npages = obj->pgsize >> PAGE_SHIFT;
obj->pages[pgind] = ihk_mc_alloc_aligned_pages_user(npages,
obj->pgshift - PTL1_SHIFT,
IHK_MC_AP_NOWAIT | IHK_MC_AP_USER, 0);
if (!obj->pages[pgind]) {
kprintf("%s: error: could not allocate page for off: %lu"
", page size: %lu\n", __func__, off, obj->pgsize);
continue;
}
memset(obj->pages[pgind], 0, obj->pgsize);
dkprintf("%s: obj: 0x%lx, pre-allocated page for off: %lu"
" (ind: %d), page size: %lu\n",
__func__, obj, off, pgind, obj->pgsize);
}
}
#endif
}
obj->memobj.size = len;


@ -20,10 +20,17 @@
* CPU Local Storage (cls)
*/
struct kmalloc_cache_header {
struct kmalloc_cache_header *next;
};
struct kmalloc_header {
unsigned int front_magic;
int cpu_id;
struct list_head list;
union {
struct list_head list;
struct kmalloc_cache_header *cache;
};
int size; /* The size of this chunk without the header */
unsigned int end_magic;
/* 32 bytes */
@ -79,6 +86,7 @@ struct cpu_local_var {
ihk_spinlock_t runq_lock;
unsigned long runq_irqstate;
struct thread *current;
void *kernel_mode_pf_regs;
int prevpid;
struct list_head runq;
size_t runq_len;
@ -98,6 +106,7 @@ struct cpu_local_var {
ihk_spinlock_t migq_lock;
struct list_head migq;
int in_interrupt;
int in_page_fault;
int no_preempt;
int timer_enabled;
unsigned long nr_ctx_switches;


@ -128,6 +128,26 @@
struct process_vm;
static inline int get_futex_value_locked(uint32_t *dest, uint32_t *from)
{
*dest = *(volatile uint32_t *)from;
return 0;
}
/*
* Hash buckets are shared by all the futex_keys that hash to the same
* location. Each key may have multiple futex_q structures, one for each task
* waiting on a futex.
*/
struct futex_hash_bucket {
ihk_spinlock_t lock;
struct plist_head chain;
};
struct futex_hash_bucket *get_futex_queues(void);
union futex_key {
struct {
unsigned long pgoff;
@ -161,8 +181,7 @@ futex(
uint32_t __user * uaddr2,
uint32_t val2,
uint32_t val3,
int fshared,
struct cpu_local_var *clv_override
int fshared
);
@ -196,6 +215,28 @@ struct futex_q {
/* Used to wake-up a thread running on a Linux CPU */
void *uti_futex_resp;
/* Used to send IPI directly to the waiter CPU */
int linux_cpu;
/* Used to wake-up a thread running on a McKernel from Linux */
void *th_spin_sleep;
void *th_status;
void *th_spin_sleep_lock;
void *proc_status;
void *proc_update_lock;
void *runq_lock;
void *clv_flags;
int intr_id;
int intr_vector;
unsigned long th_spin_sleep_pa;
unsigned long th_status_pa;
unsigned long th_spin_sleep_lock_pa;
unsigned long proc_status_pa;
unsigned long proc_update_lock_pa;
unsigned long runq_lock_pa;
unsigned long clv_flags_pa;
};
#endif


@ -1,158 +0,0 @@
#ifndef _LINUX_JHASH_H
#define _LINUX_JHASH_H
/**
* \file futex.c
* Licence details are found in the file LICENSE.
*
* \brief
* Adaptation to McKernel
*
* \author Balazs Gerofi <bgerofi@riken.jp> \par
* Copyright (C) 2012 RIKEN AICS
*
*
* HISTORY:
*/
/*
* jhash.h: Jenkins hash support.
*
* Copyright (C) 1996 Bob Jenkins (bob_jenkins@burtleburtle.net)
*
* http://burtleburtle.net/bob/hash/
*
* These are the credits from Bob's sources:
*
* lookup2.c, by Bob Jenkins, December 1996, Public Domain.
* hash(), hash2(), hash3, and mix() are externally useful functions.
* Routines to test the hash are included if SELF_TEST is defined.
* You can use this free for any purpose. It has no warranty.
*
* Copyright (C) 2003 David S. Miller (davem@redhat.com)
*
* I've modified Bob's hash to be useful in the Linux kernel, and
* any bugs present are surely my fault. -DaveM
*
*/
/* NOTE: Arguments are modified. */
#define __jhash_mix(a, b, c) \
{ \
a -= b; a -= c; a ^= (c>>13); \
b -= c; b -= a; b ^= (a<<8); \
c -= a; c -= b; c ^= (b>>13); \
a -= b; a -= c; a ^= (c>>12); \
b -= c; b -= a; b ^= (a<<16); \
c -= a; c -= b; c ^= (b>>5); \
a -= b; a -= c; a ^= (c>>3); \
b -= c; b -= a; b ^= (a<<10); \
c -= a; c -= b; c ^= (b>>15); \
}
/* The golden ratio: an arbitrary value */
#define JHASH_GOLDEN_RATIO 0x9e3779b9
/* The most generic version, hashes an arbitrary sequence
* of bytes. No alignment or length assumptions are made about
* the input key.
*/
static inline uint32_t jhash(const void *key, uint32_t length, uint32_t initval)
{
uint32_t a, b, c, len;
const uint8_t *k = key;
len = length;
a = b = JHASH_GOLDEN_RATIO;
c = initval;
while (len >= 12) {
a += (k[0] +((uint32_t)k[1]<<8) +((uint32_t)k[2]<<16) +((uint32_t)k[3]<<24));
b += (k[4] +((uint32_t)k[5]<<8) +((uint32_t)k[6]<<16) +((uint32_t)k[7]<<24));
c += (k[8] +((uint32_t)k[9]<<8) +((uint32_t)k[10]<<16)+((uint32_t)k[11]<<24));
__jhash_mix(a,b,c);
k += 12;
len -= 12;
}
c += length;
switch (len) {
case 11: c += ((uint32_t)k[10]<<24);
case 10: c += ((uint32_t)k[9]<<16);
case 9 : c += ((uint32_t)k[8]<<8);
case 8 : b += ((uint32_t)k[7]<<24);
case 7 : b += ((uint32_t)k[6]<<16);
case 6 : b += ((uint32_t)k[5]<<8);
case 5 : b += k[4];
case 4 : a += ((uint32_t)k[3]<<24);
case 3 : a += ((uint32_t)k[2]<<16);
case 2 : a += ((uint32_t)k[1]<<8);
case 1 : a += k[0];
};
__jhash_mix(a,b,c);
return c;
}
/* A special optimized version that handles 1 or more of uint32_ts.
* The length parameter here is the number of uint32_ts in the key.
*/
static inline uint32_t jhash2(const uint32_t *k, uint32_t length, uint32_t initval)
{
uint32_t a, b, c, len;
a = b = JHASH_GOLDEN_RATIO;
c = initval;
len = length;
while (len >= 3) {
a += k[0];
b += k[1];
c += k[2];
__jhash_mix(a, b, c);
k += 3; len -= 3;
}
c += length * 4;
switch (len) {
case 2 : b += k[1];
case 1 : a += k[0];
};
__jhash_mix(a,b,c);
return c;
}
/* Special ultra-optimized versions that know they are hashing exactly
* 3, 2 or 1 word(s).
*
* NOTE: In particular the "c += length; __jhash_mix(a,b,c);" normally
* done at the end is not done here.
*/
static inline uint32_t jhash_3words(uint32_t a, uint32_t b, uint32_t c, uint32_t initval)
{
a += JHASH_GOLDEN_RATIO;
b += JHASH_GOLDEN_RATIO;
c += initval;
__jhash_mix(a, b, c);
return c;
}
static inline uint32_t jhash_2words(uint32_t a, uint32_t b, uint32_t initval)
{
return jhash_3words(a, b, 0, initval);
}
static inline uint32_t jhash_1word(uint32_t a, uint32_t initval)
{
return jhash_3words(a, 0, 0, initval);
}
#endif /* _LINUX_JHASH_H */


@ -36,4 +36,98 @@ int memcheckall(void);
int freecheck(int runcount);
void kmalloc_consolidate_free_list(void);
#ifndef unlikely
#define unlikely(x) __builtin_expect(!!(x), 0)
#endif
/*
* Generic lockless kmalloc cache.
*/
static inline void kmalloc_cache_free(void *elem)
{
struct kmalloc_cache_header *current = NULL;
struct kmalloc_cache_header *new =
(struct kmalloc_cache_header *)elem;
struct kmalloc_header *header;
register struct kmalloc_cache_header *cache;
if (unlikely(!elem))
return;
/* Get cache pointer from kmalloc header */
header = (struct kmalloc_header *)((void *)elem -
sizeof(struct kmalloc_header));
if (unlikely(!header->cache)) {
kprintf("%s: WARNING: no cache for 0x%lx\n",
__func__, elem);
return;
}
cache = header->cache;
retry:
current = cache->next;
new->next = current;
if (!__sync_bool_compare_and_swap(&cache->next, current, new)) {
goto retry;
}
}
static inline void kmalloc_cache_prealloc(struct kmalloc_cache_header *cache,
size_t size, int nr_elem)
{
struct kmalloc_cache_header *elem;
int i;
if (unlikely(cache->next))
return;
for (i = 0; i < nr_elem; ++i) {
struct kmalloc_header *header;
elem = (struct kmalloc_cache_header *)
kmalloc(size, IHK_MC_AP_NOWAIT);
if (!elem) {
kprintf("%s: ERROR: allocating cache element\n", __func__);
continue;
}
/* Store cache pointer in kmalloc_header */
header = (struct kmalloc_header *)((void *)elem -
sizeof(struct kmalloc_header));
header->cache = cache;
kmalloc_cache_free(elem);
}
}
static inline void *kmalloc_cache_alloc(struct kmalloc_cache_header *cache,
size_t size)
{
register struct kmalloc_cache_header *first, *next;
retry:
next = NULL;
first = cache->next;
if (first) {
next = first->next;
if (!__sync_bool_compare_and_swap(&cache->next,
first, next)) {
goto retry;
}
}
else {
kprintf("%s: calling pre-alloc for 0x%lx...\n", __func__, cache);
kmalloc_cache_prealloc(cache, size, 384);
goto retry;
}
return (void *)first;
}
#endif
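Reviewer note (illustrative sketch, not part of the change): the kmalloc cache above is a singly linked LIFO free list whose head is updated with compare-and-swap. The same push/pop pattern is shown below in a self-contained form, using the GCC `__sync_bool_compare_and_swap` builtin as the header does. `demo_node`, `demo_push()` and `demo_pop()` are hypothetical names, and the refill-on-empty step of `kmalloc_cache_alloc()` is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; the head is a node whose ->next is the top. */
struct demo_node {
	struct demo_node *next;
};

/* Push elem onto the list, retrying until the CAS on the head succeeds. */
static void demo_push(struct demo_node *head, struct demo_node *elem)
{
	struct demo_node *old;

	assert(head != NULL && elem != NULL);
	do {
		old = head->next;
		elem->next = old;
	} while (!__sync_bool_compare_and_swap(&head->next, old, elem));
}

/* Pop the most recently pushed element, or return NULL if empty. */
static struct demo_node *demo_pop(struct demo_node *head)
{
	struct demo_node *first, *next;

	assert(head != NULL);
	do {
		first = head->next;
		if (!first)
			return NULL;
		next = first->next;
	} while (!__sync_bool_compare_and_swap(&head->next, first, next));
	return first;
}
```

A CAS-based pop of this shape is in general exposed to the ABA problem when an element can be popped, reused and pushed back concurrently. Whether that can happen for the caches in this change depends on how callers use them, which this diff does not show.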

kernel/include/kref.h Normal file

@ -0,0 +1,84 @@
/*
* kref.h - library routines for handling generic reference counted objects
* (based on Linux implementation)
*
* This file is released under the GPLv2.
*
*/
#ifndef _KREF_H_
#define _KREF_H_
#include <ihk/atomic.h>
#include <ihk/lock.h>
/*
* Bit 30 marks a kref as McKernel internal.
* This can be used to distinguish krefs from Linux and
* it also ensures that a non deallocated kref will not
* crash the Linux allocator.
*/
#define MCKERNEL_KREF_MARK (1U << 30)
struct kref {
ihk_atomic_t refcount;
};
#define KREF_INIT(n) { .refcount = IHK_ATOMIC_INIT(MCKERNEL_KREF_MARK + n), }
/**
* kref_init - initialize object.
* @kref: object in question.
*/
static inline void kref_init(struct kref *kref)
{
ihk_atomic_set(&kref->refcount, MCKERNEL_KREF_MARK + 1);
}
static inline unsigned int kref_read(const struct kref *kref)
{
return (ihk_atomic_read(&kref->refcount) & ~(MCKERNEL_KREF_MARK));
}
static inline unsigned int kref_is_mckernel(const struct kref *kref)
{
return (ihk_atomic_read(&kref->refcount) & (MCKERNEL_KREF_MARK));
}
/**
* kref_get - increment refcount for object.
* @kref: object.
*/
static inline void kref_get(struct kref *kref)
{
ihk_atomic_inc(&kref->refcount);
}
/**
* kref_put - decrement refcount for object.
* @kref: object.
* @release: pointer to the function that will clean up the object when the
* last reference to the object is released.
* This pointer is required, and it is not acceptable to pass kfree
* in as this function. If the caller does pass kfree to this
* function, you will be publicly mocked mercilessly by the kref
* maintainer, and anyone else who happens to notice it. You have
* been warned.
*
* Decrement the refcount, and if 0, call release().
* Return 1 if the object was removed, otherwise return 0. Beware, if this
* function returns 0, you still can not count on the kref from remaining in
* memory. Only use the return value if you want to see if the kref is now
* gone, not present.
*/
static inline int kref_put(struct kref *kref, void (*release)(struct kref *kref))
{
//if (ihk_atomic_dec_and_test(&kref->refcount)) {
if (ihk_atomic_sub_return(1, &kref->refcount) == MCKERNEL_KREF_MARK) {
release(kref);
return 1;
}
return 0;
}
#endif /* _KREF_H_ */

kernel/include/mc_jhash.h Normal file

@ -0,0 +1,88 @@
#ifndef _MC_JHASH_H
#define _MC_JHASH_H
/**
* \file mc_jhash.h
* Licence details are found in the file LICENSE.
*
* \brief
* Adaptation to McKernel
*
* \author Balazs Gerofi <bgerofi@riken.jp> \par
* Copyright (C) 2012 RIKEN AICS
*
*
* HISTORY:
*/
/*
* jhash.h: Jenkins hash support.
*
* Copyright (C) 1996 Bob Jenkins (bob_jenkins@burtleburtle.net)
*
* http://burtleburtle.net/bob/hash/
*
* These are the credits from Bob's sources:
*
* lookup2.c, by Bob Jenkins, December 1996, Public Domain.
* hash(), hash2(), hash3, and mix() are externally useful functions.
* Routines to test the hash are included if SELF_TEST is defined.
* You can use this free for any purpose. It has no warranty.
*
* Copyright (C) 2003 David S. Miller (davem@redhat.com)
*
* I've modified Bob's hash to be useful in the Linux kernel, and
* any bugs present are surely my fault. -DaveM
*
*/
/* NOTE: Arguments are modified. */
#define __mc_jhash_mix(a, b, c) \
{ \
a -= b; a -= c; a ^= (c>>13); \
b -= c; b -= a; b ^= (a<<8); \
c -= a; c -= b; c ^= (b>>13); \
a -= b; a -= c; a ^= (c>>12); \
b -= c; b -= a; b ^= (a<<16); \
c -= a; c -= b; c ^= (b>>5); \
a -= b; a -= c; a ^= (c>>3); \
b -= c; b -= a; b ^= (a<<10); \
c -= a; c -= b; c ^= (b>>15); \
}
/* The golden ratio: an arbitrary value */
#define JHASH_GOLDEN_RATIO 0x9e3779b9
/* A special optimized version that handles 1 or more of uint32_ts.
* The length parameter here is the number of uint32_ts in the key.
*/
static inline uint32_t mc_jhash2(const uint32_t *k, uint32_t length, uint32_t initval)
{
uint32_t a, b, c, len;
a = b = JHASH_GOLDEN_RATIO;
c = initval;
len = length;
while (len >= 3) {
a += k[0];
b += k[1];
c += k[2];
__mc_jhash_mix(a, b, c);
k += 3; len -= 3;
}
c += length * 4;
switch (len) {
case 2:
b += k[1];
case 1:
a += k[0];
};
__mc_jhash_mix(a, b, c);
return c;
}
#endif /* _MC_JHASH_H */
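Reviewer note (illustrative sketch, not part of the change): `hash_futex()` turns the word-wise Jenkins hash of a key into a bucket index by masking with `(1 << FUTEX_HASHBITS) - 1`. The sketch below reproduces that step in a standalone form. The mixing and hashing code follows `mc_jhash2()` from the header above; `DEMO_HASHBITS`, `demo_bucket_index()` and the flat `uint32_t` key layout are assumptions, since the real `FUTEX_HASHBITS` value and `union futex_key` layout are not fully shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_HASHBITS 8		/* assumption: stands in for FUTEX_HASHBITS */
#define DEMO_GOLDEN_RATIO 0x9e3779b9u

/* Same mixing steps as __mc_jhash_mix; arguments are modified. */
#define DEMO_MIX(a, b, c) do { \
	a -= b; a -= c; a ^= (c >> 13); \
	b -= c; b -= a; b ^= (a << 8); \
	c -= a; c -= b; c ^= (b >> 13); \
	a -= b; a -= c; a ^= (c >> 12); \
	b -= c; b -= a; b ^= (a << 16); \
	c -= a; c -= b; c ^= (b >> 5); \
	a -= b; a -= c; a ^= (c >> 3); \
	b -= c; b -= a; b ^= (a << 10); \
	c -= a; c -= b; c ^= (b >> 15); \
} while (0)

/* Hash `length` 32-bit words, seeded with initval. */
static uint32_t demo_jhash2(const uint32_t *k, uint32_t length,
		uint32_t initval)
{
	uint32_t a, b, c, len;

	a = b = DEMO_GOLDEN_RATIO;
	c = initval;
	len = length;

	while (len >= 3) {
		a += k[0];
		b += k[1];
		c += k[2];
		DEMO_MIX(a, b, c);
		k += 3;
		len -= 3;
	}

	c += length * 4;
	switch (len) {
	case 2:
		b += k[1];
		/* fall through */
	case 1:
		a += k[0];
	}
	DEMO_MIX(a, b, c);
	return c;
}

/* Map a key to a bucket index in [0, 1 << DEMO_HASHBITS). */
static uint32_t demo_bucket_index(const uint32_t *key_words,
		uint32_t nwords, uint32_t offset)
{
	assert(key_words != NULL);
	return demo_jhash2(key_words, nwords, offset) &
		((1u << DEMO_HASHBITS) - 1);
}
```

Masking only works as a modulo because the table size is a power of two, which is why the bucket count is expressed as `1 << FUTEX_HASHBITS`.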


@ -37,6 +37,7 @@ enum {
MF_SHM = 0x40000,
MF_HUGETLBFS = 0x100000,
MF_PRIVATE = 0x200000, /* To prevent flush in clear_range_* */
MF_REMAP_FILE_PAGES = 0x400000, /* remap_file_pages possible */
};
#define MEMOBJ_READY 0
@ -181,4 +182,11 @@ static inline int is_freeable(struct memobj *memobj)
return 1;
}
static inline int is_callable_remap_file_pages(struct memobj *memobj)
{
if (!memobj || !(memobj->flags & MF_REMAP_FILE_PAGES))
return 0;
return 1;
}
#endif /* HEADER_MEMOBJ_H */


@ -79,4 +79,14 @@
extern int sysctl_overcommit_memory;
/*
* This looks more complex than it should be. But we need to
* get the type for the ~ right in round_down (it needs to be
* as wide as the result!), and we want to evaluate the macro
* arguments just once each.
*/
#define __round_mask(x, y) ((__typeof__(x))((y)-1))
#define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1)
#define round_down(x, y) ((x) & ~__round_mask(x, y))
#endif /* HEADER_MMAN_H */
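Reviewer note (illustrative sketch, not part of the change): the `round_up` and `round_down` macros added above only give correct results when the alignment is a power of two, because they work by masking low bits. A small standalone sketch follows. The `demo_*` names are hypothetical copies so the example compiles on its own, and `__typeof__` is a GCC/Clang extension, as in the original.

```c
#include <assert.h>

/* Local copies of the macros under hypothetical names. */
#define demo_round_mask(x, y) ((__typeof__(x))((y) - 1))
#define demo_round_up(x, y) ((((x) - 1) | demo_round_mask(x, y)) + 1)
#define demo_round_down(x, y) ((x) & ~demo_round_mask(x, y))

/* Start of the page containing addr; pgsize must be a power of two. */
static unsigned long demo_page_start(unsigned long addr,
		unsigned long pgsize)
{
	assert(pgsize != 0 && (pgsize & (pgsize - 1)) == 0);
	return demo_round_down(addr, pgsize);
}

/* First page boundary at or above addr; pgsize must be a power of two. */
static unsigned long demo_page_end(unsigned long addr,
		unsigned long pgsize)
{
	assert(pgsize != 0 && (pgsize & (pgsize - 1)) == 0);
	return demo_round_up(addr, pgsize);
}
```

For example, with a 4 KiB page size the address `0x1234` rounds down to `0x1000` and up to `0x2000`, while an already aligned address is returned unchanged by both helpers.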


@ -69,4 +69,7 @@ static inline int page_is_multi_mapped(struct page *page)
/* Should we take page faults on ANONYMOUS mappings? */
extern int anon_on_demand;
#ifdef ENABLE_FUGAKU_HACKS
extern int hugetlbfs_on_demand;
#endif
#endif

View File

@@ -390,10 +390,14 @@ struct vm_range {
struct rb_node vm_rb_node;
unsigned long start, end;
unsigned long flag;
unsigned long straight_start;
struct memobj *memobj;
off_t objoff;
int pgshift; /* page size. 0 means THP */
int padding;
#ifdef ENABLE_TOFU
struct list_head tofu_stag_list;
#endif
void *private_data;
};
@@ -402,6 +406,7 @@ struct vm_range_numa_policy {
unsigned long start, end;
DECLARE_BITMAP(numa_mask, PROCESS_NUMA_MASK_BITS);
int numa_mem_policy;
int il_prev;
};
struct vm_regions {
@@ -558,11 +563,20 @@ struct process {
size_t mpol_threshold;
unsigned long heap_extension;
unsigned long mpol_bind_mask;
int mpol_mode;
int enable_uti;
int uti_thread_rank; /* Spawn on Linux CPU when clone_count reaches this */
int uti_use_last_cpu; /* Work-around not to share CPU with OpenMP thread */
int clone_count;
int thp_disable;
int straight_map;
#ifdef ENABLE_TOFU
int enable_tofu;
#endif
size_t straight_map_threshold;
// perf_event
int perf_status;
#define PP_NONE 0
@@ -578,8 +592,18 @@ struct process {
#endif // PROFILE_ENABLE
int nr_processes; /* For partitioned execution */
int process_rank; /* Rank in partition */
void *straight_va;
size_t straight_len;
unsigned long straight_pa;
int coredump_barrier_count, coredump_barrier_count2;
mcs_rwlock_lock_t coredump_lock; // lock for coredump
#ifdef ENABLE_TOFU
#define MAX_FD_PDE 1024
void *fd_pde_data[MAX_FD_PDE];
char *fd_path[MAX_FD_PDE];
#endif
};
/*
@@ -737,9 +761,17 @@ struct thread {
void *coredump_regs;
struct waitq coredump_wq;
int coredump_status;
#ifdef ENABLE_TOFU
/* Path of file being opened */
char *fd_path_in_open;
#endif
};
#define VM_RANGE_CACHE_SIZE 4
#ifdef ENABLE_TOFU
#define TOFU_STAG_HASH_SIZE 4
#endif
struct process_vm {
struct address_space *address_space;
@@ -767,11 +799,18 @@ struct process_vm {
long currss;
DECLARE_BITMAP(numa_mask, PROCESS_NUMA_MASK_BITS);
int numa_mem_policy;
int il_prev;
/* Protected by memory_range_lock */
struct rb_root vm_range_numa_policy_tree;
struct vm_range *range_cache[VM_RANGE_CACHE_SIZE];
int range_cache_ind;
struct swapinfo *swapinfo;
#ifdef ENABLE_TOFU
/* Tofu STAG hash */
ihk_spinlock_t tofu_stag_lock;
struct list_head tofu_stag_hash[TOFU_STAG_HASH_SIZE];
#endif
};
static inline int has_cap_ipc_lock(struct thread *th)

View File

@@ -1,9 +1,6 @@
#ifndef __PROCESS_PROFILE_H_
#define __PROCESS_PROFILE_H_
/* Uncomment this to enable profiling */
#define PROFILE_ENABLE
#ifdef PROFILE_ENABLE
#define PROFILE_SYSCALL_MAX 2000
#define PROFILE_OFFLOAD_MAX (PROFILE_SYSCALL_MAX << 1)
@@ -40,14 +37,25 @@ enum profile_event_type {
PROFILE_remote_page_fault,
PROFILE_mpol_alloc_missed,
PROFILE_mmap_anon_contig_phys,
PROFILE_mmap_anon_straight,
PROFILE_mmap_anon_not_straight,
PROFILE_mmap_anon_no_contig_phys,
PROFILE_mmap_regular_file,
PROFILE_mmap_device_file,
PROFILE_tofu_stag_alloc,
PROFILE_tofu_stag_alloc_new_steering,
PROFILE_tofu_stag_alloc_new_steering_alloc_mbpt,
PROFILE_tofu_stag_alloc_new_steering_update_mbpt,
PROFILE_tofu_stag_free_stags,
PROFILE_tofu_stag_free_stag,
PROFILE_tofu_stag_free_stag_pre,
PROFILE_tofu_stag_free_stag_cqflush,
PROFILE_tofu_stag_free_stag_dealloc,
PROFILE_tofu_stag_free_stag_dealloc_free_pages,
PROFILE_EVENT_MAX /* Should be the last event type */
};
#define __NR_profile PROFILE_EVENT_MAX
#ifdef __KERNEL__
struct thread;
struct process;
@@ -61,6 +69,77 @@ int profile_accumulate_and_print_job_events(struct process *proc);
int profile_alloc_events(struct thread *thread);
void profile_dealloc_thread_events(struct thread *thread);
void profile_dealloc_proc_events(struct process *proc);
#else // User space interface
#include <unistd.h>
#include <sys/syscall.h>
#define __NR_profile PROFILE_EVENT_MAX
/* Per-thread */
static inline void mckernel_profile_thread_on(void)
{
syscall(__NR_profile, PROF_ON);
}
static inline void mckernel_profile_thread_off(void)
{
syscall(__NR_profile, PROF_OFF);
}
static inline void mckernel_profile_thread_print(void)
{
syscall(__NR_profile, PROF_PRINT);
}
static inline void mckernel_profile_thread_print_off(void)
{
syscall(__NR_profile, PROF_OFF | PROF_PRINT);
}
/* Per-process */
static inline void mckernel_profile_process_on(void)
{
syscall(__NR_profile, PROF_PROC | PROF_ON);
}
static inline void mckernel_profile_process_off(void)
{
syscall(__NR_profile, PROF_PROC | PROF_OFF);
}
static inline void mckernel_profile_process_print(void)
{
syscall(__NR_profile, PROF_PROC | PROF_PRINT);
}
static inline void mckernel_profile_process_print_off(void)
{
syscall(__NR_profile, PROF_PROC | PROF_OFF | PROF_PRINT);
}
/* Per-job */
static inline void mckernel_profile_job_on(void)
{
syscall(__NR_profile, PROF_JOB | PROF_ON);
}
static inline void mckernel_profile_job_off(void)
{
syscall(__NR_profile, PROF_JOB | PROF_OFF);
}
static inline void mckernel_profile_job_print(void)
{
syscall(__NR_profile, PROF_JOB | PROF_PRINT);
}
static inline void mckernel_profile_job_print_off(void)
{
syscall(__NR_profile, PROF_JOB | PROF_OFF | PROF_PRINT);
}
#endif // __KERNEL__
#endif // PROFILE_ENABLE

View File

@@ -108,4 +108,6 @@ static inline void rb_link_node(struct rb_node * node, struct rb_node * parent,
typeof(*pos), field); 1; }); \
pos = n)
struct rb_node *rb_preorder_dfs_search(const struct rb_root *root,
bool (*__cond)(struct rb_node *, void *arg), void *__cond_arg);
#endif /* _LINUX_RBTREE_H */

View File

@@ -20,6 +20,7 @@
#include <ihk/ikc.h>
#include <rlimit.h>
#include <time.h>
#include <profile.h>
#define NUM_SYSCALLS 255
@@ -39,7 +40,8 @@
#define SCD_MSG_SEND_SIGNAL 0x7
#define SCD_MSG_SEND_SIGNAL_ACK 0x8
#define SCD_MSG_CLEANUP_PROCESS 0x9
#define SCD_MSG_GET_VDSO_INFO 0xa
#define SCD_MSG_CLEANUP_PROCESS_RESP 0xa
#define SCD_MSG_GET_VDSO_INFO 0xb
#define SCD_MSG_GET_CPU_MAPPING 0xc
#define SCD_MSG_REPLY_GET_CPU_MAPPING 0xd
@@ -84,6 +86,8 @@
#define SCD_MSG_CPU_RW_REG 0x52
#define SCD_MSG_CPU_RW_REG_RESP 0x53
#define SCD_MSG_CLEANUP_FD 0x54
#define SCD_MSG_CLEANUP_FD_RESP 0x55
#define SCD_MSG_FUTEX_WAKE 0x60
@@ -180,6 +184,18 @@ typedef unsigned long __cpu_set_unit;
#define MPOL_NO_BSS 0x04
#define MPOL_SHM_PREMAP 0x08
/* should be the same as process.h */
#define PLD_PROCESS_NUMA_MASK_BITS 256
enum {
PLD_MPOL_DEFAULT,
PLD_MPOL_PREFERRED,
PLD_MPOL_BIND,
PLD_MPOL_INTERLEAVE,
PLD_MPOL_LOCAL,
PLD_MPOL_MAX, /* always last member of enum */
};
#define PLD_MAGIC 0xcafecafe44332211UL
struct program_load_desc {
@@ -214,9 +230,19 @@ struct program_load_desc {
unsigned long heap_extension;
long stack_premap;
unsigned long mpol_bind_mask;
int mpol_mode;
unsigned long mpol_nodemask[PLD_PROCESS_NUMA_MASK_BITS /
(sizeof(unsigned long) * 8)];
int thp_disable;
int enable_uti;
int uti_thread_rank; /* N-th clone() spawns a thread on Linux CPU */
int uti_use_last_cpu; /* Work-around not to share CPU with OpenMP thread */
int straight_map;
size_t straight_map_threshold;
#ifdef ENABLE_TOFU
int enable_tofu;
#endif
int nr_processes;
int process_rank;
__cpu_set_unit cpu_set[PLD_CPU_SET_SIZE];
@@ -327,6 +353,7 @@ struct syscall_response {
unsigned long req_thread_status;
long ret;
unsigned long fault_address;
void *pde_data;
};
struct syscall_post {
@@ -653,4 +680,5 @@ extern int (*linux_clock_gettime)(clockid_t clk_id, struct timespec *tp);
extern void terminate_host(int pid, struct thread *thread);
struct sig_pending *getsigpending(struct thread *thread, int delflag);
int interrupt_from_user(void *regs0);
extern unsigned long shmid_index[];
#endif

View File

@@ -0,0 +1,51 @@
#!/bin/bash
SCRIPT="`readlink -f ${BASH_SOURCE[0]:-}`"
SCRIPT_DIR=$(dirname ${SCRIPT})
CURRENT_DIR=`pwd`
cd ${SCRIPT_DIR}
DWARF_TOOL=${SCRIPT_DIR}/../../../tools/dwarf-extract-struct/dwarf-extract-struct
if [ ! -x ${DWARF_TOOL} ]; then
echo "error: couldn't find DWARF extractor executable (${DWARF_TOOL}), have you compiled it?"
cd -
exit 1
fi
echo "Looking for Tofu driver debug symbols..."
if [ "`find /lib/modules/ -name "tof_module.tar.gz" | xargs -r ls -t | head -n 1 | wc -l`" == "0" ]; then
echo "error: couldn't find Tofu modules with debug symbols"
cd -
exit 1
fi
MODULE_TAR_GZ=`find /lib/modules/ -name "tof_module.tar.gz" | xargs ls -t | head -n 1`
echo "Using Tofu driver debug symbols: ${MODULE_TAR_GZ}"
KMODULE=tof_utofu.ko
if ! tar zxvf ${MODULE_TAR_GZ} ${KMODULE} 2>&1 > /dev/null; then
echo "error: uncompressing kernel module with debug symbols"
cd -
exit 1
fi
${DWARF_TOOL} ${KMODULE} tof_utofu_device enabled subnet gpid > tofu_generated-tof_utofu_device.h
${DWARF_TOOL} ${KMODULE} tof_utofu_cq common tni cqid trans steering mb num_stag | sed "s/struct FILL_IN_MANUALLY trans;/#include \"tof_utofu_cq_trans.h\"/g" > tofu_generated-tof_utofu_cq.h
${DWARF_TOOL} ${KMODULE} tof_utofu_mbpt ucq iova sg nsgents mbptstart pgsz kref > tofu_generated-tof_utofu_mbpt.h
${DWARF_TOOL} ${KMODULE} tof_utofu_bg common tni bgid bch | sed "s/struct FILL_IN_MANUALLY bch;/#include \"tof_utofu_bg_bch.h\"/g" > tofu_generated-tof_utofu_bg.h
rm ${KMODULE}
KMODULE=tof_core.ko
if ! tar zxvf ${MODULE_TAR_GZ} ${KMODULE} 2>&1 > /dev/null; then
echo "error: uncompressing kernel module with debug symbols"
cd -
exit 1
fi
${DWARF_TOOL} ${KMODULE} tof_core_cq reg | sed "s/struct FILL_IN_MANUALLY reg;/#include \"tof_core_cq_reg.h\"/g" > tofu_generated-tof_core_cq.h
${DWARF_TOOL} ${KMODULE} tof_core_bg lock reg irq subnet gpid sighandler | sed "s/struct FILL_IN_MANUALLY reg;/#include \"tof_core_bg_reg.h\"/g" > tofu_generated-tof_core_bg.h
rm ${KMODULE}
#cat tofu_generated*.h
cd - > /dev/null

View File

@@ -0,0 +1,4 @@
struct {
void *bgs;
void *bch;
} reg;

View File

@@ -0,0 +1,4 @@
struct {
void *cq;
void *cqs;
} reg;

View File

@@ -0,0 +1,836 @@
#ifndef _TOF_ICC_H_
#define _TOF_ICC_H_
#include <types.h>
#include <bitops.h>
typedef uint64_t phys_addr_t;
/* @ref.impl include/linux/bitops.h */
/*
* Create a contiguous bitmask starting at bit position @l and ending at
* position @h. For example
* GENMASK(39, 21) gives us the 64bit vector 0x000000ffffe00000.
*/
#define GENMASK(h, l) \
(((~0UL) << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
/* constants related to the Tofu Interconnect D */
#define TOF_ICC_NTNIS 6
#define TOF_ICC_NCQS 12
#define TOF_ICC_NBGS 48
#define TOF_ICC_NBCHS 16
#define TOF_ICC_NPORTS 10
#define TOF_ICC_NVMSIDS 16
#define TOF_ICC_RH_LEN 8
#define TOF_ICC_ECRC_LEN 4
#define TOF_ICC_FRAME_ALIGN 32
#define TOF_ICC_TLP_LEN(len) (((len) + 1) * TOF_ICC_FRAME_ALIGN)
#define TOF_ICC_TLP_PAYLOAD_MAX (TOF_ICC_TLP_LEN(61) - TOF_ICC_ECRC_LEN)
#define TOF_ICC_FRAME_LEN(len) (TOF_ICC_RH_LEN + TOF_ICC_TLP_LEN(len))
#define TOF_ICC_FRAME_LEN_MIN TOF_ICC_FRAME_LEN(2)
#define TOF_ICC_FRAME_LEN_MAX TOF_ICC_FRAME_LEN(61)
#define TOF_ICC_FRAME_BUF_SIZE_BITS 11
#define TOF_ICC_FRAME_BUF_SIZE (1 << TOF_ICC_FRAME_BUF_SIZE_BITS)
#define TOF_ICC_FRAME_BUF_ALIGN_BITS 8
#define TOF_ICC_FRAME_BUF_ALIGN (1 << TOF_ICC_FRAME_BUF_ALIGN_BITS)
#define TOF_ICC_PB_SIZE_BITS 11
#define TOF_ICC_PB_SIZE (1 << TOF_ICC_PB_SIZE_BITS)
#define TOF_ICC_PB_ALIGN_BITS 11
#define TOF_ICC_PB_ALIGN (1 << TOF_ICC_PB_ALIGN_BITS)
#define TOF_ICC_ST_ALIGN_BITS 8
#define TOF_ICC_ST_ALIGN (1 << TOF_ICC_ST_ALIGN_BITS)
#define TOF_ICC_MBT_ALIGN_BITS 8
#define TOF_ICC_MBT_ALIGN (1 << TOF_ICC_MBT_ALIGN_BITS)
#define TOF_ICC_MBPT_ALIGN_BITS 8
#define TOF_ICC_MBPT_ALIGN (1 << TOF_ICC_MBPT_ALIGN_BITS)
#define TOF_ICC_BG_BSEQ_SIZE_BITS 24
#define TOF_ICC_BG_BSEQ_SIZE (1 << TOF_ICC_BG_BSEQ_SIZE_BITS)
#define TOF_ICC_BCH_DMA_ALIGN_BITS 8
#define TOF_ICC_BCH_DMA_ALIGN (1 << TOF_ICC_BCH_DMA_ALIGN_BITS)
/* this is a CPU-specific constant, but it is referred to in the ICC spec. */
#define TOF_ICC_CACHE_LINE_SIZE_BITS 8
#define TOF_ICC_CACHE_LINE_SIZE (1 << TOF_ICC_CACHE_LINE_SIZE_BITS)
#define TOF_ICC_TOQ_DESC_SIZE_BITS 5
#define TOF_ICC_TOQ_DESC_SIZE (1 << TOF_ICC_TOQ_DESC_SIZE_BITS)
#define TOF_ICC_TCQ_DESC_SIZE_BITS 3
#define TOF_ICC_TCQ_DESC_SIZE (1 << TOF_ICC_TCQ_DESC_SIZE_BITS)
#define TOF_ICC_TCQ_NLINE_BITS (TOF_ICC_CACHE_LINE_SIZE_BITS - TOF_ICC_TCQ_DESC_SIZE_BITS)
#define TOF_ICC_MRQ_DESC_SIZE_BITS 5
#define TOF_ICC_MRQ_DESC_SIZE (1 << TOF_ICC_MRQ_DESC_SIZE_BITS)
#define TOF_ICC_PBQ_DESC_SIZE_BITS 3
#define TOF_ICC_PBQ_DESC_SIZE (1 << TOF_ICC_PBQ_DESC_SIZE_BITS)
#define TOF_ICC_PRQ_DESC_SIZE_BITS 3
#define TOF_ICC_PRQ_DESC_SIZE (1 << TOF_ICC_PRQ_DESC_SIZE_BITS)
#define TOF_ICC_PRQ_NLINE_BITS (TOF_ICC_CACHE_LINE_SIZE_BITS - TOF_ICC_PRQ_DESC_SIZE_BITS)
#define TOF_ICC_TOQ_SIZE_NTYPES 6
#define TOF_ICC_TOQ_SIZE_BITS(size) ((size) * 2 + 11)
#define TOF_ICC_TOQ_SIZE(size) (1 << TOF_ICC_TOQ_SIZE_BITS(size))
#define TOF_ICC_TOQ_LEN(size) (TOF_ICC_TOQ_SIZE(size) * TOF_ICC_TOQ_DESC_SIZE)
#define TOF_ICC_TCQ_LEN(size) (TOF_ICC_TOQ_SIZE(size) * TOF_ICC_TCQ_DESC_SIZE)
#define TOF_ICC_MRQ_SIZE_NTYPES 6
#define TOF_ICC_MRQ_SIZE_BITS(size) ((size) * 2 + 11)
#define TOF_ICC_MRQ_SIZE(size) (1 << TOF_ICC_MRQ_SIZE_BITS(size))
#define TOF_ICC_MRQ_LEN(size) (TOF_ICC_MRQ_SIZE(size) * TOF_ICC_MRQ_DESC_SIZE)
#define TOF_ICC_PBQ_SIZE_NTYPES 6
#define TOF_ICC_PBQ_SIZE_BITS(size) ((size) * 2 + 11)
#define TOF_ICC_PBQ_SIZE(size) (1 << TOF_ICC_PBQ_SIZE_BITS(size))
#define TOF_ICC_PBQ_LEN(size) (TOF_ICC_PBQ_SIZE(size) * TOF_ICC_PBQ_DESC_SIZE)
#define TOF_ICC_PRQ_SIZE_NTYPES 6
#define TOF_ICC_PRQ_SIZE_BITS(size) ((size) * 2 + 11)
#define TOF_ICC_PRQ_SIZE(size) (1 << TOF_ICC_PRQ_SIZE_BITS(size))
#define TOF_ICC_PRQ_LEN(size) (TOF_ICC_PRQ_SIZE(size) * TOF_ICC_PRQ_DESC_SIZE)
#define TOF_ICC_STEERING_TABLE_ALIGN_BITS 8
#define TOF_ICC_STEERING_TABLE_ALIGN (1 << TOF_ICC_STEERING_TABLE_ALIGN_BITS)
#define TOF_ICC_STEERING_SIZE_BITS 4
#define TOF_ICC_STEERING_SIZE (1 << TOF_ICC_STEERING_SIZE_BITS)
#define TOF_ICC_MB_TABLE_ALIGN_BITS 8
#define TOF_ICC_MB_TABLE_ALIGN (1 << TOF_ICC_MB_TABLE_ALIGN_BITS)
#define TOF_ICC_MB_SIZE_BITS 4
#define TOF_ICC_MB_SIZE (1 << TOF_ICC_MB_SIZE_BITS)
#define TOF_ICC_MB_PS_ENCODE(bits) ((bits) % 9 == 3 ? (bits) / 9 - 1 : (bits) / 13 + 3)
#define TOF_ICC_MBPT_ALIGN_BITS 8
#define TOF_ICC_MBPT_ALIGN (1 << TOF_ICC_MBPT_ALIGN_BITS)
#define TOF_ICC_MBPT_SIZE_BITS 3
#define TOF_ICC_MBPT_SIZE (1 << TOF_ICC_MBPT_SIZE_BITS)
#define TOF_ICC_X_BITS 5
#define TOF_ICC_Y_BITS 5
#define TOF_ICC_Z_BITS 5
#define TOF_ICC_A_BITS 1
#define TOF_ICC_B_BITS 2
#define TOF_ICC_C_BITS 1
#define TOF_ICC_MAX_X_SIZE (1 << TOF_ICC_X_BITS)
#define TOF_ICC_MAX_Y_SIZE (1 << TOF_ICC_Y_BITS)
#define TOF_ICC_MAX_Z_SIZE (1 << TOF_ICC_Z_BITS)
#define TOF_ICC_A_SIZE 2
#define TOF_ICC_B_SIZE 3
#define TOF_ICC_C_SIZE 2
#define TOF_ICC_X_MASK ((1 << TOF_ICC_X_BITS) - 1)
#define TOF_ICC_Y_MASK ((1 << TOF_ICC_Y_BITS) - 1)
#define TOF_ICC_Z_MASK ((1 << TOF_ICC_Z_BITS) - 1)
#define TOF_ICC_A_MASK ((1 << TOF_ICC_A_BITS) - 1)
#define TOF_ICC_B_MASK ((1 << TOF_ICC_B_BITS) - 1)
#define TOF_ICC_C_MASK ((1 << TOF_ICC_C_BITS) - 1)
#define TOF_ICC_ABC_SIZE (TOF_ICC_A_SIZE * TOF_ICC_B_SIZE * TOF_ICC_C_SIZE)
static inline int tof_icc_get_framelen(int len){
len = TOF_ICC_RH_LEN + round_up(len + TOF_ICC_ECRC_LEN, TOF_ICC_FRAME_ALIGN);
if(len < TOF_ICC_FRAME_LEN_MIN){
len = TOF_ICC_FRAME_LEN_MIN;
}
return len;
}
/** Descriptors **/
/** commands and rcodes **/
enum {
TOF_ICC_TOQ_NOP,
TOF_ICC_TOQ_PUT,
TOF_ICC_TOQ_WRITE_PIGGYBACK_BUFFER,
TOF_ICC_TOQ_PUT_PIGGYBACK,
TOF_ICC_TOQ_GET,
TOF_ICC_TOQ_GETL,
TOF_ICC_TOQ_ATOMIC_READ_MODIFY_WRITE = 0xe,
TOF_ICC_TOQ_TRANSMIT_RAW_PACKET1 = 0x10,
TOF_ICC_TOQ_TRANSMIT_RAW_PACKET2,
TOF_ICC_TOQ_TRANSMIT_SYSTEM_PACKET1,
TOF_ICC_TOQ_TRANSMIT_SYSTEM_PACKET2,
TOF_ICC_TOQ_NCOMMANDS,
};
enum {
TOF_ICC_MRQ_ATOMIC_READ_MODIFY_WRITE_HALFWAY_NOTICE = 0x1,
TOF_ICC_MRQ_ATOMIC_READ_MODIFY_WRITE_NOTICE,
TOF_ICC_MRQ_ATOMIC_READ_MODIFY_WRITE_REMOTE_ERROR,
TOF_ICC_MRQ_PUT_HALFWAY_NOTICE,
TOF_ICC_MRQ_PUT_LAST_HALFWAY_NOTICE,
TOF_ICC_MRQ_GET_HALFWAY_NOTICE,
TOF_ICC_MRQ_GET_LAST_HALFWAY_NOTICE,
TOF_ICC_MRQ_PUT_NOTICE,
TOF_ICC_MRQ_PUT_LAST_NOTICE,
TOF_ICC_MRQ_GET_NOTICE,
TOF_ICC_MRQ_GET_LAST_NOTICE,
TOF_ICC_MRQ_PUT_REMOTE_ERROR,
TOF_ICC_MRQ_PUT_LAST_REMOTE_ERROR,
TOF_ICC_MRQ_GET_REMOTE_ERROR,
TOF_ICC_MRQ_GET_LAST_REMOTE_ERROR,
TOF_ICC_MRQ_NCOMMANDS,
};
enum {
TOF_ICC_PRQ_UNKNOWN_TLP,
TOF_ICC_PRQ_SYSTEM_TLP,
TOF_ICC_PRQ_ADDRESS_RANGE_EXCEPTION = 0x6,
TOF_ICC_PRQ_CQ_EXCEPTION = 0x8,
TOF_ICC_PRQ_ILLEGAL_TLP_FLAGS,
TOF_ICC_PRQ_ILLEGAL_TLP_LENGTH,
TOF_ICC_PRQ_CQ_ERROR = 0xc,
};
/** structures **/
struct tof_icc_steering_entry {
uint64_t res1:6;
uint64_t readonly:1;
uint64_t enable:1;
uint64_t mbva:32;
uint64_t res2:8;
uint64_t mbid:16;
uint64_t length; /* for optimization */
};
struct tof_icc_mb_entry {
uint64_t ps:3;
uint64_t res1:4;
uint64_t enable:1;
uint64_t ipa:32;
uint64_t res2:24;
uint64_t npage; /* for optimization */
};
struct tof_icc_mbpt_entry {
uint64_t res1:7;
uint64_t enable:1;
uint64_t res2:4;
uint64_t ipa:28;
uint64_t res3:24;
};
struct tof_icc_cq_stag_offset {
uint64_t offset:40;
uint64_t stag:18;
uint64_t cqid:6;
};
struct tof_icc_toq_common_header1 {
uint8_t interrupt:1;
uint8_t res1:4;
uint8_t source_type:2;
uint8_t flip:1;
uint8_t command;
union {
uint8_t mtu;
struct {
uint8_t res:4;
uint8_t op:4;
} armw;
} mtuop;
uint8_t sps:4;
uint8_t pa:1;
uint8_t pb:2;
uint8_t pc:1;
uint8_t rx;
uint8_t ry;
uint8_t rz;
uint8_t ra:1;
uint8_t rb:2;
uint8_t rc:1;
uint8_t res3:1;
uint8_t ri:3;
};
struct tof_icc_toq_common_header2 {
uint8_t gap;
uint8_t s:1;
uint8_t r:1;
uint8_t q:1;
uint8_t p:1;
uint8_t res1:1;
uint8_t j:1;
uint8_t res2:2;
uint16_t edata;
union{
struct {
uint32_t length:24;
uint32_t res:8;
} normal;
struct {
uint32_t length:6;
uint32_t res:26;
} piggyback;
} len;
};
struct tof_icc_toq_descriptor {
struct tof_icc_toq_common_header1 head1;
uint64_t res[3];
};
struct tof_icc_toq_nop {
struct tof_icc_toq_common_header1 head1;
uint64_t res[3];
};
struct tof_icc_toq_put {
struct tof_icc_toq_common_header1 head1;
struct tof_icc_toq_common_header2 head2;
struct tof_icc_cq_stag_offset remote;
struct tof_icc_cq_stag_offset local;
};
struct tof_icc_toq_write_piggyback_buffer {
struct tof_icc_toq_common_header1 head1;
uint64_t data[3];
};
struct tof_icc_toq_put_piggyback {
struct tof_icc_toq_common_header1 head1;
struct tof_icc_toq_common_header2 head2;
struct tof_icc_cq_stag_offset remote;
uint64_t data;
};
struct tof_icc_toq_get {
struct tof_icc_toq_common_header1 head1;
struct tof_icc_toq_common_header2 head2;
struct tof_icc_cq_stag_offset remote;
struct tof_icc_cq_stag_offset local;
};
struct tof_icc_toq_atomic_read_modify_write {
struct tof_icc_toq_common_header1 head1;
struct tof_icc_toq_common_header2 head2;
struct tof_icc_cq_stag_offset remote;
uint64_t data;
};
struct tof_icc_toq_transmit_raw_packet1 {
struct tof_icc_toq_common_header1 head1;
uint8_t gap;
uint8_t res4[3];
uint32_t length:12;
uint32_t res5:20;
uint64_t res6;
uint64_t pa:48; /* for optimization */
uint64_t res7:16;
};
struct tof_icc_toq_transmit_raw_packet2 {
uint8_t interrupt:1;
uint8_t res1:4;
uint8_t source_type:2;
uint8_t flip:1;
uint8_t command;
uint8_t res2:7;
uint8_t e:1;
uint8_t res3[4];
uint8_t port:5;
uint8_t res4:1;
uint8_t vc:2;
uint8_t gap;
uint8_t res5[3];
uint32_t length:12;
uint32_t res6:20;
uint64_t res7;
uint64_t pa:48; /* for optimization */
uint64_t res8:16;
};
struct tof_icc_toq_transmit_system_packet {
struct tof_icc_toq_common_header1 head1; /* rx, ry, rz should be rdx, rdy, rdz */
uint8_t gap;
uint8_t res4[3];
uint32_t length:12;
uint32_t res5:20;
uint64_t res6;
uint64_t pa:48; /* for optimization */
uint64_t res7:16;
};
struct tof_icc_tcq_descriptor {
uint8_t res1:5;
uint8_t counter_unmatch:1;
uint8_t res2:1;
uint8_t flip:1;
uint8_t rcode;
uint8_t res3[2];
union{
struct {
uint32_t length:24;
uint32_t res:8;
} normal;
struct {
uint32_t length:6;
uint32_t res:26;
} piggyback;
} len;
};
struct tof_icc_mrq_common_header1 {
uint8_t res1:7;
uint8_t flip:1;
uint8_t id;
uint8_t rcode;
uint8_t res2:4;
uint8_t pa:1;
uint8_t pb:2;
uint8_t pc:1;
uint8_t x;
uint8_t y;
uint8_t z;
uint8_t a:1;
uint8_t b:2;
uint8_t c:1;
uint8_t res3:1;
uint8_t i:3;
};
struct tof_icc_mrq_common_header2 {
uint8_t res1;
uint8_t res2:4;
uint8_t initial:1;
uint8_t res3:3;
uint16_t edata;
union {
struct {
uint32_t length:11;
uint32_t res:21;
} normal;
struct {
uint32_t op:4;
uint32_t res:28;
} armw;
} lenop;
};
struct tof_icc_mrq_atomic_read_modify_write_halfway_notice {
struct tof_icc_mrq_common_header1 head1;
struct tof_icc_mrq_common_header2 head2;
struct tof_icc_cq_stag_offset local;
struct tof_icc_cq_stag_offset remote;
};
struct tof_icc_mrq_descriptor {
struct tof_icc_mrq_common_header1 head1;
struct tof_icc_mrq_common_header2 head2;
struct tof_icc_cq_stag_offset cso1;
struct tof_icc_cq_stag_offset cso2;
};
struct tof_icc_pbq_descriptor {
uint64_t res1:7;
uint64_t f:1;
uint64_t res2:3;
uint64_t pa:29;
uint64_t res3:24;
};
struct tof_icc_prq_descriptor {
uint64_t rcode:7;
uint64_t f:1;
uint64_t res1:3;
uint64_t pa:29;
uint64_t res2:8;
uint64_t w:1;
uint64_t res3:5;
uint64_t l:1;
uint64_t e:1;
uint64_t res4:8;
};
/** Registers **/
/* useful packed structures */
struct tof_icc_reg_subnet {
uint64_t lz:6;
uint64_t sz:6;
uint64_t nz:6;
uint64_t ly:6;
uint64_t sy:6;
uint64_t ny:6;
uint64_t lx:6;
uint64_t sx:6;
uint64_t nx:6;
uint64_t res:10;
};
struct tof_icc_reg_bg_address {
uint32_t bgid:6;
uint32_t tni:3;
uint32_t c:1;
uint32_t b:2;
uint32_t a:1;
uint32_t z:5;
uint32_t y:5;
uint32_t x:5;
uint32_t pc:1;
uint32_t pb:2;
uint32_t pa:1;
};
/* relative offset of interrupt controller registers */
#define TOF_ICC_IRQREG_IRR 0x0
#define TOF_ICC_IRQREG_IMR 0x8
#define TOF_ICC_IRQREG_IRC 0x10
#define TOF_ICC_IRQREG_IMC 0x18
#define TOF_ICC_IRQREG_ICL 0x20
/* TOFU REGISTERS */
#define tof_icc_reg_pa 0x40000000
/* CQ */
#define TOF_ICC_REG_CQ_PA(tni, cqid) (tof_icc_reg_pa + 0 + (tni) * 0x1000000 + (cqid) * 0x10000)
#define TOF_ICC_REG_CQ_TOQ_DIRECT_DESCRIPTOR 0x0
#define TOF_ICC_REG_CQ_TOQ_FETCH_START 0x40
#define TOF_ICC_REG_CQ_MRQ_FULL_POINTER 0x48
#define TOF_ICC_REG_CQ_TOQ_PIGGYBACK_BUFFER0 0x50
#define TOF_ICC_REG_CQ_TOQ_PIGGYBACK_BUFFER1 0x58
#define TOF_ICC_REG_CQ_TOQ_PIGGYBACK_BUFFER2 0x60
#define TOF_ICC_REG_CQ_TCQ_NUM_NOTICE 0x68
#define TOF_ICC_REG_CQ_MRQ_NUM_NOTICE 0x70
#define TOF_ICC_REG_CQ_TX_PAYLOAD_BYTE 0x78
#define TOF_ICC_REG_CQ_RX_PAYLOAD_BYTE 0x80
#define TOF_ICC_REG_CQ_DUMP_START 0x0
#define TOF_ICC_REG_CQ_DUMP_END 0x88
/* BCH */
#define TOF_ICC_REG_BCH_PA(tni, bgid) (tof_icc_reg_pa + 0x0000e00000 + (tni) * 0x1000000 + (bgid) * 0x10000)
#define TOF_ICC_REG_BCH_IDATA 0x800
#define TOF_ICC_REG_BCH_READY 0x840
#define TOF_ICC_REG_BCH_READY_STATE BIT(63)
#define TOF_ICC_REG_BCH_IGNORED_SIGNAL_COUNT 0x848
#define TOF_ICC_REG_BCH_DUMP_START 0x800
#define TOF_ICC_REG_BCH_DUMP_END 0x850
/* CQS */
#define TOF_ICC_REG_CQS_PA(tni, cqid) (tof_icc_reg_pa + 0x0000400000 + (tni) * 0x1000000 + (cqid) * 0x10000)
#define TOF_ICC_REG_CQS_STATUS 0x0
#define TOF_ICC_REG_CQS_STATUS_DESCRIPTOR_PROCESS_STOP BIT(63)
#define TOF_ICC_REG_CQS_STATUS_DESCRIPTOR_FETCH_STOP BIT(62)
#define TOF_ICC_REG_CQS_STATUS_BLANK_ENTRY_FLIP_BIT BIT(61)
#define TOF_ICC_REG_CQS_STATUS_CACHE_FLUSH_BUSY BIT(60)
#define TOF_ICC_REG_CQS_STATUS_CQ_ENABLE BIT(59)
#define TOF_ICC_REG_CQS_STATUS_SESSION_DEAD BIT(58)
#define TOF_ICC_REG_CQS_STATUS_SESSION_OFFSET_OVERFLOW BIT(57)
#define TOF_ICC_REG_CQS_STATUS_SESSION_OFFSET GENMASK(56, 32)
#define TOF_ICC_REG_CQS_STATUS_NEXT_DESCRIPTOR_OFFSET GENMASK(29, 5)
#define TOF_ICC_REG_CQS_ENABLE 0x8
#define TOF_ICC_REG_CQS_CACHE_FLUSH 0x10
#define TOF_ICC_REG_CQS_FETCH_STOP 0x18
#define TOF_ICC_REG_CQS_MODE 0x20
#define TOF_ICC_REG_CQS_MODE_SYSTEM BIT(63)
#define TOF_ICC_REG_CQS_MODE_TRP2_ENABLE BIT(62)
#define TOF_ICC_REG_CQS_MODE_TRP1_ENABLE BIT(61)
#define TOF_ICC_REG_CQS_MODE_SESSION BIT(60)
#define TOF_ICC_REG_CQS_MODE_SUBNET_NX GENMASK(53, 48)
#define TOF_ICC_REG_CQS_MODE_SUBNET_SX GENMASK(47, 42)
#define TOF_ICC_REG_CQS_MODE_SUBNET_LX GENMASK(41, 36)
#define TOF_ICC_REG_CQS_MODE_SUBNET_NY GENMASK(35, 30)
#define TOF_ICC_REG_CQS_MODE_SUBNET_SY GENMASK(29, 24)
#define TOF_ICC_REG_CQS_MODE_SUBNET_LY GENMASK(23, 18)
#define TOF_ICC_REG_CQS_MODE_SUBNET_NZ GENMASK(17, 12)
#define TOF_ICC_REG_CQS_MODE_SUBNET_SZ GENMASK(11, 6)
#define TOF_ICC_REG_CQS_MODE_SUBNET_LZ GENMASK(5, 0)
#define TOF_ICC_REG_CQS_GPID 0x28
#define TOF_ICC_REG_CQS_TOQ_IPA 0x30
#define TOF_ICC_REG_CQS_TOQ_SIZE 0x38
#define TOF_ICC_REG_CQS_TCQ_IPA 0x40
#define TOF_ICC_REG_CQS_TCQ_IPA_CACHE_INJECTION BIT(63)
#define TOF_ICC_REG_CQS_MRQ_IPA 0x48
#define TOF_ICC_REG_CQS_MRQ_IPA_CACHE_INJECTION BIT(63)
#define TOF_ICC_REG_CQS_MRQ_SIZE 0x50
#define TOF_ICC_REG_CQS_MRQ_MASK 0x58
#define TOF_ICC_REG_CQS_TCQ_DESCRIPTOR_COALESCING_TIMER 0x60
#define TOF_ICC_REG_CQS_MRQ_DESCRIPTOR_COALESCING_TIMER 0x68
#define TOF_ICC_REG_CQS_MRQ_INTERRUPT_COALESCING_TIMER 0x70
#define TOF_ICC_REG_CQS_MRQ_INTERRUPT_COALESCING_COUNT 0x78
#define TOF_ICC_REG_CQS_TOQ_DIRECT_SOURCE_COUNT 0x80
#define TOF_ICC_REG_CQS_TOQ_DIRECT_DESCRIPTOR_COUNT 0x88
#define TOF_ICC_REG_CQS_MEMORY_BLOCK_TABLE_ENABLE 0x90
#define TOF_ICC_REG_CQS_MEMORY_BLOCK_TABLE_IPA 0x98
#define TOF_ICC_REG_CQS_MEMORY_BLOCK_TABLE_SIZE 0xa0
#define TOF_ICC_REG_CQS_STEERING_TABLE_ENABLE 0xa8
#define TOF_ICC_REG_CQS_STEERING_TABLE_IPA 0xb0
#define TOF_ICC_REG_CQS_STEERING_TABLE_SIZE 0xb8
#define TOF_ICC_REG_CQS_MRQ_INTERRUPT_MASK 0xc0
#define TOF_ICC_REG_CQS_IRR 0xc8
#define TOF_ICC_REG_CQS_IMR 0xd0
#define TOF_ICC_REG_CQS_IRC 0xd8
#define TOF_ICC_REG_CQS_IMC 0xe0
#define TOF_ICC_REG_CQS_ICL 0xe8
#define TOF_ICC_REG_CQS_DUMP_START 0x0
#define TOF_ICC_REG_CQS_DUMP_END 0xf0
/* BGS */
#define TOF_ICC_REG_BGS_PA(tni, bgid) (tof_icc_reg_pa + 0x0000800000 + (tni) * 0x1000000 + (bgid) * 0x10000)
#define TOF_ICC_REG_BGS_ENABLE 0x0
#define TOF_ICC_REG_BGS_IRR 0x8
#define TOF_ICC_REG_BGS_IMR 0x10
#define TOF_ICC_REG_BGS_IRC 0x18
#define TOF_ICC_REG_BGS_IMC 0x20
#define TOF_ICC_REG_BGS_ICL 0x28
#define TOF_ICC_REG_BGS_STATE 0x30
#define TOF_ICC_REG_BGS_STATE_ENABLE BIT(0)
#define TOF_ICC_REG_BGS_EXCEPTION_INFO_GPID_UNMATCH 0x38
#define TOF_ICC_REG_BGS_EXCEPTION_INFO_GPID_UNMATCH_BG_ADDRESS GENMASK(27, 0)
#define TOF_ICC_REG_BGS_EXCEPTION_INFO_ADDRESS_UNMATCH 0x40
#define TOF_ICC_REG_BGS_EXCEPTION_INFO_ADDRESS_UNMATCH_BG_ADDRESS GENMASK(27, 0)
#define TOF_ICC_REG_BGS_SIGNAL_A 0x48
#define TOF_ICC_REG_BGS_SIGNAL_A_SIG_RECV BIT(63)
#define TOF_ICC_REG_BGS_SIGNAL_A_TLP_RECV BIT(62)
#define TOF_ICC_REG_BGS_SIGNAL_A_SIG_SEND BIT(61)
#define TOF_ICC_REG_BGS_SIGNAL_A_OP_TYPE GENMASK(3, 0)
#define TOF_ICC_REG_BGS_SIGNAL_B 0x50
#define TOF_ICC_REG_BGS_SIGNAL_B_SIG_RECV BIT(63)
#define TOF_ICC_REG_BGS_SIGNAL_B_TLP_RECV BIT(62)
#define TOF_ICC_REG_BGS_SIGNAL_B_SIG_SEND BIT(61)
#define TOF_ICC_REG_BGS_SIGNAL_B_OP_TYPE GENMASK(3, 0)
#define TOF_ICC_REG_BGS_SIGNAL_MASK 0x58
#define TOF_ICC_REG_BGS_SIGNAL_MASK_SIG_RECV BIT(63)
#define TOF_ICC_REG_BGS_SIGNAL_MASK_TLP_RECV BIT(62)
#define TOF_ICC_REG_BGS_SIGNAL_MASK_SIG_SEND BIT(61)
#define TOF_ICC_REG_BGS_SIGNAL_MASK_TLP_SEND BIT(60)
#define TOF_ICC_REG_BGS_LOCAL_LINK 0x60
#define TOF_ICC_REG_BGS_LOCAL_LINK_BGID_RECV GENMASK(37, 32)
#define TOF_ICC_REG_BGS_LOCAL_LINK_BGID_SEND GENMASK(5, 0)
#define TOF_ICC_REG_BGS_REMOTE_LINK 0x68
#define TOF_ICC_REG_BGS_REMOTE_LINK_BG_ADDRESS_RECV GENMASK(59, 32)
#define TOF_ICC_REG_BGS_REMOTE_LINK_BG_ADDRESS_SEND GENMASK(31, 0)
#define TOF_ICC_REG_BGS_SUBNET_SIZE 0x70
#define TOF_ICC_REG_BGS_GPID_BSEQ 0x78
#define TOF_ICC_REG_BGS_DATA_A0 0x108
#define TOF_ICC_REG_BGS_DATA_AE 0x178
#define TOF_ICC_REG_BGS_DATA_B0 0x188
#define TOF_ICC_REG_BGS_DATA_BE 0x1f8
#define TOF_ICC_REG_BGS_BCH_MASK 0x800
#define TOF_ICC_REG_BGS_BCH_MASK_MASK BIT(63)
#define TOF_ICC_REG_BGS_BCH_MASK_STATUS 0x808
#define TOF_ICC_REG_BGS_BCH_MASK_STATUS_RUN BIT(63)
#define TOF_ICC_REG_BGS_BCH_NOTICE_IPA 0x810
#define TOF_ICC_REG_BGS_DUMP_START 0x0
#define TOF_ICC_REG_BGS_DUMP_END 0x818
/* TNI */
#define TOF_ICC_REG_TNI_PA(tni) (tof_icc_reg_pa + 0x0000c00000 + (tni) * 0x1000000)
#define TOF_ICC_REG_TNI_IRR 0x8
#define TOF_ICC_REG_TNI_IMR 0x10
#define TOF_ICC_REG_TNI_IRC 0x18
#define TOF_ICC_REG_TNI_IMC 0x20
#define TOF_ICC_REG_TNI_ICL 0x28
#define TOF_ICC_REG_TNI_STATE 0x30
#define TOF_ICC_REG_TNI_STATE_MASK GENMASK(1, 0)
#define TOF_ICC_REG_TNI_STATE_DISABLE 0
#define TOF_ICC_REG_TNI_STATE_NORMAL 2
#define TOF_ICC_REG_TNI_STATE_ERROR 3
#define TOF_ICC_REG_TNI_ENABLE 0x38
#define TOF_ICC_REG_TNI_CQ_PRESENT 0x40
#define TOF_ICC_REG_TNI_EXCEPTION_INFO_INACTIVE_BG 0x48
#define TOF_ICC_REG_TNI_EXCEPTION_INFO_INACTIVE_BG_DEST_BG GENMASK(37, 32)
#define TOF_ICC_REG_TNI_EXCEPTION_INFO_INACTIVE_BG_SOURCE_BG_ADDRESS GENMASK(27, 0)
#define TOF_ICC_REG_TNI_PRQ_FULL_POINTER 0x100
#define TOF_ICC_REG_TNI_PBQ_PA 0x108
#define TOF_ICC_REG_TNI_PBQ_SIZE 0x110
#define TOF_ICC_REG_TNI_PRQ_PA 0x118
#define TOF_ICC_REG_TNI_PRQ_PA_CACHE_INJECTION BIT(63)
#define TOF_ICC_REG_TNI_PRQ_SIZE 0x120
#define TOF_ICC_REG_TNI_PRQ_MASK 0x128
#define TOF_ICC_REG_TNI_PRQ_ENTRY_COALESCING_TIMER 0x130
#define TOF_ICC_REG_TNI_PRQ_INTERRUPT_COALESCING_TIMER 0x138
#define TOF_ICC_REG_TNI_PRQ_INTERRUPT_COALESCING_COUNT 0x140
#define TOF_ICC_REG_TNI_SEND_COUNT 0x148
#define TOF_ICC_REG_TNI_NO_SEND_COUNT 0x150
#define TOF_ICC_REG_TNI_BLOCK_SEND_COUNT 0x158
#define TOF_ICC_REG_TNI_RECEIVE_COUNT 0x160
#define TOF_ICC_REG_TNI_NO_RECEIVE_COUNT 0x168
#define TOF_ICC_REG_TNI_NUM_SEND_TLP 0x170
#define TOF_ICC_REG_TNI_BYTE_SEND_TLP 0x178
#define TOF_ICC_REG_TNI_NUM_SEND_SYSTEM_TLP 0x180
#define TOF_ICC_REG_TNI_NUM_RECEIVE_TLP 0x188
#define TOF_ICC_REG_TNI_BYTE_RECEIVE_TLP 0x190
#define TOF_ICC_REG_TNI_NUM_RECEIVE_NULLIFIED_TLP 0x198
#define TOF_ICC_REG_TNI_RX_NUM_UNKNOWN_TLP 0x1a0
#define TOF_ICC_REG_TNI_RX_NUM_SYSTEM_TLP 0x1a8
#define TOF_ICC_REG_TNI_RX_NUM_EXCEPTION_TLP 0x1b0
#define TOF_ICC_REG_TNI_RX_NUM_DISCARD_UNKNOWN_TLP 0x1b8
#define TOF_ICC_REG_TNI_RX_NUM_DISCARD_SYSTEM_TLP 0x1c0
#define TOF_ICC_REG_TNI_RX_NUM_DISCARD_EXCEPTION_TLP 0x1c8
#define TOF_ICC_REG_TNI_DUMP_START 0x8
#define TOF_ICC_REG_TNI_DUMP_END 0x1d0
/* Port */
#define TOF_ICC_REG_PORT_PA(port) (tof_icc_reg_pa + 0x0006000000 + (port) * 0x1000)
#define TOF_ICC_REG_PORT_TX_VC0_ZERO_CREDIT_COUNT 0x0
#define TOF_ICC_REG_PORT_TX_VC1_ZERO_CREDIT_COUNT 0x8
#define TOF_ICC_REG_PORT_TX_VC2_ZERO_CREDIT_COUNT 0x10
#define TOF_ICC_REG_PORT_TX_VC3_ZERO_CREDIT_COUNT 0x18
#define TOF_ICC_REG_PORT_FREE_RUN_COUNT 0x80
#define TOF_ICC_REG_PORT_NUM_SEND_DLLP 0xc0
#define TOF_ICC_REG_PORT_NUM_SEND_TLP 0xc8
#define TOF_ICC_REG_PORT_BYTE_SEND_TLP 0xd0
#define TOF_ICC_REG_PORT_NUM_SEND_SYSTEM_TLP 0xd8
#define TOF_ICC_REG_PORT_NUM_SEND_NULLIFIED_TLP 0xe0
#define TOF_ICC_REG_PORT_NUM_TX_DISCARD_SYSTEM_TLP 0xe8
#define TOF_ICC_REG_PORT_NUM_TX_DISCARD_NORMAL_TLP 0xf0
#define TOF_ICC_REG_PORT_NUM_TX_FILTERED_NORMAL_TLP 0xf8
#define TOF_ICC_REG_PORT_NUM_VIRTUAL_CUT_THROUGH_TLP 0x100
#define TOF_ICC_REG_PORT_NUM_GENERATE_NULLIFIED_TLP 0x108
#define TOF_ICC_REG_PORT_NUM_RECEIVE_DLLP 0x110
#define TOF_ICC_REG_PORT_NUM_RECEIVE_TLP 0x118
#define TOF_ICC_REG_PORT_BYTE_RECEIVE_TLP 0x120
#define TOF_ICC_REG_PORT_NUM_RECEIVE_SYSTEM_TLP 0x128
#define TOF_ICC_REG_PORT_NUM_RECEIVE_NULLIFIED_TLP 0x130
#define TOF_ICC_REG_PORT_NUM_RX_DISCARD_SYSTEM_TLP 0x138
#define TOF_ICC_REG_PORT_NUM_RX_DISCARD_NORMAL_TLP 0x140
#define TOF_ICC_REG_PORT_NUM_RX_FILTERED_NORMAL_TLP 0x158
#define TOF_ICC_REG_PORT_NUM_RX_DISCARD_NULLIFIED_TLP 0x160
#define TOF_ICC_REG_PORT_FRAME_LCRC_ERROR_COUNT 0x170
#define TOF_ICC_REG_PORT_TX_RETRY_BUFFER_CE_COUNT 0x180
#define TOF_ICC_REG_PORT_RX_VC_BUFFER_CE_COUNT 0x188
#define TOF_ICC_REG_PORT_XB_CE_COUNT 0x190
#define TOF_ICC_REG_PORT_ACK_NACK_TIME_OUT_COUNT 0x198
#define TOF_ICC_REG_PORT_SLICE0_FCS_ERROR_COUNT 0x1a0
#define TOF_ICC_REG_PORT_SLICE1_FCS_ERROR_COUNT 0x1a8
#define TOF_ICC_REG_PORT_DUMP_START 0x0
#define TOF_ICC_REG_PORT_DUMP_END 0x1b0
/* XB */
#define TOF_ICC_REG_XB_PA (tof_icc_reg_pa + 0x000600f000)
#define TOF_ICC_REG_XB_STQ_ENABLE 0x0
#define TOF_ICC_REG_XB_STQ_UPDATE_INTERVAL 0x8
#define TOF_ICC_REG_XB_STQ_PA 0x10
#define TOF_ICC_REG_XB_STQ_SIZE 0x18
#define TOF_ICC_REG_XB_STQ_NEXT_OFFSET 0x20
#define TOF_ICC_REG_XB_DUMP_START 0x0
#define TOF_ICC_REG_XB_DUMP_END 0x28
#define TOF_ICC_XB_TC_DATA_CYCLE_COUNT(tni) ((tni) * 0x10 + 0x0)
#define TOF_ICC_XB_TC_WAIT_CYCLE_COUNT(tni) ((tni) * 0x10 + 0x8)
#define TOF_ICC_XB_TD_DATA_CYCLE_COUNT(tnr) ((tnr) * 0x10 + 0x60)
#define TOF_ICC_XB_TD_WAIT_CYCLE_COUNT(tnr) ((tnr) * 0x10 + 0x68)
/* Tofu */
#define TOF_ICC_REG_TOFU_PA (tof_icc_reg_pa + 0x0007000000)
#define TOF_ICC_REG_TOFU_NODE_ADDRESS 0x0
#define TOF_ICC_REG_TOFU_NODE_ADDRESS_X GENMASK(22, 18)
#define TOF_ICC_REG_TOFU_NODE_ADDRESS_Y GENMASK(17, 13)
#define TOF_ICC_REG_TOFU_NODE_ADDRESS_Z GENMASK(12, 8)
#define TOF_ICC_REG_TOFU_NODE_ADDRESS_A BIT(7)
#define TOF_ICC_REG_TOFU_NODE_ADDRESS_B GENMASK(6, 5)
#define TOF_ICC_REG_TOFU_NODE_ADDRESS_C BIT(4)
#define TOF_ICC_REG_TOFU_PORT_SETTING 0x8
#define TOF_ICC_REG_TOFU_TD_TLP_FILTER(tnr) ((tnr) * 0x10 + 0x10)
#define TOF_ICC_REG_TOFU_TD_SETTINGS(tnr) ((tnr) * 0x10 + 0x18)
#define TOF_ICC_REG_TOFU_TNR_MSI_BASE 0xc0
#define TOF_ICC_REG_TOFU_TNR_IRR 0xc8
#define TOF_ICC_REG_TOFU_TNR_IMR 0xd0
#define TOF_ICC_REG_TOFU_TNR_IRC 0xd8
#define TOF_ICC_REG_TOFU_TNR_IMC 0xe0
#define TOF_ICC_REG_TOFU_TNR_ICL 0xe8
#define TOF_ICC_REG_TOFU_TNI_VMS(tni, vmsid) ((tni) * 0x100 + (vmsid) * 0x8 + 0x100)
#define TOF_ICC_REG_TOFU_TNI_VMS_CQ00(tni) ((tni) * 0x100 + 0x180)
#define TOF_ICC_REG_TOFU_TNI_VMS_BG00(tni) ((tni) * 0x100 + 0x1a0)
#define TOF_ICC_REG_TOFU_TNI_VMS_BG16(tni) ((tni) * 0x100 + 0x1a8)
#define TOF_ICC_REG_TOFU_TNI_VMS_BG32(tni) ((tni) * 0x100 + 0x1b0)
#define TOF_ICC_REG_TOFU_TNI_MSI_BASE(tni) ((tni) * 0x100 + 0x1c0)
#define TOF_ICC_REG_TOFU_DUMP_START 0x0
#define TOF_ICC_REG_TOFU_DUMP_END 0x6c8
/** Interrupts **/
#define TOF_ICC_IRQ_CQS_TOQ_READ_EXCEPTION BIT(0)
#define TOF_ICC_IRQ_CQS_TOQ_DIRECT_DESCRIPTOR_EXCEPTION BIT(1)
#define TOF_ICC_IRQ_CQS_TOQ_MARKED_UE BIT(2)
#define TOF_ICC_IRQ_CQS_TCQ_WRITE_EXCEPTION BIT(3)
#define TOF_ICC_IRQ_CQS_TOQ_SOURCE_TYPE_EXCEPTION BIT(4)
#define TOF_ICC_IRQ_CQS_TCQ_WRITE_ACKNOWLEDGE BIT(5)
#define TOF_ICC_IRQ_CQS_MRQ_WRITE_ACKNOWLEDGE BIT(7)
#define TOF_ICC_IRQ_CQS_MRQ_WRITE_EXCEPTION BIT(8)
#define TOF_ICC_IRQ_CQS_MRQ_OVERFLOW BIT(9)
#define TOF_ICC_IRQ_CQS_STEERING_READ_EXCEPTION BIT(36)
#define TOF_ICC_IRQ_CQS_MB_READ_EXCEPTION BIT(38)
#define TOF_ICC_IRQ_CQS_PAYLOAD_READ_EXCEPTION BIT(39)
#define TOF_ICC_IRQ_CQS_PAYLOAD_WRITE_EXCEPTION BIT(40)
/* Placeholder bit for convenience when handling the IRR value; no CQS CACHEFLUSH_TIMEOUT interrupt actually exists */
#define TOF_ICC_DUMMY_IRQ_CQS_CACHEFLUSH_TIMEOUT BIT(63)
#define TOF_ICC_IRQ_BGS_NODE_ADDRESS_UNMATCH BIT(0)
#define TOF_ICC_IRQ_BGS_BG_RECV_ADDRESS_EXCEPTION BIT(1)
#define TOF_ICC_IRQ_BGS_BG_SEND_ADDRESS_EXCEPTION BIT(2)
#define TOF_ICC_IRQ_BGS_GPID_UNMATCH BIT(3)
#define TOF_ICC_IRQ_BGS_BSEQ_UNMATCH BIT(4)
#define TOF_ICC_IRQ_BGS_SIGNAL_STATE_ERROR BIT(5)
#define TOF_ICC_IRQ_BGS_SYNCHRONIZATION_ACKNOWLEDGE BIT(24)
#define TOF_ICC_IRQ_BGS_ERROR_SYNCHRONIZATION_ACKNOWLEDGE BIT(25)
#define TOF_ICC_IRQ_BGS_DMA_COMPLETION_EXCEPTION BIT(26)
#define TOF_ICC_IRQ_TNI_PBQ_READ_EXCEPTION BIT(0)
#define TOF_ICC_IRQ_TNI_PBQ_MARKED_UE BIT(1)
#define TOF_ICC_IRQ_TNI_PBQ_UNDERFLOW BIT(2)
#define TOF_ICC_IRQ_TNI_PRQ_PACKET_DISCARD BIT(3)
#define TOF_ICC_IRQ_TNI_PRQ_WRITE_ACKNOWLEDGE BIT(4)
#define TOF_ICC_IRQ_TNI_PRQ_WRITE_EXCEPTION BIT(5)
#define TOF_ICC_IRQ_TNI_PRQ_OVERFLOW BIT(6)
#define TOF_ICC_IRQ_TNI_INACTIVE_BG BIT(16)
#define TOF_ICC_IRQ_TNI_STAGE2_TRANSLATION_FAULT BIT(32)
#define TOF_ICC_IRQ_TNR_TNR0_RX_FILTER_OUT BIT(0)
#define TOF_ICC_IRQ_TNR_TNR0_TX_FILTER_OUT BIT(1)
#define TOF_ICC_IRQ_TNR_TNR0_PORT_ERROR BIT(2)
#define TOF_ICC_IRQ_TNR_TNR0_DATELINE_ERROR BIT(3)
#define TOF_ICC_IRQ_TNR_TNR0_ROUTING_ERROR BIT(4)
#define TOF_ICC_IRQ_TNR_TNR1_RX_FILTER_OUT BIT(6)
#define TOF_ICC_IRQ_TNR_TNR1_TX_FILTER_OUT BIT(7)
#define TOF_ICC_IRQ_TNR_TNR1_PORT_ERROR BIT(8)
#define TOF_ICC_IRQ_TNR_TNR1_DATELINE_ERROR BIT(9)
#define TOF_ICC_IRQ_TNR_TNR1_ROUTING_ERROR BIT(10)
#define TOF_ICC_IRQ_TNR_TNR2_RX_FILTER_OUT BIT(12)
#define TOF_ICC_IRQ_TNR_TNR2_TX_FILTER_OUT BIT(13)
#define TOF_ICC_IRQ_TNR_TNR2_PORT_ERROR BIT(14)
#define TOF_ICC_IRQ_TNR_TNR2_DATELINE_ERROR BIT(15)
#define TOF_ICC_IRQ_TNR_TNR2_ROUTING_ERROR BIT(16)
#define TOF_ICC_IRQ_TNR_TNR3_RX_FILTER_OUT BIT(18)
#define TOF_ICC_IRQ_TNR_TNR3_TX_FILTER_OUT BIT(19)
#define TOF_ICC_IRQ_TNR_TNR3_PORT_ERROR BIT(20)
#define TOF_ICC_IRQ_TNR_TNR3_DATELINE_ERROR BIT(21)
#define TOF_ICC_IRQ_TNR_TNR3_ROUTING_ERROR BIT(22)
#define TOF_ICC_IRQ_TNR_TNR4_RX_FILTER_OUT BIT(24)
#define TOF_ICC_IRQ_TNR_TNR4_TX_FILTER_OUT BIT(25)
#define TOF_ICC_IRQ_TNR_TNR4_PORT_ERROR BIT(26)
#define TOF_ICC_IRQ_TNR_TNR4_DATELINE_ERROR BIT(27)
#define TOF_ICC_IRQ_TNR_TNR4_ROUTING_ERROR BIT(28)
#define TOF_ICC_IRQ_TNR_TNR5_RX_FILTER_OUT BIT(30)
#define TOF_ICC_IRQ_TNR_TNR5_TX_FILTER_OUT BIT(31)
#define TOF_ICC_IRQ_TNR_TNR5_PORT_ERROR BIT(32)
#define TOF_ICC_IRQ_TNR_TNR5_DATELINE_ERROR BIT(33)
#define TOF_ICC_IRQ_TNR_TNR5_ROUTING_ERROR BIT(34)
#define TOF_ICC_IRQ_TNR_TNR6_RX_FILTER_OUT BIT(36)
#define TOF_ICC_IRQ_TNR_TNR6_TX_FILTER_OUT BIT(37)
#define TOF_ICC_IRQ_TNR_TNR6_PORT_ERROR BIT(38)
#define TOF_ICC_IRQ_TNR_TNR6_DATELINE_ERROR BIT(39)
#define TOF_ICC_IRQ_TNR_TNR6_ROUTING_ERROR BIT(40)
#define TOF_ICC_IRQ_TNR_TNR7_RX_FILTER_OUT BIT(42)
#define TOF_ICC_IRQ_TNR_TNR7_TX_FILTER_OUT BIT(43)
#define TOF_ICC_IRQ_TNR_TNR7_PORT_ERROR BIT(44)
#define TOF_ICC_IRQ_TNR_TNR7_DATELINE_ERROR BIT(45)
#define TOF_ICC_IRQ_TNR_TNR7_ROUTING_ERROR BIT(46)
#define TOF_ICC_IRQ_TNR_TNR8_RX_FILTER_OUT BIT(48)
#define TOF_ICC_IRQ_TNR_TNR8_TX_FILTER_OUT BIT(49)
#define TOF_ICC_IRQ_TNR_TNR8_PORT_ERROR BIT(50)
#define TOF_ICC_IRQ_TNR_TNR8_DATELINE_ERROR BIT(51)
#define TOF_ICC_IRQ_TNR_TNR8_ROUTING_ERROR BIT(52)
#define TOF_ICC_IRQ_TNR_TNR9_RX_FILTER_OUT BIT(54)
#define TOF_ICC_IRQ_TNR_TNR9_TX_FILTER_OUT BIT(55)
#define TOF_ICC_IRQ_TNR_TNR9_PORT_ERROR BIT(56)
#define TOF_ICC_IRQ_TNR_TNR9_DATELINE_ERROR BIT(57)
#define TOF_ICC_IRQ_TNR_TNR9_ROUTING_ERROR BIT(58)
#endif
/* vim: set noet ts=8 sw=8 sts=0 tw=0 : */
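
The TOF_ICC_REG_TOFU_NODE_ADDRESS_* masks above describe how the X, Y, Z, A, B and C coordinates are packed into the node address register. The following is a minimal user-space sketch of decoding such a value. It assumes the register is read as a 64-bit word; the EX_* macros are stand-ins for the kernel's BIT()/GENMASK(), and ex_field_get(), struct ex_node_addr and ex_decode_node_address() are hypothetical names that do not appear in the header.

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's BIT()/GENMASK() (assumption: 64-bit registers). */
#define EX_BIT(n)        (UINT64_C(1) << (n))
#define EX_GENMASK(h, l) ((~UINT64_C(0) >> (63 - (h))) & (~UINT64_C(0) << (l)))

/* Same bit layout as the TOF_ICC_REG_TOFU_NODE_ADDRESS_* defines. */
#define EX_NODE_ADDRESS_X EX_GENMASK(22, 18)
#define EX_NODE_ADDRESS_Y EX_GENMASK(17, 13)
#define EX_NODE_ADDRESS_Z EX_GENMASK(12, 8)
#define EX_NODE_ADDRESS_A EX_BIT(7)
#define EX_NODE_ADDRESS_B EX_GENMASK(6, 5)
#define EX_NODE_ADDRESS_C EX_BIT(4)

/* Hypothetical container for the six decoded coordinates. */
struct ex_node_addr {
	unsigned int x, y, z, a, b, c;
};

/* Hypothetical helper: return the field selected by mask, shifted down to bit 0. */
static uint64_t ex_field_get(uint64_t mask, uint64_t reg)
{
	assert(mask != 0);
	/* (mask & -mask) isolates the lowest set bit of the mask;
	 * dividing by it shifts the masked field down to bit 0. */
	return (reg & mask) / (mask & (~mask + 1));
}

/* Hypothetical helper: split a raw node address register value into coordinates. */
static struct ex_node_addr ex_decode_node_address(uint64_t reg)
{
	struct ex_node_addr n;

	n.x = (unsigned int)ex_field_get(EX_NODE_ADDRESS_X, reg);
	n.y = (unsigned int)ex_field_get(EX_NODE_ADDRESS_Y, reg);
	n.z = (unsigned int)ex_field_get(EX_NODE_ADDRESS_Z, reg);
	n.a = (unsigned int)ex_field_get(EX_NODE_ADDRESS_A, reg);
	n.b = (unsigned int)ex_field_get(EX_NODE_ADDRESS_B, reg);
	n.c = (unsigned int)ex_field_get(EX_NODE_ADDRESS_C, reg);
	return n;
}
```

In kernel code the same extraction would more likely go through FIELD_GET() from linux/bitfield.h; the sketch avoids it only to stay compilable outside the kernel tree.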

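The TOF_ICC_IRQ_TNR_TNRn_* defines above follow a regular pattern: each TNR occupies a 6-bit slot, with the five interrupt kinds at offsets 0 to 4 within it. The sketch below computes the same bit positions from the TNR index. The stride and the count of ten TNRs are read off the defines rather than taken from hardware documentation, and all ex_*/EX_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets of each interrupt kind within one TNR slot, as observed in the defines. */
enum ex_tnr_irq_kind {
	EX_TNR_RX_FILTER_OUT  = 0,
	EX_TNR_TX_FILTER_OUT  = 1,
	EX_TNR_PORT_ERROR     = 2,
	EX_TNR_DATELINE_ERROR = 3,
	EX_TNR_ROUTING_ERROR  = 4
};

#define EX_TNR_COUNT      10	/* TNR0..TNR9 appear in the header */
#define EX_TNR_IRQ_STRIDE 6	/* assumption: 6 bits per TNR, 5 of them used */

/* Hypothetical helper: bit mask of one interrupt kind for one TNR. */
static uint64_t ex_tnr_irq_bit(unsigned int tnr, enum ex_tnr_irq_kind kind)
{
	assert(tnr < EX_TNR_COUNT);
	return UINT64_C(1) << (tnr * EX_TNR_IRQ_STRIDE + (unsigned int)kind);
}

/* Hypothetical helper: mask covering all five interrupt kinds of one TNR. */
static uint64_t ex_tnr_irq_all(unsigned int tnr)
{
	assert(tnr < EX_TNR_COUNT);
	return UINT64_C(0x1f) << (tnr * EX_TNR_IRQ_STRIDE);
}
```

For example, ex_tnr_irq_bit(5, EX_TNR_PORT_ERROR) yields the same value as TOF_ICC_IRQ_TNR_TNR5_PORT_ERROR, which is BIT(32).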
Some files were not shown because too many files have changed in this diff