docs: lift limitations and fix ppn example
Change-Id: Id78e7db09767d5dd8a3dc5b9f911b9026608b021
Committed by: Masamichi Takagi
Parent: 44261678f7
Commit: e3493bd0be

@@ -87,14 +87,14 @@ executable:
 ``<processes-per-node>`` is the number of the processes per node and
 calculated by (number of MPI processes) / (number of nodes).
 
-For example, ``<processes-per-node>`` equals to 4 (=32/8) when
+For example, ``<processes-per-node>`` equals to 4 (=8/2) when
 specifying the number of processes and nodes as follows with
-Fujitsu Technical Computing Suite.
+MPICH.
 
 .. code-block:: none
 
-#PJM --mpi "proc=32"
-#PJM -L "node=8"
+mpirun -n 8 -hosts host1,host2 ./cpi
+
 
 (Advanced) When using Utility Thread offloading Interface (UTI)
 ---------------------------------------------------------------
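The ``<processes-per-node>`` arithmetic can be sketched in Python as follows. Both helper names are hypothetical and are not part of McKernel or MPICH. The sketch assumes that each host listed after ``-hosts`` is a separate node and that every node gets an equal share of the ``-n`` processes.

```python
def processes_per_node(num_processes, num_nodes):
    """Return (number of MPI processes) / (number of nodes).

    Raises ValueError when the processes cannot be split evenly, since the
    per-node count would then not be a single integer.
    """
    if num_processes <= 0 or num_nodes <= 0:
        raise ValueError("process and node counts must be positive")
    if num_processes % num_nodes != 0:
        raise ValueError(
            f"{num_processes} processes do not divide evenly over {num_nodes} nodes"
        )
    return num_processes // num_nodes


def processes_per_node_from_mpirun(argv):
    """Hypothetical helper: derive the value from MPICH-style -n and -hosts
    options, assuming one node per listed host."""
    num_processes = int(argv[argv.index("-n") + 1])
    hosts = argv[argv.index("-hosts") + 1].split(",")
    return processes_per_node(num_processes, len(hosts))


# The MPICH example from the diff: 8 processes over host1 and host2.
print(processes_per_node_from_mpirun(
    ["mpirun", "-n", "8", "-hosts", "host1,host2", "./cpi"]))  # prints 4
```

The old Technical Computing Suite example (32 processes over 8 nodes) gives the same value, 4. The commit only changes which launcher the example uses.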
@@ -112,11 +112,11 @@ Add ``--enable-uti`` option to ``mcexec``:
 Limitations
 ===========
 
-1. Pseudo devices such as /dev/mem and /dev/zero are not mmap()ed
+#. Pseudo devices such as /dev/mem and /dev/zero are not mmap()ed
 correctly even if the mmap() returns a success. An access of their
 mapping receives the SIGSEGV signal.
 
-2. clone() supports only the following flags. All the other flags cause
+#. clone() supports only the following flags. All the other flags cause
 clone() to return error or are simply ignored.
 
 - CLONE_CHILD_CLEARTID
@@ -126,32 +126,32 @@ Limitations
 - CLONE_SIGHAND
 - CLONE_VM
 
-3. PAPI has the following restriction.
+#. PAPI has the following restriction.
 
 - Number of counters a user can use at the same time is up to the
 number of the physical counters in the processor.
 
-4. msync writes back only the modified pages mapped by the calling
+#. msync writes back only the modified pages mapped by the calling
 process.
 
-5. The following syscalls always return the ENOSYS error.
+#. The following syscalls always return the ENOSYS error.
 
 - migrate_pages()
 - move_pages()
 - set_robust_list()
 
-6. The following syscalls always return the EOPNOTSUPP error.
+#. The following syscalls always return the EOPNOTSUPP error.
 
 - arch_prctl(ARCH_SET_GS)
 - signalfd()
 
-7. signalfd4() returns a fd, but signal is not notified through the fd.
+#. signalfd4() returns a fd, but signal is not notified through the fd.
 
-8. set_rlimit sets the limit values but they are not enforced.
+#. set_rlimit sets the limit values but they are not enforced.
 
-9. Address randomization is not supported.
+#. Address randomization is not supported.
 
-10. brk() extends the heap more than requestd when -h (–extend-heap-by=)
+#. brk() extends the heap more than requestd when -h (–extend-heap-by=)
 option of mcexec is used with the value larger than 4 KiB.
 syscall_pwrite02 of LTP would fail for this reason. This is because
 the test expects that the end of the heap is set to the same address
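Code that has to run on both Linux and McKernel could treat the ENOSYS and EOPNOTSUPP errors listed above as "not supported here" instead of as fatal failures. The following is a minimal Python sketch. It assumes the system call is reached through a function that raises ``OSError`` with the kernel's errno, as functions in Python's ``os`` module do. ``call_if_supported`` and ``fake_move_pages`` are hypothetical names, and the second one only simulates the ENOSYS reply described for McKernel.

```python
import errno

# Error codes the Limitations list says McKernel always returns for the
# syscalls it does not implement (ENOSYS) or does not support (EOPNOTSUPP).
UNSUPPORTED_ERRNOS = frozenset({errno.ENOSYS, errno.EOPNOTSUPP})


def call_if_supported(func, *args, fallback=None, **kwargs):
    """Call func(*args, **kwargs); return ``fallback`` if it fails with one
    of UNSUPPORTED_ERRNOS. Every other OSError is re-raised."""
    try:
        return func(*args, **kwargs)
    except OSError as exc:
        if exc.errno in UNSUPPORTED_ERRNOS:
            return fallback
        raise


def fake_move_pages():
    # Hypothetical stand-in for a move_pages() binding; it only simulates
    # the ENOSYS reply described for McKernel.
    raise OSError(errno.ENOSYS, "Function not implemented")


print(call_if_supported(fake_move_pages, fallback="skipped"))  # prints skipped
```

Because every other errno is re-raised, genuine errors such as EPERM or EFAULT stay visible and do not silently trigger the fallback.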
@@ -161,75 +161,68 @@ Limitations
 than the requested. Therefore, the expected segmentation violation
 doesn’t occur.
 
-11. setpriority()/getpriority() won’t work. They might set/get the
+#. setpriority()/getpriority() won’t work. They might set/get the
 priority of a random mcexec thread. This is because there’s no fixed
 correspondence between a McKernel thread which issues the system
 call and a mcexec thread which handles the offload request.
 
-12. mbind() can set the policy but it is not used when allocating
+#. mbind() can set the policy but it is not used when allocating
 physical pages.
 
-13. MPOL_F_RELATIVE_NODES and MPOL_INTERLEAVE flags for
+#. MPOL_F_RELATIVE_NODES and MPOL_INTERLEAVE flags for
 set_mempolicy()/mbind() are not supported.
 
-14. The MPOL_BIND policy for set_mempolicy()/mbind() works as the same
+#. The MPOL_BIND policy for set_mempolicy()/mbind() works as the same
 as the MPOL_PREFERRED policy. That is, the physical page allocator
 doesn’t give up the allocation when the specified nodes are running
 out of pages but continues to search pages in the other nodes.
 
-15. Kernel dump on Linux panic requires Linux kernel CentOS-7.4 and
+#. Kernel dump on Linux panic requires Linux kernel CentOS-7.4 and
 later. In addition, crash_kexec_post_notifiers kernel argument must
 be given to Linux kernel.
 
-16. setfsuid()/setfsgid() cannot change the id of the calling thread.
+#. setfsuid()/setfsgid() cannot change the id of the calling thread.
 Instead, it changes that of the mcexec worker thread which takes the
 system-call offload request.
 
-17. mmap (hugeTLBfs): The physical pages corresponding to a map are
+#. mmap (hugeTLBfs): The physical pages corresponding to a map are
 released when no McKernel process exist. The next map gets fresh
 physical pages.
 
-18. Sticky bit on executable file has no effect.
+#. Sticky bit on executable file has no effect.
 
-19. Linux (RHEL-7 for x86_64) could hang when offlining CPUs in the
+#. Linux (RHEL-7 for x86_64) could hang when offlining CPUs in the
 process of booting McKernel due to the Linux bug, found in
 Linux-3.10 and fixed in the later version. One way to circumvent
 this is to always assign the same CPU set to McKernel.
 
-20. madvise:
+#. madvise:
 
 - MADV_HWPOISON and MADV_SOFT_OFFLINE always returns -EPERM.
 - MADV_MERGEABLE and MADV_UNMERGEABLE always returns -EINVAL.
 - MADV_HUGEPAGE and MADV_NOHUGEPAGE on file map returns -EINVAL
 except on RHEL-8 for aarch64.
 
-21. brk() and mmap() doesn’t report out-of-memory through its return
+#. brk() and mmap() doesn’t report out-of-memory through its return
 value. Instead, page-fault reports the error.
 
-22. Anonymous mmap pre-maps requested number of pages when contiguous
+#. Anonymous mmap pre-maps requested number of pages when contiguous
 pages are available. Demand paging is used when not available.
 
-23. Mixing page sizes in anonymous shared mapping is not allowed. mmap
-creates vm_range with one page size. And munmap or mremap that needs
-the reduced page size changes the sizes of all the pages of the
-vm_range.
-
-24. ihk_os_getperfevent() could time-out when invoked from Fujitsu TCS
+#. ihk_os_getperfevent() could time-out when invoked from Fujitsu TCS
 (job-scheduler).
 
-25. The behaviors of madvise and mbind are changed to do nothing and
+#. The behaviors of madvise and mbind are changed to do nothing and
 report success as a workaround for Fugaku.
 
-26. mmap() allows unlimited overcommit. Note that it corresponds to
+#. mmap() allows unlimited overcommit. Note that it corresponds to
 setting sysctl ``vm.overcommit_memory`` to 1.
 
-27. mlockall() is not supported and returns -EPERM.
+#. mlockall() is not supported and returns -EPERM.
 
-28. munlockall() is not supported and returns zero.
+#. munlockall() is not supported and returns zero.
 
-29. scheduling behavior is not Linux compatible. For example, sometimes one of the two processes on the same CPU continues to run after yielding.
-
-30. (Fujitsu TCS-only) A job following the one in which __mcctrl_os_read_write_cpu_register() returns ``-ETIME`` fails because xos_hwb related CPU state isn't finalized. You can tell if the function returned ``-ETIME`` by checking if the following line appeared in the Linux kernel message:
+#. (Fujitsu TCS-only) A job following the one in which __mcctrl_os_read_write_cpu_register() returns ``-ETIME`` fails because xos_hwb related CPU state isn't finalized. You can tell if the function returned ``-ETIME`` by checking if the following line appeared in the Linux kernel message:
 
 ::
 
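Kernel dumps need the ``crash_kexec_post_notifiers`` kernel argument. One way to check for it is to look in ``/proc/cmdline`` on the Linux side. The following is a minimal sketch. ``has_kernel_parameter`` is a hypothetical helper, and it does not check the CentOS 7.4 version requirement.

```python
def has_kernel_parameter(cmdline, name):
    """Return True if ``name`` appears in a kernel command line either as a
    bare flag or as ``name=value``.

    Tokens are split on whitespace, so quoted values containing spaces are
    not handled; that is enough for a boolean flag like this one.
    """
    return any(tok == name or tok.startswith(name + "=") for tok in cmdline.split())


try:
    with open("/proc/cmdline") as f:
        running_cmdline = f.read()
except OSError:
    running_cmdline = ""  # e.g. not running on Linux

print("crash_kexec_post_notifiers set:",
      has_kernel_parameter(running_cmdline, "crash_kexec_post_notifiers"))
```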
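Given the madvise errors listed above, a caller could apply advice on a best-effort basis. The following Python sketch uses the standard ``mmap.mmap.madvise()`` method (Python 3.8 and later); ``advise_best_effort`` is a hypothetical name. Linux itself also returns EINVAL for MADV_MERGEABLE when the kernel was built without KSM, so the fallback is useful there too. Under the Fugaku workaround entry in the same list, madvise reports success without doing anything. In that case ``True`` only means the call did not fail, not that the advice took effect.

```python
import errno
import mmap


def advise_best_effort(buf, advice):
    """Apply madvise() ``advice`` to the whole mapping ``buf``.

    Returns False instead of raising when the kernel rejects the advice with
    EINVAL or EPERM (the codes the Limitations list mentions); any other
    OSError is re-raised.
    """
    try:
        buf.madvise(advice)
        return True
    except OSError as exc:
        if exc.errno in (errno.EINVAL, errno.EPERM):
            return False
        raise


# Anonymous 16-page mapping. MADV_MERGEABLE is only defined where the
# platform provides it, hence the hasattr() check.
with mmap.mmap(-1, 16 * mmap.PAGESIZE) as buf:
    if hasattr(mmap, "MADV_MERGEABLE"):
        print("MADV_MERGEABLE accepted:", advise_best_effort(buf, mmap.MADV_MERGEABLE))
```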
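To compare with the Linux side of a node, this sketch reads the host's ``vm.overcommit_memory`` setting. It also labels mode 1, the mode that McKernel's mmap() is said to correspond to. ``describe_overcommit`` is a hypothetical helper, and the mode labels paraphrase the Linux kernel's overcommit-accounting documentation.

```python
# Modes of vm.overcommit_memory as described in the Linux kernel's
# overcommit-accounting documentation.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit",
    1: "always overcommit",
    2: "strict accounting (no overcommit)",
}


def describe_overcommit(value):
    """Translate the text of /proc/sys/vm/overcommit_memory into a label."""
    mode = int(value.strip())
    return OVERCOMMIT_MODES.get(mode, f"unknown mode {mode}")


try:
    with open("/proc/sys/vm/overcommit_memory") as f:
        print("Linux:", describe_overcommit(f.read()))
except OSError:
    print("Linux: /proc/sys/vm/overcommit_memory is not readable here")

# Per the Limitations list, McKernel's mmap() behaves like mode 1.
print("McKernel mmap():", describe_overcommit("1"))
```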
@@ -241,11 +234,11 @@ Limitations
 
 sudo systemctl restart xos_hwb
 
-31. System calls can write the mcexec VMAs with PROT_WRITE flag not
+#. System calls can write the mcexec VMAs with PROT_WRITE flag not
 set. This is because we never turn off PROT_WRITE of the mcexec
 VMAs to circumvent the issue "set_host_vma(): do NOT read protect
 Linux VMA".
 
-32. procfs entry creation done by Linux work queue could starve when
+#. procfs entry creation done by Linux work queue could starve when
 Linux CPUs are flooded with system call offloads. LTP-2019
 sendmsg02 causes this issue.