Profile results

2026-02-10 07:02:20 +00:00
parent b0ebb7006e
commit db848bca01
6 changed files with 72 additions and 15 deletions

.gitignore

@@ -130,3 +130,4 @@ Experiment/log
*.ckpt
*.0
unitree_z1_dual_arm_cleanup_pencils/case1/profile_output/traces/wx-ms-w7900d-0032_742306.1770698186047591119.pt.trace.json

@@ -1,14 +1,14 @@
/mnt/ASC1637/miniconda3/envs/unifolm-wma-o/lib/python3.10/site-packages/lightning_fabric/__init__.py:29: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
__import__("pkg_resources").declare_namespace(__name__)
2026-02-09 18:39:50.119842: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2026-02-09 18:39:50.123128: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2026-02-09 18:39:50.156652: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2026-02-09 18:39:50.156708: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2026-02-09 18:39:50.158926: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2026-02-09 18:39:50.167779: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2026-02-09 18:39:50.168073: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
2026-02-10 06:42:14.444321: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2026-02-10 06:42:14.447338: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2026-02-10 06:42:14.478442: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2026-02-10 06:42:14.478474: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2026-02-10 06:42:14.480279: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2026-02-10 06:42:14.488343: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2026-02-10 06:42:14.488598: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2026-02-09 18:39:50.915144: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2026-02-10 06:42:15.109100: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[rank: 0] Global seed set to 123
/mnt/ASC1637/miniconda3/envs/unifolm-wma-o/lib/python3.10/site-packages/kornia/feature/lightglue.py:44: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
@torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
@@ -25,7 +25,7 @@ INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
INFO:root:Loaded ViT-H-14 model config.
DEBUG:urllib3.connectionpool:https://hf-mirror.com:443 "HEAD /laion/CLIP-ViT-H-14-laion2B-s32B-b79K/resolve/main/open_clip_pytorch_model.bin HTTP/1.1" 302 0
INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
/mnt/ASC1637/unifolm-world-model-action/scripts/evaluation/world_model_interaction.py:198: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
/mnt/ASC1637/unifolm-world-model-action/scripts/evaluation/world_model_interaction.py:199: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
state_dict = torch.load(ckpt, map_location="cpu")
>>> model checkpoint loaded.
>>> Load pre-trained model ...
@@ -116,7 +116,7 @@ DEBUG:PIL.Image:Importing WmfImagePlugin
DEBUG:PIL.Image:Importing WmfImagePlugin
DEBUG:PIL.Image:Importing XbmImagePlugin
DEBUG:PIL.Image:Importing XpmImagePlugin
DEBUG:PIL.Image:Importing XVThumbImagePlugin
DEBUG:PIL.Image:Importing XVThumbImagePlugin
12%|█▎ | 1/8 [01:08<07:56, 68.03s/it]
25%|██▌ | 2/8 [02:12<06:35, 65.91s/it]
@@ -140,6 +140,6 @@ DEBUG:PIL.Image:Importing XVThumbImagePlugin
>>> Step 4: generating actions ...
>>> Step 4: interacting with world model ...
>>>>>>>>>>>>>>>>>>>>>>>>
>>> Step 5: generating actions ...
>>> Step 5: interacting with world model ...
>>>>>>>>>>>>>>>>>>>>>>>>
>>> Step 5: generating actions ...
>>> Step 5: interacting with world model ...
>>>>>>>>>>>>>>>>>>>>>>>>

@@ -0,0 +1,5 @@
itr,stack_to_device_1,policy/ddim_sampler_init,policy/image_embedding,policy/vae_encode,policy/text_conditioning,policy/projectors,policy/cond_assembly,policy/ddim_sampling,policy/vae_decode,synth_policy,update_action_queue,stack_to_device_2,wm/ddim_sampler_init,wm/image_embedding,wm/vae_encode,wm/text_conditioning,wm/projectors,wm/cond_assembly,wm/ddim_sampling,wm/vae_decode,synth_world_model,update_obs_queue,tensorboard_log,save_results,cpu_transfer,itr_total
0,0.16,0.08,20.98,49.56,14.51,0.29,0.07,31005.48,0.00,31094.51,0.39,0.13,0.09,20.62,48.76,14.17,0.28,0.07,31011.17,775.40,31875.87,0.61,0.31,97.28,7.19,63077.50
1,0.16,0.09,20.97,49.63,14.52,0.30,0.07,31035.49,0.00,31125.16,0.54,0.17,0.14,21.46,49.26,14.88,0.49,0.12,31047.54,777.56,31918.60,0.75,0.60,109.89,6.21,63163.18
2,0.18,0.10,21.44,49.71,15.05,0.34,0.07,31047.64,0.00,31138.56,0.58,0.16,0.13,21.03,48.74,14.69,0.32,0.08,31036.47,776.96,31905.96,0.67,0.39,116.96,7.43,63171.90
3,0.18,0.10,21.38,49.47,15.02,0.35,0.08,31041.05,0.00,31132.03,0.48,0.16,0.12,20.81,49.34,14.41,0.47,0.11,31051.98,777.11,31920.42,0.64,0.38,121.67,7.29,63184.26

@@ -0,0 +1,5 @@
stat,stack_to_device_1,policy/ddim_sampler_init,policy/image_embedding,policy/vae_encode,policy/text_conditioning,policy/projectors,policy/cond_assembly,policy/ddim_sampling,policy/vae_decode,synth_policy,update_action_queue,stack_to_device_2,wm/ddim_sampler_init,wm/image_embedding,wm/vae_encode,wm/text_conditioning,wm/projectors,wm/cond_assembly,wm/ddim_sampling,wm/vae_decode,synth_world_model,update_obs_queue,tensorboard_log,save_results,cpu_transfer,itr_total
mean,0.17,0.09,21.19,49.59,14.78,0.32,0.07,31032.42,0.00,31122.56,0.49,0.15,0.12,20.98,49.03,14.53,0.39,0.10,31036.79,776.76,31905.21,0.67,0.42,111.45,7.03,63149.21
std,0.01,0.01,0.22,0.09,0.26,0.03,0.00,16.13,0.00,16.88,0.07,0.01,0.02,0.31,0.28,0.27,0.09,0.02,15.83,0.82,17.84,0.05,0.11,9.19,0.48,42.08
min,0.16,0.08,20.97,49.47,14.51,0.29,0.07,31005.48,0.00,31094.51,0.39,0.13,0.09,20.62,48.74,14.17,0.28,0.07,31011.17,775.40,31875.87,0.61,0.31,97.28,6.21,63077.50
max,0.18,0.10,21.44,49.71,15.05,0.35,0.08,31047.64,0.00,31138.56,0.58,0.17,0.14,21.46,49.34,14.88,0.49,0.12,31051.98,777.56,31920.42,0.75,0.60,121.67,7.43,63184.26
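To read the summary statistics above at a glance, the following minimal sketch computes each stage's share of `itr_total` from the `mean` row. The header and `mean` row are copied from the CSV; the helper name `stage_shares` is hypothetical, and the unit (presumably milliseconds) is an assumption, since the file does not state one. `synth_policy` and `synth_world_model` appear to be aggregate timers covering the nested `policy/*` and `wm/*` stages, so they are not added on top of those.

```python
import csv
import io

# Header and "mean" row copied verbatim from the summary CSV.
# Assumption: values are milliseconds; the file itself gives no unit.
SUMMARY_CSV = """\
stat,stack_to_device_1,policy/ddim_sampler_init,policy/image_embedding,policy/vae_encode,policy/text_conditioning,policy/projectors,policy/cond_assembly,policy/ddim_sampling,policy/vae_decode,synth_policy,update_action_queue,stack_to_device_2,wm/ddim_sampler_init,wm/image_embedding,wm/vae_encode,wm/text_conditioning,wm/projectors,wm/cond_assembly,wm/ddim_sampling,wm/vae_decode,synth_world_model,update_obs_queue,tensorboard_log,save_results,cpu_transfer,itr_total
mean,0.17,0.09,21.19,49.59,14.78,0.32,0.07,31032.42,0.00,31122.56,0.49,0.15,0.12,20.98,49.03,14.53,0.39,0.10,31036.79,776.76,31905.21,0.67,0.42,111.45,7.03,63149.21
"""


def stage_shares(csv_text, row_name="mean", total_col="itr_total"):
    """Return {stage: fraction of total_col} for the row whose 'stat' equals row_name."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["stat"] == row_name:
            total = float(row[total_col])
            return {
                stage: float(value) / total
                for stage, value in row.items()
                if stage not in ("stat", total_col)
            }
    raise KeyError(row_name)


shares = stage_shares(SUMMARY_CSV)

# Combined share of the two sampling loops (policy and world model).
ddim_share = shares["policy/ddim_sampling"] + shares["wm/ddim_sampling"]

# Print the five largest entries; the aggregate synth_* timers will rank first.
for stage, frac in sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print(f"{stage:24s} {frac:7.2%}")
print(f"policy + wm ddim_sampling: {ddim_share:.2%}")
```

On these numbers the two `ddim_sampling` stages together account for roughly 98% of the mean iteration time, so the remaining stages are unlikely to matter for optimization.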

@@ -0,0 +1,45 @@
/mnt/ASC1637/miniconda3/envs/unifolm-wma-o/lib/python3.10/site-packages/lightning_fabric/__init__.py:29: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
__import__("pkg_resources").declare_namespace(__name__)
[rank: 0] Global seed set to 123
/mnt/ASC1637/miniconda3/envs/unifolm-wma-o/lib/python3.10/site-packages/kornia/feature/lightglue.py:44: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
@torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
/mnt/ASC1637/miniconda3/envs/unifolm-wma-o/lib/python3.10/site-packages/open_clip/factory.py:88: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
checkpoint = torch.load(checkpoint_path, map_location=map_location)
/mnt/ASC1637/unifolm-world-model-action/scripts/evaluation/profile_iteration.py:168: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
state_dict = torch.load(args.ckpt_path, map_location="cpu")
============================================================
PROFILE ITERATION — Loading model...
============================================================
AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
torch.compile: 3 ResBlocks in output_blocks[5, 8, 9]
>>> Model loaded and ready.
>>> Noise shape: [1, 4, 16, 40, 64]
>>> DDIM steps: 50
>>> fast_policy_no_decode: True
============================================================
LAYER 1: ITERATION-LEVEL PROFILING
============================================================
>>> unitree_z1_stackbox: 1 data samples loaded.
>>> unitree_z1_stackbox: data stats loaded.
>>> unitree_z1_stackbox: normalizer initiated.
>>> unitree_z1_dual_arm_stackbox: 1 data samples loaded.
>>> unitree_z1_dual_arm_stackbox: data stats loaded.
>>> unitree_z1_dual_arm_stackbox: normalizer initiated.
>>> unitree_z1_dual_arm_stackbox_v2: 1 data samples loaded.
>>> unitree_z1_dual_arm_stackbox_v2: data stats loaded.
>>> unitree_z1_dual_arm_stackbox_v2: normalizer initiated.
>>> unitree_z1_dual_arm_cleanup_pencils: 1 data samples loaded.
>>> unitree_z1_dual_arm_cleanup_pencils: data stats loaded.
>>> unitree_z1_dual_arm_cleanup_pencils: normalizer initiated.
>>> unitree_g1_pack_camera: 1 data samples loaded.
>>> unitree_g1_pack_camera: data stats loaded.
>>> unitree_g1_pack_camera: normalizer initiated.
>>> Running 5 profiled iterations ...
Traceback (most recent call last):
File "/mnt/ASC1637/unifolm-world-model-action/scripts/evaluation/profile_iteration.py", line 981, in <module>
main()
File "/mnt/ASC1637/unifolm-world-model-action/scripts/evaluation/profile_iteration.py", line 967, in main
all_records = run_profiled_iterations(
File "/mnt/ASC1637/unifolm-world-model-action/scripts/evaluation/profile_iteration.py", line 502, in run_profiled_iterations
sampler_type=args.sampler_type)
AttributeError: 'Namespace' object has no attribute 'sampler_type'
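The traceback above indicates that `run_profiled_iterations` reads `args.sampler_type`, but the parsed `Namespace` has no such attribute. The argument parser of `profile_iteration.py` is not shown here, so the following is only a sketch of two common remedies: registering the option with `argparse`, or falling back to a default with `getattr`. The function name `build_parser` and the default value `"ddim"` are assumptions for illustration, not taken from the script.

```python
import argparse


def build_parser():
    """Hypothetical parser; the real one in profile_iteration.py is not shown."""
    parser = argparse.ArgumentParser()
    # Registering the option guarantees that args.sampler_type always exists.
    # The default "ddim" is an assumption based on the "DDIM steps" log line.
    parser.add_argument(
        "--sampler_type",
        type=str,
        default="ddim",
        help="Sampler to use, e.g. the value passed by the launch script.",
    )
    return parser


# With the option registered, the flag used in the launch script parses cleanly.
args = build_parser().parse_args(["--sampler_type", "unipc"])
print(args.sampler_type)

# Alternative: tolerate a Namespace built by an older parser without the option.
legacy_args = argparse.Namespace()
sampler_type = getattr(legacy_args, "sampler_type", "ddim")
print(sampler_type)
```

Registering the option is usually the cleaner choice, because the parser then also documents the flag in `--help`; the `getattr` fallback only hides the missing attribute.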

@@ -22,5 +22,6 @@ dataset="unitree_z1_dual_arm_cleanup_pencils"
--guidance_rescale 0.7 \
--perframe_ae \
--vae_dtype bf16 \
- --fast_policy_no_decode
+ --fast_policy_no_decode \
+ --sampler_type unipc
} 2>&1 | tee "${res_dir}/output.log"