68 Commits

Author SHA1 Message Date
Richard Yan
ea4819702e oopsie doopsie 2024-08-06 02:43:27 -07:00
Richard Yan
f73029889b oopsie 2024-06-12 13:34:19 -07:00
Hansung Kim
ca7fd84a83 sgemm_tcore: Split util functions to a header file 2024-06-11 19:06:22 -07:00
Hansung Kim
fc8f0c99f0 Merge branch 'tensor_core' into kernels 2024-06-07 18:27:02 -07:00
Hansung Kim
2cac995db9 tensor: generate 8x8 in correctness script 2024-06-07 18:13:57 -07:00
Richard Yan
7cf59c9480 dma and demo kernels 2024-06-07 18:11:19 -07:00
Hansung Kim
483f975439 Merge branch 'kernels' into tensor_core 2024-06-07 16:27:01 -07:00
Hansung Kim
d5adacda30 Add args.bin to ELF
Change KERNEL_ARG_DEV_MEM_ADDR for sgemm_{wg,gemmini,tcore}
2024-06-06 15:19:39 -07:00
Richard Yan
33066af56e cisc gemmini 2024-05-08 15:46:20 -07:00
Hansung Kim
6ba6a1e2e5 Merge branch 'kernels' into tensor_core 2024-05-08 13:25:31 -07:00
joshua
5bd25985c6 i kinda forgot most of changes 2024-05-04 23:01:47 -07:00
Richard Yan
01f4a69ae9 dma mvout, double buffering & other opts 2024-04-28 01:18:51 -07:00
Richard Yan
a44edf2b65 Merge branch 'kernels' of https://github.com/hansungk/vortex-private into kernels 2024-04-24 22:10:40 -07:00
Richard Yan
6eafa2de54 write operands to elf 2024-04-24 22:09:30 -07:00
Hansung Kim
6cbfbfb856 sgemm_wg: Output CPU data to binary 2024-04-24 21:10:21 -07:00
Richard Yan
449d99f0bb dram gemm kernel 2024-04-16 17:15:22 -07:00
Hansung Kim
b0c1f77388 vx_start.S: Swizzle stack space
Striding stack space for threads by a power of two risks bank conflicts or
cache aliasing problems.  Add an extra offset of 4 bytes to avoid this.
2024-03-29 12:26:14 -07:00
Hansung Kim
e4eec8ab4d vx_spawn.c: Handle num_clusters > 1
WIP: still assumes num_tasks is divisible by num_cluster
2024-03-28 20:16:44 -07:00
Hansung Kim
870846f20f vx_spawn.c: Create separate vx_spawn_tasks_contiguous 2024-03-27 15:38:52 -07:00
Hansung Kim
4e834f2103 vx_spawn.c: Rewrite cluster-based vx_spawn_tasks variant
Implements round-robin allocation of warps to cores & maintains contiguous
thread ID allocation to neighboring threads.  Also handles partially-enabled
remainder warp logic.

TODO: Hardcodes only 1 cluster in the system.
2024-03-27 15:14:45 -07:00
Hansung Kim
df1f7f242a vx_spawn.c: Implement spawn_tasks_cluster_rem_stub 2024-03-27 00:00:44 -07:00
Hansung Kim
3729a05adc vx_spawn.c: Separate cluster-based scheduling code from original 2024-03-26 16:36:57 -07:00
Hansung Kim
f050a08d77 Write vx_spawn_tasks_cluster
This scheduling logic tries to evenly distribute warps across *all* cores,
instead of trying to fill up the first cores as much as possible.  This scheme
is necessary for the intra-cluster cores which are assumed to have equal
workloads distributed.
2024-03-26 10:45:14 -07:00
Hansung Kim
7d177492b2 Move CORES_PER_CLUSTER to vx_spawn.h 2024-03-24 01:45:30 -07:00
Hansung Kim
f590c4b417 Add vx_spawn.h as dependency to kernel/Makefile 2024-03-24 01:44:49 -07:00
Hansung Kim
12ee2a3a0f Write cluster-aware thread scheduling
NOTE: cores per cluster is hardcoded as a constant
2024-03-18 16:40:02 -07:00
Hansung Kim
a2ea27b2b5 vx_spawn: Add spawn_tasks_contiguous_all_stub
Spawns tasks in a way that the threads in a warp see contiguous
thread_id, unlike the original variant where each thread was allocated
a range of thread_id that spans the number of batches.

E.g. in a 4-thread config, instead of mapping IDs (0,2,4,6)->(1,3,5,7),
map (0,1,2,3)->(4,5,6,7).

TODO remaining logic not implemented.
2024-02-27 15:46:02 -08:00
Blaise Tine
4e7a536918 adding tensor regression test. 2023-11-14 05:37:46 -08:00
Blaise Tine
62cdd8e993 minor update 2023-11-11 15:49:39 -08:00
Blaise Tine
c1e168fdbe Vortex 2.0 changes:
+ Microarchitecture optimizations
+ 64-bit support
+ Xilinx FPGA support
+ LLVM-16 support
+ Refactoring and quality control fixes

minor update

cleanup

cache bindings and memory perf refactory

hw unit tests fixes
2023-11-10 02:47:05 -08:00
felsabbagh3
9a0c5e0dbc Removed kernel 2019-11-07 00:15:07 -05:00
felsabbagh3
87ae5c8cdf Fixed emulator 2019-11-06 23:30:07 -05:00
felsabbagh3
46b09028d0 Added runtime (kernel 2.0) 2019-10-30 23:40:01 -04:00
felsabbagh3
7863175233 Set associative bank working 2019-10-30 14:57:20 -04:00
felsabbagh3
3b49b82c46 GPR ASIC Working 2019-10-29 23:20:16 -04:00
felsabbagh3
4aa04e76e6 Simulate debug 2019-10-29 14:28:20 -04:00
felsabbagh3
0ee74bc566 migrated 100% to modelsim 2019-10-27 20:08:44 -04:00
felsabbagh3
715982cca7 Modelsim Working + Simulating + dumping - Some bugs 2019-10-27 03:36:02 -04:00
felsabbagh3
89d0390965 CACHE FINALLY WORKING 2019-10-25 04:01:23 -04:00
felsabbagh3
01efe02e8b CACHE WORKING just needs lb/sb 2019-10-25 03:03:09 -04:00
felsabbagh3
1e648c5819 Fixed first circular issue 2019-10-24 10:38:04 -04:00
felsabbagh3
1645a04b1d Fixed SM + added def SYN 2019-10-22 15:56:30 -04:00
felsabbagh3
b7af8c3f34 Integrated Shared Memory 2019-10-22 05:03:47 -04:00
felsabbagh3
b3f464dd89 Barriers impl + tested 2019-10-22 01:47:39 -04:00
felsabbagh3
31d3d51392 WSPAWN imp + tested 2019-10-21 23:35:53 -04:00
felsabbagh3
b6375e76de Readded IPDOM stack + SPLIT/Join tested 2019-10-21 21:24:49 -04:00
felsabbagh3
84f5ccb484 Added CSR TID/WID reads 2019-10-21 02:10:05 -04:00
felsabbagh3
62db9ae691 minor 2019-10-17 12:04:06 -04:00
felsabbagh3
8bc3b8b0a5 Need to link SystemC for sc_time_stamp() 2019-10-14 23:25:14 -04:00
felsabbagh3
ee83e6d8c8 Moved GPR to back-end 2019-10-14 19:08:32 -04:00
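
Note on a2ea27b2b5 (vx_spawn: Add spawn_tasks_contiguous_all_stub): the message contrasts two ways of assigning thread_id values to the threads of a warp across batches. The sketch below reproduces only the arithmetic of the 4-thread example given in that message. The function names and parameters are hypothetical and are not taken from vx_spawn.c.

```c
#include <assert.h>

/* Original-style mapping as described in the commit message: each thread
 * owns a range of IDs that spans the number of batches, so the IDs seen by
 * one warp in a given batch are strided by num_batches. */
int task_id_strided(int thread, int batch, int num_batches) {
    assert(thread >= 0 && batch >= 0 && batch < num_batches);
    return thread * num_batches + batch;
}

/* Contiguous mapping: neighboring threads in a warp see consecutive IDs,
 * and each later batch starts num_threads further on. */
int task_id_contiguous(int thread, int batch, int num_threads) {
    assert(thread >= 0 && thread < num_threads && batch >= 0);
    return batch * num_threads + thread;
}
```

With 4 threads and 2 batches, task_id_strided yields (0,2,4,6) for batch 0 and (1,3,5,7) for batch 1, while task_id_contiguous yields (0,1,2,3) and (4,5,6,7), matching the example in the commit message.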
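
Note on b0c1f77388 (vx_start.S: Swizzle stack space): the message says a power-of-two stack stride risks bank conflicts or cache aliasing, and that an extra 4-byte offset avoids it. The sketch below illustrates the effect under stated assumptions: stacks grow downward from a common base, the padding is added to the per-thread stride, and memory is interleaved across banks in 4-byte words. The actual change is in assembly and may differ. All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Top-of-stack address for thread `tid`, assuming stacks are laid out
 * downward from `base` with a stride of (stack_size + pad) bytes.
 * pad = 0 models the plain power-of-two stride; pad = 4 models the swizzle. */
uint32_t stack_top(uint32_t base, uint32_t tid, uint32_t stack_size, uint32_t pad) {
    assert(base >= tid * (stack_size + pad));
    return base - tid * (stack_size + pad);
}

/* Bank index of an address, assuming 4-byte words interleaved across
 * `num_banks` banks (an illustrative memory model, not Vortex's actual one). */
uint32_t bank_of(uint32_t addr, uint32_t num_banks) {
    assert(num_banks > 0);
    return (addr / 4u) % num_banks;
}
```

Under these assumptions, with base 0x10000, a 1024-byte stack, and 4 banks, every thread's stack top falls in bank 0 when pad is 0. With pad set to 4, threads 0 through 3 land in banks 0, 3, 2, and 1, so the same stack offset in different threads no longer hits the same bank.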
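
Note on f050a08d77 (Write vx_spawn_tasks_cluster) and 4e834f2103 (vx_spawn.c: Rewrite cluster-based vx_spawn_tasks variant): both messages describe spreading warps evenly across all cores rather than filling the first cores as much as possible. The sketch below compares the per-core warp counts of the two policies. It is a minimal illustration with hypothetical function names, not the scheduling code in vx_spawn.c, and it does not model the partially enabled remainder warp.

```c
#include <assert.h>

/* Even (round-robin) distribution: every core gets
 * floor(total_warps / num_cores) warps, and the first
 * (total_warps % num_cores) cores get one extra. */
int warps_even(int core, int total_warps, int num_cores) {
    assert(num_cores > 0 && core >= 0 && core < num_cores && total_warps >= 0);
    return total_warps / num_cores + (core < total_warps % num_cores ? 1 : 0);
}

/* Fill-first distribution: earlier cores are filled up to warps_per_core
 * before any later core receives a warp. */
int warps_fill_first(int core, int total_warps, int warps_per_core) {
    assert(core >= 0 && total_warps >= 0 && warps_per_core > 0);
    int remaining = total_warps - core * warps_per_core;
    if (remaining <= 0) {
        return 0;
    }
    return remaining < warps_per_core ? remaining : warps_per_core;
}
```

For 6 warps on 4 cores with room for 4 warps each, warps_even gives 2, 2, 1, 1 while warps_fill_first gives 4, 2, 0, 0. The even split is the property the commit message asks for when cores within a cluster are assumed to carry equal workloads.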