kernels

Author	SHA1	Message	Date
Hansung Kim	87a1c2bbfc	Cores per cluster 4 to 8	2024-09-05 16:23:32 -07:00
Hansung Kim	dcd69ea304	Increase SMEM size to 256KB	2024-09-05 16:23:32 -07:00
Hansung Kim	68eb271916	Add operand.c to the link script	2024-08-19 18:09:16 -07:00
Richard Yan	ea4819702e	oopsie doopsie	2024-08-06 02:43:27 -07:00
Richard Yan	f73029889b	oopsie	2024-06-12 13:34:19 -07:00
Hansung Kim	ca7fd84a83	sgemm_tcore: Split util functions to a header file	2024-06-11 19:06:22 -07:00
Hansung Kim	fc8f0c99f0	Merge branch 'tensor_core' into kernels	2024-06-07 18:27:02 -07:00
Hansung Kim	2cac995db9	tensor: generate 8x8 in correctness script	2024-06-07 18:13:57 -07:00
Richard Yan	7cf59c9480	dma and demo kernels	2024-06-07 18:11:19 -07:00
Hansung Kim	483f975439	Merge branch 'kernels' into tensor_core	2024-06-07 16:27:01 -07:00
Hansung Kim	d5adacda30	Add args.bin to ELF Change KERNEL_ARG_DEV_MEM_ADDR for sgemm_{wg,gemmini,tcore}	2024-06-06 15:19:39 -07:00
Richard Yan	33066af56e	cisc gemmini	2024-05-08 15:46:20 -07:00
Hansung Kim	6ba6a1e2e5	Merge branch 'kernels' into tensor_core	2024-05-08 13:25:31 -07:00
joshua	5bd25985c6	i kinda forgot most of changes	2024-05-04 23:01:47 -07:00
Richard Yan	01f4a69ae9	dma mvout, double buffering & other opts	2024-04-28 01:18:51 -07:00
Richard Yan	a44edf2b65	Merge branch 'kernels' of https://github.com/hansungk/vortex-private into kernels	2024-04-24 22:10:40 -07:00
Richard Yan	6eafa2de54	write operands to elf	2024-04-24 22:09:30 -07:00
Hansung Kim	6cbfbfb856	sgemm_wg: Output CPU data to binary	2024-04-24 21:10:21 -07:00
Richard Yan	449d99f0bb	dram gemm kernel	2024-04-16 17:15:22 -07:00
Hansung Kim	b0c1f77388	vx_start.S: Swizzle stack space Striding stack space for threads by power-of-two risks possibilities of bank conflicts or cache aliasing problems. Add an extra offset of 4 bytes to avoid this.	2024-03-29 12:26:14 -07:00
Hansung Kim	e4eec8ab4d	vx_spawn.c: Handle num_clusters > 1 WIP: still assumes num_tasks is divisible by num_cluster	2024-03-28 20:16:44 -07:00
Hansung Kim	870846f20f	vx_spawn.c: Create separate vx_spawn_tasks_contiguous	2024-03-27 15:38:52 -07:00
Hansung Kim	4e834f2103	vx_spawn.c: Rewrite cluster-based vx_spawn_tasks variant Implements round-robin allocation of warps to cores & maintains contiguous thread ID allocation to neighboring threads. Also handles partially-enabled remainder warp logic. TODO: Hardcodes only 1 cluster in the system.	2024-03-27 15:14:45 -07:00
Hansung Kim	df1f7f242a	vx_spawn.c: Implement spawn_tasks_cluster_rem_stub	2024-03-27 00:00:44 -07:00
Hansung Kim	3729a05adc	vx_spawn.c: Separate cluster-based scheduling code from original	2024-03-26 16:36:57 -07:00
Hansung Kim	f050a08d77	Write vx_spawn_tasks_cluster This scheduling logic tries to evenly distribute warps across all cores, instead of trying to fill up the first cores as much as possible. This scheme is necessary for the intra-cluster cores which are assumed to have equal workloads distributed.	2024-03-26 10:45:14 -07:00
Hansung Kim	7d177492b2	Move CORES_PER_CLUSTER to vx_spawn.h	2024-03-24 01:45:30 -07:00
Hansung Kim	f590c4b417	Add vx_spawn.h as dependency to kernel/Makefile	2024-03-24 01:44:49 -07:00
Hansung Kim	12ee2a3a0f	Write cluster-aware thread scheduling NOTE: cores per cluster is hardcoded as a constant	2024-03-18 16:40:02 -07:00
Hansung Kim	a2ea27b2b5	vx_spawn: Add spawn_tasks_contiguous_all_stub Spawns tasks in a way that the threads in a warp see contiguous thread_id, unlike the original variant where each thread were allocated a range of thread_id that spans the number of batches. E.g. in a 4-thread config, instead of mapping IDs (0,2,4,6)->(1,3,5,7), map (0,1,2,3)->(4,5,6,7). TODO remaining logic not implemented.	2024-02-27 15:46:02 -08:00
Blaise Tine	4e7a536918	adding tensor regression test.	2023-11-14 05:37:46 -08:00
Blaise Tine	62cdd8e993	minor update	2023-11-11 15:49:39 -08:00
Blaise Tine	c1e168fdbe	Vortex 2.0 changes: + Microarchitecture optimizations + 64-bit support + Xilinx FPGA support + LLVM-16 support + Refactoring and quality control fixes minor update minor update minor update minor update minor update minor update cleanup cleanup cache bindings and memory perf refactory minor update minor update hw unit tests fixes minor update minor update minor update minor update minor update minor udpate minor update minor update minor update minor update minor update minor update minor update minor updates minor updates minor update minor update minor update minor update minor update minor update minor updates minor updates minor updates minor updates minor update minor update	2023-11-10 02:47:05 -08:00
felsabbagh3	9a0c5e0dbc	Removed kernel	2019-11-07 00:15:07 -05:00
felsabbagh3	87ae5c8cdf	Fixed emulator	2019-11-06 23:30:07 -05:00
felsabbagh3	46b09028d0	Added runtime (kernel 2.0)	2019-10-30 23:40:01 -04:00
felsabbagh3	7863175233	Set associative bank working	2019-10-30 14:57:20 -04:00
felsabbagh3	3b49b82c46	GPR ASIC Working	2019-10-29 23:20:16 -04:00
felsabbagh3	4aa04e76e6	Simulate debug	2019-10-29 14:28:20 -04:00
felsabbagh3	0ee74bc566	migrated 100% to modelsim	2019-10-27 20:08:44 -04:00
felsabbagh3	715982cca7	Modelsim Working + Simulating + dumping - Some bugs	2019-10-27 03:36:02 -04:00
felsabbagh3	89d0390965	CACHE FINALLY WORKING	2019-10-25 04:01:23 -04:00
felsabbagh3	01efe02e8b	CACHE WORKING just needs lb/sb	2019-10-25 03:03:09 -04:00
felsabbagh3	1e648c5819	FIxed first circular issue	2019-10-24 10:38:04 -04:00
felsabbagh3	1645a04b1d	Fixed SM + added def SYN	2019-10-22 15:56:30 -04:00
felsabbagh3	b7af8c3f34	Integrated Shared Memory	2019-10-22 05:03:47 -04:00
felsabbagh3	b3f464dd89	Barriers impl + tested	2019-10-22 01:47:39 -04:00
felsabbagh3	31d3d51392	WSPAWN imp + tested	2019-10-21 23:35:53 -04:00
felsabbagh3	b6375e76de	Readded IPDOM stack + SPLIT/Join tested	2019-10-21 21:24:49 -04:00
felsabbagh3	84f5ccb484	Added CSR TID/WID reads	2019-10-21 02:10:05 -04:00

71 Commits
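
The thread-ID mapping described in commit a2ea27b2b5 can be sketched as below. This is only an illustration of the two numbering schemes in that message, not the code in vx_spawn.c. The function names `strided_id` and `contiguous_id` and their parameters are hypothetical.

```c
/* Illustrative sketch only. These helpers are hypothetical and are not
 * taken from vx_spawn.c. */

/* Original-style mapping: each thread (lane) owns a consecutive range of
 * IDs, one per batch, so in any single batch the lanes of a warp see IDs
 * that are num_batches apart. */
int strided_id(int lane, int batch, int num_batches) {
    return lane * num_batches + batch;
}

/* Contiguous mapping: in each batch the lanes of a warp see neighboring
 * IDs, and the next batch continues where the previous one ended. */
int contiguous_id(int lane, int batch, int num_threads) {
    return batch * num_threads + lane;
}
```

With 4 threads and 2 batches, `strided_id` gives lanes 0..3 the IDs (0,2,4,6) in the first batch and (1,3,5,7) in the second, while `contiguous_id` gives (0,1,2,3) and then (4,5,6,7), matching the example in the commit message.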