* Implement matrix multiplication
* 1 warp independence
* Automatic warp detection
* Finish Python compiler
* Print buffers
* Support for compressed