it's been around a couple of weeks since i picked up deep learning again. the last time i did it was back in college.
i started with andrej's nn-zero-to-hero series and i couldn't be more thankful that this exists.
this entire week i have been trying to make a song generation model that generates hindi songs from a text prompt.
this was quite a task to pick up given my non-existent background in audio engineering and my noviceness in DL.
experimented a lot with the AudioLM repo by lucidrains but couldn't get multi-GPU training to run reliably.
also, i hate any and all kinds of online dev IDEs - they are all super slow and suck so hard.
experimented a lot with bark from suno ai and also tried finetuning it with hf transformers, but encountered a lot of bugs. will take another shot at it after doing some other dummy finetunes as a way to familiarise myself with the hf datasets, tokenizers and transformers sdks.
today i am going to try to finetune harmonai/dance-diffusion and play around with it. let's see how it goes.