Hi guys!
I want to train a CLIP model for Ukrainian and would like to replace the text encoder with a pretrained Ukrainian text encoder. I found a couple of discussions about CLIP implemented in Flax for Spanish and Korean, and a research paper about replacing the CLIP text encoder for other languages (AltCLIP), but they all kinda reimplement CLIP to do that.
Is there an easy way to initialize a CLIPModel with a custom text encoder (one available on the HF Hub)? Same question for replacing the tokenizer in the CLIPProcessor.
I would like something like this:
model = CLIPModel(text_encoder=my_text_encoder, image_encoder=my_image_encoder)
processor = CLIPProcessor(tokenizer=my_tokenizer, image_processor=my_image_processor)