How can I use the Multimodal API?

Google’s Multimodal API relies on a WebSocket session.

The docs explicitly state this is server-to-server only, requiring an API key.

I therefore need a proxy server as middleware to stream the session to the client.

I’m going crazy trying to get this working inside Next.js with Vercel hosting. The only solution I can see is running my own Node server and deploying that as a proxy.

Is there really no way to proxy WebSocket streams that is compatible with Vercel hosting?

Hey, @ed52! Welcome to the community :smile:

I’m not familiar with Google’s Multimodal API (and haven’t tried it) but does this guide help?

Don’t think so.

My implementation is all in the client. I just need to be able to connect a WebSocket to the Multimodal API while hiding my API key… adding an extra provider won’t help me here.

I’ve read quite a lot about the issue.

It seems to be a common problem, but as far as I can tell my only solution is to self-host and run a proxy server…

That means an awful lot of work (and I can’t use Vercel hosting…)
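For anyone landing here, this is roughly what the core of such a proxy might look like. It is only a sketch, not a tested deployment: `SocketLike`, `buildUpstreamUrl`, and `relay` are names made up for illustration, and passing the key as a `key` query parameter is an assumption to check against the API docs. The idea is that the browser connects to the proxy, the proxy opens a second socket upstream with the key attached, and messages are forwarded both ways.

```typescript
// Minimal socket shape, modeled on the standard WebSocket interface
// (readyState, send, close, addEventListener).
interface SocketLike {
  readyState: number;
  send(data: unknown): void;
  close(): void;
  addEventListener(type: string, listener: (event: any) => void): void;
}

const OPEN = 1; // same value as WebSocket.OPEN in the standard API

// Hypothetical helper: attach the secret key on the server side only.
// ASSUMPTION: the upstream API accepts the key as a `key` query parameter.
function buildUpstreamUrl(base: string, apiKey: string): string {
  const separator = base.includes("?") ? "&" : "?";
  return `${base}${separator}key=${encodeURIComponent(apiKey)}`;
}

// Forward messages in both directions between the browser-facing socket
// and the upstream socket. Client messages that arrive before the upstream
// socket is open are queued and flushed once it opens.
function relay(client: SocketLike, upstream: SocketLike): void {
  const pending: unknown[] = [];

  client.addEventListener("message", (event) => {
    if (upstream.readyState === OPEN) upstream.send(event.data);
    else pending.push(event.data);
  });

  upstream.addEventListener("open", () => {
    for (const data of pending.splice(0)) upstream.send(data);
  });

  upstream.addEventListener("message", (event) => {
    if (client.readyState === OPEN) client.send(event.data);
  });

  // If either side closes or errors, close the other side too.
  client.addEventListener("close", () => upstream.close());
  upstream.addEventListener("close", () => client.close());
  client.addEventListener("error", () => upstream.close());
  upstream.addEventListener("error", () => client.close());
}
```

On a self-hosted Node server, one way to wire this up would be to accept the browser connection with a WebSocket server library such as `ws`, open a second socket to `buildUpstreamUrl(...)` with the key read from an environment variable, and pass both sockets to `relay`. The key never reaches the client, which is the whole point of the proxy.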
