[–]xoenix[S] 1 insightful - 1 fun (1 child)

Roughly, the “safety” architecture designed around image generation (slightly different than text) looks like this: a user makes a request for an image in the chat interface, which Gemini — once it realizes it’s being asked for a picture — sends on to a smaller LLM that exists specifically for rewriting prompts in keeping with the company’s thorough “diversity” mandates. This smaller LLM is trained with LoRA on synthetic data generated by another (third) LLM that uses Google’s full, pages-long diversity “preamble.” The second LLM then rephrases the question (say, “show me an auto mechanic” becomes “show me an Asian auto mechanic in overalls laughing, an African American female auto mechanic holding a wrench, a Native American auto mechanic with a hard hat” etc.), and sends it on to the diffusion model. The diffusion model checks to make sure the prompts don’t violate standard safety policy (things like self-harm, anything with children, images of real people), generates the images, checks the images again for violations of safety policy, and returns them to the user.

“Three entire models all kind of designed for adding diversity,” I asked one person close to the safety architecture. “It seems like that — diversity — is a huge, maybe even central part of the product. Like, in a way it is the product?”

“Yes,” he said, “we spend probably half of our engineering hours on this.”
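
If that description is accurate, the control flow is roughly what's sketched below. This is a minimal toy sketch with stub functions standing in for the models (rewriter_llm, passes_safety_policy, and diffusion_model are invented names), not anything from Google's actual stack:

```python
from typing import List


def rewriter_llm(prompt: str) -> List[str]:
    # Stand-in for the smaller LoRA-tuned LLM that rewrites one request
    # into several demographically varied prompts.
    return [
        f"{prompt}, as an Asian auto mechanic in overalls laughing",
        f"{prompt}, as an African American female auto mechanic holding a wrench",
        f"{prompt}, as a Native American auto mechanic with a hard hat",
    ]


def passes_safety_policy(item: str) -> bool:
    # Stand-in for the standard safety filter (self-harm, minors,
    # images of real people, etc.).
    banned = ("self-harm", "child", "real person")
    return not any(term in item.lower() for term in banned)


def diffusion_model(prompt: str) -> str:
    # Stand-in for the image generator; returns a placeholder string
    # instead of actual pixels.
    return f"<image: {prompt}>"


def handle_image_request(user_prompt: str) -> List[str]:
    images = []
    for rewritten in rewriter_llm(user_prompt):    # prompt rewriting
        if not passes_safety_policy(rewritten):    # pre-generation check
            continue
        image = diffusion_model(rewritten)         # image generation
        if passes_safety_policy(image):            # post-generation check
            images.append(image)
    return images


print(handle_image_request("show me an auto mechanic"))
```

Even in toy form you need three separate components just to answer one image request, which tracks with the "three entire models" line above.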

If you could remove the layer of shit on top of the core product, maybe they'd have something useful. But now I'm beginning to wonder if it's just an unremarkable AI that will be matched or surpassed by unrestricted open source AIs.

[–]OuroborosTheory 1 insightful - 1 fun (0 children)

this is just the Village People