Meta’s new model can understand images


OpenAI has GPT, Google has Gemini, and xAI has Grok. All of the top AI companies in the industry have their flagship models, and Meta's is Llama. On Wednesday, Meta announced its newest AI model, Llama 3.2, and this update gives the model a set of eyes.

Meta announced some pretty exciting stuff during its event yesterday, like its new Orion glasses. Fans of the company are sure to be excited to see how it wants to blend AI and AR (augmented reality) in inventive ways. We also got a look at the new Meta Quest 3S, a more affordable VR headset from the company.

Meta announced the new Llama 3.2 model, and it can understand images

One of the biggest steps an AI company needs to take is making its models multi-modal, meaning they can understand and create different types of media. A model that can process both text and video, for example, is considered multi-modal.

The ability to understand images gives a model some major advantages. For starters, the model will be able to take in a live video feed and understand what it sees, something that can greatly enhance the AR experience. As pointed out by The Verge, developers will be able to use the model when building AR apps that require a real-time understanding of their surroundings.

Llama 3.2 actually comprises several models with different applications. Two of them are vision models, one with 11 billion parameters and the other with 90 billion. Along with those, there are two text-only models, with 1 billion and 3 billion parameters. Much like Gemini, the smaller Llama models are designed to run on phones.
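For developers curious about what working with the vision models might look like in practice, here is a minimal, hypothetical sketch of asking the 11-billion-parameter vision model about an image using the Hugging Face Transformers library. The model ID, prompt, and image file here are assumptions for illustration, not details from Meta's announcement.

```python
# Hypothetical sketch: querying a Llama 3.2 vision model with an image.
# Assumes the Hugging Face Transformers library (4.45+) and access to the
# assumed model ID below; exact names may differ in practice.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

MODEL_ID = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumed model ID

# Load the multimodal model and the processor that handles image + text input.
model = MllamaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# A single frame, e.g. one grabbed from a camera feed in an AR app.
image = Image.open("frame.jpg")

# Build a chat-style prompt that pairs the image with a text question.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What objects are in front of me?"},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

# Preprocess the image and text together, then generate a short answer.
inputs = processor(image, prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```

The same pattern, with the single image swapped for frames captured from a camera, is roughly how an AR app could feed real-time visuals to the model, though on-device use would likely rely on the smaller variants.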

This means that Gemini might have some competition down the road if these models start to trade blows. Only time will tell if Meta’s model will be any match for what Google has already established.
