September 20, 2024

Zuckerberg teases Meta’s latest AI video vision with Nvidia CEO Jensen Huang

Meta has had notable success over the past year with Segment Anything, a machine-learning model that can quickly and reliably recognize and identify almost anything in an image. The second installment, which CEO Mark Zuckerberg first introduced on stage Monday at SIGGRAPH, brings the model to video, showing how quickly the field is moving.

Segmentation is the technical term for when a vision model looks at an image and picks out its parts: “this is a dog, and this is a tree behind the dog,” and hopefully not “this is a tree growing out of a dog.” Vision models have been doing this for decades, but recently it has gotten much better and faster, with Segment Anything marking a big step forward.
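As a rough illustration of what this looks like in practice, here is a minimal sketch using the original segment-anything package that Meta released with the first model. The checkpoint filename, image, and click coordinates are placeholders for this example, not details from the article.

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a Segment Anything checkpoint (downloaded separately from Meta's release).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Hand the predictor an RGB image; the heavy image embedding is computed once here.
image = cv2.cvtColor(cv2.imread("dog_and_tree.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Prompt with a single foreground click on the dog (label 1 = "this point is on the object").
point_coords = np.array([[350, 420]])
point_labels = np.array([1])

# Ask for several candidate masks and keep the best-scoring one, so the model
# returns "the dog" rather than "the dog plus the tree behind it".
masks, scores, _ = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]  # boolean mask the size of the image
```

The prompt is what makes the model interactive: a click, a box, or a rough mask tells it which of the many things in the frame you mean.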

Segment Anything 2 (SA2) is a natural follow-up in that it applies natively to video, not just still images; you could, of course, run the first model on each frame of a video individually, but that is not the most efficient workflow.
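To make the per-frame-versus-native difference concrete, here is a hedged sketch contrasting the two workflows. The sam2 entry points (build_sam2_video_predictor, init_state, add_new_points, propagate_in_video) follow Meta's public repository around the time of the release and may have since changed; the config name, checkpoint, frame folder, and prompt coordinates are placeholders.

```python
import glob
import cv2
import numpy as np
import torch
from segment_anything import sam_model_registry, SamPredictor
from sam2.build_sam import build_sam2_video_predictor

point_coords = np.array([[350, 420]])  # one foreground click (placeholder)
point_labels = np.array([1])
frames = [cv2.cvtColor(cv2.imread(p), cv2.COLOR_BGR2RGB)
          for p in sorted(glob.glob("reef_frames/*.jpg"))]

# Naive workflow: run the original image model on every frame independently.
# The full image embedding is recomputed each time, and the object has no
# identity that carries over from one frame to the next.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
image_predictor = SamPredictor(sam)
for frame in frames:
    image_predictor.set_image(frame)
    per_frame_mask, _, _ = image_predictor.predict(
        point_coords=point_coords,
        point_labels=point_labels,
        multimask_output=False,
    )

# SA2's native video workflow: prompt a single frame, then propagate the mask
# through the clip while the model keeps a memory of the tracked object.
video_predictor = build_sam2_video_predictor(
    "sam2_hiera_l.yaml", "sam2_hiera_large.pt"  # placeholder config/checkpoint
)
with torch.inference_mode():
    state = video_predictor.init_state(video_path="reef_frames/")  # folder of JPEG frames
    video_predictor.add_new_points(
        inference_state=state, frame_idx=0, obj_id=1,
        points=point_coords, labels=point_labels,
    )
    for frame_idx, obj_ids, mask_logits in video_predictor.propagate_in_video(state):
        pass  # e.g. threshold mask_logits > 0 and overlay the mask on the frame
```

The key design difference is that the video model carries the prompt and the object's identity forward on its own, rather than being re-prompted frame by frame.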

“Scientists use these things to study coral reefs and habitats and things like that,” Zuckerberg said in a conversation with Nvidia CEO Jensen Huang. “But being able to do that in video and really capture it and tell it what you want is really cool.”

Video, of course, requires more computation to process, and it is a testament to industry-wide progress in compute efficiency that SA2 can run without bringing down a data center. The model is still large and requires powerful hardware to operate, but fast, flexible segmentation of this kind would have been practically impossible even a year ago.

Image Credits: Meta

The model, like the first one, will be open and free to use, and there is no word of a hosted version, something AI companies sometimes offer. But there is a free demo.


Naturally, such a model would require a massive amount of data to train, and Meta has also released a large, annotated database of 50,000 videos that it created specifically for this purpose. In the paper describing SA2, another database of over 100,000 “internally available” videos was also used for training, and this one is not publicly available — I’ve asked Meta for more information about what it is and why it hasn’t been made public. (We believe it’s drawn from public Instagram and Facebook profiles.)

Examples of labeled training data. Image Credits: Meta

Meta has been a leader in “open” AI for the past few years, though in fact (as Zuckerberg noted in the conversation) it has been open-sourcing its work for much longer, with tools like PyTorch. More recently, LLaMa, Segment Anything, and a few other models it has released for free have become a relatively accessible baseline for AI performance in those areas, though their “openness” is a matter of debate.

Zuckerberg noted that the openness isn’t entirely out of the goodness of Meta’s heart, but that doesn’t mean its intentions aren’t pure:

“This isn’t just a piece of software that you can build — you need an ecosystem around it. This software wouldn’t work nearly as well if we didn’t open source it, right? We’re not doing this because we’re altruistic, although I think that would be good for the ecosystem — we’re doing it because we think it will make the thing we’re building better.”

It will certainly be put to good use either way; the code is available on GitHub.