Meta has had notable success over the past year with Segment Anything, a machine-learning model that can quickly and reliably identify and outline almost anything in an image. The second installment, which CEO Mark Zuckerberg first introduced on stage Monday at SIGGRAPH, brings the model to video, showing how quickly the field is moving.
Segmentation is the technical term for when a vision model looks at an image and picks out its parts: “This is a dog, and this is a tree behind the dog,” and hopefully not “This is a tree growing out of a dog.” Researchers have been doing this for decades, but recently it has gotten much better and faster, with Segment Anything marking a big step forward.
Segment Anything 2 (SA2) is a natural follow-up in that it applies natively to video, not just still images; you can, of course, run the first model on every frame of a video individually, but it’s not the most efficient workflow.
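For a sense of what that per-frame workflow looks like in practice, here is a minimal sketch using the original Segment Anything’s Python package (segment_anything); the checkpoint file, video path, and click coordinates are illustrative assumptions, not values Meta has published.

```python
# Naive per-frame segmentation with the original Segment Anything model.
# The checkpoint path, video file, and click coordinates are placeholders.
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # assumed local checkpoint
predictor = SamPredictor(sam)

point = np.array([[512, 384]])  # hypothetical click on the object of interest
label = np.array([1])           # 1 = foreground point

cap = cv2.VideoCapture("reef.mp4")  # hypothetical input video
frame_masks = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    predictor.set_image(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))  # predictor expects RGB
    masks, scores, _ = predictor.predict(point_coords=point, point_labels=label)
    frame_masks.append(masks[scores.argmax()])  # keep the highest-scoring mask per frame
cap.release()
```

Every frame here is segmented from scratch, with no memory of the one before it; that redundancy, and the lack of temporal consistency, is exactly what a video-native model like SA2 is meant to eliminate.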
“Scientists use these things to study coral reefs and habitats and things like that,” Zuckerberg said in a conversation with Nvidia CEO Jensen Huang. “But being able to do that in video and really capture it and tell it what you want is really cool.”
Video processing naturally demands much more compute, so it’s a testament to efficiency gains across the industry that SA2 can run without melting a data center. The model is still large and requires powerful hardware to operate, but fast, flexible video segmentation at this level would have been virtually impossible even a year ago.
Like the first model, SA2 will be open and free to use; there’s no word on a hosted version, something AI companies sometimes offer, but there is a free demo to try.
Naturally, such a model requires a massive amount of data to train, and Meta has also released a large, annotated database of 50,000 videos that it created specifically for this purpose. In the paper describing SA2, another database of over 100,000 “internally available” videos was also used for training, and this one is not publicly available — I’ve asked Meta for more information about what it is and why it hasn’t been made public. (We believe it’s drawn from public Instagram and Facebook profiles.)
Meta has positioned itself as a leader in “open” AI for the past couple of years, though, as Zuckerberg noted in the conversation, it has been at this for much longer with tools like PyTorch. More recently, LLaMa, Segment Anything, and a few other models it has released for free have set a relatively accessible bar for AI performance in their areas, though their “openness” is a matter of debate.
Zuckerberg noted that the openness isn’t entirely out of the goodness of Meta’s heart, but that doesn’t mean its intentions are impure:
“This isn’t just a piece of software that you can build — you need an ecosystem around it. This software wouldn’t work nearly as well if we didn’t open source it, right? We’re not doing this because we’re altruistic, although I think that would be good for the ecosystem — we’re doing it because we think it will make the thing we’re building better.”
It will certainly be put to good use regardless; the model and code are available on GitHub.