What is Segment Anything?
Segment Anything is a Meta AI research project that introduces a new AI model called Segment Anything Model (SAM), which can "cut out" any object in an image with a single click. SAM is a promptable segmentation system with zero-shot generalization to unfamiliar objects and images, without the need for additional training.
Features of Segment Anything
Segment Anything Model (SAM) has several features, including:
- Promptable design that enables flexible integration with other systems
- Extensible outputs that can be used as inputs to other AI systems
- Zero-shot generalization to unfamiliar objects and images without requiring additional training
- Ability to take input prompts from other systems, such as object detectors or user gaze from an AR/VR headset
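As a concrete illustration of the last feature, the sketch below feeds a bounding box, of the kind an object detector might produce, to SAM as a prompt via the open-source segment_anything package. It assumes a downloaded ViT-H checkpoint and a GPU; the checkpoint path, image file, and box coordinates are placeholders.

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a SAM checkpoint (placeholder path; the ViT-H weights are linked from the GitHub repo).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to("cuda")
predictor = SamPredictor(sam)

# Read an image as RGB (placeholder file name) and embed it once.
image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A box prompt in XYXY pixel coordinates, e.g. copied from an object detector's output.
detector_box = np.array([100, 150, 400, 500])

masks, scores, _ = predictor.predict(
    box=detector_box,
    multimask_output=False,   # a tight box is usually unambiguous, so one mask is enough
)
print(masks.shape, scores)    # (1, H, W) boolean mask plus its predicted quality score
```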
How to use Segment Anything
You can try the Segment Anything demo to see how it works. SAM accepts a variety of input prompts, including points, boxes, and text, and uses them to segment objects in an image. You can also have SAM automatically segment everything in an image, or return multiple valid masks for an ambiguous prompt.
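Continuing the sketch from the features section above (the same sam, predictor, and image objects), the snippet below shows both modes just described: a single foreground click that returns several candidate masks when the prompt is ambiguous, and the automatic mode that segments everything in the image. The click coordinates are arbitrary placeholders.

```python
import numpy as np
from segment_anything import SamAutomaticMaskGenerator

# 1) Prompted mode: a single foreground click (reuses `predictor` from the earlier sketch).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),   # (x, y) pixel coordinates of the click
    point_labels=np.array([1]),            # 1 = foreground point, 0 = background point
    multimask_output=True,                 # ask for several valid masks for an ambiguous click
)
best_mask = masks[np.argmax(scores)]       # keep the mask SAM scores highest

# 2) Automatic mode: segment everything in the image (reuses `sam` and `image`).
mask_generator = SamAutomaticMaskGenerator(sam)
all_masks = mask_generator.generate(image)  # list of dicts with "segmentation", "area", "bbox", ...
print(f"{len(all_masks)} masks found")
```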
Price of Segment Anything
Segment Anything is free to use: it is a Meta AI research project, and the model and code are open-sourced on GitHub.
Helpful Tips for Segment Anything
- SAM can be used for a wide range of segmentation tasks without the need for additional training
- The model can be integrated with other AI systems to enable more complex tasks, such as object tracking in videos or image editing applications (see the sketch after these tips)
- The dataset used to train SAM (SA-1B) includes over 1.1 billion segmentation masks collected on ~11 million licensed and privacy-respecting images
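To make the image-editing tip concrete, here is a minimal, hypothetical helper that turns a SAM mask into an RGBA cut-out with a transparent background. The cut_out function is not part of the segment_anything package, just a few lines of NumPy/OpenCV applied to a mask like the one produced in the earlier sketches.

```python
import numpy as np
import cv2

def cut_out(image_rgb: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Return an RGBA cut-out: pixels outside the mask become fully transparent."""
    alpha = mask.astype(np.uint8) * 255      # boolean mask -> 0/255 alpha channel
    return np.dstack([image_rgb, alpha])     # H x W x 4

# Example usage with the `image` and `best_mask` from the earlier sketches:
# cutout = cut_out(image, best_mask)
# cv2.imwrite("cutout.png", cv2.cvtColor(cutout, cv2.COLOR_RGBA2BGRA))
```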
Frequently Asked Questions about Segment Anything
- What types of prompts are supported by SAM?
  - Foreground/background points, a bounding box, a mask, and text prompts
- What is the structure of the SAM model?
  - A ViT-H image encoder, a prompt encoder, and a lightweight transformer-based mask decoder
- What platforms does the model use?
  - PyTorch and ONNX Runtime
- How big is the model?
  - The image encoder has 632M parameters; the prompt encoder and mask decoder have 4M parameters
- How long does inference take?
  - The image encoder takes ~0.15 seconds on an NVIDIA A100 GPU; the prompt encoder and mask decoder take ~50 ms on CPU in the browser using multithreaded SIMD execution
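That split, a heavy image encoder run once per image and a lightweight decoder run per prompt, is why the practical pattern is to embed an image once and then answer many prompts cheaply. The rough timing sketch below illustrates the pattern; the checkpoint path, image file, and click coordinates are placeholders, and the numbers you see will depend on your hardware.

```python
import time
import cv2
import numpy as np
import torch
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to("cuda")  # placeholder path
predictor = SamPredictor(sam)
image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)               # placeholder image

t0 = time.perf_counter()
predictor.set_image(image)   # heavy step: the ViT-H image encoder runs once per image
torch.cuda.synchronize()     # make sure the GPU work has finished before reading the clock
print(f"encode: {time.perf_counter() - t0:.3f}s")

# Each prompt reuses the cached image embedding; only the small decoder runs per prompt.
for x, y in [(100, 120), (250, 300), (400, 220)]:   # arbitrary example clicks
    t0 = time.perf_counter()
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),
        multimask_output=True,
    )
    print(f"decode at ({x}, {y}): {time.perf_counter() - t0:.3f}s, best score {scores.max():.2f}")
```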