What is AWS AsyncInf:ml?

Last updated: April 10, 2025

AWS AsyncInf:ml is a solution for handling inference requests by queuing them and processing them asynchronously. It is suited to use cases that involve large data payloads or models with long processing times and that do not require an immediate response. Pricing is based on the selected instance type.

Available Regions

The prefix represents the region.

Line Item                        Region
APN2-AsyncInf:ml.m4.xlarge       Asia Pacific region in Seoul, South Korea
USE1-AsyncInf:ml.c4.xlarge       US East region in Northern Virginia
USE1-AsyncInf:ml.g4dn.xlarge     US East region in Northern Virginia
USW2-AsyncInf:ml.c5.2xlarge      US West region in Oregon
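
Since the region prefix and the instance type are both encoded in the line item name, a line item can be split into its parts programmatically. The following is a minimal Python sketch, assuming the format PREFIX-AsyncInf:INSTANCE_TYPE seen in the line items listed above. The names REGION_BY_PREFIX, LineItem, and parse_line_item are hypothetical helpers for illustration, and the mapping covers only the three prefixes that appear in this article.

```python
from typing import NamedTuple

# Prefix-to-region mapping taken only from the line items in this article.
# Other prefixes exist but are not covered here (assumption: they are
# reported as "unknown" rather than guessed).
REGION_BY_PREFIX = {
    "APN2": "Asia Pacific (Seoul, South Korea)",
    "USE1": "US East (Northern Virginia)",
    "USW2": "US West (Oregon)",
}


class LineItem(NamedTuple):
    region_prefix: str   # e.g. "USE1"
    region: str          # human-readable region, or "unknown"
    usage: str           # e.g. "AsyncInf"
    instance_type: str   # e.g. "ml.g4dn.xlarge"


def parse_line_item(line_item: str) -> LineItem:
    """Split a line item such as 'USE1-AsyncInf:ml.g4dn.xlarge' into parts."""
    prefix, dash, rest = line_item.partition("-")
    usage, colon, instance_type = rest.partition(":")
    if not dash or not colon or not instance_type:
        raise ValueError(f"unexpected line item format: {line_item!r}")
    region = REGION_BY_PREFIX.get(prefix, "unknown")
    return LineItem(prefix, region, usage, instance_type)


if __name__ == "__main__":
    for name in ("APN2-AsyncInf:ml.m4.xlarge", "USW2-AsyncInf:ml.c5.2xlarge"):
        print(parse_line_item(name))
```

Splitting on the first hyphen and the first colon keeps the dots inside the instance type (for example ml.c5.2xlarge) intact, which is why partition is used here instead of a general split.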