What is AWS AsyncInf:ml?

Last updated: May 07, 2025

AsyncInf:ml is a solution for handling inference requests by queuing them and processing them asynchronously. It is well suited to use cases with large data payloads or models with long processing times, where an immediate response is not required. Pricing is based on the selected instance type.
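The queue-and-process pattern described above can be illustrated with a minimal, purely conceptual sketch using only the Python standard library. It is not the AWS API: `slow_model`, `worker`, and `run_async_batch` are hypothetical names chosen for illustration, and a real deployment would rely on the managed service rather than an in-process queue.

```python
import queue
import threading


def slow_model(payload):
    """Hypothetical stand-in for a model with a long processing time."""
    return payload.upper()


def worker(requests, results):
    """Pull queued requests one at a time and store each result by request id."""
    while True:
        item = requests.get()
        if item is None:  # sentinel value: no more work will arrive
            requests.task_done()
            break
        request_id, payload = item
        results[request_id] = slow_model(payload)
        requests.task_done()


def run_async_batch(payloads):
    """Enqueue every payload without waiting, then collect results once all are done."""
    requests = queue.Queue()
    results = {}
    thread = threading.Thread(target=worker, args=(requests, results))
    thread.start()
    for request_id, payload in enumerate(payloads):
        requests.put((request_id, payload))  # returns immediately; work happens later
    requests.put(None)  # tell the worker to stop after the queued items
    requests.join()     # block until every queued item has been processed
    thread.join()
    return results
```

The caller submits requests without blocking on each one and picks up the results afterwards, which is the trade-off that makes this style of inference a fit for large payloads and slow models.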

Available Regions
Line Item	Region
APN2-AsyncInf:ml.m4.xlarge	Asia Pacific region in Seoul, South Korea
USE1-AsyncInf:ml.c4.xlarge	US East region in Northern Virginia
USE1-AsyncInf:ml.g4dn.xlarge	US East region in Northern Virginia
USW2-AsyncInf:ml.c5.2xlarge	US West region in Oregon
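
When working with billing data, it can help to split a line item from the table above into its parts. The following is a minimal sketch that assumes every line item follows the `PREFIX-AsyncInf:instance.type` shape shown in the table; `REGION_PREFIXES` and `parse_line_item` are hypothetical names, and the prefix mapping covers only the three regions listed here.

```python
# Region prefixes taken from the table above (assumption: other prefixes exist
# but are not covered by this sketch).
REGION_PREFIXES = {
    "APN2": "Asia Pacific (Seoul)",
    "USE1": "US East (Northern Virginia)",
    "USW2": "US West (Oregon)",
}


def parse_line_item(line_item):
    """Split a line item such as 'USE1-AsyncInf:ml.c4.xlarge' into its parts."""
    prefix, _, rest = line_item.partition("-")        # region prefix before the first '-'
    usage, _, instance_type = rest.partition(":")     # usage name before the first ':'
    return {
        "region": REGION_PREFIXES.get(prefix, "unknown"),
        "usage": usage,
        "instance_type": instance_type,
    }


if __name__ == "__main__":
    print(parse_line_item("APN2-AsyncInf:ml.m4.xlarge"))
```

Because pricing depends on the instance type, the `instance_type` field is the part most useful for grouping costs.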