Scale your AI application with Neon

Scale your AI application with Neon's Autoscaling and Read Replica features

You can scale an AI application built on Postgres with pgvector the same way you would scale any Postgres app: vertically, by adding CPU, RAM, and storage, or horizontally, by adding read replicas.

In Neon, scaling vertically is a matter of selecting the desired compute size. Neon supports compute sizes ranging from 0.25 vCPU with 1 GB RAM up to 56 vCPU with 224 GB RAM. Autoscaling is supported up to 16 vCPU. Larger computes are fixed-size computes (no autoscaling). The maintenance_work_mem values shown below are approximate.

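| Compute Units (CU) | vCPU | RAM    | maintenance_work_mem |
|--------------------|------|--------|----------------------|
| 0.25               | 0.25 | 1 GB   | 64 MB                |
| 0.50               | 0.50 | 2 GB   | 64 MB                |
| 1                  | 1    | 4 GB   | 67 MB                |
| 2                  | 2    | 8 GB   | 134 MB               |
| 3                  | 3    | 12 GB  | 201 MB               |
| 4                  | 4    | 16 GB  | 268 MB               |
| 5                  | 5    | 20 GB  | 335 MB               |
| 6                  | 6    | 24 GB  | 402 MB               |
| 7                  | 7    | 28 GB  | 470 MB               |
| 8                  | 8    | 32 GB  | 537 MB               |
| 9                  | 9    | 36 GB  | 604 MB               |
| 10                 | 10   | 40 GB  | 671 MB               |
| 11                 | 11   | 44 GB  | 738 MB               |
| 12                 | 12   | 48 GB  | 805 MB               |
| 13                 | 13   | 52 GB  | 872 MB               |
| 14                 | 14   | 56 GB  | 939 MB               |
| 15                 | 15   | 60 GB  | 1007 MB              |
| 16                 | 16   | 64 GB  | 1074 MB              |
| 18                 | 18   | 72 GB  | 1208 MB              |
| 20                 | 20   | 80 GB  | 1342 MB              |
| 22                 | 22   | 88 GB  | 1476 MB              |
| 24                 | 24   | 96 GB  | 1610 MB              |
| 26                 | 26   | 104 GB | 1744 MB              |
| 28                 | 28   | 112 GB | 1878 MB              |
| 30                 | 30   | 120 GB | 2012 MB              |
| 32                 | 32   | 128 GB | 2146 MB              |
| 34                 | 34   | 136 GB | 2280 MB              |
| 36                 | 36   | 144 GB | 2414 MB              |
| 38                 | 38   | 152 GB | 2548 MB              |
| 40                 | 40   | 160 GB | 2682 MB              |
| 42                 | 42   | 168 GB | 2816 MB              |
| 44                 | 44   | 176 GB | 2950 MB              |
| 46                 | 46   | 184 GB | 3084 MB              |
| 48                 | 48   | 192 GB | 3218 MB              |
| 50                 | 50   | 200 GB | 3352 MB              |
| 52                 | 52   | 208 GB | 3486 MB              |
| 54                 | 54   | 216 GB | 3620 MB              |
| 56                 | 56   | 224 GB | 3754 MB              |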

See Edit a compute to learn how to configure your compute size. Available compute sizes differ according to your Neon plan. The Neon Free Plan supports computes starting at 0.25 CU, up to 2 CU with autoscaling enabled. The Launch plan offers compute sizes up to 4 CU. Larger computes are available on the Scale and Business plans. See Neon plans.

To optimize pgvector index build time, you can increase the maintenance_work_mem setting for the current session beyond the preconfigured default shown in the table above with a command similar to this:

SET maintenance_work_mem='10 GB';

The recommended maintenance_work_mem setting is your working set size (the size of your tuples for vector index creation). However, your maintenance_work_mem setting should not exceed 50 to 60 percent of your compute's available RAM (see the table above). For example, the maintenance_work_mem='10 GB' setting shown above has been successfully tested on a 7 CU compute, which has 28 GB of RAM, as 10 GB is less than 50% of the RAM available for that compute size.
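
The sizing rule above can be sketched as a small helper. This is a minimal illustration, not a Neon utility: the function names and the `max_ram_fraction` parameter are hypothetical, and the 0.5 default reflects the lower end of the 50 to 60 percent guidance.

```python
MB_PER_GB = 1024


def recommended_maintenance_work_mem_mb(working_set_mb, compute_ram_gb, max_ram_fraction=0.5):
    """Return the working set size in MB, capped at a fraction of compute RAM.

    Hypothetical helper: the working set size is something you estimate yourself
    (the size of the tuples involved in the vector index build).
    """
    if working_set_mb <= 0 or compute_ram_gb <= 0:
        raise ValueError("sizes must be positive")
    if not 0 < max_ram_fraction <= 1:
        raise ValueError("max_ram_fraction must be in (0, 1]")
    cap_mb = int(compute_ram_gb * MB_PER_GB * max_ram_fraction)
    return min(int(working_set_mb), cap_mb)


def set_statement(mem_mb):
    """Build the session-level SET command for the chosen value."""
    return f"SET maintenance_work_mem = '{mem_mb}MB';"


# A 7 CU compute has 28 GB of RAM, so the cap is 14 GB and a 10 GB working set fits.
print(set_statement(recommended_maintenance_work_mem_mb(10 * MB_PER_GB, 28)))
```

If the estimated working set exceeds the cap, the helper returns the cap instead, which is a signal that a larger compute may be worth considering for the index build.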

Autoscaling

You can also enable Neon's Autoscaling feature, which automatically scales compute resources (vCPU and RAM) up on demand in response to application workload and down to zero on inactivity.

For example, if your AI application experiences heavy load during certain hours of the day or at different times throughout the week, month, or calendar year, Neon automatically scales compute resources without manual intervention according to the compute size boundaries that you configure. This enables you to handle peak demand while avoiding consuming compute resources during periods of low activity.
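
To make the role of the configured boundaries concrete, here is a toy model of how a compute size might track demand. It is only an illustration under stated assumptions: `target_compute_cu` is a hypothetical function, the demand figures are made up, and this is not Neon's actual scaling algorithm.

```python
def target_compute_cu(demand_cu, min_cu, max_cu, active=True):
    """Toy model: clamp the demanded compute to the configured boundaries.

    Returns 0.0 when the workload is inactive (scale to zero).
    """
    if not 0 < min_cu <= max_cu:
        raise ValueError("expected 0 < min_cu <= max_cu")
    if not active:
        return 0.0
    return max(min_cu, min(demand_cu, max_cu))


# Hypothetical demand samples (in CU) over a day, with boundaries of 0.25 to 4 CU.
samples = [(0.0, False), (0.1, True), (2.0, True), (6.0, True)]
for demand, active in samples:
    size = target_compute_cu(demand, min_cu=0.25, max_cu=4, active=active)
    print(f"demand={demand} active={active} -> compute={size} CU")
```

The point of the sketch is that demand above the maximum is capped at the upper boundary, so the maximum you configure should cover your expected peak.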

Enabling autoscaling is also recommended for initial data loads and memory-intensive index builds to ensure sufficient compute resources for this phase of your AI application setup.

To learn more about Neon's autoscaling feature and how to enable it, refer to our Autoscaling guide.

Storage

Neon's data storage allowances differ by plan. The Free plan offers 512 MB of storage. The Launch, Scale, and Business plans support larger data sizes and purchasing additional units of storage. See Neon plans.

Read replicas

Neon supports read replicas, which are independent read-only computes designed to perform read operations on the same data as your primary read-write compute. Read replicas do not replicate data across database instances. Instead, read requests are directed to the same data source. This architecture lets you create read replicas instantly and scale out CPU and RAM, and because data is read from a single source, there are no additional storage costs.

Since vector similarity search is a read-only workload, you can use read replicas to offload reads from your primary read-write compute to a dedicated compute when deploying AI applications. After you create a read replica, you only need to swap your current Neon connection string for the read replica connection string in the parts of your application that perform reads.
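
One way to wire this into an application is to choose the connection string per query type. The sketch below assumes two hypothetical environment variable names, `DATABASE_URL` and `DATABASE_URL_READ_REPLICA`, and a hypothetical `items` table with an `embedding` column. The query uses pgvector's `<->` distance operator. No database driver is involved, so the snippet only shows the routing decision.

```python
import os

# Hypothetical similarity query; table and column names are placeholders.
SIMILARITY_QUERY = "SELECT id FROM items ORDER BY embedding <-> %s LIMIT 5"


def pick_connection_string(read_only, env=None):
    """Return the replica connection string for reads, else the primary.

    Falls back to the primary when no replica is configured.
    """
    env = os.environ if env is None else env
    primary = env["DATABASE_URL"]  # assumed variable name
    replica = env.get("DATABASE_URL_READ_REPLICA")  # assumed variable name
    if read_only and replica:
        return replica
    return primary


# Example with placeholder connection strings instead of real credentials.
example_env = {
    "DATABASE_URL": "postgresql://user:pass@primary.example/db",
    "DATABASE_URL_READ_REPLICA": "postgresql://user:pass@replica.example/db",
}
print(pick_connection_string(read_only=True, env=example_env))
print(pick_connection_string(read_only=False, env=example_env))
```

Writes such as inserting new embeddings still need the primary connection string, since read replicas are read-only.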

Neon's read replicas support the same compute sizes outlined above. Read replicas also support autoscaling.

To learn more about Neon read replicas, see Read replicas and our Working with Neon read replicas guide.
