What's New on Cloudflare's Developer Platform (April)
AutoRAG gets smarter, KV goes bulk, Python Workers cron support, and Queues throughput upgrade
Welcome to a recap of everything announced on the Cloudflare developer platform in April, excluding Developer Week, which I covered extensively here.
It can be hard to keep up to date on everything shipped, so I’m hoping this bite-sized post can keep you in the know, and of course I’ll add my own commentary about the changes too.
I’m still working out exactly what topics and cadence I’ll share on FlaredUp, but I’d like to post a new piece fortnightly at the very least. There will be a variety of pieces, including recaps of the latest developments, case studies, deep dives on Cloudflare technologies, as well as tutorials. Some pieces, like this one, will be relatively small, but others - such as the Developer Week post shared above - will be a lot larger.
Ultimately, I’m doing this in my spare time, so it will always be best-effort, but I’m hoping it can become the source for high quality content on the Cloudflare developer platform.
That’s enough about the Substack; let’s dive into the latest developments.
🔎 Metadata Filter Support Comes to AutoRAG
For those who didn’t catch Developer Week, AutoRAG was one of the new products announced. I dived into a lot of detail about it here, but in short it’s a service that manages every aspect required to build a RAG pipeline.
“Every” might be a stretch considering it’s in beta and still lacking some features, but if you’re looking for a starting point to build a RAG-based solution, AutoRAG is going to do the heavy lifting for you.
Plus, Cloudflare has a track record of releasing early in beta, and rapidly iterating and adding features, as we are about to see. I have no doubts that more features will come, and it will truly cover every aspect required for RAG pipelines.
To summarise AutoRAG, you upload all the information you want to be able to retrieve during RAG to an R2 bucket, create a new AutoRAG that links to that R2 bucket, and Cloudflare handles the rest. It will retrieve the content from R2, parse it, chunk it, generate embeddings, store and index them in Vectorize, and provide a simple interface (SDK/API) to then retrieve content at runtime.
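At runtime, retrieval is then a single call on your Worker’s AI binding. Here’s a minimal sketch: the aiSearch call shape follows the AutoRAG docs, while the binding name AI, the instance name my-autorag, and the mock environment are assumptions purely so the sketch runs outside a Worker.

```javascript
// Querying an AutoRAG instance from a Worker (sketch).
// aiSearch() retrieves relevant chunks AND generates an answer;
// per the docs, use search() instead if you only want the chunks.
async function askAutoRAG(env, question) {
  const result = await env.AI.autorag("my-autorag").aiSearch({
    query: question,
  });
  return result.response;
}

// Hypothetical mock of the AI binding, purely so this runs here;
// in a deployed Worker, env.AI is provided by the runtime.
const mockEnv = {
  AI: {
    autorag: (name) => ({
      aiSearch: async ({ query }) => ({
        response: `Answer to: ${query}`,
        data: [], // retrieved chunks would appear here
      }),
    }),
  },
};

askAutoRAG(mockEnv, "How do I configure AutoRAG?").then((answer) => {
  console.log(answer); // "Answer to: How do I configure AutoRAG?"
});
```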
When AutoRAG was released, one key feature it was missing was the ability to filter for certain information before calculating similarity scores. Not being able to filter meant you needed to create multiple AutoRAGs to segregate data, but that would quickly become untenable. This would be particularly true in a multi-tenant application - as you’d need to continually add AutoRAG bindings to your Worker, and that’s just not going to scale.
With the addition of filtering, you’re able to filter by folder and timestamp. This allows you to group related information into folders within R2 - or designate a folder per tenant - and then specify the folder you wish to use at runtime, or only include records after a given timestamp for example.
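In practice, the filter is just an extra object passed alongside the query. Here’s a sketch of building a per-tenant filter: the comparison-filter shape (eq/gte with a compound and) is based on my reading of the AutoRAG filtering docs, and the folder layout and timestamp value are illustrative assumptions.

```javascript
// Build an AutoRAG filter scoped to one tenant's folder, only
// including documents modified on or after a given timestamp.
// The eq/gte/and shape follows the AutoRAG filtering docs; the
// folder-per-tenant layout is an assumption for illustration.
function tenantFilter(tenantFolder, sinceTimestamp) {
  return {
    type: "and",
    filters: [
      // Only match documents under this tenant's folder in R2...
      { type: "eq", key: "folder", value: tenantFolder },
      // ...and only those modified on or after the timestamp
      { type: "gte", key: "timestamp", value: sinceTimestamp },
    ],
  };
}

// At query time, the filter is passed alongside the query, e.g.:
//   await env.AI.autorag("my-autorag").aiSearch({
//     query: "latest invoices",
//     filters: tenantFilter("customer-a/", "2025-04-01T00:00:00Z"),
//   });
console.log(JSON.stringify(tenantFilter("customer-a/", "2025-04-01T00:00:00Z"), null, 2));
```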
This is a good first step for filtering, but I think it really needs multi-folder support (at present, you can only filter on a single folder, and it doesn’t include subfolders). It does however unlock multi-tenancy, and that has been a hotly-requested feature, so kudos to Cloudflare for adding it so quickly after release.
Additionally, being able to tag a single document with multiple pieces of metadata seems pretty vital too - otherwise you might end up duplicating information all over the place just to adequately filter. I’ve worked with AWS Bedrock Knowledge Bases extensively, and they allow you to provide a metadata file alongside each document, which works pretty well. I’d like to see something similar here to make the filtering a little more advanced and more easily configurable.
Having said that, it’s exceptionally easy to implement AutoRAG. I managed to rip out a significant amount of RAG-related code from one of my applications and replace it with AutoRAG in a matter of minutes.
📚 Workers KV now has Support for Bulk Reads
For the longest time, you could only read one key at a time from Workers KV. This meant that, should you need to read multiple keys, you’d need to parallelise them yourself - or do them sequentially and take the performance hit.
If you’re not familiar with Workers KV, it’s one of the oldest Cloudflare developer products and provides a globally-distributed key-value store. In terms of its interface, you can think of it in the same way you would Redis. That is where the similarities end though, because unlike Redis, which stores data in-memory, Workers KV runs off of disk.
That naturally makes it slower, but it comes with some benefits. Workers KV is replicated globally and automatically, whereas with Redis you’d need to configure read replicas and likely manage them yourself.

Getting into the weeds of how Workers KV operates is a little beyond this post, but I’d recommend reading the product page if you’re interested. It’s designed for high-read use cases, and with KV being replicated globally, reads remain incredibly quick, as your data will be cached close to end users.

Writes will always take a little longer, as those always need to go to the central data stores. It’s also worth noting KV operates on an eventually consistent model, so it’s possible to write a value and then, reading from another part of the world shortly after, get an older value back.
Back to the latest changes, and it’s now possible to read up to 100 keys from KV in a single request. Considering you could parallelise the requests in the past, why is this a big deal?
Because Workers have a built-in limit of 6 simultaneous outbound connections, meaning if you need to read more than 6 keys, you’ll effectively only be able to read 6 in parallel and any requests beyond that will be queued. That’s going to make pulling a high number of keys incredibly slow, even if the keys are cached close to where your request originates.
This problem goes away with bulk reads though, as you can read 100 keys with a single outbound request, removing the need to do any form of parallelisation yourself (unless you’re reading a lot of keys) and getting around the outbound connection limit.
The SDK update that enables this is really clean: it’s as simple as passing an array of keys to a get request rather than a single string:
const keys = ["key-a", "key-b", "key-c", ...]
const values = await env.KV_NAMESPACE.get(keys);
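The bulk form returns a Map keyed by the requested keys, with null for any key that doesn’t exist. Here’s a quick sketch of consuming that result; the mock namespace below is a stand-in for a real env.KV_NAMESPACE binding, purely so the shape is runnable outside a Worker.

```javascript
// Mock KV namespace standing in for env.KV_NAMESPACE, for
// illustration only; real bulk reads accept up to 100 keys per call.
const mockNamespace = {
  store: new Map([
    ["key-a", "alpha"],
    ["key-b", "beta"],
  ]),
  async get(keys) {
    // Bulk get returns a Map of key -> value, with null for misses,
    // per the Workers KV docs.
    return new Map(keys.map((k) => [k, this.store.get(k) ?? null]));
  },
};

mockNamespace.get(["key-a", "key-b", "key-c"]).then((values) => {
  console.log(values.get("key-a")); // "alpha"
  console.log(values.get("key-c")); // null (key not found)
});
```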
This is one of those really nice quality-of-life updates that often get overlooked, but they help move the platform forward and remove blockers to adoption.
Enjoying the article and want to get a head start developing with Cloudflare? I’ve published a book, Serverless Apps on Cloudflare, that introduces you to all of the platform’s key offerings by guiding you through building a series of applications on Cloudflare.
Buy now: eBook | Paperback
⏱️ Python Workers can now Utilise Cron Triggers
If you need to run tasks on a schedule with Cloudflare, the easiest way is to use a cron trigger. You could use Workflows or Durable Objects too, but for simple, recurring tasks, a cron trigger works pretty nicely.
For as long as I can remember, cron triggers have been supported on JavaScript-based Cloudflare Workers. Until recently, JavaScript (and TypeScript) were the only native languages supported for Cloudflare Workers.
However, last year, Cloudflare added Python as a native option for writing Workers. Python Workers are slowly catching up to the capabilities available for their JS counterparts, and the latest addition is the ability to execute cron triggers with Python.
It’s worth noting you can also write Workers in any language that compiles to WebAssembly!
As with most things on the Cloudflare developer platform, the developer experience (DX) for implementing a cron trigger couldn’t be simpler. There’s no faffing about with crontabs; you simply configure the cron’s schedule in your Worker’s wrangler.toml:
[triggers]
crons = [ "*/3 * * * *", "*/15 * * * *" ]
And then add a function to your Python Worker that will be called whenever a cron schedule occurs:
from workers import handler

@handler
async def on_scheduled(event, env, ctx):
    if event.cron == "*/3 * * * *":
        await do_something_3()
    elif event.cron == "*/15 * * * *":
        await do_something_15()
    print("cron processed")
It’s as simple as that. The on_scheduled function will be called on the schedule you set, and you can execute whatever code you need, with access to your Worker’s bindings too.
If the Python code above is invalid, blame ChatGPT. I took the JS code from one of my projects and had it convert it. Besides that code snippet, and as always, every word in this post is written by hand with no AI involvement.
📈 Increased limits for Queues pull consumers
This one is pretty straightforward, so I won’t spend too much time on it, but if you’re using, or looking at using, Cloudflare Queues from your existing infrastructure, your available throughput just got a significant increase.
Cloudflare Queues are exactly what they sound like: message queues. You publish a message to a queue, and then a consumer can pull messages off the queue and process them as needed. As with everything on the developer platform, they are serverless, so you just create a queue and Cloudflare handles the rest.
The most common approach to consuming from a queue is using a Worker. Much like the cron triggers above, you simply add a queue function to your Worker, bind a queue to that Worker, and Cloudflare will invoke that function whenever messages are available.
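Here’s a minimal sketch of that handler. The queue() signature follows the Queues docs, while the fake batch and the returned array are assumptions so the sketch runs outside a Worker.

```javascript
// Worker-based queue consumer (sketch). In a real Worker this object
// would be the default export; here it's a const so it runs anywhere.
const worker = {
  async queue(batch, env, ctx) {
    const processed = [];
    for (const message of batch.messages) {
      processed.push(message.body); // replace with real processing
      message.ack(); // acknowledge so the message isn't redelivered
    }
    return processed; // a real handler doesn't need to return anything
  },
};

// Hypothetical batch, mimicking what Cloudflare passes to queue():
const fakeBatch = {
  queue: "my-queue",
  messages: [
    { id: "1", body: { orderId: 42 }, ack() {}, retry() {} },
    { id: "2", body: { orderId: 43 }, ack() {}, retry() {} },
  ],
};

worker.queue(fakeBatch, {}, {}).then((p) => console.log(p.length)); // 2
```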
The throughput for Worker-based consumers has been 5,000 messages/second per queue for a while now, but there is a second way you can consume messages from a queue - HTTP-based pull consumers. Rather than deploying a Worker, you could use your existing infrastructure to poll the queue for messages via Cloudflare’s REST API.
Previously, pull consumers were rate limited to 1,200 requests / 5 minutes across all queues - significantly lower than the throughput available for a Worker consumer. With this change though, pull-based consumers now have the same throughput available as Workers, at 5,000 messages/second per queue.
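For pull-based consumers, each poll is a plain authenticated POST. Here’s a sketch of building that request: the endpoint path and body fields follow the Queues pull-consumer docs as I understand them, and the account ID, queue ID, and token are placeholders.

```javascript
// Build the HTTP request used to pull a batch of messages from a
// queue. The endpoint path and body fields are based on the Queues
// pull-consumer docs; accountId, queueId and apiToken are placeholders.
function buildPullRequest(accountId, queueId, apiToken) {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/queues/${queueId}/messages/pull`,
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        batch_size: 10, // messages to pull per request
        visibility_timeout_ms: 30000, // how long pulled messages stay hidden
      }),
    },
  };
}

// Usage (from any infrastructure that can make HTTP calls):
//   const { url, options } = buildPullRequest(ACCOUNT_ID, QUEUE_ID, TOKEN);
//   const res = await fetch(url, options);
//   const { result } = await res.json(); // process messages, then ack them
const { url } = buildPullRequest("acct-id", "queue-id", "token");
console.log(url);
```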
Wrapping Up
That wraps up everything in April outside of Developer Week. As you can imagine, there was a huge focus on Developer Week drops, so it’s unsurprising the rest of April was relatively quiet.
Even so, the additions above are still meaningful and push the platform forward - and that kind of steady progress happens every single month.
I’ll try to do one of these every two weeks, but they may push out to a monthly cadence depending on how much content is available, and what other content I’m working on.