# Auto-Batching
# 1. Summary
Meilisearch can automatically group consecutive asynchronous `documentAddition` or `documentPartial` tasks for the same index via an automatic batching mechanism. The user can disable this auto-batching behavior; see the 3.2. Auto-batching mechanism options section.
# 2. Motivation
We have regularly collected user pain points about slow indexing over the last year. We have repeatedly advised users to build batches containing as many documents as possible to add or update, in order to compress the indexing time of specific data structures.
To make Meilisearch easier to use, we explored the idea of automatically creating these batches within Meilisearch before indexing users’ documents.
# 3. Functional Specification
# 3.1. Explanations
A batch preserves the logical order of the tasks for a given index. Only consecutive `documentAdditionOrUpdate` tasks for the same index can be in the same batch. Tasks concerning any other operation are also part of a batch, but that batch contains only one task.
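To make the grouping rule concrete, here is a minimal Rust sketch. The `Task`, `TaskKind`, and `split_into_batches` names are illustrative assumptions for this document, not Meilisearch's actual scheduler types; the sketch only mirrors the rule stated above.

```rust
/// Illustrative task kinds; not Meilisearch's real internal enum.
#[derive(Debug, Clone, PartialEq)]
enum TaskKind {
    DocumentAdditionOrUpdate,
    SettingsUpdate,
    IndexDeletion,
}

#[derive(Debug, Clone)]
struct Task {
    uid: u64,
    index_uid: String,
    kind: TaskKind,
}

/// Groups consecutive `documentAdditionOrUpdate` tasks targeting the same
/// index into one batch; every other task becomes a single-task batch.
fn split_into_batches(queue: &[Task]) -> Vec<Vec<Task>> {
    let mut batches: Vec<Vec<Task>> = Vec::new();
    for task in queue {
        let extends_previous = task.kind == TaskKind::DocumentAdditionOrUpdate
            && batches.last().map_or(false, |batch| {
                let previous = batch.last().unwrap();
                previous.kind == TaskKind::DocumentAdditionOrUpdate
                    && previous.index_uid == task.index_uid
            });
        if extends_previous {
            batches.last_mut().unwrap().push(task.clone());
        } else {
            batches.push(vec![task.clone()]);
        }
    }
    batches
}
```

For example, a queue where tasks 0, 1, and 4 are `documentAdditionOrUpdate` on `movies`, task 2 is a `documentAdditionOrUpdate` on `books`, and task 3 is a `settingsUpdate` would yield the batches `[0, 1]`, `[2]`, `[3]`, and `[4]`.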
# 3.1.1. Grouping tasks to a single batch
The scheduling program that groups tasks into a single batch is triggered when the asynchronous task currently being processed reaches a terminal state, i.e. `succeeded` or `failed`.

In other words, when a scheduled `documentAdditionOrUpdate` task for a given index is picked from the task queue, the scheduler fetches and groups all the consecutive `documentAdditionOrUpdate` tasks for that same index into a batch.
The more similar consecutive tasks the user sends in a row, the more of them the batching mechanism can group into a single batch.
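Below is a minimal sketch of this trigger, reusing the illustrative `Task` and `TaskKind` types from the previous example. The `VecDeque`-based queue and the `next_batch` function are assumptions made for illustration only, not Meilisearch's actual scheduler code.

```rust
use std::collections::VecDeque;

/// Builds the next batch once the previously processed task has reached a
/// terminal state (`succeeded` or `failed`). Returns `None` when the queue
/// is empty.
fn next_batch(queue: &mut VecDeque<Task>) -> Option<Vec<Task>> {
    let head = queue.pop_front()?;
    let index_uid = head.index_uid.clone();
    let is_addition = head.kind == TaskKind::DocumentAdditionOrUpdate;
    let mut batch = vec![head];

    // Only `documentAdditionOrUpdate` tasks are batched: drain the queue as
    // long as the next task has the same kind and targets the same index.
    if is_addition {
        while queue.front().map_or(false, |next| {
            next.kind == TaskKind::DocumentAdditionOrUpdate && next.index_uid == index_uid
        }) {
            batch.push(queue.pop_front().unwrap());
        }
    }
    Some(batch)
}
```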
# 3.1.1.1. Schema
# 3.1.1.2. `batchUid` generation

All tasks are part of a batch identified by an internal `batchUid` field. A task batch preserves the logical order of the tasks for a given index. The batch identifiers are unique and strictly increasing. The `batchUid` field is internal and thus not visible on a `task` resource.
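How these identifiers are produced is not specified here; a simple counter is enough to satisfy the unique, strictly increasing constraint, as in this hypothetical sketch (the `BatchUidGenerator` name is an assumption for illustration):

```rust
/// Hypothetical allocator for `batchUid` values: each call returns a fresh,
/// strictly increasing identifier, so batches can be totally ordered.
struct BatchUidGenerator {
    next: u64,
}

impl BatchUidGenerator {
    fn new() -> Self {
        Self { next: 0 }
    }

    fn next_uid(&mut self) -> u64 {
        let uid = self.next;
        self.next += 1;
        uid
    }
}
```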
# 3.1.2. Impacts on the `task` API resource

- The different tasks grouped in a batch are processed within the same transaction, but if a task fails within a batch, the whole batch does not fail; only the related task does.
- Tasks within the same batch share the same values for the `startedAt`, `finishedAt`, and `duration` fields, and the same `error` object if an error occurs for a `task` during the batch processing (see the sketch after this list).
- If a batch contains many `tasks`, the `indexedDocuments` value of the `task` `details` is identical in all `tasks` belonging to the same processed `batch`.
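The field sharing described above can be pictured as a single write-back step applied to every task of a processed batch. The following Rust sketch uses assumed type and field names (`TaskView`, `BatchResult`, `apply_batch_result`); it is not Meilisearch's actual code.

```rust
use std::time::{Duration, SystemTime};

#[derive(Debug, Clone)]
struct TaskError {
    message: String,
    code: String,
}

#[derive(Debug, Default)]
struct TaskView {
    started_at: Option<SystemTime>,
    finished_at: Option<SystemTime>,
    duration: Option<Duration>,
    error: Option<TaskError>,
    indexed_documents: Option<u64>,
}

struct BatchResult {
    started_at: SystemTime,
    finished_at: SystemTime,
    error: Option<TaskError>,
    indexed_documents: u64,
}

/// Applies the outcome of one processed batch to all of its tasks: the same
/// timing fields, the same optional error, and the same `indexedDocuments`
/// detail are written back to every task.
fn apply_batch_result(tasks: &mut [TaskView], result: &BatchResult) {
    let duration = result
        .finished_at
        .duration_since(result.started_at)
        .unwrap_or_default();
    for task in tasks {
        task.started_at = Some(result.started_at);
        task.finished_at = Some(result.finished_at);
        task.duration = Some(duration);
        task.error = result.error.clone();
        task.indexed_documents = Some(result.indexed_documents);
    }
}
```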
# 4. Technical Aspects
N/A
# 5. Future Possibilities
- Extend it to all consecutive payload types.
- Expose the `batchUid` field and add a filter capability on it on the `/tasks` endpoints.
- Report the documents that could not be indexed to the user in a more precise manner.
- Optimize some task sequences; for example, if a document addition is followed by an index deletion, we could skip the document addition.