Method for batch deleting or updating in Entity Framework Core 5

This article describes a library that enable Entity Framework Core 5 users to batch update and delete data within a single roundtrip without pre-loading data, shows its principle, and compares it with other approaches.

Ⅰ. Background

With Microsoft fully embracing open source, the.NET open source community has been booming, and many excellent open source projects have emerged, such as Dapper, SqlSugar, PetaPoco, FreeSQL and so on. Entity Framework Core (EF Core), the ORM Framework officially provided by Microsoft, is getting the most attention. EF Core is fully-fledged and feature-rich, but one of the things that EF Core has been criticized for is that it does not provide batch updating and batch deleting. Batch updating and deleting data in EF Core involves loading the data into memory, then manipulating the data, and finally SaveChanges(). For example, the following code is used to increase prices of books, which are with an Id greater than 2 or with “zack” in the AuthorName, by 3.

Let’s check the SQL statements executed behind the scenes in the above program:

As can be seen, EF Core first queries the data with Select, then modifies them one by one in memory, and then updates each of the modified objects with Update statements one by one.

For another example, the following code is used to delete records with price greater than 5 dollars:

Let’s check the SQL statements executed behind the scenes in the above program:

As can be seen, EF Core first queries the data with Select and then executes Delete statements for each record to Delete.

Obviously, if the data to updated or deleted is large, the performance will be very low.

Therefore, we need a way to delete or update data with high-performance using a single SQL statement in EF Core.

Ⅱ Why doesn’t Microsoft provide such an approach

Despite strong user demand, Microsoft has not provided an efficient approach to batch delete and update. In the issues page of EF Core Github [1], Microsoft said: doing that can lead to EF Core object state tracking chaos, such as int the same DbContext, if a developer uses the method to delete records, then use the deleted records after deletion, the system will go into chaos. Therefore, Microsoft thought it is needed to refactor EF Core code, It would be a lot of work.

As a mature framework, it is necessary and understandable to consider these logical issues to avoid potential risks. But as real developers, we have ways around these problems. In a typical Web application, for example, the deletion occurs on a single Http request, so it doesn’t involve any of Microsoft’s concerns. Even in scenarios where the data is queried before it is deleted through the same DbContext, it is entirely possible to work around this problem by fetching them again after the deletion.

According to the response from Microsoft to that issue on Github, Microsoft is considering providing the functionality, but it’s just a “consideration”, not a sure thing. We can’t wait, so we have to figure it out ourselves.

There are three existing solutions:

  1. Executing native SQL statements. EF Core provides ctx.Database.ExecuteSqlRaw() to execute the native SQL statements, so we can directly write the Delete and Update SQL statement to delete or update data. This method is straightforward, but the disadvantage is that the method of directly operating the data table in the code does not conform to the idea of model-driven and isolation. The programmer faces the database table directly and cannot take advantage of the strongly-typed EF Core. If the model changes, SQL statements must be changed manually. Moreover, some DBMS-specific syntax and functions are used, once the program is moved to another DBMS, some of the SQL statement should be rewritten. And it cannot take advantage of EF Core’s powerful SQL translation mechanism to isolate the differences between different underlying databases.
  2. Using other ORMs. FreeSQL and other ORM provides batch delete, update method, the use is also very simple. The downside of this approach is that third-party ORMs must be introduced into the project and EF Core code cannot be reused.
  3. Using the existing EF Core extension. Open source libraries such as EF Plus and EFcore.BulkExtensions provide methods of bulk operation on EF Core. The core mechanism of this is to get the generated SQL statement from EF Core and SelectExpression. Since there is no public API for getting the SQL statement corresponding to a LINQ operation in previous versions of EF Core 5.0, these open source libraries obtain the SQL statement corresponding to LINQ and SelectExpression by accessing private members of classes in the EF Core framework [2]. Because accessing private members is not object-oriented, the code can fail if the EF Core framework code changes, it happens. Moreover, at the time of this writing, these open source libraries are not yet compatible with.NET 5.

Ⅳ Zack.EFCore.Batch

I have developed an extension library of Entity Framework Core that allows developers to delete or update data in Entity Framework Core with a single SQL. Since the Entity Framework Core 5 API was used in development, this library requires Entity Framework Core 5 and above(.NET 5 and above).

Here’s how to use it:

Step one: Install the package through Nuget:

Step two: Add the following code into the OnConfiguring() of your DbContext:

Step Three: Use the extension method DeleteRangeAsync() of DbContext to batch delete, the parameter of DeleteRangeAsync() is the filter condition.

Sample code of bulk deletion is as follows:

The above code will execute the following SQL statement in the database:

The DeleteRange() method is a synchronous version of DeleteRangeAsync() method.

Create a BatchUpdateBuilder object using the extension method BatchUpdate() of DbContext.

BatchUpdateBuilder has the four following methods:

  1. The Set() method is used to assign a value to an property. The first parameter is a lambda expression for the property name, and the second parameter is a lambda expression for the value.
  2. Where() is the filter condition
  3. ExecuteAsync() is used to execute BatchUpdateBuilder asynchronously.
  4. Execute() is the synchronous method version of ExecuteAsync().

Sample code:

The above code will execute the following SQL statement in the SQLServer database:

This library uses EF Core to translate lambda expressions into SQL statements, so almost all of the lambda expressions supported by EF Core are supported.

GitHub of the project: https://github.com/yangzhongke/Zack.EFCore.Batch

Ⅴ Principle analysis

To be honest, it’s not difficult to convert lambda expressions to SQL statement, because the expression tree can be translated into SQL statements. However, the hardest part is to translate. Net function to SQL fragment, because the same. Net function equivalent SQL fragment in different DBMS is different. It is tough for me to do this, so I think it is a good idea to use EF Core to implement the translation.

Unfortunately, before .Net Core 3.x, it was impossible to directly fetch a translated SQL statement from a Linq query. Translated SQL statements are available on .NET Core through logging , but these are only available after Linq is executed, and the SQL is not available immediatel. According to .Net Core development team, before.net Core 3.x, there was also no publicly available API for translating expression trees into SQL fragments.

From .NET 5, Entity Framework Core provides a method that is available to retrieve the SQL statement from a Linq query without executing it, which can be done by the ToQueryString() method of IQueryable [3].

So I wanted to do this with the SQL statement I got from the ToQueryString() method. I can use segments of Lambda expressions, the filter expression spliced into a query expressions, and then call ToQueryString () method to get the translated SQL statements, and then write a lexical analyzer and syntax analyzer to analyze SQL statements, extract the Where clause as well as the expression fragment in the Select columns, then put these pieces back together into the Update and Delete SQL statements.

However, since the syntax of different DBMSS makes writing such a lexical and parser cumbersome, I wondered if I could explore the implementation of ToQueryString() and then take the SQL fragments directly from the parsing process, thus avoiding the need to generate SQL and then parse them again.

Although EF Core is open source, there is no comprehensive documentation of EF Core, and the code of EF Core is very complex, so studying the source code of EF Core is time-consuming. During the research process, I wanted to give up several times and finally realized the function. By developing this library, I also had a very thorough understanding of the internal principles of EF Core, especially the whole process of translation from Lambda expressions to SQL translation. I’m not going to go through the process here, but I’m going to go through the principles of EF Core, and then I’m going to go through the implementation of my library.

1. SQL translation principle of EF Core

EF Core has a lot of services, such as QueryTranslationPreprocessor, RelationalParameterBasedSqlProcessor, QuerySqlGenerator etc. These services are generally created by IXXX Factory, such as corresponding IQueryTranslationPreprocessorFactory QueryTranslationPreprocessor, corresponding IQuerySqlGeneratorFactory QuerySqlGenerator. The instances of these factory classes can be obtained from dbContext through dbContext.GetService<XXX>(). Of course, there are also services that do not require creation from factories, such as the IQueryCompiler, which can be directly accessed via ctx. GetService<IQueryCompiler>().

So, if you want to use any of the other services in EF Core, try querying the corresponding service interface type or factory type in GetService().

EF Core also allows us to call the DbContextOptionsBuilder’s ReplaceService() method to replace the default service in EF Core with a customized class.

Code at EF Core that translates an IQueryable object into an SQL statement is distributed over various classes. Through my effort, I issued a code that is ready to run as follows:

Let me briefly explain the code above:

queryable presents an instance of IQueryable to be translated, and ctx is the DbContext object. SelectExpression is the root of strongly-typed abstract syntax tree(AST) translated from Linq expression. The GetCommand() of QuerySqlGenerator is used to visit the SelectExpression to build the SQL statement.

GetCommand() of QuerySqlGenerator will call VisitSelect(SelectExpression selectExpression) to build the SQL statement. VisitSqlBinary(SqlBinaryExpression sqlBinaryExpression), VisitFromSql(FromSqlExpression fromSqlExpression), and VisitLike(LikeExpression likeExpression) will be used to translate binary expression, From expression, Like expression into SQL statements respectively.

Because some functions and implementations in different DBMSS are different, and SelectExpression, LikeExpression, etc. are all abstract nodes and abstract models independent of specific DBMS. Therefore, the EF Provider of each DBMS is only responsible for writing code to translate these XXExpression into their own SQL fragments. Most of the code in the EF Core of different DBMS is a variety of XXTranslatorProviders.

2. Principle of Zack.EFCore.Batch

The core of this library is ZackQuerySqlGenerator, which is a class inherited from QuerySqlGenerator.It overrides the VisitSelect method of the parent class, and then copies all the code for the VisitSelect method of the parent class. The goal is to intercept individual SQL fragments during VisitSelect’s buidling of SQL statements. Take the following code for example:

First, save the SQL statement before appending Where condition into the oldSQL variable, and then calculate a difference between the SQL statement after appending Where condition and oldSQL to get the SQL fragment of Where statement.

Second, use optBuilder.ReplaceService<IQuerySqlGeneratorFactory, ZackQuerySqlGeneratorFactory>(); to replace the default IQuerySqlGeneratorFactory with ZackQuerySqlGeneratorFactory. By doing so, EF Core will then translate from SelectExpression to SQL statement using the ZackQuerySqlGenerator class so that we can intercept the translated SQL fragments.

Let me explain the main code of the BatchUpdateBuilder class for batch updating databases. The main code is to convert the expressions like Age = Age + 1, Name = AuthorName.Trim () into Select (new {b.A ge, ge + 1, b.A b.N ame, b.A uthorName. Trime ()}), thus the N assignment expression rearranged to 2 * N query expressions, then joining them together to form a IQueryable object. Then, ZackQuerySqlGenerator is called to translate the SQL fragment, and the Where SQL fragment and columns SQL fragments can be joined into an UPDATED SQL statement.

Ⅵ Limitations

Zack.EFCore.Batch has following limitations:

  1. Because Zack.EFCore.Batch uses the new API of EFCore 5.0, EFCore 3.x and below are not supported.
  2. Since Zack.EFCore.Batch does do direct operation on the database, after the update and deletion, there will be the inconsistency between the tracking state of the object already queried in the same DbContext and data in the database, which is Microsoft concerned about. In the same DbContext instance, it is recommended to perform the query operation again if you need to operate on the data previously queried in the same DbContex after batch deletion or update.
  3. An internal API QueryCompiler is used in the code, which is not recommended.

Originally published at https://www.reddit.com/.

A happy developer