Tuesday, 5 February 2013

Parallelism in .NET – PLINQ in C#



Most .NET developers today are familiar with LINQ, the technology that brought functional programming ideas into the object-oriented environment. Parallel LINQ, or ‘PLINQ’, takes LINQ to the next level by adding intuitive parallel capabilities onto an already powerful framework.
PLINQ is a query execution engine that accepts any LINQ-to-Objects or LINQ-to-XML query and automatically utilizes multiple processors or cores for execution when they are available.
Using PLINQ is almost exactly like using LINQ-to-Objects and LINQ-to-XML. You can use any of the operators available through C# 3.0 syntax or the System.Linq.Enumerable class, including OrderBy, Join, Select, Where, and so on.
LINQ-to-SQL and LINQ-to-Entities queries will still be executed by the respective databases and query providers, so PLINQ does not offer a way to parallelize those queries. If you wish to process the results of those queries in memory, including joining the output of many heterogeneous queries, then PLINQ can be quite useful.
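As a rough illustration of that last point, the sketch below joins two in-memory collections with PLINQ. The collections are hypothetical stand-ins for rows that two different query providers have already returned; the class name, method name, and data shapes are assumptions made for this example only. Note that both sides of the join are converted with AsParallel(), because the PLINQ Join operator expects a ParallelQuery for the inner sequence as well.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InMemoryJoinDemo
{
    // customerNames: customer id -> name, as if fetched by one provider.
    // orders: order id -> customer id, as if fetched by another provider.
    // Both are assumed to be fully materialized in memory already.
    public static string[] JoinInMemory(
        IEnumerable<KeyValuePair<int, string>> customerNames,
        IEnumerable<KeyValuePair<int, int>> orders)
    {
        return customerNames.AsParallel()
            .Join(orders.AsParallel(),            // inner side is also a ParallelQuery
                  c => c.Key,                     // customer id on the customer row
                  o => o.Value,                   // customer id on the order row
                  (c, o) => c.Value + ": order " + o.Key)
            .OrderBy(s => s, StringComparer.Ordinal) // sort so the output order is deterministic
            .ToArray();
    }
}
```

Calling InMemoryJoinDemo.JoinInMemory with a handful of customers and orders returns one string per matching order. The final OrderBy is only there to make the output repeatable; drop it if the order of the joined rows does not matter.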

 

Using the AsParallel() Method
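The AsParallel method is the doorway to PLINQ. It converts a data sequence into a ParallelQuery. Because the source of the query is now a ParallelQuery, the compiler binds the query operators to their PLINQ implementations in System.Linq.ParallelEnumerable instead of the sequential ones in System.Linq.Enumerable, so the query is handed to PLINQ automatically. You are likely to use the AsParallel method every time you use PLINQ.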
Sample code 1: Sequential LINQ execution
var customers = new[] {
      new Customer { ID = 1,  FirstName = "Sandeep"  , LastName = "Ramani" },
      new Customer { ID = 2,  FirstName = "Dharmik"  , LastName = "Chotaliya" },
      new Customer { ID = 3,  FirstName = "Nisar"    ,  LastName = "Kalia" } ,
      new Customer { ID = 4,  FirstName = "Ravi"     , LastName = "Mapara" } ,
      new Customer { ID = 5,  FirstName = "Hardik"   , LastName = "Mistry" },
      new Customer { ID = 6,  FirstName = "Sandy"    , LastName = "Ramani" },
      new Customer { ID = 7,  FirstName = "Jigar"    , LastName = "Shah" },
      new Customer { ID = 8,  FirstName = "Kaushal"  , LastName = "Parik" } ,
      new Customer { ID = 9,  FirstName = "Abhishek" , LastName = "Swarnker" } ,
      new Customer { ID = 10, FirstName = "Sanket"   , LastName = "Patel" },
      new Customer { ID = 11, FirstName = "Dinesh"   , LastName = "Prajapati" },
      new Customer { ID = 12, FirstName = "Jayesh"   , LastName = "Patel" },
      new Customer { ID = 13, FirstName = "Nimesh"   , LastName = "Mishra" } ,
      new Customer { ID = 14, FirstName = "Shiva"    , LastName = "Reddy" } ,
      new Customer { ID = 15, FirstName = "Jasmin"   , LastName = "Malviya" },
      new Customer { ID = 16, FirstName = "Haresh"   , LastName = "Bhanderi" },
      new Customer { ID = 17, FirstName = "Ankit"    , LastName = "Ramani" },
      new Customer { ID = 18, FirstName = "Sanket"   , LastName = "Shah" } ,
      new Customer { ID = 19, FirstName = "Amit"     , LastName = "Shah" } ,
      new Customer { ID = 20, FirstName = "Nilesh"   , LastName = "Soni" }       };

var results = from c in customers
            where c.FirstName.StartsWith("San")
            select c;

Sample code 2: Parallel LINQ execution
var customers = new[] {
      new Customer { ID = 1,  FirstName = "Sandeep"  , LastName = "Ramani" },
      new Customer { ID = 2,  FirstName = "Dharmik"  , LastName = "Chotaliya" },
      new Customer { ID = 3,  FirstName = "Nisar"    ,  LastName = "Kalia" } ,
      new Customer { ID = 4,  FirstName = "Ravi"     , LastName = "Mapara" } ,
      new Customer { ID = 5,  FirstName = "Hardik"   , LastName = "Mistry" },
      new Customer { ID = 6,  FirstName = "Sandy"    , LastName = "Ramani" },
      new Customer { ID = 7,  FirstName = "Jigar"    , LastName = "Shah" },
      new Customer { ID = 8,  FirstName = "Kaushal"  , LastName = "Parik" } ,
      new Customer { ID = 9,  FirstName = "Abhishek" , LastName = "Swarnker" } ,
      new Customer { ID = 10, FirstName = "Sanket"   , LastName = "Patel" },
      new Customer { ID = 11, FirstName = "Dinesh"   , LastName = "Prajapati" },
      new Customer { ID = 12, FirstName = "Jayesh"   , LastName = "Patel" },
      new Customer { ID = 13, FirstName = "Nimesh"   , LastName = "Mishra" } ,
      new Customer { ID = 14, FirstName = "Shiva"    , LastName = "Reddy" } ,
      new Customer { ID = 15, FirstName = "Jasmin"   , LastName = "Malviya" },
      new Customer { ID = 16, FirstName = "Haresh"   , LastName = "Bhanderi" },
      new Customer { ID = 17, FirstName = "Ankit"    , LastName = "Ramani" },
      new Customer { ID = 18, FirstName = "Sanket"   , LastName = "Shah" } ,
      new Customer { ID = 19, FirstName = "Amit"     , LastName = "Shah" } ,
      new Customer { ID = 20, FirstName = "Nilesh"   , LastName = "Soni" }       };

var results = from c in customers.AsParallel()
            where c.FirstName.StartsWith("San")
            select c;

With the simple addition of the AsParallel() extension method, PLINQ can automatically parallelize the operation across multiple cores. PLINQ takes full responsibility for partitioning your data into chunks that can be processed in parallel.
When you run the two sample queries above, you get the same set of customers, but possibly in a different order.
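The two samples assume a Customer type that is not shown. The following self-contained sketch supplies a minimal one (its shape is inferred from the object initializers above, so treat it as an assumption) and runs the same filter sequentially and in parallel over a shortened customer list. The class and method names are made up for this illustration.

```csharp
using System;
using System.Linq;

// Minimal Customer type assumed by the samples; the original post does not show it.
public class Customer
{
    public int ID { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class PlinqBasics
{
    // A shortened version of the customer list used in the samples.
    public static Customer[] SampleCustomers()
    {
        return new[]
        {
            new Customer { ID = 1,  FirstName = "Sandeep", LastName = "Ramani" },
            new Customer { ID = 2,  FirstName = "Dharmik", LastName = "Chotaliya" },
            new Customer { ID = 6,  FirstName = "Sandy",   LastName = "Ramani" },
            new Customer { ID = 10, FirstName = "Sanket",  LastName = "Patel" },
            new Customer { ID = 18, FirstName = "Sanket",  LastName = "Shah" },
            new Customer { ID = 20, FirstName = "Nilesh",  LastName = "Soni" }
        };
    }

    // Ordinary LINQ-to-Objects: results come back in source order.
    public static int[] SequentialIds(Customer[] customers)
    {
        return (from c in customers
                where c.FirstName.StartsWith("San")
                select c.ID).ToArray();
    }

    // Same query through PLINQ: same elements, order not guaranteed.
    public static int[] ParallelIds(Customer[] customers)
    {
        return (from c in customers.AsParallel()
                where c.FirstName.StartsWith("San")
                select c.ID).ToArray();
    }
}
```

To compare the two results reliably, sort both arrays first (for example with OrderBy(id => id)) and then use SequenceEqual. Comparing them element by element without sorting may fail on some runs, because the parallel query is free to return the matches in any order.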

Limitations

  1. PLINQ only works against local collections. This means that if you’re using LINQ providers over remote data, such as LINQ to SQL or ADO.NET Entity Framework, then you’re out of luck for this version.
  2. Since PLINQ chunks the collection into multiple partitions and executes them in parallel, the results that you would get from a PLINQ query may not be in the same order as the results that you would get from a serially executed LINQ query.
However, you can work around this by adding the AsOrdered() method to your query, which tells PLINQ to preserve the order of the source sequence in the results. Keep in mind that AsOrdered() incurs a performance hit on large collections, which can erase much of the gain from parallelizing the query in the first place.
Sample code 3: Preserving the Order of PLINQ Query Results Using the AsOrdered Method
var results = from c in customers.AsParallel().AsOrdered()
            where c.FirstName.StartsWith("San")
            select c;
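A small way to convince yourself that AsOrdered() keeps the source order is to compare an ordered PLINQ query against its sequential equivalent. The sketch below uses a range of integers rather than the customer list so that it stays self-contained; the class and method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class OrderedPlinq
{
    // Returns the multiples of three in [0, count) using an order-preserving PLINQ query.
    public static int[] OrderedMultiplesOfThree(int count)
    {
        return Enumerable.Range(0, count)
            .AsParallel()
            .AsOrdered()                 // keep results in source order
            .Where(i => i % 3 == 0)
            .ToArray();
    }

    // The sequential query that the ordered PLINQ query should match exactly.
    public static int[] SequentialMultiplesOfThree(int count)
    {
        return Enumerable.Range(0, count)
            .Where(i => i % 3 == 0)
            .ToArray();
    }
}
```

With AsOrdered() in place, OrderedMultiplesOfThree(1000).SequenceEqual(SequentialMultiplesOfThree(1000)) evaluates to true. Without it, the same comparison may or may not succeed from one run to the next.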

Controlling Parallelism

1. Forcing Parallel Execution

In some cases, PLINQ may decide that your query is better dealt with sequentially. You can control this by using the WithExecutionMode extension method, which is applied to the ParallelQuery type. The WithExecutionMode method takes a value from the ParallelExecutionMode enumeration. There are two such values: Default (let PLINQ decide what to do) and ForceParallelism (use PLINQ even if the overhead of parallel execution is likely to outweigh the benefits).
var results = from c in customers.AsParallel()
                  .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
            where c.FirstName.StartsWith("San")
            select c;

2. Limiting the Degree of Parallelism

You can request that PLINQ limit the number of partitions that are processed simultaneously using the WithDegreeOfParallelism extension method, which operates on the ParallelQuery type. This method takes an int argument that states the maximum number of partitions that should be processed at once; this is known as the degree of parallelism. Setting the degree of parallelism doesn’t force PLINQ to use that many; it only sets an upper limit. PLINQ may decide to use fewer than you have specified or, if you have not used the WithExecutionMode method, may decide to execute the query sequentially.
var results = from c in customers.AsParallel().WithDegreeOfParallelism(2)
            where c.FirstName.StartsWith("San")
            select c;
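One way to observe the upper limit in practice is to count how many invocations of the query delegate are running at the same moment. The sketch below is only an illustration under that assumption: the class and method names are invented, the Thread.Sleep call stands in for real work, and ForceParallelism is added so that PLINQ does not fall back to sequential execution.

```csharp
using System;
using System.Linq;
using System.Threading;

public static class DegreeOfParallelismDemo
{
    // Runs a PLINQ query capped at maxDegree and returns the highest number of
    // delegate invocations that were observed running at the same time.
    public static int ObservedPeakConcurrency(int maxDegree)
    {
        int running = 0;
        int peak = 0;

        Enumerable.Range(0, 64)
            .AsParallel()
            .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
            .WithDegreeOfParallelism(maxDegree)
            .Select(i =>
            {
                int now = Interlocked.Increment(ref running);

                // Raise the recorded peak if this invocation saw a higher value.
                int seen;
                do
                {
                    seen = Volatile.Read(ref peak);
                }
                while (now > seen &&
                       Interlocked.CompareExchange(ref peak, now, seen) != seen);

                Thread.Sleep(1);                       // simulate a little work
                Interlocked.Decrement(ref running);
                return i;
            })
            .ToArray();                                // force the query to execute

        return peak;
    }
}
```

Calling DegreeOfParallelismDemo.ObservedPeakConcurrency(2) should never return more than 2. It may return 1 on a single-core machine or when the thread pool is busy, which matches the point that the degree of parallelism is a ceiling rather than a guarantee.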

3. Generating and Using a Parallel Sequence

ParallelQuery<int> evens
      = ParallelEnumerable.Range(0, 50000)
            .Where(i => i % 2 == 0)
            .Select(i => i);
The above code uses the Range method to create a sequence of 50,000 integers starting at zero. The first argument to the method is the first value; the second is the number of values you require. ParallelEnumerable.Range already returns a ParallelQuery<int>, so no cast is needed: the Where and Select calls bind to the PLINQ operators and the query is eligible for parallel execution. By contrast, Enumerable.Range returns an ordinary IEnumerable<int>, and a query over it runs sequentially unless you call AsParallel() first.
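ParallelEnumerable.Range is declared to return ParallelQuery<int>, so its result can feed PLINQ operators directly, with no cast or AsParallel() call. The sketch below counts the even numbers in such a range; the class and method names are made up for the example.

```csharp
using System;
using System.Linq;

public static class ParallelRangeDemo
{
    // Counts the even values in [0, count) using a sequence generated by PLINQ itself.
    public static int CountEvens(int count)
    {
        // Range returns ParallelQuery<int>, so Where binds to the PLINQ operator.
        ParallelQuery<int> evens = ParallelEnumerable.Range(0, count)
            .Where(i => i % 2 == 0);

        return evens.Count();
    }
}
```

For the 50,000-element range used above, ParallelRangeDemo.CountEvens(50000) returns 25000.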

4. Generating and Using a Repeating Sequence

int sum = ParallelEnumerable.Repeat(1, 50000)
            .Select(i => i)
            .Sum();
The static Repeat method takes a value and a count and creates a parallel sequence in which that value is repeated the specified number of times. In the code above, the value 1 is repeated 50,000 times, so the sum is 50,000.
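Repeat works with reference types as well as value types; in that case every element of the sequence is the same object reference, not a copy. The following sketch checks both behaviours. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class ParallelRepeatDemo
{
    // Sums a parallel sequence made of the value 1 repeated count times.
    public static int SumOfOnes(int count)
    {
        return ParallelEnumerable.Repeat(1, count).Sum();
    }

    // Checks that every element produced by Repeat is the very same object instance.
    public static bool AllSameReference(object item, int count)
    {
        return ParallelEnumerable.Repeat(item, count)
            .All(x => object.ReferenceEquals(x, item));
    }
}
```

ParallelRepeatDemo.SumOfOnes(50000) returns 50000, and AllSameReference returns true for any object you pass in, which is worth remembering if the repeated object is mutable and the query delegates modify it.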
