Monday, May 9, 2011

Goodbye Inverse Control Flow, Hello Async!!!

In this post I will talk about the new Async features for the .NET Framework (still a CTP, the SP1 Refresh, as of May 2011). These features add new syntax, and the compiler generates the implicit control structures needed to turn sequential-looking code into asynchronous code. Let us look in detail at the problems we currently have with asynchronous method invocations, control flow and exception handling, and at how the new features can help us (programmers, developers, coders, etc.) manage our asynchronous code in a much more elegant way.

In essence, the new Async features reduce the complexity of asynchronous code by introducing new syntax. The syntax is simple enough that asynchronous code reads almost like sequential code. Under the hood the compiler converts these new constructs into implicit closures and continuations, exposed through the classes Task and Task<T>. The caller receives something right away in the form of a Task or Task<T> instance, which represents the pending work (or void, in the “fire and forget” case). The caller can monitor the task’s progress and obtain its result once it completes. When the asynchronous work finishes, the continuation runs, the final result of type T is populated (in the case of Task<T>), and the task transitions to its completed state, which lets callers react appropriately. Errors are raised within the context of the async method and are handled just as they would be in sequential code.
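As a tiny sketch of the shape this takes (the method names are made up, and TaskEx.Run is the thread-pool helper as I understand the CTP library exposes it):

private async Task<int> GetAnswerAsync()
{
    // control returns to the caller at the first await;
    // the code after it runs later as a compiler-generated continuation
    int n = await TaskEx.Run(() => 6 * 7);
    return n;   // n becomes the Result of the returned Task<int>
}

// the caller gets the Task<int> back immediately and can await it
// (inside another async method) or inspect it later
Task<int> pending = GetAnswerAsync();
int answer = await pending;   // 42, once the work has completed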

Most of this is compiler generated, and there is no longer a need to wire up events, closures, continuations and error handling manually. What in the world is this, and will I ever need it?

When will it make sense?

It depends on how much parallelism your application needs. With the broad adoption of multi-core (and, more recently, many-core) processors, people expect applications to be faster and more responsive; but until an application knows how to run code asynchronously, it cannot take much advantage of multiple cores, and besides, many blocked threads are computationally expensive.

For example, suppose we need to request N (N > 1) downloads from remote resources, and the network delay in reaching resource i is delta_t_i (i = 1..N). Sequentially written code has to wait at least Sum(delta_t_i, i = 1..N), plus the actual download time. Asynchronous code waits roughly Max(delta_t_i, i = 1..N) + overhead_time. The overhead_time is the cost of managing the different threads and executing them in parallel; it depends on the availability of resources (many intensive IO threads against a single resource just add context-switching overhead) and grows roughly in proportion to the number of threads. So, with three downloads whose latencies are 200 ms, 300 ms and 500 ms, sequential code pays at least 1000 ms of latency, while asynchronous code pays about 500 ms plus the overhead. Therefore it makes sense to use asynchrony when Sum(delta_t_i) > Max(delta_t_i) + overhead_time.
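As a rough sketch of the asynchronous case (the DownloadStringTaskAsync extension method and the TaskEx.WhenAll helper are the CTP-era names as far as I know, so treat them as assumptions):

using System;
using System.Linq;
using System.Net;
using System.Threading.Tasks;

static async Task<string[]> DownloadAllAsync(string[] urls)
{
    // one WebClient per URL, all requests in flight at once
    Task<string>[] downloads = urls
        .Select(u => new WebClient().DownloadStringTaskAsync(new Uri(u)))
        .ToArray();

    // total latency is roughly Max(delta_t_i) + overhead, instead of Sum(delta_t_i)
    return await TaskEx.WhenAll(downloads);
}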

Therefore processes such as web requests, data requests from a server and intensive computations are all good candidates for running in parallel, or at least off the main or UI thread. The same goes for IO, as long as it runs off the UI or main thread. As I mentioned, running multiple intensive IO operations against a single shared resource can increase overhead rather rapidly; but when the resources are independent, running the IO on different threads will actually improve performance.

So, as a rule of thumb, ask: can the infrastructure effectively support parallelism?
- If yes, then processing requests over such infrastructure asynchronously will be beneficial.
- If no, then there is a good chance that the overhead of context switching between threads will outweigh the gain.

Current State of Asynchrony

Currently the .NET 4.0 Framework implements Task and Task<T>, and MSDN has great examples and design patterns for using these classes in asynchronous method calls. I mostly use .NET 3.5 and therefore need more plumbing.
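For reference, in .NET 4.0 the same idea is usually expressed with ContinueWith; a minimal sketch (ComputeAnswer, UseAnswer and LogError are made-up placeholders):

// start the work on the thread pool, then chain a continuation to consume the result
Task<int> work = Task.Factory.StartNew(() => ComputeAnswer());
work.ContinueWith(t =>
{
    if (t.IsFaulted)
        LogError(t.Exception);   // exceptions surface on the antecedent task
    else
        UseAnswer(t.Result);     // runs only after the work has completed
});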

Let’s look at how we can execute an asynchronous task without Task and the Async features, and at the problems we run into while doing it. Here is the main idea:

void SomeMethod()
{
    // do some preliminary work

    // some kind of way to spin off tasks in the form of delegates (i.e. function pointers);
    // the member names below are simplified for illustration (the real BackgroundWorker
    // exposes DoWork, RunWorkerCompleted and RunWorkerAsync)
    BackgroundWorker worker = new BackgroundWorker();

    // we need to run a long process, but want to free up the UI
    Action myDelegate = delegate
    {
        do.this();
        do.that();
        do_for_very_long.this();
    };

    // declare both handlers up front so that each lambda can unhook the other
    EventHandler workerCompleted = null;
    EventHandler workerCancelled = null;

    workerCompleted = (s, ea) =>
    {
        // unhook event handlers
        worker.Completed -= workerCompleted;
        worker.Cancelled -= workerCancelled;
        …
        // handle results
    };

    workerCancelled = (s, ea) =>
    {
        // unhook event handlers
        worker.Completed -= workerCompleted;
        worker.Cancelled -= workerCancelled;
        …
        // handle errors
    };

    worker.Completed += workerCompleted;
    worker.Cancelled += workerCancelled;
    worker.RunAsync(myDelegate);
}

What you see above involves an inverse flow of control, closures, lambda functions and continuations. But this is typically what happens when there is a need to execute an asynchronous process.

1. We create local variables on the main thread, which become part of the continuation state for the spun-off work.

Here is a definition from Wikipedia:

“In computer science and programming, a continuation is an abstract representation of the control state. A continuation reifies an instance of a computational process at a given point in the process's execution. It contains information such as the process's current stack (including all data whose lifetime is within the process e.g. "local variables"), as well the process's point in the computation. Such an instance can then be later resumed upon invocation. The "current continuation" or "continuation of the computation step" is the continuation that, from the perspective of running code, would be derived from the current point in a program's execution.

The term continuations can also be used to refer to first-class continuations, which are constructs that give a programming language the ability to save the execution state at any point and return to that point at a later point in the program.”

In other words, Async builds continuations completely behind the scenes, whereas here we have to wire up events ourselves and listen for a callback when the asynchronous work completes.

2. We define the execution steps for the asynchronous work, but do not run them yet. This may or may not use a closure. A closure is one way of passing continuation state: if this function references variables that were previously defined on our main thread, the compiler automatically preserves those references until the function actually executes, subject to GC rules. Most commonly closures appear together with lambda functions or anonymous delegates, although, as the quotes below point out, they are distinct concepts.

The other way is to explicitly create a named function with parameters and pass all the data via those arguments. Sometimes that can be a daunting task in itself. A small sketch of the closure approach follows.
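As a minimal sketch of the closure approach (the variable names are made up, and worker is the pseudo-worker from the earlier example):

int retryCount = 3;                 // local variable on the calling thread
Action work = delegate
{
    // the compiler lifts retryCount into a generated class, so it is still alive
    // when this delegate eventually runs, possibly on another thread
    Console.WriteLine("Retries allowed: " + retryCount);
};
worker.RunAsync(work);              // no need to pass retryCount explicitly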

Here is a definition for closures:

“In computer science, a closure is a function together with a referencing environment for the nonlocal names (free variables) of that function. Such a function is said to be "closed over" its free variables. The referencing environment binds the nonlocal names to the corresponding variables in scope at the time the closure is created, additionally extending their lifetime to at least as long as the lifetime of the closure itself.

Closures are used to implement continuation passing style, and in this manner, hide state. Constructs such as objects and control structures can thus be implemented with closures. In some languages, a closure may occur when a function is defined within another function, and the inner function refers to local variables of the outer function. At runtime, when the outer function executes, a closure is formed, consisting of the inner function’s code and references to any variables of the outer function required by the closure; such variables are called the upvalues of the closure.

The term closure is often mistakenly used to mean anonymous function. This is probably because most languages implementing anonymous functions allow them to form closures and programmers are usually introduced to both concepts at the same time. These are, however, distinct concepts. A closure retains a reference to the environment at the time it was created (for example, to the current value of a local variable in the enclosing scope) while a generic anonymous function need not do this.”

and lambda functions:

“In computing, an anonymous function (also function constant or function literal) is a function (or a subroutine) defined, and possibly called, without being bound to an identifier. Anonymous functions are convenient to pass as an argument to a higher-order function and are ubiquitous in languages with first-class functions.

Anonymous functions originate in the work of Alonzo Church in his invention of the lambda calculus in 1936 (prior to electronic computers), in which all functions are anonymous. The Y combinator can be utilised in these circumstances to provide anonymous recursion, which Church used to show that some mathematical questions are unsolvable by computation. (Note: this result was disputed at the time, and later Alan Turing - who became Church's student - provided a proof that was more generally accepted.)

Anonymous functions have been a feature of programming languages since Lisp in 1958. An increasing number of modern programming languages support anonymous functions, and some notable mainstream languages have recently added support for them, the most widespread being JavaScript; C# and PHP also support anonymous functions. Anonymous functions were added to the C++ language as of C++0x.”

3. We define two event handlers to handle successful completion and exceptional completion.

These are defined using lambda functions, and they do use closures, in order to unhook the event handlers once an event has fired.

Still, no execution has happened yet, so up to this point everything runs blazingly fast.

4. We register for the events by hooking up our completion and error handlers.

5. We spin off the asynchronous work and return immediately. This is where execution actually starts, even though the method call has already returned. Control now flows through the functions we defined earlier in our method and, once the work completes, either the Completed handler or, if there were errors, the Cancelled handler is called. This way of executing is called Inversion of Control Flow.

“In computer programming, Inversion of control (IoC) is an abstract principle describing an aspect of some software architecture designs in which the flow of control of a system is inverted in comparison to procedural programming.

In traditional programming the flow of the business logic is controlled by a central piece of code, which calls reusable subroutines that perform specific functions. Using Inversion of Control this "central control" design principle is abandoned. The caller's code deals with the program's execution order, but the business knowledge is encapsulated by the called subroutines.”

The problem with “Inversion of Control” is that it is not linear and adds a lot of noise (plumbing) to business logic. It is very easy to introduce bugs using this style of programming.

What a mess, huh? So how is it different with Async?

If you read the Async whitepaper you will notice a somewhat similar example. But let us walk through our example and see how we can benefit from Async.

First we need to change our method declaration to use the async keyword and a return type of Task<T>, assuming we plan to return a result of type T.

private async Task<string> MyFunctionAsync()
{
    string result = String.Empty;

    // Func<string> rather than Action, because the work returns a value
    Func<string> myDelegate = delegate
    {
        do.this();
        do.that();
        do_for_very_long.this();
        return "success";
    };

    try
    {
        result = await Task.Run(myDelegate); // notice the AWAIT keyword. This is the new syntax!!!
                                             // (in the CTP library this helper is exposed as TaskEx.Run)

        // the rest of the code is wrapped into a continuation
        // and called only when the work above has finished executing.
        // MyFunctionAsync returns a Task<string> to its caller as soon as we hit the await,
        // but that task completes (and its Result is set) only when this part has finished.

        // handle success

        …
    }
    catch
    {
        // handle failure
        result = "failure";
    }
    return result;
}

This code looks a lot simpler and is less error prone, yet it executes almost exactly the same way as the code we described earlier!
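For completeness, a caller can consume it like this (the button-click handler and statusLabel are made up for illustration):

private async void OnGoButtonClick(object sender, EventArgs e)
{
    // the UI thread is not blocked while MyFunctionAsync runs
    string outcome = await MyFunctionAsync();
    statusLabel.Text = outcome;   // "success" or "failure"
}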

In conclusion: we looked at the mechanisms used to implement the new Async functionality in the C# Async CTP libraries, talked about when it is appropriate to use an asynchronous programming model, and examined how it is currently done and the problems associated with the current asynchronous programming style. At the end we took a look at the new features of the Async library (it will be part of the C# 5.0 compiler later on) and its much simpler programming model. We pointed to the original whitepaper for the Async library in the hope that readers will refer to it as well. Along the way we discussed some computer science concepts such as closures, lambda functions, continuations and inversion of control.

I hope some of the readers will find this information useful.

Happy coding!
