Category Archives: Cloud Architecture

Creating a Scheduled Azure Function

I’ve previously written about creating Azure functions. I’ve also written about how to react to service bus queues. In this post, I wanted to cover creating a scheduled function. Basically, these allow you to create a scheduled task that executes at a given interval, or at a given time.

Timer Trigger

The first thing to do is create a function with a type of Timer Trigger:
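If you're creating the function from Visual Studio rather than the portal, the generated code looks broadly like this (a minimal sketch; the function name and schedule are just the defaults you might see):

[FunctionName("TimerExample")]
public static void Run([TimerTrigger("0 */5 * * * *")]TimerInfo myTimer, TraceWriter log)
{
    // Fires on the schedule given in the attribute - here, every five minutes (see below)
    log.Info($"C# Timer trigger function executed at: {DateTime.Now}");
}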

Schedule / CRON format

The next thing is to understand the schedule, or CRON, format. The format is:

{second} {minute} {hour} {day} {month} {day-of-week}

Scheduled Intervals

The example you’ll see when you create this looks like this:

0 */5 * * * *

The notation */[number] means "every [number]"; so */5 means every 5 of whatever that placeholder represents. In this case, it sits in the minutes position, so it means once every 5 minutes. So, for example:

*/10 * * * * *

Would be once every 10 seconds.

Scheduled Times

Specifying numbers means the schedule will execute at that time; so:

0 0 0 * * *

Would execute every time the hour, minute and second all hit zero – so once per day at midnight; and:

0 * * * * *

Would execute every time the second hits zero – so once per minute; and:

0 0 * * * 1

Would execute once per hour on a Monday (as the last placeholder is the day of the week).

Time constraints

These can be specified in any column in the format [lower bound]-[upper bound], and they restrict the timer to a range of values; for example:

0 */20 5-10 * * *

Means every 20 minutes between 5 and 10am (as you can see, the different types can be used in conjunction).

Asterisks (*)

You’ll notice above that there is an asterisk in every placeholder where a value has not been specified. The asterisk signifies that the schedule will execute at every interval within that placeholder; for example:

* * * * * *

Means every second; and:

0 * * * * *

Means every minute.

Back to the function

Upon starting, the function will detail when the next several executions will take place:

But what if you don’t know what the schedule will be at compile time? As with many of the values in an Azure Function, you can simply swap the hard-coded schedule for a placeholder:

[FunctionName("MyFunc")]
public static void Run([TimerTrigger("%schedule%")]TimerInfo myTimer, TraceWriter log)
{
    log.Info($"C# Timer trigger function executed at: {DateTime.Now}");
}

This value can then be provided inside the local.settings.json:

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "DefaultEndpointsProto . . .",
    "AzureWebJobsDashboard": "DefaultEndpointsProto . . .",
    "schedule": "0 * * * * *"
  }
}
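Note that local.settings.json only applies when running locally; once the function is deployed, the same schedule value needs to be added to the Function App's Application Settings in the portal.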

References

https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-timer

http://cronexpressiondescriptor.azurewebsites.net/?expression=1+*+*+*+*+*&locale=en

Serverless Computing – A Paradigm Shift

In the beginning

When I first started programming (on the ZX Spectrum), you would typically write a program such as this (apologies if it’s syntactically incorrect, but I don’t have a Speccy interpreter to hand):

10 PRINT "Hello World"
20 GOTO 10

You could type that into a computer at Tandy’s and chuckle as the shop assistants tried to work out how to stop the program (sometimes, you might not even use “Hello World”, but something more profane).

However, no matter how long it took them to work out how to stop the program, they only paid for the electricity the computer used while it was on. Further, there would only ever be a finite and, presumably (I never tried), predictable number of “Hello World” messages produced in an hour.

The N-Tier Revolution

Fast forward a few years, and everyone’s talking about N-Tier computing. We’re writing programs that run on servers. Some of those servers are big and expensive, but pretty much the same statements are true. No matter how big and complex the program you run on the server, it’s your server (well, actually, it probably belongs to a company that you work for in some capacity). For example, if you have a poorly written SQL Server procedure that scans an entire table, the same two statements still hold: no matter how long it takes to run, the price for execution is consistent, and the amount of time it takes to run is predictable (although, you may decide that if it’s slow, speeding it up might be a better use of your time than calculating exactly how slow it is).

Using Other People’s Computers

And now we come to cloud computing… or do we? The chronology is a little off here, and that’s mainly because everyone keeps forgetting what cloud computing actually is: you’re renting time on somebody else’s computer. If I were twenty years older, I might have started this post by saying that this is, in a manner of speaking, what was done back in the 70’s and 80’s. But I’m not, so we’ll jump back to the mid-to-late 90’s: and SETI. Anyone who had a computer back in those days of dial-up connections and 14.4K modems will remember that SETI (the Search for Extra-Terrestrial Intelligence) was distributing a program for everyone to run on their computer instead of toaster screensavers*.

Wait – isn’t that what cloud computing is?

Yes – but SETI did it in reverse. You would dial up, download a chunk of data, and your machine would process it as a screensaver.

Everyone was happy to do that for free: because we all want to find aliens. But what if there had been a cost associated with each byte of data processed? Clearly, something similar was in the mind of Amazon when they started with this idea.

Back to the Paradigm Shift

So, the title and gist of this post: the two statements that have held right through from programs written on the ZX Spectrum, to programs written in Turbo C and Pascal, to programs written in C# running on a dedicated server, have now changed. So, here are the four big changes brought about by the cloud **:

1. Development and Debugging

If you write a program, you can no longer be sure of the cost, or of the execution time. This isn’t a scare post: both are broadly predictable; but the paradigm has now changed. If you’re in the middle of writing an Azure function, or a GCP BigQuery query, and it doesn’t work, you can’t just close your laptop and go for dinner while you have a think, because while you do, nodes are lighting up all over the world trying to complete your task. The lights are dimming in Seattle while your Azure function crashes again and again.

2. Scale and Accessibility

The second big change is the way that your code is scaled. For example, you might be used to parallelising slow code so that you can make use of all available threads; however, in our new world, if you do that, you may actually be making it harder for your cloud platform of choice to scale your code.

Because you pay per minute of computing time (typically this is more expensive than storage), code that is unnecessarily slow or inefficient may not cause your system to slow down – what it will probably do is cost you more money to run at the same speed.

In some cases, you might find it’s more efficient to do the opposite of what you have thus-far believed to be accepted industry best practice. For example, it may prove more cost efficient to de-normalise data that you need to access.

3. Logging

This isn’t exactly new: you should always have been logging in your programs – it’s just common sense. However, there’s a new emphasis here: you’re not running this on your own server (you’re not even running it on a customer’s server) – it’s somebody else’s. That means that if it crashes, there’s a limited amount of investigation that you can do. As a result, you need to log profusely.

4. Microservices and Message Busses

IMHO, there are two reasons why the Microservice architecture has become so popular in the cloud world. The first is that it’s easy: spinning up a new Azure function endpoint takes minutes.

Secondly, it’s more scalable. Microservices make your code more scalable because it’s easy for the cloud provider to instantiate two instances of your program, and then three, and then a million. If your program does one small thing, then only that small thing needs to be instantiated. If your program does twenty different things, it can still scale, but it’ll cost more, because it will require more processing power.

Finally, instead of simply calling the service that you need, there is now the option to place a message on a queue; apart from separating your program into definable areas of responsibility, this means that, when your cloud provider of choice scales your service out, any of the instances can pick up the message and deal with it.

Summary

So, what’s the conclusion: is cloud computing good, or bad? In five years’ time, the idea that someone in your organisation will know the spec of the machine your business-critical application is running on will seem a little far-fetched. The days of having a trusted server that has a load of hugely important stuff on it, but nobody really knows what, and that has been running since 1973, are numbered.

Obviously, there’s a price to pay for everything. In the case of the cloud, it’s complexity – not of the code, and not really of the combined system, but of the sheer number of moving parts you introduce. If you decide to segregate your database, you might find that you have several databases, and tens or even hundreds of independent processes and endpoints; you could even spread all of that across multiple cloud providers. It’s not that hard to lose track of where these pieces are living and what they are doing.

Footnotes

* If you don’t get this reference then you’re probably under 30.

** Clearly this is an opinion piece.

Working with Multiple Cloud Providers – Part 3 – Linking Azure and GCP

This is the third and final post in a short series on linking up Azure with GCP (for Christmas). In the first post, I set up a basic Azure function that updated some data in table storage, and then in the second post, I configured the GCP link from PubSub into BigQuery.

In this post, we’ll square this off by adapting the Azure function to post a message directly to PubSub; then, we’ll call the Azure function with Santa’s data, and watch that appear in BigQuery. At least, that was my plan – but Microsoft had other ideas.

It turns out that Azure functions have a dependency on Newtonsoft Json 9.0.1, and the GCP client libraries require 10+. So instead of being a 10 minute job on Boxing day to link the two, it turned into a mammoth task. Obviously, I spent the first few hours searching for a way around this – surely other people have faced this, and there’s a redirect, setting, or way of banging the keyboard that makes it work? Turns out not.

The next idea was to experiment with contacting the Google server directly, as is described here. Unfortunately, you still need the Auth libraries.

Finally, I swapped out the function for a WebJob. WebJobs give you a little more flexibility, and have no hard dependencies. So, on with the show (albeit a little more involved than expected).

WebJob

In this post I described how to create a basic WebJob. Here, we’re going to do something similar. In our case, we’re going to listen for an Azure Service Bus message, update the Azure Storage table (as described in the previous post), and call out to GCP to publish a message to PubSub.
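In case you don’t have that post to hand, the host set-up looks broadly like this (a sketch, assuming the WebJobs SDK 2.x with the Microsoft.Azure.WebJobs.ServiceBus package, and the usual AzureWebJobs* connection strings configured):

class Program
{
    static void Main()
    {
        // Wire up the Service Bus extension so that [ServiceBusTrigger] methods are discovered
        var config = new JobHostConfiguration();
        config.UseServiceBus();

        // Run the WebJob and block until it is shut down
        var host = new JobHost(config);
        host.RunAndBlock();
    }
}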

Handling a Service Bus Message

We weren’t originally going to take this approach, but I found that WebJobs play much nicer with a Service Bus message than with trying to get them to fire on a specific endpoint. In terms of scalability, adding a queue in the middle can only be a good thing. We’ll square off the contactable endpoint at the end with a function that simply converts a call to the endpoint into a message on the queue. Here’s what the WebJob Program looks like:

public static void ProcessQueueMessage(
    [ServiceBusTrigger("localsantaqueue")] string message,
    TextWriter log,
    [Table("Delivery")] ICollector<TableItem> outputTable)
{
    log.WriteLine(message);

    // Deserialise the queue message into a TableItem, defaulting the keys if they weren't supplied
    TableItem item = Newtonsoft.Json.JsonConvert.DeserializeObject<TableItem>(message);
    if (string.IsNullOrWhiteSpace(item.PartitionKey)) item.PartitionKey = item.childName.First().ToString();
    if (string.IsNullOrWhiteSpace(item.RowKey)) item.RowKey = item.childName;

    // Write the item to Azure Table Storage via the Table binding
    outputTable.Add(item);

    // Publish the same item to GCP PubSub (the WebJob method is synchronous, so block on the async call)
    GCPHelper.AddMessageToPubSub(item).GetAwaiter().GetResult();

    log.WriteLine("DeliveryComplete Finished");
}

Effectively, this is the same logic as the function (obviously, we now have the GCPHelper, which we’ll come to in a minute). First, here’s the code for the TableItem model:


[JsonObject(MemberSerialization.OptIn)]
public class TableItem : TableEntity
{
    [JsonProperty]
    public string childName { get; set; }
 
    [JsonProperty]
    public string present { get; set; }
}

As you can see, we need to decorate the members with specific serialisation instructions, because this model is being used by both GCP (which only needs the two declared properties) and Azure (which also needs the inherited TableEntity properties).
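For illustration, the message that ends up being published to PubSub for a (made-up) delivery would look something like this (just the two opted-in properties, with none of the TableEntity ones):

{ "childName": "Alice", "present": "Bicycle" }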

GCPHelper

As described here, you’ll need to install the GCP client package into the project that will call PubSub (in our case, that’s now the WebJob, rather than the Azure Function App created in post one of this series):

Install-Package Google.Cloud.PubSub.V1 -Pre

Here’s the helper code that I mentioned:

public static class GCPHelper
{
    public static async Task AddMessageToPubSub(TableItem toSend)
    {
        string jsonMsg = Newtonsoft.Json.JsonConvert.SerializeObject(toSend);

        // Point the GCP client libraries at the credentials file created earlier
        Environment.SetEnvironmentVariable(
            "GOOGLE_APPLICATION_CREDENTIALS",
            Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Test-Project-8d8d83hs4hd.json"));
        GrpcEnvironment.SetLogger(new ConsoleLogger());

        string projectId = "test-project-123456";
        TopicName topicName = new TopicName(projectId, "test");

        // Publish the serialised item to the topic, then shut the publisher down cleanly
        SimplePublisher simplePublisher =
            await SimplePublisher.CreateAsync(topicName);
        string messageId =
            await simplePublisher.PublishAsync(jsonMsg);
        await simplePublisher.ShutdownAsync(TimeSpan.FromSeconds(15));
    }
}

I detailed in this post how to create a credentials file; you’ll need to do that to allow the WebJob to be authorised. The Json file referenced above was created using that process.

Azure Config

You’ll need to create an Azure Service Bus queue (I’ve called mine localsantaqueue):

I would also download the Service Bus Explorer (I’ll be using it later for testing).

GCP Config

We already have a Dataflow, a PubSub topic and a BigQuery dataset, so GCP should require no further configuration, except to ensure the permissions are correct.

The Service Account user (which I give more details of here) needs to have PubSub permissions. For now, we’ll make it an editor, although in this instance, it probably only needs publish rights:

Test

We can do a quick test using the Service Bus Explorer and publish a message to the queue:

The ultimate test is that we can then see this in the BigQuery Table:

Lastly, the Function

This won’t be a completely function-free post. The last step is to create a function that adds a message to the queue:

[FunctionName("Function1")]
public static HttpResponseMessage Run(
    [HttpTrigger(AuthorizationLevel.Function, "post")]HttpRequestMessage req,             
    TraceWriter log,
    [ServiceBus("localsantaqueue")] ICollector<string> queue)
{
    log.Info("C# HTTP trigger function processed a request.");
    var parameters = req.GetQueryNameValuePairs();
    string childName = parameters.First(a => a.Key == "childName").Value;
    string present = parameters.First(a => a.Key == "present").Value;
    string json = "{{ 'childName': '{childName}', 'present': '{present}' }} ";            
    queue.Add(json);
    

    return req.CreateResponse(HttpStatusCode.OK);
}
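Once deployed, the imaginary Xamarin app (or anything else) can call it with a request along these lines (the host name, function key and values are placeholders):

POST https://<your-function-app>.azurewebsites.net/api/Function1?code=<function-key>&childName=Alice&present=Bicycle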

So now we have an endpoint for our imaginary Xamarin app to call into.

Summary

Both GCP and Azure are relatively immature platforms for this kind of interaction. The GCP client libraries seem to be missing functionality (and GCP is still heavily weighted away from .Net). The Azure libraries (especially Functions) seem to be in a pickle, too – with strange dependencies that make it very difficult to communicate outside of Azure. As a result, this task (which should have taken an hour or so) took a great deal of time, and it was completely unnecessary.

Having said that, it is clearly possible to link the two systems, if a little long-winded.

References

https://blog.falafel.com/rest-google-cloud-pubsub-with-oauth/

https://github.com/Azure/azure-functions-vs-build-sdk/issues/107

https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-service-bus

https://stackoverflow.com/questions/48092003/adding-to-a-queue-using-an-azure-function-in-c-sharp/48092276#48092276

Working with Multiple Cloud Providers – Part 2 – Getting Data Into BigQuery

In the first post of this series, I described how we might attempt to help Santa and his delivery drivers to deliver presents to every child in the world, using the combined power of Google and Microsoft.

In this, the second part of the series (there will be one more), I’m going to describe how we might set up a GCP pipeline that feeds that data into BigQuery (Google’s big data warehouse offering). We’ll first set up BigQuery, then the PubSub topic, and finally, we’ll set up the Dataflow, ready for Part 3, which will join the two systems together.

BigQuery

Once you navigate to the BigQuery section of the GCP console, you’ll be able to create a Dataset:

You can now set-up a new table. As this is an illustration, we’ll keep it as simple as possible, but you can see that this might be much more complex:
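For this example, the schema only needs to mirror the two fields that the Azure side will send (assumed here; a real table would likely carry timestamps, depot IDs and so on):

childName    STRING    NULLABLE
present      STRING    NULLABLE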

One thing to bear in mind about BigQuery, and cloud data storage in general is that, often, it makes sense to de-normalise your data – storage is often much cheaper than CPU time.

PubSub

Now that we have somewhere to put the data, we could simply have the Azure function write it directly into BigQuery. However, we might then run into problems if the data flow suddenly spiked. For this reason, Google recommends the use of PubSub as a shock absorber.

Let’s create a PubSub topic. I’ve written in more detail on this here:
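If you prefer the command line to the console, creating the topic is a one-liner with gcloud (the topic name is an assumption, matching the one used in Part 3):

gcloud pubsub topics create test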

DataFlow

The last piece of the jigsaw is Dataflow. Dataflow can be used for much more complex tasks than simply taking data from one place and putting it in another but, in this case, that’s all we need. Before we can set up a new Dataflow job, we’ll need to create a storage bucket:

We’ll create the bucket as Regional for now:

Remember that the bucket name must be unique (so no-one can ever pick pcm-data-flow-bucket again!)

Now, we’ll move onto the DataFlow itself. We get a number of dataflow templates out of the box; and we’ll use one of those. Let’s launch dataflow from the console:

Here we create a new Dataflow job:

We’ll pick “PubSub to BigQuery”:

You’ll then get asked for the name of the topic (which was created earlier) and the storage bucket (again, created earlier); your form should look broadly like this when you’re done:
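For reference, the values the template asks for are broadly these (the project, dataset and table names are placeholders):

Cloud Pub/Sub input topic:   projects/<project-id>/topics/test
BigQuery output table:       <project-id>:<dataset>.<table>
Temporary location:          gs://pcm-data-flow-bucket/tmp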

I strongly recommend specifying a maximum number of workers, at least while you’re testing.

Testing

Finally, we’ll test it. PubSub allows you to publish a message:
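You can do this from the console, or with something like the following gcloud command (the message values are made up):

gcloud pubsub topics publish test --message '{"childName": "Alice", "present": "Bicycle"}'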

Next, visit the Dataflow to see what’s happening:

Looks interesting! Finally, in BigQuery, we can see the data:

Summary

We now have the two separate cloud systems functioning independently. Step three will be to join them together.

Working with Multiple Cloud Providers – Part 1 – Azure Function

Regular readers (if there are such things for this blog) may have noticed that I’ve recently been writing a lot about two main cloud providers. I won’t link to all the articles, but if you’re interested, a quick search for either Azure or Google Cloud Platform will yield several results.

Since it’s Christmas, I thought I’d do something a bit different and try to combine them. This isn’t completely frivolous; both have advantages and disadvantages: GCP is very geared towards big data, whereas the Azure Service Fabric provides a lot of functionality that might fit well with a much smaller LOB app.

So, what if we had the following scenario:

Santa has to deliver presents to every child in the world in one night. Santa is only one man* and Google tells me there are 1.9B children in the world, so he contracts out a series of delivery drivers. That means around 79M deliveries every hour, assuming each delivery driver can work 24 hours**. If each driver can deliver, say, 100 presents per hour, we need around 790,000 drivers. Every delivery driver has an app that links to their depot, recording deliveries, schedules, etc.

That would be a good app to write in, say, Xamarin, and maybe have an Azure service running it; here’s the obligatory box diagram:

The service might talk to the service bus, might control stock, send e-mails – all kinds of LOB jobs. Now, I’m not saying for a second that Azure can’t cope with this, but what if we suddenly want all of these instances to feed metrics into a single data store? There are 190*** countries in the world; if each has a depot, then there are ~416K messages/hour going into each Azure service, but 79M/hour going into a single DB. Because it’s Christmas, let’s assume that Azure can’t cope with this, or let’s say that GCP is a little cheaper at this scale; or that we have some Hadoop jobs that we’d like to run on the data. In theory, we can link these systems, which might look something like this:

So, we have multiple instances of the Azure architecture, and they all feed into a single GCP service.

Disclaimer

At no point during this post will I attempt to publish 79M records / hour to GCP BigQuery. Neither will any Xamarin code be written or demonstrated – you have to use your imagination for that bit.

Proof of Concept

Given the disclaimer I’ve just made, calling this a proof of concept seems a little disingenuous; but let’s imagine that we know that the volumes aren’t a problem and concentrate on how to link these together.

Azure Service

Let’s start with the Azure service. We’ll create an Azure function that accepts an HTTP message, updates a DB and then posts a message to Google PubSub.

Storage

For the purpose of this post, let’s store our individual instance data in Azure Table Storage. I might come back at a later date and work out how and whether it would make sense to use CosmosDB instead.

We’ll set-up a new table called Delivery:

Azure Function

Now that we have somewhere to store the data, let’s create an Azure Function App that updates it. In this example, we’ll create a new Function App from VS:

In order to test this locally, change local.settings.json to point to your storage location described above.
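Mine looks broadly like this (connection strings elided; the santa_azure_table_storage name matches the Connection property on the Table binding below):

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "DefaultEndpointsProto . . .",
    "AzureWebJobsDashboard": "DefaultEndpointsProto . . .",
    "santa_azure_table_storage": "DefaultEndpointsProto . . ."
  }
}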

And here’s the code to update the table:


    public static class DeliveryComplete
    {
        [FunctionName("DeliveryComplete")]
        public static HttpResponseMessage Run(
            [HttpTrigger(AuthorizationLevel.Function, "post", Route = null)]HttpRequestMessage req, 
            TraceWriter log,            
            [Table("Delivery", Connection = "santa_azure_table_storage")] ICollector<TableItem> outputTable)
        {
            log.Info("C# HTTP trigger function processed a request.");
 
            // parse query parameter
            string childName = req.GetQueryNameValuePairs()
                .FirstOrDefault(q => string.Compare(q.Key, "childName", true) == 0)
                .Value;
 
            string present = req.GetQueryNameValuePairs()
                .FirstOrDefault(q => string.Compare(q.Key, "present", true) == 0)
                .Value;            
 
            var item = new TableItem()
            {
                childName = childName,
                present = present,                
                RowKey = childName,
                PartitionKey = childName.First().ToString()                
            };
 
            outputTable.Add(item);            
 
            return req.CreateResponse(HttpStatusCode.OK);
        }
 
        public class TableItem : TableEntity
        {
            public string childName { get; set; }
            public string present { get; set; }
        }
    }

Testing

There are two ways to test this. The first is to just press F5: that will launch the function as a local service, and you can use Postman or similar to test it; the alternative is to deploy to the cloud. If you choose the latter, then your local.settings.json will not come with you, so you’ll need to add an app setting:

Remember to save this setting; otherwise, you’ll get an error saying that it can’t find your setting, and you won’t be able to work out why – ask me how I know!

Now, if you run a test …
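For a local run, the request looks something like this (the port is the local Functions host default, and the values are made up):

POST http://localhost:7071/api/DeliveryComplete?childName=Alice&present=Bicycle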

You should be able to see your table updated (shown here using Storage Explorer):

Summary

We now have a working Azure function that updates a storage table with some basic information. In the next post, we’ll create a GCP service that pipes all this information into BigQuery, and then link the two systems.

Footnotes

* Remember, all the guys in Santa suits are just helpers.
** That brandy you leave out really hits the spot!
*** I just Googled this – it seems a bit low to me, too.

References

https://docs.microsoft.com/en-us/azure/azure-functions/functions-how-to-use-azure-function-app-settings#manage-app-service-settings

https://anthonychu.ca/post/azure-functions-update-delete-table-storage/

https://stackoverflow.com/questions/44961482/how-to-specify-output-bindings-of-azure-function-from-visual-studio-2017-preview