
Adding to an Existing Azure Blob

In a previous post, I briefly covered the concept of Storage Accounts and Blob Storage; however, there is more to blobs than that simple use case. In this post, I’ll explore creating a blob file from a text stream, and then adding to that file.

As is stated in the post referenced above, Azure provides a facility for storing files in what are known as Azure Blobs.

In order to upload a file to a blob, you need a storage account, and a container. Setting these up is a relatively straightforward process and, again, is covered in the post above.

Our application here will take the form of a simple console app that will prompt the user for some text, and then add it to the file in Azure.

Set-up

Once you’ve set up your console app, you’ll need the Azure Storage NuGet package.
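The post doesn’t name the package, but the types used in the snippets below (CloudStorageAccount, CloudBlobClient, CloudBlockBlob and so on) live in the WindowsAzure.Storage package, so that’s the one assumed here, along with using directives along these lines:

// Assumed using directives for the snippets in this post (WindowsAzure.Storage package)
using System;
using System.Collections.Generic;
using System.Configuration;                  // ConfigurationManager (needs a reference to System.Configuration)
using System.IO;                             // MemoryStream, StreamWriter
using System.Linq;
using System.Text;                           // ASCIIEncoding
using Microsoft.WindowsAzure.Storage;        // CloudStorageAccount
using Microsoft.WindowsAzure.Storage.Blob;   // CloudBlobClient, CloudBlockBlob, CloudAppendBlob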

Also, add the connection string to your storage account into the app.config:

<connectionStrings>
    <add name="Storage" connectionString="DefaultEndpointsProtocol=https;AccountName=testblob;AccountKey=wibble/dslkdsjdljdsoicj/rkDL7Ocs+aBuq3hpUnUQ==;EndpointSuffix=core.windows.net"/>
</connectionStrings>

Here’s the basic code for the console app:

static void Main(string[] args)
{
    Console.Write("Please enter text to add to the blob: ");
    string text = Console.ReadLine();
 
    UploadNewText(text);
 
    Console.WriteLine("Done");
    Console.ReadLine();
}

I’ll bet you’re glad I posted that; otherwise, you’d have been totally lost. The following snippets are possible implementations of the method UploadNewText().

Uploading to BlockBlob

The following code will upload a file to a blob container:

string connection = ConfigurationManager.ConnectionStrings["Storage"].ConnectionString;
string fileName = "test.txt";
string containerString = "mycontainer";
 
using (MemoryStream stream = new MemoryStream())
using (StreamWriter sw = new StreamWriter(stream))
{
    // Write the text into an in-memory stream
    sw.Write(text);
    sw.Flush();
    stream.Position = 0;
 
    CloudStorageAccount storage = CloudStorageAccount.Parse(connection);
    CloudBlobClient client = storage.CreateCloudBlobClient();
    CloudBlobContainer container = client.GetContainerReference(containerString);
    CloudBlockBlob blob = container.GetBlockBlobReference(fileName);

    // Upload the stream's contents; this creates the blob, or overwrites it if it already exists
    blob.UploadFromStream(stream);
}

(note that the name of the container in this code is case sensitive)

If we have a look at the storage account, a text file has, indeed, been created:

New Blob

But, what if we want to add to that? Well, running the same code again will work, but it will replace the existing file. To prove that, I’ve changed the text to “Test data 2” and run it again:

Test Data

So, how do we update the file? One possibility is to download the existing contents, add to them, and upload the whole thing again; that would look something like this:

string connection = ConfigurationManager.ConnectionStrings["Storage"].ConnectionString;
string fileName = "test.txt";
string containerString = "mycontainer";
 
CloudStorageAccount storage = CloudStorageAccount.Parse(connection);
CloudBlobClient client = storage.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference(containerString);
CloudBlockBlob blob = container.GetBlockBlobReference(fileName);
 
using (MemoryStream stream = new MemoryStream())
{
    // Download the existing contents; this leaves the stream position at the end,
    // so anything written next is effectively appended
    blob.DownloadToStream(stream);
 
    using (StreamWriter sw = new StreamWriter(stream))
    {
        sw.Write(text);
        sw.Flush();

        // Rewind and re-upload the whole stream (old contents plus the new text)
        stream.Position = 0;
 
        blob.UploadFromStream(stream);
    }
}

This obviously means two round trips to the server, which isn’t the best thing in the world. Another possible option is to use the Append Blob…

Azure Append Blob Storage

There is a blob type that allows you to append to it without having to download the existing contents first; for example:

string connection = ConfigurationManager.ConnectionStrings["Storage"].ConnectionString;
string fileName = "testAppend.txt";
string containerString = "mycontainer";
 
CloudStorageAccount storage = CloudStorageAccount.Parse(connection);
CloudBlobClient client = storage.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference(containerString);
CloudAppendBlob blob = container.GetAppendBlobReference(fileName);
if (!blob.Exists()) blob.CreateOrReplace();
 
using (MemoryStream stream = new MemoryStream())
using (StreamWriter sw = new StreamWriter(stream))
{
    sw.Write("Test data 4");
    sw.Flush();
    stream.Position = 0;
 
    blob.AppendFromStream(stream);                
}

There are a few things to note here:

  • The reason that I changed the name of the blob is that you can’t append to an existing BlockBlob (at least not via an AppendBlob reference); the blob has to have been created as an append blob in the first place.
  • While UploadFromStream will just create the file if it doesn’t exist, with an append blob you need to create it explicitly (hence the Exists / CreateOrReplace check above).
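Incidentally, if all you want to append is text, CloudAppendBlob also exposes AppendText (and AppendTextAsync, which is the method mentioned in the Stack Overflow link in the references), which avoids the MemoryStream plumbing; a minimal sketch, assuming the same container and connection string as above:

string connection = ConfigurationManager.ConnectionStrings["Storage"].ConnectionString;
 
CloudStorageAccount storage = CloudStorageAccount.Parse(connection);
CloudBlobClient client = storage.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference("mycontainer");
CloudAppendBlob blob = container.GetAppendBlobReference("testAppend.txt");
 
// As before, the append blob needs to exist before we can append to it
if (!blob.Exists()) blob.CreateOrReplace();
 
// Appends the text as a new block at the end of the blob
blob.AppendText(text);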

PutBlock

The final alternative here is to use PutBlock. This can bridge the gap by allowing the addition of blocks to an existing block blob. However, you either need to maintain the block ID list manually, or download the existing block list; here’s an example of creating, or adding to, a file using the PutBlock method:

string connection = ConfigurationManager.ConnectionStrings["Storage"].ConnectionString;
string fileName = "test4.txt";
string containerString = "mycontainer";
 
CloudStorageAccount storage = CloudStorageAccount.Parse(connection);
CloudBlobClient client = storage.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference(containerString);
CloudBlockBlob blob = container.GetBlockBlobReference(fileName);
 
ShowBlobBlockList(blob);
 
using (MemoryStream stream = new MemoryStream())
using (StreamWriter sw = new StreamWriter(stream))
{
    sw.Write(text);
    sw.Flush();
    stream.Position = 0;
 
    // Generate a block ID from the number of seconds since an arbitrary date,
    // then base64-encode it (block IDs must be base64-encoded strings)
    double seconds = (DateTime.Now - new DateTime(2000, 1, 1)).TotalSeconds;
    string blockId = Convert.ToBase64String(
        ASCIIEncoding.ASCII.GetBytes(seconds.ToString()));
 
    Console.WriteLine(blockId);
    //string blockHash = GetMD5HashFromStream(bytes);                
 
    // Start from the blob's existing block list, if there is one
    List<string> newList = new List<string>();
    if (blob.Exists())
    {
        IEnumerable<ListBlockItem> blockList = blob.DownloadBlockList();
 
        newList.AddRange(blockList.Select(a => a.Name));
    }
 
    newList.Add(blockId);
 
    // Upload the new block, then commit the full list (the existing blocks plus the new one)
    blob.PutBlock(blockId, stream, null);
    blob.PutBlockList(newList.ToArray());
}

The code above owes a lot to the advice given on this Stack Overflow question.

In order to avoid conflicts in the Block Ids, I’ve used a count of seconds since an arbitrary date. Obviously, this won’t work in all cases. Further, it’s worth noting that the code above still does two trips to the server (it has to download the block list).

The commented MD5 hash allows you to provide some form of check on the data being valid, should you choose to use it.
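The helper itself isn’t shown in the post; a hypothetical version might look something like the following (as far as I’m aware, the contentMD5 argument to PutBlock expects a base64-encoded MD5 hash of the block’s contents):

// Hypothetical helper (not from the original post): returns the base64-encoded MD5
// hash of a stream, suitable for passing as the contentMD5 argument of PutBlock
private static string GetMD5HashFromStream(Stream stream)
{
    using (var md5 = System.Security.Cryptography.MD5.Create())
    {
        stream.Position = 0;
        byte[] hash = md5.ComputeHash(stream);
        stream.Position = 0;   // rewind so the stream can still be uploaded afterwards
        return Convert.ToBase64String(hash);
    }
}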

What is ShowBlobBlockList(blob)?

The following function will give some details relating to the existing blocks (it is shamelessly plagiarised from one of the references below):

public static void ShowBlobBlockList(CloudBlockBlob blockBlob)
{
    if (!blockBlob.Exists()) return;
 
    IEnumerable<ListBlockItem> blockList = blockBlob.DownloadBlockList(BlockListingFilter.All);
    int index = 0;
    foreach (ListBlockItem blockListItem in blockList)
    {
        index++;
        Console.WriteLine("Block# {0}, BlockID: {1}, Size: {2}, Committed: {3}",
            index, blockListItem.Name, blockListItem.Length, blockListItem.Committed);
    }
}

Summary

Despite blob storage being an established technology, these methods and techniques are sparsely documented on the web. Obviously, there are the Microsoft docs, and they are helpful but, unfortunately, not exhaustive.

References

https://stackoverflow.com/questions/33088964/append-to-azure-append-blob-using-appendtextasync-results-in-missing-data

https://docs.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs

http://www.c-sharpcorner.com/UploadFile/40e97e/windows-azure-blockblob-putblock-method/

https://docs.microsoft.com/is-is/rest/api/storageservices/put-block

https://www.red-gate.com/simple-talk/cloud/platform-as-a-service/azure-blob-storage-part-4-uploading-large-blobs/

https://stackoverflow.com/questions/46368954/can-putblock-be-used-to-append-to-an-existing-blockblob-in-azure

Creating a Basic Azure Web Job

In a previous article, I discussed the use of Azure Functions; however, Web Jobs perform a similar task. Azure Functions are effectively an abstraction on top of Web Jobs – meaning that, while you have more control when using Web Jobs, there’s a little more to do when writing them.

This article covers the basics of Web Jobs, and has a walk-through for creating a very simple task using one.

Create a new Web Job

Once you’ve created the project (there’s an Azure WebJob project template in Visual Studio), you’ll need to fill in the following values in the app.config:

<configuration>
  <connectionStrings>
    <!-- The format of the connection string is "DefaultEndpointsProtocol=https;AccountName=NAME;AccountKey=KEY" -->
    <!-- For local execution, the value can be set either in this config file or through environment variables -->
    <add name="AzureWebJobsDashboard" connectionString="" />
    <add name="AzureWebJobsStorage" connectionString="" />
  </connectionStrings>

These can both be the same value, but they refer to where Azure stores its data.

AzureWebJobsDashboard

This is the storage account used to store logs.

AzureWebJobsStorage

This is the storage account used to store whatever the application needs to function (for example: queues or tables). In the example below, it’s where the file will go.
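Filled in (with an obviously made-up account, in the format described in the comment above), the two entries might end up looking something like this:

<connectionStrings>
  <add name="AzureWebJobsDashboard" connectionString="DefaultEndpointsProtocol=https;AccountName=testblob;AccountKey=wibble/dslkdsjdljdsoicj/rkDL7Ocs+aBuq3hpUnUQ==" />
  <add name="AzureWebJobsStorage" connectionString="DefaultEndpointsProtocol=https;AccountName=testblob;AccountKey=wibble/dslkdsjdljdsoicj/rkDL7Ocs+aBuq3hpUnUQ==" />
</connectionStrings>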

Storage accounts can be set up from the Azure dashboard (more on this later):

A Basic Application

For this example, let’s take a file from blob storage and parse it, then write out the result in a log. Specifically, we’ll take an XML file, and write the number of nodes into a log; here’s the file:

<test>
    <myNode>
    </myNode>
    <myNode>
    </myNode>
</test>

I think we’ll probably be looking for a figure around 2.

Blob Storage

Before we can do anything with blob storage, we’ll need a new storage area; create a new storage account:

Set the storage account kind to “General purpose” (because we’re working with files); other than that, go with your gut.

Uploading

Once you’ve created the account, you’ll need to add a file – otherwise nothing will happen. You can do this in the web portal, or you can do it via a desktop utility that Microsoft provide: Storage Explorer.

I kind of expected this to take me to the web page mentioned… but it doesn’t! You have to navigate there manually:

http://storageexplorer.com

Install it… unless you want to upload your file using the web portal… in which case: don’t.

We can create a new container:

Now, we can see the storage account and any containers:

Now, you can upload a file from here (remember that you can do all this inside the Portal):

Once you’ve created this, go back and update the storage connection string (described above). You may also want to repeat the process for a dashboard storage area (or, as stated above, they can be the same).

Programmatically Downloading

Now we have a file in the directory, it can be downloaded via the WebJob; here’s a function that will download a file:

        public static async Task<string> GetFileContents(string connectionString, string containerString, string fileName)
        {
            CloudStorageAccount storage = CloudStorageAccount.Parse(connectionString);
            CloudBlobClient client = storage.CreateCloudBlobClient();
            CloudBlobContainer container = client.GetContainerReference(containerString);
            CloudBlob blob = container.GetBlobReference(fileName);

            MemoryStream ms = new MemoryStream();
            await blob.DownloadToStreamAsync(ms);
            ms.Position = 0;

            StreamReader sr = new StreamReader(ms);
            string contents = sr.ReadToEnd();
            return contents;
        }

The code to call this is here (note the commented out commands from the default WebJob Template):

        static void Main()
        {
            Console.WriteLine("Starting");

            var config = new JobHostConfiguration();

            if (config.IsDevelopment)
            {
                config.UseDevelopmentSettings();
            }

            //var host = new JobHost();

            string fileContents = AzureHelpers.GetFileContents(config.StorageConnectionString, "testblob", "test.xml").Result;
            Console.WriteLine(fileContents);

            // The following code ensures that the WebJob will be running continuously
            //host.RunAndBlock();

            Console.WriteLine("Done");
        }

Although this works (sort of – it doesn’t check for new files, and it would need to be run on a scheduled basis – “On Demand” in Azure terms), you don’t need it (at least not for jobs that react to files being uploaded to storage containers). WebJobs provide this functionality out of the box! The parameter of a blob-triggered function can be bound to a number of different types:

  • string
  • TextReader
  • Stream
  • ICloudBlob
  • CloudBlockBlob
  • CloudPageBlob
  • CloudBlobContainer
  • CloudBlobDirectory
  • IEnumerable<CloudBlockBlob>
  • IEnumerable<CloudPageBlob>

Here, we’ll use a BlobTrigger and accept a string. Moreover, doing it this way makes writing to the log much easier, as there’s injection of sorts (at least I’m assuming that’s what it’s doing). Here’s what the complete solution looks like in the new paradigm:

        public static void ProcessFile([BlobTrigger("testblob/{name}")] string fileContents, TextWriter log)
        {            
            XmlDocument xmlDoc = new XmlDocument();
            xmlDoc.LoadXml(fileContents);            
            log.WriteLine($"Node count: {xmlDoc.FirstChild.ChildNodes.Count}");
        }

The key thing to notice here is that the function is static and public (the class it’s in needs to be public, too – even if that’s the Program class). The WebJob framework uses reflection to work out which functions it needs to run.
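For the trigger to actually fire, the JobHost needs to be running; a sketch of what Main might look like with the trigger-based version, assuming the lines commented out in the earlier template code are restored (with the configuration passed to the JobHost):

        // Assumes the Microsoft.Azure.WebJobs namespace, as per the default WebJob template
        static void Main()
        {
            var config = new JobHostConfiguration();

            if (config.IsDevelopment)
            {
                config.UseDevelopmentSettings();
            }

            // With the host running, the SDK discovers ProcessFile via reflection and
            // calls it whenever a blob lands in the "testblob" container
            var host = new JobHost(config);
            host.RunAndBlock();
        }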

The other point to note is that I’m getting the parameter as a string – the list above details the other types you could have it as; for example, if you wanted to delete the blob afterwards, you’d probably want to use an ICloudBlob or something similar.
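As an illustration, a hypothetical variant of the function (the name is mine, and this isn’t the version used for the screenshots below) that binds to a CloudBlockBlob and deletes the blob once it has been processed might look like this:

        // Assumes System.Xml, System.IO, Microsoft.Azure.WebJobs and Microsoft.WindowsAzure.Storage.Blob
        public static void ProcessAndDeleteFile([BlobTrigger("testblob/{name}")] CloudBlockBlob blob, TextWriter log)
        {
            // Read the blob's contents as text, then count the nodes as before
            string fileContents = blob.DownloadText();

            XmlDocument xmlDoc = new XmlDocument();
            xmlDoc.LoadXml(fileContents);
            log.WriteLine($"Node count: {xmlDoc.FirstChild.ChildNodes.Count}");

            // Remove the blob now that it has been processed
            blob.DeleteIfExists();
        }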

Anyway, it works:

The log file

Remember the storage area that we specified for the dashboard earlier? You should now see some new containers created in that storage area:

This has created a number of directories, but the one that we’re interested in is “output-logs” in the “azure-webjobs-hosts” container:

And here’s the log itself:

References

https://docs.microsoft.com/en-us/azure/app-service-web/web-sites-create-web-jobs

https://stackoverflow.com/questions/36610952/azure-webjobs-vs-azure-functions-how-to-choose

https://stackoverflow.com/questions/27580264/where-do-i-get-the-azurewebjobsdashboard-connection-string-information

http://www.hanselman.com/blog/IntroducingWindowsAzureWebJobs.aspx

https://stackoverflow.com/questions/24286214/where-are-azure-webjobs-blobinput-and-bloboutput-classes

https://docs.microsoft.com/en-us/azure/app-service-web/websites-dotnet-webjobs-sdk-storage-blobs-how-to