Monthly Archives: December 2016

Microsoft Cognitive Services – Text Recognition

Recently at DDD North I saw a talk on MS cognitive services. This came back and sparked interest in me while I was looking at some TFS APIs (see later posts for why). However, in this post, I’m basically exploring what can be done with these services.

The Hype

  • Language: can detect the language that you pass
  • Topics: can determine the topic being discussed
  • Key Phrases: key points (which I believe may equate to nouns)
  • Sentiment: whether or not what you are saying is good or bad (I must admit, I don’t really understand that – but we can try some phrases to see what it comes up with)

For some reason that I can’t really understand, topics requires over 100 documents, and so I won’t be getting that to work, as I don’t have a text sample big enough. The examples that they give in marketing seem to relate to people booking and reviewing holidays; and it feels a lot like these services are overly skewed toward that particular purpose.

Set-up

Register here:

https://www.microsoft.com/cognitive-services/

Registration is free (although I believe you need a live account).

cog1

Client

The internal name for this at MS is Project Oxford. You don’t have to install the client libraries (because they are just service calls), but you get some objects and helpers if you do:

cog2

Cognition

The following code is largely plagiarised from the links at the bottom of this page:

Here’s the Main function:

var requestDocs = PopulateDocuments();
Console.WriteLine($"-=Requests=-");
foreach (var eachReq in requestDocs.Documents)
{
    Console.WriteLine($"Id: {eachReq.Id} Text: {eachReq.Text}");
}
Console.WriteLine($"-=End Requests=-");
 
string req = JsonConvert.SerializeObject(requestDocs);
 
MakeRequests(req);
Console.WriteLine("Hit ENTER to exit...");
Console.ReadLine();

PopulateDocuments just fills the RequestDocument collection with some test data:


private static LanguageRequest PopulateDocuments()
{
    LanguageRequest requestText = new Microsoft.ProjectOxford.Text.Language.LanguageRequest();
    requestText.Documents.Add(
        new Microsoft.ProjectOxford.Text.Core.Document()
        { Id = "One", Text = "The quick brown fox jumped over the hedge" });
    requestText.Documents.Add(
        new Microsoft.ProjectOxford.Text.Core.Document()
        { Id = "Two", Text = "March is a green month" });
    requestText.Documents.Add(
        new Microsoft.ProjectOxford.Text.Core.Document()
        { Id = "Three", Text = "When I press enter the program crashes" });
    requestText.Documents.Add(
        new Microsoft.ProjectOxford.Text.Core.Document()
        { Id = "4", Text = "Pressing return - the program crashes" });
    requestText.Documents.Add(
        new Microsoft.ProjectOxford.Text.Core.Document()
        { Id = "5", Text = "Los siento, no hablo Enspanol" });
 
    return requestText;
}

As you can see, I dropped some Spanish in there for the language detection. The MakeRequests method and its dependencies:


static async void MakeRequests(string req)
{
    using (var client = new HttpClient())
    {
        client.BaseAddress = new Uri(BaseUrl);
 
        // Request headers.
        client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", AccountKey);
        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
 
        // Request body. Insert your text data here in JSON format.
        byte[] byteData = Encoding.UTF8.GetBytes(req);
 
        // Detect key phrases:
        var uri = "text/analytics/v2.0/keyPhrases";
        string response = await CallEndpoint(client, uri, byteData);
        Console.WriteLine("Key Phrases");
        Console.WriteLine(ParseResponseGeneric(response));
 
        // Detect language:
        var queryString = HttpUtility.ParseQueryString(string.Empty);
        queryString["numberOfLanguagesToDetect"] = NumLanguages.ToString(CultureInfo.InvariantCulture);
        uri = "text/analytics/v2.0/languages?" + queryString;
        response = await CallEndpoint(client, uri, byteData);
        Console.WriteLine("Detect language");
        Console.WriteLine(ParseResponseLanguage(response));
 
        // Detect topic:
        queryString = HttpUtility.ParseQueryString(string.Empty);
        queryString["minimumNumberOfDocuments"] = "1";
        uri = "text/analytics/v2.0/topics?" + queryString;
        response = await CallEndpoint(client, uri, byteData);
        Console.WriteLine("Detect topic");
        Console.WriteLine(ParseResponseGeneric(response));
 
        // Detect sentiment:
        uri = "text/analytics/v2.0/sentiment";
        response = await CallEndpoint(client, uri, byteData);
        Console.WriteLine("Detect sentiment");
        Console.WriteLine(ParseResponseSentiment(response));
    }
}
private static string ParseResponseSentiment(string response)
{
    if (!string.IsNullOrWhiteSpace(response))
    {
        SentimentResponse resp = JsonConvert.DeserializeObject<SentimentResponse>(response);
        string returnVal = string.Empty;
 
        foreach (var doc in resp.Documents)
        {
            returnVal += Environment.NewLine +
                $"Sentiment: {doc.Id}, Score: {doc.Score}";
        }
 
        return returnVal;
    }
 
    return null;
}
 
private static string ParseResponseLanguage(string response)
{
    if (!string.IsNullOrWhiteSpace(response))
    {
        LanguageResponse resp = JsonConvert.DeserializeObject<LanguageResponse>(response);
        string returnVal = string.Empty;
        foreach(var doc in resp.Documents)
        {
            var detectedLanguage = doc.DetectedLanguages.OrderByDescending(l => l.Score).First();
            returnVal += Environment.NewLine +
                $"Id: {doc.Id}, " +
                $"Language: {detectedLanguage.Name}, " +
                $"Score: {detectedLanguage.Score}";
        }
        return returnVal;
    }
 
    return null;
}
 
private static string ParseResponseGeneric(string response)
{            
    if (!string.IsNullOrWhiteSpace(response))
    {
        return Environment.NewLine + response;                
    }
 
    return null;
}

The subscription key is given when you register (in the screen under “Set-up”). Keep an eye on the requests, too: 5000 seems like a lot, but when you’re testing, you might find you get through them faster than you expect.

Here’s the output:

cog3

Evaluation

So, the 5 phrases that I used were:

The quick brown fox jumped over the hedge

This is a basic sentence indicating an action.

The KeyPhrases API decided that the key points here were “hedge” and “quick brown fox”. It didn’t think that “jumped” was key to this sentence.

The Language API successfully worked out that it’s written in English.

The Sentiment API thought that this was a slightly negative statement.

March is a green month

This was a nonsense statement, but in a valid sentence structure.

The KeyPhrases API identified “green month” as being important, but not March.

The Language API successfully worked out that it’s written in English.

The Sentiment API thought this was a very positive statement.

When I press enter the program crashes

Again, a completely valid sentence, and with a view to my idea ultimate idea for this API.

The KeyPhrases API spotted “program crashes”, but not why. I found this interesting because it seems to conflict with the other phrases, which seemed to identify nouns only.

Again, the Language API knew this was English.

The sentiment API identified that this was a negative statement… which I think I agree with.

Pressing return – the program crashes

The idea here was, it’s basically the same sentence as above, but phrased differently.

The KeyPhrases API wasn’t fooled, and returned the same key phrase – this is good.

Still English, according to the Language API.

This is identified as a negative statement again, but oddly, not as negative as the previous one.

Los siento, no hablo Enspanol

I threw in a Spanish phrase because I felt the Language API hadn’t had much of a run.

The KeyPhrase API pulled out “hablo Espanol”, which based on my very rudimentary Spanish, means the opposite of that was said.

It was correctly identified as Spanish by the Language API.

The Sentiment API identified it as the most negative statement. Perhaps because it has the word “sorry” and “no” in it?

References

Sample code:

https://text-analytics-demo.azurewebsites.net/Home/SampleCode

https://elbruno.com/2016/04/13/cognitiveservices-text-analytics-api-new-operation-detect-key-topics-in-documents/

https://mrfoxsql.wordpress.com/2016/09/13/azure-cognitive-services-apis-with-sql-server-2016-clr/

Manually Parsing a JSON String Using JSON.NET

How to manually parse a JSON string using JSON.NET.

Disclaimer

If you jump straight to the references, you will find a very similar set of information, and I strongly encourage people to do so. Additionally, this is probably not the most efficient way to achieve this.

Right, on with the show

Here’s the string that I’ll be parsing, and a little code stolen directly from the link at the bottom to show what it looks like:

static void Main(string[] args)
{
    string json = "{\"documents\":[{\"keyPhrases\":[\"Test new bug\"],\"id\":\"1\"}],\"errors\":[]}";
    JsonTextReader reader = new JsonTextReader(new StringReader(json));
    while (reader.Read())
    {
        if (reader.Value != null)
        {
            Console.WriteLine("Token: {0}, Value: {1}", reader.TokenType, reader.Value);
        }
        else
        {
            Console.WriteLine("Token: {0}", reader.TokenType);
        }
    }
 
    Console.ReadLine();
}

The output for this looks like:

Using this, it’s easier to create a routine to manually parse this. Each object can be tracked by using the Start and EndObject tags. Here’s my unit test to check this works:

[TestMethod]
public void TestJSONParse()
{
    // Arrange
    string json = "{\"documents\":[{\"keyPhrases\":[\"Test new bug\"],\"id\":\"1\"}],\"errors\":[]}";
    // Act
    var result = JsonHelper.ParseResponse(json);
 
    // Assert
    Assert.AreEqual(1, result.Count());
    Assert.AreEqual(1, result.Keys.First());
    string expectedPhrase = result.Values.First().First().ToString();
    Assert.AreEqual("Test new bug", expectedPhrase, false);
}

And here’s the code itself:

/// <summary>
/// Parse the following JSON
/// {"documents":[{"keyPhrases":["Test new bug"],"id":"1"}],"errors":[]}
/// </summary>
/// <param name="response"></param>
/// <returns></returns>
public static Dictionary<int, List<string>> ParseResponse(string response)
{
    Dictionary<int, List<string>> dict = new Dictionary<int, List<string>>();
    object readerValue;
 
    if (!string.IsNullOrWhiteSpace(response))
    {
        JsonTextReader reader = new JsonTextReader(new StringReader(response));
        int? currentValue = null;
        List<string> currentList = null;
 
        while (reader.Read())
        {
            readerValue = reader.Value;
 
            switch (reader.TokenType)
            {
                case JsonToken.PropertyName:                            
                    if (readerValue.ToString() == "id")
                    {
                        reader.Read();
                        currentValue = int.Parse(reader.Value.ToString());                                
                    }
                    else if (readerValue.ToString() == "keyPhrases")
                    {
                        // Do nothing
                    }
                    else if (readerValue.ToString() == "errors")
                    {
                        currentValue = null;
                    }
                    break;
 
                case JsonToken.String:                            
                    currentList.Add(reader.Value.ToString());
                    break;
 
                case JsonToken.StartArray:
                    currentList = new List<string>();
                    break;
 
                case JsonToken.StartObject:
                    currentList = null;
                    currentValue = null;
                    break;
 
                case JsonToken.EndObject:
                    if (currentValue.HasValue)
                    {
                        dict.Add(currentValue.Value, currentList);
                    }
                    break;
            }
        }
    }
 
    return dict;
}

It is messy, and it is error prone, and it would be better done by creating classes and serialising it; however, I’d never attempted to do this manually before, and it’s generally nice to do things the hard way, that way, you can appreciate what you get from these tools.

References

http://www.newtonsoft.com/json/help/html/ReadJsonWithJsonTextReader.htm

Finding Duplicate Values in a Dictionary Using C#

Due to a series of blog posts that I’m writing on TFS and MS Cognitive Services, I came across a requirement to identify duplicate values in a dictionary. For example, imagine you had an actual physical dictionary, and you wanted to find all the words that meant the exact same thing. Here’s the set-up for the test:

Dictionary<int, string> test = new Dictionary<int, string>()
{
    { 1, "one"},
    { 2, "two" },
    { 3, "one" },
    { 4, "three" }
};
DisplayDictionary("Initial Collection", test);

I’m outputting to the console at every stage, so here’s the helper method for that:


private static void DisplayDictionary(string title, Dictionary<int, string> test)
{
    Console.WriteLine(title);
    foreach (var it in test)
    {
        Console.WriteLine($"Key: {it.Key}, Value: {it.Value}");
    }
}

Finding Duplicates

LINQ has a special method for this, it’s Intersect. For flat collections, this works excellently, but no so well for Dictionaries; here was my first attempt:


Dictionary<int, string> intersect = test.Intersect(test)
    .ToDictionary(i => i.Key, i => i.Value);
DisplayDictionary("Intersect", intersect);

As you can see, the intersect doesn’t work very well this time (don’t tell Chuck).

Manual Intersect

The next stage then is to roll your own; a pretty straightforward lambda in the end:


var intersect2 = test.Where(i => test.Any(t => t.Key != i.Key && t.Value == i.Value))
    .ToDictionary(i => i.Key, i => i.Value);
DisplayDictionary("Manual Intersect", intersect2);
 

This works much better.

Change a TFS Work Item Definition

Let’s assume that the built-in TFS standard templates are not sufficient for you. You’re in luck: TFS allows you to create your own custom work item. Let’s imagine that, for some reason, you want a work item type called “Defect”, rather than “Bug”. Here’s the process, based on the “Bug” work item type.

First thing is to open the command prompt in administrator mode, and navigate to a work directory; for example, the “Documents” folder
.
Then export the template work item type, like so (you can export the entire definition of all work items, but it becomes unmanageable):

witadmin exportwitd /collection:http://tlaptop:8080/tfs/DefaultCollection /p:"TFSSandbox" /n:Bug /f:Defect.xml

customwi1

Check your working directory:

customwi2

Now, open the XML file. I used Notepad++, but you can use Notepad or vi, or whatever. To change the work item, change the name attribute; once you’ve done this, you have a new work item type:

customwi3

Now, you can add the fields that you want. I’m adding a field called “pcm.CustomField”:

<?xml version="1.0" encoding="utf-8"?>
<witd:WITD application="Work item type editor" version="1.0" xmlns:witd="http://schemas.microsoft.com/VisualStudio/2008/workitemtracking/typedef">
  <WORKITEMTYPE name="Defect">
    <DESCRIPTION>Customised defect</DESCRIPTION>
    <FIELDS>
      <FIELD name="Iteration Path" refname="System.IterationPath" type="TreePath" reportable="dimension">
        <HELPTEXT>The iteration within which this bug will be fixed</HELPTEXT>
      </FIELD>
	  <FIELD name="New Field" refname="pcm.CustomField" type="Integer" />
      <FIELD name="Iteration ID" refname="System.IterationId" type="Integer" />

The field name attribute (“New Field” in this case) tells TFS how to refer to this field to the user. This is important, because if you forget that you’ve called it “New Field”, you might assume this hasn’t worked and start googling to find out why.

Now you’ve told TFS that you have a field to store; the next step is to add that field to the layout:

    <FORM>
      <Layout HideReadOnlyEmptyFields="true" HideControlBorders="true">
        <Group Margin="(4,0,0,0)">
          <Column PercentWidth="90">
            <Control FieldName="System.Title" Type="FieldControl" ControlFontSize="large" EmptyText="&lt;Enter title here&gt;" />
          </Column>
          <Column PercentWidth="10">
            <Control FieldName="System.ID" Type="FieldControl" ControlFontSize="large" />
          </Column>
        </Group>
     <Group Margin="(10,0,0,0)">
          <Column PercentWidth="60">
            <Control FieldName="pcm.CustomField" Type="FieldControl" Label="Custom Field Here" ControlFontSize="large" EmptyText="&lt;Custom field here&gt;" />
          </Column>		
    </Group>
        <Group Margin="(10,0,0,0)">

As you can see, we can move this about and pretty much show anything we want here. Next, re-import:

witadmin importwitd /collection:http://tlaptop:8080/tfs/DefaultCollection /p:TFSSandbox /f:Defect.xml

customwi5

Using the new Work Item Type

And now we have a new work item type:

customwi6

And the new field is available:

customwi7

Using The Field in a Query

customwi8

References

Tutorial on modifying Work Item States

Tutorial on customising TFS 2013

Detailed definition of the FIELD tag

Turotial on modifying the FIELD tag

Programmatically List Existing Tags Using The TFS API

Having looked into this for some time; I came up with the following method of extracting team project tags. I’m not for a minute suggesting this is the best way of doing this – but it does work. My guess is that it’s not a very scalable solution, as it’s doing a LOT of work.

As it was, I couldn’t find a way to directly query the tags, so instead, I’m going through all the work items, and picking the tags. I couldn’t even find a way to filter the work items that actually have tags; so here’s the query that I ended up with:

private static IEnumerable<string> GetAllDistinctWorkItemTags(string uri, string projectName)
{
    TfsTeamProjectCollection tfs;
 
    tfs = TfsTeamProjectCollectionFactory.GetTeamProjectCollection(new Uri(uri)); // https://mytfs.visualstudio.com/DefaultCollection
    tfs.Authenticate();
 
    var wis = new WorkItemStore(tfs);
 
    WorkItemCollection workItemCollection = wis.Query(
         " SELECT [System.Tags]" +
         " FROM WorkItems " +
         $" WHERE [System.TeamProject] = '{projectName}' ");                
 
    if (workItemCollection.Count == 0)
        return null;
 
    List<string> tags = new List<string>();
    foreach (WorkItem wi in workItemCollection)
    {
        if (string.IsNullOrWhiteSpace(wi.Tags)) continue;
 
        var splitTags = wi.Tags.Split(';');
        tags.AddRange(splitTags.ToList());                
    }
 
    return tags.Distinct();
}

From debugging, I strongly suspect that whatever you put in the “SELECT”, it returns the entire work item. I also, for the life of me, couldn’t work out a lambda query for parsing the tags.

The calling method is here:

public static IEnumerable<string> GetAllTags(string uri, string teamProject)
{
    var project = GetTeamProject(uri, teamProject);
    IEnumerable<string> tags = GetAllDistinctWorkItemTags(uri, teamProject);
 
    return tags;
}

I’ve listed GetTeamProject helper method before, but for the sake of completeness:


public static Project GetTeamProject(string uri, string name)
{
    TfsTeamProjectCollection tfs;
 
    tfs = TfsTeamProjectCollectionFactory.GetTeamProjectCollection(new Uri(uri)); // https://mytfs.visualstudio.com/DefaultCollection
    tfs.Authenticate();
 
    var workItemStore = new WorkItemStore(tfs);
    
    var project = (from Project pr in workItemStore.Projects
                       where pr.Name == name
                       select pr).FirstOrDefault();
    if (project == null)
        throw new Exception($"Unable to find {name} in {uri}");
 
    return project;
}

Here’s the output:

tags1

Notes on Tags

A couple of points on tags: firstly, tags seem to exist in a kind of transient state; that is, while something is tagged, the tag exists, but once you remove all instances of a tag (for example, if I removed “Tagtest1” from all work items in my team project, TFS would eventually (I believe after a couple of days) just delete the tag for me. Obviously, in my example, as soon as I did this, I would no longer find it. This might leave you thinking that there is a more efficient way of removing tags (that is, you should be able to access the transient store in some way).

The existence of this Visual Studio plug-in lends support to that idea. It allows you to maintain the tags within your team project. If you’re using tags in any kind of serious way then I’d strongly recommend that you try it.

Performance

This is doing a lot of (IMO) unnecessary work, so I tried a little performance test; using this post as a template, I created a lot of bugs:

tags2

As you can see, I created a random set of tags. One other point that I’m going to put here is that a TFS database with ~30K work items and no code whatsoever increases the size of the default collection DB to around 2GB:

tags3

Now I ran the GetAllTags with some timing on:

tags4

19 seconds, which seems like quite a reasonable speed to me for 13.5k tags.

Programmatically Create Test Case Steps

In this earlier post, I discussed how to create a test case via the TFS API. For my next trick, I’m going to create some test case steps.

The first thing you need is the TestManagement Client:

testcasestep1

Once this is installed, the rest is quite straightforward; here’s the method to create the steps:

private static void AddTestCaseSteps(string uri, Project project, int testCaseId, string[] steps)
{
    TfsTeamProjectCollection tfs;
 
    tfs = TfsTeamProjectCollectionFactory.GetTeamProjectCollection(new Uri(uri)); // https://mytfs.visualstudio.com/DefaultCollection
    tfs.Authenticate();
 
    ITestManagementService service = (ITestManagementService)tfs.GetService(typeof(ITestManagementService));
    ITestManagementTeamProject testProject = service.GetTeamProject(project);
    ITestCase testCase = testProject.TestCases.Find(testCaseId);
 
    foreach (string step in steps)
    {
        ITestStep newStep = testCase.CreateTestStep();
        newStep.Title = step;
        // newStep.ExpectedResult = "Expected result";
        testCase.Actions.Add(newStep); 
    }
 
    testCase.Save();
 
 
}

As you can see, the ITestManagementService does a lot of the work for you, and once you have the test case, the rest is straightforward. Here’s what the CreateNewTestCase looks like now:


private static ActionResult CreateNewTestCase(string uri, Project project, WorkItem testedWorkItem, string description, string assignee, string[] reproductionSteps)
{
    WorkItemType workItemType = project.WorkItemTypes["Test Case"];
 
    // Create the work item. 
    WorkItem newTestCase = new WorkItem(workItemType);
    newTestCase.Title = $"Test {testedWorkItem.Title}";
    newTestCase.Description = description;
    newTestCase.AreaPath = testedWorkItem.AreaPath;
    newTestCase.IterationPath = testedWorkItem.IterationPath;
    newTestCase.Fields["Assigned To"].Value = assignee;
 
    // Copy tags
    newTestCase.Fields["Tags"].Value = testedWorkItem.Fields["Tags"].Value;
 
    ActionResult result = CheckValidationResult(newTestCase);
    if (result.Success)
    {
        CreateTestedByLink(uri, testedWorkItem, result.Id);
        AddTestCaseSteps(uri, project, result.Id, reproductionSteps);
    }
 
    return result;
}

Finally, here’s the test case:

testcasestep2

References

https://msdn.microsoft.com/en-us/library/microsoft.teamfoundation.testmanagement.client.itestmanagementteamproject.aspx?f=255&MSPPError=-2147217396

http://blogs.microsoft.co.il/shair/2013/10/07/tfs-api-part-51-adding-test-step-amp-shared-step/

https://blogs.msdn.microsoft.com/visualstudioalm/2011/12/30/how-to-get-the-test-case-associated-with-the-unit-test/

GetFilesAsync is not returning all the files in the “My Documents” folder

The following code should return a list of all files in the “My Documents” folder:

    var files = await KnownFolders.DocumentsLibrary.GetFilesAsync();
    List<FileInformation> filesList = new List<FileInformation>();
    
    System.Diagnostics.Debug.WriteLine($"File count: {files.Count()}");

The returned count from this code is (in my case) 11. However, excluding folders, there are 61 files in that directory for me. When I iterate through the collection, it does indeed find 11.

After scratching my head for a while, I finally realised that the answer to my question was my own blog post. It only lists the allowed types.