While writing a series of blog posts on TFS and MS Cognitive Services, I came across a requirement to identify duplicate values in a dictionary. For example, imagine you had an actual physical dictionary, and you wanted to find all the words that meant the exact same thing. Here’s the set-up for the test:
Dictionary<int, string> test = new Dictionary<int, string>()
{
    { 1, "one" },
    { 2, "two" },
    { 3, "one" },
    { 4, "three" }
};
DisplayDictionary("Initial Collection", test);
I’m outputting to the console at every stage, so here’s the helper method for that:
private static void DisplayDictionary(string title, Dictionary<int, string> test)
{
    Console.WriteLine(title);
    foreach (var it in test)
    {
        Console.WriteLine($"Key: {it.Key}, Value: {it.Value}");
    }
}
Finding Duplicates
My first thought was LINQ’s Intersect method. It returns the set intersection of two sequences, which works excellently for flat collections, but not so well for finding duplicate values in a Dictionary; here was my first attempt:
Dictionary<int, string> intersect = test.Intersect(test)
    .ToDictionary(i => i.Key, i => i.Value);
DisplayDictionary("Intersect", intersect);
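The intersect doesn’t work very well this time (don’t tell Chuck): the output is all four entries again. Intersect compares each entry as a whole key/value pair, and since every key is unique, every pair matches itself and nothing is filtered out.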
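To see why Intersect is the wrong tool here, a small check like the one below may help. It is only a sketch: IntersectDemo and its two methods are hypothetical names made up for illustration, not part of the original code. Intersect is a set operation, so it returns the distinct elements common to both sequences rather than the elements that repeat.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class IntersectDemo
{
    // Intersects the dictionary with itself. Each element is a
    // KeyValuePair<int, string>, compared on key and value together,
    // so every entry survives.
    public static int CountSelfIntersect(Dictionary<int, string> source)
    {
        return source.Intersect(source).Count();
    }

    // Intersects only the values with themselves. The result is the set of
    // distinct values, which removes repeats instead of reporting them.
    public static int CountValueSelfIntersect(Dictionary<int, string> source)
    {
        return source.Values.Intersect(source.Values).Count();
    }
}
```

For the test dictionary above, CountSelfIntersect gives 4 and CountValueSelfIntersect gives 3, so neither form picks out the repeated "one".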
Manual Intersect
The next stage, then, is to roll your own. It turned out to be a pretty straightforward lambda in the end: keep every entry for which some other key holds the same value.
var intersect2 = test.Where(i => test.Any(t => t.Key != i.Key && t.Value == i.Value))
    .ToDictionary(i => i.Key, i => i.Value);
DisplayDictionary("Manual Intersect", intersect2);
This works much better: only keys 1 and 3, which both map to "one", are returned.
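The Where/Any query scans the whole dictionary once per entry, which is fine for a handful of items but could get slow on a large collection. As an alternative sketch (not the approach used above), grouping by value does the same job in a single pass. DuplicateFinder and FindDuplicateValues are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class DuplicateFinder
{
    // Returns only the entries whose value is shared with at least one other key.
    public static Dictionary<int, string> FindDuplicateValues(Dictionary<int, string> source)
    {
        return source
            .GroupBy(pair => pair.Value)        // bucket the entries by value
            .Where(group => group.Count() > 1)  // keep values that occur more than once
            .SelectMany(group => group)         // flatten back to key/value pairs
            .ToDictionary(pair => pair.Key, pair => pair.Value);
    }
}
```

Calling DuplicateFinder.FindDuplicateValues(test) on the dictionary above gives the same two entries as the manual intersect: keys 1 and 3.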