Shannon Entropy and .NET

Introduction

Hello, and welcome to my article. Sometimes, I wonder where all my ideas for my articles come from; at least I now know why I cannot sleep most evenings. My brain doesn’t switch off. It is a blessing and a curse.

Today, you will learn how to use the Shannon Entropy equation to measure how much information, and how much repetition, the data in your .NET applications contains.

Entropy

Entropy is defined in the context of a probabilistic model. For example, a fair coin flip has an entropy of 1 bit per flip. A source that only ever emits the letter A has an entropy of 0, because the next character is always an ‘A’ and therefore carries no new information.
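Both figures follow from the standard definition of entropy for a source whose symbols occur with probabilities p_i. The short derivation below assumes that the coin is fair, because 1 bit per flip only holds when heads and tails are equally likely:

```latex
H = -\sum_{i} p_i \log_2 p_i

\text{Fair coin: } H = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = 1 \text{ bit}

\text{Constant source: } H = -\left(1 \cdot \log_2 1\right) = 0 \text{ bits}
```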

Shannon Entropy

Claude Shannon’s entropy measures the information contained in a message, as opposed to the part of the message that is predictable. Sources of predictability include redundancy in language structure and the statistical frequencies of letters or word pairs. Shannon entropy gives the average minimum number of bits needed to encode each symbol of a string, based on the frequency of the symbols in that string.
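To make the "bits per symbol" idea concrete before building the full class, here is a minimal standalone sketch. The names EntropySketch, BitsPerSymbol, and MinimumBits are hypothetical and are not part of the project that follows. The sketch also assumes that a symbol is one byte of the UTF-8 encoded string and that every symbol is encoded independently of its neighbours.

```csharp
using System;
using System.Linq;
using System.Text;

public static class EntropySketch
{
    // Returns the Shannon entropy of the input in bits per byte.
    public static double BitsPerSymbol(byte[] data)
    {
        if (data == null || data.Length == 0)
        {
            return 0.0;
        }

        return data
            .GroupBy(b => b)                               // one group per distinct byte
            .Select(g => (double)g.Count() / data.Length)  // probability of that byte
            .Sum(p => p * Math.Log(1.0 / p, 2));           // p * log2(1 / p)
    }

    // Estimated lower bound, in bits, for encoding the whole string
    // one symbol at a time (assumption: UTF-8 bytes are the symbols).
    public static double MinimumBits(string text)
    {
        byte[] data = Encoding.UTF8.GetBytes(text);
        return BitsPerSymbol(data) * data.Length;
    }
}
```

Under these assumptions, a string such as "ABAB" has two equally likely symbols and works out to 1 bit per symbol, whereas "AAAA" works out to 0.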

Our Project

Create a new C# or Visual Basic.NET Windows Forms project. Once the default form has loaded, add one Button and one ListBox to it.

Code

Add a new class to your project and name it ShannonEntropy; then, add the necessary namespaces.

C#

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

VB.NET

Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq

Add the following fields.

C#

   SortedList<byte, int> slTimeSymbolAppears;

   SortedList<byte, double> slEntropy;

   double dblEntropy;

   bool blnUsed;

   int iSize;

VB.NET

   Private slTimeSymbolAppears As SortedList(Of Byte, Integer)
   Private slEntropy As SortedList(Of Byte, Double)
   Private dblEntropy As Double
   Private blnUsed As Boolean
   Private iSize As Integer

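slTimeSymbolAppears maps each byte value (symbol) to the number of times it appears in the input, and iSize holds the total number of bytes read. slEntropy holds each symbol's probability (its count divided by iSize), and dblEntropy holds the final result. blnUsed is True when new data has been added since the entropy was last calculated, so that the result is recalculated only when necessary. Add the Properties.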
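The bookkeeping behind these fields can be sketched on its own. The following is an illustration only: CountingSketch, CountSymbols, and ToProbabilities are hypothetical names, and the `out int` declaration assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;

public static class CountingSketch
{
    // Maps each byte value to the number of times it occurs in data.
    public static SortedList<byte, int> CountSymbols(byte[] data)
    {
        var counts = new SortedList<byte, int>();

        foreach (byte b in data)
        {
            if (counts.TryGetValue(b, out int seen))
            {
                counts[b] = seen + 1;   // symbol seen before: increment
            }
            else
            {
                counts.Add(b, 1);       // first occurrence of this symbol
            }
        }

        return counts;
    }

    // Converts raw counts to probabilities by dividing each count by total.
    public static SortedList<byte, double> ToProbabilities(
        SortedList<byte, int> counts, int total)
    {
        var probabilities = new SortedList<byte, double>();

        foreach (KeyValuePair<byte, int> pair in counts)
        {
            probabilities.Add(pair.Key, (double)pair.Value / total);
        }

        return probabilities;
    }
}
```

Note that a SortedList keeps its entries ordered by key, that is, by byte value and not by count.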

C#

   public int Size
   {

      get
      {
         return iSize;
      }

      private set
      {
         iSize = value;
      }

   }

   public int Unique
   {

      get
      {
         return slTimeSymbolAppears.Count;
      }

   }

   public double Entropy
   {

      get
      {
         return GetEntropy();
      }

   }

   public Dictionary<byte, int> Distribution
   {

      get
      {
         return SortedDistribution();
      }

   }

   public Dictionary<byte, double> Probability
   {

      get
      {
         return SortedProbability();
      }

   }

VB.NET

   Public Property Size As Integer

      Get

         Return iSize

      End Get

      Private Set(ByVal value As Integer)

         iSize = value

      End Set

   End Property

   Public ReadOnly Property Unique As Integer

      Get

         Return slTimeSymbolAppears.Count

      End Get

   End Property

   Public ReadOnly Property Entropy As Double

      Get

         Return GetEntropy()

      End Get

   End Property

   Public ReadOnly Property Distribution As Dictionary(Of Byte, _
         Integer)

      Get

         Return SortedDistribution()

      End Get

   End Property

   Public ReadOnly Property Probability As Dictionary(Of Byte, _
         Double)

      Get

         Return SortedProbability()

      End Get

   End Property

Add the rest of the Functions and the Constructor.

C#

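   public byte GreatestDistribution()
   {

      // The list is sorted by key (byte value), not by count, so
      // Keys[0] would be the smallest byte; search by count instead
      return slTimeSymbolAppears.OrderByDescending(p => p.Value)
         .First().Key;

   }

   public byte GreatestProbability()
   {

      // Recalculate if needed; slEntropy holds symbol probabilities
      GetEntropy();

      return slEntropy.OrderByDescending(p => p.Value).First().Key;

   }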

   public double SymbolDistribution(byte bSymbol)
   {

      return slTimeSymbolAppears[bSymbol];

   }

   public double SymbolEntropy(byte bSymbol)
   {

      return slEntropy[bSymbol];

   }

   public Dictionary<byte, int> SortedDistribution()
   {

      List<Tuple<int, byte>> lstEntries = new
         List<Tuple<int, byte>>();

      foreach (KeyValuePair<byte, int> e in slTimeSymbolAppears)
      {

         lstEntries.Add(new Tuple<int, byte>(e.Value, e.Key));

      }

      lstEntries.Sort();
      lstEntries.Reverse();

      Dictionary<byte, int> dicResult = new
         Dictionary<byte, int>();

      foreach (Tuple<int, byte> e in lstEntries)
      {

         dicResult.Add(e.Item2, e.Item1);

      }

      return dicResult;

   }

   public Dictionary<byte, double> SortedProbability()
   {

      List<Tuple<double, byte>> lstEntries = new
         List<Tuple<double, byte>>();

      foreach (KeyValuePair<byte, double> e in slEntropy)
      {

         lstEntries.Add(new Tuple<double, byte>(e.Value, e.Key));

      }

      lstEntries.Sort();
      lstEntries.Reverse();

      Dictionary<byte, double> dicResult = new
         Dictionary<byte, double>();

      foreach (Tuple<double, byte> e in lstEntries)
      {

         dicResult.Add(e.Item2, e.Item1);

      }

      return dicResult;

   }

   public double GetEntropy()
   {

      if (!blnUsed)
      {

         return dblEntropy;

      }

      dblEntropy = 0;
      slEntropy = new SortedList<byte, double>();

      foreach (KeyValuePair<byte, int> e in slTimeSymbolAppears)
      {

         slEntropy.Add(e.Key, (double)slTimeSymbolAppears[e.Key] /
            (double)iSize);

      }

      foreach (KeyValuePair<byte, double> e in slEntropy)
      {

         dblEntropy += e.Value * Math.Log((1 / e.Value), 2);

      }

      blnUsed = false;

      return dblEntropy;
   }

   public void GetBytes(byte[] bBytes)
   {
      // Check for null first; otherwise, Length throws on a null array
      if (bBytes == null || bBytes.Length < 1)
      {

         return;

      }

      blnUsed = true;

      iSize += bBytes.Length;

      foreach (byte bt in bBytes)
      {

         if (!slTimeSymbolAppears.ContainsKey(bt))
         {

            slTimeSymbolAppears.Add(bt, 1);

            continue;

         }

         slTimeSymbolAppears[bt]++;

      }
   }

   public void GetBytes(string strBytes)
   {

      GetBytes(StringToByteArray(strBytes));

   }

   byte[] StringToByteArray(string strInput)
   {

      // Cast<byte>() on a char[] throws InvalidCastException at run
      // time, so encode the string instead (UTF-8 is chosen here)
      return System.Text.Encoding.UTF8.GetBytes(strInput);

   }

   void Clear()
   {

      blnUsed = true;

      dblEntropy = 0;
      iSize = 0;

      slTimeSymbolAppears = new SortedList<byte, int>();
      slEntropy = new SortedList<byte, double>();

   }

   public ShannonEntropy(string fileName)
   {
      Clear();

      if (File.Exists(fileName))
      {

         GetBytes(File.ReadAllBytes(fileName));
         GetEntropy();
         SortedDistribution();

      }
   }

   public ShannonEntropy()
   {

      Clear();

   }

VB.NET

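   Public Function GreatestDistribution() As Byte

      ' The list is sorted by key (byte value), not by count, so
      ' Keys(0) would be the smallest byte; search by count instead
      Return slTimeSymbolAppears.OrderByDescending( _
         Function(p) p.Value).First().Key

   End Function

   Public Function GreatestProbability() As Byte

      ' Recalculate if needed; slEntropy holds symbol probabilities
      GetEntropy()

      Return slEntropy.OrderByDescending( _
         Function(p) p.Value).First().Key

   End Function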

   Public Function SymbolDistribution(ByVal bSymbol As Byte) _
         As Double

      Return slTimeSymbolAppears(bSymbol)

   End Function

   Public Function SymbolEntropy(ByVal bSymbol As Byte) As Double

      Return slEntropy(bSymbol)

   End Function

   Public Function SortedDistribution() As Dictionary(Of Byte, _
         Integer)

      Dim lstEntries As List(Of Tuple(Of Integer, Byte)) = New _
         List(Of Tuple(Of Integer, Byte))()

      For Each e As KeyValuePair(Of Byte, Integer) In _
            slTimeSymbolAppears

         lstEntries.Add(New Tuple(Of Integer, Byte)(e.Value, _
            e.Key))

      Next
      lstEntries.Sort()
      lstEntries.Reverse()

      Dim dicResult As Dictionary(Of Byte, Integer) = New _
         Dictionary(Of Byte, Integer)()

      For Each e As Tuple(Of Integer, Byte) In lstEntries

         dicResult.Add(e.Item2, e.Item1)

      Next

      Return dicResult

   End Function

   Public Function SortedProbability() As Dictionary(Of Byte, _
         Double)

      Dim lstEntries As List(Of Tuple(Of Double, Byte)) = New _
         List(Of Tuple(Of Double, Byte))()

      For Each e As KeyValuePair(Of Byte, Double) In slEntropy

         lstEntries.Add(New Tuple(Of Double, Byte)(e.Value, e.Key))

      Next

      lstEntries.Sort()
      lstEntries.Reverse()

      Dim dicResult As Dictionary(Of Byte, Double) = New _
         Dictionary(Of Byte, Double)()

      For Each e As Tuple(Of Double, Byte) In lstEntries

         dicResult.Add(e.Item2, e.Item1)

      Next

      Return dicResult

   End Function

   Public Function GetEntropy() As Double

      If Not blnUsed Then

         Return dblEntropy

      End If

      dblEntropy = 0
      slEntropy = New SortedList(Of Byte, Double)()

      For Each e As KeyValuePair(Of Byte, Integer) In _
            slTimeSymbolAppears

         slEntropy.Add(e.Key, CDbl(slTimeSymbolAppears(e.Key)) / _
            CDbl(iSize))

      Next

      For Each e As KeyValuePair(Of Byte, Double) In slEntropy

         dblEntropy += e.Value * Math.Log((1 / e.Value), 2)

      Next

      blnUsed = False

      Return dblEntropy

   End Function

   Public Sub GetBytes(ByVal bBytes As Byte())

      ' Check for Nothing first; otherwise, Length throws
      If bBytes Is Nothing OrElse bBytes.Length < 1 Then

         Return

      End If

      blnUsed = True
      iSize += bBytes.Length

      For Each bt As Byte In bBytes

         If Not slTimeSymbolAppears.ContainsKey(bt) Then

            slTimeSymbolAppears.Add(bt, 1)

            Continue For

         End If

         slTimeSymbolAppears(bt) += 1

      Next

   End Sub

   Public Sub GetBytes(ByVal strBytes As String)

      GetBytes(StringToByteArray(strBytes))

   End Sub

   Private Function StringToByteArray(ByVal strInput As String) _
         As Byte()

      ' Cast(Of Byte)() on a Char array throws InvalidCastException
      ' at run time, so encode the string instead (UTF-8 chosen here)
      Return System.Text.Encoding.UTF8.GetBytes(strInput)

   End Function

   Private Sub Clear()

      blnUsed = True
      dblEntropy = 0
      iSize = 0

      slTimeSymbolAppears = New SortedList(Of Byte, Integer)()

      slEntropy = New SortedList(Of Byte, Double)()

   End Sub

   Public Sub New(ByVal fileName As String)

      Clear()

      If File.Exists(fileName) Then

         GetBytes(File.ReadAllBytes(fileName))
         GetEntropy()
         SortedDistribution()

      End If

   End Sub

   Public Sub New()

      Clear()

   End Sub
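One caveat about SortedDistribution and SortedProbability: they copy the sorted entries into a Dictionary, and the .NET documentation describes the enumeration order of Dictionary<TKey, TValue> as undefined. If callers rely on the order, returning an ordered list is the safer choice. The following is a minimal sketch of that alternative, not the article's method; OrderingSketch and ByCountDescending are hypothetical names, and ties are broken by ascending byte value (the Sort and Reverse approach above breaks them in descending order).

```csharp
using System.Collections.Generic;
using System.Linq;

public static class OrderingSketch
{
    // Returns the (symbol, count) pairs ordered from the most frequent
    // symbol to the least frequent one. A List preserves this order.
    public static List<KeyValuePair<byte, int>> ByCountDescending(
        IDictionary<byte, int> counts)
    {
        return counts
            .OrderByDescending(pair => pair.Value)  // highest count first
            .ThenBy(pair => pair.Key)               // ties: lowest byte first
            .ToList();
    }
}
```

Because SortedList<byte, int> implements IDictionary<byte, int>, a field such as slTimeSymbolAppears could be passed to this method directly.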

Add the code for your Form.

C#

using System;
using System.Windows.Forms;

namespace ShannonEntropy_C
{
   public partial class Form1 : Form
   {
      // In a verbatim (@) string, backslashes are not escaped
      ShannonEntropy se = new
         ShannonEntropy(@"C:\Temp\TestFile.txt");

      public Form1()
      {
         InitializeComponent();
      }

      private void button1_Click(object sender, EventArgs e)
      {
         double ge = se.GetEntropy();

         listBox1.Items.Add(ge.ToString());
      }
   }
}

VB.NET

Public Class Form1

   Private se As ShannonEntropy = New _
      ShannonEntropy("C:\Temp\TestFile.txt")
   Private Sub button1_Click(sender As Object, e As EventArgs) _
         Handles button1.Click

      Dim ge As Double = se.GetEntropy()

      listBox1.Items.Add(ge.ToString())

   End Sub

End Class

When you click the button, the program calculates and displays the entropy. I have included the text file, but keep in mind that the path to it must be correct, and you might not have a Temp folder on your disk.
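If you would rather not depend on a fixed C:\Temp folder, one option is to place the test file in the current user's temporary folder. This is a sketch under assumptions: TestFileSketch and EnsureSampleFile are hypothetical names, and the sample text is whatever you choose to pass in.

```csharp
using System;
using System.IO;

public static class TestFileSketch
{
    // Returns the full path of fileName inside the current user's
    // temporary folder, creating the file with sampleText if it
    // does not exist yet. An existing file is left untouched.
    public static string EnsureSampleFile(string fileName, string sampleText)
    {
        string path = Path.Combine(Path.GetTempPath(), fileName);

        if (!File.Exists(path))
        {
            File.WriteAllText(path, sampleText);
        }

        return path;
    }
}
```

The returned path can then be passed to the ShannonEntropy constructor in place of the hard-coded string.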

Figure 1 shows the result.

Figure 1: Running

Conclusion

In this article, you have learned how entropy can be used to measure how repetitive, or how unpredictable, a set of values is. Until next time, happy coding!

Hannes DuPreez
Ockert J. du Preez is a passionate coder and always willing to learn. He has written hundreds of developer articles over the years detailing his programming quests and adventures. He has written the following books: Visual Studio 2019 In-Depth (BpB Publications) and JavaScript for Gurus (BpB Publications). He was the Technical Editor for Professional C++, 5th Edition (Wiley). He was a Microsoft Most Valuable Professional for .NET (2008–2017).
