Thursday, January 14, 2016

Prerequisites for Conditional Random Fields

When I first started out to learn CRFs there was a scarcity of material that I could consume. They were either too technical for my ability or were repeating what I already knew. This article hopes to bridge that gap and allow you to read material on CRFs. We assume knowledge of some math related to set theory and probability theory.

  1. Factorization of functions is when a function is shown to be the product of some other functions. 
  2. Probability
    1. It is the chance of something happening.
  3. Graphs
    1. A Graph is defined as a set of Vertices, Edges. The edges may be directed or undirected.
    2. A Bipartile graph is a one in which the vertices can be split into two groups where members of a group are note connected to any other member of that group.
    3. A factor graph is a Bipartile Graph representing the factorization of a function.
  4. Graphical Model
    1. Probabilistic Graphical Models are probabilistic models where a graph represents the conditional dependence structure between variables.
    2. Two branches of graphical representations of distributions are common.
      1. Bayesian networks
        1. Hidden Markov Models (HMMs) and Neural networks are special cases of Bayesian networks
      2. Markov networks/ Markov random fields
    3.  Markov networks may be cyclic and are undirected whereas Bayesian networks are directed and acyclic.
  5. Markov property
    1. At it's core the Markov property asserts memory-less-ness.
    2. It says that each observation must not be influenced by the past ones.
  6. Just as Capital sigma is used to denote the sum of a series, product of a series is denoted py PI.
Feel free to delve deeper into that body of knowledge before understanding what a CRF is. Understand what they represent before stepping onto this next lot.

A Conditional Random field is an undirected probabilistic graphical model. It is used to predict a set of classification labels for a set of inputs. Instead of considering a single input variable individually it considers the effect of neighbours.

A well worn example is the English sentence. Classifying the words of a sentence as 'verb', 'noun' etc while individually looking at the word is classification. A CRF would look at the neighbouring words too and thus predict the classification for the entire sentence together and not just one word at a time.

A CRF is a graphical model where the set of vertices can be split into two disjoint sets X and Y such that X is the set of inputs and Y is the set of outputs and then the conditional probabilities are modeled according to P(Y|X).

A very good example of CRF with python can be seen in this notebook (Jupyter Notebook).