Prediction Algorithm

Jose Rios
July 08, 2018

Introduction

The goal of this product is to build a natural language processing model that will receive as input a simple text and predict what should be the next word.

This is just like the virtual keyboard in a cellphone, which suggest the next word given the ones already typed.

How the model works

Just give it a sentence in plain English, hit Enter and the model will return the most likely word that completes the sentence.

The model considers only two last two words to try and predict the continuation of the sentence.

Given that the purpose of this work is to build an application that will run on a cellphone, both the size of the predictive model and the runtime had to be drastically reduced. Therefore, only one word is provided.

How the model was built (1)

How the model was built (2)

  • for those shingles whose first two words are exactly the same, only the most common combination was kept
  • example: if “inside the box” occurs more frequently than “inside the magic” and “inside the car”, only the former was kept
  • shingles with a frequency smaller than a threshold, 0.1% of the total number of words, were disposed of.
  • lastly, only the 50,0000 most frequent shingles were used.
  • actually in the end, the final model is 43,927 shingles long

Examples of use

input: the whole process will take about…
output: minutes

input: I received this gift from an old…
output: friend

input: the school committee will soon address this…
output: issue

input: You should worry what you do with your own…
output: life