Google BERT — An Inflection Point in the Field of NLP
- February 04, 2020
Deep learning has generated excitement among automation experts and organizations worldwide because of its potential in real-world applications. Many experts consider deep learning the next technological revolution, with the potential to solve complex use cases. In 2016, Google’s AlphaGo program, which was built using deep learning and neural networks, defeated legendary Go player Lee Se-dol at Go, a strategy game played on a 19 x 19 grid that had long been considered the uncrackable gold standard for AI programming.
What is deep learning?
The term “deep learning” refers to the use of artificial neural networks to carry out a form of advanced pattern recognition. In practice, a deep learning model stacks multiple layers of artificial neural networks and can be used in both supervised and unsupervised machine learning tasks. According to the McKinsey Global Institute, “The value a company could hope to gain from applying this technology ranges from 1 to 9 percent of its revenue.”
Natural Language Processing (NLP) – a key use case for deep learning
NLP enables conversation between humans and machines. With NLP, a machine can understand the unstructured text that humans use to converse, so people do not have to phrase their instructions in a structured format. With NLP-powered bots, an enterprise can not only automate specific tasks but also elevate the user experience by allowing users to converse in natural language. With the right NLP model in place, a system can describe photos accurately and translate text from one language to another. However, the performance and accuracy of an NLP model depend on the availability of large training data sets. Popular pre-training approaches, including Word2vec and Global Vectors for Word Representation (GloVe), are limited because they are context-independent: each word gets a single representation, so they have trouble capturing negation and the meaning of combinations of words.
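To make the context-independence limitation concrete, here is a minimal Python sketch. It assumes a tiny hand-made vector table (`STATIC_VECTORS`, invented purely for illustration; real Word2vec or GloVe vectors are learned from large text corpora and have many more dimensions). The only point is that a static lookup returns the same vector for a word regardless of the words around it.

```python
# Toy, hand-made vectors (an assumption for illustration only).
# Real Word2vec/GloVe vectors are learned from text, not written by hand.
STATIC_VECTORS = {
    "river": [0.9, 0.1],
    "money": [0.1, 0.9],
    "bank": [0.5, 0.5],
}


def embed(sentence):
    """Look up one fixed vector per word, ignoring the surrounding words."""
    return [STATIC_VECTORS[word] for word in sentence.lower().split()]


river_sentence = embed("river bank")
money_sentence = embed("money bank")

# "bank" is the second word in both sentences; compare its two vectors.
same_vector = river_sentence[1] == money_sentence[1]
print(same_vector)
```

Here "bank" receives the same vector next to "river" as it does next to "money", even though the two senses differ. That is the gap contextual models aim to close.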
Introducing Google BERT
To help close the pre-training gap, in November 2018, Google open-sourced a new technique for NLP pre-training called Bidirectional Encoder Representations from Transformers, or BERT. With this release, anyone in the world can train their own state-of-the-art question-answering system (or a variety of other models) in about 30 minutes on a single Cloud Tensor Processing Unit (TPU), or in a few hours using a single Graphics Processing Unit (GPU). BERT has shown ground-breaking results in many tasks such as question answering, natural language inference and paraphrase detection. Since it is openly available, it has become popular in the research community.
Key features of BERT
Since it was introduced in 2018, BERT has been a popular topic of discussion in the machine learning and NLP communities. Sources around the world have helped define its key features. Below is an ever-growing list of features that can have wide-ranging benefits for multiple industries.
- It is pre-trained on a large volume of English text, including 2.5 billion words from English Wikipedia, which helps computers understand natural human language.
- It uses word pieces (for example, playing -> play + ##ing) instead of whole words, so rare words share pieces with common ones and more training data is available for each piece.
- Pre-trained models are also available for languages other than English, including a multilingual model that covers more than 100 languages.
- It captures the context of a word by generating a different embedding for that word depending on the sentence in which it appears.
- BERT outperformed other state-of-the-art NLP systems. On the Stanford Question Answering Dataset (SQuAD) v1.1 test, BERT achieved a 93.2% F1 score (a measure of accuracy that combines precision and recall), surpassing the previous state-of-the-art score of 91.6% and the human-level score of 91.2%.
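The F1 figures in the last item above fold precision and recall into a single number. The sketch below shows a simplified token-overlap F1 for one predicted answer against one reference answer. The function name `token_f1` and the example strings are illustrative assumptions, and the official SQuAD evaluation does more than this: it also normalizes the text (for example, stripping punctuation and articles) and averages over all questions.

```python
from collections import Counter


def token_f1(prediction, ground_truth):
    """Simplified token-overlap F1 between a predicted and a reference answer."""
    pred_tokens = prediction.lower().split()
    truth_tokens = ground_truth.lower().split()

    # Count tokens shared by both answers (multiset intersection).
    common = Counter(pred_tokens) & Counter(truth_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)   # share of predicted tokens that are correct
    recall = overlap / len(truth_tokens)     # share of reference tokens that were found
    return 2 * precision * recall / (precision + recall)


# The prediction contains one extra word, so precision is 2/3 and recall is 1.
score = token_f1("in November 2018", "November 2018")
print(round(score, 2))  # 0.8
```

An exact match scores 1.0 and an answer with no shared tokens scores 0.0, so partially correct answers earn partial credit.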
BERT is an inflection point in the application of machine learning to NLP. With the availability of more pre-trained models, implementing NLP tasks will be less cumbersome. AI enthusiasts and developers can build services on top of these models to create a wide range of applications in the future.
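The word-piece splitting mentioned in the feature list above can be sketched as a greedy longest-match-first lookup, which is the general approach BERT's released tokenizer takes. The vocabulary below (`TOY_VOCAB`) is a hypothetical five-entry set made up for illustration. The real vocabulary has roughly 30,000 entries learned from data, and the real tokenizer also handles casing, punctuation and other details omitted here.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into pieces using greedy longest-match-first lookup."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word-internal piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary (an assumption, not BERT's real vocabulary).
TOY_VOCAB = {"play", "walk", "##ing", "##ed", "##er"}

print(wordpiece_tokenize("playing", TOY_VOCAB))  # ['play', '##ing']
print(wordpiece_tokenize("walked", TOY_VOCAB))   # ['walk', '##ed']
```

Because "play" appears as a piece of "playing", "played" and "player", everything the model learns about "play" is shared across all three words.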
Faster searches and faster resolutions
BERT shows promise to deliver faster searches and faster resolutions to end-users and agents. Currently, end-users and agents spend a considerable amount of time searching their knowledge bases for relevant information and potential solutions to a problem. Better delivery of relevant, contextual knowledge articles can improve satisfaction, user delight and productivity.
However, the NLP advances enabled by BERT are still at an early stage of maturity. At NTT DATA, we are experimenting with this technology to significantly change the way knowledge articles are searched, making the search results more relevant and contextual. With BERT, we hope to arrive at a better method for mapping search results to search intent. This means increasing the specificity of results and the support for long-tail queries. We want users and resolution teams to get faster access to information such as self-help, troubleshooting and task resolution, thereby elevating the user experience. NLP powered by BERT has the potential to reduce the average handle time of tickets, the mean time to resolve and the number of contacts at the service desk.
BERT will have its greatest effect on NLP applications where context plays a big role. For example, conversational chatbots could use BERT to improve intent identification. We expect leading vendors of conversational language-based solutions to adopt BERT to make interactions more contextual.
Enterprises looking to experiment with BERT on their own should encourage their data scientists, and the leaders responsible for NLP projects, to leverage the capabilities of the emerging generation of pre-trained NLP models like BERT. As these are sophisticated methods, business leaders should anticipate a longer learning curve and the continued evolution of BERT-derived models. Regular reviews should be conducted to determine how BERT can offer competitive differentiation.