How can your business benefit from named entity recognition (NER)?
What is NER?
Named entity recognition (NER), sometimes referred to as entity chunking, entity extraction, or entity identification, is the task of identifying and categorizing key information (entities) in text. An entity can be any word or series of words that consistently refers to the same thing. Every detected entity is classified into a predetermined category. For example, an NER machine learning model might detect the word “Flaps” in a text and classify it as a “Company”.
How NER works
At the heart of any NER model is a two-step process:
1) Detect a named entity:
Step one involves detecting a word or string of words that forms an entity. Each word is treated as a token: “The Great Lakes” is a string of three tokens that represents one entity. Inside-outside-beginning (IOB) tagging is a common way of indicating where entities begin and end: each token is labeled as the beginning of an entity (B), inside an entity (I), or outside any entity (O). We’ll explore this further in a future blog post.
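As a rough illustration, the Python sketch below turns a sequence of IOB tags back into entity spans. It assumes the common “B-label / I-label / O” tag convention; the function name `iob_to_spans` and the short label names (`LOC`, `TIME`) are hypothetical choices for this example, not part of any particular library.

```python
from typing import List, Optional, Tuple


def iob_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (entity_text, label) pairs.

    Each tag is "B-<label>" (first token of an entity), "I-<label>"
    (a later token of the same entity) or "O" (outside any entity).
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Close the entity being built, if any, and record it.
        nonlocal current_tokens, current_label
        if current_tokens and current_label is not None:
            spans.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == current_label:
            current_tokens.append(token)  # continue the open entity
        elif prefix in ("B", "I"):
            flush()  # a B- tag (or a stray I- tag) starts a new entity
            current_tokens, current_label = [token], label
        else:
            flush()  # "O": this token is outside any entity
    flush()  # close an entity that runs to the end of the text
    return spans


tokens = ["We", "sailed", "The", "Great", "Lakes", "in", "2006"]
tags = ["O", "O", "B-LOC", "I-LOC", "I-LOC", "O", "B-TIME"]
print(iob_to_spans(tokens, tags))  # [('The Great Lakes', 'LOC'), ('2006', 'TIME')]
```

The B tag is what lets two neighbouring entities of the same type (say, “Google” followed directly by “Mastercard”) stay separate instead of merging into one span.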
2) Categorize the entity:
The second step assigns each detected entity to a category, so the categories must be defined in advance. Here are some common entity categories:
Person. E.g., Elvis Presley, Audrey Hepburn, David Beckham
Organization. E.g., Google, Mastercard, University of Oxford
Time. E.g., 2006, 16:34, 2am
Location. E.g., Trafalgar Square, MoMA, Machu Picchu
Work of art. E.g., Hamlet, Guernica, Exile on Main St.
These are just a few examples. You can create your own entity categories to suit your task, and define granular rules for assigning entities to categories when an entity is ambiguous or when your task relies on its own ontology.
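To make the idea of custom categories and ambiguity rules concrete, here is a minimal rule-based sketch in Python. It is not how a trained NER model works internally: it assumes a hand-built lookup table (a gazetteer) plus hypothetical context cues for one ambiguous entity, “Guernica”, which can refer to a Basque town or to Picasso’s painting. The names `GAZETTEER`, `AMBIGUITY_RULES` and `categorize` are illustrative only.

```python
from typing import Dict, List, Optional

# Hypothetical gazetteer: a hand-built map from entity text to a default category.
# A trained NER model would normally learn this mapping from labelled examples.
GAZETTEER: Dict[str, str] = {
    "Elvis Presley": "Person",
    "Google": "Organization",
    "University of Oxford": "Organization",
    "Machu Picchu": "Location",
    "Guernica": "Work of art",
    "Flaps": "Company",  # a custom, task-specific category
}

# Hypothetical task-specific rules for ambiguous entities: each candidate
# category is paired with lowercase context cue words that favour it.
AMBIGUITY_RULES: Dict[str, Dict[str, List[str]]] = {
    "Guernica": {
        "Location": ["town", "visit", "basque"],
        "Work of art": ["painting", "picasso", "museum"],
    },
}


def categorize(entity: str, context: str) -> Optional[str]:
    """Assign a detected entity to a category; return None if it is unknown."""
    text = context.lower()
    # Ambiguity rules take priority: use the first category whose cues appear.
    for category, cues in AMBIGUITY_RULES.get(entity, {}).items():
        if any(cue in text for cue in cues):
            return category
    # Otherwise fall back to the default category (None for unknown entities).
    return GAZETTEER.get(entity)


print(categorize("Guernica", "A print of Picasso's Guernica"))        # Work of art
print(categorize("Guernica", "We plan to visit Guernica in spring"))  # Location
print(categorize("Flaps", "Flaps raised new funding"))                # Company
```

The cue matching here is deliberately crude (plain substring checks); it only shows where task-specific rules would sit relative to the default category lookup.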
How is NER used in business?
NER is suited to any situation in which a high-level overview of a large quantity of text is helpful. With NER, you can understand the subject or theme of a body of text at a glance and quickly group texts by relevance or similarity.
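One simple way to group texts by similarity, once entities have been extracted, is to compare the sets of entities each text mentions. The Python sketch below assumes the entity sets are already available (the documents and their entities are hypothetical) and uses Jaccard similarity with an arbitrary threshold of 0.5; it is one possible grouping strategy, not the only one.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Fraction of entities two texts share (0.0 = none, 1.0 = identical sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_by_entities(docs: Dict[str, Set[str]], threshold: float = 0.5) -> List[List[str]]:
    """Greedily group documents whose entity sets overlap by at least `threshold`.

    Each document joins the first group whose first member is similar enough;
    otherwise it starts a new group.
    """
    groups: List[List[str]] = []
    for doc_id, entities in docs.items():
        for group in groups:
            if jaccard(entities, docs[group[0]]) >= threshold:
                group.append(doc_id)
                break
        else:  # no sufficiently similar group was found
            groups.append([doc_id])
    return groups


# Entity sets as an NER model might return them (hypothetical documents).
docs = {
    "post-1": {"Google", "Mastercard", "Trafalgar Square"},
    "post-2": {"Google", "Mastercard"},
    "post-3": {"Machu Picchu"},
}
print(group_by_entities(docs))  # [['post-1', 'post-2'], ['post-3']]
```

Raising the threshold produces more, tighter groups; lowering it merges texts that share only a few entities.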
Some notable business NER use cases include:
– Human resources
Speed up the hiring process by summarizing applicants’ CVs; improve internal workflows by categorizing employee complaints and questions.
– Customer support
Improve response times by categorizing user requests, complaints and questions and filtering by priority keywords.
– Health care
Improve patient care standards and reduce workloads by extracting essential information from lab reports. Roche is doing this with pathology and radiology reports.
– Search and recommendation engines
Improve the speed and relevance of search results and recommendations by summarizing descriptive text, reviews, and discussions. Booking.com is a notable success story here.
– Content classification
Surface content more easily and gain insights into trends by identifying the subjects and themes of blog posts and news articles.
Enable students and researchers to find relevant material faster by summarizing papers and archive material and highlighting key terms, topics, and themes.
The EU’s digital platform for cultural heritage, Europeana, is using NER to make historical newspapers searchable.