Natural Language Processing (NLP) is a subfield of AI that helps machines process and understand human language so that they can perform repetitive tasks. These machines respond to text or voice data with text or speech of their own, as humans do.
Why is NLP used?
The NLP approach helps businesses draw clear conclusions from collections of unstructured data, which supports effective decisions about improving user satisfaction. NLP also automates routine tasks while reducing cost and time, thus boosting productivity and efficiency.
How does NLP work?
In the NLP approach, human language is broken down into smaller pieces so that grammatical structure and word meanings can be analyzed and understood in context. Pre-processing tasks are first performed on the user input so that NLP tools can transform the text into a machine-readable format. The next step is to build an NLP algorithm that collects the relevant data and presents it in a form that can be used effectively.
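As a rough illustration of the pre-processing step, here is a minimal Python sketch that lowercases the input, splits it into tokens, and drops common filler words. The stop-word list and the sample sentence are made up for this example; real pipelines typically use larger lists and extra steps such as stemming or lemmatization.

```python
import re

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "my", "it"}


def preprocess(text):
    """Lowercase the text, split it into tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


# Invented sample input.
tokens = preprocess("The battery of my laptop is draining too fast!")
print(tokens)
```

The resulting token list is the kind of machine-readable input that later steps, such as n-gram generation or tf-idf scoring, can work with.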
The following are the five most practical projects we have worked on that incorporate the NLP approach.
- User interest and sentiment analysis:
A laptop manufacturing company approached us with two needs:
- to find their users’ satisfaction with the after-sales service they provided
- to spot new potential customers
Our dataset was to be based on chat conversations between users and customer care agents or sales personnel.
Our approach to the demand at hand:
The entire project was written using the R programming language.
- We generated n-grams and ranked them using tf-idf (term frequency-inverse document frequency) to uncover important terminology, and used FlashText to filter out proper words and phrases.
- We also used CoreNLP as a server to determine the sentiment of a user-input sentence.
- Next, using R Shiny, we built a dashboard that displays words and phrases in bubble plots and word clouds, and shows average user sentiment over time in line charts.
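The n-gram ranking step above can be sketched as follows. The project itself was written in R; this is only a Python illustration using the standard library, with invented chat snippets and a plain tf-idf formula (term frequency times the log of inverse document frequency) that may differ from the weighting we actually used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_ranking(documents, n=2):
    """For each document, return its n-grams sorted by descending tf-idf."""
    doc_ngrams = [ngrams(doc.lower().split(), n) for doc in documents]

    # Document frequency: in how many documents does each n-gram occur?
    df = Counter()
    for grams in doc_ngrams:
        df.update(set(grams))

    total_docs = len(documents)
    rankings = []
    for grams in doc_ngrams:
        counts = Counter(grams)
        scores = {
            gram: (count / len(grams)) * math.log(total_docs / df[gram])
            for gram, count in counts.items()
        }
        # Sort by score (highest first), breaking ties alphabetically.
        rankings.append(sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])))
    return rankings


# Invented chat snippets (illustration only).
chats = [
    "the battery drains fast after the update",
    "the screen flickers after the update",
    "the battery drains fast while charging",
]
rankings = tfidf_ranking(chats, n=2)
for chat, ranked in zip(chats, rankings):
    print(chat, "->", ranked[0][0])
```

N-grams that appear in only one conversation score highest, which is what makes them candidates for "important terminology" worth showing on a dashboard.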
- FAQ bot:
A Fortune 500 network equipment company asked us to create an effective FAQ (Frequently Asked Questions) bot.
Our client didn’t have any dataset to train on. Additionally, since NLG (Natural Language Generation) was still in its infancy in 2018, we could not have the bot generate subjective answers to the questions it would face. We started out by using our client’s “Help and FAQ” section as data.
- We created a large corpus of “Help and FAQ” data and divided it paragraph-wise.
- Next, for each paragraph, we identified the important keywords using tf-idf and kept each keyword’s score.
- Once a question came in, we extracted its key terms and, for each paragraph, summed the scores of those terms. The highest-scoring paragraph was selected as the answer to the question.
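The retrieval steps above can be sketched in a few lines of Python. This is a simplified illustration, not our production code: the FAQ paragraphs and function names are invented, and every question word is treated as a key term rather than running a separate keyword-extraction step.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(paragraphs):
    """Compute a tf-idf score for every term of every paragraph."""
    tokenized = [tokenize(p) for p in paragraphs]
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    total = len(paragraphs)
    index = []
    for tokens in tokenized:
        counts = Counter(tokens)
        index.append({
            term: (count / len(tokens)) * math.log(total / df[term])
            for term, count in counts.items()
        })
    return index


def answer(question, paragraphs, index):
    """Return the paragraph whose summed term scores are highest."""
    terms = set(tokenize(question))
    totals = [sum(scores.get(t, 0.0) for t in terms) for scores in index]
    best = max(range(len(paragraphs)), key=totals.__getitem__)
    return paragraphs[best]


# Invented "Help and FAQ" paragraphs (illustration only).
faq = [
    "To reset the router hold the reset button for ten seconds.",
    "Firmware updates are downloaded from the support portal.",
    "The warranty covers hardware defects for two years.",
]
faq_index = build_index(faq)
print(answer("How do I reset my router?", faq, faq_index))
```

Because terms that occur in every paragraph get an inverse document frequency of zero, common words such as "the" do not influence which paragraph wins.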
- AI Story creation:
The goal of this project was to create an article given a single input sentence as context.
Our approach to the demand:
- To write the article, we used a transformer network, specifically BERT (Bidirectional Encoder Representations from Transformers), to predict the next most appropriate word. Hugging Face’s well-known Transformers library was used.
- We then ran into an issue: common terms were predicted repeatedly at intervals, which limited the variety of the output and made it read as machine-generated.
Statistical fine-tuning solved this problem. As we kept writing the article, we continuously collected the initial predictions, built a probability density curve from them, and selected words that lay within the first standard deviation of the curve.
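One possible reading of this selection rule is sketched below in Python. It is an interpretation under stated assumptions, not the exact procedure: the candidate words and probabilities are invented stand-ins for model output (no BERT call is made), the "curve" is summarized by the mean and standard deviation of all probabilities seen so far, and `pick_word` is a hypothetical helper name.

```python
import random
import statistics


def pick_word(candidates, history, rng):
    """Pick a word whose probability lies within one standard deviation
    of the mean of all probabilities collected so far.

    candidates: list of (word, probability) pairs from a language model.
    history: running list of probabilities; extended in place.
    """
    history.extend(prob for _, prob in candidates)
    mean = statistics.fmean(history)
    spread = statistics.pstdev(history)
    in_band = [word for word, prob in candidates if abs(prob - mean) <= spread]
    # Fall back to all candidates if nothing lands inside the band.
    pool = in_band or [word for word, _ in candidates]
    return rng.choice(pool)


# Invented model output: one dominant common word and several alternatives.
candidates = [
    ("the", 0.60), ("storm", 0.15), ("river", 0.12),
    ("night", 0.10), ("quiet", 0.03),
]
history = []
word = pick_word(candidates, history, random.Random(0))
print(word)
```

In this toy example the dominant word "the" falls outside the one-standard-deviation band, so it is never chosen, which is the kind of effect that reduces repetitive, machine-sounding output.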
- Predicting the honesty score of a candidate:
The project demanded the creation of an NLP algorithm to be applied to the audio transcript of all the calls and conversations conducted in an interview. The algorithm had to detect:
- specific keywords and ‘pauses’ in the transcription
- the number of interview questions the candidate answered correctly.
Based on these factors, a classifier was to be built to predict the honesty score.
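A minimal Python sketch of the keyword and pause detection might look like this. It rests on several assumptions that are not part of the original project: pauses are marked in the transcript with a literal `[pause]` token, the keyword set is invented, and an empty answer counts as unanswered. It counts answered questions only; judging whether an answer is correct would need a separate step.

```python
import re

PAUSE_MARKER = "[pause]"                       # assumed transcript convention
KEYWORDS = {"honestly", "actually", "maybe"}   # hypothetical keyword set


def transcript_features(answers):
    """Extract simple features from a list of answers, one per question."""
    words = []
    pauses = 0
    answered = 0
    for text in answers:
        lowered = text.lower()
        pauses += lowered.count(PAUSE_MARKER)
        tokens = re.findall(r"[a-z']+", lowered.replace(PAUSE_MARKER, " "))
        if tokens:
            answered += 1
        words.extend(tokens)
    return {
        "answered": answered,
        "answered_all": answered == len(answers),
        "total_words": len(words),
        "pauses": pauses,
        "keyword_hits": sum(1 for word in words if word in KEYWORDS),
    }


# Invented interview answers (illustration only).
features = transcript_features([
    "I led the team [pause] honestly it was hard",
    "",
    "Maybe two years [pause] [pause] actually three",
])
print(features)
```

A feature dictionary like this, computed per candidate, is the sort of input a classifier can be trained on.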
The provided dataset contained information such as:
- Interested and answered all questions
- Interested and answered a few questions
- Duration of each answer
- Total words spoken
- Type of words spoken (can be used to determine emotional / personality traits)
- First, we ran hypothesis tests to determine which factors would be useful in predicting honesty, and found that the following had an impact:
- Whether the candidate answered all or just a few questions
- The duration of each answer
- The type and sequence of words used
- Having gathered both the input variables and the output variable (hired or not hired), we trained a random forest classifier. We hypothesized that being hired and staying in the job for at least a year would be associated with a high level of honesty in the candidate.
- Train and deploy AI chatbots:
A technology startup approached us to build a customized, open-source, zero-code framework to train and deploy AI chatbots. Our inspiration for this project was Yoon Kim’s CNN model, as presented in his paper, Convolutional Neural Networks for Sentence Classification. Kim showed that a simple CNN with little hyperparameter tuning and static word vectors works well on tasks such as sentiment analysis and question classification. We successfully adapted it for our use.
Our approach to the project:
We used Python, Flask, scikit-learn, and TensorFlow Keras for the project.
- A 1D CNN (Convolutional Neural Network) with filters covering unigrams, bigrams, and trigrams was used to classify each sentence.
- The Matthews correlation coefficient was used to measure classification quality.
- The resulting library can be run as a server that exposes two REST APIs:
- Train API – To train a chatbot with a new dataset.
- Predict API – To predict intent and entity from an input sentence.
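For readers unfamiliar with the Matthews correlation coefficient mentioned above, here is a small Python sketch of the binary case, computed from the confusion-matrix counts. Intent classification usually has more than two classes, where a generalized form is used (scikit-learn provides one), so treat this as an illustration of the idea only. The labels are invented and `binary_mcc` is a hypothetical helper name.

```python
import math


def binary_mcc(y_true, y_pred):
    """Matthews correlation coefficient for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when the coefficient is undefined.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Invented labels (illustration only).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
mcc = binary_mcc(y_true, y_pred)
print(round(mcc, 3))
```

Unlike plain accuracy, the coefficient stays close to zero for a model that always predicts the majority class, which makes it a useful check when intents are unevenly represented in the training data.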