NLP Basics: Translating Unstructured to Structured

Key Points

  • Natural language processing (NLP) is the technology that enables computers to understand and generate human language by converting unstructured text (like spoken sentences) into structured data that machines can process.
  • The transformation from unstructured to structured data is called natural language understanding (NLU), while the reverse conversion from structured data back to natural language is known as natural language generation (NLG).
  • A primary NLP application is machine translation, which requires grasping context and sentence structure to avoid mistranslations such as the “spirit is willing, but the flesh is weak” → “vodka is good, but the meat is rotten” example.
  • NLP also powers virtual assistants (e.g., Siri, Alexa) and chatbots, interpreting spoken or written user inputs to execute commands or carry on conversational interactions.
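
The unstructured-to-structured conversion (NLU) and its reverse (NLG) can be sketched with the transcript's own shopping-list sentence. This is a minimal illustration under stated assumptions, not how production NLU works: the single regular expression, the function names `understand` and `generate`, and the dictionary layout are all hypothetical choices made for this example.

```python
import re

# Hypothetical pattern: it only handles "add X, Y and Z to my shopping list".
ADD_PATTERN = re.compile(r"^add (?P<items>.+) to my shopping list\.?$", re.IGNORECASE)

# Item separators: a comma (optionally followed by "and"), or a bare "and".
ITEM_SEPARATOR = re.compile(r"\s*,\s*(?:and\s+)?|\s+and\s+")


def understand(utterance):
    """Toy NLU: turn an unstructured sentence into a structured dict, or None if it does not match."""
    match = ADD_PATTERN.match(utterance.strip())
    if match is None:
        return None
    items = [item for item in ITEM_SEPARATOR.split(match.group("items")) if item]
    return {"shopping_list": {"items": items}}


def generate(structured):
    """Toy NLG: turn the structured dict back into a sentence."""
    items = structured["shopping_list"]["items"]
    if len(items) == 1:
        joined = items[0]
    else:
        joined = ", ".join(items[:-1]) + " and " + items[-1]
    return f"add {joined} to my shopping list"


structured = understand("add eggs and milk to my shopping list")
print(structured)
print(generate(structured))
```

A real system would replace the fixed pattern with the tools covered in the transcript (tokenization, stemming or lemmatization, part-of-speech tagging, named entity recognition), since a single pattern breaks as soon as the wording changes.
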

Full Transcript

**Source:** [https://www.youtube.com/watch?v=fLvJ8VdHLA0](https://www.youtube.com/watch?v=fLvJ8VdHLA0)

**Duration:** 00:09:35

## Sections

- [00:00:00](https://www.youtube.com/watch?v=fLvJ8VdHLA0&t=0s) **Understanding Natural Language Processing** - Martin Keen explains NLP as the technology that translates unstructured human language, like spoken or written sentences, into structured data that computers can process, highlighting its role in AI and its relationship to natural language understanding.
- [00:04:13](https://www.youtube.com/watch?v=fLvJ8VdHLA0&t=253s) **Sentiment, Spam Detection, and Tokenization** - The speaker outlines NLP applications such as sentiment analysis and spam detection and explains that processing begins with tokenizing unstructured text into individual word tokens.
- [00:07:41](https://www.youtube.com/watch?v=fLvJ8VdHLA0&t=461s) **Part-of-Speech Tagging and NER Overview** - The speaker explains how part-of-speech tagging identifies a word's grammatical role from sentence context and how named-entity recognition assigns real-world categories to tokens, converting unstructured speech into structured data for AI applications.

## Full Transcript

0:00 What is natural language processing? Well, you're doing it right now: you're listening to the words and the sentences that I'm forming, and you are forming some sort of comprehension from it. And when we ask a computer to do that, that is NLP, or natural language processing.

0:21 My name is Martin Keen, I'm a Master Inventor at IBM, and I've utilized NLP in a good number of my invention disclosures. NLP has a really high utility value in all sorts of AI applications. Now NLP starts with something called unstructured text. What is that? Well, that's just what you and I say, that's how we speak.

0:47 So, for example, some unstructured text is "add eggs and milk to my shopping list." Now you and I understand exactly what that means, but it is unstructured, at least to a computer.

1:11 So what we need to do is to have a structured representation of that same information that a computer can process. Now that might look something a bit more like this, where we have a shopping list element. And then it has sub-elements within it, like an item for eggs and an item for milk. That is an example of something that is structured.

1:50 Now the job of natural language processing is to translate between these two things. So NLP sits right in the middle here, translating between unstructured and structured data. And when we go from unstructured here to structured this way, that's called NLU, or natural language understanding. And when we go this way, from structured to unstructured, that's called natural language generation, or NLG. We're going to focus today primarily on going from unstructured to structured in natural language processing. Now let's think of some use cases where NLP might be quite handy. First of all, we've got machine translation.

2:41 Now when we translate from one language to another, we need to understand the context of that sentence.
It's not just a case of taking each individual word from, say, English and then translating it into another language. We need to understand the overall structure and context of what's being said. And my favorite example of this going horribly wrong is if you take the phrase "the spirit is willing, but the flesh is weak" and you translate that from English to Russian, and then you translate that Russian translation back into English, you're going to go from "the spirit is willing, but the flesh is weak" to something a bit more like "the vodka is good, but the meat is rotten", which is really not the intended context of that sentence whatsoever. So NLP can help with situations like that.

3:39 Now the second kind of use case that I like to mention relates to virtual assistants, and also to things like chatbots. Now a virtual assistant, that's something like Siri or Alexa on your phone, that is taking human utterances and deriving a command to execute based upon that. And a chatbot is something similar, except in written language, and that's taking written language and then using it to traverse a decision tree in order to take an action. NLP is very helpful there.

4:14 Another use case is for sentiment analysis. Now this is taking some text, perhaps an email message or a product review, and trying to derive the sentiment that's expressed within it. So for example, is this product review a positive sentiment or a negative sentiment? Is it written as a serious statement, or is it being sarcastic? We can use NLP to tell us.

4:47 And then finally, another good example is spam detection. So this is a case of looking at a given email message and trying to derive: is this a real email message, or is it spam? And we can look for pointers within the content of the message.
So things like overused words, or poor grammar, or an inappropriate claim of urgency can all indicate that this is actually perhaps spam.

5:13 So those are some of the things that NLP can provide, but how does it work? Well, the thing with NLP is it's not like one algorithm, it's actually more like a bag of tools, and you can apply this bag of tools to be able to resolve some of these use cases. Now the input to NLP is some unstructured text, so either some written text, or spoken text that has been converted to written text through a speech-to-text algorithm. Once we've got that, the first stage of NLP is called tokenization.

5:54 This is about taking a string and breaking it down into chunks. So if we consider the unstructured text we've got here, "add eggs and milk to my shopping list", that's eight words that can be eight tokens. And from here on in we are going to work one token at a time as we traverse through this.

6:18 Now the first stage that we can perform, once we've got things down into tokens, is called stemming. And this is all about deriving the word stem for a given token. So for example, running, runs, and ran: the word stem for all three of those is run. We're just kind of removing the prefix and the suffixes and normalizing the tense, and we're getting to the word stem.

6:46 But stemming doesn't work well for every token. For example, universal and university, well, they don't really stem down to universe. For situations like that, there is another tool that we have available, and that is called lemmatization.

7:08 And lemmatization takes a given token and learns its meaning through a dictionary definition, and from there it can derive its root, or its lemma. So take better, for example: better is derived from good, so the root, or the lemma, of better is good. The stem of better would be bet.
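
The contrast between stemming and lemmatization can be sketched in a few lines of Python. This is a toy illustration, not a real stemmer or lemmatizer: the suffix list, the names `toy_stem` and `toy_lemma`, and the tiny lemma dictionary are all assumptions made for this example. Established stemmers such as the Porter stemmer use more elaborate rules and may produce different stems from the ones mentioned in the talk.

```python
# Hypothetical suffix list; real stemmers use many more rules.
SUFFIXES = ("ing", "er", "s")

# Hypothetical mini-dictionary standing in for a real lexical database.
LEMMA_DICTIONARY = {"better": "good", "ran": "run", "running": "run", "runs": "run"}


def toy_stem(word):
    """Strip one known suffix, then undouble a trailing doubled consonant."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            doubled = len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou"
            if suffix != "s" and doubled:
                stem = stem[:-1]  # "runn" becomes "run", "bett" becomes "bet"
            return stem
    return word


def toy_lemma(word):
    """Look the word up in the dictionary; fall back to the word itself."""
    return LEMMA_DICTIONARY.get(word.lower(), word.lower())


for word in ("running", "runs", "ran", "better"):
    print(word, toy_stem(word), toy_lemma(word))
```

The suffix stripper leaves "ran" untouched, because irregular forms cannot be recovered by removing suffixes alone; the dictionary lookup handles them, at the cost of needing an entry for every word.
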
So you can see that it is significant whether we use stemming or lemmatization for a given token.

7:42 Now the next thing we can do is a process called part-of-speech tagging. And what this is doing is, for a given token, it's looking at where that token is used within the context of a sentence. So take the word make, for example. If I say "I'm going to make dinner", make is a verb. But if I ask you "what make is your laptop?", well, make is now a noun. So where that token is used in the sentence matters, and part-of-speech tagging can help us derive that context.

8:22 And then finally, another stage is named entity recognition. And what this is asking is, for a given token, is there an entity associated with it? So for example, a token of Arizona has an entity of a U.S. state, whereas a token of Ralph has an entity of a person's name.

8:45 And these are some of the tools that we can apply in this big bag of tools that we have for NLP in order to get from this unstructured human speech through to something structured that a computer can understand. And once we've done that, then we can apply that structured data to all sorts of AI applications. Now there's obviously a lot more to it than this, and I've included some links in the description if you'd like to know more, but hopefully this made some sense and you were able to process some of the natural language that I've shared today.

9:25 Thanks for watching. If you have questions, please drop us a line below. And if you want to see more videos like this in the future, please like and subscribe.
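
The other stages described in the transcript (tokenization, part-of-speech tagging, and named entity recognition) can be sketched as toy Python functions. Everything here is an assumption made for illustration: the function names, the regular expression used to split tokens, the two-rule tagger that only knows the word "make", and the two-entry entity lookup table. Real NLP libraries rely on trained statistical models rather than hand-written rules like these.

```python
import re


def tokenize(text):
    """Break a string into word tokens (toy rule: runs of letters and apostrophes)."""
    return re.findall(r"[A-Za-z']+", text)


# Hypothetical lookup table standing in for a trained entity recognizer.
ENTITY_LOOKUP = {"Arizona": "U.S. state", "Ralph": "person's name"}


def tag_entities(tokens):
    """Pair each token with its entity label, or None if it has none."""
    return [(token, ENTITY_LOOKUP.get(token)) for token in tokens]


def tag_make(tokens):
    """Toy part-of-speech rule for the word 'make' only, based on the previous token."""
    tags = []
    for index, token in enumerate(tokens):
        previous = tokens[index - 1].lower() if index > 0 else ""
        if token.lower() == "make" and previous == "to":
            tags.append((token, "verb"))
        elif token.lower() == "make" and previous == "what":
            tags.append((token, "noun"))
        else:
            tags.append((token, None))
    return tags


tokens = tokenize("add eggs and milk to my shopping list")
print(len(tokens), tokens)
print(tag_make(tokenize("I'm going to make dinner")))
print(tag_make(tokenize("what make is your laptop?")))
print(tag_entities(tokenize("Ralph lives in Arizona")))
```

The tagger looks only at the preceding token, which is enough to separate the two "make" sentences from the talk but would fail on almost any other input; it is meant to show why position in the sentence matters, not how tagging is done in practice.
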