# Video BeDeHntF68M

**Source:** [https://www.youtube.com/watch?v=BeDeHntF68M](https://www.youtube.com/watch?v=BeDeHntF68M)

**Duration:** 00:08:17

## Sections

- [00:00:00](https://www.youtube.com/watch?v=BeDeHntF68M&t=0s) **Untitled Section**

## Full Transcript

I recently bought a new shirt. Outside of this darkened room, I do occasionally dress in something other than a black tee. And that purchase was a disaster: the colors were nothing like the picture, and the fit was not how it was described. So I returned it, along with a strongly worded review. My review was one of thousands. It would take the shirt seller hours to read them all, and this is just one of many, many items of clothing they sell. Fortunately, there's a better way to process vast amounts of text like product reviews, and that is through something called text mining.

Text mining is the practice of analyzing vast amounts of textual material to capture key concepts, trends, and hidden relationships. It's the process of transforming unstructured text into a structured format to identify meaningful patterns and new insights.

Now, unstructured and structured text: what is that? Well, if we break text down, there's structured. Structured text, or structured data, is standardized into a tabular format with rows and columns. This makes it very easy to process. Think of a database table or a spreadsheet: it's easy to query, to filter, and to analyze.

Now, unstructured data doesn't have a predefined format, and this includes all sorts of things, like text documents, email messages, images, videos, social media posts, that sort of thing.

There's also semi-structured text, and that has some structure but not quite enough to meet the requirements of a relational database. Think of XML or JSON, or something along those lines.

Now, it turns out that something like 80% of the data in the world resides in an unstructured format, so there's plenty of opportunity to put text mining to work.

We use text mining
to generate an index of structured concepts, so that we can answer questions like "Which concepts occur together?" and "What do the concepts predict?" To do this, we'll go through four different stages.

Okay, so stage one is identify. This is where we identify the text that is to be mined, and that might be a collection of news articles or product reviews.

In stage two, we process the text to remove noise and to standardize the format. This includes things like removing stop words, tokenizing the words, lemmatizing, and part-of-speech tagging. All sorts of things like that are used in the processing stage.

Then stage three builds the concepts and the categories.

And then in stage four, we analyze all of this to make predictions and to discover relationships.

Now, first of all, let's focus on stage two for a moment. The primary problem with the management of all this unstructured text and data is that there are no standard rules for writing text so that a computer can understand it. The language, and consequently the meaning, varies for every document and every piece of text.

So if we take a phrase, let's say "reproduction" (that pen's not so good, let's try this one), "reproduction of documents." How can we expand the meaning of this? What other words would be synonyms for "reproduction"? Well, a linguistics-based text mining model might suggest a couple of words for "reproduction," like "copy," or it might suggest "duplication." Those look good, and that's because linguistics-based text mining applies the principles of natural language processing, or NLP, to the analysis of the words, phrases, and syntax of text.

An alternative to linguistics-based text mining is statistics-based text mining, and that uses calculations of frequency to derive
related terms. And statistics-based text mining tells us that "reproduction" is related to the term "birth." That's going to generate some highly irrelevant results. So using NLP to understand the language used cuts through the ambiguity of text, making linguistics-based text mining the more reliable approach.

And it's this processing that brings us to the category building of stage three, where the concepts and the types that were extracted are used as the category building blocks. When we build categories, records and documents are then assigned to those categories: we look at the text that they contain and match it against an element of the category's definition. From there, the relationship discovery and the prediction analysis are performed here, in stage four, by data mining. Data mining is a topic that we've addressed in another video, so check that out if you want to see some more detail.

Now, beyond sifting through product reviews, where else can text mining be applied?

Well, in the wider field of customer service, text mining can be applied to work with sentiment analysis, and that can provide a mechanism for companies to prioritize key pain points for their customers by processing support tickets, chatbot responses, and so forth.

There's also risk management. In risk management, text mining can provide insights around industry trends and financial markets by monitoring shifts in sentiment and by extracting information from analyst reports and white papers.

And then in the field of maintenance, we can use text mining to derive patterns that are correlated with problems, and that can be used to generate preventative and reactive maintenance procedures.

Oh, and by the way, that poorly fitted shirt that I sent back with the scathing review? Well, the seller sent me a 50%
discount code in addition to my refund. Another happy outcome of text mining at work.

Thanks for watching. Please consider liking and subscribing to our channel, and let us know in the comments about any other tech topics you'd like us to cover, so we can continue to bring you content that is relevant to you, like some of these videos here.
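
As a rough illustration of the stage-two processing and the frequency-based approach described in the transcript, the sketch below tokenizes a few reviews, removes stop words, and counts which terms appear together. Everything in it is an assumption made for illustration and does not come from the video: the sample reviews, the tiny stop-word list, and the function names `tokenize`, `preprocess`, `term_frequencies`, and `cooccurrences`. Lemmatizing and part-of-speech tagging are left out to keep the example in the standard library.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical sample reviews; not taken from the video.
REVIEWS = [
    "The colors were nothing like the picture.",
    "The fit was not as described and the colors faded.",
    "Great fit, great colors, fast delivery!",
]

# A tiny illustrative stop-word list; real pipelines use much longer ones.
STOP_WORDS = {"the", "were", "was", "like", "and", "as", "a", "of"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def preprocess(text):
    """Tokenize the text and drop stop words to reduce noise."""
    return [tok for tok in tokenize(text) if tok not in STOP_WORDS]


def term_frequencies(docs):
    """Count how often each remaining term appears across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(preprocess(doc))
    return counts


def cooccurrences(docs):
    """Count pairs of distinct terms that appear in the same document."""
    pairs = Counter()
    for doc in docs:
        terms = sorted(set(preprocess(doc)))
        pairs.update(combinations(terms, 2))
    return pairs


if __name__ == "__main__":
    print(term_frequencies(REVIEWS).most_common(3))
    print(cooccurrences(REVIEWS).most_common(3))
```

Counting co-occurrence this way is purely statistical: it shows which terms tend to appear together but knows nothing about what they mean, which is the weakness the transcript attributes to statistics-based text mining.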