
Claude 3.5, Text-to‑SQL Benchmark, AI Content

Key Points

  • The episode introduces three main AI industry updates: the launch of Claude 3.5 Sonnet, the new BIRD text‑to‑SQL benchmark, and the current state and future of AI‑generated content.
  • Hosts and guests debate how quickly enterprise clients can adopt the rapid stream of new models, questioning whether they constantly update APIs or stick with existing solutions despite frequent leaderboard churn.
  • Claude 3.5 Sonnet is highlighted for topping major leaderboards, but the conversation notes uncertainty about its concrete ROI and use‑case suitability for most clients.
  • The BIRD benchmark reveals which models excel or lag in translating natural language queries to SQL, underscoring both the promise of AI‑assisted analytics and the persistent data‑quality challenges.
  • The segment on AI‑generated content examines what’s working today, what falls short, and the desired improvements, while humorously noting the difficulty of assessing “funny” outputs compared to more straightforward technical metrics.

Full Transcript

# Claude 3.5, Text-to‑SQL Benchmark, AI Content

**Source:** [https://www.youtube.com/watch?v=FjppatnFxsE](https://www.youtube.com/watch?v=FjppatnFxsE)
**Duration:** 00:38:53

## Sections

- [00:00:00](https://www.youtube.com/watch?v=FjppatnFxsE&t=0s) **AI Podcast Kickoff: Claude, Benchmarks, Content** - In this opening segment of the "Mixture of Experts" show, host Brian Casey outlines the episode's three focus areas (Claude 3.5 Sonnet, a new text-to-SQL benchmark, and the state of AI-generated content) and introduces the expert panelists.

## Full Transcript
[0:00] Welcome, everyone, to this week's Mixture of Experts. I'm your host, Brian Casey. Every week we cover the top stories from around the industry in artificial intelligence, and today we're excited to bring you three of the top stories from across the industry. We'll be kicking things off today with Claude 3.5 Sonnet.

"Even though I have this beautiful model that does this really well, and I'm super excited that finally we're able to do this, I couldn't see the actual use case and the ROI to recommend that to one of my clients."

Second on the agenda today, we'll talk about the new BIRD bench text-to-SQL benchmark: which models are doing well, which ones aren't, the keys to success, and what the future might hold for AI-driven analytics in the enterprise.

"We have models that are pretty good at code, especially with a human in the loop; they can already add value today. But there's a lot of challenge with data."

Third on the agenda today, we'll talk about AI content: the present, the future, what's working well right now, what's not, and what we're hoping to see.

"Most of the things we've been talking about here are a little bit harder to assess, whereas you see a picture of shrimp Jesus and it's funny, and you're like, all right, that was funny."

Joining us this week to cover all the most important topics in AI, we have Shobhit Varshney, VP and Senior Partner, Gen AI in the Americas; we have Marina Danilevsky, Senior Research Scientist; and we have Michael Glass, an AI researcher.

[1:25] We've been debating renaming the show from Mixture of Experts to America's Next Top Model, because it seems that every week there is an announcement of yet another model that is on top of yet another leaderboard. Today we're going to talk about two of them: we'll talk about Claude, we'll talk about
the BIRD SQL leaderboard. We have some other topics, but we're going to jump right in and talk about the release of Claude 3.5 Sonnet, which recently came out and is topping a lot of the big leaderboards. To kick things off, Shobhit, I wanted to turn it over to you. One of the things that I noticed is that one of these releases happens like every week, and I'm just curious how enterprise clients are ingesting this stuff. Are they waiting on pins and needles, like, "When's the next model coming, just so I can update all my API calls to a new one?" Or are they like, "I just did this; do I need another one?" How does this stuff land for them, and are they actually able to ingest the pace at which things are coming out right now?

[2:25] Brian, what a world we live in, man. From GPT-4o and now with Claude 3.5, just the pace of innovation, and what my ten-year-old kid has access to, is insane. We've got to wait and appreciate how far AI has come so quickly. So I'm a huge fan of the big models that we're seeing, and the capabilities are just absolutely stunning. From an enterprise perspective, the majority of our clients are still users of AI: they will go sign up for a large SaaS platform, an Oracle, or Copilot with Azure, and so on, and they'll be consuming AI through that particular vendor in the SaaS format, in which case the vendor is choosing which models to use. Then there's a small subset of teams that are actually building their own models, or configuring their own LLM apps end to end, and have made some choices on what to bring in. In this case, AWS investing four billion plus in Anthropic gives people access to 3.5. But if you have an enterprise that has already built entire end-to-end
contract analytics on Azure, they are not going to be able to use the 3.5 goodness that comes from Claude; they'll continue to use their GPT-4os and 3.5s and so on. So in the enterprise space, the coolest model is not what you really get access to. There are a lot of questions around what happens to liability and indemnification, and "I've spent enough energy making it understand my domain." So enterprises don't quite switch models the way we do. I dropped my ChatGPT Plus and switched over to Claude to experiment more with it, and as an individual user I'm going to keep switching around for the best model, and no longer use my Gemini subscription as well. So I may switch around more than an enterprise would.

[4:05] That makes a lot of sense. Marina, Michael, a question for both of you. The announcement I thought was interesting because it was a mix of improved capabilities across a lot of the dimensions that we're accustomed to people talking about, you know, benchmarks, vision. But there were also product announcements, where they were talking more about Artifacts and Projects. In some ways, Shobhit, to your point about people ingesting SaaS tools, Claude is going a little bit away from just that standard chat interface to behaving a little bit more like a SaaS tool in some of these areas. So I'm curious whether you thought the more noteworthy piece was the capabilities, or actually where they're nudging the product, you know, the end result of all technology being B2B SaaS on some level. Which one of those dimensions did you find more interesting about the technology, and what it
means in terms of where things are going?

Well, for me, the business part of it maybe didn't land as strongly; I was interested in the scientific progress angle. You know, the thing with these closed models is you don't necessarily know what they did that got the performance increase, but you do see the performance increase. So it's an interesting mystery what's behind all of that.

I think that speaks to the slightly different work that Michael and I do on the research side: he's certainly more on the science and theory side, and I end up swimming more in the business application, both of which are really valid. I actually was interested in the Artifacts side and their claims. I haven't played around with it yet myself, but the claims of how it could potentially be used for work collaboration, or to create things that are richer, because I'm really interested in how these models actually end up being useful, rather than how they work on particular benchmarks. Though I would like to know the science behind why this one, according to Anthropic, is going to be better or more useful. Is it merely because of the workflow UX that they're enabling, or is there something to this model versus other models? That's something I'm really interested to dive deeper into.

[6:21] So on just the core capabilities, I do feel that Artifacts, having a window on the side where they're now going to be enabling some team collaboration, is going to go quite a long way. Today each data scientist has their own space and we continue to work in that, so I'm pretty excited about what we can do. I was working with my eight-year-old daughter and we were trying to
figure out how to recreate a game that she was playing on her iPad. We said, let's replicate that with Claude, and it was absolutely stunning how well it was able to do that. We were able to iterate through it, see things moving on the right side, and watch that thing upgrading.

When we look at benchmarks across these different vendors, as soon as the announcement comes in, we're like, "Open the champagne, it's such a great benchmark!" But in the real enterprise world, we have to go deploy that in our own use cases and see if it's working. So I would like to share one example. I was recently at our Bangalore center. In Bangalore we have our industry labs, where we have set up different sections for each industry: CPG, utilities, and so on. In the utilities wing, we had actual dials from machines. So you have these analog dials with graduations, and the needle is pointing towards the reading. We work closely with a lot of our utilities and manufacturing clients, with Boston Dynamics and whatnot, so we get a bunch of images coming off of those. And the image is of a dial that's at an angle, it is reflecting some sun, and a lot of the markings over time have actually rubbed off. So there'll be a marking at one, at three, four, five, so the middle two is missing. And most of these dials also have a green marker that says, hey, this is the safe zone for this particular dial.

I've been very frustrated so far that OpenAI's GPT-4o, the best-in-class model, if I give it those images that I'm working with today, was not able to go read the reading. And I was pleasantly surprised to see Claude 3.5 take that image and correctly annotate it. The way I had asked the question was: figure out all the major ticks and
all the smaller ticks after that, then figure out where the needle is, and then tell me if it's in the green zone or not. It absolutely nailed it, and I was very pleasantly surprised. It didn't get the green zone part correct, but comparing Gemini 1.5 Pro to GPT-4o and now Claude 3.5, I could see the difference. But then we started to do the math on it. My team does this when we are doing millions of these images every day for our clients, and we started to do the math. Our pipeline is a classic machine-learning YOLO-based model, and we have different versions; we've done the whole thing end to end, using a segmentation model from Meta to figure out what's where in the image. The accuracy that I'm getting there is more consistent, and the price point of delivering that for this utility is a fraction of what I'm getting from Claude 3.5. So even though I have this beautiful model that does this really well, and I'm super excited that finally we're able to do this, I couldn't see the actual use case and the ROI to recommend that to one of my clients.

[9:17] What was the actual difference? Was it a factor of ten or a factor of two?

So, I put through about 20 images, and Claude got about 18 of them correct. That was pretty high. GPT-4 and Gemini were actually not doing very well on those particular images; GPT-4 was doing a bit better. But getting 18 out of 20 correct, that's quite a bit. Unfortunately, even 18 is not high enough for any of my clients. They need very high accuracy. If you're talking about power plants, with power production happening, you have to be pretty spot-on; you can't just have 18 out of 20 correct.

[9:55] Do our clients have a sense of which use cases are close? It's like we almost have
capable enough models to unlock these sorts of things, and we're just waiting, and once we get there we're going to be able to use them. I know the emergence is oftentimes unpredictable, so you don't actually know when a capability is going to show up. But do they have a good feel for the things that are close versus the things that are way, way farther away in terms of real use cases?

I think you're actually pointing at something interesting. A lot of the clients are still not quite at the point where they know what it takes to trust these systems, even in something that's text, not multimodal, just a single modality. There's a feeling of, "Yeah, I played with it for a little bit, and it seems to pass the initial sniff test, but I can still consistently break it," and the risks just continue to be too high. Exactly what Shobhit just said: it messing up one time means having either a viral negative PR moment or a lawsuit, or anything else of that sort. So there's still this feeling, and I don't get the sense that there is that level of comfort where they're ready to really, completely put it into production, unless it's fully wrapped by something else, an amount of guardrails, an amount of anything that's able to catch it. So certainly not there yet, at least from my particular experience. I don't know, Shobhit, if you've seen something different.

[11:25] Yeah, so there are certain use cases, like code completion, any code development. That's the biggest progress we've seen in the industry, from across all vendors. We have done some amazing work with our Granite
code models that are small, tiny, understand the domain beautifully, and can help with code completion across the end-to-end life cycle. So far we've been most impressed with the accuracy that we get out of code completion, especially for the more popular languages. The second area has been around customer care. There are internal-facing use cases in customer care, not external; you don't want to sell a car for a dollar to end users. You want to make sure that you're doing something internal-facing: looking for insights, transcribing speech and figuring out what's happening, things of that nature, creating test data sets and training data sets. Those are working out really well, agent-facing workflows. So those two, I would say, are close enough that we have great deployments, and they're being done at scale across a large set of AI vendors.

I think the tip-over point is when it starts to look at data really well. So far we're talking about text. The big chasm it needs to cross is structured data sets. This is where we're looking at your data in SAP, or Salesforce, or Oracle, things of that nature, and now all of a sudden, in the middle of my workflow, there are unstructured documents. It's getting there, where it can give you lineage and tell you where the answer came from. But when it comes to structured data sets, LLMs don't inherently understand data really well. So most of the attempts that have been made are: can I go from natural language into making an API call or a SQL statement, and then I'll go grab some data. That's the area that's going to unlock a significant amount of productivity, once we understand how to pull structured data sets into our natural language conversation.

[13:13] I wonder
if that's a great pivot to the second topic, which is to talk a little bit about structured data and the BIRD SQL benchmark. Do you want to maybe kick us off there and just talk a little bit about what that is and why it's important?

Yeah, I think it goes directly to what Shobhit was saying: we have models that are pretty good at code, especially with a human in the loop; they can already add value today, but there's a lot of challenge with data. So you can take their strength to address that weakness and create a text-to-SQL model, so that people can interact with their data using natural language. And with a human in the loop, you can examine the SQL statement, ideally with some explanation from the LLM about what it's doing and why, to gain confidence that it is really selecting the right result set, that that is the data I wanted to get out.

That makes sense. And I think there are a few benchmarks in this space; there are benchmarks for everything, and we could probably also retitle this show "Just the Benchmark Show" on some level. But there are a few benchmarks in the text-to-SQL area, and I believe BIRD is one of them. I read the paper that they published on it, but do you want to take a minute and talk a little bit about the approach they're taking, and why it's maybe a little bit different from some of the other benchmarks out there in the space?

[14:56] Sure. I think for a long time, Spider, Spider 1.0 I should clarify now, in case 2.0 comes out soon, that benchmark was very
popular and drove a lot of research, but it reached a point where people were getting 90-plus percent on it. The hard questions weren't that hard and didn't necessarily challenge state-of-the-art LLMs anymore. With BIRD bench, the organizers still made a variety of difficulty levels, but the challenging questions in BIRD are much more challenging than the extra-hard questions in Spider, so it really moved the difficulty level. Also, all database schemas are created by someone, and the ones in Spider were invented by database students, but the databases in BIRD were gathered from real-world databases and data sets. So they're considerably more messy, which is going to be the case in the real world: you're going to have to deal with databases that are not very clean, not well normalized, and have missing and messy data. So there are both of those challenges: the difficulty and reality of the schemas, as well as the difficulty and complexity of the information need.

[16:35] There are two different things that the BIRD SQL paper did. One was looking at the actual accuracy of the SQL statements, but they were also looking at the efficiency of the SQL statement itself. There are different ways in which I could join tables and get the answer out, and in some of the comparisons, one query could take 42 seconds and the other one could take five seconds. So there's an efficiency score alongside the effectiveness score with BIRD SQL. Do you see that in other benchmarks as well?

I think that was their innovation; that was one of the things they brought.
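The efficiency point above, that two semantically equivalent SQL statements can have very different costs, can be sketched with a small, invented example (the tables and columns here are hypothetical, not taken from BIRD):

```python
# Toy illustration: the same result set computed two ways. A join is
# typically the efficient formulation; a correlated subquery often
# produces a much slower plan at scale, even though the answer is identical.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE plants (id INTEGER PRIMARY KEY, region TEXT)")
cur.execute("CREATE TABLE readings (plant_id INTEGER, value REAL)")
cur.executemany("INSERT INTO plants VALUES (?, ?)",
                [(i, "south" if i % 2 else "north") for i in range(100)])
cur.executemany("INSERT INTO readings VALUES (?, ?)",
                [(i % 100, float(i)) for i in range(1000)])

# Version 1: a join.
q_join = """SELECT COUNT(*) FROM readings r
            JOIN plants p ON p.id = r.plant_id
            WHERE p.region = 'north'"""

# Version 2: a correlated subquery, same answer, usually a worse plan.
q_sub = """SELECT COUNT(*) FROM readings r
           WHERE (SELECT region FROM plants p
                  WHERE p.id = r.plant_id) = 'north'"""

n_join = cur.execute(q_join).fetchone()[0]
n_sub = cur.execute(q_sub).fetchone()[0]
assert n_join == n_sub  # identical results, different execution strategies
print(n_join)  # prints 500
```

Running `EXPLAIN QUERY PLAN` on each statement shows the different strategies; BIRD's "valid efficiency score" rewards generated SQL whose plan runs faster, not just SQL that returns the right rows.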
I think it may also be related to the complexity of the queries. When you reach a level of complexity, there are now many ways of answering the question, by crafting many joins or maybe some subselects, so they can vary considerably, as you say, in the actual efficiency, the time it takes to execute. So I think that was maybe something that came out of just developing a more complex benchmark.

[17:37] Michael, where do you think there might be a good direction to go for the next set of benchmarks? What are the next set of hard questions that you would like to see, if you could get a benchmark on order?

I do think that there are a lot of questions, and it was true in Spider, it's true in BIRD, that have a very simple result set; they're just asking for a particular number. I think a lot of what people want to do with their data is to build a result set which they can then visualize. So it would have many rows, with some aggregation over, often, location and time, that could then be presented in a dashboard. That's not a high percentage of BIRD; there are not very many queries that are suitable for that kind of visualized analysis. So that's what I would like to see.

And follow-up queries, right? Once you have that kind of aggregation, you could then potentially even turn it into multi-turn interactions.

[18:41] I want to highlight a few things about BIRD SQL. I was generally very impressed. There are close to 13,000 unique questions as part of the test data sets, close to 100 big databases, and it represents about 30-plus GB of data that we're querying, across a whole bunch of different professional domains. And on that benchmark, IBM's Granite models are number
one. We crushed it. We have ExSL plus our Granite 20-billion-parameter model from IBM Research, which is currently the king of BIRD SQL.

Can you highlight, in a couple of sentences, some of the innovations that IBM has done in that space to get the SQL nailed?

Yeah, it's a multi-stage, multi-piece model. One thing we call schema linking is just identifying what piece of this database is actually relevant for answering your question. Typically you're going to have many tables and many columns per table, but only a small fraction are actually relevant for what you're asking. So a big focus was first narrowing down what piece of the database is relevant. Then there's another part where we try to identify the conditions you need to match against in the database, because the values in the database can be represented in many ways, and this is part of having realistic databases. A Boolean column can be represented by Y and N, or zero and one, or true and false; dates and places can be represented so many different ways. So just that ability to understand what you need to include in your query that will actually match the values in the database, that was also a significant part.
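The two steps Michael describes can be sketched very roughly. This is a toy keyword-overlap heuristic with an invented schema, not IBM's actual ExSL pipeline, which is not described in detail here:

```python
# Step 1 (schema linking): keep only the tables/columns that appear
# relevant to the question. Step 2 (value matching): normalize the many
# ways a Boolean can be stored (Y/N, 0/1, true/false) to one form.
SCHEMA = {  # hypothetical database schema
    "plants": ["plant_id", "region", "is_active"],
    "readings": ["reading_id", "plant_id", "value", "taken_at"],
    "employees": ["emp_id", "name", "hired_on"],
}

def schema_link(question: str, schema: dict) -> dict:
    """Keep tables whose name or columns overlap with question tokens."""
    tokens = {t.strip("?.,").lower() for t in question.split()}
    linked = {}
    for table, cols in schema.items():
        hits = [c for c in cols
                if c.replace("_", " ") in question.lower() or c in tokens]
        if table in tokens or table.rstrip("s") in tokens or hits:
            linked[table] = hits or cols
    return linked

TRUTHY = {"y", "yes", "1", "true", "t"}

def normalize_bool(raw) -> bool:
    """Map Y/N, 0/1, true/false style values onto a single representation."""
    return str(raw).strip().lower() in TRUTHY

question = "What is the average value of readings from active plants?"
print(schema_link(question, SCHEMA))  # employees is pruned away
print(normalize_bool("Y"), normalize_bool(0))
```

A real schema linker would use a trained model rather than token overlap, but the shape of the problem is the same: cut the schema down before generation, and generate conditions that match how values are actually stored.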
[20:35] I'll tell you, from my perspective, this whole area is one of the more exciting and interesting spaces and opportunities. For folks who worked with me a few years ago: once upon a time I managed analytics teams a couple of different times in my life, and we used to have an analytics request channel within our digital team that I used to joke was my least favorite Slack channel in all of IBM. It basically functioned as a text-to-SQL model: people would just dump in requests, and then our analytics team would go off and return an hour later, or a day later, with an Excel file, like, "I ran some SQL; here's your answer." And I would just sit there and stew at that channel and be mad all day long, basically because I would look at it and think, you should learn how to do this yourselves. Why are you asking the team these insanely basic questions? If you can't answer these questions, how are you even doing your job?

So when I think about this space, the nirvana to me is that the questions people would dump in the analytics request channel, they can just ask an LLM in natural language. But one of the tricky things, and I'd love the panelists' thoughts on this, is that a lot of what people want is their core KPIs, which sometimes they're measured on: how am I doing on performance on this thing? And getting the wrong data on some of those, depending on the type of KPI, can have a lot of consequences. So I'm curious about the team's thoughts: do we think we'll be more productive in this space early on by empowering analysts to be faster? Is that nirvana stage, of business users just asking questions in natural language and getting results back, closer than it feels right now? Where are we on that maturity curve?

[22:43] We're going to get to a point where it's useful for a data analyst first. That's going to come well before it's useful for somebody who
doesn't understand the SQL. But I think that for a business user, it's on the horizon. With tools for explainability, to provide some justification and verification that this is actually answering what you're asking, I think you can gain some confidence in the output of the LLM even if you're not familiar with SQL.

[23:18] I think there are a few different ways in which different vendors are going after this market. You have the classic business intelligence platforms, analytics and BI tools; Gartner just released theirs again this week: Microsoft, Salesforce Tableau, Oracle, Google BigQuery, or the ThoughtSpots of the world at the top right. So you have all these leaders, who have been BI tools, now adding natural language queries. Then you have another camp, the Databricks and Snowflakes; that's where the real query engines are. A query comes in, you're going to fire it off in real time across multiple different data sets, and the speed of ingestion is going to be very, very quick. They are investing quite heavily in understanding what is in the data. Given a particular column with a header that says abc_xyz, what does that name actually translate to in natural language? So there's a lot of work that's been done in interrogating and discovering the data, creating the right set of metadata attached to it, so that when somebody says, "Hey, I'm looking for numbers from this particular region across X," you have a good mechanism to go and translate what that means in Snowflake's Iceberg tables, or what it means in different formats across Databricks. So there's going to be a good mash-up between
the pure BI players adding natural language, and the data side of the world extrapolating and exposing more metadata around it, so you can nail the actual data, where you need to bring it from, and the lineage. And then all the governance falls on the left-hand side: which data set has been approved for what kind of reporting. If you're looking at, say, creating a report on sustainability, there has to be a data catalog behind it. So now we're starting to understand the data quality and data catalog discovery side; there's a lot of effort going into that space, and now you're starting to scale up on the natural language side of BI. BI reporting dashboards are still critical. Microsoft is attacking this with Power BI, where the first iteration of Power BI Copilot was really focused more on the developers: my team will go and build out a whole bunch of different dashboards, and there's AI baked in for me to switch out different panels and so forth. Over time, they're now adding ways in which an end-user business analyst can just ask a natural language question, but then you have to ask follow-ups, like, "Hey, did you mean this region or that one?" and so on. But it's great to see this blend of the data getting better and the tools that are analyzing and interrogating that data getting better as well. We released our own watsonx BI tool that again tries to mesh that together, from our watsonx.
data set and on the right hand 25:56side you have a better national language 25:59query system that goes and pulls out as 26:01well but it's a very very hot field for 26:03a lot of motion I was writing an article 26:05about how the dots on the Gartner have 26:07moved on the bi it's incredible how much 26:09movement has happened and so many people 26:11are now moving into the leader space in 26:13the Gartner quadrant whereas we didn't 26:14have those many last year one other 26:16interesting piece and maybe as a last 26:18word Marina I'll throw it over to you 26:20I've also I was also just 26:22reading a bunch of like articles Reddit 26:25threads things like that about people 26:27who are trying to make Texas Equal work 26:29inside of their companies and like one 26:31of the interesting things people are 26:32saying Well turns out documentation is 26:34useful for humans and also for llms U 26:37and so people were talking about 26:39actually building some like rag patterns 26:41that were like just taking the 26:42documentation for their database and 26:44putting it into the context for um for 26:47the llm so it actually like had some 26:49sense of like what some of these fields 26:51um were supposed to mean so I'm just 26:53curious if you've seen you know anything 26:55like interesting or noteworthy kind of 26:57you know just in the rags space around 26:58how it like how it intersects with the 27:00whole kind of Texas equl area I mean I 27:02will add to that that documentation is 27:04useful uh examples of hey I ran this 27:07kind of kpi report for one company for 27:10one thing can you do the same thing but 27:11for something else those kind of 27:13examples are very useful a reason that 27:15you could I think make a lot of progress 27:17and we are making a lot of progress with 27:18this bi space it's very similar just in 27:21uh code creation space you're you're 27:23wanting particular functions in 27:25particular order and you can very 27:27rapidly build up a 
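An editorial aside: the "documentation into the context" RAG pattern mentioned above can be sketched as below. The doc snippets and the word-overlap relevance score are made-up stand-ins; a production system would retrieve over embeddings, but the shape of the assembled prompt is the point:

```python
# Sketch of documentation-as-RAG-context for text-to-SQL: rank doc
# snippets by relevance to the question and prepend the best ones.
DOC_CHUNKS = [
    "orders.status: one of OPEN, SHIPPED, RETURNED.",
    "orders.total_usd: order value in US dollars, tax included.",
    "customers.tier: loyalty tier, 1 (new) through 4 (platinum).",
]


def score(question: str, chunk: str) -> int:
    """Crude relevance score: count of shared lowercase words."""
    return len(set(question.lower().split()) & set(chunk.lower().split()))


def build_prompt(question: str, top_k: int = 2) -> str:
    """Pick the top_k most relevant doc chunks and assemble a prompt."""
    ranked = sorted(DOC_CHUNKS, key=lambda c: score(question, c), reverse=True)
    context = "\n".join(ranked[:top_k])
    return (
        "You translate questions to SQL.\n"
        f"Schema notes:\n{context}\n"
        f"Question: {question}\nSQL:"
    )
```

The payoff is exactly what the panel describes: the model sees what `orders.total_usd` is supposed to mean instead of guessing from the identifier alone.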
sequence of things that you need and check whether it makes any kind of sense. So, like what Michael was saying, it can be a little bit hard to go straight to an end user who doesn't understand, but it's quite easy to go to a data analyst who looks at it and goes, "All right, I can immediately see where this is going off the rails; give me this information, this, this, and this." That ends up speeding you up a whole lot, because the human can immediately tell where things are good and where things are bad. So what kind of knowledge is helpful to throw in? Especially examples: "Oh yeah, it's like this, but for this company, for this time period, for this KPI." That's where you're going to get a lot of speed-up, because you end up doing a lot of the same thing over and over and over again, and that always means it's a place that's ripe for this human-in-the-loop automation.

[Music]

Every week we have a thread where we talk about what people are seeing in the industry and what we think is interesting, and obviously we've covered some pretty big news so far. The last piece: there was an article going around about books for $2.99 that were appearing on Kindle as ads, and they're clearly 100% AI-generated books. They were nighttime stories for kids and parents, essentially, which you don't usually hear; usually, and I think this was a joke in the article, the nighttime story for parents is just a book, right, as opposed to a bedtime story. And it would be things like "using your mind control powers for good," just all of this uncanny-valley ebook content that's out there. So we thought it would be a good idea to talk
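A short aside: the "same KPI report, different company" reuse described above is essentially few-shot prompting, where prior question/SQL pairs are shown so the model imitates the pattern. The example pair below is invented for illustration:

```python
# Few-shot prompt assembly for KPI-style text-to-SQL: earlier
# (question, SQL) pairs are included verbatim ahead of the new ask.
EXAMPLES = [
    (
        "Q3 revenue for Acme by region",
        "SELECT region, SUM(revenue) FROM sales "
        "WHERE company = 'Acme' AND quarter = 'Q3' GROUP BY region;",
    ),
]


def few_shot_prompt(new_question: str) -> str:
    """Place prior question/SQL examples before the new question."""
    parts = [f"Question: {q}\nSQL: {sql}" for q, sql in EXAMPLES]
    parts.append(f"Question: {new_question}\nSQL:")
    return "\n\n".join(parts)
```

Asking for "Q4 revenue for Globex by region" then only requires the model to swap the literals, which is exactly the repetitive, checkable work the panel says an analyst can verify at a glance.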
about what's happening in the content space. We've been using the term "AI slop" a little bit, and there have been a few things going around the internet; I'd love to get the panel's experience with some of this. Some of the most recent ones I've seen: there was a video of Dave Chappelle that went moderately viral, which was not Dave Chappelle speaking; somebody had one of the LLMs write a routine and then basically piped it into a Chappelle voice. There was a video going around of Toys R Us's first attempt to do an ad with Sora. There are way too many SEO stories of people just lighting domains on fire building LLM-generated content.

Maybe my hot take to start this off, and then I'll throw it to everyone else: content, content generation, almost feels like a red herring for LLMs right now. It was the first thing that people latched on to: "Oh, I can get this thing to write a blog post or a movie script." And none of those use cases, I don't know if y'all have seen any of them, have seemed like they produce anything really useful right now. There are lots of other areas where many interesting things are happening, like most of the conversation we've had here today, but content still takes up a weirdly high percentage of the discussion in the market. I don't know if it's just not there yet, but I'm not seeing good things happen in this space. So I'd love to open it up to the group: one, have you seen anything else that has struck you as, how did we
get here, to this place? And two, do you disagree with the take that content is almost a distraction in the LLM world right now?

I don't know if content is a distraction. It is something that is easy to create and easy to consume. Most of the things we've been talking about here are a little bit harder to understand, whereas you see a picture of Shrimp Jesus and it's funny, and you're like, "All right, that was funny; I'll allow that there's one good use case. Shrimp Jesus, that's funny." Slop I would categorize as some kind of mixture of funny, annoying, and dangerous. Funny is Shrimp Jesus. Annoying is these Kindle ads, where you look at them and go, okay, the title is weird, the girl in the picture has eight fingers; this is clearly some kind of low-grade content that is trying to see if it can get my attention. Then there's dangerous, which is the AI guide to mushroom foraging that is going to kill you if you go ahead and read it; that was going around a little while ago. And you're going to keep getting this, because it is so easy to generate. Amazon lets you complain about these booksellers and take stuff down, but it's just as easy to put stuff back up again, so it's just going to continue to come back, and continue to come back, and to be on various social media platforms, because it's funny. I actually think there's something beneficial to this, because it keeps in the public eye a reminder of just how easy it is to turn these things to trolling, so that you don't fall into a sense of complacency from only hearing "oh, this is successful, this is useful." As soon as you turn it to evil means, it actually still
continues to be very easy and very, very clever. So I do think there's some use to that.

Brian, I'm going to disagree with you. I think content creation is an amazing use case for enterprises. Within IBM we've had a big partnership with Adobe, and we do a lot of our marketing end to end. As long as it's done within constraints, you're using brand-approved guidelines, you're using content that has been vetted and so on and so forth, and there's a human in the loop, content generation has delivered amazing value for us at IBM, and we've done the same for our clients. And I'll take a more hot take on this, something that Yann LeCun, who runs AI at Meta, had shared. He described how they're using LLMs to filter content before people post. Two years back, when they were looking at somebody about to post something on Facebook or Instagram and trying to validate whether it was hate speech and so on, they would have a one-in-four chance of actually catching that content. Now, with their Llama models, they're able to flag close to 90 to 94% of those about-to-be-posted items as something that should not be posted. So this is AI being leveraged for good: instead of saying, "Hey, I can just generate a whole bunch of content," the reverse is also true. I'm leveraging these LLMs for some of my banks, where we're looking at social engineering attacks on the security side, and we're able to identify which of these are actually social engineering hacks. So there's a flip side to this, and I would say content creation is an amazing superpower for LLMs if they're used rightly, in the right context, in the enterprise, with the right guardrails around it.

You did say one thing that I think is
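An editorial aside: the "filter before posting" flow described above can be sketched as a simple gate over a pluggable classifier. The keyword classifier here is a toy stand-in for an LLM-based policy model, and all names and thresholds are hypothetical:

```python
# Sketch of a pre-post moderation gate: score a draft post with a
# pluggable classifier and block it when the flagged probability
# crosses a threshold. keyword_classifier is a deliberately crude
# stand-in for the LLM classifier described in the discussion.
from typing import Callable


def keyword_classifier(text: str) -> float:
    """Toy stand-in: probability that the text violates policy."""
    banned = {"scam", "hate"}
    hits = sum(word in text.lower() for word in banned)
    return min(1.0, 0.6 * hits)


def moderation_gate(
    post: str,
    classify: Callable[[str], float] = keyword_classifier,
    threshold: float = 0.5,
) -> bool:
    """Return True if the post may be published, False if flagged."""
    return classify(post) < threshold
```

Because `classify` is injected, the same gate works whether the scorer is a keyword list, a fine-tuned Llama-style model, or an ensemble; only the scoring function changes as accuracy improves.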
underappreciated, and this is kind of what I meant by the red herring piece. We actually do have some use cases on the team where we're using it now, and it has saved us a gazillion hours, so there are real things that have helped from a productivity perspective. But the second thing you mentioned, the one Yann talked about: I think LLMs are an underrated analytics tool, actually. It's a different type of NLP analytics that you couldn't do without human judgment before, but now you have this other type of computer that can do a different sort of analysis, and we have so many use cases around that sort of activity. That sort of analysis, summarization, how it plays into internal workflows: there's this whole universe of value that we see there, and sometimes I think it gets a little bit lost behind "oh, you can create content." Well, you can analyze it too, and you can do all sorts of other things with language. So I do know there are legitimately some good use cases, but sometimes I get a little sad that we don't talk more about use cases like the one you just described.

Yeah, I'll say that for a long time we've had an abundance of low-quality content in blog posts and books. It's true that the internet has not, like, won the Academy Award, right? I think it also speaks to the danger being maybe a little overstated: we've developed defenses against human-created fakes, which have been around for much longer than LLMs or other models have been able to create any kind of convincing fake. So I think both the danger and the
value of some of this content creation, content that humans have already created in abundance, are probably pretty low.

That's true. I'll tell you, when I saw the Kindle thing: YouTube had all these interesting challenges with what people were calling algorithmic slop back before gen AI was even cool. And I look at a show like CoComelon; there's probably only a very specific audience this is going to resonate with, but every time I hear CoComelon I just die a little bit inside, like, put on Bluey. CoComelon just feels like an algorithm created a child-tuned engagement engine, essentially, and when I see that Kindle stuff I think, oh no, I can feel more CoComelons coming, especially for kids. I don't know whether I would put that under the umbrella of dangerous, but some of it makes me a little nervous on some level. That's far away from an enterprise use case, but maybe as a parent I see some of those things and I'm side-eyeing them a little bit.

It is so interesting, by the way. I don't know if you've read about the company, but it is absolutely based on tracking a lot of KPIs, like how long people watch the videos, and trying to game and maximize that as much as possible. Bluey, which I love, is wonderful content and is not like that at all, whereas CoComelon really is "how long can we keep the kids engaged"; they track, like, every second what the kids are watching. So there's a lot of AI being put to use there, although it's still being put to use by people who have decided that that's their goal.

Yes, I
think overall, if you're looking at the content creation part, and you touched on this with the Toys R Us ad made with Sora as well, the ultimate use case would be... I don't care much about who created the ad itself, right? Toys R Us did an amazing job at Cannes, releasing their ad, 80% of it created by Sora; amazing work. But I really want to fast-forward to a point where, in the movie that I'm seeing on my screen, I want to be in there. Like, I would love to have a scene with Beyoncé, right?

That was not where I thought you were going. I was thinking, "I'm the star of the movie."

Imagine, right? If I'm watching Avengers, and I'm a big Avengers fan, I would want to be helping Iron Man fight Thanos in that particular scene. Just imagine the power of inserting yourself into a movie that you're watching. I think that's the future I want to live in. Don't tell my wife I preferred Beyoncé and all those...

We're just going to cut this part of the pod, so it's over. But I think it will be very interesting to see how this space shapes up. There are places where I could see an algorithm, a reinforcement-learning type of thing, producing CoComelon on steroids, and parents looking at it going, "What is that stuff?" But at the same time, there are plenty of places where there's a ton of opportunity and real stuff people are doing today, and there are also other interesting things, like around analysis, that I wish people would pay maybe a little bit more attention to. So in any case, Marina, Michael, thank
you for joining us today; it was a great discussion, and we will see you back here next time on Mixture of Experts.

Thanks, all.

Thanks so much for