Open Source AI, RAG, and KANs

Key Points

  • The “Mixture of Experts” podcast brings together AI researchers, product leaders, engineers, and policy experts each week to dissect the biggest AI news, starting with three focus topics: open-source model trends, the future of Retrieval-Augmented Generation (RAG), and the hype around KAN (Kolmogorov-Arnold Network) models.
  • Recent open‑source breakthroughs were highlighted, including Meta’s Llama 3, Apple’s on‑device model release, and IBM’s new Granite family, underscoring a rapid expansion of publicly available, high‑capacity AI models.
  • IBM’s Granite models (3B, 8B, 20B, and 34B parameters) were announced as open source, trained on 116 programming languages, and positioned for enterprise use with capabilities that go beyond typical Python-centric code generation.
  • The panel previewed the next evolution of RAG, discussing how retrieval‑augmented techniques have matured and what breakthroughs or challenges may define their future impact on AI applications.
  • KANs were introduced as the latest buzzword, with the experts weighing their theoretical promise, current hype, and whether organizations should invest in the technology now or wait for further validation.
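The retrieval-augmented pattern the panel returns to throughout the episode can be made concrete with a small sketch: retrieve the best-matching document for a query, then prepend it to the prompt so the model's answer can cite its evidence. This is a minimal illustration in plain Python; the toy corpus, the bag-of-words cosine scorer, and every name here are assumptions for the example, not part of any IBM or Granite API.

```python
import math
from collections import Counter

# Toy corpus standing in for an enterprise knowledge base (illustrative only).
DOCS = {
    "granite": "IBM Granite code models come in 3B, 8B, 20B and 34B sizes "
               "and are trained on 116 programming languages.",
    "instructlab": "InstructLab aggregates community contributions to "
                   "instruction tuning and rebuilds the model weekly.",
    "haiku": "A haiku is a three-line poem with five, seven, and five "
             "syllables.",
}

def tokenize(text: str) -> list[str]:
    return [t.strip(".,?!").lower() for t in text.split()]

def score(query: str, doc: str) -> float:
    """Cosine similarity over bag-of-words counts (stand-in for a real retriever)."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    num = sum(q[t] * d[t] for t in q.keys() & d.keys())
    denom = (math.sqrt(sum(v * v for v in q.values()))
             * math.sqrt(sum(v * v for v in d.values())))
    return num / denom if denom else 0.0

def retrieve(query: str, k: int = 1) -> list[tuple[str, str]]:
    """Return the top-k (doc_id, text) pairs for the query."""
    ranked = sorted(DOCS.items(), key=lambda kv: score(query, kv[1]), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Retrieve first, then ground the generation step in the evidence."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query))
    return (f"Answer using only the context below, and cite the source id.\n"
            f"{context}\nQuestion: {query}")

prompt = build_prompt("How many programming languages are Granite models trained on?")
```

In a real deployment the scorer would be a vector index and the prompt would go to an LLM; the point, as the panel notes, is that the retrieved context keeps answers fresh and citable, though a model in between can still ignore or garble it.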

Full Transcript

**Source:** [https://www.youtube.com/watch?v=K83tTEeGCBc](https://www.youtube.com/watch?v=K83tTEeGCBc)
**Duration:** 00:46:18

Sections:

  • [00:00:00](https://www.youtube.com/watch?v=K83tTEeGCBc&t=0s) **AI Trends Panel Kickoff** - Tim Hwang opens the Mixture of Experts podcast by outlining three AI storylines (open-source model trends, the future of retrieval-augmented generation, and the buzz around Kolmogorov-Arnold Networks) and introduces the IBM and MIT-IBM Watson AI Lab expert panel.
0:00[Music] 0:06hello and welcome to mixure experts I'm 0:08your host Tim Hong each week we bring 0:10together a panel of researchers product 0:12leaders Engineers policy experts and 0:14more to discuss debates and distill down 0:17the week's biggest news and Trends in AI 0:19so today on the show three stories first 0:21one the state of the open source uh what 0:23are the biggest Trends in open source 0:25models and how will they shape the 0:26business of AI second the future of 0:28retrieval augmented or rag uh they've 0:31come so far where are they going to go 0:33next and then finally kav Arnold 0:36networks or can what the hell are they 0:39why are all the Nerds suddenly talking 0:40about it and should we buy the hype so 0:42today on the show I'm ay supported by an 0:45incredible panel of experts so uh first 0:48off Marina denki senior research 0:50scientist at IBM Marina thanks for 0:52joining good to be here yeah and 0:55particularly thanks to you for joining 0:57us so early Pacific Time David Cox uh VP 1:00models and director of the MIT IBM 1:02Watson lab David thanks for joining the 1:04show pleasure to be 1:06here and uh returning for the second 1:09episode we were joking that we just made 1:10make this a Kush vars show going 1:13forwards uh he's unfortunately declined 1:15that but Kush Varney IBM fellow working 1:17on issues surrounding AI governance Kush 1:19welcome back it's great to be here and 1:21uh yeah I'm the VY with the hair um so 1:25yeah we'll use that as a little 1:27pneumonic so 1:33well great so let's start with the first 1:35story that we want to cover today on 1:36mixture of experts um so I think from 1:39where I'm sitting uh you know there has 1:42been just so much happening in the world 1:44of Open Source right So Meta of course 1:46released llama 3 a few weeks back um 1:49Apple in a very you know big move for 1:51them I think released the open um uh Elm 1:54on device models and then IBM just 1:57recently released its Granite family of 
1:58models uh and so David I kind of want to 2:01give you a chance to kind of first plug 2:03Granite tell us what it is and what you 2:04guys have been working on um and then I 2:06kind of want you to kind of go into you 2:08know why it is that IBM decided to 2:10release Granite open source and why it 2:12thinks that doing this matters and I 2:14think from there I think we can talk 2:15more broadly about what's happening in 2:16open source but I wanted to give you a 2:18shot to talk a little bit about the work 2:19that you and the team have been doing 2:21sure yeah happy to um we actually had 2:24two major open source announcements this 2:26was a big week for us across IBM and red 2:28hat uh the first first uh was that we 2:31open sourced the granite code family of 2:33models so these are models in a a 2:36variety of sizes 3 8 20 and 34 billion 2:40parameters trained on 116 programming 2:43languages these are um you know 2:46state-of-the-art models competitive with 2:48you know the best in in the field and 2:51one of the areas that we really 2:52optimized for for Enterprise users 2:54because ultimately IBM is interested in 2:56supporting Enterprise is that allaround 2:59capability you know not just Python and 3:01gener code generation which is often the 3:03focus uh for the academic Community but 3:06also Java and rust and all kinds of 3:08other languages and also things like uh 3:11code fixing and explaining um so there's 3:14a lot of things you can do with code 3:16models and it's really being integrated 3:18into the software development you know 3:20fabric of how we do software and 3:21software is integrated into the fabric 3:23of everything we do in society and we we 3:25wanted to release these because we you 3:27know ultimately our position is that 3:29that open winds you know like we're 3:31communities will build around these 3:33models people will build things um that 3:35you know that we wouldn't expect they'll 3:37they'll be able to extend the models and 
3:39that's that's super powerful and and 3:41that leads a little bit to the second um 3:43announcement which which happened 3:45through red hat so we have a technology 3:46that we developed for doing alignment of 3:49models uh we call large scale alignment 3:51for chat Bots and that gave rise to a 3:54project called instruct lab and what 3:56instruct lab is is a way to actually 3:59aggregate Community contributions to the 4:02instruction tuning of a model so now uh 4:05any developer anywhere in the world can 4:08submit new skills and new knowledge to a 4:11model and and then that actually gets 4:14integrated and then we do a weekly build 4:16of that model so it's a different cycle 4:19of development a different kind of 4:20community forming where we're not just 4:22forming around a model and you know 4:24building inference tools and things like 4:26that but we're actually able to merge 4:29all those contributions and then update 4:31the model every week uh so we're really 4:33excited about this it's been a fantastic 4:35partnership with red hat building this 4:37out they know open source better than 4:40than anyone and uh we're really excited 4:42this got announced uh at at Summit on on 4:45Tuesday by Matt hix the CEO of Red Hat 4:47yeah that's awesome and I think I don't 4:49know if You' agree with this is like I 4:50see what IBM has done with granite and 4:53with instruct lab and it's kind of like 4:55you I was joking with a friend the other 4:56day I was like it's open source putting 4:58its big boy pants on right like they're 5:00kind of like moving into like open 5:02source being something that like 5:03Enterprises will actually use um and I 5:07think that's really changing what we 5:08mean by open source right like I think 5:10like the big Trend even a few months ago 5:12was like oh my God these open source 5:14models are just getting so big right 5:16like they're huge parameter models and 5:17like isn't that the exciting thing is 5:19that open source models 
will be you know 5:21on par as sophisticated as like 5:23state-of-the-art but what's kind of 5:24interesting here I think is really sort 5:27of like twofold right like I think like 5:29what's interesting with granite is you 5:30guys are releasing a class of models of 5:32different sizes sort of on the idea that 5:34like not everybody's going to need like 5:35the the chunkiest model in the whole 5:38world uh which I think is really really 5:40interesting um and then I think again 5:42it's also kind of like on par with what 5:44we see out of the Apple announcement the 5:46open Elm announcement right which is 5:47like these are not the biggest models 5:49but they're on device models right and 5:51it kind of feels like I don't know if 5:52you'd agree with this David is like it 5:53feels like open source is finally now 5:55kind of like responding to Market need 5:57like then in some ways like Enterprises 5:58are like how do we actually apply this 6:00stuff and now like essentially open 6:01source Community is like trying to now 6:03you know adapt to actually provide 6:04solutions to that but I don't know if 6:06that's like a characterization you guys 6:07would agree with yeah yeah no and and I 6:09think I think you're spoton you know 6:11there isn't just one thing that people 6:12want to do with llms so there's not 6:14going to be just one llm that wins the 6:16day um you know we have our models 6:19running on laptops and there are that's 6:22they're really 6:24interesting uh you know advantages to 6:26doing that like if you if you want to be 6:27on Prem you don't want to be data over 6:30the network it's it's uh you know 6:32there's you know it's proprietary and 6:34you're worried about IP you can run 6:36these models in many cases on on your 6:37laptop for other applications you want 6:40the very best performance you know say 6:41you're doing an application 6:42modernization that's just going to 6:43happen once um and it needs to be you 6:46know the highest quality 
then you can 6:47move to one of the larger models um so 6:50we we really are trying to be responsive 6:52across the spectrum of of different 6:54needs and yeah at IBM we we are trying 6:57to be sort of I think you said the the 6:59big boy pants you know like we're very 7:02transparent about what data we put in 7:03the models which is which is not always 7:05true but which is very important if 7:07you're an Enterprise want to use these 7:09uh the other point of differentiation 7:11for our models is we release them under 7:13Apache 2 license just a clean uh pachy 7:16to no additional restrictions and this 7:19can be really important for for adoption 7:21we knew this is something that our 7:22customers uh would ultimately need and 7:24want so um that that that's you know how 7:27we're evolving uh sort of the our 7:29approach um to open source and and again 7:32yeah like you said meeting the customer 7:34needs yeah definitely and chis Marine 7:37I'm not sure if you've got views on this 7:38is like I think you know I think we can 7:40use this as a springboard to kind of 7:41talk about like how this is going to 7:42sort of shape the the market as a whole 7:44right because I think you know if I'm 7:46now you like an Enterprise sort of 7:48thinking about how to integrate llms 7:50right it feels like there's increasingly 7:52options right well we can you know go 7:54work with we can try to do it at all 7:56ourselves in house right like we can try 7:58to go with like the big proprietary 8:00models right um and it kind of also 8:03feels like there's going to be a range 8:04of new businesses that emerge here as 8:06well like just like the whole business 8:08of like you come to us with a problem 8:10and we fine-tune open source models for 8:11you seems like it'll increasingly become 8:13a big part of the ecosystem but um yeah 8:16I'm kind of curious as from from your 8:17point of view kind of like in you know 8:19the the kind of research space and even 8:21thinking about like 
you know where this 8:24all goes just if you've got views on on 8:25how this will kind of impact the 8:26ecosystem as a 8:28whole yeah I mean uh one great thing 8:31that uh I mean instruct lab enables is 8:33really I mean shifting power to Value 8:36creators so um it uh really allows uh I 8:39mean as David said I mean this whole 8:41Community to uh to really congeal around 8:44this thing um and uh make these models 8:46authentic for themselves it's some sort 8:48of uh commitment to locality as well I 8:50mean for whatever you need uh for your 8:53Enterprise for your organization you can 8:55um uh really make things uh make things 8:57yours so I think it's uh it's an awesome 9:00awesome thing yeah I really appreciate 9:03being able to add the skills as you find 9:05them needed for your own use case so the 9:07thing with all of these models is that 9:09it's very hard to predict when you put 9:10them out what are they actually going to 9:12be used for so being able to have the 9:14flexibility to say oh I've realized I 9:16have a use case I need to adapt quickly 9:18I need to make the model adapt quickly 9:20sometimes with something that's 9:21proprietary or somewhere else you just 9:23don't have the ability to move that 9:24quickly or even to stress test or check 9:27and is this going somewhere or is this 9:28not going to be helpful at all so from 9:30that perspective actually the way that 9:32uh We've released instru lab is very is 9:34very good it's very effective for 9:36checking these cases it it's very rare 9:39that um any given company or Enterprises 9:42needs would be represented would be a 9:45top of mind for the developer of of Any 9:47Given base Foundation model like does 9:50does meta you know care about an 9:51insurance company well you know they 9:53probably do but not it's not their not 9:55their primary uh con thinking about 9:59just being like I wonder what AIG thinks 10:01about this exactly exactly so so having 10:04a base that's built for Enterprise 
but 10:06then giving the ability to customize and 10:09really focus and and bring in you know 10:11knowledge and and and particular things 10:13you want to do that are specific to that 10:14industry uh can be really powerful can I 10:17so we have a few more minutes on this 10:18topic can I play Jerk for a second right 10:20because I do think that like you know 10:22one of the most interesting things about 10:24open source is that early on you know if 10:26you were if you were a government right 10:28or someone worried about AI ethics or AI 10:30safety right you basically say well the 10:33rise of these few leading companies with 10:35proprietary models is like really good 10:37for us right because we only have to go 10:39to a few companies and change their 10:40policies in order to sort of secure the 10:42ecosystem right and I think you might 10:45say well one of the issues of these 10:47increasing proliferation of Open Source 10:49models right and the fact that 10:50everybody's kind of going to be running 10:51their models on premises right is that 10:54there's a lot more room for people to 10:56misuse these models um and also like you 11:00might think that also they create all of 11:01these supply chain security issues as 11:03well like I'm kind of thinking about how 11:04like uh mpm right like other instances 11:06in which open source is really taken off 11:09um you know security ends up being this 11:10really big problem because like the 11:11provenance of any particular component 11:13is really difficult and your stack might 11:15rely on you know hundreds of Open Source 11:17components and I guess I'm kind of 11:19curious I mean I don't think anyone's 11:20got a good solution to this and and look 11:22I came up as like a free software 11:23advocate so like I I'm on I'm on the 11:25side of what's going on here but I'd 11:28love the kind of panelist of like you 11:30know offer an opinion about that like do 11:31you buy that those are risks I don't 11:33know if 
there's kind of smart Solutions 11:34you guys are thinking about just to kind 11:36of wrestle with that a little bit I 11:37think is one of the most interesting 11:38parts of this development yeah one thing 11:40just to start off um on the security 11:42issue um history has proven in open 11:46source software that open source 11:47ultimately ends up being safer not less 11:50safe their efforts for instance to 11:52create you know private versions of the 11:54Linux kernel and it it turns out it's 11:56just hard to keep those safe because 11:57more eyes mean uh you know more more 12:00sort of you know uh people who can find 12:03uh you know problems understand problems 12:05and and fix them um so I think having 12:08that transparency enabling the academic 12:10Community to get involved to build 12:11Solutions uh for many problems that we 12:13may face I think is super important I 12:16will also say we're very careful about 12:18what we um what we release I mean we're 12:21we're we're very careful about what data 12:22goes into these models uh before we 12:24release them ensuring that they're you 12:26know minimizing the risks uh any po 12:29risks around you know you know 12:32potentially dangerous you know 12:34activities where we're not releasing 12:35models that we think are could be used 12:37for for for ill intent of course not 12:39yeah and I think uh I mean I do think 12:41that there's going to be a need almost 12:42for like a consumer reports or a wire 12:44cutter for these models at some point 12:46where it's basically like there's going 12:47to be so many models out there that it's 12:48going to literally be like well we had a 12:50couple experts spend like a few hours 12:51really testing this thing you know and 12:53this is like an important part of the 12:55the ecosystem Kush it looks like you 12:56might want to get in yeah I mean uh we 12:59actually do work on exactly that the 13:01consumer report sort of idea so we call 13:03it uh AI fact sheets um and 
model risk 13:06assessment and uh it is uh exactly a way 13:10to uh to analyze uh these different 13:13models that are out there um give them 13:14different scores along different 13:16dimensions um and as a consumer um you 13:19can I mean really look at different 13:21vendors different sort of options and uh 13:23get a good sense of uh of what's 13:25available so this is actually um 13:27something already available through 13:29through Watson x. governance one of our 13:31Flagship products yeah I imagine it's 13:33come some kind of future when I quit my 13:35job as a podcast host to be a like a 13:37model Sali you it's just like have you 13:40considered like this this model for your 13:41use case fine vintage that's awesome 13:44yeah a fine vintage yeah exactly right 13:46yeah really good oky overtones on this 13:482024 was a good year for llms yeah 13:51exactly um Mar any final thoughts before 13:53we move to the next topic here yeah I 13:56would say that it's uh still very early 13:58days also with this technology and 13:59everything that we're going into so 14:00especially as scientists we would like 14:02to try not to have the hubris of 14:04thinking yeah we've got this you know 14:06leave it with us we've we've sorted out 14:07the rest of this there's been so many 14:09interesting developments and surprises 14:11in this technology in the last few years 14:13and we we think that will continue to be 14:15for sure in that sense open source is 14:17actually going to be more efficient even 14:19from a market standpoint more eyes means 14:21more ideas means more places that this 14:23is going to develop in unexpected and 14:25interesting ways so it's actually even I 14:27think more efficient besides whatever 14:29thoughts we may have about the morality 14:31of it as well yeah no for sure and again 14:33I'm kind of arguing against myself 14:34because like I'm very Pro open source um 14:37I think it's just like a very 14:37interesting kind of set of 
14:38considerations as like the whole 14:40architecture of the industry sort of 14:42shapes uh and 14:44[Music] 14:47changes well this is great so let's move 14:49to the second topic today I really want 14:51to talk about retrieval augmented 14:53generation or rag um so if you're not 14:56familiar with this rag is uh basically 14:59one of the hotness uh in in in AI um if 15:02you look at the papers that I clear this 15:04year or ACL um there are a lot of papers 15:07using rag methods um and you know I 15:11guess Marina I you keep me honest here I 15:12mean I think one of the reasons that it 15:14has been so prolific and of so much 15:17interest is that rag seems to kind of 15:19open a window for solving a lot of the 15:21models the problems that we have with 15:23language models right like well we can't 15:25train these models pre-train these 15:27models all the time but if they're 15:28really good at pulling data from 15:30elsewhere um you know this is a good way 15:32of keeping their responses up to date um 15:34it's a good way of ensuring that they're 15:35you know more factual potentially um and 15:38um and so I I'm curious because I know 15:40your group recently released a paper um 15:42thinking about and using Rag and so 15:44maybe as a springboard for the 15:45conversation I don't know if you want to 15:45quickly talk about that and then we can 15:47kind of more generally talk about you 15:49know I guess from your point of view 15:50what you see as sort of the existing 15:52limitations of rag and what are the Big 15:53Technical problems that need to be 15:54solved sure that sounds great so um the 15:58paper that you refer to it's a 16:00description of a methodology and a 16:02system for trying to evaluate it more 16:04deeply again the point of rag is it's 16:06one to be able to have a conversation 16:09with an llm in which you ask it to write 16:10a hiu about frogs they're great at that 16:13no problem we he we live at business use 16:16cases and so it's 
very important that 16:18when you have business use cases that 16:19rely on factual information and it's 16:21really a problem if you get things wrong 16:23this is where you get into rag like you 16:25said being able to point to a reference 16:27of all right the reason I'm giving you 16:28this answer is because this is the 16:30content that I am relying on whether 16:31it's informational or it comes from a 16:33knowledge base whatever then you want to 16:35actually go and double check is this 16:36going to act the way that I expect it to 16:38act and it's one thing again to uh test 16:42these llm models against large 16:44benchmarks there was some good comments 16:45last week about benchmarks and the use 16:47you know usefulness of them as time goes 16:49on it's another thing to actually see 16:51what happens in a customer's use case 16:53this is an old data analysis uh 16:56necessity you have to go into okay what 16:58when to the test cases that you've 17:00created your testing where did your data 17:02come from what are the documents how 17:04have you managed to without knowing it 17:06introduce biases into the evaluation 17:08that you're doing because of the way 17:09your annotations are done because of the 17:11way you defined your metrics because 17:13people have different understandings of 17:14what is acceptable what is not you have 17:17over uh corrected for a particular query 17:20type you have cor over corrected for a 17:21particular way of responding this is all 17:24uh analysis that you need to do to have 17:27confidence in the solution you put out 17:29that includes an llm but is not just the 17:31llm by itself it's the llm as a part of 17:34a solution and so that's something that 17:35my group is does a lot I know kush's 17:37group does that a lot as well is diving 17:40into the details of that especially how 17:41we take our our customers through 17:44getting confidence and what does it mean 17:45to to deploy their llm and our system 17:47has a fun 
for those of us from the 90s 17:50we called inspector raggot yeah 17:52Inspector 17:53Gadget um and it really is a a way to to 17:57make sure that you can take yourself 17:58through that analysis and and feel 17:59confidence in what you're getting not 18:01just the Agate number yeah it's funny 18:03about the 90s I was in a class that a 18:04friend was teaching today or earlier 18:06this week and one of the kids was like I 18:08hear back in the day there's this thing 18:09called geoc cities I hear it was really 18:11cool or something like that I was like 18:12oh my God I gotta get out of here um 18:17yeah so there there's there's so much to 18:18go into there and I think there's kind 18:19of like maybe two topics we could dive 18:21into you know I think the first 18:23submarine I'd love to get your thoughts 18:24on is I think one really great theme I 18:26think that came up from last week's 18:28episode episode was kind of the idea 18:30that almost AI is in this kind of weird 18:32period of like Benchmark bankruptcy 18:34where like essentially there's like all 18:36of these B benchmarks that no one cares 18:37about and then the benchmarks that do 18:39people do care about are like so 18:41thoroughly gamed that they basically 18:43provide no valid information anymore and 18:46like one outcome that I think schit was 18:47saying on on the uh on the episode was 18:50like well that's one of the reasons why 18:51like the solution now is like just talk 18:53to the model for 15 minutes and then you 18:54figure out whether or not it's good or 18:55not and it strikes me that like I don't 18:58know if you put inspector ragit in kind 19:01of this context is like it seems like 19:02there's also a switch from like from 19:04benchmarks to like monitoring as the way 19:07that we really assess whether or not 19:08models are high quality I don't know if 19:10you'd buy that because I as I take kind 19:12of your group's work is an attempt to 19:14say okay well we're not going to really 
19:15you know benchmarks are a useful guide 19:17but really in practice what most people 19:19want is to see like lots and lots of 19:21telemetry about their models and like 19:24that's how we approach this problem um 19:25but kind of curious to get your response 19:27on that like do you buy the idea that AI 19:28is is in a benchmark bankruptcy and do 19:30you see kind of ragged as sort of a 19:32solution to that or an answer to that 19:35yeah I think you should think of 19:36benchmarks is something that you should 19:37iterate on rapidly and evolve now the 19:40problem with talk to the model for 15 19:42minutes and just get it Vibes uh kind of 19:45feel of it is uh people are not very 19:48good at coming up with what is the right 19:50thing to talk about it to for 15 minutes 19:52consistently they are not very good they 19:55themselves will uh only think of 19:56whatever came into their head whatever 19:58they were talking about to their 19:59customer last week and they will 20:00introduce like I said a really a lot of 20:02biases and what they thought of then you 20:04end up being very nastily surprised when 20:06you actually go ahead and deploy your 20:07model and they're like well that didn't 20:09work but I talked to it for 15 minutes 20:11it seemed fine you wouldn't also um you 20:14know deploy a representative to a 20:16customer after talking to them 50 20:17minutes and thinking that seems fine so 20:20realistically what you actually want and 20:22what I hope the point is of approaches 20:24like inspector aot is constant evolving 20:27benchmarks yeah talk to it for 15 20:28minutes and then go check yourself hey 20:30what data did you end up actually 20:31putting in what kind of questions did 20:33you end up putting in do you realize 20:34that you didn't do the right Vibe check 20:36do that a few times your Vibe check 20:38becomes then into something systematic 20:40but it's something that is that is 20:42iterative that is interactive rather 20:44than some 
academic somewhere put out a 20:45benchmark I don't know what this has to 20:47do with my data make your own make it 20:49iterative and constantly you know check 20:51yourself for what you're doing is 20:53actually proper quality that ends up 20:55being really the right thing to do so 20:57move yourself from that um you know 20:59shout out to Daniel Conan from that 21:00system one thinking to the system two 21:02thinking then you're going to have 21:04confidence in what you're actually 21:05deploying yeah this is a I think it's so 21:07interesting because I think this is what 21:09you're describing is going to be an 21:10enormous need across like every company 21:12that attempts to uh adopt this stuff and 21:15you know I was joking earlier about 21:16being a model somaler like I think my 21:18other business proposal is like you're 21:20an eval atellier where you're basically 21:22like we help to craft finely crafted 21:25evals for like what you need and kind of 21:28what you're talking about because like 21:28the art of creating a good Benchmark and 21:30evolving that benar describing a lot of 21:32our jobs actually here at IBM is what 21:34that's literally what we're doing for 21:35our 21:36cellier so um David Kush I don't know if 21:39you got responses you want to jump in um 21:41maybe I can be a little bit 21:43controversial um so yeah um I mean 21:47people talk about rag being a solution 21:49for hallucination for lack of factuality 21:51lack of faithfulness lack of 21:53groundedness these sort of things but um 21:55to me I mean it's part of the solution 21:58but uh I don't think it's the full 21:59solution because even when you get the 22:01retrieve documents there's a model in 22:03between and it can ignore those 22:05documents it can get confused by them it 22:08can uh I mean just hallucinate anyways I 22:11mean all sorts of things so um uh to me 22:15I mean what Marina is talking about is 22:18very important not just I mean over time 22:22but uh like as 
part of the the the 22:25process initially as well or in runtime 22:27in uh in Fr time so mean checking for 22:30hallucination separately um uh thinking 22:33about can we Trace back the information 22:35where did it come from in those 22:37documents uh can we even come up with 22:39new architectures that uh uh don't 22:42hallucinate uh By Design in some ways so 22:45I think there's rag gets a lot of play 22:48right now but uh I think it's a stepping 22:50stone um I don't think it's the the end 22:53of the journey I actually completely 22:55agree with you fully agree with you rag 22:57has not fixed it it's just an additional 23:00step in the direction I completely agree 23:01with you interesting so do you think in 23:03like I don't know it's always tough to 23:05predict on these things like in two 23:06years we'll be talking about rag I think 23:08we will because um rag is like I mean 23:11it's search right and a lot of the 23:13companies who are in the cell LM game 23:16are search companies at the end of the 23:17day so um I think it'll stick around 23:20it'll uh have a lot to to to do but uh 23:24yeah I mean I think for Enterprise use 23:26cases um maybe it'll uh get 23:28a little bit less emphasis maybe not I 23:31don't know well I mean for freshness of 23:33data some kind of retrieval can be 23:35helpful like you just added it to the 23:37database you can retrieve it immediately 23:39so there there are more than one problem 23:41that rag solves and and I agree with uh 23:43with Christian Marina that hallucination 23:46that's that's a SE it you know it helps 23:48a little but like it's a separate 23:49problem that we need to address lots of 23:52different ways um but but the ability to 23:56access new information the ability to 23:58customize quickly I mean we're starting 24:00to get uh I think layers of technology 24:02that allow us to to address that 24:03instruct lab is one of them if you 24:04wanted to ingest knowledge into the LM 24:07and build it into your 
sort of, into the LLM itself, you can do that. But you probably still want to be retrieving things, and there's going to be a balance, and we're going to figure that balance out, I think, over the next couple of years.

Yeah, absolutely. I think it's definitely, like, the much more realistic pathway that I've heard, right? Like, the other alternatives I've heard are like, well, at some point the model will become so big and know everything, and then we'll be able to pre-train it frequently enough. And I'm just like, really, how many H100s are you going to buy to pull this off? You know, it's just not within the realm of possibility. So yeah, and it's not a concept... go ahead, David, sorry.

I was just going to say, not every company is going to give their data over to OpenAI, to let them, you know, have their proprietary data. It's a real problem.

Yeah. All I was going to say is, I mean, these sorts of ideas, like having multiple levels and layers, it's part of computer engineering. I mean, caching, different types of locality. This is all very much the sort of thinking that computer people have had, so it just needs to come into this too.

Yeah, orchestration, right? Making sure that there's routing involved, there are decisions involved, there are different checks, there are different guards. That's not going to go away. I don't care how long you've trained the LLM; that's not going to be fixed.

Yeah, for sure. So we have a few minutes left on this topic. I think the last area that I want to push us into is, Marina, you had kind of a really interesting comment when you were explaining InspectorRAGet, which is basically this feature of trust, right? Essentially, what does a user need to be shown to
trust the model? And I think what I love about that topic is that in some ways it pushes you away from... like, it turns out people trust models regardless of whether they're a huge-parameter model or a tiny model. And, you know, I heard this great anecdote where this MLE was telling me a story: we were doing an eval with these side-by-sides, and what we discovered is that the users we were testing against just felt that longer outputs were more credible and trustworthy, regardless of any content they included, right? And I was like, oh, that makes sense, because, like you're saying, if the task is "do 500 of these in the next hour," then they're just using visual heuristics to evaluate text. And one visual heuristic we use to tell whether something is more substantive is: is it long, and does it look dense? And I think this is such an interesting thing, because if you go down that route, I mean, I guess it's a prescription for madness, because you're basically asking, well, does font choice influence how trustworthy people think their models are?

Yeah, exactly.

And so, Marina, I'd love for you to riff on that a little bit: how far does this rabbit hole go? Once you move away from benchmarks and you say, we're going to give you a dashboard of different things, you're now almost in the theater of trust: what do we need to show you, what is the metric that drives the most trust with the user, and is that trust justified or not? I'd just love to get your thoughts on that as someone who's working in the space.

There's some interesting psychology here. So we know that people get extremely mad at computers when they make
one mistake, but they're much more okay with it if they were told it was a human. So that's an interesting bit of psychology. There's another fact: these models are fabulous snake-oil salesmen. They will tell you something, and you will read it and believe it, even if in the back of your mind you're thinking, wait, isn't that not what I thought it was? But it will sound so convincing and so accurate that you go, oh yeah, that's the right answer, I have no further questions. They're very good at that. So in that sense, actually, even human evaluation is very challenging; people are bad at catching these kinds of things. On the other hand, you find yourself in an enterprise situation, and that's risky. If you really did give incorrect information, that is very risky. Again, it's a good reason that you can't really deploy these models by themselves with no support. But I think there's a lot of psychology, actually, in setting expectations, just in the same way that when we first had Wikipedia in the world, when we first had Google Search, people thought, oh well, if it's on the internet, it's true. And then people learned. And I think that over time it's going to be the same thing, where you're going to learn what the right way to interact with these models is, what the safe way is, what the consumer-reports sort of appropriate way is. Some of this is technology; a lot of this is people, a lot of this is human psychology. I cannot give you enough data points to tell you this model will never, ever, ever make a mistake. It's not possible. So we're going to have to figure out how to set people's expectations: if models are allowed to sometimes make mistakes and you ask for a clarification, how do we get to that
state of the world, also with the use of the technology?

Yeah, for sure. And I think this pushes in an interesting direction, which is, and I'll just throw out the hot take: under certain conditions, basically, optimally safe performance of the model is not necessarily optimally easy to use, I guess, right? That is to say, a perfectly articulate model may actually signal more trustworthiness than is warranted. So there may actually be weird situations where there's kind of a trade-off, which is, we actually want it to perform worse because that inspires an optimal level of doubt in the user. That would be the theory I guess I'm arguing.

Well, short of that, you don't have to have it perform worse; if it could just express its own uncertainty, that would already be a big improvement, because people really aren't great at dissociating things like fluency, the convincingness of discourse, from actual truth. And particularly in the contexts enterprises are working in with RAG, where you're taking enterprise documents, HR documents and policies, and you have to be correct about them, unless the person who's vibe-checking that model really understands those policies in perfect detail, it's very tricky to evaluate, and people tend to be fooled very easily.

Chris, you want to jump in? I see you kind of...

No, I mean, I think the word Marina used earlier, humility, is the key. The AI needs to be humble, and we need to be humble as well. So I think that combination is the right way to go.

Yeah, for sure. And I think it's part of the problem too. You know, I was talking with my friend, who relayed this story
as well, which is basically that people have all these expectations built up around computers, which makes this particularly difficult. The language model behaves in such a fundamentally different way that it violates our expectations: it's good at poetry but bad at math. That literally flips everything we have built up in terms of intuitions about computers over the last 20 years. So the adage I've been using is: everything you think computers are good at, LLMs are bad at; everything LLMs are bad at, computers are good at. And there's this weird mismatch that we're navigating at the moment.

All right, so Kush, you and I are going to kick this topic off. This is going to be the big challenge of the episode. If you've been watching the more nerdy channels of the AI discourse online, people have been very recently excited by something called the Kolmogorov-Arnold representation theorem, which has given rise to a paper that proposes a kind of Kolmogorov-Arnold Network, or KAN for short. And it's very difficult to tell from the outside, I think, if you're not a technical person, what it is and why it's exciting, which worries me, because it has all the indications of being like, oh, do you use blockchain? Because blockchain will solve this, right? I think we could rapidly go down that direction. So what I want to do over the next few minutes, as we close out this episode, is to give the clearest, easiest-to-understand explanation of KANs that anyone has yet articulated on the internet, and we're going to do this together, right? Okay. So, Kush, no pressure on this.

No pressure. So I think
Kush, maybe the best place to start, having read some of the papers, is: can you give a quick explanation of why large language models approximate functions? What does that mean, exactly?

Yeah, that's precisely the right place to start. So a mathematical function lives in some space, right? In middle school or high school we saw these one-dimensional functions. If our data was just one-dimensional, what that function is trying to do is fit that data, and thereby let us use the function, rather than the data, to predict the next thing. By doing so, we're actually able to, so to speak, generalize. Data tells us the pattern; the function describes the pattern, so that the next time we want to make a prediction, we use the function instead of the past data. I think that's a very key point. Then we can go into what those functions are, how to represent them, and how to compute them, but that's the starting point.

Right. And a prediction here, just to make it very simple, is something like: tall people, that's one variable, tend to be heavier. Is that right? That's a prediction you could build a function around.

Yeah, exactly. And you can be very quantitative about that. So if I'm six feet tall, maybe that predicts that I'm 180 pounds, or something like that.

So then I think the next step, and we'll think through this problem step by step, right, is basically... So, as I take it, one of the things that machine learning has really done better than traditional models of AI, or traditional models of computer programming, is that we previously tried to hand-draft all of
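The height-and-weight idea above can be sketched in a few lines: fit a one-variable function to data, then predict from the function instead of the data. This is a minimal illustration with fabricated numbers, not anything from the episode:

```python
# Least-squares fit of a one-variable function, in the spirit of the
# "tall people tend to be heavier" example. Data points are made up
# purely for illustration.

def fit_line(xs, ys):
    """Return slope a and intercept b minimizing sum((a*x + b - y)^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

heights = [60, 64, 68, 72, 76]        # inches (fabricated)
weights = [115, 135, 155, 175, 195]   # pounds (fabricated)

a, b = fit_line(heights, weights)

def predict(height):
    # Generalization: use the fitted function, not the stored data.
    return a * height + b

print(predict(72))
```

The point of the sketch is the last line: once the function is fit, the original data is no longer consulted at prediction time.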
these rules, right? So if you wanted to write an algorithm to divide pictures of cats from pictures of dogs, you would do feature engineering: you'd get a bunch of people together to write these equations, these functions, out. And I guess, is it right to say that what machine learning has been really good at is coming up with these functions on its own? Basically, coming up with those rules on its own to do this prediction?

Yeah, I think that's a good way to put it. I'll be a little more specific, though. We as humans, the algorithm designers and so on, have been the ones who come up with what the functions are. The functions have parameters, and it's the learning algorithm that figures out the parameters that best fit the data. But at some level, we, the computer scientists, are the ones who decided what the functions were in this library, this universe, of possible functions, and then let the algorithm figure out the nuance, the parameters, and so forth.

Yeah, for sure. And so what we do here, right now in the world of AI, is multi-layer perceptrons, right? Which is this very particular way of implementing AI that does the approximation of these complex functions on its own and can do all of these magical things: it can have conversations with you, it can sort pictures of cats from dogs, whatever you want. Can you talk a little bit about the trade-offs of that? What do we need to do in order to achieve that magic?

Yeah, yeah. So there used to be this car dealership that had a commercial, back in the '90s, so we're dating all of ourselves here.
Yeah, so they used to say, "stack 'em deep and sell 'em cheap," in terms of their cars. So with these multi-layer perceptrons, or feed-forward neural networks, the trend has been just that. There's this one kind of layer. What these layers do is multiply some inputs by some weights, add them up, and then apply a nonlinearity, often something called a ReLU function, a rectified linear unit, which changes the output, right? So you take those layers, put them on top of each other, stack them really deep, and keep doing this, keep doing this, keep doing this. That way you end up with the ability to represent almost any nonlinear function describing the data. You mentioned these universal representation or approximation theorems; you can actually prove that even with not-very-deep neural networks, you're able to represent any function you have in front of you.

Yeah. So it seems like we're at the magic, right? This is how it works. I guess one result of that is that these models are really expensive to make: you need a lot of energy and a lot of chips and a lot of data and a lot of computing time. And so along come Kolmogorov and Arnold. I actually don't know who those people are; I just know them from their representation theorem.

You should know Kolmogorov from a lot of other things.

Yeah? Oh, he's that one. Okay, same guy. All right, yeah, I know that guy. He's the Big Lebowski.

Yeah, no. So tell me about the representation theorem. What does it tell us? What's the big deal?

Yeah, so, as we were just discussing, when you have
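The layer pattern Kush describes, multiply inputs by weights, add them up, apply a ReLU, and stack, can be written out in a few lines. This is a toy sketch with arbitrary made-up weights, not a trained model:

```python
# Minimal feed-forward pass: weighted sum per unit, ReLU nonlinearity,
# layers stacked on top of each other. All weights are arbitrary toys.

def relu(x):
    return max(0.0, x)

def layer(inputs, weights, biases):
    """One dense layer: for each unit, weighted sum of inputs plus bias, then ReLU."""
    return [relu(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def mlp(x, layers):
    for weights, biases in layers:  # "stack 'em deep"
        x = layer(x, weights, biases)
    return x

# Two tiny layers: 2 inputs -> 3 hidden units -> 1 output.
toy_layers = [
    ([[0.5, -1.0], [1.0, 1.0], [-0.5, 0.5]], [0.0, 0.1, 0.0]),
    ([[1.0, 1.0, 2.0]], [0.0]),
]
print(mlp([1.0, 2.0], toy_layers))
```

Stacking more of these layers is exactly what the universal approximation results mentioned above are about: with enough units, the composition can match almost any nonlinear function.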
some function that you're trying to represent, mathematicians have come up with all sorts of ways to decompose that goal, representing this function I have in front of me, in terms of more primitive functions, right? We see it, as an electrical engineer I see it, in Fourier transforms, or Fourier series, which are ways of decomposing a function into sine and cosine functions. In the same way, with the multi-layer perceptrons, it's that particular structure: you're decomposing into these weights and these nonlinearities. What the Kolmogorov-Arnold representation theorem is about is decomposing, again, any function into some other functions, and in this case they happen to be one-dimensional. They could be splines or some other smooth one-dimensional functions, and by combining them in a particular summation, you can again represent any multi-dimensional nonlinear function. That's the proof; that's what Kolmogorov tells us: by taking 1D functions this way, you can represent any multi-dimensional function.

Yeah, which is pretty wild, right? Because what you're telling me is basically that the machine learning models we have now can work this magic: they come up with a magic mathematical formula that can tell pictures of cats from dogs, let's take that example. And what you're saying is that we can take that magic formula and break it down into these tiny, tiny Lego blocks, right? All the way down to what we were just talking about, the "tall people are heavier" single-variable stuff. And the theory, and Kush, you keep me honest,
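For reference, the standard statement of the theorem being paraphrased here (general mathematical background, not a quote from the episode) says that any continuous multivariate function on a bounded domain can be built from one-dimensional functions and addition alone:

```latex
f(x_1, \ldots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```

where every $\Phi_q$ and $\phi_{q,p}$ is a continuous function of a single variable. This is the "1D Lego blocks" structure: the only multivariate operation anywhere in the formula is summation.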
you're the one who actually understands this stuff, is that any, or at least most, very complex formulas can be reduced in this way. I think that's what the theorem is saying. Is that right?

Exactly.

Okay, so we're there. The KAN network, then, is a network that attempts to do that?

Yeah, exactly. So, where in a regular neural network we had weights on the edges multiplying the inputs, here we're applying a function on each edge, the splines or some other winding, nonlinear functions. And then you add in the same way, so the KAN is also adding up the inputs, but there's no additional nonlinearity afterwards, like the ReLU we see in a normal neural network. All the nonlinearity is done before you add things up. So it's just more complication in one place rather than another. And just by making that change, you can reduce the number of parameters, because this more complicated thing on the edges is more able to represent these weird, varied behaviors. That's what it's trying to do.

Nice. So we're there. Deep breath.

Yes.

Two last questions.

Yes.

Does it matter? What's the promise of KAN models, if you can do this? It seems like you've just taken this complex thing and turned it into a bunch of Lego blocks, so from my one-brain-cell standpoint I'm like, isn't that kind of the same thing?

Yeah, so it's true, I mean, you're just shifting the nonlinearities from one place to another. One thing that KANs are able to do a little bit better is interpretability. When you look at those splines, they
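The structural difference Kush describes, a learnable 1D function on each edge with a plain sum afterwards, instead of a scalar weight with a ReLU after the sum, can be sketched with a toy piecewise-linear stand-in for the B-splines the actual KAN paper uses. Everything here, the knot values included, is an arbitrary illustration, not the real implementation:

```python
# KAN-style unit: each edge carries its own 1D function (a toy
# piecewise-linear "spline" here); the unit just sums the edge outputs,
# with no extra nonlinearity after the sum.

def edge_fn(knots, values, x):
    """Piecewise-linear 1D function through the (knot, value) points."""
    if x <= knots[0]:
        return values[0]
    for (k0, v0), (k1, v1) in zip(zip(knots, values),
                                  zip(knots[1:], values[1:])):
        if x <= k1:
            t = (x - k0) / (k1 - k0)   # interpolate within this segment
            return v0 + t * (v1 - v0)
    return values[-1]

def kan_unit(xs, edges):
    """Apply each edge's 1D function to its input, then just add."""
    return sum(edge_fn(knots, values, x)
               for (knots, values), x in zip(edges, xs))

# Two inputs, each with its own little 1D function on the edge.
edges = [
    ([0.0, 1.0, 2.0], [0.0, 1.0, 0.5]),   # bump-shaped edge function
    ([0.0, 1.0, 2.0], [1.0, 0.0, 1.0]),   # valley-shaped edge function
]
print(kan_unit([0.5, 1.5], edges))
```

Contrast with the MLP sketch earlier: there, the learned objects were scalar weights and the nonlinearity came after the sum; here, the learned objects are whole 1D curves and the sum is the last step.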
actually make sense to us. So that height-and-weight sort of relationship, or any of those other things, we can understand better. There's this interpretability method called SHAP, which people have been using for a while; this is like automatic SHAP, without having to run SHAP, in a sense.

And interpretability there, for folks who are maybe not as into the papers, is simply understanding why the model is doing what it's doing?

Yeah, exactly, exactly. So that's one advantage. The disadvantage, though, is that our hardware infrastructure has not been optimized for these sorts of things, for these splines and so forth, whereas the matrix-vector computations for neural networks are very highly optimized, through those H100s and so forth. So that's, I think, the difference. If this catches on, we might develop some hardware for this type of thing as well. And the last point I'll make is that these are not new ideas. This is something that's been around. Even our team, a couple of years ago, developed something called CoFrNets, which uses continued fractions, a third way of representing functions. It also has a universal approximation theorem associated with it. Continued fractions have been known since antiquity; the ancient Indians and the ancient Greeks knew about all this. So all this fancy math is great, and these are just different ways of putting functions together. At the end of the day, they all let you represent these different nonlinear functions. How you train them
might be more or less costly; the interpretability might be more or less easy. So it could turn into something, or it might just be another option. We'll see.

Yeah, for sure. It's fascinating, and that opens up a direction I hadn't really thought of, because the main thing I had heard is, well, you can make much more energy-efficient models, right? But it seems like there are two things you're pointing out. One is that we might be able to understand why these models are making the decisions they do at a much closer level of depth than we have in the past, which seems huge. And the second point is that this is not new stuff: much like neural nets themselves, we're just pulling all this old stuff back out and saying, oh, I guess it works now.

Well, I think that resonates really well with Kush's point about the hardware match. Oftentimes we get success, the field moves, not because something is the mathematically optimal thing, but because it's something that can be done at irrational scale with irrational speed. And, as you say, deep learning was kind of a rebrand of artificial neural networks; it was around for decades before it caught on. Why did it catch on? Not because of some mathematical breakthrough, but because the hardware, GPUs, by accident, happened to be really good at doing this, and that set us on a path. So, obviously, all these new developments are really exciting, and we could potentially build different hardware, but any new idea like this is going to compete against how wonderfully good GPUs are at
doing the basic computations needed for deep learning. So there's an interesting battle there, an interesting set of trade-offs.

Yeah. That relationship between hardware and what's happening on the model side is, I think, one of the most interesting aspects of this. How long does it take for a model to influence hardware? Are we just locked into CUDA for the rest of our lives? These are all very, very interesting questions. So, Marina, any last thoughts before we close up today?

Yeah, an even more general comment, continuing what Chris and David said: representations of data are not created equal. Yes, it's the same information, but when you change the way you represent it, you're able to do things with it that you weren't able to do before. Even with something like a large language model, you're representing data that exists, let's say, on the internet, but you're representing it in such a way that you can access it in a way you couldn't before. Same thing with, for example, KANs versus MLPs: the representation changes, there are going to be trade-offs, but it's always very interesting to try. The fact that we now have more of these options open to us, because the hardware has caught up to the math that has been around for years or decades or centuries, means: try again, try again, try again, and see what new things come up. Data representation is really one of the underlying things driving this current era of AI, so more work in this direction is just going to continue to drive things to an interesting place.

That's great. Well, I can't think of a better note to end on. Kush, MVP, thank you for coming on the show again. And Marina,
David, hope to have you on the show again. And I hope all of you listeners out there join us next week for another episode of Mixture of Experts. Thanks, everyone.

Thanks. Thank you. Appreciate it.