Multimodal AI for Real-Time Fraud Detection

Key Points

  • Banks must decide within ≈200 ms whether a transaction is fraudulent, so they rely heavily on AI to automate this binary judgment.
  • Traditional fraud‑detection models (logistic regression, decision trees, random forests, gradient‑boosting) are trained on large labeled datasets of structured features such as amount, time, location, and merchant category to output a risk score.
  • These models struggle with novel fraud tactics and cannot process unstructured information (free‑form text, descriptions, images), causing many ambiguous cases to be escalated for manual review.
  • An ensemble approach augments predictive ML with encoder‑style large language models (e.g., BERT, RoBERTa) that understand and extract signals from textual data, improving detection of subtle, context‑dependent fraud without generating new content.

Full Transcript

# Multimodal AI for Real-Time Fraud Detection

**Source:** [https://www.youtube.com/watch?v=Mo7JMC_oDlI](https://www.youtube.com/watch?v=Mo7JMC_oDlI)
**Duration:** 00:10:50

## Sections

- [00:00:00](https://www.youtube.com/watch?v=Mo7JMC_oDlI&t=0s) **AI-Powered Real-Time Fraud Screening** - Banks use ultra‑fast AI models, traditionally predictive ML on structured transaction features, to decide fraud in under 200 ms, but emerging multimodal AI aims to catch subtle, novel fraud tactics that evade these conventional approaches.
- [00:03:11](https://www.youtube.com/watch?v=Mo7JMC_oDlI&t=191s) **LLMs vs Predictive Models in Fraud** - The excerpt contrasts encoder LLMs, which read unstructured text like memos to spot scam language and spoofing cues, with traditional predictive ML that relies on fast, low‑cost analysis of structured columns, highlighting each approach's strengths for fraud detection.
- [00:10:00](https://www.youtube.com/watch?v=Mo7JMC_oDlI&t=600s) **On-Chip Multimodel AI Fraud Detection** - The speaker describes how on‑chip AI acceleration enables a multimodel architecture that combines traditional machine learning with large‑language‑model reasoning to detect fraud in milliseconds, directly where the data resides.

## Full Transcript
[0:00] Every payment, transfer, or claim has to pass a single question: is this fraud, yes or no? Before the money moves, we usually have less than 200 milliseconds to decide. And that's why banks lean on AI models. They watch for patterns, learn from history, and make decisions fast. And when an AI model is unsure how to rate a given transaction, fraud or not fraud, the request can be escalated for human evaluation. But multimodel AI is changing that.

[0:30] Exactly right. Most fraud detection platforms today begin with traditional machine learning models, so let's call these predictive ML. We're talking about algorithms like logistic regression, decision trees, and random forests. And don't forget my favorite, gradient boosting machines. Oh, how could we forget those? Now, these models are trained on large labelled datasets of past transactions. Some of those are fraudulent, some are legitimate, and that's so they can recognize patterns that indicate fraud. So, for example, a gradient boosting model might use dozens of features, like transaction amount and time, location, merchant category, and user spending history, in its analysis before outputting a risk score. Yeah, lots of well-defined structured data that the model can process and access, which ultimately gets to an evaluation that answers the question: is this transaction fraud or not fraud?

[1:32] But novel or subtle fraud tactics can evade detection if they don't trigger one of the known indicators. And these models also generally ignore unstructured information like free-form text, descriptions, and images entirely. Cases with any of these attributes, or where the risk assessment is uncertain, typically get escalated for manual review, since the automated system can't conclusively classify them. Which brings us to an ensemble of AI. I was wondering if you'd ever get to that.
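The predictive stage described here can be pictured with a small sketch. This is not a trained model: the feature names, weights, and thresholds below are invented for illustration, standing in for what a gradient-boosting model would learn from labelled transaction data.

```python
# Illustrative stand-in for a trained gradient-boosting risk model.
# Feature names, weights, and thresholds are invented for this sketch.

def predictive_risk_score(txn):
    """Score a transaction from structured features; return (risk, confidence)."""
    score = 0.0
    if txn["amount"] > 5 * txn["avg_spend"]:          # unusually large amount
        score += 0.4
    if txn["km_from_home"] > 1000:                    # far from usual location
        score += 0.3
    if 1 <= txn["hour"] <= 5:                         # odd hour of day
        score += 0.15
    if txn["merchant_category"] in {"wire_transfer", "gift_cards"}:
        score += 0.25                                 # higher-risk merchant type
    score = min(score, 1.0)
    # Confidence is high when the score sits far from the ambiguous midpoint.
    confidence = abs(score - 0.5) * 2
    return score, confidence

legit = {"amount": 40, "avg_spend": 50, "km_from_home": 3,
         "hour": 14, "merchant_category": "grocery"}
print(predictive_risk_score(legit))  # → (0.0, 1.0)
```

A real deployment would replace this with a model trained on labelled transactions; the point is the interface: structured features in, a risk score and a confidence level out.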
[2:03] So ensemble being multiple AI models. Yeah, exactly. A second AI model for fraud detection. So in addition to the predictive ML, this second model uses transformer-based large language models, and these are, yeah, encoder LLMs. Thought you might say that. And so we're clear here on the distinction: a decoder LLM is a generative AI that can generate new content based on a given prompt, so chatbots and things like that. Whereas an encoder LLM is a non-generative LLM that focuses on natural language understanding. It's great for text classification, named entity recognition, and sentiment analysis. Models like BERT and RoBERTa. A lovely couple, aren't they just? And an encoder LLM is a great fit for fraud detection, as these models can grasp nuanced language patterns, detect contextual clues, and extract key information from unstructured data.

[3:11] For example, a bank might use an encoder LLM to read the description of an online funds transfer. If the text says, "Refund for overpayment. Please rush," well, the model might detect urgency and phrasing common in scam scenarios and assign a higher risk score. Or an encoder LLM could analyze the merchant name and free-form address for signs of spoofing or association with known fraud cases, which is something a traditional model might not capture.

[3:36] So, let's compare the two: predictive ML and encoder LLM. Predictive ML loves structured data and numbers. It's well suited to spotting sudden card-not-present spikes, bursts of spending, geolocation jumps, impossible travel scenarios, and things like that. Anything you can measure in neat columns. Or in whatever those are. That's structured data, Jeff. Of course. Now, whereas an encoder LLM, that can read between the lines of unstructured data, so images and the like, and, uh, I guess this is like a mountain vista. Of course. Does he like that better? Much better.
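As a toy illustration of the kind of signal an encoder model extracts from text like the "Refund for overpayment" memo, here is a keyword-based scorer. It is only a stand-in: a real system would use a fine-tuned encoder such as BERT scoring learned embeddings, and the cue list below is invented for the example.

```python
import re

# Stand-in for an encoder LLM (e.g. a fine-tuned BERT classifier).
# Real systems score text with learned representations; this sketch just
# counts hand-picked scam cues to show the shape of the interface.
SCAM_CUES = [
    r"\burgent\b", r"\brush\b", r"\bguaranteed\b",
    r"\b\d{2,4}%\s*roi\b", r"\brefund for overpayment\b",
]

def text_risk_score(description: str) -> float:
    """Return a 0-1 risk score for a free-form transaction description."""
    text = description.lower()
    hits = sum(1 for cue in SCAM_CUES if re.search(cue, text))
    return min(hits / 3, 1.0)   # three or more cues -> maximum risk

print(text_risk_score("Refund for overpayment. Please rush."))  # two cues matched
```

The interface is what matters: unstructured text in, a risk signal out that the decision engine can combine with the structured-feature score.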
[4:17] So, think of a wire memo that says "urgent investment, guaranteed 200% ROI." That's a pretty obvious indication of a scam to humans, and also to an encoder LLM, because both can recognize linguistic patterns associated with fraud that don't fit nicely into a spreadsheet column. In the pros column, predictive ML has certain advantages: for instance, microsecond latency, cheap compute requirements, simple scaling, and an easy-to-follow audit trail. Whereas the pros for encoder LLMs are being, well, context aware, being language savvy, and being surprisingly good at connecting dots, even in situations that humans might miss. A good encoder LLM can greatly reduce false positives because it understands why something looks a bit fishy. In the cons column for predictive ML: it's pattern-bound. A new scam using clever wording slips right through the defenses unless you manually craft a new sort of detection scheme. And then the cons for encoder LLMs: well, they're more computationally intensive than simpler ML models. They have millions or billions of parameters, and they require significant processing, often with GPU acceleration, to run inference.

[5:48] So that raises the question: how do you use these two types of AI models in an ensemble solution? Let's build a multiple model AI fraud detection workflow to answer that. We're going to start at the top here with a box that will represent incoming transaction data. All incoming transactions first go through the predictive model. This model, using ML algorithms like random forests, receives structured data and generates a fraud score, based on probability of fraud, and a confidence level, at which point we assess that confidence level at this stage here. Now, in most instances the model's output is pretty clear-cut.
[6:30] Either the score is well below the risk threshold, so it's likely legitimate, or it's well above it, so it's most likely fraud. And when the model has a high confidence level either way, the transaction is routed straight to the final decision; that's where an action is taken. Either the transaction is auto-approved or it's flagged as fraud. It's the low-confidence, ambiguous transactions that trigger the second stage. When the predictive model returns a score in the borderline range, indicating uncertainty, the system will not immediately decide. Instead, the transaction is escalated to an encoder LLM like BERT for further analysis. The encoder LLM receives the original structured features, but it can also process any unstructured data or contextual information that's available. So that could be, say, the transaction's description text or customer profile notes and the like. The encoder LLM ingests this composite input and compares it with millions of fraud patterns using a deeper, context-aware lens, outputting its own LLM assessment.

[7:41] The final decision engine combines the LLM findings with the original model's input. So a transaction that was borderline might be definitively flagged as fraud because the LLM uncovered incriminating text, or it might be cleared because the LLM found the context to be innocuous. In this architecture, straightforward cases are processed with minimal overhead, while trickier cases get kind of a second look through the AI rather than being immediately handed off to a human evaluator. By not sending everything through the LLM, the system stays efficient; this costly LLM is only run when necessary. And by using the LLM on truly ambiguous cases, the system improves overall accuracy. Fewer legitimate transactions are falsely flagged because the LLM can recognize a benign explanation.
[8:39] And fewer frauds slip through because the LLM can catch subtle cues the first model missed. And this can really save a lot of time and resources. So, let's consider insurance claims processing. When a natural disaster hits, well, lots of claims are all kind of filed at the same time, and insurance agents are probably going to need to put in a bit of overtime to process the high number of claims coming in. And there's probably a bunch of unstructured data in these claims, so images of, uh, property damage and stuff like that. Now, an ensemble-of-AI-models solution using an encoder LLM can look at that unstructured data and extract insights like the cause of the claim and the urgency, and the predictive model can automatically rank and auto-adjudicate incoming claims, together reducing the burden on insurance agents. A bit less overtime for them.

[9:34] But there is one more important piece, and that's the infrastructure. Because running these multiple models in real time, especially something as compute-heavy as encoder LLMs, requires specialized hardware, right? You need a system that can handle low-latency inference at scale, ideally right at the point of transaction. That's where things like AI accelerator chips come in. On-chip AI acceleration supports workloads like this, allowing fraud detection models to run directly where the data lives. So while the models do the detecting, it's the hardware that makes it all possible, especially when you're aiming to catch fraud in milliseconds and, well, not minutes.

[10:18] So that's multimodel AI for fraud detection. As fraudsters devise new tactics, banks and businesses need to respond with smarter detection. And a multiple model AI architecture combines the predictive power of traditional ML with the contextual reasoning of large language models.
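The two-stage workflow the transcript walks through (predictive model first, encoder LLM only for low-confidence cases, then a combined final decision) can be sketched as follows. The stub models, the 0.2/0.8 risk band, the 0.6 confidence cutoff, and the simple score-averaging rule are illustrative assumptions, not the production design.

```python
# Illustrative routing sketch for the two-stage ensemble.
# Both model calls are stubbed out; thresholds are invented for the example.

def predictive_model(txn):
    """Stub for the structured-feature ML stage: returns (risk_score, confidence)."""
    return txn["ml_score"], txn["ml_confidence"]

def encoder_llm(txn):
    """Stub for the encoder-LLM text stage: risk score from free-form text."""
    return 0.9 if "urgent" in txn.get("description", "").lower() else 0.1

def decide(txn, low=0.2, high=0.8, min_conf=0.6):
    score, conf = predictive_model(txn)
    # Clear-cut, high-confidence cases skip the expensive second stage entirely.
    if conf >= min_conf and score <= low:
        return "approve"
    if conf >= min_conf and score >= high:
        return "flag_fraud"
    # Ambiguous cases: escalate to the encoder LLM and combine both scores.
    llm_score = encoder_llm(txn)
    combined = (score + llm_score) / 2
    return "flag_fraud" if combined >= 0.5 else "approve"

print(decide({"ml_score": 0.5, "ml_confidence": 0.3,
              "description": "URGENT wire, please rush"}))  # escalated path
```

Because the second stage only runs inside the borderline branch, the expensive model is invoked just for the ambiguous minority of transactions, which is exactly the efficiency argument made above.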