Learning Library


Elasticsearch: Scalable Distributed JSON Database

Key Points

  • Elasticsearch is a distributed, NoSQL JSON‑based datastore that scales automatically and continuously ingests large volumes of data.
  • It is accessed via a RESTful API, allowing you to create indexes, query, and manage data entirely through HTTP calls.
  • Common use cases include aggregating logs, metrics, and application tracing data into searchable JSON documents for real‑time retrieval.
  • Compared to relational databases, Elasticsearch uses “indexes” (instead of databases), “index patterns” or, in earlier versions, “types” (instead of tables), “documents” (instead of rows), and “fields” (instead of columns).
  • This terminology shift reflects its schema‑flexible nature and highlights how Elasticsearch differs from traditional RDBMSs like MySQL or PostgreSQL.
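
As a rough illustration of the REST-based access and the terminology mapping in the points above, the sketch below builds (but does not send) three HTTP requests using only Python's standard library. The base URL `http://localhost:9200`, the index name `app-logs`, the sample document, and the helper `build_request` are assumptions made for illustration; the endpoint shapes (`PUT /<index>`, `POST /<index>/_doc`, `POST /<index>/_search`) follow Elasticsearch's documented REST API.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local cluster on the default port
INDEX = "app-logs"                  # hypothetical index name (roughly a "database")


def build_request(method, path, body=None):
    """Build (but do not send) a JSON HTTP request for Elasticsearch."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


# A document (roughly a "row") made of fields (roughly "columns").
document = {"service": "checkout", "level": "error", "message": "payment timeout"}

requests_to_send = [
    # Create the index.
    build_request("PUT", f"/{INDEX}"),
    # Index one JSON document.
    build_request("POST", f"/{INDEX}/_doc", document),
    # Search the index with a match query (_search also accepts GET).
    build_request("POST", f"/{INDEX}/_search",
                  {"query": {"match": {"message": "timeout"}}}),
]

for req in requests_to_send:
    print(req.get_method(), req.full_url)
    # To actually send one: urllib.request.urlopen(req) against a running cluster.
```

Nothing here touches the network, so it runs without a cluster; against a real one, each request would be passed to `urllib.request.urlopen`.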

Full Transcript

**Source:** [https://www.youtube.com/watch?v=ZP0NmfyfsoM](https://www.youtube.com/watch?v=ZP0NmfyfsoM)

**Duration:** 00:09:54

## Sections

- [00:00:00](https://www.youtube.com/watch?v=ZP0NmfyfsoM&t=0s) **Introducing Elasticsearch: Scalable NoSQL Store** - IBM developer advocate Jamil Spang outlines Elasticsearch as a distributed, JSON‑based NoSQL database, contrasts it with relational systems, highlights its RESTful API, and cites common use cases such as log aggregation, metric collection, and application tracing.

## Full Transcript

0:00 Would you believe me if I told you that there's a database out there that can continuously handle large volumes of information, scale automagically, and stay available to keep taking on data?

0:15 Hello, my name is Jamil Spang, developer advocate with IBM, and today's topic is the answer to that: Elasticsearch.

0:24 All right, it's a great database and data store, and I want to talk a little more about some of its characteristics. We're going to compare it to a relational database management system, and then talk about the ecosystem that comes with it.

0:39 So to get started, let's talk about what Elasticsearch is exactly. First, it is distributed in nature, and it is a NoSQL, JSON-based datastore. We're going to abbreviate that as "DS" there as well.

0:59 On the spectrum of where databases fall, with Postgres and MySQL being the most structured type of databases, I'd put this on the outer sphere, past MongoDB, when it comes to how unstructured and NoSQL it can be.

1:18 When it comes to interacting with Elasticsearch, interestingly enough, it's done through a RESTful API. So all your queries happen that way. You programmatically set up all your indexes, and pretty much anything you need to do to interact with it would be through REST URLs.

1:45 A lot of the major use cases for this look like the following: you can take many different data sources, from logs, to any type of metrics you have from different systems, and maybe even some application trace data that comes in, and you can have one system that combines all of this. Think about data coming from all these different sources, and it being able to push them into JSON documents and then give you the ability to search and get that
information back in real time.

2:23 So it sounds like a big job that it has to do. Let's approach it through our usual comparison with what we know from relational databases, to see how it compares and how the lingo and the context change.

2:39 Well, we know that in relational database management systems they are called databases, and in Elasticsearch these are known as indexes, or I-N-D-I-C-E-S, indices.

3:03 Also, in a relational database we have the term tables. Here, they could be called index patterns, and in some of the earlier versions they were known as types.

3:22 All right, so we know a relational database has many tables. The obvious next ones we're going to look at (I'm going to put both of these down, as we're getting to the bottom of the screen here) are rows and then columns.

3:43 OK, let me get my other marker here. Rows, just like we know from most NoSQL data sources, are going to be documents.

3:57 And normally in a relational database you have tables, rows, and individual columns. The columns are going to be called fields.

4:08 So that's just a quick comparison. If you have a lot of familiarity with relational databases like MySQL or Postgres, this is a way to transition your understanding and see how things map together. When you start planning out your structure, these are things you need to consider: how you can translate that over.

4:27 So we know that it's a JSON-based data store, you're going to interact with it through REST, and it's very powerful: it has the capability to ingest data from many, many data sources and scale out. If I think about the CAP
theorem concepts, I would probably put this on A and P, with availability and partition tolerance already built in, and depending on how you configure it, you could probably achieve some different consistency levels as well. But let's move on to the whole ecosystem.

5:02 You hear the name Elasticsearch out there, but often you will hear the term ELK, the ELK Stack. That's how you will hear it referenced, and I think the easiest way to break down how the stack works is to diagram it out and then talk about each component and the place where it fits. That would be a great way to really help understand this.

5:27 So let's put Elasticsearch (I'm going to abbreviate this as ES) in the center of everything here. The K is for Kibana, and Kibana is a web-based UI. This is how you actually interact with a lot of the data that Elasticsearch prepares and indexes for you to use. You can build your dashboards, and you can build different widgets or visualizations that continuously update as data comes in. So this could really be your main interface for looking at your data as it flows in.

6:29 Now let's talk about the other side. We talked about the output: we have this great data store, Elasticsearch, and we're going to be visualizing things with Kibana, our gateway to view our data and how things are running. Now let's talk about how data gets in, and there are two parts that I would like to talk about here: we have something called Logstash, and you'll also hear about something called Beats.

7:00 So, Logstash. It is an open source, server-side processing pipeline,
and its main job is to take data in from many different sources, transform that data, and then, as we like to say so eloquently, stash it somewhere.

7:35 The inputs can come from a variety of things. You can just put the data in a format, and most of the time you can add SDKs or similar pieces to your code or to different systems, and they push the data into Logstash. Transformations may do some formatting on the data, or some minor structuring before it comes in, if you would like. Then you can output that, to stash it somewhere, and as you can imagine, one of the first output plugins available is Elasticsearch. So let's complete our triangle here: we'll go from Logstash into Elasticsearch, so you can continuously feed things in.

8:16 Now, we mentioned Beats. Unlike the headphones, these Beats are set up to be agents on different servers. Say you have something running serverless, or you have some files you want to collect, or maybe something on Windows Server. It's a complementary component that's very Logstash-like in nature, with plug-ins for many other services, and one of its outputs goes directly into Logstash.

8:46 So collectively you're building this consistent pipeline that keeps going. As you visualize, you can program more things to come in and keep this circular flow going.

9:05 Now, this can scale up to massive amounts of information and nodes; it is already set up to be distributed in nature and to handle a variety of scenarios. But one great thing is that there are containers available, so you can set up this complete infrastructure all
on your laptop to test things out on a much smaller scale and then have it grow to a much larger scale, effectively making it a great component in your architecture for visualizing the data in the data lake that you are building.

9:40 Thank you very much for your time. If you have any questions, please drop us a line below, and if you want to see more videos like this in the future, please like and subscribe.
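
The input, transform, stash flow described for Logstash earlier in the transcript can be pictured with a small Python sketch. This is not Logstash itself or its configuration language: the log line format, the sample lines, the function names, and the `app-logs` index are all hypothetical. The output stage only builds a newline-delimited JSON body of the shape accepted by Elasticsearch's `_bulk` endpoint, without sending it.

```python
import json

# Hypothetical raw input: "<timestamp> <level> <message>" log lines.
RAW_LINES = [
    "2024-01-01T10:00:00Z ERROR payment timeout",
    "",
    "2024-01-01T10:00:05Z INFO order created",
]


def read_input(lines):
    """Input stage: yield raw events, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield line.strip()


def transform(event):
    """Transform stage: give the raw text some minor structure."""
    timestamp, level, message = event.split(" ", 2)
    return {"@timestamp": timestamp, "level": level.lower(), "message": message}


def to_bulk_body(docs, index):
    """Output ("stash") stage: build an NDJSON body for the _bulk endpoint."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # document line
    return "\n".join(lines) + "\n"  # a bulk body must end with a newline


docs = [transform(event) for event in read_input(RAW_LINES)]
bulk_body = to_bulk_body(docs, index="app-logs")
print(bulk_body)
```

In a real deployment, Logstash or Beats would handle these stages through their own plugins; a hand-built body like this one would be posted to `/_bulk` with the `application/x-ndjson` content type.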