From SQL to NoSQL: A Simplified Deep Dive into Databases

Published by

on

data codes through eyeglasses

Welcome to what’s sure to be the first of many breakdowns here on TechBreakdowns.com. Here on Tech Breakdowns we’re going to talk technology, like humans. Today we’re going to get things going with databases.

The goal is to give you a deep dive into the world of databases, hitting all the necessities, but leaving out the fluff. The target audience? Product Managers, Designers, investors, basically anyone that doesn’t have to work with databases on a daily basis.

What is a database? Why Do They Matter?

We have to start at the ground level, with an analogy, what is a database?

Imagine you had a box on your computer. Inside this box, you have pieces of information. Anything: photos from your last vacation, a list of your favorite songs, high scores from your video games, and your last few water bills.

You keep tossing things in this box without any organization at all, it’s a complete mess. How do you find that one water bill that hasn’t been paid yet? Well, this is where a tool like a database would come into play.

Sticking with the analogy of an “everything box,” a database like tool would allow you to:

  1. Create shelves and labels: instead of throwing everything in the box randomly you could have “shelves” (called tables in databases) where similar items are kept together. Each item on the shelf could have “labels” (called records) that would tell you exactly what it is.
  2. Search: you’d be able to search your box pretty quickly by thinking about different characteristics. You can even put frequently accessed items in special places to make them easier to access next time around.
  3. Update your information: in the case of a water bill, you’d want to know when it has been paid. With your database powers, you’d be able to update it to mark it as paid.

Let’s stop talking about boxes and shelves though, we can now pivot to talking more about databases and how they actually operate.

Key Concepts

In this key concepts section, we’re going to talk specifically about the most common type of database, a relational database. We will talk about other types of databases in the next section, but it’s likely that if you’re here to learn about databases, you’re likely trying to understand relational ones.

Relational databases contain tables. You can best think of a table as a sheet in an Excel document. That sheet has lots of cells that are made up of many rows and columns.

Well, imagine you were a super strict Excel user and made a new sheet for each particular type of information. You’d have a “Person” sheet that might contain demographic information, an “Occupation” sheet that might contain information about different occupations, and an “Education” sheet that lists various schools and information about them.

From here on out, it makes more sense to get a bit visual. I’ll try to use some words too, though. Below we have our database. The database has three tables: person, occupation, and education.

An illustration of a SQL database

As you can see, “NFL Player” is an occupation. What’s that funky “ID” column though? Well, “17” in this case is the unique identifier, or “Primary Key” that we can use throughout the database to refer to the “NFL Player” occupation.

Likewise, “Michigan” is a school. We can refer to this school in other tables by using the id of 29.

Illustrating data in a SQL database

This starts to make more sense if we add a Person like “Tom Brady” to our table of people. As you can see he got an ID of 1 as he is the first person we’ve added. He has a name, and we can give him an occupation_id (note, we don’t need to have the “_id” appended, but that’s just a best practice) of 17.

This type of column, the one where we’re referring to a different table, is called a “Foreign Key.” We don’t need to get in-depth there for an overview of databases, but it’s nice to know they exist.

Speaking of exist… why do we do relations like that? And what happens if Tom’s job changes? Well, it did, right? He’s retired now. Do we go to ID of 17 and change the name from “NFL Player” to “Retired?” No. We don’t do that as we might break any other NFL players we have in our dataset that are not already retired. Instead, we update Tom’s record to point to the occupation_id for “Retired.”

Database updated to reflect Tom Brady's new occupation.

It turns out that “Retired” has an ID of 36 in our table, so we can change Tom’s occupation_id to reflect that.

Key Terms so far…

  • Table -> designed to hold a particular type of information.
  • Records -> A row within a table (like Tom Brady).
  • Primary Key -> The unique identifier of a table. Note: a bit complex, but a primary key isn’t required.
  • Foreign Key -> Data that refers to a different table.

Another key concept in databases is data integrity and consistency. If your data is bad, it’s useless, so you want to do all you can to ensure things are done correctly.

Data integrity refers to the accuracy, completeness, and reliability of data throughout its lifecycle. This is typically a more complex domain, and you’d likely find database administrators or devops team members taking care of this.

How? Backups and access controls are the two primary mechanisms. Software engineers can help keep integrity and quality by validating correct data (i.e. you probably want to block “hello” going into a zip code column).

Types of Databases and Their Use Cases

We covered relational databases above. As a quick recap, a relational database is a type of database that organizes data into structured tables. These tables are related to one another based on predefined rules and relationships. Here’s an image to illustrate that:

An illustrated example of a database.

The term “relational” refers to the relationship that can occur between tables (as illustrated above). There are a few different ways that this relationship can be mapped, including:

  1. One-to-one: one record in a table refers to one record in another table
  2. One-to-many: one record in a table relates to multiple records in another table
  3. Many-to-many: multiple records in one table relate to multiple records in another table

Before turning to NoSQL, it’s worth recognizing some of the names in this space. They’re likely familiar, even if this is your first time hearing about databases.

  1. Oracle Database
  2. MySQL
  3. Microsoft SQL Server
  4. PostgreSQL

Alright, it’s time to talk about NoSQL at a high level.

NoSQL

The “No” here does not stand for “No,” it’s actually “Not Only.” “Not only” because it’s a departure from the traditional relational database model.

The key part that’s missing is what’s called a schema. By not having a concrete schema, NoSQL allows for more flexibility, scalability, and support of diverse data models.

This is a good point to pivot and talk about the “schema” that a database traditionally uses.

DDL Definition of a table in SQL

The above is an example of a schema in “data definition language.” This language describes data and its relationships in a database. It’s pretty static. If we had a “Users” table that contained a name, address, and email address, we can’t simply just start throwing a phone number at it, we have to change the schema.

While changing the schema might seem easy in this scenario, it can be a very difficult thing to do in some scenarios. You have to plan it out, write the code, figure out the best time to run it, make a plan to backfill data, it’s a whole ordeal.

Well, NoSQL gets rid of that ordeal by removing the need for a static schema. NoSQL was designed to solve challenges related to scalability and performance that relational databases struggled with, especially in web and cloud applications.

The most common type of NoSQL database (we’ll cover others in “More technical databases”) is the document store. The largest name in this space is MongoDB.

Instead of requiring users to pre-determine (for the most part) what needs to be in the database, MongoDB and other NoSQL solutions support any and all data. Of course, this could get pretty complex for your dev team if you find yourself sending some data some time and different data the rest of the time. You still have to have some form of consistency.

A “document” in NoSQL is essentially a row in a SQL table. As relations don’t play a vital role in the use of NoSQL databases, our Tom Brady example from earlier looks a little different. Instead of having a separate document (or table in the relational world) for occupation, it’ll be right in there. The document itself, “TomBrady” would hold all the necessary information.

_id: kans237489hn3ui4n2,
name: "Tom Brady",
occupation: "NFL Player",

To update the data related to Tom Brady from “NFL Player” to “Retired” one only needs to change the document:

_id: kans237489hn3ui4n2,
name: "Tom Brady",
occupation: "Retired",

This would generally be done in code, and there’s a bevy of helpers and tooling to help you get through it.

Advantages / Disadvantages of SQL and NoSQL

At first glance, one might be inclined to always go the NoSQL right. I mean, you don’t have to define a schema up front! That, however, can be a problem. It’s an even bigger problem the larger you grow.

A NoSQL database is well-suited to big data applications, logging, or other things where performance is not essential. That is a general and broad sweeping statement, and there are likely some MongoDB users out there seething right now. Sure, MongoDB has built tooling to make NoSQL more efficient and usable in regular applications, but it’s still not a worthy substitute for a well thought out relational database.

Relational databases are a mature technology that have been around for decades. There’s a wealth of tooling and knowledge about how to build something that works exceptionally well.

Structuring data across tables allows for more easily maintainable integrity, and an engineer can always lookup what data is in the table with ease. NoSQL on the other hand can get a little bit tricker if you’ve had periods of not sticking to a defined data structure.

More technical databases

I did promise above that we’d talk about some more technical databases, and here we go. These databases lean on NoSQL methodologies and are used in specific situations where they make sense.

Time-Series Databases

These types of databases are generally high-volume. They’re used for monitoring, like keeping an eye on your applications. They’re also used a lot in finance, IoT, and forecasting.

Some examples of time-series databases include:

You’d generally use a time-series database when you’re looking for performance (fast writing) and querying based on time. They’re well-suited to pruning if you only care about something like the last 7 days, and they’re designed to handle lots of data.

Worth noting, the data in time-series databases is immutable. These are not databases that you go back and change. If you want to change data in a time-series database, you’ve probably picked the wrong tool for the job.

In-Memory Databases

In-memory databases store data directly on the system’s RAM rather than on disk. This allows them to be lightning fast with regards to reading and writing.

A word of warning! RAM is volatile, and you should never consider using an in-memory database for anything critical. If there’s a power-outage your data is gone. System crash… gone. Some in-memory database options do offer mechanisms to periodically persist data, but these types of systems shouldn’t be considered replacements for regular relational databases.

You’d generally use an in-memory database for things like real-time analytics, caching, gaming, or in financial services like high-frequency trading where milliseconds matter.

Some popular offerings in the space include Redis, Memcached, and Apache Ignite.

Oh, and these databases are NoSQL too. For example, Redis uses what’s called a key-value pair. It looks a bit like this:

my_key: "the value"

Querying Databases

While I don’t want this to be a deeply technical article, and I definitely don’t want it to be a SQL tutorial, I do think it’s worth showing some of the basics.

While there are subtle differences in SQL databases, the below SQL commands should work on all of them. The NoSQL commands we’ll take a look at are from MongoDB.

Selecting Data

The most basic operation performed on a database, and sometimes you’ll do this even as a non-tech person, is the SELECT. SELECT allows you to query and retrieve data. These queries can be rather basic like this one that gets all the data from the users table:

SELECT * FROM users;

They can also get rather hairy. For example, you could join tables, look for a value across multiple columns (say in sports you’re looking for a team no matter if they were home or away), and then aggregate the data based upon the results of that first query. I won’t show you one of those as it’d distract from the goals of this article.

The * in SQL denotes “all.” You can specify the exact columns of data you want back by replacing * with what you need. For example, lets say we wanted the name, age, and email of our users:

SELECT name, age, email FROM users;

In MongoDB, the above “select all from users” query would look like this:

db.users.find()

You can also specify the exact pieces of data you actually want to retrieve with:

db.users.find({}, { name: 1, age: 1, email: 1, _id: 0 })

The first argument {} is the query condition. An empty object means “match all documents”. The second argument is called the projection, specifying which fields to include (1 means include, 0 means exclude). By default, MongoDB will always return the _id field, so if you don’t want to include it in the results, you need to explicitly exclude it with _id: 0.

Updating Data

Updating data in SQL is rather easy, but you do have to be careful. Here’s a simple command:

UPDATE users SET name = "Aaron Rodgers";

See the problem? Well we just set the name of every single user in our database to “Aaron Rodgers” and that’s not good.

When writing an UPDATE, you need to ensure you narrow the scope:

UPDATE users
SET name = "Aaron Rodgers"
WHERE id = 17

Above we chose id to get specific, but you could do a WHERE on any column.

A pointer here: always do a SELECT with the same WHERE first to ensure you’re only going to UPDATE what you expect.

MongoDB is a little more safer when compared to SQL as it has distinct updateOne or updateMany options. Here’s what our more broad first query would look like:

db.users.updateMany({}, { $set: { name: "Aaron Rodgers" } })

And then narrowing it down to a specific id:

db.users.updateOne({ id: 17 }, { $set: { name: "Aaron Rodgers" } })

Inserting Data

When using INSERT, you need to specify the columns you’re adding data to, and the values you’ll be inserting.

INSERT INTO users (name, age)
VALUES ('Brock Purdy', 23)

You don’t need to provide a value for every column, only those that are required by the schema. If you mess something up, your error message will absolutely lead you in the right direction.

Here’s the same INSERT in MongoDB:

db.users.insertOne({ name: "Brock Purdy", age: 23 })

You might see the insertOne and wonder if there’s an insertMany… there is:

db.users.insertMany([
    { name: "Brock Purdy", age: 23 },
    { name: "Davis Mills", age: 24 }
])

Deleting Data

Finally, DELETE. Much like the UPDATE, be careful with this one.

DELETE FROM users

The above will delete every user your application has. You can narrow things down a bit with:

DELETE FROM users WHERE name = 'Tom Brady'

That will only DELETE where the name is ‘Tom Brady’.

In MongoDB there are equivalents for both of these. First, if you include no arguments on the deleteMany you’ll remove everything:

db.users.deleteMany({})

If you wanted to delete all Tom Brady’s:

db.users.deleteMany({ name: "Tom Brady" })

Or if you wanted to make sure you only deleted one Tom Brady (note it’ll only delete the first one it encounters):

db.users.deleteOne({ name: "Tom Brady" })

That’s A Wrap

Hopefully this primer on databases was worth your while. If it was, do consider subscribing for free:

There will be weekly technical deep dives here on TechBreakdowns.com.

Glossary of Database Terms

We used a lot of terms in this article, here’s a glossary:


Of course! Here’s a glossary based on the key terms and concepts from your article, listed in alphabetical order:

Glossary:

  1. DELETE: A command used to remove data from a database.
  2. Document: A unit of data in NoSQL databases, analogous to a row in relational databases.
  3. Foreign Key: A column in a database table that refers to the primary key of another table, establishing a link between them.
  4. INSERT: A command used to add data to a database.
  5. In-Memory Databases: Databases that store data directly in a system’s RAM, allowing for faster read and write operations.
  6. NoSQL: A type of database that doesn’t solely rely on the traditional relational database schema. It allows for flexibility and scalability.
  7. Primary Key: A unique identifier for a record in a database table.
  8. Records: Rows within a database table that contain specific data.
  9. Relational Databases: Databases that organize data into structured tables. These tables relate to one another based on predefined rules.
  10. Schema: The blueprint or structure of a database, defining how data is organized and how relationships between data are handled.
  11. SELECT: A command used to query and retrieve data from a database.
  12. Table: In relational databases, a structure designed to hold a specific type of information, analogous to a sheet in an Excel document.
  13. Time-Series Databases: Databases optimized for storing time-stamped or sequential data.
  14. UPDATE: A command used to modify existing data in a database.
%d bloggers like this: