
How to Design NoSQL Databases?

By Laurent Mauer · November 15, 2022 · 10 min read

NoSQL data modelling is the practice of designing how data is structured and accessed in a non-relational database, where the data model is chosen to fit each type of data and the queries run against it. Done well, it supports fast queries and helps to manage large data sets more efficiently.

NoSQL data modelling describes a way of modelling data that differs from traditional relational database modelling. A key difference is that it does not rely on tables of rows and columns, but on structures such as documents, key-value pairs, arrays, and hashes.

For many applications, this makes the data easier to use and maintain, and its structure easier to reason about.

Designing NoSQL Databases:

NoSQL databases come in a variety of types based on their data model. The main types are document, key-value, wide-column, and graph.

1. Column-based

Column-oriented databases organize data by column and are based on Google’s BigTable paper. Every column is treated separately, and the values of a single column are stored contiguously.
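
As a rough illustration of what storing a column contiguously means, the sketch below lays out the same three records row by row and column by column using plain Python lists. The field names and values are made up for the example, and real column-oriented stores add column families, compression, and on-disk formats that are not shown here.

```python
# Hypothetical records; the field names are illustrative only.
rows = [
    {"user_id": 1, "country": "FR", "age": 34},
    {"user_id": 2, "country": "DE", "age": 28},
    {"user_id": 3, "country": "FR", "age": 45},
]

# Row-oriented layout: all fields of one record sit next to each other.
row_layout = [value for row in rows for value in row.values()]

# Column-oriented layout: all values of one column sit next to each other.
column_layout = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one column only needs to read that column.
average_age = sum(column_layout["age"]) / len(column_layout["age"])

print(row_layout)
print(column_layout)
print(average_age)
```

Reading only the "age" list is the point of the layout: an analytical query over one column does not have to touch the other fields.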

2. Graph-Based

A graph database stores entities as well as the relationships among those entities. Each entity is stored as a node, and each relationship is stored as an edge that connects two nodes. Every node and edge has a unique identifier.
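
A minimal sketch of that structure, assuming made-up identifiers, labels, and edge types: nodes and edges are kept in two dictionaries keyed by their unique IDs, and a small helper follows edges of one type. A real graph database would index the edges rather than scan them.

```python
# Nodes and edges each carry their own unique identifier (hypothetical data).
nodes = {
    "n1": {"label": "Person", "name": "Alice"},
    "n2": {"label": "Person", "name": "Bob"},
    "n3": {"label": "Company", "name": "Acme"},
}
edges = {
    "e1": {"source": "n1", "target": "n2", "type": "KNOWS"},
    "e2": {"source": "n1", "target": "n3", "type": "WORKS_AT"},
    "e3": {"source": "n2", "target": "n3", "type": "WORKS_AT"},
}

def neighbors(node_id, edge_type):
    """Return names of nodes reached from node_id over edges of edge_type."""
    return [
        nodes[edge["target"]]["name"]
        for edge in edges.values()
        if edge["source"] == node_id and edge["type"] == edge_type
    ]

print(neighbors("n1", "KNOWS"))     # ['Bob']
print(neighbors("n2", "WORKS_AT"))  # ['Acme']
```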

3. Document-Oriented

A document-oriented NoSQL database stores and retrieves data as key-value pairs, but the value part is stored as a document, typically in JSON or XML format. The database understands the structure of the value, so the value can be queried.

4. Key Value Pair Based

Data is stored in key/value pairs. This design is meant to handle large volumes of data and heavy load. Key-value databases store data as a hash table where each key is unique, and the value can be JSON, a BLOB (Binary Large Object), a string, etc.
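
To make the difference between the key-value and document models concrete, here is a small sketch that uses plain dictionaries as stand-ins for both kinds of store. The `kv_get` and `find` helpers and the sample users are hypothetical and do not correspond to the API of any particular database.

```python
import json

# Key-value store: the value is an opaque blob, so the only lookup is by key.
kv_store = {
    "user:1": json.dumps({"name": "Alice", "city": "Paris"}),
    "user:2": json.dumps({"name": "Bob", "city": "Berlin"}),
}

def kv_get(key):
    return kv_store.get(key)

# Document store: the value is a structured document the database understands,
# so it can also be filtered by the fields inside it.
documents = {
    "user:1": {"name": "Alice", "city": "Paris"},
    "user:2": {"name": "Bob", "city": "Berlin"},
}

def find(collection, **criteria):
    """Return documents whose fields match every given criterion."""
    return [
        doc for doc in collection.values()
        if all(doc.get(field) == value for field, value in criteria.items())
    ]

print(kv_get("user:1"))
print(find(documents, city="Berlin"))
```

With the key-value store, finding everyone in Berlin would mean fetching and decoding every value in the application; with the document model, the store itself can evaluate the filter.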

What are NoSQL Design Principles?

NoSQL databases require a different design approach than that used with traditional relational database management systems (RDBMS). The rise in popularity of NoSQL databases paralleled the adoption of agile and DevOps practices. Unlike RDBMSs, NoSQL databases encourage ‘application-first’ or API-first development patterns. Before considering the data models and entities, developers first consider queries that support the functionality specific to an application. This developer-friendly architecture paved the path to the success of the first generation of NoSQL databases.

In contrast, relational databases impose fairly rigid, schema-based structures on data models: tables consisting of columns and rows. Each table typically defines an entity. Each row in a table holds one entry, and each column contains a specific piece of information for that record. The relationships among tables are clearly defined and usually enforced by schemas and database rules.

In the world of relational databases, schemas are usually managed by a centralized team of database administrators, who ensure that data models and data types are consistent across multiple applications. This situation can often introduce friction between administrators and development teams. This friction often translates into very long, non-agile application development lifecycles. Such highly structured data requires normalization to reduce redundancy and improve reliability. The data model is based on the entity being represented; query patterns are a secondary consideration.

NoSQL inverts this approach. Non-relational data models are flexible, and schema management is often delegated to application developers, who are relatively free to adapt data models independently. Such a decentralized approach accelerates development cycles and provides a more agile approach to addressing user requirements. Application developers can easily add properties and attributes, and the changes can be applied to existing data sets. This enables developers to add new features more quickly than if a schema migration were required.
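
The sketch below illustrates the idea of adding an attribute without a schema migration. It assumes a hypothetical "users" collection held in a Python list and an invented "newsletter" field; the point is only that old and new documents can coexist and that the application decides how to read the older ones.

```python
# A hypothetical "users" collection written before the new feature existed.
users = [
    {"_id": 1, "name": "Alice"},
    {"_id": 2, "name": "Bob"},
]

# A new feature needs a "newsletter" flag. New documents simply include it;
# nothing has to be altered up front for the old ones.
users.append({"_id": 3, "name": "Chloe", "newsletter": True})

def wants_newsletter(user):
    # The application decides how to treat documents written before the change.
    return user.get("newsletter", False)

subscribers = [user["name"] for user in users if wants_newsletter(user)]
print(subscribers)
```

The trade-off is that the default for missing fields now lives in application code, so every reader of the collection has to agree on it.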

The first step to understanding NoSQL database design principles is to understand the basic flavors of NoSQL available today. Besides these basic types, NoSQL hybrid databases can combine some of these types.

  • Document Store: Data and metadata are stored hierarchically in JSON-based documents inside the database.
  • Key Value Store: The simplest of the NoSQL databases, data is represented as a collection of key-value pairs.
  • Wide-Column Store: Related data is stored as a set of nested-key/value pairs within a single column.
  • Graph Store: Data is stored in a graph structure as node, edge, and data properties.

How is Data Stored in NoSQL?

NoSQL data storage depends on which type of database you use. Since NoSQL doesn’t require a schema, there is no single blueprint for how data should be stored, and storage therefore varies between databases.

Generally, there are two ways that NoSQL data storage functions:

  • On disk, using B-Trees, with the top levels of the tree kept permanently in RAM.
  • In memory, where all the data lives in RAM using RB-Trees (red-black trees) and anything written to disk is just an append.
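
The second approach can be sketched in a few lines. The class below is a toy, not how any specific database is implemented: a plain dict stands in for the in-memory tree (Python’s standard library has no red-black tree), and the class name, file name, and JSON-lines log format are assumptions made for the example.

```python
import json
import os
import tempfile

class AppendOnlyStore:
    """All reads come from RAM; the file on disk is only ever appended to."""

    def __init__(self, path):
        self.path = path
        self.data = {}  # a plain dict stands in for the in-memory tree
        if os.path.exists(path):
            self._replay()

    def _replay(self):
        # Rebuild the in-memory state by re-applying every logged write.
        with open(self.path, encoding="utf-8") as log:
            for line in log:
                entry = json.loads(line)
                self.data[entry["key"]] = entry["value"]

    def put(self, key, value):
        # Append the write to the log first, then update the copy in RAM.
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

log_path = os.path.join(tempfile.mkdtemp(), "store.log")
store = AppendOnlyStore(log_path)
store.put("a", 1)
store.put("a", 2)  # overwriting appends a second entry; the log keeps both

restarted = AppendOnlyStore(log_path)  # simulates a restart
print(restarted.get("a"))
```

Because the log only grows, a real system would also need some form of compaction, which this sketch leaves out.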

Schema Design for NoSQL

Since NoSQL databases don’t really have a set structure, development and schema design tend to be focused around the physical data model. That means developing for large, horizontally expansive environments, something that NoSQL excels at. Therefore, the specific quirks and problems that come with scalability are at the forefront.

As such, the first step is to define business requirements, as optimizing data access is a must, and can only be achieved by knowing what the business wants to do with the data. Your schema design should complement the workflows tied to your use case.

There are several ways to select the primary key, and ultimately the choice depends on how users access the data. That being said, the data itself might suggest a more efficient schema, especially in terms of how often it is queried.
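
As an illustration of letting the access pattern drive the key, the sketch below assumes the business requirement is “show a customer’s most recent orders”. The order records, field names, and the `recent_orders` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical orders; field names are illustrative.
orders = [
    {"order_id": "o1", "customer_id": "c1", "placed_on": "2022-10-01", "total": 40},
    {"order_id": "o2", "customer_id": "c2", "placed_on": "2022-10-03", "total": 15},
    {"order_id": "o3", "customer_id": "c1", "placed_on": "2022-11-02", "total": 60},
]

# Assumed access pattern: "show a customer's most recent orders".
# Keying by customer_id and keeping each group sorted by date lets that
# query read one group instead of scanning every order.
orders_by_customer = defaultdict(list)
for order in sorted(orders, key=lambda o: o["placed_on"], reverse=True):
    orders_by_customer[order["customer_id"]].append(order)

def recent_orders(customer_id, limit=10):
    return orders_by_customer.get(customer_id, [])[:limit]

print([o["order_id"] for o in recent_orders("c1")])  # ['o3', 'o1']
```

If the dominant query were instead “all orders placed on a given day”, the same data would be better keyed by date, which is why the requirements come first.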

NoSQL Data Modeling Techniques

Below, we briefly discuss the main NoSQL data modeling techniques.

1. Conceptual Techniques

There are three conceptual techniques for NoSQL data modeling:

  • Denormalization. Denormalization is a common technique that copies the same data into multiple documents or tables in order to simplify queries. With denormalization, you can group all the data that a query needs in one place. Of course, the copies increase the total data volume, sometimes considerably.
  • Aggregates. This allows users to form nested entities with complex internal structures, and to vary the structure from one entity to the next. Ultimately, aggregation reduces joins by minimizing one-to-one relationships through nested entities.
    Most NoSQL data models support some form of this soft schema technique. For example, graph and key-value store databases accept values of any format, since those data models place no constraints on values. Similarly, BigTable supports aggregation through columns and column families.
  • Application Side Joins. NoSQL databases don’t usually support joins, since they are query-oriented and joins are typically handled at design time. This contrasts with relational databases, where joins are performed at query execution time. When a join is unavoidable, it has to be done in application code, which tends to result in a performance penalty.
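
The contrast between an application-side join and denormalization can be sketched as follows. The two collections, their field names, and the helper functions are invented for the example; a real application would issue separate queries to the database where this code reads from in-memory structures.

```python
# Two hypothetical collections that reference each other by ID.
authors = {"a1": {"name": "Alice"}, "a2": {"name": "Bob"}}
posts = [
    {"post_id": "p1", "author_id": "a1", "title": "Intro to NoSQL"},
    {"post_id": "p2", "author_id": "a2", "title": "Modeling trees"},
]

def posts_with_authors_joined():
    """Application-side join: the code, not the database, combines the data."""
    return [
        {"title": post["title"], "author_name": authors[post["author_id"]]["name"]}
        for post in posts
    ]

# Denormalization: the author name is copied into each post at write time,
# so reading needs no second lookup, at the cost of duplicated data.
denormalized_posts = [
    {**post, "author_name": authors[post["author_id"]]["name"]} for post in posts
]

def rename_author(author_id, new_name):
    """Every copy must be updated when the duplicated value changes."""
    authors[author_id]["name"] = new_name
    for post in denormalized_posts:
        if post["author_id"] == author_id:
            post["author_name"] = new_name

print(posts_with_authors_joined())
print([post["author_name"] for post in denormalized_posts])
```

The `rename_author` helper shows the price of denormalization: reads get simpler, but a change to duplicated data has to be written to every copy.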

2. General Modeling Techniques

Some of the general techniques for NoSQL data modeling:

  • Enumerable Keys. For the most part, unordered keys are very useful, since entries can be partitioned across several dedicated servers simply by hashing the key. Even so, adding some form of sorting through ordered keys is useful, even though it may add a bit more complexity and a performance hit.
  • Dimensionality Reduction. Geographic information systems tend to use R-Tree indexes, which need to be updated in place and can be expensive to maintain when dealing with large data volumes. Another traditional approach is to flatten the 2D structure into a plain list, as Geohash does.
    With dimensionality reduction, you can map multidimensional data to a simple key-value model or to another non-multidimensional model.
  • Index Table. An index table lets you take advantage of indexes in stores that don’t necessarily support them internally. The aim is to create and then maintain a special table whose keys follow a specific access pattern. For example, alongside a master table that stores user accounts for access by user ID, a query by another attribute, such as city, can be served by an additional table keyed by that attribute.
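
Two of these techniques combine naturally, as the sketch below tries to show: a Geohash-style encoder reduces a latitude/longitude pair to a single string, and an ordered list of those strings supports a prefix range scan. The encoder follows the commonly described Geohash bit-interleaving scheme; the `places` data, the `places_with_prefix` helper, and the use of a sorted Python list as the ordered key space are assumptions made for the example.

```python
import bisect

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # the Geohash alphabet

def geohash_encode(lat, lon, precision=6):
    """Interleave longitude and latitude bits into one base-32 string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars, bits, bit_count = [], 0, 0
    use_lon = True  # Geohash starts with a longitude bit
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)

# Hypothetical places stored under their geohash in an ordered key list.
places = {
    "Eiffel Tower": (48.8584, 2.2945),
    "Louvre": (48.8606, 2.3376),
    "Brandenburg Gate": (52.5163, 13.3777),
}
index = sorted((geohash_encode(lat, lon), name) for name, (lat, lon) in places.items())

def places_with_prefix(prefix):
    """Range scan over the ordered keys: everything that starts with prefix."""
    start = bisect.bisect_left(index, (prefix,))
    end = bisect.bisect_left(index, (prefix + "~",))  # "~" sorts after the alphabet
    return [name for _, name in index[start:end]]

print(geohash_encode(57.64911, 10.40744, precision=3))  # u4p
print(places_with_prefix(geohash_encode(48.8584, 2.2945)[:4]))
```

Points that are close together usually share a key prefix, so a two-dimensional “what is nearby” question becomes a one-dimensional range scan. The caveat is that two neighbors on opposite sides of a cell boundary can have very different prefixes, so real systems typically also check adjacent cells.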

3. Hierarchy Modeling Techniques

Some of the hierarchy modeling techniques for NoSQL data:

  • Tree Aggregation. Tree aggregation essentially models a whole tree as a single document. This can be really efficient for records that are always accessed as a whole, such as a Twitter thread or Reddit post. Of course, the problem then becomes that random access to any individual entry is inefficient.
  • Adjacency Lists. This is a straightforward technique where each node is modeled as an independent record that holds arrays of its direct ancestors or descendants. In other words, it allows you to search nodes by their parents or children. Unlike tree aggregation, though, it is quite inefficient for retrieving an entire subtree for any given node.
  • Materialized Paths. This technique is a form of denormalization and is used to avoid recursive traversals in tree structures. The idea is to attribute all parents or children to each node, so that the predecessors or descendants of a node can be determined without traversal. Materialized paths can be stored as IDs, either as a set or as a single string.
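
The difference between adjacency lists and materialized paths shows up clearly in a small sketch. The category tree, the "parent" and "path" field names, and the "/"-separated path format below are assumptions chosen for the example, not a prescribed layout.

```python
# A hypothetical category tree stored as flat records.
# "parent" is the adjacency-list link; "path" is the materialized path.
nodes = {
    "electronics": {"parent": None, "path": "/electronics"},
    "phones": {"parent": "electronics", "path": "/electronics/phones"},
    "laptops": {"parent": "electronics", "path": "/electronics/laptops"},
    "android": {"parent": "phones", "path": "/electronics/phones/android"},
}

def children(node_id):
    """Adjacency list: direct children are a single filter on 'parent'."""
    return sorted(name for name, node in nodes.items() if node["parent"] == node_id)

def descendants_by_traversal(node_id):
    """Adjacency list: a whole subtree needs one lookup per level."""
    found = []
    for child in children(node_id):
        found.append(child)
        found.extend(descendants_by_traversal(child))
    return found

def descendants_by_path(node_id):
    """Materialized path: a whole subtree is one prefix match, no recursion."""
    prefix = nodes[node_id]["path"] + "/"
    return sorted(name for name, node in nodes.items() if node["path"].startswith(prefix))

def ancestors(node_id):
    """Materialized path: ancestors are read directly from the stored string."""
    return nodes[node_id]["path"].strip("/").split("/")[:-1]

print(children("electronics"))             # ['laptops', 'phones']
print(descendants_by_path("electronics"))  # ['android', 'laptops', 'phones']
print(ancestors("android"))                # ['electronics', 'phones']
```

Both subtree functions return the same nodes, but the traversal version issues one lookup per level, while the path version is a single prefix query. The cost is that moving a node means rewriting the path of every node beneath it.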

Conclusion

NoSQL data modeling techniques are critical when designing a NoSQL database, especially since a lot of programmers aren’t necessarily familiar with the flexibility of NoSQL. The specifics vary since NoSQL isn’t so much a singular language like SQL, but rather a set of philosophies for database management. As such, data modeling techniques, and how they are applied, vary wildly from database to database.

Don’t let that put you off, though. Learning NoSQL data modeling techniques is very helpful, especially when it comes to designing a schema for a DBMS that doesn’t actually require one. More importantly, learn to take advantage of NoSQL’s flexibility: you don’t have to worry as much about the minutiae of schema design as you would with SQL.

At RestApp, we’re building a Data Activation Platform for modern data teams with our large built-in library of connectors to databases, data warehouses, and business apps.

We have designed our next-gen data modeling editor to be intuitive and easy to use.

If you’re interested in starting with connecting all your favorite tools, check out the RestApp website or try it for free with a sample dataset.

Discover the next-gen end-to-end data pipeline platform with our built-in No Code SQL, Python and NoSQL functions. Data modeling has never been easier and safer thanks to the No Code revolution, so you can simply create your data pipelines with drag-and-drop functions and stop wasting your time by coding what can now be done in minutes! 




Laurent Mauer
Laurent is the head of engineering at RestApp. He is a multi-disciplinary engineer with experience across many industries, technologies and responsibilities. Laurent is at the heart of our data platform.
