JavaScript Web Guide

Structuring Data

This guide will cover some of the key concepts in data architecture and best practices for structuring JSON data in your Firebase database.

Best Practices

Building a well-structured NoSQL database requires quite a bit of forethought. Most importantly, we need to understand how the data will be read back later, and how to make that process as easy as possible. In general, it's best to use nested data sparingly and to flatten data where possible.

Use Nested Data Sparingly

Because we can nest data up to 32 levels deep, it's tempting to think that this should be the default structure. However, when we fetch data at a location in our database, we also retrieve all of its child nodes. Therefore, nesting data must be done with careful consideration for how the data will be read later. Consider the following badly nested structure:

When we read a data node in our database, we also retrieve all of its children!
ANTIPATTERN: This is not a recommended practice
{
  // a poorly nested data architecture, because
  // iterating over "rooms" to get a list of names requires
  // potentially downloading hundreds of megabytes of messages
  "rooms": {
    "one": {
      "name": "room alpha",
      "type": "private",
      "messages": {
        "m1": { "sender": "mchen", "message": "foo" },
        "m2": { ... },
        // a very long list of messages
      }
    }
  }
}

With this nested design, iterating over the data becomes problematic. Even a simple operation like listing the names of rooms requires downloading the entire rooms tree, including every message in every room, to the client.
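To make that cost concrete, here is a minimal in-memory sketch in plain JavaScript (no Firebase SDK involved). It assumes, as described above, that a read at a path returns the whole subtree beneath it. The `readPath` and `buildNestedRooms` helpers and the message counts are hypothetical, and `JSON.stringify(...).length` is only a rough stand-in for bytes on the wire.

```javascript
// Hypothetical helper: return the subtree stored at a slash-separated path,
// mimicking how a read at a location also returns all of its children.
function readPath(db, path) {
  let node = db;
  for (const key of path.split("/").filter(Boolean)) {
    if (node === null || typeof node !== "object" ||
        !Object.prototype.hasOwnProperty.call(node, key)) {
      return null;
    }
    node = node[key];
  }
  return node;
}

// Hypothetical builder for the nested "rooms" antipattern, with an assumed
// number of messages stored inside each room.
function buildNestedRooms(roomCount, messagesPerRoom) {
  const rooms = {};
  for (let r = 0; r < roomCount; r++) {
    const messages = {};
    for (let m = 0; m < messagesPerRoom; m++) {
      messages["m" + m] = { sender: "mchen", message: "message number " + m };
    }
    rooms["room" + r] = { name: "room " + r, type: "private", messages };
  }
  return { rooms };
}

const nested = buildNestedRooms(3, 1000);

// Listing room names still means reading /rooms, which drags every
// message along with it.
const roomsSnapshot = readPath(nested, "/rooms");
const roomNames = Object.keys(roomsSnapshot).map((id) => roomsSnapshot[id].name);
const bytesRead = JSON.stringify(roomsSnapshot).length;

console.log(roomNames); // [ 'room 0', 'room 1', 'room 2' ]
console.log(bytesRead); // dominated by the 3000 messages, not by the names
```

The three names are a few dozen characters, while the serialized subtree we had to read to get them runs to well over a hundred thousand.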

Prefer Flattened Data

If the data were instead split into separate paths (i.e., denormalized), it could be downloaded efficiently in segments, as it is needed. Consider this flattened architecture:

{
  // rooms contains only meta info about each room
  // stored under the room's unique ID
  "rooms": {
    "one": {
      "name": "room alpha",
      "type": "private"
    },
    "two": { ... },
    "three": { ... }
  },

  // room members are easily accessible (or restricted)
  // we also store these by room ID
  "members": {
    // we'll talk about indices like this below
    "one": {
      "mchen": true,
      "hmadi": true
    },
    "two": { ... },
    "three": { ... }
  },

  // messages are separate from data we may want to iterate quickly
  // but still easily paginated and queried, and organized by room ID
  "messages": {
    "one": {
      "m1": { "sender": "mchen", "message": "foo" },
      "m2": { ... },
      "m3": { ... }
    },
    "two": { ... },
    "three": { ... }
  }
}

Note how we still have some lightly nested data (e.g., the messages for each room are themselves objects with children), but we've organized our components logically by how they will be iterated and read later. It's now possible to iterate over the list of rooms while downloading only a few bytes per room, quickly fetching the metadata needed to display rooms in a UI.
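The denormalizing step itself can be sketched as a plain function that splits each nested room into the `rooms`, `members`, and `messages` paths, all keyed by the same room ID. This is a hypothetical in-memory helper rather than a Firebase API, and it assumes each nested room may carry optional `members` and `messages` children; the sample message "bar" is made up for illustration.

```javascript
// Hypothetical helper: split nested rooms into three top-level paths that
// share the same room ID, mirroring the flattened layout above.
function flattenRooms(nestedRooms) {
  const flat = { rooms: {}, members: {}, messages: {} };
  for (const [roomId, room] of Object.entries(nestedRooms)) {
    // Pull the potentially large children out; whatever is left is metadata.
    const { messages = {}, members = {}, ...meta } = room;
    flat.rooms[roomId] = meta;        // only name, type, etc.
    flat.members[roomId] = members;   // e.g. { mchen: true }
    flat.messages[roomId] = messages; // the long list lives on its own path
  }
  return flat;
}

const nestedRooms = {
  one: {
    name: "room alpha",
    type: "private",
    members: { mchen: true, hmadi: true },
    messages: {
      m1: { sender: "mchen", message: "foo" },
      m2: { sender: "hmadi", message: "bar" }
    }
  }
};

const flat = flattenRooms(nestedRooms);

// Listing rooms now touches only the metadata under /rooms.
console.log(flat.rooms); // { one: { name: 'room alpha', type: 'private' } }
console.log(Object.keys(flat.messages.one)); // [ 'm1', 'm2' ]
```

Because every path uses the room ID as its key, a client that has the room list can fetch members or messages for one room on demand without touching the others.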

Using Indices to Define Complex Relationships

When building apps, it's often preferable to download a subset of a list. This is particularly common if the list contains thousands of records or more. When this relationship is static and one-directional, we can use queries to grab a subset of the data, or simply nest the entries under a logical grouping, such as each user's name:

{
  "messages": {
    "john": {
      "rec1": "Walk the dog",
      "rec2": "Buy milk",
      "rec3": "Win a gold medal in the Olympics"
    }
  }
}

However, we already know that flattening data is a best practice, so let's see why by examining where this structure begins to break down. If we move to something more dynamic, like shared chat rooms, our data (e.g., lists of rooms, lists of messages) suddenly has two-way relationships.

Users can belong to groups, and groups comprise lists of users. A first attempt at modeling this data structure would probably look like this:

ANTIPATTERN: This is not a recommended practice
// A first attempt at a two-way relationship
{
  "users": {
    "mchen": { "name": "Mary Chen" },
    "brinchen": { "name": "Byambyn Rinchen" },
    "hmadi": { "name": "Hamadi Madi" }
  },
  "groups": {
    "alpha": {
      "name": "Alpha Tango",
      "members": {
        "m1": "mchen",
        "m2": "brinchen",
        "m3": "hmadi"
      }
    },
    "bravo": { ... },
    "charlie": { ... }
  }
}

Great start! But when it comes time to decide which groups a user belongs to, things get complicated. If there are many groups and each can contain thousands of users, we can't just iterate over every group's member list to see whether a user's ID appears in it.

Even worse, security may prevent Mary from reading some groups, or from iterating over the list of groups at all. When we try to fetch the entire list, we'll get an error telling us the operation wasn't allowed, since there is no way to filter the list using security rules.
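Here is roughly what that scan looks like against the first-attempt structure, as a plain in-memory JavaScript sketch. The contents of the "bravo" and "charlie" groups are hypothetical fillers for the elided entries, and `groupsForUserByScan` is an illustrative helper, not a Firebase API. The counter shows that answering "which groups is Mary in?" means reading every group and checking member values one by one, which is exactly the "contains" test the database cannot do for us.

```javascript
// First-attempt layout: members are stored as values under arbitrary keys.
// The bravo and charlie contents are hypothetical.
const firstAttempt = {
  users: {
    mchen: { name: "Mary Chen" },
    brinchen: { name: "Byambyn Rinchen" },
    hmadi: { name: "Hamadi Madi" }
  },
  groups: {
    alpha: { name: "Alpha Tango", members: { m1: "mchen", m2: "brinchen", m3: "hmadi" } },
    bravo: { name: "Bravo", members: { m1: "brinchen" } },
    charlie: { name: "Charlie", members: { m1: "hmadi", m2: "mchen" } }
  }
};

// Scan every group, and every member value inside it, looking for userId.
// This requires read access to all groups and grows with the whole dataset.
function groupsForUserByScan(db, userId) {
  let membersChecked = 0;
  const groups = [];
  for (const [groupId, group] of Object.entries(db.groups)) {
    for (const memberId of Object.values(group.members || {})) {
      membersChecked++;
      if (memberId === userId) {
        groups.push(groupId);
        break; // found in this group, move on to the next one
      }
    }
  }
  return { groups, membersChecked };
}

console.log(groupsForUserByScan(firstAttempt, "mchen"));
// { groups: [ 'alpha', 'charlie' ], membersChecked: 4 }
```

With three tiny groups the scan is cheap; the point is that its cost tracks the total number of groups and members, not the handful of groups Mary actually belongs to.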

What we would like instead is an elegant way to list the groups Mary belongs to and only fetch data for those groups. A highly scalable approach is to use an index, or a list of keys, that refer to Mary's groups:

// Tracking two-way relationships between users and groups
{
  "users": {
    "mchen": {
      "name": "Mary Chen",
      // index Mary's groups in her profile
      "groups": {
        // the value here doesn't matter, just that the key exists
        "alpha": true,
        "charlie": true
      }
    },
    ...
  },
  "groups": {
    "alpha": {
      "name": "Alpha Group",
      "members": {
        "mchen": true,
        "hmadi": true
      }
    },
    ...
  }
}

Didn't we just duplicate some data by storing the relationship under both Mary's record and under the group? Looking closely, we see mchen indexed in group alpha, and also the same relationship under mchen's user record. Doesn't this mean we have to write to both places any time this relationship changes?

Yes. This is a necessary redundancy for two-way relationships. It allows us to quickly and efficiently fetch Mary's memberships, even when the list of users or groups scales into the millions, or when Security and Firebase Rules would prevent access to some of the records.
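One way to keep both sides in sync is to express every membership change as a single set of path writes that always touches both records. The sketch below builds such a map of paths to values and applies it to a plain in-memory object. `membershipUpdate` and `applyUpdate` are hypothetical helpers, and a value of `null` is used to mean "remove this key", matching how Firebase treats writing null to a location.

```javascript
// Build one update describing both sides of a user/group relationship.
// A value of null means "remove this key".
function membershipUpdate(userId, groupId, isMember) {
  const value = isMember ? true : null;
  return {
    ["users/" + userId + "/groups/" + groupId]: value,
    ["groups/" + groupId + "/members/" + userId]: value
  };
}

// Apply a { path: value } map to a plain object. Missing intermediate nodes
// are created on write; deleting a path that does not exist is a no-op.
// Assumes intermediate nodes are objects, as in the layout above.
function applyUpdate(db, update) {
  for (const [path, value] of Object.entries(update)) {
    const keys = path.split("/");
    const last = keys.pop();
    let node = db;
    for (const key of keys) {
      if (node[key] === undefined) {
        if (value === null) { node = undefined; break; } // nothing to delete
        node[key] = {};
      }
      node = node[key];
    }
    if (node === undefined) continue;
    if (value === null) delete node[last];
    else node[last] = value;
  }
  return db;
}

const db = {
  users: { mchen: { name: "Mary Chen" }, hmadi: { name: "Hamadi Madi" } },
  groups: { alpha: { name: "Alpha Group" } }
};

applyUpdate(db, membershipUpdate("mchen", "alpha", true));
applyUpdate(db, membershipUpdate("hmadi", "alpha", true));
applyUpdate(db, membershipUpdate("hmadi", "alpha", false)); // Hamadi leaves

console.log(db.users.mchen.groups);   // { alpha: true }
console.log(db.groups.alpha.members); // { mchen: true }
console.log(db.users.hmadi.groups);   // {}
```

Because one helper produces both paths, no caller can update one side and forget the other. If your SDK version supports multi-location updates, a map shaped like this is what you would pass to a single `update()` call on the root reference so both writes land together; treat that as an assumption to check against your SDK's documentation.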


Why do we invert the data by listing the IDs as keys and setting each value to true? There are a few good reasons for this approach. First, it makes checking for membership very easy, since we can just read /users/mchen/groups/$group_id and see whether it is null.

// see if Mary is in the 'alpha' group
var ref = new Firebase("https://docs-examples.firebaseio.com/web/org/users/mchen/groups/alpha");
ref.once('value', function(snap) {
  var result = snap.val() === null? 'is not' : 'is';
  console.log('Mary ' + result + ' a member of alpha group');
});

Thus, the index is faster and far more efficient than scanning every group. Later, when we talk about securing data, this structure will also be very important: since Security and Firebase Rules cannot do any sort of "contains" check on a list of child nodes, we'll rely on keys like this extensively.

Joining Flattened Data

Flattening data and using indices is great for creating modular, high performance data structures. However, reading this data back may seem non-trivial. Let's start out with a simple join example, building on the same users/groups data we referenced in the sections above, and talk through the concerns.

// List the names of all Mary's groups
var ref = new Firebase("https://docs-examples.firebaseio.com/web/org");

// fetch a list of Mary's groups
ref.child("users/mchen/groups").on('child_added', function(snapshot) {
  // for each group, fetch the name and print it
  var groupKey = snapshot.key();
  ref.child("groups/" + groupKey + "/name").once('value', function(nameSnap) {
    console.log("Mary is a member of this group: " + nameSnap.val());
  });
});

Is it really okay to look up each record individually? Yes. The Firebase protocol uses WebSockets, and the client libraries do a great deal of internal optimization of incoming and outgoing requests. Until we get into tens of thousands of records, this approach is perfectly reasonable. In fact, the time required to download the data (i.e., the byte count) eclipses any concerns about connection overhead.
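The same join pattern can be mirrored against an in-memory copy of the users/groups data to see where the cost comes from. In this sketch `lookup` is a hypothetical stand-in for a single-path read, and the "Bravo Group" and "Charlie Group" records are made-up sample data. The counter shows that the number of reads grows with how many groups Mary belongs to, not with how many groups exist.

```javascript
// Sample data in the indexed layout; bravo and charlie are hypothetical.
const db = {
  users: {
    mchen: { name: "Mary Chen", groups: { alpha: true, charlie: true } }
  },
  groups: {
    alpha: { name: "Alpha Group", members: { mchen: true, hmadi: true } },
    bravo: { name: "Bravo Group", members: { hmadi: true } },
    charlie: { name: "Charlie Group", members: { mchen: true } }
  }
};

let lookups = 0;

// Hypothetical single-path read: returns the value at `path`, or null.
function lookup(path) {
  lookups++;
  let node = db;
  for (const key of path.split("/")) {
    if (node === null || typeof node !== "object" || !(key in node)) return null;
    node = node[key];
  }
  return node;
}

// Join: read the index once, then fetch each referenced group's name.
function groupNamesFor(userId) {
  const index = lookup("users/" + userId + "/groups") || {};
  return Object.keys(index).map((groupKey) => lookup("groups/" + groupKey + "/name"));
}

const names = groupNamesFor("mchen");
console.log(names);   // [ 'Alpha Group', 'Charlie Group' ]
console.log(lookups); // 3: one index read plus one read per group
```

The "bravo" group is never read at all, which is also why this pattern keeps working when rules deny access to groups Mary is not in.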

Okay, but all this does is print data to the console. It's also not fully realtime: each group name is read with once(), so later changes to a name won't be detected, and a group Mary leaves is never removed from the output.
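Keeping a joined view current means reacting to three kinds of change: a key added to Mary's index, a key removed from it, and a group's name changing. With the legacy SDK those would typically be `child_added` and `child_removed` listeners on the index plus an `on('value')` listener on each group's name, detached with `off()` when the group leaves the index. The sketch below models only the bookkeeping behind such a view, with the listener wiring replaced by plain method calls; the `LiveGroupNames` class and its method names are hypothetical.

```javascript
// Hypothetical bookkeeping for a realtime join of Mary's group names.
// Each method corresponds to one listener callback in a real client.
class LiveGroupNames {
  constructor(onRender) {
    this.names = new Map();   // groupKey -> name (null until loaded)
    this.onRender = onRender; // receives the current list after each change
  }

  // index child_added: start tracking the group (a real client would also
  // attach a value listener on groups/<key>/name here)
  groupAdded(groupKey) {
    this.names.set(groupKey, null);
    this.render();
  }

  // index child_removed: stop tracking (and detach that name listener)
  groupRemoved(groupKey) {
    this.names.delete(groupKey);
    this.render();
  }

  // value event on groups/<key>/name: fires initially and on every change
  nameChanged(groupKey, name) {
    if (!this.names.has(groupKey)) return; // ignore events for removed groups
    this.names.set(groupKey, name);
    this.render();
  }

  render() {
    const loaded = [...this.names.values()].filter((n) => n !== null);
    this.onRender(loaded);
  }
}

const renders = [];
const view = new LiveGroupNames((list) => renders.push(list));
view.groupAdded("alpha");
view.nameChanged("alpha", "Alpha Group");
view.groupAdded("charlie");
view.nameChanged("charlie", "Charlie Group");
view.nameChanged("alpha", "Alpha Team"); // a rename shows up
view.groupRemoved("charlie");            // leaving a group removes it

console.log(renders[renders.length - 1]); // [ 'Alpha Team' ]
```

Keeping a `Map` keyed by group ID means a rename updates the entry in place and a removal drops exactly one entry, so the view never has to re-read the whole index.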

