Working with JSON and PostgreSQL

Recently, I needed to scrape some JSON documents and do some pre-processing on the data. I had to link documents together and query the data in order to generate an input file for another process. I thought about using MongoDB to help with some of the pre-processing. However, I hadn’t used MongoDB in over a year, and the thought of brushing up on it for a one-off task was not super exciting. Jumping from technology to technology is no longer exciting to me, and it makes me slower. I have been working a lot with PostgreSQL lately, and I figured this would be a good opportunity to look at what it can do with JSON (and stick with Postgres). I was pleasantly surprised.

PostgreSQL and the json and jsonb Types

With PostgreSQL version 9.4, there are now two JSON-related data types: json and jsonb. According to the Postgres documentation, the difference between the two comes down to efficiency. The json type is quick on inserts because the data is stored as-is, with no pre-processing. The jsonb type is a bit slower on inserts because the data is converted to a decomposed binary format as it is inserted, which makes subsequent processing faster. jsonb (newly added in 9.4) also brings the ability to index JSON document elements. There are other differences between the json and jsonb types relating to the preservation of whitespace, key order and duplicate keys. The PostgreSQL documentation suggests that most applications should use the jsonb type.
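These differences are easy to see by casting the same literal to both types. A small sketch (not part of the original scripts): json preserves the input verbatim, while jsonb normalizes whitespace, reorders keys and keeps only the last duplicate key.

```sql
-- json stores the text as-is; jsonb stores a decomposed binary form
SELECT '{"b":2,   "a":1, "a":3}'::json  AS as_json,
       '{"b":2,   "a":1, "a":3}'::jsonb AS as_jsonb;

-- as_json:  {"b":2,   "a":1, "a":3}   (verbatim, duplicate key kept)
-- as_jsonb: {"a": 3, "b": 2}          (normalized, last duplicate wins)
```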

Initial Exploration

In this post, I’m going to explore how PostgreSQL deals with JSON documents. To do that, I’ll need to create some database tables. The SQL scripts below create a car and a car_lot table. Notice the INTEGER and JSONB types in the table definitions.

The script to create and populate the car table:

DROP TABLE IF EXISTS car
;

CREATE TABLE car (id INTEGER, data JSONB)
;

INSERT INTO car 
VALUES ( 
 1, '{
       "CarTypeId" : 225, 
       "Description" : "Ford-Pinto", 
       "Recalls" : [1967, 2001] 
     }'
)
;

INSERT INTO car 
VALUES ( 
 2, '{
       "CarTypeId" : 228, 
       "Description" : "VW-Rabbit"
     }'
)
;

INSERT INTO car 
VALUES ( 
 3, '{
       "CarTypeId" : 230, 
       "Description" : "VW-Golf"
     }'
)
;

INSERT INTO car 
VALUES ( 
 4, '{
       "CarTypeId" : 445, 
       "Description" : "Ford-Mustang", 
       "Recalls" : [2009] 
     }'
)
;

Notice in the insert script that some cars have a Recalls array and some don’t. Here’s what the data looks like in the database:
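As a quick sketch (this query isn’t in the original scripts), the jsonb ? operator makes it easy to find the documents that actually contain a Recalls key:

```sql
-- Which cars have at least one recall on record?
SELECT id, data->>'Description' AS car_type
FROM car
WHERE data ? 'Recalls';

-- Returns the Ford-Pinto (id 1) and Ford-Mustang (id 4) rows
```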

The script to create and populate the car_lot table:

DROP TABLE IF EXISTS car_lot
;

CREATE TABLE car_lot (id INTEGER, data JSONB)
;

INSERT INTO car_lot 
VALUES(
  1, '{
        "LotName" : "Car Lot A" ,
        "AvailableCarTypeId" : [225, 228] 
      }'
)
;

INSERT INTO car_lot 
VALUES(
  2, '{
        "LotName" : "Car Lot B" ,
        "AvailableCarTypeId" : [445, 230] 
      }'
)
;

We have two car lots, and each lot has two car types in its inventory. Here’s what the data looks like in the database:
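Because the tables use jsonb, the documents can also be indexed and queried with containment operators. A hedged sketch (the index and query below aren’t part of the original scripts):

```sql
-- A GIN index on the jsonb column speeds up containment (@>) queries
CREATE INDEX idx_car_lot_data ON car_lot USING GIN (data);

-- Find the lot(s) whose AvailableCarTypeId array contains 445
SELECT data->>'LotName' AS car_lot
FROM car_lot
WHERE data->'AvailableCarTypeId' @> '445';
```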

JSON Functions and Operators

I’ll create a few views to illustrate some of the neat JSON manipulation features. I was impressed with the built-in functions Postgres provides for manipulating JSON documents. For example, the jsonb_array_elements() function expands a JSON array (inside a document) into a set of JSON values. The script below creates a view over the JSON in the car table. The view shows the CarTypeId, Description and Recalls for all relevant documents. The fields are renamed in the view just for fun. If you’ve used views before, you’ll notice that parsing the JSON document and extracting the array values inside the Recalls element is fairly easy.

DROP VIEW IF EXISTS v_recalls
;

CREATE VIEW 
   v_recalls 
AS 
SELECT 
   data->>'CarTypeId' as car_type_id,
   data->>'Description' as car_type, 
   jsonb_array_elements(data->'Recalls') AS recall_year 
FROM 
  car
;

Here’s the result of selecting everything on the v_recalls view:

It’s a tabular / denormalized view of the JSON document data showing all car types and recall years.
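Since the screenshot isn’t reproduced here, this is roughly what the result should look like (row order may vary; cars without a Recalls key contribute no rows, because jsonb_array_elements produces an empty set for them):

```sql
SELECT * FROM v_recalls;

--  car_type_id |   car_type   | recall_year
-- -------------+--------------+-------------
--  225         | Ford-Pinto   | 1967
--  225         | Ford-Pinto   | 2001
--  445         | Ford-Mustang | 2009
```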

Joins

I was also impressed with how easy it is to join two tables on document elements. This next view joins the car and car_lot tables on the CarTypeId values inside their documents. The query is a basic join that uses JSON document elements from the car and car_lot tables to produce the result. Looking at the view creation script below, querying these documents feels familiar. If you’ve used SQL before, the extra JSON functionality in PostgreSQL doesn’t get in the way.

DROP VIEW IF EXISTS v_inventory
;

CREATE VIEW 
  v_inventory
AS
SELECT 
  tmp_car_lots.car_lot,
  tmp_cars.car_type_id,
  tmp_cars.car_type
FROM (
  SELECT 
     data->>'CarTypeId' as car_type_id,
     data->>'Description' as car_type
  FROM 
    car
  ) as tmp_cars
JOIN 
  (
  SELECT 
   jsonb_array_elements(data->'AvailableCarTypeId') AS car_type_id,
   data->>'LotName' as car_lot
  FROM 
    car_lot
  ) as tmp_car_lots
ON 
  tmp_cars.car_type_id::text = tmp_car_lots.car_type_id::text
;

Here’s the result of selecting everything on the v_inventory view:
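Again, with the screenshot missing, this is roughly what the result should look like (row order may vary):

```sql
SELECT * FROM v_inventory;

--   car_lot   | car_type_id |   car_type
-- ------------+-------------+--------------
--  Car Lot A  | 225         | Ford-Pinto
--  Car Lot A  | 228         | VW-Rabbit
--  Car Lot B  | 230         | VW-Golf
--  Car Lot B  | 445         | Ford-Mustang
```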

Final Thoughts

PostgreSQL’s JSON data type support seems to strike the right balance between old and new technologies by leveraging existing relational database concepts and introducing new capabilities relating to JSON documents. The JSON functionality in Postgres is an extension of the existing model. Queries use the same syntax and run in the same transactional environment.

I just took a quick peek at PostgreSQL and JSON; apparently you can even write PostgreSQL stored procedures in JavaScript by using an extension. I don’t know if I will ever use this, but it’s nice to know the capability exists. Will PostgreSQL take the place of MongoDB in the wild? I’m not sure, but for quick and dirty data pre-processing tasks that involve JSON, I think I may reach for PostgreSQL first.
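For the curious, the extension in question is PLV8. A minimal, untested sketch of what a JavaScript stored procedure might look like (assuming the plv8 extension is installed; depending on the plv8 version, a jsonb argument may arrive as a parsed object or as a string):

```sql
CREATE EXTENSION IF NOT EXISTS plv8;

-- A trivial function written in JavaScript instead of PL/pgSQL
CREATE FUNCTION recall_count(doc jsonb) RETURNS int AS $$
  var recalls = doc.Recalls || [];
  return recalls.length;
$$ LANGUAGE plv8;

SELECT id, recall_count(data) FROM car;
```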
