Cover image credit: Hunter x Hunter manga by Yoshihiro Togashi, meme-ified by yours truly. <3
Ah, good ol' CSV files: spreadsheet/database data rendered as pure text. Beautiful!
Parsing CSV (Comma-Separated Values) files involves reading a sequence of records row by row. Typically (but not necessarily), the first row is interpreted as the header, naming each column. Every row after the first is an individual record, with each comma-separated value belonging to a single column. Because of this, every row of a CSV must have exactly the same number of values as every other row. (OK, I'm realizing that writing about CSVs is very tricky--see below for a visualization!)
The Challenge
In a recent takehome challenge, I was asked to parse a CSV of purchases that included:
- date (as a String)
- item quantity (as an Integer) -- for this example, we'll use candy!
- price (as a Float)
- name (as a String)
Here's the example code we'll be working with (and a reveal about my recent TV-watching):
Raw CSV:
PURCHASE DATE,CANDY PURCHASED,CASH PAID,BUYER NAME
2016-04-03,1000,10000.11,Charlie Kelly
2017-11-14,1000,12000.22,Frank Reynolds
2018-01-20,2000,40000.33,Frank Reynolds
2018-03-20,2000,40000.44,Mac
2019-01-02,2000,50000.55,Sweet Dee
2019-01-02,1500,13500.66,Dennis Reynolds
More readable formatting:
PURCHASE DATE, | CANDY PURCHASED, | CASH PAID, | BUYER NAME
---------------|------------------|------------|-----------------
2016-04-03, | 1000, | 10000.11, | Charlie Kelly
2017-11-14, | 1000, | 12000.22, | Frank Reynolds
2018-01-20, | 2000, | 40000.33, | Frank Reynolds
2018-03-20, | 2000, | 40000.44, | Mac
2019-01-02, | 2000, | 50000.55, | Sweet Dee
2019-01-02, | 1500, | 13500.66, | Dennis Reynolds
Our Tools
We'll use two particular Node.js tools to accomplish our goal:
- The Node.js fs (file system) module, specifically the fs.createReadStream() method
- The npm package csv-parser, which will convert our CSV into JSON
Since the fs module is native to Node.js, no external packages are needed for it. For the csv-parser npm package, go ahead and install it by running $ npm install csv-parser in your terminal.
Let's start off our app.js code by importing both with require:
// app.js
const csvParser = require('csv-parser');
const fs = require('fs');
Importing csv-parser this way lets us call csvParser() as a function and pipe our data through it row-by-row. Now we're ready to dive in!
Reading the CSV File
Let's read some external data!
The CSV data above is currently stored in example_data.csv
, located in the same directory as app.js
. Let's create a variable that holds the filepath:
// app.js
const filepath = './example_data.csv';
Now let's put that fs.createReadStream() method to use:
fs.createReadStream(filepath)
.on('error', (error) => {
// handle error (e.g. the file doesn't exist)
})
.pipe(csvParser())
.on('data', (row) => {
// use row data
})
.on('end', () => {
// handle end of CSV
})
Let's break down each step:
- fs.createReadStream(filepath) initializes the fs.ReadStream object -- since this happens under the hood, we won't interact with fs.ReadStream explicitly.
- .on('error', (error) => {} ) catches errors with the given filepath (e.g. a missing file) before we start trying to pipe in its data.
- .pipe(csvParser()) begins piping data through the parser, and our stream now listens for the next two events:
- .on('data', (row) => {} ) fires once for each line of the CSV, accessible in its callback as row.
- .on('end', () => {} ) fires at the end of the CSV. You can use this event to call any methods you need after the entire file is read.
csv-parser converts CSV data to JSON
Let's take a look at how csv-parser converts each row into usable JSON data.
Remember, the first line of a CSV holds the column headers! In our JSON, those headers show up as the keys of each row object, with their values being the data in the respective fields. This makes accessing the data in each row much simpler.
Here's our complete code:
const csvParser = require('csv-parser');
const fs = require('fs');
const filepath = './example_data.csv';
fs.createReadStream(filepath)
.on('error', (error) => {
// handle error
})
.pipe(csvParser())
.on('data', (row) => {
console.log(row);
})
.on('end', () => {
// handle end of CSV
})
Running $ node app.js
outputs:
// $ node app.js
Row {
'PURCHASE DATE': '2016-04-03',
'CANDY PURCHASED': '1000',
'CASH PAID': '10000.11',
'BUYER NAME': 'Charlie Kelly' }
Row {
'PURCHASE DATE': '2017-11-14',
'CANDY PURCHASED': '1000',
'CASH PAID': '12000.22',
'BUYER NAME': 'Frank Reynolds' }
Row {
'PURCHASE DATE': '2018-01-20',
'CANDY PURCHASED': '2000',
'CASH PAID': '40000.33',
'BUYER NAME': 'Frank Reynolds' }
Row {
'PURCHASE DATE': '2018-03-20',
'CANDY PURCHASED': '2000',
'CASH PAID': '40000.44',
'BUYER NAME': 'Mac' }
Row {
'PURCHASE DATE': '2019-01-02',
'CANDY PURCHASED': '2000',
'CASH PAID': '50000.55',
'BUYER NAME': 'Sweet Dee' }
Row {
'PURCHASE DATE': '2019-01-02',
'CANDY PURCHASED': '1500',
'CASH PAID': '13500.66',
'BUYER NAME': 'Dennis Reynolds' }
A couple things to note:
- csv-parser does some pretty-printing when console.log()-ing the row-JSON. Keep this in mind if you have a challenge where the console output must match a specific format!
- The keys from the first row are stored as Strings, so to access each value, we must use bracket notation: row["PURCHASE DATE"]. If the titles in the first row were single words with no special characters or spaces (i.e. if we had DATE instead of PURCHASE DATE in the first row), they could be accessed with dot notation: row.DATE
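If bracket notation feels clunky, one option is to normalize the headers into dot-friendly keys. Here's a sketch of a helper that lowercases a header and swaps spaces for underscores (csv-parser also accepts a mapHeaders option that can apply a function like this during parsing):

```javascript
// Turn a raw header like 'PURCHASE DATE' into a
// dot-notation-friendly key like 'purchase_date'
function normalizeHeader(header) {
  return header.trim().toLowerCase().replace(/\s+/g, '_');
}

// Rebuild a row object with normalized keys
function normalizeRow(row) {
  const normalized = {};
  for (const [key, value] of Object.entries(row)) {
    normalized[normalizeHeader(key)] = value;
  }
  return normalized;
}

const row = normalizeRow({ 'PURCHASE DATE': '2016-04-03', 'BUYER NAME': 'Charlie Kelly' });
console.log(row.purchase_date); // '2016-04-03' -- dot notation works now
```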
Using the row data
Let's revise our .on('data', (row) => {} )
callback to use the row's data, and print out a sentence describing each candy-purchase:
.pipe(csvParser())
.on('data', (row) => {
let str = `${row["BUYER NAME"]} bought ${row["CANDY PURCHASED"]} pieces of candy on ${row["PURCHASE DATE"]} and paid $${row["CASH PAID"]}.`;
console.log(str);
})
Of course, you could use destructuring on each row object to avoid writing out row["PURCHASE DATE"] and the other verbose key names inside the string...but I'll save that for another article. ;)
Running $ node app.js
now outputs:
// $ node app.js
Charlie Kelly bought 1000 pieces of candy on 2016-04-03 and paid $10000.11.
Frank Reynolds bought 1000 pieces of candy on 2017-11-14 and paid $12000.22.
Frank Reynolds bought 2000 pieces of candy on 2018-01-20 and paid $40000.33.
Mac bought 2000 pieces of candy on 2018-03-20 and paid $40000.44.
Sweet Dee bought 2000 pieces of candy on 2019-01-02 and paid $50000.55.
Dennis Reynolds bought 1500 pieces of candy on 2019-01-02 and paid $13500.66.
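One thing to watch: every value csv-parser hands back is a String, even CANDY PURCHASED and CASH PAID. If the challenge also asks for totals, you'd convert the types as each row arrives -- here's a sketch using a couple of the rows above as plain objects:

```javascript
// The same shape the 'data' callbacks receive (all values are Strings)
const rows = [
  { 'CANDY PURCHASED': '1000', 'CASH PAID': '10000.11' },
  { 'CANDY PURCHASED': '1500', 'CASH PAID': '13500.66' },
];

let totalCandy = 0;
let totalCash = 0;

for (const row of rows) {
  // parseInt/parseFloat convert the String fields to Numbers
  totalCandy += parseInt(row['CANDY PURCHASED'], 10);
  totalCash += parseFloat(row['CASH PAID']);
}

console.log(`${totalCandy} pieces of candy for $${totalCash.toFixed(2)}`);
// 2500 pieces of candy for $23500.77
```

In the real pipeline, this accumulation would happen inside the 'data' callback, with the final totals reported in the 'end' callback.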
Voila! CSV parsed!
Conclusion
Working with CSVs in Node.js is straightforward with the csv-parser package. However, there are plenty of other ways to approach this, including numerous other npm packages (surprise surprise), as well as using the fs module on its own.
This particular code snippet from Dev.to user ItsASine does a great job of illustrating how to use fs alone -- long story short, use fs.createReadStream() to return a Promise that resolves to a utf-8 encoded String. However, you MUST handle the first-row headers and newline characters manually!
Have any other strategies to recommend? Please feel free to share and comment below!