Papaparse

BM

August 15th, 2020 · 5 min read

Papa Parse

Overview

With so many ways to parse CSV files, and with real-world data being as inconsistent as it is, have you ever wished for a simple, efficient package to do it for you? Meet Papa Parse, a JavaScript library that bills itself as the fastest in-browser CSV parser. It is your one-stop shop for parsing CSV to JSON!

Highlights

Before getting into the features of Papa Parse, let us look at how we can include this package in our code:

/* babel or ES6 */
import Papa from 'papaparse';

/* node or require js */
const Papa = require('papaparse');

The general syntax of use -

For a CSV string -

var parsedOutput = Papa.parse(stringOfCsv[, config])

There are numerous configuration options to choose from; they are best explained in the Papa Parse documentation.

For a file -

Papa.parse(myFileInput.files[0], {
  complete: function(parsedOutput) {
    console.log(parsedOutput);
  }
});

Because file parsing is asynchronous, a callback must be supplied to collect the results.

For a URL which is provided for the CSV -

Papa.parse(csvUrl, {
  download: true,
  complete: function(parsedOutput) {
    console.log(parsedOutput);
  }
});

The same rule applies when parsing CSVs from URLs: a callback has to be supplied.

The parsed output/result consists of three parts - data array, errors array and the meta object.

The data array has the result of the CSV rows parsed.

The data is always an array of rows. With header: false in the config (the default), each row is itself an array of field values. With header: true, each row is an object keyed by the column field names.

The errors array contains the information on any errors which are encountered while parsing the CSV. The meta object is an object consisting of metadata related to the parsing such as delimiters, line break sequences and field names to name a few.
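To make the two row shapes concrete, here is a minimal sketch that turns either shape into row objects. The helper name rowsAsObjects and the sample result objects are hypothetical and hand-written for illustration; they only mirror the result structure described above and are not produced by calling the library.

```javascript
// Hypothetical helper (not part of Papa Parse): always return row objects,
// whichever `header` setting produced the result.
function rowsAsObjects(result) {
  const rows = result.data;
  if (rows.length === 0) return [];
  // header: true -> rows are already objects keyed by column name.
  if (!Array.isArray(rows[0])) return rows;
  // header: false -> rows are arrays; treat the first row as the header.
  const [fields, ...records] = rows;
  return records.map((record) =>
    Object.fromEntries(fields.map((name, i) => [name, record[i]]))
  );
}

// Hand-written sample shapes for the CSV "name,age\nAda,36".
const withoutHeader = { data: [['name', 'age'], ['Ada', '36']], errors: [], meta: {} };
const withHeader = { data: [{ name: 'Ada', age: '36' }], errors: [], meta: {} };

console.log(rowsAsObjects(withoutHeader));
console.log(rowsAsObjects(withHeader));
```

Both calls log the same list of row objects, which is handy when the rest of your code should not care how the CSV was parsed.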

Auto delimiter detection -

There could be many scenarios in which you wouldn’t be sure of the delimiter used in the CSV. Not to worry! Papa Parse has an auto delimiter detection feature in which the first few rows of the CSV are scanned to automatically figure out the delimiter used in the CSV file.

The delimiter which was considered for parsing can always be checked in the result output’s meta object under the delimiter field.

var output = Papa.parse(stringOfCsv); // input: a,b,c,d,e
console.log(output.meta.delimiter); // delimiter: ,

If you want the detection to choose only from a specific set of delimiters, there’s a config option called delimitersToGuess which takes a list of candidate delimiters. The default value for delimitersToGuess is -

delimitersToGuess : [',', '\t', '|', ';', Papa.RECORD_SEP, Papa.UNIT_SEP]

Where Papa.RECORD_SEP and Papa.UNIT_SEP are read-only properties used to represent the ASCII Code 30 and ASCII Code 31 respectively as delimiters.
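As a rough sketch of how the two pieces fit together, the helper below restricts detection to a list of candidates and reports which delimiter was picked. The name parseWithCandidates is hypothetical, and Papa is passed in as an argument so the snippet does not assume how you load the library.

```javascript
// Hypothetical helper: limit delimiter detection to `candidates` and
// return the parsed rows together with the delimiter that was chosen.
function parseWithCandidates(Papa, csvString, candidates) {
  const result = Papa.parse(csvString, { delimitersToGuess: candidates });
  return { rows: result.data, delimiter: result.meta.delimiter };
}

// Usage, assuming Papa has been imported as shown earlier:
// const out = parseWithCandidates(Papa, 'a;b;c\n1;2;3', [';', '|']);
// console.log(out.delimiter);
```

For the semicolon-separated input in the usage comment, the reported delimiter should be the semicolon, since the pipe never appears in the data.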

Ability to parse huge file inputs -

If the input file is really huge, then Papa Parse has the ability to ‘stream’ the input data and provide the output row-by-row. Doing this avoids loading the whole file into memory, which could otherwise crash the browser. To stream, provide a step function in the config; it is called with the result for each row.

Papa.parse("http://csvexample.com/enormous.csv", {
  download: true,
  step: function(row, parser) {
    console.log("Row:", row.data);
  },
  complete: function() {
    console.log("All done!");
  }
});

The second input to the step function is parser. The parser object can be used to abort, pause or resume the CSV parsing.

parser.abort();
parser.pause();
parser.resume();

Do not use parser.pause() and parser.resume() while using Web Workers in your CSV parsing. We will get to what Web workers are in the next section.
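One common use of the parser handle is to stop early once you have seen enough rows. The sketch below collects the first few rows and then aborts. The helper name takeRows is hypothetical, Papa is passed in as an argument, and the code assumes the complete callback still fires after parser.abort(), so treat it as an illustration rather than a definitive recipe.

```javascript
// Hypothetical helper: collect at most `limit` rows, then stop parsing.
// `input` is a CSV string or a File; `onDone` receives the collected rows.
function takeRows(Papa, input, limit, onDone) {
  const rows = [];
  Papa.parse(input, {
    step(row, parser) {
      rows.push(row.data);
      // Enough rows gathered: ask the parser to stop reading the input.
      if (rows.length >= limit) parser.abort();
    },
    // Assumption: complete is still invoked once parsing has been aborted.
    complete() {
      onDone(rows);
    },
  });
}

// Usage, assuming Papa has been imported as shown earlier:
// takeRows(Papa, myFileInput.files[0], 5, (rows) => console.log(rows));
```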

Multithreading option in Papa Parse -

If you are worried that your webpage will become unresponsive because a CSV parsing script runs for a long time on the main thread, Papa Parse provides a configuration option called worker which, when set to true, ensures that a worker thread is used for parsing the CSV. Using a worker thread might slow the parsing down a little, but it keeps your website responsive.

Papa.parse("http://csvexample.com/enormous.csv", {
  download: true, // needed because the input is a URL, not a CSV string
  worker: true,
  step: function(row) {
    console.log("Row:", row.data);
  },
  complete: function() {
    console.log("All done!");
  }
});

The worker thread is built on the standard Web Worker interface provided by browsers.

Comments in your CSV?

However bizarre it sounds, if there are comments in your CSV which you do not want parsed, you can set the comments config option provided by Papa Parse to the comment string.

Papa.parse("http://csvexample.com/csv.csv", {
  download: true, // needed because the input is a URL, not a CSV string
  comments: "#", // All lines starting with '#' are treated as comments and ignored by the parser.
  complete: function(parsedOutput) {
    console.log(parsedOutput);
  }
});

Type Conversion in Papa Parse -

By default, all lines and fields are parsed as strings. But if you want to preserve the numeric and boolean types, Papa Parse provides an option called dynamicTyping to automatically enable the type conversion for your data.

Papa.parse("http://csvexample.com/csv.csv", {
  download: true, // needed because the input is a URL, not a CSV string
  dynamicTyping: true,
  complete: function(parsedOutput) {
    console.log(parsedOutput);
  }
});

If true, numeric and boolean data will be converted to their type instead of remaining strings. Numeric data must conform to the definition of a decimal literal. Numerical values greater than 2^53 or less than -2^53 will not be converted to numbers, to preserve precision. European-formatted numbers must have commas and dots swapped. It also accepts an object or a function. If it is an object, its values should be booleans indicating whether dynamic typing should be applied for each column number (or header name if using headers). If it is a function, it should return a boolean value for each field number (or name if using headers), which will be passed as the first argument.
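The function form is useful when most columns should be converted but a few must stay as strings, for example ZIP codes that may otherwise lose their leading zeros. The following is a minimal sketch; typingExcept and the column name zip are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: build a dynamicTyping function that converts every
// column except the ones listed in `keepAsString`.
function typingExcept(keepAsString) {
  const skip = new Set(keepAsString);
  // Receives the field name (or field number when not using headers).
  return (field) => !skip.has(field);
}

const typedConfig = {
  header: true,
  dynamicTyping: typingExcept(['zip']),
};

// Usage, assuming Papa has been imported as shown earlier:
// const parsedOutput = Papa.parse(stringOfCsv, typedConfig);
```

The object form expresses the same intent column by column, e.g. dynamicTyping: { age: true, zip: false }.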

Converting JSON to CSV format -

Another wonderful feature of Papa Parse is its ability to convert JSON to CSV. So far, we have only used the parse() function. For this feature, Papa Parse provides the unparse() function.

The output of the unparse() is a neatly formatted string of CSV. The general syntax is -

Papa.unparse(data[, config])

The data field can be an array of objects, an array of arrays or an object with header fields and data. The optional config for unparse(), much like the one for the parse() function has a wide range of options to choose from. You can check them out here.
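The three input shapes can be sketched as plain literals. The sample values and the helper withColumnOrder are hypothetical; the helper simply converts an array of objects into the fields-and-data shape so that the column order is fixed explicitly.

```javascript
// The three shapes described above, using made-up sample values.
const asArrays = [['name', 'age'], ['Ada', 36]];
const asObjects = [{ name: 'Ada', age: 36 }];
const asFieldsAndData = { fields: ['name', 'age'], data: [['Ada', 36]] };

// Hypothetical helper: pin the column order by converting row objects
// into the { fields, data } shape before handing them to Papa.unparse().
function withColumnOrder(objects, fields) {
  return {
    fields,
    data: objects.map((row) => fields.map((name) => row[name])),
  };
}

// Usage, assuming Papa has been imported as shown earlier:
// const csv = Papa.unparse(withColumnOrder(asObjects, ['age', 'name']));
```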

Error Handling -

The last feature we will be discussing in this article is about the error handling by Papa Parse.

As mentioned at the top of the article, the parsed results consist of 3 components: data, errors and meta.

Each entry in the errors array is structured in the following way:

{
  type: "", // A generalization of the error
  code: "", // Standardized error code
  message: "", // Human-readable details
  row: 0, // Row index of parsed data where error is
}

One way of extracting the errors -

var results = Papa.parse(csvString);
results.errors.forEach(function(error) {
  console.log(error.type, error.code, error.message, error.row);
});

Even if you do encounter errors while parsing, that’s no indication that the parsing of the CSV file failed.
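Since errors do not necessarily mean the parse failed, it can help to group them by row and decide afterwards which rows to discard. The sketch below does that; errorsByRow is a hypothetical helper, and the sample results object is hand-written to mirror the error structure shown above rather than produced by the library.

```javascript
// Hypothetical helper: group error messages by the row they refer to.
// Errors without a row index are collected under the key "file".
function errorsByRow(results) {
  const grouped = {};
  for (const err of results.errors) {
    const key = err.row === undefined ? 'file' : String(err.row);
    if (!grouped[key]) grouped[key] = [];
    grouped[key].push(`${err.code}: ${err.message}`);
  }
  return grouped;
}

// Hand-written sample in the documented error shape.
const sampleResults = {
  data: [['a', 'b', 'c'], ['1', '2']],
  errors: [
    { type: 'FieldMismatch', code: 'TooFewFields', message: 'Too few fields', row: 1 },
    { type: 'Delimiter', code: 'UndetectableDelimiter', message: 'Could not detect delimiter' },
  ],
  meta: {},
};

console.log(errorsByRow(sampleResults));
```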

A few useful configs for Parsing

Some notable configs of Papa Parse for parsing which we will just mention here are -

newline - The newline sequence
quoteChar - The character used to quote fields
escapeChar - The character used to escape the quote character within a field
preview - If > 0, only that many rows will be parsed
transformHeader - A function to apply on each header. Requires header:true
chunk - A callback function, similar to step, which activates streaming but is called once per parsed chunk of the file rather than once per row

And many more :)
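A couple of these options combine nicely when you only want a quick look at a file. The sketch below builds a config that parses the first ten rows and tidies up the header names; normalizeHeader, previewConfig and the chosen row limit are illustrative assumptions.

```javascript
// Hypothetical header cleaner: trim, lower-case and replace inner
// whitespace with underscores so keys are easy to use in code.
const normalizeHeader = (header) =>
  header.trim().toLowerCase().replace(/\s+/g, '_');

const previewConfig = {
  header: true, // required for transformHeader
  preview: 10, // only parse the first 10 rows
  transformHeader: normalizeHeader,
};

// Usage, assuming Papa has been imported as shown earlier:
// const sample = Papa.parse(stringOfCsv, previewConfig);
```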

Bonus Utility Functions

Below are some React and Angular helpers for using Papa Parse to parse CSV data -

For React -

import { useEffect, useState } from 'react';
import Papa from 'papaparse';

function useGoogleSheetData(url) {
  const [rows, setRows] = useState([]);
  useEffect(() => {
    // Re-download and re-parse the CSV whenever the URL changes.
    Papa.parse(url, {
      download: true,
      header: true,
      complete: function(results) {
        setRows(results.data);
      }
    });
  }, [url]);
  return rows;
}

and we would use it as:

const rows = useGoogleSheetData("<my_csv_url>");

For Angular -

import { EMPTY, Observable } from 'rxjs';
import { catchError } from 'rxjs/operators';
import { parse } from 'papaparse';

useGoogleSheetData = (url: string): Observable<any> => {
  return new Observable((observer) => {
    parse(url, {
      download: true,
      header: true,
      complete: (result) => {
        observer.next(result);
        observer.complete();
      },
      error: (error) => {
        // An errored observable is already terminated; no complete() needed.
        observer.error(error);
      }
    });
  });
};

Can be used as below:

this.useGoogleSheetData("<my_csv_url>").pipe(catchError((error) => {
  console.error(error);
  return EMPTY; // catchError must return an observable
})).subscribe((data) => {
  this.sheetData = data;
});

Evaluation Metrics

Category | Rating | Deliberations
Ease of use | Good | Extensive cross-platform support, zero dependencies, no specific configuration required, has support for Node and React. A separate package for Angular exists
Community | Average | Frequent updates, but a lot of outstanding issues. Active development
Active usage | Good | More than 650k weekly downloads at the time of writing
Vulnerabilities | Good | No vulnerabilities seen or raised
Docs and Trainings | Good | Very well written documentation. Official docs have detailed explanations of syntax and usage, and contain real-world examples

Conclusion

Looking at the features described above, and the many more Papa Parse has to offer, it is clear that this package is the real deal. Its ability to handle huge files and unstructured data, and its support for readable streams as input (used in Node.js), is what makes it stand out from the rest of the CSV parsing packages.

In addition to the features mentioned above, there are many more features which this package provides. You can check them out here.

Hope you’ve got a good insight into what Papa Parse is all about and how you can use it for your future projects :)

Check out the package and some reading materials

Video review of the package

Video review of the package with interesting use cases and in-depth exploration of the features coming soon! For more related content, check out Unpackaged Reviews.

Disclosures

The content and evaluation scores mentioned in this article/review are subjective and are the personal opinion of the authors at Unpackaged Reviews, based on everyday usage and research on popular developer forums. They do not represent any company’s views and are not impacted by any sponsorships/collaboration.

Header Photo by Marc Sendra Martorell on Unsplash

© 2020-2021 Unpackaged Reviews