Find and Replace Text with SQL Regular Expressions in Rockset

In our first blog, we used a regular expression to replace the quotes in genres. Afterward, we were able to UNNEST() the JSON object. We’ll be working with the same data set in this blog

In our data:

Embedded content: https://gist.github.com/nfarah86/ef1cc9da88e56226c4c46fd0e3c8e16e

there is a JSON string that’s called spoken_languages, and it’s formatted similarly to genres:

[ { "spoken_languages": "[{'iso_639_1': 'fr', 'name': 'Français'}]" }]

Assuming everything is consistent, we can just write the SQL statement similar to what we wrote for genres— right?

Find and replace text using regular expression in SQL - drums

Wrong ⛔️

We actually get a parsing error:
json parse error on line 0 near `x9akai”}, {“iso_’: unknown escape 120 in string

So, we sort of know what the culprit is here, but we don’t know if there are going to be more parsing errors. Let’s go ahead and debug this, so we can get this SQL statement working!

We can actually use REGEXP_LIKE() to see what exactly is causing the error:

SELECT
    spoken_languages
FROM
    commons.TwtichMovies t
WHERE REGEXP_LIKE(t.spoken_languages,'x9akai')

This is a sample of what we get back:

Find and replace text using regular expression in SQL - sample query

This is great— it looks like the \ are actually causing some issues. We can use REGEXP_REPLACE() to replace those slashes with an empty string:

SELECT (REGEXP_REPLACE('Lietuvi\\x9akai', '\\',''));

Now— to put it all together: How do we UNNEST() spoken_languages and fix the 2 issues at hand (the string format and the double slashes)?

Hint:

 SELCT REGEXP_REPLACE(REGEXP_REPLACE('[{''iso_639_1'': ''lt'', ''name'':''Lietuvi\\x9akai''}]', '''', '"'), '\\', '');

I’m sure you can take it to the finish line from here, but just in case, you can watch this youtube link down below to catch the full replay!

Embedded content: https://youtu.be/8aHgJrQjT4U

TLDR: you can find all the resources you need in the developer corner.