String split while ignoring delimiters inside quotes?

Questions about the LÖVE API, installing LÖVE and other support related questions go here.
Jasoco
Inner party member
Posts: 3725
Joined: Mon Jun 22, 2009 9:35 am
Location: Pennsylvania, USA


Post by Jasoco »

How would I split a string of text while ignoring delimiters that are contained between certain characters? For example, my delimiter is a comma and I want it to ignore commas that are inside a string surrounded by quotes.

Say this is the string itself complete with quotation marks:

"This is a text message, it contains a comma", Answer #1, "Answer #2"

I want to split it into the three strings of text. BUT, note that Answer #1 does not have quotes, and quotes are not required in all cases. What I want it to turn into is a table with three entries:

"This is a text message, it contains a comma"
Answer #1
"Answer #2"


If I just use string.split, it's going to turn the first part into two parts:

"This is a text message,
it contains a comma"


Which is not what I'd want. Surely there's a way to do this using regex or something?
s-ol
Party member
Posts: 1077
Joined: Mon Sep 15, 2014 7:41 pm
Location: Cologne, Germany

Re: String split while ignoring delimiters inside quotes?

Post by s-ol »

For 3 parts this regex should do: (untested)

([^"][^,]*,|"[^"]*",)([^"][^,]*,|"[^"]*",)([^"][^,]*|"[^"]*")

Note that it does capture the quotes if any are used. It's also ugly and could probably be improved in readability and failure handling.
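Lua patterns have no `|` alternation, so the regex above cannot be passed to `string.match` as-is. One way to get the same effect, sketched here under the assumption that a field is either fully double-quoted or contains no commas, is to try the two alternatives in order at each position. `splitFields` is a hypothetical helper name; quotes are kept in the output, as in the regex, and trailing spaces of unquoted fields are kept.

```lua
-- Hypothetical helper: emulates the regex alternation ("..." | [^,]*)
-- by trying a quoted-field pattern first, then an unquoted-field pattern.
local function splitFields(s)
  local fields = {}
  local pos = 1
  while true do
    -- Alternative 1: optional spaces, a double-quoted field, optional spaces.
    -- The trailing () captures the position right after the match.
    local field, nextPos = s:match('^%s*("[^"]*")%s*()', pos)
    if not field then
      -- Alternative 2: optional spaces, then everything up to the next comma
      field, nextPos = s:match('^%s*([^,]*)()', pos)
    end
    fields[#fields + 1] = field
    -- A comma means another field follows; anything else ends the scan
    if s:sub(nextPos, nextPos) == ',' then
      pos = nextPos + 1
    else
      break
    end
  end
  return fields
end
```

On malformed input such as `"a"b, c` the scan simply stops after the first field, so a real version would probably want to report an error there.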

Ref
Party member
Posts: 702
Joined: Wed May 02, 2012 11:05 pm

Re: String split while ignoring delimiters inside quotes?

Post by Ref »

I would just strip out any commas (string.gsub( text, ',', '' )) and then use split to return a table.
My 2 cents.
Jasoco
Inner party member
Posts: 3725
Joined: Mon Jun 22, 2009 9:35 am
Location: Pennsylvania, USA

Re: String split while ignoring delimiters inside quotes?

Post by Jasoco »

Maybe I could modify my current string:split() function?


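-- Splits self on every occurrence of delimiter and returns the pieces in a table.
-- Note: string.find treats delimiter as a Lua pattern, so characters such as
-- . % - + * ? [ ( ) must be escaped with % to be matched literally.
function string:split(delimiter)
	local result = { }
	local from = 1
	local delim_from, delim_to = string.find( self, delimiter, from )
	while delim_from do
		-- Everything between the previous delimiter and this one is one field
		table.insert( result, string.sub( self, from , delim_from-1 ) )
		from = delim_to + 1
		delim_from, delim_to = string.find( self, delimiter, from )
	end
	-- Whatever follows the last delimiter is the final field
	table.insert( result, string.sub( self, from ) )
	return result
end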
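A quote-aware version of the function above could scan character by character and ignore the delimiter while inside double quotes. This is a sketch under stated assumptions: the delimiter is a single plain character, quotes cannot be escaped, and `splitQuoted` is a hypothetical name so the original `string:split` stays untouched.

```lua
-- Hypothetical quote-aware variant of string:split.
-- Assumes a single-character delimiter and no escaped quotes.
-- Quotes are kept in the output, as in the example from the question.
function string:splitQuoted(delimiter)
	local result = { }
	local from = 1            -- start index of the current field
	local inQuotes = false    -- true while between an opening and closing "
	for i = 1, #self do
		local c = self:sub( i, i )
		if c == '"' then
			inQuotes = not inQuotes
		elseif c == delimiter and not inQuotes then
			table.insert( result, self:sub( from, i - 1 ) )
			from = i + 1
		end
	end
	table.insert( result, self:sub( from ) )  -- the last field
	return result
end
```

Leading spaces after each comma are preserved (`" Answer #1"`); if they are unwanted, each field can be trimmed with something like `field:match('^%s*(.-)%s*$')`.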
davisdude
Party member
Posts: 1154
Joined: Sun Apr 28, 2013 3:29 am
Location: North Carolina

Re: String split while ignoring delimiters inside quotes?

Post by davisdude »

S0lll0s wrote:For 3 parts this regex should do: (untested)

([^"][^,]*,|"[^"]*",)([^"][^,]*,|"[^"]*",)([^"][^,]*|"[^"]*")

Note that it does capture the quotes if any are used. Also it's ugly and could be improved for readability and failure handling probably.
That wouldn't work, since Lua uses its own pattern syntax rather than full regular expressions (there is no | alternation, for one).

Here's what I've got working:


local str = '"asdf", abd, ", this is a test, This is a text message, it contains a comma", Answer #1, "Answer #2", asdf'

local function separate( str )
	local tab = {}   -- results; created per call so repeated calls don't accumulate
	local tempstr    -- holds the pieces of a quoted field that is still open
	-- Visit every run of non-comma characters (empty fields are skipped)
	for piece in str:gmatch( '[^,]+' ) do
		-- Count the double quotes in this piece
		local occurrences = select( 2, piece:gsub( '"', '' ) )
		if occurrences % 2 == 0 then
			-- Balanced quotes: a complete field, or the middle of an open quoted field
			if not tempstr then
				table.insert( tab, ( piece:gsub( '^%s*', '' ) ) )  -- trim leading whitespace
			else
				tempstr = tempstr .. ',' .. piece
			end
		else
			-- Odd quote count: this piece opens or closes a quoted field
			if not tempstr then
				tempstr = ( piece:gsub( '^%s*', '' ) )
			else
				table.insert( tab, tempstr .. ',' .. piece )
				tempstr = nil
			end
		end
	end
	return tab
end

local results = separate( str )
for i = 1, #results do print( results[i] ) end
This could be expanded to include single quotes, but I don't really want to. It also doesn't handle unterminated quotes or anything like that.
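For reference, both of those extensions fit a character scanner fairly naturally. The sketch below is not davisdude's method; it is a different, hypothetical helper (`splitQuotedStrict`) that accepts either `"` or `'` as a quote and returns `nil` plus a message when a quote is never closed. One caveat of this assumption: an apostrophe inside an unquoted field (as in `don't`) would be treated as an opening quote.

```lua
-- Hypothetical scanner: ignores the delimiter inside "..." or '...',
-- and reports unterminated quotes. Escaped quotes are not supported.
local function splitQuotedStrict(str, delimiter)
	local result = { }
	local from = 1
	local openQuote = nil   -- the quote character we are currently inside, if any
	for i = 1, #str do
		local c = str:sub( i, i )
		if openQuote then
			if c == openQuote then openQuote = nil end  -- matching closing quote
		elseif c == '"' or c == "'" then
			openQuote = c                               -- opening quote
		elseif c == delimiter then
			table.insert( result, str:sub( from, i - 1 ) )
			from = i + 1
		end
	end
	if openQuote then
		return nil, "unterminated " .. openQuote .. " quote"
	end
	table.insert( result, str:sub( from ) )
	return result
end
```

Usage: `local fields, err = splitQuotedStrict(line, ',')`, then check `err` before using `fields`.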
deströyer
Prole
Posts: 32
Joined: Thu Jun 27, 2013 7:59 pm

Re: String split while ignoring delimiters inside quotes?

Post by deströyer »

As mentioned, Lua doesn't support full regular expressions (according to Programming in Lua, a typical POSIX regex implementation is larger than all of Lua's standard libraries combined). Depending on how you will be processing the string data, you can use @davisdude's suggestion if you want to preprocess into a table, or you can write a custom iterator using a closure if you want to iterate directly through the string in a for-loop, for example.
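A closure-based iterator along those lines might look like the sketch below. `quotedFields` is a hypothetical name; it assumes a single plain delimiter character and unescaped double quotes, and yields one field per loop iteration without building a table first.

```lua
-- Hypothetical iterator factory: returns a closure that yields the next
-- field on each call, skipping delimiters that appear inside "...".
local function quotedFields(str, delimiter)
	local pos = 1        -- start of the next field (kept in the closure)
	local done = false
	return function()
		if done then return nil end   -- nil ends the generic for-loop
		local inQuotes = false
		for i = pos, #str do
			local c = str:sub( i, i )
			if c == '"' then
				inQuotes = not inQuotes
			elseif c == delimiter and not inQuotes then
				local field = str:sub( pos, i - 1 )
				pos = i + 1
				return field
			end
		end
		done = true
		return str:sub( pos )         -- the last field
	end
end

for field in quotedFields('"This is a text message, it contains a comma", Answer #1, "Answer #2"', ',') do
	print( field )
end
```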

Have you considered using a different delimiter (e.g. tab, semicolon, double semicolon, pipe)? This should be possible if you control the incoming data (e.g. when making game data). Using a delimiter that can also appear inside the data is generally bad practice.

You can use string.gmatch in your string:split() function to save some lines. This also supports multi-character delimiters (e.g. ";;") without any changes or while-loops.


SplitString = function( stringData, delimiter )
	local result = { }
	local stringData = stringData .. delimiter -- to get the last entry
	-- delimiter is spliced into a Lua pattern, so magic characters (. % - + * ? etc.) need escaping
	for slice in string.gmatch( stringData, "(.-)" .. delimiter ) do
		table.insert( result, slice )
	end
	return result
end
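Because the delimiter above becomes part of a pattern, a delimiter such as `"."` would match any character. A hedged variant that escapes every punctuation character first, so any literal delimiter (for example `"||"` or `"."`) works, could look like this; `SplitStringPlain` is a hypothetical name.

```lua
-- Variant of SplitString that treats the delimiter literally by escaping
-- every punctuation character (e.g. "." becomes "%.") before building the pattern.
local function SplitStringPlain( stringData, delimiter )
	local escaped = delimiter:gsub( "%p", "%%%0" )  -- "%%%0" = a literal % plus the matched char
	local result = { }
	-- Append the delimiter so the last entry is also followed by one
	for slice in ( stringData .. delimiter ):gmatch( "(.-)" .. escaped ) do
		table.insert( result, slice )
	end
	return result
end
```

With `"||"` as the delimiter, commas inside the data no longer need any special handling.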
Jasoco
Inner party member
Posts: 3725
Joined: Mon Jun 22, 2009 9:35 am
Location: Pennsylvania, USA

Re: String split while ignoring delimiters inside quotes?

Post by Jasoco »

Yeah, I guess I could just use two pipes, since it's rare for any normal string to contain even one. (What is a pipe even for, anyway?) Actually, I am currently using pipes, since it seems to be one of those characters no one really uses. I was just wondering if there was a way to do it this way.

Edit: Looking up the history of the pipe character on Wikipedia, it seems this is exactly what the character is meant for anyway: "piping" things together and delimiting them.
airstruck
Party member
Posts: 650
Joined: Thu Jun 04, 2015 7:11 pm
Location: Not being time thief.

Re: String split while ignoring delimiters inside quotes?

Post by airstruck »

Using a different delimiter is the way to go.

If your quotes are always balanced, though, the %b pattern does come in handy for this kind of thing. You could solve the original problem with %b and a sentinel, something like this:


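local s = '"This is a text message, it contains a comma", Answer #1, "Answer #2"'
-- Hide commas inside each "..." behind the @ sentinel, split on the remaining
-- commas, then turn the sentinel back into commas in each field.
for match in s:gsub('%b""', function (x) return x:gsub(',', '@') end):gmatch('[^,]+') do
    print((match:gsub('@', ',')))
end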
I used @ as a sentinel here because it stands out in this example. You could use | or some other character (or sequence) reserved for that purpose. Of course, that sort of leads to the conclusion that you could just use | (or whatever) in the first place, and not have to do anything with %b and sentinels.
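Wrapped into a function, the same %b-plus-sentinel idea might look like the sketch below. The choice of the control byte `"\1"` as the sentinel is an assumption (it is unlikely to appear in text data, and unlike `"\0"` it is safe in Lua 5.1 patterns), as is the name `splitWithSentinel`; quotes must be balanced for `%b""` to pair them correctly.

```lua
-- Hypothetical wrapper around the %b"" + sentinel technique.
-- Assumes balanced double quotes and no "\1" bytes in the input.
local SENTINEL = "\1"

local function splitWithSentinel(s)
	assert(not s:find(SENTINEL, 1, true), "input must not contain the sentinel byte")
	-- Replace commas inside each "..." span with the sentinel
	local hidden = s:gsub('%b""', function(quoted)
		return (quoted:gsub(",", SENTINEL))
	end)
	local fields = {}
	for field in hidden:gmatch("[^,]+") do
		-- Restore the hidden commas in each field
		fields[#fields + 1] = (field:gsub(SENTINEL, ","))
	end
	return fields
end
```

As in the loop above, empty fields are skipped and leading spaces are kept.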
Kingdaro
Party member
Posts: 395
Joined: Sun Jul 18, 2010 3:08 am

Re: String split while ignoring delimiters inside quotes?

Post by Kingdaro »

Never mind.
Last edited by Kingdaro on Sat Dec 05, 2015 5:42 pm, edited 1 time in total.
airstruck
Party member
Posts: 650
Joined: Thu Jun 04, 2015 7:11 pm
Location: Not being time thief.

Re: String split while ignoring delimiters inside quotes?

Post by airstruck »

Kingdaro wrote:The only one that fails is the fourth
Isn't that the same as the example in the OP, though? Many of the other examples you tested contain imbalanced quotes and should probably be considered invalid input.
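If unbalanced quotes are to be rejected as invalid input, a cheap pre-check is to count the double quotes, the same counting trick davisdude's code uses per piece. `hasBalancedQuotes` is a hypothetical helper, and it assumes quotes are never escaped.

```lua
-- Hypothetical validator: input is considered invalid when it contains
-- an odd number of " characters (some quote is left unclosed).
local function hasBalancedQuotes(s)
	local _, count = s:gsub('"', '"')   -- second return value is the match count
	return count % 2 == 0
end
```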

Lua pattern matching might be inferior to regular expressions in most cases (Lua has no alternation or lookarounds), but in some special cases it more than makes up for that (most regex flavors have no direct equivalent of %b or %f). I think this is one of those special cases.
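A short illustration of the two pattern items mentioned: `%b()` matches a balanced pair of parentheses including nesting, and `%f[set]` is a frontier that matches the empty position where the previous character is not in `set` and the next one is, which makes whole-word matching easy. The sample strings are made up for illustration.

```lua
-- %b(): first balanced parenthesized span, nested parentheses included
local s = "f(a, (b, c)) + g(d)"
local firstCall = s:match("%b()")

-- %f[%a]%a+: every run of letters that starts at a word boundary
local words = {}
for w in ("the cat scattered"):gmatch("%f[%a]%a+") do
	words[#words + 1] = w
end

-- Whole-word search: "cat" as a word, not as part of "scattered"
local wordStart = ("the cat scattered"):find("%f[%a]cat%f[%A]")
local inside = ("scattered"):find("%f[%a]cat%f[%A]")

print(firstCall, #words, wordStart, inside)
```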