How would I split a string of text while ignoring delimiters that are contained between certain characters? For example, my delimiter is a comma and I want it to ignore commas that are inside a string surrounded by quotes.
Say this is the string itself complete with quotation marks:
"This is a text message, it contains a comma", Answer #1, "Answer #2"
I want to split it into the three strings of text. BUT, note that Answer #1 does not have quotes, and quotes are not required in all cases. What I want it to turn into is a table with three entries:
"This is a text message, it contains a comma"
Answer #1
"Answer #2"
If I just use string.split, it's going to turn the first part into two parts:
"This is a text message,
it contains a comma"
Which is not what I'd want. Surely there's a way to do this using grep or something?
String split while ignoring delimiters inside quotes?
Re: String split while ignoring delimiters inside quotes?
For 3 parts this regex should do: (untested)
([^"][^,]*,|"[^"]*",)([^"][^,]*,|"[^"]*",)([^"][^,]*|"[^"]*")
Note that it does capture the quotes if any are used. Also it's ugly and could be improved for readability and failure handling probably.
Re: String split while ignoring delimiters inside quotes?
I would just strip out any commas (string.gsub(text, ',', '')) and then use split to return a table.
My 2 cents.
- Jasoco
- Inner party member
- Posts: 3725
- Joined: Mon Jun 22, 2009 9:35 am
- Location: Pennsylvania, USA
- Contact:
Re: String split while ignoring delimiters inside quotes?
Maybe I could modify my current string:split() function?
Code: Select all
function string:split(delimiter)
    local result = { }
    local from = 1
    local delim_from, delim_to = string.find( self, delimiter, from )
    while delim_from do
        table.insert( result, string.sub( self, from , delim_from-1 ) )
        from = delim_to + 1
        delim_from, delim_to = string.find( self, delimiter, from )
    end
    table.insert( result, string.sub( self, from ) )
    return result
end
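One way to modify a split function like the one above, as a rough sketch rather than a definitive answer: keep searching with string.find, but only accept a delimiter when the text collected so far contains an even number of double quotes. The name splitOutsideQuotes is made up for this example, and it assumes a non-empty, literal (non-pattern) delimiter and balanced quotes.

```lua
-- Hypothetical variant of string:split that skips delimiters inside quotes.
-- Assumes: delimiter is a non-empty literal string, quotes are balanced.
function splitOutsideQuotes(text, delimiter)
    assert(delimiter ~= '', 'delimiter must not be empty')
    local result = {}
    local from = 1        -- start of the field currently being collected
    local searchFrom = 1  -- where to look for the next delimiter
    -- The fourth argument (true) makes string.find do a plain-text search.
    local delimFrom, delimTo = string.find(text, delimiter, searchFrom, true)
    while delimFrom do
        local candidate = string.sub(text, from, delimFrom - 1)
        -- The second return value of gsub is the number of quotes found.
        local _, quoteCount = string.gsub(candidate, '"', '')
        if quoteCount % 2 == 0 then
            -- Even number of quotes: the delimiter is outside any quoted part.
            table.insert(result, candidate)
            from = delimTo + 1
        end
        searchFrom = delimTo + 1
        delimFrom, delimTo = string.find(text, delimiter, searchFrom, true)
    end
    table.insert(result, string.sub(text, from))
    return result
end

local parts = splitOutsideQuotes('"This is a text message, it contains a comma", Answer #1, "Answer #2"', ',')
for i = 1, #parts do print(parts[i]) end
```

Leading spaces after each comma are kept in this sketch; trim them afterwards if they are not wanted.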
Re: String split while ignoring delimiters inside quotes?
S0lll0s wrote: For 3 parts this regex should do: (untested)
([^"][^,]*,|"[^"]*",)([^"][^,]*,|"[^"]*",)([^"][^,]*|"[^"]*")
Note that it does capture the quotes if any are used. Also it's ugly and could be improved for readability and failure handling probably.
That wouldn't work, since Lua uses its own pattern syntax rather than regular expressions (there is no | alternation, for one).
Here's what I've got working:
Code: Select all
str = '"asdf", abd, ", this is a test, This is a text message, it contains a comma", Answer #1, "Answer #2", asdf'

local function separate( str )
    local tab = {}
    local tempstr -- partial entry collected while inside an open quote
    str:gsub( '[^,]+',
        function( piece )
            -- The second return value of gsub is the number of quotes in this piece.
            local occurrences = select( 2, piece:gsub( '"', '' ) )
            if occurrences % 2 == 0 then
                if not tempstr then
                    -- Outside quotes: strip leading whitespace and store the entry.
                    table.insert( tab, ( piece:gsub( '^%s*', '' ) ) )
                else
                    -- Inside quotes: the comma was data, so put it back.
                    tempstr = tempstr .. ',' .. piece
                end
            else
                if not tempstr then
                    -- An odd number of quotes opens a quoted entry.
                    tempstr = piece:gsub( '^%s*', '' )
                else
                    -- An odd number of quotes closes the quoted entry.
                    table.insert( tab, tempstr .. ',' .. piece )
                    tempstr = nil
                end
            end
        end
    )
    return tab
end

results = separate( str )
for i = 1, #results do print( results[i] ) end
Re: String split while ignoring delimiters inside quotes?
As mentioned, Lua doesn't support full regexp (such an implementation would be larger than the full Lua implementation, according to the manual). Depending on how you will be processing the string data, you can use @davisdude's suggestion if you want to preprocess it into a table, or you can write a custom iterator using a closure if you want to iterate directly through the string in a for-loop.
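For the closure-based iterator, a minimal sketch might look like the following. The name quotedFields is invented for this example, and it assumes a comma delimiter and balanced double quotes; the closure keeps the current position in the string between loop steps.

```lua
-- Hypothetical iterator: yields one field per loop step, treating commas
-- inside double quotes as ordinary characters.
function quotedFields(text)
    local pos = 1       -- scan position, kept alive by the closure
    local done = false
    return function()
        if done then return nil end
        local inQuotes = false
        local start = pos
        while pos <= #text do
            local ch = text:sub(pos, pos)
            if ch == '"' then
                inQuotes = not inQuotes
            elseif ch == ',' and not inQuotes then
                local field = text:sub(start, pos - 1)
                pos = pos + 1
                -- Trim surrounding whitespace before returning the field.
                return (field:match('^%s*(.-)%s*$'))
            end
            pos = pos + 1
        end
        done = true
        return (text:sub(start):match('^%s*(.-)%s*$'))
    end
end

for field in quotedFields('"This is a text message, it contains a comma", Answer #1, "Answer #2"') do
    print(field)
end
```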
Have you considered using a different delimiter (e.g. tab, semicolon, double-semicolon, pipe)? This should be possible if you are controlling the incoming data (e.g. making game data). Having a delimiter also represent data is generally bad practice.
You can use string.gmatch in your string:split() function to save some lines. This also supports multi-character delimiters (e.g. ";;") without any changes or while-loops.
Code: Select all
SplitString = function( stringData, delimiter )
    local result = { }
    local stringData = stringData .. delimiter -- to get the last entry
    for slice in string.gmatch( stringData, "(.-)" .. delimiter ) do
        table.insert( result, slice )
    end
    return result
end
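One caveat with the gmatch approach: the delimiter is concatenated straight into a Lua pattern, so a delimiter containing a magic character such as . or % would not be matched literally. Below is a hedged sketch of one way around that; escapePattern and splitPlain are hypothetical names for this example and not part of any library.

```lua
-- Hypothetical helper: prefix every punctuation character with % so that
-- the text is matched literally when used inside a Lua pattern.
function escapePattern(text)
    return (text:gsub('%p', '%%%0'))
end

-- Same idea as the gmatch-based split, but with the delimiter escaped.
function splitPlain(stringData, delimiter)
    local result = {}
    local pattern = '(.-)' .. escapePattern(delimiter)
    -- Append the delimiter so the last entry is captured as well.
    for slice in string.gmatch(stringData .. delimiter, pattern) do
        result[#result + 1] = slice
    end
    return result
end

for _, entry in ipairs(splitPlain('a.b.c', '.')) do print(entry) end
for _, entry in ipairs(splitPlain('x;;y;;z', ';;')) do print(entry) end
```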
Re: String split while ignoring delimiters inside quotes?
Yeah, I guess I could just use two pipes since it's rare any normal string would really contain even one. (What is a pipe even for anyway?) Actually I am currently using pipes since it seems to be one of those characters no one really uses. I was just wondering if there was a way to do it this way.
Edit: Looking up the history of the pipe character on Wikipedia seems to imply that this is exactly what the character is meant to be used for anyway. "piping" things together and delimiting them.
Re: String split while ignoring delimiters inside quotes?
Using a different delimiter is the way to go.
If your quotes are always balanced, though, the %b pattern does come in handy for this kind of thing. You could solve the original problem with %b and a sentinel, something like this:
Code: Select all
for match in s:gsub('%b""', function (x) return x:gsub(',', '@') end):gmatch('[^,]+') do
    print((match:gsub('@', ',')))
end
I used @ as a sentinel here because it stands out in this example. You could use | or some other character (or sequence) reserved for that purpose. Of course, that sort of leads to the conclusion that you could just use | (or whatever) in the first place, and not have to do anything with %b and sentinels.
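If a table is wanted instead of printed output, the %b-and-sentinel idea could be wrapped up roughly like this. It is only a sketch: splitWithSentinel is a made-up name, and it assumes balanced quotes and that the control character "\1" never appears in the input.

```lua
-- Hypothetical wrapper around the %b-and-sentinel approach.
-- Assumes balanced double quotes and that "\1" never occurs in the input.
local SENTINEL = '\1'

function splitWithSentinel(text)
    -- Hide commas inside balanced quotes behind the sentinel.
    local masked = text:gsub('%b""', function(quoted)
        return (quoted:gsub(',', SENTINEL))
    end)
    local result = {}
    for field in masked:gmatch('[^,]+') do
        field = field:match('^%s*(.-)%s*$')                -- trim whitespace
        result[#result + 1] = (field:gsub(SENTINEL, ','))  -- restore commas
    end
    return result
end

local entries = splitWithSentinel('"This is a text message, it contains a comma", Answer #1, "Answer #2"')
for i = 1, #entries do print(entries[i]) end
```

Like the original loop, this skips empty fields, because [^,]+ needs at least one character between commas.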
Re: String split while ignoring delimiters inside quotes?
Never mind.
Last edited by Kingdaro on Sat Dec 05, 2015 5:42 pm, edited 1 time in total.
Re: String split while ignoring delimiters inside quotes?
Kingdaro wrote: The only one that fails is the fourth
Isn't that the same as the example in the OP, though? Many of the other examples you tested contain imbalanced quotes and should probably be considered invalid input.
Lua pattern matching might be inferior to regular expressions in most cases (Lua has no alternation or lookarounds), but in some special cases it more than makes up for that (most regex flavors have no direct equivalent of %b or %f). I think this is one of those special cases.