Regular Expressions

Roland_Yonaba
Inner party member
Posts: 1563
Joined: Tue Jun 21, 2011 6:08 pm
Location: Ouagadougou (Burkina Faso)

Regular Expressions

Post by Roland_Yonaba »

Incredibly powerful...

I am pretty sure mixing them up with string pattern matching functions may let you unlock the true power of Lua...
Fact is, I am trying to master them... Become a kinda dark Lord of Pattern Matching... :cool:
But hey, I am far from that. Actually, when I see things like (.^$[%s]), I go mad... :awesome:

I am planning to work on stuff that will require parsing files, plain text, data, querying for specific information...
Chapter 20 of PIL covers the topic a whole lot, but that is definitely not enough for me...

Does anyone have some useful links, hints, or tricks to help me better understand how they work?
Exercises would be greatly appreciated too. Thanks in advance...
kikito
Inner party member
Posts: 3153
Joined: Sat Oct 03, 2009 5:22 pm
Location: Madrid, Spain

Re: Regular Expressions

Post by kikito »

Here are some pages with good examples of operations done with Lua regexps:

http://lua-users.org/wiki/StringRecipes
https://github.com/BlackBulletIV/strong

Also, keep in mind Jeff Atwood's words: Regular Expressions are nice, but overuse of regular expressions is evil.

By the way, Lua's regular expressions aren't "real regular expressions". Some people call them "patterns" just to make that distinction more evident.
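
One concrete difference, as a minimal sketch: Lua patterns have no alternation operator, so "|" is just an ordinary character. The helper name match_any below is made up for illustration and is not part of any library.

```lua
-- In a Lua pattern, "|" is a literal character, not "or".
local alt_fails   = ("dog"):match("cat|dog")      -- nil: no literal "cat|dog" in "dog"
local alt_literal = ("cat|dog"):match("cat|dog")  -- "cat|dog": matched character by character

-- A common workaround (hypothetical helper): try each pattern in turn.
local function match_any(s, patterns)
  for _, p in ipairs(patterns) do
    local m = s:match(p)
    if m then return m end
  end
  return nil
end

print(alt_fails, alt_literal, match_any("dog", { "cat", "dog" }))
```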
trubblegum
Party member
Posts: 192
Joined: Wed Feb 22, 2012 10:40 pm

Re: Regular Expressions

Post by trubblegum »

Don't be too disappointed when, once you've spent time and energy learning about regex, you end up finding out that there is usually a better way to do what you're trying to do.
"I can do that with regex" is not the same as "I need a lib the size of Utah to do this".

Re: Regular Expressions

Post by Roland_Yonaba »

trubblegum wrote:Don't be too disappointed when, once you've spent time and energy learning about regex, you end up finding out that there is usually a better way to do what you're trying to do.
"I can do that with regex" is not the same as "I need a lib the size of Utah to do this".
Example ?

Re: Regular Expressions

Post by trubblegum »

Roland_Yonaba wrote:
trubblegum wrote:Don't be too disappointed when, once you've spent time and energy learning about regex, you end up finding out that there is usually a better way to do what you're trying to do.
"I can do that with regex" is not the same as "I need a lib the size of Utah to do this".
Example ?
Not if I don't have to.
How about this instead .. if I find a case where I need it, I'll ask you to write it for me .. as an exercise :P

Re: Regular Expressions

Post by Roland_Yonaba »

Well, why not ?
Thanks.

@kikito : BlackBullet's library seems very nice. I loved the way he tried to reproduce Ruby's functions as well.
I'll take a look at the code, might be really helpful to me.
Mud
Citizen
Posts: 98
Joined: Fri Nov 05, 2010 4:54 am

Re: Regular Expressions

Post by Mud »

Roland_Yonaba wrote:when I see things like (.^$[%s]) I go mad...
Regular expressions are much harder to read than write.

Don't let scary ones intimidate you. You'll be able to write those quite easily yourself, long before (through practice) you're able to easily read them. There's almost nothing to them:

WHAT TO MATCH (atoms)
Most characters match themselves.
. matches any character.
%s, %w, et al. match a character in a class of characters, such as whitespace or alphanumeric characters
%S, %W, et al. match a character NOT in class, such as NON whitespace, or NON alphanumeric characters
[x-y] matches a character in a given set, such as any character between x and y
[^x-y] matches a character NOT in a given set, such as any character NOT between x and y
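
The atoms above can be tried one at a time with string.match, which returns the first substring a pattern matches. A minimal sketch; the sample strings are made up.

```lua
-- Each call returns the first substring matched by a one-atom pattern.
local literal  = ("hello"):match("l")      -- a plain character matches itself
local any      = ("hello"):match(".")      -- "." matches any single character
local space    = ("a b"):match("%s")       -- %s: a whitespace character
local nonspace = (" x"):match("%S")        -- %S: a NON-whitespace character
local inset    = ("x7y"):match("[0-9]")    -- a character in the range 0..9
local notinset = ("123a"):match("[^0-9]")  -- a character NOT in the range 0..9

print(literal, any, nonspace, inset, notinset)
```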

HOW MANY TO MATCH (quantifiers)

? says to match the previous atom 0 or 1 times
* says to match the previous atom 0 or more times
+ says to match the previous atom 1 or more times
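
A small sketch of how these behave in Lua. It also shows Lua's fourth quantifier, - (0 or more, shortest match), which is not in the list above; the sample strings are made up.

```lua
local s = "aaa"

local zero_or_one  = s:match("a?")   -- matches a single "a"
local zero_or_more = s:match("a*")   -- matches as many as possible
local one_or_more  = s:match("a+")   -- same here, but needs at least one

-- "*" is happy with zero occurrences, "+" is not.
local empty_ok = ("bbb"):match("a*") -- an empty match at the start
local no_match = ("bbb"):match("a+") -- no match at all

-- Longest vs. shortest match between the same delimiters.
local greedy = ("<<a>><<b>>"):match("<<(.*)>>")
local lazy   = ("<<a>><<b>>"):match("<<(.-)>>")

print(zero_or_one, zero_or_more, one_or_more, greedy, lazy)
```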

ANCHORING
^ anchors a pattern to the start of the input
$ anchors a pattern to the end of the input
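
Anchors are easiest to see with string.find, which returns the start and end positions of the match. A minimal sketch with a made-up input line:

```lua
local line = "foo bar foo"

local s1, e1 = line:find("foo")      -- unanchored: the first "foo"
local s2, e2 = line:find("foo$")     -- anchored to the end: the last "foo"
local s3     = line:find("^bar")     -- "bar" is not at the start, so no match
local whole  = ("bar"):match("^bar$") -- both anchors: the whole input must match

print(s1, e1, s2, e2, s3, whole)
```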

GROUPING/CAPTURES

Putting part of an expression in parentheses "captures" a submatch, which can later be referred to by %1, %2, %3, etc.
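
A short sketch of captures in practice: string.match returns them as separate values, string.gsub can reuse them in the replacement, and %1 can also appear inside the pattern itself. The sample strings are made up.

```lua
-- string.match returns one value per capture (always as strings).
local key, value = ("width = 640"):match("(%w+)%s*=%s*(%d+)")

-- In a gsub replacement, %1 and %2 refer to the captures.
local swapped = ("hello world"):gsub("(%w+) (%w+)", "%2 %1")

-- Inside the pattern, %1 matches whatever the first capture matched,
-- here the same kind of quote that opened the string.
local quote, inner = ("say 'hi' now"):match("(['\"])(.-)%1")

print(key, tonumber(value), swapped, quote, inner)
```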

That's it. The majority of regex (or in this case, Lua 'patterns'). There's more to know (greedy vs non-greedy matching, matching a specific number of atoms, etc.) and deeper features in some implementations, but that's most of what you use in most cases.

EXAMPLE

Say we wanted to find social security numbers in some input, and we know they're always formatted like XXX-XXXX-XXX.

So three digits, dash, four digits, dash, three digits. In Lua patterns a - that directly follows a character class is itself a quantifier (0 or more, shortest match), so %d%d%d-%d%d%d%d-%d%d%d would not match the dashes at all. Escape each literal dash as %-: %d%d%d%-%d%d%d%d%-%d%d%d.

What if it was a part number, in a similar format, but the number of digits in each group is unknown? That's when we use a quantifier. %d = single digit, %d+ = 1 or more digits, %d+%-%d+%-%d+ = three groups of 1 or more digits separated by dashes.

What if we need to parse out the three sections of the part number? Just put parentheses around them to capture them: (%d+)%-(%d+)%-(%d+)

What if we're parsing lines that may have more than one part number, but the one we need to read is always at the end of the line? Just anchor the pattern to the end of the line: (%d+)%-(%d+)%-(%d+)$

Oops, it turns out the part number can start with a # character, but it's optional. That would be #? which means 0 or 1 # characters. #?(%d+)%-(%d+)%-(%d+)$
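
The finished pattern can be wrapped in a small function, as a sketch. The function name and the sample lines are made up for illustration. The literal dashes are written as %- because a bare - right after a character class is a quantifier in Lua; the first two lines of the block show the difference.

```lua
-- A bare "-" after %d is a quantifier, so this pattern never matches a dash.
local unescaped = ("123-4567-890"):match("%d%d%d-%d%d%d%d-%d%d%d")
local escaped   = ("123-4567-890"):match("%d%d%d%-%d%d%d%d%-%d%d%d")

-- Optional "#", three captured digit groups, anchored to the end of the line.
local PART_NUMBER = "#?(%d+)%-(%d+)%-(%d+)$"

-- Hypothetical helper: returns the three groups, or nil if the line
-- does not end with a part number.
local function parse_part_number(line)
  return line:match(PART_NUMBER)
end

local a, b, c = parse_part_number("old 11-22-33, new #123-4567-890")
local d, e, f = parse_part_number("ship 7-8-9")
local none    = parse_part_number("no number here")

print(unescaped, escaped, a, b, c, d, e, f, none)
```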

So on and so forth. You build up a pattern a bit at a time, getting parts of it to work then putting them together. You end up with a scary looking pattern, but the individual parts are all very simple.
trubblegum wrote:Don't be too disappointed when, once you've spent time and energy learning about regex, you end up finding out that there is usually a better way to do what you're trying to do.
You may find them a bad fit for the problem that motivated you to learn them, because you're not aware of their limitations.

Once you know them, you find situations all the time where they are a perfect fit.
trubblegum wrote:"I can do that with regex" is not the same as "I need a lib the size of Utah to do this".
Fortunately most modern languages have native support or standard library support for regex (or in Lua's case, an ultra-minimal variant). They're just that useful.

Re: Regular Expressions

Post by trubblegum »

Mud wrote:Fortunately most modern languages have native support or standard library support for regex (or in Lua's case, an ultra-minimal variant).
This appears to be true.

PS : Hey, I found you one.
Rewrite this to use pattern matching instead of its current table-based approach : viewtopic.php?p=54924#p54924

Re: Regular Expressions

Post by Roland_Yonaba »

@Trubblegum:

Hey, I hope that this should fit...
I didn't intend to rewrite everything (I'm a lazy person :ultrahappy: ), but I figured out I didn't have to rewrite the random word selection.
Just equivalents of the explode and implode routines...

So,

Code:

function explode(str)
   --assert(type(str)=='string','str should be a string')
   local tstr, guess = {}, {}
   for letter in str:gmatch('%a') do
      table.insert(tstr, letter)
      table.insert(guess, ' _')
   end
   return tstr, guess
end
Or maybe, if we want to handle punctuation characters (such as the hyphen) and whitespace (in that case I expect the player won't be asked to guess these characters; they will be shown from the start)...

Code:

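function explode(str)
   -- assert(type(str)=='string','str should be a string')
   local tstr, guess = {}, {}
   for letter in str:gmatch('[%a%p%s]') do
      table.insert(tstr, letter)
      -- letters start hidden; punctuation and whitespace are shown right away
      table.insert(guess, letter:find('%a') and ' _' or ' ' .. letter)
   end
   return tstr, guess
end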
And a simple equivalent of the implode routine...

Code:

function implode(tstr)
   -- for _, v in ipairs(tstr) do assert(type(v) == 'string', 'not a table of strings!') end
   return table.concat(tstr)
end
By the way, sorry if I'm asking a silly question, but why use numeric character codes (the byte values of characters) instead of acting directly on the characters themselves? It seems to me that it overcomplicates the problem.

Re: Regular Expressions

Post by trubblegum »

Not what I was looking for. What I want is a solution to the problem of the game of hangman, which eliminates the word and guess tables altogether, and uses pattern matching to operate directly on strings.
You should be able to do it by replacing the elseif with a single line (not counting the reset button push, of course).
There are no spaces or punctuation in hangman.
You could get bonus points for clever manipulation of a flat file database of words, again, using only pattern matching - no love.filesystem.lines().
Roland_Yonaba wrote:why use numeric character codes (the byte values of characters) instead of acting directly on the characters themselves?
Are you talking about this?

Code:

if code > 96 and code < 123 then -- could be key:find('%l')
If so, I'm not :

Code:

if v == key then guess[i] = ' ' .. v end
I'm only using it to check that the key pressed is lower-case alphabetical, and suggested an alternative method.
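
For comparison, here are the two checks side by side as a sketch. The function names are made up. The pattern version is anchored with ^ and $ on the assumption that key can also be a multi-character key name such as "return", which a bare %l would accept because it contains lowercase letters; %l also follows the current locale, which is a-z in the default C locale.

```lua
-- Byte-code check: looks only at the first byte of the key name.
local function is_lower_by_code(key)
  local code = key:byte()
  return code ~= nil and code > 96 and code < 123
end

-- Pattern check: the whole key must be exactly one lowercase letter.
local function is_lower_by_pattern(key)
  return key:find("^%l$") ~= nil
end

print(is_lower_by_code("a"), is_lower_by_pattern("a"))
print(is_lower_by_code("return"), is_lower_by_pattern("return"))
```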

Edit : solution must be able to easily accommodate score keeping.
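
One possible reading of this challenge, as a sketch only and not the solution from the linked thread: keep the word and the letters guessed so far as two plain strings, and let string.gsub do both the masking and the counting. The function names mask and guess are made up, and the word is assumed to contain only lowercase letters.

```lua
-- Returns the word with unguessed letters replaced by "_", plus the number
-- of letters still hidden (gsub's second return value).
local function mask(word, guessed)
  local hidden = guessed == "" and "." or "[^" .. guessed .. "]"
  return word:gsub(hidden, "_")
end

-- Records a guess. Returns the updated guessed string and the number of
-- hits in the word, which is what a score counter would need.
local function guess(word, guessed, letter)
  if not letter:find("^%l$") or guessed:find(letter, 1, true) then
    return guessed, 0 -- ignore invalid or repeated guesses
  end
  local _, hits = word:gsub(letter, "")
  return guessed .. letter, hits
end

local word, guessed, hits = "hangman", "", 0
guessed, hits = guess(word, guessed, "a")
guessed, hits = guess(word, guessed, "n")
print(mask(word, guessed))
```

The round is won when the second value returned by mask reaches 0, and a miss is any valid guess that returns 0 hits.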