Concatenation and garbage

General discussion about LÖVE, Lua, game development, puns, and unicorns.
Post Reply
User avatar
pgimeno
Party member
Posts: 3548
Joined: Sun Oct 18, 2015 2:58 pm

Concatenation and garbage

Post by pgimeno »

I wanted to share something that I've discovered about how Lua handles concatenation, which may be useful if you worry about writing optimized code.

I noticed in the Lua manual that the concatenation operator is right associative. The operation itself is associative: you get the same result from ("a".."b").."c" as for "a"..("b".."c"), therefore why did Lua prefer the latter?

So I took a look at the generated bytecode and discovered the answer. Right associativity forces the operands to be all put into consecutive registers once each is evaluated. Then Lua, instead of issuing a pairwise concatenation for each pair of operands, issues a concatenation for all operands at a time. For example, in the case of "a".."b".."c", instead of concatenating "b" and "c" and generating an intermediate result "bc", and then concatenate "a" with it to obtain "abc", it does all three at a time, and therefore there's no generation of intermediate strings that would need to be garbage-collected at some point otherwise.

This can be easily verified with this program:

Code: Select all

local x
x = "a" .. "b"
x = "a" .. "b" .. "c"
x = "a" .. ("b" .. "c") -- equivalent to plain concatenation
x = ("a" .. "b") .. "c" -- bad - intermediate string generated
luac and luajit both produce equivalent bytecode, and luac's is a bit easier to understand because the opcodes are not so abbreviated as LuaJIT's, so here's the output of luac:

Code: Select all

	1	[1]	LOADNIL  	0 0
	2	[2]	LOADK    	1 -1	; "a"
	3	[2]	LOADK    	2 -2	; "b"
	4	[2]	CONCAT   	0 1 2
	5	[3]	LOADK    	1 -1	; "a"
	6	[3]	LOADK    	2 -2	; "b"
	7	[3]	LOADK    	3 -3	; "c"
	8	[3]	CONCAT   	0 1 3
	9	[4]	LOADK    	1 -1	; "a"
	10	[4]	LOADK    	2 -2	; "b"
	11	[4]	LOADK    	3 -3	; "c"
	12	[4]	CONCAT   	0 1 3
	13	[5]	LOADK    	1 -1	; "a"
	14	[5]	LOADK    	2 -2	; "b"
	15	[5]	CONCAT   	1 1 2
	16	[5]	LOADK    	2 -3	; "c"
	17	[5]	CONCAT   	0 1 2
	18	[5]	RETURN   	0 1
Lua and LuaJIT bytecode works with registers numbered starting from 0. The CONCAT operator (CAT in luajit) takes three arguments: destination register, first register to concatenate, last register to concatenate; it concatenates all registers between the first and the last, converting registers to string where necessary, and places the result in the destination.

Lines 2-4 correspond to x = "a".."b". Line 2 stores "a" into register 1, and line 3 stores "b" into register 2. Line 4 concatenates registers 1 to 2 and places the result in register 0 (which is the register that x is assigned to).

Lines 5-8 correspond to x = "a".."b".."c". Lines 5-7 store the constants into registers 1 to 3, and note how line 8 concatenates all three registers, storing the result into x. This way, no intermediate string is generated during concatenation.

Lines 9-12 correspond to x = "a"..("b".."c"), and are essentially identical to 5-8, thus demonstrating the right associativity of the operator.

Lines 13-17 correspond to x = ("a".."b").."c". Since we're forcing left-associativity, a concatenation must happen when both "a" and "b" are evaluated. This concatenation involves two registers, and results in the intermediate string "ab". This result is put in register 1. Then the constant "c" is loaded into register 2, and the two strings "ab" and "c" are concatenated in line 17, storing the result in register 0. This way, an intermediate string has been generated, "ab", which is somewhere in the object list and will need to be garbage collected.

I don't know if the intermediate strings that are created when implicitly calling tostring() in arguments that are not already strings, count as garbage-collectable objects or are somehow directly placed in the internal concatenation buffer without generating an object. I'd need to dig into the Lua code to find out.

tl;dr: Don't worry about generating garbage for the intermediate results of concatenation, when concatenating many pieces of a string in a single statement. The only potential cause of concern is to fill the register bank. If that happens, you can group smaller concatenations like this: (string1 .. string2 .. ... .. stringK) .. stringKplus1 .. stringKplus2 .. stringKplus3 .. ... .. stringN
Last edited by pgimeno on Wed Sep 18, 2019 12:56 pm, edited 2 times in total.
User avatar
raidho36
Party member
Posts: 2063
Joined: Mon Jun 17, 2013 12:00 pm

Re: Concatenation and garbage

Post by raidho36 »

pgimeno wrote: Wed Sep 18, 2019 11:32 am Don't worry about garbage generation when concatenating many pieces of a string in a single statement.
That's a bold statement considering that you don't actually know whether concatenating strings like this produce garbage or not.
User avatar
pgimeno
Party member
Posts: 3548
Joined: Sun Oct 18, 2015 2:58 pm

Re: Concatenation and garbage

Post by pgimeno »

Fair enough. I've edited the statement to be more precise.

The source code clearly gathers as many as possible without generating intermediate concatenated strings. https://www.lua.org/source/5.1/lvm.c.html#luaV_concat

From the source of luaV_tostring (https://www.lua.org/source/5.1/lvm.c.html#luaV_tostring) it's now clear that it does generate intermediate strings when numbers are implicitly converted to string.
User avatar
raidho36
Party member
Posts: 2063
Joined: Mon Jun 17, 2013 12:00 pm

Re: Concatenation and garbage

Post by raidho36 »

Also the same thing happens in table.concat function. Note however that all of this is implementation-defined.
User avatar
pgimeno
Party member
Posts: 3548
Joined: Sun Oct 18, 2015 2:58 pm

Re: Concatenation and garbage

Post by pgimeno »

LuaJIT is the one we're interested in here. I've shown that it generates bytecode consistent with right associativity and that it has an opcode that joins several strings at once, like Lua does. I don't need to check the LuaJIT source code in order to infer that Mike Pall realized that this optimization is very important in order to avoid intermediate strings. If he wasn't interested in performing it, he was free to dismiss the ranges in the bytecode and use pairs of registers instead.
monolifed
Party member
Posts: 188
Joined: Sat Feb 06, 2016 9:42 pm

Re: Concatenation and garbage

Post by monolifed »

string.format looks cleaner and probably uses sprintf (which might be better than a train of luaconcats)
Post Reply

Who is online

Users browsing this forum: Bing [Bot], Majestic-12 [Bot] and 32 guests