News and Updates on X
SudokuWiki.org
Strategies for Popular Number Puzzles

Sudoku String Definitions 2.0

I'm excited to introduce a new scheme to capture and transport the board and progress in minimal fashion for Sudoku and many other puzzles including Killers, KenKen and Str8ts. I've been collaborating with Quintin May a.k.a. Sasha behind the scenes to come up with a better approach. This page and the follow-up page on Killer/KenKen definitions is a very compressed overview of the approach - but enough to explain how. Quintin has created a superb and comprehensive documentation library - and if you intend to code - I recommend you browse the following: There are also some Test Strings built into links to each solver.

The problems with the old approach

The chief deficits of version 1 were:
  • A single candidate note was being interpreted as a solved cell
  • Unable to expand the scheme for puzzles with jigsaw boxes
  • Unable to expand the scheme for operators in clues like KenKen and KenDoku
  • No version number in case of changes to the definition
The solvers will continue to support the old packed definitions for the time being since there will be many hard coded links to the solvers out in the wild or stored in databases.

The unpacked 81 character sudoku definition remains the same as before in all places and uses. You can continue to import a Sudoku with 81 numbers (and other symbols like '.' or '_' for blanks). I will be building a more comprehensive importer in the near future to cope with line breaks and other more 'noisy' puzzle definitions.

Version 'B'

So lets dive into the new version for Sudoku including Jigsaw Sudoku, Sudoku X, Windoku and Coloured Sudoku. The next page will cater for Killer, KenKen and KenDoku. Lets consider the sorts of information we want to transport:
  • All puzzles need the ability to store progress - which consists of solved cells and candidates.
  • We should also retain which large numbers are clues.
  • It will be very helpful to include a version header to give vital context for interpreting the data
That is all for Sudoku and variants with extra constraints like Sudoku X. However other puzzles contains more information we will need to encode:
  • Jigsaw Sudoku has irregular boxes. Our new approach moves away from a shape number as this is peculiar to SudokuWiki.org and not universal.
  • For Killers and variants we have the concept of cages and clue operands (+ for Killer, -, x and / for others)
  • For Str8ts we need to encode the black cells
The following table lays it all out. We can also predict the minimum number of bytes per cell the new packed strings will achieve. (Str8ts is still under development).

Leave Out
Encode

Prefix

Clues

Solved

Notes

Boxes

Cages
Clue
Operands
Black
Cells
Bytes
/ Cell
Sudoku S Yes Yes Yes Regular - - - 2
Sudoku X X Yes Yes Yes Regular - - - 2
Windoku W Yes Yes Yes Regular - - - 2
Colour Sudoku C Yes Yes Yes Regular - - - 2
Jigsaw Sudoku J Yes Yes Yes Irregular - - - 3
Killer Sudoku L No Yes Yes Regular Yes - - 4
Killer Jigsaw M No Yes Yes Irregular Yes - - 5
KenKen K No Yes Yes - Yes 4 (+-x/) - 4
KenDoku D No Yes Yes Regular Yes 4 (+-x/) - 4
Str8ts T Yes Yes Yes - - - Yes ?
Str8ts X ? Yes Yes Yes - - - Yes ?
Str8ts B ? Yes Yes Yes Regular - - Yes ?
Str8ts BX ? Yes Yes Yes Regular - - Yes ?
OFF SET SHIFTED SHIFTED SHIFTED SHIFTED

The Header

Each packed string definition will be prefixed by three characters.
  • A puzzle type letter
  • + board size (exclusively 6 or 9 at the moment)
  • + version letter. Since this is version 2 we're going with 'B'.
Sudoku and any variants with a fixed pattern or extra constraint will have the same packed definition. We can tell what variant by the first letter of the packed string (yellow column).

The Body

As in the previous packed version we store candidates as bits. (1=1, 2=2, 3=4, 4=8, 5=16, 6=32, 7=64, 8=128 and 9=256). The key difference between this packing method and the previous is that we're going to create smaller numbers for the clues/solved/candidates. In fact they are going to be no larger than 511+18. Previously we shifted the bits to make room for the clue flag meaning the largest number was 1022. However, the new method recognises that a clue and a solved cell and candidates are mutually exclusive.
  • We can store a clue number (1 to 9) as 1 to 9
  • We can store a solved call (1 to 9) as 10 to 18 - offset.
  • And candidates can be any number between 1 and 511 + 18
It so happens that the largest number 529 can be stored in 10 bits. All this fits in two bytes. For Jigsaws we need three bits to store the box number. So we shift the box number 10 bits and add it to the clues/candidates number. Finally we convert to base 36 and pad with zeros. Jigsaws fit produce cell strings 3 bytes long.

All the data for each cell will fit into a single integer. But we're not using all 64 bits. Only the last 16 (two bytes)
unused
|  |box |cell no |
000 010 1010010001
|byte 1 | byte 2 |

A bit of javascript code shows how this is done

for (y = 0; y < 9; y++)
for (x = 0; x < 9; x++) {
n = get the cell value or set of candidates as bits
if( candidates )
n += 18; // offset by 18
else if ( !clue )
n += 9; // offset by 9
if( jigsaw )
n += (box_num << 10); // shift 10 bits and add
h = n.toString(36); // convert to base 32
if (h.length < 2) h = '0' + h; // pad the number if not 2 digits
if (jigsaw && h.length < 3) h = '0' + h; // pad if not 3 digits
s += h; // append to string being made
}

To unpack the string (after checking the version is S9B, X9B, W9B, C9B or J9B) we do the following
// This splits a string into an array of elements each 2 characters long 
var narr = theString.match(/.{1,2}/g);   // should be {1,3} for Jigsaws
for (y = 0; y < 9; y++)
	for (x = 0; x < 9; x++) {
		p = parseInt(narr[y * 9 + x], 36); // convert base 36 to decimal
		n = p & 1023; // first 10 bits
		clue(y, x) = (n < 10) ? 1 : 0; // extract the clue flag
		if (n < 18) {
			if (n > 9 ) n -= 9; { // removed 'solved cell' offset
			// save the number as a clue or solution
		} else {
			// save the number as a set of candidates
			n -= 18;
		}
		p >>= 10; // remove first 10 bits
		 // p is the box number for this cell
		 // Will be 0 for vanilla Sudoku
	}

Click on the link to read about how the scheme extends to Killer, KenKen and KenDoku



Comments

Your Name/Handle

Email Address - required for confirmation (it will not be displayed here)

Your Comment

Please enter the
letters you see:
arrow
Enter these letters Remember me


Please ensure your comment is relevant to this article.
Email addresses are never displayed, but they are required to confirm your comments. When you enter your name and email address, you'll be sent a link to confirm your comment. Line breaks and paragraphs are automatically converted - no need to use <p> or <br> tags.
CommentsTalk

... by: Sasha

Monday 27-Nov-2023
This algorithm inadvertently solves Naked Singles. The representation for a clue or solution is the same as a set containing a single candidate (bit_count(n) == 1).

If you encode a board containing a cell with a single candidate & then decode it, the candidate is solved.
Andrew Stuart writes:
This is true. I am wondering if that is a problem or not.
Sasha replies:Friday 11-Oct-2024
Because this is meant to be an interchange format for sharing & discussing Sudoku boards this loss of fidelity just didn't sit right. I've taken your algorithm & tweaked it a bit to solve this data loss. With your algorithm we run out of bits due to the doubling for the clue flag. Rather than doubling, I use offsets.

When encoding:
- empty cell: "00"
- value: zero padded base32(value)
- clue: zero padded base32(clue + 10)
- candidates: base32(base10(candidate bit pattern) + 20)

When decoding we deal with numeric ranges. Translate the base32 character pair to base 10:
0: empty cell
1...9: value
11...19: clue = value - 10
otherwise: candidates = decode bits (value - 20)

Still 2 characters per cell but with full fidelity.
Andrew Stuart writes:
Thank you sooo much for your patience working with me on these new definitions. It's great to see how small we have made the final strings considering how much information some of he larger complex puzzles contain. Beautiful ideas!
Add to this Thread

... by: tebo

Thursday 8-Jun-2023
I entered the Sudoku String below into "Import a Sudoku", and the Puzzle rendered on the Board is exactly as I intended. I clicked "Solve Path" and Step 1 indicates Elimination of Candidates that are already Eliminated. So, I think your solver backed up a couple steps from the current State.

I'm trying to determine which Strategy handles the almost Deadly Pattern at (29)R16C79.

0m4e4cog1121k084g41k544403o0ggs409208121g1400409020g10g4o4a4110hg6082240h4hc28g4g2400h2281410g03200980g411g409k04ggg201184840321868k8k410m10g109g6o61108o2g621410g
Andrew Stuart writes:
Just tried this myself and didn't see any change in the board with the first step. Not sure why we see differently.
Add to this Thread

... by: tebo

Thursday 25-May-2023
You may have a copy/paste error in your unpack code: parseInt(narr[j * 9 + i], 32) => parseInt(narr[y * 9 + z], 32). Also, I'm not sure which board the new link (bd-vu480...) refers to.

This is very elegant!

- It captures both the Given puzzle and the current board state (i.e., by tracking the Clue in the last bit).
- It saves a lot of effort when transferring a puzzle between my solver/player and your solver/player.
- It enables more efficient storage of puzzles in the database.

This should be a community standard. Kudos.
Andrew Stuart writes:
Appreciate the comments Tebo :)
And Doh! Well spotted. I don't know why I used i and j in my code originally when x and y make more sense. Thanks
Add to this Thread

... by: Pieter, Newtown, Oz

Saturday 22-Apr-2023
Hi Andrew
Ingenious! 😃

However, where you have given the random example above, may I suggest you show both the old and the new definitions, for comparison? E.g. Using the X-Wing Example 1 as an example:

81 char string:
https://www.sudokuwiki.org/sudoku.htm?bd=100000569492056108056109240009640801064010000218035604040500016905061402621000005

162 char string, with candidates:
https://www.sudokuwiki.org/sudoku.htm?bd=03c848csc4cs1121g10hg005481020024881c8112002c0g1040h485848g0210h4481140350200gs403c4k81448050281k0091120k00gc80h4811s4cck80320g1c810c820020hc805210503cos0cok8s811

Ciao, Pieter 🙂
REPLY TO THIS POST
Article created on 18-January-2023. Views: 9866
This page was last modified on 20-November-2024.
All text is copyright and for personal use only but may be reproduced with the permission of the author.
Copyright Andrew Stuart @ Syndicated Puzzles, Privacy, 2007-2024