Sudoku String Definitions 2.0
I'm excited to introduce a new scheme to capture and transport the board and progress in minimal fashion for Sudoku and many other puzzles including Killers, KenKen and Str8ts. I've been collaborating with
Quintin May a.k.a.
Sasha behind the scenes to come up with a better approach. This page and the follow-up
page on Killer/KenKen definitions is a very compressed overview of the approach - but enough to explain how. Quintin has created a superb and comprehensive documentation library - and if you intend to code - I recommend you browse the following:
There are also some
Test Strings built into links to each solver.
The problems with the old approach
The chief deficits of version 1 were:
- A single candidate note was being interpreted as a solved cell
- Unable to expand the scheme for puzzles with jigsaw boxes
- Unable to expand the scheme for operators in clues like KenKen and KenDoku
- No version number in case of changes to the definition
The solvers will continue to support the old packed definitions for the time being since there will be many hard coded links to the solvers out in the wild or stored in databases.
The unpacked 81 character sudoku definition remains the same as before in all places and uses. You can continue to import a Sudoku with 81 numbers (and other symbols like '.' or '_' for blanks). I will be building a more comprehensive importer in the near future to cope with line breaks and other more 'noisy' puzzle definitions.
Version 'B'
So lets dive into the new version for Sudoku including Jigsaw Sudoku, Sudoku X, Windoku and Coloured Sudoku. The
next page will cater for Killer, KenKen and KenDoku.
Lets consider the sorts of information we want to transport:
- All puzzles need the ability to store progress - which consists of solved cells and candidates.
- We should also retain which large numbers are clues.
- It will be very helpful to include a version header to give vital context for interpreting the data
That is all for Sudoku and variants with extra constraints like Sudoku X. However other puzzles contains more information we will need to encode:
- Jigsaw Sudoku has irregular boxes. Our new approach moves away from a shape number as this is peculiar to SudokuWiki.org and not universal.
- For Killers and variants we have the concept of cages and clue operands (+ for Killer, -, x and / for others)
- For Str8ts we need to encode the black cells
The following table lays it all out. We can also predict the minimum number of bytes per cell the new packed strings will achieve. (Str8ts is still under development).
Leave Out | |
Encode | |
|
Prefix |
Clues |
Solved |
Notes |
Boxes |
Cages |
Clue Operands |
Black Cells |
Bytes / Cell |
Sudoku |
S |
Yes |
Yes |
Yes |
Regular |
- |
- |
- |
2 |
Sudoku X |
X |
Yes |
Yes |
Yes |
Regular |
- |
- |
- |
2 |
Windoku |
W |
Yes |
Yes |
Yes |
Regular |
- |
- |
- |
2 |
Colour Sudoku |
C |
Yes |
Yes |
Yes |
Regular |
- |
- |
- |
2 |
Jigsaw Sudoku |
J |
Yes |
Yes |
Yes |
Irregular |
- |
- |
- |
3 |
Killer Sudoku |
L |
No |
Yes |
Yes |
Regular |
Yes |
- |
- |
4 |
Killer Jigsaw |
M |
No |
Yes |
Yes |
Irregular |
Yes |
- |
- |
5 |
KenKen |
K |
No |
Yes |
Yes |
- |
Yes |
4 (+-x/) |
- |
4 |
KenDoku |
D |
No |
Yes |
Yes |
Regular |
Yes |
4 (+-x/) |
- |
4 |
Str8ts |
T |
Yes |
Yes |
Yes |
- |
- |
- |
Yes |
? |
Str8ts X |
? |
Yes |
Yes |
Yes |
- |
- |
- |
Yes |
? |
Str8ts B |
? |
Yes |
Yes |
Yes |
Regular |
- |
- |
Yes |
? |
Str8ts BX |
? |
Yes |
Yes |
Yes |
Regular |
- |
- |
Yes |
? |
| |
OFF SET |
SHIFTED |
SHIFTED |
SHIFTED |
SHIFTED | |
The Header
Each packed string definition will be prefixed by three characters.
- A puzzle type letter
- + board size (exclusively 6 or 9 at the moment)
- + version letter. Since this is version 2 we're going with 'B'.
Sudoku and any variants with a fixed pattern or extra constraint will have the same packed definition. We can tell what variant by the first letter of the packed string (yellow column).
The Body
As in the previous packed version we store candidates as bits. (1=1, 2=2, 3=4, 4=8, 5=16, 6=32, 7=64, 8=128 and 9=256). The key difference between this packing method and the previous is that we're going to create smaller numbers for the clues/solved/candidates. In fact they are going to be no larger than 511+18. Previously we shifted the bits to make room for the clue flag meaning the largest number was 1022. However, the new method recognises that a clue and a solved cell and candidates are mutually exclusive.
- We can store a clue number (1 to 9) as 1 to 9
- We can store a solved call (1 to 9) as 10 to 18 - offset.
- And candidates can be any number between 1 and 511 + 18
It so happens that the largest number 529 can be stored in 10 bits. All this fits in two bytes.
For Jigsaws we need three bits to store the box number. So we shift the box number 10 bits and add it to the clues/candidates number. Finally we convert to base 36 and pad with zeros. Jigsaws fit produce cell strings 3 bytes long.
All the data for each cell will fit into a single integer. But we're not using all 64 bits. Only the last 16 (two bytes)
unused
| |box |cell no |
000 010 1010010001
|byte 1 | byte 2 |
A bit of javascript code shows how this is done
for (y = 0; y < 9; y++)
for (x = 0; x < 9; x++) {
n = get the cell value or set of candidates as bits
if( candidates )
n += 18; // offset by 18
else if ( !clue )
n += 9; // offset by 9
if( jigsaw )
n += (box_num << 10); // shift 10 bits and add
h = n.toString(36); // convert to base 32
if (h.length < 2) h = '0' + h; // pad the number if not 2 digits
if (jigsaw && h.length < 3) h = '0' + h; // pad if not 3 digits
s += h; // append to string being made
}
To unpack the string (after checking the version is
S9B,
X9B,
W9B,
C9B or
J9B) we do the following
// This splits a string into an array of elements each 2 characters long
var narr = theString.match(/.{1,2}/g); // should be {1,3} for Jigsaws
for (y = 0; y < 9; y++)
for (x = 0; x < 9; x++) {
p = parseInt(narr[y * 9 + x], 36); // convert base 36 to decimal
n = p & 1023; // first 10 bits
clue(y, x) = (n < 10) ? 1 : 0; // extract the clue flag
if (n < 18) {
if (n > 9 ) n -= 9; { // removed 'solved cell' offset
// save the number as a clue or solution
} else {
// save the number as a set of candidates
n -= 18;
}
p >>= 10; // remove first 10 bits
// p is the box number for this cell
// Will be 0 for vanilla Sudoku
}
Click on the link to read about how the scheme extends to Killer, KenKen and KenDoku
Comments
Email addresses are never displayed, but they are required to confirm your comments. When you enter your name and email address, you'll be sent a link to confirm your comment. Line breaks and paragraphs are automatically converted - no need to use <p> or <br> tags.
... by: Sasha
If you encode a board containing a cell with a single candidate & then decode it, the candidate is solved.
When encoding:
- empty cell: "00"
- value: zero padded base32(value)
- clue: zero padded base32(clue + 10)
- candidates: base32(base10(candidate bit pattern) + 20)
When decoding we deal with numeric ranges. Translate the base32 character pair to base 10:
0: empty cell
1...9: value
11...19: clue = value - 10
otherwise: candidates = decode bits (value - 20)
Still 2 characters per cell but with full fidelity.
... by: tebo
I'm trying to determine which Strategy handles the almost Deadly Pattern at (29)R16C79.
0m4e4cog1121k084g41k544403o0ggs409208121g1400409020g10g4o4a4110hg6082240h4hc28g4g2400h2281410g03200980g411g409k04ggg201184840321868k8k410m10g109g6o61108o2g621410g
... by: tebo
This is very elegant!
- It captures both the Given puzzle and the current board state (i.e., by tracking the Clue in the last bit).
- It saves a lot of effort when transferring a puzzle between my solver/player and your solver/player.
- It enables more efficient storage of puzzles in the database.
This should be a community standard. Kudos.
And Doh! Well spotted. I don't know why I used i and j in my code originally when x and y make more sense. Thanks
... by: Pieter, Newtown, Oz
Ingenious! 😃
However, where you have given the random example above, may I suggest you show both the old and the new definitions, for comparison? E.g. Using the X-Wing Example 1 as an example:
81 char string:
https://www.sudokuwiki.org/sudoku.htm?bd=100000569492056108056109240009640801064010000218035604040500016905061402621000005
162 char string, with candidates:
https://www.sudokuwiki.org/sudoku.htm?bd=03c848csc4cs1121g10hg005481020024881c8112002c0g1040h485848g0210h4481140350200gs403c4k81448050281k0091120k00gc80h4811s4cck80320g1c810c820020hc805210503cos0cok8s811
Ciao, Pieter 🙂