Comments/Ratings for a Single Item
One problem with my old Unicode conversions is that, to get the text to appear as Unicode on the page, it had to look like gobbledygook in the database. By setting the charset to utf8mb4 when opening a PDO connection, I can have text appear as Unicode on the page while also appearing as Unicode in the database. This is a good thing, but it also means that I have to redo the Unicode conversions. I just did the names of inventors. To do this, I had to temporarily delete "charset=utf8mb4" from connect_to_database(), so that the text would appear correctly on the page and I could copy and paste it into the database. Whenever I'm not doing conversions, I'll set it back to utf8mb4. Until the conversions are done, I'll be alternating between setting the charset to utf8mb4 and leaving it unset, and some non-ASCII strings will look weird. When it's all done, I'll set the charset to utf8mb4 for good, and all non-ASCII text should look correct.
So far, I have fixed the names of inventors and authors, and I have started on other names. But I stayed on the computer so long that artifacts have started floating in front of my eyes. So I'll have to get back to this later.
I now have a plan for automating the rest of the conversions. In a test, I opened two PDO connections, one with the charset set to utf8mb4 and one without it. I read a column through the connection without the charset, and I updated the same column through the connection with it. Doing this corrected the text just as if I had cut and pasted the correct version. I don't want to stay on the computer much longer tonight, but I can write a script tomorrow that uses two different PDO connections to automatically convert the database strings to proper Unicode.
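To illustrate why reading through one connection and writing through the other can repair the text, here is a minimal Python sketch rather than the actual PDO script. It assumes the garbling came from UTF-8 bytes being interpreted as Windows-1252 (which MySQL's default latin1 roughly corresponds to). The function name repair_mojibake is hypothetical and used only for illustration.

```python
def repair_mojibake(stored: str) -> str:
    """Undo one round of 'UTF-8 bytes misread as Windows-1252'.

    Encoding to cp1252 stands in for reading through the connection
    without a charset; decoding as UTF-8 stands in for writing through
    the utf8mb4 connection. Both mappings are assumptions.
    """
    try:
        return stored.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of garbling (or already correct): leave it alone.
        return stored


# Simulate how a correct name turns into gobbledygook.
correct = "Müller"
garbled = correct.encode("utf-8").decode("cp1252")

print(garbled)                   # the garbled form
print(repair_mojibake(garbled))  # the repaired form
```

The try/except means that plain ASCII strings and strings that are already correct pass through unchanged, so running the repair twice should do no harm under these assumptions.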
I made some conversions to UTF-8 today.
I converted the Chinese page for Chess by viewing the source code in the correct encoding, then copying that into a UTF-8 document.
For the rest, I did conversions with a script. If a string was detected to be either ASCII or UTF-8, I left it alone. If a string that was neither ASCII nor UTF-8 was detected to be Windows-1252, I converted it from Windows-1252 to UTF-8. If it wasn't detected to be any of these, I detected its encoding and converted it from that to UTF-8.
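The decision order described above can be sketched in Python as follows. This is not the script I ran, only an outline under stated assumptions: the function name to_utf8 and the fallbacks parameter are hypothetical, and trying a fixed list of candidate encodings stands in for real encoding detection.

```python
def to_utf8(raw: bytes, fallbacks=("latin-1",)) -> bytes:
    """Return raw as UTF-8 bytes, converting only when needed."""
    # ASCII is a subset of UTF-8, so one check covers both cases.
    try:
        raw.decode("utf-8")
        return raw  # already ASCII or UTF-8: leave it alone
    except UnicodeDecodeError:
        pass

    # Try Windows-1252 first, then the assumed fallback encodings.
    for encoding in ("cp1252", *fallbacks):
        try:
            return raw.decode(encoding).encode("utf-8")
        except UnicodeDecodeError:
            continue

    raise ValueError("could not determine the encoding")


print(to_utf8(b"Chess"))         # ASCII, returned unchanged
print(to_utf8(b"\x93Chess\x94")) # Windows-1252 curly quotes, converted
```

Checking for UTF-8 before Windows-1252 matters, because almost any byte string decodes as Windows-1252 without error, so testing it first would garble text that was already valid UTF-8.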
I converted the first and last names in the Person table to UTF-8.
I converted the following items from MemberSubmissions to UTF-8. I checked the first one before doing the rest, and I checked a few random ones after running the conversion. Each one checked out fine.