Comments
-
I think it has something to do with storage and the fact that there are billions of characters out there in all kinds of languages. Not sure though, but I've read somewhere that UTF-8 reserves 2 to 4 bytes for each character. And UTF-16 even more. Something with databases? But to be honest: I haven't got a clue.
But everything exists for a reason and everything has pros and cons. Is there a charset expert in the house?
-
somebody733 8y
@kanduvisla UTF-8 has variable character length. US-ASCII characters take one byte. Extended Latin characters like 'čřá' take two bytes. Some Asian characters can take four bytes. The maximum is six, I think. So depending on what you write, there is an overhead. Processing is also a little bit harder due to the variable length, but it is nice from a backward-compatibility point of view. UTF-16 is fixed length, but uses two bytes all the time, even for 7-bit ASCII. Similarly UTF-32. So for some Asian countries it might be more effective to use UTF-16, as they would use less space and it is easier to process. Maybe there will be a case for UTF-32 as well. And old encodings like ISO8859-[2-...] managed to squeeze all the characters people cared about into one byte.
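To make those byte counts concrete, here is a minimal Python sketch using the standard codecs. The sample characters other than 'č' (taken from the comment above) are illustrative picks, not from the thread:

# Byte counts for a few characters under the common Unicode encodings.
# 'a' = ASCII, 'č' = Latin Extended (from the comment above),
# '€' = a BMP symbol, '𠜎' = a CJK character outside the BMP (illustrative picks).
samples = ["a", "č", "€", "𠜎"]

for ch in samples:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-le")   # little-endian, so no 2-byte BOM skews the count
    utf32 = ch.encode("utf-32-le")
    print(f"U+{ord(ch):04X} {ch!r}: "
          f"UTF-8={len(utf8)}, UTF-16={len(utf16)}, UTF-32={len(utf32)} bytes")

This prints 1/2/4 bytes for 'a', 2/2/4 for 'č', 3/2/4 for '€', and 4/4/4 for '𠜎'. The last character sits outside the Basic Multilingual Plane, so even UTF-16 needs four bytes (a surrogate pair) for it, meaning UTF-16 is not strictly fixed-width either.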
-
@miska thanks! By the way, reading your comment, it looks like the devRant database has some encoding issues as well. 😁
-
somebody733 8y
@kanduvisla I just put examples from my native language there :-) Those are real characters. I don't have the means to type kanji :-D I would have to search and copy-paste.
-
Why isn't UTF-16 the standard? Stop oppressing the Mandarin and Cyrillic typefaces, shit lord.
Related Rants
FUCKING ENCODING SHITHEADS!
WHY ISN'T UTF-8 THE FUCKING DEFAULT EVERYWHERE!
§¥~@#&•…≈!
encoding