When will windows finally not fuck up its filename encodings?

You are currently reading a thread in /g/ - Technology

Thread replies: 52
Thread images: 19
File: 1460636319101.jpg (474 KB, 2352x1563)
When will windows finally not fuck up its filename encodings?
>>
When Maki stops being a slut
>>
>>54033806
a.k.a never
>>
Never because backwards compatibility
>>
>>54033755
MAKI IS CUUUUUUTE
>>
>>54033755
I want to sniff Maki's armpits.
>>
File: 1446968710018.jpg (45 KB, 800x1200)
>>54033755
What are you referring to specifically? The Windows filename encoding is UTF-16 (or uninterpreted UCS-2 code units, if you want to be pedantic), which is fucked up, but arguably not as fucked up as OS X, which uses normalized Unicode in NFD, or Linux, where filenames are usually interpreted as UTF-8 but can be interpreted in any 8-bit codepage.

Or are you referring to other Windows filename quirks, like the long list of invalid characters and filenames or the weird DOS-inspired restrictions like not being able to end a filename with a dot or a space?
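
A minimal sketch of the normalization point, in Python (the thread has no code of its own, so the language and the filename are assumptions for illustration). It only shows how NFC and NFD differ for the same visible name; it does not reproduce what any particular filesystem does.

```python
import unicodedata

# Hypothetical filename: "café.txt" with a precomposed é (U+00E9).
name = "caf\u00e9.txt"

nfc = unicodedata.normalize("NFC", name)  # é stays a single code point
nfd = unicodedata.normalize("NFD", name)  # é becomes e + U+0301 (combining acute)

print(len(nfc), len(nfd))             # 8 9
print(nfc.encode("utf-8"))            # b'caf\xc3\xa9.txt'
print(nfd.encode("utf-8"))            # b'cafe\xcc\x81.txt'
print(len(nfc.encode("utf-16-le")))   # 16, two bytes per code unit
print(nfc == nfd)                     # False, although both render the same
```

A program that compares names byte for byte sees two different files here, which is why a filesystem that normalizes behind your back can surprise you.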
>>
>>54033927
>but arguably not as fucked up as utf8

Fuck off mate. You are nothing. utf8 is perfect because it supports all programs working with null terminated strings made out of octets.

Want to know why Windows fucks up its filename encodings?

It's because for every Windows API function that involves filenames, there are two versions of it - A and W. W calls expect the text to be in utf16 format, A expect filenames to be in locale encoding. Both variants are fully supported and a very large amount of existing software uses A calls. That makes it impossible for such software to access files with characters outside of the locale codepage, and makes Windows produce files with fucked up filenames when you want to create a file with characters outside of your codepage.

It could all be averted if they made a working utf8 codepage. But they won't. They want it to be broken.
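
A rough illustration of why the A path loses information, sketched in Python. The helper `to_code_page` is hypothetical and only mimics a lossy conversion with Python's own codecs (which substitute `?`); the real Windows conversion has its own best-fit rules, so treat this as an analogy, not a reproduction.

```python
def to_code_page(name: str, code_page: str = "cp1252") -> bytes:
    """Encode a filename to a legacy code page, replacing unmappable characters with '?'."""
    return name.encode(code_page, errors="replace")


# Hypothetical filename: "マキ.txt" (two katakana characters).
name = "\u30de\u30ad.txt"

print(to_code_page(name))            # b'??.txt', the name is gone under a Western code page
print(to_code_page(name, "cp932"))   # representable under the Japanese code page
print(name.encode("utf-16-le"))      # the form a W call would work with, nothing lost
```

Once the name has been squeezed through the wrong code page, there is no way to get the original characters back.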
>>
>>54033927
I was always under the impression that UTF8 was better than UTF16.
>>
>>54034007
It is worse for Asian languages, because you end up using 3 bytes per character as opposed to 2 with utf16. For pretty much anything else, including text-based formats like xml or html where the markup itself is ASCII, utf8 is massively better. utf8 is also better because >it supports all programs working with null terminated strings made out of octets.
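
The size trade-off is easy to check. A small Python sketch, with made-up sample strings, comparing encoded lengths:

```python
def sizes(text: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for the given text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


japanese = "\u65e5\u672c\u8a9e"        # "日本語", three BMP characters
html = "<p>" + japanese + "</p>"       # the same text wrapped in ASCII markup

print(sizes(japanese))   # (9, 6): UTF-16 wins on bare CJK text
print(sizes(html))       # (16, 20): the ASCII markup tips it back to UTF-8
print(sizes("hello"))    # (5, 10)
```

The more ASCII markup surrounds the text, the more UTF-8 pulls ahead, even for Asian-language content.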
>>
File: 1460250750035.jpg (82 KB, 608x857)
>>54034005
>>54034007
Woah, woah, I didn't mean to imply there was anything wrong with UTF-8. There isn't. What's bad is that Linux allows you to use different codepages, which means programs can't necessarily interpret every filename as UTF-8.

>It's because for every Windows API function that involves filenames, there are two versions of it - A and W. W calls expect the text to be in utf16 format, A expect filenames to be in locale encoding.
Yeah, I agree, that's fucked. I think the technical reason for why they can't just make a UTF-8 codepage is because the CRT has some built-in assumptions about the number of characters in a multibyte sequence (in functions like _mbclen()), but I'm sure plenty of developers would accept a small amount of source-level incompatibility for the addition of UTF-8 support in a new CRT.

It will probably always be fucked though because Microsoft is incompetent. The perfect opportunity to do that thing would have been the release of the UCRT, but they missed it.
>>
>>54034206
>is because the CRT has some built-in assumptions about the number of characters in a multibyte sequence
This is wrong because shift jis, codepage 932 already has variable length characters.

>Linux allows you to use different codepages, which means programs can't necessarily interpret every filename as UTF-8.
De facto everything on linux these days uses utf8 so there literally is no problem on linux. It is not fucked up. Do you expect the kernel to check every string and panic when it's not utf8, or something?
>>
File: 1449848232262.gif (828 KB, 500x375)
>>54034276
>This is wrong because shift jis, codepage 932 already has variable length characters.
Well, "variable" as in either one or two bytes. UTF-8 can have up to four bytes per character, but a number of Windows APIs and CRT functions assume that multibyte character sets (referred to in places as double-byte character sets) only ever have one- and two-byte characters. Adding UTF-8 support to the Windows API or CRT would break a handful of programs that make this assumption.
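
To make the byte counts concrete, here is a Python sketch using Python's `cp932` codec as a stand-in for the Windows code page (an assumption, the two are close but not guaranteed identical). The sample characters are arbitrary.

```python
def encoded_len(ch: str, encoding: str) -> int:
    """Number of bytes one character occupies in the given encoding."""
    return len(ch.encode(encoding))


samples = ["A", "\u30de", "\U0001F600"]  # 'A', katakana 'マ', an emoji outside the BMP

for ch in samples:
    utf8_len = encoded_len(ch, "utf-8")
    try:
        sjis_len = encoded_len(ch, "cp932")
    except UnicodeEncodeError:
        sjis_len = None  # not representable in this code page at all
    print(f"U+{ord(ch):04X}: utf-8={utf8_len} cp932={sjis_len}")
```

Code written around a "lead byte plus at most one trail byte" model handles the cp932 column fine and falls over on the three- and four-byte UTF-8 cases.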

>De facto everything on linux these days uses utf8 so there literally is no problem on linux. It is not fucked up.
That's the problem. Yes, a program that assumes filenames are valid UTF-8 will be able to open most files, but it wouldn't be technically correct and a (possibly malicious) user could force it to fail by giving it a filename in another codepage.

This is definitely a problem when programmers assume Qt/Python/Javascript strings can hold any valid Linux filename.

>Do you expect the kernel to check every string and panic when it's not utf8, or something?
It could just refuse to create the file or replace unrecognised sequences with U+FFFD.
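
For what it's worth, the three options on the table (reject, replace with U+FFFD, or smuggle the raw bytes through) can be sketched in Python. The byte string is a made-up Latin-1 filename; as far as I know Python's own `os.fsdecode` takes the third route on POSIX via the `surrogateescape` error handler.

```python
# Hypothetical filename bytes: "café.txt" in Latin-1, which is not valid UTF-8.
raw = b"caf\xe9.txt"

# Option 1: strict decoding, i.e. treat the name as an error.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Option 2: replace the bad byte with U+FFFD. Readable, but lossy.
replaced = raw.decode("utf-8", errors="replace")

# Option 3: map the bad byte to a lone surrogate so it can be restored later.
escaped = raw.decode("utf-8", errors="surrogateescape")
restored = escaped.encode("utf-8", errors="surrogateescape")

print(strict_ok)         # False
print(repr(replaced))    # 'caf\ufffd.txt'
print(repr(escaped))     # 'caf\udce9.txt'
print(restored == raw)   # True
```

Only the third option lets the program open the file again afterwards, at the cost of carrying a string that is not valid Unicode and will blow up if you try to print or re-encode it strictly.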
>>
>>54034458
How does this "problem" manifest itself? Has your experience ever been hampered by this "problem"?
>>
I'm not even here for the thread I'm here for maki
>>
File: 1446964321672.png (281 KB, 700x857)
>>54034487
As a user, not that I can remember. As a developer, sometimes it's been a bit annoying writing programs that are correct when working with filenames that don't match the system codepage. I think it's also part of the reason why Python 3 uptake is slow. It's not a huge problem, but I still wouldn't say Linux does filename encoding perfectly.

>>54034500
Sorry fampai, I'm out. I only have like three Makis on this computer.
>>
>>54034584
No one does that. Assume files are in utf8. If they are not, they are broken.
>>
>>54034607
Some people do that, including me.

>If they are not, they are broken.
Obviously, but my programs should still be correct for broken files.
>>
>>54034620
Enjoy your time well spent.
>>
>>54034458
>codepage
What a fucking joke. GNU/Linux uses locales. The only real ones in use are *UTF-8 and *ISO-8859-6.
However, the kernel itself doesn't give two fucks about file encoding. EVERY byte is completely valid except NUL and '/' (directory separator). There should be no conversions going on.
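
That rule is short enough to write down. A Python sketch of a checker for a single path component; the function name is made up, the non-empty check is my addition, and things like filesystem length limits and the special `.` and `..` entries are ignored.

```python
def is_valid_linux_name(component: bytes) -> bool:
    """True if the bytes could be one filename component: non-empty, no NUL, no '/'."""
    if not component:
        return False
    return b"\x00" not in component and b"/" not in component


print(is_valid_linux_name(b"caf\xe9\n.txt"))  # True: invalid UTF-8 and a newline are both allowed
print(is_valid_linux_name(b"a/b"))            # False: '/' is the separator
print(is_valid_linux_name(b"a\x00b"))         # False: NUL terminates the string
```

Note that nothing in the check mentions an encoding at all.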
>>
>>54034631
Will do, famalam.

It's kind of like writing shell scripts that can handle filenames with newlines in them. Only a psychopath would have such things in their filesystem, but the script is not technically correct unless it can handle it, and you _are_ given the proper tools to do so (like find -print0.)
>>
I want to impregnate makis feet
>>
>>54034673
>ISO-8859-6
I mean ISO-8859 in general.
>>
>>54034677
print0 is useful for spaces in filenames, and spaces in filenames are common. In this way, doing what you are doing is not at all like using find -print0 in your scripts.
>>
>>54034584
>>54034500
Rin = The Great Leader > Nico > * > shit >>>>>> Maki
>>
>>54034673
>There should be no conversions going on.
That's very kernel-minded though. Yes, kernel and low-level system code can probably be made quite clean by treating filenames as opaque byte strings, but kernels are made to run programs, and programs might be written in (possibly shit) languages like Python 3 or JavaScript, which assume all filenames match the system codepage. Because of this, it's probably a mistake for a user to have a file on their computer that doesn't match the system codepage, and it goes without saying that it's a mistake to not have your codepage set to UTF-8, so I think there's definitely an argument to be made for disallowing invalid UTF-8 in filenames.

>>54034705
Yes, but filenames could be newline separated if newlines weren't a valid filename character. Newline-separation is easier to deal with in shell scripts, but only NUL separation is correct.
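
A sketch of the difference in Python, using hand-built byte strings in place of real `find -print` and `find -print0` output (the filenames are invented, one of them contains a newline on purpose):

```python
def split_nul(data: bytes) -> list:
    """Split NUL-terminated records, as produced by tools like find -print0."""
    return [part for part in data.split(b"\0") if part]


names = [b"./a.txt", b"./evil\nname.txt", b"./b c.txt"]

nul_output = b"".join(n + b"\0" for n in names)      # shape of find -print0 output
newline_output = b"".join(n + b"\n" for n in names)  # shape of find -print output

newline_parts = [part for part in newline_output.split(b"\n") if part]

print(split_nul(nul_output) == names)  # True: every name survives intact
print(len(newline_parts))              # 4: the newline name was cut in two
```

In a real script the bytes would come from something like `subprocess.run(["find", ".", "-print0"], capture_output=True).stdout`, and the same `split_nul` would apply.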
>>
>>54033755
When will weebs grow up?
>>
Maki!
>>
File: ss+(2016-03-03+at+09.53.15).jpg (62 KB, 595x623)
MAKI!!!
>>
Maki
>>
Hello?
>>
fucking animeposters are ruining this country
>>
File: japanese anime.png (1 MB, 1280x720)
>>54037664
>>
>>54033755
Daily reminder that Maki literally has nothing to do with /g/.
>>
>All these Maki posters
This is why /g/ is my favorite board. Stay classy.
>>
File: 1427336437211.jpg (1 MB, 1231x1721)
my kind of girl
>>
>>54033755
When u stop posting anime
>>
File: 1424501426020.jpg (2 MB, 2280x3107)
>>
File: 1425426571249.jpg (273 KB, 1200x528)
>>
File: 1421350426227.jpg (224 KB, 982x1678)
/g/ - Maki
>>
Why does /g/ love Maki? Serious question. I tried to watch Love Live but it's so fucking boring
>>
File: 48755401_p0.jpg (83 KB, 766x766)
>>54038259
>>
When weeb cancer stops shitposting on /g/.
>>
>>54033927
W-why is she wearing such a lewd shirt?
>>
File: Screenshot_2016-03-22-21-08-46.png (481 KB, 1280x720)
>>54038259
I imagine a lot of us play the mobile game 2bh famicom
>>
File: 1425855599345.jpg (200 KB, 796x800)
>>54038750
won't run on my phone
>>
File: 1408341536249.jpg (432 KB, 700x764)
>>54038827
Yeezuz what kind of fone do you have? I just have a shitty S3 and it runs perfect
>>
File: 48215453_p0.jpg (74 KB, 700x575)
>>54038882
BLU Advance 4
>>
File: 1446054295792.png (1 MB, 1280x720)
>>54038750
>>54038827
it requires google apps :((
>>
>>54038750
what game is this?
>>
File: 1460415421133.jpg (96 KB, 1280x720)
tfw you'll never get to play school idol festival with your cute /g/ friends
This is a 4chan archive - all of the content originated from them. If you need IP information for a Poster - you need to contact them. This website shows only archived content.