Any regex wizards here? How come this expression: (?:")?(\S+)(?:")?

You are currently reading a thread in /g/ - Technology

Thread replies: 18
Thread images: 3
Any regex wizards here?

How come this expression:
(?:")?(\S+)(?:")?


Given this text:
"dicks" dicks "dicks


Results in:
MATCH 1
1. [1-7] `dicks"`
MATCH 2
1. [8-13] `dicks`
MATCH 3
1. [15-20] `dicks`


As you can see, the first match has a trailing " even though the trailing " is in a non-capturing group. What's weirder is that it works fine for the leading ".
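
For anyone who wants to reproduce this locally, here is a minimal sketch using Python's built-in re module (Python is what the poster later says they are using). The names PATTERN, TEXT and captures are just illustrative, and the spans are 0-based start/end offsets of capture group 1.

```python
import re

PATTERN = re.compile(r'(?:")?(\S+)(?:")?')
TEXT = '"dicks" dicks "dicks'

# Collect (start, end, text) of capture group 1 for every match.
captures = [(m.start(1), m.end(1), m.group(1)) for m in PATTERN.finditer(TEXT)]

for start, end, word in captures:
    print(f"[{start}-{end}] {word}")
```

The first capture still carries the trailing quote, which matches the result posted above.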
>>
>>54371705
[a-zA-Z]+?(\S+)(?:")?
>>
>>54371731
Won't work, and it also eats the first character. What I'm trying to achieve is to capture words where the quotation marks are optional: they can be at the end of the word, at the beginning of the word, both, or not there at all.
>>
>>54371705
regexr.com
>>
>>54371762
Oh, gotcha. I thought you were trying to capture just the text.
>>
File: special_char_regex.png (107 KB, 1212x3296)
>>54371705
(?:")?

and
(\S+)(?:")?

are still trying to
(?:")

Which is why "dicks(") and "dicks() return dicks(").
>>
>>54371938
Ohh, I actually get it: the (\S+) is greedy and matches the trailing " as well, so the last (?:")? is left with nothing to match. That makes sense. Now I just need to figure out a way to exclude that ".
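
One way to see this for yourself, sketched in Python: temporarily turn the trailing quote into a capturing group and check whether it ever takes part in the match. The probe pattern is only a debugging aid made up for this illustration, not something proposed in the thread.

```python
import re

TEXT = '"dicks" dicks "dicks'

# \S means "any non-whitespace character", and a double quote qualifies.
quote_is_non_space = re.fullmatch(r'\S', '"') is not None

# Same pattern, but with the trailing quote captured (group 2) purely to
# observe whether it ever gets to match anything.
probe = re.compile(r'(?:")?(\S+)(")?')
first = probe.search(TEXT)

print(quote_is_non_space)
print(repr(first.group(1)))  # what the greedy \S+ consumed
print(repr(first.group(2)))  # None means the optional quote matched nothing
```

Because \S+ already consumed the closing quote and the overall match still succeeds, the engine has no reason to backtrack and hand that quote to the optional group.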
>>
>>54372041
(?:")?(\S+?)(?:")?

If your language's regex engine supports it, that is.
>>
>>54372131
>>54372041
Wait, no, that's wrong.
It would really help to know what you're trying to achieve, but maybe this would be better:

(?:")?([^\s"]+)(?:")?
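
A quick Python sketch comparing the two attempts on the sample text. Python's re does support the lazy quantifier, so the first pattern compiles fine; it just does not do what was hoped. The variable names are illustrative.

```python
import re

TEXT = '"dicks" dicks "dicks'

# Lazy version: \S+? takes as little as possible, and since the trailing
# quote is optional, nothing forces it to take more than one character.
lazy = re.findall(r'(?:")?(\S+?)(?:")?', TEXT)

# Negated class: the word itself may not contain whitespace or quotes,
# so the surrounding quotes are left for the optional groups.
negated = re.findall(r'(?:")?([^\s"]+)(?:")?', TEXT)

print(lazy)
print(negated)
```

The lazy pattern ends up returning the words one character at a time, while the negated character class returns whole words without quotes.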
>>
Not helping you with your homework, week 9 Aussie.
>>
>>54372176
OOOOOOOOOOOOOOOOOOOOOOHHH SHIT SON CALLED THE FUCK OUT
>>
>>54372147
That's perfect! Works as expected, thanks mate. (What I was trying to achieve was >>54371762).

>>54372176
Not homework. I'm writing a simple bulletin board and adding support for basic BBCodes using re.sub in Python.
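
Since the stated goal is re.sub, here is a rough sketch of how the accepted pattern could be plugged into it. The thread never shows the actual BBCode rules, so rewrite_words and its template argument are hypothetical stand-ins for whatever replacement the board really needs.

```python
import re

# Pattern from the accepted answer in this thread.
WORD = re.compile(r'(?:")?([^\s"]+)(?:")?')

def rewrite_words(text: str, template: str = r'\1') -> str:
    """Replace every (optionally quoted) word using a re.sub template.

    The default template keeps only group 1, i.e. it strips the quotes.
    Hypothetical helper: the real replacement rules are not in the thread.
    """
    return WORD.sub(template, text)

print(rewrite_words('"dicks" dicks "dicks'))
print(rewrite_words('"dicks" dicks "dicks', r'<b>\1</b>'))
```

Whitespace between words is untouched because the pattern never matches it.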
>>
>>54372147
Alternatively, you can just list all 4 of your cases:

("\S+"|"\S+|\S+"|\S+)


But the problem with this is that the capturing group would include the quotes.
You can create 4 capturing groups:

(?:"(\S+)"|"(\S+)|(\S+)"|(\S+))


The problem with this is you'd have to handle 4 groups in code.
Finally, there is the "branch reset" pattern:

(?|"(\S+)"|"(\S+)|(\S+)"|(\S+))


It creates just one group, but not all regex implementations support it.

The problem with my solution in >>54372147 is that if you want to allow quote characters inside words (for whatever reason), you can't.
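
A sketch of what "handling 4 groups in code" could look like in Python, plus a check for branch reset. As far as I know the standard re module rejects (?| with an error, so the check is written defensively. The helper names words and supports_branch_reset are made up for this example.

```python
import re

FOUR_GROUPS = re.compile(r'(?:"(\S+)"|"(\S+)|(\S+)"|(\S+))')

def words(text):
    """Return the word from whichever of the four groups took part in each match."""
    return [next(g for g in m.groups() if g is not None)
            for m in FOUR_GROUPS.finditer(text)]

def supports_branch_reset():
    """Try to compile the branch-reset variant; report whether the engine accepts it."""
    try:
        re.compile(r'(?|"(\S+)"|"(\S+)|(\S+)"|(\S+))')
    except re.error:
        return False
    return True

print(words('"dicks" dicks "dicks'))
print(words('"don"t"'))                               # quote inside the word is kept
print(re.findall(r'(?:")?([^\s"]+)(?:")?', '"don"t"'))  # negated class splits it
print(supports_branch_reset())
```

The last two word examples illustrate the trade-off mentioned above: the alternation keeps a quote that sits inside a word, while the negated-class pattern splits the word at it.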
>>
>>54372041
If it is something you think others might have automated before, this website
commandlinefu.com
might already have it figured out for you.

Good luck.
>>
File: a.png (22 KB, 296x243)
?
>>
>>54372305
Also, is there really a use for putting a single possible match into a non-capturing group?
>(?:")
(?:) just means "don't capture this"; it has nothing to do with making anything optional. It's the ? after the group that makes it optional, and for a single character a plain "? does the same job.
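
To make that point concrete, a small Python comparison: the non-capturing group around the quote changes nothing, whereas a capturing group around it changes what findall returns. Variable names are illustrative.

```python
import re

TEXT = '"dicks" dicks "dicks'

# With and without the non-capturing wrapper around the quote.
with_group = re.findall(r'(?:")?([^\s"]+)(?:")?', TEXT)
bare = re.findall(r'"?([^\s"]+)"?', TEXT)

# Capturing groups around the quotes add two extra groups per match;
# findall reports groups that did not take part as empty strings.
capturing = re.findall(r'(")?([^\s"]+)(")?', TEXT)

print(with_group == bare)
print(capturing)
```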
>>
>>54371762
\"?[A-Za-z]+\"?
>>
>>54373446
To capture apostrophes as well (e.g. "you're"):
\"?[A-Za-z\']+\"?
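
One thing to watch with this last pattern: as written it has no capturing group, so the quotes end up in the match itself. A sketch in Python, where the second pattern adds parentheses around the character class; that addition is my assumption about what was intended, not something the poster wrote.

```python
import re

TEXT = '"you\'re" welcome "mate'

# Pattern exactly as posted: the whole match includes any quotes.
whole = re.findall(r'\"?[A-Za-z\']+\"?', TEXT)

# Assumed variant: capture only the letters and apostrophes.
inner = re.findall(r'\"?([A-Za-z\']+)\"?', TEXT)

print(whole)
print(inner)
```

Note that a letters-only class also skips digits and other punctuation, unlike the [^\s"]+ version earlier in the thread.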