Guys I have an interesting coding problem for you that I'm

Thread replies: 31
Thread images: 1

File: coding.jpg (326 KB, 1438x809)

Anonymous 2016-06-20 20:15:10 Post No. 8154446 [Report]

Guys I have an interesting coding problem for you that I'm really stuck with.

Say I have 2 lists of names. Both are in random order, with a few names appearing in both. I need to find out which names are in list B but not list A. How would I do this in Excel or MATLAB?

To complicate matters further some entries aren't 100% similar but are very similar. For example, in list A a name might be Andrew B. Cosby and in list B just Andrew Cosby, but obviously this is a match and should not be in my answer list.

Thanks guys!

>>

Anonymous 2016-06-20 20:21:06 Post No.8154453 [Report]

>>8154446
Do you know any other coding languages? I don't think Excel (and I don't know about MATLAB) is the best way to handle lists of strings.

>>

Anonymous 2016-06-20 20:23:37 Post No.8154456 [Report]

>>8154446
>To complicate matters further some entries aren't 100% similar

I don't know what you mean by that, but you can use the Levenshtein distance with a tolerance level to find similar but not identical strings.

about the other thing,

repmat(A, N_b,1) - repmat(B,N_a,1)'

the zeros are your duplicates.
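
For anyone wanting to try the Levenshtein idea above, here is a minimal sketch in Python (chosen purely for illustration, not MATLAB). The function names `levenshtein` and `is_close` and the default tolerance of 3 are assumptions, not anything from the thread; the tolerance in particular would need tuning on real data.

```python
def levenshtein(s: str, t: str) -> int:
    """Edit distance: the minimum number of single-character insertions,
    deletions, or substitutions needed to turn s into t."""
    # prev[j] is the distance between the already-processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute, or keep if equal
        prev = curr
    return prev[-1]


def is_close(a: str, b: str, tolerance: int = 3) -> bool:
    # tolerance is a hypothetical cutoff, not a recommended value.
    return levenshtein(a.lower(), b.lower()) <= tolerance
```

With these definitions, "Andrew B. Cosby" and "Andrew Cosby" are 3 edits apart (the dropped "B. "), so a tolerance of 3 treats them as a match. A fixed tolerance can also merge genuinely different short names, so it is worth checking the matches by eye.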

>>

Anonymous 2016-06-20 20:24:28 Post No.8154457 [Report]

>>8154446
If you have data like that you need to standardize it first

>>

Anonymous 2016-06-20 20:27:37 Post No.8154461 [Report]

>>8154453
I don't really, unfortunately. I'm a maths student, so my use of coding is limited to calculations in MATLAB. Do you not think MATLAB or Excel could handle something like this?

>>

Anonymous 2016-06-20 20:29:15 Post No.8154462 [Report]

>>8154457
Would make for slow going, but I think you're right. Any ideas how to do it with standardised data?

>>

Anonymous 2016-06-20 20:30:35 Post No.8154465 [Report]

>>8154456
What language is this?

>>

Anonymous 2016-06-20 20:40:25 Post No.8154479 [Report]

>>8154462
I mean if you've standardized the data you can just use sets. Or a terrible ugly for loop.

>>

Anonymous 2016-06-20 20:45:19 Post No.8154486 [Report]

>>8154465

matlab

>>

Anonymous 2016-06-20 20:49:42 Post No.8154491 [Report]

>>8154479
What command would you use to compare an element of A with an element of B?
>>8154486
What are the inputs in this case?

>>

Anonymous 2016-06-20 20:55:52 Post No.8154503 [Report]

>>8154491

A, your first list, B, your second list, N_a, length of A, N_b, length of B

you're a math student and you've never used repmat?

>>

Anonymous 2016-06-20 20:58:51 Post No.8154512 [Report]

>>8154503
Nope, I'll give it a browse

Thanks all for your help, if this works I'll share some of the £65k with you!

>>

Anonymous 2016-06-20 21:08:41 Post No.8154524 [Report]

>>8154512
>£65k

pfff yea right

>>

Anonymous 2016-06-20 23:22:56 Post No.8154774 [Report]

uninteresting programming problem in a shit language

>>

Anonymous 2016-06-21 02:21:33 Post No.8155045 [Report]

>>8154446
First of all you should sort your fucking data.
After that it's pretty simple:
Compare the first letter of A[0] with the first letter of B[n].
If it's a match, compare the names (just first and last); if the names match, record the name / remove it from the list.
Else break the loop and move on to A[1].
This is probably the simplest, but it won't be terribly fast.

>>

Anonymous 2016-06-21 02:27:27 Post No.8155050 [Report]

>>8154446

Sounds like a job for setdiff.

>>

Anonymous 2016-06-21 02:29:56 Post No.8155054 [Report]

it depends how your data is formatted. if you are using VBA for Excel you can use the Left() function and compare the first n characters.

>>

Anonymous 2016-06-21 05:07:08 Post No.8155252 [Report]

>>8154446
Use sets in Python
Set B - ( Set A ∩ Set B )
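
A minimal sketch of that set expression in Python, for exact matches only. The example names are made up for illustration. Subtracting the intersection gives the same result as the plain difference `set_b - set_a`, which the snippet checks.

```python
# Hypothetical example data; exact string matches only.
list_a = ["Andrew Cosby", "Brenda Smith", "Carl Jones"]
list_b = ["Carl Jones", "Dana White", "Andrew Cosby", "Evan Black"]

set_a, set_b = set(list_a), set(list_b)

# B minus (A intersect B), as written in the post.
only_in_b = set_b - (set_a & set_b)

# The plain set difference gives the same answer.
assert only_in_b == set_b - set_a

print(sorted(only_in_b))  # ['Dana White', 'Evan Black']
```

Sets drop duplicates and ordering, so sort the result (or filter the original list) if the order matters.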

>>

Anonymous 2016-06-21 05:34:54 Post No.8155280 [Report]

>>8155252
He needs to normalize the data first so that equivalent names are equal

>>

Anonymous 2016-06-21 11:26:35 Post No.8155650 [Report]

>>8155280
sha4096

>>

Anonymous 2016-06-21 11:43:21 Post No.8155668 [Report]

>>8154446
Matlab is bad at this because it's a shit language (with shit string support), but you can do something like this:
First go through both lists of names and convert them to upper (or lower) case, while also removing things like the B. in your Andrew Cosby example (a good way to do this would probably be to take the first and last word).
After that, use the appropriate set operations on the lists.
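
The two steps above (normalize, then set operations) could be sketched like this in Python. The helper names `normalize` and `names_only_in_b` are hypothetical, and "keep the first and last word" is only one possible normalization rule.

```python
def normalize(name: str) -> str:
    """Lowercase a name and keep only its first and last word."""
    words = name.lower().split()
    if len(words) < 2:
        return " ".join(words)
    return f"{words[0]} {words[-1]}"


def names_only_in_b(list_a, list_b):
    """Names from list_b whose normalized form does not occur in list_a."""
    keys_a = {normalize(n) for n in list_a}
    seen = set()
    result = []
    for name in list_b:
        key = normalize(name)
        # Keep B's original spelling and order, skipping repeats within B.
        if key not in keys_a and key not in seen:
            seen.add(key)
            result.append(name)
    return result
```

For example, `names_only_in_b(["Andrew B. Cosby"], ["andrew cosby", "Dana K. White"])` returns `["Dana K. White"]`. The rule also merges "John A. Smith" with "John B. Smith", so whether it is safe depends on the data.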

>>

Anonymous 2016-06-21 11:54:01 Post No.8155680 [Report]

post it to Mechanical Turk for peanuts, your time clearly is more valuable

alternatively if your sets are really big make it into a captcha and let people do it for free

>>

Anonymous 2016-06-21 12:25:08 Post No.8155709 [Report]

>>8154491
FOR EACH X NOT LISTC()[] IN LISTA {
LISTD [] = X
}

Listc() {
For each X in LISTB[] {
LISTC [] = "*" & X & "*"
}
}

hisssss :^)

>>

Anonymous 2016-06-21 12:30:03 Post No.8155717 [Report]

>>8155709
[code]
void find_missing {
FOR EACH X NOT LISTC()[] IN LISTA {
LISTD[] = X
}
}

static array listc()[] {
For each Y in LISTB[] {
LISTC[] = "*" & Y & "*"
}
}
[/code]

There's some python for you.

>>

Anonymous 2016-06-21 12:31:53 Post No.8155722 [Report]

>>8155717
>>8155709
Don't do this, it makes mustard gas

But really this will infinitely loop and segfault Windows. 9/10

>>

Anonymous 2016-06-21 12:38:29 Post No.8155736 [Report]

not sure i can think of a non O(n^2) way to do it.

just go one by one through list b, checking each value of list a. you should also write an isSimilar() method that takes two names, splits them on whitespace and compares the first and last values (names).
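
A rough Python sketch of that pairwise approach. The post only names `isSimilar()`; the snake_case spelling `is_similar`, the case-insensitive comparison, and the helper `only_in_b` are assumptions added here.

```python
def is_similar(name1: str, name2: str) -> bool:
    """True when both names share the same first and last word."""
    w1, w2 = name1.lower().split(), name2.lower().split()
    if not w1 or not w2:
        return False  # treat empty names as matching nothing
    return w1[0] == w2[0] and w1[-1] == w2[-1]


def only_in_b(list_a, list_b):
    # Every name in B is checked against every name in A until a match is
    # found, so this is O(len(list_a) * len(list_b)) comparisons at worst.
    return [b for b in list_b if not any(is_similar(a, b) for a in list_a)]
```

For lists of a few thousand names the quadratic cost is usually tolerable, and a pairwise predicate like this is easy to swap for a fuzzier comparison later.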

>>

Anonymous 2016-06-21 15:20:36 Post No.8155997 [Report]

>>8155736
>O(n^2) way to do it.

Concatenate the lists in 1
Sort the list in N log N
Run through the list and check neighbors in N.

There you go, N log N solution

If the lists are already sorted it's an N solution.
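
The concatenate, sort, and check-neighbors steps above can be sketched in Python as follows, for exact matches only. Tagging each name with its source list ("A" or "B") is an assumption added here so the scan can tell which list a run of equal names came from; `only_in_b_sorted` is a hypothetical name.

```python
def only_in_b_sorted(list_a, list_b):
    """Names that occur in list_b but not in list_a, in sorted order."""
    # Tag each name with its source and sort the combined list: O(N log N).
    tagged = sorted([(name, "A") for name in list_a] +
                    [(name, "B") for name in list_b])
    result = []
    i = 0
    while i < len(tagged):
        # After sorting, equal names are adjacent; scan one run of them.
        j = i
        sources = set()
        while j < len(tagged) and tagged[j][0] == tagged[i][0]:
            sources.add(tagged[j][1])
            j += 1
        if sources == {"B"}:  # the name never appeared in A
            result.append(tagged[i][0])
        i = j
    return result
```

The scan after sorting touches each entry once, so the sort dominates the running time.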

Fucking noobs

>>

Catalyst Scwartzwald 2016-06-21 16:55:48 Post No.8156148 [Report]

>>8154446
Perl has some lovely regular expressions and this amazing data structure known as a hash for just this sort of thing. I encourage you to look it up, even if it's the legacy of legacy.

Python has similar stuff going on, but regexp in Python is a little bit less intuitive for me (please don't ask me how /// is easier than regexp). And the closest thing to a hash in Python is the dict.

Matlab has very poor regexp support from what I understand, even though I like it.

You have yourself there a week 1 day 5 regexp problem in Perl

>>

Anonymous 2016-06-21 17:29:36 Post No.8156211 [Report]

>>8154446
Seeing how you are thinking about excel or matlab you probably don't give a shit about time complexity.

Store both lists as simple arrays.

Take a name from list B and compare it to literally every other member in list A. If there is no match (track this with a boolean) then you output this name.

Repeat this for every member in list B and there you have it.

Assuming lists of the same size this is just O(n squared) so it is not absolutely shit, but is literally as bad as you can do.

>>

Anonymous 2016-06-21 19:37:45 Post No.8156388 [Report]

in R, only considering exact matches:

unique(B[! B %in% A])

>>

Anonymous 2016-06-21 19:50:31 Post No.8156410 [Report]

>>8156148
>please dont ask me how /// is easier than regexp
it's not the syntax that's shit in python's regex, but the implementation.

they recommend you pre-compile your patterns, but have it set up so you can just pass a pattern string instead of a pattern object, but it's caching behind the scenes so there's sometimes no difference in the behavior no matter how you set up the search

it's a great example of horribly planned pre-optimization
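
To illustrate the two call styles being described, here is a small sketch using Python's standard `re` module. The pattern for a middle initial and the sample names are assumptions made up for this example; both styles give the same result, and the module-level functions reuse an internal cache of compiled patterns.

```python
import re

# Hypothetical sample data.
names = ["Andrew B. Cosby", "Andrew Cosby", "Dana K. White"]

# Assumed pattern: a single capital letter, a period, then whitespace.
INITIAL = r"\b[A-Z]\.\s+"

# Style 1: compile once and reuse the pattern object.
initial_re = re.compile(INITIAL)
compiled = [initial_re.sub("", n) for n in names]

# Style 2: pass the pattern string on every call; re caches the compiled
# pattern internally, so it is not rebuilt from scratch each time.
uncompiled = [re.sub(INITIAL, "", n) for n in names]

assert compiled == uncompiled
print(compiled)  # ['Andrew Cosby', 'Andrew Cosby', 'Dana White']
```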