[Boards: 3 / a / aco / adv / an / asp / b / biz / c / cgl / ck / cm / co / d / diy / e / fa / fit / g / gd / gif / h / hc / his / hm / hr / i / ic / int / jp / k / lgbt / lit / m / mlp / mu / n / news / o / out / p / po / pol / qa / r / r9k / s / s4s / sci / soc / sp / t / tg / toy / trash / trv / tv / u / v / vg / vp / vr / w / wg / wsg / wsr / x / y ] [Home]
4chanarchives logo
I have n lists of items, and these items are not numbers and
Images are sometimes not shown due to bandwidth/network limitations. Refreshing the page usually helps.

You are currently reading a thread in /g/ - Technology

Thread replies: 49
Thread images: 1
File: smitho.png (252 KB, 344x392) Image search: [Google]
smitho.png
252 KB, 344x392
I have n lists of items, and these items are not numbers and therefore shouldn't be sorted. First off, this isn't homework (I'm glad those years are long behind me) - it's part of a small side project of mine.

var list = [['Alex', 'Toby', 'Kenneth'], ['Fred'], ['Toby', 'Matt']];


In the above example, there are three lists, each with some number of strings. Some of the lists share items that are identical to other lists items, others have unique strings.

I'm looking for an algorithm that will take one item from each list, and result in a list with as few unique string as possible.

The above example would result in ['Toby', 'Fred'] as both list[0] and list[2] contain 'Toby', and list[1] must use its only item: 'Fred'
>>
>>51932865
Sure, its not your hw anon.
>>
>>51932904

At 32, I can certainly vouch that it isn't.

In fact, the problem I've described in my OP is a simplified explanation - the problem itself has more complexities. I've simply gone with names to make it easier for you guys.
>>
>>51932865
>In the above example, there are three lists
>I'm looking for an algorithm that will take one item from each list
>The above example would result in <two items>
Fuck you
>>
>>51932865
I dont get it. Why is toby included but not matt or kenneth??
>>
>>51932865
can you try to reword your problem a little better, i'm not really understanding what you mean by as few unique strings as possible

are you saying you want the item from each inner list with the least amount of characters, while also not having duplicates?
>>
>>51932978
>>51932963
oh and yeah, why isn't matt included on the list too? i'd also assume it would first check alphabetical order after checking length, and m<t so matt should also be in the result
>>
Why not just flatten the array, and convert to a set?
>>
>>51932963

Because one item needs to be chosen from each list, overall resulting in the least number of unique items. As Toby appears twice, Toby is the logical choice from the two lists containing Toby.

>>51932957
There could be n lists - again, 3 is just an example.
>>
>>51932865
I get what you're trying to do, but know that the way you explained it implies exactly the opposite of what you actually want, and anyone who tries to help you is probably going to offer useless advice because you didn't properly explain the problem
>>
>>51932992

There's no way to ensure one item from each list is chosen that way.

>>51932978
My apologies, see >>51932993

>>51932991
That's not what I'm trying to achieve, see >>51932993

>>51933000
The way I've explained it is exactly what I want.
>>
>>51933016
But what is the second array is ['Toby'], what gets selected? []?
>>
>>51932865
"I want the shortest possible array that contains at least one element from every array in the input"

Was that so hard?

>>51933016
>The way I've explained it is exactly what I want
The way you explained it is mutually exclusive with the example result you gave. You misunderstand the meaning of your own explanation.
>>
>>51933000
A name must be taken from each sublist.

The final list of names I choose should be as small as possible, because I want to double up as much as possible.

If more than one list contains the same name, I want to use it so that my final list of names is less than the number of lists.

In the case of the three lists in the OP, I would like to have two names. One name would be ideal, but there isn't a name that's in all three lists in that example. But there could be.
>>
Iterate list, split the list of lists, pass current list and other lists(as list of lists) to a function.

Check list against list of lists. Return list of uniques(measured against other lists.)

Initial function caches all returned lists and ultimately returns a list of lists containing what you want. Not difficult.

Wouldn't be done fast with large list of large lists.
>>
>>51933035
>"I want the shortest possible array that contains at least one element from every array in the input"

That explains it fine, thank you.
>>
>>51933042
See, that's a proper explanation of what you want. Way overly verbose, but correct.

Your explanation in the OP implies something entirely different.
>>
>>51933045

That's pretty much the notion I've had, although the exact implementation seems a little sticky at this point. n^2 or n^3 won't be an issue, as in reality the lists aren't very large.
>>
>>51933082
Use hashmaps to see if you already checked for a string or not.

Will save you a lot of computation time.

I could probably whip it up really quick. What's your stack?
>>
>>51933155
>>51933082

1 Second bro, I'll whip it out now for you..
>unzips
>>
http://pastebin.com/7f6qnkYM

Flatten list of lists, sort by how often each element is represented, filter list to show only unique elements(keeping original order, so first element would be represented the most in all of the lists.)

For each list, iterate the filtered names, checking if each element is in passed list, if it is push it to results and break to the next list. First element matched will be the most frequently occuring name.

Solution in javascript since I'm assuming that's what you wanted. Took a few minutes as I haven't done JS in about a year.
>>
Shit, messed up when I refactored. It's hardcoded to use the list variable instead of listOfNames that's passed to the function. Extra homework, fix it.
>>
>>51932865
This is the set cover problem, which is NP-complete.
>>
>>51932865
>three lists,
incorrect
>>
>>51934739
This is only an approximation and will not return the exact solution.
>>
function process(lists){
var map={};
lists.forEach(function(l){
l.forEach(function(s){ map[s]=map[s]+1||1; })
});

var res={};
lists.forEach(function(l){
l.sort(function(a,b){return map[b]-map[a]});
if(l.length>0) res[l[0]]=1;
});

return Object.keys(res);
}

process([['Alex', 'Toby', 'Kenneth'], ['Fred'], ['Toby', 'Matt']]);


> ["Toby", "Fred"]

Seems simple enough.
>>
>>51935118
This is only an approximation and will not return the exact solution.
>>
>>51935127
What input will produce incorrect solution?
>>
>>51935141
list = [["a", "b"], ["a", "b"], ["a", "c"], ["a", "c"], ["b"], ["c"]]
>>
>>51935187
The way OP specifies his problem, ["a", "b", "c"] should be the correct solution to that, and that's what the program outputs.
>>
>>51935220
The correct solution is ["b", "c"].
>>
>>51935187
>>51935220
Oh, wait, never mind, I see it now.
>>
>>51935235
function process(lists){
var map={};
lists.forEach(function(l){
l.forEach(function(s){ map[s]=map[s]+1||1; })
});

var res={};
lists.sort(function(a,b){ return a.length-b.length }).forEach(function(l){
for(i in l){ if(res[l[i]]) return; }
l.sort(function(a,b){
var res=map[b]-map[a];
if(res==0) res=b.localeCompare(a);
return res;
});
if(l.length>0) res[l[0]]=1;
});

return Object.keys(res);
}

//process([['Alex', 'Toby', 'Kenneth'], ['Fred'], ['Toby', 'Matt']]);
//process([['Alex', 'Toby'], ['Toby', 'Alex'], ['Alex', 'Toby']]);
process([["a", "b"], ["a", "b"], ["a", "c"], ["a", "c"], ["b"], ["c"]])


New and improved.

> ["b", "c"]
>>
>>51932865
1. Create a list of boolean flags equal to the number of lists.
2. Create a list of unique objects from the set that isn't flagged.
3. Count the number of original lists that object appears in that for each object.
4. Take the object with the greatest amount of matches and add it to your final list.
5. Flag all lists that were referenced.
6. Repeat steps 2-5 until your entire list of flags is set.

Might not be perfect, but should get you in the ballpark.
>>
>>51935269
list = [["a", "b"], ["a", "b"], ["a", "c"], ["a", "c"], ["e", "b"], ["e", "c"]]

Answer should be ["a", "e"]
>>
>>51935552
function process(lists){
var map={};
lists.forEach(function(l){
l.forEach(function(s){ map[s]=map[s]+1||1; })
});

var res={};
lists.sort(function(a,b){
var res=a.length-b.length;
if(res==0){
var guess=[a,b].map(function(l){ return l.reduce(function(x,y){ return x+map[y]; },0); });
res=guess[0]-guess[1];
}
return res;
}).forEach(function(l){
for(i in l){ if(res[l[i]]) return; }
l.sort(function(a,b){
var res=map[b]-map[a];
if(res==0) res=b.localeCompare(a);
return res;
});
if(l.length>0) res[l[0]]=1;
});

return Object.keys(res);
}

//process([['Alex', 'Toby', 'Kenneth'], ['Fred'], ['Toby', 'Matt']]);
//process([['Alex', 'Toby'], ['Toby', 'Alex'], ['Alex', 'Toby']]);
//process([["a", "b"], ["a", "b"], ["a", "c"], ["a", "c"], ["b"], ["c"]])
//process([["a", "b"], ["a", "b"], ["a", "c"], ["a", "c"], ["b","d",], ["c","d"]])
process([["a", "b"], ["a", "b"], ["a", "c"], ["a", "c"], ["e", "b"], ["e", "c"]])


Produces ["b", "c"] which is just as good as your answer.
>>
>>51935695
list = [["a", "b"], ["a", "b"], ["a"], ["a"], ["a", "c"], ["a", "c"], ["a", "b"], ["a", "c"], ["b", "e"], ["c", "e"]]

I encourage you to give up on trying to find a polynomial time solution. This is a known NP-complete problem.
>>
>>51935825
whoops, sorry, made a typo in the test case
>>
>>51935695
list = [["e", "b"], ["e", "c"], ["b", "a"], ["c", "a"]]
>>
>>51934962
/thread
>>
>>51935825
>>51936110
How do you verify that a particular solution is correct, in polynomial time?
>>
>>51936137
actually I guess I could look at wikipedia
>>
>>51936157
It looks like the set cover problem is formulated a bit differently on wikipedia. Can anyone explain how this problem is related?
>>
>>51936197
Each word represents a set consisting of the lists in the input which contain that word. You are trying to find the least number of words needed to cover the input list.
>>
>>51936197
Google "hitting set problem", which is equivalent.
>>
>>51936252
>>51936292
Thanks.
>>
>>51932865
I think (but im not sure) that the greedy solution is optimal:

pick the item that occurs in the most lists and remove those lists

repeat

I also don't know if you can do this efficiently
>>
>>51936922
Nevermind, this is wrong. I thought of a counter example:
a, d
a
b, d
b
c, d
c
>>
>>51936197
Minimum set that covers all subsets. In this case a set (list) of strings that contains at least 1 element from each set (list) in the origin set of sets (list of lists).
Thread replies: 49
Thread images: 1

banner
banner
[Boards: 3 / a / aco / adv / an / asp / b / biz / c / cgl / ck / cm / co / d / diy / e / fa / fit / g / gd / gif / h / hc / his / hm / hr / i / ic / int / jp / k / lgbt / lit / m / mlp / mu / n / news / o / out / p / po / pol / qa / r / r9k / s / s4s / sci / soc / sp / t / tg / toy / trash / trv / tv / u / v / vg / vp / vr / w / wg / wsg / wsr / x / y] [Home]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.
If a post contains personal/copyrighted/illegal content you can contact me at [email protected] with that post and thread number and it will be removed as soon as possible.
DMCA Content Takedown via dmca.com
All images are hosted on imgur.com, send takedown notices to them.
This is a 4chan archive - all of the content originated from them. If you need IP information for a Poster - you need to contact them. This website shows only archived content.