/wat/ - Weekly Algorithm Thread

You are currently reading a thread in /g/ - Technology

Thread replies: 57
Thread images: 9
File: SgcpY[1].png (12 KB, 735x456)
I feel that /g/ needs more technology related threads, so I thought that over winter break (4 weeks), I would learn you a new algorithm. Hopefully this inspires more discussion of algorithms and maybe an algorithm general.

Week 0 - Huffman Trees

Huffman Trees are a compression technique in which you map each character to a variable-length string of bits; characters that occur more often get shorter bit strings.

For example, let's say that we are trying to compress a message:

attackatonce


First, we would analyze the string, to see the frequency of the letters:

a - 7
t - 7
c - 2
e - 1
k - 1
n - 1
o - 1


We will now create the tree. Construction starts by taking the two nodes with the lowest frequency and making them the children of a new parent node whose frequency is the sum of theirs:

Parent node: 2(frequency):*
Child node 1: 1(frequency):o
Child node 2: 1(frequency):n

2:*
/ \
1:o 1:n


We now have a list as following:

a - 7
t - 7
c - 2
n+o tree - 2
e - 1
k - 1


As you can see, we can work with this recursively until our tree is complete, which you can view in the image post. Using this tree, we can represent our string as a compressed bitstream instead of spending a full 8-bit value on each char. This can significantly reduce memory usage.

To parse the compressed string, we start at the root node of the tree and move down using the bits provided. When we reach a leaf node, we output that leaf's uncompressed character and return to the root.
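The whole build/encode/decode cycle can be sketched in Python (a minimal sketch using a heap for the repeated lowest-two merge; the tie-breaking counter is my own addition to keep heap comparisons well-defined, not part of the canonical algorithm):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table by repeatedly merging the two lowest-frequency nodes."""
    freq = Counter(text)
    # Heap entries: (frequency, tiebreak, tree). A tree is a char or a (left, right) pair.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tiebreak, (left, right)))
        tiebreak += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, str):
            codes[node] = prefix or "0"   # single-symbol edge case
        else:
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
    walk(heap[0][2], "")
    return codes

def encode(text, codes):
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    """Greedy prefix match: works because no code is a prefix of another."""
    rev = {v: k for k, v in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in rev:
            out.append(rev[cur])
            cur = ""
    return "".join(out)

codes = huffman_codes("attackatonce")
bits = encode("attackatonce", codes)
assert decode(bits, codes) == "attackatonce"
print(len(bits), "bits vs", 8 * len("attackatonce"), "uncompressed")
```

Any valid tie-breaking gives the same total length here: 32 bits instead of 96.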


Resources:
https://en.wikipedia.org/wiki/Huffman_coding
https://home.cse.ust.hk/~dekai/271/notes/L15/L15.pdf
https://mitpress.mit.edu/sicp/full-text/sicp/book/node41.html
>>
>>51917483

I don't like technology, I like shitposting. Go away.
>>
File: Circuit board cover.jpg (228 KB, 1004x1776)
Problem of the week:

easy:
Create a Huffman tree for the following passage (you may leave out punctuation marks, and it is not case-sensitive).

Gomenasai, my name is Ken-Sama.

I’m a 27 year old American Otaku (Anime fan for you gaijins). I draw Anime and Manga on my tablet, and spend my days perfecting my art and playing superior Japanese games. (Disgaea, Final Fantasy, Persona series)

I train with my Katana every day, this superior weapon can cut clean through steel because it is folded over a thousand times, and is vastly superior to any other weapon on earth. I earned my sword license two years ago, and I have been getting better every day.

I speak Japanese fluently, both Kanji and the Osaka dialect, and I write fluently as well. I know everything about Japanese history and their bushido code, which I follow 100%

When I get my Japanese visa, I am moving to Tokyo to attend a prestigious High School to learn more about their magnificent culture. I hope I can become an animator for Studio Ghibli or a game designer!

I own several kimonos, which I wear around town. I want to get used to wearing them before I move to Japan, so I can fit in easier. I bow to my elders and seniors and speak Japanese as often as I can, but rarely does anyone manage to respond.

Wish me luck in Japan!

You may use a frequency analyzer such as http://www.counton.org/explorer/codebreaking/frequency-analysis.php


normal:
Compress the following bitstream:
001000010000011000100000101000001001000000000110000111


hard:
Create a program that will take a string, and create a huffman tree. Print out the bit combos for each char.


HAPPY HOLIDAYS!
>>
>>51917496
why not anon?

Isn't /g/ getting a bit boring?
>>
>>51917483
Enjoying your first week in your introductions to algorithms course?
>>
>>51917554
I've never taken an algorithms course. I found a book in the library and I thought huffman trees were really cool, so I thought it would be a good thread idea on /g/.

Guess not though.
>>
Ohh nice idea. I'll continue lurking. Currently at Christmas party so I can't contribute.
>>
>>51917502
>normal:
>Compress the following bitstream:
>001000010000011000100000101000001001000000000110000111

I don't get it. I thought this was supposed to compress strings into bitstreams. How would you even compress a bitstream with huffman codes?
>>
>>51917483
i like your intentions. keep at it.

Better than AMD is for faggots, ublock is for faggots, pirating is for faggots, buy my shill product.
>>
File: feel.jpg (42 KB, 507x488)
>>51917851
>>
>>51918641
Normally I'd agree but the people at my workplace are really cool.
>>
>>51919364
>Currently at Christmas party
>. I'll continue lurking.
>>
>>51919398
Lurking, aka I pinned this thread and I'll check it from time to time.
>>
>>51917483
Okay, now explain to me where the letters are in the compressed form. I don't see a dictionary in the second bitstream. How would one decompress it without using a separate algorithm for each different dictionary (set of letters)?
>>
Nice thread op, unfortunately most people on here are busy complaining about font rendering and which anime player is best.

Gonna code this thing tomorrow after my last exam.
>>
>>51917483
>>51919883 (cont.)
The compressed bitstream has multiple interpretations.
Every 001111 can be interpreted (decompressed) as either 'e' or 'taaa'.
There's no way to compress 'e'/'taaa' (001111/001 1 1 1) nor 'na'/'tat' (001100 1/001 1 001) reliably, given we reuse the same dictionary.
1 001 001 1 0010 001110 1 001 001101 001100 0010 001111
a t t a c k a t o n c e
1 001 001 1 0010 001110 1 001 001101 001100 0010 001 1 1 1
a t t a c k a t o n c t a a a
>>
File: Knuth-A-small-257x300.jpg (17 KB, 257x300)
>>51917483
Huffman encoding is pretty easy.
Knuth-Morris-Pratt is another interesting algorithm for people around here to implement
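For anyone wanting to try it, a minimal KMP sketch in Python (the usual failure-table-plus-scan formulation; the variable names are my own):

```python
def kmp_search(text, pattern):
    """Return the starting indices of every occurrence of pattern in text."""
    if not pattern:
        return []
    # Failure table: fail[i] = length of the longest proper prefix of
    # pattern[:i+1] that is also a suffix of it.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    # Scan the text, reusing matched prefix lengths instead of backtracking.
    hits, k = [], 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]
    return hits

print(kmp_search("attackatonce", "at"))  # -> [0, 6]
```

The point of the failure table is that the text index `i` never moves backwards, giving O(n + m) overall.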
>>
>>51917483
>attackatonce
>First, we would analyze the string, to see the frequency of the letters:
I see 3 occurrences of 'a' and 3 occurrences of 't' in that string, so why does the frequency analysis get 7 instead of 3?
>>
>>51920151
OP is a faggot and does not understand what he's talking about.
He seems to not be using exact frequencies of letters to generate the tree.
>>
>>51917483
this is pretty stupid because the person at the other end doesn't have the tree, and they can't generate it because it's based on the message, right?
but what are you saying, that your compressed message takes up like 45 bits? my eyes aren't too good so that might not be right. so 12 characters, 45/12 is 3.75 bits per character, right?
well suck my dick because you have 7 characters you can define an encoding of 3 bits per character, say a = 000, t = 001.....i = 110
then you are down to 36 bits for your message.

so you look pretty stupid now huh.
>>
>>51920405
Not OP, but it obviously becomes more useful as you scale up
>>
File: gay.png (20 KB, 775x585)
Yeah, OP unfortunately seems not to actually understand Huffman codes. The encoding I got was:
a -> 10
c -> 000
e -> 111
k -> 110
n -> 0011
o -> 0010
t -> 01

Which means
attackatonce -> 10010110000110100100100011000111
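For what it's worth, that table can be sanity-checked mechanically (a quick Python sketch using the codes from this post):

```python
# Code table from the post above.
codes = {"a": "10", "c": "000", "e": "111", "k": "110",
         "n": "0011", "o": "0010", "t": "01"}

# Prefix-free check: no code may be a prefix of another,
# otherwise the bitstream would be ambiguous.
vals = list(codes.values())
assert all(not x.startswith(y) for x in vals for y in vals if x != y)

bits = "".join(codes[ch] for ch in "attackatonce")
print(bits, len(bits))  # 32 bits
```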
>>
Please keep on with this kind of threads. I like it
>>
>>51917483
OK OP, let's start with something not so worn off...

Week -1 - Seam Carving

Seam Carving is an algorithm that resizes an image while preserving important part of it.

It's based on finding edges and that is why I found it fits in here.

After finding edges, the algorithm searches for the path of least resistance and eliminates the pixels or adds some in between.
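The path-of-least-resistance step can be sketched as dynamic programming over an energy grid (a minimal sketch; real implementations derive the energy map from image gradients, which is omitted here):

```python
def find_vertical_seam(energy):
    """DP over a 2D energy grid: cheapest top-to-bottom path that moves
    at most one column left or right per row."""
    rows, cols = len(energy), len(energy[0])
    cost = [row[:] for row in energy]
    for r in range(1, rows):
        for c in range(cols):
            cost[r][c] += min(cost[r - 1][max(c - 1, 0):min(c + 2, cols)])
    # Backtrack from the cheapest bottom cell.
    seam = [min(range(cols), key=lambda c: cost[-1][c])]
    for r in range(rows - 2, -1, -1):
        c = seam[-1]
        cand = range(max(c - 1, 0), min(c + 2, cols))
        seam.append(min(cand, key=lambda c2: cost[r][c2]))
    return seam[::-1]  # seam[r] = column to drop in row r

def remove_seam(img, seam):
    return [row[:c] + row[c + 1:] for row, c in zip(img, seam)]

# Toy 3x3 "energy map": the low-energy path hugs the middle, then the right.
energy = [[9, 1, 9],
          [9, 1, 9],
          [9, 9, 1]]
print(find_vertical_seam(energy))  # -> [1, 1, 2]
```

To shrink an image by k columns you find and remove a seam k times, recomputing the energy each round.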
>>
>>51920620

sorry, I was bored as hell at work, and i got distracted more than once while reading, and counting letters... I'm not sure how I ended up with 7 A's.

I'm glad some of you like this idea, though.
>>
>>51918349
you find patterns
>>
>>51920121
Then you failed at it, there should be no ambiguity if you did it correctly.
>>
>>51921706
It's OP's version. Seems you can't read, and are as huge faggot as OP.
>>
File: huffman.png (22 KB, 2018x332)
>>51920620
His explanation leaves something to be desired.
There are many choices in a Huffman implementation that will affect the final code table but still compress just as well. If tables are to be interchangeable, these decisions need to be made consistently across all Huffman implementations:
>Order your letters by probability from left to right (highest to lowest)
>To order letters with the same probability use order of occurrence in the original string (not alphabetical)
>If there are more than two letters with the same probability, pair the rightmost letters.

And even after being meticulous you're stuck with this business where you either have to tangle or dance.

Anyway starting from the top you go left for one and right for zero giving you:
t = 11
c = 10
a = 01
k = 0011
o = 0010
n = 0001
e = 0000

01 11 11 01 10 0011 01 11 0010 0001 10 0000
a t t a c k a t o n c e
32 bits

you had
10 01 01 10 000 110 10 01 0010 0011 000 111
a t t a c k a t o n c e
32 bits

Which would be fine but you cant read what I write and vice versa.
>>
>>51921417
Nah, you've gotta start with bubble sort.
It's not intuitive, efficient or used in any other algorithms.
A perfect place to start.
>>
>>51918349
>>51921459
I feel we are leaving Huffman here and moving to a more generalised probability distribution encoding

You can consider the binary a string consisting of these characters, that'd be a good way to encode it.

111: 1
11: 2
1: 7
000000000: 1
00000:3
0000: 2
000: 1
00: 2
0: 1

Though I doubt it is the most efficient way to do it.
In English text there are a number of words or combinations of letters that occur more frequently than some individual letters; for example, "the" occurs more often than "z". It would therefore be more efficient if your algorithm took combinations of letters into account. Generally you would try every combination of 2, 3, 4 ... up to a given number of letters. If a combination occurs a given minimum number of times, you encode that sequence (word) as a character (give it a place on your tree). To be certain you've saved all the space possible, you would have to search for every combination of characters up to the length of the search material, which could very well be a lengthy novel.

I commented earlier saying you should interpret the binary sequence as ascii then use standard Huffman encoding, but that only works if you have a factor of eight bits.

>>51919883
>>51920121
I don't know what he is trying to say.
You take the first bit. If it is 1, go left; if it is 0, go right. Continue until you reach the bottom of the tree, then you have a letter. Then move back to the start of the tree with the next bit.
You need the tree itself, not just the character translations. Otherwise you couldn't tell apart "1010" from "10" and "10".

>>51920151
A mistake it seems.

>>51920405
ASCII is 8 bits per character. You can always do better than that with English text, because some characters are used less than others, so using fewer bits to represent common characters saves you space. However, if you try to compress truly random or already-compressed data, then you won't save anything.
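The walk described here can be sketched directly (a minimal decoder over a hypothetical toy tree; I use 0 = first child, 1 = second child, and the left/right convention is arbitrary as long as encoder and decoder agree):

```python
def decode(bits, tree):
    """Walk the tree bit by bit; emit a character at each leaf and restart at the root."""
    out, node = [], tree
    for b in bits:
        node = node[int(b)]            # 0 -> first child, 1 -> second child
        if isinstance(node, str):      # reached a leaf: a character
            out.append(node)
            node = tree                # back to the root for the next code
    return "".join(out)

# Hypothetical toy tree giving the codes a = 0, t = 10, c = 11.
tree = ("a", ("t", "c"))
print(decode("0101011", tree))  # -> "attc"
```

This also shows why you need the tree and not just a list of codes: without the prefix structure, "10" followed by "10" would be indistinguishable from some other grouping of "1010".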
>>
>>51922624
Ah, thank you for explanation. In such case, OP's tree is broken (e.g. 'a' can't be 1, 3rd node leads to two 1s).
>>
>>51922624
>true random
I meant evenly distributed
>>
File: huffman2.png (5 KB, 314x260)
>>51922688
The tree isn't broken. As long as each node splits into a maximum of 2 children, it is a valid tree.
His tree just doesn't fit the distribution.
Something like:
aaaaaaaaaaaaaaaabbbbbbbbccccdde
would turn out to be a very one sided tree.
Intuitively representing half the sequence as one character will save a lot of space even if e ends up being absurdly long.

With an ASCII like formatting you'd use 3 bits per letter
a = 000
b = 001
c = 010
d = 011
e = 100
for a total of 31*3=93 bits

Where the pictured tree would give you
1*16 + 2*8 + 3*4 + 4*2 + 5*1 = 57 bits
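The comparison above can be checked numerically (a quick sketch; the per-letter code lengths are read off the post's pictured tree, so they're an assumption here):

```python
from collections import Counter

s = "aaaaaaaaaaaaaaaabbbbbbbbccccdde"
freq = Counter(s)              # a: 16, b: 8, c: 4, d: 2, e: 1

fixed = 3 * len(s)             # ASCII-style: 3 bits for every symbol
# Code lengths from the one-sided tree in the picture (as stated in the post).
lengths = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
huff = sum(freq[ch] * lengths[ch] for ch in freq)
print(fixed, huff)             # fixed-width vs tree-weighted total bits
```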
>>
>>51923043
>Intuitively representing half the sequence as one character
Intuitively, representing half the sequence with one bit per character
>>
File: Sorting_quicksort_anim.gif (91 KB, 280x214)
Struggling with quicksort anons?
This video helped me understand it: https://www.youtube.com/watch?v=XE4VP_8Y0BU
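For reference, the partition-and-recurse idea in one short sketch (the simple out-of-place formulation, not the in-place version shown in the animation):

```python
def quicksort(xs):
    """Pick a pivot, split into smaller / equal / larger, recurse on each side."""
    if len(xs) <= 1:
        return xs
    pivot = xs[len(xs) // 2]
    left = [x for x in xs if x < pivot]
    mid = [x for x in xs if x == pivot]
    right = [x for x in xs if x > pivot]
    return quicksort(left) + mid + quicksort(right)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # -> [1, 1, 2, 3, 4, 5, 6, 9]
```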
>>
>>51917483
More people like you are required in /g/. Plz keep these stuff coming :)
>>
>>51917502
Its MERRY CHRISTMAS you cunt.
>>
>>51923522
Shalom!
>>
>>51923106
>Struggling with quicksort?
No
>>
>>51917483

BUMP
>>
>>51917483
there are 3 A's, 3 T's, OP
>>
File: 1445304595275.gif (2 MB, 384x145)
>>
>>51924408
it hurts to know that this is how 90% of the cs majors would sort their lists.
>>
>>51917483

learning algorithms next semester senpai, will the MIT course help me out?
>>
>>51917483
'a' doesn't appear 7 times though, neither does 't'.
What the fuck are you doing?
>>
>>51926303
Bubblesort being shit is like the first thing they taught us in our algo class.
>>
>>51922165
OP's example is pretty shit, because

1) unlike in the English language, the probability distribution in the string he chose is pretty uniform, so Huffman can't show its full potential, and
2) he still failed at building the tree himself.

Huffman encoding is still pretty based, though. It gets really close to the entropy of the string and the size overhead of transmitting a tree of a few dozen characters is really negligible once you get to the point where compression makes sense.
>>
>>51917577
Wtf you are retarded
>>
>>51926455
Every university in the world rips off the MIT course. Lazy pricks.
>>
>>51921417
>give it a picture of an /r9k/ thread
>half the posts are highlighted
>>
>>51926303
Pretty sure 90% of CS majors would use the built in sorting package
>>
>>51926940
>>51917554
Close this tab at once
>>
>>51923522
>Its
Go back to school, dumb Yank.
>>
>>51924408
it's painful to watch
>>
You would have already known about this and done those exercises if you had read SICP.

Why are you on /g/ if you've never SICP?
>>
>>51924408
Now someone should make a gif of that using quick sort