/wat/ - Weekly Algorithm Thread

You are currently reading a thread in /g/ - Technology

Thread replies: 57
Thread images: 9
File: SgcpY[1].png (12 KB, 735x456)
I feel that /g/ needs more technology related threads, so I thought that over winter break (4 weeks), I would learn you a new algorithm. Hopefully this inspires more discussion of algorithms and maybe an algorithm general.

Week 0 - Huffman Trees

Huffman Trees are a compression technique in which you map each character to a variable-length string of bits; characters that occur more often get shorter bit strings.

For example, let's say that we are trying to compress a message:

attackatonce


First, we would analyze the string, to see the frequency of the letters:

a - 7
t - 7
c - 2
e - 1
k - 1
n - 1
o - 1


We will now create the tree. Construction starts by taking the two nodes with the lowest frequency and making them the children of a new parent node whose frequency is the sum of theirs:

Parent node: 2(frequency):*
Child node 1: 1(frequency):o
Child node 2: 1(frequency):n

2:*
/ \
1:o 1:n


We now have a list as following:

a - 7
t - 7
c - 2
n+o tree - 2
e - 1
k - 1


As you can see, we can work with this recursively until our tree is complete, which you can view in the image post. Using this tree, we can represent our string as a compressed bitstream instead of spending a full 8-bit value on each char. This can significantly reduce memory usage.

To parse the compressed string, we start at the root node of the tree and move down using the bits provided. When we reach a leaf node, we output that leaf's uncompressed character and return to the root.
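The whole build/encode/decode cycle can be sketched in Python (a minimal sketch using a heap for the repeated lowest-two merge; the tie-breaking counter is my own addition to keep heap comparisons well-defined, not part of the canonical algorithm):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table by repeatedly merging the two lowest-frequency nodes."""
    freq = Counter(text)
    # Heap entries: (frequency, tiebreak, tree). A tree is a char or a (left, right) pair.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tiebreak, (left, right)))
        tiebreak += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, str):
            codes[node] = prefix or "0"   # single-symbol edge case
        else:
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
    walk(heap[0][2], "")
    return codes

def encode(text, codes):
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    """Greedy prefix match: works because no code is a prefix of another."""
    rev = {v: k for k, v in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in rev:
            out.append(rev[cur])
            cur = ""
    return "".join(out)

codes = huffman_codes("attackatonce")
bits = encode("attackatonce", codes)
assert decode(bits, codes) == "attackatonce"
print(len(bits), "bits vs", 8 * len("attackatonce"), "uncompressed")
```

Any valid tie-breaking gives the same total length here: 32 bits instead of 96.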


Resources:
https://en.wikipedia.org/wiki/Huffman_coding
https://home.cse.ust.hk/~dekai/271/notes/L15/L15.pdf
https://mitpress.mit.edu/sicp/full-text/sicp/book/node41.html
>>
>>51917483

I don't like technology, I like shitposting. Go away.
>>
File: Circuit board cover.jpg (228 KB, 1004x1776)
Problem of the week:

easy:
Create a Huffman tree for the following passage (you may leave out punctuation marks, and it is not case-sensitive).

Gomenasai, my name is Ken-Sama.

I’m a 27 year old American Otaku (Anime fan for you gaijins). I draw Anime and Manga on my tablet, and spend my days perfecting my art and playing superior Japanese games. (Disgaea, Final Fantasy, Persona series)

I train with my Katana every day, this superior weapon can cut clean through steel because it is folded over a thousand times, and is vastly superior to any other weapon on earth. I earned my sword license two years ago, and I have been getting better every day.

I speak Japanese fluently, both Kanji and the Osaka dialect, and I write fluently as well. I know everything about Japanese history and their bushido code, which I follow 100%

When I get my Japanese visa, I am moving to Tokyo to attend a prestigious High School to learn more about their magnificent culture. I hope I can become an animator for Studio Ghibli or a game designer!

I own several kimonos, which I wear around town. I want to get used to wearing them before I move to Japan, so I can fit in easier. I bow to my elders and seniors and speak Japanese as often as I can, but rarely does anyone manage to respond.

Wish me luck in Japan!

You may use a frequency analyzer such as http://www.counton.org/explorer/codebreaking/frequency-analysis.php


normal:
Compress the following bitstream:
001000010000011000100000101000001001000000000110000111


hard:
Create a program that will take a string, and create a huffman tree. Print out the bit combos for each char.


HAPPY HOLIDAYS!
>>
>>51917496
why not anon?

Isn't /g/ getting a bit boring?
>>
>>51917483
Enjoying your first week in your introductions to algorithms course?
>>
>>51917554
I've never taken an algorithms course. I found a book in the library and I thought huffman trees were really cool, so I thought it would be a good thread idea on /g/.

Guess not though.
>>
Ohh nice idea. I'll continue lurking. Currently at Christmas party so I can't contribute.
>>
>>51917502
>normal:
>Compress the following bitstream:
>001000010000011000100000101000001001000000000110000111

I don't get it. I thought this was supposed to compress strings into bitstreams. How would you even compress a bitstream with huffman codes?
>>
>>51917483
i like your intentions. keep at it.

Better than AMD is for faggots, ublock is for faggots, pirating is for faggots, buy my shill product.
>>
File: feel.jpg (42 KB, 507x488)
>>51917851
>>
>>51918641
Normally I'd agree but the people at my workplace are really cool.
>>
>>51919364
>Currently at Christmas party
>. I'll continue lurking.
>>
>>51919398
Lurking, aka I pinned this thread and I'll check it from time to time.
>>
>>51917483
Okay, now explain to me where the letters are in the compressed form. I don't see a dictionary in the second bitstream. How would one decompress it without using a separate algorithm for each different dictionary (set of letters)?
>>
Nice thread op, unfortunately most people on here are busy complaining about font rendering and which anime player is best.

Gonna code this thing tomorrow after my last exam.
>>
>>51917483
>>51919883 (cont.)
The compressed bitstream has multiple interpretations.
Every 001111 can be interpreted (decompressed) as either 'e' or 'taaa'.
There's no way to compress 'e'/'taaa' (001111/001 1 1 1) nor 'na'/'tat' (001100 1/001 1 001) reliably, given we reuse the same dictionary.
1 001 001 1 0010 001110 1 001 001101 001100 0010 001111
a t t a c k a t o n c e
1 001 001 1 0010 001110 1 001 001101 001100 0010 001 1 1 1
a t t a c k a t o n c t a a a
>>
File: Knuth-A-small-257x300.jpg (17 KB, 257x300)
>>51917483
Huffman encoding is pretty easy.
Knuth-Morris-Pratt is another interesting algorithm for people around here to implement
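For anyone wanting to try it, a minimal KMP sketch in Python (the usual failure-table-plus-scan formulation; the variable names are my own):

```python
def kmp_search(text, pattern):
    """Return the starting indices of every occurrence of pattern in text."""
    if not pattern:
        return []
    # Failure table: fail[i] = length of the longest proper prefix of
    # pattern[:i+1] that is also a suffix of it.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    # Scan the text, reusing matched prefix lengths instead of backtracking.
    hits, k = [], 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]
    return hits

print(kmp_search("attackatonce", "at"))  # -> [0, 6]
```

The point of the failure table is that the text index `i` never moves backwards, giving O(n + m) overall.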
>>
>>51917483
>attackatonce
>First, we would analyze the string, to see the frequency of the letters:
I see 3 occurrences of 'a' and 3 occurrences of 't' in that string, so why does the frequency analysis get 7 instead of 3?
>>
>>51920151
OP is a faggot and does not understand what he's talking about.
He seems to not be using exact frequencies of letters to generate the tree.
>>
>>51917483
this is pretty stupid because the person at the other end doesn't have the tree, and they can't generate it because it's based on the message, right?
but what are you saying, that your compressed message takes up like 45 bits? my eyes aren't too good so that might not be right. so 12 characters, 45/12 is 3.75 bits per character, right?
well suck my dick because you have 7 characters you can define an encoding of 3 bits per character, say a = 000, t = 001.....i = 110
then you are down to 36 bits for your message.

so you look pretty stupid now huh.
>>
>>51920405
Not OP, but it obviously becomes more useful as you scale up
>>
File: gay.png (20 KB, 775x585)
Yeah, OP unfortunately seems not to actually understand Huffman codes. The encoding I got was:
a -> 10
c -> 000
e -> 111
k -> 110
n -> 0011
o -> 0010
t -> 01

Which means
attackatonce -> 10010110000110100100100011000111
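For what it's worth, that table can be sanity-checked mechanically (a quick Python sketch using the codes from this post):

```python
# Code table from the post above.
codes = {"a": "10", "c": "000", "e": "111", "k": "110",
         "n": "0011", "o": "0010", "t": "01"}

# Prefix-free check: no code may be a prefix of another,
# otherwise the bitstream would be ambiguous.
vals = list(codes.values())
assert all(not x.startswith(y) for x in vals for y in vals if x != y)

bits = "".join(codes[ch] for ch in "attackatonce")
print(bits, len(bits))  # 32 bits
```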
>>
Please keep on with this kind of threads. I like it
>>
>>51917483
OK OP, let's start with something not so worn off...

Week -1 - Seam Carving

Seam Carving is an algorithm that resizes an image while preserving important part of it.

It's based on finding edges and that is why I found it fits in here.

After finding edges, the algorithm searches for the path of least resistance and eliminates the pixels or adds some in between.
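The path-of-least-resistance step can be sketched as dynamic programming over an energy grid (a minimal sketch; real implementations derive the energy map from image gradients, which is omitted here):

```python
def find_vertical_seam(energy):
    """DP over a 2D energy grid: cheapest top-to-bottom path that moves
    at most one column left or right per row."""
    rows, cols = len(energy), len(energy[0])
    cost = [row[:] for row in energy]
    for r in range(1, rows):
        for c in range(cols):
            cost[r][c] += min(cost[r - 1][max(c - 1, 0):min(c + 2, cols)])
    # Backtrack from the cheapest bottom cell.
    seam = [min(range(cols), key=lambda c: cost[-1][c])]
    for r in range(rows - 2, -1, -1):
        c = seam[-1]
        cand = range(max(c - 1, 0), min(c + 2, cols))
        seam.append(min(cand, key=lambda c2: cost[r][c2]))
    return seam[::-1]  # seam[r] = column to drop in row r

def remove_seam(img, seam):
    return [row[:c] + row[c + 1:] for row, c in zip(img, seam)]

# Toy 3x3 "energy map": the low-energy path hugs the middle, then the right.
energy = [[9, 1, 9],
          [9, 1, 9],
          [9, 9, 1]]
print(find_vertical_seam(energy))  # -> [1, 1, 2]
```

To shrink an image by k columns you find and remove a seam k times, recomputing the energy each round.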
>>
>>51920620

sorry, I was bored as hell at work, and i got distracted more than once while reading, and counting letters... I'm not sure how I ended up with 7 A's.

I'm glad some of you like this idea, though.
>>
>>51918349
you find patterns
>>
>>51920121
Then you failed at it, there should be no ambiguity if you did it correctly.
>>
>>51921706
It's OP's version. Seems you can't read, and are as huge faggot as OP.
>>
File: huffman.png (22 KB, 2018x332)
>>51920620
His explanation leaves something to be desired.
There are many choices in a Huffman implementation that will affect the final code table but still compress just as well. If tables are to be interchangeable, these decisions need to be made consistently across all Huffman implementations:
>Order your letters by probability from left to right (highest to lowest)
>To order letters with the same probability use order of occurrence in the original string (not alphabetical)
>If there are more than two letters with the same probability, pair the rightmost letters.

And even after being meticulous you're stuck with this business where you either have to tangle or dance.

Anyway starting from the top you go left for one and right for zero giving you:
t = 11
c = 10
a = 01
k = 0011
o = 0010
n = 0001
e = 0000

01 11 11 01 10 0011 01 11 0010 0001 10 0000
a t t a c k a t o n c e
32 bits

you had
10 01 01 10 000 110 10 01 0010 0011 000 111
a t t a c k a t o n c e
32 bits

Which would be fine but you cant read what I write and vice versa.
>>
>>51921417
Nah, you've gotta start with bubble sort.
It's not intuitive, efficient or used in any other algorithms.
A perfect place to start.
>>
>>51918349
>>51921459
I feel we are leaving Huffman here and moving to a more generalised probability distribution encoding

You can consider the binary a string consisting of these characters, that'd be a good way to encode it.

111: 1
11: 2
1: 7
000000000: 1
00000:3
0000: 2
000: 1
00: 2
0: 1

Though I doubt it is the most efficient way to do it.
In English text there are a number of words or combinations of letters that occur more frequently than some individual letters; for example, "the" occurs more often than "z". It would therefore be more efficient if your algorithm took combinations of letters into account. Generally you would try every combination of 2, 3, 4 ... up to a given number of letters. If a combination occurs a given minimum number of times, you encode that sequence (word) as a character (give it a place on your tree). To be certain you've saved all the space possible, you would have to search for every combination of characters up to the length of the search material, which could very well be a lengthy novel.

I commented earlier saying you should interpret the binary sequence as ascii then use standard Huffman encoding, but that only works if you have a factor of eight bits.

>>51919883
>>51920121
I don't know what he is trying to say.
You take the first bit. If it is 1, go left; if it is 0, go right. Continue until you reach the bottom of the tree, then you have a letter. Then move back to the start of the tree with the next bit.
You need the tree itself, not just the character translations. Otherwise you couldn't tell apart "1010" from "10" and "10".

>>51920151
A mistake it seems.

>>51920405
ASCII is 8 bits per character. You can always do better than that with English text, because some characters are used less than others, so using fewer bits to represent common characters saves you space. However, if you try to compress truly random or already-compressed data, then you won't save anything.
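The walk described here can be sketched directly (a minimal decoder over a hypothetical toy tree; I use 0 = first child, 1 = second child, and the left/right convention is arbitrary as long as encoder and decoder agree):

```python
def decode(bits, tree):
    """Walk the tree bit by bit; emit a character at each leaf and restart at the root."""
    out, node = [], tree
    for b in bits:
        node = node[int(b)]            # 0 -> first child, 1 -> second child
        if isinstance(node, str):      # reached a leaf: a character
            out.append(node)
            node = tree                # back to the root for the next code
    return "".join(out)

# Hypothetical toy tree giving the codes a = 0, t = 10, c = 11.
tree = ("a", ("t", "c"))
print(decode("0101011", tree))  # -> "attc"
```

This also shows why you need the tree and not just a list of codes: without the prefix structure, "10" followed by "10" would be indistinguishable from some other grouping of "1010".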
>>
>>51922624
Ah, thank you for explanation. In such case, OP's tree is broken (e.g. 'a' can't be 1, 3rd node leads to two 1s).
>>
>>51922624
>true random
I meant evenly distributed
>>
File: huffman2.png (5 KB, 314x260)
>>51922688
The tree isn't broken. As long as each node splits into a maximum of 2 children, it is a valid tree.
His tree just doesn't fit the distribution.
Something like:
aaaaaaaaaaaaaaaabbbbbbbbccccdde
would turn out to be a very one sided tree.
Intuitively representing half the sequence as one character will save a lot of space even if e ends up being absurdly long.

With an ASCII like formatting you'd use 3 bits per letter
a = 000
b = 001
c = 010
d = 011
e = 100
for a total of 31*3=93 bits

Where the pictured tree would give you
1*16 + 2*8 + 3*4 + 4*2 + 5*1 = 57 bits
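The comparison above can be checked numerically (a quick sketch; the per-letter code lengths are read off the post's pictured tree, so they're an assumption here):

```python
from collections import Counter

s = "aaaaaaaaaaaaaaaabbbbbbbbccccdde"
freq = Counter(s)              # a: 16, b: 8, c: 4, d: 2, e: 1

fixed = 3 * len(s)             # ASCII-style: 3 bits for every symbol
# Code lengths from the one-sided tree in the picture (as stated in the post).
lengths = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
huff = sum(freq[ch] * lengths[ch] for ch in freq)
print(fixed, huff)             # fixed-width vs tree-weighted total bits
```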
>>
>>51923043
>Intuitively representing half the sequence as one character
Intuitively, representing half the sequence with one bit per character
>>
File: Sorting_quicksort_anim.gif (91 KB, 280x214)
Struggling with quicksort anons?
This video helped me understand it: https://www.youtube.com/watch?v=XE4VP_8Y0BU
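For reference, the partition-and-recurse idea in one short sketch (the simple out-of-place formulation, not the in-place version shown in the animation):

```python
def quicksort(xs):
    """Pick a pivot, split into smaller / equal / larger, recurse on each side."""
    if len(xs) <= 1:
        return xs
    pivot = xs[len(xs) // 2]
    left = [x for x in xs if x < pivot]
    mid = [x for x in xs if x == pivot]
    right = [x for x in xs if x > pivot]
    return quicksort(left) + mid + quicksort(right)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # -> [1, 1, 2, 3, 4, 5, 6, 9]
```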
>>
>>51917483
More people like you are required in /g/. Plz keep these stuff coming :)
>>
>>51917502
Its MERRY CHRISTMAS you cunt.
>>
>>51923522
Shalom!
>>
>>51923106
>Struggling with quicksort?
No
>>
>>51917483

BUMP
>>
>>51917483
there are 3 A's, 3 T's, OP
>>
File: 1445304595275.gif (2 MB, 384x145)
>>
>>51924408
it hurts to know that this is how 90% of the cs majors would sort their lists.
>>
>>51917483

learning algorithms next semester senpai, will the MIT course help me out?
>>
>>51917483
'a' doesn't appear 7 times though, neither does 't'.
What the fuck are you doing?
>>
>>51926303
Bubblesort being shit is like the first thing they taught us in our algo class.
>>
>>51922165
OP's example is pretty shit, because

1) unlike in the English language, the probability distribution in the string he chose is pretty uniform, so Huffman can't show its full potential, and
2) he still failed at building the tree himself.

Huffman encoding is still pretty based, though. It gets really close to the entropy of the string and the size overhead of transmitting a tree of a few dozen characters is really negligible once you get to the point where compression makes sense.
>>
>>51917577
Wtf you are retarded
>>
>>51926455
Every university in the world rips off the MIT course. Lazy pricks.
>>
>>51921417
>give it a picture of an /r9k/ thread
>half the posts are highlighted
>>
>>51926303
Pretty sure 90% of CS majors would use the built in sorting package
>>
>>51926940
>>51917554
Close this tab at once
>>
>>51923522
>Its
Go back to school, dumb Yank.
>>
>>51924408
it's painful to watch
>>
You would have already known about this and done those exercises if you had read SICP.

Why are you on /g/ if you've never SICP?
>>
>>51924408
Now someone should make a gif of that using quick sort