analyze json data with sed help

Thread replies: 19
Thread images: 8

Anonymous
analyze json data with sed help 2015-11-28 03:01:57 Post No. 51566580
File: sed error.png (212 KB, 1004x674)

I need some help with sed. I have a txt file with some sample json data (https://github.com/jmportilla/Reddit-Data-Science-Project/blob/master/my_sample). I want to use sed to find the word "roster" (line 82 in the data set), then locate the author name (in this case "Dmagers") and output it to a file output.text, then jump to the next line and continue looking for more instances of the word "roster" and repeat the above (locate and write out author name) until it reaches the end.

If I run the following in a terminal:

cat /home/fsl/Desktop/data.txt|sed '/body":.*roster[&"]*/!d;s/.*name":"\([^"]*\).*/\1/

I get the error "sed: No Match" despite "roster" being there in line 82 in the data set. Can anyone help me figure out why it isn't working?

Pic related

>>

Anonymous 2015-11-28 03:09:01 Post No.51566674
File: bIPgtan.jpg (80 KB, 720x960)

>>51566580
bump

>>

Anonymous 2015-11-28 03:10:53 Post No.51566704
File: 1392046518533.png (664 KB, 1280x720)

Pls /g/, are you there?

You're my only hope.

>>

Anonymous 2015-11-28 03:15:42 Post No.51566750
File: 1386821571441.png (493 KB, 708x664)

help help

>>

Anonymous 2015-11-28 03:17:45 Post No.51566769
File: 1385007824104.png (27 KB, 638x547)

>>51566750

>>

Anonymous 2015-11-28 03:23:39 Post No.51566848

>>51566580
not sed, but python
this works

with open('C:\\Users\\faggot\\Downloads\\my_sample', 'rb') as infile:
    # one JSON object per line; json.loads accepts the raw bytes line
    # (Python 2, or Python 3.6+), so there is no need to read the whole file
    for i in infile:
        j = json.loads(i)
        if 'roster' in j['body']:
            print('AUTHOR:\n%s\nPOST:\n%s\n' % (j['author'], j['body']))

>>

Anonymous 2015-11-28 03:24:53 Post No.51566865
File: xjfdnm.png (10 KB, 790x196)

>>51566848
picture
I forgot, you also need to
import json

>>

Anonymous 2015-11-28 03:32:50 Post No.51566980

>>51566865
>>51566848
Thanks, you are a saint! Trying it now!

>>

Anonymous 2015-11-28 03:37:29 Post No.51567045
File: ootmgc.png (7 KB, 1098x130)

>>51566980
my bad, you said you wanted to output to output.text (save to a file)
this should work, with no changes needed
just type 'python' into your terminal (no quotes), and then copy/paste this
import json

# output is opened in text mode ('w'): writing a str to a 'wb' file fails on Python 3
with open('/home/fsl/Desktop/data.txt', 'rb') as infile, open('output.text', 'w') as outfile:
    # one JSON object per line; json.loads accepts the raw bytes line
    # (Python 2, or Python 3.6+), so the file is streamed instead of read whole
    for i in infile:
        j = json.loads(i)
        if 'roster' in j['body']:
            outfile.write('AUTHOR:\n%s\nPOST:\n%s\n' % (j['author'], j['body']))
this should then produce a text file like this

>>

Anonymous 2015-11-28 03:47:46 Post No.51567198

>>51567045
How fast would you say this is compared to this:

sed -n '/roster/{s/.*,"author":"\([^"]*\)".*/\1/;p;}' /home/fsl/Desktop/data.txt

I have ~1.7 billion lines to run through.

>>

Anonymous 2015-11-28 03:50:31 Post No.51567235

>>51567198
I honestly have no idea, but I'd assume that the sed version would be significantly faster
python is extremely slow compared to C (which is what sed is written in)
but either way, sed is the wrong tool for the job
I see this:
>\([^"]*\)
which is a regular expression
don't extract data from json using a regex, use a json parser
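
To make that concrete, here is a minimal sketch in Python. The three sample lines are made up for illustration (they are not from the data set), and the regex only approximates the sed one-liner quoted above, so treat it as a demonstration of the failure modes rather than a benchmark of either tool.

```python
import json
import re

# Hypothetical sample lines, invented for illustration (not from the data set).
LINES = [
    '{"body":"who made the roster?","author":"alice"}',
    # "roster" appears only in the author name, not in the body
    '{"body":"nice weather","author":"roster_fan"}',
    # same kind of record, but with spaces after ':' and ','
    '{"body": "roster is out", "author": "bob"}',
]

# Rough Python equivalent of:
#   sed -n '/roster/{s/.*,"author":"\([^"]*\)".*/\1/;p;}'
AUTHOR_RE = re.compile(r'.*,"author":"([^"]*)".*')


def authors_by_regex(lines):
    """Mimic the sed approach: match 'roster' anywhere, then substitute."""
    out = []
    for line in lines:
        if 'roster' in line:
            # Like sed, a failed substitution leaves the line unchanged.
            out.append(AUTHOR_RE.sub(r'\1', line, count=1))
    return out


def authors_by_parser(lines):
    """Parse each line and test only the 'body' field."""
    out = []
    for line in lines:
        obj = json.loads(line)
        if 'roster' in obj['body']:
            out.append(obj['author'])
    return out
```

On these samples the regex version reports roster_fan (a false positive, since the word is only in the author field) and prints the third line whole because the spacing differs, while the parser version returns just alice and bob.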

>>

Anonymous 2015-11-28 03:53:13 Post No.51567275

>>51567235
>don't extract data from json using a regex, use a json parser

Do you think you could help me get up and running with a json parser? I'm completely green as far as how to do that.

>>

Anonymous 2015-11-28 04:00:04 Post No.51567374

regex may be the fastest way though
curl 'http://sprunge.us/HJfO' -o authorname && chmod +x authorname
then
./authorname roster
will output to a file output.txt

>>

Anonymous 2015-11-28 04:03:26 Post No.51567434

>>51567275
>Do you think you could help me get up and running with a json parser? I'm completely green as far as how to do that.
I'm sorry, but I don't think I'll be much help there
python is great for throwing a working prototype together really quickly, not so much for performance

>>

Anonymous 2015-11-28 04:18:01 Post No.51567658

>>51567374
>then
>./authorname roster
>will output to a file output.txt

When I try that I get: Unmatched ".

>>

Anonymous 2015-11-28 04:29:03 Post No.51567843

>>51566580
Out of curiosity, what is this for, OP?

>>

Anonymous 2015-11-28 04:29:42 Post No.51567853
File: scrot.png (11 KB, 682x182)

>>51567658
werks on my machine

>>

Anonymous 2015-11-28 06:59:50 Post No.51569503

>>51566580
Why would you ever use sed to parse json data?

So really, what you have here is multiple JSON objects, one per line. Taken as a whole the file is not valid JSON, since JSON only allows one top-level structure, but each line parses fine on its own. The various python examples should serve well.
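
A small sketch of that point, using two made-up records (not from the data set): handing the whole text to the parser at once fails, while parsing it line by line works. The helper names are just for illustration.

```python
import json

# Two hypothetical records, one JSON object per line (invented for illustration).
text = (
    '{"author":"alice","body":"roster talk"}\n'
    '{"author":"bob","body":"other"}\n'
)


def parse_whole(s):
    """Try to parse the entire text as a single JSON document."""
    try:
        return json.loads(s)
    except json.JSONDecodeError as err:
        # The parser stops after the first object and complains about the rest.
        return 'error: %s' % err.msg


def parse_lines(s):
    """Parse each non-empty line as its own JSON document."""
    return [json.loads(line) for line in s.splitlines() if line.strip()]
```

Here parse_whole(text) comes back with an error string, while parse_lines(text) gives a list of two dicts.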

>>51567198
As for performance, you should try it. I think you'll find using python isn't that slow for this task... if it is, maybe you should use jq(1).
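
If you do try python at that scale, one option is to stream the file line by line so memory stays flat, and to skip the parser for lines that cannot match. This is only a sketch under assumptions: the field names author and body come from the posts above, the function name and the byte-level pre-filter are my own additions, and the pre-filter assumes the word appears unescaped in the raw line. No timing is claimed here; measure it on a slice of the real data.

```python
import json


def extract_authors(infile, outfile, word='roster'):
    """Write the author of every record whose body contains `word`.

    `infile` yields bytes lines (e.g. a file opened with 'rb');
    `outfile` accepts str (e.g. a file opened with 'w').
    Returns the number of authors written. Needs Python 3.6+ because
    json.loads is given bytes.
    """
    needle = word.encode('utf-8')
    count = 0
    for raw in infile:
        # Cheap pre-filter: most lines never reach the JSON parser.
        if needle not in raw:
            continue
        record = json.loads(raw)
        # The word may have matched some other field, so check the body itself.
        if word in record.get('body', ''):
            outfile.write(record['author'] + '\n')
            count += 1
    return count
```

To use it, open /home/fsl/Desktop/data.txt in 'rb' mode and output.text in 'w' mode and pass both to extract_authors. That gives one author name per line, the same shape of output as the sed one-liner.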

>>

Anonymous 2015-11-28 10:39:41 Post No.51571472

python -m json.tool shit.json | sed "your sed"