I need some help with sed. I have a txt file with some sample json data (https://github.com/jmportilla/Reddit-Data-Science-Project/blob/master/my_sample). I want to use sed to find the word "roster" (line 82 in the data set), then locate the author name (in this case "Dmagers") and output it to a file output.text, then jump to the next line and continue looking for more instances of the word "roster" and repeat the above (locate and write out author name) until it reaches the end.
If I run the following in a terminal:
cat /home/fsl/Desktop/data.txt|sed '/body":.*roster[&"]*/!d;s/.*name":"\([^"]*\).*/\1/'
I get the error "sed: No Match" despite "roster" being there in line 82 in the data set. Can anyone help me figure out why it isn't working?
Pic related
>>51566580
bump
Pls /g/, are you there?
You're my only hope.
help help
>>51566750
>>51566580
not sed, but python
this works
# the file holds one JSON object per line, so parse it line by line (assumes UTF-8)
with open('C:\\Users\\user\\Downloads\\my_sample', encoding='utf-8') as infile:
    for line in infile:
        j = json.loads(line)
        if 'roster' in j['body']:
            print('AUTHOR:\n%s\nPOST:\n%s\n' % (j['author'], j['body']))
>>51566848
picture
I forgot, you also need to add this line at the top:
import json
>>51566865
>>51566848
Thanks, you are a saint! Trying it now!
>>51566980
my bad, you said you wanted to output to output.text (save to a file)
this should work, with no changes needed
just type 'python' into your terminal (no quotes), and then copy/paste this
import json
# Python 3; reads one JSON object per line and assumes the file is UTF-8
with open('/home/fsl/Desktop/data.txt', encoding='utf-8') as infile, open('output.text', 'w', encoding='utf-8') as outfile:
    for line in infile:
        j = json.loads(line)
        if 'roster' in j['body']:
            outfile.write('AUTHOR:\n%s\nPOST:\n%s\n' % (j['author'], j['body']))
this should then produce a text file like this
>>51567045
How fast would you say this is compared to this:
sed -n '/roster/{s/.*,"author":"\([^"]*\)".*/\1/;p;}' /home/fsl/Desktop/data.txt
I have ~1.7 billion lines to run through.
>>51567198
I honestly have no idea, but I'd assume that the sed version would be significantly faster
python is extremely slow compared to C (which is what sed is written in)
but either way, sed is the wrong tool for the job
I see this:
>\([^"]*\)
which is a regular expression
don't extract data from json using a regex, use a json parser
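To show what can go wrong, here is a minimal sketch. The three records are made up for illustration (they are not from OP's data set), and the pattern is only a rough Python equivalent of the sed command quoted above. The regex version reports a false positive when "roster" shows up outside the body, and misses the author when the key order or spacing changes, even though the line is still valid JSON.

```python
import json
import re

# Made-up records for illustration; they are not taken from OP's data set.
lines = [
    # "roster" is in the body: should match.
    '{"body":"the roster is out","subreddit":"nba","author":"Dmagers"}',
    # "roster" only appears in the subreddit name: should NOT match.
    '{"body":"nice goal","subreddit":"rosterbuilders","author":"someone"}',
    # Same post as the first record, but "author" comes first and there is
    # whitespace after the colons, which is still valid JSON.
    '{"author": "Dmagers", "body": "the roster is out"}',
]

# Rough Python equivalent of the sed pattern from the thread.
pattern = re.compile(r'.*,"author":"([^"]*)".*')


def authors_by_regex(lines):
    found = []
    for line in lines:
        if "roster" in line:  # matches anywhere in the line, like /roster/
            m = pattern.match(line)
            found.append(m.group(1) if m else None)
    return found


def authors_by_parser(lines):
    found = []
    for line in lines:
        record = json.loads(line)
        if "roster" in record["body"]:  # only looks at the body field
            found.append(record["author"])
    return found


regex_result = authors_by_regex(lines)
parser_result = authors_by_parser(lines)
print(regex_result)   # ['Dmagers', 'someone', None]
print(parser_result)  # ['Dmagers', 'Dmagers']
```

Whether cases like these actually occur in the real data set is something you would have to check; the parser version simply doesn't depend on it.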
>>51567235
>don't extract data from json using a regex, use a json parser
Do you think you could help me get up and running with a json parser? I'm completely green as far as how to do that.
regex may be the fastest way though
curl 'http://sprunge.us/HJfO' -o authorname && chmod +x authorname
then
./authorname roster
will output to a file output.txt
>>51567275
>Do you think you could help me get up and running with a json parser? I'm completely green as far as how to do that.
I'm sorry, but I don't think I'll be much help there
python is great for throwing a working prototype together really quickly, not so much for performance
>>51567374
>then
>./authorname roster
>will output to a file output.txt
When I try that I get: Unmatched ".
>>51566580
Out of curiosity, what is this for, OP?
>>51567658
werks on my machine
>>51566580
Why would you ever use sed to parse json data?
So really, what you have here is multiple JSON objects, one per line. Taken as a whole that is not a valid JSON file, since JSON only allows one top-level value, but each line parses fine on its own. The various python examples should serve well.
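A small sketch of what the line-separated layout means in practice (the two records are invented for the example): handing the whole text to the parser at once fails, while parsing each line separately works.

```python
import json

# Two made-up records, one JSON object per line.
text = '{"author":"a","body":"first"}\n{"author":"b","body":"second"}\n'

# Parsing the whole text as a single JSON document fails, because there
# is more than one top-level value.
try:
    json.loads(text)
    whole_file_ok = True
except json.JSONDecodeError:
    whole_file_ok = False

# Parsing each non-empty line on its own works.
records = [json.loads(line) for line in text.splitlines() if line.strip()]

print(whole_file_ok)                    # False
print([r["author"] for r in records])   # ['a', 'b']
```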
>>51567198
As for performance, you should try it. I think you'll find using python isn't that slow for this task... if it is, maybe you should use jq(1).
python -m json.tool shit.json | sed "your sed"
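If you do want to try the python route on the full data set, something along these lines may be worth timing. It is only a sketch: the function name write_authors and the sample records are made up, and it assumes the search word is plain ASCII so it appears unescaped in the raw line. It reads one line at a time instead of loading the whole file, and only calls the JSON parser on lines that contain the word at all.

```python
import io
import json


def write_authors(infile, outfile, word="roster"):
    """Stream line-delimited JSON and write matching authors, one per line."""
    count = 0
    for line in infile:
        # Cheap substring test first, so most lines never reach the parser.
        if word not in line:
            continue
        record = json.loads(line)
        # Confirm the word is in the body, not in some other field.
        if word in record.get("body", ""):
            outfile.write(record["author"] + "\n")
            count += 1
    return count


# Demo on in-memory data (made-up records). For a real run, pass file
# objects opened with open(path, encoding="utf-8") instead.
sample = io.StringIO(
    '{"author":"Dmagers","body":"the roster is out"}\n'
    '{"author":"x","body":"nothing here"}\n'
    '{"author":"y","body":"no match","subreddit":"roster"}\n'
)
out = io.StringIO()
n = write_authors(sample, out)
print(n, repr(out.getvalue()))  # 1 'Dmagers\n'
```

How this compares to sed or jq on 1.7 billion lines is something only a test run on a slice of the real file will tell you.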