Some processing of log files


As you may know, this blog is a big pile of more or less unedited things that I found useful.

Here are some commands I've used to process access logs from our web servers.



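requests per IP, most active first (skipping CloudFront and utm_source lines)
cat access_log_* | sed /CloudFront/d | sed /utm_source/d | grep -F ".se" | grep "single" | cut -f 3 -d " " | sort | uniq -c | sort -rn  > crawlers4.txt

what a single IP has been requesting
cat allaccess.txt | grep -F "87.222.222.111"| cut -f 3,9 -d " " | sort | uniq -c | sort -rn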
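
To see what the counting part of these pipelines does, here is a minimal sketch on made-up data. The sample lines and the file name sample_log.txt are assumptions for illustration only; the one thing carried over from the commands above is that the client IP is the third space-separated field.

```shell
# Hypothetical sample data: the client IP is the third space-separated field,
# matching the "cut -f 3" used above. Real log formats will differ.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample_log.txt" <<'EOF'
host1 - 10.0.0.1 GET /single/a.se
host1 - 10.0.0.2 GET /single/b.se
host1 - 10.0.0.1 GET /single/c.se
host1 - 10.0.0.1 GET /single/d.se
host1 - 10.0.0.2 GET /single/e.se
host1 - 10.0.0.3 GET /single/f.se
EOF

# Count requests per IP, most active first.
# sort -rn compares the counts as numbers; a plain sort -r compares them
# as text, so a count of 9 would be listed before a count of 10.
top_ips=$(cut -f 3 -d " " "$tmpdir/sample_log.txt" \
    | sort | uniq -c | sort -rn \
    | awk '{print $2 ":" $1}')   # normalise to ip:count
printf '%s\n' "$top_ips"
rm -rf "$tmpdir"
```

The awk step is only there to get rid of the padding that uniq -c puts in front of the counts.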


intersection between two files
grep -Fx -f file1 file2

faster, but both files must already be sorted
comm -12 file1 file2
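
A small sketch of the two approaches side by side, on made-up contents (the file names and IPs are placeholders). It is meant to show that grep is happy with unsorted input while comm wants both files sorted in the same collation order.

```shell
# Hypothetical file names and contents, for illustration only.
tmpdir=$(mktemp -d)
printf '%s\n' 10.0.0.3 10.0.0.1 10.0.0.2 > "$tmpdir/file1"
printf '%s\n' 10.0.0.2 10.0.0.4 10.0.0.3 > "$tmpdir/file2"

# grep works on unsorted input; the output follows the order of file2.
grep_common=$(grep -Fx -f "$tmpdir/file1" "$tmpdir/file2")

# comm needs sorted input, so sort first.
# LC_ALL=C keeps sort and comm agreeing on the ordering.
LC_ALL=C sort "$tmpdir/file1" > "$tmpdir/file1.sorted"
LC_ALL=C sort "$tmpdir/file2" > "$tmpdir/file2.sorted"
comm_common=$(LC_ALL=C comm -12 "$tmpdir/file1.sorted" "$tmpdir/file2.sorted")

printf 'grep:\n%s\ncomm:\n%s\n' "$grep_common" "$comm_common"
rm -rf "$tmpdir"
```

Since the lists further down are all written with sort -u, they should be usable with comm as they are, provided the locale is the same when sorting and comparing.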


complement (lines that are in a but not in b), both inputs must be sorted
comm -2 -3 ips_all_campaigns.txt ips_utmsource.txt
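
A tiny worked example of the complement, assuming two made-up sorted lists called a.txt and b.txt:

```shell
tmpdir=$(mktemp -d)
# Made-up contents; both lists are already sorted, as comm requires.
printf '%s\n' 10.0.0.1 10.0.0.2 10.0.0.3 > "$tmpdir/a.txt"
printf '%s\n' 10.0.0.2 10.0.0.4 > "$tmpdir/b.txt"

# -2 suppresses lines that are only in b, -3 suppresses lines in both,
# which leaves column 1: lines that are only in a.
only_in_a=$(LC_ALL=C comm -2 -3 "$tmpdir/a.txt" "$tmpdir/b.txt")
printf '%s\n' "$only_in_a"
rm -rf "$tmpdir"
```

On this sample it prints 10.0.0.1 and 10.0.0.3.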



cat access_log_* | grep "CloudFront"| cut -f 3 -d " " | sort -u > ips_cloudfront.txt
cat access_log_* | grep "utm_source"| cut -f 3 -d " " | sort -u > ips_utmsource.txt
cat access_log_* | grep "utm_name"| cut -f 3 -d " " | sort -u > ips_utmname.txt
cat access_log_* | grep "kampagner"| cut -f 3 -d " " | sort -u > ips_kampagner.txt
cat access_log_* | grep "campaigns"| cut -f 3 -d " " | sort -u > ips_campaigns.txt
cat access_log_* | grep "/public/javascripts"| cut -f 3 -d " " | sort -u > ips_downloaded_javascript.txt
cat access_log_* | grep "/profile/track/"| cut -f 3 -d " " | sort -u > ips_profile_track.txt


cat ips_kampagner.txt ips_campaigns.txt > all_campaigns.txt
cat all_campaigns.txt | sort -u > ips_all_campaigns.txt
comm -2 -3 <(cat ips_all_campaigns.txt) <(cat ips_utmsource.txt) > 1.txt   # those that do not have a utm_source
comm -2 -3 <(cat 1.txt) <(cat ips_cloudfront.txt) > 2.txt   # those that are not CloudFront
comm -2 -3 <(cat 2.txt) <(cat ips_utmname.txt) > 3.txt   # those that do not have a utm_name
comm -2 -3 <(cat 3.txt) <(cat ips_profile_track.txt) > 4.txt   # drop the ones that actually got tracked, keep the rest
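
The chain of subtractions above could also be written as a loop. This is only a sketch: subtract_lists is a hypothetical helper name, not something from the original commands, and it assumes every file was sorted the same way (for example with LC_ALL=C sort -u).

```shell
# subtract_lists BASE EXCLUDE...   (hypothetical helper)
# Prints the lines of BASE that appear in none of the EXCLUDE files.
# All files are assumed to be sorted in the C locale.
subtract_lists() {
    workdir=$(mktemp -d)
    cp "$1" "$workdir/current"
    shift
    for exclude in "$@"; do
        # Keep only the lines that are in "current" but not in "$exclude".
        LC_ALL=C comm -2 -3 "$workdir/current" "$exclude" > "$workdir/next"
        mv "$workdir/next" "$workdir/current"
    done
    cat "$workdir/current"
    rm -rf "$workdir"
}

# Demo on made-up lists.
demo=$(mktemp -d)
printf '%s\n' 10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.4 > "$demo/all"
printf '%s\n' 10.0.0.2 > "$demo/utm"
printf '%s\n' 10.0.0.4 10.0.0.9 > "$demo/cdn"
remaining=$(subtract_lists "$demo/all" "$demo/utm" "$demo/cdn")
printf '%s\n' "$remaining"
rm -rf "$demo"
```

With the real files this would be something like: subtract_lists ips_all_campaigns.txt ips_utmsource.txt ips_cloudfront.txt ips_utmname.txt ips_profile_track.txt > 4.txt, which skips the intermediate 1.txt, 2.txt and 3.txt.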


sed /campaigns_ajax/d allaccess.txt > allnoajax.txt

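all log lines for the remaining IPs
grep -F -f 4.txt allaccess.txt > allmatchingloglines.txt

same thing without the ajax requests (note that this overwrites the file from the previous command)
grep -F -f 4.txt allnoajax.txt > allmatchingloglines.txt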
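
One thing to watch out for with grep -F -f: the patterns are matched as substrings, so an IP like 10.0.0.1 also matches lines containing 10.0.0.12 or 110.0.0.1. A possible fix is adding -w, which only accepts matches that are not directly surrounded by letters, digits or underscores. The sketch below uses made-up log lines and file names to show the difference.

```shell
tmpdir=$(mktemp -d)
# Made-up IP list and log lines, for illustration only.
printf '%s\n' 10.0.0.1 > "$tmpdir/ips.txt"
cat > "$tmpdir/log.txt" <<'EOF'
host1 - 10.0.0.1 GET /a
host1 - 10.0.0.12 GET /b
host1 - 110.0.0.1 GET /c
EOF

# Substring matching: all three lines contain the text "10.0.0.1".
loose=$(grep -c -F -f "$tmpdir/ips.txt" "$tmpdir/log.txt")

# Whole-word matching: only the line with exactly 10.0.0.1 is counted.
strict=$(grep -c -F -w -f "$tmpdir/ips.txt" "$tmpdir/log.txt")

printf 'without -w: %s lines, with -w: %s lines\n' "$loose" "$strict"
rm -rf "$tmpdir"
```

Whether the extra matches matter depends on the logs; with a short IP list they may never show up, but they can quietly inflate the counts.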

cat allmatchingloglines.txt | cut -f 3 -d " " | sort | uniq -c | sort -rn > potentialcrawler.txt
