Useful Code

Feb 27, 2020 | 3 minutes read

Most of the code I google already shows purple links when the page loads, so I figured I’d just make a page where I can bookmark the useful code snippets I use all the time.


import pandas as pd

# Read .csv with a different delimiter (multi-character separators need engine='python')
df = pd.read_csv('../output.csv', delimiter=';;', engine='python')

# Combining dataframes when the columns are the same (from many days of scraping, for instance)
import glob
import os

def get_data():
    path = ''
    all_files = glob.glob(os.path.join(path, "../data/*.csv"))
    li = []
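    for filename in all_files:
        df = pd.read_csv(filename, delimiter=';;', engine='python')
        li.append(df)               # collect each file's dataframe
    df = pd.concat(li, axis=0)      # stack the rows of all files
    df = df.reset_index()           # the old per-file index is kept in an "index" column
    return df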

For string functions on pandas dataframes, tutorialspoint has a useful article.
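
As a quick illustration of the kind of string functions meant here, a minimal sketch using the .str accessor. The column names and values are made up for the example.

```python
import pandas as pd

# Hypothetical data, only for illustration
df = pd.DataFrame({"title": ["  Data Analyst ", "software ENGINEER", None]})

# String methods under .str run on the whole column and pass missing values through
df["title_clean"] = df["title"].str.strip().str.lower()

# na=False makes missing values count as "no match" instead of staying missing
df["is_engineer"] = df["title_clean"].str.contains("engineer", na=False)
```

Chaining .str calls like this avoids writing a loop or an apply with a lambda for simple cleanup.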

Big Query

import pandas_gbq

def read_query(query):
    return pandas_gbq.read_gbq(query)

sql_query = f"""

read_query(sql_query) # returns a dataframe

Google Drive

To pull data from a Google Sheet, you need to set up the Google Drive API.

Cron Jobs

Note: if you are able to run a nontrivial python script using cron and a .sh file in less than 4 hours, you can rest assured that you are faster than me xD

Cron jobs allow you to set up jobs to run on your machine on a particular schedule. Some important lessons I’ve learned when using cron jobs:

  • Sometimes the version of Python that cron runs is not the same as the version of Python you are using elsewhere.
---- bash / zshell ----

touch 		# Creates the file ""
chmod +x 	# Adds the "executable" option to
./ 			# Executes the script ""
---- ----
echo "========================================================================================"
. ~/.bash_profile
source env/bin/activate 

---- ----

from datetime import datetime						
time = datetime.now().strftime("%m-%d_%H:%M%p")		# gets the current time and converts it to a string (strftime == "string from time")
print(f"*** Testing Cron Job - {time} ***") 		# Prints to console
---- bash / zshell ----

crontab -e # access crontab in vim editor
* * * * * cd /Users/jakeanderson/research/usa_jobs/ && ./ >> tmp/test.log 2>&1
#* * * * * cd /Users/jakeanderson/research/usa_jobs/ && ./ >> tmp/usa_jobs.log 2>&1
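
The five stars at the start of a crontab line are the schedule: minute, hour, day of month, month, and day of week, so five stars means "every minute". Here is a small sketch in Python that splits a crontab line into those named fields, just to make the layout explicit. The helper name split_cron_line and the example entry are my own, and it does not handle shortcuts like @reboot.

```python
# The five schedule fields of a crontab entry, in order
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")

def split_cron_line(line):
    """Split a crontab entry into its schedule fields and the command to run."""
    # Split on whitespace at most five times: five fields, then the rest is the command
    parts = line.split(None, 5)
    if len(parts) < 6:
        raise ValueError("expected five schedule fields followed by a command")
    schedule = dict(zip(CRON_FIELDS, parts[:5]))
    return schedule, parts[5]

# Hypothetical entry: every 15 minutes, 9am to 5pm, Monday to Friday
schedule, command = split_cron_line("*/15 9-17 * * 1-5 echo hello >> tmp/test.log 2>&1")
```

Reading the fields by name is an easy way to double check a schedule before saving it in crontab.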
---- bash / zshell ----

python3 -m venv env					# make a new virtual environment
source env/bin/activate				# activate the new virtual environment
pipreqs .							# create the requirements.txt file
pip install -r requirements.txt 	# install all dependencies in virtual environment
deactivate 							# Exit the virtual environment

The Python interpreter that cron runs might also look for installed packages in a different place than the one you use in your terminal.

pip install pandas
Requirement already satisfied: pandas in /Users/jakeanderson/.pyenv/versions/3.7.6/lib/python3.7/site-packages (0.24.2)

So I need to make sure my cron job is using the correct site packages located in /Users/jakeanderson/.pyenv/versions/3.7.6/lib/python3.7/site-packages
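
One way to check this is to have the script report which interpreter is running it and where that interpreter looks for packages, then compare the output from your terminal with the output in the cron log. A minimal sketch, using only the standard library; the function name describe_interpreter is my own.

```python
import sys

def describe_interpreter():
    """Collect where this Python lives and where it looks for packages."""
    return {
        "executable": sys.executable,          # path of the running interpreter
        "version": sys.version.split()[0],     # e.g. the "3.x.y" part of the version string
        "search_path": list(sys.path),         # directories searched on import
    }

if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If the executable or search path printed under cron differs from what you see in your terminal, the cron job is not using the environment you think it is.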

You can generate the requirements for a project directory using pipreqs, which looks at the imports in the code!

pipreqs . 
INFO: Successfully saved requirements file in ./requirements.txt


Replacing all instances of “foo” with “bar” in all files in the current directory and all subdirectories (zsh glob syntax)

sed -i -- 's/foo/bar/g' **/*(D.)
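
If you are not in zsh, roughly the same thing can be sketched in Python. This is my own rough equivalent, not a drop-in replacement: the function name replace_in_tree is made up, it does a literal string replacement rather than a sed regex, and it skips files that are not valid UTF-8.

```python
from pathlib import Path

def replace_in_tree(root, old, new):
    """Replace `old` with `new` in every regular file under `root`; return changed paths."""
    changed = []
    # rglob("*") walks all subdirectories and includes dotfiles
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip binary or non-UTF-8 files
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed.append(path)
    return changed
```

Like the sed version, this rewrites files in place (including anything in hidden folders), so it is worth trying on a copy first.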

Merging PDFs on mac

"/System/Library/Automator/Combine PDF Pages.action/Contents/Resources/" -o PATH/TO/YOUR/MERGED/FILE.pdf /PATH/TO/ORIGINAL/1.pdf /PATH/TO/ANOTHER/2.pdf /PATH/TO/A/WHOLE/DIR/*.pdf

Useful Aliases / Functions

# I use Sublime Text, so this opens any file in Sublime
alias sublime="open -a /Applications/Sublime\"

# tublime means "touch and sublime" for files I want to open in sublime that currently don't exist
function tublime { touch "$@"; sublime "$@"; }

# opens the zshell config file
alias zshconfig="sublime ~/.zshrc"

# Searches all files in the current directory (including sub-directories) for the input text
function grepme { grep -rnw . -e "$@"; }

# I keep my notes about my job @ Moloco in a folder and want to open them all at once
alias notes='cd ~jakeanderson/Desktop/moloco_notes && sublime *.md'

# I use the mlh zshell theme from oh-my-zsh