Dev and Admin Tools

From awctech wiki

Git Basics

New repo

If a new repo was created in the web UI, to add local files to the new project:

  • git init creates a new local git repo in the current directory
  • git remote add origin [remote repository URL] sets the new remote repository
  • git remote -v verifies the new remote URL
  • git push origin master pushes local repo changes to the remote repo you specified as origin

File editing

working_dir --add--> index --commit--> head --push--> source_repo

  • git pull pulls changes from the remote repo and merges any pulled commits into your current local branch; also consider git fetch
  • git add file.txt stages an edited file to the local git Index
  • git checkout -- file.txt reverts a local change made in the working directory (does not apply to the Index) to what is in Head
  • git rm file.txt removes a file from the working directory and the local git Index
  • git status shows the contents of the local git Index
  • git commit -m "Commit Message" commits all staged changes to the local git Head
  • git push origin master pushes all commits to the source repo

Partial changes

From stackoverflow: git checkout -p lets you choose individual hunks from the diff between your working copy and index to revert. git add -p allows you to choose hunks to add to the index. git reset -p allows you to choose individual hunks from the diff between the index and HEAD to back out of the index.

Edit another user's repo

Example workflow for editing another user's repo:

  • Fork other_user's repo in UI and clone locally: local repo <----> remote forked repo (origin) <----> other_user's repo (upstream)
  • git remote add upstream [other_user's repo url] creates an upstream remote pointing to the original repo (note that remotes point to the location of repos, not branches)
  • git fetch upstream pull a cache of upstream
  • git checkout -b FEATURE_X create a new branch named "FEATURE_X" and switch to it
  • git push origin FEATURE_X update locally and push to remote forked repo the local branch FEATURE_X
  • Create pull request in UI to upstream (other_user/master), assume accepted/merged
  • git checkout master switch back to local master
  • git pull --rebase upstream master update local master with other_user's master
  • git push origin master push local master to remote repository


Basics of Branching/Merging

Delete branch locally, delete branch on remote, locally prune remote-tracking branches no longer on remote:

$ git branch --delete my_branch
$ git push --delete origin my_branch
$ git pull --prune


Simple guide:

Check the git config: git config --list

Check who made edits on each line of a file: git blame file.txt

Alternate Pip Workflow

This pip workflow lets you distinguish packages added by the developer from their dependencies.

In requirements-dev.txt, put the first line as ## Packages added by developer followed by all pip packages you add:

$ cat requirements-dev.txt
## Packages added by developer

When you are ready to freeze current versions, run the following:

$ pip freeze -r requirements-dev.txt > requirements.txt
$ cat requirements.txt
## Packages added by developer
## The following requirements were added by pip freeze:

When you deploy your project, pip install from requirements.txt

$ pip install -r requirements.txt

Tmux basics

  • From bash: tmux to start session, tmux attach-session -d to reload session
  • Within tmux press ctrl-b then [key combos] to issue a tmux command
  • % " [arrow_keys] to split and navigate screens. Hold down ctrl-b with [arrow_keys] to resize
  • z goes to and returns from full screen
  • c for new desktop, [comma] to rename, [num_keys] to navigate desktops
  • [enter tilde period] key combo disconnects the ssh session and keeps the tmux layout; a bash exit will lose the tmux layout
  • [ goes into scroll mode, esc to exit
  • { moves current window around other panes


Unicode is a way of representing each character, across all world languages, as a Unicode code point (Unicode itself is not an encoding).

Dog in Unicode is: U+0044 U+006F U+0067
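A quick way to see this mapping from Python, using the built-in ord() and chr() (illustrative sketch, not from the original page):

```python
# ord() gives a character's Unicode code point; chr() goes the other way.
points = [ord(ch) for ch in "Dog"]
for ch, cp in zip("Dog", points):
    print(f"{ch} -> U+{cp:04X}")   # D -> U+0044, o -> U+006F, g -> U+0067
```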

In Python 3 the default str type is Unicode text

>>> string = 'Café ZerФ'
>>> type(string)
<class 'str'>
>>> string.encode("unicode_escape")
b'Caf\\xe9 Zer\\u0424'

The special characters ending each word are U+00E9 and U+0424. Because U+00E9 can be represented in one byte, Python displays it as \xe9. Whether the escape starts with \x (for hex) or \u (for Unicode), both are Unicode code points in hexadecimal.

For text to be stored in memory or transmitted, it's encoded into bytes. UTF-8 is one very common encoding that can store any Unicode code point. Code points 0-127 are stored in a single byte; code points 128 and above take multiple bytes (up to four). UTF-8 is a variable-length encoding. In Python 3 UTF-8 is the default, so .encode() is the same as .encode('utf-8')
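The variable-length behavior is easy to confirm in Python; the sample characters below are arbitrary choices for illustration:

```python
# UTF-8 byte lengths grow with the code point value.
lengths = {ch: len(ch.encode("utf-8")) for ch in ["A", "é", "Ф", "€", "🐕"]}
for ch, n in lengths.items():
    print(f"U+{ord(ch):04X} -> {n} byte(s)")
```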

>>> byte_string = string.encode()
>>> print(byte_string)
b'Caf\xc3\xa9 Zer\xd0\xa4'
>>> type(byte_string)
<class 'bytes'>

The escape codes \xc3\xa9 show the hexadecimal bytes for é. byte_string can now be transmitted over the network.

When getting data over the network, it will be in bytes, and needs to be decoded into Unicode.

>>> data = socket.recv(8022)
>>> type(data)
<class 'bytes'>
>>> x = data.decode()
>>> type(x)
<class 'str'>
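The full encode/decode round trip, independent of any socket (same sample string as above):

```python
# str -> bytes for transmission, bytes -> str on the receiving side.
text = 'Café ZerФ'
wire = text.encode("utf-8")
roundtrip = wire.decode("utf-8")
print(wire)        # b'Caf\xc3\xa9 Zer\xd0\xa4'
print(roundtrip)
```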

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

What every programmer absolutely, positively needs to know about encodings and character sets to work with text

Apache notes

  • a VirtualHost block in httpd.conf allows server config specific to a subdomain (or even separate web sites / IPs)
  • a Directory block in httpd.conf allows server config specific to a directory
  • .htaccess files are distributed configuration files (sitting alongside .html files) that perform the same task as the Directory block in httpd.conf, but more slowly, since they are loaded on each page request. They should only be used when there is no access to the Apache server config.

Selenium Python

API to write automated Selenium WebDriver browser tests with Python

Install pip:

Full Selenium Python documentation:

Saucelabs can run Selenium scripts on their wide variety of OS/mobile/browser setups:


Helpful Links

General Links



Docker

To avoid needing sudo for docker commands, check whether the docker group exists, add the current user to the docker group, then log out and back in:

$ cat /etc/group | grep docker
$ sudo usermod -aG docker $USER

Basic Commands

Show local docker images, list running containers, list all containers

$ docker images
$ docker ps
$ docker ps -a

Create container from image and run command, create container from image and open terminal

$ docker run -t ubuntu pwd
$ docker run -it ubuntu /bin/bash

Create container from image and open terminal (give it a name, and tell it to remove container on exit) see:

$ docker run --name ubuntu_bash --rm -i -t ubuntu bash

Open terminal on running container with name ubuntu_bash

$ docker exec -it ubuntu_bash bash

Other Commands


Get info/config/mounted volumes via inspect

$ docker inspect <docker_container_id or name>

Show what a container prints to STDOUT and STDERR

$ docker logs <docker_container_id or name>

Show processes running on container

$ docker top <docker_container_id or name>



PostgreSQL

Connect to the postgres command line, connect and open a database, or switch to the postgres user and connect (psql command line docs):

$ psql
$ psql my_database
$ sudo -u postgres psql

A few commands inside the psql command line:

  • \l List databases
  • \c my_database Connect to database
  • \dt List tables in current database, \dt+ gives size
  • \d my_table List data types of columns in table
  • SELECT * FROM my_table; Run SQL statements by entering them individually or in succession and terminate with semicolon to execute query
  • \q Quit psql
  • DROP DATABASE my_database; Delete database
  • \H Turns on/off HTML table query output format
  • SELECT * FROM my_table \watch Display and continually refresh query results

Database dump

 $ pg_dump my_database > /home/user/db-dump.sql

View Queries via Log

See queries in real time by changing log level:


DbMaintain

Mark error script performed

"Task that indicates that the failed script was manually performed. The script will NOT be run again in the next update. No scripts will be executed by this task."

If, during a database update, a script partially executes its changes before hitting an error, you can fix the script, manually execute the rest of the SQL in it, and tell DbMaintain the script has been performed:

$ ./ markErrorScriptPerformed scripts/01_v1.0/01_products_and_orders.sql 

Mark the database as up-to-date

"This operation updates the state of the database to indicate that all scripts have been executed, without actually executing them. This can be useful when you want to start using DbMaintain on an existing database, or after having fixed a problem directly on the database."

In the dbmaintain_scripts table it updates the executed_at column to the current datetime for all scripts (undesirable).

Can be used if there is an "irregular script update" exception, but it is better to fix the script (change its name, update the md5 checksum in the database if the content is the same, etc.) and then update the database.

Can be used if the database scripts were renamed after an update and DbMaintain returns "Renamed indexed script", but it may be better to mark the error script performed.

Mark error script reverted

"Task that indicates that the failed script was manually reverted. The script will be run again in the next update. No scripts will be executed by this task."

Also you can fix script, manually rollback changes, and use markErrorScriptReverted to tell dbmaintain the script can be re-run on database update.

See more DbMaintain operations: path/to/scriptFolderOrArchive

SQL Basics

Good introduction text:

SELECT Examples

Every SELECT statement on source table(s) yields a result table:

sales=> SELECT loc_id, part_id, quant AS amount FROM inventory;
 loc_id | part_id | amount
--------+---------+--------
      1 |       1 |     17
      2 |       1 |     17

SELECT examples

SELECT * FROM animals LIMIT 10;
SELECT * FROM animals WHERE species = 'orangutan' ORDER BY birthdate;
SELECT MAX(name) FROM animals;
SELECT name, loc FROM animals WHERE species = 'orangutan' ORDER BY birthdate DESC;
SELECT species, MIN(birthdate) FROM animals GROUP BY species;

GROUP BY makes the aggregate happen not over all rows, but over each set of rows sharing a common value. Use OFFSET for pagination. DESC makes the ORDER BY sort descending.

SELECT name, COUNT(*) AS num FROM animals
  WHERE species != 'armadillo'
  GROUP BY name
  LIMIT 10 OFFSET 20;

HAVING filters the result table; WHERE filters the source table (and is applied before GROUP BY)

SELECT species, COUNT(*) AS num
  FROM animals GROUP BY species HAVING COUNT(*) = 1;

(PostgreSQL does not allow the num output alias inside HAVING, so the aggregate is repeated.)
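The WHERE-before-grouping versus HAVING-after-grouping distinction can be checked from Python with the stdlib sqlite3 module (the table and rows here are invented for illustration; sqlite is used only because it needs no server):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE animals (name TEXT, species TEXT);
    INSERT INTO animals VALUES
        ('Max', 'dog'), ('Rex', 'dog'), ('Tom', 'cat'), ('Zoe', 'gorilla');
""")
# WHERE filters source rows before grouping; HAVING filters the groups.
rows = db.execute("""
    SELECT species, COUNT(*) AS num FROM animals
      WHERE name != 'Zoe'
      GROUP BY species
      HAVING COUNT(*) > 1;
""").fetchall()
print(rows)  # only 'dog' has more than one row left after the WHERE
db.close()
```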

JOIN Examples

Inner vs. Outer Joins

  • Inner join (the default for the JOIN keyword) gives only rows where the tables have entries matching the join condition. An implicit join does not use the JOIN keyword, but still executes an inner join
  • Left outer join (LEFT JOIN) returns the same as above plus rows where the left table has an entry and the right doesn’t (RIGHT JOIN does the opposite)

Implicit vs. Explicit Inner Joins

Implicit join over 2 tables:

SELECT name, animals.species
  FROM animals, diet
  WHERE animals.species = diet.species AND = 'fish';

Explicit join over 2 tables:

SELECT name, animals.species
  FROM animals JOIN diet
    ON animals.species = diet.species
  WHERE = 'fish';

Implicit join over 3 tables:

SELECT, COUNT(*) AS num_of_animals
  FROM animals, taxonomy, ordernames
  WHERE animals.species = AND taxonomy.t_order = ordernames.t_order
  GROUP BY
  ORDER BY num_of_animals DESC;

Explicit join over 3 tables (can be harder to read):

SELECT, count(*) AS num_of_animals
  FROM (animals JOIN taxonomy ON animals.species = AS ani_tax
    JOIN ordernames
    ON ani_tax.t_order = ordernames.t_order
  GROUP BY
  ORDER BY num_of_animals DESC;

Self Join

The query finds members who hold memberships in 2 gyms. For a member of gyms 1 and 2, the condition a.gym_id < b.gym_id prevents displaying 4 rows (1 1, 1 2, 2 1, and 2 2) and displays just one row (1 2).

SELECT a.gym_id, b.gym_id, a.mem_id
  FROM memberships AS a, memberships AS b
  WHERE a.mem_id = b.mem_id
    AND a.gym_id < b.gym_id;
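The same self join can be run against Python's stdlib sqlite3 (table and data invented to match the example):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE memberships (gym_id INTEGER, mem_id INTEGER);
    INSERT INTO memberships VALUES (1, 100), (2, 100), (1, 200);
""")
# Joining the table to itself pairs each member's gyms; a.gym_id < b.gym_id
# keeps one ordered pair per two-gym member instead of four.
rows = db.execute("""
    SELECT a.gym_id, b.gym_id, a.mem_id
      FROM memberships AS a, memberships AS b
      WHERE a.mem_id = b.mem_id
        AND a.gym_id < b.gym_id;
""").fetchall()
print(rows)  # member 100 belongs to gyms 1 and 2
db.close()
```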

Outer Join

Ensures that inventory items without any entries in sales still show (with a count of 0) in the results:

SELECT, COUNT(sales.inv_id) AS sales_count
  FROM inventory LEFT JOIN sales
    ON = sales.inv_id
  GROUP BY;


Create and drop a database

CREATE DATABASE my_database;
DROP DATABASE my_database;

Create a table with a primary key in a single field and in multiple fields. PostgreSQL CREATE TABLE examples:

CREATE TABLE warehouses (
  id            SERIAL PRIMARY KEY,
  description   VARCHAR(40));
CREATE TABLE warehouse_inventory (
  warehouse_id  INTEGER REFERENCES warehouses (id), 
  part_id       INTEGER REFERENCES parts (id), 
  amount        INTEGER,
  PRIMARY KEY (warehouse_id, part_id));

Insert and update

INSERT INTO animals (name, species, birthdate)
  VALUES ('Barker', 'opossum', '2018-04-11'); 
UPDATE posts
  SET content = 'better content'
  WHERE id = 120;

Python DB-API Examples

  • DB-API is a standard for python to talk to database libraries that talk to databases
  • DB-API module libs include: sqlite3, psycopg2 (PostgreSQL), pyodbc (ODBC), mysql.connector
  • DB-API basic steps: connect, create a cursor to pass commands, execute SQL command(s), commit to make changes, fetchall to retrieve SELECT results, close the connection
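A minimal sketch of those steps using the stdlib sqlite3 driver; any DB-API module follows the same shape (the posts table here is invented, and sqlite3 uses ? placeholders where psycopg2 uses %s):

```python
import sqlite3

# connect -> cursor -> execute -> commit -> fetchall -> close
db = sqlite3.connect(":memory:")
curr = db.cursor()
curr.execute("CREATE TABLE posts (content TEXT)")
curr.execute("INSERT INTO posts VALUES (?)", ["hello"])  # safe parameter binding
db.commit()
curr.execute("SELECT content FROM posts")
posts = curr.fetchall()
print(posts)   # [('hello',)]
db.close()
```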

Select example. Note: bleach.clean sanitizes script-injection attacks (script tags inside user text); bleach.linkify converts URLs in text into links with rel="nofollow", which tells crawlers not to follow links posted in your web app.

import psycopg2
import bleach
DBNAME = 'forum'

def get_posts():
    db = psycopg2.connect(database=DBNAME)
    curr = db.cursor()
    curr.execute("SELECT content, time FROM posts ORDER BY time DESC;")
    posts = curr.fetchall()
    db.close()
    return [(bleach.clean(bleach.linkify(post)), time) for (post, time) in posts]

Insert/update example. Note that the "... VALUES (%s, %s)", [var1, var2]) pattern inserts text safely into the query, preventing SQL injection attacks (SQL commands inside user text).

def add_post(content, user_id):
    content = bleach.clean(bleach.linkify(content))
    db = psycopg2.connect(database=DBNAME)
    curr = db.cursor()
    curr.execute("INSERT INTO posts (content) VALUES (%s);", [content])
    # now() assumed as the elided timestamp value
    curr.execute("UPDATE tracking SET last_post_date = now() WHERE user_id = %s;", [user_id])
    db.commit()
    db.close()

Transactions are atomic - each transaction must be successful during a commit or everything is rolled back - "all or nothing".
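A sketch of that all-or-nothing behavior with sqlite3 (accounts table invented; the duplicate primary key forces the error, and rollback undoes the earlier UPDATE in the same transaction):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
db.execute("INSERT INTO accounts VALUES ('alice', 100)")
db.commit()

try:
    db.execute("UPDATE accounts SET balance = balance - 40 WHERE name = 'alice'")
    db.execute("INSERT INTO accounts VALUES ('alice', 0)")  # violates PRIMARY KEY
    db.commit()
except sqlite3.IntegrityError:
    db.rollback()  # the UPDATE above is undone too

balance = db.execute("SELECT balance FROM accounts").fetchone()
print(balance)  # (100,)
db.close()
```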

Views and Subqueries


A view is a SELECT query stored in the database that can be used like a table (similar to reusing a function in code)

Aggregation view

CREATE VIEW items_per_warehouse AS  -- view name is illustrative
  SELECT warehouse_id, COUNT(*) AS num_diff_items
    FROM inventory
    GROUP BY warehouse_id;

Can also create a subset view, which limits the rows and/or columns displayed for a single table. A database like PostgreSQL allows you to update and delete through a simple subset view.
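A runnable sketch of an aggregation view with sqlite3 (the view name items_per_warehouse and the data are invented; note SQLite views are read-only, unlike PostgreSQL's updatable subset views):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE inventory (warehouse_id INTEGER, part_id INTEGER);
    INSERT INTO inventory VALUES (1, 10), (1, 11), (2, 10);
    -- The aggregation query stored as a view, queryable like a table.
    CREATE VIEW items_per_warehouse AS
      SELECT warehouse_id, COUNT(*) AS num_diff_items
        FROM inventory
        GROUP BY warehouse_id;
""")
rows = db.execute(
    "SELECT * FROM items_per_warehouse ORDER BY warehouse_id").fetchall()
print(rows)  # one row per warehouse with its distinct item count
db.close()
```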


Subqueries are queries inside of queries, PostgreSQL examples below.

FROM example. Can select from a table that is the result of a select:

SELECT *
  FROM
  (SELECT MAX(score) AS topscore
    FROM my_table
    GROUP BY team)
  AS maxes;
SELECT name, weight
  FROM players, (SELECT AVG(weight) AS av FROM players) AS averages
  WHERE weight < av;

Scalar example

SELECT name, (SELECT max(pop) FROM cities WHERE cities.state =
  FROM states;

EXISTS example

SELECT col1
  FROM tab1
  WHERE EXISTS (SELECT 1 FROM tab2 WHERE col2 = tab1.col2);

If the subquery returns at least one row, the result of EXISTS is “true”, otherwise “false”
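The EXISTS semantics can be verified with sqlite3 (tables and rows invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE tab1 (col1 TEXT, col2 INTEGER);
    CREATE TABLE tab2 (col2 INTEGER);
    INSERT INTO tab1 VALUES ('a', 1), ('b', 2);
    INSERT INTO tab2 VALUES (1);
""")
# EXISTS is true for a tab1 row when the correlated subquery finds a match.
rows = db.execute("""
    SELECT col1
      FROM tab1
      WHERE EXISTS (SELECT 1 FROM tab2 WHERE col2 = tab1.col2);
""").fetchall()
print(rows)  # only 'a' has a matching col2 in tab2
db.close()
```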

Normalized / De-normalized / NoSQL

Normalized is the standard SQL paradigm for relational/structured data; it reduces duplication by splitting data across many tables.

De-normalized design within SQL duplicates data and combines tables to make queries faster without joins... however, database indexes (for faster searches) and views can be better choices.

NoSQL is non-relational/unstructured data storage and is not considered normalized. NoSQL can store data as key-value pairs, as document collections similar to JSON (MongoDB stores BSON), or as graph data where the relations use interconnected elements.

Normalized design rules

  1. Each row has same number of columns
  2. There is a unique key, everything in row says something about the key
  3. Facts that don’t relate to the key belong in different tables
    • item | loc | address: address doesn’t relate to item (and gets duplicated)
  4. Tables shouldn’t imply relationships that don’t exist
    • person | city_visited | lang_spoken: should be 2 tables, since person to each is 1-to-many


  • String and date values should be inside single quotes
  • != and <> are equivalent
  • The SERIAL numeric type is an auto-incrementing INTEGER
  • Good introductory reference: Udacity Intro to Relational Databases