Dev and Admin Tools

From awctech wiki

Git

Git Basics

  • the simple guide, not deep: http://rogerdudler.github.io/git-guide/
  • Git Basics on git docs: https://git-scm.com/book/en/v2/Getting-Started-Git-Basics
  • Git sections:
    • working dir --add or stage--> index (staging area) --commit--> HEAD (local repo, .git dir) --push--> remote repo
  • git remote -v shows remote tracked repos, like origin for cloned repos and upstream for forked repos (note that remotes point to the locations of repos, not branches)
  • git status shows the state of the working directory and the staging area (Index)
  • git fetch downloads objects and refs from another repository; remote-tracking branches are updated
  • git pull fetches from and integrates with another repository or a local branch
  • git config --list shows the git config
  • git blame file.txt shows who last modified each line of a file
  • Tagging: a good tutorial for tagging releases

Undoing things

  • git reset -- example.txt unstages a file from the staging area after git add
  • git checkout -- file.txt reverts a local change made in the working directory (does not apply to the Index) to what is in HEAD
  • git rm file.txt removes the file from the working directory and stages the removal in the Index
  • git commit --amend edit commit message or add files to commit, see https://www.atlassian.com/git/tutorials/rewriting-history
  • Partial changes, see stackoverflow
    • git checkout -p lets you choose individual hunks from the diff between your working copy and index to revert
    • git add -p allows you to choose hunks to add to the index
    • git reset -p allows you to choose individual hunks from the diff between the index and HEAD to back out of the index

Branching

Basics of Branching/Merging: https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging

Delete branch locally, delete branch on remote, locally prune remote-tracking branches no longer on remote:

$ git branch --delete my_branch
$ git push --delete origin my_branch
$ git pull --prune

Working on others' repos

https://gist.github.com/Chaser324/ce0505fbed06b947d962 - covers forking, keeping fork up to date, branches, and pull requests. Similar info: https://help.github.com/articles/fork-a-repo/ and https://help.github.com/articles/syncing-a-fork/ ... Summary:

Cloning your fork after forking in the web UI:

$ git clone <git url of my fork of repo>
$ git remote add upstream <git url of orig repo>
$ git remote -v

Syncing a Fork:

$ git fetch upstream
$ git checkout master
$ git merge upstream/master  ## one option
$ git pull --rebase upstream master  ## TBD this option may better suit needs

Push your local master sync'd repo to your remote repo:

$ git push origin master

Merging vs. Rebasing

This is a summary from: https://www.atlassian.com/git/tutorials/merging-vs-rebasing

Check out the feature to edit, then merge master into feature (feature is edited, not master)

$ git checkout feature
$ git merge master

"This creates a new 'merge commit' in the feature branch that ties together the histories of both branches... Merging is nice because it’s a non-destructive operation. The existing branches are not changed in any way."

Check out the feature to edit, then rebase feature to the tip of master (feature is edited, not master)

$ git checkout feature
$ git rebase master  ## add -i for an interactive rebase

"This moves the entire feature branch to begin on the tip of the master branch, effectively incorporating all of the new commits in master. But, instead of using a merge commit, rebasing re-writes the project history by creating brand new commits for each commit in the original branch... The major benefit of rebasing is that you get a much cleaner project history."

Alternate Pip Workflow

This pip workflow allows you to see packages added by the developer versus dependencies.

In requirements-dev.txt, put the first line as ## Packages added by developer followed by all pip packages you add:

$ cat requirements-dev.txt
## Packages added by developer
Flask
pep8

When you are ready to freeze current versions, run the following:

$ pip freeze -r requirements-dev.txt > requirements.txt
$ cat requirements.txt
## Packages added by developer
Flask==0.12.2
pep8==1.7.0
## The following requirements were added by pip freeze:
click==6.7
itsdangerous==0.24
Jinja2==2.9.6
MarkupSafe==1.0
pkg-resources==0.0.0
Werkzeug==0.12.2

When you deploy your project, pip install from requirements.txt:

$ pip install -r requirements.txt

Tmux basics

  • From bash: tmux to start a session, tmux attach-session -d to reattach to a session (detaching other clients)
  • Within tmux press ctrl-b then [key combos] to issue a tmux command
  • % splits left/right, " splits top/bottom, [arrow_keys] navigate panes. Hold down ctrl-b with [arrow_keys] to resize
  • z goes to and returns from full screen
  • c for new window, [comma] to rename, [num_keys] to navigate windows
  • [enter tilde period] key combo disconnects the ssh session and keeps the tmux layout; a bash exit will lose the tmux layout
  • [ goes into scroll mode, esc to exit
  • { moves current window around other panes

Unicode

Unicode is a way of representing each character, across all world languages, as a Unicode code point (Unicode is not an encoding).

Dog in Unicode is: U+0044 U+006F U+0067
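Code points are easy to inspect in Python with ord(), as a quick check of the claim above:

```python
# Print the Unicode code point of each character in "Dog"
print([f"U+{ord(ch):04X}" for ch in "Dog"])  # ['U+0044', 'U+006F', 'U+0067']
```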

In Python 3 the default 'str' type is unicode

>>> string = 'Café ZerФ'
>>> type(string)
<class 'str'>
>>> string.encode("unicode_escape")
b'Caf\\xe9 Zer\\u0424'

The special characters in string that end each word are U+00E9 and U+0424. Because U+00E9 can be represented in one byte, Python displays it as \xe9. Whether the escape starts with \x (for hex) or \u (for Unicode), both are Unicode code points in hexadecimal.

For text to be stored in memory or transmitted, it's encoded into bytes. UTF-8 encoding is one very common way of storing any of the Unicode code points in memory. For code points 0-127, UTF-8 uses a single byte; for 128 and above, UTF-8 uses multiple bytes (2 to 4). UTF-8 is a variable-length encoding. In Python 3 UTF-8 is the default, so .encode() is the same as .encode('utf-8').

>>> byte_string = string.encode()
>>> print(byte_string)
b'Caf\xc3\xa9 Zer\xd0\xa4'
>>> type(byte_string)
<class 'bytes'>

The escape codes \xc3\xa9 show the hexadecimal bytes for é. byte_string can now be transmitted over the network.
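The variable-length behavior described above can be checked by encoding characters from different code point ranges (€ is U+20AC, which needs three bytes):

```python
# UTF-8 byte counts grow with the code point value
for ch in ("A", "é", "Ф", "€"):
    print(ch, f"U+{ord(ch):04X}", len(ch.encode("utf-8")), "byte(s)")
```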

When getting data over the network, it will be in bytes, and needs to be decoded into Unicode.

>>> data = socket.recv(8022)
>>> type(data)
<class 'bytes'>
>>> x = data.decode()
>>> type(x)
<class 'str'>

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

What every programmer absolutely, positively needs to know about encodings and character sets to work with text

Apache notes

  • VirtualHost block in httpd.conf allows for server config specific to a subdomain (or even separate web sites / IPs) - wiki.example.com
  • Directory block in httpd.conf allows for server config specific to a directory - example.com/wiki/
  • .htaccess files are distributed configuration files (sitting alongside .html files) that perform the same task as the Directory block in httpd.conf, but more slowly, since they are loaded on each page request. They should only be used when there is no access to the Apache server config.

Selenium Python

API to write automated Selenium WebDriver browser tests with Python

Install pip: http://www.cyberciti.biz/faq/debian-ubuntu-centos-rhel-linux-install-pipclient/

Full Selenium Python documentation: http://selenium-python.readthedocs.org

Saucelabs can allow Selenium scripts to run on their wide variety OS/mobile/browser setups: https://docs.saucelabs.com/tutorials/python/

Mediawiki

Helpful Links

General MediaWiki.org Links

Docker

Setup

https://docs.docker.com/install/


To avoid needing sudo for docker commands: check if the docker group exists, add the current user to the docker group, then log out and back in

$ cat /etc/group | grep docker
$ sudo usermod -aG docker $USER

Basic Commands

Show local docker images, list running containers, list all containers

$ docker images
$ docker ps
$ docker ps -a

Create container from image and run command, create container from image and open terminal

$ docker run -t ubuntu pwd
$ docker run -it ubuntu /bin/bash

Create container from image and open terminal (give it a name, and tell it to remove container on exit) see: https://hackernoon.com/docker-run-vs-exec-deep-dive-into-their-differences-19a1041735a3

$ docker run --name ubuntu_bash --rm -i -t ubuntu bash

Open terminal on running container with name ubuntu_bash, 2 options

$ docker exec -it ubuntu_bash bash
$ docker exec -it ubuntu_bash /bin/bash

Remove containers

From https://www.digitalocean.com/community/tutorials/how-to-remove-docker-images-containers-and-volumes

Docker provides a single command that will clean up any resources — images, containers, volumes, and networks — that are dangling (not associated with a container):

docker system prune

To additionally remove any stopped containers and all unused images (not just dangling images), add the -a flag to the command:

docker system prune -a

Other Commands

See: https://blog.syncano.io/getting-started-with-docker/


Get info/config/mounted volumes via inspect

$ docker inspect <docker_container_id or name>

Show what a container prints to STDOUT and STDERR

$ docker logs <docker_container_id or name>

Show processes running on container

$ docker top <docker_container_id or name>

PostgreSQL

Commands

Connect to postgres command line, connect and open database, switch to postgres user and connect (psql command line docs)

$ psql
$ psql my_database
$ sudo -u postgres psql

A few commands inside the psql command line:

  • \l List databases
  • \c my_database Connect to database
  • \dt List tables in current database, \dt+ gives size
  • \d my_table List data types of columns in table
  • SELECT * FROM my_table; Run SQL statements by entering them individually or across multiple lines, and terminate with a semicolon to execute the query
  • \q Quit psql
  • DROP DATABASE my_database; Delete database
  • \H Turns on/off HTML table query output format
  • SELECT * FROM my_table \watch Display and continually refresh query results

Database dump

 $ pg_dump my_database > /home/user/db-dump.sql 

View Queries via Log

See queries in real time by changing log level: http://www.postgresql.org/docs/current/static/runtime-config-logging.html#GUC-LOG-STATEMENT

DbMaintain

Mark error script performed

"Task that indicates that the failed script was manually performed. The script will NOT be run again in the next update. No scripts will be executed by this task."

If on a database update a script runs and partially executes database changes before having an error: You can fix script, manually execute the rest of the sql in the script, and tell dbmaintain the script has been performed

$ ./dbmaintain.sh markErrorScriptPerformed scripts/01_v1.0/01_products_and_orders.sql 

Mark the database as up-to-date

"This operation updates the state of the database to indicate that all scripts have been executed, without actually executing them. This can be useful when you want to start using DbMaintain on an existing database, or after having fixed a problem directly on the database."

In the dbmaintain_scripts table it updates the executed_at column to the current datetime for all scripts (undesirable).

Can be used if there is "irregular script update" exception, but better to fix the script (change name, update md5 checksum in database if same content, etc), and then update database.

Can be used if the database scripts were renamed after an update and dbmaintain returns "Renamed indexed script", but may be better to mark error script performed.

Mark error script reverted

"Task that indicates that the failed script was manually reverted. The script will be run again in the next update. No scripts will be executed by this task."

Also you can fix script, manually rollback changes, and use markErrorScriptReverted to tell dbmaintain the script can be re-run on database update.

See more DbMaintain operations: http://www.dbmaintain.org/tutorial.html#DbMaintain_operations path/to/scriptFolderOrArchive

Django

Django Quickstart

Quickstart mainly adapted from http://www.django-rest-framework.org/tutorial/quickstart/

# Create the project directory
$ mkdir myproj
$ cd myproj

# Create a virtualenv to isolate our package dependencies locally
$ virtualenv venv
$ source venv/bin/activate
 
# Install Django into the virtualenv
$ pip install django

# Set up a new project with a single application inside mysite dir
$ django-admin.py startproject mysite .  # Note the trailing '.' character
$ cd mysite
$ django-admin.py startapp myapp   # or $ python ../manage.py startapp myapp

# sync your database for the first time
$ cd ..
$ python manage.py migrate

# create superuser
$ python manage.py createsuperuser

Start development server and login to admin with superuser creds: http://127.0.0.1:8000/admin/

$ python manage.py runserver

Create app view, create app urls, edit site urls. See https://docs.djangoproject.com/en/2.0/intro/tutorial01/

$ vim myapp/views.py 
$ vim myapp/urls.py 
$ vim mysite/urls.py 

Add models, and add myapp to mysite

$ vim myapp/models.py
$ vim mysite/settings.py  # to INSTALLED_APPS add: 'myapp.apps.MyappConfig',


Make a migration, see the sql that will be executed, check if issues without touching database. Then perform migration. See https://docs.djangoproject.com/en/2.0/intro/tutorial02/

# migrations after model creation/edit
$ python manage.py makemigrations myapp
$ python manage.py sqlmigrate myapp 0001  # use the number found in myapp/migrations/
$ python manage.py check

# execute sql, perform migration
$ python manage.py migrate


Use shell to interact with objects like in the code. Example: q_list = Question.objects.filter(question_text__contains='how')

$ python manage.py shell


See this tutorial section for more info on views: https://docs.djangoproject.com/en/2.0/intro/tutorial03/

"The template system uses dot-lookup syntax to access variable attributes. In the example of...
{{ question.question_text }}
first Django does a dictionary lookup on the object question. Failing that, it tries an attribute lookup – which works, in this case. If attribute lookup had failed, it would’ve tried a list-index lookup."
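The lookup order in that quote can be sketched in plain Python; resolve below is a hypothetical helper for illustration, not Django's actual implementation:

```python
def resolve(obj, key):
    """Mimic template dot-lookup: dictionary key first, then
    attribute, then list index."""
    try:
        return obj[key]           # 1. dictionary lookup
    except (TypeError, KeyError):
        pass
    try:
        return getattr(obj, key)  # 2. attribute lookup
    except (AttributeError, TypeError):
        pass
    return obj[int(key)]          # 3. list-index lookup

print(resolve({"question_text": "How?"}, "question_text"))  # dictionary hit
print(resolve(["first", "second"], "1"))                    # list-index hit
```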

SQL Basics

Good introduction text: http://philip.greenspun.com/sql/

SELECT Examples

Every SELECT statement on source table(s) yields a result table:

sales=> SELECT loc_id, part_id, quant AS amount FROM inventory;
 loc_id | part_id | amount
--------+---------+--------
      1 |       1 |     17
      2 |       1 |     17

SELECT examples

SELECT * FROM animals LIMIT 10;
SELECT * FROM animals WHERE species = 'orangutan' ORDER BY birthdate;
SELECT MAX(name) FROM animals;
SELECT name, loc FROM animals WHERE species = 'orangutan' ORDER BY birthdate DESC;
SELECT species, MIN(birthdate) FROM animals GROUP BY species;

GROUP BY makes the aggregate happen not over all rows, but just those sharing common value. Use OFFSET for pagination. DESC makes the ORDER BY sort descending.

SELECT name, COUNT(*) AS num FROM animals
  WHERE species != 'armadillo'
  GROUP BY name
  ORDER BY num DESC
  LIMIT 10 OFFSET 20;

HAVING filters the result table; WHERE filters the source table (and is applied before GROUP BY)

SELECT species, COUNT(*) AS num
  FROM animals GROUP BY species HAVING COUNT(*) = 1;

JOIN Examples

Inner vs. Outer Joins

  • Inner join (default for the JOIN keyword) gives only rows where the tables have entries matching the join condition. Implicit join does not use JOIN keyword, but executes an inner join
  • Left outer join (LEFT JOIN) returns same as above plus rows where left has entry and right doesn’t (RIGHT JOIN does the opposite)

Implicit vs. Explicit Inner Joins

Implicit join over 2 tables:

SELECT name, animals.species
  FROM animals, diet
  WHERE animals.species = diet.species AND diet.food = 'fish'; 


Explicit join over 2 tables:

SELECT name, animals.species
  FROM animals JOIN diet
    ON animals.species = diet.species
  WHERE diet.food = 'fish';


Implicit join over 3 tables:

SELECT ordernames.name, COUNT(*) AS num_of_animals
  FROM animals, taxonomy, ordernames
  WHERE animals.species = taxonomy.name AND taxonomy.t_order = ordernames.t_order
  GROUP BY ordernames.name
  ORDER BY num_of_animals DESC;


Explicit join over 3 tables (can be harder to read):

SELECT ordernames.name, count(*) AS num_of_animals
  FROM (animals JOIN taxonomy ON animals.species = taxonomy.name) AS ani_tax
    JOIN ordernames
    ON ani_tax.t_order = ordernames.t_order
  GROUP BY ordernames.name
  ORDER BY num_of_animals DESC;

Self Join

Query finds members who have membership in 2 gyms. For a member of gyms 1 and 2, a.gym_id < b.gym_id prevents displaying 4 rows (1 1, 1 2, 2 1, and 2 2) and displays just one row (1 2).

SELECT a.gym_id, b.gym_id, a.mem_id
  FROM memberships AS a, memberships AS b
  WHERE a.mem_id = b.mem_id
    AND a.gym_id < b.gym_id;

Outer Join

Ensures that inventory items without any entries in sales still show (as 0) in results

SELECT inventory.name, COUNT(sales.inv_id) AS sales_count
  FROM inventory LEFT JOIN sales
    ON inventory.id = sales.inv_id
  GROUP BY inventory.id;

CREATE, INSERT, UPDATE Examples

Create and drop a database

CREATE DATABASE my_database;
DROP DATABASE my_database;

Create a table with primary key in single and multiple fields. PostgreSQL CREATE TABLE examples

CREATE TABLE parts (
  id            SERIAL PRIMARY KEY,
  description   VARCHAR(40));
CREATE TABLE warehouse_inventory (
  warehouse_id  INTEGER REFERENCES warehouses (id), 
  part_id       INTEGER REFERENCES parts (id), 
  amount        INTEGER,
  PRIMARY KEY (warehouse_id, part_id));

Insert and update

INSERT INTO animals (name, species, birthdate)
  VALUES ('Barker', 'opossum', '2018-04-11'); 
UPDATE posts
  SET content = 'better content'
  WHERE id = 120;

Python DB-API Examples

  • DB-API is a standard interface for Python libraries that talk to databases
  • DB-API module libs include: sqlite3, psycopg2 (PostgreSQL), pyodbc (ODBC), mysql.connector
  • DB-API basic steps: connect, create a cursor to pass commands, execute SQL command(s), commit to make changes, fetchall to retrieve SELECT results, close the connection
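The basic steps above can be sketched with the stdlib sqlite3 module (an in-memory database, so the example is self-contained; note sqlite3 uses ? placeholders where psycopg2 uses %s):

```python
import sqlite3

# connect, create a cursor, execute SQL, commit, fetch, close
db = sqlite3.connect(":memory:")
curr = db.cursor()
curr.execute("CREATE TABLE posts (content TEXT)")
curr.execute("INSERT INTO posts VALUES (?)", ["hello"])  # ? placeholder, not %s
db.commit()
curr.execute("SELECT content FROM posts")
print(curr.fetchall())  # [('hello',)]
db.close()
```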

Select example. Note: bleach.clean sanitizes against script injection attacks (via script tags inside user text); bleach.linkify converts URL text into links and adds rel="nofollow" so search engines don't follow links posted in your web app.

import psycopg2
import bleach
DBNAME = 'forum'

def get_posts():
    db = psycopg2.connect(database=DBNAME)
    curr = db.cursor()
    curr.execute("SELECT content, time FROM posts ORDER BY time DESC;")
    posts = curr.fetchall()
    db.close()
    return [(bleach.clean(bleach.linkify(post)), time) for (post, time) in posts]

Insert/update example. Note the "... VALUES (%s, %s)", [var1, var2]) pattern inserts text safely into the query preventing SQL injection attacks (via SQL commands inside user text).

def add_post(content, user_id):
    content = bleach.clean(bleach.linkify(content))
    db = psycopg2.connect(database=DBNAME)
    curr = db.cursor()
    curr.execute("INSERT INTO posts (content) VALUES (%s);", [content])
    # use SQL's now() and a %s placeholder; Python expressions can't
    # appear inside the SQL string itself
    curr.execute("UPDATE tracking SET last_post_date = now() WHERE user_id = %s;",
                 [user_id])
    db.commit()
    db.close()

Transactions are atomic - each transaction must be successful during a commit or everything is rolled back - "all or nothing".
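A minimal sketch of that all-or-nothing behavior using the stdlib sqlite3 module: an insert that is rolled back instead of committed leaves no trace.

```python
import sqlite3

db = sqlite3.connect(":memory:")
curr = db.cursor()
curr.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")
db.commit()

curr.execute("INSERT INTO accounts VALUES ('alice', 100)")
db.rollback()  # simulate a failure: the uncommitted insert is undone

curr.execute("SELECT COUNT(*) FROM accounts")
print(curr.fetchall())  # [(0,)]
db.close()
```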

Views and Subqueries

Views

A view is a SELECT query stored in the db that can be used like a table (similar to reusing a function in code)

Aggregation view

CREATE VIEW my_view AS
  SELECT warehouse_id, COUNT(*) AS num_diff_items
    FROM inventory
    GROUP BY warehouse_id;

Can also do a subset view, which limits the number of rows and/or columns displayed for a single table. A database like PostgreSQL allows you to update and delete through a simple subset view.

Subqueries

Subqueries are queries inside of queries, PostgreSQL examples below.

FROM example. You can select from a table that is the result of a SELECT:

SELECT AVG(topscore) FROM
  (SELECT MAX(score) AS topscore
    FROM my_table
    GROUP BY team)
  AS maxes;
SELECT name, weight
  FROM players, (SELECT AVG(weight) AS av FROM players) AS averages
  WHERE weight < av;

Scalar example

SELECT name, (SELECT max(pop) FROM cities WHERE cities.state = states.name)
  FROM states;

EXISTS example

SELECT col1
  FROM tab1
  WHERE EXISTS (SELECT 1 FROM tab2 WHERE col2 = tab1.col2);

If the subquery returns at least one row, the result of EXISTS is “true”, otherwise “false”

Normalized / De-normalized / NoSQL

Normalized is the standard SQL paradigm for relational/structured data, it reduces duplication by splitting data across many tables.

De-normalized within SQL duplicates data and combines tables to make queries faster without joins... however database indexes (for faster searches) and views can be better choices.

NoSQL is data storage that is non-relational/unstructured and not considered normalized. NoSQL can store data as key-value pairs, as document collections similar to JSON (MongoDB stores BSON), or as graph data where relationships are stored as interconnected elements.

Normalized design rules

  1. Each row has same number of columns
  2. There is a unique key, everything in row says something about the key
  3. Facts that don’t relate to the key belong in different tables
    • item | loc | address: address doesn’t relate to item (and has higher duplication)
  4. Tables shouldn’t imply relationships that don’t exist
    • person | city_visited | lang_spoken should be 2 tables since person to each is 1-to-many
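Rule 4 can be sketched with the stdlib sqlite3 module: cities visited and languages spoken live in separate tables keyed on person, so no row implies a city-language relationship (table and column names here are made up for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cities_visited (person TEXT, city TEXT)")
db.execute("CREATE TABLE langs_spoken (person TEXT, lang TEXT)")
# each fact is stored once, in the table it belongs to
db.executemany("INSERT INTO cities_visited VALUES (?, ?)",
               [("ana", "Lima"), ("ana", "Oslo")])
db.executemany("INSERT INTO langs_spoken VALUES (?, ?)",
               [("ana", "es"), ("ana", "en")])
rows = db.execute(
    "SELECT city FROM cities_visited WHERE person = 'ana' ORDER BY city").fetchall()
print(rows)  # [('Lima',), ('Oslo',)]
db.close()
```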

Notes

  • String and date values should be inside single quotes
  • != and <> are equivalent
  • SERIAL numeric type is an autoincrementing INTEGER data type
  • Good introductory reference: Udacity Intro to Relational Databases