feat: restructure docs into "chapters" (#12)

* feat(docker, k8s): create containers folder and kubernetes notes
Marcello 2023-03-13 11:33:51 +00:00
parent b1cb858508
commit 2725e3cb70
92 changed files with 777 additions and 367 deletions

@ -0,0 +1,167 @@
# [Beautiful Soup Library](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
## Making the Soup
```py
from bs4 import BeautifulSoup
import requests
# the "lxml" parser used below needs the lxml package installed (pip install lxml); bs4 loads it by name, no import needed
response = requests.get("url") # retrieve a web page
soup = BeautifulSoup(response.text, "html.parser") # parse HTML from response w/ python default HTML parser
soup = BeautifulSoup(response.text, "lxml") # parse HTML from response w/ lxml parser
soup.prettify() # prettify parsed HTML for display
```
## Kinds of Objects
Beautiful Soup transforms a complex HTML document into a complex tree of Python objects.
### Tag
A `Tag` object corresponds to an XML or HTML tag in the original document.
```py
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser') # parse HTML/XML
tag = soup.b
type(tag) # <class 'bs4.element.Tag'>
print(tag) # <b class="boldest">Extremely bold</b>
tag.name # tag name
tag["attribute"] # access to tag attribute values
tag.attrs # dict of attribute-value pairs
```
### Navigable String
A string corresponds to a bit of text within a tag. Beautiful Soup uses the `NavigableString` class to contain these bits of text.
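
A minimal sketch of both object kinds side by side, assuming `bs4` is installed and using the built-in `"html.parser"` on the same toy snippet as above. Note that `class` is a multi-valued attribute, so it comes back as a list:

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', "html.parser")
tag = soup.b              # first <b> tag in the document
text = tag.string         # the single string inside <b>

print(type(tag).__name__)   # Tag
print(tag["class"])         # ['boldest'] (multi-valued attribute -> list)
print(type(text).__name__)  # NavigableString (a str subclass)
```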
## Navigating the Tree
### Going Down
```py
soup.<tag>.<child_tag> # navigate using tag names
<tag>.contents # direct children as a list
<tag>.children # direct children as a generator for iteration
<tag>.descendants # iterator over all children, recursive
<tag>.string # the tag's only string child (a NavigableString)
# If a tag's only child is another tag, and that tag has a .string, then the parent tag is considered to have the same .string as its child
# If a tag contains more than one thing, then it's not clear what .string should refer to, so .string is defined to be None
<tag>.strings # generator over all descendant strings (includes whitespace-only strings)
<tag>.stripped_strings # generator over all descendant strings, stripped, skipping whitespace-only strings
```
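A small sketch of the downward navigation attributes on a made-up list, assuming `bs4` with `"html.parser"` (which keeps the whitespace between the two `<li>` tags as its own string):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<ul><li>one</li> <li><b>two</b></li></ul>", "html.parser")
ul = soup.ul

children = ul.contents                       # [<li>one</li>, ' ', <li><b>two</b></li>]
names = [c.name for c in ul.children]        # strings have .name None
second_li = ul.find_all("li")[1]
inherited = second_li.string                 # 'two': <li>'s only child <b> has a .string
ambiguous = ul.string                        # None: <ul> has several children
all_strings = list(ul.strings)               # whitespace-only ' ' is kept
clean_strings = list(ul.stripped_strings)    # whitespace-only strings skipped
```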
### Going Up
```py
<tag>.parent # tag's direct parent (the BeautifulSoup object has parent None, <html> has parent BeautifulSoup)
<tag>.parents # generator over all ancestors, from the direct parent up to the BeautifulSoup object
```
### Going Sideways
```py
<tag>.previous_sibling
<tag>.next_sibling
<tag>.previous_siblings
<tag>.next_siblings
```
### Going Back and Forth
```py
<tag>.previous_element # whatever was parsed immediately before
<tag>.next_element # whatever was parsed immediately afterwards
<tag>.previous_elements # generator over everything parsed before, going backwards
<tag>.next_elements # generator over everything parsed afterwards
```
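The difference between sibling and element navigation is easiest to see on a tiny document. This sketch assumes `bs4` with `"html.parser"`; the markup is illustrative:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p><a>first</a><b>second</b></p>", "html.parser")
a = soup.a

sibling = a.next_sibling                    # <b>second</b>: next node at the same level
element = a.next_element                    # 'first': next thing parsed (the string inside <a>)
parent_names = [p.name for p in a.parents]  # the BeautifulSoup object is named '[document]'
```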
## Searching the Tree
### Filter Types
```py
soup.find_all("tag") # by name
soup.find_all(["tag1", "tag2"]) # multiple tags in a list
soup.find_all(function) # function receives a tag and returns True to keep it
soup.find_all(True) # Match everything
```
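A sketch of the filter types on made-up markup, assuming `bs4` with `"html.parser"`. It also shows a compiled regular expression, another filter kind `bs4` accepts (matched against tag names); `has_href` is a hypothetical helper:

```python
import re
from bs4 import BeautifulSoup

html = '<div><a href="/x">x</a><b>bold</b><a>no link</a><i>it</i></div>'
soup = BeautifulSoup(html, "html.parser")

by_name = soup.find_all("a")                    # both <a> tags
by_list = soup.find_all(["b", "i"])             # <b> and <i>, in document order
by_regex = soup.find_all(re.compile("^[ab]$"))  # tag names matching the pattern

def has_href(tag):
    """Hypothetical filter: receives a Tag, returns True to keep it."""
    return tag.has_attr("href")

by_function = soup.find_all(has_href)
everything = soup.find_all(True)                # every tag, including <div>
```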
### Methods
Method arguments:
- `name` (string): tag to search for
- `attrs` (dict): attribute-value pairs to search for
- `recursive` (bool): if `False`, only direct children are examined (`find_all` and `find` only)
- `string` (string): search by string contents rather than by tag
- `limit` (int): limit number of results
- `**kwargs`: turned into filters on the tag's attributes (e.g. `id="main"`)
```py
find_all(name, attrs, recursive, string, limit, **kwargs) # several results
find(name, attrs, recursive, string, **kwargs) # one result
find_parents(name, attrs, string, limit, **kwargs) # several results
find_parent(name, attrs, string, **kwargs) # one result
find_next_siblings(name, attrs, string, limit, **kwargs) # several results
find_next_sibling(name, attrs, string, **kwargs) # one result
find_previous_siblings(name, attrs, string, limit, **kwargs) # several results
find_previous_sibling(name, attrs, string, **kwargs) # one result
find_all_next(name, attrs, string, limit, **kwargs) # several results
find_next(name, attrs, string, **kwargs) # one result
find_all_previous(name, attrs, string, limit, **kwargs) # several results
find_previous(name, attrs, string, **kwargs) # one result
soup("html_tag") # same as soup.find_all("html_tag")
soup.find("html_tag").text # text of the found tag
soup.select("css_selector") # return all tags matching a CSS selector
```
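A sketch combining the method arguments above, assuming `bs4` with `"html.parser"` (and its default `soupsieve` dependency for `select`); the HTML is invented for the example:

```python
from bs4 import BeautifulSoup

html = """
<div id="main">
  <p class="note">alpha</p>
  <p class="note">beta</p>
  <p>gamma</p>
  <span><p class="note">delta</p></span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", id="main")                     # **kwargs become attribute filters

notes = div.find_all("p", attrs={"class": "note"})    # attrs dict, searches all descendants
direct = div.find_all("p", recursive=False)           # only direct children of <div>
first_two = div.find_all("p", limit=2)                # stop after two results
by_text = soup.find("p", string="gamma")              # match on the tag's string
css = soup.select("div#main > p.note")                # CSS: direct <p class="note"> children
```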
## Modifying the Tree
### Changing Tag Names and Attributes
```py
<tag>.name = "new_html_tag" # modify the tag type
<tag>["attribute"] = "value" # modify the attribute value
del <tag>["attribute"] # remove the attribute
soup.new_tag("name", <attribute> = "value") # create a new tag with specified name and attributes
<tag>.string = "new content" # modify tag text content
<tag>.append(item) # append to Tag content
<tag>.extend([item1, item2]) # add every element of the list in order
<tag>.insert(position: int, item) # like .insert in Python list
<tag>.insert_before(new_tag) # insert tags or strings immediately before something else in the parse tree
<tag>.insert_after(new_tag) # insert tags or strings immediately after something else in the parse tree
<tag>.clear() # remove all tag's contents
<tag>.extract() # extract and return the tag from the tree (operates on self)
<tag>.string.extract() # extract and return the string from the tree (operates on self)
<tag>.decompose() # remove a tag from the tree, then completely destroy it and its contents
<tag>.decomposed # True if the tag has been decomposed
<tag>.replace_with(item) # remove a tag or string from the tree and replace it with the tag or string of choice
<tag>.wrap(other_tag) # wrap an element in the tag you specify, return the new wrapper
<tag>.unwrap() # replace a tag with whatever's inside, good for stripping out markup
<tag>.smooth() # clean up the parse tree by consolidating adjacent strings
```
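A sketch chaining several of the tree modifications above, assuming `bs4` 4.8 or later (for `smooth()`) with `"html.parser"`; the URL is a placeholder:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="old">Hello <b>world</b></p>', "html.parser")
p = soup.p

p.name = "div"                      # rename the tag
p["class"] = "new"                  # overwrite an attribute
p["id"] = "greeting"                # add an attribute

link = soup.new_tag("a", href="https://example.com")  # build a new tag
link.string = "link"
p.append(link)                      # becomes the last child

p.b.unwrap()                        # keep "world", drop the <b> markup
p.smooth()                          # merge adjacent "Hello " and "world" strings
```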

@ -0,0 +1,328 @@
# NumPy Lib
## MOST IMPORTANT ATTRIBUTES
```py
array.ndim # number of axes (dimensions) of the array
array.shape # dimensions of the array, tuple of integers
array.size # total number of elements in the array
array.itemsize # size in bytes of each element
array.data # buffer containing the array elements
```
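A quick check of these attributes on a small array, assuming NumPy is installed; `int32` is chosen only so that `itemsize` is predictable:

```python
import numpy as np

a = np.arange(12, dtype=np.int32).reshape(3, 4)  # 3 rows x 4 columns

print(a.ndim)      # 2
print(a.shape)     # (3, 4)
print(a.size)      # 12
print(a.itemsize)  # 4 (bytes per int32 element)
```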
## ARRAY CREATION
Unless explicitly specified, `np.array` tries to infer a good data type for the array that it creates.
The data type is stored in a special dtype object.
```py
var = np.array(sequence) # creates array
var = np.asarray(sequence) # convert input to array
var = np.ndarray(shape) # low-level constructor: takes a shape, not data (prefer np.array or np.empty)
var = np.asanyarray(sequence) # like np.asarray, but ndarray subclasses are passed through unchanged
# nested sequences will be converted to multidimensional array
var = np.zeros(shape) # array with all zeros
var = np.ones(shape) # array with all ones
var = np.empty(shape) # uninitialized array: contents are arbitrary, not random
var = np.identity(n) # identity array (n x n)
var = np.arange(start, stop, step) # evenly spaced values in [start, stop): stop is excluded
var = np.linspace(start, stop, num) # num evenly spaced values from start to stop (stop included by default)
```
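A sketch of the most common constructors, highlighting the different endpoint rules of `arange` and `linspace`:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # nested lists -> 2-D array, integer dtype inferred
b = np.asarray(a)                # already an ndarray of the right dtype: no copy
z = np.zeros((2, 3))             # float64 zeros
eye = np.identity(3)             # 3 x 3 identity
r = np.arange(0, 10, 3)          # stop excluded: [0, 3, 6, 9]
lin = np.linspace(0, 1, 5)       # stop included: [0., 0.25, 0.5, 0.75, 1.]
```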
## DATA TYPES FOR NDARRAYS
```py
var = array.astype(np.dtype) # copy of the array, cast to a specified type
# raises an error (e.g. ValueError for unparsable strings) if casting fails
```
The numerical `dtypes` are named the same way: a type name followed by a number indicating the number of bits per element.
| TYPE | TYPE CODE | DESCRIPTION |
|-----------------------------------|--------------|--------------------------------------------------------------------------------------------|
| int8, uint8 | i1, u1 | Signed and unsigned 8-bit (1 byte) integer types |
| int16, uint16 | i2, u2 | Signed and unsigned 16-bit integer types |
| int32, uint32 | i4, u4 | Signed and unsigned 32-bit integer types |
| int64, uint64                     | i8, u8       | Signed and unsigned 64-bit integer types                                                   |
| float16 | f2 | Half-precision floating point |
| float32 | f4 or f | Standard single-precision floating point. Compatible with C float |
| float64                           | f8 or d      | Standard double-precision floating point. Compatible with C double and Python float object |
| float128 | f16 or g | Extended-precision floating point |
| complex64, complex128, complex256 | c8, c16, c32 | Complex numbers represented by two 32-, 64-, or 128-bit floats, respectively               |
| bool | ? | Boolean type storing True and False values |
| object | O | Python object type |
| bytes_ (string_ before NumPy 2.0) | `S<num>`     | Fixed-length byte string type (1 byte per character), `<num>` is string length             |
| str_ (unicode_ before NumPy 2.0)  | `U<num>`     | Fixed-length unicode type, `<num>` is string length                                        |
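
A sketch of `astype` with both type objects and type codes from the table; the input values are arbitrary:

```python
import numpy as np

a = np.array([1.7, -2.2, 3.0])
as_int = a.astype(np.int32)         # float -> int truncates toward zero
as_code = a.astype("f4")            # type codes work too -> float32
strings = np.array(["1.5", "2"])    # dtype '<U3' (fixed-length unicode)
parsed = strings.astype(float)      # parse strings into float64

cast_failed = False
try:
    np.array(["abc"]).astype(float)
except ValueError:                  # unparsable strings raise ValueError
    cast_failed = True
```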
## OPERATIONS BETWEEN ARRAYS AND SCALARS
Arithmetic operations between equal-size arrays apply element-wise; operations between an array and a scalar apply the scalar to every element.
- array `+` scalar --> element-wise addition (`[1, 2, 3] + 2 = [3, 4, 5]`)
- array `-` scalar --> element-wise subtraction (`[1, 2, 3] - 2 = [-1, 0, 1]`)
- array `*` scalar --> element-wise multiplication (`[1, 2, 3] * 3 = [3, 6, 9]`)
- array `/` scalar --> element-wise division (`[1, 2, 3] / 2 = [0.5, 1, 1.5]`)
- array_1 `+` array_2 --> element-wise addition (`[1, 2, 3] + [1, 2, 3] = [2, 4, 6]`)
- array_1 `-` array_2 --> element-wise subtraction (`[1, 2, 4] - [3, 2, 1] = [-2, 0, 3]`)
- array_1 `*` array_2 --> element-wise multiplication (`[1, 2, 3] * [3, 2, 1] = [3, 4, 3]`)
- array_1 `/` array_2 --> element-wise division (`[1, 2, 3] / [3, 2, 1] = [0.33, 1, 3]`)
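
The examples above, checked in code. The last part is a sketch of what happens when sizes do not match: NumPy refuses to combine the arrays element-wise and raises `ValueError`:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([3, 2, 1])

diff = a - 2        # [-1, 0, 1]
prod = a * b        # [3, 4, 3]
quot = a / b        # true division always yields floats

mismatch = False
try:
    a + np.array([1, 2])   # sizes 3 and 2 cannot be paired element-wise
except ValueError:
    mismatch = True
```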
## SHAPE MANIPULATION
```py
np.reshape(array, new_shape) # changes the shape of the array
np.ravel(array) # returns the array flattened
array.resize(shape) # modifies the array itself
array.T # returns the array transposed
np.transpose(array) # returns the array transposed
np.swapaxes(array, first_axis, second_axis) # interchange two axes of an array
# if array is an ndarray, then a view of it is returned; otherwise a new array is created
```
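A sketch of the shape functions; `np.shares_memory` is used here only to show that `reshape` on a contiguous array returns a view rather than a copy:

```python
import numpy as np

a = np.arange(6)                             # [0 1 2 3 4 5]
m = a.reshape(2, 3)                          # new shape, same data
t = m.T                                      # shape (3, 2)
flat = np.ravel(t)                           # row-major order of t: [0 3 1 4 2 5]
s = np.swapaxes(np.zeros((2, 3, 4)), 0, 2)   # shape (4, 3, 2)
same = np.transpose(m)                       # equivalent to m.T
```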
## JOINING ARRAYS
```py
np.vstack((array1, array2)) # takes a tuple, vertical stack of arrays (row wise, along the first axis)
np.hstack((array1, array2)) # takes a tuple, horizontal stack of arrays (column wise; 1-D inputs are joined end to end)
np.dstack((array1, array2)) # takes a tuple, depth wise stack of arrays (along the 3rd axis)
np.stack(arrays, axis=0) # joins a sequence of arrays along a new axis (axis is an int)
np.concatenate((array1, array2, ...), axis=0) # joins a sequence of arrays along an existing axis (axis is an int)
```
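A sketch of the joining functions on two 1-D arrays, with the resulting shapes noted in comments:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

v = np.vstack((a, b))                # (2, 3): rows stacked
h = np.hstack((a, b))                # (6,): 1-D inputs joined end to end
d = np.dstack((a, b))                # (1, 3, 2)
s = np.stack((a, b), axis=1)         # new last axis: (3, 2)
c = np.concatenate((v, v), axis=1)   # existing axis 1: (2, 6)
```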
## SPLITTING ARRAYS
```py
np.split(array, indices_or_sections) # int N: N equally long sub-arrays (raises ValueError if not possible); list: split points
np.vsplit(array, indices_or_sections) # splits vertically (row wise); an int requires an equal division
np.hsplit(array, indices_or_sections) # splits horizontally (column wise); an int requires an equal division
np.dsplit(array, indices_or_sections) # splits along the 3rd axis (depth); an int requires an equal division
np.array_split(array, indices_or_sections) # like np.split, but sub-arrays can be of different lengths
```
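A sketch contrasting an integer section count, explicit split points, and the more lenient `array_split`:

```python
import numpy as np

a = np.arange(6)
halves = np.split(a, 2)          # int: 2 equal parts -> [0 1 2], [3 4 5]
pieces = np.split(a, [1, 4])     # split points -> [0], [1 2 3], [4 5]
uneven = np.array_split(a, 4)    # unequal parts allowed -> sizes 2, 2, 1, 1

split_failed = False
try:
    np.split(a, 4)               # 6 elements cannot be divided into 4 equal parts
except ValueError:
    split_failed = True
```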
## VIEW()
```py
var = array.view() # creates a new array that looks at the same data
# slicing returns a view
# a view can have its own shape, but writing to its elements changes the data shared with the original array
```
## COPY()
```py
var = array.copy() # creates a deep copy of the array
```
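A sketch of view versus copy semantics: writes through a view (or a slice) reach the original array, while a copy is independent:

```python
import numpy as np

a = np.arange(6)
v = a.view().reshape(2, 3)   # a view with its own shape
v[0, 0] = 100                # writes go to the shared buffer
s = a[3:]                    # slicing also returns a view
s[0] = -1
c = a.copy()                 # independent data
c[1] = 999                   # does not touch a
```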
## INDEXING, SLICING, ITERATING
- 1-dimensional --> sliced, iterated and indexed like Python sequences
- n-dimensional --> one index per axis, indices given as a comma-separated tuple: `x[i, j]` is the same as `x[(i, j)]`
- dots (`...`) represent as many colons as needed to produce a complete indexing tuple (examples for a 5-dimensional `x`):
  - `x[1, 2, ...] == x[1, 2, :, :, :]`
  - `x[..., 3] == x[:, :, :, :, 3]`
  - `x[4, ..., 5, :] == x[4, :, :, 5, :]`
- iteration runs over the first axis; use the `.flat` attribute to iterate over every element
- `x[*bool]` returns the rows whose corresponding boolean is True
- `x[condition]` returns only the elements that satisfy the condition
- `x[[*index]]` returns rows ordered by the given indexes
- `x[[*i], [*j]]` returns the elements at the index pairs (i, j)
- `x[np.ix_([*i], [*j])]` returns the rectangular region (rows i, columns j)
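
The indexing forms above, applied to a small 3 x 4 array chosen for illustration:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)   # [[0 1 2 3] [4 5 6 7] [8 9 10 11]]

elem = x[1, 2]                           # 6: one index per axis
col = x[..., 0]                          # first column, same as x[:, 0]
rows = x[np.array([True, False, True])]  # rows 0 and 2
big = x[x > 8]                           # elements satisfying the condition
reordered = x[[2, 0]]                    # row 2, then row 0
pairs = x[[0, 2], [1, 3]]                # elements (0, 1) and (2, 3)
block = x[np.ix_([0, 2], [1, 3])]        # 2 x 2 rectangular region
first_five = list(x.flat)[:5]            # .flat iterates over every element
```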
## UNIVERSAL FUNCTIONS (ufunc)
Functions that perform element-wise operations (vectorization).
```py
np.abs(array) # vectorized abs(), return element absolute value
np.fabs(array) # faster abs() for non-complex values
np.sqrt(array) # vectorized square root (x^0.5)
np.square(array) # vectorized square (x^2)
np.exp(array) # vectorized natural exponentiation (e^x)
np.log(array) # vectorized natural log(x)
np.log10(array) # vectorized log10(x)
np.log2(array) # vectorized log2(x)
np.log1p(array) # vectorized log(1 + x)
np.sign(array) # vectorized sign (1, 0, -1)
np.ceil(array) # vectorized ceil()
np.floor(array) # vectorized floor()
np.rint(array) # vectorized round() to nearest int
np.modf(array) # vectorized math.modf(), returns the fractional and integral parts of each element as two arrays
np.isnan(array) # vectorized math.isnan(x), return boolean array (x == NaN is always False)
np.isinf(array) # vectorized test for positive or negative infinity, return boolean array
np.isfinite(array) # vectorized test for finiteness, returns boolean array
np.cos(array) # vectorized cos(x)
np.sin(array) # vectorized sin(x)
np.tan(array) # vectorized tan(x)
np.cosh(array) # vectorized cosh(x)
np.sinh(array) # vectorized sinh(x)
np.tanh(array) # vectorized tanh(x)
np.arccos(array) # vectorized arccos(x)
np.arcsin(array) # vectorized arcsin(x)
np.arctan(array) # vectorized arctan(x)
np.arccosh(array) # vectorized arccosh(x)
np.arcsinh(array) # vectorized arcsinh(x)
np.arctanh(array) # vectorized arctanh(x)
np.logical_not(array) # vectorized not(x), equivalent to ~array for boolean arrays
np.add(x_array, y_array) # vectorized addition
np.subtract(x_array, y_array) # vectorized subtraction
np.multiply(x_array, y_array) # vectorized multiplication
np.divide(x_array, y_array) # vectorized division
np.floor_divide(x_array, y_array) # vectorized floor division
np.power(x_array, y_array) # vectorized power
np.maximum(x_array, y_array) # vectorized maximum
np.minimum(x_array, y_array) # vectorized minimum
np.fmax(x_array, y_array) # vectorized maximum, ignores NaN
np.fmin(x_array, y_array) # vectorized minimum, ignores NaN
np.mod(x_array, y_array) # vectorized modulus
np.copysign(x_array, y_array) # vectorized copy sign from y_array to x_array
np.greater(x_array, y_array) # vectorized x > y
np.less(x_array, y_array) # vectorized x < y
np.greater_equal(x_array, y_array) # vectorized x >= y
np.less_equal(x_array, y_array) # vectorized x <= y
np.equal(x_array, y_array) # vectorized x == y
np.not_equal(x_array, y_array) # vectorized x != y
np.logical_and(x_array, y_array) # vectorized x & y
np.logical_or(x_array, y_array) # vectorized x | y
np.logical_xor(x_array, y_array) # vectorized x ^ y
```
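A sketch of a few ufuncs whose behavior is easy to get wrong: `modf` returning two arrays, NaN detection, and how `maximum` and `fmax` treat NaN differently:

```python
import numpy as np

x = np.array([-1.5, 0.0, 2.25, np.nan])
y = np.array([1.0, 1.0, 1.0, 1.0])

frac, whole = np.modf(np.array([2.5, -1.25]))  # fractional and integral parts
is_nan = np.isnan(x)                           # use this, not x == np.nan
both_max = np.maximum(x, y)                    # NaN propagates
fmax = np.fmax(x, y)                           # NaN ignored
mask = np.logical_not(x > 0)                   # same as ~(x > 0)
```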
## CONDITIONAL LOGIC AS ARRAY OPERATIONS
```py
np.where(condition, x, y) # element-wise: take x where condition is True, y elsewhere
```
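A sketch of `np.where` used as a vectorized if/else; `x` and `y` may be arrays or scalars:

```python
import numpy as np

a = np.array([-2, 5, -1, 3])
clipped = np.where(a > 0, a, 0)          # keep positives, replace the rest with 0
labels = np.where(a > 0, "pos", "neg")   # scalars of any type work
```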
## MATHEMATICAL AND STATISTICAL METHODS
`np.method(array, args)` or `array.method(args)`.
Boolean values are coerced to 1 (`True`) and 0 (`False`).
```py
np.sum(array, axis=None) # sum of array elements over a given axis
np.median(array, axis=None) # median along the specified axis
np.mean(array, axis=None) # arithmetic mean along the specified axis
np.average(array, axis=None) # weighted average along the specified axis
np.std(array, axis=None) # standard deviation along the specified axis
np.var(array, axis=None) # variance along the specified axis
np.min(array, axis=None) # minimum value along the specified axis
np.max(array, axis=None) # maximum value along the specified axis
np.argmin(array, axis=None) # indices of the minimum values along an axis
np.argmax(array, axis=None) # indices of the maximum values
np.cumsum(array, axis=None) # cumulative sum of the elements along a given axis
np.cumprod(array, axis=None) # cumulative product of the elements along a given axis
```
## METHODS FOR BOOLEAN ARRAYS
```py
np.all(array, axis=None) # test whether all array elements along a given axis evaluate to True
np.any(array, axis=None) # test whether any array element along a given axis evaluates to True
```
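A sketch of how `axis` changes the result of the statistical and boolean methods on a 2 x 3 matrix, including booleans being counted as 1/0:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

total = m.sum()                  # 21, all elements
col_sums = np.sum(m, axis=0)     # down the rows: [5 7 9]
row_means = m.mean(axis=1)       # across the columns: [2. 5.]
where_max = m.argmax()           # 5: index into the flattened array
running = np.cumsum(m, axis=1)   # [[1 3 6] [4 9 15]]
share_true = np.mean(m > 2)      # 4 of 6 elements are True
all_pos = np.all(m > 0)          # True
any_big = (m > 5).any(axis=1)    # per row: [False True]
```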
## SORTING
```py
array.sort(axis=-1) # sort an array in-place along the given axis (default: last axis)
np.sort(array, axis=-1) # return a sorted copy of an array (axis = None applies on flattened array)
```
## SET LOGIC
```py
np.unique(array) # sorted unique elements of an array
np.intersect1d(x, y) # sorted common elements in x and y
np.union1d(x, y) # sorted union of elements
np.isin(x, y) # boolean array indicating whether each element of x is contained in y (np.in1d is the older, deprecated 1-D variant)
np.setdiff1d(x, y) # set difference, elements in x that are not in y
np.setxor1d(x, y) # set symmetric difference; elements that are in either of the arrays, but not both
```
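A sketch of sorting and the set routines on two small arrays with duplicates; the results are always sorted and deduplicated:

```python
import numpy as np

x = np.array([3, 1, 2, 3, 1])
y = np.array([2, 3, 4])

sorted_copy = np.sort(x)      # x itself is unchanged
in_place = x.copy()
in_place.sort()               # sorts the copy in-place

uniq = np.unique(x)           # [1 2 3]
common = np.intersect1d(x, y) # [2 3]
union = np.union1d(x, y)      # [1 2 3 4]
member = np.isin(x, y)        # is each element of x in y?
only_x = np.setdiff1d(x, y)   # [1]
either = np.setxor1d(x, y)    # [1 4]
```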
## FILE I/O WITH ARRAYS
```py
np.save(file, array) # save array to binary file in .npy format
np.savez(file, *args, **kwargs) # save several arrays into a single file in uncompressed .npz format
np.savez_compressed(file, *args, **kwargs) # save several arrays into a single file in compressed .npz format
# *ARGS: arrays to save to the file. arrays will be saved with names "arr_0", "arr_1", and so on
# **KWARGS: arrays to save to the file. arrays will be saved in the file with the keyword names
np.savetxt(file, X, fmt="%.18e", delimiter=" ") # save array to text file
# X: 1D or 2D
# FMT: Python Format Specification Mini-Language
# DELIMITER: {str} -- string used to separate values
np.load(file, allow_pickle=False) # load arrays or pickled objects from .npy, .npz or pickled files
np.loadtxt(file, dtype=float, comments="#", delimiter=None)
# DTYPE: {data type} -- data-type of the resulting array
# COMMENTS: {str} -- characters used to indicate the start of a comment. None implies no comments
# DELIMITER: {str} -- string used to separate values
```
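A round-trip sketch of the save/load functions, writing into a temporary directory; the file names are hypothetical. Positional arrays passed to `savez` are stored as `arr_0`, `arr_1`, ..., keyword arrays under their keyword:

```python
import os
import tempfile
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.linspace(0, 1, 3)

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "a.npy")
    npz_path = os.path.join(tmp, "both.npz")
    txt_path = os.path.join(tmp, "a.txt")

    np.save(npy_path, a)                               # single array, binary .npy
    np.savez(npz_path, b, first=a)                     # "arr_0" and "first"
    np.savetxt(txt_path, a, fmt="%d", delimiter=",")   # plain text, one row per line

    loaded = np.load(npy_path)
    with np.load(npz_path) as archive:                 # dict-like NpzFile
        names = sorted(archive.files)
        first = archive["first"]
    from_text = np.loadtxt(txt_path, dtype=int, delimiter=",")
```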
## LINEAR ALGEBRA
```py
np.diag(array, k=0) # extract a diagonal or construct a diagonal array
# K: {int} -- k>0 diagonals above main diagonal, k<0 diagonals below main diagonal (main diagonal k = 0)
np.dot(x, y) # dot product (matrix product for 2-D arrays)
np.trace(array, offset=0, dtype=None, out=None) # return the sum along diagonals of the array
# OFFSET: {int} -- offset of the diagonal from the main diagonal
# DTYPE: {dtype} -- determines the data-type of the returned array
# OUT: {ndarray} -- array into which the output is placed
np.linalg.det(A) # compute the determinant of an array
np.linalg.eig(A) # compute the eigenvalues and right eigenvectors of a square array
np.linalg.inv(A) # compute the (multiplicative) inverse of a matrix
# A_inv satisfies dot(A, A_inv) = dot(A_inv, A) = eye(A.shape[0])
np.linalg.pinv(A) # compute the (Moore-Penrose) pseudo-inverse of a matrix
np.linalg.qr(A) # factor the matrix A as QR, where Q is orthonormal and R is upper-triangular
np.linalg.svd(A) # Singular Value Decomposition
np.linalg.solve(A, B) # solve a linear matrix equation, or system of linear scalar equations AX = B
np.linalg.lstsq(A, B) # return the least-squares solution to a linear matrix equation AX = B
```
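A sketch solving a small system AX = B with an arbitrarily chosen 2 x 2 matrix (3x + y = 9, x + 2y = 8), plus the related helpers:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
B = np.array([9.0, 8.0])

X = np.linalg.solve(A, B)      # solution of A X = B: x = 2, y = 3
A_inv = np.linalg.inv(A)
det = np.linalg.det(A)         # 3*2 - 1*1 = 5, up to rounding
tr = np.trace(A)               # 3 + 2 = 5
diag = np.diag(A)              # main diagonal
```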
## RANDOM NUMBER GENERATION
```py
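np.random.seed(seed) # seed the legacy global random generator
np.random.rand(d0, d1, ...) # legacy API: uniform samples in [0, 1) with the given shape
np.random.randn(d0, d1, ...) # legacy API: standard normal samples with the given shape
np.random.randint(low, high=None, size=None) # legacy API: random integers in [low, high) ([0, low) if high is None)
# Generator methods are called on an instance, e.g. rng = np.random.default_rng(seed)
np.random.Generator.permutation(x) # randomly permute a sequence, or return a permuted range
np.random.Generator.shuffle(x) # modify a sequence in-place by shuffling its contents
np.random.Generator.beta(a, b, size=None) # draw samples from a Beta distribution
# A: {float, array floats} -- Alpha, > 0
# B: {float, array floats} -- Beta, > 0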
np.random.Generator.binomial(n, p, size=None) # draw samples from a binomial distribution
# N: {int, array ints} -- parameter of the distribution, >= 0
# P: {float, array floats} -- parameter of the distribution, >= 0 and <= 1
np.random.Generator.chisquare(df, size=None) # draw samples from a chi-square distribution
# DF: {float, array floats} -- degrees of freedom, > 0
np.random.Generator.gamma(shape, scale=1.0, size=None) # draw samples from a Gamma distribution
# SHAPE: {float, array floats} -- shape of the gamma distribution, >= 0
np.random.Generator.normal(loc=0.0, scale=1.0, size=None) # draw random samples from a normal (Gaussian) distribution
# LOC: {float, array floats} -- mean ("centre") of distribution
# SCALE: {float, array floats} -- standard deviation of distribution, >= 0
np.random.Generator.poisson(lam=1.0, size=None) # draw samples from a Poisson distribution
# LAM: {float, array floats} -- expected number of events in the interval, >= 0
np.random.Generator.uniform(low=0.0, high=1.0, size=None) # draw samples from a uniform distribution over [low, high)
# LOW: {float, array floats} -- lower boundary of the output interval (included)
# HIGH: {float, array floats} -- upper boundary of the output interval (excluded)
np.random.Generator.zipf(a, size=None) # draw samples from a Zipf distribution
# A: {float, array floats} -- distribution parameter, > 1
```
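A sketch of the `Generator` methods called on an instance from `np.random.default_rng`; the seed 42 is arbitrary and only makes the stream reproducible:

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # same seed -> same stream

normal = rng.normal(loc=10.0, scale=2.0, size=1000)
uniform = rng.uniform(low=0.0, high=1.0, size=5)   # values in [0, 1)
counts = rng.poisson(lam=3.0, size=5)
perm = rng.permutation(5)              # shuffled copy of range(5)

deck = np.arange(5)
rng.shuffle(deck)                      # in-place, returns None

again = np.random.default_rng(seed=42).normal(loc=10.0, scale=2.0, size=1000)
```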

@ -0,0 +1,646 @@
# Pandas
## Basic Pandas Imports
```py
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
```
## SERIES
1-dimensional labelled array; the axis labels are referred to as the INDEX.
Index can contain repetitions.
```py
s = Series(data, index=index, name='name')
# DATA: {python dict, ndarray, scalar value}
# NAME: {string}
s = Series(dict) # Series created from python dict, dict keys become index values
```
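A sketch of the Series constructors described above, assuming pandas is installed; labels and values are made up:

```python
import pandas as pd
from pandas import Series

s = Series([4, 7, -5], index=["d", "b", "a"], name="values")
from_dict = Series({"x": 1, "y": 2})      # dict keys become the index
dup = Series([1, 2], index=["k", "k"])    # repeated labels are allowed
scalar = Series(0.5, index=["p", "q"])    # scalar repeated for every label
```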
### INDEXING / SELECTION / SLICING
```py
s['index'] # selection by index label
s[condition] # return slice selected by condition
s['label_1':'label_2'] # slice by labels: endpoint included
s[ : ] = *value # modify value of entire slice
s[condition] = *value # modify slice by condition
```
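A sketch contrasting label slicing (endpoint included) with positional slicing via `.iloc` (endpoint excluded), then assigning through a slice and a condition:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

by_label = s["b"]           # 20
label_slice = s["b":"d"]    # labels b, c, d: endpoint included
pos_slice = s.iloc[1:3]     # positions 1, 2: endpoint excluded
big = s[s > 15]             # boolean condition

s["a":"b"] = 0              # assign to a label slice
s[s > 35] = -1              # assign by condition
```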
## MISSING DATA
Missing data appears as NaN (Not a Number).
```py
pd.isnull(obj) # boolean object of the same shape, True where data is missing
pd.notnull(obj) # boolean object of the same shape, True where data is present
obj.isnull() # method form, available on Series and DataFrame
obj.notnull()
```
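A minimal sketch of missing-value detection on a Series with one NaN; the result keeps the original index:

```python
import numpy as np
import pandas as pd

s = pd.Series({"a": 1.0, "b": np.nan, "c": 3.0})
missing = pd.isnull(s)   # a False, b True, c False
present = s.notnull()    # the inverse
```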
### SERIES ATTRIBUTES
```py
s.values # NumPy representation of Series
s.index # index object of Series
s.name = "Series name" # renames Series object
s.index.name = "index name" # renames index
```
### SERIES METHODS
```py
pd.Series.isin(self, values) # boolean Series showing whether each element in the Series matches an element in values exactly
# Conform Series to new index, new object produced unless the new index is equivalent to current one and copy=False
pd.Series.reindex(self, index=None, **kwargs)
# INDEX: {array} -- new labels / index
# METHOD: {none (don't fill gaps), pad (fill or carry values forward), backfill (fill or carry values backward)}-- hole filling method
# COPY: {bool} -- return new object even if index is same -- DEFAULT True
# FILL_VALUE: {scalar} -- value to use for missing values -- DEFAULT NaN
pd.Series.drop(self, index=None, **kwargs) # return Series with specified index labels removed
# INPLACE: {bool} -- if true do operation in place and return None -- DEFAULT False
# ERRORS: {ignore, raise} -- If "ignore", suppress error and existing labels are dropped
# KeyError raised if not all of the labels are found in the selected axis
pd.Series.value_counts(self, normalize=False, sort=True, ascending=False, bins=None, dropna=True)
# NORMALIZE: {bool} -- if True then object returned will contain relative frequencies of unique values
# SORT: {bool} -- sort by frequency -- DEFAULT True
# ASCENDING: {bool} -- sort in ascending order -- DEFAULT False
# BINS: {int} -- group values into half-open bins, only works with numeric data
# DROPNA: {bool} -- don't include counts of NaN
```
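A sketch of the Series methods above on toy data: `reindex` with and without `fill_value`, `drop`, `isin`, and `value_counts` (sorted by frequency):

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

r = s.reindex(["a", "b", "c", "d"])                    # new label 'd' -> NaN
filled = s.reindex(["a", "b", "c", "d"], fill_value=0)
dropped = s.drop("b")                                  # new Series, s unchanged
mask = s.isin([1, 3])

colors = pd.Series(["red", "blue", "red", "red", "blue", "green"])
counts = colors.value_counts()                         # red 3, blue 2, green 1
shares = colors.value_counts(normalize=True)           # relative frequencies
```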
## DATAFRAME
2-dimensional labeled data structure with columns of potentially different types.
Index and columns can contain repetitions.
```py
df = DataFrame(data, index=row_labels, columns=column_labels)
# DATA: {list, dict (of lists), nested dicts, series, dict of 1D ndarray, 2D ndarray, DataFrame}
# INDEX: {list of row_labels}
# COLUMNS: {list of column_labels}
# outer dict keys interpreted as index labels, inner dict keys interpreted as column labels
# INDEXING / SELECTION / SLICING
df[col] # column selection
df.at[row, col] # access a single value for a row/column label pair
df.iat[row, col] # access a single value for a row/column pair by integer position
df.column_label # column selection
df.loc[label] # row selection by label
df.iloc[loc] # row selection by integer location
df[ : ] # slice rows
df[bool_vec] # slice rows by boolean vector
df[condition] # slice rows by condition
df.loc[:, ["column_1", "column_2"]] # slice columns by names
df.loc[:, bool_vector] # slice columns by boolean vector
df[col] = *value # modify column contents, if the column is missing it will be created
df[ : ] = *value # modify rows contents
df[condition] = *value # modify contents
del df[col] # delete column
```
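A sketch of the DataFrame selection forms on a tiny made-up table:

```python
import pandas as pd

df = pd.DataFrame(
    {"city": ["Rome", "Oslo", "Lima"], "temp": [18, 4, 22]},
    index=["r", "o", "l"],
)

temps = df["temp"]                   # column as a Series
cell = df.at["o", "temp"]            # label-based scalar access
cell_pos = df.iat[2, 0]              # position-based scalar access
row = df.loc["r"]                    # row by label
warm = df[df["temp"] > 10]           # rows selected by condition
subset = df.loc[:, ["city"]]         # columns by name
numeric = df.loc[:, [False, True]]   # columns by boolean vector

df["warm"] = df["temp"] > 10         # new column created on assignment
del df["warm"]                       # and deleted again
```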
### DATAFRAME ATTRIBUTES
```py
df.index # row labels
df.columns # column labels
df.values # NumPy representation of DataFrame
df.index.name = "index name"
df.columns.name = "columns name"
df.T # transpose
```
### DATAFRAME METHODS
```py
pd.DataFrame.isin(self, values) # boolean DataFrame showing whether each element in the DataFrame matches an element in values exactly
# Conform DataFrame to new index, new object produced unless the new index is equivalent to current one and copy=False
pd.DataFrame.reindex(self, index=None, columns=None, **kwargs)
# INDEX: {array} -- new labels / index
# COLUMNS: {array} -- new labels / columns
# METHOD: {none (don't fill gaps), pad (fill or carry values forward), backfill (fill or carry values backward)}-- hole filling method
# COPY: {bool} -- return new object even if index is same -- DEFAULT True
# FILL_VALUE: {scalar} -- value to use for missing values -- DEFAULT NaN
pd.DataFrame.drop(self, index=None, columns=None, **kwargs) # Remove rows or columns by specifying label names
# INPLACE: {bool} -- if true do operation in place and return None -- DEFAULT False
# ERRORS: {ignore, raise} -- If "ignore", suppress error and existing labels are dropped
# KeyError raised if not all of the labels are found in the selected axis
```
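A sketch of `reindex`, `drop`, and `isin` on a DataFrame; each call returns a new object and leaves `df` as it was:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

bigger = df.reindex(index=["x", "y", "z"], columns=["a", "b", "c"], fill_value=0)
no_b = df.drop(columns=["b"])
no_x = df.drop(index="x")
lenient = df.drop(columns=["missing"], errors="ignore")   # no KeyError
hits = df.isin([1, 4])
```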
## INDEX OBJECTS
Holds axis labels and metadata, immutable.
### INDEX TYPES
```py
pd.Index # immutable ordered ndarray, sliceable. stores axis labels
pd.Int64Index # special case of Index with purely integer labels (removed in pandas 2.0: a plain Index with int64 dtype is used instead)
pd.MultiIndex # multi-level (hierarchical) index object for pandas objects
pd.PeriodIndex # immutable ndarray holding ordinal values indicating regular periods in time
pd.DatetimeIndex # nanosecond timestamps (uses Numpy datetime64)
```
### INDEX ATTRIBUTES
```py
pd.Index.is_monotonic_increasing # Return True if the index is monotonic increasing (only equal or increasing) values
pd.Index.is_monotonic_decreasing # Return True if the index is monotonic decreasing (only equal or decreasing) values
pd.Index.is_unique # Return True if the index has unique values.
pd.Index.hasnans # Return True if the index has NaNs
```
### INDEX METHODS
```py
pd.Index.append(self, other) # append a collection of Index objects together
pd.Index.difference(self, other, sort=None) # set difference of two Index objects
# SORT: {None (attempt sorting), False (don't sort)}
pd.Index.intersection(self, other, sort=None) # set intersection of two Index objects
# SORT: {None (attempt sorting), False (don't sort)}
pd.Index.union(self, other, sort=None) # set union of two Index objects
# SORT: {None (attempt sorting), False (don't sort)}
pd.Index.isin(self, values, level=None) # boolean array indicating where the index values are in values
pd.Index.insert(self, loc, item) # make new Index inserting new item at location
pd.Index.delete(self, loc) # make new Index with passed location(-s) deleted
pd.Index.drop(self, labels, errors='raise') # Make new Index with passed list of labels deleted
# ERRORS: {ignore, raise} -- If 'ignore', suppress error and existing labels are dropped
# KeyError raised if not all of the labels are found in the selected axis
pd.Index.reindex(self, target, **kwargs) # create index with target's values (move/add/delete values as necessary)
# METHOD: {none (don't fill gaps), pad (fill or carry values forward), backfill (fill or carry values backward)}-- hole filling method
```
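A sketch of the Index set operations and the "new Index" methods; attempting item assignment shows that Index objects are immutable:

```python
import pandas as pd

left = pd.Index(["a", "b", "c"])
right = pd.Index(["b", "c", "d"])

union = left.union(right)               # ['a', 'b', 'c', 'd']
common = left.intersection(right)       # ['b', 'c']
only_left = left.difference(right)      # ['a']
inserted = left.insert(1, "z")          # new Index: ['a', 'z', 'b', 'c']
dropped = left.drop(["a"])              # new Index: ['b', 'c']

immutable = False
try:
    left[0] = "q"
except TypeError:
    immutable = True
```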
## ARITHMETIC OPERATIONS
NumPy array operations preserve the label-value link.
Arithmetic operations automatically align differently indexed data.
Missing values propagate in arithmetic computations (NaN `<operator>` value = NaN)
### ADDITION
```py
self + other
pd.Series.add(self, other, fill_value=None) # add(), supports substitution of NaNs
pd.Series.radd(self, other, fill_value=None) # radd(), supports substitution of NaNs
pd.DataFrame.add(self, other, axis="columns", fill_value=None) # add(), supports substitution of NaNs
pd.DataFrame.radd(self, other, axis="columns", fill_value=None) # radd(), supports substitution of NaNs
# OTHER: {scalar, sequence, Series, DataFrame}
# AXIS: {0, 1, index, columns} -- whether to compare by the index or columns
# FILLVALUE: {None, float} -- fill missing value
```
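A sketch of label alignment: with the plain operator, labels present on only one side produce NaN, while `add(..., fill_value=0)` treats the missing side as 0:

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0], index=["b", "d"])

plain = s1 + s2                      # only 'b' exists on both sides
filled = s1.add(s2, fill_value=0)    # a -> 1, b -> 12, c -> 3, d -> 20
```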
### SUBTRACTION
```py
self - other
pd.Series.sub(self, other, fill_value=None) # sub(), supports substitution of NaNs
pd.Series.rsub(self, other, fill_value=None) # rsub(), supports substitution of NaNs
pd.DataFrame.sub(self, other, axis="columns", fill_value=None) # sub(), supports substitution of NaNs
pd.DataFrame.rsub(self, other, axis="columns", fill_value=None) # rsub(), supports substitution of NaNs
# OTHER: {scalar, sequence, Series, DataFrame}
# AXIS: {0, 1, index, columns} -- whether to compare by the index or columns
# FILLVALUE: {None, float} -- fill missing value
```
### MULTIPLICATION
```py
self * other
pd.Series.mul(self, other, fill_value=None) # mul(), supports substitution of NaNs
pd.Series.rmul(self, other, fill_value=None) # rmul(), supports substitution of NaNs
pd.DataFrame.mul(self, other, axis="columns", fill_value=None) # mul(), supports substitution of NaNs
pd.DataFrame.rmul(self, other, axis="columns", fill_value=None) # rmul(), supports substitution of NaNs
# OTHER: {scalar, sequence, Series, DataFrame}
# AXIS: {0, 1, index, columns} -- whether to compare by the index or columns
# FILLVALUE: {None, float} -- fill missing value
```
### DIVISION (float division)
```py
self / other
pd.Series.div(self, other, fill_value=None) # div(), supports substitution of NaNs
pd.Series.rdiv(self, other, fill_value=None) # rdiv(), supports substitution of NaNs
pd.Series.truediv(self, other, fill_value=None) # truediv(), supports substitution of NaNs
pd.Series.rtruediv(self, other, fill_value=None) # rtruediv(), supports substitution of NaNs
pd.DataFrame.div(self, other, axis="columns", fill_value=None) # div(), supports substitution of NaNs
pd.DataFrame.rdiv(self, other, axis="columns", fill_value=None) # rdiv(), supports substitution of NaNs
pd.DataFrame.truediv(self, other, axis="columns", fill_value=None) # truediv(), supports substitution of NaNs
pd.DataFrame.rtruediv(self, other, axis="columns", fill_value=None) # rtruediv(), supports substitution of NaNs
# OTHER: {scalar, sequence, Series, DataFrame}
# AXIS: {0, 1, index, columns} -- whether to compare by the index or columns
# FILLVALUE: {None, float} -- fill missing value
```
### FLOOR DIVISION
```py
self // other
pd.Series.floordiv(self, other, fill_value=None) # floordiv(), supports substitution of NaNs
pd.Series.rfloordiv(self, other, fill_value=None) # rfloordiv(), supports substitution of NaNs
pd.DataFrame.floordiv(self, other, axis="columns", fill_value=None) # floordiv(), supports substitution of NaNs
pd.DataFrame.rfloordiv(self, other, axis="columns", fill_value=None) # rfloordiv(), supports substitution of NaNs
# OTHER: {scalar, sequence, Series, DataFrame}
# AXIS: {0, 1, index, columns} -- whether to compare by the index or columns
# FILLVALUE: {None, float} -- fill missing value
```
### MODULO
```py
self % other
pd.Series.mod(self, other, fill_value=None) # mod(), supports substitution of NaNs
pd.Series.rmod(self, other, fill_value=None) # rmod(), supports substitution of NaNs
pd.DataFrame.mod(self, other, axis="columns", fill_value=None) # mod(), supports substitution of NaNs
pd.DataFrame.rmod(self, other, axis="columns", fill_value=None) # rmod(), supports substitution of NaNs
# OTHER: {scalar, sequence, Series, DataFrame}
# AXIS: {0, 1, index, columns} -- whether to compare by the index or columns
# FILLVALUE: {None, float} -- fill missing value
```
### POWER
```py
self ** other
pd.Series.pow(self, other, fill_value=None) # pow(), supports substitution of NaNs
pd.Series.rpow(self, other, fill_value=None) # rpow(), supports substitution of NaNs
pd.DataFrame.pow(self, other, axis="columns", fill_value=None) # pow(), supports substitution of NaNs
pd.DataFrame.rpow(self, other, axis="columns", fill_value=None) # rpow(), supports substitution of NaNs
# OTHER: {scalar, sequence, Series, DataFrame}
# AXIS: {0, 1, index, columns} -- whether to compare by the index or columns
# FILLVALUE: {None, float} -- fill missing value
```
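A minimal sketch of how `fill_value` and the reverse (`r`-prefixed) methods behave, using two small made-up Series whose labels only partly overlap:

```python
import numpy as np
import pandas as pd

a = pd.Series([1.0, 2.0, np.nan], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["x", "w"])

# operator form: any label present on only one side becomes NaN
plain = a + b

# method form: fill_value stands in for a value missing on ONE side only;
# label "z" is NaN in a and absent from b, so it stays NaN
filled = a.add(b, fill_value=0)

# reverse methods swap the operands: a.rpow(2) computes 2 ** a
rev = a.rpow(2)
```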
## ESSENTIAL FUNCTIONALITY
### FUNCTION APPLICATION AND MAPPING
NumPy ufuncs work fine with pandas objects.
```py
pd.DataFrame.applymap(self, func) # apply function element-wise (deprecated in pandas 2.1, renamed DataFrame.map)
pd.DataFrame.apply(self, func, axis=0, args=()) # apply a function along an axis of a DataFrame
# FUNC: {function} -- function to apply
# AXIS: {0, 1, index, columns} -- axis along which the function is applied
# ARGS: {tuple} -- positional arguments to pass to func in addition to the array/series
```
### SORTING AND RANKING
```py
pd.Series.sort_index(self, ascending=True, **kwargs) # sort Series by index labels
pd.Series.sort_values(self, ascending=True, **kwargs) # sort Series by values
# ASCENDING: {bool} -- if True, sort values in ascending order, otherwise descending -- DEFAULT True
# INPLACE: {bool} -- if True, perform operation in-place
# KIND: {quicksort, mergesort, heapsort} -- sorting algorithm
# NA_POSITION {first, last} -- 'first' puts NaNs at the beginning, 'last' puts NaNs at the end
pd.DataFrame.sort_index(self, axis=0, ascending=True, **kwargs) # sort object by labels along an axis
pd.DataFrame.sort_values(self, by, axis=0, ascending=True, **kwargs) # sort object by values along an axis
# BY: {str, list of str} -- name(s) to sort by: column labels if axis=0, index labels if axis=1
# AXIS: {0, 1, index, columns} -- the axis along which to sort
# ASCENDING: {bool} -- if True, sort values in ascending order, otherwise descending -- DEFAULT True
# INPLACE: {bool} -- if True, perform operation in-place
# KIND: {quicksort, mergesort, heapsort} -- sorting algorithm
# NA_POSITION {first, last} -- 'first' puts NaNs at the beginning, 'last' puts NaNs at the end
```
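A small sketch of `apply()` along each axis and of `sort_values()` with `na_position`; the DataFrame is made-up sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2], "b": [10.0, np.nan, 30.0]})

# axis=0 (default): func receives each COLUMN as a Series -> one result per column
col_range = df.apply(lambda col: col.max() - col.min())
# axis=1: func receives each ROW as a Series -> one result per row
row_sum = df.apply(lambda row: row["a"] + row["b"], axis=1)

# sort rows by column "b", largest first, with the NaN row placed first
by_b = df.sort_values(by="b", ascending=False, na_position="first")
print(by_b.index.tolist())  # [1, 2, 0]
```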
## DESCRIPTIVE AND SUMMARY STATISTICS
### COUNT
```py
pd.Series.count(self) # return number of non-NA/null observations in the Series
pd.DataFrame.count(self, numeric_only=False) # count non-NA cells for each column or row
# NUMERIC_ONLY: {bool} -- Include only float, int or boolean data -- DEFAULT False
```
### DESCRIBE
Generate descriptive statistics summarizing the central tendency, dispersion and shape of a dataset's distribution, excluding NaN values.
```py
pd.Series.describe(self, percentiles=None, include=None, exclude=None)
pd.DataFrame.describe(self, percentiles=None, include=None, exclude=None)
# PERCENTILES: {list-like of numbers} -- percentiles to include in output, between 0 and 1 -- DEFAULT [.25, .5, .75]
# INCLUDE: {'all', None, list of dtypes} -- white list of dtypes to include in the result. ignored for Series
# EXCLUDE: {None, list of dtypes} -- black list of dtypes to omit from the result. ignored for Series
```
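A sketch on a made-up Series: custom `percentiles` change which quantile rows appear, and the median (`50%`) is added even when not requested:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, np.nan])
summary = s.describe(percentiles=[0.1, 0.9])
print(summary.index.tolist())
# ['count', 'mean', 'std', 'min', '10%', '50%', '90%', 'max']  (NaN is not counted)
```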
### MAX - MIN
```py
pd.Series.max(self, skipna=None, numeric_only=None) # maximum of the values for the requested axis
pd.Series.min(self, skipna=None, numeric_only=None) # minimum of the values for the requested axis
pd.DataFrame.max(self, axis=None, skipna=None, numeric_only=None) # maximum of the values for the requested axis
pd.DataFrame.min(self, axis=None, skipna=None, numeric_only=None) # minimum of the values for the requested axis
# SKIPNA: {bool} -- exclude NA/null values when computing the result
# NUMERIC_ONLY: {bool} -- include only float, int, boolean columns, not implemented for Series
```
### IDXMAX - IDXMIN
```py
pd.Series.idxmax(self, skipna=True) # row label of the maximum value
pd.Series.idxmin(self, skipna=True) # row label of the minimum value
pd.DataFrame.idxmax(self, axis=0, skipna=True) # label of the first occurrence of the maximum over requested axis
pd.DataFrame.idxmin(self, axis=0, skipna=True) # label of the first occurrence of the minimum over requested axis
# AXIS: {0, 1, index, columns} -- row-wise or column-wise
# SKIPNA: {bool} -- exclude NA/null values. if an entire row/column is NA, the result will be NA
```
### QUANTILE
```py
pd.Series.quantile(self, q=0.5, interpolation='linear') # return values at the given quantile
pd.DataFrame.quantile(self, q=0.5, axis=0, numeric_only=True, interpolation='linear') # return values at the given quantile over requested axis
# Q: {float, array} -- value between 0 <= q <= 1, the quantile(s) to compute -- DEFAULT 0.5 (50%)
# NUMERIC_ONLY: {bool} -- if False, quantile of datetime and timedelta data will be computed as well
# INTERPOLATION: {linear, lower, higher, midpoint, nearest} -- SEE DOCS
```
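A sketch of how `interpolation` picks a value when the requested quantile falls between two data points (the data is made up):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])
# the median position falls between the values 2 and 3
linear = s.quantile(0.5)                              # 2.5 (default)
lower = s.quantile(0.5, interpolation="lower")        # 2
higher = s.quantile(0.5, interpolation="higher")      # 3
midpoint = s.quantile(0.5, interpolation="midpoint")  # 2.5
# a list of q values returns a Series indexed by q
quartiles = s.quantile([0.25, 0.75])                  # 1.75, 3.25
```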
### SUM
```py
pd.Series.sum(self, skipna=None, numeric_only=None, min_count=0) # sum of the values
pd.DataFrame.sum(self, axis=None, skipna=None, numeric_only=None, min_count=0) # sum of the values for the requested axis
# AXIS: {0, 1, index, columns} -- axis for the function to be applied on
# SKIPNA: {bool} -- exclude NA/null values when computing the result
# NUMERIC_ONLY: {bool} -- include only float, int, boolean columns, not implemented for Series
# MIN_COUNT: {int} -- required number of valid values to perform the operation. if fewer than min_count non-NA values are present the result will be NA
```
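A sketch of `min_count`, `skipna` and `axis` on made-up data; the all-NaN Series is the case where `min_count` matters:

```python
import numpy as np
import pandas as pd

all_nan = pd.Series([np.nan, np.nan])
print(all_nan.sum())             # 0.0 -> NaNs skipped, the empty sum is 0
print(all_nan.sum(min_count=1))  # nan -> fewer than 1 valid value
print(pd.Series([1.0, np.nan]).sum(skipna=False))  # nan

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
print(df.sum().tolist())        # [3, 7] (axis=0: one total per column)
print(df.sum(axis=1).tolist())  # [4, 6] (axis=1: one total per row)
```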
### MEAN
```py
pd.Series.mean(self, skipna=None, numeric_only=None) # mean of the values
pd.DataFrame.mean(self, axis=None, skipna=None, numeric_only=None) # mean of the values for the requested axis
# AXIS: {0, 1, index, columns} -- axis for the function to be applied on
# SKIPNA: {bool} -- exclude NA/null values when computing the result
# NUMERIC_ONLY: {bool} -- include only float, int, boolean columns, not implemented for Series
```
### MEDIAN
```py
pd.Series.median(self, skipna=None, numeric_only=None) # median of the values
pd.DataFrame.median(self, axis=None, skipna=None, numeric_only=None) # median of the values for the requested axis
# AXIS: {0, 1, index, columns} -- axis for the function to be applied on
# SKIPNA: {bool} -- exclude NA/null values when computing the result
# NUMERIC_ONLY: {bool} -- include only float, int, boolean columns, not implemented for Series
```
### MAD (mean absolute deviation)
```py
# deprecated in pandas 1.5 and removed in pandas 2.0
pd.Series.mad(self, skipna=None) # mean absolute deviation
pd.DataFrame.mad(self, axis=None, skipna=None) # mean absolute deviation of the values for the requested axis
# AXIS: {0, 1, index, columns} -- axis for the function to be applied on
# SKIPNA: {bool} -- exclude NA/null values when computing the result
```
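Since `mad()` is gone in pandas 2.x, the same quantity can be computed by hand. A sketch, assuming the removed method's definition (mean of the absolute deviations from the mean); the data is made up:

```python
import pandas as pd

s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])
# mean is 5, absolute deviations are 3,1,1,1,0,0,2,4 -> 12 / 8
mad = (s - s.mean()).abs().mean()   # 1.5

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 10, 40]})
# per column (axis=0): subtract each column's mean, then average the absolute deviations
col_mad = (df - df.mean()).abs().mean()
```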
### VAR (variance)
```py
pd.Series.var(self, skipna=None, ddof=1, numeric_only=None) # unbiased variance
pd.DataFrame.var(self, axis=None, skipna=None, ddof=1, numeric_only=None) # unbiased variance over requested axis
# AXIS: {0, 1, index, columns} -- axis for the function to be applied on
# SKIPNA: {bool} -- exclude NA/null values. if an entire row/column is NA, the result will be NA
# DDOF: {int} -- Delta Degrees of Freedom. divisor used in calculations is N - ddof (N represents the number of elements) -- DEFAULT 1
# NUMERIC_ONLY: {bool} -- include only float, int, boolean columns, not implemented for Series
```
### STD (standard deviation)
```py
pd.Series.std(self, skipna=None, ddof=1, numeric_only=None) # sample standard deviation
pd.DataFrame.std(self, axis=None, skipna=None, ddof=1, numeric_only=None) # sample standard deviation over requested axis
# AXIS: {0, 1, index, columns} -- axis for the function to be applied on
# SKIPNA: {bool} -- exclude NA/null values. if an entire row/column is NA, the result will be NA
# DDOF: {int} -- Delta Degrees of Freedom. divisor used in calculations is N - ddof (N represents the number of elements) -- DEFAULT 1
# NUMERIC_ONLY: {bool} -- include only float, int, boolean columns, not implemented for Series
```
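A sketch of the effect of `ddof` on made-up data. pandas defaults to `ddof=1` (sample statistics), while NumPy's `np.var`/`np.std` default to `ddof=0`:

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])  # mean 5, squared deviations sum to 32
sample_var = s.var()              # ddof=1: 32 / 7
population_std = s.std(ddof=0)    # ddof=0: sqrt(32 / 8) = 2.0
numpy_std = np.std(s.to_numpy())  # NumPy default ddof=0, also 2.0
```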
### SKEW
```py
pd.Series.skew(self, skipna=None, numeric_only=None) # unbiased skew, normalized by N-1
pd.DataFrame.skew(self, axis=None, skipna=None, numeric_only=None) # unbiased skew over requested axis, normalized by N-1
# AXIS: {0, 1, index, columns} -- axis for the function to be applied on
# SKIPNA: {bool} -- exclude NA/null values when computing the result
# NUMERIC_ONLY: {bool} -- include only float, int, boolean columns, not implemented for Series
```
### KURT
Unbiased kurtosis over requested axis using Fisher's definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
```py
pd.Series.kurt(self, skipna=None, numeric_only=None)
pd.DataFrame.kurt(self, axis=None, skipna=None, numeric_only=None)
# AXIS: {0, 1, index, columns} -- axis for the function to be applied on
# SKIPNA: {bool} -- exclude NA/null values when computing the result
# NUMERIC_ONLY: {bool} -- include only float, int, boolean columns, not implemented for Series
```
### CUMSUM (cumulative sum)
```py
pd.Series.cumsum(self, skipna=True) # cumulative sum
pd.DataFrame.cumsum(self, axis=None, skipna=True) # cumulative sum over requested axis
# AXIS: {0, 1, index, columns} -- axis for the function to be applied on
# SKIPNA: {bool} -- exclude NA/null values. if an entire row/column is NA, the result will be NA
```
### CUMMAX - CUMMIN (cumulative maximum - minimum)
```py
pd.Series.cummax(self, skipna=True) # cumulative maximum
pd.Series.cummin(self, skipna=True) # cumulative minimum
pd.DataFrame.cummax(self, axis=None, skipna=True) # cumulative maximum over requested axis
pd.DataFrame.cummin(self, axis=None, skipna=True) # cumulative minimum over requested axis
# AXIS: {0, 1, index, columns} -- axis for the function to be applied on
# SKIPNA: {bool} -- exclude NA/null values. if an entire row/column is NA, the result will be NA
```
### CUMPROD (cumulative product)
```py
pd.Series.cumprod(self, skipna=True) # cumulative product
pd.DataFrame.cumprod(self, axis=None, skipna=True) # cumulative product over requested axis
# AXIS: {0, 1, index, columns} -- axis for the function to be applied on
# SKIPNA: {bool} -- exclude NA/null values. if an entire row/column is NA, the result will be NA
```
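A sketch of the cumulative functions on a made-up Series containing a NaN, showing what `skipna` changes:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 3.0, np.nan, 2.0, 5.0])
running_total = s.cumsum()       # 1, 4, NaN, 6, 11 (NaN kept in place, then skipped)
running_max = s.cummax()         # 1, 3, NaN, 3, 5
strict = s.cumsum(skipna=False)  # 1, 4, NaN, NaN, NaN (NaN propagates)
```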
### DIFF
Calculates the difference between a DataFrame element and another element in the DataFrame (by default, the element in the same column of the previous row).
```py
pd.Series.diff(self, periods=1)
pd.DataFrame.diff(self, periods=1, axis=0)
# PERIODS: {int} -- Periods to shift for calculating difference, accepts negative values -- DEFAULT 1
# AXIS: {0, 1, index, columns} -- Take difference over rows or columns
```
### PCT_CHANGE
Percentage change between the current and a prior element.
```py
pd.Series.pct_change(self, periods=1, fill_method='pad', limit=None, freq=None)
pd.DataFrame.pct_change(self, periods=1, fill_method='pad', limit=None)
# PERIODS: {int} -- periods to shift for forming percent change
# FILL_METHOD: {'pad', 'ffill', 'backfill', 'bfill', None} -- how to handle NAs before computing percent changes -- DEFAULT pad
# LIMIT: {int} -- number of consecutive NAs to fill before stopping -- DEFAULT None
```
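A sketch of `diff()` and `pct_change()` on a made-up price series; a negative `periods` compares each element with the following one instead of the previous one:

```python
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0])
change = prices.diff()           # NaN, 10.0, -11.0 (current - previous)
ahead = prices.diff(periods=-1)  # -10.0, 11.0, NaN (current - next)
pct = prices.pct_change()        # NaN, 0.1, -0.1 (up to float rounding)
```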
## HANDLING MISSING DATA
### FILTERING OUT MISSING DATA
```py
pd.Series.dropna(self, inplace=False) # return a new Series with missing values removed
pd.DataFrame.dropna(self, axis=0, how='any', thresh=None, subset=None, inplace=False) # return a new DataFrame with missing values removed
# AXIS: {0, 1, index, columns} -- drop rows (0) or columns (1) that contain missing values. only a single axis is allowed
# HOW: {any, all} -- determine if row or column is removed from DataFrame (ANY = if any NA present, ALL = if all values are NA). DEFAULT any
# THRESH: {int} -- keep only rows/columns with at least that many non-NA values
# SUBSET: {array} -- labels along other axis to consider
# INPLACE: {bool} -- if True, do operation inplace and return None -- DEFAULT False
```
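A sketch of the `how`, `thresh` and `subset` options on a small made-up DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, np.nan], "b": [1.0, 2.0, np.nan]})

any_na = df.dropna()                 # drop rows with any NaN -> keeps row 0
all_na = df.dropna(how="all")        # drop rows that are entirely NaN -> keeps 0, 1
at_least_one = df.dropna(thresh=1)   # keep rows with >= 1 non-NA value -> keeps 0, 1
only_b = df.dropna(subset=["b"])     # look for NaN only in column "b" -> keeps 0, 1
```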
### FILLING IN MISSING DATA
Fill NA/NaN values using the specified method.
```py
pd.Series.fillna(self, value=None, method=None, inplace=False, limit=None)
pd.DataFrame.fillna(self, value=None, method=None, axis=None, inplace=False, limit=None)
# VALUE: {scalar, dict, Series, DataFrame} -- value to use to fill holes, dict/Series/DataFrame specifying which value to use for each index or column
# METHOD: {backfill, pad, None} -- method to use for filling holes -- DEFAULT None
# AXIS: {0, 1, index, columns} -- axis along which to fill missing values
# INPLACE: {bool} -- if true fill in-place (will modify views of object) -- DEFAULT False
# LIMIT: {int} -- maximum number of consecutive NaN values to forward/backward fill -- DEFAULT None
```
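A sketch of common filling patterns on made-up data. It uses `ffill()`/`bfill()`, which do the same as `fillna(method='ffill'/'bfill')`; the `method` argument of `fillna` is deprecated in recent pandas releases (2.1+):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])
constant = s.fillna(0)        # 1, 0, 0, 4
forward = s.ffill(limit=1)    # 1, 1, NaN, 4 (only one consecutive NaN filled)
backward = s.bfill()          # 1, 4, 4, 4

df = pd.DataFrame({"a": [np.nan, 2.0], "b": [np.nan, np.nan]})
per_column = df.fillna({"a": 0.0})  # dict: fill value per column, "b" left untouched
```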
## HIERARCHICAL INDEXING (MultiIndex)
Enables storing and manipulating data with an arbitrary number of dimensions in lower-dimensional data structures like Series (1d) and DataFrame (2d).
### MULTIINDEX CREATION
```py
pd.MultiIndex.from_arrays(arrays, names=None) # convert a list of arrays to MultiIndex
pd.MultiIndex.from_tuples(tuples, names=None) # convert a list of tuples to MultiIndex
pd.MultiIndex.from_frame(df, names=None) # convert DataFrame to MultiIndex
pd.MultiIndex.from_product(iterables, names=None) # MultiIndex from cartesian product of iterables
pd.Series(data, index=[array_1, array_2]) # passing a list of arrays as index makes a MultiIndex
pd.DataFrame(data, index=[array_1, array_2], columns=[array_3, array_4]) # same for DataFrame rows and/or columns
```
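A sketch showing that several of the constructors above build the same index; the level names and values are made up:

```python
import pandas as pd

names = ["letter", "number"]
from_product = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=names)
from_arrays = pd.MultiIndex.from_arrays([["a", "a", "b", "b"], [1, 2, 1, 2]], names=names)
from_tuples = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1), ("b", 2)], names=names)

# a list of arrays passed as index builds a MultiIndex directly
s = pd.Series([10, 20, 30, 40], index=[["a", "a", "b", "b"], [1, 2, 1, 2]])
```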
### MULTIINDEX LEVELS
Return a vector of label values for the requested level, with the same length as the index.
```py
pd.MultiIndex.get_level_values(self, level)
```
### PARTIAL AND CROSS-SECTION SELECTION
Partial selection "drops" levels of the hierarchical index in the result in a completely analogous way to selecting a column in a regular DataFrame.
```py
pd.Series.xs(self, key, axis=0, level=None, drop_level=True) # cross-section from Series
pd.DataFrame.xs(self, key, axis=0, level=None, drop_level=True) # cross-section from DataFrame
# KEY: {label, tuple of label} -- label contained in the index, or partially in a MultiIndex
# AXIS: {0, 1, index, columns} -- axis to retrieve cross-section on -- DEFAULT 0
# LEVEL: {label, int, sequence} -- if key is partially contained in a MultiIndex, indicate which levels are used. Levels can be referred to by label or position
# DROP_LEVEL: {bool} -- If False, returns object with same levels as self -- DEFAULT True
```
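A sketch of `xs()` on a made-up two-level Series, selecting by the outer level and by an inner level name:

```python
import pandas as pd

idx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["letter", "number"])
s = pd.Series([10, 20, 30, 40], index=idx)

by_outer = s.xs("a")                              # index [1, 2], values [10, 20]
by_inner = s.xs(1, level="number")                # index ["a", "b"], values [10, 30]
kept = s.xs(1, level="number", drop_level=False)  # same rows, both levels kept
```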
### INDEXING, SLICING
Multi index keys take the form of tuples.
```py
df.loc[('lvl_1', 'lvl_2', ...)] # selection of single row
df.loc[('idx_lvl_1', 'idx_lvl_2', ...), ('col_lvl_1', 'col_lvl_2', ...)] # selection of single value
df.loc['start_lvl_1':'end_lvl_1'] # slice of rows on the outermost level (aka partial selection)
df.loc[('start_lvl_1', 'start_lvl_2'):('end_lvl_1', 'end_lvl_2')] # slice of rows with levels (index should be sorted)
```
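A sketch of tuple keys and slicing on a sorted two-level index (label slicing on a MultiIndex expects a sorted index); `pd.IndexSlice` is used in the last selection to slice an inner level. The data is made up:

```python
import pandas as pd

idx = pd.MultiIndex.from_product([["a", "b", "c"], [1, 2]], names=["outer", "inner"])
df = pd.DataFrame({"v": range(6)}, index=idx)  # from_product yields a sorted index

value = df.loc[("a", 2), "v"]                 # single value -> 1
outer_slice = df.loc["a":"b"]                 # all rows under "a" and "b" -> 4 rows
tuple_slice = df.loc[("a", 2):("b", 1)]       # rows (a, 2) and (b, 1)
inner_two = df.loc[pd.IndexSlice[:, 2], :]    # every outer label, inner == 2
```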
### REORDERING AND SORTING LEVELS
```py
pd.MultiIndex.swaplevel(self, i=-2, j=-1) # swap level i with level j
pd.Series.swaplevel(self, i=-2, j=-1) # swap levels i and j in a MultiIndex
pd.DataFrame.swaplevel(self, i=-2, j=-1, axis=0) # swap levels i and j in a MultiIndex on a particular axis
pd.MultiIndex.sortlevel(self, level=0, ascending=True, sort_remaining=True) # sort MultiIndex at requested level
# LEVEL: {str, int, list-like} -- DEFAULT 0
# ASCENDING: {bool} -- if True, sort values in ascending order, otherwise descending -- DEFAULT True
# SORT_REMAINING: {bool} -- sort by the remaining levels after level
```
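A sketch on made-up data: `swaplevel()` only relabels the levels and keeps the row order, so it is usually followed by `sort_index()` to regroup rows under the new outer level:

```python
import pandas as pd

idx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["letter", "number"])
s = pd.Series([10, 20, 30, 40], index=idx)

swapped = s.swaplevel()          # levels now ["number", "letter"], rows in original order
resorted = swapped.sort_index()  # (1, a), (1, b), (2, a), (2, b)
```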
## DATA LOADING, STORAGE FILE FORMATS
```py
pd.read_fwf(filepath, colspecs='infer', widths=None, infer_nrows=100) # read a table of fixed-width formatted lines into DataFrame
# FILEPATH: {str, path object} -- any valid string path is acceptable, could be a URL. Valid URLs: http, ftp, s3, and file
# COLSPECS: {list of tuple (int, int), 'infer'} -- list of tuples giving extents of fixed-width fields of each line as half-open intervals { [from, to) }
# WIDTHS: {list of int} -- list of field widths which can be used instead of "colspecs" if intervals are contiguous
# INFER_NROWS: {int} -- number of rows to consider when letting parser determine colspecs -- DEFAULT 100
pd.read_excel() # read an Excel file into a pandas DataFrame
pd.read_json() # convert a JSON string to pandas object
pd.read_html() # read HTML tables into a list of DataFrame objects
pd.read_sql() # read SQL query or database table into a DataFrame
pd.read_csv(filepath, sep=',', *args, **kwargs) # read a comma-separated values (csv) file into DataFrame
pd.read_table(filepath, sep='\t', *args, **kwargs) # read general delimited file into DataFrame
# FILEPATH: {str, path object} -- any valid string path is acceptable, could be a URL. Valid URLs: http, ftp, s3, and file
# SEP: {str} -- delimiter to use -- DEFAULT ',' for read_csv, '\t' (tab) for read_table
# HEADER: {int, list of int, 'infer'} -- row numbers to use as column names, and the start of the data -- DEFAULT 'infer'
# NAMES: {array} -- list of column names to use -- DEFAULT None
# INDEX_COL: {int, str, False, sequence of int/str, None} -- Columns to use as row labels of DataFrame, given as string name or column index -- DEFAULT None
# SKIPROWS: {list-like, int, callable} -- Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file
# NA_VALUES: {scalar, str, list-like, dict} -- additional strings to recognize as NA/NaN. if dict passed, specific per-column NA values
# THOUSANDS: {str} -- thousand separator
# *ARGS, **KWARGS -- SEE DOCS
# write object to a comma-separated values (csv) file
pd.DataFrame.to_csv(self, path_or_buf=None, sep=',', na_rep='', columns=None, header=True, index=True, encoding='utf-8', lineterminator=None, decimal='.', *args, **kwargs)
# PATH_OR_BUF: {str, path object, file-like, None} -- if None, the CSV is returned as a string
# SEP: {str of length 1} -- field delimiter for the output file
# NA_REP: {str} -- missing data representation
# COLUMNS: {sequence} -- columns to write
# HEADER: {bool, list of str} -- write out column names. if a list of strings is given, it's assumed to be aliases for the column names
# INDEX: {bool} -- write out row names (index)
# ENCODING: {str} -- string representing encoding to use -- DEFAULT "utf-8"
# LINETERMINATOR: {str} -- newline character or character sequence to use in the output file (named line_terminator before pandas 1.5) -- DEFAULT os.linesep
# DECIMAL: {str} -- character recognized as decimal separator (e.g. ',' for European data)
pd.DataFrame.to_excel()
pd.DataFrame.to_json()
pd.DataFrame.to_html()
pd.DataFrame.to_sql()
```
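A sketch of a read/write round trip that never touches the filesystem, using `io.StringIO`; the semicolon-separated data and the `unknown` missing-value marker are made up:

```python
import io

import pandas as pd

raw = "id;city;population\n1;Rome;2,873,000\n2;Milan;unknown\n"
df = pd.read_csv(
    io.StringIO(raw),
    sep=";",                 # field delimiter
    index_col="id",          # use the "id" column as row labels
    thousands=",",           # "2,873,000" -> 2873000
    na_values=["unknown"],   # extra marker treated as NaN
)

# path_or_buf omitted -> to_csv returns the CSV text as a string
out = df.to_csv(sep=";", na_rep="unknown")
```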

# Requests Lib
## GET REQUEST
Retrieve data from the specified resource.
```py
import requests
response = requests.get('URL') # returns a Response object
# PAYLOAD -> the message body of the response
response.status_code # HTTP status code
```
The response message consists of:
- status line which includes the status code and reason message
- response header fields (e.g., Content-Type: text/html)
- empty line
- optional message body
```text
1xx -> INFORMATIONAL RESPONSE
2xx -> SUCCESS
200 OK -> request successful
3xx -> REDIRECTION
4xx -> CLIENT ERRORS
404 NOT FOUND -> resource not found
5xx -> SERVER ERRORS
```
```py
# raise HTTPError if the status code signals an error (4xx or 5xx)
response.raise_for_status()
response.content # raw bytes of payload
response.encoding = 'utf-8' # specify encoding used to decode .text
response.text # payload decoded to str using response.encoding
response.json() # payload parsed as JSON (dict or list), raises an error if the body is not valid JSON
response.headers # response headers (case-insensitive dict)
```
### QUERY STRING PARAMETERS
```py
response = requests.get('URL', params={'q':'query'})
response = requests.get('URL', params=[('q', 'query')])
response = requests.get('URL', params=b'q=query')
```
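A sketch that builds the final URL without sending anything, by preparing a `requests.Request`; `example.com` and the parameter names are placeholders:

```python
import requests

# prepare() encodes the request locally; nothing goes over the network
prepared = requests.Request(
    "GET", "https://example.com/search", params={"q": "pandas docs", "page": 2}
).prepare()
print(prepared.url)  # https://example.com/search?q=pandas+docs&page=2

# a list of tuples allows repeating the same key
repeated = requests.Request(
    "GET", "https://example.com/search", params=[("tag", "a"), ("tag", "b")]
).prepare()
print(repeated.url)  # https://example.com/search?tag=a&tag=b
```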
### REQUEST HEADERS
```py
response = requests.get(
    'URL',
    params={'q': 'query'},
    headers={'header': 'header_query'}
)
```
## OTHER HTTP METHODS
### DATA INPUT
```py
# ask the server to store the enclosed entity as a new subordinate of the resource identified by the URI
requests.post('URL', data={'key':'value'})
# ask the server to store the enclosed entity under the supplied URI (create or replace)
requests.put('URL', data={'key':'value'})
# apply a partial modification to the resource
requests.patch('URL', data={'key':'value'})
# delete the specified resource
requests.delete('URL')
# same response as GET but without the response body (headers only)
requests.head('URL')
# ask which HTTP methods the server supports for the resource
requests.options('URL')
```
### SENDING JSON DATA
```py
requests.post('URL', json={'key': 'value'})
```
### INSPECTING THE REQUEST
```py
# requests prepares each request (a PreparedRequest) before sending it
response = requests.post('URL', data={'key':'value'})
response.request.headers # inspect fields of the request that was sent, e.g. .url, .method, .headers, .body
```
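The same inspection works before anything is sent: preparing a `requests.Request` shows the body and headers that `json=` and `data=` produce. `example.com` is a placeholder and no request is made:

```python
import json

import requests

# json= serializes the payload and sets Content-Type to application/json
as_json = requests.Request("POST", "https://example.com/api", json={"key": "value"}).prepare()
print(as_json.method, as_json.headers["Content-Type"], as_json.body)

# data= with a dict is form-encoded instead
as_form = requests.Request("POST", "https://example.com/api", data={"key": "value"}).prepare()
print(as_form.headers["Content-Type"], as_form.body)
```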
## AUTHENTICATION
```py
requests.get('URL', auth=('username', 'password')) # use implicit HTTP Basic Authorization
# explicit auth classes: HTTP Basic, HTTP Digest, HTTP Proxy
from requests.auth import HTTPBasicAuth, HTTPDigestAuth, HTTPProxyAuth
from getpass import getpass
requests.get('URL', auth=HTTPBasicAuth('username', getpass()))
```
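A sketch showing that the tuple shortcut and `HTTPBasicAuth` produce the same `Authorization` header; the credentials and URL are placeholders, and the request is only prepared, not sent:

```python
import requests
from requests.auth import HTTPBasicAuth

url = "https://example.com/private"  # placeholder URL
implicit = requests.Request("GET", url, auth=("user", "pass")).prepare()
explicit = requests.Request("GET", url, auth=HTTPBasicAuth("user", "pass")).prepare()

# Basic auth is just base64("username:password") in a header, not encryption,
# so it should only be sent over HTTPS
print(implicit.headers["Authorization"])  # Basic dXNlcjpwYXNz
```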
### PERSONALIZED AUTH
```py
import requests
from requests.auth import AuthBase
class TokenAuth(AuthBase):
    """Custom authentication scheme"""
    def __init__(self, token):
        self.token = token
    def __call__(self, r):
        """Attach API token to a custom auth header"""
        r.headers['X-TokenAuth'] = self.token
        return r
requests.get('URL', auth=TokenAuth('1234abcde-token'))
```
### DISABLING SSL VERIFICATION
```py
requests.get('URL', verify=False) # skip TLS certificate verification (insecure, urllib3 emits an InsecureRequestWarning)
```
## PERFORMANCE
### REQUEST TIMEOUT
```py
# raise Timeout exception if request times out
requests.get('URL', timeout=(connection_timeout, read_timeout))
```
### MAX RETRIES
```py
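import requests
from requests.adapters import HTTPAdapter
# MAX_RETRIES: {int} -- retries per connection (example value); by default only failed DNS lookups, socket connections and connection timeouts are retried
URL_adapter = HTTPAdapter(max_retries=3)
session = requests.Session()
# use URL_adapter for all requests whose URL starts with 'URL'
session.mount('URL', URL_adapter)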
```
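To also retry on specific HTTP status codes, `max_retries` accepts a urllib3 `Retry` object. A sketch; the retry count, backoff and status codes are example values:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# retry up to 3 times with exponential backoff on these server errors;
# by default urllib3 only retries idempotent methods such as GET
retry = Retry(total=3, backoff_factor=0.5, status_forcelist=[502, 503, 504])
adapter = HTTPAdapter(max_retries=retry)

session = requests.Session()
session.mount("https://", adapter)  # every https:// URL goes through this adapter
session.mount("http://", adapter)
# session.get("https://example.com/api") would now retry transparently
```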

# Seaborn Lib
## Basic Imports For Seaborn
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# set aesthetic parameters in one step
sns.set(style='darkgrid')
#STYLE: {None, darkgrid, whitegrid, dark, white, ticks}
```
## RELPLOT (relationship)
```python
sns.relplot(x='name_in_data', y='name_in_data', hue='point_color', size='point_size', style='point_shape', data=data)
# HUE, SIZE and STYLE: {name in data} -- used to differentiate points, a sort-of 3rd dimension
# hue behaves differently if the data is categorical or numerical, numerical uses a color gradient
# SORT: {True, False} -- if True, sort the data by x before drawing lines
# CI: {None, sd} -- None skips computing confidence intervals, 'sd' plots the standard deviation instead (deprecated in seaborn 0.12 in favor of errorbar)
# (by default multiple measurements at each x value are aggregated by plotting the mean and the 95% confidence interval around the mean)
# ESTIMATOR: {None} -- turn off aggregation of multiple observations
# MARKERS: {True, False} -- use markers to distinguish the levels of the style variable
# DASHES: {True, False} -- use dash patterns to distinguish the levels of the style variable
# COL, ROW: {name in data} -- categorical variables that will determine the grid of plots
# COL_WRAP: {int} -- "Wrap" the column variable at this width, so that the column facets span multiple rows. Incompatible with a row facet.
```
### SCATTERPLOT
Depicts the joint distribution of two variables using a cloud of points.
`kind` can be omitted since scatter is the default for `relplot()`.
```python
sns.relplot(kind='scatter') # calls scatterplot()
sns.scatterplot() # underlying axis-level function of relplot()
```
### LINEPLOT
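In a line plot, the semantic variables (hue, size, style) determine how the data is grouped into lines; multiple observations at the same x value are aggregated (see ESTIMATOR and CI above).
```python
sns.relplot(ci=None, sort=bool, kind='line')
sns.lineplot() # underlying axis-level function of relplot()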
```
## CATPLOT (categorical)
Categorical: divided into discrete groups.
```python
sns.catplot(x='name_in_data', y='name_in_data', data=data)
# HUE: {name in data} -- used to differentiate points, a sort-of 3rd dimension
# COL, ROW: {name in data} -- categorical variables that will determine the grid of plots
# COL_WRAP: {int} -- "Wrap" the column variable at this width, so that the column facets span multiple rows. Incompatible with a row facet.
# ORDER, HUE_ORDER: {list of strings} -- order of categorical levels of the plot
# ROW_ORDER, COL_ORDER: {list of strings} -- order to organize the rows and/or columns of the grid in
# ORIENT: {'v', 'h'} -- Orientation of the plot (can also swap x&y assignment)
# COLOR: {matplotlib color} -- Color for all of the elements, or seed for a gradient palette
```
### CATEGORICAL SCATTERPLOT - STRIPPLOT
Adjusts the positions of points on the categorical axis with a small amount of random “jitter”.
```py
sns.catplot(kind='strip', jitter=float)
sns.stripplot()
# SIZE: {float} -- Diameter of the markers, in points
# JITTER: {False, float} -- magnitude of points jitter (distance from axis)
```
### CATEGORICAL SCATTERPLOT - SWARMPLOT
Adjusts the points along the categorical axis preventing overlap.
```py
sns.catplot(kind='swarm')
sns.swarmplot()
# SIZE: {float} -- Diameter of the markers, in points
```
### CATEGORICAL DISTRIBUTION - BOXPLOT
Shows the three quartile values of the distribution along with extreme values.
```py
sns.catplot(kind='box')
sns.boxplot()
# HUE: {name in data} -- box for each level of the semantic moved along the categorical axis so they don't overlap
# DODGE: {bool} -- whether elements should be shifted along the categorical axis if hue is used
```
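A sketch of a faceted box plot from made-up data: `col='shift'` draws one subplot per shift, with one box per day in each. It uses the Agg backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import pandas as pd
import seaborn as sns

data = pd.DataFrame({
    "day": ["Mon", "Mon", "Tue", "Mon", "Tue", "Tue"],
    "total": [10.0, 12.0, 9.0, 11.0, 15.0, 14.0],
    "shift": ["am", "am", "am", "pm", "pm", "pm"],
})

# figure-level catplot -> FacetGrid with one column of axes per "shift" level
g = sns.catplot(data=data, x="day", y="total", col="shift", kind="box", order=["Mon", "Tue"])
```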
### CATEGORICAL DISTRIBUTION - VIOLINPLOT
Combines a boxplot with the kernel density estimation procedure.
```py
sns.catplot(kind='violin')
sns.violinplot()
```
### CATEGORICAL DISTRIBUTION - BOXENPLOT
Plot similar to boxplot but optimized for showing more information about the shape of the distribution.
It is best suited for larger datasets.
```py
sns.catplot(kind='boxen')
sns.boxenplot()
```
### CATEGORICAL ESTIMATE - POINTPLOT
Show point estimates and confidence intervals using scatter plot glyphs.
```py
sns.catplot(kind='point')
sns.pointplot()
# CI: {float, sd} -- size of confidence intervals to draw around estimated values, sd -> standard deviation
# MARKERS: {string, list of strings} -- markers to use for each of the hue levels
# LINESTYLES: {string, list of strings} -- line styles to use for each of the hue levels
# DODGE: {bool, float} -- amount to separate the points for each hue level along the categorical axis
# JOIN: {bool} -- if True, lines will be drawn between point estimates at the same hue level
# SCALE: {float} -- scale factor for the plot elements
# ERRWIDTH: {float} -- thickness of error bar lines (and caps)
# CAPSIZE: {float} -- width of the "caps" on error bars
```
### CATEGORICAL ESTIMATE - BARPLOT
Show point estimates and confidence intervals as rectangular bars.
```py
sns.catplot(kind='bar')
sns.barplot()
# CI: {float, sd} -- size of confidence intervals to draw around estimated values, sd -> standard deviation
# ERRCOLOR: {matplotlib color} -- color for the lines that represent the confidence interval
# ERRWIDTH: {float} -- thickness of error bar lines (and caps)
# CAPSIZE: {float} -- width of the "caps" on error bars
# DODGE: {bool} -- whether elements should be shifted along the categorical axis if hue is used
```
### CATEGORICAL ESTIMATE - COUNTPLOT
Show the counts of observations in each categorical bin using bars.
```py
sns.catplot(kind='count')
sns.countplot()
# DODGE: {bool} -- whether elements should be shifted along the categorical axis if hue is used
```
## UNIVARIATE DISTRIBUTIONS
### DISTPLOT
Flexibly plot a univariate distribution of observations (deprecated since seaborn 0.11 in favor of `displot()` and `histplot()`).
```py
# A: {series, 1d-array, list}
sns.distplot(a=data)
# BINS: {None, arg for matplotlib hist()} -- specification of hist bins, or None to use Freedman-Diaconis rule
# HIST: {bool} -- whether to plot a (normed) histogram
# KDE: {bool} -- whether to plot a gaussian kernel density estimate
# HIST_KWS, KDE_KWS, RUG_KWS: {dict} -- keyword arguments for underlying plotting functions
# COLOR: {matplotlib color} -- color to plot everything but the fitted curve in
```
### RUGPLOT
Plot datapoints in an array as sticks on an axis.
```py
# A: {vector} -- 1D array of observations
sns.rugplot(a=data) # -> axes obj with plot on it
# HEIGHT: {scalar} -- height of ticks as proportion of the axis
# AXIS: {'x', 'y'} -- axis to draw rugplot on
# AX: {matplotlib axes} -- axes to draw plot into, otherwise grabs current axes
```
### KDEPLOT
Fit and plot a univariate or bivariate kernel density estimate.
```py
# DATA: {1D array-like} -- input data
sns.kdeplot(data=data)
# DATA2 {1D array-like} -- second input data. if present, a bivariate KDE will be estimated.
# SHADE: {bool} -- if True, shade in the area under the KDE curve (or draw with filled contours if bivariate)
```
## BIVARIATE DISTRIBUTION
### JOINTPLOT
Draw a plot of two variables with bivariate and univariate graphs.
```py
# X, Y: {string, vector} -- data or names of variables in data
sns.jointplot(x=data, y=data)
# DATA:{pandas DataFrame} -- DataFrame when x and y are variable names
# KIND: {'scatter', 'reg', 'resid', 'kde', 'hex'} -- kind of plot to draw
# COLOR: {matplotlib color} -- color used for plot elements
# HEIGHT: {numeric} -- size of figure (it will be square)
# RATIO: {numeric} -- ratio of joint axes height to marginal axes height
# SPACE: {numeric} -- space between the joint and marginal axes
# JOINT_KWS, MARGINAL_KWS, ANNOT_KWS: {dict} -- additional keyword arguments for the plot components
```
## PAIR-WISE RELATIONSHIPS IN DATASET
### PAIRPLOT
Plot pairwise relationships in a dataset.
```py
# DATA: {pandas DataFrame} -- tidy (long-form) dataframe where each column is a variable and each row is an observation
sns.pairplot(data=pd.DataFrame)
# HUE: {string (variable name)} -- variable in data to map plot aspects to different colors
# HUE_ORDER: {list of strings} -- order for the levels of the hue variable in the palette
# VARS: {list of variable names} -- variables within data to use, otherwise every column with numeric datatype
# X_VARS, Y_VARS: {list of variable names} -- variables within data to use separately for rows and columns of figure
# KIND: {'scatter', 'reg'} -- kind of plot for the non-identity relationships
# DIAG_KIND: {'auto', 'hist', 'kde'} -- kind of plot for the diagonal subplots. default depends on whether hue is used
# MARKERS: {matplotlib marker or list}
# HEIGHT:{scalar} -- height (in inches) of each facet
# ASPECT: {scalar} -- aspect * height gives the width (in inches) of each facet
```

# Tkinter Module/Library
## Standard Imports
```py
from tkinter import * # import Python Tk Binding
from tkinter import ttk # import Themed Widgets
```
## GEOMETRY MANAGEMENT
Putting widgets on screen:
- master widget --> top-level window or frame containing other widgets
- slave widget --> widget contained in a master widget
- geometry managers determine the size and drawing order of widgets
## EVENT HANDLING
- the event loop receives events from the OS
- customizable events provide a callback as a widget configuration option
```py
widget.bind('event', function) # capture any event and then execute an arbitrary piece of code (generally a function or lambda)
```
VIRTUAL EVENT --> high-level event generated by a widget (listed in widget docs)
## WIDGETS
Widgets are objects and all things on screen. All widgets are children of a window.
```py
widget_name = tk_object(parent_window) # widget is inserted into widget hierarchy
```
## FRAME WIDGET
Displays a single rectangle, used as container for other widgets
```py
frame = ttk.Frame(parent, width=None, height=None, borderwidth=num)
# BORDERWIDTH: sets frame border width (default: 0)
# width, height MUST be specified if frame is empty, otherwise determined by parent geometry manager
```
### FRAME PADDING
Extra space inside the frame, between its border and its contents.
```py
frame['padding'] = num # same padding for every border
frame['padding'] = (horizontal, vertical) # set horizontal THEN vertical padding
frame['padding'] = (left, top, right, bottom) # set left, top, right, bottom padding
# RELIEF: set border style, [flat (default), raised, sunken, solid, ridge, groove]
frame['relief'] = border_style
```
## LABEL WIDGET
Display text or image without interactivity.
```py
label = ttk.Label(parent, text='label text')
```
### DEFINING AN UPDATING LABEL
```py
var = StringVar() # variable containing text, watches for changes. Use get, set methods to interact with the value
label['textvariable'] = var # attach var to label (only of type StringVar)
var.set("new text label") # change label text
```
### DISPLAY IMAGES (2 steps)
```py
image = PhotoImage(file='filename') # create image object
label['image'] = image # use image config
```
### DISPLAY IMAGE AND-OR TEXT
```py
label['compound'] = value
```
Compound value:
- none (image if present, text otherwise)
- text (text only)
- image (image only)
- center (text in center of image)
- top (image above text), left, bottom, right
## LAYOUT
Specifies edge or corner that the label is attached.
```py
label['anchor'] = compass_direction #compass_direction: n, ne, e, se, s, sw, w, nw, center
```
### MULTI-LINE TEXT WRAP
```py
# use \n for multi line text
label['wraplength'] = size # maximum line length (in screen units) before text wraps
```
### CONTROL TEXT JUSTIFICATION
```py
label['justify'] = value #value: left, center, right
label['relief'] = label_style
label['foreground'] = color # color passed with name or HEX RGB codes
label['background'] = color # color passed with name or HEX RGB codes
```
### FONT STYLE (use with caution)
```py
# overrides the font set by the widget's ttk style (prefer styles when possible)
label['font'] = font
```
Fonts:
- TkDefaultFont -- default for all GUI items
- TkTextFont -- used for entry widgets, listboxes, etc.
- TkFixedFont -- standard fixed-width font
- TkMenuFont -- used for menu items
- TkHeadingFont -- for column headings in lists and tables
- TkCaptionFont -- for window and dialog caption bars
- TkSmallCaptionFont -- smaller caption for subwindows or tool dialogs
- TkIconFont -- for icon captions
- TkTooltipFont -- for tooltips
## BUTTON
Press to perform some action
```py
button = ttk.Button(parent, text='button_text', command=action_performed)
```
### TEXT or IMAGE
```py
button['text/textvariable'], button['image'], button['compound']
```
### BUTTON INVOCATION
```py
button.invoke() # button activation in the program
```
### BUTTON STATE
Activate or deactivate the widget.
```py
button.state(['disabled']) # set the disabled flag, disabling the button
button.state(['!disabled']) # clear the disabled flag
button.instate(['disabled']) # return true if the button is disabled, else false
button.instate(['!disabled']) # return true if the button is not disabled, else false
button.instate(['!disabled'], cmd) # execute 'cmd' if the button is not disabled
# WIDGET STATE FLAGS: active, disabled, focus, pressed, selected, background, readonly, alternate, invalid
```
## CHECKBUTTON
Button with a binary value of some kind (e.g. a toggle) that also invokes a command callback.
```py
checkbutton_var = StringVar() # any Tk variable type: StringVar, IntVar, BooleanVar, ...
check = ttk.Checkbutton(parent, text='button text', command=action_performed, variable=checkbutton_var, onvalue=value_on, offvalue=value_off)
```
### WIDGET VALUE
The `variable` option holds the value of the button, updated whenever the widget is toggled.
- DEFAULT: 1 (while checked), 0 (while unchecked)
- `onvalue` and `offvalue` customize the checked and unchecked values
- if the linked variable is empty or differs from both the on and off values, the `alternate` state flag is set
- the checkbutton won't set the linked variable's initial value: initialize it in the program
### CONFIG OPTIONS
```py
check['text/textvariable']
check['image']
check['compound']
check.state(['flag'])
check.instate(['flag'])
```
## RADIOBUTTON
Multiple-choice selection (good if options are few).
```py
#RADIOBUTTON CREATION (usually as a set)
radio_var = StringVar() # or another Tk variable type (IntVar, ...), shared by the whole set
radio_1 = ttk.Radiobutton(parent, text='button text', variable=radio_var, value=button_1_value)
radio_2 = ttk.Radiobutton(parent, text='button text', variable=radio_var, value=button_2_value)
radio_3 = ttk.Radiobutton(parent, text='button text', variable=radio_var, value=button_3_value)
# if the linked variable doesn't exist, the alternate state flag is set
# CONFIG OPTIONS (radio is any of the radiobuttons above)
radio['text/textvariable']
radio['image']
radio['compound']
radio.state(['flag'])
radio.instate(['flag'])
```
## ENTRY
Single line text field accepting a string.
```py
entry_var = StringVar()
entry = ttk.Entry(parent, textvariable=entry_var, width=char_num, show=symbol)
# SHOW: replaces the displayed text with symbol (e.g. '*'), used for passwords
# entries have no built-in label: use a separate Label widget
```
### CHANGE ENTRY VALUE
```py
entry.get() # returns entry value
entry.delete(start, 'end') # delete between two indices, 0-based
entry.insert(index, 'text value') # insert new text at a given index
```
### ENTRY CONFIG OPTIONS
```py
entry.state(['flag'])
entry.instate(['flag'])
```
## COMBOBOX
Drop-down list of available options.
```py
combobox_var = StringVar()
combobox = ttk.Combobox(parent, textvariable=combobox_var)
combobox.get() # return combobox current value
combobox.set(value) # set combobox new value
combobox.current() # returns which item in the predefined values list is selected (0-based index of the provided list, -1 if not in the list)
# the combobox generates a bindable <<ComboboxSelected>> virtual event when the user selects a value from the list
combobox.bind('<<ComboboxSelected>>', function)
```
### PREDEFINED VALUES
```py
combobox['values'] = (value_1, value_2, ...) # provides a list of choose-able values
combobox.state(['readonly']) # restricts choose-able values to those provided with 'values' config option
# SUGGESTION: call combobox.selection_clear() in the <<ComboboxSelected>> handler to avoid visual oddities (leftover highlighted text)
```
## LISTBOX (Tk Classic)
Display a list of single-line items; allows browsing and multiple selection (part of classic Tk, missing from the themed Tk widgets).
```py
items_var = StringVar(value=item_list) # Tk variable linked to the listbox; item_list is a Python list of strings
lstbx = Listbox(parent, height=num, listvariable=items_var)
# each element of the list is an item of the listbox
# setting the variable (items_var.set(new_list)) updates the listbox
```
### SELECTING ITEMS
```py
lstbx['selectmode'] = mode # MODE: browse (single selection, default), extended (multiple selection); also single, multiple
lstbx.curselection() # returns a tuple of the indices of the selected items
# on selection change the listbox generates the <<ListboxSelect>> virtual event
# often each string shown in the listbox is associated with some other data item:
# keep a second list, parallel to the list of displayed strings, holding the associated objects
# (look them up by index with .curselection(), or use a dict)
```
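A minimal sketch of the parallel-list pattern above, assuming the listbox rows and their associated objects are kept in two lists of equal length. `selected_objects` and `attach_listbox` are hypothetical helper names, not part of tkinter; only `attach_listbox` needs a running Tk display.

```python
def selected_objects(indices, parallel_items):
    """Map the indices returned by Listbox.curselection() to their objects.

    `parallel_items` holds one object per listbox row, in display order.
    """
    return [parallel_items[i] for i in indices]


def attach_listbox(parent, labels, objects, on_select):
    """Create a multi-selection Listbox showing `labels` (requires a Tk display).

    `objects` is the parallel list: objects[i] belongs to labels[i].
    `on_select` is called with the list of selected objects on every change.
    """
    import tkinter as tk  # imported here so selected_objects works without Tk

    if len(labels) != len(objects):
        raise ValueError("labels and objects must be parallel lists")
    items_var = tk.StringVar(master=parent, value=labels)
    lstbx = tk.Listbox(parent, listvariable=items_var, selectmode="extended",
                       height=len(labels))
    # keep a reference: tkinter unsets the Tcl variable when the StringVar is deleted
    lstbx.items_var = items_var

    def handle_select(event):
        on_select(selected_objects(lstbx.curselection(), objects))

    lstbx.bind("<<ListboxSelect>>", handle_select)
    return lstbx


codes = ["ar", "be", "ca"]  # objects associated with each listbox row
print(selected_objects((0, 2), codes))  # ['ar', 'ca']
```

With a display, `attach_listbox(root, ['Argentina', 'Belgium'], ['ar', 'be'], print)` would print the codes of the selected rows each time the selection changes.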
## SCROLLBAR
```py
scroll = ttk.Scrollbar(parent, orient=direction, command=widget.view)
# ORIENT: VERTICAL, HORIZONTAL
# WIDGET.VIEW: .xview, .yview
# NEEDS ASSOCIATED WIDGET SCROLL CONFIG
widget.configure(xscrollcommand=scroll.set)
widget.configure(yscrollcommand=scroll.set)
```
## SIZEGRIP
Small grip in the bottom-right corner of a window; dragging it resizes the window.
```py
ttk.Sizegrip(parent).grid(column=999, row=999, sticky=(S, E))
```
## TEXT (Tk Classic)
Area accepting multiple lines of text.
```py
txt = Text(parent, width=num:int, height=num:int, wrap=flag) # width in characters, height in lines
# FLAG: none (no wrapping), char (wrap at any character), word (wrap at word boundaries)
txt['state'] = flag # FLAG: disabled, normal
# supports the xscrollcommand and yscrollcommand options and the xview, yview methods
txt.see('line.char') # ensure the given index is visible (line is 1-based, char is 0-based)
txt.insert(index, string) # insert string at index ('line.char'); 'end' is a shortcut for the end of the text
txt.get(start, end) # return the text between two indices, e.g. txt.get('1.0', 'end')
txt.delete(start, end) # delete range of text
```
## PROGRESSBAR
Feedback about the progress of a lengthy operation.
```py
progbar = ttk.Progressbar(parent, orient=direction, length=num:int, value=num, maximum=num:float, mode=mode)
# DIRECTION: VERTICAL, HORIZONTAL
# MODE: determinate (relative progress of completion), indeterminate (no estimate of completion)
# LENGTH: dimension in pixel
# VALUE: sets the progress, updates the bar as it changes
# MAXIMUM: total number of steps (DEFAULT: 100)
```
### DETERMINATE PROGRESS
```py
progbar.step(amount) # increment value of given amount (DEFAULT: 1.0)
```
### INDETERMINATE PROGRESS
```py
progbar.start() # start auto-incrementing: step() is called at a regular interval (default: every 50 ms)
progbar.stop() # stop auto-incrementing
```
## SCALE
Provide a numeric value through direct manipulation.
```py
scale = ttk.Scale(parent, orient=DIR, length=num:int, from_=num:float, to=num:float, command=cmd)
# COMMAND: calls cmd at every scale change, appends current value to func call
scale['value'] # set or read current value
scale.set(value) # set current value
scale.get() # get current value
```
## SPINBOX
Choose a number from a range, or an item from a list; the arrow buttons cycle through the values.
```py
spinval = StringVar()
spin = Spinbox(parent, from_=num, to=num, textvariable=spinval, increment=num, values=lst, wrap=boolean)
# INCREMENT: amount added/subtracted by the arrow buttons
# VALUES: list of items associated with the spinbox (used instead of from_/to)
# WRAP: boolean, whether the value wraps around past the start/end value
```
## GRID GEOMETRY MANAGER
Widgets are assigned a "column" number and a "row" number, which indicates their relative position to each other.
Column and row numbers must be integers, with the first column and row starting at 0.
Gaps in column and row numbers are handy to add more widgets in the middle of the user interface at a later time.
The width of each column (or height of each row) depends on the width or height of the widgets contained within the column or row.
Widgets can take up more than a single cell in the grid ("columnspan" and "rowspan" options).
### LAYOUT WITHIN CELL
By default, if a cell is larger than the widget contained in it, the widget will be centered within it,
both horizontally and vertically, with the master's background showing in the empty space around it.
The "sticky" option can be used to change this default behavior.
The value of the "sticky" option is a string of 0 or more of the compass directions "nsew", specifying which edges of the cell the widget should be "stuck" to.
Specifying two opposite edges means that the widget will be stretched so it is stuck to both.
Specifying "nsew" makes the widget stick to every side.
### HANDLING RESIZE
Every column and row has a "weight" grid option associated with it, which tells it how much it should grow if there is extra room in the master to fill.
By default, the weight of each column or row is 0, meaning don't expand to fill space.
Weights are set with the "columnconfigure" and "rowconfigure" methods of the master widget.
Both "columnconfigure" and "rowconfigure" also take a "minsize" grid option, which specifies a minimum size.
### PADDING
Normally, each column or row will be directly adjacent to the next, so that widgets will be right next to each other.
"padx" puts a bit of extra space to the left and right of the widget, while "pady" adds extra space top and bottom.
A single value for the option puts the same padding on both left and right (or top and bottom),
while a two-value list lets you put different amounts on left and right (or top and bottom).
To add padding around an entire row or column, the "columnconfigure" and "rowconfigure" methods accept a "pad" option.
```py
widget.grid(column=num, row=num, columnspan=num, rowspan=num, sticky=(N, S, E, W), padx=num, pady=num) # sticky: any combination of N, S, E, W
master.columnconfigure(index, pad=num, weight=num, minsize=num) # options of column `index` inside master
master.rowconfigure(index, pad=num, weight=num, minsize=num) # options of row `index` inside master
master.grid_slaves() # returns the list of widgets gridded inside master
widget.grid_info() # returns a dict of the widget's grid options
widget.grid_configure(option=value) # change one or more grid options
widget.grid_forget() # remove widget from the grid, forgetting its grid options
widget.grid_remove() # remove widget from the grid, remembering its options for a later .grid()
```
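A sketch of the grid options described above applied to a small form; the widget names and layout are illustrative assumptions, not taken from these notes. It needs a Tk display, so the function is only defined here.

```python
def build_login_form(root):
    """Lay out a two-column form with grid (requires a Tk display).

    Returns the two Entry widgets so the caller can read their values.
    """
    from tkinter import ttk, N, S, E, W  # lazy import: module loads without Tk

    content = ttk.Frame(root, padding=(12, 12))
    content.grid(column=0, row=0, sticky=(N, S, E, W))

    ttk.Label(content, text="Name").grid(column=0, row=0, sticky=W, padx=(0, 6))
    name = ttk.Entry(content)
    name.grid(column=1, row=0, sticky=(E, W), pady=2)

    ttk.Label(content, text="Password").grid(column=0, row=1, sticky=W, padx=(0, 6))
    password = ttk.Entry(content, show="*")
    password.grid(column=1, row=1, sticky=(E, W), pady=2)

    # columnspan=2 lets the button use both columns; sticky=E right-aligns it
    ttk.Button(content, text="OK").grid(column=0, row=2, columnspan=2,
                                        sticky=E, pady=(8, 0))

    # extra space goes to the content frame, and inside it only to the entry column
    root.columnconfigure(0, weight=1)
    root.rowconfigure(0, weight=1)
    content.columnconfigure(1, weight=1)
    return name, password
```

To try it: `root = tkinter.Tk()`, `build_login_form(root)`, `root.mainloop()`. Resizing the window should stretch only the entry column, since it is the only column inside `content` with a non-zero weight.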
## WINDOWS AND DIALOGS
### CREATING TOPLEVEL WINDOW
```py
tlw = Toplevel(parent) # new window, child of parent; no need to grid it
window.destroy()
# can destroy every widget
# destroying a parent also destroys its children
```
### CHANGING BEHAVIOR AND STYLE
```py
# WINDOW TITLE
window.title() # returns title of the window
window.title('new title') # sets title
# SIZE AND LOCATION
window.geometry(geo_specs)
# full geometry specification: 'WIDTHxHEIGHT{+|-}X{+|-}Y', e.g. '300x200+10+20' (coordinates relative to the screen)
# +x --> x pixels from left edge
# -x --> x pixels from right edge
# +y --> y pixels from top edge
# -y --> y pixels from bottom edge
# STACKING ORDER
# current stacking order (list from lowest to highest) --- NOT CLEANLY EXPOSED THROUGH TK API
root.tk.eval('wm stackorder ' + str(window))
# check if window is above or below another window (Tk returns '1' for true)
is_above = root.tk.eval('wm stackorder ' + str(window) + ' isabove ' + str(otherwindow)) == '1'
is_below = root.tk.eval('wm stackorder ' + str(window) + ' isbelow ' + str(otherwindow)) == '1'
# raise or lower windows
window.lift() # raise above all other windows
window.lift(otherwin) # raise just above otherwin
window.lower() # lower below all other windows
window.lower(otherwin) # lower just below otherwin
# RESIZE BEHAVIOR
window.resizable(boolean, boolean) # sets if resizable in width (1st param) and height (2nd param)
window.minsize(num, num) # sets min width and height
window.maxsize(num, num) # sets max width and height
# ICONIFYING AND WITHDRAWING
# WINDOW STATE: normal, iconic (iconified window), withdrawn, icon, zoomed
window.state() # returns current window state
window.state('state') # sets window state
window.iconify() # iconifies window
window.deiconify() # deiconifies window
```
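The geometry specification above can be built programmatically. `geometry_spec` is a hypothetical helper (not a tkinter function) that produces the `WIDTHxHEIGHT+X+Y` form, treating negative offsets as distances from the right/bottom edge.

```python
def geometry_spec(width, height, x=None, y=None):
    """Build a Tk geometry string such as '300x200+40-10'.

    Non-negative x/y are offsets from the left/top screen edge ('+'),
    negative ones from the right/bottom edge ('-'). A '-0' offset (flush
    with the right/bottom edge) can't be expressed with plain ints, so
    this sketch doesn't support it.
    """
    if (x is None) != (y is None):
        raise ValueError("give both x and y, or neither")
    spec = f"{width}x{height}"
    if x is not None:
        spec += ("+" if x >= 0 else "-") + str(abs(x))
        spec += ("+" if y >= 0 else "-") + str(abs(y))
    return spec


print(geometry_spec(300, 200, 40, -10))  # 300x200+40-10
```

`window.geometry(geometry_spec(300, 200, 40, -10))` would request a 300x200 window placed 40 pixels from the left edge and 10 pixels from the bottom edge of the screen.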
### STANDARD DIALOGS
```py
# SELECTING FILES AND DIRECTORIES
# on Windows and Mac invokes underlying OS dialogs directly
from tkinter import filedialog
filename = filedialog.askopenfilename()
filename = filedialog.asksaveasfilename()
dirname = filedialog.askdirectory()
# All of these commands produce modal dialogs: the commands (and hence the program) will not continue running until the user submits the dialog.
# They return the full pathname of the chosen file or directory, or an empty string if the user cancels the dialog.
# SELECTING COLORS
from tkinter import colorchooser
# returns ((r, g, b), '#rrggbb'), or (None, None) if cancelled; INITIALCOLOR: existing color, presumably to replace
colorchooser.askcolor(initialcolor=hex_color_code)
# ALERT AND CONFIRMATION DIALOGS
from tkinter import messagebox
messagebox.showinfo(title="title", message='text') # simple box with message and OK button
messagebox.showerror(title="title", message='text')
messagebox.showwarning(title="title", message='text')
messagebox.askyesno(title="title", message='text', detail='secondary text', icon='icon')
messagebox.askokcancel(message='text', icon='icon', title='title', detail='secondary text', default=button) # DEFAULT: default button, ok or cancel
messagebox.askquestion(title="title", message='text', detail='secondary text', icon='icon')
messagebox.askretrycancel(title="title", message='text', detail='secondary text', icon='icon')
messagebox.askyesnocancel(title="title", message='text', detail='secondary text', icon='icon')
# ICON: info, error, question, warning (each function has its own default icon)
```
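POSSIBLE ALERT/CONFIRMATION RETURN VALUES (Tk dialog type -- answer; the Python wrappers convert most answers to booleans):
- `ok` (showinfo, showwarning, showerror) -- "ok"
- `okcancel` (askokcancel) -- True for "ok", False for "cancel"
- `yesno` (askquestion) -- "yes" or "no"
- `yesno` (askyesno) -- True for "yes", False for "no"
- `yesnocancel` (askyesnocancel) -- True for "yes", False for "no", None for "cancel"
- `retrycancel` (askretrycancel) -- True for "retry", False for "cancel"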
## SEPARATOR
```py
# horizontal or vertical line between groups of widgets
separator = ttk.Separator(parent, orient=direction)
# DIRECTION: horizontal, vertical
# LABEL FRAME
# labelled frame, used to group widgets
lf = ttk.LabelFrame(parent, text='label')
# PANED WINDOWS
# stack multiple resizable widgets
# panes are adjustable (drag the sash between panes)
pw = ttk.PanedWindow(parent, orient=direction)
# DIRECTION: horizontal, vertical
lf1 = ttk.LabelFrame(pw, ...) # panes are children of the paned window
lf2 = ttk.LabelFrame(pw, ...)
pw.add(lf1) # add widget to paned window
pw.add(lf2)
pw.insert(position, subwindow) # insert widget at given position in list of panes (0, ..., n-1)
pw.forget(subwindow) # remove widget from pane
pw.forget(position) # remove widget from pane
```
### NOTEBOOK
Allows switching between multiple pages
```py
nb = ttk.Notebook(parent)
f1 = ttk.Frame(nb, ...) # child of notebook
f2 = ttk.Frame(nb, ...)
nb.add(subwindow, text='page title', state=flag)
# TEXT: name of page, STATE: normal, disabled (not selectable), hidden
nb.insert(position, subwindow, option=value)
nb.forget(subwindow)
nb.forget(position)
nb.tabs() # retrieve all tabs
nb.select() # return current tab
nb.select(position/subwindow) # change current tab
nb.tab(tabid, option) # retrieve tab (TABID: position or subwindow) option
nb.tab(tabid, option=value) # change tab option
```
## FONTS, COLORS, IMAGES
### NAMED FONTS
Creation of personalized fonts
```py
from tkinter import font
font_name = font.Font(family='font_family', size=num, weight='bold/normal', slant='roman/italic', underline=boolean, overstrike=boolean)
# FAMILY: Courier, Times, Helvetica (support guaranteed)
font.families() # all available font families
```
### COLORS
Specified with color names or HEX RGB codes (e.g. '#ff0000').
### IMAGES
```py
imgobj = PhotoImage(file='filename')
label['image'] = imgobj
```
### IMAGES W/ Pillow
```py
from PIL import ImageTk, Image
myimg = ImageTk.PhotoImage(Image.open('filename'))
```

# Argparse Module
## Creating a parser
```py
import argparse
parser = argparse.ArgumentParser(description="description", allow_abbrev=True)
```
**Note**: All parameters should be passed as keyword arguments.
- `prog`: The name of the program (default: `sys.argv[0]`)
- `usage`: The string describing the program usage (default: generated from arguments added to parser)
- `description`: Text to display before the argument help (default: none)
- `epilog`: Text to display after the argument help (default: none)
- `parents`: A list of ArgumentParser objects whose arguments should also be included
- `formatter_class`: A class for customizing the help output
- `prefix_chars`: The set of characters that prefix optional arguments (default: -)
- `fromfile_prefix_chars`: The set of characters that prefix files from which additional arguments should be read (default: None)
- `argument_default`: The global default value for arguments (default: None)
- `conflict_handler`: The strategy for resolving conflicting optionals (usually unnecessary)
- `add_help`: Add a -h/--help option to the parser (default: True)
- `allow_abbrev`: Allows long options to be abbreviated if the abbreviation is unambiguous. (default: True)
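A short sketch of a few of the constructor parameters listed above (`prog`, `description`, `epilog`, `parents`); the program name and options are made up for illustration.

```python
import argparse

# Parent parser holding options shared by several commands.
# add_help=False: otherwise parent and child would both define -h/--help and conflict.
common = argparse.ArgumentParser(add_help=False)
common.add_argument("-v", "--verbose", action="store_true")

parser = argparse.ArgumentParser(
    prog="sync",                                   # shown in usage instead of sys.argv[0]
    description="Copy files between two folders.",
    epilog="Example: sync src/ dst/ -v",
    parents=[common],                              # inherits --verbose
)
parser.add_argument("src")
parser.add_argument("dst")

args = parser.parse_args(["a/", "b/", "-v"])
help_text = parser.format_help()                   # what -h/--help would print
print(args.verbose)  # True
```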
## [Adding Arguments](https://docs.python.org/3/library/argparse.html#the-add-argument-method)
```py
ArgumentParser.add_argument("name_or_flags", nargs="...", action="...")
```
**Note**: Except for the name or flags, all parameters should be passed as keyword arguments.
- `name or flags`: Either a name or a list of option strings, e.g. `foo` or `-f`, `--foo`.
- `action`: The basic type of action to be taken when this argument is encountered at the command line.
- `nargs`: The number of command-line arguments that should be consumed.
- `const`: A constant value required by some action and nargs selections.
- `default`: The value produced if the argument is absent from the command line.
- `type`: The type to which the command-line argument should be converted.
- `choices`: A container of the allowable values for the argument.
- `required`: Whether or not the command-line option may be omitted (optionals only).
- `help`: A brief description of what the argument does.
- `metavar`: A name for the argument in usage messages.
- `dest`: The name of the attribute to be added to the object returned by `parse_args()`.
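The `add_argument()` parameters listed above can be combined in one parser. This sketch uses a hypothetical `resize` program to show `type`, `default`, `choices`, `required`, `metavar` and `dest` together.

```python
import argparse

parser = argparse.ArgumentParser(prog="resize")
parser.add_argument("image", help="path of the image to resize")   # positional: a plain name
parser.add_argument("-s", "--scale", type=float, default=1.0,      # string converted to float
                    help="scale factor (default: %(default)s)")
parser.add_argument("--format", choices=["png", "jpeg"], default="png")
parser.add_argument("-o", "--output", required=True, metavar="PATH",
                    dest="output_path", help="where to write the result")

args = parser.parse_args(["photo.jpg", "-s", "0.5", "-o", "out.png"])
print(args.image, args.scale, args.format, args.output_path)  # photo.jpg 0.5 png out.png
```

Passing `--format gif` would make `parse_args()` print an error listing the valid choices and exit with status 2. Because of `dest`, the output path is read as `args.output_path`, not `args.output`.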
### Actions
`store`: This just stores the argument's value. This is the default action.
```py
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo')
>>> parser.parse_args('--foo 1'.split())
Namespace(foo='1')
```
`store_const`: This stores the value specified by the const keyword argument. The `store_const` action is most commonly used with optional arguments that specify some sort of flag.
```py
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', action='store_const', const=42)
>>> parser.parse_args(['--foo'])
Namespace(foo=42)
```
`store_true` and `store_false`: These are special cases of `store_const` used for storing the values True and False respectively. In addition, they create default values of False and True respectively.
```py
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', action='store_true')
>>> parser.add_argument('--bar', action='store_false')
>>> parser.add_argument('--baz', action='store_false')
>>> parser.parse_args('--foo --bar'.split())
Namespace(foo=True, bar=False, baz=True)
```
`append`: This stores a list, and appends each argument value to the list. This is useful to allow an option to be specified multiple times. Example usage:
```py
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', action='append')
>>> parser.parse_args('--foo 1 --foo 2'.split())
Namespace(foo=['1', '2'])
```
`append_const`: This stores a list, and appends the value specified by the const keyword argument to the list. (Note that the const keyword argument defaults to None.) The `append_const` action is typically useful when multiple arguments need to store constants to the same list. For example:
```py
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--str', dest='types', action='append_const', const=str)
>>> parser.add_argument('--int', dest='types', action='append_const', const=int)
>>> parser.parse_args('--str --int'.split())
Namespace(types=[<class 'str'>, <class 'int'>])
```
`count`: This counts the number of times a keyword argument occurs. For example, this is useful for increasing verbosity levels:
**Note**: the default will be None unless explicitly set to 0.
```py
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--verbose', '-v', action='count', default=0)
>>> parser.parse_args(['-vvv'])
Namespace(verbose=3)
```
`help`: This prints a complete help message for all the options in the current parser and then exits. By default a help action is automatically added to the parser.
`version`: This expects a version= keyword argument in the add_argument() call, and prints version information and exits when invoked:
```py
>>> import argparse
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('--version', action='version', version='%(prog)s 2.0')
>>> parser.parse_args(['--version'])
PROG 2.0
```
`extend` (Python 3.8+): This stores a list, and extends each argument value to the list. Example usage:
```py
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument("--foo", action="extend", nargs="+", type=str)
>>> parser.parse_args(["--foo", "f1", "--foo", "f2", "f3", "f4"])
Namespace(foo=['f1', 'f2', 'f3', 'f4'])
```
### Nargs
ArgumentParser objects usually associate a single command-line argument with a single action to be taken.
The `nargs` keyword argument associates a different number of command-line arguments with a single action.
**Note**: If the nargs keyword argument is not provided, the number of arguments consumed is determined by the action.
`N` (an integer): N arguments from the command line will be gathered together into a list.
```py
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', nargs=2)
>>> parser.add_argument('bar', nargs=1)
>>> parser.parse_args('c --foo a b'.split())
Namespace(bar=['c'], foo=['a', 'b'])
```
**Note**: `nargs=1` produces a list of one item. This is different from the default, in which the item is produced by itself.
`?`: One argument will be consumed from the command line if possible, and produced as a single item. If no command-line argument is present, the value from default will be produced.
For optional arguments, there is an additional case: the option string is present but not followed by a command-line argument. In this case the value from const will be produced.
```py
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', nargs='?', const='c', default='d')
>>> parser.add_argument('bar', nargs='?', default='d')
>>> parser.parse_args(['XX', '--foo', 'YY'])
Namespace(bar='XX', foo='YY')
>>> parser.parse_args(['XX', '--foo'])
Namespace(bar='XX', foo='c')
>>> parser.parse_args([])
Namespace(bar='d', foo='d')
```
`*`: All command-line arguments present are gathered into a list. Note that it generally doesn't make much sense to have more than one positional argument with `nargs='*'`, but multiple optional arguments with `nargs='*'` is possible.
```py
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', nargs='*')
>>> parser.add_argument('--bar', nargs='*')
>>> parser.add_argument('baz', nargs='*')
>>> parser.parse_args('a b --foo x y --bar 1 2'.split())
Namespace(bar=['1', '2'], baz=['a', 'b'], foo=['x', 'y'])
```
`+`: All command-line args present are gathered into a list. Additionally, an error message will be generated if there wasn't at least one command-line argument present.
```py
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('foo', nargs='+')
>>> parser.parse_args(['a', 'b'])
Namespace(foo=['a', 'b'])
>>> parser.parse_args([])
usage: PROG [-h] foo [foo ...]
PROG: error: the following arguments are required: foo
```
`argparse.REMAINDER`: All the remaining command-line arguments are gathered into a list. This is commonly useful for command line utilities that dispatch to other command line utilities.
```py
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('--foo')
>>> parser.add_argument('command')
>>> parser.add_argument('args', nargs=argparse.REMAINDER)
>>> print(parser.parse_args('--foo B cmd --arg1 XX ZZ'.split()))
Namespace(args=['--arg1', 'XX', 'ZZ'], command='cmd', foo='B')
```
## Parsing Arguments
```py
import argparse
# Convert argument strings to objects and assign them as attributes of the namespace. Return the populated namespace.
# signature: ArgumentParser.parse_args(args=None, namespace=None)
# assign attributes to an already existing object, rather than a new Namespace object
class C:
    pass
c = C()
parser = argparse.ArgumentParser()
parser.add_argument('--foo')
parser.parse_args(args=['--foo', 'BAR'], namespace=c)
c.foo # 'BAR'
# convert the Namespace to a dict with vars()
args = parser.parse_args(['--foo', 'BAR'])
vars(args) # {'foo': 'BAR'}
```
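Because `parse_args()` falls back to `sys.argv[1:]` when `args` is `None`, a common pattern is a `main(argv=None)` entry point that can also be called with an explicit list, e.g. from tests. The `greet` program below is a hypothetical example.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="greet")
    parser.add_argument("name", help="who to greet")
    parser.add_argument("--count", type=int, default=1, help="number of greetings")
    return parser


def main(argv=None):
    # argv=None -> parse_args() reads sys.argv[1:]; a list is parsed as given
    args = build_parser().parse_args(argv)
    return [f"Hello, {args.name}!"] * args.count


print(main(["Ada", "--count", "2"]))  # ['Hello, Ada!', 'Hello, Ada!']
```

As a script, call `main()` with no argument so the real command line is parsed.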

# Collections Module
``` py
# COUNTER()
# dict subclass for counting hashable objects
from collections import Counter
Counter(sequence) # -> Counter object
# {item: number of occurrences in sequence, ...}
var = Counter(sequence)
var.most_common(n) # list of the n most common elements as (element, count) pairs
sum(var.values()) # total of all counts
var.clear() # reset all counts
list(var) # list unique items
set(var) # convert to a set
dict(var) # convert to regular dictionary
var.items() # (element, count) pairs
Counter(dict(list_of_pairs)) # convert from a list of (element, count) pairs
var.most_common()[:-n-1:-1] # n least common elements
var += Counter() # remove zero and negative counts
# DEFAULTDICT()
# dictionary-like object that takes a default factory as its first argument
# defaultdict never raises a KeyError on lookup:
# a missing key is inserted with the value returned by default_factory()
from collections import defaultdict
var = defaultdict(default_factory) # e.g. defaultdict(list), defaultdict(int)
# ORDEREDDICT()
# dict subclass that "remembers" the order in which keys were inserted
# regular dicts also preserve insertion order since Python 3.7; OrderedDict adds order-aware methods
from collections import OrderedDict
name_dict = OrderedDict()
name_dict.popitem() # remove and return the last item (last=True is the default)
name_dict.popitem(last=False) # remove and return the first item
# OrderedDicts with the same items in a different order compare as different
# USERDICT()
# pure-Python wrapper around a dict that works like a normal dictionary
# designed to be subclassed
from collections import UserDict
user_dict = UserDict()
user_dict.data # real dictionary holding the UserDict contents
# NAMEDTUPLE()
# each namedtuple is represented by its own class
from collections import namedtuple
ClassName = namedtuple('ClassName', 'field_1 field_2') # field names separated by spaces
var = ClassName(parameters)
var.attribute # access to attributes
var[index] # access to attributes
var._fields # tuple of field names
var = ClassName._make(iterable) # build a namedtuple from an iterable
var._asdict() # return a dict (OrderedDict before Python 3.8) of field names to values
# DEQUE()
# double-ended queue (pronounced "deck")
# list-like container editable on both ends
from collections import deque
var = deque(iterable, maxlen=num) # -> deque object
var.append(item) # add item to the right end
var.appendleft(item) # add item to the left end
var.clear() # remove all elements
var.extend(iterable) # add iterable items to the right end
var.extendleft(iterable) # add iterable items to the left end (their order ends up reversed)
var.insert(index, item) # insert item at position index
var.index(item, start, stop) # returns position of item
var.count(item) # number of occurrences of item
var.pop() # remove and return the rightmost item
var.popleft() # remove and return the leftmost item
var.remove(value) # remove the first occurrence of value
var.reverse() # reverse element order in place
var.rotate(n) # rotate n steps (to the right if n > 0, to the left if n < 0)
sorted(var) # deque has no .sort() method; sorted() returns a new list
```
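A small sketch exercising `Counter`, `defaultdict` and `deque` from the notes above on a made-up list of words; the values in the comments follow from that input.

```python
from collections import Counter, defaultdict, deque

words = "the cat and the hat and the bat".split()

# Counter: tally hashable items
counts = Counter(words)
top_two = counts.most_common(2)              # [('the', 3), ('and', 2)]
rarest_two = counts.most_common()[:-3:-1]    # last two entries of most_common(), reversed

# defaultdict: group words by first letter without checking for missing keys
by_initial = defaultdict(list)
for word in words:
    by_initial[word[0]].append(word)         # a missing key starts as list()

# deque with maxlen: keep only the most recent items
recent = deque(maxlen=3)
for n in range(1, 6):
    recent.append(n)                         # when full, the leftmost item is dropped

# rotate: positive n moves items to the right
ring = deque([1, 2, 3, 4])
ring.rotate(1)                               # deque([4, 1, 2, 3])
```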

# CSV Module
``` python
# iterate over the lines of csvfile
csv.reader(csvfile, dialect='excel', **fmtparams) # -> reader object
# READER METHODS
reader.__next__() # returns the next row as a list of strings (a dict for DictReader); normally used via next(reader) or a for loop
# READER ATTRIBUTES
reader.dialect # read-only description of the dialect in use
reader.line_num # number of lines read from the source iterator so far
DictReader.fieldnames # field names (DictReader only)
# convert data to delimited strings
# csvfile must support .write()
# None is written as an empty string (simplifies SQL NULL dumps)
csv.writer(csvfile, dialect='excel', **fmtparams) # -> writer object
# WRITER METHODS
# row must be an iterable of strings or numbers (a dict for DictWriter)
writer.writerow(row) # write row formatted according to the current dialect
writer.writerows(rows) # write all elements of rows (an iterable of rows) formatted according to the current dialect
# CSV FUNCTIONS
# associate dialect with name (name must be a string)
csv.register_dialect(name, dialect, **fmtparams)
# delete the dialect associated with name
csv.unregister_dialect(name)
# return the dialect associated with name
csv.get_dialect(name)
# return the names of all registered dialects
csv.list_dialects()
# return the current maximum field size; if new_limit is given, it becomes the new limit
csv.field_size_limit(new_limit)
'''
csvfile -- iterable object returning a string on each __next__() call
if csvfile is a file it must be opened with newline='' (otherwise newlines inside quoted fields may be misread)
dialect -- specify the dialect of csv (excel, excel-tab, unix, ...) (OPTIONAL)
fmtparams -- override formatting parameters (OPTIONAL) https://docs.python.org/3/library/csv.html#csv-fmt-params
'''
# object operating like a reader but mapping each row into a dict (OrderedDict before Python 3.8) whose keys come from fieldnames
class csv.DictReader(f, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)
'''
f -- file to read
fieldnames -- sequence with the names of the csv fields; if omitted, the first row of f is used
restkey -- if a row has more fields than fieldnames, the extra values are stored as a list under this key
restval -- if a row has fewer fields than fieldnames, the missing values are filled with restval
additional parameters are passed to the underlying reader instance
'''
class csv.DictWriter(f, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)
'''
f -- file to write
fieldnames -- sequence with the names of the csv fields (REQUIRED); sets the column order
restval -- value written when a dict passed to writerow() is missing a key from fieldnames
extrasaction -- action taken if a dict passed to writerow() contains keys not in fieldnames ('raise' raises ValueError, 'ignore' ignores the additional keys)
additional parameters are passed to the underlying writer instance
'''
# DICTWRITER METHODS
writer.writeheader() # write a header row with the field names given in fieldnames
# class used to deduce the format of a CSV file
class csv.Sniffer
.sniff(sample, delimiters=None) # analyze the sample and return a Dialect subclass; delimiters is a string of possible delimiter characters
.has_header(sample) -> bool # True if the first row appears to be a series of column headers
# CONSTANTS
csv.QUOTE_ALL # instructs writer to quote ("") all fields
csv.QUOTE_MINIMAL # instructs writer to quote only fields containing special characters such as delimiter, quotechar ...
csv.QUOTE_NONNUMERIC # instructs writer to quote all non-numeric fields
csv.QUOTE_NONE # instructs writer to never quote fields
```
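A round trip through `DictWriter` and `DictReader`, using `io.StringIO` in place of a real file opened with `newline=''`; the column names and rows are illustrative.

```python
import csv
import io

rows = [
    {"name": "Ada", "lang": "Python"},
    {"name": "Linus", "lang": "C"},
]

# Write: DictWriter needs the column order up front (fieldnames).
out = io.StringIO(newline="")          # stands in for open(path, "w", newline="")
writer = csv.DictWriter(out, fieldnames=["name", "lang"])
writer.writeheader()
writer.writerows(rows)
text = out.getvalue()                  # rows end with '\r\n' (excel dialect default)

# Read back: the header row supplies the keys when fieldnames is omitted.
reader = csv.DictReader(io.StringIO(text, newline=""))
loaded = list(reader)

# restval fills missing keys; extrasaction="ignore" drops unknown ones.
partial = io.StringIO(newline="")
csv.DictWriter(partial, fieldnames=["name", "lang"], restval="?",
               extrasaction="ignore").writerow({"name": "Grace", "year": 1906})
print(repr(partial.getvalue()))        # 'Grace,?\r\n'
```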

# Ftplib Module
## FTP CLASSES
```py
ftplib.FTP(host="", user="", passwd="", acct="")
# if HOST => connect(host)
# if USER => login(user, passwd, acct)
ftplib.FTP_TLS(host="", user="", passwd="", acct="")
```
## EXCEPTIONS
```py
ftplib.error_reply # unexpected reply from the server
ftplib.error_temp # temporary error (response codes 400-499)
ftplib.error_perm # permanent error (response codes 500-599)
ftplib.error_proto # reply that doesn't fit the FTP response format (not starting with a digit 1-5)
ftplib.all_errors # tuple of all exceptions FTP methods may raise (including OSError and EOFError)
```
## FTP OBJECTS
```py
# method on text files: -lines
# method on binary files: -binary
# CONNECTION
FTP.connect(host="", port=0) # used once per instance
# DON'T CALL if host was supplied at instance creation
FTP.getwelcome() # return welcome message
FTP.login(user='anonymous', passwd='', acct='')
# called once per instance after connection is established
# DEFAULT PASSWORD: anonymous@ (when user is 'anonymous')
# DON'T CALL if host and user were supplied at instance creation
FTP.sendcmd(cmd) # send command string and return the response string
FTP.voidcmd(cmd) # send command string; return the response if successful (codes 200-299), raise error_reply otherwise
# FILE TRANSFER
FTP.abort() # abort in progress file transfer (can fail)
FTP.transfercmd(cmd, rest=None) # start a transfer over the data connection and return its socket
# active mode: send EPRT or PORT, then CMD, and accept the server's connection
# passive mode: send EPSV or PASV, connect to the server, then send CMD
FTP.retrbinary(cmd, callback, blocksize=8192, rest=None) # retrieve file in binary mode
# CMD: appropriate RETR command ('RETR filename')
# CALLBACK: func called on every block of data received
FTP.retrlines(cmd, callback=None)
# retrieve file or dir list in ASCII transfer mode
# CMD: appropriate RETR, LIST (list and info of files), NLST (list of file names)
# DEFAULT CALLBACK: prints each line to sys.stdout
FTP.set_pasv(value) # set passive mode if value is true, otherwise disable it
# passive mode on by default
FTP.storbinary(cmd, fp, blocksize=8192, callback=None, rest=None) # store file in binary mode
# CMD: appropriate STOR command ('STOR filename')
# FP: {file object in binary mode} read until EOF in blocks of blocksize
# CALLBACK: func called on each block after sending
FTP.storlines(cmd, fp, callback=None) # store file in ASCII transfer mode
# CMD: appropriate STOR command ('STOR filename')
# FP: {file object} read until EOF
# CALLBACK: func called on each line after it is sent
```
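A minimal sketch of the callback style used by `retrlines()` and `retrbinary()`, assuming an already connected and logged-in `FTP` instance is passed in; the helper names `list_names` and `download_file` are hypothetical, and the server in the usage comment is a placeholder.

```python
import ftplib


def list_names(ftp: ftplib.FTP, path: str = "") -> list[str]:
    """Return the names reported by NLST, one entry per reply line (hypothetical helper)."""
    names: list[str] = []
    # the callback receives each line with the trailing CRLF stripped
    ftp.retrlines(f"NLST {path}".strip(), names.append)
    return names


def download_file(ftp: ftplib.FTP, remote_name: str, local_path: str) -> None:
    """Fetch remote_name in binary mode into local_path (hypothetical helper)."""
    with open(local_path, "wb") as fh:
        # retrbinary calls fh.write once for every block of data received
        ftp.retrbinary(f"RETR {remote_name}", fh.write)


# Usage sketch (hypothetical host and file name):
# with ftplib.FTP("ftp.example.com") as ftp:   # host given, so connect() is called
#     ftp.login()                               # anonymous login
#     print(list_names(ftp))
#     download_file(ftp, "README", "README.local")
```

Passing a bound method such as `names.append` or `fh.write` as the callback avoids writing a wrapper function.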

# Itertools Module
``` py
# accumulate([1, 2, 3, 4, 5]) -> 1, 3 (1 + 2), 6 (1 + 2 + 3), 10 (1 + 2 + 3 + 4), 15 (1 + 2 + 3 + 4 + 5)
# accumulate(iterable, func) -> it[0], func(it[0], it[1]), func(prev, it[2]), ...
accumulate(iterable[, func, *, initial=None])
# iterator returns elements from the first iterable,
# then proceeds to the next until all the iterables are exhausted
chain(*iterables)
# like chain(), but takes a single iterable whose items are themselves iterables (flattens one level, lazily)
chain.from_iterable(iterable)
# returns r-length tuples of elements from the iterable, in input order
# elements are treated as unique based on their position, not their value
combinations(iterable, r)
# same as combinations(), but individual elements may be repeated
combinations_with_replacement(iterable, r)
# iterator filters data elements, returning only those that have
# a corresponding element in selectors that is true
compress(data, selectors)
# iterator returning evenly spaced values: start, start + step, start + 2 * step, ... (infinite)
count(start=0, step=1)
# iterator repeating the elements of the iterable indefinitely
cycle(iterable)
# iterator discards elements of the iterable as long as the predicate is true, then returns every remaining element
dropwhile(predicate, iterable)
# iterator returning only the elements for which the predicate is false
filterfalse(predicate, iterable)
# iterator returns tuples (key, group) for runs of consecutive elements with the same key
# key is the grouping criterion (key function result, or the element itself if key=None)
# group is an iterator over the members of that run (sort by the same key first to group all equal keys)
groupby(iterable, key=None)
# iterator returns selected elements of the iterable, like slicing (negative values not allowed)
islice(iterable, stop)
islice(iterable, start, stop[, step])
# returns all r-length permutations of the iterable (r defaults to the full length)
permutations(iterable, r=None)
# Cartesian product of iterables
# loops over the iterables in input order (rightmost element advances fastest)
# [product('ABCD', 'xy') -> Ax Ay Bx By Cx Cy Dx Dy]
# [product('ABCD', repeat=2) -> AA AB AC AD BA BB BC BD CA CB CC CD DA DB DC DD]
product(*iterables, repeat=1)
# returns object over and over again, indefinitely if times is not specified
repeat(object[, times])
# iterator computing func(*args) for each tuple args in the iterable
# used when the iterable is a pre-zipped sequence (sequence of argument tuples)
starmap(func, iterable)
# iterator returning values from the iterable as long as the predicate is true
takewhile(predicate, iterable)
# returns n independent iterators from a single iterable
tee(iterable, n=2)
# produces an iterator that aggregates elements from each iterable
# if the iterables have different lengths, missing values are filled with fillvalue
zip_longest(*iterables, fillvalue=None)
```
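A few of the functions above in action, as a quick sketch; the input lists and words are arbitrary examples chosen to make the results easy to check by hand.

```python
import itertools
import operator

running_total = list(itertools.accumulate([1, 2, 3, 4, 5]))        # [1, 3, 6, 10, 15]
running_max = list(itertools.accumulate([3, 1, 4, 1, 5], max))      # [3, 3, 4, 4, 5]
flat = list(itertools.chain.from_iterable([[1, 2], [3], [4, 5]]))    # [1, 2, 3, 4, 5]
first_evens = list(itertools.islice(itertools.count(0, 2), 5))       # islice stops the infinite count: [0, 2, 4, 6, 8]
pairs = ["".join(p) for p in itertools.product("ab", repeat=2)]      # ['aa', 'ab', 'ba', 'bb']
pair_sums = list(itertools.starmap(operator.add, [(1, 2), (3, 4)]))  # [3, 7]

# groupby only merges *consecutive* equal keys, so sort by the same key first
words = sorted(["banana", "apple", "cherry", "avocado", "blueberry"])
by_letter = {k: list(g) for k, g in itertools.groupby(words, key=operator.itemgetter(0))}
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
unsorted_keys = [k for k, _ in itertools.groupby("aabba")]            # ['a', 'b', 'a']
```

The last line shows the usual `groupby` pitfall: without sorting, the key `'a'` appears twice.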

# JSON Module
## JSON Format
JSON (JavaScript Object Notation) is a lightweight data-interchange format.
It is easy for humans to read and write.
It is easy for machines to parse and generate.
JSON is built on two structures:
- A collection of name/value pairs.
- An ordered list of values.

An OBJECT is an unordered set of name/value pairs.
An object begins with `{` (left brace) and ends with `}` (right brace).
Each name is followed by `:` (colon) and the name/value pairs are separated by `,` (comma).
An ARRAY is an ordered collection of values.
An array begins with `[` (left bracket) and ends with `]` (right bracket).
Values are separated by `,` (comma).
A VALUE can be a string in double quotes, or a number,
or true or false or null, or an object or an array.
These structures can be nested.
A STRING is a sequence of zero or more Unicode characters,
wrapped in double quotes, using backslash escapes.
A CHARACTER is represented as a single character string.
A STRING is very much like a C or Java string.
A NUMBER is very much like a C or Java number,
except that the octal and hexadecimal formats are not used.
WHITESPACE can be inserted between any pair of tokens.
## Usage
```python
# serialize obj as JSON formatted stream to fp
json.dump(obj, fp, cls=None, indent=None, separators=None, sort_keys=False)
# CLS: {custom JSONEncoder} -- specifies custom encoder to be used
# INDENT: {int > 0, string} -- array elements, object members pretty-printed with indent level
# SEPARATORS: {tuple} -- (item_separator, key_separator)
# [default: (', ', ': ') if indent=None, (',', ': ') otherwise],
# specify (',', ':') to eliminate whitespace
# SORT_KEYS: {bool} -- if True dict sorted by key
# serialize obj as JSON formatted string
json.dumps(obj, cls=None, indent=None, separators=None, sort_keys=False)
# CLS: {custom JSONEncoder} -- specifies custom encoder to be used
# INDENT: {int > 0, string} -- array elements, object members pretty-printed with indent level
# SEPARATORS: {tuple} -- (item_separator, key_separator)
# [default: (', ', ': ') if indent=None, (',', ': ') otherwise],
# specify (',', ':') to eliminate whitespace
# SORT_KEYS: {bool} -- if True dict sorted by key
# deserialize fp to python object
json.load(fp, cls=None)
# CLS: {custom JSONDecoder} -- specifies custom decoder to be used
# deserialize s (string, bytes or bytearray containing JSON doc) to python object
json.loads(s, cls=None)
# CLS: {custom JSONDecoder} -- specifies custom decoder to be used
```
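A short round trip through the four functions, assuming an `io.StringIO` buffer stands in for a real file object; the `record` dict is made up for illustration. It shows how `separators`, `sort_keys` and `indent` change the output.

```python
import io
import json

record = {"name": "Ada", "langs": ["en", "fr"], "active": True, "score": None}

# compact output: no whitespace after separators, keys sorted
compact = json.dumps(record, separators=(",", ":"), sort_keys=True)
# '{"active":true,"langs":["en","fr"],"name":"Ada","score":null}'

# pretty output: with indent set, the default separators become (',', ': ')
pretty = json.dumps(record, indent=2)

# dump()/load() do the same work against a file-like object
buffer = io.StringIO()
json.dump(record, buffer)
buffer.seek(0)  # rewind before reading back
restored = json.load(buffer)
```

`restored` compares equal to `record`, since every value here has a direct JSON counterpart.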
## Default Decoder (`json.JSONDecoder()`)
Conversions (JSON -> Python):
- object -> dict
- array -> list
- string -> str
- number (int) -> int
- number (real) -> float
- true -> True
- false -> False
- null -> None
## Default Encoder (`json.JSONEncoder()`)
Conversions (Python -> JSON):
- dict -> object
- list, tuple -> array
- str -> string
- int, float, int- and float-derived Enums -> number
- True -> true
- False -> false
- None -> null
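The two conversion tables are not exact inverses. A small sketch with an arbitrary dict shows where a round trip changes types: tuples come back as lists, and non-string dict keys come back as strings.

```python
import json

original = {"point": (1, 2), "flag": False, 1: "one"}

text = json.dumps(original)
# '{"point": [1, 2], "flag": false, "1": "one"}'

back = json.loads(text)
# {'point': [1, 2], 'flag': False, '1': 'one'}  (tuple -> list, int key -> str key)

values = json.loads("[1, 2.5, true, null]")
# [1, 2.5, True, None]
```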
## Extending JSONEncoder (Example)
```python
import json
class ComplexEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, complex):
            return [obj.real, obj.imag]
        # Let the base class default method raise the TypeError
        return json.JSONEncoder.default(self, obj)
json.dumps(2 + 1j, cls=ComplexEncoder) # '[2.0, 1.0]'
```
## Retrieving Data from json dict
```python
data = json.loads(json_string) # json_string: str holding a JSON document (naming it json would shadow the module)
data["key"] # retrieve the value associated with the key
data["outer key"]["nested key"] # nested key value retrieval
```
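A self-contained version of the retrieval pattern, assuming a made-up JSON document; `.get()` is the plain `dict` method, useful when a key may be missing.

```python
import json

raw = '{"user": {"name": "Ada", "roles": ["admin", "dev"]}, "count": 2}'
data = json.loads(raw)

name = data["user"]["name"]                # 'Ada'
first_role = data["user"]["roles"][0]      # 'admin' (JSON arrays become lists)
email = data["user"].get("email", "n/a")   # 'n/a': .get() avoids a KeyError for a missing key
```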

# Logging Module
## Configuration
```python
# basic configuration for the logging system
logging.basicConfig(filename="relpath", level=logging.LOG_LEVEL, format="message format", **kwargs) # format uses %-style attributes, e.g. "%(levelname)s:%(message)s"
# DATEFMT: Use the specified date/time format, as accepted by time.strftime().
# create a logger with a name (useful for having multiple loggers)
logger = logging.getLogger(name="logger name")
logger.level # LOG_LEVEL for this logger
# disable all logging calls of severity level and below, for every logger
# overrides the loggers' own levels; call logging.disable(logging.NOTSET) to undo
logging.disable(level=logging.LOG_LEVEL)
```
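A minimal configuration sketch, assuming an in-memory stream instead of a log file so the output can be inspected; the logger name `"app"` is hypothetical. `force=True` (Python 3.8+) replaces any handlers already attached to the root logger, otherwise a second `basicConfig()` call would do nothing.

```python
import io
import logging


def configure_and_log() -> str:
    buffer = io.StringIO()
    logging.basicConfig(
        stream=buffer,                # write to a stream instead of filename=...
        level=logging.INFO,           # root threshold: DEBUG records are dropped
        format="%(levelname)s:%(name)s:%(message)s",
        force=True,                   # replace handlers already on the root logger
    )
    logger = logging.getLogger("app")  # named logger, inherits the root level
    logger.debug("not shown")
    logger.info("shown")
    return buffer.getvalue()
```

Calling `configure_and_log()` should return `"INFO:app:shown\n"`.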
### Format (`basicConfig(format="")`)
| Attribute name | Format | Description |
|----------------|-------------------|-------------------------------------------------------------------------------------------|
| asctime | `%(asctime)s` | Human-readable time when the LogRecord was created. Modified by `basicConfig(datefmt="")` |
| created | `%(created)f` | Time when the LogRecord was created (as returned by `time.time()`). |
| filename | `%(filename)s` | Filename portion of pathname. |
| funcName | `%(funcName)s` | Name of function containing the logging call. |
| levelname | `%(levelname)s` | Text logging level for the message. |
| levelno | `%(levelno)s` | Numeric logging level for the message. |
| lineno | `%(lineno)d` | Source line number where the logging call was issued (if available). |
| message | `%(message)s` | The logged message, computed as `msg % args`. |
| module | `%(module)s` | Module (name portion of filename). |
| msecs | `%(msecs)d` | Millisecond portion of the time when the LogRecord was created. |
| name | `%(name)s` | Name of the logger used to log the call. |
| pathname | `%(pathname)s` | Full pathname of the source file where the logging call was issued (if available). |
| process | `%(process)d` | Process ID (if available). |
| processName | `%(processName)s` | Process name (if available). |
| thread | `%(thread)d` | Thread ID (if available). |
| threadName | `%(threadName)s` | Thread name (if available). |
### Datefmt (`basicConfig(datefmt="")`)
| Directive | Meaning |
|-----------|------------------------------------------------------------------------------------------------------------------------------|
| `%a` | Locale's abbreviated weekday name. |
| `%A` | Locale's full weekday name. |
| `%b` | Locale's abbreviated month name. |
| `%B` | Locale's full month name. |
| `%c` | Locale's appropriate date and time representation. |
| `%d` | Day of the month as a decimal number [01,31]. |
| `%H` | Hour (24-hour clock) as a decimal number [00,23]. |
| `%I` | Hour (12-hour clock) as a decimal number [01,12]. |
| `%j` | Day of the year as a decimal number [001,366]. |
| `%m` | Month as a decimal number [01,12]. |
| `%M` | Minute as a decimal number [00,59]. |
| `%p` | Locale's equivalent of either AM or PM. |
| `%S` | Second as a decimal number [00,61]. |
| `%U` | Week number of the year (Sunday as the first day of the week) as a decimal number [00,53]. |
| `%w` | Weekday as a decimal number [0(Sunday),6]. |
| `%W` | Week number of the year (Monday as the first day of the week) as a decimal number [00,53]. |
| `%x` | Locale's appropriate date representation. |
| `%X` | Locale's appropriate time representation. |
| `%y` | Year without century as a decimal number [00,99]. |
| `%Y` | Year with century as a decimal number. |
| `%z` | Time zone offset indicating a positive or negative time difference from UTC/GMT of the form +HHMM or -HHMM [-23:59, +23:59]. |
| `%Z` | Time zone name (no characters if no time zone exists). |
| `%%` | A literal '%' character. |
## Logs
Log Levels (Low To High):
- notset: `0`
- debug: `10`
- info: `20`
- warning: `30`
- error: `40`
- critical: `50`
```python
logging.debug(msg) # Logs a message with level DEBUG on the root logger
logging.info(msg) # Logs a message with level INFO on the root logger
logging.warning(msg) # Logs a message with level WARNING on the root logger
logging.error(msg) # Logs a message with level ERROR on the root logger
logging.critical(msg) # Logs a message with level CRITICAL on the root logger
```
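A sketch of how the numeric levels filter records, using a dedicated logger with its own handler (the name `"levels_demo"` is hypothetical) so the root logger is untouched; it also shows `logging.disable()` overriding the logger's level until it is reset.

```python
import io
import logging


def level_demo() -> str:
    buffer = io.StringIO()
    handler = logging.StreamHandler(buffer)
    handler.setFormatter(logging.Formatter("%(levelno)s %(levelname)s %(message)s"))
    log = logging.getLogger("levels_demo")
    log.addHandler(handler)
    log.setLevel(logging.WARNING)     # records below WARNING (30) are ignored
    log.propagate = False             # keep records away from the root logger's handlers
    log.info("ignored")
    log.warning("kept")
    logging.disable(logging.WARNING)  # now WARNING and below are suppressed everywhere
    log.warning("suppressed")
    log.error("still shown")
    logging.disable(logging.NOTSET)   # undo the global override
    log.removeHandler(handler)
    return buffer.getvalue()
```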

# Shutil Module
High-level file operations
```python
# copy the contents of file src to file dst in the most efficient way, return dst
shutil.copyfile(src, dst)
# dst MUST be complete target name
# if dst already exists it will be overwritten
# copy file src to file or directory dst (permission bits included), return path to new file
shutil.copy(src, dst)
# Recursively copy entire dir-tree rooted at src to directory named dst
# return the destination directory
shutil.copytree(src, dst, dirs_exist_ok=False)
# DIRS_EXIST_OK: {bool} -- dictates whether to raise an exception in case dst
# or any missing parent directory already exists
# delete an entire directory tree
shutil.rmtree(path, ignore_errors=False, onerror=None)
# IGNORE_ERRORS: {bool} -- if true errors (failed removals) will be ignored
# ONERROR: handler for removal errors (if ignore_errors=False or omitted)
# recursively move file or directory (src) to dst, return dst
shutil.move(src, dst)
# if the destination is an existing directory, then src is moved inside that directory.
# if the destination already exists but is not a directory,
# it may be overwritten depending on os.rename() semantics
# used to rename files
# change owner user and/or group of the given path
shutil.chown(path, user=None, group=None)
# user can be a system user name or a uid; the same applies to group.
# At least one argument is required
# create archive file and return its name
shutil.make_archive(base_name, format, [root_dir, base_dir])
# BASE_NAME: {string} -- name of the archive, including path, excluding extension
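# FORMAT: {zip, tar, gztar, bztar, xztar} -- archive format
# ROOT_DIR: {path} -- directory that becomes the root of the archive (paths inside it are relative to root_dir), defaults to the current directory
# BASE_DIR: {path} -- directory (relative to root_dir) where archiving starts, defaults to the current directory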
# unpack an archive
shutil.unpack_archive(filename, [extract_dir, format])
# FILENAME: full path of archive
# EXTRACT_DIR: {path} -- directory to unpack into
# FORMAT: {zip, tar, gztar, bztar, xztar} -- archive format
# return disk usage statistics (in bytes) as a named tuple with attributes total, used, free
shutil.disk_usage(path)
```
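A sketch that chains several of these calls inside a scratch directory, assuming `tempfile.TemporaryDirectory()` as the workspace; the directory and file names (`src`, `backup`, `bundle`, `out`) are hypothetical. The archive is written outside `backup` so it does not end up inside itself.

```python
import os
import shutil
import tempfile


def archive_roundtrip(workdir: str) -> list[str]:
    src = os.path.join(workdir, "src")
    os.makedirs(src)
    with open(os.path.join(src, "a.txt"), "w") as fh:
        fh.write("hello")
    # copy a single file: dst is the complete target name
    shutil.copyfile(os.path.join(src, "a.txt"), os.path.join(src, "b.txt"))
    # copy the whole tree; dst must not exist unless dirs_exist_ok=True
    backup = shutil.copytree(src, os.path.join(workdir, "backup"))
    # zip the backup dir: root_dir=backup makes archive entries relative to it
    archive = shutil.make_archive(os.path.join(workdir, "bundle"), "zip", root_dir=backup)
    out = os.path.join(workdir, "out")
    shutil.unpack_archive(archive, out)  # format inferred from the .zip extension
    shutil.rmtree(backup)                # delete the copied tree
    return sorted(os.listdir(out))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(archive_roundtrip(tmp))  # ['a.txt', 'b.txt']
```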

# SMTPlib Module
```python
import smtplib
# SMTP instance that encapsulates a SMTP connection
# If the optional host and port parameters are given, the SMTP connect() method is called with those parameters during initialization.
s = smtplib.SMTP(host="host_smtp_address", port="smtp_service_port", **kwargs)
s = smtplib.SMTP_SSL(host="host_smtp_address", port="smtp_service_port", **kwargs)
# An SMTP_SSL instance behaves exactly the same as instances of SMTP.
# SMTP_SSL should be used for situations where SSL is required from the beginning of the connection
# and using starttls() is not appropriate.
# If host is not specified, the local host is used.
# If port is zero, the standard SMTP-over-SSL port (465) is used.
SMTP.connect(host='localhost', port=0)
#Connect to a host on a given port. The defaults are to connect to the local host at the standard SMTP port (25). If the hostname ends with a colon (':') followed by a number, that suffix will be stripped off and the number interpreted as the port number to use. This method is automatically invoked by the constructor if a host is specified during instantiation. Returns a 2-tuple of the response code and message sent by the server in its connection response.
SMTP.verify(address) # Check the validity of an address on this server using SMTP VRFY
SMTP.login(user="full_user_mail", password="user_password") # Log-in on an SMTP server that requires authentication
# exceptions login() may raise (defined at module level in smtplib)
smtplib.SMTPHeloError # The server didn't reply properly to the HELO greeting
smtplib.SMTPAuthenticationError # The server didn't accept the username/password combination.
smtplib.SMTPNotSupportedError # The AUTH command is not supported by the server.
smtplib.SMTPException # No suitable authentication method was found.
SMTP.starttls(context=None) # Put the SMTP connection in TLS (Transport Layer Security) mode. All SMTP commands that follow will be encrypted
# the keyfile/certfile parameters were deprecated and removed in Python 3.12; pass an ssl.SSLContext as context instead
# from_addr & to_addrs are used to construct the message envelope used by the transport agents. sendmail does not modify the message headers in any way.
# msg may be a string containing characters in the ASCII range, or a byte string. A string is encoded to bytes using the ascii codec, and lone \r and \n characters are converted to \r\n characters. A byte string is not modified.
SMTP.sendmail(from_addr, to_addrs, msg, **kwargs)
# from_addr: {string} -- RFC 822 from-address string
# to_addrs: {string, list of strings} -- list of RFC 822 to-address strings
# msg: {string, bytes} -- message
# This is a convenience method for calling sendmail() with the message represented by an email.message.Message object.
SMTP.send_message(msg, from_addr=None, to_addrs=None, **kwargs)
# from_addr: {string} -- RFC 822 from-address string (default: taken from the msg headers)
# to_addrs: {string, list of strings} -- list of RFC 822 to-address strings (default: taken from the msg headers)
# msg: {email.message.Message object} -- message object
SMTP.quit() # Terminate the SMTP session and close the connection. Return the result of the SMTP QUIT command
```
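A minimal sketch of the usual STARTTLS flow, assuming a submission server on port 587 (the common port for STARTTLS submission); the `build_message` and `send` helpers are hypothetical, and host, credentials and addresses are placeholders the caller supplies. Only message construction runs offline; `send()` needs a real server.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Assemble a plain-text message; headers are set like dict items."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send(msg: EmailMessage, host: str, port: int, user: str, password: str) -> None:
    """Send msg over a connection upgraded with STARTTLS (hypothetical helper)."""
    context = ssl.create_default_context()  # verifies the server certificate
    with smtplib.SMTP(host, port) as server:  # constructor calls connect(); QUIT is sent on exit
        server.starttls(context=context)      # everything after this is encrypted
        server.login(user, password)
        server.send_message(msg)              # envelope addresses taken from From/To headers


# send(build_message("me@example.com", "you@example.com", "Hi", "Hello"),
#      "smtp.example.com", 587, "me@example.com", "app-password")  # placeholder values
```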

# Socket Module
## Definition
A network socket is an internal endpoint for sending or receiving data within a node on a computer network.
In practice, socket usually refers to a socket in an Internet Protocol (IP) network, in particular for the **Transmission Control Protocol (TCP)**, which is a protocol for *one-to-one* connections.
In this context, sockets are assumed to be associated with a specific socket address, namely the **IP address** and a **port number** for the local node, and there is a corresponding socket address at the foreign node (other node), which itself has an associated socket, used by the foreign process. Associating a socket with a socket address is called *binding*.
## Socket Creation & Connection
```python
import socket
# TCP socket over IPv4 (AF_INET), data is a stream of bytes (SOCK_STREAM)
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("hostname", port)) # connect to (host, port): host name or IP address, not a URL; port is an int
sock.close() # close connection
```
## Making HTTP Requests
```python
import socket
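sock = socket.create_connection(("hostname", 80)) # open a TCP connection to the web server
# request line + headers, each ending in \r\n, then an empty line; encode str -> bytes (UTF-8)
request = "GET /resource HTTP/1.1\r\nHost: hostname\r\nConnection: close\r\n\r\n".encode()
sock.sendall(request) # make HTTP request
data = sock.recv(buffer_size) # receive up to buffer_size bytes from the socket
decoded = data.decode() # decode data (bytes -> str, UTF-8)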
```
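A self-contained sketch of the request/response exchange, assuming a throwaway local server on `127.0.0.1` (run in a thread) instead of a real web server; the helper names and the fixed `hi` response are hypothetical. Unlike a single `recv()`, the client keeps reading until the server closes the connection, because one `recv()` call may return only part of the response.

```python
import socket
import threading


def serve_once(server: socket.socket) -> None:
    """Accept one client, read its request, send a fixed HTTP response, close."""
    conn, _addr = server.accept()
    with conn:
        conn.recv(1024)  # read (and ignore) the small request
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nhi")


def fetch(host: str, port: int, path: str = "/") -> bytes:
    """Send a GET request and read until the server closes the connection."""
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode()
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # b"" means the peer closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


def demo() -> bytes:
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port
    server.listen(1)
    port = server.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()
    try:
        return fetch("127.0.0.1", port)
    finally:
        worker.join()
        server.close()
```

`demo()` returns the raw response bytes, status line and headers included; `.decode()` turns them into text.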

# sqlite3 Module
## Connecting To The Database
To use the module, you must first create a Connection object that represents the database.
```python
import sqlite3
connection = sqlite3.connect("file.db")
```
Once you have a `Connection`, you can create a `Cursor` object and call its `execute()` method to perform SQL commands.
```python
cursor = connection.cursor()
cursor.execute(sql)
cursor.executemany(sql, seq_of_parameters) # Executes an SQL command against all parameter sequences or mappings found in the sequence seq_of_parameters.
cursor.close() # close the cursor now
# after close(), any operation attempted with the cursor raises ProgrammingError
```
Once committed, the data saved is persistent and is available in subsequent sessions.
### Query Construction
Usually your SQL operations will need to use values from Python variables.
You shouldn't assemble your query using Python's string operations because doing so is insecure:
it makes your program vulnerable to an [SQL injection attack](https://en.wikipedia.org/wiki/SQL_injection).
Put `?` as a placeholder wherever you want to use a value, and then provide a _tuple of values_ as the second argument to the cursor's `execute()` method.
```python
# Never do this -- insecure! (the value is pasted into the SQL text; c is a cursor)
c.execute(f"SELECT * FROM stocks WHERE symbol = '{value}'")
# Do this instead
t = ('RHAT',)
c.execute('SELECT * FROM stocks WHERE symbol=?', t)
print(c.fetchone())
# Larger example that inserts many records at a time
purchases = [('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
('2006-04-05', 'BUY', 'MSFT', 1000, 72.00),
('2006-04-06', 'SELL', 'IBM', 500, 53.00),
]
c.executemany('INSERT INTO stocks VALUES (?,?,?,?,?)', purchases)
```
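A runnable version of the placeholder examples, assuming a throwaway `:memory:` database and a `stocks` schema inferred from the five-column `purchases` tuples (the original does not show the `CREATE TABLE`). It also shows named placeholders, which take a mapping instead of a tuple.

```python
import sqlite3


def demo_stocks() -> tuple[list, list]:
    connection = sqlite3.connect(":memory:")  # throwaway in-memory database
    cursor = connection.cursor()
    cursor.execute("CREATE TABLE stocks (date TEXT, trans TEXT, symbol TEXT, qty REAL, price REAL)")
    purchases = [
        ("2006-03-28", "BUY", "IBM", 1000, 45.00),
        ("2006-04-05", "BUY", "MSFT", 1000, 72.00),
        ("2006-04-06", "SELL", "IBM", 500, 53.00),
    ]
    cursor.executemany("INSERT INTO stocks VALUES (?, ?, ?, ?, ?)", purchases)
    connection.commit()

    # A hostile-looking value is bound as plain data, so it matches no rows
    # instead of rewriting the WHERE clause
    cursor.execute("SELECT * FROM stocks WHERE symbol = ?", ("IBM' OR '1'='1",))
    leaked = cursor.fetchall()

    # Named placeholders take a mapping instead of a tuple
    cursor.execute("SELECT date, qty FROM stocks WHERE symbol = :sym ORDER BY date", {"sym": "IBM"})
    rows = cursor.fetchall()
    connection.close()
    return rows, leaked
```

`demo_stocks()` should return the two IBM rows and an empty `leaked` list.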
### Writing Operations to Disk
```python
cursor = connection.cursor()
cursor.execute("SQL")
connection.commit()
```
### Multiple SQL Instructions
```python
connection = sqlite3.connect("file.db")
cur = connection.cursor()
cur.executescript("""
QUERY_1;
QUERY_2;
...
QUERY_N;
""")
connection.close()
```
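A sketch combining `executescript()` with the connection used as a context manager, assuming an in-memory database and a hypothetical `items` table. `with connection:` commits if the block succeeds and rolls back if it raises; it does not close the connection.

```python
import sqlite3


def demo_script() -> int:
    connection = sqlite3.connect(":memory:")
    # executescript() runs several statements in one call
    connection.executescript("""
        CREATE TABLE items (name TEXT NOT NULL);
        INSERT INTO items VALUES ('a');
        INSERT INTO items VALUES ('b');
    """)
    try:
        with connection:  # commit on success, roll back on exception
            connection.execute("INSERT INTO items VALUES ('c')")
            connection.execute("INSERT INTO items VALUES (NULL)")  # violates NOT NULL
    except sqlite3.IntegrityError:
        pass
    (count,) = connection.execute("SELECT COUNT(*) FROM items").fetchone()
    connection.close()
    return count  # 2: the 'c' insert was rolled back along with the failing one
```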
### Retrieving Records
```python
# Fetches the next row of a query result set, returning a single sequence.
# Returns None when no more data is available.
cursor.fetchone()
# Fetches all (remaining) rows of a query result, returning a list.
# An empty list is returned when no rows are available.
cursor.fetchall()
# Fetches the next set of rows of a query result, returning a list.
# An empty list is returned when no more rows are available.
cursor.fetchmany(size=cursor.arraysize)
```
The number of rows to fetch per call is specified by the `size` parameter. If it is not given, the cursor's `arraysize` determines the number of rows to be fetched.
The method should try to fetch as many rows as indicated by the size parameter.
If this is not possible due to the specified number of rows not being available, fewer rows may be returned.
Note there are performance considerations involved with the size parameter.
For optimal performance, it is usually best to use the arraysize attribute.
If the size parameter is used, then it is best for it to retain the same value from one `fetchmany()` call to the next.
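Following that advice, a sketch that sets `arraysize` once and then calls `fetchmany()` with no argument until it returns an empty list; the `nums` table and batch size are arbitrary.

```python
import sqlite3


def read_in_batches(batch_size: int = 2) -> list[list[tuple]]:
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE nums (n INTEGER)")
    connection.executemany("INSERT INTO nums VALUES (?)", [(i,) for i in range(5)])
    cursor = connection.cursor()
    cursor.arraysize = batch_size  # default size used by fetchmany()
    cursor.execute("SELECT n FROM nums ORDER BY n")
    batches = []
    while True:
        rows = cursor.fetchmany()  # at most arraysize rows; [] when exhausted
        if not rows:
            break
        batches.append(rows)
    connection.close()
    return batches
```

With `batch_size=2` and five rows, the last batch holds the single leftover row.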

# Time & Datetime
## Time
```py
# epoch: the reference point time is measured from, in seconds (on UNIX it is 1970-01-01 00:00:00 UTC)
import time # UNIX time
variable = time.time () # returns the time (in seconds) elapsed since 01-01-1970
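variable = time.ctime(epochseconds) # convert seconds since the epoch into a readable local-time string
var = time.perf_counter() # returns a high-resolution counter value in seconds (only differences between calls are meaningful)
# execution time = end time - start time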
```
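A minimal timing helper built on `perf_counter()`, subtracting the start value from the end value; `timed` is a hypothetical name and `sum(range(1_000_000))` is just a convenient workload.

```python
import time


def timed(func, *args):
    """Call func(*args) once; return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start  # end time - start time
    return result, elapsed


result, seconds = timed(sum, range(1_000_000))
print(f"sum={result} took {seconds:.4f} s")
print(time.ctime(0))  # the epoch as local time; the exact text depends on the time zone
```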
### time.strftime() format
| Format | Data |
|--------|------------------------------------------------------------------------------------------------------------|
| `%a` | Locale's abbreviated weekday name. |
| `%A` | Locale's full weekday name. |
| `%b` | Locale's abbreviated month name. |
| `%B` | Locale's full month name. |
| `%c` | Locale's appropriate date and time representation. |
| `%d` | Day of the month as a decimal number `[01,31]`. |
| `%H` | Hour (24-hour clock) as a decimal number `[00,23]`. |
| `%I` | Hour (12-hour clock) as a decimal number `[01,12]`. |
| `%j` | Day of the year as a decimal number `[001,366]`. |
| `%m` | Month as a decimal number `[01,12]`. |
| `%M` | Minute as a decimal number `[00,59]`. |
| `%p` | Locale's equivalent of either AM or PM. |
| `%S` | Second as a decimal number `[00,61]`. |
| `%U` | Week number of the year (Sunday as the first day of the week) as a decimal number `[00,53]`. |
| `%w` | Weekday as a decimal number `[0(Sunday),6]`. |
| `%W` | Week number of the year (Monday as the first day of the week) as a decimal number `[00,53]`. |
| `%x` | Locale's appropriate date representation. |
| `%X` | Locale's appropriate time representation. |
| `%y` | Year without century as a decimal number `[00,99]`. |
| `%Y` | Year with century as a decimal number. |
| `%z` | Time zone offset indicating a positive or negative time difference from UTC/GMT of the form +HHMM or -HHMM |
| `%Z` | Time zone name (no characters if no time zone exists). |
| `%%` | A literal `%` character. |
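A few of the directives applied to a fixed moment, the epoch as UTC via `time.gmtime(0)`, so the results do not depend on the current date or time zone. Locale-dependent directives such as `%a` are left out because their text varies.

```python
import time

moment = time.gmtime(0)  # struct_time for the epoch, in UTC

iso_like = time.strftime("%Y-%m-%d %H:%M:%S", moment)  # '1970-01-01 00:00:00'
day_of_year = time.strftime("%j", moment)              # '001'
short_year = time.strftime("%y", moment)               # '70'
literal = time.strftime("100%%", moment)               # '100%'
```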
## Datetime
```py
import datetime
today = datetime.date.today() # returns current date
today = datetime.datetime.today() # returns the current date and time
# formatting example
print('Current Date: {}-{}-{}'.format(today.day, today.month, today.year))
print('Current Time: {}:{}:{}'.format(today.hour, today.minute, today.second))
var_1 = datetime.date(year, month, day) # create date object
var_2 = datetime.time(hour, minute, second, microsecond) # create time object
dt = datetime.datetime.combine(var_1, var_2) # combine date and time objects into one datetime object
date_1 = datetime.date(year, month, day) # year, month, day are ints
date_2 = date_1.replace(year=new_year) # returns a new date; date objects are immutable
#DATETIME ARITHMETIC
date_1 - date_2 # -> datetime.timedelta (num_of_days)
datetime.timedelta # duration expressing the difference between two date, time or datetime objects
```
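A sketch of the arithmetic with concrete values; the dates are arbitrary, with a start in late February 2024 to show that `timedelta` handles the leap day.

```python
import datetime

start = datetime.date(2024, 2, 27)
end = start + datetime.timedelta(days=3)  # 2024 is a leap year: 2024-03-01
gap = end - start                         # datetime.timedelta(days=3)

meeting = datetime.datetime.combine(start, datetime.time(9, 30))  # 2024-02-27 09:30
moved = meeting.replace(year=2025)                                # 2025-02-27 09:30
```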

# Unittest Module
```py
import unittest
import module_under_test
class Test(unittest.TestCase):
    def test_1(self):
        self.assertEqual(output, expected_output) # or any other assert* method listed below
if __name__ == '__main__':
    unittest.main()
```
## TestCase Class
Instances of the `TestCase` class represent the logical test units in the unittest universe. This class is intended to be used as a base class, with specific tests being implemented by concrete subclasses. This class implements the interface needed by the test runner to allow it to drive the tests, and methods that the test code can use to check for and report various kinds of failure.
### Assert Methods
| Method | Checks that |
|-----------------------------|------------------------|
| `assertEqual(a, b)` | `a == b` |
| `assertNotEqual(a, b)` | `a != b` |
| `assertTrue(x)` | `bool(x) is True` |
| `assertFalse(x)` | `bool(x) is False` |
| `assertIs(a, b)` | `a is b` |
| `assertIsNot(a, b)` | `a is not b` |
| `assertIsNone(x)` | `x is None` |
| `assertIsNotNone(x)` | `x is not None` |
| `assertIn(a, b)` | `a in b` |
| `assertNotIn(a, b)` | `a not in b` |
| `assertIsInstance(a, b)` | `isinstance(a, b)` |
| `assertNotIsInstance(a, b)` | `not isinstance(a, b)` |

| Method | Checks that |
|-------------------------------------------------|---------------------------------------------------------------------|
| `assertRaises(exc, fun, *args, **kwds)` | `fun(*args, **kwds)` raises *exc* |
| `assertRaisesRegex(exc, r, fun, *args, **kwds)` | `fun(*args, **kwds)` raises *exc* and the message matches regex `r` |
| `assertWarns(warn, fun, *args, **kwds)` | `fun(*args, **kwds)` raises warn |
| `assertWarnsRegex(warn, r, fun, *args, **kwds)` | `fun(*args, **kwds)` raises warn and the message matches regex *r* |
| `assertLogs(logger, level)` | The with block logs on logger with minimum level |

| Method | Checks that |
|------------------------------|-------------------------------------------------------------------------------|
| `assertAlmostEqual(a, b)` | `round(a-b, 7) == 0` |
| `assertNotAlmostEqual(a, b)` | `round(a-b, 7) != 0` |
| `assertGreater(a, b)` | `a > b` |
| `assertGreaterEqual(a, b)` | `a >= b` |
| `assertLess(a, b)` | `a < b` |
| `assertLessEqual(a, b)` | `a <= b` |
| `assertRegex(s, r)` | `r.search(s)` |
| `assertNotRegex(s, r)` | `not r.search(s)` |
| `assertCountEqual(a, b)` | a and b have the same elements in the same number, regardless of their order. |

| Method | Used to compare |
|------------------------------|--------------------|
| `assertMultiLineEqual(a, b)` | strings |
| `assertSequenceEqual(a, b)` | sequences |
| `assertListEqual(a, b)` | lists |
| `assertTupleEqual(a, b)` | tuples |
| `assertSetEqual(a, b)` | sets or frozensets |
| `assertDictEqual(a, b)` | dicts |
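A small test case exercising a few of the assert methods above, including `assertRaises` and `assertLogs` as context managers; `divide` is a hypothetical function under test, and the suite is run programmatically with output sent to an in-memory stream instead of calling `unittest.main()`.

```python
import io
import logging
import unittest


def divide(a: float, b: float) -> float:
    """Hypothetical function under test."""
    if b == 0:
        logging.getLogger("calc").warning("division by zero attempted")
        raise ZeroDivisionError("b must be non-zero")
    return a / b


class DivideTest(unittest.TestCase):
    def test_result(self):
        self.assertEqual(divide(6, 3), 2)
        self.assertAlmostEqual(divide(1, 3), 0.3333333)  # round(a - b, 7) == 0

    def test_zero_logs_and_raises(self):
        with self.assertLogs("calc", level="WARNING") as captured:
            with self.assertRaises(ZeroDivisionError):
                divide(1, 0)
        self.assertIn("division by zero", captured.output[0])


def run_suite() -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DivideTest)
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```

`run_suite().wasSuccessful()` should be `True`, with two tests run.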
