Shijie Yao

Regex

Chinese \u4E00-\u9FA5
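
A minimal Python sketch of how this range might be used. The sample string and variable names are made up for illustration, and the range covers the common CJK Unified Ideographs only, not every Chinese character.

```python
import re

# Character class built from the range noted above (common CJK ideographs).
chinese_run = re.compile(r'[\u4e00-\u9fa5]+')

sample = 'regex 正则表达式 in Python 很有用'  # hypothetical sample text
matches = chinese_run.findall(sample)
print(matches)
```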

======================================================================================

Python

useful pages

FlashText: an alternative to regexp

useful packages

tqdm for progress bar
pyahocorasick for efficient string matching

tips

set instance pointer
- a = b = set(): a and b point to the same object, so a change made through one shows up in the other
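
A quick sketch of the aliasing pitfall, with throwaway variable names:

```python
a = b = set()        # one set object, two names
a.add(1)
print(b)             # b sees the change because it is the same object

c, d = set(), set()  # two independent sets
c.add(1)
print(d)             # d is still empty
```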
random.shuffle(some_list) shuffles in place and returns None, so print the list itself afterwards instead of the return value
removing items while looping lists: the list is being modified, pay attention to the realtime count of elements in the list
- better to use a list comprehension: l = [n for n in l if n != 1] keeps the elements unequal to 1, i.e. removes every 1
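
A small sketch of why removing inside the loop goes wrong, next to the list-comprehension fix. The sample list is made up.

```python
nums = [1, 1, 2, 1, 3]

# Buggy: removing while iterating shifts the remaining items left,
# so the element right after each removal is skipped.
buggy = list(nums)
for n in buggy:
    if n == 1:
        buggy.remove(n)
print(buggy)  # a 1 survives: [2, 1, 3]

# Safer: build a new list that keeps only the wanted elements.
clean = [n for n in nums if n != 1]
print(clean)
```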
changing the starting index of enumerate: enumerate(something, 1) will start indexing from 1
tuple/list comparison: tuples and lists are compared lexicographically.
- e.g. (1, 2) < (2, 1) is True, as the first elements already differ and 1 < 2
- e.g. [2, 3] < [1, 8] is False, as 2 > 1
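
The same comparisons as runnable code, plus the practical consequence that sorting a list of tuples orders by the first element and breaks ties with the next one. The records list is hypothetical.

```python
first = (1, 2) < (2, 1)      # first elements differ: 1 < 2
second = [2, 3] < [1, 8]     # first elements differ: 2 > 1
prefix = (1, 2) < (1, 2, 0)  # equal prefix, the shorter sequence sorts first

records = [(2, 'b'), (1, 'z'), (1, 'a')]  # hypothetical data
ordered = sorted(records)
print(first, second, prefix, ordered)
```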
string split
- str.split(sep, maxsplit) and str.rsplit(sep, maxsplit)
- e.g. 'a,b,c'.split(',', 1) -> ['a', 'b,c'] vs 'a,b,c'.rsplit(',', 1) -> ['a,b', 'c']; the r means right, so rsplit starts splitting from the end of the string
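
A short sketch of where maxsplit helps in practice. The file name is a made-up example.

```python
line = 'a,b,c'
head, rest = line.split(',', 1)    # split once from the left
init, last = line.rsplit(',', 1)   # split once from the right

filename = 'archive.tar.gz'        # hypothetical file name
stem, ext = filename.rsplit('.', 1)
print(head, rest, init, last, stem, ext)
```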
paths
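- diff between sys.path.append(os.path.abspath('path')) and os.chdir('path'): the first adds a directory to the import search path; the second changes the current working directory used to resolve relative file paths
- os.chdir inside a func changes the working directory of the whole process, so the new directory stays in effect after the func returns unless it is changed back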
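
One way to keep an os.chdir from leaking out of a function is to restore the old directory when done. The sketch below is an illustration only; working_directory is a hypothetical helper name, not a standard-library function.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):  # hypothetical helper
    """Temporarily switch the process-wide working directory."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)  # always restore, even if the body raises


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp):
        inside_tmp = os.path.samefile(os.getcwd(), tmp)
after = os.getcwd()
print(inside_tmp, after == start)
```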
any()
- True if at least one element of an iterable is True
- False if all elements are false or if an iterable is empty
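
A few throwaway cases showing those rules, including the common pattern of feeding any() a generator expression:

```python
all_falsy = any([0, '', None])   # every element is falsy
one_truthy = any([0, 'x'])       # at least one truthy element
empty = any([])                  # empty iterable

words = ['alpha', 'beta', 'gamma']  # hypothetical data
has_long = any(len(w) > 4 for w in words)
print(all_falsy, one_truthy, empty, has_long)
```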
sys.argv
- if several consecutive arguments are all of the same kind, slice them with sys.argv[n:]
check if file or dir exists
- a very object-oriented approach
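
The note does not name the approach. A plausible reading is pathlib, so the sketch below assumes that; the file name is made up and everything happens in a temporary directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    target = folder / 'notes.txt'   # hypothetical file name
    before = target.exists()        # the file has not been created yet
    target.write_text('hello\n')
    checks = (target.exists(), target.is_file(), folder.is_dir(), target.is_dir())
print(before, checks)
```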
print to stdout or file if file specified
- print(out, file=fout)
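
A minimal sketch of the pattern. print() falls back to sys.stdout when file is None, so one call can serve both cases; emit is a hypothetical helper name and io.StringIO stands in for a real file.

```python
import io


def emit(text, fout=None):
    # file=None makes print() write to sys.stdout
    print(text, file=fout)


emit('to the terminal')

buffer = io.StringIO()          # stand-in for an opened output file
emit('to a file-like object', fout=buffer)
```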
list.sort() sorts in place and returns None, while the built-in sorted() returns a new sorted list
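
A quick check of the in-place behaviour, with random.shuffle included since it follows the same convention:

```python
import random

nums = [3, 1, 2]
new_list = sorted(nums)      # returns a new list; nums is untouched
unchanged = list(nums)

sort_result = nums.sort()    # sorts nums in place and returns None
sorted_in_place = list(nums)

shuffle_result = random.shuffle(nums)  # also in place, also returns None
print(new_list, unchanged, sort_result, sorted_in_place, shuffle_result)
```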
os.system() to run subshell
don’t overuse write()
du -h -d1 & du -sh: check storage
use set() more to save time!
argparse.ArgumentParser.add_subparsers(); subparsers.add_parser(FUNCTIONALITY_NAME)
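
A minimal subcommand sketch. The program name, the train/eval subcommands and their options are all hypothetical, and required=True on add_subparsers assumes Python 3.7 or newer.

```python
import argparse

parser = argparse.ArgumentParser(prog='tool')
subparsers = parser.add_subparsers(dest='command', required=True)

# Each functionality gets its own sub-parser with its own arguments.
train = subparsers.add_parser('train', help='train a model')
train.add_argument('--epochs', type=int, default=1)

evaluate = subparsers.add_parser('eval', help='evaluate a model')
evaluate.add_argument('path')

# Pass an explicit list instead of reading sys.argv, handy for experiments.
args = parser.parse_args(['train', '--epochs', '3'])
print(args.command, args.epochs)
```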
re.match(pat, str) always from the beginning of the string; re.search(pat, str) not necessarily
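
The difference in a few lines, using a made-up string:

```python
import re

text = 'say hello'
at_start = re.match('hello', text)    # None: 'hello' is not at position 0
anywhere = re.search('hello', text)   # found further into the string
prefix = re.match('say', text)        # matches because 'say' starts the string
print(at_start, anywhere.start(), prefix.group())
```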
import from module: the ugly way is to append the project path to sys.path: import os, sys; path_to_add = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))); sys.path.append(path_to_add); the better way is to use absolute imports and run the module from the project root as a module: python -m package.subpackage.module (a dotted module name, not a file path)
mock command line arguments for testing
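
One possible way to do this with the standard library is to patch sys.argv for the duration of a test. This is a sketch under that assumption; main and the argument values are hypothetical.

```python
import sys
from unittest import mock


def main():
    # hypothetical entry point: returns its positional arguments
    return sys.argv[1:]


original = list(sys.argv)
with mock.patch.object(sys, 'argv', ['prog', 'in.txt', 'out.txt']):
    result = main()
restored = sys.argv == original   # the patch is undone on exit
print(result, restored)
```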

======================================================================================

Conda

conda install notebook ipykernel

======================================================================================

Shell (Bash)

Stupid frequent errors

when running a shell script that contains paths, check them carefully (especially when there is path redirection)!
if stuck, check line by line! instead of running the whole script

useful pages

tips

frequent errors (that I usually make!):
- extra | before <, especially at the last line before <
- unlike Python, a=b=$1 does not chain the assignment (it sets a to the literal string b=<value of $1>); split it as a=$1; b=$1
- confusing use of $ bash ${file} and $ python ${file}, double check what type of the file is that I’m running
rename file/dir etc
- rename a file: mv ${oldname} ${newname}
mkdir:
- mkdir -p ${dir_name} does not complain with mkdir: ${dir_name}: File exists if the directory already exists (it also creates missing parent directories)
bc:
- prints 1 for true and 0 for false: e.g. echo "5 > 4" | bc prints 1
sed text replace
- sed -i -e 's/few/asd/g' hello.txt to replace few with asd in all lines in file hello.txt
file <file>: check the file type (reports empty for an empty file)
awk -F "\t" '{if ($4=="something"){print $0}}' out.anonymous.all.txt | wc -l: (count lines that have something at the 4th column) use awk to catch lines that meet the condition
combine cut, awk and sed to slice/extract text, often used in text preprocessing (btw Shell is quite ideal for dealing with text)
- e.g.
```
  cut -f ${column_num} ${oldfile} | \
  sed '/^#/ d' > ${newfile}
```
  cut extracts the ${column_num}th column; sed then deletes every line that begins (^) with a #, and the result is saved to ${newfile}.
using awk to deal with blocks of data: awk 'BEGIN{b=0} { if ($1!="========") {printf ("%d\t%s\n", b, $1);} else {b+=1}}' slotsbycluster | sort | uniq -c | sort -k2,2n -k1,1nr
cut using comma as the delimiter
- e.g.
  cut -d<delimiter> -f<col> <file>
keep unique items only
- e.g.
  sort | uniq
replace with tab
- e.g.
  tr ',' '\t' < <in.file> > <out.file>
remove trailing tabs (and spaces): sed 's/[[:blank:]]*$//' <file>
for search, can use awk, sed or more generally grep
- e.g.
  awk '/<something>/' ${file}
  sed -n '/<something>/p' ${file}
  grep 'something' ${file}
use sed to delete a specific line from a file
- sed '1d' ${file} to remove the first line from the file
use awk to extract structured information
- e.g.
```
  BEGIN {
      FS="\n"
      RS=""
      OFS=", "
      ORS="\n\n"
  }
  { print $1, $2, $3 }
```
  Save the above as awk.awk and run awk -f awk.awk ${file}. It prints the first three fields of each record, where FS="\n" puts each field on its own line and RS="" separates records by blank lines; the commas in print join the fields with OFS=", ", and each output record ends with ORS="\n\n".
combine grep and sort to search for repeated lines with line number
- e.g.
```
  grep -nFx "$(sort ${file} | uniq -d)" ${file}
```
  The inner $(sort ${file} | uniq -d) lists each line that occurs more than once. The outer grep -nFx searches ${file} again for whole-line (-x) matches to any of these fixed strings (-F) and prepends the line number (-n).
list non-directory files
- ls -p | grep -v / (not a backslash!)
- ls -pL | grep -v / to dereference symbolic links
About shuf (gshuf)
- to shuffle text, shuf text_to_shuffle.txt
- randomly pick up N lines from a file, gshuf -n N input > output
About screen
- screen -ls
- screen -x <session ID>
combinations (brace expansion)
- echo {a,b}{1,2} prints a1 a2 b1 b2
back to the previous path
- cd -
check gpu: watch -n 1 nvidia-smi
check gpu by PID and kill the process: ps -ef | grep <PID>; kill -9 <PID>

======================================================================================

Java

mvn clean package install
to skip UTs, mvn clean install -DskipTests
EF BB BF: the UTF-8 BOM (byte-order mark)
use interface types for input parameters but the exact class for the return type: public static ArrayList<String> func(List<Integer> input) {} avoids hard-coding callers to one implementation
final: variables declared as final can only be assigned once
Data types:
- List<String> l = Arrays.asList("a", "b"): creates a fixed-size list; elements cannot be added or removed but can be modified, e.g. l.set(0, "c"); if created as List<String> l = new ArrayList<>(Arrays.asList("a", "b")), elements can be added and removed
method signature: the method name and its parameter list; regardless of the return type, methods with the same name but different parameter lists are overloaded methods.
getters and setters: make the getters and setters public while keeping the member variables private.
the compiler ONLY knows the reference type, so it looks only in the reference type's class for a method; at runtime the object's actual type decides which implementation runs, so the compile-time method signature must match a method in the actual object's class; to call a method that exists only on the subclass, check at runtime with x instanceof Y and cast

======================================================================================

C/C++

Do not simply copy compiled files somewhere else. Instead, run make clean and compile again

======================================================================================

Testing

smoke testing: as long as it doesn’t burn out for the first run
acceptance testing: tested for acceptability

======================================================================================

Terminal Hacking

base64: to encode string, base64 <<< string
Ctrl+C: interrupt (kill) the foreground job
fg: bring a background or suspended job back to the foreground (jobs lists them)
htop: a better looking top
hexdump: a hexadecimal view of computer data; usually as part of debugging
md5: print the MD5 checksum of the specified file; it hashes the raw bytes, so it should work on any file type, zip included

======================================================================================

Git

add submodule: git submodule add <url> <dir>
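rebase: check out the branch you want to rebase, then run git rebase master
git checkout -b <new_branch>: create a new branch and switch to it
git checkout -B <branch>: create the branch, or reset it if it already exists, and switch to it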
git branch -D <branch_to_delete>: force-delete a local branch (git checkout has no -D option)
with arc: git submodule update --remote --recursive --init
resolve conflicts when git merge: [git checkout --ours/--theirs](https://nitaym.github.io/ourstheirs/)

======================================================================================

Front-end

event handler: onmouseover, onclick, onchange, etc.
slider: <input type="range" min="10" max="100" value="10" id="sldr" oninput="dosquare()"> function dosquare() { var d1 = document.getElementById("d1"); var sizeinput = document.getElementById("sldr"); var size = sizeinput.value; var ctx = d1.getContext("2d"); ctx.clearRect(0,0, d1.width, d1.height); ctx.fillStyle="yellow"; ctx.fillRect(10,10,size,size); }
change color onclick/onchange: <input type="color" value="#CC1A57" id="clr" onchange="docolor()"> function docolor() { var d1 = document.getElementById("d1"); var colorinput = document.getElementById("clr"); var color = colorinput.value; d1.style.backgroundColor = color; }
load image: <input type="file" accept="image/*" id="finput" onchange="uploadImage()"> (leave out multiple for single-file selection; it is a boolean attribute, so multiple="false" still enables it)

======================================================================================

Docker

list docker images: sudo docker images
start a new container from an image and enter it: sudo docker run -i -t <image ID>
display all docker container IDs: docker ps -a; active ones only: docker ps
copy files from a server to a docker container: docker cp <file-or-dir> <containerID>:<path> (when writing the path, use /root/ instead of ~/)
enter a container: docker exec -it <mycontainer> /bin/bash
restart a dead/inactive container: docker start <mycontainer>

SSH

ssh-add: add ssh keys to keychain

======================================================================================

Json

format json string: bejson.com

Markdown

useful pages

cheatsheet

tips

======================================================================================

UML

======================================================================================

XML

parse an XML file

  from xml.etree import ElementTree as ET
  xml = '/Users/shijieyao/Library/Containers/com.taobao.Aliwangwang/Data/Library/Application Support/AliWangwang/80profiles/DefaultEmotions/EmotionConfig.xml'
  tree = ET.parse(xml)  
  root = tree.getroot()
	
  for elem in root:
      print(elem[0].text)
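
The snippet above depends on a local file. A self-contained variant of the same idea, parsing from a string, might look like this; the XML content and tag names are invented for illustration.

```python
from xml.etree import ElementTree as ET

# Hypothetical XML with the same shape: each child of the root has a first sub-element.
xml_text = '<root><item><name>a</name></item><item><name>b</name></item></root>'

root = ET.fromstring(xml_text)           # parse from a string instead of a file
names = [elem[0].text for elem in root]  # text of each child's first sub-element
print(names)
```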

======================================================================================

Linux

shortcut
- switch between workspaces: Ctrl+Alt+up/down or Super+Page Up/Page Down
- re-size the window
  - Super+up (full size)
  - Super+down (smaller)
  - Super+left (left half)
  - Super+right (right half)
- switch between windows: Super/Alt/Ctrl+tab
- switch between input sources: Super+space
- show all windows in a workspace: Super
?how to add input source such as Chinese/Japanese?

======================================================================================

Excel

the first line(s) can be frozen

======================================================================================

Google Sheet

shortcut
- insert row: Alt+i+r

======================================================================================

Jupyter Notebook

measure the cell execution time: %%time
jupyter notebook --ip 0.0.0.0

======================================================================================

Atom

preview .md: Ctrl+Shift+m

======================================================================================

Sublime Text

======================================================================================

Good to know (better late than never)

Active learning: the learning algorithm figures out which data it needs most and queries the user for labels! whoa kewl!
set locale: export LC_CTYPE=zh_CN.UTF-8 if Chinese does not show up; for permanent change, write to ~/.bashrc
always add a newline \n to the end of a file
steganography!
ORM: object-relational mapping
locale: w/ hexdump

======================================================================================

Mamechishiki 豆知識

Mebibyte (MiB): 1 MiB = 2²⁰ bytes = 1024 kibibytes = 1,048,576 bytes
1 MB = 1,000,000 (10⁶) bytes
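
The same arithmetic as a small sketch, converting a made-up size of 5 MiB into both units:

```python
MIB = 2 ** 20   # 1 mebibyte in bytes
MB = 10 ** 6    # 1 megabyte in bytes

size_bytes = 5 * MIB          # hypothetical file size
in_mib = size_bytes / MIB
in_mb = size_bytes / MB
print(MIB, in_mib, in_mb)
```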