Regex
- Chinese characters: [\u4E00-\u9FA5]
======================================================================================
Python
useful pages
useful packages
- tqdm for progress bars
- pyahocorasick for efficient string matching
tips
- set instance pointer: after a = b = set(), a and b point to the same object, so changing one changes the other
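A minimal sketch of the aliasing pitfall, with variable names chosen only for illustration: chained assignment binds both names to one set, while two separate set() calls give independent objects.

```python
# Chained assignment binds both names to ONE set object.
a = b = set()
a.add(1)
shared = (a is b, b)          # a and b are the same object

# Separate set() calls create independent objects.
c, d = set(), set()
c.add(1)
independent = (c is d, d)     # c and d are different objects
```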
 
- random.shuffle(lst) shuffles in place and returns None, so don't print its return value; print the original list afterwards
- removing items while looping over a list: the list is modified during iteration, so pay attention to the real-time count of elements (items can be skipped); better to use a list comprehension: l = [n for n in l if n != 1] keeps the elements not equal to 1, i.e. removes every 1
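A small sketch of why removal during iteration goes wrong and how the comprehension avoids it. The function names are hypothetical and the first function is buggy on purpose.

```python
def remove_while_iterating(items, target):
    """Buggy on purpose: mutating the list during iteration skips elements."""
    items = list(items)          # work on a copy of the input
    for n in items:
        if n == target:
            items.remove(n)      # shifts later elements under the iterator
    return items


def remove_with_comprehension(items, target):
    """Build a new list that keeps everything except `target`."""
    return [n for n in items if n != target]


data = [1, 1, 2, 1, 1, 3]
buggy = remove_while_iterating(data, 1)      # still contains some 1s
clean = remove_with_comprehension(data, 1)   # no 1s left
```

If the list must be modified in place, iterating over a copy (for n in items[:]) is another common workaround.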
 
- changing the starting index of enumerate: enumerate(something, 1) will start indexing from 1
- tuple/list comparison: tuples and lists are compared lexicographically, e.g. (1, 2) < (2, 1) is True, as the first element of the first tuple is smaller
- e.g. [2, 3] < [1, 8] is False
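A short sketch of lexicographic comparison, with example values invented for illustration. It also shows that writing `x < y == True` is a chained comparison in Python, which is easy to misread.

```python
# Sequences compare element by element; the first differing pair decides.
first_decides = (1, 2) < (2, 1)      # 1 < 2 settles it
list_case = [2, 3] < [1, 8]          # 2 > 1 settles it
prefix_case = (1, 2) < (1, 2, 0)     # equal so far, the shorter one is smaller

# Pitfall: this means ((1, 2) < (2, 1)) and ((2, 1) == True).
chained = (1, 2) < (2, 1) == True

# Practical use: sorting records by several keys at once.
records = [("b", 2), ("a", 9), ("a", 1)]
ordered = sorted(records)
```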
 
- string split: str.split(sep, maxsplit) and str.rsplit(sep, maxsplit)
- e.g. 'a,b,c'.split(',', 1) -> ['a', 'b,c'] vs 'a,b,c'.rsplit(',', 1) -> ['a,b', 'c']; r means right (reverse), so splitting starts from the end of the string
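One place where the difference matters is splitting a file name on its dots. The file name below is made up for illustration.

```python
filename = "archive.tar.gz"

# rsplit with maxsplit=1 cuts at the LAST separator.
stem, last_ext = filename.rsplit(".", 1)

# split with maxsplit=1 cuts at the FIRST separator.
name, all_exts = filename.split(".", 1)

# Without maxsplit both give the same pieces.
same = filename.split(".") == filename.rsplit(".")
```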
 
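- paths: difference between sys.path.append(os.path.abspath('path')) and os.chdir('path'): the former only adds a directory to the module search path used by import; the latter changes the current working directory of the whole process
- os.chdir inside a function: the change is process-wide, so the current directory stays changed after the function returns unless it is restored explicitly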
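A sketch of what happens to the working directory, using a temporary directory so it is safe to run. The helpers enter_dir and working_directory are hypothetical names, not library functions.

```python
import contextlib
import os
import sys
import tempfile


def enter_dir(path):
    """Hypothetical helper: changes directory and never changes back."""
    os.chdir(path)


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch the working directory, restoring it afterwards."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    real_tmp = os.path.realpath(tmp)

    enter_dir(tmp)
    leaked = os.path.realpath(os.getcwd()) == real_tmp   # change outlives the call
    os.chdir(start)                                      # manual clean-up

    with working_directory(tmp):
        inside = os.path.realpath(os.getcwd()) == real_tmp
    restored = os.getcwd() == start

    # sys.path.append only affects imports; the working directory is untouched.
    sys.path.append(tmp)
    unchanged = os.getcwd() == start
    sys.path.remove(tmp)
```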
 
- any(): True if at least one element of an iterable is truthy
- False if all elements are falsy or if the iterable is empty
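The three cases can be checked directly. The values are arbitrary examples.

```python
one_truthy = any([0, "", 3])        # at least one truthy element
all_falsy = any([0, "", None])      # nothing truthy
empty = any([])                     # empty iterable

# any() also accepts a generator and stops at the first truthy value.
has_negative = any(n < 0 for n in [4, -1, 7])
```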
 
- sys.argv: if multiple consecutive arguments are all of the same type, can use sys.argv[n:]
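A sketch assuming a hypothetical script that takes one output file followed by any number of input files. The function receives the argument list explicitly so it can be tried without a command line.

```python
def split_args(argv):
    """Assumed layout: argv[0] = script, argv[1] = output, argv[2:] = inputs."""
    output = argv[1]
    inputs = argv[2:]            # every remaining argument, possibly none
    return output, inputs


# In a real script this would be split_args(sys.argv).
output, inputs = split_args(["merge.py", "out.txt", "a.txt", "b.txt", "c.txt"])
```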
 
- check if file or dir exists
- print to stdout, or to a file if one is specified: print(out, file=fout)
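A sketch combining the two notes above: checking existence with os.path and printing either to stdout or to a file. The helper emit and the file name out.txt are assumptions made for the example.

```python
import os
import sys
import tempfile


def emit(text, path=None):
    """Write `text` to `path` if given, otherwise to stdout."""
    if path is None:
        print(text, file=sys.stdout)
    else:
        with open(path, "w", encoding="utf-8") as fout:
            print(text, file=fout)


emit("to the terminal")

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "out.txt")
    existed_before = os.path.exists(target)   # nothing written yet
    emit("to a file", target)
    is_file = os.path.isfile(target)          # a regular file now exists
    is_dir = os.path.isdir(tmp)               # the temp dir is a directory
    with open(target, encoding="utf-8") as fin:
        written = fin.read()
```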
 
- list.sort() sorts in place (and returns None), while sorted() returns a new sorted list
- os.system() to run a command in a subshell
- don't overuse write()
- du -h -d1 and du -sh: check storage
- use set() more to save time!
- argparse.ArgumentParser.add_subparsers(); subparsers.add_parser(FUNCTIONALITY_NAME) 
- re.match(pat, str) always matches from the beginning of the string; re.search(pat, str) can match anywhere in it
- import from module: the ugly way is to append the project path to sys.path: import os, sys; path_to_add = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))); sys.path.append(path_to_add); the better way is to use absolute imports and run the module as a module from the project root: python -m <dotted.path.to.module> (a dotted module name, not a file path)
- mock command line arguments for testing
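One possible way to mock command line arguments is to patch sys.argv with unittest.mock. The sketch below pairs that with the add_subparsers pattern noted above. The program name tool, the subcommands train and evaluate, and the --epochs option are all invented for illustration.

```python
import argparse
import sys
from unittest import mock


def build_parser():
    parser = argparse.ArgumentParser(prog="tool")
    subparsers = parser.add_subparsers(dest="command")
    train = subparsers.add_parser("train")
    train.add_argument("--epochs", type=int, default=1)
    subparsers.add_parser("evaluate")
    return parser


def main():
    # parse_args() with no arguments reads sys.argv[1:].
    return build_parser().parse_args()


# Replace sys.argv only for the duration of the with-block.
with mock.patch.object(sys, "argv", ["tool", "train", "--epochs", "3"]):
    mocked = main()

# Alternative that needs no mocking: pass the list explicitly.
direct = build_parser().parse_args(["evaluate"])
```

Passing the argument list explicitly is often simpler in tests. Patching sys.argv is mainly useful when main() cannot be changed to accept a list.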
======================================================================================
Conda
======================================================================================
Shell (Bash)
Stupid frequent errors
- when running a shell script that includes paths, check them carefully (especially when there is path redirection)!
- if stuck, check line by line instead of running the whole script
useful pages
tips
- frequent errors (that I usually make!): an extra | before <, especially on the last line before <
- unlike Python, chained assignments like a=b=$1 do not work (a ends up holding the literal string b=...); instead, split them as a=$1; b=$1
- confusing $ bash ${file} with $ python ${file}: double check what type of file I'm running
 
- rename file/dir etc.: rename a file with mv ${oldname} ${newname}
 
- mkdir: mkdir -p ${dir_name} will not complain with mkdir: ${dir_name}: File exists if the directory already exists
 
- bc: comparisons return 0 or 1, e.g. echo "5 > 4" | bc returns 1
 
- sed text replace: sed -i -e 's/few/asd/g' hello.txt replaces few with asd in all lines of the file hello.txt
 
- file <file>: check the file type (reports empty for an empty file)
- awk -F "\t" '{if ($4=="something"){print $0}}' out.anonymous.all.txt | wc -l: use awk to catch lines that meet a condition (here, count the lines that have something in the 4th column)
- combine cut, awk and sed to slice/extract text, often used in text preprocessing (btw Shell is quite ideal for dealing with text), e.g. cut -f ${column_num} ${oldfile} | sed '/^#/ d' > ${newfile}: cut extracts the ${column_num}th column, then sed deletes all the lines beginning (^) with a #, and the resulting text is saved to ${newfile}.
 
- using awk to deal with blocks of data: awk 'BEGIN{b=0} { if ($1!="========") {printf ("%d\t%s\n", b, $1);} else {b+=1}}' slotsbycluster | sort | uniq -c | sort -k2,2n -k1,1nr
- cut using a custom delimiter such as a comma: cut -d<delimiter> -f<col> <file>
 
- keep unique items only: sort | uniq
 
- replace commas with tabs: tr ',' '\t' < <in.file> > <out.file>
 
- remove trailing blanks (spaces and tabs): sed 's/[[:blank:]]*$//' <file>
- for search, can use awk, sed or more generally grep, e.g. awk '/<something>/' ${file}, sed -n '/<something>/p' ${file}, grep 'something' ${file}
 
- use sed to delete a specific line from a file: sed '1d' ${file} removes the first line from the file
 
- use awk to extract structured information: BEGIN { FS="\n"; RS=""; OFS=", "; ORS="\n\n" } { print $1, $2, $3 }: save this as awk.awk and run awk -f awk.awk ${file} to print the first three fields of each record, where FS="\n" means each field appears on its own line and RS="" means records are separated by blank lines; the fields are delimited by OFS=", " (the commas in print are what insert it) and each record ends with ORS="\n\n".
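For comparison, the same block-wise extraction can be sketched in Python. This is only an approximation of awk's paragraph mode, and the function name, default arguments, and sample text are assumptions made for the example.

```python
import re


def first_fields(text, count=3, ofs=", ", ors="\n\n"):
    """Join the first `count` lines of each blank-line-separated block."""
    # Like RS="": one or more empty lines separate records.
    records = [r for r in re.split(r"\n{2,}", text.strip("\n")) if r]
    out = []
    for record in records:
        # Like FS="\n": each line of the block is one field.
        fields = record.split("\n")
        fields = (fields + [""] * count)[:count]   # pad short records, as awk would
        out.append(ofs.join(fields))
    return "".join(item + ors for item in out)


sample = "Alice\n30\nParis\nextra\n\nBob\n25\nOslo\n"
result = first_fields(sample)
```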
 
- combine grep and sort to search for repeated lines with line numbers: grep -nFx "$(sort ${file} | uniq -d)" ${file}: the inner $(sort ${file} | uniq -d) lists each line that occurs more than once; the outer grep -nFx searches ${file} again for exact whole-line (-x) matches to any of these fixed strings (-F) and prepends their line numbers (-n).
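The same idea can be sketched in Python with a Counter: count every line first, then report the lines seen more than once with 1-based line numbers. The function name and sample data are illustrative.

```python
from collections import Counter


def repeated_lines(lines):
    """Return (line_number, line) for every line that occurs more than once."""
    counts = Counter(lines)
    return [(number, line)
            for number, line in enumerate(lines, 1)   # 1-based, like grep -n
            if counts[line] > 1]


lines = ["a", "b", "a", "c", "b", "d"]
duplicates = repeated_lines(lines)
```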
 
- list non-directory files: ls -p | grep -v / (a forward slash, not a backslash!)
- ls -pL | grep -v / to dereference symbolic links
 
- About shuf (gshuf): to shuffle text, shuf text_to_shuffle.txt
- randomly pick N lines from a file: gshuf -n N input > output
 
- About screen: screen -ls lists the sessions
- screen -x <session ID> attaches to a session
 
- permutation (brace expansion): echo {a,b}{1,2} will return a1 a2 b1 b2
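In Python, a comparable result can be produced with itertools.product, which yields the Cartesian product in the same left-to-right order. This is a sketch of the equivalent logic, not of how the shell implements it.

```python
from itertools import product

letters = ["a", "b"]
digits = ["1", "2"]

# Every letter combined with every digit, leftmost group varying slowest.
expanded = ["".join(pair) for pair in product(letters, digits)]
line = " ".join(expanded)
```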
 
- back to the previous path: cd -
 
- check gpu: watch -n 1 nvidia-smi
- check a process by PID and kill it: ps -ef | grep <PID>; kill -9 <PID>
======================================================================================
Java
- mvn clean package install
- to skip UTs: mvn clean install -DskipTests
- ef bb bf: the UTF-8 BOM (byte-order mark)
- use interface names for input arguments but the exact class names as return types, e.g. public static ArrayList<String> func(List<Integer> input) {}, to avoid being hard-coded to one implementation
- final: variables declared as final can only be assigned once
- Data types: List<String> l = Arrays.asList("a", "b") creates a fixed-size list whose elements cannot be added/removed but can be modified like this: l.set(0, "c"); however, if created like this: List<String> l = new ArrayList<String>(Arrays.asList("a", "b")), elements can be added to or removed from the list
 
- method signature: method name and method parameters; regardless of the return type, methods with the same name but different parameters are considered overloaded methods.
- getters and setters: make the getters and setters public while keeping the member variables private.
- the compiler ONLY knows the reference type, so it can only look in the reference type's class for a method; the runtime follows the exact runtime type of the object to find the method, so the compile-time method signature must match an appropriate method in the actual object's class; to use a method that only the runtime type has, check at runtime with x instanceof Y first
======================================================================================
C/C++
- Do not simply copy compiled files to somewhere else. Instead, run make clean and compile again
======================================================================================
Testing
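- smoke testing: a quick first run to check that nothing basic is broken (as long as it doesn't burn out)
- acceptance testing: checks whether the system is acceptable to its users, i.e. meets the agreed requirements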
======================================================================================
Terminal Hacking
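- base64: to encode a string, base64 <<< string
- Ctrl+C: to interrupt the foreground job
- fg: to bring a background or stopped job back to the foreground (jobs lists the remaining jobs)
- htop: a better looking top
- hexdump: a hexadecimal view of computer data; usually as part of debugging
- md5: to generate the MD5 checksum of a specified file; NB: doesn't work on zip files? (it should: the checksum is computed over the raw bytes, whatever the file type)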
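The base64 and md5 notes can be cross-checked from Python's standard library. Two assumptions are made in this sketch: a here-string (<<<) appends a trailing newline to its input, and the in-memory zip archive is a stand-in for a real file.

```python
import base64
import hashlib
import io
import zipfile

# Encoding the bare string vs. the string plus the newline a here-string adds.
encoded_plain = base64.b64encode(b"string").decode("ascii")
encoded_like_shell = base64.b64encode(b"string\n").decode("ascii")


def md5_hex(data):
    """MD5 works on raw bytes, so the file type does not matter."""
    return hashlib.md5(data).hexdigest()


# Build a tiny zip archive in memory and hash its bytes.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hello")
zip_digest = md5_hex(buffer.getvalue())
```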
======================================================================================
Git
- add submodule: git submodule add <url> <dir>
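- rebase: check out the branch (or commit ID) you want to rebase, then git rebase master
- git checkout -b <branch>: create a new branch and switch to it
- git checkout -B <branch>: create the branch, or reset it if it already exists, and switch to it
- git branch -D <branch_to_delete>: force-delete a branch (checkout has no -D option)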
- with arc: git submodule update --remote --recursive --init
- resolve conflicts when git merge: [git checkout --ours/--theirs](https://nitaym.github.io/ourstheirs/)
======================================================================================
Front-end
- event handler: onmouseover, onclick, onchange, etc.
- slider: <input type="range" min="10" max="100" value="10" id="sldr" oninput="dosquare()"> with function dosquare() { var d1 = document.getElementById("d1"); var sizeinput = document.getElementById("sldr"); var size = sizeinput.value; var ctx = d1.getContext("2d"); ctx.clearRect(0,0, d1.width, d1.height); ctx.fillStyle="yellow"; ctx.fillRect(10,10,size,size); }
- change color onclick/onchange: <input type="color" value="#CC1A57" id="clr" onchange="docolor()"> with function docolor() { var d1 = document.getElementById("d1"); var colorinput = document.getElementById("clr"); var color = colorinput.value; d1.style.backgroundColor = color; }
- load image: <input type="file" accept="image/*" id="finput" onchange="uploadImage()"> (leave out the multiple attribute for a single file: it is a boolean attribute, so even multiple="false" enables multiple selection)
======================================================================================
Docker
- list docker images: sudo docker images
- start a container from an image and enter it: sudo docker run -i -t <image ID>
- display all docker container IDs: docker ps -a; active ones only: docker ps
- copy files from a server to a docker container: docker cp <file-or-dir> <containerID>:<path> (when writing the path, use /root/ instead of ~/)
- enter a container: docker exec -it <mycontainer> /bin/bash
- restart a dead/inactive container: docker start <mycontainer>
SSH
- ssh-add: add ssh keys to the ssh-agent (keychain)
======================================================================================
Json
- format json string: bejson.com
Markdown
useful pages
tips
======================================================================================
UML
======================================================================================
XML
- parse an XML file:
  from xml.etree import ElementTree as ET
  xml = '/Users/shijieyao/Library/Containers/com.taobao.Aliwangwang/Data/Library/Application Support/AliWangwang/80profiles/DefaultEmotions/EmotionConfig.xml'
  tree = ET.parse(xml)
  root = tree.getroot()
  for elem in root:
      print(elem[0].text)
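A self-contained variant of the same traversal that parses an XML string instead of a local file. The element names and values are invented for illustration and are not the real structure of EmotionConfig.xml.

```python
from xml.etree import ElementTree as ET

# Hypothetical document: a root element with repeated child records.
XML_TEXT = """<emotions>
  <emotion><name>smile</name><code>:)</code></emotion>
  <emotion><name>cry</name><code>:(</code></emotion>
</emotions>"""

root = ET.fromstring(XML_TEXT)

# elem[0] is the first child of each record, as in the snippet above.
first_children = [elem[0].text for elem in root]

# Looking children up by tag is less fragile than relying on position.
codes = [elem.findtext("code") for elem in root]
```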
======================================================================================
Linux
- shortcut: switch between workspaces: Ctrl+Alt+up/down or Super+Page Up/Page Down
- re-size the window: Super+up (full size)
- Super+down (smaller)
- Super+left (left half)
- Super+right (right half)
 
- switch between windows: Super/Alt/Ctrl+tab
- switch between input sources: Super+space
- show all windows in a workspace: Super
 
- ? how to add an input source such as Chinese/Japanese?
======================================================================================
Excel
- the first line(s) can be frozen
======================================================================================
Google Sheet
- shortcut: insert row: Alt+i+r
 
======================================================================================
Jupyter Notebook
- measure the cell execution time: %%time
- jupyter notebook --ip 0.0.0.0: listen on all network interfaces so the notebook can be reached from other machines
======================================================================================
Atom
- preview .md: Ctrl+Shift+m
======================================================================================
Sublime Text
======================================================================================
Good to know (better late than never)
- Active learning: the learning algorithm can figure out what kind of data they need most and query the users! whoa kewl!
- set locale: export LC_CTYPE=zh_CN.UTF-8 if Chinese does not show up; for a permanent change, write it to ~/.bashrc
- always add a newline \n to the end of a file
- steganography!
- ORM: object-relational mapping
- locale: w/ hexdump
======================================================================================
Mamechishiki 豆知識
- Mebibyte (MiB): 1 MiB = 2^20 bytes = 1024 kibibytes = 1,048,576 bytes; 1 MB = 1,000,000 (10^6) bytes
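The arithmetic can be checked in a few lines. The constant and function names are illustrative only.

```python
KIB = 2 ** 10          # kibibyte, binary prefix
MIB = 2 ** 20          # mebibyte, binary prefix
MB = 10 ** 6           # megabyte, decimal prefix


def bytes_to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / MIB


difference = MIB - MB              # how many bytes larger a MiB is than a MB
one_mb_in_mib = bytes_to_mib(MB)   # slightly less than 1
```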
