Regex

  • Chinese \u4E00-\u9FA5
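
A minimal sketch of using this range from Python's re module. The sample string and the name CJK_RUN are made up for illustration. The range covers the common CJK Unified Ideographs only, so full-width punctuation and the rarer extension blocks are not matched.

```python
import re

# Matches runs of characters in the range quoted above (U+4E00..U+9FA5).
CJK_RUN = re.compile(r'[\u4e00-\u9fa5]+')

sample = 'hello 世界, 你好 world'  # illustrative input, not from the notes
found = CJK_RUN.findall(sample)
print(found)  # ['世界', '你好']
```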

======================================================================================

Python

useful pages

useful packages

tips

  • set instance pointer
    • a = b = set(): a and b point to the same object, so a change made through one name shows up in the other
  • random.shuffle(lst) shuffles in place and returns None, so print the list itself afterwards, not the return value
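
Both pitfalls can be seen in a few lines. This is only a sketch; the variable names are arbitrary.

```python
import random

a = b = set()          # one set object bound to two names
a.add(1)
print(b)               # {1}
print(a is b)          # True

c, d = set(), set()    # two independent sets
c.add(1)
print(d)               # set()

items = [1, 2, 3, 4, 5]
result = random.shuffle(items)  # shuffles items in place
print(result)          # None
print(items)           # same elements, new order
```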

  • removing items while looping over a list: the list changes during iteration, so positions shift and items can be skipped; loop over a copy or build a new list instead
  • changing the starting index of enumerate: enumerate(something, 1) will start indexing from 1
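
A small sketch of the removal pitfall and of enumerate with a start index. The sample data is made up.

```python
nums = [1, 2, 2, 3]

# Buggy: removing while iterating shifts the remaining items, so one 2 is skipped.
buggy = list(nums)
for n in buggy:
    if n == 2:
        buggy.remove(n)
print(buggy)   # [1, 2, 3]

# Safer: build a new list instead of mutating the one being looped over.
safe = [n for n in nums if n != 2]
print(safe)    # [1, 3]

# enumerate with a start index of 1.
labels = [f'{i}:{name}' for i, name in enumerate(['a', 'b', 'c'], 1)]
print(labels)  # ['1:a', '2:b', '3:c']
```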

  • tuple/list comparison: tuples and lists are compared lexicographically.
    • e.g. (1, 2) < (2, 1) -> True # the first elements decide: 1 < 2
    • e.g. [2, 3] < [1, 8] -> False
  • string split
    • str.split(sep, maxsplit) and str.rsplit(sep, maxsplit)
    • e.g. 'a,b,c'.split(',', 1) -> ['a', 'b,c'] vs 'a,b,c'.rsplit(',', 1) -> ['a,b', 'c']; r stands for right, so splitting starts from the end of the string
  • paths
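    • difference between sys.path.append(os.path.abspath('path')) and os.chdir('path'): the first adds a directory to the module search path, the second changes the current working directory
    • os.chdir inside a function changes the working directory of the whole process, and the change stays in effect after the function returns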
  • any()
    • True if at least one element of an iterable is True
    • False if all elements are false or if an iterable is empty
  • sys.argv
    • if several consecutive arguments are all of the same kind, take them as a slice: sys.argv[n:]
  • check if file or dir exists
  • print to stdout or file if file specified
    • print(out, file=fout)
  • list.sort() sorts in place and returns None, while the built-in sorted() returns a new sorted list
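
Several of the tips above, checked in one short sketch (comparison, split vs rsplit, any, and sort vs sorted). The values are arbitrary.

```python
# Sequences compare element by element from the left.
print((1, 2) < (2, 1))    # True: 1 < 2 decides at the first position
print([2, 3] < [1, 8])    # False: 2 > 1 decides at the first position

# maxsplit=1: split cuts from the left, rsplit from the right.
left = 'a,b,c'.split(',', 1)
right = 'a,b,c'.rsplit(',', 1)
print(left, right)        # ['a', 'b,c'] ['a,b', 'c']

print(any([0, '', 3]))    # True: at least one truthy element
print(any([]))            # False: empty iterable

data = [3, 1, 2]
new_list = sorted(data)   # new list, data is untouched at this point
ret = data.sort()         # sorts data in place and returns None
print(new_list, ret, data)  # [1, 2, 3] None [1, 2, 3]
```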

  • os.system() to run subshell

  • don’t overuse write()

  • du -h -d1 and du -sh: check disk usage

  • use set() more to save time!

  • argparse.ArgumentParser.add_subparsers(); subparsers.add_parser(FUNCTIONALITY_NAME)
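
A minimal sketch of subcommands with add_subparsers. The program name tool and the subcommands train and eval are hypothetical, as are their options.

```python
import argparse

parser = argparse.ArgumentParser(prog='tool')        # 'tool' is a made-up name
subparsers = parser.add_subparsers(dest='command')   # chosen subcommand lands in args.command

train = subparsers.add_parser('train')               # hypothetical subcommand
train.add_argument('--epochs', type=int, default=1)

evaluate = subparsers.add_parser('eval')             # hypothetical subcommand
evaluate.add_argument('model_path')

# Parse an explicit list instead of the real command line.
args = parser.parse_args(['train', '--epochs', '3'])
print(args.command, args.epochs)  # train 3
```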

  • re.match(pat, str) only matches at the beginning of the string; re.search(pat, str) scans for a match anywhere in it
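
The difference shows up as soon as the pattern does not sit at position 0. A sketch with a made-up string:

```python
import re

text = 'id: 42'

at_start = re.match(r'\d+', text)     # digits are not at the beginning
anywhere = re.search(r'\d+', text)    # scans the whole string

print(at_start)                        # None
print(anywhere.group())                # 42
print(re.match(r'id', text).group())   # id
```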

  • importing from another module in the project: the ugly way is to append the project path to sys.path: import os, sys; path_to_add = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))); sys.path.append(path_to_add). The better way is to use absolute imports (from the package root) and run the file as a module from the project root: python -m package.subpackage.module

  • mock command line arguments for testing
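
One way to do this, sketched with unittest.mock from the standard library. The function main and the argument values are hypothetical stand-ins for a real entry point.

```python
import sys
from unittest import mock


def main():
    # Hypothetical entry point: returns the arguments after the script name.
    return sys.argv[1:]


# Temporarily replace sys.argv; it is restored when the with-block exits.
with mock.patch.object(sys, 'argv', ['prog.py', '--n', '3']):
    patched = main()

print(patched)  # ['--n', '3']
```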

======================================================================================

Conda

======================================================================================

Shell (Bash)

Stupid frequent errors

  • when running a shell script that contains paths, check them carefully (especially when paths are redirected)!
  • if stuck, check line by line! instead of running the whole script

useful pages

tips

  • frequent errors (that I usually make!):
    • extra | before <, especially at the last line before <
    • unlike Python, a=b=$1 does not chain the assignment (it sets a to the literal string b=<value>); split it as a=$1; b=$1
    • mixing up $ bash ${file} and $ python ${file}: double check which type of file I'm running
  • rename file/dir etc
    • rename a file: mv ${oldname} ${newname}
  • mkdir:
    • mkdir -p ${dir_name} does not complain with mkdir: ${dir_name}: File exists if the directory already exists (it also creates missing parent directories)
  • bc:
    • prints 1 (true) or 0 (false) for comparisons: e.g. echo "5 > 4" | bc prints 1
  • sed text replace
    • sed -i -e 's/few/asd/g' hello.txt to replace few with asd in all lines in file hello.txt
  • file <file>: check the file type (an empty file is reported as empty)

  • awk -F "\t" '{if ($4=="something"){print $0}}' out.anonymous.all.txt | wc -l: (count lines that have something at the 4th column) use awk to catch lines that meet the condition

  • combine cut, awk and sed to slice/extract text, often used in text preprocessing (btw Shell is quite ideal for dealing with text)
    • e.g.

        cut -f ${column_num} ${oldfile} | \
        sed '/^#/ d' > ${newfile}
      

      cut extracts the ${column_num}th column; sed then deletes all lines beginning (^) with a # and the result is saved to ${newfile}.

  • using awk to deal with blocks of data: awk 'BEGIN{b=0} { if ($1!="========") {printf ("%d\t%s\n", b, $1);} else {b+=1}}' slotsbycluster | sort | uniq -c | sort -k2,2n -k1,1nr

  • cut using comma as the delimiter
    • e.g.

      cut -d<delimiter> -f<col> <file>

  • keep unique items only
    • e.g.

      sort | uniq

  • replace with tab
    • e.g.

      tr ',' '\t' < <in.file> > <out.file>

  • remove trailing whitespace (spaces and tabs): sed 's/[[:blank:]]*$//' <file>

  • for search, can use awk, sed or more generally grep
    • e.g.

      awk '/<something>/' ${file}

      sed -n '/<something>/p' ${file}

      grep 'something' ${file}

  • use sed to delete a specific line from a file
    • sed '1d' ${file} to remove the first line from the file
  • use awk to extract structured information
    • e.g.

        BEGIN {
            FS="\n"
            RS=""
            OFS=", "
            ORS="\n\n"
        }
        { print $1 $2 $3 }
      

      Save the above as awk.awk and run awk -f awk.awk ${file}. With FS="\n" (each field on its own line) and RS="" (records separated by blank lines), it prints the first three fields of each record concatenated, and each output record ends with ORS="\n\n". Note that OFS=", " only takes effect when the fields are separated by commas, as in print $1, $2, $3.
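
For comparison, the same record and field split can be sketched in Python. The sample text is made up, and the fields are joined with ", " the way OFS would join comma-separated print arguments.

```python
import re

# Made-up input: one field per line, records separated by blank lines.
text = 'Alice\n30\nParis\n\nBob\n25\nRome\n'

# Split into records on runs of blank lines, then into fields on newlines.
records = [block.split('\n') for block in re.split(r'\n{2,}', text.strip())]

# Join the first three fields of each record with ", ".
rows = [', '.join(fields[:3]) for fields in records]
print(rows)  # ['Alice, 30, Paris', 'Bob, 25, Rome']
```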

  • combine grep and sort to search for repeated lines with line number
    • e.g.

        grep -nFx "$(sort ${file} | uniq -d)" ${file}
      

      The inner $(sort ${file} | uniq -d) lists each line that occurs more than once. The outer grep -nFx searches ${file} again for whole-line (-x) matches of any of these lines, treated as fixed strings (-F), and prepends the line number (-n).
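
A rough Python equivalent of the lookup, for cases where the data is already in a script. The function name repeated_lines and the sample list are made up; for a real file the lines could come from open(path).read().splitlines().

```python
from collections import Counter


def repeated_lines(lines):
    """Return (line_number, line) pairs for every line that occurs more than once."""
    counts = Counter(lines)
    return [(i, line) for i, line in enumerate(lines, 1) if counts[line] > 1]


sample = ['a', 'b', 'a', 'c', 'b']
print(repeated_lines(sample))  # [(1, 'a'), (2, 'b'), (3, 'a'), (5, 'b')]
```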

  • list non-directory files
  • About shuf (gshuf)
  • About screen
    • screen -ls
    • screen -x <session ID>
  • combinations via brace expansion
    • echo {a,b}{1,2} prints a1 a2 b1 b2
  • back to the previous path
    • cd -
  • check gpu: watch -n 1 nvidia-smi

  • check gpu by PID and kill the process: ps -ef | grep <PID>; kill -9 <PID>

======================================================================================

Java

  • mvn clean package install

  • to skip UTs, mvn clean install -DskipTests

  • ef bb bf: the UTF-8 BOM (byte-order mark)

  • use interface names for input arguments but the exact class name as the return type: public static ArrayList<String> func(List<Integer> nums) {} accepts any List implementation, so callers are not hard-coded to one class

  • final: variables declared as final can only be assigned once

  • Data types:
    • List<String> l = Arrays.asList("a", "b"): creates a fixed-size list; elements cannot be added or removed but can be modified, e.g. l.set(0, "c"). If it is created as List<String> l = new ArrayList<String>(Arrays.asList("a", "b")), elements can be added and removed
  • method signature: method name and method parameters; regardless of the return type, methods with the same name but different parameters are overloads.

  • getters and setters: make the getters and setters public while keeping the member variables private.

  • the compiler ONLY knows the reference type and only looks in that class for methods, while the runtime follows the actual type of the object to find the method. So the call must match a method signature visible on the reference type at compile time; to use a method that only the subclass has, check at runtime with x instanceof Y and cast

======================================================================================

C/C++

  • Do not simply copy compiled files to somewhere else. Instead, run make clean and compile again

======================================================================================

Testing

  • smoke testing: as long as it doesn’t burn out for the first run
  • acceptance testing: tested for acceptability

======================================================================================

Terminal Hacking

  • base64: to encode string, base64 <<< string
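
A Python counterpart using the standard base64 module. One detail worth checking against the shell output: a bash here-string appends a trailing newline, so base64 <<< string encodes "string\n", not "string". The byte strings below are just the example word from the tip.

```python
import base64

plain = base64.b64encode(b'string').decode('ascii')
with_newline = base64.b64encode(b'string\n').decode('ascii')  # what the here-string feeds in

print(plain)         # c3RyaW5n
print(with_newline)  # c3RyaW5nCg==

decoded = base64.b64decode(with_newline)
print(decoded)       # b'string\n'
```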

  • Ctrl+C: interrupt the running foreground job

  • fg: bring a background or stopped job back to the foreground; jobs lists the ones that are left

  • htop: a better looking top

  • hexdump: a hexadecimal view of computer data; usually as part of debugging

  • md5: print the MD5 checksum of the specified file; it hashes the raw bytes, so it works on any file type, zip files included

======================================================================================

Git

  • add submodule: git submodule add <url> <dir>
  • rebase: check out the branch (or commit ID) you want to rebase, then run git rebase master
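  • git checkout -b <branch>: create a new branch and switch to it
  • git checkout -B <branch>: same, but reset the branch if it already exists
  • git branch -D <branch_to_delete>: force-delete a branch (git checkout has no -D option)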
  • with arc: git submodule update --remote --recursive --init
  • resolve conflicts when git merge: [git checkout --ours/--theirs](https://nitaym.github.io/ourstheirs/)

======================================================================================

Front-end

  • event handler: onmouseover, onclick, onchange, etc.
  • slider: <input type="range" min="10" max="100" value="10" id="sldr" oninput="dosquare()"> function dosquare() { var d1 = document.getElementById("d1"); var sizeinput = document.getElementById("sldr"); var size = sizeinput.value; var ctx = d1.getContext("2d"); ctx.clearRect(0,0, d1.width, d1.height); ctx.fillStyle="yellow"; ctx.fillRect(10,10,size,size); }
  • change color onclick/onchange: <input type="color" value="#CC1A57" id="clr" onchange="docolor()"> function docolor() { var d1 = document.getElementById("d1"); var colorinput = document.getElementById("clr"); var color = colorinput.value; d1.style.backgroundColor = color; }
  • load image: <input type="file" accept="image/*" id="finput" onchange="uploadImage()"> (leave out multiple to allow a single file; as a boolean attribute, multiple="false" still enables multi-select)

======================================================================================

Docker

  • list docker images: sudo docker images
  • start an interactive container from an image: sudo docker run -i -t <image ID>
  • display all docker container IDs: docker ps -a; active ones only: docker ps
  • copy files from a server to a docker container: docker cp <file-or-dir> <containerID>:<path> (when writing the path, use /root/ instead of ~/)
  • enter a container: docker exec -it <mycontainer> /bin/bash
  • restart a dead/inactive container: docker start <mycontainer>

SSH

  • ssh-add: add ssh keys to keychain

======================================================================================

Json

  • format json string: bejson.com

Markdown

useful pages

tips

======================================================================================

UML

======================================================================================

XML

  • parse an XML file

      from xml.etree import ElementTree as ET
      xml = '/Users/shijieyao/Library/Containers/com.taobao.Aliwangwang/Data/Library/Application Support/AliWangwang/80profiles/DefaultEmotions/EmotionConfig.xml'
      tree = ET.parse(xml)  
      root = tree.getroot()
    	
      for elem in root:
          print(elem[0].text)
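
The same loop can be tried without the file by parsing a string. This is only a sketch: the XML below is a made-up stand-in and does not reflect the real structure of EmotionConfig.xml.

```python
from xml.etree import ElementTree as ET

# Made-up document standing in for the config file above.
xml_text = (
    '<emotions>'
    '<item><name>smile</name><code>:)</code></item>'
    '<item><name>cry</name><code>:(</code></item>'
    '</emotions>'
)
root = ET.fromstring(xml_text)

# Same access pattern as above: text of the first child of each element.
first_children = [elem[0].text for elem in root]
print(first_children)  # ['smile', 'cry']

# Looking children up by tag name is less fragile than by position.
codes = [elem.find('code').text for elem in root]
print(codes)           # [':)', ':(']
```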
    

======================================================================================

Linux

  • shortcut
    • switch between workspaces: Ctrl+Alt+up/down or Super+Page Up/Page Down
    • re-size the window
      • Super+up (full size)
      • Super+down (smaller)
      • Super+left (left half)
      • Super+right (right half)
    • switch between windows: Super/Alt/Ctrl+tab
    • switch between input sources: Super+space
    • show all windows in a workspace: Super
  • ?how to add input source such as Chinese/Japanese?

======================================================================================

Excel

  • the first line(s) can be frozen

======================================================================================

Google Sheet

======================================================================================

Jupyter Notebook

  • measure the cell execution time: %%time
  • jupyter notebook --ip 0.0.0.0

======================================================================================

Atom

  • preview .md: Ctrl+Shift+m

======================================================================================

Sublime Text

======================================================================================

Good to know (better late than never)

  • Active learning: the learning algorithm can figure out what kind of data they need most and query the users! whoa kewl!
  • set locale: export LC_CTYPE=zh_CN.UTF-8 if Chinese does not show up; for permanent change, write to ~/.bashrc
  • always add a newline \n to the end of a file
  • steganography!
  • ORM: object-relational mapping
  • locale: w/ hexdump

======================================================================================

Mamechishiki 豆知識

  • Mebibyte (MiB): 1 MiB = 2^20 bytes = 1024 kibibytes = 1,048,576 bytes

    1 MB = 10^6 bytes = 1,000,000 bytes
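
The arithmetic can be checked directly. The constant names are arbitrary.

```python
MIB = 2 ** 20   # bytes in one mebibyte
MB = 10 ** 6    # bytes in one megabyte

print(MIB)                  # 1048576
print(MIB == 1024 * 1024)   # True: 1024 KiB of 1024 bytes each
print(MIB - MB)             # 48576
print(round(MIB / MB, 4))   # 1.0486
```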