Regex
- Chinese
\u4E00-\u9FA5
======================================================================================
Python
useful pages
useful packages
- tqdm
for progress bars
- pyahocorasick
for efficient string matching
tips
- set instance pointer
a = b = set()
a and b refer to the same set object, so a change made through one name shows up through the other
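A minimal sketch of the aliasing pitfall above, using only built-ins; the variable names are illustrative.

```python
# Chained assignment binds both names to ONE set object.
a = b = set()
a.add(1)
same_object = a is b          # True: b sees the element added through a

# Separate constructor calls give two independent sets.
c, d = set(), set()
c.add(1)
independent = c is not d      # True: d is still empty
```

Creating each set with its own `set()` call is the usual way to avoid the shared reference.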
- random.shuffle(lst)
shuffles in place and returns None, so print the original list afterwards, not the return value
- removing items while looping over a list: the list is modified during iteration, so elements can be skipped; pay attention to the live count of elements in the list
- better to use a list comprehension:
l = [n for n in l if n != 1]
to remove the elements equal to 1
- changing the starting index of enumerate:
enumerate(something, 1)
will start indexing from 1
- tuple/list comparison: tuples and lists are compared lexicographically.
- e.g.
(1, 2) < (2, 1)  # True, as the first tuple comes before the second one
- e.g.
[2, 3] < [1, 8]  # False
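The list tips above can be checked with a short sketch; the sample values are made up for illustration.

```python
import random

items = [1, 2, 1, 3]
result = random.shuffle(items)            # shuffles in place; result is None

# Filter with a comprehension instead of removing while iterating.
filtered = [n for n in items if n != 1]   # drops every 1, keeps the rest

numbered = list(enumerate(["a", "b"], 1)) # indexing starts at 1

tuple_cmp = (1, 2) < (2, 1)               # True: 1 < 2 decides at position 0
list_cmp = [2, 3] < [1, 8]                # False: 2 > 1 decides at position 0
```

Note that writing `(1, 2) < (2, 1) == True` in real code would be parsed as a chained comparison, so the result is better checked on its own.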
- string split
str.split(sep, maxsplit)
and
str.rsplit(sep, maxsplit)
- e.g.
'a,b,c'.split(',', 1) -> ['a', 'b,c']
vs
'a,b,c'.rsplit(',', 1) -> ['a,b', 'c']
the r means reverse, so the string is split starting from its end
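- paths
- diff between
sys.path.append(os.path.abspath('path'))
and
os.chdir('path')
: the first adds a directory to the module search path used by import; the second changes the current working directory of the process
- when calling
os.chdir
inside a func, what happens to the current directory once that func has been called? The working directory is process-wide, so it stays changed after the func returns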
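A small sketch of what happens to the working directory when a function calls `os.chdir`; the function name `enter` is hypothetical, and a temporary directory is used so nothing real is touched.

```python
import os
import tempfile


def enter(path):
    """Change directory inside a function (the effect is process-wide)."""
    os.chdir(path)


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    enter(tmp)
    # Still inside tmp even though enter() has already returned.
    changed = os.path.samefile(os.getcwd(), tmp)
    os.chdir(start)  # restore before the temporary directory is removed

restored = os.getcwd() == start
```

Saving the starting directory and restoring it afterwards, as done here, is one way to keep such a function from surprising its callers.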
- any()
returns True if at least one element of an iterable is truthy, and False if all elements are falsy or if the iterable is empty
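A quick check of the `any()` rules above. The `all()` line is an addition for contrast and is not taken from the notes.

```python
has_truthy = any([0, "", 3])      # True: 3 is truthy
all_falsy = any([0, "", None])    # False: nothing truthy
empty_any = any([])               # False: empty iterable

# Added for contrast (assumption: the note meant to compare with all()).
empty_all = all([])               # True: no element is falsy
```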
- sys.argv
- if multiple consecutive arguments are all of the same type, can use
sys.argv[n:]
to collect them as a list
- check if file or dir exists
- print to stdout or file if file specified
print(out, file=fout)
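The two bullets above (existence checks and printing to stdout or a file) might be sketched as follows. The helper name `emit` and the file name `out.txt` are hypothetical, and the demo runs in a temporary directory.

```python
import os
import sys
import tempfile


def emit(text, path=None):
    """Print to stdout, or to the file at `path` when one is given."""
    if path is None:
        print(text, file=sys.stdout)
        return
    with open(path, "w", encoding="utf-8") as fout:
        print(text, file=fout)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "out.txt")
    existed_before = os.path.exists(target)  # False: nothing written yet
    emit("hello", target)
    is_file = os.path.isfile(target)         # True: a regular file now
    is_dir = os.path.isdir(tmp)              # True: tmp is a directory
    with open(target, encoding="utf-8") as fin:
        written = fin.read()

emit("hello")  # no path given, so this goes to stdout
```

`os.path.exists` answers for files and directories alike, while `os.path.isfile` and `os.path.isdir` tell the two apart.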
- list.sort()
sorts in place, while
sorted()
returns a new sorted list
- os.system()
runs a command in a subshell
- don't overuse
write()
- du -h -d1
and
du -sh
: check storage usage
- use
set()
more to save time, e.g. for membership tests
- argparse.ArgumentParser.add_subparsers(); subparsers.add_parser(FUNCTIONALITY_NAME)
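A minimal sketch of the `add_subparsers` pattern named above. The program name `tool`, the sub-commands `train` and `evaluate`, and their options are made up for illustration.

```python
import argparse

parser = argparse.ArgumentParser(prog="tool")
subparsers = parser.add_subparsers(dest="command")

# One sub-parser per functionality, each with its own arguments.
train = subparsers.add_parser("train")
train.add_argument("--epochs", type=int, default=1)

evaluate = subparsers.add_parser("evaluate")
evaluate.add_argument("path")

# Passing a list instead of reading sys.argv keeps the demo self-contained.
args = parser.parse_args(["train", "--epochs", "3"])
```

Here `args.command` records which sub-command was chosen, so the caller can dispatch on it.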
- re.match(pat, str) only matches at the beginning of the string; re.search(pat, str) scans the whole string for the first match
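The difference between the two calls shows up on a string where the pattern is not at the start; the sample text is illustrative.

```python
import re

text = "say hello"

at_start = re.match(r"hello", text)    # None: "hello" is not at position 0
anywhere = re.search(r"hello", text)   # match object starting at index 4
position = anywhere.start()
```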
- import from module: the ugly way is to append the project path to
sys.path
:
import os, sys; path_to_add = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))); sys.path.append(path_to_add)
; the better way is to use absolute imports and run the module as a module:
python -m abso_path_to_the_module
(a dotted module path such as pkg.sub.module, without the .py suffix)
- mock command line arguments for testing
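One possible way to mock command-line arguments in a test is to patch `sys.argv` with `unittest.mock`. The `main` function and the argument values below are hypothetical.

```python
import sys
from unittest import mock


def main():
    """Return every argument after the program name."""
    return sys.argv[1:]


original_argv = list(sys.argv)

# Inside the block, sys.argv is replaced; it is restored on exit.
with mock.patch.object(sys, "argv", ["prog", "a.txt", "b.txt"]):
    collected = main()

argv_restored = sys.argv == original_argv
```

If the program uses argparse, passing an explicit list to `parse_args` is another option that avoids patching altogether.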
======================================================================================
Conda
======================================================================================
Shell (Bash)
Stupid frequent errors
- when running a shell script that contains paths, check them carefully (especially when paths are redirected)!
- if stuck, check line by line instead of running the whole script!
useful pages
tips
- frequent errors (that I usually make!):
- extra
|
before
<
, especially at the last line before
<
- unlike Python, a chained assignment like
a=b=$1
does not set both variables (a just gets the literal string b=<value>); instead, split it as
a=$1; b=$1
- confusing
$ bash ${file}
with
$ python ${file}
; double check what type of file I'm running
- rename file/dir etc
- rename a file:
mv ${oldname} ${newname}
- mkdir
:
mkdir -p ${dir_name}
will not complain with
mkdir: ${dir_name}: File exists
if the directory already exists
- bc
: a comparison prints
0
or
1
, e.g.
echo "5 > 4" | bc
prints
1
- sed
text replace:
sed -i -e 's/few/asd/g' hello.txt
replaces few with asd in all lines of the file hello.txt
- file <file>
: check file type, else empty
- awk -F "\t" '{if ($4=="something"){print $0}}' out.anonymous.all.txt | wc -l
: count the lines that have something in the 4th column; use awk to catch lines that meet a condition
- combine cut, awk and sed to slice/extract text, often used in text preprocessing (btw Shell is quite ideal for dealing with text), e.g.
cut -f ${column_num} ${oldfile} | sed '/^#/ d' > ${newfile}
cut extracts the information in the ${column_num}th column; then sed deletes (d) all the lines beginning (^) with a #, and the resulting text is saved to ${newfile}.
- using awk to deal with blocks of data:
awk 'BEGIN{b=0} { if ($1!="========") {printf ("%d\t%s\n", b, $1);} else {b+=1}}' slotsbycluster | sort | uniq -c | sort -k2,2n -k1,1nr
- cut using a custom delimiter (e.g. a comma)
e.g.
cut -d<delimiter> -f<col> <file>
- keep unique items only
e.g.
sort | uniq
- replace commas with tabs
e.g.
tr ',' '\t' < <in.file> > <out.file>
- remove trailing blanks (spaces and tabs):
sed 's/[[:blank:]]*$//' <file>
- for search, can use awk, sed or, more generally, grep
e.g.
awk '/<something>/' ${file}
sed -n '/<something>/p' ${file}
grep 'something' ${file}
- use sed to delete a specific line from a file:
sed '1d' ${file}
removes the first line from the file
- use awk to extract structured information
BEGIN { FS="\n"; RS=""; OFS=", "; ORS="\n\n" } { print $1, $2, $3 }
Save the above as awk.awk and run
awk -f awk.awk ${file}
to print the first three fields of each record, where FS="\n" means each field appears on its own line and RS="" means records are separated by blank lines; the fields are delimited by OFS=", " and each output record ends with ORS="\n\n".
- combine grep and sort to search for repeated lines with their line numbers
grep -nFx "$(sort ${file} | uniq -d)" ${file}
The inner
$(sort ${file} | uniq -d)
lists each line that occurs more than once. The outer
grep -nFx
searches ${file} again for exact whole-line (-x) matches to any of these lines, treated as fixed strings (-F), and prepends their line numbers (-n).
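For comparison, a rough Python equivalent of finding repeated lines together with their line numbers. This is a sketch, not the shell command itself; `repeated_lines` and the sample data are hypothetical.

```python
from collections import Counter


def repeated_lines(lines):
    """Return (line_number, line) for every line that occurs more than once."""
    counts = Counter(lines)
    return [(i, line) for i, line in enumerate(lines, 1) if counts[line] > 1]


sample = ["a", "b", "a", "c", "b"]
dupes = repeated_lines(sample)
# dupes == [(1, "a"), (2, "b"), (3, "a"), (5, "b")]
```

Line numbers start at 1 to mirror what `grep -n` prints.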
- list non-directory files
ls -p | grep -v /
(not a backslash!)
ls -pL | grep -v /
to dereference symbolic links
- About
shuf (gshuf)
- to shuffle text,
shuf text_to_shuffle.txt
- to randomly pick N lines from a file,
gshuf -n N input > output
- About
screen
screen -ls
screen -x <session ID>
- permutation
echo {a,b}{1,2}
will return
a1 a2 b1 b2
- back to the previous path
cd -
- check gpu:
watch -n 1 nvidia-smi
- check a gpu process by PID and kill it:
ps -ef | grep <PID>
;
kill -9 <PID>
======================================================================================
Java
- mvn clean package install
- to skip UTs,
mvn clean install -DskipTests
- ef bb bf
: the UTF-8 BOM (byte-order mark)
- use interface names for input arguments but the exact class names for the return type:
public static ArrayList<String> func(List<Integer> nums) {}
can avoid being hard-coded
- final
: variables declared as final can only be assigned once
- Data types:
List<String> l = Arrays.asList("a", "b")
: creates a fixed-size list whose elements cannot be added/removed but can be modified like this:
l.set(0, "c")
; however, if created like this:
List<String> l = new ArrayList<String>(Arrays.asList("a", "b"))
, elements can be added to and removed from the ArrayList
method signature: the method name and the method parameters; regardless of the return type, methods with the same name but different parameters are considered overloaded methods.
getters and setters: make the getters and setters public while keeping the member variables private.
- the compiler ONLY knows the reference type, so it can only look in the reference type's class for a method; the runtime follows the actual runtime type of the object to find the method, so the compile-time method signature must match a method in the actual object's class; so, when a subclass-specific method is needed, check at runtime whether x instanceof Y
======================================================================================
C/C++
- Do not simply copy compiled files to somewhere else. Instead, run
make clean
and compile again
======================================================================================
Testing
- smoke testing: as long as it doesn’t burn out for the first run
- acceptance testing: tested for acceptability
======================================================================================
Terminal Hacking
- base64
: to encode a string,
base64 <<< string
- Ctrl+C
: to interrupt the running (foreground) job
- fg
: to bring a remaining (background or suspended) job back to the foreground
- htop
: a better looking
top
- hexdump
: a hexadecimal view of computer data; usually used as part of debugging
- md5
: to generate the MD5 checksum of a specified file; NB: doesn’t work on zip files?
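The base64 and md5 entries above can be cross-checked from Python with the standard library. This is a sketch; `md5_of_file` and the file name `demo.bin` are hypothetical. A bash here-string (`<<<`) appends a newline, so the shell output corresponds to encoding `"string\n"`.

```python
import base64
import hashlib
import os
import tempfile

plain = base64.b64encode(b"string").decode("ascii")           # "c3RyaW5n"
with_newline = base64.b64encode(b"string\n").decode("ascii")  # "c3RyaW5nCg=="


def md5_of_file(path, chunk_size=8192):
    """MD5 hex digest of a file, read in binary chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fin:
        for chunk in iter(lambda: fin.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.bin")
    with open(demo, "wb") as fout:
        fout.write(b"hello")
    checksum = md5_of_file(demo)
```

Because the file is read as raw bytes, this sketch does not depend on the file type.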
======================================================================================
Git
- add submodule:
git submodule add <url> <dir>
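- rebase: checkout the commit (ID) or branch you want to rebase, then
git rebase master
- create and switch to a new branch:
git checkout -b <new_branch>
- same, but reset the branch if it already exists:
git checkout -B <new_branch>
- force-delete a branch:
git branch -D <branch_to_delete>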
- with arc:
git submodule update --remote --recursive --init
- resolve conflicts when
git merge
: [git checkout --ours/--theirs](https://nitaym.github.io/ourstheirs/)
======================================================================================
Front-end
- event handler: onmouseover, onclick, onchange, etc.
- slider:
<input type="range" min="10" max="100" value="10" id="sldr" oninput="dosquare()">
function dosquare() {
  var d1 = document.getElementById("d1");
  var sizeinput = document.getElementById("sldr");
  var size = sizeinput.value;
  var ctx = d1.getContext("2d");
  ctx.clearRect(0, 0, d1.width, d1.height);
  ctx.fillStyle = "yellow";
  ctx.fillRect(10, 10, size, size);
}
- change color onclick/onchange:
<input type="color" value="#CC1A57" id="clr" onchange="docolor()">
function docolor() {
  var d1 = document.getElementById("d1");
  var colorinput = document.getElementById("clr");
  var color = colorinput.value;
  d1.style.backgroundColor = color;
}
- load image (multiple is a boolean attribute, so even multiple="false" enables multi-select; leave it out for a single file):
<input type="file" accept="image/*" id="finput" onchange="uploadImage()">
======================================================================================
Docker
- list docker images:
sudo docker images
- start a container from a docker image and enter it:
sudo docker run -i -t <image ID>
- display all docker container IDs:
docker ps -a
; active ones only:
docker ps
- copy files from a server (the host) to a docker container:
docker cp <file-or-dir> <containerID>:<path>
(when writing the path, use /root/ instead of ~/)
- enter a container:
docker exec -it <mycontainer> /bin/bash
- restart a dead/inactive container:
docker start <mycontainer>
SSH
ssh-add
: add ssh keys to keychain
======================================================================================
Json
- format json string:
bejson.com
Markdown
useful pages
tips
======================================================================================
UML
======================================================================================
XML
parse an XML file
from xml.etree import ElementTree as ET
xml = '/Users/shijieyao/Library/Containers/com.taobao.Aliwangwang/Data/Library/Application Support/AliWangwang/80profiles/DefaultEmotions/EmotionConfig.xml'
tree = ET.parse(xml)
root = tree.getroot()
# print the text of the first child of every top-level element
for elem in root:
    print(elem[0].text)
======================================================================================
Linux
- shortcut
- switch between workspaces:
Ctrl+Alt+up/down
or
Super+Page Up/Page Down
- re-size the window
Super+up
(full size)
Super+down
(smaller)
Super+left
(left half)
Super+right
(right half)
- switch between windows:
Super/Alt/Ctrl+tab
- switch between input sources:
Super+space
- show all windows in a workspace:
Super
- ?how to add input source such as Chinese/Japanese?
======================================================================================
Excel
- the first line(s) can be frozen
======================================================================================
Google Sheet
- shortcut
- insert row:
Alt+i+r
======================================================================================
Jupyter Notebook
- measure the cell execution time:
%%time
- listen on all network interfaces (e.g. to reach the notebook from another machine):
jupyter notebook --ip 0.0.0.0
======================================================================================
Atom
- preview
.md
: Ctrl+Shift+m
======================================================================================
Sublime Text
======================================================================================
Good to know (better late than never)
- Active learning: the learning algorithm can figure out what kind of data it needs most and query the users! whoa kewl!
- set locale:
export LC_CTYPE=zh_CN.UTF-8
if Chinese does not show up; for a permanent change, write it to ~/.bashrc
- always add a newline
\n
to the end of a file
- steganography!
- ORM: object-relational mapping
- locale: w/ hexdump
======================================================================================
Mamechishiki 豆知識
Mebibyte (MiB): 1 MiB = 2^20 bytes = 1024 kibibytes = 1,048,576 bytes
1 MB = 1,000,000 (10^6) bytes
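The arithmetic behind the two units can be checked directly; the constant names are illustrative.

```python
KIB = 2 ** 10          # 1 kibibyte = 1024 bytes
MIB = 2 ** 20          # 1 mebibyte = 1,048,576 bytes
MB = 10 ** 6           # 1 megabyte = 1,000,000 bytes

mib_in_kib = MIB // KIB    # 1024 kibibytes per mebibyte
difference = MIB - MB      # 48,576 bytes, so 1 MiB is about 4.9% larger than 1 MB
```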