diff --git a/cross_species_comparison/README.md b/cross_species_comparison/README.md
index c46dad11f3f43a0d66f574f49b324b41adaff266..85fefdcafa6ae24db28cda8a2c5802e9056439c8 100644
--- a/cross_species_comparison/README.md
+++ b/cross_species_comparison/README.md
@@ -23,30 +23,90 @@ Going to try this method.
 
 To compute pairwise and multiple genome alignments, we used the human hg38 assembly as the reference (Supplementary Fig. 1 shows the entire workflow). We first built pairwise alignments between human and a query species using lastz and axtChain to compute co-linear alignment chains [82]. To align placental mammals, we used previously determined lastz parameters (K = 2400, L = 3000, Y = 9400, H = 2000, and the lastz default scoring matrix) that have a sufficient sensitivity to capture orthologous exons [16]. To align chimpanzee, bonobo, and gorilla, we changed the lastz parameters (K = 4500 and L = 4500).
 
-After building chains, we applied RepeatFiller (RRID:SCR_017414), a method that performs another round of local alignment, considering unaligning regions ≤20 kb in size that are bounded by co-linear alignment blocks up- and downstream. RepeatFiller removes any repeat masking from the unaligned region and is therefore able to detect novel alignments between repetitive regions. We have previously shown that RepeatFiller detects several megabases of aligning repetitive sequences that would be missed otherwise. After RepeatFiller, we applied chainCleaner with parameters -LRfoldThreshold = 2.5 -doPairs -LRfoldThresholdPairs = 10 -maxPairDistance = 10000 -maxSuspectScore = 100000 -minBrokenChainScore = 75000 to improve alignment specificity. Pairwise alignment chains were converted into alignment nets using a modified version of chainNet that computes real scores of partial nets. Nets were filtered using NetFilterNonNested.perl with parameters -doUCSCSynFilter -keepSynNetsWithScore 5000 -keepInvNetsWithScore 5000, which applies the UCSC “syntenic net” score thresholds (minTopScore of 300000 and minSynScore of 200000) and keeps nested nets that align to the same locus (inversions or local translocations; net type “inv” or “syn” according to netClass) if they score ≥5,000. For the Mongolian gerbil, tarsier, Malayan flying lemur, sperm whale, Przewalski's horse, Weddell seal, Malayan pangolin, Chinese pangolin, Hoffmann's two-fingered sloth, and Cape rock hyrax that have genome assemblies with a scaffold N50 ≤1,000,000 and a contig N50 ≤100,000, we just required that nets have a score ≥100,000. For marsupials and platypus, we lowered the score threshold for nets to 10,000 and kept inv or syn nets with scores ≥3,000. Next, we used the filtered nets to compute a human-referenced multiple genome alignment with MULTIZ-tba. 
-Finally, to distinguish between unaligning genomic regions that are truly diverged and genomic regions that do not align because they overlap assembly gaps in the query genome [83], we post-processed the multiple-genome alignment and removed all unaligning regions (e-lines in a maf block) that either overlap an assembly gap in the respective query genome(s) or are not covered by any alignment chain.
+After building chains, we applied RepeatFiller (RRID:SCR_017414), which performs another round of local alignment in unaligned regions ≤20 kb in size that are bounded by co-linear alignment blocks up- and downstream. RepeatFiller removes any repeat masking from the unaligned region and can therefore detect novel alignments between repetitive regions; we have previously shown that it detects several megabases of aligning repetitive sequence that would otherwise be missed [5]. After RepeatFiller, we applied chainCleaner with `-LRfoldThreshold=2.5 -doPairs -LRfoldThresholdPairs=10 -maxPairDistance=10000 -maxSuspectScore=100000 -minBrokenChainScore=75000` to improve alignment specificity. Pairwise alignment chains were converted into alignment nets using a modified version of chainNet that computes real scores of partial nets. Nets were filtered using NetFilterNonNested.perl with `-doUCSCSynFilter -keepSynNetsWithScore 5000 -keepInvNetsWithScore 5000`, which applies the UCSC “syntenic net” score thresholds (minTopScore of 300000 and minSynScore of 200000) and keeps nested nets that align to the same locus (inversions or local translocations; net type “inv” or “syn” according to netClass) if they score ≥5,000. For the Mongolian gerbil, tarsier, Malayan flying lemur, sperm whale, Przewalski's horse, Weddell seal, Malayan pangolin, Chinese pangolin, Hoffmann's two-fingered sloth, and Cape rock hyrax, whose genome assemblies have a scaffold N50 ≤1,000,000 and a contig N50 ≤100,000, we only required that nets have a score ≥100,000. For marsupials and platypus, we lowered the net score threshold to 10,000 and kept inv or syn nets with scores ≥3,000.
 
-The main difference between this 120-mammal alignment and our previous 144-vertebrate alignment [16] is that the former focuses entirely on mammals and includes many new species (120 vs 74 mammals, see Supplementary Table 1). In addition, we updated genome assemblies of 12 species that were already included in the previous alignment (species are marked in Supplementary Table 1). Finally, the 120-mammal alignment used RepeatFiller to improve the completeness of alignments between repetitive regions.
+### Some definitions
 
-__To align non-placental mammals, we used K = 2400, L = 3000, Y = 3400, H = 2000 and the HoxD55 scoring matrix.__
+In chain and net lingo, the __target__ is the reference genome sequence and the __query__ is some other genome sequence. For example, if you are viewing Human-Mouse alignments in the Human genome browser, human is the target and mouse is the query.
 
-MASKING: Both genomes have to be repeatmasked and masked Tandem Repeat Finder (trf) first (thanks to Hiram for pointing this out)
-ALIGNING: The two genomes are aligned with BLASTZ (we don't use blastz's own chaining, see discussion (angie)). This generates lav-files, which have to be converted to psl (lavToPsl)
-CHAINING: Two matching alignments next to each other are joined into one fragment if they are close enough (axtChain). As every genomic fragment can match with several others, we keep only the longest chains : first do axtSort then filter with axtBest (more info on the mailing list)
-NETTING: Group blocks of chained alignments into longer stretches of synteny (netChain)
-MAF'ING: From the synteny-files (positions), get the sequences and re-create alignments
-PhastCons: Using the maf-files, calculate the strength of conservation for every base, similar to a Vista- or protein Conservation plot, but applicable to multiple alignments
+A __gapless block__ is a base-for-base alignment between part of the target and part of the query, possibly including mismatching bases. It has the same length in bases on the target and the query. This is the output of the most primitive alignment algorithms.
+
+A __gap__ is a link between two gapless blocks, indicating that the target or the query has sequence that should be skipped over in order to make the best-scoring alignment. In other words, the scoring penalty for skipping over one or more bases is less than the penalty for continuing to align the sequences without skipping.
+
+A __single-sided gap__ is a gap in which sequence in either target or query must be skipped over. A plausible explanation for needing to skip over a base in the target while not skipping a base in the query is that either the target has an inserted base or the query has a deleted base. Many alignment tools produce alignments with single-sided gaps between gapless blocks.
+
+A __double-sided gap__ skips over sequence in both target and query because the sum of penalties for mismatching bases exceeds the penalty for extending a gap across them. This is possible only when the penalty for extending a gap is less than the penalty for creating a new gap and less than the penalty for a mismatch, and when the alignment algorithm is capable of considering double-sided gaps.
+
+A __chain__ is a sequence of non-overlapping gapless blocks, with single- or double-sided gaps between blocks. Within a chain, target and query coords are monotonically non-decreasing (i.e. always increasing or flat). Chains are constructed by the axtChain program which finds pairwise alignments with the same target and query sequence, on the same strand, that can be merged if overlapping and joined into one longer alignment with a higher score under an affine gap-scoring system (progressively decreasing penalties for longer gaps).
+
+* Double-sided gaps are a new capability (blastz cannot produce them) that allows extremely long chains to be constructed.
+* Paralogs, not just orthologs, can produce good chains, which can be useful.
+* Chains should be symmetrical: if you swap human-mouse chains to mouse-human, you should get approximately the same chains as if you had chained mouse-human blastz alignments directly. However, blastz's dynamic masking is asymmetrical, so in practice the results are not exactly symmetrical. Dynamic masking combined with changed chunk sizes can also make results differ from one run to the next.
+* Chained blastz alignments are not single-coverage in either target or query unless some subsequent filtering (such as netting) is done.
+* Chain tracks can contain massive pileups when a piece of the target aligns well to many places in the query. Common causes include insufficient repeat masking and high-copy-number genes (or paralogs).
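+
+A minimal sketch of reading this structure back out of a chain file, assuming the standard UCSC chain text layout (a `chain` header whose 13th field is the chain ID, then `size dt dq` lines, then a final `size` line). The function name `summarise_chains` and the toy chain are hypothetical. For each chain it counts aligned bases and single- versus double-sided gaps, and checks that blocks plus gaps add up to the target and query spans in the header, as they must when coordinates only move forward.
+
+```bash
+#!/usr/bin/env bash
+# Sketch: summarise each chain in a UCSC-format chain file (hypothetical helper).
+summarise_chains() {
+  awk '
+    $1 == "chain" {                # header: score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
+      id = $13; tspan = $7 - $6; qspan = $12 - $11
+      aligned = 0; tgap = 0; qgap = 0; ss = 0; ds = 0
+      next
+    }
+    NF == 3 {                      # "size dt dq": a gapless block followed by a gap
+      aligned += $1; tgap += $2; qgap += $3
+      if ($2 > 0 && $3 > 0) ds++; else ss++
+      next
+    }
+    NF == 1 {                      # final "size" line closes the chain
+      aligned += $1
+      ok = (aligned + tgap == tspan && aligned + qgap == qspan) ? "yes" : "no"
+      printf "chain %s: aligned=%d single_sided=%d double_sided=%d spans_ok=%s\n", id, aligned, ss, ds, ok
+    }
+  ' "$@"
+}
+
+# Toy chain with made-up coordinates: gaps of (10,0), (0,20) and (5,15).
+summarise_chains <<'EOF'
+chain 5000 chr1 1000 + 100 260 chrA 800 + 10 190 1
+50 10 0
+40 0 20
+30 5 15
+25
+
+EOF
+```
+
+This should print `chain 1: aligned=145 single_sided=2 double_sided=1 spans_ok=yes`. On a real file (e.g. `summarise_chains mm10_smiCra1.chain`), a `spans_ok=no` line would point to a malformed chain.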
+
+A __net__ is a hierarchical collection of chains, with the highest-scoring non-overlapping chains on top, and their gaps filled in where possible by lower-scoring chains, which in turn may have gaps filled in by lower-level chains and so on.
+
+* A chain's qName probably also affects which level it lands in: it matters whether a chain's qName is the same as the top-level chain's qName, because the levels have meanings associated with them (see the details page).
+* A net is single-coverage for the target but not for the query, unless it has been filtered to be single-coverage on both target and query. By convention we add "rbest" to the net filename in that case.
+* Because a net is single-coverage in the target, it is no longer symmetrical.
+* The netter has two outputs, one of which we usually ignore: the target-centric net in query coordinates. The reciprocal-best process uses that output: the query-referenced (but target-centric, target single-coverage) net is turned back into component chains, which are then netted to get single coverage in the query too. The two outputs of that netting are reciprocal-best in query and target coordinates, and reciprocal-best nets are symmetrical again.
+* Nets do a good job of filtering out massive pileups by collapsing them down to (usually) a single level.
+* "LiftOver chains" are actually chains extracted from nets, or chains filtered by the netting process.
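+
+A quick way to see the hierarchy is to count fills at each level. This is a sketch under the assumption that the net file uses the usual UCSC text layout, in which each nesting level adds one leading space (top-level fills start with one space, their gaps with two, fills inside those gaps with three, and so on). `fills_per_level` and the toy net, trimmed to the leading fields of each line, are hypothetical.
+
+```bash
+#!/usr/bin/env bash
+# Sketch: count "fill" lines at each level of a UCSC-format net file (hypothetical helper).
+fills_per_level() {
+  awk '
+    $1 == "fill" {
+      match($0, /^ */)             # RLENGTH = number of leading spaces
+      level = (RLENGTH + 1) / 2    # 1 space -> level 1, 3 spaces -> level 2, ...
+      n[level]++
+      if (level > max) max = level
+    }
+    END { for (l = 1; l <= max; l++) printf "level %d: %d fill(s)\n", l, n[l] + 0 }
+  ' "$@"
+}
+
+# Toy net with made-up coordinates: a top-level fill whose gap is filled by a lower-scoring chain.
+fills_per_level <<'EOF'
+net chr1 1000
+ fill 100 500 chrA + 50 480 id 1 score 9000 ali 450
+  gap 300 100 chrA + 250 90
+   fill 310 80 chrB - 20 80 id 7 score 800 ali 80
+EOF
+```
+
+For the toy net this prints one fill at level 1 and one at level 2. A pileup that the netter has collapsed shows up as many fills concentrated at a single level rather than a deep stack.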
 
 
 ### Preparation
 
+Load the required modules on the Spartan HPC cluster:
+
+```
+module load foss
+module load lastz
+module load ucsc/21072020
+module load perl
+
+```
+
 #### Repeat mask dunnart genome
 
-????
+Create a conda environment with all the dependencies:
+
+```
+conda create -n wga
+conda activate wga
+conda config --add channels biocore
+conda config --add channels bioconda
+conda config --add channels conda-forge
+conda install repeatmasker repeatmodeler
+```
+
+Run RepeatModeler to build a de novo repeat library for the dunnart genome:
+```
+BuildDatabase -name dunnart -engine ncbi Scras_dunnart_assem1.0_pb-ont-illsr_flyeassem_red-rd-scfitr2_pil2xwgs2_60chr.fasta
+
+RepeatModeler -database dunnart
+
+```
+
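+Run RepeatMasker to soft-mask repeats in the dunnart genome (`-xsmall` writes masked bases in lowercase). As written, the command does not use the RepeatModeler library built above; RepeatModeler 2 writes that library to `dunnart-families.fa`, which RepeatMasker accepts through `-lib`.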
+
+```
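+RepeatMasker -xsmall Scras_dunnart_assem1.0_pb-ont-illsr_flyeassem_red-rd-scfitr2_pil2xwgs2_60chr.fasta -default_search_engine hmmer -trf_prgm /home/lecook/.conda/envs/wga/bin/trf -hmmer_dir /home/lecook/.conda/envs/wga/bin/
+mv Scras_dunnart_assem1.0_pb-ont-illsr_flyeassem_red-rd-scfitr2_pil2xwgs2_60chr.fasta.masked Scras_dunnart_assem1.0_pb-ont-illsr_flyeassem_red-rd-scfitr2_pil2xwgs2_60chr_RM.fasta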
+```
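+
+A simple sanity check after masking is the fraction of soft-masked bases. The sketch below uses a hypothetical helper, `masked_fraction`, that counts lowercase a/c/g/t as masked and leaves N/n out of the denominator, since assembly gaps are not repeats. Other IUPAC codes are counted as unmasked.
+
+```bash
+#!/usr/bin/env bash
+# Sketch: fraction of soft-masked (lowercase) bases in a FASTA file (hypothetical helper).
+masked_fraction() {
+  awk '
+    /^>/ { next }                          # skip header lines
+    {
+      seq = $0
+      total += length(seq)
+      n_gap += gsub(/[Nn]/, "", seq)       # gsub returns how many characters it removed
+      lower += gsub(/[acgt]/, "", seq)
+    }
+    END {
+      acgt = total - n_gap
+      if (acgt > 0) printf "%.4f\n", lower / acgt; else print "NA"
+    }
+  ' "$@"
+}
+
+# Toy FASTA: 12 non-N bases, 6 of them lowercase.
+printf '>chr1\nACGTacgtNN\nacGT\n' | masked_fraction
+```
+
+The toy input prints `0.5000`. On the real genome, run `masked_fraction` on the `_RM.fasta` file; a value near zero would suggest the masking step did not take effect.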
+
+#### Split into scaffolds
+
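+Split the masked genome into one FASTA file per sequence with faSplit from the UCSC Kent tools (the output directory is created first):
+
+```
+mkdir -p smiCra1
+faSplit byName Scras_dunnart_assem1.0_pb-ont-illsr_flyeassem_red-rd-scfitr2_pil2xwgs2_60chr_RM.fasta smiCra1/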
+```
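+
+A quick check that the split produced one file per sequence compares the number of FASTA headers with the number of `.fa` files written. `check_split` is a hypothetical helper, and the demo uses made-up files in a temporary directory.
+
+```bash
+#!/usr/bin/env bash
+# Sketch: compare FASTA header count with the number of per-sequence .fa files (hypothetical helper).
+check_split() {
+  local fasta=$1 outdir=$2
+  local n_seqs n_files
+  n_seqs=$(grep -c '^>' "$fasta")
+  n_files=$(find "$outdir" -maxdepth 1 -name '*.fa' | wc -l)
+  n_files=${n_files//[[:space:]]/}   # some wc implementations pad the count with spaces
+  if [ "$n_seqs" -eq "$n_files" ]; then
+    echo "ok: $n_seqs sequences, $n_files files"
+  else
+    echo "mismatch: $n_seqs sequences, $n_files files"
+  fi
+}
+
+# Toy check with made-up files.
+tmp=$(mktemp -d)
+printf '>chr1\nACGT\n>chr2\nacgt\n' > "$tmp/genome.fasta"
+mkdir "$tmp/split"
+printf '>chr1\nACGT\n' > "$tmp/split/chr1.fa"
+printf '>chr2\nacgt\n' > "$tmp/split/chr2.fa"
+check_split "$tmp/genome.fasta" "$tmp/split"
+rm -r "$tmp"
+```
+
+The toy run prints `ok: 2 sequences, 2 files`. For the dunnart genome the call would be `check_split Scras_..._60chr_RM.fasta smiCra1`.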
+
 
 #### Create .2bit and .sizes files
 
 ```
-faToTwoBit ../../dunnart/genomes/Scras_dunnart_assem1.0_pb-ont-illsr_flyeassem_red-rd-scfitr2_pil2xwgs2_60chr.fasta smiCra1.2bit
+faToTwoBit ../../dunnart/genomes/Scras_dunnart_assem1.0_pb-ont-illsr_flyeassem_red-rd-scfitr2_pil2xwgs2_60chr_RM.fasta smiCra1.2bit
 ```
 
 ```
@@ -65,9 +125,83 @@ To align placental mammals, we used previously determined lastz parameters (K =
 
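-To align placental mammals, we used the lastz alignment parameters K = 2400, L = 3000, Y = 9400, H = 2000 and the lastz default scoring matrix, correspond- ing to parameter set 2 in Table 1. To align non-placental vertebrates, we used K = 2400, L = 3000, Y = 3400, H = 2000 and the HoxD55 scoring matrix. Citation: Increased alignment sensitivity improves the usage of genome alignments for comparative gene annotation. Nucleic Acids Res. 2017;45(14):8369–77.
+To align placental mammals, we used the lastz alignment parameters K = 2400, L = 3000, Y = 9400, H = 2000 and the lastz default scoring matrix, corresponding to parameter set 2 in Table 1 of [1]. To align non-placental vertebrates, we used K = 2400, L = 3000, Y = 3400, H = 2000 and the HoxD55 scoring matrix [1].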
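+
+The two parameter sets can be kept side by side so the right one is chosen for each species pair. This is a minimal sketch: `lastz_params` is a hypothetical helper that uses the blastz-style short options (H, K, L, Y) already used by the lastz command in this README and passes the HoxD55 matrix through lastz's `--scores` option. The `HoxD55.q` file name is an assumption; point `HOXD55` at your own copy of the matrix.
+
+```bash
+#!/usr/bin/env bash
+# Sketch: choose lastz scoring options by species group (hypothetical helper).
+HOXD55=${HOXD55:-HoxD55.q}   # assumed path to a HoxD55 scoring file
+
+lastz_params() {
+  case "$1" in
+    placental)     echo "H=2000 K=2400 L=3000 Y=9400" ;;
+    non-placental) echo "H=2000 K=2400 L=3000 Y=3400 --scores=${HOXD55}" ;;
+    *)             echo "unknown species group: $1" >&2; return 1 ;;
+  esac
+}
+
+# The dunnart is a marsupial, so a mouse-dunnart run falls in the non-placental group.
+lastz_params non-placental
+```
+
+With the default `HOXD55`, the last line prints `H=2000 K=2400 L=3000 Y=3400 --scores=HoxD55.q`. The output can be spliced unquoted into a lastz command line so that each option stays a separate word.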
 
+```
+# Run inside the directory that holds the split dunnart sequences (smiCra1/*.fa)
+GENOMES=/data/projects/punim0586/lecook/chipseq-pipeline/cross_species/data/genomes
+TRA=($(for file in *.fa; do echo "$file" | cut -d "." -f 1; done))
+
+echo "${TRA[@]}"
+
+# Print one lastz command per dunnart sequence; quoting keeps the "> ...maf" redirect inside the printed command.
+# Note: Y=9400 with the default matrix is the placental set; the text above uses Y=3400 and HoxD55 for non-placental species.
+for tr in "${TRA[@]}"; do
+    echo "lastz_32 ${GENOMES}/mm10.fa[multi] ${GENOMES}/smiCra1/${tr}.fa H=2000 K=2400 L=3000 Y=9400 --format=maf > ${GENOMES}/${tr}_mm10.smiCra1.maf"
+done
+```
+
+#### Convert MAF to axt format
+
+```
+for maf in *_mm10.smiCra1.maf; do maf-convert axt "$maf"; done > mm10_smiCra1.axt
+```
+
+
 ### axtChain
+
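+We use axtChain from the UCSC Kent tools (http://www.soe.ucsc.edu/~kent) to build co-linear alignment chains. The command below uses `-linearGap=loose` (the chicken/human gap costs, suited to distantly related species) rather than `medium` (the mouse/human gap costs).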
+
+```
+axtChain -linearGap=loose mm10_smiCra1.axt mm10.2bit smiCra1.2bit mm10_smiCra1.chain
+
+```
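+
+Before moving on, it can help to look at the highest-scoring chains. This is a minimal sketch with a hypothetical helper, `top_chains`, which reads a chain file on standard input, sorts the `chain` header lines by score (field 2) and prints score, target interval, query interval and query strand.
+
+```bash
+#!/usr/bin/env bash
+# Sketch: list the top-scoring chains of a UCSC chain file read from stdin (hypothetical helper).
+top_chains() {
+  # $1 = number of chains to show (default 5)
+  grep '^chain' | sort -k2,2nr | head -n "${1:-5}" |
+    awk '{ print $2, $3 ":" $6 "-" $7, $8 ":" $11 "-" $12, $10 }'
+}
+
+# Toy chain file with made-up coordinates.
+top_chains 1 <<'EOF'
+chain 5000 chr1 1000 + 100 260 chrA 800 + 10 170 1
+160
+
+chain 12000 chr2 2000 + 0 500 chrB 900 - 100 600 2
+500
+EOF
+```
+
+The toy run prints `12000 chr2:0-500 chrB:100-600 -`. On the real output, `top_chains 10 < mm10_smiCra1.chain` shows the ten best chains.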
 ### RepeatFiller
+https://github.com/hillerlab/GenomeAlignmentTools
+
+```
+python3 RepeatFiller.py -c mm10_smiCra1.chain -T2 mm10.2bit -Q2 smiCra1.2bit
+```
+
 ### chainCleaner
+https://github.com/hillerlab/GenomeAlignmentTools
+After RepeatFiller, we applied chainCleaner with `-LRfoldThreshold=2.5 -doPairs -LRfoldThresholdPairs=10 -maxPairDistance=10000 -maxSuspectScore=100000 -minBrokenChainScore=75000` to improve alignment specificity.
+
+chainCleaner improves the specificity of genome alignment chains by detecting and removing local alignments that obscure the evolutionary history of genomic rearrangements [2]. The input is a chain file, ideally after adding alignments found with highly sensitive parameters when distantly related species are compared. The output is a chain file containing re-scored and score-sorted chains, in which the removed local alignments have been taken out of their parent chains and added as individual chains. This output can be used to build alignment nets with chainNet [4].
+
+
+```
+
+```
+
 ### chainNet
+Given a set of alignment chains, chainNet produces alignment nets, which are hierarchical collections of chains or parts of chains that attempt to capture only orthologous alignments [4]. The original chainNet implementation approximates the score of "sub-nets" (nets that come from part of a chain and fill a gap in a higher-level net) by the fraction of aligning bases. This can bias the scores when the aligning blocks of a chain are not evenly distributed. The modified chainNet from the Hiller lab (GenomeAlignmentTools) adds a "-rescore" parameter that computes the real score of each sub-net [2].
+
+```
+chainNet in.chain target.sizes query.sizes target.net query.net
+```
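+
+As a rough illustration of the approximation described above (our reading of it, with symbols introduced here only for this example): if a chain has score $S_{\text{chain}}$ and $a_{\text{chain}}$ aligned bases, a sub-net drawn from it that contains $a_{\text{sub}}$ aligned bases is scored approximately as
+
+$$
+S_{\text{sub}} \approx S_{\text{chain}} \cdot \frac{a_{\text{sub}}}{a_{\text{chain}}}
+$$
+
+For a chain scoring 100,000 with 10,000 aligned bases, a sub-net with 1,000 aligned bases gets about 10,000 whether those bases come from the chain's best-scoring blocks or its weakest ones; `-rescore` computes the sub-net's actual score instead.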
+
 ### NetFilterNonNested
-### MultiZ (roast)
+
+### netChainSubset
+Extract the subset of chains that appear in the net to create a liftOver chain file:
+
+```
+netChainSubset ../net/chr.net chr.chain ../over/chr.chain
+```
+
+### LiftOver
+
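+dunnart_to_mouse: liftOver maps coordinates from a chain's target to its query. The chains above have mm10 as the target and smiCra1 as the query, so they map mouse to dunnart; lifting dunnart to mouse needs chains with smiCra1 as the target (for example after chainSwap).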
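+
+This is a minimal sketch of one route, assuming the liftOver chain from netChainSubset has mm10 as target and smiCra1 as query, so it has to be swapped before lifting dunnart intervals. The function name and all file names are hypothetical. Swapped net-derived chains are single-coverage on mouse but not necessarily on dunnart, so some intervals may be reported as duplicated; re-netting the swapped chains with smiCra1 as the target is the more thorough option.
+
+```bash
+#!/usr/bin/env bash
+# Sketch: lift dunnart (smiCra1) intervals onto mouse (mm10) with swapped chains.
+# All file names are hypothetical. With DRY_RUN set, commands are printed instead of run.
+lift_dunnart_to_mouse() {
+  local over_chain=$1 in_bed=$2 out_bed=$3 unmapped=$4
+  local run=${DRY_RUN:+echo}
+  local swapped=${over_chain%.chain}.swap.chain
+  # chainSwap exchanges target and query, so smiCra1 becomes the target
+  $run chainSwap "$over_chain" "$swapped"
+  # liftOver maps intervals from the chain's target (now smiCra1) onto its query (mm10)
+  $run liftOver "$in_bed" "$swapped" "$out_bed" "$unmapped"
+}
+
+DRY_RUN=1 lift_dunnart_to_mouse mm10_smiCra1.over.chain peaks.bed peaks.mm10.bed peaks.unmapped
+```
+
+With `DRY_RUN=1` this only prints the `chainSwap` and `liftOver` commands; call it without `DRY_RUN` to run them.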
+
+
+
+### References
+[1] Sharma V, Hiller M. Increased alignment sensitivity improves the usage of genome alignments for comparative gene annotation. Nucleic Acids Res. 2017;45(14):8369–8377.
+
+[2] Suarez H, Langer BE, Ladde P, Hiller M. chainCleaner improves genome alignment specificity and sensitivity. Bioinformatics. 2017;33(11):1596–1603.
+
+[3] Hiller M, Agarwal S, Notwell JH, Parikh R, Guturu H, Wenger AM, Bejerano G. Computational methods to detect conserved non-genic elements in phylogenetically isolated genomes: application to zebrafish. Nucleic Acids Res. 2013;41(15):e151.
+
+[4] Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci USA. 2003;100(20):11484–11489.
+
+[5] Osipova E, Hecker N, Hiller M. RepeatFiller newly identifies megabases of aligning repetitive sequences and improves annotations of conserved non-exonic elements. Submitted.