diff --git a/cross_species_comparison/README.md b/cross_species_comparison/README.md
index 4187387de0d40cfd969f5c312f03527479285966..21f3f4bc755203d07cbc02fd83e20c5fb1778508 100644
--- a/cross_species_comparison/README.md
+++ b/cross_species_comparison/README.md
@@ -1,70 +1,43 @@
-Ideas from Anshul Kundaje
+# Cross-species ChIPseq analysis
 
-Rather than BLAST, compute the gapped k-mer similarity between regions across the species. Do the following
+## Whole Genome Alignment
 
-1. Find orthologous genes between the two species (once I have dunnart genome):
-- JustOrthologs - https://academic.oup.com/bioinformatics/article/35/4/546/5063405
-This seems to be a new fast method, metrics are comparable/slightly better than OrthoMCL OMA and OrthoFinder, doesn't require all-versus-all BLAST comparisons, which are time-consuming and memory intensive which are is used by other methods. Also creates just a single step command for generating orthologs instead of multiple steps that take awhile to run like in other programs. Compares dinucleotide percentages to determine the level of sequence identity between two CDS regions. Specific parameter for distantly related species. Tested in paper on species that are 312 million years divergent and works well.
+__Email from Camille Berthelot:__
 
-```
-ARGUMENT OPTIONS:
+We have run into a similar problem with non-model species in a recent project.
+Here is the pipeline that we have had good success with, which is very similar to the conservation pipeline used by UCSC:
 
- -q	Query Fasta File					A fasta file that has been divided into CDS regions and sorted based on the number of CDS regions.
- -s	Subject Fasta File					A fasta file that has been divided into CDS regions and sorted based on the number of CDS regions.
- -o	Output File							A path to an output file that will be created from this program.
- -t	Number of Cores						The number of Cores available
- -d	For More Distantly Related Species	A flag that implements the -d algorithm, which gives better accuracy for more distantly related species.
- -c	Combine Both Algorithms				Combines both the normal and -d algorithms for the best overall accuracy.
- -r Correlation value					An optional value for changing the required correlation value between orthologs.
-```
+- build a custom whole-genome alignment between the dunnart and mouse genomes with Lastz
+- produce the chains and nets for this alignment (e.g. hierarchical organisation of aligned regions to get a 1-to-1 alignment over the whole genomes)
+- use LiftOver to map your dunnart regions of interest to mouse (and reciprocally)
 
-Fasta file of CDS regions that have been sorted based on the number of CDS regions for both the dunnart and the mouse
+We have benchmarked this pipeline against the one we used in the 20 mammals Cell paper, and there was almost no difference in the recovered regions.
 
-- mouse = `wget ftp://ftp.ensembl.org/pub/release-100/fasta/mus_musculus/cds/Mus_musculus.GRCm38.cds.all.fa.gz`
-- dunnart =
+This however assumes that you have a good grip on bioinformatics and access to sufficient computational power, as building alignments is quite intensive. There are a number of online tutorials to build the alignment and nets using Lastz:
+http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto
+https://www.bioconductor.org/packages/release/bioc/vignettes/CNEr/inst/doc/PairwiseWholeGenomeAlignment.html
+https://github.com/hillerlab/GenomeAlignmentTools (more fine-tuned and advanced)
 
-# dunnart as query and mouse are query - see how different they are
+If you do not have access to sufficient resources: I would advise against using Blastn on its own because you will often obtain several hits for every query sequence, and it may be difficult to find appropriate parameters that identify true orthologs over the entire genome or even across a large region at this evolutionary distance.
+I would then recommend to identify orthologous regions beforehand as you mentionned (e.g. 100kb around orthologous genes) and use a local aligner to align those two regions (Lastz, MUMmer4, or another). This will drastically reduce the computational power required and should run on a desktop machine. However, your alignments will not be exhaustive, and it is likely that you will not be able to assess conservation for a sizeable fraction on your ChIPseq peaks.
 
-```
-python justOrthologs.py -q orthologTest/mm10_cds.fa -s orthologTest/dunnart_cds.fa -o dunnart_mm10_orthologs -c -d
-```
+I hope this is helpful, don't hesitate to get back to me if you have questions.
 
-2. Now for gene x in species A mapped to gene y in species B, identify approximate syntenic regulatory domains (e.g. if it were human vs mouse you could use +/-100 Kb around the genes)
-- Don’t really get this step, so for two genes that are orthologs take -/+ 100kb either side of the gene for both species and I guess save that region with the orthologous gene name, species and coordinate
+__Methods from Hecker & Hiller 2020 Whole Genome Alignment:__
+Going to try this method.
 
-For dunnart - look at distance to the nearest gene for all peaks see how many fit into the 100kb range either side.
+To compute pairwise and multiple genome alignments, we used the human hg38 assembly as the reference (Supplementary Fig. 1 shows the entire workflow). We first built pairwise alignments between human and a query species using lastz and axtChain to compute co-linear alignment chains [82]. To align placental mammals, we used previously determined lastz parameters (K = 2400, L = 3000, Y = 9400, H = 2000, and the lastz default scoring matrix) that have a sufficient sensitivity to capture orthologous exons [16]. To align chimpanzee, bonobo, and gorilla, we changed the lastz parameters (K = 4500 and L = 4500).
 
-3. Now ur goal is map elements between these two domains
+After building chains, we applied RepeatFiller (RRID:SCR_017414), a method that performs another round of local alignment, considering unaligning regions ≤20 kb in size that are bounded by co-linear alignment blocks up- and downstream. RepeatFiller removes any repeat masking from the unaligned region and is therefore able to detect novel alignments between repetitive regions. We have previously shown that RepeatFiller detects several megabases of aligning repetitive sequences that would be missed otherwise. After RepeatFiller, we applied chainCleaner with parameters -LRfoldThreshold = 2.5 -doPairs -LRfoldThresholdPairs = 10 -maxPairDistance = 10000 -maxSuspectScore = 100000 -minBrokenChainScore = 75000 to improve alignment specificity. Pairwise alignment chains were converted into alignment nets using a modified version of chainNet that computes real scores of partial nets. Nets were filtered using NetFilterNonNested.perl with parameters -doUCSCSynFilter -keepSynNetsWithScore 5000 -keepInvNetsWithScore 5000, which applies the UCSC “syntenic net” score thresholds (minTopScore of 300000 and minSynScore of 200000) and keeps nested nets that align to the same locus (inversions or local translocations; net type “inv” or “syn” according to netClass) if they score ≥5,000. For the Mongolian gerbil, tarsier, Malayan flying lemur, sperm whale, Przewalski's horse, Weddell seal, Malayan pangolin, Chinese pangolin, Hoffmann's two-fingered sloth, and Cape rock hyrax that have genome assemblies with a scaffold N50 ≤1,000,000 and a contig N50 ≤100,000, we just required that nets have a score ≥100,000. For marsupials and platypus, we lowered the score threshold for nets to 10,000 and kept inv or syn nets with scores ≥3,000. Next, we used the filtered nets to compute a human-referenced multiple genome alignment with MULTIZ-tba. Finally, to distinguish between unaligning genomic regions that are truly diverged and genomic regions that do not align because they overlap assembly gaps in the query genome [83], we post-processed the multiple-genome alignment and removed all unaligning regions (e-lines in a maf block) that either overlap an assembly gap in the respective query genome(s) or are not covered by any alignment chain.
 
-4. Map elements between these two domains (how should I do this?)
+The main difference between this 120-mammal alignment and our previous 144-vertebrate alignment [16] is that the former focuses entirely on mammals and includes many new species (120 vs 74 mammals, see Supplementary Table 1). In addition, we updated genome assemblies of 12 species that were already included in the previous alignment (species are marked in Supplementary Table 1). Finally, the 120-mammal alignment used RepeatFiller to improve the completeness of alignments between repetitive regions.
 
-5. For all pairs of elements across the species, compute their gapped k-mer similarity. You can use the gkmSVM kernel for this.
-gkmSVM kernel - http://www.beerlab.org/gkmsvm/
-tutorial: http://www.beerlab.org/gkmsvm/gkmsvm-tutorial.htm
+### lastZ
+We used previously determined lastz parameters (K = 2400, L = 3000, Y = 9400, H = 2000, and the lastz default scoring matrix) - Sharma V, Hiller M. Increased alignment sensitivity improves the usage of genome alignments for comparative gene annotation. Nucleic Acids Res. 2017;45(14):8369–77.
 
-so the pair is the 100kb upstream in the mouse and dunnart or the downstream in the mouse and dunnart.
-
-R package
-
-```
-install.packages('gkmSVM')
-```
-
-Sample input files: ctcfpos.bed nr10mers.fa ref.fa alt.fa
-The first step is to build the kernel matrix (the pairwise similarity scores for all the sequences in the positive and negative sets).
-
-Positive set:
-Negative set:
-
-
-
-6. Do this for pairs of elements for all matched domains across species. Lets call these syntenic pairs
-
-7. As a control compute the gkm similarity for pairs of elements in non matched domains. This gives you the null distribution.
-- pairs of elements in non-matched domains?
-
-8. Compare the observed similarities of syntenic pairs against the null distribution to obtain statistical significance
-
-9. Do multiple testing correction
-
-10. Pairs that pass, are your cross-species matched enhancers
+### axtChain
+### RepeatFiller
+### chainCleaner
+### chainNet
+### NetFilterNonNested
+### MultiZ (roast)