General
- Recursion’s Mapping & Navigating Demonstration • Mason Victors 📺
Inbox
Aim to split this into multiple linked files once a good way to chop it up is clear.
Lovina: showing how to reload RP006 groups…
- go to GCP
- select rp006-prod project
- go to cloud storage
- select l3 cache bucket
- when cache manager runs, it drops groups from the redis cache and writes the new ones
- it writes all these groups to the l3 cache bucket
- we can delete all the folders via the UI by selecting them all and clicking delete
- see “How To > Access MapApp Prod Data” document for how to access redis-proxy in GCP
- may need to look up the new external endpoint IP address
- connect to redis locally
- have redis cli
- log in via terminal with command in document
- the proxy passes along redis commands
- the host in the command is the external endpoint address
- look up access token in vault > secret > phenoapp > managed_redis
- that opens a prompt talking to the redis GCP instance
- use redis-cli commands to do things
- redis is a key-value store, basically
- there’s a key that stores the names of all the available groups:
- looking in `caches.py`, you can see the name of that key is “available-groups”
- run `zrange available-groups 0 -1`
- Output shows all groups available in the bucket
- to delete all of them, run `flushall`
- To repopulate the cache, run the cron job manually:
- Use kubectl, Lens, k8s etc
- confirm I’m in the correct cluster
- e.g. the rp006 app cache manager cron job is actually in the `prod-cluster`, not `rp006-prod`
- in workloads, find the dash-phenoapp-v2-cache-manager-rxrx-cronjob
- trigger it (starts a manual deployment of the cron job)
- view the cron job in the list of running workload cron jobs
- In RP006 Argo:
- look up rp006-neuro-cache-manager-rxrx-cronjob-config
- compare its image tag (near top of page under “images” after clicking on a pod) to the latest in eng-infra
- verify success by looking at the list of successful jobs (after
- if there’s an issue, check out the GCP logs explorer + look for an error + read its stack trace
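The reload steps above can also be sketched in Python. This is a minimal sketch, not the team's tooling. It assumes the `redis` package (redis-py) is installed, that the vault access token is passed as the Redis password, and that the proxy needs no extra TLS settings. None of that is confirmed by these notes. The `connect` and `manual_cronjob_command` helpers, the job name and the namespace are hypothetical. Only the `available-groups` key and the `zrange` / `flushall` commands come from the notes.

```python
from typing import Any, List

# Key name taken from the notes above (see caches.py).
AVAILABLE_GROUPS_KEY = "available-groups"


def connect(host: str, port: int, token: str) -> Any:
    """Open a client through the proxy.

    Assumption: the access token from vault is used as the Redis password.
    """
    import redis  # third-party: pip install redis

    return redis.Redis(host=host, port=port, password=token, decode_responses=True)


def list_available_groups(client: Any) -> List[str]:
    """Python equivalent of `zrange available-groups 0 -1`."""
    return list(client.zrange(AVAILABLE_GROUPS_KEY, 0, -1))


def flush_cache(client: Any) -> None:
    """Python equivalent of `flushall`. Destructive: removes every key."""
    client.flushall()


def manual_cronjob_command(cronjob: str, job_name: str, namespace: str) -> List[str]:
    """Build a kubectl command that starts a one-off job from a cron job.

    The namespace and job name are placeholders; look up the real ones first.
    """
    return [
        "kubectl", "create", "job", job_name,
        f"--from=cronjob/{cronjob}",
        "--namespace", namespace,
    ]
```

The functions take the client as an argument, so they can be tried against a local Redis before pointing them at production. The command list from `manual_cronjob_command` can be passed to `subprocess.run` once the kubectl context is confirmed to be the correct cluster.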
Denton & Michael Haines AMA on March 29, 2023…
- Kafka: event published when group created
- in `phenoservice-api`
- think of Kafka as another API you talk to through consumers
- Auth:
- Biggest advice = use the logs
- Phenoreader = practice downloading and aggregating group data
- Practice looking at production groups
- Write a quick script that downloads data from the cache
- First step to debugging a “why is pert X missing” = see if it’s in the cache data
- Maybe it’s not in the cache
- Maybe it’s their app settings
- Not all in README
- Denton has a branch that has an example of how to import `cache-manager` and use it to download data to your local machine
- BioHive = practice accessing it
- it’s a supercomputer
- you can run things faster using it than using our laptops
- it can be used remotely
- it has a dedicated fast network connection to Google Cloud, so downloads to BioHive are WAY faster than downloading over my home network
- a huge amount of the time spent investigating group data is just waiting for things to download
- see https://github.com/recursionpharma/data-science-onboarding for setup instructions
- You ssh into it and are in a brand new linux env
- Need to set up my git credentials, install pyenv, etc
- port forwarding to biohive to use my laptop browser but be running `jupyter lab` on biohive: `sft ssh bh-login001 -L localhost:8888:localhost:8888`
- Troubleshooting scenarios:
- why does X map look funny?
- why can’t I access X map?
- why is X pert/group not in MapApp?
- Auth:
- Google groups vs okta groups
- Google groups: https://groups.google.com/my-groups?pli=1
- Pomerium only uses Okta groups
- We’re just a customer of that system; the security team manages it
- Ask Ram 🙂
- Pomerium used for rp006
- currently no pomerium ingress for the mapapp because it uses catalyst
- GCP
- `principal` cluster: used to be named the `primary` cluster; where the mapapp is deployed
- started segmenting things more:
- `rp006-prod` cluster: includes Pomerium; intended for rp006 stuff
- `prod-cluster`: includes Pomerium; new things should go here; intended for internal stuff
Science background Qs
- Who uses the MapApp?
- used by inference scientists
- What do they use the MapApp for?
- investigating monogenic diseases (diseases involving one gene)
- referred to by gene
- experiments use cells where that gene has been turned off by one method or another
- mimicking disease via gene editing
- e.g. by CRISPR
- images are taken of how the phenome (appearance) of the cells changes before and after applying the compounds
- training a deep learning classifier to recognize healthy vs diseased cells (the cell’s “phenoprint”)
- images are a cheaper way to find promising compounds and fail faster on the rest
- those images are translated into vectors (arrays of floats)
- vectors cluster by similar diseases
- those vectors are translated into cosine similarity scores
- mapping cosine similarity (angle of vectors as a %)
- red = similar
- blue = opposite
- looking for compounds that successfully counteract those diseases
- Recursion’s product will eventually be drugs, but currently it’s the analysis of drugs, which is done in part via the PhenoApp
Technical background Qs
Building new groups
- Cron job: how does the cron job that builds the map work?
- The cron job is defined in [eng-infrastructure/kube/principal/dash-phenoapp-v2/cache-manager.yaml](https://github.com/recursionpharma/eng-infrastructure/blob/trunk/kube/principal/dash-phenoapp-v2/cache-manager.yaml)
- It runs every Thursday evening at 6pm MT (1am GMT) and finishes around 7pm
- The `configome.group-auto-loader.prod` key defines:
- transformations: which post-embedding transformations are run on the new data
- each transformation outputs a new group
- here’s an example PR adding the `_prox_bias_reduced` transformation to the list
- the last transformation in the list will become the first group in the group dropdown
- Cron job: how to do a dry run (e.g. after updating it)?
- Point your `configome.yaml` to the prod phenoservice API
- Make sure that dry run is set to True there
- Make sure the transformations & dl model match the production version’s config (in the eng-infrastructure repo).
- Then you should just be able to run the script like `python phenoapp/group_auto_loader.py`
Caching groups
- Why cache group data?
- Groups are HUGE, like ~2 GB or so
- Requesting
- L1 cache: what is it + how is it populated?
- L2 cache: what is it + how is it populated?
- How do the `MapApp`’s different cache layers work?
L1 cache
- in-memory python cache
- 6 dataframes
- all from default group
- we have logic to determine most used groups as well, but currently it’s pointless since we don’t have room to add them (since each group includes 8 dataframes when you include split by + normalized variants)
L2 cache
- Redis cache
- most recent 3 built groups
- group built last becomes default group
- Conor on team that updates the building logic
- corresponds to top 3 groups in the Group Label dropdown
L3 cache
- GCP bucket:
- How is the `MapApp` Pandas DataFrame structured?
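One way to picture the three cache layers described above is as a read-through lookup that tries the fastest layer first. This is a minimal sketch under assumptions. The notes say what each layer holds, but not the order or logic the MapApp uses to read them, so the `TieredCache` class and its fall-through behaviour are hypothetical. The L2 and L3 layers are passed in as plain lookup functions standing in for Redis and the GCP bucket.

```python
from typing import Callable, Dict, Optional, Tuple

# A lookup function returns the stored bytes, or None when the key is absent.
Lookup = Callable[[str], Optional[bytes]]


class TieredCache:
    """Hypothetical three-layer lookup: in-memory dict, then L2, then L3."""

    def __init__(self, l1: Dict[str, bytes], l2_get: Lookup, l3_get: Lookup) -> None:
        self.l1 = l1          # stands in for the in-memory Python cache
        self.l2_get = l2_get  # stands in for the Redis cache
        self.l3_get = l3_get  # stands in for the GCP bucket

    def get(self, key: str) -> Tuple[bytes, str]:
        """Return (value, layer_name) from the first layer that has the key."""
        if key in self.l1:
            return self.l1[key], "L1"
        value = self.l2_get(key)
        if value is not None:
            return value, "L2"
        value = self.l3_get(key)
        if value is not None:
            return value, "L3"
        raise KeyError(key)
```

The sketch deliberately does not promote values into faster layers on a miss, since the notes suggest L1 is only repopulated when the pods restart.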
Adding the right groups to the L1 cache
- Manually restart pods after cron job completes…
- After each cron job run adding new groups to the map, we manually restart the pod
General
- local redis cache
- Benchmark database
- no local access without `configome` changes
- “structure” endpoints from `ci-report`
- Side panel
- GO (Gene Ontology) terms
- App updates generated + cached weekly on Monday nights
- `psycopg2` = a PostgreSQL database adapter (driver) for talking to Postgres from Python; it is not an ORM
- we want the app to be useful for hypothesis generation
- help scientists narrow from 2.2 trillion inferences to the most useful few
- we want the app to present an intuitive workflow for narrowing down to these hypotheses
- we also want to surface novel insights (the Recursion Advantage; things only we’ve discovered) from known insights
- there’s no competitive advantage to exploring biological relationships our competitors also know about (and may be exploring)
Heatmap
- Images → Vectors (arrays of 128 floats representing 128 dimensions) → Cosine similarity (cosine of the angle between two vectors)
- Vectors
- normalized to a magnitude (length) of one (to make them comparable)
- Cosine similarity
- more similar = red
- more opposite = blue
- X-axis = query perturbations, including all concentrations
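The vector steps above can be illustrated with a small sketch: normalize each vector to length one, then take the dot product, which gives the cosine of the angle between them. This is the standard textbook definition, not code from the MapApp. The function names are hypothetical, and short 2-D vectors stand in for the 128-float embeddings.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to magnitude (length) one."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b, in the range [-1, 1]."""
    ua, ub = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(ua, ub))
```

A score near 1 corresponds to the "more similar" (red) end of the heatmap, a score near -1 to the "more opposite" (blue) end, and a score near 0 to unrelated vectors.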
Projection/Rejection (in right sidebar)
- select one target perturbation
- looking for a target gene edit or compound (compounds also include a concentration)
- graph shows
- (0, 1) = control (target)
- looking for lines angled down and to the right (aiming at the target)
- means a phenosimilar result between target + compound at that concentration (or between target + that gene edit)
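For background, the sketch below shows the standard linear-algebra meaning of projection and rejection. The projection is the part of a query vector that lies along the target, and the rejection is the part perpendicular to it. How the app actually computes and plots these values is not described in these notes, so treat this as general context only. The `project_reject` name is hypothetical.

```python
from typing import List, Sequence, Tuple


def project_reject(
    query: Sequence[float], target: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Split `query` into components along and perpendicular to `target`."""
    target_sq = sum(t * t for t in target)
    if target_sq == 0:
        raise ValueError("target must be a non-zero vector")
    # Scalar multiple of the target that is closest to the query.
    scale = sum(q * t for q, t in zip(query, target)) / target_sq
    projection = [scale * t for t in target]
    rejection = [q - p for q, p in zip(query, projection)]
    return projection, rejection
```

By construction the two parts add back up to the query, and the rejection has a zero dot product with the target.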
Roche partnership
- RP006 app = app copy for Roche partners that will live on a separate URL
- same codebase, though
- one partnership on neuro
- going to take longer
- neurons are finicky
- another partnership on GI-ONC (gastro-intestinal oncology)
- no map commitment
- new map ready end of July
- will need onboarding, docs
- 9 users
- no need to support non-Chrome browsers
Bayer partnership
- maybe an external app
- currently optimizing
History
- in 2020, had to use brute force to find promising compounds (couldn’t infer)
Hypothesis generation
1. Do I trust the gene’s phenoprint?
- Test by splitting by experiment (to see how consistent the replicates are), viewed with pairwise display
2. If it’s a bad phenoprint (I don’t trust it), can I find a second gene that’s phenosimilar to the first gene (“in the same pathway”)?
- If I were to target the second gene with a compound, would it improve the symptoms of the first gene edit?
3. Using the result of either (1) or (2), can I find a phenoopposite compound that reverses the gene perturbation?
- Lovina/Summer/Michael good people to ask about cache logic updates
- Summer/Michael: familiar with the cron job that builds the map
Grafana Loki queries for debugging
Conversation with MH on Sep 14, 2022:


