Notes on the End of My Second Study Visit to the US

The Alaska Airlines plane took off from Boston, bound for San Diego. This time, I have only three days left in the US.

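When I first arrived, I wondered whether living in San Diego for a long time would make me homesick. It turned out that it didn't; perhaps I was simply having too good a time to think of home. Before coming to Boston, I wondered again whether, after leaving San Diego, I would often feel homesick in Boston. Yet on this trip I saw spring blooming all over the city. Since you have welcomed me like this, Boston, from now on I will treat you as home.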

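Looking back on this year, I don't feel I have made much progress. Although I studied and worked as hard as I could every day, I still feel there is too much I don't know and that I have a long way to go. Perhaps, having reached this point, taking the next step forward really is hard, and I can't tell whether that step will land on solid ground. Even so, this year I received so much meticulous care from my mentors and senior labmates that I hardly know how to express my gratitude. And now I have even decided to leave UCSD and defect to the enemy. Ah, I really deserve a smack! In any case, all I can do is resolve to do good Science in the future to repay all their generous support.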

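Whenever I think about biology, I get excited. I am obsessed with it, and my curiosity spans every scale and every perspective, so much so that my interests gradually converged on chromatin and gene regulation, which I see as the source of biological diversity across many scales. This diversity is released again and again, in a timed sequence, during the development of each individual, while also evolving gradually over the long course of history to adapt to the environment. We truly live in an era of explosive growth in the life sciences: the knowledge accumulated in model organisms and the boom in biotechnology have finally let us observe and summarize many wondrous phenomena and patterns, which keep prompting us to ask how, and why. I don't know whether an ordinary person like me will ever produce anything worthwhile, nor whether these questions have ultimate answers. I truly hope that one day all these appearances can be generalized and abstracted, so that everything we see in this cellular universe can be taken in at a glance and redefined.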

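What excites me is also the people, because things are made to happen by people, and behind every piece of work there is a group of them. Over this past year, I have learned that there are many people devoted to basic research and many kindred spirits. Because their experiences differ, each has their own deep insights, their own practical entry points, and the ultimate questions they care about, and their outlooks often differ widely. Even so, we can always spark ideas off each other over an experiment, a cup of coffee, a meal, a drink, a walk, or a run. Porter Robinson and Madeon once collaborated on "Shelter", a song that was popular for a while. Madeon said that although they had each started producing great work on their own, they knew they could achieve more by working together, so they wanted to make a song together to show their friendship. Isn't research the same?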

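From now on, whenever I see the red sun hanging high and blue waves rising, I will think of San Diego; whenever I see white gulls spread their wings and soar into the sky, I will remember San Diego, a place to which I owe a great debt. I will no longer be saddened by parting. We are in our twenties; we may meet again through some mysterious wormhole or reunite at a big conference, and in the days and nights to come we can pick up our stories around a dinner table and recall how full of passion and ideals we were when young. I also hope I can put publications, funding, and tenure behind me and keep moving freely and unburdened on this endless journey of practice, just as a short poem by Lermontov says: we are a lone sail on the sea, neither seeking a far shore nor fleeing a paradise. Below the keel are clear blue waves; overhead, golden sunlight.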


Why Genome Sciences?

Life, such a wonderful and elegant form of existence on Earth, brings endless wonder and surprise to the vast universe!

How did life originate, how is it inherited, and how does it evolve? What accounts for the incredible biodiversity we see today? What gave rise to intelligent human beings? How do living things interact with each other and with their surroundings? Where do diseases come from? What does the future hold for us? Thousands of questions about life remain to be answered.

The mystery of life is so captivating that biologists have spent centuries chasing the knowledge behind the miracles of nature, from Mendel's first observations of inheritance to the identification and confirmation of DNA as the main genetic material. Because earlier work had shown that phenotypes are inherited, scientists, who are usually materialists, hypothesized that some material substance carries this inheritance, and they set out to identify it. Pioneering scientists, many of whom were actually chemists and physicists, showed through elegant experiments that DNA, rather than proteins or other biological macromolecules, is the molecule that transmits information to descendants. Later, the composition and structure of DNA were worked out, which is regarded as the beginning of modern biology, or molecular biology. I would argue that understanding the composition of DNA matters more than understanding its structure, because, in terms of information, it tells us that DNA is a polymer of nucleotides carrying four types of nitrogenous bases. It is the information encoded in combinations of these bases that accounts for the complexity and diversity of life.

Researchers then tried to figure out how DNA determines our phenotypes. They found that disrupting certain parts of a gene produces phenotypic changes. From genetic and biochemical studies, they formulated the central dogma of genetic information flow: DNA ultimately passes genetic information to proteins, the functional products, through the processes of transcription (DNA to RNA) and translation (RNA to protein). People were excited to conclude that DNA encodes protein information and sits at the center of commanding cell behavior. We call these DNA components genes.
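The two steps of the central dogma can be pictured as simple string operations. The Python sketch below is a toy under loud assumptions: the input is the coding strand of an intron-free sequence, translation just starts at the first AUG, and splicing, RNA processing and all regulation are ignored. The codon table is the standard genetic code written in its common compact 64-letter form.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order for the
# first, second and third positions (third varies fastest); "*" = stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """DNA -> mRNA: the mRNA matches the coding strand, with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein: read codons from the first AUG until a stop codon.

    Assumes an uppercase sequence over A, C, G, U only.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)
```

For example, transcribe("ATGGCCAAGTGA") gives "AUGGCCAAGUGA", and translating that mRNA gives "MAK" (Met-Ala-Lys, then the UGA stop).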

Over the following decades, researchers did their best to understand genes and accumulated a vast body of knowledge about cellular function, transcription, disease, and more. To obtain a panoramic view of all the information in our DNA, scientists launched the Human Genome Project (HGP), which aimed to sequence the human genome and obtain the script needed to build a human being.

But here comes another question. Within one individual, cells share the same genome script, so what makes neurons, stem cells, immune cells, blood cells, cancer cells, and fibroblasts so different? As research went deeper, people learned that it is not the number of genes but gene regulatory programs that account for the complexity of life. Genes encode the transcription machinery, such as the RNA polymerase that transcribes DNA. Transcription factors bind specific DNA sequences and activate gene expression under particular conditions. Genes are therefore regulated by interactions between cis elements in the genome and trans elements (transcription factors), which together orchestrate a huge transcriptional network and establish the identities of different cell types. Because what we understood, the protein-coding sequence, makes up less than 1.5% of the genome, the remaining non-coding components, especially those involved in gene regulation, were largely unknown. It therefore became important to understand cis-regulatory elements together with transcription factor expression. But how can we find the location and distribution of cis-regulatory elements? We could perform genetic screens, or run genome-wide motif analysis to help pinpoint them. In the early days, however, perturbation experiments in mammalian cells were difficult. That is why we have the ENCODE project.
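Genome-wide motif analysis, mentioned above as one way to locate candidate cis-regulatory elements, is often done by scoring every window of a sequence against a position weight matrix (PWM) built from known binding sites. The Python sketch below shows the idea; the three-position motif counts, the pseudocount, the uniform 25% background and the score threshold are all hypothetical illustration values, not a real transcription factor motif.

```python
import math

BASES = "ACGT"

# Hypothetical position frequency matrix: how often each base was seen
# at each of the three motif positions across (made-up) binding sites.
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
]


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert counts to per-position scores log2(p(base) / background)."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2((column[b] + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (position, score) for each window scoring at or above threshold.

    Scans the given strand only; assumes an uppercase A/C/G/T sequence.
    """
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        score = sum(pwm[j][base] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((i, round(score, 2)))
    return hits
```

Running scan("TTACGTTTACG", log_odds_pwm(COUNTS), threshold=3.0) reports the two ACG windows, at positions 2 and 8. Real motif scanners also score the reverse complement and assess significance rather than using a fixed cutoff; both are left out here to keep the sketch short.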

The ENCODE project aims to understand and annotate the function of the unknown regions of the genome. Our way to decipher this "double Dutch" is through epigenetic marks on DNA and histones. Labs around the world had found that certain DNA modifications and histone marks correlate with the locations of cis-regulatory elements such as promoters and enhancers, and that other histone marks help identify transcriptional activity and chromatin states. Others developed chromatin accessibility assays such as DNase-Seq, MNase-Seq, and ATAC-Seq, which treat open chromatin regions as candidate active regulatory regions. Systematic RNA-Seq studies reveal the transcriptomic profiles of cells, from which the activity of their regulatory elements can be inferred. With the inferences drawn from these epigenomic signatures and landscapes, we can understand life better.

The story of understanding the genome becomes more and more exciting as we approach the real physical state of our genome. Job Dekker is a biologist interested in chromatin conformation. Through the work of his lab and colleagues, a high-throughput method was developed to capture genome-wide chromatin contacts and map the physical structure of the genomes of humans and other organisms. Researchers performed this method, Hi-C, on cells to determine the structure of our chromatin. Surprisingly, chromatin turned out to be organized into discrete topological domains that are largely shared across cell types. These domains are consistent with the long-hypothesized insulator elements, which are thought to block regulatory interactions by forming DNA loops. CTCF and cohesin are regarded as regulators of chromatin structure that help form these loops. Some studies suggest that disrupting insulators or topological domain boundary sequences can cause dramatic phenotypic changes and ultimately contribute to cancer and developmental disorders.
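One common way to locate topological domain boundaries in a Hi-C contact matrix is an insulation-score approach: slide a square window along the diagonal and look for positions where few contacts cross, since domains interact mostly internally. The NumPy sketch below is a simplified illustration under stated assumptions; the window size, the plain mean and the strict local-minimum rule are choices made here for clarity, whereas real pipelines add matrix normalization, log-ratio scaling and boundary-strength filters.

```python
import numpy as np


def insulation_score(contacts: np.ndarray, window: int) -> np.ndarray:
    """score[i] = mean contacts between the `window` bins ending at i and the
    `window` bins starting at i + 1, i.e. traffic across the gap after bin i.

    Low values mean few contacts cross that gap, as expected at a domain
    boundary. Gaps too close to the matrix edge get NaN.
    """
    n = contacts.shape[0]
    scores = np.full(n, np.nan)
    for i in range(window - 1, n - window):
        upstream = slice(i - window + 1, i + 1)
        downstream = slice(i + 1, i + window + 1)
        scores[i] = contacts[upstream, downstream].mean()
    return scores


def call_boundaries(scores: np.ndarray) -> list[int]:
    """Indices whose score is strictly below both neighbours (NaN never qualifies)."""
    return [
        i for i in range(1, len(scores) - 1)
        if scores[i] < scores[i - 1] and scores[i] < scores[i + 1]
    ]


if __name__ == "__main__":
    # Toy 10-bin matrix: bins 0-4 and 5-9 form two hypothetical domains,
    # with 10 contacts inside a domain and 1 contact across domains.
    domain_of = np.array([0] * 5 + [1] * 5)
    toy = np.where(domain_of[:, None] == domain_of[None, :], 10.0, 1.0)
    print(call_boundaries(insulation_score(toy, window=2)))  # [4]
```

On the toy matrix the only boundary reported is at index 4, the gap between bins 4 and 5 where the two made-up domains meet. Scoring the gap between bins, rather than a bin itself, avoids ties when a boundary falls exactly between two bins.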

And now here comes CRISPR. Everybody talks about it because it is such a revolutionary and powerful tool for manipulating mammalian genomes. With CRISPR, we can finally perform classic genetic screens on our own genome to understand its elements functionally. With more precise techniques, people are now also digging into sequence variants, such as SNPs and CNVs, and their relationships with eQTLs, TF binding preferences, and variation in DNA looping. In addition, to profile transcription at higher resolution, especially in heterogeneous populations such as cancer tissues, stem cells, immune cell populations, and brain structures, people have developed single-cell genomics, which aims to dissect cell populations, map cell-to-cell communication, and finally bring genome science to its highest resolution: the single-cell level.

It is very exciting, and it will be even more exciting in the future, because we can not only manipulate genomes but also synthesize them. We aim to understand transcriptional networks systematically; to trace the developmental lineages and transcriptome cascades of cellular systems at the single-cell level; to understand the causal functions of DNA elements together with the epigenome (the multiple layers of information lying on top of our linear sequence); to synthesize larger-scale transcriptional modules; and to design sophisticated artificial parts for our needs. That will be the moment when we can finally hack and unlock our genome and begin a new era of biology: a new era for knowledge and for all mankind.