Removing cell-cycle effects from a single-cell transcriptome expression matrix

Background

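As early as 2015, a paper in Nat. Biotechnol. proposed scLVM (single-cell latent variable model) for removing cell-cycle effects from single-cell transcriptome data. However, scLVM only considers genes directly related to the cell cycle, and it does not take cell type into account. In practice, cells of different types express cell-cycle genes differently even when they are at the same cell-cycle stage. More importantly, many genes that are not directly related to the cell cycle also need to be considered.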

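The authors therefore developed ccRemover to remove cell-cycle effects from single-cell transcriptome data. It was published in 2016 in SR (Scientific Reports).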

Reading the authors' vignette is enough to learn how to use it: https://cran.r-project.org/web/packages/ccRemover/vignettes/ccRemover_tutorial.html

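Using the R package

The core code is a single call

As you can see, the main step is to run the ccRemover function on our single-cell expression matrix.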

xhat <- ccRemover(dat, bar=FALSE)
xhat <- ccRemover(dat, cutoff = 3, max_it = 4, nboot = 200, ntop = 10, bar=FALSE)

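You can run it with the defaults, or add a series of parameters once you understand what they do. The expression matrix must already be normalized. The details are explained below.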
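The post does not say which normalization to use, so the following is only a sketch of one common choice (counts per million followed by log2(x + 1)) applied to a made-up count matrix. The names `toy_counts`, `toy_cpm` and `toy_log_expr` are hypothetical, and a different normalization may suit your data better.

```r
# Toy raw count matrix: genes in rows, cells in columns (values are made up).
toy_counts <- matrix(
  c(10, 0,  5,
    20, 4,  0,
     0, 6, 15),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2", "cell3"))
)

# Scale each cell (column) to a library size of one million.
toy_lib_size <- colSums(toy_counts)
toy_cpm <- sweep(toy_counts, 2, toy_lib_size, FUN = "/") * 1e6

# Log-transform; the +1 keeps zero counts at zero.
toy_log_expr <- log2(toy_cpm + 1)

# After scaling, every cell has the same total on the CPM scale.
print(colSums(toy_cpm))
```

The result keeps the genes-by-cells layout that the rest of this post assumes.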

Preprocessing the expression matrix

First, install and load the package and the test data

# BiocInstaller::biocLite("ccRemover")
library(ccRemover)
data(t.cell_data)
## the expression matrix looks like this:
head(t.cell_data[,1:5])

##         Cell1   Cell2   Cell3   Cell4  Cell5
## Gnai3 3.12560 0.86096 2.62610 3.25490 3.7152
## Cdc45 3.09290 0.11331 3.51750 0.47994 2.5712
## Narf  0.25414 0.00000 1.79130 0.00000 1.4729
## Klf6  1.66590 3.00130 1.92170 1.93360 4.1109
## Scmh1 0.25414 0.11331 0.00000 0.00000 1.7960
## Wnt3  0.52967 0.27746 0.61211 0.60523 3.5357


## summary of each gene's mean expression across samples
summary(apply(t.cell_data,1, mean))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  0.0392  0.8990  1.5738  1.6202  2.2817  5.1372


## summary of each sample's mean expression across all genes
summary(apply(t.cell_data,2, mean))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  0.3997  1.3119  1.6681  1.6202  1.9348  2.7493


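## keep each gene's mean expression across all samples for later use
mean_gene_exp <- rowMeans(t.cell_data)
## expression matrix after subtracting each gene's mean
t_cell_data_cen <- t.cell_data - mean_gene_exp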
## the per-gene means are now close to 0
summary(apply(t_cell_data_cen,1,mean))
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
## -4.441e-16 -5.211e-17 -6.924e-19 -1.453e-18  4.661e-17  3.838e-16
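The centering step relies on how R recycles a vector against a matrix, which is easy to get wrong. Below is a minimal sketch on a made-up matrix (all `toy_` names are hypothetical) showing that subtracting a vector with one entry per row removes each row's mean, and that adding the same vector back restores the original values.

```r
# Toy expression matrix: 3 genes (rows) x 4 cells (columns); values are made up.
toy_expr <- matrix(
  c(1, 2, 3, 4,
    0, 0, 2, 2,
    5, 5, 5, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), paste0("cell", 1:4))
)

# Per-gene means, kept so they can be added back later.
toy_gene_means <- rowMeans(toy_expr)

# A vector with one entry per row is recycled down each column,
# so entry i is subtracted from every value in row i.
toy_expr_cen <- toy_expr - toy_gene_means

# Row means of the centered matrix are zero (up to floating-point error).
print(rowMeans(toy_expr_cen))

# Adding the means back restores the original matrix.
toy_restored <- toy_expr_cen + toy_gene_means
print(all.equal(toy_restored, toy_expr))
```

This only works because genes are in rows. If genes were in columns, the same subtraction would silently recycle the wrong way; `sweep(toy_expr, 1, toy_gene_means)` is an equivalent form that states the margin explicitly.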


gene_names <- rownames(t_cell_data_cen)
head(gene_names)
## [1] "Gnai3" "Cdc45" "Narf"  "Klf6"  "Scmh1" "Wnt3"


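## gene_indexer uses the package's built-in cell-cycle gene list to find which genes in our expression matrix belong to it
## note: the messages below show that "symbols" was not accepted as a name type, so the function fell back to guessing the format
cell_cycle_gene_indices <- gene_indexer(gene_names, species = "mouse", name_type = "symbols")
## Invalid name type input. Switching to NULLNo name format input.
## Checking to see if match can be found:
## Best guess is symbol IDs
## 751 matches out of a possible 7073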


if_cc <- rep(FALSE, nrow(t_cell_data_cen))
if_cc[cell_cycle_gene_indices] <- TRUE
summary(if_cc)
##    Mode   FALSE    TRUE
## logical    6322     751
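The flagging step can be illustrated without the package. This is a minimal sketch using base R matching against a small illustrative gene list; it is not how `gene_indexer` is implemented, and `toy_gene_names`, `toy_cc_genes` and `toy_if_cc` are hypothetical names.

```r
# Gene names as they might appear in the rows of an expression matrix.
toy_gene_names <- c("Gnai3", "Cdc45", "Narf", "Klf6", "Scmh1", "Wnt3")

# A short illustrative cell-cycle list (not the package's built-in list).
toy_cc_genes <- c("Cdc45", "Ccnb1", "Mki67")

# Row positions of the matrix genes that appear in the cell-cycle list.
toy_cc_idx <- which(toy_gene_names %in% toy_cc_genes)

# Logical indicator with one entry per row of the matrix.
toy_if_cc <- rep(FALSE, length(toy_gene_names))
toy_if_cc[toy_cc_idx] <- TRUE

print(toy_cc_idx)
print(table(toy_if_cc))
```

The indicator must have the same length and order as the rows of the matrix, since the two are later passed together in one list.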


## build the input: the expression matrix plus the flags marking which genes belong to the cell-cycle gene set
dat <- list(x=t_cell_data_cen, if_cc=if_cc)
xhat <- ccRemover(dat, bar=FALSE)

## 0.1061784 of genes are cell-cycle genes
## Iteration 1 ...
## Bootstrapping...The bootstrap results on the top 10 components are:
##       xn_load   xy_load   diff_load t_load_boot
## PC1  4.764066 5.2458626  0.48179702   6.7167008
## PC2  1.899494 1.9400189  0.04052475   0.6834059
## PC3  1.307673 1.4149245  0.10725191   1.6660350
## PC4  1.134286 1.3889111  0.25462511   1.9876191
## PC5  1.108502 1.1296735  0.02117137   0.1670011
## PC6  1.198600 1.1571330 -0.04146651  -0.4241297
## PC7  1.076888 0.9899750 -0.08691289  -1.0550547
## PC8  1.045047 0.9508303 -0.09421625  -1.1941491
## PC9  1.209063 1.2256193  0.01655664   0.2071162
## PC10 1.167671 1.2846210  0.11695048   1.4982856
## The follow components are removed: 1
##
## Iteration 2 ...
## Bootstrapping...The bootstrap results on the top 10 components are:
##       xn_load   xy_load   diff_load t_load_boot
## PC1  1.899494 1.9400189  0.04052475   0.8808403
## PC2  1.307673 1.4149245  0.10725191   1.5588938
## PC3  1.134286 1.3889111  0.25462511   1.8225974
## PC4  1.108502 1.1296735  0.02117137   0.1959952
## PC5  1.198600 1.1571330 -0.04146651  -0.4472453
## PC6  1.076888 0.9899750 -0.08691289  -1.2271217
## PC7  1.045047 0.9508303 -0.09421625  -1.3564926
## PC8  1.209063 1.2256193  0.01655664   0.2235150
## PC9  1.167671 1.2846210  0.11695048   1.4904740
## PC10 1.009986 0.9815659 -0.02841966  -0.4561793
## No more cell-cycle effect is detected.

# finally, add the gene means back
xhat <- xhat + mean_gene_exp


Advanced parameters

xhat <- ccRemover(dat, cutoff = 3, max_it = 4, nboot = 200, ntop = 10, bar=FALSE)


## 0.1061784 of genes are cell-cycle genes
## Iteration 1 ...
## Bootstrapping...The bootstrap results on the top 10 components are:
##       xn_load   xy_load   diff_load t_load_boot
## PC1  4.764066 5.2458626  0.48179702   7.0666399
## PC2  1.899494 1.9400189  0.04052475   0.7109023
## PC3  1.307673 1.4149245  0.10725191   1.4957446
## PC4  1.134286 1.3889111  0.25462511   1.9186261
## PC5  1.108502 1.1296735  0.02117137   0.1781067
## PC6  1.198600 1.1571330 -0.04146651  -0.4179298
## PC7  1.076888 0.9899750 -0.08691289  -1.1317599
## PC8  1.045047 0.9508303 -0.09421625  -1.3176625
## PC9  1.209063 1.2256193  0.01655664   0.2108930
## PC10 1.167671 1.2846210  0.11695048   1.4402412
## The follow components are removed: 1
##
## Iteration 2 ...
## Bootstrapping...The bootstrap results on the top 10 components are:
##       xn_load   xy_load   diff_load t_load_boot
## PC1  1.899494 1.9400189  0.04052475   0.8727071
## PC2  1.307673 1.4149245  0.10725191   1.6748962
## PC3  1.134286 1.3889111  0.25462511   1.9396499
## PC4  1.108502 1.1296735  0.02117137   0.1671651
## PC5  1.198600 1.1571330 -0.04146651  -0.4822679
## PC6  1.076888 0.9899750 -0.08691289  -1.2233668
## PC7  1.045047 0.9508303 -0.09421625  -1.4931038
## PC8  1.209063 1.2256193  0.01655664   0.2173142
## PC9  1.167671 1.2846210  0.11695048   1.5028994
## PC10 1.009986 0.9815659 -0.02841966  -0.3995873
## No more cell-cycle effect is detected.

It is best to read the authors' original English descriptions directly:

The ‘cutoff’ is used to determine which of the effects are cell-cycle effects. The default and recommended value is 3, which roughly corresponds to a p-value of 0.01. For data sets which have very low levels of cell-cycle activity this value can be lowered to increase the detection of cell-cycle effects. Example 3 in the original manuscript was a case where a lower value of the cutoff was necessary.

The ‘max_it’ value is the maximum number of iterations of the method. ccRemover will stop whenever it detects no more significant effects present in the data or it reaches its maximum number of iterations. The default value is 4 but we have found that for many data sets the cell-cycle effect will be effectively removed after 1 or 2 iterations.

The ‘nboot’ value corresponds to the number of bootstrap repetitions carried out to test the significance of the components. Please refer to the methods section of the original manuscript for a detailed description of the estimation process. The default value is 200 and we have found this to work effectively for most data sets.

The ‘ntop’ parameter determines the number of principal components which are to be tested as cell-cycle effects upon each iteration. We have found that the default value of 10 works effectively for most data sets. However, for extremely diverse data sets within which there are expected to be many sources of variation this value could be raised so that all elements of the cell-cycle effect can be identified and removed.
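To make the role of `cutoff` concrete, the sketch below re-reads the first-iteration bootstrap table printed earlier in this post. The values are copied from that log; the comparison rule (a component is flagged when `t_load_boot` reaches the cutoff) is an assumption based on the description above, not code taken from the package.

```r
# Values copied from the first-iteration log of the default run.
toy_boot <- data.frame(
  xn_load     = c(4.764066, 1.899494, 1.307673, 1.134286, 1.108502,
                  1.198600, 1.076888, 1.045047, 1.209063, 1.167671),
  xy_load     = c(5.2458626, 1.9400189, 1.4149245, 1.3889111, 1.1296735,
                  1.1571330, 0.9899750, 0.9508303, 1.2256193, 1.2846210),
  t_load_boot = c(6.7167008, 0.6834059, 1.6660350, 1.9876191, 0.1670011,
                  -0.4241297, -1.0550547, -1.1941491, 0.2071162, 1.4982856),
  row.names   = paste0("PC", 1:10)
)

# In the log, diff_load is the difference between the two loadings.
toy_boot$diff_load <- toy_boot$xy_load - toy_boot$xn_load

# Assumption: a component counts as a cell-cycle effect when its
# bootstrap statistic reaches the cutoff.
toy_cutoff <- 3
toy_flagged <- which(toy_boot$t_load_boot >= toy_cutoff)
print(rownames(toy_boot)[toy_flagged])
```

With these values only PC1 reaches the cutoff of 3, which agrees with the log line reporting that component 1 was removed. Applying a cutoff of 1.5 to the same table would also flag PC3 and PC4, which illustrates why lowering the cutoff increases detection.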

The authors tested the method on real data, including the human glioblastomas data, which you can try to reproduce yourself.
