| Title: | A Survival Tree Based on Stabilized Score Tests for High-dimensional Covariates |
|---|---|
| Description: | A classification (decision) tree is constructed from survival data with high-dimensional covariates. The method is a robust version of the logrank tree, where the variance is stabilized. The main function "uni.tree" returns a classification tree for a given survival dataset. The inner nodes (splitting criterion) are selected by minimizing the P-value of the two-sample the score tests. The decision of declaring terminal nodes (stopping criterion) is the P-value threshold given by an argument (specified by user). This tree construction algorithm is proposed by Emura et al. (2021, in review). |
| Authors: | Takeshi Emura and Wei-Chern Hsu |
| Maintainer: | Takeshi Emura <[email protected]> |
| License: | GPL-3 |
| Version: | 1.5 |
| Built: | 2026-06-02 07:31:24 UTC |
| Source: | https://github.com/cran/uni.survival.tree |
The function returns the names of features (covariates) that are selected as the internal nodes of a tree. Only the names of the covariates are shown by excluding the cutt-off values.
feature.selected(tree)feature.selected(tree)
tree |
:an object made from the "uni.tree" function |
The outputs show important features for predicting survival outcomes.
An array of characters that are the names from those covariates selected in the tree
data(Lung,package="compound.Cox") train_Lung=Lung[which(Lung[,"train"]==TRUE),] #select training data t.vec=train_Lung[,1] d.vec=train_Lung[,2] x.mat=train_Lung[,-c(1,2,3)] res=uni.tree(t.vec,d.vec,x.mat,P.value=0.01,d0=0.01,S.plot=FALSE,score=TRUE) feature.selected(res)data(Lung,package="compound.Cox") train_Lung=Lung[which(Lung[,"train"]==TRUE),] #select training data t.vec=train_Lung[,1] d.vec=train_Lung[,2] x.mat=train_Lung[,-c(1,2,3)] res=uni.tree(t.vec,d.vec,x.mat,P.value=0.01,d0=0.01,S.plot=FALSE,score=TRUE) feature.selected(res)
Given a cut-off-point and selected covariate, return the survival curve for binary classification and the P-value of two sample log-rank test.
KM.split(t.vec, d.vec, X.mat, x.name, cutoff)KM.split(t.vec, d.vec, X.mat, x.name, cutoff)
t.vec |
:Vector of survival times (time to either death or censoring) |
d.vec |
:Vector of censoring indicators (1=death, 0=censoring) |
X.mat |
:n by p matrix of covariates, where n is the sample size and p is the number of covariates |
x.name |
:the name of covariate |
cutoff |
:cut-off-point |
P-value of two sample logrank test and a plot of two KM estimates
data(Lung,package="compound.Cox") train_Lung=Lung[which(Lung[,"train"]==TRUE),] #select training data t.vec=train_Lung[,1] d.vec=train_Lung[,2] x.mat=train_Lung[,-c(1,2,3)] KM.split(t.vec,d.vec,x.mat,x.name="ANXA5",cutoff=1)data(Lung,package="compound.Cox") train_Lung=Lung[which(Lung[,"train"]==TRUE),] #select training data t.vec=train_Lung[,1] d.vec=train_Lung[,2] x.mat=train_Lung[,-c(1,2,3)] KM.split(t.vec,d.vec,x.mat,x.name="ANXA5",cutoff=1)
The function returns the ranks (1=the lowest risk, 2=the 2nd lowest risk, ..., k=the highest risk) predicted for the samples.
risk.classification(tree, X.mat)risk.classification(tree, X.mat)
tree |
:an object made from the "uni.tree" function |
X.mat |
:n by p matrix of covariates from the samples, where n is the sample size and p is the number of covariates |
If the tree has k terminal nodes, then the response 1 respresents the lowest risk and k represents the highest risk.
A vector of integers, 1, 2, ..., k, that represent the ranks predicted for the samples.
data(Lung,package="compound.Cox") train_Lung=Lung[which(Lung[,"train"]==TRUE),] #select training data t.vec=train_Lung[,1] d.vec=train_Lung[,2] x.mat=train_Lung[,-c(1,2,3)] res=uni.tree(t.vec,d.vec,x.mat,P.value=0.01,d0=0.01,S.plot=FALSE,score=TRUE) risk.classification(res,x.mat)data(Lung,package="compound.Cox") train_Lung=Lung[which(Lung[,"train"]==TRUE),] #select training data t.vec=train_Lung[,1] d.vec=train_Lung[,2] x.mat=train_Lung[,-c(1,2,3)] res=uni.tree(t.vec,d.vec,x.mat,P.value=0.01,d0=0.01,S.plot=FALSE,score=TRUE) risk.classification(res,x.mat)
The output is the summary of significance tests for binary splits, where the cut-off values are optimized for each covariate.
uni.logrank(t.vec, d.vec, X.mat)uni.logrank(t.vec, d.vec, X.mat)
t.vec |
:Vector of survival times (time to either death or censoring) |
d.vec |
:Vector of censoring indicators (1=death, 0=censoring) |
X.mat |
:n by p matrix of covariates, where n is the sample size and p is the number of covariates |
The output can be used to construct a logrank tree.
A dataframe containing:
Pvalue: the P-value of the two-sample logrank test, where the cut-off value is optimized
cut_off_point: the optimal cutt-off values of the binary splits given a feature
left.sample.size: the sample size of a left child node
right.sample.size: the sample size of a right child node
data(Lung,package="compound.Cox") train_Lung=Lung[which(Lung[,"train"]==TRUE),] #select training data t.vec=train_Lung[,1] d.vec=train_Lung[,2] x.mat=train_Lung[,-c(1,2,3)] uni.logrank(t.vec,d.vec,x.mat)data(Lung,package="compound.Cox") train_Lung=Lung[which(Lung[,"train"]==TRUE),] #select training data t.vec=train_Lung[,1] d.vec=train_Lung[,2] x.mat=train_Lung[,-c(1,2,3)] uni.logrank(t.vec,d.vec,x.mat)
This function returns a classification (decision) tree for a given survival dataset. The decision of making inner nodes (splitting criterion) is based on the univariate score tests. The decision of declaring terminal nodes (stopping criterion) is the P-value threshold given by an argument. This tree construction algorithm is proposed by Emura et al. (2021).
uni.tree( t.vec, d.vec, X.mat, P.value = 0.01, d0 = 0.01, S.plot = FALSE, score = TRUE )uni.tree( t.vec, d.vec, X.mat, P.value = 0.01, d0 = 0.01, S.plot = FALSE, score = TRUE )
t.vec |
:Vector of survival times (time to either death or censoring) |
d.vec |
:Vector of censoring indicators (1=death, 0=censoring) |
X.mat |
:n by p matrix of covariates (features), where n is the sample size and p is the number of covariates |
P.value |
:the threshold of P-value for stop splitting (stopping criterion) |
d0 |
:A positive constant to stabilize the variance of score statistics (Witten & Tibshirani 2010) |
S.plot |
:call for plot the KM estimator for each split |
score |
:TRUE = score test (Emura et al. 2019); FALSE = log-rank test |
In order to stabilize the univariate score tests, a small value "d0" is added to the variance of the score statistics (Witten and Tibshirani 2010). d0=0 corresponds to the logrank test. To perform a large number of the score tests, the "compound.Cox" packages (Emura et al.2019) is applied with d0 as a option.
A nested list describing a classification tree, consisting of inner nodes and terminal node.
Emura T, Hsu WC, Chou WC (2021). A survival tree based on stabilized score tests for high-dimensional covariates, in review
Emura T, Matsui S, Chen HY (2019). compound.Cox: Univariate Feature Selection and Compound Covariate for Predicting Survival, Computer Methods and Programs in Biomedicine 168: 21-37.
Witten DM, Tibshirani R (2010) Survival analysis with high-dimensional covariates. Stat Method Med Res 19:29-51
data(Lung,package="compound.Cox") train_Lung=Lung[which(Lung[,"train"]==TRUE),] #select training data t.vec=train_Lung[,1] d.vec=train_Lung[,2] x.mat=train_Lung[,-c(1,2,3)] uni.tree(t.vec,d.vec,x.mat,P.value=0.01,d0=0.01,S.plot=FALSE,score=TRUE)data(Lung,package="compound.Cox") train_Lung=Lung[which(Lung[,"train"]==TRUE),] #select training data t.vec=train_Lung[,1] d.vec=train_Lung[,2] x.mat=train_Lung[,-c(1,2,3)] uni.tree(t.vec,d.vec,x.mat,P.value=0.01,d0=0.01,S.plot=FALSE,score=TRUE)
Generate a matrix of gene expressions in the presence of gene pathways, we first produce the matrix by X.pathway(Emura et al. 2012), then we change each value to 1 ~ 4 depend on the quantile.
X.pathway_discrete.balanced(n, p, q1, q2, rho1 = 0.5, rho2 = 0.5)X.pathway_discrete.balanced(n, p, q1, q2, rho1 = 0.5, rho2 = 0.5)
n |
:the number of individuals (sample size) |
p |
:the number of genes |
q1 |
:the number of genes in the first pathway |
q2 |
:the number of genes in the second pathway |
rho1 |
:the correlation coefficient for the first pathway |
rho2 |
:the correlation coefficient for the second pathway |
X n by p matrix of gene expressions
Emura T, Chen YH, Chen HY (2012). Survival Prediction Based on Compound Covariate under Cox Proportional Hazard Models. PLoS ONE 7(10): e47627. doi:10.1371/journal.pone.0047627
## generate 6 gene expressions from 10 individuals X.pathway_discrete.balanced(n=10,p=6,q1=2,q2=2,rho1=0.5,rho2=0.5)## generate 6 gene expressions from 10 individuals X.pathway_discrete.balanced(n=10,p=6,q1=2,q2=2,rho1=0.5,rho2=0.5)
Generate a matrix of gene expressions in the presence of gene pathways, we first produce the matrix by X.pathway(Emura et al. 2012), then we change each value to 1 ~ 3 depend on the quantile and randomly replace a element to 4 in the last p-(q1+q2) column for each row.
X.pathway_discrete.imbalanced(n, p, q1, q2, rho1 = 0.5, rho2 = 0.5)X.pathway_discrete.imbalanced(n, p, q1, q2, rho1 = 0.5, rho2 = 0.5)
n |
:the number of individuals (sample size) |
p |
:the number of genes |
q1 |
:the number of genes in the first pathway |
q2 |
:the number of genes in the second pathway |
rho1 |
:the correlation coefficient for the first pathway |
rho2 |
:the correlation coefficient for the second pathway |
X n by p matrix of gene expressions
Emura T, Chen YH, Chen HY (2012). Survival Prediction Based on Compound Covariate under Cox Proportional Hazard Models. PLoS ONE 7(10): e47627. doi:10.1371/journal.pone.0047627
## generate 6 gene expressions from 10 individuals X.pathway_discrete.imbalanced(n=10,p=6,q1=2,q2=2,rho1=0.5,rho2=0.5)## generate 6 gene expressions from 10 individuals X.pathway_discrete.imbalanced(n=10,p=6,q1=2,q2=2,rho1=0.5,rho2=0.5)