Supplementary Material for “Sample Size Formulae for Two-Stage Randomized Trials with Censored Data” by Zhiguo Li and Susan Murphy

1 Sample Size Formulae for More General Two-Stage Randomized Trials

In the simple two-stage sequential multiple assignment randomized trial considered in the paper, nonresponders to both first-stage treatments are rerandomized to one of two second-stage treatments, while responders are not rerandomized. In practice, many variants of two-stage designs exist. For example, in a two-stage sequential multiple assignment randomized trial for drug-addicted pregnant women, currently underway and conducted by Hendree Jones at RTI International, Research Triangle Park, North Carolina, USA, both responders and nonresponders to the first stage of treatment are rerandomized. Initially, all subjects are randomized to either a traditional reinforcement-based treatment or a reinforcement-based treatment with reduced intensity/scope. Response to the initial treatment is assessed at week 2; the criterion for nonresponse involves adherence to treatment as well as positive urine tests. Early nonresponders are rerandomized to receive either the same treatment or a greater intensity/scope of their initial treatment, while responders are rerandomized to either the same intensity of treatment or a decreased intensity/scope of the initial treatment. An outcome of interest in this trial is time until dropout from counseling. In another trial, concerning the treatment of children with autism and conducted by Connie Kasari at the University of California, Los Angeles, only nonresponders to one of the two first-stage treatments are rerandomized. In this trial, all subjects are initially randomized to either joint attention/joint engagement supplemented with an individualized augmentative/alternative communication system, or joint attention/joint engagement and spoken communication intervention, both lasting 12 weeks.
At the end of the 12-week period, response is assessed using measures such as the number of words and the number of communicative functions used spontaneously during parent-child interaction. All responders stay on the initial treatment; nonresponders to joint attention/joint engagement supplemented with an individualized augmentative/alternative communication system start more intense treatment; and only nonresponders to joint attention/joint engagement and spoken communication intervention are further randomized, to one of two options: start more intense treatment, or switch to joint attention/joint engagement supplemented with an individualized augmentative/alternative communication system. The primary outcome of this trial is not a failure time, but the time until the number of words used spontaneously during parent-child interaction reaches a certain level could also be of interest.

The approach to sample size calculation developed in our paper generalizes to the following class of two-stage sequential multiple assignment randomized trials. Let $A_1$ denote the first-stage treatment, coded $1, 2, \ldots, k_1$. Responders to $A_1 = j$ are further randomized to one of the second-stage treatments $A^R_{2j} = 1, 2, \ldots, k^R_{2j}$, and nonresponders to $A_1 = j$ are further randomized to one of the second-stage treatments $A^N_{2j} = 1, 2, \ldots, k^N_{2j}$, for $j = 1, 2, \ldots, k_1$. The two-stage sequential multiple assignment randomized trial considered in the paper is the special case $k_1 = 2$, $k^N_{21} = k^N_{22} = 2$ and $k^R_{21} = k^R_{22} = 1$. The test statistic based on the weighted Kaplan–Meier estimator and the weighted log rank test statistic for comparing two treatment strategies are defined as before; only the weights need to be modified, as the following examples show.
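The general design just described can be encoded directly. The sketch below is a hypothetical helper, not code from the paper: it computes the time-independent inverse-probability weight of one subject for an embedded strategy $(j, r, m)$ (give $A_1 = j$, then option $r$ to responders and option $m$ to nonresponders), assuming the weight has the product form used in the examples that follow, with probability 1 for any stage that offers a single option. It also counts the embedded strategies, $\sum_j k^R_{2j} k^N_{2j}$.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Subject:
    a1: int              # first-stage treatment A_1, coded 1..k1
    responder: bool      # R = 1 if the subject responded to A_1
    a2: Optional[int]    # second-stage option received (1 if only one option exists)


def strategy_weight(subj: Subject, strategy: Tuple[int, int, int],
                    p1: Dict[int, float],
                    pR: Dict[int, Dict[int, float]],
                    pN: Dict[int, Dict[int, float]]) -> float:
    """Hypothetical helper: inverse-probability weight for strategy (j, r, m).

    p1[j] = pr(A_1 = j); pR[j][r] = pr(A^R_{2j} = r); pN[j][m] = pr(A^N_{2j} = m).
    Subjects whose treatments are inconsistent with the strategy get weight 0.
    """
    j, r, m = strategy
    if subj.a1 != j:
        return 0.0
    if subj.responder:
        return 1.0 / (p1[j] * pR[j][r]) if subj.a2 == r else 0.0
    return 1.0 / (p1[j] * pN[j][m]) if subj.a2 == m else 0.0


def count_strategies(kR: Dict[int, int], kN: Dict[int, int]) -> int:
    """Each first-stage option j embeds k^R_{2j} * k^N_{2j} strategies."""
    return sum(kR[j] * kN[j] for j in kR)


if __name__ == "__main__":
    # Design of the paper: responders keep one option, nonresponders are
    # rerandomized with probability 1/2; first stage randomized 1:1.
    p1 = {1: 0.5, 2: 0.5}
    pR = {1: {1: 1.0}, 2: {1: 1.0}}
    pN = {1: {1: 0.5, 2: 0.5}, 2: {1: 0.5, 2: 0.5}}
    print(strategy_weight(Subject(1, False, 1), (1, 1, 1), p1, pR, pN))
    print(count_strategies({1: 1, 2: 1}, {1: 2, 2: 2}))
```

For the paper's design the count is 4 (strategies 11, 12, 21 and 22); when both responders and nonresponders choose between two options, as in the trial for drug-addicted pregnant women, it is 8.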
Further, in these more general designs the asymptotic variance formulae in Theorems 1 and 3 (Section 2 below) remain unchanged, except that the weight functions are different. Different weight functions lead to different upper bounds on the variances and hence to different sample size formulae. We illustrate this with two examples.

First consider a design in which both responders and nonresponders are rerandomized, as in the trial for drug-dependent pregnant women. Suppose all subjects are first randomized to treatment $A_1 = j$ with probability $p_{1j}$, $j = 1, 2, \ldots, k_1$. Those who respond to $A_1 = j$ are further randomized to $A^R_{2j} = i$ with probability $p^{Ri}_{2j}$, and those who do not respond to $A_1 = j$ are further randomized to $A^N_{2j} = i$ with probability $p^{Ni}_{2j}$. Consider two strategies: strategy 111, assign $A_1 = 1$ initially, then $A^R_{21} = 1$ to responders and $A^N_{21} = 1$ to nonresponders; and strategy 222, assign $A_1 = 2$ initially, then $A^R_{22} = 2$ to responders and $A^N_{22} = 2$ to nonresponders. Let $R$ be the indicator of response. In the notation of the paper, $R = I\{S > \min(T, C)\}$, where $S$ is the time to nonresponse, $T$ is the failure time and $C$ is the censoring time. Some trials instead define a criterion for response; in that case, if $S^*$ denotes the time to response, then $R = I\{S^* \le \min(T, C)\}$. With this notation, the time-independent weight function for strategy 111 is
$$W_1 = \frac{I(A_1 = 1)}{p_{11}}\left\{\frac{R\, I(A^R_{21} = 1)}{p^{R1}_{21}} + \frac{(1 - R)\, I(A^N_{21} = 1)}{p^{N1}_{21}}\right\},$$
and the weight function for strategy 222 is
$$W_2 = \frac{I(A_1 = 2)}{p_{12}}\left\{\frac{R\, I(A^R_{22} = 2)}{p^{R2}_{22}} + \frac{(1 - R)\, I(A^N_{22} = 2)}{p^{N2}_{22}}\right\}.$$

If we use the test based on the weighted Kaplan–Meier estimator with time-independent weights to calculate the sample size, we need upper bounds for
$$E\left[\int_0^\tau \frac{W_j}{\bar F_j(u)\bar F_C(u)}\{dN(u) - Y(u)\,d\Lambda_j(u)\}\right]^2, \quad j = 1, 2,$$
where $\bar F_j(t)$ and $\Lambda_j(t)$ are the survival function and cumulative hazard function of $T_j$, the potential failure time under strategy $jjj$, $j = 1, 2$. Denote the counting process and the at-risk process of $T_j$ by $N_j(t)$ and $Y_j(t)$, respectively, for $j = 1, 2$. Since $W_1 \neq 0$ only for subjects whose treatments are consistent with strategy 111, for whom $N = N_1$ and $Y = Y_1$, repeated expectations give, for $j = 1$,
$$\begin{aligned}
E\left[\int_0^\tau \frac{W_1}{\bar F_1(u)\bar F_C(u)}\{dN(u) - Y(u)\,d\Lambda_1(u)\}\right]^2
&= E\left[W_1^2\left\{\int_0^\tau \frac{dN_1(u) - Y_1(u)\,d\Lambda_1(u)}{\bar F_1(u)\bar F_C(u)}\right\}^2\right]\\
&= E\left[E\left\{\frac{1}{p_{11}}\left(\frac{1}{p^{N1}_{21}} + \left(\frac{1}{p^{R1}_{21}} - \frac{1}{p^{N1}_{21}}\right)R\right) \,\middle|\, T_1, C\right\}\left\{\int_0^\tau \frac{dN_1(u) - Y_1(u)\,d\Lambda_1(u)}{\bar F_1(u)\bar F_C(u)}\right\}^2\right] \qquad (1)\\
&\le \begin{cases}
\dfrac{1}{p_{11}p^{R1}_{21}}\, E\left[\displaystyle\int_0^\tau \frac{dN_1(u) - Y_1(u)\,d\Lambda_1(u)}{\bar F_1(u)\bar F_C(u)}\right]^2, & p^{R1}_{21} \le p^{N1}_{21},\\[2ex]
\dfrac{1}{p_{11}p^{N1}_{21}}\, E\left[\displaystyle\int_0^\tau \frac{dN_1(u) - Y_1(u)\,d\Lambda_1(u)}{\bar F_1(u)\bar F_C(u)}\right]^2, & p^{R1}_{21} > p^{N1}_{21},
\end{cases}\\
&= \frac{1}{p_{11}\min(p^{R1}_{21}, p^{N1}_{21})}\int_0^\tau \frac{d\Lambda_1(u)}{\bar F_1(u)\bar F_C(u)}.
\end{aligned}$$
The inequality holds because the conditional expectation in (1) is linear in $E(R \mid T_1, C) \in [0, 1]$, and the last equality follows because $N_1(t) - \int_0^t Y_1(u)\,d\Lambda_1(u)$ is a martingale with $E\{Y_1(u)\} = \bar F_1(u)\bar F_C(u)$. Similarly, for $j = 2$,
$$E\left[\int_0^\tau \frac{W_2}{\bar F_2(u)\bar F_C(u)}\{dN(u) - Y(u)\,d\Lambda_2(u)\}\right]^2 \le \frac{1}{p_{12}\min(p^{R2}_{22}, p^{N2}_{22})}\int_0^\tau \frac{d\Lambda_2(u)}{\bar F_2(u)\bar F_C(u)}.$$
It follows that the sample size for comparing strategies 111 and 222 satisfies
$$n_K \le \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2\,\sigma_B^2}{\{\bar F_2(\tau) - \bar F_1(\tau)\}^2},$$
so that using the right-hand side is conservative, where
$$\sigma_B^2 = \frac{\bar F_1^2(\tau)}{p_{11}\min(p^{R1}_{21}, p^{N1}_{21})}\int_0^\tau \frac{d\Lambda_1(u)}{\bar F_1(u)\bar F_C(u)} + \frac{\bar F_2^2(\tau)}{p_{12}\min(p^{R2}_{22}, p^{N2}_{22})}\int_0^\tau \frac{d\Lambda_2(u)}{\bar F_2(u)\bar F_C(u)}.$$
Similarly, the sample size based on the weighted log rank test with time-independent weights and upper bounds on the variances satisfies
$$n_L \le \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2}{\xi^2\int_0^\tau \bar F_C(u)\,dF_1(u)}\left\{\frac{1}{p_{11}\min(p^{R1}_{21}, p^{N1}_{21})} + \frac{1}{p_{12}\min(p^{R2}_{22}, p^{N2}_{22})}\right\},$$
where $\xi$ is the log hazard ratio between $T_1$ and $T_2$. From (1) we see that if the second-stage randomization probabilities are discrete uniform and the number of second-stage options is the same for responders and nonresponders to a given initial treatment, then $p^{R1}_{21} = p^{N1}_{21}$ and $p^{R2}_{22} = p^{N2}_{22}$, the upper bounds are exact, and there is no conservatism in the sample sizes.
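To make the two bounds concrete, the sketch below evaluates $n_K$ and $n_L$ under a purely illustrative parametric assumption that the text does not make: exponential failure times under both strategies (rates `lam1`, `lam2`) and exponential censoring (rate `mu`), for which $\int_0^\tau d\Lambda_j/(\bar F_j\bar F_C)$ and $\int_0^\tau \bar F_C\,dF_1$ have closed forms and the hazards are proportional. The function names are hypothetical; `c1` and `c2` stand for the variance factors $1/\{p_{11}\min(p^{R1}_{21}, p^{N1}_{21})\}$ and $1/\{p_{12}\min(p^{R2}_{22}, p^{N2}_{22})\}$.

```python
import math
from statistics import NormalDist


def variance_factor(p1j: float, pR: float, pN: float) -> float:
    """Upper bound 1 / {p_1j * min(p^R, p^N)} on E(W_j^2 | T_j, C), as in (1)."""
    return 1.0 / (p1j * min(pR, pN))


def sample_sizes_exponential(lam1, lam2, mu, tau, c1, c2, alpha=0.05, power=0.80):
    """Hypothetical helper: conservative (n_K, n_L), assuming exponential
    failure times (rates lam1, lam2) and exponential censoring (rate mu)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    surv1, surv2 = math.exp(-lam1 * tau), math.exp(-lam2 * tau)

    def hazard_integral(lam):
        # int_0^tau dLambda(u) / {Fbar(u) Fbar_C(u)} = lam/(lam+mu) * (e^{(lam+mu)tau} - 1)
        return lam / (lam + mu) * math.expm1((lam + mu) * tau)

    sigma2_B = (surv1 ** 2 * c1 * hazard_integral(lam1)
                + surv2 ** 2 * c2 * hazard_integral(lam2))
    n_K = z ** 2 * sigma2_B / (surv2 - surv1) ** 2

    # int_0^tau Fbar_C(u) dF_1(u) = lam1/(lam1+mu) * (1 - e^{-(lam1+mu)tau})
    event_prob = lam1 / (lam1 + mu) * -math.expm1(-(lam1 + mu) * tau)
    xi = math.log(lam2 / lam1)  # log hazard ratio between T_1 and T_2
    n_L = z ** 2 * (c1 + c2) / (xi ** 2 * event_prob)
    return math.ceil(n_K), math.ceil(n_L)


if __name__ == "__main__":
    c = variance_factor(0.5, 0.5, 0.5)  # 1:1 randomization at both stages
    print(sample_sizes_exponential(0.5, 0.3, 0.1, 2.0, c, c))
```

Here $Z_{1-\beta}$ is taken as the standard normal quantile at the target power $1 - \beta$. For the design in which only nonresponders to $A_1 = 1$ are rerandomized, the same helper can be reused with $c_1 = 1/(pq)$ and $c_2 = 1/(1 - p)$.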
This was the case in the trial for drug-addicted pregnant women described above. In this case the weights are constant across subjects consistent with a given strategy, and are therefore actually unnecessary. Of course, this cannot occur in designs in which some subjects are not rerandomized, for instance designs in which responders are not rerandomized, since the number of second-stage options for these subjects is 1.

For a second example, consider a design in which only nonresponders to the initial treatment $A_1 = 1$ are rerandomized to one of two second-stage treatments, while all other subjects receive a fixed second-stage treatment, as in the autism trial described above. In this design there are three possible strategies: strategy 11, assign $A_1 = 1$ first and then $A_2 = 1$ to nonresponders; strategy 12, assign $A_1 = 1$ first and then $A_2 = 2$ to nonresponders; and strategy 2, assign $A_1 = 2$ first and then a fixed treatment afterwards, which may or may not depend on response status. Suppose the randomization probabilities are $\mathrm{pr}(A_1 = 1) = p$ and $\mathrm{pr}(A_2 = 1 \mid R = 0) = q$, where $R$ is the indicator of response. With time-independent weights, the weight function for strategy 11 is the same as the weight function $W_1$ in the paper, apart from the meaning of $R$, while the weight function for strategy 2 is $W_2 = I(A_1 = 2)/(1 - p)$. Suppose first that we size the study using the test based on the weighted Kaplan–Meier estimator with time-independent weights for testing equality of the survival functions under strategies 11 and 2. As in Section 3 of the paper, we need upper bounds on
$$E\left[\int_0^\tau \frac{W_1}{\bar F_1(u)\bar F_C(u)}\{dN(u) - Y(u)\,d\Lambda_1(u)\}\right]^2 \quad\text{and}\quad E\left[\int_0^\tau \frac{W_2}{\bar F_2(u)\bar F_C(u)}\{dN(u) - Y(u)\,d\Lambda_2(u)\}\right]^2,$$
where $\bar F_1(u)$, $\bar F_2(u)$, $\Lambda_1(t)$ and $\Lambda_2(t)$ are the survival functions and cumulative hazard functions of the potential failure times under strategies 11 and 2, respectively.
The upper bound involving $W_1$ is the same as the upper bound in display (2) of the paper, since the formula for $W_1$ is unchanged. For $W_2$, however, no upper bound is needed: because $W_2$ involves no response indicator, the quantity can be calculated exactly as
$$E\left[\int_0^\tau \frac{W_2}{\bar F_2(u)\bar F_C(u)}\{dN(u) - Y(u)\,d\Lambda_2(u)\}\right]^2 = \frac{1}{1 - p}\int_0^\tau \frac{d\Lambda_2(u)}{\bar F_2(u)\bar F_C(u)}.$$
Consequently, the sample size using a test based on the weighted Kaplan–Meier estimator and upper bounds of variances satisfies
$$n_K \le \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2\,\sigma_B^2}{\{\bar F_2(\tau) - \bar F_1(\tau)\}^2},$$
where
$$\sigma_B^2 = \frac{\bar F_1^2(\tau)}{pq}\int_0^\tau \frac{d\Lambda_1(u)}{\bar F_1(u)\bar F_C(u)} + \frac{\bar F_2^2(\tau)}{1 - p}\int_0^\tau \frac{d\Lambda_2(u)}{\bar F_2(u)\bar F_C(u)}.$$
Similarly, the sample size derived from the weighted log rank test and upper bounds of variances satisfies
$$n_L \le \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2}{\xi^2\int_0^\tau \bar F_C(t)\,dF_1(t)}\left(\frac{1}{pq} + \frac{1}{1 - p}\right),$$
where $\xi$ is the log hazard ratio between the potential failure times under strategies 11 and 2.

Finally, sample size formulae for sequential multiple assignment randomized trials with more than two stages can be obtained in the same way as for the designs discussed above. Again, only the weight functions and the resulting upper bounds on the variances differ; the method for obtaining the upper bounds and the form of the bounds and of the sample size formulae are similar.

2 Asymptotic Results and Proofs

2.1 The weighted Kaplan–Meier estimators

The following theorem gives the asymptotic distribution of the weighted Kaplan–Meier estimator.

Theorem 1. Assume that $\bar F_j(t) > \delta_0$, $j = 1, 2$, and $\bar F_C(t) > \delta_0$ for some $\delta_0 > 0$. Then
$$n^{1/2}\{\hat{\bar F}_{Kj}(t) - \bar F_j(t)\} \to N\{0, \sigma^2_{Kj}(t)\}$$
in distribution as $n \to \infty$, $j = 1, 2$, where
$$\sigma^2_{Kj}(t) = \bar F_j^2(t)\, E\left[\int_0^t \frac{W_j}{\bar F_j(u)\bar F_C(u)}\{dN(u) - Y(u)\,d\Lambda_j(u)\}\right]^2 \qquad (2)$$
when time-independent weights are used, and
$$\sigma^2_{Kj}(t) = \bar F_j^2(t)\, E\left[\int_0^t \frac{W_j(u)}{\bar F_j(u)\bar F_C(u)}\{dN(u) - Y(u)\,d\Lambda_j(u)\}\right]^2 \qquad (3)$$
when time-dependent weights are used.

Proof.
We give the proof only for $\hat{\bar F}_{K1}(t)$ with time-dependent weights; the proofs for $\hat{\bar F}_{K2}(t)$ and for time-independent weights are parallel. We first note that martingale theory cannot easily be applied here. If the filtration $\mathcal F_t$ is defined in the usual way, then $N(t) - \int_0^t Y(u)\,d\Lambda(u)$ is a martingale with respect to $\mathcal F_t$, where $\Lambda(t)$ is the cumulative hazard function of the failure time $T$. However, our statistics involve weights that depend on another quantity, $S$, the time until nonresponse to the initial treatment, which is likely to be dependent on $T$. Since $S$ is not involved in the definition of $\mathcal F_t$, the weights are not predictable with respect to this filtration. Moreover, if $S$ is added to the definition of $\mathcal F_t$ to make the weights predictable, then the compensator of $N(t)$ is no longer $\int_0^t Y(u)\,d\Lambda(u)$, and its exact form is hard to find. Because of this difficulty, we use empirical process theory for the proof.

Let $X$ be the vector of observed data for a single subject, and let $P$ be the probability measure of $X$. Denote $\mathbb P_n f(X) = \sum_{i=1}^n f(X_i)/n$, $Pf = \int f\,dP$ and $\mathbb G_n f = n^{1/2}(\mathbb P_n - P)f$ for any function $f$ of $X$. Let $dN_{Wj}(u) = W_j(u)\,dN(u)$ and $Y_{Wj}(u) = W_j(u)Y(u)$, $j = 1, 2$. First, by Proposition 2 in Guo and Tsiatis (2005), for any function $\theta(u)$ on the real line,
$$P\int_0^t \frac{\theta(u)}{\bar F_1(u)}\{dN_{W1}(u) - Y_{W1}(u)\,d\Lambda_1(u)\} = 0.$$
It follows from this equality, together with an argument similar to the proof of Theorem 3.2.3 in Fleming and Harrington (1991), that
$$n^{1/2}\{\hat{\bar F}_{K1}(t) - \bar F_1(t)\} = -\bar F_1(t)\,\mathbb G_n\int_0^t \frac{\hat{\bar F}_{K1}(u-)}{\bar F_1(u)}\,\frac{1}{\bar Y_{W1}(u)}\{dN_{W1}(u) - Y_{W1}(u)\,d\Lambda_1(u)\}, \qquad (4)$$
where $\bar Y_{W1}(u) = \mathbb P_n Y_{W1}(u)$. We first show that $\hat{\bar F}_{K1}(u)$ is uniformly consistent on $[0, t]$ for any $0 < t \le \tau$. To do this, we need to show that certain classes of functions are Donsker (van der Vaart and Wellner, 1996, page 81).
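Define the classes of functions
$$\Phi = \{\phi(u) : \phi(u) \text{ is a monotone function from } [0, t] \text{ to } [\delta_0, 1]\},$$
$$\Theta = \left\{\theta(u) = \frac{\phi_1(u)}{\phi_2(u)} : \phi_1(u) \in \Phi,\ \phi_2(u) \in \Phi\right\},$$
and
$$\mathcal F = \left\{f_\theta(X) = \int_0^t \frac{\theta(u)}{\bar F_1(u)}\{dN_{W1}(u) - Y_{W1}(u)\,d\Lambda_1(u)\} : \theta(u) \in \Theta\right\}.$$
In what follows, $C$ denotes a generic constant. For any real function $f$ defined on $[0, t]$, let $\|f\|_1^2 = \int_0^t f^2(u)\,dF_1(u)$. By Theorem 2.7.5 in van der Vaart and Wellner (1996), the $\varepsilon$-bracketing number of $\Phi$ under the norm $\|\cdot\|_1$ is of order $K = \exp(C/\varepsilon)$ for a positive constant $C$. Let $[\phi^L_1(u), \phi^U_1(u)], \ldots, [\phi^L_K(u), \phi^U_K(u)]$ be $\varepsilon$-brackets covering $\Phi$. For any $\theta(u) \in \Theta$ there exist $\phi_1(u), \phi_2(u) \in \Phi$ such that $\theta(u) = \phi_1(u)/\phi_2(u)$. Let $[\phi^L_{r_1}(u), \phi^U_{r_1}(u)]$ and $[\phi^L_{r_2}(u), \phi^U_{r_2}(u)]$ be $\varepsilon$-brackets for $\phi_1(u)$ and $\phi_2(u)$, respectively. Then
$$\frac{\phi^L_{r_1}(u)}{\phi^U_{r_2}(u)} \le \theta(u) \le \frac{\phi^U_{r_1}(u)}{\phi^L_{r_2}(u)}$$
and
$$\left\|\frac{\phi^U_{r_1}}{\phi^L_{r_2}} - \frac{\phi^L_{r_1}}{\phi^U_{r_2}}\right\|_1^2 \le C\left\{\|\phi^U_{r_1} - \phi^L_{r_1}\|_1^2 + \|\phi^U_{r_2} - \phi^L_{r_2}\|_1^2\right\}.$$
This implies that the $\varepsilon$-bracketing number of $\Theta$ is of the same order as that of $\Phi$, which we also denote by $K = \exp(C/\varepsilon)$. Therefore there exist functions $\theta^L_j(u), \theta^U_j(u) \in \Theta$, $1 \le j \le K$, such that $\|\theta^L_j - \theta^U_j\|_1 \le \varepsilon$ for $1 \le j \le K$, and for any $\theta(u) \in \Theta$, $\theta^L_i(u) \le \theta(u) \le \theta^U_i(u)$ for some $1 \le i \le K$. Consequently, the function $f_\theta(X)$ in $\mathcal F$ satisfies
$$f_\theta(X) \ge \int_0^t \frac{\theta^L_i(u)}{\bar F_1(u)}\,dN_{W1}(u) - \int_0^t \frac{\theta^U_i(u)}{\bar F_1(u)}\,Y_{W1}(u)\,d\Lambda_1(u) \equiv f^L_i(X)$$
and
$$f_\theta(X) \le \int_0^t \frac{\theta^U_i(u)}{\bar F_1(u)}\,dN_{W1}(u) - \int_0^t \frac{\theta^L_i(u)}{\bar F_1(u)}\,Y_{W1}(u)\,d\Lambda_1(u) \equiv f^U_i(X).$$
Moreover, with $\|f\|_2^2 = Pf^2$, and using $d\Lambda_1(u) = dF_1(u)/\bar F_1(u) \le dF_1(u)/\delta_0$ on $[0, t]$,
$$\begin{aligned}
\|f^L_i - f^U_i\|_2 &= \left\|\int_0^t \frac{\theta^L_i(u) - \theta^U_i(u)}{\bar F_1(u)}\,dN_{W1}(u) - \int_0^t \frac{\theta^U_i(u) - \theta^L_i(u)}{\bar F_1(u)}\,Y_{W1}(u)\,d\Lambda_1(u)\right\|_2\\
&\le \left\|\int_0^t \frac{\theta^U_i(u) - \theta^L_i(u)}{\bar F_1(u)}\,dN_{W1}(u)\right\|_2 + \left\|\int_0^t \frac{\theta^U_i(u) - \theta^L_i(u)}{\bar F_1(u)}\,Y_{W1}(u)\,d\Lambda_1(u)\right\|_2\\
&\le C\left[\left\{\int_0^t \{\theta^U_i(u) - \theta^L_i(u)\}^2\,dF_1(u)\right\}^{1/2} + \left\{\int_0^t \{\theta^U_i(u) - \theta^L_i(u)\}^2\,d\Lambda_1(u)\right\}^{1/2}\right]\\
&\le C\left\{\int_0^t \{\theta^U_i(u) - \theta^L_i(u)\}^2\,dF_1(u)\right\}^{1/2} \le C\varepsilon.
\end{aligned}$$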
It follows that $f^L_1, f^U_1, \ldots, f^L_K, f^U_K$ are $C\varepsilon$-brackets under the $L_2(P)$ norm that cover $\mathcal F$. Hence the $\varepsilon$-bracketing number of $\mathcal F$ under the $L_2(P)$ norm is also of order $\exp(C/\varepsilon)$, so the bracketing entropy integral, bounded by $\int_0^1 (C/\varepsilon)^{1/2}\,d\varepsilon$, is finite, and by Theorem 2.5.6 in van der Vaart and Wellner (1996), $\mathcal F$ is a $P$-Donsker class. By the continuous mapping theorem, it follows that
$$\left|\mathbb G_n\int_0^t \frac{\hat{\bar F}_{K1}(u-)}{\bar F_1(u)}\,\frac{1}{\bar Y_{W1}(u)}\{dN_{W1}(u) - Y_{W1}(u)\,d\Lambda_1(u)\}\right| \le \sup_{\theta\in\Theta}\left|\mathbb G_n\int_0^t \frac{\theta(u)}{\bar F_1(u)}\{dN_{W1}(u) - Y_{W1}(u)\,d\Lambda_1(u)\}\right| \to \sup_{\theta\in\Theta}|\mathbb G f_\theta|$$
in distribution as $n \to \infty$, where $\{\mathbb G f : f \in \mathcal F\}$ is a $P$-Brownian bridge process. In light of (4), this implies that $\hat{\bar F}_{K1}(t) \to \bar F_1(t)$ in probability at rate $n^{-1/2}$. Since both $\hat{\bar F}_{K1}(\cdot)$ and $\bar F_1(\cdot)$ are nonincreasing and bounded, and $\bar F_1(\cdot)$ is continuous, it follows that
$$\sup_{u\in[0,t]}|\hat{\bar F}_{K1}(u) - \bar F_1(u)| \to 0 \quad\text{and}\quad \sup_{u\in[0,t]}|\hat{\bar F}_{K1}(u-) - \bar F_1(u)| \to 0 \qquad (5)$$
in probability. This holds for every $t \in [0, \tau]$.

To show the asymptotic normality of $\hat{\bar F}_{K1}(t)$, we write
$$\hat{\bar F}_{K1}(t) - \bar F_1(t) = -\bar F_1(t)\left[(\mathbb P_n - P)\int_0^t \frac{1}{y_{W1}(u)}\{dN_{W1}(u) - Y_{W1}(u)\,d\Lambda_1(u)\} + (\mathbb P_n - P)D_n(X)\right], \qquad (6)$$
where $y_{W1}(u) = E\{W_1(u)Y(u)\} = \bar F_1(u)\bar F_C(u)$ and
$$D_n(X) = \int_0^t \left\{\frac{\hat{\bar F}_{K1}(u-)}{\bar F_1(u)\bar Y_{W1}(u)} - \frac{1}{y_{W1}(u)}\right\}\{dN_{W1}(u) - Y_{W1}(u)\,d\Lambda_1(u)\}.$$
Let $\theta_n(u) = \hat{\bar F}_{K1}(u-)/\bar Y_{W1}(u)$ and $\theta_0(u) = 1/\bar F_C(u)$. Then by (5) and the law of large numbers, $\|\theta_n - \theta_0\|_\infty \equiv \sup_{0\le u\le t}|\theta_n(u) - \theta_0(u)| \to 0$ as $n \to \infty$. Since $\mathcal F$ is $P$-Donsker, by asymptotic equicontinuity (van der Vaart and Wellner, 1996, page 89), for large $n$ and some sequence $\delta_n \to 0$, with probability tending to one,
$$|n^{1/2}(\mathbb P_n - P)D_n(X)| = |\mathbb G_n\{f_{\theta_n}(X) - f_{\theta_0}(X)\}| \le \sup_{\|\theta - \theta_0\|_\infty \le \delta_n}|\mathbb G_n\{f_\theta(X) - f_{\theta_0}(X)\}| \le \sup_{\|f_\theta - f_{\theta_0}\|_2 \le C\delta_n}|\mathbb G_n\{f_\theta(X) - f_{\theta_0}(X)\}| \to 0$$
in probability. Therefore (6) can be rewritten as
$$n^{1/2}\{\hat{\bar F}_{K1}(t) - \bar F_1(t)\} = -\bar F_1(t)\,\mathbb G_n\int_0^t \frac{1}{y_{W1}(u)}\{dN_{W1}(u) - Y_{W1}(u)\,d\Lambda_1(u)\} + o_p(1), \qquad (7)$$
from which the asymptotic normality of $\hat{\bar F}_{K1}(t)$ follows, with the asymptotic variance stated in the theorem.
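For readers who want to see the estimator whose asymptotics Theorem 1 describes, the sketch below computes a weighted product-limit estimate with time-independent weights. It assumes, consistent with the weighted risk sets $\bar Y_{W1}(u)$ in (4) and with Guo and Tsiatis (2005), that the estimator is the product over observed failure times of one minus the weighted number of failures divided by the weighted number at risk; the paper gives the exact definition, and the function name is hypothetical. With unit weights it reduces to the ordinary Kaplan–Meier estimator.

```python
from typing import Sequence


def weighted_km(times: Sequence[float], events: Sequence[int],
                weights: Sequence[float], t: float) -> float:
    """Hypothetical sketch of a weighted product-limit survival estimate at t.

    times   -- observed U_i = min(T_i, C_i)
    events  -- Delta_i (1 = failure observed, 0 = censored)
    weights -- time-independent W_i (0 for subjects inconsistent with the strategy)
    """
    surv = 1.0
    failure_times = sorted({u for u, d in zip(times, events) if d == 1 and u <= t})
    for s in failure_times:
        # weighted number of failures at s
        d_w = sum(w for u, d, w in zip(times, events, weights) if d == 1 and u == s)
        # weighted number at risk just before s
        y_w = sum(w for u, w in zip(times, weights) if u >= s)
        if y_w > 0:
            surv *= 1.0 - d_w / y_w
    return surv


if __name__ == "__main__":
    times, events = [1, 2, 2, 3, 4], [1, 1, 0, 1, 0]
    print(weighted_km(times, events, [1, 1, 1, 1, 1], 3))  # ordinary Kaplan-Meier
    print(weighted_km(times, events, [0, 1, 1, 1, 1], 3))  # first subject excluded
```

Multiplying all weights by a common constant leaves the estimate unchanged, which is why constant weights are, as noted in Section 1, effectively unnecessary.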
Although it is not of interest in this paper, for completeness we now show how to construct a test statistic for comparing strategies 11 and 12; comparing strategies 21 and 22 is similar. Denote the survival function of $T_{12}$, the potential failure time under strategy 12, by $\bar F_3(t)$, and its weighted Kaplan–Meier estimator by $\hat{\bar F}_{K3}(t)$. Unlike strategies with different initial treatments, strategies 11 and 12 share subjects, so $\hat{\bar F}_{K1}(t)$ and $\hat{\bar F}_{K3}(t)$ are correlated; nevertheless, we can obtain their joint asymptotic distribution. Without loss of generality, assume that time-dependent weights are used. Following the proof of Theorem 1, we obtain, as in (7),
$$n^{1/2}\{\hat{\bar F}_{K3}(t) - \bar F_3(t)\} = -\bar F_3(t)\,\mathbb G_n\int_0^t \frac{1}{y_{W3}(u)}\{dN_{W3}(u) - Y_{W3}(u)\,d\Lambda_3(u)\} + o_p(1), \qquad (8)$$
where $\Lambda_3(t)$ is the cumulative hazard function of $T_{12}$, $y_{W3}(u) = \bar F_3(u)\bar F_C(u)$, $dN_{W3}(u) = W_3(u)\,dN(u)$, $Y_{W3}(u) = W_3(u)Y(u)$, and
$$W_3(u) = \frac{I(A_1 = 1)}{p}\left\{1 - R(u) + \frac{I(A_2 = 2)\,R(u)}{1 - q}\right\}$$
is the time-dependent weight function for strategy 12. By (7) and (8), $n^{1/2}\{\hat{\bar F}_{K1}(t) - \bar F_1(t), \hat{\bar F}_{K3}(t) - \bar F_3(t)\}$ converges in distribution to $N(0, \Sigma)$ as $n \to \infty$, where
$$\Sigma = \begin{pmatrix}\sigma^2_{K1}(t) & \sigma_{K13}(t)\\ \sigma_{K13}(t) & \sigma^2_{K3}(t)\end{pmatrix},$$
with $\sigma^2_{K3}(t)$ defined by (3) for $j = 3$, and
$$\sigma_{K13}(t) = \bar F_1(t)\bar F_3(t)\,E\left[\prod_{j\in\{1,3\}}\int_0^t \frac{W_j(u)}{\bar F_j(u)\bar F_C(u)}\{dN(u) - Y(u)\,d\Lambda_j(u)\}\right]. \qquad (9)$$
From this result, a test statistic for $H_0: \bar F_1(t) = \bar F_3(t)$ at a fixed $t$ based on weighted Kaplan–Meier estimators can be constructed in the same way as the test of $H_0: \bar F_1(t) = \bar F_2(t)$ in the paper. The only difference is that here the covariance $\sigma_{K13}(t)$ must be estimated, by an empirical estimator based on (9).

2.2 The weighted sample proportion estimator

This estimator modifies the third weighted sample proportion estimator of $\bar F_j(t)$, $j = 1, 2$, in Lunceford et al. (2002) by using time-dependent weights, as follows.
Denote
$$G_j(t, u) = E\{I(u \le T_{j1} \le t)\}/\mathrm{pr}(T > u), \qquad G_{Wj}(u) = E[\{W_j(T) - 1\}I(T \ge u)]/\mathrm{pr}(T \ge u), \quad j = 1, 2.$$
Denote
$$L^\alpha_j(t, u) = \{W_j(T)I(U \le t) - G_j(t, u)\}\{W_j(T) - 1 - G_{Wj}(u)\}I(T \ge u)$$
and
$$G^\alpha_j(u) = \{W_j(T) - 1 - G_{Wj}(u)\}^2 I(T \ge u).$$
Define
$$\hat G_{Wj}(u) = \frac{1}{n\hat{\bar F}(u)}\sum_{i=1}^n \frac{\Delta_i\{W_{ji}(U_i) - 1\}I(U_i \ge u)}{\hat{\bar F}_C(U_i)}, \qquad \hat G_j(t, u) = \frac{1}{n\hat{\bar F}(u)}\sum_{i=1}^n \frac{\Delta_i W_{ji}(U_i)I(U_i \le t)I(U_i \ge u)}{\hat{\bar F}_C(U_i)},$$
$$\hat E\{L^\alpha_j(t, u)\} = \frac{1}{n}\sum_{i=1}^n \frac{\Delta_i\{W_{ji}(U_i)I(U_i \le t) - \hat G_j(t, u)\}\{W_{ji}(U_i) - 1 - \hat G_{Wj}(u)\}I(U_i \ge u)}{\hat{\bar F}_C(U_i)},$$
and
$$\hat E\{G^\alpha_j(u)\} = \frac{1}{n}\sum_{i=1}^n \frac{\Delta_i\{W_{ji}(U_i) - 1 - \hat G_{Wj}(u)\}^2 I(U_i \ge u)}{\hat{\bar F}_C(U_i)}.$$
Recall that $\hat{\bar F}(u)$ is the usual Kaplan–Meier estimator of $\bar F(u)$, the survival function of $T$. Also define
$$A_{1j} = \frac{1}{n}\sum_{i=1}^n\left[\frac{\Delta_i W_{ji}(U_i)\{W_{ji}(U_i) - 1\}I(U_i \le t)}{\hat{\bar F}_C(U_i)} + \int_0^\tau dN^c_i(u)\{\hat{\bar F}_C(u)Y(u)\}^{-1}\hat E\{L^\alpha_j(t, u)\}\right],$$
$$A_{2j} = \frac{1}{n}\sum_{i=1}^n\left[\{W_{ji}(U_i) - 1\}^2 + \int_0^\tau dN^c_i(u)\{\hat{\bar F}_C(u)Y(u)\}^{-1}\hat E\{G^\alpha_j(u)\}\right],$$
and $\hat\alpha_j = A_{1j}/A_{2j}$, $j = 1, 2$, where $N^c(u) = I(U \le u, C \le T)$ is the censoring counting process ($N^c_i$ for subject $i$) and $\hat{\bar F}_C(u)$ is the usual Kaplan–Meier estimator of $\bar F_C(u)$. The modified sample proportion estimator of $\bar F_j(t)$ is defined as $\hat{\bar F}_{Sj}(t) = 1 - \hat F_{Sj}(t)$, where
$$\hat F_{Sj}(t) = \frac{1}{n}\sum_{i=1}^n \frac{\Delta_i W_{ji}(U_i)}{\hat{\bar F}_C(U_i)}I(U_i \le t) - \hat\alpha_j\,\frac{1}{n}\sum_{i=1}^n \frac{\Delta_i}{\hat{\bar F}_C(U_i)}\{W_{ji}(U_i) - 1\}, \quad j = 1, 2.$$
The $\hat\alpha_j$ defined above is a consistent estimator of
$$\alpha_j = \frac{E[W_{ji}(U_i)\{W_{ji}(U_i) - 1\}I(U_i \le t)] + \int_0^\tau \lambda^c(u)\bar F_C(u)^{-1}E\{L^\alpha_j(t, u)\}\,du}{E\{W_{ji}(U_i) - 1\}^2 + \int_0^\tau \lambda^c(u)\bar F_C(u)^{-1}E\{G^\alpha_j(u)\}\,du}, \quad j = 1, 2, \qquad (10)$$
where $\lambda^c(u)$ is the hazard function of the censoring time $C$. Arguing as in Lunceford et al. (2002), this choice of $\alpha_j$ minimizes the variance of the influence function of the estimator of $F_j(t)$ obtained by solving the equation
$$\sum_{i=1}^n \frac{\Delta_i}{\hat{\bar F}_C(U_i)}\left[W_{ji}(U_i)I(U_i \le t) - F_j(t) - \alpha_j\{W_{ji}(U_i) - 1\}\right] = 0.$$
The following theorem gives the asymptotic properties of this estimator.
Its proof is omitted because it parallels the proof for the unmodified weighted sample proportion estimator in Lunceford et al. (2002).

Theorem 2. Assume that $\bar F_j(t) > \delta_0$, $j = 1, 2$, and $\bar F_C(t) > \delta_0$ for some $\delta_0 > 0$. Then $n^{1/2}\{\hat{\bar F}_{Sj}(t) - \bar F_j(t)\} \to N\{0, \sigma^2_{Sj}(t)\}$ in distribution, $j = 1, 2$, as $n \to \infty$, where
$$\sigma^2_{Sj}(t) = E[W_j(T)I(T \le t) - F_j(t) - \alpha_j\{W_j(T) - 1\}]^2 + \int_0^t \frac{E\{L_j(t, u)\}^2}{\bar F_C(u)}\,\lambda^c(u)\,du,$$
with $L_j(t, u) = [W_j(T)I(T \le t) - G_j(t, u) - \alpha_j\{W_j(T) - 1 - G_{Wj}(u)\}]I(T \ge u)$.

2.3 The weighted log rank statistics

We derive the asymptotic properties of the weighted log rank test statistic under a proportional hazards assumption and a local alternative. As described in the paper, we use the local alternative
$$H_n: \lambda_{n2}(t) = \lambda_1(t)\exp(\gamma/n^{1/2}), \quad n \ge 1,$$
where $\gamma$ is a constant. To make it clear that the hazard function $\lambda_2(t)$ depends on $n$ under the local alternative, we denote it by $\lambda_{n2}(t)$. We denote the distribution of the observed data under $H_n$ by $P_n$ and the distribution under the null hypothesis, which corresponds to $\gamma = 0$, by $P_0$; the empirical measure is denoted by $\mathbb P_n$. The theorem below gives the asymptotic distribution of the weighted log rank test statistic.

Theorem 3. Assume that $\bar F_1(\tau) > \delta_0$ and $\bar F_C(\tau) > \delta_0$ for some $\delta_0 > 0$. Then
$$n^{1/2}L_n \to N\{\mu_L, (\sigma^2_{L1} + \sigma^2_{L2})/4\}$$
in distribution under $P_n$ as $n \to \infty$, where $\mu_L = \gamma\int_0^\tau \bar F_C(t)\,dF_1(t)/2$ and
$$\sigma^2_{Lj} = P_0\left[\int_0^\tau W_j\{dN(t) - Y(t)\,d\Lambda_1(t)\}\right]^2, \quad j = 1, 2,$$
when time-independent weights are used, and
$$\sigma^2_{Lj} = P_0\left[\int_0^\tau W_j(t)\{dN(t) - Y(t)\,d\Lambda_1(t)\}\right]^2, \quad j = 1, 2,$$
when time-dependent weights are used.

Proof. In the following we assume that the weights are time dependent; the proof is similar when time-independent weights are used. Define
$$\mathcal F = \left\{\int_0^\tau \theta(u)\{dN_{W2}(u) - Y_{W2}(u)\,d\Lambda_1(u)\} : \theta(u) \in \Theta\right\},$$
where $\Theta$ is as defined in the proof of Theorem 1. We first prove that $n^{1/2}(\mathbb P_n - P_n)$ converges to $\mathbb G_{P_0}$ in $\ell^\infty(\mathcal F)$ under $P_n$ as $n \to \infty$, where $\{\mathbb G_{P_0}f : f \in \mathcal F\}$ is a $P_0$-Brownian bridge process.
The proof of the asymptotic normality of $L_n$ relies on the asymptotic equicontinuity of the process $n^{1/2}(\mathbb P_n - P_n)$ implied by this result. We use Theorem 2.8.9 in van der Vaart and Wellner (1996), which requires verifying the following three conditions:
1. $\sup_{f,g\in\mathcal F}|\rho_{P_n}(f, g) - \rho_{P_0}(f, g)| \to 0$, where $\rho_P(f, g) \equiv \mathrm{var}_P(f - g)$.
2. There exists an envelope function $F$ of $\mathcal F$ such that $\limsup_{n\to\infty} P_n F^2 I(F \ge \varepsilon n^{1/2}) = 0$ for every $\varepsilon > 0$, and $P_n F^2 = O(1)$.
3. Both $\mathcal F_{\delta,P_n} = \{f - g : f, g \in \mathcal F, \|f - g\|_{P_n,2} < \delta\}$ and $\mathcal F^2_\infty = \{(f - g)^2 : f, g \in \mathcal F, \|f - g\|_{P_n,2} < \delta\}$ are $P_n$-measurable (van der Vaart and Wellner, 1996, page 110) for every $\delta > 0$ and $n$.

We first check condition 1. Let
$$f = \int_0^\tau \theta_1(u)\{dN_{W2}(u) - Y_{W2}(u)\,d\Lambda_1(u)\} \quad\text{and}\quad g = \int_0^\tau \theta_2(u)\{dN_{W2}(u) - Y_{W2}(u)\,d\Lambda_1(u)\}$$
for some $\theta_1(u), \theta_2(u) \in \Theta$. Let $f_P(t_{21}, s)$ and $f_P(t_{21})$ be the probability density functions of $(T_{21}, S)$ and $T_{21}$ under the probability measure $P$ of the observed data, and let $F_C(c)$ be the distribution function of the censoring time $C$. Denoting $Y_{21}(u) = I(T_{21} \ge u, C \ge u)$, by our assumptions on $\Theta$ we can write
$$\begin{aligned}
\rho_P(f, g) &= P\left[\{\theta_1(T_{21}) - \theta_2(T_{21})\}I(T_{21} \le C)I(T_{21} \le \tau)W_2(T_{21}) - \int_0^\tau \{\theta_1(u) - \theta_2(u)\}W_2(u)Y_{21}(u)\,d\Lambda_1(u)\right]^2\\
&= P\left[\frac{1}{1 - p}\left\{I(S > T_{21}) + \frac{I(S \le T_{21})}{1 - q}\right\}\{\theta_1(T_{21}) - \theta_2(T_{21})\}I(T_{21} \le C)I(T_{21} \le \tau) - \int_0^\tau \{\theta_1(u) - \theta_2(u)\}Y_{21}(u)\,d\Lambda_1(u)\right]^2\\
&= \int_0^\infty\int_0^\infty\int_0^\infty K(t_{21}, s, c)\,f_P(t_{21}, s)\,dt_{21}\,ds\,dF_C(c), \qquad (11)
\end{aligned}$$
for a function $K(t_{21}, s, c)$ that is bounded by a constant $A$ and does not depend on $n$. Under $H_n$, $\Lambda_{n2}(u) = \Lambda_1(u)\exp(\gamma/n^{1/2})$, where $\Lambda_{n2}(u)$ is the cumulative hazard function of $T_{21}$ under $H_n$; hence the survival function of $T_{21}$ under $H_n$ is $\{\bar F_1(t)\}^{\exp(\gamma/n^{1/2})}$. This implies that
$$f_{P_n}(t_{21}) - f_{P_0}(t_{21}) = f_{P_0}(t_{21})\left[\exp\left(\frac{\gamma}{n^{1/2}}\right)\{\bar F_1(t_{21})\}^{\exp(\gamma/n^{1/2}) - 1} - 1\right].$$
This, combined with the fact that $f_{P_n}(s \mid t_{21}) = f_{P_0}(s \mid t_{21})$, yields
$$f_{P_n}(t_{21}, s) - f_{P_0}(t_{21}, s) = f_{P_0}(t_{21}, s)\left[\exp\left(\frac{\gamma}{n^{1/2}}\right)\{\bar F_1(t_{21})\}^{\exp(\gamma/n^{1/2}) - 1} - 1\right]. \qquad (12)$$
Now by (11) and (12), for any $f, g \in \mathcal F$,
$$\begin{aligned}
|\rho_{P_n}(f, g) - \rho_{P_0}(f, g)| &\le \int_0^\infty\int_0^\infty\int_0^\infty K(t_{21}, s, c)\,|f_{P_n}(t_{21}, s) - f_{P_0}(t_{21}, s)|\,dt_{21}\,ds\,dF_C(c)\\
&\le A\int_0^\infty\int_0^\infty\int_0^\infty f_{P_0}(t_{21}, s)\left|\exp\left(\frac{\gamma}{n^{1/2}}\right)\{\bar F_1(t_{21})\}^{\exp(\gamma/n^{1/2}) - 1} - 1\right|dt_{21}\,ds\,dF_C(c). \qquad (13)
\end{aligned}$$
We claim, by the dominated convergence theorem, that the right-hand side of (13) converges to 0 as $n \to \infty$. First, the absolute value in the integrand converges to 0 as $n \to \infty$. In addition, when $\gamma > 0$ the absolute value in the integrand is bounded by $\exp(\gamma) + 1$, and when $\gamma < 0$ it is bounded by $\{\bar F_1(t_{21})\}^{\exp(\gamma) - 1} + 1$. In the latter case, replacing the absolute value by this bound, the integral is bounded by $P_0\{\bar F_1(T_{11})\}^\alpha + 1$, where $\alpha = \exp(\gamma) - 1 < 0$ (under $P_0$, $T_{21}$ has the same distribution as $T_{11}$). Since $\bar F_1(T_{11})$ is uniformly distributed on $[0, 1]$ and $\alpha > -1$, it follows that $P_0\{\bar F_1(T_{11})\}^\alpha = 1/(1 + \alpha) = \exp(-\gamma) < \infty$, and hence the dominated convergence theorem applies.

To check condition 2, note that the functions $\int_0^\tau \theta(u)\{dN_{W2}(u) - Y_{W2}(u)\,d\Lambda_1(u)\}$ are bounded by a constant under our assumptions, so we can choose the envelope function $F$ to be this constant. For such an envelope, the first part of condition 2 is satisfied because $I(F \ge \varepsilon n^{1/2}) = 0$ for all large $n$, and clearly $P_n F^2 = O(1)$.

Finally, the $P_n$-measurability of $\mathcal F_{\delta,P_n}$ and $\mathcal F^2_\infty$ in condition 3 follows because $\theta$ is a monotone function divided by another monotone function. Any monotone function is the pointwise limit of a sequence of step functions of the form $\sum_{i=1}^m c_i I(t_{i-1} < t \le t_i)$, where all the $t_i$ and $c_i$ are rational numbers. Since the set of all such functions is countable, by Example 2.3.4 in van der Vaart and Wellner (1996), both $\mathcal F_{\delta,P_n}$ and $\mathcal F^2_\infty$ are $P_n$-measurable. We now conclude from Theorem 2.8.9 in van der Vaart and Wellner (1996) that $n^{1/2}(\mathbb P_n - P_n)$ converges to $\mathbb G_{P_0}$ in $\ell^\infty(\mathcal F)$ under $P_n$ as $n \to \infty$.
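The dominated convergence step for condition 1 can be made concrete. Integrating out $s$ and $c$ in (13) and substituting $u = \bar F_1(t_{21})$, which is uniformly distributed on $[0, 1]$ under $P_0$, the right-hand side of (13) equals $A\int_0^1 |a u^{a-1} - 1|\,du$ with $a = \exp(\gamma/n^{1/2})$. Because $a u^{a-1} - 1$ integrates to zero on $[0, 1]$, this integral equals twice the integral of the positive part, which gives the closed form $2|u_0^a - u_0|$ with $u_0 = a^{1/(1-a)}$. The sketch below (function names hypothetical) evaluates it, cross-checks it numerically, and shows it shrinking as $n$ grows for either sign of $\gamma$.

```python
import math


def rhs13_over_A(gamma: float, n: int) -> float:
    """Closed form of int_0^1 |a u^(a-1) - 1| du with a = exp(gamma / sqrt(n)),
    i.e. the right-hand side of (13) divided by A."""
    a = math.exp(gamma / math.sqrt(n))
    if a == 1.0:
        return 0.0
    u0 = a ** (1.0 / (1.0 - a))  # the point where a u^(a-1) = 1
    # the integrand integrates to 0, so its L1 norm is twice its positive part
    return 2.0 * abs(u0 ** a - u0)


def rhs13_over_A_numeric(gamma: float, n: int, m: int = 200_000) -> float:
    """Midpoint-rule cross-check; accurate when gamma > 0 (no singularity at 0)."""
    a = math.exp(gamma / math.sqrt(n))
    h = 1.0 / m
    return h * sum(abs(a * ((k + 0.5) * h) ** (a - 1.0) - 1.0) for k in range(m))


if __name__ == "__main__":
    for n in (1, 100, 10_000, 1_000_000):
        print(n, rhs13_over_A(-1.0, n), rhs13_over_A(1.0, n))
```

The closed form is simply the $L_1$ distance between the Beta$(a, 1)$ and uniform densities, so its convergence to 0 as $a \to 1$ is exactly what the dominated convergence argument delivers.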
Similarly, defining
$$\mathcal F' = \left\{\int_0^\tau \theta(u)\{dN_{W1}(u) - Y_{W1}(u)\,d\Lambda_1(u)\} : \theta(u) \in \Theta\right\},$$
we can also show that $n^{1/2}(\mathbb P_n - P_n)$ converges to $\mathbb G_{P_0}$ in $\ell^\infty(\mathcal F')$ under $P_n$. Before showing the asymptotic normality of $n^{1/2}L_n$, we need to show that
$$\sup_{t\in[0,\tau]}\left|\frac{\bar Y_{Wj}(t)}{\bar Y_{W1}(t) + \bar Y_{W2}(t)} - \frac{1}{2}\right| \to 0, \quad j = 1, 2,$$
in probability as $n \to \infty$. This follows from the fact that $\sup_{t\in[0,\tau]}|\bar Y_{Wj}(t) - y_{W1}(t)| \to 0$ in probability, $j = 1, 2$, which is a consequence of the asymptotic normality of $n^{1/2}\{\bar Y_{Wj}(t) - y_{Wj}(t)\}$ under $P_n$. The latter can be proved with the Lindeberg–Feller central limit theorem; we omit the details. From these results it follows that, under $P_n$,
$$\begin{aligned}
n^{1/2}L_n &= n^{1/2}(\mathbb P_n - P_n)\int_0^\tau \frac{\bar Y_{W2}(t)}{\bar Y_{W1}(t) + \bar Y_{W2}(t)}\{dN_{W1}(t) - Y_{W1}(t)\,d\Lambda_1(t)\}\\
&\quad - n^{1/2}(\mathbb P_n - P_n)\int_0^\tau \frac{\bar Y_{W1}(t)}{\bar Y_{W1}(t) + \bar Y_{W2}(t)}\{dN_{W2}(t) - Y_{W2}(t)\,d\Lambda_{n2}(t)\}\\
&\quad + n^{1/2}\int_0^\tau \frac{\bar Y_{W1}(t)\bar Y_{W2}(t)}{\bar Y_{W1}(t) + \bar Y_{W2}(t)}\{\lambda_1(t) - \lambda_{n2}(t)\}\,dt\\
&= \frac{n^{1/2}}{2}(\mathbb P_n - P_n)\int_0^\tau \{dN_{W1}(t) - Y_{W1}(t)\,d\Lambda_1(t)\} - \frac{n^{1/2}}{2}(\mathbb P_n - P_n)\int_0^\tau \{dN_{W2}(t) - Y_{W2}(t)\,d\Lambda_1(t)\}\\
&\quad + \frac{\gamma}{2}\int_0^\tau \bar F_1(t)\bar F_C(t)\,d\Lambda_1(t) + o_{P_n}(1). \qquad (14)
\end{aligned}$$
Again by the Lindeberg–Feller central limit theorem, the first term on the right-hand side of (14) converges in distribution to $N(0, \sigma^2_{L1}/4)$ and the second term to $N(0, \sigma^2_{L2}/4)$, both under $P_n$, where
$$\sigma^2_{Lj} = P_0\left[\int_0^\tau W_j\{dN(t) - Y(t)\,d\Lambda_1(t)\}\right]^2, \quad j = 1, 2.$$
Finally, $n^{1/2}L_n \to N\{\mu_L, (\sigma^2_{L1} + \sigma^2_{L2})/4\}$ in distribution under $P_n$, where $\mu_L = \gamma\int_0^\tau \bar F_1(t)\bar F_C(t)\,d\Lambda_1(t)/2 = \gamma\int_0^\tau \bar F_C(t)\,dF_1(t)/2$.

We used the independence of the first and second terms on the right-hand side of (14) to obtain the asymptotic variance of $n^{1/2}L_n$. In a weighted log rank test comparing two strategies that start with the same initial treatment, this independence does not hold. In such cases the covariance between the first two integrals on the right-hand side of (14) must be accounted for: the asymptotic variance becomes $(\sigma^2_{L1} + \sigma^2_{L2} - 2\sigma_{L12})/4$, where $\sigma_{L12} = P_0[\int_0^\tau W_1\{dN(t) - Y(t)\,d\Lambda_1(t)\}\int_0^\tau W_2\{dN(t) - Y(t)\,d\Lambda_1(t)\}]$.
Since this covariance is the expectation of the product of the two integrals, it can also be estimated empirically from the observed data.

References

Fleming, T. R. & Harrington, D. P. (1991). Counting Processes and Survival Analysis. New York: Wiley.

Guo, X. & Tsiatis, A. A. (2005). A weighted risk set estimator for survival distributions in two-stage randomization designs with censored survival data. The International Journal of Biostatistics 1(1), 1–15.

Lunceford, J. K., Davidian, M. & Tsiatis, A. A. (2002). Estimation of survival distributions of treatment strategies in two-stage randomization designs in clinical trials. Biometrics 58, 48–57.

van der Vaart, A. W. & Wellner, J. A. (1996). Weak Convergence and Empirical Processes. New York: Springer-Verlag.

© Copyright 2018