ANALYSIS OF SEER BREAST CANCER DATA USING RANDOM FOREST

Yong Xu, Bong-Jin Choi

Abstract


The objective of the present study is to conduct a machine learning analysis of breast cancer tumor size prediction for United States patients based on real uncensored data. Based on the results of this analysis, we develop a statistical model that retains only the important variables. We accomplish this by developing a high-quality statistical model that identifies the significant attributable variables and interactions, and we rank these contributing entities according to their percentage contribution to breast cancer tumor growth. We use this new machine learning approach to update one of the previous models, which was developed through classical modeling methods. The proposed statistical model can be used to conduct response surface analysis to identify the restrictions on the significant attributable variables and their interactions that are necessary to minimize the size of the breast tumor. One can also use the proposed model to make early predictions of tumor size based on the most attributable variables.
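The ranking of variables by percentage contribution described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the feature names (`age_group`, `stage`, `noise`) are hypothetical stand-ins rather than SEER fields, and permutation importance measured on the training data is an assumed way of scoring each variable. The sketch uses only the Python standard library and grows a small regression forest by hand.

```python
import random
from statistics import mean

FEATURES = ["age_group", "stage", "noise"]  # hypothetical names, not SEER fields

def make_data(n, rng):
    """Synthetic stand-in data: size depends strongly on stage, weakly on age."""
    X = [[rng.randrange(10), rng.randrange(4), rng.randrange(10)] for _ in range(n)]
    y = [1.0 * a + 5.0 * s + rng.gauss(0, 1) for a, s, _ in X]
    return X, y

def sse(ys):
    """Sum of squared errors around the mean of a non-empty list."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)

def grow(X, y, rng, n_try=2, depth=0, max_depth=4, min_leaf=5):
    """Grow one regression tree; each split considers n_try random features."""
    best = None
    if depth < max_depth:
        for j in rng.sample(range(len(X[0])), n_try):
            for t in sorted({row[j] for row in X})[:-1]:
                left = [v for row, v in zip(X, y) if row[j] <= t]
                right = [v for row, v in zip(X, y) if row[j] > t]
                if min(len(left), len(right)) < min_leaf:
                    continue
                score = sse(left) + sse(right)
                if best is None or score < best[0]:
                    best = (score, j, t)
    if best is None:
        return mean(y)  # leaf: predict the mean response
    _, j, t = best
    lo = [i for i, row in enumerate(X) if row[j] <= t]
    hi = [i for i, row in enumerate(X) if row[j] > t]
    return (j, t,
            grow([X[i] for i in lo], [y[i] for i in lo], rng, n_try, depth + 1, max_depth, min_leaf),
            grow([X[i] for i in hi], [y[i] for i in hi], rng, n_try, depth + 1, max_depth, min_leaf))

def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are floats."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node

def predict(forest, row):
    """Forest prediction is the average of the individual tree predictions."""
    return mean(predict_tree(tree, row) for tree in forest)

def fit_forest(X, y, rng, n_trees=25):
    """Fit each tree on a bootstrap sample (rows drawn with replacement)."""
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        forest.append(grow([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def mse(forest, X, y):
    return mean((predict(forest, row) - v) ** 2 for row, v in zip(X, y))

def importance_percent(forest, X, y, rng):
    """Permutation importance: rise in MSE after shuffling one column,
    clamped at zero and rescaled so the shares sum to 100 percent."""
    base = mse(forest, X, y)
    gains = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)
        Xp = [row[:j] + [c] + row[j + 1:] for row, c in zip(X, col)]
        gains.append(max(0.0, mse(forest, Xp, y) - base))
    total = sum(gains)
    return sorted(((name, 100.0 * g / total) for name, g in zip(FEATURES, gains)),
                  key=lambda p: p[1], reverse=True)

rng = random.Random(0)
X, y = make_data(200, rng)
forest = fit_forest(X, y, rng)
ranking = importance_percent(forest, X, y, rng)
for name, pct in ranking:
    print(f"{name}: {pct:.1f}%")
```

In practice an established library implementation and out-of-bag or held-out data would likely be preferable for the importance estimate; the hand-written forest here only keeps the example self-contained.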


Keywords


Statistical modeling; machine learning; random forest; uncensored data analysis


