Talk title: Variable Selection and Patient Classification Using Genetic Data
Abstract: Advances in microarray technology have greatly facilitated research on gene-expression-based classification of patient samples. In cancer research, for example, microarray gene expression data have been used for cancer or tumor classification. When a study focuses on only two classes, for example two different cancer types, we propose a two-sample semiparametric model for the distributions of gene expression levels in the two classes. To estimate the parameters, we consider both the maximum semiparametric likelihood estimate (MLE) and the minimum Hellinger distance estimate (MHDE). For each gene, a Wald statistic is constructed from either the MLE or the MHDE, and a significance test is then performed on that gene. We exploit the idea of a weighted sum of misclassification rates to develop a novel classification model that involves only the genes previously identified as significant. To assess the usefulness of the proposed method, we take a predictive approach: we apply the method to the acute leukemia data of Golub et al. (1999), using a training set to build the classification model and a testing set to evaluate its accuracy.
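The screen-then-classify workflow described in the abstract can be illustrated with a rough sketch. This is not the speaker's method: the semiparametric model, the MLE/MHDE-based Wald statistic, and the weighted misclassification criterion are not specified in this announcement, so the sketch swaps in a per-gene Welch t-statistic for screening and a nearest-centroid rule for classification. The data are simulated, and every function name and parameter below is hypothetical.

```python
import numpy as np


def welch_statistics(x0, x1):
    """Per-gene Welch t-statistics (stand-in for the Wald statistic).

    x0 and x1 are (samples x genes) arrays for class 0 and class 1.
    """
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    v0, v1 = x0.var(axis=0, ddof=1), x1.var(axis=0, ddof=1)
    return (m1 - m0) / np.sqrt(v0 / len(x0) + v1 / len(x1))


def select_genes(x0, x1, k):
    """Indices of the k genes with the largest absolute statistic."""
    stats = welch_statistics(x0, x1)
    return np.argsort(-np.abs(stats))[:k]


def nearest_centroid_predict(x0, x1, x_new, genes):
    """Assign each new sample to the class whose centroid is closer,
    using only the selected genes (stand-in classification rule)."""
    c0 = x0[:, genes].mean(axis=0)
    c1 = x1[:, genes].mean(axis=0)
    d0 = np.linalg.norm(x_new[:, genes] - c0, axis=1)
    d1 = np.linalg.norm(x_new[:, genes] - c1, axis=1)
    return (d1 < d0).astype(int)


# Simulated expression data (assumption): 200 genes, of which the
# first 5 are shifted upward in class 1.
rng = np.random.default_rng(0)
n_genes = 200
informative = np.arange(5)


def simulate(n, label):
    x = rng.normal(size=(n, n_genes))
    x[:, informative] += 3.0 * label
    return x


train0, train1 = simulate(30, 0), simulate(30, 1)
test0, test1 = simulate(20, 0), simulate(20, 1)

# Screen genes on the training set, then evaluate on the testing set.
genes = select_genes(train0, train1, k=5)
pred = nearest_centroid_predict(train0, train1,
                                np.vstack([test0, test1]), genes)
truth = np.r_[np.zeros(20, dtype=int), np.ones(20, dtype=int)]
accuracy = (pred == truth).mean()
print("selected genes:", sorted(genes.tolist()))
print("test accuracy:", accuracy)
```

As in the abstract, the genes are chosen from the training set only, and the testing set is used solely to estimate classification accuracy.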
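About the speaker: Jingjing Wu (吴静静) is an Associate Professor in the Department of Mathematics and Statistics at the University of Calgary, Canada. She received her bachelor's degree in applied mathematics and software from Minzu University of China in 1999, her master's degree in probability theory and mathematical statistics from Beijing Normal University in 2002, and her PhD in statistics from the University of Alberta, Canada, in 2008. Her doctoral thesis received the Statistical Society of Canada's 2007 award for the best doctoral thesis in probability and statistics in Canada. She is a long-term visiting professor at the National University of Singapore, serves on the editorial boards of and/or reviews for several statistics journals, and has supervised more than 20 master's and doctoral students. Her main research interests include semiparametric models, parameter dimension reduction in big data, and their applications to genetic data, biostatistics, and economics. She has published more than 20 academic papers, and her research has been funded by individual research grants from the Natural Sciences and Engineering Research Council of Canada (2008-2023).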
Time: Monday, July 2, 2018, 2:20-3:20 p.m.
Venue: Lecture Hall B434, Changqing Campus
All faculty and students are welcome to attend!