KCI-listed

On hierarchical clustering in sufficient dimension reduction

Chaeyeon Yoo, Younju Yoo, Hye Yeon Um, Jae Keun Yoo
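  • Publisher: The Korean Statistical Society
  • Journal: CSAM (Communications for Statistical Applications and Methods), Vol. 27, No. 4
  • Material type: Serial publication
  • Publication date: July 2020
  • Pages: 431-443 (13 pages)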

Table of contents

1. Introduction
2. Use of clustering in sufficient dimension reduction
3. Numerical studies
4. Real data examples: Minneapolis school data and Massachusetts college data
5. Discussion
Acknowledgements
References

Abstract

The K-means clustering algorithm has been applied successfully in sufficient dimension reduction. Unfortunately, the algorithm lacks reproducibility and nestness, both of which are discussed in this paper. These are clear deficits of the K-means clustering algorithm. The hierarchical clustering algorithm has both reproducibility and nestness, but an intensive comparison between the K-means and hierarchical clustering algorithms has not yet been carried out in a sufficient dimension reduction context. In this paper, we rigorously study the two clustering algorithms for two popular sufficient dimension reduction methods, the inverse mean and clustering mean methods, through intensive numerical studies. Simulation studies and two real data examples confirm that the hierarchical clustering algorithm has a potential advantage over the K-means algorithm.

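Journal information

  • Field: Natural sciences > Statistics
  • Listing: KCI-listed
  • Publication frequency: Bimonthly
  • ISSN: 2287-7843
  • Publication form: Academic journal
  • Material type: Serial publication
  • Coverage: 1994-2020
  • : 1908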

Latest papers from the publisher

1. Nonparametric two sample tests for scale parameters of multivariate distributions

Authors: Atul R Chavan, Digambar T Shirke

Publisher: The Korean Statistical Society; Journal: CSAM (Communications for Statistical Applications and Methods), Vol. 27, No. 4; Year: 2020; Pages: 397-412 (16 pages)

In this paper, the notion of data depth is used to propose nonparametric multivariate two-sample tests for a difference between scale parameters. Data depth measures the centrality or outlyingness of a multivariate data point relative to the data cloud. A difference in the scale parameters produces a difference in the depth values of a multivariate data point. Observing this fact on a depth vs. depth plot (DD-plot), we propose nonparametric multivariate two-sample tests for the scale parameters of multivariate distributions. The p-values of the proposed tests are obtained using Fisher's permutation approach. The power of the proposed tests is reported, alongside that of existing tests, for a few symmetric and skewed multivariate distributions. An illustration with real-life data is also provided.

2. Bayesian estimation for the exponential distribution based on generalized multiply Type-II hybrid censoring

Authors: Young Eun Jeon, Suk-bok Kang

Publisher: The Korean Statistical Society; Journal: CSAM (Communications for Statistical Applications and Methods), Vol. 27, No. 4; Year: 2020; Pages: 413-430 (18 pages)

The multiply Type-II hybrid censoring scheme has the drawback that the experiment time can be too long. To overcome this limitation, we propose a generalized multiply Type-II hybrid censoring scheme and derive several estimators of the scale parameter of the exponential distribution under it. First, the maximum likelihood estimator of the scale parameter is obtained under the proposed censoring scheme. Second, we obtain the Bayes estimators under different loss functions with a noninformative prior and an informative prior. Because the posterior distributions obtained from the two priors are complicated, we approximate the Bayes estimators by Lindley's approximation and the Tierney-Kadane method. In addition, the Bayes estimators are obtained using Markov chain Monte Carlo samples. Finally, all proposed estimators are compared in terms of mean squared error through Monte Carlo simulation and applied to real data.

3. On hierarchical clustering in sufficient dimension reduction

Authors: Chaeyeon Yoo, Younju Yoo, Hye Yeon Um, Jae Keun Yoo

Publisher: The Korean Statistical Society; Journal: CSAM (Communications for Statistical Applications and Methods), Vol. 27, No. 4; Year: 2020; Pages: 431-443 (13 pages)

4. Moderately clipped LASSO for the high-dimensional generalized linear model

Authors: Sangin Lee, Boncho Ku, Sunghoon Kown

Publisher: The Korean Statistical Society; Journal: CSAM (Communications for Statistical Applications and Methods), Vol. 27, No. 4; Year: 2020; Pages: 445-458 (14 pages)

The least absolute shrinkage and selection operator (LASSO) is a popular method for high-dimensional regression models. The LASSO has high prediction accuracy; however, it also selects many irrelevant variables. In this paper, we consider the moderately clipped LASSO (MCL), a hybrid of the LASSO and the minimax concave penalty (MCP), for the high-dimensional generalized linear model. The MCL preserves the advantages of both the LASSO and the MCP: it shows high prediction accuracy and successfully selects relevant variables. We prove that the MCL achieves the oracle property under some regularity conditions, even when the number of parameters is larger than the sample size. An efficient algorithm is also provided. Various numerical studies confirm that the MCL can be a better alternative to its competitors.

5. Statistical analysis of the employment future for Korea

Authors: Sanghyuk Lee, Sang-gue Park, Chan Kyu Lee, Yaeji Lim

Publisher: The Korean Statistical Society; Journal: CSAM (Communications for Statistical Applications and Methods), Vol. 27, No. 4; Year: 2020; Pages: 459-468 (10 pages)

We examine the rate of substitution of jobs by artificial intelligence using a score called the "weighted ability rate of substitution (WARS)." WARS is an indicator that represents each job's potential for substitution by automation and digitalization. Since the conventional WARS is sensitive to the particular responses of employees, we consider a robust version of the indicator. In this paper, we propose the individualized WARS, a modification of the conventional WARS, and compute robust averages and confidence intervals for inference. In addition, we use a clustering method to statistically classify jobs according to the proposed individualized WARS. The proposed method is applied to Korean job data, and the proposed WARS are computed for five future years. We also observe that 747 jobs are well clustered according to their substitution levels.

6.

Entropy is an important term in statistical mechanics that was originally defined in the second law of thermodynamics. In this paper, we consider maximum likelihood estimation (MLE), maximum product spacings estimation (MPSE), and Bayesian estimation of the entropy of an inverse Weibull distribution (InW) under a generalized type I progressive hybrid censoring scheme (GePH). The MLE and MPSE of the entropy cannot be obtained in closed form; therefore, we propose using the Newton-Raphson algorithm to compute them. Further, the Bayesian estimators for the entropy of InW are derived under the squared error loss function (SqL), precautionary loss function (PrL), general entropy loss function (GeL), and linex loss function (LiL). In addition, we derive Lindley's approximation (LiA) of the Bayesian estimates. Monte Carlo simulations are conducted to compare the MLE, MPSE, and Bayesian estimators. A real data set based on the GePH is also analyzed for illustrative purposes.

7. Volatility clustering in data breach counts

Authors: Hyunoo Shim, Changki Kim, Yang Ho Choi

Publisher: The Korean Statistical Society; Journal: CSAM (Communications for Statistical Applications and Methods), Vol. 27, No. 4; Year: 2020; Pages: 487-500 (14 pages)

Insurers face increasing demands for cyber liability, entailed in part by a variety of new forms of data breach risk. As data breach occurrences develop, understanding the volatility in data breach counts has become as important as understanding their expected occurrences. Volatility clustering, the tendency of large changes in a random variable to cluster together in time, is frequently observed in many financial asset prices and asset returns, and it is questioned whether the volatility of data breach occurrences is also clustered in time. We present a volatility analysis based on INGARCH models, i.e., integer-valued generalized autoregressive conditional heteroskedasticity time series models, for frequency counts due to data breaches. Using the INGARCH(1, 1) model with data breach samples, we show evidence of temporal volatility clustering for data breaches. In addition, we show that the firms' volatilities are correlated between some they belong to and that such a clustering effect remains even after excluding the effect of financial covariates, such as the VIX and the S&P 500 stock return, that have their own volatility clustering.
