Lecturer, University of Nigeria, Nsukka
A simple but highly competitive method of testing the assumption of multivariate normality
Often, a multivariate data set of n independent data points in d variables is assumed to be a random sample from a multivariate distribution. For several reasons, the most important multivariate distribution, and the one most widely used in practice, is the multivariate normal distribution. Many test procedures exist in the literature for checking whether such a data set comes from the multivariate normal distribution. They vary, however, in simplicity as well as in efficiency. The efficiency of a goodness-of-fit test of this kind, known as the power of the test, is its ability to make the right decision of rejecting the hypothesis of multivariate normality (MVN) when that hypothesis is in fact false. My research aims to combine these two important properties: simplicity of tests for MVN and high competitiveness in terms of power. This will help users of statistics, especially those without a strong background in statistical theory, to apply the test with comparative ease while losing nothing in the correctness of their decisions. The paper we developed is founded on this philosophy. The test it proposes is more powerful than most tests for MVN found in the literature. It is therefore hoped that applied statisticians will find it very interesting.
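To make the notion of power concrete, here is a minimal Monte Carlo sketch of how the power of an MVN test can be estimated by simulation. The test statistic used (the largest absolute marginal skewness) is only a hypothetical stand-in chosen for brevity; it is not the test developed in the paper, and it is not affine invariant. The sample size, dimension, alternative distribution and all function names are assumptions made for illustration.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness m3 / m2^1.5 of one variable."""
    m = statistics.fmean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m3 = sum((x - m) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5

def stat(sample):
    """Stand-in statistic (assumption): largest absolute marginal skewness."""
    return max(abs(skewness(list(col))) for col in zip(*sample))

def draw(rng, n, d, gen):
    """n independent data points in d variables, each coordinate drawn by gen."""
    return [[gen(rng) for _ in range(d)] for _ in range(n)]

def rejection_rate(rng, n, d, gen, crit, reps):
    """Fraction of simulated data sets whose statistic exceeds the critical value."""
    hits = sum(stat(draw(rng, n, d, gen)) > crit for _ in range(reps))
    return hits / reps

rng = random.Random(0)
n, d, alpha, reps = 50, 2, 0.05, 1000

def normal(r):   # coordinates of an MVN sample with identity covariance
    return r.gauss(0.0, 1.0)

def skewed(r):   # a clearly non-normal alternative (illustrative choice)
    return r.expovariate(1.0)

# Calibrate the critical value from the simulated null distribution.
null_stats = sorted(stat(draw(rng, n, d, normal)) for _ in range(reps))
crit = null_stats[round((1 - alpha) * reps) - 1]

# Size: rejection rate when MVN holds. Power: rejection rate when it does not.
size = rejection_rate(rng, n, d, normal, crit, reps)
power = rejection_rate(rng, n, d, skewed, crit, reps)
print(f"critical value {crit:.3f}, size {size:.3f}, power {power:.3f}")
```

Comparing competing tests then amounts to repeating the last step with each test's statistic, at the same n, d and alpha, across a range of alternatives.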
Abstract: Psychosis is a special type of mental disorder that affects around 2-3% of the global population and has a strong genetic basis. Under psychosis, there is a group of diseases that may look alike, and it is thus difficult to isolate them from each other. Moreover, the credibility of real data related to psychosis is questionable due to its secondary nature, and its availability is grossly restricted because of ethical constraints and prevailing social taboo. The present paper is a novel attempt to capture psychosis data by considering 24 input symptom constructs and 7 tentative responses (outputs) as per Brief Psychiatric Rating Scale-F2 (BPRS-F2). The input-output data captured as per Plackett-Burman design (PBD) of experiments (after consulting 40 psychiatrists) are statistically modeled to determine their mutual relationships (i.e., outputs as functions of inputs). Both Pareto charts and normal probability plots are prepared to investigate the effect of each factor on different responses. Significant symptom constructs have been identified for each response. For example, emotional withdrawal has a significant contribution towards schizophrenia, and so on. The psychosis data thus collected will be useful for further processing to extract more information about the said disease.
Pub.: 14 May '10, Pinned: 30 Jun '17
Abstract: Normal probability plots are widely used as a statistical tool for assessing whether an observed simple random sample is drawn from a normally distributed population. The users, however, have to judge subjectively, if no objective rule is provided, whether the plotted points fall close to a straight line. In this paper, we focus on how a normal probability plot can be augmented by intervals for all the points so that, if the population distribution is normal, then all the points should fall into the corresponding intervals simultaneously with probability 1-α. These simultaneous 1-α probability intervals therefore provide an objective means to judge whether the plotted points fall close to the straight line: the plotted points fall close to the straight line if and only if all the points fall into the corresponding intervals. The powers of several normal probability plot based (graphical) tests and the most popular nongraphical Anderson-Darling and Shapiro-Wilk tests are compared by simulation. Based on this comparison, recommendations are given in Section 3 on which graphical tests should be used in what circumstances. An example is provided to illustrate the methods.
Pub.: 22 Oct '14, Pinned: 30 Jun '17
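The idea of simultaneous intervals for a normal probability plot can be sketched by simulation. The construction below is an assumption made for illustration and is not necessarily the one used in the paper: it standardizes each simulated normal sample, takes pointwise quantile intervals for every order statistic, and widens them until roughly 1-α of the simulated samples lie entirely inside the band. All function names and settings are hypothetical.

```python
import random
import statistics

def standardized_order_stats(xs):
    """Sorted values after subtracting the sample mean and dividing by the sample sd."""
    m = statistics.fmean(xs)
    s = statistics.stdev(xs)
    return sorted((x - m) / s for x in xs)

def simultaneous_band(n, alpha=0.05, reps=1000, seed=0):
    """Simulated lower/upper limits for the n standardized order statistics."""
    rng = random.Random(seed)
    sims = [standardized_order_stats([rng.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(reps)]
    # columns[i] holds the sorted simulated values of the i-th order statistic
    columns = [sorted(s[i] for s in sims) for i in range(n)]
    # Start from pointwise 1-alpha intervals (k values trimmed per tail) and
    # trim less and less until the band contains >= 1-alpha of whole samples.
    for k in range(int(alpha / 2 * reps), -1, -1):
        lower = [c[k] for c in columns]
        upper = [c[reps - 1 - k] for c in columns]
        inside = sum(
            all(lo <= v <= hi for v, lo, hi in zip(s, lower, upper))
            for s in sims
        )
        if inside / reps >= 1 - alpha:
            return lower, upper
    # Not reached: with k = 0 the band spans the simulated min and max.
    raise RuntimeError("no band found")

def within_band(xs, lower, upper):
    """Graphical test: accept normality only if every point is inside its interval."""
    z = standardized_order_stats(xs)
    return all(lo <= v <= hi for v, lo, hi in zip(z, lower, upper))

n = 20
lower, upper = simultaneous_band(n)

# A sample made of exact normal quantiles, and one with a gross outlier.
normal_like = [statistics.NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
with_outlier = [float(i) for i in range(19)] + [1000.0]
print(within_band(normal_like, lower, upper), within_band(with_outlier, lower, upper))
```

Because the band is calibrated on the same simulated samples that define it, the coverage is only approximately 1-α; a larger number of replications tightens the approximation.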
Abstract: Based on the results of Luati and Proietti (Ann Inst Stat Math 63:673–686, 2011) on an equivalence for a certain class of polynomial regressions between the diagonally weighted least squares (DWLS) and the generalized least squares (GLS) estimator, an alternative way to take correlations into account thanks to a diagonal covariance matrix is presented. The equivalent covariance matrix is much easier to compute than a diagonalization of the covariance matrix via eigenvalue decomposition which also implies a change of the least squares equations. This condensed matrix, for use in the least squares adjustment, can be seen as a diagonal or reduced version of the original matrix, its elements being simply the sums of the rows elements of the weighting matrix. The least squares results obtained with the equivalent diagonal matrices and those given by the fully populated covariance matrix are mathematically strictly equivalent for the mean estimator in terms of estimate and its a priori cofactor matrix. It is shown that this equivalence can be empirically extended to further classes of design matrices such as those used in GPS positioning (single point positioning, precise point positioning or relative positioning with double differences). Applying this new model to simulated time series of correlated observations, a significant reduction of the coordinate differences compared with the solutions computed with the commonly used diagonal elevation-dependent model was reached for the GPS relative positioning with double differences, single point positioning as well as precise point positioning cases. The estimate differences between the equivalent and classical model with fully populated covariance matrix were below the mm for all simulated GPS cases and below the sub-mm for the relative positioning with double differences. These results were confirmed by analyzing real data. 
Consequently, the equivalent diagonal covariance matrices, compared with the often used elevation-dependent diagonal covariance matrix, are appropriate for taking correlations into account in GPS least squares adjustment, yielding more accurate cofactor matrices of the unknowns.
Pub.: 11 May '16, Pinned: 30 Jun '17
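The equivalence stated in the last abstract for the mean estimator can be checked numerically. The sketch below assumes a design matrix consisting of a single column of ones and a symmetric weighting matrix W; the diagonal weights are the row sums of W, as the abstract describes. The matrix, the observations and the function names are made-up illustrative values, not data from the paper.

```python
def gls_mean(W, y):
    """GLS estimate of a common mean with full weight matrix W (design = column of ones)."""
    n = len(y)
    num = sum(W[i][j] * y[j] for i in range(n) for j in range(n))  # 1' W y
    den = sum(W[i][j] for i in range(n) for j in range(n))         # 1' W 1
    return num / den, 1.0 / den  # estimate and its a priori cofactor

def dwls_mean(W, y):
    """Same estimate using only a diagonal weight matrix built from the row sums of W."""
    d = [sum(row) for row in W]                     # equivalent diagonal weights
    num = sum(di * yi for di, yi in zip(d, y))      # 1' D y
    den = sum(d)                                    # 1' D 1
    return num / den, 1.0 / den

# Illustrative symmetric weighting matrix (inverse of a correlated covariance matrix).
W = [
    [2.0, -0.5, 0.0],
    [-0.5, 2.0, -0.5],
    [0.0, -0.5, 2.0],
]
y = [1.0, 2.0, 4.0]

est_full, cof_full = gls_mean(W, y)
est_diag, cof_diag = dwls_mean(W, y)
print(est_full, est_diag, cof_full, cof_diag)
```

The two results agree because W is symmetric, so its column sums equal its row sums and 1'Wy reduces to the row-sum-weighted sum of the observations. For other design matrices the abstract reports the equivalence only as an empirical finding.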