Produced with Scholar

Module A3 (2018) Survey ANALYST Creator project

Project Overview

Project Description

IMPORTANT: THIS PROJECT IS ONLY FOR SURVEY ANALYSTS.

Your Creator assignment is to draft an analysis plan that contains the following sections and tasks.

  1. Describe data cleaning checks
  2. Describe your plan for weighting
  3. Prepare table shells for five indicators
  4. Calculate the results for one of your tables
  5. Generate a graphical summary of one vaccination coverage indicator for one dose across all 13 strata
  6. Summarize methods
  7. Summarize results
  8. Identify caveats or concerns
  9. Identify strengths and limitations

DO NOT START THIS PROJECT IF YOU ARE A SURVEY MANAGER.

 

Survey ANALYST Creator project

Data cleaning is the first step of the analysis and ensures data quality. The following checks can be scripted in the chosen software.

  • Check for duplicate records: use electronic tablets for real-time data entry, check for cases entered twice, and compare the two data files where double entry was used;
  • Check that each ID is unique by verifying that it matches the ID in the QR code; check for missing values in the respondent ID; the combination (RI01, RI02, RI11, RI12) must not contain missing data. The child ID must be identical to the ID on the card photo;
  • Demographic variables such as sex should not contain missing data;
  • Clean the date data: dates should be complete (d/m/y or y/m/d), not partial (m/y, y/m or anything else), and should not have missing parts. Check that each child's birth date is an eligible birth date and falls before the survey date; the child's age at the time of the survey must be between 12 and 23 months;
  • Vaccination dates should fall after the birth date and before the survey date; vaccination cards must show a tick mark with the date of vaccination so that the child's age at that date can be calculated;
  • For multi-dose vaccines such as Penta, PCV and OPV, dates must occur in order. For example, for Penta1, Penta2 and Penta3, the minimum interval between doses and the minimum age at each dose must be respected;
  • Clean the caregiver recall indicator: if a caregiver said a child was vaccinated but the card shows no date or tick mark, the indicator should be set to missing; track information biases and correct them;
  • Check that skip patterns were respected: when the caregiver said s/he “doesn't know” whether the child ever received a card (variable RI26), no tick mark should be recorded from a card.

Data cleaning will be carried out using the ASSERTLIST (VCQI) command in Stata or standard tools in R. Erroneous records can be exported to a spreadsheet for verification.
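The date and eligibility checks listed above can be sketched as a small script. This is only an illustration in Python, not the ASSERTLIST or R code used for the project; the record keys (`sex`, `dob`, `penta1`, `penta2`, `penta3`) and the example dates are hypothetical and are not the variable names of the dataset.

```python
from datetime import date


def months_between(start, end):
    """Completed months from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months


def check_child(record, survey_date):
    """Return a list of problems found in one child record.

    The keys 'sex', 'dob', 'penta1', 'penta2', 'penta3' are hypothetical.
    """
    problems = []
    if record.get("sex") is None:
        problems.append("missing sex")
    dob = record.get("dob")
    if dob is None:
        problems.append("missing birth date")
        return problems
    # Eligibility: age at survey must be 12 to 23 completed months.
    if not 12 <= months_between(dob, survey_date) <= 23:
        problems.append("age outside 12-23 months")
    # Keep only the doses that have a recorded date.
    doses = [record.get(k) for k in ("penta1", "penta2", "penta3")]
    given = [d for d in doses if d is not None]
    # Doses must fall between birth and the survey date.
    if any(d < dob or d > survey_date for d in given):
        problems.append("dose date outside birth-survey window")
    # Dose dates must be in chronological order.
    if given != sorted(given):
        problems.append("dose dates out of order")
    return problems


# Illustrative record only (not taken from the dataset).
example = {"sex": "F", "dob": date(2015, 10, 1),
           "penta1": date(2015, 11, 15), "penta2": date(2015, 12, 20),
           "penta3": date(2016, 1, 25)}
print(check_child(example, date(2017, 1, 15)))
```

Records that return a non-empty list would be the ones exported to a spreadsheet for verification. Minimum dose intervals and minimum ages are not checked in this sketch.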

Plan for weighting

A sample weight is a statistical measure of how much each survey respondent contributes to the estimate for the population from which the sample was drawn. In a weighted survey, each respondent selected for the sample represents a number of eligible respondents in the population, and that number is given by the weight. Weighting the sample is therefore essential for estimating population coverage.

There are three categories of weights: design weights, response weights, and post-stratification weights.

Which of these weights to use depends on the type of analysis. The most important ones here are the design weights and the response weights. All of them will be calculated in Excel sheets before being matched to the eligible children.

  1. Obtain the design weights

They are obtained from the three stages of sampling, and each stage requires two pieces of information.

  • step 1: we need the total number of clusters and the number of clusters selected in each stratum; then p1 = clusters selected / total clusters is the probability that a cluster is selected;
  • step 2: we need the number of households selected and the total number of households in each selected cluster; then p2 = households selected / total households is the probability that a household is selected;
  • step 3: we need the number of eligible children and the total number of eligible children in each household of the selected clusters; since all eligible children in a household are included, p3 = 1.

The sampling weight for a child in a particular household within a cluster is 1/(p1*p2*1). Because p3 = 1, this is the same as the household design weight. At this stage, we should have a spreadsheet that looks like the attached one.
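As a worked illustration of the three steps, the design weight can be computed as below. This is a minimal Python sketch; the function name and the counts in the example are hypothetical and do not come from the survey.

```python
def design_weight(clusters_selected, clusters_total,
                  households_selected, households_total):
    """Design weight of a child: 1 / (p1 * p2 * p3), with p3 = 1."""
    p1 = clusters_selected / clusters_total        # step 1: cluster selection
    p2 = households_selected / households_total    # step 2: household selection
    p3 = 1.0                                       # step 3: all eligible children taken
    return 1.0 / (p1 * p2 * p3)


# Hypothetical counts: 10 of 200 clusters, 5 of 100 households.
w = design_weight(10, 200, 5, 100)
print(round(w, 2))  # roughly 400 children represented by each sampled child
```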

2. Calculate response weights, or adjust for non-response at the household level and child level

Here we need the number of households that were not interviewed despite repeated visits, and the number of eligible respondents who did not participate. In the data set, use a new variable to identify respondents (code 1) and non-respondents (code 0), for both households and eligible children.

We then calculate the response rate at the household level and at the respondent level: the number of households with a complete interview divided by the total number of households selected in the stratum, and the number of respondents in a household divided by the total number of eligible children in that household. The final response rate is the product of the two. Put simply, the response rate is the number of eligible respondents with complete interviews divided by the number of eligible respondents in the stratum, and the response weight is its reciprocal.

To summarize,

  • For non-response weights, record the response and non-response rates among eligible households and among eligible respondents;
  • Use each geographical zone (the 12 states and the 2 zones), together with the sex subgroups, to form the non-response classes;
  • Within each non-response class, calculate the non-response adjustment factor (AdF) as the total of the design weights in the class divided by the sum of the design weights of the respondents, i.e. the reciprocal of the weighted response rate;
  • Then, in each stratum, obtain the response weights by multiplying the design weight of each respondent in the class by the adjustment factor, that is, response weight = AdF * design weight;
  • In this way a response weight is calculated for every individual in the survey; a non-respondent is given a weight of zero and is excluded from the final analysis.
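The adjustment can be sketched as follows. In this Python illustration the adjustment factor is taken as the total design weight in the class divided by the design weight of the respondents, so that respondents' weights are inflated to cover the non-respondents in their class. The record keys (`cls`, `design_weight`, `responded`) are hypothetical, and the sketch assumes every class has at least one respondent.

```python
def response_weights(records):
    """Non-response-adjusted weights, one per record, in input order.

    Each record is a dict with hypothetical keys:
    'cls' (non-response class), 'design_weight', 'responded' (1 or 0).
    """
    total = {}      # sum of design weights per class (all sampled units)
    responded = {}  # sum of design weights per class (respondents only)
    for r in records:
        total[r["cls"]] = total.get(r["cls"], 0.0) + r["design_weight"]
        if r["responded"] == 1:
            responded[r["cls"]] = (responded.get(r["cls"], 0.0)
                                   + r["design_weight"])
    weights = []
    for r in records:
        if r["responded"] == 1:
            adf = total[r["cls"]] / responded[r["cls"]]
            weights.append(r["design_weight"] * adf)
        else:
            weights.append(0.0)  # non-respondents get a zero weight
    return weights


# Illustrative class with 2 respondents out of 4 sampled children.
sample = [
    {"cls": "A", "design_weight": 100.0, "responded": 1},
    {"cls": "A", "design_weight": 100.0, "responded": 1},
    {"cls": "A", "design_weight": 100.0, "responded": 0},
    {"cls": "A", "design_weight": 100.0, "responded": 0},
]
print(response_weights(sample))
```

With this definition, the adjusted weights of the respondents in a class sum to the total design weight of the class.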

3. Post-stratification weights

Post-stratification weighting is possible when the objective is to estimate coverage at the national or sub-regional level by combining estimates across classes such as gender, ethnicity, or geographical zone. It requires accurate data on the total eligible population in each class, stratum or state. Post-stratified weights make the sum of the weights in each class proportional to the known eligible population.

For this reason, I need the eligible population totals for each state and zone, which can be obtained from the census agency. In addition, the eligible totals for each sex subgroup within each state and zone are needed (these can also be obtained from the census agency).

In each geographic area, or demographic group within a geographic area, calculate the post-stratified weight of each respondent from his or her non-response-adjusted weight: multiply this weight by the known population of the stratum and divide by the sum of the weights in the stratum; that is:

Scaled weight(i) = unscaled weight(i) * (known eligible population total for the stratum) / (sum of unscaled weights in the stratum)
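The scaling formula can be illustrated with a short Python sketch. The stratum labels and population totals in the example are invented for illustration and are not census figures.

```python
def post_stratify(weights, strata, population_totals):
    """Scale weights so that they sum to the known total in each stratum.

    scaled(i) = unscaled(i) * known_total(stratum) / sum_of_unscaled(stratum)
    """
    sums = {}
    for w, s in zip(weights, strata):
        sums[s] = sums.get(s, 0.0) + w
    return [w * population_totals[s] / sums[s]
            for w, s in zip(weights, strata)]


# Hypothetical weights and totals for two strata.
scaled = post_stratify(
    weights=[200.0, 200.0, 100.0, 300.0],
    strata=["A", "A", "B", "B"],
    population_totals={"A": 1000.0, "B": 800.0},
)
print(scaled)
```

After scaling, the weights in each stratum add up to the known eligible population of that stratum, while the relative weights within the stratum are unchanged.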

 

  • To implement the plan, an Excel file can be used to compute all the probabilities of selection.
  • In addition, a script can be written in R to match the calculated weights to the data set. Here, all children in the same household will have the same weight.
  • From the script, we can export the results to Excel and use the new data set with the variable “psweights”.
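The matching step could look like the sketch below. It is written in Python for illustration only (the plan itself uses R), and the keys `cluster_id` and `household_id` are hypothetical; only the output variable name “psweights” comes from the plan.

```python
def attach_weights(children, household_weights):
    """Return child records with a 'psweights' field added.

    household_weights maps (cluster_id, household_id) to a weight, so
    every child in the same household receives the same weight.
    """
    matched = []
    for child in children:
        key = (child["cluster_id"], child["household_id"])
        # A KeyError here flags a child with no matching household weight.
        matched.append({**child, "psweights": household_weights[key]})
    return matched


# Illustrative data: two children in one household, one in another.
children = [
    {"child_id": 1, "cluster_id": 7, "household_id": 3},
    {"child_id": 2, "cluster_id": 7, "household_id": 3},
    {"child_id": 3, "cluster_id": 9, "household_id": 1},
]
weights = {(7, 3): 412.5, (9, 1): 380.0}
for row in attach_weights(children, weights):
    print(row)
```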

Table shells for five indicators

Each of the following tables will be filled with weighted percentages, which represent the population proportions.

  • Table 1: Crude coverage
  • Table 2: Valid coverage
  • Table 3: Dropout
  • Table 4: Card availability
  • Table 5: Fully vaccinated
  • Table 6:  % of children who received the pentavalent vaccine by different source of vaccination

We have provided Table 6 to support the team in planning the next campaign. The table will help document the design effect (DEFF) and the intra-class correlation (ICC) by providing recent data (coverage results) on the proportion of vaccinated children. This proportion will be used as the expected vaccination coverage, along with the desired precision, to calculate the effective sample size (ESS). The DEFF is a function of the known target number of respondents per cluster (m) and the ICC, DEFF = 1 + (m - 1) * ICC, and the required sample size is the ESS multiplied by the DEFF. The ICC could be obtained by fitting a linear mixed model.
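The relationship between expected coverage, ESS, DEFF and ICC can be sketched numerically. This Python illustration makes two assumptions that are not in the plan: the ESS is approximated with the simple Wald formula z^2 * p * (1 - p) / d^2 (the WHO reference manual provides its own lookup tables), and DEFF = 1 + (m - 1) * ICC, where m is the target number of respondents per cluster. All input values are hypothetical.

```python
import math


def effective_sample_size(p, d, z=1.96):
    """Wald approximation of the ESS for expected coverage p and
    desired half-width d of the confidence interval (an assumption)."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


def design_effect(m, icc):
    """DEFF for m respondents per cluster and intra-class correlation icc."""
    return 1 + (m - 1) * icc


def required_sample_size(p, d, m, icc):
    """Number of respondents needed after inflating the ESS by the DEFF."""
    return math.ceil(effective_sample_size(p, d) * design_effect(m, icc))


# Hypothetical planning values: 50% coverage, +/-10% precision,
# 11 respondents per cluster, ICC of 0.1.
print(effective_sample_size(0.5, 0.1))
print(design_effect(11, 0.1))
print(required_sample_size(0.5, 0.1, 11, 0.1))
```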

  • Other details:

The attached spreadsheet shows the different table shells. Each table can include the two-sided 95% confidence interval alongside the point estimate, because the interval shows how precise the estimate is; its calculation is based on the weighted data. Each table can also include the weighted and unweighted sample sizes for each stratum, which makes it easier to see how coverage varies between strata, to identify strata with low coverage, and to track demographic characteristics that influence the unweighted sample size. The interval is interpreted as follows: if the survey were replicated 100 times and the interval calculated each time, about 95 of the intervals would contain the true coverage value; in that sense we are 95% confident that the interval contains the population coverage.

 

Table_shells_corrected.xlsx

 

 

One result: valid dose of BCG by card

Here we present BCG vaccination coverage disaggregated by sex and by the 14 strata. We also present the results by sex within each of the 12 states and 2 zones. We calculated the percentage of children with evidence of vaccination based on a date on a home-based record. The program that calculates the results (see attached file) was written in a text file (Notepad).

R_code_Allstrata_Sex.txt

We used the survey package in R: the function “svydesign”, which represents the study design, and the function “svyby” (which gives coverage by level of a grouping variable), combined with “svyciprop” to calculate proportions and “confint” to extract confidence intervals. The design here is a two-stage cluster sample with 12 strata, 655 clusters, and 1649 households. We treated the clusters as the primary sampling units and based the design on them.

Confidence intervals were calculated using the logit method and based on the complex sampling design.
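The idea behind a logit-based interval can be sketched as follows. This Python illustration is not the code used for the analysis: it takes a coverage estimate `p` and its design-based standard error `se` as given, builds a Wald interval on the log-odds scale with a normal quantile (an assumption; “svyciprop” may use a t quantile based on the design degrees of freedom), and transforms it back to the proportion scale. The input values are hypothetical.

```python
import math


def logit_ci(p, se, z=1.96):
    """Confidence interval for a proportion, built on the logit scale.

    p  : estimated proportion, strictly between 0 and 1
    se : standard error of p (design-based, supplied by the caller)
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    logit = math.log(p / (1 - p))
    se_logit = se / (p * (1 - p))  # delta-method SE on the logit scale
    lower = logit - z * se_logit
    upper = logit + z * se_logit
    # Back-transform from log-odds to proportions.
    return 1 / (1 + math.exp(-lower)), 1 / (1 + math.exp(-upper))


# Hypothetical estimate: 30% coverage with a standard error of 2%.
low, high = logit_ci(0.30, 0.02)
print(round(low, 3), round(high, 3))
```

Unlike the plain Wald interval, the limits produced this way always stay between 0 and 1, which matters for strata with very low coverage.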

The results obtained in R were exported to an Excel file, to avoid copying and pasting by hand, and arranged by state, zone and sex.

The attached spreadsheet shows the results.

Results_All_Sex.xlsx

 

 

Graphical representation of Valid BCG

The 95% confidence intervals are shown to convey the precision of the estimates.

  • First, I obtained the point estimates in R;
  • Second, I made a dot chart of the estimates and drew the segment corresponding to each confidence interval;
  • Third, I copied the graph into Excel and added the confidence interval values for each stratum;
  • Finally, I copied the image into Paint and converted it to a PNG image.
Corrected_Graph.bmp

 

 

 

 

Method summary and results summary

1. Summary of the methods

Data cleaning was performed on identifier variables, demographic variables and dates. Vaccination dates were checked to be consistent and plausible. Where mistakes were found, the data set was amended to include corrections to the indicator of interest. Tools such as the ASSERTLIST command in VCQI and R programs were used.

In addition, descriptive statistics were used to calculate the response rates in each stratum. The probabilities of selection were documented and used to calculate the design weights, response weights and post-stratification weights. Data were weighted to fit population totals by state, zone and sex. An Excel tool was used to calculate the different sampling weights, which were then matched to the individual child records.

For estimation purposes, a derived variable was created for a valid BCG dose. This indicator covers age-eligible children with a home-based record showing a tick mark with a date. We assumed that the sum of the survey weights in each stratum is an estimate of the relative count of the eligible population. Weighted valid BCG coverage was obtained by state, zone and sex, with 95% confidence intervals based on the complex sample design. Results by sex were also disaggregated by state and zone. Confidence intervals for proportions were computed with the logit method, which gives more accurate intervals than the standard Wald interval when coverage is close to 0% or 100%. All statistical analyses were carried out in R, version 3.1.3.

2. Results summary of the Nigeria combined MICS/NICS, 2016-2017

Results in the 12 states, 2 zones and sex

A total of 12 states, 655 clusters and 1649 households were selected. There were 1728 eligible children (unweighted sample size), and the weighted sample size, scaled to fit population totals by sex, state and zone, was 5545 (Table 1 in the result section). The graph (image attached) shows the percentage of children who had received a valid dose of BCG, disaggregated by sex (male, female), state (12), and zone (North East, South South). Vaccination coverage ranged from 4% to 69%, with a critically low estimate in Yobe (4.42%). In addition, although states in the South South zone tended to have higher coverage than states in the North East zone, all percentages remained below 80%, and the upper limits of their confidence intervals were also below 80%. Hence, every stratum showed little documented evidence of valid BCG vaccination. In the North East zone, an estimated 22% of the children eligible for the survey had a home-based record and had received a valid dose of BCG. By contrast, the figure was 48% in the South South zone.

Results on sex disaggregated by states and zones

Female coverage (30.4%; 95% CI: 27-34) tended to be higher than male coverage (27%; 95% CI: 22-33). In addition, in the South South zone, 44.1% of the 1808 boys in the sample who received a valid dose of BCG received it before the age of 1 year, compared with 51.9% of the 1873 girls. Coverage was lower among both boys and girls in the North East zone.

Caveats / Strengths and limitations

1. Identification of concerns/caveats

When planning the study, the team must consider using health facility registers to obtain more data and ensure their quality. The local health authority can help identify all the relevant registers. For home-based records, the team must accept only cards with legible BCG vaccination data showing a day, a month, and a year.

The big picture here is low BCG coverage in every stratum and demographic group, which suggests an urgent need for supplementary immunization activities or follow-up campaigns to strengthen and evaluate BCG coverage. It will be worth looking carefully at states where none of the children in the survey received BCG. There is also a need to assess access to the health facilities that serve these states, and to understand how so many children in the sample came to be unvaccinated. Mothers who give birth should be advised on the importance of BCG before they leave the hospital. The states should also be studied with regard to the quality of recording and reporting of vaccinations. A tick mark on the card must be accompanied by a consistent date, because low coverage could indicate either that many children had home-based records but did not receive BCG at birth, or that they received BCG and it was not recorded. Furthermore, the survey committee must compare the reasons for non-vaccination between areas with higher and lower coverage. Awareness campaigns must be stepped up to improve caretakers' adherence to the vaccination schedule.

 

2. Strengths and limitations

  • The study is a complex sample survey that uses a 2-stage design rather than a simple random sample;
  • Valid coverage is important from an immunological point of view;

  • The survey makes it possible to estimate coverage within each state;
  • It uses a weighted approach that allows each eligible child to contribute to the study; consequently, the sample was more representative of the population, and bias from non-response was reduced;
  • The weighted estimates are close to the true coverage in the population, i.e. in Nigeria at the time of the survey, many children were not vaccinated against tuberculosis;
  • Consequently, the coverage estimates are accurate.

As far as limitations are concerned,

  • Alarmingly few respondents were vaccinated; consequently, a better coverage could not be obtained;
  • We were not able to assess the participation rates and how they might have affected the results;
  • The survey did not include data from health facility registers;
  • The interpretation of the data did not include additional or sensitivity analyses, such as comparing coverage by source of vaccination information (including recall or history) or identifying missed opportunities for vaccination;
  • It was assumed that the sum of the survey weights in each stratum is a high-quality estimate of the relative count of the eligible population. Thus, analyses based on weights scaled to the known number of children in each stratum were not conducted.

 

Resources

  • 2018 World Health Organization Vaccination Coverage Cluster Surveys Reference Manual, sections 6 and 7, Annex J;
  • VCQI Forms and Variable List (FVL) Document v1.5 
  • VCQI Indicator List with Specifications - v1.9 
  • VCQI Interpretation Quick Reference - v1.2 
  • Nigeria_data_for_Scholar_course (Project dataset in XLSX formats)
  • Nigeria Dataset Variable List and Key.xlsx 
  • Survey Analyst Community Updates (videos and demonstrations);
  • Week 1 consideration for survey managers