Survey ANALYST Creator project

Analyst Creator assignment: Analysis plan

1. Data cleaning checks:

The general principles regarding the data cleaning steps are:

1.1 Check the values for completeness, correctness and consistency:

1.1a Check that each respondent is counted in the correct cluster and stratum and that the other ID variables are correct. No ID variable should be missing for any respondent. For example, the following ID variables should be present for each respondent: stratum (zone), cluster (state), hhid, respid: group (RI01 RI03 RI11 RI12), interviewer number, supervisor number, start date of interview, start time of interview, household ID and individual number of child. We must try to correct errors in ID variables using additional information such as interviewer ID, supervisor ID, date of interview, device ID and latitude/longitude.

1.1b Check that there are no duplicate respondents in the dataset. Each participant must have a unique combination of ID variables so that every participant can be distinguished: no duplicates in respid and no duplicate combinations of stratum, cluster, hhid and respid (sort by IDs). When data are double-entered, check that the two entered values are identical (look for contradictions using the additional information).

1.1c Check that all the responses to compulsory questions are present

1.1d Check that all categorical responses are coded with allowed values. For example, the variable “vaccine dose” holds only the values “Yes” and “No”. The vaccination dates should also be sensible dates. Such errors are usually less frequent with tablets than with paper forms.

1.1e Check that the responses to different questions are consistent with each other (relations between several variables). Examples: every stratum (zone) should have data from exactly 6 clusters (states); date of birth (RI32) should fall in the range described in the protocol; the child’s age at the time of the interview should be between 12 and 23 months; vaccination dates should fall between the child’s date of birth and the date of the survey interview; if the record indicates that a card was shown, there should be one or more tick marks or dates for the doses on the card; and if the record indicates that a card was NOT shown, there should be no tick marks or dates from the card.

1.1f Check that the observed skip patterns correspond to the designed ones.
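
A few of the checks listed in 1.1 can be sketched as follows. This is only an illustrative sketch in Python, not the Stata program used for the assignment; the field names (`bcg`, `dob`, `interview`, `bcg_date`) and the sample records are hypothetical.

```python
from collections import Counter
from datetime import date

ID_VARS = ("stratum", "cluster", "hhid", "respid")
ALLOWED_DOSE_VALUES = {"Yes", "No"}

# Hypothetical records for illustration only (not survey data).
records = [
    {"stratum": 1, "cluster": 3, "hhid": 10, "respid": 1, "bcg": "Yes",
     "dob": date(2015, 6, 1), "interview": date(2016, 9, 1),
     "bcg_date": date(2015, 6, 3)},
    {"stratum": 1, "cluster": 3, "hhid": 10, "respid": 1, "bcg": "Maybe",
     "dob": date(2015, 1, 15), "interview": date(2016, 9, 1),
     "bcg_date": None},
    {"stratum": 1, "cluster": 3, "hhid": 11, "respid": 2, "bcg": "Yes",
     "dob": date(2014, 1, 1), "interview": date(2016, 9, 1),
     "bcg_date": date(2013, 12, 25)},
]


def find_duplicate_ids(rows):
    """Return the ID combinations that occur more than once (check 1.1b)."""
    counts = Counter(tuple(r[v] for v in ID_VARS) for r in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def age_in_months(dob, on_date):
    """Completed months between the date of birth and a later date."""
    months = (on_date.year - dob.year) * 12 + (on_date.month - dob.month)
    if on_date.day < dob.day:
        months -= 1
    return months


def check_record(r):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    # 1.1d: categorical responses must use allowed values.
    if r["bcg"] not in ALLOWED_DOSE_VALUES:
        problems.append("bcg: value not allowed")
    # 1.1e: age at interview must be 12-23 months.
    if not 12 <= age_in_months(r["dob"], r["interview"]) <= 23:
        problems.append("age outside 12-23 months")
    # 1.1e: vaccination date must lie between birth and interview.
    d = r.get("bcg_date")
    if d is not None and not r["dob"] <= d <= r["interview"]:
        problems.append("bcg_date outside [dob, interview]")
    return problems


print("duplicate ID combinations:", find_duplicate_ids(records))
for r in records:
    print(r["hhid"], r["respid"], check_record(r))
```

Each check returns a list of findings rather than changing the data, so the findings can be copied into the review spreadsheet.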

Flag disallowed and questionable responses for review: when you find a value that is not allowed or is inconsistent with other responses, put it in a list of responses to check (use a spreadsheet).
Check the response and identify the corrected response where possible. Go back and check the photo of the vaccination card and, if necessary, call the respondent to clarify. When checking, provide a space to note the correct response, and define a process for getting the corrected response into the dataset and for handling errant values that cannot be checked (leave the nonsensical value in the dataset or set it to missing). Finally, write a program to substitute the corrected values.
Document the data cleaning process: the number of errant values, how they were checked and corrected, how many could not be corrected and how they were handled.
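
The substitution and documentation steps could look like the sketch below. It is an illustrative Python sketch, not the Stata program used here; the layout of the correction list (`respid`, `variable`, `corrected`) is an assumption about how the review spreadsheet might be organised, and it assumes respid is unique.

```python
def apply_corrections(rows, corrections):
    """Apply checked corrections and return counts for the cleaning log.

    corrections: list of dicts with keys "respid", "variable", "corrected".
    "corrected" is None when the value could not be verified.
    """
    by_id = {r["respid"]: r for r in rows}  # assumes respid is unique
    summary = {"flagged": len(corrections), "corrected": 0, "unresolved": 0}
    for c in corrections:
        if c["corrected"] is None:
            # Unverifiable value: left unchanged here; the agreed rule
            # (keep as is or set to missing) would be applied at this point.
            summary["unresolved"] += 1
            continue
        by_id[c["respid"]][c["variable"]] = c["corrected"]
        summary["corrected"] += 1
    return summary


# Hypothetical example data.
rows = [{"respid": 101, "bcg": "Maybe"}, {"respid": 102, "bcg": "9"}]
corrections = [
    {"respid": 101, "variable": "bcg", "corrected": "Yes"},
    {"respid": 102, "variable": "bcg", "corrected": None},
]
summary = apply_corrections(rows, corrections)
print(summary)
```

The returned counts give the numbers needed for the documentation of the cleaning process.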

2. Plan for weighting:

There are several considerations when weighting a coverage survey dataset. The three steps involved in weighting are:

2.1 Design weights: applied to account for the sampling design of the survey. Indicators that have all respondents in the denominator should be weighted (with the design weight at least), so that the weighted indicators can estimate population characteristics. Design weights are particularly important when the probability of selection varies. Scientific probability sampling is used to guarantee unbiased survey results. Computing design weights requires careful documentation of the sampling parameters needed to estimate the selection probability of each household, such as the number of selected clusters per sampling stratum, the total number of households in a cluster, the number of households selected, cluster segmentation, etc. It is important to work with statistical agencies to get all the information on the sampling frame.
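
As a minimal sketch of the idea, assuming a simple two-stage design without segmentation, the design weight is the inverse of the overall selection probability. The function name, parameters and numbers below are hypothetical and written in Python for illustration only.

```python
def design_weight(p_cluster, households_listed, households_selected):
    """Inverse of the overall selection probability of a household.

    p_cluster: probability that the cluster was selected (stage 1).
    households_listed / households_selected: stage 2 within the cluster.
    """
    p_household = households_selected / households_listed
    return 1.0 / (p_cluster * p_household)


# Hypothetical cluster: selected with probability 0.05,
# 20 of 120 listed households selected.
w = design_weight(0.05, 120, 20)
print(round(w, 2))
```

With these illustrative numbers, each selected household represents about 120 households in the population.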

2.2 Adjusting for non-response (response weights): the weight of the missing respondents is shifted across the groups with respondents. If we think there is an important bias from non-response at the cluster, household or individual level, it is important to consider adjusting for non-response. To compute response weights, it is crucial to know how many households were not interviewed despite repeated visits and how many eligible respondents did not participate.
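
One common way to shift the weight is to inflate the design weight of responding households within each cluster by the ratio of selected to responding households. The sketch below assumes that adjustment cell (the cluster) and uses hypothetical numbers; it is an illustration in Python, not the method prescribed for this survey.

```python
def nonresponse_adjusted_weight(design_wt, n_selected, n_responded):
    """Shift the weight of non-responding households to responders."""
    if n_responded <= 0:
        raise ValueError("no respondents in this adjustment cell")
    return design_wt * n_selected / n_responded


# Hypothetical cluster: 20 households selected, 18 interviewed,
# design weight 120 per household.
w_adj = nonresponse_adjusted_weight(120.0, 20, 18)
print(round(w_adj, 2))
```

The total weight in the cluster is preserved: 18 responders carrying the adjusted weight sum to the same total as 20 selected households carrying the design weight.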

Here the data were collected by a Multiple Indicator Cluster Survey (MICS) team, so the quality of the household listing is probably high. We know the outcome for every household that was selected and the probability of selection of the clusters and the households, and every eligible respondent should have been interviewed. We therefore have good data for calculating design weights and for adjusting for non-response at the household level (where no one was at home or the household refused to participate).

2.3 Post-stratification weights: In situations where survey designers decide to oversample the population in a stratum of interest, relative to its share of the overall population, to obtain precise coverage estimates for that stratum, it is important to consider post-stratification weights (if all the information is available). Post-stratification is also recommended for calculating pooled estimates at the national level or across strata. The information needed for post-stratification includes the projected household size and the projected census population by gender or ethnic group. It might also be recommended when estimating population totals (for example, the number of children in the country who are unvaccinated). When deciding whether to perform this extra weighting step, it is important to consider which kind of survey is being conducted, as both DHS and MICS choose not to post-stratify the vast majority of their surveys. If the EPI program or the government does not have recent and accurate data on the total eligible population in each stratum, post-stratification will not be possible. It is usually not recommended to post-stratify the coverage survey data if 1) the government population projections are based on a very old census, and 2) the coverage survey had excellent cluster maps (with a thorough job of listing all households in each cluster), and 3) the coverage survey listing teams did high-quality work, updating the list carefully and in the entire cluster (tracking the probability of selection at each stage), and 4) the coverage survey recorded the survey result at every selected household (tracking the information necessary to adjust for non-response), and 5) each stratum includes about 30 or more clusters. These conditions are seldom all present in the same EPI survey, so post-stratification should often be considered if the information needed to calculate it is available.
If the weights are well constructed, the dataset can be used to estimate coverage proportions within individual strata.
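
For illustration, post-stratification can be sketched as rescaling the weights in each stratum so that they sum to an external population total. The stratum labels, weights and population totals below are hypothetical, and the Python code is a sketch under that assumption rather than a description of what was done in this assignment.

```python
def poststratify(weights_by_stratum, population_by_stratum):
    """Scale weights so they sum to the external total in each stratum."""
    adjusted = {}
    for stratum, weights in weights_by_stratum.items():
        factor = population_by_stratum[stratum] / sum(weights)
        adjusted[stratum] = [w * factor for w in weights]
    return adjusted


# Hypothetical weights and projected eligible populations.
weights = {"A": [100.0, 150.0, 250.0], "B": [200.0, 200.0]}
population = {"A": 1000.0, "B": 300.0}
post = poststratify(weights, population)
print({s: round(sum(w), 1) for s, w in post.items()})
```

The quality of the result depends entirely on the external totals, which is why old or inaccurate projections argue against this step.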

3. Table shells for five indicators:

3.1 Attachment:

- the 5 table shells

Table_crude_20coverage.pdf

Table_20Valid_20coverage.pdf

Table_dropout.pdf

Table_Card_20availability.pdf

Table_Timelines.pdf

- details as footnotes: equity between sexes, statistics (one- vs two-sided CI, design effect, intracluster correlation coefficient, weighted sample sizes), numerator, denominator and the method used to calculate confidence intervals.

Details_20regarding_20table_20shells.docx

3.2 Regarding design effect and ICC:

We might also include some annex tables to document the design effect and ICC.

The ICC affects the design effect (DEFF) and therefore the sample size calculation. Reporting these numbers might be useful in the planning stage of the next survey in the same study area, so it is relevant to document them in our report. Note that we could document many design effect and ICC estimates (coverage by card, by history, etc.), so it might make sense to focus mainly on reporting DEFF and ICC for crude vaccination coverage (for main vaccine doses such as OPV3, Penta3, PCV3 and MCV1), which is the most important one to consider when planning the next survey. The Design df for BCG crude coverage is reported here (see completed table 5).
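
The link between ICC, DEFF and sample size can be illustrated with the usual approximation DEFF = 1 + (m - 1) × ICC, where m is the average number of respondents per cluster. The sketch below uses hypothetical values and is written in Python for illustration; it is not output from this survey.

```python
def design_effect(icc, mean_cluster_size):
    """Approximate design effect from the ICC and average cluster size."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n, deff):
    """Sample size of a simple random sample with the same precision."""
    return n / deff


# Hypothetical values: ICC of 0.1 and 11 respondents per cluster.
deff = design_effect(0.1, 11)
n_eff = effective_sample_size(660, deff)
print(round(deff, 2), round(n_eff, 1))
```

Under these assumed values, clustering halves the information carried by the sample, which is why the planned sample size must be multiplied by the expected DEFF.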

4. Results for one table: BCG birth crude coverage

The software tool I chose to produce the results was Stata; the results were then exported to Excel via Stata code (putexcel) to create the attached table. The table was fine-tuned in Excel. As crude vaccine coverage is a weighted indicator, I incorporated the weights and used software that accounts for the complex sample design when calculating confidence intervals. 95% confidence intervals were calculated using the Wilson CI in Stata, as recommended.

Attachment:

- Table 1: Percentage of children 12-23 months (all children, boys, girls) with any evidence of vaccination (weighted crude vaccination coverage) for BCG birth dose, for the zones of North East and South South and their respective states, Nigeria, 2016

Table_crude_20coverage_BCG.pdf

- Stata syntax

Syntax_20for_20table_20results.docx

5. Graphical summary of one vaccination coverage indicator for one dose across all 14 strata: BCG birth crude vaccination coverage

Figure_crude_20vaccination_20coverage_BCG.JPG

I generated a graphical summary of crude vaccination coverage for the BCG birth dose across the two zones and 12 states, for all children and by sex. 95% confidence intervals were calculated using the Wilson CI in Stata. The CIs were placed on the graph in Excel (error bars -> more options -> custom, to report the 95% CI).

6. Methods:

The sample for the Nigeria MICS 2016 was designed to provide estimates for many vaccination indicators in children aged 12-23 months. The survey includes two zones of Nigeria (the “North East” and “South South” zones), each with its 6 states. Thus, a total of 12 different strata (one for each state) was obtained. To highlight equity between sexes, the analysis compared three groups (all children, boys, girls) in each stratum. The “Vaccination Coverage Surveys – Forms & Variable Lists (FVL) Structured for Compatibility with VCQI, 2017” was used for the sample questions for a Routine Immunization form.

Tablets were used to collect the data during the survey. Interviewers took a photo of the vaccination card when available. Registers from health facilities were not used in this survey. Data cleaning was performed using Stata 15, with checks for completeness, correctness, duplication and consistency. Responses were corrected when possible using the photo of the vaccination card and a phone call to the respondent. Errant values that could not be checked were left as nonsensical values in the dataset. Substitution of the corrected values was performed in Stata. The data cleaning process was fully documented, and the documentation is available.

The high quality of the available data allowed design weights to be calculated and non-response to be adjusted for at the household level. Several strata were oversampled, relative to their share of the overall population, to obtain precise coverage estimates for those strata. Post-stratification weights could not be calculated because several pieces of information were lacking, including recent and accurate data on the total eligible population in each stratum.

Data analysis was performed using Stata 15. After transfer to Excel 2016, the data were organized into tables and graphs. The proportion, expressed as a percentage, of children who had any evidence of vaccination (weighted crude vaccination coverage) for the bacille Calmette–Guérin (BCG) vaccine was calculated for all children 12-23 months of age and by sex, for the North East and South South zones and their respective states, Nigeria, 2016. As recommended in the “VCQI Results Interpretation Quick-Reference Guide Draft Version 1.2, 2017”, crude coverage was calculated using the sum of weights for all respondents as the denominator and the sum of weights for respondents who received the vaccine dose per card or recall as the numerator. Confidence intervals (CI) were calculated as two-sided 95% CIs of the Wilson type.
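
The calculation described above can be sketched as follows. This Python sketch is illustrative only and is not the Stata code used for the results: the weights and responses are hypothetical, and applying the Wilson formula to an effective sample size (n divided by the design effect) is an assumption about how the complex design enters the interval.

```python
from math import sqrt


def weighted_coverage(weights, vaccinated):
    """Sum of weights of vaccinated children over sum of all weights."""
    numerator = sum(w for w, v in zip(weights, vaccinated) if v)
    return numerator / sum(weights)


def wilson_interval(p, n_eff, z=1.959964):
    """Two-sided Wilson interval for proportion p (z = 1.96 gives 95%).

    n_eff is the effective sample size (assumed here to be n / DEFF).
    """
    denom = 1.0 + z ** 2 / n_eff
    centre = (p + z ** 2 / (2.0 * n_eff)) / denom
    half = z * sqrt(p * (1.0 - p) / n_eff + z ** 2 / (4.0 * n_eff ** 2)) / denom
    return centre - half, centre + half


# Hypothetical weights and vaccination status (card or recall).
weights = [100.0, 100.0, 200.0, 200.0]
vaccinated = [True, False, True, False]
coverage = weighted_coverage(weights, vaccinated)
low, high = wilson_interval(coverage, n_eff=100)
print(round(coverage, 3), round(low, 3), round(high, 3))
```

Unlike the simpler Wald interval, the Wilson interval stays within 0 and 1, which matters for coverage estimates close to 100%.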

7. Summarize results:

The results of this survey report concern the proportion, expressed as a percentage, of children who had any evidence of vaccination (weighted crude vaccination coverage) for the bacille Calmette–Guérin (BCG) vaccine for all children 12-23 months of age and by sex, for the North East and South South zones and their respective states, Nigeria, 2016. In 2016, 52.6% and 83.9% of the eligible population in the North East and South South zones of Nigeria, respectively, are estimated to have received the BCG birth dose, as documented by card or recall. Among the six states with the highest crude vaccination coverage for the BCG birth dose, five are in the South South zone. Coverage among boys and girls was approximately similar in the North East zone (51.7% and 53.4% respectively) and in the South South zone (82.6% and 85.1% respectively). As crude vaccination coverage is a weighted indicator that has all respondents in the denominator, it estimates population characteristics. Thus, statements based on the crude vaccination coverage can be made about the population from which the sample was drawn.

8. Caveats, limitations and concerns:

The interpretation of these figures is subject to caveats related to the confidence intervals: the true population coverage can fall far below or above the 95% confidence interval if important biases occurred in the survey methods or execution. One potential bias in this survey is that health facility records were not used; the survey relied only on the vaccination card and/or caregiver recall. Post-stratification weights could not be calculated due to the lack of recent and accurate data on the total eligible population in each stratum.

A large amount of the data collected in the survey was not analyzed in this report. The report concerns only one vaccine (BCG) and one type of vaccination coverage (crude coverage). Thus, its results cannot be generalized to all doses of all vaccines or to other coverage types such as valid coverage. Most of the standard vaccination indicators were not analyzed due to lack of time. Only equity between sexes for BCG was analyzed, not other demographic domains such as wealth quintile, ethnicity, caretaker’s age and education, or area (urban/rural). Only two zones of Nigeria with their respective states were analyzed, which does not allow more general conclusions to be drawn or vaccination coverage to be calculated at the national level.

9. Strengths and recommendations (see above for limitations; the recommendations I consider important are added here):

Despite the limitations listed above, the outcome for every selected household was known, without any missing data, allowing the design weights to be calculated at the household level. The use of tablets improved data collection, probably giving more accurate results. The choice of a weighted indicator (crude coverage) allows population characteristics to be estimated.

Regarding the two zones studied in this survey, further investigations (dropout studies, missed-opportunity studies, qualitative studies of the reasons for non-vaccination) in the states with the lowest vaccination coverage might be useful to determine the reasons for such poor coverage. It is important to write a second report including the weighted crude and valid vaccination coverage for specific vaccines and doses. Apart from BCG at birth, the vaccines must include one dose of Hepatitis B (HepB) vaccine at birth; one dose of oral poliomyelitis vaccine (OPV) at birth and three further OPV doses, with the first from six weeks of age; three doses of Pentavalent vaccine (diphtheria, tetanus, pertussis, Haemophilus influenzae type b (Hib) and Hepatitis B); three doses of pneumococcal conjugate vaccine (PCV); one dose of measles-containing vaccine (MCV1); one dose of Yellow fever (YF) vaccine; and one dose of inactivated (injectable) poliomyelitis vaccine (IPV). More equity-related demographic domains must be included in the analysis. If card availability is high enough, additional analyses must be included in the second survey report.

The full text above is also attached here in case of any issues accessing it:

Analyst_20Creator_20assignment_NP_6_CLEAN.docx

Module A3 (2018) Survey ANALYST Creator project
